Getting Started Using CurveFit
Generalized Logistic Model
The model for the mean of the data for this example is one of the following: where , , and are unknown parameters.
Fixed Effects
We use the notation , and for the fixed effect corresponding to the parameters , , respectively. For this example, the link functions, that map from the fixed effects to the parameters, are The fixed effects are initialized to be their true values divided by three.
Random effects
For this example the random effects are constrained to be zero.
Covariates
This example data set has two covariates, the constant one and a social distance measure. While the social distance is in the data set, it is not used.
Simulated data
Problem Settings
The following settings are used to simulate the data and check that the solution is correct:
n_data = 21 # number simulated measurements to generate
beta_true = 20.0 # max death rate at 20 days
alpha_true = 2.0 / beta_true # alpha_true * beta_true = 2.0
p_true = 0.1 # maximum cumulative death fraction
rel_tol = 1e-5 # relative tolerance used to check optimal solution
Time Grid
A grid of n_data points in time, , where where the subscript denotes the true value of the corresponding parameter and is the number of data points. The minimum value for this grid is zero and its maximum is .
Measurement values
We simulate data, , with no noise at each of the time points. To be specific, for Note that when we do the fitting, we model each data point as having noise.
Example Source Code
# -------------------------------------------------------------------------
import pandas
import numpy
from curvefit.core.functions import expit, normal_loss
from curvefit.core.data import Data
from curvefit.core.parameter import Variable, Parameter, ParameterSet
from curvefit.models.core_model import CoreModel
from curvefit.solvers.solvers import ScipyOpt
# for this model number of parameters is same as number of fixed effects
num_params = 3
num_fe = 3
params_true = numpy.array([alpha_true, beta_true, p_true])
# -----------------------------------------------------------------------
# data_frame
independent_var = numpy.array(range(n_data)) * beta_true / (n_data - 1)
measurement_value = expit(independent_var, params_true)
measurement_std = n_data * [0.1]
constant_one = n_data * [1.0]
social_distance = [0.0 if i < n_data / 2 else 1.0 for i in range(n_data)]
data_group = n_data * ['world']
data_dict = {
'independent_var': independent_var,
'measurement_value': measurement_value,
'measurement_std': measurement_std,
'constant_one': constant_one,
'social_distance': social_distance,
'data_group': data_group,
}
data_frame = pandas.DataFrame(data_dict)
# ------------------------------------------------------------------------
# curve_model
data = Data(
df=data_frame,
col_t='independent_var',
col_obs='measurement_value',
col_covs=num_params * ['constant_one'],
col_group='data_group',
obs_space=expit,
col_obs_se='measurement_std'
)
alpha_intercept = Variable(
covariate='constant_one',
var_link_fun=lambda x: x,
fe_init=numpy.log(alpha_true) / 3,
re_init=0.0,
fe_bounds=[-numpy.inf, numpy.inf],
re_bounds=[0.0, 0.0]
)
beta_intercept = Variable(
covariate='constant_one',
var_link_fun=lambda x: x,
fe_init=beta_true / 3,
re_init=0.0,
fe_bounds=[-numpy.inf, numpy.inf],
re_bounds=[0.0, 0.0]
)
p_intercept = Variable(
covariate='constant_one',
var_link_fun=lambda x: x,
fe_init=numpy.log(p_true) / 3,
re_init=0.0,
fe_bounds=[-numpy.inf, numpy.inf],
re_bounds=[0.0, 0.0]
)
alpha = Parameter(param_name='alpha', link_fun=numpy.exp, variables=[alpha_intercept])
beta = Parameter(param_name='beta', link_fun=lambda x: x, variables=[beta_intercept])
p = Parameter(param_name='p', link_fun=numpy.exp, variables=[p_intercept])
parameters = ParameterSet([alpha, beta, p])
optimizer_options = {
'ftol': 1e-12,
'gtol': 1e-12,
}
model = CoreModel(
param_set=parameters,
curve_fun=expit,
loss_fun=normal_loss
)
solver = ScipyOpt(model)
solver.fit(data=data._get_df(copy=True, return_specs=True), options=optimizer_options)
params_estimate = model.get_params(solver.x_opt, expand=False)
# -------------------------------------------------------------------------
# check optimal parameters
for i in range(num_params):
rel_error = params_estimate[i] / params_true[i] - 1.0
assert abs(rel_error) < rel_tol
print('get_started_old.py: OK')