Example of Prior Initializer
Function Documentation
See PriorInitializer for function documentation.
Example
In this example, we will simulate data from the curvefit.core.functions.ln_gaussian_pdf
with n_groups
different
groups random effects, and fit a PriorInitializer
to these groups to get
a new ParameterSet
to be used for fitting other groups. The idea is that if you have more complete time series'
for some groups, then you could first fit a model(s) to these groups alone, extract information about
fixed effects means, random effects variances, etc., and then use that information to set better priors for
the groups that have less complete time series'.
This example shows how to generate a new ParameterSet
using PriorInitializer
that could then be used
to fit other models.
First, we need to import all of the packages and modules that we will use:
import numpy as np
import pandas as pd
from curvefit.core.functions import ln_gaussian_pdf, normal_loss
from curvefit.core.data import Data
from curvefit.core.parameter import ParameterSet, Parameter, Variable
from curvefit.initializer.initializer import PriorInitializer
from curvefit.initializer.initializer_component import BetaPrior
from curvefit.models.core_model import CoreModel
from curvefit.solvers.solvers import ScipyOpt
Simulate Data
Now we will set the simulation parameters. We will simulate 10 groups with 10 time points each, and their
parameters are in the order alpha
, beta
, and p
for the ln_gaussian_pdf
functional form. The
random effects variance will be used to simulate parameters from the mean alpha
, beta
, and p
for each group.
# simulation parameters
np.random.seed(10)
time = np.arange(10)
n_groups = 10
fe_mean = np.array([1, 3., 1.])
re_var = np.array([0.01, 0.1, 0.1])
# fe_gprior and re_gprior for the solver
fe_gprior = [0., np.inf]
re_gprior = [0., np.inf]
# how much noise to add to the data
data_noise = 1.
Define a function to simulate data based on these parameters, and create simulated data.
def simulate_data():
re_mean = np.random.normal(fe_mean, re_var, size=(n_groups, len(fe_mean)))
group_names = [f"group_{i}" for i in range(n_groups)]
groups = []
for i in range(n_groups):
group_sim = ln_gaussian_pdf(time, re_mean[i, :]) + np.random.normal(0, data_noise, size=len(time))
groups.append(pd.DataFrame({
'group_name': group_names[i],
'obs': group_sim,
'time': time
}))
group_data = pd.concat(groups)
group_data['intercept'] = 1
return group_data
df = simulate_data()
# Use the `Data` object to store information about what the columns represent
data = Data(
df=df, col_t='time', col_obs='obs', col_covs=['intercept'],
obs_space=ln_gaussian_pdf, col_group='group_name', obs_se_func=lambda x: 1
)
Create a Parameter Set
We need to define variables, parameters that use those variables (in this case there are only intercepts -- and one covariate per parameter -- so the variables are effectively the same as the parameters), and a parameter set that collects all of that information into one object.
alpha_fe = Variable(
covariate='intercept',
var_link_fun=lambda x: x,
fe_init=fe_mean[0], re_init=0.,
fe_gprior=fe_gprior,
re_gprior=re_gprior
)
beta_fe = Variable(
covariate='intercept',
var_link_fun=lambda x: x,
fe_init=fe_mean[1], re_init=0.,
fe_gprior=fe_gprior,
re_gprior=re_gprior
)
p_fe = Variable(
covariate='intercept',
var_link_fun=lambda x: x,
fe_init=fe_mean[2], re_init=0.,
fe_gprior=fe_gprior,
re_gprior=re_gprior
)
alpha = Parameter('alpha', link_fun=np.exp, variables=[alpha_fe])
beta = Parameter('beta', link_fun=lambda x: x, variables=[beta_fe])
p = Parameter('p', link_fun=lambda x: x, variables=[p_fe])
params = ParameterSet([alpha, beta, p])
Models and Solvers
We now need a Model
that contains our parameter set and also which curve function we want to fit, and the loss
function for the optimization. We also need to define a Solver
that will actually perform the optimization.
Finally, we will fit the solver to our simulated data.
model = CoreModel(
param_set=params,
curve_fun=ln_gaussian_pdf,
loss_fun=normal_loss
)
solver = ScipyOpt()
Use the Prior Initializer
The purpose of using the prior initializer is that it gets smarter priors for later on. Therefore, there are many
different things that one could do to get good priors based on subsets of the data. Each of these types
is implemented in a separate PriorInitializerComponent
. Here we will use
the BetaPrior
prior initializer component; the goal is to estimate the mean and variance of the random effects
and then set that as the fixed effects prior variance for a new parameter set that can be used in later model fits.
Even though we only have 10 groups, our random effect variance should get pretty close to our simulated value.
The BetaPrior()
will update the fixed effects Gaussian prior for the beta
parameter to what is estimated based on
a joint model fit with random effects.
# Instantiate a PriorInitializer with one component
pi = PriorInitializer(prior_initializer_components=[BetaPrior()])
# Create a new parameter set from running the initialization
new_param_set = pi.initialize(
data=data,
model_prototype=model,
solver_prototype=solver
)
# Make sure that the estimated variance is close to the simulated variance
np.testing.assert_almost_equal(
new_param_set.fe_gprior[1], [fe_mean[1], re_var[1]], decimal=1
)
print('prior_initializer.py: OK')