Available Models
Parametrizing a Model
Each of the available models is described below. They have different functional forms and therefore different parameters, and possibly different link functions for those parameters.
The function to run a model is ccount.run.df_to_model, and it requires the argument model_type. You determine the order of the outcomes by the order of the variable names in the list that you pass as the outcomes argument, but you do not determine the order of the parameters in the model (that is explained below).
The fixed effects are allowed to differ across the outcomes and parameters. The order of the entries in the fixed_effects argument matters: the outer list is indexed by parameter, and each inner list is indexed by outcome. For example, for a model with two parameters and two outcomes,
fixed_effects=[
    [
        ['covariate_1'],  # first parameter - first outcome
        ['covariate_1']   # first parameter - second outcome
    ],
    [
        ['covariate_1', 'covariate_2'],  # second parameter - first outcome
        ['covariate_2']                  # second parameter - second outcome
    ]
]
will add 'covariate_1' as a covariate for the first parameter on both outcomes, add 'covariate_2' as the only covariate for the second parameter and second outcome, and add both 'covariate_1' and 'covariate_2' as covariates for the second parameter and first outcome.
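The nesting can be checked with a small helper before fitting. This is a minimal sketch: describe_fixed_effects and the parameter_N / outcome_N labels are hypothetical names used only for illustration and are not part of ccount.

```python
# The fixed_effects structure from the example above.
fixed_effects = [
    [
        ["covariate_1"],                 # first parameter - first outcome
        ["covariate_1"],                 # first parameter - second outcome
    ],
    [
        ["covariate_1", "covariate_2"],  # second parameter - first outcome
        ["covariate_2"],                 # second parameter - second outcome
    ],
]

# Hypothetical labels, used only to make the printout readable.
parameters = ["parameter_1", "parameter_2"]
outcomes = ["outcome_1", "outcome_2"]


def describe_fixed_effects(fixed_effects, parameters, outcomes):
    """Flatten the nested list into (parameter, outcome, covariates) rows."""
    if len(fixed_effects) != len(parameters):
        raise ValueError("outer list must have one entry per parameter")
    rows = []
    for parameter, per_outcome in zip(parameters, fixed_effects):
        if len(per_outcome) != len(outcomes):
            raise ValueError("each inner list must have one entry per outcome")
        for outcome, covariates in zip(outcomes, per_outcome):
            rows.append((parameter, outcome, list(covariates)))
    return rows


layout = describe_fixed_effects(fixed_effects, parameters, outcomes)
for parameter, outcome, covariates in layout:
    print(parameter, outcome, covariates)
```

Printing the layout this way makes it easy to confirm that the outer index is the parameter and the inner index is the outcome.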
Model Choices
Zero-Inflated Poisson Model
The Zero-Inflated Poisson (ZIP) model combines a Poisson distribution with a binomial distribution that determines the probability that the realization of the Poisson random variable is masked by an additional zero. Zeros can therefore arise from either the Poisson distribution or the binomial distribution.
- Parameter 1: probability of a structural zero (coming from the binomial distribution)
- Parameter 2: mean of the Poisson distribution
To fit this model, use model_type = "zero_inflated_poisson", which will use an exponential link function for the mean and the inverse logit function for the probability of a structural zero. Alternatively, you can use model_type = "zero_inflated_poisson_relu", which will use a modified link function for the mean that is more stable for large values.
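The two sources of zeros can be seen directly in the probability mass function. The following is a minimal sketch of the standard ZIP distribution, written with the Python standard library only; zip_pmf and its argument names are illustrative and do not reflect how ccount implements the model internally.

```python
import math


def poisson_pmf(k, mean):
    """Probability of observing k under a Poisson distribution."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def zip_pmf(k, p_zero, mean):
    """Zero-inflated Poisson probability of observing k.

    p_zero: probability of a structural zero (Parameter 1)
    mean:   mean of the Poisson distribution (Parameter 2)
    """
    poisson_part = (1.0 - p_zero) * poisson_pmf(k, mean)
    if k == 0:
        # A zero is either structural or a genuine Poisson zero.
        return p_zero + poisson_part
    return poisson_part


# Example values chosen only for illustration.
p_zero, mean = 0.3, 2.0
total = sum(zip_pmf(k, p_zero, mean) for k in range(60))
print("P(0) =", zip_pmf(0, p_zero, mean))
print("sum of probabilities =", total)
```

The probability of a zero is always larger than p_zero here, because the Poisson component contributes zeros of its own.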
Poisson Hurdle Model
The Poisson hurdle model combines a Poisson distribution that has been truncated at 0 and re-normalized for its new support with a binomial model for the probability of a 0. In this model, zeros can only arise from the binomial distribution.
- Parameter 1: probability of a zero
- Parameter 2: mean of the Poisson before truncation
To fit this model, use model_type = "hurdle_poisson", which will use an exponential link function for the mean and the inverse logit function for the probability of a zero. Alternatively, you can use model_type = "hurdle_poisson_relu", which will use a modified link function for the mean that is more stable for large values (if you haven't already scaled your covariates).
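For comparison with the ZIP model, here is a minimal sketch of the standard Poisson hurdle probability mass function, using only the Python standard library. The function names are illustrative and are not taken from ccount.

```python
import math


def poisson_pmf(k, mean):
    """Probability of observing k under an untruncated Poisson distribution."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def hurdle_poisson_pmf(k, p_zero, mean):
    """Poisson hurdle probability of observing k.

    p_zero: probability of a zero (Parameter 1)
    mean:   mean of the Poisson before truncation (Parameter 2)
    """
    if k == 0:
        # Zeros come only from the binomial part.
        return p_zero
    # Re-normalize the Poisson over its new support k >= 1.
    truncated = poisson_pmf(k, mean) / (1.0 - math.exp(-mean))
    return (1.0 - p_zero) * truncated


# Example values chosen only for illustration.
p_zero, mean = 0.3, 2.0
total = sum(hurdle_poisson_pmf(k, p_zero, mean) for k in range(60))
print("P(0) =", hurdle_poisson_pmf(0, p_zero, mean))
print("sum of probabilities =", total)
```

Unlike the ZIP sketch, the probability of a zero is exactly p_zero, since the truncated Poisson cannot produce one.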
Negative Binomial
The Negative Binomial model is an extension of the Poisson model that allows for over-dispersion (the Poisson distribution forces the variance to equal the mean).
- Parameter 1: mean of the distribution
- Parameter 2: over-dispersion parameter (large values mean variance >> mean)
To fit this model, use model_type = "negative_binomial", which will use an exponential link function for both the mean and the over-dispersion parameter.
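The effect of the over-dispersion parameter can be illustrated numerically. The sketch below assumes one common parametrization, in which variance = mean + alpha * mean**2; this text does not state which parametrization ccount uses, so treat the formula and the names nb_pmf and alpha as assumptions for illustration only.

```python
import math


def nb_pmf(k, mean, alpha):
    """Negative binomial probability of k under the ASSUMED parametrization
    variance = mean + alpha * mean**2 (ccount may use a different one)."""
    r = 1.0 / alpha        # "size" of the equivalent gamma-Poisson mixture
    p = r / (r + mean)     # success probability in the classical form
    log_pmf = (
        math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
        + r * math.log(p) + k * math.log(1.0 - p)
    )
    return math.exp(log_pmf)


# Example values chosen only for illustration.
mean, alpha = 3.0, 0.5
support = range(400)
probabilities = [nb_pmf(k, mean, alpha) for k in support]

empirical_mean = sum(k * q for k, q in zip(support, probabilities))
empirical_var = sum((k - empirical_mean) ** 2 * q
                    for k, q in zip(support, probabilities))

print("mean     =", empirical_mean)
print("variance =", empirical_var)
```

Under this assumed parametrization, larger values of alpha widen the gap between the variance and the mean, and the distribution approaches the Poisson as alpha approaches 0.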
Logistic Model
The Logistic model fits a logistic regression model.
- Parameter 1: probability of the outcome
To fit this model, use model_type = "logistic", which will use the inverse logit function for the probability of the outcome.
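As a reminder of what the inverse logit does, the sketch below maps a linear predictor onto a probability. The intercept, coefficient, and covariate value are made-up numbers for illustration, and inverse_logit is a local helper, not a ccount function.

```python
import math


def inverse_logit(x):
    """Map a real number to a probability in (0, 1), avoiding overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


# Hypothetical fitted values, chosen only for illustration.
intercept = -1.0
coefficient = 0.5
covariate_value = 2.0

linear_predictor = intercept + coefficient * covariate_value
probability = inverse_logit(linear_predictor)
print("linear predictor =", linear_predictor)
print("probability of the outcome =", probability)
```

Any real-valued linear predictor is mapped into the interval (0, 1), which is why this function is used for probabilities.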
The logistic model is different from the count models because it expects either 0 or 1 as the outcome variable. It will give an error otherwise.
If you would rather input counts from some population size, the outcome variable must still be 0 or 1, but you can pass another column for weight that gives the total number of 0 or 1 observations represented by that data row. For example, if you have a categorical variable as your predictor, you could aggregate your data: use pandas.DataFrame.groupby to create a new column that counts the number of 0 and 1 outcomes in each category, and pass that count as weight.
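The aggregation step might look like the following sketch. The column names category, outcome, and weight are hypothetical, and the toy data are invented; only the pandas calls themselves are standard.

```python
import pandas as pd

# Toy data: one row per observation, with a 0/1 outcome.
raw = pd.DataFrame(
    {
        "category": ["a", "a", "a", "b", "b", "b", "b"],
        "outcome": [1, 0, 1, 0, 0, 1, 0],
    }
)

# One row per (category, outcome) pair; the new "weight" column counts how
# many original rows each aggregated row stands for.
aggregated = (
    raw.groupby(["category", "outcome"])
    .size()
    .reset_index(name="weight")
)

print(aggregated)
```

The outcome column still contains only 0 and 1, and the weights add up to the number of rows in the original data, so no observations are lost in the aggregation.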