modrover.rover

class modrover.rover.Rover(model_type, obs, cov_fixed, cov_exploring, main_param=None, param_specs=None, weights='weights', holdouts=None, get_score=<function get_rmse>)

Bases: object

Rover class explores model space and creates final super learner for prediction and inference.

Parameters:
  • model_type (str) – Type of the model. For example "gaussian" or "poisson"

  • obs (str) – The name of the column representing observations

  • cov_fixed (list[str]) – A list of covariates that are present in every learner

  • cov_exploring (list[str]) – A list of covariates that rover will explore

  • main_param (str | None) – The main parameter to which cov_fixed and cov_exploring are applied. By default main_param=None; when the model has only one parameter, main_param is automatically set to that parameter. If the model has multiple parameters, the user must specify main_param.

  • param_specs (dict[str, dict] | None) – Parameter settings, including link function, priors, etc.

  • weights (str) – Column name corresponding to the weights for each data point

  • holdouts (list[str] | None) – A list of column names containing 1’s and 0’s that represent folds in the rover cross-validation algorithm

  • get_score (Callable) – A callable used to evaluate the cross-validated score of sub-learners in rover
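
As an illustration of the holdouts argument, the following sketch builds 0/1 fold-indicator columns in plain Python and collects keyword arguments that follow the constructor signature above. The helper make_holdout_columns, the column names (y, age, bmi, smoking, holdout_0, ...), and the convention that 1 marks a held-out row are assumptions made for this example; the parameter description only states that the columns contain 1's and 0's.

```python
import random


def make_holdout_columns(num_rows, num_folds, seed=0):
    """Return {column_name: [0/1, ...]} with one indicator column per fold.

    Hypothetical helper: each row is assigned to exactly one fold, and the
    column for that fold holds 1 for the row (assumed to mean "held out").
    """
    rng = random.Random(seed)
    # Balanced fold labels, shuffled so folds are not contiguous blocks.
    fold_of_row = [i % num_folds for i in range(num_rows)]
    rng.shuffle(fold_of_row)
    return {
        f"holdout_{k}": [1 if fold == k else 0 for fold in fold_of_row]
        for k in range(num_folds)
    }


holdout_columns = make_holdout_columns(num_rows=10, num_folds=3)

# Keyword arguments following the Rover signature documented above.
# Every column name here is a placeholder for a column in your own data.
rover_kwargs = {
    "model_type": "gaussian",
    "obs": "y",
    "cov_fixed": ["age"],
    "cov_exploring": ["bmi", "smoking"],
    "holdouts": list(holdout_columns),
}
```

Once the indicator columns are attached to the training data frame, the dictionary could be passed on as Rover(**rover_kwargs).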

property model_class: type

Model class that model_type refers to.

property params: tuple[str, ...]

A tuple of parameter names belonging to the model class.

property variables: tuple[str, ...]

A tuple of the variable names belonging to the model class when the full list of covariates is included.

property num_vars: int

Number of variables when the full list of covariates is included.

property super_learner_id: tuple[int, ...]

Learner id for the super learner.

property super_learner: Learner

Ensembled super learner.

property learner_info: DataFrame

A data frame containing important information about the fitted learners.

property summary: DataFrame

A data frame containing summary information about the explored covariates across all fitted learners.

fit(data, strategies, strategy_options=None, top_pct_score=0.1, top_pct_learner=1.0, coef_bounds=None)

Fits the ensembled super learner.

Explores over all covariate slices as defined by the input strategy, and fits the sublearners.

The super learner coefficients are determined by the ensemble method parameters, and the super learner itself is then created for use in prediction and summarization.

Parameters:
  • data (DataFrame) – Training data to fit individual learners on.

  • strategies (list[str]) – The selection strategies used to determine the model tree. Valid strategies include “forward”, “backward” and “full”.

  • strategy_options (dict | None) – A dictionary with the strategy name as key and the options for calling that strategy as value. By default, strategy_options=None, in which case each strategy uses its default options.

  • top_pct_score (float) – Only learners with a score greater than or equal to best_score * (1 - top_pct_score) can be selected. When top_pct_score = 0, only the best model will be selected.

  • top_pct_learner (float) – Only the best top_pct_learner * num_learners learners will be selected.

  • coef_bounds (dict[str, tuple[float, float]] | None) – User pre-specified bounds for the coefficients. This is a dictionary with the covariate name as key and the bounds as value. A learner is marked valid only if its coefficients are within the bounds. Invalid learners will not be used in the ensemble process that creates the super learner. By default, coef_bounds=None, in which case no validation is performed on the values of the coefficients.

Return type:

None
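
The filtering described by coef_bounds, top_pct_score and top_pct_learner can be sketched in plain Python as follows. This is a minimal illustration and not modrover's implementation: the helper names, the learner ids, the rounding of the learner count, and the way the two thresholds are combined are all assumptions. Scores are treated as "higher is better", consistent with the top_pct_score description.

```python
import math


def within_bounds(coefs, coef_bounds):
    """True if every bounded covariate's coefficient lies inside its bounds."""
    return all(
        lower <= coefs[name] <= upper
        for name, (lower, upper) in coef_bounds.items()
        if name in coefs
    )


def select_learners(scores, top_pct_score=0.1, top_pct_learner=1.0):
    """Return learner ids that pass both the score and the count thresholds."""
    best_score = max(scores.values())
    cutoff = best_score * (1 - top_pct_score)
    # Rank ids from best to worst score.
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Assumption: round the count down but always keep at least one learner.
    max_count = max(1, math.floor(top_pct_learner * len(ranked)))
    return [i for i in ranked[:max_count] if scores[i] >= cutoff]


# Hypothetical learners with their coefficients and cross-validated scores.
coefs = {
    "learner_a": {"age": 0.5},
    "learner_b": {"age": 0.4, "bmi": -0.2},
    "learner_c": {"age": 0.6, "bmi": 1.5},
}
scores = {"learner_a": 0.80, "learner_b": 0.95, "learner_c": 0.90}
coef_bounds = {"bmi": (-1.0, 1.0)}

# Step 1: drop learners whose coefficients violate the bounds.
valid_scores = {
    i: s for i, s in scores.items() if within_bounds(coefs[i], coef_bounds)
}
# Step 2: keep learners scoring within 10% of the best valid score.
selected = select_learners(valid_scores, top_pct_score=0.1)
```

In this example learner_c is excluded because its bmi coefficient falls outside the bounds, and learner_a is excluded because its score is below 0.95 * (1 - 0.1).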

predict(data, return_ui=False, alpha=0.05)

Predict with ensembled super learner.

Parameters:
  • data (DataFrame) – Testing data to predict

  • return_ui (bool) – If return_ui=True, a matrix will be returned. The first row is the point prediction; the second and third rows are the lower and upper bounds of the prediction.

  • alpha (float) – When return_ui=True, the function returns the (1 - alpha) uncertainty interval. By default, alpha=0.05.

Return type:

ndarray[Any, dtype[ScalarType]]
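
The row layout described for return_ui=True can be unpacked as in the sketch below. A nested list with made-up numbers stands in for the returned array so the snippet runs without modrover; the assumption that each column corresponds to one row of the input data is ours and is not stated above.

```python
# Stand-in for the result of predict(data, return_ui=True): three rows
# (point prediction, lower bound, upper bound). Values are illustrative only.
pred = [
    [1.0, 2.0, 3.0],  # point prediction
    [0.5, 1.4, 2.1],  # lower bound
    [1.5, 2.6, 3.9],  # upper bound
]

# Row-wise unpacking; the same statement works for a NumPy array with 3 rows.
point, lower, upper = pred

# Width of each uncertainty interval and a sanity check that each point
# prediction lies inside its own interval.
widths = [hi - lo for lo, hi in zip(lower, upper)]
covered = [lo <= p <= hi for p, lo, hi in zip(point, lower, upper)]
```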

plot(bins=None)

Plot the result of the exploration. Each panel of the figure corresponds to one covariate in cov_exploring. The plot shows the spread of the coefficients across all learners, with color representing their performance score.

Parameters:

bins (int | None) – When bins=None, the coefficients will be spread randomly along the y axis to display the spread. When the user passes in an integer, the x axis will be divided into that many bins and each y value will be assigned according to the ranking of the score within its bin.

Return type:

Figure