
curvefit.uncertainty.predictive_validity.residuals

Data storage and manipulation for residual matrices

The Residuals class keeps track of a prediction matrix and the associated residual matrix at each time point.

Arguments

  • residual_info (ResidualInfo): metadata about residuals
  • data_specs (curvefit.core.data.DataSpecs): specifications of the data that was passed in to generate these residuals

Attributes

  • prediction_matrix (np.ndarray): square matrix whose dimension is the total number of time points for a group. Each row holds the predictions from a model fit on progressively more data, and each column holds the predictions for one point in the time series. Every entry above the diagonal is an out-of-sample prediction.
  • residual_matrix (np.ndarray): square matrix of the same size as the prediction matrix, obtained by subtracting the observations from the predictions and (potentially) scaling by the prediction value
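The layout of prediction_matrix can be illustrated with a small mask of its out-of-sample entries. This is a minimal sketch, assuming row i comes from a model fit on observations 0 through i, so entry (i, j) is out-of-sample exactly when j > i. The helper name out_of_sample_mask is hypothetical and not part of curvefit.

```python
import numpy as np


def out_of_sample_mask(num_times: int) -> np.ndarray:
    """Boolean mask of the out-of-sample entries of a square prediction matrix.

    Hypothetical helper. Assumes row i is fit on observations 0..i, so
    entry (i, j) is out-of-sample exactly when column j lies after row i.
    """
    # np.triu with k=1 keeps only entries strictly above the diagonal (j > i).
    return np.triu(np.ones((num_times, num_times), dtype=bool), k=1)


mask = out_of_sample_mask(4)
print(mask.astype(int))
# [[0 1 1 1]
#  [0 0 1 1]
#  [0 0 0 1]
#  [0 0 0 0]]
```

For n time points there are n * (n - 1) / 2 such entries, which is the number of rows the condensed matrix ends up with.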

Methods

_record_predictions

Records a set of predictions into the prediction matrix.

  • i (int): the ith set of predictions (the whole time series) to record
  • predictions (np.array): 1d numpy array of predictions across the time series
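Recording predictions amounts to writing one full predicted time series into row i of the prediction matrix. The sketch below is illustrative only: ResidualsSketch is a hypothetical stand-in for Residuals, and initializing unrecorded rows with NaN (so they are easy to spot) is an assumption, not necessarily what curvefit does.

```python
import numpy as np


class ResidualsSketch:
    """Hypothetical stand-in for Residuals that holds only the prediction matrix."""

    def __init__(self, num_times: int):
        # One row per fit, one column per time point; NaN marks rows not yet recorded.
        self.prediction_matrix = np.full((num_times, num_times), np.nan)

    def _record_predictions(self, i: int, predictions) -> None:
        # Row i receives the whole time series predicted by the ith fit.
        predictions = np.asarray(predictions, dtype=float)
        if predictions.shape != (self.prediction_matrix.shape[1],):
            raise ValueError("predictions must cover every time point")
        self.prediction_matrix[i, :] = predictions


res = ResidualsSketch(3)
res._record_predictions(0, [1.0, 2.0, 3.0])
```

After this call, row 0 holds the recorded series and rows 1 and 2 are still NaN.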

_compute_residuals

Given some observed data and an amount of scaling (theta), compute the residuals.

  • obs (np.array): 1d numpy array of observed data in the same space as the predictions
  • theta (float): amount of scaling. theta = 1 gives relative residuals (relative to the magnitude of the prediction) and theta = 0 gives absolute residuals
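One way to realize this scaling is to divide the difference between prediction and observation by the prediction raised to the power theta, so theta = 0 leaves the difference unscaled and theta = 1 divides it by the prediction. The function below is a sketch assuming exactly that form, (prediction - observation) / prediction ** theta; the name compute_residuals is hypothetical and the sign convention follows the attribute description above.

```python
import numpy as np


def compute_residuals(prediction_matrix, obs, theta: float) -> np.ndarray:
    """Residuals of every prediction against the observed series (hypothetical helper).

    Assumed form: (prediction - observation) / prediction ** theta.
    theta = 0 gives absolute residuals; theta = 1 gives relative residuals.
    """
    prediction_matrix = np.asarray(prediction_matrix, dtype=float)
    obs = np.asarray(obs, dtype=float)
    # obs broadcasts across rows: column j of every row is compared with obs[j].
    return (prediction_matrix - obs) / prediction_matrix ** theta


preds = np.array([[2.0, 4.0],
                  [1.0, 5.0]])
obs = np.array([1.0, 4.0])
absolute = compute_residuals(preds, obs, theta=0.0)
relative = compute_residuals(preds, obs, theta=1.0)
```

Intermediate values of theta interpolate between the two: the difference is scaled by a fractional power of the prediction.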

_condense_matrix

Takes a square matrix of predictions or residuals and condenses it to a smaller matrix that contains only the out-of-sample predictions or residuals. Each entry is matched with metadata: how much data was used to make the prediction (taken from data_density, stored in the "num_data" column) and how far the predicted time point lies from the last observed time point (computed from sequential_diffs, stored in the "far_out" column).

  • matrix (np.ndarray): the square matrix to condense
  • sequential_diffs (np.array): 1d array of sequential differences in time between observations (e.g. two observations 3 time points apart have a sequential difference of 3); used to compute the "far_out" column
  • data_density (np.array): the amount of data available at each time point, counting only observations up to that time point and dropping all later ones; used to fill the "num_data" column
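The condensing step can be sketched with NumPy's upper-triangle indices. This is a minimal illustration under stated assumptions, not curvefit's implementation: sequential_diffs is taken to have length n - 1 (one gap per consecutive pair of observations), data_density has length n and is indexed by the row (the fit), and the result is a plain array with columns (num_data, far_out, value) rather than whatever container curvefit actually returns. The name condense_matrix is hypothetical.

```python
import numpy as np


def condense_matrix(matrix, sequential_diffs, data_density) -> np.ndarray:
    """Keep only the out-of-sample entries of a square matrix, with metadata.

    Hypothetical helper. Returns one row per entry (i, j) with j > i and
    columns (num_data, far_out, value).
    """
    matrix = np.asarray(matrix, dtype=float)
    n = matrix.shape[0]
    # Row/column indices of the entries strictly above the diagonal.
    rows, cols = np.triu_indices(n, k=1)
    # Elapsed time from the first observation to each time point.
    elapsed = np.concatenate(([0.0], np.cumsum(sequential_diffs)))
    # Distance from the last observation used by fit i to the predicted time point j.
    far_out = elapsed[cols] - elapsed[rows]
    # Amount of data available to the fit that produced row i.
    num_data = np.asarray(data_density, dtype=float)[rows]
    return np.column_stack((num_data, far_out, matrix[rows, cols]))


condensed = condense_matrix(
    np.arange(9).reshape(3, 3),  # toy 3 x 3 matrix of predictions or residuals
    sequential_diffs=[1, 3],     # observations at times 0, 1 and 4
    data_density=[1, 2, 3],      # data points available up to each time point
)
```

Here the entry in row 0, column 2 gets far_out = 4, because time point 2 lies 1 + 3 = 4 time units after the last observation used by the first fit.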