`curvefit.uncertainty.predictive_validity.residuals`

Data storage and manipulation for residual matrices

The Residuals class keeps track of a prediction matrix and the associated residual matrix at each time point.

Arguments

residual_info (ResidualInfo): metadata about residuals
data_specs (curvefit.core.data.DataSpecs): specifications of the data that was passed in to generate these residuals

Attributes

prediction_matrix (np.ndarray): square matrix whose size is the total number of time points for a group. The rows are predictions from models fit on progressively more data, and the columns are the predictions for each point in the time series. Every entry above the diagonal is an out-of-sample prediction.
residual_matrix (np.ndarray): square matrix of the same size as the prediction matrix, with the observations subtracted from the predictions and (potentially) scaled by the prediction value
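
To make the layout concrete, here is a minimal sketch (not the library's code) of a 4-point prediction matrix and a mask that selects its out-of-sample entries. The names `n_points` and `out_of_sample_mask` and the toy values are assumptions chosen for illustration.

```python
import numpy as np

n_points = 4  # hypothetical number of time points for one group

# Toy prediction matrix: row i holds the predictions from the i-th fit,
# column j holds the prediction for the j-th time point.
prediction_matrix = np.arange(n_points * n_points, dtype=float).reshape(n_points, n_points)

# Entries strictly above the diagonal (column > row) are out-of-sample predictions.
out_of_sample_mask = np.triu(np.ones((n_points, n_points), dtype=bool), k=1)

# Boolean indexing returns the selected entries in row-major order.
out_of_sample = prediction_matrix[out_of_sample_mask]
```

For an n-by-n matrix this selects n(n-1)/2 entries, six in the 4-point case.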

Methods

`_record_predictions`

Records a set of predictions into the prediction matrix.

i (int): the ith set of predictions (the whole time series) to record
predictions (np.array): 1d numpy array of predictions across the time series
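
As an illustration of the behavior described above, the sketch below writes one full set of predictions into a single row of the matrix. It is a standalone function with a hypothetical name (`record_predictions`), not the class method itself, and the shape check is an added assumption.

```python
import numpy as np

n_points = 3  # hypothetical number of time points
prediction_matrix = np.full((n_points, n_points), np.nan)


def record_predictions(matrix, i, predictions):
    """Write the i-th set of predictions into row i of the matrix (in place)."""
    predictions = np.asarray(predictions, dtype=float)
    if predictions.shape != (matrix.shape[1],):
        raise ValueError("predictions must have one value per time point")
    matrix[i, :] = predictions


record_predictions(prediction_matrix, 0, [1.0, 2.0, 3.0])
record_predictions(prediction_matrix, 1, [1.1, 2.1, 3.1])
```

Rows that have not been recorded yet stay NaN in this sketch, which makes unfilled fits easy to spot.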

`_compute_residuals`

Given some observed data and an amount of scaling (theta), compute the residuals.

obs (np.array): 1d numpy array of observed data in the same space as the predictions
theta (float): amount of scaling. theta = 1 gives relative residuals (relative to the prediction magnitude) and theta = 0 gives absolute residuals
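
One formula consistent with this description is (prediction - observation) / prediction ** theta. The sketch below assumes that form, including the sign convention. The function name `compute_residuals` and the toy numbers are hypothetical.

```python
import numpy as np


def compute_residuals(prediction_matrix, obs, theta):
    """Residuals as (prediction - observation) / prediction ** theta (assumed form)."""
    prediction_matrix = np.asarray(prediction_matrix, dtype=float)
    obs = np.asarray(obs, dtype=float)
    # obs has one value per column, so it broadcasts across every row.
    return (prediction_matrix - obs) / prediction_matrix ** theta


predictions = np.array([[10.0, 20.0],
                        [12.0, 16.0]])
obs = np.array([8.0, 20.0])

absolute = compute_residuals(predictions, obs, theta=0.0)  # divides by 1
relative = compute_residuals(predictions, obs, theta=1.0)  # divides by the prediction
```

With theta = 0 the denominator is 1, so the residuals stay in the units of the data. With theta = 1 each residual is a fraction of its prediction, which requires nonzero predictions.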

`_condense_matrix`

Takes a square matrix of predictions or residuals and condenses it to a smaller matrix that contains only the out-of-sample predictions or residuals. Each retained entry is matched to metadata about it: how much data was used to make the prediction (from data_density, stored as "num_data") and how far the predicted time point is from the last observed time point (from sequential_diffs, stored as "far_out").

matrix (np.ndarray): the square matrix to condense
sequential_diffs (np.array): 1d array of sequential differences in time between observations (e.g. observations 3 time points apart would have sequential_diffs = 3); used to build the "far_out" column
data_density (np.array): the amount of data available at each time point once all observations beyond that time point are dropped; used to build the "num_data" column
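
The condensing step can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the library's implementation: it assumes `sequential_diffs` has one entry per gap between consecutive observations (n - 1 entries for an n-by-n matrix), that `data_density` has one entry per row, and that the output columns are ordered far_out, num_data, value. The function name `condense_matrix` is hypothetical.

```python
import numpy as np


def condense_matrix(matrix, sequential_diffs, data_density):
    """Keep only above-diagonal entries, paired with far_out and num_data."""
    matrix = np.asarray(matrix, dtype=float)
    n = matrix.shape[0]

    # Assumption: cumulative sums of the gaps give each observation's time,
    # measured from the first observation.
    times = np.concatenate([[0.0], np.cumsum(sequential_diffs)])

    # Row index = which fit made the prediction; column index = predicted time point.
    row_idx, col_idx = np.triu_indices(n, k=1)

    far_out = times[col_idx] - times[row_idx]      # time past the last observed point
    num_data = np.asarray(data_density)[row_idx]   # data available to that fit
    values = matrix[row_idx, col_idx]              # the out-of-sample entries

    return np.column_stack([far_out, num_data, values])


toy = np.arange(9, dtype=float).reshape(3, 3)
condensed = condense_matrix(toy, sequential_diffs=[1, 3], data_density=[1, 2, 3])
```

In this toy case the observations sit at times 0, 1 and 4, so the first fit's prediction for the last point is 4 time units out and is paired with a num_data of 1.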