curvefit.uncertainty.predictive_validity.residuals
Data storage and manipulation for residual matrices
The Residuals class keeps track of a prediction matrix and the associated residual matrix at each time point.
Arguments
residual_info (ResidualInfo)
: metadata about the residuals
data_specs (curvefit.core.data.DataSpecs)
: specifications of the data that was passed in to generate these residuals
Attributes
prediction_matrix (np.ndarray)
: square matrix whose size is the total number of time points for a group. The rows of the matrix are predictions from models fit on progressively more data, and the columns of the matrix are the predictions for each point in the time series. Everything above the diagonal is an out of sample prediction.
residual_matrix (np.ndarray)
: square matrix of the same size as the prediction matrix, with the observations subtracted from it and (potentially) scaled by the prediction value
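As a rough illustration of the layout described above, the following sketch builds a small stand-in prediction matrix and selects the entries above the diagonal, which are the out of sample predictions. The variable names and the filler values are assumptions made for the example; they are not taken from the curvefit source.

```python
import numpy as np

# Hypothetical example: 4 time points, so the matrix is 4 x 4.
n_times = 4

# Stand-in values; row i would hold predictions from the model fit
# on data up to time point i, column j the prediction for time point j.
prediction_matrix = np.arange(n_times * n_times, dtype=float).reshape(n_times, n_times)

# Boolean mask that is True strictly above the diagonal (column > row).
out_of_sample = np.triu(np.ones((n_times, n_times), dtype=bool), k=1)

# Flattened out of sample predictions, in row-major order.
out_of_sample_predictions = prediction_matrix[out_of_sample]
```

With 4 time points there are 6 entries above the diagonal, so `out_of_sample_predictions` has length 6.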
Methods
_record_predictions
Records a set of predictions into the prediction matrix.
i (int)
: the ith set of predictions (the whole time series) to record
predictions (np.array)
: 1d numpy array of predictions across the time series
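A minimal sketch of what recording could look like, assuming the ith set of predictions is written into row i of the prediction matrix (consistent with the row description under Attributes). The function name `record_predictions_sketch` and the shape check are hypothetical and are not the library's implementation.

```python
import numpy as np

def record_predictions_sketch(prediction_matrix, i, predictions):
    """Write a full time series of predictions into row i (illustrative only)."""
    predictions = np.asarray(predictions, dtype=float)
    # Assumption: one prediction per time point, i.e. per matrix column.
    if predictions.shape != (prediction_matrix.shape[1],):
        raise ValueError("predictions must have one entry per time point")
    prediction_matrix[i, :] = predictions
    return prediction_matrix

# Start from an empty (NaN-filled) matrix and record the second set of predictions.
matrix = np.full((3, 3), np.nan)
record_predictions_sketch(matrix, 1, [1.0, 2.0, 3.0])
```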
_compute_residuals
Given some observed data and an amount of scaling (theta), compute the residuals.
obs (np.array)
: 1d numpy array of observed data in the same space as the predictions
theta (float)
: amount of scaling. A theta = 1 means that they are relative residuals (relative to the prediction magnitude) and a theta = 0 means that they are absolute residuals
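One formula consistent with this description is `(prediction - obs) / prediction ** theta`: with theta = 0 the divisor is 1 (absolute residuals) and with theta = 1 the residual is divided by the prediction (relative residuals). The sketch below assumes that form; the function name `compute_residuals_sketch` is hypothetical, and the exact expression used by the library should be checked against its source.

```python
import numpy as np

def compute_residuals_sketch(prediction_matrix, obs, theta):
    """Residuals scaled by prediction ** theta (assumed form, illustrative only)."""
    prediction_matrix = np.asarray(prediction_matrix, dtype=float)
    obs = np.asarray(obs, dtype=float)
    # obs broadcasts across rows: obs[j] is subtracted from every entry in column j.
    return (prediction_matrix - obs) / prediction_matrix ** theta

predictions = np.array([[2.0, 4.0],
                        [1.0, 5.0]])
obs = np.array([1.0, 4.0])

absolute = compute_residuals_sketch(predictions, obs, theta=0.0)  # prediction - obs
relative = compute_residuals_sketch(predictions, obs, theta=1.0)  # (prediction - obs) / prediction
```

Note that with theta > 0 this form is undefined wherever a prediction is exactly zero, so the predictions need to be in a space where that does not occur.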
_condense_matrix
Takes a square matrix of predictions or residuals and condenses it to a smaller matrix that contains only the out of sample predictions or residuals. Each retained entry is matched to metadata about it: how much data was used to make the prediction (data_density --> "num_data") and how far the predicted time point was from the last observed time point (sequential_diffs --> "far_out").
matrix (np.ndarray)
: the square matrix to condense
sequential_diffs (np.array)
: 1d array of sequential differences in time between observations (e.g. observations 3 time points apart would have sequential_diffs = 3) (results in the "far_out" column)
data_density (np.array)
: the amount of data at each time point dropping all observations beyond this time point (results in the "num_data" column)
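The condensing step can be sketched as follows. This is only an illustration under stated assumptions: `sequential_diffs` is taken to hold one entry per gap between consecutive time points (length n - 1), `data_density` one entry per time point (length n), and the output is a plain array with columns far_out, num_data and the matrix value. The function name `condense_matrix_sketch` and the output layout are hypothetical; the library's own return type may differ.

```python
import numpy as np

def condense_matrix_sketch(matrix, sequential_diffs, data_density):
    """Keep only above-diagonal entries and attach far_out / num_data (illustrative only)."""
    matrix = np.asarray(matrix, dtype=float)
    n = matrix.shape[0]
    # Row and column indices of the out of sample entries (column > row).
    rows, cols = np.triu_indices(n, k=1)
    # Elapsed time at each time point, measured from the first one.
    elapsed = np.concatenate([[0], np.cumsum(sequential_diffs)])
    # How far the predicted point is from the last point used in the fit.
    far_out = elapsed[cols] - elapsed[rows]
    # Amount of data available to the model that made the prediction.
    num_data = np.asarray(data_density)[rows]
    return np.column_stack([far_out, num_data, matrix[rows, cols]])

matrix = np.arange(9, dtype=float).reshape(3, 3)
condensed = condense_matrix_sketch(
    matrix, sequential_diffs=[1, 3], data_density=[1, 2, 3]
)
```

Here the three time points are 1 and then 3 units apart, so the prediction made from the first time point for the third is 4 units out, while the prediction made from the second time point for the third is 3 units out.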