evaluate(forecaster, cv, y, X=None, strategy='refit', scoring=None, return_data=False, error_score=nan)

Evaluate a forecaster using time series cross-validation.


forecaster

Any forecaster

cv : temporal cross-validation splitter

Splitter that defines how the data is split into training data and test data.
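To make the role of the splitter concrete, the following minimal sketch generates expanding-window train/test index pairs in plain Python. The helper `expanding_window_splits` is hypothetical and written only for illustration; it is not sktime's splitter, and sktime's own handling of edge cases may differ.

```python
def expanding_window_splits(n_timepoints, initial_window, step_length, fh):
    """Yield (train_idx, test_idx) pairs of integer positions.

    Hypothetical helper for illustration only, not part of sktime.
    """
    max_fh = max(fh)
    train_end = initial_window  # exclusive end of the training window
    # Stop once the furthest horizon step would fall outside the series.
    while train_end - 1 + max_fh <= n_timepoints - 1:
        train_idx = list(range(train_end))
        # Horizon steps are counted from the last training point.
        test_idx = [train_end - 1 + h for h in fh]
        yield train_idx, test_idx
        train_end += step_length  # the training window grows, never slides


splits = list(
    expanding_window_splits(n_timepoints=10, initial_window=4, step_length=2, fh=[1, 2])
)
for train_idx, test_idx in splits:
    print(len(train_idx), test_idx)
```

Each pair corresponds to one row of the result: the forecaster sees the training positions and is scored on the test positions.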


y

Target time series to which the forecaster is fitted.

X : pd.DataFrame, default=None

Exogenous variables

strategy : {"refit", "update", "no-update_params"}, optional, default="refit"

Defines the ingestion mode when the forecaster sees new data as the window expands.

"refit" = forecaster is refitted to each training window

"update" = forecaster is updated with training window data, in the sequence provided

"no-update_params" = forecaster is fitted to the first training window, then re-used without fit or update
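The difference between the three modes can be sketched with a toy model. Everything below is hypothetical and for illustration only: `RunningMeanForecaster` and `run_strategy` are not sktime objects, and the loop follows the description above rather than sktime's actual implementation.

```python
class RunningMeanForecaster:
    """Hypothetical toy forecaster: predicts the mean of all data it has seen."""

    def fit(self, y):
        # Discard any previous state and fit from scratch.
        self.total, self.count = sum(y), len(y)
        return self

    def update(self, y_new):
        # Ingest only the new observations, keeping the previous state.
        self.total += sum(y_new)
        self.count += len(y_new)
        return self

    def predict(self):
        return self.total / self.count


def run_strategy(y, window_ends, strategy):
    """Return one prediction per training window under the given strategy."""
    if strategy not in ("refit", "update", "no-update_params"):
        raise ValueError(f"unknown strategy: {strategy!r}")
    forecaster = RunningMeanForecaster()
    predictions = []
    previous_end = 0
    for i, end in enumerate(window_ends):
        if i == 0 or strategy == "refit":
            forecaster.fit(y[:end])
        elif strategy == "update":
            forecaster.update(y[previous_end:end])
        # "no-update_params": the model from the first window is re-used as is.
        predictions.append(forecaster.predict())
        previous_end = end
    return predictions


y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
window_ends = [2, 4, 6]  # exclusive ends of three expanding training windows
for strategy in ("refit", "update", "no-update_params"):
    print(strategy, run_strategy(y, window_ends, strategy))
```

For this toy model "refit" and "update" give identical predictions, because a running mean can be updated exactly. For models whose update step is only approximate, the two modes can differ.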

scoring : subclass of sktime.performance_metrics.BaseMetric, default=None

Used to get a score function that takes y_pred and y_test arguments and accepts y_train as a keyword argument. If None, uses scoring = MeanAbsolutePercentageError(symmetric=True).
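As a rough illustration of the default, the sketch below computes a symmetric MAPE under the common definition mean(2 * |y - y_pred| / (|y| + |y_pred|)). The function `symmetric_mape` is hypothetical; the exact formula sktime uses, for example its handling of zero denominators, is an assumption here and may differ.

```python
def symmetric_mape(y_test, y_pred, y_train=None):
    """Symmetric MAPE under the common definition (illustrative only).

    y_train is accepted as a keyword argument to mirror the interface
    described above, but this particular metric does not use it.
    """
    terms = [
        2 * abs(actual - predicted) / (abs(actual) + abs(predicted))
        for actual, predicted in zip(y_test, y_pred)
    ]
    return sum(terms) / len(terms)


score = symmetric_mape(y_test=[100.0, 200.0], y_pred=[110.0, 180.0])
print(round(score, 4))
```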

return_data : bool, default=False

If True, returns three additional columns in the DataFrame. Each cell of these columns contains a pd.Series for y_train, y_pred, and y_test, respectively.

error_score : "raise" or numeric, default=np.nan

Value to assign to the score if an exception occurs in estimator fitting. If set to "raise", the exception is raised. If a numeric value is given, FitFailedWarning is raised.
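The fallback behaviour described above can be sketched as follows. The helper `score_with_fallback` is hypothetical, and a generic `UserWarning` stands in for the `FitFailedWarning` named in the text; this is not sktime's internal code.

```python
import math
import warnings


def score_with_fallback(fit_and_score, error_score=math.nan):
    """Call fit_and_score(); on failure, re-raise or return error_score.

    Hypothetical helper for illustration only.
    """
    try:
        return fit_and_score()
    except Exception as exc:
        if error_score == "raise":
            raise  # propagate the original exception unchanged
        # Assumption: a plain UserWarning replaces FitFailedWarning here.
        warnings.warn(
            f"Fitting failed, score set to {error_score}: {exc!r}", UserWarning
        )
        return error_score


def failing_fit():
    raise ValueError("not enough data")


with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    fallback = score_with_fallback(failing_fit, error_score=-1.0)
print(fallback)
```

A numeric `error_score` keeps a long evaluation running when a single fold fails, while "raise" is useful during debugging.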


DataFrame that contains several columns with information regarding each refit/update and prediction of the forecaster.

>>> from sktime.datasets import load_airline
>>> from sktime.forecasting.model_evaluation import evaluate
>>> from sktime.forecasting.model_selection import ExpandingWindowSplitter
>>> from sktime.forecasting.naive import NaiveForecaster
>>> y = load_airline()
>>> forecaster = NaiveForecaster(strategy="mean", sp=12)
>>> cv = ExpandingWindowSplitter(initial_window=12, step_length=3,
...     fh=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
>>> results = evaluate(forecaster=forecaster, y=y, cv=cv)

Optionally, users may select other metrics via the scoring argument. These can be forecast metrics of any kind, i.e., point forecast metrics, interval metrics, or quantile forecast metrics. See https://www.sktime.org/en/stable/api_reference/performance_metrics.html?highlight=metrics

To evaluate estimators using a specific metric, pass it to the scoring argument.

>>> from sktime.performance_metrics.forecasting import MeanAbsoluteError
>>> loss = MeanAbsoluteError()
>>> results = evaluate(forecaster=forecaster, y=y, cv=cv, scoring=loss)

An example of an interval metric is the PinballLoss. It can be used with all probabilistic forecasters.

>>> from sktime.forecasting.naive import NaiveVariance
>>> from sktime.performance_metrics.forecasting.probabilistic import PinballLoss
>>> loss = PinballLoss()
>>> forecaster = NaiveForecaster(strategy="drift")
>>> results = evaluate(forecaster=NaiveVariance(forecaster),
...     y=y, cv=cv, scoring=loss)
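For intuition about what a pinball loss measures, the sketch below evaluates the standard pinball (quantile) loss for a single observation and a single quantile level. The function `pinball_loss` is hypothetical; how sktime's PinballLoss aggregates over quantile levels and time points is not shown here.

```python
def pinball_loss(y_true, quantile_pred, alpha):
    """Standard pinball loss for one observation at quantile level alpha.

    Under-prediction is weighted by alpha, over-prediction by (1 - alpha).
    """
    diff = y_true - quantile_pred
    return max(alpha * diff, (alpha - 1) * diff)


# For a high quantile level, predicting too low costs more than too high.
too_low = pinball_loss(y_true=10.0, quantile_pred=8.0, alpha=0.9)
too_high = pinball_loss(y_true=10.0, quantile_pred=12.0, alpha=0.9)
print(too_low > too_high)
```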