Get Started

The following information is designed to get users up and running with sktime quickly. For more detailed information, see the links in each of the subsections.

Installation

sktime currently supports:

  • environments with Python version 3.6, 3.7, or 3.8

  • the operating systems Mac OS X, Unix-like OS, and Windows 8.1 and higher

  • installation via PyPI or conda

To install sktime with its core dependencies via pip, use:

pip install sktime

To install sktime via pip with all dependencies, including soft dependencies, use the all_extras modifier:

pip install sktime[all_extras]

To install sktime with its core dependencies via conda from conda-forge, use:

conda install -c conda-forge sktime

To install sktime via conda with all dependencies, including soft dependencies, use the sktime-all-extras conda recipe:

conda install -c conda-forge sktime-all-extras
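
After installing, a quick check can confirm that the environment matches the requirements listed above. The sketch below is illustrative only: the helper names python_is_supported and is_installed are hypothetical, and the supported versions are simply copied from the list at the top of this section.

```python
import importlib.util
import sys

# Python versions listed as supported in this guide.
SUPPORTED_PYTHON = {(3, 6), (3, 7), (3, 8)}


def python_is_supported(version_info=sys.version_info):
    """Return True if the (major, minor) version is in the supported set."""
    return tuple(version_info[:2]) in SUPPORTED_PYTHON


def is_installed(package_name):
    """Return True if the package can be located, without importing it."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    print("Python supported:", python_is_supported())
    print("sktime installed:", is_installed("sktime"))
```

If the second line prints False, the install command most likely ran in a different environment from the one running the check.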

Key Concepts

sktime seeks to provide a unified framework for multiple time series machine learning tasks. This (hopefully) makes sktime's functionality intuitive for users and lets developers extend the framework more easily. However, time series data and the related scientific use cases can each take multiple forms, so a common set of concepts and terminology is important.

Data Types

sktime is designed for time series machine learning. Time series data refers to data where the observations of each variable are ordered by time or by an index indicating the position of an observation in the sequence of values.

In sktime, time series data can be univariate, multivariate or panel. The difference lies in the number of time series variables and how they relate to each other, as well as the number of instances for which each variable is observed.

  • Univariate time series data refers to data where a single variable is tracked over time.

  • Multivariate time series data refers to data where multiple variables are tracked over time for the same instance. For example, multiple quarterly economic indicators for a country or multiple sensor readings from the same machine.

  • Panel time series data refers to data where the variables (univariate or multivariate) are tracked for multiple instances. For example, multiple quarterly economic indicators for several countries or multiple sensor readings for multiple machines.
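
To make the three forms concrete, the sketch below builds a toy example of each as a NumPy array. The array layout (instances, variables, time points) and the values are assumptions chosen for illustration; sktime's own data containers are described in the user guide.

```python
import numpy as np

n_timepoints = 8

# Univariate: one variable tracked over time -> shape (n_timepoints,)
univariate = np.arange(n_timepoints, dtype=float)

# Multivariate: several variables tracked for the same instance
# -> shape (n_variables, n_timepoints)
multivariate = np.stack([univariate, univariate ** 2, -univariate])

# Panel: the same variables tracked for several instances
# -> shape (n_instances, n_variables, n_timepoints)
panel = np.stack([multivariate, multivariate + 1.0])

for name, data in [
    ("univariate", univariate),
    ("multivariate", multivariate),
    ("panel", panel),
]:
    print(name, data.shape)
```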

Learning Tasks

sktime's functionality for each learning task is centered around providing a set of code artifacts that match a common interface to a given scientific purpose (i.e. scientific type or scitype). For example, sktime includes a common interface for “forecaster” classes designed to predict future values of a time series.

sktime's interface currently supports:

  • Time series classification, where the time series data for a given instance are used to predict a categorical target class.

  • Time series regression, where the time series data for a given instance are used to predict a continuous target value.

  • Time series clustering, where the goal is to discover groups consisting of instances with similar time series.

  • Forecasting, where the goal is to predict future values of the input series.

  • Time series annotation, which is focused on outlier detection, anomaly detection, change point detection and segmentation.
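
The idea of a common interface can be illustrated with a toy class. The sketch below is not sktime's base class: LastValueForecaster and its method signatures are hypothetical and only mimic the fit/predict pattern used in the Quickstart examples.

```python
class LastValueForecaster:
    """Toy forecaster: predicts that every future value equals the last one seen."""

    def fit(self, y):
        # Store what is needed for prediction; return self so calls can be chained.
        if len(y) == 0:
            raise ValueError("y must contain at least one observation")
        self.last_value_ = y[-1]
        return self

    def predict(self, fh):
        # fh is the forecasting horizon: a list of steps ahead, e.g. [1, 2, 3].
        return [self.last_value_ for _ in fh]


y_train = [112, 118, 132, 129, 121]
forecaster = LastValueForecaster().fit(y_train)
y_pred = forecaster.predict(fh=[1, 2, 3])
print(y_pred)  # [121, 121, 121]
```

Because every forecaster exposes the same two methods, code written against this pattern can swap one forecaster for another without other changes.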

Reduction

While the list above presents each learning task separately, in many cases it is possible to adapt one learning task to help solve another related learning task. For example, one approach to forecasting would be to use a regression model that explicitly accounts for the data’s time dimension. However, another approach is to reduce the forecasting problem to cross-sectional regression, where the input data are tabularized and lags of the data are treated as independent features in scikit-learn style tabular regression algorithms. Likewise, one approach to a time series annotation task such as anomaly detection is to reduce the problem to forecasting: a forecaster predicts future values, and observations that are too far from these predictions are flagged as anomalies. sktime typically incorporates these types of reductions through composable classes that let users adapt one learning task to solve another related one.
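
The two reductions described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not sktime's implementation: the helpers tabularize and flag_anomalies, the toy series, and the threshold are all hypothetical.

```python
def tabularize(y, window_length):
    """Turn a series into (X, target) pairs using a sliding window of lags."""
    X, target = [], []
    for i in range(window_length, len(y)):
        X.append(list(y[i - window_length:i]))  # the previous window_length values
        target.append(y[i])  # the value to predict from those lags
    return X, target


def flag_anomalies(y, predictions, threshold):
    """Return indices where an observation is further than threshold from its prediction."""
    return [
        i
        for i, (obs, pred) in enumerate(zip(y, predictions))
        if abs(obs - pred) > threshold
    ]


y = [1, 2, 3, 4, 50, 6, 7]

# Forecasting reduced to tabular regression: each row of X holds lagged values.
X, target = tabularize(y, window_length=2)
print(X[0], target[0])  # [1, 2] 3

# Anomaly detection reduced to forecasting: the "forecaster" here is a naive
# one-step-ahead rule that predicts the previous value.
one_step_predictions = y[:-1]
anomalies = flag_anomalies(y[1:], one_step_predictions, threshold=10)
print([i + 1 for i in anomalies])  # positions in the original series: [4, 5]
```

Any tabular regressor could then be fitted on X and target. Note that with the naive rule the observation right after the spike is also flagged, because its prediction is the spike itself; a better forecaster would reduce such false alarms.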

For more information on sktime's terminology and functionality, see the Glossary of Common Terms and the User Guide.

Quickstart

The code snippets below introduce sktime's functionality so you can start using it quickly. For more detailed information, see the Tutorials, User Guide and API Reference in sktime's Documentation.

Forecasting

from sktime.datasets import load_airline
from sktime.forecasting.base import ForecastingHorizon
from sktime.forecasting.model_selection import temporal_train_test_split
from sktime.forecasting.theta import ThetaForecaster
from sktime.performance_metrics.forecasting import mean_absolute_percentage_error

y = load_airline()
y_train, y_test = temporal_train_test_split(y)
fh = ForecastingHorizon(y_test.index, is_relative=False)
forecaster = ThetaForecaster(sp=12)  # monthly seasonal periodicity
forecaster.fit(y_train)
y_pred = forecaster.predict(fh)
mean_absolute_percentage_error(y_test, y_pred)
>>> 0.08661467738190656

Time Series Classification

from sktime.classification.interval_based import TimeSeriesForestClassifier
from sktime.datasets import load_arrow_head
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_arrow_head(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
classifier = TimeSeriesForestClassifier()
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
accuracy_score(y_test, y_pred)
>>> 0.8679245283018868

Time Series Regression

Note

The time series regression API is stable, but a dataset to illustrate its features has not yet been included.

from sktime.regression.compose import ComposableTimeSeriesForestRegressor

Time Series Clustering

Warning

The time series clustering API is still experimental. Features may change in future releases.

from sklearn.model_selection import train_test_split
from sktime.clustering import TimeSeriesKMeans
from sktime.clustering.evaluation._plot_clustering import plot_cluster_algorithm
from sktime.datasets import load_arrow_head

X, y = load_arrow_head(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)

k_means = TimeSeriesKMeans(n_clusters=5, init_algorithm="forgy", metric="dtw")
k_means.fit(X_train)
plot_cluster_algorithm(k_means, X_test, k_means.n_clusters)

Time Series Annotation

Warning

The time series annotation API is still experimental. Features may change in future releases.

from pyod.models.iforest import IForest  # requires the soft dependency pyod
from sktime.annotation.adapters import PyODAnnotator
from sktime.datasets import load_airline

y = load_airline()
pyod_model = IForest()  # isolation forest outlier detector from pyod
pyod_sktime_annotator = PyODAnnotator(pyod_model)  # wrap it in sktime's annotator interface
pyod_sktime_annotator.fit(y)
annotated_series = pyod_sktime_annotator.predict(y)