Get Started¶
The following information is designed to get users up and running with sktime quickly. For more detailed information, see the links in each of the subsections.
Installation¶
sktime currently supports:
environments with Python version 3.6, 3.7, or 3.8.
operating systems Mac OS X, Unix-like OS, Windows 8.1 and higher
installation via PyPI or conda
To install sktime with its core dependencies via pip use:
pip install sktime
To install sktime via pip with maximum dependencies, including soft dependencies, install using the all_extras modifier:
pip install sktime[all_extras]
To install sktime with its core dependencies via conda from conda-forge use:
conda install -c conda-forge sktime
To install sktime via conda with maximum dependencies, including soft dependencies, install using the all-extras conda recipe:
conda install -c conda-forge sktime-all-extras
Key Concepts¶
sktime seeks to provide a unified framework for multiple time series machine learning tasks. This (hopefully) makes sktime's functionality intuitive for users
and lets developers extend the framework more easily. But time series data and the related scientific use cases can each take multiple forms.
Therefore, a shared set of key concepts and terminology is important.
Data Types¶
sktime is designed for time series machine learning. Time series data refers to data where the variables are ordered over time or by
an index indicating the position of an observation in the sequence of values.
In sktime, time series data can be univariate, multivariate or panel. The difference lies in the number of time series variables and how they relate to one another,
as well as the number of instances for which each variable is observed.
Univariate time series data refers to data where a single variable is tracked over time.
Multivariate time series data refers to data where multiple variables are tracked over time for the same instance. For example, multiple quarterly economic indicators for a country or multiple sensor readings from the same machine.
Panel time series data refers to data where the variables (univariate or multivariate) are tracked for multiple instances. For example, multiple quarterly economic indicators for several countries or multiple sensor readings for multiple machines.
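As a rough illustration of the three data types, the sketch below builds toy containers from plain nested Python lists and reports their dimensions. The values, the variable names and the `shape` helper are made up for illustration. sktime itself works with richer containers such as pandas objects, whose exact layout is covered in the user guide.

```python
# Toy containers for the three data types, using plain nested lists.
# All values and names are illustrative, not real data.

def shape(container):
    """Return the dimensions of a regularly nested, non-empty list."""
    dims = []
    while isinstance(container, list):
        dims.append(len(container))
        container = container[0]
    return tuple(dims)


# Univariate: one variable observed at 4 time points.
univariate = [112.0, 118.0, 132.0, 129.0]

# Multivariate: 2 variables observed at the same 4 time points
# for a single instance (one row per time point).
multivariate = [
    [112.0, 0.5],
    [118.0, 0.7],
    [132.0, 0.6],
    [129.0, 0.9],
]

# Panel: 3 instances, each with its own multivariate series.
panel = [
    multivariate,
    [[101.0, 0.4], [105.0, 0.5], [110.0, 0.5], [108.0, 0.6]],
    [[140.0, 1.1], [143.0, 1.0], [150.0, 1.2], [149.0, 1.3]],
]

print(shape(univariate))    # (4,)       -> time points
print(shape(multivariate))  # (4, 2)     -> time points, variables
print(shape(panel))         # (3, 4, 2)  -> instances, time points, variables
```

The only thing that changes from one data type to the next is which extra dimension is added: variables for multivariate data, instances for panel data.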
Learning Tasks¶
sktime's functionality for each learning task is centered around providing a set of code artifacts that share a common interface for a given
scientific purpose (i.e. scientific type or scitype). For example, sktime includes a common interface for “forecaster” classes designed to predict future values
of a time series.
sktime's interface currently supports:
Time series classification where the time series data for a given instance are used to predict a categorical target class.
Time series regression where the time series data for a given instance are used to predict a continuous target value.
Time series clustering where the goal is to discover groups consisting of instances with similar time series.
Forecasting where the goal is to predict future values of the input series.
Time series annotation which is focused on outlier detection, anomaly detection, change point detection and segmentation.
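To make the idea of a common interface more concrete, the sketch below defines two hypothetical forecasters that both expose `fit` and `predict`. These are toy classes written for this page, not sktime's own classes, and passing the forecasting horizon as a plain list of integer steps is a simplifying assumption.

```python
class LastValueForecaster:
    """Hypothetical toy forecaster: repeats the last observed value."""

    def fit(self, y):
        if len(y) == 0:
            raise ValueError("y must contain at least one observation")
        # Keep only what is needed to make predictions later.
        self.last_value_ = y[-1]
        return self

    def predict(self, fh):
        # fh lists the steps ahead to forecast, e.g. [1, 2, 3].
        return [self.last_value_ for _ in fh]


class MeanForecaster:
    """Hypothetical toy forecaster: predicts the mean of the training series."""

    def fit(self, y):
        if len(y) == 0:
            raise ValueError("y must contain at least one observation")
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, fh):
        return [self.mean_ for _ in fh]


y_train = [10.0, 12.0, 11.0, 13.0]
fh = [1, 2, 3]

# Because both classes share the same methods, the calling code
# does not need to know which forecaster it is working with.
for forecaster in (LastValueForecaster(), MeanForecaster()):
    y_pred = forecaster.fit(y_train).predict(fh)
    print(type(forecaster).__name__, y_pred)
# LastValueForecaster [13.0, 13.0, 13.0]
# MeanForecaster [11.5, 11.5, 11.5]
```

Sharing one interface per learning task is what allows estimators to be swapped without changing the surrounding code.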
Reduction¶
While the list above presents each learning task separately, in many cases it is possible to adapt one learning task to help solve another related learning task. For example,
one approach to forecasting would be to use a regression model that explicitly accounts for the data’s time dimension. However, another approach is to reduce the forecasting problem
to cross-sectional regression, where the input data are tabularized and lags of the data are treated as independent features in scikit-learn style
tabular regression algorithms. Likewise, one approach to a time series annotation task such as anomaly detection is to reduce the problem to forecasting: use a forecaster to predict future values and flag
observations that are too far from these predictions as anomalies. sktime typically incorporates these types of reductions through the use of composable classes that
let users adapt one learning task to solve another related one.
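The two reductions described above can be sketched in a few lines of plain Python. The helper names `tabularize` and `flag_anomalies`, the window length and the threshold are all assumptions chosen for illustration. This is not how sktime implements its composable classes, which are documented in the user guide and API reference.

```python
def tabularize(series, window_length):
    """Turn a series into (X, y) rows for tabular regression.

    Each row of X holds `window_length` consecutive lagged values,
    and the matching entry of y is the value that follows them.
    """
    X, y = [], []
    for end in range(window_length, len(series)):
        X.append(series[end - window_length:end])
        y.append(series[end])
    return X, y


def flag_anomalies(observed, predicted, threshold):
    """Return indices where |observed - predicted| exceeds threshold."""
    return [
        i
        for i, (obs, pred) in enumerate(zip(observed, predicted))
        if abs(obs - pred) > threshold
    ]


# Reduction 1: forecasting -> tabular regression.
series = [1, 2, 3, 4, 5, 6]
X, y = tabularize(series, window_length=3)
print(X)  # [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(y)  # [4, 5, 6]

# Reduction 2: anomaly detection -> forecasting.
# A naive forecast repeats the previous value; large deviations are flagged.
observed = [10, 11, 10, 30, 11, 10]
naive_forecast = observed[:-1]  # forecasts for positions 1..5
flags = flag_anomalies(observed[1:], naive_forecast, threshold=5)
print([i + 1 for i in flags])  # [3, 4]: the spike and the drop back from it
```

Any tabular regressor could be fit on `X` and `y` from the first reduction. In the second, a better forecaster would avoid flagging the return to normal at position 4.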
For more information on sktime's terminology and functionality, see the Glossary of Common Terms and the User Guide.
Quickstart¶
The code snippets below introduce sktime's functionality so you can start using it quickly. For more detailed information, see the Tutorials, User Guide and API Reference in sktime's Documentation.
Forecasting¶
from sktime.datasets import load_airline
from sktime.forecasting.base import ForecastingHorizon
from sktime.forecasting.model_selection import temporal_train_test_split
from sktime.forecasting.theta import ThetaForecaster
from sktime.performance_metrics.forecasting import mean_absolute_percentage_error
y = load_airline()
y_train, y_test = temporal_train_test_split(y)
fh = ForecastingHorizon(y_test.index, is_relative=False)
forecaster = ThetaForecaster(sp=12) # monthly seasonal periodicity
forecaster.fit(y_train)
y_pred = forecaster.predict(fh)
mean_absolute_percentage_error(y_test, y_pred)
>>> 0.08661467738190656
Time Series Classification¶
from sktime.classification.interval_based import TimeSeriesForestClassifier
from sktime.datasets import load_arrow_head
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
X, y = load_arrow_head(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
classifier = TimeSeriesForestClassifier()
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
accuracy_score(y_test, y_pred)
>>> 0.8679245283018868
Time Series Regression¶
Note
The time series regression API is stable, but the inclusion of a dataset to illustrate its features is still in progress.
from sktime.regression.compose import ComposableTimeSeriesForestRegressor
Time Series Clustering¶
Warning
The time series clustering API is still experimental. Features may change in future releases.
from sklearn.model_selection import train_test_split
from sktime.clustering import TimeSeriesKMeans
from sktime.clustering.evaluation._plot_clustering import plot_cluster_algorithm
from sktime.datasets import load_arrow_head
X, y = load_arrow_head(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
k_means = TimeSeriesKMeans(n_clusters=5, init_algorithm="forgy", metric="dtw")
k_means.fit(X_train)
plot_cluster_algorithm(k_means, X_test, k_means.n_clusters)
Time Series Annotation¶
Warning
The time series annotation API is still experimental. Features may change in future releases.
from sktime.annotation.adapters import PyODAnnotator
from pyod.models.iforest import IForest
from sktime.datasets import load_airline
y = load_airline()
pyod_model = IForest()
pyod_sktime_annotator = PyODAnnotator(pyod_model)
pyod_sktime_annotator.fit(y)
annotated_series = pyod_sktime_annotator.predict(y)