
Univariate time series classification with sktime

In this notebook, we will use sktime for univariate time series classification. Here, we have a single time series variable and an associated label for multiple instances. The goal is to find a classifier that can learn the relationship between time series and label and accurately predict the label of new series.

When you have multiple time series variables and want to learn the relationship between them and a label, you can take a look at our multivariate time series classification notebook.

Preliminaries

[2]:
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

from sktime.classification.compose import ComposableTimeSeriesForestClassifier
from sktime.datasets import load_arrow_head
from sktime.utils.slope_and_trend import _slope

Load data

In this notebook, we use the arrow head problem.

The arrowhead dataset consists of outlines of the images of arrow heads. The classification of projectile points is an important topic in anthropology. The classes are based on shape distinctions such as the presence and location of a notch in the arrow.

arrow heads

The shapes of the projectile points are converted into a sequence using the angle-based method as described in this blog post about converting images into time series for data mining.

from shapes to time series

Data representation

Throughout sktime, the expected data format is a pd.DataFrame, but in a slightly unusual format: a single column can contain not only primitives (floats, integers or strings), but also entire time series in the form of a pd.Series or np.array.

For more details on our choice of data container, see this wiki entry.
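For illustration, here is a tiny hand-made example of this nested format (the values are made up, not from the ArrowHead data):

import pandas as pd

# each cell of the single column holds an entire series for one instance
X_nested = pd.DataFrame(
    {
        "dim_0": [
            pd.Series([-1.86, -1.84, -1.81]),  # instance 0
            pd.Series([-1.91, -1.89, -1.84]),  # instance 1
        ]
    }
)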

[3]:
X, y = load_arrow_head(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
(158, 1) (158,) (53, 1) (53,)
[4]:
# univariate time series input data
X_train.head()
[4]:
dim_0
20 0 -1.8624 1 -1.8624 2 -1.8427 3 ...
14 0 -1.9134 1 -1.9116 2 -1.8902 3 ...
27 0 -2.5471 1 -2.5494 2 -2.4694 3 ...
138 0 -1.7677 1 -1.7506 2 -1.7444 3 ...
32 0 -1.8955 1 -1.8963 2 -1.8802 3 ...
[5]:
# binary target variable
labels, counts = np.unique(y_train, return_counts=True)
print(labels, counts)
['0' '1' '2'] [65 48 45]
[6]:
fig, ax = plt.subplots(1, figsize=plt.figaspect(0.25))
for label in labels:
    X_train.loc[y_train == label, "dim_0"].iloc[0].plot(ax=ax, label=f"class {label}")
plt.legend()
ax.set(title="Example time series", xlabel="Time");
../_images/examples_02_classification_univariate_8_0.png

Why not just use scikit-learn?

We can still use scikit-learn, but doing so comes with some implicit modelling choices.

Reduction: from time-series classification to tabular classification

To use scikit-learn, we have to convert the data into the required tabular format. There are different ways to do that:

  • Treating each time point as a separate feature (tabularisation), as in the cells below,

  • Binning and aggregating observations in time bins of different lengths (a minimal sketch of this alternative follows this list).
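The rest of this section follows the tabularisation route. For completeness, here is a minimal sketch of the binning alternative; the helper name bin_means and the bin count are illustrative only, not part of sktime:

import pandas as pd

def bin_means(nested_col, n_bins=10):
    # split every series into n_bins equal-length chunks and keep the chunk means
    return pd.DataFrame(
        [
            [chunk.mean() for chunk in np.array_split(np.asarray(s, dtype=float), n_bins)]
            for s in nested_col
        ],
        columns=[f"bin_{i}" for i in range(n_bins)],
        index=nested_col.index,
    )

# e.g. RandomForestClassifier().fit(bin_means(X_train["dim_0"]), y_train)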

[7]:
from sklearn.ensemble import RandomForestClassifier

from sktime.datatypes._panel._convert import from_nested_to_2d_array

X_train_tab = from_nested_to_2d_array(X_train)
X_test_tab = from_nested_to_2d_array(X_test)

X_train_tab.head()
[7]:
dim_0__0 dim_0__1 dim_0__2 dim_0__3 dim_0__4 dim_0__5 dim_0__6 dim_0__7 dim_0__8 dim_0__9 ... dim_0__241 dim_0__242 dim_0__243 dim_0__244 dim_0__245 dim_0__246 dim_0__247 dim_0__248 dim_0__249 dim_0__250
20 -1.8624 -1.8624 -1.8427 -1.8389 -1.8125 -1.7883 -1.7353 -1.6973 -1.6867 -1.6005 ... -1.4512 -1.5289 -1.6007 -1.6346 -1.6776 -1.7156 -1.7428 -1.7931 -1.8010 -1.8223
14 -1.9134 -1.9116 -1.8902 -1.8888 -1.8449 -1.7968 -1.7260 -1.6978 -1.6536 -1.6046 ... -1.6822 -1.7278 -1.7872 -1.8406 -1.8647 -1.9101 -1.9322 -1.9538 -1.9375 -1.9396
27 -2.5471 -2.5494 -2.4694 -2.3852 -2.2816 -2.1575 -2.0229 -1.8772 -1.6540 -1.4304 ... -1.1124 -1.3715 -1.5929 -1.8242 -1.9655 -2.1057 -2.2102 -2.3431 -2.4450 -2.4901
138 -1.7677 -1.7506 -1.7444 -1.7378 -1.7401 -1.7149 -1.7006 -1.6687 -1.6465 -1.6050 ... -1.5569 -1.5907 -1.6087 -1.6451 -1.6871 -1.6965 -1.7349 -1.7454 -1.7659 -1.7817
32 -1.8955 -1.8963 -1.8802 -1.8965 -1.8756 -1.8331 -1.8294 -1.8007 -1.7673 -1.7392 ... -1.4227 -1.4808 -1.5360 -1.5851 -1.6271 -1.6552 -1.7084 -1.7752 -1.8361 -1.8736

5 rows × 251 columns

[8]:
# let's get a baseline for comparison
from sklearn.dummy import DummyClassifier

classifier = DummyClassifier(strategy="prior")
classifier.fit(X_train_tab, y_train)
classifier.score(X_test_tab, y_test)
[8]:
0.3018867924528302
[9]:
# now we can apply any scikit-learn classifier
classifier = RandomForestClassifier(n_estimators=100)
classifier.fit(X_train_tab, y_train)
y_pred = classifier.predict(X_test_tab)
accuracy_score(y_test, y_pred)
[9]:
0.7924528301886793
[10]:
from sklearn.pipeline import make_pipeline

# with sktime, we can write this as a pipeline
from sktime.transformations.panel.reduce import Tabularizer

classifier = make_pipeline(Tabularizer(), RandomForestClassifier())
classifier.fit(X_train, y_train)
classifier.score(X_test, y_test)
[10]:
0.7547169811320755

What’s the implicit modelling choice here?

We treat each observation as a separate feature and thus ignore that the observations are ordered in time. A tabular algorithm cannot make use of this ordering: if we changed the order of the features, the fitted model and its predictions would not change. Sometimes this works well, sometimes it doesn't.
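As a quick, illustrative sanity check of this point (using the tabular frames from above; the permutation seed is arbitrary), shuffling the column order leaves the accuracy in the same range:

# permute the columns identically in train and test: the time ordering is lost,
# but a tabular classifier does not care, so the score stays comparable
rng = np.random.default_rng(42)
perm = rng.permutation(X_train_tab.shape[1])

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train_tab.iloc[:, perm], y_train)
clf.score(X_test_tab.iloc[:, perm], y_test)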

Feature extraction

Another modelling choice: we could extract features from the time series and then use the features to fit our tabular classifier. Here we use tsfresh for automatic feature extraction.

[11]:
from sktime.transformations.panel.tsfresh import TSFreshFeatureExtractor

transformer = TSFreshFeatureExtractor(default_fc_parameters="minimal")
extracted_features = transformer.fit_transform(X_train)
extracted_features.head()
C:\Users\LENOVO\anaconda3\lib\site-packages\sktime\transformations\panel\tsfresh.py:163: UserWarning: tsfresh requires a unique index, but found non-unique. To avoid this warning, please make sure the index of X contains only unique values.
  warn(
Feature Extraction: 100%|████████████████████████████████████████████████████████████████| 5/5 [00:03<00:00,  1.61it/s]
[11]:
dim_0__sum_values dim_0__median dim_0__mean dim_0__length dim_0__standard_deviation dim_0__variance dim_0__root_mean_square dim_0__maximum dim_0__minimum
0 -0.000303 0.019023 -1.207171e-06 251.0 0.998007 0.996019 0.998007 1.5310 -1.8624
1 0.000866 0.044839 3.450199e-06 251.0 0.998007 0.996019 0.998007 1.5132 -1.9538
2 -0.000113 0.449820 -4.501992e-07 251.0 0.998008 0.996020 0.998008 1.4556 -2.5494
3 0.000139 0.063572 5.537849e-07 251.0 0.998007 0.996019 0.998007 1.3282 -1.7817
4 -0.000082 -0.010786 -3.266932e-07 251.0 0.998004 0.996012 0.998004 1.6364 -1.8965
[12]:
classifier = make_pipeline(
    TSFreshFeatureExtractor(show_warnings=False), RandomForestClassifier()
)
classifier.fit(X_train, y_train)
classifier.score(X_test, y_test)
C:\Users\LENOVO\anaconda3\lib\site-packages\sktime\transformations\panel\tsfresh.py:163: UserWarning: tsfresh requires a unique index, but found non-unique. To avoid this warning, please make sure the index of X contains only unique values.
  warn(
Feature Extraction: 100%|████████████████████████████████████████████████████████████████| 5/5 [00:13<00:00,  2.72s/it]
C:\Users\LENOVO\anaconda3\lib\site-packages\sktime\transformations\panel\tsfresh.py:163: UserWarning: tsfresh requires a unique index, but found non-unique. To avoid this warning, please make sure the index of X contains only unique values.
  warn(
Feature Extraction: 100%|████████████████████████████████████████████████████████████████| 5/5 [00:06<00:00,  1.34s/it]
[12]:
0.8490566037735849

What’s the implicit modelling choice here?

Instead of working in the domain of the time series, we extract features from time series and choose to work in the domain of the features. Again, sometimes this works well, sometimes it doesn’t. The main difficulty is finding discriminative features for the classification problem.
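To make this concrete, a hand-crafted alternative to tsfresh might look like the following sketch; the helper summary_features and the chosen statistics are illustrative only:

import pandas as pd

def summary_features(nested_col):
    # summarise every series with a few simple statistics
    return pd.DataFrame(
        {
            "mean": nested_col.apply(np.mean),
            "std": nested_col.apply(np.std),
            "min": nested_col.apply(np.min),
            "max": nested_col.apply(np.max),
        }
    )

clf = RandomForestClassifier(n_estimators=100)
clf.fit(summary_features(X_train["dim_0"]), y_train)
clf.score(summary_features(X_test["dim_0"]), y_test)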

Time series classification with sktime

sktime has a number of specialised time series algorithms.

Time series forest

Time series forest is a modification of the random forest algorithm to the time series setting:

  1. Split the series into multiple random intervals,

  2. Extract features (mean, standard deviation and slope) from each interval,

  3. Train a decision tree on the extracted features,

  4. Ensemble steps 1 - 3.

For more details, take a look at the paper.

In sktime, we can write:

[13]:
from sktime.transformations.panel.summarize import RandomIntervalFeatureExtractor

steps = [
    (
        "extract",
        RandomIntervalFeatureExtractor(
            n_intervals="sqrt", features=[np.mean, np.std, _slope]
        ),
    ),
    ("clf", DecisionTreeClassifier()),
]
time_series_tree = Pipeline(steps)

We can directly fit and evaluate the single time series tree (which is simply a pipeline).

[14]:
time_series_tree.fit(X_train, y_train)
time_series_tree.score(X_test, y_test)
[14]:
0.6792452830188679

For time series forest, we can simply use the single tree as the base estimator in the forest ensemble.

[15]:
tsf = ComposableTimeSeriesForestClassifier(
    estimator=time_series_tree,
    n_estimators=100,
    criterion="entropy",
    bootstrap=True,
    oob_score=True,
    random_state=1,
    n_jobs=-1,
)

Fit and obtain the out-of-bag score:

[16]:
tsf.fit(X_train, y_train)

if tsf.oob_score:
    print(tsf.oob_score_)
0.8607594936708861
[17]:
tsf = ComposableTimeSeriesForestClassifier()
tsf.fit(X_train, y_train)
tsf.score(X_test, y_test)
[17]:
0.8301886792452831

We can also obtain feature importances for the different features and intervals the algorithm looked at, and plot them as a feature importance graph over time.

[18]:
fi = tsf.feature_importances_
# renaming _slope to slope.
fi.rename(columns={"_slope": "slope"}, inplace=True)
fig, ax = plt.subplots(1, figsize=plt.figaspect(0.25))
fi.plot(ax=ax)
ax.set(xlabel="Time", ylabel="Feature importance");
../_images/examples_02_classification_univariate_27_0.png

More about feature importances

The feature importances method is based on the example showcased in this paper.

Going beyond the feature importances attribute available in scikit-learn, our method collects the feature importance values from each estimator for its respective intervals, sums them for each timepoint, and normalises the result first by the number of estimators and then by the number of intervals.

As a result, the temporal importance curves can be plotted, as shown in the previous example.
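The following is a schematic sketch of that aggregation; the function name and signature are ours, not the actual sktime implementation:

def temporal_importance_curve(interval_importances, series_length, n_estimators):
    # interval_importances: ((start, end), importance) pairs collected from all
    # trees for one feature type, e.g. the interval mean
    curve = np.zeros(series_length)
    for (start, end), importance in interval_importances:
        curve[start:end] += importance  # spread the importance over the interval
    # normalise first by the number of estimators, then by the number of intervals
    return curve / n_estimators / len(interval_importances)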

Please note that this method currently supports only one particular structure of the time series forest: either a pipeline that uses RandomIntervalFeatureExtractor(), or simply the default ComposableTimeSeriesForestClassifier() setting. For instance, two possible approaches are:

[20]:
# Method 1: Default time-series forest classifier
tsf1 = ComposableTimeSeriesForestClassifier()
tsf1.fit(X_train, y_train)
fi1 = tsf1.feature_importances_
# renaming _slope to slope.
fi1.rename(columns={"_slope": "slope"}, inplace=True)
fig, ax = plt.subplots(1, figsize=plt.figaspect(0.25))
fi1.plot(ax=ax)

# Method 2: Pipeline
features = [np.mean, np.std, _slope]
steps = [
    ("transform", RandomIntervalFeatureExtractor(features=features)),
    ("clf", DecisionTreeClassifier()),
]
base_estimator = Pipeline(steps)
tsf2 = ComposableTimeSeriesForestClassifier(estimator=base_estimator)
tsf2.fit(X_train, y_train)
fi2 = tsf2.feature_importances_
# renaming _slope to slope.
fi2.rename(columns={"_slope": "slope"}, inplace=True)
fig, ax = plt.subplots(1, figsize=plt.figaspect(0.25))
fi2.plot(ax=ax);
../_images/examples_02_classification_univariate_29_0.png
../_images/examples_02_classification_univariate_29_1.png

RISE

Another popular variant of time series forest is the so-called Random Interval Spectral Ensemble (RISE), which makes use of several series-to-series feature extraction transformers, including:

  • Fitted auto-regressive coefficients,

  • Estimated autocorrelation coefficients,

  • Power spectrum coefficients.
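To give a feel for these features, the following illustrative snippet computes the power spectrum and autocorrelation coefficients for a single full series (RISE itself computes them on random intervals inside each tree; the fitted auto-regressive coefficients would additionally require fitting an AR model):

s = np.asarray(X_train["dim_0"].iloc[0], dtype=float)

# power spectrum coefficients
power = np.abs(np.fft.rfft(s)) ** 2

# autocorrelation coefficients, normalised so that lag 0 equals 1
s_centred = s - s.mean()
acf = np.correlate(s_centred, s_centred, mode="full")[len(s) - 1 :]
acf = acf / acf[0]

power[:5], acf[:5]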

[19]:
from sktime.classification.interval_based import RandomIntervalSpectralForest

rise = RandomIntervalSpectralForest(n_estimators=10)
rise.fit(X_train, y_train)
rise.score(X_test, y_test)
[19]:
0.8113207547169812

K-nearest-neighbours classifier for time series

For time series, the most popular k-nearest-neighbours approach is based on the dynamic time warping (DTW) distance measure.

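As an aside, here is a tiny reference implementation of the (unconstrained) DTW distance, for intuition only; sktime's classifier uses its own optimised implementation:

def dtw_distance(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            # extend the cheapest of the three allowed warping moves
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return np.sqrt(cost[n, m])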

Here we look at the BasicMotions dataset. The data was generated as part of a student project in which four students performed four activities while wearing a smart watch. The watch collects 3D accelerometer and 3D gyroscope data. The dataset consists of four classes: walking, standing, running and badminton. Participants were required to record each motion a total of five times, and the data is sampled once every tenth of a second over a ten-second period.

[20]:
from sktime.datasets import load_basic_motions

X, y = load_basic_motions(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X.iloc[:, [0]], y)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
(60, 1) (60,) (20, 1) (20,)
[21]:
labels, counts = np.unique(y_train, return_counts=True)
print(labels, counts)
['badminton' 'running' 'standing' 'walking'] [15 15 13 17]
[22]:
fig, ax = plt.subplots(1, figsize=plt.figaspect(0.25))
for label in labels:
    X_train.loc[y_train == label, "dim_0"].iloc[0].plot(ax=ax, label=label)
plt.legend()
ax.set(title="Example time series", xlabel="Time");
[22]:
[Text(0.5, 1.0, 'Example time series'), Text(0.5, 0, 'Time')]
../_images/examples_02_classification_univariate_35_1.png
[23]:
for label in labels[:2]:
    fig, ax = plt.subplots(1, figsize=plt.figaspect(0.25))
    for instance in X_train.loc[y_train == label, "dim_0"]:
        ax.plot(instance)
    ax.set(title=f"Instances of {label}")
../_images/examples_02_classification_univariate_36_0.png
../_images/examples_02_classification_univariate_36_1.png

As a baseline for comparison, we can tabularise the series and use scikit-learn's k-nearest-neighbours classifier with the Euclidean distance:

from sklearn.neighbors import KNeighborsClassifier

knn = make_pipeline(
    Tabularizer(), KNeighborsClassifier(n_neighbors=1, metric="euclidean")
)
knn.fit(X_train, y_train)
knn.score(X_test, y_test)

[24]:
from sktime.classification.distance_based import KNeighborsTimeSeriesClassifier

knn = KNeighborsTimeSeriesClassifier(n_neighbors=1, distance="dtw")
knn.fit(X_train, y_train)
knn.score(X_test, y_test)
[24]:
1.0

Other classifiers

To find out what other algorithms we have implemented in sktime, you can use our utility function:

[25]:
from sktime.registry import all_estimators

all_estimators(estimator_types="classifier")
[25]:
[('BOSSEnsemble', sktime.classification.dictionary_based._boss.BOSSEnsemble),
 ('ColumnEnsembleClassifier',
  sktime.classification.compose._column_ensemble.ColumnEnsembleClassifier),
 ('ComposableTimeSeriesForestClassifier',
  sktime.classification.compose._ensemble.ComposableTimeSeriesForestClassifier),
 ('ContractableBOSS',
  sktime.classification.dictionary_based._cboss.ContractableBOSS),
 ('ElasticEnsemble',
  sktime.classification.distance_based._elastic_ensemble.ElasticEnsemble),
 ('HIVECOTEV1', sktime.classification.hybrid._hivecote_v1.HIVECOTEV1),
 ('IndividualBOSS',
  sktime.classification.dictionary_based._boss.IndividualBOSS),
 ('IndividualTDE', sktime.classification.dictionary_based._tde.IndividualTDE),
 ('KNeighborsTimeSeriesClassifier',
  sktime.classification.distance_based._time_series_neighbors.KNeighborsTimeSeriesClassifier),
 ('MUSE', sktime.classification.dictionary_based._muse.MUSE),
 ('MrSEQLClassifier',
  sktime.classification.shapelet_based.mrseql.mrseql.MrSEQLClassifier),
 ('ProximityForest',
  sktime.classification.distance_based._proximity_forest.ProximityForest),
 ('ProximityStump',
  sktime.classification.distance_based._proximity_forest.ProximityStump),
 ('ProximityTree',
  sktime.classification.distance_based._proximity_forest.ProximityTree),
 ('ROCKETClassifier',
  sktime.classification.shapelet_based._rocket_classifier.ROCKETClassifier),
 ('RandomIntervalSpectralForest',
  sktime.classification.interval_based._rise.RandomIntervalSpectralForest),
 ('ShapeDTW', sktime.classification.distance_based._shape_dtw.ShapeDTW),
 ('ShapeletTransformClassifier',
  sktime.classification.shapelet_based._stc.ShapeletTransformClassifier),
 ('SupervisedTimeSeriesForest',
  sktime.classification.interval_based._stsf.SupervisedTimeSeriesForest),
 ('TemporalDictionaryEnsemble',
  sktime.classification.dictionary_based._tde.TemporalDictionaryEnsemble),
 ('TimeSeriesForestClassifier',
  sktime.classification.interval_based._tsf.TimeSeriesForestClassifier),
 ('WEASEL', sktime.classification.dictionary_based._weasel.WEASEL)]
