Time series classification with Mr-SEQL

Mr-SEQL[1] is a univariate time series classifier that trains linear classification models (logistic regression) on features extracted from multiple symbolic representations of time series (SAX, SFA). The features are extracted using SEQL[2].

[1] T. L. Nguyen, S. Gsponer, I. Ilie, M. O’Reilly and G. Ifrim, “Interpretable Time Series Classification using Linear Models and Multi-resolution Multi-domain Symbolic Representations”, Data Mining and Knowledge Discovery (DAMI), May 2019, https://link.springer.com/article/10.1007/s10618-019-00633-3

[2] G. Ifrim and C. Wiuf, “Bounded Coordinate-Descent for Biological Sequence Classification in High Dimensional Predictor Space”, KDD 2011

In this notebook, we will demonstrate how to use Mr-SEQL for univariate time series classification with the ArrowHead dataset.
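
As background for the symbolic representations mentioned above, here is a minimal standard-library sketch of how SAX can turn a numeric series into a word: z-normalise the series, average it over equal segments (PAA), then map each segment mean to a letter using breakpoints that split a standard normal distribution into equally likely bins. The function name `sax_word`, the four-letter alphabet and the single fixed word length are assumptions made for illustration; this is not the code that Mr-SEQL runs internally.

```python
from bisect import bisect_left
from statistics import fmean, pstdev

# Breakpoints that split a standard normal distribution into 4 equiprobable bins.
BREAKPOINTS = [-0.6745, 0.0, 0.6745]
ALPHABET = "abcd"  # illustrative alphabet, one letter per bin


def sax_word(series, word_length):
    """Convert a numeric series into a SAX word (simplified illustration)."""
    if word_length > len(series):
        raise ValueError("word_length must not exceed the series length")

    # 1. z-normalise; a constant series is mapped to all zeros.
    mu, sigma = fmean(series), pstdev(series)
    z = [(v - mu) / sigma if sigma > 0 else 0.0 for v in series]

    # 2. Piecewise Aggregate Approximation: mean of each (near-)equal segment.
    n = len(z)
    bounds = [i * n // word_length for i in range(word_length + 1)]
    paa = [fmean(z[bounds[i]:bounds[i + 1]]) for i in range(word_length)]

    # 3. Discretise each segment mean into a letter.
    return "".join(ALPHABET[bisect_left(BREAKPOINTS, v)] for v in paa)


print(sax_word([0, 1, 2, 3, 4, 5, 6, 7], word_length=4))
```

For the steadily increasing series above, the four segment means fall into the four bins in order, so the resulting word is `abcd`.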

Imports

[1]:
from sklearn import metrics
from sklearn.model_selection import train_test_split

from sktime.classification.shapelet_based import MrSEQLClassifier
from sktime.datasets import load_arrow_head, load_basic_motions

Load data

For more details on the data set, see the univariate time series classification notebook.

[2]:
X, y = load_arrow_head(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
(158, 1) (158,) (53, 1) (53,)

Train and Test

Mr-SEQL can be configured to run in different modes with different symbolic representations.

seql_mode can be either ‘clf’ (SEQL as a classifier) or ‘fs’ (SEQL for feature selection). In ‘fs’ mode, a logistic regression classifier is trained on the features extracted by SEQL. ‘fs’ mode is generally more accurate.

symrep can include ‘sax’, ‘sfa’, or both. Using both usually produces better results.
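
To make the idea of a linear model over symbolic features more concrete, here is a minimal sketch of how weighted subsequences could score a symbolic word. The weights, the bias, the example words and the helper names are all hypothetical, and treating a feature as a binary "subsequence occurs in the word" indicator is an assumption made for illustration; sktime's actual implementation may differ.

```python
import math

# Hypothetical subsequence weights, standing in for a trained linear model.
WEIGHTS = {"ab": 1.2, "cd": -0.8, "bbc": 0.5}
BIAS = -0.3


def subsequence_features(word, subsequences):
    """Binary feature vector: 1 if the subsequence occurs in the word, else 0."""
    return [1 if s in word else 0 for s in subsequences]


def predict_proba(word):
    """Logistic-regression style probability of the positive class."""
    x = subsequence_features(word, list(WEIGHTS))
    score = BIAS + sum(w * xi for w, xi in zip(WEIGHTS.values(), x))
    return 1.0 / (1.0 + math.exp(-score))


for word in ["abbc", "cdcd"]:
    print(word, round(predict_proba(word), 3))
```

Because each feature is a readable subsequence with a signed weight, the contribution of every matched pattern to the final score can be inspected directly.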

[3]:
# Create the Mr-SEQL classifier (uses the SAX representation by default)
ms = MrSEQLClassifier(seql_mode="clf")
# Alternative: use SFA representations only
# ms = MrSEQLClassifier(seql_mode='fs', symrep=['sfa'])
# Alternative: use both SAX and SFA representations
# ms = MrSEQLClassifier(seql_mode='fs', symrep=['sax', 'sfa'])

# Fit on the training data
ms.fit(X_train, y_train)

# Predict labels for the test data
predicted = ms.predict(X_test)

# Classification accuracy
print("Accuracy with mr-seql: %2.3f" % metrics.accuracy_score(y_test, predicted))
Accuracy with mr-seql: 0.887

Train and Test (multivariate)

Mr-SEQL also supports multivariate time series. Mr-SEQL extracts features from each dimension of the data independently.
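
The per-dimension behaviour described above can be pictured with the following minimal sketch. `extract_features` is a hypothetical stand-in (simple summary statistics) for the real symbolic feature extraction, and joining the per-dimension features into one vector is an assumption made for illustration.

```python
def extract_features(series):
    """Hypothetical stand-in for a per-dimension feature extractor."""
    return [min(series), max(series), sum(series) / len(series)]


def multivariate_features(instance):
    """Extract features from each dimension independently, then concatenate."""
    features = []
    for dimension in instance:
        # Each dimension is processed on its own; no cross-dimension interaction.
        features.extend(extract_features(dimension))
    return features


# One instance with two dimensions (the dimensions may differ in length here).
instance = [[1, 2, 3], [4, 6]]
print(multivariate_features(instance))
```

With this layout, the feature vector grows linearly with the number of dimensions, and each block of features can be traced back to the dimension it came from.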

[4]:
X, y = load_basic_motions(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
(60, 6) (60,) (20, 6) (20,)

[5]:
ms = MrSEQLClassifier()

# fit training data
ms.fit(X_train, y_train)

predicted = ms.predict(X_test)

# Classification accuracy
print("Accuracy with mr-seql: %2.3f" % metrics.accuracy_score(y_test, predicted))
Accuracy with mr-seql: 1.000
