Feature extraction with tsfresh transformer

In this tutorial, we show how to use sktime with tsfresh to extract features from time series, so that the resulting tabular features can be passed to any scikit-learn estimator.

Preliminaries

You need to install tsfresh if you haven’t already. To install it, uncomment and run the cell below:

[1]:
# !pip install --upgrade tsfresh
[2]:
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

from sktime.datasets import load_arrow_head, load_basic_motions
from sktime.transformations.panel.tsfresh import TSFreshFeatureExtractor

Univariate time series classification data

For more details on the data set, see the univariate time series classification notebook.

[3]:
X, y = load_arrow_head(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
(158, 1) (158,) (53, 1) (53,)
[4]:
X_train.head()
[4]:
dim_0
69 0 -1.7998 1 -1.7987 2 -1.7942 3 ...
103 0 -1.8091 1 -1.8067 2 -1.7866 3 ...
34 0 -2.0417 1 -2.0572 2 -2.0522 3 ...
14 0 -2.1888 1 -2.1855 2 -2.1765 3 ...
121 0 -1.9586 1 -1.9371 2 -1.8798 3 ...
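
The output above shows sktime's nested panel format: each row is one instance, and each cell of `dim_0` holds a whole time series as a pandas Series. As a minimal sketch of that layout, the toy frame below is built by hand. The names `X_toy`, `n_instances`, and `n_timepoints` are made up for illustration, and the values are not taken from the ArrowHead data.

```python
import numpy as np
import pandas as pd

n_instances, n_timepoints = 3, 5  # toy sizes, chosen only for illustration

# Each cell of the "dim_0" column stores a complete series as a pd.Series.
X_toy = pd.DataFrame(
    {
        "dim_0": [
            pd.Series(np.arange(n_timepoints, dtype=float) + i)
            for i in range(n_instances)
        ]
    }
)

print(X_toy.shape)  # one row per instance, one column per dimension
first_series = X_toy.iloc[0, 0]
print(type(first_series).__name__, len(first_series))
```

This is why the shape printed earlier is `(158, 1)` even though every instance contains many time points: the time axis lives inside the cells.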
[5]:
#  multiclass classification task (three classes)
np.unique(y_train)
[5]:
array(['0', '1', '2'], dtype=object)

Using tsfresh to extract features

[6]:
# extract the "efficient" tsfresh feature set from each series
t = TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False)
Xt = t.fit_transform(X_train)
Xt.head()
/Users/mloning/Documents/Research/software/sktime/sktime/sktime/transformations/panel/tsfresh.py:164: UserWarning: tsfresh requires a unique index, but found non-unique. To avoid this warning, please make sure the index of X contains only unique values.
  "tsfresh requires a unique index, but found "
Feature Extraction: 100%|██████████| 5/5 [00:10<00:00,  2.05s/it]
[6]:
dim_0__variance_larger_than_standard_deviation dim_0__has_duplicate_max dim_0__has_duplicate_min dim_0__has_duplicate dim_0__sum_values dim_0__abs_energy dim_0__mean_abs_change dim_0__mean_change dim_0__mean_second_derivative_central dim_0__median ... dim_0__fourier_entropy__bins_2 dim_0__fourier_entropy__bins_3 dim_0__fourier_entropy__bins_5 dim_0__fourier_entropy__bins_10 dim_0__fourier_entropy__bins_100 dim_0__permutation_entropy__dimension_3__tau_1 dim_0__permutation_entropy__dimension_4__tau_1 dim_0__permutation_entropy__dimension_5__tau_1 dim_0__permutation_entropy__dimension_6__tau_1 dim_0__permutation_entropy__dimension_7__tau_1
0 0.0 0.0 0.0 1.0 -0.000080 249.998516 0.052357 -0.000001 -0.000005 -0.024066 ... 0.046288 0.092513 0.092513 0.092513 0.250609 1.323194 1.819631 2.183824 2.463220 2.707387
1 0.0 0.0 1.0 1.0 -0.000525 250.000756 0.049118 0.000000 -0.000006 -0.031622 ... 0.046288 0.046288 0.092513 0.092513 0.184769 1.213529 1.668744 2.081159 2.418614 2.707518
2 0.0 0.0 0.0 1.0 -0.000034 249.998998 0.069971 0.000084 0.000025 0.018880 ... 0.081510 0.092513 0.092513 0.138673 0.311663 1.116706 1.545256 1.889777 2.155644 2.374722
3 0.0 0.0 0.0 1.0 0.000202 249.999702 0.067601 -0.000002 -0.000010 0.384770 ... 0.046288 0.092513 0.092513 0.204643 0.414263 1.323315 1.915330 2.406197 2.794719 3.117007
4 0.0 0.0 0.0 1.0 -0.000146 249.998674 0.050355 -0.000004 -0.000046 -0.045353 ... 0.046288 0.092513 0.092513 0.092513 0.230801 1.173933 1.628543 2.003443 2.303091 2.559695

5 rows × 773 columns
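
To give a feel for what a few of these columns contain, here is a minimal sketch that computes a handful of the simpler features by hand with NumPy. It is an illustration of the feature definitions (`sum_values`, `abs_energy`, `mean_abs_change`, `mean_change`, `median`), not tsfresh's own implementation, and the helper `simple_features` and the toy input are hypothetical.

```python
import numpy as np


def simple_features(series):
    """Compute a few tsfresh-style summary features for a single series."""
    x = np.asarray(series, dtype=float)
    diffs = np.diff(x)  # differences between consecutive values
    return {
        "sum_values": float(x.sum()),
        "abs_energy": float(np.sum(x**2)),  # sum of squared values
        "mean_abs_change": float(np.mean(np.abs(diffs))),
        "mean_change": float(np.mean(diffs)),
        "median": float(np.median(x)),
    }


toy_series = [1.0, 3.0, 2.0, 5.0]  # made-up values for illustration
features = simple_features(toy_series)
print(features)
```

Each series is reduced to one row of scalar features, which is what makes the result usable by tabular estimators.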

Using tsfresh with sktime

[7]:
classifier = make_pipeline(
    TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False),
    RandomForestClassifier(),
)
classifier.fit(X_train, y_train)
classifier.score(X_test, y_test)
/Users/mloning/Documents/Research/software/sktime/sktime/sktime/transformations/panel/tsfresh.py:164: UserWarning: tsfresh requires a unique index, but found non-unique. To avoid this warning, please make sure the index of X contains only unique values.
  "tsfresh requires a unique index, but found "
Feature Extraction: 100%|██████████| 5/5 [00:11<00:00,  2.21s/it]
/Users/mloning/Documents/Research/software/sktime/sktime/sktime/transformations/panel/tsfresh.py:164: UserWarning: tsfresh requires a unique index, but found non-unique. To avoid this warning, please make sure the index of X contains only unique values.
  "tsfresh requires a unique index, but found "
Feature Extraction: 100%|██████████| 5/5 [00:03<00:00,  1.45it/s]
[7]:
0.8490566037735849

Multivariate time series classification data

[8]:
X, y = load_basic_motions(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
(60, 6) (60,) (20, 6) (20,)
[9]:
#  multivariate input data
X_train.head()
[9]:
dim_0 dim_1 dim_2 dim_3 dim_4 dim_5
20 0 -0.294498 1 -0.294498 2 -0.050044 3... 0 0.540218 1 0.540218 2 -0.515245 3... 0 0.218114 1 0.218114 2 -0.301108 3... 0 -0.045277 1 -0.045277 2 0.103872 3... 0 -0.002663 1 -0.002663 2 -0.183773 3... 0 0.031960 1 0.031960 2 0.037287 3...
26 0 -0.761604 1 -0.761604 2 0.121078 3... 0 0.260125 1 0.260125 2 -1.423255 3... 0 -0.064487 1 -0.064487 2 0.075600 3... 0 0.069248 1 0.069248 2 -0.282318 3... 0 0.242367 1 0.242367 2 -0.332922 3... 0 -0.007990 1 -0.007990 2 0.239704 3...
7 0 -0.352746 1 -0.352746 2 -1.354561 3... 0 0.316845 1 0.316845 2 0.490525 3... 0 -0.473779 1 -0.473779 2 1.454261 3... 0 -0.327595 1 -0.327595 2 -0.269001 3... 0 0.106535 1 0.106535 2 0.021307 3... 0 0.197090 1 0.197090 2 0.460763 3...
8 0 -0.342233 1 -0.342233 2 -0.298542 3... 0 0.327415 1 0.327415 2 -0.527154 3... 0 0.157229 1 0.157229 2 0.248585 3... 0 0.394179 1 0.394179 2 -0.037287 3... 0 0.074574 1 0.074574 2 -0.087891 3... 0 -0.037287 1 -0.037287 2 -0.050604 3...
10 0 0.206148 1 0.206148 2 6.53436... 0 -0.658294 1 -0.658294 2 4.597327 3... 0 0.469612 1 0.469612 2 -2.723661 3... 0 -0.106535 1 -0.106535 2 -0.439456 3... 0 0.306288 1 0.306288 2 1.717875 3... 0 0.950824 1 0.950824 2 -1.041379 3...
[10]:
t = TSFreshFeatureExtractor(default_fc_parameters="efficient", show_warnings=False)
Xt = t.fit_transform(X_train)
Xt.head()
/Users/mloning/Documents/Research/software/sktime/sktime/sktime/transformations/panel/tsfresh.py:164: UserWarning: tsfresh requires a unique index, but found non-unique. To avoid this warning, please make sure the index of X contains only unique values.
  "tsfresh requires a unique index, but found "
Feature Extraction: 100%|██████████| 5/5 [00:18<00:00,  3.69s/it]
[10]:
dim_0__variance_larger_than_standard_deviation dim_0__has_duplicate_max dim_0__has_duplicate_min dim_0__has_duplicate dim_0__sum_values dim_0__abs_energy dim_0__mean_abs_change dim_0__mean_change dim_0__mean_second_derivative_central dim_0__median ... dim_5__fourier_entropy__bins_2 dim_5__fourier_entropy__bins_3 dim_5__fourier_entropy__bins_5 dim_5__fourier_entropy__bins_10 dim_5__fourier_entropy__bins_100 dim_5__permutation_entropy__dimension_3__tau_1 dim_5__permutation_entropy__dimension_4__tau_1 dim_5__permutation_entropy__dimension_5__tau_1 dim_5__permutation_entropy__dimension_6__tau_1 dim_5__permutation_entropy__dimension_7__tau_1
0 0.0 0.0 0.0 1.0 33.334188 110.735119 0.822452 0.000639 0.001751 0.164096 ... 0.165443 0.165443 0.165443 0.192626 0.545824 1.279774 1.910772 2.565051 3.096812 3.567632
1 1.0 0.0 0.0 1.0 73.888480 220.949429 0.964075 -0.002087 -0.003908 0.613719 ... 0.096509 0.096509 0.261160 0.261160 0.451359 1.313299 1.987599 2.593635 3.173890 3.696247
2 0.0 0.0 0.0 1.0 -17.428760 7.940863 0.170422 0.002326 -0.000244 -0.152038 ... 0.223718 0.261160 0.356468 0.545824 1.821690 1.438857 2.291659 3.140440 3.819994 4.207710
3 0.0 0.0 0.0 1.0 -18.154841 5.568890 0.135705 0.001051 0.000688 -0.196623 ... 0.399949 0.705356 1.127853 1.742820 3.274497 1.683010 2.766048 3.748502 4.303872 4.449241
4 1.0 0.0 0.0 1.0 395.985445 11192.658970 6.583700 0.099344 0.000000 8.608970 ... 0.165443 0.165443 0.165443 0.165443 0.706253 1.483926 2.279149 3.014130 3.525453 3.919983

5 rows × 4638 columns
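
For multivariate input, the features are computed separately for each dimension and the columns are prefixed with the dimension name, which is why the 6 dimensions here give 6 × 773 = 4638 columns. The sketch below mimics that naming scheme on a toy instance. The helper `toy_features` and the data are hypothetical, and the code only illustrates the column layout, not how tsfresh computes features internally.

```python
import numpy as np


def toy_features(x):
    """Two simple per-series features, standing in for the full feature set."""
    return {
        "sum_values": float(np.sum(x)),
        "abs_energy": float(np.sum(np.square(x))),
    }


# One toy instance with two dimensions (values made up for illustration).
instance = {
    "dim_0": np.array([1.0, 2.0, 3.0]),
    "dim_1": np.array([0.0, -1.0, 1.0]),
}

# Apply the same features to every dimension and prefix the column names.
row = {
    f"{dim}__{name}": value
    for dim, series in instance.items()
    for name, value in toy_features(series).items()
}
print(row)

# The column count scales linearly with the number of dimensions.
n_dims, n_features_per_dim = 6, 773
print(n_dims * n_features_per_dim)
```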

Using tsfresh for forecasting

You can also use tsfresh for univariate forecasting, by reducing the forecasting task to time series regression over sliding windows of the series. To find out more about forecasting, check out our forecasting tutorial notebook.

[11]:
from sklearn.ensemble import RandomForestRegressor

from sktime.datasets import load_airline
from sktime.forecasting.base import ForecastingHorizon
from sktime.forecasting.compose import make_reduction
from sktime.forecasting.model_selection import temporal_train_test_split

y = load_airline()
y_train, y_test = temporal_train_test_split(y)

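# pipeline: tsfresh features are extracted from each window, then fed to a tabular regressor
regressor = make_pipeline(
    TSFreshFeatureExtractor(show_warnings=False, disable_progressbar=True),
    RandomForestRegressor(),
)
# reduce forecasting to time series regression over sliding windows of length 12
forecaster = make_reduction(
    regressor, scitype="time-series-regressor", window_length=12
)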
forecaster.fit(y_train)

fh = ForecastingHorizon(y_test.index, is_relative=False)
y_pred = forecaster.predict(fh)
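
The reduction used above rests on a sliding-window idea: each window of past values becomes one training instance, and the value that follows it becomes the target. The sketch below shows that idea with plain NumPy. It is a conceptual illustration under simplifying assumptions (one-step-ahead targets only), not sktime's internal implementation, and the helper `sliding_windows` is hypothetical.

```python
import numpy as np


def sliding_windows(y, window_length):
    """Split a series into fixed-length windows and their next-step targets."""
    y = np.asarray(y, dtype=float)
    n_windows = len(y) - window_length
    # Row i holds y[i], ..., y[i + window_length - 1].
    X = np.stack([y[i : i + window_length] for i in range(n_windows)])
    # The target for row i is the value right after the window.
    target = y[window_length:]
    return X, target


y_toy = np.arange(10.0)  # toy series 0, 1, ..., 9
X_win, y_next = sliding_windows(y_toy, window_length=3)
print(X_win.shape, y_next.shape)
print(X_win[0], y_next[0])
```

In the pipeline above, each such window is treated as a short time series, so the tsfresh transformer can turn it into features before the random forest is fitted.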
