Glossary of Common Terms

The glossary below defines common terms and API elements used throughout sktime.

Note

The glossary is under development. Important terms are still missing. Please create a pull request if you want to add one.

Endogenous

Within a learning task, endogenous variables are determined by exogenous variables or by past timepoints of the variable itself. Also referred to as dependent variables or targets.

Exogenous

Within a learning task, exogenous variables are external factors whose pattern of impact on the task’s endogenous variables must be learned. Also referred to as independent variables or features.

Forecasting

A learning task focused on predicting future values of a time series. For more details, see Forecasting.

Instance

A member of the set of entities being studied and to which an ML practitioner wishes to generalize. For example, patients, chemical process runs, machines, countries, etc. May also be referred to as samples, examples, observations or records depending on the discipline and context.

Multivariate time series

Multiple time series, typically observed for the same observational unit. The term is usually applied to cases where the series evolve together over time. This is related to, but different from, the case where a univariate time series depends on exogenous data.

Panel time series

A form of time series data where the same time series are observed for multiple observational units. The observed series may consist of univariate time series or multivariate time series. Accordingly, the data varies across time, observational unit and series (i.e. variables).
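
As an illustration of the three axes (observational unit, time, variable), the following is a minimal sketch of one possible panel layout using a pandas MultiIndex. The unit names, variable names and sizes are hypothetical, and this layout is an assumption for illustration, not a statement of the container sktime requires.

```python
import numpy as np
import pandas as pd

# Hypothetical panel: 3 observational units, 4 timepoints, 2 variables.
instances = ["unit_a", "unit_b", "unit_c"]
timepoints = range(4)
index = pd.MultiIndex.from_product(
    [instances, timepoints], names=["instance", "time"]
)

rng = np.random.default_rng(0)
panel = pd.DataFrame(
    rng.normal(size=(len(index), 2)),
    index=index,
    columns=["temperature", "pressure"],  # illustrative variable names
)

# One row per (instance, timepoint) pair, one column per variable.
print(panel.shape)

# Selecting a single instance leaves a multivariate time series.
unit_a = panel.loc["unit_a"]
print(unit_a.shape)
```

Selecting a single column of `unit_a` would in turn give a univariate time series for that unit.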

Reduction

Reduction refers to decomposing a given learning task into simpler tasks that can be composed to create a solution to the original task. In sktime, reduction is used to allow one learning task to be adapted as a solution for an alternative task.
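
One common instance of this idea is reducing forecasting to tabular regression with a sliding window. The sketch below uses plain NumPy and hypothetical helper names (`sliding_window`, `fit_linear`, `forecast_one_step`); it is an illustration of the concept under those assumptions, not sktime's own implementation.

```python
import numpy as np


def sliding_window(y, window_length):
    """Turn a 1D series into a tabular (X, target) pair for regression."""
    y = np.asarray(y, dtype=float)
    n_rows = len(y) - window_length
    # Each row holds `window_length` consecutive past values.
    X = np.stack([y[i : i + window_length] for i in range(n_rows)])
    # The target is the value that immediately follows each window.
    target = y[window_length:]
    return X, target


def fit_linear(X, target):
    """Least-squares linear regression with an intercept column."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, target, rcond=None)
    return coef


def forecast_one_step(y, coef, window_length):
    """Apply the fitted regressor to the most recent window."""
    last_window = np.asarray(y[-window_length:], dtype=float)
    return coef[0] + last_window @ coef[1:]


y = np.arange(10, dtype=float)  # toy series 0, 1, ..., 9
X, target = sliding_window(y, window_length=3)
coef = fit_linear(X, target)
y_next = forecast_one_step(y, coef, window_length=3)
print(X.shape, target.shape, y_next)
```

The forecasting task is solved here entirely by a regressor that knows nothing about time; the window transformation carries the temporal structure. On this linear toy series the one-step forecast should continue the line.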

Scientific type

A class or object type to denote a category of objects defined by a common interface and data scientific purpose. For example, “forecaster” or “classifier”.

Scitype

See scientific type.

Seasonality

When a time series is affected by seasonal characteristics such as the time of year or the day of the week, it is said to have a seasonal pattern. The duration of a season is always fixed and known.
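
Because the season length is fixed and known, a seasonal pattern can be estimated by averaging the observations that share the same position within the season. The following is a minimal sketch with a made-up, noise-free weekly pattern; the values and the period of 7 are assumptions for illustration.

```python
import numpy as np

period = 7  # assumed known season length, e.g. days in a week
pattern = np.array([5.0, 3.0, 3.0, 3.0, 4.0, 8.0, 9.0])  # made-up values
y = np.tile(pattern, 4)  # four full seasons, no noise

# Average all observations that fall on the same position in the season.
seasonal_means = y.reshape(-1, period).mean(axis=0)

# Subtracting the repeated seasonal means removes the seasonal pattern.
deseasonalized = y - np.tile(seasonal_means, len(y) // period)
print(seasonal_means)
```

The reshape step relies on the series containing a whole number of seasons; a series with a partial final season would need to be trimmed or handled separately.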

Tabular

A setting where each timepoint of the univariate time series measured for each instance is treated as a feature and stored as a primitive data type in a cell of the DataFrame. E.g., if there are N instances of time series and each has T timepoints, this yields a pandas DataFrame with shape (N, T): N rows, T columns.
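
A minimal sketch of such a table, with N = 5 and T = 8 chosen arbitrarily and hypothetical row and column labels:

```python
import numpy as np
import pandas as pd

n_instances, n_timepoints = 5, 8  # N and T, chosen for illustration
rng = np.random.default_rng(42)
values = rng.normal(size=(n_instances, n_timepoints))

# Rows are instances, columns are timepoints treated as features.
tabular = pd.DataFrame(
    values,
    index=[f"instance_{i}" for i in range(n_instances)],
    columns=[f"t{t}" for t in range(n_timepoints)],
)

print(tabular.shape)  # (5, 8): N rows, T columns
```

Each cell holds a single float, so the table can be passed to any estimator that expects ordinary tabular input, at the cost of discarding the ordering of the columns.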

Time series

Data where the variable measurements are ordered by time or by an index indicating the position of an observation in the sequence of values.

Time series annotation

A learning task focused on labeling the timepoints of a time series. This includes the related tasks of outlier detection, anomaly detection, change point detection and segmentation.

Time series classification

A learning task focused on using the patterns, across instances, that relate the time series to a categorical target variable.

Time series clustering

A learning task focused on discovering groups consisting of instances with similar time series.

Time series regression

A learning task focused on using the patterns, across instances, that relate the time series to a continuous target variable.

Timepoint

The point in time at which an observation is made. A timepoint may represent an exact point in time (a timestamp), a time period (e.g. minutes, hours or days), or simply an index indicating the position of an observation in the sequence of values.
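
The three kinds of timepoint map naturally onto three pandas index types. The sketch below is illustrative only; the dates and values are made up.

```python
import pandas as pd

# An exact point in time, a time period, and a plain position.
timestamp = pd.Timestamp("2024-03-01 12:30")
period = pd.Period("2024-03-01", freq="D")
position = 7

values = [1.0, 2.0, 3.0]

# The same observations indexed in each of the three ways.
stamped = pd.Series(
    values, index=pd.date_range("2024-03-01", periods=3, freq="D")
)
by_period = pd.Series(
    values, index=pd.period_range("2024-03-01", periods=3, freq="D")
)
by_position = pd.Series(values, index=pd.RangeIndex(3))

print(type(stamped.index).__name__)
print(type(by_period.index).__name__)
print(type(by_position.index).__name__)
```

A period covers an interval, so the timestamp above falls inside the daily period for the same date rather than being equal to it.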

Trend

When data shows a long-term increase or decrease, this is referred to as a trend. Trends can also be non-linear.
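
A simple way to make a trend visible is to fit a low-degree polynomial in time and subtract it. The following sketch assumes a synthetic series with a linear trend and uses `numpy.polyfit`; the coefficients and noise level are made up for illustration.

```python
import numpy as np

t = np.arange(50, dtype=float)
rng = np.random.default_rng(1)

# Synthetic series: linear trend plus small noise (values are made up).
y = 2.0 + 0.5 * t + rng.normal(scale=0.1, size=t.size)

# Fit a degree-1 polynomial; polyfit returns the highest degree first.
slope, intercept = np.polyfit(t, y, deg=1)
trend = intercept + slope * t

# What remains after removing the fitted trend.
detrended = y - trend
print(slope, intercept)
```

Raising `deg` to 2 or more would fit a non-linear trend in the same way.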

Univariate time series

A single time series. While univariate analysis often only uses information contained in the series itself, univariate time series regression and forecasting can also include exogenous data.

Variable

Refers to some measurement of interest. Variables may be cross-sectional (e.g. time-invariant measurements like a patient’s place of birth) or time series.