catch22: CAnonical Time-series CHaracteristics

by Carl H Lubba, et al.
The University of Sydney
Imperial College London

Capturing the dynamical properties of time series concisely as interpretable feature vectors can enable efficient clustering and classification for time-series applications across science and industry. Selecting an appropriate feature-based representation of time series for a given application can be achieved through systematic comparison across a comprehensive time-series feature library, such as those in the hctsa toolbox. However, this approach is computationally expensive and involves evaluating many similar features, limiting the widespread adoption of feature-based representations of time series for real-world applications. In this work, we introduce a method to infer small sets of time-series features that (i) exhibit strong classification performance across a given collection of time-series problems, and (ii) are minimally redundant. Applying our method to a set of 93 time-series classification datasets (containing over 147 000 time series) and using a filtered version of the hctsa feature library (4791 features), we introduce a generically useful set of 22 CAnonical Time-series CHaracteristics, catch22. This dimensionality reduction, from 4791 to 22, is associated with an approximately 1000-fold reduction in computation time and near-linear scaling with time-series length, at the cost of an average reduction in classification accuracy of just 7%. catch22 captures a diverse and interpretable signature of time series in terms of their properties, including linear and non-linear autocorrelation, successive differences, value distributions and outliers, and fluctuation scaling properties. We provide an efficient implementation of catch22, accessible from many programming environments, that facilitates feature-based time-series analysis for scientific, industrial, financial and medical applications using a common language of interpretable time-series properties.
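
The two selection criteria named in the abstract, strong performance across problems and minimal redundancy, can be illustrated with a small sketch. This is not the authors' pipeline: it swaps in a simple greedy correlation filter as a stand-in, and the function name `select_features`, the parameters `top_fraction` and `max_abs_corr`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def select_features(perf, values, top_fraction=0.5, max_abs_corr=0.8):
    """Greedy stand-in for 'high-performing, minimally redundant' selection.

    perf:   (n_features, n_tasks) accuracy of each feature on each task.
    values: (n_series, n_features) feature outputs on a shared set of series.
    Both inputs and both thresholds are hypothetical, for illustration only.
    """
    perf = np.asarray(perf, dtype=float)
    values = np.asarray(values, dtype=float)

    # Criterion (i): keep only the best features by mean accuracy across tasks.
    mean_perf = perf.mean(axis=1)
    n_keep = max(1, int(round(top_fraction * perf.shape[0])))
    candidates = np.argsort(-mean_perf, kind="stable")[:n_keep]

    # Criterion (ii): walk candidates from best to worst and skip any feature
    # whose output is strongly correlated with one already selected.
    corr = np.abs(np.corrcoef(values, rowvar=False))
    selected = []
    for j in candidates:
        if all(corr[j, k] < max_abs_corr for k in selected):
            selected.append(int(j))
    return selected


# Synthetic demo: feature 1 is a near-copy of feature 0, and feature 3
# performs at roughly chance level.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 3))
values = np.column_stack([
    base[:, 0],
    base[:, 0] + 0.01 * rng.normal(size=200),  # redundant with feature 0
    base[:, 1],
    base[:, 2],
])
perf = np.array([
    [0.90, 0.92, 0.91],
    [0.88, 0.90, 0.89],
    [0.80, 0.82, 0.78],
    [0.50, 0.52, 0.49],
])

selected = select_features(perf, values, top_fraction=0.75, max_abs_corr=0.8)
print(selected)
```

In this toy run the near-duplicate feature and the chance-level feature are dropped, leaving one representative per group of similar features, which is the general idea behind reducing a large library to a small canonical set.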

