Error-bounded Approximate Time Series Joins using Compact Dictionary Representations of Time Series

12/24/2021
by Chin-Chia Michael Yeh, et al.

The matrix profile is an effective data mining tool that provides similarity join functionality for time series data. Users of the matrix profile can either join a time series with itself using an intra-similarity join (i.e., a self-join) or join a time series with another time series using an inter-similarity join. By invoking either or both types of joins, the matrix profile can help users discover both conserved and anomalous structures in the data. Since the introduction of the matrix profile five years ago, multiple efforts have been made to speed up the computation with approximate joins; however, the majority of these efforts focus only on self-joins. In this work, we show that it is possible to efficiently perform approximate inter-time series similarity joins with error-bounded guarantees by creating a compact "dictionary" representation of time series. Using the dictionary representation instead of the original time series, we are able to improve the throughput of an anomaly mining system by at least 20X, with essentially no decrease in accuracy. As a side effect, the dictionaries also summarize the time series in a semantically meaningful way and can provide intuitive and actionable insights. We demonstrate the utility of our dictionary-based inter-time series similarity joins on domains as diverse as medicine and transportation.
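To make the idea of an inter-similarity join concrete, here is a minimal brute-force sketch in Python with NumPy. It is not the authors' algorithm: the helper names (`znorm`, `subsequences`, `join_profile`), the synthetic signals, and the naive "dictionary" that keeps every 10th reference window are all assumptions made for illustration. It also assumes the usual z-normalized Euclidean distance between subsequences. The one property it demonstrates is elementary: a nearest-neighbor search over a subset of the reference windows can never return a smaller distance than a search over all of them, so this kind of approximate profile never underestimates the exact one.

```python
import numpy as np

def znorm(x):
    """Z-normalize a 1-D array; (near-)constant windows map to all zeros."""
    x = np.asarray(x, dtype=float)
    sd = x.std()
    if sd < 1e-12:
        return np.zeros_like(x)
    return (x - x.mean()) / sd

def subsequences(ts, m):
    """Return every z-normalized length-m sliding window of ts, one per row."""
    ts = np.asarray(ts, dtype=float)
    return np.array([znorm(ts[i:i + m]) for i in range(len(ts) - m + 1)])

def join_profile(query_windows, reference_windows):
    """For each query window, the Euclidean distance to its nearest reference window."""
    diff = query_windows[:, None, :] - reference_windows[None, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=2))  # all pairwise distances
    return dists.min(axis=1)

# Synthetic data (assumed for illustration only).
rng = np.random.default_rng(0)
m = 16                                        # subsequence length
t = np.arange(400)
reference = np.sin(2 * np.pi * t / 50) + 0.05 * rng.standard_normal(400)
query = np.sin(2 * np.pi * t[:200] / 50) + 0.05 * rng.standard_normal(200)

ref_windows = subsequences(reference, m)
qry_windows = subsequences(query, m)

# Exact inter-similarity join: search every reference window.
exact = join_profile(qry_windows, ref_windows)

# Hypothetical "dictionary": every 10th reference window.
# This stands in for a compact representation; it is NOT the paper's construction.
dictionary = ref_windows[::10]
approx = join_profile(qry_windows, dictionary)

print("reference windows:", len(ref_windows), "dictionary windows:", len(dictionary))
print("largest overestimate:", float((approx - exact).max()))
```

The brute-force join costs time proportional to the number of query windows times the number of reference windows, so shrinking the reference side is where any speed-up would come from in this sketch. How to choose the dictionary so that the overestimate stays within a guaranteed bound is the subject of the paper and is not reproduced here.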


