Matrix Profile Goes MAD: Variable-Length Motif And Discord Discovery in Data Series

by Michele Linardi, et al.

In the last fifteen years, data series motif and discord discovery have emerged as two useful and widely used primitives for data series mining, with applications in many domains, including robotics, entomology, seismology, medicine, and climatology. Nevertheless, state-of-the-art motif and discord discovery tools still require the user to provide the length of the motif or discord in advance. Yet, in several cases, the choice of length is critical and unforgiving. Unfortunately, the obvious brute-force solution, which tests all lengths within a given range, is computationally untenable. In this work, we introduce a new framework, which provides an exact and scalable motif and discord discovery algorithm that efficiently finds all motifs and discords in a given range of lengths. We evaluate our approach with five diverse real datasets, and demonstrate that it is up to 20 times faster than the state-of-the-art. Our results also show that removing the unrealistic assumption that the user knows the correct length can often produce more intuitive and actionable results, which could otherwise have been missed. (Paper published in the Data Mining and Knowledge Discovery journal, 2020.)
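
To make the cost of the brute-force baseline concrete, the following is a minimal sketch of what "testing all lengths within a given range" could look like. It is not the algorithm proposed in the paper. The function names (`znorm`, `matrix_profile_naive`, `brute_force_range`), the use of z-normalized Euclidean distance, and the half-length exclusion zone are all assumptions made for illustration, and the sketch assumes the largest length is much smaller than the series itself.

```python
import numpy as np

def znorm(x):
    """Z-normalize a subsequence; constant subsequences map to all zeros."""
    sd = x.std()
    return (x - x.mean()) / sd if sd > 0 else np.zeros_like(x)

def matrix_profile_naive(series, m):
    """Naive O(n^2 * m) nearest-neighbor distances for subsequence length m.

    Assumption: z-normalized Euclidean distance with an exclusion zone of
    m // 2 around each subsequence to skip trivial (overlapping) matches.
    """
    n = len(series) - m + 1
    subs = np.array([znorm(series[i:i + m]) for i in range(n)])
    excl = max(1, m // 2)
    profile = np.full(n, np.inf)
    index = np.full(n, -1)
    for i in range(n):
        d = np.linalg.norm(subs - subs[i], axis=1)
        lo, hi = max(0, i - excl), min(n, i + excl + 1)
        d[lo:hi] = np.inf  # mask the subsequence itself and its neighbors
        j = int(np.argmin(d))
        profile[i], index[i] = d[j], j
    return profile, index

def brute_force_range(series, l_min, l_max):
    """Recompute everything from scratch for every length in [l_min, l_max]."""
    results = {}
    for m in range(l_min, l_max + 1):
        profile, index = matrix_profile_naive(series, m)
        # Motif: the pair of subsequences with the smallest distance.
        i = int(np.argmin(profile))
        motif = (i, int(index[i]), float(profile[i]))
        # Discord: the subsequence farthest from its nearest neighbor.
        finite = np.isfinite(profile)
        k = int(np.argmax(np.where(finite, profile, -np.inf)))
        results[m] = {"motif": motif, "discord": (k, float(profile[k]))}
    return results
```

Each additional length repeats the full quadratic computation, which is why the loop in `brute_force_range` scales poorly with the size of the length range. The sketch also reports results per length only. Comparing motifs or discords across lengths would need some form of length normalization, which it does not attempt.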







VALMOD: A Suite for Easy and Exact Detection of Variable Length Motifs in Data Series

Data series motif discovery represents one of the most useful primitives...

Effective and Efficient Variable-Length Data Series Analytics

In the last twenty years, data series similarity search has emerged as a...

Efficient Discovery of Variable-length Time Series Motifs with Large Length Range in Million Scale Time Series

Detecting repeated variable-length patterns, also called variable-length...

Discovering Subdimensional Motifs of Different Lengths in Large-Scale Multivariate Time Series

Detecting repeating patterns of different lengths in time series, also c...

Self-Organizing Maps with Variable Input Length for Motif Discovery and Word Segmentation

Time Series Motif Discovery (TSMD) is defined as searching for patterns ...

Parallel Algorithm for Time Series Discords Discovery on the Intel Xeon Phi Knights Landing Many-core Processor

Discord is a refinement of the concept of anomalous subsequence of a tim...

Learning LWF Chain Graphs: A Markov Blanket Discovery Approach

This paper provides a graphical characterization of Markov blankets in c...