Effective and Efficient Variable-Length Data Series Analytics

by Michele Linardi, et al.

In the last twenty years, data series similarity search has emerged as a fundamental operation at the core of several analysis tasks and applications related to data series collections. Many solutions to different mining problems work by means of similarity search. However, all the proposed solutions require prior knowledge of the series length on which similarity search is performed. In several cases, the choice of the length is critical and significantly influences the quality of the expected outcome. Unfortunately, the obvious brute-force solution, which provides an outcome for all lengths within a given range, is computationally untenable. In this Ph.D. work, we present the first solutions that inherently support scalable and variable-length similarity search in data series, applied to sequence/subsequence matching, motif discovery, and discord discovery problems. The experimental results show that our approaches are up to orders of magnitude faster than the alternatives. They also demonstrate that we can remove the unrealistic constraint of performing analytics using a predefined length, leading to more intuitive and actionable results, which would otherwise have been missed.
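To make the cost of the brute-force baseline concrete, the following is a minimal Python sketch of naive variable-length motif discovery. It assumes motifs are defined as the closest pair of non-overlapping subsequences under z-normalized Euclidean distance, computed independently for every length in a range. This is the naive approach the abstract calls untenable, not the method proposed in the thesis, and the function names (znorm, znorm_dist, brute_force_motifs) are hypothetical.

```python
import math


def znorm(seq):
    """Z-normalize a sequence; a constant sequence maps to all zeros."""
    n = len(seq)
    mean = sum(seq) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / n)
    if std == 0:
        return [0.0] * n
    return [(x - mean) / std for x in seq]


def znorm_dist(a, b):
    """Euclidean distance between the z-normalized versions of a and b."""
    za, zb = znorm(a), znorm(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(za, zb)))


def brute_force_motifs(series, min_len, max_len):
    """For every length l in [min_len, max_len], return the closest pair of
    non-overlapping subsequences (the motif pair) as (distance, i, j).

    Assumption: a simple exclusion zone (j >= i + l) stands in for the
    trivial-match handling used in the literature.
    """
    results = {}
    n = len(series)
    for l in range(min_len, max_len + 1):  # one full search per length
        best = (math.inf, -1, -1)
        for i in range(n - l + 1):
            # Start j at i + l so the two subsequences never overlap.
            for j in range(i + l, n - l + 1):
                d = znorm_dist(series[i:i + l], series[j:j + l])
                if d < best[0]:
                    best = (d, i, j)
        if best[1] >= 0:
            results[l] = best
    return results


if __name__ == "__main__":
    # The pattern [0, 1, 2, 1, 0] occurs at offsets 0 and 8.
    series = [0, 1, 2, 1, 0, 5, 9, 3, 0, 1, 2, 1, 0, 7]
    motifs = brute_force_motifs(series, 3, 5)
    print(motifs[5])  # prints (0.0, 0, 8)
```

Because every length is searched from scratch and each distance costs O(l), the total work grows roughly as O((max_len - min_len + 1) * n^2 * max_len) for a series of length n, which is why this baseline does not scale to long series or wide length ranges.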







Matrix Profile Goes MAD: Variable-Length Motif And Discord Discovery in Data Series

In the last fifteen years, data series motif and discord discovery have ...

Data Series Indexing Gone Parallel

Data series similarity search is a core operation for several data serie...

VALMOD: A Suite for Easy and Exact Detection of Variable Length Motifs in Data Series

Data series motif discovery represents one of the most useful primitives...

Scalable Data Series Subsequence Matching with ULISSE

Data series similarity search is an important operation and at the core ...

The Lernaean Hydra of Data Series Similarity Search: An Experimental Evaluation of the State of the Art

Increasingly large data series collections are becoming commonplace acro...

GENDIS: GENetic DIscovery of Shapelets

In the time series classification domain, shapelets are small time serie...

Efficiently Discovering Locally Exceptional yet Globally Representative Subgroups

Subgroup discovery is a local pattern mining technique to find interpret...