Return of the Lernaean Hydra: Experimental Evaluation of Data Series Approximate Similarity Search

06/20/2020
by Karima Echihabi, et al.

Data series are a special type of multidimensional data present in numerous domains, where similarity search is a key operation that has been extensively studied in the data series literature. In parallel, the multidimensional community has studied approximate similarity search techniques. We propose a taxonomy of similarity search techniques that reconciles the terminology used in these two domains; we describe modifications to data series indexing techniques that enable them to answer approximate similarity queries with quality guarantees; and we conduct a thorough experimental evaluation that compares approximate similarity search techniques under a unified framework, on synthetic and real datasets, in memory and on disk. Although data series differ from generic multidimensional vectors (series usually exhibit correlation between neighboring values), our results show that data series techniques answer approximate queries with strong guarantees and excellent empirical performance, on data series and vectors alike. These techniques outperform the state-of-the-art approximate techniques for vectors when operating on disk, and remain competitive in memory.


09/02/2020

Data Series Indexing Gone Parallel

Data series similarity search is a core operation for several data serie...
06/20/2020

The Lernaean Hydra of Data Series Similarity Search: An Experimental Evaluation of the State of the Art

Increasingly large data series collections are becoming commonplace acro...
09/02/2020

MESSI: In-Memory Data Series Indexing

Data series similarity search is a core operation for several data serie...
10/14/2021

Fast Data Series Indexing for In-Memory Data

Data series similarity search is a core operation for several data serie...
09/22/2020

Scalable Data Series Subsequence Matching with ULISSE

Data series similarity search is an important operation and at the core ...
01/04/2022

Elastic Product Quantization for Time Series

Analyzing numerous or long time series is difficult in practice due to t...
03/24/2020

Efficient Algorithms for Multidimensional Segmented Regression

We study the fundamental problem of fixed design multidimensional segme...