Return of the Lernaean Hydra: Experimental Evaluation of Data Series Approximate Similarity Search

06/20/2020
by   Karima Echihabi, et al.

Data series are a special type of multidimensional data present in numerous domains, where similarity search is a key operation that has been extensively studied in the data series literature. In parallel, the multidimensional community has studied approximate similarity search techniques. We propose a taxonomy of similarity search techniques that reconciles the terminology used in these two domains, we describe modifications to data series indexing techniques enabling them to answer approximate similarity queries with quality guarantees, and we conduct a thorough experimental evaluation to compare approximate similarity search techniques under a unified framework, on synthetic and real datasets in memory and on disk. Although data series differ from generic multidimensional vectors (series usually exhibit correlation between neighboring values), our results show that data series techniques answer approximate queries with strong guarantees and excellent empirical performance, on data series and vectors alike. These techniques outperform the state-of-the-art approximate techniques for vectors when operating on disk, and remain competitive in memory.


research
09/02/2020

Data Series Indexing Gone Parallel

Data series similarity search is a core operation for several data serie...
research
06/20/2020

The Lernaean Hydra of Data Series Similarity Search: An Experimental Evaluation of the State of the Art

Increasingly large data series collections are becoming commonplace acro...
research
10/14/2021

Fast Data Series Indexing for In-Memory Data

Data series similarity search is a core operation for several data serie...
research
12/26/2022

Hercules Against Data Series Similarity Search

We propose Hercules, a parallel tree-based technique for exact similarit...
research
08/04/2022

Unconventional application of k-means for distributed approximate similarity search

Similarity search based on a distance function in metric spaces is a fun...
research
12/26/2022

ProS: Data Series Progressive k-NN Similarity Search and Classification with Probabilistic Quality Guarantees

Existing systems dealing with the increasing volume of data series canno...
research
09/22/2020

Effective and Efficient Variable-Length Data Series Analytics

In the last twenty years, data series similarity search has emerged as a...
