Hercules Against Data Series Similarity Search

12/26/2022
by Karima Echihabi, et al.

We propose Hercules, a parallel tree-based technique for exact similarity search on massive disk-based data series collections. We present novel index construction and query answering algorithms that leverage different summarization techniques, carefully schedule costly operations, optimize memory and disk accesses, and exploit the multi-threading and SIMD capabilities of modern hardware to perform CPU-intensive calculations. We demonstrate the superiority and robustness of Hercules with an extensive experimental evaluation against state-of-the-art techniques, using many synthetic and real datasets, and query workloads of varying difficulty. The results show that Hercules performs up to one order of magnitude faster than the best competitor (which is not the same technique in every scenario). Moreover, Hercules is the only index that outperforms the optimized scan in all scenarios, including the hard query workloads on disk-based datasets. This paper was published in the Proceedings of the VLDB Endowment, Volume 15, Number 10, June 2022.
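
To make the idea of summarization-based exact search concrete, the following is a minimal single-threaded sketch in Python. It is not the Hercules algorithm: it assumes a Piecewise Aggregate Approximation (PAA) summary with equal-length segments and Euclidean distance, and the function names (`paa`, `paa_lower_bound`, `exact_1nn`) are hypothetical. It relies only on the well-known property that the scaled distance between PAA summaries never exceeds the true Euclidean distance, so pruning on it cannot discard the true nearest neighbor.

```python
import math


def paa(series, segments):
    """Summarize a series by the mean of each equal-length segment."""
    n = len(series)
    if n % segments != 0:
        raise ValueError("series length must be divisible by segments")
    width = n // segments
    return [sum(series[i:i + width]) / width for i in range(0, n, width)]


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def paa_lower_bound(q_paa, c_paa, width):
    """sqrt(width) * distance between summaries lower-bounds the true distance."""
    return math.sqrt(width) * euclidean(q_paa, c_paa)


def exact_1nn(query, collection, segments=4):
    """Return (index, distance, number of full distance computations)."""
    width = len(query) // segments
    q_paa = paa(query, segments)
    # In a real index the summaries would be computed once, at build time.
    bounds = [paa_lower_bound(q_paa, paa(c, segments), width) for c in collection]
    # Visit candidates in order of increasing lower bound.
    order = sorted(range(len(collection)), key=lambda i: bounds[i])
    best_idx, best_dist, computed = -1, math.inf, 0
    for i in order:
        if bounds[i] >= best_dist:
            break  # every remaining candidate has an even larger lower bound
        d = euclidean(query, collection[i])
        computed += 1
        if d < best_dist:
            best_idx, best_dist = i, d
    return best_idx, best_dist, computed
```

The third return value counts how many full distance computations were needed; on data where the summaries are informative it tends to be well below the collection size, which is the effect an index exploits to beat a sequential scan.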

research
09/01/2020

ParIS+: Data Series Indexing on Multi-Core Architectures

Data series similarity search is a core operation for several data serie...
research
03/14/2019

A Hybrid Data Cleaning Framework using Markov Logic Networks

With the increase of dirty data, data cleaning turns into a crux of data...
research
09/02/2020

MESSI: In-Memory Data Series Indexing

Data series similarity search is a core operation for several data serie...
research
06/20/2020

Coconut: a scalable bottom-up approach for building data series indexes

Many modern applications produce massive amounts of data series that nee...
research
06/20/2020

Return of the Lernaean Hydra: Experimental Evaluation of Data Series Approximate Similarity Search

Data series are a special type of multidimensional data present in numer...
research
06/20/2020

Coconut: sortable summarizations for scalable indexes over static and streaming data series

Many modern applications produce massive streams of data series that nee...
research
01/11/2018

Multidimensional Range Queries on Modern Hardware

Range queries over multidimensional data are an important part of databa...
