Fast Data Series Indexing for In-Memory Data

10/14/2021
by Botao Peng, et al.

Data series similarity search is a core operation for several data series analysis applications across many different domains. However, state-of-the-art techniques fail to deliver the time performance required for interactive exploration or analysis of large data series collections. In this work, we propose MESSI, the first data series index designed for in-memory operation on modern hardware. Our index takes advantage of the parallelization opportunities of modern hardware (i.e., SIMD instructions, multi-socket and multi-core architectures) to accelerate both index construction and similarity search processing times. Moreover, it benefits from a careful design in the setup and coordination of the parallel workers and data structures, so that it maximizes its performance for in-memory operations. MESSI supports similarity search using both the Euclidean and Dynamic Time Warping (DTW) distances. Our experiments with synthetic and real datasets demonstrate that overall MESSI is up to 4x faster at index construction, and up to 11x faster at query answering, than the state-of-the-art parallel approach. MESSI is the first to answer exact similarity search queries on 100GB datasets in ~50msec (30-75msec across diverse datasets), which enables real-time, interactive data exploration on very large data series collections.
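To make the query type concrete, the following is a minimal sketch of exact nearest-neighbor search by linear scan under the two distances the abstract names. It is not MESSI's algorithm: it is the unindexed, single-threaded baseline that an index of this kind is meant to accelerate. The function names (`euclidean`, `dtw`, `exact_nn`) and the Sakoe-Chiba band parameter `window` are assumptions introduced here for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a, b, window):
    """DTW distance constrained to a Sakoe-Chiba band of half-width `window`.

    The band constraint is an assumption of this sketch.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cumulative squared cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        lo = max(1, i - window)
        hi = min(m, i + window)
        for j in range(lo, hi + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j],      # repeat b[j-1]
                                 cost[i][j - 1],      # repeat a[i-1]
                                 cost[i - 1][j - 1])  # advance both
    return math.sqrt(cost[n][m])


def exact_nn(query, collection, distance):
    """Return (index, distance) of the exact nearest neighbor by linear scan."""
    best_idx, best_dist = -1, float("inf")
    for idx, series in enumerate(collection):
        d = distance(query, series)
        if d < best_dist:
            best_idx, best_dist = idx, d
    return best_idx, best_dist
```

A call such as `exact_nn(query, collection, euclidean)` scans every series, and `exact_nn(query, collection, lambda a, b: dtw(a, b, 1))` does the same under DTW. The cost of this scan grows linearly with the collection size, which is why exact search over very large collections relies on an index and parallel hardware.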


research
09/02/2020

MESSI: In-Memory Data Series Indexing

Data series similarity search is a core operation for several data serie...
research
09/22/2020

Scalable Data Series Subsequence Matching with ULISSE

Data series similarity search is an important operation and at the core ...
research
01/26/2023

Odyssey: A Journey in the Land of Distributed Data Series Similarity Search

This paper presents Odyssey, a novel distributed data-series processing ...
research
04/17/2023

Dumpy: A Compact and Adaptive Index for Large Data Series Collections

Data series indexes are necessary for managing and analyzing the increas...
research
12/26/2022

ProS: Data Series Progressive k-NN Similarity Search and Classification with Probabilistic Quality Guarantees

Existing systems dealing with the increasing volume of data series canno...
research
06/20/2020

Return of the Lernaean Hydra: Experimental Evaluation of Data Series Approximate Similarity Search

Data series are a special type of multidimensional data present in numer...
research
06/20/2020

The Lernaean Hydra of Data Series Similarity Search: An Experimental Evaluation of the State of the Art

Increasingly large data series collections are becoming commonplace acro...
