KV-match: An Efficient Subsequence Matching Approach for Large Scale Time Series

10/02/2017
by Jiaye Wu, et al.

Time series data have exploded with the popularity of new applications such as data center management and IoT. Time series data management systems (TSDBs) have emerged to store and query these large volumes of time series data. Subsequence matching is critical in many time series mining algorithms, and extensive approaches have been proposed. However, the shift to distributed storage systems and the performance gap make these approaches incompatible with TSDBs. To fill this gap, we propose a new index structure, KV-index, and the corresponding matching algorithm, KV-match. KV-index is a file-based structure that can be easily implemented on local files, HDFS, or HBase tables. The KV-match algorithm probes the index efficiently with a few sequential scans. Moreover, two optimization techniques, window reduction and window reordering, are proposed to further accelerate processing. To support queries of arbitrary length, we extend KV-match to KV-match_DP, which uses multiple indexes of varied lengths to process a query simultaneously. A two-dimensional dynamic programming algorithm is proposed to find the optimal query segmentation. We implement our approach on both local files and HBase tables and conduct extensive experiments on synthetic and real-world datasets. Results show that our index is comparable in size to the popular tree-style index, while our query processing is orders of magnitude more efficient.
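
The abstract does not spell out the index layout, so the following is only a rough, generic illustration of how a key-value style index over sliding-window features can drive subsequence matching under Euclidean distance. Every sliding window of length w is keyed by its mean; each disjoint window of the query yields a key range; candidate start offsets from all query windows are intersected and then verified exactly. The pruning bound |mean(Q_j) - mean(S_j)| <= eps / sqrt(w) holds whenever ED(Q, S) <= eps, by the Cauchy-Schwarz inequality, so the filter never drops a true match. The function names, the in-memory sorted list standing in for a file or HBase table, and the choice of the window mean as the key are all assumptions for this sketch, not the paper's KV-index format; window reduction, window reordering, and KV-match_DP are not shown.

```python
import math
from bisect import bisect_left, bisect_right


def window_means(series, w):
    """Mean of every sliding window of length w, via a running sum."""
    if len(series) < w:
        raise ValueError("series shorter than window length")
    s = sum(series[:w])
    means = [s / w]
    for i in range(w, len(series)):
        s += series[i] - series[i - w]
        means.append(s / w)
    return means


def build_index(series, w):
    """Hypothetical toy index: (window mean, start offset) pairs sorted by mean."""
    return sorted((m, i) for i, m in enumerate(window_means(series, w)))


def candidates(index, series_len, query, w, eps):
    """Start offsets that survive the mean-based filter for every disjoint query window."""
    m = len(query)
    if m < w:
        raise ValueError("query must be at least one window long")
    keys = [k for k, _ in index]
    radius = eps / math.sqrt(w)  # lower-bound radius per window
    result = None
    for j in range(m // w):
        qm = sum(query[j * w:(j + 1) * w]) / w
        lo = bisect_left(keys, qm - radius)
        hi = bisect_right(keys, qm + radius)
        # A window starting at `off` aligns with query window j if the match starts at off - j*w.
        starts = {off - j * w for _, off in index[lo:hi]
                  if off - j * w >= 0 and off - j * w + m <= series_len}
        result = starts if result is None else result & starts
        if not result:
            return []
    return sorted(result)


def match(series, query, w, eps):
    """All start offsets p with ED(series[p:p+len(query)], query) <= eps."""
    index = build_index(series, w)
    out = []
    for p in candidates(index, len(series), query, w, eps):
        d = math.sqrt(sum((series[p + k] - query[k]) ** 2 for k in range(len(query))))
        if d <= eps:  # exact verification removes false positives
            out.append(p)
    return out
```

For example, `match([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3], [2, 3, 4, 5], 2, 0.5)` returns `[2]`. Because the filter only yields candidates, each one is still checked against the exact distance; a real system would keep the sorted keys on disk and read each key range with a sequential scan rather than holding the list in memory.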
