Ingestion, Indexing and Retrieval of High-Velocity Multidimensional Sensor Data on a Single Node

07/04/2017
by Juan A. Colmenares, et al.

Multidimensional data are becoming more prevalent, partly due to the rise of the Internet of Things (IoT), and with that comes the need to ingest and analyze data streams at higher rates than before. Some industrial IoT applications require ingesting millions of records per second while processing queries on both recently ingested and historical data. Unfortunately, existing database systems suited to multidimensional data exhibit low per-node ingestion performance, and even if they can scale horizontally in distributed settings, they require a large number of nodes to meet such ingest demands. For this reason, in this paper we evaluate a single-node multidimensional data store for high-velocity sensor data. Its design centers on a two-level indexing structure, in which the global index is an in-memory R*-tree and the local indices are serialized kd-trees. This study is confined to records with numerical indexing fields and to range queries, and it covers ingest throughput, query response time, and storage footprint. We show that the adopted design streamlines data ingestion and offers ingress rates two orders of magnitude higher than those of Percona Server, SQLite, and Druid. Our prototype also reports query response times comparable to or better than those of Percona Server and Druid, and compares favorably in terms of storage footprint. In addition, we evaluate a kd-tree-partitioning-based scheme for grouping incoming streamed data records. Compared to a random scheme, this scheme produces less overlap between groups of streamed records, but contrary to what we expected, the reduced overlap does not translate into better query performance. By contrast, the local indices prove much more beneficial to query performance. We believe the experience reported in this paper is valuable to practitioners and researchers alike who are interested in building database systems for high-velocity multidimensional data.
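
To make the two-level idea concrete, here is a minimal Python sketch of a global index over groups of records combined with one kd-tree per group. It is an illustration under stated assumptions, not the authors' implementation: the names `TwoLevelStore`, `build_kdtree`, `kd_range`, and the `segment_size` parameter are hypothetical, the global index is a plain list of bounding boxes standing in for the in-memory R*-tree, the kd-trees are kept in memory rather than serialized, and the scan of not-yet-indexed records is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class KDNode:
    point: Point
    axis: int
    left: Optional["KDNode"] = None
    right: Optional["KDNode"] = None


def build_kdtree(points: List[Point], depth: int = 0) -> Optional[KDNode]:
    """Build a kd-tree by splitting on the median, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return KDNode(points[mid], axis,
                  build_kdtree(points[:mid], depth + 1),
                  build_kdtree(points[mid + 1:], depth + 1))


def in_box(p: Point, lo: Point, hi: Point) -> bool:
    """True if p lies inside the inclusive box [lo, hi] on every dimension."""
    return all(l <= c <= h for l, c, h in zip(lo, p, hi))


def kd_range(node: Optional[KDNode], lo: Point, hi: Point, out: List[Point]) -> None:
    """Collect every point of the subtree that falls inside [lo, hi]."""
    if node is None:
        return
    if in_box(node.point, lo, hi):
        out.append(node.point)
    # Left subtree holds values <= the split value, right subtree values >= it.
    if lo[node.axis] <= node.point[node.axis]:
        kd_range(node.left, lo, hi, out)
    if node.point[node.axis] <= hi[node.axis]:
        kd_range(node.right, lo, hi, out)


class TwoLevelStore:
    """Hypothetical store: a global list of bounding boxes (a stand-in for an
    R*-tree) that points to one local kd-tree per group of ingested records."""

    def __init__(self, segment_size: int = 4) -> None:
        self.segment_size = segment_size
        self.buffer: List[Point] = []   # records not yet grouped
        self.segments = []              # list of ((lo, hi), kd-tree root)

    def ingest(self, record: Point) -> None:
        self.buffer.append(record)
        if len(self.buffer) >= self.segment_size:
            self.flush()

    def flush(self) -> None:
        """Turn the buffered records into one segment with its own kd-tree."""
        if not self.buffer:
            return
        dims = range(len(self.buffer[0]))
        lo = tuple(min(p[d] for p in self.buffer) for d in dims)
        hi = tuple(max(p[d] for p in self.buffer) for d in dims)
        self.segments.append(((lo, hi), build_kdtree(self.buffer)))
        self.buffer = []

    def range_query(self, lo: Point, hi: Point) -> List[Point]:
        out: List[Point] = []
        for (blo, bhi), tree in self.segments:
            # Global level: skip segments whose bounding box misses the query.
            if all(bl <= h and l <= bh for bl, bh, l, h in zip(blo, bhi, lo, hi)):
                kd_range(tree, lo, hi, out)  # local level
        # Assumption: recent, still-buffered records are scanned linearly.
        out.extend(p for p in self.buffer if in_box(p, lo, hi))
        return out
```

In this sketch, ingestion only appends to a buffer and occasionally builds a small kd-tree, which is one plausible reading of why such a design can keep the ingest path cheap; a range query first prunes whole segments by bounding box and then descends only into the kd-trees of the segments that overlap the query.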

research
08/31/2021

Hierarchical Bitmap Indexing for Range and Membership Queries on Multidimensional Arrays

Traditional indexing techniques commonly employed in database systems...
research
01/30/2018

A-Tree: A Bounded Approximate Index Structure

Index structures are one of the most important tools that DBAs leverage ...
research
11/07/2021

Em-K Indexing for Approximate Query Matching in Large-scale ER

Accurate and efficient entity resolution (ER) is a significant challenge...
research
09/16/2021

SEACOW: Synopsis Embedded Array Compression using Wavelet Transform

Recently, multidimensional data is produced in various domains; because ...
research
01/09/2018

Search on Secondary Attributes in Geo-Distributed Systems

In the age of big data, more and more applications need to query and ana...
research
08/28/2019

Efficient Algorithms for Approximate Single-Source Personalized PageRank Queries

Given a graph G, a source node s and a target node t, the personalized P...
research
09/04/2022

Towards Adaptive Storage Views in Virtual Memory

Traditionally, DBMSs separate their storage layer from their indexing la...
