Multidimensional Range Queries on Modern Hardware

01/11/2018
by Stefan Sprenger, et al.

Range queries over multidimensional data are an important part of database workloads in many applications. Their execution may be accelerated by using multidimensional index structures (MDIS), such as kd-trees or R-trees. As for most index structures, the usefulness of this approach depends on the selectivity of the queries, and common wisdom held that a simple scan beats MDIS for queries accessing more than 15% of the data. However, this rule of thumb is largely based on evaluations that are almost two decades old, performed on data held on disks, applying IO-optimized data structures, and using single-core systems. The question is whether this rule of thumb still holds when multidimensional range queries (MDRQ) are performed on modern architectures with large main memories holding all data, multi-core CPUs, and data-parallel instruction sets. In this paper, we study whether and how much modern hardware influences the performance ratio between index structures and scans for MDRQ. To this end, we conservatively adapted three popular MDIS, namely the R*-tree, the kd-tree, and the VA-file, to exploit features of modern servers, and compared their performance to different flavors of parallel scans, using multiple (synthetic and real-world) analytical workloads over multiple (synthetic and real-world) datasets of varying size, dimensionality, and skew. We find that all approaches benefit considerably from using main memory and parallelization, yet to varying degrees. Our evaluation shows that, on current machines, the threshold above which scanning should be favored over parallel versions of classical MDIS should, as a new rule of thumb, rather be set at around 1%.
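To make the two competing strategies concrete, here is a minimal sketch, assuming in-memory points stored as tuples and axis-aligned query boxes given by inclusive `lo`/`hi` corners. It contrasts a full scan, a chunked scan that partitions the data across workers, and a textbook kd-tree range query that prunes subtrees which cannot intersect the box. All names (`scan_range_query`, `parallel_scan`, `build_kdtree`, `kd_range_query`) are hypothetical and this is not the authors' implementation: it does not model the memory layouts, SIMD use, or multi-core tuning the paper evaluates, and under CPython's GIL the thread pool only illustrates the partitioning pattern rather than delivering real speedups.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple
import random

Point = Tuple[float, ...]


def in_box(p: Point, lo: Point, hi: Point) -> bool:
    """True if p lies inside the inclusive axis-aligned box [lo, hi]."""
    return all(l <= x <= h for x, l, h in zip(p, lo, hi))


def scan_range_query(points: Sequence[Point], lo: Point, hi: Point) -> List[int]:
    """Full scan: test every point; returns matching ids in ascending order."""
    return [i for i, p in enumerate(points) if in_box(p, lo, hi)]


def parallel_scan(points: Sequence[Point], lo: Point, hi: Point,
                  workers: int = 4) -> List[int]:
    """Partition the ids into contiguous chunks and scan each chunk in a worker.

    Chunk results are concatenated in chunk order, so ids stay ascending.
    """
    n = len(points)
    if n == 0:
        return []
    step = (n + workers - 1) // workers
    chunks = [(s, min(s + step, n)) for s in range(0, n, step)]

    def scan_chunk(bounds: Tuple[int, int]) -> List[int]:
        s, e = bounds
        return [i for i in range(s, e) if in_box(points[i], lo, hi)]

    with ThreadPoolExecutor(max_workers=workers) as ex:
        # executor.map yields results in the order of `chunks`
        return [i for part in ex.map(scan_chunk, chunks) for i in part]


@dataclass
class KDNode:
    idx: int                  # id of the point stored at this node
    axis: int                 # splitting dimension
    left: Optional["KDNode"]  # points with coordinate <= split value
    right: Optional["KDNode"] # points with coordinate >= split value


def build_kdtree(points: Sequence[Point], ids: List[int],
                 depth: int = 0) -> Optional[KDNode]:
    """Build a kd-tree by median split, cycling through the dimensions."""
    if not ids:
        return None
    axis = depth % len(points[ids[0]])
    ids = sorted(ids, key=lambda i: points[i][axis])
    mid = len(ids) // 2
    return KDNode(ids[mid], axis,
                  build_kdtree(points, ids[:mid], depth + 1),
                  build_kdtree(points, ids[mid + 1:], depth + 1))


def kd_range_query(node: Optional[KDNode], points: Sequence[Point],
                   lo: Point, hi: Point, out: List[int]) -> List[int]:
    """Collect ids inside [lo, hi], skipping subtrees that cannot match."""
    if node is None:
        return out
    p = points[node.idx]
    if in_box(p, lo, hi):
        out.append(node.idx)
    a = node.axis
    if lo[a] <= p[a]:   # left subtree may still hold values >= lo[a]
        kd_range_query(node.left, points, lo, hi, out)
    if p[a] <= hi[a]:   # right subtree may still hold values <= hi[a]
        kd_range_query(node.right, points, lo, hi, out)
    return out


if __name__ == "__main__":
    random.seed(42)
    pts = [tuple(random.random() for _ in range(3)) for _ in range(10_000)]
    tree = build_kdtree(pts, list(range(len(pts))))
    lo, hi = (0.2, 0.2, 0.2), (0.4, 0.4, 0.4)
    scanned = scan_range_query(pts, lo, hi)
    assert sorted(kd_range_query(tree, pts, lo, hi, [])) == scanned
    assert parallel_scan(pts, lo, hi) == scanned
    print(f"selectivity = {len(scanned) / len(pts):.2%}")
```

The selectivity printed at the end (matching points divided by total points) is the quantity the abstract's rule of thumb refers to: the index wins when it lets the query skip most of the data, while the scan's cost stays proportional to the full dataset regardless of how many points match.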

research
09/02/2020

MESSI: In-Memory Data Series Indexing

Data series similarity search is a core operation for several data serie...
research
06/05/2023

Fast Search-By-Classification for Large-Scale Databases Using Index-Aware Decision Trees and Random Forests

The vast amounts of data collected in various domains pose great challen...
research
10/25/2019

Overlay Indexes: Efficiently Supporting Aggregate Range Queries and Authenticated Data Structures in Off-the-Shelf Databases

Commercial off-the-shelf DataBase Management Systems (DBMSes) are highly...
research
03/02/2023

RTIndeX: Exploiting Hardware-Accelerated GPU Raytracing for Database Indexing

Data management on GPUs has become increasingly relevant due to a tremen...
research
12/26/2022

Hercules Against Data Series Similarity Search

We propose Hercules, a parallel tree-based technique for exact similarit...
research
10/17/2019

MV-PBT: Multi-Version Index for Large Datasets and HTAP Workloads

Modern mixed (HTAP) workloads execute fast update-transactions and long-...
research
01/25/2021

Shift-Table: A Low-latency Learned Index for Range Queries using Model Correction

Indexing large-scale databases in main memory is still challenging today...
