Leveraging Soft Functional Dependencies for Indexing Multi-dimensional Data

06/29/2020
by Behzad Ghaffari, et al.

A recent proposal in database indexing is for index structures to automatically learn and exploit the distribution of the underlying data to improve their performance. Initial work on learned indexes has repeatedly shown that, by learning the distribution of the data, index structures such as the B-Tree can improve their performance by an order of magnitude while using a smaller memory footprint. In this work we propose a new class of learned indexes for multidimensional data that, instead of learning only from the distribution of keys, learns from correlations between columns of the dataset. Our approach is motivated by the observation that, in real datasets, correlation between two or more attributes is common. Learning from functional dependencies has previously been explored and implemented in many state-of-the-art query optimisers to predict the selectivity of queries and produce better query plans. In this project we take the use of learned functional dependencies in databases a step further: we use them to reduce the dimensionality of datasets. In doing so we attempt to work around the curse of dimensionality (which, in the context of spatial data, stipulates that the performance of an index deteriorates with every additional dimension) and thereby accelerate query execution. More precisely, we learn how to infer one or more attributes from the remaining attributes, and hence no longer need to index the predicted columns. This reduces the dimensionality of the index and thus makes it more efficient. We show experimentally that by predicting correlated attributes in the data, rather than indexing them, we can improve query execution time and reduce the memory overhead of the index at the same time.
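The idea of predicting a correlated column instead of indexing it can be illustrated with a small sketch. The abstract does not say which model or index structure the authors use, so everything below is an assumption made for illustration: a single linear soft dependency y ≈ a·x + b between two columns, a sorted array standing in for the one-dimensional index, and the hypothetical names `fit_line` and `ReducedIndex`. The maximum residual `eps` bounds how far any stored y can lie from its prediction, which lets a range predicate on y be rewritten as a range on x.

```python
import bisect
import random


def fit_line(xs, ys):
    """Least-squares fit y ~ a*x + b; also returns the largest absolute residual."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx else 0.0  # degenerate case: all x values equal
    b = my - a * mx
    eps = max(abs(y - (a * x + b)) for x, y in zip(xs, ys))
    return a, b, eps


class ReducedIndex:
    """Hypothetical 2-D index that stores an ordering on x only.

    The y column is not indexed; it is covered by the learned model
    y ~ a*x + b with error bound eps (an illustrative assumption).
    """

    def __init__(self, points):
        self.points = sorted(points)            # sorted by x
        self.xs = [p[0] for p in self.points]
        ys = [p[1] for p in self.points]
        self.a, self.b, eps = fit_line(self.xs, ys)
        self.eps = eps + 1e-9                   # small slack for float rounding

    def range_query(self, x_lo, x_hi, y_lo, y_hi):
        # Every stored point satisfies |y - (a*x + b)| <= eps, so
        # y in [y_lo, y_hi] implies a*x + b in [y_lo - eps, y_hi + eps].
        if self.a != 0:
            lo = (y_lo - self.eps - self.b) / self.a
            hi = (y_hi + self.eps - self.b) / self.a
            if lo > hi:                         # negative slope flips the bounds
                lo, hi = hi, lo
            x_lo, x_hi = max(x_lo, lo), min(x_hi, hi)
        start = bisect.bisect_left(self.xs, x_lo)
        end = bisect.bisect_right(self.xs, x_hi)
        # Candidates come from the 1-D index; the exact y check removes false positives.
        return [p for p in self.points[start:end] if y_lo <= p[1] <= y_hi]


# Usage: synthetic data where y depends softly on x (noise within +/- 1).
random.seed(0)
data = []
for _ in range(1000):
    x = random.uniform(0, 100)
    data.append((x, 2 * x + 5 + random.uniform(-1, 1)))
index = ReducedIndex(data)
hits = index.range_query(10, 90, 50, 60)
```

In this sketch the predicate on y narrows the scanned x range before any point is inspected, so a tighter error bound means fewer candidates to filter. Points that deviate strongly from the model would inflate `eps`; handling such outliers separately is one possible design choice, and the abstract does not say how the authors deal with them.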


research
12/12/2020

Cortex: Harnessing Correlations to Boost Query Performance

Databases employ indexes to filter out irrelevant records, which reduces...
research
03/27/2019

Designing Succinct Secondary Indexing Mechanism by Exploiting Column Correlations (Extended Version)

Database administrators construct secondary indexes on data tables to ac...
research
06/29/2020

Hands-off Model Integration in Spatial Index Structures

Spatial indexes are crucial for the analysis of the increasing amounts o...
research
03/27/2019

Designing Succinct Secondary Indexing Mechanism by Exploiting Column Correlations

Database administrators construct secondary indexes on data tables to ac...
research
07/15/2022

GLIN: A Lightweight Learned Indexing Mechanism for Complex Geometries

Although spatial index structures shorten the query response time, they ...
research
02/06/2023

Using Learned Indexes to Improve Time Series Indexing Performance on Embedded Sensor Devices

Efficiently querying data on embedded sensor and IoT devices is challeng...
research
06/23/2020

Tsunami: A Learned Multi-dimensional Index for Correlated Data and Skewed Workloads

Filtering data based on predicates is one of the most fundamental operat...
