Cortex: Harnessing Correlations to Boost Query Performance

12/12/2020
by Vikram Nathan, et al.

Databases employ indexes to filter out irrelevant records, which reduces scan overhead and speeds up query execution. However, this optimization is available only to queries that filter on the indexed attribute. To extend these speedups to queries on other attributes, database systems have turned to secondary and multi-dimensional indexes. Unfortunately, both approaches are restrictive. Secondary indexes have a large memory footprint and speed up only queries that access a small number of records. Multi-dimensional indexes cannot scale beyond a handful of columns. We present Cortex, an approach that takes advantage of correlations to extend the reach of primary indexes to more attributes. Unlike prior work, Cortex can adapt itself to any existing primary index, whether single- or multi-dimensional. This lets it harness a broad variety of correlations, such as those that span more than two attributes or have a large number of outliers. We demonstrate this on real datasets exhibiting these diverse types of correlations. On those datasets, Cortex matches or outperforms traditional secondary indexes while using 5× less space, and it is 2-8× faster than existing approaches to indexing correlations.
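The abstract does not describe Cortex's internal data structures. As a rough illustration of the general idea of using a correlation to answer a query on a non-indexed column through the primary index, here is a minimal Python sketch. It assumes a table of `(a, b)` rows clustered (sorted) on the primary key `a`, with `b` correlated with `a`. It is not Cortex's algorithm. The names `build_correlation_map`, `query_b_range`, `bucket_width`, and `max_spread` are hypothetical. The sketch simplifies in two ways: it buckets `b` into fixed-width ranges, and it treats a row as an outlier when its `a` lies more than `max_spread` from its bucket's median `a`.

```python
import bisect
import math
import random
from statistics import median


def build_correlation_map(rows, bucket_width, max_spread):
    # Hypothetical helper, not Cortex's algorithm: group rows by fixed-width
    # buckets of the secondary column b and record, per bucket, the range of
    # the primary-key column a that the bucket's typical rows fall in.
    # rows must be (a, b) tuples sorted by a (clustered on the primary index).
    buckets = {}
    for pos, (_a, b) in enumerate(rows):
        buckets.setdefault(math.floor(b / bucket_width), []).append(pos)

    ranges, outliers = {}, []
    for bucket, positions in buckets.items():
        center = median(rows[p][0] for p in positions)
        inliers = []
        for p in positions:
            # A row far from the bucket's median a would widen the scan range
            # for every query on this bucket, so it goes to a separate list.
            if abs(rows[p][0] - center) <= max_spread:
                inliers.append(p)
            else:
                outliers.append(p)
        if inliers:
            ranges[bucket] = (min(rows[p][0] for p in inliers),
                              max(rows[p][0] for p in inliers))
    return ranges, outliers


def query_b_range(rows, a_keys, ranges, outliers, lo_b, hi_b, bucket_width):
    # Return sorted positions of rows with lo_b <= b <= hi_b by scanning only
    # the a-ranges recorded for overlapping b-buckets, plus the outlier list.
    first = math.floor(lo_b / bucket_width)
    last = math.floor(hi_b / bucket_width)
    hits = set()  # a set, because a-ranges of different buckets may overlap
    for bucket, (min_a, max_a) in ranges.items():
        if not first <= bucket <= last:
            continue
        # The primary index (binary search over sorted a) bounds the scan.
        start = bisect.bisect_left(a_keys, min_a)
        stop = bisect.bisect_right(a_keys, max_a)
        for pos in range(start, stop):
            if lo_b <= rows[pos][1] <= hi_b:
                hits.add(pos)
    for pos in outliers:
        if lo_b <= rows[pos][1] <= hi_b:
            hits.add(pos)
    return sorted(hits)


if __name__ == "__main__":
    random.seed(0)
    # Synthetic data: b is roughly 2 * a, plus one row that breaks the pattern.
    rows = [(a, 2 * a + random.uniform(-1, 1)) for a in range(1000)]
    rows[10] = (10, 1500.0)
    a_keys = [a for a, _ in rows]
    ranges, outliers = build_correlation_map(rows, bucket_width=50, max_spread=20)
    hits = query_b_range(rows, a_keys, ranges, outliers, 1480, 1520, bucket_width=50)
    brute = [p for p, (_a, b) in enumerate(rows) if 1480 <= b <= 1520]
    print(hits == brute, len(hits), "matches;", len(outliers), "outlier(s)")
```

Keeping outliers out of the per-bucket ranges is what keeps each range narrow. A single stray row would otherwise force every query on that bucket to scan a large slice of the table. This loosely echoes the abstract's point that useful correlations may have many outliers, though how Cortex actually handles them is not shown here.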
