Shift-Table: A Low-latency Learned Index for Range Queries using Model Correction

01/25/2021
by Ali Hadian, et al.

Indexing large-scale databases in main memory is still challenging today. Learned index structures, in which the core components of classical indexes are replaced with machine learning models, have recently been suggested to significantly improve performance for read-only range queries. However, a recent benchmark study shows that learned indexes achieve only limited performance improvements for real-world data on modern hardware. More specifically, a learned model either cannot learn the micro-level details and fluctuations of a data distribution, resulting in poor accuracy, or it fits the data distribution at the cost of training a large model whose parameters do not fit in cache. As a consequence, querying a learned index on real-world data takes a substantial number of memory lookups, thereby degrading performance. In this paper, we adopt a different approach to modeling a data distribution that complements the model-fitting approach of learned indexes. We propose Shift-Table, an algorithmic layer that captures the micro-level data distribution and resolves the local biases of a learned model at the cost of at most one memory lookup. Our model combines the low latency of lookup tables with learned indexes and enables low-latency processing of range queries. Using Shift-Table, we achieve a speedup of 1.5X to 2X on real-world datasets compared to trained and tuned learned indexes.
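To make the idea of a correction layer on top of a learned model concrete, here is a minimal sketch under stated assumptions; it is not the authors' implementation. It assumes a single least-squares linear model that maps a key to a predicted position (a "cell"), and one table entry per cell storing the shift from that cell to the first key the model maps there. The class name `ShiftTableSketch` and the method `lower_bound` are hypothetical. Because the model is kept monotone, the true lower bound for a query always lies between the corrected positions read from two adjacent table entries, so only that window is searched.

```python
import bisect
import math
from typing import List


class ShiftTableSketch:
    """Hypothetical simplification: linear model + per-cell shift (correction) table."""

    def __init__(self, keys: List[float]) -> None:
        if not keys:
            raise ValueError("keys must be non-empty")
        self.keys = sorted(keys)
        n = len(self.keys)
        self.n = n
        # "Learned model": least-squares fit of rank ~ slope * key + intercept.
        mean_k = sum(self.keys) / n
        mean_r = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (r - mean_r) for r, k in enumerate(self.keys))
        # Clamp the slope to >= 0 so predictions never decrease as the key grows.
        self.slope = max(cov / var, 0.0) if var > 0 else 0.0
        self.intercept = mean_r - self.slope * mean_k
        # start[c] = first rank whose key the model maps to a cell >= c.
        preds = [self._predict(k) for k in self.keys]  # nondecreasing
        start, i = [], 0
        for c in range(n + 1):  # cell n is a sentinel
            while i < n and preds[i] < c:
                i += 1
            start.append(i)
        # Store each correction as a shift relative to its cell index.
        self.shift = [s - c for c, s in enumerate(start)]

    def _predict(self, key: float) -> int:
        """Model output: a cell index clamped to [0, n-1]."""
        p = math.floor(self.slope * key + self.intercept)
        return min(max(p, 0), self.n - 1)

    def lower_bound(self, q: float) -> int:
        """Index of the first key >= q (len(keys) if every key is smaller)."""
        c = self._predict(q)
        lo = c + self.shift[c]          # first rank mapped to cell c
        hi = c + 1 + self.shift[c + 1]  # first rank mapped past cell c
        # Keys before lo are < q and the key at hi (if any) is > q,
        # so the answer lies in [lo, hi].
        return bisect.bisect_left(self.keys, q, lo, hi)


idx = ShiftTableSketch([1, 2, 3, 100, 101, 102, 1000])
print(idx.lower_bound(50))  # 3
```

In this sketch the corrected range shrinks to the keys that share the query's predicted cell, so the residual search cost depends on how evenly the model spreads keys across cells; the paper's actual table layout and how it bounds this cost are not reproduced here.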
