
Shift-Table: A Low-latency Learned Index for Range Queries using Model Correction

by Ali Hadian, et al.

Indexing large-scale databases in main memory is still challenging today. Learned index structures, in which the core components of classical indexes are replaced with machine learning models, have recently been suggested to significantly improve performance for read-only range queries. However, a recent benchmark study shows that learned indexes achieve only limited performance improvements for real-world data on modern hardware. More specifically, a learned model either cannot learn the micro-level details and fluctuations of the data distribution, which results in poor accuracy, or it can fit the data distribution only at the cost of training a large model whose parameters do not fit into cache. As a consequence, querying a learned index on real-world data takes a substantial number of memory lookups, which degrades performance. In this paper, we adopt a different approach to modeling a data distribution, one that complements the model-fitting approach of learned indexes. We propose Shift-Table, an algorithmic layer that captures the micro-level data distribution and resolves the local biases of a learned model at the cost of at most one memory lookup. Our suggested model combines the low latency of lookup tables with learned indexes and enables low-latency processing of range queries. Using Shift-Table, we achieve a speedup of 1.5X to 2X on real-world datasets compared to trained and tuned learned indexes.
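To make the idea of a correction layer concrete, the following is a minimal sketch, not the authors' implementation. It assumes a deliberately crude model (a single line through the smallest and largest key) and a hypothetical class name, `ShiftTableSketch`. For each predicted position it stores a shift that moves the prediction to the first key that could match, so a query reads the table once and then searches only a bounded window.

```python
from bisect import bisect_left


class ShiftTableSketch:
    """Toy corrected index over a non-empty sorted list of numeric keys.

    Illustrative only: the model and table layout are assumptions,
    not the design described in the paper.
    """

    def __init__(self, keys):
        self.keys = sorted(keys)
        self.n = len(self.keys)
        self.kmin = self.keys[0]
        span = self.keys[-1] - self.keys[0]
        # Assumed model: one line mapping [kmin, kmax] onto [0, n - 1].
        self.slope = (self.n - 1) / span if span else 0.0

        # counts[p] = number of keys whose predicted position is p.
        counts = [0] * self.n
        for k in self.keys:
            counts[self.predict(k)] += 1

        # shift[p] = (index of the first key predicted at >= p) - p.
        self.shift = [0] * (self.n + 1)
        first = 0
        for p in range(self.n):
            self.shift[p] = first - p
            first += counts[p]
        self.shift[self.n] = first - self.n  # sentinel entry

    def predict(self, key):
        # Monotone in key, clamped to a valid position.
        p = int(self.slope * (key - self.kmin))
        return min(max(p, 0), self.n - 1)

    def lower_bound(self, key):
        # Index of the first stored key >= key.
        p = self.predict(key)
        lo = p + self.shift[p]          # corrected window start
        hi = p + 1 + self.shift[p + 1]  # corrected window end
        return bisect_left(self.keys, key, lo, hi)

    def range_query(self, low, high):
        # Keys in the half-open interval [low, high).
        return self.keys[self.lower_bound(low):self.lower_bound(high)]
```

Because the assumed model is monotone, every key predicted below position `p` is smaller than the query and every key predicted above `p` is larger, which is why the window between the two adjacent shift entries always contains the answer. With a poor model many keys share one predicted position and the window grows, so in this sketch the quality of the model still determines how much local search remains.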


Learned Indexes for Dynamic Workloads

The recent proposal of learned index structures opens up a new perspecti...

One stone, two birds: A lightweight multidimensional learned index with cardinality support

Innovative learning based structures have recently been proposed to tack...

A Scalable Learned Index Scheme in Storage Systems

Index structures are important for efficient data access, which have bee...

A Pluggable Learned Index Method via Sampling and Gap Insertion

Database indexes facilitate data retrieval and benefit broad application...

A Computational Approach to Packet Classification

Multi-field packet classification is a crucial component in modern softw...

Multidimensional Range Queries on Modern Hardware

Range queries over multidimensional data are an important part of databa...

Cloud based Real-Time and Low Latency Scientific Event Analysis

Astronomy is well recognized as big data driven science. As the novel ob...