AirIndex: Versatile Index Tuning Through Data and Storage

06/26/2023
by Supawit Chockchowwat, et al.

The end-to-end lookup latency of a hierarchical index, such as a B-tree or a learned index, is determined by its structure: the number of layers, the kinds of branching functions used in each layer, the amount of data that must be fetched from each layer, and so on. Our primary observation is that by optimizing those structural parameters (or designs) specifically for a target system's I/O characteristics (e.g., latency, bandwidth), we can offer faster lookups than indexes that are not optimized in this way. Can we develop a systematic method for finding those optimal design parameters? Ideally, the method should be able to generate almost any existing index, or a novel combination of them, for the fastest possible lookup. In this work, we present a new data- and I/O-aware index builder (called AirIndex) that can find high-speed hierarchical index designs in a principled way. Specifically, AirIndex minimizes an objective function that expresses the end-to-end latency in terms of various design choices (the number of layers, the types of layers, and more) for given data and a given storage profile. It does so using a graph-based optimization method purpose-built to address the computational challenges arising from the inter-dependencies among index layers and the exponentially many candidate parameters in a large search space. Our empirical studies confirm that AirIndex can find optimal index designs, build optimal indexes in times comparable to existing methods, and deliver up to 4.1x faster lookup than a lightweight B-tree library (LMDB), 3.3x–46.3x faster lookup than state-of-the-art learned indexes (RMI/CDFShop, PGM-Index, ALEX/APEX, PLEX), and 2.0x faster lookup than Data Calculator's suggestion across various dataset and storage settings.
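
To make the idea of a storage-dependent latency objective concrete, here is a minimal sketch in Python. It is not AirIndex's actual objective or its graph-based optimizer. It assumes a simple affine storage model (a fixed per-read latency plus bytes divided by bandwidth) and sums that cost over the layers a lookup touches. The names `StorageProfile`, `lookup_latency`, and `best_design`, the two candidate designs, and the two storage profiles are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Mapping, Sequence


@dataclass(frozen=True)
class StorageProfile:
    """Assumed affine I/O model: time = latency + bytes / bandwidth."""
    latency_s: float       # fixed cost per read request, in seconds
    bandwidth_bps: float   # sustained throughput, in bytes per second

    def read_time(self, nbytes: float) -> float:
        return self.latency_s + nbytes / self.bandwidth_bps


def lookup_latency(bytes_per_layer: Sequence[float], storage: StorageProfile) -> float:
    """Sum the read cost of every layer a single lookup has to fetch from."""
    return sum(storage.read_time(b) for b in bytes_per_layer)


def best_design(candidates: Mapping[str, Sequence[float]], storage: StorageProfile) -> str:
    """Brute-force choice among a handful of candidate designs (illustration only)."""
    return min(candidates, key=lambda name: lookup_latency(candidates[name], storage))


# Hypothetical designs: bytes fetched per layer during one lookup.
candidates = {
    "deep": [4096, 4096, 4096],   # three layers, one small read each
    "shallow": [262144],          # one layer, one large read
}

# Hypothetical storage profiles, not measurements of any real device.
high_latency = StorageProfile(latency_s=1e-2, bandwidth_bps=1e8)
low_latency = StorageProfile(latency_s=1e-5, bandwidth_bps=1e8)

if __name__ == "__main__":
    for name, profile in [("high_latency", high_latency), ("low_latency", low_latency)]:
        print(name, "->", best_design(candidates, profile))
```

Under these assumed numbers, the shallow design wins on the high-latency profile because it pays the fixed per-read cost only once, while the deep design wins on the low-latency profile because it transfers far fewer bytes. This is the kind of storage-dependent trade-off the abstract describes; enumerating candidates by brute force, as done here, would not scale to the exponentially large search space the abstract mentions.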
