OSM-tree: A Sortedness-Aware Index

02/08/2022
by Aneesh Raman, et al.

Indexes facilitate efficient querying when the selection predicate is on an indexed key. As a result, when loading data, if we anticipate future selective (point or range) queries, we typically maintain an index that is gradually populated as new data is ingested. In that respect, indexing can be perceived as the process of adding structure to an incoming, otherwise unsorted, data collection. Adding structure comes at a cost: instead of simply appending incoming data, every new entry is inserted into the index. If the data ingestion order matches the indexed attribute order, the ingestion cost is entirely redundant and can be avoided (e.g., via bulk loading in a B+-tree). However, state-of-the-art index designs do not benefit when data is ingested in an order that is close to being sorted but not fully sorted. In this paper, we study how indexes can benefit from partial data sortedness, or near-sortedness, and we propose an ensemble of techniques that combine bulk loading, index appends, variable node fill/split factor, and buffering to optimize the ingestion cost of a tree index in the presence of partial data sortedness. We further augment the proposed design with the necessary metadata structures to ensure competitive read performance. We apply the proposed design paradigm to a state-of-the-art B+-tree, and we propose the Ordered Sort-Merge tree (OSM-tree). OSM-tree outperforms the state of the art by up to 8.8x in ingestion performance in the presence of sortedness, while falling back to a B+-tree's ingestion performance when data is scrambled. OSM-tree offers competitive query performance, leading to performance benefits between 28% and 5x for mixed read/write workloads.
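
To make the intuition behind buffering and index appends concrete, the following is a minimal Python sketch, not the OSM-tree algorithm itself. It assumes a sorted list as a stand-in for the leaf level of a tree index, and the class name `BufferedSortedIndex`, its `buffer_capacity` parameter, and the `appends`/`inserts` counters are all hypothetical names introduced for illustration. Incoming keys are collected in a small buffer; on flush the buffer is sorted, and each key that is not smaller than the current maximum is appended at the tail (the cheap path), while any other key falls back to a regular in-place insert.

```python
import bisect


class BufferedSortedIndex:
    """Toy stand-in for a tree index: a sorted list plus an ingestion buffer.

    Illustrative only; names and structure are assumptions, not the paper's design.
    """

    def __init__(self, buffer_capacity=4):
        self.keys = []                      # stand-in for the sorted leaf level
        self.buffer = []                    # unsorted in-memory ingestion buffer
        self.buffer_capacity = buffer_capacity
        self.appends = 0                    # keys added via the cheap tail append
        self.inserts = 0                    # keys added via a regular in-place insert

    def insert(self, key):
        """Stage a key in the buffer and flush once the buffer is full."""
        self.buffer.append(key)
        if len(self.buffer) >= self.buffer_capacity:
            self.flush()

    def flush(self):
        """Sort the buffer, then append in-order keys and insert the rest."""
        for key in sorted(self.buffer):
            if not self.keys or key >= self.keys[-1]:
                self.keys.append(key)       # in order: append at the tail
                self.appends += 1
            else:
                bisect.insort(self.keys, key)  # out of order: regular insert
                self.inserts += 1
        self.buffer.clear()

    def contains(self, key):
        """Point lookup that checks the buffer first, then the sorted keys."""
        if key in self.buffer:
            return True
        i = bisect.bisect_left(self.keys, key)
        return i < len(self.keys) and self.keys[i] == key


def ingest(stream, buffer_capacity):
    """Load a stream of keys and return the resulting index."""
    index = BufferedSortedIndex(buffer_capacity)
    for key in stream:
        index.insert(key)
    index.flush()                           # drain any keys left in the buffer
    return index


near_sorted = [1, 3, 2, 4, 6, 5, 7, 8]
buffered = ingest(near_sorted, buffer_capacity=4)
unbuffered = ingest(near_sorted, buffer_capacity=1)
print(buffered.appends, buffered.inserts)      # 8 0
print(unbuffered.appends, unbuffered.inserts)  # 6 2
```

On this near-sorted stream, a buffer of four keys absorbs the local disorder, so every key takes the append path; with a buffer of one, the two out-of-order keys each need a regular insert. On a scrambled stream the buffer helps far less and most keys fall back to regular inserts, which mirrors the fallback behaviour the abstract describes.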
