Tutorial: The Ubiquitous Skiplist, its Variants, and Applications in Modern Big Data Systems

The skiplist, or skip list, originally designed as an in-memory data structure, has attracted considerable attention in recent years as a main-memory component in many NoSQL, cloud-based, and big data systems. Unlike the B-tree, the skiplist does not need complex rebalancing mechanisms, yet it still offers expected logarithmic performance. It supports a variety of operations, including inserts, point reads, and range queries. To make the skiplist more versatile, many optimizations have been applied to, among other things, its node structure, construction algorithm, list structure, and concurrent access. Many variants of the skiplist have been proposed and evaluated in a range of big-data system scenarios. In addition to being a main-memory component, the skiplist also serves as a core index in systems that address problems such as write amplification, write stalls, sorting, and range query processing. In this tutorial, we present a comprehensive overview of the skiplist, its variants and optimizations, and the ways in which big data and NoSQL systems use skiplists. Throughout the tutorial, we demonstrate the advantages of using a skiplist or skiplist-like structure in modern data systems.
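
To make the three operations named in the abstract concrete, the following is a minimal single-threaded sketch of a textbook skiplist in Python. It is an illustration only, not the tutorial's code or any particular system's implementation. The names `SkipList`, `Node`, `max_height`, `p`, and `seed` are assumptions chosen for this example, as is the promotion probability of 0.5.

```python
import random


class Node:
    def __init__(self, key, value, height):
        self.key = key
        self.value = value
        # forward[i] is the next node on level i (level 0 holds every node).
        self.forward = [None] * height


class SkipList:
    """Hypothetical minimal skiplist: insert, point read, range query."""

    def __init__(self, max_height=16, p=0.5, seed=None):
        self.max_height = max_height
        self.p = p                      # assumed promotion probability
        self.rng = random.Random(seed)
        self.head = Node(None, None, max_height)  # sentinel, holds no data
        self.height = 1                 # number of levels currently in use

    def _random_height(self):
        # Coin flips decide the node height; no rebalancing is ever needed.
        h = 1
        while h < self.max_height and self.rng.random() < self.p:
            h += 1
        return h

    def _find_predecessors(self, key):
        # For each level, find the last node whose key is smaller than `key`.
        update = [self.head] * self.max_height
        node = self.head
        for level in range(self.height - 1, -1, -1):
            while (node.forward[level] is not None
                   and node.forward[level].key < key):
                node = node.forward[level]
            update[level] = node
        return update

    def insert(self, key, value):
        update = self._find_predecessors(key)
        candidate = update[0].forward[0]
        if candidate is not None and candidate.key == key:
            candidate.value = value     # overwrite an existing key
            return
        h = self._random_height()
        self.height = max(self.height, h)
        node = Node(key, value, h)
        for level in range(h):          # splice the node into each level
            node.forward[level] = update[level].forward[level]
            update[level].forward[level] = node

    def get(self, key, default=None):
        candidate = self._find_predecessors(key)[0].forward[0]
        if candidate is not None and candidate.key == key:
            return candidate.value
        return default

    def range(self, lo, hi):
        # Seek to the first key >= lo, then walk level 0 while key <= hi.
        node = self._find_predecessors(lo)[0].forward[0]
        while node is not None and node.key <= hi:
            yield node.key, node.value
            node = node.forward[0]


sl = SkipList(seed=42)
for k in [5, 1, 9, 3, 7]:
    sl.insert(k, f"v{k}")

print(sl.get(3))             # v3
print(sl.get(4))             # None
print(list(sl.range(3, 7)))  # [(3, 'v3'), (5, 'v5'), (7, 'v7')]
```

A range query here is a single search followed by a linear walk along the bottom level, which is one reason the structure suits sorted in-memory components. Concurrent access, one of the optimization areas the abstract lists, is deliberately left out of this sketch.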


