Real-Time LSM-Trees for HTAP Workloads

01/17/2021
by   Hemant Saxena, et al.

Real-time data analytics systems such as SAP HANA, MemSQL, and IBM Wildfire employ hybrid data layouts, in which data are stored in different formats throughout their lifecycle. Recent data are stored in a row-oriented format to serve OLTP workloads and sustain high ingest rates, while older data are transformed to a column-oriented format for OLAP access patterns. We observe that a Log-Structured Merge (LSM) Tree is a natural fit for a lifecycle-aware storage engine due to its high write throughput and level-oriented structure, in which records propagate from one level to the next over time. To build a lifecycle-aware storage engine using an LSM-Tree, we make a crucial modification: different levels may use different data layouts, ranging from purely row-oriented to purely column-oriented. We call the result a Real-Time LSM-Tree. We give a cost model and an algorithm to design a Real-Time LSM-Tree that is suitable for a given workload, followed by an experimental evaluation of LASER, a prototype implementation of our idea built on top of the RocksDB key-value store. In our evaluation, LASER is almost 5x faster than Postgres (a pure row-store) and two orders of magnitude faster than MonetDB (a pure column-store) for real-time data analytics workloads.
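The idea of per-level layouts can be illustrated with a minimal in-memory sketch. This is not LASER's implementation and does not use RocksDB; the class names (`Level`, `ToyLayeredLSM`), the fixed per-level capacity, and the whole-level compaction policy are assumptions made only for illustration. Each level is configured with a list of column groups: one group holding every column behaves like a row store, one group per column behaves like a column store, and compaction re-lays records out as they move down.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]


class Level:
    """One level whose records are split across column groups (hypothetical)."""

    def __init__(self, column_groups: List[Tuple[str, ...]]):
        self.column_groups = column_groups
        # One key -> partial-row map per column group.
        self.groups: List[Dict[int, Row]] = [{} for _ in column_groups]

    def put(self, key: int, row: Row) -> None:
        # Assumes every write carries a full row (no partial updates).
        for cols, store in zip(self.column_groups, self.groups):
            store[key] = {c: row[c] for c in cols}

    def get(self, key: int) -> Optional[Row]:
        if key not in self.groups[0]:
            return None
        row: Row = {}
        for store in self.groups:
            row.update(store[key])  # stitch the row back together
        return row

    def size(self) -> int:
        return len(self.groups[0])

    def drain(self) -> List[Tuple[int, Row]]:
        rows = [(k, self.get(k)) for k in sorted(self.groups[0])]
        for store in self.groups:
            store.clear()
        return rows

    def scan_column(self, col: str) -> List[object]:
        # Only the group that contains `col` is touched.
        for cols, store in zip(self.column_groups, self.groups):
            if col in cols:
                return [store[k][col] for k in sorted(store)]
        raise KeyError(col)


class ToyLayeredLSM:
    """Toy LSM-like structure with a different layout per level."""

    def __init__(self, layouts: List[List[Tuple[str, ...]]], capacity: int = 2):
        self.levels = [Level(groups) for groups in layouts]
        self.capacity = capacity  # assumed: same capacity for every level

    def put(self, key: int, row: Row) -> None:
        self.levels[0].put(key, row)
        # Push overflowing levels down; the last level is unbounded.
        for i in range(len(self.levels) - 1):
            if self.levels[i].size() > self.capacity:
                for k, r in self.levels[i].drain():
                    self.levels[i + 1].put(k, r)  # re-layout on the way down

    def get(self, key: int) -> Optional[Row]:
        # Search newest level first so the latest version wins.
        for level in self.levels:
            row = level.get(key)
            if row is not None:
                return row
        return None


# Row-oriented at the top, hybrid in the middle, columnar at the bottom.
layouts = [
    [("id", "name", "price")],
    [("id",), ("name", "price")],
    [("id",), ("name",), ("price",)],
]
tree = ToyLayeredLSM(layouts, capacity=2)
for i in range(1, 8):
    tree.put(i, {"id": i, "name": f"item{i}", "price": 10 * i})
```

In this sketch a point lookup on fresh data reads a single group in the top level, while a scan over one column of older data touches only that column's group in the bottom level, which is the trade-off the abstract describes.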
