MV-PBT: Multi-Version Index for Large Datasets and HTAP Workloads

10/17/2019
by Christian Riegger, et al.

Modern mixed (HTAP) workloads execute fast update transactions and long-running analytical queries on the same dataset and system. In multi-version (MVCC) systems, such workloads result in many short-lived versions and long version chains, as well as in increased and frequent maintenance overhead. Consequently, the index pressure increases significantly. Firstly, the frequent modifications cause frequent creation of new versions, yielding a surge in index maintenance overhead. Secondly and more importantly, index scans incur extra I/O overhead to determine which of the resulting tuple versions are visible to the executing transaction (visibility check), because current designs only store version/timestamp information in the base table, not in the index. Index-only visibility checks are therefore critical for HTAP workloads on large datasets. In this paper we propose the Multi-Version Partitioned B-Tree (MV-PBT) as a version-aware index structure, supporting index-only visibility checks and flash-friendly I/O patterns. The experimental evaluation indicates a 2x improvement for analytical queries and 15% higher transactional throughput under HTAP workloads (CH-Benchmark). MV-PBT offers 40% higher transactional throughput compared to WiredTiger's LSM-Tree implementation under YCSB.
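The difference between a version-oblivious index and a version-aware one can be illustrated with a minimal Python sketch. It assumes a simple snapshot-isolation rule in which every tuple version carries a creation and an invalidation commit timestamp. The names `IndexEntry`, `created_ts`, `invalidated_ts`, `index_only_scan`, and `scan_with_table_lookup` are hypothetical and do not describe MV-PBT's actual record layout or algorithm; the sketch only shows why storing timestamps in index entries lets a scan decide visibility without fetching each candidate from the base table.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class IndexEntry:
    key: int
    rid: int                        # record id of the tuple version in the base table
    created_ts: int                 # hypothetical: commit ts of the creating transaction
    invalidated_ts: Optional[int]   # hypothetical: commit ts of the successor, None if current


def is_visible(created_ts: int, invalidated_ts: Optional[int], snapshot_ts: int) -> bool:
    """Assumed snapshot rule: created at or before the snapshot, not yet invalidated at it."""
    return created_ts <= snapshot_ts and (invalidated_ts is None or invalidated_ts > snapshot_ts)


def index_only_scan(entries: List[IndexEntry], lo: int, hi: int, snapshot_ts: int) -> List[int]:
    """Version-aware index: visibility is decided from the index entries alone."""
    return [
        e.rid
        for e in entries
        if lo <= e.key <= hi and is_visible(e.created_ts, e.invalidated_ts, snapshot_ts)
    ]


def scan_with_table_lookup(
    entries: List[IndexEntry],
    table: Dict[int, Tuple[int, Optional[int]]],
    lo: int,
    hi: int,
    snapshot_ts: int,
) -> Tuple[List[int], int]:
    """Version-oblivious index: every candidate is fetched from the base table to check it."""
    visible: List[int] = []
    fetches = 0
    for e in entries:
        if lo <= e.key <= hi:
            fetches += 1  # stands in for one (possibly random) base-table I/O
            created, invalidated = table[e.rid]
            if is_visible(created, invalidated, snapshot_ts):
                visible.append(e.rid)
    return visible, fetches


if __name__ == "__main__":
    # Key 7 has three versions (a short version chain); key 8 has one.
    entries = [
        IndexEntry(7, 1, 10, 20),
        IndexEntry(7, 2, 20, 30),
        IndexEntry(7, 3, 30, None),
        IndexEntry(8, 4, 5, None),
    ]
    table = {e.rid: (e.created_ts, e.invalidated_ts) for e in entries}
    print(index_only_scan(entries, 7, 8, snapshot_ts=25))
    print(scan_with_table_lookup(entries, table, 7, 8, snapshot_ts=25))
```

With snapshot timestamp 25, only the second version of key 7 and the single version of key 8 are visible, so both scans return rids 2 and 4. The version-oblivious scan, however, needs four base-table fetches to reach that answer, and that cost grows with the length of each version chain.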

research
03/20/2021

Greenplum: A Hybrid Database for Transactional and Analytical Workloads

Demand for enterprise data warehouse solutions to support real-time Onli...
research
09/13/2017

Accelerating Analytical Processing in MVCC using Fine-Granular High-Frequency Virtual Snapshotting

Efficient transactional management is a delicate task. As systems face t...
research
10/09/2022

Oze: Decentralized Graph-based Concurrency Control for Real-world Long Transactions on BoM Benchmark

In this paper, we propose Oze, a new concurrency control protocol that h...
research
03/23/2018

Efficient Single Writer Concurrency

In this paper we consider single writer multiple reader concurrency - an...
research
01/11/2018

Multidimensional Range Queries on Modern Hardware

Range queries over multidimensional data are an important part of databa...
research
09/27/2018

Obladi: Oblivious Serializable Transactions in the Cloud

This paper presents the design and implementation of Obladi, the first s...
research
06/07/2021

Balancing Garbage Collection vs I/O Amplification using hybrid Key-Value Placement in LSM-based Key-Value Stores

Key-value (KV) separation is a technique that introduces randomness in t...