DynaHash: Efficient Data Rebalancing in Apache AsterixDB (Extended Version)

05/24/2021
by Chen Luo, et al.

Parallel shared-nothing data management systems are widely used to exploit a cluster of machines for efficient and scalable data processing. When a cluster needs to be dynamically scaled in or out, data must be efficiently rebalanced. Ideally, data rebalancing should have a low data movement cost, incur a small overhead on data ingestion and query processing, and be performed online without blocking reads or writes. However, existing parallel data management systems often fall short of these goals. In this paper, we introduce DynaHash, an efficient data rebalancing approach that combines dynamic bucketing with extendible hashing for shared-nothing OLAP-style parallel data management systems. DynaHash dynamically partitions the records into a number of buckets using extendible hashing to achieve a good load balance with small rebalancing costs. We further describe an end-to-end implementation of the proposed approach inside an open-source Big Data Management System (BDMS), Apache AsterixDB. Our implementation exploits the out-of-place update design of LSM-trees to efficiently rebalance data without blocking concurrent reads and writes. Finally, we present the results of performance experiments conducted using the TPC-H benchmark.
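
To make the bucketing idea concrete, the following is a minimal, single-process sketch of textbook extendible hashing in Python. It is not DynaHash's implementation and says nothing about how AsterixDB stores buckets or assigns them to nodes; the class names (`Bucket`, `ExtendibleHash`), the `capacity` parameter, and the use of SHA-256 as the hash function are all assumptions made for illustration. The point it shows is that a split touches only the records of the one overflowing bucket, which is the property that keeps data movement small.

```python
import hashlib


class Bucket:
    """A hypothetical bucket: a bag of records plus its local depth."""

    def __init__(self, depth):
        self.depth = depth   # number of low-order hash bits this bucket owns
        self.records = {}


class ExtendibleHash:
    """Illustrative extendible hash table (names and layout are assumptions)."""

    def __init__(self, capacity=4):
        self.capacity = capacity          # max records per bucket before a split
        self.global_depth = 0             # directory has 2**global_depth slots
        self.directory = [Bucket(0)]

    @staticmethod
    def _hash(key):
        # Stable hash; SHA-256 is an arbitrary choice for this sketch.
        digest = hashlib.sha256(repr(key).encode()).digest()
        return int.from_bytes(digest[:8], "big")

    def _slot(self, key):
        # Index the directory by the low-order global_depth bits of the hash.
        return self._hash(key) & ((1 << self.global_depth) - 1)

    def get(self, key):
        return self.directory[self._slot(key)].records.get(key)

    def put(self, key, value):
        while True:
            bucket = self.directory[self._slot(key)]
            if key in bucket.records or len(bucket.records) < self.capacity:
                bucket.records[key] = value
                return
            self._split(bucket)           # make room, then retry

    def _split(self, bucket):
        # Double the directory only if the bucket already uses every global bit.
        if bucket.depth == self.global_depth:
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bit = 1 << bucket.depth           # the next hash bit decides the side
        bucket.depth += 1
        sibling = Bucket(bucket.depth)
        # Only records of the overflowing bucket are examined or moved.
        for k in list(bucket.records):
            if self._hash(k) & bit:
                sibling.records[k] = bucket.records.pop(k)
        # Repoint the directory slots whose new bit is set.
        for i, b in enumerate(self.directory):
            if b is bucket and i & bit:
                self.directory[i] = sibling
```

A short usage example: after `h = ExtendibleHash(capacity=4)` and a loop of `h.put(i, i * i)` calls, `h.get(i)` returns the stored value, and every other bucket is left untouched whenever one bucket splits. In a shared-nothing setting, the analogous step would be moving whole buckets between nodes rather than rehashing all records, though how DynaHash does this is described in the full paper, not here.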


