Benchmarking Hashing Algorithms for Load Balancing in a Distributed Database Environment

11/01/2022
by Alexander Slesarev, et al.

Modern high-load applications store data across multiple database instances. Such an architecture requires data consistency, and it is important to distribute data evenly among nodes. Load balancing is used to achieve these goals, and hashing is the backbone of virtually all load balancing systems. Since the introduction of classic Consistent Hashing, many algorithms have been devised for this purpose. One purpose of the load balancer is to ensure that the storage cluster can scale: for the performance of the whole system, it is crucial to transfer as few data records as possible when a node is added or removed. The hashing algorithm used by the load balancer has the greatest impact on this process. In this paper we experimentally evaluate several hashing algorithms used for load balancing, conducting experiments both in simulation and on a real system. To evaluate algorithm performance, we have developed a benchmark suite based on Unidata MDM, a scalable toolkit for various Master Data Management (MDM) applications. For assessment, we employ three criteria: uniformity of the produced distribution, the number of moved records, and computation speed. Based on the results of our experiments, we present a table in which each algorithm is assessed according to these criteria.
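
To make the first two criteria concrete, the following is a minimal Python sketch, not the paper's benchmark suite. It compares naive modulo placement with a basic consistent hashing ring when one node is added. All names (`h64`, `Ring`, `compare`), the MD5-based hash, the 100 virtual nodes per node, and the key and node counts are assumptions chosen for illustration only; computation speed is not measured here.

```python
import hashlib
from bisect import bisect_right
from collections import Counter


def h64(value: str) -> int:
    """Stable 64-bit hash taken from an MD5 digest (illustrative choice)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


def modulo_assign(key: str, nodes: list) -> str:
    """Naive placement: hash the key and take it modulo the node count."""
    return nodes[h64(key) % len(nodes)]


class Ring:
    """Basic consistent hashing ring with virtual nodes (hypothetical helper)."""

    def __init__(self, nodes: list, vnodes: int = 100):
        # Each node is placed on the ring at `vnodes` pseudo-random points.
        self._points = sorted(
            (h64(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._hashes = [point for point, _ in self._points]

    def assign(self, key: str) -> str:
        # A key belongs to the first ring point clockwise from its hash.
        idx = bisect_right(self._hashes, h64(key)) % len(self._points)
        return self._points[idx][1]


def moved_fraction(before: dict, after: dict) -> float:
    """Share of keys whose node changed between two placements."""
    return sum(1 for k in before if before[k] != after[k]) / len(before)


def max_over_mean(assignment: dict, node_count: int) -> float:
    """Uniformity indicator: largest node load divided by the ideal load."""
    counts = Counter(assignment.values())
    return max(counts.values()) / (len(assignment) / node_count)


def compare(num_keys: int = 20000, num_nodes: int = 10) -> dict:
    """Add one node and report moved records and uniformity for both schemes."""
    keys = [f"record-{i}" for i in range(num_keys)]
    old_nodes = [f"node-{i}" for i in range(num_nodes)]
    new_nodes = old_nodes + [f"node-{num_nodes}"]

    mod_before = {k: modulo_assign(k, old_nodes) for k in keys}
    mod_after = {k: modulo_assign(k, new_nodes) for k in keys}

    old_ring, new_ring = Ring(old_nodes), Ring(new_nodes)
    ring_before = {k: old_ring.assign(k) for k in keys}
    ring_after = {k: new_ring.assign(k) for k in keys}

    return {
        "modulo_moved": moved_fraction(mod_before, mod_after),
        "ring_moved": moved_fraction(ring_before, ring_after),
        "modulo_max_over_mean": max_over_mean(mod_after, len(new_nodes)),
        "ring_max_over_mean": max_over_mean(ring_after, len(new_nodes)),
        # On a ring, every relocated key should land on the newly added node.
        "ring_moves_only_to_new": all(
            ring_after[k] == new_nodes[-1]
            for k in keys
            if ring_before[k] != ring_after[k]
        ),
    }


if __name__ == "__main__":
    for name, value in compare().items():
        print(f"{name}: {value}")
```

Under these assumptions, modulo placement relocates most records when the node count changes, while the ring relocates roughly the share taken over by the new node. The ring's uniformity depends on the number of virtual nodes, which is the kind of trade-off the three criteria are meant to expose.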

