Coded Data Rebalancing for Distributed Data Storage Systems with Cyclic Storage

05/12/2022
by Athreya Chandramouli, et al.

We consider replication-based distributed storage systems in which each node stores the same quantum of data and each data bit stored has the same replication factor across the nodes. Such systems are referred to as balanced distributed databases. When existing nodes leave or new nodes are added to this system, the balanced nature of the database is lost, either due to the reduction in the replication factor or due to the non-uniformity of the storage at the nodes. This triggers a rebalancing algorithm that exchanges data between the nodes so that the balance of the database is reinstated. The goal is then to design rebalancing schemes with minimal communication load. In a recent work by Krishnan et al., coded transmissions were used to rebalance a carefully designed distributed database after a node removal or addition. These coded rebalancing schemes have optimal communication load, but they require the file size to be at least exponential in the system parameters. In this work, we consider a cyclic balanced database (where data is cyclically placed in the system nodes) and present coded rebalancing schemes for node removal and addition in such a database. These databases (and the associated rebalancing schemes) require the file size to be only cubic in the number of nodes in the system. We bound the advantage of our node removal rebalancing scheme over the uncoded scheme, and show that our scheme has a smaller communication load. In the node addition scenario, the rebalancing scheme presented is a simple uncoded scheme, which we show has optimal load.
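
To make the notion of a cyclic balanced database concrete, the following is a minimal illustrative sketch, not the paper's construction. It assumes one file segment per node and a placement rule in which segment i is stored on nodes i, i+1, ..., i+r-1 (mod K); the function names and this exact rule are assumptions made for illustration. The sketch only shows how balance is lost when a node is removed and does not implement any coded rebalancing scheme.

```python
from collections import Counter


def cyclic_placement(num_nodes, r):
    """Assumed rule: segment i lives on nodes i, i+1, ..., i+r-1 (mod num_nodes)."""
    return {
        seg: [(seg + j) % num_nodes for j in range(r)]
        for seg in range(num_nodes)
    }


def replication_factors(placement, removed=None):
    """Number of surviving copies of each segment, ignoring the removed node."""
    return {
        seg: sum(1 for node in nodes if node != removed)
        for seg, nodes in placement.items()
    }


def node_loads(placement, removed=None):
    """Number of segments stored at each surviving node."""
    loads = Counter()
    for nodes in placement.values():
        for node in nodes:
            if node != removed:
                loads[node] += 1
    return dict(loads)


if __name__ == "__main__":
    placement = cyclic_placement(num_nodes=5, r=3)
    # Before removal: every segment has 3 copies and every node stores 3 segments.
    print(replication_factors(placement))
    print(node_loads(placement))
    # After removing node 0, the segments it held drop to 2 copies each.
    after = replication_factors(placement, removed=0)
    print(sorted(seg for seg, copies in after.items() if copies < 3))  # [0, 3, 4]
```

In this toy setting, removing one node leaves exactly r segments under-replicated, and a rebalancing scheme would have to restore both the replication factor and the uniform storage across the remaining nodes.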

research
01/14/2020

Coded Data Rebalancing: Fundamental Limits and Constructions

Distributed databases often suffer unequal distribution of data among st...
research
10/22/2020

Coded Data Rebalancing for Decentralized Distributed Databases

The performance of replication-based distributed databases is affected d...
research
10/13/2019

Load Balancing Performance in Distributed Storage with Regular Balanced Redundancy

Contention at the storage nodes is the main cause of long and variable d...
research
10/20/2020

An Umbrella Converse for Data Exchange: Applied to Caching, Computing, Shuffling & Rebalancing

The problem of data exchange between multiple nodes with (not necessaril...
research
02/08/2021

Distributed Storage Allocations for Optimal Service Rates

Redundant storage maintains the performance of distributed systems under...
research
06/26/2019

Coded State Machine -- Scaling State Machine Execution under Byzantine Faults

We introduce an information-theoretic framework, named Coded State Machi...
research
01/17/2022

Universal Coded Distributed Computing For MapReduce Frameworks

Coded distributed computing (CDC) can trade extra computing power to red...
