Coded Data Rebalancing for Decentralized Distributed Databases

10/22/2020
by K V Sushena Sree, et al.

The performance of replication-based distributed databases degrades because of non-uniform storage across storage nodes (also called data skew) and a reduction in the replication factor during operation, particularly due to node additions or removals. Data rebalancing refers to the communication between nodes needed to correct this data skew while maintaining the replication factor. For carefully designed distributed databases, transmitting coded symbols during the rebalancing phase has recently been shown to reduce the communication load of rebalancing. In this work, we look at balanced distributed databases with random placement, in which each data segment is stored in a random subset of r nodes in the system, where r is the replication factor of the distributed database. We call these decentralized databases. For a natural class of such decentralized databases, we propose rebalancing schemes that correct the data skew and the reduction in the replication factor arising from a single node addition or removal. We give converse arguments which show that our proposed rebalancing schemes are asymptotically optimal in the size of the file.
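The random placement model in the abstract can be illustrated with a minimal Python sketch. It is not the rebalancing scheme proposed in the paper and involves no coding. It only shows how storing each segment on a random subset of r nodes produces roughly balanced loads, and how removing one node leaves some segments with fewer than r copies. All function names, the uniform sampling of subsets, and the parameter values are assumptions made for illustration.

```python
import random
from collections import Counter


def random_placement(num_segments, num_nodes, r, seed=0):
    """Assign each segment to a uniformly random subset of r distinct nodes.

    Hypothetical helper: uniform sampling is an assumption of this sketch.
    """
    rng = random.Random(seed)
    return [frozenset(rng.sample(range(num_nodes), r))
            for _ in range(num_segments)]


def node_loads(placement, num_nodes):
    """Count how many segments each node stores."""
    loads = Counter({node: 0 for node in range(num_nodes)})
    for holders in placement:
        loads.update(holders)
    return loads


def remove_node(placement, node):
    """Return the placement after one node leaves, taking its copies with it."""
    return [holders - {node} for holders in placement]


def under_replicated(placement, r):
    """Indices of segments that now have fewer than r copies."""
    return [i for i, holders in enumerate(placement) if len(holders) < r]


# Illustrative parameters (assumed, not taken from the paper).
NUM_SEGMENTS, NUM_NODES, R = 1000, 5, 3

placement = random_placement(NUM_SEGMENTS, NUM_NODES, R)
loads_before = node_loads(placement, NUM_NODES)

after_removal = remove_node(placement, node=0)
needs_repair = under_replicated(after_removal, R)

print("loads before removal:", dict(loads_before))
print("segments needing a new copy:", len(needs_repair))
```

Every segment that the removed node held must regain a copy on one of the surviving nodes, and the new copies should be spread so that the loads stay even. Rebalancing is the communication needed to achieve this, and the abstract reports that transmitting coded symbols can reduce that communication load.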


research
05/12/2022

Coded Data Rebalancing for Distributed Data Storage Systems with Cyclic Storage

We consider replication-based distributed storage systems in which each ...
research
01/14/2020

Coded Data Rebalancing: Fundamental Limits and Constructions

Distributed databases often suffer unequal distribution of data among st...
research
11/21/2017

Non-uniform Replication

Replication is a key technique in the design of efficient and reliable d...
research
11/27/2018

The Capacity of Private Information Retrieval from Decentralized Uncoded Caching Databases

We consider the private information retrieval (PIR) problem from decentr...
research
02/08/2021

Distributed Storage Allocations for Optimal Service Rates

Redundant storage maintains the performance of distributed systems under...
research
10/20/2020

An Umbrella Converse for Data Exchange: Applied to Caching, Computing, Shuffling & Rebalancing

The problem of data exchange between multiple nodes with (not necessaril...
research
06/26/2019

Coded State Machine -- Scaling State Machine Execution under Byzantine Faults

We introduce an information-theoretic framework, named Coded State Machi...
