Load Balancing Performance in Distributed Storage with Regular Balanced Redundancy
Contention at the storage nodes is the main cause of long and variable data access times in distributed storage systems. To minimize contention, the offered load must be balanced across the storage nodes, and the load balance should be robust against skews and fluctuations in content popularity. In practice, data objects are replicated across multiple nodes to allow for load balancing. However, redundancy increases the storage requirement and should be used efficiently. We evaluate the load balancing performance of natural storage schemes in which each data object is stored at d different nodes and each node stores the same number of objects. We find that load balance in a system of n nodes improves multiplicatively with d as long as d = o(log(n)), and improves exponentially as soon as d = Θ(log(n)). We show that load balance improves in the same way with d when the service choices are created with XORs of r objects rather than object replicas, which also reduces the storage overhead multiplicatively by r. However, unlike accessing an object replica, access through a recovery set formed with an XOR'ed object copy requires downloading content from r nodes, which increases the load imbalance in the system additively by r.
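As a rough illustration of the two ingredients described above, the following Python sketch builds one possible regular balanced placement and one XOR recovery set. The cyclic layout, the function names, and the sample values (6 nodes, d = 3, r = 2) are assumptions chosen for illustration; they are not taken from the paper and are only one of many placements that satisfy the "d nodes per object, equal objects per node" condition.

```python
from collections import Counter


def cyclic_placement(n_nodes, d):
    """Hypothetical layout: object i is stored on nodes i, i+1, ..., i+d-1 (mod n_nodes).

    With as many objects as nodes, every object lands on d distinct nodes
    and every node ends up holding exactly d objects.
    """
    return {obj: [(obj + k) % n_nodes for k in range(d)] for obj in range(n_nodes)}


def objects_per_node(placement):
    """Count how many objects each node stores under the given placement."""
    counts = Counter()
    for nodes in placement.values():
        counts.update(nodes)
    return counts


def xor_bytes(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


# Replication: each object has d = 3 service choices, one per replica.
placement = cyclic_placement(n_nodes=6, d=3)
per_node = objects_per_node(placement)

# XOR redundancy with r = 2: one stored copy is the XOR of two objects.
obj_a = b"obj-a"
obj_b = b"obj-b"
xor_copy = xor_bytes(obj_a, obj_b)

# Recovering obj_a through this recovery set needs content from r = 2 nodes:
# the node holding obj_b and the node holding the XOR'ed copy.
recovered_a = xor_bytes(xor_copy, obj_b)
```

In this sketch a replica serves a request from a single node, while the XOR recovery set touches two nodes per request, which is the extra download cost the abstract attributes to XOR-based service choices.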