Data Replication for Reducing Computing Time in Distributed Systems with Stragglers
In distributed computing systems with stragglers, various forms of redundancy can improve the average delay performance. We study the optimal replication of data in systems where the job execution time is a stochastically decreasing and convex random variable. We show that in such systems, the optimum assignment policy is the balanced replication of disjoint batches of data. Furthermore, for Exponential and Shifted-Exponential service times, we derive the optimum redundancy levels for minimizing both the expected value and the variance of the job completion time. Our analysis shows that the optimum redundancy level may not be the same for the two metrics; thus, there is a trade-off between reducing the expected value of the completion time and reducing its variance.
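The abstract does not give the paper's exact service-time model, so the following is only a minimal sketch under a hypothetical model chosen for illustration. Assume n workers, with the data split into b = n/r disjoint batches and each batch replicated on r workers (balanced replication). A worker assigned a fraction s = r/n of the data is assumed to finish after s(Δ + X), where X ~ Exp(μ) (a shifted exponential; Δ = 0 gives the pure exponential case). Each batch finishes with its fastest replica, and the job finishes with its slowest batch. Under these assumptions, E[T] = sΔ + H_b/(nμ) and Var[T] = H_b^(2)/(n²μ²), where H_b and H_b^(2) are the first- and second-order harmonic numbers. The names `completion_stats`, `best_levels`, and `simulate_mean` are hypothetical.

```python
import math
import random


def divisors(n):
    """Replication levels r that split n workers into equal disjoint batches."""
    return [r for r in range(1, n + 1) if n % r == 0]


def completion_stats(n, r, delta, mu):
    """Mean and variance of job completion time under the ASSUMED model:
    each worker handles fraction s = r/n of the data and takes s*(delta + Exp(mu))."""
    if n % r != 0:
        raise ValueError("r must divide n for balanced replication")
    b, s = n // r, r / n
    rate = r * mu  # min of r iid Exp(mu) is Exp(r*mu)
    h1 = sum(1.0 / i for i in range(1, b + 1))     # mean of max of b Exp(1) variables
    h2 = sum(1.0 / i**2 for i in range(1, b + 1))  # variance of max of b Exp(1) variables
    mean = s * (delta + h1 / rate)
    var = (s**2) * h2 / rate**2
    return mean, var


def best_levels(n, delta, mu):
    """Replication levels minimizing the mean and the variance, respectively."""
    stats = {r: completion_stats(n, r, delta, mu) for r in divisors(n)}
    r_mean = min(stats, key=lambda r: stats[r][0])
    r_var = min(stats, key=lambda r: stats[r][1])
    return r_mean, r_var, stats


def simulate_mean(n, r, delta, mu, trials=20000, seed=0):
    """Monte Carlo estimate of the mean completion time under the same assumed model."""
    rng = random.Random(seed)
    b, s = n // r, r / n
    total = 0.0
    for _ in range(trials):
        # Each batch finishes with its fastest replica; the job waits for the slowest batch.
        batch_times = [
            min(s * (delta + rng.expovariate(mu)) for _ in range(r)) for _ in range(b)
        ]
        total += max(batch_times)
    return total / trials


if __name__ == "__main__":
    r_mean, r_var, _ = best_levels(12, 0.1, 1.0)
    print(r_mean, r_var)  # prints "6 12"
```

In this toy setting with n = 12, Δ = 0.1, and μ = 1, partial replication (r = 6) minimizes the mean while full replication (r = 12) minimizes the variance, which illustrates the kind of trade-off the abstract describes. It is not a reproduction of the paper's derivation.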