Improved Latency-Communication Trade-Off for Map-Shuffle-Reduce Systems with Stragglers

08/20/2018
by Jingjing Zhang, et al.

In a distributed computing system operating according to the map-shuffle-reduce framework, coding data prior to storage can be useful both to reduce the latency caused by straggling servers and to decrease the inter-server communication load in the shuffling phase. In prior work, a concatenated coding scheme was proposed for a matrix multiplication task. In this scheme, the outer Maximum Distance Separable (MDS) code is leveraged to correct erasures caused by stragglers, while the inner repetition code is used to improve the communication efficiency of the shuffling phase by means of coded multicasting. In this work, it is demonstrated that the redundancy created by repetition coding can be leveraged to increase the rate of the outer MDS code, and hence to increase the multicasting opportunities in the shuffling phase. As a result, the proposed approach is shown to improve on the best known trade-off between latency and communication overhead.
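
The role of the outer MDS code can be illustrated with a small sketch. This is not the authors' scheme: it is a generic (n, k) MDS-coded matrix-vector product, assuming a real-valued Vandermonde code with hypothetical evaluation points alpha_j = j + 1 and exact rational arithmetic via Python's `fractions` module. The k row blocks of A are encoded into n coded blocks, one per worker. Because any k rows of a Vandermonde matrix with distinct nodes form an invertible matrix, the outputs of any k workers suffice to recover A x, so up to n - k stragglers are treated as erasures. All function names (`matvec`, `mds_encode`, `solve`, `mds_decode`) are illustrative.

```python
from fractions import Fraction


def matvec(rows, x):
    """Multiply a list of rows by the vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in rows]


def mds_encode(blocks, n):
    """Encode k row blocks into n coded blocks (hypothetical Vandermonde MDS code).

    Coded block j is sum_i alpha_j**i * blocks[i] with alpha_j = j + 1.
    """
    k = len(blocks)
    return [
        [
            [sum(Fraction(j + 1) ** i * blocks[i][r][c] for i in range(k))
             for c in range(len(blocks[0][0]))]
            for r in range(len(blocks[0]))
        ]
        for j in range(n)
    ]


def solve(matrix, rhs):
    """Solve a square linear system exactly by Gauss-Jordan elimination."""
    size = len(matrix)
    aug = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(matrix, rhs)]
    for col in range(size):
        pivot = next(r for r in range(col, size) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(size):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    return [aug[r][size] / aug[r][r] for r in range(size)]


def mds_decode(results, k):
    """Recover A x from any k finished workers (results: worker index -> output)."""
    finished = sorted(results)[:k]  # any k distinct workers are enough
    vander = [[Fraction(j + 1) ** i for i in range(k)] for j in finished]
    rows_per_block = len(results[finished[0]])
    decoded = [[None] * rows_per_block for _ in range(k)]
    for r in range(rows_per_block):
        # Row r of each coded output mixes row r of every uncoded product A_i x.
        z = solve(vander, [results[j][r] for j in finished])
        for i in range(k):
            decoded[i][r] = z[i]
    return [value for block in decoded for value in block]


# Usage: k = 2 row blocks of A, n = 3 workers, so one straggler is tolerated.
A = [[1, 2], [3, 4], [5, 6], [7, 8]]
x = [1, -1]
coded = mds_encode([A[0:2], A[2:4]], n=3)
results = {j: matvec(coded[j], x) for j in (0, 2)}  # worker 1 straggles
print(mds_decode(results, k=2) == matvec(A, x))  # True
```

Exact fractions sidestep the poor floating-point conditioning of real Vandermonde matrices; a practical system would more likely use a finite-field code or a better-conditioned construction.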

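How repetition enables coded multicasting in the shuffling phase can be seen in a toy instance with K = 3 servers. This sketch is an assumption-laden illustration, not the paper's scheme: it ignores the outer MDS layer, mimics the inner repetition code by storing each input file on two servers, and uses `hashlib` only to produce arbitrary hypothetical intermediate values. Each server is missing one intermediate value that the other two servers can both compute; instead of sending that value in full, each server multicasts a single XOR of two half-values, and every receiver cancels the half it can compute locally.

```python
import hashlib

K = 3  # servers, input files, and reduce functions are all indexed 0..K-1


def intermediate(q, f):
    """Hypothetical 8-byte intermediate value of reduce function q on file f."""
    return hashlib.sha256(f"{q}:{f}".encode()).digest()[:8]


def xor(a, b):
    return bytes(u ^ v for u, v in zip(a, b))


def half(value, index):
    """Return half 0 or half 1 of a byte string."""
    mid = len(value) // 2
    return value[:mid] if index == 0 else value[mid:]


# Repetition placement: server s stores every file except file s,
# so each file is replicated on r = 2 servers.
stored = {s: {f for f in range(K) if f != s} for s in range(K)}
# Server q reduces function q and is missing only v_q = intermediate(q, q).


def sender_slot(sender, q):
    """The two servers that can compute v_q each send a different half of it."""
    return [s for s in range(K) if s != q].index(sender)


# Shuffle phase: each server multicasts a single XOR of two half-values.
messages = {}
for s in range(K):
    q1, q2 = [q for q in range(K) if q != s]  # s stores files q1 and q2
    p1 = half(intermediate(q1, q1), sender_slot(s, q1))
    p2 = half(intermediate(q2, q2), sender_slot(s, q2))
    messages[s] = (q1, q2, xor(p1, p2))


def decode(receiver):
    """Recover v_receiver by cancelling locally computable side information."""
    parts = [b"", b""]
    for s, (q1, q2, payload) in messages.items():
        if receiver not in (q1, q2):
            continue
        other = q2 if receiver == q1 else q1
        assert other in stored[receiver]  # receiver can compute v_other itself
        side = half(intermediate(other, other), sender_slot(s, other))
        parts[sender_slot(s, receiver)] = xor(payload, side)
    return parts[0] + parts[1]


for r in range(K):
    assert decode(r) == intermediate(r, r)
coded_load = sum(len(payload) for _, _, payload in messages.values())
uncoded_load = sum(len(intermediate(q, q)) for q in range(K))
print(coded_load, uncoded_load)  # 12 24
```

In this toy instance the coded shuffle moves half as many bytes as sending each missing value separately, because every multicast message is useful to two receivers at once.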