Fundamental Limits of Approximate Gradient Coding

01/23/2019
by Sinong Wang, et al.

It has been established that, when the gradient coding problem is distributed among n servers, the computation load (number of stored data partitions) of each worker must be at least s+1 in order to resist s stragglers. Such schemes incur a large overhead when the number of stragglers s is large. In this paper, we focus on a new framework, approximate gradient coding, to mitigate stragglers in distributed learning. We show that, to exactly recover the gradient with high probability, the computation load is lower bounded by O(log(n)/log(n/s)). We also propose a code that exactly matches this lower bound. We identify a fundamental three-fold tradeoff, d ≥ O(log(1/ϵ)/log(n/s)), satisfied by any approximate gradient coding scheme, where d is the computation load and ϵ is the error of the recovered gradient. We give an explicit code construction, based on a random edge-removal process, that achieves the derived tradeoff. We implement our schemes and demonstrate their advantage over the current fastest gradient coding strategies.
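The s+1 baseline that the abstract refers to can be illustrated with the classical fractional-repetition gradient code (Tandon et al.): workers are split into n/(s+1) groups, every worker in a group computes the same block of s+1 partial gradients, so any s stragglers still leave one survivor per group. The sketch below is a minimal simulation under that scheme, not the construction proposed in this paper; all function names are illustrative.

```python
import numpy as np

def assign_blocks(n, s):
    """Fractional-repetition assignment: n workers, n partitions,
    computation load d = s + 1. Worker w belongs to group w // (s+1)
    and stores the block of s+1 consecutive partitions for that group."""
    assert n % (s + 1) == 0, "sketch assumes (s+1) divides n"
    return [list(range((w // (s + 1)) * (s + 1),
                       (w // (s + 1) + 1) * (s + 1))) for w in range(n)]

def recover_full_gradient(responses, n, s):
    """responses: dict worker -> block-sum vector (stragglers absent).
    Any s stragglers leave at least one live worker per group, so
    summing one surviving block-sum per group gives the exact gradient."""
    total = 0
    for g in range(n // (s + 1)):
        survivor = next(w for w in responses if w // (s + 1) == g)
        total = total + responses[survivor]
    return total

# toy run: n = 6 workers tolerating s = 2 stragglers (load d = 3)
rng = np.random.default_rng(0)
n, s, dim = 6, 2, 4
partial = rng.normal(size=(n, dim))          # per-partition gradients
blocks = assign_blocks(n, s)
# workers 1 and 3 straggle: their responses never arrive
responses = {w: partial[blocks[w]].sum(axis=0)
             for w in range(n) if w not in (1, 3)}
g_hat = recover_full_gradient(responses, n, s)
assert np.allclose(g_hat, partial.sum(axis=0))  # exact recovery
```

The cost the paper targets is visible here: the load d = s + 1 grows linearly with the straggler count, whereas the approximate framework drives it down to O(log(1/ϵ)/log(n/s)).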

