Adaptive Gradient Coding

06/08/2020
by Hankun Cao, et al.

This paper focuses on mitigating the impact of stragglers in distributed learning systems. Unlike existing results designed for a fixed number of stragglers, we develop a new scheme called Adaptive Gradient Coding (AGC) that flexibly tolerates a varying number of stragglers. Our scheme achieves an optimal tradeoff between computation load, straggler tolerance, and communication cost. In particular, it allows the communication cost to be minimized according to the real-time number of stragglers in the practical environment. Implementations on Amazon EC2 clusters using Python with the mpi4py package verify this flexibility in several situations.
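The AGC construction itself is not reproduced here, but the minimal NumPy sketch below illustrates the classic fractional-repetition gradient code of Tandon et al. that schemes like AGC build on: each worker computes gradients on s + 1 data partitions, so the master can recover the exact gradient sum from any n - s workers. All function names and the toy parameters (n = 6, s = 2) are illustrative, not taken from the paper.

import numpy as np

# Sketch of fractional-repetition gradient coding (Tandon et al.),
# not the AGC construction itself. Assumes (s + 1) divides n.

def encode_assignments(n, s):
    """Assign each worker the s + 1 data partitions of its group."""
    g = s + 1
    return [list(range((w // g) * g, (w // g) * g + g)) for w in range(n)]

def worker_message(partition_grads, assignment):
    """Each worker sends the sum of gradients over its partitions."""
    return sum(partition_grads[p] for p in assignment)

def decode(messages, n, s):
    """Recover the full gradient sum from any n - s workers: every
    group of s + 1 workers sends an identical message, and at most
    s stragglers cannot silence an entire group."""
    g = s + 1
    total = 0
    for group in range(n // g):
        workers = range(group * g, group * g + g)
        alive = next(w for w in workers if w in messages)
        total = total + messages[alive]
    return total

# Toy run: n = 6 workers, tolerate s = 2 stragglers, d = 4 dims.
rng = np.random.default_rng(0)
n, s, d = 6, 2, 4
grads = [rng.normal(size=d) for _ in range(n)]  # one per partition
assign = encode_assignments(n, s)
straggled = {1, 4}  # any s workers may fail
messages = {w: worker_message(grads, assign[w])
            for w in range(n) if w not in straggled}
assert np.allclose(decode(messages, n, s), sum(grads))

This sketch provisions for a fixed worst-case s; AGC's contribution, per the abstract, is letting the communication cost adapt to the number of stragglers actually present at run time.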


Related research

02/09/2018 · Communication-Computation Efficient Gradient Coding
This paper develops coding techniques to reduce the running time of dist...

05/14/2020 · Communication-Efficient Gradient Coding for Straggler Mitigation in Distributed Learning
Distributed implementations of gradient-based methods, wherein a server ...

12/16/2022 · Nested Gradient Codes for Straggler Mitigation in Distributed Machine Learning
We consider distributed learning in the presence of slow and unresponsiv...

01/23/2019 · Fundamental Limits of Approximate Gradient Coding
It has been established that when the gradient coding problem is distrib...

03/02/2021 · Optimal Communication-Computation Trade-Off in Heterogeneous Gradient Coding
Gradient coding allows a master node to derive the aggregate of the part...

05/22/2019 · LAGC: Lazily Aggregated Gradient Coding for Straggler-Tolerant and Communication-Efficient Distributed Learning
Gradient-based distributed learning in Parameter Server (PS) computing a...

02/06/2019 · CodedReduce: A Fast and Robust Framework for Gradient Aggregation in Distributed Learning
We focus on the commonly used synchronous Gradient Descent paradigm for ...
