ErasureHead: Distributed Gradient Descent without Delays Using Approximate Gradient Coding

01/28/2019
by Hongyi Wang, et al.

We present ErasureHead, a new approach for distributed gradient descent (GD) that mitigates system delays by employing approximate gradient coding. Gradient-coded distributed GD uses redundancy to exactly recover the gradient at each iteration from a subset of compute nodes. ErasureHead instead uses approximate gradient codes to recover an inexact gradient at each iteration, but with higher delay tolerance. Unlike prior work on gradient coding, we provide a performance analysis that combines both delay and convergence guarantees. We establish that, down to a small noise floor, ErasureHead converges as quickly as distributed GD and has a faster overall runtime under a probabilistic delay model. We conduct extensive experiments on real-world datasets and distributed clusters and demonstrate that our method can lead to significant speedups over both standard and gradient-coded GD.

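The abstract does not spell out the code construction, but the idea of approximate gradient coding can be illustrated with a fractional-repetition-style assignment: each block of data partitions is replicated across several workers, and the master decodes from whichever workers finish first, accepting erasures for blocks no finished worker holds. The sketch below is a minimal simulation of that idea, not the authors' implementation; the function names (assign_frc_groups, approximate_gradient) and the specific rescaling rule are illustrative assumptions.

```python
import numpy as np

def assign_frc_groups(n_workers, d):
    """Fractional-repetition-style assignment (illustrative sketch):
    workers are split into groups of size d, and every worker in a
    group is responsible for the same block of d data partitions."""
    assert n_workers % d == 0
    n_groups = n_workers // d
    # group g covers partitions [g*d, (g+1)*d)
    return [list(range(g * d, (g + 1) * d)) for g in range(n_groups)]

def approximate_gradient(partition_grads, groups, finished_workers, d):
    """Decode an approximate full gradient from the workers that finished.
    A group's block is recovered if any of its d workers finished;
    blocks with no finished worker are treated as erasures and skipped."""
    total = np.zeros(partition_grads.shape[1])
    recovered = 0
    for g, parts in enumerate(groups):
        workers_in_group = range(g * d, (g + 1) * d)
        if any(w in finished_workers for w in workers_in_group):
            total += partition_grads[parts].sum(axis=0)
            recovered += 1
    if recovered == 0:
        return total
    # scale up by the inverse of the fraction of recovered blocks so the
    # estimate roughly matches the magnitude of the full-data gradient
    return total * (len(groups) / recovered)

# toy example: 6 workers, replication factor d = 2, 6 data partitions
rng = np.random.default_rng(0)
grads = rng.normal(size=(6, 3))               # per-partition gradients
groups = assign_frc_groups(6, d=2)
approx = approximate_gradient(grads, groups, finished_workers={0, 3, 4}, d=2)
exact = grads.sum(axis=0)
```

In this toy run every group happens to have a finished worker, so the decoded gradient equals the exact one; if an entire group straggles, the decoder returns a rescaled partial sum, which is the inexact-but-delay-tolerant behavior the abstract describes.
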
Related research:

05/14/2019 · Stochastic Gradient Coding for Straggler Mitigation in Distributed Learning
We consider distributed gradient descent in the presence of stragglers. ...

11/17/2017 · Approximate Gradient Coding via Sparse Random Graphs
Distributed algorithms are often beset by the straggler effect, where th...

03/14/2018 · Redundancy Techniques for Straggler Mitigation in Distributed Optimization and Learning
Performance of distributed optimization and learning systems is bottlene...

05/25/2018 · Gradient Coding via the Stochastic Block Model
Gradient descent and its many variants, including mini-batch stochastic ...

01/31/2022 · Lightweight Projective Derivative Codes for Compressed Asynchronous Gradient Descent
Coded distributed computation has become common practice for performing ...

06/17/2020 · Approximate Gradient Coding with Optimal Decoding
In distributed optimization problems, a technique called gradient coding...

11/14/2017 · Straggler Mitigation in Distributed Optimization Through Data Encoding
Slow running or straggler tasks can significantly reduce computation spe...
