Lightweight Projective Derivative Codes for Compressed Asynchronous Gradient Descent

01/31/2022
by Pedro Soto, et al.

Coded distributed computation has become common practice for performing gradient descent on large datasets while mitigating stragglers and other faults. This paper proposes a novel algorithm that encodes the partial derivatives themselves and further optimizes the codes by lossily compressing the derivative codewords, maximizing the information contained in each codeword while minimizing the information shared between codewords. The usefulness of this application of coding theory is a geometric consequence of an observation from optimization research: noise is tolerable, and sometimes even helpful, in gradient-descent-based learning algorithms because it helps avoid overfitting and local minima. This contrasts with much conventional work on distributed coded computation, which focuses on recovering all of the data from the workers. A second contribution is that the low-weight nature of the coding scheme allows asynchronous gradient updates, because the code can be decoded iteratively; that is, a worker's result can be folded into the overall gradient as soon as it arrives. Because the directional derivative is always a linear function of the direction vector, the framework is robust: linear coding techniques can be applied to general machine learning models such as deep neural networks.
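
To make two claims above concrete (a directional derivative along a sum of coordinate directions equals the sum of the corresponding partial derivatives, and low-weight codes can be decoded iteratively as results arrive), here is a minimal Python sketch. It is not the authors' construction and omits the lossy compression step entirely. The loss `loss`, the weight-2 chain code `SUPPORTS`, the finite-difference step `h`, and the `PeelingDecoder` class are all hypothetical, chosen only for illustration.

```python
import random

def loss(x):
    # Hypothetical smooth (quadratic) loss, used only for illustration.
    return sum((i + 1) * xi ** 2 for i, xi in enumerate(x)) + x[0] * x[1]

def directional_derivative(func, x, support, h=1e-4):
    """Central-difference estimate of D_v f(x) for v = sum of e_i over `support`.

    By linearity, D_v f(x) equals the sum of the partials indexed by `support`.
    """
    xp = [xi + h if i in support else xi for i, xi in enumerate(x)]
    xm = [xi - h if i in support else xi for i, xi in enumerate(x)]
    return (func(xp) - func(xm)) / (2 * h)

class PeelingDecoder:
    """Hypothetical decoder: recovers individual partial derivatives from
    low-weight coded sums, accepting worker results in any arrival order."""

    def __init__(self, n):
        self.n = n
        self.known = {}    # coordinate index -> recovered partial derivative
        self.pending = []  # [unresolved support set, residual value] pairs

    def receive(self, support, value):
        self.pending.append([set(support), value])
        progress = True
        while progress:
            progress = False
            remaining = []
            for supp, val in self.pending:
                # Subtract every partial that has already been recovered.
                for i in list(supp):
                    if i in self.known:
                        val -= self.known[i]
                        supp.discard(i)
                if len(supp) == 1:
                    # A single unknown remains, so it is solved directly;
                    # an asynchronous trainer could apply it to x right away.
                    self.known[supp.pop()] = val
                    progress = True
                elif supp:
                    remaining.append([supp, val])
                # An empty support means the result was redundant; drop it.
            self.pending = remaining

    def done(self):
        return len(self.known) == self.n

# Hypothetical chain code of weight <= 2 over 4 coordinates: any single
# straggler can be dropped and the remaining results still peel to every partial.
SUPPORTS = [{0}, {0, 1}, {1, 2}, {2, 3}, {3}]

def decode_with_straggler(x, straggler, seed):
    tasks = [s for j, s in enumerate(SUPPORTS) if j != straggler]
    random.Random(seed).shuffle(tasks)  # simulate asynchronous arrival order
    dec = PeelingDecoder(len(x))
    for supp in tasks:
        dec.receive(supp, directional_derivative(loss, x, supp))
    return [dec.known[i] for i in range(len(x))] if dec.done() else None

x = [1.0, -2.0, 0.5, 3.0]
print(decode_with_straggler(x, straggler=2, seed=0))
```

At this `x` the exact gradient of `loss` is [0, -7, 3, 24], and the decoded values should match it up to floating-point error for any single dropped worker and any arrival order. The chain structure was picked because it can be checked by hand; it is not claimed to be the code used in the paper or an optimal one.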


Related research

01/27/2019
Heterogeneity-aware Gradient Coding for Straggler Tolerance
Gradient descent algorithms are widely used in machine learning. In orde...

04/15/2019
Distributed Matrix Multiplication Using Speed Adaptive Coding
While performing distributed computations in today's cloud-based platfor...

05/21/2016
Make Workers Work Harder: Decoupled Asynchronous Proximal Stochastic Gradient Descent
Asynchronous parallel optimization algorithms for solving large-scale ma...

01/28/2019
ErasureHead: Distributed Gradient Descent without Delays Using Approximate Gradient Coding
We present ErasureHead, a new approach for distributed gradient descent ...

07/03/2018
Private Information Retrieval in Asynchronous Coded Computation
We firstly consider fully asynchronous coded computation for matrix mult...

01/15/2019
Distributed Stochastic Gradient Descent Using LDGM Codes
We consider a distributed learning problem in which the computation is c...

06/06/2022
Optimization-based Block Coordinate Gradient Coding for Mitigating Partial Stragglers in Distributed Learning
Gradient coding schemes effectively mitigate full stragglers in distribu...
