Numerically Stable Binary Gradient Coding

01/30/2020
by Neophytos Charalambides, et al.
A major hurdle in machine learning is scalability to massive datasets. One approach to overcoming this is to distribute the computational tasks among several workers. Gradient coding has recently been proposed in distributed optimization to compute the gradient of an objective function using multiple, possibly unreliable, worker nodes. By designing distributed coded schemes, gradient coded computations can be made resilient to stragglers, which are nodes with longer response times than other nodes in a distributed network. Most such schemes rely on operations over the real or complex numbers and are inherently numerically unstable. We present a binary scheme which avoids such operations, thereby enabling numerically stable distributed computation of the gradient. We also drop some restrictive assumptions made in prior work and give a more efficient decoding.
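To make the idea of a binary scheme concrete, here is a minimal sketch of a 0/1 gradient code in the style of a fractional repetition scheme. It is an illustration under stated assumptions, not necessarily the construction in the paper: the function names `binary_assignment` and `decode_vector` are hypothetical, the number of data partitions is assumed to equal the number of workers, and `s + 1` is assumed to divide the number of workers. Each worker returns a plain sum of partial gradients, and the decoder only adds selected responses, so no real-valued coefficients or matrix inversions are involved.

```python
import numpy as np


def binary_assignment(n_workers, s):
    """Build a 0/1 encoding matrix B (hypothetical helper for this sketch).

    Row i marks which data partitions worker i sums. Workers are split into
    n_workers // (s + 1) groups; all workers in a group get the same block of
    partitions. Assumes (s + 1) divides n_workers and k = n_workers partitions.
    """
    if n_workers % (s + 1) != 0:
        raise ValueError("this sketch assumes (s + 1) divides n_workers")
    n_groups = n_workers // (s + 1)
    k = n_workers                      # number of data partitions (assumption)
    parts_per_group = k // n_groups    # equals s + 1
    B = np.zeros((n_workers, k), dtype=int)
    for i in range(n_workers):
        g = i % n_groups               # group of worker i
        B[i, g * parts_per_group:(g + 1) * parts_per_group] = 1
    return B


def decode_vector(B, responders):
    """Return a 0/1 vector a, supported on responders, with a @ B all ones.

    Greedily keeps one responding worker per group of partitions.
    """
    n, k = B.shape
    a = np.zeros(n, dtype=int)
    covered = np.zeros(k, dtype=bool)
    for i in sorted(responders):
        mask = B[i] == 1
        if not covered[mask].any():    # this worker's block is still missing
            a[i] = 1
            covered |= mask
    if not covered.all():
        raise ValueError("too many stragglers: some partitions are uncovered")
    return a


# Demo: 6 workers, tolerate s = 2 stragglers, integer-valued partial gradients.
B = binary_assignment(6, 2)
partial_grads = np.arange(6 * 3).reshape(6, 3)   # one row per data partition
worker_outputs = B @ partial_grads               # each worker sums its partitions
a = decode_vector(B, responders=[1, 2, 4, 5])    # workers 0 and 3 straggle
recovered = a @ worker_outputs                   # additions only
print(np.array_equal(recovered, partial_grads.sum(axis=0)))
```

Because both the encoding matrix and the decoding vector contain only zeros and ones, recovering the full gradient reduces to adding a subset of the received vectors, which is the property that sidesteps the conditioning issues of real-valued decoding.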

research
09/22/2021

Numerically Stable Binary Coded Computations

This paper addresses the gradient coding and coded matrix multiplication...
research
01/30/2020

Weighted Gradient Coding with Leverage Score Sampling

A major hurdle in machine learning is scalability to massive datasets. A...
research
09/17/2020

Berrut Approximated Coded Computing: Straggler Resistance Beyond Polynomial Computing

One of the major challenges in using distributed learning to train compl...
research
11/17/2017

Approximate Gradient Coding via Sparse Random Graphs

Distributed algorithms are often beset by the straggler effect, where th...
research
05/13/2021

Approximate Gradient Coding for Heterogeneous Nodes

In distributed machine learning (DML), the training data is distributed ...
research
06/06/2022

Optimization-based Block Coordinate Gradient Coding for Mitigating Partial Stragglers in Distributed Learning

Gradient coding schemes effectively mitigate full stragglers in distribu...
research
07/18/2019

Random Convolutional Coding for Robust and Straggler Resilient Distributed Matrix Computation

Distributed matrix computations (matrix-vector and matrix-matrix multipl...