Gradient Coding from Cyclic MDS Codes and Expander Graphs

07/12/2017
by Netanel Raviv, et al.

Gradient descent and its variants are popular methods for solving empirical risk minimization problems in machine learning. However, when the training set is large, computing the gradient becomes a computational bottleneck, and it is therefore common to distribute the training set among worker nodes. Doing so in a synchronous fashion introduces the additional challenge of stragglers (i.e., slow or unavailable nodes), which can cause considerable delays; schemes that mitigate stragglers are therefore essential. Tandon et al. recently showed that stragglers can be tolerated by carefully assigning redundant computations to the worker nodes and coding across the partial gradients, and they gave a randomized construction of such a code. In this paper we obtain a comparable deterministic scheme by employing cyclic MDS codes. In addition, we propose replacing the exact computation of the gradient with an approximate one, a technique that drastically increases the straggler tolerance and stems from adjacency matrices of expander graphs.
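To make the gradient-coding idea concrete, here is a minimal NumPy sketch of the small n = 3 workers, s = 1 straggler scheme from Tandon et al., not the cyclic-MDS construction of this paper: each worker returns one fixed linear combination of the partial gradients, and the master recovers the full gradient sum from any two workers. The encoding matrix B and the decoding step below are illustrative assumptions, not code from the paper.

```python
import numpy as np

# Toy gradient-coding instance: n = 3 workers, data split into 3 parts,
# tolerating s = 1 straggler. Row i of B is the linear combination of the
# partial gradients g1, g2, g3 that worker i computes and returns.
B = np.array([
    [0.5, 1.0,  0.0],   # worker 1 returns g1/2 + g2
    [0.0, 1.0, -1.0],   # worker 2 returns g2 - g3
    [0.5, 0.0,  1.0],   # worker 3 returns g1/2 + g3
])

rng = np.random.default_rng(0)
g = rng.standard_normal((3, 4))   # three partial gradients in R^4
full_gradient = g.sum(axis=0)     # what the master wants to recover

coded = B @ g                     # the workers' responses

# Suppose worker 2 straggles: decode from the surviving rows by finding
# a vector a with a^T B_S = (1, 1, 1), so that a^T (B_S g) = g1 + g2 + g3.
survivors = [0, 2]
a, *_ = np.linalg.lstsq(B[survivors].T, np.ones(3), rcond=None)
recovered = a @ coded[survivors]

assert np.allclose(recovered, full_gradient)
```

The paper's deterministic cyclic-MDS construction generalizes this pattern to arbitrary n and s, while the expander-graph variant relaxes exact recovery: when more workers straggle, the master decodes an approximation of the gradient sum rather than failing outright.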

Related research

12/16/2022 · Nested Gradient Codes for Straggler Mitigation in Distributed Machine Learning
We consider distributed learning in the presence of slow and unresponsiv...

06/15/2021 · The subfield codes and subfield subcodes of a family of MDS codes
Maximum distance separable (MDS) codes are very important in both theory...

11/24/2022 · Sequential Gradient Coding For Straggler Mitigation
In distributed computing, slower nodes (stragglers) usually become a bot...

05/06/2021 · Coded Gradient Aggregation: A Tradeoff Between Communication Costs at Edge Nodes and at Helper Nodes
The increasing amount of data generated at the edge/client nodes and the...

10/27/2017 · Near-Optimal Straggler Mitigation for Distributed Gradient Methods
Modern learning algorithms use gradient descent updates to train inferen...

05/13/2021 · Approximate Gradient Coding for Heterogeneous Nodes
In distributed machine learning (DML), the training data is distributed ...

11/14/2017 · Straggler Mitigation in Distributed Optimization Through Data Encoding
Slow running or straggler tasks can significantly reduce computation spe...
