Straggler Mitigation in Distributed Optimization Through Data Encoding

11/14/2017
by   Can Karakus, et al.
0

Slow running or straggler tasks can significantly reduce computation speed in distributed computation. Recently, coding-theory-inspired approaches have been applied to mitigate the effect of straggling, through embedding redundancy in certain linear computational steps of the optimization algorithm, thus completing the computation without waiting for the stragglers. In this paper, we propose an alternate approach where we embed the redundancy directly in the data itself, and allow the computation to proceed completely oblivious to encoding. We propose several encoding schemes, and demonstrate that popular batch algorithms, such as gradient descent and L-BFGS, applied in a coding-oblivious manner, deterministically achieve sample path linear convergence to an approximate solution of the original problem, using an arbitrarily varying subset of the nodes at each iteration. Moreover, this approximation can be controlled by the amount of redundancy and the number of nodes used in each iteration. We provide experimental results demonstrating the advantage of the approach over uncoded and data replication strategies.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
03/14/2018

Redundancy Techniques for Straggler Mitigation in Distributed Optimization and Learning

Performance of distributed optimization and learning systems is bottlene...
research
04/15/2019

Distributed Matrix Multiplication Using Speed Adaptive Coding

While performing distributed computations in today's cloud-based platfor...
research
03/31/2018

Fundamental Resource Trade-offs for Encoded Distributed Optimization

Dealing with the shear size and complexity of today's massive data sets ...
research
03/04/2019

CodeNet: Training Large Scale Neural Networks in Presence of Soft-Errors

This work proposes the first strategy to make distributed training of ne...
research
05/13/2021

Approximate Gradient Coding for Heterogeneous Nodes

In distributed machine learning (DML), the training data is distributed ...
research
01/28/2019

ErasureHead: Distributed Gradient Descent without Delays Using Approximate Gradient Coding

We present ErasureHead, a new approach for distributed gradient descent ...
research
07/12/2017

Gradient Coding from Cyclic MDS Codes and Expander Graphs

Gradient Descent, and its variants, are a popular method for solving emp...

Please sign up or login with your details

Forgot password? Click here to reset