Optimization-based Block Coordinate Gradient Coding for Mitigating Partial Stragglers in Distributed Learning

06/06/2022
by Qi Wang, et al.

Gradient coding schemes effectively mitigate full stragglers in distributed learning by introducing identical redundancy in the coded local partial derivatives corresponding to all model parameters. However, they are no longer effective against partial stragglers, as they cannot utilize the incomplete computation results that partial stragglers return. This paper aims to design a new gradient coding scheme for mitigating partial stragglers in distributed learning. Specifically, we consider a distributed system consisting of one master and N workers, characterized by a general partial straggler model, and focus on solving a general large-scale machine learning problem with L model parameters using gradient coding. First, we propose a coordinate gradient coding scheme with L coding parameters representing L possibly different diversities for the L coordinates, which generalizes most existing gradient coding schemes. Then, we consider the minimization of the expected overall runtime and the maximization of the completion probability with respect to the L coding parameters for the coordinates; both are challenging discrete optimization problems. To reduce computational complexity, we first transform each problem into an equivalent but much simpler discrete problem with N variables representing the partition of the L coordinates into N blocks, each with identical redundancy. This transformation indicates an equivalent but more easily implemented block coordinate gradient coding scheme with N coding parameters for the blocks. Then, we adopt continuous relaxation to further reduce computational complexity. For the resulting minimization of the expected overall runtime, we develop an iterative algorithm of computational complexity O(N^2) to obtain an optimal solution and derive two closed-form approximate solutions, both with computational complexity O(N). For the resulting maximization of the completion probability, we develop an iterative algorithm of...
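To make the block coordinate idea concrete, here is a minimal sketch in Python of how per-block redundancy could work, as we read it from the abstract: the L coordinates are split into N blocks, and the partial gradients of block n are replicated across r_n workers with cyclic placement, so block n is recoverable from any N - r_n + 1 workers. For clarity we illustrate with uncoded replication rather than the paper's coded combinations; all names (partition_coordinates, cyclic_placement, r) and the toy sizes are our own, not the paper's notation or construction.

```python
import numpy as np
from itertools import combinations

# Minimal sketch (our illustration, not the paper's construction):
# block coordinate gradient coding with per-block redundancy.
L, N = 12, 4                       # toy sizes: L coordinates, N workers
r = [1, 2, 2, 4]                   # assumed per-block redundancies

def partition_coordinates(L, N):
    """Split coordinate indices 0..L-1 into N nearly equal blocks."""
    return np.array_split(np.arange(L), N)

def cyclic_placement(N, s):
    """N x N binary matrix: worker w holds data partitions
    w, w+1, ..., w+s-1 (mod N), the standard cyclic repetition layout."""
    A = np.zeros((N, N), dtype=int)
    for w in range(N):
        for j in range(s):
            A[w, (w + j) % N] = 1
    return A

def covers_all(A, workers):
    """True if the given workers jointly hold every data partition."""
    return bool(np.all(A[list(workers)].sum(axis=0) >= 1))

for n, blk in enumerate(partition_coordinates(L, N)):
    A = cyclic_placement(N, r[n])
    k = N - r[n] + 1               # workers needed to recover this block
    ok = all(covers_all(A, S) for S in combinations(range(N), k))
    print(f"block {n}: coords {blk.tolist()}, redundancy {r[n]}, "
          f"recoverable from any {k} workers: {ok}")
```

Under this reading, a partial straggler that finishes only some of its blocks still contributes those blocks, and the optimization described in the abstract chooses the block partition (equivalently, the N redundancies) to balance the extra computation that redundancy costs against the straggler tolerance it buys.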

Related research

09/18/2021 - Optimization-based Block Coordinate Gradient Coding
Existing gradient coding schemes introduce identical redundancy across t...

01/27/2019 - Heterogeneity-aware Gradient Coding for Straggler Tolerance
Gradient descent algorithms are widely used in machine learning. In orde...

01/30/2020 - Numerically Stable Binary Gradient Coding
A major hurdle in machine learning is scalability to massive datasets. O...

04/30/2019 - Harmonic Coding: An Optimal Linear Code for Privacy-Preserving Gradient-Type Computation
We consider the problem of distributedly computing a general class of fu...
05/16/2022 - Two-Stage Coded Federated Edge Learning: A Dynamic Partial Gradient Coding Perspective
Federated edge learning (FEL) can train a global model from terminal ...
12/16/2020 - Incentive Mechanism Design for Distributed Coded Machine Learning
A distributed machine learning platform needs to recruit many heterogene...

01/31/2022 - Lightweight Projective Derivative Codes for Compressed Asynchronous Gradient Descent
Coded distributed computation has become common practice for performing ...
