ABS: Adaptive Bounded Staleness Converges Faster and Communicates Less

01/21/2023
by Qiao Tan, et al.

Wall-clock convergence time and the number of communication rounds are critical performance metrics in distributed learning with the parameter-server (PS) setting. Synchronous methods converge fast but are not robust to stragglers, while asynchronous ones reduce the wall-clock time per round but suffer from a degraded convergence rate due to stale gradients; it is therefore natural to combine the two to strike a balance. In this work, we develop a novel asynchronous strategy, named adaptive bounded staleness (ABS), that leverages the advantages of both synchronous and asynchronous methods. The key enablers of ABS are two-fold. First, the number of workers that the PS waits for per round for gradient aggregation is adaptively selected to strike a straggling-staleness balance. Second, workers with relatively high staleness are required to start a new round of computation to alleviate the negative effect of staleness. Simulation results demonstrate the superiority of ABS over state-of-the-art schemes in terms of wall-clock time and communication rounds.
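The abstract describes ABS only at a high level, so the following is a minimal simulation sketch of the mechanism it outlines: a PS that adaptively chooses how many workers to wait for each round, and that forces workers whose pending gradient has become too stale to restart. The adaptation rule `choose_wait_count`, the staleness bound, and the timing model are hypothetical placeholders, not the paper's actual design.

```python
import random

# Hypothetical parameters -- the paper's actual rule for the wait count and
# the staleness bound are not given in the abstract.
NUM_WORKERS = 8
MAX_STALENESS = 2      # assumed staleness bound
ROUNDS = 5

class Worker:
    def __init__(self, wid):
        self.wid = wid
        self.start_round = 0                           # round at which current gradient computation began
        self.finish_time = random.uniform(1.0, 3.0)    # simulated compute time

    def restart(self, current_round, now):
        """Discard pending work and start a fresh gradient at the current model."""
        self.start_round = current_round
        self.finish_time = now + random.uniform(1.0, 3.0)

def choose_wait_count(round_idx):
    # Placeholder adaptive rule: wait for more workers early, fewer later.
    return max(2, NUM_WORKERS - round_idx)

workers = [Worker(i) for i in range(NUM_WORKERS)]
now = 0.0
for r in range(ROUNDS):
    k = choose_wait_count(r)
    # PS waits only for the k workers that finish earliest this round.
    finishers = sorted(workers, key=lambda w: w.finish_time)[:k]
    now = max(w.finish_time for w in finishers)
    staleness = {w.wid: r - w.start_round for w in finishers}
    print(f"round {r}: waited for {k} workers at t={now:.2f}, staleness={staleness}")
    # Aggregate the k (possibly stale) gradients and update the model (omitted).
    for w in finishers:
        w.restart(r + 1, now)
    # Workers whose pending gradient is now too stale are told to recompute
    # from the current model, bounding the staleness of future contributions.
    for w in workers:
        if (r + 1) - w.start_round > MAX_STALENESS:
            w.restart(r + 1, now)
```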

