STSyn: Speeding Up Local SGD with Straggler-Tolerant Synchronization

10/06/2022
by Feng Zhu, et al.

Synchronous local stochastic gradient descent (local SGD) suffers from idle workers and random delays caused by slow, straggling workers, because it waits for every worker to complete the same number of local updates. To mitigate stragglers and improve communication efficiency, this paper develops a novel local SGD strategy named STSyn. The key idea is to wait for only the K fastest workers in each synchronization round while keeping all workers computing continuously, and to make full use of every effective (completed) local update at each worker, stragglers included. The average wall-clock time, average number of local updates, and average number of uploading workers per round are analyzed to gauge the performance of STSyn, and its convergence is rigorously established even when the objective function is nonconvex. Experimental results show that STSyn outperforms state-of-the-art schemes thanks to its straggler tolerance and the additional effective local updates at each worker; the influence of the system parameters is also studied. By waiting only for the faster workers and allowing heterogeneous synchronization with different numbers of local updates across workers, STSyn provides substantial improvements in both time and communication efficiency.
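The synchronization rule described in the abstract can be illustrated with a small simulation. The Python sketch below is not the authors' implementation: it assumes exponentially distributed per-update compute times, a generic grad_fn, simple unweighted server-side averaging, and illustrative parameter names (K, U, max_updates), none of which are taken from the paper beyond the high-level rule itself.

```python
import numpy as np

def stsyn_round(models, K, U, grad_fn, lr=0.1, mean_delay=1.0, rng=None, max_updates=None):
    """One STSyn-style synchronization round (illustrative simulation only).

    The server waits until the K fastest workers have each completed U local
    updates; meanwhile every worker keeps computing.  All workers that have
    finished at least one local update by that time upload their models, and
    the server averages the uploads and broadcasts the result.
    """
    rng = np.random.default_rng() if rng is None else rng
    M = len(models)
    max_updates = 3 * U if max_updates is None else max_updates  # illustrative cap

    # Cumulative wall-clock times of each worker's potential local updates
    # (per-update durations assumed exponential for this sketch).
    times = np.cumsum(rng.exponential(mean_delay, size=(M, max_updates)), axis=1)
    finish_U = times[:, U - 1]                    # when each worker finishes U updates
    round_time = np.sort(finish_U)[K - 1]         # wait only for the K-th fastest worker

    uploads = []
    for i in range(M):
        # Effective (completed) local updates of worker i when the round ends;
        # workers faster than the K-th may complete more than U of them.
        done = int(np.searchsorted(times[i], round_time, side="right"))
        w = models[i]
        for _ in range(done):
            w = w - lr * grad_fn(w)               # one local SGD step
        models[i] = w
        if done >= 1:                             # stragglers with >=1 update still upload
            uploads.append(w)

    global_model = np.mean(uploads, axis=0)       # simple averaging at the server
    for i in range(M):
        models[i] = global_model.copy()           # broadcast the updated global model
    return global_model, round_time, len(uploads)
```

With grad_fn set to the gradient of a toy quadratic, calling stsyn_round repeatedly reproduces the qualitative behavior described above: the round duration is governed by the K-th fastest worker, while slower workers still contribute whatever updates they managed to complete.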


Related research

- Communication-Efficient Local SGD with Age-Based Worker Selection (10/31/2022): A major bottleneck of distributed learning under parameter-server (PS) f...
- Adaptive Worker Grouping For Communication-Efficient and Straggler-Tolerant Distributed SGD (01/12/2022): Wall-clock convergence time and communication load are key performance m...
- Divide-and-Shuffle Synchronization for Distributed Machine Learning (07/07/2020): Distributed Machine Learning suffers from the bottleneck of synchronizat...
- Local SGD With a Communication Overhead Depending Only on the Number of Workers (06/03/2020): We consider speeding up stochastic gradient descent (SGD) by parallelizi...
- Anytime Stochastic Gradient Descent: A Time to Hear from all the Workers (10/06/2018): In this paper, we focus on approaches to parallelizing stochastic gradie...
- ABS: Adaptive Bounded Staleness Converges Faster and Communicates Less (01/21/2023): Wall-clock convergence time and communication rounds are critical perfor...
- ShadowSync: Performing Synchronization in the Background for Highly Scalable Distributed Training (03/07/2020): Distributed training is useful to train complicated models to shorten th...
