Communication-Efficient Local SGD with Age-Based Worker Selection

10/31/2022
by Feng Zhu, et al.

A major bottleneck of distributed learning under the parameter-server (PS) framework is the communication cost incurred by frequent bidirectional transmissions between the PS and the workers. To address this issue, local stochastic gradient descent (SGD) and worker selection have been exploited to reduce, respectively, the communication frequency and the number of workers participating in each round. However, partial participation can be detrimental to the convergence rate, especially with heterogeneous local datasets. In this paper, to improve communication efficiency and speed up the training process, we develop a novel worker selection strategy named AgeSel. The key enabler of AgeSel is the use of worker ages to balance their participation frequencies. The convergence of local SGD with the proposed age-based partial worker participation is rigorously established. Simulation results demonstrate that the proposed AgeSel strategy can significantly reduce both the number of training rounds needed to reach a target accuracy and the communication cost. The influence of the algorithm's hyper-parameter is also explored to highlight the benefit of age-based worker selection.
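The abstract describes the mechanism only at a high level. The sketch below illustrates one way an age-based selection rule of this kind could be wired into a local-SGD training loop: the PS tracks each worker's age (rounds since its last participation), prioritizes workers whose age exceeds a threshold, and fills the remaining slots at random. This is not the authors' AgeSel implementation; the threshold tau_max, the quadratic toy loss, and all hyper-parameter values are assumptions made purely for illustration.

```python
import numpy as np

# Illustrative sketch only (not the paper's code): the parameter server keeps an
# "age" counter per worker, prioritizes workers whose age exceeds an assumed
# threshold tau_max, and fills the remaining slots uniformly at random, so that
# participation frequencies stay balanced under heterogeneous local data.

rng = np.random.default_rng(0)

def agesel_select(ages, num_selected, tau_max):
    """Pick workers for this round: overdue workers first, then random fill."""
    overdue = [i for i, a in enumerate(ages) if a >= tau_max]
    chosen = overdue[:num_selected]
    if len(chosen) < num_selected:
        rest = [i for i in range(len(ages)) if i not in chosen]
        chosen += list(rng.choice(rest, num_selected - len(chosen), replace=False))
    return chosen

# Toy setup: M workers, S selected per round, U local SGD steps per round.
M, S, U, tau_max, lr = 10, 3, 5, 4, 0.1
ages = np.zeros(M, dtype=int)
w_global = np.zeros(2)                                            # global model
data = [rng.normal(i % 3, 1.0, size=(20, 2)) for i in range(M)]   # heterogeneous toy data

for rnd in range(50):
    selected = agesel_select(ages, S, tau_max)
    updates = []
    for i in selected:
        w = w_global.copy()
        for _ in range(U):                       # local SGD on a quadratic toy loss
            batch = data[i][rng.choice(len(data[i]), 5, replace=False)]
            w -= lr * (w - batch.mean(axis=0))
        updates.append(w)
    w_global = np.mean(updates, axis=0)          # PS averages the returned local models
    ages += 1                                    # every worker ages by one round...
    ages[selected] = 0                           # ...and selected workers reset to zero

print("final global model:", w_global)
```

In this toy loop, only S of the M workers communicate with the PS in each round, and each selected worker uploads once per U local steps, which is where the communication savings of local SGD with partial participation come from.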


Related research

10/06/2022 · STSyn: Speeding Up Local SGD with Straggler-Tolerant Synchronization
Synchronous local stochastic gradient descent (local SGD) suffers from s...

12/30/2019 · Variance Reduced Local SGD with Lower Communication Complexity
To accelerate the training of machine learning models, distributed stoch...

11/12/2020 · Distributed Sparse SGD with Majority Voting
Distributed learning, particularly variants of distributed stochastic gr...

11/20/2019 · Understanding Top-k Sparsification in Distributed Deep Learning
Distributed stochastic gradient descent (SGD) algorithms are widely depl...

06/03/2020 · Local SGD With a Communication Overhead Depending Only on the Number of Workers
We consider speeding up stochastic gradient descent (SGD) by parallelizi...

06/09/2021 · Communication-efficient SGD: From Local SGD to One-Shot Averaging
We consider speeding up stochastic gradient descent (SGD) by parallelizi...

10/09/2019 · Straggler-Agnostic and Communication-Efficient Distributed Primal-Dual Algorithm for High-Dimensional Data Mining
Recently, reducing communication time between machines becomes the main ...
