Variance Reduced Local SGD with Lower Communication Complexity

12/30/2019
by Xianfeng Liang et al.

To accelerate the training of machine learning models, distributed stochastic gradient descent (SGD) and its variants have been widely adopted; they use multiple workers in parallel to speed up training. Among them, Local SGD has gained much attention due to its lower communication cost. Nevertheless, when the data distributions on the workers are non-identical, Local SGD requires O(T^{3/4} N^{3/4}) communication rounds to maintain its linear iteration speedup property, where T is the total number of iterations and N is the number of workers. In this paper, we propose Variance Reduced Local SGD (VRL-SGD) to further reduce the communication complexity. By eliminating the dependency on the gradient variance among workers, we theoretically prove that VRL-SGD achieves a linear iteration speedup with a lower communication complexity of O(T^{1/2} N^{3/2}), even if workers access non-identical datasets. We conduct experiments on three machine learning tasks, and the results demonstrate that VRL-SGD performs significantly better than Local SGD when the data among workers are highly diverse.
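The abstract describes the algorithmic idea only. As a rough illustration, the sketch below pairs standard Local SGD (K local steps between model-averaging rounds) with a per-worker correction term intended to cancel the gradient variance among workers with non-identical data. The toy least-squares data, the hyperparameters, and the specific correction update d_i += (x_bar - x_i) / (lr * K) are assumptions chosen for illustration; the exact VRL-SGD update rule is given in the paper and may differ.

```python
# Minimal sketch: Local SGD with a per-worker variance-reduction term.
# NOTE: the correction update below is an illustrative assumption, not
# necessarily the exact rule used by VRL-SGD in the paper.
import numpy as np

rng = np.random.default_rng(0)

N, dim, K = 4, 10, 20            # workers, model dimension, local steps per round
rounds, lr = 50, 0.05            # communication rounds, learning rate

# Non-identical data: each worker gets its own least-squares problem.
A = [rng.normal(size=(100, dim)) for _ in range(N)]
b = [A_i @ rng.normal(size=dim) + rng.normal(scale=0.1, size=100) for A_i in A]

def stochastic_grad(i, x):
    """Mini-batch gradient of worker i's local least-squares objective."""
    idx = rng.choice(len(b[i]), size=10, replace=False)
    Ai, bi = A[i][idx], b[i][idx]
    return Ai.T @ (Ai @ x - bi) / len(idx)

x = [np.zeros(dim) for _ in range(N)]   # per-worker models
d = [np.zeros(dim) for _ in range(N)]   # per-worker correction terms

for r in range(rounds):
    for i in range(N):
        for _ in range(K):               # local steps, no communication
            x[i] = x[i] - lr * (stochastic_grad(i, x[i]) - d[i])
    x_bar = np.mean(x, axis=0)           # one communication round: average models
    for i in range(N):
        d[i] = d[i] + (x_bar - x[i]) / (lr * K)   # refresh correction term
        x[i] = x_bar.copy()

print("final global model (first 3 coords):", x_bar[:3])
```

Setting every d[i] to zero throughout recovers vanilla Local SGD, which makes for a convenient baseline when comparing behavior on non-identical worker data.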


Related research:

06/28/2019  Faster Distributed Deep Net Training: Computation and Communication Decoupled Stochastic Gradient Descent
06/11/2020  STL-SGD: Speeding Up Local SGD with Stagewise Communication Period
06/03/2020  Local SGD With a Communication Overhead Depending Only on the Number of Workers
06/09/2021  Communication-efficient SGD: From Local SGD to One-Shot Averaging
10/31/2022  Communication-Efficient Local SGD with Age-Based Worker Selection
01/21/2021  Clairvoyant Prefetching for Distributed Machine Learning I/O
07/01/2020  Shuffle-Exchange Brings Faster: Reduce the Idle Time During Communication for Decentralized Neural Network Training
