CSER: Communication-efficient SGD with Error Reset

07/26/2020
by Cong Xie, et al.

The scalability of Distributed Stochastic Gradient Descent (SGD) is today limited by communication bottlenecks. We propose a novel SGD variant: Communication-efficient SGD with Error Reset, or CSER. The key idea in CSER is, first, a new technique called "error reset" that adapts arbitrary compressors for SGD, producing bifurcated local models with periodic reset of the resulting local residual errors. Second, we introduce partial synchronization for both the gradients and the models, leveraging the advantages of both. We prove the convergence of CSER for smooth non-convex problems. Empirical results show that, when combined with highly aggressive compressors, the CSER algorithms: i) cause no loss of accuracy, and ii) accelerate training by nearly 10× for CIFAR-100 and by 4.5× for ImageNet.
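
To make the error-reset idea concrete, here is a minimal single-process NumPy sketch on a toy least-squares problem: each simulated worker compresses its update with a top-k compressor, keeps the part the compressor dropped as a local residual error, and every H steps the residuals are averaged into the model and reset. The topk helper, the period H, and all hyperparameters are illustrative assumptions; this is not the paper's full CSER algorithm, which additionally uses separate partial-synchronization periods for gradients and models.

```python
# Illustrative sketch only: "error reset" with a top-k compressor on a toy
# least-squares problem, simulated in a single process. The helper names,
# the sync period H, and the hyperparameters below are assumptions for the
# demo, not the paper's exact CSER procedure.
import numpy as np

rng = np.random.default_rng(0)

def topk(v, k):
    """Keep the k largest-magnitude entries of v, zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

# Toy problem: minimize the average of ||A_i w - b_i||^2 over worker shards.
d, n_workers, steps, lr, k, H = 50, 4, 200, 0.05, 5, 10
A = [rng.normal(size=(100, d)) for _ in range(n_workers)]
b = [a @ rng.normal(size=d) + 0.1 * rng.normal(size=100) for a in A]

w = np.zeros(d)                                  # synchronized model
err = [np.zeros(d) for _ in range(n_workers)]    # local residual errors

for t in range(steps):
    compressed = []
    for i in range(n_workers):
        g = 2 * A[i].T @ (A[i] @ w - b[i]) / len(b[i])   # local gradient
        u = -lr * g + err[i]          # update corrected by residual error
        c = topk(u, k)                # aggressively compressed update
        err[i] = u - c                # keep what the compressor dropped
        compressed.append(c)
    w += np.mean(compressed, axis=0)  # cheap sync of compressed updates

    if (t + 1) % H == 0:              # periodic "error reset":
        w += np.mean(err, axis=0)     # fold residuals into the model...
        err = [np.zeros(d) for _ in range(n_workers)]  # ...and reset them

loss = np.mean([np.mean((a @ w - bb) ** 2) for a, bb in zip(A, b)])
print(f"final loss: {loss:.4f}")
```

Running the sketch shows the loss decreasing despite each worker communicating only k of the d coordinates per step, with the periodic reset preventing the local residual errors from drifting apart indefinitely.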


Related research:

11/20/2019 · Local AdaAlter: Communication-Efficient Stochastic Gradient Descent with Adaptive Learning Rates
06/11/2020 · STL-SGD: Speeding Up Local SGD with Stagewise Communication Period
09/11/2019 · The Error-Feedback Framework: Better Rates for SGD with Delayed Gradients and Compressed Communication
03/09/2020 · Communication-Efficient Distributed SGD with Error-Feedback, Revisited
05/21/2020 · rTop-k: A Statistical Estimation Approach to Distributed SGD
03/21/2022 · ImageNet Challenging Classification with the Raspberry Pi: An Incremental Local Stochastic Gradient Descent Algorithm
08/12/2019 · Taming Unbalanced Training Workloads in Deep Learning with Partial Collective Operations
