Linearly Converging Error Compensated SGD

10/23/2020
by Eduard Gorbunov, et al.

In this paper, we propose a unified analysis of variants of distributed SGD with arbitrary compressions and delayed updates. Our framework is general enough to cover different variants of quantized SGD, Error-Compensated SGD (EC-SGD), and SGD with delayed updates (D-SGD). Via a single theorem, we derive complexity results for all methods that fit our framework; for the existing methods, this theorem recovers the best-known complexity results. Moreover, using our general scheme, we develop new variants of SGD that combine variance reduction or arbitrary sampling with error feedback and quantization, and we derive convergence rates for these methods that improve on the state of the art. To illustrate the strength of our framework, we develop 16 new methods that fit it. In particular, we propose EC-SGD-DIANA, the first method to combine error feedback for biased compression operators with quantization of gradient differences, and prove convergence guarantees showing that EC-SGD-DIANA converges to the exact optimum asymptotically in expectation with a constant learning rate for both convex and strongly convex objectives when the workers compute full gradients of their loss functions. Moreover, for the case when each worker's loss function is a finite sum, we modify the method to obtain EC-LSVRG-DIANA, the first distributed stochastic method with error feedback and variance reduction that converges to the exact optimum asymptotically in expectation with a constant learning rate.
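The flavor of the approach can be conveyed with a short simulation. The sketch below is not the authors' pseudocode; it only combines the two ingredients named in the abstract, error feedback for a biased top-k compressor and DIANA-style learned shifts that track each worker's gradient at the optimum, on synthetic quadratic losses. The worker count, the step sizes gamma and alpha, and the use of top-k for the shift update as well (the paper quantizes gradient differences with an unbiased compressor) are assumptions made purely to keep the example short and runnable.

```python
# Illustrative sketch (not the authors' pseudocode) of error feedback with a biased
# top-k compressor combined with DIANA-style shifts h_i, on synthetic quadratics.
import numpy as np

rng = np.random.default_rng(0)
n_workers, dim, k = 4, 20, 3            # workers, dimension, coordinates kept by top-k
gamma, alpha = 0.1, 0.5                 # learning rate and shift step size (assumed values)

# Worker i holds a strongly convex quadratic f_i(x) = 0.5 * ||A_i x - b_i||^2.
A = [rng.standard_normal((30, dim)) / np.sqrt(30) for _ in range(n_workers)]
b = [rng.standard_normal(30) for _ in range(n_workers)]
grad = lambda i, x: A[i].T @ (A[i] @ x - b[i])

def top_k(v, k):
    """Biased (contractive) compressor: keep the k largest-magnitude coordinates."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

x = np.zeros(dim)
e = [np.zeros(dim) for _ in range(n_workers)]   # error-feedback buffers
h = [np.zeros(dim) for _ in range(n_workers)]   # shifts estimating grad f_i at the optimum

for t in range(5000):
    msgs = []
    for i in range(n_workers):
        g = grad(i, x)                          # full local gradient
        v = e[i] + gamma * (g - h[i])           # error-corrected, shifted step
        m = top_k(v, k)                         # only this compressed vector is sent
        e[i] = v - m                            # keep the untransmitted residual locally
        msgs.append(m + gamma * h[i])           # in a real system the server keeps its own
                                                # copy of h_i, updated from the same
                                                # compressed differences; here we read it directly
        h[i] += alpha * top_k(g - h[i], k)      # drift the shift toward the local gradient
    x -= np.mean(msgs, axis=0)

# Once the shifts track grad f_i(x*), the compressed corrections vanish at the optimum,
# which is why a constant learning rate can give convergence to the exact solution.
full_grad = np.mean([grad(i, x) for i in range(n_workers)], axis=0)
print("norm of full gradient after training:", np.linalg.norm(full_grad))
```

The design point worth noting is the shift h_i: plain EC-SGD compresses gamma * g directly and only converges to a neighborhood of the solution under a fixed step size, whereas compressing the difference g - h_i makes the transmitted signal shrink as the method approaches the optimum.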


