Error Feedback Fixes SignSGD and other Gradient Compression Schemes

01/28/2019
by   Sai Praneeth Karimireddy, et al.

Sign-based algorithms (e.g., signSGD) have been proposed as a biased gradient compression technique to alleviate the communication bottleneck in training large neural networks across multiple workers. We show simple convex counter-examples where signSGD does not converge to the optimum. Further, even when it does converge, signSGD may generalize poorly when compared with SGD. These issues arise because of the biased nature of the sign compression operator. We then show that using error-feedback, i.e., incorporating the error made by the compression operator into the next step, overcomes these issues. We prove that our algorithm, EF-SGD, achieves the same rate of convergence as SGD without any additional assumptions for arbitrary compression operators (including the sign operator), indicating that we get gradient compression for free. Our experiments thoroughly substantiate the theory, showing the superiority of our algorithm.
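The error-feedback idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the scaled-sign compressor (signs multiplied by the mean absolute value), the helper names `scaled_sign`, `ef_step`, and `quadratic_grad`, the toy quadratic objective, and the step size are all assumptions chosen for illustration.

```python
def scaled_sign(p):
    """Compress p to its signs, scaled by the mean absolute value of p.

    The scaling is an assumption of this sketch; it keeps the compressed
    vector on the same order of magnitude as the input.
    """
    scale = sum(abs(v) for v in p) / len(p)
    return [scale * ((v > 0) - (v < 0)) for v in p]


def ef_step(x, grad, error, lr):
    """One error-feedback step: returns (new iterate, new residual)."""
    # Add the residual left by the previous compression to the scaled gradient.
    p = [lr * g + e for g, e in zip(grad, error)]
    # Only the compressed vector would be communicated and applied.
    delta = scaled_sign(p)
    new_x = [xi - di for xi, di in zip(x, delta)]
    # Remember what compression discarded so the next step can correct for it.
    new_error = [pi - di for pi, di in zip(p, delta)]
    return new_x, new_error


def quadratic_grad(x):
    """Gradient of a hypothetical toy objective f(x) = 0.5 * sum(c_i * x_i**2)."""
    curvature = [1.0, 10.0]
    return [c * xi for c, xi in zip(curvature, x)]


x = [1.0, -2.0]
error = [0.0, 0.0]
for _ in range(200):
    x, error = ef_step(x, quadratic_grad(x), error, lr=0.05)
```

In this sketch, the quantity `x - error` moves by exactly `lr * grad` at every step, because the compressed update cancels out of the difference. The compression error is therefore deferred to later steps rather than lost, which is the intuition behind incorporating it into the next step.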


07/31/2020

Analysis of SGD with Biased Gradient Estimators

We analyze the complexity of biased stochastic gradient methods (SGD), w...
09/04/2020

On Communication Compression for Distributed Optimization on Heterogeneous Data

Lossy gradient compression, with either unbiased or biased compressors, ...
02/13/2018

signSGD: compressed optimisation for non-convex problems

Training large neural networks requires distributing learning across mul...
10/23/2020

Linearly Converging Error Compensated SGD

In this paper, we propose a unified analysis of variants of distributed ...
03/25/2021

Compressed Gradient Tracking Methods for Decentralized Optimization with Linear Convergence

Communication compression techniques are of growing interests for solvin...
03/09/2020

Communication-Efficient Distributed SGD with Error-Feedback, Revisited

We show that the convergence proof of a recent algorithm called dist-EF-...
09/28/2020

On Efficient Constructions of Checkpoints

Efficient construction of checkpoints/snapshots is a critical tool for t...