EF21: A New, Simpler, Theoretically Better, and Practically Faster Error Feedback

06/09/2021
by Peter Richtarik, et al.

Error feedback (EF), also known as error compensation, is an immensely popular convergence stabilization mechanism in the context of distributed training of supervised machine learning models enhanced by the use of contractive communication compression mechanisms, such as Top-k. First proposed by Seide et al. (2014) as a heuristic, EF resisted any theoretical understanding until recently [Stich et al., 2018, Alistarh et al., 2018]. However, all existing analyses either i) apply to the single node setting only, ii) rely on very strong and often unreasonable assumptions, such as global boundedness of the gradients, or on iterate-dependent assumptions that cannot be checked a priori and may not hold in practice, or iii) circumvent these issues via the introduction of additional unbiased compressors, which increase the communication cost. In this work we fix all these deficiencies by proposing and analyzing a new EF mechanism, which we call EF21, that consistently and substantially outperforms EF in practice. Our theoretical analysis relies on standard assumptions only, works in the distributed heterogeneous data setting, and leads to better and more meaningful rates. In particular, we prove that EF21 enjoys a fast O(1/T) convergence rate for smooth nonconvex problems, beating the previous bound of O(1/T^{2/3}), which was shown under a bounded gradients assumption. We further improve this to a fast linear rate for PL functions, which is the first linear convergence result for an EF-type method not relying on unbiased compressors. Since EF has a large number of applications where it reigns supreme, we believe that our 2021 variant, EF21, can have a large impact on the practice of communication-efficient distributed learning.
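
The abstract describes EF21 only at a high level. Below is a minimal single-process NumPy sketch of the kind of update EF21 performs with a Top-k compressor: each node keeps a local gradient estimate g_i and communicates only the compressed difference between its fresh gradient and g_i, while the server steps with the average of the g_i's. All names here (top_k, ef21_step), the toy quadratic objective, the step size, and the iteration count are illustrative assumptions, not code or parameters from the paper.

```python
# Minimal EF21-style sketch with a Top-k contractive compressor (illustrative only).
import numpy as np

def top_k(v, k):
    """Contractive Top-k compressor: keep the k largest-magnitude entries, zero the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def ef21_step(x, g_locals, grads, gamma, k):
    """One EF21-style iteration.

    Each worker i holds an estimate g_i of its local gradient and sends only the
    compressed difference c_i = Top-k(grad_i(x_new) - g_i); worker and server then
    both update g_i <- g_i + c_i, and the server steps with the average of the g_i.
    """
    g = np.mean(g_locals, axis=0)            # server-side aggregate estimate
    x_new = x - gamma * g                    # model update
    new_g_locals = []
    for g_i, grad_i in zip(g_locals, grads):
        c_i = top_k(grad_i(x_new) - g_i, k)  # the only quantity communicated
        new_g_locals.append(g_i + c_i)       # identical update on worker and server
    return x_new, new_g_locals

# Toy heterogeneous problem (assumed for illustration): f_i(x) = 0.5 * ||A_i x - b_i||^2.
rng = np.random.default_rng(0)
n, d, k, gamma = 5, 20, 3, 1e-3
data = [(rng.normal(size=(30, d)), rng.normal(size=30)) for _ in range(n)]
grads = [lambda x, A=A, b=b: A.T @ (A @ x - b) for A, b in data]

x = np.zeros(d)
g_locals = [g(x) for g in grads]             # simple initialization: g_i^0 = grad f_i(x^0)
for t in range(2000):
    x, g_locals = ef21_step(x, g_locals, grads, gamma, k)

print("gradient norm at last iterate:",
      np.linalg.norm(np.mean([g(x) for g in grads], axis=0)))
```

Note that, unlike classical EF, no error accumulator of raw gradients is kept: the state g_i itself tracks the local gradient, which is what the analysis sketched in the abstract exploits.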

Related research

EF21 with Bells & Whistles: Practical Algorithmic Extensions of Modern Error Feedback (10/07/2021)
First proposed by Seide (2014) as a heuristic, error feedback (EF) is a ...

Momentum Provably Improves Error Feedback! (05/24/2023)
Due to the high communication overhead when training machine learning mo...

Distributed Learning with Compressed Gradient Differences (01/26/2019)
Training very large machine learning models requires a distributed compu...

3PC: Three Point Compressors for Communication-Efficient Distributed Training and a Better Theory for Lazy Aggregation (02/02/2022)
We propose and study a new class of gradient communication mechanisms fo...

Adam Can Converge Without Any Modification on Update Rules (08/20/2022)
Ever since Reddi et al. 2018 pointed out the divergence issue of Adam, m...

Clip21: Error Feedback for Gradient Clipping (05/30/2023)
Motivated by the increasing popularity and importance of large-scale tra...

A Better Alternative to Error Feedback for Communication-Efficient Distributed Learning (06/19/2020)
Modern large-scale machine learning applications require stochastic opti...
