Momentum Provably Improves Error Feedback!

05/24/2023
by Ilyas Fatkhullin et al.

Due to the high communication overhead of training machine learning models in a distributed environment, modern algorithms invariably rely on lossy communication compression. However, when left untreated, the errors introduced by compression propagate and can lead to severely unstable behavior, including exponential divergence. Almost a decade ago, Seide et al. [2014] proposed an error feedback (EF) mechanism, which we refer to as EF14, as an immensely effective heuristic for mitigating this issue. However, despite steady algorithmic and theoretical advances in the EF field over the last decade, our understanding is far from complete. In this work we address one of the most pressing open issues. In particular, in the canonical nonconvex setting, all known variants of EF rely on very large batch sizes to converge, which can be prohibitive in practice. We propose a surprisingly simple fix which removes this issue both theoretically and in practice: the application of Polyak's momentum to the latest incarnation of EF due to Richtárik et al. [2021], known as EF21. Our algorithm, for which we coin the name EF21-SGDM, improves the communication and sample complexities of previous error feedback algorithms under standard smoothness and bounded-variance assumptions, and does not require any further strong assumptions such as bounded gradient dissimilarity. Moreover, we propose a double-momentum version of our method that improves the complexities even further. Our proof appears to be novel even when compression is removed from the method, and as such, our proof technique is of independent interest in the study of nonconvex stochastic optimization enriched with Polyak's momentum.
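
To make the mechanism concrete, below is a minimal single-node sketch in Python of Polyak momentum combined with EF21-style difference compression. This is an illustrative reconstruction, not the authors' implementation: the top_k compressor, the step size gamma, the momentum parameter eta, and the toy objective are all assumptions chosen for the example.

```python
import numpy as np

def top_k(v, k):
    """Top-K contractive compressor: keep the k largest-magnitude entries, zero the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def ef21_sgdm(grad_fn, x0, gamma=0.05, eta=0.1, k=2, steps=500, seed=0):
    """Single-node sketch of EF21 with Polyak momentum (names are illustrative).

    v tracks a momentum (exponential moving average) estimate of the gradient;
    g is the EF21 state, kept identical on worker and server because only the
    compressed difference top_k(v - g, k) is ever 'communicated'.
    """
    rng = np.random.default_rng(seed)
    x = x0.astype(float).copy()
    v = grad_fn(x, rng)   # initialize momentum with one stochastic gradient
    g = v.copy()          # assume an exact initial synchronization of the state
    for _ in range(steps):
        x = x - gamma * g                          # descent step on the compressed state
        v = (1 - eta) * v + eta * grad_fn(x, rng)  # Polyak momentum on stochastic gradients
        g = g + top_k(v - g, k)                    # EF21: compress the *difference* v - g
    return x

# Toy problem: f(x) = 0.5 * ||x||^2 with additive gradient noise.
grad = lambda x, rng: x + 0.1 * rng.standard_normal(x.shape)
print(np.linalg.norm(ef21_sgdm(grad, x0=np.ones(10))))
```

In the paper's distributed setting, each of the n workers would run the momentum and compression updates on its own data and the server would average the per-worker states; the single-node view above is just the n = 1 special case of that scheme.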

Related research

EF21: A New, Simpler, Theoretically Better, and Practically Faster Error Feedback (06/09/2021)
Error feedback (EF), also known as error compensation, is an immensely p...

EF21 with Bells & Whistles: Practical Algorithmic Extensions of Modern Error Feedback (10/07/2021)
First proposed by Seide (2014) as a heuristic, error feedback (EF) is a ...

Adam Can Converge Without Any Modification on Update Rules (08/20/2022)
Ever since Reddi et al. 2018 pointed out the divergence issue of Adam, m...

Distributed Learning with Compressed Gradient Differences (01/26/2019)
Training very large machine learning models requires a distributed compu...

3PC: Three Point Compressors for Communication-Efficient Distributed Training and a Better Theory for Lazy Aggregation (02/02/2022)
We propose and study a new class of gradient communication mechanisms fo...

Clip21: Error Feedback for Gradient Clipping (05/30/2023)
Motivated by the increasing popularity and importance of large-scale tra...

Permutation Compressors for Provably Faster Distributed Nonconvex Optimization (10/07/2021)
We study the MARINA method of Gorbunov et al. (2021) – the current state-...
