Shifted Compression Framework: Generalizations and Improvements

06/21/2022
by Egor Shulgin et al.

Communication is one of the key bottlenecks in the distributed training of large-scale machine learning models, and lossy compression of exchanged information, such as stochastic gradients or models, is one of the most effective instruments to alleviate this issue. Among the most studied compression techniques is the class of unbiased compression operators with variance bounded by a multiple of the squared norm of the vector we wish to compress. By design, this variance may remain high, and it diminishes only when the input vector approaches zero. However, unless the model being trained is overparameterized, there is no a priori reason for the vectors we wish to compress to approach zero during the iterations of classical methods such as distributed compressed SGD, which adversely affects convergence speed. To circumvent this issue, several more elaborate and seemingly very different algorithms have recently been proposed. These methods are based on the idea of compressing the difference between the vector we would normally wish to compress and some auxiliary vector that changes throughout the iterative process. In this work we take a step back and develop a unified framework for studying such methods, both conceptually and theoretically. Our framework incorporates methods compressing both gradients and models, using unbiased and biased compressors, and sheds light on the construction of the auxiliary vectors. Furthermore, our general framework can lead to improvements of several existing algorithms and can produce new ones. Finally, we perform several numerical experiments that illustrate and support our theoretical findings.
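To make the mechanism described above concrete, below is a minimal Python sketch (not the authors' implementation) of the two ingredients from the abstract: an unbiased rand-k compressor whose variance is bounded by a multiple of the squared norm of its input, and a shifted compression step that compresses the difference between the current gradient and a learned auxiliary shift. The function names, the rand-k choice, and the DIANA-style shift update with step size alpha are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of shifted compression (illustrative; not the paper's code).
import numpy as np

def rand_k(x: np.ndarray, k: int, rng: np.random.Generator) -> np.ndarray:
    """Unbiased rand-k compressor: E[C(x)] = x and
    E||C(x) - x||^2 <= omega * ||x||^2 with omega = d/k - 1."""
    d = x.size
    mask = np.zeros(d)
    idx = rng.choice(d, size=k, replace=False)
    mask[idx] = d / k              # scaling by d/k keeps the estimator unbiased
    return mask * x

def shifted_compress(grad: np.ndarray, shift: np.ndarray, k: int,
                     alpha: float, rng: np.random.Generator):
    """Compress the difference (grad - shift) instead of grad itself, then
    move the shift toward grad so the transmitted vector shrinks over time."""
    delta = rand_k(grad - shift, k, rng)   # what the worker actually transmits
    estimate = shift + delta               # server-side reconstruction of grad
    new_shift = shift + alpha * delta      # learned shift (alpha <= 1/(omega+1) in DIANA-type analyses)
    return estimate, new_shift
```

The point of the sketch is that the compression error scales with ||grad - shift||^2 rather than ||grad||^2, so if the shift tracks the gradient, the variance vanishes even when the gradient itself does not approach zero.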


