Distributed Learning with Compressed Gradient Differences

01/26/2019
by Konstantin Mishchenko, et al.

Training very large machine learning models requires a distributed computing approach, with communication of the model updates often being the bottleneck. For this reason, several methods based on compression (e.g., sparsification and/or quantization) of the updates were recently proposed, including QSGD (Alistarh et al., 2017), TernGrad (Wen et al., 2017), SignSGD (Bernstein et al., 2018), and DQGD (Khirirat et al., 2018). However, none of these methods are able to learn the gradients, which means that they necessarily suffer from several issues: the inability to converge to the true optimum in batch mode, the inability to handle a nonsmooth regularizer, and slow convergence rates. In this work we propose a new distributed learning method---DIANA---which resolves these issues via compression of gradient differences. We perform a theoretical analysis in the strongly convex and nonconvex settings and show that our rates are vastly superior to existing rates. Our analysis of block-quantization and of the differences between ℓ_2 and ℓ_∞ quantization closes the gap between theory and practice. Finally, by applying our analysis technique to TernGrad, we establish the first convergence rate for this method.
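To make the gradient-difference idea concrete, below is a minimal single-machine simulation of it with a random ℓ_2 quantizer, in the spirit of the method described above. This is a sketch under stated assumptions, not the authors' reference implementation: the names (`quantize`, `alpha`, `gamma`, the toy per-worker quadratics) and the parameter values are illustrative choices, and the step sizes are not tuned to the paper's theory.

```python
# Sketch of compressing gradient *differences* rather than gradients themselves.
# Hypothetical, untuned example; not the authors' code.
import numpy as np

rng = np.random.default_rng(0)

def quantize(v, p=2):
    """Unbiased random p-norm quantization:
    keeps sign(v_j) * ||v||_p with probability |v_j| / ||v||_p."""
    norm = np.linalg.norm(v, ord=np.inf if p == np.inf else p)
    if norm == 0.0:
        return np.zeros_like(v)
    mask = rng.random(v.shape) < np.abs(v) / norm
    return norm * np.sign(v) * mask

# Toy problem: each of n workers holds a quadratic f_i(x) = 0.5 * ||A_i x - b_i||^2.
n, d = 4, 10
A = [rng.standard_normal((20, d)) for _ in range(n)]
b = [rng.standard_normal(20) for _ in range(n)]
grad = lambda i, x: A[i].T @ (A[i] @ x - b[i])

x = np.zeros(d)
h = [np.zeros(d) for _ in range(n)]   # per-worker gradient estimates h_i
h_avg = np.zeros(d)                   # server's copy of (1/n) * sum_i h_i
gamma = 1e-3                          # step size (illustrative, not from the paper's theory)
alpha = 1.0 / (1.0 + np.sqrt(d))      # conservative estimate-update rate tied to compressor noise

for k in range(500):
    deltas = []
    for i in range(n):
        delta = quantize(grad(i, x) - h[i])  # compress the difference, not the gradient
        h[i] += alpha * delta                # local estimate drifts toward the true gradient
        deltas.append(delta)
    delta_avg = sum(deltas) / n
    g_hat = h_avg + delta_avg                # unbiased estimate of the average gradient
    x -= gamma * g_hat
    h_avg += alpha * delta_avg               # keep the server copy in sync with the workers
```

Because each h_i gradually tracks ∇f_i(x), the compressed messages carry only the shrinking residual as the iterates settle, which is the mechanism behind the claimed ability to converge to the true optimum in the batch setting, in contrast to methods that compress the gradients directly.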
