Qsparse-local-SGD: Distributed SGD with Quantization, Sparsification, and Local Computations

06/06/2019
by   Debraj Basu, et al.

The communication bottleneck has been identified as a significant issue in the distributed optimization of large-scale learning models. Several approaches to mitigate this problem have recently been proposed, including various forms of gradient compression and computing local models that are mixed iteratively. In this paper we propose the Qsparse-local-SGD algorithm, which combines aggressive sparsification with quantization and local computation, along with error compensation achieved by keeping track of the difference between the true and compressed gradients. We propose both synchronous and asynchronous implementations of Qsparse-local-SGD, and we analyze its convergence in the distributed setting for smooth non-convex and convex objective functions. We demonstrate that Qsparse-local-SGD converges at the same rate as vanilla distributed SGD for many important classes of sparsifiers and quantizers. We use Qsparse-local-SGD to train ResNet-50 on ImageNet, and show that it yields significant savings over the state of the art in the number of bits transmitted to reach a target accuracy.
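To make the mechanism concrete, below is a minimal NumPy sketch of one worker in an error-compensated, compressed local-SGD loop of the kind the abstract describes: each worker takes a few local SGD steps, compresses its net update with a sparsifier composed with a quantizer, and carries the compression error forward into the next round. The names (`QsparseLocalWorker`, `top_k`, `sign_quantize`), the particular scaled-sign quantizer, and the toy synchronization schedule are illustrative assumptions, not the paper's exact operators or schedule.

```python
import numpy as np

def top_k(v, k):
    """Top_k sparsifier: keep the k largest-magnitude entries of v, zero the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def sign_quantize(v):
    """Scaled-sign quantizer (illustrative): one bit per nonzero coordinate plus one scale."""
    scale = np.abs(v).sum() / max(np.count_nonzero(v), 1)
    return scale * np.sign(v)

def compress(v, k):
    """Composed operator: quantize the Top_k sparsification of v."""
    return sign_quantize(top_k(v, k))

class QsparseLocalWorker:
    """One worker of error-compensated, compressed local SGD (hypothetical sketch)."""
    def __init__(self, dim, k, lr, local_steps):
        self.error = np.zeros(dim)   # memory of what compression has discarded so far
        self.k, self.lr, self.H = k, lr, local_steps

    def round(self, x, stochastic_grad):
        """Run H local SGD steps from the global model x, return a compressed update."""
        x_local = x.copy()
        for _ in range(self.H):
            x_local -= self.lr * stochastic_grad(x_local)
        update = (x - x_local) + self.error      # add back the accumulated error
        compressed = compress(update, self.k)    # sparsify, then quantize
        self.error = update - compressed         # error compensation for later rounds
        return compressed

# Toy usage (hypothetical setup): n workers minimizing f(x) = ||x - 1||^2
# with noisy gradients; the server averages compressed updates each round.
rng = np.random.default_rng(0)
dim, n = 100, 4
workers = [QsparseLocalWorker(dim, k=10, lr=0.1, local_steps=5) for _ in range(n)]
grad = lambda x: 2.0 * (x - 1.0) + 0.01 * rng.standard_normal(dim)
x = np.zeros(dim)
for t in range(200):
    x -= np.mean([w.round(x, grad) for w in workers], axis=0)
```

The error memory is the key design choice: coordinates dropped by the sparsifier or rounded away by the quantizer are re-injected into later updates rather than lost, which is the intuition behind compressed methods matching the convergence rate of uncompressed SGD.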

Related research

SQuARM-SGD: Communication-Efficient Momentum SGD for Decentralized Optimization (05/13/2020)
In this paper, we consider the problem of communication-efficient decent...

SPARQ-SGD: Event-Triggered and Compressed Communication in Decentralized Stochastic Optimization (10/31/2019)
In this paper, we propose and analyze SPARQ-SGD, which is an event-trigg...

Quantizing data for distributed learning (12/14/2020)
We consider machine learning applications that train a model by leveragi...

Multi-Level Local SGD for Heterogeneous Hierarchical Networks (07/27/2020)
We propose Multi-Level Local SGD, a distributed gradient method for lear...

On Efficient Constructions of Checkpoints (09/28/2020)
Efficient construction of checkpoints/snapshots is a critical tool for t...

K-SAM: Sharpness-Aware Minimization at the Speed of SGD (10/23/2022)
Sharpness-Aware Minimization (SAM) has recently emerged as a robust tech...

SLowcal-SGD: Slow Query Points Improve Local-SGD for Stochastic Convex Optimization (04/09/2023)
We consider distributed learning scenarios where M machines interact wit...
