SQuARM-SGD: Communication-Efficient Momentum SGD for Decentralized Optimization

05/13/2020
by Navjot Singh, et al.

In this paper, we consider the problem of communication-efficient decentralized training of large-scale machine learning models over a network. We propose and analyze SQuARM-SGD, an algorithm for decentralized training that employs momentum and compressed communication between nodes, regulated by a locally computable triggering condition in stochastic gradient descent (SGD). In SQuARM-SGD, each node performs a fixed number of local SGD steps using Nesterov's momentum and then sends sparsified and quantized updates to its neighbors only when there has been a significant change in its model parameters since the last communication. We provide convergence guarantees for our algorithm for (smooth) strongly convex and non-convex objectives, and show that SQuARM-SGD converges at a rate of 𝒪(1/nT) for strongly convex objectives and at a rate of 𝒪(1/√(nT)) for non-convex objectives, matching the convergence rate of vanilla distributed SGD in both settings. We corroborate our theoretical understanding with experiments and compare the performance of our algorithm with the state of the art, showing that, without sacrificing much accuracy, SQuARM-SGD converges at a similar rate while saving significantly on the total number of communicated bits.
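To make the scheme concrete, the following is a minimal, illustrative sketch (not the authors' implementation) of the ingredients described above: local Nesterov-momentum SGD steps, top-k sparsification plus quantization of the model change, and an event-triggered communication rule on a small ring of nodes. The toy quadratic objectives, the ring topology, the mixing weight, the threshold schedule, and helper names such as top_k and quantize are assumptions made for illustration only.

```python
# Illustrative sketch of the ideas behind SQuARM-SGD (not the authors' code):
# each node runs H local Nesterov-momentum SGD steps, then exchanges a
# sparsified + quantized model difference with its neighbors only when a
# locally computable triggering condition fires.
import numpy as np

rng = np.random.default_rng(0)
n_nodes, dim, H, T = 4, 10, 5, 200       # nodes, dimension, local steps, rounds
lr, beta, k_sparse = 0.05, 0.9, 3        # step size, momentum, top-k budget

# Toy local objectives: f_i(x) = 0.5 * ||x - b_i||^2 with different targets b_i.
targets = rng.normal(size=(n_nodes, dim))

def grad(i, x):
    """Noisy gradient of node i's quadratic loss (stand-in for a stochastic gradient)."""
    return (x - targets[i]) + 0.01 * rng.normal(size=dim)

def top_k(v, k):
    """Keep the k largest-magnitude entries of v, zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def quantize(v, levels=16):
    """Simple uniform quantization of the entries (illustrative choice)."""
    scale = np.max(np.abs(v)) + 1e-12
    return np.round(v / scale * levels) / levels * scale

# Ring topology: each node talks to its two neighbors.
neighbors = {i: [(i - 1) % n_nodes, (i + 1) % n_nodes] for i in range(n_nodes)}

x = np.zeros((n_nodes, dim))             # local models
v = np.zeros((n_nodes, dim))             # momentum buffers
x_hat = np.zeros((n_nodes, dim))         # last communicated (public) copies

for t in range(T):
    # Local phase: H Nesterov-momentum SGD steps per node.
    for i in range(n_nodes):
        for _ in range(H):
            g = grad(i, x[i] + beta * v[i])   # look-ahead gradient (Nesterov)
            v[i] = beta * v[i] - lr * g
            x[i] = x[i] + v[i]

    # Communication phase: send a compressed difference only if the local
    # change since the last communication exceeds a threshold (the trigger).
    threshold = 0.5 / (t + 1)                 # assumed decaying trigger schedule
    msgs = {}
    for i in range(n_nodes):
        diff = x[i] - x_hat[i]
        if np.linalg.norm(diff) ** 2 > threshold:
            msgs[i] = quantize(top_k(diff, k_sparse))

    # Senders refresh their public copies; receivers gossip-average with them.
    for i, m in msgs.items():
        x_hat[i] = x_hat[i] + m
    for i in range(n_nodes):
        for j in neighbors[i]:
            x[i] += 0.2 * (x_hat[j] - x[i])   # assumed mixing weight

print("distance to consensus optimum:",
      np.linalg.norm(x.mean(axis=0) - targets.mean(axis=0)))
```

The communication saving in this sketch comes from two places: in rounds where a node's model has changed little since its last broadcast, it sends nothing at all, and when it does send, the message contains only k nonzero, quantized entries rather than the full model difference.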


Related research

SPARQ-SGD: Event-Triggered and Compressed Communication in Decentralized Stochastic Optimization (10/31/2019)
In this paper, we propose and analyze SPARQ-SGD, which is an event-trigg...

DecentLaM: Decentralized Momentum SGD for Large-batch Deep Training (04/24/2021)
The scale of deep learning nowadays calls for efficient distributed trai...

Decentralized Stochastic Optimization and Gossip Algorithms with Compressed Communication (02/01/2019)
We consider decentralized stochastic optimization with the objective fun...

Qsparse-local-SGD: Distributed SGD with Quantization, Sparsification, and Local Computations (06/06/2019)
Communication bottleneck has been identified as a significant issue in d...

Communication Efficient Decentralized Training with Multiple Local Updates (10/21/2019)
Communication efficiency plays a significant role in decentralized optim...

PopSGD: Decentralized Stochastic Gradient Descent in the Population Model (10/27/2019)
The population model is a standard way to represent large-scale decentra...

SlowMo: Improving Communication-Efficient Distributed SGD with Slow Momentum (10/01/2019)
Distributed optimization is essential for training large models on large...
