Moniqua: Modulo Quantized Communication in Decentralized SGD

02/26/2020
by Yucheng Lu, et al.

Running Stochastic Gradient Descent (SGD) in a decentralized fashion has shown promising results. In this paper we propose Moniqua, a technique that allows decentralized SGD to use quantized communication. We prove that Moniqua communicates a provably bounded number of bits per iteration while converging at the same asymptotic rate as the original algorithm with full-precision communication. Moniqua improves upon prior work in that it (1) requires zero additional memory, (2) works with 1-bit quantization, and (3) is applicable to a variety of decentralized algorithms. We demonstrate empirically that Moniqua converges faster with respect to wall-clock time than other quantized decentralized algorithms. We also show that Moniqua is robust to very low bit budgets, allowing 1-bit-per-parameter communication without compromising validation accuracy when training ResNet20 and ResNet110 on CIFAR10.
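The abstract names the core trick, modulo quantization, without spelling out the algorithm. A rough intuition: because neighboring workers' models stay close during decentralized SGD, a worker can send its parameters wrapped modulo a small bound, and the receiver can unwrap them against its own local model, so no extra reference copy needs to be stored. The sketch below illustrates that idea only; the function names, the drift bound theta, the uniform stochastic quantizer, and the nearest-representative recovery rule are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def modulo_quantize(x, theta, levels):
    """Wrap x into [0, theta), then stochastically round to `levels` values.

    theta is an assumed bound on inter-worker model drift (the paper derives
    the actual bound); each entry is encoded in log2(levels) bits.
    """
    step = theta / levels
    wrapped = np.mod(x, theta)                            # wrap into [0, theta)
    scaled = wrapped / step
    low = np.floor(scaled)
    q = low + (np.random.rand(*x.shape) < scaled - low)   # stochastic rounding
    return (q % levels).astype(np.uint8)                  # levels <= 256 assumed

def modulo_dequantize(q, theta, levels, reference):
    """Unwrap a received code using the receiver's own model as reference.

    Picks the representative of the wrapped value (mod theta) that lies
    closest to `reference`; this is correct whenever drift plus rounding
    error stays below theta / 2, so the receiver needs no extra state.
    """
    wrapped = q * (theta / levels)
    return reference + np.mod(wrapped - reference + theta / 2, theta) - theta / 2

# Toy round trip with hypothetical numbers: a neighbor's parameters are
# recovered from a 4-bit code using only the local model as reference.
rng = np.random.default_rng(0)
x_neighbor = rng.normal(size=4)                    # neighbor's parameters
x_local = x_neighbor + 0.001 * rng.normal(size=4)  # local model, close by
theta, levels = 0.1, 16                            # 4 bits per parameter
code = modulo_quantize(x_neighbor, theta, levels)
estimate = modulo_dequantize(code, theta, levels, x_local)
# recovery error is below one quantization step when drift << theta / 2
assert np.all(np.abs(estimate - x_neighbor) < theta / levels + 1e-12)
```

With levels = 2 this becomes the 1-bit regime the abstract advertises, though making that case converge requires the tighter drift bound and analysis developed in the paper rather than the loose theta used here.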

Related research

11/15/2020 · Acceleration of stochastic methods on the example of decentralized SGD
In this paper, we present an algorithm for accelerating decentralized st...

01/10/2019 · Quantized Epoch-SGD for Communication-Efficient Distributed Learning
Due to its efficiency and ease to implement, stochastic gradient descent...

10/31/2019 · SPARQ-SGD: Event-Triggered and Compressed Communication in Decentralized Stochastic Optimization
In this paper, we propose and analyze SPARQ-SGD, which is an event-trigg...

06/19/2020 · DEED: A General Quantization Scheme for Communication Efficiency in Bits
In distributed optimization, a popular technique to reduce communication...

04/26/2019 · SWALP: Stochastic Weight Averaging in Low-Precision Training
Low precision operations can provide scalability, memory savings, portab...

10/06/2021 · 8-bit Optimizers via Block-wise Quantization
Stateful optimizers maintain gradient statistics over time, e.g., the ex...

10/26/2021 · Exponential Graph is Provably Efficient for Decentralized Deep Training
Decentralized SGD is an emerging training method for deep learning known...