NUQSGD: Provably Communication-efficient Data-parallel SGD via Nonuniform Quantization

04/28/2021
by Ali Ramezani-Kebrya, et al.

As the size and complexity of models and datasets grow, so does the need for communication-efficient variants of stochastic gradient descent (SGD) that can be deployed to perform parallel model training. One popular communication-compression method for data-parallel SGD is QSGD (Alistarh et al., 2017), which quantizes and encodes gradients to reduce communication costs. The baseline variant of QSGD provides strong theoretical guarantees; however, for practical purposes, the authors proposed a heuristic variant, which we call QSGDinf, that demonstrated impressive empirical gains for distributed training of large neural networks. In this paper, we build on this work to propose a new gradient quantization scheme and show that it has stronger theoretical guarantees than QSGD while matching or exceeding the empirical performance of the QSGDinf heuristic and of other compression methods.
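To illustrate the flavor of scheme the abstract refers to, below is a minimal sketch of unbiased stochastic quantization of a gradient vector onto nonuniform, exponentially spaced levels (0, 2^-(s-1), ..., 1/2, 1 applied to the L2-normalized magnitudes). The function name nonuniform_quantize, the parameter s, and this particular level set are illustrative assumptions for exposition, not necessarily the exact construction or encoding used in the paper.

    import numpy as np

    def nonuniform_quantize(v, s=3, rng=None):
        """Sketch: unbiased stochastic quantization onto nonuniform levels.

        Assumed level set: {0} U {2**-j : j = s-1, ..., 1, 0}, i.e.
        0, 2**-(s-1), ..., 1/2, 1 on the normalized magnitudes.
        """
        rng = np.random.default_rng() if rng is None else rng
        norm = np.linalg.norm(v)
        if norm == 0.0:
            return np.zeros_like(v), norm
        # Ascending, exponentially spaced levels in [0, 1].
        levels = np.concatenate(([0.0], 2.0 ** -np.arange(s - 1, -1, -1)))
        r = np.abs(v) / norm                       # normalized magnitudes in [0, 1]
        idx = np.searchsorted(levels, r, side="right") - 1
        idx = np.clip(idx, 0, len(levels) - 2)
        lo, hi = levels[idx], levels[idx + 1]
        # Round up with probability proportional to distance from the lower
        # level, so the quantizer is unbiased: E[q] = r.
        p_up = (r - lo) / (hi - lo)
        q = np.where(rng.random(r.shape) < p_up, hi, lo)
        return np.sign(v) * q, norm

    # Usage: quantize a gradient, then rescale on the receiving end.
    g = np.random.randn(10)
    q, norm = nonuniform_quantize(g, s=3)
    decoded = norm * q

In a data-parallel setup, each worker would transmit the norm, the signs, and the small level indices instead of full-precision gradients; the sketch only shows the quantization step, not the lossless encoding of the indices.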

Related research

08/16/2019  NUQSGD: Improved Communication Efficiency for Data-parallel SGD via Nonuniform Quantization
10/23/2020  Adaptive Gradient Quantization for Data-Parallel SGD
02/05/2023  Quantized Distributed Training of Large Models with Convergence Guarantees
05/29/2023  Global-QSGD: Practical Floatless Quantization for Distributed Learning with Theoretical Guarantees
09/27/2018  The Convergence of Sparsified Gradient Methods
10/24/2022  Adaptive Top-K in SGD for Communication-Efficient Distributed Learning
11/20/2022  Non-reversible Parallel Tempering for Deep Posterior Approximation