NUQSGD: Improved Communication Efficiency for Data-parallel SGD via Nonuniform Quantization

08/16/2019
by   Ali Ramezani-Kebrya, et al.
4

As the size and complexity of models and datasets grow, so does the need for communication-efficient variants of stochastic gradient descent that can be deployed on clusters to perform model fitting in parallel. Alistarh et al. (2017) describe two variants of data-parallel SGD that quantize and encode gradients to lessen communication costs. For the first variant, QSGD, they provide strong theoretical guarantees. For the second variant, which we call QSGDinf, they demonstrate impressive empirical gains for distributed training of large neural networks. Building on their work, we propose an alternative scheme for quantizing gradients and show that it yields stronger theoretical guarantees than exist for QSGD while matching the empirical performance of QSGDinf.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
04/28/2021

NUQSGD: Provably Communication-efficient Data-parallel SGD via Nonuniform Quantization

As the size and complexity of models and datasets grow, so does the need...
research
01/13/2021

Improved Hierarchical Clustering on Massive Datasets with Broad Guarantees

Hierarchical clustering is a stronger extension of one of today's most i...
research
06/12/2020

O(1) Communication for Distributed SGD through Two-Level Gradient Averaging

Large neural network models present a hefty communication challenge to d...
research
05/29/2023

Global-QSGD: Practical Floatless Quantization for Distributed Learning with Theoretical Guarantees

Efficient distributed training is a principal driver of recent advances ...
research
10/23/2020

Adaptive Gradient Quantization for Data-Parallel SGD

Many communication-efficient variants of SGD use gradient quantization s...
research
09/27/2018

The Convergence of Sparsified Gradient Methods

Distributed training of massive machine learning models, in particular d...
research
10/24/2014

Median Selection Subset Aggregation for Parallel Inference

For massive data sets, efficient computation commonly relies on distribu...

Please sign up or login with your details

Forgot password? Click here to reset