DQ-SGD: Dynamic Quantization in SGD for Communication-Efficient Distributed Learning

07/30/2021
by Guangfeng Yan, et al.

Gradient quantization is an emerging technique for reducing communication costs in distributed learning. Existing gradient quantization algorithms often rely on engineering heuristics or empirical observations and lack a systematic approach to dynamically quantizing gradients. This paper addresses this issue by proposing a novel dynamically quantized SGD (DQ-SGD) framework that adjusts the quantization scheme at each gradient descent step by exploiting the trade-off between communication cost and convergence error. We derive an upper bound, tight in some cases, on the convergence error for a restricted family of quantization schemes and loss functions. We design our DQ-SGD algorithm by minimizing the communication cost under convergence-error constraints. Finally, through extensive experiments on large-scale natural language processing and computer vision tasks on the AG-News, CIFAR-10, and CIFAR-100 datasets, we demonstrate that our quantization scheme achieves a better trade-off between communication cost and learning performance than other state-of-the-art gradient quantization methods.
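The abstract does not spell out the bound-derived bit-allocation rule, so the sketch below only illustrates the general idea of per-step dynamic quantization in distributed SGD: an unbiased stochastic uniform quantizer whose bit-width can change every iteration. The names `bits_for_step`, `dq_sgd`, and `workers_grads_fn` are hypothetical placeholders, not the paper's API; in particular `bits_for_step` is a simple heuristic schedule standing in for the allocation obtained from the convergence-error bound.

```python
import numpy as np

def stochastic_quantize(g, bits):
    """Unbiased stochastic uniform quantization of a gradient vector to `bits` bits.

    Entries are scaled by the max magnitude and randomly rounded to one of the
    two nearest quantization levels, so the quantized gradient equals g in expectation.
    """
    levels = 2 ** bits - 1
    scale = np.max(np.abs(g)) + 1e-12
    normalized = np.abs(g) / scale * levels           # values in [0, levels]
    lower = np.floor(normalized)
    prob = normalized - lower                         # probability of rounding up
    rounded = lower + (np.random.rand(*g.shape) < prob)
    return np.sign(g) * rounded / levels * scale

def bits_for_step(t, total_steps, b_min=2, b_max=8):
    """Hypothetical bit schedule: fewer bits early, more bits late.

    Placeholder for the paper's rule, which instead minimizes communication
    cost subject to the derived convergence-error constraint.
    """
    frac = t / max(total_steps - 1, 1)
    return int(round(b_min + frac * (b_max - b_min)))

def dq_sgd(workers_grads_fn, w0, lr=0.1, total_steps=100):
    """Distributed SGD loop: each worker sends a quantized gradient whose
    precision is chosen per step; the server averages and updates."""
    w = w0.copy()
    for t in range(total_steps):
        bits = bits_for_step(t, total_steps)
        grads = [stochastic_quantize(g, bits) for g in workers_grads_fn(w)]
        w -= lr * np.mean(grads, axis=0)
    return w
```

In the actual DQ-SGD algorithm, the per-step bit-width would come from the optimization described in the abstract (minimum communication cost under the convergence-error bound) rather than from a fixed schedule.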


Related research

10/24/2022
Adaptive Top-K in SGD for Communication-Efficient Distributed Learning
Distributed stochastic gradient descent (SGD) with gradient compression ...

04/13/2022
Joint Coreset Construction and Quantization for Distributed Machine Learning
Coresets are small, weighted summaries of larger datasets, aiming at pro...

06/21/2018
Error Compensated Quantized SGD and its Applications to Large-scale Distributed Optimization
Large-scale distributed optimization is of great importance in various a...

02/25/2020
Optimal Gradient Quantization Condition for Communication-Efficient Distributed Training
The communication of gradients is costly for training deep neural networ...

10/23/2020
Adaptive Gradient Quantization for Data-Parallel SGD
Many communication-efficient variants of SGD use gradient quantization s...

05/20/2021
Towards Quantized Model Parallelism for Graph-Augmented MLPs Based on Gradient-Free ADMM framework
The Graph Augmented Multi-layer Perceptron (GA-MLP) model is an attracti...

08/11/2022
Quantized Adaptive Subgradient Algorithms and Their Applications
Data explosion and an increase in model size drive the remarkable advanc...