Quantized Adam with Error Feedback

04/29/2020
by Congliang Chen, et al.

In this paper, we present a distributed variant of an adaptive stochastic gradient method for training deep neural networks in the parameter-server model. To reduce the communication cost between the workers and the server, we incorporate two types of quantization schemes, namely gradient quantization and weight quantization, into the proposed distributed Adam. In addition, to reduce the bias introduced by the quantization operations, we propose an error-feedback technique to compensate for the quantized gradient. Theoretically, in the stochastic nonconvex setting, we show that the distributed adaptive gradient method with gradient quantization and error feedback converges to a first-order stationary point, and that the distributed adaptive gradient method with weight quantization and error feedback converges to a point whose accuracy is related to the quantization level, in both the single-worker and multi-worker modes. Finally, we apply the proposed distributed adaptive gradient methods to train deep neural networks. Experimental results demonstrate the efficacy of our methods.
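
The abstract describes the method only at a high level. Below is a minimal, single-worker sketch of gradient quantization with error feedback wrapped around a standard Adam update. The sign-based quantizer, the error buffer `e`, and all function and variable names are illustrative assumptions for concreteness, not the paper's exact algorithm or notation.

```python
# Minimal single-worker sketch: Adam driven by a quantized gradient with
# error feedback. The quantizer and names are assumptions, not the paper's.
import numpy as np

def quantize(v):
    # Scaled sign quantizer: transmit only signs, scaled by the mean
    # absolute value so the magnitude is roughly preserved.
    return (np.linalg.norm(v, 1) / v.size) * np.sign(v)

def quantized_adam_step(w, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Error feedback: add the residual left over from the previous
    # quantization step before quantizing the current gradient.
    corrected = grad + state["e"]
    q = quantize(corrected)            # what a worker would send to the server
    state["e"] = corrected - q         # store the new quantization residual

    # Standard Adam moment updates, driven by the quantized gradient.
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * q
    state["v"] = beta2 * state["v"] + (1 - beta2) * q**2
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return w - lr * m_hat / (np.sqrt(v_hat) + eps)

# Example usage on a toy quadratic objective ||w||^2.
w = np.random.randn(10)
state = {"m": np.zeros_like(w), "v": np.zeros_like(w),
         "e": np.zeros_like(w), "t": 0}
for _ in range(100):
    grad = 2 * w
    w = quantized_adam_step(w, grad, state)
```

In a multi-worker parameter-server setting, each worker would keep its own error buffer and send its quantized, error-compensated gradient to the server, which aggregates them before the Adam update; the single-worker loop above only illustrates the error-feedback mechanism itself.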


