Convert, compress, correct: Three steps toward communication-efficient DNN training

03/17/2022
by Zhong-Jing Chen, et al.

In this paper, we introduce a novel algorithm, CO3, for communication-efficient distributed Deep Neural Network (DNN) training. CO3 is a joint training/communication protocol that applies three processing steps to the network gradients: (i) quantization through floating-point conversion, (ii) lossless compression, and (iii) error correction. These three components are crucial for implementing distributed DNN training over rate-constrained links. The interplay of the three steps in processing the DNN gradients is carefully balanced to yield a robust and high-performance scheme. The performance of the proposed scheme is investigated through numerical evaluations on CIFAR-10.
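The abstract names the three steps but not their exact realization. The sketch below is a minimal illustration of how such a sender-side pipeline could look, assuming float16 as the floating-point conversion target, zlib as the lossless compressor, and an error-feedback memory as the error-correction mechanism. The GradientCodec class and all parameter choices are illustrative assumptions, not the authors' CO3 design.

    import zlib
    import numpy as np

    class GradientCodec:
        """Hypothetical three-step gradient codec: convert, compress, correct."""

        def __init__(self, shape, dtype=np.float16):
            self.dtype = dtype                          # low-precision target format (assumed)
            self.error = np.zeros(shape, dtype=np.float32)  # error-feedback memory (assumed)

        def encode(self, grad):
            # (iii) error correction: add back the quantization error carried
            # over from the previous round (error feedback)
            corrected = grad.astype(np.float32) + self.error
            # (i) convert to a lower-precision floating-point representation
            quantized = corrected.astype(self.dtype)
            # store the new quantization error for the next round
            self.error = corrected - quantized.astype(np.float32)
            # (ii) losslessly compress the quantized gradient bytes
            return zlib.compress(quantized.tobytes())

        def decode(self, payload):
            # receiver side: invert the compression and the conversion
            raw = zlib.decompress(payload)
            return np.frombuffer(raw, dtype=self.dtype).astype(np.float32)

    # Example round trip on a dummy gradient vector
    codec = GradientCodec(shape=(1024,))
    payload = codec.encode(np.random.randn(1024).astype(np.float32))
    recovered = codec.decode(payload)

In this sketch the only information sent over the rate-constrained link is the compressed payload, while the error-feedback memory keeps the quantization residual local so it can be re-injected into the next gradient.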


