Convert, compress, correct: Three steps toward communication-efficient DNN training

03/17/2022
by Zhong-Jing Chen, et al.

In this paper, we introduce a novel algorithm, CO3, for communication-efficient distributed Deep Neural Network (DNN) training. CO3 is a joint training/communication protocol that encompasses three processing steps for the network gradients: (i) quantization through floating-point conversion, (ii) lossless compression, and (iii) error correction. These three components are crucial for distributed DNN training over rate-constrained links. The interplay of the three steps in processing the DNN gradients is carefully balanced to yield a robust and high-performance scheme. The performance of the proposed scheme is investigated through numerical evaluations on CIFAR-10.
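To make the three-step structure concrete, the sketch below illustrates a generic convert/compress/correct gradient pipeline of the kind described in the abstract. It is not the authors' CO3 implementation: the choice of float16 as the target floating-point format, zlib as the lossless compressor, and an error-feedback residual as the correction mechanism are all assumptions made here for illustration.

```python
# Illustrative convert -> compress -> correct gradient pipeline.
# NOTE: the float format (float16), compressor (zlib), and error-feedback
# correction below are assumptions, not the CO3 protocol from the paper.
import zlib
import numpy as np


class GradientCodec:
    def __init__(self, shape):
        # Residual memory holding the accumulated quantization error
        # (error-feedback style correction).
        self.residual = np.zeros(shape, dtype=np.float32)

    def encode(self, grad):
        """Correct, convert, and losslessly compress a gradient tensor."""
        corrected = grad + self.residual                # (iii) apply stored error
        quantized = corrected.astype(np.float16)        # (i) floating-point conversion
        self.residual = corrected - quantized.astype(np.float32)  # update error memory
        return zlib.compress(quantized.tobytes())       # (ii) lossless compression

    @staticmethod
    def decode(payload, shape):
        """Receiver-side reconstruction of the transmitted gradient."""
        raw = zlib.decompress(payload)
        return np.frombuffer(raw, dtype=np.float16).reshape(shape).astype(np.float32)


# Usage: one worker compressing a gradient before sending it over a rate-constrained link.
grad = np.random.randn(1024).astype(np.float32)
codec = GradientCodec(grad.shape)
payload = codec.encode(grad)
recovered = GradientCodec.decode(payload, grad.shape)
print(f"compressed bytes: {len(payload)}, max error: {np.abs(grad - recovered).max():.4f}")
```

In this toy version, the residual carried between rounds plays the role of error correction: the part of the gradient lost to the floating-point conversion in one round is re-added before quantizing the next one, so the error does not accumulate over training.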
