Convert, compress, correct: Three steps toward communication-efficient DNN training
In this paper, we introduce a novel algorithm, 𝖢𝖮_3, for communication-efficient distributed Deep Neural Network (DNN) training. 𝖢𝖮_3 is a joint training/communication protocol that applies three processing steps to the network gradients: (i) quantization through floating-point conversion, (ii) lossless compression, and (iii) error correction. These three components are crucial for implementing distributed DNN training over rate-constrained links. The interplay of the three steps is carefully balanced to yield a robust, high-performance scheme. The performance of the proposed scheme is evaluated numerically on CIFAR-10.
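The three-step gradient pipeline can be illustrated with a minimal, standard-library-only Python sketch. This is not the 𝖢𝖮_3 implementation. It rests on three assumptions. Step (i) is a cast to IEEE 754 half precision using `struct`'s `'e'` format. Step (ii) is DEFLATE via `zlib`. Step (iii) is interpreted as error feedback: each worker stores the quantization residual and adds it to its next gradient. The names `convert`, `compress`, `decode` and `Worker` are hypothetical.

```python
import struct
import zlib
from typing import List

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def convert(values: List[float]) -> bytes:
    """Step (i), assumed: quantize by casting each value to float16."""
    # Clip to the float16 range; struct.pack raises OverflowError otherwise.
    clipped = [max(-FP16_MAX, min(FP16_MAX, v)) for v in values]
    return struct.pack(f"<{len(clipped)}e", *clipped)


def compress(payload: bytes) -> bytes:
    """Step (ii), assumed: lossless compression with DEFLATE (zlib)."""
    return zlib.compress(payload, 9)


def decode(message: bytes, n: int) -> List[float]:
    """Receiver side: decompress losslessly, then unpack the n float16 values."""
    raw = zlib.decompress(message)
    return list(struct.unpack(f"<{n}e", raw))


class Worker:
    """Hypothetical worker; step (iii) is assumed here to be error feedback."""

    def __init__(self, dim: int) -> None:
        self.residual = [0.0] * dim  # quantization error carried to the next round

    def encode(self, grad: List[float]) -> bytes:
        # Add the error left over from the previous round before quantizing.
        corrected = [g + r for g, r in zip(grad, self.residual)]
        message = compress(convert(corrected))
        # Keep whatever the float16 cast lost so it can be sent later.
        decoded = decode(message, len(corrected))
        self.residual = [c - d for c, d in zip(corrected, decoded)]
        return message


if __name__ == "__main__":
    worker = Worker(dim=4)
    msg = worker.encode([1e-3, -2.5e-4, 0.1, 3.0])
    print(len(msg), decode(msg, 4), worker.residual)
```

The residual is computed from the decoded message rather than the float16 bytes directly. As a result, the sketch stays correct if the lossless stage is swapped for a different codec. For very short vectors, zlib's header overhead can outweigh its savings, so step (ii) only pays off on realistically sized gradients.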