ByteComp: Revisiting Gradient Compression in Distributed Training

05/28/2022
by Zhuang Wang, et al.

Gradient compression (GC) is a promising approach to addressing the communication bottleneck in distributed deep learning (DDL). However, it is challenging to find the optimal compression strategy for applying GC to DDL because of the intricate interactions among tensors. To fully unleash the benefits of GC, two questions must be addressed: 1) How to express all compression strategies and the corresponding interactions among tensors of any DDL training job? 2) How to quickly select a near-optimal compression strategy? In this paper, we propose ByteComp to answer these questions. It first designs a decision tree abstraction to express all the compression strategies and develops empirical models to timeline tensor computation, communication, and compression, enabling ByteComp to derive the intricate interactions among tensors. It then designs a compression decision algorithm that analyzes tensor interactions to eliminate and prioritize strategies and optimally offloads compression to CPUs. Experimental evaluations show that ByteComp can improve the training throughput over the state-of-the-art compression-enabled system by up to 77%. The time needed to select the compression strategy is measured in milliseconds, and the selected strategy is only a few percent from optimal.
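To illustrate the idea behind a decision tree abstraction over per-tensor compression strategies, the following minimal sketch (not ByteComp's actual implementation) treats each tensor as a node with a few choices (no compression, or a sparsification algorithm run on GPU or CPU) and scores complete assignments with a toy cost model. All names (`TensorSpec`, `estimate_time`, `best_strategy`) and the numeric constants are hypothetical; ByteComp's contribution is precisely avoiding the brute-force search shown here by eliminating and prioritizing strategies.

```python
# Minimal sketch of a per-tensor compression-strategy search (illustrative only).
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class TensorSpec:
    name: str
    size_mb: float


# Per-tensor decisions: (compression algorithm, device running the compression).
CHOICES = [("none", None), ("topk", "gpu"), ("topk", "cpu")]


def estimate_time(tensor: TensorSpec, choice) -> float:
    """Toy cost model: communication time plus compression overhead (assumed numbers)."""
    algo, device = choice
    bandwidth_mb_per_ms = 1.0  # assumed network bandwidth
    if algo == "none":
        return tensor.size_mb / bandwidth_mb_per_ms
    comm = 0.05 * tensor.size_mb / bandwidth_mb_per_ms  # ~20x less traffic after compression
    overhead = 0.02 * tensor.size_mb if device == "gpu" else 0.05 * tensor.size_mb
    return comm + overhead


def best_strategy(tensors):
    """Exhaustively walk the decision tree; feasible only for tiny models,
    which is why pruning and prioritization are needed in practice."""
    best, best_cost = None, float("inf")
    for assignment in product(CHOICES, repeat=len(tensors)):
        cost = sum(estimate_time(t, c) for t, c in zip(tensors, assignment))
        if cost < best_cost:
            best, best_cost = assignment, cost
    return best, best_cost


if __name__ == "__main__":
    tensors = [TensorSpec("fc1.weight", 64.0), TensorSpec("fc2.weight", 16.0)]
    strategy, cost = best_strategy(tensors)
    for t, (algo, device) in zip(tensors, strategy):
        print(f"{t.name}: {algo}" + (f" on {device}" if device else ""))
    print(f"estimated time: {cost:.2f} ms")
```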
