
On the Utility of Gradient Compression in Distributed Training Systems

by Saurabh Agarwal, et al.

Rapid growth in data sets and the scale of neural network architectures have rendered distributed training a necessity. A rich body of prior work has highlighted the existence of communication bottlenecks in synchronous data-parallel training. To alleviate these bottlenecks, the machine learning community has largely focused on developing gradient and model compression methods. In parallel, the systems community has adopted several High Performance Computing (HPC) techniques to speed up distributed training. In this work, we evaluate the efficacy of gradient compression methods and compare their scalability with optimized implementations of synchronous data-parallel SGD. Surprisingly, we observe that due to computation overheads introduced by gradient compression, the net speedup over vanilla data-parallel training is marginal, if not negative. We conduct an extensive investigation to identify the root causes of this phenomenon, and offer a performance model that can be used to identify the benefits of gradient compression for a variety of system setups. Based on our analysis, we propose a list of desirable properties that gradient compression methods should satisfy in order for them to provide a meaningful end-to-end speedup.
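To make the trade-off in the abstract concrete, here is a minimal back-of-the-envelope sketch in Python. It is not the performance model from the paper. It assumes the textbook ring all-reduce cost estimate, no overlap between computation and communication, and a compressed gradient that can still be all-reduced (many compression schemes cannot). All function names, parameters, and numbers below are hypothetical and chosen only for illustration.

```python
def ring_allreduce_seconds(num_bytes, workers, bandwidth_bytes_per_s, latency_s=0.0):
    """Textbook ring all-reduce estimate: 2(p-1) steps, each sending n/p bytes."""
    if workers < 2:
        return 0.0  # a single worker has nothing to exchange
    steps = 2 * (workers - 1)
    return steps * (latency_s + (num_bytes / workers) / bandwidth_bytes_per_s)


def iteration_seconds(compute_s, grad_bytes, workers, bandwidth_bytes_per_s,
                      compression_ratio=1.0, codec_overhead_s=0.0):
    """Per-iteration time, assuming compute, encode/decode, and communication
    run one after another (an assumption; real systems often overlap them)."""
    comm_s = ring_allreduce_seconds(grad_bytes / compression_ratio,
                                    workers, bandwidth_bytes_per_s)
    return compute_s + codec_overhead_s + comm_s


def compression_speedup(compute_s, grad_bytes, workers, bandwidth_bytes_per_s,
                        compression_ratio, codec_overhead_s):
    """Ratio of baseline time to compressed time; values below 1 are a slowdown."""
    baseline = iteration_seconds(compute_s, grad_bytes, workers,
                                 bandwidth_bytes_per_s)
    compressed = iteration_seconds(compute_s, grad_bytes, workers,
                                   bandwidth_bytes_per_s,
                                   compression_ratio, codec_overhead_s)
    return baseline / compressed


if __name__ == "__main__":
    # Hypothetical setup: 100 MB of gradients, 8 workers, 10 Gbit/s links,
    # 0.1 s of compute per iteration, 10x compression.
    common = dict(compute_s=0.1, grad_bytes=100e6, workers=8,
                  bandwidth_bytes_per_s=1.25e9, compression_ratio=10.0)
    for overhead in (0.0, 0.05, 0.15):
        s = compression_speedup(codec_overhead_s=overhead, **common)
        print(f"codec overhead {overhead:.2f} s -> speedup {s:.2f}x")
```

Under these assumed numbers, a 10x smaller gradient helps only while the encode/decode cost stays below the communication time it saves; once the codec overhead exceeds that saving, the ratio drops below 1, which is the kind of marginal or negative speedup the abstract describes.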




CD-SGD: Distributed Stochastic Gradient Descent with Compression and Delay Compensation

Communication overhead is the key challenge for distributed training. Gr...

GraVAC: Adaptive Compression for Communication-Efficient Distributed DL Training

Distributed data-parallel (DDP) training improves overall application th...

Quantization for Distributed Optimization

Massive amounts of data have led to the training of large-scale machine ...

Accordion: Adaptive Gradient Communication via Critical Learning Regime Identification

Distributed model training suffers from communication bottlenecks due to...

Is Network the Bottleneck of Distributed Training?

Recently there has been a surge of research on improving the communicati...

Pufferfish: Communication-efficient Models At No Extra Cost

To mitigate communication overheads in distributed model training, sever...

On Efficient Constructions of Checkpoints

Efficient construction of checkpoints/snapshots is a critical tool for t...