Empirical Analysis on Top-k Gradient Sparsification for Distributed Deep Learning in a Supercomputing Environment

09/18/2022
by Daegun Yoon, et al.

To train deep learning models faster, distributed training on multiple GPUs has become the dominant approach in recent years. However, communication bandwidth remains a major bottleneck in training performance. To improve overall training performance, recent works have proposed gradient sparsification methods that significantly reduce communication traffic. Most of them, such as Top-k gradient sparsification (Top-k SGD), require gradient sorting to select meaningful gradients. However, Top-k SGD offers only limited speedup of overall training because gradient sorting is significantly inefficient on GPUs. In this paper, we conduct experiments that demonstrate the inefficiency of Top-k SGD and provide insight into its low performance. Based on the observations from our empirical analysis, we plan to develop a high-performance gradient sparsification method as future work.
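For context, the sketch below illustrates the per-worker selection step that Top-k SGD relies on: each worker keeps only the k largest-magnitude gradient elements and would exchange just those values and indices instead of the dense gradient. This is a minimal illustrative example, not the paper's implementation; the function name, the 1% density ratio, and the use of PyTorch are assumptions.

```python
# Minimal sketch of Top-k gradient sparsification (illustrative, assumed PyTorch).
import torch

def topk_sparsify(grad: torch.Tensor, density: float = 0.01):
    """Keep only the k largest-magnitude gradient elements.

    Returns the selected values and their flat indices, which is what a
    sparse collective (e.g., allgather) would communicate instead of the
    dense tensor.
    """
    flat = grad.flatten()
    k = max(1, int(flat.numel() * density))
    # The top-k selection over |g| is where the sorting/selection cost
    # arises on GPUs, which is the inefficiency analyzed in the paper.
    _, indices = torch.topk(flat.abs(), k)
    return flat[indices], indices

# Example: sparsify a random "gradient" and report the traffic reduction.
g = torch.randn(1_000_000)
vals, idx = topk_sparsify(g, density=0.01)
print(f"sent {vals.numel()} of {g.numel()} elements")
```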

