MG-WFBP: Efficient Data Communication for Distributed Synchronous SGD Algorithms

11/27/2018
by Shaohuai Shi, et al.

Distributed synchronous stochastic gradient descent (SGD) is widely used to train deep neural networks on computer clusters. As computational power grows, network communication has become a limiting factor in system scalability. In this paper, we observe that many deep neural networks contain a large number of layers, each with only a small amount of data to communicate. Because merging several short communication tasks into a single one can reduce the overall communication time, we formulate an optimization problem that minimizes the training iteration time. We develop an optimal solution named merged-gradient WFBP (MG-WFBP) and implement it in our open-source deep learning platform B-Caffe. Experimental results on an 8-node GPU cluster with a 10GbE interconnect, together with trace-based simulations on a 64-node cluster, show that MG-WFBP achieves much better scaling efficiency than the existing WFBP and SyncEASGD methods.
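To make the merging intuition concrete, the following is a minimal Python sketch, not the authors' B-Caffe implementation, assuming the standard alpha-beta communication cost model in which an all-reduce of m bytes takes alpha + beta * m. The function names, parameter values, and the pairwise merge test are all illustrative.

```python
# Illustrative sketch of the gradient-merging intuition (assumed alpha-beta
# cost model, not the paper's actual algorithm or code): sending two messages
# back-to-back pays the startup latency alpha twice, while a merged message
# pays it once but must wait until both layers' gradients are ready.

def comm_time(nbytes: int, alpha: float, beta: float) -> float:
    """All-reduce time for one message under the alpha-beta model."""
    return alpha + beta * nbytes

def worth_merging(m1: int, m2: int, gap: float,
                  alpha: float = 50e-6, beta: float = 8e-9) -> bool:
    """Decide whether to merge two consecutive layers' gradient messages.

    m1, m2 -- gradient sizes (bytes) of the earlier- and later-ready layers
    gap    -- delay (seconds) until the second layer's gradients are ready,
              measured from when the first layer's gradients are ready
    """
    # Separate: the first message starts immediately; the second starts
    # once it is ready and the first has finished.
    finish_sep = max(comm_time(m1, alpha, beta), gap) + comm_time(m2, alpha, beta)
    # Merged: one message of m1 + m2 bytes starts after `gap`, paying the
    # startup cost alpha only once.
    finish_merged = gap + comm_time(m1 + m2, alpha, beta)
    return finish_merged <= finish_sep

# Example with these illustrative parameters: two 40 KB gradients that
# become ready 10 microseconds apart are worth merging, since saving one
# 50-microsecond startup outweighs the 10-microsecond wait.
print(worth_merging(40_000, 40_000, 10e-6))  # True
```

Under this kind of model, merging helps exactly when the saved startup overhead exceeds the extra waiting time it introduces, which is why networks with many small layers benefit the most.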

Related research

MG-WFBP: Merging Gradients Wisely for Efficient Communication in Distributed Deep Learning (12/18/2019)
Distributed synchronous stochastic gradient descent has been widely used...

Accelerating Distributed K-FAC with Smart Parallelism of Computing and Communication Tasks (07/14/2021)
Distributed training with synchronous stochastic gradient descent (SGD) ...

Modeling and Evaluation of Synchronous Stochastic Gradient Descent in Distributed Deep Learning on Multiple GPUs (05/10/2018)
With huge amounts of training data, deep learning has made great breakth...

swCaffe: a Parallel Framework for Accelerating Deep Learning Applications on Sunway TaihuLight (03/16/2019)
This paper reports our efforts on swCaffe, a highly efficient parallel f...

Scaling GRPC Tensorflow on 512 nodes of Cori Supercomputer (12/26/2017)
We explore scaling of the standard distributed Tensorflow with GRPC prim...

A Scalable Framework for Acceleration of CNN Training on Deeply-Pipelined FPGA Clusters with Weight and Workload Balancing (01/04/2019)
Deep Neural Networks (DNNs) have revolutionized numerous applications, b...

Soft Merging: A Flexible and Robust Soft Model Merging Approach for Enhanced Neural Network Performance (09/21/2023)
Stochastic Gradient Descent (SGD), a widely used optimization algorithm ...
