Experiments on Parallel Training of Deep Neural Network using Model Averaging

07/05/2015
by Hang Su, et al.

In this work we apply model averaging to parallel training of deep neural networks (DNNs). Data is partitioned and distributed to different nodes for local model updates, and models are averaged across nodes every few minibatches. We use multiple GPUs for data parallelization and the Message Passing Interface (MPI) for communication between nodes, which allows us to perform model averaging frequently without losing much time on communication. We investigate the effectiveness of Natural Gradient Stochastic Gradient Descent (NG-SGD) and Restricted Boltzmann Machine (RBM) pretraining for parallel training in the model-averaging framework, and explore the best setups in terms of learning rate schedules, averaging frequencies, and minibatch sizes. We show that NG-SGD and RBM pretraining benefit parameter-averaging-based model training. On the 300-hour Switchboard dataset, a 9.3-fold speedup is achieved using 16 GPUs and a 17-fold speedup using 32 GPUs, with limited loss in decoding accuracy.
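To make the training scheme concrete, here is a minimal sketch of the model-averaging loop described in the abstract, assuming mpi4py and a flattened NumPy parameter vector. The names (train_minibatch, AVG_INTERVAL) and the placeholder update are illustrative assumptions, not code from the paper or from any speech toolkit.

```python
# Sketch: each MPI rank trains on its own data shard and the ranks
# average their parameters every few minibatches via Allreduce.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
size = comm.Get_size()   # number of workers (e.g. one per GPU)
rank = comm.Get_rank()

params = np.zeros(1_000_000, dtype=np.float32)  # flattened DNN weights
AVG_INTERVAL = 4                                # average every few minibatches


def train_minibatch(params, rank, step):
    """Placeholder for one local SGD (or NG-SGD) update on this worker's data shard."""
    rng = np.random.default_rng(rank * 100_000 + step)
    grad = rng.standard_normal(params.shape, dtype=np.float32)  # stand-in gradient
    return params - 0.001 * grad


for step in range(100):
    params = train_minibatch(params, rank, step)      # local update, local data
    if (step + 1) % AVG_INTERVAL == 0:
        # Element-wise average across all workers: sum, then divide by world size.
        avg = np.empty_like(params)
        comm.Allreduce(params, avg, op=MPI.SUM)
        params = avg / size
```

Run under an MPI launcher, e.g. `mpirun -np 16 python train.py`; the averaging interval trades communication cost against how far the local models are allowed to drift apart.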


