Gear Training: A new way to implement high-performance model-parallel training

06/11/2018
by Hao Dong, et al.

Training deep neural networks usually requires tremendous computing resources, so many deep models are trained on large clusters rather than on a single machine or GPU. While most current research runs the whole model on every machine and coordinates updates with asynchronous stochastic gradient descent (ASGD), we present a new approach to parallel training: split the model and then train its different parts separately, at different speeds.
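The abstract only sketches the idea, so the snippet below is a minimal PyTorch sketch of what "training different parts of a split model at different speeds" could look like on a single device. The two-way split (part_a, part_b), the slow_update_every ratio, and the dummy data are illustrative assumptions, not the authors' actual implementation.

```python
# Illustrative sketch (not the paper's implementation): split a network into two
# parts and update them at different frequencies ("speeds").
import torch
import torch.nn as nn

part_a = nn.Sequential(nn.Linear(784, 256), nn.ReLU())  # "fast" part, updated every step
part_b = nn.Sequential(nn.Linear(256, 10))               # "slow" part, updated less often

opt_a = torch.optim.SGD(part_a.parameters(), lr=0.01)
opt_b = torch.optim.SGD(part_b.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

slow_update_every = 4  # hypothetical ratio between the two training speeds

for step in range(100):
    x = torch.randn(32, 784)            # dummy batch; replace with a real data loader
    y = torch.randint(0, 10, (32,))

    logits = part_b(part_a(x))          # forward pass through both parts
    loss = loss_fn(logits, y)
    loss.backward()                     # gradients flow into both parts

    opt_a.step()                        # the fast part is updated every step
    opt_a.zero_grad()

    if (step + 1) % slow_update_every == 0:
        opt_b.step()                    # the slow part applies its accumulated
        opt_b.zero_grad()               # gradients only every few steps
```

In a cluster setting, each part would live on a different machine and exchange activations and gradients over the network; here both parts share one process purely to keep the sketch self-contained.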
