Diversely Stale Parameters for Efficient Training of CNNs

09/05/2019
by An Xu, et al.

The backpropagation algorithm is the most popular algorithm for training neural networks today. However, it suffers from the forward-locking, backward-locking, and update-locking problems, especially when a neural network is so large that its layers must be distributed across multiple devices. Existing solutions either address only one of these locking problems or incur severe accuracy loss or memory explosion. Moreover, none of them considers the straggler problem among devices. In this paper, we propose Layer-wise Staleness and a novel efficient training algorithm, Diversely Stale Parameters (DSP), which addresses all of these challenges without accuracy loss or memory issues. We also analyze the convergence of DSP with two popular gradient-based methods and prove that both are guaranteed to converge to critical points for nonconvex problems. Finally, extensive experimental results on training deep convolutional neural networks demonstrate that our proposed DSP algorithm achieves significant training speedup with stronger robustness and better generalization than the compared methods.
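
As a rough illustration of the locking and staleness ideas in the abstract, below is a minimal toy sketch, not the authors' DSP algorithm: a two-stage linear pipeline in which each stage consumes activations and gradients that may be a few iterations old, so neither stage blocks on the other at every step. The staleness bound, the buffers, and the toy regression task (learning y = 2x) are all my own illustrative assumptions.

```python
# Toy sketch of layer-wise staleness (illustrative only, not the paper's DSP code):
# two pipeline "stages" exchange activations and gradients through bounded buffers,
# so each stage works with tensors that are a few iterations stale instead of waiting.
from collections import deque
import random

random.seed(0)
STALENESS = 2            # hypothetical per-layer staleness bound
w1, w2 = 0.1, 0.1        # stage-1 and stage-2 weights (one "layer" per device)
lr = 0.05

acts = deque()           # (x, activation) pairs waiting for stage 2
grads = deque()          # (x, dL/d_activation) pairs waiting for stage 1's update

for step in range(500):
    x = random.uniform(-1.0, 1.0)
    acts.append((x, w1 * x))                 # stage-1 forward, no waiting

    if len(acts) > STALENESS:                # stage 2 runs on a stale activation
        x_old, a_old = acts.popleft()
        y = w2 * a_old
        err = y - 2.0 * x_old                # target function: y = 2x
        grads.append((x_old, err * w2))      # gradient w.r.t. stage-1 output
        w2 -= lr * err * a_old               # stage-2 update (stale by construction)

    if len(grads) > STALENESS:               # stage 1 updates with a stale gradient
        x_old, g_a = grads.popleft()
        w1 -= lr * g_a * x_old

print(f"learned w1*w2 = {w1 * w2:.3f} (target 2.0)")
```

In real model-parallel training each stage would sit on its own device and the bounded buffers would cap how stale the reused activations and gradients can become; the paper's convergence analysis is what justifies tolerating that staleness.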

