Fully Decoupled Neural Network Learning Using Delayed Gradients

06/21/2019
by   Huiping Zhuang, et al.
2

Using the back-propagation (BP) to train neural networks requires a sequential passing of the activations and the gradients, which forces the network modules to work in a synchronous fashion. This has been recognized as the lockings (i.e., the forward, backward and update lockings) inherited from the BP. In this paper, we propose a fully decoupled training scheme using delayed gradients (FDG) to break all these lockings. The proposed method splits a neural network into multiple modules that are trained independently and asynchronously in different GPUs. We also introduce a gradient shrinking process to reduce the stale gradient effect caused by the delayed gradients. In addition, we prove that the proposed FDG algorithm guarantees a statistical convergence during training. Experiments are conducted by training deep convolutional neural networks to perform classification tasks on benchmark datasets. The proposed FDG is able to train very deep networks (>100 layers) and very large networks (>35 million parameters) with significant speed gains while outperforming the state-of-the-art methods and the standard BP.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
04/27/2018

Decoupled Parallel Backpropagation with Convergence Guarantee

Backpropagation algorithm is indispensable for the training of feedforwa...
research
05/14/2022

BackLink: Supervised Local Training with Backward Links

Empowered by the backpropagation (BP) algorithm, deep neural networks ha...
research
12/03/2020

Accumulated Decoupled Learning: Mitigating Gradient Staleness in Inter-Layer Model Parallelization

Decoupled learning is a branch of model parallelism which parallelizes t...
research
09/14/2018

Non-iterative recomputation of dense layers for performance improvement of DCNN

An iterative method of learning has become a paradigm for training deep ...
research
08/18/2023

Tensor-Compressed Back-Propagation-Free Training for (Physics-Informed) Neural Networks

Backward propagation (BP) is widely used to compute the gradients in neu...
research
03/01/2017

Understanding Synthetic Gradients and Decoupled Neural Interfaces

When training neural networks, the use of Synthetic Gradients (SG) allow...
research
12/02/2021

Target Propagation via Regularized Inversion

Target Propagation (TP) algorithms compute targets instead of gradients ...

Please sign up or login with your details

Forgot password? Click here to reset