Penalty and Augmented Lagrangian Methods for Layer-parallel Training of Residual Networks

09/03/2020
by Qi Sun, et al.

Algorithms for training residual networks (ResNets) typically require a forward pass of the data followed by backpropagation of the loss gradient to perform parameter updates, which can take many hours or even days for networks with hundreds of layers. Inspired by penalty and augmented Lagrangian methods, this work proposes a layer-parallel training algorithm that overcomes the scalability barrier caused by the serial nature of forward-backward propagation in deep residual learning. Moreover, by viewing the supervised classification task as a numerical discretization of a terminal control problem, we bridge the concept of synthetic gradients for decoupling backpropagation with the parareal method for solving differential equations, which not only offers a novel perspective on the design of synthetic loss functions but also performs parameter updates with reduced storage overhead. Experiments on a preliminary example demonstrate that the proposed algorithm achieves testing accuracy comparable to, or even better than, that of full serial backpropagation, while the enabled layer-parallelism provides a speedup over traditional layer-serial training methods.
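The decoupling idea can be made concrete through the dynamical-systems view the abstract alludes to: each residual block computes x_{k+1} = x_k + f(x_k, theta_k), a forward-Euler step of the ODE dx/dt = f(x, theta), and the classification loss plays the role of a terminal cost on the final state. Introducing an auxiliary variable u for the hidden state at a stage boundary and penalizing the mismatch (rho/2)||stage1(x) - u||^2 removes the data dependence between stages, so their backward passes can run in parallel. Below is a minimal sketch of this quadratic-penalty decoupling, not the authors' implementation: the two-stage split, the names Stage, rho, and u, and the use of PyTorch are all illustrative assumptions, and a fixed toy batch is used for simplicity (with minibatches, one auxiliary variable per sample would be needed).

```python
# Minimal sketch (assumed setup, not the paper's code) of quadratic-penalty
# layer-parallel training for a ResNet split into two stages.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Stage(nn.Module):
    """A few residual blocks: x <- x + f(x), i.e. forward-Euler steps."""
    def __init__(self, dim, depth):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.Tanh()) for _ in range(depth)
        )
    def forward(self, x):
        for f in self.blocks:
            x = x + f(x)
        return x

torch.manual_seed(0)
dim, depth, n_classes, rho = 16, 4, 10, 10.0
stage1, stage2, head = Stage(dim, depth), Stage(dim, depth), nn.Linear(dim, n_classes)

x = torch.randn(128, dim)                # toy input batch (illustrative)
y = torch.randint(0, n_classes, (128,))  # toy labels

# Auxiliary variable for the hidden state at the stage interface; it is
# optimized jointly with the weights instead of being computed serially.
u = stage1(x).detach().clone().requires_grad_(True)

opt = torch.optim.Adam(
    [*stage1.parameters(), *stage2.parameters(), *head.parameters(), u], lr=1e-3
)

for step in range(200):
    opt.zero_grad()
    # The penalty couples stage1 only to u; the terminal loss couples
    # stage2 and head only to u. The two backward sweeps therefore touch
    # disjoint computation graphs and could run in parallel across devices.
    penalty = 0.5 * rho * (stage1(x) - u).pow(2).mean()
    terminal = F.cross_entropy(head(stage2(u)), y)
    (penalty + terminal).backward()
    opt.step()
```

An augmented Lagrangian variant of this sketch would add a multiplier term <lambda, stage1(x) - u> to the objective and update lambda <- lambda + rho * (stage1(x) - u) after each optimizer step, which typically makes training less sensitive to the choice of rho.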


Related research

12/10/2021 · Layer-Parallel Training of Residual Networks with Auxiliary-Variable Networks
Gradient-based methods for the distributed training of residual networks...

12/11/2018 · Layer-Parallel Training of Deep Residual Neural Networks
Residual neural networks (ResNets) are a promising class of deep neural ...

07/14/2020 · Layer-Parallel Training with GPU Concurrency of Deep Residual Neural Networks via Nonlinear Multigrid
A Multigrid Full Approximation Storage algorithm for solving Deep Residu...

01/27/2023 · Deep Residual Compensation Convolutional Network without Backpropagation
PCANet and its variants provided good accuracy results for classificatio...

12/01/2018 · Stochastic Training of Residual Networks: a Differential Equation Viewpoint
During the last few years, significant attention has been paid to the st...

06/04/2018 · Backdrop: Stochastic Backpropagation
We introduce backdrop, a flexible and simple-to-implement method, intuit...

07/22/2022 · Layer-Wise Partitioning and Merging for Efficient and Scalable Deep Learning
Deep Neural Network (DNN) models are usually trained sequentially from o...
