Stochastic Training of Residual Networks: a Differential Equation Viewpoint

12/01/2018
by   Qi Sun, et al.

In recent years, significant attention has been paid to the stochastic training of artificial neural networks, which is known to be an effective regularization approach for improving the generalization capability of trained models. In this work, the method of modified equations is applied to show that the residual network and its variants with noise injection can be regarded as weak approximations of stochastic differential equations. This observation enables us to bridge the stochastic training process with the optimal control of backward Kolmogorov equations. It not only offers a novel perspective on the effects of regularization from the loss-landscape viewpoint but also sheds light on the design of more reliable and efficient stochastic training strategies. As an example, we propose a new way to utilize Bernoulli dropout within the plain residual network architecture and conduct experiments on a real-world image classification task to substantiate our theoretical findings.
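To make the abstract's central observation concrete, the sketch below (an illustrative assumption, not the paper's code) writes one residual update with Bernoulli dropout on the residual branch, `x_{n+1} = x_n + h * (m / p) * f(x_n)` with `m ~ Bernoulli(p)`. The `h`, `keep_prob`, and `residual_step` names are hypothetical; the point is that the update is unbiased, so in expectation it recovers the deterministic residual step, while the Bernoulli perturbation injects the noise that the paper interprets, via modified equations, as a weak approximation of a stochastic differential equation.

```python
import numpy as np

def residual_step(x, f, h=0.1, keep_prob=0.8, rng=None):
    """One stochastic residual update: x + h * (m / p) * f(x),
    where m ~ Bernoulli(p) drops the residual branch at random.
    Dividing by p makes the update unbiased (inverted dropout),
    so E[update] equals the deterministic step x + h * f(x)."""
    if rng is None:
        rng = np.random.default_rng()
    m = rng.binomial(1, keep_prob, size=x.shape)
    return x + h * (m / keep_prob) * f(x)

# Illustration: averaging many stochastic steps recovers the
# dropout-free residual step, while individual steps are noisy.
rng = np.random.default_rng(0)
f = lambda x: -x          # toy residual branch
x0 = np.ones(5)
samples = np.stack([residual_step(x0, f, rng=rng) for _ in range(20000)])
print(np.allclose(samples.mean(axis=0), x0 + 0.1 * f(x0), atol=1e-2))  # prints True
```

Read with a small step size `h`, this is the Euler-type discretization whose modified equation the paper analyzes; the Bernoulli mask supplies the diffusion term of the limiting SDE.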


Related research

- Beyond Finite Layer Neural Networks: Bridging Deep Architectures and Numerical Differential Equations (10/27/2017). In our work, we bridge deep neural network design with numerical differe...
- DiffEqFlux.jl - A Julia Library for Neural Differential Equations (02/06/2019). DiffEqFlux.jl is a library for fusing neural networks and differential e...
- Neural SDE: Stabilizing Neural ODE Networks with Stochastic Noise (06/05/2019). Neural Ordinary Differential Equation (Neural ODE) has been proposed as ...
- Stochastic Modified Equations and Dynamics of Dropout Algorithm (05/25/2023). Dropout is a widely utilized regularization technique in the training of...
- Scaling Properties of Deep Residual Networks (05/25/2021). Residual networks (ResNets) have displayed impressive results in pattern...
- Penalty and Augmented Lagrangian Methods for Layer-parallel Training of Residual Networks (09/03/2020). Algorithms for training residual networks (ResNets) typically require fo...
