Scaling ResNets in the Large-depth Regime

06/14/2022
by Pierre Marion et al.

Deep ResNets are recognized for achieving state-of-the-art results in complex machine learning tasks. However, the remarkable performance of these architectures relies on a training procedure that must be carefully crafted to avoid vanishing or exploding gradients, particularly as the depth L increases. No consensus has been reached on how to mitigate this issue, although a widely discussed strategy consists of scaling the output of each layer by a factor α_L. We show in a probabilistic setting that with standard i.i.d. initializations, the only non-trivial dynamics arise for α_L = 1/√L (other choices lead either to explosion or to an identity mapping). This scaling factor corresponds in the continuous-time limit to a neural stochastic differential equation, contrary to the widespread interpretation that deep ResNets are discretizations of neural ordinary differential equations. In the latter regime, by contrast, stability is obtained with specific correlated initializations and α_L = 1/L. Our analysis suggests a strong interplay between the scaling and the regularity of the weights as a function of the layer index. Finally, in a series of experiments, we exhibit a continuous range of regimes driven by these two parameters, which jointly impact performance before and after training.
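To make the role of the scaling factor concrete, here is a minimal numerical sketch (not the authors' code) of the residual recursion h_{k+1} = h_k + α_L · f(h_k; W_k) with standard i.i.d. Gaussian weights, comparing α_L ∈ {1, 1/√L, 1/L}. The function name and the tanh residual map are illustrative assumptions; the point is only the qualitative behavior the abstract describes: explosion, non-trivial dynamics, and collapse to the identity.

```python
import numpy as np

def resnet_forward(x, weights, alpha):
    """Toy residual recursion: h_{k+1} = h_k + alpha * tanh(W_k @ h_k)."""
    h = x.copy()
    for W in weights:
        h = h + alpha * np.tanh(W @ h)
    return h

rng = np.random.default_rng(0)
d, L = 64, 1000                      # width and depth
x = rng.standard_normal(d)
x /= np.linalg.norm(x)               # unit-norm input, so ||h_0|| = 1
# Standard i.i.d. Gaussian initialization, one weight matrix per layer.
weights = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(L)]

for alpha, label in [(1.0, "alpha = 1"),
                     (L ** -0.5, "alpha = 1/sqrt(L)"),
                     (1.0 / L, "alpha = 1/L")]:
    h = resnet_forward(x, weights, alpha)
    print(f"{label:18s} ||h_L - h_0|| = {np.linalg.norm(h - x):8.3f}")

# Heuristically, the L layer increments behave like independent steps of
# size alpha, so their sum scales like alpha * sqrt(L): alpha = 1 blows up
# with depth, alpha = 1/L shrinks toward the identity map, and the critical
# choice alpha = 1/sqrt(L) yields an O(1), non-degenerate output (the
# neural SDE regime described in the abstract).
```

In the neural ODE regime, by contrast, the weights would not be i.i.d. across layers but sampled as a smooth function of the layer index (e.g., W_k = W(k/L) for a fixed smooth path W), which is the kind of correlated initialization the abstract pairs with α_L = 1/L.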
