Doubly infinite residual networks: a diffusion process approach

07/07/2020
by Stefano Peluchetti, et al.

When a neural network's parameters are initialized i.i.d., the network exhibits undesirable forward and backward properties as the number of layers increases, e.g., vanishing dependency on the input and perfectly correlated outputs for any two inputs. To overcome these drawbacks, Peluchetti and Favaro (2020) considered fully connected residual networks (ResNets) with parameter distributions that shrink as the number of layers increases. In particular, they established an interplay between infinitely deep ResNets and solutions to stochastic differential equations, i.e., diffusion processes, showing that infinitely deep ResNets do not suffer from undesirable forward properties. In this paper, we review the forward-propagation results of Peluchetti and Favaro (2020), extending them to the setting of convolutional ResNets. Then, we study analogous backward-propagation results, which directly relate to the problem of training deep ResNets. Finally, we extend our study to the doubly infinite regime where both the network's width and depth grow unboundedly. Within this novel regime, the dynamics of quantities of interest converge, at initialization, to deterministic limits. This allows us to provide analytical expressions for inference, both in the case of weakly trained and fully trained networks. These results point to a limited expressive power of doubly infinite ResNets when the unscaled parameters are i.i.d. and the residual blocks are shallow.
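
To illustrate the depth scaling behind the diffusion limit, below is a minimal sketch (not the authors' code): a fully connected residual update x_{l+1} = x_l + W_{l+1} tanh(x_l) + b_{l+1} whose Gaussian parameters have variance proportional to 1/L, so the forward pass resembles an Euler-Maruyama discretization of a stochastic differential equation with step size 1/L. The activation, variances, and dimensions are illustrative assumptions, not the paper's exact construction.

import numpy as np

def resnet_forward(x0, depth, sigma_w=1.0, sigma_b=0.5, rng=None):
    # Fully connected ResNet with i.i.d. Gaussian parameters whose
    # variance shrinks as 1/depth (std ~ 1/sqrt(depth)); each block is
    # one Euler-Maruyama-like step of an underlying diffusion.
    rng = np.random.default_rng() if rng is None else rng
    d = x0.shape[0]
    x = x0.copy()
    for _ in range(depth):
        W = rng.normal(0.0, sigma_w / np.sqrt(depth), size=(d, d))
        b = rng.normal(0.0, sigma_b / np.sqrt(depth), size=d)
        x = x + W @ np.tanh(x) + b  # residual block
    return x

# With this scaling the output distribution stabilizes as depth grows,
# rather than collapsing or exploding as with unscaled i.i.d. initializations.
x0 = np.ones(16)
for L in (10, 100, 1000):
    out = resnet_forward(x0, L, rng=np.random.default_rng(0))
    print(L, np.linalg.norm(out))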

