Optimal signal propagation in ResNets through residual scaling

05/12/2023
by Kirsten Fischer et al.

Residual networks (ResNets) have significantly better trainability, and thus performance, than feed-forward networks at large depth: their skip connections facilitate signal propagation to deeper layers. In addition, previous works found that adding a scaling parameter to the residual branch further improves generalization performance. While these works empirically identified a particularly beneficial range of values for the scaling parameter, the origin of the associated performance improvement and its universality across network hyperparameters have yet to be understood. For feed-forward networks (FFNets), finite-size theories have yielded important insights into signal propagation and hyperparameter tuning. Here we derive a systematic finite-size theory for ResNets to study signal propagation and its dependence on the scaling of the residual branch. We obtain analytical expressions for the response function, a measure of the network's sensitivity to inputs, and show that for deep networks the empirically found values of the scaling parameter lie within the range of maximal sensitivity. Furthermore, we derive an analytical expression for the optimal scaling parameter that depends only weakly on other network hyperparameters, such as the weight variance, thereby explaining its universality across hyperparameters. Overall, this work provides a framework for theory-guided optimal scaling in ResNets and, more generally, a theoretical framework for studying ResNets at finite width.
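To make the setup concrete, the sketch below propagates a signal through a toy ResNet with a scaled residual branch and probes its sensitivity to an input perturbation with a finite difference, a crude empirical stand-in for the response function discussed in the abstract. The parameterization x^{l+1} = x^l + alpha * phi(W^l x^l), the tanh nonlinearity, and the sigma_w/sqrt(N) weight initialization are common mean-field conventions assumed here for illustration; they are not details taken from the paper.

```python
# Minimal sketch of a scaled residual branch (assumed parameterization,
# not the paper's exact one): x^{l+1} = x^l + alpha * phi(W^l x^l).
import numpy as np

rng = np.random.default_rng(0)

def scaled_residual_forward(x, weights, alpha, phi=np.tanh):
    """Propagate a signal through residual layers with branch scaling alpha."""
    for W in weights:
        x = x + alpha * phi(W @ x)
    return x

# Width N, depth L; weights drawn with variance sigma_w^2 / N, a standard
# mean-field convention (assumed here for illustration).
N, L, sigma_w, alpha = 128, 50, 1.0, 0.2
weights = [rng.normal(0.0, sigma_w / np.sqrt(N), size=(N, N)) for _ in range(L)]
x0 = rng.normal(size=N)

# Finite-difference input sensitivity: how strongly a small input
# perturbation is amplified at the output.
eps = 1e-3
dx = eps * rng.normal(size=N)
sensitivity = np.linalg.norm(
    scaled_residual_forward(x0 + dx, weights, alpha)
    - scaled_residual_forward(x0, weights, alpha)
) / np.linalg.norm(dx)
print(f"finite-difference sensitivity: {sensitivity:.3f}")
```

Scanning alpha in such a toy model and recording this sensitivity across depths gives a quick numerical counterpart to the sensitivity-maximizing scaling that the paper derives analytically.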


Related research

05/22/2018  Expectation propagation: a probabilistic view of Deep Feed Forward Networks
We present a statistical mechanics model of deep feed forward neural net...

07/20/2021  Edge of chaos as a guiding principle for modern neural network training
The success of deep neural networks in real-world problems has prompted ...

09/02/2022  Normalization effects on deep neural networks
We study the effect of normalization on the layers of deep neural networ...

10/13/2020  Unfolding recurrence by Green's functions for optimized reservoir computing
Cortical networks are strongly recurrent, and neurons have intrinsic tem...

12/24/2017  Mean Field Residual Networks: On the Edge of Chaos
We study randomly initialized residual networks using mean field theory ...

11/29/2020  Architectural Adversarial Robustness: The Case for Deep Pursuit
Despite their unmatched performance, deep neural networks remain suscept...

12/20/2014  Visual Scene Representations: Contrast, Scaling and Occlusion
We study the structure of representations, defined as approximations of ...
