The Loss Surface of Residual Networks: Ensembles and the Role of Batch Normalization

11/08/2016
by   Etai Littwin, et al.

Deep Residual Networks outperform conventional networks of the same depth and remain trainable at extreme depths. It has recently been shown that Residual Networks behave like ensembles of relatively shallow networks. We show that these ensembles are dynamic: while initially the virtual ensemble is concentrated at depths lower than half the network's depth, it becomes progressively deeper as training proceeds. The main mechanism that controls this dynamic ensemble behavior is the scaling introduced, e.g., by the Batch Normalization technique. We explain this behavior and demonstrate the driving force behind it. As the main tool in our analysis, we employ generalized spin glass models, which we also use to study the number of critical points in the optimization of Residual Networks.
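The ensemble view above can be illustrated with a small counting sketch. It is not the paper's spin glass model. It assumes only that a residual network with n blocks contains C(n, k) paths that pass through exactly k residual branches, and, as a simplifying assumption made here for illustration, that each residual branch multiplies a path's contribution by a single scalar `scale`. The function names and the `scale` parameter are hypothetical.

```python
from math import comb


def path_depth_distribution(n_blocks, scale=1.0):
    """Normalized weight of paths of each depth k = 0..n_blocks.

    Assumption (illustrative only): a path through k residual branches
    contributes in proportion to scale**k, and there are C(n_blocks, k)
    such paths.
    """
    weights = [comb(n_blocks, k) * scale**k for k in range(n_blocks + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def mean_depth(dist):
    """Average path depth under the given distribution."""
    return sum(k * p for k, p in enumerate(dist))


if __name__ == "__main__":
    n = 10
    for s in (0.5, 1.0, 2.0):
        dist = path_depth_distribution(n, scale=s)
        print(f"scale={s}: mean path depth = {mean_depth(dist):.3f}")
```

Under these assumptions the distribution is binomial with mean n * scale / (1 + scale), so a scale below 1 places most of the weight on paths shallower than half the depth, and increasing the scale moves the weight toward deeper paths. This mirrors, in a toy setting, the shift toward deeper ensembles that the abstract attributes to scaling.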

research
02/24/2020

Batch Normalization Biases Deep Residual Networks Towards Shallow Paths

Batch normalization has multiple benefits. It improves the conditioning ...
research
12/02/2018

Analysis on Gradient Propagation in Batch Normalized Residual Networks

We conduct mathematical analysis on the effect of batch normalization (B...
research
09/19/2016

Multi-Residual Networks: Improving the Speed and Accuracy of Residual Networks

In this article, we take one step toward understanding the learning beha...
research
05/04/2023

Stimulative Training++: Go Beyond The Performance Limits of Residual Networks

Residual networks have shown great success and become indispensable in r...
research
05/21/2021

Maximum and Leaky Maximum Propagation

In this work, we present an alternative to conventional residual connect...
research
12/15/2017

Gradients explode - Deep Networks are shallow - ResNet explained

Whereas it is believed that techniques such as Adam, batch normalization...
research
05/20/2016

Residual Networks Behave Like Ensembles of Relatively Shallow Networks

In this work we propose a novel interpretation of residual networks show...
