Residual Networks Behave Like Ensembles of Relatively Shallow Networks

05/20/2016
by Andreas Veit, et al.

In this work, we propose a novel interpretation of residual networks, showing that they can be seen as a collection of many paths of differing length. Moreover, residual networks seem to enable very deep networks by leveraging only the short paths during training. To support this observation, we rewrite residual networks as an explicit collection of paths. Unlike traditional models, paths through residual networks vary in length. Further, a lesion study reveals that these paths show ensemble-like behavior, in the sense that they do not strongly depend on each other. Finally, and most surprisingly, most paths are shorter than one might expect, and only the short paths are needed during training, as longer paths do not contribute any gradient. For example, most of the gradient in a residual network with 110 layers comes from paths that are only 10-34 layers deep. Our results reveal one of the key characteristics that seem to enable the training of very deep networks: residual networks avoid the vanishing gradient problem by introducing short paths that can carry gradient throughout the extent of very deep networks.
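To make the unrolled view concrete: each residual block computes y_i = y_{i-1} + f_i(y_{i-1}), so expanding the recursion over n blocks yields 2^n distinct paths, and a path of length k (one that passes through exactly k residual branches) can be chosen in C(n, k) ways. The minimal Python sketch below tabulates this binomial path-length distribution; the figure of 54 blocks for a 110-layer network is an assumption based on the standard two-convolutions-per-block CIFAR ResNet.

```python
from math import comb

# Assumption: a 110-layer CIFAR ResNet has 54 residual blocks
# (two conv layers per block, plus a few non-residual layers).
n_blocks = 54

# Unrolling y_i = y_{i-1} + f_i(y_{i-1}) gives 2^n paths in total.
# A path of length k passes through exactly k residual branches and
# skips the rest, so there are C(n, k) such paths: lengths follow a
# Binomial(n, 1/2) distribution centered at n/2.
total_paths = 2 ** n_blocks
for k in range(0, n_blocks + 1, 9):
    frac = comb(n_blocks, k) / total_paths
    print(f"paths through {k:2d} blocks: {frac:8.4%} of all paths")
```

The distribution concentrates sharply around n/2 = 27 blocks, consistent with the abstract's observation that nearly all of the gradient comes from paths of moderate depth rather than from the full-depth path.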


