Norm-Preservation: Why Residual Networks Can Become Extremely Deep?

05/18/2018
by Alireza Zaeemzadeh, et al.

Augmenting deep neural networks with skip connections, as introduced in the so-called ResNet architecture, surprised the community by enabling the training of networks with more than 1000 layers and significant performance gains. It has been shown that identity skip connections eliminate singularities and improve the optimization landscape of the network. This paper deciphers ResNet by analyzing the effect of skip connections in the backward path and sets forth new theoretical results on the advantages of identity skip connections in deep neural networks. We prove that the skip connections in the residual blocks facilitate preserving the norm of the gradient and lead to well-behaved and stable back-propagation, which is a desirable feature from an optimization perspective. We also show that, perhaps surprisingly, as more residual blocks are stacked, the network becomes more norm-preserving. Traditionally, norm-preservation is enforced on the network only at the beginning of training, through initialization techniques. However, we show that identity skip connections retain norm-preservation during the training procedure. Our theoretical arguments are supported by extensive empirical evidence. Can we push for more norm-preservation? We answer this question by proposing zero-phase whitening of the fully-connected layer and adding norm-preserving transition layers. Our numerical investigations demonstrate that the learning dynamics and the performance of ResNets can be improved by making them even more norm-preserving through changing only a few blocks in very deep residual networks. Our results and the introduced modification for ResNet, referred to as Procrustes ResNets, can be used as a guide for studying more complex architectures such as DenseNet, training deeper networks, and inspiring new architectures.
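The norm-preservation claim can be illustrated informally: a residual block computes y = x + F(x), so its input-output Jacobian is I + ∂F/∂x, and the back-propagated gradient ∂L/∂x = (I + ∂F/∂x)ᵀ ∂L/∂y keeps a norm close to that of ∂L/∂y whenever the residual branch's Jacobian is modest. The PyTorch sketch below is not from the paper; the block width, depth, and default initialization are arbitrary choices made purely to compare the gradient-norm ratio through a stack of blocks with and without the identity skip.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class Block(nn.Module):
    """y = x + F(x) when residual=True; the identity branch keeps the
    backward Jacobian close to the identity matrix."""
    def __init__(self, dim, residual=True):
        super().__init__()
        self.residual = residual
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        out = self.f(x)
        return x + out if self.residual else out

def gradient_norm_ratio(residual, depth=50, dim=64):
    """Norm of the gradient at the input divided by the norm of the upstream
    gradient injected at the output, after back-propagating through `depth` blocks."""
    net = nn.Sequential(*[Block(dim, residual) for _ in range(depth)])
    x = torch.randn(1, dim, requires_grad=True)
    y = net(x)
    g_out = torch.randn_like(y)   # arbitrary upstream gradient dL/dy
    y.backward(g_out)
    return (x.grad.norm() / g_out.norm()).item()

print("with skip connections:   ", gradient_norm_ratio(residual=True))
print("without skip connections:", gradient_norm_ratio(residual=False))
```

In this kind of toy experiment the ratio with identity skips typically stays within a small factor of one, while the plain stack's gradient norm collapses toward zero at this depth; that contrast is the stable, norm-preserving back-propagation behavior the abstract refers to.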
