Towards Understanding the Importance of Shortcut Connections in Residual Networks

09/10/2019
by Tianyi Liu, et al.

Residual Network (ResNet) is undoubtedly a milestone in deep learning. ResNet is equipped with shortcut connections between layers and exhibits efficient training using simple first-order algorithms. Despite this great empirical success, the reason behind it is far from well understood. In this paper, we study a two-layer non-overlapping convolutional ResNet. Training such a network requires solving a non-convex optimization problem with a spurious local optimum. We show, however, that gradient descent combined with proper normalization avoids being trapped by the spurious local optimum and converges to a global optimum in polynomial time, when the weight of the first layer is initialized at 0 and that of the second layer is initialized arbitrarily in a ball. Numerical experiments are provided to support our theory.
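The training setup described in the abstract (first-layer weights initialized at 0, second-layer weights drawn inside a ball, gradient descent with a normalization step) can be sketched on a toy residual block. This is a minimal NumPy illustration only: the model size, learning rate, the choice of placing the shortcut as an identity term inside the activation, and the particular normalization (projecting the second layer into the unit ball) are all simplifying assumptions, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 64  # input dimension and sample count (assumed toy sizes)

def relu(z):
    return np.maximum(z, 0.0)

def forward(X, W, v):
    # Residual block: hidden layer computes relu((W + I) x). The identity
    # term plays the role of the shortcut connection, so at W = 0 the block
    # passes the input through and gradients still flow.
    return relu(X @ (W + np.eye(d)).T) @ v

def loss_and_grads(X, y, W, v):
    pre = X @ (W + np.eye(d)).T          # pre-activations, shape (n, d)
    h = relu(pre)
    err = h @ v - y                      # per-sample residuals, shape (n,)
    loss = 0.5 * np.mean(err ** 2)
    mask = (pre > 0).astype(float)       # ReLU derivative
    gW = (err[:, None] * mask * v).T @ X / n
    gv = h.T @ err / n
    return loss, gW, gv

# Teacher network generating the training data (assumed setup).
W_star = 0.3 * rng.normal(size=(d, d))
v_star = rng.normal(size=d)
v_star /= np.linalg.norm(v_star)
X = rng.normal(size=(n, d))
y = forward(X, W_star, v_star)

# Initialization scheme from the abstract: first layer at 0,
# second layer arbitrary inside the unit ball.
W = np.zeros((d, d))
v = rng.normal(size=d)
v /= max(1.0, np.linalg.norm(v))

loss0, _, _ = loss_and_grads(X, y, W, v)
for _ in range(500):
    loss, gW, gv = loss_and_grads(X, y, W, v)
    W -= 0.2 * gW
    v -= 0.2 * gv
    nv = np.linalg.norm(v)
    if nv > 1.0:
        v /= nv  # keep second layer in the ball (stand-in for 'proper normalization')

print(f"initial loss {loss0:.4f}, final loss {loss:.4f}")
```

Note that with zero first-layer weights and no shortcut, the hidden pre-activations would all be zero and gradient descent could not move; the identity term is what makes the zero initialization viable in this sketch.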
