Towards an Understanding of Residual Networks Using Neural Tangent Hierarchy (NTH)

07/07/2020
by   Yuqing Li, et al.

Gradient descent yields zero training loss in polynomial time for deep neural networks despite the non-convex nature of the objective function. The behavior of the network in the infinite-width limit under gradient descent can be described by the Neural Tangent Kernel (NTK) introduced in <cit.>. In this paper, we study the dynamics of the NTK for finite-width Deep Residual Networks (ResNets) using the neural tangent hierarchy (NTH) proposed in <cit.>. For a ResNet with a smooth and Lipschitz activation function, we reduce the requirement on the layer width m, with respect to the number of training samples n, from quartic to cubic. Our analysis strongly suggests that the particular skip-connection structure of ResNet is the main reason for its advantage over fully-connected networks.
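To make the central object concrete, the sketch below (a toy illustration, not the paper's code) builds a small residual network with a smooth activation and forms the empirical NTK Gram matrix, the finite-width quantity whose time evolution the NTH describes. The width, depth, and the skip-connection form h + tanh(hW) are illustrative assumptions.

```python
# Minimal sketch: empirical NTK of a toy residual network (assumed
# architecture, not the paper's exact setup).
import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree

def init_params(key, depth=4, width=64, dim=8):
    keys = jax.random.split(key, depth + 2)
    return {
        "in":  jax.random.normal(keys[0], (dim, width)) / jnp.sqrt(dim),
        "res": [jax.random.normal(k, (width, width)) / jnp.sqrt(width)
                for k in keys[1:-1]],
        "out": jax.random.normal(keys[-1], (width, 1)) / jnp.sqrt(width),
    }

def forward(params, x):
    # Residual blocks with a smooth, Lipschitz activation: h <- h + tanh(h W).
    h = x @ params["in"]
    for W in params["res"]:
        h = h + jnp.tanh(h @ W)
    return (h @ params["out"]).squeeze(-1)

def empirical_ntk(params, xs):
    # K_ij = <df(x_i)/dtheta, df(x_j)/dtheta>: the finite-width NTK Gram
    # matrix on the n training inputs.
    flat, unravel = ravel_pytree(params)
    jac = jax.jacobian(lambda p: forward(unravel(p), xs))(flat)  # shape (n, P)
    return jac @ jac.T

key = jax.random.PRNGKey(0)
params = init_params(key)
xs = jax.random.normal(key, (5, 8))  # n = 5 toy training inputs
K = empirical_ntk(params, xs)
print(K.shape)  # (5, 5)
```

At infinite width this matrix stays fixed during training; at finite width it evolves, and the NTH is the hierarchy of differential equations governing that evolution.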


Related research

- Dynamics of Deep Neural Networks and Neural Tangent Hierarchy (09/18/2019): The evolution of a deep neural network trained by the gradient descent c...
- Kinetic Theory for Residual Neural Networks (01/07/2020): Deep residual neural networks (ResNet) are performing very well for many...
- Gradient Descent Finds Global Minima of Deep Neural Networks (11/09/2018): Gradient descent finds a global minimum in training deep neural networks...
- Nonlinear Weighted Directed Acyclic Graph and A Priori Estimates for Neural Networks (03/30/2021): In an attempt to better understand structural benefits and generalizatio...
- Towards Understanding the Importance of Shortcut Connections in Residual Networks (09/10/2019): Residual Network (ResNet) is undoubtedly a milestone in deep learning. R...
- Overparameterization of deep ResNet: zero loss and mean-field analysis (05/30/2021): Finding parameters in a deep neural network (NN) that fit training data ...
- On Equivalent Optimization of Machine Learning Methods (02/17/2023): At the core of many machine learning methods resides an iterative optimi...
