
Towards an Understanding of Residual Networks Using Neural Tangent Hierarchy (NTH)
Gradient descent yields zero training loss in polynomial time for deep neural networks despite the nonconvex nature of the objective function. The behavior of a network in the infinite-width limit trained by gradient descent can be described by the Neural Tangent Kernel (NTK) introduced in <cit.>. In this paper, we study the dynamics of the NTK for finite-width Deep Residual Networks (ResNets) using the neural tangent hierarchy (NTH) proposed in <cit.>. For a ResNet with a smooth and Lipschitz activation function, we reduce the requirement on the layer width m with respect to the number of training samples n from quartic to cubic. Our analysis strongly suggests that the particular skip-connection structure of ResNets is the main reason for their triumph over fully-connected networks.
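For intuition, the empirical NTK of a network f(x; θ) is the Gram matrix K(x, x') = ⟨∇_θ f(x), ∇_θ f(x')⟩, which governs the gradient-descent dynamics of the outputs. The following is a minimal illustrative sketch (not the paper's construction): a one-hidden-layer residual block with a smooth activation (tanh), with parameter gradients estimated by central finite differences; all names and the specific parameterization are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 16  # hidden width (illustrative, not the paper's scaling regime)
d = 4   # input dimension
# Flattened parameters theta = [W (m x d), a (m)]
theta = rng.normal(size=(m * d + m,)) / np.sqrt(m)

def f(x, theta):
    """Scalar output of a toy residual block:
    f(x) = (1/sqrt(m)) * a^T (h + tanh(h)), h = W x,
    where 'h + tanh(h)' mimics a skip connection around the nonlinearity."""
    W = theta[: m * d].reshape(m, d)
    a = theta[m * d:]
    h = W @ x
    return a @ (h + np.tanh(h)) / np.sqrt(m)

def grad_theta(x, theta, eps=1e-5):
    """Central-difference gradient of f with respect to all parameters."""
    g = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = eps
        g[i] = (f(x, theta + e) - f(x, theta - e)) / (2 * eps)
    return g

# Empirical NTK Gram matrix on n sample inputs:
# K[i, j] = <grad_theta f(x_i), grad_theta f(x_j)>.
n = 3
X = rng.normal(size=(n, d))
G = np.stack([grad_theta(x, theta) for x in X])  # (n, num_params) Jacobian
K = G @ G.T                                      # n x n empirical NTK

# By construction K is symmetric and positive semi-definite.
print(np.allclose(K, K.T))
```

At infinite width this kernel stays fixed during training; the NTH studied in the paper tracks how K evolves at finite width, where the skip connection changes the hierarchy's behavior.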