Towards an Understanding of Residual Networks Using Neural Tangent Hierarchy (NTH)

07/07/2020 ∙ by Yuqing Li, et al. ∙ 0

Gradient descent yields zero training loss in polynomial time for deep neural networks despite non-convex nature of the objective function. The behavior of network in the infinite width limit trained by gradient descent can be described by the Neural Tangent Kernel (NTK) introduced in <cit.>. In this paper, we study dynamics of the NTK for finite width Deep Residual Network (ResNet) using the neural tangent hierarchy (NTH) proposed in <cit.>. For a ResNet with smooth and Lipschitz activation function, we reduce the requirement on the layer width m with respect to the number of training samples n from quartic to cubic. Our analysis suggests strongly that the particular skip-connection structure of ResNet is the main reason for its triumph over fully-connected network.



There are no comments yet.


page 1

page 2

page 3

page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.