
Why Do Deep Residual Networks Generalize Better than Deep Feedforward Networks? – A Neural Tangent Kernel Perspective

by Kaixuan Huang et al.

Deep residual networks (ResNets) have demonstrated better generalization performance than deep feedforward networks (FFNets). However, the theory behind this phenomenon remains largely unknown. This paper studies this fundamental problem in deep learning from a so-called "neural tangent kernel" (NTK) perspective. Specifically, we first show that, under proper conditions, as the width goes to infinity, training a deep ResNet can be viewed as learning a function in the reproducing kernel Hilbert space induced by its neural tangent kernel. We then compare the kernel of deep ResNets with that of deep FFNets and discover that the class of functions induced by the FFNet kernel becomes asymptotically unlearnable as the depth goes to infinity, whereas the class of functions induced by the ResNet kernel exhibits no such degeneracy. Our discovery partially justifies the advantage of deep ResNets over deep FFNets in generalization ability. Numerical results are provided to support our claims.
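The kernel at the heart of this comparison is the empirical neural tangent kernel, Θ(x, x') = ⟨∇_θ f(x; θ), ∇_θ f(x'; θ)⟩, which at infinite width stays fixed during training. Below is a minimal, illustrative sketch (not code from the paper) that computes this Gram matrix via finite-difference gradients for a toy feedforward net and a toy residual net; the names `ffnet`, `resnet`, and `empirical_ntk`, the 1/√depth residual scaling, and the scalar sum output are all assumptions made for the example.

```python
import numpy as np

def ffnet(params, x):
    # Vanilla feedforward: h <- relu(W h) at each layer; scalar output.
    h = x
    for W in params:
        h = np.maximum(0.0, W @ h)
    return h.sum()

def resnet(params, x):
    # Residual: h <- h + relu(W h) / sqrt(depth); scaling is an assumption here.
    h = x
    for W in params:
        h = h + np.maximum(0.0, W @ h) / np.sqrt(len(params))
    return h.sum()

def empirical_ntk(f, params, xs, eps=1e-5):
    # Theta[i, j] = <grad_theta f(x_i), grad_theta f(x_j)>,
    # with gradients approximated by central finite differences.
    shapes = [W.shape for W in params]
    flat = np.concatenate([W.ravel() for W in params])

    def unflatten(v):
        out, k = [], 0
        for s in shapes:
            n = int(np.prod(s))
            out.append(v[k:k + n].reshape(s))
            k += n
        return out

    grads = []
    for x in xs:
        g = np.zeros_like(flat)
        for i in range(flat.size):
            vp, vm = flat.copy(), flat.copy()
            vp[i] += eps
            vm[i] -= eps
            g[i] = (f(unflatten(vp), x) - f(unflatten(vm), x)) / (2 * eps)
        grads.append(g)
    G = np.stack(grads)          # one gradient row per input
    return G @ G.T               # NTK Gram matrix

rng = np.random.default_rng(0)
dim, depth = 3, 2
params = [rng.normal(size=(dim, dim)) / np.sqrt(dim) for _ in range(depth)]
xs = [rng.normal(size=dim) for _ in range(2)]
K = empirical_ntk(resnet, params, xs)
```

By construction `K` is symmetric and positive semidefinite, as any Gram matrix of gradients must be; the paper's analysis concerns how the corresponding limiting kernels behave as the depth grows.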



