DeepAI

Spectral Bias Outside the Training Set for Deep Networks in the Kernel Regime

06/06/2022
by Benjamin Bowman, et al.

We provide quantitative bounds on the L^2 distance in function space between the trajectory of a finite-width network trained on finitely many samples and the idealized kernel dynamics of infinite width and infinite data. An implication of the bounds is that the network is biased to learn the top eigenfunctions of the Neural Tangent Kernel (NTK) not just on the training set but over the entire input space. This bias depends only on the model architecture and the input distribution; it therefore does not depend on the target function, which need not lie in the RKHS of the kernel. The result holds for deep architectures with fully connected, convolutional, and residual layers. Furthermore, the width does not need to grow polynomially with the number of samples to obtain high-probability bounds up to a stopping time. The proof exploits the low-effective-rank property of the Fisher Information Matrix at initialization, which implies a low effective dimension of the model (far smaller than the number of parameters). We conclude that local capacity control arising from the low effective rank of the Fisher Information Matrix is still theoretically underexplored.
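The spectral-bias claim is stated relative to the idealized kernel dynamics. Under gradient flow on the population squared loss, the residual r_t = f_t - f* obeys d/dt r_t = -T_K r_t, where (T_K g)(x) = E_{x'}[k(x, x') g(x')], so the component of r_t along an eigenfunction with eigenvalue lambda_i decays like exp(-lambda_i t). The sketch below shows only this infinite-width, infinite-data limit, not the paper's finite-width bounds. It assumes a uniform input distribution on the unit circle discretized at n points (so T_K is approximated by K/n), and it uses a hypothetical exp(cos) kernel as a stand-in for an NTK. The function names are illustrative.

```python
import numpy as np

def kernel_flow_residual(K_over_n, r0, t):
    """Residual r_t = exp(-t K/n) r0 of kernel gradient flow on the squared loss."""
    lam, V = np.linalg.eigh(K_over_n)           # K/n is symmetric PSD
    # Each eigen-component of the residual decays as exp(-lam_i * t).
    return V @ (np.exp(-lam * t) * (V.T @ r0))

def component(r, phi):
    """Coefficient of r along phi w.r.t. the uniform (empirical) measure."""
    return (r @ phi) / (phi @ phi)

n = 256
theta = 2 * np.pi * np.arange(n) / n            # uniform grid on the circle
# Hypothetical stand-in kernel (NOT the NTK of any specific architecture).
K = np.exp(np.cos(theta[:, None] - theta[None, :]))
low, high = np.cos(theta), np.cos(5 * theta)    # low- and high-frequency eigenfunctions
f_star = low + high                             # target function
r0 = -f_star                                    # residual f_0 - f* with f_0 = 0

for t in [0.0, 1.0, 10.0, 100.0]:
    r_t = kernel_flow_residual(K / n, r0, t)
    print(f"t={t:6.1f}  low: {component(r_t, low):+.4f}  high: {component(r_t, high):+.4f}")
```

For this kernel, the eigenvalue attached to frequency k is the modified Bessel value I_k(1): about 0.57 for k = 1 and about 2.7e-4 for k = 5. By t = 10, the cos(theta) part of the residual is almost gone, while the cos(5 theta) part has barely moved. The decay rates come from the kernel and the input distribution alone and do not depend on the target.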
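The proof relies on the Fisher Information Matrix having low effective rank at initialization. The sketch below shows one way to measure this. It rests on several assumptions that may differ from the paper's setup:

- a two-layer ReLU network f(x) = a . relu(W x) / sqrt(m);
- a unit-variance Gaussian likelihood, so that F = J^T J / n, where J is the n x p parameter Jacobian;
- the common definition of effective rank tr(F) / ||F||_op.

The paper's exact definitions and architectures may differ, and the helper names are hypothetical.

```python
import numpy as np

def relu_net_jacobian(X, W, a):
    """Jacobian of f(x) = a . relu(W x) / sqrt(m) w.r.t. (W, a), one row per input."""
    m = W.shape[0]
    pre = X @ W.T                                   # (n, m) pre-activations
    act = np.maximum(pre, 0.0)                      # relu(W x)
    d_a = act / np.sqrt(m)                          # df/da_j
    # df/dW_j = a_j * 1[w_j . x > 0] * x / sqrt(m), flattened to (n, m*d)
    d_W = ((pre > 0) * a)[:, :, None] * X[:, None, :] / np.sqrt(m)
    return np.concatenate([d_W.reshape(X.shape[0], -1), d_a], axis=1)

def effective_rank(eigvals):
    """tr(A) / ||A||_op for a PSD matrix A, given its eigenvalues."""
    eigvals = np.clip(eigvals, 0.0, None)           # drop tiny negative round-off
    return eigvals.sum() / eigvals.max()

rng = np.random.default_rng(0)
n, d, m = 100, 5, 200
X = rng.standard_normal((n, d)) / np.sqrt(d)        # inputs with E||x||^2 = 1
W = rng.standard_normal((m, d))
a = rng.standard_normal(m)

J = relu_net_jacobian(X, W, a)                      # shape (n, p), p = m*(d+1)
# F = J^T J / n is p x p; its nonzero spectrum equals that of the
# empirical NTK Gram matrix J J^T / n, which is only n x n.
gram_eigs = np.linalg.eigvalsh(J @ J.T / n)
print("number of parameters p =", J.shape[1])
print("effective rank of F    =", effective_rank(gram_eigs))
```

The sketch works with the n x n Gram matrix instead of the p x p Fisher matrix. This is cheaper when p is much larger than n and gives the same trace and top eigenvalue. Comparing the printed effective rank with p shows how small the model's effective dimension is relative to its raw parameter count.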

02/01/2022

Neural Tangent Kernel Beyond the Infinite-Width Limit: Effects of Depth and Initialization

Neural Tangent Kernel (NTK) is widely used to analyze overparametrized n...
05/08/2021

Tensor Programs IIb: Architectural Universality of Neural Tangent Kernel Training Dynamics

Yang (2020a) recently showed that the Neural Tangent Kernel (NTK) at ini...
07/07/2020

Doubly infinite residual networks: a diffusion process approach

When neural network's parameters are initialized as i.i.d., neural netwo...
01/28/2020

Residual Tangent Kernels

A recent body of work has focused on the theoretical study of neural net...
10/10/2022

Efficient NTK using Dimensionality Reduction

Recently, neural tangent kernel (NTK) has been used to explain the dynam...
08/25/2020

Deep Networks and the Multiple Manifold Problem

We study the multiple manifold problem, a binary classification task mod...