Generalization Ability of Wide Neural Networks on ℝ

02/12/2023
by Jianfa Lai, et al.

We study the generalization ability of wide two-layer ReLU neural networks on ℝ. We first establish some spectral properties of the neural tangent kernel (NTK): a) K_d, the NTK defined on ℝ^d, is positive definite; b) λ_i(K_1), the i-th largest eigenvalue of K_1, is proportional to i^-2. We then show that: i) as the width m→∞, the neural network kernel (NNK) converges uniformly to the NTK; ii) the minimax rate of regression over the RKHS associated with K_1 is n^-2/3; iii) if one adopts an early stopping strategy when training a wide neural network, the resulting network achieves the minimax rate; iv) if one trains the network until it overfits the data, the resulting network cannot generalize well. Finally, we provide an explanation reconciling our theory with the widely observed “benign overfitting” phenomenon.
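The two spectral claims can be probed numerically. The sketch below evaluates the infinite-width NTK of a two-layer ReLU network on a grid in [-1, 1], using the standard arc-cosine kernel formulas with the bias absorbed by lifting x to z = (x, 1) — this parametrization is an assumption for illustration and need not match the paper's exact setup. It then checks positive definiteness (claim a) and fits the log-log eigenvalue decay, which should be close to the predicted slope of -2 (claim b).

```python
import numpy as np

# Sketch: infinite-width NTK of a two-layer ReLU network on R.
# Assumption: bias absorbed by lifting scalar inputs x to z = (x, 1),
# first-layer weights w ~ N(0, I_2); may differ from the paper's setup.
n = 256
x = np.linspace(-1.0, 1.0, n)
Z = np.stack([x, np.ones_like(x)], axis=1)              # lifted inputs, shape (n, 2)

norms = np.linalg.norm(Z, axis=1)                       # ||z_i||
G = Z @ Z.T                                             # z_i . z_j
cos_t = np.clip(G / np.outer(norms, norms), -1.0, 1.0)
theta = np.arccos(cos_t)                                # angle between z_i and z_j

# Arc-cosine identities for w ~ N(0, I_2):
#   E[relu'(w.z) relu'(w.u)] = (pi - theta) / (2 pi)
#   E[relu(w.z)  relu(w.u)]  = ||z|| ||u|| (sin th + (pi - th) cos th) / (2 pi)
# NTK(z, u) = (z.u) * E[relu' relu'] + E[relu relu]
K = G * (np.pi - theta) / (2 * np.pi) \
    + np.outer(norms, norms) * (np.sin(theta) + (np.pi - theta) * cos_t) / (2 * np.pi)

# Claim a): the kernel matrix is positive definite (all eigenvalues > 0).
eig = np.sort(np.linalg.eigvalsh(K))[::-1] * (2.0 / n)  # descending, quadrature-scaled
# Claim b): lambda_i ~ i^-2; fit the decay slope on a log-log scale.
i = np.arange(5, 60)
slope = np.polyfit(np.log(i), np.log(eig[i - 1]), 1)[0]
print(f"min eigenvalue = {eig[-1]:.2e}, decay slope = {slope:.2f}")
```

The 2/n factor is the uniform quadrature weight on [-1, 1]; it rescales all eigenvalues equally and does not affect the fitted slope.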

Related research:

- Generalization Ability of Wide Residual Networks (05/29/2023)
- Statistical Optimality of Deep Wide Neural Networks (05/04/2023)
- Towards Understanding the Spectral Bias of Deep Learning (12/03/2019)
- Regularization Matters: A Nonparametric Perspective on Overparametrized Neural Network (07/06/2020)
- On the training and generalization of deep operator networks (09/02/2023)
- Optimal Rate of Kernel Regression in Large Dimensions (09/08/2023)
- Label-Aware Neural Tangent Kernel: Toward Better Generalization and Local Elasticity (10/22/2020)
