Regularization Matters: A Nonparametric Perspective on Overparametrized Neural Network

by Wenjia Wang, et al.

Overparametrized neural networks trained by gradient descent (GD) can provably overfit any training data. However, the generalization guarantee may not hold for noisy data. From a nonparametric perspective, this paper studies how well overparametrized neural networks can recover the true target function in the presence of random noise. We establish a lower bound on the L_2 estimation error with respect to the GD iteration, which is bounded away from zero without a delicate choice of early stopping. In turn, through a comprehensive analysis of ℓ_2-regularized GD trajectories, we prove that for an overparametrized one-hidden-layer ReLU neural network with ℓ_2 regularization: (1) the output is close to that of kernel ridge regression with the corresponding neural tangent kernel; (2) the minimax optimal rate of L_2 estimation error is achieved. Numerical experiments confirm our theory and further demonstrate that the ℓ_2 regularization approach improves training robustness and works for a wider range of neural networks.
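To make the kernel ridge regression comparison in point (1) concrete, here is a minimal sketch, not the paper's code. It assumes unit-norm inputs and a commonly used closed form for the neural tangent kernel of a one-hidden-layer ReLU network in which only the hidden weights are trained, h(s) = s(π − arccos s)/(2π). The function names, the target function, the noise level, and the ridge scaling n·μ are all illustrative assumptions.

```python
import numpy as np

def ntk_relu(X, Z):
    """Assumed NTK of a one-hidden-layer ReLU net on unit-norm inputs:
    h(s) = s * (pi - arccos(s)) / (2 * pi), with s the inner product."""
    S = np.clip(X @ Z.T, -1.0, 1.0)  # guard arccos against rounding error
    return S * (np.pi - np.arccos(S)) / (2.0 * np.pi)

def krr_fit_predict(X_train, y_train, X_test, mu):
    """Kernel ridge regression: solve (K + n*mu*I) alpha = y, then predict.
    The n*mu scaling of the ridge term is an illustrative convention."""
    n = len(y_train)
    K = ntk_relu(X_train, X_train)
    alpha = np.linalg.solve(K + n * mu * np.eye(n), y_train)
    return ntk_relu(X_test, X_train) @ alpha

def sample_sphere(rng, n, d):
    """Draw n points uniformly from the unit sphere in R^d."""
    X = rng.standard_normal((n, d))
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def target(X):
    """Hypothetical smooth target function used only for this demo."""
    return np.sin(3.0 * X[:, 0])

rng = np.random.default_rng(0)
X_train = sample_sphere(rng, 200, 2)
y_train = target(X_train) + 0.5 * rng.standard_normal(200)  # noisy labels
X_test = sample_sphere(rng, 1000, 2)

# Compare a near-interpolating fit (tiny mu) with a regularized fit.
for mu in (1e-8, 1e-2):
    pred = krr_fit_predict(X_train, y_train, X_test, mu)
    mse = np.mean((pred - target(X_test)) ** 2)
    print(f"mu={mu:g}  test MSE against the noiseless target: {mse:.4f}")
```

The tiny ridge value stands in for the near-interpolating regime that unregularized training approaches, while the larger value plays the role of the ℓ_2 penalty. Which value of μ works best depends on the sample size and noise level, so the two values here are only for contrast.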







Nonparametric Regression with Shallow Overparameterized Neural Networks Trained by GD with Early Stopping

We explore the ability of overparameterized shallow neural networks to l...

Understanding Generalization of Deep Neural Networks Trained with Noisy Labels

Over-parameterized deep neural networks trained by simple first-order me...

Harmless Overparametrization in Two-layer Neural Networks

Overparametrized neural networks, where the number of active parameters ...

The Efficacy of L_1 Regularization in Two-Layer Neural Networks

A crucial problem in neural networks is to select the most appropriate n...

Regularizing Deep Neural Networks by Noise: Its Interpretation and Optimization

Overfitting is one of the most critical challenges in deep neural networ...

Regularized Kernel and Neural Sobolev Descent: Dynamic MMD Transport

We introduce Regularized Kernel and Neural Sobolev Descent for transport...

Nonparametric regression using needlet kernels for spherical data

Needlets have been recognized as state-of-the-art tools to tackle spheri...