Towards an Understanding of Benign Overfitting in Neural Networks

06/06/2021
by Zhu Li, et al.

Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss; yet surprisingly, they possess near-optimal prediction performance, contradicting classical learning theory. We examine how these benign overfitting phenomena occur in a two-layer neural network setting where sample covariates are corrupted with noise. We address the high dimensional regime, where the data dimension d grows with the number n of data points. Our analysis combines an upper bound on the bias with matching upper and lower bounds on the variance of the interpolator (an estimator that interpolates the data). These results indicate that the excess learning risk of the interpolator decays under mild conditions. We further show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate, which to our knowledge is the first generalization result for such networks. Finally, our theory predicts that the excess learning risk starts to increase once the number of parameters s grows beyond O(n^2), matching recent empirical findings.
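Below is a minimal, hypothetical simulation sketch of the setting described above: a two-layer ReLU network in which the first layer is a fixed random projection and only the output layer is fit, trained to interpolate noisy covariates via the minimum-norm least-squares solution. This random-features proxy, the dimensions n and d, the noise levels, and the sweep over the parameter count s are all illustrative assumptions and not the paper's exact construction; the sketch only shows how one might probe the behaviour of the excess risk as s grows relative to n.

# Hypothetical sketch (assumptions: random fixed first layer, min-norm
# least-squares output layer as a proxy for the interpolator, linear target).
import numpy as np

rng = np.random.default_rng(0)

def relu_features(X, W):
    # Two-layer ReLU features with a fixed random first layer W.
    return np.maximum(X @ W, 0.0)

def test_error(n=100, d=50, s=200, cov_noise=0.1, n_test=2000):
    # Ground-truth linear target on clean covariates.
    beta = rng.normal(size=d) / np.sqrt(d)
    X_clean = rng.normal(size=(n, d))
    X = X_clean + cov_noise * rng.normal(size=(n, d))   # corrupted covariates
    y = X_clean @ beta + 0.1 * rng.normal(size=n)       # noisy labels

    W = rng.normal(size=(d, s)) / np.sqrt(d)            # fixed first layer
    Phi = relu_features(X, W)
    # Minimum-norm least-squares interpolator of the training data.
    theta = np.linalg.pinv(Phi) @ y

    X_test = rng.normal(size=(n_test, d))
    X_test_noisy = X_test + cov_noise * rng.normal(size=(n_test, d))
    pred = relu_features(X_test_noisy, W) @ theta
    # Mean squared error against the noiseless target, as a rough stand-in
    # for the excess risk of the interpolator.
    return np.mean((pred - X_test @ beta) ** 2)

for s in [50, 100, 200, 1000, 5000, 20000]:
    print(s, test_error(s=s))

Comparing the printed errors across the sweep of s gives one way to check, under these assumed noise levels and dimensions, whether the qualitative pattern predicted above (risk decaying under overparametrization and eventually rising again for very large s) shows up in a simple surrogate model.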

Related research

08/06/2020 · Benign Overfitting and Noisy Features
Modern machine learning often operates in the regime where the number of...

06/01/2022 · Realistic Deep Learning May Not Fit Benignly
Studies on benign overfitting provide insights for the success of overpa...

03/07/2023 · Benign Overfitting for Two-layer ReLU Networks
Modern deep learning models with great expressive power can be trained t...

10/19/2018 · A Modern Take on the Bias-Variance Tradeoff in Neural Networks
We revisit the bias-variance tradeoff for neural networks in light of mo...

01/06/2019 · Scaling description of generalization with number of parameters in deep learning
We provide a description for the evolution of the generalization perform...

10/06/2021 · VC dimension of partially quantized neural networks in the overparametrized regime
Vapnik-Chervonenkis (VC) theory has so far been unable to explain the sm...

08/08/2022 · Generalization and Overfitting in Matrix Product State Machine Learning Architectures
While overfitting and, more generally, double descent are ubiquitous in ...