Gradient Descent with Early Stopping is Provably Robust to Label Noise for Overparameterized Neural Networks

03/27/2019
by Mingchen Li, et al.

Modern neural networks are typically trained in an over-parameterized regime where the parameters of the model far exceed the size of the training data. Due to this over-parameterization, these neural networks in principle have the capacity to (over)fit any set of labels, including pure noise. Despite this high fitting capacity, somewhat paradoxically, neural network models trained via first-order methods continue to predict well on yet unseen test data. In this paper we take a step towards demystifying this phenomenon. In particular, we show that first-order methods such as gradient descent are provably robust to noise/corruption on a constant fraction of the labels despite over-parameterization, under a rich dataset model. Specifically: i) First, we show that in the first few iterations, where the updates are still in the vicinity of the initialization, these algorithms fit only the correct labels, essentially ignoring the noisy ones. ii) Second, we prove that to start to overfit the noisy labels, these algorithms must stray rather far from the initial model, which can only occur after many more iterations. Together, these results show that gradient descent with early stopping is provably robust to label noise, and they shed light on the empirical robustness of deep networks as well as on commonly adopted heuristics to prevent overfitting.
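To make the early-stopping idea concrete, the following is a minimal, self-contained sketch, not the paper's experimental setup: an over-parameterized one-hidden-layer network is trained by gradient descent on labels of which a constant fraction has been flipped, and training is halted once accuracy on a small clean held-out split stops improving. The synthetic dataset, the network width, the learning rate, the availability of a clean held-out split, and the patience-based stopping rule are all illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic binary classification data: two Gaussian clusters in d dimensions.
n, d, noise_frac = 500, 20, 0.3
labels = (torch.arange(n) % 2).float()                 # clean labels in {0, 1}
X = torch.randn(n, d) + (2.0 * labels - 1.0).unsqueeze(1) * 2.0
y_clean = labels

# Corrupt a constant fraction of the training labels.
flip = torch.rand(n) < noise_frac
y_noisy = torch.where(flip, 1.0 - y_clean, y_clean)

# Over-parameterized one-hidden-layer network: wide enough that it could
# interpolate even the noisy labels if trained long enough.
model = nn.Sequential(nn.Linear(d, 2048), nn.ReLU(), nn.Linear(2048, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.BCEWithLogitsLoss()

# Assumed: a small clean held-out split whose accuracy decides when to stop
# (one possible stopping rule, not the one analyzed in the paper).
val_idx = torch.arange(0, n, 5)
train_idx = torch.tensor([i for i in range(n) if i % 5 != 0])

best_val, best_state, patience, bad = -1.0, None, 20, 0
for step in range(2000):
    opt.zero_grad()
    logits = model(X[train_idx]).squeeze(1)
    loss = loss_fn(logits, y_noisy[train_idx])          # train on noisy labels
    loss.backward()
    opt.step()

    with torch.no_grad():
        preds = (model(X[val_idx]).squeeze(1) > 0).float()
        val_acc = (preds == y_clean[val_idx]).float().mean().item()
    if val_acc > best_val:
        best_val, bad = val_acc, 0
        best_state = {k: v.clone() for k, v in model.state_dict().items()}
    else:
        bad += 1
        if bad >= patience:   # stop before the noisy labels get memorized
            break

model.load_state_dict(best_state)
print(f"stopped at step {step}, clean held-out accuracy {best_val:.2f}")
```

In a run of this kind, training accuracy on the noisy labels keeps climbing toward interpolation as iterations accumulate, while accuracy against the clean labels peaks early; the stopping rule exploits exactly that gap.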

Related research

02/11/2022
Benign Overfitting without Linearity: Neural Network Classifiers Trained by Gradient Descent for Noisy Linear Data
Benign overfitting, the phenomenon where interpolating models generalize...

10/20/2019
Leveraging inductive bias of neural networks for learning without explicit human annotations
Classification problems today are typically solved by first collecting e...

11/19/2019
Prestopping: How Does Early Stopping Help Generalization against Label Noise?
Noisy labels are very common in real-world training data, which lead to ...

11/25/2021
Predicting the success of Gradient Descent for a particular Dataset-Architecture-Initialization (DAI)
Despite their massive success, training successful deep neural networks ...

11/17/2022
Why Deep Learning Generalizes
Very large deep learning models trained using gradient descent are remar...

06/30/2021
Understanding and Improving Early Stopping for Learning with Noisy Labels
The memorization effect of deep neural network (DNN) plays a pivotal rol...

06/12/2019
Generalization Guarantees for Neural Networks via Harnessing the Low-rank Structure of the Jacobian
Modern neural network architectures often generalize well despite contai...
