Training shallow ReLU networks on noisy data using hinge loss: when do we overfit and is it benign?

06/16/2023
by Erin George et al.

We study benign overfitting in two-layer ReLU networks trained by gradient descent with hinge loss on noisy data for binary classification. In particular, we consider linearly separable data in which a relatively small proportion of the labels are corrupted (flipped). We identify conditions on the margin of the clean data that give rise to three distinct training outcomes: benign overfitting, in which zero loss is achieved and, with high probability, test data is classified correctly; overfitting, in which zero loss is achieved but test data is misclassified with probability bounded below by a constant; and non-overfitting, in which clean points, but not corrupt points, achieve zero loss and, again with high probability, test data is classified correctly. Our analysis provides a fine-grained description of the neuron dynamics throughout training and reveals two distinct phases: in the first phase, the clean points achieve close to zero loss; in the second phase, the clean points oscillate at the boundary of zero loss while the corrupt points either converge toward zero loss or are eventually zeroed out by the network. We prove these results with a combinatorial approach that bounds the number of clean versus corrupt updates across these phases of training.
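To make the training setup concrete, below is a minimal NumPy sketch of the kind of procedure the abstract describes: a two-layer ReLU network with fixed +/-1 second-layer weights, trained by full-batch gradient descent on the hinge loss over linearly separable data with a small fraction of flipped labels. The dimensions, learning rate, initialization scale, noise rate, and the choice to train only the first layer are illustrative assumptions, not the paper's exact construction.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic linearly separable data with a small fraction of flipped labels.
# All sizes and rates here are illustrative assumptions, not the paper's setup.
n, d, noise_rate = 200, 50, 0.05
w_star = rng.normal(size=d)
w_star /= np.linalg.norm(w_star)            # ground-truth separating direction
X = rng.normal(size=(n, d))
y_clean = np.sign(X @ w_star)
flip = rng.random(n) < noise_rate           # corrupt a small fraction of labels
y = np.where(flip, -y_clean, y_clean)

# Two-layer ReLU network f(x) = sum_j a_j * relu(w_j . x); the second-layer
# signs a_j are fixed at +/-1 and only the first layer W is trained, a common
# simplification in this line of analysis.
m = 64                                      # number of hidden neurons
W = 0.01 * rng.normal(size=(m, d))          # small random initialization
a = rng.choice([-1.0, 1.0], size=m)

def forward(X, W, a):
    H = np.maximum(X @ W.T, 0.0)            # ReLU activations, shape (n, m)
    return H @ a                            # network output, shape (n,)

lr, steps = 0.05, 2000
for t in range(steps):
    out = forward(X, W, a)
    margin = y * out
    active = margin < 1.0                   # points with nonzero hinge loss
    # (Sub)gradient of the mean hinge loss max(0, 1 - y f(x)) w.r.t. W.
    gate = (X @ W.T > 0).astype(float)      # ReLU gate per point and neuron
    coeff = (-y * active.astype(float))[:, None] * gate   # shape (n, m)
    grad_W = (coeff.T @ X) * a[:, None] / n
    W -= lr * grad_W
    if t % 500 == 0:
        loss = np.mean(np.maximum(0.0, 1.0 - margin))
        print(f"step {t}: hinge loss {loss:.4f}, "
              f"clean at zero loss {np.mean(margin[~flip] >= 1):.2f}, "
              f"corrupt at zero loss {np.mean(margin[flip] >= 1):.2f}")

Tracking the fraction of clean versus corrupt points that reach zero hinge loss (margin at least 1), as the print statement does, is one way to observe which of the three regimes a given run falls into.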



