Early-stopped neural networks are consistent

06/10/2021
by Ziwei Ji, et al.

This work studies the behavior of neural networks trained with the logistic loss via gradient descent on binary classification data where the underlying data distribution is general, and the (optimal) Bayes risk is not necessarily zero. In this setting, it is shown that gradient descent with early stopping achieves population risk arbitrarily close to optimal in terms of not just logistic and misclassification losses, but also in terms of calibration, meaning the sigmoid mapping of its outputs approximates the true underlying conditional distribution arbitrarily finely. Moreover, the necessary iteration, sample, and architectural complexities of this analysis all scale naturally with a certain complexity measure of the true conditional model. Lastly, while it is not shown that early stopping is necessary, it is shown that any univariate classifier satisfying a local interpolation property is necessarily inconsistent.
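
As a concrete illustration of the training procedure described above, below is a minimal NumPy sketch, not taken from the paper: full-batch gradient descent on the logistic loss for a one-hidden-layer ReLU network, with the stopping time chosen by held-out logistic risk, followed by a comparison of the sigmoid of the network's outputs against the true conditional probability. The data distribution (true_eta), network width, step size, and patience-based stopping rule are all illustrative assumptions, not the constructions used in the paper's analysis.

```python
# Minimal illustrative sketch (not the paper's construction): full-batch gradient
# descent on the logistic loss for a one-hidden-layer ReLU network, with early
# stopping chosen by held-out logistic risk. The data distribution, width, step
# size, and patience threshold below are assumptions made for illustration.
import numpy as np

rng = np.random.default_rng(0)

def true_eta(x):
    # True conditional probability P(y = 1 | x); the Bayes risk is nonzero.
    return 1.0 / (1.0 + np.exp(-3.0 * np.sin(2.0 * x)))

def sample(n):
    x = rng.uniform(-2.0, 2.0, size=(n, 1))
    y = (rng.uniform(size=n) < true_eta(x[:, 0])).astype(float)  # labels in {0, 1}
    return x, y

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(params, x):
    W1, b1, w2, b2 = params
    h = np.maximum(x @ W1 + b1, 0.0)   # ReLU hidden layer
    return h @ w2 + b2                 # real-valued scores f(x)

def logistic_risk(params, x, y):
    s = forward(params, x)
    # logistic loss with labels in {0, 1}: log(1 + exp(s)) - y * s, computed stably
    return np.mean(np.maximum(s, 0.0) - y * s + np.log1p(np.exp(-np.abs(s))))

def gradients(params, x, y):
    W1, b1, w2, b2 = params
    h_pre = x @ W1 + b1
    h = np.maximum(h_pre, 0.0)
    s = h @ w2 + b2
    g = (sigmoid(s) - y) / len(y)          # d(average loss)/d(score)
    grad_w2 = h.T @ g
    grad_b2 = np.array([g.sum()])
    gh = np.outer(g, w2) * (h_pre > 0.0)   # backpropagate through the ReLU
    grad_W1 = x.T @ gh
    grad_b1 = gh.sum(axis=0)
    return grad_W1, grad_b1, grad_w2, grad_b2

x_tr, y_tr = sample(2000)    # training set
x_va, y_va = sample(2000)    # held-out set used only to pick the stopping time

width, lr, max_steps, patience_limit = 128, 0.5, 5000, 200
params = [rng.normal(scale=1.0, size=(1, width)), np.zeros(width),
          rng.normal(scale=1.0 / np.sqrt(width), size=width), np.zeros(1)]

best_risk, best_params, patience = np.inf, [p.copy() for p in params], 0
for t in range(max_steps):
    params = [p - lr * g for p, g in zip(params, gradients(params, x_tr, y_tr))]
    held_out = logistic_risk(params, x_va, y_va)
    if held_out < best_risk - 1e-5:
        best_risk, best_params, patience = held_out, [p.copy() for p in params], 0
    else:
        patience += 1
        if patience > patience_limit:      # stop once held-out risk stops improving
            break

# Calibration check: sigmoid of the early-stopped network vs. the true eta(x).
x_grid = np.linspace(-2.0, 2.0, 200).reshape(-1, 1)
eta_hat = sigmoid(forward(best_params, x_grid))
print("held-out logistic risk at stopping:", best_risk)
print("mean |sigmoid(f(x)) - eta(x)|:", np.mean(np.abs(eta_hat - true_eta(x_grid[:, 0]))))
```

The final printed quantity, the average gap between sigmoid(f(x)) and the true eta(x), is one simple stand-in for the calibration property discussed in the abstract; the specific stopping rule and error metric here are choices made for this sketch only.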

Related research

Benign Overfitting without Linearity: Neural Network Classifiers Trained by Gradient Descent for Noisy Linear Data (02/11/2022)
Benign overfitting, the phenomenon where interpolating models generalize...

When does gradient descent with logistic loss find interpolating two-layer networks? (12/04/2020)
We study the training of finite-width two-layer smoothed ReLU networks f...

Learning Lipschitz Functions by GD-trained Shallow Overparameterized ReLU Neural Networks (12/28/2022)
We explore the ability of overparameterized shallow ReLU neural networks...

Stability & Generalisation of Gradient Descent for Shallow Neural Networks without the Neural Tangent Kernel (07/27/2021)
We revisit on-average algorithmic stability of Gradient Descent (GD) for...

Robust Linear Regression: Gradient-descent, Early-stopping, and Beyond (01/31/2023)
In this work we study the robustness to adversarial attacks, of early-st...

On Optimal Early Stopping: Over-informative versus Under-informative Parametrization (02/20/2022)
Early stopping is a simple and widely used method to prevent over-traini...

How to Learn when Data Reacts to Your Model: Performative Gradient Descent (02/15/2021)
Performative distribution shift captures the setting where the choice of...
