When does gradient descent with logistic loss find interpolating two-layer networks?

12/04/2020
by Niladri S. Chatterji, et al.

We study the training of finite-width two-layer smoothed ReLU networks for binary classification using the logistic loss. We show that gradient descent drives the training loss to zero if the initial loss is small enough. When the data satisfies certain cluster and separation conditions and the network is wide enough, we show that one step of gradient descent reduces the loss enough that the first result applies. In contrast, no previous analysis of fixed-width networks that we are aware of guarantees that the training loss goes to zero.
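To make the setup concrete, here is a minimal NumPy sketch of the kind of training the abstract describes: a fixed-width two-layer network with a smoothed ReLU activation, trained by full-batch gradient descent on the logistic loss over clustered, well-separated binary data. This is an illustration under stated assumptions, not the paper's construction: softplus stands in for the paper's particular smoothed ReLU, the data model, width, step size, and the choice to train only the hidden-layer weights are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy clustered, well-separated binary data (illustrative; the paper's
# cluster and separation conditions are more specific than this).
n, d, m = 200, 10, 512                     # samples, input dim, hidden width
mu = np.zeros(d)
mu[0] = 3.0                                # cluster centers at +/- mu
X = np.vstack([rng.normal(+mu, 0.5, (n // 2, d)),
               rng.normal(-mu, 0.5, (n // 2, d))])
y = np.concatenate([np.ones(n // 2), -np.ones(n // 2)])

# Two-layer network f(x) = sum_j a_j * softplus(w_j . x), with softplus as a
# stand-in smoothed ReLU; output weights a_j are fixed, only W is trained.
W = rng.normal(0.0, 1.0 / np.sqrt(d), (m, d))
a = rng.choice([-1.0, 1.0], m) / np.sqrt(m)

def softplus(z):
    return np.logaddexp(0.0, z)            # smooth approximation of ReLU

def sigmoid(z):
    return np.exp(-np.logaddexp(0.0, -z))  # stable logistic sigmoid = softplus'

def logistic_loss(f, y):
    return np.mean(np.logaddexp(0.0, -y * f))

eta = 1.0                                  # step size (illustrative)
for t in range(2001):
    H = X @ W.T                            # (n, m) pre-activations
    f = softplus(H) @ a                    # network outputs f(x_i)
    if t % 500 == 0:
        print(f"step {t:4d}  train loss {logistic_loss(f, y):.6f}")
    # d loss / d f_i = -y_i * sigmoid(-y_i f_i) / n
    g = -y * sigmoid(-y * f) / n
    # Chain rule: d f_i / d W_j = a_j * sigmoid(w_j . x_i) * x_i
    dW = ((g[:, None] * sigmoid(H)) * a).T @ X
    W -= eta * dW
```

On separable toy data like this, the printed training loss decreases toward zero as training proceeds; the paper's contribution is to prove when gradient descent is guaranteed to drive it there for a fixed finite width.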


Related research

02/09/2021
When does gradient descent with logistic loss interpolate using deep networks with smoothed ReLU activations?
We establish conditions under which gradient descent applied to fixed-wi...

09/30/2019
On the convergence of gradient descent for two layer neural networks
It has been shown that gradient descent can yield the zero training loss...

02/18/2023
Generalization and Stability of Interpolating Neural Networks with Minimal Width
We investigate the generalization and optimization of k-homogeneous shal...

02/20/2020
Do We Need Zero Training Loss After Achieving Zero Training Error?
Overparameterized deep networks have the capacity to memorize training d...

01/30/2020
Analytic Study of Double Descent in Binary Classification: The Impact of Loss
Extensive empirical evidence reveals that, for a wide range of different...

06/10/2021
Early-stopped neural networks are consistent
This work studies the behavior of neural networks trained with the logis...
