Gradient descent provably escapes saddle points in the training of shallow ReLU networks

08/03/2022 · by Patrick Cheridito, et al.

Dynamical systems theory has recently been applied in optimization to prove that gradient descent algorithms avoid so-called strict saddle points of the loss function. However, in many modern machine learning applications, the required regularity conditions are not satisfied. In particular, this is the case for rectified linear unit (ReLU) networks. In this paper, we prove a variant of the relevant dynamical systems result, a center-stable manifold theorem, in which we relax some of the regularity requirements. We then verify that shallow ReLU networks fit into the new framework. Building on a classification of the critical points of the square integral loss of shallow ReLU networks measured against an affine target function, we deduce that gradient descent avoids most saddle points. Finally, we prove convergence to global minima whenever the initialization is sufficiently good, a condition we make precise via an explicit threshold on the limiting loss.
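To make the setting concrete: in the authors' earlier classification of critical points, which this paper builds on, the network is a shallow (one-hidden-layer) ReLU network with one-dimensional input, and the square integral loss against an affine target is taken over an interval, normalized here to [0,1]. A plausible rendering of the objective, in our own notation (width m and parameters w_j, b_j, v_j, c are not necessarily the paper's symbols):

```latex
\mathcal{L}(\theta) = \int_0^1 \bigl( \mathcal{N}_\theta(x) - (\alpha x + \beta) \bigr)^2 \,\mathrm{d}x,
\qquad
\mathcal{N}_\theta(x) = c + \sum_{j=1}^{m} v_j \max\{ w_j x + b_j,\, 0 \}.
```

A strict saddle point here is a critical point at which the Hessian (where it exists) has a negative eigenvalue; the center-stable manifold argument shows that gradient descent with random initialization avoids such points almost surely. The sketch below runs plain gradient descent on a quadrature approximation of this loss. It is a minimal illustration, not the paper's setup: the width, step size, grid resolution, and Gaussian initialization are assumptions, and since the loss is not everywhere differentiable, the code uses the almost-everywhere ReLU derivative 1{z > 0}.

```python
import numpy as np

rng = np.random.default_rng(0)

# Affine target f(x) = alpha * x + beta on [0, 1].
alpha, beta = 2.0, -0.5

# Shallow ReLU network N(x) = c + sum_j v_j * max(w_j * x + b_j, 0).
m = 8                                # hidden width (illustrative choice)
w = rng.normal(size=m)
b = rng.normal(size=m)
v = rng.normal(size=m)
c = 0.0

# Quadrature grid approximating the square integral loss on [0, 1].
xs = np.linspace(0.0, 1.0, 512)

def loss():
    z = np.outer(xs, w) + b                  # pre-activations, (n, m)
    out = c + np.maximum(z, 0.0) @ v         # network outputs, (n,)
    return np.mean((out - (alpha * xs + beta)) ** 2)

print(f"initial loss ~ {loss():.4f}")

lr = 1e-2
for step in range(20_000):
    z = np.outer(xs, w) + b                  # (n, m)
    a = np.maximum(z, 0.0)                   # ReLU activations
    r = (c + a @ v) - (alpha * xs + beta)    # residuals, (n,)
    g = 2.0 * r / xs.size                    # d(mean sq. loss)/d(output)
    act = (z > 0.0).astype(float)            # a.e. ReLU derivative
    grad_v = a.T @ g
    grad_w = (act * v).T @ (g * xs)
    grad_b = (act * v).T @ g
    grad_c = g.sum()
    w -= lr * grad_w
    b -= lr * grad_b
    v -= lr * grad_v
    c -= lr * grad_c

print(f"final loss ~ {loss():.6f}")
```

Because an affine target on an interval is exactly representable by such a network (one always-active neuron suffices), a run ending near zero loss is consistent with the convergence-to-global-minima statement, while a run stalling at a positive value would indicate a non-global critical point of the kind the classification describes.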


Related research

- 02/16/2016 · Gradient Descent Converges to Minimizers: "We show that gradient descent converges to a local minimizer, almost sur..."
- 06/30/2023 · The Implicit Bias of Minima Stability in Multivariate Shallow ReLU Networks: "We study the type of solutions to which stochastic gradient descent conv..."
- 10/20/2017 · First-order Methods Almost Always Avoid Saddle Points: "We establish that first-order methods avoid saddle points for almost all..."
- 10/01/2022 · A Combinatorial Perspective on the Optimization of Shallow ReLU Networks: "The NP-hard problem of optimizing a shallow ReLU network can be characte..."
- 02/09/2021 · When does gradient descent with logistic loss interpolate using deep networks with smoothed ReLU activations?: "We establish conditions under which gradient descent applied to fixed-wi..."
- 09/27/2022 · Magnitude and Angle Dynamics in Training Single ReLU Neurons: "To understand the learning dynamics of deep ReLU networks, we investigat..."
- 06/03/2021 · Robust Learning via Persistency of Excitation: "Improving adversarial robustness of neural networks remains a major chal..."
