Phase diagram of Stochastic Gradient Descent in high-dimensional two-layer neural networks

02/01/2022
by Rodrigo Veiga, et al.

Despite the non-convex optimization landscape, over-parametrized shallow networks are able to achieve global convergence under gradient descent. The picture can be radically different for narrow networks, which tend to get stuck in badly-generalizing local minima. Here we investigate the crossover between these two regimes in the high-dimensional setting, and in particular the connection between the so-called mean-field/hydrodynamic regime and the seminal approach of Saad and Solla. Focusing on the case of Gaussian data, we study the interplay between the learning rate, the time scale, and the number of hidden units in the high-dimensional dynamics of stochastic gradient descent (SGD). Our work builds on a deterministic description of SGD in high dimensions from statistical physics, which we extend and for which we provide rigorous convergence rates.

Related research

- Dynamical mean-field theory for stochastic gradient descent in Gaussian mixture classification (06/10/2020): "We analyze in a closed form the learning dynamics of stochastic gradient..."
- From high-dimensional mean-field dynamics to dimensionless ODEs: A unifying approach to SGD in two-layers networks (02/12/2023): "This manuscript investigates the one-pass stochastic gradient descent (S..."
- Leveraging the two timescale regime to demonstrate convergence of neural networks (04/19/2023): "We study the training dynamics of shallow neural networks, in a two-time..."
- Explorations on high dimensional landscapes (12/20/2014): "Finding minima of a real valued non-convex function over a high dimensio..."
- On Single Index Models beyond Gaussian Data (07/28/2023): "Sparse high-dimensional functions have arisen as a rich framework to stu..."
- Hitting the High-Dimensional Notes: An ODE for SGD learning dynamics on GLMs and multi-index models (08/17/2023): "We analyze the dynamics of streaming stochastic gradient descent (SGD) i..."
- Stochastic Gradient Descent outperforms Gradient Descent in recovering a high-dimensional signal in a glassy energy landscape (09/09/2023): "Stochastic Gradient Descent (SGD) is an out-of-equilibrium algorithm use..."
