From high-dimensional mean-field dynamics to dimensionless ODEs: A unifying approach to SGD in two-layer networks

02/12/2023
by Luca Arnaboldi, et al.

This manuscript investigates the one-pass stochastic gradient descent (SGD) dynamics of a two-layer neural network trained on Gaussian data and labels generated by a similar, though not necessarily identical, target function. We rigorously analyse the limiting dynamics via a deterministic and low-dimensional description in terms of the sufficient statistics for the population risk. Our unifying analysis bridges different regimes of interest, such as the classical gradient-flow regime of vanishing learning rate, the high-dimensional regime of large input dimension, and the overparameterised "mean-field" regime of large network width, as well as the intermediate regimes where the limiting dynamics is determined by the interplay between these behaviours. In particular, in the high-dimensional limit, the infinite-width dynamics is found to remain close to a low-dimensional subspace spanned by the target principal directions. Our results therefore provide a unifying picture of the limiting SGD dynamics with synthetic data.
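To make the setup concrete, below is a minimal sketch (not the authors' code) of the teacher-student experiment the abstract describes: one-pass SGD on a two-layer network with fresh Gaussian inputs at every step, after which the sufficient statistics governing the population risk, namely the student-student and student-teacher overlap matrices, are computed. The input dimension, network widths, activation, learning rate, and frozen second layer are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# All sizes below are illustrative assumptions, not values from the paper.
d = 500            # input dimension
p, k = 4, 2        # student and teacher hidden widths
lr = 0.5           # learning rate
steps = 10_000     # one-pass SGD: one fresh sample per step

def g(z):
    return np.tanh(z)

def g_prime(z):
    return 1.0 - np.tanh(z) ** 2

# Teacher ("target") and student first-layer weights; for simplicity the
# second-layer weights are frozen, a common committee-machine-style choice.
W_star = rng.standard_normal((k, d))
W = rng.standard_normal((p, d))
a_star = np.ones(k) / k
a = np.ones(p) / p

for _ in range(steps):
    x = rng.standard_normal(d)                   # fresh Gaussian input
    y = a_star @ g(W_star @ x / np.sqrt(d))      # teacher label
    pre = W @ x / np.sqrt(d)                     # student pre-activations
    err = a @ g(pre) - y                         # residual on this sample
    # One online SGD step on the squared loss 0.5 * err**2.
    W -= (lr / np.sqrt(d)) * err * np.outer(a * g_prime(pre), x)

# Sufficient statistics ("order parameters") the limiting ODEs are written in:
Q = W @ W.T / d        # student-student overlap matrix
M = W @ W_star.T / d   # student-teacher overlap matrix
print("Q =", Q, "M =", M, sep="\n")
```

In the high-dimensional limit discussed in the abstract, the trajectories of Q and M concentrate around deterministic ODEs, so recording these overlaps along training is the natural way to compare a finite-size simulation against the low-dimensional limiting description.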

Related research

02/01/2022
Phase diagram of Stochastic Gradient Descent in high-dimensional two-layer neural networks
Despite the non-convex optimization landscape, over-parametrized shallow...

06/10/2020
Dynamical mean-field theory for stochastic gradient descent in Gaussian mixture classification
We analyze in a closed form the learning dynamics of stochastic gradient...

02/16/2021
Analysis of feature learning in weight-tied autoencoders via the mean field lens
Autoencoders are among the earliest introduced nonlinear models for unsu...

02/23/2023
Phase diagram of training dynamics in deep neural networks: effect of learning rate, depth, and width
We systematically analyze optimization dynamics in deep neural networks ...

06/08/2022
High-dimensional limit theorems for SGD: Effective dynamics and critical scaling
We study the scaling limits of stochastic gradient descent (SGD) with co...

01/12/2022
Implicit Bias of MSE Gradient Optimization in Underparameterized Neural Networks
We study the dynamics of a neural network in function space when optimiz...
