High-dimensional limit theorems for SGD: Effective dynamics and critical scaling

06/08/2022
by Gérard Ben Arous, et al.

We study the scaling limits of stochastic gradient descent (SGD) with constant step-size in the high-dimensional regime. We prove limit theorems for the trajectories of summary statistics (i.e., finite-dimensional functions) of SGD as the dimension goes to infinity. Our approach allows one to choose the summary statistics that are tracked, the initialization, and the step-size, and it yields both ballistic (ODE) and diffusive (SDE) limits, with the limit depending dramatically on these choices. Interestingly, we find a critical scaling regime for the step-size: below it, the effective ballistic dynamics matches gradient flow for the population loss, while at it, a new correction term appears that changes the phase diagram. Around the fixed points of this effective dynamics, the corresponding diffusive limits can be quite complex and even degenerate. We demonstrate our approach on popular examples, including estimation for spiked matrix and tensor models and classification via two-layer networks for binary and XOR-type Gaussian mixture models. These examples exhibit surprising phenomena, including multimodal timescales to convergence as well as convergence to sub-optimal solutions with probability bounded away from zero from random (e.g., Gaussian) initializations.
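To make the setting concrete, here is a minimal simulation sketch (our illustration, not the authors' code) of one of the paper's examples: one-pass SGD for a spiked matrix model, tracking the overlap with the planted spike as the summary statistic. The specific loss, the parameter values, and the corrected ODE quoted in the comments are all assumptions chosen for illustration via a back-of-the-envelope computation; consult the paper for the actual limit theorems.

```python
import numpy as np

# Illustrative sketch (not the authors' code): one-pass SGD for a spiked
# matrix model with fresh samples Y_k = lam * v v^T + W_k, tracking the
# summary statistic m_k = <x_k, v> (overlap with the planted spike).
# All parameter values below are arbitrary illustrative choices.
rng = np.random.default_rng(0)

N = 200                 # dimension
lam = 3.0               # signal-to-noise ratio
c = 1.0                 # step-size constant; delta = c / N plays the role of the critical scaling here
delta = c / N
T = 6.0                 # effective (ODE) time horizon
steps = int(T / delta)  # one fresh sample per SGD step

v = np.ones(N) / np.sqrt(N)   # planted spike, |v| = 1
x = rng.standard_normal(N)
x /= np.linalg.norm(x)        # uninformative initialization: overlap ~ N^{-1/2}
if v @ x < 0:
    x = -x                    # use the +/- symmetry so the overlap starts positive

overlaps = []
for k in range(steps):
    G = rng.standard_normal((N, N))
    W = (G + G.T) / np.sqrt(2.0)             # fresh symmetric noise, O(1) entries
    # single-sample loss l(x; Y) = -x^T Y x / 2, so grad = -(lam <v,x> v + W x)
    grad = -(lam * (v @ x) * v + W @ x)
    x -= delta * grad
    x /= np.linalg.norm(x)                   # retract to the unit sphere
    overlaps.append(v @ x)

# Heuristically (our back-of-the-envelope computation, not a quote from the
# paper): at step-size delta = c/N the overlap should track the corrected ODE
#   m'(t) = lam * m * (1 - m^2) - (c/2) * m,
# whose stable fixed point sits strictly below 1, and which requires
# lam > c/2 for the uninformative fixed point m = 0 to be unstable.
m_star = np.sqrt(max(0.0, 1.0 - c / (2.0 * lam)))
print(f"initial overlap ~ {overlaps[0]:.3f}")
print(f"final overlap   ~ {overlaps[-1]:.3f} (corrected-ODE fixed point ~ {m_star:.3f})")
```

Note the design choice: the per-step correction from normalizing on the sphere is of order delta^2 * |W x|^2 ~ c^2 / N per step, which accumulates to an O(1) drift over the ~N/c steps of one unit of ODE time. This is how a step-size-dependent correction term can survive in the ballistic limit and shift the phase diagram, as the abstract describes.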
