SGD in the Large: Average-case Analysis, Asymptotics, and Stepsize Criticality

02/08/2021
by Courtney Paquette, et al.

We propose a new framework, inspired by random matrix theory, for analyzing the dynamics of stochastic gradient descent (SGD) when both the number of samples and the dimension are large. The framework applies to any fixed stepsize and to the finite-sum setting. Using it, we show that the dynamics of SGD on a least-squares problem with random data become deterministic in the large-sample, large-dimension limit, and that the limiting dynamics are governed by a Volterra integral equation. This model predicts that SGD undergoes a phase transition at an explicitly given critical stepsize that ultimately affects its convergence rate, which we verify experimentally. Finally, when the input data is isotropic, we provide explicit expressions for the dynamics and for the average-case convergence rates (i.e., the complexity of the algorithm averaged over all possible inputs). These rates show significant improvement over the worst-case complexities.
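The setting of the abstract can be illustrated with a minimal numerical sketch. This is not the paper's experimental setup: the Gaussian data model, problem sizes, and the stepsize value below are assumptions chosen for illustration. It runs one-sample SGD with a fixed stepsize on a random least-squares problem; for a stepsize safely below the stability threshold, the loss converges toward zero on this noiseless problem.

```python
import numpy as np

# Illustrative sketch (not the paper's setup): one-sample SGD with a fixed
# stepsize on a least-squares problem with Gaussian random data, in a
# regime where both the number of samples n and the dimension d are large.
rng = np.random.default_rng(0)
n, d = 400, 100
A = rng.standard_normal((n, d)) / np.sqrt(d)  # rows have norm ~ 1
x_star = rng.standard_normal(d)
b = A @ x_star                                # noiseless targets

def loss(x):
    r = A @ x - b
    return 0.5 * np.mean(r ** 2)

def sgd(stepsize, iters=20_000, seed=1):
    """Run one-sample SGD from zero and return the final loss."""
    local = np.random.default_rng(seed)
    x = np.zeros(d)
    for _ in range(iters):
        i = local.integers(n)             # sample one row uniformly
        g = (A[i] @ x - b[i]) * A[i]      # stochastic gradient of loss
        x -= stepsize * g
    return loss(x)

# With rows of norm ~ 1, stepsizes well below 2 are stable here; 0.5 is a
# conservative illustrative choice, unrelated to the paper's critical value.
small = sgd(0.5)
print(f"final loss at stepsize 0.5: {small:.3e}")
```

Increasing the stepsize toward the stability threshold in this sketch slows and then destroys convergence, which is the qualitative phase-transition behavior the abstract describes; the paper's contribution is an exact, deterministic characterization of these dynamics in the large-n, large-d limit.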


Related research

- Homogenization of SGD in high-dimensions: Exact dynamics and generalization properties (05/14/2022). "We develop a stochastic differential equation, called homogenized SGD, f..."
- Dynamics of Stochastic Momentum Methods on Large-scale, Quadratic Models (06/07/2021). "We analyze a class of stochastic gradient algorithms with momentum on a ..."
- Halting Time is Predictable for Large Models: A Universality Property and Average-case Analysis (06/08/2020). "Average-case analysis computes the complexity of an algorithm averaged o..."
- Implicit Regularization or Implicit Conditioning? Exact Risk Trajectories of SGD in High Dimensions (06/15/2022). "Stochastic gradient descent (SGD) is a pillar of modern machine learning..."
- Rethinking the limiting dynamics of SGD: modified loss, phase space oscillations, and anomalous diffusion (07/19/2021). "In this work we explore the limiting dynamics of deep neural networks tr..."
- High-dimensional limit of one-pass SGD on least squares (04/13/2023). "We give a description of the high-dimensional limit of one-pass single-b..."
- On Tight Convergence Rates of Without-replacement SGD (04/18/2020). "For solving finite-sum optimization problems, SGD without replacement sa..."
