Implicit Regularization or Implicit Conditioning? Exact Risk Trajectories of SGD in High Dimensions

06/15/2022
by Courtney Paquette, et al.

Stochastic gradient descent (SGD) is a pillar of modern machine learning, serving as the go-to optimization algorithm for a diverse array of problems. While the empirical success of SGD is often attributed to its computational efficiency and favorable generalization behavior, neither effect is well understood and disentangling them remains an open problem. Even in the simple setting of convex quadratic problems, worst-case analyses give an asymptotic convergence rate for SGD that is no better than full-batch gradient descent (GD), and the purported implicit regularization effects of SGD lack a precise explanation. In this work, we study the dynamics of multi-pass SGD on high-dimensional convex quadratics and establish an asymptotic equivalence to a stochastic differential equation, which we call homogenized stochastic gradient descent (HSGD), whose solutions we characterize explicitly in terms of a Volterra integral equation. These results yield precise formulas for the learning and risk trajectories, which reveal a mechanism of implicit conditioning that explains the efficiency of SGD relative to GD. We also prove that the noise from SGD negatively impacts generalization performance, ruling out the possibility of any type of implicit regularization in this context. Finally, we show how to adapt the HSGD formalism to include streaming SGD, which allows us to produce an exact prediction for the excess risk of multi-pass SGD relative to that of streaming SGD (bootstrap risk).
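Since the abstract contrasts the per-pass efficiency of multi-pass SGD with full-batch GD on convex quadratics, a small simulation can make that comparison concrete. The sketch below is not the paper's code and does not compute the HSGD or Volterra predictions; the Gaussian data model, the problem sizes n = 400 and d = 200, and the step sizes are illustrative assumptions. It runs single-sample multi-pass SGD and full-batch GD on a random least-squares problem and reports the empirical risk at equal data-pass budgets.

```python
import numpy as np

# Illustrative sketch (not the paper's code): compare the risk trajectory of
# multi-pass single-sample SGD with full-batch GD on a random least-squares
# problem f(x) = (1/2n) ||A x - b||^2.  Data model and step sizes are assumptions.

rng = np.random.default_rng(0)
n, d = 400, 200                               # samples, parameters (assumed sizes)
A = rng.standard_normal((n, d))               # Gaussian design, rows ~ N(0, I_d)
x_star = rng.standard_normal(d) / np.sqrt(d)  # ground-truth signal of O(1) norm
b = A @ x_star + 0.5 * rng.standard_normal(n) # noisy targets

def risk(x):
    """Empirical least-squares risk (1/2n) ||A x - b||^2."""
    return 0.5 * np.mean((A @ x - b) ** 2)

def run_sgd(steps, gamma):
    """Multi-pass single-sample SGD; one step touches one row of A."""
    x = np.zeros(d)
    out = []
    for _ in range(steps):
        i = rng.integers(n)                       # sample with replacement
        x -= gamma * (A[i] @ x - b[i]) * A[i]     # gradient of one summand
        out.append(risk(x))
    return np.array(out)

def run_gd(epochs, gamma):
    """Full-batch GD; one step touches all n rows, i.e. one data pass."""
    x = np.zeros(d)
    out = []
    for _ in range(epochs):
        x -= gamma * (A.T @ (A @ x - b)) / n
        out.append(risk(x))
    return np.array(out)

if __name__ == "__main__":
    epochs = 20
    # SGD step ~ 1/||a_i||^2 keeps single-sample updates stable;
    # GD step ~ 1/lambda_max(A^T A / n) for the Gaussian design above.
    sgd_risk = run_sgd(steps=epochs * n, gamma=1.0 / d)
    gd_risk = run_gd(epochs=epochs, gamma=0.3)
    for e in range(epochs):
        print(f"pass {e + 1:2d}:  SGD risk {sgd_risk[(e + 1) * n - 1]:.4f}"
              f"   GD risk {gd_risk[e]:.4f}")
```

At matched data-pass budgets this toy run typically shows SGD decreasing the risk faster per pass than GD before settling at a noise-dependent floor, which is the kind of efficiency gap the paper's exact risk trajectories are meant to explain.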


