Risk Bounds of Multi-Pass SGD for Least Squares in the Interpolation Regime

03/07/2022
by Difan Zou, et al.

Stochastic gradient descent (SGD) has achieved great success due to its superior performance in both optimization and generalization. However, most existing generalization analyses are carried out for single-pass SGD, a less practical variant than the commonly used multi-pass SGD. Moreover, theoretical analyses of multi-pass SGD typically concern a worst-case instance within a class of problems, which may be too pessimistic to explain its superior generalization on particular problem instances. The goal of this paper is to sharply characterize the generalization of multi-pass SGD by developing an instance-dependent excess risk bound for least squares in the interpolation regime, expressed as a function of the iteration number, the stepsize, and the data covariance. We show that the excess risk of SGD decomposes exactly into the excess risk of GD plus a positive fluctuation error, which implies that, on every instance, SGD generalizes no better than GD. On the other hand, although SGD needs more iterations than GD to reach the same level of excess risk, it requires fewer stochastic gradient evaluations, and is therefore preferable in terms of computation time.
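To make the GD-versus-SGD comparison concrete, the following is a minimal numerical sketch, not the paper's construction: it builds a synthetic noiseless least-squares instance with more features than samples (so the interpolation regime applies) and compares full-batch GD against multi-pass SGD at a matched iteration count. The problem sizes, stepsizes, Gaussian design, and function names are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic noiseless least squares with d > n, so a min-norm
# interpolator exists and the interpolation regime applies.
n, d = 50, 200
X = rng.standard_normal((n, d)) / np.sqrt(d)   # design with E[x x^T] = I_d / d
w_star = rng.standard_normal(d)
y = X @ w_star                                 # noiseless labels

def excess_risk(w):
    # Population excess risk for this isotropic design:
    # E[(x^T (w - w_star))^2] = ||w - w_star||^2 / d.
    return np.sum((w - w_star) ** 2) / d

def gd(T, lr):
    """T full-batch steps = T * n stochastic gradient evaluations."""
    w = np.zeros(d)
    for _ in range(T):
        w -= lr * X.T @ (X @ w - y) / n
    return w

def multipass_sgd(T, lr):
    """T single-sample steps, cycling through the data (multiple passes)."""
    w = np.zeros(d)
    for t in range(T):
        i = t % n
        w -= lr * (X[i] @ w - y[i]) * X[i]
    return w

T = 500
w_gd = gd(T, lr=10.0)             # stepsize on the order of 1 / lambda_max(X^T X / n)
w_sgd = multipass_sgd(T, lr=0.5)  # stepsize below 2 / max_i ||x_i||^2

# At a matched iteration count, SGD's excess risk should be at least GD's;
# the gap corresponds to the positive fluctuation error in the decomposition.
print(f"GD : excess risk {excess_risk(w_gd):.4f} after {T * n} gradient evaluations")
print(f"SGD: excess risk {excess_risk(w_sgd):.4f} after {T} gradient evaluations")
```

On such an instance one would expect SGD's excess risk to exceed GD's at the same iteration count, while SGD uses only T single-sample gradients against GD's T * n, illustrating the iteration-versus-computation trade-off described in the abstract.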


Related research

- 06/15/2022: Implicit Regularization or Implicit Conditioning? Exact Risk Trajectories of SGD in High Dimensions
- 02/27/2022: Benign Underfitting of Stochastic Gradient Descent
- 07/11/2021: SGD: The Role of Implicit Regularization, Batch-size and Multiple-epochs
- 03/03/2023: Learning High-Dimensional Single-Neuron ReLU Networks with Finite Samples
- 05/25/2018: Statistical Optimality of Stochastic Gradient Descent on Hard Learning Problems through Multiple Passes
- 06/05/2023: Aiming towards the minimizers: fast convergence of SGD for overparametrized problems
- 08/10/2021: The Benefits of Implicit Regularization from SGD in Least Squares Problems
