# A Markov Chain Theory Approach to Characterizing the Minimax Optimality of Stochastic Gradient Descent (for Least Squares)

This work provides a simplified proof of the statistical minimax optimality of (iterate averaged) stochastic gradient descent (SGD), for the special case of least squares. This result is obtained by analyzing SGD as a stochastic process and by sharply characterizing the stationary covariance matrix of this process. The finite rate optimality characterization captures the constant factors and addresses model mis-specification.


## 1 Introduction

Stochastic gradient descent is among the most commonly used practical algorithms for large scale stochastic optimization. The seminal results of [9, 8] formalized this effectiveness, showing that for certain (locally quadratic) problems, asymptotically, stochastic gradient descent is statistically minimax optimal (provided the iterates are averaged). There are a number of more modern proofs [1, 3, 2, 5] of this fact, which provide finite rates of convergence. Other recent algorithms also achieve the statistically optimal minimax rate, with finite convergence rates [4].

This work provides a short proof of this minimax optimality for SGD for the special case of least squares through a characterization of SGD as a stochastic process. The proof builds on ideas developed in [2, 5].

SGD for least squares. The expected square loss of $w \in \mathbb{R}^d$ over input-output pairs $(x, y)$, where $x \in \mathbb{R}^d$ and $y \in \mathbb{R}$ are sampled from a distribution $\mathcal{D}$, is:

$$L(w) = \frac{1}{2}\,\mathbb{E}_{(x,y)\sim\mathcal{D}}\big[(y - w\cdot x)^2\big].$$

The optimal weight is denoted by:

$$w_* := \arg\min_w L(w).$$

Assume the argmin is unique.

Stochastic gradient descent proceeds as follows: at each iteration $t$, using an i.i.d. sample $(x_t, y_t) \sim \mathcal{D}$, the update of $w_t$ is:

$$w_t = w_{t-1} + \gamma\,(y_t - w_{t-1}\cdot x_t)\,x_t,$$

where $\gamma$ is a fixed stepsize.
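As a concrete reference point, the update above can be sketched in a few lines of numpy. This is a minimal illustration rather than the paper's code; the data distribution, stepsize, and function name are our choices:

```python
import numpy as np

def sgd_least_squares(xs, ys, gamma, w0=None):
    """Constant-stepsize SGD on the square loss, one pass over the data.

    xs: (n, d) array of inputs; ys: (n,) array of outputs.
    Returns the full iterate sequence w_0, w_1, ..., w_n as an (n+1, d) array.
    """
    n, d = xs.shape
    w = np.zeros(d) if w0 is None else np.asarray(w0, dtype=float).copy()
    iterates = [w.copy()]
    for x, y in zip(xs, ys):
        # w_t = w_{t-1} + gamma * (y_t - w_{t-1} . x_t) x_t
        w = w + gamma * (y - w @ x) * x
        iterates.append(w.copy())
    return np.array(iterates)
```

The estimator analyzed below is not the last iterate but an average of the tail of this sequence.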

Notation. For a symmetric positive definite matrix $A$ and a vector $x$, define:

$$\|x\|_A^2 := x^\top A x.$$

For a symmetric matrix $M$, define the induced matrix norm under $A$ as:

$$\|M\|_A := \max_{\|v\|=1}\frac{v^\top M v}{v^\top A v} = \|A^{-1/2} M A^{-1/2}\|.$$

The statistically optimal rate. Using $n$ samples (and for $n$ large enough), the minimax optimal rate is achieved by the maximum likelihood estimator (the MLE), or, equivalently, the empirical risk minimizer. Given $n$ i.i.d. samples $(x_1, y_1), \ldots, (x_n, y_n)$, define

$$\hat{w}^{\mathrm{MLE}}_n := \arg\min_w\,\frac{1}{n}\sum_{i=1}^n \frac{1}{2}(y_i - w\cdot x_i)^2,$$

where $\hat{w}^{\mathrm{MLE}}_n$ denotes the MLE estimator over the $n$ samples.
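For comparison with SGD later on, note that this ERM/MLE is an ordinary linear least-squares problem with a direct solution; a minimal numpy sketch (the function name is ours):

```python
import numpy as np

def erm_least_squares(xs, ys):
    """Empirical risk minimizer: argmin_w (1/n) sum_i (1/2)(y_i - w . x_i)^2."""
    # np.linalg.lstsq minimizes ||xs @ w - ys||^2, which has the same argmin.
    w_hat, *_ = np.linalg.lstsq(xs, ys, rcond=None)
    return w_hat
```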

This rate can be characterized as follows: define

$$\sigma^2_{\mathrm{MLE}} := \frac{1}{2}\,\mathbb{E}\big[(y - w_*\cdot x)^2\,\|x\|^2_{H^{-1}}\big],$$

and the (asymptotic) rate of the MLE is $\sigma^2_{\mathrm{MLE}}/n$ [7, 10]. Precisely,

$$\lim_{n\to\infty}\frac{\mathbb{E}\big[L(\hat{w}^{\mathrm{MLE}}_n)\big] - L(w_*)}{\sigma^2_{\mathrm{MLE}}/n} = 1.$$

The works of [9, 8] proved that a certain averaged stochastic gradient method achieves this minimax rate, in the limit.

For the case of additive noise models (i.e. the “well-specified” case), the assumption is that $y = w_*\cdot x + \eta$, with the noise $\eta$ being independent of $x$. Here, with $\sigma^2 := \mathbb{E}[\eta^2]$, it is straightforward to see that:

$$\frac{\sigma^2_{\mathrm{MLE}}}{n} = \frac{1}{2}\,\frac{d\,\sigma^2}{n}.$$

The rate of $\sigma^2_{\mathrm{MLE}}/n$ is still minimax optimal even among mis-specified models, where the additive noise assumption may not hold [6, 7, 10].
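The well-specified identity above is easy to check by Monte Carlo. The sketch below uses a standard Gaussian design (so $H = I$), which is our choice for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
d, sigma, n_mc = 4, 0.5, 200_000

# Well-specified model: y = w_* . x + eta, with eta independent of x.
xs = rng.normal(size=(n_mc, d))          # standard Gaussian design: H = I
etas = sigma * rng.normal(size=n_mc)

# sigma^2_MLE = (1/2) E[(y - w_* . x)^2 ||x||^2_{H^{-1}}] = (1/2) E[eta^2 ||x||^2]
sq_norms = (xs ** 2).sum(axis=1)         # ||x||^2_{H^{-1}} with H = I
sigma2_mle = 0.5 * np.mean(etas ** 2 * sq_norms)

print(sigma2_mle, 0.5 * d * sigma ** 2)  # both close to d * sigma^2 / 2 = 0.5
```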

Assumptions. Assume the fourth moment of $x$ is finite. Denote the second moment matrix of $x$ as

$$H := \mathbb{E}\big[x x^\top\big],$$

and suppose $H$ is strictly positive definite with minimal eigenvalue:

$$\mu := \sigma_{\min}(H).$$

Define $R^2$ as the smallest value which satisfies:

$$\mathbb{E}\big[\|x\|^2\,x x^\top\big] \preceq R^2\,\mathbb{E}\big[x x^\top\big].$$

This implies $\mathrm{Tr}(H) \le R^2$.
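These quantities can be sanity-checked empirically. The sketch below (Gaussian design, our choice) estimates $H$ and $\mu$ from samples and tests the $R^2$ condition in the PSD order; for a standard Gaussian in $d$ dimensions, $\mathbb{E}[\|x\|^2 x x^\top] = (d+2)\,I$, so any $R^2 \ge d+2$ works:

```python
import numpy as np

rng = np.random.default_rng(2)
d, n_mc = 3, 100_000
xs = rng.normal(size=(n_mc, d))                 # standard Gaussian design

H = xs.T @ xs / n_mc                            # estimate of E[x x^T]
mu = np.linalg.eigvalsh(H).min()                # estimate of sigma_min(H)

sq_norms = (xs ** 2).sum(axis=1)
M4 = (xs.T * sq_norms) @ xs / n_mc              # estimate of E[||x||^2 x x^T]

def satisfies_R2(R2):
    """Empirically check E[||x||^2 x x^T] <= R2 * E[x x^T] in the PSD order."""
    return np.linalg.eigvalsh(R2 * H - M4).min() >= 0

print(mu, satisfies_R2(d + 2.5), satisfies_R2(1.0))
```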

## 2 Statistical Risk Bounds

Define:

$$\Sigma := \mathbb{E}\big[(y - w_*\cdot x)^2\,x x^\top\big],$$

and so the optimal constant in the rate can be written as:

$$\sigma^2_{\mathrm{MLE}} = \frac{1}{2}\,\mathrm{Tr}(H^{-1}\Sigma) = \frac{1}{2}\,\mathbb{E}\big[(y - w_*\cdot x)^2\,\|x\|^2_{H^{-1}}\big].$$

For the mis-specified case, it is helpful to define:

$$\rho_{\mathrm{misspec}} := \frac{d\,\|\Sigma\|_H}{\mathrm{Tr}(H^{-1}\Sigma)},$$

which can be viewed as a measure of how mis-specified the model is. Note that if the model is well-specified, then $\rho_{\mathrm{misspec}} = 1$.

Denote the average iterate, averaged from iteration $t$ to $T$, by:

$$\bar{w}_{t:T} := \frac{1}{T-t}\sum_{t'=t}^{T-1} w_{t'}.$$
###### Theorem 1.

Suppose $\gamma < \frac{1}{R^2}$. The risk is bounded as:

$$\mathbb{E}\big[L(\bar{w}_{t:T})\big] - L(w_*) \le \left(\sqrt{\frac{1}{2}\,\exp(-\gamma\mu t)\,R^2\,\|w_0-w_*\|^2} + \sqrt{\Big(1 + \frac{\gamma R^2}{1-\gamma R^2}\,\rho_{\mathrm{misspec}}\Big)\,\frac{\sigma^2_{\mathrm{MLE}}}{T-t}}\right)^2.$$

The bias term (the first term) decays at a geometric rate (one can set $t = T/2$, or maintain multiple running averages if $T$ is not known in advance). If $\gamma = \frac{1}{2R^2}$ and the model is well-specified ($\rho_{\mathrm{misspec}} = 1$), then the variance term is $\frac{2\sigma^2_{\mathrm{MLE}}}{T-t}$, and the rate of the bias contraction is $\exp(-\frac{\mu t}{2R^2})$. If the model is not well specified, then using a smaller stepsize of $\gamma = \frac{1}{2\rho_{\mathrm{misspec}}R^2}$ leads to the same minimax optimal rate (up to a constant factor of $2$), albeit at a slower bias contraction rate. In the mis-specified case, an example in [5] shows that such a smaller stepsize is required in order to be within a constant factor of the minimax rate. An even smaller stepsize leads to a constant even closer to that of the optimal rate.

## 3 Analysis

The analysis first characterizes a bias/variance decomposition, where the variance is bounded in terms of properties of the stationary covariance of the iterates $w_t$. Then this asymptotic covariance matrix is analyzed.

Throughout assume:

$$\gamma < \frac{1}{R^2}.$$

### 3.1 The Bias-Variance Decomposition

The (stochastic) gradient at $w_*$ in iteration $t$ is:

$$\xi_t := -(y_t - w_*\cdot x_t)\,x_t,$$

which is a mean zero quantity. Also define:

$$B_t := I - \gamma\,x_t x_t^\top.$$

The update rule can be written as:

$$\begin{aligned}
w_t - w_* &= w_{t-1} - w_* + \gamma\,(y_t - w_{t-1}\cdot x_t)\,x_t \\
&= (I - \gamma\,x_t x_t^\top)(w_{t-1}-w_*) - \gamma\xi_t \\
&= B_t(w_{t-1}-w_*) - \gamma\xi_t.
\end{aligned}$$

Roughly speaking, the above shows how the process on $w_t - w_*$ consists of a contraction along with an addition of the zero mean quantity $-\gamma\xi_t$.

From recursion,

$$w_t - w_* = B_t\cdots B_1(w_0-w_*) - \gamma\big(\xi_t + B_t\xi_{t-1} + \cdots + B_t\cdots B_2\,\xi_1\big). \qquad (1)$$
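Equation 1 can be checked mechanically for a few iterations. The sketch below (synthetic data; the setup is our choice) unrolls the product form and compares it to the SGD recursion:

```python
import numpy as np

rng = np.random.default_rng(4)
d, t_max, gamma = 3, 6, 0.05
w_star = rng.normal(size=d)
w0 = rng.normal(size=d)

# Samples (x_1, y_1), ..., (x_{t_max}, y_{t_max}); index 0 is unused padding.
xs = rng.normal(size=(t_max + 1, d))
ys = xs @ w_star + 0.1 * rng.normal(size=t_max + 1)

# Run the SGD recursion directly.
w = w0.copy()
for t in range(1, t_max + 1):
    w = w + gamma * (ys[t] - w @ xs[t]) * xs[t]

# Rebuild w_t - w_* from the expanded form of Equation 1.
I = np.eye(d)
B = [None] + [I - gamma * np.outer(xs[t], xs[t]) for t in range(1, t_max + 1)]
xi = [None] + [-(ys[t] - w_star @ xs[t]) * xs[t] for t in range(1, t_max + 1)]

prod = w0 - w_star
for t in range(1, t_max + 1):
    prod = B[t] @ prod                    # B_t ... B_1 (w_0 - w_*)

noise = np.zeros(d)
for tau in range(1, t_max + 1):
    term = xi[tau]
    for s in range(tau + 1, t_max + 1):
        term = B[s] @ term                # B_t ... B_{tau+1} xi_tau
    noise += term

err = prod - gamma * noise
print(np.max(np.abs((w - w_star) - err)))  # agrees up to floating-point error
```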

It is helpful to consider a certain bias and variance decomposition. Let us write:

$$\mathbb{E}\big[\|\bar{w}_{t:T}-w_*\|_H^2 \,\big|\, \xi_0=\cdots=\xi_T=0\big] := \frac{1}{(T-t)^2}\,\mathbb{E}\Bigg[\bigg\|\sum_{\tau=t}^{T-1} B_\tau\cdots B_1(w_0-w_*)\bigg\|_H^2\Bigg],$$

and let $\mathbb{E}\big[\|\bar{w}_{t:T}-w_*\|_H^2 \,\big|\, w_0=w_*\big]$ denote the error of the process started at $w_0 = w_*$, so that it is driven purely by the noise terms $\xi_\tau$. (The first conditional expectation notation slightly abuses notation and should be taken as a definition; the abuse is due to the fact that the right hand side drops the conditioning.)

###### Lemma 1.

The error is bounded as:

$$\mathbb{E}\big[L(\bar{w}_{t:T})\big] - L(w_*) \le \frac{1}{2}\left(\sqrt{\mathbb{E}\big[\|\bar{w}_{t:T}-w_*\|_H^2 \,\big|\, \xi_0=\cdots=\xi_T=0\big]} + \sqrt{\mathbb{E}\big[\|\bar{w}_{t:T}-w_*\|_H^2 \,\big|\, w_0=w_*\big]}\right)^2.$$
###### Proof.

Equation 1 implies that:

$$\bar{w}_{t:T}-w_* = \frac{1}{T-t}\sum_{\tau=t}^{T-1} B_\tau\cdots B_1(w_0-w_*) - \frac{\gamma}{T-t}\sum_{\tau=t}^{T-1}\big(\xi_\tau + B_\tau\xi_{\tau-1} + \cdots + B_\tau\cdots B_2\,\xi_1\big).$$

Now observe that, for vector valued random variables $u$ and $v$, the triangle inequality for the norm $\sqrt{\mathbb{E}\|\cdot\|_H^2}$ implies

$$\mathbb{E}\|u+v\|_H^2 \le \Big(\sqrt{\mathbb{E}\|u\|_H^2} + \sqrt{\mathbb{E}\|v\|_H^2}\Big)^2;$$

the proof of the lemma follows by noting that $\mathbb{E}\big[L(\bar{w}_{t:T})\big] - L(w_*) = \frac{1}{2}\,\mathbb{E}\|\bar{w}_{t:T}-w_*\|_H^2$. ∎

Bias. The bias term is characterized as follows:

###### Lemma 2.

For all $t$,

$$\mathbb{E}\big[\|w_t-w_*\|^2 \,\big|\, \xi_0=\cdots=\xi_T=0\big] \le \exp(-\gamma\mu t)\,\|w_0-w_*\|^2.$$
###### Proof.

Assume $\xi_\tau = 0$ for all $\tau$, so that $w_t - w_* = B_t(w_{t-1}-w_*)$. Observe:

$$\begin{aligned}
\mathbb{E}\|w_t-w_*\|^2 &= \mathbb{E}\Big[\|w_{t-1}-w_*\|^2 - 2\gamma\,(w_{t-1}-w_*)^\top\mathbb{E}[xx^\top](w_{t-1}-w_*) \\
&\qquad\qquad + \gamma^2\,(w_{t-1}-w_*)^\top\mathbb{E}\big[\|x\|^2 xx^\top\big](w_{t-1}-w_*)\Big] \\
&\le \mathbb{E}\Big[\|w_{t-1}-w_*\|^2 - 2\gamma\,\|w_{t-1}-w_*\|_H^2 + \gamma^2 R^2\,\|w_{t-1}-w_*\|_H^2\Big] \\
&\le \mathbb{E}\|w_{t-1}-w_*\|^2 - \gamma\,\mathbb{E}\|w_{t-1}-w_*\|_H^2 \\
&\le (1-\gamma\mu)\,\mathbb{E}\|w_{t-1}-w_*\|^2,
\end{aligned}$$

where the second-to-last step uses $\gamma R^2 \le 1$. Iterating and using $(1-\gamma\mu)^t \le \exp(-\gamma\mu t)$ completes the proof. ∎

Variance. Now suppose $w_0 = w_*$. Define the covariance matrix:

$$C_t := \mathbb{E}\big[(w_t-w_*)(w_t-w_*)^\top \,\big|\, w_0=w_*\big].$$

Using the recursion $w_{t+1}-w_* = B_{t+1}(w_t-w_*) - \gamma\xi_{t+1}$,

$$C_{t+1} = C_t - \gamma HC_t - \gamma C_tH + \gamma^2\,\mathbb{E}\big[(x^\top C_tx)\,xx^\top\big] + \gamma^2\Sigma \qquad (2)$$

which follows from:

$$\mathbb{E}\big[(w_t-w_*)\xi_{t+1}^\top\big] = 0, \quad\text{and}\quad \mathbb{E}\big[(x_{t+1}x_{t+1}^\top)(w_t-w_*)\xi_{t+1}^\top\big] = 0$$

(these hold since $\xi_{t+1}$ is mean zero, both $x_{t+1}$ and $\xi_{t+1}$ are independent of $w_t$, and $\mathbb{E}[w_t-w_*] = 0$ when $w_0=w_*$).

###### Lemma 3.

Suppose $\gamma < 1/R^2$. There exists a unique $C_\infty$ such that:

$$0 = C_0 \preceq C_1 \preceq \cdots \preceq C_\infty,$$

where $C_\infty$ satisfies:

$$C_\infty = C_\infty - \gamma HC_\infty - \gamma C_\infty H + \gamma^2\,\mathbb{E}\big[(x^\top C_\infty x)\,xx^\top\big] + \gamma^2\Sigma. \qquad (3)$$
###### Proof.

By recursion (with $w_0 = w_*$),

$$w_t-w_* = B_t(w_{t-1}-w_*) - \gamma\xi_t = -\gamma\big(\xi_t + B_t\xi_{t-1} + \cdots + B_t\cdots B_2\,\xi_1\big).$$

Using that each $\xi_\tau$ is mean zero and independent of $\xi_{\tau'}$ and $B_{\tau'}$ for $\tau' \neq \tau$, the cross terms vanish and

$$C_t = \gamma^2\Big(\mathbb{E}[\xi_t\xi_t^\top] + \mathbb{E}[B_t\xi_{t-1}\xi_{t-1}^\top B_t^\top] + \cdots + \mathbb{E}[B_t\cdots B_2\,\xi_1\xi_1^\top B_2^\top\cdots B_t^\top]\Big).$$

Now using that $\mathbb{E}[\xi_\tau\xi_\tau^\top] = \Sigma$, that the $B_\tau$ are i.i.d., and that $B_\tau$ and $\xi_{\tau'}$ are independent (for $\tau \neq \tau'$),

$$C_t = \gamma^2\Big(\Sigma + \mathbb{E}[B_2\Sigma B_2^\top] + \cdots + \mathbb{E}[B_t\cdots B_2\,\Sigma\,B_2^\top\cdots B_t^\top]\Big) = C_{t-1} + \gamma^2\,\mathbb{E}\big[B_t\cdots B_2\,\Sigma\,B_2^\top\cdots B_t^\top\big],$$

which proves $C_{t-1} \preceq C_t$.

To prove the limit exists, it suffices to first argue that the trace of $C_t$ is uniformly bounded from above, for all $t$. Taking the trace of the update rule, Equation 2,

$$\mathrm{Tr}(C_{t+1}) = \mathrm{Tr}(C_t) - 2\gamma\,\mathrm{Tr}(HC_t) + \gamma^2\,\mathrm{Tr}\big(\mathbb{E}[(x^\top C_tx)\,xx^\top]\big) + \gamma^2\,\mathrm{Tr}(\Sigma).$$

Observe:

$$\mathrm{Tr}\big(\mathbb{E}[(x^\top C_tx)\,xx^\top]\big) = \mathbb{E}\big[(x^\top C_tx)\,\|x\|^2\big] = \mathrm{Tr}\big(C_t\,\mathbb{E}[\|x\|^2 xx^\top]\big) \le R^2\,\mathrm{Tr}(C_tH) \qquad (4)$$

and, using $\gamma R^2 \le 1$ and $\mathrm{Tr}(HC_t) \ge \mu\,\mathrm{Tr}(C_t)$,

$$\mathrm{Tr}(C_{t+1}) \le \mathrm{Tr}(C_t) - \gamma\,\mathrm{Tr}(HC_t) + \gamma^2\,\mathrm{Tr}(\Sigma) \le (1-\gamma\mu)\,\mathrm{Tr}(C_t) + \gamma^2\,\mathrm{Tr}(\Sigma) \le \frac{\gamma\,\mathrm{Tr}(\Sigma)}{\mu},$$

where the last inequality follows by induction, proving the uniform boundedness of the trace of $C_t$. Now, for any fixed vector $v$, the limit of $v^\top C_tv$ exists, since the sequence is non-decreasing and bounded above (by the monotone convergence theorem). From this, it follows that every entry of the matrix $C_t$ converges. ∎
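Lemma 3 also suggests a way to compute $C_\infty$ numerically: iterate the recursion (2) from $C_0 = 0$ until it stabilizes. The sketch below does this for a Gaussian design, where $\mathbb{E}[(x^\top Mx)\,xx^\top]$ has the closed form $2HMH + \mathrm{Tr}(MH)\,H$ (a standard Gaussian moment identity), so no sampling is needed; the setup is our choice:

```python
import numpy as np

rng = np.random.default_rng(5)
d = 3
A = rng.normal(size=(d, d))
H = A @ A.T / d + 0.5 * np.eye(d)       # a positive definite second-moment matrix
Sigma = 0.25 * H                         # well-specified case: Sigma = sigma^2 H
R2 = 2 * np.linalg.eigvalsh(H).max() + np.trace(H)  # a valid R^2 for x ~ N(0, H)
gamma = 0.5 / R2

def S(M):
    """E[(x^T M x) x x^T] for x ~ N(0, H): equals 2 H M H + Tr(M H) H."""
    return 2 * H @ M @ H + np.trace(M @ H) * H

# Iterate Equation (2) from C_0 = 0 until it stabilizes.
C = np.zeros((d, d))
for _ in range(20_000):
    C = C - gamma * (H @ C + C @ H) + gamma ** 2 * S(C) + gamma ** 2 * Sigma

# Check stationarity, Equation (3): H C + C H = gamma S(C) + gamma Sigma.
lhs = H @ C + C @ H
rhs = gamma * S(C) + gamma * Sigma
print(np.max(np.abs(lhs - rhs)))
```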

###### Lemma 4.

Define:

$$\bar{w}_T := \frac{1}{T}\sum_{t=0}^{T-1} w_t.$$

Then:

$$\frac{1}{2}\,\mathbb{E}\big[\|\bar{w}_T-w_*\|_H^2 \,\big|\, w_0=w_*\big] \le \frac{\mathrm{Tr}(C_\infty)}{\gamma T}.$$
###### Proof.

Note

$$\begin{aligned}
\mathbb{E}\big[(\bar{w}_T-w_*)(\bar{w}_T-w_*)^\top \,\big|\, w_0=w_*\big] &= \frac{1}{T^2}\sum_{t=0}^{T-1}\sum_{t'=0}^{T-1}\mathbb{E}\big[(w_t-w_*)(w_{t'}-w_*)^\top \,\big|\, w_0=w_*\big] \\
&\preceq \frac{1}{T^2}\sum_{t=0}^{T-1}\sum_{t'=t}^{T-1}\Big(\mathbb{E}\big[(w_t-w_*)(w_{t'}-w_*)^\top \,\big|\, w_0=w_*\big] \\
&\qquad\qquad + \mathbb{E}\big[(w_{t'}-w_*)(w_t-w_*)^\top \,\big|\, w_0=w_*\big]\Big),
\end{aligned}$$

double counting the diagonal terms $t = t'$ (each of which is positive semidefinite). For $t' \ge t$,

$$\mathbb{E}\big[(w_{t'}-w_*)(w_t-w_*)^\top \,\big|\, w_0=w_*\big] = (I-\gamma H)^{t'-t}\,C_t.$$

To see why, consider the recursion $w_{t'}-w_* = B_{t'}(w_{t'-1}-w_*) - \gamma\xi_{t'}$ and take expectations to get $\mathbb{E}\big[(w_{t'}-w_*)(w_t-w_*)^\top\big] = (I-\gamma H)\,\mathbb{E}\big[(w_{t'-1}-w_*)(w_t-w_*)^\top\big]$, since the sample $(x_{t'}, y_{t'})$ is independent of the past. From this,

$$\mathbb{E}\big[(\bar{w}_T-w_*)(\bar{w}_T-w_*)^\top \,\big|\, w_0=w_*\big] \preceq \frac{1}{T^2}\sum_{t=0}^{T-1}\sum_{\tau=0}^{T-t-1}\Big((I-\gamma H)^\tau C_t + C_t(I-\gamma H)^\tau\Big),$$

and so,

$$\mathbb{E}\big[\|\bar{w}_T-w_*\|_H^2 \,\big|\, w_0=w_*\big] = \mathrm{Tr}\Big(H\,\mathbb{E}\big[(\bar{w}_T-w_*)(\bar{w}_T-w_*)^\top \,\big|\, w_0=w_*\big]\Big) \le \frac{1}{T^2}\sum_{t=0}^{T-1}\sum_{\tau=0}^{T-t-1}\Big(\mathrm{Tr}\big(H(I-\gamma H)^\tau C_t\big) + \mathrm{Tr}\big(C_t(I-\gamma H)^\tau H\big)\Big).$$

Notice that $H(I-\gamma H)^\tau \succeq 0$ for any non-negative integer $\tau$, since $H \succeq 0$ and $I-\gamma H \succeq 0$, because the product of two commuting PSD matrices is PSD. Also note that for PSD matrices $A$ and $B$, $\mathrm{Tr}(AB) \ge 0$. Hence,

$$\begin{aligned}
\mathbb{E}\big[\|\bar{w}_T-w_*\|_H^2 \,\big|\, w_0=w_*\big] &\le \frac{2}{T^2}\sum_{t=0}^{T-1}\sum_{\tau=0}^{\infty}\mathrm{Tr}\big(H(I-\gamma H)^\tau C_t\big) \\
&= \frac{2}{T^2}\sum_{t=0}^{T-1}\mathrm{Tr}\Big(H\Big(\sum_{\tau=0}^{\infty}(I-\gamma H)^\tau\Big)C_t\Big) \\
&= \frac{2}{T^2}\sum_{t=0}^{T-1}\mathrm{Tr}\big(H(\gamma H)^{-1}C_t\big) \qquad (*) \\
&= \frac{2}{\gamma T^2}\sum_{t=0}^{T-1}\mathrm{Tr}(C_t) \\
&\le \frac{2}{\gamma T}\,\mathrm{Tr}(C_\infty),
\end{aligned}$$

where the last step uses $C_t \preceq C_\infty$ from Lemma 3, and where $(*)$ followed from

$$(\gamma H)^{-1} = \big(I-(I-\gamma H)\big)^{-1} = \sum_{\tau=0}^{\infty}(I-\gamma H)^\tau,$$

and the series converges because $\|I-\gamma H\| < 1$. ∎

### 3.2 Stationary Distribution Analysis

Define two linear operators on symmetric matrices, $\mathcal{S}$ and $\mathcal{T}$ — both of which can be viewed as matrices acting on $d^2$ dimensions — as follows:

$$\mathcal{S}\circ M := \mathbb{E}\big[(x^\top Mx)\,xx^\top\big], \qquad \mathcal{T}\circ M := HM + MH.$$

With this, $C_\infty$ is the solution to:

$$\mathcal{T}\circ C_\infty = \gamma\,\mathcal{S}\circ C_\infty + \gamma\Sigma \qquad (5)$$

(due to Equation 3).
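Since $\mathcal{S}$ and $\mathcal{T}$ are linear, Equation (5) is a $d^2 \times d^2$ linear system in $\mathrm{vec}(C_\infty)$. The sketch below builds the operator matrices and solves it directly, again under a Gaussian design so that $\mathcal{S}$ has the closed form $2HMH + \mathrm{Tr}(MH)\,H$; the setup is our choice:

```python
import numpy as np

rng = np.random.default_rng(6)
d = 3
A = rng.normal(size=(d, d))
H = A @ A.T / d + 0.5 * np.eye(d)       # a positive definite second-moment matrix
Sigma = 0.25 * H                         # well-specified case: Sigma = sigma^2 H
R2 = 2 * np.linalg.eigvalsh(H).max() + np.trace(H)  # a valid R^2 for x ~ N(0, H)
gamma = 0.5 / R2

I = np.eye(d)
# Row-major vec identity: vec(A X B) = (A kron B^T) vec(X).
T_mat = np.kron(H, I) + np.kron(I, H)                       # M -> HM + MH
S_mat = 2 * np.kron(H, H) + np.outer(H.ravel(), H.ravel())  # M -> 2HMH + Tr(MH) H

# Equation (5): (T - gamma S) vec(C_inf) = gamma vec(Sigma).
C_inf = np.linalg.solve(T_mat - gamma * S_mat, gamma * Sigma.ravel()).reshape(d, d)

# Stationarity residual of Equation (5), evaluated directly on matrices:
S_C = 2 * H @ C_inf @ H + np.trace(C_inf @ H) * H
resid = H @ C_inf + C_inf @ H - gamma * S_C - gamma * Sigma
print(np.max(np.abs(resid)))
```

The crude bound of Lemma 5 below can then be verified directly on this solution.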

###### Lemma 5.

(Crude bound) $C_\infty$ is bounded as:

$$C_\infty \preceq \frac{\gamma\,\|\Sigma\|_H}{1-\gamma R^2}\,I.$$
###### Proof.

Define one more linear operator $\widetilde{\mathcal{T}}$ as follows:

$$\widetilde{\mathcal{T}}\circ M := \mathcal{T}\circ M - \gamma HMH = HM + MH - \gamma HMH.$$

The inverse of this operator can be written as:

$$\widetilde{\mathcal{T}}^{-1}\circ M = \gamma\sum_{t=0}^{\infty}\big(\mathcal{I}-\gamma\widetilde{\mathcal{T}}\big)^t\circ M = \gamma\sum_{t=0}^{\infty}(I-\gamma H)^t M (I-\gamma H)^t,$$

which exists since the sum converges, due to the fact that $\|I-\gamma H\| < 1$.

A few inequalities are helpful. If $0 \preceq M \preceq M'$, then

$$0 \preceq \widetilde{\mathcal{T}}^{-1}\circ M \preceq \widetilde{\mathcal{T}}^{-1}\circ M', \qquad (6)$$

since

$$\widetilde{\mathcal{T}}^{-1}\circ M = \gamma\sum_{t=0}^{\infty}(I-\gamma H)^t M (I-\gamma H)^t \preceq \gamma\sum_{t=0}^{\infty}(I-\gamma H)^t M' (I-\gamma H)^t = \widetilde{\mathcal{T}}^{-1}\circ M'$$

(which follows since $AMA^\top \preceq AM'A^\top$ for any matrix $A$ when $M \preceq M'$). Also, if $0 \preceq M \preceq M'$, then

$$0 \preceq \mathcal{S}\circ M \preceq \mathcal{S}\circ M', \qquad (7)$$

which implies:

$$0 \preceq \widetilde{\mathcal{T}}^{-1}\circ\mathcal{S}\circ M \preceq \widetilde{\mathcal{T}}^{-1}\circ\mathcal{S}\circ M'. \qquad (8)$$

The following inequality is also of use:

$$\Sigma \preceq \|H^{-1/2}\Sigma H^{-1/2}\|\,H = \|\Sigma\|_H\,H.$$

By definition of $\widetilde{\mathcal{T}}$,

$$\widetilde{\mathcal{T}}\circ C_\infty = \gamma\,\mathcal{S}\circ C_\infty + \gamma\Sigma - \gamma HC_\infty H.$$

Using this and Equation 6,

$$\begin{aligned}
C_\infty &= \gamma\,\widetilde{\mathcal{T}}^{-1}\circ\mathcal{S}\circ C_\infty + \gamma\,\widetilde{\mathcal{T}}^{-1}\circ\Sigma - \gamma\,\widetilde{\mathcal{T}}^{-1}\circ(HC_\infty H) \\
&\preceq \gamma\,\widetilde{\mathcal{T}}^{-1}\circ\mathcal{S}\circ C_\infty + \gamma\,\widetilde{\mathcal{T}}^{-1}\circ\Sigma \\
&\preceq \gamma\,\widetilde{\mathcal{T}}^{-1}\circ\mathcal{S}\circ C_\infty + \gamma\,\|\Sigma\|_H\,\widetilde{\mathcal{T}}^{-1}\circ H.
\end{aligned}$$

Proceeding recursively by using Equation 8,

$$C_\infty \preceq \big(\gamma\,\widetilde{\mathcal{T}}^{-1}\circ\mathcal{S}\big)^2\circ C_\infty + \gamma\|\Sigma\|_H\,\big(\gamma\,\widetilde{\mathcal{T}}^{-1}\circ\mathcal{S}\big)\circ\widetilde{\mathcal{T}}^{-1}\circ H + \gamma\|\Sigma\|_H\,\widetilde{\mathcal{T}}^{-1}\circ H \preceq \cdots$$

Using

$$\mathcal{S}\circ I = \mathbb{E}\big[\|x\|^2 xx^\top\big] \preceq R^2 H$$

and

$$\widetilde{\mathcal{T}}^{-1}\circ H = \gamma\sum_{t=0}^{\infty}(I-\gamma H)^{2t}H \preceq \gamma\sum_{t=0}^{\infty}(I-\gamma H)^tH = \gamma(\gamma H)^{-1}H = I$$

(where the middle inequality holds since $0 \preceq I-\gamma H \preceq I$), it follows that

$$C_\infty \preceq \gamma\,\|\Sigma\|_H\sum_{t=0}^{\infty}(\gamma R^2)^t\,I = \frac{\gamma\,\|\Sigma\|_H}{1-\gamma R^2}\,I,$$

which completes the proof. ∎

###### Lemma 6.

(Refined bound) $\mathrm{Tr}(C_\infty)$ is bounded as:

$$\mathrm{Tr}(C_\infty) \le \frac{\gamma}{2}\,\mathrm{Tr}(H^{-1}\Sigma) + \frac{1}{2}\,\frac{\gamma^2 R^2}{1-\gamma R^2}\,d\,\|\Sigma\|_H.$$
###### Proof.

From Lemma 5 and Equation 7,

$$\mathcal{S}\circ C_\infty \preceq \frac{\gamma\,\|\Sigma\|_H}{1-\gamma R^2}\,\mathcal{S}\circ I \preceq \frac{\gamma R^2\,\|\Sigma\|_H}{1-\gamma R^2}\,H.$$

Also, from Equation 3, $C_\infty$ satisfies:

$$HC_\infty + C_\infty H = \gamma\,\mathcal{S}\circ C_\infty + \gamma\Sigma.$$

Multiplying this by $\frac{1}{2}H^{-1}$ and taking the trace leads to:

$$\begin{aligned}
\mathrm{Tr}(C_\infty) &= \frac{\gamma}{2}\,\mathrm{Tr}\big(H^{-1}(\mathcal{S}\circ C_\infty)\big) + \frac{\gamma}{2}\,\mathrm{Tr}(H^{-1}\Sigma) \\
&\le \frac{1}{2}\,\frac{\gamma^2 R^2}{1-\gamma R^2}\,\|\Sigma\|_H\,\mathrm{Tr}(H^{-1}H) + \frac{\gamma}{2}\,\mathrm{Tr}(H^{-1}\Sigma) \\
&= \frac{1}{2}\,\frac{\gamma^2 R^2}{1-\gamma R^2}\,d\,\|\Sigma\|_H + \frac{\gamma}{2}\,\mathrm{Tr}(H^{-1}\Sigma),
\end{aligned}$$

which completes the proof. ∎

### 3.3 Completing the proof of Theorem 1

###### Proof.

The proof of the theorem is completed by applying the developed lemmas. For the bias term, using $\|v\|_H^2 \le \|H\|\,\|v\|^2 \le R^2\,\|v\|^2$, convexity of the squared norm, and Lemma 2 leads to:

$$\begin{aligned}
\frac{1}{2}\,\mathbb{E}\big[\|\bar{w}_{t:T}-w_*\|_H^2 \,\big|\, \xi_0=\cdots=\xi_T=0\big] &\le \frac{1}{2}\,R^2\,\mathbb{E}\big[\|\bar{w}_{t:T}-w_*\|^2 \,\big|\, \xi_0=\cdots=\xi_T=0\big] \\
&\le \frac{R^2}{2(T-t)}\sum_{t'=t}^{T-1}\mathbb{E}\big[\|w_{t'}-w_*\|^2 \,\big|\, \xi_0=\cdots=\xi_T=0\big] \\
&\le \frac{1}{2}\,\exp(-\gamma\mu t)\,R^2\,\|w_0-w_*\|^2.
\end{aligned}$$

For the variance term, observe that, by Lemma 4 (applied to the average over the window $[t, T)$) and Lemma 6,

$$\frac{1}{2}\,\mathbb{E}\big[\|\bar{w}_{t:T}-w_*\|_H^2 \,\big|\, w_0=w_*\big] \le \frac{\mathrm{Tr}(C_\infty)}{\gamma(T-t)} \le \frac{1}{T-t}\left(\frac{1}{2}\,\mathrm{Tr}(H^{-1}\Sigma) + \frac{1}{2}\,\frac{\gamma R^2}{1-\gamma R^2}\,d\,\|\Sigma\|_H\right),$$

which equals $\big(1 + \frac{\gamma R^2}{1-\gamma R^2}\,\rho_{\mathrm{misspec}}\big)\frac{\sigma^2_{\mathrm{MLE}}}{T-t}$ by the definitions of $\sigma^2_{\mathrm{MLE}}$ and $\rho_{\mathrm{misspec}}$. Combining the two bounds via Lemma 1 completes the proof. ∎

#### Acknowledgements.

Sham Kakade acknowledges funding from the Washington Research Foundation Fund for Innovation in Data-Intensive Discovery. The authors also thank Zaid Harchaoui for helpful discussions.

## References

• [1] Francis R. Bach. Adaptivity of averaged stochastic gradient descent to local strong convexity for logistic regression. Journal of Machine Learning Research (JMLR), volume 15, 2014.
• [2] Alexandre Défossez and Francis R. Bach. Averaged least-mean-squares: Bias-variance trade-offs and optimal sampling distributions. In AISTATS, volume 38, 2015.
• [3] Aymeric Dieuleveut and Francis R. Bach. Non-parametric stochastic approximation with large step sizes. The Annals of Statistics, 2015.
• [4] Roy Frostig, Rong Ge, Sham M. Kakade, and Aaron Sidford. Competing with the empirical risk minimizer in a single pass. In COLT, 2015.
• [5] Prateek Jain, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli, and Aaron Sidford. Parallelizing stochastic approximation through mini-batching and tail-averaging. CoRR, abs/1610.03774, 2016.
• [6] Harold J. Kushner and Dean S. Clark. Stochastic Approximation Methods for Constrained and Unconstrained Systems. Springer-Verlag, 1978.
• [7] Erich L. Lehmann and George Casella. Theory of Point Estimation. Springer Texts in Statistics. Springer, 1998.
• [8] Boris T. Polyak and Anatoli B. Juditsky. Acceleration of stochastic approximation by averaging. SIAM Journal on Control and Optimization, volume 30, 1992.
• [9] David Ruppert. Efficient estimations from a slowly convergent Robbins-Monro process. Tech. Report, ORIE, Cornell University, 1988.
• [10] Aad W. van der Vaart. Asymptotic Statistics. Cambridge University Press, 2000.