Constant Step Size Stochastic Gradient Descent for Probabilistic Modeling

04/16/2018
by Dmitry Babichev, et al.

Stochastic gradient methods enable learning probabilistic models from large amounts of data. While large step-sizes (learning rates), combined with parameter averaging, have been shown to be best for least-squares problems (e.g., Gaussian noise), they do not lead to convergent algorithms in general. In this paper, we consider generalized linear models, that is, conditional models based on exponential families. We propose averaging moment parameters instead of natural parameters for constant-step-size stochastic gradient descent. For finite-dimensional models, we show that this can sometimes (and surprisingly) lead to better predictions than the best linear model. For infinite-dimensional models, we show that it always converges to optimal predictions, while averaging natural parameters never does. We illustrate our findings with simulations on synthetic data and with classical benchmarks with many observations.
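
As a minimal illustration of the two averaging schemes, the sketch below runs constant-step-size SGD on a synthetic logistic-regression problem (a conditional exponential family) and compares predicting with the averaged natural parameters, a'(⟨θ̄, x⟩), against averaging the moment parameters, i.e., the running average of the predictions a'(⟨θ_t, x⟩). The data setup, names, and hyperparameters are illustrative assumptions, not the paper's reference code.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_train, n_test, step = 5, 20000, 1000, 0.5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic data from a well-specified logistic model (an assumption for
# illustration; the paper also covers misspecified and kernel settings).
theta_star = rng.normal(size=d)
X = rng.normal(size=(n_train, d))
y = (rng.random(n_train) < sigmoid(X @ theta_star)).astype(float)
X_test = rng.normal(size=(n_test, d))
p_test = sigmoid(X_test @ theta_star)   # true conditional moments E[y|x]

theta = np.zeros(d)
theta_avg = np.zeros(d)        # running average of natural parameters
pred_avg = np.zeros(n_test)    # running average of moment parameters

for t in range(n_train):
    x_t, y_t = X[t], y[t]
    # One SGD step on the logistic log-loss with a *constant* step size.
    theta -= step * (sigmoid(x_t @ theta) - y_t) * x_t
    # Scheme 1: average the iterates (natural parameters) themselves.
    theta_avg += (theta - theta_avg) / (t + 1)
    # Scheme 2: average the predicted moments a'(<theta_t, x>) = sigmoid(...).
    pred_avg += (sigmoid(X_test @ theta) - pred_avg) / (t + 1)

mse_natural = np.mean((sigmoid(X_test @ theta_avg) - p_test) ** 2)
mse_moment = np.mean((pred_avg - p_test) ** 2)
print(f"MSE to true moments, averaged natural parameters: {mse_natural:.5f}")
print(f"MSE to true moments, averaged moment parameters:  {mse_moment:.5f}")
```

On a well-specified finite-dimensional problem like this one the two schemes behave similarly; the abstract's claim concerns harder settings, where averaged moment parameters converge to the optimal predictions while averaged natural parameters retain a bias induced by the constant step size.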
