Convergence of Batch Stochastic Gradient Descent Methods with Approximate Gradients and/or Noisy Measurements: Theory and Computational Results

09/12/2022
by   Rajeeva L. Karandikar, et al.

In this paper, we study convex optimization using a very general formulation called BSGD (Batch Stochastic Gradient Descent). At each iteration, some, but not necessarily all, components of the argument are updated. The direction of the update can be one of two possibilities: (i) a noise-corrupted measurement of the true gradient, or (ii) an approximate gradient computed using first-order approximations, based on function values that might themselves be corrupted by noise. This formulation embraces most of the currently used stochastic gradient methods. We establish conditions for BSGD to converge to the global minimum, based on stochastic approximation theory, and then verify the predicted convergence through numerical experiments. Our results show that when approximate gradients are used, BSGD converges while momentum-based methods can diverge. However, with noisy gradient measurements, not only BSGD but also standard (full-update) gradient descent and various momentum-based methods all converge.
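To make the formulation concrete, here is a minimal sketch of case (i): at each iteration only a randomly chosen block of coordinates is updated, using a noise-corrupted measurement of the true gradient. The quadratic objective, the step-size schedule, and all parameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def bsgd(grad, x0, steps=5000, block_size=2, noise_std=0.1, seed=0):
    """Sketch of batch/block SGD: per iteration, update only a random
    subset ("block") of coordinates, using a noisy gradient measurement
    (case (i) of the abstract). Not the paper's exact algorithm."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    n = x.size
    for t in range(1, steps + 1):
        alpha = 1.0 / t  # step sizes satisfying the usual Robbins-Monro
                         # conditions: sum(alpha) = inf, sum(alpha^2) < inf
        idx = rng.choice(n, size=block_size, replace=False)
        g = grad(x) + noise_std * rng.standard_normal(n)  # noisy measurement
        x[idx] -= alpha * g[idx]  # update only the chosen block
    return x

# Toy example: f(x) = ||x - c||^2 / 2, whose gradient is x - c and whose
# global minimizer is c. BSGD should approach c despite the noise.
c = np.array([1.0, -2.0, 3.0])
x_star = bsgd(lambda x: x - c, x0=np.zeros(3), block_size=2)
```

Case (ii) of the abstract would replace the `grad` call with a finite-difference estimate built from (possibly noisy) function values; the block-selection and step-size machinery stays the same.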


