Understanding and Detecting Convergence for Stochastic Gradient Descent with Momentum

08/27/2020
by   Jerry Chee, et al.
38

Convergence detection of iterative stochastic optimization methods is of great practical interest. This paper considers stochastic gradient descent (SGD) with a constant learning rate and momentum. We show that there exists a transient phase in which iterates move towards a region of interest, and a stationary phase in which iterates remain bounded in that region around a minimum point. We construct a statistical diagnostic test for convergence to the stationary phase using the inner product between successive gradients and demonstrate that the proposed diagnostic works well. We theoretically and empirically characterize how momentum can affect the test statistic of the diagnostic, and how the test statistic captures a relatively sparse signal within the gradients in convergence. Finally, we demonstrate an application to automatically tune the learning rate by reducing it each time stationarity is detected, and show the procedure is robust to mis-specified initial rates.

READ FULL TEXT
research
10/17/2017

Convergence diagnostics for stochastic gradient descent with constant step size

Iterative procedures in stochastic optimization are typically comprised ...
research
10/18/2019

Robust Learning Rate Selection for Stochastic Optimization via Splitting Diagnostic

This paper proposes SplitSGD, a new stochastic optimization algorithm wi...
research
07/01/2020

On Convergence-Diagnostic based Step Sizes for Stochastic Gradient Descent

Constant step-size Stochastic Gradient Descent exhibits two phases: a tr...
research
09/21/2019

Using Statistics to Automate Stochastic Optimization

Despite the development of numerous adaptive optimizers, tuning the lear...
research
01/16/2013

Adaptive learning rates and parallelization for stochastic, sparse, non-smooth gradients

Recent work has established an empirically successful framework for adap...
research
09/22/2021

On the equivalence of different adaptive batch size selection strategies for stochastic gradient descent methods

In this study, we demonstrate that the norm test and inner product/ortho...
research
02/22/2022

Convergence of online k-means

We prove asymptotic convergence for a general class of k-means algorithm...

Please sign up or login with your details

Forgot password? Click here to reset