On the Convergence of Stochastic Gradient Descent with Adaptive Stepsizes

05/21/2018
by Xiaoyu Li, et al.

Stochastic gradient descent is the method of choice for large-scale optimization of machine learning objective functions. Yet, its performance is highly variable and depends heavily on the choice of the stepsizes. This has motivated a large body of research on adaptive stepsizes. However, there is currently a gap in our theoretical understanding of these methods, especially in the non-convex setting. In this paper, we start closing this gap: we theoretically analyze the use of adaptive stepsizes, like the ones in AdaGrad, in the non-convex setting. We show sufficient conditions for almost sure convergence to a stationary point when these adaptive stepsizes are used, proving the first guarantee for AdaGrad in the non-convex setting. Moreover, we show explicit rates of convergence that automatically interpolate between O(1/T) and O(1/√T) depending on the noise of the stochastic gradients, in both the convex and non-convex settings.
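For intuition, the following is a minimal sketch of SGD with an AdaGrad-style adaptive stepsize: each coordinate's stepsize shrinks with the accumulated squared stochastic gradients, so no hand-tuned stepsize schedule is needed. The names and hyperparameters (grad_oracle, alpha, eps, num_steps) are illustrative assumptions, not the paper's exact algorithm or notation.

import numpy as np

def adagrad_sgd(grad_oracle, x0, alpha=1.0, eps=1e-8, num_steps=1000):
    """SGD with an AdaGrad-style per-coordinate adaptive stepsize (sketch).

    grad_oracle(x) is assumed to return a stochastic gradient at x.
    """
    x = np.asarray(x0, dtype=float).copy()
    g_sq_sum = np.zeros_like(x)                     # running sum of squared gradients
    for _ in range(num_steps):
        g = grad_oracle(x)                          # stochastic gradient estimate
        g_sq_sum += g ** 2
        x -= alpha * g / (np.sqrt(g_sq_sum) + eps)  # adaptive step, larger early, smaller later
    return x

# Example use: noisy gradients of f(x) = 0.5 * ||x||^2, whose minimizer is the origin.
rng = np.random.default_rng(0)
oracle = lambda x: x + 0.1 * rng.standard_normal(x.shape)
x_out = adagrad_sgd(oracle, x0=np.ones(10))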


Related research

04/02/2019 · Convergence rates for the stochastic gradient descent method for non-convex objective functions
We prove the local convergence to minima and estimates on the rate of co...

07/31/2022 · Formal guarantees for heuristic optimization algorithms used in machine learning
Recently, Stochastic Gradient Descent (SGD) and its variants have become...

04/21/2020 · AdaX: Adaptive Gradient Descent with Exponential Long Term Memory
Although adaptive optimization algorithms such as Adam show fast converg...

11/06/2018 · Double Adaptive Stochastic Gradient Optimization
Adaptive moment methods have been remarkably successful in deep learning...

01/25/2019 · Surrogate Losses for Online Learning of Stepsizes in Stochastic Non-Convex Optimization
Stochastic Gradient Descent (SGD) has played a central role in machine l...

09/23/2019 · Necessary and Sufficient Conditions for Adaptive, Mirror, and Standard Gradient Methods
We study the impact of the constraint set and gradient geometry on the c...

06/17/2023 · Adaptive Strategies in Non-convex Optimization
An algorithm is said to be adaptive to a certain parameter (of the probl...
