Convergence of ADAM with Constant Step Size in Non-Convex Settings: A Simple Proof

09/15/2023
by Alokendu Mazumder, et al.

In neural network training, RMSProp and ADAM remain widely favoured optimization algorithms. A key to their performance lies in selecting the correct step size, and their effectiveness can vary considerably depending on the step sizes chosen. Questions about their theoretical convergence properties also remain a subject of interest. In this paper, we theoretically analyze a constant step size version of ADAM in the non-convex setting. We give sufficient conditions on the step size for achieving almost sure asymptotic convergence of the gradients to zero, under minimal assumptions. We also provide runtime bounds for deterministic ADAM to reach approximate criticality when working with smooth, non-convex functions.
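For reference, below is a minimal sketch of the ADAM update with a constant step size, run on a deterministic gradient as in the setting the abstract describes. Hyperparameter names and defaults (alpha, beta1, beta2, eps) are the standard ADAM ones, not constants taken from the paper, and bias correction is omitted; the exact variant analyzed in the paper may differ.

```python
import numpy as np

def adam_constant_step(grad, x0, alpha=1e-3, beta1=0.9, beta2=0.999,
                       eps=1e-8, n_iters=10_000):
    """ADAM with a fixed step size alpha on a deterministic gradient.

    grad: callable returning the full gradient at x (deterministic setting).
    Defaults are the usual ADAM hyperparameters, not values from the paper.
    """
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)   # first-moment (mean of gradients) estimate
    v = np.zeros_like(x)   # second-moment (uncentred variance) estimate
    for _ in range(n_iters):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        # Constant step size: alpha does not decay with the iteration count.
        x = x - alpha * m / (np.sqrt(v) + eps)
    return x

# Example: a smooth non-convex objective f(x) = sin(x) + 0.1 x^2,
# whose gradient is cos(x) + 0.2 x.
f_grad = lambda x: np.cos(x) + 0.2 * x
x_star = adam_constant_step(f_grad, np.array([3.0]))
print(np.linalg.norm(f_grad(x_star)))  # small norm => approximate criticality
```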


Related research

02/18/2021: On the Convergence of Step Decay Step-Size for Stochastic Optimization
The convergence of stochastic gradient descent is highly dependent on th...

07/31/2022: Formal guarantees for heuristic optimization algorithms used in machine learning
Recently, Stochastic Gradient Descent (SGD) and its variants have become...

06/05/2021: Bandwidth-based Step-Sizes for Non-Convex Stochastic Optimization
Many popular learning-rate schedules for deep neural networks combine a ...

08/20/2018: Universal Stagewise Learning for Non-Convex Problems with Convergence on Averaged Solutions
Although stochastic gradient descent (SGD) method and its variants (e.g., s...

05/18/2020: Convergence of constant step stochastic gradient descent for non-smooth non-convex functions
This paper studies the asymptotic behavior of the constant step Stochast...

02/18/2018: Convergence of Online Mirror Descent Algorithms
In this paper we consider online mirror descent (OMD) algorithms, a clas...

10/15/2019: Adaptive Step Sizes in Variance Reduction via Regularization
The main goal of this work is equipping convex and nonconvex problems wi...
