Convergence Analysis of a Momentum Algorithm with Adaptive Step Size for Non Convex Optimization

11/18/2019
by Anas Barakat, et al.

Although ADAM is a very popular algorithm for optimizing the weights of neural networks, it has recently been shown that it can diverge even on simple convex optimization examples. Several variants of ADAM have been proposed to circumvent this convergence issue. In this work, we study the ADAM algorithm for smooth nonconvex optimization under a boundedness assumption on the adaptive learning rate. The bound on the adaptive step size depends on the Lipschitz constant of the gradient of the objective function and yields theoretically safe adaptive step sizes. Under this boundedness assumption, we prove a novel first-order convergence rate result in both the deterministic and the stochastic settings. Furthermore, we establish convergence rates for the sequence of function values using the Kurdyka-Łojasiewicz property.
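As a rough illustration of the kind of safeguard described above, the sketch below runs an ADAM-style iteration in which each coordinate's adaptive step size lr / (sqrt(v_i) + eps) is clipped to a ceiling `max_step`. The function name `clipped_adam`, the parameter `max_step`, the omission of bias correction, and the choice max_step = 1/L in the usage note are all assumptions made for illustration. The abstract does not state the paper's exact bound or update rule, so this is not the authors' algorithm.

```python
import math
from typing import Callable, List


def clipped_adam(
    grad: Callable[[List[float]], List[float]],
    x0: List[float],
    lr: float = 0.1,
    beta1: float = 0.9,
    beta2: float = 0.999,
    eps: float = 1e-8,
    max_step: float = 1.0,  # hypothetical upper bound on the adaptive step size
    iters: int = 1000,
) -> List[float]:
    """ADAM-style iteration with each coordinate's adaptive step capped.

    Sketch only: bias correction is omitted, and max_step is a plain
    ceiling chosen by the caller, not the paper's exact condition.
    """
    x = list(x0)
    m = [0.0] * len(x)  # exponential moving average of gradients (momentum)
    v = [0.0] * len(x)  # exponential moving average of squared gradients
    for _ in range(iters):
        g = grad(x)
        for i in range(len(x)):
            m[i] = beta1 * m[i] + (1.0 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1.0 - beta2) * g[i] ** 2
            # Adaptive step size, bounded above by max_step.
            step = min(lr / (math.sqrt(v[i]) + eps), max_step)
            x[i] -= step * m[i]
    return x


if __name__ == "__main__":
    # f(x) = (x_1^2 + 10 * x_2^2) / 2 has a 10-Lipschitz gradient (L = 10).
    quad_grad = lambda x: [x[0], 10.0 * x[1]]
    print(clipped_adam(quad_grad, [1.0, 1.0], max_step=1.0 / 10.0, iters=2000))
```

For example, on the quadratic above (gradient Lipschitz constant L = 10), capping the step at the hypothetical value 1/L = 0.1 keeps the momentum iteration stable, and the iterates approach the minimizer at the origin. Setting `max_step` to 0 freezes the iterate entirely, which makes the role of the cap easy to see.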


