On the Convergence of A Class of Adam-Type Algorithms for Non-Convex Optimization

08/08/2018
by Xiangyi Chen, et al.

This paper studies a class of adaptive gradient-based momentum algorithms that update the search directions and learning rates simultaneously using past gradients. This class, which we refer to as the "Adam-type", includes popular algorithms such as Adam, AMSGrad, and AdaGrad. Despite their popularity in training deep neural networks, the convergence of these algorithms for solving nonconvex problems remains an open question. This paper provides a set of mild sufficient conditions that guarantee convergence for the Adam-type methods. We prove that under our derived conditions, these methods can achieve a convergence rate of order O(log(T)/√(T)) for nonconvex stochastic optimization. We show that the conditions are essential, in the sense that violating them may make an algorithm diverge. Moreover, we propose and analyze a class of (deterministic) incremental adaptive gradient algorithms, which have the same O(log(T)/√(T)) convergence rate. Our study could also be extended to a broader class of adaptive gradient methods in machine learning and optimization.
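
To make the shared template concrete, the sketch below runs the generic Adam-type iteration x_{t+1} = x_t - α_t · m_t/√(v_t), where the momentum m_t and the scaling v_t are both built from past gradients and the choice of v_t distinguishes the variants. This is a minimal illustrative NumPy sketch of that template, not the paper's code; the function name, hyperparameter defaults, and toy test problem are assumptions. Note that classic AdaGrad has no momentum, so the "adagrad" branch reduces to it exactly when beta1 = 0.

```python
import numpy as np

def adam_type(grad_fn, x0, T=1000, alpha0=1e-3,
              beta1=0.9, beta2=0.999, eps=1e-8, variant="adam"):
    """Generic Adam-type iteration: x_{t+1} = x_t - alpha_t * m_t / sqrt(v_t)."""
    x = np.array(x0, dtype=float)
    m = np.zeros_like(x)       # first moment (momentum), built from past gradients
    v = np.zeros_like(x)       # second-moment estimate, also built from past gradients
    v_max = np.zeros_like(x)   # running max of v, used only by AMSGrad
    for t in range(1, T + 1):
        g = grad_fn(x)                      # (stochastic) gradient at x_t
        m = beta1 * m + (1 - beta1) * g     # momentum update shared by all variants
        if variant == "adam":
            v = beta2 * v + (1 - beta2) * g**2   # exponential moving average
            v_eff = v
        elif variant == "amsgrad":
            v = beta2 * v + (1 - beta2) * g**2
            v_max = np.maximum(v_max, v)         # enforce a non-decreasing v_t
            v_eff = v_max
        elif variant == "adagrad":
            v = ((t - 1) * v + g**2) / t         # uniform average of past squared grads;
            v_eff = v                            # with alpha_t = alpha0/sqrt(t) this gives
        else:                                    # the classic alpha0/sqrt(sum g^2) scaling
            raise ValueError(f"unknown variant: {variant}")
        alpha_t = alpha0 / np.sqrt(t)            # diminishing step size
        x = x - alpha_t * m / (np.sqrt(v_eff) + eps)
    return x

# Toy usage (illustrative): a smooth nonconvex objective f(x) = sum(x^2 / (1 + x^2)),
# whose gradient is 2x / (1 + x^2)^2.
grad = lambda x: 2 * x / (1 + x**2) ** 2
print(adam_type(grad, x0=[3.0, -2.0], T=5000, variant="amsgrad"))
```

The three branches differ only in how v_t aggregates past squared gradients, which is exactly the degree of freedom the paper's sufficient conditions constrain; AMSGrad's running max, for instance, keeps the effective step sizes non-increasing.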

Related research

10/15/2019
ZO-AdaMM: Zeroth-Order Adaptive Momentum Method for Black-Box Optimization
The adaptive momentum method (AdaMM), which uses past gradients to updat...

05/19/2020
Adaptive First- and Zeroth-order Methods for Weakly Convex Stochastic Optimization Problems
In this paper, we design and analyze a new family of adaptive subgradien...

04/16/2019
Global Error Bounds and Linear Convergence for Gradient-Based Algorithms for Trend Filtering and ℓ_1-Convex Clustering
We propose a class of first-order gradient-type optimization algorithms ...

08/16/2018
On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization
Adaptive gradient methods are workhorses in deep learning. However, the ...

12/10/2020
Asymptotic study of stochastic adaptive algorithm in non-convex landscape
This paper studies some asymptotic properties of adaptive algorithms wid...

03/17/2017
Algorithm/Architecture Co-design of Proportionate-type LMS Adaptive Filters for Sparse System Identification
This paper investigates the problem of implementing proportionate-type L...

03/29/2022
Convergence and Complexity of Stochastic Subgradient Methods with Dependent Data for Nonconvex Optimization
We show that under a general dependent data sampling scheme, the classic...
