SUPER-ADAM: Faster and Universal Framework of Adaptive Gradients

06/15/2021
by Feihu Huang, et al.

Adaptive gradient methods have shown excellent performance in solving many machine learning problems. Although multiple adaptive methods have been studied recently, they mainly focus on either the empirical or the theoretical aspect and work only for specific problems by using specific adaptive learning rates. It is therefore desirable to design a universal framework of practical adaptive gradient algorithms with theoretical guarantees for general problems. To fill this gap, we propose a faster and universal framework of adaptive gradients (i.e., SUPER-ADAM) by introducing a universal adaptive matrix that includes most existing adaptive gradient forms. Moreover, our framework flexibly integrates momentum and variance-reduction techniques. In particular, our framework provides convergence analysis for adaptive gradient methods in the nonconvex setting. We prove that our new algorithm achieves the best known complexity of Õ(ϵ^-3) for finding an ϵ-stationary point of a nonconvex problem, which matches the lower bound for stochastic smooth nonconvex optimization. In numerical experiments, we use various deep learning tasks to validate that our algorithm consistently outperforms existing adaptive algorithms.
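The abstract describes a generic update built from an adaptive matrix combined with momentum-based variance reduction. The sketch below is a minimal illustration of that idea under stated assumptions, not the paper's exact algorithm: it assumes a diagonal, Adam-style adaptive matrix as one admissible choice, a STORM-style momentum-based variance-reduced gradient estimator, a toy quadratic objective with a made-up noisy gradient oracle (grad_sample), and illustrative hyperparameter values.

```python
import numpy as np

def grad_sample(x, noise):
    # Assumed noisy gradient oracle for the toy objective f(x) = 0.5 * ||x||^2;
    # reusing the same noise vector mimics evaluating the same stochastic sample.
    return x + noise

def run_super_adam_sketch(x0, steps=200, gamma=0.05, lam=0.9,
                          beta=0.99, eps=1e-3, seed=0):
    # Illustrative loop only: a diagonal Adam-style adaptive matrix plus a
    # STORM-style variance-reduced gradient estimator, as one instance of the
    # general "adaptive matrix + momentum / variance reduction" recipe.
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    noise = 0.1 * rng.standard_normal(x.shape)
    v = grad_sample(x, noise)            # initial estimator: plain stochastic gradient
    second_moment = np.zeros_like(x)
    for _ in range(steps):
        # Diagonal adaptive matrix H_t built from a second-moment average.
        second_moment = beta * second_moment + (1.0 - beta) * v ** 2
        h_diag = np.sqrt(second_moment) + eps
        # Adaptive step: x_{t+1} = x_t - gamma * H_t^{-1} v_t.
        x_next = x - gamma * v / h_diag
        # Variance-reduced estimator: one fresh sample evaluated at both iterates.
        noise = 0.1 * rng.standard_normal(x.shape)
        g_new = grad_sample(x_next, noise)
        g_old = grad_sample(x, noise)
        v = g_new + (1.0 - lam) * (v - g_old)
        x = x_next
    return x

# Example usage on the toy quadratic: the final iterate should be near the origin.
if __name__ == "__main__":
    print(np.linalg.norm(run_super_adam_sketch(5.0 * np.ones(10))))
```

Other adaptive forms (e.g., a global AdaGrad-Norm-style scaling) could be dropped into the same loop by changing only how h_diag is built, which is the sense in which the abstract's adaptive matrix is "universal"; the choice here is just one example.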


Related research

On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization (08/16/2018)
Adaptive gradient methods are workhorses in deep learning. However, the ...

Enhanced Adaptive Gradient Algorithms for Nonconvex-PL Minimax Optimization (03/07/2023)
In the paper, we study a class of nonconvex nonconcave minimax optimizat...

Expectigrad: Fast Stochastic Optimization with Robust Convergence Properties (10/03/2020)
Many popular adaptive gradient methods such as Adam and RMSProp rely on ...

Faster Adaptive Momentum-Based Federated Methods for Distributed Composition Optimization (11/03/2022)
Composition optimization recently appears in many machine learning appli...

AdaGDA: Faster Adaptive Gradient Descent Ascent Methods for Minimax Optimization (06/30/2021)
In the paper, we propose a class of faster adaptive gradient descent asc...

BiAdam: Fast Adaptive Bilevel Optimization Methods (06/21/2021)
Bilevel optimization recently has attracted increased interest in machin...

A Control Theoretic Framework for Adaptive Gradient Optimizers in Machine Learning (06/04/2022)
Adaptive gradient methods have become popular in optimizing deep neural ...
