AEGD: Adaptive Gradient Descent with Energy

10/10/2020
by Hailiang Liu, et al.

In this paper, we propose AEGD, a new algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive updates of a quadratic energy. AEGD can be applied to any objective function that is bounded from below, and it is shown to be unconditionally energy stable, irrespective of the step size. In addition, AEGD enjoys tight convergence rates while allowing a large step size. The method is straightforward to implement and requires little tuning of hyper-parameters. Experimental results demonstrate that AEGD works well for a variety of optimization problems: it is robust with respect to initial data, capable of making rapid initial progress, and shows generalization performance comparable to, and often better than, SGD with momentum for deep neural networks. The implementation of the algorithm can be found at https://github.com/txping/AEGD.
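To make the abstract's idea of "adaptive updates of a quadratic energy" concrete, the sketch below shows what an energy-adaptive gradient step of this kind can look like. It is only an illustration, not the authors' reference implementation (which lives in the linked repository): the transformed objective F(theta) = sqrt(f(theta) + c), the per-coordinate energy variable r, the shift c, and the function names are assumptions made for this example.

```python
import numpy as np

def aegd_step(theta, f_val, grad, r, eta=0.1, c=1.0):
    """One energy-adaptive gradient step (illustrative sketch, not the official AEGD code).

    theta : current parameters
    f_val : objective value f(theta); f is assumed bounded below so that f + c > 0
    grad  : gradient of f at theta
    r     : per-coordinate energy variable, initialized as sqrt(f(theta_0) + c)
    eta   : step size
    c     : shift making the objective strictly positive
    """
    # Gradient of the transformed objective F(theta) = sqrt(f(theta) + c)
    v = grad / (2.0 * np.sqrt(f_val + c))
    # Energy update: r can only decrease, whatever eta is (energy stability)
    r = r / (1.0 + 2.0 * eta * v ** 2)
    # Parameter update scaled by the updated energy
    theta = theta - 2.0 * eta * r * v
    return theta, r

# Toy usage: minimize f(x) = ||x||^2 starting from x = (3, -2)
theta = np.array([3.0, -2.0])
c = 1.0
r = np.full_like(theta, np.sqrt(np.sum(theta ** 2) + c))  # initial energy
for _ in range(300):
    f_val = float(np.sum(theta ** 2))
    grad = 2.0 * theta
    theta, r = aegd_step(theta, f_val, grad, r, eta=0.1, c=c)
print(theta)  # close to the minimizer at the origin
```

Because r is non-increasing by construction, the effective step shrinks automatically where gradients are large, which is consistent with the unconditional energy stability and the tolerance of large step sizes described in the abstract.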



11/18/2019

Convergence Analysis of a Momentum Algorithm with Adaptive Step Size for Non Convex Optimization

Although ADAM is a very popular algorithm for optimizing the weights of ...
01/18/2018

When Does Stochastic Gradient Algorithm Work Well?

In this paper, we consider a general stochastic optimization problem whi...
06/26/2020

Relative gradient optimization of the Jacobian term in unsupervised deep learning

Learning expressive probabilistic models correctly describing the data i...
09/12/2019

diffGrad: An Optimization Method for Convolutional Neural Networks

Stochastic Gradient Descent (SGD) is one of the core techniques behind th...
06/11/2020

Adaptive Gradient Methods Converge Faster with Over-Parameterization (and you can do a line-search)

As adaptive gradient methods are typically used for training over-parame...
11/30/2021

Trust the Critics: Generatorless and Multipurpose WGANs with Initial Convergence Guarantees

Inspired by ideas from optimal transport theory we present Trust the Cri...
01/31/2019

Improving SGD convergence by tracing multiple promising directions and estimating distance to minimum

Deep neural networks are usually trained with stochastic gradient descen...