AEGD: Adaptive Gradient Descent with Energy

by Hailiang Liu, et al.

In this paper, we propose AEGD, a new algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive updates of a quadratic energy. AEGD can be applied to any objective function that is bounded from below, and it is shown to be unconditionally energy stable, irrespective of the step size. In addition, AEGD enjoys tight convergence rates while allowing a large step size. The method is straightforward to implement and requires little tuning of hyper-parameters. Experimental results demonstrate that AEGD works well for various optimization problems: it is robust with respect to initial data, makes rapid initial progress, and shows comparable, and often better, generalization performance than SGD with momentum on deep neural networks. The implementation of the algorithm can be found at
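The abstract's description of an adaptive update built on a square-root energy variable can be sketched as follows. This is a minimal illustration, not the authors' reference implementation: the specific recursion, the constant `c` keeping `f + c` positive, and all function names here are assumptions made for the example, and details may differ from the paper.

```python
import numpy as np

def aegd(grad_f, f, x0, eta=0.5, c=1.0, steps=100):
    """Sketch of an energy-adaptive gradient step in the spirit of AEGD.

    Assumptions (not stated in the abstract): f + c > 0 so the energy
    sqrt(f + c) is well defined, the per-coordinate energy r starts at
    sqrt(f(x0) + c), and r only ever shrinks -- this monotone decay is
    what makes the scheme stable regardless of the step size eta.
    """
    x = np.asarray(x0, dtype=float)
    r = np.full_like(x, np.sqrt(f(x) + c))          # initial energy
    for _ in range(steps):
        g = grad_f(x) / (2.0 * np.sqrt(f(x) + c))  # gradient of sqrt(f + c)
        r = r / (1.0 + 2.0 * eta * g * g)          # elementwise energy decay
        x = x - 2.0 * eta * r * g                  # step scaled by energy
    return x, r

# Toy example: minimize the quadratic f(x) = ||x||^2 from x0 = (3, -2).
x_min, r_final = aegd(lambda x: 2.0 * x,
                      lambda x: float(x @ x),
                      x0=[3.0, -2.0])
```

Note that even with the fairly large step size `eta=0.5`, the energy variable `r` stays positive and decreasing, so the iterates cannot blow up; this is a toy version of the "unconditional energy stability" claim.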







Convergence Analysis of a Momentum Algorithm with Adaptive Step Size for Non Convex Optimization

Although ADAM is a very popular algorithm for optimizing the weights of ...

When Does Stochastic Gradient Algorithm Work Well?

In this paper, we consider a general stochastic optimization problem whi...

Relative gradient optimization of the Jacobian term in unsupervised deep learning

Learning expressive probabilistic models correctly describing the data i...

diffGrad: An Optimization Method for Convolutional Neural Networks

Stochastic Gradient Descent (SGD) is one of the core techniques behind th...

Adaptive Gradient Methods Converge Faster with Over-Parameterization (and you can do a line-search)

As adaptive gradient methods are typically used for training over-parame...

Trust the Critics: Generatorless and Multipurpose WGANs with Initial Convergence Guarantees

Inspired by ideas from optimal transport theory we present Trust the Cri...

Improving SGD convergence by tracing multiple promising directions and estimating distance to minimum

Deep neural networks are usually trained with stochastic gradient descen...