
Decaying momentum helps neural network training

10/11/2019
by John Chen, et al.
Rice University

Momentum is a simple and popular technique in deep learning for gradient-based optimizers. We propose a decaying momentum (Demon) rule, motivated by decaying the total contribution of a gradient to all future updates. Applying Demon to Adam leads to significantly improved training, notably competitive with momentum SGD with learning rate decay, even in settings in which adaptive methods are typically non-competitive. Similarly, applying Demon to momentum SGD rivals momentum SGD with learning rate decay, and in many cases leads to improved performance. Demon is trivial to implement and incurs limited extra computational overhead compared to its vanilla counterparts.
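
The abstract states the motivation (decay the total contribution of a gradient to all future updates) but not the exact schedule. For a momentum parameter beta, that total future contribution is beta / (1 - beta), so one natural realization, sketched below in Python/NumPy, decays beta / (1 - beta) linearly to zero over training and applies the resulting beta to plain momentum SGD. The names demon_beta, sgd_momentum_demon, and the toy quadratic objective are illustrative choices, not the authors' code.

```python
import numpy as np

def demon_beta(step, total_steps, beta_init=0.9):
    """Decaying-momentum (Demon) style schedule, a sketch.

    Decays the total future contribution of a gradient,
    beta / (1 - beta), linearly from its initial value to zero.
    """
    frac = 1.0 - step / total_steps  # remaining fraction of training
    return beta_init * frac / ((1.0 - beta_init) + beta_init * frac)

def sgd_momentum_demon(grad_fn, w, lr=0.1, beta_init=0.9, total_steps=100):
    """Momentum SGD with the decaying beta schedule above."""
    v = np.zeros_like(w)
    for t in range(total_steps):
        beta_t = demon_beta(t, total_steps, beta_init)
        v = beta_t * v + grad_fn(w)  # momentum buffer with decayed beta
        w = w - lr * v
    return w

# Toy usage: minimize f(w) = ||w||^2 / 2, whose gradient is w.
w_final = sgd_momentum_demon(lambda w: w, np.array([3.0, -2.0]), total_steps=200)
print(w_final)  # should be close to [0, 0]
```

The same schedule can be dropped into an Adam-style update by replacing the fixed first-moment coefficient with beta_t; only the schedule changes, so the extra cost per step is a single scalar computation.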

Related research

07/27/2023 · The Marginal Value of Momentum for Small Learning Rate SGD
Momentum is known to accelerate the convergence of gradient descent in s...

07/02/2020 · Adaptive Braking for Mitigating Gradient Delay
Neural network training is commonly accelerated by using multiple synchr...

11/01/2017 · Don't Decay the Learning Rate, Increase the Batch Size
It is common practice to decay the learning rate. Here we show one can u...

09/05/2023 · A Simple Asymmetric Momentum Make SGD Greatest Again
We propose the simplest SGD enhanced method ever, Loss-Controlled Asymme...

02/02/2022 · Robust Training of Neural Networks using Scale Invariant Architectures
In contrast to SGD, adaptive gradient methods like Adam allow robust tra...

05/14/2019 · Robust Neural Network Training using Periodic Sampling over Model Weights
Deep neural networks provide best-in-class performance for a number of c...

11/04/2020 · Reverse engineering learned optimizers reveals known and novel mechanisms
Learned optimizers are algorithms that can themselves be trained to solv...