Investigating Alternatives to the Root Mean Square for Adaptive Gradient Methods

06/10/2021
by Brett Daley, et al.

Adam is an adaptive gradient method that has experienced widespread adoption due to its fast and reliable training performance. Recent approaches have not offered significant improvement over Adam, often because they do not innovate upon one of its core features: normalization by the root mean square (RMS) of recent gradients. However, as noted by Kingma and Ba (2015), any number of L^p normalizations are possible, with the RMS corresponding to the specific case of p=2. In our work, we theoretically and empirically characterize the influence of different L^p norms on adaptive gradient methods for the first time. We show mathematically how the choice of p influences the size of the steps taken, while leaving other desirable properties unaffected. We evaluate Adam with various L^p norms on a suite of deep learning benchmarks, and find that p > 2 consistently leads to improved learning speed and final performance. The choices of p=3 or p=6 also match or outperform state-of-the-art methods in all of our experiments.
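To make the role of p concrete, the following is a minimal sketch of one Adam-style update in which the usual second-moment average is replaced by a running average of |g|^p and the normalizer is its p-th root. The abstract does not give the exact update rule, so this form, the function name `adam_lp_step`, and the default hyperparameters are assumptions made for illustration; with p=2 the sketch reduces to the familiar RMS normalization of Adam.

```python
import numpy as np

def adam_lp_step(param, grad, m, v, t, p=2.0, lr=1e-3,
                 beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam-style step with an L^p normalizer (illustrative sketch).

    Assumption: the p-th moment is tracked as an exponential moving
    average of |grad|**p, and the step is divided by its p-th root.
    Setting p=2 recovers the standard RMS normalization.
    """
    # First-moment (momentum) estimate, as in standard Adam.
    m = beta1 * m + (1.0 - beta1) * grad
    # Running average of the p-th absolute moment of the gradient.
    v = beta2 * v + (1.0 - beta2) * np.abs(grad) ** p
    # Bias corrections for the zero-initialized averages (t starts at 1).
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    # Normalize by the L^p mean instead of the root mean square.
    param = param - lr * m_hat / (v_hat ** (1.0 / p) + eps)
    return param, m, v

def lp_normalizer(grads, p, beta2=0.999):
    """Bias-corrected L^p mean of a gradient history (hypothetical helper)."""
    v = np.zeros_like(grads[0])
    for g in grads:
        v = beta2 * v + (1.0 - beta2) * np.abs(g) ** p
    return (v / (1.0 - beta2 ** len(grads))) ** (1.0 / p)

# Usage: minimize f(x) = 0.5 * ||x||^2 (gradient is x) with p = 3.
x = np.array([1.0, -2.0])
m, v = np.zeros_like(x), np.zeros_like(x)
for t in range(1, 201):
    x, m, v = adam_lp_step(x, x, m, v, t, p=3.0, lr=1e-2)
```

In this sketch the bias-corrected average is a weighted power mean of the past gradient magnitudes, so by the power mean inequality the normalizer cannot decrease as p grows when the gradient history is held fixed. That is one way to see how p can change the step size while the direction supplied by the momentum term stays the same.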
