Exploit Where Optimizer Explores via Residuals

04/11/2020
by An Xu, et al.

To train neural networks faster, many research efforts have been devoted to exploring better gradient descent trajectories, but few have been put into exploiting the intermediate results. In this work we propose a novel optimization method, (momentum) stochastic gradient descent with residuals (RSGD(m)), which exploits the gradient descent trajectory via suitable residual schemes and thereby improves both convergence and generalization. We provide a theoretical analysis showing that RSGD achieves a smaller growth rate of the generalization error while retaining the same convergence rate as SGD. Extensive deep learning experiments on image classification and word-level language modeling show that both the convergence and the generalization of RSGD(m) improve significantly over the existing SGD(m) algorithm.

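The abstract describes RSGD(m) only at a high level and does not define the residual scheme, so the sketch below is a hypothetical illustration rather than the paper's algorithm: it augments a standard heavy-ball momentum update with a residual term, taken here (purely as an assumption) to be the change in the gradient between consecutive steps, scaled by an assumed coefficient `rho`.

```python
import numpy as np

def rsgdm_step(w, grad, state, lr=0.1, momentum=0.9, rho=0.05):
    """One step of a hypothetical residual-augmented SGD with momentum.

    NOTE: the residual scheme here (the change in the gradient between
    consecutive steps) and the coefficient `rho` are assumptions made for
    illustration; they are not the definition used in the paper.
    """
    v = momentum * state["v"] + grad          # heavy-ball momentum buffer
    r = grad - state["prev_grad"]             # assumed residual: gradient change
    w_new = w - lr * (v + rho * r)            # fold the residual into the step
    state["v"], state["prev_grad"] = v, grad
    return w_new

# Toy usage on the quadratic f(w) = 0.5 * ||w||^2, whose gradient is w itself.
w = np.array([1.0, -2.0])
state = {"v": np.zeros_like(w), "prev_grad": np.zeros_like(w)}
for _ in range(200):
    w = rsgdm_step(w, grad=w, state=state)
print(w)  # converges toward the minimizer at the origin
```

With rho = 0 this sketch reduces to plain SGD with momentum, which makes it easy to compare the two variants side by side.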