Decreasing scaling transition from adaptive gradient descent to stochastic gradient descent

06/12/2021
by Kun Zeng, et al.

Researchers have proposed adaptive gradient descent algorithms and their variants, such as AdaGrad, RMSProp, Adam, and AMSGrad. Although these algorithms converge faster in the early stages of training, their generalization ability in the later stages is often inferior to that of stochastic gradient descent (SGD). Recently, some researchers have combined adaptive gradient descent with stochastic gradient descent to obtain the advantages of both and achieved good results. Building on this line of work, we propose a decreasing scaling transition from adaptive gradient descent to stochastic gradient descent method (DSTAda). For the SGD stage of training, we use a learning rate that decreases linearly with the number of iterations instead of a constant learning rate, and we achieve a smooth and stable transition from adaptive gradient descent to stochastic gradient descent through scaling. We also give a theoretical proof of the convergence of DSTAda under the framework of online learning. Our experimental results show that DSTAda converges faster, attains higher accuracy, and is more stable and robust. Our implementation is available at: https://github.com/kunzeng/DSTAdam.
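As a rough illustration of the idea described above, the sketch below shows one way a scaling transition between an Adam-style update and an SGD update with a linearly decreasing learning rate could be written. The function name dstada_like_step, the exponential scaling schedule, and all hyperparameter values here are assumptions made for this sketch; they are not taken from the paper or the DSTAdam repository.

import numpy as np

def dstada_like_step(param, grad, state, t,
                     lr_adam=1e-3, lr_sgd_max=0.1, total_steps=10000,
                     beta1=0.9, beta2=0.999, eps=1e-8, decay=1e-3):
    """One update blending an Adam-style step with an SGD step.

    Illustrative sketch of a scaling transition, not the authors'
    exact DSTAda rule: the schedule scale = exp(-decay * t) and the
    hyperparameter names are assumptions.
    """
    m, v = state["m"], state["v"]
    # Adam-style first and second moment estimates with bias correction
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    adam_step = lr_adam * m_hat / (np.sqrt(v_hat) + eps)

    # Plain SGD step whose learning rate decreases linearly in t
    lr_sgd = lr_sgd_max * max(0.0, 1.0 - t / total_steps)
    sgd_step = lr_sgd * grad

    # Scaling factor decays from 1 toward 0, moving the update
    # smoothly from the adaptive step to the plain SGD step
    scale = np.exp(-decay * t)
    param = param - (scale * adam_step + (1 - scale) * sgd_step)

    state["m"], state["v"] = m, v
    return param, state

# Usage example (hypothetical): minimize f(x) = x^2 starting from x = 5
x = np.array([5.0])
state = {"m": np.zeros_like(x), "v": np.zeros_like(x)}
for t in range(1, 2001):
    grad = 2 * x  # gradient of x^2
    x, state = dstada_like_step(x, grad, state, t, total_steps=2000)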

