Scaling transition from momentum stochastic gradient descent to plain stochastic gradient descent

06/12/2021
by Kun Zeng, et al.

Plain stochastic gradient descent and momentum stochastic gradient descent are used extremely widely in deep learning because of their simple settings and low computational complexity. Momentum stochastic gradient descent uses the accumulated gradient as the update direction of the current parameters, which gives it a faster training speed. The direction used by plain stochastic gradient descent, by contrast, is not corrected by the accumulated gradient; for the parameters that currently need to be updated it is the optimal direction, so its update is more accurate. We combine the advantages of momentum stochastic gradient descent (fast training) and plain stochastic gradient descent (high accuracy), and propose a scaling transition from momentum stochastic gradient descent to plain stochastic gradient descent (TSGD). At the same time, a learning rate that decreases linearly with the iterations is used instead of a constant learning rate. The TSGD algorithm takes larger steps in the early stage to speed up training, and smaller steps in the later stage so that it converges steadily. Our experimental results show that the TSGD algorithm has faster training speed, higher accuracy, and better stability. Our implementation is available at: https://github.com/kunzeng/TSGD.
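The idea described in the abstract can be sketched as a single parameter update that blends the momentum direction with the raw gradient and shifts weight from the former to the latter as training progresses, under a linearly decaying learning rate. The sketch below is illustrative only: the linear learning-rate decay follows the abstract, but the particular scaling schedule (scale = 1 - t / total_steps), the hyperparameter names (lr_max, lr_min, momentum), and the helper tsgd_step are assumptions made for this example, not the authors' exact formulation; see the linked repository for the reference implementation.

    # Minimal, illustrative TSGD-style step (assumed schedule; not the authors' exact code).
    def tsgd_step(params, grads, velocities, t, total_steps,
                  lr_max=0.1, lr_min=0.001, momentum=0.9):
        """One update: blend momentum SGD with plain SGD, moving weight
        from the accumulated direction to the raw gradient over training,
        with a learning rate that decreases linearly in the iterations."""
        progress = t / total_steps                     # goes from 0 to 1 over training
        lr = lr_max - (lr_max - lr_min) * progress     # linearly decreasing step size
        scale = 1.0 - progress                         # weight on the momentum direction (assumed schedule)

        new_params, new_velocities = [], []
        for p, g, v in zip(params, grads, velocities):
            v = momentum * v + g                       # accumulated (momentum) direction
            direction = scale * v + (1.0 - scale) * g  # transition from momentum SGD toward plain SGD
            new_params.append(p - lr * direction)
            new_velocities.append(v)
        return new_params, new_velocities

With this kind of schedule, the early iterations behave like momentum SGD (large steps along the accumulated gradient), while the late iterations approach plain SGD with a small step size; a different decay for the scaling factor would change how quickly that transition happens.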


