Gradient descent with momentum — to accelerate or to super-accelerate?

01/17/2020
by   Goran Nakerst, et al.
23

We consider gradient descent with `momentum', a widely used method for loss function minimization in machine learning. This method is often used with `Nesterov acceleration', meaning that the gradient is evaluated not at the current position in parameter space, but at the estimated position after one step. In this work, we show that the algorithm can be improved by extending this `acceleration' — by using the gradient at an estimated position several steps ahead rather than just one step ahead. How far one looks ahead in this `super-acceleration' algorithm is determined by a new hyperparameter. Considering a one-parameter quadratic loss function, the optimal value of the super-acceleration can be exactly calculated and analytically estimated. We show explicitly that super-accelerating the momentum algorithm is beneficial, not only for this idealized problem, but also for several synthetic loss landscapes and for the MNIST classification task with neural networks. Super-acceleration is also easy to incorporate into adaptive algorithms like RMSProp or Adam, and is shown to improve these algorithms.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/04/2020

Provable Acceleration of Neural Net Training via Polyak's Momentum

Incorporating a so-called "momentum" dynamic in gradient descent methods...
research
07/01/2021

Fast Margin Maximization via Dual Acceleration

We present and analyze a momentum-based gradient method for training lin...
research
04/16/2022

On Acceleration of Gradient-Based Empirical Risk Minimization using Local Polynomial Regression

We study the acceleration of the Local Polynomial Interpolation-based Gr...
research
08/25/2022

Accelerated Sparse Recovery via Gradient Descent with Nonlinear Conjugate Gradient Momentum

This paper applies an idea of adaptive momentum for the nonlinear conjug...
research
06/08/2020

The Golden Ratio of Learning and Momentum

Gradient descent has been a central training principle for artificial ne...
research
05/24/2018

Nonlinear Acceleration of Deep Neural Networks

Regularized nonlinear acceleration (RNA) is a generic extrapolation sche...
research
02/12/2020

Average-case Acceleration Through Spectral Density Estimation

We develop a framework for designing optimal quadratic optimization meth...

Please sign up or login with your details

Forgot password? Click here to reset