Provable Acceleration of Neural Net Training via Polyak's Momentum

10/04/2020
by   Jun-Kun Wang, et al.

Incorporating a so-called "momentum" dynamic into gradient descent methods is widely used in neural net training, as it has been broadly observed, at least empirically, that it often leads to significantly faster convergence. At the same time, the literature offers very few theoretical guarantees that explain this apparent acceleration effect. In this paper we show that Polyak's momentum, in combination with over-parameterization of the model, helps achieve faster convergence in training a one-layer ReLU network on n examples. Specifically, we show that gradient descent with Polyak's momentum decreases the initial training error at a rate much faster than that of vanilla gradient descent. We provide a bound for a fixed sample size n, and we show that gradient descent with Polyak's momentum converges at an accelerated rate to a small error that is controllable by the number of neurons m. Prior work [DZPS19] showed that with vanilla gradient descent and a similar over-parameterization, the error decays as (1-κ_n)^t after t iterations, where κ_n is a problem-specific parameter. Our result shows that, with an appropriate choice of parameters, one obtains a rate of (1-√(κ_n))^t. This work establishes that momentum does indeed speed up neural net training.
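The heavy-ball update underlying Polyak's momentum can be sketched on a toy problem. The snippet below is a minimal illustration, not the paper's ReLU setting: it compares vanilla gradient descent against the momentum update w_{t+1} = w_t - η∇f(w_t) + β(w_t - w_{t-1}) on an ill-conditioned quadratic, with η and β set to the standard tuning for strongly convex quadratics (these parameter choices are an assumption for the demo, not taken from the paper).

```python
import numpy as np

def gd(grad, w0, eta, steps):
    """Vanilla gradient descent."""
    w = w0.copy()
    for _ in range(steps):
        w = w - eta * grad(w)
    return w

def heavy_ball(grad, w0, eta, beta, steps):
    """Polyak's momentum: w_{t+1} = w_t - eta*grad(w_t) + beta*(w_t - w_{t-1})."""
    w_prev, w = w0.copy(), w0.copy()
    for _ in range(steps):
        w_next = w - eta * grad(w) + beta * (w - w_prev)
        w_prev, w = w, w_next
    return w

# Ill-conditioned quadratic f(w) = 0.5 * w^T A w with eigenvalues mu=1, L=100,
# so the minimizer is w* = 0 and the condition number is L/mu = 100.
A = np.diag([1.0, 100.0])
grad = lambda w: A @ w
w0 = np.array([1.0, 1.0])
L, mu = 100.0, 1.0

# Vanilla GD with step 1/L contracts the slow direction by (1 - mu/L) per step.
eta_gd = 1.0 / L

# Classical heavy-ball tuning for strongly convex quadratics (an assumed,
# textbook choice): contraction improves to roughly (1 - sqrt(mu/L)) per step,
# mirroring the kappa -> sqrt(kappa) improvement in the abstract.
eta_hb = 4.0 / (np.sqrt(L) + np.sqrt(mu)) ** 2
beta = ((np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))) ** 2

steps = 50
err_gd = np.linalg.norm(gd(grad, w0, eta_gd, steps))
err_hb = np.linalg.norm(heavy_ball(grad, w0, eta_hb, beta, steps))
```

After 50 steps on this example the momentum iterate is far closer to the optimum than the vanilla one, matching the qualitative claim that the dependence on the conditioning parameter improves from κ to √κ.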


research
07/13/2022

Towards understanding how momentum improves generalization in deep learning

Stochastic gradient descent (SGD) with momentum is widely used for train...
research
09/14/2020

A Qualitative Study of the Dynamic Behavior of Adaptive Gradient Algorithms

The dynamic behavior of RMSprop and Adam algorithms is studied through a...
research
04/05/2022

Gradient Descent Bit-Flipping Decoding with Momentum

In this paper, we propose a Gradient Descent Bit-Flipping (GDBF) decodin...
research
01/17/2020

Gradient descent with momentum — to accelerate or to super-accelerate?

We consider gradient descent with `momentum', a widely used method for l...
research
04/01/2018

Aggregated Momentum: Stability Through Passive Damping

Momentum is a simple and widely used trick which allows gradient-based o...
research
11/02/2021

An Asymptotic Analysis of Minibatch-Based Momentum Methods for Linear Regression Models

Momentum methods have been shown to accelerate the convergence of the st...
research
06/08/2022

Hidden Markov Models with Momentum

Momentum is a popular technique for improving convergence rates during g...
