Provable Acceleration of Neural Net Training via Polyak's Momentum

10/04/2020
by   Jun-Kun Wang, et al.

Incorporating a so-called "momentum" dynamic into gradient descent methods is widely used in neural net training, as it has been broadly observed, at least empirically, that it often leads to significantly faster convergence. At the same time, the literature offers very few theoretical guarantees that explain this apparent acceleration effect. In this paper we show that Polyak's momentum, in combination with over-parameterization of the model, helps achieve faster convergence in training a one-layer ReLU network on n examples. Specifically, we show that gradient descent with Polyak's momentum decreases the initial training error at a rate much faster than that of vanilla gradient descent. We provide a bound for a fixed sample size n, and we show that gradient descent with Polyak's momentum converges at an accelerated rate to a small error that is controllable by the number of neurons m. Prior work [DZPS19] showed that with vanilla gradient descent and a similar over-parameterization, the error decays as (1-κ_n)^t after t iterations, where κ_n is a problem-specific parameter. Our result shows that, with an appropriate choice of parameters, one obtains a rate of (1-√(κ_n))^t. This work establishes that momentum does indeed speed up neural net training.
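The heavy-ball update underlying Polyak's momentum can be sketched on a toy problem. The snippet below is a minimal illustration, not the paper's ReLU setting: it compares vanilla gradient descent against the momentum update w_{t+1} = w_t - η∇f(w_t) + β(w_t - w_{t-1}) on an ill-conditioned quadratic, with η and β set to the standard tuning for strongly convex quadratics (these parameter choices are an assumption for the demo, not taken from the paper).

```python
import numpy as np

def gd(grad, w0, eta, steps):
    """Vanilla gradient descent."""
    w = w0.copy()
    for _ in range(steps):
        w = w - eta * grad(w)
    return w

def heavy_ball(grad, w0, eta, beta, steps):
    """Polyak's momentum: w_{t+1} = w_t - eta*grad(w_t) + beta*(w_t - w_{t-1})."""
    w_prev, w = w0.copy(), w0.copy()
    for _ in range(steps):
        w_next = w - eta * grad(w) + beta * (w - w_prev)
        w_prev, w = w, w_next
    return w

# Ill-conditioned quadratic f(w) = 0.5 * w^T A w with eigenvalues mu=1, L=100,
# so the minimizer is w* = 0 and the condition number is L/mu = 100.
A = np.diag([1.0, 100.0])
grad = lambda w: A @ w
w0 = np.array([1.0, 1.0])
L, mu = 100.0, 1.0

# Vanilla GD with step 1/L contracts the slow direction by (1 - mu/L) per step.
eta_gd = 1.0 / L

# Classical heavy-ball tuning for strongly convex quadratics (an assumed,
# textbook choice): contraction improves to roughly (1 - sqrt(mu/L)) per step,
# mirroring the kappa -> sqrt(kappa) improvement in the abstract.
eta_hb = 4.0 / (np.sqrt(L) + np.sqrt(mu)) ** 2
beta = ((np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))) ** 2

steps = 50
err_gd = np.linalg.norm(gd(grad, w0, eta_gd, steps))
err_hb = np.linalg.norm(heavy_ball(grad, w0, eta_hb, beta, steps))
```

After 50 steps on this example the momentum iterate is far closer to the optimum than the vanilla one, matching the qualitative claim that the dependence on the conditioning parameter improves from κ to √κ.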


research
07/13/2022

Towards understanding how momentum improves generalization in deep learning

Stochastic gradient descent (SGD) with momentum is widely used for train...
research
09/14/2020

A Qualitative Study of the Dynamic Behavior of Adaptive Gradient Algorithms

The dynamic behavior of RMSprop and Adam algorithms is studied through a...
research
04/05/2022

Gradient Descent Bit-Flipping Decoding with Momentum

In this paper, we propose a Gradient Descent Bit-Flipping (GDBF) decodin...
research
01/17/2020

Gradient descent with momentum — to accelerate or to super-accelerate?

We consider gradient descent with `momentum', a widely used method for l...
research
04/01/2018

Aggregated Momentum: Stability Through Passive Damping

Momentum is a simple and widely used trick which allows gradient-based o...
research
11/02/2021

An Asymptotic Analysis of Minibatch-Based Momentum Methods for Linear Regression Models

Momentum methods have been shown to accelerate the convergence of the st...
research
06/08/2022

Hidden Markov Models with Momentum

Momentum is a popular technique for improving convergence rates during g...
