Does Momentum Help? A Sample Complexity Analysis

10/29/2021
by Gugan Thoppe, et al.

Momentum methods are widely used to accelerate stochastic iterative methods. Although a fair amount of literature is dedicated to momentum in stochastic optimisation, there are limited results that quantify the benefits of using heavy ball momentum in the specific case of stochastic approximation (SA) algorithms. We first show that, under some assumptions, the convergence rate with the optimal step size does not improve when momentum is used. Second, to quantify the behaviour in the initial phase, we analyse the sample complexity of the iterates with and without momentum. We show that the sample complexity bound for SA without momentum is 𝒪̃(1/(αλ_min(A))), while that for SA with momentum is 𝒪̃(1/√(αλ_min(A))), where α is the step size and λ_min(A) is the smallest eigenvalue of the driving matrix A. Although the sample complexity bound for SA with momentum is better for small enough α, it turns out that for the optimal choice of α in the two cases, the sample complexity bounds are of the same order.
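To make the setting concrete, below is a minimal sketch, not taken from the paper, of the two update rules being compared for a linear SA recursion of the form x_{t+1} = x_t + α(b - A x_t + noise): plain SA and SA with a heavy ball momentum term β(x_t - x_{t-1}). The specific matrix A, vector b, step size alpha, momentum parameter beta, and noise level in the code are illustrative assumptions, not the paper's setup.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative linear SA problem: drive x towards the solution of A x = b.
# A is a positive-definite "driving matrix"; its smallest eigenvalue
# lambda_min(A) is the quantity appearing in the sample complexity bounds.
A = np.array([[2.0, 0.0], [0.0, 0.5]])   # illustrative choice, lambda_min(A) = 0.5
b = np.array([1.0, 1.0])
alpha = 0.05      # step size (alpha in the abstract)
beta = 0.5        # heavy ball momentum parameter (illustrative value)
noise_std = 0.1   # scale of the i.i.d. noise in each update
T = 5000          # number of iterations

def plain_sa():
    """Plain SA: x_{t+1} = x_t + alpha * (b - A x_t + noise)."""
    x = np.zeros(2)
    for _ in range(T):
        noise = noise_std * rng.standard_normal(2)
        x = x + alpha * (b - A @ x + noise)
    return x

def heavy_ball_sa():
    """SA with heavy ball momentum:
    x_{t+1} = x_t + alpha * (b - A x_t + noise) + beta * (x_t - x_{t-1})."""
    x_prev = np.zeros(2)
    x = np.zeros(2)
    for _ in range(T):
        noise = noise_std * rng.standard_normal(2)
        x_next = x + alpha * (b - A @ x + noise) + beta * (x - x_prev)
        x_prev, x = x, x_next
    return x

x_star = np.linalg.solve(A, b)
print("target x* =", x_star)
print("plain SA  =", plain_sa())
print("heavy ball=", heavy_ball_sa())

With this illustrative A, λ_min(A) = 0.5 is the eigenvalue entering both sample complexity bounds; both recursions settle near x* = A⁻¹b, and the comparison in the paper concerns how many samples each needs to get there as a function of α and λ_min(A).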


Related research

Gradient Temporal Difference with Momentum: Stability and Convergence (11/22/2021)
Gradient temporal difference (Gradient TD) algorithms are a popular clas...

On the fast convergence of minibatch heavy ball momentum (06/15/2022)
Simple stochastic momentum methods are widely used in machine learning o...

Zap Meets Momentum: Stochastic Approximation Algorithms with Optimal Convergence Rate (09/17/2018)
There are two well known Stochastic Approximation techniques that are kn...

From Online Optimization to PID Controllers: Mirror Descent with Momentum (02/12/2022)
We study a family of first-order methods with momentum based on mirror d...

On the Influence of Momentum Acceleration on Online Learning (03/14/2016)
The article examines in some detail the convergence rate and mean-square...

Heavy Ball Momentum for Conditional Gradient (10/08/2021)
Conditional gradient, aka Frank Wolfe (FW) algorithms, have well-documen...

Practical and Fast Momentum-Based Power Methods (08/20/2021)
The power method is a classical algorithm with broad applications in mac...
