Heavy Ball Momentum for Conditional Gradient

by Bingcong Li et al., University of Minnesota

Conditional gradient methods, also known as Frank-Wolfe (FW) algorithms, have well-documented merits in machine learning and signal processing applications. Unlike projection-based methods, however, momentum cannot improve the convergence rate of FW in general. This limitation motivates the present work, which deals with heavy ball momentum and its impact on FW. Specifically, it is established that heavy ball offers a unifying perspective on primal-dual (PD) convergence and enjoys a tighter per-iteration PD error rate for multiple choices of step sizes, where the PD error can serve as a stopping criterion in practice. In addition, it is shown that restart, a scheme typically employed jointly with Nesterov's momentum, can further tighten this PD error bound. Numerical results demonstrate the usefulness of heavy ball momentum in FW iterations.
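To make the idea concrete, here is a minimal sketch of an FW iteration with heavy-ball-style gradient averaging: the search direction is computed from a momentum-averaged gradient rather than the instantaneous one, and the linear minimization oracle (LMO) replaces projection. The `2/(k+2)` schedules for the momentum weight and step size are standard FW defaults assumed here for illustration, not necessarily the step-size choices analyzed in the paper, and the ℓ1-ball least-squares problem is a toy instance.

```python
import numpy as np

def lmo_l1_ball(g, radius=1.0):
    # Linear minimization oracle over the l1 ball:
    # argmin_{||v||_1 <= radius} <g, v> puts all mass on the
    # coordinate of g with the largest magnitude, with opposite sign.
    i = np.argmax(np.abs(g))
    v = np.zeros_like(g)
    v[i] = -radius * np.sign(g[i])
    return v

def heavy_ball_fw(grad, x0, lmo, iters=200):
    # FW with heavy-ball momentum on the gradient estimate:
    #   g_k = (1 - delta_k) g_{k-1} + delta_k * grad(x_k)
    #   v_k = LMO(g_k);  x_{k+1} = x_k + eta_k (v_k - x_k)
    x = x0.copy()
    g = np.zeros_like(x0)
    for k in range(iters):
        delta = 2.0 / (k + 2)   # momentum weight (assumed schedule)
        eta = 2.0 / (k + 2)     # FW step size (assumed schedule)
        g = (1 - delta) * g + delta * grad(x)
        v = lmo(g)
        x = x + eta * (v - x)   # convex combination keeps x feasible
    return x

# Toy least-squares instance over the unit l1 ball.
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10))
b = A @ (0.05 * np.sign(rng.standard_normal(10)))  # sparse-ish target
grad = lambda x: A.T @ (A @ x - b)
x_hat = heavy_ball_fw(grad, np.zeros(10), lmo_l1_ball)
```

Because each iterate is a convex combination of feasible points, `x_hat` stays inside the ℓ1 ball without any projection, which is the structural advantage FW-type methods retain when momentum is added this way.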




