Convergence of Momentum-Based Heavy Ball Method with Batch Updating and/or Approximate Gradients

In this paper, we study the well-known "Heavy Ball" (HB) method for convex and nonconvex optimization, introduced by Polyak in 1964, and establish its convergence under a variety of situations. Traditionally, most algorithms use "full-coordinate updating," that is, at each step, every component of the argument is updated. However, when the dimension of the argument is very high, it is more efficient to update some but not all components of the argument at each iteration. We refer to this as "batch updating" in this paper. When gradient-based algorithms are used together with batch updating, in principle it is sufficient to compute only those components of the gradient that correspond to the components of the argument being updated. However, if a method such as backpropagation is used to compute these components, computing only some components of the gradient does not offer much savings over computing the entire gradient. Therefore, to achieve a noticeable reduction in CPU usage at each step, one can use first-order differences to approximate the gradient. The resulting estimates are biased, and also have unbounded variance. Thus some delicate analysis is required to ensure that the HB algorithm converges when batch updating is used instead of full-coordinate updating, and/or approximate gradients are used instead of true gradients. In this paper, we not only establish the almost sure convergence of the iterations to the stationary point(s) of the objective function, but also derive upper bounds on the rate of convergence. To the best of our knowledge, there is no other paper that combines all of these features.
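As a rough illustration of the scheme the abstract describes, the following sketch combines Polyak's Heavy Ball update, x_{t+1} = x_t - alpha * g_t + mu * (x_t - x_{t-1}), with batch updating (only a random subset of coordinates is changed per step) and forward-difference approximations of the corresponding partial derivatives. This is a minimal sketch, not the paper's algorithm: the function and parameter names (heavy_ball_batch, alpha, mu, h, batch_size) are illustrative, and the paper's step-size schedule, batch-selection rule, and treatment of the momentum term may differ.

```python
import numpy as np

def heavy_ball_batch(f, x0, alpha=0.01, mu=0.9, batch_size=10,
                     h=1e-4, n_iters=1000, rng=None):
    """Illustrative Heavy Ball iteration with batch updating and
    forward-difference gradient estimates (not the paper's exact method)."""
    rng = np.random.default_rng() if rng is None else rng
    x_prev = x0.copy()
    x = x0.copy()
    d = x0.size
    for _ in range(n_iters):
        # Batch updating: pick a random subset of coordinates to update.
        coords = rng.choice(d, size=min(batch_size, d), replace=False)
        # Approximate only the needed partial derivatives by first-order
        # (forward) differences; these estimates carry an O(h) bias.
        g = np.zeros(d)
        fx = f(x)
        for i in coords:
            e = np.zeros(d)
            e[i] = h
            g[i] = (f(x + e) - fx) / h
        # Heavy Ball update: gradient step plus momentum term.
        x_next = x - alpha * g + mu * (x - x_prev)
        x_prev, x = x, x_next
    return x

# Usage on a toy smooth objective: minimize ||x - 1||^2 in 50 dimensions,
# updating only 5 of the 50 coordinates per iteration.
f = lambda x: np.sum((x - 1.0) ** 2)
x_star = heavy_ball_batch(f, np.zeros(50), batch_size=5, n_iters=5000)
```

Note that each iteration here evaluates f only batch_size + 1 times, rather than computing the full gradient, which is the source of the per-step savings the abstract refers to; the price is the bias and variance of the difference estimates that the paper's analysis must control.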

