From Averaging to Acceleration, There is Only a Step-size

by   Nicolas Flammarion, et al.
Cole Normale Suprieure

We show that accelerated gradient descent, averaged gradient descent and the heavy-ball method for non-strongly-convex problems may be reformulated as constant parameter second-order difference equation algorithms, where stability of the system is equivalent to convergence at rate O(1/n 2), where n is the number of iterations. We provide a detailed analysis of the eigenvalues of the corresponding linear dynamical system , showing various oscillatory and non-oscillatory behaviors, together with a sharp stability result with explicit constants. We also consider the situation where noisy gradients are available, where we extend our general convergence result, which suggests an alternative algorithm (i.e., with different step sizes) that exhibits the good aspects of both averaging and acceleration.


page 1

page 2

page 3

page 4


Convergence Acceleration via Chebyshev Step: Plausible Interpretation of Deep-Unfolded Gradient Descent

Deep unfolding is a promising deep-learning technique, whose network arc...

Acceleration by Stepsize Hedging I: Multi-Step Descent and the Silver Stepsize Schedule

Can we accelerate convergence of gradient descent without changing the a...

Learning to Accelerate by the Methods of Step-size Planning

Gradient descent is slow to converge for ill-conditioned problems and no...

Gradient Temporal Difference with Momentum: Stability and Convergence

Gradient temporal difference (Gradient TD) algorithms are a popular clas...

Non-ergodic Convergence Analysis of Heavy-Ball Algorithms

In this paper, we revisit the convergence of the Heavy-ball method, and ...

Theoretical Interpretation of Learned Step Size in Deep-Unfolded Gradient Descent

Deep unfolding is a promising deep-learning technique in which an iterat...

Please sign up or login with your details

Forgot password? Click here to reset