Dynamics of Stochastic Momentum Methods on Large-scale, Quadratic Models

06/07/2021
by Courtney Paquette, et al.

We analyze a class of stochastic gradient algorithms with momentum on a high-dimensional random least squares problem. Our framework, inspired by random matrix theory, provides an exact (deterministic) characterization of the sequence of loss values produced by these algorithms, expressed only in terms of the eigenvalues of the Hessian. This leads to simple expressions for nearly-optimal hyperparameters, a description of the limiting neighborhood, and average-case complexity. As a consequence, we show that (small-batch) stochastic heavy-ball momentum with a fixed momentum parameter provides no actual performance improvement over SGD when step sizes are adjusted correctly. In contrast, in the non-strongly convex setting, it is possible to get a large improvement over SGD using momentum. By introducing hyperparameters that depend on the number of samples, we propose a new algorithm, sDANA (stochastic dimension adjusted Nesterov acceleration), which obtains an asymptotically optimal average-case complexity while remaining linearly convergent in the strongly convex setting without adjusting parameters.
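The sketch below (NumPy) illustrates the setting the abstract describes: small-batch stochastic heavy-ball momentum on a high-dimensional random least squares problem, compared against plain SGD with a correspondingly adjusted step size. The problem sizes, step size, and momentum parameter are illustrative assumptions, not the nearly-optimal hyperparameters derived in the paper, and the step-size adjustment shown is only a rough stand-in for the paper's exact correspondence.

    import numpy as np

    # Minimal sketch: small-batch stochastic heavy-ball momentum (SHB) on a
    # random least squares problem. All dimensions and hyperparameters below
    # are illustrative assumptions, not the paper's tuned values.

    rng = np.random.default_rng(0)
    n, d = 800, 400                                # samples x dimension
    A = rng.standard_normal((n, d)) / np.sqrt(d)   # random data matrix, rows of norm ~1
    x_star = rng.standard_normal(d)                # planted signal
    b = A @ x_star                                 # noiseless targets

    def loss(x):
        """Full least squares loss f(x) = (1/2n) * ||A x - b||^2."""
        r = A @ x - b
        return 0.5 * np.mean(r ** 2)

    def stochastic_heavy_ball(steps=20000, batch=1, lr=0.2, beta=0.5):
        """x_{k+1} = x_k - lr * g_k + beta * (x_k - x_{k-1}), g_k a mini-batch gradient."""
        x = np.zeros(d)
        x_prev = x.copy()
        for _ in range(steps):
            idx = rng.integers(0, n, size=batch)
            g = A[idx].T @ (A[idx] @ x - b[idx]) / batch   # mini-batch gradient
            x, x_prev = x - lr * g + beta * (x - x_prev), x
        return loss(x)

    # Heavy ball with fixed momentum vs. SGD (beta = 0) whose step size is
    # adjusted by roughly 1/(1 - beta); the paper's result is that such an
    # adjusted SGD gives up nothing to SHB in this regime.
    print("SHB (lr=0.2, beta=0.5):", stochastic_heavy_ball(lr=0.2, beta=0.5))
    print("SGD (lr=0.4, beta=0.0):", stochastic_heavy_ball(lr=0.4, beta=0.0))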
