Acceleration by Stepsize Hedging I: Multi-Step Descent and the Silver Stepsize Schedule

09/14/2023
by Jason M. Altschuler, et al.

Can we accelerate convergence of gradient descent without changing the algorithm – just by carefully choosing stepsizes? Surprisingly, we show that the answer is yes. Our proposed Silver Stepsize Schedule optimizes strongly convex functions in k^(log_ρ 2) ≈ k^0.7864 iterations, where ρ = 1 + √2 is the silver ratio and k is the condition number. This is intermediate between the textbook unaccelerated rate k and the accelerated rate √k due to Nesterov in 1983. The non-strongly convex setting is conceptually identical, and standard black-box reductions imply an analogous accelerated rate ε^(-log_ρ 2) ≈ ε^(-0.7864). We conjecture and provide partial evidence that these rates are optimal among all possible stepsize schedules. The Silver Stepsize Schedule is constructed recursively in a fully explicit way. It is non-monotonic, fractal-like, and approximately periodic with period k^(log_ρ 2). This leads to a phase transition in the convergence rate: initially super-exponential (acceleration regime), then exponential (saturation regime).
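
The abstract states the rates but not the recursive construction itself. As a rough illustration of what a non-monotonic, fractal-like, approximately periodic stepsize schedule looks like when plugged into plain gradient descent, here is a minimal sketch on a toy quadratic. The specific pattern h_t = 1 + ρ^(ν(t) - 1), with ν(t) the 2-adic valuation of t, as well as the quadratic and all parameter choices, are assumptions made for illustration; this pattern is associated with the smooth convex variant of the schedule and is not the paper's strongly convex schedule, which additionally depends on the condition number k.

```python
import numpy as np

RHO = 1.0 + np.sqrt(2.0)  # silver ratio, rho = 1 + sqrt(2)

def nu(t: int) -> int:
    """2-adic valuation of t (largest j such that 2**j divides t)."""
    j = 0
    while t % 2 == 0:
        t //= 2
        j += 1
    return j

def fractal_schedule(n: int) -> np.ndarray:
    """Illustrative fractal stepsize pattern h_t = 1 + RHO**(nu(t) - 1), t = 1..n.

    Non-monotonic and approximately periodic; stepsizes are normalized so that
    h = 1 corresponds to a step of 1/L for an L-smooth function. The exact
    coefficients are an assumption for illustration, not quoted from the paper.
    """
    return np.array([1.0 + RHO ** (nu(t) - 1) for t in range(1, n + 1)])

# Toy comparison on an ill-conditioned quadratic f(x) = 0.5 * x^T A x.
rng = np.random.default_rng(0)
d, L = 50, 1.0
A = np.diag(np.linspace(1e-3, L, d))   # eigenvalues spread over [1e-3, 1]
x0 = rng.standard_normal(d)

def run_gd(stepsizes):
    x = x0.copy()
    for h in stepsizes:
        x = x - (h / L) * (A @ x)      # gradient step with stepsize h / L
    return 0.5 * x @ A @ x             # final objective value (minimum is 0)

n = 255                                # a full recursive block, n = 2**8 - 1
print("first 8 steps:", np.round(fractal_schedule(8), 3))
print("constant 1/L :", run_gd(np.ones(n)))
print("fractal      :", run_gd(fractal_schedule(n)))
```

The printed first eight steps show the non-monotone, self-similar pattern (√2, 2, √2, 2+√2, √2, 2, √2, 4+2√2, ...); the comparison against a constant 1/L baseline uses the same number of gradient evaluations and is only meant to show how such a schedule slots into vanilla gradient descent, not to reproduce the paper's rates.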

Related research:

04/07/2015 - From Averaging to Acceleration, There is Only a Step-size
We show that accelerated gradient descent, averaged gradient descent and...

02/26/2020 - Acceleration for Compressed Gradient Descent in Distributed and Federated Optimization
Due to the high communication cost in distributed and federated learning...

06/01/2018 - Run Procrustes, Run! On the convergence of accelerated Procrustes Flow
In this work, we present theoretical results on the convergence of non-c...

03/03/2022 - Accelerated SGD for Non-Strongly-Convex Least Squares
We consider stochastic approximation for the least squares regression pr...

02/07/2022 - Nesterov Accelerated Shuffling Gradient Method for Convex Optimization
In this paper, we propose Nesterov Accelerated Shuffling Gradient (NASG)...

03/01/2021 - Acceleration via Fractal Learning Rate Schedules
When balancing the practical tradeoffs of iterative methods for large-sc...

06/16/2020 - Hessian-Free High-Resolution Nesterov Acceleration for Sampling
We propose an accelerated-gradient-based MCMC method. It relies on a mod...
