Tight Convergence Rate Bounds for Optimization Under Power Law Spectral Conditions

02/02/2022
by Maksim Velikanov, et al.

The performance of optimization on quadratic problems depends sensitively on the low-lying part of the spectrum. For large (effectively infinite-dimensional) problems, this part of the spectrum can often be naturally represented or approximated by power law distributions. In this paper we perform a systematic study of a range of classical single-step and multi-step first order optimization algorithms, with adaptive and non-adaptive, constant and non-constant learning rates: vanilla Gradient Descent, Steepest Descent, Heavy Ball, and Conjugate Gradients. For each of these, we prove that a power law spectral assumption entails a power law for the convergence rate of the algorithm, with the convergence rate exponent given by a specific multiple of the spectral exponent. We establish both upper and lower bounds, showing that the results are tight. Finally, we demonstrate applications of these results to kernel learning and the training of neural networks in the NTK regime.
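The central claim, that a power law spectrum entails a power law convergence rate, is easy to observe numerically. Below is a minimal sketch for vanilla Gradient Descent on a diagonal quadratic with eigenvalues lam_k = k**(-nu); the spectral exponent, the dimension cutoff, the step size, and the flat initial error coefficients are all illustrative assumptions, not the paper's exact conditions.

```python
import numpy as np

# Minimal sketch (illustrative assumptions, not the paper's exact setting):
# gradient descent on a diagonal quadratic f(w) = 0.5 * sum_k lam_k * w_k**2
# with a power law spectrum lam_k = k**(-nu) and flat initial error coefficients.
nu = 1.5                        # assumed spectral exponent
d = 100_000                     # large finite proxy for an infinite-dimensional problem
k = np.arange(1, d + 1)
lam = k ** (-nu)                # power law eigenvalues
w0 = np.ones(d)                 # assumed initial error coefficients

eta = 1.0 / lam.max()           # constant step size for vanilla gradient descent
ts = np.unique(np.logspace(0, 4, 40).astype(int))

# On a diagonal quadratic, GD has the closed form w_t[k] = (1 - eta*lam[k])**t * w0[k],
# so the loss after t steps is L(t) = 0.5 * sum_k lam_k * (1 - eta*lam_k)**(2t) * w0_k**2.
losses = np.array([0.5 * np.sum(lam * (1.0 - eta * lam) ** (2 * t) * w0**2)
                   for t in ts])

# A straight line in log-log coordinates indicates L(t) ~ t**(-xi); per the
# abstract, the exponent xi is a specific multiple of the spectral exponent
# (the exact multiple depends on the spectral and target assumptions).
slope, _ = np.polyfit(np.log(ts), np.log(losses), 1)
print(f"fitted convergence rate exponent: {-slope:.2f}")
```

Under these toy assumptions the fitted exponent should come out near (nu - 1)/nu, i.e. roughly 0.33, confirming a power law in t; the paper derives the precise exponent as a function of the spectral exponent for each algorithm.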

Related research

12/07/2013 · Optimal rates for zero-order convex optimization: the power of two function evaluations
We consider derivative-free algorithms for stochastic and non-stochastic...

06/22/2020 · Optimal Rates for Averaged Stochastic Gradient Descent under Neural Tangent Kernel Regime
We analyze the convergence of the averaged stochastic gradient descent f...

07/05/2021 · The Last-Iterate Convergence Rate of Optimistic Mirror Descent in Stochastic Variational Inequalities
In this paper, we analyze the local convergence rate of optimistic mirro...

05/02/2021 · Universal scaling laws in the gradient descent training of neural networks
Current theoretical results on optimization trajectories of neural netwo...

01/05/2021 · On the convergence rate of the Kačanov scheme for shear-thinning fluids
We explore the convergence rate of the Kačanov iteration scheme for diff...

10/10/2021 · Convergence of Random Reshuffling Under The Kurdyka-Łojasiewicz Inequality
We study the random reshuffling (RR) method for smooth nonconvex optimiz...

07/30/2019 · Universality of power-law exponents by means of maximum likelihood estimation
Power-law type distributions are extensively found when studying the beh...
