Tight Convergence Rate Bounds for Optimization Under Power Law Spectral Conditions

02/02/2022
by Maksim Velikanov, et al.

Performance of optimization on quadratic problems depends sensitively on the low-lying part of the spectrum. For large (effectively infinite-dimensional) problems, this part of the spectrum can often be naturally represented or approximated by power law distributions. In this paper we perform a systematic study of a range of classical single-step and multi-step first-order optimization algorithms, with adaptive and non-adaptive, constant and non-constant learning rates: vanilla Gradient Descent, Steepest Descent, Heavy Ball, and Conjugate Gradients. For each of these, we prove that a power law spectral assumption entails a power law for the convergence rate of the algorithm, with the convergence rate exponent given by a specific multiple of the spectral exponent. We establish both upper and lower bounds, showing that the results are tight. Finally, we demonstrate applications of these results to kernel learning and training of neural networks in the NTK regime.
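The central claim — a power law spectrum entails a power law convergence rate — can be illustrated numerically. Below is a minimal sketch (not the paper's construction): a diagonal quadratic with eigenvalues λ_k = k^(−ν) and an initial error spread uniformly over eigendirections, optimized with plain gradient descent. The particular exponent ν = 1.5 and the uniform initialization are illustrative assumptions; the fitted slope of log-loss vs. log-iteration then exhibits the expected power law decay.

```python
import numpy as np

# Quadratic objective f(x) = 0.5 * x^T H x, written in the eigenbasis of H.
# Illustrative power law spectrum: lambda_k = k^{-nu} for k = 1..n.
n, nu = 10_000, 1.5
lam = np.arange(1, n + 1, dtype=float) ** -nu  # power law eigenvalues
x = np.ones(n)                                 # initial error vector

eta = 1.0 / lam.max()                          # stable GD step size
T = 2000
losses = []
for t in range(T):
    x = x - eta * lam * x                      # GD step in the eigenbasis
    losses.append(0.5 * np.sum(lam * x**2))

# Fit the empirical decay exponent zeta of L(t) ~ t^{-zeta} on the tail,
# where transients from the largest eigenvalues have died out.
ts = np.arange(1, T + 1)
slope = np.polyfit(np.log(ts[500:]), np.log(np.array(losses)[500:]), 1)[0]
print(f"empirical loss decay exponent: {slope:.2f}")
```

For this toy setup the surviving loss at step t comes from modes with λ_k · t ≲ 1, i.e. k ≳ t^(1/ν), giving L(t) ∝ t^(−(ν−1)/ν) ≈ t^(−1/3) for ν = 1.5 — a convergence exponent that is a fixed function of the spectral exponent, in line with the abstract's statement.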

