Provably Faster Gradient Descent via Long Steps

07/12/2023
by Benjamin Grimmer et al.
Johns Hopkins University

This work establishes provably faster convergence rates for gradient descent in smooth convex optimization via a computer-assisted analysis technique. Our theory allows nonconstant stepsize policies with frequent long steps that may violate descent, by analyzing the overall effect of many iterations at once rather than the typical one-iteration inductions used in most first-order method analyses. We show that long steps, which may increase the objective value in the short term, lead to provably faster convergence in the long term. A conjecture towards proving a faster O(1/(T log T)) rate for gradient descent is also motivated, along with simple numerical validation.
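The idea of nonconstant stepsizes with occasional long steps can be sketched numerically. The pattern below is illustrative only, not one of the certified stepsize policies from the paper's Table 1: it mixes moderate steps with one "long" step of 2.9/L, which exceeds the classical descent-guaranteeing range (0, 2/L), yet the cycle as a whole still contracts every eigenmode of a convex quadratic.

```python
import numpy as np

# Smooth convex test problem: f(x) = 0.5 * x^T A x, gradient A x,
# smoothness constant L = lambda_max(A).
rng = np.random.default_rng(0)
M = rng.standard_normal((20, 10))
A = M.T @ M
L = float(np.linalg.eigvalsh(A).max())

def f(x):
    return 0.5 * x @ A @ x

def grad(x):
    return A @ x

# Illustrative periodic stepsize pattern (NOT the paper's certified values):
# three moderate steps plus one long step violating the classical 2/L bound.
pattern = [1.5, 1.5, 1.5, 2.9]

def run(stepsizes, x0, T):
    """Gradient descent cycling through the given stepsize multiples of 1/L."""
    x = x0.copy()
    for t in range(T):
        x = x - (stepsizes[t % len(stepsizes)] / L) * grad(x)
    return x

x0 = np.ones(10)
T = 200
f_long = f(run(pattern, x0, T))
f_const = f(run([1.0], x0, T))  # classical constant-stepsize 1/L baseline
print(f"long steps: {f_long:.3e}   constant 1/L: {f_const:.3e}")
```

On this quadratic, each four-step cycle of the long-step schedule covers a total steplength of 7.4/L versus 4/L for the baseline, so the slow (small-eigenvalue) modes shrink faster per cycle even though the individual long steps can temporarily increase the objective.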


Related Research

11/28/2017  Accelerated Gradient Descent Escapes Saddle Points Faster than Gradient Descent
Nesterov's accelerated gradient descent (AGD), an instance of the genera...

04/21/2020  AdaX: Adaptive Gradient Descent with Exponential Long Term Memory
Although adaptive optimization algorithms such as Adam show fast converg...

05/15/2023  Convex optimization over a probability simplex
We propose a new iteration scheme, the Cauchy-Simplex, to optimize conve...

11/15/2016  The Power of Normalization: Faster Evasion of Saddle Points
A commonly used heuristic in non-convex optimization is Normalized Gradi...

11/30/2021  Survey Descent: A Multipoint Generalization of Gradient Descent for Nonsmooth Optimization
For strongly convex objectives that are smooth, the classical theory of ...

12/04/2019  Exponential convergence of Sobolev gradient descent for a class of nonlinear eigenproblems
We propose to use the Łojasiewicz inequality as a general tool for analy...

03/20/2022  Convergence rates of the stochastic alternating algorithm for bi-objective optimization
Stochastic alternating algorithms for bi-objective optimization are cons...

Code Repositories

LongStepCertificates

Certificates proving the convergence rates claimed in Table 1 of the (forthcoming) paper "Provably Faster Gradient Descent via Long Steps" by Benjamin Grimmer. The Mathematica notebooks include everything in rational form, with exact-arithmetic computations verifying all of the needed (spectral) properties of the certificates.
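The kind of exact-arithmetic verification the notebooks perform can be mimicked in Python: checking positive semidefiniteness of a symmetric rational matrix via fraction-exact symmetric Gaussian elimination, with no floating-point roundoff. This is a generic sketch of the technique, not code from the repository, and the matrices below are hypothetical stand-ins rather than actual certificates.

```python
from fractions import Fraction

def is_psd_exact(A):
    """Exact PSD test for a symmetric rational matrix.

    Performs symmetric Gaussian elimination (LDL^T-style) in exact
    rational arithmetic: the matrix is PSD iff every pivot is
    nonnegative and any zero pivot has an all-zero remaining row.
    """
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]  # exact copies
    for k in range(n):
        if A[k][k] < 0:
            return False
        if A[k][k] == 0:
            # Zero pivot: for PSD, its whole row/column must vanish.
            if any(A[k][j] != 0 for j in range(k + 1, n)):
                return False
            continue
        # Form the Schur complement in the trailing submatrix.
        for i in range(k + 1, n):
            factor = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= factor * A[k][j]
    return True

# Hypothetical examples (not certificates from the repository):
print(is_psd_exact([[2, 1], [1, 2]]))                      # PSD
print(is_psd_exact([[1, 2], [2, 1]]))                      # indefinite
print(is_psd_exact([[Fraction(1, 2), Fraction(1, 3)],
                    [Fraction(1, 3), Fraction(1, 2)]]))    # PSD
```

Because every operation stays in `Fraction`, the result is a proof-grade yes/no answer for rational input, mirroring the exact (rather than floating-point) spectral checks described above.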

