DoWG Unleashed: An Efficient Universal Parameter-Free Gradient Descent Method

05/25/2023
by Ahmed Khaled, et al.

This paper proposes a new, easy-to-implement, parameter-free gradient-based optimizer: DoWG (Distance over Weighted Gradients). We prove that DoWG is efficient – matching the convergence rate of optimally tuned gradient descent in convex optimization up to a logarithmic factor without tuning any parameters – and universal – automatically adapting to both smooth and nonsmooth problems. While popular algorithms such as AdaGrad, Adam, or DoG compute a running average of the squared gradients, DoWG maintains a new distance-based weighted version of this running average, which is crucial to achieving the desired properties. To the best of our knowledge, DoWG is the first parameter-free, efficient, and universal algorithm that does not require a backtracking search procedure. It is also the first parameter-free AdaGrad-style algorithm that adapts to smooth optimization. To complement our theory, we also show empirically that DoWG trains at the edge of stability, and we validate its effectiveness on practical machine learning tasks. This paper further uncovers the underlying principle behind the success of the AdaGrad family of algorithms by presenting a novel analysis of Normalized Gradient Descent (NGD), which shows that NGD adapts to smoothness when it exists, with no change to the stepsize. This establishes the universality of NGD and partially explains the empirical observation that it trains at the edge of stability in a much more general setup than standard gradient descent. The latter result might be of independent interest to the community.
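For intuition, the sketch below illustrates a distance-over-weighted-gradients update of the kind the abstract describes: each squared gradient norm is weighted by the current squared distance estimate before it enters the running sum that normalizes the step. The exact recursion, the initialization of the distance estimate r, and the stabilizing constant eps are assumptions made here for illustration, not pseudocode taken from the paper.

```python
import numpy as np

def dowg_sketch(grad, x0, steps=1000, eps=1e-8):
    """Minimal sketch of a DoWG-style parameter-free update (illustrative only).

    The abstract says DoWG keeps a distance-based *weighted* running average of
    squared gradients; the concrete recursion below is an assumed reconstruction.
    """
    x0 = np.asarray(x0, dtype=float)
    x = x0.copy()
    r = eps   # running estimate of the distance ||x_t - x_0|| (assumed small seed)
    v = 0.0   # distance-weighted sum of squared gradient norms
    for _ in range(steps):
        g = grad(x)
        r = max(r, float(np.linalg.norm(x - x0)))  # distance estimate never shrinks
        v += r**2 * float(np.dot(g, g))            # weight squared gradients by r_t^2
        eta = r**2 / (np.sqrt(v) + eps)            # step size with no tuned parameters
        x = x - eta * g
    return x

# Example: minimize f(x) = 0.5 * ||x||^2 (gradient is x) with no stepsize tuning.
if __name__ == "__main__":
    x_final = dowg_sketch(lambda x: x, x0=np.ones(5))
    print(x_final)  # should be close to the minimizer at the origin
```

The point of the weighting is the contrast drawn in the abstract: a DoG- or AdaGrad-style denominator accumulates raw squared gradients, whereas here each term is scaled by the squared distance estimate before accumulation.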


Related research

09/12/2023 · ELRA: Exponential learning rate adaption gradient descent optimization method
We present a novel, fast (exponential rate adaption), ab initio (hyper-p...

02/06/2023 · Optimization using Parallel Gradient Evaluations on Multiple Parameters
We propose a first-order method for convex optimization, where instead o...

05/31/2023 · Parameter-free projected gradient descent
We consider the problem of minimizing a convex function over a closed co...

06/11/2019 · Power Gradient Descent
The development of machine learning is promoting the search for fast and...

01/09/2019 · The Lingering of Gradients: How to Reuse Gradients over Time
Classically, the time complexity of a first-order method is estimated by...

02/12/2021 · MetaGrad: Adaptation using Multiple Learning Rates in Online Learning
We provide a new adaptive method for online convex optimization, MetaGra...

11/08/2021 · Gradient-Descent for Randomized Controllers under Partial Observability
Randomization is a powerful technique to create robust controllers, in p...
