Optimal learning rate schedules in high-dimensional non-convex optimization problems

02/09/2022
by Stéphane d'Ascoli, et al.

Learning rate schedules are ubiquitously used to speed up and improve optimization. Many different policies have been introduced on an empirical basis, and theoretical analyses have been developed for convex settings. However, in many realistic problems the loss landscape is high-dimensional and non-convex, a case for which results are scarce. In this paper we present a first analytical study of the role of learning rate scheduling in this setting, focusing on Langevin optimization with a learning rate decaying as η(t) = t^(-β). We begin by considering models where the loss is a Gaussian random function on the N-dimensional sphere (N → ∞), featuring an extensive number of critical points. We find that to speed up optimization without getting stuck in saddles, one must choose a decay rate β < 1, contrary to convex setups where β = 1 is generally optimal. We then add to the problem a signal to be recovered. In this setting, the dynamics decompose into two phases: an exploration phase where the dynamics navigate through rough parts of the landscape, followed by a convergence phase where the signal is detected and the dynamics enter a convex basin. In this case, it is optimal to keep a large learning rate during the exploration phase to escape the non-convex region as quickly as possible, then use the convex criterion β = 1 to converge rapidly to the solution. Finally, we demonstrate that our conclusions hold in a common regression task involving neural networks.
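The two-phase prescription described in the abstract (a large constant learning rate during the exploration phase, then a t^(-β) decay with β = 1 once the dynamics reach a convex basin) is easy to prototype. The sketch below is a minimal, illustrative NumPy implementation of Langevin dynamics with such a piecewise schedule; the toy double-well loss, the constants (eta0, t_explore, temperature), and the function names are assumptions chosen for demonstration, not the paper's actual spherical-model setup or experiments.

```python
import numpy as np

def langevin_step(x, grad, eta, temperature, rng):
    """One Langevin update: a gradient step plus Gaussian noise with variance 2 * eta * temperature."""
    noise = rng.standard_normal(x.shape)
    return x - eta * grad + np.sqrt(2.0 * eta * temperature) * noise

def lr_schedule(t, eta0=0.5, beta=1.0, t_explore=500):
    """Two-phase schedule (constants are illustrative): keep a large constant rate
    during the exploration phase, then decay as t^(-beta) with beta = 1 once the
    dynamics are assumed to have entered a convex basin."""
    if t < t_explore:
        return eta0
    return eta0 * (t - t_explore + 1) ** (-beta)

def loss_and_grad(x):
    """Hypothetical non-convex toy landscape: a sum of double-well potentials over the coordinates."""
    loss = np.sum(0.25 * x ** 4 - 0.5 * x ** 2)
    grad = x ** 3 - x
    return loss, grad

rng = np.random.default_rng(0)
N = 100                          # dimension of the toy problem
x = rng.standard_normal(N)       # random initialisation

for t in range(1, 2001):
    _, grad = loss_and_grad(x)
    x = langevin_step(x, grad, lr_schedule(t), temperature=1e-3, rng=rng)

print("final loss:", loss_and_grad(x)[0])
```

In a real run, the switch time t_explore would be triggered by detecting the signal (i.e. entering the convex basin) rather than fixed in advance; here it is hard-coded only to keep the sketch self-contained.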

Related research

05/15/2020 · Learning Rate Annealing Can Provably Help Generalization, Even for Convex Problems
Learning rate schedule can significantly affect generalization performan...

09/20/2019 · Learning an Adaptive Learning Rate Schedule
The learning rate is one of the most important hyper-parameters for mode...

02/15/2015 · Equilibrated adaptive learning rates for non-convex optimization
Parameter-specific adaptive learning rate methods are computationally ef...

03/27/2023 · Learning Rate Schedules in the Presence of Distribution Shift
We design learning rate schedules that minimize regret for SGD-based onl...

04/09/2023 · μ^2-SGD: Stable Stochastic Optimization via a Double Momentum Mechanism
We consider stochastic convex optimization problems where the objective ...

12/09/2021 · Extending AdamW by Leveraging Its Second Moment and Magnitude
Recent work [4] analyses the local convergence of Adam in a neighbourhoo...

11/23/2018 · A Sufficient Condition for Convergences of Adam and RMSProp
Adam and RMSProp, as two of the most influential adaptive stochastic alg...
