Learning Rate Schedules in the Presence of Distribution Shift

03/27/2023
by Matthew Fahrbach et al.

We design learning rate schedules that minimize regret for SGD-based online learning in the presence of a changing data distribution. We fully characterize the optimal learning rate schedule for online linear regression via a novel analysis with stochastic differential equations. For general convex loss functions, we propose new learning rate schedules that are robust to distribution shift, and we give upper and lower bounds on the regret that differ only by constant factors. For non-convex loss functions, we define a notion of regret based on the gradient norm of the estimated models and propose a learning rate schedule that minimizes an upper bound on the total expected regret. Intuitively, one expects changing loss landscapes to require more exploration, and we confirm that optimal learning rate schedules typically increase in the presence of distribution shift. Finally, we provide experiments for high-dimensional regression models and neural networks to illustrate these learning rate schedules and their cumulative regret.
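To make the setting concrete, here is a minimal sketch of online SGD for linear regression under distribution shift: the ground-truth weights drift a little each round, and we accumulate regret against the moving optimum. All specifics (dimension, drift rate, the constant learning rate `eta`) are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5        # dimension
T = 2000     # number of online rounds
drift = 0.01 # per-round movement of the true parameter (distribution shift)
eta = 0.05   # constant learning rate (hypothetical choice, not from the paper)

w_true = rng.normal(size=d)  # ground-truth regression weights
w = np.zeros(d)              # online SGD iterate
regret = 0.0                 # cumulative excess squared loss vs. the moving optimum

for t in range(T):
    # Distribution shift: the optimal weights drift each round.
    w_true += drift * rng.normal(size=d)

    x = rng.normal(size=d)
    y = x @ w_true + 0.1 * rng.normal()  # noisy label from the current distribution

    pred = x @ w
    grad = 2.0 * (pred - y) * x  # gradient of the squared loss at w
    w -= eta * grad              # SGD step with the chosen learning rate

    # Instantaneous regret: our loss minus the loss of the current optimum.
    regret += (pred - y) ** 2 - (x @ w_true - y) ** 2

print(f"cumulative regret after {T} rounds: {regret:.2f}")
```

With a fixed `eta`, the iterate lags behind the drifting optimum, so cumulative regret grows with the drift rate; this is the tension the paper's schedules address, with larger (or increasing) learning rates generally helping to track a shifting distribution.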
