BiSLS/SPS: Auto-tune Step Sizes for Stable Bi-level Optimization

05/30/2023
by Chen Fan, et al.

The popularity of bi-level optimization (BO) in deep learning has spurred growing interest in gradient-based BO algorithms. However, existing algorithms involve two coupled learning rates that can be affected by approximation errors when computing hypergradients, so careful fine-tuning is needed to ensure fast convergence. To alleviate this issue, we investigate the use of recently proposed adaptive step-size methods, namely stochastic line search (SLS) and the stochastic Polyak step size (SPS), for setting both the upper- and lower-level learning rates. First, we revisit the use of SLS and SPS in single-level optimization without the additional interpolation condition that is typically assumed in prior work. For such settings, we investigate new variants of SLS and SPS that improve upon existing suggestions in the literature and are simpler to implement. Importantly, these two variants can be seen as special instances of a general family of methods with an envelope-type step size. This unified envelope strategy allows us to extend the algorithms and their convergence guarantees to BO settings. Finally, our extensive experiments demonstrate that the new algorithms, available in both SGD and Adam versions, can find large learning rates with minimal tuning and converge faster than the corresponding vanilla SGD or Adam BO algorithms, which require fine-tuning.
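As a rough illustration of the kind of step-size rule the abstract refers to, the sketch below shows a capped stochastic Polyak step size used inside a plain single-level SGD loop. This is not the paper's BiSLS/SPS algorithm (which applies such step sizes to both levels of a bi-level problem); the names `sps_step_size`, `sgd_step_with_sps`, `model`, `loss_fn`, and the zero lower-bound assumption on the per-sample loss are illustrative choices, not taken from the paper's code.

```python
import torch

def sps_step_size(loss_value, grads, c=0.5, eta_max=1.0, eps=1e-8):
    # Capped stochastic Polyak step size:
    #   eta_t = min( f_i(w_t) / (c * ||g_i(w_t)||^2), eta_max )
    # Assumes the per-sample optimum f_i^* is 0 (an illustrative simplification);
    # eta_max acts as an upper envelope that keeps the step size bounded.
    grad_norm_sq = sum(g.pow(2).sum().item() for g in grads) + eps
    return min(loss_value.item() / (c * grad_norm_sq), eta_max)

def sgd_step_with_sps(model, loss_fn, batch):
    # One SGD update on a mini-batch using the step-size rule above.
    x, y = batch
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, list(model.parameters()))
    eta = sps_step_size(loss, grads)
    with torch.no_grad():
        for p, g in zip(model.parameters(), grads):
            p.sub_(eta * g)
    return loss.item(), eta
```

The cap `eta_max` plays the role of the envelope mentioned in the abstract: the Polyak proposal adapts the step size to the current loss and gradient, while the envelope prevents it from becoming arbitrarily large when gradients are small.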

Related research

Adaptive Step-Size Methods for Compressed SGD (07/20/2022)
Compressed Stochastic Gradient Descent (SGD) algorithms have been recent...

Adaptive Gradient Methods Converge Faster with Over-Parameterization (and you can do a line-search) (06/11/2020)
As adaptive gradient methods are typically used for training over-parame...

Cutting Some Slack for SGD with Adaptive Polyak Stepsizes (02/24/2022)
Tuning the step size of stochastic gradient descent is tedious and error...

Faster Stochastic Algorithms via History-Gradient Aided Batch Size Adaptation (10/21/2019)
Various schemes for adapting batch size have been recently proposed to a...

Don't be so Monotone: Relaxing Stochastic Line Search in Over-Parameterized Models (06/22/2023)
Recent works have shown that line search methods can speed up Stochastic...

AEGD: Adaptive Gradient Descent with Energy (10/10/2020)
In this paper, we propose AEGD, a new algorithm for first-order gradient...

Adaptive SGD with Polyak stepsize and Line-search: Robust Convergence and Variance Reduction (08/11/2023)
The recently proposed stochastic Polyak stepsize (SPS) and stochastic li...
