Don't be so Monotone: Relaxing Stochastic Line Search in Over-Parameterized Models

06/22/2023
by Leonardo Galli, et al.

Recent works have shown that line search methods can speed up Stochastic Gradient Descent (SGD) and Adam in modern over-parameterized settings. However, existing line searches may take steps that are smaller than necessary, since they require a monotone decrease of the (mini-)batch objective function. We explore nonmonotone line search methods that relax this condition and can accept larger step sizes. Despite the lack of a monotone decrease, we prove the same fast rates of convergence as in the monotone case. Our experiments show that nonmonotone methods improve the speed of convergence and the generalization properties of SGD/Adam beyond even the previous monotone line searches. We propose a POlyak NOnmonotone Stochastic (PoNoS) method, obtained by combining a nonmonotone line search with a Polyak initial step size. Furthermore, we develop a new resetting technique that, in the majority of iterations, reduces the number of backtracks to zero while still maintaining a large initial step size. Finally, we conduct what is, to the best of our knowledge, the first runtime comparison for these methods, and show that the epoch-wise advantage of line-search-based methods is reflected in the overall computational time.
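To make the ideas in the abstract concrete, here is a minimal NumPy sketch of a PoNoS-style update: a Polyak initial step size followed by backtracking under a max-type nonmonotone Armijo condition. Everything below (function names, the max-type reference value, the constants, and the interpolation assumption f* ≈ 0) is an illustrative assumption based on the abstract, not the authors' implementation; in particular, the paper's resetting technique is omitted.

```python
import numpy as np

def ponos_step(w, loss_fn, grad_fn, history, f_star=0.0,
               c=0.5, beta=0.8, window=10, eta_max=10.0,
               max_backtracks=50):
    """One SGD step with a Polyak initial step size and a
    nonmonotone (max-type) Armijo backtracking line search.
    Appends the current mini-batch loss to `history`."""
    f_k = loss_fn(w)                  # mini-batch loss at w
    g_k = grad_fn(w)                  # mini-batch gradient at w
    g_norm2 = float(np.dot(g_k, g_k))

    # Polyak initial step size: (f(w) - f*) / (c * ||g||^2).
    # Under interpolation (over-parameterized models), f* ~ 0.
    eta = min(eta_max, (f_k - f_star) / (c * g_norm2 + 1e-12))

    # Nonmonotone reference value: the max loss over the last
    # `window` iterates, instead of demanding descent from f_k.
    history.append(f_k)
    ref = max(history[-window:])

    # Backtrack until the relaxed Armijo condition
    #   f(w - eta * g) <= ref - c * eta * ||g||^2
    # holds (with a safety cap on the number of backtracks).
    for _ in range(max_backtracks):
        if loss_fn(w - eta * g_k) <= ref - c * eta * g_norm2:
            break
        eta *= beta
    return w - eta * g_k, eta

# Illustrative usage on a tiny over-parameterized least-squares
# problem (5 samples, 32 parameters, so the interpolation
# assumption f* = 0 actually holds).
rng = np.random.default_rng(0)
X, y = rng.normal(size=(5, 32)), rng.normal(size=5)
loss = lambda w: 0.5 * np.mean((X @ w - y) ** 2)
grad = lambda w: X.T @ (X @ w - y) / len(y)

w, hist = np.zeros(32), []
for _ in range(200):
    w, eta = ponos_step(w, loss, grad, hist)
print(f"final loss: {loss(w):.2e}, last step size: {eta:.2f}")
```

The max-type reference above is only one common nonmonotone rule (in the spirit of Grippo, Lampariello, and Lucidi); an averaged reference in the style of Zhang and Hager would slot into the same place.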

Related research

05/24/2019 · Painless Stochastic Gradient: Interpolation, Line-Search, and Convergence Rates
Recent works have shown that stochastic gradient descent (SGD) achieves ...

08/31/2021 · Using a one dimensional parabolic model of the full-batch loss to estimate learning rates during training
A fundamental challenge in Deep Learning is to find optimal step sizes f...

10/02/2020 · A straightforward line search approach on the expected empirical loss for stochastic deep learning problems
A fundamental challenge in deep learning is that the optimal step sizes ...

05/30/2023 · BiSLS/SPS: Auto-tune Step Sizes for Stable Bi-level Optimization
The popularity of bi-level optimization (BO) in deep learning has spurre...

03/22/2019 · Gradient-only line searches: An Alternative to Probabilistic Line Searches
Step sizes in neural network training are largely determined using prede...

09/04/2021 · Nonmonotone Local Minimax Methods for Finding Multiple Saddle Points
In this paper, combining normalized nonmonotone search strategies with t...

07/26/2023 · Function Value Learning: Adaptive Learning Rates Based on the Polyak Stepsize and Function Splitting in ERM
Here we develop variants of SGD (stochastic gradient descent) with an ad...
