Using a one dimensional parabolic model of the full-batch loss to estimate learning rates during training

08/31/2021
by Maximus Mutschler, et al.

A fundamental challenge in Deep Learning is to find optimal step sizes for stochastic gradient descent. In traditional optimization, line searches are a commonly used method to determine step sizes. One problem in Deep Learning is that finding appropriate step sizes on the full-batch loss is prohibitively expensive. Therefore, classical line search approaches, designed for losses without inherent noise, are usually not applicable. Recent empirical findings suggest that the full-batch loss behaves locally parabolically along noisy update step directions, and that the optimal update step size changes only slowly during training. Exploiting these findings, this work introduces a line-search method that approximates the full-batch loss with a parabola estimated over several mini-batches. Learning rates are derived from such parabolas during training. In the experiments conducted, our approach mostly outperforms SGD tuned with a piece-wise constant learning rate schedule, as well as other line search approaches for Deep Learning, across models, datasets, and batch sizes on validation and test accuracy.
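To make the idea concrete, the following is a minimal, hypothetical PyTorch sketch, not the authors' implementation: the loss is measured at a few trial step sizes along the current update direction, averaged over several mini-batches, a parabola is fitted to these measurements, and the learning rate is taken as the step size at the parabola's vertex. All names (parabolic_step_size, probe_steps, fallback) are assumptions made for illustration.

```python
# Minimal sketch (assumed names, not the paper's code): approximate the
# full-batch loss along the update direction with a parabola estimated
# over several mini-batches, and step to the parabola's minimum.
import torch


def parabolic_step_size(model, loss_fn, batches, direction,
                        probe_steps=(0.0, 0.05, 0.1), fallback=0.1):
    """Fit l(s) ~ a*s^2 + b*s + c to losses measured at the trial step
    sizes `probe_steps` along `direction` (a list of tensors matching
    model.parameters()), averaged over the mini-batches in `batches`,
    and return the vertex -b / (2a) as the learning rate."""
    params = list(model.parameters())
    originals = [p.detach().clone() for p in params]

    losses = []
    for s in probe_steps:
        with torch.no_grad():
            # Move the parameters to theta + s * direction.
            for p, p0, d in zip(params, originals, direction):
                p.copy_(p0 + s * d)
            # Average the loss over several mini-batches to suppress noise.
            total = sum(loss_fn(model(x), y).item() for x, y in batches)
        losses.append(total / len(batches))

    with torch.no_grad():
        # Restore the original parameters.
        for p, p0 in zip(params, originals):
            p.copy_(p0)

    # Least-squares fit of the parabola coefficients (a, b, c).
    s = torch.tensor(probe_steps)
    l = torch.tensor(losses)
    A = torch.stack([s ** 2, s, torch.ones_like(s)], dim=1)
    a, b, _ = torch.linalg.lstsq(A, l.unsqueeze(1)).solution.squeeze(1)

    if a.item() <= 0.0:
        # The fit is not convex along this direction; keep a default step.
        return fallback
    return (-b / (2.0 * a)).item()
```

In practice the direction would typically be the negative mini-batch gradient (or a momentum direction), and the fitted vertex could additionally be clipped or damped before being used as the learning rate; this sketch only illustrates the parabolic approximation over several mini-batches described in the abstract.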
