QLAB: Quadratic Loss Approximation-Based Optimal Learning Rate for Deep Learning

02/01/2023
by Minghan Fu, et al.

We propose a learning rate adaptation scheme, called QLAB, for descent optimizers. We derive QLAB by optimizing a quadratic approximation of the loss function, and QLAB can be combined with any optimizer that provides a descent update direction. Computing an adaptive learning rate with QLAB requires only one extra evaluation of the loss function. We theoretically prove the convergence of descent optimizers equipped with QLAB. We demonstrate the effectiveness of QLAB on a range of optimization problems by combining it with stochastic gradient descent, stochastic gradient descent with momentum, and Adam. The performance is validated on multi-layer neural networks, CNN, VGG-Net, ResNet, and ShuffleNet with two datasets, MNIST and CIFAR10.
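The core idea of fitting a quadratic along the descent direction can be sketched as follows. This is a minimal illustration based only on the abstract, not the paper's exact algorithm: given the current loss, the directional derivative, and one extra loss evaluation at a trial step, we fit q(eta) = L0 + g*eta + a*eta^2 and take its minimizer as the learning rate. The function name `qlab_step_size` and the fallback behavior are assumptions for illustration.

```python
import numpy as np

def qlab_step_size(loss_fn, w, d, grad, trial=0.1):
    """Estimate the step size minimizing a quadratic fit of the loss
    along the descent direction d (a sketch of the idea; the paper's
    exact scheme may differ)."""
    l0 = loss_fn(w)
    g = float(np.dot(grad, d))          # directional derivative at eta = 0
    l_trial = loss_fn(w + trial * d)    # the single extra loss evaluation
    # Fit q(eta) = l0 + g*eta + a*eta^2 through the trial point.
    a = (l_trial - l0 - g * trial) / trial ** 2
    if a <= 0:                          # no positive curvature: keep trial step
        return trial
    return -g / (2.0 * a)               # minimizer of the quadratic

# Toy check: for L(w) = 0.5 ||w||^2 with d = -grad, the loss along eta
# is exactly quadratic and the optimal step is eta = 1.
loss = lambda v: 0.5 * float(np.dot(v, v))
w = np.array([3.0, -4.0])
grad = w
d = -grad
eta = qlab_step_size(loss, w, d, grad)
print(eta)  # 1.0 for this exactly quadratic loss
```

For a loss that is exactly quadratic along the search direction, the fitted step is the true minimizer; for general deep-learning losses it is only an approximation, which is why a convergence analysis is needed.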


