Neograd: gradient descent with an adaptive learning rate

10/15/2020
by Michael F. Zimmer, et al.

Since its inception by Cauchy in 1847, the gradient descent algorithm has lacked principled guidance on how to set the learning rate efficiently. This paper identifies a concept, defines metrics, and introduces algorithms to provide such guidance. The result is a family of algorithms (Neograd) based on a constant-ρ ansatz, where ρ is a metric quantifying the error of each update. This allows the learning rate to be adjusted at every step via a formulaic estimate based on ρ, so it is no longer necessary to perform trial runs beforehand to choose a single learning rate for an entire optimization. The additional cost of computing this metric is trivial. One member of this family, NeogradM, can quickly reach much lower cost function values than other first-order algorithms. Comparisons are made mainly between NeogradM and Adam, on an array of test functions and on a neural network model for identifying handwritten digits; the results show substantial performance improvements with NeogradM.
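To make the constant-ρ idea concrete, the following is a minimal Python sketch, not the paper's actual algorithm. It assumes ρ is measured as the relative deviation between the actual change in the cost and its first-order (linear) prediction, and that this deviation grows roughly linearly with the learning rate; the names (neograd_sketch, rho_target) are illustrative, not the paper's interface.

    import numpy as np

    def neograd_sketch(f, grad_f, theta, alpha=1e-3, rho_target=0.1, steps=100):
        """Illustrative constant-rho learning-rate controller.

        Assumption: rho is the relative error of the first-order
        prediction of the cost change; this is one plausible reading
        of the abstract, not the paper's exact formulas.
        """
        for _ in range(steps):
            g = grad_f(theta)
            f0 = f(theta)
            theta_next = theta - alpha * g
            actual = f(theta_next) - f0          # measured change in cost
            predicted = -alpha * np.dot(g, g)    # first-order (linear) prediction
            rho = abs(actual - predicted) / max(abs(predicted), 1e-12)
            # Constant-rho ansatz: rescale alpha so rho is driven toward
            # rho_target, assuming the leading error term scales ~ alpha.
            scale = rho_target / max(rho, 1e-12)
            alpha *= min(scale, 2.0)             # clamp growth to avoid wild jumps
            theta = theta_next
        return theta

    # Example: minimize a simple quadratic bowl
    f = lambda x: 0.5 * np.dot(x, x)
    grad_f = lambda x: x
    theta_min = neograd_sketch(f, grad_f, np.array([3.0, -4.0]))

For a quadratic cost, the deviation between actual and predicted change is proportional to alpha, so the multiplicative rescaling drives ρ to the target in a single step; the clamp guards against overshooting when ρ is measured as nearly zero.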

Related research

Gradient descent revisited via an adaptive online learning rate (01/27/2018)
Any gradient descent optimization requires choosing a learning rate. Wi...

Non-ergodic linear convergence property of the delayed gradient descent under the strong convexity and the Polyak-Łojasiewicz condition (08/23/2023)
In this work, we establish the linear convergence estimate for the gradi...

Differentiable Self-Adaptive Learning Rate (10/19/2022)
Learning rate adaptation is a popular topic in machine learning. Gradien...

Faster Biological Gradient Descent Learning (09/27/2020)
Back-propagation is a popular machine learning algorithm that uses gradi...

Unintended Effects on Adaptive Learning Rate for Training Neural Network with Output Scale Change (03/05/2021)
A multiplicative constant scaling factor is often applied to the model o...

Linear Range in Gradient Descent (05/11/2019)
This paper defines linear range as the range of parameter perturbations ...

Dynamic Batch Adaptation (08/01/2022)
Current deep learning adaptive optimizer methods adjust the step magnitu...
