Linear Range in Gradient Descent

05/11/2019
by Angxiu Ni, et al.

This paper defines the linear range as the range of parameter perturbations that lead to approximately linear perturbations in the states of a network. We compute the linear range by comparing the actual perturbations in states with the tangent solution of the network. The linear range is a new criterion for gradients to be meaningful and therefore has many possible applications. In particular, we propose that the optimal learning rate at the beginning of training can be found automatically, by selecting a step size such that all minibatches are within the linear range. We demonstrate our algorithm on a network with a canonical architecture and on a ResNet.
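
The abstract describes the procedure only at a high level; the sketch below is one plausible reading, not the paper's actual algorithm. It assumes the parameters are a single flat array, uses JAX's jvp to obtain the tangent solution, and measures linearity with a relative-error threshold; the names linearity_error and auto_initial_lr and the values of tol, shrink, and the starting lr are illustrative choices, not taken from the paper.

```python
import jax
import jax.numpy as jnp

def linearity_error(state_fn, params, delta):
    """Relative mismatch between the actual perturbation in states and the
    tangent (first-order) prediction. `params` is assumed to be a flat array."""
    states, tangent = jax.jvp(state_fn, (params,), (delta,))
    actual = state_fn(params + delta) - states
    return jnp.linalg.norm(actual - tangent) / (jnp.linalg.norm(tangent) + 1e-12)

def auto_initial_lr(state_fn, grad_fn, params, minibatches,
                    lr=1.0, shrink=0.5, tol=0.1):
    """Shrink the step size until every minibatch stays within the linear range.
    `tol`, `shrink`, and the starting `lr` are illustrative, not from the paper."""
    while True:
        within_range = all(
            linearity_error(lambda p: state_fn(p, batch),
                            params, -lr * grad_fn(params, batch)) < tol
            for batch in minibatches
        )
        if within_range:
            return lr
        lr *= shrink
```

Given a state function such as lambda p, batch: apply_fn(p, batch) that returns the network states (e.g., output activations) and a gradient function for the loss, auto_initial_lr returns the largest tested step size for which every minibatch's actual perturbation agrees with its tangent prediction to within the chosen tolerance.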

Related Research

09/23/2020  Implicit Gradient Regularization
Gradient descent can be surprisingly good at optimizing deep neural netw...

04/20/2023  Angle based dynamic learning rate for gradient descent
In our work, we propose a novel yet simple approach to obtain an adaptiv...

09/27/2020  Faster Biological Gradient Descent Learning
Back-propagation is a popular machine learning algorithm that uses gradi...

05/17/2021  An SDE Framework for Adversarial Training, with Convergence and Robustness Analysis
Adversarial training has gained great popularity as one of the most effe...

10/15/2020  Neograd: gradient descent with an adaptive learning rate
Since its inception by Cauchy in 1847, the gradient descent algorithm ha...

02/11/2022  Improving Generalization via Uncertainty Driven Perturbations
Recently Shah et al., 2020 pointed out the pitfalls of the simplicity bi...

02/09/2016  Herding as a Learning System with Edge-of-Chaos Dynamics
Herding defines a deterministic dynamical system at the edge of chaos. I...
