Adaptive scaling of the learning rate by second order automatic differentiation

10/26/2022
by Frédéric De Gournay, et al.

In the context of optimizing Deep Neural Networks, we propose to rescale the learning rate using a new automatic-differentiation technique. The technique relies on computing the curvature, a piece of second-order information whose computational cost lies between that of the gradient and that of a Hessian-vector product. If (1C, 1M) denotes the computational time and memory footprint of the gradient method, the new technique increases the overall cost to either (1.5C, 2M) or (2C, 1M). The rescaling has the appealing property of a natural interpretation: it lets the practitioner choose between exploration of the parameter set and convergence of the algorithm. The rescaling is adaptive; it depends on the data and on the descent direction. Numerical experiments highlight the different exploration/convergence regimes.
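To make the idea concrete, here is a minimal sketch of curvature-based learning-rate rescaling in PyTorch. It is not the authors' algorithm: the directional curvature d^T H d is obtained with an ordinary Hessian-vector product via double backward (roughly the (2C, ...) baseline the abstract compares against, rather than the cheaper technique the paper introduces), and `curvature_scaled_step`, `model`, `loss_fn`, and `batch` are hypothetical placeholders.

```python
import torch

def curvature_scaled_step(model, loss_fn, batch, fallback_lr=1e-3, eps=1e-12):
    """One illustrative rescaled gradient step (sketch, not the paper's method)."""
    params = [p for p in model.parameters() if p.requires_grad]
    loss = loss_fn(model, batch)
    # Gradient g, keeping the graph so it can be differentiated again.
    grads = torch.autograd.grad(loss, params, create_graph=True)
    # Descent direction d = -g, detached so it is treated as a constant.
    d = [-g.detach() for g in grads]
    # Hessian-vector product H d, obtained by differentiating <g, d>.
    gd = sum((g * v).sum() for g, v in zip(grads, d))
    hv = torch.autograd.grad(gd, params)
    # Directional curvature c = d^T H d and slope s = g^T d.
    c = sum((h * v).sum() for h, v in zip(hv, d)).item()
    s = sum((g.detach() * v).sum() for g, v in zip(grads, d)).item()
    # Newton-like step length along d when the curvature is positive;
    # otherwise fall back to a fixed learning rate.
    lr = -s / max(c, eps) if c > 0 else fallback_lr
    with torch.no_grad():
        for p, v in zip(params, d):
            p.add_(v, alpha=lr)
    return loss.item(), lr
```

The sketch also shows the interpretation mentioned in the abstract: large curvature along d produces a small step (a convergence regime), while small curvature produces a large step (exploration).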


Related research

07/09/2022 · Improved Binary Forward Exploration: Learning Rate Scheduling Method for Stochastic Optimization
A new gradient-based optimization approach by automatically scheduling t...

12/20/2019 · Second-order Information in First-order Optimization Methods
In this paper, we try to uncover the second-order essence of several fir...

03/14/2017 · Online Learning Rate Adaptation with Hypergradient Descent
We introduce a general method for improving the convergence rate of grad...

05/08/2023 · ASDL: A Unified Interface for Gradient Preconditioning in PyTorch
Gradient preconditioning is a key technique to integrate the second-orde...

06/03/2023 · Correcting auto-differentiation in neural-ODE training
Does the use of auto-differentiation yield reasonable updates to deep ne...

09/11/2021 · Doubly Adaptive Scaled Algorithm for Machine Learning Using Second-Order Information
We present a novel adaptive optimization algorithm for large-scale machi...

02/10/2020 · Super-efficiency of automatic differentiation for functions defined as a minimum
In min-min optimization or max-min optimization, one has to compute the ...
