Resolving learning rates adaptively by locating Stochastic Non-Negative Associated Gradient Projection Points using line searches

01/15/2020
by Dominic Kafka, et al.

Learning rates in stochastic neural network training are currently determined a priori, using expensive manual or automated iterative tuning. This study proposes gradient-only line searches to resolve the learning rate for neural network training algorithms. Stochastic sub-sampling during training decreases computational cost and allows the optimization algorithms to progress past local minima. However, it also results in discontinuous cost functions. Minimization line searches are not effective in this context, as they rely on a vanishing derivative (a first-order optimality condition) that often does not exist in a discontinuous cost function; they therefore converge to discontinuities rather than to the minima implied by the trends in the data. Instead, we identify candidate solutions along a search direction purely from gradient information, specifically from a sign change in the directional derivative from negative to positive (a Non-Negative Associated Gradient Projection Point, NN-GPP). Because only a sign change from negative to positive is considered, an NN-GPP always indicates a minimum, and thus contains second-order information. Conversely, a vanishing gradient is a purely first-order condition, which may indicate a minimum, a maximum, or a saddle point. This insight allows the learning rate of an algorithm to be reliably resolved as the step size along a search direction, increasing convergence performance and eliminating an otherwise expensive hyperparameter.
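To make the mechanism concrete, the following is a minimal Python sketch of locating an NN-GPP along a search direction: bracket a sign change of the directional derivative by step doubling, then bisect on its sign. This is an illustrative assumption of how such a search could look, not the authors' GOLS implementation; the function gradient_only_line_search and its parameters are hypothetical, and grad stands in for a (possibly mini-batch, and hence stochastic) gradient oracle.

```python
import numpy as np

def gradient_only_line_search(grad, x, d, alpha0=1.0, max_doublings=10, tol=1e-6):
    """Locate a step size at which the directional derivative along d changes
    sign from negative to positive (an NN-GPP), using only gradient information.

    grad : callable returning the (possibly mini-batch) gradient at a point.
    x    : current iterate.
    d    : descent direction, so d . grad(x) < 0 at alpha = 0.
    """
    def dd(alpha):
        # Directional derivative d . grad(x + alpha * d) at a candidate step.
        return float(np.dot(d, grad(x + alpha * d)))

    lo, hi = 0.0, alpha0
    # Bracket: grow the step until the directional derivative turns non-negative.
    for _ in range(max_doublings):
        if dd(hi) >= 0.0:
            break
        lo, hi = hi, 2.0 * hi
    else:
        return hi  # no sign change found within the budget; accept the last step

    # Bisect on the *sign* of the directional derivative; no loss values are
    # ever evaluated, so jumps in the cost function cannot mislead the search.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if dd(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

A hypothetical usage on a smooth quadratic, where the located step recovers the exact line minimizer:

```python
# f(x) = 0.5 x^T A x, so grad(x) = A x.
A = np.diag([1.0, 10.0])
grad = lambda x: A @ x
x = np.array([1.0, 1.0])
d = -grad(x)                                   # steepest-descent direction
alpha = gradient_only_line_search(grad, x, d)  # resolved learning rate
x = x + alpha * d
```

Because only the sign of the directional derivative is inspected, the search targets the sign change implied by the gradient trend rather than a vanishing derivative, which is what makes this approach robust to the discontinuous cost functions produced by dynamic mini-batch sub-sampling.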


Related research

03/22/2019 - Gradient-only line searches: An Alternative to Probabilistic Line Searches
Step sizes in neural network training are largely determined using prede...

02/23/2020 - Investigating the interaction between gradient-only line searches and different activation functions
Gradient-only line searches (GOLS) adaptively determine step sizes along...

06/29/2020 - Gradient-only line searches to automatically determine learning rates for a variety of stochastic training algorithms
Gradient-only and probabilistic line searches have recently reintroduced...

05/23/2021 - GOALS: Gradient-Only Approximations for Line Searches Towards Robust and Consistent Training of Deep Neural Networks
Mini-batch sub-sampling (MBSS) is favored in deep neural network trainin...

09/11/2021 - Doubly Adaptive Scaled Algorithm for Machine Learning Using Second-Order Information
We present a novel adaptive optimization algorithm for large-scale machi...

03/29/2017 - Probabilistic Line Searches for Stochastic Optimization
In deterministic optimization, line searches are a standard tool ensurin...
