Empirical study towards understanding line search approximations for training neural networks

09/15/2019
by   Younghwan Chae, et al.

Choosing appropriate step sizes is critical for reducing the computational cost of training large-scale neural network models. Mini-batch sub-sampling (MBSS) is often employed for computational tractability. However, MBSS introduces a sampling error that can manifest as bias or variance in a line search, depending on how the sub-sampling is performed: statically, where the mini-batch is updated only when the search direction changes, or dynamically, where the mini-batch is updated every time the function is evaluated. Static MBSS results in a smooth loss function along a search direction, with low variance but large bias in the estimated "true" (full-batch) minimum. Conversely, dynamic MBSS results in a point-wise discontinuous function along a search direction, whose gradients remain computable with backpropagation, with high variance but lower bias in the estimated "true" (full-batch) minimum. In this study, quadratic line search approximations are used to examine the quality of the function and derivative information available for constructing approximations of dynamic MBSS loss functions. An empirical study is conducted in which function and derivative information are enforced in various combinations in the quadratic approximations. The results for various neural network problems show that being selective about which information is enforced helps to reduce the variance of the predicted step sizes.
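
Since the study revolves around fitting quadratic approximations to a noisy loss along a search direction, a minimal sketch may help fix ideas. The following is an illustrative NumPy example, not the authors' implementation: it assumes a gradient-only quadratic fit that enforces the directional derivative at the origin and at one trial step (one of several ways such information could be enforced), and the names quadratic_step and noisy_df are hypothetical.

    import numpy as np

    def quadratic_step(df0, df1, alpha1):
        """Gradient-only quadratic fit along a search direction.

        Fits q(a) = c0 + c1*a + c2*a**2 by enforcing the directional
        derivatives q'(0) = df0 and q'(alpha1) = df1, then returns the
        step size that minimizes q.
        """
        c1 = df0                            # q'(0) = c1
        c2 = (df1 - df0) / (2.0 * alpha1)   # q'(alpha1) = c1 + 2*c2*alpha1
        if c2 <= 0.0:                       # concave or linear fit: no interior minimum
            return alpha1                   # fall back to the trial step
        return -c1 / (2.0 * c2)             # root of q'(a) = 0

    # Illustrative use with noisy, dynamic-MBSS-like directional derivatives
    # of the toy loss f(a) = (a - 0.8)**2, whose true minimizer is a = 0.8.
    rng = np.random.default_rng(0)
    noisy_df = lambda a: 2.0 * (a - 0.8) + 0.05 * rng.standard_normal()

    alpha1 = 1.0
    step = quadratic_step(noisy_df(0.0), noisy_df(alpha1), alpha1)
    print(f"predicted step size: {step:.3f}")  # close to 0.8 despite the noise

Because only derivative information is enforced, point-wise discontinuities in the dynamic MBSS function values do not enter the fit; the sampling noise affects the prediction only through the two directional derivatives.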

Related research

05/23/2021
GOALS: Gradient-Only Approximations for Line Searches Towards Robust and Consistent Training of Deep Neural Networks
Mini-batch sub-sampling (MBSS) is favored in deep neural network trainin...

08/31/2021
Using a one dimensional parabolic model of the full-batch loss to estimate learning rates during training
A fundamental challenge in Deep Learning is to find optimal step sizes f...

02/23/2020
Investigating the interaction between gradient-only line searches and different activation functions
Gradient-only line searches (GOLS) adaptively determine step sizes along...

04/01/2022
Estimating the Jacobian matrix of an unknown multivariate function from sample values by means of a neural network
We describe, implement and test a novel method for training neural netwo...

03/22/2019
Gradient-only line searches: An Alternative to Probabilistic Line Searches
Step sizes in neural network training are largely determined using prede...

05/05/2020
Dynamically Adjusting Transformer Batch Size by Monitoring Gradient Direction Change
The choice of hyper-parameters affects the performance of neural models...
