An Adaptive Memory Multi-Batch L-BFGS Algorithm for Neural Network Training

12/14/2020
by Federico Zocco, et al.

Motivated by the potential for parallel implementation of batch-based algorithms and by the accelerated convergence achievable with approximated second-order information, limited-memory versions of the BFGS algorithm (L-BFGS) have received increasing attention in recent years for large neural network training problems. Because the cost function is generally not quadratic, and only becomes approximately quadratic in the vicinity of a minimum, the second-order information used by L-BFGS can be unreliable during the initial phase of training, i.e., when far from a minimum. To control the influence of second-order information as training progresses, we therefore propose a multi-batch L-BFGS algorithm, MB-AM, that gradually increases its trust in the curvature information through the progressive storage and use of curvature data via a development-based increase (dev-increase) scheme. On six discriminative modelling benchmark problems, we show empirically that MB-AM converges slightly faster and, on average, reaches better solutions than the standard multi-batch L-BFGS algorithm when training MLP and CNN models.
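
The abstract describes the dev-increase mechanism only at a high level, so the sketch below is an illustration rather than the authors' implementation. It pairs the standard L-BFGS two-loop recursion with a hypothetical adaptive-memory rule: the cap on stored curvature pairs starts at zero (plain gradient steps) and grows by one whenever the epoch loss improves by a chosen relative margin, mirroring the idea of trusting curvature more as training nears a minimum. The names dev_rate and max_memory, the relative-improvement trigger, and the fixed step size (standing in for a line search) are all assumptions not taken from the paper.

```python
import numpy as np
from collections import deque

def two_loop_direction(grad, pairs):
    # Standard L-BFGS two-loop recursion over stored (s, y) pairs.
    # With an empty memory this reduces to steepest descent, -grad.
    q = grad.copy()
    rhos = [1.0 / (y @ s) for s, y in pairs]
    alphas = []
    for (s, y), rho in zip(reversed(pairs), reversed(rhos)):
        a = rho * (s @ q)
        alphas.append(a)
        q -= a * y
    if pairs:
        s, y = pairs[-1]
        q *= (s @ y) / (y @ y)  # common initial Hessian scaling
    for (s, y), rho, a in zip(pairs, rhos, reversed(alphas)):
        q += (a - rho * (y @ q)) * s
    return -q

def train(w, batches, loss_fn, grad_fn, epochs=30, lr=0.5,
          max_memory=10, dev_rate=0.05):
    mem = deque()   # curvature pairs (s, y)
    mem_cap = 0     # adaptive memory: start by trusting no curvature
    prev_loss = np.inf
    for _ in range(epochs):
        for batch in batches:
            g = grad_fn(w, batch)
            w_new = w + lr * two_loop_direction(g, list(mem))
            # Gradients at w and w_new use the SAME batch so the pair
            # (s, y) is consistent, as in multi-batch L-BFGS.
            s, y = w_new - w, grad_fn(w_new, batch) - g
            if s @ y > 1e-10:           # keep only positive-curvature pairs
                mem.append((s, y))
            while len(mem) > mem_cap:   # honour the current memory cap
                mem.popleft()
            w = w_new
        # Hypothetical dev-increase rule (a stand-in for the paper's scheme):
        # raise the cap by one whenever the epoch loss improved by at least
        # dev_rate relative to the previous epoch.
        cur_loss = float(np.mean([loss_fn(w, b) for b in batches]))
        if cur_loss < (1.0 - dev_rate) * prev_loss:
            mem_cap = min(mem_cap + 1, max_memory)
        prev_loss = cur_loss
    return w
```

A minimal usage example on mini-batch linear regression (a toy stand-in for the MLP/CNN problems in the paper):

```python
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8))
w_true = rng.normal(size=8)
t = X @ w_true + 0.01 * rng.normal(size=256)
batches = [(X[i:i + 64], t[i:i + 64]) for i in range(0, 256, 64)]
loss = lambda w, b: 0.5 * np.mean((b[0] @ w - b[1]) ** 2)
grad = lambda w, b: b[0].T @ (b[0] @ w - b[1]) / len(b[1])
w_fit = train(np.zeros(8), batches, loss, grad)
print(np.linalg.norm(w_fit - w_true))  # should be close to zero
```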
