On the efficiency of Stochastic Quasi-Newton Methods for Deep Learning

05/18/2022
by Mahsa Yousefi, et al.

While first-order methods are popular for solving the optimization problems that arise in large-scale deep learning, they come with some acute deficiencies. To diminish such shortcomings, there has been recent interest in applying second-order methods, such as quasi-Newton methods, which construct Hessian approximations using only gradient information. The main focus of our work is to study the behaviour of stochastic quasi-Newton algorithms for training deep neural networks. We analyze the performance of two well-known quasi-Newton updates, the limited-memory Broyden-Fletcher-Goldfarb-Shanno (BFGS) update and the Symmetric Rank One (SR1) update. This study fills a gap concerning the real performance of both updates and examines whether more efficient training is obtained with the more robust BFGS update or with the cheaper SR1 formula, which allows indefinite Hessian approximations and can therefore potentially help to better navigate the pathological saddle points present in the non-convex loss functions found in deep learning. We present and discuss the results of an extensive experimental study covering the effect of batch normalization and network architecture, the limited-memory parameter, the batch size, and the type of sampling strategy. We show that stochastic quasi-Newton optimizers are efficient and, in some instances, able to outperform the well-known first-order Adam optimizer run with the optimal combination of its numerous hyperparameters.
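
As a rough illustration of the two updates discussed above, the sketch below shows a generic L-BFGS two-loop recursion (the limited-memory BFGS inverse-Hessian-times-gradient product) and a dense limited-memory SR1 Hessian approximation built from stored curvature pairs s_i = x_{i+1} - x_i and y_i = g_{i+1} - g_i. This is a minimal sketch, not the authors' implementation: the function names, the initial scaling gamma, and the SR1 skipping tolerance are illustrative assumptions.

```python
import numpy as np

def lbfgs_two_loop(grad, s_list, y_list):
    """Two-loop recursion: returns H_k @ grad, where H_k is the limited-memory
    BFGS approximation of the inverse Hessian built from the stored curvature
    pairs (oldest first in s_list/y_list, at least one pair assumed).
    The search direction is the negative of the returned vector."""
    q = grad.copy()
    rhos = [1.0 / (y @ s) for s, y in zip(s_list, y_list)]
    alphas = []
    # First loop: newest pair to oldest pair.
    for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
        alpha = rho * (s @ q)
        alphas.append(alpha)
        q = q - alpha * y
    # Initial inverse-Hessian scaling gamma_k = (s_k^T y_k) / (y_k^T y_k).
    gamma = (s_list[-1] @ y_list[-1]) / (y_list[-1] @ y_list[-1])
    r = gamma * q
    # Second loop: oldest pair to newest pair.
    for s, y, rho, alpha in zip(s_list, y_list, rhos, reversed(alphas)):
        beta = rho * (y @ r)
        r = r + (alpha - beta) * s
    return r

def sr1_hessian(n, s_list, y_list, gamma=1.0, skip_tol=1e-8):
    """Dense SR1 Hessian approximation B (possibly indefinite), obtained by
    applying the rank-one update to B_0 = gamma * I for each curvature pair.
    For illustration only; practical limited-memory SR1 methods work with a
    compact matrix representation rather than forming B explicitly."""
    B = gamma * np.eye(n)
    for s, y in zip(s_list, y_list):
        v = y - B @ s
        denom = v @ s
        # Standard skipping rule: drop the update when the denominator is tiny.
        if abs(denom) > skip_tol * np.linalg.norm(v) * np.linalg.norm(s):
            B = B + np.outer(v, v) / denom
    return B
```

Because the SR1 matrix can be indefinite, it is usually embedded in a trust-region rather than a line-search framework, which is one reason it can exploit directions of negative curvature near saddle points.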


Related research

09/04/2019 · Quasi-Newton Optimization Methods For Deep Learning Applications
Deep learning algorithms often require solving a highly non-linear and n...

01/28/2019 · Quasi-Newton Methods for Deep Learning: Forget the Past, Just Sample
We present two sampled quasi-Newton methods for deep learning: sampled L...

06/05/2019 · Efficient Subsampled Gauss-Newton and Natural Gradient Methods for Training Neural Networks
We present practical Levenberg-Marquardt variants of Gauss-Newton and na...

10/11/2022 · Learning to Optimize Quasi-Newton Methods
We introduce a novel machine learning optimizer called LODO, which onlin...

02/15/2018 · A Progressive Batching L-BFGS Method for Machine Learning
The standard L-BFGS method relies on gradient approximations that are no...

07/25/2023 · mL-BFGS: A Momentum-based L-BFGS for Distributed Large-Scale Neural Network Optimization
Quasi-Newton methods still face significant challenges in training large...

05/19/2016 · A Multi-Batch L-BFGS Method for Machine Learning
The question of how to parallelize the stochastic gradient descent (SGD)...
