Deep Neural Network Learning with Second-Order Optimizers – a Practical Study with a Stochastic Quasi-Gauss–Newton Method

04/06/2020
by Christopher Thiele, et al.

Training in supervised deep learning is computationally demanding, and the convergence behavior is often not fully understood. We introduce and study a second-order stochastic quasi-Gauss–Newton (SQGN) optimization method that addresses both issues by combining ideas from stochastic quasi-Newton methods, Gauss–Newton methods, and variance reduction. SQGN achieves high accuracy without extensive hyper-parameter tuning, which is often computationally prohibitive given the number of configurations to test and the cost of each training run. We discuss the implementation of SQGN with TensorFlow, and we compare its convergence and computational performance to those of selected first-order methods on the MNIST benchmark and on a large-scale seismic tomography application from Earth science.
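To make the Gauss–Newton building block concrete, the sketch below shows a damped, mini-batch Gauss–Newton step for a linear least-squares problem in plain NumPy. This is a minimal illustration of one ingredient only, not the paper's SQGN method: the quasi-Newton curvature approximation and variance reduction that SQGN adds are omitted, and all names here (gauss_newton_step, damping, etc.) are hypothetical.

```python
# Illustrative sketch: a damped, mini-batch Gauss-Newton step for a
# least-squares model. NOT the paper's SQGN method; it omits the
# quasi-Newton and variance-reduction components described above.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear least-squares problem: residuals r(w) = X w - y.
n_samples, n_params = 1000, 5
X = rng.normal(size=(n_samples, n_params))
w_true = rng.normal(size=n_params)
y = X @ w_true + 0.01 * rng.normal(size=n_samples)

def gauss_newton_step(w, batch_idx, damping=1e-3):
    """One damped Gauss-Newton step on a mini-batch.

    For r(w) = X_b w - y_b the Jacobian is J = X_b, and the step solves
    (J^T J + damping * I) d = -J^T r.
    """
    Xb, yb = X[batch_idx], y[batch_idx]
    r = Xb @ w - yb                         # mini-batch residuals
    J = Xb                                  # Jacobian of the residuals
    H = J.T @ J + damping * np.eye(len(w))  # damped Gauss-Newton matrix
    d = np.linalg.solve(H, -J.T @ r)        # Newton-like search direction
    return w + d

w = np.zeros(n_params)
for step in range(50):
    batch = rng.choice(n_samples, size=64, replace=False)
    w = gauss_newton_step(w, batch)
print("parameter error:", np.linalg.norm(w - w_true))
```

The damping term plays the role of a trust-region-like regularizer: it keeps the linear system well conditioned when the mini-batch Jacobian is rank-deficient, which is the typical situation for small batches in deep learning.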

