1 Introduction
Deep neural networks have shown great success in computer vision (He et al., 2016a) and natural language processing tasks (Hochreiter and Schmidhuber, 1997). These models are typically trained using first-order optimization methods such as stochastic gradient descent (SGD) and its variants. Vanilla SGD does not incorporate any curvature information about the objective function, which results in slow convergence in certain cases. Momentum (Qian, 1999; Nesterov, 2013; Sutskever et al., 2013) or adaptive gradient-based methods (Duchi et al., 2011; Kingma and Ba, 2014) are sometimes used to rectify these issues. These adaptive methods can be seen as implicitly computing finite-difference approximations to the diagonal entries of the Hessian matrix (LeCun et al., 1998). A drawback of first-order methods in general, including adaptive ones, is that they perform best with small minibatches (Dean et al., 2012; Zhang et al., 2015; Das et al., 2016; Recht et al., 2011; Chen et al., 2016). This limits available parallelism and makes distributed training difficult. Moreover, in the distributed setting the gradients must be accumulated after each training update, and the communication among workers may become a major bottleneck. The optimization methods that scale best to the distributed setting are those that can reach convergence with few parameter updates. The weakness of first-order methods on this metric extends even to the convex case, where it was shown to result from correlation between the gradients at different data points in a minibatch, leading to "overshooting" in the direction of the correlation (Takác et al., 2013).
In the case of deep neural networks, large minibatch sizes lead to substantially increased generalization error (Keskar et al., 2016; Dinh et al., 2017). Although Goyal et al. (2016) recently trained deep ResNets on the ImageNet dataset in one hour with minibatches as large as 8192, using momentum SGD with carefully designed hyperparameters, they also showed severe performance decay for even larger minibatch sizes, which corroborates the difficulty of training with large minibatches. These difficulties motivate us to revisit second-order optimization methods, which use the Hessian or other curvature matrices to rectify the gradient direction. Second-order methods employ more information about the local structure of the loss function, as they approximate it quadratically rather than linearly, and they scale better to large minibatch sizes.
However, finding the exact minimum of a quadratic approximation to a loss function is infeasible in most deep neural networks because it involves inverting an n-by-n curvature matrix for a parameter count of n. The Hessian-free (HF) methods (Martens, 2010; Martens and Sutskever, 2011; Byrd et al., 2011, 2012) instead minimize the quadratic function that locally approximates the loss using the conjugate gradient (CG) method. This involves evaluating a sequence of curvature-vector products rather than explicitly inverting, or even computing, the curvature matrix or Hessian. The Hessian-vector product can be calculated efficiently using one forward pass and one backward pass (Pearlmutter, 1994), while other curvature-vector products have similarly efficient algorithms (Schraudolph, 2002; Martens and Sutskever, 2012). Normally, the HF method requires many hundreds of CG iterations for one update, which makes even a single optimization step fairly computationally expensive. Thus, when comparing HF to first-order methods, the benefit of fewer iterations from incorporating curvature information often does not compensate for the added computational burden.
We propose using a block-diagonal approximation to the curvature matrix to improve Hessian-free convergence properties, inspired by several results that link these two concepts for other optimization methods. Collobert (2004) argues that when training a multilayer perceptron (MLP) with one hidden layer, gradient descent converges faster with the cross-entropy loss than with mean squared error because the Hessian of the former is more nearly block-diagonal. A block-diagonal approximation of the Fisher information matrix, one kind of curvature matrix, has also been shown to improve the performance of the online natural gradient method (Le Roux et al., 2008) for training a one-layer MLP. The advantage of a block-diagonal Hessian-free method is that updates to certain subsets of the parameters are independent of the gradients for other subsets. This makes the subproblem separable and reduces the complexity of the local search space (Collobert, 2004). We hypothesize that the block-diagonal approximation of the curvature matrix may make the Hessian-free method more robust to the noise that results from using a relatively small minibatch for curvature estimation.
In the cases of Collobert (2004) and Le Roux et al. (2008), the parameter blocks for which the Hessian or Fisher matrix is treated as block-diagonal consist of all weights and biases involved in computing the activation of a single neuron in the hidden and output layers. This amounts to the claim that gradient interactions among weights that affect a single output neuron are stronger than those between weights that affect two different neurons.
In order to strike a balance between the curvature information provided by additional Hessian terms and the potential benefits of a more nearly block-diagonal curvature matrix, and to adapt the concept to more complex contemporary neural network models, we instead treat each layer or submodule of a deep neural network as a parameter block. Thus, unlike Collobert (2004) and Le Roux et al. (2008), our hypothesis becomes that gradient interactions among weights in a single layer are more useful for training than those between weights in different layers.
We now introduce our block-diagonal Hessian-free method in detail, then test this hypothesis by comparing the performance of our method on a deep autoencoder, a deep convolutional network, and a multilayer LSTM against the original Hessian-free method (Martens, 2016) and the Adam method (Kingma and Ba, 2014).
2 The Block-Diagonal Hessian-Free Method
In this section, we describe the block-diagonal HF method in detail and compare it with the original HF method (Martens, 2010; Martens and Sutskever, 2011).
Throughout the paper, we use boldface lowercase letters to denote column vectors, boldface capital letters to denote matrices or tensors, and the superscript ⊤ to denote the transpose. We denote an input sample and its label as (x, y), the output of the network as z = F(θ; x), and the loss as f(θ) = ℓ(F(θ; x), y), where θ refers to the network parameters flattened to a single vector.

2.1 The Block-Diagonal Hessian-Free Method
We first recall how second-order optimization works. For each parameter update, a second-order method finds the step Δθ that minimizes a local quadratic approximation of the objective function f at the current point θ:

\[
f(\theta + \Delta\theta) \;\approx\; f(\theta) + \nabla f(\theta)^\top \Delta\theta + \tfrac{1}{2}\, \Delta\theta^\top G\, \Delta\theta
\tag{1}
\]
where G is some curvature matrix of f at θ, such as the Hessian matrix or the generalized Gauss-Newton matrix (Martens and Sutskever, 2012). The resulting subproblem

\[
\min_{\Delta\theta} \;\; \nabla f(\theta)^\top \Delta\theta + \tfrac{1}{2}\, \Delta\theta^\top G\, \Delta\theta
\tag{2}
\]

is solved using conjugate gradient (CG), a procedure that only requires evaluating a series of matrix-vector products Gv.
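To illustrate why only matrix-vector products are needed, here is a minimal CG loop in plain Python; this is an illustrative sketch, not the paper's implementation, and the names `matvec`, `b`, and the toy 2-by-2 curvature matrix are hypothetical.

```python
# Minimal conjugate gradient that touches the curvature matrix only through
# a matrix-vector product callback, as in Hessian-free optimization.

def conjugate_gradient(matvec, b, max_iters=50, tol=1e-10):
    """Approximately solve G x = b given only the map x -> G x."""
    n = len(b)
    x = [0.0] * n                      # start from the zero vector
    r = list(b)                        # residual r = b - G x = b at start
    p = list(r)                        # initial search direction
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iters):
        gp = matvec(p)
        alpha = rs_old / sum(pi * gpi for pi, gpi in zip(p, gp))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * gpi for ri, gpi in zip(r, gp)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new < tol:
            break
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x

# Usage: solve G x = b with a 2x2 SPD "curvature" G = [[4, 1], [1, 3]].
G = [[4.0, 1.0], [1.0, 3.0]]
matvec = lambda v: [sum(G[i][j] * v[j] for j in range(2)) for i in range(2)]
x = conjugate_gradient(matvec, [1.0, 2.0])
```

Note that the loop never indexes into G directly; in the HF setting, `matvec` would be the efficient curvature-vector product routine described below, and G would never be formed.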
There exist efficient algorithms for computing these matrix-vector products given a computation-graph representation of the loss function. If the curvature matrix is the Hessian matrix, (1) is the second-order Taylor expansion, and the Hessian-vector product can be computed as the gradient of the directional derivative of the loss function in the direction of v, operations also known as the L- and R-operators, L(·) and R_v(·) respectively:

\[
H v \;=\; \mathcal{L}\big(\mathcal{R}_v\big(f(\theta)\big)\big)
\tag{3}
\]
The R-operator can be implemented as a single forward traversal of the computation graph (applying forward-mode automatic differentiation), while the L-operator requires a backward traversal (reverse-mode automatic differentiation) (Pearlmutter, 1994; Baydin et al., 2015). The Hessian-vector product can also be computed as the gradient of the dot product of a vector and the gradient; that method does not require the R-operator but has twice the computational cost.
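The quantity computed by (3) can be checked numerically: the Hessian-vector product is the derivative of the gradient along v. The sketch below (a hypothetical quadratic objective, not the paper's code) recovers Hv = Av by a central finite difference of gradients, which is what the R-operator computes exactly rather than approximately.

```python
# Numerical illustration of H v = d/dr grad f(theta + r v) |_{r=0}.
# For the quadratic f(theta) = 0.5 * theta^T A theta, the gradient is
# A theta and the Hessian is A, so differencing gradients along v
# should recover A v.

def grad(theta):
    # gradient of f(theta) = 0.5 * theta^T A theta with A = [[2, 1], [1, 4]]
    A = [[2.0, 1.0], [1.0, 4.0]]
    return [sum(A[i][j] * theta[j] for j in range(2)) for i in range(2)]

def hessian_vector_product(grad_fn, theta, v, eps=1e-5):
    """Central finite difference of the gradient in the direction v."""
    gp = grad_fn([t + eps * vi for t, vi in zip(theta, v)])
    gm = grad_fn([t - eps * vi for t, vi in zip(theta, v)])
    return [(a - b) / (2 * eps) for a, b in zip(gp, gm)]

theta, v = [1.0, -1.0], [1.0, 2.0]
hv = hessian_vector_product(grad, theta, v)   # should approximate A v = [4, 9]
```

In a real implementation one would use the exact R-operator (one forward traversal) instead of finite differences, which are used here only to make the identity concrete.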
However, the objective of deep neural networks is nonconvex, and the Hessian matrix may have a mixture of positive and negative eigenvalues, which makes the optimization problem (2) unstable. It is common to use the generalized Gauss-Newton matrix (Schraudolph, 2002) as a substitute curvature matrix, as it is always positive semidefinite when the objective function can be expressed as a composition f = ℓ ∘ F of two functions with ℓ convex, a property satisfied by most training objectives. For a curvature minibatch of data S, the generalized Gauss-Newton matrix is defined as

\[
G \;=\; \frac{1}{|S|} \sum_{(x, y) \in S} J^\top H_\ell\, J
\tag{4}
\]
where J is the Jacobian matrix of derivatives of the network outputs z = F(θ; x) with respect to the parameters θ, and H_ℓ is the Hessian matrix of the objective ℓ with respect to the network outputs z. It is an approximation to the Hessian that results from dropping terms that involve second derivatives of F (Martens and Sutskever, 2012).
The Gauss-Newton vector product can also be evaluated as

\[
G v \;=\; J^\top \big( H_\ell\, (J v) \big)
\tag{5}
\]
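For intuition, the product in (5) amounts to three small multiplications and never forms G itself; the shapes below (a 2-output, 3-parameter toy Jacobian and a scaled-identity output Hessian) are hypothetical illustrations.

```python
# Sketch of the Gauss-Newton vector product G v = J^T H_L (J v):
# one Jacobian-vector product, one small multiply in output space,
# and one transposed Jacobian-vector product.

def matvec(M, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

J   = [[1.0, 0.0, 2.0],      # Jacobian of network outputs w.r.t. parameters
       [0.0, 1.0, 1.0]]
H_L = [[2.0, 0.0],           # Hessian of the loss w.r.t. network outputs
       [0.0, 2.0]]           # (e.g. mean squared error, up to scaling)
v   = [1.0, 1.0, 1.0]

gn_v = matvec(transpose(J), matvec(H_L, matvec(J, v)))  # G v, with G unformed
```

The key point is that H_L lives in the small output space, so the expensive objects are only the two Jacobian-vector products, each of which has an efficient automatic-differentiation routine.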
In an automatic differentiation package like Theano (Al-Rfou et al., 2016), this requires one forward-mode and one reverse-mode traversal of the computation graphs of each of F and ℓ. However, it is still inefficient to solve problem (2) for a deep neural network with a large number of parameters, so we propose the block-diagonal Hessian-free method. We first split the network parameters into a set of parameter blocks. For instance, each block may contain the parameters from one layer or a group of adjacent layers. Then the subproblems corresponding to each block are solved separately, and their solutions are concatenated together to produce a single update.
Specifically, if there are B blocks in total, the parameter vector can be rewritten as θ = [θ_1; θ_2; …; θ_B]. Similarly, we split the gradient into blocks as ∇f(θ) = [g_1; g_2; …; g_B], where g_b is the vector that contains the gradient only with respect to the parameters in block b. We further split the curvature matrix G into B × B square blocks and let G_bb be the b-th diagonal block of G. Then we obtain B separate subproblems, one for each block:

\[
\min_{\Delta\theta_b} \;\; g_b^\top \Delta\theta_b + \tfrac{1}{2}\, \Delta\theta_b^\top G_{bb}\, \Delta\theta_b, \qquad b = 1, \dots, B.
\]
We solve these B subproblems separately by conjugate gradient and concatenate their solutions. Hence Δθ = [Δθ_1; Δθ_2; …; Δθ_B] is our update (see Algorithm 1).
The b-th subproblem of the block-diagonal HF method is equivalent to minimizing the overall objective (1) subject to the constraint Δθ_{b'} = 0 for all b' ≠ b, since under this constraint the second-order term involves only the diagonal block G_bb and every other entry of G is multiplied by zero. This confirms that block-diagonal HF as described above is equivalent to ordinary HF with the curvature matrix replaced by a block-diagonal approximation that includes only terms involving pairs of parameters from the same block.
The problem (2) has thus been separated into B independent subproblems, one per block, reducing the dimensionality of the search space that CG needs to consider. Although we have B subproblems to solve for one update, each subproblem is smaller and requires fewer CG iterations. Hence the total compute needs are on par with those of the HF method at the same minibatch sizes; if the independent subproblems can be executed in parallel (e.g., on multiple nodes of a distributed system), there is potential for up to a B-fold speed improvement. As we demonstrate below, block-diagonal Hessian-free achieves better performance than the HF method on deep autoencoders, multilayer LSTMs, and deep CNNs.
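The per-block update can be sketched as follows; this is an illustrative toy (two blocks of two parameters each, with direct 2-by-2 solves standing in for the truncated CG runs of the real method), and all names and numeric values are hypothetical.

```python
# Sketch of the block-diagonal update: each subproblem
# min_d g_b^T d + 0.5 d^T G_bb d is solved using only that block's
# gradient g_b and diagonal curvature block G_bb, and the per-block
# solutions are concatenated into one update.

def solve_2x2(G, b):
    # direct solve of G d = b for a 2x2 block (a CG run in the real method)
    det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    return [(G[1][1] * b[0] - G[0][1] * b[1]) / det,
            (G[0][0] * b[1] - G[1][0] * b[0]) / det]

# gradient blocks and the diagonal blocks of a 4x4 curvature matrix,
# partitioned into B = 2 parameter blocks of size 2
grad_blocks = [[1.0, 2.0], [3.0, 4.0]]
curv_blocks = [[[4.0, 1.0], [1.0, 3.0]],
               [[5.0, 0.0], [0.0, 2.0]]]

# the minimizer of each quadratic subproblem solves G_bb d = -g_b;
# off-diagonal curvature between blocks is simply ignored
update = []
for g, G_bb in zip(grad_blocks, curv_blocks):
    update += solve_2x2(G_bb, [-gi for gi in g])
```

Because each block's solve reads only its own gradient and curvature block, the loop body could run on separate workers with no communication until the final concatenation.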
2.2 Implementation Details
We partition the network parameters into blocks based on the architecture of the network. When partitioning, we try to define roughly equal-sized blocks, which allows each subproblem to make similar progress with the same number of CG iterations. We also seek to partition the network such that parameters whose gradients we expect to be strongly correlated fall in the same block. For example, in our experiments we split the autoencoder network into two blocks, one for the encoder and one for the decoder; for the multilayer LSTM, we treat each layer of recurrent cells as a block; and for the deep CNN, we divide the convolutional layers into three contiguous blocks.
When solving problem (2), we use truncated conjugate gradient (Yuan, 2000), terminating the CG iteration before reaching the local minimum. There are two reasons for this truncation. First, CG iterations are expensive, and later iterations provide diminishing improvements. More importantly, when we use minibatches to evaluate the curvature-vector product, early termination of CG keeps the update from overfitting to the specific minibatch.
One way to reduce the computational burden of the HF method is to use smaller minibatches to evaluate the curvature-vector product while still using a large minibatch to evaluate the objective and the gradient (Byrd et al., 2011, 2012; Kiros, 2013). Martens (2010) similarly implements the HF method using the full dataset to evaluate the objective and the gradient, and minibatches to calculate the curvature-vector products. This is possible because Newton-like methods are more tolerant of approximations to the Hessian than of approximations to the gradient (Byrd et al., 2011). In our implementation, the curvature minibatch is chosen to be a strict subset of the gradient minibatch, as shown in Algorithm 1.
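The minibatch layout can be sketched as follows; `draw_batches` and the batch sizes are hypothetical names used only for illustration, with the curvature minibatch drawn as a strict subset of the gradient minibatch as in Algorithm 1.

```python
# Sketch of the two-level minibatch scheme: a large minibatch for the
# objective and gradient, and a strict subset of it for the
# curvature-vector products.

import random

def draw_batches(dataset_indices, grad_bs, curv_bs):
    assert curv_bs < grad_bs          # curvature batch is a strict subset
    grad_batch = random.sample(dataset_indices, grad_bs)
    curv_batch = grad_batch[:curv_bs]
    return grad_batch, curv_batch

g, c = draw_batches(list(range(1000)), grad_bs=512, curv_bs=64)
```

Sharing samples between the two batches means the curvature is evaluated at the same data the gradient saw, at a fraction of the cost of computing curvature products over the full gradient minibatch.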
However, small minibatches inevitably make the curvature estimate deviate from the true curvature, reducing the convergence benefits of the HF method over first-order optimization (Martens and Sutskever, 2012). In practice it is not trivial to choose a minibatch size that balances accurate estimation of curvature against the computational burden (Byrd et al., 2011). The key to making Hessian-free methods, including block-diagonal Hessian-free, converge well with small curvature minibatches is to use short CG runs to combat minibatch overfitting.
Martens (2010) suggests using factored Tikhonov damping to make the HF method more stable. With damping, G + λI is used as the curvature matrix to make the curvature "more" positive definite, where λ controls the intensity of damping. We also incorporate damping in many of our experiments. For the sake of comparison, we use the same damping strength for the HF method and the block-diagonal HF method, choosing a fixed value for each experiment.
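In a matrix-free setting, damping only requires wrapping the curvature-vector product, since (G + λI)v = Gv + λv. The sketch below uses plain (unfactored) Tikhonov damping with hypothetical values; the factored variant mentioned above rescales the damping per parameter rather than uniformly.

```python
# Wrap a curvature-vector product so CG effectively sees G + lambda * I.

def damped_matvec(matvec, lam):
    def wrapped(v):
        gv = matvec(v)
        return [gvi + lam * vi for gvi, vi in zip(gv, v)]
    return wrapped

# e.g. an indefinite 2x2 "curvature" becomes positive definite once damped
G = [[1.0, 0.0], [0.0, -0.5]]
mv = damped_matvec(lambda v: [G[0][0] * v[0] + G[0][1] * v[1],
                              G[1][0] * v[0] + G[1][1] * v[1]], lam=1.0)
out = mv([1.0, 1.0])   # (G + I) applied to [1, 1]: gives [2.0, 0.5]
```

The wrapped callback can be passed to any matrix-free CG solver unchanged, which is why damping adds essentially no cost per iteration.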
Another suggestion made by Martens (2010) is to use a form of "momentum" to accelerate the HF method. Here, momentum means initializing the CG algorithm with the last CG solution scaled by some constant close to 1, rather than initializing it randomly or with the zero vector. This change often brings additional speedup with little extra computation. We apply the same fixed momentum value in all experiments.
We also adopt fixed hyperparameter settings across the experiments, rather than an adaptive schedule. One reason is that computing the statistics that control adaptive hyperparameter scheduling can cost more than the gradient and curvature-vector product evaluations, which makes the HF method even slower. Furthermore, these tricks are not independent, and it is often unclear how to adjust them to fit every scenario. Our fixed hyperparameters work well in practice across the three different neural network architectures we investigated.
3 Related Work
The Hessian matrix is indefinite for nonconvex objectives, which makes second-order methods unstable, as the local quadratic approximation becomes unbounded from below. Martens and Sutskever (2012) advocate using the generalized Gauss-Newton matrix (Schraudolph, 2002) as the curvature matrix instead, which is guaranteed to be positive semidefinite. Another way to circumvent the indefiniteness of the Hessian is to use the Fisher information matrix as the curvature matrix; this approach has been widely studied under the name "natural gradient descent" (Amari and Nagaoka, 2007; Amari, 1998; Pascanu and Bengio, 2014; Le Roux et al., 2008). In some cases these two curvature matrices are exactly equivalent (Pascanu and Bengio, 2014; Martens, 2016). It has also been argued that the negative eigenvalues of the full Hessian are helpful for finding parameters with lower energy, e.g., in the saddle-free Newton method (Dauphin et al., 2014) and in an approach that mixes the Hessian and Gauss-Newton matrices (He et al., 2016b).
Recently, Martens and Grosse (2015), Grosse and Martens (2016), and Ba et al. (2017) proposed the K-FAC method, which approximates the natural gradient using a block-diagonal or block-tridiagonal approximation to the inverse of the Fisher information matrix, and demonstrated the advantages over first-order methods of a specialized version of this optimizer tailored to deep convolutional networks. In their work, the parameters are partitioned into blocks of similar size and structure to those used in our method.
4 Experiments
We evaluate the performance of the block-diagonal HF method on three deep architectures: a deep autoencoder on the MNIST dataset, a three-layer LSTM for downsampled sequential MNIST classification, and a deep CNN based on the ResNet architecture for CIFAR-10 classification. For all three experiments, we first compare the performance of the block-diagonal HF method with that of Adam (Kingma and Ba, 2014) to demonstrate that block-diagonal Hessian-free handles large batch sizes more efficiently. We then demonstrate the advantage of the block-diagonal method over ordinary Hessian-free by comparing their performance at various curvature minibatch sizes.
Although the block-diagonal HF method needs to solve more quadratic minimization problems, each subproblem is much smaller, and the total computation time is similar to that of the HF method. We note that the independence of the CG subproblems makes the block-diagonal method particularly amenable to a distributed implementation.
We use the Lasagne deep learning framework (Dieleman et al., 2015), based on Theano (Al-Rfou et al., 2016), for our implementation of the HF and block-diagonal HF methods, as we found no other software framework that supports both convenient definition of deep neural networks and the forward-mode automatic differentiation required to implement the R-operator.
4.1 Deep Autoencoder
Our first experiment is conducted on a deep autoencoder task. The goal of a neural network autoencoder is to learn a low-dimensional representation (encoding) of data from an input distribution. The "encoder," a multilayer feedforward network, maps the input data to a low-dimensional vector representation, while the "decoder," another multilayer feedforward network, reconstructs the input data given the low-dimensional vector representation. The autoencoder is trained by minimizing the reconstruction error.
The MNIST dataset (LeCun et al., 2001) is composed of handwritten digits of size 28 × 28, with 60,000 training samples and 10,000 test samples. The pixel values of both the training and test data are rescaled to [0, 1].
Our autoencoder is composed of an encoder with three hidden layers and state sizes 784-1000-500-250-30, followed by a decoder that is the mirror image of the encoder (the autoencoder model is the same as that in Hinton et al. (2006) and Martens (2010) for easy comparison). We train with the mean squared error loss function.
For hyperparameters, we use a fixed learning rate, no damping, and a fixed maximum number of CG iterations (max_cg_iters) for both the HF and block-diagonal HF methods. For block-diagonal HF, we define two blocks: one for the encoder and one for the decoder. For Adam, we use the default setting in Lasagne with learning rate 0.001, β₁ = 0.9, β₂ = 0.999, and the default ε.
A performance comparison between Adam, HF, and block-diagonal HF is shown in Figure 1. For Adam, the number of dataset epochs needed to converge and the final achievable reconstruction error are heavily affected by the minibatch size, with a similar number of updates required for small-minibatch and large-minibatch training. Our block-diagonal HF method with a large minibatch size achieves approximately the same reconstruction error as Adam with small minibatches, while requiring an order of magnitude fewer updates to converge than Adam with either small or large minibatches. Moreover, block-diagonal Hessian-free provides consistently better reconstruction error, on both the training and test sets, than the HF method over the entire course of training. This advantage holds across different values of the curvature minibatch size.
4.2 Multilayer LSTM
Our second experiment is conducted using a three-layer stacked LSTM on the sequential MNIST classification task. The MNIST data is downsampled by average pooling. The network has three LSTM layers (Hochreiter and Schmidhuber, 1997; Gers et al., 2002) followed by a fully-connected layer applied to the final layer's last hidden state. Each LSTM has 10 hidden units with peephole connections (Gers et al., 2002).
For HF and block-diagonal HF, we use a fixed learning rate, a fixed damping strength, and a fixed maximum number of CG iterations (max_cg_iter). The block-diagonal method has three blocks, one for each LSTM layer, with the top block also containing the fully-connected layer. For Adam, we again use a learning rate of 0.001, β₁ = 0.9, β₂ = 0.999, and the default ε.
A performance comparison between block-diagonal HF, HF, and Adam is shown in Figure 2. As in the autoencoder case, the block-diagonal method with large minibatches requires far fewer updates to achieve lower training loss and better test accuracy than Adam with any minibatch size. Furthermore, compared to HF, the block-diagonal HF method requires fewer updates, achieves better minima, and exhibits less performance deterioration for small curvature minibatch sizes.
4.3 Deep Convolutional Neural Network
We also train a deep convolutional neural network (CNN) on the CIFAR-10 classification task with the three optimization methods. The CIFAR-10 dataset has 50,000 training samples and 10,000 test samples, and each sample is a 32 × 32 image with three channels. Our model is a simplified version of the ResNet architecture (He et al., 2016a). It has one convolutional layer at the bottom, followed by three residual blocks and a fully-connected layer at the top. We did not include batch normalization layers, as computing the Hessian-vector product becomes extremely slow when batch normalization layers are involved in the Theano framework. For HF and block-diagonal HF, we use a fixed learning rate, a fixed damping strength, and a fixed maximum number of CG iterations (max_cg_iter). The block-diagonal method again has three blocks, one for each residual block, with the top and bottom blocks also containing the fully-connected and convolutional layers respectively. We use the same default Adam hyperparameters as before.
The common practice of training deep CNNs using custom-tuned learning rate decay schedules does not straightforwardly extend to the second-order case. However, Grosse and Martens (2016) suggest that Polyak averaging (Polyak and Juditsky, 1992) can obviate the need for learning rate decay while still achieving high test accuracy. To ensure a fair comparison, we apply Polyak averaging with a fixed exponential decay rate when evaluating the test accuracy for all three algorithms.
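Polyak averaging in its exponential-moving-average form can be sketched as follows; the decay value and the toy iterates below are hypothetical, not the values used in our experiments.

```python
# Exponential-moving-average variant of Polyak averaging: evaluation
# uses a decayed running average of the training iterates rather than
# the raw final parameters.

def polyak_update(avg, params, decay):
    if avg is None:
        return list(params)            # initialize from the first iterate
    return [decay * a + (1.0 - decay) * p for a, p in zip(avg, params)]

avg = None
for params in ([1.0, 0.0], [2.0, 2.0], [3.0, 4.0]):   # toy training iterates
    avg = polyak_update(avg, params, decay=0.5)
```

At test time, the model is evaluated with `avg` in place of the latest parameters, which smooths out the oscillations that a decaying learning rate would otherwise suppress.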
A performance comparison between block-diagonal Hessian-free, Hessian-free, and Adam is shown in Figure 3. Block-diagonal HF with large minibatches obtains test accuracy comparable to Adam with small ones. Furthermore, the block-diagonal method achieves slightly better training loss, higher test accuracy, and substantially more stable training than Hessian-free for three different curvature minibatch sizes.
Although not plotted in the figures, the time consumption of block-diagonal HF and that of HF are comparable in our experiments. The time per iteration of block-diagonal HF and HF is 5 to 10 times that of Adam. However, the total number of iterations of block-diagonal HF and HF is much smaller than Adam's, and both methods have the potential benefit of parallelization for large minibatches.
5 Conclusion and Discussion
We propose a block-diagonal HF method for training neural networks. This approach divides the network parameters into blocks, then solves an independent conjugate gradient subproblem for each parameter block. This extension of the original HF method reduces the number of updates needed to train several deep learning models while improving training stability and reaching better minima. Compared to first-order methods, including the popular Adam optimizer, block-diagonal HF scales significantly better to large minibatches, requiring an order of magnitude fewer updates in the large-batch regime.
Our results strengthen the claim of Collobert (2004) that "the more block-diagonal the Hessian, the easier it is to train" a neural network by showing that, in the case of Hessian-free optimization, simply ignoring off-block-diagonal curvature terms improves convergence properties.
Due to the separability of the subproblems for different parameter blocks, the block-diagonal HF method we introduce is inherently more parallelizable than the ordinary HF method. Future work can take advantage of this feature to apply the block-diagonal HF method to large-scale machine learning problems in a distributed setting.
References
 Al-Rfou et al. (2016) R. Al-Rfou, G. Alain, A. Almahairi, C. Angermueller, D. Bahdanau, N. Ballas, F. Bastien, J. Bayer, A. Belikov, A. Belopolsky, et al. Theano: A Python framework for fast computation of mathematical expressions. arXiv preprint arXiv:1605.02688, 2016.
 Amari (1998) S.-I. Amari. Natural gradient works efficiently in learning. Neural Computation, 10(2):251–276, 1998.
 Amari and Nagaoka (2007) S.-I. Amari and H. Nagaoka. Methods of Information Geometry, volume 191. American Mathematical Society, 2007.
 Ba et al. (2017) J. Ba, R. Grosse, and J. Martens. Distributed second-order optimization using Kronecker-factored approximations. In International Conference on Learning Representations (ICLR), 2017.
 Baydin et al. (2015) A. G. Baydin, B. A. Pearlmutter, A. A. Radul, and J. M. Siskind. Automatic differentiation in machine learning: a survey. arXiv preprint arXiv:1502.05767, 2015.
 Byrd et al. (2011) R. H. Byrd, G. M. Chin, W. Neveitt, and J. Nocedal. On the use of stochastic Hessian information in optimization methods for machine learning. SIAM Journal on Optimization, 21(3):977–995, 2011.
 Byrd et al. (2012) R. H. Byrd, G. M. Chin, J. Nocedal, and Y. Wu. Sample size selection in optimization methods for machine learning. Mathematical Programming, 134(1):127–155, 2012.
 Chen et al. (2016) J. Chen, R. Monga, S. Bengio, and R. Jozefowicz. Revisiting distributed synchronous SGD. arXiv preprint arXiv:1604.00981, 2016.
 Collobert (2004) R. Collobert. Large Scale Machine Learning. PhD thesis, Université de Paris VI, 2004.
 Das et al. (2016) D. Das, S. Avancha, D. Mudigere, K. Vaidynathan, S. Sridharan, D. Kalamkar, B. Kaul, and P. Dubey. Distributed deep learning using synchronous stochastic gradient descent. arXiv preprint arXiv:1602.06709, 2016.
 Dauphin et al. (2014) Y. N. Dauphin, R. Pascanu, C. Gulcehre, K. Cho, S. Ganguli, and Y. Bengio. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In Advances in Neural Information Processing Systems, pages 2933–2941, 2014.
 Dean et al. (2012) J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, M. Mao, A. Senior, P. Tucker, K. Yang, Q. V. Le, et al. Large scale distributed deep networks. In Advances in Neural Information Processing Systems, pages 1223–1231, 2012.
 Dieleman et al. (2015) S. Dieleman, J. Schlüter, C. Raffel, E. Olson, S. K. Sønderby, D. Nouri, D. Maturana, M. Thoma, E. Battenberg, J. Kelly, J. D. Fauw, M. Heilman, D. M. de Almeida, B. McFee, H. Weideman, G. Takács, P. de Rivaz, J. Crall, G. Sanders, K. Rasul, C. Liu, G. French, and J. Degrave. Lasagne: First release., Aug. 2015. URL http://dx.doi.org/10.5281/zenodo.27878.
 Dinh et al. (2017) L. Dinh, R. Pascanu, S. Bengio, and Y. Bengio. Sharp minima can generalize for deep nets. arXiv preprint arXiv:1703.04933, 2017.
 Duchi et al. (2011) J. Duchi, E. Hazan, and Y. Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12(Jul):2121–2159, 2011.
 Gers et al. (2002) F. A. Gers, N. N. Schraudolph, and J. Schmidhuber. Learning precise timing with LSTM recurrent networks. Journal of Machine Learning Research, 3(Aug):115–143, 2002.
 Goyal et al. (2016) P. Goyal, P. Dollár, R. Girshick, P. Noordhuis, L. Wesolowski, A. Kyrola, A. Tulloch, Y. Jia, and K. He. Accurate, large minibatch SGD: Training ImageNet in 1 hour. arXiv preprint arXiv:1706.02677, 2016.
 Grosse and Martens (2016) R. Grosse and J. Martens. A Kronecker-factored approximate Fisher matrix for convolution layers. In International Conference on Machine Learning (ICML), 2016.

 He et al. (2016a) K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016a.
 He et al. (2016b) X. He, D. Mudigere, M. Smelyanskiy, and M. Takáč. Large scale distributed Hessian-free optimization for deep neural network. arXiv preprint arXiv:1606.00511, 2016b.
 Hinton et al. (2006) G. E. Hinton, S. Osindero, and Y. W. Teh. A fast learning algorithm for deep belief nets. Neural Computation, 2006.
 Hochreiter and Schmidhuber (1997) S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
 Keskar et al. (2016) N. S. Keskar, D. Mudigere, J. Nocedal, M. Smelyanskiy, and P. T. P. Tang. On large-batch training for deep learning: Generalization gap and sharp minima. CoRR, abs/1609.04836, 2016. URL http://arxiv.org/abs/1609.04836.
 Kingma and Ba (2014) D. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
 Kiros (2013) R. Kiros. Training neural networks with stochastic Hessian-free optimization. arXiv preprint arXiv:1301.3641, 2013.
 Le Roux et al. (2008) N. Le Roux, P.-A. Manzagol, and Y. Bengio. Topmoumoute online natural gradient algorithm. In Advances in Neural Information Processing Systems, pages 849–856, 2008.
 LeCun et al. (2001) Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. In Intelligent Signal Processing, pages 306–351. IEEE Press, 2001.
 LeCun et al. (1998) Y. A. LeCun, L. Bottou, G. B. Orr, and K.-R. Müller. Efficient backprop. In Neural Networks: Tricks of the Trade. Springer, 1998.
 Martens (2010) J. Martens. Deep learning via Hessian-free optimization. In International Conference on Machine Learning (ICML), pages 735–742, 2010.
 Martens (2016) J. Martens. Second-order optimization for neural networks. PhD thesis, University of Toronto, 2016.
 Martens and Grosse (2015) J. Martens and R. Grosse. Optimizing neural networks with Kronecker-factored approximate curvature. In International Conference on Machine Learning (ICML), pages 2408–2417, 2015.

 Martens and Sutskever (2011) J. Martens and I. Sutskever. Learning recurrent neural networks with Hessian-free optimization. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pages 1033–1040, 2011.
 Martens and Sutskever (2012) J. Martens and I. Sutskever. Training deep and recurrent networks with Hessian-free optimization. In Neural Networks: Tricks of the Trade, pages 479–535. Springer, 2012.
 Nesterov (2013) Y. Nesterov. Introductory lectures on convex optimization: A basic course, volume 87. Springer Science & Business Media, 2013.
 Pascanu and Bengio (2014) R. Pascanu and Y. Bengio. Revisiting natural gradient for deep networks. In International Conference on Learning Representations (ICLR), 2014.
 Pearlmutter (1994) B. A. Pearlmutter. Fast exact multiplication by the Hessian. Neural computation, 6(1):147–160, 1994.
 Polyak and Juditsky (1992) B. T. Polyak and A. B. Juditsky. Acceleration of stochastic approximation by averaging. SIAM Journal on Control and Optimization, 30(4):838–855, 1992.
 Qian (1999) N. Qian. On the momentum term in gradient descent learning algorithms. Neural networks, 12(1):145–151, 1999.
 Recht et al. (2011) B. Recht, C. Re, S. Wright, and F. Niu. Hogwild: A lock-free approach to parallelizing stochastic gradient descent. In Advances in Neural Information Processing Systems, pages 693–701, 2011.
 Schraudolph (2002) N. N. Schraudolph. Fast curvature matrix-vector products for second-order gradient descent. Neural Computation, 14(7):1723–1738, 2002.
 Sutskever et al. (2013) I. Sutskever, J. Martens, G. E. Dahl, and G. E. Hinton. On the importance of initialization and momentum in deep learning. International Conference on Machine Learning (ICML), 28:1139–1147, 2013.
 Takác et al. (2013) M. Takác, A. S. Bijral, P. Richtárik, and N. Srebro. Minibatch primal and dual methods for SVMs. In International Conference on Machine Learning (ICML), pages 1022–1030, 2013.
 Yuan (2000) Y. Yuan. On the truncated conjugate gradient method. Mathematical Programming, 87(3):561–573, 2000.
 Zhang et al. (2015) S. Zhang, A. E. Choromanska, and Y. LeCun. Deep learning with elastic averaging SGD. In Advances in Neural Information Processing Systems, pages 685–693, 2015.