
BDA-PCH: Block-Diagonal Approximation of Positive-Curvature Hessian for Training Neural Networks

02/19/2018 · by Sheng-Wei Chen, et al. (HTC)

We propose a block-diagonal approximation of the positive-curvature Hessian (BDA-PCH) matrix to measure curvature. Our proposed BDA-PCH matrix is memory efficient and can be applied to any fully-connected neural network whose activation and criterion functions are twice differentiable. In particular, the BDA-PCH matrix can handle non-convex criterion functions. We devise an efficient scheme that uses the conjugate gradient method to derive Newton directions in the mini-batch setting. Empirical studies show that our method outperforms competing second-order methods in convergence speed.
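To make the mini-batch Newton step concrete, here is a minimal sketch (not the authors' implementation) of the core idea the abstract describes: solving a per-layer block-diagonal curvature system B_i d_i = -g_i with conjugate gradient. The random PSD blocks stand in for the paper's PCH blocks, and the damping term, function names, and block sizes are all illustrative assumptions.

```python
# Sketch only: per-layer Newton directions via conjugate gradient on a
# block-diagonal curvature matrix. The PSD blocks below are placeholders
# for the paper's PCH blocks; damping and names are assumptions.
import numpy as np

def conjugate_gradient(matvec, b, max_iter=50, tol=1e-8):
    """Solve A x = b for symmetric positive-definite A, given only a
    matrix-vector product `matvec` (A is never formed explicitly)."""
    x = np.zeros_like(b)
    r = b - matvec(x)          # initial residual
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def newton_directions(blocks, grads, damping=1e-3):
    """Per-layer direction d_i = -(B_i + damping*I)^{-1} g_i, where each
    B_i is one PSD curvature block of the block-diagonal approximation."""
    directions = []
    for B, g in zip(blocks, grads):
        matvec = lambda v, B=B: B @ v + damping * v
        directions.append(conjugate_gradient(matvec, -g))
    return directions

# Toy usage: two "layers" with random symmetric PSD curvature blocks.
rng = np.random.default_rng(0)
blocks, grads = [], []
for n in (4, 3):
    M = rng.standard_normal((n, n))
    blocks.append(M @ M.T)               # symmetric PSD stand-in block
    grads.append(rng.standard_normal(n))
ds = newton_directions(blocks, grads)
```

Because the approximation is block diagonal, each layer's system is solved independently, which is what keeps the memory cost low relative to working with the full Hessian.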


Related research

12/20/2017 · Block-diagonal Hessian-free Optimization for Training Neural Networks
Second-order methods for neural network optimization have several advant...

02/12/2021 · Kronecker-factored Quasi-Newton Methods for Convolutional Neural Networks
Second-order methods have the capability of accelerating optimization by...

06/27/2012 · Estimating the Hessian by Back-propagating Curvature
In this work we develop Curvature Propagation (CP), a general technique ...

01/14/2020 · On the Convex Behavior of Deep Neural Networks in Relation to the Layers' Width
The Hessian of neural networks can be decomposed into a sum of two matri...

05/25/2017 · Diagonal Rescaling For Neural Networks
We define a second-order neural network stochastic gradient training alg...

06/04/2021 · ViViT: Curvature access through the generalized Gauss-Newton's low-rank structure
Curvature in form of the Hessian or its generalized Gauss-Newton (GGN) a...

12/02/2019 · On the Delta Method for Uncertainty Approximation in Deep Learning
The Delta method is a well known procedure used to quantify uncertainty ...