NG+ : A Multi-Step Matrix-Product Natural Gradient Method for Deep Learning

06/14/2021
by   Minghan Yang, et al.

In this paper, a novel second-order method called NG+ is proposed. Following the rule "the shape of the gradient equals the shape of the parameter", we define a generalized Fisher information matrix (GFIM) using products of gradients in matrix form rather than the traditional vectorization. The generalized natural gradient direction is then simply the inverse of the GFIM multiplied by the gradient in matrix form. Moreover, the GFIM and its inverse are kept fixed for multiple steps, so the computational cost is controlled and remains comparable to that of first-order methods. Global convergence is established under mild conditions, and a regret bound is given for the online learning setting. Numerical results on image classification with ResNet50, quantum chemistry modeling with SchNet, neural machine translation with Transformer, and a recommendation system with DLRM illustrate that NG+ is competitive with state-of-the-art methods.
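The update described above can be illustrated with a short sketch. The Python snippet below is a hypothetical, minimal illustration rather than the authors' implementation: it assumes the GFIM takes the form G G^T for a matrix-shaped gradient G, applies a damped inverse, and refreshes that inverse only every few steps, mirroring the multi-step reuse described in the abstract. The function name ngplus_step, the damping term, and the refresh interval are illustrative assumptions.

```python
import numpy as np

def ngplus_step(W, grad_fn, state, lr=0.1, damping=1e-3, refresh_every=5):
    """One sketch of an NG+-style update for a matrix parameter W (m x n).

    Assumptions: the GFIM is approximated by G @ G.T (an m x m matrix, so the
    resulting direction keeps the shape of the parameter), a damping term is
    added before inversion, and the inverse is reused for several steps so the
    per-step cost stays close to that of a first-order method.
    """
    t = state.get("t", 0)
    G = grad_fn(W)                       # gradient kept in matrix form, same shape as W
    if state.get("inv") is None or t % refresh_every == 0:
        m = G.shape[0]
        gfim = G @ G.T                   # assumed GFIM built from products of matrix gradients
        state["inv"] = np.linalg.inv(gfim + damping * np.eye(m))  # damped inverse, reused for multiple steps
    D = state["inv"] @ G                 # generalized natural gradient direction: GFIM^{-1} times the matrix gradient
    state["t"] = t + 1
    return W - lr * D, state

# Toy usage on a least-squares objective 0.5 * ||X W - Y||^2 with gradient X.T @ (X W - Y)
rng = np.random.default_rng(0)
X, Y = rng.normal(size=(32, 8)), rng.normal(size=(32, 4))
W, state = np.zeros((8, 4)), {}
for _ in range(50):
    W, state = ngplus_step(W, lambda W: X.T @ (X @ W - Y), state)
```

The toy loop only demonstrates the multi-step reuse of the inverted GFIM; the paper's actual estimator, damping schedule, and convergence analysis are given in the full text.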

