NG+: A Multi-Step Matrix-Product Natural Gradient Method for Deep Learning

by Minghan Yang, et al.

In this paper, a novel second-order method called NG+ is proposed. By following the rule “the shape of the gradient equals the shape of the parameter”, we define a generalized Fisher information matrix (GFIM) using the products of gradients in matrix form rather than the traditional vectorization. Our generalized natural gradient direction is then simply the inverse of the GFIM multiplied by the gradient in matrix form. Moreover, the GFIM and its inverse are kept unchanged for multiple steps, so that the computational cost is controlled and is comparable with that of first-order methods. Global convergence is established under some mild conditions, and a regret bound is also given for the online learning setting. Numerical results on image classification with ResNet50, quantum chemistry modeling with SchNet, neural machine translation with Transformer, and recommendation systems with DLRM illustrate that NG+ is competitive with state-of-the-art methods.
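The update described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes that the GFIM for an m-by-n parameter is the batch average of G_i G_i^T over per-sample gradient matrices G_i, that a small damping term is added so the inverse exists, and that the inverse is refreshed every fixed number of steps. The names `gfim`, `ng_direction`, `multi_step_update`, `damping`, and `refresh_every` are hypothetical and do not come from the paper.

```python
import numpy as np


def gfim(per_sample_grads, damping=1e-3):
    """Damped matrix-form curvature estimate (assumed form, not from the paper).

    per_sample_grads has shape (batch, m, n). The result is the (m, m) matrix
    mean_i(G_i @ G_i.T) + damping * I, so no gradient is ever vectorized.
    """
    batch, m, _ = per_sample_grads.shape
    F = np.einsum("bij,bkj->ik", per_sample_grads, per_sample_grads) / batch
    return F + damping * np.eye(m)


def ng_direction(F_inv, grad):
    """Inverse GFIM times the matrix-form gradient; same shape as the parameter."""
    return F_inv @ grad


def multi_step_update(W, grad_fn, steps=20, lr=0.1, refresh_every=5):
    """Reuse one GFIM inverse for `refresh_every` consecutive steps.

    grad_fn(W) is a user-supplied (hypothetical) callable that returns
    per-sample gradients of shape (batch, m, n) for the parameter W of shape (m, n).
    """
    F_inv = None
    for t in range(steps):
        per_sample = grad_fn(W)
        if t % refresh_every == 0:
            # The only expensive step: invert an (m, m) matrix, then keep it fixed.
            F_inv = np.linalg.inv(gfim(per_sample))
        W = W - lr * ng_direction(F_inv, per_sample.mean(axis=0))
    return W
```

Under these assumptions the matrix to invert is m-by-m instead of mn-by-mn, and the inversion cost is amortized over `refresh_every` steps, which is the sense in which the per-step cost can stay close to that of a first-order method.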




