NG+ : A Multi-Step Matrix-Product Natural Gradient Method for Deep Learning

06/14/2021
by   Minghan Yang, et al.

In this paper, a novel second-order method called NG+ is proposed. Following the rule "the shape of the gradient equals the shape of the parameter", we define a generalized Fisher information matrix (GFIM) using products of gradients in matrix form rather than the traditional vectorization. The generalized natural gradient direction is then simply the inverse of the GFIM multiplied by the gradient in matrix form. Moreover, the GFIM and its inverse are kept fixed over multiple steps, so the computational cost is controlled and comparable with that of first-order methods. Global convergence is established under mild conditions, and a regret bound is given for the online learning setting. Numerical results on image classification with ResNet-50, quantum chemistry modeling with SchNet, neural machine translation with Transformer, and recommendation systems with DLRM show that NG+ is competitive with state-of-the-art methods.
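To make the update rule concrete, here is a minimal NumPy sketch of the idea as the abstract describes it, not the authors' implementation: the GFIM is formed from a matrix product of the gradient with itself (here a damped G @ G.T), its inverse is reused for several consecutive steps, and the step direction is the inverse GFIM applied to the gradient kept in matrix form. The names ng_plus_sketch, refresh_every, and damping, as well as the specific choice of G @ G.T as the matrix product, are illustrative assumptions.

```python
import numpy as np

def ng_plus_sketch(W, grad_fn, lr=0.1, damping=1e-3, refresh_every=5, steps=20):
    """Toy multi-step matrix-product natural-gradient loop.

    W        : parameter matrix, shape (m, n)
    grad_fn  : callable returning dL/dW with the same shape as W
    """
    m = W.shape[0]
    F_inv = np.eye(m)                    # inverse GFIM, recomputed only periodically
    for t in range(steps):
        G = grad_fn(W)                   # gradient kept in matrix form, never vectorized
        if t % refresh_every == 0:
            # GFIM built from a matrix product of gradients (damped so it is invertible)
            F = G @ G.T / G.shape[1] + damping * np.eye(m)
            F_inv = np.linalg.inv(F)
        W = W - lr * (F_inv @ G)         # direction = inverse GFIM times the matrix gradient
    return W

# Toy usage on a least-squares objective 0.5 * ||X W - Y||^2.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))
Y = rng.normal(size=(50, 3))
W = ng_plus_sketch(np.zeros((8, 3)), lambda W: X.T @ (X @ W - Y) / X.shape[0])
```

Because the inverse is refreshed only every refresh_every iterations, the per-step cost between refreshes is a single matrix multiply, which is what keeps the method's cost comparable to first-order updates.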


Related research

06/10/2020 · Sketchy Empirical Natural Gradient Methods for Deep Learning
In this paper, we develop an efficient sketchy empirical natural gradien...

06/07/2021 · TENGraD: Time-Efficient Natural Gradient Descent with Exact Fisher-Block Inversion
This work proposes a time-efficient Natural Gradient Descent method, cal...

07/07/2021 · Efficient Matrix-Free Approximations of Second-Order Information, with Applications to Pruning and Optimization
Efficiently approximating local curvature information of the loss functi...

06/05/2021 · Tensor Normal Training for Deep Learning Models
Despite the predominant use of first-order methods for training deep lea...

09/01/2021 · Explicit natural gradient updates for Cholesky factor in Gaussian variational approximation
Stochastic gradient methods have enabled variational inference for high-...

03/12/2020 · Optimization of Generalized Jacobian Chain Products without Memory Constraints
The efficient computation of Jacobians represents a fundamental challeng...

09/03/2019 · Fast Gradient Methods with Alignment for Symmetric Linear Systems without Using Cauchy Step
The performance of gradient methods has been considerably improved by th...