Fisher Information and Natural Gradient Learning of Random Deep Networks

08/22/2018
by   Shun-ichi Amari, et al.

A deep neural network is a hierarchical nonlinear model transforming input signals to output signals. Its input-output relation is considered to be stochastic, being described for a given input by a parameterized conditional probability distribution of outputs. The space of parameters, consisting of weights and biases, is a Riemannian manifold, where the metric is defined by the Fisher information matrix. The natural gradient method follows the steepest descent direction in this Riemannian manifold, so it is effective for learning and avoids plateaus. However, it requires inversion of the Fisher information matrix, which is practically impossible when the matrix has a huge number of dimensions. Many methods for approximating the natural gradient have therefore been introduced. The present paper uses the statistical neurodynamical method to reveal the properties of the Fisher information matrix of a network with random connections under the mean-field approximation. We prove that the Fisher information matrix is unit-wise block-diagonal, supplemented by off-block-diagonal elements of small order, which provides a justification for the quasi-diagonal natural gradient method proposed by Y. Ollivier. A unit-wise block-diagonal Fisher matrix reduces to the tensor product of the Fisher information matrices of single units. We further prove that the Fisher information matrix of a single unit has a simple reduced form: the sum of a diagonal matrix and a rank-2 matrix of weight-bias correlations. We obtain the inverse of the Fisher information matrix explicitly. This gives an explicit form of the natural gradient that requires no numerical matrix inversion, which drastically speeds up stochastic gradient learning.
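To make the last point concrete, here is a minimal sketch (in NumPy) of how a diagonal-plus-rank-2 matrix of the kind described above can be inverted in closed form via the Woodbury identity and applied to a single unit's gradient. The specific diagonal entries and correlation vectors derived in the paper are not reproduced here; `d`, `a`, and `b` below are placeholders, and the function name is hypothetical.

```python
import numpy as np

def unit_natural_gradient(grad, d, a, b, damping=1e-8):
    """Apply F^{-1} to `grad`, where F = diag(d) + a b^T + b a^T.

    Writing F = D + U C U^T with U = [a  b] (n x 2) and C = [[0, 1], [1, 0]],
    the Woodbury identity gives
        F^{-1} g = D^{-1} g - D^{-1} U (C^{-1} + U^T D^{-1} U)^{-1} U^T D^{-1} g,
    so only a 2 x 2 system is solved and the cost per unit is O(n).
    """
    d = d + damping                              # keep the diagonal invertible
    U = np.stack([a, b], axis=1)                 # n x 2
    C_inv = np.array([[0.0, 1.0], [1.0, 0.0]])   # [[0,1],[1,0]] is its own inverse
    Dinv_g = grad / d
    Dinv_U = U / d[:, None]
    small = C_inv + U.T @ Dinv_U                 # 2 x 2 system
    return Dinv_g - Dinv_U @ np.linalg.solve(small, U.T @ Dinv_g)


# Hypothetical usage for one unit with n parameters (its weights and bias):
rng = np.random.default_rng(0)
n = 5
g = rng.normal(size=n)                           # ordinary gradient of the loss
d = np.abs(rng.normal(size=n)) + 1.0             # placeholder diagonal part of F
a, b = rng.normal(size=n), rng.normal(size=n)    # placeholder correlation vectors
nat_g = unit_natural_gradient(g, d, a, b)        # natural-gradient direction
```

For a whole network, one such O(n) solve per unit replaces the inversion of the full Fisher matrix, which is what makes the unit-wise block-diagonal structure computationally attractive.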
