Efficient Approximations of the Fisher Matrix in Neural Networks using Kronecker Product Singular Value Decomposition

01/25/2022
by Abdoulaye Koroko, et al.

Several studies have shown that natural gradient descent can minimize the objective function more efficiently than ordinary gradient-descent-based methods. However, the bottleneck of this approach for training deep neural networks lies in the prohibitive cost of solving, at each iteration, a large dense linear system corresponding to the Fisher Information Matrix (FIM). This has motivated various approximations of either the exact FIM or the empirical one. The most sophisticated of these is KFAC, which relies on a Kronecker-factored block-diagonal approximation of the FIM. In this work, a few improvements of KFAC are proposed that increase its accuracy at only a slight additional cost. The common feature of the four novel methods is that they rely on a direct minimization problem, the solution of which can be computed via the Kronecker product singular value decomposition technique. Experimental results on the three standard deep auto-encoder benchmarks show that they provide more accurate approximations to the FIM than KFAC does. Furthermore, they outperform KFAC and state-of-the-art first-order methods in terms of optimization speed.
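The Kronecker product SVD step mentioned in the abstract refers to a generic building block: finding the nearest Kronecker product A ⊗ B to a given matrix in the Frobenius norm, which reduces to a rank-one SVD of a rearranged matrix (the Van Loan-Pitsianis rearrangement). The sketch below illustrates only that generic step in NumPy, under assumed shapes; the function name nearest_kronecker_product is illustrative, and the code does not reproduce the paper's four proposed methods or their integration into KFAC.

```python
import numpy as np

def nearest_kronecker_product(M, shape_A, shape_B):
    """Best Frobenius-norm approximation M ~ A (x) B, computed via the
    Van Loan-Pitsianis rearrangement followed by a rank-one SVD."""
    m1, n1 = shape_A
    m2, n2 = shape_B
    assert M.shape == (m1 * m2, n1 * n2)

    # Rearrange M so that each row is the column-major vectorization of one
    # m2 x n2 block of M; then ||M - A (x) B||_F = ||R - vec(A) vec(B)^T||_F.
    R = np.empty((m1 * n1, m2 * n2))
    for j in range(n1):
        for i in range(m1):
            block = M[i * m2:(i + 1) * m2, j * n2:(j + 1) * n2]
            R[j * m1 + i, :] = block.reshape(-1, order="F")

    # The best rank-one approximation of R yields the optimal Kronecker factors.
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    a = np.sqrt(s[0]) * U[:, 0]
    b = np.sqrt(s[0]) * Vt[0, :]
    A = a.reshape((m1, n1), order="F")
    B = b.reshape((m2, n2), order="F")
    return A, B

# Usage example: recover the factors of an exact Kronecker product.
A0 = np.random.randn(3, 3)
B0 = np.random.randn(4, 4)
M = np.kron(A0, B0)
A, B = nearest_kronecker_product(M, A0.shape, B0.shape)
print(np.linalg.norm(M - np.kron(A, B)))  # ~0 up to floating-point error
```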


Related research:

10/02/2020  Understanding Approximate Fisher Information for Fast Convergence of Natural Gradient Descent in Wide Neural Networks
03/31/2023  Analysis and Comparison of Two-Level KFAC Methods for Training Deep Neural Networks
08/15/2020  Natural Wake-Sleep Algorithm
06/07/2021  TENGraD: Time-Efficient Natural Gradient Descent with Exact Fisher-Block Inversion
06/11/2018  Fast Approximate Natural Gradient Descent in a Kronecker-factored Eigenbasis
01/29/2022  A Stochastic Bundle Method for Interpolating Networks
01/13/2022  GradMax: Growing Neural Networks using Gradient Information
