Analysis and Comparison of Two-Level KFAC Methods for Training Deep Neural Networks

03/31/2023
by Abdoulaye Koroko, et al.

As a second-order method, Natural Gradient Descent (NGD) can accelerate the training of neural networks. However, due to the prohibitive computational and memory costs of computing and inverting the Fisher Information Matrix (FIM), efficient approximations are necessary to make NGD scalable to Deep Neural Networks (DNNs). Many such approximations have been proposed. The most sophisticated of these is KFAC, which approximates the FIM by a block-diagonal matrix in which each block corresponds to a layer of the network; in doing so, KFAC ignores the interactions between different layers. In this work, we investigate whether restoring some low-frequency interactions between the layers by means of two-level methods is worthwhile. Inspired by domain decomposition, several two-level corrections to KFAC using different coarse spaces are proposed and assessed. The results show that incorporating the layer interactions in this fashion does not meaningfully improve the performance of KFAC. This suggests that it is safe to discard the off-diagonal blocks of the FIM, since the block-diagonal approach is sufficiently robust, accurate, and economical in computation time.
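To make the setting concrete, below is a minimal NumPy sketch (not the authors' code) of the block-diagonal KFAC preconditioner and of an additive two-level correction of the kind discussed in the abstract. The layer sizes, the coarse space (one constant vector per layer, a Nicolaides-like choice borrowed from domain decomposition), and the surrogate Fisher matrix are illustrative assumptions only; in an actual two-level method the coarse solve would involve the full FIM, whose off-diagonal blocks are precisely what the coarse space is meant to capture.

```python
import numpy as np

# Minimal sketch, not the authors' implementation. Illustrates (i) the
# block-diagonal KFAC preconditioner, where each layer's Fisher block is
# approximated by a Kronecker product kron(G_l, A_l), and (ii) an additive
# two-level correction with a hypothetical coarse space containing one
# constant vector per layer.

rng = np.random.default_rng(0)

def kfac_step(grad_W, A, G, damping=1e-3):
    """One layer's KFAC direction: with row-major vec, the Fisher block is
    approximately kron(G, A), so kron(G, A)^{-1} vec(grad) = vec(G^{-1} grad A^{-1})."""
    A_d = A + damping * np.eye(A.shape[0])
    G_d = G + damping * np.eye(G.shape[0])
    return np.linalg.solve(G_d, grad_W) @ np.linalg.inv(A_d)

# Toy two-layer setup: (n_out, n_in) per layer, with random gradients and
# random covariances standing in for the running estimates of
# A_l = E[a a^T] (layer inputs) and G_l = E[g g^T] (pre-activation gradients).
sizes = [(4, 6), (3, 4)]
grads = [rng.standard_normal(s) for s in sizes]
A_fac = [np.cov(rng.standard_normal((i, 50))) for (_, i) in sizes]
G_fac = [np.cov(rng.standard_normal((o, 50))) for (o, _) in sizes]

# Level 1: plain KFAC, each layer preconditioned independently.
steps = [kfac_step(g, A, G) for g, A, G in zip(grads, A_fac, G_fac)]

# Level 2: coarse correction intended to couple the layers. The restriction
# matrix R has one row per layer (constant over that layer's parameters).
# In a real two-level method the coarse operator R F R^T is built from the
# full FIM; here a block-diagonal Kronecker surrogate keeps the sketch
# self-contained, so the cross-layer coupling is not actually visible.
dims = [o * i for (o, i) in sizes]
n = sum(dims)
R = np.zeros((len(sizes), n))
offset = 0
for l, k in enumerate(dims):
    R[l, offset:offset + k] = 1.0
    offset += k

F = np.zeros((n, n))
offset = 0
for (A, G, k) in zip(A_fac, G_fac, dims):
    F[offset:offset + k, offset:offset + k] = np.kron(G, A)  # row-major vec
    offset += k

g_flat = np.concatenate([g.ravel() for g in grads])
coarse = R.T @ np.linalg.solve(R @ F @ R.T + 1e-8 * np.eye(len(sizes)), R @ g_flat)

# Additive two-level preconditioned gradient: KFAC step plus coarse correction.
two_level = np.concatenate([s.ravel() for s in steps]) + coarse
print(two_level.shape)  # (36,)
```

A multiplicative variant (applying the coarse correction before or after the layer-wise KFAC solve) is a possible alternative; in either case the overhead over plain KFAC is one small coarse solve per optimization step.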

Related research

02/08/2022
A Mini-Block Natural Gradient Method for Deep Neural Networks
The training of deep neural networks (DNNs) is currently predominantly d...

10/02/2020
Understanding Approximate Fisher Information for Fast Convergence of Natural Gradient Descent in Wide Neural Networks
Natural Gradient Descent (NGD) helps to accelerate the convergence of gr...

10/11/2022
Component-Wise Natural Gradient Descent – An Efficient Neural Network Optimization
Natural Gradient Descent (NGD) is a second-order neural network training...

09/12/2023
A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale
Shampoo is an online and stochastic optimization algorithm belonging to ...

01/25/2022
Efficient Approximations of the Fisher Matrix in Neural Networks using Kronecker Product Singular Value Decomposition
Several studies have shown the ability of natural gradient descent to mi...

11/21/2016
Scalable Adaptive Stochastic Optimization Using Random Projections
Adaptive stochastic gradient methods such as AdaGrad have gained popular...

05/25/2017
Diagonal Rescaling For Neural Networks
We define a second-order neural network stochastic gradient training alg...
