Fast Approximate Natural Gradient Descent in a Kronecker-factored Eigenbasis

06/11/2018
by Thomas George, et al.

Optimization algorithms that leverage gradient covariance information, such as variants of natural gradient descent (Amari, 1998), offer the prospect of yielding more effective descent directions. For models with many parameters, the covariance matrix they are based on becomes gigantic, making them inapplicable in their original form. This has motivated research into both simple diagonal approximations and more sophisticated factored approximations such as KFAC (Heskes, 2000; Martens & Grosse, 2015; Grosse & Martens, 2016). In the present work we draw inspiration from both to propose a novel approximation that is provably better than KFAC and amenable to cheap partial updates. It consists of tracking a diagonal variance, not in parameter coordinates, but in a Kronecker-factored eigenbasis, in which the diagonal approximation is likely to be more effective. Experiments show improvements over KFAC in optimization speed for several deep network architectures.
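The idea of tracking a diagonal variance in a Kronecker-factored eigenbasis can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a single fully connected layer with weight matrix of shape (n_out, n_in), per-example layer inputs `a` and back-propagated output gradients `g`, and plain batch averages rather than running estimates. All function names (`kronecker_eigenbasis`, `eigenbasis_second_moments`, `precondition`) and the `damping` parameter are hypothetical names chosen for illustration.

```python
import numpy as np

def kronecker_eigenbasis(a, g):
    """Eigendecompose the two Kronecker factors used by KFAC.

    a: (batch, n_in) layer inputs; g: (batch, n_out) gradients w.r.t. layer outputs.
    Returns (eigenvalues, eigenvectors) for each factor.
    """
    n = a.shape[0]
    A = a.T @ a / n                  # input second-moment factor, (n_in, n_in)
    G = g.T @ g / n                  # output-gradient factor, (n_out, n_out)
    s_A, U_A = np.linalg.eigh(A)     # symmetric eigendecomposition
    s_G, U_G = np.linalg.eigh(G)
    return (s_A, U_A), (s_G, U_G)

def eigenbasis_second_moments(a, g, U_A, U_G):
    """Diagonal variance tracked in the Kronecker-factored eigenbasis.

    The per-example weight gradient g_i a_i^T, projected into the eigenbasis,
    is (U_G^T g_i)(U_A^T a_i)^T; this averages its squared entries.
    """
    a_t = a @ U_A                    # row i is U_A^T a_i
    g_t = g @ U_G                    # row i is U_G^T g_i
    return (g_t ** 2).T @ (a_t ** 2) / a.shape[0]   # (n_out, n_in)

def precondition(grad_W, U_A, U_G, scaling, damping=1e-3):
    """Rescale a weight gradient in the eigenbasis, then rotate back."""
    projected = U_G.T @ grad_W @ U_A
    return U_G @ (projected / (scaling + damping)) @ U_A.T
```

Under these assumptions, KFAC corresponds to using the outer product of the two eigenvalue vectors, `np.outer(s_G, s_A)`, as `scaling`, whereas the sketch above replaces it with second moments measured directly in the same basis. Because both approximations share the eigenvectors and the measured diagonal is the Frobenius-optimal choice for that basis, the resulting approximation error with respect to the batch gradient covariance cannot exceed that of KFAC.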

research
04/29/2014

Fast Approximation of Rotations and Hessians matrices

A new method to represent and approximate rotation matrices is introduce...
research
03/19/2015

Optimizing Neural Networks with Kronecker-factored Approximate Curvature

We propose an efficient method for approximating natural gradient descen...
research
10/02/2020

Understanding Approximate Fisher Information for Fast Convergence of Natural Gradient Descent in Wide Neural Networks

Natural Gradient Descent (NGD) helps to accelerate the convergence of gr...
research
01/25/2022

Efficient Approximations of the Fisher Matrix in Neural Networks using Kronecker Product Singular Value Decomposition

Several studies have shown the ability of natural gradient descent to mi...
research
10/11/2022

Component-Wise Natural Gradient Descent – An Efficient Neural Network Optimization

Natural Gradient Descent (NGD) is a second-order neural network training...
research
04/20/2012

Numerical Analysis of Diagonal-Preserving, Ripple-Minimizing and Low-Pass Image Resampling Methods

Image resampling is a necessary component of any operation that changes ...
