A Formalization of The Natural Gradient Method for General Similarity Measures

02/24/2019
by Anton Mallasto et al.

In optimization, the natural gradient method is well-known for likelihood maximization. The method uses the Kullback-Leibler divergence, corresponding infinitesimally to the Fisher-Rao metric, which is pulled back to the parameter space of a family of probability distributions. This way, gradients with respect to the parameters respect the Fisher-Rao geometry of the space of distributions, which might differ vastly from the standard Euclidean geometry of the parameter space, often leading to faster convergence. However, when minimizing an arbitrary similarity measure between distributions, it is generally unclear which metric to use. We provide a general framework that, given a similarity measure, derives a metric for the natural gradient. We then discuss connections between the natural gradient method and multiple other optimization techniques in the literature. Finally, we provide computations of the formal natural gradient to show overlap with well-known cases and to compute natural gradients in novel frameworks.
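The paper's central construction can be made concrete with a small numerical experiment. Below is a minimal sketch (not the authors' code; the Gaussian family, the parameterization theta = (mu, log sigma), and all function names are assumptions for illustration): given a similarity measure D, take the metric G(theta) to be the Hessian of d -> D(p_theta, p_{theta+d}) at d = 0, which for the KL divergence recovers the Fisher information matrix, and update by theta <- theta - lr * G(theta)^{-1} grad L(theta).

```python
# Minimal sketch of the formal natural gradient, assuming a 1-D Gaussian
# family parameterized by theta = (mu, log sigma) and the KL divergence
# as the similarity measure. Names and constants are illustrative.
import numpy as np

def kl_gauss(theta_a, theta_b):
    """Closed-form KL( N(mu_a, sigma_a^2) || N(mu_b, sigma_b^2) )."""
    mu_a, s_a = theta_a[0], np.exp(theta_a[1])
    mu_b, s_b = theta_b[0], np.exp(theta_b[1])
    return np.log(s_b / s_a) + (s_a**2 + (mu_a - mu_b)**2) / (2 * s_b**2) - 0.5

def metric_from_divergence(div, theta, eps=1e-4):
    """Hessian of d -> div(theta, theta + d) at d = 0, by central differences.

    This is the metric the framework associates with the similarity
    measure `div`; for the KL divergence it is the Fisher information."""
    n = len(theta)
    G = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            e_i, e_j = eps * np.eye(n)[i], eps * np.eye(n)[j]
            G[i, j] = (div(theta, theta + e_i + e_j)
                       - div(theta, theta + e_i - e_j)
                       - div(theta, theta - e_i + e_j)
                       + div(theta, theta - e_i - e_j)) / (4 * eps**2)
    return G

def natural_gradient_step(theta, grad, div, lr=0.1):
    """One update: theta <- theta - lr * G(theta)^{-1} grad."""
    G = metric_from_divergence(div, theta)
    return theta - lr * np.linalg.solve(G, grad)

theta = np.array([0.0, 0.0])  # mu = 0, sigma = 1
print(np.round(metric_from_divergence(kl_gauss, theta), 3))
# ~ [[1, 0], [0, 2]]: diag(1/sigma^2, 2), the Fisher-Rao metric
# in this parameterization, matching the abstract's claim that KL
# corresponds infinitesimally to the Fisher-Rao metric.
```

In the spirit of the paper's generality, swapping `kl_gauss` for another similarity measure (say, a closed-form squared Wasserstein distance between Gaussians) would make the same two routines produce the corresponding formal natural gradient without further derivation.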


Related research

Invariance Properties of the Natural Gradient in Overparametrised Systems (06/30/2022)
The natural gradient field is a vector field that lives on a model equip...

On the Locality of the Natural Gradient for Deep Learning (05/21/2020)
We study the natural gradient method for learning in deep Bayesian netwo...

Fisher SAM: Information Geometry and Sharpness Aware Minimisation (06/10/2022)
Recent sharpness-aware minimisation (SAM) is known to find flat minima w...

Objective Improvement in Information-Geometric Optimization (11/16/2012)
Information-Geometric Optimization (IGO) is a unified framework of stoch...

Natural differentiable structures on statistical models and the Fisher metric (08/13/2022)
In this paper I discuss the relation between the concept of the Fisher m...

Kernelized Wasserstein Natural Gradient (10/21/2019)
Many machine learning problems can be expressed as the optimization of s...

Theoretical foundation for CMA-ES from information geometric perspective (06/04/2012)
This paper explores the theoretical basis of the covariance matrix adapt...
