Rethinking Exponential Averaging of the Fisher

04/10/2022
by   Constantin Octavian Puiu, et al.

In optimization for machine learning (ML), curvature-matrix (CM) estimates typically rely on an exponential average (EA) of local estimates (giving EA-CM algorithms). This approach has little principled justification, yet it is widely used in practice. In this paper, we draw a connection between EA-CM algorithms and what we call a "Wake of Quadratic-regularized models". The outlined connection allows us to understand what EA-CM algorithms are doing from an optimization perspective. Generalizing from the established connection, we propose a new family of algorithms, "KL-Divergence Wake-Regularized Models" (KLD-WRM). We give three different practical instantiations of KLD-WRM and show numerical results where we outperform K-FAC.
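For readers unfamiliar with the EA-CM pattern the abstract refers to, the sketch below illustrates the general idea: at each step a local curvature-matrix estimate is folded into a running exponential average, which is then used to precondition the gradient step. This is only a minimal illustration, assuming a diagonal empirical Fisher approximation; the function name `ea_fisher_step` and all hyper-parameters are illustrative choices, not the paper's KLD-WRM algorithms or the K-FAC implementation.

```python
import numpy as np

def ea_fisher_step(theta, grad, fisher_ema, lr=1e-2, rho=0.95, damping=1e-3):
    """One EA-CM update with a diagonal Fisher approximation (illustrative).

    theta      : parameter vector
    grad       : stochastic gradient at theta
    fisher_ema : running exponential average of diagonal Fisher estimates
    rho        : exponential-averaging factor (rho=0 keeps only the local
                 estimate; rho close to 1 averages over a long history)
    """
    local_fisher = grad ** 2                                     # local diagonal Fisher estimate
    fisher_ema = rho * fisher_ema + (1.0 - rho) * local_fisher   # exponential average of CM estimates
    step = grad / (fisher_ema + damping)                         # preconditioned (natural-gradient-like) step
    return theta - lr * step, fisher_ema

# Toy usage on a least-squares objective 0.5 * ||A theta - b||^2.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 5)), rng.standard_normal(20)
theta, fisher_ema = np.zeros(5), np.zeros(5)
for _ in range(200):
    grad = A.T @ (A @ theta - b)
    theta, fisher_ema = ea_fisher_step(theta, grad, fisher_ema)
```

The exponential average is what the paper reinterprets: rather than treating it as an ad hoc smoothing heuristic, the authors connect it to a sequence ("wake") of regularized local models, which motivates replacing the quadratic regularizer with a KL-divergence one.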
