Kernelized Wasserstein Natural Gradient

10/21/2019
by Michael Arbel, et al.

Many machine learning problems can be expressed as the optimization of a cost functional over a parametric family of probability distributions. Such problems often benefit from natural gradient methods, which are invariant to the parametrization of the family and can therefore yield more effective optimization. Unfortunately, computing the natural gradient is challenging: it requires inverting a high-dimensional matrix at each iteration. We propose a general framework for approximating the natural gradient for the Wasserstein metric by leveraging a dual formulation of the metric restricted to a Reproducing Kernel Hilbert Space. Our approach yields an estimator of the natural gradient direction that can trade off accuracy against computational cost, with theoretical guarantees. We verify its accuracy on simple examples and empirically demonstrate its advantage in classification tasks on CIFAR-10 and CIFAR-100.
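To make the role of the metric concrete, here is a minimal sketch of a generic natural-gradient update, theta <- theta - lr * G(theta)^{-1} * grad L(theta), where the metric matrix G is supplied by some estimator. The paper's actual kernelized Wasserstein estimator (built from a dual formulation of the metric in an RKHS) is not reproduced here; the `estimate_metric` callable and the toy diagonal metric below are hypothetical stand-ins used only to illustrate the update and the damped linear solve that avoids forming an explicit inverse.

    # Illustrative sketch only; not the estimator from the paper.
    import numpy as np

    def natural_gradient_step(theta, grad, estimate_metric, lr=0.1, damping=1e-3):
        """One natural-gradient step using an estimated metric matrix.

        theta           : current parameters, shape (d,)
        grad            : Euclidean gradient of the loss at theta, shape (d,)
        estimate_metric : callable returning a (d, d) positive semi-definite
                          approximation of the metric at theta (assumption:
                          any plug-in estimator, e.g. kernel-based, fits here)
        """
        G = estimate_metric(theta)
        # Damping keeps the solve well-posed when the estimated metric is
        # rank-deficient, as finite-sample estimates typically are.
        direction = np.linalg.solve(G + damping * np.eye(len(theta)), grad)
        return theta - lr * direction

    # Toy usage with a hypothetical diagonal metric estimate.
    theta = np.zeros(3)
    grad = np.array([1.0, -2.0, 0.5])
    theta = natural_gradient_step(theta, grad, lambda t: np.diag([1.0, 4.0, 0.25]))
    print(theta)

Solving the damped linear system (rather than inverting G) is the standard way to keep each step at roughly the cost of a conjugate-gradient or Cholesky solve; the paper's contribution is an estimator of the Wasserstein metric itself that makes this step tractable.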


Related research:

- Natural gradient via optimal transport I (03/16/2018): We study a natural Wasserstein gradient flow on manifolds of probability...
- Stochastic Optimization for Regularized Wasserstein Estimators (02/20/2020): Optimal transport is a foundational problem in optimization, that allows...
- Sliced-Wasserstein Gradient Flows (10/21/2021): Minimizing functionals in the space of probability distributions can be ...
- Efficient Wasserstein Natural Gradients for Reinforcement Learning (10/12/2020): A novel optimization approach is proposed for application to policy grad...
- A Formalization of The Natural Gradient Method for General Similarity Measures (02/24/2019): In optimization, the natural gradient method is well-known for likelihoo...
- Adaptivity without Compromise: A Momentumized, Adaptive, Dual Averaged Gradient Method for Stochastic Optimization (01/26/2021): We introduce MADGRAD, a novel optimization method in the family of AdaGr...
- Scalable Kernel Methods via Doubly Stochastic Gradients (07/21/2014): The general perception is that kernel methods are not scalable, and neur...
