Posterior inference unchained with EL_2O

01/14/2019
by Uros Seljak, et al.

Statistical inference of analytically intractable posteriors is a difficult problem because of the need to marginalize over correlated variables, and stochastic methods such as MCMC and VI are commonly used. We argue that the stochastic KL divergence minimization used by MCMC and VI is noisy, and we propose instead EL_2O, expectation optimization of the squared L_2 distance between the approximate log posterior q and the un-normalized log posterior p. When sampling from q, the solutions agree with those of stochastic KL divergence minimization based VI in the large-sample limit; however, the EL_2O method is free of sampling noise, has better optimization properties, and requires only as many sample evaluations as the number of parameters we are optimizing if q covers p. As a consequence, increasing the expressivity of q improves both the quality of results and the convergence rate, allowing EL_2O to approach exact inference. Use of automatic differentiation methods enables us to develop Hessian, gradient and gradient-free versions of the method, which can determine M(M+2)/2+1, M+1 and 1 parameter(s) of q with a single sample, respectively. EL_2O provides a reliable estimate of the quality of the approximating posterior, and converges rapidly on a full-rank Gaussian approximation for q and on extensions beyond it, such as nonlinear transformations and Gaussian mixtures. These can handle general posteriors while still allowing fast analytic marginalizations. We test it on several examples, including a realistic 13-dimensional galaxy clustering analysis, showing that it is several orders of magnitude faster than MCMC while giving smooth and accurate non-Gaussian posteriors, often requiring only a few to a few dozen iterations.
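To fix ideas, here is a minimal sketch in the spirit of the gradient version described in the abstract: the gradient of log q is matched to the gradient of the un-normalized log posterior in expectation over samples drawn from the current q. The toy 2D target, the parameterization of the full-rank Gaussian q through a factor L with covariance L L^T, the step size, and the plain gradient-descent loop are illustrative assumptions for this sketch, not the authors' implementation (the paper solves for the parameters of q more directly, e.g. analytically in the Gaussian case).

```python
import jax
import jax.numpy as jnp

# Toy un-normalized log posterior log p~ (a correlated 2D Gaussian), an assumption for illustration.
def log_p(z):
    prec = jnp.array([[2.0, 0.6], [0.6, 1.0]])   # precision matrix of the target
    mu = jnp.array([1.0, -0.5])
    d = z - mu
    return -0.5 * d @ prec @ d

# Full-rank Gaussian approximation q(z; mu_q, L) with covariance L L^T.
def log_q(z, mu_q, L):
    cov = L @ L.T
    d = z - mu_q
    return -0.5 * d @ jnp.linalg.inv(cov) @ d - 0.5 * jnp.linalg.slogdet(cov)[1]

grad_p = jax.grad(log_p)                 # gradient of log p~ w.r.t. z
grad_q = jax.grad(log_q, argnums=0)      # gradient of log q w.r.t. z

# EL_2O-style objective (gradient-matching flavor, a sketch): mean squared difference
# of the two gradients over a fixed batch of sample points.
def el2o_loss(params, samples):
    mu_q, L = params
    res = jax.vmap(lambda z: grad_q(z, mu_q, L) - grad_p(z))(samples)
    return jnp.mean(jnp.sum(res**2, axis=-1))

key = jax.random.PRNGKey(0)
mu_q, L = jnp.zeros(2), jnp.eye(2)
for it in range(50):
    key, sub = jax.random.split(key)
    eps = jax.random.normal(sub, (16, 2))
    samples = mu_q + eps @ L.T           # draw sample points from the current q
    loss, grads = jax.value_and_grad(el2o_loss)((mu_q, L), samples)
    mu_q = mu_q - 0.1 * grads[0]         # simple gradient-descent update (illustrative)
    L = L - 0.1 * grads[1]

print("fitted mean:", mu_q)
print("fitted covariance:\n", L @ L.T)
```

Each sample point supplies M gradient residuals plus the value residual, which is consistent with the abstract's statement that the gradient version can determine M+1 parameters of q from a single sample; the batch of 16 samples used above is larger than strictly necessary and is just a convenient choice for this sketch.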

