
Bayes-Newton Methods for Approximate Bayesian Inference with PSD Guarantees

by William J. Wilkinson, et al.

We formulate natural gradient variational inference (VI), expectation propagation (EP), and posterior linearisation (PL) as extensions of Newton's method for optimising the parameters of a Bayesian posterior distribution. This viewpoint explicitly casts inference algorithms in the framework of numerical optimisation. We show that common approximations to Newton's method from the optimisation literature, namely Gauss-Newton and quasi-Newton methods (e.g., the BFGS algorithm), remain valid under this 'Bayes-Newton' framework. This leads to a suite of novel algorithms that are guaranteed to produce positive semi-definite (PSD) covariance matrices, unlike standard VI and EP. Our unifying viewpoint provides new insights into the connections between various inference schemes. All the presented methods apply to any model with a Gaussian prior and non-conjugate likelihood, which we demonstrate with (sparse) Gaussian processes and state space models.
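The PSD issue the abstract describes can be illustrated with a minimal scalar sketch (not the paper's exact algorithm): with a Gaussian prior and a non-log-concave likelihood, a Newton-style precision update can go negative, while a Gauss-Newton-style surrogate cannot. The Student-t likelihood and the squared-gradient (empirical-Fisher) surrogate below are illustrative assumptions chosen because the Student-t log-density has regions of positive curvature.

```python
# Hedged sketch: scalar model with Gaussian prior f ~ N(0, k) and a
# heavy-tailed Student-t likelihood. The Newton-style posterior precision is
#     posterior_precision = 1/k - d^2/df^2 log p(y | f).
# The Student-t log-likelihood is not concave, so its Hessian can be
# positive, which can push the precision (and hence the covariance)
# negative. A squared-gradient (empirical-Fisher) surrogate, used here in
# the Gauss-Newton spirit, is non-negative by construction, so the
# updated precision stays valid.

nu = 3.0           # Student-t degrees of freedom
y, f = 5.0, 0.0    # observation far from the current mean, so r**2 > nu
r = y - f

# Exact Hessian of log p(y|f) w.r.t. f: -(nu + 1)(nu - r^2) / (nu + r^2)^2
hess_loglik = -(nu + 1.0) * (nu - r**2) / (nu + r**2) ** 2
W_exact = -hess_loglik            # Newton curvature term; negative here

grad_loglik = (nu + 1.0) * r / (nu + r**2)
W_gn = grad_loglik**2             # PSD surrogate; always >= 0

prior_prec = 1.0 / 100.0          # weak prior, variance k = 100

newton_prec = prior_prec + W_exact  # negative -> invalid covariance
gn_prec = prior_prec + W_gn         # guaranteed positive

print(f"Newton precision: {newton_prec:.4f}, Gauss-Newton precision: {gn_prec:.4f}")
```

In the multivariate case the same argument applies matrix-wise: the exact Hessian term can be indefinite, whereas Gauss-Newton or Fisher-style curvature terms are PSD, keeping the covariance valid.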




Disentangling the Gauss-Newton Method and Approximate Inference for Neural Networks

In this thesis, we disentangle the generalized Gauss-Newton and approxim...

State Space Expectation Propagation: Efficient Inference Schemes for Temporal Gaussian Processes

We formulate approximate Bayesian inference in non-conjugate temporal an...

Sparse Algorithms for Markovian Gaussian Processes

Approximate Bayesian inference methods that scale to very large datasets...

Probabilistic Interpretation of Linear Solvers

This manuscript proposes a probabilistic framework for algorithms that i...

Fast Bayesian Coresets via Subsampling and Quasi-Newton Refinement

Bayesian coresets approximate a posterior distribution by building a sma...

Quasi-Bayes properties of a recursive procedure for mixtures

Bayesian methods are attractive and often optimal, yet nowadays pressure...

Multiple Kernel Learning: A Unifying Probabilistic Viewpoint

We present a probabilistic viewpoint to multiple kernel learning unifyin...