An ABC interpretation of the multiple auxiliary variable method

We show that the auxiliary variable method (Møller et al., 2006; Murray et al., 2006) for inference of Markov random fields can be viewed as an approximate Bayesian computation method for likelihood estimation.

1 Introduction

Markov random fields (MRFs) have densities of the form

$$f(y|\theta) = \gamma(y|\theta) / Z(\theta), \tag{1}$$

where $\gamma(y|\theta)$ can be evaluated numerically but $Z(\theta)$ cannot be computed in a reasonable time. This makes inference challenging.
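To make the cost of $Z(\theta)$ concrete, here is a minimal sketch using a one-dimensional Ising chain as a stand-in MRF. The model, the function names `gamma` and `brute_force_Z`, and the parameter values are illustrative assumptions rather than anything specified in this note.

```python
import itertools
import math

def gamma(y, theta):
    """Unnormalised density of a 1-D Ising chain (assumed example model):
    exp(theta * sum of products of neighbouring spins)."""
    return math.exp(theta * sum(u * v for u, v in zip(y, y[1:])))

def brute_force_Z(n_sites, theta):
    """Normalising constant by summing gamma over all 2**n_sites configurations."""
    states = itertools.product((-1, 1), repeat=n_sites)
    return sum(gamma(y, theta) for y in states)

# gamma is cheap to evaluate for a single configuration...
print(gamma((1, 1, -1, 1), 0.3))
# ...but the sum defining Z has 2**n_sites terms, so it is only feasible for tiny models.
print(brute_force_Z(4, 0.3))
```

For this particular chain a closed form for $Z(\theta)$ happens to exist, which makes it convenient for checking code, but for general lattices or graphs the enumeration above is the only direct route and its cost doubles with every added site.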

This note considers two approaches which both use simulation from $f(\cdot|\theta)$. The single auxiliary variable (SAV) method (Møller et al., 2006) and the multiple auxiliary variable (MAV) method (Murray et al., 2006) provide unbiased likelihood estimates. Approximate Bayesian computation (ABC) (Marin et al., 2012) finds parameters which produce simulations similar to the observed data. We will demonstrate that these two approaches are in fact closely linked.

An additional challenge for inference of MRFs is that exact sampling from $f(\cdot|\theta)$ is difficult. It is possible to implement Markov chain Monte Carlo (MCMC) algorithms which sample from a close approximation to this distribution. These MCMC algorithms have been used for inference as a replacement for an exact sampler in SAV and MAV (Caimo and Friel, 2011; Everitt, 2012) as well as in ABC (Grelaud et al., 2009). We will use this approach and discuss it further below.

The remainder of the paper is organised as follows. Section 2 reviews ABC and MAV methods, and Section 3 derives our result. Throughout the paper $y$ refers to an observed dataset, and $x$ variables refer to simulated datasets used in inference.

2 Background

2.1 Auxiliary variable methods

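The SAV method makes use of an unbiased estimate of $f(y|\theta)$, obtained from the following importance sampling (IS) estimate of $1/Z(\theta)$:

$$\widehat{1/Z} = \frac{q_x(x|y,\theta)}{\gamma(x|\theta)},$$

where $q_x$ is some arbitrary (normalised) density and $x \sim f(\cdot|\theta)$. MAV extends this idea by instead using annealed IS (AIS) (Neal, 2001) for this estimate:

$$\widehat{1/Z} = \prod_{i=2}^{a} \frac{\gamma_{i-1}(x_i|\theta,y)}{\gamma_i(x_i|\theta,y)}, \tag{2}$$

where $\gamma_1, \ldots, \gamma_a$ are bridging densities between $\gamma_1(\cdot|\theta,y) = q_x(\cdot|y,\theta)$ and $\gamma_a(\cdot|\theta,y) = \gamma(\cdot|\theta)$, $x_a \sim f(\cdot|\theta)$, and $x_i \sim K_i(\cdot|x_{i+1})$ for $2 \le i < a$, where $K_i$ is a reversible Markov kernel with invariant distribution $f_i(\cdot|\theta,y) \propto \gamma_i(\cdot|\theta,y)$. In this description we have imposed that $\gamma_1$ is normalised in order to obtain an estimate of $1/Z(\theta)$. However, we note that a common choice for $q_x(\cdot|y,\theta)$ is $\gamma(\cdot|\hat\theta)$ for some estimate $\hat\theta$, in which case the normalising constant is not available. In this case we obtain an estimate of $Z(\hat\theta)/Z(\theta)$ from Equation (2).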
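The estimator in Equation (2) can be sketched in code under some loudly stated assumptions: the model is a tiny one-dimensional Ising chain so that exact sampling is possible by enumeration, the bridging densities interpolate linearly in the parameter between $\hat\theta$ and $\theta$, and each kernel is a single-site Metropolis move. All function names below are hypothetical, and this is an illustration rather than the implementation used in the cited papers.

```python
import itertools
import math
import random

def s(x):
    """Sufficient statistic of the assumed 1-D Ising chain."""
    return sum(u * v for u, v in zip(x, x[1:]))

def exact_sample(n_sites, theta, rng):
    """Exact draw from f(.|theta) by enumeration (feasible only for tiny models)."""
    states = list(itertools.product((-1, 1), repeat=n_sites))
    weights = [math.exp(theta * s(x)) for x in states]
    return list(rng.choices(states, weights=weights)[0])

def metropolis_step(x, theta_i, rng):
    """Single-site-flip Metropolis move, reversible with respect to f(.|theta_i)."""
    j = rng.randrange(len(x))
    proposal = x[:j] + [-x[j]] + x[j + 1:]
    if rng.random() < math.exp(theta_i * (s(proposal) - s(x))):
        return proposal
    return x

def mav_estimate(theta, theta_hat, n_sites, a, rng):
    """One draw of Equation (2) with gamma_i(x) = exp(theta_i * s(x))."""
    # thetas[i - 1] parametrises gamma_i: theta_hat at i = 1, theta at i = a.
    thetas = [theta_hat + (theta - theta_hat) * (i - 1) / (a - 1)
              for i in range(1, a + 1)]
    x = exact_sample(n_sites, theta, rng)               # x_a ~ f(.|theta)
    log_w = 0.0
    for i in range(a, 1, -1):                           # i = a, a-1, ..., 2
        if i < a:
            x = metropolis_step(x, thetas[i - 1], rng)  # x_i ~ K_i(.|x_{i+1})
        # log gamma_{i-1}(x_i) - log gamma_i(x_i)
        log_w += (thetas[i - 2] - thetas[i - 1]) * s(x)
    return math.exp(log_w)

rng = random.Random(1)
draws = [mav_estimate(0.4, 0.2, n_sites=4, a=5, rng=rng) for _ in range(4000)]
print(sum(draws) / len(draws))  # Monte Carlo average of the estimator
```

Because $\gamma_1 = \gamma(\cdot|\hat\theta)$ is unnormalised here, the average approximates $Z(\hat\theta)/Z(\theta)$ rather than $1/Z(\theta)$, in line with the remark above.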

The SAV and MAV estimates are usually used as constituent parts of other Monte Carlo algorithms for parameter inference: in MCMC (Møller et al., 2006) or IS (Everitt et al., 2016). The estimates of $f(y|\theta)$ just described may be used here since only an unbiased estimate of the posterior up to proportionality is required (Andrieu and Roberts, 2009).

As noted in the introduction, the requirement of being able to draw exactly from $f(\cdot|\theta)$ is potentially problematic. Caimo and Friel (2011) and Everitt (2012) explore the possibility of replacing this exact sampler with a long run of an MCMC sampler targeting $f(\cdot|\theta)$, and taking the final point. Such an approach results in biased estimates of $1/Z(\theta)$, although this bias goes to zero as the number of MCMC iterations tends to infinity. Everitt et al. (2016) observe empirically that a similar argument appears to hold when this run is short but $a$ is large.

2.2 Approximate Bayesian computation

ABC refers to a family of inference algorithms (described in Marin et al., 2012) which approximate Bayesian inference when numerical evaluation of the likelihood function is intractable. They instead use simulation from the model of interest. The core of these algorithms is producing estimates of the likelihood $f(y|\theta)$ using some version of the following method. Simulate a dataset $x$ from $f(\cdot|\theta)$ and return the ABC likelihood estimate:

$$L_{ABC} = \mathbb{1}(\|y - x\| \le \epsilon).$$

Here $\mathbb{1}$ represents an indicator function, $\|\cdot\|$ is some distance norm, and the acceptance threshold $\epsilon$ is a tuning parameter. The expectation of the random variable $L_{ABC}$ is

$$\int f(x|\theta) \, \mathbb{1}(\|y - x\| \le \epsilon) \, dx.$$

This is often referred to as the ABC likelihood. It is proportional to a convolution of the likelihood with a uniform density, evaluated at $y$. For $\epsilon > 0$ this is generally an inexact approximation to the likelihood. For discrete data it is possible to use $\epsilon = 0$, in which case the ABC likelihood equals the exact likelihood, and so $L_{ABC}$ is unbiased.
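As a minimal sketch of the ABC likelihood estimate for discrete data, the code below again assumes a tiny one-dimensional Ising chain that can be sampled exactly by enumeration, and uses the Hamming distance as the distance. The names `simulate` and `abc_likelihood_estimate`, the model, and the distance are illustrative assumptions.

```python
import itertools
import math
import random

def simulate(n_sites, theta, rng):
    """Exact draw from the assumed 1-D Ising chain by enumerating all states."""
    states = list(itertools.product((-1, 1), repeat=n_sites))
    weights = [math.exp(theta * sum(u * v for u, v in zip(x, x[1:])))
               for x in states]
    return rng.choices(states, weights=weights)[0]

def abc_likelihood_estimate(y, theta, eps, rng):
    """Single-simulation ABC estimate: 1 if the Hamming distance between the
    simulated x and the observed y is at most eps, and 0 otherwise."""
    x = simulate(len(y), theta, rng)
    distance = sum(u != v for u, v in zip(x, y))
    return 1.0 if distance <= eps else 0.0

rng = random.Random(0)
y_obs = (1, 1, 1)
n_sims = 5000
# Averaging indicator draws with eps = 0 targets the exact likelihood f(y|theta).
estimate = sum(abc_likelihood_estimate(y_obs, 0.5, 0, rng)
               for _ in range(n_sims)) / n_sims
print(estimate)
```

With `eps = 0` the average is an unbiased estimate of the exact likelihood. Raising `eps` accepts more simulations, which reduces the variance at the price of targeting the smoothed ABC likelihood instead.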

For MRFs it is observed empirically that, compared with competitors such as the exchange algorithm (Murray et al., 2006), ABC requires a relatively large number of simulations to yield an efficient algorithm (Friel, 2013).

3 Derivation

3.1 ABC for MRF models

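Suppose that the model $f(y|\theta)$ has an intractable likelihood but can be targeted by an MCMC chain $x = (x_1, x_2, \ldots, x_n)$. Let $\pi$ represent densities relating to this chain. Then $\pi_n(y|\theta) := \pi(x_n = y|\theta)$ is an approximation of $f(y|\theta)$ which can be estimated by ABC. For now suppose that $y$ is discrete and consider the ABC likelihood estimate requiring an exact match: simulate $x$ from $\pi(x|\theta)$ and return $\mathbb{1}(x_n = y)$. We will consider an IS variation on this: simulate $x$ from $g(x|\theta)$ and return $\mathbb{1}(x_n = y) \, \pi(x|\theta) / g(x|\theta)$. Under the mild assumption that $g(x|\theta)$ has the same support as $\pi(x|\theta)$ (typically true unless $n$ is small), both estimates have the expectation $\pi_n(y|\theta)$.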

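This can be generalised to cover continuous data using the identity

$$\pi_n(y|\theta) = \int_{x_n = y} \pi(x|\theta) \, dx_{1:n-1},$$

where $x_{i:j}$ represents $(x_i, x_{i+1}, \ldots, x_j)$. An importance sampling estimate of this integral is

$$w = \frac{\pi(x|\theta)}{g(x_{1:n-1}|\theta)}, \tag{3}$$

where $x$ is sampled from $g(x_{1:n-1}|\theta) \, \delta(x_n = y)$, with $\delta$ representing a Dirac delta measure. Then, under mild conditions on the support of $g$, $w$ is an unbiased estimate of $\pi_n(y|\theta)$.

The ideal choice of $g$ is $\pi(x_{1:n-1}|x_n = y, \theta)$, as then $w = \pi_n(y|\theta)$ exactly. This represents sampling from the Markov chain conditional on its final state being $y$.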
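The importance sampling estimate in Equation (3) can be checked numerically on a toy problem. The sketch below assumes a hypothetical three-state Markov chain (states indexed 0 to 2) in place of an MCMC chain for an MRF, and a deliberately crude importance density that draws the earlier states uniformly; neither choice comes from the paper.

```python
import random

# Hypothetical forward chain standing in for pi(x|theta).
P = [[0.8, 0.2, 0.0],
     [0.1, 0.8, 0.1],
     [0.0, 0.3, 0.7]]       # P[j][k] = Pr(x_{i+1} = k | x_i = j)
P1 = [1 / 3, 1 / 3, 1 / 3]  # distribution of x_1

def exact_pi_n(y, n):
    """pi_n(y): marginal probability that x_n = y, by forward propagation."""
    dist = P1[:]
    for _ in range(n - 1):
        dist = [sum(dist[j] * P[j][k] for j in range(3)) for k in range(3)]
    return dist[y]

def is_estimate(y, n, rng):
    """One draw of w = pi(x) / g(x_{1:n-1}) with the final state fixed at y."""
    # g: each of x_1, ..., x_{n-1} drawn uniformly, independently of y.
    x = [rng.randrange(3) for _ in range(n - 1)] + [y]
    g = (1 / 3) ** (n - 1)
    pi = P1[x[0]]
    for i in range(n - 1):
        pi *= P[x[i]][x[i + 1]]
    return pi / g

rng = random.Random(0)
draws = [is_estimate(1, 4, rng) for _ in range(20000)]
print(sum(draws) / len(draws), exact_pi_n(1, 4))
```

The uniform proposal has full support, so the average of `draws` is unbiased for `exact_pi_n(1, 4)`, but many draws have weight zero. A proposal closer to the chain conditioned on its final state would reduce the variance.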

3.2 Equivalence to MAV

We now show that natural choices of $\pi$ and $g$ in the ABC method just outlined result in the MAV estimator (2). Our choices are

$$g(x_{1:n-1}|\theta) = \prod_{i=1}^{n-1} K_i(x_i|x_{i+1}), \qquad \pi(x|\theta) = f_1(x_1|\theta,y) \prod_{i=1}^{n-1} K_i(x_{i+1}|x_i).$$

Here $\pi$ defines an MCMC chain with transitions $K_i(x_{i+1}|x_i)$. Suppose $K_i$ is as in Section 2.1 for $i < a$, and for $i \ge a$ it is a reversible Markov kernel with invariant distribution $f(\cdot|\theta)$. Also assume $n \gg a$. Then the MCMC chain ends in a long sequence of steps targeting $f(\cdot|\theta)$ so that $\pi_n(\cdot|\theta) \to f(\cdot|\theta)$ as $n \to \infty$. Thus the likelihood being estimated converges on the true likelihood for large $n$. Note this is the case even for fixed $a$.

The importance density $g$ specifies a reverse time MCMC chain starting from $x_n = y$ with transitions $K_i(x_i|x_{i+1})$. Simulating $x$ is straightforward by sampling $x_{n-1}$, then $x_{n-2}$ and so on. This importance density is an approximation to the ideal choice stated at the end of Section 3.1.

The resulting likelihood estimator is

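$$w = f_1(x_1|\theta,y) \prod_{i=1}^{n-1} \frac{K_i(x_{i+1}|x_i)}{K_i(x_i|x_{i+1})}.$$

Using detailed balance gives

$$\frac{K_i(x_{i+1}|x_i)}{K_i(x_i|x_{i+1})} = \frac{f_i(x_{i+1}|\theta,y)}{f_i(x_i|\theta,y)} = \frac{\gamma_i(x_{i+1}|\theta,y)}{\gamma_i(x_i|\theta,y)},$$

so that

$$w = f_1(x_1|\theta,y) \prod_{i=1}^{n-1} \frac{\gamma_i(x_{i+1}|\theta,y)}{\gamma_i(x_i|\theta,y)} = \gamma(y|\theta) \prod_{i=2}^{n} \frac{\gamma_{i-1}(x_i|\theta,y)}{\gamma_i(x_i|\theta,y)}.$$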
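The final equality is a telescoping product. A sketch of the intermediate steps follows, assuming (as the equality itself requires) that $f_1 = \gamma_1$ is normalised, that $x_n = y$, and that $\gamma_n(\cdot|\theta,y) = \gamma(\cdot|\theta)$:

```latex
\begin{align*}
f_1(x_1|\theta,y) \prod_{i=1}^{n-1} \frac{\gamma_i(x_{i+1}|\theta,y)}{\gamma_i(x_i|\theta,y)}
  &= \gamma_1(x_1|\theta,y) \,
     \frac{\prod_{i=2}^{n} \gamma_{i-1}(x_i|\theta,y)}{\prod_{i=1}^{n-1} \gamma_i(x_i|\theta,y)} \\
  &= \gamma_n(x_n|\theta,y) \prod_{i=2}^{n} \frac{\gamma_{i-1}(x_i|\theta,y)}{\gamma_i(x_i|\theta,y)} \\
  &= \gamma(y|\theta) \prod_{i=2}^{n} \frac{\gamma_{i-1}(x_i|\theta,y)}{\gamma_i(x_i|\theta,y)}.
\end{align*}
```

The first line re-indexes the numerator, the second cancels $\gamma_1(x_1|\theta,y)$ against the $i = 1$ term of the denominator and multiplies and divides by $\gamma_n(x_n|\theta,y)$, and the third substitutes $x_n = y$.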

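This is an unbiased estimator of $\pi_n(y|\theta)$. Hence

$$v = \prod_{i=2}^{n} \frac{\gamma_{i-1}(x_i|\theta,y)}{\gamma_i(x_i|\theta,y)} = \prod_{i=2}^{a} \frac{\gamma_{i-1}(x_i|\theta,y)}{\gamma_i(x_i|\theta,y)}$$

is an unbiased estimator of $\pi_n(y|\theta)/\gamma(y|\theta) \approx 1/Z(\theta)$. The second equality holds because the factors with $i > a$ all equal one. In the above we have assumed, as in Section 2.1, that $\gamma_1$ is normalised. When this is not the case then we instead get an estimator of $1/Z(\theta)$ up to proportionality, as for MAV methods. Also note that in either case a valid estimator is produced for any choice of $y$.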

The ABC estimate can be viewed as a two-stage procedure. First run an MCMC chain of length $n - a$ with any starting value, targeting $f(\cdot|\theta)$. Let its final value be $x_a$. Secondly run an MCMC chain $x_{a-1}, x_{a-2}, \ldots, x_2$ using kernels $K_{a-1}, K_{a-2}, \ldots, K_2$ and evaluate the estimator $v$. This is unbiased in the limit $n \to \infty$, so the first stage could be replaced by perfect sampling methods where these exist.

The resulting procedure is thus equivalent to that for MAV.
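The two-stage view can be sketched as follows, under the same illustrative assumptions as before: a tiny one-dimensional Ising chain, bridging densities linear in the parameter between $\hat\theta$ and $\theta$, and single-site Metropolis kernels. The function names and parameter values are hypothetical. The first stage replaces the exact sampler with a chain started at the observed data.

```python
import math
import random

def s(x):
    """Sufficient statistic of the assumed 1-D Ising chain."""
    return sum(u * v for u, v in zip(x, x[1:]))

def metropolis_step(x, theta_i, rng):
    """Single-site-flip Metropolis move, reversible with respect to f(.|theta_i)."""
    j = rng.randrange(len(x))
    proposal = x[:j] + [-x[j]] + x[j + 1:]
    if rng.random() < math.exp(theta_i * (s(proposal) - s(x))):
        return proposal
    return x

def two_stage_estimate(y, theta, theta_hat, a, n, rng):
    """One draw of v using an MCMC first stage instead of an exact sampler."""
    # thetas[i - 1] parametrises gamma_i: theta_hat at i = 1, theta at i = a.
    thetas = [theta_hat + (theta - theta_hat) * (i - 1) / (a - 1)
              for i in range(1, a + 1)]
    # Stage 1: n - a steps targeting f(.|theta), started at y; final state is x_a.
    x = list(y)
    for _ in range(n - a):
        x = metropolis_step(x, theta, rng)
    # Stage 2: anneal from x_a down to x_2, accumulating the factors of v.
    log_v = 0.0
    for i in range(a, 1, -1):
        if i < a:
            x = metropolis_step(x, thetas[i - 1], rng)  # x_i ~ K_i(.|x_{i+1})
        log_v += (thetas[i - 2] - thetas[i - 1]) * s(x)
    return math.exp(log_v)

rng = random.Random(2)
draws = [two_stage_estimate((1, 1, 1, 1), 0.4, 0.2, a=5, n=45, rng=rng)
         for _ in range(4000)]
print(sum(draws) / len(draws))  # approximates Z(theta_hat) / Z(theta) for large n
```

For finite `n` the average carries a bias that depends on how well the first-stage chain has mixed. On this small example 40 first-stage steps appear ample, but that is a property of the toy model and not a general recommendation.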

4 Conclusion

We have demonstrated that the MAV method can be interpreted as an ABC algorithm. We hope this insight will be useful for the development of novel methods for MRFs.

References

• Andrieu and Roberts (2009) Andrieu, C. and Roberts, G. O. (2009). The pseudo-marginal approach for efficient Monte Carlo computations. The Annals of Statistics, pages 697–725.
• Caimo and Friel (2011) Caimo, A. and Friel, N. (2011). Bayesian inference for exponential random graph models. Social Networks, 33:41–55.
• Everitt (2012) Everitt, R. G. (2012). Bayesian parameter estimation for latent Markov random fields and social networks. Journal of Computational and Graphical Statistics, 21:940–960.
• Everitt et al. (2016) Everitt, R. G., Johansen, A. M., Rowing, E., and Evdemon-Hogan, M. (2016). Bayesian model comparison with un-normalised likelihoods. Statistics and Computing: early online version.
• Friel (2013) Friel, N. (2013). Evidence and Bayes factor estimation for Gibbs random fields. Journal of Computational and Graphical Statistics, 22:518–532.
• Grelaud et al. (2009) Grelaud, A., Robert, C. P., Marin, J.-M., Rodolphe, F., and Taly, J. F. (2009). ABC likelihood-free methods for model choice in Gibbs random fields. Bayesian Analysis, 4(2):317–336.
• Marin et al. (2012) Marin, J.-M., Pudlo, P., Robert, C. P., and Ryder, R. J. (2012). Approximate Bayesian computational methods. Statistics and Computing, 22(6):1167–1180.
• Møller et al. (2006) Møller, J., Pettitt, A. N., Reeves, R. W., and Berthelsen, K. K. (2006). An efficient Markov chain Monte Carlo method for distributions with intractable normalising constants. Biometrika, 93:451–458.
• Murray et al. (2006) Murray, I., Ghahramani, Z., and MacKay, D. J. C. (2006). MCMC for doubly-intractable distributions. In Proceedings of the 22nd Annual Conference on Uncertainty in Artificial Intelligence (UAI), pages 359–366.
• Neal (2001) Neal, R. M. (2001). Annealed importance sampling. Statistics and Computing, 11:125–139.