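Markov random fields (MRFs) have densities of the form
$$f(y \mid \theta) = \frac{\gamma(y \mid \theta)}{Z(\theta)}, \qquad (1)$$
where $\gamma(y \mid \theta)$ can be evaluated numerically but the normalising constant $Z(\theta)$ cannot be computed in a reasonable time. This makes inference challenging.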
This note considers two approaches which both use simulation from $f(\cdot \mid \theta)$. The single auxiliary variable (SAV) method (Møller et al., 2006) and the multiple auxiliary variable (MAV) method (Murray et al., 2006) provide unbiased likelihood estimates. Approximate Bayesian computation (ABC) (Marin et al., 2012) finds parameters which produce simulations similar to the observed data. We will demonstrate that these two approaches are in fact closely linked.
An additional challenge for inference of MRFs is that exact sampling from $f(\cdot \mid \theta)$ is difficult. It is possible to implement Markov chain Monte Carlo (MCMC) algorithms which sample from a close approximation to this distribution. These MCMC algorithms have been used for inference as a replacement for an exact sampler in SAV and MAV (Caimo and Friel, 2011; Everitt, 2012) as well as in ABC (Grelaud et al., 2009). We will use this approach and discuss it further below.
2.1 Auxiliary variable methods
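The SAV method makes use of an unbiased estimate of $f(y \mid \theta)$, obtained from the following importance sampling (IS) estimate of $1/Z(\theta)$:
$$\widehat{1/Z} = \frac{q_u(u \mid \theta, y)}{\gamma(u \mid \theta)},$$
where $q_u$ is some arbitrary (normalised) density and $u \sim f(\cdot \mid \theta)$. MAV extends this idea by instead using annealed IS (AIS) (Neal, 2001) for this estimate:
$$\widehat{1/Z} = \prod_{i=2}^{a+1} \frac{\gamma_{i-1}(u_i \mid \theta, y)}{\gamma_i(u_i \mid \theta, y)}, \qquad (2)$$
where $\gamma_i$ are bridging densities between $\gamma_{a+1}(\cdot \mid \theta, y) = \gamma(\cdot \mid \theta)$ and $\gamma_1(\cdot \mid \theta, y) = q_u(\cdot \mid \theta, y)$, $u_{a+1} \sim f(\cdot \mid \theta)$ and, for $2 \le i \le a$, $u_i \sim K_i(\cdot \mid u_{i+1})$, where $K_i$ is a reversible Markov kernel with invariant distribution proportional to $\gamma_i$. In this description we have imposed that $q_u$ is normalised in order to obtain an estimate of $1/Z(\theta)$. However, we note that a common choice for $q_u(\cdot \mid \theta, y)$ is $\gamma(\cdot \mid \hat{\theta})$ for some estimate $\hat{\theta}$, in which case the normalising constant $Z(\hat{\theta})$ is not available. In this case we obtain an estimate of $Z(\hat{\theta})/Z(\theta)$ from Equation (2).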
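As a rough illustration of the SAV and MAV estimates, the following Python sketch applies them to a toy model whose normalising constant can be computed by brute force. Everything specific here is an assumption made for illustration and is not taken from the cited papers: the five-point state space, the choice $\gamma(y \mid \theta) = e^{\theta y}$, a uniform $q_u$, a geometric bridge with $\gamma_1 = q_u$ and $\gamma_{a+1} = \gamma(\cdot \mid \theta)$, and a simple $\pm 1$ Metropolis kernel for each $K_i$.

```python
import math
import random

M = 5  # toy state space {0, ..., M-1}; an assumption for illustration

def gamma(y, theta):
    """Unnormalised density gamma(y | theta) = exp(theta * y)."""
    return math.exp(theta * y)

def Z(theta):
    """Normalising constant; computable here only because M is tiny."""
    return sum(gamma(y, theta) for y in range(M))

def sample_f(theta, rng):
    """Exact draw from f(. | theta) = gamma(. | theta) / Z(theta)."""
    weights = [gamma(y, theta) for y in range(M)]
    return rng.choices(list(range(M)), weights=weights)[0]

def q_u(u):
    """Normalised auxiliary density: uniform on the state space."""
    return 1.0 / M

def gamma_i(u, i, a, theta):
    """Geometric bridge with gamma_1 = q_u and gamma_{a+1} = gamma(. | theta)."""
    beta = (i - 1) / a
    return q_u(u) ** (1.0 - beta) * gamma(u, theta) ** beta

def metropolis_step(u, i, a, theta, rng):
    """One Metropolis move with a symmetric +/-1 proposal, reversible w.r.t. gamma_i."""
    prop = u + rng.choice((-1, 1))
    if prop < 0 or prop >= M:
        return u  # proposals outside the state space are rejected
    ratio = gamma_i(prop, i, a, theta) / gamma_i(u, i, a, theta)
    return prop if rng.random() < ratio else u

def mav_estimate(theta, a, rng):
    """One AIS estimate of 1 / Z(theta); a = 1 reduces to the SAV estimate."""
    u = sample_f(theta, rng)  # u_{a+1} ~ f(. | theta)
    est = gamma_i(u, a, a, theta) / gamma_i(u, a + 1, a, theta)
    for i in range(a, 1, -1):
        u = metropolis_step(u, i, a, theta, rng)  # u_i ~ K_i(. | u_{i+1})
        est *= gamma_i(u, i - 1, a, theta) / gamma_i(u, i, a, theta)
    return est

rng = random.Random(1)
theta, n_rep = 0.3, 20000
sav_mean = sum(mav_estimate(theta, 1, rng) for _ in range(n_rep)) / n_rep
mav_mean = sum(mav_estimate(theta, 5, rng) for _ in range(n_rep)) / n_rep
print(sav_mean, mav_mean, 1.0 / Z(theta))
```

Averaging many replicates should bring both means close to the brute-force value of $1/Z(\theta)$. In practice a single estimate is used inside a pseudo-marginal algorithm rather than averaged like this.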
The SAV and MAV estimates are usually used as constituent parts of other Monte Carlo algorithms for parameter inference: in MCMC (Møller et al., 2006) or IS (Everitt et al., 2016). The estimates of $f(y \mid \theta)$ just described may be used here since only an unbiased estimate of the posterior up to proportionality is required (Andrieu and Roberts, 2009).
As noted in the introduction, the requirement of being able to draw exactly from $f(\cdot \mid \theta)$ is potentially problematic. Caimo and Friel (2011) and Everitt (2012) explore the possibility of replacing this exact sampler with a long run (of $b$ iterations) of an MCMC sampler targeting $f(\cdot \mid \theta)$, and taking the final point. Such an approach results in biased estimates of $1/Z(\theta)$, although as $b \to \infty$ this bias goes to zero. Everitt et al. (2016) observe empirically that a similar argument appears to hold when $b = 0$ but $a$ is large.
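The effect of replacing the exact sampler by the end point of an MCMC run can be checked exactly on a small example. The sketch below is only an illustration under assumed choices: a five-point state space, $\gamma(y \mid \theta) = e^{\theta y}$, a uniform $q_u$, a $\pm 1$ Metropolis kernel and a fixed starting state. It propagates the chain's distribution for $b$ steps and computes the exact expectation of the SAV estimate $q_u(u)/\gamma(u \mid \theta)$ under that distribution, so the bias can be read off without Monte Carlo error.

```python
import math

M = 5  # toy state space {0, ..., M-1}; an assumption for illustration

def gamma(y, theta):
    """Unnormalised density gamma(y | theta) = exp(theta * y)."""
    return math.exp(theta * y)

def metropolis_matrix(theta):
    """Transition matrix P[x][x2] of a +/-1 Metropolis kernel targeting f(. | theta)."""
    P = [[0.0] * M for _ in range(M)]
    for x in range(M):
        for prop in (x - 1, x + 1):
            if 0 <= prop < M:
                P[x][prop] = 0.5 * min(1.0, gamma(prop, theta) / gamma(x, theta))
        P[x][x] = 1.0 - sum(P[x])  # remaining mass stays at x
    return P

def expected_sav(theta, b, start=0):
    """Exact expectation of the SAV estimate when u is the end point of b MCMC steps."""
    P = metropolis_matrix(theta)
    dist = [1.0 if x == start else 0.0 for x in range(M)]
    for _ in range(b):
        dist = [sum(dist[x] * P[x][x2] for x in range(M)) for x2 in range(M)]
    # q_u is uniform, so q_u(u) = 1 / M
    return sum(dist[u] * (1.0 / M) / gamma(u, theta) for u in range(M))

theta = 0.3
inv_Z = 1.0 / sum(gamma(y, theta) for y in range(M))
biases = [abs(expected_sav(theta, b) - inv_Z) for b in (0, 5, 50, 500)]
print(biases)
```

In this toy setting the absolute bias shrinks as $b$ increases, in line with the limiting argument above. How quickly it shrinks in a real MRF depends on how well the chain mixes.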
2.2 Approximate Bayesian computation
ABC refers to a family of inference algorithms (described in Marin et al., 2012) which perform an approximation to Bayesian inference when numerical evaluation of the likelihood function is intractable. They instead use simulation from the model of interest. The core of these algorithms is producing estimates of the likelihood $f(y \mid \theta)$ using some version of the following method. Simulate a dataset $x$ from $f(\cdot \mid \theta)$ and return the ABC likelihood estimate:
$$\hat{L}_{\mathrm{ABC}} = \mathbb{1}(\lVert y - x \rVert \le \epsilon).$$
Here $\mathbb{1}$ represents an indicator function, $\lVert \cdot \rVert$ is some distance norm, and the acceptance threshold $\epsilon$ is a tuning parameter. The expectation of the random variable $\hat{L}_{\mathrm{ABC}}$ is
$$L_{\mathrm{ABC}} = \int f(x \mid \theta)\, \mathbb{1}(\lVert y - x \rVert \le \epsilon)\, dx.$$
This is often referred to as the ABC likelihood. It is proportional to a convolution of the likelihood with a uniform density, evaluated at $y$. For $\epsilon > 0$ this is generally an inexact approximation to the likelihood. For discrete data it is possible to use $\epsilon = 0$, in which case the ABC likelihood equals the exact likelihood, and so $\hat{L}_{\mathrm{ABC}}$ is unbiased.
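A minimal Python sketch of the ABC likelihood estimate follows. The toy model (a five-point state space with $\gamma(y \mid \theta) = e^{\theta y}$) and the absolute difference used as the distance are assumptions chosen so that the exact likelihood is available for comparison. With a threshold of zero the Monte Carlo average should approach the exact likelihood, and with a threshold of one it should approach the total mass within distance one of $y$.

```python
import math
import random

M = 5  # toy state space {0, ..., M-1}; an assumption for illustration

def gamma(y, theta):
    """Unnormalised density gamma(y | theta) = exp(theta * y)."""
    return math.exp(theta * y)

def f(y, theta):
    """Exact likelihood; available here only because the space is tiny."""
    return gamma(y, theta) / sum(gamma(x, theta) for x in range(M))

def abc_estimate(y, theta, eps, rng):
    """One ABC likelihood estimate: simulate x ~ f(. | theta), return 1(|y - x| <= eps)."""
    weights = [gamma(z, theta) for z in range(M)]
    x = rng.choices(list(range(M)), weights=weights)[0]
    return 1.0 if abs(y - x) <= eps else 0.0

def abc_likelihood(y, theta, eps):
    """Expectation of abc_estimate: mass of f within distance eps of y."""
    return sum(f(x, theta) for x in range(M) if abs(y - x) <= eps)

rng = random.Random(2)
y, theta, n_rep = 3, 0.3, 20000
mean_eps0 = sum(abc_estimate(y, theta, 0, rng) for _ in range(n_rep)) / n_rep
mean_eps1 = sum(abc_estimate(y, theta, 1, rng) for _ in range(n_rep)) / n_rep
print(mean_eps0, f(y, theta))
print(mean_eps1, abc_likelihood(y, theta, 1))
```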
3.1 ABC for MRF models
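Suppose that the model $f(y \mid \theta)$ has an intractable likelihood but can be targeted by an MCMC chain $x = (x_1, x_2, \ldots, x_n)$. Let $\pi$ represent densities relating to this chain. Then $\pi(x_n = y \mid \theta)$ is an approximation of $f(y \mid \theta)$ which can be estimated by ABC. For now suppose that $y$ is discrete and consider the ABC likelihood estimate requiring an exact match: simulate $x$ from $\pi(x \mid \theta)$ and return $\mathbb{1}(x_n = y)$. We will consider an IS variation on this: simulate $x$ from $g(x \mid \theta)$ and return $\mathbb{1}(x_n = y)\, \pi(x \mid \theta) / g(x \mid \theta)$. Under the mild assumption that $g(x \mid \theta)$ has the same support as $\pi(x \mid \theta)$ (typically true unless $n$ is small), both estimates have the expectation $\Pr(x_n = y \mid \theta)$.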
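This can be generalised to cover continuous data using the identity
$$\pi(x_n = y \mid \theta) = \int \pi(x_{1:n-1}, x_n = y \mid \theta)\, dx_{1:n-1},$$
where $x_{i:j}$ represents $(x_i, x_{i+1}, \ldots, x_j)$. An importance sampling estimate of this integral is
$$w = \frac{\pi(x \mid \theta)}{g(x_{1:n-1} \mid \theta)},$$
where $x$ is sampled from $g(x_{1:n-1} \mid \theta)\, \delta(x_n = y)$, with $\delta$ representing a Dirac delta measure. Then, under mild conditions on the support of $g$, $w$ is an unbiased estimate of $\pi(x_n = y \mid \theta)$.
The ideal choice of $g(x_{1:n-1} \mid \theta)$ is $\pi(x_{1:n-1} \mid x_n = y, \theta)$, as then $w = \pi(x_n = y \mid \theta)$ exactly. This represents sampling from the Markov chain conditional on its final state being $y$.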
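The importance sampling estimate can be sketched on a small discrete chain where $\pi(x_n = y \mid \theta)$ is available by matrix multiplication. The setup is assumed purely for illustration: a five-point state space, a uniform initial density for $x_1$, a single $\pm 1$ Metropolis kernel `P` for every forward transition, and a proposal $g$ that moves backwards from $x_n = y$ using the same kernel. This is one possible $g$, not the ideal conditional choice.

```python
import math
import random

M = 5  # toy state space {0, ..., M-1}; an assumption for illustration

def gamma(y, theta):
    """Unnormalised density gamma(y | theta) = exp(theta * y)."""
    return math.exp(theta * y)

def metropolis_matrix(theta):
    """Transition matrix P[x][x2] of a +/-1 Metropolis kernel targeting f(. | theta)."""
    P = [[0.0] * M for _ in range(M)]
    for x in range(M):
        for prop in (x - 1, x + 1):
            if 0 <= prop < M:
                P[x][prop] = 0.5 * min(1.0, gamma(prop, theta) / gamma(x, theta))
        P[x][x] = 1.0 - sum(P[x])
    return P

def exact_pi_n(y, P, n):
    """pi(x_n = y | theta): push the uniform initial density through n - 1 steps."""
    dist = [1.0 / M] * M
    for _ in range(n - 1):
        dist = [sum(dist[x] * P[x][x2] for x in range(M)) for x2 in range(M)]
    return dist[y]

def is_weight(y, P, n, rng):
    """One draw of w = pi(x | theta) / g(x_{1:n-1} | theta) with x_n fixed at y."""
    w, x_next = 1.0, y
    for _ in range(n - 1):
        # g proposes the earlier state by applying the kernel from the later state
        x_prev = rng.choices(list(range(M)), weights=P[x_next])[0]
        w *= P[x_prev][x_next] / P[x_next][x_prev]  # forward over backward probability
        x_next = x_prev
    return w / M  # multiply by the uniform initial density of x_1

rng = random.Random(3)
theta, y, n, n_rep = 0.3, 3, 10, 20000
P = metropolis_matrix(theta)
mean_w = sum(is_weight(y, P, n, rng) for _ in range(n_rep)) / n_rep
print(mean_w, exact_pi_n(y, P, n))
```

Because every simulated path ends at $y$ by construction, no simulations are wasted on mismatches, which is the practical appeal of the IS variation over plain exact-match ABC.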
3.2 Equivalence to MAV
We now show that natural choices of $\pi$ and $g$ in the ABC method just outlined result in the MAV estimator (2). Our choices are
$$\pi(x \mid \theta) = q_u(x_1 \mid \theta, y) \prod_{i=2}^{n} K_i(x_i \mid x_{i-1}), \qquad g(x_{1:n-1} \mid \theta) = \prod_{i=2}^{n} K_i(x_{i-1} \mid x_i).$$
Here $\pi$ defines an MCMC chain with transitions $K_i$. Suppose $K_i$ is as in Section 2.1 for $i \le a$, and for $i > a$ it is a reversible Markov kernel with invariant distribution $f(\cdot \mid \theta)$. Also assume $n = a + b$. Then the MCMC chain ends in a long sequence of steps targeting $f(\cdot \mid \theta)$, so that $\pi(x_n = y \mid \theta) \approx f(y \mid \theta)$. Thus the likelihood being estimated converges on the true likelihood for large $b$. Note this is the case even for fixed $a$.
The importance density $g$ specifies a reverse time MCMC chain starting from $x_n = y$ with transitions $K_i$. Simulating $x$ is straightforward by sampling $x_{n-1}$, then $x_{n-2}$ and so on. This importance density is an approximation to the ideal choice stated at the end of Section 3.1.
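The resulting likelihood estimator is
$$w = q_u(x_1 \mid \theta, y) \prod_{i=2}^{n} \frac{K_i(x_i \mid x_{i-1})}{K_i(x_{i-1} \mid x_i)}.$$
Write $\gamma_i = \gamma(\cdot \mid \theta)$ for $i > a$. Using detailed balance, $\gamma_i(x_{i-1})\, K_i(x_i \mid x_{i-1}) = \gamma_i(x_i)\, K_i(x_{i-1} \mid x_i)$, gives
$$w = \gamma(y \mid \theta) \prod_{i=2}^{a+1} \frac{\gamma_{i-1}(x_{i-1} \mid \theta, y)}{\gamma_i(x_{i-1} \mid \theta, y)},$$
since the factors with $i > a + 1$ cancel. This is an unbiased estimator of $\pi(x_n = y \mid \theta)$. Hence
$$\frac{w}{\gamma(y \mid \theta)} = \prod_{i=2}^{a+1} \frac{\gamma_{i-1}(x_{i-1} \mid \theta, y)}{\gamma_i(x_{i-1} \mid \theta, y)}$$
is an unbiased estimator of $\pi(x_n = y \mid \theta) / \gamma(y \mid \theta)$, which approximates $1/Z(\theta)$ for large $b$. With $u_i = x_{i-1}$ this has the same form as the MAV estimator (2). In the above we have assumed, as in Section 2.1, that $q_u$ is normalised. When this is not the case we instead get an estimator of $Z(\hat{\theta})/Z(\theta)$, as for MAV methods. Also note that in either case a valid estimator is produced for any choice of $y$.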
The ABC estimate can be viewed as a two stage procedure. First run an MCMC chain of length $b$ with any starting value, targeting $f(\cdot \mid \theta)$. Let its final value be $x_a$. Secondly run an MCMC chain from $x_a$ using kernels $K_a, K_{a-1}, \ldots, K_2$ and evaluate the estimator $w / \gamma(y \mid \theta)$. This is unbiased in the limit $b \to \infty$, so the first stage could be replaced by perfect sampling methods where these exist.
The resulting procedure is thus equivalent to that for MAV.
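The equivalence can be checked numerically on a toy model. The sketch below rests on assumptions made only for illustration: a five-point state space, $\gamma(x \mid \theta) = e^{\theta x}$, a uniform $q_u$, a geometric bridge with $\gamma_1 = q_u$ and $\gamma_i = \gamma(\cdot \mid \theta)$ for $i > a$, and $\pm 1$ Metropolis kernels. It simulates the reverse-time chain from $x_n = y$, computes the ABC importance weight directly from kernel probabilities, and compares it with $\gamma(y \mid \theta)$ times the annealed product in Equation (2) evaluated on the same path.

```python
import math
import random

M, THETA, A = 5, 0.3, 4  # state space size, parameter, number of bridges (all assumed)

def gamma(x):
    """Unnormalised target gamma(x | theta) = exp(theta * x)."""
    return math.exp(THETA * x)

def gamma_i(x, i):
    """gamma_1 = q_u (uniform); geometric bridge; gamma_i = gamma for i > A."""
    beta = min(1.0, (i - 1) / A)
    return (1.0 / M) ** (1.0 - beta) * gamma(x) ** beta

def kernel_prob(i, x_from, x_to):
    """K_i(x_to | x_from) for a +/-1 Metropolis kernel reversible w.r.t. gamma_i."""
    def move(z):
        if not 0 <= z < M:
            return 0.0
        return 0.5 * min(1.0, gamma_i(z, i) / gamma_i(x_from, i))
    if x_to == x_from:
        return 1.0 - move(x_from - 1) - move(x_from + 1)
    return move(x_to) if abs(x_to - x_from) == 1 else 0.0

def reverse_chain(y, n, rng):
    """Draw x_{n-1}, ..., x_1 from g, starting at x_n = y; x[0] is unused padding."""
    x = [None] * (n + 1)
    x[n] = y
    for i in range(n, 1, -1):
        probs = [kernel_prob(i, x[i], z) for z in range(M)]
        x[i - 1] = rng.choices(list(range(M)), weights=probs)[0]
    return x

def abc_weight(x, n):
    """w = pi(x) / g(x_{1:n-1}), computed directly from kernel probabilities."""
    w = 1.0 / M  # initial density q_u(x_1)
    for i in range(2, n + 1):
        w *= kernel_prob(i, x[i - 1], x[i]) / kernel_prob(i, x[i], x[i - 1])
    return w

def mav_product(x):
    """Annealed product of Equation (2) with u_i = x_{i-1}."""
    est = 1.0
    for i in range(2, A + 2):
        est *= gamma_i(x[i - 1], i - 1) / gamma_i(x[i - 1], i)
    return est

rng = random.Random(4)
y, b = 3, 30
n = A + b
paths = [reverse_chain(y, n, rng) for _ in range(2000)]
max_gap = max(abs(abc_weight(x, n) - gamma(y) * mav_product(x)) for x in paths)
mean_est = sum(mav_product(x) for x in paths) / len(paths)
inv_Z = 1.0 / sum(gamma(z) for z in range(M))
print(max_gap, mean_est, inv_Z)
```

On every simulated path the two expressions should agree up to floating point error, and for moderately large $b$ the average of the annealed product should lie close to the brute-force value of $1/Z(\theta)$.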
We have demonstrated that the MAV method can be interpreted as an ABC algorithm. We hope this insight will be useful for the development of novel methods for MRFs.
- Andrieu and Roberts (2009) Andrieu, C. and Roberts, G. O. (2009). The pseudo-marginal approach for efficient Monte Carlo computations. The Annals of Statistics, pages 697–725.
- Caimo and Friel (2011) Caimo, A. and Friel, N. (2011). Bayesian inference for exponential random graph models. Social Networks, 33:41–55.
- Everitt (2012) Everitt, R. G. (2012). Bayesian parameter estimation for latent Markov random fields and social networks. Journal of Computational and Graphical Statistics, 21:940–960.
- Everitt et al. (2016) Everitt, R. G., Johansen, A. M., Rowing, E., and Evdemon-Hogan, M. (2016). Bayesian model comparison with un-normalised likelihoods. Statistics and Computing: early online version.
- Friel (2013) Friel, N. (2013). Evidence and Bayes factor estimation for Gibbs random fields. Journal of Computational and Graphical Statistics, 22:518–532.
- Grelaud et al. (2009) Grelaud, A., Robert, C. P., Marin, J.-M., Rodolphe, F., and Taly, J. F. (2009). ABC likelihood-free methods for model choice in Gibbs random fields. Bayesian Analysis, 4(2):317–336.
- Marin et al. (2012) Marin, J.-M., Pudlo, P., Robert, C. P., and Ryder, R. J. (2012). Approximate Bayesian computational methods. Statistics and Computing, 22(6):1167–1180.
- Møller et al. (2006) Møller, J., Pettitt, A. N., Reeves, R. W., and Berthelsen, K. K. (2006). An efficient Markov chain Monte Carlo method for distributions with intractable normalising constants. Biometrika, 93:451–458.
- Murray et al. (2006) Murray, I., Ghahramani, Z., and MacKay, D. J. C. (2006). MCMC for doubly-intractable distributions. Proceedings of the 22nd Annual Conference on Uncertainty in Artificial Intelligence (UAI), pages 359–366.
- Neal (2001) Neal, R. M. (2001). Annealed importance sampling. Statistics and Computing, 11:125–139.