Bayesian Computation with Intractable Likelihoods

04/08/2020
by Matthew T. Moores, et al.

This article surveys computational methods for posterior inference with intractable likelihoods, that is, where the likelihood function is unavailable in closed form or where evaluation of the likelihood is computationally infeasible. We review recent developments in pseudo-marginal methods, approximate Bayesian computation (ABC), the exchange algorithm, thermodynamic integration, and composite likelihood, paying particular attention to advancements in scalability for large datasets. We also point to R and MATLAB source code for implementations of these algorithms, where available.


1 Auxiliary Variable Methods

1.1 Pseudo-Marginal Algorithms

Pseudo-marginal algorithms (Beaumont, 2003; Andrieu and Roberts, 2009) are computational methods for fitting latent variable models, that is, where the observed data $\mathbf{y}$ can be considered as noisy observations of some unobserved or hidden states, $\mathbf{z}$. For example, hidden Markov models (HMMs) are commonly used in time series analysis and signal processing. Models of this form can also arise as the result of data augmentation approaches, such as for mixture models (Dempster et al., 1977; Tanner and Wong, 1987). The marginal likelihood is of the following form:

$$p(\mathbf{y} \mid \theta) = \int_{\mathcal{Z}} p(\mathbf{y} \mid \mathbf{z}, \theta)\, p(\mathbf{z} \mid \theta)\, \mathrm{d}\mathbf{z} \tag{8}$$

which can be intractable if the state space $\mathcal{Z}$ is very high-dimensional and non-Gaussian. In this case, we can substitute an unbiased, non-negative estimate of the likelihood.

O’Neill et al. (2000) introduced the Monte Carlo within Metropolis (MCWM) algorithm, which replaces both $p(\mathbf{y} \mid \theta)$ and $p(\mathbf{y} \mid \theta')$ in the Metropolis-Hastings ratio (6) with importance sampling estimates:

$$\hat{p}_{IS}(\mathbf{y} \mid \theta) = \frac{1}{M} \sum_{m=1}^{M} \frac{p(\mathbf{y} \mid \mathbf{z}_m, \theta)\, p(\mathbf{z}_m \mid \theta)}{q(\mathbf{z}_m)} \tag{9}$$

where the samples $\mathbf{z}_1, \dots, \mathbf{z}_M$ are drawn from a proposal distribution $q(\mathbf{z})$, for both $\theta$ and $\theta'$. MCWM is generally considered as an approximate algorithm, since it does not target the exact posterior distribution for $\theta$. However, Medina-Aguayo et al. (2016) have established some conditions under which MCWM converges to the correct target distribution as $M \to \infty$. See also Nicholls et al. (2012) and Alquier et al. (2016) for further theoretical analysis of approximate pseudo-marginal methods.
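To make the structure of (9) concrete, the following minimal R sketch computes the importance-sampling log-likelihood estimator. All of the model-specific functions (`rproposal`, `dproposal`, `dobs`, `dlatent`) are hypothetical placeholders that the user would supply for their own latent variable model; log-weights are used for numerical stability.

```r
# Importance-sampling estimate of log p(y | theta), as in (9).
# rproposal, dproposal, dobs and dlatent are user-supplied functions
# defining q(z), p(y | z, theta) and p(z | theta) for the model at hand.
log_lik_IS <- function(y, theta, M, rproposal, dproposal, dobs, dlatent) {
  logw <- numeric(M)
  for (m in 1:M) {
    z <- rproposal(y, theta)                       # z_m ~ q(z)
    logw[m] <- dobs(y, z, theta, log = TRUE) +     # log p(y | z_m, theta)
      dlatent(z, theta, log = TRUE) -              # log p(z_m | theta)
      dproposal(z, y, theta, log = TRUE)           # log q(z_m)
  }
  mx <- max(logw)                                  # stable log-mean-exp
  mx + log(mean(exp(logw - mx)))
}
```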

Beaumont (2003) introduced the grouped independence Metropolis-Hastings (GIMH) algorithm, which does target the exact posterior. The key difference is that $\hat{p}(\mathbf{y} \mid \theta)$ is reused from the previous iteration, rather than being recalculated every time. The theoretical properties of this algorithm have been an active area of research, with notable contributions by Andrieu and Roberts (2009), Maire et al. (2014), Andrieu and Vihola (2015), and Sherlock et al. (2015). Andrieu et al. (2010) introduced the particle MCMC (PMCMC) algorithm, which is a pseudo-marginal method that uses sequential Monte Carlo (SMC) in place of importance sampling. This is particularly useful for HMMs, where SMC methods such as the bootstrap particle filter provide an unbiased estimate of the marginal likelihood (Pitt et al., 2012). Although importance sampling and SMC both provide unbiased estimators, it is necessary to use a large enough number of samples $M$ so that the variance is kept at a reasonable level. Otherwise, the pseudo-marginal algorithm can fail to be variance-bounding or geometrically ergodic (Lee and Łatuszyński, 2014). Doucet et al. (2015) recommend choosing $M$ so that the standard deviation of the log-likelihood estimator is between 1 and 1.7.
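As a rough illustration of this tuning rule, one could pilot-test candidate values of $M$ at a representative parameter value $\hat\theta$. This sketch assumes `estimator(y, theta, M)` is a wrapper around any unbiased likelihood estimator, such as `log_lik_IS` above:

```r
# Double M until the standard deviation of the log-likelihood
# estimator falls below the recommended threshold of about 1.7.
tune_M <- function(y, theta_hat, estimator, M_init = 10,
                   n_reps = 50, target_sd = 1.7) {
  M <- M_init
  repeat {
    ests <- replicate(n_reps, estimator(y, theta_hat, M))
    if (sd(ests) <= target_sd) break
    M <- 2 * M
  }
  list(M = M, sd = sd(ests))
}
```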

Pseudo-marginal algorithms can be computationally intensive, particularly for large values of $M$. One strategy to reduce this computational burden, known as the Russian Roulette algorithm (Lyne et al., 2015), is to replace (9) with a truncated infinite series:

$$\hat{p}_{RR}(\mathbf{y} \mid \theta) = \sum_{t=0}^{\tau} \hat{V}_t \tag{10}$$

where $\tau$ is a random stopping time and $\hat{V}_t$ are random variables such that (10) is almost surely finite and $\mathbb{E}\big[\hat{p}_{RR}(\mathbf{y} \mid \theta)\big] = p(\mathbf{y} \mid \theta)$. There is a difficulty with this method, however, in that the likelihood estimates are not guaranteed to be non-negative. Jacob and Thiery (2015) have established that there is no general solution to this sign problem, although successful strategies have been proposed for some specific models.
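The unbiased-truncation idea behind (10) can be illustrated on a generic infinite series. This toy sketch is not the specific debiasing scheme of Lyne et al. (2015); it simply shows how reweighting each term by the survival probability $\Pr(\tau \ge t)$ of a geometric stopping time preserves the expectation:

```r
# Unbiased Russian-roulette estimate of S = sum_{t >= 0} phi(t),
# using a geometric stopping time with continuation probability p_cont.
# phi(t) must decay quickly enough for the variance to remain finite.
roulette_estimate <- function(phi, p_cont = 0.9) {
  total <- 0
  t <- 0
  survival <- 1                     # Pr(tau >= t)
  repeat {
    total <- total + phi(t) / survival
    if (runif(1) > p_cont) break    # stop with probability 1 - p_cont
    survival <- survival * p_cont
    t <- t + 1
  }
  total
}

# Sanity check: estimate exp(1) = sum_{t >= 0} 1/t! (approx. 2.71828)
mean(replicate(1e4, roulette_estimate(function(t) 1 / factorial(t))))
```

Note that if the terms of the series can be negative, the estimate itself can be negative, which is precisely the sign problem discussed above.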

Another important class of algorithms for accelerating pseudo-marginal methods involves approximating the intractable likelihood function using a surrogate model. For example, the delayed-acceptance (DA) algorithm of Christen and Fox (2005) first evaluates the Metropolis-Hastings ratio (6) using a fast, approximate likelihood $\tilde{p}(\mathbf{y} \mid \theta)$. The proposal is rejected at this screening stage with probability $1 - \min\{1, \rho_1\}$, where $\rho_1$ is the approximate ratio. Otherwise, a second ratio is calculated using a full evaluation of the likelihood function (9). The acceptance probability is modified at the second stage according to:

$$\rho_2 = \min\left\{1,\; \frac{\hat{p}(\mathbf{y} \mid \theta')\, \tilde{p}(\mathbf{y} \mid \theta)}{\hat{p}(\mathbf{y} \mid \theta)\, \tilde{p}(\mathbf{y} \mid \theta')}\right\} \tag{11}$$

which corrects for the conditional dependence on acceptance at the first stage and therefore preserves the exact target distribution. DA has been used for PMCMC by Golightly et al. (2015), where the linear noise approximation (Fearnhead et al., 2014) was used for $\tilde{p}(\mathbf{y} \mid \theta)$. Sherlock et al. (2017) instead used $k$-nearest-neighbours for $\tilde{p}(\mathbf{y} \mid \theta)$ in a pseudo-marginal algorithm.
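A minimal sketch of one DA iteration, assuming user-supplied functions for the cheap surrogate (`log_approx`), the expensive estimator (`log_full`), and the log-prior (all names hypothetical), with a symmetric random-walk proposal:

```r
# One iteration of delayed acceptance, following (11).
# log_approx, log_full and log_prior are user-supplied functions;
# the proposal is a symmetric Gaussian random walk.
da_step <- function(theta, y, log_approx, log_full, log_prior,
                    prop_sd = 0.1) {
  theta_new <- rnorm(length(theta), theta, prop_sd)
  # Stage 1: screen the proposal using the cheap surrogate likelihood
  log_rho1 <- log_approx(y, theta_new) + log_prior(theta_new) -
    log_approx(y, theta) - log_prior(theta)
  if (log(runif(1)) >= log_rho1) return(theta)   # early rejection
  # Stage 2: correct with a full evaluation of the (estimated) likelihood
  log_rho2 <- log_full(y, theta_new) + log_approx(y, theta) -
    log_full(y, theta) - log_approx(y, theta_new)
  if (log(runif(1)) < log_rho2) theta_new else theta
}
```

In a GIMH-style implementation, the value of `log_full(y, theta)` at the current state would be cached and reused, rather than recomputed at every iteration.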

Drovandi et al. (2018) proposed an approximate pseudo-marginal algorithm, using a Gaussian process (GP) as a surrogate log-likelihood. The GP is trained using a pilot run of MCWM; then, at each iteration, the log-likelihood is either approximated using the GP or else estimated using SMC or importance sampling, depending on the level of uncertainty in the surrogate model at the proposed value of $\theta$. MATLAB source code is available from http://www.runmycode.org/companion/view/2663. Stuart and Teckentrup (2018) have shown that, under certain assumptions, a GP provides a consistent estimator of the negative log-likelihood, and they provide error bounds on the approximation.
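The following base-R sketch fits a simple GP surrogate to pilot-run evaluations of the log-likelihood and reports the predictive uncertainty that would drive the decision to fall back on SMC or importance sampling. The squared-exponential kernel and all tuning values are illustrative assumptions, not the settings of Drovandi et al. (2018):

```r
# Squared-exponential covariance between two sets of 1-D inputs.
se_kernel <- function(a, b, lengthscale = 1, variance = 1) {
  variance * exp(-0.5 * outer(a, b, "-")^2 / lengthscale^2)
}

# Fit a GP to pilot evaluations (theta[i], loglik[i]) and return the
# predictive mean and standard deviation at new values theta_star.
gp_surrogate <- function(theta, loglik, theta_star, noise = 1e-6) {
  K  <- se_kernel(theta, theta) + noise * diag(length(theta))
  ks <- se_kernel(theta_star, theta)
  mu <- as.vector(ks %*% solve(K, loglik))
  V  <- se_kernel(theta_star, theta_star) - ks %*% solve(K, t(ks))
  list(mean = mu, sd = sqrt(pmax(diag(V), 0)))
}

# pred <- gp_surrogate(pilot_theta, pilot_loglik, theta_new)
# use_gp <- pred$sd < 0.5   # fall back on SMC/IS if too uncertain;
#                           # the 0.5 threshold is purely illustrative
```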

1.2 Exchange Algorithm

Møller et al. (2006) introduced a MCMC algorithm for the Ising model that targets the exact posterior distribution for $\theta$. An auxiliary variable $\mathbf{w}$ is defined on the same state space as $\mathbf{y}$. This is a data augmentation approach, where we simulate from the joint posterior $\pi(\theta, \mathbf{w} \mid \mathbf{y})$, which admits the posterior for $\theta$ as its marginal. Given a proposed parameter value $\theta'$, a proposal $\mathbf{w}'$ is simulated from the model to obtain an unbiased sample from (1). This requires perfect simulation methods, such as coupling from the past (Propp and Wilson, 1996), perfect slice sampling (Mira et al., 2001), or bounding chains (Huber, 2003; Butts, 2018). Refer to Huber (2016) for further explanation of perfect simulation. Instead of (7), the joint ratio for $\mathbf{w}'$ and $\theta'$ becomes:

$$\rho = \frac{\pi(\theta')\, q(\theta \mid \theta')\, \exp\{{\theta'}^\top s(\mathbf{y})\}\, \exp\{\tilde{\theta}^\top s(\mathbf{w}')\}\, \exp\{\theta^\top s(\mathbf{w})\}}{\pi(\theta)\, q(\theta' \mid \theta)\, \exp\{\theta^\top s(\mathbf{y})\}\, \exp\{\tilde{\theta}^\top s(\mathbf{w})\}\, \exp\{{\theta'}^\top s(\mathbf{w}')\}} \tag{12}$$

where the auxiliary distribution is (1) evaluated at a fixed value $\tilde{\theta}$, and the normalising constants $\mathcal{C}(\theta)$ and $\mathcal{C}(\theta')$ cancel out with each other. This is analogous to an importance-sampling estimate of the normalising constant with $M = 1$ samples, since:

$$\frac{\mathcal{C}(\theta')}{\mathcal{C}(\theta)} = \mathbb{E}_{\mathbf{w} \mid \theta}\left[\frac{\exp\{{\theta'}^\top s(\mathbf{w})\}}{\exp\{\theta^\top s(\mathbf{w})\}}\right] \tag{13}$$

where the proposal distribution is (1). This algorithm is therefore closely related to pseudo-marginal methods such as GIMH.

Murray et al. (2006) found that (12) could be simplified even further, removing the need for a fixed value of $\tilde{\theta}$. The exchange algorithm replaces (7) with the ratio:

$$\rho = \frac{\pi(\theta')\, q(\theta \mid \theta')\, \exp\{{\theta'}^\top s(\mathbf{y})\}\, \exp\{\theta^\top s(\mathbf{w})\}}{\pi(\theta)\, q(\theta' \mid \theta)\, \exp\{\theta^\top s(\mathbf{y})\}\, \exp\{{\theta'}^\top s(\mathbf{w})\}} \tag{14}$$

However, perfect sampling is still required to simulate $\mathbf{w} \sim p(\cdot \mid \theta')$ at each iteration, which can be infeasible when the state space is very large. Cucala et al. (2009) proposed an approximate exchange algorithm (AEA) by replacing the perfect sampling step with 500 iterations of Gibbs sampling. Caimo and Friel (2011) were the first to employ AEA for fully-Bayesian inference on the parameters of an ERGM. AEA for the hidden Potts model is implemented in the R package ‘bayesImageS’ (Moores et al., 2019) and AEA for ERGM is implemented in ‘Bergm’ (Caimo and Friel, 2014).
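A schematic of one AEA update for an exponential-family model, implementing (14) on the log scale. Here `suff_stat` and `simulate_gibbs(theta, n_iter)` are hypothetical user-supplied helpers standing in for the model's sufficient statistic and Gibbs sampler; packages such as ‘bayesImageS’ and ‘Bergm’ provide production implementations:

```r
# One iteration of the approximate exchange algorithm, implementing
# the ratio (14) on the log scale with a symmetric proposal.
aea_step <- function(theta, s_y, suff_stat, simulate_gibbs, log_prior,
                     prop_sd = 0.1, aux_iter = 500) {
  theta_new <- rnorm(length(theta), theta, prop_sd)
  w <- simulate_gibbs(theta_new, aux_iter)   # approximates w ~ p(. | theta')
  s_w <- suff_stat(w)
  log_rho <- log_prior(theta_new) - log_prior(theta) +
    sum((theta_new - theta) * s_y) +         # unnormalised likelihood ratio
    sum((theta - theta_new) * s_w)           # auxiliary term cancels C(theta)
  if (log(runif(1)) < log_rho) theta_new else theta
}
```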

1.3 Approximate Bayesian Computation

Like the exchange algorithm, ABC uses an auxiliary variable $\mathbf{w}$ to decide whether to accept or reject the proposed value of $\theta'$. In the terminology of ABC, $\mathbf{w}$ is referred to as “pseudo-data.” Instead of a Metropolis-Hastings ratio such as (7), the summary statistics of the pseudo-data and the observed data are directly compared. The proposal is accepted if the distance between these summary statistics is within the ABC tolerance, $\epsilon$. This produces the following approximation:

$$\pi(\theta \mid \mathbf{y}) \approx \pi_\epsilon(\theta \mid \mathbf{y}) \propto \pi(\theta)\, \Pr\big(\|s(\mathbf{w}) - s(\mathbf{y})\| < \epsilon \mid \theta\big) \tag{15}$$

where $\|\cdot\|$ is a suitable norm, such as Euclidean distance. Since $s(\cdot)$ are jointly-sufficient statistics for the Ising, Potts, or ERGM, the ABC approximation (15) approaches the true posterior as $\epsilon \to 0$. In practice there is a tradeoff between the number of parameter values that are accepted and the size of the ABC tolerance.
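An ABC rejection sampler targeting (15) takes only a few lines of R; here `rprior`, `simulate_model(theta)`, and `suff_stat(w)` are assumed user-supplied helpers for the model of interest:

```r
# ABC rejection sampling: keep prior draws whose simulated summary
# statistics fall within epsilon of the observed summary statistics.
abc_reject <- function(s_obs, n_draws, epsilon, rprior,
                       simulate_model, suff_stat) {
  accepted <- list()
  for (i in 1:n_draws) {
    theta <- rprior()
    w <- simulate_model(theta)                     # pseudo-data
    dist <- sqrt(sum((suff_stat(w) - s_obs)^2))    # Euclidean norm
    if (dist < epsilon) accepted[[length(accepted) + 1]] <- theta
  }
  do.call(rbind, accepted)    # one row per accepted parameter value
}
```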

Grelaud et al. (2009) were the first to use ABC to obtain an approximate posterior for $\theta$ in the Ising/Potts model. Everitt (2012) used ABC within sequential Monte Carlo (ABC-SMC) for Ising models and ERGMs. ABC-SMC uses a sequence of target distributions $\pi_{\epsilon_t}(\theta \mid \mathbf{y})$ with decreasing tolerances $\epsilon_1 > \epsilon_2 > \dots > \epsilon_T$, where the number of SMC iterations $T$ can be determined dynamically using a stopping rule. The ABC-SMC algorithm of Drovandi and Pettitt (2011) uses multiple MCMC steps for each SMC iteration, while the algorithm of Del Moral et al. (2012) uses multiple replicates of the summary statistics for each particle. Everitt (2012) has provided a MATLAB implementation of ABC-SMC in the online supplementary material accompanying his paper.

The computational efficiency of ABC is dominated by the cost of drawing updates to the auxiliary variable, as reported by Everitt (2012). Thus, we would expect the execution time of ABC to be similar to that of AEA or pseudo-marginal methods. Various approaches to improving this runtime have recently been proposed. “Lazy ABC” (Prangle, 2016) involves early termination of the simulation step at a random stopping time, hence it bears some similarities with Russian Roulette. Surrogate models have also been applied in ABC, using a method known as Bayesian indirect likelihood (BIL; Drovandi et al., 2011, 2015). Gaussian processes (GPs) have been used as surrogate models by Wilkinson (2014) and Meeds and Welling (2014). Järvenpää et al. (2018) used a heteroskedastic GP model and demonstrated how the output of the precomputation step could be used for Bayesian model choice. Moores et al. (2015) introduced a piecewise linear approximation for ABC-SMC with Ising/Potts models. Boland et al. (2018) derived a theoretical upper bound on the bias introduced by this and similar piecewise approximations, and also developed a piecewise linear approximation for ERGM. Moores et al. (2020) introduced a parametric functional approximate Bayesian (PFAB) algorithm for the Potts model, which is a form of BIL where the surrogate likelihood function is derived from an integral curve.

2 Other Methods

2.1 Thermodynamic Integration

Since the Ising, Potts, and ERGM are all exponential families of distributions, the expectation of their sufficient statistics can be expressed in terms of the normalising constant:

$$\mathbb{E}_{\mathbf{y} \mid \theta}[s(\mathbf{y})] = \frac{\mathrm{d}}{\mathrm{d}\theta} \log \mathcal{C}(\theta) \tag{16}$$

Gelman and Meng (1998) derived an approximation to the log-ratio of normalising constants for the Ising/Potts model, using the path sampling identity:

$$\log\left\{\frac{\mathcal{C}(\theta_1)}{\mathcal{C}(\theta_0)}\right\} = \int_{\theta_0}^{\theta_1} \mathbb{E}_{\mathbf{y} \mid \theta}[s(\mathbf{y})]\, \mathrm{d}\theta \tag{17}$$

which follows from (16). The value of the expectation can be estimated by simulating from the Gibbs distribution (1) for fixed values of $\theta$. At each iteration, (7) can then be approximated by numerical integration methods, such as Gaussian quadrature or the trapezoidal rule. Figure 1 illustrates linear interpolation of $\mathbb{E}_{\mathbf{y} \mid \beta}[s(\mathbf{y})]$ on a 2D lattice, for $q$ labels and values of the inverse temperature $\beta$ ranging from 0 to 2 in increments of 0.05. This approximation was precomputed using the algorithm of Swendsen and Wang (1987).
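A sketch of this precomputation and lookup, assuming a user-supplied sampler `sim_potts(beta, n_iter)` (a hypothetical helper, e.g. a Swendsen-Wang update) that returns a draw of the sufficient statistic $s(\mathbf{y})$ at inverse temperature `beta`:

```r
# Precompute E[s(y) | beta] by simulation on a grid of fixed values,
# then approximate log{C(beta1)/C(beta0)} via (17) using the
# trapezoidal rule. Interpolation to off-grid values is omitted.
beta_grid <- seq(0, 2, by = 0.05)
E_stat <- sapply(beta_grid, function(b) {
  mean(replicate(100, sim_potts(b, n_iter = 500)))
})

log_C_ratio <- function(beta0, beta1, grid = beta_grid, ES = E_stat) {
  idx <- grid >= min(beta0, beta1) & grid <= max(beta0, beta1)
  x <- grid[idx]
  y <- ES[idx]
  area <- sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)   # trapezoid rule
  if (beta1 >= beta0) area else -area
}
```

Because the grid simulations are mutually independent, the `sapply` loop is trivially parallelisable, which is the advantage over auxiliary variable methods noted below.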

TI is explained in further detail by Chen et al. (2000, chap. 5). A reference implementation in R is available from the website accompanying Marin and Robert (2007). Friel and Pettitt (2008) introduced the method of power posteriors to estimate the marginal likelihood or model evidence using TI. Calderhead and Girolami (2009) provide bounds on the discretisation error and derive an optimal temperature schedule by minimising the variance of the Monte Carlo estimate. Oates et al. (2016) introduced control variates for further reducing the variance of TI.

Figure 1: Approximation of $\mathbb{E}_{\mathbf{y} \mid \beta}[s(\mathbf{y})]$ by simulation for fixed values of $\beta$, with linear interpolation.

The TI algorithm has an advantage over auxiliary variable methods because the additional simulations are performed prior to fitting the model, rather than at each iteration. This is particularly the case when analysing multiple images that all have approximately the same dimensions. Since these simulations are independent, they can make use of massively parallel hardware. However, the computational cost is still slightly higher than pseudolikelihood, which does not require a pre-computation step.

2.2 Composite Likelihood

Pseudolikelihood is the simplest of the methods that we have considered, and also the fastest. Rydén and Titterington (1998) showed that the intractable distribution (1) could be approximated using the product of the conditional densities:

$$p(\mathbf{y} \mid \theta) \approx \prod_{i=1}^{n} p(y_i \mid y_{\setminus i}, \theta) \tag{18}$$

This enables the Metropolis-Hastings ratio (6) to be evaluated using (18) to approximate both $p(\mathbf{y} \mid \theta)$ and $p(\mathbf{y} \mid \theta')$ at each iteration. The conditional density function for the Ising/Potts model is given by:

$$p(y_i \mid y_{\partial(i)}, \beta) = \frac{\exp\left\{\beta \sum_{\ell \in \partial(i)} \delta(y_i, y_\ell)\right\}}{\sum_{j=1}^{q} \exp\left\{\beta \sum_{\ell \in \partial(i)} \delta(j, y_\ell)\right\}} \tag{19}$$

where $\partial(i)$ are the first-order (nearest) neighbours of pixel $i$ and $\delta(\cdot,\cdot)$ is the Kronecker delta function. The conditional density for an ERGM is given by the logistic function:

$$p(y_{ij} = 1 \mid y_{\setminus ij}, \theta) = \mathrm{logit}^{-1}\left(\theta^\top \Delta s(\mathbf{y})_{ij}\right) \tag{20}$$

where $\Delta s(\mathbf{y})_{ij}$ denotes the vector of change statistics for the dyad $(i,j)$.
Figure 2: Approximation error of pseudolikelihood in comparison to the exact likelihood calculated using a brute force method: (a) expectation, using either Equation (1) or (18); (b) standard deviation.

Pseudolikelihood is exact when $\beta = 0$ and provides a reasonable approximation for small values of the parameters. However, the approximation error increases rapidly for the Potts/Ising model as $\beta$ approaches the critical temperature, $\beta_{crit}$, as illustrated by Figure 2. This is due to long-range dependence between the labels, which is inadequately modelled by the local approximation. Similar issues can arise for ERGM, which can also exhibit a phase transition.
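A direct implementation of (18) and (19) for a $q$-state Potts model on a regular lattice is given below as a minimal sketch; for production use, see the ‘potts’ or ‘bayesImageS’ packages:

```r
# Log point pseudolikelihood (18) for a q-state Potts model, with
# labels in 1..q stored in an integer matrix z, using the
# single-site conditionals (19).
log_pseudolik <- function(z, beta, q) {
  nr <- nrow(z); nc <- ncol(z)
  total <- 0
  for (i in 1:nr) for (j in 1:nc) {
    # first-order (nearest) neighbours, respecting the lattice boundary
    neigh <- c(if (i > 1) z[i - 1, j], if (i < nr) z[i + 1, j],
               if (j > 1) z[i, j - 1], if (j < nc) z[i, j + 1])
    matches <- tabulate(neigh, nbins = q)   # neighbour count per label
    total <- total + beta * matches[z[i, j]] - log(sum(exp(beta * matches)))
  }
  total
}

# Example: random 8x8 lattice with q = 2 labels
z <- matrix(sample(1:2, 64, replace = TRUE), 8, 8)
log_pseudolik(z, beta = 0.5, q = 2)
```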

Rydén and Titterington (1998) referred to Equation (18) as point pseudolikelihood, since the conditional distributions are computed for each pixel individually. They suggested that the accuracy could be improved using block pseudolikelihood. This is where the likelihood is calculated exactly for small blocks of pixels, then (18) is modified to be the product over the blocks:

$$p(\mathbf{y} \mid \theta) \approx \prod_{k=1}^{N_B} p(\mathbf{y}_{B_k} \mid \mathbf{y}_{\setminus B_k}, \theta) \tag{21}$$

where $N_B$ is the number of blocks, $\mathbf{y}_{B_k}$ are the labels of the pixels in block $B_k$, and $\mathbf{y}_{\setminus B_k}$ are all of the labels except for $\mathbf{y}_{B_k}$. This is a form of composite likelihood, where the likelihood function is approximated as a product of simplified factors (Varin et al., 2011). Friel (2012) compared point pseudolikelihood to composite likelihood with square blocks of increasing size, showing that (21) outperformed (18) for the Ising model. Okabayashi et al. (2011) discuss composite likelihood for the Potts model and have provided an open source implementation in the R package ‘potts’ (Geyer and Johnson, 2014).

Evaluating the conditional likelihood in (21) involves the normalising constant for each block, which is a sum over all of the possible configurations of $\mathbf{y}_{B_k}$. This is a limiting factor on the size of blocks that can be used. The brute force method that was used to compute Figure 2 is too computationally intensive for this purpose. Pettitt et al. (2003) showed that the normalising constant can be calculated exactly for a cylindrical lattice by computing eigenvalues of a $2^r \times 2^r$ matrix, where $r$ is the smaller of the number of rows or columns. The value of (2) for a free-boundary lattice can then be approximated using path sampling. Friel and Pettitt (2004) extended this method to larger lattices using a composite likelihood approach.

The reduced dependence approximation (RDA) is another form of composite likelihood. Reeves and Pettitt (2004) introduced a recursive algorithm to calculate the normalising constant using a lag-$r$ representation. Friel et al. (2009) divided the image lattice into sub-lattices of $r$ rows, then approximated the normalising constant of the full $m \times n$ lattice using RDA:

$$\mathcal{C}_{m \times n}(\beta) \approx \frac{\mathcal{C}_{r \times n}(\beta)^{m - r + 1}}{\mathcal{C}_{(r-1) \times n}(\beta)^{m - r}} \tag{22}$$

McGrory et al. (2009) compared RDA to pseudolikelihood and the exact method of Møller et al. (2006), reporting similar computational cost to pseudolikelihood but with improved accuracy in estimating $\beta$. Ogden (2017) showed that if $r$ grows sufficiently quickly with the size of the lattice, then RDA gives asymptotically valid inference when $\beta$ is below the critical value. However, the error increases exponentially as $\beta$ approaches the phase transition. This is similar to the behaviour of pseudolikelihood in Figure 2. Source code for RDA is available in the online supplementary material for McGrory et al. (2012).
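In code, (22) reduces to simple arithmetic on the log scale once the sub-lattice normalising constants are available; `log_C_sub(r, n, beta)` is a hypothetical helper, e.g. implementing the exact recursion of Reeves and Pettitt (2004):

```r
# Reduced dependence approximation (22) for the log normalising
# constant of an m x n lattice, given a function log_C_sub(r, n, beta)
# that returns the exact log normalising constant of an r x n sub-lattice.
log_C_rda <- function(m, n, beta, r, log_C_sub) {
  (m - r + 1) * log_C_sub(r, n, beta) - (m - r) * log_C_sub(r - 1, n, beta)
}
```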

3 Conclusion

This chapter has reviewed a variety of computational methods for Bayesian inference with intractable likelihoods. Auxiliary variable methods, such as the exchange algorithm and pseudo-marginal algorithms, target the exact posterior distribution. However, their computational cost can be prohibitive for large datasets. Algorithms such as delayed acceptance, Russian Roulette, and “lazy ABC” can accelerate inference by reducing the number of auxiliary variables that need to be simulated, without modifying the target distribution. Bayesian indirect likelihood (BIL) algorithms approximate the intractable likelihood using a surrogate model, such as a Gaussian process or piecewise function. As with thermodynamic integration, BIL can take advantage of a precomputation step to train the surrogate model in parallel. This enables these methods to be applied to much larger datasets by managing the tradeoff between approximation error and computational cost.

Acknowledgements

This research was conducted by the Australian Research Council Centre of Excellence for Mathematical and Statistical Frontiers (project number CE140100049) and funded by the Australian Government.

References

  • Alquier et al. [2016] P. Alquier, N. Friel, R. Everitt, and A. Boland. Noisy Monte Carlo: Convergence of Markov chains with approximate transition kernels. Stat. Comput., 26(1–2):29–47, 2016. doi: 10.1007/s11222-014-9521-x.
  • Andrieu and Roberts [2009] C. Andrieu and G. O. Roberts. The pseudo-marginal approach for efficient Monte Carlo computations. Ann. Statist., 37(2):697–725, 2009. doi: 10.1214/07-AOS574.
  • Andrieu and Thoms [2008] C. Andrieu and J. Thoms. A tutorial on adaptive MCMC. Stat. Comput., 18(4):343–373, 2008. doi: 10.1007/s11222-008-9110-y.
  • Andrieu and Vihola [2015] C. Andrieu and M. Vihola. Convergence properties of pseudo-marginal Markov chain Monte Carlo algorithms. Ann. Appl. Prob., 25(2):1030–1077, 2015. doi: 10.1214/14-AAP1022.
  • Andrieu et al. [2010] C. Andrieu, A. Doucet, and R. Holenstein. Particle Markov chain Monte Carlo methods. J. R. Stat. Soc. Ser. B, 72(3):269–342, 2010. doi: 10.1111/j.1467-9868.2009.00736.x.
  • Beaumont [2003] M. A. Beaumont. Estimation of population growth or decline in genetically monitored populations. Genetics, 164(3):1139–1160, 2003.
  • Boland et al. [2018] A. Boland, N. Friel, and F. Maire. Efficient MCMC for Gibbs random fields using pre-computation. Electron. J. Statist., 12(2):4138–4179, 2018. doi: 10.1214/18-EJS1504.
  • Butts [2018] C. T. Butts. A perfect sampling method for exponential family random graph models. J. Math. Soc., 42(1):17–36, 2018. doi: 10.1080/0022250X.2017.1396985.
  • Caimo and Friel [2011] A. Caimo and N. Friel. Bayesian inference for exponential random graph models. Social Networks, 33(1):41–55, 2011. doi: 10.1016/j.socnet.2010.09.004.
  • Caimo and Friel [2014] A. Caimo and N. Friel. Bergm: Bayesian exponential random graphs in R. J. Stat. Soft., 61(2):1–25, 2014. doi: 10.18637/jss.v061.i02.
  • Calderhead and Girolami [2009] B. Calderhead and M. Girolami. Estimating Bayes factors via thermodynamic integration and population MCMC. Comput. Stat. Data Anal., 53(12):4028–4045, 2009. doi: 10.1016/j.csda.2009.07.025.
  • Cameron and Pettitt [2012] E. Cameron and A. N. Pettitt. Approximate Bayesian computation for astronomical model analysis: a case study in galaxy demographics and morphological transformation at high redshift. Mon. Not. R. Astron. Soc., 425(1):44–65, 2012. doi: 10.1111/j.1365-2966.2012.21371.x.
  • Chen et al. [2000] M.-H. Chen, Q.-M. Shao, and J. G. Ibrahim. Monte Carlo Methods in Bayesian Computation. Springer Series in Statistics. Springer-Verlag, New York, 2000.
  • Christen and Fox [2005] J. A. Christen and C. Fox. Markov chain Monte Carlo using an approximation. J. Comput. Graph. Stat., 14(4):795–810, 2005. doi: 10.1198/106186005X76983.
  • Cucala et al. [2009] L. Cucala, J.-M. Marin, C. P. Robert, and D. M. Titterington. A Bayesian reassessment of nearest-neighbor classification. J. Am. Stat. Assoc., 104(485):263–273, 2009. doi: 10.1198/jasa.2009.0125.
  • Del Moral et al. [2012] P. Del Moral, A. Doucet, and A. Jasra. An adaptive sequential Monte Carlo method for approximate Bayesian computation. Stat. Comput., 22(5):1009–20, 2012. doi: 10.1007/s11222-011-9271-y.
  • Dempster et al. [1977] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B, 39(1):1–38, 1977.
  • Doucet et al. [2015] A. Doucet, M. Pitt, G. Deligiannidis, and R. Kohn. Efficient implementation of Markov chain Monte Carlo when using an unbiased likelihood estimator. Biometrika, 102(2):295–313, 2015. doi: 10.1093/biomet/asu075.
  • Drovandi and Pettitt [2011] C. C. Drovandi and A. N. Pettitt. Estimation of parameters for macroparasite population evolution using approximate Bayesian computation. Biometrics, 67(1):225–233, 2011. doi: 10.1111/j.1541-0420.2010.01410.x.
  • Drovandi et al. [2011] C. C. Drovandi, A. N. Pettitt, and M. J. Faddy. Approximate Bayesian computation using indirect inference. J. R. Stat. Soc. Ser. C, 60(3):317–337, 2011. doi: 10.1111/j.1467-9876.2010.00747.x.
  • Drovandi et al. [2015] C. C. Drovandi, A. N. Pettitt, and A. Lee. Bayesian indirect inference using a parametric auxiliary model. Stat. Sci., 30(1):72–95, 2015. doi: 10.1214/14-STS498.
  • Drovandi et al. [2018] C. C. Drovandi, M. T. Moores, and R. J. Boys. Accelerating pseudo-marginal MCMC using Gaussian processes. Comput. Stat. Data Anal., 118:1–17, 2018. doi: 10.1016/j.csda.2017.09.002.
  • Erdős and Rényi [1959] P. Erdős and A. Rényi. On random graphs. Publicationes Mathematicae Debrecen, 6:290–297, 1959.
  • Everitt [2012] R. G. Everitt. Bayesian parameter estimation for latent Markov random fields and social networks. J. Comput. Graph. Stat., 21(4):940–960, 2012. doi: 10.1080/10618600.2012.687493.
  • Fearnhead et al. [2014] P. Fearnhead, V. Giagos, and C. Sherlock. Inference for reaction networks using the linear noise approximation. Biometrics, 70(2):457–466, 2014. doi: 10.1111/biom.12152.
  • Frank and Strauss [1986] O. Frank and D. Strauss. Markov graphs. J. Amer. Stat. Assoc., 81(395):832–842, 1986.
  • Friel [2012] N. Friel. Bayesian inference for Gibbs random fields using composite likelihoods. In C. Laroque, J. Himmelspach, R. Pasupathy, O. Rose, and A. M. Uhrmacher, editors, Proc. Winter Simulation Conference, pages 1–8, Dec 2012. doi: 10.1109/WSC.2012.6465236.
  • Friel and Pettitt [2004] N. Friel and A. N. Pettitt. Likelihood estimation and inference for the autologistic model. J. Comp. Graph. Stat., 13(1):232–246, 2004. doi: 10.1198/1061860043029.
  • Friel and Pettitt [2008] N. Friel and A. N. Pettitt. Marginal likelihood estimation via power posteriors. J. R. Stat. Soc. Ser. B, 70(3):589–607, 2008. doi: 10.1111/j.1467-9868.2007.00650.x.
  • Friel et al. [2009] N. Friel, A. N. Pettitt, R. Reeves, and E. Wit. Bayesian inference in hidden Markov random fields for binary data defined on large lattices. J. Comp. Graph. Stat., 18(2):243–261, 2009. doi: 10.1198/jcgs.2009.06148.
  • Gelman and Meng [1998] A. Gelman and X.-L. Meng. Simulating normalizing constants: from importance sampling to bridge sampling to path sampling. Statist. Sci., 13(2):163–185, 1998. doi: 10.1214/ss/1028905934.
  • Geyer and Johnson [2014] C. J. Geyer and L. Johnson. potts: Markov Chain Monte Carlo for Potts Models, 2014. URL http://CRAN.R-project.org/package=potts. R package version 0.5-2.
  • Golightly et al. [2015] A. Golightly, D. A. Henderson, and C. Sherlock. Delayed acceptance particle MCMC for exact inference in stochastic kinetic models. Statistics and Computing, 25(5):1039–1055, 2015. doi: 10.1007/s11222-014-9469-x.
  • Grelaud et al. [2009] A. Grelaud, C. P. Robert, J.-M. Marin, F. Rodolphe, and J.-F. Taly. ABC likelihood-free methods for model choice in Gibbs random fields. Bayesian Analysis, 4(2):317–336, 2009. doi: 10.1214/09-BA412.
  • Huber [2003] M. L. Huber. A bounding chain for Swendsen-Wang. Random Struct. Algor., 22(1):43–59, 2003. doi: 10.1002/rsa.10071.
  • Huber [2016] M. L. Huber. Perfect Simulation. Chapman & Hall/CRC Press, 2016.
  • Jacob and Thiery [2015] P. E. Jacob and A. H. Thiery. On nonnegative unbiased estimators. Ann. Statist., 43(2):769–784, 2015. doi: 10.1214/15-AOS1311.
  • Järvenpää et al. [2018] M. Järvenpää, M. Gutmann, A. Vehtari, and P. Marttinen. Gaussian process modeling in approximate Bayesian computation to estimate horizontal gene transfer in bacteria. Ann. Appl. Stat., 12(4):2228–2251, 2018. doi: 10.1214/18-AOAS1150.
  • Lee and Łatuszyński [2014] A. Lee and K. Łatuszyński. Variance bounding and geometric ergodicity of Markov chain Monte Carlo kernels for approximate Bayesian computation. Biometrika, 101(3):655–671, 2014. doi: 10.1093/biomet/asu027.
  • Lyne et al. [2015] A.-M. Lyne, M. Girolami, Y. Atchadé, H. Strathmann, and D. Simpson. On Russian roulette estimates for Bayesian inference with doubly-intractable likelihoods. Statist. Sci., 30(4):443–467, 2015. doi: 10.1214/15-STS523.
  • Maire et al. [2014] F. Maire, R. Douc, and J. Olsson. Comparison of asymptotic variances of inhomogeneous Markov chains with application to Markov chain Monte Carlo methods. Ann. Statist., 42(4):1483–1510, 2014. doi: 10.1214/14-AOS1209.
  • Marin and Robert [2007] J.-M. Marin and C. P. Robert. Bayesian Core: A Practical Approach to Computational Bayesian Statistics. Springer Texts in Statistics. Springer, New York, 2007.
  • Marjoram et al. [2003] P. Marjoram, J. Molitor, V. Plagnol, and S. Tavaré. Markov chain Monte Carlo without likelihoods. Proc. Natl Acad. Sci., 100(26):15324–15328, 2003. doi: 10.1073/pnas.0306899100.
  • McGrory et al. [2009] C. A. McGrory, D. M. Titterington, R. Reeves, and A. N. Pettitt. Variational Bayes for estimating the parameters of a hidden Potts model. Stat. Comput., 19(3):329–340, 2009. doi: 10.1007/s11222-008-9095-6.
  • McGrory et al. [2012] C. A. McGrory, A. N. Pettitt, R. Reeves, M. Griffin, and M. Dwyer. Variational Bayes and the reduced dependence approximation for the autologistic model on an irregular grid with applications. J. Comput. Graph. Stat., 21(3):781–796, 2012. doi: 10.1080/10618600.2012.632232.
  • McKinley et al. [2018] T. J. McKinley, I. Vernon, I. Andrianakis, N. McCreesh, J. E. Oakley, R. N. Nsubuga, M. Goldstein, R. G. White, et al. Approximate Bayesian computation and simulation-based inference for complex stochastic epidemic models. Statist. Sci., 33(1):4–18, 2018. doi: 10.1214/17-STS618.
  • Medina-Aguayo et al. [2016] F. J. Medina-Aguayo, A. Lee, and G. O. Roberts. Stability of noisy Metropolis-Hastings. Stat. Comput., 26(6):1187–1211, 2016. doi: 10.1007/s11222-015-9604-3.
  • Meeds and Welling [2014] E. Meeds and M. Welling. GPS-ABC: Gaussian process surrogate approximate Bayesian computation. In Proc. 30 Conf. UAI, Quebec City, Canada, 2014.
  • Mira et al. [2001] A. Mira, J. Møller, and G. O. Roberts. Perfect slice samplers. J. R. Stat. Soc. Ser. B, 63(3):593–606, 2001. doi: 10.1111/1467-9868.00301.
  • Møller et al. [2006] J. Møller, A. N. Pettitt, R. Reeves, and K. K. Berthelsen. An efficient Markov chain Monte Carlo method for distributions with intractable normalising constants. Biometrika, 93(2):451–458, 2006. doi: 10.1093/biomet/93.2.451.
  • Moores et al. [2015] M. T. Moores, C. C. Drovandi, K. Mengersen, and C. P. Robert. Pre-processing for approximate Bayesian computation in image analysis. Stat. Comput., 25(1):23–33, 2015. doi: 10.1007/s11222-014-9525-6.
  • Moores et al. [2019] M. T. Moores, D. Feng, and K. Mengersen. bayesImageS: Bayesian Methods for Image Segmentation using a Potts Model, 2019. URL http://CRAN.R-project.org/package=bayesImageS. R package version 0.6-0.
  • Moores et al. [2020] M. T. Moores, G. K. Nicholls, A. N. Pettitt, and K. Mengersen. Scalable Bayesian inference for the inverse temperature of a hidden Potts model. Bayesian Analysis, 15(1):1–27, 2020. doi: 10.1214/18-BA1130.
  • Murray et al. [2006] I. Murray, Z. Ghahramani, and D. J. C. MacKay. MCMC for doubly-intractable distributions. In Proc. Conf. UAI, pages 359–366, Arlington, VA, 2006. AUAI Press.
  • Nicholls et al. [2012] G. K. Nicholls, C. Fox, and A. Muir Watt. Coupled MCMC with a randomized acceptance probability. arXiv preprint arXiv:1205.6857 [stat.CO], 2012. URL https://arxiv.org/abs/1205.6857.
  • Oates et al. [2016] C. J. Oates, T. Papamarkou, and M. Girolami. The controlled thermodynamic integral for Bayesian model evidence evaluation. J. Am. Stat. Assoc., 111(514):634–645, 2016. doi: 10.1080/01621459.2015.1021006.
  • Ogden [2017] H. E. Ogden. On asymptotic validity of naive inference with an approximate likelihood. Biometrika, 104(1):153–164, 2017. doi: 10.1093/biomet/asx002.
  • Okabayashi et al. [2011] S. Okabayashi, L. Johnson, and C. J. Geyer. Extending pseudo-likelihood for Potts models. Statistica Sinica, 21:331–347, 2011.
  • Olbrich et al. [2010] E. Olbrich, T. Kahle, N. Bertschinger, N. Ay, and J. Jost. Quantifying structure in networks. Eur. Phys. J. B, 77(2):239–247, 2010. doi: 10.1140/epjb/e2010-00209-0.
  • O’Neill et al. [2000] P. D. O’Neill, D. J. Balding, N. G. Becker, M. Eerola, and D. Mollison. Analyses of infectious disease data from household outbreaks by Markov chain Monte Carlo methods. J. R. Stat. Soc. Ser. C, 49(4):517–542, 2000. doi: 10.1111/1467-9876.00210.
  • Pettitt et al. [2003] A. N. Pettitt, N. Friel, and R. Reeves. Efficient calculation of the normalizing constant of the autologistic and related models on the cylinder and lattice. J. R. Stat. Soc. Ser. B, 65(1):235–246, 2003. doi: 10.1111/1467-9868.00383.
  • Pitt et al. [2012] M. K. Pitt, R. dos Santos Silva, P. Giordani, and R. Kohn. On some properties of Markov chain Monte Carlo simulation methods based on the particle filter. J. Econometr., 171(2):134–151, 2012. doi: 10.1016/j.jeconom.2012.06.004.
  • Prangle [2016] D. Prangle. Lazy ABC. Stat. Comput., 26(1):171–185, 2016. doi: 10.1007/s11222-014-9544-3.
  • Pritchard et al. [1999] J. K. Pritchard, M. T. Seielstad, A. Perez-Lezaun, and M. W. Feldman. Population growth of human Y chromosomes: a study of Y chromosome microsatellites. Mol Biol Evol, 16(12):1791–1798, 1999. doi: 10.1093/oxfordjournals.molbev.a026091.
  • Propp and Wilson [1996] J. G. Propp and D. B. Wilson. Exact sampling with coupled Markov chains and applications to statistical mechanics. Random Struct. Algor., 9(1–2):223–252, 1996. doi: 10.1002/(SICI)1098-2418(199608/09)9:1/2<223::AID-RSA14>3.0.CO;2-O.
  • Reeves and Pettitt [2004] R. Reeves and A. N. Pettitt. Efficient recursions for general factorisable models. Biometrika, 91(3):751–757, 2004. doi: 10.1093/biomet/91.3.751.
  • Roberts and Rosenthal [2009] G. O. Roberts and J. S. Rosenthal. Examples of adaptive MCMC. J. Comput. Graph. Stat., 18(2):349–367, 2009. doi: 10.1198/jcgs.2009.06134.
  • Rydén and Titterington [1998] T. Rydén and D. M. Titterington. Computational Bayesian analysis of hidden Markov models. J. Comput. Graph. Stat., 7(2):194–211, 1998. doi: 10.1080/10618600.1998.10474770.
  • Sherlock et al. [2015] C. Sherlock, A. H. Thiery, G. O. Roberts, and J. S. Rosenthal. On the efficiency of pseudo-marginal random walk Metropolis algorithms. Ann. Statist., 43(1):238–275, 2015. doi: 10.1214/14-AOS1278.
  • Sherlock et al. [2017] C. Sherlock, A. Golightly, and D. A. Henderson. Adaptive, delayed-acceptance MCMC for targets with expensive likelihoods. J. Comput. Graph. Stat., 26(2):434–444, 2017. doi: 10.1080/10618600.2016.1231064.
  • Stuart and Teckentrup [2018] A. M. Stuart and A. L. Teckentrup. Posterior consistency for Gaussian process approximations of Bayesian posterior distributions. Math. Comp., 87:721–753, 2018. doi: 10.1090/mcom/3244.
  • Swendsen and Wang [1987] R. H. Swendsen and J.-S. Wang. Nonuniversal critical dynamics in Monte Carlo simulations. Phys. Rev. Lett., 58:86–88, 1987. doi: 10.1103/PhysRevLett.58.86.
  • Tanner and Wong [1987] M. A. Tanner and W. H. Wong. The calculation of posterior distributions by data augmentation. J. Am. Stat. Assoc., 82(398):528–40, 1987.
  • Varin et al. [2011] C. Varin, N. Reid, and D. Firth. An overview of composite likelihood methods. Statistica Sinica, 21:5–42, 2011.
  • Wilkinson [2014] R. D. Wilkinson. Accelerating ABC methods using Gaussian processes. In S. Kaski and J. Corander, editors, Proc. 17th Int. Conf. AISTATS, volume 33 of JMLR W&CP, pages 1015–1023, 2014.