Approximate Bayesian Computation (ABC) is a useful tool for Bayesian (see, e.g., marin2012) or frequentist (see, e.g., rubio2013simple) inference when the likelihood function is mathematically or computationally unavailable. The success of the ABC method relies on a careful choice of the summary statistic $s(\cdot)$, the distance metric $\rho(\cdot, \cdot)$ and the tolerance level $\epsilon$, with the summary statistic arguably playing the most crucial role.
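To set the notation, let $y = (y_1, \ldots, y_n)$ be realisations of the random variable $Y \sim p(y \mid \theta)$, with $\theta \in \Theta \subseteq \mathbb{R}^d$, $d \geq 1$. Furthermore, let $\pi(\theta)$ be a prior distribution for $\theta$, for simplicity assumed to be proper, let $L(\theta; y)$ be the likelihood function based on model $p(y \mid \theta)$ and data $y$, and let $\pi(\theta \mid y) \propto L(\theta; y)\,\pi(\theta)$ be the posterior distribution, with normalising constant $p(y) = \int_\Theta L(\theta; y)\,\pi(\theta)\,d\theta$. The Bayes factor (BF), the standard Bayesian solution for model selection, involves the posterior normalising constants of the models under comparison. Thus, if the likelihood of even a single model is unavailable, the BF cannot be computed.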
The ABC machinery comes equipped with an ABC model choice (ABC-MC) algorithm which works as follows (grelaud2009abc).
In Algorithm 1, $\epsilon$ is the threshold value and $m$ is the model index. The vector of model indices produced by Algorithm 1 can be used, in principle, to compute posterior model probabilities and BFs.
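A rejection-type ABC model choice scheme of this kind can be sketched as follows. This is a minimal illustration rather than a transcription of Algorithm 1: the two candidate models (Poisson and Geometric), the Exp(1) prior on the Poisson mean, the U(0, 1) prior on the Geometric success probability, the uniform prior over models and the function name `abc_model_choice` are all assumptions made for the example.

```python
import numpy as np

def abc_model_choice(y, n_sims=20_000, eps=0, seed=0):
    """Return the model indices (0 = Poisson, 1 = Geometric) accepted by rejection ABC.

    Assumed priors (illustrative only): uniform over the two models,
    Exp(1) on the Poisson mean, U(0, 1) on the Geometric success probability.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    n, s_obs = len(y), y.sum()

    # Draw a model index and a parameter from each prior.
    m = rng.integers(2, size=n_sims)
    theta = rng.exponential(1.0, size=n_sims)
    p = 1.0 - rng.uniform(size=n_sims)  # lies in (0, 1], avoids p = 0

    # Simulate pseudo-data under both models (wasteful but simple),
    # then keep the summary from the model that was actually drawn.
    z_pois = rng.poisson(theta[:, None], size=(n_sims, n))
    z_geom = rng.geometric(p[:, None], size=(n_sims, n)) - 1  # shift support to 0, 1, ...
    s_sim = np.where(m == 0, z_pois.sum(axis=1), z_geom.sum(axis=1))

    # Accept when the distance between summaries is within the threshold.
    return m[np.abs(s_sim - s_obs) <= eps]

if __name__ == "__main__":
    idx = abc_model_choice([0, 1, 2, 1, 0], seed=1)
    print("accepted draws:", idx.size)
    print("estimated P(Poisson | s):", np.mean(idx == 0))
```

The relative frequency of each index among the accepted draws estimates the posterior model probability given the summary statistic, and the ratio of the two frequencies estimates the corresponding BF under equal prior model probabilities.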
Recently, many authors have cast doubt on the validity of the ABC model choice procedure (see, e.g., marin2014relevant; robert2011lack). For instance, suppose $y$ is a vector of counts and we wish to choose between the Poisson and the Geometric model. In both cases, with ABC we can obtain (almost) the exact posterior by using $s(y) = \sum_{i=1}^n y_i$ as the summary statistic, since the latter is sufficient under both models. However, the BF obtained with ABC-MC in this case converges asymptotically to a positive constant instead of diverging in favour of the true model (robert2011lack). With the particular exception of Gibbs random fields (grelaud2009abc), the BF obtained with ABC-MC misses the exact BF by some unknown function of the data. marin2014relevant give theoretical conditions under which the summary statistic gives valid posterior model probabilities or BFs under the ABC model choice framework.
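The gap between the two Bayes factors can be checked in closed form for the Poisson versus Geometric comparison. The sketch below assumes an Exp(1) prior on the Poisson mean and a U(0, 1) prior on the Geometric success probability (illustrative choices, as are the function names); under these priors both the BF based on the full data and the BF based on the sum are available analytically.

```python
from math import exp, lgamma, log

def log_bf_full(y):
    """Log BF (Poisson over Geometric) based on the full data y."""
    n, s = len(y), sum(y)
    # Poisson with Exp(1) prior: Gamma(s + 1) / ((n + 1)^(s + 1) * prod(y_i!))
    log_m_pois = lgamma(s + 1) - (s + 1) * log(n + 1) - sum(lgamma(v + 1) for v in y)
    # Geometric with U(0, 1) prior: Beta(n + 1, s + 1)
    log_m_geom = lgamma(n + 1) + lgamma(s + 1) - lgamma(n + s + 2)
    return log_m_pois - log_m_geom

def log_bf_summary(y):
    """Log BF (Poisson over Geometric) based only on s = sum(y)."""
    n, s = len(y), sum(y)
    # The sum is Poisson(n * theta): marginal n^s / (n + 1)^(s + 1)
    log_m_pois = s * log(n) - (s + 1) * log(n + 1)
    # The sum is negative binomial: marginal C(n + s - 1, s) * Beta(n + 1, s + 1)
    log_binom = lgamma(n + s) - lgamma(s + 1) - lgamma(n)
    log_m_geom = log_binom + lgamma(n + 1) + lgamma(s + 1) - lgamma(n + s + 2)
    return log_m_pois - log_m_geom

if __name__ == "__main__":
    # Two samples with the same size and the same sum.
    for y in ([0, 1, 2, 1, 0], [4, 0, 0, 0, 0]):
        print(y, exp(log_bf_full(y)), exp(log_bf_summary(y)))
```

The two samples share the same sum, so the summary-based BF is identical for both, while the BF based on the full data differs. The discrepancy is a function of the data that the sum does not capture.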
Clearly, the issue is with the summary statistic $s(\cdot)$. Even though it can be sufficient for the parameters, it is cross-model sufficiency that plays the crucial role here, i.e., the summary statistic must be sufficient for the models themselves (see also marin2014relevant). Finding cross-model sufficient statistics is impossible in practice, and some effort has been spent on constructing summary statistics for ABC model selection (see, e.g., barnes2012). However, to the best of our knowledge, the choice of summary statistics for ABC model selection is still an open problem. Last but not least, summary statistics for ABC model selection are notoriously poor choices for ABC posterior sampling (C.P. Robert, personal communication).
In Section 2 we show how the marginal likelihood can be approximated by using the sufficient summary statistic and ABC. In Section 3 we conclude by pointing to future developments.
2 Marginal likelihood from sufficient statistics
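Let us focus on a single model, and suppose that $s = s(y)$ is sufficient for $\theta$. By the sufficiency principle we have that

$$\pi(\theta \mid y) = \pi(\theta \mid s).$$

From this we see that, for any $\theta$ with $\pi(\theta \mid s) > 0$,

$$p(y) = \frac{L(\theta; y)\,\pi(\theta)}{\pi(\theta \mid s)}.$$

To approximate $p(y)$ we propose to approximate $\pi(\theta \mid s)$ via ABC, and $L(\theta; y)$ at a fixed value of $\theta$ by simulation, as follows (see Algorithm 2).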
Steps 2 and 4 of Algorithm 2 can be performed by any density estimation method; in Step 5 we only need to generate a (possibly) large sample of data from the model under $\tilde\theta$, a fixed value of $\theta$.
A toy example: the Poisson model
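Suppose $y_i \sim \text{Poisson}(\theta)$ independently for $i = 1, \ldots, n$, and a priori $\theta \sim \pi(\theta)$. The marginal likelihood in this case is

$$p(y) = \int_0^\infty \prod_{i=1}^n \frac{e^{-\theta}\theta^{y_i}}{y_i!}\,\pi(\theta)\,d\theta. \tag{2.1}$$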
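For a concrete closed form, one may take a conjugate Gamma prior with shape $a$ and rate $b$; this prior and its hyperparameters are an assumption made here for illustration. With $s = \sum_{i=1}^n y_i$, the Poisson marginal likelihood then reduces to a ratio of Gamma functions:

```latex
\begin{aligned}
p(y) &= \int_0^\infty \prod_{i=1}^n \frac{e^{-\theta}\theta^{y_i}}{y_i!}
        \cdot \frac{b^a}{\Gamma(a)}\,\theta^{a-1}e^{-b\theta}\,d\theta \\
     &= \frac{b^a}{\Gamma(a)\prod_{i=1}^n y_i!}
        \int_0^\infty \theta^{a+s-1}e^{-(n+b)\theta}\,d\theta \\
     &= \frac{b^a}{\Gamma(a)}\cdot\frac{\Gamma(a+s)}{(n+b)^{a+s}}
        \cdot\frac{1}{\prod_{i=1}^n y_i!}.
\end{aligned}
```

Under the same assumption the exact posterior is Gamma with shape $a + s$ and rate $b + n$, which depends on the data only through $s$.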
As a numerical example, consider an observed sample $y$ of randomly drawn counts. The left panel of Figure 1 shows the histogram of the ABC posterior against the exact posterior (solid line). The ABC posterior is approximated from the accepted samples at tolerance $\epsilon$, where the distance $\rho$ is the Euclidean distance between the total counts of the simulated and the observed data.
The right panel of Figure 1 shows the scatter plot of the logarithm of the marginal likelihood obtained from ABC against the logarithm of the exact marginal likelihood (2.1), for 50 random samples. The approximate marginal likelihoods obtained from ABC with the sufficient statistic and the exact marginal likelihoods are virtually indistinguishable.
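A comparison of this kind can be reproduced in outline with the sketch below. It is not Algorithm 2 itself but one possible realisation of the idea, under several assumptions: a Gamma(a, b) prior (shape a, rate b), i.i.d. Poisson observations so that the likelihood at the fixed value factorises, a Gaussian kernel density estimate for the ABC posterior, relative frequencies of simulated counts as the model density estimate, and the ABC posterior median as the fixed parameter value. The function names and default settings are hypothetical.

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import gamma, gaussian_kde

def log_ml_exact(y, a, b):
    """Exact log marginal likelihood: i.i.d. Poisson data, Gamma(a, b) prior (b = rate)."""
    y = np.asarray(y)
    n, s = len(y), y.sum()
    return (a * np.log(b) - gammaln(a) + gammaln(a + s)
            - (a + s) * np.log(n + b) - gammaln(y + 1).sum())

def log_ml_abc(y, a, b, n_sims=200_000, n_model=1_000_000, eps=0, seed=0):
    """Approximate log p(y) as log L(t; y) + log prior(t) - log ABC posterior(t)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    n, s_obs = len(y), y.sum()

    # Rejection ABC on the total count, which is sufficient here.
    theta = rng.gamma(shape=a, scale=1.0 / b, size=n_sims)
    s_sim = rng.poisson(theta[:, None], size=(n_sims, n)).sum(axis=1)
    post = theta[np.abs(s_sim - s_obs) <= eps]

    # Fixed parameter value and ABC posterior density estimate at that value.
    theta_tilde = np.median(post)
    log_post = np.log(gaussian_kde(post)(theta_tilde)[0])

    # Simulate a large sample at theta_tilde and estimate the Poisson pmf
    # by relative frequencies; the likelihood is the product over y_i.
    z = rng.poisson(theta_tilde, size=n_model)
    pmf_hat = np.bincount(z, minlength=int(y.max()) + 1) / n_model
    log_lik = np.log(pmf_hat[y]).sum()

    log_prior = gamma.logpdf(theta_tilde, a, scale=1.0 / b)
    return log_lik + log_prior - log_post

if __name__ == "__main__":
    y = np.array([3, 2, 4, 1, 3, 5, 2, 3, 4, 3])
    print("exact:", log_ml_exact(y, a=2.0, b=1.0))
    print("ABC:  ", log_ml_abc(y, a=2.0, b=1.0, seed=1))
```

The frequency-based density estimate returns zero for any observed count that never appears in the simulated sample, so the simulated sample should be large relative to the rarest observed value; a smoothed estimator would be a natural alternative.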
Obviously, in realistic scenarios sufficient summary statistics are unavailable. However, if we have a set of judiciously chosen summary statistics which give provably valid inference on the parameters of interest, then the idea can still be usefully applied. The closer the summary statistic is to sufficiency, the closer the proposed approximation is to the exact value.
This work was presented at the BayesComp 2018 conference (26–29 March, Barcelona) and is partially supported by University of Padova (Progetti di Ricerca di Ateneo 2015, CPDA153257) and by PRIN 2015 (grant 2015EASZFS_003).