1. Introduction
Approximate Bayesian computation (ABC) is a form of likelihood-free inference (see, e.g., the reviews of Marin et al., 2012; Sunnåker et al., 2013) which is used when exact Bayesian inference of a parameter θ with posterior density π(θ) ∝ p(θ)L(y | θ) is impossible, where p(θ) is the prior density and L(y | θ) is an intractable likelihood with data y. More specifically, when the generative model of observations cannot be evaluated, but allows for simulations, ABC can be used for relatively straightforward approximate inference, based on a pseudo-posterior

π_ε(θ) ∝ p(θ) ∫ K_ε(y*, y) g(y* | θ) dy*,    (1)

where g(· | θ) is the density of the generative model, ε > 0 is a 'tolerance' parameter, and K_ε is a 'kernel' function, which is often taken as a simple cutoff K_ε(y*, y) = 1(‖s(y*) − s(y)‖ ≤ ε), where s(·) extracts a vector of summary statistics from the (pseudo-)observations.
The summary statistics are often chosen based on the application at hand, and reflect what is relevant for the inference task; see also Fearnhead and Prangle (2012). Because the ABC likelihood may be regarded as a smoothed version of the true likelihood, using the kernel K_ε, it is intuitive that too large an ε may blur the likelihood and bias the inference. Therefore, it is generally desirable to use as small a tolerance as possible, but because computational ABC methods become inefficient with small ε, the choice of tolerance level is difficult (cf. Bortot et al., 2007; Sisson and Fan, 2018; Tanaka et al., 2006).
We discuss a simple post-processing procedure which allows a range of values of the tolerance δ ≤ ε to be considered, based on a single run of ABC Markov chain Monte Carlo (ABC-MCMC) (Marjoram et al., 2003) with tolerance ε. Post-processing has been suggested earlier at least in Wegmann et al. (2009) (in the special case of the simple cutoff), and it can be regarded as an importance sampling correction of pseudo-marginal type MCMC (cf. Vihola et al., 2016). The method, discussed further in Section 2, can be useful for two reasons:

A range of tolerances δ ≤ ε may be routinely inspected, which can reveal excess bias of the ABC-MCMC with tolerance ε.

The ABC-MCMC may be implemented with a sufficiently large tolerance ε to allow for good mixing, and the post-correction may be used for inference at smaller tolerances.
Our contribution is twofold. We suggest straightforward-to-calculate approximate confidence intervals for the post-processing output, with some theoretical properties discussed in Section 3. We also introduce an adaptive ABC-MCMC in Section 4, which finds a balanced ε during burn-in, using the acceptance rate as a proxy. We provide experimental results regarding the suggested confidence intervals and the tolerance adaptation in Section 5, and conclude with a discussion in Section 6.
2. ABC-MCMC with post-processing over a range of tolerances
For the rest of the paper, we assume that the kernel function has the following form:

K_ε(y*, y) = φ(d(y*, y)/ε),

where d is any 'dissimilarity' function and φ is a non-increasing 'cutoff' function. Typically d(y*, y) = ‖s(y*) − s(y)‖, where s(·) extracts the chosen summaries, and in case of the simple cutoff discussed in Section 1, φ(t) = 1(t ≤ 1). We also assume that the ABC posterior π_ε given in (1) is well-defined for all ε of interest (that is, its normalising constant is positive and finite).
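For concreteness, the two kinds of cutoff used later in the experiments can be sketched as follows. This is an illustrative Python sketch only (the paper's code is in Julia); the function names and the Gaussian form exp(−t²/2) are assumptions, not the paper's definitions.

```python
import math

def simple_cutoff(t):
    # Simple cutoff: phi(t) = 1 if t <= 1, else 0.
    return 1.0 if t <= 1.0 else 0.0

def gaussian_cutoff(t):
    # Gaussian cutoff (assumed form): phi(t) = exp(-t^2 / 2).
    return math.exp(-t * t / 2.0)

def abc_kernel(d, eps, cutoff=simple_cutoff):
    # K_eps = phi(d / eps), where d is the dissimilarity between
    # the summaries of the pseudo-observations and the data.
    return cutoff(d / eps)
```

With the simple cutoff, the kernel is an indicator of d ≤ ε; the Gaussian cutoff down-weights pseudo-observations smoothly with increasing distance.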
The following summarises the ABC-MCMC algorithm suggested by Marjoram et al. (2003), using a proposal density q and a tolerance ε:
Algorithm 1 (ABC-MCMC(q, ε, n)).
Suppose θ_1 and y_1 are any starting values such that p(θ_1) > 0 and K_ε(y_1, y) > 0. For k = 2, …, n, iterate:

(i) Draw θ'_k ~ q(θ_{k−1}, ·) and y'_k ~ g(· | θ'_k).

(ii) With probability min{1, [p(θ'_k) q(θ'_k, θ_{k−1}) K_ε(y'_k, y)] / [p(θ_{k−1}) q(θ_{k−1}, θ'_k) K_ε(y_{k−1}, y)]}, accept by setting (θ_k, y_k) = (θ'_k, y'_k); otherwise reject by setting (θ_k, y_k) = (θ_{k−1}, y_{k−1}).
Note that Algorithm 1 may be implemented by storing only the parameters θ_k and the related distances d_k = d(y_k, y), and in what follows, we regard either (θ_k)_{k=1}^n or (θ_k, d_k)_{k=1}^n as the output of Algorithm 1. Note also that in practice, the initial values should be taken as the state of a run of Algorithm 1 after a number of initial 'burn-in' iterations, during which an adaptive algorithm for parameter tuning may be employed (Section 4).
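Algorithm 1 can be sketched in a few lines. The paper's implementation is in Julia; the following is an illustrative Python sketch for a symmetric Gaussian random-walk proposal (so the proposal densities cancel in the acceptance ratio), and the helper names (`log_prior`, `simulate`, `distance`) are hypothetical.

```python
import math
import random

def abc_mcmc(n, eps, theta0, log_prior, simulate, distance,
             step=1.0, cutoff=lambda t: 1.0 if t <= 1.0 else 0.0):
    # ABC-MCMC sketch (after Marjoram et al., 2003) with a symmetric
    # random-walk proposal; only the (theta_k, d_k) pairs are stored.
    theta = theta0
    d = distance(simulate(theta0))
    assert cutoff(d / eps) > 0, "initial state needs a positive kernel value"
    chain = []
    for _ in range(n):
        prop = theta + step * random.gauss(0.0, 1.0)
        d_prop = distance(simulate(prop))
        # acceptance ratio: prior ratio times kernel ratio
        ratio = math.exp(log_prior(prop) - log_prior(theta))
        ratio *= cutoff(d_prop / eps) / cutoff(d / eps)
        if random.random() < ratio:
            theta, d = prop, d_prop
        chain.append((theta, d))
    return chain
```

With the simple cutoff, every stored distance satisfies d_k ≤ ε, so the post-correction of Section 2 can be applied directly to the returned pairs.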
Definition 2.
Suppose (θ_k, d_k)_{k=1}^n is the output of ABC-MCMC(q, ε, n). For any δ ≤ ε such that φ(d_k/δ) > 0 for some k, and for any function f, define

E_δ(f) = Σ_{k=1}^n u_k(δ) f(θ_k) / Σ_{k=1}^n u_k(δ),  where u_k(δ) = φ(d_k/δ)/φ(d_k/ε).
The estimator E_δ(f) approximates the ABC posterior mean of f at tolerance δ, and may be used to construct a confidence interval; see Algorithm 6 below. The following algorithm shows that in case of the simple cutoff, the estimators may be calculated simultaneously for all tolerances δ efficiently:
Algorithm 3.
Suppose φ(t) = 1(t ≤ 1) and (θ_k, d_k)_{k=1}^n is the output of ABC-MCMC(q, ε, n).

(i) Sort (θ_k, d_k)_{k=1}^n with respect to the distances: find a permutation σ such that d_{σ(1)} ≤ d_{σ(2)} ≤ ⋯ ≤ d_{σ(n)}.

(ii) For all unique values δ among {d_{σ(1)}, …, d_{σ(n)}}, let n_δ = max{j : d_{σ(j)} ≤ δ}, and define

E_δ(f) = (1/n_δ) Σ_{j=1}^{n_δ} f(θ_{σ(j)})

(and for δ < d_{σ(1)}, the estimator is undefined).
The sorting in Algorithm 3(i) may be performed in O(n log n) time, and the estimators E_δ(f) may all be calculated in O(n) further time by forming appropriate cumulative sums.
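With the simple cutoff, the post-corrected estimates reduce to running means over the distance-sorted samples, which is the essence of Algorithm 3. A minimal Python sketch follows (the paper's code is Julia; `post_correct_all` is a hypothetical name, and repeated distance values are handled only approximately):

```python
def post_correct_all(chain, f):
    # chain: list of (theta_k, d_k) pairs from ABC-MCMC with simple
    # cutoff.  Returns (delta, E_delta(f)) for each achievable delta,
    # via sorting and a cumulative sum: O(n log n) total.
    pairs = sorted(chain, key=lambda td: td[1])   # sort by distance d_k
    estimates = []
    csum = 0.0
    for j, (theta, d) in enumerate(pairs, start=1):
        csum += f(theta)
        # E_delta(f) is the mean of f over the j samples with
        # distance at most delta = d_(j)
        estimates.append((d, csum / j))
    return estimates
```

For example, three stored pairs sorted by distance yield one estimate per distinct tolerance, each averaging all samples up to that distance.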
Theorem 5 below details the consistency of E_δ(f), and relates its accuracy
to the limiting variance, in case the following (well-known) condition ensuring a central limit theorem holds:
Assumption 4 (Finite integrated autocorrelation).
Suppose that the integrated autocorrelation τ_ε(f) of a stationary version of the ABC-MCMC(q, ε, n) chain is finite.
Theorem 5.
The proof of Theorem 5 is given in Appendix A. Based on Theorem 5, we suggest reporting the following approximate confidence intervals for the suggested estimators:
Algorithm 6.
The classical choice for the integrated autocorrelation estimate in Algorithm 6(ii) is the windowed estimator τ̂ = 1 + 2 Σ_{k=1}^{L} ρ̂_k with some window length L, where ρ̂_k is the lag-k sample autocorrelation of the chain (cf. Geyer, 1992), but more sophisticated techniques for the calculation of the asymptotic variance have also been suggested (e.g. Flegal and Jones, 2010).
Because computing an estimate of the integrated autocorrelation separately for each δ is computationally demanding, and because such an estimate is likely to be unstable for small δ, Algorithm 6 is based on using the integrated autocorrelation at the simulation tolerance ε as a common value for all δ. This relies on the approximation τ_δ(f) ≈ τ_ε(f), which may not always be entirely accurate, but is likely to be reasonable, as illustrated by Theorem 7 in Section 3 below.
We remark that, although we focus on using a common cutoff φ for both the ABC-MCMC and the post-correction, one could also consider using two different cutoffs. The extension of Definition 2 is straightforward, and Algorithm 3 remains valid with a simple post-correction cutoff, under a support condition.
3. Confidence interval and efficiency
The following result, whose proof is given in Appendix A, gives an expression for the integrated autocorrelation in case of the simple cutoff.
Theorem 7.
Suppose Assumption 4 holds and , then
where , , is the integrated autocorrelation of and the rejection probability of the ABCMCMC() chain at .
We next discuss how this loosely suggests that . Note that , and under suitable regularity conditions both and are continuous with respect to , and as . Then, for , we have and therefore . For small , the terms with are of order , and are dominated by the other terms of order . The remaining ratio may be written as
where with . If , then the term is upper bounded by , and we believe it to be often less than , because the latter expression is similar to the contribution of rejections to the integrated autocorrelation; see the proof of Theorem 7.
For a general cutoff, it appears hard to obtain a similar theoretical result, but we expect the approximation to remain sensible. Theorem 7 relies on a conditional independence which holds for the simple cutoff, assuming at least a single acceptance. This is not true with other cutoffs, but we believe the corresponding dependence is generally weaker, suggesting similar behaviour.
Let us next state a general efficiency upper bound for the IS-corrected ABC-MCMC we suggest, relative to a direct ABC-MCMC run with a smaller tolerance.
Theorem 8.
Theorem 8 follows directly from (Franks and Vihola, 2017, Corollary 4). The upper bound guarantees that a moderate correction, that is, δ close to ε, is nearly as efficient as direct ABC-MCMC. Indeed, the bound typically tends to one as δ → ε, in which case Theorem 8 implies comparable asymptotic efficiency. However, as δ → 0, the bound becomes less informative.
4. A tolerance-adaptive ABC-MCMC algorithm
We propose Algorithm 9 below to adapt the tolerance ε in the ABC-MCMC during burn-in, in order to obtain a user-specified overall acceptance rate α*. Together with the post-correction approach of Section 2, we thus obtain an automated ABC inference solution that does not require choosing ε in advance.
In Algorithm 9, we assume that a desired acceptance rate α* is specified, and we discuss its choice later. We also assume a choice of decreasing positive step sizes (γ_k)_{k≥1}, specified in the experiments. For convenience, we denote the distance distribution corresponding to a parameter θ as the law of d(y', y) with y' simulated from the model at θ.
Algorithm 9 (TA(q, ε, n)).
Suppose θ_1 is a starting value with p(θ_1) > 0.

Initialise where and .

For k = 2, …, n, iterate:

Draw θ'_k ~ q(θ_{k−1}, ·).

Draw y'_k ~ g(· | θ'_k) and set d'_k = d(y'_k, y).

Accept, by setting (θ_k, d_k) = (θ'_k, d'_k), with probability
(2) and otherwise reject, by setting (θ_k, d_k) = (θ_{k−1}, d_{k−1}).



Output (θ_k, d_k)_{k=1}^n and the final tolerance.
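The adaptation can be sketched as a Robbins-Monro recursion on the log-tolerance. The following Python sketch is illustrative only: the exact update rule (2) is given in the paper, and the log-scale form, sign convention and function names below are assumptions.

```python
import math

def adapt_tolerance(n, eps0, alpha_star, accept, gammas):
    # Stochastic-approximation sketch of tolerance adaptation: nudge
    # log(eps) so that the mean acceptance rate matches alpha_star.
    # `accept(eps)` returns the acceptance indicator (or probability)
    # of one ABC-MCMC step at tolerance eps (model-dependent).
    log_eps = math.log(eps0)
    for k in range(1, n + 1):
        a = accept(math.exp(log_eps))
        # acceptance increases with eps, so decrease eps when the
        # acceptance exceeds the target and increase it otherwise
        log_eps += gammas(k) * (alpha_star - a)
    return math.exp(log_eps)
```

As a sanity check, if the acceptance probability behaves like min(1, ε), the recursion settles near the tolerance whose acceptance equals the target.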
The following simple conditions suffice for convergence of the adaptation:
Assumption 10.
Suppose that the following hold:

with and a constant.

The domain is a non-empty open set.

is uniformly bounded on , and for .

is a uniformly bounded density, and on , .

admits a uniformly bounded density .

stays in a set almost surely, where .

for all .
Theorem 11.
The proof of Theorem 11 follows from the more general Theorem 13 of Appendix B. Theorem 13 is phrased for geometrically ergodic chains on possibly unbounded domains, without the lower bound in Assumption 10(iv). See Appendix C for the proofs of both theorems.
In practice, the tolerance adaptation is most straightforward to apply with a symmetric random-walk proposal, adapted simultaneously with proposal covariance adaptation (Andrieu and Moulines, 2006; Haario et al., 2001) (see Algorithm 22 of Appendix D for a detailed description of the resulting algorithm). Such simultaneous use of different optimisation criteria within adaptive MCMC has been discussed, for example, in the review of Andrieu and Thoms (2008). While we do not consider Algorithm 22 explicitly in our theoretical analysis, our results could be elaborated, along the lines of Andrieu and Moulines (2006), to accommodate it in detail.
In the standard Adaptive Metropolis algorithm (Haario et al., 2001), the limiting acceptance rate is often around 0.234 (Roberts et al., 1997). In the ABC-MCMC context, this acceptance rate would be reached if the tolerance were made infinite, and if the prior distribution were regular enough (e.g. Gaussian). Because the mean acceptance rate of ABC-MCMC typically decreases when the tolerance is decreased (see Lemma 15 of Appendix C in a special case), and because the likelihood approximation must be reasonable, the desired acceptance rate α* should be substantially lower than 0.234.
As ABC-MCMC may be interpreted as an instance of pseudo-marginal MCMC, for which there are certain conditions under which the optimal acceptance rate of about 7% is reached (Sherlock et al., 2015), one could take this as a guideline for ABC-MCMC as well. However, the context of Sherlock et al. (2015) is quite dissimilar to that of ABC-MCMC, and so we decided to push the acceptance rate a little higher to ensure sufficient mixing. Moreover, we apply subsequent post-correction, which further justifies a slightly inflated tolerance and therefore acceptance rate.
5. Experiments
We experiment with our methods on two models: a lightweight Gaussian toy example, and a Lotka-Volterra model. Our experiments aim to provide information on the following questions:

Can ABC-MCMC with a larger tolerance, post-corrected to a desired tolerance, deliver more accurate results than direct application of ABC-MCMC?

Does our approximate confidence interval appear reliable?

How well does the adaptive ABC-MCMC work in practice?
In all our experiments, we apply the Adaptive Metropolis covariance adaptation (Haario et al., 2001; Andrieu and Moulines, 2006), which is run during the whole simulation, starting from an identity covariance.
Regarding our first question, we investigate running the ABC-MCMC starting near the posterior mode with different preselected tolerances, both chosen in a preliminary pilot experiment. We first attempted to perform the experiments by initialising the chains from independent samples of the prior distribution, but in this case, most of the chains did not accept a single move during the whole run. In contrast, our experiments with tolerance adaptation do start from initial points drawn from the prior distribution, and both the tolerances and the covariances are adjusted fully automatically by our algorithm. The latter assumes no prior information about the model at all, which is what we aim at.
In our tests of the confidence intervals, we employ a simple 'automatic window' estimator of the integrated autocorrelation, of the form τ̂ = 1 + 2 Σ_{k=1}^{K} ρ̂_k, where ρ̂_k are the lag-k sample autocorrelations, and where K is the smallest positive integer such that the window length exceeds a fixed multiple of the current estimate (Sokal, 1996).
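Such an automatic-window estimator can be sketched in Python as follows (illustrative only; the truncation constant c = 5 is one common choice from Sokal's notes, and the estimate can be negative for strongly anti-correlated chains):

```python
def iact(x, c=5.0):
    # Integrated autocorrelation time with Sokal-style automatic
    # windowing: tau_hat = 1 + 2 * sum_{k<=K} rho_hat_k, stopping at
    # the first lag K with K >= c * tau_hat.
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    if var == 0.0:
        return 1.0          # constant sequence: no autocorrelation
    tau = 1.0
    for k in range(1, n):
        rho = sum((x[i] - mean) * (x[i + k] - mean)
                  for i in range(n - k)) / (n * var)
        tau += 2.0 * rho
        if k >= c * tau:    # automatic windowing criterion
            break
    return tau
```

For a white-noise-like chain the estimate is close to one; for a perfectly anti-correlated alternating sequence it is close to minus one.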
When running the covariance adaptation alone, we employ the covariance adaptation of Andrieu and Moulines (2006) with a polynomially decaying step size, which behaves similarly to the original Adaptive Metropolis algorithm of Haario et al. (2001). When we apply tolerance adaptation, we use a common, slightly slower decaying step size for both the tolerance adaptation and the covariance adaptation. Slower decaying step sizes often behave better with acceptance rate adaptation (cf. Vihola, 2012, Remark 3).
All the experiments are implemented in Julia (Bezanson et al., 2017), and the code is available at https://bitbucket.org/mvihola/abcmcmc.
5.1. One-dimensional Gaussian model
Our first model is a toy model in which the pseudo-observations are of the form y* = θ + ξ, where ξ
is a standard Gaussian random variable. The true posterior without ABC approximation is Gaussian. While this scenario is clearly academic, the prior is far from the posterior, which we believe to be common in practice. It is clear that the ABC posterior
has zero mean for all ε, and also that the distribution is spread wider for bigger ε. We experiment with both the simple cutoff and a Gaussian cutoff. We run the experiments with 10,000 independent chains, each for 11,000 iterations including 1,000 burn-in. The chains were always started from a fixed initial point. Figures 1 and 2 show results of the same experiments with the simple and the Gaussian cutoff, respectively. On the left, a single realisation of the estimates and confidence intervals calculated for all δ is shown for two test functions (above and below). The figures on the right show box plots of the final estimators calculated for each chain, for five equispaced tolerance values between 0.1 and 3.0 (the axis labels indicate these tolerances). The leftmost box plot in each group corresponds to the direct ABC-MCMC targeting that tolerance, and the rightmost box plot corresponds to the post-corrected estimators from the ABC-MCMC with the largest tolerance 3.0, the second from the right with 2.28, and so on; the colour indicates the simulation tolerance. Some post-corrected estimates appear to be slightly more accurate than direct ABC-MCMC, and the results suggest that a moderately inflated simulation tolerance may be a good choice for a given desired tolerance.
Table 1 indicates the frequencies at which the calculated 95% confidence intervals contain the 'ground truth', over the 10,000 independent experiments, as well as the mean acceptance rates. The ground truth for the first test function is known to be zero for all ε, and the overall mean of all the calculated estimates is used as the ground truth for the second. The frequencies appear close to ideal with the post-correction approach, being slightly pessimistic in case of the simple cutoff, as anticipated by the theoretical considerations (see Theorem 7 and the discussion below it).
Cutoff    ε      0.10  0.82  1.55  2.28  3.00   0.10  0.82  1.55  2.28  3.00   Acc. rate
Simple    0.10   0.93  –     –     –     –      0.93  –     –     –     –      0.03
Simple    0.82   0.97  0.95  –     –     –      0.95  0.94  –     –     –      0.22
Simple    1.55   0.97  0.97  0.95  –     –      0.96  0.95  0.95  –     –      0.33
Simple    2.28   0.98  0.97  0.96  0.95  –      0.96  0.96  0.96  0.95  –      0.40
Simple    3.00   0.98  0.98  0.97  0.97  0.95   0.96  0.96  0.96  0.95  0.95   0.43
Gaussian  0.10   0.93  –     –     –     –      0.93  –     –     –     –      0.05
Gaussian  0.82   0.94  0.95  –     –     –      0.92  0.95  –     –     –      0.29
Gaussian  1.55   0.94  0.94  0.95  –     –      0.94  0.94  0.95  –     –      0.38
Gaussian  2.28   0.95  0.95  0.95  0.95  –      0.95  0.95  0.96  0.95  –      0.41
Gaussian  3.00   0.95  0.95  0.95  0.95  0.95   0.95  0.96  0.95  0.95  0.95   0.42
Figure 3 shows the progress of the tolerance adaptation during burn-in, and a histogram of the mean acceptance rates of the chains after burn-in. The lines on the left show the median, and the shaded regions indicate the 50%, 75%, 95% and 99% quantiles. The figures indicate concentration, but suggest that the adaptation has not yet fully converged. This is also indicated by the mean acceptance rates over all realisations. Table 2 shows root mean square errors from the ground truth, for both the fixed-tolerance estimators and the adaptive algorithms. Here, only the adaptive chains with a suitable final tolerance were included (9,997 and 9,996 out of 10,000 chains for the simple and Gaussian cutoffs, respectively).
        Fixed tolerance                      Adapt    Fixed tolerance                      Adapt
        0.1   0.82  1.55  2.28  3.0    (0.64)         0.1   0.82  1.55  2.28  3.0    (0.28)
        9.68  8.99  9.21  9.67  10.36  9.16           7.97  7.12  7.82  8.94  9.93   9.26
        5.54  5.38  5.50  5.85  6.21   5.44           4.47  4.22  4.68  5.26  5.95   5.46
5.2. Lotka-Volterra model
Our second experiment is a Lotka-Volterra model suggested in Boys et al. (2008), and also analysed in the ABC context by Fearnhead and Prangle (2012). The model is a Markov jump process of prey and predator counts, corresponding to a reaction network of prey reproduction, predation and predator death, whose rates are the parameters, which we equip here with a uniform prior. The data is a simulated trajectory from the model up to a fixed terminal time. The ABC is based on the Euclidean distance of a six-dimensional summary statistic, which consists of:

The sample autocorrelation of the first series at lag 10, multiplied by 100.

The 10% and 90% (time-averaged) quantiles of both series.

The number of jumps (or events), divided by 10.
The summary statistics are computed similarly for the observed series.
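For illustration, the summaries might be computed along the following lines in Python (a sketch only: which series enters the autocorrelation and the discrete quantile convention are assumptions, and the paper uses time-averaged quantiles of the continuous-time trajectory rather than the simple empirical quantiles used here):

```python
def lv_summaries(x1, x2, n_jumps):
    # Six-dimensional summary sketch for the Lotka-Volterra data:
    # lag-10 sample autocorrelation of the first series (x100),
    # 10%/90% empirical quantiles of both series, jumps / 10.
    def autocorr(x, lag):
        n = len(x)
        m = sum(x) / n
        var = sum((v - m) ** 2 for v in x) / n
        return sum((x[i] - m) * (x[i + lag] - m)
                   for i in range(n - lag)) / (n * var)

    def quantile(x, q):
        xs = sorted(x)
        return xs[min(len(xs) - 1, int(q * len(xs)))]

    return [100.0 * autocorr(x1, 10),
            quantile(x1, 0.1), quantile(x1, 0.9),
            quantile(x2, 0.1), quantile(x2, 0.9),
            n_jumps / 10.0]
```

The rescalings (×100 and /10) roughly balance the magnitudes of the components before taking the Euclidean distance.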
We first run comparisons similar to those in Section 5.1, but now with only 1,000 independent chains and the simple cutoff. We investigate the effect of post-correction, with 20,000 samples, including 10,000 burn-in, for each chain. The MCMC is run on the log-transformed parameters, and all chains were started near the mode. Figure 4 and Table 3 show comparisons similar to those in Section 5.1. The results suggest that post-corrected ABC does provide slightly more accurate estimators, particularly with smaller tolerances.
ε        100.0  125.0  150.0  175.0  200.0   100.0  125.0  150.0  175.0  200.0   100.0  125.0  150.0  175.0  200.0   Acc. rate
100.0    0.59   –      –      –      –       0.55   –      –      –      –       0.53   –      –      –      –       0.04
125.0    0.97   0.88   –      –      –       0.97   0.88   –      –      –       0.96   0.81   –      –      –       0.11
150.0    0.99   0.97   0.92   –      –       0.99   0.97   0.92   –      –       0.99   0.95   0.88   –      –       0.13
175.0    0.99   0.97   0.96   0.92   –       0.99   0.98   0.96   0.92   –       0.99   0.98   0.97   0.92   –       0.16
200.0    0.98   0.98   0.98   0.96   0.94    0.99   0.99   0.98   0.97   0.92    0.98   0.97   0.96   0.96   0.92    0.18
In addition, we experiment with the tolerance adaptation, likewise using 20,000 samples of which 10,000 are burn-in. Figure 5 shows the progress of the tolerance during burn-in, and a histogram of the realised mean acceptance rates during the estimation phase. The realised acceptance rates are concentrated around the desired rate. Table 4 shows RMSEs of both the fixed-tolerance ABC-MCMC outputs and those with tolerance adaptation. Again, only the adaptive chains with a suitable final tolerance were included (999 out of 1,000 chains).
In this case, the chains run with the tolerance adaptation led to better results than those run with only the covariance adaptation (and a fixed tolerance). This perhaps surprising result may be due to the initial behaviour of the covariance adaptation, which may be unstable when there are many rejections. Different initialisation strategies, for instance following (Haario et al., 2001, Remark 2), might lead to more stable behaviour compared to using the adaptation of Andrieu and Moulines (2006) from the start, as we do. The different step size sequences could also play a rôle. We repeated the experiment for the chains with fixed tolerances, but now with the slower decaying covariance adaptation step size, which led to more stable behaviour of the ABC-MCMC. In any case, also here, the adaptive ABC-MCMC using the tolerance adaptation delivered slightly better results (see the supplementary results in Appendix E).
        Fixed tolerance                       Adapt
        100.0  125.0  150.0  175.0  200.0    (119.1)
        5.07   1.39   1.13   1.31   1.74      0.79
        3.15   0.85   0.69   0.74   1.02      0.54
        2.94   1.09   0.87   0.85   1.39      0.51
6. Discussion
We believe our approach consisting of ABC-MCMC with post-processing is a useful addition, and complements some earlier and related work. As previously mentioned, trimming of ABC-MCMC output to finer tolerances has been considered earlier (e.g. Wegmann et al., 2009). Our experimental results suggest that this can indeed be beneficial, and our confidence intervals may make the approach more appealing in practice.
Another related approach, by Bortot et al. (2007), makes the tolerance an auxiliary variable with a user-specified prior, and ABC-MCMC is run targeting the joint posterior of the parameter and the tolerance. While this approach avoids tolerance selection, we believe that our approach, where the effect of the tolerance can be investigated explicitly, can be helpful in interpreting the ABC posterior. In fact, Bortot et al. (2007) also provide a tolerance-dependent analysis, but we believe that our estimators, with associated confidence intervals, have a more immediate interpretation.
Automatic selection of the tolerance in ABC-MCMC has been considered earlier by Ratmann et al. (2007), who propose an algorithm based on tempering and a cooling schedule. It has been remarked by Sisson and Fan (2018) that acceptance-rate-based adaptation could be used to deal with the choice of a suitable tolerance. Based on our experiments, the adaptive ABC-MCMC we present in this paper appears to perform well in practice, and provides reliable results with post-correction. The tolerance adaptation also seems to benefit the covariance adaptation in the early phases. For the adaptive ABC-MCMC to work efficiently, the MCMC chains must be run relatively long, rendering the approach difficult for computationally demanding models. However, we believe that our approach using adaptive ABC-MCMC provides a straightforward way to do inference with ABC models.
Our estimators, and their uncertainty estimates, could also turn out to be useful in the regression adjustment context (Wegmann et al., 2009; Beaumont et al., 2002; Blum, 2010). We did not consider such adjustments, but note that the approximate normality and the confidence bounds may be used to derive an appropriately weighted estimator that reflects the uncertainty of the estimators.
We conclude with a brief discussion of certain extensions of the suggested post-correction method. The first extension is based on 'recycling' the rejected samples in the estimator (Ceperley et al., 1977). This may improve the accuracy, but can also reduce accuracy in certain pathological cases; see Delmas and Jourdain (2009). Whenever the original estimator is consistent under Theorem 5(i), the 'waste recycling' estimator is also consistent. Namely, as in the proof of Theorem 5 (in Appendix A), the chain of proposed and accepted states is a Harris recurrent Markov chain with a suitable invariant distribution, and the waste recycling average is therefore a strongly consistent estimator.
See Rudolf and Sprungk (2018) and Schuster and Klebanov (2018) for alternative waste recycling estimators based on importance sampling analogues.
Another extension, which could be considered, is enhancing the accuracy of the estimator for smaller values of δ by performing further simulations from the model (which may be calculated in parallel for different samples). Namely, a new estimator may be formed by simulating, for each stored sample, additional independent pseudo-data sets until a positive kernel value is observed, and recording the number of simulations required. This ensures an unbiasedness property which is sufficient for the resulting weights to form a proper weighting scheme from π_ε to π_δ; see (Vihola et al., 2016, Proposition 17(ii)), and consequently the average is a proper weighting.
The latter extension, which involves additional simulations as post-processing, is similar to 'lazy ABC', which incorporates a randomised stopping rule for simulation (Prangle, 2015, 2016), and to unbiased 'exact' ABC (Tran and Kohn, 2015), which may lead to estimators entirely free of bias, using the debiasing approach recently investigated in Rhee and Glynn (2015) and McLeish (2011).
7. Acknowledgements
This work was supported by the Academy of Finland (grants 274740, 284513 and 312605). The authors wish to acknowledge CSC, IT Center for Science, Finland, for computational resources. The authors wish to thank Christophe Andrieu for useful discussions.
References
 Andrieu and Moulines (2006) C. Andrieu and É. Moulines. On the ergodicity properties of some adaptive MCMC algorithms. Ann. Appl. Probab., 16(3):1462–1505, 2006.
 Andrieu and Thoms (2008) C. Andrieu and J. Thoms. A tutorial on adaptive MCMC. Statist. Comput., 18(4):343–373, Dec. 2008.
 Andrieu et al. (2005) C. Andrieu, É. Moulines, and P. Priouret. Stability of stochastic approximation under verifiable conditions. SIAM J. Control Optim., 44(1):283–312, 2005.
 Beaumont et al. (2002) M