1 Algorithms
Approximate Bayesian computation (ABC) approximates Bayesian inference on parameters $\theta$ with prior $\pi(\theta)$ given data $y_{obs}$. It must be possible to simulate data from the model of interest given $\theta$. This implicitly defines a likelihood function $\pi(y \mid \theta)$: the density of $y$ conditional on $\theta$. A standard importance sampling version of ABC samples parameters $\theta_1, \ldots, \theta_N$ from an importance density $g(\theta)$ and simulates corresponding datasets $y_1, \ldots, y_N$. Weights $w_i$ are calculated by equation (1) below. Then for a generic function $h(\theta)$, an estimate of its posterior expectation $E[h(\theta) \mid y_{obs}]$ is $\hat{E}[h(\theta)] = \sum_{i=1}^N w_i h(\theta_i) / \sum_{i=1}^N w_i$. An estimate of the normalising constant $Z = \pi(y_{obs})$, used in Bayesian model choice, is $\hat{Z} = N^{-1} \sum_{i=1}^N w_i$. Under the ideal choice of weights, $w_i = \pi(y_{obs} \mid \theta_i) \pi(\theta_i) / g(\theta_i)$, these estimates converge (almost surely) to the correct quantities as $N \to \infty$ [2]. In applications where $\pi(y_{obs} \mid \theta)$ cannot be evaluated, ABC makes inference possible with the tradeoff that it gives approximate results; that is, the estimators converge to approximations of the desired values.

The ABC importance sampling weights avoid evaluating $\pi(y_{obs} \mid \theta)$ by using:
(1)  $w_i = \hat{w}_i \, \pi(\theta_i) / g(\theta_i)$
(2)  $\hat{w}_i = K_\varepsilon[ d( s(y_i), s(y_{obs}) ) ]$
Here:

- $\hat{w}_i$ acts as an estimate (up to proportionality) of $\pi(y_{obs} \mid \theta_i)$. This and $w_i$ are random variables, since they depend on $y_i$, a random draw from the model conditional on $\theta_i$.
- $s(\cdot)$ maps a dataset to a lower dimensional vector of summary statistics.
- $d(\cdot, \cdot)$ maps two summary statistic vectors to a nonnegative value. This defines the distance between the two vectors.
- $K_\varepsilon(\cdot)$, the ABC kernel, maps a nonnegative value to another. A typical choice is a uniform kernel, $K_\varepsilon(u) = \mathbb{1}(u \leq \varepsilon)$, which makes an accept/reject decision. Another choice is a normal kernel, $K_\varepsilon(u) = \exp(-u^2 / 2\varepsilon^2)$.
- $\varepsilon$ is a tuning parameter, the bandwidth. It controls how close a match of $s(y_i)$ and $s(y_{obs})$ is required to produce a significant weight.
The interplay between these tuning choices has been the subject of considerable research but is not considered further here. For further information on this and all aspects of ABC see the review papers [3, 4].
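To fix ideas, here is a minimal sketch of ABC importance sampling with a normal kernel. The model (a normal mean), prior, importance density, summary statistic and all numerical settings are illustrative assumptions, not choices from the paper:

import numpy as np

rng = np.random.default_rng(0)

def simulate(theta, n=50):
    # Illustrative model: n iid N(theta, 1) observations
    return rng.normal(theta, 1.0, size=n)

def summary(y):
    # s(.): reduce the dataset to a single summary statistic
    return y.mean()

def kernel(dist, eps):
    # Normal ABC kernel K_eps, evaluated at a distance
    return np.exp(-dist**2 / (2 * eps**2))

y_obs = simulate(1.5)                    # pretend this is the observed data
s_obs = summary(y_obs)

N, eps = 10_000, 0.1
theta = rng.normal(0.0, 5.0, size=N)     # importance density g, here equal to the prior
dist = np.array([abs(summary(simulate(t)) - s_obs) for t in theta])

w = kernel(dist, eps)                    # equations (1)-(2); pi(theta)/g(theta) = 1 here

E_hat = np.sum(w * theta) / np.sum(w)    # posterior expectation estimate for h(theta) = theta
Z_hat = w.mean()                         # normalising constant estimate
print(E_hat, Z_hat)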
Lazy ABC splits the simulation of data into two stages. First the output $x$ of some initial simulation stage is simulated conditional on $\theta$; then, sometimes, a full dataset $y$ is simulated conditional on $\theta$ and $x$. The latter is referred to as the continuation simulation stage. The variable $x$ should encapsulate all the information required to resume the simulation, so it may be high dimensional. There is considerable freedom in what the initial simulation stage is: it may conclude after a prespecified set of operations, or after some random event is observed. Another tuning choice is introduced, the continuation probability function $\alpha(\theta, x)$. This outputs a value in $[0, 1]$ which is the probability of continuing to the continuation simulation stage. The desired behaviour in choosing the initial simulation stage and $\alpha$ is that simulating $x$ is computationally cheap but can be used to save time by assigning small continuation probabilities to many unpromising simulations.

Given all the above notation, lazy ABC is Algorithm 1. To avoid division by zero in step 5, it will be required that $\alpha(\theta, x) > 0$, although this condition can be weakened [1].
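Informally, an iteration of the algorithm inserts the early stopping decision between the two simulation stages. The following is a minimal Python sketch of how a single iteration can be structured; the model-specific functions passed in (simulate_initial, simulate_continuation and so on) are placeholders to be supplied by the user, and the step comments follow the step numbering referred to in the text:

import numpy as np

rng = np.random.default_rng(1)

def lazy_abc_iteration(g_sample, g_density, prior_density,
                       simulate_initial, alpha, simulate_continuation,
                       summary, distance, kernel, s_obs):
    theta = g_sample()                       # step 1: draw parameter from g
    x = simulate_initial(theta)              # step 2: initial simulation stage
    a = alpha(theta, x)                      # continuation probability, must be > 0
    if rng.uniform() >= a:                   # step 3: early stopping
        return theta, 0.0                    # stopped iterations get weight zero
    y = simulate_continuation(theta, x)      # step 4: continuation simulation stage
    w = kernel(distance(summary(y), s_obs)) * prior_density(theta) / (g_density(theta) * a)
    return theta, w                          # steps 5-6: compute weight and record

Dividing by $\alpha(\theta, x)$ in the weight is what compensates, on average, for the iterations stopped at step 3; this is also why $\alpha > 0$ is required.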
Lazy ABC has the same target as standard ABC importance sampling, in the sense that the Monte Carlo estimates $\hat{E}[h(\theta)]$ and $\hat{Z}$ converge to the same values as $N \to \infty$. This is proved by Theorem 1 and related discussion in [1]. A sketch of the argument is as follows. Standard ABC is essentially an importance sampling algorithm: each iteration samples a parameter value $\theta$ from $g(\theta)$ and assigns it a random weight given by (1). The randomness is due to the random simulation of data $y$. The expectation of this weight conditional on $\theta$ is

$E(w \mid \theta) = \frac{\pi(\theta)}{g(\theta)} \, E\{ K_\varepsilon[ d( s(y), s(y_{obs}) ) ] \mid \theta \},$

where the expectation is taken over values of $y$.
Lazy ABC acts similarly but uses different random weights:

(3)  $w = \begin{cases} \dfrac{\pi(\theta)}{g(\theta)} \, K_\varepsilon[ d( s(y), s(y_{obs}) ) ] \, / \, \alpha(\theta, x) & \text{with probability } \alpha(\theta, x), \\ 0 & \text{otherwise.} \end{cases}$
The randomness here is due to the simulation of $x$ and $y$, and to the continuation decision. Taking expectations gives:

$E(w \mid \theta) = E\left[ \alpha(\theta, x) \cdot \frac{\pi(\theta)}{g(\theta) \, \alpha(\theta, x)} \, K_\varepsilon[ d( s(y), s(y_{obs}) ) ] \,\Big|\, \theta \right] = \frac{\pi(\theta)}{g(\theta)} \, E\{ K_\varepsilon[ d( s(y), s(y_{obs}) ) ] \mid \theta \},$

the same conditional expectation as under standard ABC. From the theory of importance sampling algorithms with random weights (see [1]) this ensures that both algorithms target the same distribution.
This argument shows that lazy ABC targets the same approximate posterior expectation and normalising constant as standard ABC, for any choice of initial simulation stage and $\alpha$. However, for poor choices of these tuning decisions it may converge very slowly. The next section considers effective tuning.
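The equality of expected weights can be checked numerically. In the following toy simulation (all distributions and the choice of $\alpha$ are illustrative assumptions), $\theta$ is fixed, the mean of the first half of the data plays the role of $x$, and the average standard and lazy weights are compared:

import numpy as np

rng = np.random.default_rng(2)
theta, eps, s_obs, reps = 1.0, 0.5, 1.2, 200_000

def kernel(d):
    return np.exp(-d**2 / (2 * eps**2))

# Full data: 50 iid N(theta, 1) points; x summarises the first half.
x = rng.normal(theta, 1.0, size=(reps, 25)).mean(axis=1)   # initial stage output
y = rng.normal(theta, 1.0, size=(reps, 25)).mean(axis=1)   # continuation output
s = (x + y) / 2                                             # summary of the full dataset
w_standard = kernel(np.abs(s - s_obs))                      # weight (1), prior/g ratio omitted

a = np.clip(np.exp(-np.abs(x - s_obs)), 0.05, 1.0)          # an arbitrary alpha(theta, x) > 0
cont = rng.uniform(size=reps) < a                           # continuation decisions
w_lazy = np.where(cont, w_standard / a, 0.0)                # weight (3)

print(w_standard.mean(), w_lazy.mean())                     # agree up to Monte Carlo error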
2 Lazy ABC tuning
The quality of lazy ABC tuning can be judged by an appropriate measure of efficiency. Here this is defined as effective sample size (ESS) divided by computing time. The ESS for a sample with weights $w_1, \ldots, w_N$ is

$N_{ESS} = \left( \sum_{i=1}^N w_i \right)^2 \Big/ \sum_{i=1}^N w_i^2.$

It can be shown [2] that for large $N$ the variance of $\hat{E}[h(\theta)]$ typically equals that of $N_{ESS}$ independent samples. Computing time is taken to be the sum of CPU time over each core used (as the lazy ABC iterations can easily be performed in parallel). Theorem 2 of [1] gives the following result on the choice of $\alpha$ which maximises the efficiency of lazy ABC in the asymptotic case of large $N$. For now let $\phi$ represent $(\theta, x)$. Then the optimal choice of $\alpha$ is of the following form:
(4)  $\alpha(\phi) = \min\left[ 1, \lambda \left( \gamma(\phi) / \bar{T}_2(\phi) \right)^{1/2} \right]$
(5)  $\gamma(\phi) = E( \bar{w}^2 \mid \phi )$
Here $\gamma(\phi)$ is the expectation given $\phi$ of $\bar{w}^2$, the squared weight which would be achieved under standard ABC importance sampling; $\bar{T}_2(\phi)$ is the expected time for steps 4–6 of Algorithm 1 given $\phi$; and $\lambda$ is a tuning parameter that controls the relative importance of maximising the ESS (maximised by $\alpha \equiv 1$) and minimising computation time (minimised by $\alpha \equiv 0$).
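Once estimates of $\gamma(\phi)$ and $\bar{T}_2(\phi)$ are available (step 2 of the tuning procedure described next), applying (4)-(5) is straightforward. A minimal sketch, in which gamma_hat and t2_hat stand for fitted regression estimates of these two quantities (the names are illustrative):

import numpy as np

def optimal_alpha(phi, gamma_hat, t2_hat, lam):
    # Continuation probability of the form (4)-(5):
    # gamma_hat(phi) estimates E(w_bar^2 | phi), t2_hat(phi) the expected
    # continuation time, and lam is the tuning parameter lambda.
    return np.minimum(1.0, lam * np.sqrt(gamma_hat(phi) / t2_hat(phi)))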
A natural approach to tuning $\alpha$ in practice is as follows. The remainder of the section discusses these steps in more detail.

1. Run the algorithm without early stopping (i.e. with $\alpha \equiv 1$) for a small number of iterations, recording the results as training data.
2. Estimate $\gamma(\phi)$ and $\bar{T}_2(\phi)$ from the training data.
3. Choose $\lambda$ to maximise an efficiency estimate based on the training data.
4. Decide amongst various choices of initial simulation stage (and $\phi$, see below) by maximising estimated efficiency. By collecting appropriate data for these choices in step 1 it is not necessary to repeat it.
Step 2 is a regression problem, but is not feasible for $\phi = (\theta, x)$ as this will typically be very high dimensional. Instead $\alpha$ can be based on low dimensional features of $(\theta, x)$, referred to as decision statistics. That is, only functions of the form $\alpha(\theta, x) = a(\phi)$ with $\phi = \phi(\theta, x)$ are considered, where $\phi(\cdot)$ outputs a vector of decision statistics. The optimal such $\alpha$ is again given by (4) and (5). The choice of which decision statistics to use can be included in step 4 above.
Estimating $\gamma(\phi)$ by regression is also challenging if there are regions of $\phi$ space for which most of the responses are zero. This is typically the case for a uniform $K_\varepsilon$. In [1] various tuning methods were proposed for uniform kernels, but these are complicated and rely on strong assumptions. A simpler alternative, used here, is a normal $K_\varepsilon$, as it has full support.
Local regression techniques [5] are suggested for step 2. This is because the behaviour of the responses typically varies considerably for different $\phi$ values, motivating fitting separate local regressions. Firstly, the typical magnitude of $\bar{w}^2$ varies over widely different scales. Secondly, for both regressions the distribution of the residuals may also vary with $\phi$. To ensure positive predictions, the use of degree zero local regression is suggested, i.e. a Nadaraya–Watson kernel estimator.
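A minimal Nadaraya–Watson estimator of this kind, with a Gaussian kernel and a scalar decision statistic, can be written as follows (the variable names and usage comment are illustrative):

import numpy as np

def nadaraya_watson(phi_train, response, bandwidth):
    # Degree zero local regression: predictions are locally weighted means,
    # so they stay positive whenever the training responses are positive.
    phi_train = np.asarray(phi_train, dtype=float)
    response = np.asarray(response, dtype=float)

    def predict(phi):
        k = np.exp(-0.5 * ((phi - phi_train) / bandwidth) ** 2)  # Gaussian kernel
        return np.sum(k * response) / np.sum(k)

    return predict

# e.g. gamma_hat = nadaraya_watson(phi_train, w_bar**2, bandwidth=0.5)
#      t2_hat    = nadaraya_watson(phi_train, t2_times, bandwidth=0.5)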
The efficiency estimate required in steps 3 and 4 can be formed from the training data and a proposed choice of $\alpha$. Let $\phi_i$ be the decision statistics for the $M$ training iterations, $\bar{w}_i$ the corresponding standard ABC weights, and $T_1^{(i)}$, $T_2^{(i)}$ the times of the initial and continuation simulation stages. The realised efficiency of the training data is not used since it is based on a small sample size. Instead the asymptotic efficiency is estimated. Under weak assumptions (see [1]) this is $E(w)^2 / [E(w^2) E(T)]$, where the random variable $T$ is the CPU time for a single iteration of lazy ABC. Note that $E(w)$ is constant (the ABC approximation of the normalising constant $Z$) under any tuning choices, so it is omitted. This leaves an estimate up to proportionality of $[E(w^2) E(T)]^{-1}$, which can be used to calculate efficiency relative to standard ABC (found by setting $\alpha \equiv 1$). An estimate of $E(T)$ is $M^{-1} \sum_{i=1}^M [ T_1^{(i)} + \alpha(\phi_i) T_2^{(i)} ]$. Using (3), an estimate of $E(w^2)$ is $M^{-1} \sum_{i=1}^M \bar{w}_i^2 / \alpha(\phi_i)$.
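These estimates make it straightforward to score a candidate $\alpha$ (and hence a candidate $\lambda$) on the training data. A sketch, where w_bar, t1, t2 and alpha_vals hold $\bar{w}_i$, $T_1^{(i)}$, $T_2^{(i)}$ and $\alpha(\phi_i)$ for the training iterations (the names are illustrative):

import numpy as np

def relative_efficiency(w_bar, t1, t2, alpha_vals):
    # Efficiency is proportional to 1 / (E(w^2) E(T)); E(w) is constant and
    # cancels in the ratio to standard ABC (alpha identically 1).
    w_bar, t1, t2, alpha_vals = map(np.asarray, (w_bar, t1, t2, alpha_vals))
    Ew2_lazy = np.mean(w_bar**2 / alpha_vals)     # estimate of E(w^2) under (3)
    ET_lazy = np.mean(t1 + alpha_vals * t2)       # estimate of E(T)
    Ew2_std = np.mean(w_bar**2)                   # standard ABC: alpha = 1
    ET_std = np.mean(t1 + t2)
    return (Ew2_std * ET_std) / (Ew2_lazy * ET_lazy)

# lambda can then be chosen by maximising this over a grid, e.g.
# best_lam = max(lam_grid, key=lambda lam: relative_efficiency(
#     w_bar, t1, t2, optimal_alpha(phi_train, gamma_hat, t2_hat, lam)))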
3 Example
As an example the spatial extremes application of [6] is used. This application and the implementation of lazy ABC are described in full in [1]. A short sketch is that the model of interest has two parameters, $\theta = (\theta_1, \theta_2)$. Given these, data can be generated for each of a number of years and spatial locations, representing annual extreme measurements, e.g. of rainfall or temperature. An ABC approach has been proposed [6], including choices of $s(\cdot)$ and $d(\cdot, \cdot)$. Also, given data for a subset of locations, an estimate of the ABC distance can be formed.
Simulation of data is hard to interrupt and later resume. However, the most expensive part of the process is calculating the summary statistics, which involves computing certain coefficients for every triple of locations. Therefore the initial simulation stage of lazy ABC is to simulate all the data and calculate an estimated distance based on a subset of locations; this estimated distance is used as the decision statistic $\phi$. The continuation stage is to calculate the coefficients for the remaining triples and return the realised distance.
Tuning of lazy ABC was performed as described in Section 2, using backwards selection in step 4 to find an appropriate subset of locations to use for $\phi$. To fit the regressions estimating $\gamma(\phi)$ and $\bar{T}_2(\phi)$, a Nadaraya–Watson kernel estimator was used with a Gaussian kernel and bandwidth 0.5, chosen manually.
Repeating the example of [1], 6 simulated data sets were analysed using standard and lazy ABC. Each analysis used the same total number of simulations, a subset of which were used as training data in lazy ABC. The results are shown in Table 1. The efficiency improvements of lazy ABC relative to standard ABC are of similar magnitudes to those in [1] but are less close to the values estimated in step 3 of tuning.
4 Conclusion
This paper has reviewed lazy ABC [1], a method to speed up ABC without introducing further approximations to the target distribution. Unlike [1], non-uniform ABC kernels have been considered. This allows a simpler approach to tuning, which provides a comparable three-fold efficiency increase in a spatial extremes example.
Several extensions of lazy ABC are described in [1]: multiple stopping decisions, choosing $\varepsilon$ after running the algorithm, and a similar scheme for likelihood-based inference. Other potential extensions include using the lazy ABC approach within ABC versions of MCMC or SMC algorithms, or focusing on model choice.
References
 [1] Dennis Prangle. Lazy ABC. Statistics and Computing, 2015. To appear (available at http://arxiv.org/abs/1405.7867).
 [2] Jun S. Liu. Metropolized independent sampling with comparisons to rejection sampling and importance sampling. Statistics and Computing, 6(2):113–119, 1996.
 [3] Mark A. Beaumont. Approximate Bayesian computation in evolution and ecology. Annual Review of Ecology, Evolution and Systematics, 41:379–406, 2010.
 [4] Jean-Michel Marin, Pierre Pudlo, Christian P. Robert, and Robin J. Ryder. Approximate Bayesian computational methods. Statistics and Computing, 22(6):1167–1180, 2012.
 [5] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The elements of statistical learning. Springer, 2009.
 [6] Robert J. Erhardt and Richard L. Smith. Approximate Bayesian computing for spatial extremes. Computational Statistics & Data Analysis, 56(6):1468–1481, 2012.