Lazier ABC

01/21/2015
by Dennis Prangle
University of Reading

ABC algorithms involve a large number of simulations from the model of interest, which can be very computationally costly. This paper summarises the lazy ABC algorithm of Prangle (2015), which reduces the computational demand by abandoning many unpromising simulations before completion. By using a random stopping decision and reweighting the output sample appropriately, the target distribution is the same as for standard ABC. Lazy ABC is also extended here to the case of non-uniform ABC kernels, which is shown to simplify the process of tuning the algorithm effectively.


1 Algorithms

Approximate Bayesian computation (ABC) approximates Bayesian inference on parameters $\theta$ with prior $\pi(\theta)$ given data $y_{\mathrm{obs}}$. It must be possible to simulate data from the model of interest given $\theta$. This implicitly defines a likelihood function $\pi(y \mid \theta)$: the density of $y$ conditional on $\theta$.

A standard importance sampling version of ABC samples parameters $\theta_1, \ldots, \theta_N$ from an importance density $g(\theta)$ and simulates corresponding datasets $y_1, \ldots, y_N$. Weights $w_1, \ldots, w_N$ are calculated by equation (1) below. Then for a generic function $h(\theta)$, an estimate of its posterior expectation $E[h(\theta) \mid y_{\mathrm{obs}}]$ is $\hat{E}(h) = \sum_{i=1}^N w_i h(\theta_i) / \sum_{i=1}^N w_i$. An estimate of the normalising constant $Z = \pi(y_{\mathrm{obs}})$, used in Bayesian model choice, is $\hat{Z} = N^{-1} \sum_{i=1}^N w_i$. Under the ideal choice of weights, $w_i = \pi(\theta_i) \pi(y_{\mathrm{obs}} \mid \theta_i) / g(\theta_i)$, these estimates converge (almost surely) to the correct quantities as $N \to \infty$ (Liu, 1996). In applications where $\pi(y_{\mathrm{obs}} \mid \theta)$ cannot be evaluated, ABC makes inference possible with the trade-off that it gives approximate results. That is, the estimators converge to approximations of the desired values.

The ABC importance sampling weights avoid evaluating $\pi(y_{\mathrm{obs}} \mid \theta)$ by using:

$$w_i = \frac{\pi(\theta_i)}{g(\theta_i)} \hat{L}_{\mathrm{ABC}}(\theta_i), \tag{1}$$

$$\hat{L}_{\mathrm{ABC}}(\theta_i) = K_\epsilon[d(s(y_i), s(y_{\mathrm{obs}}))]. \tag{2}$$

Here:

  • $\hat{L}_{\mathrm{ABC}}(\theta_i)$ acts as an estimate (up to proportionality) of the likelihood $\pi(y_{\mathrm{obs}} \mid \theta_i)$. This and $w_i$ are random variables since they depend on $y_i$, a random draw from the model conditional on $\theta_i$.

  • $s(\cdot)$ maps a dataset to a lower dimensional vector of summary statistics.

  • $d(\cdot, \cdot)$ maps two summary statistic vectors to a non-negative value. This defines the distance between two vectors.

  • $K_\epsilon(\cdot)$, the ABC kernel, maps a non-negative value to another. A typical choice is a uniform kernel $K_\epsilon(x) = \mathbb{1}(x \le \epsilon)$, which makes an accept/reject decision. Another choice is a normal kernel $K_\epsilon(x) = \exp(-x^2 / 2\epsilon^2)$.

  • $\epsilon$ is a tuning parameter, the bandwidth. It controls how close a match of $s(y_i)$ and $s(y_{\mathrm{obs}})$ is required to produce a significant weight.

The interplay between these tuning choices has been the subject of considerable research but is not considered further here. For further information on this and all other aspects of ABC see the review papers of Beaumont (2010) and Marin et al. (2012).
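To make the notation concrete, here is a minimal Python sketch of ABC importance sampling via equations (1) and (2). The model (a normal location parameter), the prior, importance density, summary statistic and kernel are all illustrative placeholders, not choices from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy model: theta is a normal location parameter.
def prior_pdf(theta):              # pi(theta): N(0, 5^2) prior density
    return np.exp(-0.5 * (theta / 5.0) ** 2) / (5.0 * np.sqrt(2 * np.pi))

def importance_sample(n):          # g(theta): here taken equal to the prior
    return rng.normal(0.0, 5.0, size=n)

def summary(y):                    # s(y): sample mean
    return np.mean(y)

def kernel(dist, eps):             # normal ABC kernel K_eps
    return np.exp(-dist ** 2 / (2 * eps ** 2))

y_obs = rng.normal(2.0, 1.0, size=50)        # "observed" data simulated from theta = 2
s_obs, eps, N = summary(y_obs), 0.1, 10_000

theta = importance_sample(N)
# Equation (2): L_ABC(theta_i) = K_eps(d(s(y_i), s(y_obs))), with d the absolute difference
l_abc = np.array([kernel(abs(summary(rng.normal(t, 1.0, size=50)) - s_obs), eps) for t in theta])
# Equation (1): w_i = pi(theta_i) / g(theta_i) * L_ABC(theta_i); the ratio is 1 here since g = prior
w = (prior_pdf(theta) / prior_pdf(theta)) * l_abc

print("posterior mean estimate:", np.sum(w * theta) / np.sum(w))   # estimate of E[theta | y_obs]
print("normalising constant estimate:", np.mean(w))                # Z hat
```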

Lazy ABC splits simulation of data into two stages. First the output $x$ of some initial simulation stage is simulated conditional on $\theta$; then, sometimes, a full dataset $y$ is simulated conditional on $\theta$ and $x$. The latter is referred to as the continuation simulation stage. The variable $x$ should encapsulate all the information required to resume the simulation, so it may be high dimensional. There is considerable freedom in what the initial simulation stage is. It may conclude after a prespecified set of operations, or after some random event is observed. Another tuning choice is introduced, the continuation probability function $\alpha(\theta, x)$. This outputs a value in $[0, 1]$ which is the probability of continuing to the continuation simulation stage. The desired behaviour in choosing the initial simulation stage and $\alpha$ is that simulating $x$ is computationally cheap but can be used to save time by assigning small continuation probabilities to many unpromising simulations.

Given all the above notation, lazy ABC is Algorithm 1. To avoid division by zero in step 5, it will be required that $\alpha(\theta, x) > 0$, although this condition can be weakened (Prangle, 2015).

 
Algorithm:
Perform the following steps for $i = 1, \ldots, N$:

  1. Simulate $\theta_i$ from $g(\theta)$.

  2. Simulate $x_i$ conditional on $\theta_i$ and set $a_i = \alpha(\theta_i, x_i)$.

  3. With probability $a_i$ continue to step 4. Otherwise perform early stopping: let $l_i = 0$ and go to step 6.

  4. Simulate $y_i$ conditional on $\theta_i$ and $x_i$.

  5. Set $l_i = K_\epsilon[d(s(y_i), s(y_{\mathrm{obs}}))] / a_i$.

  6. Set $w_i = l_i \, \pi(\theta_i) / g(\theta_i)$.

Output:
A set of $(\theta_i, w_i)$ pairs of values.

Algorithm 1: Lazy ABC
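The following Python sketch implements a run of Algorithm 1 for the same hypothetical toy model as above. The split into a cheap pilot simulation ($x$) and an expensive continuation, and the particular form of $\alpha$, are illustrative assumptions rather than the spatial extremes implementation discussed later.

```python
import numpy as np

rng = np.random.default_rng(1)

def kernel(dist, eps):                         # normal ABC kernel
    return np.exp(-dist ** 2 / (2 * eps ** 2))

def initial_stage(theta, n_pilot=5):
    """Cheap initial simulation: a small pilot portion x of the dataset."""
    return rng.normal(theta, 1.0, size=n_pilot)

def continuation_stage(theta, x, n_total=500):
    """Expensive continuation: simulate the remaining observations given theta and x."""
    return np.concatenate([x, rng.normal(theta, 1.0, size=n_total - len(x))])

def alpha(theta, x, s_obs):
    """Illustrative continuation probability: keep simulations whose pilot mean is
    already close to the observed summary, bounded away from zero."""
    return float(np.clip(np.exp(-abs(np.mean(x) - s_obs)), 0.05, 1.0))

y_obs = rng.normal(2.0, 1.0, size=500)
s_obs, eps, N = np.mean(y_obs), 0.05, 5_000

theta_out, w_out = [], []
for _ in range(N):
    theta = rng.normal(0.0, 5.0)                       # step 1: draw from g = prior
    x = initial_stage(theta)                           # step 2: initial simulation stage
    a = alpha(theta, x, s_obs)
    if rng.uniform() < a:                              # step 3: continue with probability a
        y = continuation_stage(theta, x)               # step 4: continuation simulation stage
        l = kernel(abs(np.mean(y) - s_obs), eps) / a   # step 5: kernel value divided by a
    else:
        l = 0.0                                        # early stopping
    theta_out.append(theta)
    w_out.append(l * 1.0)                              # step 6: times pi(theta)/g(theta) = 1 here

theta_out, w_out = np.array(theta_out), np.array(w_out)
print("posterior mean estimate:", np.sum(w_out * theta_out) / np.sum(w_out))
```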

Lazy ABC has the same target as standard ABC importance sampling, in the sense that the Monte Carlo estimates $\hat{E}(h)$ and $\hat{Z}$ converge to the same values as $N \to \infty$. This is proved by Theorem 1 and related discussion in Prangle (2015). A sketch of the argument is as follows. Standard ABC is essentially an importance sampling algorithm: each iteration samples a parameter value $\theta$ from $g(\theta)$ and assigns it a random weight given by (1). The randomness is due to the random simulation of data $y$. The expectation of this weight conditional on $\theta$ is

$$\frac{\pi(\theta)}{g(\theta)} \, E\big\{K_\epsilon[d(s(y), s(y_{\mathrm{obs}}))] \mid \theta\big\},$$

where expectation is taken over values of $y$.

Lazy ABC acts similarly but uses different random weights:

$$w = \begin{cases} \dfrac{\pi(\theta)}{g(\theta)} \, \dfrac{K_\epsilon[d(s(y), s(y_{\mathrm{obs}}))]}{\alpha(\theta, x)} & \text{with probability } \alpha(\theta, x), \\ 0 & \text{otherwise.} \end{cases} \tag{3}$$

The randomness here is due to simulation of $x$ and $y$. Taking expectations conditional on $\theta$ gives

$$\frac{\pi(\theta)}{g(\theta)} \, E\big\{K_\epsilon[d(s(y), s(y_{\mathrm{obs}}))] \mid \theta\big\},$$

the same as for standard ABC. From the theory of importance sampling algorithms with random weights (see Prangle, 2015) this ensures that both algorithms target the same distribution.

This argument shows that lazy ABC targets the same limiting values of $\hat{E}(h)$ and $\hat{Z}$ as standard ABC, for any choice of initial simulation stage and $\alpha$. However, for poor choices of these tuning decisions it may converge very slowly. The next section considers effective tuning.

2 Lazy ABC tuning

The quality of lazy ABC tuning can be judged by an appropriate measure of efficiency. Here this is defined as effective sample size (ESS) divided by computing time. The ESS for a sample with weights $w_1, \ldots, w_N$ is

$$\mathrm{ESS} = \Big(\sum_{i=1}^N w_i\Big)^2 \Big/ \sum_{i=1}^N w_i^2.$$

It can be shown (Liu, 1996) that for large $N$ the variance of $\hat{E}(h)$ typically equals that of $\mathrm{ESS}$ independent samples. Computing time is taken to be the sum of CPU time over each core used (as the lazy ABC iterations can easily be performed in parallel).
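A small sketch of this efficiency measure (the weight and timing values below are placeholders only):

```python
import numpy as np

def ess(w):
    """Effective sample size: (sum of weights)^2 / sum of squared weights."""
    w = np.asarray(w, dtype=float)
    return w.sum() ** 2 / np.sum(w ** 2)

def efficiency(w, cpu_times):
    """Efficiency = ESS divided by total CPU time, summed over all cores used."""
    return ess(w) / np.sum(cpu_times)

# Illustrative values only
weights = np.array([0.0, 0.2, 1.0, 0.1, 0.7])
times = np.array([0.4, 0.5, 2.1, 0.6, 1.9])       # CPU seconds per iteration
print(ess(weights), efficiency(weights, times))
```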

Theorem 2 of Prangle (2015) gives the following result on the choice of $\alpha$ which maximises the efficiency of lazy ABC in the asymptotic case of large $N$. For now let $\phi$ represent $(\theta, x)$. Then the optimal choice of $\alpha$ is of the following form:

$$\alpha(\phi) = \min\big(1, \lambda h(\phi)\big), \tag{4}$$

$$h(\phi) = \big[\gamma(\phi) / \bar{T}_2(\phi)\big]^{1/2}. \tag{5}$$

Here $\gamma(\phi)$ is the expectation given $\phi$ of the squared weight which would be achieved under standard ABC importance sampling; $\bar{T}_2(\phi)$ is the expected time for steps 4-6 of Algorithm 1 given $\phi$; $\lambda > 0$ is a tuning parameter that controls the relative importance of maximising ESS (maximised by $\alpha \equiv 1$) and minimising computation time (minimised by $\alpha \equiv 0$).
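In code, equations (4) and (5) amount to the following. In the sketch below, gamma_hat and t2_hat stand for estimates of $\gamma(\phi)$ and $\bar{T}_2(\phi)$ (how to obtain them is discussed next), and the numerical values are illustrative only.

```python
import numpy as np

def continuation_prob(gamma_hat, t2_hat, lam):
    """Equations (4)-(5): alpha(phi) = min(1, lam * sqrt(gamma(phi) / T2bar(phi)))."""
    h = np.sqrt(np.asarray(gamma_hat, dtype=float) / np.asarray(t2_hat, dtype=float))  # (5)
    return np.minimum(1.0, lam * h)                                                    # (4)

# Illustrative values: promising simulations (large expected squared weight) keep alpha
# near 1, while expensive unpromising ones get a small continuation probability.
print(continuation_prob(gamma_hat=[1e-4, 1e-2, 1.0], t2_hat=[2.0, 2.0, 2.0], lam=5.0))
```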

A natural approach to tuning $\alpha$ in practice is as follows; the remainder of the section discusses these steps in more detail.

  1. Using Algorithm 1 with $\alpha \equiv 1$, simulate training data $(\theta_i, x_i, y_i, T_{1,i}, T_{2,i})$ for $1 \le i \le M$. Here $T_{1,i}$ is the time to perform steps 1-3 of Algorithm 1 and $T_{2,i}$ is the time for steps 4-6.

  2. Estimate $\gamma(\phi)$ and $\bar{T}_2(\phi)$ from the training data.

  3. Choose $\lambda$ to maximise an efficiency estimate based on the training data.

  4. Decide amongst various choices of initial simulation stage (and decision statistics, see below) by maximising estimated efficiency. By collecting appropriate data for these choices in step 1 it is not necessary to repeat it.

Step 2 is a regression problem, but it is not feasible to regress on $\phi = (\theta, x)$ directly as this will typically be very high dimensional. Instead $\alpha$ can be based on low dimensional features of $\phi$, referred to as decision statistics. That is, only functions of the form $\alpha(\phi) = a(\psi(\phi))$ are considered, where $\psi(\cdot)$ outputs a vector of decision statistics. The optimal such $\alpha$ is again given by (4) and (5), with expectations now taken conditional on $\psi(\phi)$. The choice of which decision statistics to use can be included in step 4 above.

Estimating $\gamma$ by regression is also challenging if there are regions of $\psi(\phi)$ space for which most of the responses are zero. This is typically the case for a uniform kernel $K_\epsilon$. In Prangle (2015) various tuning methods were proposed for uniform $K_\epsilon$, but these are complicated and rely on strong assumptions. A simpler alternative, used here, is a normal kernel $K_\epsilon$, as it has full support.

Local regression techniques (Hastie et al., 2009) are suggested for step 2. This is because the behaviour of the responses typically varies considerably across $\psi(\phi)$ values, motivating local fits rather than a single global regression. Firstly, the typical magnitude of the responses varies over widely different scales. Secondly, for both regressions the distribution of the residuals may also vary with $\psi(\phi)$. To ensure positive predictions, the use of degree zero local regression is suggested, i.e. a Nadaraya-Watson kernel estimator.
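A degree zero local regression is only a few lines of code. The sketch below is a generic Nadaraya-Watson estimator with a Gaussian kernel for a one-dimensional decision statistic; it is not the paper's implementation.

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x_new, bandwidth=0.5):
    """Degree zero local regression: a kernel-weighted average of the responses.
    Predictions are non-negative whenever the responses are non-negative."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    preds = []
    for x0 in np.atleast_1d(x_new):
        k = np.exp(-0.5 * ((x_train - x0) / bandwidth) ** 2)   # Gaussian kernel weights
        preds.append(np.sum(k * y_train) / np.sum(k))
    return np.array(preds)

# Illustrative use: predict gamma at new values of a scalar decision statistic
print(nadaraya_watson([0.1, 0.5, 1.0, 2.0], [0.9, 0.4, 0.1, 0.0], [0.3, 1.5]))
```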

The efficiency estimate required in steps 3 and 4 can be formed from the training data and a proposed choice of $\alpha$. Let $w_i$ denote the standard ABC weights of the training data and $\alpha_i$ the values $\alpha(\phi_i)$ under the proposed tuning. The realised efficiency of the training data is not used since it is based on a small sample size. Instead the asymptotic efficiency is estimated. Under weak assumptions (see Prangle, 2015) this is $[E(w)]^2 / [E(w^2) E(T)]$, where the random variable $T$ is the CPU time for a single iteration of lazy ABC. Note that $E(w)$ is constant (the ABC approximation of the normalising constant $Z$) under any tuning choices, so it is omitted. This leaves an estimate up to proportionality of $[E(w^2) E(T)]^{-1}$, which can be used to calculate efficiency relative to standard ABC (found by setting $\alpha \equiv 1$). An estimate of $E(T)$ is $M^{-1} \sum_{i=1}^M [T_{1,i} + \alpha_i T_{2,i}]$. Using (3), an estimate of $E(w^2)$ is $M^{-1} \sum_{i=1}^M w_i^2 / \alpha_i$.
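Putting these pieces together, the following sketch computes estimated efficiency relative to standard ABC over a grid of candidate $\lambda$ values; all the training quantities here are synthetic stand-ins for the outputs of steps 1 and 2.

```python
import numpy as np

rng = np.random.default_rng(2)
M = 1000
# Synthetic stand-ins for the training outputs of steps 1 and 2
w = rng.exponential(1.0, M) * (rng.uniform(size=M) < 0.2)   # standard ABC weights (alpha == 1 run)
t1 = rng.uniform(0.1, 0.2, M)                               # times for steps 1-3
t2 = rng.uniform(1.0, 3.0, M)                               # times for steps 4-6
gamma_hat = np.maximum(w ** 2, 1e-6)                        # stand-in regression estimate of gamma(phi)
t2_hat = t2                                                 # stand-in regression estimate of T2bar(phi)

def inv_efficiency(alpha):
    """Estimate of E(w^2) * E(T), which is proportional to 1 / efficiency."""
    e_w2 = np.mean(w ** 2 / alpha)            # squared-weight estimate implied by equation (3)
    e_t = np.mean(t1 + alpha * t2)            # expected CPU time per lazy ABC iteration
    return e_w2 * e_t

baseline = inv_efficiency(np.ones(M))          # standard ABC corresponds to alpha == 1
for lam in [0.5, 1.0, 2.0, 5.0, 10.0]:
    alpha = np.minimum(1.0, lam * np.sqrt(gamma_hat / t2_hat))
    alpha = np.maximum(alpha, 1e-3)            # keep alpha > 0, as required by Algorithm 1
    print(f"lambda = {lam}: estimated relative efficiency = {baseline / inv_efficiency(alpha):.2f}")
```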

3 Example

As an example the spatial extremes application of Erhardt and Smith (2012) is used. This application and the implementation of lazy ABC are described in full in Prangle (2015). A short sketch is that the model of interest has two parameters. Given these, data can be generated for a specified number of years and spatial locations, representing annual extreme measurements, e.g. of rainfall or temperature. An ABC approach has been proposed, including choices of $s(\cdot)$ and $d(\cdot, \cdot)$. Also, given data for a subset of locations, an estimate of the ABC distance can be formed.

Simulation of the data is hard to interrupt and later resume. However, the most expensive part of the process is calculating the summary statistics, which involves computing certain coefficients for every triple of locations. Therefore the initial simulation stage of lazy ABC is to simulate all the data and calculate an estimated distance based on a subset of locations, which is used as the decision statistic. The continuation stage is to calculate the coefficients for the remaining triples and return the realised distance.

Tuning of lazy ABC was performed as described in Section 2, using backwards selection in step 4 to find an appropriate subset of locations to use for the decision statistic. To fit the regressions estimating $\gamma$ and $\bar{T}_2$, a Nadaraya-Watson kernel estimator was used with a Gaussian kernel and bandwidth 0.5, chosen manually.

Repeating the example of Prangle (2015), 6 simulated data sets were analysed using standard and lazy ABC. Each analysis used the same total number of simulations; in lazy ABC a subset of these were used for training. The results are shown in Table 1. The efficiency improvements of lazy ABC relative to standard ABC are of similar magnitudes to those in Prangle (2015), but are less close to the values estimated in step 3 of tuning.

Parameters    Standard     Lazy                    Relative efficiency
              Time (s)     Time (s)     ESS        Estimated    Actual
0.5, 1        26.7         8.0          131.6      3.9          2.2
1, 1          25.6         7.1          174.2      4.5          3.1
1, 3          25.5         8.3          185.3      3.8          2.8
3, 1          25.6         7.6          267.2      4.2          4.5
3, 3          25.2         8.2          193.5      3.9          3.0
5, 3          25.7         8.4          162.4      3.7          2.5

Table 1: Simulation study on spatial extremes. Each row represents the analysis of a simulated dataset under the given values of the two model parameters. In each analysis a choice of $\epsilon$ was made under standard ABC so that the ESS was 200, and the same $\epsilon$ value was used for lazy ABC. The lazy ABC output sample includes the training data, as described in Prangle (2015). Its computation time also includes the time for the tuning calculations (roughly 70 seconds). Iterations were run in parallel and computation times are summed over all cores used.

4 Conclusion

This paper has reviewed lazy ABC (Prangle, 2015), a method to speed up ABC without introducing further approximations to the target distribution. Unlike Prangle (2015), non-uniform ABC kernels have been considered. This allows a simpler approach to tuning, which provided a comparable, roughly three-fold efficiency increase in a spatial extremes example.

Several extensions to lazy ABC are described in Prangle (2015): multiple stopping decisions, choosing $\epsilon$ after running the algorithm, and a similar scheme for likelihood-based inference. Other potential extensions include using the lazy ABC approach within ABC versions of MCMC or SMC algorithms, or focusing on model choice.

References

  • [1] Dennis Prangle. Lazy ABC. Statistics and Computing, 2015. To appear (available at http://arxiv.org/abs/1405.7867).
  • [2] Jun S. Liu. Metropolized independent sampling with comparisons to rejection sampling and importance sampling. Statistics and Computing, 6(2):113–119, 1996.
  • [3] Mark A. Beaumont. Approximate Bayesian computation in evolution and ecology. Annual Review of Ecology, Evolution and Systematics, 41:379–406, 2010.
  • [4] Jean-Michel Marin, Pierre Pudlo, Christian P. Robert, and Robin J. Ryder. Approximate Bayesian computational methods. Statistics and Computing, 22(6):1167–1180, 2012.
  • [5] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The elements of statistical learning. Springer, 2009.
  • [6] Robert J. Erhardt and Richard L. Smith. Approximate Bayesian computing for spatial extremes. Computational Statistics & Data Analysis, 56(6):1468–1481, 2012.