Robust Bayesian Synthetic Likelihood via a Semi-Parametric Approach

09/16/2018 ∙ by Ziwen An, et al. ∙ 0

Bayesian synthetic likelihood (BSL) is now a well established method for performing approximate Bayesian parameter estimation for simulation-based models that do not possess a tractable likelihood function. BSL approximates an intractable likelihood function of a carefully chosen summary statistic at a parameter value with a multivariate normal distribution. The mean and covariance matrix of this normal distribution are estimated from independent simulations of the model. Due to the parametric assumption implicit in BSL, it can be preferred to its non-parametric competitor, approximate Bayesian computation, in certain applications where a high-dimensional summary statistic is of interest. However, despite several successful applications of BSL, its widespread use in scientific fields may be hindered by the strong normality assumption. In this paper, we develop a semi-parametric approach to relax this assumption to an extent and maintain the computational advantages of BSL without any additional tuning. We test our new method, semiBSL, on several challenging examples involving simulated and real data and demonstrate that semiBSL can be significantly more robust than BSL and another approach in the literature. We also identify an example where semiBSL does not provide sufficient flexibility, promoting further research in robustifying BSL.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 1

page 2

page 3

page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Approximate Bayesian computation (ABC) is now a well-known method for conducting Bayesian statistical inference for stochastic simulation models that do not possess a computationally tractable likelihood function (

Sisson et al. (2018)). ABC by-passes likelihood evaluations by preferring parameter configurations that generate simulated data close to the observed data, typically on the basis of a carefully chosen summarisation of the data.

Although ABC has allowed practitioners in many diverse scientific to consider more realistic models, it still has several drawbacks. ABC is effectively a non-parametric procedure, and scales poorly with summary statistic dimension. Consequently, most ABC analyses resort to a low dimensional summary statistic to maintain a manageable level of computation, which could lead to significant information loss. Secondly, ABC requires the user to select values for various tuning parameters, which can impact on the approximation.

An alternative approach, called synthetic likelihood (SL, Wood (2010), Price et al. (2018) and Drovandi et al. (2018)), assumes that the model statistic for a given parameter value has a multivariate normal distribution. The mean and covariance matrix of this distribution are estimated via independent simulations of the statistic, which is used to approximate the summary statistic likelihood at the observed statistic. Price et al. (2018) developed the first Bayesian approach for SL, called Bayesian synthetic likelihood (BSL). The parametric assumption made by the SL allows it to scale more efficiently to increasing dimension of the summary statistic (Price et al. (2018)). Further, the BSL approach requires very little tuning and is ideal for implementing within parallel computing architectures compared with ABC.

The SL method has been tested successfully and shows great potential in a wide range of application areas such as epidemiology and ecology (Barbu et al. (2017), Price et al. (2018)). Further, Price et al. (2018) and Everitt (2017) demonstrate that BSL exhibits some robustness to a departure from normality. However, a barrier to the ubiquitous use of SL is its strong normality assumption. We seek an approach that maintains the efficiency gains of BSL but enhances its robustness.

The main aim of this paper is to develop a more robust approach to BSL. We do this by developing a semi-parametric method for approximating the summary statistic likelihood, called semiBSL, which involves using kernel density estimates for the marginals and combining them with a Gaussian copula. An incidental contribution of this paper is a thorough empirical investigation into the robustness of BSL. We demonstrate through several simulated and real examples that the semiBSL can offer significantly improved posterior approximations compared to BSL. Furthermore, we find that the number of model simulations required for semiBSL does not appear to increase significantly compared to BSL, and does not require any additional tuning parameters. However, we also find that the standard BSL approach can be remarkably robust on occasions. We also identify the limits of semiBSL by considering an example with non-linear dependence structures between summaries.

There have been other approaches developed for robustifying the synthetic likelihood. Fasiolo et al. (2018)

developed an extended empirical saddlepoint (EES) estimation method, which involves shrinking the empirical saddlepoint approximation towards the multivariate normal distribution. The regularisation helps ensure that their likelihood estimator is well-defined even when the observed statistic lies in the tail of the model summary statistic distribution. The shrinkage parameter (called the “decay”) needs to be selected by the user, and effectively represents a bias/variance trade-off. More shrinkage towards the Gaussian decreases variance but increases bias. Although

Fasiolo et al. (2018) consider frequentist estimation only, we demonstrate that it is straightforward to consider it within a Bayesian algorithm, allowing direct comparison with our new approach. We demonstrate how our approach tends to outperform the saddlepoint approach in terms of accuracy, at least in our test examples we consider, with less tuning.

Another alternative to synthetic likelihood is the approach of Dutta et al. (2017), which frames of the problem of estimating an intractable likelihood as ratio estimation. After choosing an appropriate density for the denominator, the likelihood is estimated via supervised binary classification methods where pseudo datasets drawn from the likelihood are labelled with a ‘1’ and pseudo datasets drawn from the denominator distribution are labelled with a ‘0’. The features of the classification problem are the chosen summary statistics and possibly transformations of them. Dutta et al. (2017)

use logistic regression for the classification task. This approach has the potential to be more robust than synthetic likelihood. However, we do not compare with this approach for several reasons. Firstly,

Dutta et al. (2017) consider point estimation, and extension to Bayesian MCMC is not trivial. Secondly, their method requires the practitioner to make several choices, making it difficult to perform a fair comparison. Finally, their approach needs to perform a potentially expensive penalised logistic regression for each proposed parameter, making it more valuable as the computational cost of model simulation increases.

Li et al. (2017) also use copula modelling within ABC. However, they use a Gaussian copula for approximating the ABC posterior for a high-dimensional parameter, whereas we use it for modelling a high-dimensional summary statistic.

The article is structured as follows. Section 2 introduces relevant background and previous research on BSL. Section 3 presents our new method, semiBSL, for robustifying BSL. Section 5 compares the performance of BSL and semiBSL with four simulated examples and one real data example with models of varying complexity and dimension. Section 6 concludes the paper, points out the limitations of our approach and directions for future work.

2 Bayesian Synthetic Likelihood

In ABC and BSL, the objective is to simulate from the summary statistic posterior given by

where is the parameter that requires estimation with corresponding prior distribution . Here, is the observed data that is subsequently reduced to a summary statistic where is the summary statistic function. The dimension of the statistic must be at least the same size as the parameter dimension, i.e. .

The SL (Wood (2010)) involves approximating with

(1)

The mean and covariance and are not available in closed form but can be estimated via independent model simulations at . The procedure involves drawing , where for , and calculating the summary statistic for each dataset, , where is the summary statistic for . These simulations can be used to estimate and unbiasedly

(2)

The estimates in (2) can be substituted into the SL in (1) to estimate the SL as . Theoretically, the corresponding MCMC algorithm targets the following approximate posterior

(3)

where

As we obtain . For finte , is a biased estimate of , thus resulting in being theoretically dependent on (Andrieu and Roberts, 2009). However, Price et al. (2018) demonstrate empirically that the BSL posterior in (3) is remarkably insensitive to its only tuning parameter, . Thus we can choose to maximise computational efficiency. We can sample the BSL posterior using MCMC, see Algorithm 1.

Input : Summary statistic of the data, , the prior distribution, , the proposal distribution , the number of iterations, , and the initial value of the chain .
Output : MCMC sample from the BSL posterior, . Some samples can be discarded as burn-in if required.
1 Simulate and compute
2 Compute using (2)
3 for  to  do
4       Draw
5       Simulate and compute
6       Compute using (2)
7       Compute
8       if  then
9             Set and
10            
11      else
12             Set and
13            
14       end if
15      
16 end for
Algorithm 1 MCMC BSL algorithm.

The computational efficiency of BSL over ABC stems from the normality assumption. This approximation can work well in many applications. However, there are also several reasons to suggest why it may not be appropriate. For a given parameter value, there may be model summary statistics with non-normal marginal distributions and non-linear dependencies between marginals. Further, the normality assumption may not be appropriate over all the non-negligible posterior mass. Finally, as the dimension of the summary statistic grows it is likely that the multivariate normal assumption becomes less reasonable.

In the next section we develop a method to increase the robustness of BSL, whilst largely retaining its computationally convenient properties.

3 Semi-Parametric Approach to Bayesian Synthetic Likelihood

Here we propose to use copula models to approximate the joint distribution of summary statistics. The main appeal of copula models is the ability to model marginal distributions and joint dependence structures separately. In this paper we use kernel density estimates (KDEs,

Rosenblatt (1956), Parzen (1962)) for modelling each marginal. KDEs are useful as they can be flexible and are non-parametric, eliminating the need for the user to select specific parametric forms for the marginals. However, we do note that other univariate density estimators could be used in our approach. After transforming the marginals based on the fitted KDEs, we attempt to capture the dependence structure between summaries by using a Gaussian copula. The appeal of the Gaussian copula is its tractability, which makes it ideal for its repeated use within a Bayesian algorithm. The density estimator we use is similar to the non-paranormal approach of Liu et al. (2009).

Of course, the standard SL approach is a special case of ours when the marginal distributions are assumed normal. Given the semi-parametric nature of the density estimator we use, we refer to our approach as semiBSL. We now describe the technical details of our method.

Kernel Density Estimation

We model each univariate marginal distribution of the joint summary statistic using a KDE. A KDE of an unknown distribution

for continuous random variable

can be defined based on independent and identically drawn samples as

where is the kernel function, which has non-negative support over its domain and integrates to , i.e. . There are a variety of kernels that might be chosen. Here we use the Epanechnikov kernel (Epanechnikov (1969)), which is defined by a parabolic function

The Epanechnikov kernel is optimal in the sense that it minimises the asymptotic mean integrated squared error (AMISE) of the approximation (Epanechnikov (1969)). We note that other kernels could be selected especially if the user wants a univariate density estimator that is non-zero everywhere. The smoothness of the KDE is controlled by the bandwidth parameter . Unlike standard ABC which pre-specifies the bandwidth parameter to use in approximating the joint summary statistic likelihood, we use the optimal smoothing parameter (in terms of AMISE) when the true underlying density is normal, i.e. , where

is the standard deviation of the distribution, which can be estimated empirically (

Bowman and Azzalini (1997)

). We use the same bandwidth for cumulative distribution function (CDF) estimation later but it is possible to use a different bandwidth for the CDF if desired.

Copula

Suppose a random vector

has continuous marginals. Sklar’s theorem (Sklar (1959)) states that the multivariate CDF of , , can be uniquely defined by its marginal CDFs, denoted, , and its dependence structure captured with a multivariate CDF called a copula function , where

and are assumed to follow a uniform distribution. A convenient choice of copula in many applications is the Gaussian copula, which has the following probability density function

where is a

-dimensional identity matrix,

, is the inverse cumulative distribution function of the standard normal distribution. The sole parameter of the Gaussian copula is a correlation matrix.

Once the parameter is estimated, our semiBSL approach estimates the summary statistic likelihood by

(4)

where is an estimate of the correlation matrix, is the th component of , is the estimated KDE of the th marginal probability density, , , and .

Gaussian rank correlation

The standard approach to estimate the Gaussian copula parameter is to compute the sample correlation matrix based on the collection where and . However, this approach for estimating relies on each of the KDEs providing a very good fit to the actual distribution of the corresponding marginal. Although KDE models are flexible, large sample sizes are often required to capture strong irregularities, even in univariate distributions.

To obtain a procedure that is more robust to a potential lack of fit of the KDEs, we consider a non-parametric estimator using the Gaussian rank correlation (GRC, Boudt et al. (2012)). Using the notation defined earlier, the th component of the GRC estimate is given by

(5)

where , where , is the rank function. The GRC is not expensive to compute and has two favourable properties. Boudt et al. (2012)

show that the GRC is robust to a small amount of outliers. In fact, it requires at least

of data contamination to revert a positive linear correlation. In addition, the estimated correlation matrix by GRC of a multivariate normal distribution is always semi-positive definite.

The full procedure to estimate the summary statistic likelihood at the point using our semiBSL approach is shown in Algorithm 2. Importantly, our method does not require any additional tuning parameters compared with BSL. Replacing the Gaussian estimate with the semi-parametric estimate (4) in Algorithm 1 gives MCMC semiBSL.

Input : Collection of simulated summary statistics based on parameter , , and the summary statistic of the observed data, .
Output : semiBSL estimate
1 for  to  do
2       Estimate and based on the th column of using a kernel density estimation.
3      
4 end for
5for  to  do
6       for  to  do
7             Compute rank and for .
8             Estimate using the ranks above with equation (5).
9            
10       end for
11      
12 end for
Construct and compute with equation (4).
Algorithm 2 The semi-parametric procedure that we use to estimate the summary statistic likelihood in our semiBSL method.

4 Computational Efficiency Example

Here we provide a simple example to gain some insight into the computational efficiency of the semi-parametric estimator compared to the standard synthetic likelihood. We assume that observed data have been drawn from a Gaussian, where and is the dimension of . To disturb the normality assumption we consider the following transformation of each data component

(6)

where and

control the level of skewness and kurtosis, respectively. We take the transformed data as our summary statistic. Figure

1 shows the marginal distribution of any individual data point with six pairs of and . Note that corresponds to no transformation. Based on Figure 1, the EES seems to provide a reasonable approximation in the presence of significant skewness, but performs less well when there is significant kurtosis. Note that these results are based on simulations from the underlying density, producing relatively small decay values. Despite this, the EES approximation does not perform as well as KDE for this simulation size.

Figure 1: Univariate standard normal distribution transformed with the transformation in (6) applied for various combinations of and . The black solid line represents the true density. The rest indicate KDE, EES and Gaussian approximate distributions based on simulations of the underlying distribution (see labels in the legend). The decay for EES is selected based on the simulations (from left to right, top to bottom, the decay values are 0.02, 0.01, 0.01, 0.01, 0.044 and 0.039).

Here we compute 500 independent estimates of the semi-parametric and standard log synthetic likelihoods with observed datasets generated using and where the th element of is given by . We consider all combinations of and , with in all cases. Values of and indicate tails lighter and heavier than the normal distribution, respectively. The value produces a very heavy tailed distribution.

It is easy to show that the true log-likelihood for a given is given by

where the second term is the log of the Jacobian of the transformation.

The estimated standard deviation and bias of the estimated synthetic likelihoods (both semi-parametric and standard approach) for different combinations of and (the number of simulations used in the synthetic likelihood estimator) are shown in Figure 2 and Figure 3, respectively. From row 1 of these figures, it is surprising that the semi-parametric estimator performs almost as well as the synthetic likelihood despite the normality assumption being perfect (). We are effectively losing nothing using non-parametric kernel estimates of the marginals instead of the true normal marginals. It is likely that the variance of the standard synthetic likelihood estimator is dominated by having to estimate a high-dimensional covariance matrix.

The next two rows of Figure 2 and Figure 3 are for when the true marginals have lighter tails than the normal distribution. It is interesting that the semi-parametric approach gives a smaller variance and bias compared to the standard approach. The lack of normality is contributing to the relative poor performance of the standard synthetic likelihood estimator.

The final two rows of Figure 2 and Figure 3 are for when the true marginals have heavier tails than the normal distribution. Despite the lack of normality, the standard estimator appears to have a smaller variance and bias compared to the semi-parametric approach. Upon further investigation, we find that the high variance is due to the fact that when data is generated with heavy tails, an observed component of the data can fall out in the tail of the distribution. It can fall so far out that all simulated values are either less than or greater than the observed component. Given that kernels used in kernel density estimation do not have heavy tails, the marginal density based on the KDE can be severely underestimated, and numerically equal to , as occurred for and in the bottom left panel of Figure 2. This issue is generally alleviated with larger

, but can still inflate the variance. It is important to note that the odd occurrence of massively underestimated log-likelihoods will have limited impact on the computationally efficiency of BSL. These will simply be rejected by the MCMC algorithm; it is the overestimated log-likelihoods that substantially decrease efficiency as it can lead to sticky behaviour in the MCMC chain.

Figure 2: Estimated standard deviations of the semi-parametric (red dash) and standard (blue solid) log synthetic likelihoods for the toy example. The x-axis denotes the number of simulations used to estimate the log-likelihood and the y-axis denotes the estimated standard deviation of the estimated log-likelihood. Columns from left to right are for and , respectively.
Figure 3: Estimated bias of the semi-parametric (red dash) and standard (blue solid) log synthetic likelihoods for the toy example. The x-axis denotes the number of simulations used to estimate the log-likelihood and the y-axis denotes the estimated bias of the estimated log-likelihood. Columns from left to right are for and , respectively.

This simple example demonstrates that the semi-parametric estimator can still maintain the computational benefits that BSL has over ABC. The semiBSL approach may even be more computationally efficient than BSL in situations of non-normality. However, we are yet to explore the impact of non-normality on the quality of posterior approximations. We aim to do that in the next section.

5 Applications

In this section, we investigate the performance of semiBSL on five examples. All the examples have at least one noticeable non-normal marginal summary statistic regardless of the joint distribution. We also look into an example with strong non-linear dependencies between summaries, which challenges the Gaussian copula assumption in semiBSL. We also present some of the marginal distributions of summary statistics in each example to give an impression of the non-normalities that can be encountered. For these illustrations, the marginal distributions are approximated with KDE based on sufficiently large number of simulations at a point estimate of .

We compare semiBSL with BSL and also a Bayesian version of the EES approach of Fasiolo et al. (2018). The EES method requires specification of an additional tuning parameter (the “decay” parameter), which mutates the EES between an empirical saddlepoint approximation (decay equals 0) and a multivariate normal distribution (decay approaches infinity). We use the cross validation method suggested by Fasiolo et al. (2018) to select the decay, which is also available in the associated R package.

As shown in Price et al. (2018) and An et al. (2018), BSL is remarkably insensitive to in a variety of applications. We tested several different values of , and found that semiBSL appears to inherit this useful property. Therefore, for the results presented below, is chosen to achieve close to optimal computational efficiency.

We use MCMC with a multivariate normal random walk to sample from the approximate posterior distribution in all methods. Since the different approaches may result in different posterior approximations, we tune the covariance matrix of the random walk separately. We use pilot runs until we believe we have a reasonable sample from the approximate posterior to inform the random walk covariance matrix that we use in the main MCMC run. In some examples, we use transformations so that the random walk takes place in an unconstrained parameter space.

5.1 Ma(2)

The moving average (MA) time series model is a popular toy example (e.g. Chiachio et al. (2014)) in ABC-related research topics for its simplicity of simulation and availability of the likelihood function. We consider an MA(2) model with the following evolution for the data

for , where , . Constraints for are . Here the likelihood function is multivariate normal with , , , with all other covariances equal to . Here the observed dataset is simulated with parameters and .

We do not perform any data summarisation here, i.e. the summary statistic is the full dataset. Given the likelihood is multivariate normal, standard BSL will perform well. To explore its performance to lack of normality, we apply transformation (6) used in Section 4 on the original data to add some degree of skewness and kurtosis. We take the transformed data as our summary statistic.

We tested the following four different scenarios, no transformation, skewness only, kurtosis only and both skewness and kurtosis. In the last experiment, we use different transformation parameters for each marginal summary statistic to aggravate estimation difficulty, i.e. , , where , . We use both univariate density plots (Figure 4) and bivariate scatter plots (Figure 5) to visualise the posterior results. The choice of is shown in Table 1. We also use the total variation distance to quantify the disparity between the approximated posterior distribution and the “true” posterior obtained with a very long MCMC run with the exact likelihood, i.e. . This can be estimated via numerical integration with a 2-D grid covering most of the posterior mass. The values are shown in Table 2.

The figures and table indicate that semiBSL posteriors are robust to all transformed summary statistics, while BSL fails to get close to the true posterior. The results also suggest that the EES provides little robustness in this example. We also tested the EES method with a much larger , here , for the randomised dataset to see if the posterior accuracy improves. With the decay parameter is reduced to 45 (roughly one third of the value obtained with ). We find that even with significant additional computational effort, the EES posterior approximation shows little improvement.

It is important to note that the dependence structure in the data (after back-transformation) is Gaussian. Therefore the Copula assumption made by semiBSL is correct in this example up to estimation of the marginals, which are done using KDE without knowledge of true marginals. Thus, we would expect semiBSL to provide good posterior approximations here and it is necessary to test its performance in other examples. However, the example does serve to illustrate that BSL can be impacted by non-normality.

Figure 4: Estimated marginal approximate posteriors (BSL in the first row, semiBSL in the second row and EES in the third row) for the MA(2) example using different pairs of transformation parameters for the summary statistics.
Figure 5: Bivariate scatter plots of the posterior distributions of the MA(2) example using different pairs of transformation parameters for the summary statistic. The overlaid contour plots are given by the “true” posterior.
random and
BSL 500 500 300 300
semiBSL 500 500 500 500
EES 500 750 300 300
Table 1: Choices of of BSL, semiBSL and EES posteriors of the MA(2) example.
random and
BSL 0.03 0.40 0.40 0.49
semiBSL 0.04 0.17 0.09 0.09
EES 0.04 0.43 0.25 0.55
Table 2: Total variations of BSL, semiBSL and EES posteriors to the “true” posterior of the MA(2) example (smaller is better).

5.2 M/g/1

The M/G/1 (single server queueing system with Poisson arrivals and general service times) queuing model has been investigated previously in the context of ABC by Blum and François (2010) and Fearnhead and Prangle (2012). Here the observed data, are inter-departure times from customers. In this model the likelihood is cumbersome to calculate whereas simulation is trivial. The distribution of service time is uniform , and the distribution of inter-arrival time is exponential with rate parameter . The parameter of interest is . The prior distribution for is . Observed data are generated from the model with true parameter .

Here we take the log of original data as our summary statistic. Figure 6 illustrates the distributions of the first three summary statistics based on the true parameter value, which are similar to the rest as the queue has a steady state and little transient behaviour. The shown distribution has an interesting and uncommon non-symmetric curve. As can be seen in the figure, the synthetic likelihood does not approximate well near the sharp mode. The marginal posterior are shown in Figure 7, where the “true” posterior distribution is obtained with the Bayesian approach given in Shestopaloff and Neal (2014). The value of for each approach is shown in the posterior plot. The bivariate scatterplot of posteriors are shown in Figure 8, with overlaying contour of the “true” distribution. It is evident that both BSL and EES exhibit an “L” shape in the bivariate posterior and are inaccurate compared to the “true” posterior. The semiBSL posterior is not as precise as the “true” posterior, but provides reasonable estimates of the posterior means.

Figure 6: Distribution of the summary statistics of the M/G/1 example. The solid black line indicates the KDE with a large number of simulations at a point estimate of . The red dashed line is given by Gaussian estimation using sample mean and variance. Location of the observed data is marked on x-axis in dark green.
Figure 7: Approximate marginal posterior distributions of the M/G/1 example. The solid black line shows the “true” posterior distribution. The number of simulation used for each approach is given in the legend.
Figure 8: Bivariate scatter and contour plots of the posterior distributions of the M/G/1 example. The scatter plot is generated with a thinned approximate sample obtained by different approaches. The contour plot is drawn based on the “true” posterior distribution.

5.3 Stereological Extremes

During the process of steel production, the occurrence of microscopic particles, termed as an inclusion, is a critical measure of the quality of steel. It is desirable that the inclusions are kept under a certain threshold, since steel fatigue is believed to start from the largest inclusion within the block. Direct observation of -dimensional inclusions is inaccessible, so that inclusion analysis typically relies on examining -dimensional planar slices. Anderson and Coles (2002) establish a mathematical model to formulate the relationship between observed cross-section samples, , and the real diameter of inclusions, , assuming that the inclusions are spherical. The model focuses on large inclusions, i.e. , where is a certain threshold, which is endowed with a generalised Pareto distribution such that

where and are the scale and shape parameters, respectively, and . The inclusions are mutually independent and locations of them follow a homogeneous Poisson process with rate parameter . While the spherical model possesses a likelihood function that is easily computable, the spherical assumption itself might be inappropriate. This leads to the ellipsoidal model proposed by Bortot et al. (2007), who use ABC for likelihood-free inference due to the intractable likelihood function that it inherits. The new model assumes that inclusions are ellipsoidal with principal diameters , where is the largest diameter without loss of generality. Here and , where and are independent uniform random variables. The observed value is the largest principal diameter of an ellipse in the -dimensional cross-section.

Here we consider the ellipsoidal model with parameter of interest . The prior distribution is . We consider four summary statistics: the number of inclusions, and the minimum, mean and maximum inclusion. Figure 9 shows the distribution of the chosen summary statistic simulated at a point estimate of . The last three summary statistics have a significantly heavy right tail, strongly invalidating the normality assumption of BSL.

Figure 9: Distribution of the summary statistics of the stereological extreme example. The solid black line indicates the KDE with a large number of simulations at a point estimate of . The red dashed line is given by Gaussian estimation using sample mean and variance. Location of the observed data is marked on x-axis in dark green.

Figure 10 shows the marginal posterior distributions from three BSL runs, one semiBSL run and one EES run. Note that the three BSL posteriors were generated with identical input arguments, however the results differ substantially. On the contrary, semiBSL results are consistent with multiple runs (results not shown). It is worth pointing out that MCMC BSL is having trouble exploring the full parameter space due to the rarely occurring outliers shown in the boxplot (Figure 11) below, but the current result is enough to show the difference in posterior approximation. The surprisingly unstable behaviour of BSL can be explained in the boxplot, where each column represents a collection of Gaussian synthetic log-likelihood values based on a different parameter value and number of simulations. The first column uses a parameter value that should have relatively high posterior support and a medium number of simulations, the second column uses a parameter value that should have negligible posterior support (referred to here as “poor”) and a medium number of simulations, and the last column uses a poor parameter value and a large number of simulations. It is worth noting that semiBSL (with Epanechnikov kernel) always produces negative infinity log-likelihood estimates for simulations at the poor parameter value. Figure 11 indicates that the standard synthetic likelihood may still be estimated to be high at the poor parameter value. Thus the BSL run and show accepted samples in tails. The bivariate scatterplot (Figure 12) provides a clearer look at the posteriors, where roughly one sixth of accepted samples can be categorised as outliers. The EES has less but still a noticeable number of outliers.

Figure 10: Approximate posterior distributions of the stereological extreme example. The number of simulation used for each approach is given in the legend.
Figure 11: Boxplot of the standard synthetic log-likelihoods of the stereological extreme example.
Figure 12: Bivariate scatter plots of the posterior distributions of the stereological extreme example. The scatter plot is generated with thinned approximate sample by BSL (run no. in Figure 10), semiBSL and EES approach.

5.4 Fowler’s Toads

Movements of amphibian animals exhibit patterns of site fidelity and long-distance dispersal at the same time. Modelling such patterns helps in understanding amphibian’s travel behaviour and contributes to amphibian conservation. Marchand et al. (2017) develop an individual-based model to a species called Fowler’s Toads (Anaxyrus fowleri), and collected data via radiotracking in Ontario, Canada. The comprehensive experimenting and modelling details are stated in the original paper. Here we only present a brief description of the model.

The model assumes that a toad hides in its refuge site at daytime and moves to a random chosen foraging place at night. After its geological position being collected via transmitter, the toad either takes refuge at the current location or return to one of the previous sites. For simplicity, the refuge locations are projected to a single axis, thus can be represented with single-dimension spatial process. GPS location data are collected on toads for days, i.e. the observation matrix is of dimension . For the synthetic data we use here, , , and missingness is not considered. Then is summarised down to sets comprising the relative moving distances for time lags of days. For instance, consists of the displacement information of lag day, .

Simulation from the model involves two distinct processes. For each single toad, we first generate an overnight displacement, , then mimic the returning behaviour with a simplified model. The overnight displacement is deemed to have significant heavy tails, assumed to belong to the Lévy-alpha stable distribution family, with stability parameter and scale parameter . This distribution has no closed form, while simulation from it is straightforward (Chambers et al. (1976)), making simulation-based approaches appealing. The original paper provided three returning models with different complexity, we adopt the random return model here as it has the best performance among the three and is the easiest for simulation. The total returning probability is a constant , if a return occurs on day , then the return site is the same as the refuge site on day , where is selected randomly from with equal probability. Here we take the observed data as synthetically generated with true parameter . We use a uniform prior over here.

KDEs of are shown in Figure 13

. We fit a four-component Gaussian mixture model to each set of

. As the summary statistic we use the 11-dimensional score of this fitted auxiliary model (corresponding to three component weights, four means and four standard deviations). This is the indirect inference approach for selecting summary statistics (see Drovandi et al. (2011), Gleim and Pigorsch (2013) and Drovandi et al. (2015)). Accommodating the four different lags, there are summary statistics in total. Figure 14 shows the marginal distributions of , simulated with the true value of .

The scores in Figure 14 shows rough normality, thus standard BSL may be suitable for this model. To further explore the robustness of BSL and test our semiBSL approach, we include a power transformation of to push the irregularity further. The transformation function is given by . It retains the sign of the input and creates a sharp peak near . We show the density plot of summary statistics with transformation power (Figure 15). It is clear that the transformation can disturb the normality significantly.

In Figure 16 and 17, we show the posterior approximations produced by different approaches and transformation parameters. Figure 16 compares the results between different approaches at the same transformation parameter, and Figure 17 compares the posteriors with different transformation powers within the same posterior approximation approach. All the posteriors seem to be reasonably close at power (no transformation) and , however, the BSL posterior shifts away from the true parameter value when . The semiBSL and EES remain relatively robust at such an aggressive transformation.

Figure 13: Density plot of for lags of days of the Fowler’s toads example.
Figure 14: Distribution of the score summary statistics of the Fowler’s toads example. The solid black lines indicate the KDE from a large number of simulations at a point estimate of . The red dashed lines are given by Gaussian estimates using sample mean and covariance. Locations of the observed dataset are marked on x-axis in dark green.
Figure 15: Distribution of the transformed summary statistics (power=) of the Fowler’s toads example. The solid black lines indicate the KDE from a large number of simulations at a point estimate of . The red dashed lines are given by Gaussian estimates using sample mean and covariance. Locations of the observed dataset are marked on x-axis in dark green.
Figure 16: Approximate posterior distributions by BSL, semiBSL and EES of the Fowler’s toads example using scores (first row), transformed scores with (second row) and transformed scores with (third row). The vertical line indicate the true parameter value.
Figure 17: Comparing the approximate posterior distributions for each approach at different transformation powers of the Fowler’s toads example. The vertical line indicate the true parameter value.

5.5 Simple Recruitment, Boom and Bust

Here we consider an example that tests the limits of our semiBSL method. The simple recruitment, boom and bust model was used in Fasiolo et al. (2018) to investigate the performance of the saddlepoint approximation to a non-normal summary statistic. This is a discrete stochastic temporal model that can be used to represent the fluctuation of the population size of a certain group over time. Given the population size and parameter , the next value follows the following distribution

where is a stochastic term. The population oscillates between high and low level population sizes for several cycles. The true parameters are , , and , and the prior distribution is . There are values in the observed data, denote as . We use burn-in values to remove the transient phase of the process.

We construct the summary statistics as follows. Consider a dataset , define the differences and ratios as and . We use the sample mean, variance, skewness and kurtosis of , and as our summary statistic, . We also tested the statistics used in Fasiolo et al. (2018) but we found our choice to be far more informative about the model parameters.

The parameter seems to have a strong impact on the model statistic distribution. Small values of tend to generate statistics that are highly non-normal so we consider such a case here.

Kernel density plots of the -dimensional summary statistic (based on the true parameter value) are shown in Figure 19. None of the chosen summary statistics are close to normal. To gain some insight into the dependence structure between summaries, we consider bivariate scatterplots of the summaries, shown in Figure 18. It is clear that there is a high degree of non-linear dependence between many of the summaries, which cannot be captured by our Gaussian copula. Therefore this example is highly challenging for our semiBSL approach.

Posterior distributions by BSL, semiBSL, EES and ABC are shown in Figure 20. The values of are given in the legend. We use sequential Monte Carlo as the sampling method for ABC, specifically the algorithm of Drovandi and Pettitt (2011) given that the algorithm is highly automated. We also consider regression adjustment to limit the impact of the ABC tolerance but found it has little effect (results not shown). Given the relative low-dimensionality of the summary statistic we take the ABC approximation as the gold standard in this example. Both the BSL and semiBSL posteriors do not appear to be close to the ABC approximation. Although semiBSL is slightly closer to ABC, it is likely that the strong non-linear dependencies between the summaries is negatively impacting on semiBSL. This example also challenges EES, since the MCMC chain in the EES approach struggles with over-estimated log-likelihood. Therefore in this example we have to use a very large number of simulations, e.g. , to limit the occurrence of over-estimated log-likelihood and achieve a tolerable acceptance probability in EES. This example highlights further need to develop robust BSL methods.

Figure 18: Scatterplot of the bivariate summary statistics of the simple recruitment, boom and bust example.
Figure 19: Distribution of the summary statistics of the simple recruitment, boom and bust example. The solid black line indicates the KDE with a large number of simulations at a point estimate of . The red dashed line is given by Gaussian estimation using sample mean and variance. Location of the observed data is marked on x-axis in dark green.
Figure 20: Approximate marginal posterior distributions for the simple recruitment, boom and bust example. The tolerance used in SMC ABC is . The vertical lines indicate the true parameter values. The number of simulation used for each approach is given in the legend.
Figure 21: Bivariate scatter and contour plots of the posterior distributions of the simple recruitment, boom and bust example. The scatter plot is generated with thinned approximate sample by BSL or semiBSL or EES method. The contour plot is drawn with approximate sample by SMC ABC with tolerance . Only three pairs of the parameters are shown here, vs , vs and vs .

6 Conclusion and Future Work

In this paper, we proposed a new method to relax the normality assumption in BSL and presented several examples of varying complexity to test the empirical performance of BSL and semiBSL. The new approach offered additional robustness in several of the considered examples. Further, given the semi-parametric nature of the method, the computational gains of the fully parametric BSL are largely retained in terms of the number of model simulations required. When model simulations per iteration required by BSL is non-negligible, then the additional cost incurred by semiBSL will be small. Estimating marginal KDEs and the Gaussian rank correlation matrix is relatively straightforward.

However, we did observe situations where standard BSL was remarkably robust to lack of normality, which is consistent with some previous literature including Price et al. (2018) and Everitt (2017). Developing some theory around when we can expect standard BSL to work well would be useful.

Previous BSL research (Price et al., 2018) showed that the approximate posterior is very insensitive to the choice of . Surprisingly, we also found the semiBSL posterior to be relatively insensitive to , albeit not as insensitive as BSL. We expect semiBSL to be less insensitive to since the choice of is more likely to impact kernel density estimates compared to the Gaussian synthetic likelihood. Fortunately, the semiBSL results are largely insensitive to the choice of in our examples.

The new approach was also compared with another robustified synthetic likelihood, the EES (Fasiolo et al., 2018). Because of potential numerical issues with the standard empirical saddlepoint approximation, the EES has to resort to a tuning parameter called the decay, which shifts the estimation between a flexible saddlepoint one and a rigid normal distribution. For roughly the same number of simulations, the posterior approximation by EES generally shows some improvement over BSL but not as accurate as semiBSL in the examples tested.

One limitation of our approach is that it resorts to kernel density estimates of the marginals, which we found can result in some numerical instability when the observed statistic lies in the tail of the model statistic distribution. Furthermore, if the true underlying marginal distribution of a statistic is highly irregular, then the number of simulations required for a KDE to capture this will be large. We point out that we only require estimates of the marginal distributions of the summary statistics at the observed statistic values, rather than estimating the entire marginal distribution. Other approaches for estimating the marginals, such as the EES developed for the joint density in Fasiolo et al. (2018), or adaptive kernel density estimators such as balloon estimator and sample point estimator (e.g. Terrell and Scott (1992)) may provide more stability and require less model simulations. We plan to explore this in future work.

Given the above numerical instability of our synthetic likelihood estimator, we recommend that the practitioner firstly find and initialise the MCMC at a parameter that produces a summary statistic distribution that has reasonable support for the observed statistic. We note that the standard BSL can also exhibit slow convergence when initialised in the tail of the posterior (Price et al., 2018).

There is still research to be done in improving the robustness of BSL. Our semiBSL method relies on the assumption of Gaussian copula structure. The results in the boom and bust example show a compromised performance of semiBSL when there exists strong non-linear dependence structures between summary statistics. For future work, we plan to investigate other more flexible copula structures such as vine copulas (Bedford and Cooke (2002)).

It would be possible to incorporate the semiBSL likelihood estimator into the variational Bayes synthetic likelihood approaches of Ong et al. (2018)

to speed up computation for a high dimensional statistic and/or parameter.

Overall, we have demonstrated that our semiBSL approach can provide a significant amount of robustness relative to BSL with little or no additional computational cost in terms of the number of model simulations, and requires no additional tuning parameters.

7 Ackowledgements

CD was supported by an Australian Research Council’s Discovery Early Career Researcher Award funding scheme (DE160100741). ZA was supported by a scholarship under CDs grant DE160100741 and a top-up scholarship from the Australian Research Council Centre of Excellence for Mathematical and Statistics Frontiers (ACEMS). DJN was supported by a Singapore Ministry of Education Academic Research Fund Tier 1 grant (R-155-000-189-114). Computational resources and services used in this work were provided by the HPC and Research Support Group, Queensland University of Technology, Brisbane, Australia.

References

  • An et al. (2018) An, Z., Nott, D. J., and Drovandi, C. C. (2018). Accelerating Bayesian synthetic likelihood with the graphical lasso. https://eprints.qut.edu.au/102263/.
  • Anderson and Coles (2002) Anderson, C. and Coles, S. (2002). The largest inclusions in a piece of steel. Extremes, 5(3):237–252.
  • Andrieu and Roberts (2009) Andrieu, C. and Roberts, G. O. (2009). The pseudo-marginal approach for efficient Monte Carlo computations. The Annals of Statistics, 37(2):697–725.
  • Barbu et al. (2017) Barbu, C. M., Sethuraman, K., Billig, E. M. W., and Levy, M. Z. (2017). Two-scale dispersal estimation for biological invasions via synthetic likelihood. Ecography, 41(4):661–672.
  • Bedford and Cooke (2002) Bedford, T. and Cooke, R. M. (2002). Vines–a new graphical model for dependent random variables. Ann. Statist., 30(4):1031–1068.
  • Blum and François (2010) Blum, M. G. B. and François, O. (2010).

    Non-linear regression models for approximate Bayesian computation.

    Statistics and Computing, 20(1):63–73.
  • Bortot et al. (2007) Bortot, P., Coles, S. G., and Sisson, S. A. (2007). Inference for stereological extremes. Journal of the American Statistical Association, 102(477):84–92.
  • Boudt et al. (2012) Boudt, K., Cornelissen, J., and Croux, C. (2012). The Gaussian rank correlation estimator: robustness properties. Statistics and Computing, 22(2):471–483.
  • Bowman and Azzalini (1997) Bowman, A. W. and Azzalini, A. (1997). Applied smoothing techniques for data analysis: the kernel approach with S-Plus illustrations, volume 18. OUP Oxford.
  • Chambers et al. (1976) Chambers, J. M., Mallows, C. L., and Stuck, B. W. (1976). A method for simulating stable random variables. Journal of the American Statistical Association, 71(354):340–344.
  • Chiachio et al. (2014) Chiachio, M., Beck, J., Chiachio, J., and Rus, G. (2014). Approximate Bayesian computation by subset simulation. SIAM Journal on Scientific Computing, 36(3):A1339–A1358.
  • Drovandi et al. (2018) Drovandi, C. C., Grazian, C., Mengersen, K., and Robert, C. (2018). Handbook of Approximate Bayesian Computation, chapter Approximating the Likelihood in ABC, pages 319–367. Taylor & Francis.
  • Drovandi and Pettitt (2011) Drovandi, C. C. and Pettitt, A. N. (2011). Estimation of parameters for macroparasite population evolution using approximate Bayesian computation. Biometrics, 67(1):225–233.
  • Drovandi et al. (2011) Drovandi, C. C., Pettitt, A. N., and Faddy, M. J. (2011). Approximate Bayesian computation using indirect inference. Journal of the Royal Statistical Society: Series C (Applied Statistics), 60(3):317–337.
  • Drovandi et al. (2015) Drovandi, C. C., Pettitt, A. N., and Lee, A. (2015). Bayesian indirect inference using a parametric auxiliary model. Statistical Science, 30(1):72–95.
  • Dutta et al. (2017) Dutta, R., Corander, J., Kaski, S., and Gutmann, M. U. (2017). Likelihood-free inference by ratio estimation. arXiv preprint arXiv:1611.10242v3.
  • Epanechnikov (1969) Epanechnikov, V. A. (1969). Non-parametric estimation of a multivariate probability density. Theory of Probability & Its Applications, 14(1):153–158.
  • Everitt (2017) Everitt, R. G. (2017). Bootstrapped synthetic likelihood. arXiv preprint arXiv:1711.05825v2.
  • Fasiolo et al. (2018) Fasiolo, M., Wood, S. N., Hartig, F., and Bravington, M. V. (2018). An extended empirical saddlepoint approximation for intractable likelihoods. Electronic Journal of Statistics, 12(1):1544–1578.
  • Fearnhead and Prangle (2012) Fearnhead, P. and Prangle, D. (2012). Constructing summary statistics for approximate Bayesian computation: semi-automatic approximate Bayesian computation. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 74(3):419–474.
  • Gleim and Pigorsch (2013) Gleim, A. and Pigorsch, C. (2013). Approximate Bayesian computation with indirect summary statistics. Draft paper: http://ect-pigorsch. mee. uni-bonn. de/data/research/papers.
  • Li et al. (2017) Li, J., Nott, D., Fan, Y., and Sisson, S. (2017). Extending approximate Bayesian computation methods to high dimensions via a Gaussian copula model. Computational Statistics & Data Analysis, 106:77 – 89.
  • Liu et al. (2009) Liu, H., Lafferty, J., and Wasserman, L. (2009). The nonparanormal: Semiparametric estimation of high dimensional undirected graphs.

    Journal of Machine Learning Research

    , 10:2295–2328.
  • Marchand et al. (2017) Marchand, P., Boenke, M., and Green, D. M. (2017). A stochastic movement model reproduces patterns of site fidelity and long-distance dispersal in a population of Fowler’s toads (Anaxyrus fowleri). Ecological Modelling, 360:63 – 69.
  • Ong et al. (2018) Ong, V. M. H., Nott, D. J., Tran, M.-N., Sisson, S. A., and Drovandi, C. C. (2018). Variational Bayes with synthetic likelihood. Statistics and Computing, 28(4):971–988.
  • Parzen (1962) Parzen, E. (1962). On estimation of a probability density function and mode. Annals of Mathematical Statistics, 33(3):1065–1076.
  • Price et al. (2018) Price, L. F., Drovandi, C. C., Lee, A., and Nott, D. J. (2018). Bayesian synthetic likelihood. Journal of Computational and Graphical Statistics, 27:1–11.
  • Rosenblatt (1956) Rosenblatt, M. (1956). Remarks on some nonparametric estimates of a density function. Annals of Mathematical Statistics, 27(3):832–837.
  • Shestopaloff and Neal (2014) Shestopaloff, A. Y. and Neal, R. M. (2014). On Bayesian inference for the M/G/1 queue with efficient MCMC sampling. arXiv preprint arXiv:1401.5548.
  • Sisson et al. (2018) Sisson, S. A., Fan, Y., and Beaumont, M. (2018). Handbook of Approximate Bayesian Computation. Chapman and Hall/CRC, 1st edition.
  • Sklar (1959) Sklar, M. (1959). Fonctions de Répartition À N Dimensions Et Leurs Marges. Université Paris 8.
  • Terrell and Scott (1992) Terrell, G. R. and Scott, D. W. (1992). Variable kernel density estimation. The Annals of Statistics, 20(3):1236–1265.
  • Wood (2010) Wood, S. N. (2010). Statistical inference for noisy nonlinear ecological dynamic systems. Nature, 466:1102–1107.