1 Introduction
From its humble beginnings roughly two decades ago, the approximate Bayesian computation (ABC) inference framework for simulator-based models with an intractable likelihood has witnessed a remarkable development over recent years and a growing interest from a number of diverse application fields (Secrier et al., 2009; Itan et al., 2009; Drovandi and Pettitt, 2011; Corander et al., 2017; Cameron and Pettitt, 2012; Järvenpää et al., 2019b; Shen et al., 2019; Simola et al., 2019). The standard workhorse of ABC has until recently been the sequential Monte Carlo algorithm, which builds an importance sampler by starting from the prior distribution for the model parameters and then successively adapts this proposal distribution towards the approximate posterior distribution from which samples are ultimately sought (Beaumont et al., 2009; Del Moral et al., 2012). For general overviews of the computational and statistical aspects of ABC, see Marin et al. (2012) and Lintusaari et al. (2017). Alongside ABC, other likelihood-free techniques for simulator-based models have continued to thrive, most notably synthetic likelihood (Wood, 2010; Price et al., 2018; Ong et al., 2018b) and indirect inference (Drovandi et al., 2015).
A related but distinct approach is to use classifiers in the statistical inference procedure. Classifier ABC initially used classification accuracy as a discrepancy measure for rejection sampling
(Gutmann et al., 2018), and subsequent work has used classifiers to approximate ratios for frequentist likelihood testing (Cranmer et al., 2015), and both local and global approximations to a Bayesian posterior (Thomas et al., 2016; Hermans et al., 2019; Kokko et al., 2019). Formal use of classifiers for parameter estimation for computer simulation models dates back at least a decade
(Pratola et al., 2013). Neural network classifiers provide more expressive models for complex data distributions, whereas logistic regression with regularisation allows for interpretable feature selection from very large numbers of summary statistics
(Thomas et al., 2016). The rationale of training models via the discrimination of generated samples from observed samples has also been used for Generative Adversarial Networks, for which likelihood functions are not otherwise available (Goodfellow et al., 2014; Mohamed and Lakshminarayanan, 2016). The most notable difference between these models and the models considered in the ABC and synthetic likelihood context is that parameters of the former model class are generally not interpretable in the standard statistical sense. There has also been recent interest in the use of neural likelihood methods, which use neural networks to map between observed data and the parameters of distributions asserted over the simulator parameters (Papamakarios and Murray, 2016; Papamakarios et al., 2018).Inference for simulators with interpretable parameters of a high dimension remains a challenging and open question. Methods developed for lowdimensional models cannot be assumed to scale effectively to higherdimensional spaces, suggesting that specialized methods are more appropriate for such problems. The state of the field has been recently reviewed in Chapter 3 of Sisson et al. (2018)
. Gaussian Copula ABC defines distinct discrepancy functions for each parameter, using a copula approach to characterize the pairwise joint distributions of the variables and performing rejection sampling from the prior
(Li et al., 2017). ABC via regression density estimation asserts a Gaussian mixture model for the likelihood of the summary statistics and uses regression models to capture the relation between the parameters and the parameters of the mixture model, again relying on sampling from the prior
(Fan et al., 2013). High-dimensional inference has also been extended to synthetic likelihood methods, using shrinkage estimation of the covariance matrices of the summary statistics and a sparse representation of the variational covariance approximation to project to a lower-dimensional space (Ong et al., 2018a). This approach has been demonstrated to perform successful inference for a large number of parameters and summary statistics, albeit requiring a substantial number of simulations and considerable computation time, as is typical for the family of synthetic likelihood methods.

Here we build upon the Bayesian optimization approach to accelerating likelihood-free inference (Gutmann and Corander, 2016) and introduce a framework by which it can be more robustly applied to high-dimensional parameter spaces. In order to achieve computational scalability, we use separate acquisition procedures for the discrepancies defined for different model parameters. These efficient high-dimensional simulation acquisitions are then combined with exponentiated loss-likelihoods to provide a misspecification-robust characterisation of the marginal posterior distribution for all model parameters.
The remainder of the paper is structured as follows. The following two sections present the previous work that our method builds upon directly. Section 4 explains the methods developed in this paper, while Section 5 presents the performance of the method on both synthetic and real data. The final section concludes with a discussion and proposals for future work.
2 Bayesian optimisation for likelihood-free inference
A key insight for accelerating likelihood-free inference was to replace standard random draws from an importance distribution with efficient acquisitions delivered by probabilistic Bayesian optimization (Gutmann and Corander, 2016). An important feature of Bayesian optimization is that it stops suggesting proposals from a poorly representative area of the parameter space as soon as it has gained a sufficient level of certainty that the objective function values there are inferior to those in other areas. The Bayesian Optimisation for Likelihood-Free Inference (BOLFI) method (Gutmann and Corander, 2016) asserts a Gaussian process (GP) prior on the discrepancy function Δ, considered as a function over the parameter space Θ, parametrized with mean and covariance functions m and k
Δ(θ) ∼ GP(m(θ), k(θ, θ′))    (1)
As with previous ABC methods, estimates of Δ(θ) are derived from simulations through a norm of the difference between summary statistics evaluated on the simulated and observed data, i.e. Δ(θ) = ‖s(y_θ) − s(y_obs)‖. The use of a GP prior, and the consequent posterior distribution over the discrepancy function learned in an online fashion by running the simulator model sequentially, allows a Bayesian optimisation algorithm to be used to efficiently select parameter values for the forward simulations. Consequently, BOLFI makes very efficient use of computational resources when the simulations are expensive. After all simulations have been acquired, a kernel density estimate (KDE) is used to construct a proxy likelihood L̃(θ) from the GP posterior. For a uniform kernel with a specific bandwidth, this is proportional to the probability of the discrepancy being lower than a threshold ε, often taken to be the minimum discrepancy:

L̃(θ) ∝ P(Δ(θ) ≤ ε) = Φ((ε − μ(θ)) / σ(θ))    (2)

where μ(θ) and σ(θ) are the posterior mean and standard deviation of the GP at θ, and Φ is the standard normal CDF.
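Assuming Gaussian marginals of the GP posterior at a candidate parameter value, the uniform-kernel proxy above can be evaluated in closed form; a minimal sketch (function name and interface hypothetical):

```python
import math

def bolfi_likelihood_proxy(mu, sigma, eps):
    """Probability that the GP-modelled discrepancy falls below threshold eps,
    given the GP posterior mean mu and standard deviation sigma at a
    candidate parameter value."""
    z = (eps - mu) / sigma
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

A larger posterior mean discrepancy yields a smaller proxy likelihood, as expected.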
Bayesian optimisation falls within the field of probabilistic numerics, in which probabilistic models are used to solve problems traditionally considered within the realm of numerics, such as optimisation and quadrature (Hennig et al., 2015). For Bayesian optimisation, the posterior properties of a GP model trained on evaluations of an objective function are used to perform efficient acquisition of new evaluation points (Shahriari et al., 2016). Such an approach relies on the choice of a heuristic acquisition function, which encodes some probabilistic principle to reduce the uncertainty about the location of the optimum of the function. A commonly used acquisition function is the Lower Confidence Bound (LCB), composed of the mean μ(θ) and marginal standard deviation σ(θ) of the GP posterior, and a trade-off parameter η:

A(θ) = μ(θ) − η σ(θ)    (3)
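A minimal sketch of an LCB acquisition step over a toy one-dimensional GP posterior (the grid, posterior values and trade-off parameter are illustrative, not from the paper):

```python
import numpy as np

def lcb(mu, sigma, eta):
    """Lower Confidence Bound of the GP posterior over the discrepancy."""
    return mu - eta * sigma

# Toy GP posterior on a parameter grid: acquire where the LCB is smallest.
theta = np.linspace(-3.0, 3.0, 7)
mu = theta ** 2                   # posterior mean of the discrepancy
sigma = np.full_like(mu, 0.5)     # posterior standard deviation
next_theta = theta[np.argmin(lcb(mu, sigma, eta=2.0))]
```

With a flat uncertainty profile, the next acquisition lands at the posterior-mean minimum.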
However, several other acquisition functions have also been considered recently for ABC, as the accuracy of the resulting posterior approximation may be influenced by the particular choice (Järvenpää et al., 2019a). The software package ELFI (Lintusaari et al., 2018) offers users a choice between the LCB and the expected integrated variance loss function introduced by Järvenpää et al. (2019a), which was found to be particularly accurate, although it may require an elevated computational effort. Variations on the basic Bayesian optimisation algorithm have been developed, including multi-task variations and methods designed for high-dimensional parameter spaces (Swersky et al., 2013). Such high-dimensional variations generally rely on different kinds of sparsity assumptions. Kandasamy et al. (2015) rely on an additive kernel structure to increase stability in high dimensions: given the linearity of GPs, this is equivalent to assuming that the objective function is composed of a sum of one-dimensional functions, each associated with one parameter. This also separates certain acquisition functions into a sum of independent acquisition functions across the dimensions, making the acquisition optimisation more tractable:

k(θ, θ′) = Σ_j k_j(θ_j, θ_j′),   A(θ) = Σ_j A_j(θ_j)    (4)
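Under such an additive assumption, a d-dimensional acquisition optimisation reduces to d independent line searches; a sketch with hypothetical per-dimension grids and posterior values:

```python
import numpy as np

def separable_lcb_argmin(grids, mus, sigmas, eta=2.0):
    """Minimise each one-dimensional LCB component on its own grid,
    returning one acquired coordinate per dimension."""
    return np.array([g[np.argmin(m - eta * s)]
                     for g, m, s in zip(grids, mus, sigmas)])

# Three additive components with posterior-mean minima at -1, 0 and 1.
grids = [np.linspace(-2.0, 2.0, 5)] * 3
mus = [(g - c) ** 2 for g, c in zip(grids, [-1.0, 0.0, 1.0])]
sigmas = [np.full(5, 0.1)] * 3
theta_next = separable_lcb_argmin(grids, mus, sigmas)
```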
3 Misspecification and exponentiated loss likelihoods
Misspecification remains a central problem for Bayesian inference in general, and the likelihood-free context is not an exception: much of the existing work is dedicated to developing effective techniques for diagnosing such an issue (Frazier et al., 2017; Lee et al., 2018; Thomas and Corander, 2019).

A central piece of the previous research concerned with misspecification introduced a new method for belief updates based on Bayesian principles, assuming only the existence of a loss function rather than an explicit likelihood, in a bid to avoid model misspecification (Bissiri et al., 2016). The use of discrepancies within a likelihood-free context is clearly reminiscent of the use of a loss function ℓ(θ, y), a concept central to the derivation of Bayes' theorem from decision-theoretic principles. Such reasoning concludes that, in the absence of a true likelihood, an exponentiation of the negative loss with some tempering w should be used for a belief update:

π(θ | y) ∝ exp(−w ℓ(θ, y)) π(θ)    (5)
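The tempered belief update can be illustrated on a parameter grid; a sketch with an illustrative quadratic loss, where a smaller tempering constant yields a flatter update:

```python
import numpy as np

def generalised_posterior(loss, prior, w=1.0):
    """Generalised Bayesian update: prior reweighted by exp(-w * loss),
    normalised over a parameter grid."""
    unnorm = prior * np.exp(-w * loss)
    return unnorm / unnorm.sum()

theta = np.linspace(-1.0, 1.0, 201)
prior = np.ones_like(theta) / theta.size
loss = (theta - 0.3) ** 2            # loss minimised at the "true" value
post_full = generalised_posterior(loss, prior, w=50.0)
post_tempered = generalised_posterior(loss, prior, w=5.0)  # more cautious
```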
Further work also approaches the exponentiated negative loss likelihood within a Safe Bayesian framework to avoid misspecification, offering a principled method to select the learning rate parameter (Grünwald, 2012).
A similar proxy likelihood is used by the Probably Approximately Correct (PAC) Bayes community, which aims to perform inference given only a loss evaluated on the observed data, without assuming an explicit model; there, the exponentiated negative loss belief distribution is derived from different principles. Such an approach has been extended in a thorough and principled way to ABC, providing non-asymptotic guarantees of convergence, even under misspecification (Ridgway, 2017).
4 Bayesian optimization and exponentiated losses
Our work aims to extend existing likelihood-free inference methods to higher-dimensional parameter spaces by building on the BOLFI methodology. This is achieved by using independent GP priors on the discrepancies defined for each of the parameters, and by using an exponentiated loss for likelihood characterisation. We call this modular version of the BOLFI methodology SplitBOLFI.
The additive separation of discrepancies between parameters with an exponentiated loss likelihood corresponds to an assumption that the discrepancy function can be decomposed into a sum of separate discrepancies Δ_j, each informative about the j-th parameter. These in turn are modelled by proxy discrepancies, each solely a function of the corresponding parameter θ_j, resulting in a single GP proxy for Δ with additive mean and covariance structure:

Δ(θ) = Σ_j Δ_j(θ_j)    (6)

m(θ) = Σ_j m_j(θ_j),   k(θ, θ′) = Σ_j k_j(θ_j, θ_j′)    (7)
Additionally, assuming separate models for the discrepancies brings about separate acquisition functions, reducing the high-dimensional acquisition problem to a collection of one-dimensional optimisation problems. It is worth noting that the acquisitions and hyperparameter inference for each GP can be parallelised in a straightforward manner, which gives this approach a clear computational advantage.
Another substantial difference from the previous BOLFI approach is the method for likelihood characterisation given a GP posterior for the discrepancy. Previous versions used the expectation of a KDE with a uniform kernel. This paper uses a KDE with an exponential kernel, which is equivalent to the exponentiated loss likelihood method discussed above, with a tempering constant w:

L(θ) ∝ exp(−w ℓ(θ, y))    (8)
It is possible to substitute the loss ℓ with the discrepancy Δ, up to a multiplicative constant β describing the change in lengthscale between the two quantities, i.e. ℓ = βΔ. It is then necessary to propagate the uncertainty in the GP discrepancy proxy through the exponential link function. The marginal distribution at any θ of the posterior proxy exp(−wβΔ(θ)) will be that of a lognormal distribution, and properties of the lognormal distribution can be used to propagate the uncertainty. Using the median of the marginal lognormal distribution, we obtain the likelihood proxy

L̃(θ) ∝ exp(−wβ μ(θ))    (9)

where μ(θ) is the posterior mean of the surrogate GP discrepancy function.
An implication of using an exponential kernel with an additive discrepancy is the factorisation of the likelihood approximation across parameters. The additive structure present in the discrepancy becomes a multiplicative structure in the likelihood proxy through the use of an exponential mechanism. Consequently the posterior approximations derived from each separate discrepancy component can be interpreted as marginal distributions of a joint distribution, which is not necessarily the case with other posterior characterisation methods.
It is also not necessary to use the same β for each component of the additive discrepancy, so each parameter is assigned its own β_j:

L̃(θ) ∝ ∏_j exp(−w β_j μ_j(θ_j))    (10)
This reduces the problem of characterising the pseudo posterior to a set of onedimensional problems, effectively avoiding highdimensional sampling. We illustrate this separable structure in Figure 1.
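The factorised proxy can be sketched directly from per-parameter posterior mean discrepancies; a minimal sketch (names hypothetical; the per-parameter rescalings default to one here):

```python
import numpy as np

def split_bolfi_proxy(mu_js, w=1.0, beta_js=None):
    """Factorised likelihood proxy: product over per-parameter terms
    exp(-w * beta_j * mu_j), with mu_j the GP posterior mean discrepancy
    for parameter j. Returns the per-parameter factors and their product."""
    mu_js = np.asarray(mu_js, dtype=float)
    if beta_js is None:
        beta_js = np.ones_like(mu_js)
    terms = np.exp(-w * beta_js * mu_js)
    return terms, terms.prod()

terms, joint = split_bolfi_proxy([1.0, 2.0])
```

The additive discrepancy structure becomes multiplicative in the proxy, so each factor can be read as a marginal of the joint proxy.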
The second implication of the modular structure is increased robustness of the inference procedure to misspecification, since the use of an exponential kernel can be derived from principles designed to reduce the probability of an overly confident update under an incomplete generative model, as reviewed in Section 3.
Given a discrepancy proxy function derived from the GP, it is still necessary to specify the parameters w and β_j to derive a corresponding posterior proxy. A value of w = 1 corresponds to a standard Bayesian update with a loss ℓ = βΔ. Smaller values of w can be used in situations where a tempered update is desirable. We use a method similar to that of the original BOLFI approach to establish β_j. The minimum value ε_j of each posterior mean discrepancy μ_j gives an indication of both the lengthscale on which the discrepancy function varies and the degree of misspecification of the model.
Consequently, the use of β_j = 1/ε_j, with ε_j the minimum of the j-th posterior mean discrepancy, has some desirable properties, illustrated in Figure 2.

If the variation in μ_j is small relative to ε_j, the model is probably poorly specified and we should be cautious about doing a full belief update. In this context, the factor 1/ε_j can be interpreted both as providing the appropriate likelihood tempering for a misspecified model and as providing the appropriate rescaling between the discrepancy and the loss function.

If the summary statistics, and hence the discrepancy, are defined on an arbitrary lengthscale, then the factor of 1/ε_j will help to normalize the discrepancy to vary on a range natural to the information present in the data.

The values of ε_j can be calculated in a straightforward manner by minimizing each of the marginal mean discrepancies independently.
In practice, occasionally the GP posterior mean can dip below zero, making 1/ε_j hard to interpret; in these situations, we use the minimum observed discrepancy value instead of the minimum of the posterior mean discrepancy in the above approach. The value of w remains a free parameter to be investigated later, but from our experiments it appears that the problems of lengthscale characterisation and misspecification tempering are both satisfactorily addressed through β_j = 1/ε_j.
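A sketch of the scaling-factor choice described above, assuming β_j = 1/ε_j with the fallback to the minimum observed discrepancy when the posterior mean dips below zero (interface hypothetical):

```python
import numpy as np

def scale_factors(posterior_mean_minima, observed_minima):
    """beta_j = 1 / eps_j, where eps_j is the minimum of the j-th posterior
    mean discrepancy, falling back to the minimum observed discrepancy when
    the GP posterior mean dips below zero."""
    betas = []
    for eps, obs in zip(posterior_mean_minima, observed_minima):
        if eps <= 0.0:
            eps = obs   # fallback: minimum observed discrepancy
        betas.append(1.0 / eps)
    return np.array(betas)

# Second component's posterior mean dips below zero, triggering the fallback.
betas = scale_factors([2.0, -0.3], [2.5, 0.5])
```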
There is a connection between SplitBOLFI and Copula ABC, which uses separate discrepancies for each parameter and for pairwise combinations of parameters. The original formulation of the latter algorithm performed separate rounds of simulation to characterize the discrepancy associated with each parameter; in practice, a large pool of simulations from the prior is reused to characterize each of the separate discrepancies. The use of samples from the prior can be expected to be suboptimal from a computational perspective, so we expect the use of marginal Bayesian optimisations to significantly improve the acquisition efficiency.
5 Examples and results
In this section, we present the performance of the SplitBOLFI method on several high-dimensional simulator models, including comparisons with an established likelihood-free sampling method designed for high dimensions. We begin with a multivariate Gaussian example with an easily accessible analytical posterior distribution, followed by a graphical vector autoregressive (GVAR) model designed for time-series data. Finally, we demonstrate SplitBOLFI with a high-dimensional model designed for capturing competition dynamics between different species/strains of bacteria in a daycare transmission setting, presenting interpretable and biologically sound insights into the data.
The implementation of SplitBOLFI used in the simulations is available in ELFI (Engine for Likelihood-Free Inference), a statistical software package written in Python for likelihood-free inference (Lintusaari et al., 2018). ELFI is a practical and flexible library, providing a general computational framework featuring implementations of widely used inference algorithms, including BOLFI, SMC-ABC and LFIRE (Kokko et al., 2019)
. It is distributed under an open-source BSD-3 license, and is supported from the webpage elfi.ai.

5.1 Multidimensional Gaussian
We start with a simple example of estimating the mean vector of a Gaussian distribution. The data are sampled from a d-dimensional multivariate Gaussian distribution with known covariance matrix equal to the identity matrix. The summary statistics used are the means of each dimension of the simulated data, and since the parameter elements are independent, this is an ideal situation for a factorized inference method like SplitBOLFI.
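A minimal sketch of this toy setup, with per-dimension mean summaries yielding one discrepancy per parameter (simulator interface hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(mean, n=100):
    """Draw n observations from N(mean, I)."""
    return rng.normal(mean, 1.0, size=(n, len(mean)))

def summaries(data):
    """Per-dimension sample means: one summary per mean parameter."""
    return data.mean(axis=0)

def discrepancies(theta, observed_summary, n=100):
    """Separate per-parameter discrepancies for factorised inference."""
    return np.abs(summaries(simulate(theta, n)) - observed_summary)

obs = summaries(simulate(np.zeros(3)))
d = discrepancies(np.zeros(3), obs)
```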
We judge the success of the algorithm by whether the mean and mode of the returned proxy posterior are close to the true generating value of the mean of the Gaussian data and to the true analytical posterior, both of which are accessible in this simple case. We also compare the standard deviation of the posterior proxy to that of the true posterior, and compute the symmetrized Kullback–Leibler divergence between the proxy posterior and the analytical posterior. We use the analytical true posterior to assess strategies for selecting the parameters w and β_j described in Section 4.
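The symmetrized Kullback–Leibler divergence between univariate Gaussians, as used for this comparison, has a closed form; a sketch:

```python
import math

def sym_kl_gaussians(m1, s1, m2, s2):
    """Symmetrised Kullback-Leibler divergence between the univariate
    Gaussians N(m1, s1^2) and N(m2, s2^2)."""
    def kl(ma, sa, mb, sb):
        return math.log(sb / sa) + (sa**2 + (ma - mb)**2) / (2.0 * sb**2) - 0.5
    return kl(m1, s1, m2, s2) + kl(m2, s2, m1, s1)
```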
We also compare to a high-dimensional ABC algorithm similar to Copula ABC. A large pool of simulations from the prior was used to draw samples from the marginal distributions defined by the same discrepancies used by SplitBOLFI. We compare the accuracies of the estimates of the marginal distributions, so the Gaussian copula step of Copula ABC was not used for the pairwise joint distributions between parameters. We aim to use comparable numbers of simulations for the experiments performed with each method, since there is a clear trade-off between the quality and quantity of the accepted samples when performing rejection ABC with a fixed number of simulations. We present results with varying numbers of accepted samples and quantile thresholds to make this trade-off as clear as possible. The total number of simulations necessary for an ABC run, given a predetermined quantile threshold and number of accepted samples, is shown in Table
A.6.

SplitBOLFI was used to estimate posteriors for the multidimensional Gaussian. Matérn kernels were used for the kernel components, with Gamma(2, 2) hyperpriors for the lengthscales and Exp(1) hyperpriors for the kernel variances. A one-dimensional example in Figure 3 presents the GP discrepancy, posterior proxy, and acquisition of samples, demonstrating efficient use of computational resources around the posterior mode. Further, Figure A.1 similarly presents discrepancies, proxies and evaluations for the mean parameters of a 25-dimensional Gaussian model. Inference was successfully performed for higher-dimensional models, not shown visually here due to space considerations.

The average RMSE results and standard deviations of the posterior proxy means and modes are presented in Table 1. The RMSEs are evaluated against the parameter value that generated the observed data under "Generating value", and against the mean of the exact analytical posterior under "Exact posterior mean". Given the relatively small size of the simulated data set, the posterior mean may not be expected to lie close to the true generative value. SplitBOLFI was effective at identifying the true generative mean with a small number of simulations for up to 100-dimensional distributions, as can be seen in Table 1. Comparing against the analytical posterior mean, which is the distribution to which the likelihood proxy is expected to converge with increasing simulations, the RMSE appears to asymptote to zero, suggesting that the likelihood proxy converges in expectation to the analytical posterior.
Copula ABC was also used to perform inference for the high-dimensional Gaussian model. We compare SplitBOLFI to Copula ABC with a similar number of simulations, but also run Copula ABC with a large number of samples and a small threshold as a near-gold-standard for the posterior, conditioned on the summary statistics. It is clear that Copula ABC converges to the same expected value of the parameter, asymptoting with more simulations to a constant RMSE against the true parameter value in Table A.1 and to zero against the analytical posterior mean in Table A.2.
The RMSEs for Copula ABC and SplitBOLFI are visually represented in Figure 4. The characterisation of the posterior mean converges more slowly with the number of simulations for Copula ABC, suggesting that SplitBOLFI makes more efficient use of simulations. The posterior mean is poorly characterized for very small numbers of samples or large quantile thresholds.
We present the standard deviations of the derived samples in Table A.3, showing convergence to the analytical posterior value of 0.1 for small quantiles and large numbers of accepted samples. We see in Table 1 that SplitBOLFI also approaches the true value of the posterior standard deviation from above with an increasing number of acquisitions, although the proxy standard deviation is heavily influenced by the tempering parameters w and β_j.
We addressed the question of how to choose the parameter w for constructing a likelihood proxy by comparing the symmetrized Kullback–Leibler divergence between the constructed likelihood proxy and the true posterior under a uniform prior. We see from Table 2 that a value of w = 1, combined with β_j = 1/ε_j, minimizes the symmetrized Kullback–Leibler divergence within an order of magnitude. This suggests that the use of β_j = 1/ε_j is reasonable to temper the likelihood without a further free parameter. It is possible that a value of w between 1 and 10 is strictly optimal for this specific example, but in the interest of simplicity and general applicability, we use a value of w = 1 for likelihood characterisation in all further experiments.

Table 1:
                 Generating value      Exact posterior mean
Acquisitions     Mean      Mode       Mean       Mode       s.d.
50               0.1036    0.1083     0.02297    0.03923    0.1717
100              0.1016    0.1070     0.01338    0.03461    0.1558
250              0.1003    0.1040     0.008130   0.03139    0.1453
Table 2:
                 w = 0.01   0.1       1         10        100
Acquisitions
50               229.97     40.548    0.4162    1.2625    15.606
100              228.42     35.329    0.2304    1.1349    14.302
250              225.82     31.904    0.1536    1.3034    15.965
5.2 Graphical Vector Autoregression
We progress to a more complex example of a sparsely parametrized graphical vector autoregression (GVAR), a time-series model capturing the correlation structure between multiple variables changing with time (Corander and Villani, 2006; Lütkepohl, 2005). We design a sparse GVAR model in which we simulate a number of variables from a noise parameter σ and a sparse transition matrix with known sparsity structure but unknown parameter values. The transition matrix is designed to have diagonal values of negative one, with off-diagonal interactions assigned such that each variable is coupled to exactly one other variable, each coupling having an absolute value less than one to ensure the stability of the simulation.
The dynamic equations updating the observation x_t over a timestep to x_{t+1} are as follows:

x_{t+1} = x_t + A x_t + ε_t    (11)

ε_t ∼ N(0, σ²I)    (12)
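A sketch of the simulator, assuming the update form x_{t+1} = x_t + A x_t + ε_t with Gaussian noise (interface and the example coupling are hypothetical):

```python
import numpy as np

def simulate_gvar(A, sigma, T, rng):
    """Simulate x_{t+1} = x_t + A @ x_t + eps_t, eps_t ~ N(0, sigma^2 I)."""
    d = A.shape[0]
    x = np.zeros((T, d))
    for t in range(T - 1):
        x[t + 1] = x[t] + A @ x[t] + rng.normal(0.0, sigma, d)
    return x

# A diagonal of -1 cancels the current state; the single off-diagonal entry
# couples variable 1 to variable 0 with a coefficient below one in magnitude.
A = np.array([[-1.0, 0.0],
              [0.5, -1.0]])
x = simulate_gvar(A, sigma=0.1, T=500, rng=np.random.default_rng(0))
```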
The model is parametrized by the components of the transition matrix A and the noise parameter σ. This model has a tractable likelihood, the structure of which motivates the choice of informative summary statistics. The summary statistics used were the vector autocorrelations between the observed variables with a lag of 1. For each of the transfer matrix parameters, the lagged correlation between the corresponding observable dimensions was used, while the summary for the noise parameter is the sum of those used for all of the transition matrix parameters.
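A sketch of the lag-1 summary computation (the normalisation details are an assumption; one matrix entry corresponds to one transition-matrix parameter):

```python
import numpy as np

def lag1_summaries(x):
    """Lag-1 cross-correlation matrix of a (T, d) time series; entry
    (i, j) relates variable j at time t to variable i at time t + 1."""
    xc = x - x.mean(axis=0)
    cov = xc[1:].T @ xc[:-1] / (x.shape[0] - 1)
    sd = x.std(axis=0)
    return cov / np.outer(sd, sd)

s = lag1_summaries(np.random.default_rng(1).normal(size=(200, 3)))
```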
Table 3:
                 True value
Acquisitions     Mean       Mode       s.d.
50               0.07001    0.07694    0.1031
100              0.06156    0.06144    0.09618
250              0.05728    0.05640    0.08901
Mean RMSEs for posterior proxy means and modes with respect to the true generative parameter value, alongside the proxy posterior standard deviations. Results are averaged over 10 random seeds and dimensionalities of 6, 21 and 101, for a Gaussian vector autoregressive model with varying numbers of acquisition points and tempering w = 1.

We performed experiments with 500 time steps and noise σ = 0.1, with the non-zero elements of the transition matrix drawn from a uniform distribution. Examples of simulated time series are shown in Figure 5. Uniform priors were used for the unknown elements of the transition matrix, and a uniform prior was used for the noise term σ. Matérn kernels were used for the kernel components, with Gamma(2, 2) hyperpriors for the lengthscales and Exp(1) hyperpriors for the kernel variances.

Inference was performed over models with different numbers of observed variables, of size 5, 20 and 100, which, including the noise parameter σ, correspond to parameter spaces of dimension 6, 21 and 101, respectively. SplitBOLFI was performed with 50, 100 and 250 acquisitions with ten random seeds, and Copula ABC was performed to generate 1, 2, 5, 10 and 50 samples with quantile thresholds of 0.1, 0.05, 0.02, 0.01, 0.005 and 0.004.
We present RMSEs and posterior standard deviations derived from SplitBOLFI and Copula ABC in Tables 3, A.4 and A.5. The RMSE results are represented visually in Figure 6. We see that SplitBOLFI achieves substantially smaller RMSEs for a given number of simulations compared to Copula ABC. The very high rejection rate simulations of Copula ABC suggest an average true posterior standard deviation of approximately 0.0731, which the SplitBOLFI method also approaches with an increasing number of acquisitions.
For visual reference, we present an example posterior proxy for a GVAR model generated by SplitBOLFI after 250 acquisitions for a 12-dimensional model in Figure A.2. The method clearly converges to the true value of 0.1 for the noise parameter, and similarly exhibits clear convergence for each of the transfer matrix parameters.
5.3 Daycare transmission dynamics competition model
To illustrate the SplitBOLFI method with a challenging high-dimensional simulator model lacking a tractable likelihood, we developed an extension of the pneumococcal competition model introduced in Numminen et al. (2013). Our model is designed to capture the dynamics of bacterial colonization and competition between typical opportunistic bacterial pathogens circulating among young children in daycare centres. Each child can be colonized by a number of strains of bacteria, and these strains compete within each daycare centre to colonize the children, who are assumed to interact with one another and to transmit the bacteria further once they have established a stable colonization. The observations at any given time are encoded as a binary matrix indicating which children are colonized by which pathogens. A vector records the background prevalence of each strain in the population, and the model is parametrized by a small set of scalar rate parameters and a symmetric competition matrix.
The model is defined by a set of transition equations determining the probabilities of a colonization event taking place for a particular strain given the absences and presences of the other strains. The observed data are assumed to have been sampled from an equilibrium state distribution defined by the model parameters. The transition equations are defined as
(13) 
where Φ is the standard normal CDF. One scalar parameter determines the rate of transmission within the daycare population, a second determines the rate of transmission from the general population outside the daycare centre, and a third determines the natural clearance rate of a strain within a host. The competition matrix contains the strain-specific pairwise competition parameters c_{ij}, which suppress the transmission of strain i if strain j is already present in a host. A non-zero value of c_{ij} consequently implies competition between the two particular strains.
The rate parameters can only be estimated relative to one another, hence one of them is fixed to a constant equal to one. Consequently, for a number of observed strains n, the parameter space of the entire model is (n(n−1)/2 + 2)-dimensional.
We followed the work of Numminen et al. (2013), and their carefully constructed summary statistics were used for this model: the Shannon index of diversity of the distribution of observed strains, the number of different strains observed, the prevalence of each strain, the prevalence of multiple colonizations, and the upper triangle of the co-prevalence matrix over pairs of strains, normalized by their individual prevalences, i.e.
ρ_{ij} = (Σ_{d,c} x_{dci} x_{dcj}) / (1 + (Σ_{d,c} x_{dci})(Σ_{d,c} x_{dcj}))    (14)
The sums are calculated over the individual daycare centres and children, and the addition of one in the denominator ensures stability when zero prevalence of a strain is observed. The normalisation should ensure that joint sparsity is not confused with competitiveness. Each normalized co-prevalence matrix element was used as the summary for the corresponding competition parameter, and the sum of the other summary statistics was used for the remaining parameters.
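A sketch of the normalised co-prevalence summaries; the exact placement of the additive stabiliser in the denominator is an assumption consistent with the description above:

```python
import numpy as np

def coprevalence_summaries(x):
    """Normalised co-prevalence for each unordered pair of strains, from a
    binary (children x strains) colonization matrix x. The +1 in the
    denominator guards against unobserved strains."""
    prev = x.sum(axis=0)                 # per-strain prevalence
    co = x.T @ x                         # raw co-prevalence counts
    n_strains = x.shape[1]
    return {(i, j): co[i, j] / (1.0 + prev[i] * prev[j])
            for i in range(n_strains) for j in range(i + 1, n_strains)}

# Three children, three strains: strains 0 and 1 co-occur in the first child.
x = np.array([[1, 1, 0],
              [1, 0, 0],
              [0, 1, 1]])
summ = coprevalence_summaries(x)
```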
The samples from each time point are considered independent draws from the underlying dynamic process and are compared to independently generated equilibrium states from the simulator implementing the transition equations. The observed data contained 20% missing values, resulting from specific children being absent during a given week when a nurse visited the daycare centre. The missing data problem was addressed by making a missing-at-random assumption, such that the summary statistics of the observed data were used for the whole population, which should be a reasonable approximation to the missingness-generating mechanism in the observation process.
The study design for the data collection was previously described in Sá-Leão et al. (2008). Forty-seven children attending a daycare centre in Lisbon, Portugal were sampled monthly during one year. The children occupied three rooms of the same daycare centre, and are here treated as a single population. Nasopharyngeal samples were taken eleven times between February 1998 and February 1999. Nasopharyngeal swabs were obtained and inoculated into agar media in order to select and identify Streptococcus pneumoniae, Streptococcus pyogenes, Staphylococcus aureus, Moraxella catarrhalis and Haemophilus influenzae. S. pneumoniae and H. influenzae were selectively cultured in blood agar with gentamicin and chocolate blood agar containing isovitalex and bacitracin, respectively. S. aureus was selectively cultured in mannitol-salt agar, and S. pyogenes and M. catarrhalis were isolated from trypticase-soy blood agar plates. Pneumococcal identification was based on hemolysis and optochin susceptibility; H. influenzae was identified based on the requirement of X and V factors for growth; S. aureus was identified on the basis of mannitol fermentation and a positive coagulase test; S. pyogenes was identified based on hemolysis and susceptibility to bacitracin; and M. catarrhalis was identified based on colony morphology and a positive tributyrin test. Routinely, a single colony of each species was isolated, cultured and frozen. S. pneumoniae strains were serotyped by the Quellung reaction using commercially available antisera (Statens Serum Institut, Copenhagen, Denmark).
The presence of H. influenzae, S. pneumoniae, M. catarrhalis, S. aureus and S. pyogenes was recorded, with 14 serotypes of S. pneumoniae also identified and recorded when present. Some of the serotypes were observed very infrequently and were discarded from our analysis, leaving the four most common: serotypes 19F, 23F, 6B and 14, which are common in infant populations prior to the introduction of pneumococcal conjugate vaccines. These four most common serotypes of S. pneumoniae were considered as distinct strains in the simulations alongside H. influenzae, M. catarrhalis, S. aureus and S. pyogenes. The strains will be referred to using the letters in Table 4 for brevity.
Label  Strain name                    Label  Strain name
A      H. influenzae                  E      S. pneumoniae, serotype 23F
B      M. catarrhalis                 F      S. pyogenes
C      S. pneumoniae, serotype 19F    G      S. pneumoniae, serotype 6B
D      S. aureus                      H      S. pneumoniae, serotype 14

Table 4: The strains used in the high-dimensional data analysis, with the labels used for reference.
As a result, we performed inference on a model with 28 competition parameters and the two additional model parameters. SplitBOLFI used 100 simulator acquisitions to fit GPs to the discrepancies associated with each parameter and to generate posterior proxies for each of the marginal distributions. Matérn kernels were used for the kernel components, with exp(1) hyperpriors on the kernel variances and with the lengthscales held fixed at a value of 8.
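The per-parameter surrogate fit can be sketched in a few lines. This is a minimal numpy-only illustration under stated simplifications, not the actual SplitBOLFI implementation: a toy quadratic stands in for the simulator discrepancy, the kernel variance is simply fixed rather than given an exp(1) hyperprior, and the acquisitions are uniform rather than chosen by Bayesian optimisation:

```python
import numpy as np

rng = np.random.default_rng(0)

def matern52(x1, x2, var=100.0, ls=8.0):
    """Matern 5/2 kernel; the length-scale is held fixed (the paper fixes it to 8)."""
    r = np.abs(x1[:, None] - x2[None, :]) / ls
    return var * (1 + np.sqrt(5) * r + 5 * r**2 / 3) * np.exp(-np.sqrt(5) * r)

# Toy stand-in for one parameter's discrepancy: a noisy quadratic with its
# minimum at theta = 2 (the real discrepancies come from simulator runs).
theta = rng.uniform(-10, 10, 100)            # 100 acquisitions
disc = (theta - 2.0) ** 2 + rng.normal(0, 0.5, 100)

# GP regression: posterior mean of the discrepancy surface on a grid.
noise = 0.25
K = matern52(theta, theta) + noise * np.eye(100)
grid = np.linspace(-10, 10, 401)
mean = matern52(grid, theta) @ np.linalg.solve(K, disc - disc.mean()) + disc.mean()

print("estimated discrepancy minimiser:", grid[np.argmin(mean)])
```

The minimiser of the fitted mean surface approximates the discrepancy minimum for that parameter, which is the quantity the posterior proxy is built around.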
In Figure 7 we present the proxy likelihoods for the two model parameters and for those competition parameters whose posteriors place most of their mass away from zero; all of these demonstrate convergence to a discrepancy minimum away from zero, with the Bayesian optimisation algorithm clearly drawing acquisitions from near the posterior mode. The competition-parameter posteriors assigning most of their mass close to zero are presented in Figure A.3, demonstrating convergence to a competition parameter value close to zero. The parameters whose corresponding discrepancies do not approach zero have correspondingly more dispersed posterior proxies, demonstrating the misspecification robustness described in Figure 2.
The posteriors that place mass away from a zero value of the competition parameter provide evidence that the corresponding pairs of strains are competitive. Posterior statistics derived from the two model parameters and from the fifteen competition-parameter distributions that put the least mass on zero are presented in Table A.7, which lists the means, modes and standard deviations of the posterior proxies together with the corresponding pairs of strains.
The pairs of strains that SplitBOLFI identifies as competitive form a complete subset of the strains considered: every pairwise competition parameter between M. catarrhalis, S. pyogenes and each of the serotypes of S. pneumoniae is identified as most likely to be non-zero. Our results make sound biological sense in the light of the existing literature on colonization competition between successful pneumococcal strains and between the species M. catarrhalis and S. pyogenes (Dahlblom and Söderström, 2012; Xu et al., 2012; Jourdain et al., 2011; Man et al., 2017; Gjini et al., 2016; Valente et al., 2016). It is also reassuring that the inference has identified a complete subset of strains as being mutually competitive, since such a structure was not encoded a priori into the model parameters; this suggests that the inference method is coherently identifying interactions of interest. This result goes beyond previous analyses, which were restricted to lower-dimensional spaces and hence considered all of the bacterial strains to be equally competitive with one another (Numminen et al., 2013). The extended results here illustrate the potential of the SplitBOLFI approach to infer parameters for models representing complex processes observed in reality.
6 Conclusion
In this work, we have extended the existing BOLFI methodology for likelihood-free inference to high-dimensional parameter spaces through the use of additively defined discrepancies and exponentiated-loss proxy likelihoods. The methodology makes efficient use of simulation resources in high-dimensional spaces, targeting parameter acquisitions local to the marginal modes through Bayesian optimisation, and it is also robust to misspecification through the use of exponentiated-loss likelihoods and tempering in the case of large minimum discrepancies. We demonstrate that the algorithm performs well compared to existing methods in 100-dimensional spaces for Gaussian and GVAR models, representing a very high-dimensional setting for likelihood-free inference.
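The two ingredients above can be sketched concretely. In this toy one-dimensional illustration (the discrepancy function and the tempering rule are illustrative assumptions, not the paper's exact definitions), each dimension of an additive discrepancy yields its own marginal proxy by exponentiating that dimension's loss, and a large minimum discrepancy is tempered so that the proxy disperses rather than concentrating spuriously:

```python
import numpy as np

# With an additively defined discrepancy, each dimension i contributes
# d_i(theta_i) to the total, so each marginal proxy can be formed
# independently from that dimension's loss alone.
def marginal_proxy(grid, d_vals, temper=1.0):
    """Exponentiated-loss proxy, proportional to exp(-d / temper)."""
    log_p = -np.asarray(d_vals) / temper
    p = np.exp(log_p - log_p.max())            # stabilise before normalising
    return p / (p.sum() * (grid[1] - grid[0]))

grid = np.linspace(-5.0, 5.0, 1001)
d = (grid - 1.0) ** 2                          # toy discrepancy, minimum at 1

# A large minimum discrepancy (e.g. under misspecification) is met with a
# larger tempering constant, flattening the proxy around the same minimiser.
sharp = marginal_proxy(grid, d)                # minimum discrepancy near 0
tempered = marginal_proxy(grid, d + 10.0, temper=10.0)
print(grid[np.argmax(sharp)], sharp.max() > tempered.max())
```

Both proxies peak at the same discrepancy minimiser, but the tempered proxy is more dispersed, which is the robustness behaviour described above.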
We note that the fully factorized structure of the posterior is an assumption that may not hold, but the marginal distributions of the parameters should still be well specified. If there is a strong a priori belief that the correlation structure between specific parameters cannot be ignored, then individual low-dimensional interaction terms can be included in the kernel without overly compromising the computational efficiency of the method. Including pairwise terms in the discrepancy also combines the acquisition functions associated with the corresponding parameters. When the same simulations are to be used for inference on different parameters, the number of correlations considered must be limited in order to maintain the independence of the parameter acquisitions for each dimension.
We also stress that the same kernel need not be used for simulation acquisition and for posterior characterization; the choice can depend on the structure anticipated. It is possible to use separate GP priors to make parameter acquisitions optimally for the marginal distributions, and then use the resulting parameter values and discrepancy evaluations to build a likelihood proxy with a more complex correlation structure. The acquisitions will not have been performed optimally for modelling the joint distribution, but we consider drawing simulations optimally for marginal characterisation to be a generally reasonable acquisition strategy.
We also demonstrated the ability of the SplitBOLFI framework to work with real data on a generative model relevant to questions in contemporary epidemiology. We successfully performed inference in a 30-dimensional space (also a large parameter space for likelihood-free methods), identifying novel and biologically consistent competition dynamics between pathogens in a real-world problem.
Acknowledgements
OT, HP and JC are supported by European Research Council grant [742158] (SCARABEE, Scalable inference algorithms for Bayesian evolutionary epidemiology). RSL and HdL were supported by Projects LISBOA-01-0145-FEDER-007660 (Microbiologia Molecular, Estrutural e Celular), funded by FEDER funds through COMPETE2020 - Programa Operacional Competitividade e Internacionalização (POCI), and LISBOA-01-0145-FEDER-016417 (ONEIDA project), co-funded by FEEI - "Fundos Europeus Estruturais e de Investimento" from "Programa Operacional Regional Lisboa 2020".
7 Supplementary Tables
0.1  0.05  0.02  0.01  0.005  0.004  

1  0.5719  0.3498  0.1969  0.1540  0.1458  0.1447  
2  0.3461  0.1991  0.1408  0.1287  0.1205  0.1177  
5  0.1770  0.1280  0.1177  0.1117  0.1078  0.1084  
10  0.1340  0.1156  0.1091  0.1067  0.1046  0.1052  
50  0.1075  0.1043  0.1026  0.1017  0.1019  0.1018 
0.1  0.05  0.02  0.01  0.005  0.004  

1  0.5580  0.3299  0.1712  0.1236  0.1105  0.1110  
2  0.3254  0.1698  0.09587  0.07689  0.07407  0.07206  
5  0.1536  0.08255  0.05593  0.04888  0.04500  0.04404  
10  0.1040  0.05632  0.03987  0.03226  0.03167  0.03084  
50  0.04341  0.02533  0.01638  0.01471  0.01455  0.01493 
0.1  0.05  0.02  0.01  0.005  0.004  

1              
2  0.3049  0.1688  0.1155  0.09399  0.08447  0.08137  
5  0.3312  0.1878  0.1157  0.1016  0.09773  0.09616  
10  0.3148  0.1806  0.1155  0.1034  0.09834  0.09875  
50  0.3072  0.1758  0.1152  0.1029  0.09981  0.09955 
Standard deviations of ABC samples as unbiased estimates of the analytical posterior standard deviation, averaged over 10 random seeds and dimensionalities of 5, 10, 50 and 100, for a multivariate Gaussian model, for varying numbers of accepted samples (rows) and quantile thresholds (columns).

0.1  0.05  0.02  0.01  0.005  0.004  

1  0.1617  0.1325  0.1160  0.1110  0.1139  0.1182  
2  0.1217  0.1009  0.09933  0.09993  0.09918  0.1033  
5  0.09393  0.08979  0.09140  0.09234  0.09185  0.09016  
10  0.08823  0.08775  0.08770  0.08732  0.08726  0.08515  
50  0.08538  0.08392  0.08183  0.08399  0.08370  0.08418 
0.1  0.05  0.02  0.01  0.005  0.004  

1              
2  0.08732  0.06210  0.05751  0.05879  0.05732  0.05756  
5  0.08952  0.07464  0.06821  0.06966  0.06896  0.06833  
10  0.09083  0.07547  0.07031  0.06970  0.07001  0.07106  
50  0.09259  0.07847  0.07362  0.07334  0.07347  0.07310 
0.1  0.05  0.02  0.01  0.005  0.004  

1  10  20  50  100  200  250  
2  20  40  100  200  400  500  
5  50  100  250  500  1000  1250  
10  100  200  500  1000  2000  2500  
50  500  1000  2500  5000  10000  12500 
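The final table above is consistent with the simple budget identity: total simulator runs = accepted samples / quantile threshold. A quick check, assuming that reading of the unlabeled rows (accepted samples, in the table's units) and columns (quantile thresholds):

```python
# Assumed reading of the unlabeled table: rows are accepted-sample counts,
# columns are quantile thresholds; each entry is the implied total number
# of simulator runs, accepted / threshold.
thresholds = [0.1, 0.05, 0.02, 0.01, 0.005, 0.004]
accepted = [1, 2, 5, 10, 50]

budget = [[round(n / q) for q in thresholds] for n in accepted]
print(budget[0])   # reproduces the table's first row
```

Under this reading, every entry of the table is recovered exactly, e.g. the first row is 10, 20, 50, 100, 200, 250.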
mean  mode  sd  strains 

5.3644  5.51  2.6835  
6.7477  7.81  2.3101  
2.5580  3.000  0.3899  G, H 
2.3527  3.000  0.5885  C, G 
2.3031  2.463  0.4479  E, F 
2.2450  3.000  0.5194  E, G 
2.2158  2.796  0.5357  E, H 
2.1243  2.112  0.4182  F, H 
2.0545  1.950  0.4451  F, G 
1.9856  1.755  0.5720  B, F 
1.9669  1.827  0.6086  B, C 
1.6974  2.745  0.7928  C, F 
1.6774  1.515  0.7175  B, H 
1.6634  1.260  0.7514  B, G 
1.6528  1.236  0.7124  B, E 
1.5215  1.404  0.8089  C, H 
1.5053  1.776  0.5295  C, E 
8 Supplementary Figures
References
Adaptive approximate Bayesian computation. Biometrika 96 (4), pp. 983–990.
A general framework for updating belief distributions. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 78 (5), pp. 1103–1130.
Approximate Bayesian computation for astronomical model analysis: a case study in galaxy demographics and morphological transformation at high redshift. Monthly Notices of the Royal Astronomical Society 425 (1), pp. 44–65.
Frequency-dependent selection in vaccine-associated pneumococcal population dynamics. Nature Ecology & Evolution 1 (12), pp. 1950.
A Bayesian approach to modelling graphical vector autoregressions. Journal of Time Series Analysis 27 (1), pp. 141–156.
Approximating likelihood ratios with calibrated discriminative classifiers. arXiv preprint arXiv:1506.02169.
Bacterial interactions in the nasopharynx - effects of host factors in children attending day-care centers. Journal of Infection and Public Health 5 (2), pp. 133–139.
An adaptive sequential Monte Carlo method for approximate Bayesian computation. Statistics and Computing 22 (5), pp. 1009–1020.
Bayesian indirect inference using a parametric auxiliary model. Statistical Science, pp. 72–95.
Estimation of parameters for macroparasite population evolution using approximate Bayesian computation. Biometrics 67 (1), pp. 225–233.
Approximate Bayesian computation via regression density estimation. Stat 2 (1), pp. 34–48.
Model misspecification in ABC: consequences and diagnostics. arXiv preprint arXiv:1708.01974.
How direct competition shapes coexistence and vaccine effects in multi-strain pathogen systems. Journal of Theoretical Biology 388, pp. 50–60.
Generative adversarial networks. arXiv preprint arXiv:1406.2661.
The safe Bayesian. In International Conference on Algorithmic Learning Theory, pp. 169–183.
Bayesian optimization for likelihood-free inference of simulator-based statistical models. The Journal of Machine Learning Research 17 (1), pp. 4256–4302.
Likelihood-free inference via classification. Statistics and Computing 28 (2), pp. 411–425.
Probabilistic numerics and uncertainty in computations. Proc. R. Soc. A 471 (2179), pp. 20150142.
Likelihood-free MCMC with approximate likelihood ratios. arXiv preprint arXiv:1903.04057.
The origins of lactase persistence in Europe. PLoS Computational Biology 5 (8), pp. e1000491.
Efficient acquisition rules for model-based approximate Bayesian computation. Bayesian Analysis 14 (2), pp. 595–622.
A Bayesian model of acquisition and clearance of bacterial colonization incorporating within-host variation. PLoS Computational Biology 15 (4), pp. e1006534.
Differences in nasopharyngeal bacterial carriage in preschool children from different socio-economic origins. Clinical Microbiology and Infection 17 (6), pp. 907–914.
High dimensional Bayesian optimisation and bandits via additive models. In International Conference on Machine Learning, pp. 295–304.
PYLFIRE: Python implementation of likelihood-free inference by ratio estimation. Wellcome Open Research 4 (197), pp. 197.
Calibration procedures for approximate Bayesian credible sets. arXiv preprint arXiv:1810.06433.
Extending approximate Bayesian computation methods to high dimensions via a Gaussian copula model. Computational Statistics & Data Analysis 106, pp. 77–89.
Fundamentals and recent developments in approximate Bayesian computation. Systematic Biology 66 (1), pp. e66–e82.
ELFI: Engine for Likelihood-Free Inference. The Journal of Machine Learning Research 19 (1), pp. 643–649.
New Introduction to Multiple Time Series Analysis. Springer Science & Business Media.
The microbiota of the respiratory tract: gatekeeper to respiratory health. Nature Reviews Microbiology 15 (5), pp. 259.
Approximate Bayesian computational methods. Statistics and Computing 22 (6), pp. 1167–1180.
Learning in implicit generative models. arXiv preprint arXiv:1610.03483.
Estimating the transmission dynamics of Streptococcus pneumoniae from strain prevalence data. Biometrics 69 (3), pp. 748–757.
Likelihood-free inference in high dimensions with synthetic likelihood. Computational Statistics & Data Analysis 128, pp. 271–291.
Variational Bayes with synthetic likelihood. Statistics and Computing 28 (4), pp. 971–988.
Fast ε-free inference of simulation models with Bayesian conditional density estimation. In Advances in Neural Information Processing Systems, pp. 1028–1036.
Sequential neural likelihood: fast likelihood-free inference with autoregressive flows. arXiv preprint arXiv:1805.07226.
Fast sequential computer model calibration of large non-stationary spatial-temporal processes. Technometrics 55 (2), pp. 232–242.
Bayesian synthetic likelihood. Journal of Computational and Graphical Statistics 27 (1), pp. 1–11.
Probably approximate Bayesian computation: non-asymptotic convergence of ABC under misspecification. arXiv preprint arXiv:1707.05987.
High rates of transmission of and colonization by Streptococcus pneumoniae and Haemophilus influenzae within a day care center revealed in a longitudinal study. Journal of Clinical Microbiology 46 (1), pp. 225–234.
The ABC of reverse engineering biological signalling systems. Molecular BioSystems 5 (12), pp. 1925–1935.
Taking the human out of the loop: a review of Bayesian optimization. Proceedings of the IEEE 104 (1), pp. 148–175.
Pneumococcal quorum sensing drives an asymmetric owner-intruder competitive strategy during carriage via the competence regulon. Nature Microbiology 4 (1), pp. 198.
Machine learning accelerated likelihood-free event reconstruction in dark matter direct detection. Journal of Instrumentation 14 (03), pp. P03004.
Handbook of Approximate Bayesian Computation. Chapman and Hall/CRC.
Multi-task Bayesian optimization. In Advances in Neural Information Processing Systems, pp. 2004–2012.
Diagnosing model misspecification and performing generalized Bayes' updates via probabilistic classifiers. arXiv preprint arXiv:1912.05810.
Likelihood-free inference by ratio estimation. arXiv preprint arXiv:1611.10242.
Impact of the 13-valent pneumococcal conjugate vaccine on Streptococcus pneumoniae multiple serotype carriage. Vaccine 34 (34), pp. 4072–4078.
Statistical inference for noisy nonlinear ecological dynamic systems. Nature 466 (7310), pp. 1102.
Nasopharyngeal bacterial interactions in children. Emerging Infectious Diseases 18 (11), pp. 1738.