Split-BOLFI for for misspecification-robust likelihood free inference in high dimensions

02/21/2020 ∙ by Owen Thomas, et al. ∙ 13

Likelihood-free inference for simulator-based statistical models has recently grown rapidly from its infancy to a useful tool for practitioners. However, models with more than a very small number of parameters as the target of inference have remained an enigma, in particular for the approximate Bayesian computation (ABC) community. To advance the possibilities for performing likelihood-free inference in high-dimensional parameter spaces, here we introduce an extension of the popular Bayesian optimisation based approach to approximate discrepancy functions in a probabilistic manner which lends itself to an efficient exploration of the parameter space. Our method achieves computational scalability by using separate acquisition procedures for the discrepancies defined for different parameters. These efficient high-dimensional simulation acquisitions are combined with exponentiated loss-likelihoods to provide a misspecification-robust characterisation of the marginal posterior distribution for all model parameters. The method successfully performs computationally efficient inference in a 100-dimensional space on canonical examples and compares favourably to existing Copula-ABC methods. We further illustrate the potential of this approach by fitting a bacterial transmission dynamics model to daycare centre data, which provides biologically coherent results on the strain competition in a 30-dimensional parameter space.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 1

page 2

page 3

page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

From its humble beginnings roughly two decades ago, the approximate Bayesian computation (ABC) inference framework for simulator-based models with an intractable likelihood has witnessed a remarkable development over the recent years and a growing interest from a number of diverse application fields (Secrier et al., 2009; Itan et al., 2009; Drovandi and Pettitt, 2011; Corander et al., 2017; Cameron and Pettitt, 2012; Järvenpää et al., 2019b; Shen et al., 2019; Simola et al., 2019). The standard workhorse of ABC has until recently been the sequential Monte Carlo algorithm, which builds an importance sampler by starting from the prior distribution for model parameters and then successively adapts this proposal distribution towards the approximate posterior distribution from which samples are ultimately sought (Beaumont et al., 2009; Del Moral et al., 2012). For general overviews of the computational and statistical aspects of ABC, see (Marin et al., 2012; Lintusaari et al., 2017). Alongside ABC, other likelihood-free techniques for simulator-based models have continued to thrive, most notably synthetic likelihood (Wood, 2010; Price et al., 2018; Ong et al., 2018b) and indirect inference (Drovandi et al., 2015).

A related but distinct approach is to use classifiers in the statistical inference procedure. Classifier ABC initially used classification accuracy as a discrepancy measure for rejection sampling

(Gutmann et al., 2018), and subsequent work has used classifiers to approximate ratios for frequentist likelihood testing (Cranmer et al., 2015), and both local and global approximations to a Bayesian posterior (Thomas et al., 2016; Hermans et al., 2019; Kokko et al., 2019)

. Formal use of classifiers for parameter estimation for computer simulation models dates back at least a decade

(Pratola et al., 2013)

. Neural network classifiers provide more expansive models for complex data distributions, whereas logistic regression with regularisation allow for interpretable feature selection from very large numbers of summary statistics

(Thomas et al., 2016). The rationale of training models via the discrimination of generated samples from observed samples has also been used for Generative Adversarial Networks, for which likelihood functions are not otherwise available (Goodfellow et al., 2014; Mohamed and Lakshminarayanan, 2016). The most notable difference between these models and the models considered in the ABC and synthetic likelihood context is that parameters of the former model class are generally not interpretable in the standard statistical sense. There has also been recent interest in the use of neural likelihood methods, which use neural networks to map between observed data and the parameters of distributions asserted over the simulator parameters (Papamakarios and Murray, 2016; Papamakarios et al., 2018).

Inference for simulators with interpretable parameters of a high dimension remains a challenging and open question. Methods developed for low-dimensional models cannot be assumed to scale effectively to higher-dimensional spaces, suggesting that specialized methods are more appropriate for such problems. The state of the field has been recently reviewed in Chapter 3 of Sisson et al. (2018)

. Gaussian Copula ABC defines distinct discrepancy functions for each parameter value, using a copula approach to characterize pairwise joint distributions of the variables and performing rejection sampling from the prior

(Li et al., 2017)

. ABC via regression density estimation asserts a Gaussian mixture model for the likelihood of the summary statistics and uses regression models to capture the relation between the parameters and the parameters of the mixture model, again relying on sampling from the prior

(Fan et al., 2013). High-dimensional inference has also been extended to synthetic likelihood methods, using shrinkage estimation of the covariance matrices of the summary statistics and sparse representation of the variational covariance approximation to project to a lower-dimensional space (Ong et al., 2018a). This approach has been demonstrated to perform successful inference for a large number of parameters and summary statistics, albeit requiring a substantial number of simulations and computation time, as is typical for the family of synthetic likelihood methods.

Here we build upon the Bayesian optimization approach to accelerating likelihood-free inference (Gutmann and Corander, 2016) and introduce a framework by which it can be more robustly applied to high-dimensional parameter spaces. In order to achieve computational scalability, we use separate acquisition procedures for the discrepancies defined for different model parameters. These efficient high-dimensional simulation acquisitions are then combined with exponentiated loss-likelihoods to provide a misspecification-robust characterisation of the marginal posterior distribution for all model parameters.

The remainder of the paper is structured as follows. The following two sections present the previous work that our method builds upon directly. Section 4 explains the methods developed in this paper, while Section 5 presents the performance of the method on both synthetic and real data. The ultimate section concludes with a discussion and proposals of future work.

2 Bayesian optimisation for likelihood-free inference

A key insight to gaining accelerated likelihood-free inference was obtained by replacing standard random draws from an importance distribution by efficient acquisitions delivered by the probabilistic Bayesian optimization (Gutmann and Corander, 2016). An important feature of Bayesian optimization is that is stops suggesting proposals from a poorly representative area in the parameter space as soon as it has gained sufficient level of certainty that the objective function values there are inferior to other areas. The Bayesian optimisation for Likelihood Free Inference (BOLFI) method (Gutmann and Corander, 2016) asserts a Gaussian Process (GP) prior on the discrepancy function , considered as a function of the parameter space , parametrized with mean and covariance functions and

(1)

As with previous ABC methods, estimates of are derived from simulations through a norm of the difference of summary statistics evaluated on the simulated and observed data, i.e.

. The use of a GP prior and the consequent posterior distribution over the discrepancy function learned in an online fashion by running the simulator model sequentially allows for the use of a Bayesian optimisation algorithm for the purpose of efficiently selecting parameter values for the forward simulations. Consequently, BOLFI makes very efficient use of computational resources when the simulations are expensive. After all simulations have been acquired, a kernel density estimate (KDE) is then used to construct a proxy likelihood

from the GP posterior. For a uniform kernel with a specific bandwidth, this is proportional to the probability of the discrepancy being lower than a threshold

often taken to be the minimum discrepancy

(2)

Bayesian optimisation falls within the field of probabilistic numerics, in which probabilistic models are used to effectively solve problems traditionally considered within the realm of numerics, such as optimisation and quadrature (Hennig et al., 2015). For Bayesian optimisation, the posterior properties of a GP model trained on evaluations of an objective function are used to perform efficient acquisition of new evaluation points (Shahriari et al., 2016)

. Such an approach relies on the choice of a heuristic acquisition function, which encodes some probabilistic principle to reduce the uncertainty of the location of the optimum of the function. A commonly used acquisition function is the Lower Confidence Bound (LCB), composed of the mean

and marginal standard deviation

of the GP posterior, and a tradeoff parameter

(3)

However, also several other acquisition functions have been more recently considered for ABC, as the accuracy of the resulting posterior approximation may be influenced by the particular choice (Järvenpää et al., 2019a). The software package ELFI (Lintusaari et al., 2018)

offers users a choice between the LCB and expected integrated variance loss function introduced by

(Järvenpää et al., 2019a). which was found to be particularly accurate, although it may require an elevated computational effort. Variations on the basic Bayesian optimisation algorithm have been developed, including multi-task variations and methods designed for high-dimensional parameter spaces (Swersky et al., 2013). Such high-dimensional variations generally rely on different kinds of sparsity assumptions. (Kandasamy et al., 2015) relies on an additive kernel structure to increase stability in high-dimensions, the rationale stemming from the fact that given the linear property of GPs, this is equivalent to assuming that the objective function is composed of a sum of one-dimensional functions, each associated with each parameter. This also has the effect of separating certain acquisition function into a sum of independent acquisition functions across each dimension, making the acquisition optimisation more tractable as

(4)

3 Misspecification and exponentiated loss likelihoods

Misspecification remains a central problem for Bayesian inference in general and the likelihood-free context is not an exception, where much of the existing work is dedicated to developing effective techniques for diagnosing such an issue

(Frazier et al., 2017; Lee et al., 2018; Thomas and Corander, 2019).

A central piece in the previous research concerned with misspecification has introduced a new method for belief updates based on Bayesian principles, only assuming the existence of a loss function rather than an explicit likelihood, in a bid to avoid model misspecification (Bissiri et al., 2016). The use of discrepancies within a likelihood-free context is clearly reminiscent of the use of a loss function

, a concept central to the derivation of Bayes’ Theorem from decision-theoretic principles. Such reasoning concludes that, in the absence of a true likelihood, an exponentiation of the negative loss with some tempering

should be used for a belief update

(5)

Further work also approaches the exponentiated negative loss likelihood within a Safe Bayesian approach to avoid misspecifcation, offering a principled method to select the learning rate parameter (Grünwald, 2012).

A similar proxy likelihood is used by the Probably Approximately Correct (PAC) Bayes community, which aims to perform inference only given a loss evaluated on the observed data, without assuming an explicit model. The exponentiated negative loss belief distribution is also derived from different principles. Such an approach has been extended in a thorough and principled way to ABC, providing non-asymptotic guarantees of convergence, even under misspecification (Ridgway, 2017).

4 Bayesian optimization and exponentiated losses

Our work aims to extend existing likelihood-free inference to larger -dimensional parameter spaces , by building on the BOLFI methodology. This is achieved by using independent GP priors on the discrepancies defined for each of the parameters, and by using an exponentiated loss for likelihood characterisation. We call this modular version of the BOLFI methodology Split-BOLFI.

The additive separation of discrepancies between parameters with an exponentiated loss likelihood corresponds to an assumption that the discrepancy function can be decomposed into the sum of separate discrepancies , each informative to the th parameter. These in turn are modelled by proxy discrepancies , solely as functions of corresponding parameters , implying in a single GP proxy for with additive mean and covariance structure

(6)
(7)

Additionally, assuming separate models for the discrepancies brings about separate acquisition functions, reducing the -dimensional problem of acquisition to

1-dimensional optimisation problems. It is worthwhile noting that the acquisitions and hyperparameter inference for each GP can be parallelised in a straightforward manner, which brings a clear computational advantage with this approach.

Another substantial difference from the previous BOLFI approach is a different method for likelihood characterisation given a GP posterior for the discrepancy. Previous versions used the expectation of a KDE with a uniform kernel. This paper uses a KDE with an exponential kernel, which is equivalent to the exponentiated loss likelihood method discussed above, with a tempering constant

(8)

It is possible to substitute the loss with the discrepancy , to within a multiplicative constant describing the change in lengthscale between the two quantities, i.e. . It is then necessary to propagate the uncertainty on the GP discrepancy proxy through the exponential link function. The marginal distribution at any evaluation of the posterior proxy

will be that of a log-normal distribution. It is then possible to use properties of the log-normal distribution to successfully propagate the uncertainty. Using the median of the marginal log-normal distribution, we now present a likelihood proxy

(9)

where is the posterior mean of the surrogate GP discrepancy function.

An implication of using an exponential kernel with an additive discrepancy is the factorisation of the likelihood approximation across parameters. The additive structure present in the discrepancy becomes a multiplicative structure in the likelihood proxy through the use of an exponential mechanism. Consequently the posterior approximations derived from each separate discrepancy component can be interpreted as marginal distributions of a joint distribution, which is not necessarily the case with other posterior characterisation methods.

It is also not necessary to use the same for each component of the additive discrepancy, so each parameter is assigned its own

(10)

This reduces the problem of characterising the pseudo posterior to a set of one-dimensional problems, effectively avoiding high-dimensional sampling. We illustrate this separable structure in Figure 1.

(a)
(b)
(c)
(d)
(e)
(f)
Figure 1: Example additive discrepancies in two dimensions bringing about factorisable likelihood through the use of an exponential mechanism.

The second implication of the modular structure is to increase the robustness to misspecification of the inference procedure as the use of an exponential kernel can be derived from principles designed to reduce the probability of an overly confident update with an incomplete generative model, as reviewed in Section 3.

Given a discrepancy proxy function derived by the GP , it is still necessary to specify the parameters and to derive a corresponding posterior proxy. A value of corresponds to a standard Bayesian update with a loss . Smaller values of can be used in situations where a tempered update is desirable. We use a similar method to the original BOLFI method to establish . The minimum values of each mean discrepancy gives an indication of both the lengthscale variation of the discrepancy function varies and the degree of misspecifcation of the model.

Consequently, the use of in place of has some desirable properties, illustrated in Figure 2.

  • If the variation in is small relative to , the model is probably poorly specified and we should be cautious doing a full belief update. In this context, the factor can be interpreted as both providing the appropriate likelihood tempering for a misspecified model, as well as providing the approriate rescaling between the discrepancy and the loss function.

  • If the summary statistics and hence discrepancy are defined with an arbitrary lengthscale, then the factor of will help to normalize the discrepancy to vary on a range natural to the information present in the data.

  • The value of can be calculated in a straightforward manner by minimizing each of the marginal discrepancies independently to find each .

In practice, we use as occasionally the GP posterior mean can dip below zero, making hard to interpret. In these situations, we use the minimum observed discrepancy value instead of the minimum discrepancy posterior value in the above approach. The value of remains a free parameter to be investigated later, but from our experiments it appears that the problems of lengthscale characterisation and misspecification tempering are both satisfactorily addressed through .

(a)
(b)
(c)
(d)
Figure 2: Discrepancies and proxy likelihood for inference for the mean of a univariate Gaussian, with true value of zero. The true discrepancy is transformed according to , to mimic the situation of misspecification. The “misspecified” models with large exhibit heavily tempered proxy likelihoods, while do the models with smaller variation relative to are assigned more moderate tempering.

There is a connection between Split-BOLFI and Copula-ABC, which uses separate discrepancies for each parameter and also pairwise combinations of parameters. The original formulation of the latter algorithm performed separate rounds of simulation to characterize the discrepancy associated with each parameter. In practice, a large pool of simulations from the prior is reused to characterize each of the separate discrepancies. The use of samples from the prior might be expected to be suboptimal from the computational perspective, so we expect the use of marginal Bayesian optimisations to significantly improve the acquisition efficiency.

5 Examples and results

In this section, we present the performance of the Split-BOLFI method on several high-dimensional simulator models, including comparisons with an established likelihood-free sampling method designed for high dimensions. We begin with a multivariate Gaussian example with an easily accessible analytical posterior distribution, followed by a graphical vector autoregressive (GVAR) model designed for time-series data. Finally, we demonstrate Split-BOLFI with a high-dimensional model designed for capturing competition dynamics between different species/strains of bacteria in a daycare transmission setting, presenting interpretable and biologically sound insights into the data.

The implementation of Split-BOLFI used in the simulations is available in ELFI (Engine for Likelihood-Free Inference), a statistical software package coded in python for likelihood-free inference (Lintusaari et al., 2018). ELFI is a practical and flexible library, providing a general computational framework featuring implementations of widely-used inference algorithms, including BOLFI, SMC-ABC and LFIRE (Kokko et al., 2019)

. It is distributed under an open source BSD-3 license, and is supported from the webpage

elfi.ai.

5.1 Multidimensional Gaussian

We start with a simple example for estimating the mean vector of a Gaussian distribution. The data are sampled from a

-dimensional multivariate Gaussian distribution with known covariance matrix equal to the identity matrix. The summary statistics used are the means of each dimension of the simulated data and since the parameter elements are independent, this is an ideal situation for a factorized inference method like Split-BOLFI.

We judge the success of the algorithm by whether the mean and mode of the returned proxy posterior are close to the true generating value of the means of the Gaussian data and the true analytical posterior which are accessible in this simple case. We also assess the standard deviation of the posterior proxy compared to the true posterior, and the symmetrized Kullback-Leibler distance between the proxy posterior and the analytical posterior. We use the analytical true posterior to assess strategies for selecting the parameters and described in Section 4.

We also compare to a high-dimensional ABC algorithm similar to Copula ABC. A large pool of simulations from the prior were used to draw samples from the marginal distributions defined by the same discrepancies used by Split-BOLFI. We are comparing the accuracies of the estimates of the marginal distributions, so the Gaussian copula step of Copula-ABC was not used for the pairwise joint distributions between parameters. We aim to use comparable numbers of simulations with the experiments performed with each method, since there is a clear trade-off between the quality and quantity of the accepted samples when performing rejection ABC with a fixed number of simulations. We present results here with varying numbers of accepted samples and quantile thresholds to make this trade-off as clear as possible. The total number of simulations necessary for an ABC run, given a predetermined quantile threshold and number of accepted samples are shown in Table

A.6.

Split-BOLFI was used to estimate posteriors for the multidimensional Gaussian. Matérn kernels were used for the kernel components, with Gamma(2, 2) hyperpriors for the lengthscales and exp(1) hyperpriors for the kernel variances. A one-dimensional example in Figure

3 presents the GP discrepancy, posterior proxy, and acquisition of samples for , demonstrating efficient use of computational resources around the posterior mode. Further, Figure A.1 similarly presents discrepancies, proxies and evaluations with for mean parameters of a 25-dimensional Gaussian model. Inference was successfully performed for higher-dimensional models, not shown visually here due to space considerations.

The average RMSE results and standard deviations of the posterior proxy means and modes are presented in Table 1. The RMSEs are evaluated against the parameter value that generated the observed data under “Generating value”, and against the mean of the exact analytical posterior under “Exact Posterior mean”. Given the relatively small size of the simulated data set (), the posterior mean may actually not be expected to lie close to the true generative value. Split-BOLFI was effective at identifying the true generative mean with a small number of simulations for up to 100-dimensional distributions, as can be seen in Table 1. Comparing against the analytical posterior mean, which is the distribution to which the likelihood proxy would be expected to converge with increasing simulations, the RMSE appears to asymptote to zero, suggesting that the likelihood proxy is converging in expectation to the analytical posterior.

Copula-ABC was also used to perform inference for high-dimensional Gaussian model. We compare Split-BOFLI to Copula-ABC with a similar number of simulations, but also use Copula-ABC using a large number of samples and a small threshold as a near-gold-standard for the posterior, conditioned on the summary statistics. It is clear that Copula-ABC converges to the same expected value of the parameter, asymptoting with more simulations to a constant RMSE against the true parameter value in Table A.1 and to zero for the analytical posterior mean in Table A.2.

The RMSEs for Copula-ABC and Split-BOLFI are visually represented in Figure 4 The characterisation of the posterior mean converges more slowly with the number of simulations for Copula-ABC, suggesting that Split-BOLFI makes more efficient usage of simulations. The posterior mean is poorly characterized for very small numbers of samples or large quantile thresholds.

We present the standard deviation of the derived samples in Table A.3, showing convergence to the analytical posterior value of 0.1 for small quantiles and large number of accepted samples. We see in Table 1 that Split-BOLFI also approaches the true value of the posterior standard deviation from above with increasing number of acquisitions, although the proxy standard deviation was heavily determined by the tempering parameters and .

We addressed the question of how to choose the parameter

for constructing a likelihood proxy by comparing the symmetrized Kullback-Leibler divergence between the constructed likelihood proxy and the true posterior under a uniform prior. We see from Table

2 that a value of , combined with , minimizes the symmetrized Kullback-Leibler divergence within an order of magnitude. This suggests that the use of is reasonable to temper the likelihood without a further parameter . It is possible that a value of between 1 and 10 is strictly optimal for this specific example, but in the interest of simplicity and general applicability, we use a value of for likelihood characterisation for all further experiments.

Generating value Exact Posterior mean
Mean Mode Mean Mode s.d.
50 0.1036 0.1083 0.02297 0.03923 0.1717
100 0.1016 0.1070 0.01338 0.03461 0.1558
250 0.1003 0.1040 0.008130 0.03139 0.1453
Table 1: Mean RMSEs for posterior proxy means and modes with respect to the true generative parameter value and the analytical posterior mean, alongside the proxy posterior standard deviations. Results are averaged over 10 random seeds and dimensionalities of 5, 10, 50 and 100, for a multivariate Gaussian model with acquisition points and tempering =1.
.01 .1 1 10 100
50 229.97 40.548 0.4162 1.2625 15.606
100 228.42 35.329 0.2304 1.1349 14.302
250 225.82 31.904 0.1536 1.3034 15.965
Table 2: Mean proxy posterior symmetrized Kullback Leibler distances with the analytical posterior, averaged over 10 random seeds and dimensions for a multivariate Gaussian model, and with acqusition points and tempering parameter .
Figure 3: GP discrepancies in blue and proxy posteriors in red for increasing numbers of black acqusition points .
(a) RMSE true parameter value
(b) RMSE analytical posterior mean
Figure 4: RMSEs of the ABC with various quantile thresholds and Split-BOLFI evaluated against the true parameter value and the analytical posterior mean, against total number of data sets simulated. Results are averaged over 10 random seeds and dimensionalities of 5, 10, 50 and 100 for a multivariate Gaussian model.

5.2 Graphical Vector Autoregression

We progress to a more complex example of sparsely parametrized graphical vector autoregression (GVAR), a time series model capturing the correlation structure between multiple variables changing with time (Corander and Villani, 2006; Lütkepohl, 2005). We design a sparse GVAR model, such that we simulate a number of variables from a noise parameter and a sparse transition matrix with known sparsity structure but unknown parameter values. The transition matrix is designed to have negative unity diagonal values, and off-diagonal interactions assigned such that each variable is coupled to exactly one other variable, with a transition matrix element absolute value of less than one to ensure the simulation stability.

Figure 5: Example simulations from GVAR models with three and ten observed variables simulated for 50 time points.

The dynamic equations updating the observation over a timestep to are as follows:

(11)
(12)

The model is parametrized by the components of the transition matrix and the noise parameter . This model has a tractable likelihood, the structure of which motivates the choice of some informative summary statistics. The summary statistics used were the vector autocorrelation between the observed variables, with a lag of 1. For each of the transfer matrix parameters, the lagged correlation between the corresponding observable dimension was used, while the summary for the noise parameter is the sum over those used for all of the transition matrix parameters.

True value
Mean Mode s.d.
50 0.07001 0.07694 0.1031
100 0.06156 0.06144 0.09618
250 0.05728 0.05640 0.08901
Table 3:

Mean RMSEs for posterior proxy means and modes with respect to the true generative parameter value, alongside the proxy posterior standard deviations. Results are averaged over 10 random seeds and dimensionalities of 6, 21 and 101, for a Gaussian vector autoregressive model with

acquisition points and tempering =1.
Figure 6: RMSEs of ABC with various quantile thresholds and Split-BOLFI evaluated against the true parameter value, against total number of data sets simulated. Results are averaged over 10 random seeds and dimensionalities of 6, 21 and 101 for a Gaussian Vector Autoregressive model.

We performed experiments with 500 time steps and noise , with non-zero elements of

drawn from a uniform distribution between

. Examples of simulated time-series are shown in Figure 5. Uniform priors in the interval were used for the unknown values of and a uniform prior in the interval was used for the noise term . Matérn kernels were used for the kernel components, with Gamma(2, 2) hyperpriors for the lengthscales and exp(1) hyperpriors for the kernel variances.

Inference was performed over models with different sized number of observed variables, of size 5, 20 and 100, which if we include the noise parameter , correspond to parameter spaces of size 6, 21 and 101, respectively. Split-BOLFI was performed with , and acquisitions with ten random seeds, and Copula-ABC was performed to generate 1, 2, 5, 10 and 50 samples with quantile thresholds of 0.1, 0.05, 0.02, 0.01, 0.004 and 0.005.

We present RMSEs and posterior standard deviations derived from Split-BOLFI and Copula-ABC in Tables 3, A.4 and A.5. The RMSE results are represented visually in Figure 6. We see that Split-BOLFI achieves substantially smaller RMSEs for a given number of simulations compared to Copula-ABC. The very high rejection rate simulations of Copula-ABC suggest an average true posterior standard deviation of approximately 0.0731, which we can see the Split-BOLFI method also approaches with increasing number of acquisitions.

For visual reference, we present an example posterior proxy for a GVAR model generated from Split-BOLFI after 250 acquisitions for a 12-dimensional model in Figure A.2. We see that the method clearly converges to the true value of 0.1 for the noise parameter, and similarly exhibits clear convergence for each of the transfer matrix parameters.

5.3 Daycare transmission dynamics competition model

To illustrate the Split-BOLFI method with a challenging high-dimensional simulator model lacking tractable likelihood, we developed an extension of the pneumococcal competition model introduced in (Numminen et al., 2013). Our model is designed to capture the dynamics of bacterial colonization and competition between typical opportunistic bacterial pathogens circulating among young children in daycare centres. Each child can be colonized by a number of strains of bacteria, and each of these strains may compete within each daycare centre to colonize the children who are assumed to interact with one another and to transmit the bacteria further when they have established a stable colonization. The observations at any given time are encoded as a binary matrix , indicating which children are colonized by which pathogens. The vector indicates the background prevalence of each strain in the population, and the model is parametrized with scalars and a symmetric matrix .

The model is defined by a set of transition equations determining the probabilities of a colonization event taking place for a particular strain given the absences and presences of the other strains. The observed data are assumed to have been sampled from an equilibrium state distribution defined by the model parameters. The transition equations are defined as

(13)

where is the standard normal CDF, determines the rate of transmission within the daycare population, determines the rate of transmission from the general population outside the daycare center, and determines the natural clearance rate of a strain within in a host. is a competition matrix that contains the strain-specific pairwise competition parameters , which suppresses the transmission of a strain if strain is already present in each host. A non-zero value of consequently implies competition between the two particular strains.

The parameters can only be estimated relative to one another, hence the parameter is set to a constant equal to one. Consequently, for a number of observed strains , the parameter space of the entire model is dimensional.

We followed the work by (Numminen et al., 2013) and their carefully constructed summary statistics were used for this model which were the Shannon index of diversity of the distribution of observed strains, the number of different strains observed, the prevalence of each strain, the prevalence of multiple colonizations, and the upper co-prevalence matrix of each of the pairs of strains, normalized by their individual prevalence, i.e.

(14)

The sums are calculated over the individual daycare centres and children, and the addition of ensures stability when zero prevalence of a strain is observed. The normalisation should ensure that joint sparsity is not confused for competitiveness. Each of the normalized co-prevalence matrix elements was used for each competition parameter, and the sum of the other summary statistics was used for the other parameters.

The samples from each time point are considered independent draws from the underlying dynamic process and are compared to independently generated equilibrium states from the simulator implementing the transition equations. The observed data contained 20% missing values resulting from specific children being absent during a given week of a nurse visiting the daycare center. The missing data problem was addressed by making a missing at random assumption, such that the summary statistics of the observed data were used for the whole population, which should be a reasonable approximation to the missingness generating mechanism in the observation process.

The study design for the data collection was previously described in (Sá-Leão et al., 2008). Forty-seven children attending a daycare center in Lisbon, Portugal were sampled monthly during one year. The children occupied three rooms of the same daycare centre, and are here treated as a single population. Nasopharyngeal samples were taken eleven times between February 1998 to February 1999. Nasopharyngeal swabs were obtained and inoculated into agar media in order to select and identify Streptococcus pneumoniae, Streptococcus pyogenes, Staphylococcus aureus, Moraxella catarrhalis and Haemophilus influenzae. S. pneumoniae and H. influenzae were selectively cultured in blood agar with gentamicin and chocolate blood agar containing iso-vitalex and bacitracin, respectively. S. aureus was selectively cultured in mannitol-salt agar and S. pyogenes and M. catarrhalis were isolated from trypticase-soy blood agar plates. Pneumococcal identification was based on -hemolysis and optochin susceptibility; H. influenzae were identified based on the requirement of X and V factors for growth; S. aureus was identified on the basis of mannitol fermentation and positive coagulase test; S. pyogenes was identified based on -hemolysis and susceptibility to bacitracin and M. catarrhalis was identified based on colony morphology and positive tributyrin test. Routinely, a single colony of each species was isolated, cultured and frozen. S. pneumococci strains were serotyped by the Quellung reaction using commercially available antisera (Statens Serum Institut, Copenhagen, Denmark).

The presence of H. influenzae, S. pneumoniae, M. catarrhalis, S. aureus and S. pyogenes were recorded, with 14 serotypes of S. pneumoniae also identified and recorded when present. Some of the serotypes were observed very infrequently and were discarded from our analysis, leading to the four most common being retained. These are the serotypes 19F, 23F, 6B and 14, which are common in pre-pneumococcal conjugate vaccine infant populations. These four most common serotypes of S. pneumoniae were considered as distinct strains in the simulations alongside H. influenzae, M. catarrhalis, S. aureus and S. pyogenes. The strains will be referred to using the letters in Table 4 for brevity.

Label Strain name Label Strain name
A H. influenzae E S. pneumoniae, serotype 23F
B M. catarrhalis F S. pyogenes
C S. pneumoniae, serotype 19F G S. pneumoniae, serotype 6B
D S. aureus H S. pneumoniae, serotype 14
Table 4:

The strains used in the high-dimensional data analysis, with labels used for reference.

As a result, we performed inference on a model with 28 competition parameters and the two parameters and . Split-BOLFI used 100 simulator acquisitions to fit GPs to the discrepancies associated with each parameter and to generate posterior proxies for each of the marginal distributions. Matérn kernels were used for the kernel components, with exp(1) hyperpriors for the kernel variances and lengthscales held fixed to a value of 8.

In Figure 7 we present the proxy likelihoods for , , and competition parameters for which most of the mass is away from zero, all of which demonstrate convergence to a discrepancy minimum away from zero, with the Bayesian optimisation algorithm clearly drawing acquisitions from near the posterior mode. The competition parameters posteriors assigning most of the mass close to zero are presented in Figure A.3, demonstrating convergence to a competition parameter value close to zero. The parameters whose corresponding discrepancies do not approach zero have correspondingly more dispersed posterior proxies, demonstrating the misspecification robustness described in Figure 2.

Figure 7: Marginal discrepancies and proxy posteriors for , , and the competition parameters identified as non-negative from a 30-dimensional daycare model trained on real data.

The posteriors which place mass away from a zero value of the competition parameter provide evidence that the corresponding pairs of strains are competitive. Posterior statistics derived from , , and the fifteen competition parameter distributions that put least mass on zero are presented in Table A.7. All of the means and modes of the posterior proxies, and the corresponding pairs of strains are also presented.

The pairs of strains that Split-BOLFI identifies as competitive form a complete subset of the strains considered, with every pairwise competition parameter between M. catarrhalis, S. pyogenes, and each of the serotypes of S. pneumoniae identified as most likely to be non-zero. Our results make sound biological sense in the light of existing literature on colonization competition between successful pneumococcal strain and between the species M. catarrhalis, S. pyogenes Dahlblom and Söderström (2012); Xu et al. (2012); Jourdain et al. (2011); Man et al. (2017); Gjini et al. (2016); Valente et al. (2016). It is also reassuring that the inference has identified a complete subset of strains as being mutually competitive, as such a structure was not a priori encoded into the model parameters and suggests that the inference method is coherently identifying interactions of interest. This result goes beyond previous analyses that were restricted to lower-dimensional spaces and hence considered all of the bacterial strains to be equally competitive with one another Numminen et al. (2013). The extended results here illustrate the potential of the Split-BOLFI approach to infer parameters for models representing complex processes observed in reality.

6 Conclusion

In this work, we have extended the existing BOLFI methodology for likelihood-free inference to high-dimensional parameter spaces, through the use of additively defined discrepancies and exponentiated loss proxy likelihoods. The methodology exhibits efficient use of simulation resources in high-dimensional spaces, targeting parameter acquisitions local to the marginal modes through the use of Bayesian optimisation, and also contains robustness to misspecification through the use of exponentiated loss likelihoods and tempering in the case of large minimum discrepancies. We demonstrate the ability of the algorithm to work well compared to existing methods in 100-dimensional spaces for Gaussian and GVAR models, representing a very high-dimensional space for likelihood-free inference.

We note that the fully factorized structure of the posterior is an assumption that may not hold, but the marginal distributions of the parameters should still be well specified. If there is strong a priori belief that the correlation structure between any specific parameters cannot be ignored, then individual low-dimensional interaction terms can be included within the kernel without overly compromising the computational efficiency of the method. Including pairwise terms in the discrepancy also combines the acquisition functions associated with the corresponding parameters. When wanting to use the same simulations for inference on different parameters, it is necessary to limit the number of correlations considered in order to maintain the independence of the parameter acquisitions for each dimension.

We also stress that the same kernel does not need to be used for the purposes of simulation acquisition and posterior characterization, depending on the structure anticipated. It is possible to use separate GP priors to make efficient parameter acquisitions optimally for the marginal distributions, then use the resulting parameter values and discrepancy evaluations to build a likelihood proxy with a complex correlation structure. The acquisitions will not have been performed completely optimally for modelling the joint distribution, but we consider drawing simulations optimally for marginal characterisation to be a generally reasonable acquisition strategy.

We also demonstrated the ability of the Split-BOLFI framework to work with real data on a generative model relevant to questions in contemporary epidemiology. We successfully perform inference in a 30-dimensional space (also a large parameter space for likelihood-free methods), to identify novel and biologically consistent competition dynamics between pathogens in a real world problem.

Acknowledgements

OT, HP and JC are supported by European Research Council grant [742158] (SCARABEE, Scalable inference algorithms for Bayesian evolutionary epidemiology). RS-L and HdL were supported Projects LISBOA-01-0145-FEDER-007660 (Microbiologia Molecular, Estrutural e Celular, funded by FEDER funds through COMPETE2020 - Programa Operacional Competitividade e Internacionalização (POCI) and LISBOA-01-0145-FEDER-016417 (ONEIDA project, co-funded by FEEI - ”Fundos Europeus Estruturais e de Investimento” from ”Programa Operacional Regional Lisboa 2020”).

7 Supplementary Tables

0.1 0.05 0.02 0.01 0.005 0.004
1 0.5719 0.3498 0.1969 0.1540 0.1458 0.1447
2 0.3461 0.1991 0.1408 0.1287 0.1205 0.1177
5 0.1770 0.1280 0.1177 0.1117 0.1078 0.1084
10 0.1340 0.1156 0.1091 0.1067 0.1046 0.1052
50 0.1075 0.1043 0.1026 0.1017 0.1019 0.10177955
Table A.1: Mean RMSEs for ABC means, with respect to the true generative parameter value, averaged over 10 random seeds and dimensionalities of 5, 10, 50 and 100, for a multivariate Gaussian model, with accepted samples and quantile threshold .
0.1 0.05 0.02 0.01 0.005 0.004
1 0.5580 0.3299 0.1712 0.1236 0.1105 0.1110
2 0.3254 0.1698 0.09587 0.07689 0.07407 0.07206
5 0.1536 0.08255 0.05593 0.04888 0.04500 0.04404
10 0.1040 0.05632 0.03987 0.03226 0.03167 0.03084
50 0.04341 0.02533 0.01638 0.01471 0.01455 0.01493
Table A.2: Mean RMSEs for ABC means, with respect to the analytical posterior mean, averaged over 10 random seeds and dimensionalities of 5, 10, 50 and 100, for a multivariate Gaussian model, with accepted samples and quantile threshold .
0.1 0.05 0.02 0.01 0.005 0.004
1 - - - - - -
2 0.3049 0.1688 0.1155 0.09399 0.08447 0.08137
5 0.3312 0.1878 0.1157 0.1016 0.09773 0.09616
10 0.3148 0.1806 0.1155 0.1034 0.09834 0.09875
50 0.3072 0.1758 0.1152 0.1029 0.09981 0.09955
Table A.3:

Standard deviations of ABC samples as unbiased estimates of the analytical posterior standard deviation, averaged over 10 random seeds and dimensionalities of 5, 10, 50 and 100, for a multivariate Gaussian model, with

accepted samples and quantile threshold .
0.1 0.05 0.02 0.01 0.005 0.004
1 0.1617 0.1325 0.1160 0.1110 0.1139 0.1182
2 0.1217 0.1009 0.09933 0.09993 0.09918 0.1033
5 0.09393 0.08979 0.09140 0.09234 0.09185 0.09016
10 0.08823 0.08775 0.08770 0.08732 0.08726 0.08515
50 0.08538 0.08392 0.08183 0.08399 0.08370 0.08418
Table A.4: Mean RMSEs for ABC means, with respect to the true generative parameter value, averaged over 10 random seeds and dimensionalities of 6, 21 and 101, for a GVAR model, with accepted samples and quantile threshold .
0.1 0.05 0.02 0.01 0.005 0.004
1 - - - - - -
2 0.08732 0.06210 0.05751 0.05879 0.05732 0.05756
5 0.08952 0.07464 0.06821 0.06966 0.06896 0.06833
10 0.09083 0.07547 0.07031 0.06970 0.07001 0.07106
50 0.09259 0.07847 0.07362 0.07334 0.07347 0.07310
Table A.5: Standard deviations of ABC samples as unbiased estimates of the analytical posterior standard deviation, averaged over 10 random seeds and dimensionalities of 6, 21 and 101, for a GVAR model, with accepted samples and quantile threshold .
0.1 0.05 0.02 0.01 0.005 0.004
1 10 20 50 100 200 250
2 20 40 100 200 400 500
5 50 100 250 500 1000 1250
10 100 200 500 1000 2000 2500
50 500 1000 2500 5000 10000 12500
Table A.6: Total number of simulations used to generate accepted samples with a given quantile threshold, i.e. .
mean mode sd strains
5.3644 5.51 2.6835
6.7477 7.81 2.3101
2.5580 3.000 0.3899 G, H
2.3527 3.000 0.5885 C, G
2.3031 2.463 0.4479 E, F
2.2450 3.000 0.5194 E, G
2.2158 2.796 0.5357 E, H
2.1243 2.112 0.4182 F, H
2.0545 1.950 0.4451 F, G
1.9856 1.755 0.5720 B, F
1.9669 1.827 0.6086 B, C
1.6974 2.745 0.7928 C, F
1.6774 1.515 0.7175 B, H
1.6634 1.260 0.7514 B, G
1.6528 1.236 0.7124 B, E
1.5215 1.404 0.8089 C, H
1.5053 1.776 0.5295 C, E
Table A.7: Means, modes, standard deviations of , , and competition parameters of pairs of strains with posteriors that put least mass on zero, and as such can be considered to be competitive

8 Supplementary Figures

Figure A.1: GP discrepancies in blue and proxy posteriors in red with black acqusition points for each of the mean parameters in a 25-dimensional multivariate Gaussian model.
Figure A.2: Marginal discrepancies and proxy posteriors for a GVAR model with 11 transfer matrix parameters and one noise parameter.
Figure A.3: Marginal discrepancies and proxy posteriors for the competition parameters identified as non-competitive from a 30-dimensional daycare model trained on real data.

References

  • M. A. Beaumont, J. Cornuet, J. Marin, and C. P. Robert (2009) Adaptive approximate bayesian computation. Biometrika 96 (4), pp. 983–990. Cited by: §1.
  • P. G. Bissiri, C. C. Holmes, and S. G. Walker (2016) A general framework for updating belief distributions. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 78 (5), pp. 1103–1130. Cited by: §3.
  • E. Cameron and A. Pettitt (2012) Approximate bayesian computation for astronomical model analysis: a case study in galaxy demographics and morphological transformation at high redshift. Monthly Notices of the Royal Astronomical Society 425 (1), pp. 44–65. Cited by: §1.
  • J. Corander, C. Fraser, M. U. Gutmann, B. Arnold, W. P. Hanage, S. D. Bentley, M. Lipsitch, and N. J. Croucher (2017) Frequency-dependent selection in vaccine-associated pneumococcal population dynamics. Nature ecology & evolution 1 (12), pp. 1950. Cited by: §1.
  • J. Corander and M. Villani (2006) A bayesian approach to modelling graphical vector autoregressions. Journal of Time Series Analysis 27 (1), pp. 141–156. Cited by: §5.2.
  • K. Cranmer, J. Pavez, and G. Louppe (2015) Approximating likelihood ratios with calibrated discriminative classifiers. arXiv preprint arXiv:1506.02169. Cited by: §1.
  • V. Dahlblom and M. Söderström (2012) Bacterial interactions in the nasopharynx–effects of host factors in children attending day-care centers. Journal of infection and public health 5 (2), pp. 133–139. Cited by: §5.3.
  • P. Del Moral, A. Doucet, and A. Jasra (2012) An adaptive sequential monte carlo method for approximate bayesian computation. Statistics and Computing 22 (5), pp. 1009–1020. Cited by: §1.
  • C. C. Drovandi, A. N. Pettitt, and A. Lee (2015) Bayesian indirect inference using a parametric auxiliary model. Statistical Science, pp. 72–95. Cited by: §1.
  • C. C. Drovandi and A. N. Pettitt (2011) Estimation of parameters for macroparasite population evolution using approximate bayesian computation. Biometrics 67 (1), pp. 225–233. Cited by: §1.
  • Y. Fan, D. J. Nott, and S. A. Sisson (2013) Approximate bayesian computation via regression density estimation. Stat 2 (1), pp. 34–48. Cited by: §1.
  • D. T. Frazier, C. P. Robert, and J. Rousseau (2017) Model misspecification in abc: consequences and diagnostics. arXiv preprint arXiv:1708.01974. Cited by: §3.
  • E. Gjini, C. Valente, R. Sa-Leao, and M. G. M. Gomes (2016) How direct competition shapes coexistence and vaccine effects in multi-strain pathogen systems. Journal of Theoretical Biology 388, pp. 50–60. Cited by: §5.3.
  • I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial networks. arXiv preprint arXiv:1406.2661. Cited by: §1.
  • P. Grünwald (2012) The safe bayesian. In International Conference on Algorithmic Learning Theory, pp. 169–183. Cited by: §3.
  • M. U. Gutmann and J. Corander (2016) Bayesian optimization for likelihood-free inference of simulator-based statistical models.

    The Journal of Machine Learning Research

    17 (1), pp. 4256–4302.
    Cited by: §1, §2.
  • M. U. Gutmann, R. Dutta, S. Kaski, and J. Corander (2018) Likelihood-free inference via classification. Statistics and Computing 28 (2), pp. 411–425. Cited by: §1.
  • P. Hennig, M. A. Osborne, and M. Girolami (2015) Probabilistic numerics and uncertainty in computations. Proc. R. Soc. A 471 (2179), pp. 20150142. Cited by: §2.
  • J. Hermans, V. Begy, and G. Louppe (2019) Likelihood-free mcmc with approximate likelihood ratios. arXiv preprint arXiv:1903.04057. Cited by: §1.
  • Y. Itan, A. Powell, M. A. Beaumont, J. Burger, and M. G. Thomas (2009) The origins of lactase persistence in europe. PLoS computational biology 5 (8), pp. e1000491. Cited by: §1.
  • M. Järvenpää, M. U. Gutmann, A. Pleska, A. Vehtari, P. Marttinen, et al. (2019a) Efficient acquisition rules for model-based approximate bayesian computation. Bayesian Analysis 14 (2), pp. 595–622. Cited by: §2.
  • M. Järvenpää, M. R. A. Sater, G. K. Lagoudas, P. C. Blainey, L. G. Miller, J. A. McKinnell, S. S. Huang, Y. H. Grad, and P. Marttinen (2019b) A bayesian model of acquisition and clearance of bacterial colonization incorporating within-host variation. PLoS computational biology 15 (4), pp. e1006534. Cited by: §1.
  • S. Jourdain, P. Smeesters, O. Denis, M. Dramaix, V. Sputael, X. Malaviolle, L. Van Melderen, and A. Vergison (2011) Differences in nasopharyngeal bacterial carriage in preschool children from different socio-economic origins. Clinical microbiology and infection 17 (6), pp. 907–914. Cited by: §5.3.
  • K. Kandasamy, J. Schneider, and B. Póczos (2015) High dimensional bayesian optimisation and bandits via additive models. In International Conference on Machine Learning, pp. 295–304. Cited by: §2.
  • J. Kokko, U. Remes, O. Thomas, H. Pesonen, and J. Corander (2019) PYLFIRE: python implementation of likelihood-free inference by ratio estimation. Wellcome Open Research 4 (197), pp. 197. Cited by: §1, §5.
  • J. E. Lee, G. K. Nicholls, and R. J. Ryder (2018) Calibration procedures for approximate bayesian credible sets. arXiv preprint arXiv:1810.06433. Cited by: §3.
  • J. Li, D. J. Nott, Y. Fan, and S. A. Sisson (2017) Extending approximate bayesian computation methods to high dimensions via a gaussian copula model. Computational Statistics & Data Analysis 106, pp. 77–89. Cited by: §1.
  • J. Lintusaari, M. U. Gutmann, R. Dutta, S. Kaski, and J. Corander (2017) Fundamentals and recent developments in approximate bayesian computation. Systematic biology 66 (1), pp. e66–e82. Cited by: §1.
  • J. Lintusaari, H. Vuollekoski, A. Kangasrääsiö, K. Skytén, M. Järvenpää, P. Marttinen, M. U. Gutmann, A. Vehtari, J. Corander, and S. Kaski (2018) Elfi: engine for likelihood-free inference. The Journal of Machine Learning Research 19 (1), pp. 643–649. Cited by: §2, §5.
  • H. Lütkepohl (2005) New introduction to multiple time series analysis. Springer Science & Business Media. Cited by: §5.2.
  • W. H. Man, W. A. de Steenhuijsen Piters, and D. Bogaert (2017) The microbiota of the respiratory tract: gatekeeper to respiratory health. Nature Reviews Microbiology 15 (5), pp. 259. Cited by: §5.3.
  • J. Marin, P. Pudlo, C. P. Robert, and R. J. Ryder (2012) Approximate bayesian computational methods. Statistics and Computing 22 (6), pp. 1167–1180. Cited by: §1.
  • S. Mohamed and B. Lakshminarayanan (2016) Learning in implicit generative models. arXiv preprint arXiv:1610.03483. Cited by: §1.
  • E. Numminen, L. Cheng, M. Gyllenberg, and J. Corander (2013) Estimating the transmission dynamics of streptococcus pneumoniae from strain prevalence data. Biometrics 69 (3), pp. 748–757. Cited by: §5.3, §5.3, §5.3.
  • V. M. Ong, D. J. Nott, M. Tran, S. A. Sisson, and C. C. Drovandi (2018a) Likelihood-free inference in high dimensions with synthetic likelihood. Computational Statistics & Data Analysis 128, pp. 271–291. Cited by: §1.
  • V. M. Ong, D. J. Nott, M. Tran, S. A. Sisson, and C. C. Drovandi (2018b) Variational bayes with synthetic likelihood. Statistics and Computing 28 (4), pp. 971–988. Cited by: §1.
  • G. Papamakarios and I. Murray (2016) Fast -free inference of simulation models with bayesian conditional density estimation. In Advances in Neural Information Processing Systems, pp. 1028–1036. Cited by: §1.
  • G. Papamakarios, D. C. Sterratt, and I. Murray (2018) Sequential neural likelihood: fast likelihood-free inference with autoregressive flows. arXiv preprint arXiv:1805.07226. Cited by: §1.
  • M. T. Pratola, S. R. Sain, D. Bingham, M. Wiltberger, and E. J. Rigler (2013) Fast sequential computer model calibration of large nonstationary spatial-temporal processes. Technometrics 55 (2), pp. 232–242. Cited by: §1.
  • L. F. Price, C. C. Drovandi, A. Lee, and D. J. Nott (2018) Bayesian synthetic likelihood. Journal of Computational and Graphical Statistics 27 (1), pp. 1–11. Cited by: §1.
  • J. Ridgway (2017) Probably approximate bayesian computation: nonasymptotic convergence of abc under misspecification. arXiv preprint arXiv:1707.05987. Cited by: §3.
  • R. Sá-Leão, S. Nunes, A. Brito-Avô, C. R. Alves, J. A. Carriço, J. Saldanha, J. S. Almeida, I. Santos-Sanches, and H. de Lencastre (2008)

    High rates of transmission of and colonization by streptococcus pneumoniae and haemophilus influenzae within a day care center revealed in a longitudinal study

    .
    Journal of clinical microbiology 46 (1), pp. 225–234. Cited by: §5.3.
  • M. Secrier, T. Toni, and M. P. Stumpf (2009) The abc of reverse engineering biological signalling systems. Molecular BioSystems 5 (12), pp. 1925–1935. Cited by: §1.
  • B. Shahriari, K. Swersky, Z. Wang, R. P. Adams, and N. De Freitas (2016) Taking the human out of the loop: a review of bayesian optimization. Proceedings of the IEEE 104 (1), pp. 148–175. Cited by: §2.
  • P. Shen, J. A. Lees, G. C. W. Bee, S. P. Brown, and J. N. Weiser (2019) Pneumococcal quorum sensing drives an asymmetric owner–intruder competitive strategy during carriage via the competence regulon. Nature microbiology 4 (1), pp. 198. Cited by: §1.
  • U. Simola, B. Pelssers, D. Barge, J. Conrad, and J. Corander (2019) Machine learning accelerated likelihood-free event reconstruction in dark matter direct detection. Journal of Instrumentation 14 (03), pp. P03004. Cited by: §1.
  • S. A. Sisson, Y. Fan, and M. Beaumont (2018) Handbook of approximate bayesian computation. Chapman and Hall/CRC. Cited by: §1.
  • K. Swersky, J. Snoek, and R. P. Adams (2013) Multi-task bayesian optimization. In Advances in neural information processing systems, pp. 2004–2012. Cited by: §2.
  • O. Thomas and J. Corander (2019) Diagnosing model misspecification and performing generalized bayes’ updates via probabilistic classifiers. arXiv preprint arXiv:1912.05810. Cited by: §3.
  • O. Thomas, R. Dutta, J. Corander, S. Kaski, and M. U. Gutmann (2016) Likelihood-free inference by ratio estimation. arXiv preprint arXiv:1611.10242. Cited by: §1.
  • C. Valente, J. Hinds, K. A. Gould, F. R. Pinto, H. de Lencastre, and R. Sá-Leão (2016) Impact of the 13-valent pneumococcal conjugate vaccine on streptococcus pneumoniae multiple serotype carriage. Vaccine 34 (34), pp. 4072–4078. Cited by: §5.3.
  • S. N. Wood (2010) Statistical inference for noisy nonlinear ecological dynamic systems. Nature 466 (7310), pp. 1102. Cited by: §1.
  • Q. Xu, A. Almudervar, J. R. Casey, and M. E. Pichichero (2012) Nasopharyngeal bacterial interactions in children. Emerging infectious diseases 18 (11), pp. 1738. Cited by: §5.3.