1 Introduction
Integrals of expensive-to-evaluate functions arise in many scientific and industrial applications, for example when complex simulations have uncertain inputs and we wish to estimate the expected outcome. This could be a fluid dynamics model, e.g., in meteorology or astrophysics, in which every function query corresponds to running a simulation on a supercomputer. Within a reasonable budget, integration using Monte Carlo may not be feasible, and alternative numerical integration schemes are needed that require fewer simulation runs (i.e., function evaluations). Bayesian quadrature (bq)—a means of constructing posterior measures over the unknown value of the integral [OHagan1991; Diaconis1988; Briol2015b]—mitigates a high sample demand by encoding known or assumed structure of the integrand, such as smoothness or regularity, usually via a Gaussian process (gp) measure. With its increased 'data'^{1} efficiency, bq is a natural choice in a setting where function evaluations are precious [Rasmussen2003]. In the past, bq has been applied in reinforcement learning [Paul2018] and filtering [Kersting2016], and has been extended to encode properties of probabilistic integrals [Osborne2012b; Osborne2012a; Gunter2014; Chai2018].

^{1}See e.g., [Hennig2015] and [Cockayne2017] for a discussion on 'data' in numerical solvers.
A complementary approach to integrating costly functions is to make use of related, cheaper information sources about the integrand of interest. The task of finding approximations to computationally demanding numerical models is an area of active research all on its own (see e.g., [Benner2017]). Examples of information sources of diverse cost and quality include numerical models that are run at different resolutions, computational models that are approximated by an analytical expression, or different kinds of models for the same target. Multi-source modeling is a statistical technique for harvesting information from related functions by constructing correlated surrogates over multiple sources. When the information sources are hierarchical, in that they are ordered from most to least informative, this concept is known as multi-fidelity modeling [Kennedy2000; Peherstorfer2018; Forrester2007; LeGratiet2012]. The notion of multi-source models is more overarching and includes settings in which the sources do not exhibit an easily identifiable order, if any. Each of the sources has its own cost function that quantifies the cost of evaluating the source at a certain input. Integrands with input-dependent cost arise, for example, when querying the integrand involves running a simulation that needs to be refined for certain input values. A linear instance of a multi-source model is a multi-output gp, a.k.a. co-kriging [Alvarez2011a]. bq with multi-output gps to integrate several related functions has been studied by [Xi2018], who impose properties on the data—which they assume given—to prove theoretical guarantees and consistency of the Bayes estimator.
bq leverages active learning schemes similar to Bayesian optimization [Shahriari2016] or experimental design [Atkinson2007; Yu2006]. Through its arg max, an acquisition function identifies optimal future locations at which to query the integrand according to a user-defined metric. Metrics of interest in bq are the information gain on the value of the integral or the predictive variance of the integral. By optimizing the target per cost, active multi-source bq (amsbq) trades off improvement on the target (the integral) against the resources spent. In Bayesian optimization, this setting has been explored by [Poloczek2017].

We summarize our contributions:


- We lay the foundations for active bq in the multi-source setting for the task of integrating an expensive function that comes with cheaper approximations. In particular, we assign cost to function evaluations and generalize vbq acquisition functions to acquisition rates.

- We analyze these acquisition rates and discover that some exhibit sane and others pathological behavior, which can be traced back to the meaning of the acquisition function. We argue that these pathologies were not present in classically used vbq acquisition functions, since most are degenerate in their policy and do not rely on the values of the acquisition function, while cost-adapted rates inevitably do. In other words, all considered multi-source acquisition functions collapse onto a single policy for vbq, as a corner case of amsbq; we discuss the implications.

- We conduct proof-of-concept experiments which show that amsbq improves upon vbq in that it spends less budget to learn the integral to an acceptable precision.
2 Model
We wish to estimate the integral over the information source of interest (the primary source), w.l.o.g. indexed by $l = 1$, integrated against the probability measure $\pi$ on the domain $\mathcal{X}$,

$$F = \int_{\mathcal{X}} f_1(x)\,\mathrm{d}\pi(x), \qquad (1)$$

in the presence of not necessarily ordered or orderable secondary information sources $f_l$, with $l = 2, \dots, L$. Each source comes with an input-dependent cost $c_l(x)$ which must be invested to query $f_l$ at location $x$. For ease of interpretation and numerical stability we set $c_l(x) \leq 1$ and $c_l(x) > 0$. This is equivalent to assuming there exists a $c_{\min} > 0$ and a $c_{\max} < \infty$ s.t. $c_{\min} \leq c_l(x) \leq c_{\max}$, and then normalizing w.r.t. $c_{\max}$, i.e., $c_l(x) \mapsto c_l(x)/c_{\max}$. In other words, no query takes an infinite amount of resources, nor does any evaluation come for free. Normalization is not a requirement and, in practice, neither $c_{\min}$ nor $c_{\max}$ need be known.
In the following sections we review the tools for building a statistical model that allows us to harvest information from both the primary and the secondary sources for learning the integral of Eq. (1), before turning to the decision-theoretic problem of how to actively select locations and sources to query next in Section 3.
2.1 Vanilla Bayesian Quadrature
Let $f : \mathcal{X} \to \mathbb{R}$ be a function and $\pi$ a probability measure on $\mathcal{X}$ such that the integral $F = \int_{\mathcal{X}} f(x)\,\mathrm{d}\pi(x)$ is intractable. In vanilla Bayesian quadrature (vbq), we express our epistemic uncertainty about the value of $F$ through a random variable $\mathsf{F}$. The distribution over $\mathsf{F}$ is obtained by integrating a Gaussian process (gp) prior that is placed over the integrand $f$, i.e., $f \sim \mathcal{GP}(m, k)$, where $m : \mathcal{X} \to \mathbb{R}$ denotes the prior mean function and $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ the covariance function or kernel. Observations come in the form of potentially noisy function evaluations^{2} $y_i = f(x_i) + \varepsilon_i$ with $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$. Let $X$ denote the matrix of input locations and $y$ the set of corresponding observations, summarized in the dataset $\mathcal{D} = (X, y)$ (see [Rasmussen2006] for an introduction to gp inference). By the closure property of gps, the posterior over $\mathsf{F}$ when conditioning on $\mathcal{D}$ is a univariate Gaussian distribution with posterior mean and variance that are integrals over the gp's posterior mean and covariance. These expressions are detailed below for the more general multi-source case, of which vbq is a special case; further derivations can be found in [Briol2015b]. So as not to replace an intractable integral by another intractable integral, the kernel is chosen to be integrable against $\pi$.

^{2}Noise-free evaluations are usually assumed in bq, but that might not hold for a black-box integrand.
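To make the vbq construction concrete, below is a minimal numerical sketch under the assumptions of a zero-mean gp, an rbf kernel, and the uniform measure on [0, 1], for which the kernel mean and the initial error have closed forms; all names and hyperparameter values are illustrative choices of ours, not taken from the paper.

```python
import numpy as np
from scipy.special import erf
from scipy.linalg import cho_factor, cho_solve

# Minimal vbq sketch: zero-mean gp with an rbf kernel and the uniform
# measure on [0, 1]. Hyperparameter values are illustrative.
LAM, SIG2 = 0.2, 1.0   # lengthscale and output scale

def rbf(a, b):
    return SIG2 * np.exp(-0.5 * (a[:, None] - b[None, :])**2 / LAM**2)

def kernel_mean(x):
    # z(x) = \int_0^1 k(x, t) dt, closed form via the error function
    s = np.sqrt(2.0) * LAM
    return SIG2 * LAM * np.sqrt(np.pi / 2.0) * (erf((1.0 - x) / s) + erf(x / s))

def initial_error():
    # \int_0^1 \int_0^1 k(x, t) dx dt, closed form for the rbf kernel
    return SIG2 * (np.sqrt(2.0 * np.pi) * LAM * erf(1.0 / (np.sqrt(2.0) * LAM))
                   + 2.0 * LAM**2 * (np.exp(-0.5 / LAM**2) - 1.0))

def bq_posterior(X, y, noise=1e-10):
    Kc = cho_factor(rbf(X, X) + noise * np.eye(len(X)))
    z = kernel_mean(X)
    mean = z @ cho_solve(Kc, y)                   # E[F | D]
    var = initial_error() - z @ cho_solve(Kc, z)  # V[F | D]
    return mean, var

X = np.array([0.1, 0.4, 0.8])
print(bq_posterior(X, np.sin(2.0 * np.pi * X)))
```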
2.2 Multi-Source Models

We consider linear multi-source models, which can equally be phrased as multi-output Gaussian processes [Alvarez2011a] over the vector-valued function $f : \mathcal{X} \to \mathbb{R}^L$. Nonlinear models for multi-source modeling exist and have been considered by [Perdikaris2017]. They do, however, come with the additional technical difficulty that the model may not be integrable analytically—a sensible prerequisite for bq—and are thus another beast altogether. The notation mimics the single-output case, that is, $f \sim \mathcal{GP}(m, K)$, where $K$ is an $L \times L$ matrix-valued covariance function. More precisely, the covariance between two sources $l$ and $l'$ at inputs $x$ and $x'$ is $[K(x, x')]_{ll'}$. The kernel encodes not only characteristics of the individual sources (e.g., smoothness), but crucially the correlation between them. In the multi-source setting, observations come in source-location-evaluation triplets $(l_i, x_i, y_i)$ with $y_i = f_{l_i}(x_i) + \varepsilon_{l_i}$ and source-dependent observation noise $\varepsilon_l \sim \mathcal{N}(0, \sigma_l^2)$, as usually only one element of $f$ is observed at a time.^{3} The dataset $\mathcal{D}$ contains $N$ data triplets from evaluating elements of $f$ at locations $X$ with corresponding sources $\boldsymbol{\ell}$ and observations $y$. The gp posterior over $f$ has mean and covariance

$$m_{\mathcal{D}}(l, x) = m(l, x) + K\big((l, x), (\boldsymbol{\ell}, X)\big)\, G^{-1} \big(y - m(\boldsymbol{\ell}, X)\big), \qquad (2)$$
$$k_{\mathcal{D}}\big((l, x), (l', x')\big) = K\big((l, x), (l', x')\big) - K\big((l, x), (\boldsymbol{\ell}, X)\big)\, G^{-1} K\big((\boldsymbol{\ell}, X), (l', x')\big),$$

with the kernel Gram matrix $G = K\big((\boldsymbol{\ell}, X), (\boldsymbol{\ell}, X)\big) + \Sigma$ and noise matrix $\Sigma$. A summary of the notation used can be found in Table 1 in the supplementary material (supp. mat.).

^{3}This can be rewritten in the multi-output gp framework in terms of linear projections of $f$ onto the observed space. See Section A in the supp. mat. for details.
2.3 Multi-Source Bayesian Quadrature
The multi-source model of Section 2.2 can be integrated and gives rise to a quadrature rule similar to vbq (cf. Sec. 2.1). Let $\mathsf{F}$ denote the random variable representing the integral of interest of Eq. (1). The posterior over $\mathsf{F}$ given $N$ data triplets is a univariate Gaussian with mean and variance

$$\mathbb{E}[\mathsf{F} \mid \mathcal{D}] = \int_{\mathcal{X}} m_{\mathcal{D}}(1, x)\,\mathrm{d}\pi(x), \qquad \mathbb{V}[\mathsf{F} \mid \mathcal{D}] = \iint_{\mathcal{X}} k_{\mathcal{D}}\big((1, x), (1, x')\big)\,\mathrm{d}\pi(x)\,\mathrm{d}\pi(x'), \qquad (3)$$

in which, upon inserting Eq. (2), the kernel mean $\int_{\mathcal{X}} K\big((1, x), (\boldsymbol{\ell}, X)\big)\,\mathrm{d}\pi(x)$ and the initial error $\iint_{\mathcal{X}} \big[K(x, x')\big]_{11}\,\mathrm{d}\pi(x)\,\mathrm{d}\pi(x')$ appear, both of source 1. Just as in vbq, the kernel is required to be integrable analytically.
We choose an intrinsic coregionalization model (icm) [Alvarez2011a] with kernel

$$\big[K(x, x')\big]_{ll'} = B_{ll'}\, k(x, x'), \qquad (4)$$

where $B$ is a positive definite $L \times L$ matrix. Eq. (4) is a simple extension of a standard kernel to the multi-source case which factorizes the correlation between the sources and the input locations. If $k$ is integrable analytically, $K$ will be, too, and thus retains the favorable property of a bq kernel. A typical choice for $k$ is the squared-exponential, a.k.a. rbf, kernel with no dependence on the sources and a lengthscale shared across them. This model can easily be extended, e.g., to a linear model of coregionalization (lmc) without challenging the integrability of $K$. This would untie the lengthscales between sources, but would also introduce additional, generally unknown, kernel parameters. The simpler icm is also used by [Xi2018] to establish convergence rates for a multi-output bq rule.
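The following sketch combines the icm kernel of Eq. (4) with the multi-source quadrature posterior of Eq. (3): it builds the Gram matrix over source-location pairs and computes the posterior mean and variance of the integral of the primary source, again for an rbf kernel and the uniform measure on [0, 1]. The coregionalization matrix B, the lengthscale, and the toy data are illustrative assumptions, not values from the paper.

```python
import numpy as np
from scipy.special import erf

# Sketch of the icm kernel K_{ll'}(x, x') = B_{ll'} k(x, x') (Eq. (4))
# and the multi-source BQ posterior of Eq. (3) for source 1.
lam = 0.2                                   # shared lengthscale
B = np.array([[1.0, 0.9],
              [0.9, 1.0]])                  # positive definite coregionalization

def k(a, b):
    return np.exp(-0.5 * (a[:, None] - b[None, :])**2 / lam**2)

def icm_gram(l_a, X_a, l_b, X_b):
    # covariance between observation sets (l_a, X_a) and (l_b, X_b)
    return B[np.ix_(l_a, l_b)] * k(X_a, X_b)

def kernel_mean_source1(l_a, X_a):
    # z_i = \int_0^1 K_{1, l_i}(x, x_i) dx, closed form for the rbf kernel
    s = np.sqrt(2.0) * lam
    zx = lam * np.sqrt(np.pi / 2.0) * (erf((1.0 - X_a) / s) + erf(X_a / s))
    return B[0, l_a] * zx

# toy (source, location, observation) triplets; source 0 is primary
l = np.array([0, 0, 1, 1, 1])
X = np.array([0.1, 0.7, 0.2, 0.5, 0.9])
y = np.sin(2 * np.pi * X) * np.where(l == 0, 1.0, 0.8)

G = icm_gram(l, X, l, X) + 1e-10 * np.eye(len(X))   # Gram matrix plus jitter
z = kernel_mean_source1(l, X)                       # kernel mean of source 1
w = np.linalg.solve(G, z)
F_mean = w @ y                                      # E[F | D], Eq. (3)
C0 = B[0, 0] * (np.sqrt(2.0 * np.pi) * lam * erf(1.0 / (np.sqrt(2.0) * lam))
                + 2.0 * lam**2 * (np.exp(-0.5 / lam**2) - 1.0))  # initial error
F_var = C0 - z @ w                                  # V[F | D], Eq. (3)
print(F_mean, F_var)
```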
3 Active Learning
Active learning describes the automation of decision-making about prospective actions to be taken by an algorithm in order to achieve a certain learning objective. A heuristic measure of improvement towards the specified goal (here: learning the value of an integral) is defined through a utility function. It transfers the decision-theoretic problem to an optimization problem, but usually an infeasible one. Therefore, the utility is commonly approximated by an acquisition function. Optimizing the acquisition function induces an acquisition policy that pins down which action to take next. To obtain a sequence of actions, the considered method (here: bq) is placed in a loop where it is iteratively fed with new observations; in general this is a multi-step look-ahead (non-myopic) approach. A myopic approximation is to instead optimize for a single new observation triplet at a time. Besides feasibility, the lack of exact model knowledge motivates a loop in which the model is repeatedly updated with new observations.

In Section 3.1, we recapitulate several utilities that are commonly used for vbq. All these utilities give rise to the same acquisition policy in the absence of cost and are thus not greatly differentiated between in the literature. Intriguingly, the policies do not coincide for amsbq if cost is accounted for in the acquisition functions, as will be shown and discussed in Section 3.2.
3.1 Cost-Insensitive bq Acquisitions
In the absence of any notion of evaluation cost (or if all sources come at the same cost), the utility functions from vbq generalize straightforwardly to the multi-source case. The vbq case can be recovered by setting the number of sources to one.
3.1.1 Mutual Information
From an information-theoretic perspective, new source-location pairs can be chosen such that they jointly maximize the mutual information (mi) between the integral $\mathsf{F}$ and a set of $M$ new but as yet unobserved observations $y^*$ at source-location pairs $(\mathcal{L}^*, X^*)$. In terms of the individual and joint differential entropies of $\mathsf{F}$ and $y^*$, $\mathrm{MI}(\mathsf{F}; y^*) = H(\mathsf{F}) + H(y^*) - H(\mathsf{F}, y^*)$. Sections 2.2 and 2.3 imply that both $\mathsf{F}$ and $y^*$ are normally distributed, and so is their joint. The differential entropy of a multivariate normal distribution with covariance matrix $C$ is $\tfrac{1}{2}\log\det(2\pi e\, C)$. Since there is no explicit dependence on the value of $y^*$, we (sloppily) express the mutual information as a function of the new source-location pairs $(\mathcal{L}^*, X^*)$,

$$\alpha^{\textsc{mi}}(\mathcal{L}^*, X^*) = -\tfrac{1}{2}\log\big(1 - \varrho^2(\mathcal{L}^*, X^*)\big), \qquad (5)$$

where we introduce the scalar correlation

$$\varrho^2(\mathcal{L}^*, X^*) = \frac{\mathrm{cov}(\mathsf{F}, y^*)^\top\, \tilde{K}^{-1}\, \mathrm{cov}(\mathsf{F}, y^*)}{\mathbb{V}[\mathsf{F} \mid \mathcal{D}]}, \qquad (6)$$

with the noise-corrected posterior covariance matrix $\tilde{K}$ of the new observations. In the one-step look-ahead case ($M = 1$),

$$\varrho^2(l^*, x^*) = \frac{\mathrm{cov}(\mathsf{F}, y^*)^2}{\mathbb{V}[\mathsf{F} \mid \mathcal{D}]\,\mathbb{V}[y^*]} \qquad (7)$$

is the squared bivariate correlation between $\mathsf{F}$ and $y^*$.
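As a concrete illustration of Eq. (7), the one-step squared correlation can be computed from the posterior quantities of the sketch in Section 2.3; the helpers icm_gram and kernel_mean_source1 and the arrays l, X, G, z, F_var are the (hypothetical) names introduced there, not the paper's.

```python
import numpy as np

# One-step squared correlation rho^2 of Eq. (7), reusing the helpers
# from the multi-source sketch in Section 2.3.
def rho2(l_star, x_star, noise=1e-10):
    ls, xs = np.array([l_star]), np.array([x_star])
    kx = icm_gram(l, X, ls, xs)                  # cov(training obs, y*)
    Ginv_kx = np.linalg.solve(G, kx)
    cov_Fy = kernel_mean_source1(ls, xs) - z @ Ginv_kx          # cov(F, y*)
    var_y = icm_gram(ls, xs, ls, xs) - kx.T @ Ginv_kx + noise   # V[y*]
    return (cov_Fy**2 / (F_var * var_y)).item()

print(rho2(0, 0.4), rho2(1, 0.4))  # typically the primary source correlates more with F
```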
3.1.2 Variance-Based Acquisitions
Variance-based approaches attempt to select $(\mathcal{L}^*, X^*)$ such that the variance of $\mathsf{F}$ shrinks maximally. Like mi, the integral variance reduction (ivr) normalized by the current integral variance can be written in terms of the correlation as

$$\alpha^{\textsc{ivr}}(\mathcal{L}^*, X^*) = \varrho^2(\mathcal{L}^*, X^*). \qquad (8)$$

Eq. (8) is a monotonic transformation of Eq. (5) and therefore both utility functions share the same global maximizer. In fact, any monotonic transformation of Eq. (6), whether interpretable or not, gives rise to the same acquisition policy. This is because the policy only depends on the locations, not on the value of the utility function's global maximum. Hence, in vbq it is equivalent to consider maximal shrinkage of the variance of the integral, minimization of the integral's standard deviation, or maximal increase of the integral's precision (ip), to name a few—they all lead to the same active learning scheme and have thus not been greatly distinguished between in previous work on active vbq.

3.2 Cost-Sensitive Multi-Information Source Acquisitions
When there is a location- and/or source-dependent cost associated with evaluating the information sources (cf. Section 2), the utility function should trade off the improvement made on the integral against the budget spent on function evaluations.
This is achieved by considering the ratio of a cost-insensitive bq utility and the cost function $c$.
Such a ratio can be interpreted as an acquisition rate and bears the units of the utility function divided by units of cost.
The notion of a rate becomes clearer when considering, for example, the mutual information utility (Eq. (5)) with cost measured in terms of evaluation time: the unit is information per time (e.g., nats per second), i.e., a rate of information gain.
This construction has an important consequence: modification of the vbq utility function (i.e., the numerator), even by a monotonic transformation, changes the maximizer of the cost-adapted acquisition rate and hence also the acquisition policy.
In other words, the degeneracy of bq acquisition functions in terms of the policy they induce in the absence of cost is lifted when evaluation cost is included: firstly, because the arg max of each acquisition is shifted differently by the cost, and, secondly, because acquisition values from different sources are now discriminated against each other.
As will be discussed below, not all monotonic transformations yield a sensible acquisition policy; indeed, some display pathological behavior.
The adapted non-myopic acquisition rates for the bq utilities mutual information (mi, Eq. (5)) and integral variance reduction (ivr, Eq. (8)) are

$$\alpha^{\textsc{mi}}_c(\mathcal{L}^*, X^*) = \frac{-\log\big(1 - \varrho^2(\mathcal{L}^*, X^*)\big)}{c(\mathcal{L}^*, X^*)}, \qquad (9)$$

$$\alpha^{\textsc{ivr}}_c(\mathcal{L}^*, X^*) = \frac{\varrho^2(\mathcal{L}^*, X^*)}{c(\mathcal{L}^*, X^*)}, \qquad (10)$$

where we have dropped the factor $\tfrac{1}{2}$ in mi as an arbitrary scaling factor.
It is evident that these acquisition rates no longer share their maximizer; yet they still induce a meaningful acquisition scheme.
Both mi and ivr have the property of being zero at $\varrho^2 = 0$ and thus never select points that are uncorrelated with the integral $\mathsf{F}$, no matter the cost, e.g., locations that have already been observed exactly (with $\sigma_l = 0$).
Such points do not update the posterior of the integral when conditioned on.
In vbq these locations are the minimizers of all acquisition functions and are thus excluded no matter their value.
This is not ensured for the cost-adapted acquisition rates, and therefore they additionally require the numerator to be zero at $\varrho^2 = 0$.
Hence, not every monotonic transformation of the bq utility produces a sane acquisition policy in the presence of cost.
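A minimal sketch of the two well-behaved rates of Eqs. (9) and (10), together with a myopic grid-search selection step that stands in for the continuous optimization used later; the rho2 helper is the one sketched in Section 3.1.1, and the cost function is an illustrative stand-in for the normalized costs of Section 2.

```python
import numpy as np

# Cost-adapted acquisition rates of Eqs. (9) and (10) in terms of rho^2.
def mi_rate(r2, c):
    return -np.log1p(-r2) / c      # -log(1 - rho^2) / c, Eq. (9)

def ivr_rate(r2, c):
    return r2 / c                   # rho^2 / c, Eq. (10)

def cost(l_star, x_star):
    return 1.0 if l_star == 0 else 0.05   # illustrative: cheap secondary source

# myopic selection: maximize the rate over candidate source-location pairs
candidates = [(ls, xs) for ls in (0, 1) for xs in np.linspace(0.01, 0.99, 99)]
l_new, x_new = max(candidates, key=lambda p: mi_rate(rho2(*p), cost(*p)))
print(l_new, x_new)
```

Note that both rates vanish at $\varrho^2 = 0$, so zero-information points are never selected regardless of how cheap they are.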
Consider for example the valid monotonic transformation given by the negative integral variance, $-\mathbb{V}[\mathsf{F} \mid \mathcal{D}]\,(1 - \varrho^2)$, which is nonzero at $\varrho^2 = 0$.
Maximizing this utility function corresponds to maximizing the negative integral variance, i.e., minimizing the integral variance, which is very commonly done in vbq.
Since $\mathbb{V}[\mathsf{F} \mid \mathcal{D}] > 0$, the resulting rate is negative everywhere and gets larger (takes a smaller negative value) with larger cost.
Hence, when maximized, this acquisition would favor expensive evaluations.
More subtle is the misbehavior of the integral precision (ip), which is positive everywhere and has the desired behavior of favoring low-cost evaluations.
In terms of the squared correlation from Eq. (6) (with simplified notation $\varrho^2$ for convenience), the numerator of the ip acquisition rate can be written as $\big(\mathbb{V}[\mathsf{F} \mid \mathcal{D}]\,(1 - \varrho^2)\big)^{-1}$.
This expression is nonzero at $\varrho^2 = 0$ and therefore does not exclude points of zero correlation when they come sufficiently cheap; in experiments we observe it getting stuck re-evaluating at the location of minimum cost ad infinitum.
We conjecture that this is because ip only encodes an absolute scale of the integral variance but does not quantify any "improvement" on the integral value.
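The pathology can be made concrete with a two-line numerical illustration (our construction, with arbitrary numbers): the ip rate $1/\big(\mathbb{V}[\mathsf{F} \mid \mathcal{D}](1 - \varrho^2)\, c\big)$ rewards a zero-information point purely for being cheap.

```python
# The ip rate does not vanish at rho^2 = 0: an uninformative point still
# scores 1 / (V[F|D] * c), so a cheap enough location always wins
# (our illustration, arbitrary numbers).
def ip_rate(r2, c, var_F=1.0):
    return 1.0 / (var_F * (1.0 - r2) * c)

print(ip_rate(r2=0.0, c=0.01))  # 100.0: zero information, tiny cost
print(ip_rate(r2=0.5, c=1.0))   # 2.0: informative but expensive point loses
```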
Figure 1 illustrates the augmentation of utility functions from vbq with multiple information sources and cost.
Figure 2 displays the behavior of the acquisitions mi, ivr, and ip.
The right plot shows these acquisitions as used in vbq, in terms of the squared correlation (Eq. (6)), in the absence of cost.
All acquisitions are strictly monotonically increasing functions of $\varrho^2$.
Among the sane acquisition rates that are zero at $\varrho^2 = 0$, the differences in the corresponding policies can also be understood from the functional dependence on $\varrho^2$.
mi diverges at perfect correlation, $\varrho^2 = 1$.
Therefore, and since the cost lies in $(0, 1]$, mi will always take a 'perfect step' to learn the integral exactly, i.e., it will always select points with correlation $\varrho^2 = 1$ if such a step is available, no matter the cost.
ivr, however, is finite at $\varrho^2 = 1$ and trades off cost and correlation even if a perfect step with $\varrho^2 = 1$ exists.
These interpretations are reinforced by the left three plots of Figure 2, in which we plot mi, ivr, and ip versus a univariate input for a synthetic choice of cost function and a myopic step ($M = 1$).
In the pure vbq situation, the locations of all their maxima coincide, but as soon as a non-constant cost is applied, the shapes of the acquisition functions become relevant, which discriminates their maximizers and lifts the degeneracy in policies.
mi tends more towards higher correlation than ivr, the maximizer of which moves further towards locations of lower cost.
While mi and ivr act differently, they are both sensible choices for acquisition functions in amsbq.
In fact, for low to mid-range values of $\varrho^2$, where mi is approximately a linear function of $\varrho^2$, they roughly coincide.
The choice of acquisition ultimately depends on the application and the user, who may choose which measures of improvement on the integral and cost to trade off.
4 Experiments
The key practical application for amsbq is solving integrals of expensive-to-evaluate black-box functions that are accompanied by cheaper approximations, potentially in a setting where only a finite budget is available. Typical applications are models of complex nonlinear systems that need to be tackled computationally. With evaluations being precious, the goal is to get a decent estimate of the integral with as little budget as possible, rather than to reach floating-point precision. In the experiments, we focus on the rear vertices of the acquisition cube of Figure 1, i.e., multiple sources with source- and input-dependent or only source-dependent cost, and separate them into two main experiments:


- A synthetic multi-source setting with cost that varies in source and location, for the purpose of exploring and demonstrating the behavior of the acquisition functions derived in Section 3.

- An epidemiological model of the spread of a disease with uncertain input, in which two sources correspond to simulations that differ in cost as well as in the quality of the quantity of interest.
Additionally, we present a bivariate experiment with three sources in the supplementary material, Section D. We take a myopic approach in all scenarios in that we optimize the acquisition for a single source-location pair at a time.
4.1 Multiple Sources, Non-Constant Cost
We initially consider a synthetic two-source setting with univariate input. The cost functions depend on both source and location. The experiment's purpose is to demonstrate our findings from Section 3 and convey intuition about the behavior of the novel acquisition functions. The sources we consider have been suggested by [Forrester2007], with the primary source $f_1(x) = (6x - 2)^2 \sin(12x - 4)$ and the secondary source $f_2(x) = 0.5 f_1(x) + 10(x - 0.5) - 5$ for $x \in [0, 1]$. The cost functions both take the form of a scaled and shifted logistic function such that the cost lies in $(0, 1]$, cf. Figure 8 in the supp. mat. The costs of both sources converge to the same value close to the left boundary of the domain; for larger $x$, the secondary source is two orders of magnitude cheaper than the primary one (a sketch follows below). Figure 3 shows snapshots of three consecutive query decisions taken by the mi multi-source acquisition. The gp model (depicted in the left column) has been initialized with 3 datapoints in the primary and 5 in the secondary source, and only the noise variance was constrained (to a negligible value). The mi acquisition given the current state of the gp is shown on the right; the top left frame is shown for mi, ivr, and ip in Figure 7 in the supp. mat. to emphasize the pathology of ip and to highlight the subtle difference between mi and ivr in practice. The acquisition function is optimized using the L-BFGS-B optimizer in SciPy. We observe that amsbq does not query the secondary source where the source costs are almost identical (see Figure 9 in supp. mat.). This is because the two sources are not perfectly correlated, and evaluating the primary source always conveys more information about $F$ than evaluating the secondary one. The fact that the secondary cost decreases with increasing $x$ is nicely reflected in the increasing height of the maxima of the dashed acquisition function for the secondary source in the top left frame of Figure 3.
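For concreteness, here are the standard Forrester pair and logistic costs of the kind described above; the cost constants are illustrative guesses chosen to reproduce the qualitative behavior (near-equal cost at the left boundary, roughly two orders of magnitude difference for large x), not the values used in the paper.

```python
import numpy as np

# Forrester two-source benchmark and logistic cost functions (a sketch).
def f1(x):                                   # primary source
    return (6.0 * x - 2.0)**2 * np.sin(12.0 * x - 4.0)

def f2(x):                                   # cheaper secondary source
    return 0.5 * f1(x) + 10.0 * (x - 0.5) - 5.0

def logistic_cost(x, lo, hi, center=0.3, steepness=20.0):
    # scaled and shifted logistic, so the cost stays within (0, 1]
    return lo + (hi - lo) / (1.0 + np.exp(steepness * (x - center)))

c1 = lambda x: logistic_cost(x, lo=0.9, hi=1.0)     # primary: always expensive
c2 = lambda x: logistic_cost(x, lo=0.009, hi=1.0)   # secondary: cheap for large x
```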
For assessing the performance of amsbq, we compare against vbq and a percentile estimator (pe) that both operate on the primary source.
The latter is obtained by separating the domain into intervals that contain the same probability mass and summing up the function values at these nodes.
For the uniform integration measure used here, this is equivalent to a right Riemann sum.
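A sketch of the pe baseline under these assumptions (f1 is the primary Forrester source from the sketch above):

```python
import numpy as np
from scipy.stats import uniform

# Percentile estimator: place n nodes at equal-mass quantiles of the
# integration measure and average the function values; for the uniform
# measure on [0, 1] this is exactly a right Riemann sum.
def percentile_estimate(f, measure, n):
    nodes = measure.ppf((np.arange(n) + 1.0) / n)   # equal-mass quantiles
    return np.mean(f(nodes))

print(percentile_estimate(f1, uniform(loc=0.0, scale=1.0), n=20))
```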
We assume that gp inference comes at negligible cost as compared to the function evaluations and thus consider cost to be incurred purely by querying the information sources.
To render the integration problem more difficult, we modify the Forrester functions to vary on a smaller length scale by adding a sinusoidal term and adapting some parameters; the resulting functions are integrated from 0 to 1 against a uniform measure (cf. Figure 4, top left).
To avoid over- or underfitting, we set a conservative gamma prior on the lengthscale, with a mode at a small fraction of the domain, for both vbq and amsbq, and assume zero observation noise.
With six more hyperparameters than vbq,^{4} amsbq is more prone to over-/underfitting, and we further set a prior on the coregionalization matrix $B$ (cf. Section 2.3) with parameters estimated from the initial three data points using empirical Bayes. This is to avoid initial over- or underestimation of the correlation between sources, which would cause the active scheme to select only the secondary or only the primary source, respectively. Compared to the previous experiment, the cost functions are changed to have an interior minimum, but are still composed of a sum of logistic functions and normalized to lie in $(0, 1]$ (Figure 4, top right). The effect of these cost functions on the final state is depicted in Figure 10 in the supp. mat. Furthermore, this setting reveals the pathology of the ip acquisition (cf. Section 3.2), which everlastingly re-evaluates the secondary source at the location of minimal cost. The convergence behavior of the well-behaved acquisition functions mi and ivr is displayed in Figure 4 (bottom) in comparison to vbq and pe. The hyperparameters of the gp are optimized after every newly acquired node, both for vbq and amsbq. Figure 4 shows the superior performance of both amsbq methods in arriving close to the true integral with little budget. The vertical jumps in the amsbq methods occur when the secondary source is evaluated at cheaper cost.

^{4}Due to the construction of the coregionalization matrix (cf. Section 2.3).

4.2 A Simulation of Infections
We now consider multi-source models in which sources come with input-independent cost, a.k.a. multi-fidelity models (bottom rear mi vertex in Figure 1). We choose an epidemiological model in which evaluating the primary source requires running numerous stochastic simulations, while the secondary source solves a system of ordinary differential equations. Epidemiological models deal with simulating the propagation of an infectious disease through a population. The sir model forms the base for many compartmental models and assumes a population of fixed size $N$ where, at any point in time, each individual is in one of three states—susceptible, infected, and recovered (sir)—with sizes $S$, $I$, and $R$ [Kermack1927]. The dynamics are determined by stochastic discrete-time events of individuals changing infection state, for which Poisson processes (i.e., exponentially distributed inter-event times) are commonly assumed (see e.g., [Daley1999]). In the thermodynamic limit where $N$ is large, the average dynamics is governed by a system of odes that does not admit a generic analytic solution. There are two parameters in the sir model: the infection rate $\beta$ and the recovery rate $\gamma$. Model details and experiment setup can be found in Section C (supp. mat.).

For the amsbq experiment, we assume that we know the recovery rate $\gamma$, but we are uncertain about the infection rate $\beta$. We are interested in the expected maximum number of simultaneously infected individuals and the time at which this maximum occurs, which might be relevant for vaccination planning. Querying the primary source for the quantities of interest as a function of $\beta$ requires numerous realizations of a stochastic four-compartment epidemic model (an extension of the sir model) using the Gillespie algorithm [Gillespie1976; Gillespie1977]. For each trajectory, the maximum value and time are computed and then averaged over. In our implementation, each query of the primary source takes considerably longer on a laptop's cpu than the secondary source, which solves the system of odes for a given $\beta$ and computes the maximum value and time from the resulting trajectory. As in previous experiments, we set a gamma prior on the lengthscale and a prior on the coregionalization matrix $B$, and set the noise variance to zero as in Section 4.1. Both vbq and amsbq are given the same initial evaluation of the primary source, and amsbq additionally gets the value of the secondary source at the same location, as well as one more random datum from the secondary source. This is justified since amsbq needs to learn more hyperparameters than vbq and secondary-source evaluations are very cheap. Otherwise, if the initial evaluations of the secondary source were further apart than the prior lengthscale from the location of the initial primary datum, virtually zero correlation would be inferred between the sources, and the primary source would be evaluated until a sampled location roughly coincides with locations where the secondary source has been evaluated.
Figure 5 shows the relative error of the amsbq estimator against normalized cost, as compared to vbq and pe, for the maximum number of infected (left) and the time of the maximum (right). The horizontal dashed line shows the integral of the secondary source, computed with a Monte Carlo estimator. This illustrates that simply using the secondary source for the integral estimate might be computationally cheap, but results in an unknown bias of the estimate. In the left plot, amsbq achieves a good estimate with only one additional evaluation of the primary source, while vbq takes another six evaluations. Again, the vertical jumps for amsbq are caused by evaluations of the cheap secondary source. The initial high confidence on the integral is caused by the choice of prior on the output scale from the initial data, which is located in the tail of the gamma prior. Figure 6 displays the order in which amsbq evaluates the primary and secondary sources.
5 Discussion
The multi-source model presented in Section 2.2 can be extended in various ways to increase its expressiveness: by using a more elaborate kernel (e.g., one lengthscale per source), or by encoding knowledge about the functions to be integrated, e.g., a probabilistic integrand. Other applications might come with the complication that the cost function is unknown a priori and needs to be learned during the active bq loop from measurements of the resources required by the queries. A simple example of this was presented in Section 4.2, where the cost was parameterized by a constant which was estimated from the initial observations. A probabilistic (in contrast to parametrized) model of the cost would induce an acquisition function that is conditioned not only on the uncertain model predictions but also on the uncertain cost predictions. Furthermore, as in other active learning schemes, non-myopic steps that acquire multiple observations at once might be beneficial, especially when the multi-source model is already well known, i.e., when it does not benefit from being refitted after seeing new observations, or when multiple evaluations of sources come at lower cost than evaluating them sequentially. On the experimental side, more elaborate applications of amsbq in areas of active research are reserved for future work.

5.1 Conclusion
We have placed multi-source bq in a loop and thus enabled active learning to infer the integral of a primary source while including information from cheaper secondary sources. We discovered that utilities that yield redundant acquisition policies in vbq give rise to distinct policies, some desirable and others pathological, when evaluation cost is accounted for. Our experiments illustrate that, with the sensible acquisition functions, the amsbq algorithm allocates budget to information retrieval more efficiently than traditional methods do when solving expensive integrals.
Acknowledgements
The authors thank Andrei Paleyes for advice in software-related matters, as well as Philipp Hennig, Motonobu Kanagawa, and Aaron Klein for useful discussions. Alexandra Gessner acknowledges support by the International Max Planck Research School for Intelligent Systems (IMPRS-IS).
Supplementary Material
Active Multi-Information Source Bayesian Quadrature
Appendix A Multi-source models and multi-output GPs
We have seen in Section 2.2 that linear multi-source models can be phrased in terms of multi-output gps. Typically, the goal of multi-output gps is to model a vector-valued function $\bar{f} : \mathcal{X} \to \mathbb{R}^L$, and observations come as a vector $\bar{y}_i = \bar{f}(x_i) + \bar{\varepsilon}_i$, where $\bar{\varepsilon}_i \in \mathbb{R}^L$. In multi-source models, we wish to observe only single elements of $\bar{f}$. These observations can be written as projections of the vector-valued observations,

$$y_i = e_{l_i}^\top \bar{y}_i, \qquad (11)$$

where $e_l$ denotes a vector with a 1 in the $l$-th coordinate and zeros elsewhere. Let $\bar{y}$ denote the vector of stacked vector-valued noisy observations $\bar{y}_1, \dots, \bar{y}_N$. Then the corresponding vector of observations of single elements is

$$y = P \bar{y}, \qquad (12)$$

where $P$ is a sparse $N \times NL$ matrix whose $i$-th row places the unit vector $e_{l_i}^\top$ in the $i$-th block. Note the delicate notational difference between the observations of single elements of $\bar{f}$, $y_i$, and a single evaluation of the vector-valued function, $\bar{y}_i$. The covariance matrix of all $N$ observations is

$$\mathrm{cov}(y, y) = P\,(\bar{K} + \bar{\Sigma})\,P^\top, \qquad (13)$$

where $\bar{K}$ is the $NL \times NL$ Gram matrix of the multi-output gp at the locations $X$ and $\bar{\Sigma}$ the corresponding noise matrix. With the analogous mappings of the prior mean and covariances,

$$m(\boldsymbol{\ell}, X) = P\,\bar{m}(X), \qquad K\big((\boldsymbol{\ell}, X), (\boldsymbol{\ell}, X)\big) = P \bar{K} P^\top, \qquad \Sigma = P \bar{\Sigma} P^\top, \qquad (14)$$

we arrive at the multi-source notation introduced in Section 2. Hence, the notational detour over vector-valued observations is not required, and evaluations of individual sources can be incorporated easily in the multi-source model. From the mappings in Eq. (14) follow the posterior mean and covariance of the multi-source model, Eq. (2).
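The projection construction can be sanity-checked numerically: under the icm kernel of Eq. (4), projecting the full $NL \times NL$ multi-output Gram matrix with $P$ reproduces the Gram matrix assembled directly from the observed source indices. The following sketch uses our own names and toy sizes.

```python
import numpy as np

# Numerical sanity check of the projection view (our construction).
rng = np.random.default_rng(0)
n_loc, n_src = 4, 2
B = np.array([[1.0, 0.9], [0.9, 1.0]])
X = rng.uniform(size=n_loc)
k = np.exp(-0.5 * (X[:, None] - X[None, :])**2 / 0.2**2)
K_full = np.kron(k, B)                 # NL x NL; ordering: (x_1, all sources), ...
l = np.array([0, 1, 1, 0])             # which source is observed per location
P = np.zeros((n_loc, n_loc * n_src))
P[np.arange(n_loc), np.arange(n_loc) * n_src + l] = 1.0  # row i picks e_{l_i}
G_direct = B[np.ix_(l, l)] * k         # multi-source Gram from Eq. (4)
assert np.allclose(P @ K_full @ P.T, G_direct)
```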
Appendix B Additional plots for Section 4.1
Section 4.1 showed two examples to demonstrate the behavior of our derived acquisition functions. All relevant details and cross-references are in the captions.
Appendix C Details for the infections model
C.1 The SIR model and extensions
When the population size $N$ is large, the sir (susceptible, infected, recovered) model can be described by the following system of ordinary differential equations,

$$\dot{S} = -\beta \frac{SI}{N}, \qquad \dot{I} = \beta \frac{SI}{N} - \gamma I, \qquad \dot{R} = \gamma I, \qquad (15)$$

in which $\beta$ is the rate of infection and $\gamma$ the rate of recovery. It is the most basic of a series of compartmental epidemiological models. Various extensions exist to accommodate additional effects, e.g., vital dynamics, immunity, or incubation time (cf. e.g., [Hethcote2000]). Some of these extensions serve as a general model refinement; others are relevant to specific diseases.

Statistical properties, however, are not captured by the description through odes and call for a stochastic model. The Gillespie algorithm [Gillespie1976; Gillespie1977] enables discrete and stochastic simulations in which every trajectory is an exact sample from the solution of the 'master equation' that defines a probability distribution over solutions of a stochastic equation. In the sir model, the rate constants are time-independent and thus the underlying process is Markovian, with exponentially distributed inter-event times. Here, an event denotes the transition of one individual from one compartment to another (e.g., from susceptible to infected).
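A minimal Gillespie (stochastic simulation algorithm) sketch for the plain sir model follows; the rate constants and population size are illustrative values, not those of the experiment.

```python
import numpy as np

# Gillespie SSA for the sir model: two event types, infection (S -> I)
# and recovery (I -> R), with exponentially distributed waiting times.
def gillespie_sir(beta, gamma, N=1000, I0=1, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    S, I, t = N - I0, I0, 0.0
    times, infected = [t], [I]
    while I > 0:
        rate_inf = beta * S * I / N
        rate_rec = gamma * I
        total = rate_inf + rate_rec
        t += rng.exponential(1.0 / total)        # time to the next event
        if rng.uniform() < rate_inf / total:     # infection event
            S, I = S - 1, I + 1
        else:                                    # recovery event
            I = I - 1
        times.append(t)
        infected.append(I)
    return np.array(times), np.array(infected)

# primary-source style query: average the peak of I over many trajectories
rng = np.random.default_rng(0)
peaks = [gillespie_sir(2.0, 0.5, rng=rng)[1].max() for _ in range(100)]
print(np.mean(peaks))
```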
C.2 Experimental setup

For the amsbq experiment, we assume that we know the recovery rate $\gamma$, but we are uncertain about the infection rate $\beta$. Therefore, we rescale the odes and place a shifted gamma prior on $\beta$, with shape and scale parameters 5 and 4 respectively, shifted such that the infection rate always exceeds the recovery rate. With this prior we encode our belief that the infection rate is significantly larger than the recovery rate, so that an onset of the epidemic is very likely. Also, we set the population size well below the thermodynamic limit and set one individual to be infected initially. We are interested in the expected maximum number of simultaneously infected individuals and the time at which this maximum occurs, which might be relevant for vaccination planning. Querying the primary source for the quantities of interest as a function of $\beta$ requires numerous realizations of a stochastic four-compartment epidemic model using the Gillespie algorithm [Gillespie1976; Gillespie1977]; in addition to the base model (sir), we include the state 'exposed', in which individuals are infected but not yet infectious. The modified system of odes that also accounts for the assumed known incubation time is

$$\dot{S} = -\beta \frac{SI}{N}, \qquad \dot{E} = \beta \frac{SI}{N} - \sigma E, \qquad \dot{I} = \sigma E - \gamma I, \qquad \dot{R} = \gamma I, \qquad (16)$$

where the incubation rate $\sigma$ is set to a fixed known value. We absorb the prior on $\beta$ in the black-box function for all methods.
Figure 11 shows the sir and seir models (Eq. (15) and (16), respectively) with 10 stochastic trajectories (thin lines). The solid lines indicate the mean of 100 such stochastic realizations, and the dashed lines show the solution of the odes, in both cases for the sir model. We also use the sir model for solving the odes even though the stochastic model simulates the seir model. The purpose of this is to mimic a case where secondary sources are simplified simulations in which minor components are omitted. In the stochastic case, there is not always an outbreak of the disease, i.e., the initially infected individual may recover before infecting someone else. This causes the average to level off significantly below the ode solution. For the integrals, only outbreaks are taken into account. The corresponding integrands for the quantities of interest are shown in Figure 12.
Appendix D Bivariate linear combination of Gaussians
We construct an integrand (primary source) on a 2d domain as a linear combination of normalized Gaussian basis functions,

$$f_1(x) = \sum_{i} w_i\, \mathcal{N}(x; \mu_i, \Sigma_i). \qquad (17)$$

For this, we sample means $\mu_i$ uniformly in the 2d domain and then sample corresponding covariance matrices $\Sigma_i$. The scalar weights $w_i$ are sampled from a standard Gaussian and can be negative. Thus $f_1$ is not a probability density function but rather a linear combination of Gaussians with varying location, shape, and weight. We then construct secondary sources $f_2$ and $f_3$ consecutively by adding uniform noise to the means, and additive uniform noise to the diagonals of the covariance matrices. Thus, with each additional source, each of the means gets randomly but consecutively shifted up and to the right, and the basis functions randomly become wider and flatter. Additionally, we consecutively add Gaussian random noise to the weights, which ensures that the true integrals of the secondary sources differ from the integral of the primary source. All sources are depicted in Figure 13: the primary source on the left, and the secondary sources $f_2$ and $f_3$ in the middle and on the right, respectively. The cost of evaluating the primary source is 1 everywhere; the costs of evaluating $f_2$ and $f_3$ are 0.05 each.

The priors on the kernel lengthscale and coregionalization matrix are set analogously to the other experiments, as described in Section 4.1. amsbq is initialized with one evaluation of the primary source and two evaluations of each of the secondary sources, which amounts to a total initial cost of 1.2 (initial evaluations shown as red dots in Figure 13). Vanilla bq is initialized with three evaluations, which are needed to get an initial guess for its hyperparameters (initial cost = 3). The result is shown in Figure 14, which plots the relative error of the integral estimate versus the budget spent, as well as two standard deviations of the relative error as returned by the model. It is apparent that amsbq finds a good solution faster than vanilla bq.
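To make the construction above concrete, here is a sketch of the source generation; the number of bumps and all noise scales are our choices, not the paper's.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Sketch of the Appendix-D construction: a primary source built as a
# signed mixture of Gaussian bumps, and a secondary source obtained by
# shifting the means up/right, widening the covariances, and perturbing
# the weights.
rng = np.random.default_rng(0)
n_bumps = 10
means = rng.uniform(0.0, 1.0, size=(n_bumps, 2))
covs = [np.diag(rng.uniform(0.005, 0.02, size=2)) for _ in range(n_bumps)]
weights = rng.standard_normal(n_bumps)      # signed, so f1 is not a density

def mixture(x, means, covs, weights):
    return sum(w * multivariate_normal(m, C).pdf(x)
               for w, m, C in zip(weights, means, covs))

f1 = lambda x: mixture(x, means, covs, weights)              # primary source
means_2 = means + rng.uniform(0.0, 0.05, size=means.shape)   # shift up/right
covs_2 = [C + np.diag(rng.uniform(0.0, 0.005, size=2)) for C in covs]
weights_2 = weights + 0.1 * rng.standard_normal(n_bumps)
f2 = lambda x: mixture(x, means_2, covs_2, weights_2)        # secondary source
print(f1([0.5, 0.5]), f2([0.5, 0.5]))
```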
Figure 15 illustrates the sequence of sources chosen by amsbq. Secondary source $f_2$ is chosen more often than secondary source $f_3$ at the equal evaluation cost of 0.05. This is intuitive since $f_2$, by construction, provides more information about $f_1$ than $f_3$ does, but both secondary sources shrink the budget equally when queried. Figure 15 also reports the fraction of evaluations spent on each source after spending a total budget of 50.
Table 1: Summary of notation.

Variable | Shape | Description
---|---|---
$L$ | scalar | number of sources, indexed by $l$, where $l = 1$ is the primary source
$D$ | scalar | dimension of the input space
$N$ | scalar | number of source-input-evaluation triplets
$M$ | scalar | number of potential source-input-evaluation triplets
$x$ | $D$ | input location
$f_l(x)$ | scalar | evaluation of source $l$ at $x$
$F_l$ | scalar | integral of the $l$-th source
$\pi$ | – | integration measure on $\mathcal{X}$
$\mathcal{X}$ | – | domain that is integrated over, $\mathcal{X} \subseteq \mathbb{R}^D$
$\mathsf{F}$ | scalar | random variable for the integral of interest
$(l, x)$ | – | source-location pair at which $f_l$ is evaluated
$(\boldsymbol{\ell}, X)$ | – | source-location pairs
$f(\boldsymbol{\ell}, X)$ | $N$ | noise-free function evaluations
$y$ | $N$ | noisy function evaluations
$\bar{f}(x)$ | $L$ | simultaneous evaluation of all sources
$\varepsilon$ | $N$ | noise vector
$\Sigma$ | $N \times N$ | diagonal noise matrix with noise per level
$\mathcal{D}$ | – | collected data triplets
$K(x, x')$ | $L \times L$ | matrix-valued covariance matrix
$K\big((l, x), (l', x')\big)$ | scalar | covariance function
$K\big((l, x), (\boldsymbol{\ell}, X)\big)$ | $N$ | vector-valued covariance
$G$ | $N \times N$ | Gram matrix
$m$ | – | gp prior mean for the multi-output gp
$m(\boldsymbol{\ell}, X)$ | $N$ | prior mean evaluated at source-location pairs
$m_{\mathcal{D}}(l, x)$ | scalar | posterior mean at source $l$
$k_{\mathcal{D}}$ | – | posterior covariance of sources
– | scalar | integrated prior mean
– | $N$ | kernel mean of source 1 at source-location pairs
– | scalar | initial error
$B$ | $L \times L$ | coregionalization matrix for the kernel used in the icm
$k$ | – | kernel encoding purely spatial correlation in the icm
$(\mathcal{L}^*, X^*)$ | – | potential new source-location-evaluation triplets
$c_l(x)$ | scalar | cost of evaluating $f_l$ at $x$; $c_l(x) \in (0, 1]$
$c(\mathcal{L}^*, X^*)$ | scalar | cost of the candidate set; in the myopic case denoted $c(l^*, x^*)$
$\varrho^2$ | scalar | scalar correlation function, defined in Eq. (6)
$\alpha$ | scalar | non-myopic acquisition function, in the myopic case $\alpha(l^*, x^*)$