The numerical evaluation of integrals is a foundational aspect of mathematics, with impact on diverse areas such as finance, uncertainty quantification, stochastic programming, and many others. Monte Carlo (MC) sampling is arguably the most robust means of estimating such integrals and can be easily applied to arbitrary integration domains and measures. The MC estimate of an integral is unbiased, and its rate of convergence is independent of the number of variables and smoothness of the integrand.
However, when the integrand consists of a function that is expensive to evaluate, for instance a high-fidelity computational simulation, obtaining a MC estimate with even moderate accuracy can be infeasible because the variance of a MC estimator is proportional to the ratio of the variance of the true integral and the number of samples used. Control variate (CV) techniques have a long history of enhancing MC estimates of integrals [18, 20, 19, 14]. They reduce the number of samples to compute an integral by introducing additional functions, which are correlated with the integrand, to reduce the variance of its estimate.
The use of CV methods has recently seen a resurgence, driven by the field of uncertainty quantification (UQ). In UQ, the need to evaluate statistics of computationally expensive high-fidelity models is of paramount importance, and multiple information sources, arising from multiple simulation models, are often available to accelerate the convergence of statistics [22, 11, 27, 2, 6]. Such models can be used to accelerate algorithms for both forward UQ problem and inverse UQ . A set of models can result from simulating different sets of equations (i.e., the multifidelity case of differing model forms) and from varying temporal and spatial discretizations (i.e., the multilevel case of differing numerical resolutions for the same set of equations). Multiple conceptual dimensions can exist within a modeling hierarchy, leading to the generalized notion of multi-index MC methods , although it can be important to distinguish dimensions which define a convergent sequence (i.e., multilevel) from those that do not (e.g., multifidelity). In the most general setting, the model ensemble could include reduced-order models , dimension-reduction or surrogate models  (e.g., active subspace approximations), and even data from physical experiments . Finally, both multi-physics and multi-scale simulations contribute additional combinatorial richness to the modeling ensemble.
Traditional CV methods  require explicit knowledge of the statistics (for instance the expected value) of their approximate information sources, however these estimates are frequently unavailable a priori in the UQ simulation-based context. Consequently, CV methods must be modified to balance the computational cost of evaluating lower fidelity models and the reduction in error they each provide. Existing strategies for using these CV techniques in the UQ context have therefore focused on slightly modified frameworks. For instance, in the case where additional information sources arise from a hierarchy of discretization levels, multilevel and multiindex Monte Carlo (MLMC) [8, 12]
use the additional information sources through the estimation of statistics of discrepancy terms. When the variance of these discrepancy decays, significant variance reduction can be achieved. In more general contexts, heuristic estimators have been developed and analyzed[24, 26]. These estimators use additional samples of low-fidelity information sources to obtain approximations of their missing statistical information. Finally, more flexible multilevel-multifidelity estimators that combine these notions have also been investigated [3, 5, 6].
Within these multilevel, multifidelity, and multilevel-multifidelity UQ strategies, an optimal sample allocation is defined across different model instances for the case where a recursive model dependency structure has been defined a priori. In this paper, we present a framework to leverage control variate techniques in the context where both the control-variate means and the model dependency structure are unknown. We demonstrate how existing algorithms, e.g., multilevel Monte Carlo (MLMC)  and multifidelity Monte Carlo (MFMC) [21, 26], are suboptimal realizations of this framework in that they do not extract all possible value from the correlations among information sources.
There are two mechanisms by which control variates can be used to reduce the variance of a MC estimator: (1) increasing the number of high fidelity evaluations and (2) leveraging the correlations of the low-fidelity models to the greatest extent possible. An optimal way to reduce the variance uses both of these mechanisms to allocate samples amongst all available information sources to reduce the variance per unit cost. In this paper, we provide a framework to exploit the second pathway for variance reduction in a more optimal way. Furthermore, we then embed this framework into an optimal sample allocation scheme. In particular, we extend the control variate theory cited above to the case of unknown control-variate means and make three primary contributions:
New theoretical results demonstrating the sub-optimality (in term of maximum possible variance reduction for a fixed high-fidelity number of evaluations) of MLMC and MFMC.
Within the context of MLMC, we derive a novel estimator that, for fixed resources, is provably more accurate than the standard MLMC estimator.
Within the context of MFMC, we derive a drop-in replacement (“ACV-MF”) that can address this sub-optimality.
A framework for new control variate algorithms that, for estimated control means, converges toward a level of variance reduction that can be orders of magnitude greater than the maximum level of variance reduction achievable by MLMC and MFMC.
Initial sample allocation strategies for optimal variance reduction given a fixed computational cost.
A significant gap can exist, of orders of magnitude in some cases, between the variance reduction achievable by current schemes and our generalized schemes. The initial sample allocation strategies we present for exploiting this gap are most effective when the number of high-fidelity samples that can be generated is fixed or the cost of the high fidelity model is significantly more expensive than the low fidelity models. This situation occurs in practice when the high fidelity model is expensive to run and requires significant computational resources. In this scenario, our estimators can achieve greater variance reduction by changing the allocation and pairings of samples for each of the approximate models used as control variates. Central to these observations, we prove the following two results:
An approximate control variate scheme can be devised to converge to the optimal linear control variate in the limit of infinite data (Theorem 5).
The remainder of the paper is structured as follows: in Section 2, we describe a unifying framework for multi-model control variate approaches and then identify existing multilevel and multifidelity estimators within this general formalism; in Section 3, we develop a set of new approximate control variate estimators; and, in Section 4, we demonstrate our approaches for several prototypical applications in fluid dynamics.
2 A unifying approximate control variate framework
In this section we provide background on control variate methods and present a generalized framework for estimating integrals using control variates with unknown mean. Then we show how existing methodologies (MLMC and MFMC) are suboptimal instances of this general framework.
be a probability space anddenote a mapping from to a scalar valued output, we will call this mapping a “high-fidelity model” because it will represent the process whose statistics we desire to estimate. In particular, our goal is to generate an estimate of using a set of samples of the input random variables and the corresponding evaluations of the function at those samples . The MC estimate of the mean
is unbiased, that is , and conveges a.s. For, as .
2.1 Traditional control variate estimation
The MC-based222Control-variate-based variance reduction algorithms can certainly be performed with other sampling-based and non-sampling-based estimators. In this paper we focus on the commonly Monte Carlo estimators. linear CV algorithm seeks a new estimator that has smaller variance than while still requiring only evaluations of . The algorithm works by first introducing an additional random variable with known mean . Then it requires computing an estimator of using the same samples that were used for . Finally, these quantities are assembled into the following estimator
for some scalar 333In the rest of the paper we drop the term “linear.” It is assumed that any scheme that is discussed is linear. For reference, nonlinear control variates are those that may have polynomial corrections of the form . As we will see shortly, the variance of is strongly influenced by the correlation between and , and greater variance reduction is achieved for higher correlation. To this end we will call the CV correlated mean estimator (CME), and we will refer to as the control variate mean (CVM). This approach can be extended to information sources , see e.g., [18, 20, 19], using
The effectiveness of the control variate algorithm is measured by the variance reduction ratio ratio between the variances of and :
The optimal CV (OCV) uses weights that minimize this variance:
Following , let denote the covariance matrix among and denote the vector of covariances between and each . The solution to Equation (5) is . If we further define , where is the Pearson correlation coefficient between and , then the variance reduction corresponding to these weights becomes
where . When , then we have no advantage over MC, and when , we have the maximum variance reduction possible. When there is only a single approximate information source, this quantity is equivalent to .
2.2 Approximate control variate estimation
Traditional control variate estimation assumes that the means are known. In our motivating problem of multifidelity uncertainty quantification, these means are not known, but rather must be estimated from lower fidelity simulations. To this end, let denote a set of samples used to evaluate the high-fidelity function . Furthermore, let denote an ordered set of samples partitioned into two ordered subsets and such that . Note that and are not required to be disjoint, i.e., in many strategies they overlap so that . We will construct and analyze approximate CV estimators of the following form
where , and we use to denote the input values to the high-fidelity model and each CV model . Here we use additional samples to estimate each mean , and this estimate is denoted by and called the estimated control variate mean (ECVM). All of the estimators in the literature, and those that we derive in this paper, are differentiated by how the samples are related among the CVs and how the subsets and are defined within each CV. It will be useful to consider ordered sets of samples such that In this case, if and intersect then for some and
Because we must now use extra samples to estimate , we will have to account for the additional cost this imposes. Let denote the number of available realizations of and
denote the total number of available evaluations of the -th CV for scaling factor . Let denote the ratio between the cost of a single realization of control variates and the cost of obtaining a realization of the high-fidelity . Then the cost of the CV estimator is
For fixed , the estimator is unbiased when each component, , , and for is unbiased. The variance of is given in the following proposition.
Proposition 1 (Variance of the approximate CV estimator)
The variance of the approximate CV estimator (7) for fixed is
This result follows directly from the the rules regarding the variance of MC estimators, the variance of scaled random variables, and the variance of sums of random variables. We have
where the middle term on the first line is the definition, the last term follows from the variance of sums of random variables, and the second line factors out the baseline estimator variance.
This result means that the variance ratio of the approximate control variate (ACV) estimator is
The optimal approximate CV estimator consists of the which minimizes this ratio. The properties of this optimal estimator are given in the following proposition.
Proposition 2 (Optimal approximate CV)
Assume is positive definite and . The CV weight that provides the greatest variance reduction for the approximate CV Equation (7) is given by
with corresponding estimator variance
The variance reduction is a quadratic function of and so its extremum can be found by setting the gradient to zero
If we substitute this weight into Equation (9), we obtain the stated result
Proposition 2 provides a means to construct and theoretically compare different control variate estimators. In the following, we will use this tool to derive a number of existing estimators which use different sampling strategies to compute and .
2.3 Multilevel Monte Carlo (MLMC)
The MLMC estimator  is inspired by ideas drawn from multigrid solvers for PDEs. It posits that if a stochastic process can be simulated using various resolution methods, then low resolution simulations can be used to obtain a reduction in the variance of the statistics of higher resolution simulations. An initial set of motivating applications included stochastic differential equations [7, 9, 13], but it has been extended also to elliptic problems  (among others), filtering problems  and a large number of other scientific and engineering applications [16, 6, 4, 10].
In this section, we provide several new contributions. First, we derive a novel expression for the variance reduction provided by MLMC that is a function of the correlations between each model. Second, we identify that MLMC is sub-optimal in that it cannot provide a greater variance reduction than the optimal control variate estimator with one control variate (OCV-1), regardless of how many resolutions or how much data is used. Third, we provide a simple enhancement to the standard MLMC assembly strategy to provide a provably greater variance reduction for virtually no additional cost.
Let be a sequence of random variables, ordered bottom-up555Here, we use instead of because in this paper is always of “higher fidelity” than , for and MLMC traditionally reverses these notions. We use a different letter and then provide a mapping back to to reflect this discrepancy. from coarsest resolution at level 0 to finest resolution at level . The MLMC estimator seeks the expectation of by rewriting it as a telescopic sum . Then MLMC calls for (by definition) estimating the expectation using independent sample sets for each level , by summing the following sequence of estimators
From the cancellation of terms, we see that this estimator is still an unbiased estimate of.
To link this expression to Eq. (7), we make the following associations: the number of control variates is , the finest resolution level is our quantity of interest , and the order of is reversed to obtain our control variates for . Using this assignment it is clear that the the same sum can be written as
where we have introduced independent sample sets for each level with for and From here it is clear that two estimators of each QoI are obtained; one estimator uses and the other Thus we can identify the ECVM of with the estimator that uses since this is the set with more samples. Accounting for the sample set for the high fidelity model we can rewrite this equation as
Finally, re-pairing terms to reflect the same level but different sample sets, we arrive at the form of the approximate CV in (7)
where we have set and for and for all . It is important to note that the sampling strategy implies a recursive underpinning to the MLMC strategy. In this sense, MLMC can be derived by starting with a single with unknown CVM. Then we introduce as a control variate for the ECVM. It is for this reason that the samples between and are shared.
Given this partitioning, the receptive cardinality of and is and , and
for . Here the evaluation ratios for each model are and because . The sampling scheme is illustrated in Figure 0(a) for reference. The variance reduction of the MLMC estimator is given in the following proposition.
Lemma 1 (Variance reduction of MLMC)
Let the MLMC estimator be defined as
with the sampling strategy: . Each ordered set of samples is partitioned according to , , , and for . The variance of this estimator is
is the ratio of the standard deviations,
is the ratio of the standard deviations,is the Pearson correlation coefficient between , and are defined recursively according to
The proof is provided in Appendix A. MLMC provides a smaller variance reduction than the single optimal CV.
Theorem 1 (Maximum variance reduction of MLMC)
The variance reduction of MLMC is bounded above by the optimal single CV i.e.,
The proof is provided in Appendix B. This bound is a result of the sampling strategy used by MLMC, which does not allocate any of the samples used to evaluate the high fidelity model to the models for . Because and do not share samples there is no correlation between the estimators of and for . Ignoring these correlations limits the maximum attainable variance reduction of the final estimator to the one corresponding to OCV-1; specifically, the control variates , for , are used to accelerate convergence of to OCV-1 using model .
Note that the bound given in Theorem 1 does not depend on the number of samples of each . Even with an infinite number of samples of the CVs, the manner in which the samples are distributed restricts the variance reduction achieved. As far as we are aware, this sub-optimality has not been discussed previously in the literature, where the focus has been to show some variance reduction as compared with MC.
This sub-optimality property can have a significant effect in practice. We will show that significantly greater variance reduction can be obtained with essentially no additional computational expense by simply redistributing the sample allocation (even within the two components of ).
Finally, we briefly mention that multi-index Monte Carlo MIMC algorithm  is a multidimensional analogue of MLMC in that it considers a two- (or multi-) dimensional hierarchy of models This estimator can also be written in the language of control variates; whereas MLMC introduces a single to reduce the variance in estimating the ECVM , MIMC introduces several control variates to reduce the variance of the ECVM . As a result MIMC can be more efficient because it has the potential for greater variance reduction; however, it still suffers from a similar suboptimality, with respect to metric, because it still uses lower-fidelity models to only accelerate convergence of the of the immediate higher-fidelity models rather than of – in other words they are still recursive.
2.3.1 Weighted Multilevel Monte Carlo
A small modification of the MLMC algorithm can reduce its variance with no additional computational cost. This improvement is obtained by replacing the weights with the optimal weights given by Proposition 2. Since these weights guarantee that the estimator achieves the greatest variance reduction, using these weights will yields an estimator that employs the same sampling strategy as the standard MLMC algorithm (i.e., sampling in a sequence of model pairs), but has a lower variance (error).
Pseudo-code for constructing the optimal multi-level MC estimator is given in Algorithm 1. The values of and are provided in Appendix A. Here we assume that the allocation of the samples is given a priori. For example a typical MLMC allocation strategy that attempts to balance physical prediction bias and statistical error can be used. Such an allocation may no longer be optimal. We discuss how to construct an optimal allocation strategy in Section 3.3.
2.4 Multifidelity Monte Carlo (MFMC)
An alternative to MLMC, called Multifidelity Monte Carlo (MFMC), was recently proposed in [26, 22, 24, 27]. In this section, we provide several new results regarding this sampler. First, we derive a new variance reduction expression, an alternative to [26, Lemma 3.3], that relates the correlation of the CVs to the variance reduction. Second, we use this expression to demonstrate that MFMC is a recursive estimator, like MLMC, and therefore is also sub-optimal. That is, the maximum variance reduction provided by this estimator is limited to that provided by the single optimal CV, regardless of how many samples are used to evaluate .
The MFMC estimator can be obtained from the following recursive procedure. Samples are used to obtain and the CME () of . Then an enriched set of samples are used for the ACVM () of to obtain
where we have chosen and . In other words, consists of the original samples along with an additional set.
Now, let us introduce another CV to reduce the variance of the according to
where we have again set and
Continuing in this recursive pattern and collapsing -products, we obtain the MFMC estimator
and the sampling strategy partitions into and according to
for and and . The sampling scheme for the MFMC estimator is illustrated in Figure 0(b) for reference.
Unlike the MLMC estimator, this estimator has free weights that we can tune, and the optimal weights are given in the following Lemma.
Lemma 2 (Optimal CV weights for MFMC)
The optimal weights for the MFMC estimator are
The proof is provided in Appendix C, and provides an alternate derivation to the identical result in [26, Th. 2.4], but it does not require the same assumptions; in the prior work, these weights were derived using assumptions on model costs and correlation orderings. The next Lemma provides a new representation of the variance reduction of this estimator in terms of the correlations between model pairs.
Lemma 3 (Variance reduction of the optimal MFMC)
Assume and let . The variance reduction of the optimal MFMC estimator is
The proof is provided in Appendix D. Under the assumption made in  that, without loss of generality (can be constructed by reordering), for , we can use Lemma 3 to show that the MFMC estimator cannot perform better than the optimal single CV, regardless of how many samples are allocated to the control variate models.
Theorem 2 (Maximum variance reduction of MFMC)
The variance reduction of MFMC is bounded above by the optimal single CV, i.e.,
The proof is provided in Appendix E. The variance reduction of MFMC approaches in the limit of infinite data. Though the MFMC estimator is limited by the highest correlation , it will be able to approach this limit more efficiently than a single low-fidelity CV because is often larger than , given the (assumed) decreasing computational cost in the sequence from the first to the last low-fidelity model . Similarly to the MLMC case, the suboptimality of MFMC is due to the recursive nature of this estimator. Each successive CV is used to enhance convergence of .
2.5 Summary and example
Table 1 below summarizes the sample distribution and the variance reduction ratio of the state-of-the-art methods described above, and Figure 1 pictorially illustrate the existing variance reduction algorithms that do not require a known mean.
|Algorithm||Relation between and||Reduction ratio|
|MLMC ||, for||,|
|MFMC ||for all||,|
To demonstrate these results we consider the following simple monomial example: let and for , where . The correlation matrix for this problem is given in Table 2.
As we have not yet introduced a cost model, we first explore the performance of these methods through the lens of an assumed sample ratio so that for . As a result, the sample allocations across are prescribed by , rather than computed from known and estimated ; this is the viewpoint taken in Figures 2, 4, and 5. We explore the introduction of and resulting optimal allocation strategies later in Sections 3.3.1 and 4.1. These two views highlight two critical features for these algorithms: (a) the existence of a significant gap between OCV-1 and OCV, and (b) an effective ability to exploit this gap to achieve computational savings.
Figure 2 shows the variance reduction ratio for the described algorithms. The dotted horizontal lines provide baseline variance reduction ratios corresponding to MC (), the single OCV (OCV-1), double OCV (OCV-2), triple OCV (OCV-3), and the optimal control variate that uses all the models (OCV). We clearly see that the variance reduction of the existing estimators is bounded by that provided by OCV-1, as established by Theorems 1 and 2 and summarized in Table 1.
To summarize, this example shows that when the number of high fidelity samples is fixed, the existing state-of-the-art estimators are sub-optimal. As such, in applications where it is infeasible to obtain more samples of the highest fidelity model, we can potentially achieve orders of magnitude greater variance reduction. We want to emphasize that we are aware of no existing methods in the form (7) that can converge to the OCV estimator (with known means) as the amount of data increases. This defines our goal for the discussion to follow: overcome the sub-optimality of the existing estimators by developing algorithms that can converge to the optimal control variate estimator (red OCV line in Figure 2).
3 Approximate CV
3.1 Two convergent estimators
The most straightforward way to guarantee convergence of an ACV estimator to an OCV estimator is to set for each CV and then to use all available samples to estimate , i.e., . In other words, we use the exact same input samples for as those used for ; any additional samples available for each control variate to evaluate .
The formal definition of this estimator, which we coin approximate CV-IS (ACV-IS) where IS stands for “independent samples”, is
Definition 1 (Acv-Is)
Let , and for and . Then the estimator ACV-IS is defined as
The estimator only requires shared evaluations for the input samples that make up and each . The rest of the samples for each control variate are completely independent. As a result, an attractive feature of the sample distribution strategy for ACV-IS is that each control variate can be evaluated separately and in parallel. The sampling scheme for the ACV-IS estimator is illustrated in Figure 2(a) for reference.
The optimal control variate weights and variance reduction ratio for ACV-IS are now given.666In this paper, denotes a Hadamard, or elementwise, product.
Theorem 3 (Optimal CV-weights and variance reduction for ACV-IS)
The optimal CV weights and resulting estimator variance for the ACV-IS estimator are
and such that
The proof is provided in Appendix F.
Next we consider the case where the samples in the sets are not independent among models, but rather use overlapping sets. We call this estimator ACV-MF.
Definition 2 (Acv-Mf)
Let , and for and . Then ACV-MF estimator is
Note that this estimator can use an identical set of samples to the MFMC estimator and can thus be considered a drop-in replacement. The only difference is that is evaluated using only the first samples instead of . Furthermore, using less samples for does not cause loss of accuracy because the control variates method does not require an accurate estimate of in terms of how close it is to , it requires an estimator that is correlated to and unbiased. The sampling scheme for the ACV-MF estimator is illustrated in Figure 2(b) for reference.
The optimal weights and resulting variance reduction of the ACV-MF estimator are now provided.
Theorem 4 (Optimal CV-weights and variance reduction for ACV-MF)
The optimal CV weights and resulting estimator variance for the ACV-MF estimator are