Stochastic simulation is one of the most widely used analytic tools in operations research. It provides a flexible mean to approximate complex models and to inform decisions. See, for instance, Law et al. (1991) for applications in manufacturing, revenue management, service and operations systems etc. In practice, the simulation platform relies on input models that are typically observed or calibrated from data. These statistical noises can propagate to the output analysis, leading to significant errors and suboptimal decision-making. In the literature, this problem is commonly known as input uncertainty or extrinsic uncertainty.
This paper concerns the quantification of input uncertainty. In conventional simulation output analysis where the input model is completely pre-specified, the statistical errors come solely from the Monte Carlo noises, and it suffices to account only for such noises in analyzing the output variability. When input uncertainty is present, such an analysis will undermine the actual variability. One common approach to quantify the additional uncertainty is to estimate the variance in the output that is contributed from the input noises (e.g., Song et al. (2014)); for convenience, we call this the input variance
. Input variance can be used to identify models that are overly ambiguous and flag the need of more data collection. It also provides a building block to construct valid output confidence intervals (CIs) that account for combined input and simulation errors (e.g.,Cheng and Holland (2004)).
Bootstrap resampling is a common approach to estimate input variances. This applies most prominently in the nonparametric regime, namely when no assumptions are placed on the input parametric family. It could also be used in the parametric case (where more alternatives are available). For example, Cheng and Holland (1997) proposes the variance bootstrap, and Song and Nelson (2015) studies the consistency of this strategy on a random-effect model that describes the uncertainty propagation. A bottleneck with using bootstrap resampling in estimating input variances, however, is the need to “outwash” the simulation noise, which often places substantial burden on the required simulation effort. More precisely, to handle both the input and the simulation noises, the bootstrap procedure typically comprises a two-layer sampling that first resamples the input data (i.e., outer sampling), followed by running simulation replications using each resample (i.e., inner replications). Due to the reciprocal relation between the magnitude of the input variance and the input data, the input variance becomes increasingly small as the input data size increases. This deems the control of the relative estimation error increasingly expensive, and requires either a large outer bootstrap size or inner replication size to extinguish the effect of simulation noises.
The main goal of this paper is to investigate subsampling as a simulation saver for input variance estimation. This means that, instead of creating distributions by resampling a data set of the full size, we only resample (with or without replacement) a set of smaller size. We show that a judicious use of subsampling can reduce the total simulation effort from an order bigger than the data size in the conventional two-layer bootstrap to an order independent of the data size, while retaining the estimation accuracy. This approach leverages the interplay between the form of the input variance and its estimation error, in terms of the data size and the sampling effort in each layer of the bootstrap. On a high level, the subsample is used to estimate an input variance as if less data are available, followed by a correction of this discrepancy in the data size by properly rescaling the input variance. We call this approach proportionate subsampled variance bootstrap. We explicitly identify the procedural specifications in our approach that guarantee estimation consistency, including the minimally required simulation effort in each layer. We also study the corresponding optimal allocation rules.
In the statistics literature, subsampling has been used as a remedy for situations where the full-size bootstrap does not apply, due to a lack (or undeterminability) of uniform convergence required for its statistical consistency, which relates to the functional smoothness or regularity of the estimators (e.g., Politis and Romano (1994)). Subsampling has been used in time series and dependent data (e.g., Politis et al. (1999), Hall et al. (1995), Datta and McCormick (1995)), extremal estimation (e.g., Bickel and Sakov (2008)), shape-constrained estimation (e.g., Sen et al. (2010)) and other econometric contexts (e.g., Abadie and Imbens (2008), Andrews and Guggenberger (2009, 2010)). In contrary to these works, our subsampling approach is introduced to reduce the simulation effort faced by the two-layer sampling necessitated from the presence of both the input and simulation noises. In other words, we are not concerned about the issue of uniform convergence, but instead, we aim to distort the relation between the required simulation effort and data size in a way that allows more efficient deconvolution of the effects of the two noises.
use the percentile bootstrap to construct CIs, where the CI limits are determined from the quantiles of the bootstrap distributions.Yi and Xie (2017) investigates adaptive budget allocation policies to reduce simulation cost in the percentile bootstrap. Lam and Qian (2016, 2017) study the use of empirical likelihood, and Xie et al. (2018) studies nonparametric Bayesian methods to construct CIs. Glasserman and Xu (2014), Hu et al. (2012), Lam (2016b) and Ghosh and Lam (2015) study input uncertainty from a robust optimization viewpoint. In the parametric regime, Barton et al. (2013) and Xie et al. (2016) investigate the basic bootstrap with a metamodel built in advance, a technique known as the metamodel-assisted bootstrap. Cheng and Holland (1997) studies the delta method, and Cheng and Holland (1998, 2004) reduce its computation burden via the so-called two-point method. Lin et al. (2015) and Song and Nelson (2018) study regression approaches to estimate sensitivity coefficients which are used to apply the delta method, generalizing the gradient estimation method in Wieland and Schmeiser (2006). Zhu et al. (2015) studies risk criteria and computation to quantify parametric uncertainty. Finally, Chick (2001), Zouaoui and Wilson (2003), Zouaoui and Wilson (2004) and Xie et al. (2014) study variance estimation and interval construction from a Bayesian perspective. For general surveys on input uncertainty, readers are referred to Barton et al. (2002), Henderson (2003), Chick (2006), Barton (2012), Song et al. (2014), Lam (2016a), and Nelson (2013) Chapter 7.
The remainder of the paper is as follows. Section 2 introduces the input uncertainty problem and explains the simulation complexity bottleneck in the existing bootstrap schemes. Section 3 presents our subsampling idea, procedures and the main statistical results. Section 4 discusses the key steps in our theoretical developments. Section 5 reports our numerical experiments. Section 6 concludes the paper. All proofs are relegated to the Appendix.
2 Problem Motivation
This section describes the problem and our motivation. Section 2.1 first describes the input uncertainty problem, Section 2.2 presents the existing bootstrap approach, and Section 2.3 discusses its computational barrier, thus motivating our subsampling investigation. We aim to provide intuitive explanations in this section, and defer mathematical details to later sections.
2.1 The Input Uncertainty Problem
Suppose there are independent input processes driven by input distributions . We consider a generic performance measure that is simulable, i.e., given the input distributions, independent unbiased replications of can be generated in a computer. As a primary example, think of and as the interarrival and service time distributions in a queue, and is some output measure such as the mean queue length averaged over a time horizon.
The input uncertainty problem arises in situations where the input distributions are unknown but real-world data are available. One then has to use their estimates to drive the simulation. Denote a point estimate of as , where typically we take
with being a conditionally unbiased simulation replication driven by . This point estimate is affected by both the input statistical noises and the simulation noises. By conditioning on the estimated input distributions (or viewing the point estimate as a random effect model with uncorrelated input and simulation noises), the variance of can be expressed as
is interpreted as the overall input variance, and
as the variance contributed from the simulation noises. Assuming that the estimates ’s are consistent in estimating ’s, then, as grows, is approximately and can be estimated by taking the sample variance of all simulation replications (see, e.g., Cheng and Holland (1997)). The key and the challenge in quantifying input uncertainty is to estimate .
To this end, suppose further that for each input model , we have i.i.d. data generated from the distribution . When ’s are large, typically the overall input variance is decomposable into
where is the variance contributed from the data noise for model , with being a constant. In the parametric case where comes from a parametric family containing the estimated parameters, this decomposition is well known from the delta method (Asmussen and Glynn (2007), Chapter 3). Here, is typically , where is the collection of sensitivity coefficients, i.e., the gradient, with respect to the parameters in model , and is the asymptotic estimation variance of the point estimates of these parameters (scaled reciprocally with ). In the nonparametric case where the empirical distribution is used (where denotes the delta measure at ), (2) still holds under mild conditions (e.g., Assumption 4.1 in the sequel). In this setting the quantity is equal to , where is the influence function (Hampel (1974)) of with respect to the distribution , whose domain is the value space of the input variate , and denotes the variance under
. The influence function can be viewed as a functional derivative taken with respect to the probability distributions’s (see Serfling (2009), Chapter 6), and dictates the first-order asymptotic behavior of the plug-in estimate of .
Under further regularity conditions, a Gaussian approximation holds for so that
is an asymptotically tight -level CI for , where is the standard normal quantile. This CI, which provides a bound-based alternative to quantify input uncertainty, again requires a statistically valid estimate of or (and ).
We will investigate how to estimate or , focusing primarily on the nonparametric case. The commonest estimation technique in this context is bootstrap resampling, which we discuss next.
2.2 Bootstrap Resampling
Let represent the empirical distribution constructed using a bootstrap resample from the original data for input , i.e., points drawn by uniformly sampling with replacement from . The bootstrap variance estimator is , where denotes the variance over the bootstrap resamples from the data, conditional on .
The principle of bootstrap entails that . Here is obtained from a (hypothetical) infinite number of bootstrap resamples and simulation runs per resample. In practice, however, one would need to use a finite bootstrap size and a finite simulation size. This comprises conditionally independent bootstrap resamples of , and simulation replications driven by each realization of the resampled input distributions. This generally incurs two layers of Monte Carlo errors.
Denote as the -th simulation run driven by the -th bootstrap resample . Denote as the average of the simulation runs driven by the -th resample, and as the grand sample average from all the
runs. An unbiased estimator foris given by
To explain, the first term in (4) is an unbiased estimate of the variance of , which is (where denotes the expectation on ’s conditional on ’s), since incurs both the bootstrap noise and the simulation noise. In other words, the variance of is upward biased for . The second term in (4), namely , removes this bias. This bias adjustment can be derived by viewing as the variance of a conditional expectation. Alternately, can be viewed as a random effect model where each “group” corresponds to each realization of , and (4) estimates the “between-group” variance in an analysis-of-variance (ANOVA). Formula (4) has appeared in the input uncertainty literature, e.g., Cheng and Holland (1997), Song and Nelson (2015), Lin et al. (2015), and also in Zouaoui and Wilson (2004) in the Bayesian context. Algorithm 1 summarizes the procedure.
More generally, to estimate the variance contribution from the data noise of model only, namely , one can bootstrap only from and keep other input distributions fixed. Then and are used to drive the simulation runs. With this modification, the same formula (4) or Algorithm 1 is an unbiased estimate for , which is approximately by the bootstrap principle, in turn asymptotically equal to introduced in (2). This observation appeared in, e.g., Song et al. (2014); in Section 4 we give further justifications.
In subsequent discussions, we use the following notations. For any sequences and , both depending on some parameter, say, , we say that if for some constant for all sufficiently large , and if as . Alternately, we say if for some constant for all sufficiently large , and if as . We say that if as for some constants . We use
to represent a random variablethat has stochastic order at least , i.e., for any , there exists such that for . We use to represent a random variable that has stochastic order less than , i.e., . We use to represent a random variable that has stochastic order exactly at , i.e., satisfies but not .
2.3 A Complexity Barrier
We explain intuitively the total number of simulation runs needed to ensure that the variance bootstrap depicted above can meaningfully estimate the input variance. For convenience, we call this number the simulation complexity. This quantity turns out to be of order bigger than the data size. On a high level, it is because the input variance scales reciprocally with the data size (recall (2)). Thus, when the data size increases, the input variance becomes smaller and increasingly difficult to estimate with controlled relative error. This in turn necessitates the use of more simulation runs.
To explain more concretely, denote as a scaling of the data size, i.e., we assume all grow linearly with , which in particular implies that is of order . We analyze the error of in estimating . Since is unbiased for which is in turn close to , roughly speaking it suffices to focus on the variance of . To analyze this later quantity, we denote a generic simulation run in our procedure, , as
are the errors arising from the bootstrap of the input distributions and the simulation respectively. If is sufficiently smooth,
elicits a central limit theorem and is of order. On the other hand, the simulation noise is of order .
Via an ANOVA-type analysis as in Sun et al. (2011), we have
Now, putting and formally into (5), and ignoring constant factors, results in
The two terms in (6) correspond to the variances coming from the bootstrap resampling and the simulation runs respectively.
Since is of order , meaningful estimation of needs measured by the relative error. In other words, we want to achieve as the simulation budget grows. This property, which we call relative consistency, requires to have a variance of order in order to compensate for the decreasing order of .
We argue that this implies unfortunately that the total number of simulation runs, , must be , i.e., of order higher than the data size. To explain, note that the first term in (6) forces one to use , i.e., the bootstrap size needs to grow with , an implication that is quite natural. The second term in (6), on the other hand, dictates also that . Suppose, for the sake of contradiction, that and are chosen such that . Then, because we need , must be which, combining with , implies that and leads to a contradiction.
We summarize the above with the following result. Let be the total simulation effort, and recall as the scaling of the data size. We have: [Simulation complexity of the variance bootstrap] Under Assumptions 4.1-4.1 to be stated in Section 4.1, the required simulation budget to achieve relative consistency in estimating by Algorithm 1, i.e., , is .
Though out of the scope of this paper, there are indications that such a computational barrier occurs in other types of bootstrap. For instance, the percentile bootstrap studied in Barton and Schruben (1993, 2001) appears to also require an inner replication size large enough compared to the data size in order to obtain valid quantile estimates (the authors actually used one inner replication, but Barton (2012) commented that more is needed). Yi and Xie (2017) provides an interesting approach based on ranking and selection to reduce the simulation effort, though they do not investigate the order of the needed effort relative to the data size. The regression approach in Song and Nelson (2018) requires simulation effort of order where . The empirical likelihood framework studied in Lam and Qian (2017) requires a similarly higher order of simulation runs to estimate the influence function. Nonetheless, in this paper we focus only on how to reduce computation load in variance estimation.
3 Procedures and Guarantees in the Subsampling Framework
This section presents our methodologies and results on subsampling. Section 3.1 first explains the rationale and the subsampling procedure. Section 3.2 then presents our main theoretical guarantees, deferring some elaborate developments to Section 4.
3.1 Proportionate Subsampled Variance Bootstrap
As explained before, the reason why the in Algorithm 1 requires a huge simulation effort, as implied by its variance (6), lies in the small scale of the input variance. In general, in order to estimate a quantity that is of order , one must use a sample size more than so that the estimation error relatively vanishes. This requirement manifests in the inner replication size in constructing .
To reduce the inner replication size, we leverage the relation between the form of the input variance and the estimation variance depicted in (6) as follows. The approximate input variance contributed from model , with data size , has the form . If we use the variance bootstrap directly as in Algorithm 1, then we need an order more than total simulation runs due to (6). Now, pretend that we have fewer data, say , then the input variance will be , and the required simulation runs is now only of order higher than . An estimate of , however, already gives us enough information in estimating , because we can rescale our estimate of by to get an estimate of . Estimating can be done by subsampling the input distribution with size . With this, we can both use fewer simulation runs and also retain correct estimation via multiplying by a factor.
To make the above argument more transparent, the bootstrap principle and the asymptotic approximation of the input variance imply that
The subsampling approach builds on the observation that a similar relation holds for
where denotes a bootstrapped input distribution of size (i.e., an empirical distribution of size that is uniformly sampled with replacement from ). If we let for some so that (where is the floor function, i.e. the largest integer less than or equal to ), then we have
Multiplying both sides with , we get
Note that the right hand side above is the original input variance of interest. This leads to our proportionate subsampled variance bootstrap: We repeatedly subsample collections of input distributions from the data, with size for model , and use them to drive simulation replications. We then apply the ANOVA-based estimator in (4) on these replications, and multiply it by a factor of to obtain our final estimate. We summarize this procedure in Algorithm 2. The term “proportionate” refers to the fact that we scale the subsample size for all models with a single factor . For convenience, we call the subsample ratio.
Similar ideas apply to estimating the individual variance contribution from each input model, namely . Instead of subsampling all input distributions, we only subsample the distribution, say whose uncertainty is of interest, while fixing all the other distributions as the original empirical distributions, i.e., . All the remaining steps in Algorithm 2 remain the same (thus the “proportionate” part can be dropped). This procedure is depicted in Algorithm 3.
3.2 Statistical Guarantees
Algorithm 2 provides the following guarantees. Recall that is the total simulation effort, and is the scaling of the data size. We have the following result: [Procedural configurations to achieve relative consistency] Under Assumptions 4.1-4.1 to be stated in Section 4.1, if the parameters of Algorithm 2 are chosen such that
then the variance estimate is relatively consistent, i.e. . Theorem 3.2 tells us what orders of the bootstrap size , inner replication size and subsample ratio would guarantee a meaningful estimation of . Note that for each , so that is equivalent to setting the subsample size . In other words, we need the natural requirement that the subsample size grows with the data size, albeit can have an arbitrary rate.
Given a subsample ratio specified according to (7), the configurations of and under (7) that achieve the minimum overall simulation budget is and . This is because to minimize while satisfying the second requirement in (7), it is more economical to allocate as much budget to instead of . This is stated precisely as:
[Minimum configurations to achieve relative consistency] Under the same conditions of Theorem 3.2, given , the values of and to achieve (7) and hence relative consistency that requires the least order of effort are and , leading to a total simulation budget .
Note that is the order of the subsample size. Thus Corollary 3.2 implies that the required simulation budget must be of higher order than the subsample size. However, since the subsample size can be chosen to grow at an arbitrarily small rate, this implies that the total budget can also grow arbitrarily slowly. Therefore, we have:
[Simulation complexity of proportionate subsampled variance bootstrap] Under the same conditions of Theorem 3.2, the minimum required simulation budget to achieve relative consistency in estimating by Algorithm 2, i.e., , is by using .
Compared to Theorem 2.3, Corollary 3.2 stipulates that our subsampling approach reduces the required simulation effort from a higher order than to an arbitrary order, i.e., independent of the data size. This is achieved by using a subsample size that grows with at an arbitrary order, or equivalently a subsample ratio that grows faster that .
The following result describes the configurations of our scheme when a certain total simulation effort is given. In particular, it shows, for a given total simulation effort, the range of subsample ratio for which Algorithm 2 can possibly generate valid variance estimates by appropriately choosing and : [Valid subsample ratio given total budget] Assume the same conditions of Theorem 3.2. Given a total simulation budget , if the subsample ratio satisfies , then the bootstrap size and the inner replication size can be appropriately chosen according to criterion (7) to achieve relative consistency, i.e., .
The next result on the optimal configurations of our scheme will be useful: Assume the same conditions of Theorem 3.2. Given a simulation budget and a subsample ratio such that and , the optimal outer and inner sizes that minimize the order of the conditional mean squared error are
giving a conditional mean squared error . Note that the mean squared error, i.e. , of the Monte Carlo estimate is random because the underlying resampling is conditioned on the input data, therefore the bound at the end of Theorem 3.2 contains a stochastically vanishing term .
We next present the optimal tuning of the subsample ratio. This requires a balance of the trade-off between the input statistical error and the Monte Carlo simulation error. To explain, define
as the perfect form of our proportionate subsampled variance bootstrap introduced in Section 3.1, namely without any Monte Carlo noises, and is the subsample ratio. The overall error of by Algorithm 2 can be decomposed as
The first term is the Monte Carlo error for which the optimal outer size , inner size and the resulting mean square error are governed by Theorem 3.2. In particular, the mean squared error there shows that under a fixed simulation budget and the optimal allocation , the Monte Carlo error gets larger as increases. The second term is the statistical errors due to the finiteness of input data and . Since measures the amount of data contained in the resamples, we expect this second error to become smaller as increases. The optimal tuning of relies on balancing such a trade-off between the two sources of errors.
We have the following optimal configurations of , and altogether given a budget : [Optimal subsample size and budget allocation] Suppose Assumptions 4.1, 4.1-4.1 in Section 4.1 and Assumptions 4.3-4.3 in Section 4.3 hold. For a given simulation budget , if the subsample ratio and outer and inner sizes for Algorithm 2 are set to
then the gross error , where the leading term has a mean squared error
Moreover, if and at least one of the ’s are positive definite, where and are as defined in Lemma 4.3, then (12) holds with an exact order (i.e., becomes ) and the configuration (10), (11) is optimal in the sense that no configuration gives rise to a gross error .
Note from (12) that, if the budget , our optimal configurations guarantee the estimation mean square error decays faster than . Recall that the input variance is of order , and thus an estimation error of order higher than ensures that the estimator is relatively consistent in the sense . This recovers the result in Corollary 3.2.
We comment that all the results in this section hold if one estimates the individual variance contribution from each input model , namely by using Algorithm 3. In this case we are interested in estimating the variance , and relative consistency means . The data size scaling parameter can be replaced by in all our results.
Finally, we also comment that the complexity barrier described in Section 2.3 and our framework presented in this section applies in principle to the parametric regime, i.e., when the input distributions are known to lie in parametric families with unknown parameters. The assumptions and the mathematical details would need to be catered to that situation. Nonetheless, we note that more techniques are available in the parametric context, including the delta method and metamodels, and thus the motivation for investigating efficient direct bootstrap schemes is less imminent than in the nonparametric case.
4 Developments of Theoretical Results
We present our main developments leading to the algorithms and results in Section 3. Section 4.1 first states in detail our assumptions on the performance measure. Section 4.2 presents the theories leading to estimation accuracy, simulation complexity and optimal budget allocation in the proportionate subsampled variance bootstrap. Section 4.3 investigates optimal subsample sizes that lead to overall best configurations.
4.1 Regularity Assumptions
We first assume that the data sets for all input models are of comparable size. [Balanced data] for some finite constant as . Recall in Sections 2 and 3 that we have denoted as a scaling of the data size. More concretely, we take as the average input data size under Assumption 4.1.
We next state a series of general assumptions on the performance measure . These assumptions hold for common finite-horizon measures, as we will present. For each let be the support of the -th true input model , and the collection of distributions be the convex hull spanned by and all Dirac measures on , i.e.
We assume the following differentiability of the performance measure. [First order differentiability] For any distributions , denote for . Assume there exist functions