Discrete-event simulation is often used to assess the performance of complex stochastic systems, especially in the situations where the direct analytical solution and physical experiments are infeasible or prohibitive [banks2010discrete]. Advanced computer architectures have made parallel computing available and popular in many engineering and scientific areas. Nelson in [Nelson_2016] raises new research questions about how to exploit this computing advantage in estimating simulation system performance, especially risk measures, such as quantiles. As mentioned in [Nelson_2016], a fundamental challenge is how to efficiently utilize all available parallel computing processors to improve the estimation accuracy.
In this paper, we consider the steady-state system performance. A single run of simulation generates a sample path of detailed outputs with a given run-length. For various existing system performance (e.g., mean and risk) estimation approaches, both variance and bias of their estimators can be reduced by increasing the run-length[Nelson_2016]. Since the detailed outputs in a simulation sample path are generated sequentially, it could be challenging to chop a dependent sample path into chunks and run separately in parallel from multiple processors. As a result, the run-length becomes the key bottleneck in improving the computational efficiency with parallel simulation.
For the classical quantile estimation approach, we typically calculate the sample quantile of outputs from each simulation run and then take the average of the quantile estimators from multiple replications. The asymptotic properties of this estimator have been well-studied in the literature (e.g., [sun2010asymptotic]). As noted in [heidelberger1984quantile], accurate estimators of tail quantile measures greatly rely on a sufficiently large run-length, which could be a luxury for complex and fast-evolving stochastic systems. For example, we are interested in the 95% quantile of waiting time in a queueing service system. Since the system is required to adapt fast to the evolving demand, we need to assess the system performance and make decisions under a certain tight time deadline. Notice that the run-length greatly relates to the simulation running time before the deadline. Therefore, the parallel processors can increase the number of replication and utilize the available parallel processor under an urgent deadline. However, the classical quantile estimators may not make efficient use of the detailed output sample paths. This could impact the quantile estimation accuracy, especially when the time budget is tight.
In this paper, we introduce a quantile estimator which is computed by directly pooling the detailed simulation outputs from various replications to obtain a sample quantile estimator, called pooled sample quantile estimator. By pooling the dependent (within each replication) and independent (cross different replications) simulation outputs, the resulted sample quantile is introduced to estimate the quantile. The pooled quantile estimator has been investigated under independent and identically distributed observations, such as [asmussen2007stochastic, nakayama2014confidence]
, and recently been used to construct the confidence interval of quantile estimators in[alexopoulos2019sequest]. In this paper, we develop the asymptotic results of the pooled quantile estimator based on the framework in [sen1972bahadur]. Our asymptotic results show that the proposed estimator has better performance especially under the situations with multiple processors and urgent time deadlines, i.e., the job request is too urgent to produce sufficiently longer sample paths. We highlight our contribution as follows.
We provide the asymptotic results of the pooled quantile estimator that are generated from multiple replications of dependent sequences.
We illustrate how the pooled quantile estimator can be used to improve the accuracy of the classical average quantile estimator under the context of parallel simulation.
2 A pooled quantile estimator for parallel simulation
Our goal is to estimate the quantile of the marginal distribution for steady-state simulation. Let, and denote the -level quantile by , where .
We consider the detailed outputs generated from the steady-state stochastic simulation model,
with independent sample paths each with run-length . Particularly, denotes the -th element of the -th sample path output. Notice that different sample paths are independent with each other, but entries within each sample path are element-wise dependent. The pooled outputs in (1) contain entries, which construct an empirical CDF:
where is the indicator function. Let denote the order statistics over all the entries in (1). We define the -level pooled quantile estimator obtained by combining all the entries by
where denotes the smallest integer greater than or equal to .
As noted earlier in Section 1, the classical quantile estimator (see [chen2006quantile, bekki2009indirect] for examples) is to obtain the -level sample quantile from each replication, with for . The final estimator is
which is referred to as the average quantile estimator in this paper. The asymptotic properties of each individual have been studied by a vast collection of papers; see for example [sen1972bahadur]. Thus, the corresponding asymptotic properties of the average quantile estimator can be directly developed based on the mutual independence among for .
The pooled quantile estimator has been investigated under independent and identically distributed (i.i.d.) observations, such as [asmussen2007stochastic, nakayama2014confidence]. In [alexopoulos2019sequest], the comparison of the asymptotic properties between the pooled quantile estimator and the average quantile estimator based on i.i.d. observations have been stated. Also, the pooled estimator from dependent sequences is used to adjust the confidence interval of quantile estimators. However, the comparison of asymptotic properties between the pooled quantile estimator and the average quantile estimator based on multiple replications of dependent sequences has not been formally stated in the literature. To fill this gap, we provide the theoretical comparison of the pooled quantile estimator and the average quantile estimator. Our theory is developed based on Assumptions 2.1–2.4.
For each replication (), is a stationary sequence of -mixing random variables, i.e., for the fields and generated by and respectively, we have
where and , and with , and
is continuous and positive in the neighborhood of .
is positive and bounded in the neighborhood of .
Assumption 2.1 is called the -mixing condition, which is commonly adopted in the steady-state simulation output analysis [chen2000batching, steiger2001convergence]. It states that the serial dependency decreases as the lag increases. The studies in [bradley2005basic] and [bradley1986basic] provided the results of theoretically verifying the
-mixing condition for some popular examples of dependent sequences, such as Markov chains, stationary Gaussian processes, and etc. Assumption2.2 is that we normally consider the run-length far larger than the number of replications, and it also matches the situation in parallel computing where the number of available processors is often less than the run-length for the steady-state simulation. Assumptions 2.3 and 2.4 are the common assumptions for developing asymptotic representations of quantile estimators.
3 The asymptotic representation of the pooled quantile estimator
The asymptotic representation of sample quantile has been investigated under both independent data and dependent sequence data. For the pooled quantile estimator of simulation outputs from independent replications, asymptotic characterization is still missing in the literature. We aim to fill this gap and develop the asymptotic representation for the quantile estimator with pooled sample paths and provide the theoretical insights on how to deploy multiple processors to improve the estimation of system quantile response. We first provide the asymptotic representation of the pooled quantile estimator through Theorem 1, and then Theorem 2 gives its asymptotic distribution. The proofs of these Theorems follows a similar logic as in [sen1972bahadur], and extend their results to incorporate multiple replications of dependent sample paths.
Consider a small neighborhood around the true -level quantile , denoted by . Under Assumptions 2.1–2.3, as ,
almost surely. Further under Assumption 2.4, we have,
Let , where , and . Then for all , we have that
Since , by the Mean-Value Theorem,
and almost surely.
For , let . Notice that is 0-1 valued, and such that,
According to the Mean-Value Theorem and the definition of , we get . By directly applying Lemma 2, we have that, as ,
if let in Lemma 2.
For , let , and we could derive the same results. According to the Bonferroni Inequality,
Then, by Borel-Cantelli Lemma [serfling2009approximation],
holds almost surely. Then Equation (7) holds.
We now prove (8). Let ,
where and . Since as ,
From Lemma 1, as ,
holds almost surely. Thus, we have that
holds almost surely. Similarly,
By the monotonicity of , as ,
and . Thus, we have that,
as almost surely. Therefore, under Assumption 2.4, the conclusion holds by setting in (7).
Notice that Equation (8) can be rewritten as,
which gives the Bahadur representation of sample quantile of the pooled sample paths. Now we consider the asymptotic distribution of the estimator . Let for , and . Following the general definition in literature (e.g., [sen1972bahadur]), we denote
where , which is the same for all replications with . Under the setting of pooled sample paths in this paper, we obtain that
Under Assumptions 2.1–2.3, and , ,
According to Theorem 1, we have
. By the central limit theorem formixing variables (see [sen1972bahadur] for example), and by the independence between replications,
On the other hand, . As ,
On the other hand, for the classical sample quantile given by (4), we can directly apply the results from [sen1972bahadur] to obtain
by the mutual independence among different replications. Asymptotically, achieves the same variance as , while the bias of is in a smaller order than the bias of . We see that the bias of decreases as we increase the run-length, whereas stays the same as we increase the number of replications. Different from the classical quantile estimator, the bias of decreases as we increase either the run-length or the number of replications. Comparing these two estimators using the mean squared error (MSE), the pooled quantile estimator can achieve smaller MSE than the classical average quantile estimator.
As discussed earlier, multiple replications could be assigned to parallel processors. Thus, given a tight decision time, different replications can be allocated to parallel processors to provide a quantile estimator under this time constraint. With that said, the fixed decision time implies that the run-length is fixed to be , while the number of replications can be increased depending on how many parallel processors are available. Under this situation, the bias of stays at the same order no matter how many processors have been adopted to enlarge the number of replications. Different from , the bias of decreases as we increase the number of parallel processors. In Section 4, we use an empirical example to demonstrate that the pooled quantile estimator outperforms the classical average quantile estimator when the run-length is insufficient due to an urgent deadline.
4 Numerical Study
In this section, we provide an empirical example to illustrate the performance of the proposed approach under the parallel computing setting. We use MSE to demonstrate the performance of different estimators. In the following examples, the MSE is computed with 100 micro-replications:
where is the estimator for -level quantile from the pooled or the classical average method at the -th micro-replication. Two examples AR(1) and M/M/1 queue are used to generate the dependent sequences.
We consider that parallel processors are available to use, and the simulation running in each processor generate one replication of the dependent sequence. Two situations are considered: (1) there is an urgent deadline to provide a quantile estimator, so we only have time to generate dependent sequences with run-length ; and (2) the deadline to provide a quantile estimator is not urgent, and we are able to generate dependent sequences with sufficient run-length .
Example 1: AR(1) We consider an AR(1) process. For one replication of the dependent sequence, the outputs are given by
is a white noise process with zero mean and variance. We fix the mean and variance . The correlation parameter is varying from 0.3, 0.5, to 0.9. We estimate the quantiles of the outputs with 50%, 95%, and compare the performance of the pooled estimator in (3) with the classical average quantile estimator in (4). The results of MSEs under urgent deadline and non-urgent deadline are given in Figures 1 and 2, respectively.
Example 2: M/M/1 Queue We consider the steady-state Queueing system. We fix the arrival rate to be 1, and vary the utilization (traffic intensity) to be 0.7 or 0.9. We estimate the quantiles of time staying in the system with 50%, 95%, and compare the performance of the pooled estimator in (3) with the classical average quantile estimator in (4). The results of MSEs under urgent deadline and non-urgent deadline are given in Figures 3 and 4, respectively.
The results shown in Figures 1 and 3 represent the performances of the quantile estimators under an urgent deadline (i.e., run-length ), whereas the results shown in Figures 2 and 4 represent the performances of the quantile estimators under a non urgent deadline (i.e., run-length ). The y-axis represents the estimated MSE and the x-axis is the number of processors , and different scenarios are labeled on top of each sub-figure. For the cases representing an urgent deadline, the pooled quantile estimator gives smaller or competitive MSEs compared to the average quantile estimator. This demonstrates the benefits of using the pooled quantile estimator when there is not sufficient time to generate a lengthy sequence. For the cases representing a non-urgent deadline, we have sufficient time to generate dependent sequences with a relatively large run-length on each processor. As also demonstrated in the theoretical comparison, the MSEs from the pooled estimator significantly outperform the average quantile estimator if the run-length is sufficient.
As a summary, we study the pooled quantile estimator, which takes the sample quantile by pooling independently generated sample paths together. We develop the asymptotic representation of the proposed quantile estimator generated from multiple replications of dependent sequences. Compared with the classical average quantile estimator, the pooled quantile estimator demonstrates better asymptotic and finite-sample performance under the context of parallel simulation. The pooled quantile estimator can be advanced by combining various existing variance reduction techniques in the literature. Hence, as a promising future direction, the accuracy of the pooled quantile estimator can be further improved by incorporating existing techniques, such as control variates (i.e., [hsu1990control, hesterberg1998control]), importance sampling and stratified sampling (i.e., [glynn1996importance, glasserman2000variance, sun2010asymptotic]), antithetic variates and Latin hypercube sampling (i.e., [avramidis1998correlation, jin2003probabilistic]), as well as bias correction techniques (i.e., [matthys2004estimating, gomes2006bias, gomes2007sturdy]). Also, the -mixing condition used in our development can be difficult to check as mentioned in [alexopoulos2019sequest]. It is a promising future direction to extend the asymptotic properties in this paper based on the milder conditions in [wu2005bahadur]. Also, the trade-off between one single replication with multiple replications is an important issue to address for future study. If there is a choice to split one single replication into multiple replications, we may not simply assume that the total simulation cost is . The effort spending on the warm-up simulation procedure to achieve steady-state should also be considered. For the case with multiple replications, the warm-up procedure to generate steady-state outputs can not be negligible. It is critically important to further investigate how to balance the trade-off between one single replication with multiple replications.
Appendix A Some Useful Lemmas
We first consider that are series of 0-1 valued random variables which satisfies the same mixing condition as given through (5) and (6), and sharing the same marginal distribution . Assume that and , then we can have the following lemma by extending the results in Section 4 of [sen1972bahadur].
For a positive () that the mixing condition holds, as and ,
By Markov inequality, we would have,
And we can rewrite with , and choose integer , , and be the largest positive integer s.t. , notice that
. Then from the independence between replications and inequality between arithmetic and geometric means, we have,
Then, for ,
and . Applying those recursively yields:
Then, for any ,
By selecting , we can have,
where the equality is obtained by applying Taylor’s expansion on and , and requires . Notice that , for , we have , so (19) can directly follow Borel-Cantelli Lemma [serfling2009approximation].
We now consider sequence of 0-1 valued random variables s.t. and for all and . We can define , similar as before by replacing with .
If there exists positive and such that , for every positive and , there exists positive and , such that for ,
For simplicity, let , and then , follow the similar procedure in the proof of Lemma 1,
By selecting , we can have,
where the last equality is obtained by applying Taylor’s expansion on and , and requires . Since , , we have that