1 Introduction
Discrete-event simulation is often used to assess the performance of complex stochastic systems, especially in situations where direct analytical solutions and physical experiments are infeasible or prohibitively expensive [banks2010discrete]. Advanced computer architectures have made parallel computing available and popular in many engineering and scientific areas. Nelson [Nelson_2016] raises new research questions about how to exploit this computing advantage in estimating simulation system performance, especially risk measures such as quantiles. As mentioned in [Nelson_2016], a fundamental challenge is how to efficiently utilize all available parallel processors to improve estimation accuracy.
In this paper, we consider steady-state system performance. A single simulation run generates a sample path of detailed outputs with a given run length. For various existing approaches to estimating system performance (e.g., mean and risk measures), both the variance and the bias of the estimators can be reduced by increasing the run length [Nelson_2016]. Since the detailed outputs in a simulation sample path are generated sequentially, it can be challenging to chop a dependent sample path into chunks and run them separately on multiple parallel processors. As a result, the run length becomes the key bottleneck in improving computational efficiency with parallel simulation.

For the classical quantile estimation approach, we typically calculate the sample quantile of the outputs from each simulation run and then average the quantile estimators from multiple replications. The asymptotic properties of this estimator have been well studied in the literature (e.g., [sun2010asymptotic]). As noted in [heidelberger1984quantile], accurate estimation of tail quantiles relies heavily on a sufficiently large run length, which could be a luxury for complex and fast-evolving stochastic systems. Suppose, for example, that we are interested in the 95% quantile of the waiting time in a queueing service system. Since the system must adapt quickly to evolving demand, we need to assess performance and make decisions under a tight time deadline. Notice that the run length is largely determined by the simulation time available before the deadline. Parallel processors can therefore be used to increase the number of replications under an urgent deadline. However, the classical quantile estimator may not make efficient use of the detailed output sample paths, which can hurt estimation accuracy, especially when the time budget is tight.
In this paper, we introduce a quantile estimator that is computed by directly pooling the detailed simulation outputs from all replications, called the pooled sample quantile estimator. By pooling the simulation outputs, which are dependent within each replication but independent across replications, the resulting sample quantile is used to estimate the quantile. The pooled quantile estimator has been investigated under independent and identically distributed (i.i.d.) observations (e.g., [asmussen2007stochastic, nakayama2014confidence]), and has recently been used to construct confidence intervals for quantile estimators in [alexopoulos2019sequest]. In this paper, we develop asymptotic results for the pooled quantile estimator based on the framework in [sen1972bahadur]. Our asymptotic results show that the proposed estimator performs better, especially in situations with multiple processors and urgent time deadlines, i.e., when the job request is too urgent to produce sufficiently long sample paths. We highlight our contributions as follows.
We provide asymptotic results for the pooled quantile estimator generated from multiple replications of dependent sequences.

We illustrate how the pooled quantile estimator can be used to improve the accuracy of the classical average quantile estimator in the context of parallel simulation.
2 A pooled quantile estimator for parallel simulation
Our goal is to estimate a quantile of the marginal distribution for steady-state simulation. Let $X$ be a random variable representing a single entry in a detailed simulation output sample path. We denote the marginal cumulative distribution function (CDF) by $F$, and denote the $p$-level quantile by $\xi_p = F^{-1}(p) = \inf\{x : F(x) \ge p\}$, where $p \in (0,1)$. We consider the detailed outputs generated from the steady-state stochastic simulation model,
$\{X_{ij},\; i = 1, \dots, k,\; j = 1, \dots, m\}$,   (1)
with $k$ independent sample paths, each with run length $m$. In particular, $X_{ij}$ denotes the $j$th element of the $i$th sample path. Notice that different sample paths are independent of each other, but entries within each sample path are serially dependent. The pooled outputs in (1) contain $km$ entries, which construct an empirical CDF:
$\hat F_{km}(x) = \frac{1}{km} \sum_{i=1}^{k} \sum_{j=1}^{m} \mathbf{1}\{X_{ij} \le x\}$,   (2)
where $\mathbf{1}\{\cdot\}$ is the indicator function. Let $X_{(1)} \le X_{(2)} \le \cdots \le X_{(km)}$ denote the order statistics over all the entries in (1). We define the $p$-level pooled quantile estimator obtained by combining all the entries by
$\hat\xi_p = X_{(\lceil kmp \rceil)}$,   (3)
where $\lceil \cdot \rceil$ denotes the smallest integer greater than or equal to its argument.
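As a concrete illustration, the pooled empirical CDF in (2) and the pooled quantile estimator in (3) can be sketched in a few lines of Python; the function and variable names here are ours, chosen to mirror the notation in the text:

```python
import math

import numpy as np

def pooled_ecdf(paths, x):
    """Pooled empirical CDF as in (2): the fraction of all k*m
    pooled entries that are <= x."""
    pooled = np.asarray(paths, dtype=float).ravel()  # pool the k paths
    return float(np.mean(pooled <= x))

def pooled_quantile(paths, p):
    """Pooled p-level sample quantile as in (3): the ceil(k*m*p)-th
    order statistic (1-indexed) of the pooled entries."""
    pooled = np.sort(np.asarray(paths, dtype=float).ravel())
    r = math.ceil(len(pooled) * p)  # smallest integer >= k*m*p
    return float(pooled[r - 1])     # shift to 0-based indexing

# Toy example: k = 2 paths with m = 4 entries each (8 pooled entries).
paths = [[3, 1, 4, 1], [5, 9, 2, 6]]
print(pooled_ecdf(paths, 3.5))      # 4 of the 8 pooled entries are <= 3.5
print(pooled_quantile(paths, 0.5))  # 4th smallest of 1,1,2,3,4,5,6,9
```

Note that the estimator sorts the entries only once over all $km$ pooled outputs, rather than once per replication.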
As noted earlier in Section 1, the classical quantile estimator (see [chen2006quantile, bekki2009indirect] for examples) is obtained by taking the $p$-level sample quantile from each replication, with $\hat\xi_{p,i} = X_{i(\lceil mp \rceil)}$ for $i = 1, \dots, k$, where $X_{i(1)} \le \cdots \le X_{i(m)}$ are the order statistics of the $i$th replication. The final estimator is
$\bar\xi_p = \frac{1}{k} \sum_{i=1}^{k} \hat\xi_{p,i}$,   (4)
which is referred to as the average quantile estimator in this paper. The asymptotic properties of each individual $\hat\xi_{p,i}$ have been studied in a vast collection of papers; see, for example, [sen1972bahadur]. Thus, the corresponding asymptotic properties of the average quantile estimator can be developed directly from the mutual independence among $\hat\xi_{p,i}$ for $i = 1, \dots, k$.
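For comparison, the classical average quantile estimator in (4) computes a sample quantile per replication first and then averages; a minimal Python sketch (names ours):

```python
import math

import numpy as np

def replication_quantile(path, p):
    """Classical p-level sample quantile of one replication:
    the ceil(m*p)-th order statistic of that single path."""
    s = np.sort(np.asarray(path, dtype=float))
    return float(s[math.ceil(len(s) * p) - 1])

def average_quantile(paths, p):
    """Average quantile estimator as in (4): the mean of the
    per-replication sample quantiles."""
    return float(np.mean([replication_quantile(pth, p) for pth in paths]))

paths = [[3, 1, 4, 1], [5, 9, 2, 6]]  # k = 2 replications, m = 4
print(average_quantile(paths, 0.5))   # (1 + 5) / 2
```

The contrast with the pooled estimator is that each per-replication quantile here is computed from only $m$ entries, which is what drives the larger bias discussed below.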
The pooled quantile estimator has been investigated under independent and identically distributed (i.i.d.) observations, such as in [asmussen2007stochastic, nakayama2014confidence]. In [alexopoulos2019sequest], a comparison of the asymptotic properties of the pooled quantile estimator and the average quantile estimator based on i.i.d. observations has been presented; the pooled estimator from dependent sequences is also used there to adjust the confidence interval of quantile estimators. However, a comparison of the asymptotic properties of the pooled and average quantile estimators based on multiple replications of dependent sequences has not been formally stated in the literature. To fill this gap, we provide a theoretical comparison of the pooled quantile estimator and the average quantile estimator. Our theory is developed under Assumptions 2.1–2.4.
Assumption 2.1.
For each replication $i$ ($i = 1, \dots, k$), $\{X_{ij}\}_{j \ge 1}$ is a stationary sequence of $\phi$-mixing random variables; i.e., for the $\sigma$-fields $\mathcal{M}_1^j$ and $\mathcal{M}_{j+n}^{\infty}$ generated by $(X_{i1}, \dots, X_{ij})$ and $(X_{i,j+n}, X_{i,j+n+1}, \dots)$, respectively, we have

$|P(B \mid A) - P(B)| \le \phi(n)$,   (5)

where $A \in \mathcal{M}_1^j$ with $P(A) > 0$ and $B \in \mathcal{M}_{j+n}^{\infty}$, and $\phi(n)$ is nonincreasing in $n$ with $\phi(n) \downarrow 0$, and
(6) 
Assumption 2.2.
$k/m \to 0$ as $m \to \infty$.
Assumption 2.3.
The density $f = F'$ exists and is continuous and positive in a neighborhood of $\xi_p$.
Assumption 2.4.
The second derivative $F''$ is positive and bounded in a neighborhood of $\xi_p$.
Assumption 2.1 is called the $\phi$-mixing condition, which is commonly adopted in steady-state simulation output analysis [chen2000batching, steiger2001convergence]. It states that the serial dependence decreases as the lag increases. The studies in [bradley2005basic] and [bradley1986basic] provide theoretical verification of the $\phi$-mixing condition for some popular examples of dependent sequences, such as Markov chains and stationary Gaussian processes. Assumption 2.2 reflects that we normally consider a run length far larger than the number of replications; it also matches the parallel computing setting, where the number of available processors is often smaller than the run length needed for steady-state simulation. Assumptions 2.3 and 2.4 are common assumptions for developing asymptotic representations of quantile estimators.

3 The asymptotic representation of the pooled quantile estimator
The asymptotic representation of the sample quantile has been investigated under both independent data and dependent sequence data. For the pooled quantile estimator based on simulation outputs from independent replications of dependent sequences, an asymptotic characterization is still missing in the literature. We aim to fill this gap: we develop the asymptotic representation for the quantile estimator with pooled sample paths and provide theoretical insights on how to deploy multiple processors to improve the estimation of the system quantile response. We first provide the asymptotic representation of the pooled quantile estimator in Theorem 1, and then Theorem 2 gives its asymptotic distribution. The proofs of these theorems follow a similar logic as in [sen1972bahadur] and extend those results to incorporate multiple replications of dependent sample paths.
Theorem 1.
Consider a small neighborhood around the true $p$-level quantile $\xi_p$, denoted by $I$. Under Assumptions 2.1–2.3, as $km \to \infty$,

$\sup_{x \in I} \bigl| [\hat F_{km}(x) - \hat F_{km}(\xi_p)] - [F(x) - F(\xi_p)] \bigr| = O\bigl((km)^{-3/4} \log(km)\bigr)$   (7)

almost surely. Further, under Assumption 2.4, we have

$\hat\xi_p - \xi_p = \dfrac{p - \hat F_{km}(\xi_p)}{f(\xi_p)} + O\bigl((km)^{-3/4} \log(km)\bigr)$   (8)

almost surely.
Proof.
Let , where , and . Then for all , we have that
Since , by the Mean-Value Theorem,
and almost surely.
For , let . Notice that is 0–1 valued, and such that,
and .
According to the Mean-Value Theorem and the definition of , we get . By directly applying Lemma 2, we have that, as ,
if let in Lemma 2.
For , let , and we can derive the same result. According to the Bonferroni Inequality,
Then, by the Borel–Cantelli Lemma [serfling2009approximation],
holds almost surely. Then Equation (7) holds.
We now prove (8). Let ,
where and . Since as ,
From Lemma 1, as ,
holds almost surely. Thus, we have that
(9) 
holds almost surely. Similarly,
By the monotonicity of , as ,
and . Thus, we have that,
(10) 
as almost surely. Therefore, under Assumption 2.4, the conclusion holds by setting in (7).
∎
Notice that Equation (8) can be rewritten as

$\hat\xi_p = \xi_p + \dfrac{p - \hat F_{km}(\xi_p)}{f(\xi_p)} + O\bigl((km)^{-3/4} \log(km)\bigr)$,   (11)

which gives the Bahadur representation of the sample quantile of the pooled sample paths. We now consider the asymptotic distribution of the estimator $\hat\xi_p$. Let $Y_{ij} = \mathbf{1}\{X_{ij} \le \xi_p\}$ for $i = 1, \dots, k$ and $j = 1, \dots, m$. Following the general definition in the literature (e.g., [sen1972bahadur]), we denote
$\sigma_p^2 = \mathrm{Var}(Y_{i1}) + 2 \sum_{j=1}^{\infty} \mathrm{Cov}(Y_{i1}, Y_{i,1+j})$,   (12)

where $\mathrm{Var}(Y_{i1}) = p(1-p)$, which is the same for all replications $i$ by stationarity. Under the setting of pooled sample paths in this paper, we obtain the following.
Theorem 2.
Under Assumptions 2.1–2.3, and assuming $0 < \sigma_p^2 < \infty$, as $km \to \infty$,

$\sqrt{km}\,\bigl(\hat\xi_p - \xi_p\bigr) \;\Rightarrow\; N\!\bigl(0,\; \sigma_p^2 / f(\xi_p)^2\bigr)$.   (13)
Proof.
According to Theorem 1, we have
(14) 
as
. By the central limit theorem for $\phi$-mixing variables (see [sen1972bahadur], for example) and by the independence between replications,

(15)
On the other hand, . As ,
(16) 
where . Since $f$ is continuous in some neighborhood of $\xi_p$, , and from Theorem 1, , then as $km \to \infty$, . By applying (14), (15), and (16) and Slutsky's Theorem, (13) holds. ∎
According to (11) and (13), for the pooled sample quantile $\hat\xi_p$ given by (3), we have the following asymptotic bias and variance:

$\mathrm{Bias}(\hat\xi_p) = O\bigl((km)^{-3/4} \log(km)\bigr), \qquad \mathrm{Var}(\hat\xi_p) = \dfrac{\sigma_p^2}{km\, f(\xi_p)^2} + o\bigl((km)^{-1}\bigr)$.   (17)

On the other hand, for the classical average sample quantile $\bar\xi_p$ given by (4), we can directly apply the results from [sen1972bahadur] to obtain

$\mathrm{Bias}(\bar\xi_p) = O\bigl(m^{-3/4} \log m\bigr), \qquad \mathrm{Var}(\bar\xi_p) = \dfrac{\sigma_p^2}{km\, f(\xi_p)^2} + o\bigl((km)^{-1}\bigr)$,   (18)
by the mutual independence among different replications. Asymptotically, the pooled estimator achieves the same variance as the average estimator, while its bias is of a smaller order. The bias of the average estimator decreases as we increase the run length but stays the same as we increase the number of replications. In contrast, the bias of the pooled estimator decreases as we increase either the run length or the number of replications. Comparing the two estimators by mean squared error (MSE), the pooled quantile estimator can therefore achieve a smaller MSE than the classical average quantile estimator.
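To make the comparison concrete, squaring the bias terms and adding the common variance gives the following back-of-the-envelope MSE sketch. This assumes Bahadur-type bias orders of $O(m^{-3/4}\log m)$ for the average estimator and $O((km)^{-3/4}\log(km))$ for the pooled estimator, which is our reading of the asymptotics above:

```latex
\mathrm{MSE}(\text{pooled})
  = O\!\bigl((km)^{-3/2}\log^{2}(km)\bigr) + \frac{\sigma_p^2}{km\,f(\xi_p)^2},
\qquad
\mathrm{MSE}(\text{average})
  = O\!\bigl(m^{-3/2}\log^{2} m\bigr) + \frac{\sigma_p^2}{km\,f(\xi_p)^2}.
```

With the run length $m$ fixed by a deadline, only the pooled estimator's squared-bias term shrinks as the number of processors $k$ grows.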
As discussed earlier, multiple replications can be assigned to parallel processors. Thus, given a tight decision time, different replications can be allocated to parallel processors to provide a quantile estimator under this time constraint. That is, the fixed decision time implies that the run length is fixed, while the number of replications can be increased depending on how many parallel processors are available. In this situation, the bias of the average estimator stays at the same order no matter how many processors are adopted to enlarge the number of replications. In contrast, the bias of the pooled estimator decreases as we increase the number of parallel processors. In Section 4, we use an empirical example to demonstrate that the pooled quantile estimator outperforms the classical average quantile estimator when the run length is insufficient due to an urgent deadline.
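The allocation just described (one replication per processor, then pooling) can be sketched as follows. This is a hedged illustration: the simulator is a stand-in AR(1)-style toy, thread-based parallelism stands in for the processors, and all names are ours.

```python
import math
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def simulate_replication(seed, m):
    """Stand-in for one steady-state simulation run of length m
    (a toy AR(1) recursion; a real model would replace this)."""
    rng = np.random.default_rng(seed)
    x = np.empty(m)
    x[0] = rng.normal()
    for j in range(1, m):
        x[j] = 0.5 * x[j - 1] + rng.normal()
    return x

def parallel_pooled_quantile(k, m, p):
    """Run k replications concurrently (threads here as a sketch;
    a real deployment would use processes or machines), pool all
    k*m outputs, and return the ceil(k*m*p)-th order statistic."""
    with ThreadPoolExecutor(max_workers=k) as ex:
        paths = list(ex.map(lambda seed: simulate_replication(seed, m),
                            range(k)))
    pooled = np.sort(np.concatenate(paths))
    return float(pooled[math.ceil(k * m * p) - 1])

est = parallel_pooled_quantile(k=8, m=2000, p=0.95)
print(est)  # near the stationary 95% quantile of the toy model
```

The key point is that only the per-replication simulation loop is parallel; the final sort over the pooled outputs is a cheap serial step.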
4 Numerical Study
In this section, we provide an empirical example to illustrate the performance of the proposed approach in the parallel computing setting. We use the MSE to compare the different estimators. In the following examples, the MSE is computed over 100 micro-replications:

$\mathrm{MSE} = \frac{1}{100} \sum_{r=1}^{100} (\hat q_r - \xi_p)^2$,

where $\hat q_r$ is the estimator of the $p$-level quantile from the pooled or the classical average method at the $r$th micro-replication. Two examples, an AR(1) process and an M/M/1 queue, are used to generate the dependent sequences.
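The micro-replication MSE described above is straightforward to compute; a small sketch (our naming), where the true quantile would come from the known stationary distribution of the example:

```python
import numpy as np

def mse(estimates, true_quantile):
    """MSE over R micro-replications: the average squared error of
    the R quantile estimates against the true quantile."""
    estimates = np.asarray(estimates, dtype=float)
    return float(np.mean((estimates - true_quantile) ** 2))

# Three toy micro-replication estimates of a true quantile of 2.0.
print(mse([1.9, 2.1, 2.0], 2.0))  # ((-0.1)**2 + 0.1**2 + 0.0) / 3
```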
We assume that multiple parallel processors are available, and that the simulation run on each processor generates one replication of the dependent sequence. Two situations are considered: (1) there is an urgent deadline for providing a quantile estimator, so we only have time to generate dependent sequences with a short run length; and (2) the deadline for providing a quantile estimator is not urgent, so we are able to generate dependent sequences with a sufficiently large run length.
Example 1: AR(1). We consider an AR(1) process. For one replication of the dependent sequence, the outputs are given by

$X_j = \mu + \phi\,(X_{j-1} - \mu) + \varepsilon_j$,

where $\{\varepsilon_j\}$ is a white noise process with zero mean and variance $\sigma^2$. We fix the mean and the variance, and vary the correlation parameter $\phi$ over 0.3, 0.5, and 0.9. We estimate the 50% and 95% quantiles of the outputs and compare the performance of the pooled estimator in (3) with the classical average quantile estimator in (4). The MSEs under the urgent and non-urgent deadlines are given in Figures 1 and 2, respectively.

Example 2: M/M/1 Queue. We consider a steady-state M/M/1 queueing system. We fix the arrival rate at 1 and vary the utilization (traffic intensity) between 0.7 and 0.9. We estimate the 50% and 95% quantiles of the time spent in the system and compare the performance of the pooled estimator in (3) with the classical average quantile estimator in (4). The MSEs under the urgent and non-urgent deadlines are given in Figures 3 and 4, respectively.
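For reference, the two data-generating processes can be sketched as follows; the AR(1) uses the standard mean-reverting recursion and the M/M/1 sojourn times use the standard Lindley-type recursion (parameter names and defaults are ours).

```python
import numpy as np

def ar1_path(m, phi, mu=0.0, sigma=1.0, seed=None):
    """One AR(1) replication: X_j = mu + phi*(X_{j-1} - mu) + eps_j,
    started from the stationary distribution to avoid warm-up."""
    rng = np.random.default_rng(seed)
    x = np.empty(m)
    x[0] = mu + rng.normal(scale=sigma / np.sqrt(1.0 - phi ** 2))
    for j in range(1, m):
        x[j] = mu + phi * (x[j - 1] - mu) + rng.normal(scale=sigma)
    return x

def mm1_sojourn_path(m, lam, rho, seed=None):
    """One M/M/1 replication of sojourn times (time in system) with
    arrival rate lam and utilization rho (service rate lam/rho), via
    the Lindley-type recursion T_j = max(T_{j-1} - A_j, 0) + S_j."""
    rng = np.random.default_rng(seed)
    a = rng.exponential(1.0 / lam, size=m)  # interarrival times
    s = rng.exponential(rho / lam, size=m)  # service times, mean rho/lam
    t = np.empty(m)
    t[0] = s[0]                             # first customer waits 0
    for j in range(1, m):
        t[j] = max(t[j - 1] - a[j], 0.0) + s[j]
    return t
```

Each call produces one replication; in the experiments above, one such path would be generated per processor and then fed to the pooled or average estimator.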
The results in Figures 1 and 3 show the performance of the quantile estimators under an urgent deadline (i.e., a short run length), whereas the results in Figures 2 and 4 show the performance under a non-urgent deadline (i.e., a sufficiently large run length). The y-axis represents the estimated MSE, the x-axis is the number of processors, and the different scenarios are labeled on top of each subfigure. For the cases representing an urgent deadline, the pooled quantile estimator gives smaller or competitive MSEs compared to the average quantile estimator. This demonstrates the benefit of using the pooled quantile estimator when there is not sufficient time to generate a lengthy sequence. For the cases representing a non-urgent deadline, we have sufficient time to generate dependent sequences with a relatively large run length on each processor. As also demonstrated in the theoretical comparison, the MSEs of the pooled estimator are significantly smaller than those of the average quantile estimator when the run length is sufficient.
5 Discussion
In summary, we study the pooled quantile estimator, which takes the sample quantile after pooling independently generated sample paths together. We develop the asymptotic representation of the proposed quantile estimator generated from multiple replications of dependent sequences. Compared with the classical average quantile estimator, the pooled quantile estimator demonstrates better asymptotic and finite-sample performance in the context of parallel simulation. The pooled quantile estimator can be advanced by combining it with various existing variance reduction techniques in the literature. Hence, as a promising future direction, the accuracy of the pooled quantile estimator can be further improved by incorporating existing techniques such as control variates (e.g., [hsu1990control, hesterberg1998control]), importance sampling and stratified sampling (e.g., [glynn1996importance, glasserman2000variance, sun2010asymptotic]), antithetic variates and Latin hypercube sampling (e.g., [avramidis1998correlation, jin2003probabilistic]), as well as bias correction techniques (e.g., [matthys2004estimating, gomes2006bias, gomes2007sturdy]). Also, the $\phi$-mixing condition used in our development can be difficult to check, as mentioned in [alexopoulos2019sequest]; it is a promising future direction to extend the asymptotic properties in this paper under the milder conditions in [wu2005bahadur]. Finally, the trade-off between one single long replication and multiple shorter replications is an important issue for future study. If there is a choice to split one single replication into multiple replications, we may not simply assume that the total simulation cost is unchanged: the effort spent on the warm-up procedure to reach steady state should also be considered, and for the case with multiple replications, the warm-up cost of generating steady-state outputs cannot be neglected.
References
Appendix A Some Useful Lemmas
We first consider a sequence of 0–1 valued random variables satisfying the same $\phi$-mixing condition given through (5) and (6) and sharing the same marginal distribution . Assume that and ; then we have the following lemma by extending the results in Section 4 of [sen1972bahadur].
Lemma 1.
For a positive () such that the mixing condition holds, as and ,
(19) 
Proof.
By the Markov inequality, we have

We can rewrite with , and choose an integer , , and let be the largest positive integer such that ; notice that . Then, from the independence between replications and the inequality between the arithmetic and geometric means, we have
According to the mixing condition (5), for every , and from condition (6) and the choice of , we have,
Then, for ,
(20) 
and . Applying these recursively yields:
(21) 
Then, for any ,
(22) 
By selecting , we have

where the equality is obtained by applying Taylor's expansion to and , and requires . Notice that , for , we have , so (19) follows directly from the Borel–Cantelli Lemma [serfling2009approximation].
∎
We now consider a sequence of 0–1 valued random variables such that and for all and . We define , similarly as before, by replacing with .
Lemma 2.
If there exist positive and such that , then for every positive and , there exist positive and such that, for ,
(23) 
Proof.
For simplicity, let , and then . Following a similar procedure as in the proof of Lemma 1,

where is defined similarly as in Lemma 1, by replacing with . Again we choose ; then (20) and (21) still hold. Then, for any ,
By selecting , we have

where the last equality is obtained by applying Taylor's expansion to and , and requires . Since and , we have that
(24) 