# A Multi-Level Simulation Optimization Approach for Quantile Functions

The quantile is a popular performance measure for evaluating the variability and risk of a stochastic system. To reduce risk, decision makers are typically interested in selecting actions that minimize the tail quantiles of some loss distribution. When the loss distribution is observed via simulation, evaluating and optimizing its quantile functions can be challenging, especially when the simulations are expensive, as a large number of simulation runs may be needed to obtain accurate quantile estimators. In this work, we propose a multi-level metamodel (co-kriging) based algorithm to optimize quantile functions more efficiently. Exploiting the non-decreasing property of quantile functions, we first search on cheaper and informative lower quantiles, which are more accurate and easier to optimize. The quantile level then increases iteratively toward the objective level while the search focuses on the promising regions identified at the previous levels. This enables us to leverage the accurate information from the lower quantiles to find the optimum faster and improve algorithm efficiency.


## 1 Introduction

Traditionally, the mean of the response is a widely used performance measure of a stochastic system. However, the mean itself cannot evaluate the possible variability or describe the entire distribution adequately. To provide a more thorough profile of the response distribution, the quantile has become increasingly popular and of great interest in many fields, including insurance, engineering safety, finance, and healthcare (wipplinger2007philippe, morgan1996riskmetrics, cope2009challenges). In risk management, the quantile, also termed Value-at-Risk (VaR), is one of the primary risk measures used to quantify and interpret the risk that a system may face. For instance, in the finance industry, the quantile of a loss distribution represents the lower bound of the large losses that an investor can suffer from an activity, where the large losses are defined to be the upper α-tail of the distribution with α very close to 1 (such as 0.95 or 0.99) (hong2009simulating).

Optimizing the quantiles of loss functions is a common practice for decision makers to manage risk. In this case, searching for the design with the smallest α-quantile of the loss returns the desired decision. More formally, for a design choice x ∈ 𝒳 (where 𝒳 is the design space, assumed compact), we want to minimize the α-quantile, v_α(L(x)), of the loss function L(x) (with F(·; x) and f(·; x) defined as the cumulative distribution function and probability density function of L(x)):

 min_{x ∈ 𝒳} v_α(L(x)). (1)

As large losses are typically of interest, in this work we consider high quantiles of L(x), whose level α is close to 1.
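As a toy sketch of problem (1), the following snippet minimizes the empirical α-quantile of a hypothetical loss over a one-dimensional grid by brute-force simulation. The loss model, sample size, and grid here are illustrative assumptions, not part of the paper's method:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_loss(x, n):
    """Hypothetical loss L(x): normal with x-dependent mean and spread."""
    return rng.normal(loc=x**2 - x, scale=0.5 + 0.5 * x, size=n)

def empirical_quantile(samples, alpha):
    # sample quantile via order statistics (see Section 2)
    return np.sort(samples)[int(np.ceil(alpha * len(samples))) - 1]

alpha = 0.95
grid = np.linspace(0.0, 2.0, 21)
estimates = [empirical_quantile(simulate_loss(x, n=10_000), alpha) for x in grid]
x_best = grid[int(np.argmin(estimates))]
```

With expensive simulators, such exhaustive grid evaluation is exactly what the metamodel-based approach in this paper tries to avoid.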

### 1.1 Motivation

The optimization problem (1) can be challenging for a few reasons. First, the loss function L usually has no closed form and is difficult or expensive to observe from the real system. Instead, simulation engines for L are built, such as financial models for risk management. Therefore, optimizing v_α(L(x)) is often conducted via simulation, and many Monte Carlo methods based on simulation results have been developed (see hong2014monte for a review). Second, even with a simulation model for L, (1) is still not easy to solve, as v_α(L(x)) is not directly returned by the simulation but must be estimated from its results. When α is large, a large number of simulations may be required to estimate it precisely. This can be seen from the noise of the quantile estimator: if we denote by n the number of simulation replications at x, the noise variance of the empirical quantile estimator is approximately α(1−α)/(n f(v_α; x)²) (bahadur1966note). For α close to 1, the density f(v_α; x) is typically quite small, especially for heavy-tailed distributions. Therefore, a large number of simulations is required to obtain an accurate quantile estimator. The simulation models, however, can be very complicated and time-consuming due to the complex nature of the real system. This restricts the applicable number of simulation runs and makes it almost impossible to obtain results for every considered design with a limited budget. Third, the quantile functions may be non-convex and thus difficult to optimize.
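The asymptotic approximation α(1−α)/(n f(v_α)²) can be checked numerically. This sketch (our own illustration, using a standard normal loss) compares it against the Monte Carlo variance of the sample quantile:

```python
import numpy as np
from scipy.stats import norm

alpha, n = 0.99, 1000
v_alpha = norm.ppf(alpha)                                  # true 0.99-quantile
asym_var = alpha * (1 - alpha) / (n * norm.pdf(v_alpha) ** 2)

# Monte Carlo variance of the sample quantile over many replications
rng = np.random.default_rng(1)
reps = 2000
idx = int(np.ceil(alpha * n)) - 1
est = np.sort(rng.standard_normal((reps, n)), axis=1)[:, idx]
mc_var = est.var()
```

The two variances should agree closely for this light-tailed example; for heavy-tailed losses, f(v_α) shrinks and the required n grows accordingly.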

Given these challenges, optimization algorithms via simulation can be designed to solve (1). A proper algorithm should have at least the following two characteristics. First, it should not require strict properties of the objective function, such as convexity. Second, it should be efficient and usable for expensive simulations with a limited budget. In this work, we aim to develop a metamodel-based simulation optimization algorithm that satisfies both characteristics.

### 1.2 Literature Review

The main idea of metamodel-based simulation optimization is to introduce a statistical model to guide the search when optimizing black-box functions. With a limited budget, we can only observe the objective functions at a small number of design inputs; in the unexplored regions, the metamodel serves as an approximation of the true response surface. It provides information about the entire space and helps decide new points to locate the optimum efficiently. This type of approach has been used successfully in optimizing expensive functions (jones1998efficient, srinivas2009gaussian, regis2007stochastic, muller2017socemo). It can be classified with respect to the type of metamodel adopted. Commonly used metamodels include polynomial regression, radial basis functions, Gaussian process models, and artificial neural networks (see barton2006metamodel and jones2001taxonomy for reviews). Among these methods, the Gaussian process (GP, also termed kriging) model has become popular as it provides an estimate of the prediction uncertainty, which can be used to construct the selection criterion for further design choices. In this work, we also adopt a GP-type metamodel.

Based on the GP model, a few different simulation optimization approaches have been proposed. For deterministic problems (the objective function in (1) is replaced by a deterministic function), the Efficient Global Optimization (EGO) algorithm (jones1998efficient) with the Expected Improvement (EI) criterion is the most widely used for its ability to balance exploration (searching unexplored regions) and exploitation (searching the current promising region). As the function value can be simulated with no error, no replications at each design input are needed, and thus EGO only considers how to select new design points. In parallel with the EGO paradigm, a few other algorithms were developed. The GP upper confidence bound (GP-UCB) algorithm provides an alternative that negotiates exploration and exploitation through a tuning hyperparameter balancing the two (srinivas2009gaussian). The stepwise uncertainty reduction (SUR) approach reduces an uncertainty measure with sequentially selected design points (picheny2015multiobjective). Moreover, some information-based algorithms were developed that consider the distribution of the global minimizer (see shahriari2016taking for a review). For stochastic simulations, the response is simulated with noise and its expected value is often the quantity being optimized. The noise in the response then needs to be handled by the metamodel and the optimization algorithm. For responses with homoscedastic noise, huang2006global proposed Sequential Kriging Optimization (SKO) with the nugget-effect GP as the metamodel; it introduced an augmented EI to consider the ‘usefulness’ of additional replications at a location. For responses with heterogeneous noise, picheny2013quantile proposed the Expected Quantile Improvement (EQI), an extension of EI, to account for the known noise levels at both the already observed design points and the future candidate. When the noise levels are unknown, algorithms were recently proposed, including Two-Stage Sequential Optimization (TSSO) (quan2013simulation) and extended TSSO (eTSSO) (Pedrielli2018etsso), with the stochastic GP (ankenman2010stochastic, yin2011kriging) as the metamodel. They combine the Optimal Computing Budget Allocation (OCBA) technique (chen2000simulation), which decides the number of replications at each design point, with the EI criterion. The spatial uncertainty of the GP model and the noise of the observations are then reduced iteratively through a global and local search.

Instead of optimizing the expectation of stochastic functions, in this work we aim to optimize the quantile functions of loss distributions. Therefore, a metamodel for the quantile function is required. Developing metamodels for quantiles has been extensively studied (koenker2005quantile, dabo2010robust, chen2009metamodels). Among these models, quantile regression (QR) (koenker2005quantile) is the primary and most widely used. Recently, the stochastic GP model has been generalized for quantile metamodeling (chen2016efficient). It shows competitive performance compared with the QR model and thus enables us to integrate the GP model into optimization algorithms for quantiles.

### 1.3 Illustration & Contributions

Using the generalization of chen2016efficient, we can extend the eTSSO algorithm for Quantile (eTSSO-Q) optimization. This extended algorithm, however, can still be costly for high quantiles with a limited computing budget. To address this challenge and further improve efficiency, we propose a novel eTSSO-Q Multi-Level (eTSSO-QML) algorithm, which is the main contribution of this work. Different from traditional approaches, which directly optimize the quantile function at the objective level, eTSSO-QML starts by optimizing some lower quantiles. Typically, the lower quantiles are cheaper and easier to estimate, and their estimates are likely to be less noisy than those of a high quantile (bahadur1966note).

We next illustrate this idea with a test function from shim2009non (the quantile functions are shown in Figure 1).

In the example, we first optimize the more accurate lower levels (0.5, 0.75) to identify promising regions (near 0 and 1.5 in Figure 1). As the algorithm proceeds, more simulation replications are assigned and the quantile estimators at higher levels (0.9) improve. We then increase the level of the quantile being optimized iteratively up to the objective level. At the same time, the search process is guided by the metamodels for these increasing levels with a focus on the possible promising regions identified by the previous levels. In contrast, if we directly optimize the 0.9 quantile, due to the inaccurate quantile estimators obtained with a limited budget, the constructed metamodel can be very unreliable and can mislead the search, resulting in inefficient usage of the budget.

Optimizing the lower quantile functions can be informative for the objective level for a few reasons. First, the quantile functions at different levels are likely to be correlated since they come from the same loss distribution (wang2017joint). In this case, as the quantile level approaches the objective level, the shapes of the surfaces tend to be similar, and thus the solutions found at the previous levels are likely to be promising for the objective level. Second, consider the two levels 0.5 and 0.9 in the example above. For x near 0.8, the 0.5 quantile is very large, leading us to conclude that the 0.9 quantile there will be even larger due to the non-decreasing property of quantile functions (i.e., v_{α₁}(L(x)) ≤ v_{α₂}(L(x)) for α₁ ≤ α₂). In particular, when we see that the 0.5 quantile near 0.8 is already larger than the best 0.9 quantile found elsewhere, it is obvious that this region cannot be optimal for the 0.9 quantile, and hence there is no need to allocate further replications to the region near 0.8. In this sense, leveraging the lower levels may help eliminate some bad regions and thus improve algorithm efficiency.
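The elimination step described above can be sketched as a simple screening rule. Here, the grid, estimates, and function name are hypothetical illustrations of the monotonicity argument, not the paper's implementation:

```python
import numpy as np

def prune_by_lower_quantile(q_low, best_high):
    """Indices where the lower-quantile estimate already exceeds the current
    best high-quantile value. By monotonicity (v_a1 <= v_a2 for a1 <= a2),
    the high quantile at these points is at least as large, so they cannot
    be optimal for the higher level."""
    return np.flatnonzero(np.asarray(q_low) > best_high)

# hypothetical estimates on a 5-point grid
q50 = np.array([1.0, 0.4, 2.5, 0.6, 3.1])   # 0.5-quantile estimates
best_q90 = 1.2                               # best 0.9-quantile found so far
dropped = prune_by_lower_quantile(q50, best_q90)   # points 2 and 4 are ruled out
```

Replications saved at the pruned points can then be reallocated to the surviving promising regions.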

More formally, with eTSSO-QML, we consider the problem where the α-quantile of the loss function L is to be minimized, leveraging the lower quantiles α₁ < α₂ < ⋯ < α_m = α. Our main contributions can be summarized as follows:

1. We propose a multi-level co-kriging model for the quantile functions. This model ensures that the predictive curves for different quantiles do not cross and thus the non-decreasing property of quantile functions is maintained.

2. With the proposed metamodel, we design the eTSSO-QML algorithm. This algorithm leverages the multi-level model and starts by searching the informative and cheaper lower quantiles to quickly identify promising regions for the objective α-quantile level.

3. We prove the convergence of eTSSO-QML and test its empirical performance with several numerical examples.

The rest of this article is organized as follows. Section 2 reviews co-kriging model basics, and Section 3 extends the model to the multi-level quantile case. Section 4 provides details of the eTSSO-QML algorithm, and Section 5 states its convergence results. Section 6 provides numerical examples to show the effectiveness of eTSSO-QML. Section 7 summarizes the work and discusses future directions. The proofs of all lemmas and theorems are provided in the supplementary material.

## 2 A Review of Stochastic Co-Kriging Model Basics

To jointly model these quantile functions, we propose to use co-kriging. It was originally developed to model deterministic multi-fidelity problems (where a response can be observed with different fidelities) (kennedy2000predicting) and has recently been extended to stochastic simulation metamodeling for expectations (chen2017stochastic). In this section, we briefly review some basics of the stochastic co-kriging model.

Here, we first introduce some notation and the simulation background. To develop a stochastic kriging model, replications of the experiments are required; that is, at each design input x, several simulation runs are needed. Throughout this work, we use L(x) to represent the results of the simulations (or equivalently, the random samples of the loss distribution from the simulations). For instance, when there are n simulations at x, we observe n results L₁(x), …, L_n(x), where the subscript indexes the randomness of each simulation. With these simulation results, a point estimate for the response of interest (denoted by Y(x)) can be obtained. For instance, when modeling the quantile function, Y(x) is the sample α-quantile L₍⌊αn⌋₎(x), where L₍ᵣ₎(x) is the r-th order statistic of the sample L₁(x), …, L_n(x). Due to the limited number of simulation runs that can be conducted, Y(x) is a noisy estimate. The point estimates taken at all design inputs can then be used to develop a predictive model.

The standard stochastic co-kriging model is designed for expectations of a series of stochastic responses. Models at different levels satisfy the following relations:

 Y_l(x) = Z_l(x) + ϵ_l(x) = ρ_{l−1} Z_{l−1}(x) + δ_l(x) + ϵ_l(x),  if 1 < l ≤ m,
 Y_l(x) = Z_l(x) + ϵ_l(x) = δ_l(x) + ϵ_l(x),  if l = 1,

where Y_l(x) and Z_l(x) represent the noisy and noise-free responses at level l, respectively, and δ_l(x) (l = 1, …, m) are independent second-order stationary GPs (santner2013design). In each model, for any finite set of design points, the GP values follow a multivariate Gaussian distribution with mean h_l(x)ᵀβ_l and pairwise covariance determined by the variance σ_l² and a correlation function. Here, h_l(x) is a vector of known functions and β_l is a vector of model parameters. Without prior knowledge of the mean functions, a constant mean is used in this work for illustration. For the correlation function, we adopt the popular Gaussian function r_l(x, x′) = exp(−Σ_d θ_{l,d}(x_d − x′_d)²), where x_d is the d-th coordinate of x and θ_{l,d} is the sensitivity parameter determining how large the correlation is in each dimension of x. The random noises ϵ_l(x), l = 1, …, m, follow a multivariate normal distribution with zero mean. These noises are assumed to be independent of the GPs δ_l(x). It is clear that in this model, Z_l(x) is represented by a scaled Z_{l−1}(x) term plus a difference term δ_l(x). This type of autoregressive structure was first introduced by kennedy2000predicting for deterministic multi-fidelity problems.
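To make the autoregressive structure concrete, the following sketch simulates two levels satisfying Z₂(x) = ρ₁ Z₁(x) + δ₂(x), with independent Gaussian-kernel GPs for the δ terms. The kernel settings and ρ value are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

def gaussian_cov(x, sigma2=1.0, theta=5.0):
    """Gaussian (squared-exponential) covariance on a 1-D grid."""
    d = x[:, None] - x[None, :]
    return sigma2 * np.exp(-theta * d**2)

x = np.linspace(0, 1, 50)
K = gaussian_cov(x)
jitter = 1e-8 * np.eye(len(x))                 # numerical stabilizer

# level 1: Z_1(x) = delta_1(x); level 2: Z_2(x) = rho_1 * Z_1(x) + delta_2(x)
delta1 = rng.multivariate_normal(np.zeros(len(x)), K + jitter)
delta2 = rng.multivariate_normal(np.zeros(len(x)), 0.2 * K + jitter)
rho1 = 0.9
z1 = delta1
z2 = rho1 * z1 + delta2
```

When the δ₂ term has small variance, the two surfaces are strongly correlated, which is exactly the regime where information from the lower level is most useful.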

When the estimates of the responses at some selected design points are obtained, the prediction at any unknown point in the design space can be computed from the co-kriging model. Denote by D the set of design points, with k representing its cardinality, and by Y_l(x) the point estimate of the level-l response at x ∈ D. We assume the design sets for all levels are the same, and thus for all x ∈ D the estimates Y₁(x), …, Y_m(x) are available. This assumption holds in our multi-level quantile case since the point estimators for all desired quantiles can be obtained from the order statistics of the same sample L₁(x), …, L_n(x). With Y = (Y₁ᵀ, …, Y_mᵀ)ᵀ, where Y_l is the point-estimate vector of the l-th level at the points in D, the predictor of Z_l(x) at any unobserved point x and its predictive variance can be derived as (chen2017stochastic):

 Ẑ_l(x) := h_l(x)ᵀ β̂ + t_l(x)ᵀ R⁻¹ (Y − H β̂), (2)
 var(Ẑ_l(x)) := σ_l² + Σ_{j=1}^{l−1} (P_j^{l−1})² σ_j² − t_l(x)ᵀ R⁻¹ t_l(x) + ζ_l(x)ᵀ (Hᵀ R⁻¹ H)⁻¹ ζ_l(x). (3)

The notation used in (2) and (3) is listed in Table 1. If only one response is considered (m = 1), we recover the stochastic GP model (ankenman2010stochastic). Furthermore, if that single response is observed with no noise, we recover the deterministic GP model.

The above results assume that the hyperparameters and the covariance matrix of the noise are known. When building the model in practice, these are typically unknown and must be estimated. Depending on how they are estimated, we separate them into two categories: model inputs and model parameters. The model inputs include the point-estimate vector Y and the estimator of the associated noise covariance matrix. These estimators are drawn directly from the initial simulation results and serve as inputs to the co-kriging model. For the standard stochastic co-kriging model for mean performance measures, the inputs are the sample means and the sample covariance of the mean estimates. The remaining hyperparameters are referred to as model parameters and can be estimated by maximizing the likelihood function of the point-estimate vectors (see Appendix A for the likelihood function and a detailed discussion). After this, the predictor (2) and predictive variance (3) can be obtained by plugging in the estimated parameters.

## 3 Stochastic Co-kriging Model for Quantiles

When applied to quantile predictions, the predictive model structures remain the same as (2) and (3). However, several important adaptations are required. First, we need proper approaches to estimate the model inputs, which are the point estimates and noise covariance matrix for quantiles rather than for expectations as in the traditional co-kriging model. Section 3.1 introduces the estimation of these inputs and derives some of their properties. Furthermore, due to the non-decreasing property of quantiles, the predictive curves for different quantile levels should not cross (a criterion not considered in traditional co-kriging models). In Section 3.2, we propose a penalized maximum likelihood estimation (PMLE) approach to ensure non-crossing of our estimates.

### 3.1 Estimation of Model Inputs

In practice, the point estimates and their noise covariances are calculated from simulation results and then plugged into (2) and (3). Specifically, given n original simulation results at x, we can easily obtain the point estimates for levels α_j and α_k:

 Y_j(x) = L₍⌊α_j n⌋₎(x),  Y_k(x) = L₍⌊α_k n⌋₎(x).

Following the recommendations of chen2016efficient, who tested different approaches for estimating the noise variance of quantile estimates used in the GP model, including batching (seila1982batching), sectioning (asmussen2007stochastic), sectioning-batching (nakayama2014confidence), and jackknifing (nance2002perspectives), we use the sectioning method to estimate the noise variances. Furthermore, as the noises of these two estimates are correlated, since they are drawn from the same simulation results, in this section we also propose a sectioning method to estimate this noise covariance and derive the asymptotic properties of the estimator.

With the sectioning method, the n simulation runs are first divided into n_b batches with n_s runs in each batch (n = n_b · n_s). The covariance of Y_j(x) and Y_k(x) is then estimated from the quantile estimators computed with all n simulation runs at x, Y_j(x) and Y_k(x), and the within-batch estimators Y_{j,l}(x) and Y_{k,l}(x), where Y_{j,l}(x) and Y_{k,l}(x) are the sample α_j and α_k quantiles of the l-th batch, l = 1, …, n_b:

 v̂ar(ϵ_i(x)) = v̂ar(Y_i(x)) = 1/(n_b(n_b−1)) Σ_{l=1}^{n_b} (Y_{i,l}(x) − Y_i(x))², (4)
 ĉov(ϵ_j(x), ϵ_k(x)) = ĉov(Y_j(x), Y_k(x)) = 1/(n_b(n_b−1)) Σ_{l=1}^{n_b} (Y_{j,l}(x) − Y_j(x)) (Y_{k,l}(x) − Y_k(x)), (5)
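The sectioning estimators (4) and (5) can be sketched in a few lines. The batch count and sample here are illustrative choices:

```python
import numpy as np

def sectioning_cov(samples, alphas, n_b):
    """Sectioning estimate of the noise covariance matrix of sample-quantile
    estimators at several levels, following (4)-(5): split the n runs into
    n_b batches, compute per-batch quantiles, and average cross-products of
    deviations from the full-sample quantiles."""
    n = len(samples)
    n_s = n // n_b

    def q(vals, a):
        return np.sort(vals)[int(np.ceil(a * len(vals))) - 1]

    full = np.array([q(samples, a) for a in alphas])          # Y_i(x)
    batches = samples[: n_b * n_s].reshape(n_b, n_s)
    per_batch = np.array([[q(b, a) for a in alphas] for b in batches])  # Y_{i,l}(x)
    dev = per_batch - full                                    # (n_b, m)
    return dev.T @ dev / (n_b * (n_b - 1))                    # m x m estimate

rng = np.random.default_rng(3)
cov_hat = sectioning_cov(rng.standard_normal(5000), alphas=(0.9, 0.95), n_b=10)
```

The diagonal entries correspond to (4) and the off-diagonal entries to (5); the resulting matrix feeds directly into the co-kriging model as a noise-covariance input.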
###### Assumption 3.1.

For all x ∈ 𝒳, L(x) has a continuous distribution with density function f(·; x) and finite mean and variance. The density f(·; x) has bounded first-order derivatives in a neighborhood of v_α(x), with f(v_α(x); x) > 0, where v_α(x) is the true α-quantile.

Under Assumption 3.1, chen2016efficient has shown that the sectioning variance estimator (4) is asymptotically unbiased with vanishing mean squared error (MSE) as n → ∞. Following a similar approach, we can also prove the asymptotic properties of the proposed noise covariance estimator (5).

###### Theorem 3.2.

Under Assumption 3.1, as n → ∞, the estimator (5) is consistent and asymptotically unbiased, and its MSE vanishes asymptotically.

### 3.2 A PMLE Approach to Avoid the Crossing Problem

Traditional QR models quantile functions at different levels separately, which can result in crossing between different quantile predictive curves. This, for example, can cause the predictive value of the 0.95 quantile at some points to be larger than that of the 0.99 quantile. This crossing phenomenon is a widely acknowledged problem in quantile modeling and can lead to an invalid distribution of the response and problematic inferences (koenker1984note, cole1988fitting, he1997quantile). For our quantile co-kriging model, preventing crossing to ensure monotonicity not only improves inference but, more importantly, ensures that the multi-level search in the optimization algorithm is valid and efficient. If crossing happens between two quantile models, the non-promising region identified by the lower quantile model can be misleading: since the higher quantile prediction can fall below the lower one, it can have promising (and even optimal) values in those supposedly non-promising regions. Therefore, it is vital for model validity and optimization efficiency to ensure non-crossing in the models before the optimization process. Although the co-kriging model fits multiple quantiles jointly, non-crossing is not guaranteed. Note that in traditional applications of the co-kriging model, where deterministic or mean responses are typically modeled, crossing is not a problem.

In this section, we propose a new penalized version of the stochastic co-kriging model to prevent crossing for the quantile models. In our multi-level quantile problem, there is no crossing when:

 Ẑ_{l+1}(x) − Ẑ_l(x) ≥ 0,  ∀ x ∈ 𝒳, l = 1, …, m−1.

In other words, the difference between the predictive curves for any two successive quantiles should be non-negative across the design space. We first propose a penalized GP model that can ensure non-negative predictions for a single deterministic response (which can be considered as the difference between two quantiles and thus is non-negative everywhere in the design space) and then apply it to our multi-level model.

Consider first a deterministic GP model for a non-negative function (to distinguish it from the model in the previous section, we use W(x) to represent this response and W for the vector of observations):

 W(x) = f(x)ᵀ β + M(x; θ), (6)

where f(x) is a vector of known functions, β is a vector of model parameters, and M(x; θ) is assumed to be a zero-mean second-order stationary GP controlled by hyperparameters θ. Given that the true function is non-negative and that the observations have no noise, the observation vector W is non-negative. The standard GP model for a deterministic function is an interpolation of the observations, and the shape of the predictive curve changes with the hyperparameters θ. Therefore, when estimating θ, we must make sure that the resulting curve does not cross the plane W = 0. In other words, values of θ that cause such an intersection should be eliminated. This intuition naturally translates into the following penalization method. Instead of optimizing the ordinary log-likelihood of θ, we propose to minimize the following penalized likelihood function to obtain the PMLE estimator of θ:

 Q(W, θ) := −l(W, θ) + P(W, θ) = ½ ln|R′| + ½ (W − F β̂′)ᵀ (R′)⁻¹ (W − F β̂′) + λ · κ(W, θ),

where l(W, θ) is the ordinary log-likelihood function, P(W, θ) = λ · κ(W, θ) is the penalty term, λ is a non-negative penalty coefficient, F is the design matrix of f(·) at the design points, R′ is the covariance matrix for W, and κ(W, θ) is a penalty that is positive whenever the predictive curve Ŵ(x) falls below zero.

With this penalty term, parameter values that cause the predictive curve to go below the plane W = 0 are penalized. Theorem 3.3 demonstrates the consistency of the parameters estimated with this approach. It is established based on the asymptotic properties of the MLE for the ordinary GP model (denoted θ̂_MLE). Specifically, under certain regularity conditions, √k (θ̂_MLE − θ₀) converges in distribution to a normal limit with covariance given by the inverse Fisher information matrix, where k is the number of design points (mardia1984maximum).
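A minimal 1-D sketch of the PMLE idea follows, penalizing negative predictions on a discrete check grid. The constant-mean GP, the λ value, the check-grid discretization of the penalty, and the choice of optimizer are our illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np
from scipy.optimize import minimize

def gp_fit_pmle(x, w, grid, lam=10.0):
    """Fit the Gaussian-kernel sensitivity parameter theta by minimizing a
    penalized negative log-likelihood: standard GP likelihood plus a penalty
    on how far the predictive curve dips below zero on `grid`."""
    def nll_pen(log_theta):
        theta = np.exp(log_theta[0])
        R = np.exp(-theta * (x[:, None] - x[None, :]) ** 2) + 1e-8 * np.eye(len(x))
        Rinv = np.linalg.inv(R)
        one = np.ones(len(x))
        beta = one @ Rinv @ w / (one @ Rinv @ one)     # GLS constant-mean estimate
        resid = w - beta
        nll = 0.5 * np.linalg.slogdet(R)[1] + 0.5 * resid @ Rinv @ resid
        # predictive curve on the check grid (interpolator of the data)
        r = np.exp(-theta * (grid[:, None] - x[None, :]) ** 2)
        pred = beta + r @ Rinv @ resid
        penalty = lam * np.sum(np.abs(np.minimum(pred, 0.0)))
        return nll + penalty
    res = minimize(nll_pen, x0=[0.0], method="Nelder-Mead")
    return np.exp(res.x[0])

x = np.linspace(0, 1, 8)
w = np.abs(np.sin(3 * x)) + 0.1                 # non-negative observations
theta_hat = gp_fit_pmle(x, w, np.linspace(0, 1, 60))
```

In the multi-level setting, the same penalty is applied to the difference between successive predictive quantile curves rather than to a single response.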

###### Theorem 3.3.

Denote by θ₀ the true value of θ for model (6) and by k the number of design points. There exists a local minimizer θ̂ of Q(W, θ) such that ∥θ̂ − θ₀∥ = O_p(k^{−1/2}).

This PMLE approach involves an optimization problem over the predictive surface Ŵ(x). We highlight that this optimization is much easier than optimizing the true unknown response surface, since the predictive response function is much cheaper to evaluate and has an explicit form. When applied in our case, where the function of interest (the difference between two quantile functions) is stochastic, this method can still return non-negative predictions by preventing crossing between the predictive curves and the plane W = 0. With a slight modification of the penalty function κ, the method applies directly to our multi-level quantile problem:

 κ(φ) = { |φ|,  if φ < 0
          0,   if φ ≥ 0,

where φ denotes the pointwise difference between two successive predictive quantile curves, Ẑ_{l+1}(x) − Ẑ_l(x). It is easy to see that the parameters will be penalized once crossing happens between any two successive predictive curves among the m quantile models.

With the approaches proposed here, we can build a co-kriging model for multi-level quantiles that does not cross. As mentioned before, there exist other approaches that address this more rigorously. For GP-based models, more complicated and refined methods have been proposed to ensure positive response predictions (szidarovszky1987kriging, dowd1982lognormal). Compared with those methods, the PMLE approach keeps the nice autoregressive structure and is convenient to use and to integrate into the multi-level algorithm. From a pragmatic viewpoint, as the metamodel here mainly serves as an aid to the optimization process, we do not consider more sophisticated techniques and simply apply the PMLE approach.

## 4 Multilevel Quantile Optimization (eTSSO-QML) Algorithm

This section presents the eTSSO-QML algorithm, which optimizes the α-quantile with a multi-level model built from the lower quantiles. As previously noted, the optimization process is guided by the proposed stochastic quantile co-kriging model. It starts by searching the lower quantiles and then searches the promising regions for higher quantiles identified by the lower ones. The algorithm is fully sequential: the overall computing budget is allocated iteratively. Within each iteration, we apply the two-stage framework from the eTSSO algorithm to provide a “division of labor” (Pedrielli2018etsso). In the first stage (Searching Stage), we adopt the EI criterion to select a new design input with the highest probability of achieving a better result than the current best. The second stage (Allocation Stage) focuses on distributing additional simulation replications to the existing design points. This improves the model fit and increases our confidence in correctly identifying the optimum from the estimators. The division of budget between these two stages is allowed to change across iterations. At the beginning of the algorithm, as little is known about the response, more budget is used to search the design space and identify the promising regions; towards the end, more budget is saved for the Allocation Stage, since the emphasis shifts to refining the point estimates at already selected designs once we are in proximity to the promising regions. These two stages are discussed in detail in Section 4.3 and Section 4.4, after an overview of the algorithm in Section 4.1 and an introduction of the algorithm parameters in Section 4.2. Our algorithm is based on the eTSSO procedure, and we refer interested readers to Pedrielli2018etsso for full details.

### 4.1 Algorithm Overview

Before describing the algorithm, we list key parameters in Table 2.

The first five parameters are user-defined to start the algorithm. The total number of replications is typically determined by the computing budget, and for the initial design set, if no prior knowledge or preference is available, users can apply non-informative design strategies such as uniform or Latin hypercube sampling. The quantile levels should also be specified in advance. As noted above, to optimize a high quantile α, we start with a base level α₁. This level should not be too high; we suggest a moderate value based on our experience. For the remaining levels, we consider fixed, evenly distributed inter-levels between α₁ and α. The number of levels, m, can be selected depending on the budget. A larger m can slow the approach to the objective level and increase the co-kriging model complexity; however, as the difference between successive levels becomes smaller, the correlation between them increases, and thus the promising regions identified by the lower levels become more accurate. In contrast, a smaller m can reduce the computational burden but may weaken the correlation among the levels adopted. For any newly selected design point, we first assign a fixed initial number of replications, chosen to ensure that the point estimates for the base level have reasonable accuracy. A tolerance parameter is used to examine the accuracy of a quantile estimator: only estimators whose noise variance is smaller than the tolerance are accepted. These two parameters can be chosen through a cross-validation test over the initial design set: start with a small initial number of replications and iteratively increase it until the model for α₁ built with the initial design and replications passes the cross-validation test; after that, the tolerance can be selected from the estimated noise variances at the initial design points. The other algorithm parameters are updated at each iteration and are described in detail in Section 4.2.

The eTSSO-QML algorithm is an iterative algorithm, alternating between the Searching Stage and the Allocation Stage until the computing budget runs out. We illustrate the overall idea of the algorithm with the example in Figure 1. At the start of the algorithm, a small budget is applied. With a small number of replications, the point estimates of the target quantile (0.9) can be inaccurate, with high uncertainty. At this stage, a more reliable lower quantile model (0.5) is first built and used to guide the initial search. In other words, we optimize the first level as a start to identify possible promising regions (such as the regions near 0 and 1.5). As the algorithm proceeds, more budget is allocated and the accuracy of the higher quantile estimators improves. The algorithm then increases stepwise the level of the quantile metamodels developed, and uses the current highest level to guide the search. As a result, the algorithm gradually optimizes higher and higher quantile levels with a focus on the promising regions identified by previous levels, finally optimizing the quantile at the target level α. The eTSSO-QML algorithm is described in Algorithm 1. In Section 4.3 and Section 4.4, we describe the Searching Stage and the Allocation Stage in further detail.

### 4.2 Modeling Update in Each Iteration

In each iteration, the models (2) and (3) and the algorithm parameters are updated. In iteration $k$, the search is guided by the $h(k)$-th level, which is the level to be optimized. The value of $h(k)$ gradually increases from 1 to the objective level, and we choose it as follows. For each design point, we find the largest level at which the quantile estimator drawn from the simulation results has noise variance smaller than the tolerance; collecting these points for each level yields, for every level, a design set consisting of points with acceptable accuracy at that level. We then select $h(k)$ as the largest level whose design set is sufficiently large, choose it as the current highest level, and build a multi-level model up to level $h(k)$ in iteration $k$. However, an increasing current highest level naturally increases the model complexity, so we would like to select some, but not all, of the quantile levels to build the co-kriging model.
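The level-update rule can be sketched as follows. The names `var_est`, `C0`, and `min_points` are illustrative, and the paper's exact threshold for accepting a level is not reproduced here; this is a reading of the rule, not the paper's code.

```python
import numpy as np

def current_levels(var_est, C0, min_points):
    # var_est[l][i]: estimated noise variance of the level-l quantile
    # estimator at design point i. A point is acceptable at level l when
    # its variance falls below the tolerance C0.
    design_sets = [np.flatnonzero(np.asarray(v) < C0) for v in var_est]
    h = 0
    for l, pts in enumerate(design_sets):
        if len(pts) >= min_points:
            h = l          # keep the largest level with enough accurate points
    return h, design_sets
```

For instance, if the base level is accurate everywhere but the top level is accurate nowhere, the rule returns an intermediate level as the current highest level.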

In fact, as the algorithm proceeds, some inter-level quantiles become redundant. Consider, for example, an objective level of 0.95 with inter-levels at the 0.9 and 0.8 quantiles: if the two inter-levels have similar design sets with acceptable accuracy, we may remove the 0.8 quantile, as the 0.9 quantile is closer to our objective. This removal can be partly interpreted through the auto-regressive structure of the co-kriging model: the 0.95 quantile model depends on the previous levels only through the nearest level, the 0.9 quantile. Therefore, the 0.8 quantile function contributes nothing once the 0.9 quantile is reasonably good, and it can thus be removed. In this sense, we select a subset of the levels by removing redundant inter-levels and build a co-kriging model only for the retained levels; a lower level is kept only when it remains informative relative to the next level up. Selecting levels in this way also ensures that, asymptotically, only the objective level is selected and a single stochastic GP model is built. This is intuitive: when the number of replications assigned to each design point tends to infinity, the objective-quantile estimators become accurate and there is no need to leverage information from lower levels (see Section 5 for a detailed discussion).

The allocation budget changes with the iteration, and a specific choice will be introduced in detail in Section 4.4. The remaining algorithm parameters can be easily updated after the two stages finish. At the end of each iteration, we report the lowest observed quantile value at the objective level as the optimum found by that iteration.

### 4.3 Searching Stage

In the Searching stage, we select the next design input based on the following EI criterion (jones1998efficient):

$$x_{k+1} = \operatorname*{arg\,max}_{x \in \mathcal{X}} E\left[\max\left\{z^*_{\alpha_{h(k)}} - \widetilde{Z}_{\alpha_{h(k)}}(x),\, 0\right\}\right] = \operatorname*{arg\,max}_{x \in \mathcal{X}} \left\{ \hat{s}_{h(k)}(x)\,\phi\!\left(\frac{z^*_{\alpha_{h(k)}} - \widehat{Z}_{\alpha_{h(k)}}(x)}{\hat{s}_{h(k)}(x)}\right) + \left(z^*_{\alpha_{h(k)}} - \widehat{Z}_{\alpha_{h(k)}}(x)\right)\Phi\!\left(\frac{z^*_{\alpha_{h(k)}} - \widehat{Z}_{\alpha_{h(k)}}(x)}{\hat{s}_{h(k)}(x)}\right) \right\},$$

where $z^*_{\alpha_{h(k)}}$ is the lowest value of the predictive responses at the sampled design points, $\phi$ and $\Phi$ represent the probability density and cumulative distribution functions of the standard normal distribution, respectively, and $\widetilde{Z}_{\alpha_{h(k)}}(x)$ is a Gaussian random variable with distribution $N\big(\widehat{Z}_{\alpha_{h(k)}}(x), \hat{s}^2_{h(k)}(x)\big)$, where

$$\hat{s}^2_l(x) := \sigma^2_l + \sum_{j=1}^{l-1}\left(P^{l-1}_j\right)^2 \sigma^2_j - t_l(x)^T R_z^{-1}\, t_l(x) + \zeta_l(x)^T \left(H^T R_z^{-1} H\right)^{-1} \zeta_l(x). \tag{7}$$

Compared to (3), the predictive variance in (7) omits the noise-variance term and thus only accounts for the response covariance generated by the spatial process. The rationale is that the Searching Stage is designed to choose new points that reduce spatial uncertainty (the noise is taken care of by the Allocation Stage). Moreover, (7) ensures that the EI values at all selected design points are zero, so they will not be reselected in future iterations. With this criterion, we use the current highest level, $h(k)$, to guide our search. The EI criterion selects the point with the largest expected improvement over the current best. Typically, points with small response predictions (from current promising regions) or large predictive variances (from less explored regions) have large EI values; by selecting such points, EI balances exploitation and exploration. At the new design point, we run an initial batch of replications, add the point to the design set, and update the point-estimate vectors accordingly.
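Under the Gaussian predictive distribution above, the EI criterion has the standard closed form from (jones1998efficient). Below is a minimal sketch for a single candidate point; the function name and argument names are ours.

```python
import math

def expected_improvement(z_best, mean, sd):
    # EI for minimization: E[max{z* - Z(x), 0}] with Z(x) ~ N(mean, sd^2).
    # Closed form: sd * phi(u) + (z* - mean) * Phi(u), where u = (z* - mean)/sd.
    if sd <= 0.0:
        # At selected design points (7) gives zero spatial variance, and since
        # z* is the minimum predictive response there, EI reduces to zero.
        return max(z_best - mean, 0.0)
    u = (z_best - mean) / sd
    phi = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)   # standard normal pdf
    Phi = 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))          # standard normal cdf
    return sd * phi + (z_best - mean) * Phi
```

Maximizing this quantity over a candidate set (e.g., a fine grid or via a numerical optimizer) yields the next design point $x_{k+1}$.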

### 4.4 Allocation Stage

In this stage, we adopt the OCBA technique to allocate computing resources among the selected design points. The original OCBA technique, however, is designed for ranking-and-selection problems with finitely many alternatives. When the number of possible alternatives is infinite, we make the following assumption on the allocation rule to guarantee the convergence of the algorithm.

###### Assumption 4.1.

Suppose there exists a sequence such that , as and that . Denote as the cumulative number of replications assigned to the selected design point by iteration . It follows that for all .

This assumption has been used for problems with discrete but infinite alternatives (hong2006discrete, andradottir2006simulation). Although we consider optimization problems over a continuous domain, this assumption is important to ensure convergence (see the detailed discussion in Section 5). To fulfill it, in the Allocation Stage of eTSSO-QML, we first set aside some budget to ensure that each selected design point, including the input newly selected in the current iteration, receives a minimum number of replications. After this initial step, we adopt the OCBA technique to allocate the remaining replications.

As noted before, the budget of the Allocation Stage increases over iterations to refine the point estimates at the selected design inputs. This is intuitive: at the beginning, more budget can be used to explore the design space, while in later iterations we are more likely to be in proximity of the promising region. At that point, we can reduce the number of newly selected designs and assign more budget to the already sampled points in the promising region to refine our estimate of the optimum. We therefore let the budget $B_k$ increase with the iteration and update its value as follows:

$$B_k = \max\left\{|D_0| + \sum_{i=1}^{k}\max\left\{0,\, r_k - N_k(x_i)\right\},\ \left\lfloor B_{k-1}\left(1 + \frac{\max_{x_i \in D_k} \widehat{\mathrm{var}}\big(Y_{h(k)}(x_i)\big)}{\max_{x_i \in D_k} \widehat{\mathrm{var}}\big(Y_{h(k)}(x_i)\big) + \hat{s}^2_{h(k)}(x_{k+1})}\right)\right\rfloor\right\},$$

where $\widehat{\mathrm{var}}(Y_{h(k)}(x_i))$ is the sample noise variance of the point estimate at $x_i$ (estimated by (4)). This adaptive scheme was first adopted in the eTSSO algorithm. The growth of $B_k$ is controlled by the relationship between the point-estimator noise, measured by $\widehat{\mathrm{var}}(Y_{h(k)}(x_i))$, and the spatial uncertainty of the GP model, measured by the predictive variance (see equation (7)). At the beginning, when the spatial uncertainty is very large, $B_k$ grows slowly to save more budget for new design point selection. When the spatial uncertainty becomes smaller, i.e., the design space has been better explored, $B_k$ grows faster, focusing more on the already selected points in the promising regions. The advantage of this scheme is that it does not require user-defined budgets for each iteration. Furthermore, it can improve the identification of the optimum and lead to efficient use of the computing budget.
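The budget update can be sketched directly from the displayed formula; the argument names below are illustrative and the code is a reading of the rule, not the paper's implementation.

```python
def update_budget(B_prev, D0_size, r, N, noise_vars, spatial_var_new):
    # B_k = max{ |D_0| + sum_i max(0, r_k - N_k(x_i)),
    #            floor(B_{k-1} * (1 + max_noise / (max_noise + s^2(x_{k+1})))) }.
    # noise_vars: sample noise variances at the current design points;
    # spatial_var_new: predictive (spatial) variance at the newly chosen point.
    shortfall = D0_size + sum(max(0, r - n) for n in N)
    max_noise = max(noise_vars)
    grown = int(B_prev * (1.0 + max_noise / (max_noise + spatial_var_new)))
    return max(shortfall, grown)
```

Note how a large `spatial_var_new` (an under-explored space) keeps the growth factor near 1, while noise-dominated settings let the budget grow toward double its previous value.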

After updating $B_k$ and checking that each design point has the minimum number of replications, we allocate the remaining replications to the selected design inputs with the OCBA technique. Denote $x_b$ as the current best design point with respect to the current highest level. OCBA decides the number of new replications assigned to each input as follows:

$$\frac{n_i}{n_j} = \frac{\sqrt{\widehat{\mathrm{var}}\big(Y_{h(k)}(x_i)\big)}\,/\,\lambda_{b,i}}{\sqrt{\widehat{\mathrm{var}}\big(Y_{h(k)}(x_j)\big)}\,/\,\lambda_{b,j}}, \quad (i, j \neq b); \qquad n_b = \sqrt{\widehat{\mathrm{var}}\big(Y_{h(k)}(x_b)\big)}\,\sqrt{\sum_{i \neq b} \frac{n_i^2}{\widehat{\mathrm{var}}\big(Y_{h(k)}(x_i)\big)}},$$

where $\lambda_{b,i}$ is a weight defined with respect to the current best $x_b$. The OCBA technique prefers allocating additional replications to points with low response values and large noise variances, which refines the point estimates in promising regions and at highly noisy points. Next, we run the additional simulations at the existing inputs and update the point-estimate vector and the current highest level $h(k)$ accordingly. Finally, a new co-kriging metamodel is built and the algorithm proceeds to the next iteration.
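As a stand-in for the allocation rule, the sketch below implements the classical OCBA ratios, with the mean gap to the current best playing the role of $\lambda_{b,i}$. This is an assumption on our part, since the exact definition of $\lambda_{b,i}$ is not reproduced in the text; the function name and signature are ours.

```python
import math

def ocba_allocate(n_total, means, variances):
    # Classical OCBA weights: for i != b, w_i ∝ var_i / gap_i^2, where
    # gap_i = mean_i - mean_b; for the best point b,
    # w_b = sqrt(var_b * sum_{i != b} w_i^2 / var_i).
    b = min(range(len(means)), key=lambda i: means[i])   # current best (minimum)
    w = [0.0] * len(means)
    for i in range(len(means)):
        if i != b:
            w[i] = variances[i] / (means[i] - means[b]) ** 2
    w[b] = math.sqrt(variances[b] * sum(w[i] ** 2 / variances[i]
                                        for i in range(len(means)) if i != b))
    total_w = sum(w)
    # Distribute n_total replications proportionally to the weights.
    return [round(n_total * wi / total_w) for wi in w]
```

With equal variances, the rule favors the best point and its close competitors, and starves clearly inferior points, matching the intuition described above.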

As the overall algorithm proceeds, the search focuses more and more on the promising regions of higher quantiles, and as a result, less budget is spent in the non-promising regions. This can be seen through our two-stage procedure. In the Searching Stage, as the lower quantiles are easy to estimate, it generally takes a small budget to search them; through this search, the algorithm focuses on the promising regions for the lower levels. If these regions are also promising for the objective level, they will already have been sampled. In the Allocation Stage, the budget used to increase the precision of the point estimates at lower quantiles essentially also improves the estimates for higher quantiles. Therefore, when optimizing lower quantiles, more budget is spent in their promising regions and the estimates (for all quantile levels) are improved. In other words, the algorithm digs into the promising regions identified by the lower quantiles while refining the objective quantile. We highlight that this improvement is due to the specific structure of our problem and does not apply to general multi-fidelity problems, where the experiments for lower and higher fidelity differ: running simulations at one level does not necessarily improve the point estimates at another.

Before we close this section, we briefly analyze how to make the current highest level, $h(k)$, approach the objective level, i.e., how to ensure that the objective level is eventually optimized. This is easily achieved when the budget is unlimited (see Section 5). With a finite budget, we can adjust the tolerance $C_0$. Recall that $C_0$ represents our tolerance to the sample noise variance of the point estimates and that the objective level is reached only if the noise variance drops below $C_0$ at some points. A smaller $C_0$ is more conservative and requires a larger number of replications to drive the noise variance below it; as a result, the search approaches the objective level slowly. Therefore, to speed up the increase of $h(k)$ toward the objective level, we can enlarge $C_0$ to ensure that the tolerance is met at some design points before the budget runs out. More formally, we can update $C_0$ in the $k$th iteration, denoted by $C_0^k$, as follows:

$$C_0^k = \max\left\{C_0^{k-1},\ \hat{\epsilon}(\hat{x}_k)\,\frac{N_k(\hat{x}_k)}{N_k(\hat{x}_k) + \frac{A}{|D_k| + A}B_k}\right\},$$

where $\hat{x}_k$ is the current best, $N_k(\hat{x}_k)$ is the current number of replications at $\hat{x}_k$, and $\hat{\epsilon}(\hat{x}_k)$ is the noise estimate of its quantile estimator. Suppose the budgets spent in the next few iterations have the same magnitude as $B_k$. The quantity $A$ then represents an estimate of the total possible remaining iterations, and thus of the number of new design points selected in the future. It follows that $|D_k| + A$ is an estimate of the total number of design points at the end of the search. Further assuming the remaining replications are evenly distributed, $\hat{x}_k$ can receive roughly $\frac{A}{|D_k| + A}B_k$ more replications by the end. Therefore, the second term in the maximum can be treated as a projection of the sample noise variance of the objective-level quantile estimator at $\hat{x}_k$. By updating $C_0$ in this way, we aim to make the noise of the objective-level quantile estimator at $\hat{x}_k$ smaller than $C_0^k$ in the end; as a result, the design set for the objective level is non-empty and we reach that level. A simpler effort-based rule can also be used: users define the maximum number of replications that may be spent on quantiles lower than the objective level, and the algorithm is forced to move to the objective level once this budget is exhausted.
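The tolerance update can be sketched as follows; the argument names are illustrative and follow the projection argument above (variance shrinking in proportion to the replication count).

```python
def update_tolerance(C_prev, eps_hat, N_best, D_size, A, B_next):
    # C_0^k = max{ C_0^{k-1}, eps_hat * N / (N + A*B/(|D|+A)) }.
    # A*B/(|D|+A): expected extra replications at the current best, if the
    # remaining ~A iterations spread A*B replications over |D|+A points.
    extra = A * B_next / (D_size + A)
    projected_noise = eps_hat * N_best / (N_best + extra)
    return max(C_prev, projected_noise)    # the tolerance never shrinks
```

For example, with a noise estimate of 0.5 built from 50 replications, and 50 more replications expected, the projected noise is 0.25, so the tolerance is raised to at least that value.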

## 5 eTSSO-QML Convergence Analysis

This section demonstrates the consistency of the eTSSO-QML algorithm. We first introduce our main assumptions.

###### Assumption 5.1.

(i) There exist and , such that and for all , where is the probability density function for .

(ii) The variance parameter and the sensitivity parameter of the GP model are bounded away from zero.

(iii) The model input is known.

Similar assumptions are used in the analysis of the EGO algorithm (jones1998efficient). Assumption 5.1(i) bounds the optimization objective function and the noise variance of the point estimates; the uniform bound helps ensure the consistency of eTSSO-QML. Assumption 5.1(ii) ensures that the GP model is effective and reasonable to use. Otherwise, with a zero variance parameter, the uncertainties at unsampled inputs would be zero, meaning all unobserved points are treated as known; with a zero sensitivity parameter, the correlation between responses at any two inputs would vanish, so point estimates at the design points would not help predict the responses at unobserved points. Hence, any spatial metamodel would be ineffective under either condition. Assumption 5.1(iii) is used in the convergence proof of the original eTSSO algorithm. This convergence, however, can be affected by the quality of the parameter estimator. To the best of our knowledge, when the model parameters are estimated, convergence has only been studied empirically (kleijnen2012expected, Pedrielli2018etsso); in the noise-free deterministic setting, convergence with estimated parameters has been studied theoretically by bull2011convergence. The empirical convergence with estimated parameters can be partially seen from the numerical tests in Section 6.

The convergence proof for the eTSSO-QML algorithm consists of three parts. First, in Lemma 5.2, we prove that as the number of iterations increases, the adopted co-kriging model tends to a single-level model for the objective level. This is intuitive, since each selected design point is allocated an infinite number of replications as the number of iterations tends to infinity. In this case, the model at the objective level is accurate enough that we may optimize it without leveraging the lower levels.

###### Lemma 5.2.

Under Assumption 4.1 and Assumption 5.1, there exists a finite iteration threshold beyond which eTSSO-QML reaches a single-level stochastic GP model for the objective level.

Second, in Lemma 5.3, we prove that the design points selected by the algorithm are dense in the design space. Generally, convergence proofs for global optimization algorithms require dense design points (torn1989global).

###### Lemma 5.3.

Under Assumption 4.1, the sequence of design points selected by eTSSO-QML is dense in the design space as the number of iterations tends to infinity.

Finally, in Theorem 5.4, we prove the convergence of the overall eTSSO-QML algorithm when the replications at each design point tend to infinity. Recall that the algorithm reports