1 Introduction
Traditionally, the mean of the response is a widely used performance measure of a stochastic system. However, the mean alone cannot capture the possible variability or describe the entire distribution adequately. To provide a more thorough profile of the response distribution, the quantile has become increasingly popular and of great interest in many fields, including insurance, engineering safety, finance, and healthcare (wipplinger2007philippe, morgan1996riskmetrics, cope2009challenges). In risk management, the quantile, also termed Value-at-Risk (VaR), is one of the primary risk measures used to quantify and interpret the risk that a system may face. For instance, in the finance industry, the quantile of a loss distribution represents the lower bound of the large losses that an investor can suffer from an activity, where the large losses are defined as the upper tail of the distribution, at a quantile level very close to 1 (e.g., 0.95 or 0.99) (hong2009simulating).
Optimizing the quantiles of loss functions is a common practice for decision-makers to manage risk. In this case, searching for the design with the smallest quantile of the loss will return the desired decision. More formally, for a design choice $x \in \mathcal{X}$ (where $\mathcal{X}$ is the design space, assumed compact), we want to minimize the $\tau$-quantile, $q_\tau(x)$, of the loss function $L(x)$ (with $F(\cdot\,; x)$ and $f(\cdot\,; x)$ defined as the cumulative distribution function and probability density function of $L(x)$):
$$\min_{x \in \mathcal{X}} q_\tau(x), \qquad q_\tau(x) = F^{-1}(\tau; x) = \inf\{y : F(y; x) \geq \tau\}. \quad (1)$$
As large losses are typically of interest, in this work we consider high quantiles (whose level is close to 1) of the loss.
1.1 Motivation
The optimization problem (1) can be challenging for a few reasons. First, the loss function usually has no closed-form expression and is difficult or expensive to observe from the real system. Instead, simulation engines for the loss are built, such as financial models for risk management. Therefore, the quantile is often optimized via simulation, and many Monte Carlo methods based on simulation results have been developed (see hong2014monte for a review). Second, even with a simulation model for the loss, problem (1) is still not easy to solve, as the quantile is not directly returned by the simulation but must be estimated from its results. When the quantile level is close to 1, a large number of simulations may be required to estimate it precisely. This can be seen from the noise of the quantile estimator: if $n$ denotes the number of simulation replications at a design point, the noise variance of the empirical $\tau$-quantile estimator is approximately $\tau(1-\tau)/\big(n f(q_\tau)^2\big)$, where $f$ is the density of the loss (bahadur1966note). For $\tau$ close to 1, the density at the quantile is typically quite small, especially for heavy-tailed distributions, so a large number of simulations is required to obtain an accurate quantile estimator. The simulation models, however, can be very complicated and time-consuming due to the complex nature of the real system. This restricts the feasible number of simulation runs and makes it almost impossible to obtain results for every considered design with a limited budget. Third, the quantile functions may be non-convex and thus difficult to optimize.

Given these challenges, optimization algorithms via simulation can be designed to solve (1). A proper algorithm should have at least the following two characteristics. First, it should not require strict properties of the objective function, such as convexity. Second, it should be efficient and usable for expensive simulations with a limited budget. In this work, we aim to develop a metamodel-based simulation optimization algorithm that satisfies both characteristics.
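The effect of the level on estimator noise can be checked numerically. Below is a minimal sketch (assuming a standard lognormal loss as an example; all function names are ours) comparing the Monte Carlo variance of the empirical quantile estimator with the asymptotic approximation above:

```python
import numpy as np
from statistics import NormalDist

def quantile_var_mc(tau, n, n_rep=2000, seed=0):
    # Monte Carlo variance of the empirical tau-quantile computed from
    # n replications of a standard lognormal loss (an assumed example).
    rng = np.random.default_rng(seed)
    samples = rng.lognormal(size=(n_rep, n))
    return np.quantile(samples, tau, axis=1).var()

def quantile_var_asymptotic(tau, n):
    # Bahadur-type approximation: tau * (1 - tau) / (n * f(q_tau)^2).
    q_tau = np.exp(NormalDist().inv_cdf(tau))          # lognormal quantile
    f_q = np.exp(-np.log(q_tau) ** 2 / 2) / (q_tau * np.sqrt(2 * np.pi))
    return tau * (1 - tau) / (n * f_q ** 2)
```

With 500 runs per design, the 0.99-quantile estimator is an order of magnitude noisier than the 0.9-quantile estimator under this loss, illustrating why high quantiles need far more replications.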
1.2 Literature Review
The main idea of the metamodel-based simulation optimization approach is to introduce a statistical model to guide the search when optimizing black-box functions. With a limited budget, we can only observe the objective functions at a small number of design inputs; in the unexplored regions, the metamodel serves as an approximation of the true response surface. It provides information about the entire space and helps decide which new points to evaluate in order to locate the optimum efficiently. This type of approach has been used successfully to optimize expensive functions (jones1998efficient, srinivas2009gaussian, regis2007stochastic, muller2017socemo). It can be classified by the type of metamodel adopted; commonly used metamodels include polynomial regression, radial basis functions, Gaussian process models, and artificial neural networks (see barton2006metamodel and jones2001taxonomy for reviews). Among these methods, the Gaussian process (GP, also termed kriging) model has become popular because it provides an estimate of the prediction uncertainty, which can be used to construct the selection criterion for further design choices. In this work, we also adopt a GP-type metamodel.

Based on the GP model, several different simulation optimization approaches have been proposed. For deterministic problems (the objective function in (1) is replaced by some deterministic function), the Efficient Global Optimization (EGO) algorithm (jones1998efficient) with the Expected Improvement (EI) criterion is the most widely used, for its ability to balance exploration (searching unexplored regions) against exploitation (searching the current promising region). As the function value can be simulated without error, no replications at any design input are needed, and thus EGO only considers how to select new design points. In parallel with the EGO paradigm, a few other algorithms have been developed. The GP upper confidence bound (GP-UCB) algorithm provides an alternative way to negotiate exploration and exploitation, with a tuning hyperparameter balancing the two (srinivas2009gaussian). The stepwise uncertainty reduction (SUR) approach reduces an uncertainty measure through sequentially selected design points (picheny2015multiobjective). Moreover, some information-based algorithms have been developed that consider the distribution of the global minimizer (see shahriari2016taking for a review).

For stochastic simulations, the response is simulated with noise, and its expected value is often the quantity to be optimized (the objective function in (1) becomes an expectation over the simulation randomness). The noise in the response must be handled by both the metamodel and the optimization algorithm. For responses with homoscedastic noise, huang2006global proposed Sequential Kriging Optimization (SKO) with the nugget-effect GP as the metamodel; it introduced an augmented EI to account for the 'usefulness' of additional replications at a location. For responses with heterogeneous noise, picheny2013quantile proposed the Expected Quantile Improvement (EQI), an extension of EI, to account for the known noise levels at both the already observed design points and the future candidate. When the noise levels are unknown, algorithms including Two-Stage Sequential Optimization (TSSO) (quan2013simulation) and extended TSSO (eTSSO) (Pedrielli2018etsso) have recently been proposed, with the stochastic GP (ankenman2010stochastic, yin2011kriging) as the metamodel. They combine the Optimal Computing Budget Allocation (OCBA) technique (chen2000simulation), which decides the number of replications at each design point, with the EI criterion. The spatial uncertainty of the GP model and the noise in the observations are then reduced iteratively through global and local search.

Instead of optimizing the expectation of a stochastic function, in this work we aim to optimize quantile functions of loss distributions. Therefore, a metamodel for the quantile function is required.
Developing metamodels for quantiles has been extensively studied (koenker2005quantile, dabo2010robust, chen2009metamodels). Among these models, quantile regression (QR) (koenker2005quantile) is the primary and most widely used. Recently, the stochastic GP model has been generalized to quantile metamodeling (chen2016efficient). It shows competitive performance compared with the QR model and thus enables us to integrate the GP model into optimization algorithms for quantiles.
1.3 Illustration & Contributions
Using the generalization of chen2016efficient, we can extend the eTSSO algorithm for Quantile optimization (eTSSO-Q). This extended algorithm, however, can still be costly for high quantiles with a limited computing budget. To address this challenge and further improve efficiency, we propose a novel eTSSO-Q Multi-Level (eTSSO-Q-ML) algorithm, which is the main contribution of this work. Different from traditional approaches, which directly optimize the quantile function at the objective level, eTSSO-Q-ML starts by optimizing lower quantiles. Lower quantiles are typically cheaper and easier to estimate, and their estimates are likely to be less noisy than those of a high quantile (bahadur1966note).
We next illustrate this idea with a test function from shim2009non (the quantile functions are shown in Figure 1).
In the example, we first optimize the more accurately estimated lower levels (0.5, 0.75) to identify promising regions (near 0 and 1.5 in Figure 1). As the algorithm proceeds, more simulation replications are assigned and the quantile estimators at higher levels (0.9) improve. We then iteratively increase the level of the quantile being optimized, up to the objective level. At the same time, the search process is guided by the metamodels for these increasing levels, with a focus on the promising regions identified by the previous levels. In contrast, if we directly optimize the 0.9 quantile, the inaccurate quantile estimates obtained with a limited budget can make the constructed metamodel very unreliable and mislead the search, resulting in inefficient use of the budget.
Optimizing the lower quantile functions can be informative for the objective level for a few reasons. First, the quantile functions at different levels are likely to be correlated since they come from the same loss distribution (wang2017joint). In this case, as the quantile level approaches the objective level, the shapes of the surfaces tend to be similar, and thus the solutions found at the previous levels are likely to be promising for the objective level. Second, consider the two levels 0.5 and 0.9 in the example above. For designs near 0.8, the 0.5 quantile is very large, leading us to conclude that the 0.9 quantile there will be even larger, due to the non-decreasing property of quantile functions (the quantile is non-decreasing in its level). In particular, when the 0.5 quantile at such a design is already larger than the best observed 0.9 quantile, that design clearly cannot be optimal for the 0.9 quantile, and hence there is no need to allocate further replications to the region near 0.8. In this sense, leveraging the lower levels may help eliminate bad regions and thus improve the algorithm's efficiency.
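This elimination rule can be written as a one-line check (a sketch with illustrative names):

```python
def can_eliminate(q_low_at_x, best_q_high):
    # Monotonicity in the level gives q_high(x) >= q_low(x), so if even
    # the lower-level quantile at x exceeds the best high-level quantile
    # found so far, x cannot be optimal for the high level.
    return q_low_at_x > best_q_high
```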
More formally, with eTSSO-Q-ML, we consider the problem where the quantile of the loss function at a high objective level is to be minimized, leveraging lower quantiles. Our main contributions can be summarized as follows:

We propose a multi-level cokriging model for the quantile functions. This model ensures that the predictive curves for different quantiles do not cross, and thus the non-decreasing property of quantile functions is maintained.

With the proposed metamodel, we design the eTSSO-Q-ML algorithm. This algorithm leverages the multi-level model and starts by searching informative and cheaper lower quantiles to quickly identify promising regions for the objective quantile level.

We prove the convergence of eTSSO-Q-ML and test its empirical performance on several numerical examples.
The rest of this article is organized as follows. Section 2 reviews cokriging model basics and Section 3 extends it to the multi-level quantile case. Section 4 provides details of the eTSSO-Q-ML algorithm and Section 5 states its convergence results. Section 6 provides numerical examples to show the effectiveness of eTSSO-Q-ML. Section 7 summarizes the work and presents some directions for future work. The proofs of all lemmas and theorems are provided in the supplementary material.
2 A Review of Stochastic Cokriging Model Basics
To jointly model these quantile functions, we propose to use cokriging. It was originally developed to model deterministic multi-fidelity problems (where a response can be observed at different fidelities) (kennedy2000predicting) and has recently been extended to stochastic simulation metamodeling for expectations (chen2017stochastic). In this section, we briefly review some basics of the stochastic cokriging model.
Here, we first introduce some notation and the simulation background. To develop a stochastic kriging model, replicated experiments are required; that is, at each design input, several simulation runs are conducted. Throughout this work, the outputs of these runs represent the results of the simulations (or, equivalently, the random samples of the loss distribution drawn by simulation). For instance, when there are $n$ simulations at a design input, we observe $n$ results, each incorporating the randomness of its simulation run. With these simulation results, a point estimate for the response of interest can be obtained. For instance, when modeling the $\tau$-quantile, the point estimate is the sample quantile $Y_{(\lceil n\tau \rceil)}$, where $Y_{(i)}$ is the $i$-th order statistic of the sample. Due to the limited number of simulation runs that can be conducted, this is a noisy estimate. Point estimates taken at all design inputs can then be used to develop a predictive model.
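As an illustration, the order-statistic point estimate can be computed as follows (a minimal sketch; the ceiling-based index is one common convention and may differ from the paper's exact choice):

```python
import math

def sample_quantile(samples, tau):
    # tau-quantile point estimate: the ceil(n * tau)-th order statistic.
    ys = sorted(samples)
    k = math.ceil(len(ys) * tau)
    return ys[k - 1]  # order statistics are 1-indexed
```

Because the whole sorted sample is kept, the point estimates for every quantile level come from the same runs at no extra simulation cost, which is what makes the multi-level modeling below cheap.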
The standard stochastic cokriging model is designed for the expectations of a series of stochastic responses. The models at levels $t = 1, \dots, M$ satisfy the following autoregressive relations:
$$\mathcal{Y}_t(x) = Z_t(x) + \varepsilon_t(x), \qquad Z_t(x) = \rho_{t-1} Z_{t-1}(x) + \delta_t(x),$$
where $\mathcal{Y}_t$ and $Z_t$ represent the noisy and noise-free responses at level $t$, respectively, and the $\delta_t$ (with $Z_1 = \delta_1$) are independent second-order stationary GPs (santner2013design). In each model, for any finite set of design points, the GP values follow a multivariate Gaussian distribution with mean $f(x)^\top \beta_t$ and pairwise covariance $\sigma_t^2 R_t(x, x')$. Here, $f(x)$ is a vector of known functions and $\beta_t$ is a vector of model parameters. Without prior knowledge of the mean functions, a constant mean is used in this work for illustration. For the correlation function, we adopt the popular Gaussian form $R_t(x, x') = \exp\big(-\sum_j \theta_{t,j} (x_j - x'_j)^2\big)$, where $x_j$ is the $j$-th coordinate of $x$ and $\theta_{t,j}$ is the sensitivity parameter determining how strong the correlation is in each dimension. The random noises $\varepsilon_t$ follow a multivariate normal distribution with zero mean and are assumed to be independent of the spatial processes $\delta_t$. It is clear that in this model, $Z_t$ is represented by a scaled version of $Z_{t-1}$ plus a difference term. This type of autoregressive structure was first introduced by kennedy2000predicting for deterministic multi-fidelity problems.

When estimates of the responses at the selected design points are obtained, the prediction at any unknown point in the design space can be computed from the cokriging model. Denote by $\mathcal{D}$ the set of design points, with $k$ representing its cardinality, and suppose a point estimate is available at every design point for every level. We assume the design sets for all levels are the same; this assumption holds in our multi-level quantile case, since the point estimators for all desired quantiles can be obtained from the order statistics of the same simulation runs. With the point estimate vectors of all levels, the predictor of $Z_M$ and its predictive variance at any unobserved point can be derived as (chen2017stochastic):
(2) 
(3) 
The notations used in (2) and (3) are listed in Table 1. If only one response is considered ($M = 1$), we recover the stochastic GP model (ankenman2010stochastic). Furthermore, if the response is also observed without noise, we recover the deterministic GP model.
Notation  Definition 

Products of . if ;  
Correlation of the design points generated by whose th entry is ,  
where and are the th and the th design point in , respectively.  
Correlation between and the design points generated by  
, with  
The covariance matrix of the spatial uncertainty  
A symmetric matrix with blocks:  
The covariance matrix of the noises  
A symmetric matrix with blocks:  
A matrix with blocks: if ; if  
Best linear unbiased estimator for : 
The above results assume known hyperparameters and a known noise covariance matrix. In practice, these are typically unknown and must be estimated. Depending on how they are estimated, we separate these quantities into two categories: model inputs and model parameters. The model inputs include the point estimate vectors and the estimators of the associated noise covariance matrix. These are drawn directly from the initial simulation results and serve as inputs to the cokriging model. For the standard stochastic cokriging model for mean performance measures, the inputs are the sample means and the sample covariances of the mean estimates. The remaining hyperparameters are referred to as model parameters and can be estimated by maximizing the likelihood function of the point estimate vectors (see Appendix A for the likelihood function and a detailed discussion). After this, the predictor (2) and predictive variance (3) can be obtained by plugging in the estimated parameters.
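To make the single-response special case concrete, here is a plug-in stochastic kriging sketch with a constant mean and Gaussian correlation. All names are ours, and the predictive variance below ignores the uncertainty in the estimated mean for simplicity:

```python
import numpy as np

def sk_predict(X, y_bar, noise_var, x_new, theta, sigma2):
    # Plug-in stochastic kriging (M = 1) with constant mean.
    # X: (k, d) design points; y_bar: (k,) point estimates at the designs;
    # noise_var: (k,) estimated noise variances of those estimates.
    X = np.asarray(X, dtype=float)
    y_bar = np.asarray(y_bar, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    theta = np.asarray(theta, dtype=float)

    def corr(a, b):  # Gaussian correlation
        return np.exp(-np.sum(theta * (a - b) ** 2))

    k = len(X)
    R = np.array([[corr(X[i], X[j]) for j in range(k)] for i in range(k)])
    A = sigma2 * R + np.diag(noise_var)        # spatial + noise covariance
    A_inv = np.linalg.inv(A)
    ones = np.ones(k)
    beta = (ones @ A_inv @ y_bar) / (ones @ A_inv @ ones)   # GLS mean
    r = sigma2 * np.array([corr(x_new, X[i]) for i in range(k)])
    mean = beta + r @ A_inv @ (y_bar - beta * ones)
    var = sigma2 - r @ A_inv @ r               # ignores uncertainty in beta
    return float(mean), float(max(var, 0.0))
```

When the noise variances shrink to zero, the predictor interpolates the point estimates at the design points, recovering the deterministic GP behavior noted above.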
3 Stochastic Cokriging Model for Quantiles
When applied to quantile prediction, the predictive model structure remains the same as (2) and (3). However, several important adaptations are required. First, we need proper approaches to estimate the model inputs, namely the point estimates and the noise covariance matrix, which are now those of quantiles rather than of the expectations in the traditional cokriging model. Section 3.1 introduces the estimation of these inputs and derives some of their properties. Furthermore, due to the non-decreasing property of quantiles, the predictive curves for different quantile levels should not cross (a criterion not considered in traditional cokriging models). In Section 3.2, we propose a penalized maximum likelihood estimation (PMLE) approach to ensure that our estimated curves do not cross.
3.1 Estimation of Model Inputs
In practice, the model inputs are calculated from the simulation results and then plugged into (2) and (3). Specifically, given the original simulation runs at a design point, the point estimates for the quantiles at any two levels of interest are simply the corresponding sample quantiles (order statistics).
Following the recommendations of chen2016efficient, who tested different approaches for estimating the noise variance of quantile estimates used in the GP model, including batching (seila1982batching), sectioning (asmussen2007stochastic), sectioning-batching (nakayama2014confidence), and jackknifing (nance2002perspectives), we use the sectioning method to estimate the noise variances. Furthermore, since the noise of two quantile estimates at different levels is correlated (they are drawn from the same simulation runs), in this section we also propose a sectioning method to estimate this noise covariance and derive the asymptotic properties of the estimator.
With the sectioning method, the $n$ simulation runs at a design point are first divided into $b$ batches with $m$ runs each ($n = bm$). The covariance of the two quantile estimators $\hat q_s$ and $\hat q_t$, the sample quantiles at the two levels computed from all $n$ runs, is then estimated using the within-batch estimators $\hat q_{s,j}$ and $\hat q_{t,j}$, the corresponding sample quantiles of the $j$-th batch:
$$\widehat{\operatorname{Var}}\big(\hat q_{s}\big) = \frac{1}{b(b-1)} \sum_{j=1}^{b} \big(\hat q_{s,j} - \hat q_{s}\big)^2, \quad (4)$$
$$\widehat{\operatorname{Cov}}\big(\hat q_{s}, \hat q_{t}\big) = \frac{1}{b(b-1)} \sum_{j=1}^{b} \big(\hat q_{s,j} - \hat q_{s}\big)\big(\hat q_{t,j} - \hat q_{t}\big). \quad (5)$$
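One plausible discretized form of the sectioning estimators, with per-batch quantile estimators centered at the full-sample estimator (function and variable names are ours):

```python
import numpy as np

def sectioning_cov(runs, tau_s, tau_t, n_batches):
    # Sectioning estimate of Cov(q_hat_s, q_hat_t): per-batch quantile
    # estimators are compared against the full-sample estimators.
    runs = np.asarray(runs, dtype=float)
    batches = np.array_split(runs, n_batches)
    q_s, q_t = np.quantile(runs, tau_s), np.quantile(runs, tau_t)
    q_s_b = np.array([np.quantile(batch, tau_s) for batch in batches])
    q_t_b = np.array([np.quantile(batch, tau_t) for batch in batches])
    b = n_batches
    return float(np.sum((q_s_b - q_s) * (q_t_b - q_t)) / (b * (b - 1)))
```

Setting the two levels equal recovers a variance estimate; the same runs feed both levels, which is exactly why the two estimators are correlated.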
Assumption 3.1.
For all design points, the loss has a continuous distribution with density function $f$, finite mean, and finite variance. The density has bounded first-order derivatives in a neighborhood of the true quantile and is strictly positive there.
Under Assumption 3.1, chen2016efficient showed that the sectioning variance estimator (4) is asymptotically unbiased, with mean squared error (MSE) vanishing as the number of simulation runs grows. Following a similar approach, we can also establish the asymptotic properties of the proposed noise covariance estimator (5).
Theorem 3.2.
Under Assumption 3.1, as the number of runs per batch grows, the covariance estimator (5) is consistent and asymptotically unbiased, and its MSE vanishes asymptotically.
3.2 A PMLE Approach to Avoid the Crossing Problem
Traditional QR models quantile functions at different levels separately, which can result in crossing between the different quantile predictive curves. This may cause, for example, the predicted 0.95 quantile at some points to be larger than the predicted 0.99 quantile. This crossing phenomenon is a widely acknowledged problem in quantile modeling and can lead to an invalid distribution of the response and problematic inference (koenker1984note, cole1988fitting, he1997quantile). For our quantile cokriging model, preventing crossing to ensure monotonicity not only improves inference but, more importantly, ensures that the multi-level search in the optimization algorithm is valid and efficient. If crossing happens between two quantile models, the non-promising region identified by the lower quantile model can be misleading: since the higher quantile prediction can then be smaller than the lower one, it can take promising (and even optimal) values in those non-promising regions. Therefore, ensuring non-crossing in the models before the optimization process is vital for model validity and optimization efficiency. Although the cokriging model fits multiple quantiles jointly, non-crossing is not guaranteed. Note that in traditional applications of the cokriging model, where deterministic or mean responses are typically modeled, crossing is not a problem.
In this section, we propose a new penalized version of the stochastic cokriging model to prevent crossing of the quantile models. In our multi-level quantile problem, there is no crossing when the predictive curve of each quantile level lies at or below that of the next higher level everywhere in the design space. In other words, the difference between the predictive curves of any two successive quantiles should be non-negative across the design space. We first propose a penalized GP model that ensures non-negative predictions for a single deterministic response (which can be viewed as the difference between two quantiles and is thus non-negative everywhere in the design space), and then apply it to our multi-level model.
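On a finite grid of design points, this non-crossing condition can be checked directly (a sketch; the grid check is our simplification of the everywhere-in-the-design-space condition):

```python
import numpy as np

def has_crossing(pred_curves):
    # pred_curves: (M, n_grid) predictive values for increasing quantile
    # levels tau_1 < ... < tau_M on a shared grid of design points.
    # Crossing means some higher-level curve dips below a lower one.
    diffs = np.diff(np.asarray(pred_curves, dtype=float), axis=0)
    return bool((diffs < 0).any())
```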
Consider first a deterministic GP model for a non-negative function (to distinguish it from the model in the previous section, we use separate notation for this response and its observations):
(6) 
where the mean is a linear combination of known basis functions with a vector of model parameters, and the residual process is assumed to be a zero-mean, second-order stationary GP controlled by hyperparameters. Given that the true function is non-negative and the observations are noise-free, the observation vector we obtain is non-negative. The standard GP model for a deterministic function is an interpolator of these observations, and the shape of the predictive curve changes with the hyperparameters. Therefore, when estimating the hyperparameters, we must ensure that the resulting predictive curve does not cross below zero; in other words, hyperparameter values that cause such a crossing should be eliminated. This intuition naturally translates into the following penalization method: instead of maximizing the ordinary log-likelihood, we minimize a penalized objective, namely the negative ordinary log-likelihood plus a penalty term with a non-negative penalty coefficient, where the penalty measures how far the predictive curve dips below zero over the design space.
With this penalty term, parameter values that cause the predictive curve to go below the zero plane are penalized. Theorem 3.3 demonstrates the consistency of the parameters estimated with this approach. It is established from the asymptotic properties of the MLE for the ordinary GP model: under certain regularity conditions, the MLE is asymptotically normal around the true parameter value as the number of design points grows, with covariance given by the inverse Fisher information matrix (mardia1984maximum).
Theorem 3.3.
Denote the true parameter value for model (6) and let the number of design points grow. There exists a local minimizer of the penalized objective that is consistent for the true value.
This PMLE approach involves an optimization over the predictive surface. We highlight that this optimization is much easier than optimizing the true unknown response surface, since the predictive function is cheap to evaluate and has an explicit form. When applied in our case, where the function of interest (the difference between two quantile functions) is stochastic, this method can still return non-negative predictions by preventing the predictive curves from crossing below zero. With a slight modification of the penalty function, the method extends directly to our multi-level quantile problem:
where the multi-level penalty accumulates the crossing violations between every pair of successive levels. It is easy to see that the parameters are penalized once crossing occurs between any two successive predictive curves among the quantile models.
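A discretized sketch of such a multi-level crossing penalty (all names illustrative; the paper's penalty is defined over the continuous design space):

```python
def multilevel_penalty(predict_levels, grid):
    # predict_levels(x) returns predictive quantile values for levels
    # tau_1 < ... < tau_M at design point x; any violation of the
    # ordering contributes its magnitude to the penalty.
    total = 0.0
    for x in grid:
        q = predict_levels(x)
        for lower, upper in zip(q, q[1:]):
            total += max(0.0, lower - upper)
    return total
```

Adding this term, scaled by the penalty coefficient, to the negative log-likelihood steers the hyperparameter search away from crossing configurations.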
With the approaches proposed here, we can build a cokriging model for multi-level quantiles whose predictive curves do not cross. As mentioned before, other approaches address this more rigorously. For GP-based models, more complicated and refined methods have been proposed to ensure positive response predictions (szidarovszky1987kriging, dowd1982lognormal). Compared with those methods, the PMLE approach keeps the convenient autoregressive structure and is easy to use and integrate into the multi-level algorithm. From a more pragmatic viewpoint, as the metamodel here mainly serves as an aid to the optimization process, we do not pursue more sophisticated techniques and simply apply the PMLE approach.
4 Multi-Level Quantile Optimization (eTSSO-Q-ML) Algorithm
This section presents the eTSSO-Q-ML algorithm, which optimizes the objective quantile with a multi-level model built from several quantile levels. As previously noted, the optimization process is guided by the proposed stochastic quantile cokriging model. It starts by searching the lower quantiles and then searches the promising regions for higher quantiles identified by the lower ones. The algorithm is fully sequential: the overall computing budget is allocated iteratively. Within each iteration, we apply the two-stage framework from the eTSSO algorithm to provide a “division of labor” (Pedrielli2018etsso). In the first stage (Searching Stage), we adopt the EI criterion to select the new design input with the largest expected improvement over the current best. The second stage (Allocation Stage) distributes additional simulation replications to the existing design points, to improve the model fit and increase our confidence that the estimators correctly identify the optimum. The split of the budget between these two stages is allowed to change across iterations. At the beginning of the algorithm, when little is known about the response, more budget is used to search the design space and identify promising regions; towards the end, more budget is reserved for the Allocation Stage, since the emphasis shifts to refining the point estimates at already selected designs once we are in proximity to the promising regions. These two stages are discussed in detail in Sections 4.3 and 4.4, after an overview of the algorithm in Section 4.1 and an introduction of the algorithm parameters in Section 4.2. Our algorithm is based on the eTSSO procedure, and we refer interested readers to Pedrielli2018etsso for full details.
4.1 Algorithm Overview
Before describing the algorithm, we list key parameters in Table 2.
Parameter  Definition 

Total number of replications (computing budget) at the beginning  
Initial design set  
Quantiles used for modeling  
Minimum number of replications for a newly selected design input  
The maximum noise variance of a quantile estimate that can be tolerated  
Current iteration  
Design set at iteration  
Current level of quantile guiding the search  
Number of available replications in iteration  
The set of the quantile levels building the cokriging model in iteration  
Set of inputs whose estimates at the th level have acceptable accuracy  
Observations for th level  
Remaining number of replications (Algorithm terminates when )  
The best input for the th quantile found by iteration 
The first five parameters are user-defined to start the algorithm. The total number of replications is typically determined by the computing budget, and for the initial design set, if no prior knowledge or preference is available, users can apply non-informative design strategies such as uniform or Latin hypercube sampling. The quantile levels used for modeling should also be specified in advance. As noted above, to optimize a high quantile, we start with a base level. This level should not be too high; we suggest a moderate value based on our experience. For the remaining levels, we consider fixed, evenly distributed inter-levels between the base level and the objective level. The number of levels can be selected depending on the budget. A larger number of levels slows the approach to the objective level and increases the cokriging model complexity; however, as the difference between any two successive levels becomes smaller, the correlation between them increases, and thus the promising regions identified by lower levels become more accurate. In contrast, a smaller number of levels reduces the computational burden but may weaken the correlation among the levels adopted. For any newly selected design point, we first assign a minimum number of replications, chosen to ensure that the point estimates for the base level have reasonable accuracy. The tolerance parameter is used to examine the accuracy of a quantile estimator: only estimators whose noise variance is smaller than the tolerance are accepted. These two parameters can be chosen through a cross-validation test over the initial design. To achieve this, we can start with a small number of replications and iteratively increase it until the model for the base level built with the initial design passes the cross-validation test. After that, the tolerance can be selected as the maximum noise variance of the quantile estimators at the initial design points. The other algorithm parameters are updated in each iteration and are described in detail in Section 4.2.
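The evenly spaced level schedule can be generated as follows (the base level 0.5 in the usage below is an arbitrary illustration, not the paper's recommendation):

```python
def quantile_levels(tau_base, tau_obj, m):
    # m evenly spaced quantile levels from the base level up to the
    # objective level (inclusive at both ends).
    step = (tau_obj - tau_base) / (m - 1)
    return [tau_base + i * step for i in range(m)]
```

For example, `quantile_levels(0.5, 0.9, 5)` yields levels spaced by 0.1 from 0.5 up to 0.9.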
The eTSSO-Q-ML algorithm is an iterative algorithm, alternating between the Searching Stage and the Allocation Stage until the computing budget runs out. We illustrate the overall idea with the example in Figure 1. At the start of the algorithm, only a small budget has been spent, so with few replications the point estimates of the target quantile (0.9) can be inaccurate and noisy. At this stage, a more reliable lower-quantile model (0.5) is first built and used to guide the initial search; in other words, we optimize the first level as a start to identify possible promising regions (such as the regions near 0 and 1.5). As the algorithm proceeds, more budget is allocated and the accuracy of the higher-quantile estimators improves. The algorithm then increases stepwise the level of the quantile metamodels developed, using the current highest level to guide the search. As a result, the algorithm gradually optimizes higher and higher quantile levels, focusing on the promising regions identified by the previous levels, and finally optimizes the quantile at the target level. The eTSSO-Q-ML algorithm is described in Algorithm 1. Sections 4.3 and 4.4 describe the Searching Stage and the Allocation Stage in further detail.
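The overall loop can be sketched structurally as follows; the stage functions stand in for the EI search and OCBA allocation described later, and all names are ours:

```python
def two_stage_loop(budget, init_design, search_stage, allocation_stage,
                   update_models):
    # Alternate Searching and Allocation Stages until the budget is spent.
    design = list(init_design)
    remaining = budget
    while remaining > 0:
        models = update_models(design)             # refit multilevel model
        x_new, cost_search = search_stage(models)  # EI-selected new point
        design.append(x_new)
        cost_alloc = allocation_stage(models, design,
                                      remaining - cost_search)
        remaining -= cost_search + cost_alloc
    return design
```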
4.2 Modeling Update in Each Iteration
In each iteration, the models (2) and (3) and the algorithm parameters are updated. The search in an iteration is guided by the current level, which is the level being optimized; its index gradually increases from 1 to that of the objective level. We choose it as follows. For each design point, we find the highest level at which the quantile estimate drawn from that point's simulation results has noise variance smaller than the tolerance. This determines, for each level, the set of design points whose estimates at that level have acceptable accuracy. We then select the current level as the highest level whose set of acceptably accurate design points is large enough to support model fitting; we treat it as the current highest level and build a multi-level model up to it in this iteration. However, an increasing current highest level naturally increases the model complexity, and so we would like to select some, but not all, of the levels to build the cokriging model.
In fact, as the algorithm proceeds, some inter-level quantiles become redundant. For example, when the objective level is the 0.95 quantile and the inter-levels 0.9 and 0.8 have similar design sets with acceptable accuracy, we may remove the 0.8 quantile, as the 0.9 quantile is closer to our objective. This removal can be partly justified by the autoregressive structure of the cokriging model: the 0.95 quantile model depends on the previous levels only through the nearest level, the 0.9 quantile. Therefore, the 0.8 quantile contributes nothing once the 0.9 quantile is reasonably good, and it can be removed. In this sense, we select a subset of the levels by removing redundant inter-levels and build a cokriging model only for the selected levels. Specifically, an inter-level is retained only when it offers acceptably accurate estimates at more design points than the next level above it. Selecting levels in this way also ensures that, asymptotically, we select only the objective level and build a single stochastic GP model. This is intuitive: when the number of replications assigned to each design point tends to infinity, the quantile estimators become accurate and there is no need to leverage information from lower levels (see Section 5 for a detailed discussion).
The budget changes with , and a specific choice will be introduced in detail in Section 4.4. The values of and can easily be updated after the two stages finish. At the end of each iteration, we take the lowest observed quantile value at the objective level as the optimum found by iteration .
4.3 Searching Stage
In the Searching stage, we select the next design input based on the following EI criterion (jones1998efficient):
where is the lowest value of the predictive responses at , and denote the probability density and cumulative distribution functions of the standard normal distribution, respectively, and is a Gaussian random variable with distribution , where
(7) 
Compared to (3), (7) uses instead of and thus only accounts for the response covariance generated by the spatial process, ignoring the noise variance. The rationale is that the Searching Stage is designed to choose new points that reduce the spatial uncertainty (the noise is handled by the Allocation Stage). Moreover, (7) ensures that the EI values at all selected design points are zero, so they will not be reselected in future iterations. With this criterion, we use the current highest level, , to guide our search. The EI criterion selects the point with the largest expected improvement over the current best. Typically, points with small response predictions (from currently promising regions) or large predictive variances (from less explored regions) have large EI values; by selecting these points, EI balances exploitation and exploration. At this new design point, we run simulations, add it to the design set, and update the point estimate vectors accordingly.
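For concreteness, the standard EI of jones1998efficient, evaluated with the noise-free predictive variance of (7), can be sketched as below. Function and argument names are ours; the predictive mean and spatial standard deviation would come from the fitted cokriging model.

```python
from math import erf, exp, pi, sqrt

def expected_improvement(mu, s, y_best):
    """Standard EI of Jones et al. (1998): EI = (y_best - mu) * Phi(z) + s * phi(z),
    with z = (y_best - mu) / s.  Here s is the *spatial* predictive standard
    deviation only (the noise variance is excluded, as in (7)), so EI is
    exactly zero at already-selected design points where s == 0."""
    if s <= 0.0:
        return 0.0  # no spatial uncertainty left -> no expected improvement
    z = (y_best - mu) / s
    Phi = 0.5 * (1.0 + erf(z / sqrt(2.0)))    # standard normal CDF
    phi = exp(-0.5 * z * z) / sqrt(2.0 * pi)  # standard normal PDF
    return (y_best - mu) * Phi + s * phi

def next_design(candidates, mu_fn, s_fn, y_best):
    """Pick the candidate input with the largest EI as the next design point."""
    return max(candidates,
               key=lambda x: expected_improvement(mu_fn(x), s_fn(x), y_best))
```

Note how the formula rewards both a low predicted response (exploitation) and a large spatial standard deviation (exploration), which is exactly the balance described above.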
4.4 Allocation Stage
In this stage, we adopt the OCBA technique to allocate computing resources to the selected design points. The original OCBA technique, however, is designed for ranking-and-selection problems with finitely many alternatives. When the number of possible alternatives is infinite, we make the following assumption on the allocation rule to guarantee the convergence of the algorithm.
Assumption 4.1.
Suppose there exists a sequence such that , as and that . Denote as the cumulative number of replications assigned to the selected design point by iteration . It follows that for all .
This assumption has been used for problems with discrete but infinitely many alternatives (hong2006discrete, andradottir2006simulation). Although we consider optimization over a continuous domain, the assumption is important for ensuring convergence (see the detailed discussion in Section 5). To satisfy it, in the Allocation Stage of eTSSO-QML we first reserve some budget so that each selected design point (including the input newly selected in iteration ) has at least replications by iteration . After this initial step, we adopt the OCBA technique to allocate the remaining replications.
As noted before, the budget of the Allocation Stage increases with to refine the point estimates at the selected design inputs. This is intuitive: at the beginning, more budget can be used to search the design space, and as grows, we are more likely to be in the proximity of the promising region. At that point, we can reduce the number of newly selected designs and assign more budget to the already sampled points in the promising region to refine our estimate of the optimum. We therefore let increase with the iteration and update its value as follows when :
where is the sample noise variance of the point estimate at (estimated by (4)). This adaptive scheme was first adopted in the eTSSO algorithm. The growth of is controlled by the relationship between the point-estimator noise, measured by , and the spatial uncertainty of the GP model, measured by the predictive variance (see equation (7)). At the beginning, when the spatial uncertainty is large, grows slowly to save budget for selecting new design points. When the spatial uncertainty becomes smaller, i.e., the design space has been better explored, grows faster, focusing more on the already selected points in the promising regions. The advantage of this scheme is that it does not require user-defined budgets for each iteration; furthermore, it improves the identification of the optimum and leads to efficient use of the computing budget.
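The slow-then-fast growth pattern can be illustrated with the following sketch. The specific update rule here is our assumption, chosen only to reproduce the qualitative behaviour; the paper's actual formula is the elided display above.

```python
import math

def update_budget(B_t, noise_var, spatial_var):
    """Illustrative budget update (not the paper's exact formula): the budget
    grows in proportion to how much the point-estimate noise dominates the
    spatial (predictive) uncertainty of the GP model.  Early on spatial_var
    is large, so growth is slow and budget is saved for exploring new points;
    once the space is well explored (spatial_var small), growth speeds up and
    more replications go to refining the already selected points."""
    growth = noise_var / (noise_var + spatial_var)  # fraction in (0, 1)
    return B_t + max(1, math.ceil(B_t * growth))
```

With a fixed noise level, shrinking spatial uncertainty makes the per-iteration increment larger, shifting effort from searching to refining, as described above.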
After updating and checking that each design point has at least replications, we allocate the remaining replications to the selected design inputs with the OCBA technique. Denote by the current best design point in with respect to the current highest level: . OCBA decides the number of new replications assigned to each input as follows:
where . The OCBA technique prefers allocating additional replications to points with low response values and large noise, which refines the point estimates in promising regions and at points with large noise variances. Next, we run additional simulations for the existing inputs and update the point estimate vector, , and the current highest level, , accordingly. Finally, a new cokriging metamodel is built and the algorithm proceeds to the next iteration.
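As an illustration, since the allocation display above is elided, the following sketch implements the classical OCBA ratios (Chen et al.) applied to the quantile point-estimate means and their noise variances; the paper's exact rule may differ, and all names are ours.

```python
import math

def ocba_allocation(means, variances, best, total):
    """Classical OCBA ratios: for i != best,
        N_i  proportional to  (sigma_i / (mu_i - mu_best))^2,
    and for the current best,
        N_best = sigma_best * sqrt(sum_{i != best} (N_i / sigma_i)^2).
    `total` new replications are then split proportionally.  `variances` are
    noise variances (sigma^2) of the point estimates; means are assumed
    distinct so no delta is zero."""
    k = len(means)
    ratios = [0.0] * k
    ref = None  # first non-best alternative, used as the reference ratio 1
    for i in range(k):
        if i == best:
            continue
        if ref is None:
            ref, ratios[i] = i, 1.0
        else:
            delta_ref = means[ref] - means[best]
            delta_i = means[i] - means[best]
            ratios[i] = (variances[i] / variances[ref]) * (delta_ref / delta_i) ** 2
    ratios[best] = math.sqrt(variances[best] * sum(
        ratios[i] ** 2 / variances[i] for i in range(k) if i != best))
    s = sum(ratios)
    return [int(round(total * r / s)) for r in ratios]
```

Note the behaviour matches the description above: alternatives close to the best in mean, or with large noise variance, receive more of the new replications.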
As the overall algorithm proceeds, the search focuses increasingly on the promising regions of the higher quantiles, and, as a result, less budget is spent in the non-promising regions. This can be seen through our two-stage procedure. In the Searching Stage, since the lower quantiles are easy to estimate, it generally takes a small budget to search them. Through this search, the algorithm focuses on the regions that are promising for the lower levels; if these regions are also promising for , they will already have been sampled. In the Allocation Stage, the budget used to increase the precision of the point estimates at lower quantiles also improves the estimates for higher quantiles. Therefore, when optimizing lower quantiles, more budget is spent in their promising regions and the estimates (for all quantile levels) are improved. In other words, the algorithm digs into the promising regions identified by the lower quantiles when estimating . We highlight that this improvement is due to the specific structure of our problem and does not apply to general multifidelity problems, where the experiments for lower and higher fidelities differ: running simulations for one level does not necessarily improve the point estimates for another level.
Before closing this section, we briefly analyze how to make the current highest level, , approach the objective level, , i.e., how to ensure that the objective level is eventually optimized. This is easily achieved when the budget is unlimited (see Section 5). With a finite budget, we can adjust the value of . Recall that represents our tolerance for the sample noise variance of the point estimates and that only if at some points. A smaller is more conservative and requires a larger number of replications to drive below , so the search approaches the objective level slowly. Therefore, to speed up the increase of to , we can enlarge to ensure that at some design points before the budget runs out. More formally, we can update in the th iteration, denoted by , as follows:
where is the current best, is the current number of replications at , and is the noise estimate of . Suppose the budget spent in each of the next few iterations has the same magnitude as . The quantity then estimates the total number of remaining iterations, and hence the number of new design points selected in the future, so estimates the total number of design points at the end of the search. Assuming further that the remaining replications are evenly distributed, can receive more replications by the end, and can be treated as the resulting sample noise variance of the quantile estimator at . By updating in this way, we aim to make the noise of the quantile estimator at smaller than by the end, so that the design set for is non-empty and we reach the objective level. A simpler effort-based rule can also be used to approach the th level: users define the maximum number of replications that can be spent on quantiles lower than , and the algorithm is forced to move to the th level once this budget is exhausted.
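The bookkeeping in this update can be made concrete with the following sketch. Because the display above is elided, the formula here is our reconstruction of the described logic (projected replications at the current best, 1/n shrinkage of the noise variance), not the paper's exact rule, and all names are ours.

```python
def update_epsilon(noise_var_best, n_best, remaining_budget, B_t, n_points):
    """Reconstruction of the described epsilon update: estimate the remaining
    iterations as remaining_budget / B_t, hence the eventual number of design
    points; assume the remaining replications are spread evenly, so the
    current best point receives `extra` more replications.  epsilon is then
    set to the projected sample noise variance of the quantile estimator at
    the current best (noise variance shrinks like 1/n in the replication
    count), so that its design set at the objective level becomes non-empty
    before the budget runs out."""
    remaining_iters = max(1, remaining_budget // B_t)
    total_points = n_points + remaining_iters  # projected final design count
    extra = remaining_budget // total_points   # even split of what is left
    return noise_var_best * n_best / (n_best + extra)
```

Because the projected variance is never larger than the current one, this update can only relax the tolerance, speeding up the approach to the objective level as intended.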
5 eTSSO-QML Convergence Analysis
This section demonstrates the consistency of the eTSSO-QML algorithm. We first introduce our main assumptions.
Assumption 5.1.
(i) There exist and , such that and for all , where is the probability density function for .
(ii) The variance parameter and the sensitivity parameter of the GP model are bounded away from zero.
(iii) The model input is known.
Similar assumptions are used in the analysis of the EGO algorithm (jones1998efficient). Assumption 5.1(i) bounds the optimization objective function and the noise variance of the point estimates; the uniform bound on helps ensure the consistency of eTSSO-QML. Assumption 5.1(ii) ensures that the GP model is effective and reasonable to use. Otherwise, with zero , the uncertainties at unsampled inputs would become zero, meaning that all unobserved points are actually known; with zero , the correlation between the responses at any two inputs would become zero, so point estimates at the design points would not help predict the responses at unobserved points. Hence, any spatial metamodel would be ineffective under either condition. Assumption 5.1(iii) is used in the convergence proof of the original eTSSO algorithm. This convergence, however, can be affected by the quality of the estimator. To the best of our knowledge, when is estimated, convergence has only been studied empirically (kleijnen2012expected, Pedrielli2018etsso); in the deterministic setting where , convergence has been studied theoretically by bull2011convergence. The empirical convergence with estimated can be partially seen from the numerical tests in Section 6.
The convergence proof for the eTSSO-QML algorithm consists of three parts. First, in Lemma 5.2, we prove that as the iteration count increases, the adopted cokriging model tends to a single-level model for the objective level . This is intuitive, since each selected design point is allocated an infinite number of replications as the iteration number tends to infinity. In this case, the model at the objective level is accurate enough that we may optimize it without leveraging the lower levels.
Lemma 5.2.
Under Assumptions 4.1 and 5.1, there exists such that for iterations , eTSSO-QML reaches a single-level stochastic GP model for , i.e., .
Second, in Lemma 5.3, we prove that the design points selected by the algorithm are dense in the design space. Generally, convergence proofs for global optimization algorithms require dense design points (torn1989global).
Lemma 5.3.
Under Assumption 4.1, the sequence of design points selected by eTSSO-QML is dense in the design space as .
Finally, in Theorem 5.4, we prove the convergence of the overall eTSSO-QML algorithm when the replications at each design point tend to infinity. Recall that the algorithm reports