1 Introduction
In this paper, we consider the problem of global optimization of expensive functions, i.e., functions which require large computational costs to evaluate. For physical and computational experiments, these functions represent the relationship between input and output variables, and may require days or even weeks to evaluate at a single input setting. One example is the highpressure mixing and combustion processes in liquid rocket engines, which requires numerically solving a large, coupled system of partial differential equations; see
Oefelein and Yang (1998). Even when computation is parallelized over thousands of processing cores, a comprehensive simulation of a single injector may take months to complete. An important problem about expensive functions is how to optimize the output/response by choosing appropriate settings of the input variables. This problem can be challenging for two reasons. First, it is not feasible to conduct extensive runs of function evaluations to find the optimal input settings, since each function evaluation is expensive. It is thus desirable to identify the optimal input settings with as few runs as possible. The second challenge comes from the complicated nature of the functional relationship. They are usually regarded as “blackboxes”, because there is no explicit relationship between the input and output. Although various local optimization methods are available when the derivatives of the functions are known or can be easily obtained, see Boyd and Vandenberghe (2004), such methods are not applicable in the present scenario.In the literature, a widely used practice for global optimization of expensive functions is to sequentially select input settings for function evaluations based on some criterions. The approach consists of two steps. First, it constructs a surrogate model to approximate the true function based on all the observed function outputs. The advantage of employing surrogate model is that it can provide predictions at any input settings with much cheaper computation. Second, it identifies a new input setting for function evaluation according to some surrogate model based selection criteria. With this approach, it is feasible to approximate the global optimizer of the true function via the surrogate model optimization. The commonly used surrogate models are the kriging models (Jones et al., 1998; Jones, 2001) and models based on radial basis functions (Gutmann, 2001; Regis and Shoemaker, 2007). Chen et al. (2011, 2017)
proposed to construct the surrogate models via overcomlpete prespecified basis functions. In addition, another type of optimization approach is statistical global optimization which chooses the next point based on a probability improvement criterion, like Palgorithm
(Torn and Žilinskas, 1989). Gutmann (2001) and Žilinskas (2010) have showed the equivalence of the Palgorithm and the surrogate approach proposed in Gutmann (2001) under certain conditions. For more details along these lines, see a review in Žilinskas and Zhigljavsky (2016).The primary objective of this paper is to propose a novel global optimization framework for optimizing expensive functions. Our approach is motivated by Regis and Shoemaker (2007), in which they utilize Radial Basis Functions (RBF) to build a deterministic surrogate model and guide the selection of the next explored point based on the predicted response and some distance criteria. The rationale of using RBFs is that they can capture the nonlinear trend of functions. However, the RBFs they used are predetermined and lack the flexibility of modeling. Also, it is less efficient to perform function evaluation from their surrogate model, because they use RBFs in a deterministic way without providing prediction uncertainties. Although a distance criterion is used to avoid getting trapped at local optima, it does not incorporate the information in response values for the surrogate models. To make better use of all information in the data, we propose to construct surrogate model with RBFs that are chosen adaptively based on the updated outputs, and to select new points based on surrogate models with quantified uncertainties.
There are other approaches for global optimization of expensive functions in the literature. Jones et al. (1998) propose a global optimization scheme by constructing a surrogate model with the kriging method. Our approach is different in that they make strong assumptions on the correlation structure between explored points while ours does not. A detailed review related to the kriging model in global optimization can be found in Jones (2001). Chen et al. (2017) propose a global optimization scheme that builds a mean prediction model with linear basis functions selected from a dictionary of functions, and then imposes a Bayesian structure over the mean model to quantify the uncertainty of the prediction. Our approach is also different from Chen et al. (2017). Instead of using a predetermined discrete function dictionary with a large number of linear functions, we use a moderate number of RBFs that can be adaptively updated based on observed data.
The paper is organized as follows. In Section 2, we give a mathematical formulation of the global optimization problem, and provide a review of the RBFs. In Section 3, we present the proposed global optimization framework that utilizes adaptive RBFbased Bayesian surrogate model. In Section 4, we present simulation studies to validate and compare our proposed method with the methods by Regis and Shoemaker (2007) and Jones et al. (1998). Further simulation results for 3 and 4dimensional functions are given in Section 5. A modification of the proposed method to avoid getting trapped in local optima is presented in Section 6.1. In addition, we study the effect of the grid size which is used as the candidate points in the proposed method, and also come out a gridfree approach. Concluding remarks and future research directions are given in Section 7.
2 Problem Formulation and Review of RBFs
Suppose is an expensive function of interest, where and is a dimensional convex domain in . The objective is to identify an optimal input setting that maximizes
(1) 
Because it is not practical to evaluate over to search the global maximizer due to the huge computational cost, a wellestablished practice is to sequentially select a few input settings for function evaluation using a twostep strategy. Suppose a set of function evaluations are taken. In step 1, a surrogate model is constructed and the resulting model approximation is denoted by Unlike the true function the surrogate model is much cheaper to build and evaluate. Therefore it is feasible to predict function values over all In step 2, the next input setting is selected for function evaluation via certain criterion based on the surrogate model from step 1. Steps 1 and 2 iterate until the total computational budget is met. The best point among all the chosen input settings, , can be treated as an approximation to the true optimal point . By adopting this twostep strategy, we will present in Section 3
our proposed framework in detail. Note that the surrogate construction may not necessarily be an interpolator of the observed points, i.e.,
Because our goal is optimization, the surrogate is used to predict the location of the optimal points, rather than to approximate the response with high accuracy (Chen et al., 2011). Thus we want to capture the trend of the true response surface quickly and to serve this purpose, our surrogate model does not have to meet the interpolation requirement.In the remaining part of this section, we give a brief review of the RBFs, which will be used in the proposed framework for the surrogate model construction. In the literature, the RBF is popularly deployed in applied mathematics and neural networks. See
Buhmann (2003) and Bishop (2006). Several commonly used functions are: (1) Gaussian functions: (2) generalized multiquadratic functions: with (3) generalized inverse multiquadratic functions: with (4) thin plate spline functions: where is the center of the function, and is a prespecified constant which varies with the chosen function. In our work, we will focus on the Gaussian RBFs. The Gaussian RBFs have two types of parameters: the center parameter that determines the location of the RBFs, and the scale parameter that measures the degree of fluctuation of the function. One advantage of using the Gaussian RBFs over other basis functions is that it can capture different trends of response by choosing different centers and scales. For example, a larger indicates a more concentrated change in the surface, and vice versa.3 General Global Optimization Framework
In this section, we propose a global optimization framework that utilizes adaptive RBFbased surrogate model via uncertainty quantification. In Section 3.1, we propose a novel hierarchical normal mixture Bayesian surrogate model with RBFs to approximate the true function, where the model coefficients are sparsely represented to avoid overfitting, and the parameters of the RBFs are adaptively updated each time a new point is explored. This allows us to predict the function value at any given candidate point. In Section 3.2, we propose a modelguided selection criterion and based on the posterior samples, a sample version of the expected improvement criterion is adopted. A new point can then be selected to identify a more promising area of global maximizer. A summary of the algorithm and some discussions will be presented in Section 3.3.
3.1 Normal Mixture Surrogate Model with RBFs
Suppose we observe explored points and its function values . Without loss of generality, we assume because otherwise we can approximate ’s instead of ’s, where is the sample mean of , i.e., . We propose to construct a surrogate model by a summation of Gaussian RBFs and an error term :
(2) 
Here, an error term is used to model the discrepancy between the model approximation constructed by the RBFs and the true function We assume that
are independent normal distributions with mean 0 and vairance
. Note that if the center parameters ’s and the scale parameters ’s are known and fixed, then the surrogate model in (2) is exactly the same as linear regression.
3.1.1 Prior Distributions
Because both ’s and ’s are unknown, the proposed modeling approach can handle highly nonlinear functions. A uniform prior over a rectangular region is used for
(3) 
where i.e., the smallest hyperrectangle to cover the current explored points, and it is adaptively changed with the addition of new explored points, see Andrieu et al. (2001). A gamma prior is used for the scale parameters
(4) 
where and are common to all ’s.
We also impose a hierarchical structure on the coefficients ’s. Define a latent variable to indicate whether a certain basis function is important or not: indicates that the th basis is important, while indicates the opposite. Specifically, we set with small and with relatively large , where
can be interpreted as a variance ratio. This hierarchical setting is first employed in the Stochastic Search Variable Selection (SSVS) scheme by
George and McCulloch (1993) and is also used for uncertainty quantification studies in Chen et al. (2017). Indeed, it is one type of the “gprior” (see Zellner (1986)) for avoiding overfitting. Now the mixture normal prior of the model coefficient can be written as follows:(5) 
with if and if and a binomial prior for the latent variable
(6) 
Note that the choice of plays an important role in the posterior sampling and control the model complexity. We also impose an inversegamma prior for the residual variance
(7) 
3.1.2 Posterior Sampling
The posterior distribution defined in (8) is computationally intractable. Markov Chain Monte Carlo (MCMC) method is utilized to solve this problem, see Andrieu et al. (2001) and Koutsourelakis (2009). That is, we use the MCMC method to generate the posterior samples from . Thus we sequentially sample , , and by fixing the other components and the data . Under certain conditions, we can guarantee that these samples can be treated as the posterior samples of . Here the MCMC method iterates the following two steps:

Sample by fixing , , and .

Sample and by fixing , , , and .
Specifically, we use the Gibbs sampler to generate the posterior samples for the parameters and the MetropolisHasting algorithm to obtain the posterior samples for the parameters and because there is no explicit formula for the posterior distributions of and .
Start with the posterior distributions for Denote and . Then, the samples of can be generated by
(12) 
The samples of can be generated by
(13) 
For the samples of , it would be simple to sample sequentially conditional on the other components, and can be generated by
(14) 
where
with and is with ;
with and is with Here the notation
represents the vector of all
’s exceptNow we turn to the parameters and First, consider the sampling procedure for . Instead of directly sampling the vector , we suggest sampling sequentially from
(15)  
where denotes the vector of all ’s except We use the MetropolisHasting algorithm to generate posterior samples for Specifically, at a new step , we set the proposed density to be a mixture of two densities, and a temporary sample can be obtained from the whole domain with uniform probability, or it can be a perturbation of the current iteration within its local neighborhood, i.e.,
(16)  
and  (17) 
Then we accept this temporary sample with the acceptance rate
where
Similarly, we can use the MetropolisHasting algorithm to generate samples of At step we choose a temporary as a perturbation of the current sample by the proposed density
(18) 
And we accept such sample with the acceptance rate
where
From (12)–(18), we generate samples for iteratively based the updated estimate for the remaining parameters. Then, the Gibbs sequence
can be obtained, where is the total number of iterations. After discarding the first say 40% samples, the remaining samples can be treated as the posterior samples of , , , and from . Thus the posterior sample for model approximation at a candidate explored point can be calculated by
(19) 
Then the function prediction can then be calculated as the average of ’s, and the prediction uncertainties can be calculated as the sample variance of ’s.
Finally, we note that the mean value of the posterior density of in (12) is which is a biased estimator of with a nugget value Hence, this estimate of can be regarded as a ridgetype regression estimate. It is deployed to prevent the model coefficients from being too large. Its use can lead to a more stable surrogate model.
3.1.3 Tuning Parameters
A remaining issue in the Bayesian computation is the tuning of the hyperparameters, which is critical for the model performance. For the hyperparameters related to the RBF, we adopt the settings in Andrieu et al. (2001) and Koutsourelakis (2009). Specifically, for the proposed density of the RBF centers in (16), we set For the prior of the RBF scales in (4), we set and for the proposed density of in (18), we set For the hyperparameters related to model coefficients and residuals, we follow the settings in Chipman et al. (1997). Specifically, for , we suggest to set , where i.e., the largest change in and . For the prior of the indicator variable , we set i.e., the probability of selecting a variable is 50%. For the hyper parameter and in (7), we set and
to be the 99% quantile of the inverse gamma prior that is close to
Consider the variance ratio . Usually we choose a large positive value for , e.g., . From our experience, we fix in the first simulation example. However, it should be problemdependent and can be changed for different optimization problems.3.2 A Point Selection Criterion
In this section, we discuss how to select new explored points based on the uncertainty of the response prediction for exploring uncertain regions. The ideal selection criterion should perfectly balance between exploration and exploitation properties to efficiently identify the optimal points within the given search budget. Here a sample version of the Expected Improvement criterion is adopted.
The EI criterion, initially proposed by Mockus et al. (1978), is used to select points close to the global maxima based on a chosen surrogate model. Using this criterion, an explored point is selected to maximize the expected improvement over the best observed response
(20) 
where is the maximum of the observed model outputs. It is pointed out in Jones et al. (1998) that under the Gaussian assumption of has the following closed form expression:
(21) 
By examining the terms, we see that the expected improvement is large for those having either (i) a predicted value at that is much larger than the maximum of outputs obtained so far, i.e., or (ii) having much uncertainty about the value of i.e., when is large.
In our scenario, since the proposed surrogate model does not satisfy the Gaussian assumption, there is no analytical form for and thus it is not practical to calculate directly. Instead, we calculate the Sampled Expected Improvement (SEI) as suggested in Chipman et al. (2012) and Chen et al. (2017), i.e., to estimate based on the posterior samples of
(22) 
where is the th posterior sample by (19), and is the total number of posterior samples. Unlike in the Gaussian case, the SEI value in (22) may not be expressed as a weighted sum of the improvement term and the prediction uncertainty term. From its definition, only the prediction posterior samples that are larger than the current best value, , are taken in the summation. Thus SEI first identifies the possible “improvement”’ area, , and then sums over these terms.
A new explored point at step is then selected to maximize the SEI criterion ,
(23) 
where is the current explored point set. Since the SEI criterion does not have a closed form, it may not be easy to identify the next explored point by solving (23). Thus to identify the next point, we may limit the search over a prespecified grid, , instead of over the whole region, .
3.3 The Proposed Algorithm and Remarks
In the first part of this section, we will present a summary of the algorithm and the flexible usage of the proposed adaptive RBFbased global optimization framework. For abbreviation, we will refer to the proposed method as BaRBF, where “Ba” stands for “Bayesian adaptive”. In the second part, we will compare our method with the baseline method proposed in Regis and Shoemaker (2007).
Algorithm 1 summarizes the proposed global optimization method. In the beginning, the initial design is chosen as a spacefilling design. In this paper, the maximin Latin hypercube design (Morris and Mitchell, 1995) is used. Then the main body of Algorithm 1 is to iterate between the two steps for the surrogate model construction in Section 3.1 and the point selection criterion in Section 3.2.
Algorithm 1 Global Optimization Algorithm  

1:  Choose a small set of initial explored points using a maximin 
Latin hypercube design, and evaluate on  
2:  for do 
3:  Construct a Bayesian surrogate model 
as in Section 3.1 based on  
4:  Calculate the SEI in (22), and 
select a new explored point via selection criterion in (23) over  
6:  Update and evaluate 
7:  end for 
8:  Return the current best optimal point, and 
the corresponding function value, . 
Note that the proposed BaRBF can be flexibly used in different scenarios. For example, when the number of available function evaluations is small to moderate, there may not be enough observations to estimate all the parameters. In this case, we only need to update some part of RBF parameters, say the scale parameter by setting all the scale parameters and need not update the parameters. The choice of whether to update all parameters or part of them can be decided based on the magnitude of the model residuals at the initial stage. If updating all parameters leads to relative large model residuals, then we can fix certain parameters instead. The formulas of the posterior distribution in (8)(18) need some minor changes accordingly if certain RBF parameters are fixed. For the above example, one only needs to set in eq. (8) to be the same , and update only one in (18), and does not need to update the ’s in (15) and (16).
For the remaining part of this section, we will compare our BaRBF with the Global metric stochastic RBF (GMSRBF) algorithm proposed by Regis and Shoemaker (2007) from a theoretical perspective. The GMSRBF method will be regarded as the baseline method from now on. First we give a brief review. The GMSRBF employs a surrogate model using RBFs,
(24) 
The RBF parameters in (24) are prespecified, i.e., the RBF centers are set at the explored points , and is precalculated at the initial stage of optimization. The model coefficients in (24) are estimated by solving a deterministic linear system of equation where And their point selection criterion
(25) 
is a weighted average of the scaled response prediction with
(26) 
and the maximin distance criterion with
(27) 
where The can take values in periodically. For example, if at time then at the next time Then a new point is selected to maximize , and finally the global maximizer is also estimated by .
Although both methods use RBFs, there are two main differences. First, the surrogate model is different. BaRBF uses a Bayesian surrogate model that provides not only predictions but also its uncertainties, while the GMSRBF utilizes a deterministic surrogate model that only provides predictions. Because our proposed surrogate model is similar to ridge regression, the approximation of response is more robust and smooth compared to the interpolation surrogate model in GMSRBF. The second difference lies in the choice of the selection criterion for new explored points. In our method, we utilize the expected improvement criterion
which can be regarded as a softthresholding version of . As previously discussed, thresholding the prediction makes it easier to identify global optima. In addition, under the Gaussian prediction, EI criterion contains the measure of the prediction improvement and the model uncertainty. Here the part of the prediction improvement can be treated as exploitation and the uncertainty part is used to explore the search space. In GMSRBF, the global exploration is based on the maximin distance criterion, , and the is used for local refining. The weighted average of these two criteria is adopted with prespecific weight pattern. Simulation studies will be presented in Sections 4 and 5 to further understand and compare the empirical performance of the two methods.4 Simulation Study
To assess the performance of BaRBF, we compare it with GMSRBF, which is regarded as the baseline method. In GMSRBF, the candidate set for the next point selection is chosen in each iteration. To make a fair comparison, the candidate points are fixed as a prespecified grid, , in the experimental region and both methods will be implemented over the same grid.
In addition to the GMSRBF, we also consider another global optimization approach based on Gaussian process surrogate model for comparisons. Jones et al. (1998)
proposed the efficient global optimization (EGO) approach by using the Gaussian process for surrogate construction. EGO starts from an initial point set. After evaluating the response values of the initial design points, a numerical optimization approach, like metaheuristic algorithm, is used to obtain the MLE of the parameters in the Gaussian process and then the corresponding surrogate model is obtained. Since the Gaussian process prediction follows a normal distribution, the EI criterion in Eq. (
21) is used to identify the next explored point over the feasible point set, , and then the surrogate model is updated. Iterate these two steps until a stopping criterion is met. Usually the stopping criterion can be the number of explored points or the maximum value of the EI criterion over the unexplored points.First we start with a 2D global optimization example, whose objective function is quit smooth. Then we consider another 2D objective function which is not smooth and has multiple global optima.
4.1 2D Branin function
We consider the standard 2D test function “Branin function”, which has been widely used in the global optimization literature, e.g. Jones et al. (1998). The scaled version of “Branin function” we use here is defined as follows,
(28) 
where and To simplify our code, we further restrict this function on the evenly spaced grid . The contour plot of the Branin function over the prespecified grid points is given in Figure 1, where there will be two local maxima and one global maximum on with the maximum value 1.0473. In BaRBF, we first measure the prediction uncertainties for all grid points in and then identify the next explored point via (23) from the set .
The objective is to find that maximizes in (28) with as few evaluations as possible. At each iteration of the algorithm, the current optimal point, , and its function value are recorded together with all the explored points. We randomly choose a small set of initial explored points using a maximin Latin hypercube design (Santner et al., 2013). All three methods start with the same set of ’s. Each time the surrogate model is updated by incorporating the value of a new explored point. Then we calculate and update the value. For each algorithm, new explored points are selected and evaluated sequentially until the total number of explored points reaches This process is repeated 60 times, and the average performances are reported and compared for the two methods.
For fair comparison among BaRBF and GMSRBF, we set the initial sampler of the RBF parameters in BaRBF to be the same as the fixed RBF parameters in GMSRBF. Specifically, we use Algorithm 1 in Fasshauer and Zhang (2007) to select an optimal value of in GMSRBF that minimizes a cost function that collects the errors for a sequence of partial fits to the data. The center parameters ’s are set as the explored points ’s.
From many simulation trials, we found out that, for the Branin test function, updating all parameters in BaRBF will lead to relatively large model residuals that do not converge. This might be caused by the small number of function evaluations. Thus we only update one scale parameter with all and fix the center parameter ’s at the explored points. We iterate the MCMC 10,000 times. Also, we discard the first 40% of the samples, and take 1 out of every 5 samples in the remaining 60% of the samples, in order to obtain stable and less correlated posterior samples for model fitting. In order to implement BaRBF, two important tuning parameters need to be prespecified. The first one is the value of in the coefficient prior. Here we set . Since the weighted selection criterion is adopted to select the next explored point, the parameter in the weight is fixed as 5 after we have tested several different possibilities.
In this subsection, we first illustrate the proposed BaRBF with one particular simulation sample. Figure 2 plots the contours of the surrogate model in BaRBF and the locations of the explored points using BaRBF with 21, 26, 31, 36, 41 for one simulation sample. Figure 2(a) shows the initial status of BaRBF. The initial design is a 16run maximin Latin hypercube indicated by 16 green squares. The next explored point chosen by the selection criterion, i.e. the 17th point, is indicated by the black circle in the lower right corner. In Figure 2(b), the five additional points (17th to 21st) are indicated by the five blue squares. These five points are divided into three sets, one closer to the global maximum, the other closer to the other two local maximums. As in Figure 2(a), the next explored point, i.e., the 22nd point, is indicated by the black circle. In Figure 2(c), all the 21 points from Figure 2(b) are indicated by green squares, the additional five points by blue squares, and the next explored point by black circle. Then the same symbols are used in Figure 2(d) to (f) to demonstrate the progression of points for and 41. Amazingly, except the initial design points, all the explored points are located closer to the three maxima, none for exploring bad regions. Finally the global maximum is identified in point 39 as shown by the black square in Figure 2(f). In summary these contour plots show that BaRBF efficiently explores the experimental space and quickly approaches the optimal points.
Approach  5%  Q1  Median  Q3  95%  Mean  std  Frequencies with 

Quantile  Quantile  true optimal values  
BaRBF(SEI)  1.0397  1.0397  1.0464  1.0473  1.0473  1.0448  0.0033  29/60 
GMSRBF  1.0176  1.0438  1.0464  1.0473  1.0473  1.0425  0.0152  20/60 
EGO  1.0473  1.0473  1.0473  1.0473  1.0473  1.0473  0.000  60/60 
We report the performance of BaRBF and GMSRBF based on 60 replications by randomly generating the initial LHD designs. The purpose is to see whether BaRBF provides a more efficient search path to identify the global maximum compared with GMSRBF, for the same number of function evaluations. The numerical results are summarized in Table 1
. First BaRBF has the higher mean value, 1.0443, than that of GMSRBF, 1.0425, and is more stable because of its smaller standard deviation. Based on the sample quantiles of the optimal values identified by both approaches, there is a detectable difference at the 5% quantile values. We found out that for several cases, the best points identified by GMSRBF are not close to the true optimal point and after checking the corresponding search processes, GMSRBF did not efficiently explore the experimental space by properly choosing the next points. On the other hand, GMSRBF performs better than BaRBF for the first quartile Q1. In addition, among the 60 replications, BaRBF can identify the true optimal values 26 times, which is higher than 20 times for GMSRBF. Overall BaRBF has a better performance. Here we plot in Figure
3 the mean value as well as the 5% and 95% quantile curves of the current optimal values with respect to the number of iterations for both methods. The mean curves in the two plots are very similar. More meaningful is the comparison of the two 5% quantile curves. For GMSRBF, the curve moves up quickly until =13; then it gets stuck (flat) until about = 23. By comparison, the 5% quantile curve for BaRBF moves up quickly until = 18. By then, the band between the upper and lower quantile curves is very narrow and continues to shrink. The corresponding band for GMSRBF does not shrink even to the end ( = 30). In fact it remains very wide when = 26. This figure gives a more informative comparison than the numerical results in Table 1. It clearly shows the better performance of BaRBF over GMSRBF.When we compare the results with EGO, it seems that EGO works perfectly in this Branin example because EGO can quickly identify the global maximum point in each replications. The possible reason should be that since the Branin function can be treated as a smooth function, it can be fitted quit well by the Gaussian process model and thus the EI criterion in EGO can rapidly guide the search process to the target point. For our BaRBF, we do have a error assumption in the surrogate model and the model fitting may not be as well as Gaussian process model. In addition, SEI is computed as the sample expectation of the improvement function without any distributed assumption. Thus if the Gaussian assumption is satisfied, it is not surprised that EI can be more efficient in getting the global optimal point. In fact, when we monitor the search process of BaRBF, sometimes it may stay in a local area for a while. This should be related to that the exploration effect of the SEI criterion does not function well.
4.2 2D Ronkkonen function
In addition, we consider another 2D objective function in Rönkkönen et al. (2011), i.e.,
(29) 
where for , , and and the experimental region is . This objective function has been used as a test function in Chipman et al. (2012). Here we also restrict the function on the evenly spaced grid . The contour plot of this Ronkkonen function over the prespecified grid points is given in Figure 4, where there are 12 local maximums and 4 global maximal points with the maximum value, 0.4777. Because of its multiple local and global optimal points, this Ronkkonen function is not as smooth as the Branin function.
In this example, the initial point sets are the same as those in Section 4.1, and the total number of explored points is set as , i.e., based on 16 initial points, the search algorithm iterates 30 times by sequentially adding 30 points. Then the best value among the 46 explored points is reported. Here the goal is to identify one of the four global maximum points. The average performances of BaRBF, GMSRBF and EGO over 60 replications are summarized in Table 2.
Approach  5%  Q1  Median  Q3  95%  Mean  std  Frequencies with 

Quantile  Quantile  true optimal values  
BaRBF(SEI)  0.4766  0.4775  0.4777  0.4777  0.4777  0.4775  4.6850e4  43/60 
GMSRBF  0.3635  0.4407  0.4766  0.4775  0.4777  0.4529  0.0363  11/60 
EGO  0.3922  0.4405  0.4766  0.4777  0.4777  0.4526  0.0344  18/60 
First, GMSRBF performs worst in terms of the frequencies of reaching one of the four global maximum points (see the last column of Table 2). Then we focus on comparing the performance between BaRBF and EGO. From Table 2, the mean of the best function value found by BaRBF is 0.4775 with standard deviation 4.6850e4, while the corresponding values for EGO are 0.4526 and 0.0344 respectively. We also compute the sample quantiles of the best values found by both methods. Table 2 shows that BaRBF touches the global maximum at the sample quantile, while EGO reaches 0.4777 at the quantile. In addition, for the BaRBF, the frequency of reaching a global optimum is 43/60, while the frequency for the EGO is 18/60. The mean value of 60 replicates, and the and quantile curves of the current optimal values with respect to the number of iterations for EGO and BaRBF are shown in Figure 5. The mean curve of the BaRBF moves up quickly to the global optimal value, while the curve for EGO moves up more slowly. In addition, the quantile curve for EGO does not get much improvement before adding explored points, i.e., , while the same curve for BaRBF moves up quickly and gets close to the global optimal value after adding 15 points, i.e., . Another attractive feature for BaRBF is the extremely low standard deviation (see the std column), which may suggest that the BaRBF can perform stably over repeated implementations. In summary, the BaRBF outperforms the EGO in this example.
Chipman et al. (2012) have pointed that when the test function has multiple global optima, the EI criterion might lead the search process to jump out of the neighborhood of one global optimum to another one and cannot stick around one global optimum. This may be explained by the fact that the EI criterion tries to minimize the prediction uncertainties among different global optima. The quantile curve for EGO in Figure 5 seems to support this point. For the BaRBF, by using SEI, it can quickly locate one neighborhood of a global maximum and then identify the best value. In addition, since this Ronkkonen function is not as smooth as the Branin function, the normality assumption and the interpolation property of the Gaussian process may not give advantages for the surrogate construction. For the BaRBF, the surrogate model consists of additive radial basis functions and their parameters are simultaneously adjusted via the proposed Bayesian approach. Thus our surrogate construction approach may be more advantageous for nonsmooth test functions.
5 Simulation Studies for Three and Four Dimensions
In addition to these two different 2D test functions, we consider 3D and 4D examples to illustrate that the proposed BaRBF can deal with higher dimension cases and also compare its performances with EGO and GMSRBF.
5.1 3D Ronkkonen Function
The Ronkkonen function defined in Rönkkönen et al. (2011) is quite general and can be extended to different dimensions. Here we consider a 3D Ronkkonen function which is generalized from the 2D function in Section 4.2 and is shown below,
(30) 
where for , , and ; ; . The experimental region considered here is . According to Rönkkönen et al. (2011), there are local maximum points and 27 of them are the global maximum points.
We divide the experimental region into grid points by setting the grid size as 0.04. For this grid, there is only one global maximum point with the function value 0.3584. We start by choosing the initial maximin LHD with 50 points for the 3dimensional case. After the initial stage, we run BaRBF for 50 iterations. To implement BaRBF, we follow the same RBF setup in Section 4.1. That is we set the scale parameter with all and fix the center parameter ’s at the explored points. For the tuning variable , we fix as 15. The performance of the BaRBF is measured by the maximum function value identified within 100 explored points and is summarized based on 20 replications by independently regenerating the initial design points. For the comparison purpose, we also implement EGO and GMSRBF.
Table 3 is a summary of the maximum values obtained by the three approaches. GMSRBF performs worst in this case because GMSRBF cannot identify any true global maximum value within 20 replications. Between BaRBF and EGO, BaRBF has better performance in the average maximum function values and the frequency of reaching the global maximum point, but both approaches share similar values in the first and third quartiles, Q1 and Q3, and the median value. Because there are few replications, the maximum value identified by EGO is less than 0.34. A possible reason should be similar to what was stated in Section 4.2, namely, the Ronkkonen function is not a smooth function and EGO may jump around different local modes. Finally, the std value for BaRBF is much lower than that for the others. This is similar to what we observe in Table 2 for the 2D case and has a similar implication on the stability of the BaRBF.
Approach  Q1  Median  Q3  Mean  std  Frequencies with 

true optimal values  
BaRBF(SEI)  0.3578  0.3580  0.3584  0.3581  3.3973e04  9/20 
GMSRBF  0.3203  0.3362  0.3576  0.3334  0.0234  0/20 
EGO  0.3579  0.3580  0.3583  0.3566  0.0048  5/20 
5.2 4D Hartmann Function
In this section, we study the performance for the 4dimensional Hartmann function. This function has been studied in Picheny et al. (2013) and is defined as
(31) 
where ; and There are multiple maxima for this function over the experimental region . Here we divide the experimental region int grid points by setting the grid size as 0.05. The maximum function value over these points is 3.1218.
The proposed method, BaRBF, with SEI selection criterion is used to identify the true optimal point over the prespecified grid for this 4D function. For the parameters in radial basis function, the scale parameter is set as and the center parameters, ’s, are fixed at the explored points. In addition, the variance, , in the mixture normal priors of the coefficients is fixed as 10.
We start by choosing the initial maximin LHD with 50 points for the 4dimensional case. After the initial stage, we run BaRBF for 50 iterations. Thus there are in total 100 explored points in this example. To track the performance of the proposed method, we record the maximum function values in each iteration, i.e., . That was repeated 20 times by randomly generating the initial maximin LHD. We repeat the same procedure for GMSRBF and EGO. A summary of the maximum values obtained by the three approaches are shown in Tables 4. For this 4dimensional Hartmann function, BaRBF outperforms GMSRBF in terms of the frequency, the mean of the optimal values and a smaller standard deviation, and as shown in Table 4, for BaRBF, the middle 50% of values between Q1 and Q3 is extremely tiny and smaller than that for GMSRBF. However, EGO has the best performance in this case. We think it should be related to the target function because this Hartmann function should be a smooth function and thus it is fever to the EGO approach.
Approach  Q1  Median  Q3  Mean  std  Frequencies with 

true optimal values  
BaRBF(SEI)  3.1119  3.1218  3.1218  3.0936  0.0746  15/20 
GMSRBF  3.0948  3.1218  3.1218  3.0903  0.0972  13/20 
EGO  3.1218  3.1218  3.1218  3.1099  0.0531  19/20 
6 Discussions
In this session, several issues are studied. First we modify the proposed algorithm by adding a step to force the search process to jump out from a local optimal area. Then we study the effects of grid size on the performance of the proposed method. Finally, in order to reduce the computational burden for highdimensional optimization problems, a gridfree version of the proposed method is introduced and a higher dimension example is illustrated.
6.1 A Modified Version of BaRBF
From tracing the search process of the BaRBF in the Branin function example, we found out that sometimes the BaRBF gets stuck in a local area and cannot leave the area for a while. To overcome this potential weakness, we have the following modification. To jump out of this local area, we add an additional step, called the escape step, by monitoring the search process of the current best value. That is, we record the number of consecutive nonimprovement iterations, i.e., iterations for which the current best function value cannot get improved from the new explored point. Denote this number by . Once exceeds a prespecified number, , it indicates that the BaRBF is stuck in a local area. Instead of continuing the search, we add some additional points to explore the experiment region. The additional point is chosen based on the maximindistance criterion in the region. That is, given the current explored points, we find the point such that the union set of this point with the current explored points has the maximal value of the minimal distance between any two points in this union set. The purpose is to put additional points in the unexplored area as far away as possible from the existing points. We continue adding points until we obtain a better function value or until we add points. Then we will return to the original search procedure. A similar idea has been adopted in Regis and Shoemaker (2007) to detect if the search algorithm converges to a local optimum. We refer to this modificcation as MBaRBF.
Approach  5%  Q1  Median  Q3  95%  Mean  std  Frequencies with 

Quantile  Quantile  true optimal values  
MBaRBF(SEI)  1.0397  1.0464  1.0473  1.0473  1.0473  1.0458  0.0028  38/60 
We implement the MBaRBF for the Branin function by setting . For the same initial points sets and other tuning parameters, the average performance of the 60 replicates for MBaRBF is shown in Table 5. Compare with the results for BaRBF in Table 1. Except for the 5% sample quantile value, the MBaRBF performs better in the mean value and median value. In addition, the frequency to reach the global optimum is 38/60 which is much higher than 29/60 for the BaRBF. The Q1 value for MBaRBF is significantly higher than that for BaRBF and the median value touches the global maximum of the Branin function.
6.2 The Effects of the Grid Size
In the numerical examples, we suggest to prespecify the grid size first when we implement our algorithm. This size may be chosen based on the prior knowledge. In the numerical examples in Section 4, the grid size is fixed as 0.04 or 0.05. Suppose we can consider different grid sizes for a given optimization problem. We will illustrate the effects of the grid sizes on the performance.
We revisit the 2D Branin function example in Section 4.1. Instead of setting the grid size as 0.04, we choose the finer size 0.02 and divide the region into grid points which still cover the original grid, . Based on this finer grid, there are still two local maxima and the global optimal point is located at . Since the number of grid points is now about four times that of the original, we take more iterations, , for the proposed BaRBF. Then based on the same initial points and tuning parameters, the results with 60 replications are summarized in Table 6. In this table, in addition to the results with 120 iterations, we also report the results with 30 iterations for comparison purpose.
Approach  5%  Q1  Median  Q3  95%  Mean  std  Frequencies with 

Quantile  Quantile  true optimal values  
BaRBF(120)  1.0471  1.0471  1.0473  1.0473  1.0473  1.0472  9.508e5  40/60 
BaRBF(30)  1.0466  1.0471  1.0471  1.0473  1.0473  1.0469  9.412e4  17/60 
Compare the performances of BaRBF and BaRBF(120) in Tables 1 and 6 respectively. First, the BaRBF(120) is implemented over the finer grid and it has higher frequency, , to reach the global optimal point. In addition, the quantiles and the median value of the best solutions of BaRBF(120) are 1.0471, 1.0471 and 1.0473 respectively which are significantly higher than the corresponding values shown in Table 1. Obviously BaRBF(120) has the higher mean value 1.0472 and a smaller standard deviation. This may be related to the fact that there are more candidate points and larger number of iterations. Thus BaRBF can identify better function values due to finer grid and can still explore the experimental space because of a larger number of iterations. To support this guess, we also report the summary of the BaRBF with finer grid and 30 iterations, denoted by BaRBF(30). The corresponding , sample quantiles and the median value are still better than the corresponding values shown in Table 1. But the frequency for obtaining the global optimal point is only 17/60. It means that 30 iterations may not be large enough for BaRBF to explore the whole region and the search process may get stuck in some local areas. Thus we need to have more iterations to increase the probability to jump out of these local areas. Overall we can conclude that when we have a finer grid, a larger number of iterations should be necessary.
6.3 A Gridfree Method
When we demonstrate the performance of the proposed BaRBF in Section 4, we mention that we need to prespecify a grid as the explored candidate point set before implementing the BaRBF. For example, we set the grid size as 0.04 for the 2 and 3 dimension cases. Suppose we consider the application in parameter tuning. In practice, the precision of each variable should be limited and due to this grid setup, we can easily select the next explored point based on the SEI criterion; otherwise it should be treated as another nondifferentialable optimization problem. However, the problem for the grid set is the curse of dimensionality. For example, when we consider a 7dimension region,
. Suppose we choose the grid size as 0.2. Then the number of grid points is points, which is huge. To keep such huge number of grid points in our process would slow down the computational speed. Since the grid point set is used as the candidate set for selecting the next explored point, we would propose another modified BaRBF without prespecifying grid point set.The idea is similar to the GMSRBF in Regis and Shoemaker (2007). Instead of fixing the grid points for the candidate point set, we randomly generate the candidate points in each iterations. Regis and Shoemaker (2007) have mentioned two conditions for this point generation procedure. One is the conditionally independent sampling condition and another one is the dense condition, that is, any point in
can be chosen as a candidate point with a positive probability. Since our approach satisfies the first condition, sample the candidate points from the uniform distribution over the experimental region
to satisfy the dense condition to guarantee the convergence property, i.e., when goes to infinity, the global optimal point can be identified by such search process with high probability. Thus in our algorithm, we add one more step to generate the candidate point set from the uniform distribution over the region and then choose the next explored points by applying the SEI criterion to . After updating the , we would continue the selection process by regenerating the new candidate points.To demonstrate this new modified BaRBF, named the gridfree BaRBF, the Rastrigin function (Pohlheim, 2015) is taken as the objective function, i.e.,
This Rastrigin function has multiple local maxima and the locations of the local maximal points are uniformly distributed. The global optimal point is with maximum value 0. Here we choose , that is, we consider a 8dimensional problem. At first the 80 initial points are chosen from a Latin hypercube design over the experimental region,
Comments
There are no comments yet.