1 Introduction
Optimization is the art of finding the best solution among the feasible solutions to challenging problems. Challenging optimization problems appear in many areas of science, technology, and industry, including farm placement problems, self-driving vehicles, green building design, vehicle crash simulation, finite element design, and optimal molecular structures in drug and material discovery. All of these are design problems in which we seek the best set of design variables to optimize certain specifications.
These problems are complex systems of inputs and outputs whose underlying behavior is entirely unavailable. Such applications and their challenges have produced a body of work called Black-Box Optimization (BBO). Simulators can be used to study such black-box systems and potentially optimize them. A large number of variables is typically involved, with substantial interaction between them, resulting in a large combinatorial search space in which each setting is a valid solution. Evaluation, however, involves costly experiments due to the complexity of the systems; these can be either computer simulators, such as energy simulation tools, or actual experiments, such as crash tests. Achieving an optimal solution of a high-dimensional, expensive black-box function within a limited number of function evaluations is therefore the central concern.
The black-box optimization problem is formulated like a conventional optimization problem, with an objective function (Equation 1) and constraints defining the feasible region (Equation 2). In this research we consider box-constrained (Equation 3) black-box systems.
(1)  min_{x ∈ Ω} f(x)
(2)  Ω = { x ∈ R^d : g_i(x) ≤ 0, i = 1, …, m }
(3)  Ω = { x ∈ R^d : l ≤ x ≤ u }
where f : R^d → R is the black-box function, and the goal is to obtain a global optimal solution x* of f in the feasible input space Ω, where Ω ⊂ R^d.
Since f is often too computationally expensive to be embedded directly within a global optimization method, a more practical approach employs surrogate models that are computationally cheaper to evaluate. A surrogate model is a mathematical approximation of the relationship between input and output variables. Several types of statistical models are used as surrogates, including Kriging Krige (1951), Radial Basis Functions (RBF) Powell (1985), Regression Trees Breiman et al. (1984), Multivariate Adaptive Regression Splines (MARS) Friedman (1991), Artificial Neural Networks Papadrakakis et al. (1998), and Support Vector Regression (SVR) Clarke et al. (2005). Optimizing these cheap-to-evaluate surrogate models for black-box optimization is a well-known derivative-free technique called surrogate optimization. Surrogate optimization requires careful selection of simulator runs to simultaneously improve the surrogate model and gain more information on the potential optimum in each iteration. Historically, surrogate optimization methods assume that there is no uncertainty in the complicated black-box systems and that the set of significant decision variables is known a priori. Both of these assumptions are often violated in the real world, Picheny et al. (2013a); Pilla (2007). This research introduces a new surrogate optimization paradigm to address two primary concerns: 1) How can we handle unimportant inputs (decision variables of the black-box function) in surrogate optimization? 2) How can we handle the uncertainties associated with the black-box functions in surrogate optimization? To handle this paradigm shift, we develop a new surrogate optimization approach, Tree-Knot Multivariate Adaptive Regression Splines (TKMARS), as well as a smart replication strategy.
We introduce the surrogate optimization algorithm and the related literature in § 2 to highlight the gaps that have not been properly investigated. § 3 describes TKMARS and the proposed approach for deterministic black-box functions; in that section we also present the proposed replication approach for handling the uncertainties associated with black-box systems. Finally, in § 4 we discuss the performance of the proposed deterministic and stochastic approaches.
2 Background
In this section, we provide technical background on surrogate optimization, interpolating and non-interpolating surrogate models, and the exploration-exploitation sampling used in this research. We also describe Classification and Regression Trees (CART), which serve as the partitioning technique in our proposed surrogate model.
2.1 Surrogate Optimization
Algorithm 1 presents a generic surrogate optimization algorithm for black-box functions. Ideally, surrogate optimization finds a global optimum of the surrogate model and evaluates it with the black-box function to obtain the actual objective value. It then adds the newly evaluated point to the initial data set and iteratively minimizes the gap between the surrogate and the true function.
The stopping criterion can be either a maximum number of expensive evaluations of the black-box function or the expected improvement of the best sampled mean solution (BSMS). The BSMS is the best known solution after a given number of iterations.
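The loop above can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's algorithm: the "surrogate" here is a nearest-neighbour lookup standing in for Kriging/RBF/MARS, the candidate pool is purely random, and all names (`surrogate_optimize`, `blackbox`, etc.) are our own.

```python
import random

def surrogate_optimize(blackbox, bounds, n_init=5, max_evals=20, seed=0):
    """Generic surrogate-optimization loop (1-D sketch).

    Hypothetical helper: the surrogate is a nearest-neighbour predictor;
    a real implementation would fit a statistical model and optimize it."""
    rng = random.Random(seed)
    lo, hi = bounds
    X = [rng.uniform(lo, hi) for _ in range(n_init)]
    Y = [blackbox(x) for x in X]                      # expensive evaluations

    def surrogate(x):                                 # cheap approximation
        return min(zip(X, Y), key=lambda p: abs(p[0] - x))[1]

    while len(X) < max_evals:
        # "optimize" the cheap surrogate over a random candidate pool
        pool = [rng.uniform(lo, hi) for _ in range(200)]
        x_new = min(pool, key=surrogate)
        X.append(x_new)
        Y.append(blackbox(x_new))                     # one expensive evaluation

    best = min(range(len(X)), key=lambda i: Y[i])
    return X[best], Y[best]                           # BSMS after max_evals
```

For example, `surrogate_optimize(lambda x: (x - 2.0) ** 2, (-5.0, 5.0))` spends its evaluation budget concentrating candidates near the incumbent best point.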
2.2 Interpolating vs. Non-interpolating Surrogate Models
Choosing an appropriate surrogate model, also known as a metamodel, depends strongly on how the different methods perform under different circumstances. Surrogate models are classified as interpolating (Kriging Krige (1951), RBF Powell (1985), etc.) or non-interpolating (polynomial regression models Neter et al. (1996), multivariate adaptive regression splines Friedman (1991), etc.). Interpolating models are the most common techniques in the surrogate optimization literature, but they cannot handle uncertainty inherently. Non-interpolating models, by contrast, approximate the data smoothly under uncertainty, Hwang and Martins (2018). Non-interpolating surrogate models do not need to traverse all the data points to capture the exact behavior of the function, unlike interpolating surrogate models. In terms of the bias-variance tradeoff, non-interpolating surrogate models may have higher bias and lower variance than interpolating surrogates. Therefore, in highly noisy systems, non-interpolating models are preferred to avoid oscillations of the fitted surface. Stochastic black-box systems are rarely addressed in the surrogate optimization literature, as discussed above. In Figure 1, we show how an interpolating model, Figure 1(b), and a non-interpolating model, Figure 1(c), perform for a simple function with noise, and whether they recognize the true underlying behavior under uncertainty. To capture a single simulation input with two or more different outputs, the interpolation method must traverse the points with a large slope. Consequently, interpolation-based surrogate models produce highly fluctuating approximations for data points that include uncertainty.
2.3 Radial Basis Functions
Radial Basis Functions (RBF) form one of the most common interpolating surrogate models Powell (1985). Assuming n distinct already evaluated points x_1, …, x_n, the RBF interpolant has the form s(x) = Σ_{i=1}^{n} λ_i φ(‖x − x_i‖), where ‖·‖ denotes the Euclidean norm and φ is a basis function. The most common basis functions are the Multiquadric (MQ) φ(r) = √(r² + γ²), Gaussian (G) φ(r) = e^{−γ r²}, Cubic (C) φ(r) = r³, and Thin Plate Spline (TPS) φ(r) = r² log r. The shape parameter γ and the number of points affect the accuracy and stability of the RBF (a larger shape parameter makes the function flatter).
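The coefficients λ are obtained by solving the linear system Φλ = y, where Φ_{ij} = φ(‖x_i − x_j‖). A minimal 1-D sketch with the Gaussian basis (chosen here because its interpolation matrix is positive definite for distinct points); the helper names are our own, and real implementations would use a library linear solver:

```python
import math

def gauss_solve(A, b):
    """Plain Gaussian elimination with partial pivoting (small systems only)."""
    n = len(b)
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(A[i][k]))
        A[k], A[p] = A[p], A[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= m * A[k][j]
            b[i] -= m * b[k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - sum(A[i][j] * x[j] for j in range(i + 1, n))) / A[i][i]
    return x

def fit_rbf(X, y, gamma=1.0):
    """1-D Gaussian RBF interpolant: solve Phi @ lam = y with
    Phi[i][j] = exp(-(gamma * |x_i - x_j|)**2); gamma is the shape parameter."""
    n = len(X)
    phi = lambda r: math.exp(-(gamma * r) ** 2)
    A = [[phi(abs(X[i] - X[j])) for j in range(n)] for i in range(n)]
    lam = gauss_solve(A, list(y))
    return lambda x: sum(l * phi(abs(x - xi)) for l, xi in zip(lam, X))
```

By construction the fitted surface passes exactly through every evaluated point, which is precisely why interpolating models struggle with noisy responses.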
2.4 MARS
Multivariate Adaptive Regression Splines (MARS) was introduced by Friedman Friedman (1991). MARS is a nonparametric, non-interpolating surrogate model whose structure is based on basis functions. The MARS algorithm uses splines to construct a piecewise continuous function that models the dependent variable. The MARS approximation has the form:
(4)  f̂(x) = β₀ + Σ_{m=1}^{M} β_m B_m(x)
where x is a d-dimensional vector of input variables, β₀ is the intercept coefficient, M is the maximum number of linearly independent basis functions, β_m is the coefficient of the m-th basis function, and B_m(x) is a basis function consisting of either univariate or multivariate interaction terms.
The interaction terms take the form:
(5)  B_m(x) = Π_{k=1}^{K_m} [ s_{k,m} ( x_{v(k,m)} − t_{k,m} ) ]_+
where K_m is the number of interaction terms in the m-th basis function, x_{v(k,m)} is the input variable corresponding to the k-th truncated linear function in the m-th basis function, and t_{k,m} is the knot value corresponding to x_{v(k,m)}. The value s_{k,m} ∈ {−1, +1} is the direction that the truncated linear basis function can take.
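To make the basis-function structure of Equations 4-5 concrete, here is a minimal evaluator. The hinge function and product-of-hinges form are standard MARS; the tuple representation of terms is our own illustrative choice, not the paper's implementation:

```python
def hinge(x, knot, sign=1):
    """Truncated linear basis function [s * (x - t)]_+ used by MARS."""
    return max(sign * (x - knot), 0.0)

def mars_predict(x, intercept, terms):
    """Evaluate a MARS-style model.

    `terms` is a list of (coefficient, factors) pairs, where `factors`
    is a list of (var_index, knot, sign) triples; each basis function is
    the product of its truncated linear factors (Equation 5)."""
    y = intercept
    for coef, factors in terms:
        prod = 1.0
        for v, t, s in factors:
            prod *= hinge(x[v], t, s)
        y += coef * prod
    return y
```

For instance, the model f̂(x) = 1 + 2·[x₀ − 0.5]_+ is encoded as `mars_predict([x0], 1.0, [(2.0, [(0, 0.5, 1)])])`.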
Each data point is an eligible knot location for MARS, so adding more data points enlarges the set of eligible knot locations. As the number of eligible knots increases, MARS interpolates the data points and loses its flexibility. Interpolating early, when insufficient data have been collected, leads to a false assessment of the function behavior and to multiple local optima. For highly noisy functions, interpolating leads to high local variance in the function estimate Koc and Iyigun (2014), which may cause difficulties for the optimization procedure. Moreover, when a large number of eligible knots is selected, multicollinearity can occur between basis functions whose knots are close to each other. Eligible knot selection techniques have been developed to address these local variance and multicollinearity issues. One of the most common techniques selects evenly spaced knot locations within the range of the input data Martinez (2013); Cao et al. (2010); Huang and Shen (2004); Song and Yang (2010). Evenly spaced knots, however, may not capture the true patterns in the data. Friedman Friedman (1991) proposes the minimum span (MinSpan) approach to minimize the local variability: for each independent variable, a local search around its current knot location reduces the number of eligible knot locations. Miyata Miyata and Shen (2005) presents a simulated annealing approach for choosing eligible knot locations. Koc Koc and Iyigun (2014) develops a mapping strategy that transforms the original data into a network of nodes through a nonlinear mapping; the nodes in the mapped network serve as references for choosing new candidate knots. The mapping hyperparameters affect the accuracy and time efficiency of the model.
MARS is intended to be parsimonious, comprising forward selection and backward elimination procedures. In forward selection, MARS adds basis functions in pairs across dimensions, choosing the pair that gives the maximum reduction in the sum of squared errors; the process continues until the maximum number of basis functions is reached. By removing the least effective term at each step, the backward elimination process avoids overfitting. MARS thus has an embedded dimension reduction technique, which is very useful for black-box optimization where there is no prior understanding of the input variables and their impact on the response: the final model retains only the most important variables. The advantages of MARS over other statistical models, such as linear regression, lie in its ability to handle curvature in high-dimensional space and to produce easier-to-interpret models.
2.5 CART
CART is a nonparametric decision tree and nonlinear predictive modeling method Breiman et al. (1984). It recursively partitions the space into smaller subregions so that the interactions become manageable. For regression problems, CART chooses binary splits by minimizing the sum of squared errors, Equation 6, across all data points that fall within each partition. The partitions progressively discover the structure of the unknown function.
(6)  SSE = Σ_t Σ_{i ∈ t} ( y_i − ȳ_t )²
where y_i is the response value of observation i and ȳ_t is the average response value of the observations in terminal node t.
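A 1-D sketch of the split criterion in Equation 6: exhaustively try every threshold and keep the one minimizing the total within-partition SSE. This is only the single-split building block of CART (the full algorithm recurses and handles many variables); the function names are our own:

```python
def sse(ys):
    """Sum of squared errors of a group around its own mean (Equation 6)."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def best_split(xs, ys):
    """Return the 1-D threshold whose binary split minimizes total SSE."""
    pairs = sorted(zip(xs, ys))
    best_err, best_thr = float("inf"), None
    for i in range(1, len(pairs)):
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best_err:
            # midpoint between neighbouring x values, as is conventional
            best_err, best_thr = total, (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_thr
```

On data with two flat response levels, the split lands between the two groups, which is exactly why CART "splits where the response values significantly change".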
2.6 Exploration-Exploitation Pareto Approach (EEPA)
EEPA Dickson (2014) is an exploration-exploitation candidate selection approach. It constructs a Pareto frontier with the estimated function value from the surrogate model as one dimension and the distance of the candidate points from the already evaluated points as the second dimension. The first dimension exploits the interesting regions around the optimum, and the second explores undiscovered regions. The exploration metric is the distance d(x) = min_{x_j ∈ S} ‖x − x_j‖, where S is the set of already evaluated points, and the exploitation metric is the fitted function value f̂(x). The first metric is maximized, while the second is minimized. The non-dominated Pareto set is taken over P, a fixed random pool of sample points. There may be candidate points on the Pareto set that are close to each other; to choose the winning candidate points from the Pareto set, a maximin exploration technique is applied.
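Extracting the non-dominated set over the two metrics can be sketched as below. This is a generic bi-objective Pareto filter (maximize distance, minimize fitted value), an assumption-level illustration of EEPA's first step rather than its full procedure:

```python
def pareto_frontier(points):
    """Non-dominated candidates for an EEPA-style filter.

    Each entry is (distance_to_evaluated, surrogate_value); the first
    coordinate is maximized (exploration), the second minimized
    (exploitation).  A point is dominated if another is at least as good
    in both coordinates and strictly better in one."""
    frontier = []
    for i, (d_i, f_i) in enumerate(points):
        dominated = any(
            d_j >= d_i and f_j <= f_i and (d_j > d_i or f_j < f_i)
            for j, (d_j, f_j) in enumerate(points) if j != i
        )
        if not dominated:
            frontier.append((d_i, f_i))
    return frontier
```

A maximin pass over the surviving candidates would then spread out points that sit too close together on the frontier.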
2.7 Literature
Various surrogate models have been used in the black-box optimization literature. Moore et al. (2000); Costas et al. (2014); Crino and Brown (2007); Dickson (2014) employ non-interpolating models such as MARS and polynomial regression. Kriging Sacks et al. (1989); Jones et al. (1998); Simpson et al. (1998); Huang et al. (2006); Picheny et al. (2013a); Dong et al. (2015) is one of the most traditional surrogate models used for black-box optimization. RBF, an interpolating technique, is the surrogate model that has received the most attention lately Regis,Rommel (2011); Regis (2014a, b); Krityakierne et al. (2016); Datta and Regis (2016); Jakobsson et al. (2010); Müller and Shoemaker (2014); Müller (2016). Treating every variable as a significant factor for the output is the weakest part of the surrogate optimization literature. Crino Crino and Brown (2007) demonstrates that MARS is capable of screening and reducing variables thanks to its parsimonious nature.
Crino and Brown (2007); Costas et al. (2014); Picheny et al. (2013a); Jakobsson et al. (2010); Huang et al. (2006) consider uncertainty components in the metamodeling. Kriging and RBF do not inherently handle uncertainty, so some modifications are required. Huang Huang et al. (2006) develops a Kriging-based surrogate optimization approach and applies it to low-dimensional test problems with a low noise level. The cost of fitting Kriging increases with the number of samples, making it impractical for higher-dimensional problems, and the proposed method requires further investigation for highly fluctuating functions under higher noise levels. Picheny Picheny et al. (2013b) adds Gaussian noise with a fixed independent variance to the response and evaluates the approach on low-dimensional optimization test problems. The results show the relatively poor modeling performance of Kriging: a large part of the variability that cannot be explained by the model is due to the observation noise during optimization. Jakobsson Jakobsson et al. (2010) and Picheny Picheny et al. (2013a) apply RBF- and Kriging-based surrogates only to low-dimensional test problems under low levels of uncertainty. MARS and regression require no revision to handle uncertainty. Costas Costas et al. (2014) shows that MARS is preferable in real-world applications due to slope discontinuities and uncertainties, even though they do not study the effect of noise.
A few other studies in surrogate optimization concentrate primarily on developing an efficient optimizer. The main approach for recommending a new candidate is to solve an optimization subproblem subject to exploration and exploitation constraints on the estimated objective function Moore et al. (2000); Regis and Shoemaker (2005); Knowles (2006); Müller and Woodbury (2017). Although this method synthesizes sample points, solving an optimization problem is computationally costly, and the complexity grows with the dimension. A stochastic sampling approach has been applied in Müller (2016); Müller et al. (2015): sample points are generated by perturbing the variables of the best point found so far with a constant perturbation probability. Pareto-based candidate selection approaches, which select multiple sample points at a time, have recently received attention Dickson (2014); Bischl et al. (2014); Krityakierne et al. (2016); Dong et al. (2015).
2.8 Contributions
In this research, we propose a new non-interpolating surrogate model specifically designed for surrogate optimization of black-box functions. We demonstrate that the modified surrogate model is capable of identifying significant variables and handling uncertainties. Furthermore, we develop a centroid-based dynamic pool generation approach to improve the candidate selection procedure, identifying subregions of interest that surround promising points for subsequent function evaluations. Finally, we introduce uncertainty associated with the black-box functions and propose effective strategies to handle it within the surrogate optimization procedure.
3 Technical Description
In this section, we describe our technical contributions to surrogate optimization.
3.1 TKMARS
To address the two major issues in black-box optimization, a flexible, non-interpolating, and parsimonious surrogate model is required in the context of surrogate optimization, one that identifies significant variables and addresses the uncertainties associated with the black-box function. In this section, we develop a modified version of MARS that is capable of identifying significant variables and is specifically designed for black-box optimization.
The idea behind the new eligible knot selection approach is to fit accurately around potential optimum points and thereby speed up convergence within fewer evaluations. In addition, MARS is sensitive to the order of the data points; the regression tree helps bring in near-optimal data points earlier, which improves the efficiency of MARS for optimization. The proposed approach, called Tree-Knot MARS (TKMARS), uses a partitioning technique to capture the function structure in the solution space and to identify near-optimal knot locations in each partition.
Algorithm 2 presents the TKMARS framework. Consider the following high-level example of TKMARS. Suppose we have a data set sampled from y = f(x) + ε, where ε is a normal noise term with mean 0, as in Figure 2(a). CART splits the existing data points into four partitions, as in Figure 2(b); note that CART is capable of identifying the structure of the underlying function. The partition boundaries are the solid lines, and the horizontal lines show the estimated average function value in each partition. Next, we identify the centroid of each partition and pick the points closest to the centroids as eligible knot locations. The centroids are the dashed lines in Figure 2(b); note that they lie close to the peaks and valleys. Focusing on the peaks and valleys, where optima lie, facilitates the optimization process. Given the new set of eligible knot locations, the breaking points of TKMARS are the points nearest to the peaks (valleys) of the function, as in Figure 2(c).
The representatives of each partition are used as reference points for eligible knot locations. CART partitions the data set according to the response function structure: it splits more where the function structure changes, to minimize the least squares error, and does not split where there are no significant changes in the response. Hence, there are more partitions in highly structured regions, and consequently TKMARS defines more knots there. Because CART splits where the response values change significantly, the peaks and valleys of the black-box function often lie in the middle of the space defined by the tree logic at the terminal nodes, rather than at the edges. Hence, the centroids, as calculated by Equation 7, are appropriate locations for the eligible knots:
(7)  c_{t,j} = (1 / n_t) Σ_{i ∈ t} x_{i,j}
where c_{t,j} is the centroid of terminal node t for dimension j, n_t is the number of observations in terminal node t, and x_{i,j} is the j-th dimension of observation i in terminal node t.
Knots in MARS live in one-dimensional space, so a transformation from the multidimensional space to one dimension is needed. Equation 8 represents such a transformation: the eligible knot location for each dimension is the point closest to the terminal centroid in that dimension.
(8)  k_{t,j} = x_{i*,j},  i* = argmin_{i ∈ t} | x_{i,j} − c_{t,j} |
where k_{t,j} is the coordinate of the point nearest to the centroid in terminal node t for dimension j. The argmin returns the smallest index if there are ties.
3.2 Dynamic Pool Generation
We apply EEPA, the exploration-exploitation Pareto approach Dickson (2014), in this research and modify it to incorporate TKMARS and a standardized Cosine metric. To discover new candidate points, we employ EEPA to add multiple candidate points at a time and ignore the set of dominated points, pruning regions that are worse in both exploration and exploitation. Dickson Dickson (2014) shows that EEPA outperforms pure exploration and pure exploitation methods. However, EEPA and the quality of the BSMS found using it depend on the fixed random data set. The space cannot be perfectly covered by a fixed set of random points; a large pool is needed, which makes the surrogate optimization procedure inefficient, since more exploration is required. Further, in the presence of noise, even a large pool may not be sufficient. Consequently, the candidate selection process can be improved by a dynamic pool generation that synthesizes new sample points as required, using information about the function structure in each iteration to gradually evolve the pool. From the first step of TKMARS, fitting CART on the already evaluated points, we obtain the set of centroids representing the partitions. The quality of these centroids, a consequence of the regression tree logic, is discussed in § 3.1. Since the centroids of the terminal nodes are close to the peaks and valleys of the underlying black-box function, they are good candidates to add to the fixed random pool. Note that the centroids are not necessarily in the pool, even though they are close to the promising points. In the new approach, at the end of each iteration, the candidate pool is defined as P ∪ C, where P is the fixed random pool and C is the set of centroids of the terminal nodes. Over time, the centroids effectively move through the solution space.
We thus add points near the BSMS using the tree logic, instead of relying on a large fixed random pool covering the whole feasible region; as a result, the candidate points lie near potential optima.
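The pool update P ∪ C itself is a one-liner; a sketch (deduplicating so repeated centroids do not inflate the pool, which is our own illustrative detail):

```python
def dynamic_pool(fixed_pool, centroids):
    """Dynamic pool P ∪ C: append the current terminal-node centroids to
    the fixed random pool, keeping the first occurrence of duplicates."""
    return list(dict.fromkeys(list(fixed_pool) + list(centroids)))
```

At each iteration the centroids C are recomputed from the freshly fitted tree, so the pool drifts towards the regions the tree currently finds structured.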
3.3 Standardized Cosine
As discussed, to identify the non-dominated set of solutions, EEPA uses a distance metric for exploration and the estimated objective value for exploitation. Different distance metrics can be applied in EEPA, such as Euclidean distance and Cosine similarity. Cosine similarity calculates the cosine of the angle between two vectors using the inner product, providing information about alternative directions; for a perfectly perpendicular direction, the cosine is zero. A direction perpendicular to the already evaluated points provides more information about unexplored areas. The main challenge is that Cosine similarity is defined for vectors, not points. We have a set of already evaluated points and a random pool from which the next candidate points are selected. To calculate the Cosine distance of the potential candidates from the already evaluated points, we treat the points as vectors starting from an origin and then calculate the angle between them. Hypothetically, the origin can be anywhere in the space; to measure the angle meaningfully, however, we take the center of the feasible domain as the origin. A standardization procedure is therefore required to calculate the Cosine metric, as given by Algorithm 3.
Algorithm 3 calculates the cosine of the angle between vectors originating from the center of the space. The standardized Cosine therefore favors candidates that are uniformly scattered around the middle of the domain.
3.4 SmartReplication
In some expensive computer simulations, there is inherent noise associated with the system, and most existing methods cannot handle this uncertainty Davis and Ierapetritou (2007); Grill et al. (2015); Moore et al. (2000). In this section, we relax the noise-free assumption: the response obtained from the simulation contains uncertainty, y(x) = f(x) + ε, where y(x) is the output of the black-box system. This implies that the simulation output is a stochastic function that differs from the true value of the function; each time a point is evaluated, a different response is returned, so a single simulation input yields different outputs. The goal is still to minimize the true objective function, Equation 1. We assume the uncertainty, or noise, ε is independent and identically distributed with mean 0 and variance σ², following an unknown distribution.
Since the data points include uncertainties associated with the black-box function, a single evaluation may not be reliable and does not reveal the true output value of the black-box function. Consequently, the deterministic approach may not be adequate to handle the uncertainty and can therefore mislead the optimization process. Performing more than one evaluation of the same point provides more accurate information about the real objective value, so an alternative approach is to replicate data points to reduce the uncertainty. Let r(x) denote the number of replications of a point x, that is, the number of equal sample points among the set of already evaluated points. Let ȳ(x) be the sample mean of y(x) after r(x) replications: ȳ(x) = (1/r(x)) Σ_{i=1}^{r(x)} y_i(x).
(9)  ȳ(x) = f(x) + (1/r(x)) Σ_{i=1}^{r(x)} ε_i
ȳ(x) in Equation 9 approximately follows N(f(x), σ²/r(x)) based on the central limit theorem. Here ε_i is the uncertainty component of the i-th evaluation y_i(x); the second component in Equation 9 corresponds to the mean of the replications' uncertainties for a sample point, and r(x) counts the number of replications of a single point. We propose a distinct replication strategy that replicates not all of the candidate points but only the points that are promising for optimization. Following this strategy, we propose a replication approach based on hypothesis testing with confidence intervals. It replicates only the promising points close to the best sampled mean solution, BSMS. SmartReplication automatically chooses the number of replications for each selected candidate point following the hypothesis rule.
Assume that the current BSMS is x*. For each selected candidate point x_c, SmartReplication considers the following null hypothesis and stops replicating x_c if it can reject the null hypothesis.
H₀: although the observed mean ȳ(x_c) is larger than ȳ(x*), the true objective values satisfy f(x_c) < f(x*).
H₁: f(x_c) ≥ f(x*).
Decision rule: if the lower confidence bound of ȳ(x_c) exceeds ȳ(x*), reject H₀.
Conclusion: while H₀ cannot be rejected, even though the observed objective value of the BSMS, ȳ(x*), is smaller than the observed objective value of x_c, ȳ(x_c), the true objective value of x_c, f(x_c), may still be smaller than the true objective value of x*, f(x*). Hence, we are confident that more replications are required for x_c.
For each point x that has been evaluated r(x) times, we calculate the sample standard deviation and a confidence interval with significance level α:
s(x) = sqrt( (1/(r(x) − 1)) Σ_{i=1}^{r(x)} ( y_i(x) − ȳ(x) )² ),
where y_i(x) is the i-th noisy evaluation. The confidence interval is ȳ(x) ± t_{α/2, r(x)−1} s(x)/√r(x), with lower bound LB(x) = ȳ(x) − t_{α/2, r(x)−1} s(x)/√r(x). To select the promising points, we prune the candidate points based on this lower bound: if LB(x) of a point is less than the threshold, it is selected as a promising point to be replicated. The promising points are thus the candidate points close to the current BSMS, i.e., below the threshold, Figure 3. Since the number of function evaluations is limited and the experiments are costly, we impose a maximum number of replications for the promising points.
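The lower-bound computation and the replicate/stop decision can be sketched as below. This is a simplified reading of the rule, assuming the threshold is the BSMS sample mean and taking the t critical value as a supplied constant rather than from a t-table; all names are our own:

```python
def ci_lower_bound(samples, t_crit):
    """Lower confidence bound ybar - t * s / sqrt(r) from r replicated
    evaluations of one point (t_crit supplied, e.g. from a t-table)."""
    r = len(samples)
    ybar = sum(samples) / r
    s = (sum((y - ybar) ** 2 for y in samples) / (r - 1)) ** 0.5
    return ybar - t_crit * s / r ** 0.5

def should_replicate(samples, bsms_mean, t_crit):
    """SmartReplication rule (sketch): keep replicating while the lower
    confidence bound of the candidate still dips below the BSMS mean,
    i.e. while we cannot reject the hypothesis that it is truly better."""
    return ci_lower_bound(samples, t_crit) < bsms_mean
```

A cap on the number of replications per point would be enforced around this check, as described above.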
The smart approach behaves like the deterministic, NoReplication, approach when the uncertainty level is low, and like the FixedReplication approach when the uncertainty level is high. A higher noise level yields a larger variance, which widens the confidence intervals, makes the pruning condition ineffective, and consequently triggers more replications, up to the maximum number of replications, which must be large and determined in advance. SmartReplication is therefore efficient under both low and high noise: it recognizes whether the system is deterministic or stochastic and decides about replications automatically.
3.5 Performance Metric
Since function evaluations are expensive, the number of evaluations needed before finding an optimal solution is a natural metric for measuring the performance of an algorithm. However, reaching a global optimum cannot be guaranteed, so a metric that quantifies the convergence pattern of an algorithm is more suitable in the context of black-box optimization.
First, we define the area under the curve (AUC) metric in Definition 1.
Definition 1 (Area Under the Curve – AUC)
Given the best sampled mean solution (BSMS) found by an algorithm after each function evaluation, let y_k be the true objective value of the BSMS after k function evaluations, k = 1, …, N. Let y_min = min_k y_k and y_max = max_k y_k. Using the normalized objective value of the BSMS,
ỹ_k = (y_k − y_min) / (y_max − y_min),
the AUC is
(10)  AUC = (1/N) Σ_{k=1}^{N} ỹ_k
The AUC captures both the quality of the BSMS and how quickly the algorithm finds it. It works well in the deterministic case, where the objective value of the BSMS decreases monotonically over time. This may not hold for stochastic black-box systems, where the curve contains jumps; in those cases, the stability of the BSMS represents robustness. Even though jumps in early iterations are tolerable, we expect stable behavior towards the end of the evaluation budget. Therefore, we next propose a metric for stochastic black-box systems that monitors stability and jump locations across the number of function evaluations. A good algorithm is one with fewer and shorter jumps, whose results are reliable after a reasonable number of function evaluations. To account for instability, the metric uses, at each evaluation, the maximum objective value of the BSMS among all BSMS found from that evaluation onward. We define the maximal true function area under the curve (MTFAUC) as follows.
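The AUC of Definition 1 reduces to a normalize-and-average over the convergence curve. A minimal sketch (we average the normalized values, one plausible discretization of the area; the paper's exact discretization may differ):

```python
def auc(bsms_values):
    """AUC over a convergence curve (sketch of Definition 1).

    Normalize the per-evaluation BSMS objective values to [0, 1] and
    average them; lower AUC means good solutions were reached early."""
    lo, hi = min(bsms_values), max(bsms_values)
    if hi == lo:
        return 0.0                      # flat curve: no area to measure
    norm = [(y - lo) / (hi - lo) for y in bsms_values]
    return sum(norm) / len(norm)
```

An algorithm that drops quickly to its best value accumulates little normalized area, hence a small AUC.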
Definition 2 (Maximal True Function Area Under the Curve – MTFAUC)
Given the best sampled mean solution (BSMS) found by an algorithm after each function evaluation, let ȳ_k (Equation 9) be the sample mean of the observed objective value of the BSMS after k function evaluations, k = 1, …, N. Let ȳ_min = min_k ȳ_k and ȳ_max = max_k ȳ_k, and let ỹ_k = (ȳ_k − ȳ_min) / (ȳ_max − ȳ_min) be the normalized objective value of the BSMS. Consider z_k as the maximum normalized objective value of the BSMS obtained among all BSMS found from evaluation k onward:
(11)  z_k = max_{j ≥ k} ỹ_j
The MTFAUC is
(12)  MTFAUC = (1/N) Σ_{k=1}^{N} z_k
MTFAUC captures instability by using the forward maximum to penalize jumps; it deteriorates when the convergence pattern fluctuates strongly. An algorithm that is more stable under uncertainty has a lower MTFAUC value.
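Operationally, MTFAUC is AUC applied to the forward running maximum of the normalized curve. A sketch under the same averaging assumption as before:

```python
def mtfauc(bsms_values):
    """MTFAUC (sketch of Definition 2).

    Normalize the BSMS objective values, replace each entry with the
    maximum over all later evaluations (so late jumps are penalized),
    then average as in the AUC."""
    lo, hi = min(bsms_values), max(bsms_values)
    if hi == lo:
        return 0.0
    norm = [(y - lo) / (hi - lo) for y in bsms_values]
    fwd = list(norm)
    # running maximum from the right: max over all BSMS found from k onward
    for k in range(len(fwd) - 2, -1, -1):
        fwd[k] = max(fwd[k], fwd[k + 1])
    return sum(fwd) / len(fwd)
```

On a monotonically decreasing curve the forward maximum changes nothing, so MTFAUC coincides with the plain AUC; a late upward jump, by contrast, raises every earlier term and inflates the score.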
4 Experimental Results
To evaluate the performance of the proposed approaches, we use global optimization test functions with known optimal points that have proven challenging for black-box optimization algorithms Surjanovic and Bingham; Hansen et al. (2009). We first assume that the black-box functions are deterministic and later relax this assumption. Table 1 describes the characteristics of the selected test functions.
Function    | Formulation                                                  | Range          | Global Min
Rosenbrock  | Σ_{i=1}^{d−1} [ 100 (x_{i+1} − x_i²)² + (1 − x_i)² ]         | [−5, 10]       | 0
Rastrigin   | 10d + Σ_{i=1}^{d} [ x_i² − 10 cos(2π x_i) ]                  | [−5.12, 5.12]  | 0
Levy        | sin²(π w_1) + Σ_{i=1}^{d−1} (w_i − 1)² [1 + 10 sin²(π w_i + 1)] + (w_d − 1)² [1 + sin²(2π w_d)], w_i = 1 + (x_i − 1)/4 | [−10, 10] | 0
4.1 TKMARS
First, we evaluate the new knot preprocessing approach, TK, versus the evenlyspaced eligible knot selection, Friedman (1991), in the context of surrogate optimization. The number of eligible knot locations, , is usually preset to a static value in MARS. We study the results for , and . It is more appropriate to have MARS and TKMARS dynamically set based on the number of terminal nodes of CART,, to ensure a fair comparison between the two approaches. Yet the eligible knot location is distinct in each approach. As a consequence, we demonstrate that TKMARS eligible knot locations promise near optimal sites.
The results are provided in Figures 4(a)-(d). The initial set of 35 points with independent variables was designed with LHD for this experiment. As one can see, the tree-based knots approach significantly improves the optimization process.
We now present an extensive analysis of the performance of TKMARS on a new class of test functions that includes unimportant variables. We believe that the existing test functions designed for global optimization differ from less symmetrical and less structured real-world applications. In our setup, the function evaluation is based on only a fraction of the input variables, not all of them, to test whether the proposed method can identify the significant variables. Figure 5 shows the average MTFAUC of 30 different runs for each test function at four different fractions of important variables.
Note that as the fraction of important variables decreases, MTFAUC increases. It is more difficult for the algorithm to recognize significant variables when the output is evaluated using fewer variables.
In addition, to represent stochastic blackbox systems, a noise component is added to the true objective value at each function evaluation. In this research, the experiments use zero-mean Gaussian noise whose standard deviation is set by the noise percentage level.
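A stochastic blackbox can be simulated by wrapping a deterministic test function with additive Gaussian noise. The paper's exact scaling of the noise standard deviation is not reproduced here, so tying it to a percentage of a reference scale is an assumption of this sketch.

```python
import numpy as np

def make_noisy(f, noise_pct, scale=1.0, rng=None):
    """Wrap a deterministic function f with additive zero-mean Gaussian
    noise; sigma = noise_pct percent of a reference scale (assumed)."""
    rng = np.random.default_rng(rng)
    sigma = (noise_pct / 100.0) * scale

    def noisy(x):
        return f(x) + rng.normal(0.0, sigma)

    return noisy
```

Averaging replications cancels the noise: the sample mean of many noisy evaluations approaches the true objective value, which is what the replication strategies studied below exploit.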
An Orthogonal Array (Bose and Bush (1952); Plackett and Burman (1946)) is used to design an efficient set of experiments over a subset of the combinations of multiple parameters at multiple levels. Table 2 summarizes the problem settings and algorithm parameters for this extensive set of experiments.
Problem Parameters  Levels

Test function  Rosenbrock, Rastrigin, Levy
Dimension  10, 20, 30
Fraction of import. vars.  0.25, 0.50, 0.75, 1
Noise level (%)  0, 5, 10, 25

Algorithm Parameters  Levels
Initial pool size  d+1, 2(d+1)
DOE method  LHD, Sobol
EEPA distance  Euclidean, Cosine
EEPA # candidates  3, 6
Replication type  fixed_rep, smart_rep
Replication #  5, 10
Model  RBF, TKMARS
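The parameter grid of Table 2 can be enumerated programmatically, which makes its scale concrete. A true orthogonal array balances level combinations; taking every k-th combination of the full Cartesian product, as below, is only an illustrative fraction, not the Bose-Bush construction.

```python
from itertools import product

# levels taken directly from Table 2
levels = {
    "test_function": ["Rosenbrock", "Rastrigin", "Levy"],
    "dimension": [10, 20, 30],
    "fraction_important": [0.25, 0.50, 0.75, 1.0],
    "noise_pct": [0, 5, 10, 25],
    "pool_size": ["d+1", "2(d+1)"],
    "doe": ["LHD", "Sobol"],
    "eepa_distance": ["Euclidean", "Cosine"],
    "eepa_candidates": [3, 6],
    "replication_type": ["fixed_rep", "smart_rep"],
    "replication_n": [5, 10],
    "model": ["RBF", "TKMARS"],
}

# full factorial: every combination of every level
full_factorial = [dict(zip(levels, combo)) for combo in product(*levels.values())]

# the full factorial is far too large to run; a designed array tests only a
# balanced subset -- here an illustrative 1/64 fraction of the grid
fractional = full_factorial[::64]
```

The full factorial has 3 x 3 x 4 x 4 x 2^7 = 18432 runs, which motivates using an orthogonal array rather than exhaustive experimentation.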
We perform an ANOVA on the OA-designed experiments to test hypotheses about the variation caused by the different factors. In the ANOVA, the reference population is the first level of each factor. From the results in Table 3, the statistically significant parameters are the fraction of important variables and the noise level. This indicates that the deterministic approach cannot handle the noise effect, which has an important impact on the algorithm efficiency (MTFAUC).
Estimate  Std. Error  t value  Pr(>|t|)

(Intercept)  0.0213  0.06535  0.326  0.745659  
fiv=0.75  0.12197  0.04773  2.556  0.013294  *  
fiv=0.50  0.22659  0.04773  4.748  1.43E05  ***  
fiv=0.25  0.30299  0.04773  6.348  3.84E08  ***  
noise level=10%  0.14558  0.04133  3.522  0.000851  ***  
noise level=25%  0.23302  0.04133  5.638  5.59E07  ***  
Rastrigin  0.33031  0.04133  7.991  7.10E11  ***  
Levy  0.02759  0.04133  0.667  0.507163  
dimension=20  0.06038  0.04133  1.461  0.149573  
dimension=30  0.01005  0.04133  0.243  0.808709  
poolSize=2(d+1)  0.01179  0.03375  0.35  0.728  
DOE=Sobol  0.02552  0.03375  0.756  0.452732  
EEPA distance=Cosine  0.01675  0.03375  0.496  0.62165  
EEPA number of candidates=6  0.0452  0.03375  1.339  0.185774  
model=TKMARS  0.01327  0.03375  0.393  0.695633  
Signif. Codes:  0 ‘***’  0.001 ‘**’  0.01 ‘*’  0.05 ‘.’  0.1 ‘ ’  1 
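The style of Tables 3 and 4 (dummy-coded factors with the first level as reference, estimates and t values from a linear model) can be reproduced with ordinary least squares. This is a generic sketch, not the exact model fit used for the tables.

```python
import numpy as np

def main_effects_fit(factors, y):
    """Fit a main-effects linear model with dummy coding (the first level of
    each factor is the reference) and return {name: (estimate, t_value)}."""
    y = np.asarray(y, dtype=float)
    cols, names = [np.ones_like(y)], ["(Intercept)"]
    for factor, values in factors.items():
        values = np.asarray(values)
        for lv in np.unique(values)[1:]:        # skip the reference level
            cols.append((values == lv).astype(float))
            names.append(f"{factor}={lv}")
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    s2 = resid @ resid / dof                     # residual variance
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
    return {n: (b, b / s) for n, b, s in zip(names, beta, se)}
```

Factors whose dummy coefficient has a large |t| value relative to the residual noise are the ones flagged as significant in the tables.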
4.2 SmartReplication
Next, we consider the replication approaches to mitigate the effect of noise. An analysis of variance is performed with MTFAUC as the response variable on the set of parameters shown in Table 2. We consider the two replication types, fixed_rep and smart_rep, with two distinct replication numbers, 5 and 10. For fixed_rep, the replication number is the fixed number of replications for each candidate point; for smart_rep, it is the maximum number of replications. From Table 4, note that the noise parameter is no longer significant, which indicates that replication helps cancel out the noise effect. TKMARS is marginally significant. The SmartReplication strategy is statistically better than FixedReplication, consistent with our expectation: smart replication saves function evaluations and, in a noisy scenario, decreases the MTFAUC.
Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  0.250806  0.086891  2.886  0.005626  **  

fiv=0.75  0.07965  0.056382  1.413  0.163596  
fiv=0.50  0.207822  0.056382  3.686  0.000537  ***  
fiv=0.25  0.132775  0.056382  2.355  0.022263  *  
model=TKMARS  0.113446  0.048828  2.323  0.024028  *  
noise level=10%  0.004939  0.048828  0.101  0.91982  
noise level=25%  0.009702  0.048828  0.199  0.843259  
smart_rep  0.09262  0.039868  2.323  0.02404  *  
Rastrigin  0.344011  0.048828  7.045  3.81E09  ***  
Levy  0.045359  0.048828  0.929  0.357128  
dimension=20  0.073677  0.048828  1.509  0.137262  
dimension=30  0.051091  0.048828  1.046  0.300151  
Replication=10  0.146338  0.048828  2.997  0.004142  **  
poolSize=2(d+1)  0.016954  0.039868  0.425  0.67238  
DOE=Sobol  0.00669  0.039868  0.168  0.867332  
EEPA distance=Cosine  0.01711  0.039868  0.429  0.669465  
EEPA number of candidates=6  0.02138  0.039868  0.536  0.594059  
Signif. Codes:  0 ‘***’  0.001 ‘**’  0.01 ‘*’  0.05 ‘.’  0.1 ‘ ’  1 
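The replication rule contrasted above can be sketched as a simple loop: fixed replication always spends the full budget on each candidate, while smart replication stops early once a candidate's running sample mean looks clearly unpromising. The minimum of two replications and the "worse than the incumbent" stopping rule are illustrative assumptions of this sketch.

```python
def estimate_with_replication(evaluate, x, incumbent_mean, max_rep=10,
                              min_rep=2, smart=True):
    """Return (sample mean, replications used) for candidate x.

    smart=False: fixed replication, always max_rep evaluations.
    smart=True:  stop early once the running mean exceeds the incumbent's
    mean (minimization), saving evaluations on unpromising candidates."""
    obs = [evaluate(x) for _ in range(min_rep)]
    while len(obs) < max_rep:
        if smart and sum(obs) / len(obs) > incumbent_mean:
            break                      # clearly unpromising: stop replicating
        obs.append(evaluate(x))
    return sum(obs) / len(obs), len(obs)
```

A poor candidate is abandoned after the minimum replications, while a promising one receives the full budget so its mean is estimated accurately near the optimum.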
Looking at the ANOVA Tables 3 and 4, we drop the unimportant parameters and fix them at the levels identified by the ANOVA in order to explore the MTFAUC variation caused by the important parameters: dimension = 30, initial pool size = d+1, DOE method = LHD, EEPA distance = Euclidean, and EEPA number of candidate points = 3. We then perform two separate full factorial designs over the significant parameters, one for TKMARS and one for RBF. The fraction of important variables is statistically significant in almost all of the analyses. For simplicity, and to focus on the noise effect, we continue studying the replication strategies with this fraction fixed at its most significant level.
Figures 6(a)-(c) show the average performance of deterministic (no replication), fixed replication, and smart replication over 30 random pools. The plots indicate that NoReplication outperforms the replication approaches when we use TKMARS, and that SmartReplication outperforms FixedReplication. As expected, the MTFAUC worsens as the level of noise increases. Next, we verify the robustness and the quality of the solution found by each approach.
Figure 7 shows boxplots of the MTFAUC values for 30 different runs at different noise levels for the different methods. Although Figure 6(a) shows that NoReplication outperforms the replication approaches on average, Figures 7(c) and 7(e) confirm that the variance of MTFAUC for the replication strategies is more reasonable than that of NoReplication (Figure 7(a)). As Figures 7(d) and (e) show, at lower levels of noise SmartReplication is competitive with the NoReplication approach, which indicates the robustness of the SmartReplication approach to randomness. Although FixedReplication is robust to different noise levels, with no significant difference in means across the noise levels, its overall average MTFAUC at lower noise levels is larger than that of the other approaches, because the FixedReplication strategy makes unnecessary function evaluations.
Figure 8 shows the variance of the objective values of the BSMS after 1000 function evaluations for 30 different runs at each noise level. As shown in the figure, in the no-noise case NoReplication and SmartReplication are competitive and have comparatively the shortest boxes. The replication approaches are robust to different noise levels, since their means are close compared to NoReplication. At the highest level of noise, SmartReplication has the shortest box, indicating that it is more robust to randomness and more reliable for the Rosenbrock function.
Looking into the results on the Rastrigin function (Figure 6(b)), we can see that there is only a small difference between the methods. This can be explained by the highly fluctuating behavior of the Rastrigin function, with several local optima; as the noise level increases, the optimum becomes harder to obtain. We analyzed the robustness and quality of the final solution through the boxplots for the Rastrigin function. The full replication approach has the shortest box and is the most robust to randomness; however, it has a higher average across the different noise levels than the other approaches. The same pattern was observed for the quality of the final BSMS: SmartReplication finds better BSMS at the highest noise level with relative robustness.
For the Levy function, NoReplication outperforms the replication approaches in almost every case (Figure 6(c)), although SmartReplication also performs well and has a comparatively short box at higher levels of noise across runs. NoReplication also outperforms the other approaches in terms of the quality of the final BSMS after 1000 evaluations for the Levy function; however, the performance of SmartReplication is very competitive. The boxplots for Rastrigin and Levy can be found in the Appendix, § 6. Overall, we observe that NoReplication is not robust to the noise effect, although it has a better average value.
Next, we analyze the average performance of NoReplication, FixedReplication, and SmartReplication using RBF over 30 random pools across different noise levels.
The plots in Figure 9 confirm that in the deterministic situation NoReplication is the best option; however, as the noise level increases, the replication approaches outperform NoReplication. Replication is even more competitive when we use the interpolating model, RBF, for the Rosenbrock function. Looking into the Rastrigin function (Figure 9(b)), there is only a small difference between the methods, especially at the highest level of uncertainty. NoReplication slightly outperforms all the other methods, since exploration matters more than replication for a complicated function like Rastrigin; still, the replication approaches with 5 replications are slightly competitive with NoReplication at a very high level of noise.
Next, we analyze the robustness of the algorithms and the quality of the final BSMS using the boxplots. The replication approaches are the most robust to the uncertainty level and to randomness, having the shortest boxes as the noise level increases (Figure 9), and SmartReplication is competitive under higher levels of uncertainty. NoReplication is the best option in the no-noise case, since replication wastes function evaluations when there is no uncertainty.
Figure 11 shows the variance of the BSMS after 1000 function evaluations for 30 different runs at each considered noise level. The replication approaches comparatively outperform NoReplication in terms of the final BSMS, and they are robust across noise levels in terms of finding the BSMS. For the Rastrigin function, the full replication approach shows the highest robustness to randomness, but comparatively larger BSMS values; NoReplication finds better BSMS on average when the uncertainty level is low, as expected, while SmartReplication finds better BSMS at the highest noise level. This confirms that a highly fluctuating function like Rastrigin needs more exploration as well as replication. For the Levy function, SmartReplication outperforms the other methods under the highest level of uncertainty, and the remaining replication settings are competitive as well. We verified that SmartReplication has the shortest box overall, except for the no-noise case, in which NoReplication has the lower MTFAUC on average. In terms of robustness to randomness and to the different levels of noise, the replication approaches are competitive, with SmartReplication achieving the lower MTFAUC.
5 Conclusion and Future Work
Under the evaluation described in § 4, TKMARS outperforms RBF, the leading surrogate optimization technique in the literature, and shows excellent improvement over RBF in the handling of insignificant inputs. SmartReplication outperforms NoReplication in terms of robustness to the uncertainty level and the quality of the final optimum solution discovered.
TKMARS is a regression-based model: it does not pass through all the points, but instead forms a surface that minimizes the error based on the distance between the observed values and the fitted function. As a result, regression-based models give a better approximation for data sets with noisy outcomes. SmartReplication, on the other hand, adapts itself to the uncertainty level: its behavior is similar to NoReplication when the uncertainty level is low and similar to FixedReplication when the uncertainty level is high. Further, SmartReplication does not waste evaluations on unpromising points, and instead replicates the near-optimal points. Overall, SmartReplication with TKMARS is the most robust and reliable approach for high-dimensional blackbox optimization under uncertainty.
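The regression-versus-interpolation argument can be illustrated numerically: fitting noisy samples of a smooth function with a low-degree least-squares polynomial (a stand-in for a regression-based surrogate such as TKMARS) recovers the trend, whereas a polynomial that interpolates every noisy point (a stand-in for an interpolating surrogate) chases the noise. Polynomials are used purely for illustration; neither surrogate in this work is a global polynomial.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 9)
y = x ** 2 + rng.normal(0.0, 0.1, x.size)     # noisy samples of f(x) = x^2

grid = np.linspace(-1.0, 1.0, 201)
# regression: low-degree least-squares fit, does not pass through every point
regress = np.polyval(np.polyfit(x, y, 2), grid)
# interpolation: degree n-1 polynomial passes through every noisy observation
interp = np.polyval(np.polyfit(x, y, x.size - 1), grid)

reg_err = np.sqrt(np.mean((regress - grid ** 2) ** 2))  # RMSE vs true f
int_err = np.sqrt(np.mean((interp - grid ** 2) ** 2))
```

With noisy observations, the interpolant's error against the true function is typically much larger than the regression fit's, which is why a regression-based surrogate is preferable for stochastic blackbox systems.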
References
 MOImbo: multiobjective infill for parallel modelbased optimization. In International Conference on Learning and Intelligent Optimization, pp. 173–186. Cited by: §2.7.
 Orthogonal arrays of strength two and three. The Annals of Mathematical Statistics, pp. 508–524. Cited by: §4.1.
 Classification and regression trees. CRC press. Cited by: §1, §2.5.
 Penalized spline estimation for functional coefficient regression models. Computational statistics & data analysis 54 (4), pp. 891–905. Cited by: §2.4.
 Analysis of support vector regression for approximation of complex engineering analyses. Journal of mechanical design 127 (6), pp. 1077–1087. Cited by: §1.
 A multiobjective surrogatebased optimization of the crashworthiness of a hybrid impact absorber. International Journal of Mechanical Sciences 88, pp. 46–54. Cited by: §2.7, §2.7.
 Global optimization with multivariate adaptive regression splines. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 37 (2), pp. 333–340. Cited by: §2.7, §2.7.
 A surrogateassisted evolution strategy for constrained multiobjective optimization. Expert Systems with Applications 57, pp. 270–284. Cited by: §2.7.
 Adaptive optimisation of noisy blackbox functions inherent in microscopic models. Computers & chemical engineering 31 (5), pp. 466–476. Cited by: §3.4.
 An exploration and exploitation pareto approach to surrogate optimization. The University of Texas at Arlington. Cited by: §2.6, §2.7, §2.7, §3.2.
 A kind of balance between exploitation and exploration on kriging for global optimization of expensive functions. Journal of Mechanical Science and Technology 29 (5), pp. 2121–2133. Cited by: §2.7, §2.7.
 Multivariate adaptive regression splines. The annals of statistics, pp. 1–67. Cited by: §1, §2.2, §2.4, §2.4, §4.1.
 Blackbox optimization of noisy functions with unknown smoothness. In Advances in Neural Information Processing Systems, pp. 667–675. Cited by: §3.4.
 Realparameter blackbox optimization benchmarking 2009: noiseless functions definitions. Ph.D. Thesis, INRIA. Cited by: §4.
 Global optimization of stochastic blackbox systems via sequential kriging metamodels. Journal of global optimization 34 (3), pp. 441–466. Cited by: §2.7, §2.7.
 Functional coefficient regression models for nonlinear time series: a polynomial spline approach. Scandinavian journal of statistics 31 (4), pp. 515–534. Cited by: §2.4.
 A fastprediction surrogate model for large datasets. Aerospace Science and Technology 75, pp. 74–87. Cited by: §2.2.
 A method for simulation based optimization using radial basis functions. Optimization and Engineering 11 (4), pp. 501–532. Cited by: §2.7, §2.7.
 Efficient global optimization of expensive blackbox functions. Journal of Global optimization 13 (4), pp. 455–492. Cited by: §2.7.
 ParEGO: a hybrid algorithm with online landscape approximation for expensive multiobjective optimization problems. IEEE Transactions on Evolutionary Computation 10 (1), pp. 50–66. Cited by: §2.7.
 Restructuring forward step of MARS algorithm using a new knot selection procedure based on a mapping approach. Journal of Global Optimization 60 (1), pp. 79–102. Cited by: §2.4.
 A statistical approach to some mine valuation and allied problems on the witwatersrand: by dg krige. Ph.D. Thesis, University of the Witwatersrand. Cited by: §1, §2.2.
 SOP: parallel surrogate global optimization with pareto center selection for computationally expensive single objective problems. Journal of Global Optimization 66 (3), pp. 417–437. Cited by: §2.7, §2.7.
 Variants of multivariate adaptive regression splines (MARS): convex vs. nonconvex, piecewise-linear vs. smooth and sequential algorithms. Ph.D. Thesis, The University of Texas at Arlington. Cited by: §2.4.
 Freeknot splines and adaptive knot selection. Journal of the Japan Statistical Society 35 (2), pp. 303–324. Cited by: §2.4.
 Q2: memory-based active learning for optimizing noisy continuous functions. In Robotics and Automation, 2000. Proceedings. ICRA'00. IEEE International Conference on, Vol. 4, pp. 4095–4102. Cited by: §2.7, §3.4.
 CH4 parameter estimation in CLM4.5bgc using surrogate global optimization. Geoscientific Model Development 8 (10), pp. 3285–3310. Cited by: §2.7.
 Influence of ensemble surrogate models and sampling strategy on the solution quality of algorithms for computationally expensive blackbox global optimization problems. Journal of Global Optimization 60 (2), pp. 123–144. Cited by: §2.7.
 GOSAC: global optimization with surrogate approximation of constraints. Journal of Global Optimization 69 (1), pp. 117–136. Cited by: §2.7.
 MISO: mixedinteger surrogate optimization framework. Optimization and Engineering 17 (1), pp. 177–203. Cited by: §2.7, §2.7.
 Applied linear statistical models. Vol. 4, Irwin Chicago. Cited by: §2.2.
 Structural optimization using evolution strategies and neural networks. Computer methods in applied mechanics and engineering 156 (14), pp. 309–333. Cited by: §1.
 Quantilebased optimization of noisy computer experiments with tunable precision. Technometrics 55 (1), pp. 2–13. Cited by: §1, §2.7, §2.7.
 A benchmark of krigingbased infill criteria for noisy optimization. Structural and Multidisciplinary Optimization 48 (3), pp. 607–626. Cited by: §2.7.
 Robust airline fleet assignment. Cited by: §1.
 The design of optimum multifactorial experiments. Biometrika 33 (4), pp. 305–325. Cited by: §4.1.
 Radial basis functions for multivariable interpolation: a review. In IMA Conference on Algorithms for the Approximation of Functions and Data, pp. 143–167. Cited by: §1, §2.2, §2.3.
 Constrained global optimization of expensive black box functions using radial basis functions. Journal of Global optimization 31 (1), pp. 153–171. Cited by: §2.7.
 Constrained optimization by radial basis function interpolation for highdimensional expensive blackbox problems with infeasible initial points. Engineering Optimization 46 (2), pp. 218–243. Cited by: §2.7.
 Evolutionary programming for highdimensional constrained expensive blackbox optimization using radial basis functions. IEEE Transactions on Evolutionary Computation 18 (3), pp. 326–347. Cited by: §2.7.
 Stochastic radial basis function algorithms for largescale optimization involving expensive blackbox objective and constraint functions. Computers & Operations Research 38 (5), pp. 837–853. Cited by: §2.7.
 Design and analysis of computer experiments. Statistical science, pp. 409–423. Cited by: §2.7.
 Comparison of response surface and kriging models for multidisciplinary design optimization. In 7th AIAA/USAF/NASA/ISSMO Symposium on Multidisciplinary Analysis and Optimization, pp. 4755. Cited by: §2.7.
 Oracally efficient spline smoothing of nonlinear additive autoregression models with simultaneous confidence band. Journal of Multivariate Analysis 101 (9), pp. 2008–2025. Cited by: §2.4.
 Virtual library of simulation experiments: test functions and datasets. Note: retrieved April 25, 2017, from http://www.sfu.ca/~ssurjano. Cited by: §4.