I Introduction
Optimisation is the search for the best performing design with respect to a predefined objective function. In practice, however, the optimal performance may not be achieved for a number of reasons: variation in the design due to manufacturing tolerances, operation away from the design point, an optimised model that does not accurately reflect reality, and environmental uncertainties.
If the objective function is insensitive to changes in design parameters, the performance change will largely go unnoticed. Unfortunately, this is not the case for many real-world problems: even small changes in the modelled design may result in dramatic consequences. The goal of robust optimisation is to locate designs that have a guaranteed performance as the design parameters vary within a region around the modelled optimal design. We refer to these robust performance regions as sweet spots of the design space. Although the designer often desires to guarantee performance for small perturbations around the optimum, we present a technique applicable to arbitrarily large sweet spots.
Expensive black-box functions are a common problem in many disciplines, including tuning the parameters of machine learning algorithms [1, 2], robotics [3, 4], and other engineering design problems [5, 6, 7]. Bayesian optimisation is a principled and efficient technique for the global optimisation of these functions. The idea behind Bayesian optimisation is to place a prior distribution over the target function and then update that prior with a set of “true” observations of the target function, obtained by expensively evaluating it, in order to produce a posterior predictive distribution. The posterior then informs where to make the next observation of the target function through the use of an acquisition function, which balances the exploitation of regions known to have good performance with the exploration of regions where there is little information about the function’s response. A Gaussian process is a popular choice of prior, because it is intuitive to understand, capable of modelling the target function accurately with few data, and cheap to evaluate with the small numbers of observations usually available.

This paper introduces and evaluates a novel acquisition function for the Bayesian robust optimisation of expensive black-box functions. We also describe an efficient algorithm by which to compute the acquisition function in higher-dimensional spaces.
We begin by outlining background material and reviewing similar techniques in Section II. Then, in Section III, we present a formal definition of a sweet spot, which lays the groundwork for the later sections. Section IV builds upon the previous section by introducing the Bayesian optimisation of sweet spots, and giving a demonstration on a toy function in one dimension. We also discuss efficient optimisation in higher dimensions and examine strategies for determining the parameter set at which to next evaluate the function. Results of two-, five-, and ten-dimensional test problems are presented alongside analysis in Section V. Finally, the conclusion and suggestions for future work can be found in Section VI.
II Background
This section comprises background material in Bayesian optimisation (Section II-A), Gaussian processes (Section II-B), and robust optimisation for expensive-to-evaluate functions (Section II-C).
II-A Bayesian Optimisation
Although stochastic search algorithms, such as evolutionary algorithms, have been popular for the optimisation of expensive black-box functions, Bayesian optimisation is often more attractive. Through explicitly modelling the expensive function and accounting for the uncertainty in the model, the search can be guided efficiently to promising areas of the decision space: either those with high certainty of being better than the current best solution, or those with high uncertainty that may be better than the current best. See [8] for an introduction to Bayesian optimisation, and [9] for a recent comprehensive review.
To be definite and without loss of generality, we assume that the goal of optimisation is to minimise a function $f : \mathcal{X} \to \mathbb{R}$, where $\mathbf{x} \in \mathcal{X} \subset \mathbb{R}^d$ are the decision variables in the feasible space $\mathcal{X}$. Bayesian optimisation relies on constructing a model of $f$. Assume that $f$ has been (expensively) evaluated at $N$ locations $\{\mathbf{x}_n\}_{n=1}^{N}$, so that data $\mathcal{D} = \{(\mathbf{x}_n, f(\mathbf{x}_n))\}_{n=1}^{N}$ are available from which to learn a model. Then Bayesian modelling is used to construct a posterior predictive distribution $p(\hat{f} \mid \mathbf{x}, \mathcal{D})$ at any desired location $\mathbf{x}$. Crucially, Bayesian modelling gives not only a prediction of the function value at $\mathbf{x}$, but the posterior distribution quantifies the uncertainty in the prediction as well. Where to next expensively evaluate $f$ is determined by an acquisition function, which balances the exploitation of good values of $f$ with exploring uncertain and potentially good regions. Here we use the popular expected improvement [10], which has been shown to be effective in practice and for which some theoretical guarantees exist [11]. Alternatives such as the probability of improvement [12] or upper-confidence bound [13, 8] could also be used.

If $f(\mathbf{x})$ is modelled to take the value $\hat{f}$, then the improvement at $\mathbf{x}$ is defined as
$$I(\mathbf{x}) = \max\bigl(f^\star - \hat{f},\, 0\bigr), \tag{1}$$
where
$$f^\star = \min_{n} f(\mathbf{x}_n) \tag{2}$$
is the best function value from the evaluations thus far. The expected improvement is then
$$EI(\mathbf{x}) = \int_{-\infty}^{\infty} I(\mathbf{x})\, p(\hat{f} \mid \mathbf{x}, \mathcal{D})\, \mathrm{d}\hat{f}. \tag{3}$$
Gaussian processes are commonly used for modelling $f$, in which case the posterior predictive distribution is itself a Gaussian density (see Section II-B) with mean $\mu(\mathbf{x})$ and variance $\sigma^2(\mathbf{x})$. In this case the expected improvement has the closed analytical form [10]:
$$EI(\mathbf{x}) = \bigl(f^\star - \mu(\mathbf{x})\bigr)\,\Phi(z) + \sigma(\mathbf{x})\,\phi(z), \tag{4}$$
where $z = \bigl(f^\star - \mu(\mathbf{x})\bigr)/\sigma(\mathbf{x})$, and $\phi(\cdot)$ and $\Phi(\cdot)$ are the standard Normal density and cumulative distribution functions respectively.

The next (expensive) evaluation is then chosen as that with the greatest expected improvement: $\mathbf{x}' = \arg\max_{\mathbf{x} \in \mathcal{X}} EI(\mathbf{x})$. This location is often discovered by using an evolutionary algorithm to maximise $EI(\mathbf{x})$, which is rapid since $EI(\mathbf{x})$ is computationally cheap to evaluate. The evaluated location and its function value are added to $\mathcal{D}$ and the optimisation proceeds iteratively until some stopping criterion is met or, more commonly, the available computational resources are exhausted.
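As a minimal sketch of the closed form (4), assuming a Gaussian posterior with mean `mu` and standard deviation `sigma` at each candidate location, the expected improvement for minimisation could be computed as below. The function name `expected_improvement` and its arguments are illustrative and are not taken from the paper; where `sigma` is zero the sketch falls back to the deterministic improvement.

```python
import math
import numpy as np

# Vectorised error function from the standard library (avoids a SciPy dependency).
_erf = np.vectorize(math.erf, otypes=[float])

def expected_improvement(mu, sigma, f_best):
    """Closed-form expected improvement (minimisation) for a Gaussian posterior."""
    mu, sigma = np.broadcast_arrays(
        np.atleast_1d(np.asarray(mu, dtype=float)),
        np.atleast_1d(np.asarray(sigma, dtype=float)),
    )
    # With no predictive uncertainty the improvement is deterministic.
    ei = np.maximum(f_best - mu, 0.0)
    pos = sigma > 0
    z = (f_best - mu[pos]) / sigma[pos]
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)   # standard Normal density
    cdf = 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))           # standard Normal CDF
    ei[pos] = (f_best - mu[pos]) * cdf + sigma[pos] * pdf
    return ei
```

In a Bayesian optimisation loop this function would be evaluated over candidate locations and the maximiser chosen as the next location to evaluate.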
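II-B Gaussian Processes
Gaussian processes (GPs) [14] are commonly used for Bayesian optimisation due to their flexibility and the simple Gaussian predictive posterior distributions. Briefly, a GP is a collection of random variables, any finite number of which have a joint Gaussian distribution. Given data $\mathcal{D} = \{(\mathbf{x}_n, f_n)\}_{n=1}^{N}$ and a feature vector $\mathbf{x}$, the posterior predictive density of the target $\hat{f}$ is Gaussian:
$$p(\hat{f} \mid \mathbf{x}, \mathcal{D}, \boldsymbol{\theta}) = \mathcal{N}\bigl(\hat{f} \mid \mu(\mathbf{x}), \sigma^2(\mathbf{x})\bigr), \tag{5}$$
where the mean and variance of the prediction are given by
$$\mu(\mathbf{x}) = \mathbf{k}^\top K^{-1} \mathbf{f}, \tag{6}$$
$$\sigma^2(\mathbf{x}) = \kappa(\mathbf{x}, \mathbf{x}) - \mathbf{k}^\top K^{-1} \mathbf{k}. \tag{7}$$
Here $\mathbf{f} = (f_1, \ldots, f_N)^\top$ is the vector of evaluated function values at $\{\mathbf{x}_n\}_{n=1}^{N}$. Non-linearity in the GP enters through a kernel function $\kappa(\mathbf{x}, \mathbf{x}')$, which models the covariance between two feature vectors. The covariance matrix $K \in \mathbb{R}^{N \times N}$ collects these covariances together, $K_{ij} = \kappa(\mathbf{x}_i, \mathbf{x}_j)$, and $\mathbf{k}$ is the $N$-dimensional vector of covariances between the training data and $\mathbf{x}$: $k_n = \kappa(\mathbf{x}, \mathbf{x}_n)$. There are a number of kernels that could be used, for example radial basis functions, or the Matérn family of covariance functions [14]; here we used the Matérn covariance function with smoothing parameter $\nu$.

In addition the kernel function depends upon a number of hyperparameters, $\boldsymbol{\theta}$. Training the GP comprises inferring these hyperparameters by maximising the marginal likelihood of the data given by
$$\log p(\mathcal{D} \mid \boldsymbol{\theta}) = -\tfrac{1}{2} \log |K| - \tfrac{1}{2}\, \mathbf{f}^\top K^{-1} \mathbf{f} - \tfrac{N}{2} \log(2\pi). \tag{8}$$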
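The predictive equations (6) and (7) and the log marginal likelihood (8) can be sketched in a few lines of NumPy. This is an illustrative sketch only: it assumes a zero-mean prior and a Matérn kernel with smoothing parameter 5/2 (an assumption made here for concreteness), and the names `matern52`, `gp_posterior` and `log_marginal_likelihood` are hypothetical.

```python
import numpy as np

def matern52(A, B, length_scale=1.0, variance=1.0):
    """Matern covariance with smoothing parameter 5/2 (assumed for this sketch)."""
    r = np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)) / length_scale
    s = np.sqrt(5.0) * r
    return variance * (1.0 + s + s**2 / 3.0) * np.exp(-s)

def gp_posterior(X, f, X_new, jitter=1e-10, **kern):
    """Posterior mean (6) and variance (7) at the rows of X_new."""
    K = matern52(X, X, **kern) + jitter * np.eye(len(X))
    k = matern52(X, X_new, **kern)                         # N x M cross-covariances
    L = np.linalg.cholesky(K)                              # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, f))    # K^{-1} f
    v = np.linalg.solve(L, k)                              # L^{-1} k
    mean = k.T @ alpha
    var = matern52(X_new, X_new, **kern).diagonal() - (v**2).sum(axis=0)
    return mean, np.maximum(var, 0.0)                      # clip rounding error

def log_marginal_likelihood(X, f, jitter=1e-10, **kern):
    """Log marginal likelihood (8); log|K| is obtained from the Cholesky factor."""
    K = matern52(X, X, **kern) + jitter * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, f))
    return (-0.5 * f @ alpha - np.log(np.diag(L)).sum()
            - 0.5 * len(X) * np.log(2.0 * np.pi))
```

Hyperparameters such as `length_scale` would then be chosen by maximising `log_marginal_likelihood` with any numerical optimiser.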
II-C Robust Optimisation of Expensive Functions
Here we focus on robust optimisation problems that involve computationally expensive black-box functions; for a comprehensive survey of non-expensive robust optimisation see [16]. In cases where the objective function is expensive to evaluate, approaches such as evolutionary algorithms [17, 18] or particle swarm optimisation [19] will not be viable due to the large number of function evaluations they demand. Therefore it is essential to apply methods that only necessitate small numbers of observations; in extreme cases this may be at most two hundred. In spite of this necessity, relatively few methods exist in the literature to address it.
There are a few methods in the literature that use GPs to develop a surrogate model of the expensive function [20, 21, 22], which reduces the computational cost of the optimisation in two ways. Firstly, it enables the surrogate model (rather than the expensive function) to be searched using, for example, an evolutionary algorithm or simulated annealing. Secondly, the use of a surrogate has the clear benefit of curtailing the computational burden of evaluating the robustness of solutions, because the surrogate model can be interrogated instead of the true objective function. Although these methods lighten the computational load, they do not take into account the uncertainty (in the model) that is accessible with a GP to help guide the search in subsequent iterations. Picheny et al. present a review of robust acquisition functions for use with a GP [23], but these only account for noise in the function’s response.
A state-of-the-art Bayesian approach was presented by ur Rehman et al. [24]. This method exploits a GP with a modified formulation of the expected improvement, which aims to account for the robust performance over a region of the design space. Whilst this technique is shown to be useful for expensive robust optimisation, there are two drawbacks: (1) the uncertainty of the GP is largely disregarded when calculating the modified expected improvement, as only the uncertainty at the estimated worst performing location is considered; and (2) this method is demonstrated with a substantial number of initial observations (100 in 10 dimensions), which makes it rather unsuitable for very expensive functions. Because this method is considered state of the art, we have included our own implementation of it for comparison during experimentation.
III Sweet Spot Definition
As stated in Section I, we are not focussed on locating an optimal point of $f$ in $\mathcal{X}$; rather, we are interested in locating a region $S \subseteq \mathcal{X}$ in which the quality of the target function is good whilst being insensitive to variations in $\mathbf{x}$. We refer to $S$ as a sweet spot of the design space; it is worth emphasising that $S$ need not be small.
A quality measure $Q(f, S)$ can be used to describe the aggregated value of $f$ over $S$, and thus its robustness, in a number of ways. For example the average performance of $f$ can be calculated as
$$Q(f, S) = \frac{1}{|S|} \int_{S} f(\mathbf{x})\, \mathrm{d}\mathbf{x}, \tag{9}$$
where $|S|$ denotes the volume of $S$.
An alternative, which might be more useful in practice, is to guarantee the worst-case performance across a sweet spot [16],
$$Q(f, S) = \max_{\mathbf{x} \in S} f(\mathbf{x}). \tag{10}$$
Having defined a quality function, the optimisation problem becomes
$$S^\star = \arg\min_{S \subseteq \mathcal{X}} Q(f, S). \tag{11}$$
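To make the two quality measures concrete, the sketch below approximates (9) and (10) for an interval sweet spot in one dimension by evaluating the function on evenly spaced points, and then searches a grid of candidate locations for a fixed-size sweet spot. The function `toy` is a hypothetical example invented for this illustration (it is not the paper's toy function), as are all of the helper names.

```python
import numpy as np

def sweet_spot_points(centre, half_width, n=201):
    """Evenly spaced points covering the interval sweet spot (endpoints included)."""
    return np.linspace(centre - half_width, centre + half_width, n)

def average_quality(f, centre, half_width):
    """Numerical approximation of the average quality (9)."""
    return float(np.mean(f(sweet_spot_points(centre, half_width))))

def worst_case_quality(f, centre, half_width):
    """Numerical approximation of the worst-case quality (10), for minimisation."""
    return float(np.max(f(sweet_spot_points(centre, half_width))))

def best_location(quality, f, candidates, half_width):
    """Grid search for the sweet spot location with the smallest quality value."""
    scores = np.array([quality(f, c, half_width) for c in candidates])
    return float(candidates[int(np.argmin(scores))])

def toy(x):
    """Hypothetical function: a broad bowl at 0 plus a narrow, deeper well at 1."""
    return x**2 - 1.5 * np.exp(-200.0 * (x - 1.0) ** 2)

candidates = np.linspace(-1.0, 1.5, 251)
x_point = float(candidates[np.argmin(toy(candidates))])      # single-point minimiser
x_robust = best_location(worst_case_quality, toy, candidates, half_width=0.25)
```

Here the single-point minimiser sits in the narrow well while the worst-case sweet spot sits in the broad bowl, which is the kind of separation between optimal and robust solutions that motivates the definition. Such a brute-force search is only possible because `toy` is cheap to evaluate.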
An issue that now arises is that (11) is unconstrained, because the shape of $S$ is unconstrained. Smaller $S$ are likely to appear better performing and there is nothing to prevent $S$ shrinking to a point as part of the optimisation. We therefore separate the location of $S$ from its shape and parametrise it as $S(\mathbf{x}, \boldsymbol{\gamma})$, where $\mathbf{x}$ determines the location of $S$ and $\boldsymbol{\gamma}$ parametrises the sweet spot’s shape. In practice the shape of $S$ is likely to be constrained to be convex, and frequently a (hyper)sphere. Here we fix $\boldsymbol{\gamma}$, so that $S$ has a constant shape and volume, and concentrate on only optimising the location of the sweet spot. To simplify the notation we therefore omit the dependence of the sweet spot on $\boldsymbol{\gamma}$ and write $S(\mathbf{x})$ for a sweet spot located at $\mathbf{x}$.
To further constrain this optimisation, we demand that a search algorithm should only return sweet spots that contain at least one location that has been expensively evaluated; that is, for $S(\mathbf{x})$ to be a valid sweet spot there must exist $\mathbf{x}_n \in S(\mathbf{x})$ for some $\mathbf{x}_n$ in the set of evaluated locations.
As illustrated in Figure 1, we also define the neighbourhood $\mathcal{N}(\mathbf{x}')$ of a location $\mathbf{x}'$ to be the set of sweet spot locations whose sweet spots contain $\mathbf{x}'$, together with the sweet spots at those locations:
(12) 
IV Sweet Spot Optimisation
The optimisation of a sweet spot can be distilled to the minimisation of some chosen quality measure $Q$ (see Section III for suggestions for $Q$) of a sweet spot $S(\mathbf{x})$. The obstacle to straightforward optimisation is that evaluating the quality of candidate sweet spots, using for example (9) or (10), is infeasible for continuous domains: they require the evaluation of $f(\mathbf{x}')$ for all $\mathbf{x}' \in S(\mathbf{x})$. In spite of this, one can envisage that a brute-force method for optimising a sweet spot would be to transform the target function using the quality measure to yield a new function of $\mathbf{x}$ to be optimised, so that:
(13) 
Here we propose to search for the optimum sweet spot by constructing a Gaussian process (GP) model of $f$, to which the quality function can be applied in order to model the sweet spot’s quality. In order to account for the uncertainty in the modelled $f$, we estimate an expected improvement in $Q$ by drawing realisations from the GP and calculating the improvement for each realisation over the current best; averaging these realisation-specific improvements over all of the drawn realisations yields the expected improvement.
We first describe a straightforward method, which proves to be computationally infeasible when the dimension of decision space or the number of evaluated locations becomes large. In the following sections we discuss modifications to the algorithm to make it computationally efficient.
Algorithm 1 shows the main steps in the naïve sweet spot optimisation procedure. The process is initialised with a small number of evaluations of $f$, usually chosen via Latin hypercube sampling. These allow a GP to be constructed (line 4). In lines 5 to 8, a realisation from the fitted model is drawn for a set of locations that densely covers possible locations of the current best sweet spot and the locations of possible new sweet spots. Note that a realisation evaluated at a set of locations is a draw from a multivariate Gaussian whose mean and covariance are given by the GP posterior at those locations. This surrogate of $f$ can then be used to estimate the improvement, for this realisation, of a location $\mathbf{x}$ over the best sweet spot location so far evaluated. As illustrated in Figure 1, the requirement that any sweet spot should contain an evaluated location means that the set of possible locations for the best sweet spot found so far is the union of the neighbourhoods of the evaluated locations. Thus the improvement is:
(14) 
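where $\tilde{Q}$ indicates that the quality is estimated from samples from the modelled $f$. For example, if one were using the worst-case quality measure,
$$\tilde{Q}\bigl(\hat{f}, S(\mathbf{x})\bigr) = \max_{j} \hat{f}(\mathbf{x}_j), \tag{15}$$
with the locations $\mathbf{x}_j$ at which $\hat{f}$ is evaluated uniformly distributed over $S(\mathbf{x})$. Note that the best sweet spot so far evaluated depends upon the particular realisation.

Averaging the improvement over several realisations drawn from the posterior distribution of $f$ permits the expected improvement to be estimated:
$$EI(\mathbf{x}) \approx \frac{1}{M} \sum_{m=1}^{M} I_m(\mathbf{x}), \tag{16}$$
where $I_m(\mathbf{x})$ is the improvement (14) computed for the $m$-th of $M$ realisations.
The optimum location of the sweet spot for an expensive evaluation is thus found as
$$\mathbf{x}' = \arg\max_{\mathbf{x} \in \mathcal{X}} EI(\mathbf{x}) \tag{17}$$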
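The naïve procedure described above can be sketched in one dimension as follows. This is a hedged illustration rather than the paper's Algorithm 1: it assumes a zero-mean GP with a Matérn 5/2 kernel and fixed length scale, interval sweet spots measured in grid cells (truncated at the domain boundary), the worst-case quality (15), and hypothetical observations `x_obs`, `f_obs`. All function names are invented for this sketch.

```python
import numpy as np

def matern52(a, b, ell=0.3):
    """Matern 5/2 covariance between two sets of scalar locations (assumed kernel)."""
    s = np.sqrt(5.0) * np.abs(a[:, None] - b[None, :]) / ell
    return (1.0 + s + s**2 / 3.0) * np.exp(-s)

def gp_posterior(x_obs, f_obs, grid, jitter=1e-8):
    """Zero-mean GP posterior mean vector and full covariance matrix on `grid`."""
    K = matern52(x_obs, x_obs) + jitter * np.eye(len(x_obs))
    k = matern52(x_obs, grid)
    A = np.linalg.solve(K, k)                       # K^{-1} k, one column per grid point
    return A.T @ f_obs, matern52(grid, grid) - k.T @ A

def worst_case(values, half_window):
    """Worst-case quality (15) of the interval sweet spot centred on each grid point."""
    n = len(values)
    return np.array([values[max(i - half_window, 0):i + half_window + 1].max()
                     for i in range(n)])

def robust_expected_improvement(x_obs, f_obs, grid, half_window, n_real=200, seed=0):
    rng = np.random.default_rng(seed)
    mean, cov = gp_posterior(x_obs, f_obs, grid)
    L = np.linalg.cholesky(cov + 1e-6 * np.eye(len(grid)))   # jitter for stability
    # Centres whose sweet spot contains at least one evaluated location.
    nearest = np.abs(grid[:, None] - x_obs[None, :]).argmin(axis=0)
    valid = np.zeros(len(grid), dtype=bool)
    for j in nearest:
        valid[max(j - half_window, 0):j + half_window + 1] = True
    total = np.zeros(len(grid))
    for _ in range(n_real):
        f_hat = mean + L @ rng.standard_normal(len(grid))    # one joint realisation
        q = worst_case(f_hat, half_window)
        q_best = q[valid].min()                              # best valid sweet spot
        total += np.maximum(q_best - q, 0.0)                 # improvement per centre
    return total / n_real                                    # average, as in (16)

grid = np.linspace(0.0, 1.0, 61)
x_obs = np.array([0.1, 0.9])        # hypothetical evaluated locations
f_obs = np.array([1.0, 1.0])        # hypothetical observed values
ei = robust_expected_improvement(x_obs, f_obs, grid, half_window=6)
x_next = grid[np.argmax(ei)]        # grid analogue of (17)
```

Note that the best-so-far quality `q_best` is recomputed for every realisation, and that every candidate is compared against it within the same realisation. Sampling jointly over a dense grid like this is what becomes infeasible as the dimension grows.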
using, for example, an evolutionary optimiser to search over the feasible space $\mathcal{X}$ (line 7). As we show below, although we demand that each sweet spot contain an evaluated location, it can be advantageous to evaluate $f$ at a location other than the “centre” of the sweet spot. In Section IV-B we explore a number of criteria for choosing the location to evaluate. In Algorithm 1, $f$ is expensively evaluated at the location provided by the sampling function (lines 10 and 11). This sequence is then repeated until convergence is achieved or computational resources are exhausted.
Despite the apparent simplicity of this algorithm, it requires a number of modifications to make it useful for practical problems.
In Section IV-B we discuss where to expensively evaluate within a newly discovered sweet spot, and in Sections IV-C and IV-D we describe a simplification to the evaluation of the improvement (14) and how to efficiently evaluate it in higher dimensions. First we illustrate the procedure with a toy example.
IV-A Toy Example
We illustrate the procedure using the toy one-dimensional function
(18) 
for . For simplicity we restrict the sweet spot to be an interval, with a single (scalar) shape parameter defining its width, i.e.
(19) 
For now, as we seek to optimise only the location of the sweet spot, we set .
Figure 2 shows the toy target function and the induced landscape for the worst-case quality as given by (10). This toy function illustrates how the optimal single-point location, namely the minimum of $f$ itself, can exist in a distinct location from the optimal robust region.
The first step is to fit a Gaussian process to an initial set of observations of the expensive function $f$. In this instance we have used an initial set of eight observations; the resulting Gaussian process can be seen in Figure 6. In practice a sampling scheme with low discrepancy [25] (such as a Sobol sequence [26] or Latin hypercube sampling [27]) would be used to generate this initial set.
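A Latin hypercube design of the kind mentioned above can be generated with a few lines of NumPy. The sketch below is one simple construction (one point per stratum in each dimension, with strata shuffled independently); the function name `latin_hypercube` and the bounds used are illustrative assumptions, not the settings used in the paper.

```python
import numpy as np

def latin_hypercube(n_samples, lower, upper, seed=None):
    """Draw n_samples points so that each dimension has one point per stratum."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    d = len(lower)
    # Independently shuffled stratum indices for every dimension.
    strata = np.stack([rng.permutation(n_samples) for _ in range(d)], axis=1)
    # Jitter each point uniformly within its stratum, then rescale to the bounds.
    unit = (strata + rng.random((n_samples, d))) / n_samples
    return lower + unit * (upper - lower)

X0 = latin_hypercube(8, lower=[0.0, 0.0], upper=[1.0, 4.0], seed=1)
```

Each column of `X0` then contains exactly one point in each of the eight equal-width strata of its range.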
Each realisation of the Gaussian process can be thought of as a possible $f$ whose quality can be evaluated using the elected quality measure. Figure 6 (middle) shows the effect of applying the quality function to a drawn realisation together with the resulting improvement from that realisation (27).

Figure 6: Top: The realisation shown as a bold line is used in the middle panel. The 95% confidence interval of the Gaussian process is shown as the shaded region. The span of the best-so-far sweet spot is indicated by the bar centred along the bottom. Note that the best-so-far sweet spot in this case is centred on an observation, but this need not be the case in practice. Middle: A single realisation drawn from the Gaussian process model and the corresponding induced quality function. The circle shows the value of the quality of the best-so-far sweet spot, and the shaded region indicates where there is improvement over the best-so-far sweet spot. Bottom: Monte Carlo approximation, over many realisations, of the expected improvement (16) for the sweet spot, and the single-point expected improvement. Triangles indicate where the expected improvement is greatest.

In this example we have constrained the best sweet spot to be centred on an observation. The sweet spot quality for each realisation is calculated using (14) and the robust expected improvement is approximated as an average over all of the realisations (16). This is the acquisition function used for determining where to sample next, (17). Figure 6 (bottom) compares the expected sweet spot improvement (16) with the (usual) single-point expected improvement (4), which clearly demonstrates that the sweet spot expected improvement gives greater weight to searching more robust regions of design space.

Figure 9: Comparison of where the next three observations are located when using the robust expected improvement (top) and the usual single-point expected improvement (bottom). The same initial eight observations were used for both schemes. New observations are indicated with circles; the solid line and the shaded region indicate the median and the interquartile range, respectively, of the estimated worst-case quality over 100 realisations.

Figure 9 shows the result of continuing the optimisation procedure for three additional iterations; the objective function is evaluated at the centre of the sweet spot of maximum acquisition. The robust sweet spot optimiser quickly locates the region of the optimum sweet spot, whereas the single-point optimiser searches the region of the global minimum. Also shown are the median and interquartile range of the worst-case quality calculated over 100 realisations from the GP following the 11 observations, showing that the approximation to the true worst-case quality calculated from the realisations is accurate, particularly in regions where $f$ has been evaluated.
IV-B Sampling Location
Non-robust acquisition functions, such as the expected improvement described in Section II-A, determine a point of maximum acquisition. In contrast, their robust counterparts yield a region, which presents an additional decision in the optimisation process: where within this region should the next observation of the target function be made?
We constrain the location of the new observation $\mathbf{x}_{N+1}$ to lie within the region of maximum acquisition $S(\mathbf{x}')$, where $\mathbf{x}'$ is given by (17). Further, we propose the use of a sampling function, which determines where within $S(\mathbf{x}')$ to locate $\mathbf{x}_{N+1}$. As we show in our results (Section V), the choice of sampling function has a significant impact on the algorithm’s ability to estimate the “best” sweet spot. Here we present four suggestions for the sampling function; Figure 10 illustrates each of them.

Centred observation. An obvious choice is to observe the objective function at the location of maximum expected improvement, that is, the centre of the sweet spot:
$$\mathbf{x}_{N+1} = \mathbf{x}'. \tag{20}$$

Most uncertain observation. A maximally explorative approach to improving the estimate of the quality of the predicted best sweet spot is to observe the expensive function at the location of maximum uncertainty within $S(\mathbf{x}')$:
$$\mathbf{x}_{N+1} = \arg\max_{\mathbf{x} \in S(\mathbf{x}')} \sigma^2(\mathbf{x}), \tag{21}$$
where $\sigma^2(\mathbf{x})$ is the predicted variance of the Gaussian process at the given location $\mathbf{x}$; see (7).

Worst-case prediction. An alternative to improve the estimate of the sweet spot’s quality is to query $f$ at the location of the worst-case predicted value:
$$\mathbf{x}_{N+1} = \arg\max_{\mathbf{x} \in S(\mathbf{x}')} \mu(\mathbf{x}), \tag{22}$$
where $\mu(\mathbf{x})$ is the predicted mean of the Gaussian process at the given location $\mathbf{x}$.

Uniformly at random. Finally, draw $\mathbf{x}_{N+1}$ uniformly at random within $S(\mathbf{x}')$:
$$\mathbf{x}_{N+1} \sim \mathcal{U}\bigl(S(\mathbf{x}')\bigr). \tag{23}$$
This approach may also be expected to promote exploration, but not in such a directed way as the “most uncertain observation” scheme.
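The four sampling schemes above can be sketched as a single selection function over a finite set of candidate points inside the chosen sweet spot. This is an illustrative sketch: the name `choose_sample`, the scheme labels, and the use of a finite candidate set (which only approximates a uniform draw over the region when the candidates are spread evenly) are assumptions made here. The GP predictions `mean` and `var` at the candidates are taken as given.

```python
import numpy as np

def choose_sample(points, mean, var, centre_index, scheme, rng=None):
    """Pick the location to evaluate inside a sweet spot.

    points       : (n, d) candidate locations inside the sweet spot
    mean, var    : GP predictive mean and variance at those candidates
    centre_index : row of `points` holding the sweet spot centre
    scheme       : 'centre', 'most_uncertain', 'worst_case' or 'random'
    """
    if scheme == "centre":
        i = centre_index                       # centred observation (20)
    elif scheme == "most_uncertain":
        i = int(np.argmax(var))                # maximum predictive variance (21)
    elif scheme == "worst_case":
        i = int(np.argmax(mean))               # worst predicted value when minimising (22)
    elif scheme == "random":
        rng = np.random.default_rng() if rng is None else rng
        i = int(rng.integers(len(points)))     # uniform over the candidates (23)
    else:
        raise ValueError(f"unknown sampling scheme: {scheme!r}")
    return points[i]
```

For example, with three candidates whose predictive variances are 0.3, 0.1 and 0.7, the `most_uncertain` scheme returns the third candidate.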
IV-C Best-So-Far Sweet Spot
In standard Bayesian optimisation the best-so-far location and function value are simply available, because $f$ has been evaluated at each $\mathbf{x}_n$ and deciding on the best-so-far location is merely a matter of inspecting the evaluated locations. However, in this robust scheme, calculating the improvement (14) for a particular realisation requires a procedure to search for the quality of the best sweet spot, so that candidate sweet spots can be compared with it. Since the evaluation of the quality requires evaluating the modelled $f$ at many locations covering a sweet spot, this optimisation in turn requires evaluating the modelled $f$ over all locations that might be covered by the best sweet spot, that is, over the extended neighbourhood of the evaluated locations:
(24) 
The extended neighbourhood is illustrated in Figure 1.
To avoid this potentially expensive optimisation for every draw of a realisation of $f$, we instead identify the best sweet spot as:
(25) 
where the quality is evaluated from the mean of the modelled $f$:
(26) 
As shown in Algorithm 2, the best sweet spot is determined once each new observation is acquired (line 6).
By evaluating the realisation at a number of locations in a candidate sweet spot, the improvement for a particular realisation is then evaluated as
(27) 
with the best quality found so far estimated as:
(28) 
IV-D Efficiency in Higher Dimensions
When the dimension of $\mathcal{X}$ is small it is possible to implement the naïve algorithm (Algorithm 1), such that an exhaustive search can be used cheaply to find the $\mathbf{x}$ that maximises the expected improvement (16). In this naïve scheme it is necessary to sample realisations of $f$ jointly across the entire feasible space, and then compute the improvement, and thus the expected improvement, for a sufficiently dense set of locations in $\mathcal{X}$. However, this naïve implementation becomes computationally exorbitant once the number of dimensions goes beyond just two, due to the increasing number of locations required to be sampled jointly. For our sweet spot method to be of much practical use, we must modify the naïve approach so that a reasonable number of dimensions is achievable.
A more efficient procedure is to use an evolutionary algorithm to search for the sweet spot with the greatest expected improvement (Algorithm 2, line 7). Since the locations explored by the evolutionary optimiser are not known in advance, this requires the ability to progressively sample a single realisation of $f$ at new locations as the search proceeds in order to evaluate the quality of a sweet spot at a new candidate location. We emphasise that accurate evaluation of the improvement requires the evaluation of (27) for a single realisation. Approximating the quality of the best-so-far sweet spot and that of a candidate sweet spot using different realisations is insufficient because it ignores the dependence between the best-so-far sweet spot and a candidate sweet spot.
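Methods for sampling from a multivariate Gaussian density are well known. A popular method is to compute the Cholesky decomposition of the covariance matrix, $\Sigma = L L^\top$, where $L$ is a lower-triangular matrix. Then, if $\mathbf{u}$ is a vector whose elements are independent samples from a zero-mean, unit-variance univariate Gaussian density, $L\mathbf{u}$ is a sample from $\mathcal{N}(\mathbf{0}, \Sigma)$. Here, the computational complexity is dominated by the cost of computing the Cholesky decomposition, which is cubic in the size of $\Sigma$. We can exploit this method of sampling from a Gaussian density so that we can effectively draw additional samples from the same realisation post hoc, which enables the evolutionary search for the maximum of the acquisition function to be performed.
Suppose that a realisation has been sampled at locations $\{\mathbf{x}_i\}_{i=1}^{n}$ using the covariance matrix $\Sigma$; denote this joint sample by $\hat{\mathbf{f}}$. Then the realisation may be evaluated at a new location $\mathbf{x}$ by sampling from
$$p\bigl(\hat{f}(\mathbf{x}) \mid \hat{\mathbf{f}}\bigr) = \mathcal{N}\bigl(\mathbf{k}^\top \Sigma^{-1} \hat{\mathbf{f}},\; \kappa - \mathbf{k}^\top \Sigma^{-1} \mathbf{k}\bigr), \tag{29}$$
where $\mathbf{k}$ is the vector of covariances between $\mathbf{x}$ and the existing locations and $\kappa$ is the variance at $\mathbf{x}$. Using the Cholesky decomposition of $\Sigma$, the required mean and variance can be efficiently found by solving the triangular systems $L\mathbf{a} = \mathbf{k}$ and $L^\top \mathbf{b} = \mathbf{a}$ in turn to obtain $\mathbf{b} = \Sigma^{-1}\mathbf{k}$. The Cholesky decomposition of the augmented covariance may then be updated as:
$$\begin{pmatrix} \Sigma & \mathbf{k} \\ \mathbf{k}^\top & \kappa \end{pmatrix} = \begin{pmatrix} L & \mathbf{0} \\ \mathbf{a}^\top & s \end{pmatrix} \begin{pmatrix} L^\top & \mathbf{a} \\ \mathbf{0}^\top & s \end{pmatrix}, \tag{30}$$
where $s = \sqrt{\kappa - \mathbf{a}^\top \mathbf{a}}$. These ideas are straightforwardly extended to permit sampling at many new locations instead of a single $\mathbf{x}$. In Algorithm 2 these operations are encapsulated in a procedure that evaluates the expected improvement for a candidate sweet spot for use in an evolutionary algorithm to maximise the acquisition function.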
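The idea of extending a single realisation to one new location can be sketched as follows. The sketch assumes a zero-mean joint sample written as the Cholesky factor times a vector of standard Normal draws, so that the conditional mean reduces to an inner product with the stored draws; the function name `extend_realisation` and its argument names are hypothetical. A general solver is used for brevity, although a dedicated triangular solver would exploit the structure of the factor.

```python
import numpy as np

def extend_realisation(L, z, k_new, kappa_new, rng):
    """Extend a sampled realisation to one new location.

    L         : (n, n) lower Cholesky factor of the covariance at sampled locations
    z         : (n,) standard Normal draws, so the current sample is L @ z
    k_new     : (n,) covariances between the sampled locations and the new one
    kappa_new : variance at the new location
    """
    a = np.linalg.solve(L, k_new)                  # solves L a = k
    s = np.sqrt(max(kappa_new - a @ a, 0.0))       # conditional standard deviation
    z_new = rng.standard_normal()
    # Updated factor of the augmented covariance: [[L, 0], [a^T, s]].
    L_ext = np.block([[L, np.zeros((len(L), 1))],
                      [a[None, :], np.array([[s]])]])
    z_ext = np.append(z, z_new)
    # a @ z equals the conditional mean because the current sample is L @ z.
    value = a @ z + s * z_new
    return L_ext, z_ext, value
```

Because the stored draws are reused, values generated at earlier locations are unchanged, and all values remain consistent with a single joint sample. Each extension costs a triangular solve rather than a fresh cubic-cost decomposition.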
V Results
We present results of the performance of our method in comparison to the state-of-the-art method described by ur Rehman et al. [24] over five common benchmark functions [28, 29, 30]; a sixth function is taken from [24] for direct comparison with their work. Figure 11 presents two-dimensional visualisations of each function, which are defined in Table I.
Name  Equation  Ref.  
Bumped Bowl  [28]  
Levy03  [29]  
Styblinski-Tang  [30]  
Robust Problem 4  [28]  
Stepped Sphere  [28]  
Exponential  [24]  
Each benchmark was selected to test a different aspect of robust optimisation. One presents a situation where the robust optimum is situated at a local maximum, which tests the ability of an algorithm to overlook the better-performing non-robust region. Our implementation of this benchmark has been modified from [28] to ensure that the robust optimum exists exactly at the peak of the local maximum. Two further benchmarks are examples of functions with multiple local minima. In another, the robust optimum resides just outside of the global optimum, which tests robust procedures’ resilience to non-robust regions. Optimisers that exploit the parabolic sphere have difficulty in finding the “step” in the Stepped Sphere function containing the optimum, which occupies a vanishingly small proportion of the domain as the number of dimensions increases. This function has been modified from [28] to ensure that the size of the step remains significant as the number of dimensions increases; even so, the lower step occupies only a small proportion of the domain.
The robust quality measure used for evaluating the sweet spot was the worst-case quality (10). The sweet spots were constrained to be spherical with radius:
(31) 
where and are the upper and lower bounds of the domain respectively (i.e. ).
We evaluated three of the four sampling schemes proposed in Section IV-B: centred observation (20), most uncertain observation (21), and uniformly at random (23). We have not included the results of the worst-case prediction sampling scheme (22), because its performance was not competitive.
To enable paired comparisons, each method was initialised using the same Latin hypercube samples. The experiments were repeated 30 times for statistical comparison.
Figure 14 compares the convergence of each of the methods tested, and Table II provides an end-of-run summary, including the median, the median absolute deviation from the median (MAD), the minimum value, and statistical significance with Bonferroni correction of each scheme over the 30 runs.
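The summary statistics named above are simple to compute. The sketch below shows the median, the median absolute deviation from the median, and the per-comparison significance threshold under a Bonferroni correction; the function names and the example numbers are illustrative only, and the particular statistical test used in the experiments is not assumed here.

```python
import numpy as np

def median_and_mad(values):
    """Median and median absolute deviation from the median of a set of run results."""
    values = np.asarray(values, dtype=float)
    med = np.median(values)
    mad = np.median(np.abs(values - med))
    return float(med), float(mad)

def bonferroni_threshold(alpha, n_comparisons):
    """Per-comparison significance level under a Bonferroni correction."""
    return alpha / n_comparisons

# Made-up final values from five runs, including one outlier.
med, mad = median_and_mad([1.0, 2.0, 3.0, 4.0, 100.0])
```

Unlike the mean and standard deviation, both statistics are barely affected by the single outlying run in this example, which is why they suit skewed end-of-run distributions.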
Table II: End-of-run summary for each method over the 30 runs.
Centre  Random  Most uncertain  ur Rehman et al.  
Min.  Med.  MAD  Min.  Med.  MAD  Min.  Med.  MAD  Min.  Med.  MAD  
V-A Analysis
The convergence plots in Figure 14 show the difference between the state of the optimiser and the value of the true robust minimum. They demonstrate that our sweet spot optimisation procedure is generally capable of locating and exploiting robust optima with a small number of observations of the underlying expensive function. However, we note that all methods perform significantly less well in the highest-dimensional problems, where none of the methods is able to locate the robust optimum for the Levy03 or Stepped Sphere functions.
Of the three competing sampling functions, sampling at the most uncertain location within the sweet spot or at random consistently produces the best result in terms of final solution quality and convergence rate. The success of these methods is largely because of the increased exploration of the best-so-far robust region, which leads to more even coverage of observations in that key region. The benefits of increased exploration are evident for the step function, in the case where the most uncertain method is the only method that explores the bottom of the parabolic bowl containing the step sufficiently to locate the optimum. In higher dimensions we conjecture that the model of the function is not sufficiently good to allow identification of the most uncertain point.
Figure 19 shows the typical search pattern for each of the three sampling methods and ur Rehman et al.’s approach after 50 iterations on the two-dimensional Styblinski-Tang function [30]; each run was initialised from the same set of 3 Latin hypercube samples. It is clear from this example that both the “random” and “most uncertain” sampling schemes have made better estimates of the robust optimum. In addition, the “most uncertain” sampling scheme has been the most exploratory over the remainder of the domain, and has led to the most even coverage of observations at the robust optimum.
In all but one case our method has outperformed ur Rehman et al.’s competing method. In the case of the 10-dimensional Levy03, ur Rehman et al.’s method was able to produce a lower median value after 150 iterations, although this result is not statistically significant. The most uncertain sampling scheme was able to successfully locate better values on more (18 vs 12) of the runs than ur Rehman et al.
Generally, and as one might expect, the performance of all of the compared schemes worsens as the number of dimensions increases. The Stepped Sphere benchmark is difficult for a GP to model due to the discontinuous downward step in one corner of the domain. This problem is exacerbated in higher dimensions because, as noted above, the proportion of the domain occupied by the downward step shrinks as the number of dimensions grows. The result is that in ten dimensions the downward step exists in less than one-thousandth of the domain. With the extremely limited number of observations made during our experiments it is no surprise that none of the sampling schemes nor ur Rehman et al.’s method discovered the step. In five dimensions only the “most uncertain” and “random” sampling schemes were able to find the step, which shows (in this case) that the improved exploration offered by these schemes presents significant advantages in locating complicated response features.
In general ur Rehman et al.’s approach and the sweet spot optimisation with the “centre” sampling scheme do not do well, and in fact perform similarly poorly in similar circumstances. This appears to be a result of both of the approaches tending towards making more exploitative observations.
Each of the presented methods makes significant progress towards improving the robust quality, as seen in Figure 14, and each generally exhibits a similar convergence curve for the initial 20 iterations. However, during the remainder of the run it is clear that “most uncertain” and “random” generally outperform the other approaches.
VI Conclusion
This paper has introduced a novel algorithm for the robust optimisation of expensive-to-evaluate functions in the context of Bayesian optimisation. Experiments on a range of commonly used benchmark functions show that our method is effective at locating robust optima, and able to consistently outperform a state-of-the-art method from the literature. The method depends upon building a model of the expensive function and then evaluating the improvement with respect to a chosen quality function over realisations drawn from the model. The expectation of these improvements is then used as an acquisition function, which is searched using an evolutionary algorithm, to choose the next location at which to evaluate the expensive function.
Essential to the success of this method is the ability to estimate the improvement in the robust quality from a single realisation of the Gaussian process. We have therefore described how to efficiently sample from a single realisation during the course of the evolutionary search.
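The estimate described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions rather than the implementation used in the experiments: it assumes minimisation, a zero-mean Gaussian process with a squared-exponential kernel and fixed hyperparameters, a box-shaped sweet spot sampled uniformly, a worst-case quality function, and an improvement measured against a given incumbent sweet spot on the same realisation. The names rbf_kernel, gp_posterior and expected_robust_improvement are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, lengthscale=0.3, variance=1.0):
    # Squared-exponential covariance between the rows of A and B
    # (hyperparameters are fixed here purely for illustration).
    sq = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * sq / lengthscale ** 2)


def gp_posterior(X, y, Xs, noise=1e-6):
    # Posterior mean and covariance of a zero-mean GP at query points Xs,
    # conditioned on observations (X, y).
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    Ks = rbf_kernel(X, Xs)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, Ks)
    mean = Ks.T @ alpha
    cov = rbf_kernel(Xs, Xs) - v.T @ v
    return mean, cov


def expected_robust_improvement(X, y, centre, incumbent, half_width,
                                n_points=20, n_realisations=200, seed=0):
    # Monte Carlo estimate of the expected improvement in robust quality
    # of a candidate sweet spot (box around `centre`) over an incumbent one.
    rng = np.random.default_rng(seed)
    d = X.shape[1]

    # Assumption: the same uniform offsets probe both sweet spots.
    offsets = rng.uniform(-half_width, half_width, size=(n_points, d))
    Xs = np.vstack([centre + offsets, incumbent + offsets])

    mean, cov = gp_posterior(X, y, Xs)
    cov = 0.5 * (cov + cov.T)  # enforce symmetry before factorising

    # Draw joint realisations via an eigendecomposition, clipping small
    # negative eigenvalues that arise from round-off.
    w, V = np.linalg.eigh(cov)
    A = V * np.sqrt(np.clip(w, 0.0, None))
    z = rng.standard_normal((n_realisations, len(Xs)))
    draws = mean + z @ A.T

    # Assumed quality function: worst (largest) value inside the sweet spot.
    q_candidate = draws[:, :n_points].max(axis=1)
    q_incumbent = draws[:, n_points:].max(axis=1)

    # Improvement on each realisation, averaged over realisations.
    return np.maximum(q_incumbent - q_candidate, 0.0).mean()
```

In a full optimiser this function would be maximised over centre, for example by an evolutionary search. Each realisation is processed independently, so the loop over realisations (vectorised here) could equally be distributed across workers.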
Standard (non-robust) Bayesian optimisation and ur Rehman et al.’s method only require the Gaussian process to be evaluated once per iteration, whereas we require multiple evaluations of the Gaussian process to form an estimate of the robust improvement. Consequently our methods take longer, by a factor of roughly the number of evaluations, to decide on the next location for evaluation of the expensive function. While this additional burden is significant for benchmark functions like these, which are trivial to evaluate, it is insignificant in comparison with the time required to evaluate real-world expensive functions. Note also that the Gaussian process can be interrogated in parallel.
Subject to the demand that the expensive-to-evaluate function be evaluated in the putative sweet spot, we have demonstrated that the choice of sampling location can markedly affect the convergence rate and quality of the final solution. Methods that promote exploration are more effective than exploitative methods, and sampling from the location about which the model is most uncertain is effective, although we suspect that in higher dimensions the quality of the surrogate model may not be good enough to properly identify the most uncertain location.
Future work entails the simultaneous optimisation of the location and the shape of the sweet spot, and improvements in uncertainty estimation in high dimensions.
Acknowledgment
This work was supported by the Engineering and Physical Sciences Research Council [grant number EP/M017915/1].
References
 [1] J. S. Bergstra, R. Bardenet, Y. Bengio, and B. Kégl, “Algorithms for hyperparameter optimization,” in Advances in Neural Information Processing Systems, 2011, pp. 2546–2554.
 [2] J. Snoek, H. Larochelle, and R. P. Adams, “Practical Bayesian optimization of machine learning algorithms,” in Advances in Neural Information Processing Systems, 2012, pp. 2951–2959.
 [3] M. Tesch, J. Schneider, and H. Choset, “Using response surfaces and expected improvement to optimize snake robot gait parameters,” in Intelligent Robots and Systems (IROS), 2011 IEEE/RSJ International Conference on. IEEE, 2011, pp. 1069–1074.
 [4] D. J. Lizotte, T. Wang, M. H. Bowling, and D. Schuurmans, “Automatic Gait Optimization with Gaussian Process Regression,” in IJCAI, vol. 7, 2007, pp. 944–949.
 [5] S. J. Daniels, A. A. Rahat, R. M. Everson, G. R. Tabor, and J. E. Fieldsend, “A suite of computationally expensive shape optimisation problems using computational fluid dynamics,” in International Conference on Parallel Problem Solving from Nature. Springer, 2018, pp. 296–307.
 [6] D. Anthony and A. Keane, “Robust-optimal design of a lightweight space structure using a genetic algorithm,” AIAA Journal, vol. 41, no. 8, pp. 1601–1604, 2003.
 [7] D. Wiesmann, U. Hammel, and T. Back, “Robust design of multilayer optical coatings by means of evolutionary algorithms,” IEEE Transactions on Evolutionary Computation, vol. 2, no. 4, pp. 162–167, 1998.
 [8] E. Brochu, V. M. Cora, and N. De Freitas, “A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning,” arXiv preprint arXiv:1012.2599, 2010.
 [9] B. Shahriari, K. Swersky, Z. Wang, R. P. Adams, and N. de Freitas, “Taking the human out of the loop: A review of Bayesian optimization,” Proceedings of the IEEE, vol. 104, no. 1, pp. 148–175, Jan 2016.
 [10] D. R. Jones, M. Schonlau, and W. J. Welch, “Efficient global optimization of expensive blackbox functions,” Journal of Global Optimization, vol. 13, no. 4, pp. 455–492, 1998.
 [11] A. D. Bull, “Convergence rates of efficient global optimization algorithms,” Journal of Machine Learning Research, vol. 12, pp. 2879–2904, 2011.
 [12] H. J. Kushner, “A new method of locating the maximum point of an arbitrary multipeak curve in the presence of noise,” Journal of Basic Engineering, vol. 86, no. 1, pp. 97–106, 1964.
 [13] N. Srinivas, A. Krause, S. Kakade, and M. Seeger, “Gaussian process optimization in the bandit setting: No regret and experimental design,” in Proceedings of the 27th International Conference on International Conference on Machine Learning, ser. ICML’10. USA: Omnipress, 2010, pp. 1015–1022.
 [14] C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning. The MIT Press, 2006.
 [15] GPy, “GPy: A Gaussian process framework in Python,” http://github.com/SheffieldML/GPy, 2012.
 [16] H.-G. Beyer and B. Sendhoff, “Robust optimization – A comprehensive survey,” Computer Methods in Applied Mechanics and Engineering, vol. 196, no. 33, pp. 3190–3218, 2007.
 [17] J. Branke, “Creating robust solutions by means of evolutionary algorithms,” in International Conference on Parallel Problem Solving from Nature. Springer, 1998, pp. 119–128.
 [18] I. Paenke, J. Branke, and Y. Jin, “Efficient search for robust solutions by means of evolutionary algorithms and fitness approximation,” IEEE Transactions on Evolutionary Computation, vol. 10, no. 4, pp. 405–420, 2006.
 [19] C.-E. J. Dippel, “Using particle swarm optimization for finding robust optima,” Natural Computing Group, Universiteit Leiden, Tech. Rep., 2010.
 [20] Y. Jin, “Surrogate-assisted evolutionary computation: Recent advances and future challenges,” Swarm and Evolutionary Computation, vol. 1, no. 2, pp. 61–70, 2011.
 [21] Y.-S. Ong, P. B. Nair, and K. Y. Lum, “Max-min surrogate-assisted evolutionary algorithm for robust design,” IEEE Transactions on Evolutionary Computation, vol. 10, no. 4, pp. 392–404, 2006.
 [22] K.-H. Lee and G.-J. Park, “A global robust optimization using Kriging based approximation model,” JSME International Journal Series C Mechanical Systems, Machine Elements and Manufacturing, vol. 49, no. 3, pp. 779–788, 2006.
 [23] V. Picheny, T. Wagner, and D. Ginsbourger, “A benchmark of Kriging-based infill criteria for noisy optimization,” Structural and Multidisciplinary Optimization, vol. 48, no. 3, pp. 607–626, 2013.
 [24] S. ur Rehman, M. Langelaar, and F. van Keulen, “Efficient Kriging-based robust optimization of unconstrained problems,” Journal of Computational Science, vol. 5, no. 6, pp. 872–881, 2014.
 [25] J. Matoušek, “On the L2-discrepancy for Anchored Boxes,” Journal of Complexity, vol. 14, no. 4, pp. 527–556, Dec. 1998.
 [26] I. M. Sobol’, “On the distribution of points in a cube and the approximate evaluation of integrals,” Zhurnal Vychislitel’noi Matematiki i Matematicheskoi Fiziki, vol. 7, no. 4, pp. 784–802, 1967.
 [27] M. D. Morris and T. J. Mitchell, “Exploratory designs for computational experiments,” Journal of Statistical Planning and Inference, vol. 43, no. 3, pp. 381–402, 1995.
 [28] S. Mirjalili and A. Lewis, “Obstacles and difficulties for robust benchmark problems: A novel penalty-based robust optimisation method,” Information Sciences, vol. 328, pp. 485–509, 2016.
 [29] M. Laguna and R. Martí, “Experimental testing of advanced scatter search designs for global optimization of multimodal functions,” Journal of Global Optimization, vol. 33, no. 2, pp. 235–255, 2005.
 [30] M. Styblinski and T.-S. Tang, “Experiments in nonconvex optimization: stochastic approximation with function smoothing and simulated annealing,” Neural Networks, vol. 3, no. 4, pp. 467–483, 1990.
 [31] F. Wilcoxon, “Individual comparisons by ranking methods,” Biometrics Bulletin, vol. 1, no. 6, pp. 80–83, 1945.