A Bayesian Approach for the Robust Optimisation of Expensive-To-Evaluate Functions

04/25/2019
by   Nicholas D. Sanders, et al.
University of Plymouth
22

Many expensive black-box optimisation problems are sensitive to their inputs. In these problems it makes more sense to locate a region of good designs, than a single, possible fragile, optimal design. Expensive black-box functions can be optimised effectively with Bayesian optimisation, where a Gaussian process is a popular choice as a prior over the expensive function. We propose a method for robust optimisation using Bayesian optimisation to find a region of design space in which the expensive function's performance is insensitive to the inputs whilst retaining a good quality. This is achieved by sampling realisations from a Gaussian process modelling the expensive function and evaluating the improvement for each realisation. The expectation of these improvements can be optimised cheaply with an evolutionary algorithm to determine the next location at which to evaluate the expensive function. We describe an efficient process to locate the optimum expected improvement. We show empirically that evaluating the expensive function at the location in the candidate sweet spot about which the model is most uncertain or at random yield the best convergence in contrast to exploitative schemes. We illustrate our method on six test functions in two, five, and ten dimensions, and demonstrate that it is able to outperform a state-of-the-art approach from the literature.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 1

page 8

page 9

page 10

10/27/2014

Heteroscedastic Treed Bayesian Optimisation

Optimising black-box functions is important in many disciplines, such as...
04/17/2020

What do you Mean? The Role of the Mean Function in Bayesian Optimisation

Bayesian optimisation is a popular approach for optimising expensive bla...
02/21/2019

Bayesian optimisation under uncertain inputs

Bayesian optimisation (BO) has been a successful approach to optimise fu...
08/31/2015

Model Guided Sampling Optimization for Low-dimensional Problems

Optimization of very expensive black-box functions requires utilization ...
02/21/2019

Stable Bayesian Optimisation via Direct Stability Quantification

In this paper we consider the problem of finding stable maxima of expens...
12/05/2019

Ordinal Bayesian Optimisation

Bayesian optimisation is a powerful tool to solve expensive black-box pr...
06/20/2019

Bayesian Optimisation over Multiple Continuous and Categorical Inputs

Efficient optimisation of black-box problems that comprise both continuo...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

I Introduction

Optimisation is the search for the best performing design with respect to a predefined objective function. In practice, however, the optimal performance may not be achieved for a number of reasons: variation in the design due to manufacturing tolerances, operation away from the design point, the optimised model does not accurately reflect reality, and environmental uncertainties.

If the objective function is insensitive to changes in design parameters, the performance change will largely go unnoticed. Unfortunately, this is not the case for many real-world problems: even small changes in the modelled design may result in dramatic consequences. The goal of robust optimisation is to locate designs that have a guaranteed performance as the design parameters vary within a region around the modelled optimal design. We refer to these robust performance regions as sweet spots of the design space. Although often the designer desires to guarantee performance for small perturbations around the optimum, we present a technique applicable to arbitrarily large sweet spots.

Expensive black-box functions are a common problem in many disciplines, including tuning the parameters of machine learning algorithms

[1, 2], robotics [3, 4], and other engineering design problems [5, 6, 7]

. Bayesian optimisation is a principled and efficient technique for the global optimisation of these functions. The idea behind Bayesian optimisation is to place a prior distribution over the target function and then update that prior with a set of “true” observations of the target function by expensively evaluating it in order to produce a posterior predictive distribution. The posterior then informs where to make the next observation of the target function through the use of an acquisition function, which balances the

exploitation of regions known to have good performance with the exploration of regions where there is little information about the function’s response. A Gaussian process is a popular choice of prior, because they are intuitive to understand, capable of modelling the target function accurately with few data, and cheap to evaluate with the small numbers of observations usually available.

This paper introduces and evaluates a novel acquisition function for the Bayesian robust optimisation of expensive black-box functions. We also describe an efficient algorithm by which to compute the acquisition function in higher dimensional spaces.

We begin by outlining background material and reviewing similar techniques in Section II. Then, in Section III, we present a formal definition of a sweet spot, which lays the groundwork for the later sections. Section IV builds upon the previous section by introducing the Bayesian optimisation of sweet spots, and giving a demonstration on a toy function in one dimension. We also discuss efficient optimisation in higher dimensions and examine strategies for determining the parameter set at which to next evaluate the function. Results of two-, five-, and ten-dimensional test problems are presented alongside analysis in Section V. Finally, the conclusion and suggestions for future work can be found in Section VI.

Ii Background

This section comprises background material in Bayesian optimisation (Section II-A), Gaussian processes (Section II-B), and robust optimisation for expensive-to-evaluate functions (Section II-C).

Ii-a Bayesian optimisation

Although stochastic search algorithms, such as evolutionary algorithms, have been popular for the optimisation of expensive black-box functions, Bayesian optimisation is often more attractive. Through explicitly modelling the expensive function and accounting for the uncertainty in the model, the search can be guided efficiently to promising areas of the decision space: either those with high certainty of being better than the current best solution, or those with high uncertainty that may be better than the current best. See [8] for an introduction to Bayesian optimisation, and [9] for a recent comprehensive review.

To be definite and without loss of generality, we assume that the goal of optimisation is to minimise a function where are the decision variables in the feasible space . Bayesian optimisation relies on constructing a model of . Assume that has been (expensively) evaluated in locations so that data are available from which to learn a model. Then Bayesian modelling is used to construct a posterior predictive distribution at any desired location . Crucially, Bayesian modelling gives not only a prediction of the function value at , but the posterior distribution quantifies the uncertainty in the prediction as well. Where to next expensively evaluate is determined by an acquisition function, which balances the exploitation of good values of with exploring uncertain and potentially good regions. Here we use the popular expected improvement [10], which has been shown to be effective in practice and for which some theoretical guarantees exist [11]

. Alternatives such as the probability of improvement

[12] or upper-confidence bound [13, 8] could also be used.

If is modelled to take the value , then the improvement at is defined as

(1)

where

(2)

is the best function value from the evaluations thus far. The expected improvement is then

(3)

Gaussian processes are commonly used for modelling in which case the posterior predictive distribution is itself a Gaussian density (see Section II-B) with mean

and variance

. In this case the expected improvement has the closed analytical form [10]:

(4)

where and and

are the standard Normal density and cumulative distribution functions respectively.

The next (expensive) evaluation is then chosen as that with the greatest expected improvement: . This location is often discovered by using an evolutionary algorithm to maximise , which is rapid since is computationally cheap to evaluate. The evaluated location and its function value are added to and the optimisation proceeds iteratively until some stopping criterion is met or, more commonly, the available computational resources are exhausted.

Ii-B Gaussian Processes

Gaussian processes (s) [14] are commonly used for Bayesian optimisation due to their flexibility and the simple Gaussian predictive posterior distributions. Briefly, a

is a collection of random variables, and any finite number of these have a joint Gaussian distribution. Given data

and a feature vector

, the posterior predictive density of the target is Gaussian:

(5)

where the mean and variance of the prediction are given by

(6)
(7)

Here , is the vector of evaluated function values at . Non-linearity in the enters through a kernel function which models the covariance between two feature vectors. The covariance matrix collects these covariances together, , and is the -dimensional vector of covariances between the training data and :

. There are a number of kernels that could be used, for example radial basis functions, or the Matérn family of covariance functions

[14]; here we used the Matérn covariance function with smoothing parameter .

In addition the kernel function depends upon a number of hyper-parameters, . Training the comprises inferring these hyper-parameters by maximising the marginal likelihood of the data given by

(8)

Although the log marginal likelihood function landscape may be non-convex and multi-modal, we adopt the standard practice of using a gradient-based optimiser (BFGS) with several random starts to estimate good hyper-parameter values

[15].

Ii-C Robust Optimisation of Expensive Functions

Here we focus on robust optimisation problems that involve computationally expensive black-box functions; for a comprehensive survey of non-expensive robust optimisation see [16]. Cases in which the objective function is expensive to evaluate, approaches such as evolutionary algorithms [17, 18] or particle swarm optimisation [19] will not be viable due to the large number of function evaluations they demand. Therefore it is essential to apply methods that only necessitate small numbers of observations; in extreme cases this may be—at most—two hundred. In spite of this necessity, relatively few methods exist in the literature to address this.

There are a few methods in the literature that use s to develop a surrogate model of the expensive function [20, 21, 22], which reduces the computational cost of the optimisation in two ways. Firstly, it enables the surrogate model (rather than the expensive function) to be searched using, for example, an evolutionary algorithm or simulated annealing. Secondly, the use of a surrogate has the clear benefit of curtailing the computational burden of evaluating the robustness of solutions, because the surrogate model can be interrogated over the true objective function. Although these methods lighten the computational load, they do not take into account the uncertainty (in the model) that is accessible with a to help guide the search in subsequent iterations. Picheny et al. present a review of robust acquisition functions for use with a [23], but these only account for noise in the function’s response.

A state-of-the-art Bayesian approach was presented by ur Rehman et al. [24]. This method exploits a with a modified formulation of the expected improvement, which aims to account for the robust performance over a region of the design space. Whilst this technique is shown to be useful for expensive robust optimisation, there are two drawbacks: (1) the uncertainty of the is largely disregarded when calculating the modified expected improvement, as only the uncertainty at the estimated worst performing location is considered; and (2) this method is demonstrated with a somewhat substantial number of initial observations (100 in 10 dimensions), which makes it rather unsuitable for very expensive functions. Owing to this method being considered state of the art we have elected to include our own implementation of it for comparison during experimentation.

Iii Sweet Spot Definition

As stated in Section I, we are not focussed on locating an optimal point of in , rather we are interested in locating a region , in which the quality of the target function is good whilst being insensitive to variations in . We refer to as a sweet spot of the design space; it is worth emphasising that need not be small.

A quality measure can be used to describe the aggregated value of over and thus robustness of in a number of ways. For example the average performance of can be calculated as

(9)

An alternative, which might be more useful in practice, is to guarantee the worst-case performance across a sweet spot [16],

(10)

Having defined a quality function, the optimisation problem becomes

(11)

An issue that now arises is that (11) is unconstrained, because the shape of is unconstrained. Smaller are likely to appear better performing and there is nothing to prevent shrinking to a point as part of the optimisation. We therefore separate the location of from its shape and parametrise it as , where determines the location of and parametrises the sweet spot’s shape. In practice the shape of is likely to be constrained to be convex, and frequently a (hyper-) sphere. Here we fix , so that has a constant shape and volume, and concentrate on only optimising the location of the sweet spot. To simplify the notation we therefore omit the dependence of the sweet spot on and write for a sweet spot located at .

To further constrain this optimisation, we demand that a search algorithm should only return sweet spots that contain at least one location that has been expensively evaluated; that is, for to be a valid sweet spot there must exist for some the set of evaluated locations.

As illustrated in Figure 1, we also define the neighbourhood of a location to be the set of sweet spot locations that contain together with the sweet spot at those locations:

(12)
Figure 1: A two-dimensional illustration of the neighbourhood (12) of the locations evaluated so far (crosses). is the set of possible locations for the best sweet spot. The extended neighbourhood (24) is the region that contains any sweet spot which contains a member of . The bar in the bottom left-hand corner represents the size of the sweet spot used to generate these neighbourhoods.

Iv Sweet Spot Optimisation

The optimisation of a sweet spot can be distilled to the minimisation of some chosen quality measure (see Section III for suggestions for ) of a sweet spot . The obstacle to straightforward optimisation is that evaluating the quality of candidate sweet spots, using for example (9) or (10), is infeasible for continuous domains: they require the evaluation of for all . In spite of this, one can envisage that a brute-force method for optimising a sweet spot would be to transform the target function using the quality measure to yield a new function of to be optimised, , so that:

(13)

Here we propose to search for the optimum sweet spot by constructing a Gaussian process () model of to which the quality function can be applied in order to model the sweet spot’s quality. In order to account for the uncertainty in the modelled we estimate an expected improvement in by drawing realisations from the and calculating the improvement for each realisation over the current best; averaging these realisation-specific improvements for all of the drawn realisations yields the expected improvement.

We first describe a straightforward method, which proves to be computationally infeasible when the dimension of decision space or the number of evaluated locations becomes large. In the following sections we discuss modifications to the algorithm to make it computationally efficient.

1:Input: Observations of :
2:Input: Sampling function,
3:for  do
4:     Fit to
5:     for  do
6:         Draw realisation from
7:         Evaluate improvement Equation (14)
8:     end for
9:      Best location
10:      Determine where to sample next
11:      Expensively evaluate
12:     
13:end for
Algorithm 1 Naïve sweet-spot optimisation

Algorithm 1 shows the main steps in the naïve sweet spot optimisation procedure. The process is initialised with a small number of evaluations of , usually chosen via Latin hypercube sampling. These allow a to be constructed (line 4). In lines 5 to 8, a realisation from the fitted model is drawn for a set of that densely covers possible locations of the current best sweet spot and the location of possible new sweet spots. Note that a realisation evaluated at a set of locations is a draw from a multivariate Gaussian where . This surrogate of can then be used to estimate the improvement for this realisation of a location over the best sweet spot location so far evaluated. As illustrated in Figure 1, the requirement that any sweet spot should contain an evaluated location means that the set of possible locations for the best sweet spot found so far is where . Thus the improvement is:

(14)

where indicates that the quality is estimated from samples from the modelled . For example, if one were using the worst-case quality measure,

(15)

with the locations at which

is evaluated uniformly distributed over

. Note that the best sweet spot so far evaluated, , depends upon the particular realisation.

Averaging the improvement over several realisations drawn from the posterior distribution of permits the expected improvement to be estimated as the average over realisations:

(16)

The optimum location of the sweet spot for an expensive evaluation is thus found as

(17)

using, for example, an evolutionary optimiser to search over the feasible space (line 7). As we show below, although we demand that each sweet spot contain an evaluated location, it can be advantageous to evaluate at a location other than the “centre” of . In Section IV-B we explore a number of criteria for choosing the location to evaluate. In Algorithm 1 is expensively evaluated at the location provided by the function (lines 10 and 11). This sequence is then repeated until convergence is achieved or computational resources are exhausted.

Despite the apparent simplicity of this algorithm, it requires a number of modifications to make it useful for practical problems.

In Section IV-B we discuss where to expensively evaluate within a newly discovered sweet spot, and in sections IV-C and IV-D we describe a simplification to the evaluation of the improvement (14) and how to efficiently evaluate it in higher dimensions. First we illustrate the procedure with a toy example.

Iv-a Toy Example

We illustrate the procedure using the toy one-dimensional function

(18)

for . For simplicity we restrict the sweet spot to be an interval, with a single (scalar) shape parameter defining its width, i.e.

(19)

For now, as we seek to optimise only the location of the sweet spot, we set .

Figure 2 shows the toy target function and the induced landscape for the worst-case quality as given by (10). This toy function illustrates how the optimal single point location, namely the minimum of itself, can exist in a distinct location from the optimal robust region.

Figure 2: Toy function defined in (18) and the respective quality function for a sweet spot of constant size. The triangles indicate the minimum of and . The bar in the bottom left-hand corner depicts the size of the sweet spot.

The first step is to fit a Gaussian process to an initial set of observations of the expensive function . In this instance we have used an initial set of observations; the resulting Gaussian process can be seen in Figure 6. In practice a sampling scheme with low discrepancy [25] (such as a Sobol sequence [26] or Latin hypercube sampling [27]) would be used to generate this initial set.

Each realisation of the Gaussian process can be thought of as a possible whose quality can be evaluated using the elected quality measure. Figure 6 (middle) shows the effect of applying the quality function to a drawn realisation together with the resulting improvement from that realisation (27).

(a)
(b)
(c)
Figure 6: Top: An example of 5 realisations drawn from a Gaussian process, which has been fitted to 8 initial observations (crosses) of the toy function

. The realisation shown as a bold line is used in the middle panel. The 95% confidence interval of the Gaussian process is shown as the shaded region. The span of the best-so-far sweet spot is indicated by the bar centred along the bottom. Note that

in this case is centred on an observation, but this need not be the case in practice. Middle: A single realisation drawn from the Gaussian process model and the corresponding induced quality function. The circle at shows the value of the quality of the best-so-far sweet spot, and the shaded region indicates where there is improvement over the best-so-far sweet spot. Bottom: Monte Carlo approximation over many realisations of of the expected improvement (16) for the sweet spot, and the single-point expected improvement of . Triangles indicate where the expected improvement is greatest.

In this example we have constrained the best sweet spot to be centred on an observation ; the best sweet spot so far is centred at , including the observation at . The sweet spot quality for each realisation is calculated using (14) and the robust expected improvement is approximated as an average over all of the realisations (16). This is the acquisition function used for determining where to sample next, (17). Figure 6 (bottom) compares the expected sweet spot improvement (16) with the (usual) single-point expected improvement (4), which clearly demonstrates that the sweet spot expected improvement gives greater weight to searching more robust regions of design space.

(a)
(b)
Figure 9:

Comparison of where the next three observations are located when using the robust expected improvement (top) and the usual single-point expected improvement (bottom). The same initial eight observations were used for both schemes. New observations are indicated with circles, the solid line and the shaded region indicate the median and the inter-quartile range of the estimated worst-case quality

over 100 realisations respectively.

Figure 9 shows the result of continuing the optimisation procedure for three additional iterations; the objective function is evaluated at the centre of the sweet spot of maximum acquisition, . The robust sweet spot optimiser quickly locates the region of the optimum sweet spot, whereas the single-point optimiser searches the region of the global minimum. Also shown are the median and interquartile range of the calculated over 100 realisations from the following the 11 observations, showing that the approximation to calculated from the realisations is accurate, particularly in regions where has been evaluated.

Iv-B Sampling Location

Non-robust acquisition functions, such as the expected improvement described in Section II-A, determine a point of maximum acquisition. In contrast, their robust counterparts yield a region, which presents an additional decision in the optimisation process: where within this region should the next observation of the target function be made.

We constrain the location of the new observation to exist within the region of maximum acquisition . Further, we propose the use of a sampling function , which determines where within to locate . As we show in our results (Section V) the choice of has a significant impact on the algorithm’s ability to estimate the “best” sweet spot. Here we present four suggestions for the sampling function ; Figure 10 illustrates each of them.

Figure 10: Illustration of each sampling function described in Section IV-B. The solid line and accompanying shaded region depicts a fitted Gaussian process model and 95% confidence interval. The hatched region indicates . Note that the location depicted for the “random” sample is illustrative for one possibility of such a function.
  1. Centred observation. An obvious choice is to observe the objective function at the location of maximum expected improvement—the centre of the sweet spot:

    (20)
  2. Most uncertain observation. A maximally explorative approach to improving the estimate of the quality of the predicted best sweet spot is to observe the expensive function at the location of maximum uncertainty within :

    (21)

    where is the predicted variance of the Gaussian process at the given location, , see (7).

  3. Worst-case prediction. An alternative to improve the estimate of the sweet spot’s quality is to query at the location of the worst-case predicted value:

    (22)

    where is the predicted mean of the Gaussian process at the given location .

  4. Uniformly at random. Finally, draw uniformly at random within :

    (23)

    This approach may also be expected to promote exploration, but not in such a directed way as the “most uncertain observation” scheme.

Iv-C Best-So-Far Sweet Spot

In standard Bayesian optimisation the best-so-far location and function value are simply available because has been evaluated at and deciding on the best-so-far location is merely a matter of inspecting the evaluated locations. However, in this robust scheme the improvement for a particular realisation calculating the improvement (14) requires a procedure to search for the quality of the best sweet spot: so that candidate sweet spots can be compared with it. Since the evaluation of requires evaluating the modelled at many locations covering , this optimisation in turn requires evaluating the modelled over all locations that might be covered by the sweet spot, that is over the extended neighbourhood of :

(24)

The extended neighbourhood is illustrated in Figure 1.

To avoid this potentially expensive optimisation for every draw of a realisation of , we instead identify the best sweet spot as:

(25)

where is evaluated from the mean of the modelled :

(26)

As shown in Algorithm 2, is determined once each new observation is acquired (line 6).

By evaluating at a number of locations in a candidate sweet spot, the improvement for a particular realisation is then evaluated as

(27)

with the best quality found so far estimated as:

(28)

Iv-D Efficiency in Higher Dimensions

1:Input: Observations of :
2:Input: Sampling function,
3:for  do
4:     Fit to
5:     Determine best-so-far sweet spot
6:     
7:      Evolutionary opt.
8:      Where to sample next
9:      Expensively evaluate
10:     
11:end for
12:Update and return
1:procedure ()
2:     for  do
3:         Draw at Eq. (29)
4:         Evaluate improvement Equation (27)
5:     end for
6:     return Equation (16)
7:end procedure
Algorithm 2 Sweet-spot optimisation

When the dimension of is small it is possible to implement the naïve algorithm (Algorithm 1), such that an exhaustive search can be used cheaply to find the that maximises the expected improvement (16). In this naïve scheme it is necessary to sample realisations of jointly across the entire feasible space, and then compute the improvement—and thus the expected improvement—for a sufficiently dense set of locations in . However, this naïve implementation becomes computationally exorbitant once the number of dimensions goes beyond just two, due to the increasing number of locations required to be sampled jointly. For our sweet spot method to be of much practical use, we must modify the naïve approach so that a reasonable number of dimensions is achievable.

A more efficient procedure is to use an evolutionary algorithm to search for the sweet spot with the greatest expected improvement (Algorithm 2, line 7). Since the locations explored by the evolutionary optimiser are not known in advance, this requires the ability to progressively sample a single realisation of at new locations as the search proceeds in order to evaluate , the quality of a sweet spot at a new candidate location. We emphasise that accurate evaluation of the improvement requires the evaluation of (27) for a single realisation. Approximating and using different realisations is insufficient because it ignores the dependence between the best-so-far sweet spot and a candidate sweet spot.

Methods for sampling from a multivariate Gaussian density are well known. A popular method is to compute the Cholesky decomposition of the covariance matrix , where is a lower-triangular matrix. Then, if is a vector whose elements are samples from a zero-mean univariate Gaussian density, is a sample from . Here, the computational complexity is dominated by the cost of computing the Cholesky decomposition, which is cubic in the size of . We can exploit this method of sampling from a Gaussian density so that we can effectively draw additional samples from the same realisation post hoc, which enables the evolutionary search for the maximum of the acquisition function to be performed.

Suppose that a realisation has been sampled at locations using the covariance matrix ; denote this joint sample by . Then the realisation may be evaluated at a new location by sampling from

(29)

where and . Using the Cholesky decomposition of , the required mean and variance can be efficiently found by solving the triangular systems and in turn to obtain . The Cholesky decomposition of the augmented covariance may then be updated as:

(30)

where . These ideas are straightforwardly extended to permit sampling at many new locations instead of a single . In Algorithm 2 these operations are encapsulated in the procedure which evaluates the expected improvement for a candidate sweet spot for use in an evolutionary algorithm to maximise the acquisition function.

V Results

We present results of the performance of our method in comparison to the state-of-the-art method described by ur Rehman et al. in [24] with over five common benchmark functions [28, 29, 30]; the sixth function is taken from [24] for direct comparison with their work. Figure 11 presents two-dimensional visualisations of each function, which are defined in Table I.

Figure 11: Two-dimensional visualisations of the six benchmark functions. Exact formulations can be found in Table I.
Name Equation Ref.
Bumped Bowl [28]
Levy03
where
[29]
Styblinski-Tang [30]
Robust Problem 4 [28]
Stepped Sphere [28]
Exponential [24]
Table I: Summary of the benchmark functions used for experimentation, the domains over which they were evaluated, and the location of the robust optimum . A signifies that the stated equation has been modified from the referenced source (see Section V for full explanation).

Each benchmark was selected to test a different aspect of robust optimisation. Function presents a situation where the robust optimum is situated at a local maximum, which tests the ability of an algorithm to overlook the better-performing non-robust region. Our implementation of this benchmark has been modified from [28] to ensure that the robust optimum exists exactly at the peak of the local maximum. Benchmarks and are examples of functions with multiple local minima. In the case of benchmark , the robust optimum resides just outside of the global optimum, which tests robust procedures’ resilience to non-robust regions. Optimisers that exploit the parabolic sphere have difficulty in finding the “step” in containing the optimum, which occupies a vanishingly small proportion of the domain as the number of dimensions increases. This function has been modified from [28] to ensure that the size of the step remains significant as increases: even so, the proportion of containing the lower step is only .

The robust quality measure used for evaluating the sweet spot was the worst-case quality (10). The size of the sweet spots were constrained to be spherical with radius :

(31)

where and are the upper and lower bounds of the domain respectively (i.e. ).

We evaluated the four sampling schemes proposed in Section IV-B: centred observation (20), most uncertain observation (21), and uniformly at random (23). We have not included the results of the worst-case prediction sampling scheme (22), because its performance was not competitive.

To enable paired comparisons, each method was initialised using the same Latin hypercube samples. The experiments were repeated 30 times for statistical comparison.

Figure 14 compares the convergence of each of the methods tested, and Table II provides an end-of-run summary, including the median, the median average deviation around the median, the minimum value, and statistical significance with Bonferroni correction of each scheme over the 30 runs.

5 dimensions
(a) 5 dimensions
(b) 10 dimensions
Figure 14: Convergence over the six benchmarks (section V and Table I) of each sampling scheme (section IV-B) and the scheme described by ur Rehman et al. [24]. Solid lines show the median (over 30 runs) of the difference between the quality of the predicted best sweet spot and the quality of the true best sweet spot; the shaded regions show the inter-quartile range. The top set of plots (a) display the results from the experiments in 5 dimensions, whereas the bottom set of plots (b) are from the 10-dimensional experiments. Note that excepting the Stepped Sphere results, all plots are on a semi-log scale on the vertical axis.
Centre Random Most uncertain ur Rehman et al.
Min. Med. MAD Min. Med. MAD Min. Med. MAD Min. Med. MAD
0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000
0.031
0.003
0.000 0.000 0.000 0.000
0.003
0.183
0.039
0.000
0.008
0.000
0.049
0.256
0.042
0.001 0.001
8.982
0.004
Table II: Summary results in 2, 5, and 10 dimensions. Three sampling schemes (Section IV-B) and ur Rehman et al.’s [24] method are compared using the minimum (Min.), and the median (Med.) over 30 runs; spread around the median result is quantified by the median absolute deviation (MAD). Best medians, which are statistically indistinguishable from one another, but significantly better than the others (using the Wilcoxon signed ranks test [31], single tail, and Bonferroni correction ) are highlighted in bold. The lowest minimum value for each experiment is italicised.

V-a Analysis

The convergence plots in Figure 14 show the difference between the state of the optimiser and the value of the true robust minimum. They demonstrate that our sweet spot optimisation procedure is generally capable of locating and exploiting robust optima with a small number of observations of the underlying expensive function. However, we note that all methods perform significantly less well in dimensions and none of the methods is able to locate the robust optimum for the Levy03 or stepped sphere functions.

Of the three competing sampling functions, , sampling at the most uncertain location within the sweet spot or at random consistently produce the best result in terms of final solution quality and convergence rate. The success of these methods is largely because of the increased exploration of the best-so-far robust region, which leads to more even coverage of observations in that key region. The benefits of increased exploration are evident for the step function in dimensions where the most uncertain method is the only method that explores the bottom of the parabolic bowl containing the step sufficiently to locate the optimum. In higher dimensions we conjecture that the model of the function is not sufficiently good to allow identification of the most uncertain point.

Figure 19 shows the typical search pattern for each of the three sampling methods and ur Rehman et al.’s approach after 50 iterations on the two-dimensional Styblinski-Tang function [30]; each run was initialised from the same set of 3 Latin hypercube samples. It is clear from this example that both the “random” and “most uncertain” sampling schemes have made better estimates of the robust optimum. In addition, the “most uncertain” sampling scheme has been the most exploratory over the remainder of the domain, and has lead to the most even coverage of observations at the robust optimum.

(a) Centre
(b) Random
(c) Most uncertain
(d) ur Rehman et al.
Figure 19: Comparison between where the expensive function () has been observed by three of the sampling schemes proposed in Section IV-B (a)–(c) and ur Rehman et al.’s method (d). Crosses indicate observation locations (the shade of which darkens as the search progresses), and hexagons indicate the three initial Latin hypercube observations.

In all but one case our method has out-performed that of ur Rehman et al.’s competing method. In the case of the 10-dimensional Levy03, ur Rehman et al.’s method was able to produce a lower median value after 150 iterations, although this result is not statistically significant. The most uncertain sampling scheme was able to successfully locate better values on more (18 vs 12) of the runs than ur Rehman et al..

Generally, and as one might expect, the performance of all of the compared schemes worsens as the number of dimensions increases. Benchmark problem is difficult for a to model due to the discontinuous downward step in one corner of the domain. This problem is exacerbated in higher dimensions: whilst in two dimensions the downward step covers of the domain, as noted above, the step scales such that it occupies of the domain. The result is that in ten dimensions the downward step exists in less than one-thousandth of the domain. With the extremely limited number of observations made during our experiments it is no surprise that none of the sampling schemes nor ur Rehman et al.’s method discovered the step. In five dimensions only the “most uncertain” and “random” sampling schemes were able to find the step, which shows (in this case) that the improved exploration offered by these schemes presents significant advantages in locating complicated response features.

In general ur Rehman et al.’s approach and the sweet spot optimisation with the “centre” sampling scheme do not do well, and in fact perform similarly poorly in similar circumstances. This appears to be a result of both of the approaches tending towards making more exploitative observations.

Each of the presented methods make significant ground towards improving their quality of robustness, as seen in Figure 14. And each one generally exhibits a similar convergence curve for the initial 20 iterations. However, during the remainder of the run it is clear that “most uncertain” and “random” generally outperform the other approaches.

Vi Conclusion

This paper has introduced a novel algorithm for the robust optimisation of expensive-to-evaluate functions in the context of Bayesian optimisation. Experiments on a range of commonly-used benchmark functions show that our method is effective at locating robust optima, and able to consistently outperform a state-of-the-art method from the literature. The method depends upon building a model of the expensive function and then evaluating the improvement with respect to a chosen quality function over realisations drawn from the model. The expectation of these improvements is then used as an acquisition function, which is searched using an evolutionary algorithm, to inform the next location of the expensive function to evaluate.

Essential to the success of this method is the ability to estimate the improvement in the robust quality from a single realisation of the Gaussian process. We have therefore described how to efficiently sample from a single realisation during the course of the evolutionary search.

Standard (non-robust) Bayesian optimisation and ur Rehman et al.’s method only require the to be evaluated once per iteration; we require evaluations of the to form an estimate of the robust improvement. Consequently our methods take roughly times as long to decide on the next location for evaluation of the expensive function. While this additional burden is significant for benchmark functions like these which are trivial to evaluate, the additional time required is insignificant in comparison with the time required to evaluate real-world expensive functions. Note also that interrogating the can be made in parallel.

Subject to the demand that the expensive-to-evaluate function be evaluated in the putative sweet spot, we have demonstrated that the choice of sampling location can markedly affect the convergence rate and quality of the final solution. Methods that promote exploration are more effective than exploitative methods and sampling from the location about which the model is most uncertain is effective, although we suspect that in higher dimensions that quality of the surrogate model may not be good enough to properly identify the most uncertain location.

Future work entails the simultaneous optimisation of the location and the shape of the sweet spot, and improvements in uncertainty estimation in high dimensions.

Acknowledgment

This work was supported by the Engineering and Physical Sciences Research Council [grant number EP/M017915/1].

References

  • [1] J. S. Bergstra, R. Bardenet, Y. Bengio, and B. Kégl, “Algorithms for hyper-parameter optimization,” in Advances in Neural Information Processing Systems, 2011, pp. 2546–2554.
  • [2] J. Snoek, H. Larochelle, and R. P. Adams, “Practical Bayesian optimization of machine learning algorithms,” in Advances in Neural Information Processing Systems, 2012, pp. 2951–2959.
  • [3] M. Tesch, J. Schneider, and H. Choset, “Using response surfaces and expected improvement to optimize snake robot gait parameters,” in Intelligent Robots and Systems (IROS), 2011 IEEE/RSJ International Conference on.   IEEE, 2011, pp. 1069–1074.
  • [4] D. J. Lizotte, T. Wang, M. H. Bowling, and D. Schuurmans, “Automatic Gait Optimization with Gaussian Process Regression,” in IJCAI, vol. 7, 2007, pp. 944–949.
  • [5] S. J. Daniels, A. A. Rahat, R. M. Everson, G. R. Tabor, and J. E. Fieldsend, “A suite of computationally expensive shape optimisation problems using computational fluid dynamics,” in International Conference on Parallel Problem Solving from Nature.   Springer, 2018, pp. 296–307.
  • [6]

    D. Anthony and A. Keane, “Robust-optimal design of a lightweight space structure using a genetic algorithm,”

    AIAA journal, vol. 41, no. 8, pp. 1601–1604, 2003.
  • [7] D. Wiesmann, U. Hammel, and T. Back, “Robust design of multilayer optical coatings by means of evolutionary algorithms,”

    IEEE Transactions on Evolutionary Computation

    , vol. 2, no. 4, pp. 162–167, 1998.
  • [8] E. Brochu, V. M. Cora, and N. De Freitas, “A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning,” arXiv preprint arXiv:1012.2599, 2010.
  • [9] B. Shahriari, K. Swersky, Z. Wang, R. P. Adams, and N. de Freitas, “Taking the human out of the loop: A review of Bayesian optimization,” Proceedings of the IEEE, vol. 104, no. 1, pp. 148–175, Jan 2016.
  • [10] D. R. Jones, M. Schonlau, and W. J. Welch, “Efficient global optimization of expensive black-box functions,” Journal of Global Optimization, vol. 13, no. 4, pp. 455–492, 1998.
  • [11] A. D. Bull, “Convergence rates of efficient global optimization algorithms,” Journal of Machine Learning Research, vol. 12, pp. 2879–2904, 2011.
  • [12] H. J. Kushner, “A new method of locating the maximum point of an arbitrary multipeak curve in the presence of noise,” Journal of Basic Engineering, vol. 86, no. 1, pp. 97–106, 1964.
  • [13] N. Srinivas, A. Krause, S. Kakade, and M. Seeger, “Gaussian process optimization in the bandit setting: No regret and experimental design,” in Proceedings of the 27th International Conference on International Conference on Machine Learning, ser. ICML’10.   USA: Omnipress, 2010, pp. 1015–1022.
  • [14] C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning.   The MIT Press, 2006.
  • [15] GPy, “GPy: A Gaussian process framework in Python,” http://github.com/SheffieldML/GPy, 2012.
  • [16] H.-G. Beyer and B. Sendhoff, “Robust optimization – A comprehensive survey,” Computer Methods in Applied Mechanics and Engineering, vol. 196, no. 33, pp. 3190 – 3218, 2007.
  • [17] J. Branke, “Creating robust solutions by means of evolutionary algorithms,” in International Conference on Parallel Problem Solving from Nature.   Springer, 1998, pp. 119–128.
  • [18] I. Paenke, J. Branke, and Y. Jin, “Efficient search for robust solutions by means of evolutionary algorithms and fitness approximation,” IEEE Transactions on Evolutionary Computation, vol. 10, no. 4, pp. 405–420, 2006.
  • [19]

    C.-E. J. Dippel, “Using particle swarm optimization for finding robust optima,” Natural Computing Group, Universiteit Leiden, Tech. Rep., 2010.

  • [20] Y. Jin, “Surrogate-assisted evolutionary computation: Recent advances and future challenges,” Swarm and Evolutionary Computation, vol. 1, no. 2, pp. 61–70, 2011.
  • [21] Y.-S. Ong, P. B. Nair, and K. Y. Lum, “Max-min surrogate-assisted evolutionary algorithm for robust design,” IEEE Transactions on Evolutionary Computation, vol. 10, no. 4, pp. 392–404, 2006.
  • [22] K.-H. Lee and G.-J. Park, “A global robust optimization using Kriging based approximation model,” JSME International Journal Series C Mechanical Systems, Machine Elements and Manufacturing, vol. 49, no. 3, pp. 779–788, 2006.
  • [23] V. Picheny, T. Wagner, and D. Ginsbourger, “A benchmark of Kriging-based infill criteria for noisy optimization,” Structural and Multidisciplinary Optimization, vol. 48, no. 3, pp. 607–626, 2013.
  • [24] S. ur Rehman, M. Langelaar, and F. van Keulen, “Efficient Kriging-based robust optimization of unconstrained problems,” Journal of Computational Science, vol. 5, no. 6, pp. 872–881, 2014.
  • [25] J. Matoušek, “On the L2-discrepancy for Anchored Boxes,” Journal of Complexity, vol. 14, no. 4, pp. 527–556, Dec. 1998.
  • [26] I. M. Sobol’, “On the distribution of points in a cube and the approximate evaluation of integrals,” Zhurnal Vychislitel’noi Matematiki i Matematicheskoi Fiziki, vol. 7, no. 4, pp. 784–802, 1967.
  • [27] M. D. Morris and T. J. Mitchell, “Exploratory designs for computational experiments,” Journal of Statistical Planning and Inference, vol. 43, no. 3, pp. 381–402, 1995.
  • [28] S. Mirjalili and A. Lewis, “Obstacles and difficulties for robust benchmark problems: A novel penalty-based robust optimisation method,” Information Sciences, vol. 328, pp. 485–509, 2016.
  • [29] M. Laguna and R. Martí, “Experimental testing of advanced scatter search designs for global optimization of multimodal functions,” Journal of Global Optimization, vol. 33, no. 2, pp. 235–255, 2005.
  • [30] M. Styblinski and T.-S. Tang, “Experiments in nonconvex optimization: stochastic approximation with function smoothing and simulated annealing,” Neural Networks, vol. 3, no. 4, pp. 467–483, 1990.
  • [31] F. Wilcoxon, “Individual comparisons by ranking methods,” Biometrics Bulletin, vol. 1, no. 6, pp. 80–83, 1945.