Black-box Mixed-Variable Optimisation using a Surrogate Model that Satisfies Integer Constraints

06/08/2020 · by Laurens Bliek et al. · Delft University of Technology

A challenging problem in both engineering and computer science is that of minimising a function for which we have no mathematical formulation available, that is expensive to evaluate, and that contains continuous and integer variables, for example in automatic algorithm configuration. Surrogate modelling techniques are very suitable for this type of problem, but most existing techniques are designed with only continuous or only discrete variables in mind. Mixed-Variable ReLU-based Surrogate Modelling (MVRSM) is a surrogate modelling algorithm that uses a linear combination of rectified linear units, defined in such a way that (local) optima satisfy the integer constraints. This method is both more accurate and more efficient than the state of the art on several benchmarks with up to 238 continuous and integer variables.


1 Introduction

Surrogate modelling techniques such as Bayesian optimisation have a long history of success in optimising expensive black-box objective functions [13, 12, 14]. These are functions that have no mathematical formulation available and take some time or other resource to evaluate, which occurs for example when they are the result of some simulation, algorithm or scientific experiment. Often there is also randomness or noise involved in these evaluations. By approximating the objective with a cheaper surrogate model, the optimisation problem can be solved more efficiently.

While most attention in the literature has gone to problems in continuous domains, recently solutions for combinatorial optimisation problems have started to arise [8, 1, 2, 18, 6]. Yet many problems contain a mix of continuous and discrete variables, for example material design [11], optical filter optimisation [20], and automated machine learning [10]. The literature on surrogate modelling techniques for these types of problems is even more sparse than for purely discrete problems. Discretising the continuous variables to make use of a purely discrete surrogate model, or applying rounding techniques to make use of a purely continuous surrogate model, are both common but inadequate ways to solve the problem [8, 16]. The few existing techniques that can deal with a mixed-variable setting still have considerable room for improvement in either accuracy or efficiency. When the surrogate model is not expressive enough and does not model any interaction between the different variables, it will perform poorly, especially when many variables are involved. We show that this is exactly what happens with the popular surrogate modelling algorithm HyperOpt [3]. On the other hand, most Bayesian optimisation techniques do model the interaction between all variables, but use a surrogate model that grows in size every iteration. This causes those algorithms to become slower over time, potentially even becoming more expensive than the expensive objective itself.

Our main contribution is a surrogate modelling algorithm called Mixed-Variable ReLU-based Surrogate Modelling (MVRSM) that can deal with problems with continuous and integer variables efficiently and accurately. This is realised by using a continuous surrogate model that:

  • models interactions between all variables,

  • does not grow in size over time and can be updated efficiently, and

  • has local optima that are located exactly in points of the search space where the integer constraints are satisfied.

The first point ensures that the model remains accurate, even for large-scale problems. The second point ensures that the algorithm does not slow down over time. Finally, the last point eliminates the need for rounding the variables, which is known to be sub-optimal in Bayesian optimisation [8], and also eliminates the need for using combinatorial optimisation with integer constraints as is done in [7].

Besides the proposed algorithm, the contributions include a proof in Section 4 that the local optima of the proposed surrogate model are integer-valued in the intended variables, and an experimental demonstration of the effectiveness of this method in Section 5 on five benchmarks that are either taken from related work or contain the same number of continuous and discrete variables as the benchmarks in related work. The largest benchmark contains 238 variables, which is much larger than the benchmarks considered in most Bayesian optimisation algorithms.

2 Preliminaries

This work considers the problem of finding the minimum of a mixed-variable black-box objective function f that can only be accessed via expensive and noisy measurements g(x, y) = f(x, y) + ε. That is, we want to solve

min_{x ∈ X, y ∈ Y} f(x, y), (1)

where d_c is the number of continuous variables in the problem, d_i is the number of integer variables, ε is a zero-mean random variable with finite variance, and X ⊆ R^{d_c} and Y ⊆ Z^{d_i} are the bounded domains of the continuous and integer variables respectively. In this work, the lower and upper bounds of the i-th variable of either X or Y are denoted l_i and u_i respectively. Expensive in this context means that it takes some time or other resource to evaluate g, as is the case in for example hyperparameter tuning problems [3] and many engineering problems [4, 18]. Therefore, we wish to solve (1) using as few samples as possible. We assume that only a limited budget of B samples is available, meaning that g can only be evaluated B times.

The problem is usually solved with a surrogate modelling technique such as Bayesian optimisation [14]. In this approach, the data samples (x_n, y_n, g(x_n, y_n)), n = 1, …, N, are used to approximate the objective f with a surrogate model M. Usually, M is a machine learning model such as a Gaussian process, a random forest or a weighted sum of nonlinear basis functions. In any case, it has an exact mathematical formulation, which means that M can be optimised with existing techniques, as it is not expensive to evaluate and it is not black-box. If M is indeed a good approximation of the original objective f, it can be used to suggest new candidate points of the search space where f should be evaluated. This happens iteratively: in every iteration, g is evaluated, the approximation M of f is improved, and optimisation on M is used to suggest the next point at which to evaluate g.

3 Related work

In Bayesian optimisation, Gaussian processes are the most popular surrogate model [14]. On the one hand, these surrogate models lend themselves well to problems with only continuous variables, but not so much when they include integer variables as well. On the other hand, there have been several recent approaches to develop surrogate models for problems with only discrete variables [8, 1, 18, 6].

The mixed-variable setting is not as well-developed, although there are some surrogate modelling methods that can deal with this. We start by mentioning two well-known methods, namely SMAC [9] and HyperOpt [3], followed by more recent work, along with their strengths and shortcomings. We end this section with recent work on discrete surrogate models that we make use of throughout this paper.

SMAC [9] uses random forests as the surrogate model. This captures interactions between the variables nicely, but the main disadvantage is that random forests are less accurate in unseen parts of the search space, at least compared to other surrogate models. HyperOpt [3] uses a Tree-structured Parzen Estimator as the surrogate model. This algorithm is known to be fast in practice, has been shown to work in settings with hundreds of variables, and also has the ability to deal with conditional variables, where certain variables only exist if other variables take on certain values. Its main disadvantage is that complex interactions between variables are not modelled. Most other existing Bayesian optimisation algorithms have to resort to rounding or discretisation in order to deal with the mixed-variable setting, which both have their disadvantages [8, 16].

More recently, the CoCaBO algorithm was proposed [16], which is developed for problems with a mix of continuous and categorical variables. It makes use of a combination of multi-armed bandits and Gaussian processes. The algorithm can also deal with a batch setting, where the objective function is evaluated multiple times in parallel at each iteration. Other research groups have focused their attention on problems with a mix of continuous, categorical and integer variables that also have multiple objectives [20, 11].

Most of the methods mentioned here suffer from the drawback that the surrogate model grows while the algorithm is running, causing the algorithms to become slower over time. This problem has been addressed and solved for the continuous setting [4] and the discrete setting [18, 6] by making use of parametric surrogate models that are linear in the parameters. The recently proposed MiVaBO algorithm [7] is, to the best of our knowledge, the first algorithm that applies this solution to the mixed variable setting. It relies on an alternation between continuous and discrete optimisation to find the optimum of the surrogate model. MiVaBO can also deal with known quadratic constraints, and the authors provide theoretical convergence guarantees.

In contrast with MiVaBO, previous work [6] gives the theoretical guarantee that any local minimum of the surrogate model satisfies the integer constraints, so only continuous optimisation needs to be used. This is achieved by using a surrogate model consisting of a linear combination of rectified linear units (ReLUs), a popular basis function in the machine learning community. Using only continuous optimisation is much more efficient than the approach used in MiVaBO. However, the theory in [6] only applies to problems without continuous variables.

4 Mixed-Variable ReLU-based Surrogate Modelling

In this section, we extend the theory from [6] to the mixed-variable setting. This is far from trivial, as a wrong choice of surrogate model might result in limited interaction between all variables, in not being able to optimise the surrogate model efficiently, or in not being able to satisfy the integer constraints. The result of this extension is the Mixed-Variable ReLU-based Surrogate Modelling (MVRSM) algorithm. This algorithm makes use of a surrogate model based on rectified linear units that contains interactions between all variables, is easy to update and to optimise, and has its local optima situated in points that satisfy the integer constraints.

4.1 Proposed surrogate model

As in related work [4, 6, 7], we use a continuous surrogate model M of the form

M(x, y) = Σ_{k=1}^{D} c_k φ_k(x, y), (2)

with D being the number of basis functions. The model is linear in its own parameters c_k, which allows it to be trained with linear regression techniques. We choose the basis functions φ_k in such a way that all local optima of the model satisfy the integer constraints y ∈ Z^{d_i}, as explained later in this section. This leads to an efficient way of finding the minimum of the surrogate model for mixed variables.

Similar to [5, 6], we choose rectified linear units as the basis functions:

ReLU(z) = max(0, z), (3)
φ_k(x, y) = ReLU(w_k^T [x; y] + b_k), (4)

with w_k ∈ R^{d_c + d_i}, b_k ∈ R, and [x; y] the concatenation of the continuous and integer variables. This causes the surrogate model to be piece-wise linear. The model parameters w_k and b_k can be chosen according to one of four strategies:

  • they are optimised together with the weights c_k,

  • they are chosen directly according to the data samples in a non-parametric way using kernel basis functions [14, 16],

  • they are chosen randomly once and then fixed [4, 5, 18, 7], or

  • they are chosen according to the variable domains and then fixed [6].

The first option is not recommended as nonlinear optimisation would have to be used, while linear regression techniques can be used for the weights c_k. The second option has the downside that more and more basis functions need to be added as data samples are gathered, making the surrogate model grow in size while the algorithm is running. This is what happens in most Bayesian optimisation algorithms, but it causes these algorithms to become slower over time. The third option fixes this problem, but even though there are good approximation theorems available for a random choice of the parameters [15, 4], it does not give any guarantees on satisfying the integer constraints. The fourth option does, but only for problems that have no continuous variables. Therefore, we propose to use a mix of the third and fourth option, getting the best of both options, as explained below.
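To make the structure of (2)-(4) concrete, the following minimal sketch (hypothetical code, not the authors' implementation) evaluates a ReLU expansion with fixed basis parameters W and b and trainable weights c, which is exactly the linear-in-parameters setting of the third and fourth strategies:

import numpy as np

class ReluSurrogate:
    def __init__(self, W, b):
        # W: (D, d_c + d_i) fixed basis directions w_k, b: (D,) fixed offsets b_k.
        # Only the weights c are trained, so fitting M is a linear regression problem.
        self.W, self.b = W, b
        self.c = np.zeros(W.shape[0])

    def basis(self, z):
        # phi_k(z) = ReLU(w_k^T z + b_k), with z = [x; y]
        return np.maximum(0.0, self.W @ z + self.b)

    def predict(self, z):
        # M(z) = sum_k c_k * phi_k(z)
        return self.c @ self.basis(z)

    def gradient(self, z):
        # The model is piece-wise linear: its gradient is the weighted sum of the
        # directions w_k whose rectified linear unit is active at z.
        active = (self.W @ z + self.b > 0).astype(float)
        return (self.c * active) @ self.W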

The approach in [6] is to choose the model parameters w_k and b_k as integers according to the variable domains Y, which gave the guarantee that any integer constraints were satisfied in the local minima of the model. However, this was done only for basis functions depending on integer-valued variables. By adding mixed features as is done in [7], we may lose this guarantee. We show in this section how the guarantee can still be maintained with mixed variables.

We first re-use two results from [6] that are relevant to our approach:

Theorem 1.

Any strict local minimum of M is located in a point (x, y) with φ_k(x, y) = 0 for d_c + d_i linearly independent functions φ_k [6].

This follows from the fact that M is piece-wise linear, so any strict local minimum must be located in a point where the model is nonlinear in all directions.
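As a one-dimensional illustration (our example, not taken from the paper), consider an integer variable y with bounds 0 and 2 and the model

M(y) = c_1 max(0, y − 1) + c_2 max(0, 1 − y), with c_1, c_2 > 0.

M is linear on y < 1 and on y > 1, so its only strict local minimum is the kink y = 1, which is precisely the point where the basis function max(0, y − 1) (and max(0, 1 − y)) equals zero and the integer constraint is satisfied.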

Definition 1 (Integer φ-function).

An integer φ-function is chosen according to (4) with w_k acting only on the integer variables, and with w_k and b_k having integer values chosen according to the algorithm from [6]. That means it is a ReLU of either a single integer variable y_i, with the integer offset chosen between l_i and u_i (the lower and upper bounds of y_i), or of two subsequent integer variables y_i and y_{i+1}, with the integer offset chosen between the corresponding bounds. This results in a basis function that depends only on one or two subsequent integer variables and does not depend on any continuous variables.

Lemma 1.

If φ_k(x, y) = 0 for d_i different linearly independent integer φ-functions φ_k, then y ∈ Z^{d_i}.

Proof.

The proof follows exactly the same reasoning as the proof of [6, Thm. 2 (II)]. ∎

By making use of the integer φ-functions, we have a surrogate model with basis functions that depend on the integer variables. If we were to add basis functions that depend only on the continuous variables, the possible interaction between continuous and integer variables would not be modelled. But if we add randomly chosen mixed basis functions as in [7], we might lose the guarantee that the integer constraints are satisfied in local minima. See Figure 1 (left).

Figure 1: (left) Example of the problem with mixed basis functions for an integer and a continuous variable. All local minima are located in points where two φ-functions that equal 0 intersect. This works fine for the intersections with the integer φ-functions, but not for the intersection of the two randomly chosen mixed φ-functions, as in that point the integer variable takes on a non-integer value. (right) A solution to the problem is to use mixed φ-functions that are parallel to a number of linearly independent vectors equal to d_c. In this visualisation d_c = 1, so all the mixed φ-functions are parallel to each other. This ensures that all intersections are located in points where the integer variable is integer-valued.

To avoid both the problem of losing interaction between variables and the problem of losing the guarantee on satisfying the integer constraints, we propose to add mixed basis functions as in [7], but we choose them pseudo-randomly rather than randomly. This benefits from the success that randomly chosen weights have had in the past [15, 4, 5, 18, 7], while avoiding the problem from Figure 1 (left).

Definition 2 (Mixed φ-function).

A mixed φ-function is chosen according to (4) with w_k sampled from a set W that contains d_c random vectors in R^{d_c + d_i}, drawn from a continuous probability distribution P_W, and with b_k then chosen from a continuous probability distribution P_b which depends on w_k. This results in a basis function that depends on all continuous and on all integer variables.

The probability distributions P_W and P_b are chosen in such a way that the mixed φ-functions are never completely outside the variable domains. (The exact procedure for choosing them can be found in Appendix A.) As a result of the definition, all mixed φ-functions will be parallel to one of the d_c random vectors. See Figure 1 (right). This gives the following result, which guarantees the unique property of this continuous surrogate model, i.e. that all local minima are integer-valued in the intended variables:

Theorem 2.

If the surrogate model M consists entirely of integer and mixed φ-functions, then any strict local minimum (x, y) of M satisfies y ∈ Z^{d_i}.

Proof.

From Theorem 1 it follows that φ_k(x, y) = 0 for d_c + d_i linearly independent φ_k. Since all mixed φ-functions are parallel to one of the d_c randomly chosen vectors, there can only be d_c linearly independent mixed φ-functions among them. As all other φ-functions are integer φ-functions, this means that there are at least d_i linearly independent integer φ-functions with φ_k(x, y) = 0. The result now follows from Lemma 1. ∎

This makes it possible to apply a standard nonlinear optimisation technique such as L-BFGS [19] to find a minimum of our surrogate model, instead of having to solve a mixed-integer program, which is more expensive, or having to resort to rounding, which is sub-optimal. As the rectified linear units are linear almost everywhere, the surrogate model can be optimised relatively easily with a gradient-based technique such as L-BFGS or other standard methods.
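A minimal sketch of this step (our reconstruction, not the released code), reusing the hypothetical ReluSurrogate class from the earlier sketch and SciPy's L-BFGS-B implementation with the analytical gradient of the ReLU expansion:

import numpy as np
from scipy.optimize import minimize

def minimise_surrogate(model, z0, lower, upper, max_iters=20):
    # model is assumed to expose predict(z) and gradient(z); the integer variables are
    # treated as relaxed continuous variables, since Theorem 2 guarantees that strict
    # local minima are integer-valued anyway.
    result = minimize(
        fun=model.predict,
        x0=z0,
        jac=model.gradient,                 # analytical Jacobian of the piece-wise linear model
        method="L-BFGS-B",
        bounds=list(zip(lower, upper)),     # stay within the box spanned by the variable bounds
        options={"maxiter": max_iters},     # only a few sub-iterations are needed in practice
    )
    return result.x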

4.2 MVRSM details

In the proposed algorithm, we first initialise the model by adding basis functions consisting of integer and mixed φ-functions. The procedure for generating integer φ-functions is the same as in the advanced model of [6], which gives a total number of basis functions determined by the domains Y_i of the individual integer variables. We then generate the mixed φ-functions. Since our approach allows us to choose any number of mixed φ-functions without losing the guarantee of satisfying the integer constraints, computational resources are the only limiting factor here. We chose to have the same number of mixed φ-functions per continuous variable as the number of integer φ-functions per integer variable, so that the computational complexity remains similar to the one in [6].

The algorithm proceeds with an iterative procedure consisting of four steps as in [4, 6]: 1) evaluating the objective, 2) updating the model, 3) finding the minimum of the model, and 4) performing an exploration step. Evaluating the objective at iteration n gives a data sample (x_n, y_n, g(x_n, y_n)), and the measured objective values are normalised before being used in the model. The update of the surrogate model is performed with the recursive least squares algorithm [17], which is possible because the model is linear in its parameters c_k. We also add a small regularisation factor here, mainly for numerical stability. Furthermore, the weights c_k from (2) are initialised separately for the basis functions corresponding to integer φ-functions and for those corresponding to mixed φ-functions. The minimum of the model is found with the L-BFGS method [19], which is improved by giving an analytical representation of the Jacobian; for this purpose, we fix the derivative of the rectified linear units in the point where they are non-differentiable. We run the L-BFGS method for only a small number of sub-iterations, as the goal is not to find the exact minimum of the surrogate model, but rather to find a promising area of the search space. Lastly, we perform an exploration step on the point found by the L-BFGS algorithm, where the point is perturbed so that local optima can be avoided. For the integer variables, we use an exploration step similar to the one in [6, Sec. 3.4], except that we allow larger perturbations; see Appendix B. For the continuous variables, we use the procedure from [4], adding a zero-mean, normally distributed random variable to each continuous variable. The exploration step is done in such a way that the solution stays within the bounds of X and Y. The whole algorithm is shown in Algorithm 1.
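As an illustration of step 2, the following sketch (our reconstruction under standard assumptions, not the authors' implementation) shows a regularised recursive least squares update for the weights c of a model that is linear in its parameters, M(z) = c^T φ(z):

import numpy as np

class RecursiveLeastSquares:
    def __init__(self, num_basis, reg=1e-8):
        self.c = np.zeros(num_basis)          # model weights c_k
        self.P = np.eye(num_basis) / reg      # inverse covariance, initialised with regularisation

    def update(self, phi, target):
        # phi: basis function values at the new sample, target: normalised measurement.
        Pphi = self.P @ phi
        gain = Pphi / (1.0 + phi @ Pphi)      # Kalman-style gain vector
        self.c += gain * (target - phi @ self.c)
        self.P -= np.outer(gain, Pphi)

Each update costs O(D^2) for D basis functions, independent of the number of samples seen so far, which is one reason the algorithm does not slow down over time.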

Objective g, domains X and Y, budget B
Initial point (x_1, y_1)
Initialise surrogate model M with integer and mixed φ-functions
Initialise the weights c_k of the integer and mixed φ-functions, and initialise the other recursive least squares parameters
for n = 1, …, B do
     Evaluate g(x_n, y_n)
     Normalise the measured value
     Update the parameters of M with the new data point using recursive least squares
     Solve min M(x, y) over the domains X, Y with relaxed integer constraints using L-BFGS
     Explore around the found solution by adding a random perturbation, giving (x_{n+1}, y_{n+1})
Algorithm 1 MVRSM algorithm
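Putting the pieces together, a hedged end-to-end sketch of this loop (our reconstruction, reusing the hypothetical ReluSurrogate, RecursiveLeastSquares and minimise_surrogate helpers from the earlier sketches; the exploration magnitudes are placeholders, not the paper's values):

import numpy as np

def mvrsm_loop(objective, model, rls, z0, lower, upper, num_int, budget, rng):
    # Convention in this sketch: the first entries of z are continuous, the last num_int are integer.
    z, best = z0.copy(), (np.inf, z0.copy())
    num_cont = len(z0) - num_int
    for _ in range(budget):
        value = objective(z)                               # 1) evaluate the expensive objective
        if value < best[0]:
            best = (value, z.copy())
        rls.update(model.basis(z), value)                  # 2) recursive least squares update
        model.c = rls.c
        z = minimise_surrogate(model, z, lower, upper)     # 3) minimise surrogate, integers relaxed
        z[:num_cont] += 0.1 * rng.standard_normal(num_cont)                      # 4) explore continuous part
        z[num_cont:] = np.round(z[num_cont:]) + rng.integers(-1, 2, num_int)     #    and integer part
        z = np.clip(z, lower, upper)                       # stay within the bounds
    return best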

5 Experiments

To see if the proposed algorithm overcomes the drawbacks of existing surrogate modelling algorithms for problems with mixed variables in practice, we compare MVRSM with different state-of-the-art methods and random search on several benchmark functions used in related work. For comparison, we consider state-of-the-art surrogate modelling algorithms that are able to deal with a mixed-variable setting, have code available, and are concerned with single-objective problems.

We compare our method with HyperOpt [3] as a popular and established surrogate modelling algorithm that can deal with mixed variables, and we compare with CoCaBO [16] as a more recent method that can deal with a mix of continuous and categorical variables. As is good practice in surrogate modelling, we include random search in the comparisons to confirm whether more sophisticated methods are even necessary.

Though we consider MiVaBO [7] also to be part of the state of the art, at the time of writing the authors have not made their code available yet. We still include their benchmarks in the comparison, and include MiVaBO in the discussion of the results.

5.1 Implementation details

To enable the use of categorical variables in MVRSM, we convert those variables to integers; we also did this for HyperOpt. To enable the use of integer or binary variables in CoCaBO, we convert those variables to categorical variables. For CoCaBO, we chose the mixture weight [16, Eq. (2)] that seemed to give the best results on the synthetic benchmarks in [16]. The random search uses HyperOpt's implementation. The code of HyperOpt (https://github.com/hyperopt/hyperopt), CoCaBO (https://github.com/rubinxin/CoCaBO_code), and MVRSM (https://github.com/lbliek/MVRSM) is available online. All methods are implemented in Python, and all experiments were done on a CPU. In line with [16], all methods start with a number of initial random guesses, which are not shown in the figures. All figures in this section depict the maximisation of the objective functions instead of minimisation, in line with the figures in [16], and include the standard deviation over multiple runs. Objective function values of minimisation problems have therefore been multiplied by −1, both for CoCaBO and for the visualisation of the other methods.
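A trivial sketch of these conversions (hypothetical helper functions, not taken from any of the compared packages):

def categorical_to_integer(value, categories):
    # e.g. categorical_to_integer("relu", ["tanh", "relu", "sigmoid"]) == 1, for MVRSM and HyperOpt
    return categories.index(value)

def integer_to_categorical(value, lower_bound):
    # every integer in the bounded domain becomes its own category label, for CoCaBO
    return "cat_" + str(int(value - lower_bound))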

5.2 Results on relevant benchmarks

We consider mixed-variable benchmark problems of various dimensions from the related literature, with the largest benchmark having 238 variables. The benchmarks were selected such that they were not too similar in the number of variables, and such that they were easily implemented or available online. When this was not the case, we took a standard black-box optimisation benchmark and adapted it to have similar dimensions as the benchmark from the literature. In the end, this led to one benchmark from [16] (func3C), one benchmark from [7] (MiVaBO synthetic function), two benchmarks of similar scale as the applications from [7] (Rosenbrock10 and Ackley53), and one benchmark of similar scale as the application from [3] (Rosenbrock238).

All methods are compared on these benchmarks using the same number of iterations for every method, and the best function value found at each iteration is reported, averaged over multiple runs (the standard deviations are shown with error bars). The computation time of the methods is also reported, as we claim that MVRSM is an efficient method for problems with mixed variables. The total computation time for all methods on all benchmarks is shown in Table 1. Since MVRSM also has the advantage of not becoming slower over time, we report not just the total computation time but also the computation time per iteration in the figures in this section.

The remainder of this section gives some more details on the benchmarks and reports and discusses the results of each benchmark separately.

Benchmark        Variables      RS    HO    MVRSM    CoCaBO
func3C           cat., cont.
Rosenbrock10     int., cont.
MiVaBO synth.    int., cont.
Ackley53         bin., cont.
Rosenbrock238    int., cont.                         -
Table 1: Problem dimensions and total computation time in seconds (± standard deviation) for all experiments. The last two rows use hours instead, except for random search.
(a) Results on the func3C benchmark (categorical and continuous variables), averaged over multiple runs.
(b) Results on the Rosenbrock10 benchmark (integer and continuous variables), averaged over multiple runs.
(c) Results on the randomly generated MiVaBO synthetic benchmark (integer and continuous variables), averaged over multiple runs and over the different benchmark instances.
(d) Results on the Ackley53 benchmark (binary and continuous variables), averaged over multiple runs.
(e) Results on the Rosenbrock238 benchmark (integer and continuous variables). CoCaBO was not evaluated for this benchmark due to the large computation time.
Figure 2: Maximisation of different benchmark functions with random search, HyperOpt, MVRSM and CoCaBO.

5.2.1 Func3C

This benchmark was taken from [16, Sec. 5.1]. It has categorical and continuous variables.

Figure 2(a) shows the results, averaged over multiple runs. We have managed to reproduce the results from [16, Fig. 6(b)] for both HyperOpt (also called TPE) and CoCaBO. As this benchmark has categorical variables and was one of CoCaBO's own benchmarks, we expect CoCaBO to perform best, which it does, though it uses more computation time than the other methods. MVRSM performs a bit worse than HyperOpt, but better than the results reported for SMAC in [16, Fig. 6(b)].

5.2.2 Rosenbrock10

The Rosenbrock function (details available at https://www.sfu.ca/~ssurjano/optimization.html) is a standard benchmark in continuous optimisation that can be scaled to any dimension. For any dimension, the function has its global minimum (maximum in the figures) in the point (1, …, 1), where it achieves the value 0. This benchmark has a dimension of 10, but part of the variables were adapted to integers, and the remaining continuous variables were limited to a bounded interval. The function was scaled with a constant factor, and uniform noise was added to every function evaluation. This problem is of the same scale as the problem of gradient boosting hyperparameter tuning [7, Sec. 4(a)].
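For illustration, a sketch of such an adapted benchmark (the variable split, scaling factor and noise range below are placeholders, not the values used in the paper):

import numpy as np

def rosenbrock_mixed(z, num_int=5, scale=1e-3, noise=1e-2, rng=np.random.default_rng()):
    # Noisy, scaled Rosenbrock function with the first num_int variables treated as integers.
    z = np.asarray(z, dtype=float)
    z[:num_int] = np.round(z[:num_int])
    value = np.sum(100.0 * (z[1:] - z[:-1] ** 2) ** 2 + (1.0 - z[:-1]) ** 2)
    return scale * value + rng.uniform(-noise, noise)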

Figure 2(b) shows the results, averaged over multiple runs. Though CoCaBO performs well, especially for a problem with integer rather than categorical variables, it takes up more computational resources. MVRSM performs best on this function.

5.2.3 MiVaBO synthetic function

We also compare with one of the randomly generated synthetic test functions from [7, Appendix A.1] (the Gaussian weights variant). This problem has a mix of integer and continuous variables. No bounds were reported, so we set them ourselves for the integer and continuous variables. We generated several instances of this random function and ran all algorithms multiple times on each of them.

Figure 2(c) shows the average over all runs. MVRSM performs better than HyperOpt, but due to the large variance the improvement is not significant, especially considering HyperOpt's lower computation time. CoCaBO, which was designed for categorical variables, seems to have problems dealing with such a large number of integers.

5.2.4 Ackley53

The Ackley function (details available at https://www.sfu.ca/~ssurjano/optimization.html) is another standard benchmark that can be scaled to any dimension. The global optimum is located in the point (0, …, 0), where it achieves the value 0. We chose a dimension of 53, with part of the variables adapted to binary variables in {0, 1} and the continuous variables limited to a bounded interval. Uniform noise was added to each function evaluation. This problem is of the same scale as the problem of variational auto-encoder hyperparameter tuning after binarising the discrete hyperparameters [7].

See Figure 2(d) for the average over three runs. Not only does MVRSM achieve significantly better results than HyperOpt and CoCaBO for this problem, it is also faster than both. HyperOpt suffers from the limited interaction between variables in its surrogate model, and CoCaBO seems unable to efficiently explore such a large search space.

5.2.5 Rosenbrock238

As a final experiment, we look at a large-scale Rosenbrock function with the first part of the variables adapted to integers and the remaining continuous variables limited to a bounded interval. The function was scaled with a constant factor and we added uniform noise to every evaluation. Due to the problem size we only performed one run. This problem is of the same scale as the problem of feed-forward classification model hyperparameter tuning [3], except for the chosen ratio between continuous and integer variables. We did not compare with CoCaBO for this run due to the large computation time.

We can see from the results in Figure 2(e) that MVRSM outperforms its competitors on this benchmark. This is surprising considering the scale of the problem is similar to that of one of HyperOpt's own benchmarks, but the authors of HyperOpt themselves noted that their algorithm “…is conspicuously deficient in optimizing each hyperparameter independently of the others. It is almost certainly the case that the optimal values of some hyperparameters depend on settings of others. Algorithms such as SMAC (Hutter et al., 2011) that can represent such interactions might be significantly more effective optimizers…” [3]. MVRSM uses a surrogate model that can model the interaction between all variables. For the other competitors such as CoCaBO and MiVaBO, their evaluated benchmark problems did not come close to this number of variables, probably due to the required computation time.

5.3 Discussion

We see that MVRSM outperforms the state of the art on mixed-variable problems with a large number of variables. We attribute this to the efficient surrogate model, which models interactions between all variables and which does not require expensive optimisation procedures, due to the guarantee that the integer constraints are satisfied in local optima. For a small-scale problem with continuous and categorical variables, namely func3C, other methods seem to work better, but MVRSM still outperforms random search. This indicates that it can be used on problems that it was not designed for.

The figures in this section also showcase a significant drawback of most existing surrogate modelling algorithms, namely that they become slower over time. Both HyperOpt and CoCaBO suffer from this, although HyperOpt is still a relatively fast method. MVRSM and random search have a fixed computation time per iteration.

Furthermore, CoCaBO tunes its own hyperparameters at regular intervals, which costs even more computational resources, as can be seen in the figures. In contrast, MVRSM has quite a low number of hyperparameters, and we choose them in the same way in all reported experiments.

Though we could not compare with MiVaBO, the MiVaBO benchmarks were included in this section. Both MiVaBO and MVRSM outperform random search and HyperOpt on these benchmarks, but MVRSM does so in an efficient manner using only continuous optimisation in the surrogate model, where MiVaBO has to resort to more expensive optimisation procedures.

No comparison was made with SMAC [9], but this method seems to be slightly outperformed by HyperOpt for problems with mixed variables [7]. We also did not compare with the multi-objective methods from the related work section, as we did not find a way to make a fair comparison for single-objective problems, even though they were specifically developed for the mixed-variable setting. We expect MVRSM to outperform MiVaBO and multi-objective methods on single-objective domains, but further research is required to confirm and study this.

6 Conclusion and Future Work

We showed how Mixed-Variable ReLU-based Surrogate Modelling (MVRSM) solves three problems present in methods that can deal with mixed variables in expensive black-box optimisation. First, it solves the problem of slowing down over time due to a growing surrogate model. Second, it solves the problem of sub-optimality and inefficiency that may arise due to the need to satisfy integer constraints. Third, it solves the problem of model inaccuracies due to limited interaction between the mixed variables. MVRSM's surrogate model, based on a linear combination of rectified linear units, avoids all of these problems by having a fixed number of basis functions that contain interactions between all variables, while also having the guarantee that any local optimum is located in a point where the integer constraints are satisfied. This makes MVRSM both more accurate and more efficient than the state of the art. MVRSM performs particularly well on large-scale benchmarks with mixed variables, with results shown for a problem with over 200 variables.

For future work we will investigate the exploration part of the surrogate model, for example by applying techniques with more theoretical guarantees such as Thompson sampling, and we will also apply the method to real-world applications from engineering and computer science.

Acknowledgements

The authors thank Erik Daxberger for providing the code for generating one of MiVaBO’s synthetic test functions (called MiVaBO synthetic function in this paper).

Appendix A Details for generating mixed basis functions

In this section we show how to choose P_W and P_b in such a way that the mixed φ-functions are never completely outside the domain. We recommend choosing P_W to be a uniform distribution over a bounded set of vectors, so that the term w^T [x; y] will not take on large values, which might cause numerical problems.

After sampling w from P_W, we look for two corner points v_1 and v_2 of the domain spanned by the variable bounds. For every dimension j, the j-th element of these corner points is determined by

v_{1,j} = l_j if w_j ≥ 0, and v_{1,j} = u_j otherwise, (5)
v_{2,j} = u_j if w_j ≥ 0, and v_{2,j} = l_j otherwise. (6)

Here, l_j and u_j are the lower and upper bounds of the j-th variable respectively, so this gives

w^T v_1 ≤ w^T z ≤ w^T v_2 for all z in the domain. (7)

Now we calculate the (signed) distance from the hyperplane generated by w to these corner points, which can be done with the inner product:

β_1 = −w^T v_1, β_2 = −w^T v_2. (8)

By the way v_1 and v_2 are constructed and because w ≠ 0, we now have β_2 ≤ β_1. We choose P_b equal to the uniform distribution over [β_2, β_1].
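A short sketch of this procedure (our reconstruction, not the released code), where lower and upper contain the bounds l_j and u_j of all variables:

import numpy as np

def sample_mixed_offset(w, lower, upper, rng=np.random.default_rng()):
    # Corner points as in (5)-(6): v1 minimises and v2 maximises w^T z over the box.
    v1 = np.where(w >= 0, lower, upper)
    v2 = np.where(w >= 0, upper, lower)
    beta1, beta2 = -w @ v1, -w @ v2           # as in (8); beta2 <= beta1
    return rng.uniform(beta2, beta1)          # guarantees w^T v1 + b <= 0 <= w^T v2 + b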

Next we prove that this choice of P_b prevents the hyperplane w^T z + b = 0 from being completely outside the domain.

Theorem 3.

Let w be sampled from any continuous probability distribution and let b be sampled from the uniform distribution over [β_2, β_1], with β_1, β_2 as in (8). Then there exists a point z in the domain such that w^T z + b = 0.

Proof.

Suppose that w^T z + b ≠ 0 for all z in the domain. Then, since the values w^T v_1 + b and w^T v_2 + b are attained on the domain and every value in between is attained as well, from (7) at least one of the following inequalities holds:

w^T v_1 + b > 0, (9)
w^T v_2 + b < 0. (10)

Because b ≤ β_1 = −w^T v_1, we have w^T v_1 + b ≤ 0. Because b is sampled from [β_2, β_1], from (8) we also have b ≥ β_2 = −w^T v_2, so w^T v_2 + b ≥ 0. This gives w^T v_1 + b ≤ 0 ≤ w^T v_2 + b, which is in conflict with (9)-(10). By contradiction, there has to exist a z in the domain with w^T z + b = 0. ∎

Appendix B Details on the exploration step for integer variables

The exploration step for the integer variables consists of determining a random perturbation δ that is added to the integer part of the solution. We determine δ according to Algorithm 2.

Domain Y, current solution y
for each integer variable index i do
     Randomly decide whether to increase or decrease δ_i, the i-th element of δ
     Repeatedly enlarge the perturbation δ_i at random, as long as y_i + δ_i remains within the bounds of Y
Algorithm 2 Determining δ

References

  • [1] R. Baptista and M. Poloczek (2018) Bayesian optimization of combinatorial structures. In ICML, pp. 471–480.
  • [2] T. Bartz-Beielstein and M. Zaefferer (2017) Model-based methods for continuous and discrete global optimization. Applied Soft Computing 55, pp. 154–167.
  • [3] J. Bergstra, D. Yamins, and D. Cox (2013) Making a science of model search: hyperparameter optimization in hundreds of dimensions for vision architectures. In ICML - Volume 28, pp. I–115.
  • [4] L. Bliek, H. R. G. W. Verstraete, M. Verhaegen, and S. Wahls (2018) Online optimization with costly and noisy measurements using random Fourier expansions. IEEE Transactions on Neural Networks and Learning Systems 29 (1), pp. 167–182.
  • [5] L. Bliek, M. Verhaegen, and S. Wahls (2017) Online function minimization with convex random ReLU expansions. In MLSP, pp. 1–6.
  • [6] L. Bliek, S. Verwer, and M. de Weerdt (2019) Black-box combinatorial optimization using models with integer-valued minima. arXiv preprint arXiv:1911.08817.
  • [7] E. Daxberger, A. Makarova, M. Turchetta, and A. Krause (2019) Mixed-variable Bayesian optimization. arXiv preprint arXiv:1907.01329.
  • [8] E. C. Garrido-Merchán and D. Hernández-Lobato (2020) Dealing with categorical and integer-valued variables in Bayesian optimization with Gaussian processes. Neurocomputing 380, pp. 20–35.
  • [9] F. Hutter, H. H. Hoos, and K. Leyton-Brown (2011) Sequential model-based optimization for general algorithm configuration. In International Conference on Learning and Intelligent Optimization, pp. 507–523.
  • [10] F. Hutter, L. Kotthoff, and J. Vanschoren (2019) Automated machine learning. Springer.
  • [11] A. Iyer, Y. Zhang, A. Prasad, S. Tao, Y. Wang, L. Schadler, L. C. Brinson, and W. Chen (2019) Data-centric mixed-variable Bayesian optimization for materials design. In ASME.
  • [12] D. R. Jones, M. Schonlau, and W. J. Welch (1998) Efficient global optimization of expensive black-box functions. Journal of Global Optimization 13 (4), pp. 455–492.
  • [13] J. Močkus (1975) On Bayesian methods for seeking the extremum. In Optimization Techniques IFIP Technical Conference, pp. 400–404.
  • [14] J. Močkus (2012) Bayesian approach to global optimization: theory and applications. Vol. 37, Springer Science & Business Media.
  • [15] A. Rahimi and B. Recht (2008) Uniform approximation of functions with random bases. In 46th Annual Allerton Conference on Communication, Control, and Computing, pp. 555–561.
  • [16] B. Ru, A. S. Alvi, V. Nguyen, M. A. Osborne, and S. J. Roberts (2019) Bayesian optimisation over multiple continuous and categorical inputs. arXiv preprint arXiv:1906.08878.
  • [17] A. H. Sayed and T. Kailath (1998) Recursive least-squares adaptive filters. The Digital Signal Processing Handbook 21 (1).
  • [18] T. Ueno, T. D. Rhone, Z. Hou, T. Mizoguchi, and K. Tsuda (2016) COMBO: an efficient Bayesian optimization library for materials science. Materials Discovery 4, pp. 18–21.
  • [19] S. Wright and J. Nocedal (1999) Numerical optimization. Springer Science 35, pp. 67–68.
  • [20] K. Yang, K. van der Blom, T. Bäck, and M. Emmerich (2019) Towards single- and multi-objective Bayesian global optimization for mixed integer problems. In Proceedings of the 14th International Global Optimization Workshop, Vol. 2070, pp. 020044.