Multi-Objective Bayesian Optimization over High-Dimensional Search Spaces

09/22/2021 ∙ by Samuel (Sam) Daulton, et al. ∙ Facebook

The ability to optimize multiple competing objective functions with high sample efficiency is imperative in many applied problems across science and industry. Multi-objective Bayesian optimization (BO) achieves strong empirical performance on such problems, but even with recent methodological advances, it has been restricted to simple, low-dimensional domains. Most existing BO methods exhibit poor performance on search spaces with more than a few dozen parameters. In this work we propose MORBO, a method for multi-objective Bayesian optimization over high-dimensional search spaces. MORBO performs local Bayesian optimization within multiple trust regions simultaneously, allowing it to explore and identify diverse solutions even when the objective functions are difficult to model globally. We show that MORBO significantly advances the state-of-the-art in sample-efficiency for several high-dimensional synthetic and real-world multi-objective problems, including a vehicle design problem with 222 parameters, demonstrating that MORBO is a practical approach for challenging and important problems that were previously out of reach for BO methods.

1 Introduction

The challenge of solving complex optimization problems in a sample-efficient fashion while trading off multiple competing objectives is pervasive in many fields, including machine learning (Sener and Koltun, 2018), science (Gopakumar et al., 2018), and engineering (Marler and Arora, 2004; Mathern et al., 2021). For instance, Mazda recently proposed a vehicle design problem in which the goal is to optimize the widths of structural parts in order to minimize the total weight of three different vehicles while simultaneously maximizing the number of common gauge parts (Kohira et al., 2018). Additionally, this problem has black-box constraints that enforce important performance requirements such as collision safety. Evaluating a design requires either crash-testing a physical prototype or running computationally demanding simulations. In fact, the original problem underlying this benchmark was solved on what at the time was the world’s fastest supercomputer and consumed years of CPU time (Oyama et al., 2017). Another example is designing optical components for AR/VR applications, which requires optimizing complex geometries described by a large number of parameters in order to achieve acceptable trade-offs between image quality and efficiency of the optical device. Evaluating a design involves either fabricating and measuring prototypes or running computationally intensive simulations. For such problems, sample-efficient optimization is crucial.

Bayesian optimization (BO) has emerged as an effective, general, and sample-efficient approach for “black-box” optimization (Jones et al., 1998). However, in its basic form, BO is subject to important limitations. In particular, (i) successful applications typically optimize functions over low-dimensional search spaces, usually with fewer than 20 tunable parameters (Frazier, 2018), (ii) its high computational requirements prevent direct application in the large-sample regime that is often necessary for high-dimensional problems, and (iii) most methods focus on problems with a single objective and no black-box constraints.

While recent contributions have tackled these shortcomings, they have done so largely independently. Namely, while multi-objective extensions of BO (including, e.g., Knowles (2006); Paria et al. (2020); Ponweiser et al. (2008); Hernández-Lobato et al. (2015); Yang et al. (2019); Belakaria et al. (2019); Daulton et al. (2020)) take principled approaches to exploring the set of efficient trade-offs between outcomes, they generally do not scale well to problems with high-dimensional search spaces. On the other hand, many methods for high-dimensional BO (e.g., Wang et al. (2016); Munteanu et al. (2019); Letham et al. (2020); Kandasamy et al. (2015)) require strong assumptions on the (usually unknown) structure of the objective function. Without making such assumptions, trust region BO methods such as TuRBO (Eriksson et al., 2019) have been shown to perform well in the high-dimensional setting when a large number of function evaluations are possible. However, these high-dimensional BO methods all focus on single-objective optimization. Consequently, many important problems that involve both a high-dimensional search space and multiple objectives are out of reach for existing methods (Gaudrie, 2019). In this paper, we close this gap by making BO applicable to challenging high-dimensional multi-objective problems.

In particular, our main contributions are as follows:


  • We propose MORBO (“Multi-Objective trust Region Bayesian Optimization”), the first scalable algorithm for multi-objective BO that is practical for high-dimensional problems that require thousands of function evaluations.

  • MORBO uses local surrogate models to efficiently scale to high-dimensional input spaces and large evaluation budgets. This unlocks the use of multi-objective BO on problems with hundreds of tunable parameters, a setting where practitioners have previously had to fall back on alternative, much less sample-efficient methods such as NSGA-II (Deb et al., 2002).

  • MORBO performs local optimization in multiple trust regions to target diverse trade-offs in parallel using a collaborative, coordinated hypervolume-based acquisition criterion, which leads to well-distributed, high-quality Pareto frontiers.

  • We provide a comprehensive empirical evaluation, demonstrating that MORBO significantly outperforms other state-of-the-art methods on several challenging high-dimensional multi-objective optimization problems. The achieved improvements in sample efficiency facilitate order-of-magnitude savings in terms of time and/or resources.

The remainder of our work is structured as follows: In Section 2, we provide background on multi-objective and Bayesian optimization. In Section 3, we discuss related work. We detail our approach in Section 4 and present a comprehensive evaluation in Section 5. We conclude in Section 6.

2 Background

2.1 Multi-Objective Optimization

The goal in multi-objective optimization (MOO) is to maximize (without loss of generality) a vector-valued objective function $f(x) = [f^{(1)}(x), \dots, f^{(M)}(x)]$, where we assume $x \in \mathcal{X}$ with bounded $\mathcal{X} \subset \mathbb{R}^d$ and $f^{(m)}\colon \mathcal{X} \to \mathbb{R}$ for $m = 1, \dots, M$. Usually, there is no single solution that simultaneously maximizes all objectives. Objective vectors are compared using Pareto domination: an objective vector $f(x)$ Pareto-dominates another vector $f(x')$ if $f^{(m)}(x) \geq f^{(m)}(x')$ for all $m = 1, \dots, M$ and there exists at least one $m$ such that $f^{(m)}(x) > f^{(m)}(x')$. The Pareto frontier (PF) of optimal trade-offs is the possibly infinite set of non-dominated objective vectors. The goal of a Pareto optimization algorithm is to identify a finite, approximate PF within a pre-specified budget of function evaluations.
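A minimal numpy sketch of these definitions under the maximization convention above (the function names are ours, for illustration only):

    import numpy as np

    def dominates(y1, y2):
        # True if objective vector y1 Pareto-dominates y2 (maximization).
        return bool(np.all(y1 >= y2) and np.any(y1 > y2))

    def pareto_front(Y):
        # Return the non-dominated rows of an (n, M) array of objective vectors.
        keep = np.ones(len(Y), dtype=bool)
        for i in range(len(Y)):
            for j in range(len(Y)):
                if keep[i] and keep[j] and i != j and dominates(Y[j], Y[i]):
                    keep[i] = False
                    break
        return Y[keep]

These helpers are reused in the sketches below.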

Hypervolume (HV) is a common metric used to assess the quality of a PF. The HV indicator quantifies the hypervolume (e.g., for $M = 2$, the area) of the set of points dominated by the PF, intersected with a region of interest in objective space bounded from below by a reference point (for a formal definition see, e.g., Auger et al. (2009)). The reference point is typically provided by the practitioner based on domain knowledge (Yang et al., 2019). Evolutionary algorithms (EAs) such as NSGA-II (Deb et al., 2002) are popular for solving MOO problems (Zitzler et al., 2000). While EAs often require low overhead, they generally suffer from high sample complexity, rendering them inefficient for expensive-to-evaluate black-box functions.
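For $M = 2$, the HV indicator reduces to a sweep over the PF sorted by the first objective. A sketch, assuming every PF point dominates the reference point:

    def hypervolume_2d(pf, ref):
        # Area dominated by a non-dominated set of 2-objective points
        # (maximization), bounded from below by the reference point `ref`.
        pf = pf[np.argsort(-pf[:, 0])]  # sort by first objective, descending
        hv, prev_y2 = 0.0, ref[1]
        for y1, y2 in pf:
            hv += (y1 - ref[0]) * (y2 - prev_y2)  # add one rectangular slab
            prev_y2 = y2
        return hv

For example, for the PF {(3, 1), (1, 3)} and reference point (0, 0), this returns 3 + 2 = 5, the area of the union of the two dominated rectangles.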

2.2 Bayesian Optimization

Bayesian optimization (BO) is a sample-efficient framework for sequentially optimizing black-box functions that consists of two key components: 1) a Bayesian surrogate model of the objective functions that provides a probability distribution over outcomes, and 2) an acquisition function that leverages the surrogate model to assign a utility value to a set (often a singleton) of new candidate points to be evaluated. A Gaussian process (GP) is typically used as the surrogate model because it is a flexible non-parametric model capable of representing a wide variety of functions and is known for its well-calibrated and analytic posterior uncertainty estimates (Rasmussen, 2004). The acquisition function is defined on the model’s predictive distribution and is responsible for balancing exploration and exploitation. In the multi-objective setting, popular acquisition functions include expected hypervolume improvement (Emmerich et al., 2006), expected improvement (Jones et al., 1998) of random scalarizations of the objectives (Knowles, 2006), and information-theoretic (entropy-based) methods (Hernandez-Lobato et al., 2016; Belakaria et al., 2019; Suzuki et al., 2020). Recently, Thompson sampling (TS) has also been shown to have strong empirical performance with randomly scalarized or hypervolume objectives (Paria et al., 2020; Bradford et al., 2018).

3 Related Work

3.1 High-Dimensional Bayesian Optimization

The GP models typically used as global surrogates in BO rely on distance-based covariance functions that often perform poorly in high-dimensional input spaces (Oh et al., 2018). This is because the size of the input space increases exponentially with the dimensionality, leading to larger distances and smaller covariances between points. Because of these modeling challenges, BO tends to select points on the boundary of the domain, where the model is typically most uncertain, which often leads to over-exploration and poor optimization performance.

A popular approach to resolving this issue is to map the high-dimensional inputs to a low-dimensional space via a random embedding (Wang et al., 2016; Munteanu et al., 2019; Letham et al., 2020). Other methods rely on additive structure and assume that the objective function can be expressed as a sum of low-dimensional components (Kandasamy et al., 2015; Wang et al., 2018; Gardner et al., 2017; Mutny and Krause, 2018). However, both low-dimensional linear structure and additive structure are strong assumptions, and these methods often perform poorly if the assumed structure does not exist (Eriksson and Jankowiak, 2021). This is especially problematic when optimizing multiple objectives, since all objectives would need to share the same assumed structure, which is unlikely in practice. Oh et al. (2018) propose the use of a cylindrical kernel that prefers selecting points from the interior of the domain. However, their BOCK method may perform poorly when the optimum is actually located on the boundary. LineBO (Kirschner et al., 2019) optimizes the acquisition function along one-dimensional lines, which helps avoid over-exploration.

Eriksson et al. (2019) propose TuRBO, a popular and robust approach for high-dimensional BO that avoids over-exploration by performing optimization in local trust regions, focusing the optimization on promising regions of the input space. TuRBO is designed for the large-budget evaluation setting with large batch sizes and makes no strong assumptions about the objective function. Large batch sizes are commonly used to minimize end-to-end optimization time and ensure high throughput. SCBO (Eriksson and Poloczek, 2021) extends TuRBO to handle black-box constraints, but this approach is still limited to single-objective optimization. Even though optimization is restricted to a local trust region, both TuRBO and SCBO fit GP models to the entire history of data collected by a single trust region, which may lead to slow GP inference. Furthermore, data is not shared across trust regions when using multiple trust regions, which can result in inefficient use of the evaluation budget.

3.2 Multi-Objective Bayesian Optimization

Although there have been many recent contributions to multi-objective BO (MOBO), relatively few methods consider the high-dimensional setting, for which high parallelism with large batch sizes is crucial to achieve high throughput. DGEMO (Konakovic Lukovic et al., 2020) uses a hypervolume-based objective with additional heuristics to encourage diversity in the input space while exploring the PF, which works well empirically. Parallel expected hypervolume improvement (EHVI) (Daulton et al., 2020) has been shown to have strong empirical performance, but the computational complexity of EHVI scales exponentially with the batch size. NEHVI (Daulton et al., 2021) improves scalability with respect to the batch size, but just as for DGEMO and EHVI, it works best in low-dimensional search spaces. TSEMO (Bradford et al., 2018) optimizes approximate GP function samples using NSGA-II and uses a hypervolume-based objective for selecting a batch of points from the NSGA-II population. ParEGO (Knowles, 2006) and TS-TCH (Paria et al., 2020) use random Chebyshev scalarizations with parallel expected improvement (Jones et al., 1998) and Thompson sampling, respectively. ParEGO has been extended to the batch setting in various ways, including as: (i) MOEA/D-EGO (Zhang et al., 2010), an algorithm that optimizes multiple scalarizations in parallel using MOEA/D (Zhou et al., 2012), and (ii) ParEGO (Daulton et al., 2020), which uses composite Monte Carlo objectives with sequential greedy batch selection and different scalarizations for each candidate. Information-theoretic (entropy-based) methods such as PESMO (Hernández-Lobato et al., 2015), MESMO (Belakaria et al., 2019), PFES (Suzuki et al., 2020), and PPESMO (Garrido-Merchán and Hernández-Lobato, 2020) have also garnered recent interest but, with the exception of PPESMO, are purely sequential.

All of the above methods typically rely on global GP models, which is problematic in two ways: (i) as the dimensionality of the search space grows, the boundary issue discussed in Section 3.1 becomes more prominent; (ii) a single global GP model scales poorly with the number of function evaluations. As a result, most of these methods have only been evaluated on low-dimensional problems; the largest search spaces considered by Konakovic Lukovic et al. (2020), Bradford et al. (2018), and Paria et al. (2020) span at most a few dozen parameters. Non-Bayesian methods such as MOPLS (Akhtar and Shoemaker, 2019), for which no code was publicly available at the time of writing, use deterministic surrogate models, e.g., radial basis function (RBF) interpolants, which handle noisy observations poorly (Fasshauer and Zhang, 2006).

4 High-Dimensional Bayesian Optimization with Multiple Objectives

We now present MORBO, a novel method for high-dimensional multi-objective Bayesian optimization using trust regions. Since methods such as ParEGO achieve strong performance in the low-dimensional setting using a random Chebyshev scalarization of the objectives, it is reasonable to expect the same of a straightforward extension of TuRBO using this approach. However, we find that this strategy performs quite poorly; Figure 1 illustrates this on a high-dimensional DTLZ2 function. One of the reasons for this is that the trust regions in TuRBO are independent, which leads to an inefficient use of the sample budget if trust regions overlap. Furthermore, using a random Chebyshev scalarization does not directly target specific parts of the PF and may leave large parts of the PF unexplored, as Figure 1 illustrates. To overcome these issues, MORBO introduces data-sharing between the trust regions and employs an acquisition function based on hypervolume improvement, which allows the different trust regions to contribute to different parts of the PF and collaboratively maximize the hypervolume dominated by the PF. This is particularly appealing in settings where the PF may be disjoint or may require exploring very different parts of the search space. While it may be possible to improve the performance of TuRBO using more sophisticated scalarization approaches, the ablation study in Figure 6 shows that the use of an acquisition function based on hypervolume improvement is a key component of MORBO. In the remainder of this section, we describe the key components of MORBO.

4.1 Trust Regions

To mitigate the issue of over-exploration in high-dimensional spaces, we restrict optimization to local trust regions, as proposed by Eriksson et al. (2019). A trust region is defined by a center point and an edge length $L$, which specifies a hyperrectangular region within the domain. Each trust region maintains success and failure counters that record the number of consecutive samples generated from the trust region that improved or failed to improve (respectively) the hypervolume dominated by the current global PF. If the success counter exceeds a predetermined threshold $\tau_{\text{succ}}$, the trust region length is increased to $\min(2L, L_{\max})$ and the success counter is set to zero. Similarly, after $\tau_{\text{fail}}$ consecutive failures, the trust region length is set to $L/2$ and the failure counter is set to zero. Finally, if the length drops below a minimum edge length $L_{\min}$, the trust region is terminated and a new trust region is initialized.
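A sketch of this maintenance logic; the concrete threshold and length values below are our assumptions, not the settings used in the paper:

    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class TrustRegion:
        center: np.ndarray       # current center point
        length: float = 0.8      # edge length L (inputs scaled to [0, 1]^d)
        length_min: float = 0.01
        length_max: float = 1.6
        tau_succ: int = 3        # assumed success threshold
        tau_fail: int = 40       # assumed failure threshold
        n_succ: int = 0
        n_fail: int = 0

        def update(self, improved_hv: bool) -> bool:
            # Update counters and edge length after one batch; returns False
            # when the trust region should be terminated and re-initialized.
            if improved_hv:
                self.n_succ, self.n_fail = self.n_succ + 1, 0
            else:
                self.n_fail, self.n_succ = self.n_fail + 1, 0
            if self.n_succ >= self.tau_succ:
                self.length, self.n_succ = min(2 * self.length, self.length_max), 0
            elif self.n_fail >= self.tau_fail:
                self.length, self.n_fail = self.length / 2, 0
            return self.length >= self.length_min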

Figure 1: Objective values achieved on a high-dimensional DTLZ2 function under the same evaluation budget, batch size, and number of trust regions for both methods. Plot points encode the search behavior, with circles indicating initial points and other shapes and colors corresponding to different trust regions. The red line indicates the final Pareto front achieved with respect to the reference point. We use the same initialization for both methods. (Left) A straightforward extension of TuRBO using a Chebyshev scalarization makes slow progress and does not explore the trade-offs between the objectives well. (Right) MORBO is more sample-efficient and is able to explore the entirety of the PF. In particular, the trust regions explore different parts of the PF and the resulting coverage is much better compared to the straightforward extension of TuRBO.
Figure 2: A visualization of our batch selection using HVI. The red points represent the current PF. Blue, orange, and green points show the function values for the previously selected points under the next posterior sample. To select the next point in the batch, the HVI of each candidate is evaluated jointly with the red, blue, orange, and green points. The color indicates the incremental hypervolume.

4.2 Local Modeling

Most BO methods use a single global GP with a stationary Matérn-5/2 or squared exponential (SE) kernel with automatic relevance determination (ARD), fitted to all observations collected so far. While the use of a global model is necessary for most BO methods, MORBO only requires each model to be accurate within the corresponding trust region. This motivates the use of local modeling, where we only include the observations contained within a local hypercube with edge length $2L$ centered at the trust region center. The motivation for using observations from this slightly larger hypercube is to improve the model close to the boundary of the trust region.

TuRBO and SCBO use local modeling in the sense that each trust region only uses observations collected by that trust region, but each trust region still relies on a global GP fitted to the entire history of that trust region. This may lead to poor performance, since stationary kernels assume that the lengthscales are constant across the input space, an assumption that is often violated in practice (Snoek et al., 2014). By excluding observations far from the trust region, MORBO also significantly reduces the computational cost, since exact GP fitting scales cubically with the number of data points; this allows our method to scale to problems involving a large number of total observations.
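A sketch of the data selection for one local model, assuming inputs normalized to the unit hypercube and the enlarged $2L$ modeling region described above:

    def local_data(X, Y, center, length):
        # Keep observations inside the hypercube of edge length 2 * length
        # (i.e., half-width `length`) around the trust region center.
        mask = np.all(np.abs(X - center) <= length, axis=1)
        return X[mask], Y[mask]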

4.3 Center Selection

In (constrained) single-objective optimization, previous work centers the trust region at the best (feasible) observed point. However, in the multi-objective setting, there typically is no single best solution. Assuming that observations are noise-free, MORBO selects the center to be the feasible point on the PF with maximum hypervolume contribution (HVC) (Beume et al., 2007; Loshchilov et al., 2011). If there is no feasible point, we choose the point with the smallest total constraint violation. Given a reference point, the HVC of a point on the PF is the reduction in hypervolume if that point were to be removed; that is, the hypervolume contribution of a point is its exclusive contribution to the PF. Centering a trust region at the point with maximal hypervolume contribution collected by that trust region promotes coverage across the PF, as points in crowded regions will have lower contribution than points adjacent to gaps in the PF. MORBO uses multiple trust regions to explore different regions of the PF in parallel in order to increase the overall coverage, as illustrated in Figure 1. In particular, we rank the points according to their hypervolume contributions and select trust region centers in a sequential greedy fashion, excluding points that have already been selected as the center for another trust region. Besides encouraging diversity in the exploration of the PF, using multiple trust regions results in multiple GP models, each fitted on a smaller amount of data, which greatly improves scalability.
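A sketch of this greedy center selection for two objectives, reusing the `pareto_front` and `hypervolume_2d` helpers from Section 2 (the function names are again illustrative):

    def hv_contributions(pf, ref):
        # Exclusive contribution of each PF point: the HV lost if it is removed.
        total = hypervolume_2d(pf, ref)
        return np.array([
            total - hypervolume_2d(np.delete(pf, i, axis=0), ref)
            for i in range(len(pf))
        ])

    def select_centers(X_pf, pf, ref, n_regions):
        # Rank PF points by HV contribution; assign distinct centers greedily.
        order = np.argsort(-hv_contributions(pf, ref))
        return X_pf[order[:n_regions]]

Here `X_pf` holds the input locations corresponding to the PF points `pf`; in MORBO, each trust region additionally restricts attention to the points collected by that trust region.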

4.4 Batch Selection

Maximizing hypervolume (improvement) has been shown to produce high-quality and diverse PFs (Emmerich et al., 2006). Given a reference point, the hypervolume improvement (HVI) of a set of points is the increase in HV when adding these points to the previously selected points. Expected HVI (EHVI) is a popular acquisition function that integrates HVI over the GP posterior. However, maximizing EHVI requires numerical optimization, which involves evaluating the GP posterior many times within the optimizer’s inner loop. This becomes prohibitively slow as the number of objectives (and constraints) and in-sample data points increases. To allow scalability to large batch sizes $q$, we instead use Thompson sampling (TS) to draw posterior samples from the GP and optimize HVI under each realization on a discrete set of candidates.

We select points for the next batch in a sequential greedy fashion and condition on the previously selected points in the batch by computing the HVI with respect to the current PF. In particular, to select the $i$-th point, we draw a sample from the joint posterior over the previously selected points and a discrete set of candidate points, which yields a realization of the objectives at all of these points. We select the $i$-th point as the candidate that maximizes the HVI jointly with the realizations of the previously selected points, as shown in Figure 2. Conditioning on the previously selected points and computing the HVI under a sample from the joint posterior over the previously selected points and the discrete set of candidates leads to more diverse batch selection compared to selecting each point independently. This batch selection strategy also allows us to straightforwardly implement fully asynchronous optimization, where evaluations are dispatched to different “workers” (e.g., individual machines in a simulation cluster) and new candidates are generated whenever there is capacity in the worker pool. In the asynchronous setting, success/failure counters and trust regions can be updated after every $q$ observations are received, and any intermediate observations can immediately be used to update the local models.
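A sketch of this sequential greedy selection for two objectives; `sample_joint` stands in for drawing one realization of the objectives at a set of points from the model's joint posterior, and for simplicity we assume all sampled points dominate the reference point:

    def select_batch(candidates, sample_joint, pareto_Y, ref, q):
        # Greedily pick q points; each step uses a fresh joint Thompson
        # sample over the pending points and the remaining candidates.
        selected = np.empty((0, candidates.shape[1]))
        for _ in range(q):
            Y = sample_joint(np.vstack([selected, candidates]))
            Y_pend, Y_cand = Y[:len(selected)], Y[len(selected):]
            base = pareto_front(np.vstack([pareto_Y, Y_pend]))
            base_hv = hypervolume_2d(base, ref)
            hvi = np.array([
                hypervolume_2d(pareto_front(np.vstack([base, y[None, :]])), ref)
                - base_hv
                for y in Y_cand
            ])
            best = int(np.argmax(hvi))
            selected = np.vstack([selected, candidates[best]])
            candidates = np.delete(candidates, best, axis=0)
        return selected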


5 Experiments

We evaluate MORBO on an extensive suite of benchmarks. In Section 5.1 we show that using multiple trust regions allows MORBO to discover a disjoint PF, and in Section 5.2 we consider two low-dimensional problems. In Sections 5.3–5.5, we consider three challenging real-world problems: a trajectory planning problem, an optical design problem for AR/VR applications, and a 222D automotive design problem with black-box constraints. We compare MORBO to NSGA-II, NEHVI, ParEGO, approximate Thompson sampling using random Fourier features (Rahimi and Recht, 2007) with Chebyshev scalarizations (TS-TCH), TSEMO, DGEMO, MOEA/D-EGO, and Sobol quasi-random search. MORBO is implemented in BoTorch (Balandat et al., 2020); the code will be available as open-source software upon publication. We run several replications of each method and initialize all methods with the same random initial points in each replication. We use the same hyperparameters for MORBO on all problems; see the supplementary materials for more details. All experiments were run on a Tesla V100 SXM2 GPU (16GB RAM).

5.1 Discovering disjoint Pareto frontiers

We first illustrate that using multiple trust regions allows MORBO to efficiently discover disconnected PFs that require exploring different parts of the domain. Figure 3 shows that the trust regions collaborate and are able to discover the disconnected PF on the constrained MW7 function.

5.2 Low-dimensional problems

We consider two low-dimensional problems to allow for a comparison with existing BO baselines. The first problem is a 5D vehicle safety design problem in which we tune the thicknesses of various components of an automobile frame to optimize proxy metrics for maximizing fuel efficiency, minimizing passenger trauma in a full-frontal collision, and maximizing vehicle durability. The second problem is a 4D welded beam design problem, where the goal is to minimize the cost of the beam and the deflection of the beam under the applied load (Deb and Sundar, 2006). The design variables describe the thickness/length of the welds and the height/width of the beam. In addition, there are black-box constraints that need to be satisfied for a design to be valid.

Figure 4 presents results for both problems. While MORBO is not designed for such simple, low-dimensional problems, it is still competitive with other baselines such as TS-TCH and ParEGO on the vehicle design problem, though it cannot quite match the performance of NEHVI and TSEMO. (DGEMO is not included on this problem as it consistently crashed due to an error deep in the low-level code for the graph-cutting algorithm.) The results on the welded beam problem illustrate the efficient constraint handling of MORBO. (DGEMO, TSEMO, MOEA/D-EGO, and TS-TCH are excluded as they do not consider black-box constraints.) On both problems, we observe that NSGA-II struggles to keep up, performing barely better (vehicle safety) or even worse (welded beam) than quasi-random Sobol exploration.

Figure 3: MORBO can discover the disjoint PF on the constrained MW7 function. Different colors correspond to points discovered by different trust regions.
Figure 4: (Left) NEHVI performs the best on the 5D vehicle design problem with three objectives. (Right) MORBO outperforms the other methods on the 4D welded beam problem with black-box constraints.

5.3 Trajectory planning

We consider a trajectory planning problem similar to the rover trajectory planning problem considered in Wang et al. (2018). As in the original problem, the goal is to find a trajectory that maximizes the reward when integrated over the domain. The trajectory is determined by fitting a B-spline to a set of design points in the 2D plane, so the dimension of the optimization problem is twice the number of design points. In this experiment, we constrain the trajectory to begin at the pre-specified starting location, but we do not require it to end at the desired target location. In addition to maximizing the reward of the trajectory, we also minimize the distance from the end of the trajectory to the intended target location. Intuitively, these two objectives are expected to be competing, because reaching the exact end location may require passing through areas of large cost, which may result in a lower reward. The results are presented in Figure 5, which shows that MORBO performs the best and that the other BO methods struggle to compete with NSGA-II.

Figure 5: (Left) MORBO outperforms other methods on the trajectory planning problem. (Middle) Results on the optical design problem, where NSGA-II performs better than the BO baselines but is not competitive with MORBO. (Right) MORBO shows compelling performance on the 222D Mazda vehicle design problem with black-box constraints.

5.4 Optical design problem

We consider the problem of designing an optical system for an augmented reality (AR) see-through display. (The code for the optical design problem is proprietary, but we aim to open-source a surrogate model for reproducibility by the time of publication.) This optimization task has a large number of parameters describing the geometry and surface morphology of multiple optical elements in the display stack. Several objectives are of interest in this problem, including display efficiency and display quality. Each evaluation of these metrics requires a computationally intensive physics simulation of the optical system that takes several hours to run. In order to obtain precise estimates of the optimization performance at reasonable computational cost, we conduct our evaluation on a neural network surrogate model of the optical system rather than on the actual physics simulator. In this benchmark, the task is to explore the Pareto frontier between display efficiency and display quality (both objectives are normalized w.r.t. the reference point). Since the full evaluation budget and dimensionality of this problem are out of reach for the other BO baselines, we run NEHVI, ParEGO, TS-TCH, TSEMO, MOEA/D-EGO, and DGEMO for reduced numbers of evaluations. Figure 5 shows that MORBO achieves substantial improvements in sample efficiency compared to NSGA-II. We observe that none of the other BO baselines are competitive with NSGA-II.

5.5 Mazda vehicle design problem

We consider the 3-car Mazda benchmark problem (Kohira et al., 2018). This challenging multi-objective optimization problem involves tuning 222 decision variables that represent the thickness of different structural parts. The goal is to minimize the total mass of three vehicles (Mazda CX-5, Mazda 6, and Mazda 3) while maximizing the number of parts shared across vehicles. Additionally, there are 54 black-box output constraints (evaluated jointly with the two objectives) that enforce that designs meet performance requirements such as collision safety standards. This problem is, to the best of our knowledge, the largest multi-objective optimization problem considered by any BO method, and it requires fitting separate GP models to each of the objectives and constraints. The original problem underlying the Mazda benchmark was solved on what at the time was the world’s fastest supercomputer and consumed years of CPU time (Oyama et al., 2017), requiring a substantial amount of energy.

We follow the suggestions by Kohira et al. (2018), using their recommended reference point and optimizing normalized versions of the two objectives, corresponding to the total mass and the number of common gauge parts, respectively. Additionally, an initial feasible design is provided; this initial solution is given to all algorithms. All methods are run with the same evaluation budget, batch size, and number of initial points.

Figure 5 demonstrates that MORBO clearly outperforms the other methods. Sobol did not find a single feasible point during any of the runs, illustrating the challenge of finding solutions that satisfy the constraints. While NSGA-II made progress from the initial feasible solution, it is not competitive with MORBO.

5.6 Ablation study

Finally, we study the sensitivity of MORBO with respect to the two most important hyperparameters: the number of trust regions $n_{\text{TR}}$ and the failure tolerance $\tau_{\text{fail}}$. Using multiple trust regions allows exploring different parts of the search space that potentially contribute to different parts of the Pareto frontier. The failure tolerance controls how quickly each trust region shrinks: a large $\tau_{\text{fail}}$ leads to slow shrinkage and potentially too much exploration, while a small $\tau_{\text{fail}}$ may cause each trust region to shrink too quickly and not collect enough data. By default, MORBO uses 5 trust regions and a failure tolerance similar to what is used by Eriksson et al. (2019).

In this experiment, we vary the number of trust regions and the failure tolerance on a DTLZ2 problem, the trajectory planning problem, and the optical design problem. We also investigate the effect of using HVI as the acquisition function instead of relying on random Chebyshev scalarizations. As shown in Figure 6, using more than one trust region is important for the performance of MORBO. This is due to the efficient data-sharing in MORBO, where the different trust regions can use any data that falls within the trust region, whereas in TuRBO, each trust region is an independent optimization process without any data-sharing. We note that even though a larger failure tolerance leads to slightly better performance on the optical design problem, the default settings of MORBO perform well across all problems.

Figure 6: We investigate the sensitivity of MORBO with respect to its two most important hyperparameters, the number of trust regions and the failure streak tolerance. The default settings use 5 trust regions; using multiple trust regions performs significantly better than using a single trust region on all problems.

6 Discussion

We proposed MORBO, an algorithm for multi-objective BO over high-dimensional search spaces. By using an improved trust region approach with local modeling and hypervolume-based exploration, MORBO scales gracefully to high-dimensional problems and high-throughput settings. In comprehensive empirical evaluations, we showed that MORBO not only improves performance, but also makes it possible to effectively tackle important real-world problems that were previously out of reach for existing BO methods. In doing so, MORBO achieves substantial improvements in sample efficiency compared to existing state-of-the-art methods such as evolutionary algorithms. Given that, due to the lack of scalable alternatives to date, methods such as NSGA-II have been the method of choice for many practitioners, we expect MORBO to unlock significant savings in terms of time and resources across the many disciplines that require solving these challenging optimization problems.

There are, however, some limitations to our method. While MORBO can handle a large number of black-box constraints, using hypervolume as an optimization target means that the computational complexity scales poorly with the number of objectives for which we aim to explore the Pareto frontier. Approximate HV computations can alleviate this issue, but may come at the cost of reduced optimization quality. Further, MORBO is optimized for the large-batch, high-throughput setting, and other methods may be more suitable for, and achieve better performance on, simpler lower-dimensional problems with small evaluation budgets.

References

  • Akhtar and Shoemaker [2019] T. Akhtar and C. A. Shoemaker. Efficient multi-objective optimization through population-based parallel surrogate search, 2019.
  • Auger et al. [2009] A. Auger, J. Bader, D. Brockhoff, and E. Zitzler. Theory of the hypervolume indicator: Optimal mu-distributions and the choice of the reference point. In Proceedings of the Tenth ACM SIGEVO Workshop on Foundations of Genetic Algorithms, FOGA ’09, pages 87–102, New York, NY, USA, 2009. Association for Computing Machinery.
  • Balandat et al. [2020] M. Balandat, B. Karrer, D. R. Jiang, S. Daulton, B. Letham, A. G. Wilson, and E. Bakshy. BoTorch: A Framework for Efficient Monte-Carlo Bayesian Optimization. In Advances in Neural Information Processing Systems 33, 2020.
  • Belakaria et al. [2019] S. Belakaria, A. Deshwal, and J. R. Doppa. Max-value entropy search for multi-objective Bayesian optimization. In Advances in Neural Information Processing Systems 32, 2019.
  • Beume et al. [2007] N. Beume, B. Naujoks, and M. Emmerich. SMS-EMOA: Multiobjective selection based on dominated hypervolume. European Journal of Operational Research, 181(3):1653–1669, 2007. ISSN 0377-2217. doi: https://doi.org/10.1016/j.ejor.2006.08.008. URL https://www.sciencedirect.com/science/article/pii/S0377221706005443.
  • Bradford et al. [2018] E. Bradford, A. M. Schweidtmann, and A. Lapkin. Efficient multiobjective optimization employing Gaussian processes, spectral sampling and a genetic algorithm. J. of Global Optimization, 71(2):407–438, June 2018. ISSN 0925-5001. doi: 10.1007/s10898-018-0609-2. URL https://doi.org/10.1007/s10898-018-0609-2.
  • Daulton et al. [2020] S. Daulton, M. Balandat, and E. Bakshy. Differentiable expected hypervolume improvement for parallel multi-objective Bayesian optimization. In Advances in Neural Information Processing Systems 33, NeurIPS, 2020.
  • Daulton et al. [2021] S. Daulton, M. Balandat, and E. Bakshy. Parallel Bayesian optimization of multiple noisy objectives with expected hypervolume improvement, 2021.
  • Deb and Sundar [2006] K. Deb and J. Sundar. Reference point based multi-objective optimization using evolutionary algorithms. In Proceedings of the 8th Annual Conference on Genetic and Evolutionary Computation, pages 635–642, 2006.
  • Deb et al. [2002] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2):182–197, 2002.
  • Deb et al. [2002] K. Deb, L. Thiele, M. Laumanns, and E. Zitzler. Scalable multi-objective optimization test problems. In Proceedings of the 2002 Congress on Evolutionary Computation (CEC), volume 1, pages 825–830, 2002. ISBN 0-7803-7282-4. doi: 10.1109/CEC.2002.1007032.
  • Emmerich et al. [2006] M. T. M. Emmerich, K. C. Giannakoglou, and B. Naujoks. Single- and multiobjective evolutionary optimization assisted by Gaussian random field metamodels. IEEE Transactions on Evolutionary Computation, 10(4):421–439, 2006.
  • Eriksson and Jankowiak [2021] D. Eriksson and M. Jankowiak. High-dimensional Bayesian optimization with sparse axis-aligned subspaces. arXiv preprint arXiv:2103.00349, 2021.
  • Eriksson and Poloczek [2021] D. Eriksson and M. Poloczek. Scalable constrained Bayesian optimization. In International Conference on Artificial Intelligence and Statistics, pages 730–738. PMLR, 2021.
  • Eriksson et al. [2019] D. Eriksson, M. Pearce, J. R. Gardner, R. Turner, and M. Poloczek. Scalable global optimization via local Bayesian optimization. In Advances in Neural Information Processing Systems 32, pages 5497–5508, 2019.
  • Fasshauer and Zhang [2006] G. E. Fasshauer and J. G. Zhang. Scattered data approximation of noisy data via iterated moving least squares. Proceedings of Curve and Surface Fitting: Avignon, pages 150–159, 2006.
  • Frazier [2018] P. I. Frazier. A tutorial on Bayesian optimization. arXiv preprint arXiv:1807.02811, 2018.
  • Gardner et al. [2017] J. R. Gardner, C. Guo, K. Q. Weinberger, R. Garnett, and R. B. Grosse. Discovering and exploiting additive structure for Bayesian optimization. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, volume 54 of Proceedings of Machine Learning Research, pages 1311–1319. PMLR, 2017.
  • Garrido-Merchán and Hernández-Lobato [2020] E. C. Garrido-Merchán and D. Hernández-Lobato. Parallel predictive entropy search for multi-objective Bayesian optimization with constraints, 2020.
  • Gaudrie [2019] D. Gaudrie. High-Dimensional Bayesian Multi-Objective Optimization. PhD thesis, Université de Lyon, 2019.
  • Gopakumar et al. [2018] A. M. Gopakumar, P. V. Balachandran, D. Xue, J. E. Gubernatis, and T. Lookman. Multi-objective optimization for materials discovery via adaptive design. Scientific Reports, 8(1):3738, 2018.
  • Hernandez-Lobato et al. [2016] D. Hernandez-Lobato, J. Hernandez-Lobato, A. Shah, and R. Adams. Predictive entropy search for multi-objective Bayesian optimization. In Proceedings of The 33rd International Conference on Machine Learning, pages 1492–1501, 2016.
  • Hernández-Lobato et al. [2015] D. Hernández-Lobato, J. M. Hernández-Lobato, A. Shah, and R. P. Adams. Predictive entropy search for multi-objective Bayesian optimization, 2015.
  • Jones et al. [1998] D. R. Jones, M. Schonlau, and W. J. Welch. Efficient global optimization of expensive black-box functions. Journal of Global Optimization, 13:455–492, 1998.
  • Kandasamy et al. [2015] K. Kandasamy, J. Schneider, and B. Póczos. High dimensional Bayesian optimisation and bandits via additive models. In Proceedings of the 32nd International Conference on Machine Learning, ICML, 2015.
  • Kirschner et al. [2019] J. Kirschner, M. Mutny, N. Hiller, R. Ischebeck, and A. Krause. Adaptive and safe Bayesian optimization in high dimensions via one-dimensional subspaces. In Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 3429–3438. PMLR, 2019.
  • Knowles [2006] J. Knowles. ParEGO: A hybrid algorithm with on-line landscape approximation for expensive multiobjective optimization problems. IEEE Transactions on Evolutionary Computation, 10(1):50–66, 2006.
  • Kohira et al. [2018] T. Kohira, H. Kemmotsu, O. Akira, and T. Tatsukawa. Proposal of benchmark problem based on real-world car structure design optimization. In Proceedings of the Genetic and Evolutionary Computation Conference Companion, pages 183–184, 2018.
  • Konakovic Lukovic et al. [2020] M. Konakovic Lukovic, Y. Tian, and W. Matusik. Diversity-guided multi-objective Bayesian optimization with batch evaluations. Advances in Neural Information Processing Systems, 33, 2020.
  • Letham et al. [2020] B. Letham, R. Calandra, A. Rai, and E. Bakshy. Re-examining linear embeddings for high-dimensional Bayesian optimization. arXiv preprint arXiv: 2001.11659, 2020.
  • Loshchilov et al. [2011] I. Loshchilov, M. Schoenauer, and M. Sebag. Not all parents are equal for MO-CMA-ES. In R. H. C. Takahashi, K. Deb, E. F. Wanner, and S. Greco, editors, Evolutionary Multi-Criterion Optimization, pages 31–45, Berlin, Heidelberg, 2011. Springer Berlin Heidelberg.
  • Ma and Wang [2019] Z. Ma and Y. Wang. Evolutionary constrained multiobjective optimization: Test suite construction and performance comparisons. IEEE Transactions on Evolutionary Computation, 23(6):972–986, 2019. doi: 10.1109/TEVC.2019.2896967.
  • Marler and Arora [2004] R. T. Marler and J. S. Arora. Survey of multi-objective optimization methods for engineering. Structural and multidisciplinary optimization, 26(6):369–395, 2004.
  • Mathern et al. [2021] A. Mathern, O. Steinholtz, A. Sjöberg, M. Önnheim, K. Ek, R. Rempling, E. Gustavsson, and M. Jirstrand. Multi-objective constrained Bayesian optimization for structural design. Structural and Multidisciplinary Optimization, 63, 02 2021. doi: 10.1007/s00158-020-02720-2.
  • Munteanu et al. [2019] A. Munteanu, A. Nayebi, and M. Poloczek. A framework for Bayesian optimization in embedded subspaces. In Proceedings of the 36th International Conference on Machine Learning, (ICML), 2019.
  • Mutny and Krause [2018] M. Mutny and A. Krause. Efficient high dimensional Bayesian optimization with additivity and quadrature Fourier features. In Advances in Neural Information Processing Systems 31, pages 9019–9030, 2018.
  • Oh et al. [2018] C. Oh, E. Gavves, and M. Welling. BOCK: Bayesian optimization with cylindrical kernels. In Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 3865–3874. PMLR, 2018.
  • Oyama et al. [2017] A. Oyama, T. Kohira, H. Kemmotsu, T. Tatsukawa, and T. Watanabe. Simultaneous structure design optimization of multiple car models using the k computer. In 2017 IEEE Symposium Series on Computational Intelligence (SSCI), pages 1–4, 2017.
  • Paria et al. [2020] B. Paria, K. Kandasamy, and B. Póczos. A flexible framework for multi-objective Bayesian optimization using random scalarizations. In R. P. Adams and V. Gogate, editors, Proceedings of The 35th Uncertainty in Artificial Intelligence Conference, volume 115 of Proceedings of Machine Learning Research, pages 766–776. PMLR, 22–25 Jul 2020.
  • Pleiss et al. [2020] G. Pleiss, M. Jankowiak, D. Eriksson, A. Damle, and J. R. Gardner. Fast matrix square roots with applications to Gaussian processes and Bayesian optimization, 2020.
  • Ponweiser et al. [2008] W. Ponweiser, T. Wagner, D. Biermann, and M. Vincze. Multiobjective optimization on a limited budget of evaluations using model-assisted s-metric selection. In G. Rudolph, T. Jansen, N. Beume, S. Lucas, and C. Poloni, editors, Parallel Problem Solving from Nature – PPSN X, pages 784–794, Berlin, Heidelberg, 2008. Springer Berlin Heidelberg.
  • Rahimi and Recht [2007] A. Rahimi and B. Recht. Random features for large-scale kernel machines. In Proceedings of the 20th International Conference on Neural Information Processing Systems, NIPS’07, page 1177–1184, 2007.
  • Rasmussen [2004] C. E. Rasmussen. Gaussian Processes in Machine Learning, pages 63–71. Springer Berlin Heidelberg, Berlin, Heidelberg, 2004.
  • Ray and Liew [2002] T. Ray and K. Liew. A swarm metaphor for multiobjective design optimization. Engineering Optimization, 34(2):141–153, 2002. doi: 10.1080/03052150210915.
  • Regis and Shoemaker [2013] R. G. Regis and C. A. Shoemaker. Combining radial basis function surrogates and dynamic coordinate search in high-dimensional expensive black-box optimization. Engineering Optimization, 45(5):529–555, 2013.
  • Sener and Koltun [2018] O. Sener and V. Koltun. Multi-task learning as multi-objective optimization. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 31. Curran Associates, Inc., 2018.
  • Snoek et al. [2014] J. Snoek, K. Swersky, R. Zemel, and R. P. Adams. Input warping for Bayesian optimization of non-stationary functions. In Proceedings of the 31st International Conference on Machine Learning, ICML’14, 2014.
  • Suzuki et al. [2020] S. Suzuki, S. Takeno, T. Tamura, K. Shitara, and M. Karasuyama. Multi-objective Bayesian optimization using Pareto-frontier entropy. In H. D. III and A. Singh, editors, Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pages 9279–9288. PMLR, 13–18 Jul 2020. URL http://proceedings.mlr.press/v119/suzuki20a.html.
  • Tanabe and Ishibuchi [2020] R. Tanabe and H. Ishibuchi. An easy-to-use real-world multi-objective optimization problem suite. Applied Soft Computing, 89:106078, 2020. ISSN 1568-4946. doi: https://doi.org/10.1016/j.asoc.2020.106078.
  • Wang et al. [2016] Z. Wang, F. Hutter, M. Zoghi, D. Matheson, and N. De Freitas. Bayesian optimization in a billion dimensions via random embeddings. J. Artif. Int. Res., 55(1):361–387, Jan. 2016.
  • Wang et al. [2018] Z. Wang, C. Gehring, P. Kohli, and S. Jegelka. Batched large-scale Bayesian optimization in high-dimensional spaces. In International Conference on Artificial Intelligence and Statistics, volume 84 of Proceedings of Machine Learning Research, pages 745–754. PMLR, 2018.
  • Yang et al. [2019] K. Yang, M. Emmerich, A. Deutz, and T. Bäck. Multi-objective Bayesian global optimization using expected hypervolume improvement gradient. Swarm and Evolutionary Computation, 44:945 – 956, 2019. ISSN 2210-6502. doi: https://doi.org/10.1016/j.swevo.2018.10.007.
  • Zhang et al. [2010] Q. Zhang, W. Liu, E. Tsang, and B. Virginas. Expensive multiobjective optimization by MOEA/D with Gaussian process model. IEEE Transactions on Evolutionary Computation, 14(3):456–474, 2010. doi: 10.1109/TEVC.2009.2033671.
  • Zhou et al. [2012] A. Zhou, Q. Zhang, and G. Zhang. A multiobjective evolutionary algorithm based on decomposition and probability model. In 2012 IEEE Congress on Evolutionary Computation, pages 1–8, 2012. doi: 10.1109/CEC.2012.6252954.
  • Zitzler et al. [2000] E. Zitzler, K. Deb, and L. Thiele. Comparison of multiobjective evolutionary algorithms: Empirical results. Evolutionary computation, 8(2):173–195, 2000.

Appendix A Details on Batch Selection

As discussed in Section 3.1, over-exploration can be an issue in high-dimensional BO because the model is typically most uncertain on the boundary of the search space. This is particularly problematic when using continuous optimization routines to find the maximizer of the acquisition function, since the global optimum of the acquisition function will often lie on the boundary; see Oh et al. (2018) for a discussion of the “boundary issue” in BO. While the use of trust regions alleviates this issue, it can still be problematic, especially when the trust regions are large.

To mitigate this issue of over-exploration, we use a discrete set of candidates generated by perturbing randomly sampled Pareto optimal points within a trust region, replacing only a small subset of the dimensions with quasi-random values from a scrambled Sobol sequence. This is similar to the approach used by Eriksson and Poloczek (2021), where it proved crucial for good performance on high-dimensional problems. In addition, we decrease the perturbation probability $p$ as the optimization progresses, which Regis and Shoemaker (2013) found to improve optimization performance. The probability $p$ follows a schedule that decays it monotonically as the number of completed evaluations grows from the number of initial points $n_0$ to the total evaluation budget $N$.
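As an illustration only, a stand-in schedule with an assumed linear decay; the functional form and the endpoints `p_init` and `p_final` are our placeholders, not the values used by MORBO:

    def perturbation_prob(n_evals, n_init, budget, p_init=0.5, p_final=0.1):
        # Decay p linearly as evaluations progress from n_init to budget.
        frac = (n_evals - n_init) / max(budget - n_init, 1)
        return p_init + (p_final - p_init) * min(max(frac, 0.0), 1.0)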

Given a discrete set of candidates, MORBO draws samples from the joint posterior over the function values for the candidates in this set and the previously selected candidates in the current batch, and selects the candidate with maximum HVI under the joint sample. This procedure is repeated to build the entire batch. (If a candidate point does not satisfy all outcome constraints under the sampled GP function, its acquisition value is set to the negative total constraint violation.) Using standard Cholesky-based approaches, this exact posterior sampling has complexity that is cubic in the number of test points, so exact sampling is only feasible for modestly sized discrete sets.
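The cubic cost stems from the Cholesky factorization used to draw exact joint samples; a sketch for a single outcome:

    def sample_joint_mvn(mean, cov, rng):
        # One exact draw from N(mean, cov); the Cholesky factorization is
        # O(n^3) in the number of test points, which limits the set size.
        chol = np.linalg.cholesky(cov + 1e-8 * np.eye(len(mean)))
        return mean + chol @ rng.standard_normal(len(mean))

Here `mean` and `cov` would be the GP posterior mean and covariance at the pending and candidate points; one such draw per objective (and constraint) yields the joint realization used for computing HVI.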

A.1 RFFs for fast posterior sampling

While asymptotically faster approximations than exact sampling exist (see Pleiss et al. (2020) for a comprehensive review), these methods still limit the candidate set to a modest (albeit larger) size, which may not adequately cover the entire input space. Among the alternatives to exact posterior sampling, we consider Random Fourier Features (RFFs) (Rahimi and Recht, 2007), which provide a deterministic approximation of a GP function sample as a linear combination of Fourier basis functions. This approach has empirically been shown to perform well with Thompson sampling for multi-objective optimization (Bradford et al., 2018). The RFF samples are cheap to evaluate, which enables using much larger discrete sets of candidates since the joint posterior over the discrete set does not need to be computed. Furthermore, the RFF samples are differentiable with respect to a new candidate $x$, and HVI is differentiable with respect to $x$ using cached box decompositions (Daulton et al., 2021), so we can use second-order gradient optimization methods to maximize HVI under the RFF samples.
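For intuition, a prior-sample version of the RFF construction for an RBF kernel with unit variance (a posterior sample additionally requires conditioning the weights on the observed data via Bayesian linear regression, and the Matérn case uses a different spectral density):

    def rff_prior_sample(d, n_features, lengthscale, rng):
        # Deterministic approximate GP prior sample f(x) = phi(x) @ theta,
        # where phi are random Fourier features of an RBF kernel.
        W = rng.standard_normal((n_features, d)) / lengthscale
        b = rng.uniform(0.0, 2.0 * np.pi, n_features)
        theta = rng.standard_normal(n_features)

        def f(X):
            phi = np.sqrt(2.0 / n_features) * np.cos(X @ W.T + b)
            return phi @ theta

        return f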

We initially optimized these RFF samples over all dimensions using a gradient-based optimizer, but found that many parameters ended up on the boundary, which led to over-exploration and poor BO performance. To address this, we instead consider continuous optimization over axis-aligned subspaces, a continuous analogue of the discrete perturbation procedure described in the previous section. Specifically, we generate a discrete set of candidate points by perturbing random subsets of dimensions according to the perturbation probability $p$, as in the exact sampling case. Then, we select a small number of the best initial points according to HVI under the RFF sample and, for each of these, optimize only over the perturbed dimensions using a gradient-based optimizer.

Figure 7 shows that the RFF approximation with continuous optimization over axis-aligned subspaces works well on low-dimensional DTLZ2 instances, but its performance degrades as the dimensionality increases. Thus, the performance of MORBO can likely be improved on low-dimensional problems by using continuous optimization; we used exact sampling on a discrete set for all experiments in the paper for consistency. We also see that, as the dimensionality increases, using RFFs over a discrete set achieves better performance than continuous optimization, and exact posterior sampling over a discrete set performs better still, which we hypothesize is due to the quality of the RFF approximation degrading in higher dimensions. Indeed, as shown in Figure 7, optimization performance using RFFs on higher-dimensional problems improves if we use more basis functions.

Figure 7: Optimization performance under various Thompson sampling approaches on DTLZ2 test problems with two objectives and various input dimensions. TS-Exact uses exact samples from the joint posterior over a discrete set of points. The TS-RFF variants evaluate approximate sample paths (RFFs) over a discrete set of points using different numbers of basis functions. The Cont-RFF variants use L-BFGS-B with exact gradients to optimize RFF draws along a subset of the dimensions (see Appendix A.1 for details), again with different numbers of basis functions.

Appendix B Details on Experiments

B.1 Algorithmic details

For MORBO, we use 5 trust regions, which we observed to be a robust choice in Figure 6. Following Eriksson et al. (2019), we set the initial and maximum trust region edge lengths and use a minimum edge length below which a trust region is terminated. The number of discrete points used for optimizing HVI varies by problem; while this number should ideally be chosen as large as possible, it offers a way to control the computational overhead of MORBO, and we used a smaller value for the Mazda problem because we need to sample from a total of 56 GP models in each trust region, as there are 54 black-box constraints in addition to the two objectives. We model each outcome with an independent GP with a constant mean function and a Matérn-5/2 kernel with automatic relevance determination (ARD), and fit the GP hyperparameters by maximizing the marginal log-likelihood (the same model is used for all BO baselines).

When fitting a model for MORBO, we include the data within a hypercube with edge length $2L$ around the trust region center. If that region contains too few points, we instead include the closest points to the trust region center for model fitting. The success streak tolerance is set to infinity, which prevents the trust region from expanding; we find this leads to good optimization performance when data is shared across trust regions. For NEHVI and ParEGO, we use quasi-MC samples, and for TS-TCH, we optimize RFFs. All three methods are optimized using L-BFGS-B with multiple random restarts. For DGEMO, TSEMO, and MOEA/D-EGO, we use the default settings in the open-source implementation at https://github.com/yunshengtian/DGEMO/tree/master. Similarly, we use the default settings for NSGA-II in the Platypus package (https://github.com/Project-Platypus/Platypus). We encode the reference point as a black-box constraint to provide this information to NSGA-II.

B.2 Synthetic problems

The reference points for all problems are given in Table 1. We multiply the objectives (and reference points) for all synthetic problems by $-1$ and maximize the resulting objectives.

Problem Reference Point
DTLZ2 [, ]
Vehicle Safety [, , ]
Welded Beam [, ]
MW7 [, ]
Table 1: The reference points for each synthetic benchmark problem.

DTLZ2:

We consider the 2-objective DTLZ2 problem with various input dimensions. This is a standard test problem from the multi-objective optimization literature. Mathematical formulas for the objectives are given in Deb et al. (2002).

MW7:

As a second test problem from the multi-objective optimization literature, we consider the MW7 problem, which has 2 objectives and 2 constraints. See Ma and Wang (2019) for details.

Welded Beam:

The welded beam problem (Ray and Liew, 2002) is a structural design problem in which the input parameters control the size of the beam and the goal is to minimize two objectives (cost and end deflection) subject to constraints. More details are given in Tanabe and Ishibuchi (2020).

Vehicle Safety:

The vehicle safety problem is a three-objective problem with parameters controlling the widths of different components of the vehicle’s frame. The goal is to minimize mass (which is correlated with fuel economy), toe-box intrusion (vehicle damage), and acceleration in a full-frontal collision (passenger injury). See Tanabe and Ishibuchi (2020) for additional details.

B.3 Trajectory planning

For the trajectory planning problem, we consider a trajectory specified by a set of design points that starts at a pre-specified location. Given the design points, we fit an interpolating B-spline and integrate over this B-spline to compute the final reward, using the same domain as in Wang et al. (2018). Rather than directly optimizing the locations of the design points, we optimize the difference (step) between consecutive design points, with each step constrained to a bounded domain. The reference point encodes that we want a reward above a minimum threshold and an endpoint that is no more than a maximum distance from the target location. Since we maximize both objectives, we multiply the distance metric and the corresponding reference point value by -1.
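A minimal sketch of this parameterization under stated assumptions: `reward_fn`, the step bound, and the number of design points are hypothetical stand-ins, and the integral of the reward is approximated by averaging samples along the spline.

```python
# Step-parameterized trajectory: cumulative sums of bounded steps from a
# fixed start, an interpolating B-spline through the design points, and
# (reward, negated distance-to-target) as the two maximized objectives.
import numpy as np
from scipy.interpolate import splprep, splev

def trajectory_objectives(steps, start, target, reward_fn, step_bound):
    steps = np.clip(steps, -step_bound, step_bound).reshape(-1, 2)
    pts = start + np.cumsum(np.vstack([[0.0, 0.0], steps]), axis=0)
    tck, _ = splprep([pts[:, 0], pts[:, 1]], s=0)         # interpolating spline
    xs, ys = splev(np.linspace(0, 1, 1000), tck)
    reward = np.mean([reward_fn(x, y) for x, y in zip(xs, ys)])
    dist = np.linalg.norm([xs[-1] - target[0], ys[-1] - target[1]])
    return reward, -dist                                   # maximize both

reward, neg_dist = trajectory_objectives(
    steps=np.random.uniform(-0.1, 0.1, size=20),           # 10 design points
    start=np.zeros(2), target=np.ones(2),
    reward_fn=lambda x, y: -((x - 1) ** 2 + (y - 1) ** 2),  # toy reward field
    step_bound=0.1)
```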

B.4 Optical design

The surrogate model was constructed from a large dataset of optical designs and resulting display images to provide an accurate representation of the real problem. The surrogate model is a neural network with a convolutional autoencoder architecture. The model was trained by minimizing MSE (averaged over images, pixels, and RGB color channels) on the training examples, with performance tracked on a validation set and a separate set of examples held out for final evaluation.

B.5 Mazda vehicle design problem

We limit the number of points used for model fitting to the points closest to the trust region center whenever more points than that fall within the larger hypercube around the center. Still, in each iteration MORBO fits one GP model per objective and per black-box constraint for each trust region, a total number of models far out of reach for any other multi-objective BO method.

Appendix C Additional Results

C.1 Pareto Frontiers

We show the Pareto frontiers for the welded beam, trajectory planning, optical design, and Mazda problems in Figure 8. In each column we show the Pareto frontiers corresponding to the worst, median, and best replications according to the final hypervolume. We exclude the vehicle safety problem as it has three objectives, which makes the final Pareto frontiers challenging to visualize.

Figure 8 shows that even on the low-dimensional welded beam problem, MORBO is able to achieve much better coverage than the baseline methods. MORBO also explores the trade-offs better than other methods on the trajectory planning problem, where the best run by MORBO found trajectories with high reward that ended up close to the final target location. In particular, other methods struggle to identify trajectories with large rewards, while MORBO consistently finds trajectories with rewards close to the maximum possible reward. On both the optical design and Mazda problems, the Pareto frontiers found by MORBO better explore the trade-offs between the objectives compared to NSGA-II and Sobol, and MORBO generally achieves good coverage of the Pareto frontier for both problems. For the optical design problem, we exclude the partial results from the baselines that were stopped early and only show the methods that ran for the full number of evaluations. For the Mazda problem we show the Pareto frontiers of the true objectives and not the normalized objectives described in Section 5.5. MORBO is able to significantly decrease the vehicle mass at the cost of using fewer common parts, a trade-off that NSGA-II fails to explore. It is worth noting that the number-of-common-parts objective is integer-valued, and exploiting this additional information may unlock even better optimization performance for MORBO.
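For plots like Figure 8, the Pareto frontier can be extracted from the evaluated points by non-dominated filtering; a simple quadratic-time sketch (maximization convention, illustrative only) is:

```python
# Extract the non-dominated subset of evaluated objective vectors.
import numpy as np

def pareto_frontier(Y):
    """Return the non-dominated rows of Y (all objectives maximized)."""
    n = Y.shape[0]
    dominated = np.zeros(n, dtype=bool)
    for i in range(n):
        # Row i is dominated if some row is >= everywhere and > somewhere.
        better_eq = np.all(Y >= Y[i], axis=1)
        strictly = np.any(Y > Y[i], axis=1)
        dominated[i] = np.any(better_eq & strictly)
    return Y[~dominated]

Y = np.random.rand(200, 2)          # toy objective values
front = pareto_frontier(Y)          # points to plot, e.g. sorted by f1
```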

Figure 8: In each column we show the Pareto frontiers for the worst, median, and best replications according to the final hypervolume. We indicate in each axis label whether the corresponding objective is minimized or maximized. The reference point is illustrated as a black star. The use of multiple trust regions allows MORBO to consistently achieve good coverage of the Pareto frontier, in addition to large hypervolumes.

C.2 Wall Time Comparisons

While candidate generation time is often a secondary concern in classic BO applications, where evaluating the black-box function often takes orders of magnitude longer, existing methods that use a single global model and standard acquisition function optimization can become the bottleneck in the high-throughput asynchronous evaluation settings that are common for high-dimensional problems.

Table 2 provides a comparison of the wall time for generating a batch of candidates for the different methods on the different benchmark problems. We observe that candidate generation for MORBO is two orders of magnitude faster than for methods such as ParEGO and NEHVI on the trajectory planning problem, where all methods ran for the full evaluation budget. While candidate generation is fast for TSEMO, its model fitting causes a significant overhead, with almost an hour spent on model fitting toward the end of the trajectory planning runs. This is significantly longer than for MORBO, which only requires on the order of seconds for model fitting due to the use of local modeling. Thus, local modeling is a crucial component of MORBO that limits the computational overhead of model fitting. In particular, model fitting for MORBO on the optical design problem rarely exceeds a few seconds, while methods such as DGEMO and TSEMO that rely on global modeling require almost an hour for model fitting even after collecting only a fraction of the evaluation budget. Additionally, while MORBO fits a large number of GP models on the Mazda problem due to the black-box constraints and the use of multiple trust regions, the total time for model fitting still remains on the order of minutes, while this problem is completely out of reach for the other BO methods that rely on global modeling.
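A sketch of how such timings might be gathered, assuming a hypothetical `generate_batch` routine (not an API from the paper) and reporting the mean and two standard errors of the mean as in Table 2:

```python
# Time batch selection only (model fitting is excluded, as in Table 2).
import time
import numpy as np

def time_batch_selection(generate_batch, n_reps):
    times = []
    for _ in range(n_reps):
        t0 = time.monotonic()
        generate_batch()                      # candidate generation only
        times.append(time.monotonic() - t0)
    times = np.asarray(times)
    mean = times.mean()
    two_sem = 2.0 * times.std(ddof=1) / np.sqrt(n_reps)
    return mean, two_sem

print(time_batch_selection(lambda: sum(range(10**6)), n_reps=20))
```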

Problem | Welded Beam | Vehicle Safety | Rover | Optical Design | Mazda
Batch Size | () | () | () | () | ()
MORBO | 1.3 (0.0) | 9.6 (0.7) | 23.4 (0.4) | 9.8 (0.1) | 188.16 (1.72)
ParEGO | 14.5 (0.3) | 1.3 (0.0) | 213.4 (11.2) | 241.9 (14.9) | N/A
TS-TCH | N/A | 0.6 (0.0) | 31.3 (1.1) | 48.1 (1.2) | N/A
NEHVI | 30.4 (0.4) | 9.1 (0.1) | 997.5 (62.8) | 211.27 (6.66) | N/A
NSGA-II | 0.0 (0.0) | 0.0 (0.0) | 0.0 (0.0) | 0.0 (0.0) | 0.0 (0.0)
DGEMO | N/A | N/A | 697.1 (52.5) | 2278.7 (199.8) | N/A
TSEMO | N/A | 3.4 (0.1) | 3.3 (0.0) | 4.6 (0.1) | N/A
MOEA/D-EGO | N/A | 44.3 (0.3) | 71.1 (4.3) | 97.5 (6.7) | N/A
Table 2: Batch selection wall time (excluding model fitting) in seconds. The mean and two standard errors of the mean are reported. MORBO, ParEGO, TS-TCH, and NEHVI were run on a Tesla V100 SXM2 GPU (16GB RAM), while DGEMO, TSEMO, MOEA/D-EGO, and NSGA-II were run on 2x Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz. For Welded Beam and Vehicle Safety, we ran NSGA-II with a larger batch size in order to avoid a singleton population. For DGEMO, TSEMO, and MOEA/D-EGO, fewer evaluations were performed on Rover (Trajectory Planning) and Optical Design, so the generation times are shorter than if the full number of evaluations had been performed.