## 1 Introduction

In many engineering problems, a key objective is to predict an unknown experimental surface over an input domain of interest. However, for complex physical experiments, one can encounter the unfortunate phenomenon of censoring, where the experimental response is missing or only partially measured. Censoring arises from a variety of practical experimental constraints, including limits of a measurement device, safety considerations of the experimenter, and a fixed experimental time budget. Fig 1 provides an illustration: experimental censoring typically occurs when the response variable of interest is expensive or dangerous to measure. For predicting the experimental surface, censoring can result in a significant loss of information, and therefore poor predictive performance over the full domain (Brooks, 1982). For example, suppose an engineer wishes to explore how pressure in a nuclear reactor changes under different control settings. Due to safety concerns, the experiment is forced to stop if the pressure hits a certain upper limit, leading to censored responses. To further complicate matters, the input region which results in censoring is typically unknown prior to experiments, and needs to be learned from data.

Given the unavoidable and unknown nature of censoring in complex physical experiments, it is therefore of great interest to carefully design experimental runs. This ensures we maximize the predictive power of the fitted model with a limited budget of expensive experimental runs. To this end, we present in this work a new integrated censored mean-squared error (ICMSE) method, which adaptively learns the underlying censoring behavior, then uses this information to select design points which minimize predictive uncertainty. Combined with a Gaussian process model (GP; Sacks et al., 1989) on the unknown experimental surface, our ICMSE method leverages the smoothness properties of the GP to maximize predictive accuracy over the full domain of interest. In this article, we demonstrate the utility of the ICMSE method by solving two real-world problems.

### 1.1 3D-printed aortic valves for surgical planning

The first problem concerns the design of 3D-printed tissue-mimicking aortic valves for heart surgeries. With advances in additive manufacturing (Gibson et al., 2014), 3D-printed medical prototypes (Rengier et al., 2010) play an increasingly important role in pre-surgical studies (Qian et al., 2017). They are particularly helpful in complicated heart diseases, e.g., aortic stenosis, where 3D-printed aortic valves can be used to select the best surgical options with minimal post-surgical complications (Chen et al., 2018b). The printed aortic valve (see Fig 2(a)) contains a biomimetic substructure: an enhancement polymer (white) is embedded in a substrate polymer (clear); this is known as a metamaterial (Wang et al., 2016) in the bioengineering literature. The goal is to understand how the stiffness of the metamaterials is affected by the geometry of the enhancement polymer (see Fig 2(b)).

For a given geometry, the stiffness can be evaluated by either a computer simulation (less accurate) or a physical experiment (more accurate). The latter, however, is very costly: we need to 3D print each metamaterial sample, then physically test its stiffness using a load cell. Furthermore, the measurement from physical experiments may be censored due to an inherent upper limit in the testing machine. This is shown in Fig 2(c): if the metamaterial sample is stiffer than the load cell (i.e., a spring), the experiment is forced to stop to prevent breakage of the load cell. One possible workaround is to use a stiffer load cell; however, this is often not a preferable option, since a stiffer load cell with a broader measurement range can be very expensive, costing over a hundred times more than the standard load cells integrated in the machine. Here, the proposed ICMSE method can adaptively design experimental runs which maximize the predictive power of the GP model under censoring. The fitted GP model can then be used to quickly predict the underlying response surface, thereby providing an efficient surrogate for expensive experiments. Our design method is particularly valuable in urgent surgical applications, where one can only perform a small number of runs before the actual surgery.

### 1.2 Thermal processing in wafer manufacturing

The second problem considers the design of the semiconductor wafer manufacturing process (Quirk and Serda, 2001; Jin et al., 2012). Wafer manufacturing involves the processing of silicon wafers in a series of refinement stages, to be used as circuit chips. Among these stages, thermal processing is one of the most important (Singh et al., 2000), since it facilitates the necessary chemical reactions and allows for surface oxidation. Fig 3(a) illustrates the typical thermal processing procedure: a laser beam (in orange) is moved back and forth over a rotating wafer. Here, industrial engineers wish to understand how different process parameters (see Fig 3(a)) affect the minimal wafer temperature after heating. The minimal temperature provides information on the completeness of the chemical reactions, and is therefore an important quality measurement in wafer manufacturing (Van Gurp and Palmen, 1998).

However, laser heating experiments are quite costly to conduct, involving high wafer material and operation costs. In industrial settings, the minimal wafer temperature (response variable of interest) is often subject to censoring, due to the nature of the measurement procedure. This is shown in Fig 3(b): the wafer temperature is typically measured by either an array of temperature sensors or a thermal camera, both of which have upper measurement limits (Feteira, 2009). While more sophisticated sensors exist, they are much more expensive and may lead to tedious do-overs of the expensive experiments. The proposed ICMSE method can be used to adaptively design experimental runs which maximize predictive power of the GP model under censoring. The fitted GP model (using these runs) can then be used to efficiently predict the experimental surface of the manufacturing process.

### 1.3 Literature

GP regression (or kriging, see Matheron, 1963) is widely used as a predictive model for expensive experiments (Sacks et al., 1989), and has been applied in many applications, including cosmology (Kaufman et al., 2011), rocket design (Mak et al., 2018), and healthcare (Chen et al., 2019). The key appeals of kriging are its flexible model structure, and its closed-form expressions for prediction and uncertainty quantification. Kriging is popular not only in the computer experiments literature (Sacks et al., 1989; Currin et al., 1991; Santner et al., 2013), but also for modeling noisy experiments (Ankenman et al., 2010), and for combining information from both computer and physical experiments (Kennedy and O’Hagan, 2001). We will adapt in this work a recent GP model (Cao et al., 2018) for prediction with censored data.

To our knowledge, there is no existing work on the experimental design problem for kriging under response censoring. Existing designs for kriging can be divided into two categories: space-filling and model-based designs. Space-filling designs aim to fill empty gaps in the input space; this includes minimax designs (Johnson et al., 1990), maximin designs (Morris and Mitchell, 1995), and maximum projection designs (Joseph et al., 2015). Model-based designs instead maximize an optimality criterion based on an assumed kriging model; this includes integrated mean-squared error designs (Sacks et al., 1989) and maximum entropy designs (Shewry and Wynn, 1987). Such designs can also be implemented sequentially in an adaptive manner, see Lam (2008); Xiong et al. (2013); Chen et al. (2017); Binois et al. (2019); Bect et al. (2019). The above methods, however, do not consider the possibility of censoring in experimentation. Since it is not known a priori which inputs may lead to censoring, an effective design scheme should be performed adaptively, to learn and incorporate the underlying censoring behavior. The proposed ICMSE method achieves exactly this based on a GP model: it learns the underlying censoring behavior from the data, then sequentially minimizes predictive uncertainty under censoring via an adaptive closed-form design criterion. In doing so, we show that the adaptive ICMSE method can yield considerably improved predictive performance over existing design methods, in both motivating applications with censored experimental responses.

### 1.4 Structure

Section 2 presents the proposed ICMSE method for designing only physical experiments (the “single-fidelity” setting). Section 3 extends the ICMSE framework with auxiliary computer experiment data (the “multi-fidelity” setting). Section 4 demonstrates the effectiveness of our method in the two motivating applications. Section 5 concludes the work.

## 2 Single-fidelity ICMSE design

We now present the ICMSE design method for the setting of only physical experiments (i.e., single-fidelity); a more elaborate multi-fidelity ICMSE method is discussed later in Section 3. We first review the GP model for censored data, and derive the new ICMSE design criterion. We then visualize this criterion via a 1D example, and provide some insights.

### 2.1 Modeling framework

We adopt the following model for physical experiments. Let $\mathbf{x} = (x_1, \ldots, x_d)^T$ be a vector of $d$ input variables (each normalized to $[0,1]$), and let $y(\mathbf{x})$ be its latent response from the physical experiment prior to potential censoring (see Fig 1). We assume:

$$y(\mathbf{x}) = f(\mathbf{x}) + \epsilon, \tag{2.1}$$

where $f(\mathbf{x})$ is the mean of the latent response at input $\mathbf{x}$, and $\epsilon$ is the corresponding measurement error. We further suppose that $f(\cdot)$ follows a GP with mean $\mu$, variance $\sigma^2$, and correlation function $r_{\boldsymbol{\theta}}(\cdot, \cdot)$ with parameters $\boldsymbol{\theta}$. This is denoted as:

$$f(\cdot) \sim \mathcal{GP}\{\mu, \sigma^2 r_{\boldsymbol{\theta}}(\cdot, \cdot)\}. \tag{2.2}$$

The experimental noise $\epsilon \sim \mathcal{N}(0, \sigma_\epsilon^2)$ is assumed to be i.i.d. normally distributed.

For simplicity, we consider only the case of right-censoring below, i.e., censoring of the response only when it exceeds some known upper limit $c$ (this is the setting for both motivating problems). All equations and insights derived in the paper hold analogously for the general case of interval censoring, albeit with more cumbersome notation. Suppose, from $n$ experiments, $n_o$ responses are observed without censoring, and $n_c$ responses are right-censored at limit $c$, where $n = n_o + n_c$. The training set experimental data can then be written as the set $\mathcal{D} = \{\mathbf{y}_o, \mathbf{y}_c > c\mathbf{1}_{n_c}\}$, where $\mathbf{y}_o$ is a vector of observed responses at inputs $\mathbf{X}_o$, $\mathbf{y}_c$ is the latent response vector for inputs $\mathbf{X}_c$ in censored regions prior to censoring, and $c\mathbf{1}_{n_c}$ is the vector of the right-censoring limit. Assuming known model parameters, a straightforward adaptation of equations (11) and (12) in Cao et al. (2018) gives the following closed-form expressions for the conditional mean and variance of $f(\mathbf{x}_{new})$ at a new input $\mathbf{x}_{new}$:

$$\hat{f}_n(\mathbf{x}_{new}) = \mu + \mathbf{r}_n^T(\mathbf{x}_{new})(\mathbf{R}_n + \eta \mathbf{I}_n)^{-1}\left(\mathbb{E}[\mathbf{y}_n \mid \mathcal{D}] - \mu\mathbf{1}_n\right), \tag{2.3}$$

$$s_n^2(\mathbf{x}_{new}) = \sigma^2\left\{1 - \mathbf{r}_n^T(\mathbf{x}_{new})(\mathbf{R}_n + \eta \mathbf{I}_n)^{-1}\mathbf{r}_n(\mathbf{x}_{new})\right\} + \mathbf{r}_n^T(\mathbf{x}_{new})(\mathbf{R}_n + \eta \mathbf{I}_n)^{-1}\,\mathrm{Var}[\mathbf{y}_n \mid \mathcal{D}]\,(\mathbf{R}_n + \eta \mathbf{I}_n)^{-1}\mathbf{r}_n(\mathbf{x}_{new}). \tag{2.4}$$

Here, $\mathbf{r}_n(\mathbf{x}_{new}) = [r_{\boldsymbol{\theta}}(\mathbf{x}_{new}, \mathbf{x}_i)]_{i=1}^n$, $\mathbf{R}_n = [r_{\boldsymbol{\theta}}(\mathbf{x}_i, \mathbf{x}_j)]_{i,j=1}^n$, $\mathbf{1}_n$ is a one-vector of length $n$, and $\mathbf{I}_n$ is an $n \times n$ identity matrix. Furthermore, $\mathbb{E}[\mathbf{y}_n \mid \mathcal{D}]$ is the expected response for the latent vector $\mathbf{y}_n = (\mathbf{y}_o, \mathbf{y}_c)$ given the dataset $\mathcal{D}$, $\mathrm{Var}[\mathbf{y}_n \mid \mathcal{D}]$ is its conditional variance, and $\eta = \sigma_\epsilon^2/\sigma^2$. The computation of these quantities will be discussed later in Section 3.3. The conditional mean (2.3) is used to predict the mean experimental response at an untested input $\mathbf{x}_{new}$, and the conditional variance (2.4) is used to quantify predictive uncertainty.
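To make the prediction equations concrete, the sketch below (in Python rather than the authors' R implementation) computes (2.3) and (2.4) for a small dataset. It is a simplified stand-in, not the authors' code: we assume a Gaussian correlation function, approximate each censored point's latent response independently via univariate truncated-normal moments (the exact method uses moments of the truncated multivariate normal; see Section 3.3), and all function and parameter names are illustrative.

```python
# Illustrative sketch of the censored kriging predictor (2.3)-(2.4):
# censored responses are replaced by truncated-normal conditional moments.
# The per-point independence approximation is our own simplification.
import numpy as np
from scipy.stats import truncnorm

def gauss_corr(X1, X2, theta):
    """Gaussian correlation r(x, x') = exp(-theta * ||x - x'||^2)."""
    d2 = (X1[:, None, :] - X2[None, :, :]) ** 2
    return np.exp(-theta * d2.sum(axis=2))

def censored_krige(X, y, censored, c, x_new, mu, sig2, eta, theta):
    n = len(y)
    R = gauss_corr(X, X, theta) + eta * np.eye(n)
    r = gauss_corr(x_new[None, :], X, theta).ravel()
    # Conditional moments of the latent responses: observed points keep
    # their value (zero variance); censored points get truncated-normal
    # moments above the limit c (treated independently for simplicity).
    Ey, Vy = y.astype(float).copy(), np.zeros(n)
    for i in np.where(censored)[0]:
        a = (c - mu) / np.sqrt(sig2)          # standardized lower truncation
        Ey[i] = mu + np.sqrt(sig2) * truncnorm.mean(a, np.inf)
        Vy[i] = sig2 * truncnorm.var(a, np.inf)
    lam = np.linalg.solve(R, r)               # kriging weights (R + eta I)^{-1} r
    fhat = mu + lam @ (Ey - mu)               # analogue of (2.3)
    s2 = sig2 * (1.0 - lam @ r) + lam @ (Vy * lam)   # analogue of (2.4)
    return fhat, s2
```

With no censored points, the sketch reduces to the standard kriging predictor; the censored points contribute only through their truncated moments.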

### 2.2 Design criterion

Now, given data $\mathcal{D}$ from $n$ experiments ($n_o$ of which are observed exactly, $n_c$ of which are censored), we propose a new design method which accounts for the underlying censoring behavior. Let $\mathbf{x}_{new}$ be a potential next input for experimentation, $y_{new}$ be its latent response prior to censoring, and $\tilde{y}_{new} = y_{new}\mathbb{1}\{y_{new} \le c\} + c\,\mathbb{1}\{y_{new} > c\}$ be its corresponding observation after censoring, with $\mathbb{1}\{\cdot\}$ denoting the indicator function. The proposed method chooses the next input as:

$$\mathbf{x}_{n+1} = \underset{\mathbf{x}_{new}}{\operatorname{argmin}} \; \mathrm{ICMSE}_n(\mathbf{x}_{new}), \quad \mathrm{ICMSE}_n(\mathbf{x}_{new}) = \int \mathbb{E}_{\tilde{y}_{new} \mid \mathcal{D}}\left[\mathrm{Var}\{f(\mathbf{x}) \mid \mathcal{D}, \tilde{y}_{new}(\mathbf{x}_{new})\}\right] d\mathbf{x}. \tag{2.7}$$

The design criterion can be interpreted in several parts. First, the term $\mathrm{Var}\{f(\mathbf{x}) \mid \mathcal{D}, \tilde{y}_{new}(\mathbf{x}_{new})\}$ quantifies the predictive variance (i.e., mean-squared error, MSE) of the mean response at an untested input $\mathbf{x}$, given both the training data $\mathcal{D}$ and the potential observation $\tilde{y}_{new}$. Intuitively, this is the quantity we wish to minimize for design, since we want to find which new input can minimize predictive uncertainty. Second, note that this MSE term cannot be used directly as a criterion, since it depends on the potential observation $\tilde{y}_{new}$, which has yet to be observed. This problem can be resolved by taking the conditional expectation $\mathbb{E}_{\tilde{y}_{new} \mid \mathcal{D}}$ (more on this below). Finally, the integral over $\mathbf{x}$ ensures the next design point yields low predictive uncertainty over the entire design space.

The proposed sequential design in (2.7) can be viewed as an extension of the sequential integrated mean-squared error (IMSE) design (Lam, 2008; Santner et al., 2013) to the censored response setting at hand. Assuming no censoring ($n_c = 0$), the sequential IMSE design chooses the next input by minimizing:

$$\mathrm{IMSE}_n(\mathbf{x}_{new}) = \int \mathrm{Var}\{f(\mathbf{x}) \mid \mathcal{D}, y_{new}(\mathbf{x}_{new})\} \, d\mathbf{x}. \tag{2.8}$$

Note that, in the uncensored setting, the MSE term in (2.8) does not depend on the value of the potential observation $y_{new}$, which allows the criterion to be easily computed in practice. However, in the censored setting at hand, not only does this MSE term depend on $\tilde{y}_{new}$, but such an observation may not be directly observed due to censoring! The conditional expectation in the proposed criterion (2.7) addresses this by accounting for the possibility of censoring in $\tilde{y}_{new}$.

Furthermore, one attractive feature of the ICMSE criterion (2.7) is that it is adaptive to the experimental responses in the data. This is because the criterion (2.7) inherently hinges on whether the potential observation is censored (i.e., $\tilde{y}_{new} = c$) or not (i.e., $\tilde{y}_{new} = y_{new}$), and this censoring behavior needs to be learned from experimental data. Viewed this way, our ICMSE criterion can be broken down into two steps: it (i) learns the underlying censoring behavior from experimental responses, then (ii) samples the next point which minimizes the average predictive uncertainty under censoring. We will show how our method adaptively incorporates the possibility of censoring for sequential design, in contrast to the existing IMSE method (2.8).
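As a rough numerical illustration of the expectation in (2.7), the sketch below weights two cases: with the posterior probability of non-censoring, the new point contributes the usual kriging variance reduction; otherwise it contributes none. This two-case weighting is our own simplification of the exact closed-form adjustment derived in the propositions below, and all function and parameter names are illustrative.

```python
# Simplified sketch of the ICMSE idea in (2.7), assuming no censoring in
# the training data and treating a censored draw at x_next as yielding no
# variance reduction (the limiting behavior discussed in Section 2.2.1).
import numpy as np
from scipy.stats import norm

def gauss_corr(X1, X2, theta):
    d2 = (X1[:, None, :] - X2[None, :, :]) ** 2
    return np.exp(-theta * d2.sum(axis=2))

def post_var(Xtr, Xq, sig2, eta, theta):
    """GP posterior variance at query points Xq given training inputs Xtr."""
    R = gauss_corr(Xtr, Xtr, theta) + eta * np.eye(len(Xtr))
    r = gauss_corr(Xq, Xtr, theta)
    return sig2 * (1.0 - np.einsum('ij,ij->i', r, np.linalg.solve(R, r.T).T))

def icmse_sketch(x_next, Xtr, fhat_next, c, grid, sig2, eta, theta):
    """fhat_next: posterior mean at x_next (supplied by the fitted GP)."""
    s2_next = post_var(Xtr, x_next[None, :], sig2, eta, theta)[0]
    p_obs = norm.cdf((c - fhat_next) / np.sqrt(s2_next))  # P(not censored)
    v_old = post_var(Xtr, grid, sig2, eta, theta)          # censored: no gain
    v_new = post_var(np.vstack([Xtr, x_next]), grid, sig2, eta, theta)
    return np.mean((1 - p_obs) * v_old + p_obs * v_new)
```

As the censoring limit grows, the sketch recovers the usual IMSE of the augmented design; as it shrinks, it recovers the IMSE of the current design, mirroring the two limits discussed below.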

#### 2.2.1 No censoring in training data

To provide some guiding intuition, consider a simplified situation with no censoring in the training set, i.e., $n_c = 0$ (censoring may still occur for the new observation $\tilde{y}_{new}$). In this case, the following proposition gives an explicit expression for the ICMSE criterion.

###### Proposition 1.

Suppose there is no censoring in the training data, i.e., $n_c = 0$. Then our ICMSE criterion (2.7) has the explicit expression:

$$\mathrm{ICMSE}_n(\mathbf{x}_{new}) = \int s_n^2(\mathbf{x}) \, d\mathbf{x} - \int \frac{\sigma_n^2(\mathbf{x}, \mathbf{x}_{new})}{s_n^2(\mathbf{x}_{new})} \, W(\bar{c}_{new}) \, d\mathbf{x}, \quad \bar{c}_{new} = \frac{c - \hat{f}_n(\mathbf{x}_{new})}{s_n(\mathbf{x}_{new})}. \tag{2.9}$$

Here, $\hat{f}_n$, $s_n^2$, $\sigma_n$, and $\bar{c}_{new}$ follow from (2.5) and (2.6), and the censoring adjustment function $W(\cdot)$ is given explicitly in terms of $\phi$ and $\Phi$, the probability density and cumulative distribution functions for the standard normal distribution.

In words, $\hat{f}_n(\mathbf{x}_{new})$ is the predictive mean at $\mathbf{x}_{new}$ given data $\mathcal{D}$, $s_n^2(\mathbf{x})$ and $s_n^2(\mathbf{x}_{new})$ are the predictive variances at $\mathbf{x}$ and $\mathbf{x}_{new}$, respectively, and $\sigma_n(\mathbf{x}, \mathbf{x}_{new})$ is the posterior covariance between $f(\mathbf{x})$ and $f(\mathbf{x}_{new})$. Note that the $d$-dimensional integral in (2.9) can also be efficiently computed in practice; we provide more discussion later in Corollary 1. The proof of this proposition can be found in Appendix A.2 of the supplemental article.

To glean intuition from the criterion (2.9), we compare with the existing sequential IMSE criterion (2.8). Under no censoring in training data (i.e., $n_c = 0$), (2.8) can be rewritten as:

$$\mathrm{IMSE}_n(\mathbf{x}_{new}) = \int s_n^2(\mathbf{x}) \, d\mathbf{x} - \int \frac{\sigma_n^2(\mathbf{x}, \mathbf{x}_{new})}{s_n^2(\mathbf{x}_{new})} \, d\mathbf{x}. \tag{2.10}$$

Comparing (2.10) with (2.9), we notice a key distinction in our new criterion: the presence of the censoring adjustment function $W(\bar{c}_{new})$, where $\bar{c}_{new}$ is the normalized right-censoring limit under the posterior distribution at $\mathbf{x}_{new}$. Fig 4 visualizes $W(\bar{c}_{new})$ for different choices of $\bar{c}_{new}$. Consider first the case of $\bar{c}_{new}$ large. From the figure, we see that $W(\bar{c}_{new}) \to 1$ as $\bar{c}_{new} \to \infty$, in which case the proposed ICMSE criterion (2.9) reduces to the standard IMSE criterion (2.10). Intuitively, this makes sense: a large value of $\bar{c}_{new}$ (i.e., a high right-censoring limit) means that a new observation at $\mathbf{x}_{new}$ has little posterior probability of being censored at $c$. In this case, the ICMSE criterion (which minimizes predictive variance under censoring) should then reduce to the IMSE criterion (which minimizes predictive variance under no censoring). Consider next the case of $\bar{c}_{new}$ small. From the figure, we see that $W(\bar{c}_{new}) \to 0$ as $\bar{c}_{new} \to -\infty$, in which case the proposed criterion (2.9) reduces to the integral of $s_n^2(\mathbf{x})$. Again, this makes intuitive sense: a small value of $\bar{c}_{new}$ (i.e., a low right-censoring limit) means a new observation at $\mathbf{x}_{new}$ has high posterior probability of being censored. In this case, the ICMSE criterion reduces to the integrated predictive variance given only the first $n$ training data points, meaning a new design point at $\mathbf{x}_{new}$ offers no reduction in predictive variance. Viewed this way, the proposed ICMSE criterion modifies the standard IMSE criterion by accounting for the underlying censoring behavior via the censoring adjustment function $W(\cdot)$.

Equation (2.9) also reveals an important trade-off for the proposed design under censoring. Consider first the standard IMSE criterion (2.10), which minimizes predictive uncertainty under no censoring. Since the first term $\int s_n^2(\mathbf{x}) \, d\mathbf{x}$ does not depend on the new design point $\mathbf{x}_{new}$, this uncertainty minimization is achieved by maximizing the second term $\int \sigma_n^2(\mathbf{x}, \mathbf{x}_{new})/s_n^2(\mathbf{x}_{new}) \, d\mathbf{x}$. This can be interpreted as the variance reduction from observing $\mathbf{x}_{new}$ (Gramacy and Apley, 2015). Consider next the proposed ICMSE criterion (2.9), which instead maximizes the term $\int \{\sigma_n^2(\mathbf{x}, \mathbf{x}_{new})/s_n^2(\mathbf{x}_{new})\} W(\bar{c}_{new}) \, d\mathbf{x}$. This can further be broken down into (i) the maximization of the variance reduction term, and (ii) the maximization of the censoring adjustment function $W(\bar{c}_{new})$. Objective (i) is the same as for the standard IMSE criterion: it minimizes predictive uncertainty assuming no response censoring. Objective (ii), by maximizing the censoring adjustment function $W(\bar{c}_{new})$, aims to minimize the posterior probability of the new design point being censored. Putting both parts together, our ICMSE criterion (2.9) features an important trade-off: it aims to find a new design point which jointly minimizes predictive uncertainty (in the absence of censoring) and the posterior probability of being censored in the experiment. This trade-off will prove key to tackling the two motivating applications.

#### 2.2.2 Censoring in training data

We now consider the general case of censored training data $\mathcal{D}$, i.e., $n_c > 0$. The following proposition gives an explicit expression for the ICMSE criterion.

###### Proposition 2.

Suppose the training data contain censored responses, i.e., $n_c > 0$. Then our ICMSE criterion (2.7) has the explicit expression:

$$\mathrm{ICMSE}_n(\mathbf{x}_{new}) = \int s_n^2(\mathbf{x}) \, d\mathbf{x} - \int \boldsymbol{\sigma}_n^T(\mathbf{x}, \mathbf{x}_{new}) \, \mathbf{M}(\mathbf{x}_{new}) \, \boldsymbol{\sigma}_n(\mathbf{x}, \mathbf{x}_{new}) \, d\mathbf{x}. \tag{2.11}$$

Here, $s_n^2(\mathbf{x})$ is the predictive variance at point $\mathbf{x}$ conditional on the data $\mathcal{D}$. The full expression for the matrix $\mathbf{M}(\mathbf{x}_{new})$, while closed-form, is quite long and cumbersome; this expression is provided in Appendix A.3. The key computation in calculating $\mathbf{M}(\mathbf{x}_{new})$ is the evaluation of several orthant probabilities from a multivariate normal distribution, and we provide further computational details in Section 3.3. The proof for this proposition can be found in Appendix A.3 of the supplemental article.

While this general ICMSE criterion (2.11) is more complex, its interpretation is quite similar to that of the earlier criterion: its integrand contains a posterior variance term conditional on data $\mathcal{D}$, and a variance reduction term from the potential observation. The matrix in (2.11) on the variance reduction term serves a similar purpose to the censoring adjustment function. A large value of this matrix (in a matrix sense) suggests a low posterior probability of censoring for a new point, whereas a small value suggests a high posterior probability of censoring. This again results in the important trade-off for sequential design under censoring: the proposed ICMSE criterion aims to find the next design point which not only (i) minimizes predictive uncertainty of the fitted model in the absence of censoring, but also (ii) minimizes the posterior probability that the resulting observation is censored. The latter is adaptively learned from the training data, and is not considered by the standard IMSE criterion.

### 2.3 A toy example

We illustrate the proposed ICMSE criterion using a toy 1-dimensional example. Suppose the physical experiment has the following mean response function:

(2.12)

with measurement noise variance . Further suppose censoring occurs above an upper limit of . We first run an initial 10-run maximin design (Johnson et al., 1990), which results in eight observed points and two censored points. Fig 5(a) visualizes the mean function and the initial design points, along with the corresponding predictive mean and its uncertainty from equations (2.3) and (2.4). Here, we use the Gaussian correlation function. The figure also shows the next design point obtained from the proposed ICMSE criterion (2.11).

Fig 5(b) shows how the next point is obtained, by visualizing the proposed ICMSE criterion (in red). First, the optimal point avoids regions with high probability of censoring: the ICMSE values are high within the censored regions. This is due to the censoring adjustment in (2.11), which accounts for the posterior probability of censoring given data. Second, the next point minimizes the overall predictive uncertainty: the ICMSE values are low in regions away from existing experimental runs, as seen from the local minima of the ICMSE criterion located between training points. To jointly achieve both, our ICMSE design chooses a new point which avoids the actual censored regions, and is also away from observed points.

Fig 5(b) also shows two variations of the existing IMSE design (2.8), which practitioners may use for the censoring problem at hand. The first (“IMSE”) is the IMSE criterion for the uncensored kriging model (2.6) using only the observed data (black line), and the second (“IMSE-censor”) is the IMSE criterion for the censored kriging model (2.4) using the whole training set (blue dotted line). From Fig 5(b), we see that both criteria would choose the next design point within a censored region, which results in a censored response from the experiment. Our ICMSE criterion, factoring in the possibility of censoring, instead suggests a new design point which minimizes predictive uncertainty under censoring.

## 3 Multi-fidelity ICMSE design

Next, we extend the ICMSE design to the multi-fidelity setting, where auxiliary computer experiment data are available. We first present the multi-fidelity modeling framework, and extend the earlier ICMSE criterion to this setting. We then present an algorithmic framework for efficient implementation, and investigate its performance on a 1D toy example.

### 3.1 Modeling framework

Let $f_c(\mathbf{x})$ denote the computer experiment output at input $\mathbf{x}$. We model $f_c$ as the GP:

$$f_c(\cdot) \sim \mathcal{GP}\{\mu_c, \sigma_c^2 r_{\boldsymbol{\theta}_c}(\cdot, \cdot)\}. \tag{3.1}$$

Following Section 2.1, let $f_p(\mathbf{x})$ denote the latent mean response for physical experiments at input $\mathbf{x}$. We assume that $f_p$ takes the form:

$$f_p(\mathbf{x}) = f_c(\mathbf{x}) + \delta(\mathbf{x}), \tag{3.2}$$

where $\delta(\cdot)$ is the so-called discrepancy function, quantifying the difference between computer and physical experiments at input $\mathbf{x}$. Following Kennedy and O’Hagan (2001), we model this discrepancy using a zero-mean GP:

$$\delta(\cdot) \sim \mathcal{GP}\{0, \sigma_\delta^2 r_{\boldsymbol{\theta}_\delta}(\cdot, \cdot)\}, \tag{3.3}$$

where $\delta(\cdot)$ is independent of $f_c(\cdot)$. Here, physical experiments are observed with experimental noise $\epsilon$ as in Section 2.1, whereas computer experiments are observed without noise.

Suppose $m$ computer experiments and $n$ physical experiments ($N = m + n$ experiments in total) are conducted at inputs $\mathbf{X}_c$ and $\mathbf{X}_p$, yielding data $\mathbf{y}_c$ and $\mathcal{D}$. Note that censoring occurs only in physical experiments, since computer experiments are conducted via numerical simulations. Assuming all model parameters are known (parameter estimation is discussed later in Section 3.3), the mean response $f_p(\mathbf{x}_{new})$ at a new input $\mathbf{x}_{new}$ has the following conditional mean and variance:

$$\hat{f}_p(\mathbf{x}_{new}) = \mu(\mathbf{x}_{new}) + \boldsymbol{\sigma}^T(\mathbf{x}_{new})\boldsymbol{\Sigma}^{-1}\left(\mathbb{E}[\mathbf{y} \mid \mathcal{D}] - \boldsymbol{\mu}\right), \tag{3.4}$$

$$s^2(\mathbf{x}_{new}) = \sigma_c^2 + \sigma_\delta^2 - \boldsymbol{\sigma}^T(\mathbf{x}_{new})\boldsymbol{\Sigma}^{-1}\boldsymbol{\sigma}(\mathbf{x}_{new}) + \boldsymbol{\sigma}^T(\mathbf{x}_{new})\boldsymbol{\Sigma}^{-1}\,\mathrm{Var}[\mathbf{y} \mid \mathcal{D}]\,\boldsymbol{\Sigma}^{-1}\boldsymbol{\sigma}(\mathbf{x}_{new}), \tag{3.5}$$

where $\boldsymbol{\sigma}(\mathbf{x}_{new})$ is the covariance vector between $f_p(\mathbf{x}_{new})$ and the $N$ experimental responses, and $\boldsymbol{\Sigma}$ is the $N \times N$ covariance matrix of those responses. Here, $\mathbb{E}[\mathbf{y} \mid \mathcal{D}]$ is the expected response for the latent vector $\mathbf{y}$ given data $\mathcal{D}$, and $\mathrm{Var}[\mathbf{y} \mid \mathcal{D}]$ is its conditional variance. While such equations appear quite involved, they are simply the multi-fidelity extensions of the earlier kriging equations (2.3) and (2.4). For simplicity, we have overloaded some notation from (2.3) and (2.4) here; the difference should be clear from context.
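To clarify how the two GPs combine, the sketch below assembles the joint covariance matrix of computer and physical responses implied by the Kennedy-O'Hagan model (3.1)-(3.3), from which the covariance vector and matrix in (3.4)-(3.5) would be extracted. It assumes Gaussian correlation functions, and all hyperparameter names and values are illustrative, not the authors' implementation.

```python
# Sketch of the joint covariance under (3.1)-(3.3): physical responses
# are f_c + delta (+ noise), computer responses are f_c alone, so all
# cross-covariances follow from the two independent GPs.
import numpy as np

def gauss_cov(X1, X2, sig2, theta):
    d2 = (X1[:, None, :] - X2[None, :, :]) ** 2
    return sig2 * np.exp(-theta * d2.sum(axis=2))

def joint_cov(Xc, Xp, sig2_c, theta_c, sig2_d, theta_d, sig2_eps):
    """Covariance of (computer outputs at Xc, physical outputs at Xp)."""
    Kcc = gauss_cov(Xc, Xc, sig2_c, theta_c)                 # f_c vs f_c
    Kcp = gauss_cov(Xc, Xp, sig2_c, theta_c)                 # f_c vs f_c + delta
    Kpp = (gauss_cov(Xp, Xp, sig2_c, theta_c)
           + gauss_cov(Xp, Xp, sig2_d, theta_d)              # independent delta
           + sig2_eps * np.eye(len(Xp)))                     # physical noise
    return np.block([[Kcc, Kcp], [Kcp.T, Kpp]])
```

Because the discrepancy GP is independent of the computer GP, the discrepancy variance enters only the physical-vs-physical block, which is what makes the multi-fidelity extension of (2.3)-(2.4) mechanical.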

### 3.2 Multi-fidelity design criterion

Now, we extend the ICMSE design to the multi-fidelity setting. The interest is in designing physical experiments (which may be censored), given auxiliary computer experiment data (which are typically not censored). This scenario is encountered in many practical problems, particularly in designing physical experiments for validating computer codes.

Under the presented multi-fidelity model, the following proposition gives an explicit expression for the ICMSE design criterion.

###### Proposition 3.

Under the multi-fidelity model (3.1)-(3.3), the ICMSE criterion (2.7) has the explicit expression:

$$\mathrm{ICMSE}(\mathbf{x}_{new}) = \int s^2(\mathbf{x}) \, d\mathbf{x} - \int \boldsymbol{\sigma}^T(\mathbf{x}, \mathbf{x}_{new}) \, \mathbf{M}(\mathbf{x}_{new}) \, \boldsymbol{\sigma}(\mathbf{x}, \mathbf{x}_{new}) \, d\mathbf{x}. \tag{3.6}$$

The proof can be found in Appendix B.1. The following corollary gives a simplification of (3.6) under a product correlation structure.

###### Corollary 1.

Suppose $r_{\boldsymbol{\theta}_c}$ and $r_{\boldsymbol{\theta}_\delta}$ are product correlation functions:

$$r_{\boldsymbol{\theta}}(\mathbf{x}, \mathbf{x}') = \prod_{k=1}^d r_{\theta_k}(x_k, x_k'), \tag{3.7}$$

with $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_d)$. Then, the ICMSE criterion (3.6) can be further simplified as:

$$\mathrm{ICMSE}(\mathbf{x}_{new}) = \int s^2(\mathbf{x}) \, d\mathbf{x} - \mathrm{tr}\left\{\mathbf{M}(\mathbf{x}_{new}) \, \mathbf{E}(\mathbf{x}_{new})\right\}, \tag{3.8}$$

where $\mathbf{M}(\mathbf{x}_{new})$ is the censoring adjustment matrix in (3.6), and $\mathbf{E}(\mathbf{x}_{new})$ is a matrix with $(i,j)$-th entry:

$$[\mathbf{E}(\mathbf{x}_{new})]_{ij} = \prod_{k=1}^d \int_0^1 r_{\theta_k}(x_k, x_{i,k}) \, r_{\theta_k}(x_k, x_{j,k}) \, dx_k. \tag{3.9}$$

The key simplification from Corollary 1 is that it reduces the -dimensional integral in the ICMSE criterion (3.6) to a product of 1-dimensional integrals, which can be more easily computed. Furthermore, if the 1-dimensional correlation functions are Gaussian, these integrals can be evaluated in closed-form, which yields a nice closed-form design criterion for ICMSE (see Appendix B.2 for details). Given the computational complexities of censored data, this simplification is necessary for efficient design optimization. Corollary 1 is motivated by the simplification of the IMSE criterion in Sacks et al. (1989). The proof can be found in Appendix B.2 of the supplementary article.
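To illustrate why Corollary 1 helps computationally, the following sketch verifies the closed-form 1D integral for a Gaussian correlation function against numerical quadrature. The completing-the-square identity used here is standard; the function name and parameter values are our own.

```python
# Closed form of the 1D integral in (3.9) for a Gaussian correlation:
#   ∫_0^1 exp(-theta (x-a)^2) exp(-theta (x-b)^2) dx.
# Completing the square: (x-a)^2 + (x-b)^2 = 2(x-m)^2 + (a-b)^2/2,
# with m = (a+b)/2, so the product is a scaled Gaussian in x.
import numpy as np
from scipy.stats import norm

def int_gauss_prod(a, b, theta):
    m = 0.5 * (a + b)
    s = 1.0 / (2.0 * np.sqrt(theta))   # sd of the merged Gaussian factor
    return (np.exp(-0.5 * theta * (a - b) ** 2)
            * s * np.sqrt(2 * np.pi)
            * (norm.cdf((1 - m) / s) - norm.cdf((0 - m) / s)))
```

Each entry of the matrix in (3.9) is then a product of $d$ such closed-form evaluations, avoiding any numerical integration during design optimization.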

The interpretation of the multi-fidelity ICMSE criterion (3.6) is analogous to that of the single-fidelity ICMSE criterion (2.11). Similar to the censoring adjustment function, the matrix term in (3.6) factors in the underlying censoring behavior learned from data, and is used to adjust the variance reduction term in the criterion. Viewed this way, the ICMSE criterion (3.6) provides the same design trade-off as before: the next design point should jointly (i) avoid censored regions, by adaptively learning such regions from the data at hand, and (ii) minimize predictive uncertainty from the GP model.

### 3.3 An adaptive algorithm for sequential design

We present next an adaptive algorithm (Algorithm 1) for implementing our ICMSE design. This algorithm applies for both the single-fidelity setting in Section 2 and the multi-fidelity setting in Section 3. First, an initial design is set up for initial experimentation: physical experiments for the single-fidelity setting, and computer experiments for the multi-fidelity setting. In our implementation, we used the maximum projection (MaxPro) design proposed by Joseph et al. (2015), which provides good projection properties and thereby good GP predictive performance. Next, the following two steps are performed iteratively for adaptive sampling: (i) using the observed data, the GP model parameters are estimated via maximum likelihood; (ii) the next design point is then obtained by minimizing the ICMSE criterion (equation (2.11) for single-fidelity, equation (3.6) for multi-fidelity), and its corresponding response is collected. This sequential procedure is repeated until the desired number of samples is obtained.
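The adaptive loop of Algorithm 1 can be sketched as follows. This is a schematic outline only: `run_experiment`, `fit_gp`, and `icmse` are placeholder hooks for the components described in the text (the expensive experiment, maximum likelihood estimation, and the ICMSE criterion), and the candidate-grid minimization stands in for the Nelder-Mead optimization used in the paper.

```python
# Schematic sketch of the sequential ICMSE design loop (Algorithm 1).
# All helper functions are placeholders, not the authors' implementation.
import numpy as np

def sequential_icmse(initial_design, run_experiment, fit_gp, icmse,
                     n_total, candidates, limit):
    X, y, censored = list(initial_design), [], []
    for x in X:                                  # initial experiments
        resp = run_experiment(x)
        y.append(min(resp, limit))               # right-censor at the limit
        censored.append(resp > limit)
    while len(X) < n_total:
        # (i) re-estimate GP parameters from the data observed so far
        params = fit_gp(np.array(X), np.array(y), np.array(censored))
        # (ii) choose the next point by minimizing the ICMSE criterion
        scores = [icmse(x, np.array(X), np.array(y),
                        np.array(censored), params) for x in candidates]
        x_next = candidates[int(np.argmin(scores))]
        resp = run_experiment(x_next)
        X.append(x_next)
        y.append(min(resp, limit))
        censored.append(resp > limit)
    return np.array(X), np.array(y), np.array(censored)
```

Note that the censoring indicators are carried along with the responses, since both the model fit and the criterion depend on which runs were censored.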

To optimize the ICMSE criterion, we use the Nelder-Mead method (Nelder and Mead, 1965) implemented in the R package nloptr (Ypma et al., 2014), in favor of standard gradient-based optimization methods. This is because, while the ICMSE criterion admits a closed-form expression under a product Gaussian correlation structure, its gradient is much more difficult to compute. The main computational bottleneck in optimization is evaluating moments of the truncated multivariate normal distribution (see equations (A.10) and (B.3) in the supplementary article). In our implementation, these moments are efficiently computed using the R package tmvtnorm (Wilhelm and Manjunath, 2010). Appendix C details further computational steps to speed up design optimization.

### 3.4 1D illustration with adaptive design algorithm

We now illustrate the proposed algorithm ICMSE on a toy multi-fidelity 1D example. Suppose the computer simulation and physical experiment follow the mean responses:

(3.10)

(3.11)

with measurement variance . Further assume a right-censoring limit of . We begin with an initial -run maximin design for computer experiments. From this, we then perform a sequential -run design for physical experiments using Algorithm 1. Here, we use the product Gaussian correlation, which, using equation (3.9) in Corollary 1, gives a closed-form ICMSE criterion. The proposed method is compared with the existing IMSE-censor method (see Section 2.3) and the sequential MaxPro method (Joseph, 2016), which provides a sequential implementation of the MaxPro design.

Fig 6(a)-(c) compare the proposed ICMSE method with the two existing methods. Here, the solid blue line marks the computer experiment response, the black line marks the mean response of the physical experiment, and the dotted blue line shows the fitted mean response from (3.4) using all 20 runs. Compared to the two existing methods, the proposed ICMSE design yields visually improved prediction of the true mean response, with the fitted response matching in both censored and uncensored regions. A comparison of MSE confirms this improvement: our ICMSE method has an MSE of , whereas the IMSE-censor and sequential MaxPro methods have MSEs of and , respectively. Note that the IMSE-censor method is terminated early after 13 runs; this is because many sequential design points are very close together, which causes numerical instability and expensive computation of the emulation formula.

One reason for this improvement is that the proposed ICMSE criterion captures the aforementioned trade-off in choosing points which jointly (i) avoid censored regions and (ii) minimize predictive uncertainty. For (i), note that only 1/20 = 5% of sequential runs are censored for ICMSE in this example, whereas 8/20 = 40% and 11/13 ≈ 85% of sequential runs are censored for sequential MaxPro and IMSE-censor, respectively. This shows that our method indeed adaptively learns the underlying censoring behavior of the black-box experiment from data, and avoids sampling in such regions. For (ii), we see from Fig 6 that the sequential runs from ICMSE are not only spaced out from existing points, but also concentrated near the boundary of the censored region. Intuitively, this minimizes predictive uncertainty by ensuring design points explore the input space well, while avoiding loss of information due to censoring.

## 4 Case studies

We now return to the two motivating applications. For the wafer manufacturing application (which has only one type of experiment), we use the single-fidelity ICMSE designs in Section 2. For the surgical planning application (which has both computer and physical experiments), we use the multi-fidelity ICMSE designs in Section 3.

### 4.1 Thermal processing in wafer manufacturing

Consider first the wafer manufacturing application in Section 1.2, where an engineer is interested in how certain process inputs affect the heating performance of a wafer chip. There are six input variables: the first two control the wafer thickness and its rotation speed, the next three control the heating laser (i.e., its moving speed, radius, and power), and the last is the heating time. The response of interest is the minimum temperature over the wafer, which provides an indication of the wafer’s quality after thermal processing. Standard industrial temperature sensors have a measurement limit of °C (Thermo Electric Company, 2010), and temperatures greater than this limit are censored in the experiment.

As mentioned earlier, certain physical experiments are not only costly (e.g., wafers and laser operation can be expensive), but also time-consuming to perform (e.g., each experiment requires a re-calibration of thermal sensors, as well as a warmup and cooldown of the laser beam). In order to compare the sequential performance of these methods over a large number of runs, we mimic the costly physical experiments with COMSOL Multiphysics simulations (Fig 7(a)), which provide a realistic representation of heat diffusion physics (Dickinson et al., 2014).¹ Measurement noise is then added, following an i.i.d. zero-mean normal distribution with standard deviation °C.

¹ The surgical planning application in Section 4.2 performs actual physical experiments, but provides fewer sequential runs due to the expensive nature of such experiments.

The set-up is as follows. We first start with a 30-run initial experiment, then perform 45 sequential runs. Note that the total number of runs is slightly more than the rule-of-thumb sample size of 10d recommended by Loeppky et al. (2009); this is to ensure good predictive accuracy under censoring. As in Section 3.4, the proposed ICMSE method is compared with the sequential MaxPro method. Since both the IMSE and IMSE-censor methods lead to very poor predictive models under censoring and can be very time-consuming to perform, we remove them from our comparison. The fitted GP models from each design method are then tested on temperature data generated (without noise) on a 200-run Sobol’ sequence (Sobol’, 1967). Of these 200 test samples, 25 have minimum temperatures which exceed the censoring limit, suggesting that roughly 12.5% of the design space leads to censoring.
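The evaluation protocol above can be sketched as follows: a Sobol' test set is generated over the input domain, and a fitted predictor is scored by its root mean-squared error against the noise-free ground truth. Here `predictor` and `truth` are hypothetical stand-ins for the fitted GP model and the simulator, respectively.

```python
import numpy as np
from scipy.stats import qmc

def rmse(predictor, truth, n_test=200, dim=6, seed=0):
    # Score a predictor by RMSE on a Sobol' test set over [0, 1]^dim.
    sampler = qmc.Sobol(d=dim, scramble=True, seed=seed)
    X_test = sampler.random(n_test)
    err = predictor(X_test) - truth(X_test)
    return float(np.sqrt(np.mean(err**2)))

truth = lambda X: np.sin(X.sum(axis=1))          # toy ground truth
pred = lambda X: np.sin(X.sum(axis=1)) + 0.1     # predictor with constant bias
```

With a constant prediction bias, the RMSE equals that bias; in practice the predictor would be the GP fit from each design method.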

#### 4.1.1 Predictive performance

Fig 7(b) compares the root mean-squared error (RMSE) on the testing set over the sequential runs, for the two design methods. We see that, while both sequential methods provide a relatively steady improvement in predictive accuracy, the proposed ICMSE method gives noticeably better predictive accuracy than sequential MaxPro. In particular, over the 45 sequential runs, our method achieves an RMSE reduction (relative to the initial 30 runs) that is more than two times greater than that of sequential MaxPro. This can again be explained by the fact that our design method jointly avoids censoring and minimizes predictive uncertainty. The proposed ICMSE method uses the fitted GP model to identify regions with a low probability of censoring, then targets design points within such regions to maximize information from experiments. We observe that our method yields no censored measurements, whereas sequential MaxPro yields five censored measurements (a censoring rate of 5/45 ≈ 11%). Moreover, our method adaptively chooses points which minimize the predictive uncertainty of the GP model under censoring. This is shown in Fig 7(b): while the RMSE for sequential MaxPro appears to stagnate, our method gives increasingly lower RMSE as the sample size grows. Under censoring, the proposed ICMSE method thus provides noticeably improved predictive performance for wafer manufacturing, compared to existing design methods.

### 4.2 3D-printed aortic valves for surgical planning

Consider next the surgical planning application in Section 1.1, which uses state-of-the-art 3D printing technology to mimic biological tissues. Here, doctors are interested in predicting the stiffness of the printed organs under different metamaterial geometries. We consider three design inputs, which parametrize a standard sinusoidal form of the substructure curve together with its diameter (see Fig 2(b) for a visualization). This parametric form has been shown to provide effective tissue-mimicking performance in prior studies (Wang et al., 2016; Chen et al., 2018a). The response of interest is the modulus at a strain level of 8%, which quantifies the stiffness under a load situation similar to that inside the human body (Wang et al., 2016).

We use the multi-fidelity ICMSE design framework in Section 3, since both computer and physical experiments can be performed. Computer simulations are performed with finite element analysis (Zienkiewicz et al., 1977), using COMSOL Multiphysics. The computation time for one simulation run is around 30 minutes on 24 Intel Xeon E5-2650 2.20GHz processing cores. As for physical experiments, samples are first 3D-printed by the Connex 350 machine (Stratasys Ltd.), and the stiffness is then measured by a load cell (see Fig 2(c)) using uniaxial tensile tests (Wang et al., 2016). Note that physical experiments can be much more costly than computer simulations, requiring expensive material and printing costs, as well as several hours of an experimenter’s time per sample. Here, censoring is present only in physical experiments; it occurs when the force measurement of the load cell exceeds its standard limit, which corresponds to an upper limit on the measured modulus.

The following design set-up is used. We start with a 25-run initial computer experiment design, and then perform eight sequential runs using physical experiments. The limited number of sequential runs reflects the urgent demands of patients; in such cases, only one to two days of surgical planning can be afforded (Chen et al., 2018a). Since each physical experiment requires tedious 3D printing and a tensile test (around 1.5 hours per run), only a handful of runs can be performed in urgent cases. As before, we compare our design approach with the sequential MaxPro method. The fitted GP models from both methods are tested on physical experiment data from a 20-run Sobol’ sequence. Among these 20 runs, five are censored due to the load cell limit; in such cases, we re-perform the experiment using a different testing machine with a wider measurement range. Note that this is typically not feasible in urgent surgical scenarios, since it requires even more time-consuming tests and higher material costs.

#### 4.2.1 Predictive performance

Fig 8 compares the RMSE over the eight sequential runs from the two design methods. We see that, while sequential MaxPro shows some stagnation in predictive error improvement, the proposed ICMSE design provides more noticeable improvements over the sequential procedure. More specifically, our method achieves an RMSE reduction over the initial fitted GP model (fitted using 25 computer experiment runs) that is around three times greater than that of the sequential MaxPro method. This improvement can again be attributed to the key design trade-off. First, ICMSE adaptively identifies and avoids censored regions of the design space, using the fitted multi-fidelity model (3.4). Our ICMSE yields no censored measurements, whereas sequential MaxPro yields three censored measurements (a censoring rate of 3/8 ≈ 38%). Furthermore, the proposed method minimizes the predictive uncertainty of the fitted multi-fidelity model under censoring. In contrast, the sequential MaxPro method selects sequential runs to minimize the MaxPro criterion of the concatenated computer and physical experiment designs. While this encourages physical experiment runs to be “space-filling” with respect to the initial computer experiment runs, the resulting design is not adaptive to the underlying censoring mechanism, and can yield poor predictive performance, as shown in Fig 8.

We next investigate the predictive performance of both designs within censored regions. This is of interest because valves in such regions (i.e., stiff valves) can be used to mimic older patients (Sicard et al., 2018). We divide the test set (20 runs in total) into two categories: observed runs (15 in total) and censored runs (five in total, with the uncensored responses obtained via additional experiments). Table 1 compares the RMSE of the two methods for both censored and uncensored test runs. For both methods, the RMSE for observed test runs is much smaller than that for censored test runs, as expected. For observed test runs, ICMSE performs noticeably better than sequential MaxPro, with lower RMSE. This is not surprising, since ICMSE has more uncensored sequential runs than sequential MaxPro. For censored test runs, ICMSE also performs slightly better than sequential MaxPro, with lower RMSE. One reason is that the ICMSE criterion encourages new runs near (but not within) the censored regions of the fitted model (see Fig 6), in an effort to maximize information under censoring. Because of this adaptivity, our method achieves better predictive performance within the censored region of the design space, without placing any sequential runs in this region.
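The Table 1 style comparison can be sketched as follows: split the test responses into censored and observed groups, and report RMSE separately for each. The arrays below are hypothetical illustrative data, not the paper's measurements.

```python
import numpy as np

def rmse_by_group(y_true, y_pred, is_censored):
    # RMSE computed separately for observed and censored test runs.
    out = {}
    for name, mask in [("observed", ~is_censored), ("censored", is_censored)]:
        err = y_true[mask] - y_pred[mask]
        out[name] = float(np.sqrt(np.mean(err**2)))
    return out

# Hypothetical test data: larger errors on the (stiffer) censored runs.
y_true = np.array([1.0, 2.0, 3.0, 10.0, 12.0])
y_pred = np.array([1.1, 1.9, 3.0, 9.0, 13.0])
cens = np.array([False, False, False, True, True])
table = rmse_by_group(y_true, y_pred, cens)
```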

#### 4.2.2 Discrepancy modeling

In addition to improved predictive performance, our design method can also yield valuable insights on the discrepancy between computer simulation and reality. The learning of this discrepancy from data is important for several reasons: it allows doctors to (i) pinpoint where simulations may be unreliable, (ii) identify potential root causes for this discrepancy, and (iii) improve the simulation model to better mimic reality. In our modeling framework, this discrepancy can be estimated as:

$$\hat{\delta}(\mathbf{x}) = \hat{f}_p(\mathbf{x}) - \hat{f}_c(\mathbf{x}), \qquad (4.1)$$

where $\hat{f}_p(\mathbf{x})$ is the predictor for the physical experiment mean, fitted using the 25 initial computer experiment runs and the eight physical experiment runs, and $\hat{f}_c(\mathbf{x})$ is the computer experiment model fitted using only the 25 initial runs.
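The discrepancy estimate in (4.1) amounts to evaluating the difference of the two fitted predictors over the design domain. The sketch below uses toy stand-in predictors (a simulation that underestimates the physical response by a constant offset); in practice both would be fitted GP models.

```python
import numpy as np

def discrepancy(predict_physical, predict_computer, X):
    # Estimated discrepancy (4.1): physical-mean predictor minus
    # computer-model predictor, evaluated at the rows of X.
    return predict_physical(X) - predict_computer(X)

# Toy stand-ins: simulation underestimates stiffness by a fixed offset.
predict_computer = lambda X: X.sum(axis=1)
predict_physical = lambda X: X.sum(axis=1) + 0.2

grid = np.random.default_rng(1).uniform(size=(50, 3))
delta = discrepancy(predict_physical, predict_computer, grid)
```

A mostly positive estimated discrepancy, as in this toy setting, corresponds to the finding below that simulations tend to under-predict the physically measured stiffness.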

Fig 9 shows the fitted discrepancy as a function of each pair of design inputs, with the third input fixed. These plots reveal several interesting insights. First, when the diameter is moderate, Fig 9(a) and (b) show that the discrepancy is quite small; however, when the diameter is small or large, the discrepancy can be quite large. This is related to the limitations of finite element modeling. When the diameter is small, simulations can be inaccurate, since the mesh size is relatively large compared to the diameter. When the diameter is large, simulations can again be inaccurate, due to violation of the perfect interface assumption between the two printed polymers. Second, from Fig 9, the model discrepancy also appears to be largest when all design inputs are large (i.e., close to 1). This suggests that simulations can be unreliable when the stiff material is both thick and highly fluctuating. Finally, the model discrepancy is mostly positive over the design domain, indicating that simulations tend to yield smaller stiffness estimates than the physical measurements. This may be caused by the hardening of 3D-printed samples due to exposure to natural light, an aging property of this polymer family (e.g., see Liao et al., 1998). Therefore, the printed aortic valves should be stored in dark storage cells during surgical planning, to minimize exposure to light.

## 5 Conclusion

In this paper, we proposed a novel integrated censored mean-squared error (ICMSE) method for adaptive design of physical experiments under response censoring. This new design method iteratively performs the following two steps: it first learns the underlying censoring behavior of the expensive experiment from data, then selects the next design point yielding the greatest reduction in average predictive uncertainty under censoring. This can also be viewed as jointly minimizing the predictive uncertainty of the fitted model and the probability that the resulting observation is censored. We derived closed-form expressions for the new ICMSE design criterion in both the single-fidelity and multi-fidelity settings, and presented an adaptive design algorithm for efficient implementation. We then demonstrated the effectiveness of the proposed ICMSE design over existing methods in two real-world applications: 3D-printed aortic valves for surgical planning, and thermal processing in wafer manufacturing.

## References

- Ankenman et al. (2010) Ankenman, B., Nelson, B. L., and Staum, J. (2010). Stochastic kriging for simulation metamodeling. Operations Research, 58(2):371–382.
- Bect et al. (2019) Bect, J., Bachoc, F., Ginsbourger, D., et al. (2019). A supermartingale approach to Gaussian process based sequential design of experiments. Bernoulli, 25(4A):2883–2919.
- Binois et al. (2019) Binois, M., Huang, J., Gramacy, R. B., and Ludkovski, M. (2019). Replication or exploration? Sequential design for stochastic simulation experiments. Technometrics, 61(1):7–23.
- Brooks (1982) Brooks, R. (1982). On the loss of information through censoring. Biometrika, 69(1):137–144.
- Cao et al. (2018) Cao, F., Ba, S., Brenneman, W. A., and Joseph, V. R. (2018). Model calibration with censored data. Technometrics, 60(2):255–262.
- Chen et al. (2019) Chen, J., Mak, S., Joseph, V. R., and Zhang, C. (2019). Function-on-function kriging, with applications to 3D printing of aortic tissues. arXiv preprint arXiv:1910.01754.
- Chen et al. (2018a) Chen, J., Wang, K., Zhang, C., and Wang, B. (2018a). An efficient statistical approach to design 3D-printed metamaterials for mimicking mechanical properties of soft biological tissues. Additive Manufacturing, 24:341–352.
- Chen et al. (2018b) Chen, J., Xie, Y., Wang, K., Wang, Z. H., Lahoti, G., Zhang, C., Vannan, M. A., Wang, B., and Qian, Z. (2018b). Generative invertible networks (GIN): Pathophysiology-interpretable feature mapping and virtual patient generation. arXiv preprint arXiv:1808.04495.
- Chen et al. (2017) Chen, R.-B., Wang, W., and Wu, C. F. J. (2017). Sequential designs based on Bayesian uncertainty quantification in sparse representation surrogate modeling. Technometrics, 59(2):139–152.
- Currin et al. (1991) Currin, C., Mitchell, T., Morris, M., and Ylvisaker, D. (1991). Bayesian prediction of deterministic functions, with applications to the design and analysis of computer experiments. Journal of the American Statistical Association, 86(416):953–963.
- Dickinson et al. (2014) Dickinson, E. J., Ekström, H., and Fontes, E. (2014). COMSOL Multiphysics®: Finite element software for electrochemical analysis. a mini-review. Electrochemistry Communications, 40:71–74.
- Feteira (2009) Feteira, A. (2009). Negative temperature coefficient resistance (NTCR) ceramic thermistors: an industrial perspective. Journal of the American Ceramic Society, 92(5):967–983.
- Gibson et al. (2014) Gibson, I., Rosen, D. W., and Stucker, B. (2014). Additive Manufacturing Technologies, volume 17. Springer.
- Gramacy and Apley (2015) Gramacy, R. B. and Apley, D. W. (2015). Local Gaussian process approximation for large computer experiments. Journal of Computational and Graphical Statistics, 24(2):561–578.
- Jin et al. (2012) Jin, R., Chang, C.-J., and Shi, J. (2012). Sequential measurement strategy for wafer geometric profile estimation. IIE Transactions, 44(1):1–12.
- Johnson et al. (1990) Johnson, M. E., Moore, L. M., and Ylvisaker, D. (1990). Minimax and maximin distance designs. Journal of Statistical Planning and Inference, 26(2):131–148.
- Joseph (2016) Joseph, V. R. (2016). Rejoinder. Quality Engineering, 28(1):42–44.
- Joseph et al. (2015) Joseph, V. R., Gul, E., and Ba, S. (2015). Maximum projection designs for computer experiments. Biometrika, 102(2):371–380.
- Kaufman et al. (2011) Kaufman, C. G., Bingham, D., Habib, S., Heitmann, K., and Frieman, J. A. (2011). Efficient emulators of computer experiments using compactly supported correlation functions, with an application to cosmology. The Annals of Applied Statistics, 5(4):2470–2492.
- Kennedy and O’Hagan (2001) Kennedy, M. C. and O’Hagan, A. (2001). Bayesian calibration of computer models. Journal of the Royal Statistical Society: Series B, 63(3):425–464.
- Lam (2008) Lam, C. Q. (2008). Sequential Adaptive Designs in Computer Experiments for Response Surface Model Fit. PhD thesis, The Ohio State University.
- Liao et al. (1998) Liao, K., Schultesiz, C. R., Hunston, D. L., and Brinson, L. C. (1998). Long-term durability of fiber-reinforced polymer-matrix composite materials for infrastructure applications: a review. Journal of Advanced Materials, 30(4):3–40.
- Loeppky et al. (2009) Loeppky, J. L., Sacks, J., and Welch, W. J. (2009). Choosing the sample size of a computer experiment: A practical guide. Technometrics, 51(4):366–376.
- Mak et al. (2018) Mak, S., Sung, C.-L., Wang, X., Yeh, S. T., Chang, Y. H., Joseph, V. R., Yang, V., and Wu, C. F. J. (2018). An efficient surrogate model for emulation and physics extraction of large eddy simulations. Journal of the American Statistical Association, 113(524):1443–1456.
- Matheron (1963) Matheron, G. (1963). Principles of geostatistics. Economic Geology, 58(8):1246–1266.
- Morris and Mitchell (1995) Morris, M. D. and Mitchell, T. J. (1995). Exploratory designs for computational experiments. Journal of Statistical Planning and Inference, 43(3):381–402.
- Nelder and Mead (1965) Nelder, J. A. and Mead, R. (1965). A simplex method for function minimization. The Computer Journal, 7(4):308–313.
- Qian et al. (2017) Qian, Z., Wang, K., Liu, S., Zhou, X., Rajagopal, V., Meduri, C., Kauten, J. R., Chang, Y.-H., Wu, C., and Zhang, C. (2017). Quantitative prediction of paravalvular leak in transcatheter aortic valve replacement based on tissue-mimicking 3D printing. JACC: Cardiovascular Imaging, 10(7):719–731.
- Quirk and Serda (2001) Quirk, M. and Serda, J. (2001). Semiconductor Manufacturing Technology, volume 1. Prentice Hall Upper Saddle River, NJ.
- Rengier et al. (2010) Rengier, F., Mehndiratta, A., Von Tengg-Kobligk, H., Zechmann, C. M., Unterhinninghofen, R., Kauczor, H.-U., and Giesel, F. L. (2010). 3D printing based on imaging data: review of medical applications. International Journal of Computer Assisted Radiology and Surgery, 5(4):335–341.
- Sacks et al. (1989) Sacks, J., Welch, W. J., Mitchell, T. J., and Wynn, H. P. (1989). Design and analysis of computer experiments. Statistical Science, 4(4):409–423.
- Santner et al. (2013) Santner, T. J., Williams, B. J., and Notz, W. I. (2013). The Design and Analysis of Computer Experiments. Springer Science & Business Media.
- Shewry and Wynn (1987) Shewry, M. C. and Wynn, H. P. (1987). Maximum entropy sampling. Journal of Applied Statistics, 14(2):165–170.
- Sicard et al. (2018) Sicard, D., Haak, A. J., Choi, K. M., Craig, A. R., Fredenburgh, L. E., and Tschumperlin, D. J. (2018). Aging and anatomical variations in lung tissue stiffness. American Journal of Physiology-Lung Cellular and Molecular Physiology, 314(6):L946–L955.
- Singh et al. (2000) Singh, R., Fakhruddin, M., and Poole, K. (2000). Rapid photothermal processing as a semiconductor manufacturing technology for the 21st century. Applied Surface Science, 168(1-4):198–203.
- Sobol’ (1967) Sobol’, I. M. (1967). On the distribution of points in a cube and the approximate evaluation of integrals. Zhurnal Vychislitel’noi Matematiki i Matematicheskoi Fiziki, 7(4):784–802.
- Thermo Electric Company (2010) Thermo Electric Company (2010). Wafer sensors. http://www.te-direct.com/products/silicon-wafers/.
- Van Gurp and Palmen (1998) Van Gurp, M. and Palmen, J. (1998). Time-temperature superposition for polymeric blends. Rheology Bulletin, 67(1):5–8.
- Wang et al. (2016) Wang, K., Wu, C., Qian, Z., Zhang, C., Wang, B., and Vannan, M. A. (2016). Dual-material 3D printed metamaterials with tunable mechanical properties for patient-specific tissue-mimicking phantoms. Additive Manufacturing, 12:31–37.
- Wilhelm and Manjunath (2010) Wilhelm, S. and Manjunath, B. G. (2010). tmvtnorm: A package for the truncated multivariate normal distribution. R Journal, 2(1):25–29.
- Xiong et al. (2013) Xiong, S., Qian, P. Z., and Wu, C. F. J. (2013). Sequential design and analysis of high-accuracy and low-accuracy computer codes. Technometrics, 55(1):37–46.
- Ypma et al. (2014) Ypma, J., Borchers, H., and Eddelbuettel, D. (2014). nloptr: R interface to NLopt. R Journal, 1(4).
- Zienkiewicz et al. (1977) Zienkiewicz, O. C., Taylor, R. L., Zienkiewicz, O. C., and Taylor, R. L. (1977). The Finite Element Method, volume 36. McGraw-Hill London.