# Jointly Robust Prior for Gaussian Stochastic Process in Emulation, Calibration and Variable Selection

Gaussian stochastic processes (GaSP) have been widely used in two fundamental problems in uncertainty quantification, namely the emulation and calibration of mathematical models. Some objective priors, such as the reference prior, have been studied in the context of emulating (approximating) computationally expensive mathematical models. In this work, we introduce a new class of priors, called the jointly robust prior, for both emulation and calibration. This prior is designed to maintain various advantages of the reference prior. In emulation, the jointly robust prior has a tail decay rate similar to that of the reference prior, and it is computationally simpler than the reference prior in parameter estimation. Moreover, the marginal posterior mode estimation with the jointly robust prior can separate the influential and inert inputs in mathematical models, while the reference prior does not have this property. We establish the posterior propriety for a large class of priors in calibration, including the reference prior and the jointly robust prior in general scenarios; the jointly robust prior is preferred because the calibrated mathematical model typically predicts the reality well. The jointly robust prior is used as the default prior in two new R packages, called "RobustGaSP" and "RobustCalibration", available on CRAN for emulation and calibration, respectively.


## 1 Introduction

A central part of modern uncertainty quantification (UQ) is to describe natural and social phenomena by a system of mathematical models or equations. Some mathematical models are implemented as computer code in an effort to reproduce the behavior of complicated processes in science and engineering. These mathematical models are called computer models or simulators; they map a set of inputs, such as initial conditions and model parameters, to a real-valued output.

Many computer models are prohibitively slow, and it is thus vital to develop a fast statistical surrogate to emulate (approximate) the outcomes of the computer model, based on runs at a set of pre-specified design inputs. This problem is often referred to as the emulation problem. Another fundamental problem in UQ is called the inverse problem or calibration, where field data are used to estimate the unobservable calibration parameters in the mathematical model. As the mathematical model may describe reality imprecisely, it is usual to address the misspecification by a discrepancy function. Emulation and calibration are the main focus of many recent studies in UQ (Bayarri et al., 2007; Higdon et al., 2008; Bayarri et al., 2009; Liu et al., 2009; Conti and O'Hagan, 2010).

A Gaussian stochastic process (GaSP) is prevalent for emulating expensive computer models (Sacks et al., 1989; Bastos and O'Hagan, 2009) for several reasons. First of all, many computer models are deterministic, or close to deterministic, and thus the emulator is often required to be an interpolator, meaning that the predictions by the emulator are equal to the outputs at the design inputs. The GaSP emulator is an interpolator, and it can easily be adapted to emulate stochastic computer model outputs by adding a noise term. Second, the number of runs of the computer model used to construct a GaSP emulator is often relatively small, roughly $10 p_x$ by the "folklore" notion, where $p_x$ is the dimension of the inputs. Third, the GaSP emulator has an internal assessment of the accuracy in prediction, which allows the uncertainty to propagate through the emulator. The GaSP is also widely used to model the discrepancy function in calibration (Kennedy and O'Hagan, 2001; Bayarri et al., 2007), as combining the calibrated computer model and the discrepancy function typically improves the predictive accuracy over prediction using the computer model alone.

The GaSP model used in emulation and calibration is rather different from the one used for modeling spatially correlated data. The key difference is that the input space of the computer model usually has multiple dimensions with completely different scales. The isotropic assumption is thus too restrictive. Instead, for any inputs $\mathbf{x}_a$ and $\mathbf{x}_b$ with $p_x$ dimensions, one often assumes a product correlation (Sacks et al. (1989))

 c(\mathbf{x}_a, \mathbf{x}_b) = \prod_{l=1}^{p_x} c_l(x_{al}, x_{bl}), \qquad (1.1)

where each $c_l(\cdot, \cdot)$ is a one-dimensional isotropic correlation function for the $l$th coordinate of the input, typically having an unknown range parameter $\gamma_l$ and a fixed roughness parameter $\alpha_l$, for $l = 1, \dots, p_x$. This choice of the correlation will be used herein due to its flexibility in modeling correlation and its tractability in computation.

The performance of a GaSP model in emulation and calibration depends critically on the parameter estimation of the GaSP model. For the emulation problem, it has been recognized in many studies that some routinely used methods, such as the maximum likelihood estimator (MLE), produce unstable estimates of the correlation parameters (Oakley, 1999; Lopes, 2011). The instability in parameter estimation results in a great loss of predictive accuracy, as the covariance matrix is estimated to be near-singular or near-diagonal. This problem is partly overcome by the use of the reference prior (Berger et al., 2001; Paulo, 2005), where the marginal posterior mode estimation under certain parameterizations eliminates these two unwelcome scenarios (Gu et al. (2018b)).

Other than the reference prior, many proper and improper priors were previously studied for the GaSP model in emulation and calibration, often in a product form over various parameterizations, including the inverse range parameter $\beta_l = 1/\gamma_l$, the natural logarithm of the inverse range parameter $\xi_l = \log(1/\gamma_l)$, and the correlation parameter $\rho_l$, for $l = 1, \dots, p_x$. For example, independent priors on transformed range parameters were utilized in Kennedy and O'Hagan (2001) and in Conti and O'Hagan (2010). An independent beta prior for $\rho_l$ is utilized in Higdon et al. (2008), and a spike-and-slab prior for the same parameterization is used in Savitsky et al. (2011). Though eliciting prior information has been discussed in the literature (Oakley (2002)), it is rather hard to faithfully transform subjective prior knowledge to the GaSP model with the product correlation function in (1.1).

In this work, we propose a new class of priors, called the jointly robust (JR) prior, for both the emulation problem and the calibration problem. This prior maintains most of the advantages of the reference prior in emulation, and it has a closed-form normalizing constant, moments and derivatives. In comparison, although the computational cost of evaluating the reference prior is normally acceptable, the derivatives of the reference prior are more computationally expensive. In practice, numerical derivatives of the reference prior are often used for the marginal posterior mode estimation, which slows down the computation. Moreover, the prior moments and the normalizing constant of the range parameters under the reference prior are unknown and hard to compute, because of the near-singular correlation matrix when all range parameters are large.

We also establish the posterior propriety of a wide class of priors for the calibration problem, including the reference prior and the JR prior in general scenarios. The identifiability problem of the calibration parameters was found in many previous studies (Arendt et al. (2012); Tuo and Wu (2015)), partly due to the large correlation estimated from the data (Gu and Wang (2017)). Though the posteriors under the reference prior and the JR prior are both shown to be proper for the calibration problem in this work, the density of the JR prior has a slightly larger slope than the density of the reference prior when the range parameters in the covariance function become large, preventing the correlation from being estimated to be too large. Numerical results on the advantages of using the JR prior over the reference prior in calibration will be discussed. Two R packages, called "RobustGaSP" and "RobustCalibration", are developed for the emulation and calibration problems, and the jointly robust prior is used as the default choice in both packages (Gu et al. (2018a); Gu (2018)).

Furthermore, another advantage of the JR prior is that it can identify the inert inputs efficiently through the marginal posterior mode of a full model, whereas the mode with the reference prior does not have this feature. The inert inputs are the ones that barely affect the outputs of the computer model. Having inert inputs is a fairly common scenario with computer models. For example, in the TITAN2D computer model for simulating volcanic eruptions (Bayarri et al., 2009), the internal friction angle has a negligible effect on the output. In emulation, keeping an inert input can sometimes result in worse prediction than simply omitting it, and in calibration, one may hope to spend more effort calibrating the influential inputs than the inert inputs. The full Bayesian variable selection of the inputs in a computer model is often prohibitively slow, as each evaluation of the likelihood is computationally expensive, whereas the marginal posterior mode with the JR prior is much faster for identifying the inert inputs, as discussed in Section 4.1.

Like the reference prior, but unlike other frequently used priors, the new class of priors studied in this work is not a product of marginal priors of the range parameters or their transformations. The advantage is that the marginal posterior mode estimation with the new prior is both robust and useful in identifying the inert inputs in the mathematical model. Using a product of marginal priors of the parameters in the covariance function may not achieve both properties at the same time.

The paper is organized as follows. In Section 2, we review the GaSP model in emulation and calibration, exploring benefits of the reference prior in emulation that were not noticed before. A general theorem on the posterior propriety is also derived in the calibration setting. In Section 3, we introduce the JR prior and compare it with the reference prior in calibration and emulation. The variable selection problem is introduced in Section 4. Numerical studies of using the JR prior for emulation, variable selection and calibration are discussed in Section 5. We conclude the paper in Section 6.

## 2 Gaussian stochastic process model

In this section, we first briefly introduce the GaSP model in Section 2.1. The model will be extended for emulation and calibration in Section 2.2 and Section 2.3, respectively. The posterior propriety will also be studied for the calibration problem in Section 2.3.

### 2.1 Background

To begin with, consider a stationary Gaussian stochastic process on a $p_x$-dimensional input space $\mathcal{X}$,

 y(\cdot) \sim \mathrm{GaSP}(\mu(\cdot), \sigma^2 c(\cdot, \cdot)), \qquad (2.1)

where $\mu(\cdot)$ and $\sigma^2 c(\cdot, \cdot)$ are the mean and covariance functions, respectively. For any inputs $\{\mathbf{x}_1, \dots, \mathbf{x}_n\}$, the marginal distribution follows a multivariate normal distribution,

 (y(\mathbf{x}_1), \dots, y(\mathbf{x}_n))^T \sim \mathcal{MN}(\bm{\mu}, \sigma^2 \mathbf{R}),

where $\bm{\mu} = (\mu(\mathbf{x}_1), \dots, \mu(\mathbf{x}_n))^T$ is an $n$-dimensional mean vector, and $\sigma^2\mathbf{R}$ is an $n \times n$ covariance matrix with the $(i,j)$ entry being $\sigma^2 c(\mathbf{x}_i, \mathbf{x}_j)$, where $c(\cdot, \cdot)$ is a correlation function.

The mean function for any input $\mathbf{x}$ is typically modeled via the regression

 \mu(\mathbf{x}) = \mathbf{h}(\mathbf{x})\bm{\theta}_m = \sum_{t=1}^{q} h_t(\mathbf{x})\theta_{mt}, \qquad (2.2)

where $\mathbf{h}(\mathbf{x}) = (h_1(\mathbf{x}), \dots, h_q(\mathbf{x}))$ is a row vector of the mean basis functions and $\theta_{mt}$ is the unknown regression parameter of the basis function $h_t(\mathbf{x})$, for $t = 1, \dots, q$, with $q$ being the number of mean basis functions specified in the model.

The product correlation function in (1.1) is assumed and thus the correlation matrix is $\mathbf{R} = \mathbf{R}_1 \circ \mathbf{R}_2 \circ \cdots \circ \mathbf{R}_{p_x}$, where $\circ$ is the Hadamard product. The $(i,j)$ entry of $\mathbf{R}_l$ is parameterized by $c_l(\cdot, \cdot)$, a one-dimensional correlation function for the $l$th coordinate of the input, $l = 1, \dots, p_x$. We focus on two classes of widely used correlation functions: the power exponential correlation and the Matérn correlation. Define $d_l = |x_{al} - x_{bl}|$ for any inputs $\mathbf{x}_a$ and $\mathbf{x}_b$. The power exponential correlation has the form

 c_l(d_l) = \exp\left\{ -\left( \frac{d_l}{\gamma_l} \right)^{\alpha_l} \right\}, \qquad (2.3)

where $\gamma_l$ is an unknown nonnegative range parameter to be estimated and $\alpha_l \in (0, 2]$ is a fixed roughness parameter, often chosen to be a value close to 2 to avoid the numerical problems that arise when $\alpha_l = 2$ (Bayarri et al., 2009; Gu and Berger, 2016).

The Matérn correlation has the following form

 c_l(d_l) = \frac{1}{2^{\alpha_l - 1}\Gamma(\alpha_l)} \left( \frac{d_l}{\gamma_l} \right)^{\alpha_l} K_{\alpha_l}\!\left( \frac{d_l}{\gamma_l} \right), \qquad (2.4)

where $\Gamma(\cdot)$ is the gamma function and $K_{\alpha_l}(\cdot)$ is the modified Bessel function of the second kind, with the range parameter and roughness parameter being $\gamma_l$ and $\alpha_l$, respectively. The Matérn correlation has a closed-form expression when $\alpha_l = (2k+1)/2$ for $k \in \mathbb{N}$, and it becomes the exponential correlation and the Gaussian correlation when $\alpha_l = 1/2$ and $\alpha_l \to \infty$, respectively. Though we focus on these two classes of correlation functions, the results are applicable to the other correlation functions discussed in Gu et al. (2018b).
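
Both R packages implement these correlation functions internally; as an illustration only, a minimal Python sketch of the power exponential correlation in (2.3), a Matérn correlation with roughness $\alpha_l = 5/2$ (one of the closed-form cases of (2.4)), and the product correlation in (1.1) might look as follows (all function names here are ours, not the packages'):

```python
import numpy as np

def power_exp_corr(d, gamma, alpha=1.9):
    """Power exponential correlation (2.3); alpha close to 2 avoids the
    numerical problems of the Gaussian kernel (alpha = 2)."""
    return np.exp(-(d / gamma) ** alpha)

def matern_5_2_corr(d, gamma):
    """Matern correlation (2.4) with roughness alpha = 5/2, one of the
    closed-form cases alpha = (2k + 1)/2."""
    t = np.sqrt(5.0) * d / gamma
    return (1.0 + t + t ** 2 / 3.0) * np.exp(-t)

def product_corr_matrix(X, gammas, corr=matern_5_2_corr):
    """Product correlation (1.1): R = R_1 * R_2 * ... * R_px (Hadamard
    product), with one range parameter gamma_l per input coordinate."""
    n, px = X.shape
    R = np.ones((n, n))
    for l in range(px):
        d = np.abs(X[:, l][:, None] - X[:, l][None, :])
        R *= corr(d, gammas[l])
    return R
```

By the Schur product theorem, the Hadamard product of the $p_x$ positive semi-definite one-dimensional correlation matrices is itself positive semi-definite.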

### 2.2 GaSP emulator and the reference prior

The goal of emulation is to predict and assess the uncertainty of the real-valued output of a computationally expensive computer model, denoted as $f^M(\cdot)$, based on a finite number of chosen inputs, often selected to fill the input domain, e.g. by a Latin hypercube design (Sacks et al., 1989; Santner et al., 2003). Let us model the unknown function $f^M(\cdot)$ via a GaSP defined in (2.1). Denote the outputs of the computer model at the chosen inputs $\{\mathbf{x}_1, \dots, \mathbf{x}_n\}$ by $\mathbf{y} = (f^M(\mathbf{x}_1), \dots, f^M(\mathbf{x}_n))^T$. Conditional on $\mathbf{y}$, the GaSP emulator predicts and quantifies the uncertainty of the output at any input $\mathbf{x}^*$ by the predictive distribution of $f^M(\mathbf{x}^*)$.

The GaSP emulator typically consists of the mean parameters, variance parameter and range parameters, denoted as $(\bm{\theta}_m, \sigma^2, \bm{\gamma})$. The reference prior for the GaSP model with the product correlation was developed in Paulo (2005) and its form is given by

 \pi^R(\bm{\theta}_m, \sigma^2, \bm{\gamma}) \propto \frac{\pi^R(\bm{\gamma})}{\sigma^2}, \qquad (2.5)

with $\pi^R(\bm{\gamma}) \propto |I^*(\bm{\gamma})|^{1/2}$, where $I^*(\bm{\gamma})$ is the expected Fisher information matrix given below:

 I^*(\bm{\gamma}) = \begin{pmatrix} n-q & \mathrm{tr}(W_1) & \mathrm{tr}(W_2) & \cdots & \mathrm{tr}(W_{p_x}) \\ & \mathrm{tr}(W_1^2) & \mathrm{tr}(W_1 W_2) & \cdots & \mathrm{tr}(W_1 W_{p_x}) \\ & & \mathrm{tr}(W_2^2) & \cdots & \mathrm{tr}(W_2 W_{p_x}) \\ & & & \ddots & \vdots \\ & & & & \mathrm{tr}(W_{p_x}^2) \end{pmatrix}, \qquad (2.6)

where the matrix is symmetric with only the upper triangle displayed, $W_l = \dot{\mathbf{R}}_l \mathbf{Q}$, for $l = 1, \dots, p_x$, with $\dot{\mathbf{R}}_l$ the partial derivative of the correlation matrix $\mathbf{R}$ with respect to the $l$th range parameter, and $\mathbf{Q} = \mathbf{R}^{-1}\mathbf{P}$, with $\mathbf{P} = \mathbf{I}_n - \mathbf{H}(\mathbf{H}^T\mathbf{R}^{-1}\mathbf{H})^{-1}\mathbf{H}^T\mathbf{R}^{-1}$ and $\mathbf{H}$ the $n \times q$ mean basis matrix.

After marginalizing out $(\bm{\theta}_m, \sigma^2)$ by the prior in (2.5), the marginal likelihood will be denoted as $L(\bm{\gamma} \mid \mathbf{y})$. As each evaluation of the likelihood requires the inversion of the covariance matrix, generally of the order $O(n^3)$, full Bayesian computation through Markov chain Monte Carlo (MCMC) is typically prohibitive. It is common to simply estimate $\bm{\gamma}$ by the marginal posterior mode in emulation

 (\hat{\gamma}_1, \dots, \hat{\gamma}_{p_x}) = \mathop{\arg\max}_{\gamma_1, \dots, \gamma_{p_x}} \left\{ L(\gamma_1, \dots, \gamma_{p_x} \mid \mathbf{y})\, \pi^R(\gamma_1, \dots, \gamma_{p_x}) \right\}. \qquad (2.7)

Some routinely used estimators, such as the maximum likelihood estimator (MLE) and the maximum marginal likelihood estimator (MMLE) with regard to $\bm{\gamma}$, have been found to be unstable in estimating the range parameters in various studies (see e.g. Figure 2 in Li and Sudjianto (2005), Figure 2.2 in Lopes (2011) and Figure 1 in Gu et al. (2018b)). The problem is often caused by the estimated correlation matrix being near-diagonal ($\hat{\mathbf{R}} \approx \mathbf{I}_n$, where $\mathbf{I}_n$ is the identity matrix of size $n$) or near-singular ($\hat{\mathbf{R}} \approx \mathbf{1}_n\mathbf{1}_n^T$, where $\mathbf{1}_n$ is the $n$-vector of ones). In both scenarios, the problems are caused by the estimation of the range parameters. More specifically, as shown in Lemma 3.3 in Gu et al. (2018b), the profile likelihood may not decrease when any $\gamma_l \to 0$, $l = 1, \dots, p_x$, which sometimes results in the estimated correlation matrix being near-diagonal, whereas the marginal likelihood may not decrease when any $\gamma_l \to 0$ or all $\gamma_l \to \infty$, $l = 1, \dots, p_x$, leading to the near-diagonal or near-singular correlation matrix, respectively. Thus, robust estimation of the parameters is defined as avoiding these two possible problems, as follows.

###### Definition 1.

(Robust Estimation.) Estimation of the parameters in the GaSP model is called robust if neither $\hat{\mathbf{R}} = \mathbf{I}_n$ nor $\hat{\mathbf{R}} = \mathbf{1}_n\mathbf{1}_n^T$, where $\hat{\mathbf{R}}$ is the estimated correlation matrix.

It is shown in Gu et al. (2018b) that the marginal posterior mode estimation with the reference prior is robust under the $\bm{\gamma}$ or $\bm{\xi} = (\log(1/\gamma_1), \dots, \log(1/\gamma_{p_x}))^T$ parameterization, while some other alternatives, such as the MLE and MMLE, do not have this property. Note that the near-diagonal estimation ($\hat{\mathbf{R}} \approx \mathbf{I}_n$) can easily happen when a product correlation structure is used because, if any of the matrices in the product correlation matrix is near-diagonal, the whole correlation matrix will be near-diagonal. Thus, using the maximum marginal posterior mode estimation with the reference prior is particularly helpful when the dimension of the input is larger than 1.
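
The two degenerate cases in Definition 1 can be illustrated numerically. The following Python sketch (ours, with an arbitrary one-dimensional design and a power exponential correlation) shows that an extremely small range parameter yields a near-identity correlation matrix, while an extremely large one yields a numerically singular matrix:

```python
import numpy as np

def pexp_corr_matrix(x, gamma, alpha=1.9):
    # one-dimensional power exponential correlation matrix
    d = np.abs(x[:, None] - x[None, :])
    return np.exp(-(d / gamma) ** alpha)

x = np.linspace(0, 1, 10)

# range parameter far too small: R is near-diagonal (R approx I_n), so the
# emulator degenerates to the mean regression with independent errors
R_small = pexp_corr_matrix(x, gamma=1e-3)
off_diag = R_small - np.diag(np.diag(R_small))
print(np.abs(off_diag).max())    # essentially zero

# range parameter far too large: R is near-singular (R approx 1_n 1_n^T),
# so the correlation matrix is numerically non-invertible
R_large = pexp_corr_matrix(x, gamma=1e3)
print(np.linalg.cond(R_large))   # enormous condition number
```
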

The reference prior has many other advantages in emulation that were not noticed before. First, when the dimension of the inputs increases, the prior mass moves from the smaller values of $\gamma_l$ to the larger values of $\gamma_l$, for each $l$. This is an important property since, as any $\hat{\gamma}_l \to 0$, $\hat{\mathbf{R}}$ is near-diagonal, a degenerate case that should be avoided. When $p_x$ increases, the chance that at least one $\gamma_l$ is estimated to be small increases if the prior mass does not change along with $p_x$, and, consequently, the chance that $\hat{\mathbf{R}} \approx \mathbf{I}_n$ also increases. The reference prior adapts to the increase of the dimension by concentrating more prior mass at larger $\gamma_l$, avoiding $\hat{\mathbf{R}} \approx \mathbf{I}_n$ when $p_x$ increases.

Second, when a denser design is used in a fixed domain of the input space, the prior mass of the reference prior parameterized by $\bm{\gamma}$ moves to the domain with smaller values of $\gamma_l$. This is helpful for the inversion of the covariance matrix in practice, because as points fill in a fixed domain of the input space, the covariance matrix becomes near-singular if $\bm{\gamma}$ does not change.

Here we provide a numerical justification of the first two properties of the reference prior in Figure 1, where the reference prior density of the log inverse range parameters for a GaSP with a power exponential correlation function is shown. Comparing the panels with different sample sizes, the mode of the prior moves to the region with larger values of the log inverse range parameters (or equivalently smaller range parameters) when the sample size increases. Comparing the panels with different dimensions, the prior mass moves to the region with smaller values of the log inverse range parameters (or equivalently larger range parameters) when the inputs have higher dimensions.

The third property of interest is that the reference prior is invariant to location-scale transformations of the inputs, if the mean basis functions contain only the intercept and the linear terms of $\mathbf{x}$. When we apply a location-scale transformation $\tilde{x}_l = (x_l - c_l)/s_l$ to each coordinate of the input, for $l = 1, \dots, p_x$, the reference prior of the transformed inputs is the reference prior of the original inputs with each range parameter correspondingly rescaled, $\tilde{\gamma}_l = \gamma_l / s_l$. This makes the prior scale naturally with the range of the inputs; as a consequence, we do not need to normalize the inputs.

In addition, the reference prior has appropriate tail decay rates at the limits $\gamma_l \to 0$ and $\gamma_l \to \infty$ (Gu et al. (2018b)). When $\gamma_l \to 0$ for any $l$, the density of the reference prior decreases at an approximately exponential rate; when $\gamma_l \to \infty$ for all $l$, the density of the reference prior decreases at a polynomial rate. The first part of the tail rates induces an exponential penalty on the likelihood when the correlation matrix is near-diagonal, prohibiting this undesired situation in emulation. The posterior with the reference prior has slow polynomial decay rates when $\gamma_l$ is large for all $l$ (or equivalently $\hat{\mathbf{R}} \approx \mathbf{1}_n\mathbf{1}_n^T$), allowing the marginal likelihood to come into play at this limit. A larger estimated correlation is found to make prediction more precise (Zhang (2004)), and thus a small polynomial penalty from the reference prior both reduces the singular estimation of the covariance matrix and maintains high accuracy in prediction.

Despite various benefits of using the reference prior for emulation, the computational challenges persist, even if the posterior mode estimation is used in lieu of posterior sampling. The computational order of evaluating the reference prior is $O(p_x n^3)$, mainly from computing $W_l$ in (2.6), for $l = 1, \dots, p_x$, and from the inversion of the covariance matrix. However, the closed-form derivatives of the reference prior are very computationally intensive, as they require computing $\partial W_l / \partial \gamma_m$, for $l, m = 1, \dots, p_x$. The total computational order of the directional derivatives of the reference prior is $O(p_x^2 n^3)$, because the computational order of each directional derivative is $O(p_x n^3)$ for the matrix multiplication. For these reasons, the author does not find any literature that provides the closed-form derivatives of the reference prior in this scenario, though some frequently used mode-search algorithms, such as the low-storage quasi-Newton optimization method (Nocedal (1980)), typically rely on derivative information. Instead, one typically computes numerical derivatives, which require more evaluations of the likelihood, each with an $O(n^3)$ inversion of the covariance matrix, and are thus also very time consuming. In addition, the reference prior can induce extra local modes, making it harder for the optimization algorithm to converge to the global mode (see e.g. the change of the slope of the reference prior density in the upper middle panel in Figure 1; another example is given in Figure 3.3 in Gu (2016)).

Some inputs of a computer model may have very small effects on its outputs. These inputs are called inert inputs and are often omitted in emulation. When the inert inputs are omitted, a noise term is needed in the GaSP emulator, as the emulator should no longer be an interpolator at the design points. The GaSP emulator can be extended to include a noise or nugget, $\tilde{y}(\mathbf{x}) = y(\mathbf{x}) + \epsilon$, where $y(\cdot)$ still follows a GaSP model and $\epsilon \sim N(0, \sigma_0^2)$ is an independent Gaussian noise. Define the nugget-variance ratio parameter $\eta = \sigma_0^2 / \sigma^2$. The reference prior has been derived for the GaSP model with a noise (Ren et al. (2012); Kazianka and Pilz (2012); Gu and Berger (2016)). The advantages of using the reference prior with a nugget are similar to our previous discussion and are thus omitted here.

### 2.3 GaSP for computer model calibration

Some parameters in a computer model are unknown and unobservable in experiments. We denote the mathematical model output by $f^M(\mathbf{x}, \bm{\theta})$, where $\mathbf{x}$ is a $p_x$-dimensional vector of observable inputs in the experiment and $\bm{\theta}$ is a $p_\theta$-dimensional vector of unobservable calibration parameters. The calibration problem is to estimate $\bm{\theta}$ by a set of field data $(y^F(\mathbf{x}_1), \dots, y^F(\mathbf{x}_n))^T$. In practice, a mathematical model that describes the reality perfectly is rarely the case. It is common to address the model misspecification by a discrepancy function, such that the reality can be represented as $y^R(\mathbf{x}) = f^M(\mathbf{x}, \bm{\theta}) + \delta(\mathbf{x})$, where $y^R(\cdot)$ and $\delta(\cdot)$ denote the reality and the discrepancy function, respectively. This leads to the following statistical model for calibration

 y^F(\mathbf{x}) = f^M(\mathbf{x}, \bm{\theta}) + \delta(\mathbf{x}) + \epsilon, \qquad (2.8)

where $\epsilon \sim N(0, \sigma_0^2)$ is an independent zero-mean Gaussian noise. For simplicity, we assume $f^M(\cdot, \cdot)$ is computationally cheap to evaluate for now.

As we often know very little about the discrepancy function, the GaSP is suggested in Kennedy and O'Hagan (2001) to model the discrepancy function, i.e. $\delta(\cdot) \sim \mathrm{GaSP}(\mu(\cdot), \sigma^2 c(\cdot, \cdot))$, where the mean and correlation functions are defined in (2.2) and (1.1), respectively. It is usual to define $\eta = \sigma_0^2 / \sigma^2$, the nugget-variance ratio parameter, for the computational reason that $\sigma^2$ is then a scale parameter which has a conjugate prior.
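
As a toy illustration of the data-generating process in (2.8), the following Python sketch simulates field data from a hypothetical cheap mathematical model $f^M(x, \theta) = \sin(\theta x)$ with a GaSP discrepancy; all settings here are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def f_M(x, theta):
    # hypothetical cheap mathematical model with one calibration parameter
    return np.sin(theta * x)

n = 30
x = np.linspace(0, 1, n)
theta_true = 3.0

# discrepancy delta(.) ~ GaSP(0, sigma^2 c(.,.)), Gaussian kernel plus a
# small jitter on the diagonal for numerical stability
sigma2, gamma = 0.1 ** 2, 0.3
d = np.abs(x[:, None] - x[None, :])
R = np.exp(-((d / gamma) ** 2)) + 1e-8 * np.eye(n)
delta = rng.multivariate_normal(np.zeros(n), sigma2 * R)

# field data: y^F(x) = f^M(x, theta) + delta(x) + eps, eps ~ N(0, sigma_0^2)
sigma0 = 0.05
yF = f_M(x, theta_true) + delta + sigma0 * rng.standard_normal(n)

eta = sigma0 ** 2 / sigma2   # nugget-variance ratio parameter
```
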

The parameters in (2.8) consist of the calibration parameters, mean parameters, range parameters, a variance parameter and a nugget parameter in the covariance function, denoted as $(\bm{\theta}, \bm{\theta}_m, \sigma^2, \bm{\gamma}, \eta)$. Consider the following prior for the calibration problem

 \pi(\bm{\theta}, \bm{\theta}_m, \sigma^2, \bm{\gamma}, \eta) \propto \frac{\pi(\bm{\gamma}, \eta)\, \pi(\bm{\theta})}{\sigma^2}. \qquad (2.9)

As the calibration parameters normally have scientific meanings, $\pi(\bm{\theta})$ is typically chosen by expert knowledge and thus we do not give a specific form herein. To the author's knowledge, the posterior propriety has not been shown for the above prior in the calibration problem, except for the case that $f^M(\mathbf{x}, \bm{\theta})$ is linear with regard to $\bm{\theta}$. We have the following theorem to guarantee the posterior propriety when the prior in (2.9) is used. The proof of Theorem 1 generalizes the proof in Berger et al. (1998), which covers a special case with a mean parameter, a variance parameter and two independent observations.

###### Theorem 1.

Assume the prior follows (2.9) for the calibration model in (2.8), with $\pi(\bm{\gamma}, \eta)$ and $\pi(\bm{\theta})$ being proper priors. Let $\mathbf{H}$ be the $n \times q$ mean basis matrix with the $(i,t)$ entry $h_t(\mathbf{x}_i)$. If $\mathbf{H}$ has full rank and $n \geq q+1$, the posterior is proper.

###### Proof.

Since $\mathbf{H}$ has full rank and $n \geq q+1$, one can select $q+1$ rows of $(\mathbf{H}, \mathbf{y}^F)$, denoted as $(\mathbf{H}_{1:(q+1)}, \mathbf{y}^F_{1:(q+1)})$, such that the resulting $(q+1) \times (q+1)$ matrix is invertible, which holds almost surely as the field data contain independent Gaussian noise. W.l.o.g., we assume the first $q+1$ rows are linearly independent.

We first marginalize out the last $n - (q+1)$ field data and denote the resulting density as $p(\mathbf{y}^F_{1:(q+1)} \mid \bm{\theta}, \bm{\theta}_m, \sigma^2, \bm{\gamma}, \eta)$. As $\pi(\bm{\theta})$ and $\pi(\bm{\gamma}, \eta)$ are both proper, we then marginalize out $(\bm{\theta}, \bm{\gamma}, \eta)$ and obtain the proper marginal density $p(\mathbf{y}^F_{1:(q+1)} \mid \bm{\theta}_m, \sigma^2)$. Since $(\bm{\theta}_m, \sigma^2)$ are the location-scale parameters of this marginal density, one has

 \int\!\!\cdots\!\!\int p(\mathbf{y}^F_{1:(q+1)} \mid \bm{\theta}_m, \sigma^2)\, \pi(\bm{\theta}_m, \sigma^2)\, d\bm{\theta}_m\, d\sigma^2
 = \int\!\!\cdots\!\!\int \frac{1}{(\sigma^2)^{(q+1)/2+1}}\; p\!\left( \frac{y^F(\mathbf{x}_1) - \mathbf{h}(\mathbf{x}_1)\bm{\theta}_m}{\sigma}, \dots, \frac{y^F(\mathbf{x}_{q+1}) - \mathbf{h}(\mathbf{x}_{q+1})\bm{\theta}_m}{\sigma} \right) d\bm{\theta}_m\, d\sigma^2
 = \int\!\!\cdots\!\!\int \frac{1}{(\sigma^2)^{(q+1)/2+1}}\, |J^{-1}|\; p(\tilde{y}_1, \dots, \tilde{y}_{q+1})\, d\tilde{y}_1 \cdots d\tilde{y}_{q+1},

where the first equation follows from the definition of the location-scale family and the second equation follows from the parameter transformation $\tilde{y}_i = (y^F(\mathbf{x}_i) - \mathbf{h}(\mathbf{x}_i)\bm{\theta}_m)/\sigma$, for $i = 1, \dots, q+1$, with the Jacobian determinant being

 J^{-1} = \begin{vmatrix} -\frac{h_1(\mathbf{x}_1)}{\sigma} & \cdots & -\frac{h_q(\mathbf{x}_1)}{\sigma} & -\frac{y^F(\mathbf{x}_1) - \mathbf{h}(\mathbf{x}_1)\bm{\theta}_m}{\sigma^3} \\ \vdots & \ddots & \vdots & \vdots \\ -\frac{h_1(\mathbf{x}_{q+1})}{\sigma} & \cdots & -\frac{h_q(\mathbf{x}_{q+1})}{\sigma} & -\frac{y^F(\mathbf{x}_{q+1}) - \mathbf{h}(\mathbf{x}_{q+1})\bm{\theta}_m}{\sigma^3} \end{vmatrix}^{-1} = \sigma^{q+3} \begin{vmatrix} h_1(\mathbf{x}_1) & \cdots & h_q(\mathbf{x}_1) & y^F(\mathbf{x}_1) \\ \vdots & \ddots & \vdots & \vdots \\ h_1(\mathbf{x}_{q+1}) & \cdots & h_q(\mathbf{x}_{q+1}) & y^F(\mathbf{x}_{q+1}) \end{vmatrix}^{-1} = \sigma^{q+3} J_0^{-1},

where $J_0$ denotes the second determinant above, which is free of $(\bm{\theta}_m, \sigma^2)$. Hence one has

 \int\!\!\cdots\!\!\int p(\mathbf{y}^F_{1:(q+1)} \mid \bm{\theta}_m, \sigma^2)\, \pi(\bm{\theta}_m, \sigma^2)\, d\bm{\theta}_m\, d\sigma^2\, d\mathbf{y}^F_{1:(q+1)} = |J_0^{-1}| < \infty. \; ∎

Note that the reference prior in (2.5) is proper for many widely used correlation functions, as long as the intercept is contained in the mean basis matrix, i.e. $\mathbf{1}_n \in \mathcal{C}(\mathbf{H})$, where $\mathcal{C}(\mathbf{H})$ denotes the column space of the mean basis matrix $\mathbf{H}$ (Gu et al. (2018b)). Theorem 1 states that using the reference prior is legitimate in the calibration problem when the mean basis contains an intercept. Empirically, the reference prior changes very little with an added intercept in the column space of the mean basis matrix.

We have assumed the mathematical model is computationally cheap so far. When the computer model is expensive to run, one can combine the GaSP emulator with the calibration model through a full Bayesian approach. In practice, however, since the field data typically contain larger noises and may not provide much information for the emulation purpose, a modular approach is often used, meaning that the GaSP emulator only depends on the outputs of the computer model (Liu et al. (2009)). We refer to Bayarri et al. (2007) for an overview of combining the GaSP emulator with a calibration model. The modular approach is implemented in Gu (2018), where the parameters in the GaSP emulator are estimated based on the outputs of the computer model (Gu et al. (2018a)). In the calibration, we draw a sample from the posterior predictive distribution of the GaSP emulator whenever we need to evaluate the computer model.

## 3 Jointly robust prior

We introduce a new class of priors for calibration and emulation of mathematical models in this section. In Section 3.1 and Section 3.2, we show that the JR prior has all the nice features of the reference prior discussed in Section 2.2. The benefits of the new prior in calibration and identifying the inert inputs in mathematical models will be discussed in Section 3.1 and Section 4, respectively.

### 3.1 Calibration

We first introduce the new prior in the calibration setting, where the model is given in (2.8) with the discrepancy function modeled as a GaSP. Define the inverse range parameters $\beta_l = 1/\gamma_l$, for $l = 1, \dots, p_x$, and the nugget-variance ratio parameter $\eta = \sigma_0^2/\sigma^2$ in the covariance function. The overall prior follows

 \pi(\bm{\theta}, \bm{\theta}_m, \sigma^2, \bm{\beta}, \eta) \propto \frac{\pi^{JR}(\bm{\beta}, \eta)\, \pi(\bm{\theta})}{\sigma^2}. \qquad (3.1)

The key part is the prior for the inverse range parameters and the nugget-variance ratio parameter, which we call the jointly robust (JR) prior; its form is given as follows:

 \pi^{JR}(\beta_1, \dots, \beta_{p_x}, \eta) = C \left( \sum_{l=1}^{p_x} C_l \beta_l + \eta \right)^{a} \exp\left\{ -b \left( \sum_{l=1}^{p_x} C_l \beta_l + \eta \right) \right\}, \qquad (3.2)

where $C$ is a normalizing constant, and $a > -(p_x + 1)$, $b > 0$ and $C_l > 0$, $l = 1, \dots, p_x$, are prior parameters. The name "jointly robust" reflects the fact that the prior cannot be written as a product of marginal priors of the range parameter for each coordinate of the input, and that it is robust in marginal posterior mode estimation (see Section 3.2 for details). The form is inspired by the tail rates of the reference prior shown in Lemma 4.1 in Gu et al. (2018b). Besides, the JR prior is a proper prior. The posterior propriety of using (3.2) is thus guaranteed when $\pi(\bm{\theta})$ is proper, as shown in Theorem 1.

We first show some properties of this prior and then discuss the default choice of the prior parameters. First of all, the normalizing constant of the prior is given as follows.

###### Lemma 1.

(Normalizing constant.) The jointly robust prior is proper and has the normalizing constant $C = \frac{p_x!\, b^{a + p_x + 1} \prod_{l=1}^{p_x} C_l}{\Gamma(a + p_x + 1)}$, where $\Gamma(\cdot)$ is the gamma function.

###### Proof of Lemma 1.

 \frac{1}{C} = \int\!\!\cdots\!\!\int \Big( \sum_{l=1}^{p_x} C_l \beta_l + \eta \Big)^{a} \exp\Big( -b\Big( \sum_{l=1}^{p_x} C_l \beta_l + \eta \Big) \Big)\, d\eta\, d\beta_1 \cdots d\beta_{p_x}
 = \frac{1}{\prod_{l=1}^{p_x} C_l} \int\!\!\cdots\!\!\int \Big( \sum_{l=1}^{p_x} \tilde{\beta}_l + \eta \Big)^{a} \exp\Big( -b\Big( \sum_{l=1}^{p_x} \tilde{\beta}_l + \eta \Big) \Big)\, d\eta\, d\tilde{\beta}_1 \cdots d\tilde{\beta}_{p_x}, \quad \text{letting } \tilde{\beta}_l = C_l \beta_l,
 = \frac{1}{\prod_{l=1}^{p_x} C_l} \int_0^{\infty} z^{a} \exp(-bz)\, \frac{z^{p_x}}{p_x!}\, dz = \frac{\Gamma(a + p_x + 1)}{p_x!\, b^{a + p_x + 1} \prod_{l=1}^{p_x} C_l},

where the third equality changes variables to $z = \sum_{l=1}^{p_x} \tilde{\beta}_l + \eta$; the factor $z^{p_x}/p_x!$ arises because the volume of $\{(\tilde{\beta}_1, \dots, \tilde{\beta}_{p_x}, \eta) \in \mathbb{R}_{+}^{p_x+1} : \tilde{\beta}_1 + \cdots + \tilde{\beta}_{p_x} + \eta \leq z\}$ is $z^{p_x+1}/(p_x+1)!$. Inverting gives the normalizing constant. ∎

The marginal prior mean and variance are given in the following lemma.

###### Lemma 2.

(Prior mean and variance.) For $l = 1, \dots, p_x$, the prior mean and prior variance are given as follows.

(i) $E[\beta_l] = \frac{a + p_x + 1}{(p_x + 1)\, C_l\, b}$ and $Var[\beta_l] = \frac{(a + p_x + 1)\{(a + p_x + 1)p_x + 2(p_x + 1)\}}{(p_x + 1)^2 (p_x + 2)\, C_l^2\, b^2}$.

(ii) $E[\eta] = \frac{a + p_x + 1}{(p_x + 1)\, b}$ and $Var[\eta] = \frac{(a + p_x + 1)\{(a + p_x + 1)p_x + 2(p_x + 1)\}}{(p_x + 1)^2 (p_x + 2)\, b^2}$.

###### Proof of Lemma 2.

We only show the prior mean for $\beta_i$, as the proof of the prior mean for $\eta$ is similar. For any $i = 1, \dots, p_x$,

 E[\beta_i] = C \int\!\!\cdots\!\!\int \beta_i \Big( \sum_{l=1}^{p_x} C_l \beta_l + \eta \Big)^{a} \exp\Big( -b\Big( \sum_{l=1}^{p_x} C_l \beta_l + \eta \Big) \Big)\, d\eta\, d\beta_1 \cdots d\beta_{p_x}
 = \frac{C}{C_i \prod_{l=1}^{p_x} C_l} \int_0^{\infty} \frac{z^{a + p_x + 1}}{(p_x + 1)!}\, \exp(-bz)\, dz = \frac{C\, \Gamma(a + p_x + 2)}{C_i \prod_{l=1}^{p_x} C_l\, (p_x + 1)!\, b^{a + p_x + 2}} = \frac{a + p_x + 1}{(p_x + 1)\, C_i\, b},

where the same change of variables as in Lemma 1 is used, noting that the average of $\tilde{\beta}_i$ over the slice $\{\tilde{\beta}_1 + \cdots + \tilde{\beta}_{p_x} + \eta = z\}$ is $z/(p_x + 1)$.

Using a similar calculation, for any $i = 1, \dots, p_x$, we have

 E_{\pi^{JR}}[\beta_i^2] = \frac{2(a + p_x + 2)(a + p_x + 1)}{(p_x + 1)(p_x + 2)\, C_i^2\, b^2}.

The variance follows from $Var[\beta_i] = E[\beta_i^2] - (E[\beta_i])^2$, and part (ii) follows from the same arguments applied to $\eta$. ∎
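
The closed-form moments in Lemma 2 can be sanity-checked by Monte Carlo. The change of variables in the proofs implies that $s = \sum_l C_l \beta_l + \eta$ follows a Gamma$(a + p_x + 1, b)$ distribution and that, given $s$, the vector $(C_1\beta_1, \dots, C_{p_x}\beta_{p_x}, \eta)$ is uniform on the simplex summing to $s$, which gives a direct sampler. The Python sketch below is ours, with arbitrary example values of the prior parameters:

```python
import numpy as np

rng = np.random.default_rng(1)
px, a, b = 2, 0.2, 1.0
C = np.array([0.7, 1.3])   # input scale parameters C_l (arbitrary example values)

# The JR prior depends on (C_1*beta_1, ..., C_px*beta_px, eta) only through
# their sum s, so s ~ Gamma(a + px + 1, rate b) and, given s, the vector is
# uniform on the simplex summing to s (Dirichlet(1, ..., 1) weights).
N = 400_000
s = rng.gamma(shape=a + px + 1, scale=1.0 / b, size=N)
w = rng.dirichlet(np.ones(px + 1), size=N)
beta = w[:, :px] * s[:, None] / C   # beta_l = w_l * s / C_l
eta = w[:, px] * s

# Lemma 2 (i): E[beta_l] = (a + px + 1) / ((px + 1) * C_l * b)
analytic_beta = (a + px + 1) / ((px + 1) * C * b)
# Lemma 2 (ii): E[eta] = (a + px + 1) / ((px + 1) * b)
analytic_eta = (a + px + 1) / ((px + 1) * b)
print(beta.mean(axis=0), analytic_beta)
print(eta.mean(), analytic_eta)
```
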

The prior parameters of the jointly robust prior in Equation (3.2) consist of the power parameter $a$, the rate parameter $b$ and the input scale parameters $C_l$, $l = 1, \dots, p_x$. First, we let $C_l = n^{-1/p_x} |x_l^{\max} - x_l^{\min}|$, where $x_l^{\max}$ and $x_l^{\min}$ are the maximum and minimum values of the input at the $l$th coordinate, which makes the JR prior invariant to location-scale transformations of the input. This factor is roughly the average distance between the inputs, as the average sample size for each coordinate of the input is $n^{1/p_x}$ when we have $n$ inputs from a lattice design in a $p_x$-dimensional input space. This choice allows the JR prior to match the behavior of the reference prior under changes of the dimension and the number of observations. Second, the exponential term with rate $b > 0$ induces a large exponential penalty when any $\beta_l$ is large, avoiding the estimation of the correlation matrix being near-diagonal.
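
The default input scale parameters described above depend only on the design; a hypothetical helper in Python might compute them as:

```python
import numpy as np

def default_C(X):
    """Default input scale parameters C_l = n^(-1/px) * |x_l^max - x_l^min|:
    roughly the average spacing of n design points per coordinate."""
    n, px = X.shape
    return n ** (-1.0 / px) * (X.max(axis=0) - X.min(axis=0))

# 25 points on a 5 x 5 lattice in [0, 2] x [0, 10]
g1, g2 = np.meshgrid(np.linspace(0, 2, 5), np.linspace(0, 10, 5))
X = np.column_stack([g1.ravel(), g2.ravel()])
print(default_C(X))   # n^(-1/2) = 1/5, so C = (0.4, 2.0)
```
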

The choice of $b$ is an open problem and may depend on specific scientific goals. In the calibration setting, when $b$ is close to $0$, the prior density is almost flat in the region where all $\beta_l$ and $\eta$ are small, which can result in a large estimated correlation in some scenarios, making the calibrated computer model without the discrepancy function fit the reality poorly (Gu and Wang (2017)). On the contrary, when $b$ is large, the method is biased toward small correlation, which makes the prediction less accurate. The default setting of $b$ in the RobustCalibration package (Gu (2018)) balances between prediction and calibration, and the default values of $a$, $b$ and $C_l$, $l = 1, \dots, p_x$, will be used for all numerical comparisons in calibration.

In Figure 2, the densities of the JR prior and the reference prior are graphed in the upper panels. The JR prior matches the reference prior reasonably well. When the number of observations increases, the mass of the JR prior moves to the domain with larger values of $\beta_l$, preventing overwhelmingly large correlation. The densities of the JR prior for different input dimensions are graphed in the lower panels. When the dimension of the inputs increases, the mass of the JR prior moves to the domain with smaller values of $\beta_l$, preventing the covariance matrix from being estimated to be near-diagonal. Both features are important for avoiding the degenerate cases discussed in Section 2.2.

Furthermore, the tail of the JR prior decreases slightly faster than that of the reference prior for large β, as shown in Figure 2. This is helpful for the identification of the calibration parameters; an example is given in Section 5.3.

### 3.2 Emulation

In this subsection, we discuss parameter estimation with the JR prior in the GaSP emulator introduced in Section 2.2. As computer models are often deterministic, no nugget is needed and the JR prior has the following form

$$\pi^{\mathrm{JR}}(\beta_1,\dots,\beta_{p_x}) = C_0\left(\sum_{l=1}^{p_x} C_l\beta_l\right)^{a}\exp\left\{-b\sum_{l=1}^{p_x} C_l\beta_l\right\}, \tag{3.3}$$

where C_0 is a normalizing constant. The JR prior in (3.3) is a special case of (3.2) with the nugget parameter η = 0, so the properties of the prior discussed in Section 3.1 can be easily extended to this scenario.
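The log density of (3.3) is cheap to evaluate, which is one reason parameter estimation with the JR prior is computationally simpler than with the reference prior. A minimal sketch in Python follows; the scales C_l and the values of a and b here are illustrative placeholders, not package defaults.

```python
import numpy as np

def log_jr_prior(beta, C, a, b):
    """Unnormalized log density of the JR prior in (3.3):
    a * log(sum_l C_l * beta_l) - b * sum_l C_l * beta_l."""
    s = float(np.dot(C, beta))
    if s <= 0.0:
        return -np.inf   # for a > 0 the density vanishes when all beta_l = 0
    return a * np.log(s) - b * s

# Example with hypothetical scales C_l and an illustrative (a, b) pair
C = np.array([0.3, 0.5])
val = log_jr_prior(np.array([1.0, 2.0]), C, a=0.2, b=0.4)
```

Unlike the reference prior, no matrix determinants or traces are needed to evaluate this density.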

One important feature of the reference prior is that the marginal posterior mode estimation is robust under the θ and ξ parameterizations. Here we show a similar result when the JR prior is used to replace the reference prior in the marginal posterior mode estimation in (2.7).

###### Theorem 3.1.

(Robust estimation of the JR prior.) Assume the JR prior in (3.3) with b > 0 and C_l > 0 for 1 ≤ l ≤ p_x.

• Under the parameterization of the range parameters θ_l = 1/β_l and the log inverse range parameters ξ_l = log(β_l), the marginal posterior mode estimation with the JR prior is robust if a > −p_x;

• Under the parameterization of the inverse range parameters β_l, the marginal posterior mode estimation with the JR prior is robust if a > 0.

###### Proof.

By Lemma 3.3 in Gu et al. (2018b), the marginal likelihood is bounded when β_l → 0 for all l, 1 ≤ l ≤ p_x, or when β_l → ∞ for any l. The results follow from the fact that the density of the prior is zero at these two limits. ∎

Note that the marginal posterior mode with the reference prior under the parameterization of the inverse range parameters β is not robust. Surprisingly, the marginal posterior mode with the reference prior is always at β̂ = 0 under the β parameterization, which should clearly be avoided (Gu et al. (2018b)). By Theorem 3.1, the marginal posterior mode estimation with the JR prior is robust under the β parameterization if a > 0. It has some added advantages for variable selection, as the prior density is positive when β_l → 0 for some, but not all, l, as given in the following remark.

###### Remark 1.

(Tail rates.) Assume the JR prior in (3.3) with a > 0, b > 0 and C_l > 0, 1 ≤ l ≤ p_x. Here β denotes the vector of β_l for all l, 1 ≤ l ≤ p_x.

When β_l → ∞ for some l, the natural logarithm of the JR prior decreases approximately linearly in β_l with rate b C_l.

When β_l → 0 for all l, 1 ≤ l ≤ p_x, the natural logarithm of the JR prior decreases at the rate of a log(∑_{l=1}^{p_x} C_l β_l).

When β_l → 0 for some, but not all, l, 1 ≤ l ≤ p_x, π^JR(β) is finite and positive.

The first and second parts of the jointly robust prior match the exponential and polynomial tail decay rates of the reference prior discussed in Section 2.2. The third part is an improvement, which allows the identification of inert inputs by the marginal posterior mode with the jointly robust prior, discussed further in the next section. Note that the third part only holds for parameter estimation under the parameterization by the inverse range parameters β, while the JR prior loses this property under the other parameterizations, e.g. θ and ξ. Thus we propose the following marginal posterior mode estimation with the JR prior

$$(\hat\beta_1,\dots,\hat\beta_{p_x}) = \mathop{\mathrm{argmax}}_{\beta_1,\dots,\beta_{p_x}} \left\{ L(\beta_1,\dots,\beta_{p_x}\mid \mathbf{y})\, \pi^{\mathrm{JR}}(\beta_1,\dots,\beta_{p_x}) \right\}, \tag{3.4}$$

where π^JR(·) is the JR prior in (3.3), with a = 0.2. Here we use the same default prior parameters b and C_l for the reasons discussed in Section 3.1. The estimation in (3.4) is implemented in the RobustGaSP package as the default choice for emulation (Gu et al. (2018a)).
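The estimation in (3.4) can be sketched numerically as follows, assuming a zero-mean GaSP with its variance profiled out, a power exponential correlation of roughness one, and the defaults a = 0.2, b = n^{-1/p_x}(a + p_x), C_l = n^{-1/p_x}|x_l^max − x_l^min|. A coarse grid search stands in for the package's numerical optimizer; all function names are illustrative and this is not the RobustGaSP implementation.

```python
import numpy as np

def log_marginal_lik(beta, X, y, nugget=1e-6):
    """Log marginal likelihood of a zero-mean GaSP with the variance profiled
    out: -0.5*log|R| - (n/2)*log(y' R^{-1} y), up to an additive constant."""
    n = len(y)
    D = sum(bl * np.abs(X[:, l:l + 1] - X[:, l:l + 1].T)
            for l, bl in enumerate(beta))
    R = np.exp(-D) + nugget * np.eye(n)      # power exponential, roughness 1
    L = np.linalg.cholesky(R)
    w = np.linalg.solve(L, y)
    return -np.sum(np.log(np.diag(L))) - 0.5 * n * np.log(w @ w)

def log_jr(beta, C, a, b):
    """Log JR prior density (3.3), up to the normalizing constant C_0."""
    s = float(np.dot(C, beta))
    return -np.inf if s <= 0 else a * np.log(s) - b * s

rng = np.random.default_rng(0)
X = rng.uniform(size=(30, 2))
y = np.sin(5 * X[:, 0]) + 0.01 * rng.standard_normal(30)   # x_2 is inert
n, p_x = X.shape
a = 0.2
C = n ** (-1 / p_x) * (X.max(axis=0) - X.min(axis=0))
b = n ** (-1 / p_x) * (a + p_x)

# Coarse grid search for the marginal posterior mode in place of an optimizer
grid = np.exp(np.linspace(-4, 4, 15))
best = max(((b1, b2) for b1 in grid for b2 in grid),
           key=lambda bb: log_marginal_lik(np.array(bb), X, y)
                          + log_jr(np.array(bb), C, a, b))
```

Because the second input is inert in this example, the mode places a much smaller inverse range parameter on the second coordinate, illustrating the behavior exploited in Section 4.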

## 4 Variable selection and sensitivity analysis

This section discusses the problem of identifying inert inputs of computer models. We first introduce a computationally feasible approach for identifying inert inputs by the JR prior and then discuss the sensitivity analysis approach.

### 4.1 Identifying the inert inputs by the JR prior

Assume the GaSP emulator is used to model the computer model output. W.l.o.g., we assume the input only appears in the correlation function in (1.1). Variable selection in this context was studied in Schonlau and Welch (2006); Linkletter et al. (2006); Savitsky et al. (2011). In Schonlau and Welch (2006), the variables are selected one by one through a screening algorithm with functional analysis of variance, and the number of models to be computed is of the order p_x^2. In Linkletter et al. (2006), the size of the range parameter is used as an indicator of whether an input is influential, and the full posteriors are sampled by a Metropolis-Hastings algorithm. In Savitsky et al. (2011), a spike-and-slab prior is used for a transformation of the range parameters. However, the difficulty with the model selection strategy comes from the computational burden, as the model space has 2^{p_x} elements, and each evaluation of the likelihood requires O(n^3) flops for the inversion of the covariance matrix.

Note that the difficulty of variable selection in this context comes from the fact that no closed-form marginal likelihood is available. However, when the product correlation function in (1.1) is used, the hope is that, for an inert input l, β̂_l ≈ 0, in which case it barely affects the correlation matrix R. Using posterior mode estimation, this would happen if the mode could appear at β_l = 0 for the inert inputs. However, as shown in the following lemma, marginal posterior mode estimation with the robust parameterizations utilizing the reference prior cannot identify inert inputs. The proof follows directly from the tail rates computed in Lemma 4.1 and Lemma 4.2 in Gu et al. (2018b).

###### Lemma 3.

The marginal posterior density of the range parameters θ (or the log inverse range parameters ξ) goes to 0 if some, but not all, θ_l → ∞ (or ξ_l → −∞), 1 ≤ l ≤ p_x, for both the power exponential and Matérn correlation functions, when the reference prior in (2.5) is used.

According to Lemma 3, the marginal posterior mode under these two parameterizations will never appear at θ_l = ∞ (or ξ_l = −∞) for any l, as the posterior density is 0 when the limit is reached for some but not all l. The identifiability of inert inputs with posterior mode estimation, however, requires the posterior density to be positive when the limit is reached for some but not all l (otherwise it is not a robust parameter estimation). Other transformations of the reference prior are also unlikely to both maintain the robust parameterization and identify inert inputs. Such difficulties can lead to inferior prediction when inert inputs are present in computer models.

Fortunately, the marginal posterior density with the JR prior is positive when β_l → 0 for some, but not all, l, as stated in the following lemma.

###### Lemma 4.

The marginal posterior density of the inverse range parameters β is positive if some, but not all, β_l → 0, 1 ≤ l ≤ p_x, for both the power exponential and Matérn correlation functions, when the JR prior in (3.3) is used with a > 0, b > 0 and C_l > 0, 1 ≤ l ≤ p_x.

The proof of Lemma 4 follows from the tail rate of the marginal likelihood of the GaSP model (Lemma 3.1 and Lemma 4.1 in Gu et al. (2018a)) and the tail rate of the JR prior in Remark 1. When β_l = 0, the lth input drops out of the covariance of the GaSP model. In practice, an exactly zero estimate is unlikely to be obtained. Thus, we use the normalized inverse range parameters as an indicator of the importance of each input

$$\hat P_l = \frac{C_l \hat\beta_l}{\sum_{i=1}^{p_x} C_i \hat\beta_i}, \tag{4.1}$$

where β̂_i, 1 ≤ i ≤ p_x, are estimated in Equation (3.4). The factor C_l takes the scale of the different inputs into account. The sum in the denominator is the overall size of the estimates, and C_l β̂_l is the contribution of the lth input.

The size of the inverse range parameters has been used before to infer which inputs are inert, but with a different prior (Linkletter et al. (2006)). The jointly robust prior, however, yields better results, as shown in the comparison in Section 5.2.

One may use a threshold on the normalized inverse range parameters to predict whether an input is inert, i.e.

$$\hat P_l \leq p_0 / p_x, \tag{4.2}$$

where p_0 may be chosen as a constant between 0 and 1. The value could also depend on the number of observations, the dimension of the inputs and the expected number of inputs to be selected. However, because all inputs affect the outputs of a computer model, the threshold might be less important and can be chosen based on the scientific goal. We do not attempt to present a method for full model selection, as each computation of the likelihood can be expensive. The point here is that the computation of P̂_l requires no extra computation (as the posterior modes are typically needed for building a GaSP model anyway), and it can serve as an indicator of whether an input is inert.
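The quantities in (4.1) and the threshold rule (4.2) amount to a few vector operations once the β̂_l are available. A short sketch with hypothetical estimates (the values of β̂_l, C_l and p_0 below are illustrative only):

```python
import numpy as np

def normalized_inverse_ranges(beta_hat, C):
    """Normalized inverse range parameters P_l of (4.1)."""
    w = C * beta_hat
    return w / w.sum()

# Hypothetical estimates from (3.4): the second input has a near-zero beta
beta_hat = np.array([2.1, 0.02, 1.4])
C = np.array([0.4, 0.4, 0.4])
P = normalized_inverse_ranges(beta_hat, C)

p0 = 0.1
inert = P <= p0 / len(beta_hat)   # threshold rule (4.2)
```

Here the second input falls below the threshold p_0/p_x and would be flagged as inert, at no cost beyond the parameter estimation already performed for emulation.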

### 4.2 Sensitivity analysis

Sensitivity analysis of computer models concerns the problem of learning how changes in the inputs affect the outputs. The inputs of the computer model are typically associated with a distribution, reflecting the belief about the input values. One of the main goals of sensitivity analysis is to identify how much a set of inputs influences the variability of the outputs, which is studied through functional analysis of variance (functional ANOVA).

It is possible to decompose a function as follows (Hoeffding (1948))

$$f^M(\mathbf{x}) = z_0 + \sum_{i=1}^{p_x} z_i(x_i) + \sum_{1 \le i < j \le p_x} z_{ij}(x_i, x_j) + \cdots + z_{1,2,\dots,p_x}(x_1,\dots,x_{p_x}),$$