# Gaussian Processes with Input Location Error and Applications to the Composite Parts Assembly Process

In this paper, we investigate Gaussian process regression with input location error, where the inputs are corrupted by noise. Here, we consider the best linear unbiased predictor for two cases, according to whether there is noise at the target untried location or not. We show that the mean squared prediction error does not converge to zero in either case. We investigate the use of stochastic Kriging in the prediction of Gaussian processes with input location error, and show that stochastic Kriging is a good approximation when the sample size is large. Several numeric examples are given to illustrate the results, and a case study on the assembly of composite parts is presented. Technical proofs are provided in the Appendix.


## 1 Introduction

Gaussian process modeling is widely used to recover underlying functions from scattered evaluations, possibly corrupted by noise. This method has been utilized in spatial statistics for several decades Matheron (1963); Cressie (2015), and was later applied in computer experiments to build emulators of computer model outputs Sacks et al. (1989). In order to capture the randomness of real systems, it is natural to use stochastic simulation in computer experiments. For Gaussian process modeling, the output associated with each input can be decomposed as the sum of a mean Gaussian process output and random (Gaussian) noise. Following the terminology in design of experiments Wu and Hamada (2009), we call the noise added to the mean Gaussian process output extrinsic noise. The extrinsic noise usually arises from uncertainty associated with the responses, such as measurement errors, computational errors and other unquantified errors. The corresponding Gaussian process modeling with extrinsic noise is called stochastic Kriging Ankenman et al. (2010). In spatial statistics, the noise is known as a nugget effect Matheron (1963).

Besides extrinsic noise, in some cases the input measurements are also corrupted by noise. Noisy or uncertain inputs are quite common in spatial statistics, because geostatistical data are often indexed by imprecise locations. Detailed examples can be found in Barber et al. (2006) and Veneziano and Van Dyck (1987). We call the input noise intrinsic noise. If the input measurements are corrupted by noise in a Gaussian process, it is known as a Gaussian process with input location error, and the corresponding best linear unbiased predictor is called Kriging adjusting for location error (KALE) Cressie and Kornak (2003). Also see Girard (2004); Dallaire et al. (2009); Bócsi and Csató (2013); McHutchon and Rasmussen (2011) for more discussion. KALE has been applied in a range of areas, including robotics Deisenroth et al. (2015), wireless networks Muppirisetty et al. (2016), and Wi-Fi fingerprinting He et al. (2017).

KALE predicts the mean Gaussian process output at an untried point without intrinsic noise. In some cases, however, the prediction of the mean Gaussian process output at an untried point with intrinsic noise is desired. A motivating example is the composite aircraft fuselage assembly process. In this process, a model is needed to predict the dimensional deviations under noisy actuators’ forces. Further, when new actuator forces are implemented in practice, there is an inevitable intrinsic noise, i.e., uncertainty in the actually delivered actuator forces. Therefore, the output at an untried point has intrinsic noise. Under this scenario, we consider Kriging adjusting for location error and noise (KALEN), which is the best linear unbiased predictor of the mean Gaussian process output at an untried point with intrinsic noise.

In this paper, we discuss three predictors, KALE, KALEN, and stochastic Kriging, applied to prediction and uncertainty quantification in Gaussian process regression with input location error. We show that, unlike Gaussian process regression without location error, the mean squared prediction error (MSPE) of these three predictors does not converge to zero as the sample size goes to infinity. Furthermore, we show that the limiting MSPEs of KALEN and stochastic Kriging are equal if an untried point has intrinsic noise. We obtain an asymptotic upper bound on the MSPE of KALE and stochastic Kriging if there is no noise at an untried point. Numeric results indicate that if the sample size is relatively small and the noise is relatively large, KALE or KALEN has a much smaller MSPE than stochastic Kriging and is thus preferable. We also compare the performance of KALEN and stochastic Kriging in the modeling of a composite parts assembly process. We find that KALEN and stochastic Kriging are comparable across a range of small intrinsic noise levels, corresponding to a range of actuator tolerances, which is consistent with the theoretical analysis.

The remainder of this article is structured as follows. In Section 2, we formally state the problem, introduce KALE and KALEN, and show some asymptotic properties of the MSPE of KALE and KALEN. Section 3 presents some theoretical results on using stochastic Kriging in the prediction of Gaussian processes with input location error. Parameter estimation methods are discussed in Section 4, and numeric results are presented in Section 5. A case study of the composite parts assembly process is considered in Section 6. Technical proofs are given in the Appendix.

## 2 Gaussian Processes with Input Location Error

In this section, we introduce two predictors of the Gaussian processes with input location error, KALE and KALEN. We also give several asymptotic properties of KALE and KALEN.

### 2.1 Two Predictors of Gaussian Processes with Input Location Error

Suppose $f$ is an underlying function defined on $\mathbb{R}^d$, and the values of $f$ on a convex and compact set $\Omega\subset\mathbb{R}^d$ are of interest. A standard tool to build emulators is Gaussian process regression (see Fang et al. (2005) and Santner et al. (2013), for example). Specifically, suppose $f\sim GP(m,\sigma^2\Psi)$, where $m$ is the mean function, $\sigma^2$ is the variance, and $\Psi$ is the correlation function. For the ease of mathematical treatment, we assume $m=0$, which is equivalent to removing the mean surface and will not impact the following analysis. Suppose we observe the responses on $X=\{x_1,\dots,x_n\}\subset\Omega$. Following the terminology in design of experiments Wu and Hamada (2009), we call $x_1,\dots,x_n$ design points.

For a Gaussian process with input location error, the input measurements are corrupted by noise. In this paper, we mainly focus on the intrinsic error and assume the responses are not influenced by the extrinsic error. Specifically, suppose the responses are perturbed by the intrinsic error, that is, we observe $y(x_j)=f(x_j+\epsilon_j)$ for $j=1,\dots,n$, where the $\epsilon_j$'s are i.i.d. random variables with mean zero, finite variance $\sigma_\epsilon^2$, and probability density function $h$.

Following the approach in Cressie and Kornak (2003), the best linear unbiased predictor of $f(x)$ at an untried point $x$ is given by

$$p(Y;x)=\alpha_1^T Y+\alpha_2, \tag{1}$$

where $(\alpha_1,\alpha_2)$ is the solution to the optimization problem

$$\min_{(\alpha_1,\alpha_2)} E\big(f(x)-p(Y;x)\big)^2=\min_{(\alpha_1,\alpha_2)} E\big(f(x)-\alpha_1^T Y-\alpha_2\big)^2, \tag{2}$$

and $Y=(y(x_1),\dots,y(x_n))^T$ denotes the responses on the design points. Note that

$$E\big(f(x)y(x_j)\big) = \sigma^2\int \Psi(x, x_j+\epsilon_j)h(\epsilon_j)\,d\epsilon_j,$$
$$E\big(y(x_j)y(x_k)\big) = \begin{cases}\sigma^2\Psi(x_j,x_j), & j=k,\\[4pt] \sigma^2\iint \Psi(x_j+\epsilon_j, x_k+\epsilon_k)h(\epsilon_j)h(\epsilon_k)\,d\epsilon_j\,d\epsilon_k, & j\neq k.\end{cases} \tag{3}$$

By plugging (3) into (2) and minimizing (2) with respect to $(\alpha_1,\alpha_2)$, we obtain that the solution to (2) is $\alpha_1=K^{-1}r(x)$ and $\alpha_2=0$, where $r(x)=(r(x,x_1),\dots,r(x,x_n))^T$ denotes the covariance vector between $f(x)$ and $Y$ with

$$r(x,x_j)=\sigma^2\int \Psi(x,x_j+\epsilon_j)h(\epsilon_j)\,d\epsilon_j, \tag{4}$$

and $K$ denotes the covariance matrix of $Y$ with entries

$$K_{jk}=\begin{cases}\sigma^2\Psi(x_j,x_j), & j=k,\\[4pt] \sigma^2\iint \Psi(x_j+\epsilon_j,x_k+\epsilon_k)h(\epsilon_j)h(\epsilon_k)\,d\epsilon_j\,d\epsilon_k, & j\neq k.\end{cases} \tag{5}$$

Plugging $\alpha_1$ and $\alpha_2$ into (1), we find the best linear unbiased predictor of $f(x)$ is

$$\hat f(x)=r(x)^T K^{-1} Y. \tag{6}$$

Cressie and Kornak (2003) refer to (6) as Kriging adjusting for location error (KALE). If the prediction of $y(x)=f(x+\epsilon)$ at an untried point $x$ with intrinsic noise is of interest, it can be shown that we only need to replace $r(x,x_j)$ in (6) by $r_N(x,x_j)$, where

$$r_N(x,x_j)=\sigma^2\iint \Psi(x+\epsilon, x_j+\epsilon_j)h(\epsilon_j)h(\epsilon)\,d\epsilon_j\,d\epsilon. \tag{7}$$

We refer to the corresponding best linear unbiased predictor as Kriging adjusting for location error and noise (KALEN). One simple relation between KALE and KALEN follows from (4) and (7): $r_N(x,x_j)=\int r(x+\epsilon,x_j)h(\epsilon)\,d\epsilon$, so the KALEN predictor averages the KALE predictor over the intrinsic noise at the untried point.

In some cases, there exist closed forms of the integrals in (4)–(7). For example, if the correlation function is $\Psi(s,t)=e^{-\theta\|s-t\|_2^2}$ and the noise $\epsilon_j\sim N(0,\sigma_\epsilon^2 I_d)$, where $\theta$ is the correlation parameter and $N(0,\sigma_\epsilon^2 I_d)$ is a mean zero normal distribution with covariance matrix $\sigma_\epsilon^2 I_d$, then (4)–(7) can be calculated respectively as

$$K_{jk}=\begin{cases}\sigma^2, & j=k,\\[4pt] \dfrac{\sigma^2}{(1+4\sigma_\epsilon^2\theta)^{d/2}}\, \exp\!\left(-\dfrac{\theta\|x_j-x_k\|_2^2}{1+4\sigma_\epsilon^2\theta}\right), & j\neq k,\end{cases}$$
$$r(x,x_j)=\frac{\sigma^2}{(1+2\sigma_\epsilon^2\theta)^{d/2}}\, \exp\!\left(-\frac{\theta\|x-x_j\|_2^2}{1+2\sigma_\epsilon^2\theta}\right), \qquad r_N(x,x_j)=\frac{\sigma^2}{(1+4\sigma_\epsilon^2\theta)^{d/2}}\, \exp\!\left(-\frac{\theta\|x-x_j\|_2^2}{1+4\sigma_\epsilon^2\theta}\right). \tag{8}$$
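The closed forms in (8) make KALE and KALEN straightforward to implement for the Gaussian correlation function. The following NumPy sketch (the function and variable names are ours, and the parameters $\sigma^2$, $\theta$, $\sigma_\epsilon^2$ are assumed known) builds $K$, $r(x)$ and $r_N(x)$ and evaluates the predictor (6) together with its KALEN analogue:

```python
import numpy as np

def kale_kalen_predict(x, X, Y, sigma2, theta, sigma_eps2):
    """KALE and KALEN predictions at x, using the closed forms in (8).

    x : (d,) untried point;  X : (n, d) design;  Y : (n,) responses.
    """
    d = X.shape[1]
    s2 = 1.0 + 2.0 * sigma_eps2 * theta
    s4 = 1.0 + 4.0 * sigma_eps2 * theta
    # covariance matrix (5): sigma^2 on the diagonal, smoothed kernel off it
    D2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = sigma2 * s4 ** (-d / 2) * np.exp(-theta * D2 / s4)
    np.fill_diagonal(K, sigma2)
    # covariance vectors (4) and (7)
    d2 = np.sum((X - x) ** 2, axis=-1)
    r = sigma2 * s2 ** (-d / 2) * np.exp(-theta * d2 / s2)
    rN = sigma2 * s4 ** (-d / 2) * np.exp(-theta * d2 / s4)
    w = np.linalg.solve(K, Y)
    return r @ w, rN @ w  # (KALE, KALEN)
```

When $\sigma_\epsilon^2=0$, both predictors reduce to ordinary Kriging and interpolate the data.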

Unfortunately, in general, equations (4)–(7) are intractable and need to be calculated via Monte Carlo integration by sampling from $h$, which can be computationally expensive. For example, if we choose the Matérn correlation function, then (6) does not have a closed form. In this case, the calculation of (6) requires considerable time, as we will see in Section 5.
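For such intractable cases, the integrals in (4) and (5) can be approximated by plain Monte Carlo, sampling the noise from $h$. A minimal sketch, assuming normally distributed noise and a generic vectorized correlation function `psi` (all names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_r(x, xj, psi, sigma2, sigma_eps, m=2000):
    # (4): r(x, x_j) = sigma^2 E[ psi(x, x_j + eps_j) ], eps_j ~ N(0, sigma_eps^2 I)
    eps = sigma_eps * rng.standard_normal((m, len(xj)))
    return sigma2 * np.mean(psi(x, xj + eps))

def mc_K_offdiag(xj, xk, psi, sigma2, sigma_eps, m=2000):
    # (5), j != k: double integral over two independent noise draws
    ej = sigma_eps * rng.standard_normal((m, len(xj)))
    ek = sigma_eps * rng.standard_normal((m, len(xk)))
    return sigma2 * np.mean(psi(xj + ej, xk + ek))
```

Reusing one fixed set of draws for every entry (as done in Section 5.2) keeps the approximated $K$ symmetric and the subsequent pseudo-likelihood optimization stable.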

### 2.2 The Mean Squared Prediction Error of KALE and KALEN

Now we consider the mean squared prediction error (MSPE) of KALE and KALEN. The MSPE of KALE can be calculated by

$$E\big(f(x)-\hat f(x)\big)^2 = E\big(f(x)-r(x)^TK^{-1}Y\big)^2 = E\big(f(x)^2\big) - 2r(x)^TK^{-1}E\big(f(x)Y\big) + r(x)^TK^{-1}E\big(YY^T\big)K^{-1}r(x) = \sigma^2\Psi(x,x) - r(x)^TK^{-1}r(x), \tag{9}$$

where $\hat f(x)$ is as in (6), and $r(x)$ and $K$ are as defined in (4) and (5), respectively. The last equality holds because of (3). Similarly, one can check that the MSPE of KALEN is

$$E\big(y(x)-\hat y(x)\big)^2=\sigma^2\Psi(x,x)-r_N(x)^TK^{-1}r_N(x), \tag{10}$$

where $r_N(x)=(r_N(x,x_1),\dots,r_N(x,x_n))^T$ is as defined in (7).

Define

$$\Psi_S(s,t)=\iint \Psi(s+\epsilon_1,t+\epsilon_2)h(\epsilon_1)h(\epsilon_2)\,d\epsilon_1\,d\epsilon_2. \tag{11}$$

In Proposition 3.1 of Cervone and Pillai (2015), it is shown that if a function $\tilde\Psi$ satisfies $\tilde\Psi(s,t)=\Psi_S(s,t)$ for $s\neq t$ and $\tilde\Psi(s,t)=\Psi(s,t)$ for $s=t$, then $\tilde\Psi$ is a valid correlation function. Therefore, the covariance matrix $K$ defined in (5) is positive definite. We first consider the asymptotic properties of (10) as the fill distance goes to zero, where the fill distance of the design points $X$ is defined by

$$h_X:=\sup_{x\in\Omega}\min_{x_j\in X}\|x-x_j\|_2. \tag{12}$$
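The fill distance (12) can be approximated numerically by maximizing the nearest-design-point distance over a dense reference sample of $\Omega$. A small brute-force sketch for $\Omega=[0,1]^d$ (a helper of our own, not from the paper):

```python
import numpy as np

def fill_distance(X, n_ref=100_000, seed=0):
    """Approximate h_X = sup_{x in [0,1]^d} min_j ||x - x_j||_2."""
    d = X.shape[1]
    ref = np.random.default_rng(seed).random((n_ref, d))
    # distance from each reference point to its nearest design point
    nearest = np.min(np.linalg.norm(ref[:, None, :] - X[None, :, :], axis=-1), axis=1)
    return nearest.max()
```

For the 11-point grid on $[0,1]$, for instance, the true fill distance is 0.05, and the approximation approaches it from below.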

Notice that the MSPE of KALEN can be expressed as

$$E\big(y(x)-\hat y(x)\big)^2 = \sigma^2\Psi(x,x)-r_N(x)^TK^{-1}r_N(x) = \sigma^2\big(\Psi(x,x)-\Psi_S(x,x)\big)+\sigma^2\Psi_S(x,x)-r_N(x)^TK^{-1}r_N(x). \tag{13}$$

Let $\mu_0=\Psi(x,x)-\Psi_S(x,x)$. Thus, by (5) and (11), $K=\sigma^2\Psi_S(X,X)+\sigma^2\mu_0 I_n$, where $\Psi_S(X,X)=\big(\Psi_S(x_j,x_k)\big)_{jk}$. In the rest of Section 2 and Section 3, we assume the correlation function $\Psi$ satisfies the following assumption.

###### Assumption 1.

$\Psi$ is a radial basis function, i.e., $\Psi(s,t)=\psi(\|s-t\|_2)$ for some function $\psi$. Furthermore, assume $\psi$ has continuous second order derivatives and is a decreasing function, with $\psi(0)=1$.

Many widely used correlation functions, including isotropic Gaussian correlation functions and isotropic Matérn correlation functions, satisfy this assumption. For anisotropic correlation functions of the form $\Psi(s,t)=\psi\big((s-t)^T A(s-t)\big)$ with a $d\times d$ diagonal positive definite matrix $A$, we can stretch the space via $s\mapsto A^{1/2}s$ so that Assumption 1 is satisfied. Assumption 1 implies $\Psi(x,x)-\Psi_S(x,x)>0$. Intuitively, $K$ is equal to a covariance matrix plus a nugget parameter. In order to justify this intuition, we need to show that $\sigma^2\Psi_S(X,X)=\big(\sigma^2\Psi_S(x_j,x_k)\big)_{jk}$ is a covariance matrix, which follows from the fact that $\Psi_S$ is a positive definite function, as stated in the following lemma, whose proof is given in Appendix B.

###### Lemma 1.

If $\Psi$ is a positive definite function, then $\Psi_S$ is a positive definite function.

In order to study the asymptotic performance of KALE and KALEN, we consider a sequence of designs $\{X_n\}_{n\geq 1}$. We assume the following.

###### Assumption 2.

The sequence of design points $\{X_n\}$ satisfies card$(X_n)=n$, and there exists a constant $c>0$ such that $h_{X_n}\leq c\,q_{X_n}$ for all $n$, where

$$q_{X_n}=\min_{1\leq j\neq k\leq n}\|x_j-x_k\|_2/2$$

denotes the separation distance of $X_n=\{x_1,\dots,x_n\}$.

It is not hard to find designs that satisfy this assumption; for example, grid designs satisfy Assumption 2. In the rest of the paper we suppress the dependence of $X_n$ on $n$ for notational simplicity. It can be shown that if a Gaussian process has no intrinsic noise, then the MSPE of the corresponding best linear unbiased predictor converges to zero as the fill distance goes to zero. Unlike a Gaussian process without input location error, we show that the limits of the MSPE of KALE and KALEN are usually not zero. In fact, (13) and Lemma 1 imply that the MSPE of KALEN is the MSPE of a Gaussian process with extrinsic error plus a non-zero constant. These results are stated in Theorem 1, whose proof is provided in Appendix C.

###### Theorem 1.

The MSPE of KALEN (10) converges to $\sigma^2\big(\Psi(x,x)-\Psi_S(x,x)\big)$ as the fill distance of the design points converges to zero, where $\Psi_S$ is defined in (11).

In Theorem 1, we present a limit of the MSPE of KALEN. The limit is usually not zero. This is expected for KALEN, since there is a random error at the untried point $x$. The MSPE limit depends on two parts: one is the variance $\sigma^2$, and the other is the difference $\Psi(x,x)-\Psi_S(x,x)$. The variance depends on the underlying process, while the difference depends on the distribution of the noise $\epsilon$. Roughly speaking, the difference will be larger if the density $h$ is more spread out.

One might expect that the MSPE of KALE converges to zero as the fill distance of the design points goes to zero. However, the following proposition shows that, in the case of Gaussian correlation functions and normally distributed intrinsic error, there is a positive lower bound on the MSPE of KALE. The proof can be found in Appendix D.

###### Proposition 1.

Suppose the covariance function is $\sigma^2\Psi(s,t)=\sigma^2 e^{-\theta\|s-t\|_2^2}$ for some $\theta>0$, and the input noise $\epsilon_j\sim N(0,\sigma_\epsilon^2 I_d)$ are i.i.d., where $N(0,\sigma_\epsilon^2 I_d)$ is a mean zero normal distribution with covariance matrix $\sigma_\epsilon^2 I_d$. Then for any design $X$, the MSPE of KALE, defined in (9), has a lower bound

$$\sigma^2\left(1-\frac{(1+4\sigma_\epsilon^2\theta)^{d/2}}{(1+2\sigma_\epsilon^2\theta)^{d}}\right).$$

From Theorem 1 and Proposition 1, we can see that, unlike Gaussian processes with only extrinsic error, the MSPEs of the predictors for Gaussian processes with input location error do not converge to zero, unless $\sigma_\epsilon^2=0$.

## 3 Comparison Between KALE/KALEN and Stochastic Kriging

It is argued in Cressie and Kornak (2003) and Stein (1999) that using a nugget parameter is one way to counteract the influence of noise within the inputs. Therefore, it is natural to ask whether stochastic Kriging is a good approximation method to predict the value at an untried point, since it is not the best linear unbiased predictor under the settings of Gaussian process with input location error. Cervone and Pillai (2015) claim that a nugget parameter alone cannot capture the effect of input location error. In this paper, we show that the MSPE of stochastic Kriging has the same limit as the MSPE of KALEN, and provide an upper bound on the MSPE of stochastic Kriging if the untried point has no noise, as stated in the following theorem. The proof can be found in Appendix E.

###### Theorem 2.

Let $\mu=\Psi(x,x)-\Psi_S(x,x)$, where $\Psi_S$ is as defined in (11); under Assumption 1, $\mu$ is a constant that does not depend on $x$. A stochastic Kriging predictor of a Gaussian process with input location error is defined as

$$\hat f_S(x)=\Psi(x,X)\big(\Psi(X,X)+\mu I_n\big)^{-1}Y, \tag{14}$$

where $\Psi(x,X)=\big(\Psi(x,x_1),\dots,\Psi(x,x_n)\big)$ and $\Psi(X,X)=\big(\Psi(x_j,x_k)\big)_{jk}$.

(i) Suppose there is noise at an untried point. The MSPE of the predictor (14) has the same limit as KALEN, which is $\sigma^2\big(\Psi(x,x)-\Psi_S(x,x)\big)$, when the fill distance of $X$ goes to zero, where $\Psi_S$ is defined in (11).

(ii) Suppose there is no noise at an untried point. An asymptotic upper bound on the MSPE of the predictor (14) is

$$\frac{2\sigma^2}{(2\pi)^d}\int_{\mathbb{R}^d}\Big|1-|b(t)|^2\Big|\,\mathcal{F}(\Psi)(t)\,dt, \tag{15}$$

where $\mathcal{F}(\Psi)$ is the Fourier transform of $\Psi$ and $b$ is the characteristic function of $\epsilon_1$.

###### Remark 1.

We say $a$ is an asymptotic upper bound on a sequence $\{c_n\}$ if there exists a sequence $\{a_n\}$ such that $c_n\leq a_n$ for all $n$ and $\lim_{n\to\infty}a_n=a$.

Theorem 2 shows that the predictor (14) is as good as KALEN asymptotically. The following proposition states that if the noise is small, then (15) can be controlled. The proof of Proposition 2 can be found in Appendix F.

###### Proposition 2.

Suppose $\{\epsilon^{(n)}\}$ is a sequence of random variables that converges to $0$ in distribution. Let

$$a_n=\frac{\sigma^2}{(2\pi)^d}\int_{\mathbb{R}^d}\Big|1-|b_n(t)|^2\Big|\,\mathcal{F}(\Psi)(t)\,dt, \tag{16}$$

where $b_n$ is the characteristic function of $\epsilon^{(n)}$. Then $a_n$ converges to zero.

One advantage of stochastic Kriging is that we can simplify the calculation since we do not need to calculate the integrals in (5) and (7). If the noise is small and the fill distance is small, Theorem 2 and Proposition 2 state that the MSPE of the predictor (14) can be comparable with the best linear unbiased predictor.
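In code, the simplification is apparent: the predictor (14) needs only the noise-free correlation $\Psi$ plus a scalar nugget, with no integration. A minimal sketch with a Gaussian $\Psi$ (the function name is ours; $\theta$ and $\mu$ are assumed known or estimated as in Section 4):

```python
import numpy as np

def stochastic_kriging_predict(x, X, Y, theta, mu):
    """Predictor (14): Psi(x, X) (Psi(X, X) + mu I_n)^{-1} Y."""
    D2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    Psi_XX = np.exp(-theta * D2)                             # Psi(X, X)
    psi_x = np.exp(-theta * np.sum((X - x) ** 2, axis=-1))   # Psi(x, X)
    return psi_x @ np.linalg.solve(Psi_XX + mu * np.eye(len(X)), Y)
```

With $\mu=0$ the predictor interpolates the data; a positive nugget both regularizes the linear solve and absorbs the input location error.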

As we mentioned before, it is argued in Cervone and Pillai (2015) that since the integrated covariance function in (5) is not the same as the covariance function in the original Gaussian process without location error, a nugget parameter alone cannot capture the effect of location error. It is true that the MSPE of KALE or KALEN is the smallest among all the linear unbiased predictors. However, our results also show that with an appropriate nugget parameter, the predictor (14) is as good as KALEN asymptotically, and there is little difference between KALE and the predictor (14) if the variance of the intrinsic noise and the fill distance are small.

For the ease of mathematical treatment, we assume the noise terms $\epsilon_j$ are i.i.d. If the $\epsilon_j$'s are independent but not identically distributed, the proof is similar. As a special case, if the underlying process has a Gaussian correlation function and the intrinsic noise is normally distributed, the lower bound for KALE and the asymptotic upper bound for the predictor (14) can be calculated analytically. We have the following corollary, which is a direct result of Theorem 2, and the proof is omitted.

###### Corollary 1.

Suppose the covariance function and the intrinsic noise are as in Proposition 1. If there is noise at an untried point $x$, the stochastic Kriging predictor has the same asymptotic MSPE as KALEN, which is $\sigma^2\big(1-(1+4\sigma_\epsilon^2\theta)^{-d/2}\big)$. If there is no noise at an untried point $x$, the asymptotic upper bound on the MSPE of stochastic Kriging is

$$\frac{2\sigma^2}{(2\pi)^d}\int_{\mathbb{R}^d}\Big|1-|b(t)|^2\Big|\,\mathcal{F}(\Psi)(t)\,dt = 2\sigma^2\left(1+\frac{1}{(1+4\sigma_\epsilon^2\theta)^{d/2}}-\frac{2}{(1+2\sigma_\epsilon^2\theta)^{d/2}}\right).$$

Note that KALE is the best linear unbiased predictor for the prediction of the output at an untried point without noise. Therefore, the upper bound on the MSPE of stochastic Kriging is also an upper bound for KALE. Corollary 1 implies that the asymptotic MSPE of KALE is smaller than the MSPE of KALEN, which matches intuition. The MSPE of KALE/KALEN is close to zero if the variance of the noise is small. Corollary 1 also implies that the MSPE of KALE/KALEN increases as the input dimension increases.
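The closed-form expression above is cheap to evaluate. The snippet below (a helper of our own) computes the Corollary 1 upper bound and illustrates two qualitative claims: the bound vanishes as $\sigma_\epsilon^2\to 0$ and grows with the input dimension $d$:

```python
def sk_mspe_upper_bound(sigma2, sigma_eps2, theta, d):
    """Asymptotic MSPE upper bound in Corollary 1 (Gaussian kernel)."""
    a = (1.0 + 4.0 * sigma_eps2 * theta) ** (-d / 2.0)
    b = (1.0 + 2.0 * sigma_eps2 * theta) ** (-d / 2.0)
    return 2.0 * sigma2 * (1.0 + a - 2.0 * b)
```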

## 4 Parameter Estimation

An accurate and computationally feasible technique for estimating the unknown parameters is necessary in order to apply the noisy-input model described above. An intuitive approach to estimating the parameters is maximum likelihood estimation. Up to a multiplicative constant, the likelihood function is

$$\ell(\theta;X,Y)\propto \int\cdots\int |\Sigma_1|^{-1/2}\, e^{-\frac{1}{2}Y^T\Sigma_1^{-1}Y}\, h(\epsilon_1)\cdots h(\epsilon_n)\,d\epsilon_1\cdots d\epsilon_n, \tag{17}$$

where $\Sigma_1=\big(\sigma^2\Psi(x_j+\epsilon_j,x_k+\epsilon_k)\big)_{jk}$. Unfortunately, the integral in (17) is difficult to calculate, because the dimension of the integral increases as the sample size increases. In this work, we use a pseudo-likelihood approach proposed by Cressie and Kornak (2003). Define

$$\ell_g(\theta;X,Y)=(2\pi)^{-n/2}|K|^{-1/2}\exp\Big(-\tfrac{1}{2}Y^TK^{-1}Y\Big), \tag{18}$$

where $\theta$ collects the parameters we want to estimate, and $K$ is defined in (5). The maximum pseudo-likelihood estimator can be defined as

$$\hat\theta_1=\arg\sup_{\theta}\,\ell_g(\theta). \tag{19}$$

Because of non-identifiability, the parameters inside the Gaussian process and the noise distribution cannot all be estimated simultaneously Cervone and Pillai (2015). The properties of the pseudo-likelihood approach are discussed in Cervone and Pillai (2015); here we list a few of them. First, the pseudo-score provides an unbiased estimating equation, i.e.,

$$E\big(S(\theta;Y)\big)=E\Big(\nabla\log\big(\ell_g(\theta;X,Y)\big)\Big)=0.$$

Second, the covariance matrix of the pseudo-score and the expected negative Hessian of the log pseudo-likelihood can be calculated. However, to the best of our knowledge, the consistency of parameters estimated by pseudo-likelihood in the Gaussian process case has not been theoretically justified.

If we use stochastic Kriging, the corresponding (misspecified) log likelihood function is, up to an additive constant,

$$\ell_{nug}(\theta,\mu;X,Y)=-\tfrac{1}{2}\log\big(|\Psi(X,X)+\mu I_n|\big)-\tfrac{1}{2}Y^T\big(\Psi(X,X)+\mu I_n\big)^{-1}Y. \tag{20}$$

The maximum likelihood estimator of $(\theta,\mu)$ is defined by

$$(\hat\theta_2,\hat\mu)=\arg\sup_{\theta,\mu}\,\ell_{nug}(\theta,\mu;X,Y). \tag{21}$$
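Maximizing (20) is a standard low-dimensional optimization. A sketch using `scipy.optimize.minimize` over log-transformed parameters, with a Gaussian $\Psi$ for concreteness (the function name and initial values are ours, not from the paper):

```python
import numpy as np
from scipy.optimize import minimize

def fit_stochastic_kriging(X, Y):
    """Maximize the (misspecified) log likelihood (20) over (theta, mu)."""
    D2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    n = len(X)

    def neg_log_lik(log_params):
        theta, mu = np.exp(log_params)          # log transform keeps both positive
        C = np.exp(-theta * D2) + mu * np.eye(n)
        sign, logdet = np.linalg.slogdet(C)
        if sign <= 0:                           # guard against numerical trouble
            return 1e10
        return 0.5 * logdet + 0.5 * Y @ np.linalg.solve(C, Y)

    res = minimize(neg_log_lik, x0=np.log([1.0, 0.1]), method="Nelder-Mead")
    theta_hat, mu_hat = np.exp(res.x)
    return theta_hat, mu_hat
```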

Note that the log likelihood function (20) is the log likelihood function for a Gaussian process with only extrinsic noise. Thus it is misspecified, and the estimated parameters may also be misspecified. However, it has been shown by Ying (1991) and Zhang (2004) that Gaussian process model parameters in the covariance function may not admit consistent estimators even in the correctly specified case. Therefore, using Gaussian process models for prediction may be more meaningful than for parameter estimation. In fact, the parameter estimates do not significantly influence our theoretical results on the MSPE of KALE, KALEN and stochastic Kriging, in the sense of the following theorem.

###### Theorem 3.

Suppose $\hat\mu\geq \mu_1$ for some constant $\mu_1>0$, for all $n$. Let $\hat\Psi_1$ and $\hat\Psi_2$ be the correlation functions with estimated parameters $\hat\theta_1$ and $(\hat\theta_2,\hat\mu)$ as in (19) and (21), respectively. Potential dependency of $\hat\Psi_1$, $\hat\Psi_2$ and $\hat\mu$ on $n$ is suppressed for notational simplicity. Assume the following.

(1) There exists a constant $A_1>0$ such that for all $n$,

$$\max\left\{\left\|\frac{\mathcal{F}(\Psi)}{\mathcal{F}(\hat\Psi_1)}\right\|_{L_\infty},\ \left\|\frac{\mathcal{F}(\Psi)}{\mathcal{F}(\hat\Psi_2)}\right\|_{L_\infty}\right\}\leqslant A_1. \tag{22}$$

(2) Assumption 1 is true for $\hat\Psi_1$ and $\hat\Psi_2$ for all $n$. Furthermore, assume the second order derivatives of $\hat\Psi_1$ and $\hat\Psi_2$ have a uniform upper bound over all $n$.

(3) Assumption 2 is true for the designs $X_n$.

Then the following statements are true.

(i) Suppose there is noise at an untried point $x$. Then the MSPEs of KALEN and stochastic Kriging have the limit $\sigma^2\big(\Psi(x,x)-\Psi_S(x,x)\big)$ when the fill distance of $X$ goes to zero, where $\Psi_S$ is defined in (11).

(ii) Suppose there is no noise at an untried point $x$. An asymptotic upper bound on the MSPE of KALE and stochastic Kriging is

$$\frac{2\sigma^2}{(2\pi)^d}\int_{\mathbb{R}^d}\Big|1-|b(t)|^2\Big|\,\mathcal{F}(\Psi)(t)\,dt,$$

where $b$ is the characteristic function of $\epsilon_1$.

Theorem 3 states that if the pseudo-likelihood and the misspecified likelihood provide reasonable estimated parameters, then we have the following: (1) if an untried point has noise, the limit of the MSPE of KALEN and stochastic Kriging remains the same; and (2) if an untried point has no noise, the upper bounds on the MSPE of KALE and stochastic Kriging can be obtained. The limit and upper bounds are small if the noise is small. These upper bounds are the same as the bounds in Theorem 2. Therefore, the parameter estimation does not significantly influence our theoretical analysis.

The computational complexity of (21) is about the same as that of (19) if (5) can be calculated analytically. Unfortunately, (5) usually does not have a closed form, which substantially increases the computation time of solving (19).

## 5 Numeric Results

In this section, we report simulation studies investigating the numeric performance of KALE, KALEN and stochastic Kriging. In Example 1, we use Gaussian correlation functions to fit a 1-d function, where the predictor (6) has an analytic form. In Example 2, we use Matérn correlation functions to fit a 2-d function, where the integrals in (4) and (5) need to be calculated by Monte Carlo Cressie and Kornak (2003).

In both examples, we also include comparisons to a Markov chain Monte Carlo (MCMC) method along the lines described in Cervone and Pillai (2015). The MCMC methods provide an alternative way to carry out both prediction and parameter estimation for Gaussian process regression with input location error. Recall that $y(x)$ and $f(x)$ are the responses at an untried point with and without intrinsic error, respectively. The MSPE-optimal predictors of $f(x)$ and $y(x)$ are given by

$$\hat f_B(x) = \int r_B(x, X+\epsilon_B)\,\Psi(X+\epsilon_B, X+\epsilon_B)^{-1} Y\, \pi(\theta,\epsilon_B|Y)\,d\epsilon_B\,d\theta,$$
$$\hat y_B(x) = \int r_B(x+\epsilon, X+\epsilon_B)\,\Psi(X+\epsilon_B, X+\epsilon_B)^{-1} Y\, \pi(\theta,\epsilon_B|Y)\,d\epsilon_B\,d\theta, \tag{23}$$

respectively, where $\epsilon_B=(\epsilon_1,\dots,\epsilon_n)^T$ is the noise vector, $r_B(s, X+\epsilon_B)=\big(\Psi(s,x_1+\epsilon_1),\dots,\Psi(s,x_n+\epsilon_n)\big)$ for $s=x$ or $s=x+\epsilon$, $\theta$ collects the parameters, and $\pi(\theta,\epsilon_B|Y)$ is the conditional distribution of $(\theta,\epsilon_B)$ given $Y$. This conditional distribution does not have a closed form, and needs to be calculated by MCMC. By Bayes rule, $\pi(\theta,\epsilon_B|Y)\propto \pi(Y|\theta,\epsilon_B)\,\pi(\epsilon_B|\theta)\,\pi(\theta)$. The conditional distribution $\pi(Y|\theta,\epsilon_B)$ is normal, and $\pi(\epsilon_B|\theta)$ is the conditional distribution of the noise given the parameter $\theta$. In practice, it is often assumed that $\pi(\epsilon_B|\theta)$ is another normal distribution, and $\pi(\theta)$ is a uniform distribution over a region Cervone and Pillai (2015).

### 5.1 Example 1

Suppose the underlying function is the test function of Higdon (2002). The design points are selected to be 161 evenly spaced points on the input interval. The intrinsic noise is chosen to be mean zero normally distributed, with several variance levels. We use a Gaussian correlation function to make predictions, and use the pseudo likelihood approach presented in Section 4 to estimate the unknown parameters. For each variance of intrinsic noise, we approximate the squared prediction error of KALE by averaging over 8001 evenly spaced test points on the same interval. Then we run 100 simulations and take the average to estimate the MSPE of KALE; the MSPE of KALEN is estimated by a similar approach. With an abuse of terminology, we still call these averaged squared errors MSPE.

In order to make a comprehensive comparison, we also include the results from the Markov chain Monte Carlo method. After 1000 burn-in runs, we run 40 iterations for prediction, i.e., calculating $\hat f_B(x)$ and $\hat y_B(x)$ in (23). The priors are chosen to be uniform distributions over bounded regions; the final results are not sensitive to the choices of priors.

The RMSPE, i.e., the square root of the MSPE, of KALE and of KALEN, each compared with stochastic Kriging and MCMC, is reported in Table 1 and Table 2, respectively.

It can be seen from Tables 1 and 2 that the RMSPE of KALE/KALEN and stochastic Kriging decreases as the variance of the intrinsic noise decreases. This corroborates the results in Theorem 2 and Proposition 2. The difference in RMSPE between KALE/KALEN and stochastic Kriging also decreases when the variance of the intrinsic noise decreases. Comparing Table 2 with Table 1, it can be seen that the RMSPE of KALEN is larger than that of KALE. This is reasonable because KALEN predicts $y(x)=f(x+\epsilon)$, which includes an error term, while $f(x)$ does not. The computation of KALE/KALEN has the same complexity as stochastic Kriging in this example, because a Gaussian correlation function is used, and the integrals in (5) and (7) can be calculated analytically. In all cases, the RMSPE of the direct MCMC approach is larger than that of KALE/KALEN and stochastic Kriging.

In order to further understand the performance of KALE and stochastic Kriging, one realization among the 100 simulations for Table 1 is illustrated in Figure 1, where the variance of the intrinsic noise is chosen to be 0.05. In Figure 1, the circles are the collected data points. The true function and the prediction curves of KALE and stochastic Kriging are denoted by the solid, dashed and dotted lines, respectively. It can be seen from the figure that both KALE and stochastic Kriging approximate the true function well.

### 5.2 Example 2

In this example, we compare the calculation time of stochastic Kriging and KALE when the predictor (6) of KALE does not have an analytic form. Suppose the underlying function is the test function of Lim et al. (2002). We use the Matérn correlation functions Stein (1999)

$$\Phi(x;\nu,\phi)=\frac{1}{\Gamma(\nu)2^{\nu-1}}\big(2\sqrt{\nu}\,\phi\|x\|_2\big)^{\nu} K_{\nu}\big(2\sqrt{\nu}\,\phi\|x\|_2\big), \tag{24}$$

to make predictions, where $K_\nu$ is the modified Bessel function of the second kind, and $\nu$ and $\phi$ are model parameters. The Matérn correlation function can control the smoothness of the predictor through $\nu$ and thus is more robust than a Gaussian correlation function Wang et al. (2019). The covariance function is chosen to be $\sigma^2\Phi(s-t;\nu,\phi)$. We use a maximin Latin hypercube design with 20 points to estimate parameters, and choose the first 100 points in the Halton sequence Halton (1964) as test points. The smoothness parameter $\nu$ is fixed in advance, which can provide a robust estimator of the underlying function.
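The Matérn correlation (24) is available through the modified Bessel function `scipy.special.kv`. A sketch using the parameterization in (24) (the helper name is ours):

```python
import numpy as np
from scipy.special import gamma, kv

def matern_corr(h, nu, phi):
    """Matern correlation (24) at distance(s) h = ||x||_2."""
    h = np.atleast_1d(np.asarray(h, dtype=float))
    u = 2.0 * np.sqrt(nu) * phi * h
    out = np.ones_like(u)        # Phi(0) = 1, the limit of u^nu K_nu(u) / (Gamma(nu) 2^(nu-1))
    pos = u > 0
    out[pos] = u[pos] ** nu * kv(nu, u[pos]) / (gamma(nu) * 2.0 ** (nu - 1.0))
    return out
```

For $\nu=1/2$ this reduces to the exponential correlation $\exp(-\sqrt{2}\,\phi\|x\|_2)$, a convenient sanity check.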

If we use a Matérn correlation function, the integrals in (4) and (5) do not have analytic forms and are calculated by Monte Carlo. We randomly choose 30 points to approximate the integral in (4), and 900 points to approximate the integral in (5). Preliminary results show that, if we use Monte Carlo with different points at every evaluation of the integrals in (4) and (5), it is not feasible to use maximum pseudo likelihood estimation to estimate the unknown parameters, which consist of $\phi$ in (24), the variance $\sigma^2$, the variance of the noise $\sigma_\epsilon^2$, and the mean. The reason is that at each step of the optimization in maximum pseudo likelihood estimation, we need to recalculate the integrals, whose computational cost is high. Therefore, we generate the 900 points and 30 points randomly one time and reuse them for the evaluations of (5) and (4), respectively. Then we use maximum pseudo likelihood to estimate the unknown parameters.

For stochastic Kriging, we use maximum likelihood to estimate the unknown parameters, which are $\phi$ in (24), the variance $\sigma^2$, the nugget parameter $\mu$, and the mean. For MCMC, we use 1000 burn-in runs, and 40 runs for calculating the predictor. The priors are chosen to be uniform distributions over bounded regions. The RMSPE and the processing time of KALE and stochastic Kriging are shown in Table 3.

It can be seen that KALE has some improvement on prediction accuracy over stochastic Kriging. However, KALE takes too much computation time, even though the numbers of design points and test points are relatively small. The comparison would get worse as the number of points became larger. For MCMC, although it may improve the prediction, it is very sensitive to the initial choices of priors. The multimodality discussed in Cervone and Pillai (2015) could be a potential reason. The processing time for the MCMC approach is much larger than stochastic Kriging, but smaller than KALE. Since our main focus is on the comparison between KALE and stochastic Kriging, we do not further discuss the numeric results of MCMC. Therefore, if the integrals in (4) and (5) do not have analytic forms, stochastic Kriging is preferred, especially when the sample size is large and the variance of intrinsic noise is small.

## 6 Case Study: Application in Composite Parts Assembly Process

To illustrate the performance of KALEN and stochastic Kriging, we apply them to a real case study, the composite parts assembly process. As shown in Figure 2 (a) and Figure 2 (b), ten adjustable actuators are installed at the edge of a composite part Yue et al. (2018); Wen et al. (2018). These actuators can provide push or pull forces in order to adjust the shape of the composite part to the target dimensions. The dimensional shape adjustment of composite parts is one of the most important steps in the aircraft assembly process. It reduces the gap between the composite parts and decreases the assembly time with improved dimensional quality. Detailed descriptions about the shape adjustment of composite parts can be found in Wen et al. (2018). Modeling of composite parts is the key for shape adjustment. The objective is to build a model that has the capability to predict the dimensional deviations accurately under specific actuators’ forces. In this model, the input variables are ten actuators’ forces. The responses are the dimensional deviations of multiple critical points along the edge plane near the actuators, shown in Figure 2 (c). We consider responses at 91 critical points around the composite edge in the case study.

In the shape control of composite parts, intrinsic noise commonly exists in the actuators’ forces (Yue et al., 2018). When a force is implemented by an actuator, the real force may not be exactly the same as the target force. The force magnitudes are naturally uncertain due to the device tolerances of the hydraulic or electromechanical system of the actuators, and uncertainties in the directions and application points of the forces come from deviations in the contact geometry of the actuators and their installations. The modeling of composite parts involves two steps: (i) training the parameters using experimental data; (ii) predicting dimensional deviations for new actuators’ forces. In the training step, we need to account for input error in the experimental data. Additionally, when new actuator forces are implemented in practice, uncertainty in the actually delivered forces inevitably exists. This suggests that KALEN is suitable for this application scenario. Below, we evaluate the performance of KALEN and compare it with stochastic Kriging.

The model we use in this case study is for , where is the dimensional deviation vector of the composite part at the critical point and is a mean-zero Gaussian random field, with variables in . The correlation function of is assumed to be , where are parameters. The variance of is denoted by . The parameters , , , and are estimated by maximum (pseudo-)likelihood estimation as described in Section 4. The mean function we use in this model is chosen to represent the linear component in the dimensional shape control of the composite fuselage, following the approach of Yue et al. (2018). Specifically, according to the mechanics of composite materials and classical lamination theory, there is a linear relationship between dimensional deviations and actuators’ forces within the elastic zone. The term describes how the actuators’ forces impact the part deviations linearly, and represents the nonlinear components so that accurate predictions can be obtained.
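The Gaussian correlation function used in this model can be sketched in code. The parameter values, force range, and function names below are illustrative stand-ins, not the paper's estimates:

```python
import numpy as np

def gaussian_corr(X1, X2, theta):
    """Gaussian (squared-exponential) correlation with per-dimension
    scale parameters theta: r(u, v) = exp(-sum_k theta_k (u_k - v_k)^2)."""
    d2 = np.sum(theta * (X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2)

# Hypothetical example: 20 runs with 10 actuator forces each.
rng = np.random.default_rng(0)
n, d = 20, 10
X = rng.uniform(0.0, 600.0, (n, d))   # forces in lbf (illustrative range)
theta = np.full(d, 1e-5)              # hypothetical scale parameters
R = gaussian_corr(X, X, theta)        # n x n correlation matrix
```

The resulting matrix is symmetric with unit diagonal, as required of a correlation matrix; the linear mean term would then be added on top of the Gaussian-random-field residual modeled by this correlation.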

For the computer experiments, we generated 50 training samples and 30 testing samples based on a maximin Latin hypercube design. The designed experiments are conducted in the finite element simulation platform developed by Wen et al. (2018). It is worth mentioning that the computer simulation here is not deterministic: intrinsic noise is added to the actuators’ forces to mimic real actuators. The standard deviations (SD) of the actuators’ forces are chosen to be 0.005, 0.01, 0.02, 0.03, and 0.04 lbf (pound-force), values determined by the tolerances of different kinds of actuators according to engineering domain knowledge. The maximum actuators’ force is set to 600 lbf. Given the computer experiment data, we estimate the parameters of KALEN by solving the pseudo-likelihood equation (19), and the parameters of stochastic Kriging by solving the maximum likelihood equation (21). We then use the fitted models to predict dimensional deviations at the untried points in the testing dataset.
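The experimental setup above can be sketched with scipy's Latin hypercube sampler. This is a simplified stand-in: the paper uses a maximin LHD, while the plain scipy sampler below is unoptimized, and the noise level shown is just one of the SD settings from the text:

```python
import numpy as np
from scipy.stats import qmc

rng = np.random.default_rng(1)
d = 10                                                    # ten actuator forces
# Latin hypercube designs for training/testing, scaled to [0, 600] lbf.
X_train = 600.0 * qmc.LatinHypercube(d=d, seed=1).random(n=50)
X_test = 600.0 * qmc.LatinHypercube(d=d, seed=2).random(n=30)

# Intrinsic noise on the implemented forces (one SD level from the text):
# the simulator sees the noisy forces, mimicking imperfect actuators.
sd = 0.02                                                 # lbf
X_train_noisy = X_train + rng.normal(0.0, sd, size=X_train.shape)
```

The noisy design matrix would then be passed to the finite element simulator, while the nominal (target) forces are what the analyst records as inputs.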

The performance of KALEN and stochastic Kriging is compared in terms of mean absolute error (MAE), an index commonly used in the composite parts assembly domain to evaluate modeling performance. We also compare the RMSPE of KALEN and stochastic Kriging. The MAE and RMSPE are approximated by averaging the errors over the 91 critical points and multiple samples.
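The two metrics, averaged over test samples and critical points, amount to the following (array shapes are illustrative):

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error, averaged over all test samples and points."""
    return np.mean(np.abs(y_true - y_pred))

def rmspe(y_true, y_pred):
    """Root mean squared prediction error over all samples and points."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# y_true, y_pred would be arrays of shape (30, 91):
# 30 test samples x 91 critical points along the composite edge.
```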

The MAE and RMSPE of KALEN and stochastic Kriging are summarized in Table 4. As the SD of the actuators’ forces decreases from 0.04 lbf to 0.005 lbf, the MAE and RMSPE of both KALEN and stochastic Kriging also decrease. This result is consistent with the conclusions of Theorem 2 and Proposition 2. The MAE and RMSPE of KALEN are slightly smaller than those of stochastic Kriging. Generally speaking, their performances are comparable, especially when the SD of the actuators’ forces is small. The main reason is that, when the uncertainty in the input variables is small, stochastic Kriging approximates the best linear unbiased predictor KALEN very well. Since a Gaussian correlation function is used, the computational complexity of KALEN and stochastic Kriging is the same. In summary, if high-quality actuators are used and the intrinsic noise in the actuators is therefore small, both KALEN and stochastic Kriging achieve very good prediction performance. When the intrinsic noise in the actuators’ forces becomes larger, KALEN outperforms stochastic Kriging.

## 7 Conclusions and Discussion

We first summarize our contributions in this work. We have investigated three predictors, KALE, KALEN, and stochastic Kriging, as applied to Gaussian processes with input location error. When predicting the mean Gaussian process output at an untried point with intrinsic noise, we prove that the limits of the MSPE of KALEN and stochastic Kriging are the same as the fill distance of the design points goes to zero. If there is no noise at the untried point, we provide an upper bound on the MSPE of KALE and stochastic Kriging; this bound is close to zero if the noise is small, which implies that the MSPEs of KALE and stochastic Kriging are close. We also provide an asymptotic upper bound on the MSPE of KALE/KALEN and stochastic Kriging with estimated parameters. These results indicate that if the number of data points is large or the variance of the intrinsic noise is small, there is little difference between KALE/KALEN and stochastic Kriging in terms of prediction accuracy. The numeric results corroborate our theory. A case study illustrates the performance of KALEN and stochastic Kriging for modeling in the composite parts assembly process.

In this paper, the MSPE of KALE, KALEN, and stochastic Kriging is considered primarily in the asymptotic regime; the theory does not cover non-asymptotic cases. It can be expected that the difference between the MSPE of KALE/KALEN and that of stochastic Kriging will decrease as the fill distance decreases.

The calculation of the predictor (6) is not efficient if the integrals in (4) and (5) do not have an analytic form. If the sample size is large, then using pseudo maximum likelihood to estimate the unknown parameters is challenging, especially when the integrals in (4) and (5) do not have analytic forms. In this case, using stochastic Kriging as an alternative would be more desirable.
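As a concrete illustration of the stochastic Kriging alternative, the sketch below implements ordinary Kriging with a nugget term on the covariance diagonal to absorb the intrinsic noise. It is a simplification: it uses a constant mean and fixed hypothetical parameters rather than the maximum likelihood estimates from equation (21):

```python
import numpy as np

def sk_predict(X, y, x_new, theta, sigma2, tau2):
    """Stochastic-Kriging-style predictor: Kriging with a nugget tau2 added
    to the covariance diagonal to absorb intrinsic/input noise. A sketch
    with a constant mean; parameter estimation is omitted."""
    def corr(A, B):
        d2 = np.sum(theta * (A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
        return np.exp(-d2)

    R = sigma2 * corr(X, X) + tau2 * np.eye(len(X))  # noisy covariance
    r = sigma2 * corr(X, x_new[None, :])[:, 0]       # cross-covariances
    mu = np.mean(y)                                  # constant-mean simplification
    return mu + r @ np.linalg.solve(R, y - mu)
```

With a small nugget the predictor nearly interpolates the training data; a larger nugget smooths over the noise, which is the behavior stochastic Kriging exploits when input location error is present.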

## Appendix A A Lemma about MSPE of Stochastic Kriging

###### Lemma 2.

Assume Assumptions 1 and 2 are true. For any fixed constant , converges to zero as the fill distance of goes to zero, where and are as in Theorem 2.

###### Proof.

Note . Define . Under Assumption 1, we have , where is the Sobolev space. By the interpolation inequality, . By Corollary 10.25 in Wendland (2004) and the fact that , it can be shown that . Thus, the result follows if we can show that converges to zero. By the representer theorem, is the solution to the optimization problem

 ming1∈NΨ(Ω)1nn∑j=1(g1(xj)−Ψ(x,xj))2+μn∥g1∥2NΨ(Ω), (25)

where is the norm of the reproducing kernel Hilbert space . Under Assumption 2, by Lemma 3.4 of Utreras (1988), the result follows from

 ∥g∥2L2⩽ C3(1nn∑j=1(^g1(xj)−Ψ(x,xj))2+h4X∥g∥2H2(Ω)) ⩽ C3(1nn∑j=1(^g1(xj)−Ψ(x,xj))2+μn∥g1∥2NΨ(Ω)+h4X∥g∥2H2(Ω)) ⩽ C3(1nn∑j=1(Ψ(x,xj)−Ψ(x,xj))2+μn∥Ψ(x,⋅)∥2NΨ(Ω)+h4X∥g∥2H2(Ω))→0,

where the last inequality is true because is the solution to (25). ∎

## Appendix B Proof of Lemma 1

By the Fourier transform (Wendland, 2004), we have

 Ψ(xj,xk)=1(2π)d∫Rdei⟨xj−xk,t⟩F(Ψ)(t)dt, (26)

where is the inner product in . Therefore, direct calculation leads to

 ΨS(xj,xk) =∫Rd∫Rd1(2π)d∫Rdei⟨xj+ϵ1−(xk+ϵ2),t⟩F(Ψ)(t)h(ϵ1)h(ϵ2)dtdϵ1dϵ2 =1(2π)d∫Rd(∫Rd∫Rdei⟨xj+ϵ1−(xk+ϵ2),t⟩h(ϵ1)h(ϵ2)dϵ1dϵ2)F(Ψ)(t)dt =1(2π)d∫Rdei⟨xj−xk,t⟩(∫Rdei⟨ϵ1,t⟩∫Rdei⟨−ϵ2,t⟩h(ϵ1)h(ϵ2)dϵ1dϵ2)F(Ψ)(t)dt =1(2π)d∫Rdei⟨xj−xk,t⟩(∫Rdei⟨−ϵ1,t⟩h(ϵ1)dϵ1)(∫
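The derivation above expresses the smoothed kernel as a convolution of the original kernel with the noise densities. For a one-dimensional Gaussian kernel with Gaussian location error this convolution has a known closed form, which a quick Monte Carlo check confirms; all parameter values here are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
theta, tau2 = 0.5, 0.1          # hypothetical kernel scale and noise variance
xj, xk = 0.3, 1.1
delta = xj - xk

# Monte Carlo estimate of the smoothed kernel E[ Psi(xj + e1, xk + e2) ]
# with Psi(u, v) = exp(-theta * (u - v)^2) and e1, e2 ~ N(0, tau2).
e1 = rng.normal(0.0, np.sqrt(tau2), 200_000)
e2 = rng.normal(0.0, np.sqrt(tau2), 200_000)
mc = np.mean(np.exp(-theta * (delta + e1 - e2) ** 2))

# Closed form: e1 - e2 ~ N(0, 2*tau2), and integrating the Gaussian kernel
# against this density inflates the lengthscale and shrinks the amplitude.
s2 = 2.0 * tau2
exact = np.exp(-theta * delta**2 / (1 + 2 * theta * s2)) / np.sqrt(1 + 2 * theta * s2)
```

The two values agree to Monte Carlo accuracy, illustrating that, for Gaussian kernels and Gaussian noise, the smoothed kernel remains Gaussian with a larger effective lengthscale and variance reduced by the same factor.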