1 Introduction
In scientific and engineering studies, mathematical models are developed by scientists and engineers, based on their expert knowledge, to reproduce physical reality. With the rapid development of computational techniques in recent years, many mathematical models are implemented in computer code, often referred to as computer models or simulators.
Some parameters of a mathematical model are often unknown or unobservable in experiments. For example, the Kīlauea volcano recently had one of the largest eruptions in its recorded history. The location and volume of the magma chamber, as well as the magma supply and storage rate of this volcano, however, are not directly observable. Field data, such as satellite interferograms of the ground deformation and gas emission data, were used to estimate these parameters for the Kīlauea volcano [1, 2, 11]. Using field data to estimate the parameters in a mathematical model, and to identify the possible discrepancy between the mathematical model and reality, is widely known as the inverse problem or calibration [3, 4, 15, 27, 32].
Assume the field or experimental data at any observable input $x \in \mathcal{X}$ can be represented as a noisy assessment of the unknown reality,
(1.1) $y^F(x) = y^R(x) + \epsilon$,
where $y^R(\cdot)$ is the unknown deterministic function of the reality and $\epsilon$ is a Gaussian noise with an unknown variance $\sigma^2_0$. Denote by $f^M(x, \theta)$ the mathematical model at the observable input $x$ and the calibration parameters $\theta$. Since the mathematical model is often imperfect in describing the physical reality, one often assumes $y^R(x) = f^M(x, \theta) + \delta(x)$, where $\delta(\cdot)$ is a discrepancy function between the reality and the mathematical model. With this specification, we have the following calibration model for an imperfect mathematical model:
(1.2) $y^F(x) = f^M(x, \theta) + \delta(x) + \epsilon$.
When the mean and trend of the reality are properly explained by the mathematical model, the discrepancy function is often modeled as a zero-mean Gaussian stochastic process (GaSP) [15], resulting in a closed-form expression for the predictive distribution of the reality. Many follow-up studies found, however, that modeling the discrepancy function by a GaSP causes an identifiability problem for the calibration parameters, and consequently the calibrated mathematical model can be far away from the reality (see, for example, [3, 26, 31]).
The calibration model (1.2) is also often used in modeling spatially correlated data, where $f^M(x, \theta)$ is often assumed to be a parametric model for the linear fixed effect of some covariates, and $\delta(x)$ is a spatial random effect at the spatial coordinate $x$. However, it is also found in many studies that the spatial random effect is confounded with the fixed effect [14, 22].

Many recent studies measure the goodness of calibration by the $L_2$ loss between the calibrated mathematical model and the reality [26, 27, 31]. These studies seek an estimator of $\theta$ that converges to $\theta^*$, the minimizer of the $L_2$ distance between the reality and the mathematical model, i.e.,
(1.3) $\theta^* = \operatorname*{argmin}_{\theta \in \Theta} \int_{\mathcal{X}} \left( y^R(x) - f^M(x, \theta) \right)^2 dx.$
In [26], for instance, the reality is first estimated through a nonparametric regression model without the assistance of the mathematical model. The calibration parameters are then estimated by minimizing the $L_2$ loss between the calibrated mathematical model and the estimator of the reality.
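The two-step idea above can be sketched numerically. In this minimal sketch, the reality $y^R$, the deliberately misspecified linear model, the moving-average smoother, and all constants are hypothetical stand-ins, not the estimators actually used in [26]:

```python
import numpy as np

# Hypothetical reality and an imperfect mathematical model (illustration only).
def y_reality(x):
    return np.sin(2 * np.pi * x) + x

def f_model(x, theta):
    return theta * x  # a deliberately misspecified linear model

rng = np.random.default_rng(0)
n = 200
x = np.linspace(0, 1, n)
y_field = y_reality(x) + rng.normal(0, 0.05, n)

# Step 1: estimate the reality nonparametrically (here: a simple moving average).
window = 9
kernel = np.ones(window) / window
y_hat = np.convolve(np.pad(y_field, window // 2, mode="edge"), kernel, mode="valid")

# Step 2: estimate theta by minimizing the discretized L2 loss between the
# calibrated model and the estimated reality.
thetas = np.linspace(-2, 4, 601)
losses = [np.mean((y_hat - f_model(x, t)) ** 2) for t in thetas]
theta_hat = thetas[int(np.argmin(losses))]

# For this toy reality, the population minimizer of the L2 distance has a
# closed form: theta* = (int x*y_R dx) / (int x^2 dx) = 1 - 3/(2*pi).
theta_star = 1 - 3 / (2 * np.pi)
print(theta_hat, theta_star)
```

The grid minimizer recovers the $L_2$ minimizer up to smoothing bias, noise, and grid resolution.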
On the other hand, integrating the mathematical model and the discrepancy function in (1.2) for prediction is found to have a smaller predictive error than using a GaSP or a mathematical model alone [15]. This is because the mathematical model is developed based on expert knowledge, which contains information about the complex reality, whereas the GaSP is a flexible model for the discrepancy not captured by the calibrated mathematical model. The prediction obtained by combining the mathematical model and the discrepancy function, however, is sometimes less interpretable to scientists, as the discrepancy function can introduce nonlinear effects that are hard to explain. For the sake of interpretability, one also hopes the calibrated mathematical model itself can predict the reality well, and this partially explains why some previous studies define $\theta^*$ as the optimum of the calibration parameters.
To prevent the calibrated mathematical model from deviating too much from the reality, we propose to model the discrepancy function by a new stochastic process, called the scaled Gaussian stochastic process (SGaSP). The SGaSP was first proposed in [11], where it is shown to be a GaSP with a transformed kernel, with the same computational complexity as the GaSP. In this work, we provide a theoretical framework for the SGaSP in calibration and prediction. Moreover, we establish the connection between the maximum likelihood estimator and the kernel ridge regression estimator in the calibration setting. This connection allows us to study the asymptotic properties of the maximum likelihood estimator using well-developed tools for kernel ridge regression.
We highlight several advantages of using the SGaSP to model the discrepancy function in the calibration model (1.2). First of all, we show that the predictive mean from the SGaSP converges to the reality at the same rate as that from the GaSP, with a suitable choice of the regularization and scaling parameters. Furthermore, with the same regularization and scaling parameters, the calibration parameters in the SGaSP also converge to $\theta^*$, the minimizer of the $L_2$ distance between the reality and the calibrated mathematical model. The estimated calibration parameters in the GaSP calibration model, in contrast, do not converge to $\theta^*$, and the predictive error of the calibrated mathematical model is large as a consequence. Thirdly, since the SGaSP can be shown to be a GaSP with a transformed kernel, the predictive distribution also has an explicit form that is easy to compute. This allows us to combine the mathematical model and the discrepancy function in a coherent statistical model to both predict the reality and calibrate the mathematical model. Lastly, we found in our numerical experiments that the out-of-sample predictive error obtained by combining the calibrated mathematical model and the discrepancy function via the SGaSP is also slightly smaller than that using a GaSP, as the mathematical model is estimated to be closer to the reality in the SGaSP.
This paper is organized as follows. In Section 2, we review the background of the GaSP and the reproducing kernel Hilbert space. The connection between the maximum likelihood estimator and the kernel ridge regression estimator is also studied. In Section 3, we introduce the SGaSP along with the orthogonal series representation of the process and its covariance function. Two convergence issues are discussed in Section 4. We first show in Section 4.1 that the predictive mean of the reality in the SGaSP calibration converges to the reality at the optimal rate. Then we show in Section 4.2 that the estimated calibration parameters also converge to $\theta^*$ in the SGaSP calibration model. In Section 5, we introduce the discretized SGaSP along with the parameter estimation. Section 6 provides numerical studies comparing the GaSP, the SGaSP, and two other estimation approaches. We conclude this work in Section 7, and the proofs are given in the Appendices.
2 Background: Gaussian stochastic process
We first review the Gaussian stochastic process and the reproducing kernel Hilbert space in this section. Assume the mean and trend of the reality are properly modeled in the mathematical model. Consider modeling the unknown discrepancy function $\delta(\cdot)$ in the calibration model (1.2) via a real-valued zero-mean Gaussian stochastic process on a $p$-dimensional input domain $\mathcal{X}$,
(2.1) $\delta(\cdot) \sim \mathrm{GaSP}\left(0, \sigma^2 K(\cdot, \cdot)\right),$
where $\sigma^2$ is a variance parameter and $K(x_a, x_b)$ is the correlation for any $x_a, x_b \in \mathcal{X}$, parameterized by a kernel function. For simplicity, we assume $\mathcal{X} = [0, 1]^p$ in this work.
For any $x_1, \dots, x_n \in \mathcal{X}$, the outputs $\boldsymbol{\delta} = (\delta(x_1), \dots, \delta(x_n))^T$ follow a multivariate normal distribution,
(2.2) $\boldsymbol{\delta} \sim \mathcal{MN}\left(\mathbf{0}, \sigma^2 \mathbf{R}\right),$
where the $(i, j)$ entry of $\mathbf{R}$ is $K(x_i, x_j)$. Some frequently used kernel functions include the power exponential kernel and the Matérn kernel. We defer the issue of estimating the parameters in the kernel function to Section 5 and assume $K(\cdot, \cdot)$ is known for now.
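As a concrete sketch, the correlation matrix $\mathbf{R}$ in (2.2) can be built from a common Matérn parameterization. The roughness-5/2 form and the range parameter below are illustrative assumptions; the paper's exact Matérn form is given in Section 5.1:

```python
import numpy as np

def matern_5_2(xa, xb, gamma=0.3):
    """A common Matern-5/2 correlation with range parameter gamma
    (an illustrative parameterization, not necessarily the paper's (5.5))."""
    d = np.abs(xa[:, None] - xb[None, :]) / gamma
    return (1 + np.sqrt(5) * d + 5 * d**2 / 3) * np.exp(-np.sqrt(5) * d)

x = np.linspace(0, 1, 8)
R = matern_5_2(x, x)  # the n x n correlation matrix in (2.2)
print(R.shape)
```

The resulting matrix is symmetric, has unit diagonal, and is positive semi-definite, as a valid correlation matrix must be.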
The reproducing kernel Hilbert space (RKHS), denoted as $\mathcal{H}$, attached to the Gaussian stochastic process $\delta(\cdot)$, is the completion of the space of all functions of the form $x \mapsto \sum_{i} w_i K(x_i, x)$ with the inner product $\left\langle \sum_{i} w_i K(x_i, \cdot), \sum_{j} v_j K(x_j, \cdot) \right\rangle_{\mathcal{H}} = \sum_{i, j} w_i v_j K(x_i, x_j)$. For any function $f \in \mathcal{H}$, denote by $\|f\|_{\mathcal{H}}$ the RKHS norm or the native norm. Because the evaluation maps in an RKHS are bounded linear functionals, it follows from the Riesz representation theorem that for each $x \in \mathcal{X}$ and $f \in \mathcal{H}$, one has $f(x) = \langle f, K(x, \cdot) \rangle_{\mathcal{H}}$.
Denote by $L_2(\mathcal{X})$ the space of square-integrable functions on $\mathcal{X}$, and by $\langle \cdot, \cdot \rangle_{L_2(\mathcal{X})}$ the usual inner product in $L_2(\mathcal{X})$. By Mercer's theorem, there exists an orthonormal sequence of continuous eigenfunctions $\{\phi_k\}_{k=1}^{\infty}$ with a sequence of nonincreasing and nonnegative eigenvalues $\{\rho_k\}_{k=1}^{\infty}$ such that
(2.3) $K(x_a, x_b) = \sum_{k=1}^{\infty} \rho_k \phi_k(x_a) \phi_k(x_b)$
for any $x_a, x_b \in \mathcal{X}$.
The RKHS contains all functions $f = \sum_{k=1}^{\infty} f_k \phi_k$ with $f_k = \langle f, \phi_k \rangle_{L_2(\mathcal{X})}$ and $\|f\|^2_{\mathcal{H}} = \sum_{k=1}^{\infty} f_k^2 / \rho_k < \infty$. For any $f = \sum_k f_k \phi_k$ and $g = \sum_k g_k \phi_k$ in $\mathcal{H}$, the inner product can be represented as $\langle f, g \rangle_{\mathcal{H}} = \sum_{k=1}^{\infty} f_k g_k / \rho_k$. For more properties of the RKHS, we refer to Chapter 1 of [30] and Chapter 11 of [7].
2.1 The equivalence between the maximum likelihood estimator and the kernel ridge regression estimator in calibration
Assume one has a set of observations $\mathbf{y}^F = (y^F(x_1), \dots, y^F(x_n))^T$ and mathematical model outputs $\mathbf{f}^M_{\theta} = (f^M(x_1, \theta), \dots, f^M(x_n, \theta))^T$, where $\theta$ is a $q$-dimensional vector of the calibration parameters.
Denote the regularization parameter $\lambda = \sigma^2_0 / (n \sigma^2)$. For the calibration model (1.2) with $\delta(\cdot)$ modeled as a GaSP in (2.1), the marginal distribution of $\mathbf{y}^F$ follows a multivariate normal after marginalizing out $\delta(\cdot)$,
(2.4) $\mathbf{y}^F \mid \theta, \sigma^2, \lambda \sim \mathcal{MN}\left(\mathbf{f}^M_{\theta}, \; \sigma^2 \left(\mathbf{R} + n \lambda \mathbf{I}_n\right)\right).$
Let $L(\theta \mid \sigma^2, \lambda)$ be the likelihood for $\theta$ in (2.4) given the other parameters in the model. For any given $\sigma^2$ and $\lambda$, the maximum likelihood estimator (MLE) of $\theta$ is denoted as
(2.5) $\hat{\theta} = \operatorname*{argmax}_{\theta \in \Theta} L(\theta \mid \sigma^2, \lambda).$
Conditioning on the observations, $\theta$, and $\lambda$, the predictive mean of the discrepancy function at any $x \in \mathcal{X}$ has the following expression,
(2.6) $\hat{\delta}(x) = \mathbf{r}^T(x) \left(\mathbf{R} + n \lambda \mathbf{I}_n\right)^{-1} \left(\mathbf{y}^F - \mathbf{f}^M_{\theta}\right),$
with $\mathbf{r}(x) = (K(x, x_1), \dots, K(x, x_n))^T$ and $\mathbf{I}_n$ being the $n \times n$ identity matrix.
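The predictive mean (2.6) can be sketched numerically for a fixed $\theta$. The reality, the constant-mean toy model, the Matérn-5/2 kernel, and all constants below are hypothetical stand-ins:

```python
import numpy as np

def matern_5_2(xa, xb, gamma=0.3):
    # A common Matern-5/2 correlation (illustrative choice of kernel).
    d = np.abs(xa[:, None] - xb[None, :]) / gamma
    return (1 + np.sqrt(5) * d + 5 * d**2 / 3) * np.exp(-np.sqrt(5) * d)

rng = np.random.default_rng(1)
n = 30
x = np.linspace(0, 1, n)
theta = 0.5                          # a fixed calibration parameter
f_model = theta * np.ones(n)         # toy mathematical model: a constant mean
y_field = np.sin(2 * np.pi * x) + theta + rng.normal(0, 0.05, n)

lam = 1e-3                           # regularization parameter lambda
R = matern_5_2(x, x)
# (2.6): delta_hat(x) = r(x)^T (R + n*lam*I)^{-1} (y^F - f^M_theta)
weights = np.linalg.solve(R + n * lam * np.eye(n), y_field - f_model)

x_test = np.linspace(0, 1, 200)
pred = theta + matern_5_2(x_test, x) @ weights  # model + estimated discrepancy
rmse = np.sqrt(np.mean((pred - (np.sin(2 * np.pi * x_test) + theta)) ** 2))
print(rmse)
```

The combined prediction (model plus discrepancy) tracks the hypothetical reality closely even though the mathematical model alone is badly misspecified.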
It is well known that the predictive mean in (2.6) can be written as the estimator of kernel ridge regression (KRR). In the following lemma, we show that the MLE $\hat{\theta}$ is equivalent to the KRR estimator.
Lemma 2.1.
The maximum likelihood estimator $\hat{\theta}$ and the predictive mean $\hat{\delta}(\cdot)$ are the same as the estimator of the kernel ridge regression,
(2.7) $(\hat{\theta}, \hat{\delta}) = \operatorname*{argmin}_{\theta \in \Theta, \, \delta \in \mathcal{H}} \left\{ \frac{1}{n} \sum_{i=1}^{n} \left( y^F(x_i) - f^M(x_i, \theta) - \delta(x_i) \right)^2 + \lambda \|\delta\|^2_{\mathcal{H}} \right\}.$
Although modeling the discrepancy function by the GaSP typically improves the prediction accuracy for the reality, the penalty term of (2.7) only contains $\|\delta\|^2_{\mathcal{H}}$ to control the complexity of the discrepancy. As the RKHS norm is not equivalent to the $L_2$ norm, the calibrated computer model can deviate considerably from the best-performing mathematical model in terms of the $L_2$ loss [26]. In Section 3, we introduce the scaled Gaussian stochastic process, which predicts the reality as accurately as the GaSP with the aid of the mathematical model, but places more prior mass on a small $L_2$ distance between the reality and the mathematical model. As a consequence, the KRR estimator of the new model penalizes both $\|\delta\|^2_{\mathcal{H}}$ and $\|\delta\|^2_{L_2(\mathcal{X})}$ simultaneously.
3 The scaled Gaussian stochastic process
The $L_2$ loss between the reality and the mathematical model can be expressed as a random loss function of the discrepancy function, $Z = \int_{\mathcal{X}} \delta^2(x) \, dx$. We hierarchically model this crucial random variable $Z$ to have more probability mass near zero. The scaled Gaussian stochastic process calibration model is defined as follows:
(3.1) $y^F(x) = f^M(x, \theta) + \delta_z(x) + \epsilon, \quad \delta_z(\cdot) = \left\{ \delta(\cdot) \,\middle|\, \int_{\mathcal{X}} \delta^2(x) \, dx = Z \right\}, \quad \delta(\cdot) \sim \mathrm{GaSP}\left(0, \sigma^2 K(\cdot, \cdot)\right), \quad Z \sim p_Z(\cdot).$
We call $\delta_z(\cdot)$ the scaled Gaussian stochastic process (SGaSP). Given $Z = z$, the SGaSP becomes the GaSP constrained to the space $\{\delta : \int_{\mathcal{X}} \delta^2(x) \, dx = z\}$.
By definition, $Z \geq 0$. Conditioning on all parameters in the model, a simple choice of $p_Z(\cdot)$ is
(3.2) $p_Z(z) \propto g_Z(z) \, p_{\delta}(z),$
where $g_Z(\cdot)$ is a nonincreasing scaling function and $p_{\delta}(z)$ is the density of $\int_{\mathcal{X}} \delta^2(x) \, dx$ at $z$ induced by the GaSP in (2.1). The measure for $Z$ induced by the SGaSP is thus proportional to the measure induced by the GaSP, scaled by a nonincreasing function, so as to place more probability mass near zero than the unconstrained GaSP does.
When $g_Z(\cdot)$ is a constant function, the SGaSP becomes the GaSP without the constraint. Following [11], conditioning on all parameters, we assume
(3.3) $g_Z(z) \propto \exp\left(-\frac{\lambda_z z}{2}\right),$
where $\lambda_z \geq 0$ is a nonnegative scaling parameter.
The benefit of using $p_Z(\cdot)$ in (3.2) with $g_Z(\cdot)$ in (3.3) is that any marginal distribution of the SGaSP still follows a multivariate normal distribution (see Lemma 2.3 in [11]). In Section 3.1, we explore the orthogonal series representation of the SGaSP.
3.1 Orthogonal series representation
Based on the Karhunen-Loève theorem, a GaSP admits the following representation for any $x \in \mathcal{X}$,
(3.4) $\delta(x) = \sum_{k=1}^{\infty} \sqrt{\rho_k} \, \xi_k \, \phi_k(x),$
where $\xi_k \overset{i.i.d.}{\sim} \mathcal{N}(0, \sigma^2)$, and $\rho_k$ and $\phi_k$ are the $k$th eigenvalue and eigenfunction of the kernel $K(\cdot, \cdot)$, respectively.
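A discrete analogue of the expansions (2.3) and (3.4) can be checked numerically: eigendecompose the gram matrix of a kernel and sample a truncated orthogonal series. The Matérn-5/2 kernel, grid, and truncation level here are illustrative assumptions:

```python
import numpy as np

def matern_5_2(xa, xb, gamma=0.3):
    # A common Matern-5/2 correlation (illustrative choice of kernel).
    d = np.abs(xa[:, None] - xb[None, :]) / gamma
    return (1 + np.sqrt(5) * d + 5 * d**2 / 3) * np.exp(-np.sqrt(5) * d)

n = 100
x = np.linspace(0, 1, n)
R = matern_5_2(x, x)

# Discrete analogue of Mercer's theorem (2.3): eigendecompose the gram matrix.
# numpy returns ascending eigenvalues; flip to get the nonincreasing rho_k.
vals, vecs = np.linalg.eigh(R)
rho, phi = vals[::-1], vecs[:, ::-1]

# Truncated Karhunen-Loeve sample as in (3.4): sum_k sqrt(rho_k) * xi_k * phi_k.
rng = np.random.default_rng(0)
K_trunc = 15
xi = rng.normal(0.0, 1.0, K_trunc)  # sigma = 1 for illustration
delta_sample = phi[:, :K_trunc] @ (np.sqrt(rho[:K_trunc]) * xi)

# Fast eigenvalue decay: 15 of 100 terms already reconstruct R accurately.
R_trunc = (phi[:, :K_trunc] * rho[:K_trunc]) @ phi[:, :K_trunc].T
err = np.abs(R - R_trunc).max()
print(err)
```

The small truncation error reflects the rapid eigenvalue decay of smooth kernels, which is exactly what condition (4.4) later formalizes.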
The following lemma shows that the SGaSP can also be represented as an orthogonal random series with independent normal coefficients.
Lemma 3.1 (KarhunenLoève expansion for the SGaSP).
The covariance function of the SGaSP can also be decomposed as an infinite orthogonal series, which is an immediate consequence of Lemma 3.1 and the fact that the SGaSP is indeed a GaSP with a transformed kernel (see Lemma 2.3 in [11]).
Corollary 3.1.
Corollary 3.1 implies that the $k$th eigenvalue of the kernel function in the SGaSP is $\rho_k / (1 + \lambda_z \rho_k)$, a shrunken version of $\rho_k$, and the $k$th eigenfunction is the same as the one in the GaSP. The form (3.5) does not give an explicit expression for the kernel in the SGaSP. Instead of truncating the series, one can discretize the integral at finitely many constraint points. This discretization leads to an explicit expression for the covariance matrix in the likelihood, discussed in Section 5.
It is shown in [11] that the random $L_2$ loss $\int_{\mathcal{X}} \delta^2(x) \, dx$ induced by the GaSP can be written as an infinite weighted sum of independent chi-squared random variables. The following corollary provides a similar decomposition of $Z$ in the SGaSP, which follows from Lemma 2.1 in [11] and Corollary 3.1.
Corollary 3.2.
Assume the same conditions as in Lemma 3.1 hold. Then the distribution of $Z$ induced by the SGaSP follows an infinite weighted sum $\sum_{k=1}^{\infty} w_k \chi^2_k$, where the weights $w_k$ are determined by the transformed eigenvalues in Corollary 3.1 and $\chi^2_k$ are independent chi-squared random variables with one degree of freedom.
Denote by $\mathcal{H}$ and $\mathcal{H}_z$ the RKHS attached to the GaSP with kernel $K$ and to the SGaSP with the transformed kernel $K_z$, respectively. We conclude this subsection with the explicit connection between the inner product of the GaSP and that of the SGaSP.
3.2 Estimation for the SGaSP
With the specification of the density $p_Z(\cdot)$ and the scaling function $g_Z(\cdot)$ for the distribution of $Z$ in (3.2) and (3.3), respectively, after marginalizing out $\delta_z(\cdot)$ in (3.1), the marginal likelihood for $\theta$ in the SGaSP follows a multivariate normal distribution,
(3.6) $\mathbf{y}^F \mid \theta, \sigma^2, \lambda, \lambda_z \sim \mathcal{MN}\left(\mathbf{f}^M_{\theta}, \; \sigma^2 \left(\mathbf{R}_z + n \lambda \mathbf{I}_n\right)\right),$
where $\mathbf{R}_z$ is the correlation matrix of the SGaSP at the observed inputs. Denote by $L_z(\theta \mid \sigma^2, \lambda, \lambda_z)$ the likelihood for $\theta$ in (3.6). Similarly, we have the equivalence between the MLE and the KRR estimator for the SGaSP calibration, where both the RKHS norm and the $L_2$ norm of the discrepancy function are penalized in the loss function, as stated in Lemma 3.3 below.
Lemma 3.3.
The maximum likelihood estimator $\hat{\theta}_z$ and the predictive mean $\hat{\delta}_z(\cdot)$ are the same as the estimator of the penalized kernel ridge regression,
$(\hat{\theta}_z, \hat{\delta}_z) = \operatorname*{argmin}_{\theta \in \Theta, \, \delta \in \mathcal{H}} \ell_z(\theta, \delta),$
where
(3.7) $\ell_z(\theta, \delta) = \frac{1}{n} \sum_{i=1}^{n} \left( y^F(x_i) - f^M(x_i, \theta) - \delta(x_i) \right)^2 + \lambda \|\delta\|^2_{\mathcal{H}} + \lambda \lambda_z \|\delta\|^2_{L_2(\mathcal{X})},$
with $\lambda = \sigma^2_0 / (n \sigma^2)$.
4 Convergence properties
4.1 Nonparametric regression by the SGaSP
Let us first consider the following nonparametric regression model,
(4.1) $y^F(x_i) = y^R(x_i) + \epsilon_i, \quad i = 1, \dots, n,$
where $y^R(\cdot)$ is assumed to follow the zero-mean SGaSP prior with the default choices of $p_Z(\cdot)$ and $g_Z(\cdot)$ in (3.2) and (3.3), respectively. For simplicity, we assume that $x_1, \dots, x_n$ are independently sampled from a common distribution on $\mathcal{X}$.
Assume the underlying truth $y^R(\cdot)$ resides in the $p$-dimensional Sobolev space $\mathcal{H}^m(\mathcal{X})$ with $m > p/2$, where
(4.2) $\mathcal{H}^m(\mathcal{X}) = \left\{ f \in L_2(\mathcal{X}) : \sum_{k=1}^{\infty} k^{2m/p} \langle f, \varphi_k \rangle^2_{L_2(\mathcal{X})} < \infty \right\},$
with $\{\varphi_k\}_{k=1}^{\infty}$ being a sequence of orthonormal basis functions of $L_2(\mathcal{X})$. For any integer vector $\alpha = (\alpha_1, \dots, \alpha_p)$ and a function $f$, denote by $D^{\alpha} f$ the mixed partial derivative with $|\alpha| = \sum_{i=1}^{p} \alpha_i$. For any function $f$ in $\mathcal{H}^m(\mathcal{X})$, we have $D^{\alpha} f \in L_2(\mathcal{X})$ for any $|\alpha| \leq m$.
Recall that $\lambda = \sigma^2_0 / (n \sigma^2)$. The posterior mean estimator of the SGaSP is equivalent to the KRR estimator as follows,
(4.3) $\hat{y}^R = \operatorname*{argmin}_{f \in \mathcal{H}} \left\{ \frac{1}{n} \sum_{i=1}^{n} \left( y^F(x_i) - f(x_i) \right)^2 + \lambda \|f\|^2_{\mathcal{H}} + \lambda \lambda_z \|f\|^2_{L_2(\mathcal{X})} \right\}.$
Recall that $\{\rho_k\}$ and $\{\phi_k\}$ are the sequences of eigenvalues and eigenfunctions of the reproducing kernel associated with $\mathcal{H}$, respectively. For all $k \geq 1$, we assume the eigenvalues satisfy
(4.4) $c_1 k^{-2m/p} \leq \rho_k \leq c_2 k^{-2m/p}$
for some constants $c_1 > 0$ and $c_2 > 0$. For all $k \geq 1$ and $x \in \mathcal{X}$, we assume the eigenfunctions are bounded uniformly,
(4.5) $|\phi_k(x)| \leq c_{\phi},$
where $c_{\phi}$ is a constant depending on the kernel $K(\cdot, \cdot)$.
As a motivating example, the Matérn kernel satisfies the decay rate of the eigenvalues in (4.4) when expanded with respect to the Fourier basis [35]. The exact form of the Matérn kernel along with the parameterization and parameter estimation is discussed in Section 5.1.
We are now ready to state the convergence rate of the SGaSP for the nonparametric regression model in (4.1).
Theorem 4.1.
The conditions in Theorem 4.1 can be relaxed in various ways. From the proof of Theorem 4.1, it is easy to see that if $\lambda$ and $\lambda_z$ are chosen at the stated rates up to constant factors, the estimator still converges to the reality in the $L_2$ distance at the same rate. In practice, a small $\lambda$ often leads to a small predictive error [30]. The estimation of $\lambda$ and of the parameters in the kernel function is discussed in Section 5.1. Furthermore, although stationarity of the process is often assumed for computational purposes, it is not required in Theorem 4.1.
The estimator for the reality in the SGaSP calibration model is defined as
$\hat{y}^R(x) = f^M(x, \hat{\theta}_z) + \hat{\delta}_z(x)$
for any $x \in \mathcal{X}$, where $(\hat{\theta}_z, \hat{\delta}_z)$ is the estimator of the penalized KRR obtained by minimizing the loss in (3.7). The following Corollary 4.1 gives the convergence rate of the SGaSP calibration model in predicting the reality.
Corollary 4.1.
The conditions can be relaxed by choosing $\lambda$ and $\lambda_z$ appropriately to obtain the same convergence rate.
We illustrate the convergence using the following example, where the reality lies in a Sobolev space of the form (4.2) [34].
Example 4.1.
Let the reality be a fixed function $y^R(\cdot)$ on $[0, 1]$, and consider $y^F(x_i) = y^R(x_i) + \epsilon_i$, where $\epsilon_i$ is a Gaussian noise. Let the mathematical model be a mean parameter, i.e. $f^M(x, \theta) = \theta$. The goal is to predict $y^R(x)$ at $x \in [0, 1]$ and to estimate $\theta$.
Assume $\delta(\cdot)$ follows the Matérn kernel in (5.5) with a fixed range parameter. The eigenvalues of this kernel satisfy (4.4). We test 50 configurations of the number of observations $n$, and the design points are equally spaced in $[0, 1]$. In each configuration, $N$ simulation replicates are implemented. We perform predictions on $n^*$ inputs equally spaced in $[0, 1]$, using the following average root mean squared error (AvgRMSE) for validation,
(4.6) $\mathrm{AvgRMSE} = \frac{1}{N} \sum_{j=1}^{N} \sqrt{\frac{1}{n^*} \sum_{i=1}^{n^*} \left( \hat{y}^R_j(x^*_i) - y^R(x^*_i) \right)^2},$
where $\hat{y}^R_j(x^*_i)$ is an estimator of the reality at $x^*_i$ in the $j$th replicate, for $i = 1, \dots, n^*$, obtained by using both the calibrated mathematical model and the discrepancy function for prediction.
Figure 1 presents the predictive error of the GaSP calibration and the SGaSP calibration for Example 4.1. In the left panel, the red circles are the AvgRMSE using the predictive mean of the calibration model in (1.2) with $\delta(\cdot)$ modeled as a GaSP in (2.1), and the blue circles are the AvgRMSE using the predictive mean of the discretized SGaSP calibration in (5.1), discussed in Section 5. The basic idea of the discretized SGaSP is to replace the integral $\int_{\mathcal{X}} \delta^2(x) \, dx$ in the SGaSP model in (3.1) by a discrete average for computational purposes. Interestingly, the red and blue circles overlap in the left panel, and both match the black curve, which represents the theoretical bound from Corollary 4.1.
Since the mathematical model for Example 4.1 is only a mean parameter, the $L_2$ distance between the reality and the mathematical model is unimodal in $\theta$. We thus use the root mean squared error between the estimator of the calibration parameter and the $L_2$ minimizer $\theta^*$ as a measure of how well the calibrated mathematical model predicts the reality,
(4.7) $\mathrm{RMSE}_{\theta} = \sqrt{\frac{1}{N} \sum_{j=1}^{N} \left( \hat{\theta}_j - \theta^* \right)^2},$
where $\hat{\theta}_j$ is the estimator of $\theta$ in the $j$th experiment.
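The error measures (4.6) and (4.7) are straightforward to compute; in this sketch, the function names, array shapes, and toy numbers are all hypothetical:

```python
import numpy as np

def avg_rmse(pred_reality, true_reality):
    """AvgRMSE as in (4.6): average over N replicates of the root mean
    squared error across the n* held-out inputs.
    pred_reality: (N, n_star) array, one row of predictions per replicate.
    true_reality: (n_star,) array of the reality at the held-out inputs."""
    se = (pred_reality - true_reality[None, :]) ** 2
    return np.mean(np.sqrt(np.mean(se, axis=1)))

def rmse_theta(theta_hat, theta_star):
    """RMSE as in (4.7) between the N estimates and the L2 minimizer theta*."""
    return np.sqrt(np.mean((np.asarray(theta_hat) - theta_star) ** 2))

# Toy check with hypothetical numbers:
truth = np.array([0.0, 1.0, 2.0])
preds = np.array([[0.1, 1.0, 2.0],
                  [0.0, 0.9, 2.0]])
print(avg_rmse(preds, truth), rmse_theta([1.1, 0.9], 1.0))
```

Note that (4.6) averages the per-replicate RMSEs rather than pooling all squared errors, so the two operations do not commute in general.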
Although the GaSP and the SGaSP perform equally well in prediction for Example 4.1, the estimator of the calibration parameter in the discretized SGaSP calibration converges to the $L_2$ minimizer, whereas that in the GaSP calibration does not converge to $\theta^*$, as shown in the right panel of Figure 1. This problem is caused by the difference between the RKHS norm and the $L_2$ norm. As illustrated in Lemma 2.1 and Lemma 3.3, both the RKHS norm and the $L_2$ norm of the discrepancy function are penalized in the SGaSP calibration model, whereas the GaSP calibration model only penalizes the RKHS norm of the discrepancy function.
In Section 4.2, we further show that, under some regularity conditions, the calibrated parameters in the SGaSP calibration converge to the $L_2$ minimizer $\theta^*$ with the same choice of the regularization parameter $\lambda$ and scaling parameter $\lambda_z$.
4.2 Towards reconciling the $L_2$ norm and the RKHS norm in calibration
We first list some regularity conditions for the convergence of calibration parameters in the SGaSP calibration model.

(A1) $\theta^*$ is the unique solution of (1.3) and it is an interior point of $\Theta$.
(A2) The Hessian matrix of the $L_2$ loss in (1.3) is invertible in a neighborhood of $\theta^*$.
(A3) A further regularity condition holds for the mathematical model around $\theta^*$.
(A4) The function class is Donsker.
(A5)-(A6) Conditions ensuring the uniform convergence of the KRR estimator.
Assumptions A1 to A3 are regularity conditions on the $L_2$ loss and the mathematical model around $\theta^*$. Assumptions A4 to A6 guarantee that the KRR estimator converges uniformly over $\theta$ in terms of the $L_2$ loss.
Theorem 4.2.
Under Assumptions A1 to A6, for the estimated calibration parameters $\hat{\theta}_z$ in (3.7), one has
$\hat{\theta}_z \to \theta^*$ in probability, by choosing the regularization parameter $\lambda$ and the scaling parameter $\lambda_z$ at suitable rates.
Note that these choices of $\lambda$ and $\lambda_z$ also guarantee that the predictive mean estimator in the SGaSP calibration converges to the reality in terms of the $L_2$ loss, by Corollary 4.1 and Theorem 4.2. On the contrary, the calibrated parameters of the GaSP calibration typically do not converge to the $L_2$ minimizer. A key difference between the GaSP and the SGaSP calibration is stated in the following Corollary 4.2, which is an immediate consequence of the proof of Theorem 4.2.
Corollary 4.2.
To ensure the convergence of an estimator to the $L_2$ minimizer, one typical requirement is that the $L_2$ norm of the estimated discrepancy function is suitably controlled. The SGaSP satisfies this condition with the default choice of the scaling function. However, because of the difference between the RKHS norm and the $L_2$ norm, the estimated parameters in the GaSP calibration model can be far away from the $L_2$ minimizer. As a result, the calibrated mathematical model may not predict the reality accurately in the GaSP calibration model, as found in many previous studies [3, 11, 26, 31].
Theorem 4.2 states that the convergence rate of $\hat{\theta}_z$ to $\theta^*$ in the SGaSP calibration is slightly slower than the parametric rate under the stated choices of $\lambda$ and $\lambda_z$. The convergence rate of the calibration parameters can be improved by choosing a larger $\lambda_z$ as the number of observations increases, but the convergence rate to the reality obtained for (4.1) may then decrease. The slower convergence rate to $\theta^*$ in the SGaSP reflects the added uncertainty of the parameter estimation caused by the model discrepancy. A numerical example is given in Section 6 to illustrate the difference between the GaSP calibration, the SGaSP calibration, and other approaches that seek the $L_2$ minimizer $\theta^*$.
We graph the natural logarithm of $\mathrm{RMSE}_{\theta}$ for Example 4.1 under the GaSP and the discretized SGaSP calibration as the red and blue circles, respectively, in the right panel of Figure 1. The estimated parameters in the discretized SGaSP calibration converge to the $L_2$ minimizer as the sample size grows, whereas those in the GaSP calibration do not. The convergence of $\hat{\theta}_z$ to the $L_2$ minimizer may look faster in Figure 1 than the theoretical upper bound shown in Theorem 4.2. This is because $\lambda_z$ is chosen to be large, in which case the SGaSP behaves similarly for this example to some other approaches that estimate the calibration parameters by minimizing the $L_2$ loss [26, 31].
5 Discretized scaled Gaussian stochastic process
We address the computational issues of the SGaSP calibration in this section. Instead of truncating the kernel function in (3.5) after the first several terms, following the approach in [11], one can select a set of distinct points $\{x^d_1, \dots, x^d_N\}$ to discretize the input space and replace $\int_{\mathcal{X}} \delta^2(x) \, dx$ by $\frac{1}{N} \sum_{i=1}^{N} \delta^2(x^d_i)$ in the SGaSP model in (3.1).
Here we let the discretization points be the observed inputs, i.e. $x^d_i = x_i$ for $i = 1, \dots, n$ and $N = n$. The discretized SGaSP is then defined by replacing $Z = \int_{\mathcal{X}} \delta^2(x) \, dx$ in (3.1) with this discrete average.
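The effect of the discretization on the covariance can be sketched numerically. This sketch assumes the discretized transformed correlation matrix takes the form $\mathbf{R}_z = \mathbf{R} - \mathbf{R}\left(\mathbf{R} + (n/\lambda_z)\mathbf{I}_n\right)^{-1}\mathbf{R}$, which mirrors the eigenvalue shrinkage described after Corollary 3.1; the exact form used in the paper is given in [11], so treat the formula below as an assumption:

```python
import numpy as np

def matern_5_2(xa, xb, gamma=0.3):
    # A common Matern-5/2 correlation (illustrative choice of kernel).
    d = np.abs(xa[:, None] - xb[None, :]) / gamma
    return (1 + np.sqrt(5) * d + 5 * d**2 / 3) * np.exp(-np.sqrt(5) * d)

n = 40
x = np.linspace(0, 1, n)
R = matern_5_2(x, x)
lam_z = 50.0  # scaling parameter lambda_z

# Assumed discretized SGaSP correlation matrix: every eigenvalue r of R is
# shrunk to r / (1 + lam_z * r / n), mirroring rho_k -> rho_k / (1 + lam_z rho_k).
R_z = R - R @ np.linalg.solve(R + (n / lam_z) * np.eye(n), R)

# Check that the shrinkage indeed acts eigenvalue-wise.
r = np.linalg.eigvalsh(R)
r_z = np.linalg.eigvalsh(R_z)
shrunk = np.allclose(r_z, r / (1 + lam_z * r / n))
print(shrunk)
```

As $\lambda_z \to 0$ the shrinkage vanishes and the GaSP covariance is recovered; larger $\lambda_z$ suppresses the large eigenvalues most, pulling the prior mass of $Z$ toward zero.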