In scientific and engineering studies, mathematical models are developed by scientists and engineers based on their expert knowledge to reproduce the physical reality. With the rapid development of computational techniques in recent years, many mathematical models are implemented in computer code, often referred to as computer models or simulators.
Some parameters of the mathematical model are often unknown or unobservable in experiments. For example, the Kīlauea volcano recently had one of the largest eruptions in its recorded history. The location and volume of the magma chamber, as well as the magma supply and storage rate of this volcano, however, are unobservable. Some field data, such as satellite interferograms of ground deformation and gas emission measurements, were used to estimate these parameters for the Kīlauea volcano [1, 2, 11]. Using field data to estimate the parameters in the mathematical model, and to identify the possible discrepancy between the mathematical model and the reality, is widely known as the inverse problem or calibration [3, 4, 15, 27, 32].
Assume the field or experimental data at any observable input $\mathbf{x}$ in a domain $\mathcal{X}$ can be represented as a noisy assessment of the unknown reality
$$y^F(\mathbf{x}) = y^R(\mathbf{x}) + \epsilon, \tag{1.1}$$
where $y^R(\cdot)$ is the unknown deterministic function of the reality and $\epsilon \sim N(0, \sigma^2_0)$ is a Gaussian noise with an unknown variance. Denote by $f^M(\mathbf{x}, \boldsymbol{\theta})$ the mathematical model at the observable input $\mathbf{x}$ and the calibration parameters $\boldsymbol{\theta}$. Since the mathematical model is often imperfect in describing the physical reality, one often assumes $y^R(\mathbf{x}) = f^M(\mathbf{x}, \boldsymbol{\theta}) + \delta(\mathbf{x})$, where $\delta(\cdot)$ is a discrepancy function between the reality and the mathematical model. With this specification, we have the following calibration model for an imperfect mathematical model
$$y^F(\mathbf{x}) = f^M(\mathbf{x}, \boldsymbol{\theta}) + \delta(\mathbf{x}) + \epsilon. \tag{1.2}$$
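For concreteness, the data-generating mechanism of the calibration model (1.2) can be sketched as follows. The reality, the imperfect mathematical model, the noise level, and the sample size below are illustrative assumptions, not the settings used in this paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical reality y^R(x); the true function is unknown in practice.
def y_reality(x):
    return np.sin(2 * np.pi * x) + x

# Hypothetical imperfect mathematical model f^M(x, theta): a linear trend
# that cannot capture the sinusoidal part of the reality.
def f_model(x, theta):
    return theta * x

n = 30
x = np.linspace(0, 1, n)        # observable inputs
sigma0 = 0.05                   # noise standard deviation (assumed)
y_field = y_reality(x) + rng.normal(0.0, sigma0, n)   # field data, as in (1.1)

theta = 1.0
discrepancy = y_reality(x) - f_model(x, theta)        # delta(x) in model (1.2)
```

Here the discrepancy $\delta(\cdot)$ is exactly the sinusoidal part the linear model misses, which is the kind of systematic misfit the rest of the section is concerned with.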
When the mean and trend of the reality are properly explained by the mathematical model, the discrepancy function is often modeled as a zero-mean Gaussian stochastic process (GaSP), resulting in a closed-form expression for the predictive distribution of the reality. Many follow-up studies found, however, that modeling the discrepancy function by a GaSP causes an identifiability problem for the calibration parameters, and consequently the calibrated mathematical model can be far from the reality (see, for example, [3, 26, 31]).
The calibration model (1.2) is also often used in modeling spatially correlated data, where $f^M(\mathbf{x}, \boldsymbol{\theta})$ is often assumed to be a parametric model for the linear fixed effects of some covariates, and $\delta(\mathbf{x})$ is a spatial random effect at the spatial coordinate $\mathbf{x}$. However, it is also found in many studies that the spatial random effect is confounded with the fixed effects [14, 22].
Many recent studies measure the goodness of calibration by the $L_2$ loss between the calibrated mathematical model and the reality [26, 27, 31]. These studies seek an estimator of $\boldsymbol{\theta}$ that converges to $\boldsymbol{\theta}^*$, which minimizes the $L_2$ distance between the reality and the mathematical model, i.e.,
$$\boldsymbol{\theta}^* = \underset{\boldsymbol{\theta}}{\operatorname{argmin}} \left\{ \int_{\mathcal{X}} \big( y^R(\mathbf{x}) - f^M(\mathbf{x}, \boldsymbol{\theta}) \big)^2 \, d\mathbf{x} \right\}^{1/2}. \tag{1.3}$$
In , for instance, the reality is first estimated through a nonparametric regression model without the assistance of the mathematical model. The calibration parameters are then estimated by minimizing the $L_2$ loss between the calibrated mathematical model and the estimator of the reality.
On the other hand, integrating the mathematical model and the discrepancy function in (1.2) for prediction is found to have a smaller predictive error than simply using a GaSP or a mathematical model alone. This is because the mathematical model is developed based on expert knowledge, which contains information about the complex reality, and the GaSP is a flexible model for the discrepancy not captured by the calibrated mathematical model. The prediction obtained by combining the mathematical model and the discrepancy function, however, is sometimes less interpretable to scientists, as the discrepancy function can introduce nonlinear effects that are hard to explain. For the sake of interpretability, one also hopes that the calibrated mathematical model alone can predict the reality well, and this partially explains why some previous studies define $\boldsymbol{\theta}^*$ as the optimum of the calibration parameters.
To prevent the calibrated mathematical model from deviating too much from the reality, we propose to model the discrepancy function by a new stochastic process, called the scaled Gaussian stochastic process (S-GaSP). The S-GaSP was first proposed in , where it is shown to be a GaSP with a transformed kernel, with the same computational complexity as the GaSP. In this work, we provide a theoretical framework of the S-GaSP for calibration and prediction. Moreover, we establish the connection between the maximum likelihood estimator and the kernel ridge regression estimator in the calibration setting. This connection allows us to study the asymptotic properties of the maximum likelihood estimator using well-developed tools for the kernel ridge regression.
We highlight several advantages of using the S-GaSP to model the discrepancy function in the calibration model (1.2). First of all, we show that the predictive mean from the S-GaSP converges to the reality at the same rate as the one from the GaSP with a suitable choice of the regularization and scaling parameters. Furthermore, with the same regularization and scaling parameters, the calibration parameters in the S-GaSP also converge to $\boldsymbol{\theta}^*$, which minimizes the $L_2$ distance between the reality and the calibrated mathematical model. The estimated calibration parameters in the GaSP calibration model, in contrast, do not converge to $\boldsymbol{\theta}^*$, and the predictive error of the calibrated mathematical model is large as a consequence. Thirdly, since the S-GaSP can be shown to be a GaSP with a transformed kernel, the predictive distribution also has an explicit form that is easy to compute. This allows us to combine the mathematical model and the discrepancy function in a coherent statistical model to both predict the reality and calibrate the mathematical model. Lastly, in our numerical experiments the out-of-sample predictive error obtained by combining the calibrated mathematical model and the discrepancy function via the S-GaSP is also slightly smaller than that using a GaSP, as the mathematical model is estimated closer to the reality in the S-GaSP.
This paper is organized as follows. In Section 2, we review the background of the GaSP and the reproducing kernel Hilbert space. The connection between the maximum likelihood estimator and the kernel ridge regression estimator is also studied. In Section 3, we introduce the S-GaSP along with the orthogonal series representation of the process and the covariance function. Two convergence issues are discussed in Section 4. We first show in Section 4.1 that the predictive mean of the reality in the S-GaSP calibration converges to the reality at the optimal rate. We then show in Section 4.2 that the estimated calibration parameters also converge to $\boldsymbol{\theta}^*$ in the S-GaSP calibration model. In Section 5, we introduce the discretized S-GaSP along with the parameter estimation. Section 6 provides numerical studies comparing the GaSP, the S-GaSP, and two other estimation approaches. We conclude this work in Section 7, and the proofs are given in the Appendices.
2 Background: Gaussian stochastic process
We first review the Gaussian stochastic process and the reproducing kernel Hilbert space in this section. Assume the mean and trend of the reality are properly modeled in the mathematical model. Consider modeling the unknown discrepancy function in the calibration model (1.2) via a real-valued zero-mean Gaussian stochastic process on a $p$-dimensional input domain $\mathcal{X}$,
$$\delta(\cdot) \sim \mathrm{GaSP}\big(0, \sigma^2 K(\cdot, \cdot)\big), \tag{2.1}$$
where $\sigma^2$ is a variance parameter and $K(\mathbf{x}_a, \mathbf{x}_b)$ is the correlation for any $\mathbf{x}_a, \mathbf{x}_b \in \mathcal{X}$, parameterized by a kernel function. For simplicity, we assume $\mathcal{X} = [0, 1]^p$ in this work.
For any $\{\mathbf{x}_1, \dots, \mathbf{x}_n\} \subset \mathcal{X}$, the outputs $\boldsymbol{\delta} = \big( \delta(\mathbf{x}_1), \dots, \delta(\mathbf{x}_n) \big)^T$ follow a multivariate normal distribution
$$\boldsymbol{\delta} \sim \mathcal{MN}\big( \mathbf{0}, \sigma^2 \mathbf{R} \big),$$
where the $(i, j)$ entry of $\mathbf{R}$ is $K(\mathbf{x}_i, \mathbf{x}_j)$. Some frequently used kernel functions include the power exponential kernel and the Matérn kernel. We defer the issue of estimating the parameters in the kernel function to Section 5 and assume $K(\cdot, \cdot)$ is known for now.
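The construction above can be sketched numerically: build the correlation matrix from a kernel and draw one realization of the discrepancy at the design points. The Matérn roughness, the range parameter, and the jitter added for the Cholesky factorization are assumptions of this sketch.

```python
import numpy as np

def matern_52(d, gamma=0.3):
    """Matérn correlation with roughness 5/2; gamma is a hypothetical range parameter."""
    a = np.sqrt(5.0) * d / gamma
    return (1.0 + a + a**2 / 3.0) * np.exp(-a)

def power_exp(d, gamma=0.3, alpha=1.9):
    """Power exponential correlation with a hypothetical range and power."""
    return np.exp(-(d / gamma) ** alpha)

x = np.linspace(0, 1, 25)
D = np.abs(x[:, None] - x[None, :])   # pairwise distances on the 1-d input domain
R = matern_52(D)                       # n x n correlation matrix, entries K(x_i, x_j)

sigma2 = 1.0
rng = np.random.default_rng(1)
# One draw of (delta(x_1), ..., delta(x_n)) ~ MN(0, sigma^2 R); the small
# jitter keeps the Cholesky factorization numerically stable.
L = np.linalg.cholesky(sigma2 * R + 1e-10 * np.eye(len(x)))
delta = L @ rng.standard_normal(len(x))
```

The same recipe applies to any valid correlation kernel, e.g. `power_exp` above in place of `matern_52`.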
The reproducing kernel Hilbert space (RKHS), denoted as $\mathcal{H}$, attached to the Gaussian stochastic process is the completion of the space of all functions of the form
$$f(\cdot) = \sum_{i=1}^{n} w_i K(\mathbf{x}_i, \cdot), \qquad w_i \in \mathbb{R},$$
with the inner product
$$\Big\langle \sum_{i=1}^{n} w_i K(\mathbf{x}_i, \cdot), \; \sum_{j=1}^{n'} v_j K(\mathbf{x}'_j, \cdot) \Big\rangle_{\mathcal{H}} = \sum_{i=1}^{n} \sum_{j=1}^{n'} w_i v_j K(\mathbf{x}_i, \mathbf{x}'_j).$$
For any function $f \in \mathcal{H}$, denote $\|f\|_{\mathcal{H}} = \sqrt{\langle f, f \rangle_{\mathcal{H}}}$ the RKHS norm or the native norm. Because the evaluation maps in an RKHS are bounded linear functionals, it follows from the Riesz representation theorem that for each $\mathbf{x} \in \mathcal{X}$ and $f \in \mathcal{H}$, one has $f(\mathbf{x}) = \langle f, K(\mathbf{x}, \cdot) \rangle_{\mathcal{H}}$.
Denote $L_2(\mathcal{X})$ the space of square-integrable functions with $\|f\|_{L_2(\mathcal{X})} = \big( \int_{\mathcal{X}} f^2(\mathbf{x}) \, d\mathbf{x} \big)^{1/2} < \infty$. We denote the usual inner product in $L_2(\mathcal{X})$ by $\langle f, g \rangle_{L_2(\mathcal{X})} = \int_{\mathcal{X}} f(\mathbf{x}) g(\mathbf{x}) \, d\mathbf{x}$. By Mercer's theorem, there exists an orthonormal sequence of continuous eigenfunctions $\{\phi_k\}_{k=1}^{\infty}$ with a sequence of non-increasing and non-negative eigenvalues $\{\lambda_k\}_{k=1}^{\infty}$ such that
$$K(\mathbf{x}_a, \mathbf{x}_b) = \sum_{k=1}^{\infty} \lambda_k \phi_k(\mathbf{x}_a) \phi_k(\mathbf{x}_b)$$
for any $\mathbf{x}_a, \mathbf{x}_b \in \mathcal{X}$.
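The Mercer decomposition can be explored numerically via a Nyström-type approximation: eigenpairs of the kernel matrix on a fine uniform grid, scaled by the grid spacing, approximate the operator eigenvalues and eigenfunctions. The kernel choice and range below are assumptions of this sketch.

```python
import numpy as np

def matern_52(d, gamma=0.3):
    """Matérn 5/2 correlation; the range gamma is an assumption."""
    a = np.sqrt(5.0) * d / gamma
    return (1.0 + a + a**2 / 3.0) * np.exp(-a)

n = 400
x = np.linspace(0, 1, n)
K = matern_52(np.abs(x[:, None] - x[None, :]))

w = 1.0 / n                               # quadrature weight of the uniform grid
eigvals, eigvecs = np.linalg.eigh(K * w)  # symmetric eigendecomposition (ascending)
eigvals = eigvals[::-1]                   # reorder to a non-increasing sequence
phi = eigvecs[:, ::-1] / np.sqrt(w)       # approximately L2-orthonormal eigenfunctions

# Mercer: K(x_a, x_b) = sum_k lambda_k phi_k(x_a) phi_k(x_b); with the full
# (untruncated) spectrum the reconstruction is exact up to round-off.
K_rebuilt = (phi * eigvals) @ phi.T
```

Truncating the sum after the leading terms gives the usual low-rank approximation, since the eigenvalues decay quickly for smooth kernels.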
2.1 The equivalence between the maximum likelihood estimator and the kernel ridge regression estimator in calibration
Assume one has a set of observations $\mathbf{y}^F = \big( y^F(\mathbf{x}_1), \dots, y^F(\mathbf{x}_n) \big)^T$ and mathematical model outputs $\mathbf{f}^M_{\boldsymbol{\theta}} = \big( f^M(\mathbf{x}_1, \boldsymbol{\theta}), \dots, f^M(\mathbf{x}_n, \boldsymbol{\theta}) \big)^T$, where $\boldsymbol{\theta} \in \Theta$ is a $q$-dimensional vector of the calibration parameters.
Let $L(\boldsymbol{\theta})$ be the likelihood for $\boldsymbol{\theta}$ in (2.4) given the other parameters in the model. The maximum likelihood estimator (MLE) of $\boldsymbol{\theta}$ is denoted as $\hat{\boldsymbol{\theta}} = \operatorname{argmax}_{\boldsymbol{\theta}} L(\boldsymbol{\theta})$.
Conditioning on the observations, $\boldsymbol{\theta}$, and the variance parameters, the predictive mean of the discrepancy function at any $\mathbf{x} \in \mathcal{X}$ has the following expression
$$\hat{\delta}(\mathbf{x}) = \mathbf{r}^T(\mathbf{x}) \big( \mathbf{R} + \eta \mathbf{I}_n \big)^{-1} \big( \mathbf{y}^F - \mathbf{f}^M_{\boldsymbol{\theta}} \big), \tag{2.6}$$
with $\mathbf{r}(\mathbf{x}) = \big( K(\mathbf{x}, \mathbf{x}_1), \dots, K(\mathbf{x}, \mathbf{x}_n) \big)^T$, the nugget $\eta = \sigma^2_0 / \sigma^2$, and $\mathbf{I}_n$ being the $n$-dimensional identity matrix.
It is well known that the predictive mean in (2.6) can be written as the estimator of the kernel ridge regression (KRR). In the following lemma, we show that $\hat{\boldsymbol{\theta}}$ is equivalent to the KRR estimator.
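The predictive mean in (2.6) can be sketched in a one-dimensional example as follows; the kernel, the nugget value, the mathematical model (a mean parameter), and the reality are all assumptions, and the calibration parameter is held fixed.

```python
import numpy as np

def matern_52(d, gamma=0.3):
    """Matérn 5/2 correlation; the range gamma is an assumption."""
    a = np.sqrt(5.0) * d / gamma
    return (1.0 + a + a**2 / 3.0) * np.exp(-a)

rng = np.random.default_rng(2)
n = 40
x = np.linspace(0, 1, n)
theta = 0.5                                   # calibration parameter, held fixed
f_model = theta * np.ones(n)                  # hypothetical model: a mean parameter
y_field = np.sin(2 * np.pi * x) + theta + rng.normal(0, 0.05, n)

R = matern_52(np.abs(x[:, None] - x[None, :]))
eta = 0.05**2 / 1.0                           # nugget sigma_0^2 / sigma^2 (assumed known)

# Predictive mean of the discrepancy at new inputs, in the spirit of (2.6):
x_star = np.linspace(0, 1, 200)
r = matern_52(np.abs(x_star[:, None] - x[None, :]))   # 200 x n cross-correlations
weights = np.linalg.solve(R + eta * np.eye(n), y_field - f_model)
delta_hat = r @ weights                       # estimated discrepancy on test inputs
```

With the mean parameter fixed, the estimated discrepancy recovers the sinusoidal misfit between the field data and the mathematical model.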
Although modeling the discrepancy function by the GaSP typically improves the prediction accuracy of the reality, the penalty term of (2.7) only contains the RKHS norm $\|\delta\|^2_{\mathcal{H}}$ to control the complexity of the discrepancy. As the RKHS norm is not equivalent to the $L_2$ norm, the calibrated computer model can deviate substantially from the best-performing mathematical model in terms of the $L_2$ loss. In Section 3, we introduce the scaled Gaussian stochastic process, which predicts the reality as accurately as the GaSP with the aid of the mathematical model, but places more prior mass on small $L_2$ distances between the reality and the mathematical model. As a consequence, the KRR estimator of the new model penalizes both $\|\delta\|^2_{\mathcal{H}}$ and $\|\delta\|^2_{L_2(\mathcal{X})}$ simultaneously.
3 The scaled Gaussian stochastic process
The $L_2$ loss between the reality and the mathematical model can be expressed as a random loss functional of the discrepancy function, $\int_{\mathcal{X}} \delta^2(\mathbf{x}) \, d\mathbf{x}$. We hierarchically model the crucial random variable $Z = \int_{\mathcal{X}} \delta^2(\mathbf{x}) \, d\mathbf{x}$ to have more probability mass near zero. The scaled Gaussian stochastic process calibration model is defined as follows
$$y^F(\mathbf{x}) = f^M(\mathbf{x}, \boldsymbol{\theta}) + \delta_z(\mathbf{x}) + \epsilon, \qquad \delta_z(\cdot) = \Big\{ \delta(\cdot) \,\Big|\, \int_{\mathcal{X}} \delta^2(\mathbf{x}) \, d\mathbf{x} = Z \Big\}, \qquad Z \sim p_Z(\cdot). \tag{3.1}$$
We call $\delta_z(\cdot)$ the scaled Gaussian stochastic process (S-GaSP). Given $Z = z$, the S-GaSP becomes the GaSP constrained to the space $\{ \delta : \int_{\mathcal{X}} \delta^2(\mathbf{x}) \, d\mathbf{x} = z \}$.
By definition, $Z \geq 0$. Conditioning on all parameters in the model, a simple choice of $p_Z(\cdot)$ is
$$p_Z(z) \propto g(z) \, p_{\delta}(z), \tag{3.2}$$
where $g(\cdot)$ is a non-increasing scaling function and $p_{\delta}(z)$ is the density of $\int_{\mathcal{X}} \delta^2(\mathbf{x}) \, d\mathbf{x}$ at $z$ induced by the GaSP in (2.1). The measure for $Z$ induced by the S-GaSP is thus proportional to the measure induced by the GaSP scaled by a non-increasing function, so that it places more probability mass near zero than the unconstrained GaSP does.
When $g(\cdot)$ is a constant function, the S-GaSP becomes the GaSP without the constraint. Following , conditioning on all parameters, we assume
$$g(z) = \exp\left( -\frac{\lambda_z z}{2} \right), \tag{3.3}$$
where $\lambda_z \geq 0$ is a non-negative scaling parameter.
The benefit of using $p_Z(\cdot)$ in (3.2) and $g(\cdot)$ in (3.3) is that any marginal distribution of $\delta_z(\cdot)$ still follows a multivariate normal distribution (see Lemma 2.3 in ). In Section 3.1, we explore the orthogonal series representation of the S-GaSP.
3.1 Orthogonal series representation
Based on the Karhunen-Loève theorem, a GaSP admits the following representation for any $\mathbf{x} \in \mathcal{X}$
$$\delta(\mathbf{x}) = \sigma \sum_{k=1}^{\infty} \sqrt{\lambda_k} \, Z_k \, \phi_k(\mathbf{x}), \tag{3.4}$$
where $Z_k \overset{i.i.d.}{\sim} N(0, 1)$, and $\lambda_k$ and $\phi_k(\cdot)$ are the $k$th eigenvalue and eigenfunction of the kernel $K(\cdot, \cdot)$, respectively.
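The Karhunen-Loève representation above can be sketched for a kernel whose eigenpairs are known in closed form. We use Brownian motion, $K(s, t) = \min(s, t)$ on $[0, 1]$, with $\lambda_k = ((k - 1/2)\pi)^{-2}$ and $\phi_k(x) = \sqrt{2} \sin((k - 1/2)\pi x)$, purely as an assumed stand-in so the eigenpairs need not be computed numerically.

```python
import numpy as np

rng = np.random.default_rng(3)
sigma = 1.0
n_terms = 500                 # truncation level of the series
x = np.linspace(0, 1, 101)

k = np.arange(1, n_terms + 1)
lam = 1.0 / ((k - 0.5) * np.pi) ** 2                         # eigenvalues
phi = np.sqrt(2.0) * np.sin(np.outer(x, (k - 0.5) * np.pi))  # 101 x n_terms

Z = rng.standard_normal(n_terms)          # i.i.d. N(0, 1) coefficients
delta = sigma * phi @ (np.sqrt(lam) * Z)  # truncated Karhunen-Loeve series

# Sanity check: the truncated variance sum_k lambda_k phi_k(x)^2 should
# approach the kernel variance K(x, x) = x as the truncation level grows.
var_trunc = (phi**2) @ lam
```

The draw `delta` is one sample path of the process; increasing `n_terms` tightens the match between the truncated variance and $K(x, x)$.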
The following lemma shows that the S-GaSP can also be represented as an orthogonal random series with independent normal coefficients.
Lemma 3.1 (Karhunen-Loève expansion for the S-GaSP).
The covariance function of the S-GaSP can also be decomposed as an infinite orthogonal series, which is an immediate consequence of the fact that the S-GaSP is indeed a GaSP with a transformed kernel (see Lemma 2.3 in ) and Lemma 3.1.
Corollary 3.1 implies that the $k$th eigenvalue of the kernel function in the S-GaSP is $\lambda_k / (1 + \lambda_z \lambda_k)$ and the $k$th eigenfunction is the same as the one in the GaSP. The form (3.5) does not give an explicit expression for the kernel in the S-GaSP. Instead of truncating the series, one can discretize the integral at finitely many constraint points. This discretization leads to an explicit expression of the covariance matrix in the likelihood, discussed in Section 5.
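A small numeric sketch of this kind of eigenvalue transformation: here we take the S-GaSP eigenvalues to be $\lambda_k / (1 + \lambda_z \lambda_k)$ with unchanged eigenfunctions (an assumption of this sketch, matching the transformed-kernel view), with a hypothetical decay sequence and scaling parameter.

```python
import numpy as np

k = np.arange(1, 21)
lam = k**-2.0                 # hypothetical GaSP eigenvalue decay
lam_z = 50.0                  # hypothetical scaling parameter
lam_sgasp = lam / (1.0 + lam_z * lam)

# The large leading eigenvalues are shrunk the most, while the small tail
# eigenvalues are nearly unchanged; the shrinkage factor is 1 + lambda_z * lambda_k.
shrinkage = lam / lam_sgasp
```

This is the spectral picture of why the transformed kernel penalizes the $L_2$ norm: the low-frequency components that dominate the $L_2$ norm of the discrepancy are damped hardest.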
It is shown in  that the random $L_2$ loss induced by the GaSP can be written as an infinite weighted sum of independent chi-squared random variables. The following corollary provides a similar decomposition of $\int_{\mathcal{X}} \delta_z^2(\mathbf{x}) \, d\mathbf{x}$ in the S-GaSP, which follows from Lemma 2.1 in  and Corollary 3.1.
Denote $\mathcal{H}$ and $\mathcal{H}_z$ the RKHS attached to the GaSP with kernel $K(\cdot, \cdot)$ and to the S-GaSP with its transformed kernel, respectively. We conclude this subsection with the explicit connection between the inner product of the GaSP and that of the S-GaSP.
3.2 Estimation for the S-GaSP
With the specification of the density $p_Z(\cdot)$ and the scaling function $g(\cdot)$ for the distribution of $Z$ in (3.2) and (3.3), respectively, after marginalizing out $Z$ in (3.1), the marginal distribution of $\mathbf{y}^F$ in the S-GaSP follows a multivariate normal distribution. Denote $L_z(\boldsymbol{\theta})$ the likelihood for $\boldsymbol{\theta}$ in (3.6). Similarly, we have the equivalence between the MLE and the KRR estimator for the S-GaSP calibration, where both the RKHS norm and the $L_2$ norm of the discrepancy function are penalized in the loss function, as stated in Lemma 3.3 below.
The maximum likelihood estimator and the predictive mean are the same as the estimators of the penalized kernel ridge regression.
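A numerical sketch of the penalized kernel ridge regression idea: the $L_2$ norm of the discrepancy is approximated by its empirical average at the design points (an assumption of this sketch; the discretized S-GaSP of Section 5 makes such an approximation precise). This yields a closed-form discrepancy estimate for each candidate mean parameter, followed by a one-dimensional grid search. The kernel, data, and tuning values are all assumptions.

```python
import numpy as np

def matern_52(d, gamma=0.3):
    a = np.sqrt(5.0) * d / gamma
    return (1.0 + a + a**2 / 3.0) * np.exp(-a)

rng = np.random.default_rng(4)
n = 50
x = np.linspace(0, 1, n)
y_field = np.sin(2 * np.pi * x) + 1.0 + rng.normal(0, 0.05, n)  # reality: sin(2*pi*x) + 1

K = matern_52(np.abs(x[:, None] - x[None, :]))
lam, lam_z = 1e-3, 50.0      # hypothetical regularization and scaling parameters

def objective(theta):
    """Penalized KRR loss: fit + lam * (RKHS penalty + lam_z * empirical L2 penalty)."""
    resid = y_field - theta  # mathematical model is a mean parameter
    # Closed-form minimizer over the representer weights w for fixed theta:
    # w = ((1 + lam*lam_z) K + n*lam I)^{-1} resid, from the normal equations.
    w = np.linalg.solve((1.0 + lam * lam_z) * K + n * lam * np.eye(n), resid)
    delta = K @ w
    fit = np.mean((resid - delta) ** 2)
    pen = lam * (w @ K @ w) + lam * lam_z * np.mean(delta ** 2)
    return fit + pen

thetas = np.linspace(0.0, 2.0, 201)
theta_hat = thetas[np.argmin([objective(t) for t in thetas])]
```

Because the empirical $L_2$ penalty charges the discrepancy for absorbing the constant offset, the selected mean parameter lands near the $L_2$ minimizer (here, near 1, the mean level of the simulated reality).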
4 Convergence properties
4.1 Nonparametric regression by the S-GaSP
Let us first consider the following nonparametric regression model,
$$y^F(\mathbf{x}_i) = y^R(\mathbf{x}_i) + \epsilon_i, \qquad \epsilon_i \overset{i.i.d.}{\sim} N(0, \sigma^2_0), \tag{4.1}$$
for $i = 1, \dots, n$. Assume the underlying truth $y^R(\cdot)$ resides in the $p$-dimensional Sobolev space $\mathcal{H}^m(\mathcal{X})$ with $m > p/2$, where
$$\mathcal{H}^m(\mathcal{X}) = \Big\{ f = \sum_{k=1}^{\infty} f_k \phi_k : \sum_{k=1}^{\infty} f_k^2 \, k^{2m/p} < \infty \Big\},$$
with $\{\phi_k\}_{k=1}^{\infty}$ being a sequence of the orthonormal basis of $L_2(\mathcal{X})$. For any integer vector $\mathbf{k} = (k_1, \dots, k_p)$ and a function $f$, denote $D^{\mathbf{k}} f$ the mixed partial derivative with $|\mathbf{k}| = \sum_{j=1}^{p} k_j$. For any function $f$ in $\mathcal{H}^m(\mathcal{X})$, we have $\|D^{\mathbf{k}} f\|_{L_2(\mathcal{X})} < \infty$ for any $|\mathbf{k}| \leq m$.
Recall that $\lambda = \sigma^2_0 / (n \sigma^2)$. The posterior mean estimator of the S-GaSP is equivalent to the KRR estimator as follows
$$\hat{y}^R(\cdot) = \underset{f \in \mathcal{H}}{\operatorname{argmin}} \left\{ \frac{1}{n} \sum_{i=1}^{n} \big( y^F(\mathbf{x}_i) - f(\mathbf{x}_i) \big)^2 + \lambda \Big( \|f\|^2_{\mathcal{H}} + \lambda_z \|f\|^2_{L_2(\mathcal{X})} \Big) \right\}.$$
Recall $\{\lambda_k\}_{k=1}^{\infty}$ and $\{\phi_k\}_{k=1}^{\infty}$ are the sequences of the eigenvalues and eigenfunctions of the reproducing kernel associated with $\mathcal{H}$, respectively. For all $k \geq 1$, we assume the eigenvalues satisfy
$$c_1 k^{-2m/p} \leq \lambda_k \leq c_2 k^{-2m/p} \tag{4.4}$$
for some constants $c_2 \geq c_1 > 0$. For all $k \geq 1$ and $\mathbf{x} \in \mathcal{X}$, we assume the eigenfunctions are bounded uniformly,
$$|\phi_k(\mathbf{x})| \leq C_{\phi},$$
where $C_{\phi}$ is a constant depending on the kernel $K(\cdot, \cdot)$.
As a motivating example, the Matérn kernel satisfies the decay rate of the eigenvalues in (4.4) when expanded with respect to the Fourier basis. The exact form of the Matérn kernel, along with its parameterization and parameter estimation, is discussed in Section 5.1.
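The polynomial decay in (4.4) can be seen empirically: eigenvalues of the kernel matrix on a uniform grid, scaled by $1/n$, approximate the operator eigenvalues, and polynomial decay $k^{-2m/p}$ appears as a roughly constant negative slope on a log-log plot. The Matérn roughness 5/2 and the range value are assumptions of this sketch; for that roughness in one dimension the asymptotic slope approaches $-6$, reached only gradually for moderate $k$.

```python
import numpy as np

def matern_52(d, gamma=0.3):
    a = np.sqrt(5.0) * d / gamma
    return (1.0 + a + a**2 / 3.0) * np.exp(-a)

n = 300
x = np.linspace(0, 1, n)
# Eigenvalues of K/n approximate the operator eigenvalues; sort non-increasing.
eig = np.linalg.eigvalsh(matern_52(np.abs(x[:, None] - x[None, :])) / n)[::-1]

# Log-log slope over a mid-range of k; polynomial decay gives a near-constant
# slope, here between the pre-asymptotic and asymptotic (-6) regimes.
k = np.arange(1, n + 1)
slope = np.polyfit(np.log(k[4:40]), np.log(eig[4:40]), 1)[0]
```

The fitted slope sits well below zero and steepens toward the asymptotic rate as larger $k$ ranges are used, consistent with (4.4).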
We are now ready to state the convergence rate of the S-GaSP for the nonparametric regression model in (4.1).
The conditions in Theorem 4.1 can be relaxed in various ways. From the proof of Theorem 4.1, it is easy to see that the estimator still converges to the reality in $L_2$ distance at the same rate under suitable choices of the regularization and scaling parameters. In practice, a small regularization parameter often leads to a small predictive error. The estimation of the regularization parameter and the parameters in the kernel function is discussed in Section 5.1. Furthermore, although stationarity of the process is often assumed for computational purposes, it is not required in Theorem 4.1.
The estimator of the reality in the S-GaSP calibration model is defined as
$$\hat{y}^R(\mathbf{x}) = f^M(\mathbf{x}, \hat{\boldsymbol{\theta}}) + \hat{\delta}(\mathbf{x})$$
for any $\mathbf{x} \in \mathcal{X}$, where $(\hat{\boldsymbol{\theta}}, \hat{\delta})$ is the estimator of the penalized KRR obtained by minimizing the loss in (3.7). The following Corollary 4.1 gives the convergence rate of the S-GaSP calibration model in predicting the reality.
The conditions can be relaxed by other choices of the regularization and scaling parameters to obtain the same convergence rate.
We illustrate the convergence using the following example, where the reality lies in the Sobolev space of Theorem 4.1 with $p = 1$.
Let the reality be , and consider $y^F(x_i) = y^R(x_i) + \epsilon_i$, where $\epsilon_i$ is a Gaussian noise. Let the mathematical model be a mean parameter, i.e. $f^M(x, \theta) = \theta$. The goal is to predict $y^R(x)$ at any $x \in [0, 1]$ and to estimate $\theta$.
Assume the kernel follows the Matérn kernel in (5.5) with the range parameter . The eigenvalues of this kernel satisfy (4.4). We test 50 configurations with an increasing number of observations $n$, where the design points are equally spaced in $[0, 1]$. In each configuration, $N$ simulation replicates are implemented. We perform predictions on $n^*$ inputs equally spaced in $[0, 1]$, using the average root mean squared error (AvgRMSE) for validation:
$$\mathrm{AvgRMSE} = \frac{1}{N} \sum_{j=1}^{N} \sqrt{ \frac{1}{n^*} \sum_{i=1}^{n^*} \big( \hat{y}^R_{f^M + \delta}(x^*_i) - y^R(x^*_i) \big)^2 },$$
where $\hat{y}^R_{f^M + \delta}(x^*_i)$ is an estimator of the reality at the $i$th test input $x^*_i$. The subscript $f^M + \delta$ means both the calibrated mathematical model and the discrepancy function are used for prediction.
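The validation criterion above amounts to averaging per-replicate RMSEs, which the following small helper makes concrete; the toy predictions and truth are illustrative.

```python
import numpy as np

def avg_rmse(pred, truth):
    """AvgRMSE: average over N replicates of the RMSE on the held-out inputs.

    pred:  (N, n_star) array of predictions across replicates.
    truth: (n_star,) array with the reality at the held-out inputs.
    """
    pred = np.asarray(pred, dtype=float)
    rmse_per_rep = np.sqrt(np.mean((pred - truth) ** 2, axis=1))
    return np.mean(rmse_per_rep)

truth = np.zeros(4)
preds = np.array([[1.0, 1.0, 1.0, 1.0],
                  [3.0, 3.0, 3.0, 3.0]])
# First replicate has RMSE 1, second has RMSE 3, so AvgRMSE is 2.
```

Note that averaging the per-replicate roots differs from taking the root of the pooled mean squared error; the criterion above uses the former.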
Figure 1 presents the predictive error of the GaSP calibration and the S-GaSP calibration for Example 4.1. In the left panel, the red circles are the AvgRMSE using the predictive mean of the calibration model in (1.2) with the discrepancy modeled as a GaSP in (2.1), and the blue circles are the AvgRMSE using the predictive mean of the discretized S-GaSP calibration in (5.1) discussed in Section 5. The basic idea of the discretized S-GaSP is to replace the integral in the S-GaSP model in (3.1) by a sum over finitely many discretization points for computational purposes. Interestingly, the red circles and blue circles overlap in the left panel, and both match well with the black curve, which represents the theoretical bound from Corollary 4.1.
Since the mathematical model for Example 4.1 is only a mean parameter, the $L_2$ distance between the reality and the mathematical model is uni-modal in $\theta$. We thus use the root mean squared error between the estimator of the calibration parameter and the $L_2$ minimizer $\theta^*$ as the measurement of how well the calibrated mathematical model predicts the reality:
$$\mathrm{RMSE}_{\theta} = \sqrt{ \frac{1}{N} \sum_{j=1}^{N} \big( \hat{\theta}_j - \theta^* \big)^2 },$$
where $\hat{\theta}_j$ is the estimator of $\theta$ in the $j$th experiment.
Although the GaSP and the S-GaSP perform equally well in prediction for Example 4.1, the estimator of the calibration parameter in the discretized S-GaSP calibration converges to the $L_2$ minimizer $\theta^*$, whereas that in the GaSP calibration does not, as shown in the right panel of Figure 1. This problem is caused by the difference between the RKHS norm and the $L_2$ norm. As illustrated in Lemma 2.1 and Lemma 3.3, both the RKHS norm and the $L_2$ norm of the discrepancy function are penalized in the S-GaSP calibration model, whereas the GaSP calibration model only penalizes the RKHS norm of the discrepancy function.
In Section 4.2, we further show that under some regularity conditions, the calibrated parameters in the S-GaSP calibration converge to the $L_2$ minimizer $\boldsymbol{\theta}^*$ with the same choice of the regularization parameter and scaling parameter.
4.2 Towards reconciling the norm and RKHS norm in calibration
We first list some regularity conditions for the convergence of calibration parameters in the S-GaSP calibration model.
$\boldsymbol{\theta}^*$ is the unique solution of (1.3) and it is an interior point of $\Theta$.
The Hessian matrix
$$\frac{\partial^2}{\partial \boldsymbol{\theta} \, \partial \boldsymbol{\theta}^T} \int_{\mathcal{X}} \big( y^R(\mathbf{x}) - f^M(\mathbf{x}, \boldsymbol{\theta}) \big)^2 \, d\mathbf{x}$$
is invertible in a neighborhood of $\boldsymbol{\theta}^*$.
For all , it holds that
The function class $\{ f^M(\cdot, \boldsymbol{\theta}) : \boldsymbol{\theta} \in \Theta \}$ is Donsker.
Assumptions A1 to A3 are regularity conditions on $\boldsymbol{\theta}^*$ and the mathematical model around $\boldsymbol{\theta}^*$. Assumptions A4 to A6 guarantee that the KRR estimator converges to the truth uniformly for each $\boldsymbol{\theta}$ in terms of the $L_2$ loss.
Under Assumptions A1 to A6, the estimated calibration parameters $\hat{\boldsymbol{\theta}}$ in (3.7) converge to $\boldsymbol{\theta}^*$ with suitable choices of the regularization parameter $\lambda$ and the scaling parameter $\lambda_z$.
Note that these choices of $\lambda$ and $\lambda_z$ also guarantee that the predictive mean estimator in the S-GaSP calibration converges to the reality in terms of the $L_2$ loss, by Corollary 4.1 and Theorem 4.2. On the contrary, the calibrated parameters of the GaSP calibration typically do not converge to the $L_2$ minimizer. A key difference between the GaSP and the S-GaSP calibration is stated in the following Corollary 4.2, which is an immediate consequence of the proof of Theorem 4.2.
To ensure the convergence of an estimator to the $L_2$ minimizer, one typical requirement is that the estimated discrepancy does not systematically absorb the difference between the reality and the mathematical model; the S-GaSP satisfies this requirement through the additional $L_2$ penalty. However, because of the difference between the RKHS norm and the $L_2$ norm, the estimated parameters in the GaSP calibration model can be far away from the $L_2$ minimizer. As a result, the calibrated mathematical model may not predict the reality accurately in the GaSP calibration model, as found in many previous studies [3, 11, 26, 31].
Theorem 4.2 states that the convergence rate of $\hat{\boldsymbol{\theta}}$ in the S-GaSP calibration (to $\boldsymbol{\theta}^*$) is slightly slower than the parametric rate under the stated choices of $\lambda$ and $\lambda_z$. A faster convergence rate of the calibration parameters can be obtained by choosing a larger $\lambda_z$ as the number of observations increases, but the convergence rate to the reality obtained for (4.1) may decrease. The slower convergence rate to $\boldsymbol{\theta}^*$ in the S-GaSP reflects the added uncertainty in the parameter estimation caused by the model discrepancy. A numerical example is given in Section 6 to illustrate the difference between the GaSP calibration, the S-GaSP calibration, and other approaches seeking the $L_2$ minimizer $\boldsymbol{\theta}^*$.
We plot the natural logarithm of the root mean squared error of the estimated calibration parameter for Example 4.1 in the GaSP and the discretized S-GaSP calibration as the red circles and blue circles in the right panel of Figure 1, respectively. The estimated parameter in the discretized S-GaSP calibration converges to the $L_2$ minimizer as the sample size grows, whereas that in the GaSP calibration does not. The convergence to the $L_2$ minimizer may look faster in Figure 1 than the theoretical upper bound in Theorem 4.2 suggests. This is because $\lambda_z$ is chosen to be large, in which case the S-GaSP behaves similarly to some other approaches that estimate the calibration parameters by minimizing the $L_2$ loss for this example [26, 31].
5 Discretized scaled Gaussian stochastic process
We address the computational issue in the S-GaSP calibration in this section. Instead of truncating the kernel function in (3.5) at the first several terms, following the approach in , one can select $N$ distinct points $\{\mathbf{x}^z_1, \dots, \mathbf{x}^z_N\} \subset \mathcal{X}$ to discretize the input space and replace the integral $\int_{\mathcal{X}} \delta^2(\mathbf{x}) \, d\mathbf{x}$ by the average $\frac{1}{N} \sum_{j=1}^{N} \delta^2(\mathbf{x}^z_j)$ in the S-GaSP model in (3.1).
Here we let the discretization points be the observed inputs, i.e. $\mathbf{x}^z_j = \mathbf{x}_j$ for $j = 1, \dots, n$ and $N = n$. The discretized S-GaSP is then defined as