A theoretical framework of the scaled Gaussian stochastic process in prediction and calibration

07/10/2018 · by Mengyang Gu, et al. · Johns Hopkins University

The Gaussian stochastic process (GaSP) is a useful technique for predicting nonlinear outcomes. The estimated mean function in a GaSP, however, can be far from the reality in terms of the $L_2$ distance. This problem has been widely observed in calibrating imperfect mathematical models using experimental data, when the discrepancy function is modeled as a GaSP. In this work, we study the theoretical properties of the scaled Gaussian stochastic process (S-GaSP), a new stochastic process that addresses the identifiability problem of the mean function in the GaSP model. The GaSP is a special case of the S-GaSP with the scaling parameter equal to zero. We establish an explicit connection between the GaSP and the S-GaSP through the orthogonal series representation. We show that the predictive mean estimator in the S-GaSP calibration model converges to the reality at the same rate as the GaSP, with a suitable choice of the regularization parameter and scaling parameter. We also show that the calibrated mathematical model in the S-GaSP calibration converges to the one that minimizes the $L_2$ loss between the reality and the mathematical model with the same regularization and scaling parameters, whereas the GaSP model does not have this property. From the regularization perspective, the loss function of the S-GaSP calibration penalizes the native norm and the $L_2$ norm of the discrepancy function simultaneously, whereas that of the GaSP calibration only penalizes the native norm of the discrepancy function. The predictive error of the S-GaSP matches the theoretical bound well, and numerical evidence is presented concerning the performance of the studied approaches. Both the GaSP and S-GaSP calibration models are implemented in the "RobustCalibration" R package on CRAN.


1 Introduction

In scientific and engineering studies, mathematical models are developed by scientists and engineers based on their expert knowledge to reproduce the physical reality. With the rapid development of computational techniques in recent years, many mathematical models are implemented in computer code, often referred to as computer models or simulators.

Some parameters of a mathematical model are often unknown or unobservable in experiments. For example, the Kīlauea volcano recently had one of the largest eruptions in its recorded history. The location and volume of the magma chamber, as well as the magma supply and storage rate of this volcano, however, are unobservable. Field data, such as satellite interferograms of the ground deformation and gas emission data, were used to estimate these parameters for the Kīlauea volcano [1, 2, 11]. Using field data to estimate the parameters in a mathematical model, and to identify the possible discrepancy between the mathematical model and the reality, is widely known as the inverse problem or calibration [3, 4, 15, 27, 32].

Assume the field or experimental data at any observable input $\mathbf{x} \in \mathcal{X}$ can be represented as a noisy assessment of the unknown reality

$$y(\mathbf{x}) = y^R(\mathbf{x}) + \epsilon, \tag{1.1}$$

where $y^R(\cdot)$ is the unknown deterministic function of the reality and $\epsilon$ is a Gaussian noise with an unknown variance $\sigma^2_0$. Denote by $f^M(\mathbf{x}, \boldsymbol\theta)$ the mathematical model at the observable input $\mathbf{x}$ with the calibration parameters $\boldsymbol\theta \in \Theta$. Since the mathematical model is often imperfect in describing the physical reality, one often assumes $y^R(\mathbf{x}) = f^M(\mathbf{x}, \boldsymbol\theta) + \delta(\mathbf{x})$, where $\delta(\cdot)$ is a discrepancy function between the reality and the mathematical model. With this specification, we have the following calibration model for an imperfect mathematical model

$$y(\mathbf{x}) = f^M(\mathbf{x}, \boldsymbol\theta) + \delta(\mathbf{x}) + \epsilon. \tag{1.2}$$
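To make the data-generating mechanism in (1.2) concrete, the following R sketch simulates field data under a hypothetical reality, mathematical model, and discrepancy (all chosen here purely for illustration):

```r
# A minimal simulation from the calibration model (1.2); the reality,
# mathematical model, and discrepancy below are hypothetical choices.
set.seed(1)
n      <- 30
x      <- seq(0, 1, length.out = n)
theta0 <- 1.5                            # "true" calibration parameter
f_M    <- function(x, theta) theta * x   # a toy mathematical model
delta  <- 0.3 * sin(2 * pi * x)          # discrepancy missed by f_M
sigma0 <- 0.05                           # noise standard deviation
y      <- f_M(x, theta0) + delta + rnorm(n, sd = sigma0)
```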

When the mean and trend of the reality are properly explained by the mathematical model, the discrepancy function is often modeled as a zero-mean Gaussian stochastic process (GaSP) [15], resulting in a closed-form expression of the predictive distribution of the reality. It was found in many follow-up studies, however, that modeling the discrepancy function by a GaSP causes an identifiability problem of the calibration parameters and, consequently, the calibrated mathematical model can be far away from the reality (see, for example, [3, 26, 31]).

The calibration model (1.2) is also often used in modeling spatially correlated data, where $f^M(\mathbf{x}, \boldsymbol\theta)$ is often assumed to be a parametric model for the linear fixed effects of some covariates, and $\delta(\mathbf{x})$ is a spatial random effect at the spatial coordinate $\mathbf{x}$. However, it is also found in many studies that the spatial random effect is confounded with the fixed effects [14, 22].

Many recent studies measure the goodness of calibration by the $L_2$ loss between the calibrated mathematical model and the reality [26, 27, 31]. These studies seek an estimator of $\boldsymbol\theta$ that converges to $\boldsymbol\theta_{L_2}$, the value that minimizes the $L_2$ distance between the reality and the mathematical model, i.e.,

$$\boldsymbol\theta_{L_2} = \mathop{\mathrm{argmin}}_{\boldsymbol\theta \in \Theta} \int_{\mathcal{X}} \left( y^R(\mathbf{x}) - f^M(\mathbf{x}, \boldsymbol\theta) \right)^2 d\mathbf{x}. \tag{1.3}$$

In [26], for instance, the reality is first estimated through a nonparametric regression model without the assistance of the mathematical model. The calibration parameters are then estimated by minimizing the $L_2$ loss between the calibrated mathematical model and this estimator of the reality.

On the other hand, integrating the mathematical model and the discrepancy function in (1.2) for prediction is found to yield a smaller predictive error than using a GaSP or the mathematical model alone [15]. This is because the mathematical model is developed from expert knowledge, which contains information about the complex reality, while the GaSP is a flexible model for the discrepancy not captured by the calibrated mathematical model. The prediction obtained by combining the mathematical model and the discrepancy function, however, is sometimes less interpretable to scientists, as the discrepancy function can introduce nonlinear effects that are hard to explain. For the sake of interpretability, one also hopes the calibrated mathematical model itself can predict the reality well, and this partially explains why some previous studies define $\boldsymbol\theta_{L_2}$ as the optimum of the calibration parameters.

To prevent the calibrated mathematical model from deviating too much from the reality, we propose to model the discrepancy function by a new stochastic process, called the scaled Gaussian stochastic process (S-GaSP). The S-GaSP was first proposed in [11], where it is shown to be a GaSP with a transformed kernel whose computational complexity is the same as that of the GaSP. In this work, we provide a theoretical framework of the S-GaSP for calibration and prediction. Moreover, we also establish the connection between the maximum likelihood estimator and the kernel ridge regression estimator in the calibration setting. This connection allows us to study the asymptotic properties of the maximum likelihood estimator using well-developed tools for kernel ridge regression.

We highlight several advantages of using the S-GaSP to model the discrepancy function in the calibration model (1.2). First of all, we show that the predictive mean from the S-GaSP converges to the reality at the same rate as that from the GaSP with a suitable choice of the regularization and scaling parameters. Furthermore, with the same regularization and scaling parameters, the calibration parameters in the S-GaSP also converge to $\boldsymbol\theta_{L_2}$, which minimizes the $L_2$ distance between the reality and the calibrated mathematical model. The estimated calibration parameters in the GaSP calibration model, in contrast, do not converge to $\boldsymbol\theta_{L_2}$, and the predictive error of the calibrated mathematical model is large as a consequence. Thirdly, since the S-GaSP can be shown to be a GaSP with a transformed kernel, the predictive distribution also has an explicit form that is easy to compute. This allows us to combine the mathematical model and the discrepancy function in a coherent statistical model to both predict the reality and calibrate the mathematical model. Lastly, we found in our numerical experiments that the out-of-sample predictive error obtained by combining the calibrated mathematical model and the discrepancy function via the S-GaSP is also slightly smaller than that using a GaSP, as the mathematical model is estimated to be closer to the reality in the S-GaSP.
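Both the GaSP and S-GaSP calibration models above are implemented in the RobustCalibration R package on CRAN. A minimal sketch of a call is given below, reusing the simulated data from the earlier snippet; the argument names (such as discrepancy_type) follow our reading of the package documentation and should be checked against the installed version rather than taken as authoritative.

```r
# Sketch of fitting the S-GaSP calibration model with RobustCalibration.
# Argument names (e.g., discrepancy_type) are assumed from the package
# documentation; see ?rcalibration in the installed version.
library(RobustCalibration)

math_model <- function(x, theta) theta[1] * x      # toy mathematical model

fit_sgasp <- rcalibration(
  design           = as.matrix(x),                 # observed inputs
  observations     = y,                            # field data
  math_model       = math_model,
  theta_range      = matrix(c(0, 3), nrow = 1),    # search range for theta
  discrepancy_type = "S-GaSP"                      # "GaSP" for the unscaled process
)

x_test <- as.matrix(seq(0, 1, length.out = 200))
pred   <- predict(fit_sgasp, testing_input = x_test, math_model = math_model)
```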

This paper is organized as follows. In Section 2, we review the background of the GaSP and the reproducing kernel Hilbert space. The connection between the maximum likelihood estimator and the kernel ridge regression estimator is also studied. In Section 3, we introduce the S-GaSP along with the orthogonal series representation of the process and the covariance function. Two convergence issues are discussed in Section 4. We first show that the predictive mean of the reality in the S-GaSP calibration converges to the reality at the optimal rate in Section 4.1. Then we show that the estimated calibration parameters also converge to $\boldsymbol\theta_{L_2}$ in the S-GaSP calibration model in Section 4.2. In Section 5, we introduce the discretized S-GaSP along with the parameter estimation. Section 6 provides some numerical studies comparing the GaSP, the S-GaSP, and two other estimation approaches. We conclude this work in Section 7, and the proofs are given in the Appendices.

2 Background: Gaussian stochastic process

We first review the Gaussian stochastic process and the reproducing kernel Hilbert space in this section. Assume the mean and trend of the reality are properly modeled in the mathematical model. Consider modeling the unknown discrepancy function $\delta(\cdot)$ in the calibration model (1.2) via a real-valued zero-mean Gaussian stochastic process on a $p$-dimensional input domain $\mathcal{X} \subset \mathbb{R}^p$,

$$\delta(\cdot) \sim \mathrm{GaSP}\left( 0, \sigma^2 K(\cdot, \cdot) \right), \tag{2.1}$$

where $\sigma^2$ is a variance parameter and $K(\mathbf{x}_a, \mathbf{x}_b)$ is the correlation for any $\mathbf{x}_a, \mathbf{x}_b \in \mathcal{X}$, parameterized by a kernel function. For simplicity, we assume $\mathcal{X} = [0, 1]^p$ in this work.

For any $\{\mathbf{x}_1, \dots, \mathbf{x}_n\} \subset \mathcal{X}$, the outputs $\boldsymbol\delta = (\delta(\mathbf{x}_1), \dots, \delta(\mathbf{x}_n))^T$ follow a multivariate normal distribution

$$\boldsymbol\delta \sim \mathcal{MN}\left( \mathbf{0}, \sigma^2 \mathbf{R} \right), \tag{2.2}$$

where the $(i, j)$ entry of $\mathbf{R}$ is $K(\mathbf{x}_i, \mathbf{x}_j)$. Some frequently used kernel functions include the power exponential kernel and the Matérn kernel. We defer the issue of estimating the parameters in the kernel function to Section 5 and assume $K(\cdot, \cdot)$ is known for now.
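As a concrete instance of (2.2), the following R sketch builds the correlation matrix $\mathbf{R}$ under the Matérn kernel with roughness parameter $5/2$ (one common choice; the closed form used here is the standard one) and draws one realization of the GaSP at the inputs x from the earlier snippet:

```r
# Matérn correlation with roughness parameter 5/2 and range parameter gamma:
# K(d) = (1 + sqrt(5) d / gamma + 5 d^2 / (3 gamma^2)) exp(-sqrt(5) d / gamma)
matern_5_2 <- function(d, gamma) {
  a <- sqrt(5) * d / gamma
  (1 + a + a^2 / 3) * exp(-a)
}

# One draw of (delta(x_1), ..., delta(x_n)) ~ MN(0, sigma2 * R) as in (2.2);
# the small nugget keeps the Cholesky factorization numerically stable.
sigma2 <- 1
R      <- matern_5_2(abs(outer(x, x, "-")), gamma = 0.3)
U      <- chol(sigma2 * R + 1e-10 * diag(n))
delta_draw <- drop(t(U) %*% rnorm(n))
```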

The reproducing kernel Hilbert space (RKHS), denoted by $\mathcal{H}$, attached to the Gaussian stochastic process in (2.1) is the completion of the space of all functions of the form $\sum_{i} a_i K(\cdot, \mathbf{x}_i)$ with the inner product

$$\left\langle \sum_{i} a_i K(\cdot, \mathbf{x}_i),\; \sum_{j} b_j K(\cdot, \mathbf{x}'_j) \right\rangle_{\mathcal{H}} = \sum_{i} \sum_{j} a_i b_j K(\mathbf{x}_i, \mathbf{x}'_j).$$

For any function $f \in \mathcal{H}$, denote by $\|f\|_{\mathcal{H}}$ the RKHS norm, or the native norm. Because the evaluation maps in an RKHS are bounded linear functionals, it follows from the Riesz representation theorem that for each $\mathbf{x} \in \mathcal{X}$ and $f \in \mathcal{H}$, one has $f(\mathbf{x}) = \langle f, K(\cdot, \mathbf{x}) \rangle_{\mathcal{H}}$.

Denote by $L_2(\mathcal{X})$ the space of square-integrable functions on $\mathcal{X}$, with $\langle f, g \rangle_{L_2} = \int_{\mathcal{X}} f(\mathbf{x}) g(\mathbf{x})\, d\mathbf{x}$ the usual inner product in $L_2(\mathcal{X})$. By Mercer's theorem, there exists an orthonormal sequence of continuous eigenfunctions $\{\phi_k\}_{k=1}^{\infty}$ with a sequence of non-increasing and non-negative eigenvalues $\{\rho_k\}_{k=1}^{\infty}$ such that

$$K(\mathbf{x}_a, \mathbf{x}_b) = \sum_{k=1}^{\infty} \rho_k\, \phi_k(\mathbf{x}_a)\, \phi_k(\mathbf{x}_b), \tag{2.3}$$

for any $\mathbf{x}_a, \mathbf{x}_b \in \mathcal{X}$.

The RKHS $\mathcal{H}$ contains all functions $f = \sum_{k=1}^{\infty} f_k \phi_k$ with $f_k = \langle f, \phi_k \rangle_{L_2}$ and $\|f\|^2_{\mathcal{H}} = \sum_{k=1}^{\infty} f_k^2 / \rho_k < \infty$. For any $f = \sum_k f_k \phi_k$ and $g = \sum_k g_k \phi_k$ in $\mathcal{H}$, the inner product can be represented as $\langle f, g \rangle_{\mathcal{H}} = \sum_{k=1}^{\infty} f_k g_k / \rho_k$. For more properties of the RKHS, we refer to Chapter 1 of [30] and Chapter 11 of [7].
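As a concrete instance of (2.3), consider the Brownian-motion kernel $K(x_a, x_b) = \min(x_a, x_b)$ on $[0, 1]$, used here purely as an illustration, whose eigenpairs are available in closed form:

$$\min(x_a, x_b) = \sum_{k=1}^{\infty} \frac{1}{(k - 1/2)^2 \pi^2}\, \phi_k(x_a)\, \phi_k(x_b), \qquad \phi_k(x) = \sqrt{2}\, \sin\!\left( (k - 1/2) \pi x \right).$$

The eigenvalues decay at the polynomial rate $k^{-2}$, and a function $f = \sum_k f_k \phi_k$ belongs to the associated RKHS exactly when $\sum_k f_k^2 (k - 1/2)^2 \pi^2 < \infty$, illustrating how the decay rate of the eigenvalues governs the size of the RKHS.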

2.1 The equivalence between the maximum likelihood estimator and the kernel ridge regression estimator in calibration

Assume one has a set of observations $\mathbf{y} = (y(\mathbf{x}_1), \dots, y(\mathbf{x}_n))^T$ and mathematical model outputs $\mathbf{f}^M_{\boldsymbol\theta} = (f^M(\mathbf{x}_1, \boldsymbol\theta), \dots, f^M(\mathbf{x}_n, \boldsymbol\theta))^T$, where $\boldsymbol\theta$ is a $q$-dimensional vector of the calibration parameters.

Denote the regularization parameter $\lambda = \sigma^2_0 / (n \sigma^2)$. For the calibration model (1.2) with $\delta(\cdot)$ modeled as a GaSP in (2.1), the marginal distribution of $\mathbf{y}$ follows a multivariate normal distribution after marginalizing out $\delta(\cdot)$,

$$\mathbf{y} \mid \boldsymbol\theta, \sigma^2, \lambda \sim \mathcal{MN}\left( \mathbf{f}^M_{\boldsymbol\theta},\; \sigma^2 \left( \mathbf{R} + n \lambda \mathbf{I}_n \right) \right). \tag{2.4}$$

Let $L(\boldsymbol\theta \mid \sigma^2, \lambda)$ be the likelihood of $\boldsymbol\theta$ in (2.4) given the other parameters in the model. For any given $\sigma^2$ and $\lambda$, the maximum likelihood estimator (MLE) of $\boldsymbol\theta$ is denoted by

$$\hat{\boldsymbol\theta} = \mathop{\mathrm{argmax}}_{\boldsymbol\theta \in \Theta} L(\boldsymbol\theta \mid \sigma^2, \lambda). \tag{2.5}$$

Conditioning on the observations, $\hat{\boldsymbol\theta}$ and $\lambda$, the predictive mean of the discrepancy function at any $\mathbf{x} \in \mathcal{X}$ has the following expression

$$\hat\delta(\mathbf{x}) = \mathbf{r}^T(\mathbf{x}) \left( \mathbf{R} + n \lambda \mathbf{I}_n \right)^{-1} \left( \mathbf{y} - \mathbf{f}^M_{\hat{\boldsymbol\theta}} \right), \tag{2.6}$$

with $\mathbf{r}(\mathbf{x}) = (K(\mathbf{x}, \mathbf{x}_1), \dots, K(\mathbf{x}, \mathbf{x}_n))^T$ and $\mathbf{I}_n$ being the $n$-dimensional identity matrix.

It is well known that the predictive mean in (2.6) can be written as the estimator of a kernel ridge regression (KRR). In the following lemma, we show that the pair $(\hat{\boldsymbol\theta}, \hat\delta(\cdot))$ is equivalent to the KRR estimator.

Lemma 2.1.

The maximum likelihood estimator $\hat{\boldsymbol\theta}$ defined in (2.5) and the predictive mean estimator $\hat\delta(\cdot)$ defined in (2.6) can be expressed as the estimator of the kernel ridge regression as follows

$$(\hat{\boldsymbol\theta}, \hat\delta) = \mathop{\mathrm{argmin}}_{\boldsymbol\theta \in \Theta,\; \delta \in \mathcal{H}} \left\{ \frac{1}{n} \sum_{i=1}^{n} \left( y(\mathbf{x}_i) - f^M(\mathbf{x}_i, \boldsymbol\theta) - \delta(\mathbf{x}_i) \right)^2 + \lambda \|\delta\|^2_{\mathcal{H}} \right\}, \tag{2.7}$$

where $\lambda = \sigma^2_0 / (n \sigma^2)$.

Although modeling the discrepancy function by the GaSP typically improves the prediction accuracy of the reality, the penalty term in (2.7) only contains the native norm $\|\delta\|^2_{\mathcal{H}}$ to control the complexity of the discrepancy. As the RKHS norm is not equivalent to the $L_2$ norm, the calibrated computer model can deviate substantially from the best-performing mathematical model in terms of the $L_2$ loss [26]. In Section 3, we introduce the scaled Gaussian stochastic process, which predicts the reality as accurately as the GaSP with the aid of the mathematical model, but places more prior mass on a small $L_2$ distance between the reality and the mathematical model. As a consequence, the KRR estimator of the new model penalizes both $\|\delta\|^2_{\mathcal{H}}$ and $\|\delta\|^2_{L_2}$ simultaneously.
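The equivalence in Lemma 2.1 is straightforward to exercise numerically. The R sketch below, which assumes the simulated data and the matern_5_2 function from the earlier snippets and treats $\lambda$ as known, profiles the objective over $\theta$ and computes the predictive mean (2.6):

```r
# GaSP calibration: profile over theta and compute the predictive mean (2.6).
# Minimizing this quadratic form maximizes the likelihood (2.4), since the
# covariance does not depend on theta; lambda is treated as known here.
lambda <- 0.01
K_reg  <- R + n * lambda * diag(n)              # R + n*lambda*I_n

obj <- function(theta) {
  res <- y - f_M(x, theta)
  drop(t(res) %*% solve(K_reg, res))
}
theta_hat <- optimize(obj, interval = c(0, 3))$minimum

# Predictive mean of the discrepancy at new inputs, as in (2.6).
x_new     <- seq(0, 1, length.out = 200)
r_new     <- matern_5_2(abs(outer(x_new, x, "-")), gamma = 0.3)
delta_hat <- drop(r_new %*% solve(K_reg, y - f_M(x, theta_hat)))
y_hat     <- f_M(x_new, theta_hat) + delta_hat  # prediction of the reality
```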

3 The scaled Gaussian stochastic process

The $L_2$ loss between the reality and the mathematical model can be expressed as a random loss of the discrepancy function, $\int_{\mathcal{X}} \delta^2(\mathbf{x})\, d\mathbf{x}$. We hierarchically model this crucial random variable to have more probability mass near zero. The scaled Gaussian stochastic process calibration model is defined as follows

$$y(\mathbf{x}) = f^M(\mathbf{x}, \boldsymbol\theta) + \delta_z(\mathbf{x}) + \epsilon, \qquad \delta_z(\mathbf{x}) = \left\{ \delta(\mathbf{x}) \;\Big|\; \int_{\mathcal{X}} \delta^2(\mathbf{x})\, d\mathbf{x} = Z \right\}, \qquad Z \sim p_Z(\cdot), \tag{3.1}$$

with $\delta(\cdot)$ a GaSP as in (2.1). We call $\delta_z(\cdot)$ the scaled Gaussian stochastic process (S-GaSP). Given $Z = z$, the S-GaSP becomes the GaSP constrained to the space $\{ \delta : \int_{\mathcal{X}} \delta^2(\mathbf{x})\, d\mathbf{x} = z \}$.

By definition, $Z \ge 0$. Conditioning on all parameters in the model, a simple choice of $p_Z(\cdot)$ is

$$p_Z(z) = \frac{g_Z(z)\, p_\delta(z)}{\int_0^{\infty} g_Z(t)\, p_\delta(t)\, dt}, \tag{3.2}$$

where $g_Z(\cdot)$ is a non-increasing scaling function and $p_\delta(z)$ is the density of $\int_{\mathcal{X}} \delta^2(\mathbf{x})\, d\mathbf{x}$ at $z$ induced by the GaSP in (2.1). The measure of $\int_{\mathcal{X}} \delta^2(\mathbf{x})\, d\mathbf{x}$ induced by the S-GaSP is thus proportional to the measure induced by the GaSP, scaled by a non-increasing function, so that it places more probability mass near zero than the unconstrained GaSP does.

When $g_Z(\cdot)$ is a constant function, the S-GaSP becomes the GaSP without the constraint. Following [11], conditioning on all parameters, we assume

$$g_Z(z) \propto \exp\!\left( -\frac{\lambda_z z}{2 \sigma^2} \right), \tag{3.3}$$

where $\lambda_z \ge 0$ is a non-negative scaling parameter.

The benefit of using $p_Z(\cdot)$ in (3.2) and $g_Z(\cdot)$ in (3.3) is that any marginal distribution of the S-GaSP still follows a multivariate normal distribution (see Lemma 2.3 in [11]). In Section 3.1, we explore the orthogonal series representation of the S-GaSP.

3.1 Orthogonal series representation

Based on the Karhunen-Loève theorem, a GaSP admits the following representation for any $\mathbf{x} \in \mathcal{X}$,

$$\delta(\mathbf{x}) = \sum_{k=1}^{\infty} \sqrt{\rho_k}\, Z_k\, \phi_k(\mathbf{x}), \tag{3.4}$$

where $Z_k \overset{i.i.d.}{\sim} N(0, \sigma^2)$, and $\rho_k$ and $\phi_k(\cdot)$ are the $k$th eigenvalue and eigenfunction of the kernel $K(\cdot, \cdot)$, respectively.

The following lemma shows that the S-GaSP can also be represented as an orthogonal random series with independent normal coefficients.

Lemma 3.1 (Karhunen-Loève expansion for the S-GaSP).

Assume $p_Z(\cdot)$ and $g_Z(\cdot)$ are defined in (3.2) and (3.3), respectively. For any $\mathbf{x} \in \mathcal{X}$, the S-GaSP defined in (3.1) has the following representation

$$\delta_z(\mathbf{x}) = \sum_{k=1}^{\infty} \sqrt{\frac{\rho_k}{1 + \lambda_z \rho_k}}\, Z_k\, \phi_k(\mathbf{x}),$$

where $Z_k \overset{i.i.d.}{\sim} N(0, \sigma^2)$, and $\rho_k$ and $\phi_k(\cdot)$ are the $k$th eigenvalue and eigenfunction of the kernel $K(\cdot, \cdot)$, respectively.

The covariance function of the S-GaSP can also be decomposed as an infinite orthogonal series, which is an immediate consequence of the fact that the S-GaSP is indeed a GaSP with a transformed kernel (see Lemma 2.3 in [11]) and Lemma 3.1.

Corollary 3.1.

Assume $p_Z(\cdot)$ and $g_Z(\cdot)$ are defined in (3.2) and (3.3), respectively. The marginal distribution of the S-GaSP defined in (3.1) at any $\{\mathbf{x}_1, \dots, \mathbf{x}_n\}$ follows a multivariate normal distribution

$$\boldsymbol\delta_z \sim \mathcal{MN}\left( \mathbf{0}, \sigma^2 \mathbf{R}_z \right),$$

where the $(i, j)$ entry of $\mathbf{R}_z$ is

$$K_z(\mathbf{x}_i, \mathbf{x}_j) = \sum_{k=1}^{\infty} \frac{\rho_k}{1 + \lambda_z \rho_k}\, \phi_k(\mathbf{x}_i)\, \phi_k(\mathbf{x}_j). \tag{3.5}$$

Corollary 3.1 implies that the $k$th eigenvalue of the kernel function in the S-GaSP is $\rho_k / (1 + \lambda_z \rho_k)$, while the $k$th eigenfunction is the same as that of the GaSP. The form (3.5) does not give an explicit expression for the kernel of the S-GaSP. Instead of truncating the series, one can discretize the integral $\int_{\mathcal{X}} \delta^2(\mathbf{x})\, d\mathbf{x}$ at finitely many constraint points. This discretization leads to an explicit expression for the covariance matrix in the likelihood, discussed in Section 5.
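Continuing the earlier R sketch, the discretized construction has a simple closed form. If the constraint points are taken to be the $n$ observed inputs, scaling the multivariate normal density of $\boldsymbol\delta$ by $\exp\{ -\frac{\lambda_z}{2 \sigma^2 n} \sum_{i=1}^{n} \delta^2(\mathbf{x}_i) \}$ and completing the square yields the transformed covariance matrix $\mathbf{R}_z = \mathbf{R} (\mathbf{I}_n + (\lambda_z / n) \mathbf{R})^{-1}$; this is a sketch of the derivation following [11], and the resulting eigenvalue shrinkage mirrors Corollary 3.1:

```r
# Transformed covariance of the discretized S-GaSP with the n observed
# inputs as constraint points: R_z = R (I + (lambda_z / n) R)^{-1},
# a sketch of the derivation following [11].
lambda_z <- 50
R_z <- R %*% solve(diag(n) + (lambda_z / n) * R)

# Sanity check mirroring Corollary 3.1: the eigenvalues of R_z equal
# mu_k / (1 + (lambda_z / n) mu_k), where mu_k are the eigenvalues of R.
mu <- eigen(R, symmetric = TRUE)$values
max(abs(eigen((R_z + t(R_z)) / 2, symmetric = TRUE)$values -
        mu / (1 + (lambda_z / n) * mu)))   # numerically near zero
```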

It is shown in [11] that the random $L_2$ loss $\int_{\mathcal{X}} \delta^2(\mathbf{x})\, d\mathbf{x}$ induced by the GaSP can be written as an infinite weighted sum of independent chi-squared random variables. The following corollary provides a similar decomposition of $\int_{\mathcal{X}} \delta_z^2(\mathbf{x})\, d\mathbf{x}$ for the S-GaSP, which follows from Lemma 2.1 in [11] and Corollary 3.1.

Corollary 3.2.

Assume the same conditions as in Lemma 3.1 hold. The distribution of $\int_{\mathcal{X}} \delta_z^2(\mathbf{x})\, d\mathbf{x}$ induced by the S-GaSP follows

$$\int_{\mathcal{X}} \delta_z^2(\mathbf{x})\, d\mathbf{x} \;\overset{d}{=}\; \sum_{k=1}^{\infty} \frac{\sigma^2 \rho_k}{1 + \lambda_z \rho_k}\, \chi^2_k,$$

where $\chi^2_k$ are independent chi-squared random variables with one degree of freedom.

Denote by $\mathcal{H}$ and $\mathcal{H}_z$ the RKHS attached to the GaSP with kernel $K(\cdot, \cdot)$ and to the S-GaSP with kernel $K_z(\cdot, \cdot)$, respectively. We conclude this subsection with the explicit connection between the inner product of the GaSP and that of the S-GaSP.

Lemma 3.2.

Assume $p_Z(\cdot)$ and $g_Z(\cdot)$ are defined in (3.2) and (3.3), respectively. Let $f = \sum_{k=1}^{\infty} f_k \phi_k$ and $g = \sum_{k=1}^{\infty} g_k \phi_k$ be elements of $\mathcal{H}$. It holds that

$$\langle f, g \rangle_{\mathcal{H}_z} = \langle f, g \rangle_{\mathcal{H}} + \lambda_z \langle f, g \rangle_{L_2}.$$
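Lemma 3.2 follows directly from the eigenvalue shrinkage in Corollary 3.1: since the $k$th eigenvalue of $K_z(\cdot, \cdot)$ is $\rho_k / (1 + \lambda_z \rho_k)$,

$$\langle f, g \rangle_{\mathcal{H}_z} = \sum_{k=1}^{\infty} \frac{f_k g_k (1 + \lambda_z \rho_k)}{\rho_k} = \sum_{k=1}^{\infty} \frac{f_k g_k}{\rho_k} + \lambda_z \sum_{k=1}^{\infty} f_k g_k = \langle f, g \rangle_{\mathcal{H}} + \lambda_z \langle f, g \rangle_{L_2}.$$

In particular, $\|\delta\|^2_{\mathcal{H}_z} = \|\delta\|^2_{\mathcal{H}} + \lambda_z \|\delta\|^2_{L_2}$, which is exactly the combined penalty that appears in the S-GaSP loss function in Lemma 3.3 below.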

3.2 Estimation for the S-GaSP

With the specification of the density $p_Z(\cdot)$ and the scaling function $g_Z(\cdot)$ for the distribution of $\int_{\mathcal{X}} \delta^2(\mathbf{x})\, d\mathbf{x}$ in (3.2) and (3.3), respectively, after marginalizing out $\delta_z(\cdot)$ in (3.1), the marginal likelihood of $\mathbf{y}$ in the S-GaSP follows a multivariate normal distribution

$$\mathbf{y} \mid \boldsymbol\theta, \sigma^2, \lambda, \lambda_z \sim \mathcal{MN}\left( \mathbf{f}^M_{\boldsymbol\theta},\; \sigma^2 \left( \mathbf{R}_z + n \lambda \mathbf{I}_n \right) \right). \tag{3.6}$$

Denote by $L_z(\boldsymbol\theta \mid \sigma^2, \lambda, \lambda_z)$ the likelihood of $\boldsymbol\theta$ in (3.6). Similarly, we have the equivalence between the MLE and the KRR estimator for the S-GaSP calibration, where both the RKHS norm and the $L_2$ norm of the discrepancy function are penalized in the loss function, as stated in Lemma 3.3 below.

Lemma 3.3.

The maximum likelihood estimator $\hat{\boldsymbol\theta}_z$ and the predictive mean $\hat\delta_z(\cdot)$ are the same as the estimator of the penalized kernel ridge regression

$$(\hat{\boldsymbol\theta}_z, \hat\delta_z) = \mathop{\mathrm{argmin}}_{\boldsymbol\theta \in \Theta,\; \delta \in \mathcal{H}} \left\{ \frac{1}{n} \sum_{i=1}^{n} \left( y(\mathbf{x}_i) - f^M(\mathbf{x}_i, \boldsymbol\theta) - \delta(\mathbf{x}_i) \right)^2 + \lambda \|\delta\|^2_{\mathcal{H}} + \lambda \lambda_z \|\delta\|^2_{L_2} \right\}, \tag{3.7}$$

with $\lambda = \sigma^2_0 / (n \sigma^2)$.
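Computationally, the S-GaSP calibration therefore reuses the GaSP computations with $\mathbf{R}$ replaced by $\mathbf{R}_z$. Continuing the earlier sketches (with lambda, r_new, and x_new as defined there; the cross-covariance formula below follows from the same conditioning argument used for R_z and is a sketch rather than the package implementation):

```r
# S-GaSP calibration: identical to the GaSP computation with R replaced by
# R_z, so both norms in (3.7) are penalized.
Kz_reg <- R_z + n * lambda * diag(n)

obj_z <- function(theta) {
  res <- y - f_M(x, theta)
  drop(t(res) %*% solve(Kz_reg, res))
}
theta_hat_z <- optimize(obj_z, interval = c(0, 3))$minimum

# Cross-covariance between new and observed inputs for the discretized
# S-GaSP: r_z(x*)' = r(x*)' (I + (lambda_z / n) R)^{-1} (assumed form).
A           <- solve(diag(n) + (lambda_z / n) * R)
rz_new      <- r_new %*% A
delta_hat_z <- drop(rz_new %*% solve(Kz_reg, y - f_M(x, theta_hat_z)))
y_hat_z     <- f_M(x_new, theta_hat_z) + delta_hat_z
```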

4 Convergence properties

4.1 Nonparametric regression by the S-GaSP

Let us first consider the following nonparametric regression model,

$$y_i = f(\mathbf{x}_i) + \epsilon_i, \qquad \epsilon_i \overset{i.i.d.}{\sim} N(0, \sigma^2_0), \tag{4.1}$$

where $f(\cdot)$ is assumed to follow the zero-mean S-GaSP prior with the default choices of $p_Z(\cdot)$ and $g_Z(\cdot)$ in (3.2) and (3.3), respectively. For simplicity, we assume that $\mathbf{x}_1, \dots, \mathbf{x}_n$ are independently sampled from the uniform distribution on $\mathcal{X} = [0, 1]^p$.

Assume the underlying truth $f_0$ resides in the Sobolev space of order $m$ on the $p$-dimensional domain, $H^m(\mathcal{X})$ with $m > p/2$, where

$$H^m(\mathcal{X}) = \left\{ f = \sum_{k=1}^{\infty} f_k \phi_k :\; \|f\|^2_{H^m} = \sum_{k=1}^{\infty} f_k^2\, k^{2m/p} < \infty \right\}, \tag{4.2}$$

with $\{\phi_k\}_{k=1}^{\infty}$ being a sequence of orthonormal basis functions of $L_2(\mathcal{X})$. For any integer vector $\boldsymbol\alpha = (\alpha_1, \dots, \alpha_p)$ and a function $f$, denote by $D^{\boldsymbol\alpha} f$ the mixed partial derivative operator with $|\boldsymbol\alpha| = \sum_{i=1}^{p} \alpha_i$. For any function $f$ in $H^m(\mathcal{X})$, we have $D^{\boldsymbol\alpha} f \in L_2(\mathcal{X})$ for any $|\boldsymbol\alpha| \le m$.

Recall that $\lambda = \sigma^2_0 / (n \sigma^2)$. The posterior mean estimator of the S-GaSP is equivalent to the KRR estimator as follows

$$\hat f = \mathop{\mathrm{argmin}}_{f \in \mathcal{H}} \left\{ \frac{1}{n} \sum_{i=1}^{n} \left( y_i - f(\mathbf{x}_i) \right)^2 + \lambda \|f\|^2_{\mathcal{H}} + \lambda \lambda_z \|f\|^2_{L_2} \right\}. \tag{4.3}$$

Recall that $\{\rho_k\}_{k=1}^{\infty}$ and $\{\phi_k\}_{k=1}^{\infty}$ are the sequences of eigenvalues and eigenfunctions of the reproducing kernel $K(\cdot, \cdot)$ associated with $\mathcal{H}$, respectively. For all $k \ge 1$, we assume the eigenvalues satisfy

$$c_1\, k^{-2m/p} \le \rho_k \le c_2\, k^{-2m/p}, \tag{4.4}$$

for some constants $0 < c_1 \le c_2 < \infty$. For all $k \ge 1$ and $\mathbf{x} \in \mathcal{X}$, we assume the eigenfunctions are bounded uniformly,

$$\sup_{k \ge 1} \| \phi_k \|_{\infty} \le c_{\phi}, \tag{4.5}$$

where $c_{\phi}$ is a constant depending only on the kernel $K(\cdot, \cdot)$.

As a motivating example, the Matérn kernel satisfies the decay rate of the eigenvalues in (4.4) when expanded with respect to the Fourier basis [35]. The exact form of the Matérn kernel, along with its parameterization and parameter estimation, is discussed in Section 5.1.
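The decay rate in (4.4) can also be examined numerically: the eigenvalues of the Gram matrix divided by its size approximate $\rho_k$, so their log-log slope against $k$ should be close to $-2m/p$. A rough R check for the Matérn kernel used earlier (assuming the standard correspondence $m = \nu + p/2$, so $m = 3$ for roughness $\nu = 5/2$ and $p = 1$):

```r
# Rough numerical check of (4.4): eigenvalues of the Gram matrix divided by
# its size approximate rho_k. Assuming m = nu + p/2, the Matérn-5/2 kernel
# in one dimension gives m = 3, i.e., a log-log slope near -2m/p = -6.
xg <- seq(0, 1, length.out = 500)
Rg <- matern_5_2(abs(outer(xg, xg, "-")), gamma = 0.3)
rho_hat <- eigen(Rg, symmetric = TRUE)$values / length(xg)
k <- 5:50                                   # mid-range indices
coef(lm(log(rho_hat[k]) ~ log(k)))[2]       # slope, roughly -6
```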

We are now ready to state the convergence rate of the S-GaSP for the nonparametric regression model in (4.1).

Theorem 4.1.

Assume the eigenvalues and eigenfunctions of $K(\cdot, \cdot)$ satisfy (4.4) and (4.5), respectively. Further assume $f_0 \in H^m(\mathcal{X})$ with $m > p/2$, and denote by $\hat f$ the estimator in (4.3). Consider the nonparametric regression model (4.1). For sufficiently large $n$, with probability approaching one,

$$\frac{1}{n} \sum_{i=1}^{n} \left( \hat f(\mathbf{x}_i) - f_0(\mathbf{x}_i) \right)^2 \le C\, n^{-\frac{2m}{2m+p}}$$

and

$$\| \hat f - f_0 \|^2_{L_2} \le C\, n^{-\frac{2m}{2m+p}},$$

by choosing $\lambda \asymp n^{-\frac{2m}{2m+p}}$ and a suitable scaling parameter $\lambda_z$, where $C$ is a constant only depending on the kernel $K(\cdot, \cdot)$.

The conditions in Theorem 4.1 can be relaxed in various ways. From the proof of Theorem 4.1, it is easy to see that if $\lambda$ and $\lambda_z$ are chosen at slightly different rates, the estimator still converges to the reality in the $L_2$ distance at the same rate. In practice, a small regularization parameter often leads to a small predictive error [30]. The estimation of $\lambda$, $\lambda_z$, and the parameters in the kernel function is discussed in Section 5.1. Furthermore, although the stationarity of the process is often assumed for computational purposes, it is not required in Theorem 4.1.

The estimator for the reality in the S-GaSP calibration model is defined as follows

$$\hat y^R(\mathbf{x}) = f^M(\mathbf{x}, \hat{\boldsymbol\theta}_z) + \hat\delta_z(\mathbf{x}),$$

for any $\mathbf{x} \in \mathcal{X}$, where $(\hat{\boldsymbol\theta}_z, \hat\delta_z)$ is the estimator of the penalized KRR obtained by minimizing the loss in (3.7). The following Corollary 4.1 gives the convergence rate of the S-GaSP calibration model in predicting the reality.

Corollary 4.1.

Assume $f^M(\cdot, \boldsymbol\theta) \in H^m(\mathcal{X})$ for any $\boldsymbol\theta \in \Theta$ and $y^R(\cdot) \in H^m(\mathcal{X})$ with $m > p/2$. Let the eigenvalues and eigenfunctions of $K(\cdot, \cdot)$ satisfy (4.4) and (4.5), respectively. For sufficiently large $n$, with probability approaching one,

$$\| \hat y^R - y^R \|^2_{L_2} \le C'\, n^{-\frac{2m}{2m+p}},$$

by choosing $\lambda \asymp n^{-\frac{2m}{2m+p}}$ and a suitable scaling parameter $\lambda_z$, where $C'$ is a constant only depending on the kernel $K(\cdot, \cdot)$ and $\Theta$.

The conditions can be relaxed by choosing $\lambda$ and $\lambda_z$ at slightly different rates to obtain the same convergence rate.

We illustrate the convergence using the following example, where the reality lies in the Sobolev space in (4.2) with $p = 1$ [34].

Example 4.1.

Let the reality be $y^R(\cdot)$, a fixed function in the Sobolev space above, and consider $y_i = y^R(x_i) + \epsilon_i$, where $\epsilon_i$ is a Gaussian noise. Let the mathematical model be a mean parameter, i.e., $f^M(x, \theta) = \theta$. The goal is to predict $y^R(\cdot)$ on $[0, 1]$ and to estimate $\theta_{L_2}$.

Assume $\delta(\cdot)$ follows the Matérn kernel in (5.5) with a fixed range parameter. The eigenvalues of this kernel satisfy (4.4). We test 50 configurations of the number of observations $n$, with the design points equally spaced in $[0, 1]$. In each configuration, $N$ simulation replicates are implemented. We perform predictions on $n^*$ inputs equally spaced in $[0, 1]$, using the average root mean squared error (AvgRMSE) for validation as follows

$$\mathrm{AvgRMSE} = \frac{1}{N} \sum_{j=1}^{N} \sqrt{ \frac{1}{n^*} \sum_{i=1}^{n^*} \left( \hat y^R_j(x^*_i) - y^R(x^*_i) \right)^2 }, \tag{4.6}$$

where $\hat y^R_j(x^*_i)$ is an estimator of the reality at $x^*_i$ in the $j$th replicate, for $i = 1, \dots, n^*$. The subscript of the estimator indicates that both the calibrated mathematical model and the discrepancy function are used for prediction.
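A compact version of the simulation in Example 4.1 can be assembled from the earlier sketches; the reality below is a hypothetical stand-in for the one used in the paper, and the mean-only mathematical model lets $\theta$ be profiled out in closed form:

```r
# Skeleton of the simulation in Example 4.1 (stand-in reality; assumes the
# matern_5_2 function defined earlier). With f^M(x, theta) = theta, the
# profiled estimator of theta is the generalized least-squares mean.
avg_rmse <- function(n, N = 100, gamma = 0.3, lambda = 1e-2, lambda_z = 50,
                     scaled = TRUE) {
  x   <- seq(0, 1, length.out = n)
  xt  <- seq(0, 1, length.out = 200)
  yR  <- function(x) sin(2 * pi * x) + x               # stand-in reality
  R   <- matern_5_2(abs(outer(x, x, "-")), gamma)
  A   <- if (scaled) solve(diag(n) + (lambda_z / n) * R) else diag(n)
  K   <- R %*% A + n * lambda * diag(n)                # R_z (or R) + n*lambda*I
  rt  <- matern_5_2(abs(outer(xt, x, "-")), gamma) %*% A
  one <- rep(1, n)
  rmse <- replicate(N, {
    y     <- yR(x) + rnorm(n, sd = 0.05)
    theta <- sum(solve(K, y)) / sum(solve(K, one))     # profiled GLS estimate
    y_hat <- theta + drop(rt %*% solve(K, y - theta * one))
    sqrt(mean((y_hat - yR(xt))^2))
  })
  mean(rmse)                                           # AvgRMSE as in (4.6)
}
# e.g., compare avg_rmse(100) with avg_rmse(100, scaled = FALSE)
```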

Figure 1: Calibration and prediction by the GaSP and discretized S-GaSP calibration models for Example 4.1. In the left panel, the AvgRMSE of the GaSP calibration and that of the discretized S-GaSP calibration are graphed as red and blue circles, respectively; the black curve represents the theoretical upper bound from Corollary 4.1 (up to a constant). In the right panel, the natural logarithm of the $\mathrm{RMSE}_{\theta}$ of the GaSP calibration and that of the discretized S-GaSP calibration are graphed as red and blue circles, respectively; the black line represents the theoretical upper bound from Theorem 4.2 (up to a constant). The red and blue circles overlap in the left panel.

Figure 1 presents the predictive error of the GaSP calibration and the S-GaSP calibration for Example 4.1. In the left panel, the red circles are the AvgRMSE using the predictive mean of the calibration model (1.2) with $\delta(\cdot)$ modeled as a GaSP in (2.1), and the blue circles are the AvgRMSE using the predictive mean of the discretized S-GaSP calibration in (5.1), discussed in Section 5. The basic idea of the discretized S-GaSP is to replace the integral $\int_{\mathcal{X}} \delta^2(\mathbf{x})\, d\mathbf{x}$ in the S-GaSP model (3.1) by the average $\frac{1}{n} \sum_{i=1}^{n} \delta^2(\mathbf{x}_i)$ for computational purposes. Interestingly, the red and blue circles overlap in the left panel, and they both match well with the black curve representing the theoretical bound from Corollary 4.1.

Since the mathematical model for Example 4.1 is only a mean parameter, the $L_2$ distance between the reality and the mathematical model is uni-modal in $\theta$. We thus use the root mean squared error between the estimator of the calibration parameter and the $L_2$ minimizer $\theta_{L_2}$ as the measurement of how well the calibrated mathematical model predicts the reality:

$$\mathrm{RMSE}_{\theta} = \sqrt{ \frac{1}{N} \sum_{j=1}^{N} \left( \hat\theta_j - \theta_{L_2} \right)^2 }, \tag{4.7}$$

where $\hat\theta_j$ is the estimator of $\theta_{L_2}$ in the $j$th experiment.

Although the GaSP and the S-GaSP perform equally well in prediction for Example 4.1, the estimator of the calibration parameter in the discretized S-GaSP calibration converges to the $L_2$ minimizer, whereas that in the GaSP calibration does not converge to $\theta_{L_2}$, as shown in the right panel of Figure 1. This problem is caused by the difference between the RKHS norm and the $L_2$ norm. As illustrated in Lemma 2.1 and Lemma 3.3, both the RKHS norm and the $L_2$ norm of the discrepancy function are penalized in the S-GaSP calibration model, whereas the GaSP calibration model only penalizes the RKHS norm of the discrepancy function.

In Section 4.2, we further show that under some regularity conditions, the calibrated parameters in the S-GaSP calibration converge to the $L_2$ minimizer $\boldsymbol\theta_{L_2}$ with the same choice of the regularization parameter $\lambda$ and scaling parameter $\lambda_z$.

4.2 Towards reconciling the $L_2$ norm and the RKHS norm in calibration

We first list some regularity conditions for the convergence of calibration parameters in the S-GaSP calibration model.

  1. $\boldsymbol\theta_{L_2}$ is the unique solution of (1.3) and it is an interior point of $\Theta$.

  2. The Hessian matrix $\displaystyle \frac{\partial^2}{\partial \boldsymbol\theta\, \partial \boldsymbol\theta^T} \int_{\mathcal{X}} \left( y^R(\mathbf{x}) - f^M(\mathbf{x}, \boldsymbol\theta) \right)^2 d\mathbf{x}$ is invertible in a neighborhood of $\boldsymbol\theta_{L_2}$.

  3. For all $\boldsymbol\theta \in \Theta$, it holds that $\sup_{\mathbf{x} \in \mathcal{X}} \left| f^M(\mathbf{x}, \boldsymbol\theta) \right| < \infty$.

  4. The function class $\left\{ f^M(\cdot, \boldsymbol\theta) : \boldsymbol\theta \in \Theta \right\}$ is Donsker.

  5. $y^R(\cdot) - f^M(\cdot, \boldsymbol\theta) \in H^m(\mathcal{X})$ with $m > p/2$ for all $\boldsymbol\theta \in \Theta$.

  6. The eigenvalues and eigenfunctions of $K(\cdot, \cdot)$ satisfy (4.4) and (4.5), respectively.

Assumptions A1 to A3 are regularity conditions on $\boldsymbol\theta_{L_2}$ and the mathematical model around $\boldsymbol\theta_{L_2}$. Assumptions A4 to A6 guarantee that the KRR estimator of the discrepancy converges uniformly for each $\boldsymbol\theta$ in terms of the $L_2$ loss.

Theorem 4.2.

Under assumptions A1 to A6, for the estimated calibration parameters $\hat{\boldsymbol\theta}_z$ in (3.7), one has

$$\left\| \hat{\boldsymbol\theta}_z - \boldsymbol\theta_{L_2} \right\| = o_p(1),$$

by choosing $\lambda \asymp n^{-\frac{2m}{2m+p}}$ and a suitable scaling parameter $\lambda_z$.

Note that the same choices of $\lambda$ and $\lambda_z$ also guarantee, by Corollary 4.1 and Theorem 4.2, that the predictive mean estimator in the S-GaSP calibration converges to the reality at the rate given there in terms of the $L_2$ loss. On the contrary, the calibrated parameters of the GaSP calibration typically do not converge to the $L_2$ minimizer. Let $\delta_{\boldsymbol\theta}(\cdot) = y^R(\cdot) - f^M(\cdot, \boldsymbol\theta)$. A key difference between the GaSP and the S-GaSP calibration is stated in the following Corollary 4.2, which is an immediate consequence of the proof of Theorem 4.2.

Corollary 4.2.

Under assumptions A1 to A6, the estimator of the calibration parameters in the S-GaSP calibration in (3.7) asymptotically satisfies the first-order optimality condition of the $L_2$ loss in (1.3),

$$\left. \frac{\partial}{\partial \boldsymbol\theta} \int_{\mathcal{X}} \left( y^R(\mathbf{x}) - f^M(\mathbf{x}, \boldsymbol\theta) \right)^2 d\mathbf{x}\; \right|_{\boldsymbol\theta = \hat{\boldsymbol\theta}_z} = o_p(1).$$

Further assuming the mathematical model is differentiable at $\boldsymbol\theta_{L_2}$, the estimator of the calibration parameters in the GaSP calibration in (2.7) satisfies the analogous first-order condition with the $L_2$ inner product replaced by the RKHS inner product $\langle \cdot, \cdot \rangle_{\mathcal{H}}$, which in general does not vanish at $\boldsymbol\theta_{L_2}$.

To ensure the convergence of an estimator to the $L_2$ minimizer, one typical requirement is that the norm penalizing the discrepancy function behaves asymptotically like the $L_2$ norm. It is easy to see that the S-GaSP satisfies this condition, as the $L_2$ norm is penalized directly in (3.7). However, because of the difference between the RKHS norm and the $L_2$ norm, the estimated parameters in the GaSP calibration model can be far away from the $L_2$ minimizer. As a result, the calibrated mathematical model may not predict the reality accurately in the GaSP calibration model, as found in many previous studies [3, 11, 26, 31].

Theorem 4.2 states that the convergence rate of $\hat{\boldsymbol\theta}_z$ in the S-GaSP calibration (to $\boldsymbol\theta_{L_2}$) is slightly slower than the optimal rate under the choices of $\lambda$ and $\lambda_z$ above. A faster convergence rate of the calibration parameters can be obtained by choosing a larger $\lambda_z$ as the number of observations increases, but the convergence rate to the reality obtained in Section 4.1 may decrease. The slower convergence rate to $\boldsymbol\theta_{L_2}$ in the S-GaSP reflects the added uncertainty in parameter estimation caused by the model discrepancy. A numerical example is given in Section 6 to illustrate the difference between the GaSP calibration, the S-GaSP calibration, and other approaches seeking the $L_2$ minimizer $\boldsymbol\theta_{L_2}$.

We graph the natural logarithm of the $\mathrm{RMSE}_{\theta}$ for Example 4.1 in the GaSP and the discretized S-GaSP calibration as the red and blue circles in the right panel of Figure 1, respectively. The estimated parameters in the discretized S-GaSP calibration converge to the $L_2$ minimizer as the sample size grows large, whereas those in the GaSP calibration do not. The convergence of $\hat\theta_z$ to the $L_2$ minimizer may look faster in Figure 1 than the theoretical upper bound in Theorem 4.2 suggests. This is because the scaling parameter $\lambda_z$ is chosen to be large, placing a heavy penalty on the $L_2$ norm of the discrepancy, in which case the S-GaSP behaves similarly to some other approaches that estimate the calibration parameters by minimizing the $L_2$ loss for this example [26, 31].

5 Discretized scaled Gaussian stochastic process

We address the computational issue of the S-GaSP calibration in this section. Instead of truncating the kernel function in (3.5) after the first several terms, following the approach in [11], one can select $n_z$ distinct points $\{\mathbf{x}^z_1, \dots, \mathbf{x}^z_{n_z}\}$ to discretize the input space and replace $\int_{\mathcal{X}} \delta^2(\mathbf{x})\, d\mathbf{x}$ by $\frac{1}{n_z} \sum_{i=1}^{n_z} \delta^2(\mathbf{x}^z_i)$ in the S-GaSP model (3.1).

Here we let the discretization points be the observed inputs, i.e., $\mathbf{x}^z_i = \mathbf{x}_i$ for $i = 1, \dots, n$ with $n_z = n$. The discretized S-GaSP is then defined as