# Calibration of imperfect mathematical models by multiple sources of data with measurement bias

Model calibration involves using experimental or field data to estimate the unknown parameters of a mathematical model. This task is complicated by discrepancy between the model and reality, and by possible bias in the data. We consider model calibration in the presence of both model discrepancy and measurement bias using multiple sources of data. Model discrepancy is often estimated using a Gaussian stochastic process (GaSP), but it has been observed in many studies that the calibrated mathematical model can be far from the reality. Here we show that modeling the discrepancy function via a GaSP often leads to an inconsistent estimation of the calibration parameters even if one has an infinite number of repeated experiments and an infinite number of observations in a fixed input domain in each experiment. We introduce the scaled Gaussian stochastic process (S-GaSP) to model the discrepancy function. Unlike the GaSP, the S-GaSP utilizes a non-increasing scaling function which assigns more probability mass to a smaller L_2 loss between the mathematical model and reality, preventing the calibrated mathematical model from deviating too much from reality. We apply our technique to the calibration of a geophysical model of Kīlauea Volcano, Hawai‘i, using multiple radar satellite interferograms. We compare the use of models calibrated using multiple data sets simultaneously with results obtained using stacks (averages). We derive distributions for the maximum likelihood estimator and Bayesian inference, both implemented in the "RobustCalibration" package available on CRAN. Analysis of both simulated and real data confirms that our approach can identify the measurement bias and model discrepancy using multiple sources of data, and provide better estimates of model parameters.


## 1 Introduction

The advancement of science has given rise to the description of various phenomena in physics, chemistry, biology and engineering as sets of generally agreed upon systems of equations or mathematical models. One typically wishes to estimate the unobserved parameters in a mathematical model using experimental observations or field data – a process generally known as model calibration (or data inversion). Estimated model parameters are usually of direct scientific interest, and properly calibrated models are also necessary if one wishes to predict reality at unobserved points.

As the mathematical model is almost always an imperfect representation of reality, much recent work on model calibration has focused on inclusion of a (usually statistical) discrepancy function. In [28], the discrepancy function is modeled as a Gaussian stochastic process (GaSP), and in combination with the calibrated mathematical model often leads to more precise predictions of reality. However, various studies have found that calibration parameters and the discrepancy function cannot be uniquely estimated. This “identifiability issue” degrades model parameter estimates and the ability of the calibrated mathematical model to predict reality [50, 53].

Model calibration is also complicated by spatially correlated patterns of measurement error, caused by the device or field conditions, which we term measurement bias. It is important to clarify the difference between the model discrepancy and measurement bias. The model discrepancy explains the difference between the mathematical model and physical reality and is shared across all data sources, while the measurement bias may differ between data sources. Separating the measurement bias from the model discrepancy is important; for instance, it may give insight into methods for improving the precision of the measurements. In this work we explain these concepts in more detail using both mathematics and real examples.

Even though measurement biases are reported in many previous studies [55, 26, 14], the simultaneous analysis of both measurement bias and model discrepancy has rarely been considered. Both effects were considered in [40], but the model discrepancy in that work is defined as a linear trend of the predictors, which might not be suitable in other studies. In this work, we introduce a statistical method utilizing multiple sources of data to address the uncertainty in the calibration parameters, model discrepancy, measurement bias, and noise. The GaSP model of the discrepancy function has been found to be confounded with parameters of the mathematical model when using one source of data without measurement bias [6, 50, 53]. The scaled Gaussian stochastic process (S-GaSP) was introduced in [20, 22] to better address the identifiability problem between the mathematical model and reality. We study both methods when multiple sources of data are available, and argue that the S-GaSP offers advantages for modeling the discrepancy function while separating measurement bias from model discrepancy, a setting that has not been studied before.

We apply our technique to the scientific problem of using satellite interferograms to calibrate a geophysical model of Kīlauea Volcano. However, we emphasize that the methods developed in this work are general for a wide range of problems in model calibration and data fusion.

### 1.1 InSAR and volcano deformation

The Earth’s surface is deformed on a wide range of spatial and temporal scales by processes such as earthquakes, volcanic eruptions, and withdrawal of subsurface fluids such as hydrocarbons and groundwater. Measurements of this deformation are important for hazard monitoring and for constraining source processes. Over the last 25 years, satellite-based radar measurements have made it possible to map deformation over broad swathes of the Earth’s surface to sub-centimeter accuracy from space, revolutionizing scientists’ ability to record and model deformation of the Earth’s surface due to processes such as earthquakes and volcanic eruptions [35, 34].

Interferometric synthetic aperture radar (InSAR) images (interferograms) are most often obtained using data from orbiting microwave-band radar satellites. By ‘interfering’ two radar images of the surface taken from a satellite at approximately the same location but at different times, changes in radar phase are obtained which record temporal changes in the position of the Earth’s surface along the oblique line-of-sight (LoS) vector between satellite and ground. Because only fractional phase change can be measured directly (the number of complete phase cycles between the satellite and ground is not known), these images are wrapped by the radar’s wavelength. “Unwrapping” an image by spatial integration of the phase gradient – relative to a point believed to be non-deforming – yields relative LoS deformation, in units of distance change [11]. It is these unwrapped images which are most commonly modeled.
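The wrapping operation simply maps phase into a single 2π cycle. A minimal sketch of this idea follows; the wavelength value and the two-way phase factor are illustrative assumptions, not values from this paper:

```python
import numpy as np

def wrap(phase):
    """Wrapping operator W: drop full phase cycles, mapping phase to (-pi, pi]."""
    return np.angle(np.exp(1j * phase))

# Illustrative values: a C-band wavelength of 5.6 cm and the two-way travel
# path give phase = 4*pi*d/wavelength for a LoS displacement d (assumed setup).
wavelength = 0.056   # meters (assumed)
d_los = 0.10         # meters of LoS displacement
phase = 4 * np.pi * d_los / wavelength
wrapped = wrap(phase)  # the directly measurable, "wrapped" phase
```

Unwrapping inverts this map only up to an integer number of cycles, which is why unwrapped interferograms record relative, not absolute, deformation.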

InSAR observations have been reported for hundreds of Earth’s volcanoes [9]. Because volcano deformation is typically ascribed to the motion of magma or magmatic fluids in the subsurface, these data have revolutionized scientists’ ability to observe and model magmatic processes at the world’s volcanoes [52, 1, 10, 32, 41] . As latency times have decreased, InSAR observations have also become increasingly useful for hazard monitoring and response during ongoing volcanic crises.

Despite these advances, the interpretation of InSAR data is often greatly complicated by atmospheric noise which contaminates many images, as well as assumptions and simplifications in the geophysical source models used in InSAR data interpretation.

### 1.2 Noise and measurement bias in InSAR data

Noise in InSAR data – introduced by the radar system, by the atmosphere, by reflectors (scatterers) on the ground, and in processing [10] – obscures our ability to resolve surface deformation and can bias source estimates. After removing phase due to the elevation of the Earth’s surface, the wrapped phase for each pixel on the ground may be described by (e.g., [26])

 ϕ=W{ϕd+ϕa+ϕo+ϕl+ϕn}, (1)

where W{⋅} is a “wrapping” operator that drops full phase cycles, ϕd is due to LoS ground displacement, ϕa is propagation delay due to the atmosphere (tropospheric water vapor and ionospheric electron density [16]), ϕo is residual phase due to satellite orbital errors, ϕl is residual phase due to look angle error, and ϕn is a noise term accounting for remaining noise (scattering variability, thermal noise, etc.) [26]. Note that the unwrapping procedure removes the W{⋅} operator to convert phase to actual ground displacements, which is what is typically modeled.

Of the sources of uncertainty in equation (1), the spatially-correlated atmospheric term ϕa is the most important; spatial and temporal changes of just 20% in relative humidity lead to errors of 10 cm in estimated ground deformation [55]. These errors exhibit strong spatial correlations. Figure 1b shows wrapped InSAR phase at Kīlauea Volcano (section 1.3). A “bullseye” pattern near the center of the image is due to real ground deformation ϕd, while most of the remaining fringes are due to ϕa. Atmospheric signal may be directly estimated and removed with weather models or with data from continuous GPS or optical satellites (e.g., [13]), or, if enough data are available, reduced through sophisticated time series analyses [25]; but these approaches cannot always be applied and only imperfectly remove atmospheric noise, and remaining errors often reduce effective accuracy from several millimeters to several centimeters [16]. In this work, measurement bias – mainly caused by atmospheric errors – is incorporated into a coherent statistical model of the InSAR data.

Covariance matrices for InSAR data are typically estimated by empirical autocovariance using a selection of points in regions believed to be non-deforming, under the stationary and isotropic assumptions [31]. However, this approach can introduce bias, for instance if noise in the non-deforming region is different than that in the deforming region. In this work, the measurement bias in each InSAR interferogram is modeled by a Gaussian stochastic process with an anisotropic Matérn covariance function, together with the geophysical model and discrepancy function, in a coherent statistical model. The use of anisotropic Matérn covariance was previously suggested in [29] to improve the precision in estimating InSAR data covariance (and model parameter estimates) and as being consistent with tropospheric delay models, but this approach has not been commonly employed.

Stacking (averaging) multiple interferograms is often used to reduce noise and bias which varies between images, and also reduces the number of observations which must be modeled [55, 47]. In this work, we show that under certain assumptions, some usual estimators, such as the maximum likelihood estimator of the calibration parameters inferred from a stacked image, are equivalent to those estimated from simultaneous consideration of all images. This does not hold true, however, if different measurement biases appear in the images, as may occur due to correlation of atmospheric artifacts with Earth topography. In this work, the approach of separating atmospheric artifacts and model discrepancy using individual satellite images is compared to the results obtained by image stacking. When the number of images is large, we establish the connection between the limiting distributions of the stacked-image models with and without measurement bias.

### 1.3 Modeling volcanic unrest with InSAR data: Kīlauea Volcano

Kīlauea Volcano, on the Island of Hawai‘i, is one of the world’s most active volcanoes and erupted almost continuously from 1983 to 2018. InSAR data have been widely used at Kīlauea to estimate the locations and volume changes of reservoirs and magma intrusions [37, 7, 33, 42], to resolve flank stability [12], and – together with other data sets – to resolve magma supply rate and the concentration of volatiles in Kīlauea’s primary (source) magma [3, 4].

Figure 1 shows InSAR ground displacement at Kīlauea from October, 2011 to May, 2012, when the summit of Kīlauea inflated due to magma storage [3]. These data were recorded from a satellite orbiting roughly north to south (“descending mode”), and recording LoS deformation (‘looking’) along a vector oblique to the Earth’s surface – roughly east to west and downwards at an angle of 41 degrees. This image thus resolves a combination of predominantly vertical and east-west ground deformation. To best constrain 3D ground deformation patterns it is therefore desirable to utilize data from satellites with different look vectors. Finally, the image was subsampled using the quadtree algorithm (Figure 1d) which is widely used to reduce the number of pixels in the InSAR interferogram [27].
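A variance-threshold quadtree of the kind referenced above can be sketched as follows; the splitting rule, threshold, and minimum cell size here are illustrative assumptions rather than the exact algorithm of [27]:

```python
import numpy as np

def quadtree(img, thresh, min_size=2):
    """Recursively split an image into quadrants until the variance within each
    cell drops below `thresh`; return cells as (row, col, height, width, mean)."""
    cells = []

    def split(r, c, h, w):
        block = img[r:r + h, c:c + w]
        if h <= min_size or w <= min_size or block.var() <= thresh:
            cells.append((r, c, h, w, float(block.mean())))
            return
        h2, w2 = h // 2, w // 2
        split(r, c, h2, w2)                    # top-left quadrant
        split(r, c + w2, h2, w - w2)           # top-right
        split(r + h2, c, h - h2, w2)           # bottom-left
        split(r + h2, c + w2, h - h2, w - w2)  # bottom-right

    split(0, 0, img.shape[0], img.shape[1])
    return cells

# Flat regions collapse to a few large cells; the "deforming" patch is refined.
img = np.zeros((16, 16))
img[6:10, 6:10] = 1.0
cells = quadtree(img, thresh=0.05)
```

The result is a small set of cell means in place of the full pixel grid, which is what makes the calibration computationally feasible.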

InSAR data from volcanoes are most commonly interpreted using analytical elastic halfspace models which are derived from the principles of continuum mechanics [e.g. 36, 54]. These models relate volume and pressure changes in buried cavities of various shapes to surface displacements. Given observed ground deformation data, our goal is to resolve the location (and possibly shape) of the magma reservoir, and the change in volume of the reservoir cavity during the time period of our observations.

We relate ground displacements to properties of the magma reservoir using a geophysical model of an inflating magma reservoir in an elastic crust. We denote the geophysical model by fM(x,θ), where the inputs x are observable and the parameters θ are unobservable. We choose a simple model [36] for which a closed-form analytical solution is available. This model is widely used and has been shown capable of broadly reproducing, to first order, the ground deformation observed at many volcanoes. For [36] the model parameter vector is given by θ=(xs,ys,d,ΔV,ν), where xs, ys, and d are the 3D coordinates of the reservoir centroid within the earth, ΔV is the rate of reservoir volume change, and ν is a material property (Poisson’s ratio) which over a plausible a priori range does not strongly affect the outcome of the geophysical model and is typically considered a nuisance parameter. 3D ground displacements predicted by the model are converted to 1D LoS displacements by taking the dot product of the 3D deformation vector at each point on the ground with the unit vector describing the LoS. Because we assume that the look vectors are perfectly known, they do not appear in θ. Note that we utilize observations from more than one look angle, and this process must be repeated for each interferogram.
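As a concrete illustration, the [36]-type point-source solution and the LoS projection described above can be sketched as follows; the coordinate conventions, parameter names, and the example look vector are our own assumptions:

```python
import numpy as np

def mogi_los(x, y, theta, look_vector, nu=0.25):
    """LoS displacement of a Mogi point source in an elastic halfspace.
    theta = (xs, ys, d, dV): horizontal source coordinates, depth, and volume
    change; the 3D surface displacement is (1-nu)*dV/pi * (dx, dy, d)/R^3."""
    xs, ys, d, dV = theta
    dx, dy = x - xs, y - ys
    R3 = (dx**2 + dy**2 + d**2) ** 1.5
    coeff = (1.0 - nu) * dV / np.pi
    u = np.stack([coeff * dx / R3, coeff * dy / R3, coeff * d / R3], axis=-1)
    return u @ np.asarray(look_vector)   # dot product with the LoS unit vector

# Hypothetical descending-mode look vector (east, north, up components):
look = np.array([0.62, -0.11, 0.78])
look /= np.linalg.norm(look)
pred = mogi_los(500.0, -300.0, (0.0, 0.0, 2000.0, 1.0e6), look)  # meters
```

Repeating the projection with a different look vector yields the model prediction for another interferogram, which is how multiple look angles enter the calibration.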

We highlight several primary challenges in calibrating the mathematical model. First, the class of geophysical models considered herein assumes a very simple earth structure and a spherical source geometry, which might not be able to explain all the variability in the reality. Secondly, the interferograms may be affected by atmospheric conditions and other sources of error (e.g. the northwest region in Figure 1c is atmospheric noise). Interferograms also yield a relative measure of ground deformation, from which absolute deformation must be inferred relative to a point assumed to be non-deforming. Since this point is not known precisely, an uncertain mean value shift is contained in each interferogram. And, finally, the number of pixels in InSAR images can be very large and can become computationally prohibitive in a calibration (inverse) problem.

These issues are addressed as follows in this paper. In Section 2, we introduce a new statistical model that includes both the discrepancy function and measurement bias using multiple sources of field data. The marginal likelihood and the predictive distribution of the new model are derived for the estimation of the calibration parameters, reality, model discrepancy and measurement bias. Section 3 discusses the statistical model of the discrepancy function and measurement bias. Parameter estimation via full Bayesian inference is discussed in Section 4. Numerical evidence from both simulated and real data comparing several models is given in Section 5, and a short conclusion is given in Section 6. Proofs of lemmas and theorems are provided in the Appendix.

## 2 Model calibration by multiple sources of data

In this section, we introduce a statistical framework that addresses the uncertainty in imperfect mathematical models and observations from multiple sources in Section 2.1. The connection and difference between modeling the full data and aggregated data is discussed in Section 2.2.

### 2.1 A framework of modeling multiple imperfect measurements

Denote by yFl(x) the real-valued field data at the observable input x from source l, for l=1,…,k. The mathematical model is defined as fM(x,θ), where θ denotes the unobservable calibration parameters. As the mathematical model and the field data may both be imperfect, we consider the model

 yFl(x)=fM(x,θ)+δ(x)+δl(x)+μl+ϵl, (2)

where μl is a mean parameter of the lth source and ϵl is an independent Gaussian noise with variance σ20l, for each x and for source l=1,…,k. In model (2), δ(⋅) and δl(⋅) denote the model discrepancy function and the lth measurement bias function, respectively. For any inputs {x1,…,xn}, we assume that the marginal distributions of δ=(δ(x1),…,δ(xn))T and δl=(δl(x1),…,δl(xn))T both independently follow zero-mean multivariate normal distributions

 δ ∼MN(0,τ2R), (3) δl ∼MN(0,σ2lRl), (4)

where R and Rl are the correlation matrices, with τ2 and σ2l being the variance parameters for δ and δl, respectively. The (i,j)th terms of R and Rl are often parameterized by the kernel functions K(xi,xj) and Kl(xi,xj), for i,j=1,…,n and l=1,…,k. We postpone the discussion of kernel functions to Section 3.

In model (2), the physical reality, denoted as yR(x) at any coordinate x, can be expressed as the summation of the mathematical model and the discrepancy function, i.e. yR(x)=fM(x,θ)+δ(x), which follows the construction introduced in [28]. The innovation of model (2) is to explicitly model the measurement bias contained in the observations.

For our scientific goal, long-wavelength “ramp” artifacts may appear in InSAR data due to errors in satellite orbits. Linear or quadratic ramp parameters may be estimated together with geophysical model parameters if the ramps cannot be corrected independently (for instance using data from GPS sensors) [47]. Here, for simplicity, and because the geographic area of interest is relatively small, we assume that linear and higher-order ramp terms are negligible. Because InSAR measures ground displacements relative to a subjective point assumed to be non-deforming, however, we must account for possible error in the mean value of each image. Hence, we assume an unknown mean parameter μl for the lth interferogram, l=1,…,k, in model (2). In general, the intercept and trend in the observations can be modeled as a mean function in model (2), if they are not captured in the mathematical model.

The assumption in (2) that measurement bias terms are independent may be violated in practice. To reduce dependency between the sources (InSAR images), we only use interferograms which do not share any acquisition times. When the measurement bias is not independent, one may model the data from different sources in a latent factor model, where the factor loading matrix is often estimated by the correlation between each source of the data. However, estimating the correlation between each source of the data introduces extra difficulties in calibration, and thus we do not pursue this direction in this work. A possible approach to model the dependency across sources is through the linear model of coregionalization (LMC) [15]. Principal component analysis was used to estimate the loading matrix of the LMC in calibration [23], and a new way of estimating the loading matrix was recently introduced in [19].

The observations from the lth source of data are denoted as yFl=(yFl(x1),…,yFl(xn))T at inputs {x1,…,xn}, for l=1,…,k. For the InSAR interferograms, each entry of yFl represents a line-of-sight displacement at a point on the Earth’s surface. The following lemma gives the marginal distribution of model (2), after integrating out the random measurement bias functions.

###### Lemma 1.

After integrating out δl in model (2), for l=1,…,k, one has the following distributions:

• For each source , the marginal distribution of the field data follows a multivariate normal distribution

 (yFl∣δ,θ,σ2l,Rl,μl,σ20l)∼MN(fMθ+μl1n+δ,σ2lRl+σ20lIn), (5)

where fMθ=(fM(x1,θ),…,fM(xn,θ))T and 1n is an n-dimensional vector of ones, with In being the n×n identity matrix.

• The marginal posterior distribution of the discrepancy function follows a multivariate normal distribution

 (δ∣{yFl,σ2l,Rl,σ20l,μl}kl=1,θ,τ2,R)∼MN(^δ,^Σ), (6)

where

 ^Σ=(k∑l=1~Σ−1l+(τ2R)−1)−1and^δ=^Σk∑l=1~Σ−1l~yl,

with

 ~yl=yl−fMθ−μl1nand~Σl =σ2lRl+σ20lIn.
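The posterior mean and covariance of the shared discrepancy in Lemma 1 can be computed directly; a minimal sketch (the function and variable names are ours):

```python
import numpy as np

def discrepancy_posterior(y_list, f_model, mu_list, R_list, sig2_list,
                          sig2_0_list, R, tau2):
    """Posterior mean and covariance of the shared discrepancy delta (Lemma 1):
    Sigma_hat = (sum_l Sigma_l^{-1} + (tau^2 R)^{-1})^{-1} and
    delta_hat = Sigma_hat * sum_l Sigma_l^{-1} (y_l - f - mu_l)."""
    n = len(f_model)
    prec = np.linalg.inv(tau2 * R)               # (tau^2 R)^{-1}
    weighted = np.zeros(n)
    for y, mu, Rl, s2, s20 in zip(y_list, mu_list, R_list, sig2_list, sig2_0_list):
        Sig_inv = np.linalg.inv(s2 * Rl + s20 * np.eye(n))  # tilde Sigma_l^{-1}
        prec += Sig_inv
        weighted += Sig_inv @ (y - f_model - mu)            # tilde y_l terms
    Sigma_hat = np.linalg.inv(prec)
    return Sigma_hat @ weighted, Sigma_hat
```

As a closed-form check: with k=1, R=R1=In, τ2=2 and σ21=σ201=0.5, the posterior mean reduces to ỹ1/1.5 and the posterior covariance to In/1.5.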

Further marginalizing out δ, the marginal distribution of the field data is given in the following lemma.

###### Lemma 2.

After integrating out both δ and δl, for l=1,…,k, the marginal distribution of the stacked field data YFv=(yF1T,…,yFkT)T follows a multivariate normal distribution

 (YFv∣{σ2l,Rl,σ20l,μl}kl=1,θ,τ2,R)∼MN(1k⊗fMθ+μ⊗1Tn,τ21k1Tk⊗R+Λ), (7)

where ⊗ denotes the Kronecker product, μ=(μ1,…,μk)T, and Λ is a block diagonal matrix, with the lth diagonal block being ~Σl defined in Lemma 1, for l=1,…,k. The density of (7) can be expressed as

 p(YFv∣{σ2l,Rl,σ20l,μl}kl=1,θ,τ2,R) =(2π)−nk2τ−n|R|−12|^Σ|12k∏l=1|~Σl|−12exp{−12(k∑l=1~yTl~Σ−1l~yl−(k∑l=1~Σ−1l~yl)T^Σ(k∑l=1~Σ−1l~yl))},

where ~yl, ~Σl, and ^Σ are defined in Lemma 1, for l=1,…,k.

Both Lemma 1 and Lemma 2 can be used for computing the likelihood given the other parameters in the full Bayesian analysis. Lemma 2 may be used to develop the maximum likelihood estimator, as both the random model discrepancy and measurement bias functions are marginalized out explicitly. Note that the computational complexity of evaluating the marginal density of the field data is O(kn^3) in both lemmas, as one inverts (k+1) covariance matrices, each of size n×n, rather than an nk×nk matrix, even though the covariance matrix in (7) in Lemma 2 is nk×nk. Such simplification is the key to computing the likelihood without approximation when n is not very large. Note that these computational simplifications rely on aligned measurements across the sources of field data. When the measurements are misaligned, approximations might be needed when the number of sources is large.

Since the discrepancy function between the mathematical model and reality is often scientifically important, one can draw and record δ using Lemma 1 in the posterior sampling. The following theorem gives the predictive distribution at any input x∗, given the parameters and posterior samples of δ.

###### Theorem 1.

For any x∗, one has the following predictive distributions for model (2):

• The predictive distribution of the model discrepancy at any input x∗ follows a normal distribution

 (δ(x∗)∣δ,{yFl,σ2l,Kl(⋅,⋅),σ20l,μl}kl=1,θ,τ2,K(⋅,⋅))∼N(^δ(x∗),τ2^K∗∗),

where ^δ(x∗)=rT(x∗)R−1δ and ^K∗∗=K(x∗,x∗)−rT(x∗)R−1r(x∗), with r(x∗)=(K(x∗,x1),…,K(x∗,xn))T.

• For each source l=1,…,k, the predictive distribution of the measurement bias at any input x∗ follows a normal distribution

 (δl(x∗)∣δ,{yFl,σ2l,Kl(⋅,⋅),σ20l,μl}kl=1,θ,τ2,K(⋅,⋅))∼N(^δl(x∗),σ2l^K∗∗l),

where ^δl(x∗)=σ2lrTl(x∗)~Σ−1l(yFl−fMθ−μl1n−δ) and ^K∗∗l=Kl(x∗,x∗)−σ2lrTl(x∗)~Σ−1lrl(x∗), with rl(x∗)=(Kl(x∗,x1),…,Kl(x∗,xn))T, and fMθ and ~Σl being defined in Lemma 1, for l=1,…,k.

• For each source l=1,…,k, the predictive distribution of the field data at any input x∗ follows a normal distribution

 (yFl(x∗)∣δ,{yFl,σ2l,Kl(⋅,⋅),σ20l,μl}kl=1,θ,τ2,K(⋅,⋅))∼N(^yFl(x∗),τ2^K∗∗+σ2l^K∗∗l+σ20l),

where ^yFl(x∗)=fM(x∗,θ)+μl+^δ(x∗)+^δl(x∗).

Lemma 1 and Theorem 1 above will be used in the full Bayesian analysis discussed in Section 4. Model (2) provides a general statistical framework that addresses the uncertainty in both model discrepancy and measurement bias using multiple sources of field data. Although the computational cost scales linearly with the number of data sources, it may still become prohibitive with many sources, and in some applications only aggregated data may be available. In geoscience, a stacked image of the satellite interferograms is frequently used to reduce the computational operations. The difference between modeling the full data and aggregated data, such as the stacked image, is discussed in Section 2.2.

### 2.2 Model calibration by aggregated data

Consider a special case of model (2), where each interferogram has no measurement bias, and the mean parameter μ and measurement noise variance σ20 are shared between all sources of the field data:

 yFl(x)=fM(x,θ)+δ(x)+μ+ϵl, (8)

where ϵl∼N(0,σ20) is an independent Gaussian noise, for each x and for l=1,…,k. Assume δ(⋅) is the random discrepancy function such that the marginal distribution of δ=(δ(x1),…,δ(xn))T follows equation (3). Model (8) was previously used in calibration using repeated experimental data [8].

Denote by ¯yF(x)=1k∑kl=1yFl(x) the average value of the field data at the input x. When (8) is assumed, it implies the model of the aggregated data

 ¯yF(x)=fM(x,θ)+δ(x)+μ+¯ϵ, (9)

where ¯ϵ∼N(0,σ20/k) independently for each x.

Denote by yFl=(yFl(x1),…,yFl(xn))T the observations in the lth source, for l=1,…,k, and denote by ¯yF=1k∑kl=1yFl the aggregated data. Although ¯yF is often used for analyzing geophysical models, the relation between modeling the stacked image in (9) versus modeling the full data in (8) is not generally known. The lemma below connects these approaches.

###### Lemma 3.

Denote by ℓ(θ,δ,μ,σ20) and ¯ℓ(θ,δ,μ,σ20) the natural logarithms of the likelihood in model (8) and model (9), respectively. One has

 ℓ(θ,δ,μ,σ20)=c(σ20)+¯ℓ(θ,δ,μ,σ20), (10)

where

 ¯ℓ(θ,δ,μ,σ20)=−n2log(2πσ20k)−kS2(θ,δ,μ)2σ20,

with S2(θ,δ,μ)=∑ni=1(¯yF(xi)−fM(xi,θ)−δ(xi)−μ)2 and c(σ20) being a function of σ20 and the field data that does not depend on (θ,δ,μ).

Lemma 3 is a direct consequence of the densities of the models in (8) and in (9). The result in Lemma 3 is important, as it implies that some frequently used estimators, such as the maximum likelihood estimator (MLE) for θ and μ after integrating out δ, are exactly the same using the model of the aggregated data in (9) or the model of the full data in (8). This validates the use of the stacked image in calibration for estimating the calibration parameters and the mean parameter, if model (8) is appropriate.
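Lemma 3's equivalence is easy to check numerically in a toy version of model (8) with a fixed discrepancy: the full-data and stacked-image log likelihoods differ by a constant that does not involve θ or μ, so they share the same maximizer. A sketch (all values are our own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, sig2 = 50, 6, 0.2
x = np.linspace(0, 1, n)
delta = np.sin(2 * np.pi * x)               # a fixed discrepancy, for illustration
y = 1.5 + delta + rng.normal(0.0, np.sqrt(sig2), (k, n))   # true theta = 1.5
ybar = y.mean(axis=0)                       # the "stacked image"

def loglik_full(theta):
    m = theta + delta
    return -0.5 * np.sum((y - m)**2) / sig2 - 0.5 * n * k * np.log(2 * np.pi * sig2)

def loglik_stack(theta):
    m = theta + delta
    return (-0.5 * k * np.sum((ybar - m)**2) / sig2
            - 0.5 * n * np.log(2 * np.pi * sig2 / k))

# The difference plays the role of c(sig2_0) in (10): identical at any theta.
d1 = loglik_full(0.7) - loglik_stack(0.7)
d2 = loglik_full(2.3) - loglik_stack(2.3)
```

Since the difference is constant in θ, maximizing either log likelihood over θ gives the same estimator.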

When σ20 is known, the aggregated data ¯yF is a sufficient statistic for all the parameters and the discrepancy function in model (8). However, when σ20 is unknown, the sufficient statistics of model (8) are (¯yF,s2), where s2 is the pooled sample variance of the full data [8]. Note that the usual unbiased estimator of σ20 based on the full data is the pooled sample variance computed from all nk observations, whereas an estimator of σ20 based only on the aggregated data must be constructed from the n averaged values, whose noise variance is σ20/k; the variance of the latter estimator is much larger than that of the pooled sample variance when k is large. Thus, using the aggregated data in (13) is typically less efficient in estimating σ20 than the model of the full data in (8).
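The efficiency loss for estimating σ20 from the stacked data can be illustrated with a small Monte Carlo experiment (the sample sizes and parameter values below are our own choices): both estimators are unbiased, but the variance of the stacked-data estimator is about k times larger.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, sig2, reps = 40, 5, 1.0, 4000
est_full, est_agg = [], []
for _ in range(reps):
    eps = rng.normal(0.0, np.sqrt(sig2), (k, n))       # noise around a known mean
    est_full.append(np.mean(eps**2))                   # pooled estimator, nk terms
    est_agg.append(k * np.mean(eps.mean(axis=0)**2))   # from the stacked image
ratio = np.var(est_agg) / np.var(est_full)             # theoretical ratio: k
```

With k = 5 the empirical variance ratio concentrates near 5, matching the 2σ4/(nk) versus 2σ4/n comparison for these chi-squared-type estimators.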

When the field data contain measurement bias and different mean parameters modeled as in (2), the sampling model of the aggregated data is

 ¯yF(x)=fM(x,θ)+δ(x)+1kk∑l=1μl+1kk∑l=1δl(x)+1kk∑l=1ϵl. (11)

The lemma below gives the marginal distributions of the aggregated data for both model (2) and model (8) of the full data.

###### Lemma 4.
• Assuming the model (2) of the full data, after integrating out δ and δl, for l=1,…,k, the marginal distribution of ¯yF follows

 (¯yF∣{σ2l,Rl,σ20l,μl}kl=1,θ,τ2,R)∼MN(fMθ+1kk∑l=1μl1n,1k2k∑l=1~Σl+τ2R). (12)

where ~Σl is defined in Lemma 1.

• Assuming the model (8) of the full data, the marginal distribution of ¯yF follows

 (¯yF∣σ20,μ,θ,τ2,R)∼MN(fMθ+μ1n,σ20kIn+τ2R). (13)

Lemma 4 follows directly from the laws of total expectation and total variance, so the proof is omitted here. The theorem below connects the limiting distributions of the stacked image in (12) and in (13), and is a direct consequence of Lemma 4.

###### Theorem 2.

Assume μl=μ for l=1,…,k, and that σ2l and σ20l are bounded for all l. When k→∞, the limiting distribution of ¯yF in both (12) and (13) follows

 (¯yF∣μ,θ,τ2,R)∼MN(fMθ+μ1n,τ2R). (14)

## 3 Modeling the measurement bias and model discrepancy

In this section, we discuss the statistical models of the discrepancy function and measurement bias functions. We first introduce the Gaussian stochastic process in Section 3.1. We then present an example demonstrating the identifiability problem of the estimated calibration parameter in the GaSP model. A new stochastic process, called the discretized scaled Gaussian stochastic process, is introduced in Section 3.2 for the model discrepancy and for reducing the identifiability problem.

### 3.1 Gaussian stochastic process

Let us first consider modeling the measurement bias via a Gaussian stochastic process (GaSP)

 δl(⋅)∼GaSP(0,σ2lKl(⋅,⋅)), (15)

where Kl(⋅,⋅) is the kernel function for δl(⋅), meaning that for any {x1,…,xn}, the marginal distribution of δl=(δl(x1),…,δl(xn))T follows the multivariate normal distribution in (4), for l=1,…,k.

For any inputs and , the covariance is typically assumed to have a product form in calibration

 Kl(xa,xb)=p∏t=1Kl,t(xat,xbt), (16)

where each Kl,t(⋅,⋅) is a one-dimensional kernel function for the correlation between the tth coordinates of any two inputs for the source l, for t=1,…,p and l=1,…,k. Denote dt=|xat−xbt|. One popular choice of the correlation function is the power exponential correlation, with a roughness parameter typically held fixed and an unknown range parameter γt for the tth coordinate of the input, t=1,…,p. Another popular choice is the Matérn correlation. The Matérn correlation with a half-integer roughness parameter has a closed-form expression. For example, the Matérn correlation function with roughness parameter 5/2 is as follows:

 Kl,t(dt)=(1+√5dtγt+5d2t3γ2t)exp(−√5dtγt). (17)

A desirable feature of this Matérn correlation is that the sample path of the process is twice differentiable. More advantages of using (17) are discussed in [21]. In relation to the present scientific goal, we also note that previous works have argued that Matérn correlation functions are suitable for modeling atmospheric noise in InSAR data [29]. However, we do not limit ourselves to any specific correlation function, and the methods discussed in this work apply to all such functions.
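The product-form Matérn correlation of (16)-(17) can be sketched as follows (function names are ours):

```python
import numpy as np

def matern_5_2(d, gamma):
    """One-dimensional Matern correlation with roughness parameter 5/2, eq. (17)."""
    a = np.sqrt(5.0) * np.abs(d) / gamma
    return (1.0 + a + a**2 / 3.0) * np.exp(-a)

def product_kernel(xa, xb, gammas):
    """Product correlation (16): one Matern factor per input coordinate."""
    return float(np.prod([matern_5_2(xa[t] - xb[t], gammas[t])
                          for t in range(len(gammas))]))
```

Allowing a separate range parameter γt per coordinate makes the product kernel anisotropic, which is the property exploited for the InSAR measurement bias.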

In [28], the discrepancy function is also modeled as a GaSP:

 δ(⋅)∼GaSP(0,τ2K(⋅,⋅)), (18)

where K(⋅,⋅) is a kernel function modeling the correlation between two inputs. The identifiability issue, however, has been widely observed in modeling spatially correlated data, where the spatial random effect is confounded with a linear fixed effect, i.e. fM(x,θ) being a linear model of x [44, 24]. These pioneering studies show that the estimated parameters in the fixed effect can change dramatically, depending on whether or not a term modeling the spatial correlation is included.

In a calibration task, the identifiability of the calibration parameters was also found to be a problem when the discrepancy function is modeled by a GaSP [6, 50, 53]. What has been less noticed is that the usual estimators of the calibration parameters, such as the maximum likelihood estimator, are typically not even consistent when the discrepancy function is modeled by a GaSP. Here we provide a closed-form example to illustrate the identifiability problem.

###### Example 1.

Assume the field data of the lth source at input xi follow

 yFl(xi)=fM(xi,θ)+δ(xi)+ϵl,

where fM(xi,θ)=θ, the discrepancy δ(⋅)∼GaSP(0,τ2K(⋅,⋅)) with the exponential kernel K(xa,xb)=exp(−|xa−xb|/γ), and ϵl is an independent Gaussian noise with variance σ20, for each i=1,…,n and l=1,…,k. Assume the observations are equally spaced on [0,1], i.e. xi=i/n, for i=1,…,n. Further assume τ2, γ and σ20 are known and finite. The task is to estimate θ.

The following lemma shows that the maximum likelihood estimator (MLE) is inconsistent when both n and k go to infinity in Example 1.

###### Lemma 5.

Assume τ² and γ are both finite. When n → ∞ and k → ∞, after marginalizing out the discrepancy δ(·), the maximum likelihood estimator of θ in Example 1 has the limiting distribution:

 θ̂_MLE ∼ N(θ, 2τ²γ/(2γ+1)). (19)

Lemma 5 means that even if one has infinitely many repeated experiments and infinitely many observations in a fixed input domain in each experiment, the MLE of the calibration parameter, defined as the mean parameter θ in this example, remains inconsistent if the discrepancy function is assumed to be a GaSP with the exponential kernel. This result is surprising but reasonable, because the discrepancy function is shared across all experiments: no matter how many experiments we have, the shared discrepancy function contained in each experiment does not change. Since the MLE of the calibration parameters in model (8) based on the individual sources of data and on the aggregated data is the same, as implied by Lemma 3, one only needs to consider the limiting case of a single source of data with noise-free observations. We therefore present the following remark, which is a direct consequence of the proof of Lemma 5.

###### Remark 1.

Suppose one has one source of noise-free experimental observations, i.e., y^F(x_i) = f^M(x_i, θ) + δ(x_i), for i = 1, …, n, where δ(·) follows a GaSP with covariance function τ² exp(−|x_a − x_b|/γ) for any x_a, x_b ∈ [0, 1]. When n → ∞, the limiting distribution of the maximum likelihood estimator of θ follows (19).

The MSE of the MLE of θ in Example 1 is graphed as the red triangles at different numbers of observations in Figure 2, for two values of the range parameter γ. In the left panel, the MSE quickly converges when the sample size increases. In the right panel, the MSE converges to a smaller value, because the range parameter γ, and hence the correlation, is smaller. Both MSEs converge to the limiting value, 2τ²γ/(2γ+1), found in Lemma 5.
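The non-vanishing limit in (19) can also be checked numerically. In the noise-free setting of Remark 1, the MLE of the mean θ is the generalized least squares estimator, whose variance is τ²/(1ᵀR⁻¹1), with R the correlation matrix of the exponential kernel at the design points; this quantity approaches 2τ²γ/(2γ+1) rather than zero as n grows. A minimal sketch (the function name is ours):

```python
import numpy as np

def mean_mle_variance(n, gamma, tau2=1.0):
    """Variance of the ML (generalized least squares) estimator of the mean
    theta for noise-free observations of theta + delta(x), with delta a
    zero-mean GaSP with kernel tau2 * exp(-|x - x'| / gamma) at x_i = i/n."""
    x = np.arange(1, n + 1) / n
    R = np.exp(-np.abs(x[:, None] - x[None, :]) / gamma)
    ones = np.ones(n)
    return tau2 / (ones @ np.linalg.solve(R, ones))  # tau2 / (1' R^{-1} 1)

gamma, tau2 = 0.5, 1.0
limit = 2 * tau2 * gamma / (2 * gamma + 1)  # limiting variance in (19)
for n in (50, 200, 1000):
    print(n, round(mean_mle_variance(n, gamma, tau2), 4), "limit:", limit)
```

The variance plateaus near the limit instead of shrinking with n, which is exactly the inconsistency described in Lemma 5.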

The results in Lemma 5 and Remark 1 are both instances of the equivalence of Gaussian measures. Although the closed-form expression of the limiting distribution of the MLE in Lemma 5 relies on the exponential kernel function, the MLE of the mean parameter of a GaSP model is also inconsistent for many other kernel functions, such as the Matérn kernel. We refer the reader to Chapter 4.2 in [48] for a detailed discussion of this topic.

Many attempts have been made to identify the discrepancy function and the calibration parameters. In [50], the optimal estimator of the calibration parameters is defined to be the one that minimizes the L₂ loss between the mathematical model and reality:

 L₂(θ) = ∫_𝒳 (y^R(x) − f^M(x, θ))² dx. (20)

Indeed, in mathematical model calibration, one hopes the variability in the field data to be explained by the calibrated mathematical model rather than by the discrepancy function, as the calibration parameters often have practical meanings. It is thus sensible to argue that, if the trend and intercept are properly modeled in the mathematical model, a calibrated mathematical model that fits the reality closely, in terms of the L₂ loss, is more likely to be true than another that is far from the reality. However, the estimated calibration parameters in the GaSP model typically do not converge to the minimizer of the L₂ loss in (20) [50, 22].
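For intuition, the L₂ loss in (20) can be approximated by averaging squared model–reality differences over a dense grid. The sketch below uses a toy reality and model of our own (not from the paper) purely to illustrate the criterion:

```python
import numpy as np

def l2_loss(theta, y_R, f_M, grid):
    """Approximate L2(theta) in (20) by a grid average over the input
    domain (here [0, 1], so the domain volume is 1)."""
    diff = y_R(grid) - f_M(grid, theta)
    return float(np.mean(diff ** 2))

# toy example: reality y^R(x) = 3x and model f^M(x, theta) = theta * x,
# so the L2 loss is minimized at theta = 3
grid = np.linspace(0.0, 1.0, 10001)
for theta in (2.0, 3.0, 4.0):
    print(theta, l2_loss(theta, lambda x: 3.0 * x, lambda x, t: t * x, grid))
```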

When the discrepancy function is modeled by a GaSP, the L₂ loss between the mathematical model and reality is L₂(θ) = ∫_𝒳 δ²(x) dx, a random variable whose measure is induced by the GaSP. The distribution of this random L₂ loss often has too much probability mass at large values when the correlation in the GaSP model is large. For this reason, a new stochastic process for the discrepancy function, called the scaled Gaussian stochastic process, was introduced in [20] and [22]; it assigns more probability mass to the region with smaller L₂ loss than the measure induced by the GaSP does. We introduce a computationally feasible version of this process, called the discretized scaled Gaussian stochastic process, in Section 3.2.

### 3.2 Discretized scaled Gaussian stochastic process

Consider the discretized scaled Gaussian stochastic process (S-GaSP) for modeling the discrepancy function:

 δ_z(x) = {δ(x) ∣ (1/n) ∑_{i=1}^n δ²(x_i) = Z}, δ ∼ GaSP(0, τ² K(·,·)), Z ∼ p_{δ_z}(·), (21)

where p_{δ_z}(·) is a density function for the averaged squared error between the reality and the mathematical model at the observed inputs. Given Z = z, the S-GaSP is a GaSP constrained to the space {δ : (1/n) ∑_{i=1}^n δ²(x_i) = z}. The idea is to assign more probability mass to smaller squared error through the scaling function f_Z defined later.

Denote by p_δ(Z = z ∣ γ, τ²) the density of Z induced by the GaSP model in (18), where τ² and γ are the variance parameter and range parameters in the covariance function, respectively. We let p_{δ_z} be proportional to p_δ, but scaled by a non-increasing function:

 p_{δ_z}(Z = z ∣ γ, τ², λ_z) = f_Z(Z = z ∣ τ², λ_z) p_δ(Z = z ∣ γ, τ²) / ∫₀^∞ f_Z(Z = t ∣ τ², λ_z) p_δ(Z = t ∣ γ, τ²) dt, (22)

where f_Z(· ∣ τ², λ_z) is a non-increasing scaling function depending on τ² and a scaling parameter λ_z, at all z ∈ [0, ∞). For computational reasons, the default choice of f_Z is the exponential density

 f_Z(Z = z ∣ τ², λ_z) = (λ_z / (2τ² Vol(𝒳))) exp(−λ_z z / (2τ² Vol(𝒳))), (23)

where λ_z is a positive scaling parameter and Vol(𝒳) is the volume of the input domain 𝒳.

It is easy to see that the GaSP is a special case of the S-GaSP when f_Z is a constant function, or equivalently λ_z = 0. On the other hand, starting from a GaSP model with any reasonable covariance function K(·,·), after marginalizing out Z, it was shown in [20] that the marginal distribution of δ_z = (δ_z(x_1), …, δ_z(x_n))ᵀ follows a multivariate normal distribution with a transformed covariance matrix:

 δ_z ∣ Θ ∼ MN(0, τ² R_z), (24)

where

 R_z = (R⁻¹ + (λ/n) I_n)⁻¹, (25)

with λ = λ_z / Vol(𝒳) and the (i, j)th term of R being K(x_i, x_j).
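The transform in (25) is easy to compute given any correlation matrix R; a minimal sketch (the function name is ours, and lam stands for λ = λ_z/Vol(𝒳)):

```python
import numpy as np

def sgasp_cov(R, lam, tau2=1.0):
    """Transformed S-GaSP covariance tau2 * R_z with
    R_z = (R^{-1} + (lam / n) I_n)^{-1}, per (24)-(25).
    lam = 0 recovers the GaSP covariance tau2 * R."""
    n = R.shape[0]
    Rz = np.linalg.inv(np.linalg.inv(R) + (lam / n) * np.eye(n))
    return tau2 * Rz

# exponential-kernel correlation on a grid: a positive lam shrinks the
# covariance, concentrating probability mass on smaller discrepancy values
x = np.linspace(0.0, 1.0, 20)
R = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.3)
print(np.trace(sgasp_cov(R, 0.0)), np.trace(sgasp_cov(R, 5.0)))
```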

Theoretical and computational properties of the S-GaSP, such as the orthogonal sequence representation, convergence rates and predictive distributions, are studied extensively in [22], where the scaling parameter λ_z was shown to control the weight between the penalty on the L₂ norm and on the native norm of the model discrepancy in S-GaSP calibration. A larger λ_z assigns more prior probability to a smaller L₂ norm of the discrepancy function. Under some regularity conditions, a suitable choice of λ_z guarantees that the predictive distribution in the S-GaSP model converges to the reality as fast as in the GaSP model, and that the estimator of the calibration parameters in the S-GaSP model converges to the minimizer of the L₂ loss between the reality and the mathematical model, as the sample size goes to infinity. We therefore follow [22] in choosing λ_z as a function of the sample size.

We present one example to illustrate the difference between the GaSP and S-GaSP calibrations.

###### Example 2.

Assume y^F(x_i) = y^R(x_i) + ε_i, where x_i ∈ [0, 1]², ε_i is an independent Gaussian noise for each i, and the reality is assumed to be the function in [30]:

 y^R(x) = (1/6){(30 + 5x₁ sin(5x₁))(4 + exp(−5x₂)) − 100}.

Assume the mathematical model is f^M(x, θ), where θ = (θ₁, θ₂)ᵀ are two unknown calibration parameters. The field data are observed at design points x₁, …, x_n, drawn from a maximin Latin hypercube design [45]. The goal is to estimate θ and predict the reality at all x ∈ [0, 1]².
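The reality function of Example 2, as reconstructed from the display above, can be coded directly (the function name is ours):

```python
import numpy as np

def y_R(x1, x2):
    """Reality function of Example 2, from the display above."""
    return ((30.0 + 5.0 * x1 * np.sin(5.0 * x1))
            * (4.0 + np.exp(-5.0 * x2)) - 100.0) / 6.0

# evaluate on a grid over [0, 1]^2 to inspect the surface's range
g = np.linspace(0.0, 1.0, 101)
vals = y_R(g[:, None], g[None, :])
print(float(vals.min()), float(vals.max()))
```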

For Example 2, the true values of the calibration parameters are not well-defined because of the presence of a deterministic discrepancy function. We thus compare the GaSP and S-GaSP models of the discrepancy function by their predictive performance based on two criteria. The first criterion is to use only the calibrated mathematical model to predict the reality, and the second one is to use both the calibrated mathematical model and discrepancy function for predictions.

We evaluate the predictions of the reality at n* held-out inputs x*_i, uniformly sampled from [0, 1]² for i = 1, …, n*. Denote MSE_{f^M} the predictive mean squared error using only the calibrated mathematical model and MSE_{f^M+δ} the predictive mean squared error using both the calibrated mathematical model and the discrepancy function for prediction:

 MSE_{f^M} = ∑_{i=1}^{n*} (f^M(x*_i, θ̂) − y^R(x*_i))² / n*  and  MSE_{f^M+δ} = ∑_{i=1}^{n*} (ŷ^R(x*_i) − y^R(x*_i))² / n*,

where ŷ^R(x*_i) is the predictive mean at x*_i using the calibrated mathematical model and the discrepancy function.

The out-of-sample predictions using the GaSP and S-GaSP calibrations are provided in Table 1, where the parameters are estimated by the MLE via the low-storage quasi-Newton optimization method [39] with 10 different initializations. The MSE_{f^M+δ} is similar using both the GaSP and S-GaSP calibration models, while the MSE