Human-induced climate change is projected to significantly affect the Earth’s cryosphere. The West Antarctic ice sheet (WAIS) is particularly susceptible to warming climate because a large portion of its body is marine based, meaning that the bottom of the ice is below the sea-level. Any significant changes in this part of Antarctica can lead to a consequential sea level change (Fretwell et al., 2013). Computer models are used to project the future of WAIS, but the projections from these computer models are highly uncertain due to uncertainty about the values of key model input parameters (Stone et al., 2010; Gladstone et al., 2012; Chang et al., 2016a; Pollard et al., 2016). Computer model calibration provides a statistical framework for using observational data to infer input parameters of complex computer models.
Following the calibration framework described in the seminal paper by Kennedy and O’Hagan (2001), several researchers have developed methods for inferring model parameters for a variety of different types of computer model output. For instance, Bayarri et al. (2007) provides a wavelet-based approach for calibration with functional model output. Sansó and Forest (2009) calibrates a climate model with multivariate output while Higdon et al. (2008) and Chang et al. (2014) provide approaches for calibrating models with high-dimensional spatial data output. More recently, Chang et al. (2016a) develops an approach for high-dimensional binary spatial data output and Sung et al. (2019) proposes a method for binary time series output. Cao et al. (2018) provides a method for censored functional data. Ice sheet thickness data, including the West Antarctic ice sheet data set we consider here, are frequently in the form of high-dimensional semi-continuous spatial data. No existing calibration methods are suited to this type of data; this motivates the new methodological development in this manuscript.
Several computer model calibration approaches have been applied to infer the parameters and to systematically quantify parametric uncertainty in Antarctic ice sheet models (Gladstone et al., 2012; Chang et al., 2016a, b; Pollard et al., 2016; Edwards et al., 2019). One important caveat to existing approaches to ice sheet model calibration is that the model outputs and observational data need to be transformed or aggregated in some degree to avoid issues involving semi-continuous distributions. To be more specific, the main variable of interest in ice model output and observational data is the spatial pattern of ice thicknesses which have positive values at the locations with ice presence and zero values otherwise. Handling such spatially dependent semi-continuous data with truncation at zero poses non-trivial inferential and computational challenges and existing calibration methods cannot readily handle these issues. Chang et al. (2016a) used ice-no ice binary spatial patterns obtained by dichotomizing the thickness patterns into zeros and ones and hence ignored important information regarding the ice thickness. Pollard et al. (2016) also similarly used highly-summarized data to avoid challenges related to semi-continuous data. Although their results show that such approaches still lead to a meaningful reduction in input parameter uncertainty, one can certainly expect that transforming or summarizing data can result in some significant loss of information. This motivates our methodological development of calibration method that can directly utilize semi-continuous spatial data.
. In this framework the semi-continuous data being analyzed are viewed as a realization from an underlying Gaussian process, which can be observed only when the values are positive. This simple ‘clipped’ Gaussian process approach provides a natural way to impose spatial dependence among zero and non-zero values. However, the use of truncated process can create serious computational issues when applied to a high dimensional data set with a large proportion of zeros. This is because inference based on such a model requires integrating out highly-dependent, high-dimensional, and bounded latent variables for locations with 0 values. Matrix computations for high-dimensional spatial random variables are expensive. Furthermore, designing efficient (‘fast mixing’) Markov chain Monte Carlo methods for Bayesian inference for such models becomes very challenging. This is why a clipped Gaussian process(such as one used by Cao et al., 2018) does not provide a feasible solution for our calibration problem.
In this paper we formulate an emulation and calibration framework that uses two separate processes: one process for modeling the presence and absence of ice and the other for modeling the value of ice thickness given that ice is present. This approach removes the need to integrate out the bounded latent variables for the locations with no ice and hence allows us to circumvent the related computational challenges in the clipped Gaussian process approach. Our proposed method uses likelihood-based principal component analysis (Tipping and Bishop, 1999) to reduce the dimension of model output and observational data (cf. Higdon et al., 2008; Chang et al., 2014), and avoids issues with large non-Gaussian spatial data calibration (cf. Chang et al., 2016a). In our simulated example and real data analysis, we show that our method can efficiently utilize information from large semi-continuous spatial data and lead to improved calibration results compared to using only binary spatial patterns. While our focus is on calibrating a computer model for the West Antarctic Ice Sheet, the methodology we develop here is readily applicable, with only minor modifications, to other calibration problems with semi-continuous data.
The rest of this paper is organized as follows. In Section 2, we introduce the details of our PSU-3D model runs and Bedmap2 observational data that have motivated our methodological development. In Section 3, we describe our new framework for emulation and calibration using semi-continuous data and discuss the computational challenges posed by the large size of the spatial data. In Section 4, we propose a reduced-dimension approach that can mitigate the computational challenges, and in Section 5 we describe the result of our analysis on the model runs and observational data using the proposed approach. In Section 6 we summarize our findings and discuss some possible future directions.
2 Model Runs and Observational Data
In this study we use a state-of-the-art model, the PSU-3D ice model (Pollard et al., 2015, 2016), for studying the evolution of WAIS. This model strikes a good balance between model realism and computational efficiency and hence can allow simulations of long term behavior of WAIS (on the scale of thousands of years) with a relatively high resolution of 20 km. Similar to other complex computer model experiments, simulation runs from the PSU-3D ice model are available only at a limited number of input parameter settings due to the high computational cost. Therefore in this study we take an emulation approach in which we first create a collection of model runs at pre-specified design points in the input parameter space (often called a perturbed physics ensemble) and then build a statistical surrogate based on those model runs.
We use a previously published ensemble of simulations (Chang et al., 2016b; Pollard et al., 2016) generated from PSU-3D ice model with 625 model runs. The parameter settings for ensemble members are determined by a factorial design with 5 design points for each input parameter. There are four input parameters varied in the ensemble: sub-ice-shelf oceanic melt factor (OCFAC, non-dimensional), which determines oceanic melting at the bottom of floating ice shelves caused by the changes in the surrounding ocean temperature; calving factor (CALV, non-dimensional), the rate of calving of iceberg at the oceanic edge of floating shelves; basal sliding coefficient (CRH, m year Pa); velocity of sliding movement of grounded ice, determined by the interface between the grounded ice and its bed rock; asthenospheric relaxation e-folding time (TAU, 1000 years), the time scale for ice sheet evolution caused by changing ice load on its bedrock. While these parameters play important roles in determining the long-term evolution of the Antarctic ice sheet, their values are highly uncertain and hence need to be properly calibrated for realistic simulation.
Each ensemble member is spun up from 40,000 years before present to modern times and then projected into future for 5,000 years. We then extract the spatial patterns of modern grounded ice sheet thickness in Amundsen Sea Embayment (ASE) region, which is expected to be one of the major contributor to sea level change in the future. The spatial pattern in our selected region has pixels with 20 km 20 km resolution (Figures 1 b-d). To calibrate the four input parameters introduced above, we compare these model outputs with the observed modern ice sheet thickness pattern in the same area derived from the Bedmap2 dataset (Fretwell et al., 2013) (Figure 1 a). This recent data product combines a wide range of sources including seismic sounding, radar surveys, and satellite altimetry. Since the observational grid has a higher spatial resolution (1 km
1 km resolution), we upscale the observational data to the model grid using a simple linear interpolation. Note that the model outputs and the observational data for ice thickness are all in the form of high-dimensional semi-continuous spatial data which poses non-trivial statistical challenges for our calibration framework.
3 Computer Model Emulation and Calibration Using Semi-continuous Spatial Data
In this section we describe our statistical framework for inferring the input parameters in the PSU-3D ice model. In particular we focus on describing how the standard computer model emulation and calibration framework (Kennedy and O’Hagan, 2001) can be modified to accommodate the ice thickness patterns introduced above, which take the form of semi-continuous data.
We use the following notation hereafter: Let the
-dimensional vector, , denote the spatial pattern of ice thickness at the spatial locations of the model grid which is generated from the computer model given input parameter setting . Here, is the dimension of the input space which in our application is equal to four. The observed data at the same spatial locations are denoted as a -dimensional vector . Here, and can have either positive values representing the ice thickness or zero values denoting absence of ice at location (see Figures 1).
We denote the design points for the input parameters in our ensemble as . As a result the collection of model output in our ensemble can be denoted as an matrix , with elements for and , where the rows correspond to different input parameter settings while the columns correspond to different spatial locations. In our ice thickness application the number of spatial locations for the grid is and the number of model runs in the ensemble is .
3.1 Computer Model Emulation Using Semi-Continuous Spatial Data
Since only a limited number of computer runs can be carried out, we use an emulator to statistically link the modeled ice thickness to the observational data. However, the semi-continuous nature of prevents direct application of existing GP calibration approaches such as those in Sacks et al. (1989) and Kennedy and O’Hagan (2001). In order to make emulation of the semi-continuous variable possible, we introduce an indicator variable whose value is one if grounded ice is present at the given parameter setting and spatial location or zero otherwise for and . Given that grounded ice is present, we model the thickness as , where is a bijective transformation function that allows to take any real value. We can now formulate the ice thickness as
for and . Using this representation we can translate the problem of emulating into the problem of finding the predictive distributions of the binary response and the transformed thickness values at any untried input parameter setting . Therefore, we can model directly as a multivariate Gaussian process, since its elements are unbounded and continuous. We use a -dimensional vector to denote the emulated process for from a GP. For the binary spatial pattern
, we indirectly emulate them through their corresponding logitsdefined as
for as in Chang et al. (2016a). Since can be again treated as continuous variables with unbounded support the use of the standard GP approach is suitable. Since is an unobserved latent variable even if is one of the existing design points , we do not use a separate notation for the logits at those design points. Our emulation problem now becomes a problem of finding predictive processes and at any untried settings (which are possibly dependent to each other).
3.2 Computer Model Calibration Using Semi-Continuous Spatial Data
Formulating calibration framework also requires to address the issue with semi-continuous data because the standard calibration approach (Kennedy and O’Hagan, 2001) is not applicable. Here we use a similar representation of the observed ice thickness as in (1). We define the variable to be an indicator with a value of one if observed grounded ice presents at and zero otherwise. To transform the observational data, we use the same transformation function as in (1). At any spatial location , we assume observation of ice thickness can be represented as follows:
In a similar fashion to our emulation framework, we set up our model for the transformed thickness and the logit of denoted as . Following Chang et al. (2016a) we set up the following model to link it to the logit for the model output at the best setting () while accounting for data-model discrepancy:
where is the input parameter setting that gives the ‘best’ match between model output and observational data and a normal random variable is a spatially correlated discrepancy term.
The model for needs to be defined only for the locations with . Let be the number of spatial locations with a positive observed thickness. Without loss of generality, we assume that the observed thicknesses at the first locations are positive while the rest are . For , we use the following model for the transformed thickness:
where the random variables and respectively represent the spatially correlated data-model discrepancy and the i.i.d. observational error. The discrepancy covariance reflects the spatial dependence among .
The model in (3
) assigns the following Bernoulli distribution for(conditionally on the value of and discrepancy ):
Given this distribution for , we can view the specification in (2) as a mixture model with the following density:
and are conditionally independent given the relevant parameters, the likelihood for can be factorized as follows:
Here is the vector of emulated process for all positive ’s (i.e. ), and . The Jacobian factors are omitted as they do not depend on any model parameters.
Note that this formulation does not necessarily require independence between and , because dependence can easily be specified through dependence between and or and . This is how we impose dependence between and in our formulation (see Section 4.2 below).
The factorization above shows that the likelihood for can be factored into two parts, one for the positive observations and the other for the indicator variables at all locations . This has an important implication for inference on : utilizing the ice thickness pattern for calibration is essentially using the additional information from the positive ice thickness values on top of the binary spatial pattern of ice presence ( ) in calibration. We will show how this added information improves our inference on the input parameter in both the simulated and the real data examples in Section 5 below.
3.3 Computational and Inferential Challenges
The basic framework described in the previous section faces some computational and inferential challenges when the model output and the observational data are in the form of high-dimensional spatial data (i.e, is large) as in our PSU-3D Ice model calibration problem: First, inference based on the formulations described in Sections 3.1 and 3.2 requires to handle a large number of latent variables for the logits. To be more specific the number of latent variables in the emulation step is and this translates to 2 million variables to infer for our problem. In the calibration step, while the number of latent variables is much smaller than that in the emulation step (), the number of available data points () is much smaller than the number of latent variables () and hence the problem is in fact ill-posed. Second, the size of data for height patterns from the model output is still large even when we consider only those at and with . In our calibration problem, the number of combinations with is about 8.5 million and this makes the standard Gaussian process emulation approach computationally infeasible because of the well-known computational issue with a large covariance matrix (see, e.g., Heaton et al., 2018). Due to the highly irregular spatial distribution of the locations with a simple solution such as using a separable covariance structure (see, e.g., Gu et al., 2016) is not applicable here.
4 Dimension Reduction-Based Approach
We mitigate the aforementioned challenges due to high-dimensional spatial data using the likelihood-based principal component analysis (PCA) methods (Tipping and Bishop, 1999)
. Unlike the singular value decomposition-based PCA, the likelihood-based PCA can easily handle non-Gaussian data or partially observed data and hence is highly suitable for our problem.
Salter et al. (2019) recently has cautioned about possible issues regarding use of principal components in calibration–if the overall range for model output does not cover the range for observational data, calibration based on principal components can yield nonsensical results. Salter et al. (2019) has also proposed an optimal basis approach that can provide a solution in such situation. Chang et al. (2014) and Chang et al. (2016b) also discuss possible issues in a similar vein from the viewpoint of constructing discrepancy terms. Since the model runs and observational data discussed in Section 2 does not have such issues, we choose not to implement the optimal basis approach by Salter et al. (2019).
4.1 Emulation Based on Likelihood-based Principal Component Analysis.
Let be a matrix of logits for the binary patterns ( and ) for the existing model runs. The rows of correspond to the design points in input parameter settings while the columns are for different spatial locations . We apply logistic principal component analysis (Lee et al., 2010) to decompose the logit matrix in the following way:
where is the mean vector for the spatial locations (i.e. the column means of ), is the logistic principal component (LPC) score matrix, and is the LPC matrix with a pre-specified number of principal components . The rows of correspond to the logits for different input parameter settings where denotes a vector of the LPC scores at . The parameters in (7) (, , and
) can be estimated by maximizing the corresponding likelihood function for these parameters given the binary patternsfor existing model runs using the minorization and maximization (MM) algorithm. We predict the logits at any untried setting by predicting the corresponding LPC scores .
Each score (for ) can be predicted separately using a GP emulator with a zero mean and the following exponential covariance function.
for two possibly different input parameter settings and where is the nugget, is the partial sill, and are the range parameters. We find the maximum likelihood estimates of the covariance parameters , , and to construct an emulator for individual principal components. We denote the resulting emulated process of LPC scores at as .
We also apply a likelihood-based PCA method for data with missing values to build an emulator for the ice-thickness patterns. For and with we assume the following model for dimension reduction:
with (), the principal component (PC) loading ( and ) and the PC score ( and ). Again is the pre-determined number of principal components being used for our dimension reduction. This is essentially PCA with missing values and therefore the PC loadings and scores can be estimated via EM algorithm (Stacklies et al., 2007). We denote the resulting loading matrix by , with th element given by . In a similar manner to the problem of emulating logits we predict the latent variables for the thickness at any untried setting and location with a positive thickness value by predicting the corresponding principal component scores .
Again we build an emulator for each principal component separately using a GP emulator with the following exponential covariance function:
for any two input parameter settings and where is the nugget, is the partial sill, and are the range parameters. To incorporate information from the binary pattern we use the following mean function for the th principal component:
where the function
is given by a natural spline regression model whose degrees of freedom is determined through cross-validation(Hastie, 1992). We let be the vector of coefficients for , whose dimensionality is the same as the degrees of freedom of . To construct the GP emulator we find the estimates of the covariance parameters (denoted as , and ) and the parameters for the spline functions (denoted as ) for each th principal component separately via restricted maximum likelihood estimation (REML) (Stein, 1999). When we predict for any untried setting , we replace with given by the Gaussian process emulator described above since is not available if . We let denote the resulting emulated process for .
For any untried input parameter setting , we can predict the ice thickness pattern from our computer model in the following two steps: (i) We first predict the logits of ice-no ice patterns as , and (ii) for each location with the predicted thickness is given as . Note, however, that the thresholding of the logits at 0 is needed only for evaluating emulation performance (such as generating predicted patterns for visual evaluation) and is not used in our actual calibration procedure.
In the calibration step discussed below, we fix the emulator parameters at their MLEs except for the partial sill parameters for , . The partial sill parameters for will be re-estimated along other parameters in the calibration model to account for any possible discrepancies in scale (see e.g., Bhat et al., 2012; Chang et al., 2014, 2015, 2016b, for smiliar approaches ). However, the partial sills for will be fixed at their MLEs without being re-estimated in the calibration stage because the binary patterns usually do not have enough information for the scale parameters of the latent variables and hence re-estimation for the partial sill parameters often cause identifiability issues as discussed in Chang et al. (2016a).
4.2 Calibration Using Basis Representation
Using the emulators for principal components ( and ) described in the previous section we modify the basic calibration framework introduced in Section 3.2 to set up a computationally efficient calibration method. We now rewrite the model for in (4) as
for , where is the th element of an discrepancy basis matrix , are the random coefficients with for , and is the i.i.d. observational error with . The terms and are respectively the basis representations of and in (4) given by our formulation. We also rewrite the model for the logits for in (3) using a similar basis representation as follows:
with the logistic principal component basis matrix , and a discrepancy basis matrix and its corresponding coefficients with . We model the dependence between the coefficients of the discrepancy terms and through a cross correlation matrix , whose th element is the correlation between and .
The discrepancy basis matrices and need to be carefully specified to avoid possible identifiability issues between the effects of input parameters and the discrepancy. For the discrepancy basis for the thickness we use the kernel convolution (Higdon, 1998; Higdon et al., 2008)
with 40 knots that are evenly distributed in the spatial domain, with the exponential kernel function with the range parameter of 400 km. To reduce the identifiablity issues we use the 10 leading eigenvectors of the kernel matrix as
instead of the original kernel matrix. Using eigenvectors instead of the original the basis matrix has a similar regularizing effect as a ridge regression(Hastie et al., 2009). Similarly found in Chang et al. (2014), our pilot simulation study shows that the value of the range parameter for the kernel function has very minimal effect on the inference result (results not shown). For the discrepancy basis matrix for the binary pattern we use the data-driven basis described in Chang et al. (2016a). To be more specific for each spatial location we compute the following measure of signed mismatch between the model output and observational data:
The th element of the basis vector (i.e. is set to be 1) is then defined as
This mismatch measure captures the discrepancy between the model output and observational data that is common across all input parameter settings and translate it into the logit scale. The simulation study in Chang et al. (2016a) shows that this discrepancy vector gives a parsimonious and reasonably accurate representation of discrepancy when the design points are representative sample of possible values of .
4.3 Bayesian Inference
Given the above formulation we conduct Bayesian inference on and other parameters in the model. While using non-Bayesian inference might be possible as well, we choose to use a Bayesian method as it provides a quite straightforward way to quantify the uncertainty in while account for other sources of uncertainties despite the complexity of our model specification.
In a similar fashion to the specification in (5) the representations in (10) and (11) lead to a density function based on a mixture model. The likelihood function for the mixture model conditional on the emulated process and now becomes
for all locations , where the density function is for the case with in (10). As a result we have the following likelihood function for :
We have a similar factorization as in (6) with one factor for the positive observations
and the other for the binary variables at all locations.
To complete the Bayesian model, we assign the following priors for the model parameters (, and ) in our calibration step:
is a uniform distribution within the range thatis positive definite, i.e., , and is the uniform distribution defined over the physically possible range for the parameters . Notice that we have specified weakly informative priors on and for computational stability reasons however we have noticed in our pilot simulations that the posterior analysis is insensitive to the choice of the prior hyper-parameters. For , we assigned moderately informative prior with purpose to encourage to take a value of around 1000. For the re-estimated partial sill parameters , we assigned a slightly informative prior to encourage them to have values around their MLEs estimated in the emulation stage. To account uncertainty on the input parameters , we assign independent uniform priors within range on the input parameters because we have already re-scaled the parameter values in the range , whose limits in the original scale correspond to the physically possible ranges of the input parameters.
The above specification of likelihood and prior lead to a posterior whose density can be factorized as follows:
The first part on the right-hand side is based on the likelihood for () and the relevant priors and obtained by
where , , , and are the prior densities (defined below) and the marginal likelihood can be written as
with . The mean and covariance of are given by