Stochastic Local Interaction Model: Geostatistics without Kriging

01/07/2020 · by Dionissios T. Hristopulos, et al. · Technical University of Crete

Classical geostatistical methods face serious computational challenges when they are confronted with large sets of spatially distributed data. We present a simplified stochastic local interaction (SLI) model for computationally efficient spatial prediction that can handle large datasets. The SLI method constructs a spatial interaction matrix (precision matrix) that accounts for the data values, their locations, and the sampling density variations without user input. We show that this precision matrix is strictly positive definite. The SLI approach does not require matrix inversion for parameter estimation, spatial prediction, or uncertainty estimation, leading to procedures that are significantly less computationally intensive than kriging. The precision matrix involves compactly supported kernel functions (spherical, quadratic, etc.), which enable the application of sparse matrix methods and thus improve computational time and memory requirements. We investigate the proposed SLI method with a data set that includes approximately 11,500 drill-hole measurements of coal thickness from Campbell County (Wyoming, USA). We also compare SLI with ordinary kriging (OK) in terms of estimation performance (using cross-validation analysis) and computational time. According to the cross-validation measures used, SLI performs slightly better than OK in estimating seam thickness. In terms of computation time, SLI prediction is 3 to 25 times faster than OK (depending on the size of the kriging neighborhood) for the same grid size.


I Introduction

Kriging and its various flavors are the gold standard in geostatistics Christakos (1992); Cressie (1993); Wackernagel (2003); Olea (2012); Chilès and Delfiner (2012) and in mining applications of geostatistics in particular Journel and Kyriakidis (2004); Armstrong (1998); Dowd and Dare-Bryan (2007). However, the application of kriging to large datasets faces difficulties due to the computationally demanding inversion of the covariance matrix Sun et al. (2012); Zhong et al. (2016). Hence, various shortcuts and approximations have been developed that allow working with large datasets, including the use of a finite search radius Rivoirard (1987), tapered covariance functions Furrer et al. (2006), and fixed rank kriging Cressie and Johannesson (2008). For reviews of the problem with large data see Sun et al. (2012); Lasinio et al. (2013). For data that are collected on rectangular grids there is an efficient solution based on Gaussian Markov random fields (GMRFs) Rue and Held (2005). However, this approach cannot be directly extended to scattered data, although it is possible to generalize GMRFs by linking them to stochastic partial differential equations Lindgren et al. (2011).

The motivation for stochastic local interaction (SLI) models is the possibility, known in the statistical field theories of physics and in Markov random field models, of constructing spatial and spatiotemporal dependence from energy functions with local interactions. In statistical physics this idea is present in the widely used Boltzmann-Gibbs distributions, which include the Gaussian field theory and the binary Ising model among others, e.g. Mussardo (2010).

Local interactions allow building sparse and explicit precision (inverse covariance) matrices for the spatial dependence. Hence, computationally fast algorithms can be designed for spatial prediction based on the conditional mean. The uncertainty of the estimates is easily assessed by means of the conditional variance, which follows directly from the precision matrix. The explicit form of the precision matrix implies independence from the curse of covariance matrix inversion, which plagues the application of geostatistical methods to large datasets. Hence, such models can be used as computationally efficient alternatives to kriging.

An SLI model that involves gradient and curvature terms was presented in Hristopulos (2015b). This model has a precision matrix that is not demonstrably non-negative definite, even though it is in practice well-defined in most cases. Herein we present a simpler version of the model which does not include the curvature term and possesses a positive definite precision matrix.

The SLI model can be viewed as an extension of Gaussian Markov random fields (GMRFs) Rue and Held (2005) to scattered data by means of kernel functions. From a different viewpoint, the SLI model can be viewed as a discrete analogue of Spartan spatial random fields, which are defined in continuum space Hristopulos (2003); Hristopulos and Elogne (2007); Hristopulos (2015a). In order to accommodate random sampling geometries, the SLI model incorporates ideas from machine learning, such as the use of kernel functions. The range of the kernels is determined by a set of local bandwidths which is learned from the sample data. Hence, SLI uses only a subset of the data for prediction at any given point. In kriging this restriction is imposed by empirically determining a search neighborhood by trial and error. In the case of SLI, the range of the kernel is automatically adjusted by the algorithm (see Section II.3). An additional advantage of SLI over kriging is that the former does not require covariance matrix inversion to compute the prediction and the prediction uncertainty.

The remainder of this manuscript is structured as follows: Section II presents the SLI methodology, including the SLI model, the equations for prediction and conditional standard deviation, and the leave-one-out cross validation procedure for model parameter estimation. In Section III the Campbell coal dataset is briefly described. Section IV presents the SLI model parameter estimation, while Section V focuses on the estimation of the variogram for ordinary kriging. Section VI provides the SLI-based and kriging-based estimates of coal resources and the associated uncertainties. Section VII studies the stability of the SLI estimates and compares the SLI and ordinary kriging predictive performance using leave-P-out cross validation. Finally, in Section VIII we present our conclusions, discuss potential applications of the method and analogies with other approaches, and outline directions for future research.

II Methodology

II.1 Notation

In the following, $\mathbf{z} = (z_1, \ldots, z_N)^{\top}$ denotes the sample values at the locations $\{\mathbf{s}_1, \ldots, \mathbf{s}_N\}$ of the sampling set $S_N$, where $N$ is the number of sampling points. The superscript $\top$ denotes the matrix transpose. The prediction set involves the points $\{\mathbf{s}_{N+1}, \ldots, \mathbf{s}_{N+P}\}$, which typically correspond to the nodes of a map grid $G$. In general, $S_N$ and $G$ are disjoint sets. The SLI optimal prediction at these points will be denoted by $\hat{z}(\mathbf{s}_p)$, $p = N+1, \ldots, N+P$. Assuming a constant mean $m_z$, fluctuations around the mean will be denoted by primes, i.e., $z'_i = z_i - m_z$. In case the mean is not homogeneous, a trend function is subtracted to obtain the fluctuations. The SLI model is defined in terms of the fluctuations. However, the constant mean or the trend function does not need to be known a priori, since it can be determined in the estimation stage.

II.2 Kernel Functions

The SLI formulation uses ideas from kernel regression Nadaraya (1964); Watson (1964) in order to express the local spatial dependence. The interactions between neighboring points are expressed in terms of suitably selected weighting functions, which are supplied by kernel functions.

A kernel (weighting) function $K(\mathbf{s}, \mathbf{s}')$ is a non-negative function that assigns a real value to any pair of points $\mathbf{s}$ and $\mathbf{s}'$. The range of the kernel is determined by a characteristic length scale known as the bandwidth and denoted by $h$. The bandwidth should tend to zero as $N \to \infty$, so that for large and dense datasets emphasis is placed on points that are close to the target site. For $\mathbf{r} = \mathbf{s} - \mathbf{s}'$, let $u = \|\mathbf{r}\|/h$ represent the normalized distance. A kernel function $K(u)$ is (a) non-negative, i.e., $K(u) \ge 0$ for all $u$; (b) symmetric, i.e., $K(-u) = K(u)$ for all $u$; (c) maximized at $u = 0$; (d) a continuous function of $u$.

A list of common kernel functions is given in Table 1. The first seven functions of the table (triangular, Epanechnikov, quadratic, quartic, tricube, spherical, truncated Cauchy) are compactly supported, while the last two functions (exponential, Gaussian) are infinitely extended but integrable.

II.3 Kernel Weights

The kernel functions can be used to define weights for the interaction between two points $\mathbf{s}_i$ and $\mathbf{s}_j$ as follows

$$u_{ij} = K\!\left(\frac{\|\mathbf{r}_{ij}\|}{h_i}\right), \qquad (1)$$

where $\mathbf{r}_{ij} = \mathbf{s}_i - \mathbf{s}_j$ is the distance vector, while $h_i$ is a bandwidth specific to the point $\mathbf{s}_i$ that reflects the sampling density around that point. As a result of the local bandwidth definition in (1), the kernel weights are not necessarily symmetric with respect to interchange of the indices, i.e., in general $u_{ij} \ne u_{ji}$, since the local sampling density around the point $\mathbf{s}_i$ can be quite different than the sampling density around $\mathbf{s}_j$.

In general, one can consider $N$ different bandwidths. However, then the spatial model involves more parameters than data values, since the SLI model has other coefficients in addition to the bandwidths. It is better to define the local bandwidths in terms of a small set of free parameters. An intuitive and flexible choice is to use

$$h_i = \mu \, d_i(k), \qquad (2)$$

where $h_i$ is the bandwidth assigned to the point $\mathbf{s}_i$, $d_i(k)$ is the distance between $\mathbf{s}_i$ and its $k$-nearest neighbor in the sampling set $S_N$, and $\mu > 0$ is a positive bandwidth tuning parameter. The local distances $d_i(k)$ can be efficiently computed using near-neighbor estimation algorithms, while the global coefficient $\mu$ becomes a model parameter that is determined from the data via the cross-validation procedure described in Section II.7. The above definition of the bandwidth implies that larger bandwidth values are assigned to points in areas where the sampling is sparse.
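The bandwidth rule (2) is straightforward to implement with a k-d tree. The following Python sketch (an illustration, not the authors' code, which was written in MATLAB) computes the local bandwidths $h_i = \mu\, d_i(k)$ from the k-nearest-neighbor distances; the function and variable names, as well as the numerical values, are ours.

```python
import numpy as np
from scipy.spatial import cKDTree

def local_bandwidths(coords, k=3, mu=2.4):
    """Local SLI bandwidths h_i = mu * d_i(k), where d_i(k) is the
    distance from point i to its k-th nearest neighbor (eq. 2)."""
    tree = cKDTree(coords)
    # query k+1 neighbors because the closest "neighbor" is the point itself
    dist, _ = tree.query(coords, k=k + 1)
    return mu * dist[:, -1]

# toy example with random drill-hole coordinates (illustrative values only)
rng = np.random.default_rng(0)
coords = rng.uniform(0, 10_000, size=(500, 2))   # metres
h = local_bandwidths(coords, k=3, mu=2.4)
print(h.min(), np.median(h), h.max())
```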

Name Equation
Triangular
Epanechnikov
Quadratic
Quartic (biweight)
Tricube
Spherical
Truncated Cauchy
Exponential
Gaussian
Table 1: List of common kernel functions. The normalized distance is given by $u = \|\mathbf{s} - \mathbf{s}'\|/h$. The unit step function is defined by $\theta(x) = 1$ if $x \ge 0$ and $\theta(x) = 0$ if $x < 0$.
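The formulas in Table 1 did not survive reproduction here, but the named kernels have standard textbook forms. The sketch below shows two of the compactly supported choices used later in the paper (spherical and Epanechnikov) as functions of the normalized distance $u$; these are standard expressions rather than the authors' table entries, and the normalization constants are immaterial for SLI because the kernels enter only as relative weights.

```python
import numpy as np

def spherical_kernel(u):
    """Spherical kernel: 1 - 1.5*u + 0.5*u^3 for u <= 1, zero otherwise."""
    u = np.asarray(u, dtype=float)
    return np.where(u <= 1.0, 1.0 - 1.5 * u + 0.5 * u**3, 0.0)

def epanechnikov_kernel(u):
    """Epanechnikov kernel: proportional to (1 - u^2) for u <= 1, zero otherwise."""
    u = np.asarray(u, dtype=float)
    return np.where(u <= 1.0, 1.0 - u**2, 0.0)

u = np.linspace(0.0, 1.5, 7)
print(spherical_kernel(u))
print(epanechnikov_kernel(u))
```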

II.4 Gradient-based SLI Energy Function

The SLI model is defined in terms of a Gibbs-Boltzmann exponential joint density of the following form Chilès and Delfiner (2012):

$$f(\mathbf{z}; \boldsymbol{\theta}) = \frac{\exp\left[-H(\mathbf{z}; \boldsymbol{\theta})\right]}{Z(\boldsymbol{\theta})},$$

where $\boldsymbol{\theta}$ is the SLI parameter vector that needs to be determined from the data. The function $H(\mathbf{z}; \boldsymbol{\theta})$ is the energy of the SLI model, which incorporates the local interactions. The denominator $Z(\boldsymbol{\theta})$ is known in physics as the partition function and represents a normalization constant Mussardo (2010). It will not be necessary to evaluate this constant in order to estimate the parameters or to predict at non-measured positions.

In the case of scattered data, a parsimonious SLI energy functional that includes squared fluctuation and squared “gradient” terms is given by Hristopulos (2015a)

$$H(\mathbf{z}; \boldsymbol{\theta}) = \frac{1}{2\lambda}\left[\left(\mathbf{z} - \mathbf{m}\right)^{\top}\left(\mathbf{z} - \mathbf{m}\right) + \eta_{1}\, \hat{S}_{1}(\mathbf{z}; \mathbf{h})\right], \qquad (3)$$

where $\mathbf{m}$ is the vector of the local expected values (which is assumed constant herein, $\mathbf{m} = m_z \mathbf{1}$), $\lambda$ is a scale coefficient that is proportional to the overall variability of the data, $\eta_1$ is a positive rigidity coefficient that weighs the cost of gradients in the data, $\hat{S}_{1}(\mathbf{z}; \mathbf{h})$ is the kernel-average square gradient, $\mathbf{h} = (h_1, \ldots, h_N)^{\top}$ is the vector of local kernel bandwidths determined by means of (2), and $\boldsymbol{\theta} = (m_z, \lambda, \eta_1, \mu)$ is the parameter vector.

The gradient-based energy term is given by the following kernel average

$$\hat{S}_{1}(\mathbf{z}; \mathbf{h}) = \sum_{i=1}^{N} \frac{\sum_{j=1}^{N} u_{ij}\,\left(z_i - z_j\right)^{2}}{\sum_{j=1}^{N} u_{ij}}, \qquad (4)$$

where the ratio on the right-hand side represents the Nadaraya-Watson average Nadaraya (1964); Watson (1964) of the square differences of the data values around each point.

II.5 SLI Precision Matrix Formulation

The SLI energy function (3) is a quadratic function of the sample values. Hence, it can be equivalently expressed in the following form that involves the precision matrix $\mathbf{J}$:

$$H(\mathbf{z}; \boldsymbol{\theta}) = \tfrac{1}{2}\left(\mathbf{z} - \mathbf{m}\right)^{\top} \mathbf{J}(\boldsymbol{\theta}')\left(\mathbf{z} - \mathbf{m}\right). \qquad (5)$$

The precision matrix has the following structure

$$\mathbf{J}(\boldsymbol{\theta}') = \frac{1}{\lambda}\left(\mathbf{I}_{N} + \eta_{1}\, \mathbf{J}_{1}\right), \qquad \text{(6a)}$$

and depends on the reduced parameter vector $\boldsymbol{\theta}' = (\lambda, \eta_1, \mu)$. In the above, $\mathbf{I}_N$ is the identity matrix, which results from the energy of the squared fluctuations, and $\mathbf{J}_1$ is the gradient precision sub-matrix, which results from the sum of the squared differences and is given by

(6b)

where $u_{ij}$ is an abbreviation of the kernel weights (1) and $\delta_{ij}$ is the Kronecker delta, i.e., $\delta_{ij} = 1$ if $i = j$ while $\delta_{ij} = 0$ if $i \ne j$.

Theorem 1.

The SLI precision matrix defined by (6) is strictly positive definite for $\lambda > 0$ and $\eta_1 \ge 0$. Moreover, it is diagonally dominant.

Proof.

The matrix $\mathbf{J}$ is strictly positive definite if it is real-valued and symmetric, and if for every non-zero vector $\mathbf{x} \in \mathbb{R}^{N}$ it holds that $\mathbf{x}^{\top}\mathbf{J}\,\mathbf{x} > 0$. The matrix defined by (6) is real-valued and symmetric by construction. In addition, if we set $\mathbf{x} = \mathbf{z} - \mathbf{m}$, it follows from (5) that $\mathbf{x}^{\top}\mathbf{J}\,\mathbf{x} = 2H(\mathbf{z}; \boldsymbol{\theta})$. This equation holds for any sample $\mathbf{z}$ and for any $\mathbf{m}$, thus for all possible $\mathbf{x}$. The energy can equivalently be expressed in terms of the original variables based on (3).

It is straightforward to show, based on (3) and (4), that since $\lambda > 0$ the energy is strictly positive for any non-zero vector $\mathbf{x}$. If all the entries of the vector $\mathbf{x}$ take the same value, the gradient term, which is proportional to $\eta_1$, vanishes; however, the first term is still positive. Hence, the matrix $\mathbf{J}$ is strictly positive definite.

Diagonal dominance requires that for all $i = 1, \ldots, N$ the following condition be satisfied:

$$J_{ii} \ge \sum_{j \ne i} \left|J_{ij}\right|.$$

The diagonal dominance condition is stricter than positive definiteness. Hence, the former implies the latter (but not vice versa). From (6) it follows that

(7a)
(7b)

Based on the above and the non-negativity of the weights, i.e., $u_{ij} \ge 0$ for all $i, j$, it follows that the diagonal dominance condition is satisfied. Hence the diagonal dominance of the SLI precision matrix is proved.
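To make the construction concrete, the following Python sketch assembles a sparse precision matrix with the structure of (6a) and numerically checks the properties stated in Theorem 1. Because the exact entries of (6b) are not reproduced above, $\mathbf{J}_1$ is taken here as a symmetrized, kernel-weighted graph Laplacian, which is one matrix consistent with a sum of weighted squared differences; the authors' coefficients may differ in detail, so this is an illustration of the structure rather than the paper's exact matrix.

```python
import numpy as np
import scipy.sparse as sp
from scipy.spatial import cKDTree

def spherical_kernel(u):
    return np.where(u <= 1.0, 1.0 - 1.5 * u + 0.5 * u**3, 0.0)

def sli_precision(coords, lam=0.17, eta1=115.0, mu=2.4, k=3):
    """Sparse precision matrix J = (1/lam) * (I + eta1 * J1), with J1 taken as a
    symmetrized graph Laplacian of the kernel weights u_ij (assumed form)."""
    n = len(coords)
    tree = cKDTree(coords)
    h = mu * tree.query(coords, k=k + 1)[0][:, -1]           # local bandwidths, eq. (2)
    rows, cols, vals = [], [], []
    for i in range(n):
        for j in tree.query_ball_point(coords[i], r=h[i]):
            if j != i:
                u = np.linalg.norm(coords[i] - coords[j]) / h[i]
                rows.append(i); cols.append(j); vals.append(spherical_kernel(u))
    U = sp.csr_matrix((vals, (rows, cols)), shape=(n, n))    # u_ij, not symmetric
    W = U + U.T                                              # symmetrized weights
    J1 = sp.diags(np.asarray(W.sum(axis=1)).ravel()) - W     # graph Laplacian
    return (sp.identity(n) + eta1 * J1) / lam

rng = np.random.default_rng(0)
coords = rng.uniform(0, 10_000, size=(300, 2))
J = sli_precision(coords).toarray()
print("symmetric:", np.allclose(J, J.T))
print("diagonally dominant:", np.all(np.diag(J) >= np.abs(J).sum(axis=1) - np.diag(J)))
print("smallest eigenvalue:", np.linalg.eigvalsh(J).min())   # > 0: strictly positive definite
```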

II.6 SLI Prediction

In this section we show how to use the SLI model to estimate the unknown values of the field at the sites $\mathbf{s}_p \in G$. Let us consider that one of the prediction points, $\mathbf{s}_p$, is added to the sampling network. The SLI energy function is then modified as follows:

(8)

The first term in (8) is the sample energy (3), while the second and third terms represent the additional energy contribution that involves $z_p$: the second term is the self-interaction of the prediction point, which contributes proportionally to the square of the fluctuation, and the third term involves the interaction of the prediction point with the sampling points. The latter represents local gradients that reflect differences between the fluctuation at $\mathbf{s}_p$ and neighboring sample points.

The energy (8) determines the conditional probability density function at the prediction point, which is given by

$$f\!\left(z_p \mid \mathbf{z}; \boldsymbol{\theta}\right) \propto \exp\!\left[-H\!\left(z_p \mid \mathbf{z}; \boldsymbol{\theta}\right)\right]. \qquad (9)$$

The optimal SLI prediction at $\mathbf{s}_p$ is equal to the value $\hat{z}(\mathbf{s}_p)$ that minimizes the energy (8) and thus maximizes the conditional pdf. This value is a stationary point of the energy, i.e., it satisfies

$$\left.\frac{\partial H\!\left(z_p \mid \mathbf{z}; \boldsymbol{\theta}^{*}\right)}{\partial z_p}\right|_{z_p = \hat{z}(\mathbf{s}_p)} = 0,$$

where $\boldsymbol{\theta}^{*}$ denotes the optimal parameter vector (excluding $\lambda$) which is estimated from the data as shown in Section II.7. For now we assume that $\boldsymbol{\theta}^{*}$ is known.

The first term in (8) excludes the prediction point and thus drops out in the minimization. The first-order derivatives of the next two energy terms lead to a linear equation for the stationary point, which admits the following unique solution:

$$\hat{z}(\mathbf{s}_p) = m_z - \frac{\sum_{i=1}^{N} J_{p,i}\,\left(z_i - m_z\right)}{J_{p,p}}. \qquad (10)$$

In (10) we used the symmetry of the precision matrix, i.e., the fact that $J_{p,i} = J_{i,p}$. The precision matrix entries that involve the prediction point are obtained from (7) as follows

(11a)
(11b)

In light of the kernel weight definition (1), the summands in (11a) are either zero (if the distance between $\mathbf{s}_p$ and $\mathbf{s}_i$ exceeds the local bandwidth) or positive (otherwise).

The predictor (10) is a stationary point of the energy. It remains to verify that it represents the minimum, so that the prediction corresponds to the mode of the conditional pdf (9). The second-order derivative of the energy (8) with respect to $z_p$ is given by $J_{p,p}$. According to (11) it holds that $J_{p,p} > 0$, since $\lambda, \eta_1 > 0$ and the kernel weights are non-negative. Hence the predictor indeed corresponds to the global minimum of the energy.
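A minimal Python sketch of pointwise SLI-style prediction in the spirit of (10) is given below. It assumes the augmented-precision structure used in the earlier sketch, namely $J_{p,p} = [1 + \eta_1 \sum_i (u_{pi} + u_{ip})]/\lambda$ and $J_{p,i} = -\eta_1 (u_{pi} + u_{ip})/\lambda$, which is our reading of (11) rather than the authors' exact expressions; note that the scale $\lambda$ cancels in the prediction, as stated in the properties listed below.

```python
import numpy as np
from scipy.spatial import cKDTree

def spherical_kernel(u):
    return np.where(u <= 1.0, 1.0 - 1.5 * u + 0.5 * u**3, 0.0)

def sli_predict(s_pred, coords, z, eta1=115.0, mu=2.4, k=3, lam=0.17):
    """Point prediction in the spirit of eq. (10) and conditional variance 1/J_pp,
    under the assumed entries J_pi = -(eta1/lam)(u_pi + u_ip)."""
    m = z.mean()                                    # constant mean assumption
    tree = cKDTree(coords)
    h = mu * tree.query(coords, k=k + 1)[0][:, -1]  # sample-point bandwidths, eq. (2)
    h_p = mu * tree.query(s_pred, k=k)[0][-1]       # bandwidth at the prediction point
    d = np.linalg.norm(coords - s_pred, axis=1)
    u_pi = spherical_kernel(d / h_p)                # weights from prediction point to samples
    u_ip = spherical_kernel(d / h)                  # weights from samples to prediction point
    w = eta1 * (u_pi + u_ip)
    J_pp = (1.0 + w.sum()) / lam
    J_pi = -w / lam
    z_hat = m - (J_pi @ (z - m)) / J_pp             # analogue of eq. (10)
    var = 1.0 / J_pp                                # GMRF-style conditional variance, eq. (12)
    return z_hat, var

rng = np.random.default_rng(1)
coords = rng.uniform(0, 10_000, size=(400, 2))
z = 7.8 + 3.0 * np.sin(coords[:, 0] / 3000) + rng.normal(0, 0.5, 400)
print(sli_predict(np.array([5000.0, 5000.0]), coords, z))
```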

Properties of the SLI predictor

  1. Note that the prediction equation (10) is analogous to the equation that determines the conditional mean in GMRFs. This is not surprising since the SLI predictor finds the minimum energy at the prediction point, conditionally on the sample values.

  2. The SLI predictor is unbiased as can be seen from (10): the second term on the right hand side, which is proportional to the fluctuations, vanishes upon calculating the ensemble average.

  3. The SLI predictor is not an exact interpolator. According to (10), exactitude requires that if $\mathbf{s}_p = \mathbf{s}_i$ then $\hat{z}(\mathbf{s}_p) = z_i$, for all $i = 1, \ldots, N$. This could only occur if the bandwidth tends to zero, or if it is sufficiently small to exclude all other points besides $\mathbf{s}_i$ for all $i$. However, the difference between the SLI prediction and the true value is usually negligible. This is due to the fact that the sample value at the nearest point to the prediction location has the largest linear SLI weight and hence the most impact on the prediction.

  4. The prediction is independent of the scale coefficient $\lambda$. This becomes obvious upon inspection of (10), since the prediction involves the ratio of precision matrix elements.

  5. The analogy of SLI with GMRFs allows us to calculate the conditional variance at the prediction point as

    $$\sigma^{2}_{\mathrm{SLI}}(\mathbf{s}_p) = \frac{1}{J_{p,p}}. \qquad (12)$$

    The conditional variance endows the SLI method with a measure of prediction uncertainty which is analogous to the kriging variance. Just like the kriging variance, the SLI conditional variance does not explicitly depend on the data values. It depends on them only indirectly, to the extent that the data influence the model parameters.

Computational efficiency:

The numerical cost of the prediction equation (10) is proportional to the number $N$ of sampling points. Since the predictor is applied $P$ times to generate the map grid, the numerical complexity is $O(NP)$. According to (11), the numerical complexity of calculating the precision matrix entries is $O(N)$ per prediction point; therefore the overall complexity is again $O(NP)$. However, the kernel weights (1) involve a double summation over the sampling points, which has a computational complexity $O(N\bar{n})$, where $\bar{n} \le N$ is the average number of points within the kernel range. The maximum complexity rate, $O(N^2)$, is obtained for infinite-support kernels. For compactly supported kernels, lower rates with $\bar{n} \ll N$ can be achieved. In general, the upper bound of the SLI computational complexity is $O(N^2 + NP)$. The memory space required for storage of the precision matrix scales as $O(\alpha N^2)$, where $\alpha$ is the sparsity of the precision matrix; for compactly supported kernels typically $\alpha \ll 1$.

Kriging has a numerical complexity $O(N^3)$ for the inversion of the covariance matrix, and $O(N^2)$ per prediction point for the calculation of the kriging weights and the prediction Zhong et al. (2016). The kriging complexity is practically dominated by the inversion of the covariance matrix. In addition, kriging requires the storage of the covariance matrix, which consumes $O(N^2)$ of memory space. The computational requirements of kriging can be reduced by introducing various modifications, e.g., using search neighborhoods with a small number of sampling points around each prediction point Olea (2012) or by means of tapered covariance functions that have compact supports Furrer et al. (2006).

II.7 SLI Parameter Estimation

The SLI parameter vector is given by $\boldsymbol{\theta} = (m_z, \lambda, \eta_1, \mu)$. In addition to the data, the estimated $\hat{\boldsymbol{\theta}}$ also depends on the kernel function. We use the following approach to estimate the SLI model parameters from the data:

  1. The mean $m_z$ is set equal to the sample average. This is a reasonable choice in the absence of spatial trends, especially for large datasets. However, it is also possible to consider $m_z$ as a parameter to be determined from the data by means of cross validation, as shown by Hristopulos and Agou (2019).

  2. The order $k$ of the near-neighbor used to estimate the local bandwidths according to (2) is typically assigned an integer value determined by experimentation with the data. Herein we use $k = 3$; we justify this choice below.

  3. The optimal rigidity coefficient $\eta_1$ and global bandwidth parameter $\mu$ are determined by means of leave-one-out (LOO) cross-validation (CV). In LOO CV sample points are removed one at a time. Their values are then estimated by means of the remaining sample points and the SLI prediction equation (10). After all the points are removed and estimated, a specified validation measure (e.g., mean absolute error, root mean square error) that quantifies the discrepancy between the estimates and the true values is calculated. The validation measure is used as the cost function for the optimization of the SLI parameters.

  4. The optimal value of the scale coefficient $\lambda$ is determined by means of the method in Hristopulos (2015b):

    (13)

    The value of $\lambda$ is only used in determining the conditional variance but not for prediction.

The above procedure is repeated for different choices of the kernel function, and the kernel model which optimizes the cost function is selected.
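The parameter search in step 3 above can be organized as a bounded minimization of the LOO cross-validation error. The sketch below uses scipy.optimize.minimize with the L-BFGS-B method as a stand-in for the MATLAB fmincon interior-point routine used by the authors; the predictor inside the loop is the simplified SLI-style point predictor from the earlier sketches, and all bounds and starting values are illustrative.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.optimize import minimize

def spherical_kernel(u):
    return np.where(u <= 1.0, 1.0 - 1.5 * u + 0.5 * u**3, 0.0)

def loo_mae(params, coords, z, k=3):
    """Leave-one-out mean absolute error of the simplified SLI-style predictor."""
    eta1, mu = params
    tree = cKDTree(coords)
    h = mu * tree.query(coords, k=k + 1)[0][:, -1]
    m = z.mean()
    err = np.empty(len(z))
    for i in range(len(z)):
        d = np.linalg.norm(coords - coords[i], axis=1)
        w = eta1 * (spherical_kernel(d / h[i]) + spherical_kernel(d / h))
        w[i] = 0.0                                     # leave point i out
        z_hat = m + (w @ (z - m)) / (1.0 + w.sum())    # analogue of eq. (10)
        err[i] = abs(z[i] - z_hat)
    return err.mean()

rng = np.random.default_rng(2)
coords = rng.uniform(0, 10_000, size=(200, 2))
z = 7.8 + 3.0 * np.sin(coords[:, 0] / 3000) + rng.normal(0, 0.5, 200)

res = minimize(loo_mae, x0=[100.0, 2.0], args=(coords, z),
               method="L-BFGS-B", bounds=[(1e-6, 1e4), (0.5, 10.0)])
print("optimal (eta1, mu):", res.x, "LOO MAE:", res.fun)
```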

Maximum likelihood estimation (MLE) is also possible within the SLI framework. However, in using MLE we are forced to evaluate the partition function $Z(\boldsymbol{\theta})$, which involves calculating the determinant of the large, albeit sparse, precision matrix $\mathbf{J}$.
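For completeness, the log-determinant of a sparse, positive definite precision matrix can be obtained without dense inversion, e.g., from a sparse LU (or Cholesky) factorization. The sketch below illustrates this with scipy's sparse LU on a toy matrix; it is a generic numerical device, not part of the SLI estimation procedure used in this paper.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

def sparse_logdet(J):
    """log det(J) for a sparse symmetric positive definite matrix J,
    computed as the sum of log|U_ii| from a sparse LU factorization."""
    lu = splu(J.tocsc())
    return np.sum(np.log(np.abs(lu.U.diagonal())))

# toy example: a small sparse SPD matrix (tridiagonal)
n = 1000
J = sp.diags([-1.0, 2.5, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csc")
print(sparse_logdet(J))
print(np.linalg.slogdet(J.toarray())[1])   # dense check, matches for small n
```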

III Description of the Data and Study Area

We apply the SLI model to a set of coal thickness data from 11416 drill-holes. The data come from the Gillette field in Campbell County, Wyoming (USA). The locations of the drill-holes are shown in Fig. 1. Campbell County lies within the Powder River Basin and includes two of the largest coal mines in the world: North Antelope Rochelle coal mine and Black Thunder coal mine. The Gillette coal field contains significant quantities of exploitable coal resources that are low in sulfur and ash content.

Figure 1: Drill-hole locations of the coal thickness data from the Gillette field. The red line represents the border of the area within which the resources will be estimated.

The study domain extends over an area that covers approximately 9500 km². We focus on coal beds from the upper part of Fort Union Formation which include the Roland bed and coal in the Wyodak-Anderson zone. The stratigraphic cross-section showcasing the Wyodak-Anderson coal zone for part of the Gillette Field is shown in Fig. 2. The geological formations present in the Gillette field are shown in Fig. 3 Ellis et al. (1999). Coal from the Wyodak zone has higher mean gross calorific value (4760 kcal/kg), lower mean ash yield (5.8%), and lower mean total sulfur content (0.46%) compared with the respective values for the combined Wyodak-Anderson coal zones Ellis (2002).

Figure 2: Structural cross section showing the Wyodak-Anderson coal zone (W-A Zone) of the Gillette coalfield Stricker et al. (2007).
Figure 3: Section showcasing geological formations and coal zones for the Gillette Field Stricker et al. (2007).

The main statistics of the marginal distribution of the coal thickness data are listed in Table 2. The frequency histogram, which represents the empirical probability distribution of the thickness values, is shown in Fig. 4. Both the statistics shown in Table 2 and the histogram in Fig. 4 show evidence of mild deviation from the normal (Gaussian) distribution. The spatial distribution of the drill-holes, and hence the sampling density, varies significantly in space, as evidenced in Fig. 1. The median of the nearest distance between drill-holes is 458 m.

Mean (m) Median (m) Maximum (m) Minimum (m) Standard deviation (m) Skewness
7.83 7.71 22.67 0.33 2.99 0.13
Table 2: Statistics of coal thickness in the upper part of Fort Union Formation based on data from the 11416 drill-hole locations of Gillette Field.
Figure 4: Frequency histogram of the coal thickness distribution in the upper part of Fort Union Formation based on data from 11416 drill holes in Gillette field.

All the numerical tests below are conducted and timed on a desktop computer running the Windows 10 operating system, equipped with an Intel Core i7-6700K CPU at 4.00 GHz and 16.0 GB of RAM. The numerical implementation was performed in the MATLAB programming environment (release R2016a).

IV SLI Model Estimation for Gillette Field

To use SLI for interpolation we first need to estimate the model parameters from the Gillette field coal data. We perform this task using LOO-CV as described in Section II.7. Kernel weighting functions are utilized in the SLI predictions as explained in Section II.2 to determine the weights given by equation (1). To find the optimal kernel function we use LOO-CV on the entire dataset and select the kernel function that optimizes the CV cost function. Herein we use the mean absolute error (MAE) as the cost function. The optimization is performed in the Matlab environment using the constrained minimization function fmincon with the interior-point algorithm Nocedal and Wright (2006).

IV.1 Kernel function selection

We compare the predictive performance of different compactly supported kernels from the list shown in Table 1. Compact kernels lead to sparse precision matrices (see Section II.5) that are computationally less intensive for big datasets. The comparison is based on the mean absolute error obtained by means of leave-one-out cross validation. The SLI predictions are based on equation (10).

Table 3 shows the cross-validation MAE for the different kernel functions, obtained using neighbor order $k = 3$ to define the distances in equation (2). As evidenced in Table 3, the rigidity parameter $\eta_1$ that controls the gradient precision sub-matrix (6a) takes values close to 115 for all the kernels. The bandwidth control parameter $\mu$ shows more dependence on the kernel function. The MAE takes very similar values for all the studied kernels (the values agree within the second decimal place). Based on these results we select the spherical kernel, which gives one of the lowest MAE values. The Epanechnikov kernel also returns practically the same MAE value, but a higher value for $\mu$; as the latter determines the bandwidth, higher values of $\mu$ imply increased memory requirements and computational time. In general, we expect that any of the kernels listed in Table 3 (except for the uniform kernel, which has a slightly higher MAE) will lead to comparable performance.

Kernel MAE (m)
Uniform
Triangular
Quadratic
Biweight
Tricubic
Epanechnikov
Spherical
Table 3: Comparison of the mean absolute error (MAE) in coal thickness estimation by SLI using different kernel functions with neighbor order $k = 3$. MAE is optimized by means of a constrained optimization method; the parameters are constrained to positive intervals with lower bound eps, where eps is the floating-point relative accuracy.
Neighbor order $k$   $\eta_1$   $\mu$   $\lambda$   MAE (m)
2 115.0003 2.6795 0.1671
3 115.0003 2.3675 0.1727 1.1199
4 115.0002 2.0518 0.1717 1.1188
Table 4: Comparison of the mean absolute error (MAE) in coal thickness estimation by SLI using the spherical kernel function with neighbor orders $k = 2, 3, 4$. The parameter values are constrained as in Table 3. The computational time was between 5 and 9 minutes.

IV.2 Distribution of SLI bandwidths

The distribution of bandwidths for the spherical kernel is shown in the histogram plot of Fig. 5. The bandwidths are obtained from (2) using $k = 3$ and the optimal $\mu$ of Table 4. As evidenced in the histogram plot, the bandwidth distribution is dispersed, with a peak at small values but with a tail of considerably larger values. The dispersion of the distribution reflects the variations of the sampling density (cf. Fig. 1) in Gillette field: larger (smaller) values of the bandwidth are obtained in areas of sparser (denser) sampling. The choice of neighbor order $k = 3$ is based on Table 4, which compares the parameter estimates and the MAE values obtained for three different neighbor orders, $k = 2, 3, 4$. The table shows that $k = 4$ gives the lowest MAE; however, all the MAE values are equal within the second decimal place. In order to be impartial in our selection we choose $k = 3$. As the neighbor order increases, the optimal $\mu$ decreases (see Table 4), thus tending to keep the value of the optimal bandwidth constant. This behavior further demonstrates the ability of SLI to adjust the bandwidth adaptively without user input.

Figure 5: Frequency histogram of the kernel bandwidths (m) used in the SLI method. The bandwidths were determined from (2) using the spherical kernel with near-neighbor order $k = 3$.

IV.3 Parameter estimation using global optimization

Note that the estimated rigidity $\eta_1$ is very close to its initial guess. This result suggests that the optimization, which is performed by means of the constrained optimization MATLAB function fmincon with an interior-point algorithm, may be stuck in a local minimum. To investigate this possibility we use the global search optimization function in MATLAB, which can find global solutions to problems that contain multiple maxima or minima. This procedure is time intensive, but it allows for a better exploration of the SLI parameter space. We use the MATLAB GlobalSearch solver, which generates a sequence of starting points (initial guesses) in combination with the interior-point local optimization routine Ugray et al. (2007). The global optimizer is run with different initial values and bounds for $\eta_1$.

The results of global optimization for four different combinations of initial values and bounds for $\eta_1$ are reported in Table 5. All runs take several hours to complete, due to the exhaustive search of the parameter space by the global optimizer. As evidenced in Table 5, the global optimizer gives different solutions depending on the initial conditions. These solutions are in a different region of parameter space than the local optimum. However, all the solutions lead to very similar results for the cost function (i.e., the MAE). This investigation shows that different local minima of the SLI cost function are essentially identical with respect to prediction performance. We further discuss this issue in Section VII. Note that both the global and local solutions give essentially the same estimate for $\mu$. At the same time, the ratio $\eta_1/\lambda$ is also approximately the same for both the global (≈ 670) and the local (≈ 676) estimates. This observation is further analyzed and justified in Section VII.1.

Constraints and initialization for $\eta_1$   $\eta_1$   $\mu$   $\lambda$   MAE (m)   $\eta_1/\lambda$
[eps, ],   670.48
,   7.3092 2.3743 1.0900 669.72
[eps, ],   8.3285 2.3762 1.2427 1.1196 669.91
,   8.2394 2.3743 1.2287 1.1196 670.58
Table 5: Estimates of SLI parameters using LOO-CV and optimal mean absolute error (MAE) for the coal thickness data. The estimates are derived using the MATLAB global optimization solver GlobalSearch with different initial guesses and constraint intervals for $\eta_1$. The SLI model is implemented using the spherical kernel function with neighbor order $k = 3$. The parameter $\mu$ is constrained in a fixed interval and its initial guess is kept the same across runs.

V Variogram Modeling

In this section we estimate the variogram model that will be used for ordinary kriging interpolation and resources estimation. We use the robust variogram estimator Cressie and Hawkins (1980) for the empirical variogram of coal thickness which we then fit to five common variogram models.

The robust variogram estimator is given by

$$2\hat{\gamma}(\mathbf{h}) = \frac{\left[\dfrac{1}{N(\mathbf{h})}\displaystyle\sum_{(i,j)\,\in\, S(\mathbf{h})} \left|z(\mathbf{s}_i) - z(\mathbf{s}_j)\right|^{1/2}\right]^{4}}{0.457 + \dfrac{0.494}{N(\mathbf{h})}}, \qquad (14)$$

where $\mathbf{h}$ is the distance (lag) vector between two points, $S(\mathbf{h})$ is the set of point pairs separated by $\mathbf{h}$, and $N(\mathbf{h})$ is the number of pairs separated by $\mathbf{h}$.
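A compact Python sketch of the Cressie-Hawkins estimator in (14) for an omnidirectional, binned empirical variogram is given below; the bin edges and the synthetic data are illustrative, and the implementation is ours rather than the one used in the paper.

```python
import numpy as np
from scipy.spatial.distance import pdist

def robust_variogram(coords, z, bin_edges):
    """Omnidirectional Cressie-Hawkins robust variogram estimator, eq. (14)."""
    d = pdist(coords)                           # pairwise distances
    dz = pdist(z[:, None], metric="cityblock")  # pairwise |z_i - z_j|
    gamma, lags = [], []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        mask = (d >= lo) & (d < hi)
        n = mask.sum()
        if n == 0:
            continue
        fourth = np.mean(np.sqrt(dz[mask])) ** 4
        gamma.append(0.5 * fourth / (0.457 + 0.494 / n))
        lags.append(0.5 * (lo + hi))
    return np.array(lags), np.array(gamma)

rng = np.random.default_rng(3)
coords = rng.uniform(0, 10_000, size=(400, 2))
z = 7.8 + 3.0 * np.sin(coords[:, 0] / 3000) + rng.normal(0, 1.0, 400)
lags, gamma = robust_variogram(coords, z, np.linspace(0, 5000, 11))
print(np.c_[lags, gamma])
```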

We used (14) to calculate the omnidirectional variogram. We then tested five different theoretical variogram models (Spherical, Cubic, Gaussian, Power, and Exponential) combined with a nugget term. The fits of the theoretical models to the empirical variogram are presented in Fig. 6. The best fitting model was selected by minimizing the sum of weighted squared errors given by (15). The sum of weighted squared errors for the five variogram models tested are shown in Table 6. The spherical model is shown to minimize the sum of weighted squared errors. The optimal parameters for the spherical variogram model are shown in Table 7.

(15)
Variogram Model Gaussian Cubic Spherical Power Exponential
Fitting Error 11.55 2.67 1.92 6.79 3.37
Table 6: Sum of weighted squared errors between the empirical variogram and the five theoretical models that are tested based on equation (15) (units are in m⁴).
Figure 6: Robust variogram estimator (circles) and optimal fits to theoretical models (Gaussian, Spherical, Power, Exponential, and Cubic). The horizontal axis is the lag distance. The vertical axis represents the variogram values of coal thickness.
Correlated variance (m²)   Correlation length (km)   Nugget variance (m²)
8.05   29.1   1.07
Table 7: Parameters of the optimal spherical variogram model. The parameters are estimated by minimizing the error function (15).
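The model selection step can be reproduced with a weighted least-squares fit of a spherical-plus-nugget variogram to the empirical estimates. Since the weights in (15) are not reproduced above, the sketch below uses the number of pairs per lag bin as weights, which is one common choice and should be treated as an assumption; the empirical values and starting guesses are illustrative.

```python
import numpy as np
from scipy.optimize import least_squares

def spherical_variogram(h, c0, c, xi):
    """Spherical variogram with nugget c0, partial sill c, and range xi."""
    h = np.asarray(h, dtype=float)
    s = np.where(h < xi, 1.5 * h / xi - 0.5 * (h / xi) ** 3, 1.0)
    return c0 + c * s

# illustrative empirical variogram: lag centres (km), values (m^2), pair counts
lags = np.array([2.0, 6.0, 10.0, 14.0, 18.0, 22.0, 26.0, 30.0, 34.0])
gamma = np.array([2.1, 3.4, 4.6, 5.9, 6.9, 7.7, 8.4, 8.9, 9.1])
npairs = np.array([900, 2500, 4100, 5200, 6100, 6600, 7000, 7200, 7100])

def residuals(params):
    c0, c, xi = params
    # weighted residuals: weights proportional to the number of pairs (assumed choice)
    return np.sqrt(npairs) * (gamma - spherical_variogram(lags, c0, c, xi))

fit = least_squares(residuals, x0=[1.0, 7.0, 25.0],
                    bounds=([0.0, 0.0, 1.0], [np.inf, np.inf, 100.0]))
print("nugget, partial sill, range:", fit.x)
```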

The optimality of the spherical function for both the variogram model and the SLI spherical kernel is coincidental. The spherical variogram model implies a spherical covariance, while the spherical kernel in SLI enters in the precision matrix. Finally, we used the omnidirectional variogram estimator and did not consider anisotropic variogram models. We estimated geometric anisotropy using the method of covariance Hessian identity Hristopulos (2002); Chorti and Hristopulos (2008). This led to an estimate of the correlation length ratio along the two principal directions slightly less than two, which is considered as marginally isotropic.

VI Estimation of Coal Resources

In this section we estimate the coal resources of Gillette field using SLI and ordinary kriging. Our goal is to compare the SLI predictions with those of ordinary kriging. This comparison allows assessing the interpolation performance of SLI against an established stochastic method. The estimate of the resources is based on estimated coal thickness values at the nodes of an interpolation grid. The interpolation domain (contained inside the red-line border in Fig. 1) is covered by a map grid comprising orthogonal (approximately square) cells. The cell size is 114.7 m × 134.7 m.

VI.1 SLI Estimation of Coal Resources

The coal deposits in the upper part of the Fort Union Formation in Gillette field are estimated by SLI interpolation at a total volume equal to

$$V_{\mathrm{SLI}} = A_{c} \sum_{p=1}^{P} \hat{z}(\mathbf{s}_p), \qquad (16)$$

where $A_c$ is the surface area of the interpolation grid cell and $\hat{z}(\mathbf{s}_p)$ the estimated coal thickness for the layers described in Section III at the center points of the grid cells. These estimates are based on the spherical kernel function and the model parameters of Table 3.

The SLI method estimates both the most probable coal thickness at each point and the respective prediction variance. The coal thickness prediction map is shown in Fig. 7(a). The standard deviation of the SLI prediction, i.e., the square root of the conditional variance of the SLI predictor given by (12), is shown in Fig. 7(b).

(a) SLI Map
(b) SLI StD
Figure 7: SLI estimates of coal thickness and standard deviation for the upper part of Fort Union Formation in Gillette field.

An interesting feature of the SLI standard deviation map is the low uncertainty values near the northeastern border of the study area, which is a region of low sampling density. However, as we show in Fig. 8, both the bandwidth and the number of SLI neighbors (i.e., sampling points that are coupled with the prediction point via the SLI model) are rather high (i.e., more than 600) near the boundary. Hence, the uncertainty in this area is constrained by a large, albeit distant, number of neighbors. The bandwidth for each prediction point is determined via the optimization process based on the neighbor order, the kernel function, and the parameter $\mu$. The bandwidth values can thus vary considerably. This is in contrast with fixed-radius kriging, which maintains a constant search radius for all the prediction points. The number of SLI neighbors for each prediction point is determined by the local bandwidth and the sampling point configuration. An inspection of the SLI standard deviation map (Fig. 7(b)) and the map of local neighbors (Fig. 8(b)) shows that the number of neighbors is correlated with the SLI standard deviation. The SLI uncertainty seems to behave more like the standard deviation derived from several stochastic simulations, in which the uncertainty is proportional both to the data spacing and to the complexity in the fluctuations, than like the kriging standard deviation.

(a) Log bandwidth
(b) Local neighbors
Figure 8: (Left): Logarithm of the SLI bandwidth over the map area. (Right): Logarithm of number of neighbors per map grid point.

VI.2 OK Estimation of Coal Resources

For the estimation of the coal resources in the study area and the visualization of the spatial distribution of coal thickness, we use an OK interpolation grid identical with the SLI grid. We tested four different circular search neighborhoods. The respective search radii are equal to 1.5 km, 5.7 km, 9.1 km and 14.6 km. These values were guided by the correlation length of the selected spherical model (29.1 km; see Table 7). We implemented LOO-CV for each search radius, using the same variogram parameters (shown in Table 7).

Results for cross validation measures obtained with the search radii of 9.1 km and 14.6 km are shown in Table 8. There is no significant difference between the CV measures obtained with these radii and those obtained with radii of 1.5 km and 5.7 km. For the purpose of mapping the coal thickness we use the radius of 14.6 km, because it allows almost full coverage of the map grid. The resulting kriging maps of coal thickness and its standard deviation are shown in Figure 9. Note that the colorbar scale next to the OK standard deviation extends to values less than 1 m, i.e., lower than the nugget standard deviation (≈ 1.03 m; cf. Table 7). This is due to using common color scales for the OK and SLI maps. The values of the OK standard deviation are actually larger than the nugget standard deviation.

Figure 9: (a) Map of coal thickness based on ordinary kriging. (b) Map of the OK standard deviation. Both maps are obtained using the spherical variogram model with the optimal parameters (see Table 7) and a search neighborhood of 14.6 km.

Based on ordinary kriging, the coal deposits are estimated at approximately 72.4 × 10⁹ m³, using the spherical variogram model and the model parameters of Table 7 with a 14.6 km search neighborhood. The SLI prediction takes approximately 12 min, while kriging with a search radius of 9.1 km (14.6 km) takes approximately 42.7 min (319.2 min).

VI.3 Method Comparisons

In this section we compare the performance of SLI and OK interpolation in terms of (i) the LOO-CV error of the coal thickness estimates, (ii) various cross validation statistics, and (iii) the interpolated maps of coal thickness.

First, we focus on the leave-one-out cross validation error, which is given by

$$\varepsilon(\mathbf{s}_i) = z(\mathbf{s}_i) - \hat{z}_{-i}(\mathbf{s}_i), \quad i = 1, \ldots, N,$$

where $\hat{z}_{-i}(\mathbf{s}_i)$ is the SLI (OK) estimate at the point $\mathbf{s}_i$ based on the remaining $N-1$ points and the parameter vector estimate derived from the full set of $N$ points.

The histograms of the LOO-CV errors are shown in Fig. 10 and exhibit overall good agreement. The error distributions are symmetric, reflecting the unbiasedness of the SLI and OK predictors. The vast majority of the error values are distributed between −5 m and 5 m. However, there are a few error values with magnitudes between 5 m and 10 m, as well as some values with larger magnitude.

Figure 10 also exhibits box plots of the LOO-CV errors. The central mark of the box plots is the median of the empirical distribution, and the edges of the box correspond to the 25th and 75th percentiles. The whiskers extend to the most extreme values that are not considered outliers. Values are considered as outliers if they lie outside the range $[q_{25} - 1.5\,(q_{75} - q_{25}),\; q_{75} + 1.5\,(q_{75} - q_{25})]$, where $q_{25}$ and $q_{75}$ are the 25th and 75th percentiles of the sample data. This range corresponds to the 99.3% probability interval for the normal distribution. Outliers are plotted as individual points (marked by crosses). A comparison of the two box plots shows that the SLI error is less dispersed, while the OK error has more weight in the tails of the distribution, including a couple of values with magnitudes above 15 m.

Figure 10: Left: Histograms of the SLI (blue online) and OK (yellow online) leave-one-out cross validation errors. The SLI parameters are given in Table 3. The spherical variogram parameters used in OK are given in Table 7. A kriging search radius of 9.1 km is used. Right: Box plots of the SLI and OK errors (see text for definition of the box plot elements).

Next, we compare the predictive performance of SLI and OK in Table 8 by means of cross-validation statistics. The following cross-validation measures are considered:

  1. Mean Error (ME) to represent bias.

  2. Mean Absolute Error (MAE).

  3. Root Mean Square Error (RMSE).

  4. Pearson’s linear correlation coefficient (R).

  5. Maximum Absolute Error (MaxAE).

  6. Mean Absolute Relative Error (MARE).

  7. Root Mean Square Relative Error (RMSRE).

The SLI estimates have lower mean absolute error, root mean square error, maximum absolute error and higher correlation coefficient than the kriging estimates. On the other hand, kriging shows less bias, and lower relative MAE and RMSE.
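For reference, the validation measures listed above can be computed as follows; this is a generic sketch with our own function names, not the authors' MATLAB code, and it assumes the usual textbook definitions of the relative errors (absolute error divided by the observed value).

```python
import numpy as np

def validation_measures(z_true, z_pred):
    """ME, MAE, RMSE, Pearson R, MaxAE, MARE, RMSRE between observations and estimates."""
    z_true, z_pred = np.asarray(z_true, float), np.asarray(z_pred, float)
    err = z_pred - z_true
    rel = err / z_true                      # relative errors (assumes z_true != 0)
    return {
        "ME": err.mean(),
        "MAE": np.abs(err).mean(),
        "RMSE": np.sqrt((err**2).mean()),
        "R": np.corrcoef(z_true, z_pred)[0, 1],
        "MaxAE": np.abs(err).max(),
        "MARE": np.abs(rel).mean(),
        "RMSRE": np.sqrt((rel**2).mean()),
    }

rng = np.random.default_rng(4)
z_true = rng.uniform(1.0, 20.0, 1000)
z_pred = z_true + rng.normal(0.0, 1.5, 1000)
print(validation_measures(z_true, z_pred))
```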

Method   ME (m)   MAE (m)   MARE   RMSE (m)   RMSRE   MaxAE (m)   R
OK (9.1 km S.R.) 4.0224 1.1788 0.2480 1.8067 0.8312 16.7286 80.27%
OK (14.6 km S.R.) 4.0573 1.1787 0.2480 1.8066 0.8313 16.7290 80.27%
SLI ($k = 3$) 5.6420 1.1199 0.2480 1.7033 0.8527 14.8781 82.23%
Table 8: LOO-CV measures of coal thickness based on (i) ordinary kriging with the spherical variogram model and (ii) SLI with the spherical kernel and neighbor order $k = 3$. The variogram parameters are given in Table 7 and the SLI parameters are given in Table 3. A kriging search radius equal to 9.1 km is used. The following validation measures are shown: ME: mean error; MAE: mean absolute error; MaxAE: maximum absolute error; MARE: mean absolute relative error; RMSE: root mean square error; RMSRE: root mean square relative error; R: Pearson’s correlation coefficient. S.R.: search radius.

Finally, the difference between the SLI and OK estimates of coal thickness over the map grid is illustrated in Fig. 11. The four intervals used are defined as follows: OK > SLI corresponds to differences below the lowest cutoff (higher OK estimates), OK ≈ SLI includes differences up to 1 m, SLI > OK contains differences between 1 m and 3.5 m, and SLI ≫ OK differences higher than 3.5 m. The difference between the SLI and OK predictions ranges from negative values (higher OK estimates) up to 9.67 m (higher SLI estimates). Figure 11(a) shows that the SLI estimate of coal thickness at most map grid locations is approximately equal to the OK estimate. At locations where this is not true, SLI estimates tend to exceed the OK estimates more frequently than the opposite. This leads to the overall higher SLI-based estimate of resources relative to OK.

Regarding the spatial distribution of prediction uncertainty, Fig. 11(b) shows that the SLI standard deviation in most places exceeds the OK standard deviation. This is not surprising, since the OK predictor is constructed based on the minimization of the mean error variance, while the SLI method does not constrain the variance. At the same time, there are spatial pockets where the SLI uncertainty is lower than the respective OK value. A juxtaposition of the plots in Fig. 11(b) and Fig. 8 shows that areas with lower SLI variance (e.g., near the northeastern border of the study domain) coincide with areas that have high bandwidth and, especially, a high number of SLI neighbors. The presence of many neighbors tends to constrain the uncertainty, since the precision matrix element $J_{p,p}$ then includes a higher number of positive terms, according to (11a); this tends to increase $J_{p,p}$ and thus, according to (12), reduce the variance.

(a) Prediction differences
(b) Std differences
Figure 11: (a) Map of the differences between the SLI and OK estimates of coal thickness (m). (b) Map of the differences between the SLI and OK standard deviations of coal thickness (m). SLI maps are obtained using the spherical kernel and neighbor order $k = 3$ (see Table 4). Kriging maps are obtained using the spherical variogram model with the optimal parameters (see Table 7) and a search neighborhood of 14.6 km.

VII Cross Validation Analysis

In this section we use the method of cross validation Arlot et al. (2010); Chilès and Delfiner (2012) with two goals in mind: (i) to investigate the stability of the SLI model parameter estimates and (ii) to better assess the predictive performance of SLI and kriging.

In cross validation studies the dataset is split in two disjoint sets: the training set is used to estimate the model parameters, while the test set includes the points where the model performance is assessed. The performance of the model is quantified by calculating a validation statistic (see below) that measures the difference between the data values in the test set and those estimated by means of the model. The split of the dataset is performed multiple times, resulting in different configurations for the test and training sets. This leads to a set of validation statistics which can be averaged over all cross-validation configurations, thus providing a statistical measure of the model’s performance.

Herein we use leave-P-out cross validation (LPO-CV), where $P$ is an integer number which determines how many points are contained in the test set, the remaining $N - P$ points being contained in the training set. Since the Campbell dataset includes a large number of data points, we use $P = \lfloor rN \rfloor$, where $\lfloor \cdot \rfloor$ is the floor function which returns the greatest integer that does not exceed $rN$, and $r$ is a rate that determines which percentage of the data is used to test the model. In the current study we use 100 repetitions (folds) with a small training set, $r = 0.9$ (i.e., we keep ten percent of the data in the training set), and with a large training set, $r = 0.1$ (i.e., we keep ninety percent of the data in the training set). The test and training sets are randomly selected, and they are used in cross validation analysis with both kriging and SLI.
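A minimal sketch of the random leave-P-out splits described above (100 folds, with the training fraction as a parameter) is shown below; it only produces index sets and leaves the SLI/OK estimation to the earlier sketches.

```python
import numpy as np

def lpo_folds(n_points, train_fraction=0.1, n_folds=100, seed=0):
    """Yield (train_idx, test_idx) pairs for random leave-P-out cross validation,
    where the test set contains the points not selected for training."""
    rng = np.random.default_rng(seed)
    n_train = int(np.floor(train_fraction * n_points))
    for _ in range(n_folds):
        perm = rng.permutation(n_points)
        yield perm[:n_train], perm[n_train:]

# example: 100 folds with 10% of 11416 points in each training set
folds = list(lpo_folds(11416, train_fraction=0.1, n_folds=100))
print(len(folds), folds[0][0].size, folds[0][1].size)
```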

VII.1 SLI Parameter Stability Analysis

We investigate the dependence of the SLI parameters on the number and the configuration of the sampling points. We are motivated by the observations in Section IV.3, where multiple local optima with very similar values of the cross-validation cost function seem to exist. The parameter $\mu$ is not further investigated, since its estimate is stable (cf. Section IV.3). The SLI parameter estimates are based on the spherical kernel function with neighbor order $k = 3$. The initial guesses for the parameter values are kept fixed across folds. The (local) constrained minimization algorithm is used.

The SLI parameters $\eta_1$ and $\lambda$ estimated for each of the 100 folds based on a small training set (1142 locations) are presented in Fig. 12. As evidenced in Fig. 12(a), 61 estimates of $\eta_1$ cluster within the same range of values; the remaining estimates (two, two, and 34, respectively) fall in ranges that differ from the main cluster by orders of magnitude. While this behavior may seem odd at first, the ratio $\eta_1/\lambda$ is approximately constant, as we show in Fig. 12(b). Note that similar behavior was observed in LOO-CV (see Section IV.3), albeit the value of the ratio was different. The value of the ratio in general depends on the configuration of the sampling points (cf. Table 5).

(a) $\eta_1$ estimates
(b) $\eta_1$ vs. $\lambda$
Figure 12: SLI $\eta_1$ and $\lambda$ parameters estimated from 100 small training sets that include 10% of the data. (a) Values of the $\eta_1$ parameter. (b) Scatter plot of $\eta_1$ versus $\lambda$ (logarithmic axes are used).

In order to develop some understanding of this behavior, let us recast the SLI predictive equations (10) and (11) as follows:

$$\hat{z}(\mathbf{s}_p) = m_z - \frac{\eta_1 \sum_{i=1}^{N} [\mathbf{J}_1]_{p,i}\,\left(z_i - m_z\right)}{1 + \eta_1 [\mathbf{J}_1]_{p,p}}, \qquad (17)$$

where $\mathbf{J}_1$ is the gradient precision sub-matrix of (6).

Equation (17) shows that the prediction is independent of $\lambda$. In addition, if $\eta_1 [\mathbf{J}_1]_{p,p} \gg 1$, the prediction essentially remains invariant if $\eta_1$ is replaced by $\alpha\,\eta_1$, since the factors $\alpha$ in the numerator and the denominator cancel out. Then, the SLI energy is also multiplied by $\alpha$. Finally, the new value of $\lambda$ according to (13) is also multiplied by $\alpha$, since it is proportional to the energy. These arguments show that different values of $\eta_1$ and $\lambda$ with constant ratio lead to nearly identical predictions, provided that the condition $\eta_1 [\mathbf{J}_1]_{p,p} \gg 1$ is fulfilled. Exact proportionality only holds if the above condition is fulfilled for all the prediction points.
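A toy numerical check of this argument, under the proportional precision structure assumed in our earlier sketches (so the coefficients are illustrative rather than the authors' exact ones), is given below: the prediction does not involve $\lambda$ at all, and it changes only marginally when $\eta_1$ is rescaled, provided $\eta_1 [\mathbf{J}_1]_{p,p} \gg 1$.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50
z = 7.8 + rng.normal(0.0, 2.0, n)
m = z.mean()

# J1 row for a hypothetical prediction point: graph-Laplacian-style entries
w = rng.uniform(0.0, 1.0, n) * (rng.uniform(0, 1, n) < 0.2)   # sparse positive weights
J1_pi = -w
J1_pp = w.sum()

def predict(eta1):
    # analogue of eq. (17): lambda cancels, only eta1 and the J1 entries enter
    return m - eta1 * (J1_pi @ (z - m)) / (1.0 + eta1 * J1_pp)

for eta1 in [1.0, 10.0, 100.0, 1000.0]:
    print(f"eta1 = {eta1:7.1f}  eta1*J1_pp = {eta1 * J1_pp:9.1f}  prediction = {predict(eta1):.4f}")
# as eta1*J1_pp grows, rescaling eta1 leaves the prediction essentially unchanged
```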

We repeat the same experiment using large training sets that include 90% (i.e., 10274) of the points. In this case the estimates of $\eta_1$ are close to the initial guess for all the subsets. Similarly, the $\lambda$ estimates lie within a narrow range. The results are shown in Fig. 13. In this case the estimates of $\eta_1$ are considerably more stable and there is no apparent trend. This is explained by the higher sampling density of the larger training sets, and by the fact that the training sets share many points (since 90% of the total number of points are used in each training set).

(a) $\eta_1$ estimates
(b) $\eta_1$ vs. $\lambda$
Figure 13: SLI $\eta_1$ and $\lambda$ parameters estimated from 100 large training sets that include 90% of the data. (a) Values of the $\eta_1$ parameter. (b) Scatter plot of $\eta_1$ versus $\lambda$.

VII.2 Comparison of Cross-Validation Statistics

In this section we compare the interpolation performance of SLI and OK in terms of the cross-validation statistics, using results from both the small and the large training sets. In the SLI analysis the spherical kernel with neighbor order $k = 3$ is used, while in OK the spherical model with nugget is fitted to the empirical variogram and a kriging search radius of 9.1 km is used to reduce the computational time. This radius is sufficient for cross-validation analysis.

In Fig. 14 we compare the box plots for ME, MAE, RMSE and R for the two methods based on cross validation analysis of small training sets. In addition, Fig. 15 extends the comparison to large training sets. The main patterns evidenced in the box plots between the SLI and the OK cross validation statistics are the following:

  1. The SLI MAE and RMSE have lower median values than the respective OK measures (for both small and large training sets).

  2. The correlation coefficient has a higher median value for SLI than for OK (also for both small and large training sets).

  3. In the case of small training sets, the ME median for OK is closer to zero (0.0055) than for SLI (0.0155). In addition, the inter-quartile range is higher for SLI (0.0898) than for OK (0.0770).

  4. This pattern is reversed for large training sets: the SLI-based ME median is closer to zero than the OK-based ME median, and the inter-quartile range is higher for OK than for SLI. So, it seems that SLI makes more efficient use of the higher sampling density.

Figure 14: Box plots of cross-validation statistics for the SLI and OK methods. The CV statistics are calculated using a training set that contains 10% of the data points and a validation set that comprises 90% of the points.
Figure 15: Box plots of cross-validation statistics for the SLI and OK methods. The CV statistics are calculated using a training set that contains 90% of the data points and a validation set that comprises 10% of the points.

VIII Conclusions

This paper presents a stochastic local interaction (SLI) model for spatial estimation which is based on a generalized concept of the square gradient for scattered data. The SLI spatial prediction is expressed, conditionally on the data, in terms of a sparse precision matrix which is demonstrably positive definite. The precision matrix allows fast estimation of the mean and variance of the conditional distribution at the prediction locations. The sparse SLI precision matrix is a feature analogous to that of Gaussian Markov random fields defined on regular grids. Hence, SLI extends the original GMRF idea by means of kernel functions that implement local interactions between the nodes of unstructured networks. This implementation is suitable for modeling scattered spatial data. In addition, SLI allows efficient processing of large datasets, because it incorporates spatial correlations in the form of sparse precision matrices. For example, in the current study less than 1% of the precision matrix elements are non-zero. The sparsity feature implies considerably reduced memory space requirements. In addition, matrix inversion is not required to derive the SLI estimates and their standard deviations at unmeasured locations, thus reducing the necessary computational time.

With respect to the analysis of the Campbell coal dataset, SLI gives comparable (slightly better) cross-validation statistics compared to those obtained with ordinary kriging with a finite search radius. In addition, SLI estimates a higher volume of coal in the study area than OK. Regarding the prediction uncertainty, the SLI estimates of conditional variance are overall higher than those of ordinary kriging. This is not surprising, since the SLI predictor is not based on optimization of the error variance. In terms of computational speed, filling the interpolation grid with the SLI predictor is about 3-25 times faster than with ordinary kriging (approximately 12 min with SLI, versus 42 min for OK with a search radius of 9.1 km and more than 5 hr with a search radius of 14.6 km). Overall, we have shown that SLI is a competitive method for the spatial interpolation of scattered data which can be applied to the estimation of natural resources, providing computational benefits for large datasets. We therefore believe that SLI will provide a useful tool for the spatial analysis of large datasets.

In terms of future research, anisotropic and non-stationary extensions of SLI are possible, as well as the development of SLI-based conditional simulation capabilities that can take advantage of the sparse precision matrix structure. Geometric anisotropy can be handled using a directional measure of distance in the kernel functions. Non-stationarity can be addressed in two ways: firstly, by introducing a variable mean function $m_z(\mathbf{s})$ instead of a constant $m_z$ in the SLI energy (3), and secondly, by replacing the rigidity parameter $\eta_1$ with a rigidity function $\eta_1(\mathbf{s})$. The application of SLI to spatiotemporal data is also possible, and a first step in this direction is presented in Hristopulos and Agou (2019). More general SLI models with additional terms in the energy function and multiple variates can also be developed. Finally, the positive-definiteness of the precision matrix is maintained even if non-Euclidean distance measures are used in the kernel functions. This is a potential advantage compared to standard geostatistical methods which are based on covariance function models, since the latter are not necessarily permissible for non-Euclidean distance metrics.

Acknowledgment

We would like to thank Ricardo Olea (United States Geological Survey, Reston, Virginia, USA) who provided the coal data analyzed and graciously read a draft of this manuscript, offering valuable comments and insights.

References

  • Arlot et al. (2010) Arlot S, Celisse A, et al. (2010) A survey of cross-validation procedures for model selection. Statistics Surveys 4:40–79
  • Armstrong (1998) Armstrong M (1998) Basic Linear Geostatistics. Berlin, Germany: Springer
  • Chilès and Delfiner (2012) Chilès J P, Delfiner P (2012) Geostatistics: Modeling Spatial Uncertainty. New York, NY, USA: John Wiley & Sons, 2nd edition
  • Chorti and Hristopulos (2008) Chorti A, Hristopulos D T (2008) Non-parametric identification of anisotropic (elliptic) correlations in spatially distributed data sets. IEEE Transactions on Signal Processing 56(10):4738–4751
  • Christakos (1992) Christakos G (1992) Random Field Models in Earth Sciences. San Diego: Academic Press
  • Cressie (1993) Cressie N (1993) Spatial Statistics. New York, NY, USA: John Wiley & Sons
  • Cressie and Hawkins (1980) Cressie N, Hawkins M (1980) Robust estimation of the variogram: I. Mathematical Geology 12(2):115–25
  • Cressie and Johannesson (2008) Cressie N, Johannesson G (2008) Fixed rank kriging for very large spatial data sets. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 70(1):209–226
  • Dowd and Dare-Bryan (2007) Dowd P A, Dare-Bryan P C (2007) Planning, designing and optimising production using geostatistical simulation. In Dimitrakopoulos R, editor, Orebody Modelling and Strategic Mine Planning, Carlton, Victoria, Australia: The Australasian Institute of Mining and Metallurgy, Spectrum Series, 2nd edition, 363–378
  • Ellis (2002) Ellis M S (2002) Quality of Economically Extractable Coal Beds in the Gillette Coal Field as Compared With Other Tertiary Coal Beds in the Powder River Basin, Wyoming and Montana. Technical Report No. 2002-174, U.S. Geological Survey
  • Ellis et al. (1999) Ellis M S, Flores R M, Ochs A M, Stricker G D, Gunther G L, Rossi G S, Bader L R, Schuenemeyer J H, Power H C (1999) Gillette Coalfield, Powder River Basin: Geology, Coal Quality, and Coal Resources. Technical report, U.S. Geological Survey Professional Paper 1625-A, Chapter PG
  • Furrer et al. (2006) Furrer R, Genton M G, Nychka D (2006) Covariance tapering for interpolation of large spatial datasets. Journal of Computational and Graphical Statistics 15(3):502–523
  • Hristopulos (2003) Hristopulos D (2003) Spartan Gibbs random field models for geostatistical applications. SIAM Journal on Scientific Computing 24(6):2125–2162
  • Hristopulos and Elogne (2007) Hristopulos D, Elogne S (2007) Analytical properties and covariance functions for a new class of generalized gibbs random fields. IEEE Transactions on Information Theory 53(12):4667–4679
  • Hristopulos (2002) Hristopulos D T (2002) New anisotropic covariance models and estimation of anisotropic parameters based on the covariance tensor identity. Stochastic Environmental Research and Risk Assessment 16(1):43–62

  • Hristopulos (2015a) Hristopulos D T (2015a) Covariance functions motivated by spatial random field models with local interactions. Stochastic Environmental Research and Risk Assessment 29(3):739–754
  • Hristopulos (2015b) Hristopulos D T (2015b) Stochastic local interaction (SLI) model: Bridging machine learning and geostatistics. Computers and Geosciences 85(Part B):26–37
  • Hristopulos and Agou (2019) Hristopulos D T, Agou V D (2019) Stochastic local interaction model with sparse precision matrix for space-time interpolation. arXiv preprint arXiv:1902.07780
  • Journel and Kyriakidis (2004) Journel A G, Kyriakidis P C (2004) Evaluation of Mineral Reserves: A Simulation Approach. Oxford, UK: Oxford University Press
  • Lasinio et al. (2013) Lasinio G J, Mastrantonio G, Pollice A (2013) Discussing the “big n problem”. Statistical Methods & Applications 22(1):97–112
  • Lindgren et al. (2011) Lindgren F, Rue H, Lindström J (2011) An explicit link between Gaussian fields and Gaussian Markov random fields: The SPDE approach. Journal of the Royal Statistical Society, Series B 73(4):423–498
  • Mussardo (2010) Mussardo G (2010) Statistical Field Theory. Oxford University Press
  • Nadaraya (1964) Nadaraya E A (1964) On estimating regression. Theory of Probability and its Applications 9(1):141–142
  • Nocedal and Wright (2006) Nocedal J, Wright S (2006) Numerical Optimization. New York, NY: Springer Science & Business Media, 2nd edition
  • Olea (2012) Olea R A (2012) Geostatistics for Engineers and Earth Scientists. New York, NY, USA: Springer Science & Business Media
  • Rivoirard (1987) Rivoirard J (1987) Two key parameters when choosing the kriging neighborhood. Mathematical Geology 19(8):851–856
  • Rue and Held (2005) Rue H, Held L (2005) Gaussian Markov Random Fields: Theory and Applications. Boca Raton, FL: Chapman and Hall/CRC
  • Stricker et al. (2007) Stricker G D, Flores R M, Trippi M H, Ellis M S, Olson C M, Sullivan J E, Takahashi K I (2007) Coal quality and major, minor, and trace elements in the Powder River, Green River, and Williston basins, Wyoming and North Dakota. Technical report, US Geological Survey
  • Sun et al. (2012) Sun Y, Li B, Genton M G (2012) Geostatistics for large datasets. In Porcu E, Montero J, Schlather M, editors, Advances and Challenges in Space-time Modelling of Natural Events, Lecture Notes in Statistics, Springer Berlin Heidelberg, 55–77
  • Ugray et al. (2007) Ugray Z, Lasdon L, Plummer J, Glover F, Kelly J, Martí R (2007) Scatter search and local NLP solvers: A multistart framework for global optimization. INFORMS Journal on Computing 19(3):328–340
  • Wackernagel (2003) Wackernagel H (2003) Multivariate Geostatistics. Berlin, Germany: Springer Verlag
  • Watson (1964) Watson G S (1964) Smooth regression analysis. Sankhya Ser A 26(1):359–372

  • Zhong et al. (2016) Zhong X, Kealy A, Duckham M (2016) Stream kriging: Incremental and recursive ordinary kriging over spatiotemporal data streams. Computers & Geosciences 90:134–143