I Introduction
Kriging and its various flavors are the golden rule in geostatistics Christakos (1992); Cressie (1993); Wackernagel (2003); Olea (2012); Chilès and Delfiner (2012) and in mining applications of geostatistics in particular Journel and Kyriakidis (2004); Armstrong (1998); Dowd and DareBryan (2007). However, the application of kriging to large datasets faces difficulties due to the computationally demanding inversion of the covariance matrix Sun et al. (2012); Zhong et al. (2016). Hence, various shortcuts and approximations have been developed that allow working with large datasets including the use of a finite search radius Rivoirard (1987), tapered covariance functions Furrer et al. (2006), and fixed rank kriging Cressie and Johannesson (2008). For reviews of the problem with large data see Sun et al. (2012); Lasinio et al. (2013). For data that are collected on rectangular grids there is an efficient solution which is based on Gaussian Markov random fields (GMRFs) Rue and Held (2005)
. However, this approach cannot be directly extended to scattered data, although it is possible to generalize GMRFs by linking them to stochastic partial differential equations
Lindgren et al. (2011).The motivation for stochastic local interaction (SLI) models is the possibility, known in the statistical field theories of physics and in Markov random field models, to construct spatial and spatiotemporal dependence based on energy functions with local interactions. In statistical physics this idea is present in the widelyused BoltzmannGibbs distributions that include the Gaussian field theory and the binary Ising model among others, e.g. Mussardo (2010).
Local interactions allow building sparse and explicit precision (inverse covariance) matrices for the spatial dependence. Hence, computationally fast algorithms can be designed for spatial prediction based on the conditional mean. The uncertainty of the estimates is easily assessed by means of the
conditional variance
which follows directly from the precision matrix. The explicit form of the precision matrix implies independence from the curse of covariance matrix inversion, which plagues the application of geostatistical methods to large datasets. Hence, such models can be used as computationally efficient alternatives to kriging.An SLI model that involves gradient and curvature terms was presented in Hristopulos (2015b). This model has a precision matrix that is not demonstrably nonnegative definite, even though it is in practice welldefined in most cases. Herein we present a simpler version of the model which does not include the curvature term and possesses a positive definite precision matrix.
The SLI model can be viewed as an extension of Gaussian Markov random fields (GMRFs) Rue and Held (2005) to scattered data by means of kernel functions. From a different viewpoint, the SLI model can be viewed as a discrete analogue of Spartan spatial random fields which are defined in continuum space Hristopulos (2003); Hristopulos and Elogne (2007); Hristopulos (2015a)
. In order to accommodate random sampling geometries, the SLI model incorporates ideas from machine learning such as the use of kernel functions. The range of the kernels is determined by a set of local bandwidths which is learned from the sample data. Hence, SLI prediction uses only a subset of the data for prediction at any given point. In kriging this restriction is imposed by empirically determining a search neighborhood by trial and error. In the case of SLI, the range of the kernel is automatically adjusted by the algorithm (see Section
II.3). An additional advantage of SLI over kriging is that the former does not require covariance matrix inversion to compute the prediction and prediction uncertainty.The remainder of this manuscript is structured as follows: Section II
presents the SLI methodology, including the SLI model, the equations for prediction and conditional standard deviation, and the leaveoneout cross validation procedure for model parameter estimation. In Section
III the Campbell coal dataset is briefly described. Section IV presents the SLI model parameter estimation, while Section V focuses on the estimation of the variogram for ordinary kriging. Section VI provides the SLIbased and krigingbased estimates of coal resources and the associated uncertainties. Section VII studies the stability of the SLI estimates and compares the SLI and ordinary kriging predictive performance using leavePout cross validation. Finally, in Section VIII we present our conclusions, discuss potential applications of the method and analogies with other approaches, and outline directions for future research.Ii Methodology
ii.1 Notation
In the following, denotes the sample values at the locations of the sampling set , where . The superscript denotes the matrix transpose. The prediction set involves the points which typically correspond to the nodes of a map grid . In general, and are disjoint sets. The SLI optimal prediction at these points will be denoted by . Assuming a constant mean, , fluctuations around the mean will be denoted by primes, i.e., . In case the mean is not homogeneous, a trend function is subtracted to obtain the fluctuations. The SLI model is defined is terms of the fluctuations. However, the constant mean or the trend function do not need to be known a priori, since they can be determined in the estimation stage.
ii.2 Kernel Functions
The SLI formulation uses ideas from kernel regression Nadaraya (1964); Watson (1964) in order to express the local spatial dependence. The interactions between neighboring points are expressed in terms of suitably selected weighting functions, which are supplied by kernel functions.
A kernel (weighting) function is a nonnegative function that assigns a real value to any pair of points and . The range of the kernel is determined by a characteristic length scale known as the bandwidth and denoted by . The bandwidth should tend to zero as , so that for large and dense datasets emphasis is placed on points that are close to the target site. For let represent the normalized distance. A kernel function is (a) nonnegative, i.e., for all ; (b) symmetric, i.e., for all ; (c) maximized at ; (d) a continuous function of .
A list of common kernel functions is given in Table 1. The first six functions of the table (triangular, Epanechnikov, quadratic, quartic, tricube, spherical, Cauchy) are compactly supported, while the last two functions (exponential, Gaussian) are infinitely extended but integrable.
ii.3 Kernel Weights
The kernel functions can be used to define weights for the interaction between two points and as follows
(1) 
where
is the distance vector, while
is a bandwidth specific to the point that reflects the sampling density around that point. As a result of the local bandwidth definition in (1), the kernel weights are not necessarily symmetric with respect to interchange of the indices, i.e., in general , since the local sampling density around the point can be quite different than the sampling density around .In general, one can consider different bandwidths. However, then the spatial model involves more parameters than data values, since the SLI model has other coefficients in addition to the bandwidths. It is better to define the local bandwidths in terms of a small set of free parameters. An intuitive and flexible choice is to use
(2) 
where is the bandwidth assigned to the point , is the distance between and its nearest neighbor in the sampling set , and is a positive bandwidth tuning parameter. The local distances can be efficiently computed using nearneighbor estimation algorithms, while the global coefficient becomes a model parameter that is determined from the data via the crossvalidation procedure as described in Section II.7. The above definition of the bandwidth implies that larger bandwidth values are assigned to points in areas where the sampling is sparse.
Name  Equation 

Triangular  
Epanechnikov  
Quadratic  
Quartic (biweight)  
Tricube  
Spherical  
Truncated Cauchy  
Exponential  
Gaussian 
ii.4 Gradientbased SLI Energy Function
The SLI model is defined in terms of a GibbsBoltzmann exponential joint density of the following form Chilès and Delfiner (2012)
where is the SLI parameter vector that needs to be determined from the data. The function is the energy of the SLI model which incorporates the local interactions. The denominator is known in physics as the partition function and represents a normalization constant Mussardo (2010). It will not be necessary to evaluate this constant in order to estimate the parameters or to predict at nonmeasured positions.
In the case of scattered data, a parsimonious SLI energy functional that includes squared fluctuation and squared “gradient” terms is given by Hristopulos (2015a)
(3) 
where is the vector of the local expected values (which is assumed constant herein), is a scale coefficient that is proportional to the overall variability of the data, is a positive rigidity coefficient that weighs the cost of gradients in the data, is the kernelaverage square gradient, is a vector of local kernel bandwidths that is used to determine the local bandwidths, and .
The gradientbased energy terms are given by the following kernel average
(4) 
ii.5 SLI Precision Matrix Formulation
The SLI energy function (3) is a quadratic function of the sample values. Hence, it can be equivalently expressed in the following form that involves the precision matrix :
(5) 
The precision matrix has the following structure
(6a)  
and depends on the reduced parameter vector . In the above, is the identity matrix which results from the energy of the squared fluctuations, and is the gradient precision submatrix which results from the sum of the squared differences and is given by 
(6b) 
where is an abbreviation of the kernel weights (1) and if while if is the Kronecker delta.
Theorem 1.
The SLI precision matrix defined by (6) is strictly positive definite for and . Moreover, it is diagonally dominant.
Proof.
The matrix is positive definite if it is realvalued and symmetric, every nonzero and every it holds that The matrix defined by (6) is realvalued and symmetric by construction. In addition, if we set , it follows from (5) that . This equation holds for any sample and for any , thus for all possible . However, the energy is equivalently expressed based on (3) as follows
It is straightforward to show based on (3) and (4) that since , the energy is strictly positive for any nonzero vector . If all the entries of the vector take the same value, the gradient term, which is proportional to , vanishes; however, the first term is still positive. Hence, the matrix is strictly positive definite.
Diagonal dominance requires that for all the following condition be satisfied
The diagonal dominance condition is stricter than positive definiteness. Hence, the former implies the latter (but not vice versa). From (6) it follows that
(7a)  
(7b) 
Based on the above and the nonnegativity of the weights, i.e., for all , it follows that
Hence the diagonal dominance of the SLI precision matrix is proved.
∎
ii.6 SLI Prediction
In this section we show how to use the SLI model to estimate the unknown values of the field at the sites . Let us consider that one the prediction points is added to the sampling network. The SLI energy function is then modified as follows:
(8) 
The first term in (8) is the sample energy (3), while the second and third terms represent the additional energy contribution that involves : the second term is the selfinteraction of the prediction point which contributes proportionally to the square of the fluctuation, and the third term involves the interaction of the prediction point with the sampling points. The latter represents local gradients that reflect differences between the fluctuation at and neighboring sample points.
The energy (8
) determines the conditional probability density function at the prediction point, which is given by
(9) 
The optimal SLI prediction at is equal to the value that minimizes the energy (8) and thus maximizes the conditional pdf. This value is a stationary point of the energy, i.e., it satisfies
where denotes the optimal parameter vector (excluding ) which is estimated from the data as shown in Section II.7. For now we assume that is known.
The first term in (8) excludes the prediction point and thus it drops out in the minimization. The firstorder derivatives of the next two energy terms lead to a linear equation for the stationary point, which admits the following unique solution
(10) 
In (10) we used the symmetry of the precision matrix, i.e., the fact that . The precision matrix entries that involve the prediction points are obtained from (7) as follows
(11a)  
(11b) 
In light of the kernel weight definition (1), the summands in (11a) are either zero (if the distance between and exceeds the local bandwidth) or positive (otherwise).
The predictor (10) is a stationary point of the energy. It remains to verify that it represents the minimum so that the prediction corresponds to the mode of the conditional pdf (9). The secondorder derivative of the energy (8) with respect to is given by . According to (11) it holds that since and the kernel weights are positive. Hence the predictor indeed corresponds to the global minimum of the energy.
Properties of the SLI predictor

Note that the prediction equation (10) is analogous to the equation that determines the conditional mean in GMRFs. This is not surprising since the SLI predictor finds the minimum energy at the prediction point, conditionally on the sample values.

The SLI predictor is unbiased as can be seen from (10): the second term on the right hand side, which is proportional to the fluctuations, vanishes upon calculating the ensemble average.

The SLI predictor is not an exact interpolator. According to (10), exactitude requires that if then , for all . This could only occur if the bandwidth tends to zero, or if it is sufficiently small to exclude all other points besides for all . However, the difference between the SLI prediction and the true value is usually negligible. This is due to the fact that the sample value at the nearest point to the prediction location has the largest linear SLI weight and hence the most impact on the prediction.

The prediction is independent of the scale coefficient . This becomes obvious upon inspection of (10), since the prediction involves the ratio of precision matrix elements.

The analogy of SLI with GMRFs allows us to calculate the conditional variance at the prediction point as
(12) The conditional variance endows the SLI method with a measure of prediction uncertainty which is analogous to the kriging variance. Just like the kriging variance, the SLI conditional variance does not explicitly depend on the data values. It only depends indirectly to the extent that the data influence the model parameters.
Computational efficiency:
The numerical cost of the prediction equation (10) is proportional to . Since the predictor is applied times to generate the map grid, the numerical complexity is . According to (11) the numerical complexity of calculating the precision matrix is per prediction point; therefore the overall complexity is again . However, the kernel weights (1) involve a double summation over the sampling points which has a computation complexity , where . The maximum complexity rate, , is obtained for infinitesupport kernels. For compactly supported kernels, rates can be achieved. In general, the upper bound of the SLI computational complexity is . The memory space required for storage of the precision matrix scales as , where is the sparsity of the precision matrix; for compactly supported kernels typically .
Kriging has a numerical complexity for the inversion of the covariance matrix, and for the calculation of the kriging weights and the prediction Zhong et al. (2016). The kriging complexity is practically dominated by the inversion of the covariance matrix. In addition, kriging requires the storage of the covariance matrix which consumes of memory space. The computational requirements of kriging can be reduced by introducing various modifications, e.g., using search neighborhoods with a small number of sampling points around each prediction point Olea (2012) or by means of tapered covariance functions that have compact supports Furrer et al. (2006).
ii.7 SLI Parameter Estimation
The SLI parameter vector is given by . In addition to the data, the estimated also depends on the kernel function. We use the following approach to estimate the SLI model parameters from the data:

The mean is set equal to the sample average. This is a reasonable choice in the absence of spatial trends, especially for large datasets. However, it is also possible to consider as a parameter to be determined from the data by means of cross validation as shown by Hristopulos and Agou (2019).

The order of the nearneighbor used to estimate the local bandwidths according to (2) is typically assigned an integer value determined by experimentation with the data. Herein we use ; we justify this choice below.

The optimal rigidity coefficient and global bandwidth parameter are determined by means of leaveoneout (LOO) crossvalidation (CV). In LOO CV sample points are removed one at a time. Their values are then estimated by means of the remaining sample points and the SLI prediction equation (10). After all the points are removed and estimated, a specified validation measure (e.g., mean absolute error, root mean square error) that quantifies the discrepancy between the estimates and the true values is calculated. The validation measure is used as the cost function for the optimization of the SLI parameters.

The optimal value of the scale coefficient is determined by means of Hristopulos (2015b)
(13) The value of is only used in determining the conditional variance but not for prediction.
The above procedure is repeated for different choices of the kernel function, and the kernel model which optimizes the cost function is selected.
Maximum likelihood estimation (MLE) is also possible within the SLI framework. However, in using MLE we are forced to estimate the partition function, , which involves calculating the determinant of the large, albeit sparse, precision matrix .
Iii Description of the Data and Study Area
We apply the SLI model to a set of coal thickness data from 11416 drillholes. The data come from the Gillette field in Campbell County, Wyoming (USA). The locations of the drillholes are shown in Fig. 1. Campbell County lies within the Powder River Basin and includes two of the largest coal mines in the world: North Antelope Rochelle coal mine and Black Thunder coal mine. The Gillette coal field contains significant quantities of exploitable coal resources that are low in sulfur and ash content.
The study domain extends over an area that covers approximately 9500 km^{2}. We focus on coal beds from the upper part of Fort Union Formation which include the Roland bed and coal in the WyodakAnderson zone. The stratigraphic crosssection showcasing the WyodakAnderson coal zone for part of the Gillette Field is shown in Fig. 2. The geological formations present in the Gillette field are shown in Fig. 3 Ellis et al. (1999). Coal from the Wyodak zone has higher mean gross calorific value (4760 kcal/kg), lower mean ash yield (5.8%), and lower mean total sulfur content (0.46%) compared with the respective values for the combined WyodakAnderson coal zones Ellis (2002).
The main statistics of the marginal distribution of the coal thickness data are listed in Table 2
. The frequency histogram which represents the empirical probability distribution of the thickness values is shown in Fig.
4. Both the statistics shown in Table 2 and the histogram in Fig. 4show evidence of mild deviation from the normal (Gaussian) distribution. The spatial distribution of the drillholes, and hence the sampling density, varies significantly in space as evidenced in Fig.
1. The median of the nearest distance between drillholes is 458 m.Mean (m)  Median (m)  Maximum (m)  Minimum (m)  Standard deviation (m)  Skewness 
7.83  7.71  22.67  0.33  2.99  0.13 
All the numerical tests below are conducted and timed on a desktop computer running Windows 10 operating system equipped with Intel(R) Core(TM) i76700K CPU 4.00GHz and 16.0 GB RAM installed. The numerical implementation was performed in the MATLAB programming environment (release R2016a).
Iv SLI Model Estimation for Gillette Field
To use SLI for interpolation we first need to estimate the model parameters from the Gillette field coal data. We perform this task using LOOCV as described in Section II.7.
Kernel weighting functions are utilized in the SLI predictions as explained in Section II.2 to determine the weights given by equation (1).
To find the optimal kernel function we use LOOCV on the entire dataset and select the kernel function that optimizes the CV cost function. Herein we use the mean absolute error (MAE) as the cost function. The optimization is performed in the Matlab environment using the constrained minimization function fmincon
with the interiorpoint algorithm Nocedal and Wright (2006).
iv.1 Kernel function selection
We compare the predictive performance of different compactly supported kernels from the list shown in Table 1. Compact kernels lead to sparse precision matrices (see Section II.5) that are computationally less intensive for big datasets. The comparison is based on the mean absolute error obtained by means of leaveoneout cross validation. The SLI predictions are based on the equation (10).
Table 3 shows the crossvalidation MAE for the different kernel functions obtained using neighbor order to define the distances in equation (2). As is evidenced Table 3, the parameter that controls the gradient precision submatrix (6a) takes values close to 115 for all the kernels. The bandwidth control parameter shows more dependence on the kernel function. The MAE takes very similar values for all the studied kernels, ranging between m and m (truncating at two decimal points). Based on these results we select the spherical kernel which gives one of the lowest MAE (slightly less than m). The Epanechnikov kernel also returns practically the same MAE value, but a higher value for ; as the latter determines the bandwidth, higher values of imply increased memory requirements and computational time. In general, we expect that any of the kernels listed in Table 3 (except for the uniform kernel which has a slightly higher MAE) will lead to comparable performance.
Kernel  MAE (m)  

Uniform  
Triangular  
Quadratic  
Biweight  
Tricubic  
Epanechnikov  
Spherical 
Neighbor order  MAE (m)  

2  115.0003  2.6795  0.1671  
3  115.0003  2.3675  0.1727  1.1199 
4  115.0002  2.0518  0.1717  1.1188 
iv.2 Distribution of SLI bandwidths
The distribution of bandwidths for the spherical kernel is shown in the histogram plot of Fig. 5. The bandwidths are obtained from (2) using and . As evidenced in the histogram plot, the bandwidth distribution is dispersed with a peak at m; however, there are also values as large as m. The dispersion of the distribution reflects the variations of the sampling density (cf. Fig. 1) in Gillette field: Larger (smaller) values of the bandwidth are obtained in areas of sparser (denser) sampling. The choice of neighbor order is based on Table 4, which compares the parameter estimates and the MAE values obtained for three different neighbor orders: . The table shows that gives the lowest MAE; however, all the MAE are equal within the second decimal place. In order to be impartial in our selection we choose . As the neighbor order increases, the optimal decreases (see Table 4), thus tending to keep the value of the optimal bandwidth constant. This behavior further demonstrates the ability of SLI to adjust the bandwidth adaptively without user input.
iv.3 Parameter estimation using global optimization
Note that the estimated is very close to the initial guess . This result suggests that the optimization, which is performed by means of the constrained optimization Matlab function fmincon
with an interiorpoint algorithm, may be stuck in a local minimum. To investigate this possibility we use the global search optimization function in Matlab, which can find global solutions to problems that contain multiple maxima or minima. This procedure is time intensive, but it allows for a better exploration of the SLI parameter space.
We use the Matlab GlobalSearch
solver which generates a sequence of starting points (initial guesses) in combination with the interiorpoint local optimization routine Ugray et al. (2007). The global optimizer is run with different initial values and bounds for .
The results of global optimization for four different combinations of initial values and bounds for are reported in Table 5. All runs take several hours to complete, due to the exhaustive search of the parameter space by the global optimizer. As evidenced in Table 5 the global optimizer gives different solutions depending on the initial conditions. These solutions are in a different region of parameter space than the local optimum . However, all the solutions lead to very similar results for the cost function (i.e., the MAE). This investigation shows that different local minima of the SLI cost function are essentially identical with respect to prediction performance. We further discuss this issue in Section VII. Note that both the global and local solutions give essentially the same estimate for . At the same time, the ratio is also approximately the same for both the global ( 670) and the local ( 676) estimates. This observation is further analyzed and justified in Section VII.1.
Constraints and Initialization  MAE (m)  

[eps, ],  670.48  
,  7.3092  2.3743  1.0900  669.72  
[eps, ],  8.3285  2.3762  1.2427  1.1196  669.91 
,  8.2394  2.3743  1.2287  1.1196  670.58 
V Variogram Modeling
In this section we estimate the variogram model that will be used for ordinary kriging interpolation and resources estimation. We use the robust variogram estimator Cressie and Hawkins (1980) for the empirical variogram of coal thickness which we then fit to five common variogram models.
The robust variogram estimator is given by
(14) 
where is the distance (lag) vector between two points and is the number of pairs separated by .
We used (14) to calculate the omnidirectional variogram. We then tested five different theoretical variogram models (Spherical, Cubic, Gaussian, Power, and Exponential) combined with a nugget term. The fits of the theoretical models to the empirical variogram are presented in Fig. 6. The best fitting model was selected by minimizing the sum of weighted squared errors given by (15). The sum of weighted squared errors for the five variogram models tested are shown in Table 6. The spherical model is shown to minimize the sum of weighted squared errors. The optimal parameters for the spherical variogram model are shown in Table 7.
(15) 
Variogram Model  Gaussian  Cubic  Spherical  Power  Exponential 

Fitting Error  11.55  2.67  1.92  6.79  3.37 
(m^{2})  (km)  (m^{2}) 

8.05  29.1  1.07 
The optimality of the spherical function for both the variogram model and the SLI spherical kernel is coincidental. The spherical variogram model implies a spherical covariance, while the spherical kernel in SLI enters in the precision matrix. Finally, we used the omnidirectional variogram estimator and did not consider anisotropic variogram models. We estimated geometric anisotropy using the method of covariance Hessian identity Hristopulos (2002); Chorti and Hristopulos (2008). This led to an estimate of the correlation length ratio along the two principal directions slightly less than two, which is considered as marginally isotropic.
Vi Estimation of Coal Resources
In this section we estimate the coal resources of Gillette field using SLI and ordinary kriging. Our goal is to compare the SLI predictions with those of ordinary kriging. This comparison allows assessing the interpolation performance of SLI against an established stochastic method. The estimate of the resources is based on estimated coal thickness values at the nodes of an interpolation grid. The interpolation domain (contained inside the redline border in Fig. 1) is covered by a map grid comprising orthogonal (approximately square) cells. The cell size is 114.7 m 134.7 m.
vi.1 SLI Estimation of Coal Resources
The coal deposits in the upper part of Fort Union Formation in Gillette field are estimated by SLI interpolation at a total volume equal to
(16) 
where is the surface area of the interpolation grid cell and the estimated coal thickness for the layers described in Section III at the center points of the grid. These estimates are based on the spherical kernel function and the model parameters of Table 3.
The SLI method estimates both the most probable coal thickness at each point and the respective prediction variance. The coal thickness prediction map is shown in Fig. 6(a). The standard deviation of the SLI prediction, i.e., the square root of the conditional variance of the SLI predictor given by (12), is shown in Fig. 7(a).
An interesting feature of the SLI standard deviation map is the low uncertainty values near the Northeastern border of the study area which is a region of low sampling density. However, as we show in Fig. 8, both the bandwidth and the number of SLI neighbors (i.e., sampling points that are coupled with the prediction point via the SLI model) are rather high (i.e., more than 600) near the boundary. Hence, the uncertainty in this area is constrained by a large, albeit distant, number of neighbors. The bandwidth for each prediction point is determined via the optimization process based on the neighbor order, the kernel function, and the parameter . The bandwidth values can thus vary considerably. This is in contrast with fixedradius kriging, which maintains a constant search radius for all the prediction points. The number of SLI neighbors for each prediction point is determined by the local bandwidth and the sampling point configuration. An inspection of the SLI standard deviation map (Fig. 7(a)) and the map of local neighbors (Fig. 7(b)) shows that the number of neighbors is correlated with the SLI standard deviation. The SLI uncertainty seems to behave more like the standard deviation derived from several stochastic simulations, in which the uncertainty is proportional both to the data spacing and to the complexity in the fluctuations, than like the kriging standard deviation.
vi.2 OK Estimation of Coal Resources
For the estimation of the coal resources in the study area and the visualization of the spatial distribution of coal thickness we use an OK interpolation grid identical with the SLI grid. We tested four different circular search neighborhoods. The respective search radii are equal to 1.5 km, 5.7 km, 9.1 km and 14.6 km. These values were guided by the correlation length of the selected spherical model ( km). We implemented LOOCV for each search radius, using the same variogram parameters (shown in Table 7).
Results for cross validation measures obtained with the search radii of 9.1 km and 14.6 km are shown in Table 8. There is no significant difference between the CV measures obtained with these radii and those obtained with radii of 1.5 km and 5.7 km. For the purpose of mapping the coal thickness we use the radius of 14.6 km because it allows almost full coverage of the map grid. The resulting kriging maps of coal thickness and its standard deviation are shown in Figure 9. Note that the colorbar scale next to the OK standard deviation extends to values less than 1 m, i.e., lower than the nugget standard deviation ( m). This is due to using common color scales for the OK and SLI maps. The values of the OK standard deviation are actually larger than the nugget standard deviation.
Based on ordinary kriging the coal deposits are estimated at 72.4 m^{3}, using the spherical variogram model and model parameters of Table 7 with neighborhood 14.6 km. The SLI prediction takes 12 min, while kriging with a search radius of 9.1 km (14.6 km) takes 42.7 min ( 319.2 min).
vi.3 Method Comparisons
In this section we compare the performance of SLI and OK interpolation in terms of (i) the LOOCV error of the coal thickness estimates (ii) various cross validation statistics and (iii) the interpolated maps of coal thickness.
First, we focus on the leaveoneout cross validation error which is given by
where is the SLI (OK) estimate at the point based on the remaining points and the parameter vector estimate derived from the points.
The histograms of the LOOCV errors are shown in Fig. 10 and exhibit overall good agreement. The error distributions are symmetric, reflecting the unbiasedness of the SLI and OK predictors. The vast majority of the error values are distributed between 5 m and 5 m. However, there are a few error values with magnitudes between 5m and 10 m as well as some values with larger magnitude.
Figure 10
also exhibits box plots of the LOOCV errors. The central mark of the box plots is the median of the empirical distribution, and the edges of the box correspond to the 25th and 75th percentiles. The whiskers extend to the most extreme values that are not considered outliers. Values are considered as outliers if they lie outside the range
, where andare the 25th and 75th percentiles of the sample data. This range corresponds to the 99.3% probability interval for the normal distribution. Outliers are plotted as individual points (marked by crosses). A comparison of the two box plots shows that the SLI error is less disperse, while the OK error has more weight in the tails of the distribution, including a couple of values with magnitudes above 15.
Next, we compare the predictive performance of SLI and OK in Table 8 by means of crossvalidation statistics. The following crossvalidation measures are considered:

Mean Error (ME) to represent bias.

Mean Absolute Error (MAE).

Root Mean Square Error (RMSE).

Pearson’s linear correlation coefficient ().

Maximum Absolute Error (MaxAE).

Mean Absolute Relative Error (MARE).

Root Mean Square Relative Error (RMSRE).
The SLI estimates have lower mean absolute error, root mean square error, maximum absolute error and higher correlation coefficient than the kriging estimates. On the other hand, kriging shows less bias, and lower relative MAE and RMSE.
Method  ME (m)  MAE (m)  MARE  RMSE (m)  RMSRE  MaxAE (m)  

OK (9.1 km S.R.)  4.0224  1.1788  0.2480  1.8067  0.8312  16.7286  80.27% 
OK (14.6 km S.R.)  4.0573  1.1787  0.2480  1.8066  0.8313  16.7290  80.27% 
SLI ()  5.6420  1.1199  0.2480  1.7033  0.8527  14.8781  82.23% 
Finally, the difference between the SLI and OK estimates of coal thickness over the map grid is illustrated in Fig. 11. The four intervals used are defined as follows: OKSLI corresponds to differences less than m (higher OK estimates), OK=SLI includes differences between m and 1m, SLIOK contains differences between 1 and 3.5m, and SLIOK differences higher than 3.5m. The difference between the SLI and OK predictions ranges from m (higher OK estimates) to 9.67 m (higher SLI estimates). Figure 10(a) shows that the SLI estimate of coal thickness at most map grid locations is approximately equal to OK estimates. At locations where this is not true, SLI estimates tend to exceed the OK estimates more frequently than the opposite. This leads to the overall higher SLIbased estimate of resources ( versus based on OK).
Regarding the spatial distribution of prediction uncertainty, Fig. 10(b) shows that the SLI standard deviation in most places exceeds the OK standard deviation. This is not surprising, since the OK predictor is constructed based on the minimization of the mean error variance while the SLI method does not constrain the variance. At the same time, there are spatial pockets where the SLI uncertainty is lower than the respective OK value. A juxtaposition of the plots in Fig. 10(b) and Fig. 8 shows that areas with lower SLI variance (e.g., near the northeastern border of the study domain) coincide with areas that have high bandwidth and, especially, high number of SLI neighbors. The presence of many neighbors tends to constrain the uncertainty, since the precision matrix elements include a higher number of positive terms and , according to (11a); this tends to increase and thus, according to (12), reduce the variance.
Vii Cross Validation Analysis
In this section we use the method of cross validation Arlot et al. (2010); Chilès and Delfiner (2012) with two goals in mind: (i) to investigate the stability of the SLI model parameter estimates and (ii) to better assess the predictive performance of SLI and Kriging.
In cross validation studies the dataset is split in two disjoint sets: the test set is used to estimate the model parameters, while the training set includes the points where the model performance is tested. The performance of the model is quantified by calculating a validation statistic (see below) that measures the difference between the data values in the test set and those estimated by means of the model. The split of the dataset is performed multiple times resulting in different configurations for the test and training sets. This leads to a set of validation statistics which can be averaged over all crossvalidation configurations, thus providing a statistical measure of the model’s performance.
Herein we use leavePout cross validation (LPOCV), where is an integer number which determines how many points are contained in the test set and how many points are contained in the training set. Since the Campbell dataset includes a large number of data points, we use , where is the floor function which returns the greatest integer that does not exceed , and is a rate that determines which percentage of the data are used to test the model. In the current study we use 100 repetitions (folds) for a small training set with (i.e., we keep ten percent of the data in the test set) and with a large training set with . The test and training sets are randomly selected and they are used in cross validation analysis with both kriging and SLI.
vii.1 SLI Parameter Stability Analysis
We investigate the dependence of the SLI parameters on the number and the configuration of the sampling points. We are motivated by the observations in Section IV.3, multiple local optima with very similar values of the crossvalidation cost function seem to exist. The parameter is not further investigated since its estimate is stable (cf. Section IV.3). The SLI parameter estimates are based on the spherical kernel function with neighbor order . The initial guesses for the parameter values are and . The (local) constrained minimization algorithm is used.
The SLI parameters and estimated for each of the 100 folds based on a small training set (1142 locations) are presented in Fig. 12. As evidenced in Fig. 11(a), 61 estimates of have values in the range of . However, there are also two estimates of that are of order , two estimates in the range , 34 estimates of that are of order
. While this behavior may seem odd in the first place, the ratio of
is approximately constant, i.e., as we show in Fig. 11(b). Note that similar behavior was observed in LOOCV (see Section IV.3), albeit the value of the ratio was different (. The value of the ratio in general depends on the configuration of the sampling points (cf. Table 5).In order to develop some understanding for this behavior, let us recast the SLI predictive equations (10) and (11) as follows
(17) 
where
Equation (17) shows that the prediction is independent of . In addition, if , the prediction essentially remains invariant if is replaced by , since the factors in the numerator and the denominator cancel out. Then, SLI energy is also multiplied by . Finally, the new value of according to (13) is also multiplied with since it is proportional to the energy. These arguments show that different values of and with constant ratio lead to nearly identical predictions, provided that the condition is fulfilled. Exact proportionality only holds if the above condition is fulfilled for all the prediction points.
We repeat the same experiment using large training sets that include (i.e., ) of the points. In this case the estimates of are close to the initial guess for all the subsets. Similarly, estimates are in the range . The results are shown in Fig. 13. In this case the estimates of are considerably more stable and there is no apparent trend. This is explained by the higher sampling density of larger training sets, and the fact that the training sets share many points (since of the total number of points are used in each training set).
vii.2 Comparison of CrossValidation Statistics
In this section we compare the interpolation performance of SLI and OK in terms of the crossvalidation statistics, using results from both the small and large training sets. In the SLI analysis the spherical kernel with neighbor order is used while in OK the spherical model with nugget is fitted to the empirical variogram and a kriging search radius of 9.1 km is used to reduce computational time. This radius is sufficient for crossvalidation analysis.
In Fig. 14 we compare the box plots for ME, MAE, RMSE and R for the two methods based on cross validation analysis of small training sets. In addition, Fig. 15 extends the comparison to large training sets. The main patterns evidenced in the box plots between the SLI and the OK cross validation statistics are the following:

The SLI MAE and RMSE have lower median values than the respective OK measures (for both small and large training sets).

The correlation coefficient has a higher median value for SLI than for OK (also for both small and large training sets).

In the case of small training sets, the ME median for OK is closer to zero (0.0055) than for SLI (0.0155). In addition, the interquartile range is higher for SLI (0.0898) than for OK (0.0770).

This pattern is reversed for large training sets: the OKbased ME median is while the SLIbased ME median is . The interquartile range is higher for OK () than for SLI (). So, it seems that SLI makes more efficient use of the higher sampling density.
Viii Conclusions
This paper presents a stochastic local interaction (SLI) model for spatial estimation which is based on a generalized concept of the square gradient for scattered data. The SLI spatial prediction is expressed, conditionally on the data, in terms of a sparse precision matrix, which is demonstrably positive definite. The precision matrix allows fast estimation of the mean and variance of the conditional distribution at the prediction locations. The sparse SLI precision matrix is a feature analogous to that of Gaussian Markov random fields defined on regular grids. Hence, SLI extends the original GMRF idea by means of kernel functions that implement local interactions between the nodes of unstructured networks. This implementation is suitable for modeling scattered spatial data. In addition, SLI allows efficient processing of large datasets, because it incorporates spatial correlations in the form of sparse precision matrices. For example, for the current study the sparsity index is , i.e., less than 1% of the precision matrix elements are nonzero. The sparsity feature implies considerably reduced memory space requirements. In addition, matrix inversion is not required to derive the SLI estimates and their standard deviations at unmeasured locations, thus reducing the necessary computational time.
With respect to the analysis of the Campbell coal dataset, SLI gives comparable (slightly better) crossvalidation statistics compared to those obtained with Ordinary Kriging with a finite search radius. In addition, SLI estimates a higher volume of coal in the study area than OK. Regarding the prediction uncertainty, the SLI estimates of conditional variance are overall higher than those of Ordinary Kriging. This is not surprising, since the SLI predictor is not based on optimization of the error variance. In terms of computational speed, filling the interpolation grid with the SLI predictor is about 325 times faster than with ordinary kriging ( 12 min with SLI versus 42 min with a search radius of 9.1 km and 5 hr with a search radius of 14.6 km). Overall, we have shown that SLI is a competitive method for the spatial interpolation of scattered data which can be applied to the estimation of natural resources, providing computational benefits for large datasets. In conclusion, we believe that SLI will provide a competitive tool for spatial analysis of large spatial datasets.
In terms of future research, anisotropic and nonstationary extensions of SLI are possible, as well as the development of SLIbased conditional simulation capabilities that can take advantage of the sparse precision matrix structure. Geometric anisotropy can be handled using a directional measure of distance in the kernel functions. Nonstationarity can be addressed in two ways: firstly, by introducing a variable mean function instead of a constant in the SLI energy (3), and secondly by replacing the rigidity parameter with a rigidity function . The application of SLI to spatiotemporal data is also possible, and a first step in this direction is presented in Hristopulos and Agou (2019). More general SLI models with additional terms in the energy function and multiple variates can also be developed. Finally, the positivedefiniteness of the precision matrix is maintained even if nonEuclidean measures of distance are used in the kernel functions. This is a potential advantage compared to standard geostatistical methods which are based on covariance function models, since the latter are not necessarily permissible for nonEuclidean distance metrics.
Acknowledgment
We would like to thank Ricardo Olea (United States Geological Survey, Reston, Virginia, USA) who provided the coal data analyzed and graciously read a draft of this manuscript, offering valuable comments and insights.
References
References
 Arlot et al. (2010) Arlot S, Celisse A, et al. (2010) A survey of crossvalidation procedures for model selection. Statistics Surveys 4:40–79
 Armstrong (1998) Armstrong M (1998) Basic Linear Geostatistics. Berlin, Germany: Springer
 Chilès and Delfiner (2012) Chilès J P, Delfiner P (2012) Geostatistics: Modeling Spatial Uncertainty. New York, NY, USA: John Wiley & Sons, 2nd edition
 Chorti and Hristopulos (2008) Chorti A, Hristopulos D T (2008) Nonparametric identification of anisotropic (elliptic) correlations in spatially distributed data sets. IEEE Transactions on Signal Processing 56(10):4738–4751
 Christakos (1992) Christakos G (1992) Random Field Models in Earth Sciences. San Diego: Academic Press
 Cressie (1993) Cressie N (1993) Spatial Statistics. New York, NY, USA: John Wiley & Sons
 Cressie and Hawkins (1980) Cressie N, Hawkins M (1980) Robust estimation of the variogram: I. Mathematical Geology 12(2):115–25
 Cressie and Johannesson (2008) Cressie N, Johannesson G (2008) Fixed rank kriging for very large spatial data sets. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 70(1):209–226
 Dowd and DareBryan (2007) Dowd P A, DareBryan P C (2007) Planning, designing and optimising production using geostatistical simulation. In Dimitrakopoulos R, editor, Orebody Modelling and Strategic Mine Planning, Carlton, Victoria, Australia: The Australasian Institute of Mining and Metallurgy, Spectrum Series, 2nd edition, 363–378
 Ellis (2002) Ellis M S (2002) Quality of Economically Extractable Coal Beds in the Gillette Coal Field as Compared With Other Tertiary Coal Beds in the Powder River Basin, Wyoming and Montana. Technical Report No. 2002174, U.S. Geological Survey
 Ellis et al. (1999) Ellis M S, Flores R M, Ochs A M, Stricker G D, Gunther G L, Rossi G S, Bader L R, Schuenemeyer J H, Power H C (1999) Gillette Coalfield, Powder River Basin: Geology, Coal Quality, and Coal Resources. Technical report, U.S. Geological Survey Professional Paper 1625A, Chapter PG
 Furrer et al. (2006) Furrer R, Genton M G, Nychka D (2006) Covariance tapering for interpolation of large spatial datasets. Journal of Computational and Graphical Statistics 15(3):502–523
 Hristopulos (2003) Hristopulos D (2003) Spartan Gibbs random field models for geostatistical applications. SIAM Journal on Scientific Computing 24(6):2125–2162
 Hristopulos and Elogne (2007) Hristopulos D, Elogne S (2007) Analytical properties and covariance functions for a new class of generalized gibbs random fields. IEEE Transactions on Information Theory 53(12):4667–4679

Hristopulos (2002)
Hristopulos D T (2002) New anisotropic covariance models and estimation of anisotropic parameters based on the covariance tensor identity. Stochastic Environmental Research and Risk Assessment 16(1):43–62
 Hristopulos (2015a) Hristopulos D T (2015a) Covariance functions motivated by spatial random field models with local interactions. Stochastic Environmental Research and Risk Assessment 29(3):739–754
 Hristopulos (2015b) Hristopulos D T (2015b) Stochastic local interaction (SLI) model: Bridging machine learning and geostatistics. Computers and Geosciences 85(Part B):26–37
 Hristopulos and Agou (2019) Hristopulos D T, Agou V D (2019) Stochastic local interaction model with sparse precision matrix for spacetime interpolation. arXiv preprint arXiv:190207780
 Journel and Kyriakidis (2004) Journel A G, Kyriakidis P C (2004) Evaluation of Mineral Reserves: A Simulation Approach. Oxford, UK: Oxford University Press
 Lasinio et al. (2013) Lasinio G J, Mastrantonio G, Pollice A (2013) Discussing the “big n problem”. Statistical Methods & Applications 22(1):97–112
 Lindgren et al. (2011) Lindgren F, Rue H, Lindström J (2011) An explicit link between Gaussian fields and Gaussian Markov random fields: The SPDE approach. Journal of the Royal Statistical Society, Series B 73(4):423–498
 Mussardo (2010) Mussardo G (2010) Statistical Field Theory. Oxford University Press
 Nadaraya (1964) Nadaraya E A (1964) On estimating regression. Theory of Probability and its Applications 9(1):141–142
 Nocedal and Wright (2006) Nocedal J, Wright S (2006) Numerical Optimization. New York, NY: Springer Science & Business Media, 2nd edition
 Olea (2012) Olea R A (2012) Geostatistics for Engineers and Earth Scientists. New York, NY, USA: Springer Science & Business Media
 Rivoirard (1987) Rivoirard J (1987) Two key parameters when choosing the kriging neighborhood. Mathematical Geology 19(8):851–856
 Rue and Held (2005) Rue H, Held L (2005) Gaussian Markov Random Fields: Theory and Applications. Boca Raton, FL: Chapman and Hall/CRC
 Stricker et al. (2007) Stricker G D, Flores R M, Trippi M H, Ellis M S, Olson C M, Sullivan J E, Takahashi K I (2007) Coal quality and major, minor, and trace elements in the powder river, green river, and williston basins, wyoming and north dakota. Technical report, US Geological Survey
 Sun et al. (2012) Sun Y, Li B, Genton M G (2012) Geostatistics for large datasets. In Porcu E, Montero J, Schlather M, editors, Advances and Challenges in Spacetime Modelling of Natural Events, Lecture Notes in Statistics, Springer Berlin Heidelberg, 55–77
 Ugray et al. (2007) Ugray Z, Lasdon L, Plummer J, Glover F, Kelly J, Martí R (2007) Scatter search and local NLP solvers: A multistart framework for global optimization. INFORMS Journal on Computing 19(3):328–340
 Wackernagel (2003) Wackernagel H (2003) Multivariate Geostatistics. Berlin, Germany: Springer Verlag

Watson (1964)
Watson G S (1964) Smooth regression analysis. Sankhya Ser A 26(1):359–372
 Zhong et al. (2016) Zhong X, Kealy A, Duckham M (2016) Stream kriging: Incremental and recursive ordinary kriging over spatiotemporal data streams. Computers & Geosciences 90:134–143
Comments
There are no comments yet.