Bayesian Spatial Analysis of Hardwood Tree Counts in Forests via MCMC

07/03/2018 ∙ by Reihaneh Entezari, et al. ∙ UNIVERSITY OF TORONTO 0

In this paper, we perform Bayesian Inference to analyze spatial tree count data from the Timiskaming and Abitibi River forests in Ontario, Canada. We consider a Bayesian Generalized Linear Geostatistical Model and implement a Markov Chain Monte Carlo algorithm to sample from its posterior distribution. How spatial predictions for new sites in the forests change as the amount of training data is reduced is studied and compared with a Logistic Regression model without a spatial effect. Finally, we discuss a stratified sampling approach for selecting subsets of data that allows for potential better predictions.



There are no comments yet.


page 6

page 20

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

1.1 The forest inventory problem

The forest industry has a significant impact on the economy of countries such as Canada, making forests an important financial asset. The monetary value of forest assets are mainly determined by their timber, the value of which depends on different features of trees such as size, species, age, defects, etc. Tree species have different types of wood with different qualities, and hence influence the timber value.

Tree species have two main categories, hardwood (deciduous) trees and softwood (coniferous) trees, with hardwood trees generally having wider leaves that are lost annually, while softwood trees have smaller leaves and retain their leaves throughout the year. Hardwood trees provide much longer lasting wood compared to softwood trees, with slower growth rates which makes them more expensive compared to softwood. Hence, knowing the number of hardwood trees in a forest is valuable information. Collecting data on forests requires hiring workers to travel to different sites around the forests and measure the quantities needed, which can be costly and time consuming.

Remote sensing technologies can overcome this issue. Although they are cheap and efficient and can cover a wide range of geographical areas, they can suffer from lack of accuracy. Geostatistical models are powerful tools for analyzing and predicting such spatial data, and can be used to calibrate remotely sensed data (see Curran & Atkinson, 1998). Existing literatures by Giorgi et al. (2017); Shaby & Reich (2012); Abellan et al. (2007) are examples of the importance of statistical models for spatial analysis. The focus of this paper will also be to take advantage of statistical tools to predict the number of hardwood trees using geostatistical models that take into account the spatial factor.

1.2 Model-based geostatistics

In the past few decades, spatial statistics has become an established field of statistics with well developed models applied to many real-world problems. Conventional geostatistical models for Gaussian spatial data were first popularized by Matheron (1962) and later on built upon by Cressie (1993). The generalization of these models for non-Gaussian data were introduced by Diggle et al. (1998).

Let be the observed spatial data at location , with arbitrary distribution that has mean and possible additional parameters . Consider as the covariates at location . Modelling this data with the Generalized Linear Geostatistical Model (GLGM) described in Diggle et al. (1998) and Diggle & Ribeiro (2007), will be as following:



is the link function (i.e. logit or log). Here

is a Gaussian random field evaluated at location

, which is characterized by the joint multivariate normal distribution:

where the elements of are defined by a spatial correlation function as

where is a range parameter and

is a vector of other possible parameters. The range parameter

controls the rate at which the correlation decreases with distance. There are many possible parametric functions for , with Matérn correlation function (see Stein, 1999) being the most commonly used. The Matérn correlation is defined as:


where is the gamma function and is the modified Bessel function of the second kind of order ( being a shape parameter). This function is particularly interesting, as it is flexible in the differentiability of the Gaussian process by adjusting (Stein, 1999).

In the case where the data is Gaussian, Maximum Likelihood Estimates (MLEs) can be used as point estimates for the model parameters. However, when the data is non-Gaussian, because of the unobserved latent variables

present in the model, the likelihood function becomes intractable and it is difficult to calculate the MLEs. Performing Bayesian inference on these models via Markov Chain Monte Carlo (MCMC) methods (Brooks et al., 2011; Craiu & Rosenthal, 2014) has many advantages (as discussed in Diggle et al. (1998)). The Integrated Nested Laplace Approximation (INLA) algorithm introduced by Rue et al. (2009), is an alternative to MCMC for Bayesian Inference on latent Gaussian models. However this method has some drawbacks as it approximates marginal posterior distributions rather than joint posterior distributions. There are facilities in the R-INLA software for producing approximate joint posterior samples, but the properties of these samples have yet to be explored.

In this paper, we will analyze the spatial hardwood tree count data collected from the Timiskaming & Abitibi River forests in Ontario, Canada. Our analysis is constructed in a Bayesian framework for a binomial geostatistical model to predict the proportion of hardwood trees from remotely sensed elevation and vegetation data. For posterior simulations, we implement an MCMC method using the Langevin-Hastings (see Roberts & Rosenthal, 1998) and the Random-Walk Metropolis Hastings (see Roberts et al., 1997; Roberts & Rosenthal, 2001) algorithms. By reducing the amount of training data fitted to the model, and predicting for the same validation set, we are able to answer questions related to the accuracy of predictions given small amounts of ground truth data collected. We will show that with training data size as small as 10 spatial locations, despite the increase in uncertainty, the true number of hardwood trees lies within a 95% prediction interval. This conclusion is very valuable as it will significantly reduce costs of collecting ground truth data. We will also compare our results with the logistic regression model where there is no spatial effect. Furthermore, we explore a stratified sampling approach in choosing the training data that will show a potential improvement in predictions.

The paper is organized as follows. The spatial data from the Timiskaming & Abitibi River Forests are described in section 1.3. Section 2 describes the geostatistical model used for our data and the MCMC algorithm applied to perform Bayesian Inference. In addition, we explain our stratified approach and describe the measurements we will use to compare and assess predictions. Section 3 discusses the numerical results from fitting the data, where comparisons are also made with the Logistic Regression. At last, we summarize our results in Section 4. The Appendix includes results from different simulations.

(a) Sample locations
(b) Proportions of hardwood trees
Figure 1: Locations of 162 forest plots in the Timiskaming and Abitibi River Forests.

1.3 Description of Data

The Timiskaming and Abitibi River forests are geographically located next to one-another in northern Ontario, Canada. The First Resource Management Group Inc. has provided detailed data from 162 individual forest plots inside these adjacent forests. Each forest plot is in radius to provide a surface. The geographical locations of these 162 sites are shown in Figure 1.

The data from each site consists of information on the total number of trees, whether each tree is living or dead, and the species of each tree. Figure 0(b) shows the proportion of live trees which are hardwood from the 162 sites. As can be seen, many sites have no hardwood trees and such sites are scattered throughout the forests.

(a) Elevation
(b) SkyForest vegetation index
Figure 2: Elevation & Vegetation index around the Timiskaming and Abitibi River Forests (Background ©Stamen Design).

The remotely sensed data considered includes elevation values from satellite data provided by the SRTM program (Figure 1(a)). A measure of forest vegetation was provided by the First Resource Management Group Inc. using the proprietary remote sensing technology “SkyForest”, which is shown in Figure 1(b). This vegetation measure is predicted by SkyForest across the forest landscape by selecting an arithmetic transformation of spectral bands (ATSB) from a candidate list of ATSBs. The ATSBs are constructed similarly to well known vegetation indices such as the Normalized Difference Vegetation Index (NDVI), with some of them being multi-temporal. It is thus expected that hardwood trees are located where this measure is high.

In the next section, we will describe the geostatistical model for our dataset, along with the steps taken to perform a Bayesian analysis.

2 Methods

2.1 Logistic Regression

Before describing the full geostatistical model for our data, a simple Logistic Regression model with binomial response will be outlined. Consider to be the count of hardwood trees in forest plot , and write , where is the total number of live trees at site () and

is the probability of a tree in plot

being hardwood. Elevation and the SkyForest index are covariates in the model. The SkyForest

covariate is treated as a linear effect with change point at 0.3 (approximately its average value), giving some additional flexibility to this covariate in the regression model. The elevation values are also centered at the average value of about 320. For computational reasons, we normalize the covariates by dividing by the standard deviation. The model is:


Writing as the SRTM-measured altitude at location and as the SkyForest vegetation index, the normalized vector of covariates is constructed by:

2.2 The geostatistical model

Spatial dependence in the prevalence of hardwood trees should be expected as sites in the forests close to one another may benefit from the same soil, weather, etc, and hence may have similar tree types. Thus we expect a geographical effect to play an important role in explaining such data with a more sophisticated model such as the Generalized Linear Geostatistical Model (GLGM). A geostatistical model for our spatial data will have an extra spatial term and an independent term compared to the model in (3), resulting:



This model is equivalent to (1) where is Binomial and is a logit link function.

2.3 Random Sampling vs Stratified Sampling

For our analysis, we explore reducing the size of the training data fitted to the model, to observe and examine the trade-off between prediction accuracy and costs of collecting ground truth data. More specifically, if we were only given data from 25 or 10 plots on the ground, could useful predictions still be made? To answer this question, the 162 plots in the dataset were divided into 100 training and 62 validations sets. Keeping the 62 validation set fixed, we examine the performance of results generated by fitting 100, 25, and 10 training data to the model. For this purpose, we can do this by two different approaches, 1) choosing random subsets of data and 2) choosing stratified subsets of data. Since the spatial data is correlated, choosing the subset of data with a stratified approach should be expected to improve the results, as it can force the training plots to be as scattered as possible. Both elevation and vegetation covariates are taken into account for choosing the 25 and 10 dataset from the 100. Hence, we begin by looking at the elevation from all the 100 training data (first simulation) as shown in Figure 2(a).

(a) Elevation for 100 training data
(b) Stratified regions
Figure 3: Plots of elevation from 100 training data, along with the plot of stratified regions.

The 100 plots are stratified into three groups as shown in Figure 2(b), and both location of the points as well as elevation values are taken into account equally. Keeping the proportion of the data from each strata constant, we systematically sample 25 plots from the 100 by sorting the vegetation index in each strata and taking every element depending on the number of data needed (similarly for the 10 data points from the 25). The Results section will explore how stratified sampling can (potentially) improve prediction accuracy with smaller training data fitted to the model, compared to random sampling.

2.4 Inference

We will apply a Bayesian approach to the model in (LABEL:spatial), and this methodology will be referred to as the Bayesian Generalized Linear Geostatistical Model (BGLGM). Let , , and with , be the three sets of parameters. We treat as fixed at , since it is not of direct interest and according to Zhang (2004), not all the parameters (, , and ) are consistently estimable. We define priors for each parameter as

with joint posterior distribution given as:


where is the likelihood function. Here is the covariance matrix with diagonal elements equal to and off-diagonal elements of where is the Matérn correlation function. We consider Exponential(0.5) priors for and , and a Gamma(3,35) prior for .

There are a number of R packages available for posterior estimation of the BGLGM. The geostatsp package by Brown (2015) uses INLA to approximate the marginal posterior distributions, while the recent PrevMap package (Giorgi & Diggle, 2017) uses an MCMC method to generate joint posterior draws. In this paper, our focus will be on using MCMC methods to generate joint posterior samples of BGLGM. However, PrevMap performed poorly with this dataset when the number of data points was very small, and a bespoke MCMC algorithm was developed as a result.

As reparamterization and standardization help reduce correlation between variables, they will be play an important role in improving the mixing and convergence of MCMC algorithms. The transformations applied to all the model parameters in (5) follow the recommendations of Christensen et al. (2006).

Let be a diagonal matrix with elements for , and denote . Assuming a prior for , let and . Then by factorizing the posterior distribution in (5) into two parts: , we will be able to simplify the second factor as following:


where the first expression is derived from the Taylor expansion of around . From equation (6), we can simply use the transformations:


where and

are now approximately uncorrelated with mean zero and variance one. These parameters are also uncorrelated with

and hence there will be no posterior dependence between and . However, according to Christensen et al. (2006) and Giorgi & Diggle (2017), there is posterior dependence within the parameters of , and hence a reparameterization is proposed as following:

In general, using these transformations will help facilitate the choice of proposal densities as well as reducing the correlation between variables that will significantly improve mixing and convergence of the MCMC algorithm.

Using these reparameterizations, we have implemented a Metropolis-Hastings-within-Gibbs sampling method that updates each blocks of and at a time. However, for high-dimensional parameters, it is more suitable to use the Langevin-Hastings algorithm as they will have much faster convergence rates (Roberts & Rosenthal, 1998; Roberts & Tweedie, 1996; Møller et al., 1998). For our model and data, has the highest dimension, hence we will use Langevin-Hastings algorithm to update . For the remaining blocks we will use the Random-Walk Metropolis Hastings (RWMH) algorithm. The summary of the steps used to run the MCMC algorithm is shown in the diagram below.

1 Initialize and
2 Transform to and
3 Update and using a RWMH, each with standard deviation calculated iteratively as:
where and are constants, and is the acceptance probability up to iteration with optimal acceptance probability of .
4 Update using a RWMH
5 Update with a Langevin-Hastings algorithm, i.e. where is recommended to be .
6 Repeat steps 3-5 until the desired number of samples are collected.
Transform samples of and back to and .
Algorithm 1 MCMC algorithm

2.5 Prediction & Assessment

After running our MCMC algorithm on the BGLGM, we will combine the posterior samples for each parameter to generate posterior distributions for hardwood probabilities at each of the 62 validation locations. We will then emphasize on assessing the predictions from the number of hardwood counts rather than proportions, since the observed proportions are often 0 or 1, while predictions are . Below we describe the various assessments we have considered:

  1. Coverage Probability

    : For each of the 62 validation points, we generate posterior samples of hardwood counts from the corresponding posterior probability samples, then examine whether the true hardwood count is inside the (say) 95% posterior interval. The coverage probability will be the proportion of 62 points that are inside their posterior intervals, i.e.:

  2. RMSE (root mean squared error): We will also compare RMSE of hardwood probabilities from both BGLGM and GLM (Logistic Regression), calculated as:

    where is the true proportion of hardwoods in plot (often 0 or 1) and is the predicted hardwood probability in GLM and posterior mean in BGLGM.

  3. Total hardwood count distribution: We also consider the distribution of the total number of hardwoods in all 62 validation sites and examine whether the true total hardwood counts is covered within the 95% posterior interval. Unlike the posterior distributions of hardwood counts in each of the 62 plot, the total count has a reasonably symmetric distribution. In addition, we have compared this to the corresponding distribution generated from GLM via bootstrapping.

3 Results

For the main analysis we have ran the MCMC algorithm for 2,000,000 iterations with 1,000,000 burnin and 100 thinning. Runs consist of fitting 100, 25, and 10 sites as training data, both via random and stratified sampling, with predictions made for the 62 validation data. We have repeated this procedure for five different simulations by randomly choosing five different validation sets of size 62.

(a) PrevMap, Intercept
(b) Bespoke MCMC, Intercept
(c) PrevMap, elevation
(d) Bespoke MCMC, elevation
Figure 4: Comparing trace plots of and from the bespoke MCMC implementation and the PrevMap package.

3.1 MCMC Convergence and Mixing

Figure 4 is showing the trace plots of posterior samples for generated from the bespoke MCMC implementation from section 2.4 versus PrevMap package, using 10 training sites. The bespoke MCMC mixes well and has converged, while PrevMap trace plots have not converged although they have been ran for the same number of iterations. Different tuning parameters for running PrevMap were considered without the chains being improved. Thus we will be using the bespoke MCMC implementation for the rest of the analysis in this paper.

(a) 100 training sites —
(b) 25 training sites —
(c) 10 training sites —
Figure 5: Trace plots of 10,000 MCMC posterior samples for (simulation 1).

Figures 4(a), 4(b), and 4(c) are showing the MCMC trace plots for only the parameter with 100, 25, and 10 data fitted to the model respectively. All trace plots show that the MCMC is mixing well and thus, the chains have converged. In addition, the trace plots show larger variability with less training data fitted. The remaining trace plots for other parameters as well as other simulations are included in the appendix.

Parameters # of training Mean quantile quantile
100 -3.47 -4.33 -2.65
Intercept - 25 -3.54 -5.67 -1.66
10 -2.37 -6.38 1.31
100 0.53 0.07 0.99
Elevation - 25 0.12 -1.03 1.13
10 2.16 -0.96 6.38
100 2.89 1.19 4.87
SkyF0.3 - 25 2.09 -0.50 5.26
10 3.38 -1.39 10.42
100 2.61 2.10 3.17
SkyF0.3 - 25 3.02 1.86 4.44
10 4.20 1.85 7.70
100 0.04 0.02 0.11
Spatial sd - 25 0.04 0.02 0.12
10 0.06 0.02 0.17
100 1.98 1.52 2.55
Indep. sd - 25 2.38 1.36 4.05
10 3.09 1.04 7.23
100 104.94 21.89 255.14
Range(km) - 25 105.42 22.00 252.93
10 105.06 21.30 255.27
Table 1: Comparison of posterior mean, 2.5 %, and 97.5 % quantiles of model parameters, for different sizes of training data. These results are from only the first of five training samples.

For quantitively verifying this variability between different training data size, we have compared the numerical values of posterior mean, 2.5 %, and 97.5 % quantiles of all model parameters in Table 1. While the posterior means remain almost unchanged, the 95% posterior intervals for each model parameter (except ), become wider with less training data fitted to the model, indicating more uncertainty in parameter estimation.

3.2 Parameter posteriors & spatial surfaces

The prior and posterior densities of model parameters from the first simulation are shown in Figures 6 and 7. From these figures we can ascertain that with fewer training data, posterior densities become wider and hence result in more uncertainty of predictions. The posterior distributions of suggest small spatial random effects for this dataset, as they have modes concentrated at smaller values. Posterior densities of are all similar and remain unchanged for different training data, as small causes weak spatial signal which can’t identify .

One surprising feature of Figure 6(b), is the posterior density with 10 training data points does not resemble the prior. Even the smallest training dataset considered provides clear evidence that there is more variation in the observed counts than the covariates predict, which is manifest in the results as has a posterior distribution concentrated away from zero. There is also evidence that this extra variation is not spatially structured, since is clearly much smaller than .

(a) - Intercept
(b) - elevation
(c) - SkyForest vegetation below
(d) - SkyForest vegetation above
Figure 6: Prior and posterior distributions of parameters from the first simulation.
(a) - spatial sd
(b) - indep sd
(c) (in km) - range
Figure 7: Priors and posteriors from the first simulation.

The main goal is to predict the composition of trees at unmeasured sites in the forests via simulating posterior samples of for new locations , conditional on MCMC posteriors . Considering a grid with cells inside the forests as our new locations, we can simulate using the RFsimulate function in the “RandomFields” package and make predictions for hardwood probabilities for each cell. The RandomFields package has very efficient algorithms for simulating from conditional distributions of spatial processes without using the full variance matrix. Thus assuming we have grid cells , we simulate and independent along with the use of other posterior samples to generate .

(a) 100 train - Sample 1
(b) 25 train - Sample 1
(c) 10 train - Sample 1
(d) 100 train - Sample 2
(e) 25 train - Sample 2
(f) 10 train - Sample 2
(g) 100 train - Sample 3
(h) 25 train - Sample 3
(i) 10 train - Sample 3
(j) 100 train - Sample mean
(k) 25 train - Sample mean
(l) 10 train - Sample mean
Figure 8: Three posterior samples of the hardwood proportion surface along with their posterior means from different training data sizes (Background ©Stamen Design).

Figure 8 shows images of three different posterior samples along with posterior means (in each row) generated from fitting different training data sizes. With fewer training data the posterior rasters appear to become smoother, possibly indicating less precise predictions.

The 62 validation sites with their ground truth number of hardwood trees are used to evaluate predictions by summarizing results over all corresponding sites. The number of hardwood trees in each validation site is predicted and their coverage probabilities calculated from posterior intervals of hardwood counts. Table 2

shows the corresponding coverage probabilities of 95%, 80%, and 50% Posterior Credible Intervals (CI) for different training data size, averaged over five different simulations. Note that many observed proportions are 0 or 1, and the hardwood count posteriors will not be symmetric. To illustrate this, Figure

9 shows the histograms of hardwood count posteriors for two validation plots where in one all are hardwoods and in the other none. We calculate the narrowest credible intervals for each validation plot, and compute their average coverages and widths as shown in Table 2.

Empirical Coverage of CI Average CI Width
#ofTrain 95 % 80 % 50 % 95 % 80 % 50 %
100 Sites 97 % 87 % 59 % 19.98 11.42 4.41
25 Sites 96 % 86 % 55 % 21.91 12.12 4.49
10 Sites 95 % 78 % 55 % 26.64 13.71 4.35
Table 2: Empirical Coverage of Posterior Credible Intervals and their Average Width. All results are averaged over 5 different simulations.
(a) All hardwoods, plot 37
(b) No hardwoods, plot 15
Figure 9: Posterior distributions of hardwood counts from two validation plots.

The empirical coverage probabilities tend to exceed their theoretical values, meaning the intervals provided are on the conservative side. Overall, coverage probabilities are all at a desirable value. Table 2 also includes the average width of the posterior intervals, which shows on average wider intervals with fewer training data, as expected.

3.3 Comparison of BGLGM with Logistic Regression

In this section we will show the difference in performances between the BGLGM and a simple Logistic Regression. Fitting a Logistic Regression model to this dataset using the function glm in R is a frequentist way of analyzing this dataset, while BGLGM is a Bayesian approach. We will compare them both through their performance in point estimations via RMSEs, as well as their performance of distributions.

Table 3 is reporting the RMSEs of hardwood probabilities for the 62 validation sites, computed from runs with 100, 25, and 10 training data, for five different simulations. The RMSEs of BGLGM are calculated using posterior means. As it is observed, on average, RMSEs of BGLGM are smaller compared to Logistic Regression (GLM), indicating more accurate predictions. RMSEs increase with less ground truth data fitted to the model; verifying the results shown in the previous section.

#ofTrain Sim 1 Sim 2 Sim 3 Sim 4 Sim 5
100(BGLGM) 0.2276 0.2072 0.2096 0.1983 0.1569 0.1999
100(GLM) 0.2333 0.2060 0.2137 0.1969 0.1575 0.2015
25(BGLGM) 0.2284 0.2106 0.2335 0.1985 0.2268 0.2196
25(GLM) 0.2493 0.2256 0.2455 0.2005 0.2569 0.2356
10(BGLGM) 0.2912 0.2435 0.3639 0.2080 0.2472 0.2708
10(GLM) 0.3509 0.2796 0.4805 0.2357 0.3168 0.3327
Table 3: RMSE of predicted hardwood probabilities

To compare the predictive distributions of the GLM and BGLGM, we simulated 10,000 hardwood counts for each validation site using the estimated probabilities from GLM. These are compared to 10,000 MCMC posterior samples from the BGLGM using the distributions of the total hardwood counts from all 62 validation sites. Figure 10 is showing the corresponding distributions from BGLGM and GLM for only the first simulation. The distributions from GLM are significantly narrower compared to the ones from BGLGM, as should be expected, since the GLM is ignoring errors in the parameter estimates. The BGLGM posterior distributions with all training data sizes capture the true value shown in green within their 95% intervals, while GLM with 10 and even 100 training data points fails to do so. In addition, from Figure 9(a), we also observe that the posterior distributions become wider with less training data as expected. In conclusion, the BGLGM is a more reliable method compared to the GLM, in terms of both prediction accuracy and the ability of explaining uncertainties. Note that this process has been repeated for four other simulations with figures shown in the Appendix.

(b) GLM
Figure 10: Comparing BGLGM posterior distributions of total number of hardwood trees to the frequentist distributions from GLM.

Overall, a Bayesian approach is more reliable compared to a frequentist approach, since more types of uncertainty are taken into account. The simple Logistic Regression has artificially narrow prediction intervals, while BGLGM includes the true value within its 95% intervals for this dataset.

3.3.1 Stratified Sampling of training data

(a) Random
(b) Stratified
Figure 11: Comparing random vs stratified sampling for total hardwood posterior distributions.

Figure 11 compares the posterior distributions of the total hardwood trees from both random sampling and stratified sampling on the first of five simulations. Prediction intervals from all five simulations are shown in Figure 12. The posterior distributions all contain the true value within their 95% posterior interval, however the uncertainty is generally less under stratified sampling in most cases. In Figure 11 it is notable that the stratified posterior with 10 training data contains the true value near its mode, while with the random posterior it is covered around the tail area.

Figure 12: 95% posterior intervals of Random sampling vs Stratified sampling from all five simulations.

In simulations 2 and 4 results are roughly comparable, while in simulation 3 the stratified posterior with 10 data points captures the true value around its mode. On the other hand, in simulation 5, the stratified posterior with 10 data points becomes more dispersed while with 25 more narrow. Overall, the stratified sampling approach shows only potential in improving results and thus may not be of significant improvements.

More details on the MCMC trace plots and the posterior distributions of each model parameter for the stratified sampling is included in the Appendix (for all simulations).

4 Discussion

In this paper, we have analyzed the spatial data from the Timiskaming and Abitibi River forests in Ontario, Canada. We have studied the prediction of hardwood tree counts from elevation and vegetation index. We implemented a bespoke MCMC algorithm for posterior simulation of a Bayesian Generalized Linear Geostatistical Model (BGLGM), in order to make spatial predictions for new sites in the forests. The bespoke MCMC performed well with this dataset while the general purpose “PrevMap” package struggled. We compared the Bayesian model with the frequentist Logistic Regression model. Although the dataset is imbalanced and contains many zero hardwood counts, the Bayesian approach provided unbiased estimates with reasonable uncertainties, while the overly simplistic Logistic Regression underestimated the uncertainty associated with the predictions. More importantly, with ground truth data as small as 10 points, BGLGM captured the true value of hardwood tree counts within its 95% posterior intervals, while the Logistic Regression failed even with 100 training points. This suggests with fewer ground truth data collected and hence reduction in expenses, good estimates of hardwood counts are still present and can capture the true value but perhaps with more uncertainties involved. This result is fairly important in terms of saving time and money for companies to gather such data.

Furthermore, a stratified sampling strategy of choosing the subset training data showed potential improvements in terms of predictions and uncertainties. However, these improvements are not always guaranteed.

As future work, one can further extend this model for multiple forests, where forests with similar features are considered to have high correlation indicated within priors and hence facilitate future spatial predictions for similar forests. This will significantly help reduce the redundant collection of data from similar forests.


The authors thank Philip E. J. Green, M.Sc., and the First Resource Management Group for introducing us to this problem, providing scientific and technical advice, and making the data available.

Figures 1(b) and 8 have map tiles by Stamen Design under CC BY 3.0. Data by OpenStreetMap available under the Open Database License.

The second and third authors hold Discovery grants from the Natural Sciences and Engineering Research Council of Canada.


  • Abellan et al. (2007) Abellan, J. J., Fecht, D., Best, N., Richardson, S., & Briggs, D. J. (2007). Bayesian analysis of the multivariate geographical distribution of the socio-economic environment in england. Environmetrics, 18(7), 745–758.
  • Brooks et al. (2011) Brooks, S., Gelman, A., Jones, G., & Meng, X.-L. (2011). Handbook of Markov Chain Monte Carlo. CRC press.
  • Brown (2015) Brown, P. E. (2015). Model-Based Geostatistics the Easy Way. Journal of Statistical Software, 63(12), 1–24.
  • Christensen et al. (2006) Christensen, O. F., Roberts, G. O., & Sköld, M. (2006).

    Robust Markov Chain Monte Carlo methods for spatial generalized linear mixed models.

    Journal of Computational and Graphical Statistics, 15(1), 1–17.
  • Craiu & Rosenthal (2014) Craiu, R. V. & Rosenthal, J. S. (2014). Bayesian computation via Markov Chain Monte Carlo. Annual Review of Statistics and Its Application, 1, 179–201.
  • Cressie (1993) Cressie, N. (1993). Statistics for Spatial Data. John Wiley & Sons.
  • Curran & Atkinson (1998) Curran, P. J. & Atkinson, P. M. (1998). Geostatistics and remote sensing. Progress in Physical Geography, 22(1), 61–78.
  • Diggle & Ribeiro (2007) Diggle, P. & Ribeiro, P. (2007). Model-based Geostatistics. Springer Series in Statistics. Springer.
  • Diggle et al. (1998) Diggle, P. J., Tawn, J., & Moyeed, R. (1998). Model-based geostatistics. Journal of the Royal Statistical Society: Series C (Applied Statistics), 47(3), 299–350.
  • Giorgi & Diggle (2017) Giorgi, E. & Diggle, P. J. (2017). PrevMap: an R package for prevalence mapping. Journal of Statis.
  • Giorgi et al. (2017) Giorgi, E., Schlüter, D. K., & Diggle, P. J. (2017). Bivariate geostatistical modelling of the relationship between loa loa prevalence and intensity of infection. Environmetrics.
  • Matheron (1962) Matheron, G. (1962). Traité de géostatistique appliquée. 1 (1962), volume 1. Editions Technip.
  • Møller et al. (1998) Møller, J., Syversveen, A. R., & Waagepetersen, R. P. (1998). Log Gaussian Cox processes. Scandinavian Journal of Statistics, 25(3), 451–482.
  • Roberts et al. (1997) Roberts, G. O., Gelman, A., & Gilks, W. R. (1997). Weak convergence and optimal scaling of random walk Metropolis algorithms. The Annals of Applied Probability, 7(1), 110–120.
  • Roberts & Rosenthal (1998) Roberts, G. O. & Rosenthal, J. S. (1998). Optimal scaling of discrete approximations to Langevin diffusions. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 60(1), 255–268.
  • Roberts & Rosenthal (2001) Roberts, G. O. & Rosenthal, J. S. (2001). Optimal scaling for various Metropolis-Hastings algorithms. Statistical Science, 16(4), 351–367.
  • Roberts & Tweedie (1996) Roberts, G. O. & Tweedie, R. L. (1996). Exponential convergence of Langevin distributions and their discrete approximations. Bernoulli, 2(4), 341–363.
  • Rue et al. (2009) Rue, H., Martino, S., & Chopin, N. (2009). Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. Journal of the Royal Statistical Society: Series B (statistical methodology), 71(2), 319–392.
  • Shaby & Reich (2012) Shaby, B. A. & Reich, B. J. (2012). Bayesian spatial extreme value analysis to assess the changing risk of concurrent high temperatures across large portions of european cropland. Environmetrics, 23(8), 638–648.
  • Stein (1999) Stein, M. L. (1999). Interpolation of Spatial Data: Some Theory for Kriging. Springer-Verlag, New York.
  • Zhang (2004) Zhang, H. (2004). Inconsistent estimation and asymptotically equal interpolations in model-based geostatistics. Journal of the American Statistical Association, 99(465), 250–261.