Collecting large quantities of spatial and spatiotemporal data is now commonplace in many fields. In ecology and forestry, very large datasets are collected using satellite imaging and other remote sensing instruments such as LiDAR that periodically record high-resolution images. Unfortunately, clouds frequently obstruct the view, resulting in large regions with missing information. Figure 1
shows this phenomenon using Normalized Difference Vegetation Index (NDVI) data from the Serengeti region. Filling such gaps in the data is thus an important goal, as is quantifying the associated uncertainty in predictions. This goal can be achieved via probabilistic modeling of the underlying phenomenon, which involves the specification of a spatial or spatiotemporal process that characterizes dependence among any finite set of random variables. Gaussian processes (GPs) are a common, flexible choice for characterizing spatial dependence, but standard implementations are notoriously burdened by their computational complexity. As a consequence, intense research has been devoted in recent years to the development of scalable models for large spatial datasets – see in-depth reviews by sunligenton and sudipto_ba17.
Computational complexity can be reduced by considering low-rank models; among these, knot-based methods motivated by kriging ideas enjoy some optimality properties but oversmooth the estimates of spatial random effects unless the number of knots is large, and require corrections to avoid overestimation of the nugget (gp_predictive_process; frk; gp_pp_biasadj; pp_adaptive_knots; pp_spacetime). Other methods reduce the computational burden by introducing sparsity in the covariance matrix; strategies include tapering (taper1; taper2) or partitioning of the spatial domain into regions that are typically assumed independent of one another (fsa; stein2014). These can be improved by considering a recursive partitioning scheme, resulting in a multi-resolution approximation (MRA; katzfuss_jasa17). Other conditional independence assumptions also have a good track record in terms of scalability to large spatial datasets: Gaussian Markov random fields (GMRF; grmfields), composite likelihood methods (block_composite_likelihood), and neighbor-based likelihood approximations (vecchia88) belong to this family.
In fact, the recent literature has witnessed substantial activity surrounding the so-called Vecchia approximation (vecchia88). This approximation can be viewed as a special case of GMRF approximations with a simplified neighborhood structure motivated by a directed acyclic graph (DAG) representation of a Gaussian process likelihood. Extensions that yield well-defined spatial processes, and thus accommodate inference at arbitrary locations by extending the DAG representation to the entire domain, include nearest-neighbor Gaussian processes (NNGP; nngp; nngp_aoas) and further generalizations and enhancements that construct DAGs over the augmented space of outcomes and spatial effects (katzfuss_vecchia). These approaches achieve computational scalability by introducing sparsity in the precision matrix. The DAG relies upon a specific topological ordering of the locations, which also determines the construction of the neighborhood sets, and certain orderings tend to deliver improved performance of such models (katzfuss_vecchia; guinness_techno).
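To illustrate the idea behind Vecchia-type approximations, the chain-rule factorization of a Gaussian likelihood can be approximated by conditioning each term on at most m previously ordered nearest neighbors rather than on all preceding locations. The following is a minimal sketch, not the construction of any specific method cited above; the exponential covariance, the coordinate-based ordering, and the function names are illustrative assumptions.

```python
import numpy as np

def exp_cov(coords, phi=1.0, sigma2=1.0):
    # Illustrative exponential covariance: sigma2 * exp(-phi * distance)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return sigma2 * np.exp(-phi * d)

def vecchia_loglik(y, coords, m=5, phi=1.0, sigma2=1.0):
    # Simple coordinate ordering (sort by first coordinate); other
    # orderings are known to affect approximation quality.
    order = np.argsort(coords[:, 0])
    y, coords = y[order], coords[order]
    n = len(y)
    C = exp_cov(coords, phi, sigma2)
    ll = 0.0
    for i in range(n):
        if i == 0:
            # First term of the factorization: marginal density of y[0]
            mu, var = 0.0, C[0, 0]
        else:
            # Condition on (at most) the m nearest previously ordered points
            prev = np.arange(i)
            d = np.linalg.norm(coords[prev] - coords[i], axis=1)
            nb = prev[np.argsort(d)[:m]]
            Cnn = C[np.ix_(nb, nb)]
            c = C[i, nb]
            sol = np.linalg.solve(Cnn, c)
            mu = sol @ y[nb]           # conditional mean (zero-mean process)
            var = C[i, i] - c @ sol    # conditional variance
        ll += -0.5 * (np.log(2 * np.pi * var) + (y[i] - mu) ** 2 / var)
    return ll
```

When m is at least n - 1, every conditioning set contains all previous points, so the factorization is exact and the approximation recovers the full Gaussian log-likelihood; smaller m trades accuracy for the reduced cost of solving many small m-by-m systems instead of one dense n-by-n system.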
When inference on the latent process is sought, Bayesian inference has the benefit of providing direct probability statements based upon the posterior distribution of the process. Inference based on asymptotic approximations is avoided, but challenges remain in computing the posterior distribution, since inference is sought on a very high-dimensional parameter space (including the realizations of the latent process). One possibility, available for Gaussian first-stage likelihoods, is to work with a collapsed or marginalized likelihood obtained by integrating out the spatial random effects. However, Gibbs samplers and other MCMC algorithms for the collapsed models can be inexorably slow and are impractical when data are in the millions. A sequential Gibbs sampler that updates the latent spatial effects (nngp)
is faster in updating the parameters but suffers from high autocorrelation and slow mixing. Another possibility emerges when interest lies in prediction or imputation of the outcome variable only and not the latent process. Here, a so-called “response” model, which models the outcome itself using an NNGP, can be constructed. This model is much faster and enjoys superior convergence properties, but inference on the latent process is lost, and its predictive performance tends to be inferior to that of the latent process model. Furthermore, these options are unavailable in non-Gaussian first-stage hierarchical models or when the focus is not uniquely on prediction. A detailed comparison of different approaches for computing Bayesian NNGP models is presented in nngp_algos.
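The collapsed likelihood mentioned above can be made concrete in generic notation (this is the standard marginalization for a Gaussian first stage, not tied to any specific model cited here). With
\[
y = X\beta + w + \varepsilon, \qquad w \sim N(0, C_\theta), \qquad \varepsilon \sim N(0, \tau^2 I_n),
\]
integrating out the spatial random effects $w$ yields
\[
y \mid \beta, \theta, \tau^2 \sim N\!\left(X\beta,\; C_\theta + \tau^2 I_n\right).
\]
Each evaluation of this collapsed likelihood requires factorizing the dense $n \times n$ matrix $C_\theta + \tau^2 I_n$, an $O(n^3)$ operation, which is why MCMC algorithms based on the collapsed model become impractical when $n$ is in the millions.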