Highly Scalable Bayesian Geostatistical Modeling via Meshed Gaussian Processes on Partitioned Domains

by Michele Peruzzi, et al.

We introduce a class of scalable Bayesian hierarchical models for the analysis of massive geostatistical datasets. The underlying idea builds on recent work in high-dimensional geostatistics by partitioning the spatial domain and modeling the regions in the partition using a sparsity-inducing directed acyclic graph (DAG). We extend the model over the DAG to a well-defined spatial process, which we call the Meshed Gaussian Process (MGP). A major contribution is the development of MGPs on tessellated domains, accompanied by a Gibbs sampler for the efficient recovery of spatial random effects. In particular, the cubic MGP (Q-MGP) can harness high-performance computing resources by executing all large-scale operations in parallel within the Gibbs sampler, improving mixing and computing time compared to sequential updating schemes. Unlike some existing models for large spatial data, a Q-MGP facilitates massive caching of expensive matrix operations, making it particularly apt for spatiotemporal remote-sensing data. We compare Q-MGPs to state-of-the-art methods on large synthetic and real-world data. We also illustrate the approach on Normalized Difference Vegetation Index (NDVI) data from the Serengeti park region, recovering latent multivariate spatiotemporal random effects at millions of locations. The source code is available at https://github.com/mkln/meshgp.






1 Introduction

Collecting large quantities of spatial and spatiotemporal data is now commonplace in many fields. In ecology and forestry, very large datasets are collected using satellite imaging and other remote sensing instruments, such as LiDAR, that periodically record high-resolution images. Unfortunately, clouds frequently obstruct the view, resulting in large regions with missing information. Figure 1 shows this phenomenon using Normalized Difference Vegetation Index (NDVI) data from the Serengeti region. Filling such gaps in the data is thus an important goal, as is quantifying the associated uncertainty in predictions. This goal can be achieved via probabilistic modeling of the underlying phenomenon, which involves the specification of a spatial or spatiotemporal process that characterizes dependence among any finite set of random variables. Gaussian processes (GPs) are a common, flexible choice for characterizing spatial dependence, but standard implementations are notoriously burdened by computational costs that scale cubically with the number of locations. As a consequence, intense research has been devoted in recent years to the development of scalable models for large spatial datasets; see the in-depth reviews by sunligenton and sudipto_ba17.

Figure 1: Left: NDVI in the Serengeti region on 2016-12-17. White areas correspond to missing data due to cloud cover. Right: elevation data for the same region.

Computational complexity can be reduced by considering low-rank models; among these, knot-based methods motivated by kriging ideas enjoy some optimality properties, but they oversmooth the estimates of spatial random effects unless the number of knots is large, and they require corrections to avoid overestimation of the nugget (gp_predictive_process; frk; gp_pp_biasadj; pp_adaptive_knots; pp_spacetime). Other methods reduce the computational burden by introducing sparsity in the covariance matrix; strategies include tapering (taper1; taper2) or partitioning of the spatial domain into regions, typically assumed independent of one another (fsa; stein2014). These can be improved by considering a recursive partitioning scheme, resulting in a multi-resolution approximation (MRA; katzfuss_jasa17). Other conditional independence assumptions also have a good track record in terms of scalability to large spatial datasets: Gaussian Markov random fields (GMRF; grmfields), composite likelihood methods (block_composite_likelihood), and neighbor-based likelihood approximations (vecchia88) belong to this family.
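As a toy illustration of the tapering strategy mentioned above, the sketch below multiplies a dense covariance by a compactly supported correlation function, so that all entries beyond the taper range become exactly zero and sparse matrix algorithms apply. All names are hypothetical; an exponential covariance and a Wendland-type taper in one dimension are assumed:

```python
import numpy as np

def exp_cov(d, phi=1.0):
    """Exponential covariance as a function of distance."""
    return np.exp(-phi * d)

def wendland_taper(d, gamma=1.0):
    """Wendland-type taper: compactly supported, exactly zero beyond range gamma."""
    r = np.clip(d / gamma, 0.0, 1.0)
    return (1.0 - r) ** 4 * (4.0 * r + 1.0)

# toy 1-D locations; pairwise distances and tapered covariance
s = np.linspace(0.0, 10.0, 200)
d = np.abs(s[:, None] - s[None, :])
C_taper = exp_cov(d, phi=0.5) * wendland_taper(d, gamma=2.0)

# entries for pairs farther apart than gamma are exactly zero
sparsity = np.mean(C_taper == 0.0)
```

Because the taper is a valid correlation function, the elementwise product remains positive definite while most of its entries vanish, which is the source of the computational savings.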

In fact, the recent literature has witnessed substantial activity surrounding the so-called Vecchia approximation (vecchia88). This approximation can be viewed as a special case of GMRF approximations with a simplified neighborhood structure, motivated by a directed acyclic graph (DAG) representation of a Gaussian process likelihood. Extensions that lead to well-defined spatial processes, and thereby accommodate inference at arbitrary locations by extending the DAG representation to the entire domain, include Nearest-Neighbor Gaussian Processes (NNGP; nngp; nngp_aoas) and further generalizations obtained by constructing DAGs over the augmented space of outcomes and spatial effects (katzfuss_vecchia). These approaches achieve computational scalability by introducing sparsity in the precision matrix. The DAG relies upon a specific topological ordering of the locations, which also determines the construction of the neighborhood sets; certain orderings tend to deliver improved performance of such models (katzfuss_vecchia; guinness_techno).
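A minimal sketch of the Vecchia-type factorization may help fix ideas: the joint density is replaced by a product of univariate conditionals, each conditioning only on a small set of nearest previously-ordered locations. The names below are hypothetical, and a simple exponential covariance on ordered 1-D locations is assumed:

```python
import numpy as np

def cov_matrix(s, phi=1.0):
    """Exponential covariance matrix over 1-D locations s."""
    d = np.abs(s[:, None] - s[None, :])
    return np.exp(-phi * d)

def vecchia_loglik(w, s, m=5, phi=1.0):
    """Vecchia approximation: log p(w) ~= sum_i log p(w_i | w_{N(i)}),
    where N(i) holds up to m nearest neighbors among locations 0..i-1."""
    C = cov_matrix(s, phi)
    ll = 0.0
    for i in range(len(s)):
        prev = np.argsort(np.abs(s[:i] - s[i]))[:m]  # nearest previous locations
        if prev.size == 0:
            mu, var = 0.0, C[i, i]
        else:
            Cnn = C[np.ix_(prev, prev)]       # neighbor covariance
            Cin = C[i, prev]                  # cross-covariance
            sol = np.linalg.solve(Cnn, Cin)
            mu = sol @ w[prev]                # conditional mean
            var = C[i, i] - Cin @ sol         # conditional variance
        ll += -0.5 * (np.log(2.0 * np.pi * var) + (w[i] - mu) ** 2 / var)
    return ll
```

When every conditioning set contains all previously-ordered locations, the factorization is exact; shrinking the sets to a few neighbors is what induces sparsity in the implied precision matrix.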

When inference on the latent process is sought, Bayesian inference has the benefit of providing direct probability statements based upon the posterior distribution of the process. Asymptotic approximations are avoided, but computing the posterior distribution remains challenging because inference is sought on a very high-dimensional parameter space that includes the realizations of the latent process. One possibility, available for Gaussian first-stage likelihoods, is to work with a collapsed or marginalized likelihood obtained by integrating out the spatial random effects. However, Gibbs samplers and other MCMC algorithms for collapsed models can be inexorably slow and are impractical when data are in the millions. A sequential Gibbs sampler that updates the latent spatial effects is faster in updating the parameters but suffers from high autocorrelation and slow mixing. Another possibility emerges when interest lies only in prediction or imputation of the outcome variable, and not in the latent process. Here, a so-called "response" model, which models the outcome itself using an NNGP, can be constructed. This model is much faster and enjoys superior convergence properties, but inference on the latent process is lost, and its predictive performance tends to be inferior to that of the latent process model. Furthermore, these options are unavailable in non-Gaussian first-stage hierarchical models or when the focus is not uniquely on prediction. A detailed comparison of different approaches for computing Bayesian NNGP models is presented in
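For Gaussian first-stage likelihoods, the collapsed option mentioned above amounts to evaluating the marginal density of the outcome after integrating out the latent effects, i.e. y ~ N(X beta, C + tau^2 I). A minimal sketch of this evaluation (hypothetical names; Cholesky-based computation) is:

```python
import numpy as np

def collapsed_loglik(y, X, beta, C, tau2):
    """Collapsed Gaussian log-likelihood after integrating out the latent
    spatial effects w ~ N(0, C): y ~ N(X beta, C + tau2 * I)."""
    n = len(y)
    S = C + tau2 * np.eye(n)          # marginal covariance of y
    L = np.linalg.cholesky(S)
    r = y - X @ beta                  # residual from the mean model
    z = np.linalg.solve(L, r)         # whitened residual
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + z @ z)
```

The Cholesky factorization here costs O(n^3) for a dense C, which is precisely why collapsed samplers become impractical when data are in the millions unless structure (sparsity, partitioning) is imposed on C.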