A Scalable Partitioned Approach to Model Massive Nonstationary Non-Gaussian Spatial Datasets

by Benjamin Seiyon Lee, et al.

Nonstationary non-Gaussian spatial data are common in many disciplines, including climate science, ecology, epidemiology, and social sciences. Examples include count data on disease incidence and binary satellite data on cloud mask (cloud/no-cloud). Modeling such datasets as stationary spatial processes can be unrealistic since they are collected over large heterogeneous domains (i.e., spatial behavior differs across subregions). Although several approaches have been developed for nonstationary spatial models, these have focused primarily on Gaussian responses. In addition, fitting nonstationary models for large non-Gaussian datasets is computationally prohibitive. To address these challenges, we propose a scalable algorithm for modeling such data by leveraging parallel computing in modern high-performance computing systems. We partition the spatial domain into disjoint subregions and fit locally nonstationary models using a carefully curated set of spatial basis functions. Then, we combine the local processes using a novel neighbor-based weighting scheme. Our approach scales well to massive datasets (e.g., 1 million samples) and can be implemented in nimble, a popular software environment for Bayesian hierarchical modeling. We demonstrate our method on simulated examples and two large real-world datasets pertaining to infectious diseases and remote sensing.
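
The partition, fit locally, then combine idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm or their nimble implementation. It makes several simplifying assumptions that are not in the abstract: the domain is the unit square split into a regular grid, the response is treated as Gaussian and fit by ridge-penalized least squares (a stand-in for a Bayesian non-Gaussian model), the basis functions are Gaussian radial bases on a 3 x 3 knot grid per cell, and the local fits are blended with inverse-distance weights over the nearest cell centers. All function names and parameters (`partition_grid`, `rbf_basis`, `fit_local_models`, `predict`, `n_side`, `ridge`, `n_neighbors`) are hypothetical.

```python
import numpy as np

def cell_centers(n_side):
    # Centers of an n_side x n_side grid on the unit square,
    # ordered so that center k corresponds to label k = i * n_side + j.
    g = (np.arange(n_side) + 0.5) / n_side
    return np.array([(a, b) for a in g for b in g])

def partition_grid(coords, n_side):
    # Assign each point to exactly one grid cell (disjoint subregions).
    idx = np.minimum((coords * n_side).astype(int), n_side - 1)
    return idx[:, 0] * n_side + idx[:, 1]

def rbf_basis(coords, knots, bandwidth):
    # Intercept column plus one Gaussian radial basis function per knot.
    d2 = ((coords[:, None, :] - knots[None, :, :]) ** 2).sum(axis=2)
    rbf = np.exp(-d2 / (2.0 * bandwidth ** 2))
    return np.hstack([np.ones((len(coords), 1)), rbf])

def fit_local_models(coords, y, n_side, ridge=1e-3):
    # Fit one ridge regression per cell. The fits are independent of each
    # other, so this loop is the part that could be run in parallel.
    labels = partition_grid(coords, n_side)
    centers = cell_centers(n_side)
    width = 1.0 / n_side
    steps = (-0.5, 0.0, 0.5)
    offsets = np.array([(a, b) for a in steps for b in steps]) * width
    models = []
    for k, center in enumerate(centers):
        mask = labels == k
        knots = center + offsets  # 3 x 3 knot grid covering the cell
        X = rbf_basis(coords[mask], knots, width / 2.0)
        A = X.T @ X + ridge * np.eye(X.shape[1])
        beta = np.linalg.solve(A, X.T @ y[mask])
        models.append((knots, beta))
    return centers, width, models

def predict(new_coords, centers, width, models, n_neighbors=4):
    # Blend the local fits of the nearest cells with normalized
    # inverse-distance weights (an assumed, illustrative weighting).
    d = np.linalg.norm(new_coords[:, None, :] - centers[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :n_neighbors]
    out = np.zeros(len(new_coords))
    for i, s in enumerate(new_coords):
        w = 1.0 / (d[i, nearest[i]] + 1e-9)
        w /= w.sum()
        preds = [
            (rbf_basis(s[None, :], models[k][0], width / 2.0) @ models[k][1])[0]
            for k in nearest[i]
        ]
        out[i] = float(np.dot(w, preds))
    return out

# Small synthetic demonstration (simulated data, not from the paper).
rng = np.random.default_rng(0)
coords = rng.random((2000, 2))
y = np.sin(2 * np.pi * coords[:, 0]) + np.cos(2 * np.pi * coords[:, 1])
y = y + 0.1 * rng.standard_normal(len(y))
centers, width, models = fit_local_models(coords, y, n_side=4)
new_coords = rng.random((5, 2))
y_hat = predict(new_coords, centers, width, models)
```

Because each subregion is fit on its own data, the cost of the fitting step grows with the size of a subregion rather than the full dataset, which is the property that makes a partitioned approach attractive for parallel hardware.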



Highly Scalable Bayesian Geostatistical Modeling via Meshed Gaussian Processes on Partitioned Domains

We introduce a class of scalable Bayesian hierarchical models for the an...

Conjugate Nearest Neighbor Gaussian Process Models for Efficient Statistical Interpolation of Large Spatial Data

A key challenge in spatial statistics is the analysis for massive spatia...

SXL: Spatially explicit learning of geographic processes with auxiliary tasks

From earth system sciences to climate modeling and ecology, many of the ...

Practical Bayesian Modeling and Inference for Massive Spatial Datasets On Modest Computing Environments

With continued advances in Geographic Information Systems and related co...

Nonstationary Spatial Modeling of Massive Global Satellite Data

Earth-observing satellite instruments obtain a massive number of observa...

Combining Heterogeneous Spatial Datasets with Process-based Spatial Fusion Models: A Unifying Framework

In modern spatial statistics, the structure of data that is collected ha...

P-spline smoothing for spatial data collected worldwide

Spatial data collected worldwide at a huge number of locations are frequ...