Exploring the Efficacy of Statistical and Deep Learning Methods for Large Spatial Datasets: A Case Study

08/10/2023
by Arnab Hazra, et al.

Increasingly large and complex spatial datasets pose substantial inferential challenges because of their high computational and storage costs. Our study is motivated by the KAUST Competition on Large Spatial Datasets 2023, which tasked participants with estimating spatial covariance-related parameters and predicting values at testing sites, along with uncertainty estimates. We compared various statistical and deep learning approaches through cross-validation and ultimately selected the Vecchia approximation technique for model fitting. The R package GpGp lacked support for fitting zero-mean Gaussian processes and for direct uncertainty estimation, both of which the competition required, so we developed additional functions to overcome these constraints. In addition, we implemented certain subsampling-based approximations and parametric smoothing for skewed sampling distributions of the estimators. Our team, DesiBoys, secured victory in two out of four sub-competitions, validating the effectiveness of our proposed strategies. Moreover, we extended our evaluation to a large real satellite-derived spatial dataset on total precipitable water, where we compared the predictive performances of different models using multiple diagnostics.
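
To illustrate the general idea behind a Vecchia approximation for a zero-mean Gaussian process, here is a minimal Python sketch. It is not the authors' code and does not reproduce the GpGp interface. The exponential covariance, the parameter names (`variance`, `range_`, `nugget`), the neighbor count `m`, and the use of the locations in their given order are all assumptions made for illustration. The joint likelihood is replaced by a product of conditional densities, each conditioning only on at most `m` nearest previously ordered points.

```python
import numpy as np


def exp_cov(a, b, variance, range_):
    """Exponential covariance between two sets of locations (assumed kernel)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return variance * np.exp(-d / range_)


def vecchia_loglik(y, locs, variance, range_, nugget, m):
    """Vecchia-style log-likelihood of a zero-mean GP (illustrative sketch).

    Each observation is conditioned on at most m nearest neighbors among the
    points that precede it in the given ordering.
    """
    total = 0.0
    for i in range(len(y)):
        # Nearest previously ordered points (empty for i == 0 or m == 0).
        d = np.linalg.norm(locs[:i] - locs[i], axis=1)
        nb = np.argsort(d)[:m]
        mean_i, var_i = 0.0, variance + nugget
        if nb.size > 0:
            k_nn = exp_cov(locs[nb], locs[nb], variance, range_)
            k_nn += nugget * np.eye(nb.size)
            k_in = exp_cov(locs[i:i + 1], locs[nb], variance, range_)[0]
            w = np.linalg.solve(k_nn, k_in)
            mean_i = w @ y[nb]              # conditional mean (zero prior mean)
            var_i = variance + nugget - w @ k_in  # conditional variance
        total += -0.5 * (np.log(2 * np.pi * var_i) + (y[i] - mean_i) ** 2 / var_i)
    return total


def exact_loglik(y, locs, variance, range_, nugget):
    """Exact zero-mean GP log-likelihood, for comparison on small data."""
    n = len(y)
    k = exp_cov(locs, locs, variance, range_) + nugget * np.eye(n)
    _, logdet = np.linalg.slogdet(k)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(k, y))
```

When `m` equals the number of preceding points, the sketch reproduces the exact log-likelihood. Smaller `m` trades accuracy for cost, since each conditional term then needs only an `m`-by-`m` solve. Practical implementations typically reorder the locations and use more efficient linear algebra than this loop.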


research
06/19/2021

Discussion on Competition for Spatial Statistics for Large Datasets

We discuss the experiences and results of the AppStatUZH team's particip...
research
10/24/2017

Multi-resolution approximations of Gaussian processes for large spatial datasets

Gaussian processes are popular and flexible models for spatial, temporal...
research
04/05/2020

Graphical outputs and Spatial Cross-validation for the R-INLA package using INLAutils

Statistical analyses proceed by an iterative process of model fitting an...
research
12/31/2019

Parallel cross-validation: a scalable fitting method for Gaussian process models

Gaussian process (GP) models are widely used to analyze spatially refere...
research
03/01/2021

Statistical learning and cross-validation for point processes

This paper presents the first general (supervised) statistical learning ...
research
11/26/2020

A Scalable Partitioned Approach to Model Massive Nonstationary Non-Gaussian Spatial Datasets

Nonstationary non-Gaussian spatial data are common in many disciplines, ...
research
06/06/2019

Deep Compositional Spatial Models

Nonstationary, anisotropic spatial processes are often used when modelli...
