Assessing the performance of spatial cross-validation approaches for models of spatially structured data

03/13/2023
by Michael J Mahoney, et al.

Evaluating models fit to data with internal spatial structure requires specific cross-validation (CV) approaches, because randomly selecting assessment data may produce assessment sets that are not truly independent of the data used to train the model. Many spatial CV methodologies have been proposed to address this by forcing models to extrapolate spatially when predicting the assessment set. However, to date there is little guidance on which methods yield the most accurate estimates of model performance. We conducted simulations to compare the model performance estimates produced by five common CV methods applied to models fit to spatially structured data. We found that spatial CV approaches generally improved upon resubstitution and V-fold CV estimates, particularly approaches that combined assessment sets of spatially conjunct observations with spatial exclusion buffers. To facilitate the use of these techniques, we introduce the `spatialsample` package, which provides tooling for performing spatial CV as part of the broader tidymodels modeling framework.
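To make the best-performing combination concrete (assessment sets made of spatially conjunct observations, plus an exclusion buffer around them), here is a minimal Python sketch. It assumes square grid blocks as the way of grouping nearby observations and a Euclidean-distance buffer; `spatial_block_folds` is a hypothetical helper written for illustration only. It is neither the `spatialsample` API (an R package) nor necessarily the exact procedure used in the simulations.

```python
import math
from collections import defaultdict


def spatial_block_folds(points, block_size, buffer=0.0):
    """Yield (analysis_idx, assessment_idx) pairs, one per occupied grid block.

    Hypothetical illustration of blocked spatial CV with an exclusion buffer;
    not the spatialsample API.
    """
    # Group observation indices by the square grid cell they fall in,
    # so each assessment set is a spatially contiguous cluster.
    blocks = defaultdict(list)
    for i, (x, y) in enumerate(points):
        blocks[(math.floor(x / block_size), math.floor(y / block_size))].append(i)

    for key in sorted(blocks):
        assessment = blocks[key]
        assessed = set(assessment)
        analysis = []
        for i, p in enumerate(points):
            if i in assessed:
                continue
            # Exclusion buffer (assumed Euclidean): drop training points within
            # `buffer` of any assessment point, so the model must extrapolate.
            if not any(math.dist(p, points[j]) <= buffer for j in assessment):
                analysis.append(i)
        yield analysis, assessment


points = [(0.5, 0.5), (1.5, 0.5), (0.5, 1.5), (1.5, 1.5), (3.5, 3.5)]
for analysis, assessment in spatial_block_folds(points, block_size=1.0, buffer=1.0):
    print("assess:", assessment, "train on:", analysis)
```

With `buffer=1.0`, the fold that assesses the point at (0.5, 0.5) also drops its two immediate neighbours from training, leaving only the more distant points in the analysis set. With `buffer=0.0`, the same code reduces to plain blocked CV with no exclusion zone.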

research
05/28/2020

Estimating the Prediction Performance of Spatial Models via Spatial k-Fold Cross Validation

In machine learning one often assumes the data are independent when eval...
research
03/20/2023

waywiser: Ergonomic Methods for Assessing Spatial Models

Assessing predictive models can be challenging. Modelers must navigate a...
research
08/21/2019

Importance of spatial predictor variable selection in machine learning applications – Moving from data reproduction to spatial prediction

Machine learning algorithms find frequent application in spatial predict...
research
02/16/2018

Bayesian cross-validation of geostatistical models

The problem of validating or criticising models for georeferenced data i...
research
04/05/2020

Graphical outputs and Spatial Cross-validation for the R-INLA package using INLAutils

Statistical analyses proceed by an iterative process of model fitting an...
research
03/27/2022

Improving The Diagnosis of Thyroid Cancer by Machine Learning and Clinical Data

Thyroid cancer is a common endocrine carcinoma that occurs in the thyroi...
research
06/30/2018

chemmodlab: A Cheminformatics Modeling Laboratory for Fitting and Assessing Machine Learning Models

The goal of chemmodlab is to streamline the fitting and assessment pipel...