Cross validation for model selection: a primer with examples from ecology

03/09/2022
by Luke Yates, et al.

The growing use of model-selection principles in ecology for statistical inference is underpinned by information criteria (IC) and cross-validation (CV) techniques. Although IC techniques, such as Akaike's Information Criterion, have been historically more popular in ecology, CV is a versatile and increasingly used alternative. CV uses data splitting to estimate model scores based on (out-of-sample) predictive performance, which can be used even when it is not possible to derive a likelihood (e.g., machine learning) or count parameters precisely (e.g., mixed-effects models and penalised regression). Here we provide a primer to understanding and applying CV in ecology. We review commonly applied variants of CV, including approximate methods, and make recommendations for their use based on the statistical context. We explain some important – but often overlooked – technical aspects of CV, such as bias correction, estimation uncertainty, score selection, and parsimonious selection rules. We also address misconceptions (and truths) about impediments to the use of CV, including computational cost and ease of implementation, and clarify the relationship between CV and information-theoretic approaches to model selection. The paper includes two ecological case studies which illustrate the application of the techniques. We conclude that CV-based model selection should be widely applied to ecological analyses, because of its robust estimation properties and the broad range of situations for which it is applicable. In particular, we recommend using leave-one-out (LOO) or approximate LOO CV to minimise bias, or otherwise K-fold CV using bias correction if K<10. To mitigate overfitting, we recommend calibrated selection via the modified one-standard-error rule which accounts for the predominant cause of overfitting: score-estimation uncertainty.
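The abstract's closing recommendation, calibrated selection via a one-standard-error rule, can be illustrated with a minimal Python sketch of the classical one-SE rule on which the paper's modified variant is based: among all candidate models whose mean cross-validation score lies within one standard error of the best score, choose the most parsimonious. The function name, fold counts, and score values below are illustrative, not taken from the paper.

```python
import statistics

def one_se_rule(models):
    """Pick the most parsimonious model whose mean CV score lies
    within one standard error (SE) of the best mean score.

    `models`: list of (complexity, fold_scores) pairs; higher scores
    are better (e.g., per-fold out-of-sample log-likelihood).
    """
    summaries = []
    for complexity, scores in models:
        mean = statistics.mean(scores)
        se = statistics.stdev(scores) / len(scores) ** 0.5
        summaries.append((complexity, mean, se))
    # Best model by mean score; its SE sets the selection threshold.
    _, best_mean, best_se = max(summaries, key=lambda t: t[1])
    threshold = best_mean - best_se
    # Among models scoring within one SE of the best, return the simplest.
    return min(c for c, m, _ in summaries if m >= threshold)

# Hypothetical 5-fold scores for three models of increasing complexity.
folds = [
    (1, [-2.00, -1.95, -2.05, -1.98, -2.02]),
    (2, [-1.90, -2.00, -2.10, -1.80, -2.00]),  # best mean, but noisy
    (3, [-1.90, -2.10, -2.00, -1.90, -2.05]),
]
print(one_se_rule(folds))  # selects model 1: within one SE of model 2's mean
```

Model 2 has the best mean score, but its fold-to-fold variability (the score-estimation uncertainty the abstract identifies as the predominant cause of overfitting) means the simpler model 1 is statistically indistinguishable from it, so the rule prefers model 1.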

