Computing AIC for black-box models using Generalised Degrees of Freedom: a comparison with cross-validation

03/09/2016
by Severin Hauenstein, et al.

Generalised Degrees of Freedom (GDF), as defined by Ye (1998, JASA 93:120-131), represent the sensitivity of model fits to perturbations of the data. As such, they can be computed for any statistical model, making it possible, in principle, to derive the number of parameters in machine-learning approaches. GDF were originally defined for normally distributed data only; here we investigate the potential of this approach for Bernoulli data. We compared GDF values for models of simulated and real data to model-complexity estimates from cross-validation. Similarly, we computed GDF-based AICc for randomForest, neural networks and boosted regression trees and demonstrated its similarity to cross-validation. GDF estimates for binary data were unstable and inconsistently sensitive to the number of data points perturbed simultaneously, and they were also extremely computationally intensive to calculate. Repeated 10-fold cross-validation was more robust, based on fewer assumptions and faster to compute. Our findings suggest that the GDF approach does not readily transfer to Bernoulli data and a wider range of regression approaches.
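To make the perturbation idea concrete, the following is a minimal sketch of a GDF estimate for the normal-data case only, not the authors' implementation. It assumes the usual Monte Carlo reading of Ye's definition: add small Gaussian noise to the response, refit, regress each fitted value on its own perturbation, and sum the slopes. The function names (`fit_predict_ols`, `estimate_gdf`, `aicc`), the noise scale `tau`, the number of perturbations and the simulated data are all hypothetical choices made for illustration. Ordinary least squares stands in for the black-box model because its GDF should be close to the number of coefficients, which gives a simple sanity check. The sketch perturbs all data points at once and does not address the Bernoulli case examined in the paper.

```python
import numpy as np


def fit_predict_ols(X, y):
    """Fit ordinary least squares and return fitted values.

    Stand-in for any black-box learner: only the fitted values are needed.
    """
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta


def estimate_gdf(fit_predict, X, y, n_perturb=500, tau=0.25, seed=0):
    """Monte Carlo GDF estimate (assumed scheme, normal data only).

    Each fitted value is regressed on its own perturbation across
    n_perturb refits; the GDF estimate is the sum of the slopes.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    deltas = rng.normal(0.0, tau, size=(n_perturb, n))
    fits = np.empty((n_perturb, n))
    for t in range(n_perturb):
        fits[t] = fit_predict(X, y + deltas[t])
    # Per-observation least-squares slope of fitted value on perturbation.
    d_c = deltas - deltas.mean(axis=0)
    f_c = fits - fits.mean(axis=0)
    slopes = (d_c * f_c).sum(axis=0) / (d_c ** 2).sum(axis=0)
    return slopes.sum()


def aicc(log_lik, k, n):
    """Small-sample corrected AIC, with k allowed to be a (non-integer) GDF."""
    return -2.0 * log_lik + 2.0 * k + 2.0 * k * (k + 1.0) / (n - k - 1.0)


# Simulated example: intercept plus two predictors, so GDF should be near 3.
rng = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.5, size=n)

gdf = estimate_gdf(fit_predict_ols, X, y)
print(f"estimated GDF: {gdf:.2f}")
```

Any learner that maps `(X, y)` to fitted values can be passed in place of `fit_predict_ols`. Each perturbation requires a full refit, which illustrates why the approach becomes expensive for models such as random forests or boosted trees.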


research
07/01/2023

Bootstrapping the Cross-Validation Estimate

Cross-validation is a widely used technique for evaluating the performan...
research
11/15/2017

Accelerating Cross-Validation in Multinomial Logistic Regression with ℓ_1-Regularization

We develop an approximate formula for evaluating a cross-validation esti...
research
07/06/2022

Degrees of Freedom and Information Criteria for the Synthetic Control Method

We provide an analytical characterization of the model flexibility of th...
research
06/16/2020

On parametric tests of relativity with false degrees of freedom

General relativity can be tested by comparing the binary-inspiral signal...
research
01/18/2021

The Violating Assumptions Series: Simulated demonstrations to illustrate how assumptions can affect statistical estimates

When teaching and discussing statistical assumptions, our focus is often...
research
03/23/2018

A Concept Learning Tool Based On Calculating Version Space Cardinality

In this paper, we proposed VeSC-CoL (Version Space Cardinality based Con...
research
03/24/2021

Loss based prior for the degrees of freedom of the Wishart distribution

In this paper we propose a novel method to deal with Vector Autoregressi...
