Credibility of high R^2 in regression problems: a permutation approach

05/04/2023
by Michał Ciszewski, et al.

The question of whether Y can be predicted from X arises often. While a well-adjusted model may perform well on observed data, the risk of overfitting always exists and can lead to poor generalization on unseen data. This paper proposes a rigorous permutation test to assess the credibility of high R^2 values in regression models. The test can be applied to any measure of goodness of fit and requires no sample splitting. It generates new pairings (X_i, Y_j) and provides an overall interpretation of the model's accuracy. The paper also introduces a new formulation of the null hypothesis and a justification for the test, and these distinguish it from previous literature. The theoretical findings are applied to both simulated data and sensor data of tennis serves in an experimental context. The simulation study shows how the available information affects the test. The less informative the predictors, the lower the probability of rejecting the null hypothesis, and detecting weaker dependence between variables requires a sufficiently large sample size.
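The central mechanism, re-pairing each X_i with some Y_j and recomputing the fit statistic, can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions and not the authors' procedure. It uses the classical null hypothesis of independence between X and Y, whereas the paper introduces its own formulation. It also assumes an ordinary least squares model with an intercept, uses in-sample R^2 as the statistic, and relies on the hypothetical helper names `r_squared` and `permutation_test_r2`.

```python
import numpy as np


def r_squared(X, y):
    """In-sample R^2 of an OLS fit with intercept (assumed model, for illustration)."""
    # Prepend a column of ones so the least squares fit includes an intercept.
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot


def permutation_test_r2(X, y, n_perm=999, seed=None, stat=r_squared):
    """Hypothetical helper: permutation p-value for an observed goodness-of-fit statistic.

    Each permutation shuffles y against the fixed rows of X, forming new
    pairings (X_i, Y_j). This breaks any dependence between X and Y while
    keeping both marginal samples unchanged.
    """
    rng = np.random.default_rng(seed)
    observed = stat(X, y)
    null_stats = np.array([stat(X, rng.permutation(y)) for _ in range(n_perm)])
    # The "+1" counts the observed pairing itself, so the p-value is never exactly 0.
    p_value = (1 + np.sum(null_stats >= observed)) / (n_perm + 1)
    return observed, p_value


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 50, 10
    X = rng.normal(size=(n, p))
    y_dep = X[:, 0] + rng.normal(size=n)  # y depends on one predictor
    y_ind = rng.normal(size=n)            # y independent of X
    print("dependent:  ", permutation_test_r2(X, y_dep, seed=1))
    print("independent:", permutation_test_r2(X, y_ind, seed=1))
```

With many predictors relative to the sample size, even the independent `y` produces an in-sample R^2 clearly above zero. This is the situation in which a high R^2 needs a reference distribution before it can be interpreted. Because the statistic is passed in as `stat`, another goodness-of-fit measure can replace `r_squared` without changing the rest of the test.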

research
03/06/2023

Model checking for high-dimensional parametric regressions: the conditionally studentized test

This paper studies model checking for general parametric regression mode...
research
12/13/2017

A Permutation Test on Complex Sample Data

Permutation tests are a distribution free way of performing hypothesis t...
research
09/07/2020

Permutation Testing for Dependence in Time Series

Given observations from a stationary time series, permutation tests allo...
research
11/01/2019

Exact model comparisons in the plausibility framework

Plausibility is a formalization of exact tests for parametric models and...
research
05/22/2020

The probability of a robust inference for internal validity and its applications in regression models

The internal validity of observational study is often subject to debate....
research
06/09/2023

Null/No Information Rate (NIR): a statistical test to assess if a classification accuracy is significant for a given problem

In many research contexts, especially in the biomedical field, after stu...