Confidence intervals for the Cox model test error from cross-validation

01/26/2022
by Min Woo Sun, et al.
Cross-validation (CV) is one of the most widely used techniques in statistical learning for estimating the test error of a model, but its behavior is not yet fully understood. It has been shown that standard confidence intervals for test error built from CV estimates may have coverage below nominal levels. This happens because each sample is used for both training and testing during CV, so the CV estimates of the errors are correlated. When this correlation is not accounted for, the estimate of the variance is smaller than it should be. One way to mitigate this issue is to use nested CV to estimate the mean squared error of the prediction error instead. This approach has been shown to achieve better coverage than intervals derived from standard CV. In this work, we generalize the nested CV idea to the Cox proportional hazards model and explore various choices of test error for this setting.
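
To make the "standard" interval concrete, here is a minimal sketch of the naive CV confidence interval that the abstract describes as having coverage below nominal levels. It is not the authors' code and not the nested CV procedure. The function name `naive_cv_interval`, the ordinary least squares model, the squared-error loss, and the simulated data are all assumptions chosen for illustration; a Cox model would need a survival-specific test error in place of squared error.

```python
import numpy as np

def naive_cv_interval(X, y, n_folds=5, z=1.96, seed=0):
    """Naive K-fold CV estimate of squared-error test error with a normal interval.

    Hypothetical helper for illustration only: linear model, squared-error loss.
    """
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    errors = np.empty(n)
    for test_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        # Fit ordinary least squares on the training folds.
        beta, *_ = np.linalg.lstsq(X[train_mask], y[train_mask], rcond=None)
        # Record the held-out squared error of each test observation.
        errors[test_idx] = (y[test_idx] - X[test_idx] @ beta) ** 2
    estimate = errors.mean()
    # The standard error below treats the n errors as independent.
    # They are correlated across folds, so this tends to be too small.
    se = errors.std(ddof=1) / np.sqrt(n)
    return estimate, estimate - z * se, estimate + z * se

# Simulated data (an assumption, not from the paper): 200 samples, 5 features.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X @ np.ones(5) + rng.normal(size=200)
estimate, lower, upper = naive_cv_interval(X, y)
print(f"CV error {estimate:.3f}, naive 95% interval ({lower:.3f}, {upper:.3f})")
```

The line computing `se` is where the independence assumption enters; nested CV, as described in the abstract, replaces this variance estimate with an estimate of the mean squared error.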
