On resampling methods for model assessment in penalized and unpenalized logistic regression

01/19/2021
by Angelika Geroldinger et al.

Penalized logistic regression methods are frequently used to investigate the relationship between a binary outcome and a set of explanatory variables. Model performance can be assessed by measures such as the concordance statistic (c-statistic), the discrimination slope and the Brier score. Data resampling techniques, e.g., crossvalidation, are often employed to correct for optimism in these model performance criteria. Especially with small samples or a rare binary outcome variable, leave-one-out crossvalidation is a popular choice. Using simulations and a real data example, we compared the effect of different resampling techniques on the estimation of c-statistics, discrimination slopes and Brier scores for three estimators of logistic regression models: the maximum likelihood estimator and two maximum penalized-likelihood estimators. Our simulation study confirms earlier studies reporting that leave-one-out crossvalidated c-statistics can be strongly biased towards zero. In addition, our study reveals that this bias is more pronounced for estimators shrinking predicted probabilities towards the observed event rate, such as ridge regression. Leave-one-out crossvalidation also provided pessimistic estimates of the discrimination slope but nearly unbiased estimates of the Brier score. We recommend using leave-pair-out crossvalidation, five-fold crossvalidation with repetition, the enhanced bootstrap or the .632+ bootstrap to estimate c-statistics, and leave-pair-out or five-fold crossvalidation to estimate discrimination slopes.
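The three performance measures and the two pair-based resampling schemes mentioned above can be illustrated with a short sketch. This is not the authors' implementation: the function names (`c_statistic`, `discrimination_slope`, `brier_score`, `fit_logistic`, `predict`, `loo_predictions`, `lpo_c_statistic`) are hypothetical, ridge regression is fitted by plain Newton-Raphson with an unpenalized intercept and a fixed, untuned penalty `lam`, and only maximum likelihood (`lam = 0`) and ridge are covered. Repeated five-fold crossvalidation and the bootstrap variants are not shown. The pooled leave-one-out c-statistic scores all held-out predictions together, whereas the leave-pair-out version refits once per (event, non-event) pair.

```python
import numpy as np


def c_statistic(y, p):
    """Share of (event, non-event) pairs in which the event has the higher
    predicted probability; ties count one half."""
    y, p = np.asarray(y), np.asarray(p, dtype=float)
    diff = p[y == 1][:, None] - p[y == 0][None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size


def discrimination_slope(y, p):
    """Mean predicted probability of events minus that of non-events."""
    y, p = np.asarray(y), np.asarray(p, dtype=float)
    return p[y == 1].mean() - p[y == 0].mean()


def brier_score(y, p):
    """Mean squared difference between predicted probability and outcome."""
    y, p = np.asarray(y, dtype=float), np.asarray(p, dtype=float)
    return np.mean((p - y) ** 2)


def fit_logistic(X, y, lam=0.0, n_iter=100, tol=1e-10):
    """Newton-Raphson maximizing loglik - (lam / 2) * sum(slopes ** 2).
    lam = 0 gives maximum likelihood; the intercept is never penalized."""
    Xd = np.column_stack([np.ones(len(y)), X])
    penalty = lam * np.eye(Xd.shape[1])
    penalty[0, 0] = 0.0
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-Xd @ beta))
        grad = Xd.T @ (y - mu) - penalty @ beta
        hess = Xd.T @ (Xd * (mu * (1.0 - mu))[:, None]) + penalty
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def predict(beta, X):
    """Predicted event probabilities from an intercept-first coefficient vector."""
    return 1.0 / (1.0 + np.exp(-(beta[0] + X @ beta[1:])))


def loo_predictions(X, y, lam=0.0):
    """Leave-one-out: predict each observation from a model fitted without it.
    The pooled held-out predictions are then scored together."""
    n = len(y)
    p = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        p[i] = predict(fit_logistic(X[keep], y[keep], lam), X[i:i + 1])[0]
    return p


def lpo_c_statistic(X, y, lam=0.0):
    """Leave-pair-out: for every (event, non-event) pair, refit without the pair
    and check whether the held-out event gets the higher prediction."""
    events, nonevents = np.flatnonzero(y == 1), np.flatnonzero(y == 0)
    score = 0.0
    for i in events:
        for j in nonevents:
            keep = np.ones(len(y), dtype=bool)
            keep[[i, j]] = False
            p_i, p_j = predict(fit_logistic(X[keep], y[keep], lam), X[[i, j]])
            score += 1.0 if p_i > p_j else (0.5 if p_i == p_j else 0.0)
    return score / (len(events) * len(nonevents))


if __name__ == "__main__":
    # Hypothetical null scenario: the outcome is unrelated to both covariates.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(40, 2))
    y = rng.binomial(1, 0.5, size=40)
    # lam = 0 is maximum likelihood; lam = 1e6 is an extreme, untuned ridge penalty.
    for lam in (0.0, 1e6):
        p_loo = loo_predictions(X, y, lam)
        print(f"lam={lam:g}: pooled LOO c={c_statistic(y, p_loo):.3f}, "
              f"LPO c={lpo_c_statistic(X, y, lam):.3f}, "
              f"LOO slope={discrimination_slope(y, p_loo):.3f}, "
              f"LOO Brier={brier_score(y, p_loo):.3f}")
```

With a very strong ridge penalty the fitted slopes are essentially zero, so every prediction is close to the event rate of the training set. Leaving out an event lowers that rate and leaving out a non-event raises it, so in the pooled leave-one-out predictions every event ends up below every non-event and the c-statistic falls to zero. In the leave-pair-out scheme both held-out observations are scored by the same fit, which removes this shift. This extreme, hypothetical setting shows one simple mechanism consistent with the bias described above for estimators that shrink predictions towards the event rate.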
