The harm of class imbalance corrections for risk prediction models: illustration and simulation using logistic regression

02/18/2022
by Ruben van den Goorbergh, et al.

Methods to correct class imbalance, i.e. imbalance between the frequency of outcome events and non-events, are receiving increasing interest for developing prediction models. We examined the effect of imbalance correction on the performance of standard and penalized (ridge) logistic regression models in terms of discrimination, calibration, and classification. We examined random undersampling, random oversampling, and SMOTE using Monte Carlo simulations and a case study on ovarian cancer diagnosis. The results indicated that all imbalance correction methods led to poor calibration (strong overestimation of the probability of belonging to the minority class) without improving discrimination in terms of the area under the receiver operating characteristic curve. Imbalance correction improved classification in terms of sensitivity and specificity, but similar results were obtained by shifting the probability threshold instead. Our study shows that outcome imbalance is not a problem in itself, and that imbalance correction may even worsen model performance.
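The two central claims (overestimated risks after resampling, and equivalent classification from a shifted threshold) can be illustrated with a small idealized calculation. The sketch below is not the authors' simulation code. It assumes a correctly specified logistic model, a large sample, and random undersampling that keeps every event and a random fraction of non-events; under those assumptions, undersampling multiplies the odds by the inverse of the kept fraction, which is an intercept shift. The function names and the 1% event fraction are hypothetical, chosen only for illustration.

```python
import math


def sigmoid(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def undersampled_probability(p_true, keep_fraction):
    """Risk an idealized logistic model would report after undersampling.

    Assumption: all events are kept and each non-event is kept with
    probability `keep_fraction`. This multiplies the odds by
    1 / keep_fraction, i.e. shifts the intercept by -log(keep_fraction).
    """
    return sigmoid(logit(p_true) - math.log(keep_fraction))


def equivalent_threshold(keep_fraction, threshold=0.5):
    """Threshold on the original risk scale that flags the same cases as
    `threshold` applied to the undersampled model's risks."""
    return sigmoid(logit(threshold) + math.log(keep_fraction))


# Hypothetical setting: 1% events, undersampled to a 50/50 balance.
event_fraction = 0.01
keep_fraction = event_fraction / (1.0 - event_fraction)

true_risks = [0.002, 0.005, 0.01, 0.02, 0.05, 0.20]
inflated_risks = [undersampled_probability(p, keep_fraction) for p in true_risks]
shifted = equivalent_threshold(keep_fraction)

for p, q in zip(true_risks, inflated_risks):
    print(f"true risk {p:.3f} -> risk after undersampling {q:.3f}")
print(f"threshold on the original scale matching 0.5: {shifted:.4f}")
```

In this idealized setting every reported risk is inflated, while the ranking of cases (and hence the area under the ROC curve) is unchanged, and applying a cutoff of 0.5 to the inflated risks flags exactly the same cases as applying the shifted cutoff to the original risks. Finite samples, penalization, and SMOTE add noise and further distortions that this sketch does not capture.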


