Comparison of methods for early-readmission prediction in a high-dimensional heterogeneous covariates and time-to-event outcome framework

07/25/2018
by   Simon Bussy, et al.

Background: Choosing the best-performing method in terms of outcome prediction or variable selection is a recurring problem in prognosis studies, leading to many publications on method comparison. But some aspects have received little attention. First, most comparison studies treat prediction performance and variable selection separately. Second, methods are compared either within a binary outcome setting (based on an arbitrarily chosen delay) or within a survival setting, but not both. In this paper, we propose a comparison methodology to weigh up those different settings in terms of both prediction and variable selection, while incorporating advanced machine learning strategies. Methods: Using a high-dimensional case study on a sickle-cell disease (SCD) cohort, we compare 8 statistical methods. In the binary outcome setting, we consider logistic regression (LR), support vector machine (SVM), random forest (RF), gradient boosting (GB) and neural network (NN); in the survival analysis setting, we consider the Cox Proportional Hazards (PH), the CURE and the C-mix models. We then compare the performance of all methods in terms of both risk prediction and variable selection, with a focus on the use of the Elastic-Net regularization technique. Results: Among all assessed statistical methods, the C-mix model yields the best performance in both considered settings, as well as interesting interpretation aspects. There is some consistency in selected covariates across methods within a setting, but not much across the two settings. Conclusions: It appears that learning within the survival setting first, and then going back to a binary prediction using the survival estimates, significantly enhances binary predictions.
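
To make the two settings concrete, the following is a minimal Python sketch, not the authors' implementation, of how a time-to-event outcome might be reduced to a binary label at a chosen delay, and how a survival estimate at that delay could be mapped back to a binary risk score. The function names, the 30-day delay, the toy cohort, and the choice to discard patients censored before the delay are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def binarize_outcome(time: float, event: bool, delay: float) -> Optional[int]:
    """Turn a (time, event) pair into a binary label at a given delay.

    Returns 1 if the event was observed at or before `delay`,
    0 if the patient was still event-free after `delay`,
    and None if the patient was censored at or before `delay`
    (status at the delay is unknown; dropping such patients is an
    assumption of this sketch).
    """
    if event and time <= delay:
        return 1
    if time > delay:
        return 0
    return None


def risk_from_survival(surv_at_delay: float) -> float:
    """Map an estimated survival probability S(delay) to a risk score.

    The probability of an event before the delay is 1 - S(delay).
    """
    if not 0.0 <= surv_at_delay <= 1.0:
        raise ValueError("survival probability must lie in [0, 1]")
    return 1.0 - surv_at_delay


# Hypothetical toy cohort: (follow-up time in days, event observed?)
cohort: List[Tuple[float, bool]] = [
    (12.0, True),   # readmitted on day 12
    (45.0, True),   # readmitted on day 45
    (20.0, False),  # censored on day 20
    (60.0, False),  # censored on day 60
]
delay = 30.0  # arbitrary delay, chosen only for illustration

labels = [binarize_outcome(t, e, delay) for t, e in cohort]
print(labels)
```

The third patient illustrates the information lost in the binary setting: censoring before the delay leaves the label undefined, whereas a survival model can still use the 20 days of event-free follow-up.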


