Similarity-based Random Survival Forest
Predicting the time to a clinical outcome for patients in intensive care units (ICUs) helps support critical medical treatment decisions. The time to an event of interest could be, for example, survival time or time to recovery from a disease or ailment observed within the ICU. The massive health datasets generated by the uptake of Electronic Health Records (EHRs) are highly heterogeneous: patients can differ substantially in the relationship between their feature vector and the outcome, so dissimilar patients can add more noise than information to prediction. We propose a modified random forest method for survival data that identifies similar cases to improve prediction accuracy. We also introduce an adaptation of our methodology for the case of dependent censoring. We demonstrate the proposed method on the Medical Information Mart for Intensive Care (MIMIC-III) database and examine its properties through a comprehensive simulation study. In the analyses we undertook, introducing similarity to the random survival forest method provided additional predictive accuracy compared with random survival forest alone.