Balanced Random Survival Forests for Extremely Unbalanced, Right Censored Data

03/24/2018
by Kahkashan Afrin, et al.

The accuracy of survival models for life expectancy prediction and for lifesaving critical-care applications is significantly compromised by the sparsity of samples and the extreme imbalance between the survival and mortality classes, as well as by the invalidity of the popular proportional hazards assumption. Class imbalance leads to an underestimation of the hazard of the mortality class and an overestimation of the hazard of the survival class. To address this gap, the balanced random survival forests (BRSF) model is presented; it trains random survival forests on balanced data generated by a synthetic minority sampling scheme. Theoretical findings on the improvement of survival prediction after balancing are corroborated by extensive empirical evaluations. Benchmarking studies consider five data sets with different levels of class imbalance from public repositories and an imbalanced survival data set of 267 ST-elevated myocardial infarction (STEMI) patients collected over a period of one year at the Heart, Artery, and Vein Center of Fresno, CA. The results suggest that BRSF discriminates better between the censored and mortality classes and improves survival prediction for the minority class. BRSF outperformed both optimized Cox (without and with balancing) and RSF with a 55 over the next best alternative.
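
The balancing step described in the abstract can be illustrated with a minimal sketch of SMOTE-style synthetic minority oversampling. This is an illustration under stated assumptions, not the authors' implementation: the function names `smote_oversample` and `balance`, the toy data, the choice of `k`, and the decision to treat the observed time as an ordinary numeric column that is interpolated along with the covariates are all hypothetical. The abstract does not say how times or censoring indicators are handled during balancing.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating between a minority row
    and one of its k nearest minority neighbours (SMOTE-style)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of base among the other minority rows
        # (plain Euclidean distance on unscaled features, for brevity)
        others = [row for row in minority if row is not base]
        neighbours = sorted(others, key=lambda row: math.dist(base, row))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append([b + gap * (m - b) for b, m in zip(base, mate)])
    return synthetic


def balance(rows, events):
    """Oversample the smaller class until both classes have equal size.
    events[i] is 1 for an observed death and 0 for a censored record."""
    pos = [r for r, e in zip(rows, events) if e == 1]
    neg = [r for r, e in zip(rows, events) if e == 0]
    minority, label = (pos, 1) if len(pos) < len(neg) else (neg, 0)
    n_new = abs(len(pos) - len(neg))
    new_rows = smote_oversample(minority, n_new) if n_new else []
    return rows + new_rows, events + [label] * len(new_rows)


# Hypothetical toy data: [age, biomarker, observed time in days]
rows = [[63, 1.2, 30], [71, 0.9, 12], [58, 1.5, 45],
        [66, 1.1, 365], [49, 0.8, 365], [55, 1.0, 365],
        [60, 0.7, 365], [52, 0.9, 365]]
events = [1, 1, 1, 0, 0, 0, 0, 0]

balanced_rows, balanced_events = balance(rows, events)
```

The balanced rows and event flags would then be used to train a random survival forest in place of the original imbalanced data. In practice the features would normally be scaled before the nearest-neighbour search so that no single column dominates the distance.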


