Ensemble Risk Modeling Method for Robust Learning on Scarce Data

08/13/2011
by Marina Sapir, et al.

In medical risk modeling, typical data are "scarce": they have a relatively small number of training instances (N), censoring, and high dimensionality (M). We show that the problem may be effectively simplified by reducing it to bipartite ranking, and we introduce a new bipartite ranking algorithm, Smooth Rank, for robust learning on scarce data. The algorithm is based on ensemble learning with unsupervised aggregation of predictors. The advantage of our approach is confirmed in a comparison with two "gold standard" risk modeling methods on 10 real-life survival analysis datasets, where the new approach has the best results on all but the two datasets with the largest N/M ratio. To study systematically how data scarcity affects modeling by all three methods, we conducted two types of computational experiments: on real-life data with randomly drawn training sets of different sizes, and on artificial data with an increasing number of features. Both experiments demonstrated that Smooth Rank has a critical advantage over the popular methods on scarce data: it does not suffer from overfitting where the other methods do.
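The abstract names two ideas without spelling them out: reducing censored risk data to bipartite ranking, and an ensemble whose predictors are combined without supervision. The sketch below illustrates both under stated assumptions; it is not Smooth Rank, whose details are not given here. It assumes a fixed time horizon to derive labels (event by the horizon = positive, known survival past it = negative, censored before it = dropped), uses one z-scored feature per base predictor, and aggregates them by an unweighted mean, so no aggregation weights are fit to labels. All function names and the toy data are hypothetical.

```python
from statistics import mean, pstdev


def to_bipartite(times, events, horizon):
    """Assumed reduction of censored survival data to bipartite labels.

    Returns (index, label) pairs: 1 = event observed by `horizon` (high risk),
    0 = known to survive past `horizon` (low risk). Records censored at or
    before `horizon` have unknown status and are dropped.
    """
    pairs = []
    for i, (t, e) in enumerate(zip(times, events)):
        if e and t <= horizon:
            pairs.append((i, 1))
        elif t > horizon:
            pairs.append((i, 0))
    return pairs


def auc(scores, labels):
    """Bipartite ranking quality: share of (positive, negative) pairs where the
    positive scores higher; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_ensemble(X, y):
    """Hypothetical base predictors: one z-scored feature each, oriented so that
    larger values mean higher risk on the training data."""
    models = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mu, sd = mean(col), pstdev(col)
        if sd == 0:
            continue  # a constant feature cannot rank anything
        z = [(v - mu) / sd for v in col]
        sign = 1.0 if auc(z, y) >= 0.5 else -1.0
        models.append((j, mu, sd, sign))
    return models


def predict(models, X):
    """Unsupervised aggregation: unweighted mean of the oriented base scores."""
    if not models:
        raise ValueError("no informative features")
    return [mean(sign * (row[j] - mu) / sd for j, mu, sd, sign in models)
            for row in X]


# Toy data, invented for illustration: 8 records with 2 features each.
times = [2, 5, 3, 8, 9, 1, 7, 4]
events = [1, 1, 0, 0, 1, 1, 0, 1]  # 1 = event observed, 0 = censored
X_all = [[5.0, 1.0], [1.0, 4.0], [3.0, 3.0], [2.0, 5.0],
         [1.5, 3.5], [6.0, 0.5], [2.5, 4.5], [4.0, 2.0]]

pairs = to_bipartite(times, events, horizon=4)
X = [X_all[i] for i, _ in pairs]
y = [label for _, label in pairs]
models = fit_ensemble(X, y)
print(auc(predict(models, X), y))  # prints 1.0 (training AUC on toy data)
```

The unweighted mean is one plausible reading of "unsupervised aggregation": each base predictor touches the labels only to pick its orientation, and the combination step has no fitted weights, which is one intuition for why such ensembles may overfit less when N is small relative to M.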

research
11/09/2015

PAC-Bayesian High Dimensional Bipartite Ranking

This paper is devoted to the bipartite ranking problem, a classical stat...
research
09/26/2013

Stochastic Rank Aggregation

This paper addresses the problem of rank aggregation, which aims to find...
research
10/28/2010

Random Graph Generator for Bipartite Networks Modeling

The purpose of this article is to introduce a new iterative algorithm wi...
research
08/31/2020

PT-Ranking: A Benchmarking Platform for Neural Learning-to-Rank

Deep neural networks have become the first choice for researchers working...
research
01/07/2021

Kullback-Leibler-Based Discrete Relative Risk Models for Integration of Published Prediction Models with New Dataset

Existing literature for prediction of time-to-event data has primarily f...
research
10/06/2014

Top Rank Optimization in Linear Time

Bipartite ranking aims to learn a real-valued ranking function that orde...
research
12/18/2013

Systematic and multifactor risk models revisited

Systematic and multifactor risk models are revisited via methods which w...
