Multi-Study Boosting: Theoretical Considerations for Merging vs. Ensembling

07/11/2022
by Cathy Shyr, et al.

Cross-study replicability is a powerful model evaluation criterion that emphasizes generalizability of predictions. When training cross-study replicable prediction models, it is critical to decide between merging and treating the studies separately. We study boosting algorithms in the presence of potential heterogeneity in predictor-outcome relationships across studies and compare two multi-study learning strategies: 1) merging all the studies and training a single model, and 2) multi-study ensembling, which involves training a separate model on each study and ensembling the resulting predictions. In the regression setting, we provide theoretical guidelines based on an analytical transition point to determine whether it is more beneficial to merge or to ensemble for boosting with linear learners. In addition, we characterize a bias-variance decomposition of estimation error for boosting with component-wise linear learners. We verify the theoretical transition point result in simulation and illustrate how it can guide the decision on merging vs. ensembling in an application to breast cancer gene expression data.
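The two strategies compared in the abstract can be sketched in a small simulation. The snippet below is a minimal illustration, not the authors' code: it uses a standard component-wise L2 boosting variant with simple linear learners, a random-effects-style data-generating model (shared coefficients plus study-specific perturbations), and equal-weight ensembling. All function names, sample sizes, and noise scales are illustrative assumptions.

```python
import numpy as np

def l2_boost(X, y, n_iter=100, nu=0.1):
    """Component-wise L2 boosting with linear learners (a common variant;
    the paper's exact setup may differ). Each iteration fits a simple
    least-squares learner on every coordinate, picks the best one, and
    takes a shrunken step on the residuals."""
    n, p = X.shape
    beta = np.zeros(p)
    resid = y.astype(float).copy()
    for _ in range(n_iter):
        coefs = X.T @ resid / (X ** 2).sum(axis=0)      # per-coordinate LS fits
        sse = ((resid[:, None] - X * coefs) ** 2).sum(axis=0)
        j = np.argmin(sse)                               # best-fitting coordinate
        beta[j] += nu * coefs[j]
        resid -= nu * coefs[j] * X[:, j]
    return beta

rng = np.random.default_rng(0)
p = 5
beta0 = rng.normal(size=p)                               # shared mean effects
studies = []
for _ in range(4):
    b = beta0 + rng.normal(scale=0.5, size=p)            # study-level heterogeneity
    X = rng.normal(size=(200, p))
    studies.append((X, X @ b + rng.normal(size=200)))

X_test = rng.normal(size=(1000, p))
y_test = X_test @ beta0 + rng.normal(size=1000)          # evaluate on the mean model

# Strategy 1 (merging): pool all studies, train a single boosted model.
X_m = np.vstack([X for X, _ in studies])
y_m = np.concatenate([y for _, y in studies])
mse_merge = np.mean((y_test - X_test @ l2_boost(X_m, y_m)) ** 2)

# Strategy 2 (ensembling): train per study, average predictions.
pred_ens = np.mean([X_test @ l2_boost(X, y) for X, y in studies], axis=0)
mse_ens = np.mean((y_test - pred_ens) ** 2)
```

Varying the heterogeneity scale (here 0.5) moves the problem across the transition point the paper characterizes: with little heterogeneity, merging benefits from the larger pooled sample; with substantial heterogeneity, ensembling tends to win.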

