Optimal Ensemble Construction for Multi-Study Prediction with Applications to COVID-19 Excess Mortality Estimation

09/19/2021
by Gabriel Loewinger, et al.

It is increasingly common to encounter prediction tasks in the biomedical sciences for which multiple datasets are available for model training. Common approaches such as pooling datasets and applying standard statistical learning methods can result in poor out-of-study prediction performance when datasets are heterogeneous. Theoretical and applied work has shown multi-study ensembling to be a viable alternative that leverages the variability across datasets in a manner that promotes model generalizability. Multi-study ensembling uses a two-stage stacking strategy that fits study-specific models and estimates ensemble weights in separate steps. This approach, however, ignores the ensemble properties at the model-fitting stage, potentially resulting in a loss of efficiency. We therefore propose optimal ensemble construction, an all-in-one approach to multi-study stacking whereby we jointly estimate ensemble weights as well as parameters associated with each study-specific model. We prove that limiting cases of our approach yield existing methods such as multi-study stacking and pooling datasets before model fitting. We propose an efficient block coordinate descent algorithm to optimize the proposed loss function. We compare our approach to standard methods by applying it to a multi-country COVID-19 dataset for baseline mortality prediction. We show that when little data is available for a country before the onset of the pandemic, leveraging data from other countries can substantially improve prediction accuracy. Importantly, our approach outperforms multi-study stacking and other standard methods in this application. We further characterize the method's performance in simulations. Our method remains competitive with or outperforms multi-study stacking and other earlier methods across a range of between-study heterogeneity levels.
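The abstract describes jointly estimating ensemble weights and study-specific model parameters with a block coordinate descent algorithm. The Python sketch below illustrates that general idea on a toy problem, under assumptions that are ours rather than the paper's: each study model is a simple linear fit f_k(x) = a_k + b_k x, and the joint loss is taken to be an `eta`-weighted mix of the pooled ensemble squared error and the study-specific squared errors. The names `features`, `predict`, `ensemble`, `objective`, `bcd_fit` and the parameter `eta` are hypothetical; the paper's actual loss, any constraints on the weights, and its update rules may differ.

```python
# Hypothetical sketch of joint ensemble fitting by block coordinate descent.
# ASSUMED objective (illustrative, not necessarily the paper's exact loss):
#   L(w, theta) = (1 - eta) * sum over pooled data of (y - sum_k w_k * f_k(x))**2
#               + eta * sum_k sum over study k of (y - f_k(x))**2
# with hypothetical linear study models f_k(x) = theta_k[0] + theta_k[1] * x.

def features(x):
    return (1.0, x)  # intercept and slope features

def predict(theta, x):
    return sum(t * phi for t, phi in zip(theta, features(x)))

def ensemble(w, thetas, x):
    return sum(wk * predict(th, x) for wk, th in zip(w, thetas))

def objective(studies, w, thetas, eta):
    pooled = [pt for study in studies for pt in study]
    ens = sum((y - ensemble(w, thetas, x)) ** 2 for x, y in pooled)
    own = sum((y - predict(th, x)) ** 2
              for study, th in zip(studies, thetas) for x, y in study)
    return (1 - eta) * ens + eta * own

def bcd_fit(studies, eta=0.5, n_iter=50):
    K = len(studies)
    pooled = [pt for study in studies for pt in study]
    w = [1.0 / K] * K                        # start from equal weights
    thetas = [[0.0, 0.0] for _ in range(K)]  # start from zero models
    history = [objective(studies, w, thetas, eta)]
    for _ in range(n_iter):
        # Block 1: study-specific parameters, one coordinate at a time (weights fixed).
        for k in range(K):
            for m in range(2):
                num = den = 0.0
                for x, y in pooled:  # contribution of the ensemble term
                    phi = features(x)[m]
                    e = y - ensemble(w, thetas, x) + w[k] * thetas[k][m] * phi
                    num += (1 - eta) * w[k] * e * phi
                    den += (1 - eta) * w[k] ** 2 * phi ** 2
                for x, y in studies[k]:  # contribution of the study-specific term
                    phi = features(x)[m]
                    g = y - predict(thetas[k], x) + thetas[k][m] * phi
                    num += eta * g * phi
                    den += eta * phi ** 2
                if den > 0:
                    thetas[k][m] = num / den  # exact minimizer of a 1-D quadratic
        # Block 2: ensemble weights, one at a time (study models fixed).
        for k in range(K):
            num = den = 0.0
            for x, y in pooled:
                fk = predict(thetas[k], x)
                r = y - ensemble(w, thetas, x) + w[k] * fk
                num += r * fk
                den += fk ** 2
            if den > 0:
                w[k] = num / den
        history.append(objective(studies, w, thetas, eta))
    return w, thetas, history

if __name__ == "__main__":
    xs = (0.0, 1.0, 2.0, 3.0)
    study_a = [(x, 1.0 + 2.0 * x) for x in xs]  # toy, noise-free data
    study_b = [(x, 0.5 + 2.5 * x) for x in xs]  # a second, heterogeneous study
    w, thetas, history = bcd_fit([study_a, study_b], eta=0.5)
    print("weights:", w)
    print("study models:", thetas)
    print("objective: start", history[0], "end", history[-1])
```

Each coordinate update is the exact minimizer of a one-dimensional convex quadratic, so the recorded objective never increases. Under this assumed objective, setting `eta=1` decouples the study models into separate per-study least-squares fits, while smaller values of `eta` let the pooled ensemble error shape each study's parameters during fitting rather than only afterwards.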
