Surrogate Assisted Semi-supervised Inference for High Dimensional Risk Prediction

by Jue Hou, et al.

Risk modeling with electronic health record (EHR) data is challenging because the disease outcome is rarely observed directly and the candidate predictors are high dimensional. In this paper, we develop a surrogate assisted semi-supervised-learning (SAS) approach to risk modeling with high dimensional predictors. It leverages a large unlabeled dataset containing candidate predictors and surrogates of the outcome, together with a small labeled dataset with annotated outcomes. The SAS procedure borrows information from the surrogates along with the candidate predictors to impute the unobserved outcomes via a sparse working imputation model. It uses moment conditions to achieve robustness against mis-specification of the imputation model, and a one-step bias correction to enable interval estimation for the predicted risk. We demonstrate that the SAS procedure provides valid inference for the predicted risk derived from a high dimensional working model, even when the underlying risk prediction model is dense and the risk model is mis-specified. We present an extensive simulation study to demonstrate the superiority of our semi-supervised learning (SSL) approach compared to existing supervised methods. We apply the method to derive genetic risk prediction of type-2 diabetes mellitus using an EHR biobank cohort.
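The impute-then-fit idea described in the abstract can be illustrated with a toy sketch. This is not the authors' SAS estimator: it assumes a low-dimensional simulated dataset, swaps the sparse working imputation model for a lightly ridge-penalized logistic regression, and omits both the moment conditions and the one-step bias correction. All names (`simulate`, `fit_logistic`, `predicted_risk`) and the data-generating values are hypothetical.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(rows, targets, ridge=1e-3, lr=0.5, steps=200):
    """Fixed-step gradient descent for logistic regression.

    Targets may be binary labels or imputed probabilities in [0, 1].
    The small ridge penalty is an assumption standing in for a sparse penalty.
    """
    d, n = len(rows[0]), len(rows)
    coef = [0.0] * d
    for _ in range(steps):
        grad = [ridge * c for c in coef]
        for w, t in zip(rows, targets):
            resid = sigmoid(sum(c * v for c, v in zip(coef, w))) - t
            for j in range(d):
                grad[j] += resid * w[j] / n
        coef = [c - lr * g for c, g in zip(coef, grad)]
    return coef

def simulate(n, rng):
    # Hypothetical data: two predictors, a binary outcome, one noisy surrogate.
    rows = []
    for _ in range(n):
        x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
        y = 1 if rng.random() < sigmoid(-0.5 + 1.0 * x1) else 0
        s = y + rng.gauss(0, 0.5)
        rows.append((x1, x2, s, y))
    return rows

rng = random.Random(0)
labeled = simulate(100, rng)     # small set with annotated outcomes
unlabeled = simulate(1000, rng)  # large set; y is treated as unobserved below

# Step 1: imputation model on labeled data, using predictors and the surrogate.
W_lab = [(1.0, x1, x2, s) for x1, x2, s, _ in labeled]
y_lab = [y for _, _, _, y in labeled]
gamma = fit_logistic(W_lab, y_lab)

# Step 2: impute outcome probabilities for the unlabeled data.
imputed = [
    sigmoid(gamma[0] + gamma[1] * x1 + gamma[2] * x2 + gamma[3] * s)
    for x1, x2, s, _ in unlabeled
]

# Step 3: risk model on predictors only, fitted to the imputed outcomes.
X_unl = [(1.0, x1, x2) for x1, x2, _, _ in unlabeled]
beta = fit_logistic(X_unl, imputed)

def predicted_risk(x1, x2):
    return sigmoid(beta[0] + beta[1] * x1 + beta[2] * x2)
```

Calling `predicted_risk(1.0, 0.0)` returns a point prediction only. The interval estimates mentioned in the abstract would require the bias-correction step that this sketch leaves out.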







Semi-Supervised Approaches to Efficient Evaluation of Model Prediction Performance

In many modern machine learning applications, the outcome is expensive o...

Efficient and Robust Semi-Supervised Estimation of Average Treatment Effects in Electronic Medical Records Data

There is strong interest in conducting comparative effectiveness researc...

Prior Adaptive Semi-supervised Learning with Application to EHR Phenotyping

Electronic Health Records (EHR) data, a rich source for biomedical resea...

Risk Prediction with Imperfect Survival Outcome Information from Electronic Health Records

Readily available proxies for time of disease onset such as time of the ...

Semi-supervised learning and the question of true versus estimated propensity scores

A straightforward application of semi-supervised machine learning to the...

Supervised Autoencoders Learn Robust Joint Factor Models of Neural Activity

Factor models are routinely used for dimensionality reduction in modelin...

Model-assisted estimation in high-dimensional settings for survey data

Model-assisted estimators have attracted a lot of attention in the last ...