Surrogate Assisted Semi-supervised Inference for High Dimensional Risk Prediction

05/04/2021
by Jue Hou, et al.

Risk modeling with electronic health record (EHR) data is challenging due to a lack of direct observations of the disease outcome and the high dimensionality of the candidate predictors. In this paper, we develop a surrogate assisted semi-supervised learning (SAS) approach to risk modeling with high dimensional predictors, leveraging a large unlabeled dataset on candidate predictors and surrogates of the outcome, as well as a small labeled dataset with annotated outcomes. The SAS procedure borrows information from surrogates along with candidate predictors to impute the unobserved outcomes via a sparse working imputation model. Moment conditions make the procedure robust against mis-specification of the imputation model, and a one-step bias correction enables interval estimation for the predicted risk. We demonstrate that the SAS procedure provides valid inference for the predicted risk derived from a high dimensional working model, even when the underlying risk prediction model is dense and the risk model is mis-specified. We present an extensive simulation study to demonstrate the superiority of our semi-supervised learning (SSL) approach compared to existing supervised methods. We apply the method to derive genetic risk prediction of type-2 diabetes mellitus using an EHR biobank cohort.
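The impute-then-fit idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' SAS estimator: it omits the moment conditions and the one-step bias correction, so it gives point predictions only and no valid intervals. The function names (`lasso_logistic`, `sas_sketch`, `predict_risk`), the penalty levels, and the choice of a proximal-gradient solver are all assumptions made for illustration.

```python
import numpy as np

def expit(z):
    # Clip the linear predictor to avoid overflow in exp.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def add_intercept(X):
    return np.hstack([np.ones((X.shape[0], 1)), X])

def lasso_logistic(X, y, lam, n_iter=500):
    """L1-penalized logistic regression via proximal gradient (ISTA).

    y may be binary or fractional (imputed probabilities).
    The intercept (first coefficient) is not penalized.
    """
    n = X.shape[0]
    Xb = add_intercept(X)
    # Step size 1/L, where L = sigma_max(Xb)^2 / (4n) bounds the curvature
    # of the average logistic loss.
    step = 4.0 * n / (np.linalg.norm(Xb, 2) ** 2)
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        grad = Xb.T @ (expit(Xb @ beta) - y) / n
        beta = beta - step * grad
        # Soft-thresholding is the proximal step for the L1 penalty.
        beta[1:] = np.sign(beta[1:]) * np.maximum(np.abs(beta[1:]) - step * lam, 0.0)
    return beta

def sas_sketch(X_lab, S_lab, y_lab, X_unl, S_unl, lam_imp=0.02, lam_risk=0.01):
    """Simplified impute-then-fit sketch (assumed penalties, no bias correction)."""
    # Step 1: sparse working imputation model Y ~ (X, S) on the small labeled set.
    gamma = lasso_logistic(np.hstack([X_lab, S_lab]), y_lab, lam_imp)
    # Step 2: impute the unobserved outcomes on the large unlabeled set.
    y_imp = expit(add_intercept(np.hstack([X_unl, S_unl])) @ gamma)
    # Step 3: sparse risk model Y ~ X fit to the imputed outcomes.
    beta = lasso_logistic(X_unl, y_imp, lam_risk)
    return gamma, y_imp, beta

def predict_risk(beta, X_new):
    # Predicted risk uses the candidate predictors only, not the surrogates.
    return expit(add_intercept(X_new) @ beta)
```

Here `X_lab`, `S_lab`, and `y_lab` stand for the predictors, surrogates, and annotated outcomes of the labeled set, and `X_unl`, `S_unl` for the unlabeled set, all as NumPy arrays with the surrogates stored as a two-dimensional array. Calling `sas_sketch` and then `predict_risk(beta, X_new)` returns estimated risks for new predictor rows.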

research
11/15/2017

Semi-Supervised Approaches to Efficient Evaluation of Model Prediction Performance

In many modern machine learning applications, the outcome is expensive o...
research
09/12/2022

Semi-supervised Triply Robust Inductive Transfer Learning

In this work, we propose a semi-supervised triply robust inductive trans...
research
02/09/2023

Surrogate-Assisted Federated Learning of high dimensional Electronic Health Record Data

Surrogate variables in electronic health records (EHR) play an important...
research
03/07/2021

Risk Prediction with Imperfect Survival Outcome Information from Electronic Health Records

Readily available proxies for time of disease onset such as time of the ...
research
03/31/2018

Efficient and Robust Semi-Supervised Estimation of Average Treatment Effects in Electronic Medical Records Data

There is strong interest in conducting comparative effectiveness researc...
research
10/10/2022

Risk Automatic Prediction for Social Economy Companies using Camels

Governments have to supervise and inspect social economy enterprises (SE...
research
04/06/2021

Semi-supervised empirical Bayes group-regularized factor regression

The features in high dimensional biomedical prediction problems are ofte...