Surrogate Assisted Semi-supervised Inference for High Dimensional Risk Prediction

05/04/2021
by   Jue Hou, et al.
0

Risk modeling with EHR data is challenging due to a lack of direct observations on the disease outcome, and the high dimensionality of the candidate predictors. In this paper, we develop a surrogate assisted semi-supervised-learning (SAS) approach to risk modeling with high dimensional predictors, leveraging a large unlabeled data on candidate predictors and surrogates of outcome, as well as a small labeled data with annotated outcomes. The SAS procedure borrows information from surrogates along with candidate predictors to impute the unobserved outcomes via a sparse working imputation model with moment conditions to achieve robustness against mis-specification in the imputation model and a one-step bias correction to enable interval estimation for the predicted risk. We demonstrate that the SAS procedure provides valid inference for the predicted risk derived from a high dimensional working model, even when the underlying risk prediction model is dense and the risk model is mis-specified. We present an extensive simulation study to demonstrate the superiority of our SSL approach compared to existing supervised methods. We apply the method to derive genetic risk prediction of type-2 diabetes mellitus using a EHR biobank cohort.

READ FULL TEXT
POST COMMENT

Comments

There are no comments yet.

Authors

page 1

page 2

page 3

page 4

11/15/2017

Semi-Supervised Approaches to Efficient Evaluation of Model Prediction Performance

In many modern machine learning applications, the outcome is expensive o...
03/31/2018

Efficient and Robust Semi-Supervised Estimation of Average Treatment Effects in Electronic Medical Records Data

There is strong interest in conducting comparative effectiveness researc...
03/26/2020

Prior Adaptive Semi-supervised Learning with Application to EHR Phenotyping

Electronic Health Records (EHR) data, a rich source for biomedical resea...
03/07/2021

Risk Prediction with Imperfect Survival Outcome Information from Electronic Health Records

Readily available proxies for time of disease onset such as time of the ...
09/14/2020

Semi-supervised learning and the question of true versus estimated propensity scores

A straightforward application of semi-supervised machine learning to the...
04/10/2020

Supervised Autoencoders Learn Robust Joint Factor Models of Neural Activity

Factor models are routinely used for dimensionality reduction in modelin...
12/14/2020

Model-assisted estimation in high-dimensional settings for survey data

Model-assisted estimators have attracted a lot of attention in the last ...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.