Semi-supervised Triply Robust Inductive Transfer Learning

09/12/2022
by Tianxi Cai, et al.

In this work, we propose a semi-supervised triply robust inductive transfer learning (STRIFLE) approach, which integrates heterogeneous data from a label-rich source population and a label-scarce target population to improve learning accuracy in the target population. Specifically, we consider a high-dimensional covariate shift setting and employ two nuisance models, a density ratio model and an imputation model, to combine transfer learning and surrogate-assisted semi-supervised learning strategies and achieve triple robustness. While the STRIFLE approach assumes that the target and source populations share the same conditional distribution of the outcome Y given both the surrogate features S and the predictors X, it allows the true underlying model of Y|X to differ between the two populations because of potential covariate shift in S and X. Unlike a doubly robust estimator, the triply robust STRIFLE estimator can still partially utilize the source population even if both nuisance models are misspecified or the distribution of Y|S,X differs between the two populations, provided that the transferred source population and the target population share enough similarities; it is guaranteed to be no worse than the target-only surrogate-assisted semi-supervised estimator, up to negligible errors. These properties are established theoretically and verified in finite samples via extensive simulation studies. We use the STRIFLE estimator to train a Type II diabetes polygenic risk prediction model for the African American target population by transferring knowledge from electronic health record-linked genomic data observed in a larger European source population.
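To make the density ratio ingredient concrete, the following is a minimal sketch of importance weighting under covariate shift. It is not the STRIFLE estimator: it uses a single low-dimensional covariate, omits the surrogate features S and the imputation model, and has no robustness guarantees. The simulated data, the function names `fit_logistic_1d` and `covariate_shift_demo`, and the choice of a logistic model for the density ratio are all assumptions made for illustration.

```python
import math
import random


def fit_logistic_1d(xs, zs, iters=25):
    """Fit logit P(z=1 | x) = a + b*x by Newton's method (illustrative helper)."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, z in zip(xs, zs):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            r = z - p              # score contribution
            v = p * (1.0 - p)      # information contribution
            g0 += r
            g1 += r * x
            h00 += v
            h01 += v * x
            h11 += v * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: theta <- theta + H^{-1} g for the 2x2 information matrix H
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def covariate_shift_demo(n=5000, seed=0):
    """Compare a naive source mean of Y with a density-ratio-weighted one.

    Assumed toy setup: X ~ N(0, 1) in the source, X ~ N(1, 1) in the target,
    and the same Y | X in both (Y = X^2 + noise), so the target mean of Y is 2.
    """
    rng = random.Random(seed)
    x_src = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x_tgt = [rng.gauss(1.0, 1.0) for _ in range(n)]
    y_src = [x * x + rng.gauss(0.0, 0.5) for x in x_src]  # labels seen in source only

    # Classify target (z=1) versus source (z=0). With equal sample sizes,
    # the fitted odds exp(a + b*x) estimate the density ratio p_target / p_source.
    a, b = fit_logistic_1d(x_src + x_tgt, [0] * n + [1] * n)
    w = [math.exp(a + b * x) for x in x_src]

    naive = sum(y_src) / n
    weighted = sum(wi * yi for wi, yi in zip(w, y_src)) / sum(w)
    return naive, weighted


if __name__ == "__main__":
    naive, weighted = covariate_shift_demo()
    print("naive source mean:", naive)
    print("density-ratio-weighted mean:", weighted)
```

In this toy setup the naive source average targets the source mean of Y, while the weighted average targets the target mean. If the density ratio model were misspecified, the weighted estimate could be biased; that failure mode is the kind of situation the robustness properties described in the abstract are meant to address.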

research
08/16/2022

Semi-supervised Transfer Learning for Evaluation of Model Classification Performance

In modern machine learning applications, frequent encounters of covariat...
research
08/10/2022

Doubly Robust Augmented Model Accuracy Transfer Inference with High Dimensional Features

Due to label scarcity and covariate shift happening frequently in real-w...
research
06/28/2023

Efficient and Multiply Robust Risk Estimation under General Forms of Dataset Shift

Statistical machine learning methods often face the challenge of limited...
research
07/09/2023

Doubly Flexible Estimation under Label Shift

In studies ranging from clinical medicine to policy research, complete d...
research
05/04/2021

Surrogate Assisted Semi-supervised Inference for High Dimensional Risk Prediction

Risk modeling with EHR data is challenging due to a lack of direct obser...
research
09/05/2023

Distributionally Robust Machine Learning with Multi-source Data

Classical machine learning methods may lead to poor prediction performan...
research
02/09/2023

Surrogate-Assisted Federated Learning of high dimensional Electronic Health Record Data

Surrogate variables in electronic health records (EHR) play an important...
