Doubly Robust Augmented Model Accuracy Transfer Inference with High Dimensional Features
Because label scarcity and covariate shift are common in real-world studies, transfer learning has become an essential technique for training models that generalize to a target population using existing labeled source data. Most existing transfer learning research has focused on model estimation, while there is a paucity of literature on transfer inference for model accuracy despite its importance. We propose a novel 𝐃oubly 𝐑obust 𝐀ugmented 𝐌odel 𝐀ccuracy 𝐓ransfer 𝐈nferen𝐂e (DRAMATIC) method for point and interval estimation of commonly used classification performance measures in an unlabeled target population using labeled source data. Specifically, DRAMATIC derives and evaluates the risk model for a binary response Y against some low-dimensional predictors 𝐀 on the target population, leveraging Y from the source data only and high-dimensional adjustment features 𝐗 from both the source and target data. The proposed estimators are doubly robust in the sense that they are n^{1/2}-consistent when at least one model is correctly specified and certain model sparsity assumptions hold. Simulation results demonstrate that the point estimators have negligible bias and that the confidence intervals derived by DRAMATIC attain satisfactory empirical coverage levels. We further illustrate the utility of our method by transferring a genetic risk prediction model and its accuracy evaluation for type II diabetes across two patient cohorts in Mass General Brigham (MGB) collected using different sampling mechanisms and at different time points.