Privacy Preserving Integrative Regression Analysis of High-dimensional Heterogeneous Data

02/16/2019
by Tianxi Cai, et al.

Meta-analysis of multiple studies enables more precise estimation and investigation of generalizability, and is important for evidence-based decision making. Integrative analysis of multiple heterogeneous studies is, however, highly challenging in the high-dimensional setting. The challenge is even more pronounced when individual-level data cannot be shared across studies due to privacy concerns. Under ultra-high-dimensional sparse regression models, we propose in this paper a novel integrative estimation procedure based on aggregating and debiasing local estimators (ADeLE), which allows estimation with general loss functions to rely solely on derived data. The ADeLE procedure accommodates between-study heterogeneity in both the covariate distribution and the model parameters, and attains consistent variable selection. Furthermore, the prediction and estimation errors incurred by aggregating derived data are negligible compared to the statistical minimax rate. In addition, the ADeLE estimator is shown to be asymptotically equivalent in prediction and estimation to the ideal estimator obtained by sharing all data. The finite-sample performance of the ADeLE procedure is studied via extensive simulations. We further illustrate the utility of the ADeLE procedure by deriving phenotyping algorithms for coronary artery disease using electronic health records data from multiple disease cohorts.
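To make the idea of estimating from derived data concrete, the following is a minimal sketch and not the authors' ADeLE procedure. It assumes squared-error loss and a single coefficient vector shared by all sites, a much simpler setting than the general losses and heterogeneous parameters described above. Under squared-error loss the per-site Gram matrix and cross-product summarize the loss exactly, so no debiasing step is needed in this toy case. The function names (`local_summary`, `lasso_from_summaries`), the simulated data, and the penalty level are all hypothetical choices made for illustration.

```python
import numpy as np

def local_summary(X, y):
    """Derived data a site could share instead of individual-level rows:
    sample size, Gram matrix X'X, and cross-product X'y."""
    return X.shape[0], X.T @ X, X.T @ y

def soft_threshold(v, t):
    """Proximal operator of the L1 penalty."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_from_summaries(summaries, lam, n_iter=2000):
    """Lasso fit by proximal gradient (ISTA) using only aggregated summaries."""
    N = sum(n for n, _, _ in summaries)
    G = sum(g for _, g, _ in summaries) / N   # pooled X'X / N
    c = sum(b for _, _, b in summaries) / N   # pooled X'y / N
    step = 1.0 / np.linalg.eigvalsh(G)[-1]    # 1 / largest eigenvalue of G
    beta = np.zeros(G.shape[0])
    for _ in range(n_iter):
        beta = soft_threshold(beta - step * (G @ beta - c), step * lam)
    return beta

# Simulated sites (assumption): same sparse coefficients, different covariate scales.
rng = np.random.default_rng(0)
p = 20
beta_true = np.zeros(p)
beta_true[:3] = [1.0, -2.0, 1.5]

sites = []
for scale in (1.0, 1.5, 0.7):
    X = scale * rng.normal(size=(200, p))
    y = X @ beta_true + 0.5 * rng.normal(size=200)
    sites.append((X, y))

# Fit using only the per-site summaries.
summaries = [local_summary(X, y) for X, y in sites]
beta_summary = lasso_from_summaries(summaries, lam=0.05)

# Reference fit that pools all individual-level data.
X_all = np.vstack([X for X, _ in sites])
y_all = np.concatenate([y for _, y in sites])
beta_pooled = lasso_from_summaries([local_summary(X_all, y_all)], lam=0.05)

print(np.allclose(beta_summary, beta_pooled))
```

In this special case the summary-based fit matches the pooled-data fit up to floating-point error, because the squared-error objective depends on the data only through the aggregated summaries. For general loss functions that exactness is lost, which is where a local-estimate-plus-debiasing construction such as the one the abstract describes becomes relevant.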

research
04/02/2020

Integrative High Dimensional Multiple Testing with Heterogeneity under Data Sharing Constraints

Identifying informative predictors in a high dimensional regression mode...
research
10/21/2019

High-dimensional robust approximated M-estimators for mean regression with asymmetric data

Asymmetry along with heteroscedasticity or contamination often occurs wi...
research
12/13/2020

Inference for the Case Probability in High-dimensional Logistic Regression

Labeling patients in electronic health records with respect to their sta...
research
03/23/2022

Treatment Effect Estimation with Efficient Data Aggregation

Data aggregation, also known as meta analysis, is widely used to synthes...
research
10/14/2022

Privacy-Preserving and Lossless Distributed Estimation of High-Dimensional Generalized Additive Mixed Models

Various privacy-preserving frameworks that respect the individual's priv...
research
01/14/2020

Nonparametric regression for multiple heterogeneous networks

We study nonparametric methods for the setting where multiple distinct n...
research
10/07/2019

A Distributed and Integrated Method of Moments for High-Dimensional Correlated Data Analysis

This paper is motivated by a regression analysis of electroencephalograp...
