Integrative High Dimensional Multiple Testing with Heterogeneity under Data Sharing Constraints

04/02/2020
by   Molei Liu, et al.

Identifying informative predictors in a high-dimensional regression model is a critical step for association analysis and predictive modeling. Signal detection in the high-dimensional setting often fails due to limited sample size. One approach to improving power is to meta-analyze multiple studies on the same scientific question. However, integrative analysis of high-dimensional data from multiple studies is challenging in the presence of between-study heterogeneity. The challenge is even more pronounced under data sharing constraints that allow only summary data, not individual-level data, to be shared across sites. In this paper, we propose a novel data shielding integrative large-scale testing (DSILT) approach to signal detection that allows between-study heterogeneity and does not require sharing of individual-level data. Assuming the underlying high-dimensional regression models differ across studies yet share similar support, the DSILT approach incorporates proper integrative estimation and debiasing procedures to construct test statistics for the overall effects of specific covariates. We also develop a multiple testing procedure to identify significant effects while controlling the false discovery rate (FDR) and false discovery proportion (FDP). We compare the DSILT procedure theoretically with the ideal individual-level meta-analysis (ILMA) approach and with other distributed inference methods. Simulation studies demonstrate that the DSILT procedure performs well in both false discovery control and power. The proposed method is applied to a real example on detecting interaction effects of genetic variants for statins and obesity on the risk of Type 2 Diabetes.
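For readers unfamiliar with FDR control, the sketch below shows the standard Benjamini-Hochberg step-up rule applied to a list of p-values. It is not the DSILT multiple testing procedure, whose details are not given in the abstract. The function name `benjamini_hochberg` and the example p-values are illustrative assumptions only.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Standard Benjamini-Hochberg step-up rule (illustrative, not DSILT).

    Returns a list of booleans, True where the hypothesis is rejected.
    """
    m = len(p_values)
    # Indices of the hypotheses, sorted by increasing p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= alpha * k / m.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            k = rank

    # Reject the k hypotheses with the smallest p-values.
    reject = [False] * m
    for idx in order[:k]:
        reject[idx] = True
    return reject


# Made-up p-values, for illustration only.
p_vals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(p_vals, alpha=0.05)
```

The step-up rule rejects every hypothesis ranked at or below the largest rank that passes its threshold, even if some smaller-ranked p-values individually miss theirs.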

research
08/18/2022

A Decorrelating and Debiasing Approach to Simultaneous Inference for High-Dimensional Confounded Models

Motivated by the simultaneous association analysis with the presence of ...
research
02/16/2019

Privacy Preserving Integrative Regression Analysis of High-dimensional Heterogeneous Data

Meta-analyzing multiple studies, enabling more precise estimation and in...
research
08/31/2022

Two-stage Hypothesis Tests for Variable Interactions with FDR Control

In many scenarios such as genome-wide association studies where dependen...
research
01/11/2022

Estimation and Inference with Proxy Data and its Genetic Applications

Existing high-dimensional statistical methods are largely established fo...
research
10/13/2019

Five Shades of Grey: Phase Transitions in High-dimensional Multiple Testing

We are motivated by marginal screenings of categorical variables, and st...
research
12/23/2022

Sufficient Dimension Reduction for Populations with Structured Heterogeneity

A key challenge in building effective regression models for large and di...
research
09/23/2021

Joint Estimation and Inference for Multi-Experiment Networks of High-Dimensional Point Processes

Modern high-dimensional point process data, especially those from neuros...