Two-stage Hypothesis Tests for Variable Interactions with FDR Control

08/31/2022
by Jingyi Duan, et al.

In many settings, such as genome-wide association studies, where dependence among variables is common, it is often of interest to infer the interaction effects in the model. However, testing pairwise interactions among millions of variables in complex, high-dimensional data suffers from low statistical power and a huge computational cost. To address these challenges, we propose a two-stage testing procedure with false discovery rate (FDR) control, which is known to be a less conservative multiple-testing correction. Theoretically, the difficulty in controlling the FDR is due to the dependence among the test statistics in the two stages, and to the fact that the number of hypothesis tests conducted in the second stage depends on the screening result of the first stage. Using the Cramér-type moderate deviation technique, we show that our procedure controls the FDR at the desired level asymptotically in the generalized linear model (GLM), where the model is allowed to be misspecified. In addition, the asymptotic power of the FDR control procedure is rigorously established. We demonstrate via comprehensive simulation studies that our two-stage procedure is computationally more efficient than the classical BH procedure, with comparable or improved statistical power. Finally, we apply the proposed method to a bladder cancer dataset from dbGaP, where the scientific goal is to identify genetic susceptibility loci for bladder cancer.
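
For orientation, the classical BH (Benjamini-Hochberg) step-up procedure that the abstract uses as its baseline can be sketched as follows. This is a minimal illustration of the baseline only, not of the proposed two-stage procedure, whose details are not given here. The function name `benjamini_hochberg` and the default level `alpha=0.05` are assumptions made for this sketch.

```python
import numpy as np

def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a boolean mask of hypotheses rejected by the BH step-up rule.

    Hypothetical helper for illustration; `alpha` is the target FDR level.
    """
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    reject = np.zeros(m, dtype=bool)
    if m == 0:
        return reject

    # Sort the p-values and compare the k-th smallest with alpha * k / m.
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds

    if below.any():
        # Step-up rule: find the largest rank k whose p-value passes,
        # then reject every hypothesis with rank <= k.
        k = np.max(np.nonzero(below)[0])
        reject[order[: k + 1]] = True
    return reject

if __name__ == "__main__":
    pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(pvals, alpha=0.05))
```

Applied to all pairwise interactions, this baseline needs one p-value per pair, that is p(p-1)/2 tests for p variables, which is the computational cost a first-stage screening step is meant to reduce.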

