The Terminating-Knockoff Filter: Fast High-Dimensional Variable Selection with False Discovery Rate Control

10/12/2021
by Jasin Machkour, et al.

We propose the Terminating-Knockoff (T-Knock) filter, a fast variable selection method for high-dimensional data. The T-Knock filter controls the false discovery rate (FDR) at a user-defined target level while maximizing the number of selected true positives. It achieves this by fusing the solutions of multiple early-terminated random experiments, each conducted on a combination of the original data and one of multiple sets of randomly generated knockoff variables. We provide a finite-sample proof of the FDR control property based on martingale theory, and numerical simulations show that the FDR is controlled at the target level while allowing for high power. We prove that, under mild conditions, the knockoffs can be sampled from any univariate distribution. We derive the computational complexity of the proposed method and demonstrate via numerical simulations that, in sparse high-dimensional settings, its sequential computation time is multiple orders of magnitude lower than that of the strongest benchmark methods. On a simulated genome-wide association study (GWAS), the T-Knock filter outperforms state-of-the-art methods for FDR control, while its computation time is more than two orders of magnitude lower than that of the strongest benchmark methods.
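The general idea of fusing early-terminated random experiments can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify the selection procedure, the fusion rule, or how the parameters are calibrated, so everything below is an assumption made for illustration. The sketch swaps in a plain greedy forward selection, draws standard normal knockoffs, and uses a fixed voting threshold; the names `early_terminated_run`, `fuse_experiments`, `K`, `T`, and `vote` are hypothetical, and the fixed settings carry no FDR guarantee.

```python
import numpy as np


def early_terminated_run(X, y, n_dummies, T, rng):
    """One random experiment (illustrative): append random knockoffs to X, run
    greedy forward selection, and stop once T knockoffs have been selected.
    Returns the indices of the original variables selected before stopping."""
    n, p = X.shape
    # Assumption: knockoffs are i.i.d. standard normal columns.
    dummies = rng.standard_normal((n, n_dummies))
    Z = np.hstack([X, dummies])
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)  # standardize all columns
    y_centered = y - y.mean()
    residual = y_centered.copy()
    in_model = np.zeros(Z.shape[1], dtype=bool)
    active = []
    dummies_in = 0
    max_steps = min(n - 1, Z.shape[1])
    while dummies_in < T and len(active) < max_steps:
        # Pick the not-yet-selected column most correlated with the residual.
        corr = np.abs(Z.T @ residual)
        corr[in_model] = -np.inf
        j = int(np.argmax(corr))
        in_model[j] = True
        active.append(j)
        if j >= p:  # columns p, p+1, ... are knockoffs
            dummies_in += 1
        # Refit least squares on the active set and update the residual.
        coef, *_ = np.linalg.lstsq(Z[:, active], y_centered, rcond=None)
        residual = y_centered - Z[:, active] @ coef
    return np.array([j for j in active if j < p], dtype=int)


def fuse_experiments(X, y, K=20, T=1, vote=0.5, n_dummies=None, seed=0):
    """Run K experiments with fresh knockoffs and keep the variables selected
    in more than a fraction `vote` of them (hypothetical fusion rule)."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    n_dummies = p if n_dummies is None else n_dummies
    counts = np.zeros(p)
    for _ in range(K):
        counts[early_terminated_run(X, y, n_dummies, T, rng)] += 1
    freq = counts / K  # relative occurrence of each original variable
    return np.flatnonzero(freq > vote), freq


# Toy sparse high-dimensional data: 5 true variables out of 300, 100 samples.
rng = np.random.default_rng(1)
n, p, k = 100, 300, 5
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:k] = 3.0
y = X @ beta + rng.standard_normal(n)

selected, freq = fuse_experiments(X, y)
# False discovery proportion of this single run (true support is 0..k-1).
fdp = np.sum(selected >= k) / max(len(selected), 1)
print("selected:", selected, "FDP:", fdp)
```

Each experiment stops as soon as `T` knockoffs enter the model, so only a short stretch of the selection path is ever computed. That early stop is what keeps the cost low in sparse settings. The FDR is the expectation of the false discovery proportion computed in the last lines, so a single run such as this one says nothing about whether a target level is met.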
