StarTrek: Combinatorial Variable Selection with False Discovery Rate Control

08/23/2021
by Lu Zhang, et al.

Variable selection on large-scale networks has been extensively studied in the literature. While most existing methods are limited to local functionals, especially graph edges, this paper focuses on selecting discrete hub structures of networks. Specifically, we propose an inferential method, called the StarTrek filter, to select hub nodes with degrees larger than a given threshold in high-dimensional graphical models while controlling the false discovery rate (FDR). Discovering hub nodes in networks is challenging for two reasons: there is no straightforward statistic for testing the degree of a node because of the combinatorial structure, and the complicated dependence in the multiple testing problem is hard to characterize and control. Methodologically, the StarTrek filter overcomes these challenges by constructing p-values based on maximum test statistics via the Gaussian multiplier bootstrap. Theoretically, we show that the StarTrek filter controls the FDR by providing accurate bounds on the approximation errors of the quantile estimation and by addressing the dependence structure among the maximal statistics. To this end, we establish novel Cramér-type comparison bounds for high-dimensional Gaussian random vectors. Compared to the Gaussian comparison bound via the Kolmogorov distance established by <cit.>, our Cramér-type comparison bounds control the relative difference between the distribution functions of two high-dimensional Gaussian random vectors. We illustrate the validity of the StarTrek filter in a series of numerical experiments and apply it to the genotype-tissue expression dataset to discover central regulator genes.
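
To make the two ingredients named in the abstract concrete, here is a minimal Python sketch of a Gaussian multiplier bootstrap p-value for a maximum statistic, followed by a multiple-testing step. It is an illustration under simplifying assumptions, not the StarTrek filter itself: the function names, the score matrix `Z` (one column per candidate edge of a node, assumed to have mean zero under the null), and the use of the standard Benjamini-Hochberg procedure as the selection step are all choices made here for illustration. In particular, the sketch tests a global null per node rather than the degree-threshold hypothesis studied in the paper.

```python
import numpy as np

def multiplier_bootstrap_pvalue(Z, n_boot=500, rng=None):
    """P-value for the max statistic max_e |n^{-1/2} sum_i Z[i, e]|.

    Z is a hypothetical (n, m) array of per-sample scores, one column per
    candidate edge, assumed to have mean zero under the null.
    """
    rng = np.random.default_rng(rng)
    n = Z.shape[0]
    # Observed maximum statistic over all columns.
    observed = np.abs(Z.sum(axis=0)).max() / np.sqrt(n)
    # Center the scores, then reweight samples with i.i.d. N(0, 1) multipliers.
    centered = Z - Z.mean(axis=0)
    xi = rng.standard_normal((n_boot, n))
    boot = np.abs(xi @ centered).max(axis=1) / np.sqrt(n)
    # Add-one correction keeps the p-value strictly positive.
    return (1 + np.sum(boot >= observed)) / (n_boot + 1)

def benjamini_hochberg(pvals, q=0.1):
    """Indices rejected by the standard BH step-up procedure at level q.

    BH is used here as a stand-in selection rule (an assumption of this sketch).
    """
    pvals = np.asarray(pvals)
    m = len(pvals)
    order = np.argsort(pvals)
    passed = pvals[order] <= q * np.arange(1, m + 1) / m
    if not passed.any():
        return np.array([], dtype=int)
    k = np.max(np.nonzero(passed)[0])  # largest rank meeting its threshold
    return np.sort(order[:k + 1])
```

In use, one would compute one p-value per node from that node's score matrix and pass the resulting vector to `benjamini_hochberg`. The multiplier bootstrap approximates the null distribution of the maximum without requiring an explicit model of the dependence among the columns, which is the role the abstract assigns to it.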


research
09/13/2020

Central Limit Theorem and Bootstrap Approximation in High Dimensions with Near 1/√(n) Rates

Non-asymptotic bounds for Gaussian and bootstrap approximation have rece...
research
10/12/2021

The Terminating-Knockoff Filter: Fast High-Dimensional Variable Selection with False Discovery Rate Control

We propose the Terminating-Knockoff (T-Knock) filter, a fast variable se...
research
09/12/2021

Differentially Private Variable Selection via the Knockoff Filter

The knockoff filter, recently developed by Barber and Candes, is an effe...
research
10/16/2020

Power of FDR Control Methods: The Impact of Ranking Algorithm, Tampered Design, and Symmetric Statistic

As the power of FDR control methods for high-dimensional variable select...
research
06/07/2020

False (and Missed) Discoveries in Financial Economics

Multiple testing plagues many important questions in finance such as fun...
research
10/21/2020

Transfer Learning in Large-scale Gaussian Graphical Models with False Discovery Rate Control

Transfer learning for high-dimensional Gaussian graphical models (GGMs) ...
research
03/14/2023

Robust Multiple Testing under High-dimensional Dynamic Factor Model

Large-scale multiple testing under static factor models is commonly used...
