False Discovery Rate Control via Data Splitting

02/20/2020
by Chenguang Dai, et al.

Selecting the features that are relevant to a given response variable is an important problem in many scientific fields. Quantifying the quality and uncertainty of the selection through false discovery rate (FDR) control has attracted recent interest. This paper introduces a way of using data-splitting strategies to asymptotically control the FDR for various feature selection techniques while maintaining high power. For each feature, the method estimates two independent significance coefficients via data splitting and constructs a contrast statistic from them. FDR control is achieved by exploiting a property of this statistic: for any null feature, its sampling distribution is symmetric about 0. We further propose a strategy that aggregates multiple data splits (MDS) to stabilize the selection result and boost power. Interestingly, this multiple data-splitting approach appears capable of overcoming the power loss caused by data splitting while keeping the FDR under control. The proposed framework is applicable to canonical statistical models including linear models, Gaussian graphical models, and deep neural networks. Simulation results, as well as a real data application, show that the proposed approaches, especially the multiple data-splitting strategy, control the FDR well and are often more powerful than existing methods, including the Benjamini-Hochberg procedure and the knockoff filter.
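The single-split idea described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, since the abstract does not give formulas: the contrast statistic is taken to be sign(b1*b2) * (|b1| + |b2|), the two coefficient vectors come from ordinary least squares on each half (which requires more samples per half than features), and the function names `mirror_statistics`, `select_threshold`, and `data_split_select` are hypothetical. The threshold rule uses the symmetry property: for null features, the count of statistics below -t estimates the count of false discoveries above t.

```python
import numpy as np

def mirror_statistics(b1, b2):
    # Assumed contrast statistic: large and positive when both halves agree
    # on a sizable effect; symmetric about 0 when the feature is null.
    return np.sign(b1 * b2) * (np.abs(b1) + np.abs(b2))

def select_threshold(m, q):
    # Smallest candidate t whose estimated false discovery proportion,
    # #{m_j < -t} / max(#{m_j > t}, 1), is at most q.
    for t in np.sort(np.abs(m[m != 0])):
        fdp_hat = np.sum(m < -t) / max(np.sum(m > t), 1)
        if fdp_hat <= q:
            return t
    return np.inf  # no threshold meets the target level

def data_split_select(X, y, q=0.1, seed=None):
    # Split the rows at random into two halves and fit OLS on each
    # (a simplification that assumes each half has more rows than columns).
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    perm = rng.permutation(n)
    h1, h2 = perm[: n // 2], perm[n // 2:]
    b1 = np.linalg.lstsq(X[h1], y[h1], rcond=None)[0]
    b2 = np.linalg.lstsq(X[h2], y[h2], rcond=None)[0]
    m = mirror_statistics(b1, b2)
    tau = select_threshold(m, q)
    return np.flatnonzero(m > tau), m

if __name__ == "__main__":
    # Simulated linear model: the first 5 of 20 features carry signal.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((400, 20))
    beta = np.zeros(20)
    beta[:5] = 2.0
    y = X @ beta + rng.standard_normal(400)
    selected, _ = data_split_select(X, y, q=0.1, seed=1)
    print("selected features:", selected)
```

In this sketch a feature is selected when its statistic exceeds the chosen threshold. Aggregating over multiple splits, as the abstract proposes, would repeat `data_split_select` with different seeds and combine the results; the combination rule is not specified in the abstract and is not attempted here.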

research
07/02/2020

A Scale-free Approach for False Discovery Rate Control in Generalized Linear Models

The generalized linear models (GLM) have been widely used in practice to...
research
08/05/2022

Feature Selection for Machine Learning Algorithms that Bounds False Positive Rate

The problem of selecting a handful of truly relevant variables in superv...
research
03/30/2021

Controlling the False Discovery Rate in Structural Sparsity: Split Knockoffs

Controlling the False Discovery Rate (FDR) in a variable selection proce...
research
11/12/2021

Bayesian Knockoff Generators for Robust Inference Under Complex Data Structure

The recent proliferation of medical data, such as genetics and electroni...
research
08/30/2019

Nodewise Knockoffs: False Discovery Rate Control for Gaussian Graphical Models

Controlling the false discovery rate (FDR) is important for obtaining re...
research
10/16/2020

Power of FDR Control Methods: The Impact of Ranking Algorithm, Tampered Design, and Symmetric Statistic

As the power of FDR control methods for high-dimensional variable select...
research
02/03/2021

Splitting strategies for post-selection inference

We consider the problem of providing valid inference for a selected para...
