A Scale-free Approach for False Discovery Rate Control in Generalized Linear Models

by Chenguang Dai, et al.

Generalized linear models (GLMs) are widely used in practice to model non-Gaussian response variables. When the number of explanatory features is relatively large, researchers are often interested in performing controlled feature selection to simplify the downstream analysis. This paper introduces a new framework for feature selection in GLMs that can achieve false discovery rate (FDR) control in two asymptotic regimes. The key step is to construct a mirror statistic that measures the importance of each feature. The mirror statistic is built from two (asymptotically) independent estimates of the corresponding true coefficient, obtained via either the data-splitting method or the Gaussian mirror method. FDR control follows from a property of the mirror statistic: for any null feature, its sampling distribution is (asymptotically) symmetric about 0. In the moderate-dimensional setting, in which the ratio between the dimension (number of features) p and the sample size n converges to a fixed value, we construct the mirror statistic from the maximum likelihood estimator. In the high-dimensional setting, where p is much larger than n, we use the debiased Lasso to build the mirror statistic. Unlike the Benjamini-Hochberg procedure, which relies crucially on the asymptotic normality of the Z statistic, the proposed methodology is scale free: it hinges only on the symmetry property and is therefore expected to be more robust in finite samples. Both simulation results and a real data application show that the proposed methods control the FDR and are often more powerful than existing methods, including the Benjamini-Hochberg procedure and the knockoff filter.
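The selection step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the mirror statistic takes the form M_j = sign(b1_j * b2_j) * (|b1_j| + |b2_j|), where b1_j and b2_j are the two independent coefficient estimates for feature j, and that the cutoff is the smallest t at which the estimated false discovery proportion #{j : M_j < -t} / max(#{j : M_j > t}, 1) is at most the target level q. The left tail stands in for the nulls in the right tail because null mirror statistics are symmetric about 0. The helper names mirror_statistics, mirror_threshold, and select_features are hypothetical, and producing b1 and b2 (for example, by fitting the GLM separately on two halves of the data) is left out of the sketch.

```python
from typing import List, Sequence


def mirror_statistics(beta1: Sequence[float], beta2: Sequence[float]) -> List[float]:
    """Combine two independent coefficient estimates into mirror statistics.

    Assumed form: M_j = sign(b1_j * b2_j) * (|b1_j| + |b2_j|).
    M_j is large and positive when both estimates agree in sign and are large;
    for a null feature, M_j is assumed to be symmetric about 0.
    """
    if len(beta1) != len(beta2):
        raise ValueError("beta1 and beta2 must have the same length")
    stats = []
    for b1, b2 in zip(beta1, beta2):
        product = b1 * b2
        sign = 1.0 if product > 0 else (-1.0 if product < 0 else 0.0)
        stats.append(sign * (abs(b1) + abs(b2)))
    return stats


def mirror_threshold(m: Sequence[float], q: float) -> float:
    """Smallest cutoff t whose estimated FDP is at most q.

    FDP_hat(t) = #{j: M_j < -t} / max(#{j: M_j > t}, 1).
    FDP_hat is piecewise constant and only changes at the observed |M_j|,
    so checking 0 and every |M_j| covers all of its values.
    """
    candidates = [0.0] + sorted(abs(x) for x in m)
    for t in candidates:
        # Left tail: by symmetry, estimates how many nulls exceed t.
        n_negative = sum(1 for x in m if x < -t)
        n_positive = sum(1 for x in m if x > t)
        if n_negative / max(n_positive, 1) <= q:
            return t
    # Not reached: at t = max |M_j| the numerator is 0.
    return candidates[-1]


def select_features(
    beta1: Sequence[float], beta2: Sequence[float], q: float = 0.1
) -> List[int]:
    """Indices j whose mirror statistic exceeds the data-driven cutoff."""
    m = mirror_statistics(beta1, beta2)
    tau = mirror_threshold(m, q)
    return [j for j, x in enumerate(m) if x > tau]


if __name__ == "__main__":
    # Toy estimates: features 0-2 are strong signals, 3-6 behave like nulls.
    b1 = [3.0, 2.5, 4.0, 0.1, -0.3, 0.2, -0.1]
    b2 = [2.8, 2.9, 3.5, -0.2, 0.2, 0.1, -0.4]
    print(select_features(b1, b2, q=0.2))  # prints [0, 1, 2]
```

Nothing in this cutoff rule depends on the scale or normality of the estimates, only on the sign symmetry of the null M_j, which is the sense in which the procedure is scale free. The threshold search is quadratic in p for readability; sorting once and sweeping would make it O(p log p).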






