A Scale-free Approach for False Discovery Rate Control in Generalized Linear Models

07/02/2020
by Chenguang Dai et al.

Generalized linear models (GLMs) are widely used in practice to model non-Gaussian response variables. When the number of explanatory features is large, researchers are often interested in performing controlled feature selection to simplify downstream analysis. This paper introduces a new framework for feature selection in GLMs that achieves false discovery rate (FDR) control in two asymptotic regimes. The key step is to construct a mirror statistic measuring the importance of each feature, based upon two (asymptotically) independent estimates of the corresponding true coefficient obtained via either data splitting or the Gaussian mirror method. FDR control is achieved by exploiting the property that, for any null feature, the sampling distribution of its mirror statistic is (asymptotically) symmetric about zero. In the moderate-dimensional setting, in which the ratio between the dimension p (number of features) and the sample size n converges to a fixed value, we construct the mirror statistic based on the maximum likelihood estimate. In the high-dimensional setting, where p is much larger than n, we use the debiased Lasso to build the mirror statistic. Compared to the Benjamini-Hochberg procedure, which crucially relies on the asymptotic normality of the Z statistic, the proposed methodology is scale-free in that it hinges only on the symmetry property, and is thus expected to be more robust in finite samples. Both simulation results and a real-data application show that the proposed methods control the FDR and are often more powerful than existing methods, including the Benjamini-Hochberg procedure and the knockoff filter.
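To make the selection rule concrete, the sketch below illustrates the data-splitting construction for a logistic GLM in the moderate-dimensional (maximum likelihood) regime. It is a minimal illustration, not the authors' implementation: the function name mirror_select, the particular mirror statistic M_j = sign(b1_j * b2_j) * (|b1_j| + |b2_j|), and the use of statsmodels for the two maximum likelihood fits are assumptions made for this example. The cutoff is the smallest t for which the count of features with M_j <= -t, divided by the count with M_j >= t, is at most the target level q; by the symmetry of null mirror statistics, the left tail estimates the number of false positives in the right tail.

    import numpy as np
    import statsmodels.api as sm

    def mirror_select(X, y, q=0.1, seed=0):
        # Illustrative data-splitting mirror-statistic selection for a
        # logistic GLM (hypothetical helper, not the paper's code).
        rng = np.random.default_rng(seed)
        idx = rng.permutation(len(y))
        i1, i2 = idx[: len(y) // 2], idx[len(y) // 2:]

        # Two (asymptotically) independent MLEs of the coefficients,
        # one from each half of the data.
        b1 = sm.GLM(y[i1], X[i1], family=sm.families.Binomial()).fit().params
        b2 = sm.GLM(y[i2], X[i2], family=sm.families.Binomial()).fit().params

        # Mirror statistic: tends to be large and positive for relevant
        # features, and (asymptotically) symmetric about zero for nulls.
        M = np.sign(b1 * b2) * (np.abs(b1) + np.abs(b2))

        # Smallest cutoff t whose estimated false discovery proportion,
        # #{j : M_j <= -t} / #{j : M_j >= t}, is at most the level q.
        for t in np.sort(np.abs(M[M != 0])):
            fdp_hat = np.sum(M <= -t) / max(np.sum(M >= t), 1)
            if fdp_hat <= q:
                return np.flatnonzero(M >= t)
        return np.array([], dtype=int)

In the high-dimensional regime the paper replaces the two maximum likelihood estimates with debiased-Lasso estimates; the symmetry-based cutoff step is unchanged.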
