A Scale-free Approach for False Discovery Rate Control in Generalized Linear Models

07/02/2020
by   Chenguang Dai, et al.

Generalized linear models (GLMs) are widely used in practice to model non-Gaussian response variables. When the number of explanatory features is relatively large, researchers are often interested in performing controlled feature selection in order to simplify the downstream analysis. This paper introduces a new framework for feature selection in GLMs that achieves false discovery rate (FDR) control in two asymptotic regimes. The key step is to construct a mirror statistic measuring the importance of each feature, based on two (asymptotically) independent estimates of the corresponding true coefficient obtained via either the data-splitting method or the Gaussian mirror method. FDR control is achieved by exploiting the mirror statistic's property that, for any null feature, its sampling distribution is (asymptotically) symmetric about 0. In the moderate-dimensional setting, in which the ratio between the dimension (number of features) p and the sample size n converges to a fixed value, we construct the mirror statistic from the maximum likelihood estimate. In the high-dimensional setting, where p is much larger than n, we use the debiased Lasso to build the mirror statistic. Compared to the Benjamini-Hochberg procedure, which crucially relies on the asymptotic normality of the Z statistic, the proposed methodology is scale-free, as it hinges only on the symmetry property, and is thus expected to be more robust in finite-sample cases. Both simulation results and a real data application show that the proposed methods control the FDR and are often more powerful than existing methods, including the Benjamini-Hochberg procedure and the knockoff filter.
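The data-splitting procedure described above can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's implementation: for simplicity it uses a Gaussian linear model with ordinary least squares on each half (the paper uses the MLE for GLMs, or the debiased Lasso when p >> n), and the function name `mirror_fdr_select` and the particular mirror statistic M_j = sign(b1_j * b2_j) * (|b1_j| + |b2_j|) are one common choice among those the data-splitting literature considers.

```python
import numpy as np

def mirror_fdr_select(X, y, q=0.1, rng=None):
    """Data-splitting mirror-statistic selection (illustrative sketch).

    Splits the data into two independent halves, fits OLS on each, forms
    the mirror statistic M_j = sign(b1_j * b2_j) * (|b1_j| + |b2_j|),
    and selects the smallest cutoff whose estimated FDP is below q.
    """
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    idx = rng.permutation(n)
    i1, i2 = idx[: n // 2], idx[n // 2 :]

    # Two (independent) coefficient estimates from disjoint halves.
    b1, *_ = np.linalg.lstsq(X[i1], y[i1], rcond=None)
    b2, *_ = np.linalg.lstsq(X[i2], y[i2], rcond=None)
    M = np.sign(b1 * b2) * (np.abs(b1) + np.abs(b2))

    # For null features M is (asymptotically) symmetric about 0, so
    # #{M_j < -t} estimates the number of false positives in {M_j > t}.
    for t in np.sort(np.abs(M)):
        fdp_hat = np.sum(M < -t) / max(np.sum(M > t), 1)
        if fdp_hat <= q:
            return np.flatnonzero(M > t)
    return np.array([], dtype=int)
```

The cutoff search is the scale-free step: no null distribution or Z statistic is needed, only the sign symmetry of the null mirror statistics.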

