Feature-specific inference for penalized regression using local false discovery rates

09/14/2018
by Ryan Miller, et al.

Penalized regression methods, most notably the lasso, are a popular approach to analyzing high-dimensional data. An attractive property of the lasso is that it naturally performs variable selection. An important concern, however, is the reliability of these variable selections. Motivated by local false discovery rate methodology from the large-scale hypothesis testing literature, we propose a method for calculating a local false discovery rate for each variable under consideration by the lasso model. These rates can be used to assess the reliability of an individual feature, or to estimate the model's overall false discovery rate. The method can be used at any value of the regularization parameter λ. This is particularly useful for models that contain a few highly significant features but have a high overall false discovery rate (Fdr), a relatively common occurrence when cross-validation is used to select λ. The method is also flexible enough to be applied to many varieties of penalized likelihoods, including GLM and Cox models, and to a variety of penalties, including MCP and SCAD. We demonstrate the validity of this approach and contrast it with other inferential methods for penalized regression as well as with local false discovery rates for univariate hypothesis tests. Finally, we show the practical utility of our method by applying it to two case studies involving high-dimensional genetic data.
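To make the idea of a per-feature rate concrete, the following is a minimal sketch of the classical two-groups local false discovery rate for a vector of z-statistics, fdr(z) = π0 f0(z) / f(z), as used in the large-scale testing literature the abstract refers to. It is not the lasso-specific estimator proposed in the paper. The function names, the simulated data, the kernel density estimate of f, the bandwidth rule, and the conservative choice π0 = 1 are all assumptions made for illustration.

```python
import math
import random
import statistics


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) evaluated at x."""
    u = (x - mu) / sigma
    return math.exp(-0.5 * u * u) / (sigma * math.sqrt(2.0 * math.pi))


def kde(points, bandwidth):
    """Return a Gaussian kernel density estimate built from `points`."""
    n = len(points)

    def density(x):
        return sum(normal_pdf(x, p, bandwidth) for p in points) / n

    return density


def local_fdr(z, pi0=1.0, bandwidth=None):
    """Two-groups local fdr: pi0 * f0(z_j) / f(z_j), capped at 1.

    f0 is the theoretical N(0, 1) null density; the mixture density f is
    estimated by a Gaussian KDE (an illustrative choice, assumed here).
    """
    if bandwidth is None:
        # Normal-reference rule of thumb for the KDE bandwidth (assumption).
        bandwidth = 1.06 * statistics.stdev(z) * len(z) ** (-0.2)
    f = kde(z, bandwidth)
    return [min(1.0, pi0 * normal_pdf(zj) / f(zj)) for zj in z]


# Simulated statistics (hypothetical data): 190 null features, 10 signals.
random.seed(0)
z = [random.gauss(0, 1) for _ in range(190)]
z += [random.gauss(4, 1) for _ in range(10)]

fdr = local_fdr(z)

# Feature-level use: flag features whose local fdr is below a threshold.
selected = [j for j, v in enumerate(fdr) if v < 0.2]

# Model-level use: averaging local fdr over the flagged set estimates
# the overall false discovery rate (Fdr) of that set.
est_Fdr = sum(fdr[j] for j in selected) / len(selected)
print(len(selected), round(est_Fdr, 3))
```

The last two steps mirror the two uses named in the abstract: a per-feature rate for judging an individual variable, and an average over the selected set as an estimate of the overall false discovery rate.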


