P-values for high-dimensional regression

11/13/2008
by Nicolai Meinshausen, et al.

Assigning significance in high-dimensional regression is challenging: most computationally efficient selection algorithms cannot guard against the inclusion of noise variables, and asymptotically valid p-values are not available. An exception is a recent proposal by Wasserman and Roeder (2008), which splits the data into two parts. The number of variables is reduced to a manageable size using the first part, and classical variable selection techniques are then applied to the remaining variables using the data from the second part. This yields asymptotic error control under minimal conditions. It involves, however, a one-time random split of the data, and results are sensitive to this arbitrary choice: it amounts to a 'p-value lottery' and makes it difficult to reproduce results. Here, we show that inference across multiple random splits can be aggregated while keeping asymptotic control over the inclusion of noise variables. We show that the resulting p-values can be used to control both the family-wise error rate (FWER) and the false discovery rate (FDR). In addition, the proposed aggregation is shown to improve power while substantially reducing the number of falsely selected variables.
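The multi-split idea above can be sketched in a few lines. The sketch below is a simplified illustration, not the paper's exact procedure: it uses marginal-correlation screening as a stand-in for the lasso-based screening step, a fixed screening size `k`, and quantile aggregation of the Bonferroni-adjusted per-split p-values (Q(gamma) = min(1, q_gamma / gamma)); all parameter names and defaults are illustrative assumptions.

```python
import numpy as np
from scipy import stats

def ols_pvalues(X, y):
    """Two-sided t-test p-values for OLS coefficients."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = n - p
    sigma2 = resid @ resid / dof
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return 2 * stats.t.sf(np.abs(beta / se), dof)

def multi_split_pvalues(X, y, n_splits=50, k=5, gamma=0.5, seed=None):
    """Aggregate single-split p-values over many random splits.

    Screening here is marginal correlation (a stand-in for the lasso
    screening in the paper); k is the number of retained variables.
    Variables not screened in a split get p = 1 for that split.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    pmat = np.ones((n_splits, p))
    for b in range(n_splits):
        idx = rng.permutation(n)
        first, second = idx[: n // 2], idx[n // 2 :]
        # Screen on the first half: keep the k most correlated variables.
        corr = np.abs(X[first].T @ (y[first] - y[first].mean()))
        keep = np.argsort(corr)[-k:]
        # Test on the second half; Bonferroni-adjust by the screened size.
        praw = ols_pvalues(X[second][:, keep], y[second])
        pmat[b, keep] = np.minimum(praw * k, 1.0)
    # Quantile aggregation across splits: Q_j = min(1, q_gamma(p_j) / gamma).
    return np.minimum(np.quantile(pmat, gamma, axis=0) / gamma, 1.0)
```

A single split would make the returned p-values depend on one arbitrary permutation; aggregating the per-split quantiles removes that 'p-value lottery' while the Bonferroni factor inside each split preserves error control.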


