False Discoveries Occur Early on the Lasso Path

11/05/2015
by Weijie Su, et al.

In regression settings where explanatory variables have very low correlations and there are relatively few effects, each of large magnitude, we expect the Lasso to find the important variables with few errors, if any. This paper shows that in a regime of linear sparsity (meaning that the fraction of variables with a non-vanishing effect tends to a constant, however small), this cannot really be the case, even when the design variables are stochastically independent. We demonstrate that true features and null features are always interspersed on the Lasso path, and that this phenomenon occurs no matter how strong the effect sizes are. We derive a sharp asymptotic trade-off between false and true positive rates or, equivalently, between measures of type I and type II errors along the Lasso path. This trade-off states that if we ever want to achieve a type II error (false negative rate) below a critical value, then anywhere on the Lasso path the type I error (false positive rate) will need to exceed a given threshold, so we can never have both errors at a low level at the same time. Our analysis uses tools from approximate message passing (AMP) theory as well as novel elements to deal with a possibly adaptive selection of the Lasso regularization parameter.
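The claim that null features enter the Lasso path alongside true ones, even with an independent design, can be probed with a small simulation. The sketch below is not the paper's code or analysis; it is a minimal illustration under assumed settings: an i.i.d. Gaussian design with entries N(0, 1/n), k equal nonzero coefficients of size M, unit-variance noise, and the Lasso objective 0.5·||y − Xb||² + λ·||b||₁ solved by plain cyclic coordinate descent with warm starts over a decreasing grid of λ. At each λ it records the false discovery proportion (false selections divided by max(selections, 1)) as the type I measure and the true positive proportion (true selections divided by k) as the complement of the type II measure. The function names `soft_threshold`, `lasso_cd`, `lasso_path_fdp_tpp` and all numeric settings (n, p, k, M, the λ grid) are hypothetical choices made for this illustration.

```python
import numpy as np


def soft_threshold(x, lam):
    """Soft-thresholding operator: sign(x) * max(|x| - lam, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)


def lasso_cd(X, y, lam, beta0=None, n_iter=100, tol=1e-8):
    """Minimize 0.5*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    Assumes every column of X is nonzero. beta0 is an optional warm start.
    """
    p = X.shape[1]
    beta = np.zeros(p) if beta0 is None else beta0.copy()
    col_sq = (X ** 2).sum(axis=0)   # ||X_j||^2 for each column j
    resid = y - X @ beta            # residual y - X beta, updated in place
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            old = beta[j]
            # Correlation of column j with the partial residual that excludes j
            rho = X[:, j] @ resid + col_sq[j] * old
            new = soft_threshold(rho, lam) / col_sq[j]
            if new != old:
                resid -= X[:, j] * (new - old)
                beta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:   # stop once a full sweep barely moves beta
            break
    return beta


def lasso_path_fdp_tpp(X, y, support, lambdas):
    """Trace FDP and TPP of the Lasso along a decreasing lambda grid (warm starts)."""
    p = X.shape[1]
    is_true = np.zeros(p, dtype=bool)
    is_true[support] = True
    k = is_true.sum()
    beta = np.zeros(p)
    fdp, tpp = [], []
    for lam in lambdas:
        beta = lasso_cd(X, y, lam, beta0=beta)
        selected = beta != 0
        n_sel = selected.sum()
        n_false = (selected & ~is_true).sum()
        n_true = (selected & is_true).sum()
        fdp.append(n_false / max(n_sel, 1))   # false discovery proportion
        tpp.append(n_true / k)                # true positive proportion
    return np.array(fdp), np.array(tpp)


if __name__ == "__main__":
    # Hypothetical settings: delta = n/p = 0.25, eps = k/p = 0.05, strong signal M.
    rng = np.random.default_rng(0)
    n, p, k, M = 100, 400, 20, 50.0
    X = rng.normal(scale=1.0 / np.sqrt(n), size=(n, p))  # i.i.d. N(0, 1/n) design
    beta_true = np.zeros(p)
    beta_true[:k] = M
    y = X @ beta_true + rng.normal(size=n)

    lam_max = np.abs(X.T @ y).max()   # smallest lambda giving the all-zero solution
    lambdas = lam_max * np.geomspace(1.0, 0.05, 20)
    fdp, tpp = lasso_path_fdp_tpp(X, y, np.arange(k), lambdas)
    for lam, f, t in zip(lambdas, fdp, tpp):
        print(f"lambda={lam:9.3f}  FDP={f:.3f}  TPP={t:.3f}")
```

Running the script prints FDP and TPP at each λ. The numbers depend on the seed and settings and are not results from the paper; rerunning with a larger M is a quick, informal way to probe the abstract's statement that the interspersing persists however strong the effects are. Coordinate descent was chosen only because it fits in a few self-contained lines; a LARS-type solver would trace the path exactly.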
