When Does the First Spurious Variable Get Selected by Sequential Regression Procedures?

08/10/2017
by Weijie Su, et al.

Applied statisticians use sequential regression procedures to produce a ranking of explanatory variables and, in settings of low correlations between variables and strong true effect sizes, expect that variables at the very top of this ranking are true. In a regime of certain sparsity levels, however, three examples of sequential procedures (forward stepwise, the lasso, and least angle regression) are shown to include the first spurious variable unexpectedly early. We derive a rigorous, sharp prediction of the rank of the first spurious variable for the three procedures, demonstrating that the first spurious variable occurs earlier and earlier as the regression coefficients get denser. This counterintuitive phenomenon persists for independent Gaussian random designs and an arbitrarily large magnitude of the true effects. We further gain a better understanding of the phenomenon by identifying the underlying cause and then leverage the insights to introduce a simple visualization tool termed the "double-ranking diagram" to improve on sequential methods. As a byproduct of these findings, we obtain the first provable result certifying the exact equivalence between the lasso and least angle regression in the early stages of solution paths beyond orthogonal designs. This equivalence can seamlessly carry over many important model selection results concerning the lasso to least angle regression.
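
The quantity studied in the abstract, the rank at which the first spurious variable enters, can be explored empirically. The following is a minimal sketch, not the authors' code: it runs forward stepwise selection on an independent Gaussian design and reports the step at which the first variable outside the true support is selected. The function names `first_spurious_rank` and `simulate`, and all parameter values (`n`, `p`, `k`, `effect`, `seed`), are illustrative assumptions and do not come from the paper.

```python
import numpy as np


def first_spurious_rank(X, y, support, max_steps=None):
    """Run forward stepwise selection and return the 1-based step at which
    the first variable outside `support` enters (None if none enters)."""
    n, p = X.shape
    max_steps = min(n, p) if max_steps is None else max_steps
    Xr = X.astype(float)   # astype copies, so the caller's X is untouched
    r = y.astype(float)    # current residual of y on the selected columns
    selected = np.zeros(p, dtype=bool)
    support = set(int(j) for j in support)
    for step in range(1, max_steps + 1):
        # Drop in residual sum of squares if column j were added next.
        norms = np.einsum("ij,ij->j", Xr, Xr)
        gain = (Xr.T @ r) ** 2 / np.maximum(norms, 1e-12)
        gain[selected] = -np.inf
        j = int(np.argmax(gain))
        if j not in support:
            return step
        selected[j] = True
        # Gram-Schmidt step: remove the chosen direction from the residual
        # and from every remaining column.
        q = Xr[:, j] / np.sqrt(max(norms[j], 1e-12))
        r = r - q * (q @ r)
        Xr = Xr - np.outer(q, q @ Xr)
    return None


def simulate(n=200, p=400, k=20, effect=50.0, seed=0):
    """Hypothetical setup: independent Gaussian design with roughly
    unit-norm columns, k equal true effects, standard Gaussian noise."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p)) / np.sqrt(n)
    beta = np.zeros(p)
    beta[:k] = effect
    y = X @ beta + rng.standard_normal(n)
    # At most k true variables exist, so k + 1 steps always suffice.
    return first_spurious_rank(X, y, range(k), max_steps=k + 1)


if __name__ == "__main__":
    for k in (5, 20, 80):
        ranks = [simulate(k=k, seed=s) for s in range(20)]
        print(f"k={k:3d}  mean rank of first spurious variable: {np.mean(ranks):.1f}")
```

Varying `k` while holding `n`, `p`, and `effect` fixed gives a rough way to compare the observed rank against the number of true variables; the sketch makes no claim about matching the paper's sharp prediction.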

research
11/05/2015

False Discoveries Occur Early on the Lasso Path

In regression settings where explanatory variables have very low correla...
research
07/01/2020

The Price of Competition: Effect Size Heterogeneity Matters in High Dimensions

In high-dimensional linear regression, would increasing effect sizes alw...
research
06/16/2018

Post-Lasso Inference for High-Dimensional Regression

Among the most popular variable selection procedures in high-dimensional...
research
07/21/2020

The Complete Lasso Tradeoff Diagram

A fundamental problem in the high-dimensional regression is to understan...
research
06/28/2019

Multiple Testing and Variable Selection along Least Angle Regression's path

In this article we investigate the outcomes of the standard Least Angle ...
research
02/07/2008

Least angle and ℓ_1 penalized regression: A review

Least Angle Regression is a promising technique for variable selection a...
research
03/22/2022

Pattern recovery by SLOPE

LASSO and SLOPE are two popular methods for dimensionality reduction in ...
