The Price of Competition: Effect Size Heterogeneity Matters in High Dimensions

07/01/2020
by Hua Wang, et al.

In high-dimensional linear regression, does increasing effect sizes always improve model selection when all other conditions are held fixed (in particular, the sparsity of the regression coefficients)? In this paper, we answer this question in the negative for the Lasso in the regime of linear sparsity, by introducing a new notion we term effect size heterogeneity. Roughly speaking, a regression coefficient vector has high effect size heterogeneity if its nonzero entries have significantly different magnitudes. In terms of this new measure, we prove that the false and true positive rates achieve the optimal trade-off uniformly along the Lasso path when effect size heterogeneity is maximal in a certain sense, and the worst trade-off when it is minimal in the sense that all nonzero effect sizes are roughly equal. Moreover, we demonstrate that the first false selection occurs much earlier when effect size heterogeneity is minimal than when it is maximal. The underlying cause of these two phenomena is, metaphorically speaking, the "competition" among variables with effect sizes of the same magnitude in entering the model. Taken together, our findings suggest that effect size heterogeneity should serve as an important complement to the sparsity of regression coefficients in the analysis of high-dimensional regression problems. Our proofs use techniques from approximate message passing theory as well as a novel technique for estimating the rank of the first false variable.
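To make the quantities named in the abstract concrete, here is a minimal sketch of how the true and false positive proportions and the rank of the first false variable might be computed from the order in which variables enter a selection path. The abstract does not define these quantities, so the definitions below follow a common convention and are an assumption. The function names `path_metrics` and `first_false_rank` are hypothetical. The sketch also assumes that each variable enters once and never leaves, which is a simplification of the actual Lasso path.

```python
def path_metrics(entry_order, true_support):
    """Return a list of (TPP, FDP) pairs, one per step along the path.

    Assumed definitions (not taken from the abstract):
      TPP = true selections / number of true signals
      FDP = false selections / number of selections so far
    """
    true_support = set(true_support)
    n_signals = max(len(true_support), 1)  # avoid division by zero
    metrics = []
    true_hits = 0
    for n_selected, j in enumerate(entry_order, start=1):
        if j in true_support:
            true_hits += 1
        false_hits = n_selected - true_hits
        metrics.append((true_hits / n_signals, false_hits / n_selected))
    return metrics


def first_false_rank(entry_order, true_support):
    """Return the 1-based position of the first null variable to enter,
    or None if no null variable enters."""
    true_support = set(true_support)
    for rank, j in enumerate(entry_order, start=1):
        if j not in true_support:
            return rank
    return None


# Illustrative (made-up) entry order: variables 0, 1, 3 are true signals.
order = [3, 0, 7, 1]
signals = {0, 1, 3}
print(path_metrics(order, signals))
print(first_false_rank(order, signals))
```

Under these assumptions, a path on which the first false selection occurs later yields a larger `first_false_rank`, and a better trade-off corresponds to lower FDP at each level of TPP.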


research
11/05/2015

False Discoveries Occur Early on the Lasso Path

In regression settings where explanatory variables have very low correla...
research
05/27/2021

Characterizing the SLOPE Trade-off: A Variational Perspective and the Donoho-Tanner Limit

Sorted l1 regularization has been incorporated into many methods for sol...
research
08/10/2017

When Does the First Spurious Variable Get Selected by Sequential Regression Procedures?

Applied statisticians use sequential regression procedures to produce a ...
research
05/07/2020

High-Dimensional Inference Based on the Leave-One-Covariate-Out LASSO Path

We propose a new measure of variable importance in high-dimensional regr...
research
09/15/2023

Heteroscedastic sparse high-dimensional linear regression with a partitioned empirical Bayes ECM algorithm

Sparse linear regression methods for high-dimensional data often assume ...
research
09/01/2023

Interpretation of High-Dimensional Linear Regression: Effects of Nullspace and Regularization Demonstrated on Battery Data

High-dimensional linear regression is important in many scientific field...
research
09/18/2019

Evaluating Effects of Tuition Fees: Lasso for the Case of Germany

We study the effect of the introduction of university tuition fees on th...
