High-dimensional variable selection with heterogeneous signals: A precise asymptotic perspective

01/05/2022
by   Saptarshi Roy, et al.

We study the problem of exact support recovery for high-dimensional sparse linear regression when the signals are weak, rare and possibly heterogeneous. Specifically, we fix the minimum signal magnitude at the information-theoretic optimal rate and investigate the asymptotic selection accuracy of best subset selection (BSS) and marginal screening (MS) procedures under an independent Gaussian design. Despite this ideal setup, marginal screening can, somewhat surprisingly, fail to achieve exact recovery with probability converging to one in the presence of heterogeneous signals, whereas BSS enjoys model consistency whenever the minimum signal strength is above the information-theoretic threshold. To mitigate the computational cost of BSS, we also propose a surrogate two-stage algorithm called ETS (Estimate Then Screen), based on iterative hard thresholding and gradient coordinate screening, and we show that ETS shares exactly the same asymptotic optimality in terms of exact recovery as BSS. Finally, we present a simulation study comparing ETS with LASSO and marginal screening. The numerical results agree with our asymptotic theory even for realistic values of the sample size, dimension and sparsity.
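To make the procedures named in the abstract concrete, the following is a minimal Python sketch of marginal screening and of a two-stage estimate-then-screen routine. It is an illustration under stated assumptions, not the paper's specification of ETS: the abstract only says that ETS is built from iterative hard thresholding and gradient coordinate screening, so the function names, the step size, the iteration count, and the exact form of the screening score (`beta_hat` plus the scaled residual correlation) are all assumptions made here for illustration. The sparsity level `s` is assumed known.

```python
import numpy as np


def top_s(v, s):
    """Indices of the s largest entries of |v|, returned in ascending order."""
    return np.sort(np.argsort(-np.abs(v))[:s])


def hard_threshold(v, s):
    """Keep the s largest-magnitude entries of v and set the rest to zero."""
    out = np.zeros_like(v)
    keep = top_s(v, s)
    out[keep] = v[keep]
    return out


def marginal_screening(X, y, s):
    """Select the s coordinates with the largest |X_j^T y| / n."""
    n = X.shape[0]
    return top_s(X.T @ y / n, s)


def iht(X, y, s, step=1.0, n_iter=100):
    """Iterative hard thresholding for least squares (generic textbook form).

    step and n_iter are illustrative defaults, not values from the paper.
    """
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (y - X @ beta) / n  # negative gradient of the loss / n
        beta = hard_threshold(beta + step * grad, s)
    return beta


def estimate_then_screen(X, y, s, **iht_kwargs):
    """Hypothetical two-stage sketch: estimate with IHT, then screen coordinates.

    The screening score below (estimate plus scaled residual correlation)
    is an assumption; the paper's actual ETS rule may differ.
    """
    n = X.shape[0]
    beta_hat = iht(X, y, s, **iht_kwargs)
    score = beta_hat + X.T @ (y - X @ beta_hat) / n
    return top_s(score, s)
```

A typical call would draw `X` with independent standard Gaussian entries, form `y = X @ beta + noise` for an `s`-sparse `beta`, and compare `marginal_screening(X, y, s)` with `estimate_then_screen(X, y, s)` against the true support. With a unit step size the IHT iteration relies on `X.T @ X / n` being close to the identity on sparse vectors, which is reasonable for an independent Gaussian design with `n` large relative to `s`; other designs may need a smaller step.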


research
07/03/2020

When is best subset selection the "best"?

Best subset selection (BSS) is fundamental in statistics and machine lea...
research
04/29/2012

Optimality of Graphlet Screening in High Dimensional Variable Selection

Consider a linear regression model where the design matrix X has n rows ...
research
02/08/2019

Penalized linear regression with high-dimensional pairwise screening

In variable selection, most existing screening methods focus on marginal...
research
07/14/2021

On the early solution path of best subset selection

The early solution path, which tracks the first few variables that enter...
research
03/22/2019

On the support recovery of marginal regression

Leading methods for support recovery in high-dimensional regression, suc...
research
02/23/2014

Exact Post Model Selection Inference for Marginal Screening

We develop a framework for post model selection inference, via marginal ...
research
02/23/2023

Variable selection in linear regression models: choosing the best subset is not always the best choice

Variable selection in linear regression settings is a much discussed pro...
