Tale of two c(omplex)ities

01/16/2023
by Saptarshi Roy, et al.

For decades, best subset selection (BSS) has eluded statisticians, mainly because of its computational bottleneck. Recently, however, modern computational breakthroughs have rekindled theoretical interest in BSS and have led to new findings. In particular, Guo et al. (2020) showed that the model selection performance of BSS is governed by a margin quantity that is robust to design dependence, unlike modern methods such as LASSO, SCAD, and MCP. Motivated by their theoretical results, in this paper we also study the variable selection properties of best subset selection in the high-dimensional sparse linear regression setup. We show that, apart from the identifiability margin, the following two complexity measures play a fundamental role in characterizing the margin condition for model consistency: (a) complexity of residualized features, (b) complexity of spurious projections. In particular, we establish a simple margin condition that depends only on the identifiability margin quantity and the dominating one of the two complexity measures. Furthermore, we show that a similar margin condition, depending on a similar margin quantity and similar complexity measures, is also necessary for model consistency of BSS. For a broader understanding of the complexity measures, we also consider some simple illustrative examples that demonstrate how the complexity measures vary, which broadens our theoretical understanding of the model selection performance of BSS under different correlation structures.
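
For readers unfamiliar with the estimator, the following is a minimal sketch of best subset selection in its textbook form: among all supports of a given size, pick the one with the smallest residual sum of squares. It is not the authors' code, and it does not compute the margin quantity or the two complexity measures from the abstract. The function name `best_subset`, the simulated design, and all parameter values are assumptions chosen for illustration. The exhaustive search over supports also makes the computational bottleneck visible, since the number of candidates grows combinatorially in the number of features.

```python
from itertools import combinations

import numpy as np


def best_subset(X, y, s):
    """Exhaustive best subset selection (illustrative helper, hypothetical name).

    Returns the size-s support with the smallest residual sum of squares (RSS)
    and that RSS. Cost grows like C(p, s), so this is only for tiny problems.
    """
    n, p = X.shape
    best_support, best_rss = None, np.inf
    for support in combinations(range(p), s):
        cols = list(support)
        # Least-squares fit restricted to the candidate support.
        coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
        resid = y - X[:, cols] @ coef
        rss = float(resid @ resid)
        if rss < best_rss:
            best_support, best_rss = support, rss
    return best_support, best_rss


# Simulated sparse linear model (all values below are assumptions).
rng = np.random.default_rng(0)
n, p, s = 100, 8, 2
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[[1, 4]] = [3.0, -3.0]  # true support is {1, 4}
y = X @ beta + 0.1 * rng.standard_normal(n)

support, rss = best_subset(X, y, s)
print("selected support:", support)
```

With a strong signal and an independent Gaussian design, as simulated here, the search recovers the true support. The abstract's results concern when this recovery holds under more general correlation structures.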

research
07/14/2021

On the early solution path of best subset selection

The early solution path, which tracks the first few variables that enter...
research
07/03/2020

When is best subset selection the "best"?

Best subset selection (BSS) is fundamental in statistics and machine lea...
research
02/23/2023

Variable selection in linear regression models: choosing the best subset is not always the best choice

Variable selection in linear regression settings is a much discussed pro...
research
04/18/2008

Margin-adaptive model selection in statistical learning

A classical condition for fast learning rates is the margin condition, f...
research
06/04/2023

Optimal neighbourhood selection in structural equation models

We study the optimal sample complexity of neighbourhood selection in lin...
research
01/30/2013

A note on selection stability: combining stability and prediction

Recently, many regularized procedures have been proposed for variable se...
research
08/15/2021

Implicit Regularization of Bregman Proximal Point Algorithm and Mirror Descent on Separable Data

Bregman proximal point algorithm (BPPA), as one of the centerpieces in t...
