When is best subset selection the "best"?

07/03/2020
by Jianqing Fan, et al.

Best subset selection (BSS) is fundamental in statistics and machine learning. Despite intensive study, the fundamental question of when BSS is truly the "best", namely when it yields the oracle estimator, remains only partially answered. In this paper, we address this issue by giving a weak sufficient condition and a strong necessary condition for BSS to exactly recover the true model. We also give a weak sufficient condition for BSS to achieve the sure screening property. On the optimization side, we find that the exact combinatorial minimizer for BSS is unnecessary: all the established statistical properties of the best subset carry over to any sparse model whose residual sum of squares is close enough to that of the best subset. In particular, we show that an iterative hard thresholding (IHT) algorithm can find a sparse subset with the sure screening property within a logarithmic number of steps; another round of BSS within this set can then recover the true model. Simulation studies and real data examples show that IHT yields lower false discovery rates and higher true positive rates than competing approaches, including LASSO, SCAD and SIS.
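The two-stage idea in the abstract (IHT to screen down to a small candidate set, then BSS restricted to that set) can be sketched as follows. This is a minimal illustration and not the paper's algorithm: it uses plain projected-gradient IHT with a fixed step size of 1/L, and the function names (hard_threshold, iht, best_subset_within), the iteration count, and the parameters k and s are all assumptions made for this example.

```python
import itertools
import numpy as np


def hard_threshold(v, k):
    """Keep the k largest-magnitude entries of v and zero out the rest."""
    out = np.zeros_like(v)
    keep = np.argsort(np.abs(v))[-k:]
    out[keep] = v[keep]
    return out


def iht(X, y, k, n_iter=200):
    """Plain IHT for least squares; returns the indices of the selected set.

    Assumption: fixed step size 1/L, where L is the largest eigenvalue
    of X^T X / n. The paper's variant may differ in its details.
    """
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n  # squared spectral norm, scaled by n
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n          # gradient of (1/2n)||y - X beta||^2
        beta = hard_threshold(beta - grad / L, k)  # project onto k-sparse vectors
    return np.flatnonzero(beta)


def best_subset_within(X, y, candidates, s):
    """Exhaustive BSS of size s, restricted to the candidate indices."""
    best_rss, best_set = np.inf, None
    for subset in itertools.combinations(candidates, s):
        cols = list(subset)
        coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
        rss = float(np.sum((y - X[:, cols] @ coef) ** 2))
        if rss < best_rss:
            best_rss, best_set = rss, subset
    return sorted(int(j) for j in best_set)


# Synthetic usage example (hypothetical data, chosen only for illustration).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
beta_true = np.zeros(50)
beta_true[[3, 17, 42]] = [2.0, -3.0, 2.5]
y = X @ beta_true + 0.5 * rng.standard_normal(200)

screened = iht(X, y, k=6)                            # screening stage, k > true size
recovered = best_subset_within(X, y, screened, s=3)  # BSS inside the screened set
print(screened, recovered)
```

Running IHT with k larger than the true sparsity mirrors the screening role described above: the first stage only needs to retain the true variables, and the exhaustive search in the second stage is cheap because it ranges over subsets of the k screened variables and not over all p.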


