Extended Comparisons of Best Subset Selection, Forward Stepwise Selection, and the Lasso

07/27/2017
by   Trevor Hastie, et al.

In exciting new work, Bertsimas et al. (2016) showed that the classical best subset selection problem in regression modeling can be formulated as a mixed integer optimization (MIO) problem. Using recent advances in MIO algorithms, they demonstrated that best subset selection can now be solved at much larger problem sizes than what was thought possible in the statistics community. They presented empirical comparisons of best subset selection with other popular variable selection procedures, in particular, the lasso and forward stepwise selection. Surprisingly (to us), their simulations suggested that best subset selection consistently outperformed both methods in terms of prediction accuracy. Here we present an expanded set of simulations to shed more light on these comparisons. The summary is roughly as follows: (a) neither best subset selection nor the lasso uniformly dominates the other, with best subset selection generally performing better in high signal-to-noise ratio (SNR) regimes, and the lasso better in low SNR regimes; (b) best subset selection and forward stepwise selection perform quite similarly throughout; (c) the relaxed lasso (actually, a simplified version of the original relaxed estimator defined in Meinshausen, 2007) is the overall winner, performing just about as well as the lasso in low SNR scenarios, and as well as best subset selection in high SNR scenarios.
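The abstract does not spell out the simplified relaxed lasso, so the following is only a minimal sketch under an assumption: that the estimator is a convex combination, with weight gamma in [0, 1], of the lasso solution and the least squares fit restricted to the variables the lasso selects. The function names, the coordinate descent solver, the objective scaling (1/(2n)) and the toy data are all illustrative choices, not taken from the paper.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator used by the lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def coordinate_descent(X, y, lam, active=None, n_iter=200):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic
    coordinate descent. If `active` is given, only those coordinates
    are updated and all others stay at zero (hypothetical helper)."""
    n, p = len(X), len(X[0])
    cols = list(range(p)) if active is None else list(active)
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, with beta = 0 initially
    for _ in range(n_iter):
        for j in cols:
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            if z == 0.0:
                continue
            # Correlation of column j with the partial residual.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j])
                      for i in range(n)) / n
            new = soft_threshold(rho, lam) / z
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def relaxed_lasso(X, y, lam, gamma):
    """Assumed form: gamma * lasso + (1 - gamma) * least squares on the
    lasso active set. gamma = 1 gives the lasso, gamma = 0 the refit."""
    b_lasso = coordinate_descent(X, y, lam)
    active = [j for j, b in enumerate(b_lasso) if b != 0.0]
    # With lam = 0 the same routine converges to least squares on `active`.
    b_ls = coordinate_descent(X, y, 0.0, active=active)
    return [gamma * bl + (1.0 - gamma) * bs for bl, bs in zip(b_lasso, b_ls)]


# Toy data with orthogonal columns: y = 2 * column 1 + 0.1 * column 2.
X = [[1, 1, 1], [1, -1, 1], [1, 1, -1], [1, -1, -1]]
y = [2.1, -1.9, 1.9, -2.1]
print(relaxed_lasso(X, y, lam=0.5, gamma=0.5))
```

On this toy design the lasso keeps only the strong coefficient and shrinks it, while smaller gamma moves the estimate back toward the unshrunken refit. That undoing of shrinkage is the intuition behind the relaxed lasso tracking best subset selection when the SNR is high.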


research
08/10/2017

Subset Selection with Shrinkage: Sparse Linear Modeling when the SNR is low

We study the behavior of a fundamental tool in sparse statistical modeli...
research
02/23/2023

Variable selection in linear regression models: choosing the best subset is not always the best choice

Variable selection in linear regression settings is a much discussed pro...
research
05/05/2022

COMBSS: Best Subset Selection via Continuous Optimization

We consider the problem of best subset selection in linear regression, w...
research
10/27/2022

Exhuming nonnegative garrote from oblivion using suitable initial estimates- illustration in low and high-dimensional real data

The nonnegative garrote (NNG) is among the first approaches that combine...
research
05/07/2023

Provable Identifiability of Two-Layer ReLU Neural Networks via LASSO Regularization

LASSO regularization is a popular regression tool to enhance the predict...
research
12/05/2011

On best subset regression

In this paper we discuss the variable selection method from ℓ0-norm cons...
research
06/11/2020

Probabilistic Best Subset Selection via Gradient-Based Optimization

In high-dimensional statistics, variable selection is an optimization pr...
