Fast Best Subset Selection: Coordinate Descent and Local Combinatorial Optimization Algorithms

03/05/2018
by Hussein Hazimeh, et al.

We consider the canonical L_0-regularized least squares problem (a.k.a. best subsets), which is generally perceived as a 'gold standard' for many sparse learning regimes. In spite of worst-case computational intractability results, recent work has shown that advances in mixed integer optimization can be used to obtain near-optimal solutions to this problem for instances where the number of features p ≈ 10^3. While these methods lead to estimators with excellent statistical properties, there is often a steep price to pay in computation time, especially when compared to highly efficient popular algorithms for sparse learning (e.g., based on L_1-regularization) that scale to much larger problem sizes. Bridging this gap is a main goal of this paper. We study the computational aspects of a family of L_0-regularized least squares problems with additional convex penalties. We propose a hierarchy of necessary optimality conditions for these problems and develop new algorithms, based on coordinate descent and local combinatorial optimization schemes, whose convergence properties we study. We demonstrate that the choice of algorithm determines the quality of the solutions obtained, and that local combinatorial optimization-based algorithms generally yield solutions of superior quality. We show empirically that the proposed framework is relatively fast for problem instances with p ≈ 10^6 and performs well, in terms of both optimization and statistical properties (e.g., prediction, estimation, and variable selection), compared to simpler heuristic algorithms. A version of our algorithm achieves up to a three-fold speedup (with p up to 10^6) over state-of-the-art schemes for sparse learning such as glmnet and ncvreg.
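The family of problems referenced above has the form min_β (1/2)‖y − Xβ‖₂² + λ₀‖β‖₀, optionally augmented with convex penalties such as λ₁‖β‖₁ or λ₂‖β‖₂². The snippet below is a minimal, illustrative sketch of cyclic coordinate descent with hard thresholding for the pure L_0 case, assuming columns of X are standardized to unit L2 norm; the function name, tolerances, and synthetic example are our own illustration, not the authors' L0Learn implementation, and it omits the paper's local combinatorial (swap) moves and other refinements.

```python
import numpy as np

def cd_l0_least_squares(X, y, lam0, max_iter=100, tol=1e-8):
    """Cyclic coordinate descent with hard thresholding for
    (1/2)||y - X b||^2 + lam0 * ||b||_0.

    Illustrative sketch only: pure L0 penalty, columns of X assumed
    to have unit L2 norm. Not the authors' L0Learn code.
    """
    n, p = X.shape
    b = np.zeros(p)
    r = y - X @ b                      # residual, initially equal to y
    thresh = np.sqrt(2.0 * lam0)       # hard-threshold level for unit-norm columns
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            # unregularized coordinate-wise minimizer given the other coordinates
            rho = X[:, j] @ r + b[j]
            b_new = rho if abs(rho) > thresh else 0.0
            if b_new != b[j]:
                r += X[:, j] * (b[j] - b_new)   # keep the residual consistent
                max_change = max(max_change, abs(b_new - b[j]))
                b[j] = b_new
        if max_change < tol:
            break
    return b

# Example usage on synthetic data (columns standardized to unit norm):
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
X /= np.linalg.norm(X, axis=0)
beta_true = np.zeros(20); beta_true[:3] = [3.0, -2.0, 1.5]
y = X @ beta_true + 0.1 * rng.standard_normal(100)
print(np.nonzero(cd_l0_least_squares(X, y, lam0=0.05))[0])
```

The keep-or-kill rule |rho| > sqrt(2*lam0) follows from comparing the coordinate-wise objective at the unrestricted minimizer rho against setting the coordinate to zero; the paper's local combinatorial schemes then try swapping coordinates in and out of the support to escape the weaker stationary points that plain coordinate descent can return.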

Related research

Learning Sparse Classifiers: Continuous and Mixed Integer Optimization Perspectives (01/17/2020)
We consider a discrete optimization based approach for learning sparse c...

Grouped Variable Selection with Discrete Optimization: Computational and Statistical Perspectives (04/14/2021)
We present a new algorithmic framework for grouped variable selection th...

The Discrete Dantzig Selector: Estimating Sparse Linear Models via Mixed Integer Linear Optimization (08/08/2015)
We propose a novel high-dimensional linear regression estimator: the Dis...

Pathwise Coordinate Optimization for Sparse Learning: Algorithm and Theory (12/23/2014)
The pathwise coordinate optimization is one of the most important comput...

Sparse Regression at Scale: Branch-and-Bound rooted in First-Order Optimization (04/17/2020)
We consider the least squares regression problem, penalized with a combi...

Learning Hierarchical Interactions at Scale: A Convex Optimization Approach (02/05/2019)
In many learning settings, it is beneficial to augment the main features...

Scalable Relaxations of Sparse Packing Constraints: Optimal Biocontrol in Predator-Prey Network (11/18/2017)
Cascades represent rapid changes in networks. A cascading phenomenon of ...
