Learning Sparse Classifiers: Continuous and Mixed Integer Optimization Perspectives

by Antoine Dedieu, et al.

We consider a discrete optimization-based approach for learning sparse classifiers, where the outcome depends on a linear combination of a small subset of features. Recent work has shown that mixed integer programming (MIP) can be used to solve ℓ_0-regularized problems to optimality at scales much larger than was conventionally considered possible in the statistics and machine learning communities. Despite their usefulness, MIP-based approaches are significantly slower than the relatively mature algorithms based on ℓ_1-regularization and its relatives. We aim to bridge this computational gap by developing new MIP-based algorithms for ℓ_0-regularized classification. We propose two classes of scalable algorithms: an exact algorithm that can handle p ≈ 50,000 features in a few minutes, and approximate algorithms that can address instances with p ≈ 10^6 in times comparable to fast ℓ_1-based algorithms. Our exact algorithm is based on the novel idea of integrality generation, which solves the original problem (with p binary variables) via a sequence of mixed integer programs that each involve a small number of binary variables. Our approximate algorithms are based on coordinate descent and local combinatorial search. In addition, we present new estimation error bounds for a class of ℓ_0-regularized estimators. Experiments on real and synthetic data demonstrate that our approach leads to models with considerably improved statistical performance (especially in variable selection) compared with competing toolkits.
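The approximate algorithms above are based on coordinate descent. As a rough illustration only, the sketch below runs cyclic coordinate descent with hard thresholding on an assumed objective: logistic loss, plus a small ridge term λ_2‖β‖², plus λ_0‖β‖_0, with labels y_i in {-1, +1}. Each coordinate step minimizes a quadratic upper bound on the smooth part, so the objective never increases. This is not the authors' implementation. The function names and the parameters lam0, lam2, n_sweeps and tol are hypothetical, and the sketch omits both the local combinatorial search and the MIP-based integrality generation described in the abstract.

```python
import numpy as np


def logistic_l0_objective(X, y, beta, lam0, lam2):
    """Logistic loss + ridge + L0 penalty; labels y must be in {-1, +1}."""
    margins = y * (X @ beta)
    loss = np.sum(np.logaddexp(0.0, -margins))  # stable log(1 + exp(-margin))
    return loss + lam2 * (beta @ beta) + lam0 * np.count_nonzero(beta)


def cd_l0_logistic(X, y, lam0, lam2=1e-3, n_sweeps=100, tol=1e-8):
    """Cyclic coordinate descent with hard thresholding (illustrative sketch).

    Each coordinate update minimizes a quadratic upper bound on the smooth
    part (logistic loss + ridge) plus the L0 penalty, so the objective
    is non-increasing from sweep to sweep. Assumes lam2 > 0 or no all-zero
    columns, so that every curvature bound below is positive.
    """
    p = X.shape[1]
    beta = np.zeros(p)
    # Coordinate-wise curvature bound: the logistic second derivative along
    # coordinate j is at most 0.25 * ||x_j||^2; the ridge adds exactly 2*lam2.
    lips = 0.25 * np.sum(X**2, axis=0) + 2.0 * lam2
    margins = y * (X @ beta)  # y_i * x_i^T beta, updated incrementally below
    history = [logistic_l0_objective(X, y, beta, lam0, lam2)]
    for _ in range(n_sweeps):
        for j in range(p):
            # sigmoid(-margin), written via tanh for numerical stability
            weights = 0.5 * (1.0 - np.tanh(0.5 * margins))
            grad_j = -np.sum(y * X[:, j] * weights) + 2.0 * lam2 * beta[j]
            candidate = beta[j] - grad_j / lips[j]
            # Hard threshold: 0.5 * L_j * candidate^2 is how much the quadratic
            # bound drops when using candidate instead of 0; keep feature j
            # only if that drop exceeds the L0 price lam0.
            new_bj = candidate if 0.5 * lips[j] * candidate**2 > lam0 else 0.0
            if new_bj != beta[j]:
                margins += y * X[:, j] * (new_bj - beta[j])
                beta[j] = new_bj
        history.append(logistic_l0_objective(X, y, beta, lam0, lam2))
        if history[-2] - history[-1] < tol * max(1.0, abs(history[-2])):
            break  # no meaningful decrease in this sweep
    return beta, history


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 30))
    true_beta = np.zeros(30)
    true_beta[:3] = 1.5  # only the first three features matter
    y = np.where(X @ true_beta + 0.5 * rng.standard_normal(200) > 0, 1.0, -1.0)
    beta_hat, hist = cd_l0_logistic(X, y, lam0=5.0)
    print("selected features:", np.flatnonzero(beta_hat))
    print("objective per sweep:", np.round(hist, 3))
```

The thresholding rule is where the ℓ_0 penalty enters: a coordinate stays at zero unless its unpenalized update lowers the quadratic bound by more than λ_0, so larger values of λ_0 yield sparser classifiers.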
