Learning Sparse Classifiers: Continuous and Mixed Integer Optimization Perspectives

01/17/2020
by Antoine Dedieu, et al.

We consider a discrete optimization approach for learning sparse classifiers, in which the outcome depends on a linear combination of a small subset of features. Recent work has shown that mixed integer programming (MIP) can solve ℓ_0-regularized problems to optimality at scales much larger than was conventionally considered possible in the statistics and machine learning communities. Despite their usefulness, MIP-based approaches are significantly slower than relatively mature algorithms based on ℓ_1-regularization and its relatives. We aim to bridge this computational gap by developing new MIP-based algorithms for ℓ_0-regularized classification. We propose two classes of scalable algorithms: an exact algorithm that can handle p ≈ 50,000 features in a few minutes, and approximate algorithms that can address instances with p ≈ 10^6 in times comparable to fast ℓ_1-based algorithms. Our exact algorithm is based on the novel idea of integrality generation, which solves the original problem (with p binary variables) via a sequence of mixed integer programs, each involving only a small number of binary variables. Our approximate algorithms are based on coordinate descent and local combinatorial search. In addition, we present new estimation error bounds for a class of ℓ_0-regularized estimators. Experiments on real and synthetic data demonstrate that our approach yields models with considerably improved statistical performance (especially in variable selection) compared to competing toolkits.
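The coordinate descent ingredient mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a logistic loss with labels in {-1, +1}, an optional ridge term `lam2` alongside the ℓ_0 penalty `lam0`, and a standard per-coordinate quadratic upper bound on the loss. The names `l0_logistic_cd` and `objective` are hypothetical, and the local combinatorial search and MIP steps are omitted.

```python
import numpy as np

def objective(X, y, beta, lam0, lam2=0.0):
    """Logistic loss + lam0 * ||beta||_0 + lam2 * ||beta||_2^2 (labels in {-1, +1})."""
    loss = np.sum(np.logaddexp(0.0, -y * (X @ beta)))
    return loss + lam0 * np.count_nonzero(beta) + lam2 * np.sum(beta ** 2)

def l0_logistic_cd(X, y, lam0, lam2=0.0, n_sweeps=50):
    """Cyclic coordinate descent sketch for the objective above (illustrative only)."""
    n, p = X.shape
    beta = np.zeros(p)
    margin = np.zeros(n)  # y_i * <x_i, beta>, updated incrementally
    # Per-coordinate Lipschitz constants of the logistic-loss gradient.
    lips = 0.25 * np.sum(X ** 2, axis=0)
    for _ in range(n_sweeps):
        changed = False
        for j in range(p):
            if lips[j] == 0.0:
                continue  # all-zero column: coordinate has no effect on the loss
            # 1 / (1 + exp(margin)), written with tanh for numerical stability.
            weights = 0.5 * (1.0 - np.tanh(0.5 * margin))
            grad_j = -np.sum(y * X[:, j] * weights)
            # Minimizer of the quadratic upper bound without penalties.
            c = beta[j] - grad_j / lips[j]
            # Ridge shrinkage, then hard thresholding induced by the l0 penalty.
            cand = lips[j] * c / (lips[j] + 2.0 * lam2)
            thresh = np.sqrt(2.0 * lam0 / (lips[j] + 2.0 * lam2))
            new = cand if abs(cand) > thresh else 0.0
            if new != beta[j]:
                margin += y * X[:, j] * (new - beta[j])
                beta[j] = new
                changed = True
        if not changed:
            break  # a full sweep changed nothing: coordinate-wise stationary point
    return beta
```

Each coordinate update minimizes an upper bound that is tight at the current iterate, so the objective does not increase from one update to the next. Larger values of `lam0` raise the threshold and drive more coefficients to exactly zero.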

04/14/2021

Grouped Variable Selection with Discrete Optimization: Computational and Statistical Perspectives

We present a new algorithmic framework for grouped variable selection th...
03/05/2018

Fast Best Subset Selection: Coordinate Descent and Local Combinatorial Optimization Algorithms

We consider the canonical L_0-regularized least squares problem (aka bes...
09/23/2021

Sparse PCA: A New Scalable Estimator Based On Integer Programming

We consider the Sparse Principal Component Analysis (SPCA) problem under...
04/17/2020

Sparse Regression at Scale: Branch-and-Bound rooted in First-Order Optimization

We consider the least squares regression problem, penalized with a combi...
07/15/2021

Learning Mixed-Integer Linear Programs from Contextual Examples

Mixed-integer linear programs (MILPs) are widely used in artificial inte...
01/20/2017

Bayesian Network Learning via Topological Order

We propose a mixed integer programming (MIP) model and iterative algorit...
06/11/2018

The CCP Selector: Scalable Algorithms for Sparse Ridge Regression from Chance-Constrained Programming

Sparse regression and variable selection for large-scale data have been ...