Efficient Regularized Regression for Variable Selection with L0 Penalty

07/28/2014
by Zhenqiu Liu, et al.

Variable (feature, gene, model, which we use interchangeably) selection for regression with high-dimensional big data has found many applications in bioinformatics, computational biology, image processing, and engineering. One appealing approach is L0-regularized regression, which directly penalizes the number of nonzero features in the model. L0 is regarded as the most essential sparsity measure and has nice theoretical properties, whereas the popular L1 regularization is only the best convex relaxation of L0. It is therefore natural to expect L0-regularized regression to perform better than LASSO. However, L0 optimization is well known to be NP-hard and computationally challenging. Instead of solving the L0 problem directly, most publications so far have solved approximation problems that closely resemble L0 regularization. In this paper, we propose an efficient EM algorithm (L0EM) that directly solves the L0 optimization problem. L0EM is efficient with high-dimensional data, and it also provides a natural solution to all Lp problems with p ∈ [0, 2]. The regularization parameter can be determined either through cross-validation or through AIC and BIC. Theoretical properties of the L0-regularized estimator are given under mild conditions that permit the number of variables to be much larger than the sample size. We demonstrate our methods through simulations and high-dimensional genomic data. The results indicate that L0 performs better than LASSO and that L0 with AIC or BIC performs similarly to computationally intensive cross-validation. The proposed algorithms are efficient in identifying the nonzero variables with less bias and in selecting biologically important genes and pathways from high-dimensional big data.
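The abstract does not spell out the L0EM updates, so the sketch below does not reproduce them. It only illustrates what the L0-penalized objective ||y - Xβ||² + λ·||β||₀ means and why it is combinatorial. For a toy problem with few predictors, the objective can be minimized exactly by enumerating every support (2^p subsets, which is why the general problem is NP-hard). λ is then chosen from a grid by BIC, computed here as n·log(RSS/n) + k·log(n) under an assumed Gaussian model with unknown variance. The function names `l0_best_subset`, `bic`, and `select_lambda_by_bic`, the λ grid, and the simulated data are hypothetical illustrations, not the paper's code or experiments.

```python
import itertools

import numpy as np


def l0_best_subset(X, y, lam):
    """Exactly minimize ||y - X b||^2 + lam * ||b||_0 by enumerating all supports.

    Hypothetical helper: the cost grows as 2^p, so this is only feasible for
    small p. It is NOT the L0EM algorithm, just a brute-force illustration.
    """
    n, p = X.shape
    best_obj = float(y @ y)  # empty model: b = 0, no nonzero coefficients
    best_beta = np.zeros(p)
    for k in range(1, p + 1):  # increasing k, so ties keep the smaller model
        for support in itertools.combinations(range(p), k):
            cols = list(support)
            coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            rss = float(np.sum((y - X[:, cols] @ coef) ** 2))
            obj = rss + lam * k  # penalty counts nonzeros directly
            if obj < best_obj:
                best_obj = obj
                best_beta = np.zeros(p)
                best_beta[cols] = coef
    return best_beta, best_obj


def bic(X, y, beta):
    """BIC up to an additive constant, assuming Gaussian errors."""
    n = X.shape[0]
    rss = float(np.sum((y - X @ beta) ** 2))
    k = int(np.count_nonzero(beta))
    return n * np.log(rss / n) + k * np.log(n)


def select_lambda_by_bic(X, y, lams):
    """Fit the L0 problem for each lambda and keep the fit with the lowest BIC."""
    best = None
    for lam in lams:
        beta, _ = l0_best_subset(X, y, lam)
        score = bic(X, y, beta)
        if best is None or score < best[0]:
            best = (score, lam, beta)
    return best[1], best[2]


# Hypothetical simulated data: 3 true signals among 8 predictors.
rng = np.random.default_rng(0)
n, p = 100, 8
X = rng.standard_normal((n, p))
beta_true = np.array([3.0, 0.0, 0.0, -2.0, 0.0, 0.0, 1.5, 0.0])
y = X @ beta_true + 0.5 * rng.standard_normal(n)

lams = [0.1, 1.0, 5.0, 25.0, 100.0]
lam_hat, beta_hat = select_lambda_by_bic(X, y, lams)
print("selected lambda:", lam_hat)
print("selected support:", np.flatnonzero(beta_hat))
```

Enumeration is used only because it gives the exact L0 minimizer on a toy problem. Once p grows past a few dozen it becomes infeasible, which is the gap that algorithms such as the proposed L0EM aim to close.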
