Simultaneous Factors Selection and Fusion of Their Levels in Penalized Logistic Regression

by Lea Kaufmann, et al.

Many contemporary data analysis problems call for complexity reduction: removing non-influential covariates to obtain a sparse model. When categorical covariates are present and their levels are dummy coded, the number of model parameters grows rapidly, which underscores the need to reduce the number of parameters to be estimated. In this case, beyond variable selection, sparsity is also achieved by fusing levels of a covariate that do not differ significantly in their influence on the response variable. This work introduces a new regularization technique for binary logistic regression, called L_0-Fused Group Lasso (L_0-FGL). It uses a group lasso penalty for factor selection and, for the fusion part, applies an L_0 penalty to the differences among the level parameters of a categorical predictor. Using adaptive weights, an adaptive version of the L_0-FGL method is derived. Theoretical properties, such as existence, √(n)-consistency and oracle properties under certain conditions, are established. In addition, it is shown that even in the diverging case, where the number of parameters p_n grows with the sample size n, √(n)-consistency and a variable selection consistency result are achieved. Two computational methods, PIRLS and a block coordinate descent (BCD) approach using quasi-Newton, are developed and implemented. A simulation study indicates that L_0-FGL performs very well, especially in the high-dimensional case.
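To make the structure of the penalty concrete, the following is a minimal sketch of how a combined group lasso and L_0 fusion penalty could be evaluated for given coefficients. The abstract does not state the exact formula, so everything specific here is an assumption made for illustration: the function name `l0_fgl_penalty`, the tuning parameters `lam_group` and `lam_fuse`, the sqrt(group size) scaling of the group norm, the inclusion of the reference level (coefficient fixed at 0 under dummy coding) in the pairwise differences, and the numerical tolerance `tol`. Adaptive weights are omitted.

```python
import numpy as np

def l0_fgl_penalty(beta_groups, lam_group, lam_fuse, tol=1e-8):
    """Illustrative penalty value (assumed form, not taken from the paper).

    beta_groups: list of 1-D arrays, one per categorical factor, holding the
                 dummy-coded level coefficients (reference level excluded).
    """
    total = 0.0
    for beta in beta_groups:
        beta = np.asarray(beta, dtype=float)
        # Group lasso part: Euclidean norm of the whole factor, scaled by
        # sqrt(group size) (a common convention, assumed here).
        group_term = np.sqrt(beta.size) * np.linalg.norm(beta)
        # Fusion part: count pairs of levels whose coefficients differ.
        # The reference level is included with coefficient 0 (assumption).
        levels = np.concatenate(([0.0], beta))
        diffs = np.abs(levels[:, None] - levels[None, :])
        upper = np.triu_indices(levels.size, k=1)  # each pair counted once
        n_differing_pairs = int(np.sum(diffs[upper] > tol))
        total += lam_group * group_term + lam_fuse * n_differing_pairs
    return total

# A factor whose three non-reference levels share one value is penalized
# less in the fusion part than a factor with all-distinct levels.
fused = l0_fgl_penalty([np.array([1.0, 1.0, 1.0])], lam_group=1.0, lam_fuse=1.0)
distinct = l0_fgl_penalty([np.array([1.0, 2.0, 3.0])], lam_group=0.0, lam_fuse=1.0)
```

Under these assumptions, a factor with all coefficients equal to zero contributes nothing to either term, which corresponds to dropping the factor, while equal nonzero coefficients reduce only the fusion count, which corresponds to merging levels.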




