Group Lasso merger for sparse prediction with high-dimensional categorical data

12/21/2021
by Szymon Nowakowski, et al.

Sparse prediction with categorical data is challenging even for a moderate number of variables, because roughly one parameter is needed to encode each category or level. The Group Lasso is a well-known, efficient algorithm for selecting continuous or categorical variables, but the estimates for the levels of a selected factor usually all differ, so the fitted model may not be sparse. To make the Group Lasso solution sparse, we propose merging levels of a selected factor whenever the difference between their corresponding estimates is less than a predetermined threshold. We prove that, under weak conditions, our algorithm, called GLAMER (Group LAsso MERger), recovers the true sparse linear or logistic model even in the high-dimensional scenario, that is, when the number of parameters is greater than the learning sample size. Selection consistency has been proven many times for different algorithms fitting sparse models with categorical variables, but to our knowledge our result is the first for the high-dimensional scenario. Numerical experiments show the satisfactory performance of GLAMER.
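
The merging idea described above can be illustrated with a minimal sketch. It is not the authors' GLAMER implementation: the function name `merge_levels`, the example estimates, and the rule of chaining sorted estimates whose neighbouring gap is below the threshold are all assumptions made for illustration, and the Group Lasso fit that would produce the estimates is taken as given.

```python
import numpy as np

def merge_levels(estimates, threshold):
    """Group the levels of one selected factor by closeness of their estimates.

    Hypothetical helper: levels are sorted by estimate, and two neighbouring
    levels in that order share a group when their gap is below `threshold`.
    Returns an integer group label for each level, in the original order.
    """
    estimates = np.asarray(estimates, dtype=float)
    order = np.argsort(estimates, kind="stable")
    labels = np.empty(len(estimates), dtype=int)
    current = 0
    for rank, idx in enumerate(order):
        # Start a new group when the gap to the previous sorted estimate
        # is at least the threshold.
        if rank > 0 and estimates[idx] - estimates[order[rank - 1]] >= threshold:
            current += 1
        labels[idx] = current
    return labels

# Assumed per-level estimates for a single factor with five levels.
beta_hat = [0.02, 1.10, 0.00, 1.05, 2.40]
labels = merge_levels(beta_hat, threshold=0.1)
print(labels.tolist())  # levels 0 and 2 share a group, as do levels 1 and 3
```

In this sketch five levels collapse to three groups, so a refit on the merged factor would need fewer parameters than the original coding.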

research
10/25/2022

Improving Group Lasso for high-dimensional categorical data

Sparse modelling or model selection with categorical data is challenging...
research
12/20/2022

Simultaneous Factors Selection and Fusion of Their Levels in Penalized Logistic Regression

Nowadays, several data analysis problems require for complexity reductio...
research
03/11/2016

Efficient Clustering of Correlated Variables and Variable Selection in High-Dimensional Linear Models

In this paper, we introduce Adaptive Cluster Lasso (ACL) method for varia...
research
05/17/2018

A Note on Coding and Standardization of Categorical Variables in (Sparse) Group Lasso Regression

Categorical regressor variables are usually handled by introducing a set...
research
05/16/2023

Sparse-group SLOPE: adaptive bi-level selection with FDR-control

In this manuscript, a new high-dimensional approach for simultaneous var...
research
12/31/2020

Inference post Selection of Group-sparse Regression Models

Conditional inference provides a rigorous approach to counter bias when ...
research
01/29/2015

High-Dimensional Longitudinal Classification with the Multinomial Fused Lasso

We study regularized estimation in high-dimensional longitudinal classif...