Improving Group Lasso for high-dimensional categorical data

10/25/2022
by Szymon Nowakowski, et al.

Sparse modelling or model selection with categorical data is challenging even for a moderate number of variables, because roughly one parameter is needed to encode each category, or level. The Group Lasso is a well-known, efficient algorithm for selecting continuous or categorical variables, but the estimates for the levels of a selected factor usually all differ. As a result, the fitted model may not be sparse, which makes it difficult to interpret. To obtain a sparse solution from the Group Lasso, we propose the following two-step procedure: first, we reduce the data dimensionality using the Group Lasso; then, to choose the final model, we apply an information criterion to a small family of models prepared by clustering the levels of individual factors. We investigate the selection correctness of the algorithm in a sparse high-dimensional scenario. We also test our method on synthetic as well as real datasets and show that it performs better than state-of-the-art algorithms with respect to prediction accuracy or model dimension.
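
The two-step procedure described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the clustering method or the information criterion, so the sketch assumes a Gaussian linear model, single-linkage merging of levels whose Group Lasso estimates are close, and BIC as the criterion. All function names (`group_lasso`, `cut_levels`, `bic`, `two_step`) and the penalty level `lam` are hypothetical choices made for illustration.

```python
import numpy as np

def one_hot(codes, n_levels):
    """Indicator (dummy) columns for integer-coded levels."""
    return np.eye(n_levels)[codes]

def group_lasso(X, y, groups, lam, n_iter=500):
    """Proximal gradient for 1/(2n)*||y - Xb||^2 + lam * sum_g sqrt(p_g)*||b_g||_2."""
    n, p = X.shape
    beta = np.zeros(p)
    step = n / np.linalg.norm(X, 2) ** 2      # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        z = beta - step * (X.T @ (X @ beta - y) / n)
        for g in groups:                      # block soft-thresholding, one block per factor
            norm = np.linalg.norm(z[g])
            thr = step * lam * np.sqrt(len(g))
            z[g] = 0.0 if norm <= thr else (1.0 - thr / norm) * z[g]
        beta = z
    return beta

def cut_levels(coefs, t):
    """1-D single linkage: sorted levels whose neighbouring gap is <= t share a label."""
    order = np.argsort(coefs)
    labels = np.empty(len(coefs), dtype=int)
    labels[order] = np.concatenate([[0], np.cumsum(np.diff(coefs[order]) > t)])
    return labels

def bic(y, Z):
    """Gaussian BIC of the least-squares refit on design Z (assumed criterion)."""
    n = len(y)
    resid = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return n * np.log(resid @ resid / n) + np.log(n) * np.linalg.matrix_rank(Z)

def two_step(factors, y, lam):
    """Step 1: screen factors with the Group Lasso. Step 2: merge levels, pick by BIC."""
    n_levels = [int(f.max()) + 1 for f in factors]
    X = np.hstack([one_hot(f, k) for f, k in zip(factors, n_levels)])
    ends = np.cumsum(n_levels)
    groups = [np.arange(e - k, e) for e, k in zip(ends, n_levels)]
    beta = group_lasso(X, y - y.mean(), groups, lam)
    selected = [j for j, g in enumerate(groups) if np.any(beta[g] != 0)]
    # Candidate merge heights: 0 (no merging) and every gap between sorted estimates.
    gaps = [np.diff(np.sort(beta[groups[j]])) for j in selected]
    thresholds = np.unique(np.concatenate([[0.0]] + gaps))
    best = None
    for t in thresholds:                      # a small nested family of models
        partition = {j: cut_levels(beta[groups[j]], t) for j in selected}
        blocks = [one_hot(partition[j][factors[j]], partition[j].max() + 1)
                  for j in selected]
        Z = np.hstack([np.ones((len(y), 1))] + blocks)
        score = bic(y, Z)
        if best is None or score < best[0]:
            best = (score, partition)
    return best[1]                            # {factor index: cluster label per level}

# Synthetic illustration: three 4-level factors, only factor 0 matters,
# and its levels {0, 1} and {2, 3} share the same effect.
rng = np.random.default_rng(0)
n = 2000
idx = np.arange(n)
factors = [idx % 4, (idx // 4) % 4, (idx // 16) % 4]
effects = np.array([0.0, 0.0, 2.0, 2.0])
y = effects[factors[0]] + 0.5 * rng.standard_normal(n)
partition = two_step(factors, y, lam=0.1)
print(partition)
```

The returned dictionary maps each factor retained by the Group Lasso to a cluster label per level; levels with the same label share one parameter in the final model. In this sketch a single merge height is applied to all selected factors, which keeps the candidate family small and nested; other ways of building the family are possible.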

research
12/21/2021

Group Lasso merger for sparse prediction with high-dimensional categorical data

Sparse prediction with categorical data is challenging even for a modera...
research
05/16/2023

Sparse-group SLOPE: adaptive bi-level selection with FDR-control

In this manuscript, a new high-dimensional approach for simultaneous var...
research
11/04/2019

Quantile regression: a penalization approach

Sparse group LASSO (SGL) is a penalization technique used in regression ...
research
10/18/2020

Prediction of daily maximum ozone levels using Lasso sparse modeling method

This paper applies modern statistical methods in the prediction of the n...
research
06/03/2022

Multivariate Sparse Group Lasso Joint Model for Radiogenomics Data

Radiogenomics is an emerging field in cancer research that combines medi...
research
03/25/2015

Stable Feature Selection from Brain sMRI

Neuroimage analysis usually involves learning thousands or even millions...
research
04/02/2022

Structural randomised selection

An important problem in the analysis of high-dimensional omics data is t...
