A Note on Coding and Standardization of Categorical Variables in (Sparse) Group Lasso Regression

05/17/2018
by Felicitas J. Detmer, et al.

Categorical regressor variables are usually handled by introducing a set of indicator variables and imposing a linear constraint to ensure identifiability in the presence of an intercept or, equivalently, by using one of various coding schemes. As proposed in Yuan and Lin [J. R. Statist. Soc. B, 68 (2006), 49-67], the group lasso is a natural and computationally convenient approach to variable selection in settings with categorical covariates. As pointed out by Simon and Tibshirani [Stat. Sin., 22 (2011), 983-1001], "standardization" by means of block-wise orthonormalization of the column submatrices, each corresponding to one group of variables, can substantially boost performance. In this note, we study standardization for the special case of categorical predictors in detail. The main result is that orthonormalization is not required: column-wise scaling of the design matrix, followed by re-scaling and centering of the coefficients, is shown to have exactly the same effect. Similar reductions can be achieved in the case of interactions. The extension to the so-called sparse group lasso, which additionally promotes within-group sparsity, is considered as well. The importance of proper standardization is illustrated via extensive simulations.
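
To make the contrast between block-wise orthonormalization and column-wise scaling concrete, here is a minimal NumPy sketch. It is not the authors' derivation or code. It assumes a single categorical variable in full indicator coding (one 0/1 column per level) and ignores the intercept, the identifiability constraint, and the coefficient re-centering that the note treats in detail. The helper names `indicator_matrix`, `orthonormalize_block`, and `scale_columns` are hypothetical. Because every observation belongs to exactly one level, the indicator columns are mutually orthogonal, so dividing each column by its norm already gives an orthonormal block.

```python
import numpy as np

def indicator_matrix(levels, n_levels):
    """Full indicator coding: one 0/1 column per level, no level dropped."""
    Z = np.zeros((len(levels), n_levels))
    Z[np.arange(len(levels)), levels] = 1.0
    return Z

def orthonormalize_block(X_g):
    """Block-wise orthonormalization via thin QR; Q has orthonormal columns."""
    Q, R = np.linalg.qr(X_g)
    return Q, R

def scale_columns(X_g):
    """Column-wise scaling: divide each column by its Euclidean norm."""
    norms = np.linalg.norm(X_g, axis=0)
    return X_g / norms, norms

# Illustrative data (an assumption): 30 observations, 3 levels.
rng = np.random.default_rng(0)
levels = rng.integers(0, 3, size=30)
levels[:3] = [0, 1, 2]  # make sure every level occurs at least once

Z = indicator_matrix(levels, 3)
Q, R = orthonormalize_block(Z)
Z_scaled, norms = scale_columns(Z)

# Gram matrices of both standardized blocks; each should be the identity.
gram_qr = Q.T @ Q
gram_scaled = Z_scaled.T @ Z_scaled
```

In this simplified setting the squared column norms equal the level counts, so the scaling factors are just the square roots of the per-level frequencies, and the QR factor `Q` matches `Z_scaled` up to column signs.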


research
06/13/2022

Sparse-group boosting – Unbiased group and variable selection

In the presence of grouped covariates, we propose a framework for boosti...
research
11/16/2016

ROS Regression: Integrating Regularization and Optimal Scaling Regression

In this paper we combine two important extensions of ordinary least squa...
research
12/21/2021

Group Lasso merger for sparse prediction with high-dimensional categorical data

Sparse prediction with categorical data is challenging even for a modera...
research
12/20/2022

Simultaneous Factors Selection and Fusion of Their Levels in Penalized Logistic Regression

Nowadays, several data analysis problems require for complexity reductio...
research
03/29/2019

Variable selection and estimation in multivariate functional linear regression via the lasso

In more and more applications, a quantity of interest may depend on seve...
research
11/04/2021

Nonparametric Regression and Classification with Functional, Categorical, and Mixed Covariates

We consider nonparametric prediction with multiple covariates, in partic...
research
04/07/2017

A Brief Introduction to the Temporal Group LASSO and its Potential Applications in Healthcare

The Temporal Group LASSO is an example of a multi-task, regularized regr...
