Efficient Clustering of Correlated Variables and Variable Selection in High-Dimensional Linear Models

by Niharika Gauraha, et al.

In this paper, we introduce the Adaptive Cluster Lasso (ACL) method for variable selection in high-dimensional sparse regression models with strongly correlated variables. To handle correlated variables, clustering or grouping the variables and then fitting a model is a widely accepted approach. When the dimension is very high, however, finding an appropriate group structure is as difficult as the original problem. ACL is a three-stage procedure. In the first stage, we use the Lasso (or its adaptive or thresholded version) for an initial selection, and then also include those variables that the Lasso did not select but that are strongly correlated with the selected ones. In the second stage, we cluster the variables in this reduced set of predictors. In the third stage, we perform sparse estimation, such as the Lasso on cluster representatives or the group Lasso based on the structure generated by the clustering procedure. We show that our procedure is consistent and efficient in finding the true underlying population group structure (under the irrepresentable and beta-min conditions). We also study the group selection consistency of our method, and we support the theory with simulated and pseudo-real dataset examples.
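The three stages described in the abstract can be sketched as follows. This is only an illustrative sketch and not the authors' implementation: the function names, the penalty `lam`, the correlation threshold `rho`, the clustering rule (connected components of the graph linking variables with correlation at least `rho`), and the choice of the cluster mean as representative are all assumptions made here for the example. For simplicity it treats only positive correlation as "strongly correlated" and assumes no column of `X` is constant.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator used by the Lasso coordinate update."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimise (1/2n)||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = np.array(y, dtype=float)
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]          # remove j's contribution
            rho_j = X[:, j] @ resid / n
            beta[j] = soft_threshold(rho_j, lam) / col_sq[j]
            resid -= X[:, j] * beta[j]          # add the updated contribution back
    return beta

def expand_by_correlation(corr, selected, rho):
    """Stage 1b: add unselected variables strongly correlated with selected ones."""
    keep = set(selected)
    for j in range(corr.shape[0]):
        if any(corr[j, k] >= rho for k in selected):
            keep.add(j)
    return sorted(keep)

def cluster_by_correlation(corr, idx, rho):
    """Stage 2 (assumed rule): connected components of the corr >= rho graph."""
    unassigned = list(idx)
    clusters = []
    while unassigned:
        stack = [unassigned.pop(0)]
        comp = []
        while stack:
            j = stack.pop()
            comp.append(j)
            linked = [k for k in unassigned if corr[j, k] >= rho]
            for k in linked:
                unassigned.remove(k)
            stack.extend(linked)
        clusters.append(sorted(comp))
    return clusters

def acl_sketch(X, y, lam, rho):
    """Return a list of (cluster, coefficient) pairs with nonzero coefficient."""
    X = (X - X.mean(axis=0)) / X.std(axis=0)    # standardise the predictors
    y = y - y.mean()
    corr = np.corrcoef(X, rowvar=False)
    # Stage 1: initial Lasso selection, then correlation-based expansion.
    selected = [int(j) for j in np.flatnonzero(lasso_cd(X, y, lam))]
    reduced = expand_by_correlation(corr, selected, rho)
    # Stage 2: cluster the reduced set of predictors.
    clusters = cluster_by_correlation(corr, reduced, rho)
    if not clusters:
        return []
    # Stage 3: Lasso on cluster representatives (here: the cluster mean).
    reps = np.column_stack([X[:, c].mean(axis=1) for c in clusters])
    coef = lasso_cd(reps, y, lam)
    return [(c, float(b)) for c, b in zip(clusters, coef) if b != 0]

# Toy data (hypothetical): two groups of near-duplicate predictors plus noise.
rng = np.random.default_rng(0)
n = 200
z1, z2 = rng.normal(size=n), rng.normal(size=n)
cols = [z1 + 0.05 * rng.normal(size=n) for _ in range(3)]
cols += [z2 + 0.05 * rng.normal(size=n) for _ in range(2)]
cols += [rng.normal(size=n) for _ in range(5)]
X_demo = np.column_stack(cols)
y_demo = 2 * z1 - 2 * z2 + 0.1 * rng.normal(size=n)
result = acl_sketch(X_demo, y_demo, lam=0.1, rho=0.9)
```

In the toy data, columns 0 to 2 and columns 3 to 4 are near-copies of two independent latent signals. Even if the initial Lasso picks only one member of each group, the correlation step in stage 1 pulls the remaining members back in before clustering. A group Lasso on the recovered clusters could replace the representative-based fit in stage 3, as the abstract notes.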

