A Two-Stage Variable Selection Approach for Correlated High Dimensional Predictors

03/24/2021
by Zhiyuan Li, et al.

When fitting statistical models, some predictors are often found to be correlated with each other and to function together. Many group variable selection methods have been developed to select groups of predictors that are closely related to a continuous or categorical response. These existing methods usually assume that the group structure is known in advance, for example, variables with similar practical meanings, or dummy variables created from categorical data. In practice, however, it is impractical to know the exact group structure, especially when the number of variables is large. As a result, the group variable selection results may be unreliable. To address this challenge, we propose a two-stage approach that combines a variable clustering stage and a group variable selection stage. The variable clustering stage uses information from the data to find a group structure, which improves the performance of existing group variable selection methods. For ultrahigh-dimensional data, where the number of predictors is much larger than the number of observations, we incorporate a variable screening method in the first stage and show the advantages of this approach. In this article, we compare and discuss the performance of four existing group variable selection methods under different simulation models, with and without the variable clustering stage. The two-stage method performs better, both in prediction accuracy and in the accuracy of selecting active predictors. A data set on athletes is also used to illustrate the advantages of the proposed method.
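
The abstract does not name the clustering algorithm, the screening rule, or the four group selection methods, so the following is only a minimal sketch of the two-stage idea under our own assumptions: SIS-style marginal-correlation screening for the p >> n case, correlation-threshold clustering (predictors joined by a chain of pairs with |corr| >= threshold form one group), and the group lasso, fitted by proximal gradient descent, as a stand-in for the group selection stage. The function names (screen_marginal, cluster_by_correlation, group_lasso, two_stage), the threshold 0.7, the screening size and the penalty lam are hypothetical choices for illustration, not the paper's.

```python
import numpy as np


def screen_marginal(X, y, keep):
    """Optional first step for p >> n (assumed SIS-style rule): keep the `keep`
    predictors with the largest absolute marginal correlation with y."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    ys = (y - y.mean()) / y.std()
    score = np.abs(Xs.T @ ys) / len(y)
    return np.sort(np.argsort(score)[::-1][:keep])


def cluster_by_correlation(X, threshold=0.7):
    """Stage 1 (assumed rule): predictors linked by a chain of pairs with
    |corr| >= threshold share a group (connected components)."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    labels = np.full(X.shape[1], -1)
    n_groups = 0
    for start in range(X.shape[1]):
        if labels[start] >= 0:
            continue
        labels[start] = n_groups
        stack = [start]
        while stack:  # depth-first search over the correlation graph
            j = stack.pop()
            for k in np.flatnonzero(corr[j] >= threshold):
                if labels[k] < 0:
                    labels[k] = n_groups
                    stack.append(k)
        n_groups += 1
    return labels


def group_lasso(X, y, groups, lam, n_iter=2000):
    """Stage 2 stand-in: minimize (1/2n)||y - X b||^2 + lam * sum_g sqrt(|g|) ||b_g||_2
    by proximal gradient descent (ISTA) with block soft-thresholding."""
    n, p = X.shape
    masks = [groups == g for g in np.unique(groups)]
    step = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    beta = np.zeros(p)
    for _ in range(n_iter):
        z = beta - step * (X.T @ (X @ beta - y) / n)
        for m in masks:
            norm = np.linalg.norm(z[m])
            shrink = step * lam * np.sqrt(m.sum())
            # Shrink the whole group toward zero; drop it if its norm is too small.
            z[m] = 0.0 if norm <= shrink else (1.0 - shrink / norm) * z[m]
        beta = z
    return beta


def two_stage(X, y, lam, threshold=0.7, keep=None):
    """Hypothetical wrapper: optional screening, clustering, then group lasso.
    Returns coefficients indexed by the original columns of X."""
    X = X - X.mean(axis=0)  # center so no intercept is needed
    y = y - y.mean()
    kept = np.arange(X.shape[1]) if keep is None else screen_marginal(X, y, keep)
    groups = cluster_by_correlation(X[:, kept], threshold)
    beta = np.zeros(X.shape[1])
    beta[kept] = group_lasso(X[:, kept], y, groups, lam)
    return beta


# Toy data: 30 blocks of 10 strongly correlated predictors; first 2 blocks active.
rng = np.random.default_rng(0)
n, p = 100, 300
Z = rng.standard_normal((n, 30))
X = np.repeat(Z, 10, axis=1) + 0.3 * rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:20] = 1.0
y = X @ beta_true + rng.standard_normal(n)

beta_hat = two_stage(X, y, lam=0.5, keep=50)
print("selected predictors:", np.flatnonzero(beta_hat))
```

Correlation-threshold clustering is used here only because it is short and needs no extra libraries; hierarchical clustering on a 1 - |corr| distance would be an equally plausible first stage, and group_lasso could be swapped for any other group selection method under comparison. In practice lam and the threshold would be tuned, for example by cross-validation, rather than fixed by hand as in this toy example.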

research
03/11/2016

Efficient Clustering of Correlated Variables and Variable Selection in High-Dimensional Linear Models

In this paper, we introduce Adaptive Cluster Lasso (ACL) method for varia...
research
06/09/2023

Variable screening using factor analysis for high-dimensional data with multicollinearity

Screening methods are useful tools for variable selection in regression ...
research
10/27/2020

Sequential knockoffs for continuous and categorical predictors: with application to a large Psoriatic Arthritis clinical trial pool

Knockoffs provide a general framework for controlling the false discover...
research
07/01/2023

A Transparent and Nonlinear Method for Variable Selection

Variable selection is a procedure to attain the truly important predicto...
research
07/20/2020

Variable Selection in Macroeconomic Forecasting with Many Predictors

In the data-rich environment, using many economic predictors to forecast...
research
06/17/2020

FREEtree: A Tree-based Approach for High Dimensional Longitudinal Data With Correlated Features

This paper proposes FREEtree, a tree-based method for high dimensional l...
research
10/14/2022

Variable Importance Based Interaction Modeling with an Application on Initial Spread of COVID-19 in China

Interaction selection for linear regression models with both continuous ...