Sparse Variable Selection on High Dimensional Heterogeneous Data with Tree Structured Responses

11/22/2017
by Xiang Liu, et al.

We consider the problem of sparse variable selection on high-dimensional heterogeneous data sets, which has attracted renewed interest recently due to the growth of biological and medical data sets with complex, non-i.i.d. structures and numerous response variables. The heterogeneity is likely to confound the association between explanatory variables and responses, resulting in many false discoveries when the Lasso or its variants are naïvely applied. Interest in developing effective confounder correction methods is therefore growing. However, applying recent confounder correction methods directly yields poor performance, because they ignore the complex interdependency among the many response variables. To improve on current variable selection methods, we introduce a model that uses the dependency information among multiple responses to select the active variables from heterogeneous data. Through extensive experiments on synthetic and real data sets, we show that our proposed model outperforms the existing methods.

research
12/26/2022

Bayesian indicator variable selection of multivariate response with heterogeneous sparsity for multi-trait fine mapping

Variable selection has played a critical role in contemporary stati...
research
11/06/2022

Iterative variable selection for high-dimensional data with binary outcomes

We propose an iterative variable selection scheme for high-dimensional d...
research
06/18/2018

Variable Importance Assessments and Backward Variable Selection for High-Dimensional Data

Variable selection in high-dimensional scenarios is of great interest ...
research
05/30/2009

A Minimum Description Length Approach to Multitask Feature Selection

Many regression problems involve not one but several response variables ...
research
05/07/2022

Determination of class-specific variables in nonparametric multiple-class classification

As technology advanced, collecting data via automatic collection devices...
research
09/30/2022

Experts in the Loop: Conditional Variable Selection for Accelerating Post-Silicon Analysis Based on Deep Learning

Post-silicon validation is one of the most critical processes in modern ...
research
08/23/2019

Fusing heterogeneous data sets

In systems biology, it is common to measure biochemical entities at diff...
