Regression-based heterogeneity analysis to identify overlapping subgroup structure in high-dimensional data

11/28/2022
by Ziye Luo, et al.

Heterogeneity is a hallmark of complex diseases. Regression-based heterogeneity analysis, which is directly concerned with outcome-feature relationships, has led to a deeper understanding of disease biology. Such an analysis identifies the underlying subgroup structure and estimates the subgroup-specific regression coefficients. However, most existing regression-based heterogeneity analyses can only address disjoint subgroups; that is, each sample is assigned to exactly one subgroup. In reality, some samples carry multiple labels: for example, many genes have several biological functions, and some cells of pure cell types transition into other types over time. This suggests that their outcome-feature relationships (regression coefficients) can be a mixture of the relationships in more than one subgroup, so disjoint subgrouping results can be unsatisfactory. To address this, we develop a novel approach to regression-based heterogeneity analysis that accommodates both possible overlaps between subgroups and high data dimensionality. A subgroup membership vector is introduced for each sample and combined with a loss function. To compensate for the lack of information arising from small sample sizes, an l_2 norm penalty is imposed on each membership vector to encourage similarity among its elements. A sparse penalty is also applied for regularized estimation and feature selection. Extensive simulations demonstrate the superiority of the proposed approach over direct competitors. Analyses of Cancer Cell Line Encyclopedia data and lung cancer data from The Cancer Genome Atlas show that the proposed approach can identify an overlapping subgroup structure with favorable prediction performance and stability.
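
To make the three ingredients named in the abstract concrete (a per-sample membership vector, an l_2 norm penalty on that vector, and a sparse penalty on the coefficients), the following is a minimal illustrative sketch. It is not the paper's actual criterion: the squared-error loss, the mixing of subgroup coefficients by the membership weights, the l_1 form of the sparse penalty, and all names (`overlapping_objective`, `lam_membership`, `lam_sparse`) are assumptions made for illustration only.

```python
import numpy as np

def overlapping_objective(X, y, B, Pi, lam_membership=0.1, lam_sparse=0.1):
    """Illustrative penalized objective (assumed form, not the paper's exact one).

    X  : (n, p) feature matrix
    y  : (n,)   outcomes
    B  : (p, K) subgroup-specific regression coefficients, one column per subgroup
    Pi : (n, K) membership vectors; each row is assumed nonnegative and sums to 1
    """
    # Assumption: each sample's coefficient vector is a membership-weighted
    # mixture of the K subgroup coefficient vectors.
    beta_per_sample = Pi @ B.T                      # shape (n, p)
    fitted = np.sum(X * beta_per_sample, axis=1)    # x_i^T beta_i for each sample
    loss = np.mean((y - fitted) ** 2)               # assumed squared-error loss

    # l_2 norm penalty on each membership vector. For a vector on the simplex,
    # the l_2 norm is smallest when all elements are equal, so this term
    # pulls the elements of each membership vector toward one another.
    membership_penalty = lam_membership * np.sum(np.linalg.norm(Pi, axis=1))

    # Assumed l_1 (lasso-type) sparse penalty for feature selection.
    sparse_penalty = lam_sparse * np.sum(np.abs(B))

    return loss + membership_penalty + sparse_penalty


# Toy usage: sample 0 belongs to subgroup 0 only, sample 1 is split evenly.
X = np.array([[1.0, 0.0], [0.0, 1.0]])
B = np.array([[1.0, 0.0], [0.0, 2.0]])
Pi = np.array([[1.0, 0.0], [0.5, 0.5]])
y = np.array([1.0, 1.0])
value = overlapping_objective(X, y, B, Pi, lam_membership=1.0, lam_sparse=1.0)
```

In this toy setup the second sample's coefficients are an even blend of the two subgroups' coefficients, which is the kind of overlap a disjoint assignment cannot represent. Note that its evenly split membership vector has a smaller l_2 norm than the first sample's one-hot vector.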

research
11/28/2022

Robust structured heterogeneity analysis approach for high-dimensional data

Revealing relationships between genes and disease phenotypes is a critic...
research
08/07/2023

Regulation-incorporated Gene Expression Network-based Heterogeneity Analysis

Gene expression-based heterogeneity analysis has been extensively conduc...
research
11/30/2022

Biomarker-guided heterogeneity analysis of genetic regulations via multivariate sparse fusion

Heterogeneity is a hallmark of many complex diseases. There are multiple...
research
03/19/2020

Weighted Cox regression for the prediction of heterogeneous patient subgroups

An important task in clinical medicine is the construction of risk predi...
research
07/02/2023

Conditionally Invariant Representation Learning for Disentangling Cellular Heterogeneity

This paper presents a novel approach that leverages domain variability t...
research
07/21/2020

Outcome-Guided Disease Subtyping for High-Dimensional Omics Data

High-throughput microarray and sequencing technology have been used to i...
research
09/16/2017

Multivariate Gaussian Network Structure Learning

We consider a graphical model where a multivariate normal vector is asso...
