Robust structured heterogeneity analysis approach for high-dimensional data

11/28/2022
by Yifan Sun, et al.

Revealing relationships between genes and disease phenotypes is a critical problem in biomedical studies, and one made harder by disease heterogeneity. Patients with what is perceived as the same disease may form multiple subgroups, each with a distinct set of important genes. It is therefore imperative to discover the latent subgroups and identify the subgroup-specific important genes. Several heterogeneity analysis methods have been proposed in the recent literature. Despite considerable success, most existing methods remain limited because they cannot accommodate data contamination and they ignore the interconnections among genes. To address these shortcomings, we develop a robust structured heterogeneity analysis approach that identifies subgroups, selects important genes, and estimates their effects on the phenotype of interest. Possible data contamination is accommodated by employing the Huber loss function. A sparse overlapping group lasso penalty is imposed for regularized estimation and gene identification, taking into account the possibly overlapping cluster structure of genes. The approach uses an iterative strategy similar in spirit to K-means clustering. Simulations demonstrate that the proposed approach outperforms alternatives in revealing the heterogeneity and selecting important genes for each subgroup. Analysis of Cancer Cell Line Encyclopedia data leads to biologically meaningful findings with improved prediction and grouping stability.
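
As a rough illustration of the ingredients named in the abstract, the following minimal sketch combines a Huber loss with a K-means-style alternation between refitting subgroup-specific coefficients and reassigning samples. It is not the authors' algorithm: a plain lasso (L1) penalty stands in for the sparse overlapping group lasso, and every function name, parameter, and default here (`huber_loss`, `fit_huber_l1`, `robust_subgroup_regression`, `lam`, `delta=1.345`) is an assumption made for illustration.

```python
import numpy as np


def huber_loss(r, delta=1.345):
    """Huber loss: quadratic for |r| <= delta, linear beyond it."""
    r = np.asarray(r, dtype=float)
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r ** 2, delta * (a - 0.5 * delta))


def fit_huber_l1(X, y, lam=0.05, delta=1.345, n_iter=200):
    """Proximal gradient descent on mean Huber loss + lam * ||beta||_1.

    Assumption: the L1 penalty is a simplified stand-in for the sparse
    overlapping group lasso penalty described in the abstract.
    """
    n, p = X.shape
    beta = np.zeros(p)
    # Step size 1/L, where L bounds the curvature of the smooth part.
    lipschitz = np.linalg.norm(X, 2) ** 2 / n
    step = 1.0 / max(lipschitz, 1e-12)
    for _ in range(n_iter):
        r = y - X @ beta
        # The derivative of the Huber loss is the residual clipped to [-delta, delta].
        grad = -X.T @ np.clip(r, -delta, delta) / n
        z = beta - step * grad
        # Soft-thresholding: proximal operator of the L1 penalty.
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return beta


def robust_subgroup_regression(X, y, K=2, lam=0.05, delta=1.345,
                               n_rounds=20, seed=0):
    """Alternate between per-subgroup fits and sample reassignment."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(K, size=len(y))  # random initial subgroups
    betas = np.zeros((K, X.shape[1]))
    for _ in range(n_rounds):
        # Update step: refit each non-empty subgroup on its current members.
        for k in range(K):
            members = labels == k
            if members.any():
                betas[k] = fit_huber_l1(X[members], y[members], lam, delta)
        # Assignment step: move each sample to the subgroup whose model
        # gives it the smallest Huber loss.
        losses = huber_loss(y[:, None] - X @ betas.T, delta)
        new_labels = losses.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, betas
```

Like K-means, this kind of alternation can settle in a local optimum that depends on the random initial assignment, so in practice one would typically try several seeds and keep the solution with the smallest total loss.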

research
11/28/2022

Regression-based heterogeneity analysis to identify overlapping subgroup structure in high-dimensional data

Heterogeneity is a hallmark of complex diseases. Regression-based hetero...
research
11/30/2022

Biomarker-guided heterogeneity analysis of genetic regulations via multivariate sparse fusion

Heterogeneity is a hallmark of many complex diseases. There are multiple...
research
08/07/2023

Regulation-incorporated Gene Expression Network-based Heterogeneity Analysis

Gene expression-based heterogeneity analysis has been extensively conduc...
research
03/05/2020

Robust Identification of Gene-Environment Interactions under High-Dimensional Accelerated Failure Time Models

For complex diseases, beyond the main effects of genetic (G) and environ...
research
12/18/2019

Multidimensional molecular changes-environment interaction analysis for disease outcomes

For the outcomes and phenotypes of complex diseases, multiple types of m...
research
11/12/2021

Accounting for data heterogeneity in integrative analysis and prediction methods: An application to Chronic Obstructive Pulmonary Disease

Epidemiologic and genetic studies in chronic obstructive pulmonary disea...
research
10/25/2021

RZiMM-scRNA: A regularized zero-inflated mixture model framework for single-cell RNA-seq data

Applications of single-cell RNA sequencing in various biomedical researc...