Supervised clustering of high dimensional data using regularized mixture modeling

by Wennan Chang, et al.

Identifying relationships between molecular variations and their clinical presentations is challenging because a disease can have heterogeneous causes. It is imperative to unveil the relationship between high-dimensional molecular manifestations and clinical presentations while taking into account the possible heterogeneity of the study subjects. We propose a novel supervised clustering algorithm based on a penalized mixture regression model, called CSMR, to address the challenges of studying heterogeneous relationships between high-dimensional molecular features and a phenotype. The algorithm is adapted from the classification expectation maximization algorithm and offers a novel supervised solution to the clustering problem, with substantial improvements in both computational efficiency and biological interpretability. Experimental evaluation on simulated benchmark datasets demonstrated that CSMR can accurately identify the subspaces in which subsets of features are explanatory of the response variables, and that it outperformed the baseline methods. Application of CSMR to a drug sensitivity dataset again demonstrated its superior performance over the other methods: CSMR is powerful in recapitulating the distinct subgroups hidden in the pool of cell lines with regard to their coping mechanisms to different drugs. CSMR represents a big data analysis tool with the potential to resolve the complexity of translating the clinical manifestations of a disease to the real causes underpinning it. We believe that it will bring new understanding to the molecular basis of disease and could be of special relevance in the growing field of personalized medicine.
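The abstract does not spell out the details of CSMR, so the following is only a rough sketch of the general idea it names: a classification expectation maximization (CEM) loop in which each cluster gets its own penalized regression. It is not the authors' implementation. The function names `lasso_cd` and `cem_sparse_mixture`, the Gaussian noise model, the lasso penalty with a fixed `lam`, the absence of an intercept, and the random initialization are all assumptions made for illustration.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate-descent lasso (illustrative helper, not from the paper).

    Minimizes (1/2n) * ||y - X b||^2 + lam * ||b||_1, without an intercept.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()  # equals y - X @ beta throughout
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # Correlation of feature j with the partial residual
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            # Soft-thresholding gives the coordinate-wise minimizer
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta

def cem_sparse_mixture(X, y, k=2, lam=0.05, n_iter=30, seed=0):
    """Hard-assignment (CEM-style) fit of a k-component sparse regression mixture."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    labels = rng.integers(k, size=n)  # assumed random initialization
    betas = np.zeros((k, p))
    sigmas = np.ones(k)
    pis = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # M-step: refit a penalized regression inside each current cluster
        for c in range(k):
            idx = labels == c
            pis[c] = idx.mean()
            if idx.sum() < 2:
                continue  # keep the previous fit for a (nearly) empty cluster
            betas[c] = lasso_cd(X[idx], y[idx], lam)
            r = y[idx] - X[idx] @ betas[c]
            sigmas[c] = max(r.std(), 1e-3)
        # E-step and C-step: Gaussian log-posterior, then hard assignment
        resid = y[:, None] - X @ betas.T  # shape (n, k)
        logp = np.log(pis + 1e-12) - np.log(sigmas) - 0.5 * (resid / sigmas) ** 2
        new_labels = logp.argmax(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, betas

# Toy data: two hidden subgroups, each driven by a different sparse feature set
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 10))
true_betas = np.zeros((2, 10))
true_betas[0, [0, 1]] = [3.0, -2.0]
true_betas[1, [5, 6]] = [-3.0, 2.0]
z = rng.integers(2, size=200)
y = (X * true_betas[z]).sum(axis=1) + 0.1 * rng.normal(size=200)

labels, betas = cem_sparse_mixture(X, y, k=2, lam=0.05)
print("cluster sizes:", np.bincount(labels, minlength=2))
print("selected features per cluster:", [np.flatnonzero(b).tolist() for b in betas])
```

Because the assignment step is hard rather than soft, each iteration only needs one penalized fit per cluster, which is one plausible reading of the efficiency claim above. Like any CEM-style procedure, the sketch can stop at a local optimum, so in practice several random restarts would be a sensible precaution.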







Outcome-guided Sparse K-means for Disease Subtype Discovery via Integrating Phenotypic Data with High-dimensional Transcriptomic Data

The discovery of disease subtypes is an essential step for developing pr...

Weighted Cox regression for the prediction of heterogeneous patient subgroups

An important task in clinical medicine is the construction of risk predi...

Parallel subgroup analysis of high-dimensional data via M-regression

It becomes an interesting problem to identify subgroup structures in dat...

Dose-response modeling in high-throughput cancer drug screenings: A case study with recommendations for practitioners

Personalized cancer treatments based on the molecular profile of a patie...

Accurate Molecular-Orbital-Based Machine Learning Energies via Unsupervised Clustering of Chemical Space

We introduce an unsupervised clustering algorithm to improve training ef...

Outcome-Guided Disease Subtyping for High-Dimensional Omics Data

High-throughput microarray and sequencing technology have been used to i...

A New Algorithm using Component-wise Adaptive Trimming For Robust Mixture Regression

Mixture regression provides a statistical model for teasing out latent h...