A sparse negative binomial mixture model for clustering RNA-seq count data

12/05/2019
by Tanbin Rahman, et al.

Clustering with variable selection is a challenging but critical task for modern small-n-large-p data. Existing methods based on Gaussian mixture models or sparse K-means provide solutions for continuous data. Because RNA-seq technology is now prevalent and clustering methods that model count data directly are lacking, the current practice is to normalize count expression data into continuous measures and apply existing models that rely on a Gaussian assumption. In this paper, we develop a negative binomial mixture model with gene regularization to cluster samples (small n) with high-dimensional gene features (large p). An EM algorithm and the Bayesian information criterion are used for inference and for determining tuning parameters. The method is compared with the sparse Gaussian mixture model and sparse K-means using extensive simulations and two real transcriptomic applications in breast cancer and rat brain studies. The results show superior performance of the proposed count data model in clustering accuracy, feature selection, and biological interpretation by pathway enrichment analysis.
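
To make the general idea concrete, here is a minimal sketch of EM for a plain negative binomial mixture on a toy count matrix. It is not the paper's method: it leaves out the gene regularization, the BIC-based tuning, and any library-size normalization, and it assumes a single known size (inverse dispersion) parameter `r` shared by all genes and clusters. The function names, the `init` argument, and the toy data are all hypothetical and chosen only for illustration.

```python
import math
import random


def nb_log_kernel(y, mu, r):
    """Cluster-dependent part of the NB log-pmf with mean mu and size r.

    The lgamma terms depend only on y and r, so they cancel when comparing
    clusters that share the same r and are omitted here.
    """
    return r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu))


def em_nb_mixture(counts, k, r=5.0, n_iter=30, init=None, seed=0):
    """EM for a k-component NB mixture; counts is a list of samples x genes."""
    n, p = len(counts), len(counts[0])
    if init is None:
        # Assumption: start from k randomly chosen samples.
        init = random.Random(seed).sample(range(n), k)
    # Adding 0.5 keeps every starting mean strictly positive.
    means = [[y + 0.5 for y in counts[i]] for i in init]
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: posterior cluster probabilities, computed with log-sum-exp.
        resp = []
        for row in counts:
            logp = []
            for c in range(k):
                ll = sum(nb_log_kernel(y, means[c][g], r) for g, y in enumerate(row))
                logp.append(math.log(weights[c]) + ll)
            top = max(logp)
            unnorm = [math.exp(v - top) for v in logp]
            total = sum(unnorm)
            resp.append([v / total for v in unnorm])
        # M-step: with r fixed, the NB mean MLE is the weighted sample mean.
        for c in range(k):
            mass = sum(resp[i][c] for i in range(n))
            weights[c] = max(mass / n, 1e-12)
            for g in range(p):
                num = sum(resp[i][c] * counts[i][g] for i in range(n))
                means[c][g] = max(num / max(mass, 1e-12), 1e-8)
    labels = [max(range(k), key=lambda c: resp[i][c]) for i in range(n)]
    return labels, means, weights


# Toy data: 6 samples x 4 genes, with two visibly different expression patterns.
counts = [
    [50, 48, 5, 3],
    [55, 60, 4, 6],
    [47, 52, 7, 2],
    [5, 4, 49, 58],
    [6, 3, 61, 50],
    [2, 7, 53, 47],
]
labels, means, weights = em_nb_mixture(counts, k=2, init=(0, 3))
print(labels)
```

Working on the raw counts avoids the normalize-then-Gaussian step described above. A sparse variant would add a penalty on the cluster-specific gene means in the M-step so that uninformative genes share a common mean across clusters.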


research
04/13/2018

A Latent Gaussian Mixture Model for Clustering Longitudinal Data

Finite mixture models have become a popular tool for clustering. Amongst...
research
07/11/2017

Efficient mixture model for clustering of sparse high dimensional binary data

In this paper we propose a mixture model, SparseMix, for clustering of s...
research
02/23/2019

Bayesian Modeling of Microbiome Data for Differential Abundance Analysis

The advances of next-generation sequencing technology have accelerated s...
research
04/03/2018

A Mixture Model to Detect Edges in Sparse Co-expression Graphs

In the early days of microarray data, the medical and statistical commun...
research
08/16/2019

Regression on imperfect class labels derived by unsupervised clustering

Outcome regressed on class labels identified by unsupervised clustering ...
research
03/04/2022

False clustering rate control in mixture models

The clustering task consists in delivering labels to the members of a sa...
research
12/17/2020

Smoothed Gaussian Mixture Models for Video Classification and Recommendation

Cluster-and-aggregate techniques such as Vector of Locally Aggregated De...
