A Multivariate Poisson-Log Normal Mixture Model for Clustering Transcriptome Sequencing Data

11/30/2017
by Anjali Silva, et al.

High-dimensional data that are discrete and skewed are commonly encountered in high-throughput sequencing studies. Analyzing the network itself, or the interplay between genes, in this type of data continues to present many challenges. As data visualization techniques become cumbersome in higher dimensions and unconvincing when there is no clear separation between homogeneous subgroups within the data, cluster analysis provides an intuitive alternative. The aim of applying mixture model-based clustering in this context is to discover groups of co-expressed genes, which can shed light on the biological functions and pathways of gene products. A mixture of multivariate Poisson-Log Normal (MPLN) distributions is proposed for clustering high-throughput transcriptome sequencing data. The MPLN model can accommodate a wide range of correlation and overdispersion structures, which makes it well suited to modeling multivariate count data from RNA sequencing studies. Parameter estimation is carried out via a Markov chain Monte Carlo expectation-maximization (MCMC-EM) algorithm, and information criteria are used for model selection.
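
To make the two-layer structure of the MPLN mixture concrete, the following is a minimal simulation sketch in Python with NumPy. It assumes the standard Poisson-log normal hierarchy (a latent multivariate normal vector on the log scale, with conditionally independent Poisson counts given that vector). The function name `sample_mpln_mixture` and its parameters are hypothetical, chosen for illustration only. This is not the authors' implementation, and it covers only data generation, not the MCMC-EM estimation or the model selection step.

```python
import numpy as np


def sample_mpln_mixture(n, weights, means, covs, rng):
    """Draw n count vectors from a mixture of multivariate Poisson-log normal components.

    weights: mixing proportions (must sum to 1), one per component.
    means, covs: per-component mean vector and covariance matrix of the
                 latent normal layer (log scale). All names are illustrative.
    """
    weights = np.asarray(weights, dtype=float)
    n_components = len(weights)
    d = len(means[0])

    # Assign every observation to a component according to the mixing proportions.
    labels = rng.choice(n_components, size=n, p=weights)
    counts = np.empty((n, d), dtype=np.int64)

    for g in range(n_components):
        idx = np.flatnonzero(labels == g)
        if idx.size == 0:
            continue
        # Latent layer: theta ~ N(mu_g, Sigma_g) on the log scale.
        theta = rng.multivariate_normal(means[g], covs[g], size=idx.size)
        # Observed layer: Y_j | theta_j ~ Poisson(exp(theta_j)), independently over j.
        counts[idx] = rng.poisson(np.exp(theta))

    return counts, labels
```

Because the covariance matrix of the latent layer is unrestricted, off-diagonal entries induce positive or negative correlation between counts, and the extra normal variation makes each marginal variance exceed its mean (overdispersion), which is the flexibility the abstract refers to.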

research
07/22/2018

Finite mixtures of matrix-variate Poisson-log normal distributions for three-way count data

Three-way data structures, characterized by three entities, the units, t...
research
10/24/2017

A Bayesian Method for Joint Clustering of Vectorial Data and Network Data

We present a new model-based integrative method for clustering objects g...
research
10/13/2014

Mining Block I/O Traces for Cache Preloading with Sparse Temporal Non-parametric Mixture of Multivariate Poisson

Existing caching strategies, in the storage domain, though well suited t...
research
09/02/2019

Clustering of count data through a mixture of multinomial PCA

Count data is becoming more and more ubiquitous in a wide range of appli...
research
08/23/2022

Multinomial Cluster-Weighted Models for High-Dimensional Data

Modeling of high-dimensional data is very important to categorize differ...
research
08/15/2017

Sparse Inverse Covariance Estimation for High-throughput microRNA Sequencing Data in the Poisson Log-Normal Graphical Model

We introduce the Poisson Log-Normal Graphical Model for count data, and ...
research
12/07/2022

Network Analysis of Count Data from Mixed Populations

In applications such as gene regulatory network analysis based on single...
