Dirichlet-tree multinomial mixtures for clustering microbiome compositions

08/02/2020
by Jialiang Mao, et al.

A common routine in microbiome research is to identify reproducible patterns in the population through unsupervised clustering of samples. To this end, we introduce Dirichlet-tree multinomial mixtures (DTMM) as a generative model for the amplicon sequencing data in microbiome studies. DTMM models the microbiome population with Dirichlet process mixtures to learn a clustering structure. For the mixing kernels, DTMM directly uses a phylogenetic tree to perform a tree-based decomposition of the Dirichlet distribution. Through this decomposition, DTMM offers a flexible covariance structure that captures large within-cluster variation, while borrowing information across samples in different clusters to accurately learn the features that the clusters share. We perform extensive simulation studies to evaluate the performance of DTMM and compare it with several model-based and distance-based clustering methods in the microbiome context. Finally, we analyze a specific version of the fecal data from the American Gut Project to identify underlying clusters in the microbiota of IBD and diabetes patients. Our analysis shows that (i) clusters in the human gut microbiome are generally determined jointly by a large number of OTUs in a sophisticated manner; (ii) OTUs from the genera Bacteroides, Prevotella and Ruminococcus are typically among the important OTUs for identifying clusters; and (iii) the number of clusters and the OTUs that characterize each cluster can differ across patient groups.
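The tree-based decomposition mentioned in the abstract can be illustrated with a minimal sketch. It assumes a toy three-OTU tree and made-up Dirichlet parameters (`TREE`, `ALPHA`, and the function names are all hypothetical, not taken from the paper), and it covers only a single Dirichlet-tree multinomial kernel, not the Dirichlet process mixture that DTMM places on top of it. Each internal node draws its own branch probabilities from a Dirichlet distribution, and the reads arriving at that node are then split among its children.

```python
import random

# Hypothetical toy phylogeny: internal node -> list of children.
# Leaves (OTUs) are the names that never appear as keys.
TREE = {
    "root": ["A", "otu3"],
    "A": ["otu1", "otu2"],
}

# Hypothetical Dirichlet parameters: one positive value per child branch.
ALPHA = {
    "root": [4.0, 2.0],
    "A": [1.0, 1.0],
}


def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_dtm_counts(tree, alpha, n_reads, rng, root="root"):
    """Sample OTU counts for one sample from a Dirichlet-tree multinomial."""
    # Every internal node gets its own branch probabilities.
    branch_probs = {node: sample_dirichlet(alpha[node], rng) for node in tree}

    counts = {}
    stack = [(root, n_reads)]
    while stack:
        node, n = stack.pop()
        if node not in tree:
            # Leaf: all reads that reached this node belong to this OTU.
            counts[node] = n
            continue
        # Multinomial split of the n reads among the children of this node.
        children = tree[node]
        picks = rng.choices(range(len(children)), weights=branch_probs[node], k=n)
        for i, child in enumerate(children):
            stack.append((child, picks.count(i)))
    return counts


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_dtm_counts(TREE, ALPHA, 1000, rng))
```

Because each internal node carries its own parameter vector, the total concentration can differ from node to node, which is one way to see why this decomposition allows a more flexible covariance structure than a single Dirichlet over all OTUs.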


