Parallel Clustering of Single Cell Transcriptomic Data with Split-Merge Sampling on Dirichlet Process Mixtures

12/25/2018
by Tiehang Duan, et al.

Motivation: With the development of droplet-based systems, massive single-cell transcriptome datasets have become available. They enable analysis of cellular and molecular processes at single-cell resolution, which is instrumental to understanding many biological processes. State-of-the-art clustering methods have been applied to these data, but they face three challenges: (1) clustering quality still needs to be improved; (2) most models require prior knowledge of the number of clusters, which is not always available; (3) there is a demand for faster computation.

Results: We propose to tackle these challenges with Parallel Split Merge Sampling on a Dirichlet Process Mixture Model (the Para-DPMM model). Unlike classic DPMM methods, which sample each data point individually, the split-merge mechanism samples at the cluster level, which significantly improves convergence and the optimality of the result. The model is highly parallelized and can use the computing power of high performance computing (HPC) clusters, enabling clustering of very large datasets. Experimental results show that the model outperforms widely used current models in both clustering quality and computational speed.

Availability: Source code is publicly available at https://github.com/tiehangd/Para_DPMM/tree/master/Para_DPMM_package
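
To make the idea of sampling at the cluster level concrete, here is a minimal sketch of the Metropolis-Hastings acceptance ratio for one split or merge proposal in a Dirichlet process mixture. It is not taken from the Para-DPMM package. It assumes a simple random split proposal (each remaining cell of the cluster is assigned to either side with probability 1/2), a multinomial likelihood over gene counts with a symmetric Dirichlet prior, and hypothetical names (`log_marginal`, `log_split_acceptance`, `log_merge_acceptance`, `alpha`, `beta`) chosen for illustration only.

```python
import math


def log_marginal(cells, beta):
    """Log Dirichlet-multinomial marginal likelihood of one cluster.

    `cells` is a list of per-cell gene count vectors. Per-cell multinomial
    coefficients are omitted because they cancel in split/merge ratios.
    Assumption: symmetric Dirichlet(beta) prior on gene proportions.
    """
    n_genes = len(cells[0])
    totals = [sum(cell[g] for cell in cells) for g in range(n_genes)]
    out = math.lgamma(n_genes * beta) - math.lgamma(n_genes * beta + sum(totals))
    for t in totals:
        out += math.lgamma(beta + t) - math.lgamma(beta)
    return out


def log_split_acceptance(part_a, part_b, alpha, beta):
    """Log acceptance ratio for splitting one cluster into part_a and part_b.

    Assumption: simple random split, where two anchor cells are fixed and
    the other n - 2 cells each pick a side with probability 1/2.
    """
    n_a, n_b = len(part_a), len(part_b)
    n = n_a + n_b
    # Ratio of DP partition priors: alpha * (n_a - 1)! (n_b - 1)! / (n - 1)!
    log_prior = math.log(alpha) + math.lgamma(n_a) + math.lgamma(n_b) - math.lgamma(n)
    # Ratio of marginal likelihoods: two separate clusters vs. one merged cluster
    log_lik = (log_marginal(part_a, beta) + log_marginal(part_b, beta)
               - log_marginal(part_a + part_b, beta))
    # Proposal correction: the split had probability (1/2)^(n - 2), the merge 1
    log_proposal = (n - 2) * math.log(2.0)
    return log_prior + log_lik + log_proposal


def log_merge_acceptance(part_a, part_b, alpha, beta):
    """Log acceptance ratio for merging part_a and part_b (reverse move)."""
    return -log_split_acceptance(part_a, part_b, alpha, beta)


if __name__ == "__main__":
    distinct_a = [[10, 0], [9, 1]]
    distinct_b = [[0, 10], [1, 9]]
    log_a = log_split_acceptance(distinct_a, distinct_b, alpha=1.0, beta=1.0)
    # A move is accepted with probability min(1, exp(log ratio))
    print("accept split with probability", min(1.0, math.exp(min(log_a, 0.0))))
```

Because a single accepted move relabels a whole group of cells at once, a sampler built on such moves can escape configurations that point-by-point reassignment leaves only slowly. How Para-DPMM proposes and parallelizes these moves is described in the full text, not here.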

