DIMM-SC: A Dirichlet mixture model for clustering droplet-based single cell transcriptomic data

04/06/2017
by Zhe Sun, et al.

Motivation: Single cell transcriptome sequencing (scRNA-Seq) has become a revolutionary tool for studying cellular and molecular processes at single cell resolution. Among existing technologies, the recently developed droplet-based platform enables efficient parallel processing of thousands of single cells, with direct counting of transcript copies using Unique Molecular Identifiers (UMIs). Despite these technological advances, statistical methods and computational tools for analyzing droplet-based scRNA-Seq data are still lacking. In particular, model-based approaches for clustering large-scale single cell transcriptomic data remain under-explored.

Methods: We developed DIMM-SC, a Dirichlet Mixture Model for clustering droplet-based Single Cell transcriptomic data. This approach explicitly models UMI count data from scRNA-Seq experiments and characterizes variation across cell clusters via a Dirichlet mixture prior. An expectation-maximization algorithm is used for parameter inference.

Results: We performed comprehensive simulations to evaluate DIMM-SC and compared it with existing clustering methods such as K-means, CellTree and Seurat. In addition, to benchmark and validate DIMM-SC, we analyzed public scRNA-Seq datasets with known cluster labels and in-house scRNA-Seq datasets from a study of systemic sclerosis for which prior biological knowledge is available. Both the simulation studies and the real data applications demonstrated that, overall, DIMM-SC achieves substantially improved clustering accuracy and much lower clustering variability than the other clustering methods. More importantly, as a model-based approach, DIMM-SC can quantify the clustering uncertainty for each single cell, facilitating rigorous statistical inference and biological interpretation; such uncertainty estimates are typically unavailable from existing clustering methods.
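
To make the idea of per-cell clustering uncertainty under Methods and Results more concrete, the following is a minimal sketch of a single E-step for a mixture of Dirichlet-multinomial distributions. It is an illustration under stated assumptions, not the authors' implementation: it assumes each cluster is described by a concentration vector over genes and a mixing weight, and that a cell's UMI counts follow the standard Dirichlet-multinomial marginal. The function names, the toy parameters, and the three-gene example are hypothetical, and the M-step that would update the parameters is not shown.

```python
import math


def dirichlet_multinomial_loglik(counts, alpha):
    """Log-likelihood (up to a constant) of one cell's UMI counts under a
    Dirichlet-multinomial with concentration vector `alpha`.

    The multinomial coefficient is omitted: it depends only on the counts,
    so it is identical for every cluster and cancels in the posterior.
    """
    total = sum(counts)
    alpha_sum = sum(alpha)
    ll = math.lgamma(alpha_sum) - math.lgamma(alpha_sum + total)
    for x, a in zip(counts, alpha):
        ll += math.lgamma(a + x) - math.lgamma(a)
    return ll


def posterior_cluster_probs(counts, alphas, weights):
    """E-step for one cell: posterior probability of each cluster.

    `alphas` holds one concentration vector per cluster and `weights`
    holds the mixing proportions (both hypothetical inputs here).
    """
    log_post = [
        math.log(w) + dirichlet_multinomial_loglik(counts, a)
        for a, w in zip(alphas, weights)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_post)
    unnorm = [math.exp(lp - m) for lp in log_post]
    z = sum(unnorm)
    return [u / z for u in unnorm]


# Toy example: two clusters over three genes (made-up parameters).
alphas = [[5.0, 1.0, 1.0], [1.0, 1.0, 5.0]]
weights = [0.5, 0.5]
cell = [9, 1, 0]  # UMI counts for one cell
probs = posterior_cluster_probs(cell, alphas, weights)
```

In this sketch, `probs` is the soft assignment of the cell to each cluster; a value close to 1 for one cluster indicates a confident assignment, while values near the mixing weights indicate an ambiguous cell. A hard clustering would assign each cell to the cluster with the largest posterior probability.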

research
12/25/2018

Parallel Clustering of Single Cell Transcriptomic Data with Split-Merge Sampling on Dirichlet Process Mixtures

Motivation: With the development of droplet based systems, massive singl...
research
06/14/2023

Graph-Aligned Random Partition Model (GARP)

Bayesian nonparametric mixtures and random partition models are powerful...
research
08/03/2023

Is your data alignable? Principled and interpretable alignability testing and integration of single-cell data

Single-cell data integration can provide a comprehensive molecular view ...
research
10/25/2021

RZiMM-scRNA: A regularized zero-inflated mixture model framework for single-cell RNA-seq data

Applications of single-cell RNA sequencing in various biomedical researc...
research
05/19/2022

Confident Clustering via PCA Compression Ratio and Its Application to Single-cell RNA-seq Analysis

Unsupervised clustering algorithms for vectors has been widely used in t...
research
09/02/2019

Clustering of count data through a mixture of multinomial PCA

Count data is becoming more and more ubiquitous in a wide range of appli...
research
07/22/2017

Sketched Subspace Clustering

The immense amount of daily generated and communicated data presents uni...
