Confident Clustering via PCA Compression Ratio and Its Application to Single-cell RNA-seq Analysis

05/19/2022
by Yingcong Li, et al.

Unsupervised clustering algorithms for vectors have been widely used in machine learning. Many applications, including the biological data studied in this paper, contain boundary datapoints that combine the properties of two underlying clusters and can degrade the performance of traditional clustering algorithms. We develop a confident clustering method that aims to diminish the influence of these datapoints and improve the clustering results. Concretely, for a list of datapoints, we produce two clustering results. The first round classifies only pure vectors with high confidence. Building on it, the second round classifies more vectors with lower confidence. We validate our algorithm on single-cell RNA-seq data, a powerful and widely used tool in biology. Our confident clustering achieves high accuracy on the tested datasets. In addition, unlike traditional clustering methods in single-cell analysis, confident clustering remains highly stable under different choices of parameters.
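The two-round idea described above can be illustrated with a minimal NumPy sketch. It is not the paper's method. The abstract does not define the PCA compression ratio criterion named in the title, so this sketch substitutes a simpler, hypothetical confidence score: the relative margin between a point's nearest and second-nearest k-means centroid. The helper names `kmeans` and `confident_clustering` and the `margin` threshold are illustrative assumptions.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns centroids of shape (k, d)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # copy of k rows
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n, k).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # leave an empty cluster's centroid in place
                centroids[j] = members.mean(axis=0)
    return centroids


def confident_clustering(X, k, margin=0.5, seed=0):
    """Hypothetical two-round scheme using a distance-margin confidence proxy
    (an assumption standing in for the paper's PCA compression ratio)."""
    X = np.asarray(X, dtype=float)
    centroids = kmeans(X, k, seed=seed)
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)
    sorted_d = np.sort(dists, axis=1)
    # Relative gap between nearest and second-nearest centroid, in [0, 1].
    rel_margin = (sorted_d[:, 1] - sorted_d[:, 0]) / (sorted_d[:, 1] + 1e-12)

    # Round 1: label only high-confidence ("pure") points; -1 = unassigned.
    round1 = np.where(rel_margin >= margin, nearest, -1)

    # Round 2: recompute centroids from confident points only, then label all.
    confident_centroids = np.array([
        X[round1 == j].mean(axis=0) if np.any(round1 == j) else centroids[j]
        for j in range(k)
    ])
    d2 = np.linalg.norm(X[:, None, :] - confident_centroids[None, :, :], axis=2)
    round2 = d2.argmin(axis=1)
    return round1, round2


# Usage: two well-separated blobs plus one boundary point halfway between them.
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(0.0, 0.3, size=(50, 2)),
    rng.normal(5.0, 0.3, size=(50, 2)),
    [[2.5, 2.5]],
])
round1, round2 = confident_clustering(X, k=2)
print("unassigned after round 1:", np.flatnonzero(round1 == -1))
```

Points labelled -1 in round 1 are the low-confidence boundary points. Round 2 assigns them using centroids computed only from the confident points, so boundary points do not pull the cluster centres toward each other.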


