Nonparametric clustering of RNA-sequencing data

09/23/2022
by Gabriel Lozano et al.

Identifying clusters of co-expressed genes in transcriptomic data is a difficult task. Most algorithms used for this purpose fall into two broad categories: distance-based and model-based approaches. Distance-based approaches typically use a distance function between pairs of data objects and group similar objects into clusters. Model-based approaches rely on the mixture-modeling framework. Compared to distance-based approaches, they offer better interpretability because each cluster can be explicitly characterized in terms of the proposed model. However, model-based approaches present a particular difficulty: identifying a correct multivariate distribution on which the mixture can be based. In this manuscript, we first review some of the approaches used to select a distribution for the mixture model. We then propose avoiding this problem altogether by using a nonparametric maximum smoothed likelihood (MSL) algorithm. This algorithm was proposed earlier in the statistical literature but has not, to the best of our knowledge, been applied to transcriptomic data. Its salient feature is that it requires no explicit specification of the distributions of individual biological samples, which makes the practitioner's task easier. When applied to a real dataset, the algorithm produces a large number of biologically meaningful clusters and compares favorably to two other mixture-based algorithms commonly used for RNA-seq data clustering. Our code is publicly available on GitHub at https://github.com/Matematikoi/non_parametric_clustering.
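To make the idea of a mixture without a specified component distribution more concrete, the following is a minimal sketch and not the authors' implementation (their code is in the repository linked above). It assumes conditionally independent coordinates within each cluster and replaces each parametric component density with a weighted kernel density estimate. It is a simplified kernel-based EM-style iteration and omits the additional smoothing step that distinguishes MSL. The function names, the bandwidth `h`, the toy data, and the initialization are all hypothetical choices made for illustration.

```python
import math


def gaussian_kernel(u, h):
    """Gaussian kernel with bandwidth h, evaluated at u."""
    return math.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2.0 * math.pi))


def np_mixture_step(data, post, h):
    """One update: posteriors -> mixing weights and weighted KDEs -> new posteriors."""
    n, d, m = len(data), len(data[0]), len(post[0])
    # Mixing weights are the average posterior memberships.
    weights = [sum(post[i][j] for i in range(n)) / n for j in range(m)]
    new_post = []
    for i in range(n):
        joint = []
        for j in range(m):
            dens = weights[j]
            for k in range(d):
                # Weighted KDE of coordinate k in component j, evaluated at data[i][k].
                kde = sum(
                    post[r][j] * gaussian_kernel(data[i][k] - data[r][k], h)
                    for r in range(n)
                ) / (n * weights[j])
                dens *= kde  # product over coordinates (conditional independence)
            joint.append(dens)
        total = sum(joint)
        new_post.append([v / total for v in joint])
    return weights, new_post


def fit(data, post, h=0.5, n_iter=20):
    """Repeat the update a fixed number of times (no convergence check)."""
    weights = None
    for _ in range(n_iter):
        weights, post = np_mixture_step(data, post, h)
    return weights, post


# Hypothetical toy data: two well-separated groups of 10 points in 2 dimensions.
group_a = [(0.2 * i, 0.3 * ((3 * i) % 5)) for i in range(10)]
group_b = [(6.0 + 0.2 * i, 5.0 + 0.3 * ((3 * i) % 5)) for i in range(10)]
data = group_a + group_b

# Crude soft initialization: split on the mean of the first coordinate.
mean_x0 = sum(p[0] for p in data) / len(data)
init = [[0.7, 0.3] if p[0] < mean_x0 else [0.3, 0.7] for p in data]

weights, post = fit(data, init)
labels = [max(range(2), key=lambda j: row[j]) for row in post]
```

No distribution family is chosen for the clusters at any point; only a kernel and a bandwidth are needed. In this sketch the result depends on the initialization and on `h`, so both would need care on real expression data.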


