Cross-Study Replicability in Cluster Analysis

02/03/2022
by Lorenzo Masoero, et al.

In cancer research, clustering techniques are widely used for exploratory analyses and dimensionality reduction. They play a critical role in the identification of novel cancer subtypes, often with direct implications for patient management. As the volume of data collected by multiple research groups grows, it is increasingly feasible to investigate the replicability of clustering procedures, that is, their ability to consistently recover biologically meaningful clusters across several datasets. In this paper, we review existing methods for assessing the replicability of clustering analyses and discuss a framework for evaluating cross-study clustering replicability, which is useful when two or more studies are available. These approaches can be applied to any clustering algorithm and can employ different measures of similarity between partitions to quantify replicability, both globally (i.e., for the whole sample) and locally (i.e., for individual clusters). Using experiments on synthetic and real gene expression data, we illustrate the utility of replicability metrics for evaluating whether the same clusters are identified consistently across a collection of datasets.
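To make the idea of a global and a local similarity between partitions concrete, here is a minimal sketch in Python. It is not the paper's procedure. It assumes two partitions of the same samples are already available (for example, one obtained by clustering a study directly and one obtained by applying a clustering learned on another study). The adjusted Rand index is used as one possible global measure, and the per-cluster score is an illustrative choice. The function names `adjusted_rand_index` and `local_co_clustering` are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Global similarity: adjusted Rand index between two partitions."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same samples")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two samples: partitions trivially agree
    # Pairs of samples placed together in both partitions.
    together = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = pairs_a * pairs_b / total_pairs
    maximum = (pairs_a + pairs_b) / 2
    if maximum == expected:
        return 1.0  # both partitions are all-in-one or all singletons
    return (together - expected) / (maximum - expected)


def local_co_clustering(labels_a, labels_b):
    """Local similarity (illustrative assumption): for each cluster of A,
    the fraction of its sample pairs that are also together in B."""
    members = {}
    for i, label in enumerate(labels_a):
        members.setdefault(label, []).append(i)
    scores = {}
    for label, idx in members.items():
        pairs = comb(len(idx), 2)
        if pairs == 0:
            scores[label] = float("nan")  # singleton cluster: undefined
            continue
        kept = sum(comb(c, 2) for c in Counter(labels_b[i] for i in idx).values())
        scores[label] = kept / pairs
    return scores


# Toy partitions of eight samples (made-up labels, for illustration only).
partition_a = [0, 0, 0, 1, 1, 1, 2, 2]
partition_b = [1, 1, 1, 0, 0, 2, 2, 2]
print(adjusted_rand_index(partition_a, partition_b))
print(local_co_clustering(partition_a, partition_b))
```

In this toy case the global score summarizes overall agreement, while the local scores show which individual clusters of the first partition are split in the second. Both are invariant to how the cluster labels are numbered.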


