Automated calibration of consensus weighted distance-based clustering approaches using sharp

04/26/2023
by Barbara Bodinier, et al.

In consensus clustering, a clustering algorithm is combined with a subsampling procedure to detect stable clusters. Previous studies on both simulated and real data suggest that consensus clustering outperforms native algorithms. Here, we extend consensus clustering to allow for attribute weighting in the calculation of pairwise distances using existing regularised approaches. We propose a procedure for calibrating the number of clusters (and the regularisation parameter) by maximising a novel consensus score that is calculated directly from consensus clustering outputs, which makes the procedure computationally competitive. Our simulation study shows better clustering performance for (i) models calibrated by maximising our consensus score compared with existing calibration scores, and (ii) weighted compared with unweighted approaches in the presence of features that do not contribute to cluster definition. Application to real gene expression data measured in lung tissue reveals clear clusters corresponding to different lung cancer subtypes. The R package sharp (version 1.4.0) is available on CRAN.
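To make the subsampling idea concrete, the following is a minimal Python sketch of generic consensus clustering: a base algorithm is run on many random subsamples, and the proportion of runs in which two observations land in the same cluster is recorded. It is an illustration under stated assumptions, not the sharp implementation. The function names `kmeans` and `consensus_matrix`, the use of plain k-means as the base algorithm, the 80% subsample fraction and the toy data are all hypothetical choices made for this example. The attribute weighting and the consensus score proposed in the paper are not reproduced here.

```python
import random


def kmeans(points, k, rng, n_iter=20):
    """Plain Lloyd's k-means on tuples of floats; returns one label per point."""
    centres = rng.sample(points, k)  # initial centres drawn from the data
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centre.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centres[c])))
            for p in points
        ]
        # Update step: each centre moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def consensus_matrix(points, k, n_subsamples=50, fraction=0.8, seed=0):
    """Co-membership proportions over subsamples (hypothetical helper).

    Entry [i][j] is the share of subsamples containing both i and j
    in which the two observations were assigned to the same cluster.
    """
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    size = max(k, round(fraction * n))
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), size)  # subsample without replacement
        labels = kmeans([points[i] for i in idx], k, rng)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                if labels[a] == labels[b]:
                    together[i][j] += 1
    return [
        [together[i][j] / sampled[i][j] if sampled[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


# Toy data (assumed for illustration): two well-separated groups on a line.
points = [(float(x),) for x in (0, 1, 2, 3, 4, 100, 101, 102, 103, 104)]
cm = consensus_matrix(points, k=2, n_subsamples=50, fraction=0.8, seed=1)
```

With stable clusters, co-membership proportions are close to 1 within a group and close to 0 between groups. Calibration procedures of the kind described above compare such matrices across candidate numbers of clusters.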


