Clustering with Confidence: Finding Clusters with Statistical Guarantees

12/27/2016
by   Andreas Henelius, et al.

Clustering is a widely used unsupervised learning method for finding structure in data. However, the resulting clusters are typically presented without any guarantees on their robustness: slightly changing the data sample, or re-running a clustering algorithm that has a stochastic component, may lead to completely different clusters. Hence, there is a need for techniques that can quantify the instability of the generated clusters. In this study, we propose a technique for quantifying the instability of a clustering solution and for finding robust clusters, termed core clusters, which correspond to clusters in which the co-occurrence probability of each data item within a cluster is at least 1 - α. We demonstrate how solving the core clustering problem is linked to finding the largest maximal cliques in a graph. We show that the method can be used with both clustering and classification algorithms. The proposed method is tested on both simulated and real datasets. The results show that the obtained clusters indeed meet the guarantees on robustness.
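To make the link between co-occurrence probabilities and cliques concrete, the following is a minimal sketch and not the authors' implementation. It assumes the cluster labels from repeated runs (for example, on resampled data) are already available as plain lists, estimates pairwise co-occurrence as the fraction of runs in which two items share a label, connects pairs whose estimate is at least 1 - α, and then greedily peels off the largest clique. The function names (`cooccurrence`, `largest_clique`, `core_clusters`) and the greedy peeling strategy are assumptions made for illustration; the clique search is exponential and only suitable for small inputs.

```python
def cooccurrence(labelings):
    """Fraction of runs in which each pair of items receives the same label."""
    n = len(labelings[0])
    runs = len(labelings)
    return [[sum(1 for lab in labelings if lab[i] == lab[j]) / runs
             for j in range(n)] for i in range(n)]


def largest_clique(adj, nodes):
    """Largest clique among `nodes`, found with basic Bron-Kerbosch."""
    best = set()

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            # r is a maximal clique; keep it if it is the largest so far.
            if len(r) > len(best):
                best = set(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(nodes), set())
    return best


def core_clusters(labelings, alpha):
    """Greedy sketch: repeatedly remove the largest clique of the graph
    whose edges join items co-occurring with frequency >= 1 - alpha."""
    co = cooccurrence(labelings)
    n = len(co)
    adj = {i: {j for j in range(n) if j != i and co[i][j] >= 1 - alpha}
           for i in range(n)}
    remaining = set(range(n))
    cores = []
    while remaining:
        sub_adj = {v: adj[v] & remaining for v in remaining}
        clique = largest_clique(sub_adj, remaining)
        cores.append(sorted(clique))
        remaining -= clique
    return cores


# Hypothetical labels for six items over four runs (label names may permute).
runs = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
]
strict = core_clusters(runs, alpha=0.1)
loose = core_clusters(runs, alpha=0.3)
```

With these made-up runs, a strict threshold (α = 0.1) keeps together only the pairs that share a cluster in every run, while a looser one (α = 0.3) also admits items that co-occur in three of the four runs, so the core clusters grow as α increases.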


research
01/31/2019

A Novel Initial Clusters Generation Method for K-means-based Clustering Algorithms for Mixed Datasets

Mixed datasets consist of numeric and categorical attributes. Various K-...
research
05/25/2023

Metrics for quantifying isotropy in high dimensional unsupervised clustering tasks in a materials context

Clustering is a common task in machine learning, but clusters of unlabel...
research
05/16/2020

Revisiting Agglomerative Clustering

In data clustering, emphasis is often placed in finding groups of points...
research
07/16/2023

Using Decision Trees for Interpretable Supervised Clustering

In this paper, we address an issue of finding explainable clusters of cl...
research
06/29/2018

Grapevine: A Wine Prediction Algorithm Using Multi-dimensional Clustering Methods

We present a method for a wine recommendation system that employs multid...
research
09/19/2019

DAOC: Stable Clustering of Large Networks

Clustering is a crucial component of many data mining systems involving ...
