Tangles: From Weak to Strong Clustering

06/25/2020
by Christian Elbracht, et al.

We introduce a new approach to clustering by using tangles, a tool that originates in mathematical graph theory. Given a collection of "weak partitions" of a data set, tangles provide a framework to aggregate these weak partitions such that they "point in the direction of a cluster". As a result, a cluster is softly characterized by a set of consistent pointers. This mechanism provides a highly flexible way of solving soft clustering problems in a variety of setups, ranging from questionnaires and community detection in graphs to clustering points in metric spaces. Conceptually, tangles have many intriguing properties: (1) Similar to boosting, which combines many weak classifiers into a strong classifier, tangles provide a formal way to combine many weak partitions into a few strong clusters. (2) In terms of computational complexity, tangles allow us to use simple, fast algorithms to produce the weak partitions. The complexity of identifying the strong partitions is dominated by the number of weak partitions, not the number of data points, leading to an interesting trade-off between the two. (3) If the weak partitions are interpretable, so are the strong partitions induced by the tangles, which makes this one of the rare algorithms that produce interpretable clusters. (4) The output of tangles is of a hierarchical nature, inducing the notion of a soft dendrogram that can be helpful in data visualization.
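To make the idea of "consistent pointers" concrete, here is a minimal toy sketch in Python. It rests on assumptions the abstract does not state. Each weak partition is taken to be a bipartition of the data. An orientation picks one side of every bipartition. An orientation counts as consistent when any three chosen sides share at least `agreement` points. The names `is_consistent`, `find_tangles`, and `agreement` are hypothetical, and the brute-force search is only for illustration, not the authors' algorithm.

```python
from itertools import combinations_with_replacement, product


def is_consistent(sides, agreement):
    """Assumed consistency rule: every choice of three sides (repeats
    allowed) must share at least `agreement` data points."""
    return all(
        len(a & b & c) >= agreement
        for a, b, c in combinations_with_replacement(sides, 3)
    )


def find_tangles(cuts, n_points, agreement):
    """Brute-force every orientation of the cuts and keep the consistent ones.

    Each cut is a frozenset of point indices (one side of a bipartition).
    An orientation is a tuple of booleans: True keeps the given side,
    False takes its complement. Exponential in the number of cuts.
    """
    everything = frozenset(range(n_points))
    tangles = []
    for orientation in product((True, False), repeat=len(cuts)):
        sides = [
            cut if keep else everything - cut
            for cut, keep in zip(cuts, orientation)
        ]
        if is_consistent(sides, agreement):
            tangles.append(orientation)
    return tangles


# Toy data: two well-separated groups on a line.
points = [0, 1, 2, 3, 4, 10, 11, 12, 13, 14]

# Weak partitions: simple threshold cuts; each stores the "below" side.
thresholds = [2.5, 7.0, 11.5]
cuts = [frozenset(i for i, x in enumerate(points) if x < t) for t in thresholds]

tangles = find_tangles(cuts, len(points), agreement=3)
for orientation in tangles:
    print(["below" if keep else "above" for keep in orientation])
```

In this toy setup only two orientations survive: one where every cut points to the low group and one where every cut points to the high group. Mixed orientations fail because some three sides overlap in too few points.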


