Fast and explainable clustering based on sorting

02/03/2022
by Xinye Chen, et al.

We introduce a fast and explainable clustering method called CLASSIX. It consists of two phases: a greedy aggregation of the sorted data into groups of nearby data points, followed by the merging of groups into clusters. The algorithm is controlled by two scalar parameters: a distance parameter for the aggregation and a parameter controlling the minimal cluster size. We evaluate the clustering performance extensively on synthetic and real-world datasets with various cluster shapes and low to high feature dimensionality. Our experiments demonstrate that CLASSIX is competitive with state-of-the-art clustering algorithms. The algorithm has linear space complexity and achieves near-linear time complexity on a wide range of problems. Its inherent simplicity allows for the generation of intuitive explanations of the computed clusters.
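The two phases described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not state the sort key, the merge criterion, or how undersized clusters are handled, so the sketch assumes sorting by the first coordinate, merging groups whose starting points lie within `1.5 * radius`, and labelling clusters smaller than `min_size` as `-1`. The function and parameter names (`aggregate`, `merge_groups`, `cluster`, `radius`, `min_size`) are hypothetical.

```python
import math
from collections import Counter


def aggregate(points, radius):
    """Phase 1 (sketch): greedy aggregation of sorted points into groups."""
    # Assumption: sort by the first coordinate; the abstract only says "sorted".
    order = sorted(range(len(points)), key=lambda i: points[i][0])
    group = [-1] * len(points)
    centers = []  # index of the starting point of each group
    for pos, i in enumerate(order):
        if group[i] != -1:
            continue  # already absorbed by an earlier group
        gid = len(centers)
        centers.append(i)
        group[i] = gid
        for q in range(pos + 1, len(order)):
            j = order[q]
            # Sorting allows early termination: once the sort key differs by
            # more than radius, no later point can be within radius.
            if points[j][0] - points[i][0] > radius:
                break
            if group[j] == -1 and math.dist(points[i], points[j]) <= radius:
                group[j] = gid
    return group, centers


def merge_groups(points, group, centers, radius, min_size):
    """Phase 2 (sketch): merge nearby groups into clusters via union-find."""
    parent = list(range(len(centers)))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    # Assumed merge rule: starting points closer than 1.5 * radius.
    for a in range(len(centers)):
        for b in range(a + 1, len(centers)):
            if math.dist(points[centers[a]], points[centers[b]]) <= 1.5 * radius:
                parent[find(a)] = find(b)

    roots = [find(g) for g in group]
    sizes = Counter(roots)
    relabel = {}
    labels = []
    for r in roots:
        if sizes[r] < min_size:
            labels.append(-1)  # assumed handling of undersized clusters
        else:
            labels.append(relabel.setdefault(r, len(relabel)))
    return labels


def cluster(points, radius, min_size):
    """Run both phases and return one cluster label per input point."""
    group, centers = aggregate(points, radius)
    return merge_groups(points, group, centers, radius, min_size)
```

For example, `cluster([(0, 0), (0.1, 0), (0.2, 0.1), (5, 5), (5.1, 5), (9, 9)], radius=0.5, min_size=2)` yields two clusters and one isolated point. The pairwise loop over starting points in the merge step is quadratic in the number of groups and is kept only for brevity; it does not reflect the complexity stated in the abstract.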

research
09/17/2019

Global Optimal Path-Based Clustering Algorithm

Combinatorial optimization problems for clustering are known to be NP-ha...
research
10/17/2022

Cluster Explanation via Polyhedral Descriptions

Clustering is an unsupervised learning problem that aims to partition un...
research
02/14/2018

Robust Continuous Co-Clustering

Clustering consists of grouping together samples giving their similar pr...
research
09/14/2023

Massively-Parallel Heat Map Sorting and Applications To Explainable Clustering

Given a set of points labeled with k labels, we introduce the heat map s...
research
12/10/2021

Interpretable Clustering via Multi-Polytope Machines

Clustering is a popular unsupervised learning tool often used to discove...
research
01/20/2022

Sketch-and-Lift: Scalable Subsampled Semidefinite Program for K-means Clustering

Semidefinite programming (SDP) is a powerful tool for tackling a wide ra...
research
04/10/2023

FINEX: A Fast Index for Exact Flexible Density-Based Clustering (Extended Version with Proofs)*

Density-based clustering aims to find groups of similar objects (i.e., c...
