Hierarchical Clustering: O(1)-Approximation for Well-Clustered Graphs

12/16/2021
by Bogdan-Adrian Manghiuc, et al.

Hierarchical clustering studies a recursive partition of a data set into clusters of successively smaller size, and is a fundamental problem in data analysis. In this work we study the cost function for hierarchical clustering introduced by Dasgupta, and present two polynomial-time approximation algorithms. Our first result is an O(1)-approximation algorithm for graphs of high conductance; its simple construction bypasses the complicated recursive routines for finding sparse cuts known in the literature. Our second and main result is an O(1)-approximation algorithm for a wide family of graphs that exhibit a well-defined cluster structure. This result generalises the previous state of the art, which holds only for graphs generated from stochastic models. The significance of our work is demonstrated by an empirical analysis on both synthetic and real-world data sets, on which our algorithm outperforms the previously proposed algorithm for graphs with a well-defined cluster structure.
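For readers unfamiliar with the objective, Dasgupta's cost of a hierarchical clustering tree T on a weighted graph is the sum, over all edges {u, v}, of the edge weight multiplied by the number of leaves under the lowest common ancestor of u and v in T. The sketch below is only an illustration of that definition, not the algorithms of the paper. The nested-tuple tree encoding, the edge dictionary, and the function names `leaves` and `dasgupta_cost` are assumptions made for this example.

```python
def leaves(tree):
    """Return the set of leaves of a nested-tuple tree (any non-tuple is a leaf)."""
    if not isinstance(tree, tuple):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result


def dasgupta_cost(tree, edges):
    """Dasgupta cost: sum of w(u, v) * |leaves under lca(u, v)| over all edges.

    tree  -- nested tuples, e.g. ((0, 1), 2); hypothetical encoding for this sketch
    edges -- dict mapping (u, v) pairs to non-negative weights
    """
    if not isinstance(tree, tuple):
        return 0.0  # a single leaf separates no edge

    child_leaves = [leaves(child) for child in tree]
    size = sum(len(s) for s in child_leaves)

    def child_index(x):
        # Index of the child subtree containing x, or None if x is not below this node.
        for i, s in enumerate(child_leaves):
            if x in s:
                return i
        return None

    cost = 0.0
    for (u, v), w in edges.items():
        iu, iv = child_index(u), child_index(v)
        # This node is the lowest common ancestor exactly when u and v
        # fall into two different child subtrees.
        if iu is not None and iv is not None and iu != iv:
            cost += w * size

    return cost + sum(dasgupta_cost(child, edges) for child in tree)


# Two unit-weight triangles {0, 1, 2} and {3, 4, 5} joined by the edge (2, 3).
edges = {(0, 1): 1, (0, 2): 1, (1, 2): 1,
         (3, 4): 1, (3, 5): 1, (4, 5): 1,
         (2, 3): 1}

cluster_tree = (((0, 1), 2), ((3, 4), 5))  # splits the two triangles first
mixed_tree = (((0, 3), 1), ((2, 4), 5))    # mixes the triangles at the root

print(dasgupta_cost(cluster_tree, edges))  # 22.0
print(dasgupta_cost(mixed_tree, edges))    # 36.0
```

The tree that separates the two triangles at the root pays the full graph size only for the single bridging edge, so its cost is lower than that of the tree that mixes the triangles. This is the intuition behind targeting graphs with a well-defined cluster structure.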


