Hierarchical Agglomerative Graph Clustering in Nearly-Linear Time

06/10/2021
by Laxman Dhulipala, et al.

We study the widely used hierarchical agglomerative clustering (HAC) algorithm on edge-weighted graphs. We define an algorithmic framework for hierarchical agglomerative graph clustering that provides the first efficient Õ(m) time exact algorithms for classic linkage measures, such as complete- and WPGMA-linkage, as well as other measures. Furthermore, for average-linkage, arguably the most popular variant of HAC, we provide an algorithm that runs in Õ(n√m) time. For this variant, this is the first exact algorithm that runs in subquadratic time, as long as m = n^(2-ϵ) for some constant ϵ > 0. We complement this result with a simple ϵ-close approximation algorithm for average-linkage in our framework that runs in Õ(m) time. As an application of our algorithms, we consider clustering points in a metric space by first using k-NN to generate a graph from the point set, and then running our algorithms on the resulting weighted graph. We validate the performance of our algorithms on publicly available datasets, and show that our approach can speed up clustering of point datasets by a factor of 20.7–76.5x.
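
To make the graph setting concrete, the following is a naive sketch of average-linkage HAC on an edge-weighted graph. It assumes edge weights are similarities (higher means closer) and that the linkage between two clusters is the total weight of the edges crossing them divided by the product of their sizes. The function name `graph_hac_average` and the `(u, v, w)` edge format are hypothetical choices for illustration. This simple version rescans all cluster pairs on every merge, so it is far slower than the Õ(n√m) and Õ(m) algorithms the abstract describes, and it is not the authors' implementation.

```python
def graph_hac_average(n, edges):
    """Naive average-linkage HAC on a weighted similarity graph.

    n:     number of vertices, labelled 0..n-1
    edges: iterable of (u, v, w) with u != v and w a similarity weight
    Returns a list of merges (a, b, similarity, new_cluster_id).
    Only clusters joined by at least one edge are ever merged, so a
    disconnected graph yields one hierarchy per connected component.
    """
    # cut[a][b] = total weight of edges between clusters a and b
    cut = {i: {} for i in range(n)}
    for u, v, w in edges:
        cut[u][v] = cut[u].get(v, 0.0) + w
        cut[v][u] = cut[v].get(u, 0.0) + w

    size = {i: 1 for i in range(n)}
    merges = []
    next_id = n

    while True:
        # Find the adjacent cluster pair with the highest average linkage.
        best = None
        for a, nbrs in cut.items():
            for b, w in nbrs.items():
                if a < b:
                    sim = w / (size[a] * size[b])
                    if best is None or sim > best[0]:
                        best = (sim, a, b)
        if best is None:
            break  # no edges left between clusters
        sim, a, b = best

        # Merge a and b: sum their cut weights to every other neighbour.
        merged = {}
        for old in (a, b):
            for c, w in cut.pop(old).items():
                if c in (a, b):
                    continue
                merged[c] = merged.get(c, 0.0) + w
                del cut[c][old]
        for c, w in merged.items():
            cut[c][next_id] = w
        cut[next_id] = merged
        size[next_id] = size.pop(a) + size.pop(b)

        merges.append((a, b, sim, next_id))
        next_id += 1

    return merges


# Usage: a path 0 - 1 - 2 - 3 with a weak middle edge.
for a, b, sim, new_id in graph_hac_average(4, [(0, 1, 1.0), (2, 3, 0.9), (1, 2, 0.2)]):
    print(f"merge {a} + {b} -> {new_id} (similarity {sim:.3f})")
```

For the point-set application mentioned in the abstract, the input edges would come from a k-NN graph built over the points, with distances converted to similarities in whatever way suits the data; that conversion is left open here.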


research
06/23/2022

Hierarchical Agglomerative Graph Clustering in Poly-Logarithmic Depth

Obtaining scalable algorithms for hierarchical agglomerative clustering ...
research
03/03/2020

Scalable Distributed Approximation of Internal Measures for Clustering Evaluation

The most widely used internal measure for clustering evaluation is the s...
research
03/16/2018

Fast approximation and exact computation of negative curvature parameters of graphs

In this paper, we study Gromov hyperbolicity and related parameters, tha...
research
05/10/2020

Comparison and Benchmark of Graph Clustering Algorithms

Graph clustering is widely used in analysis of biological networks, soci...
research
08/15/2020

On Efficient Low Distortion Ultrametric Embedding

A classic problem in unsupervised learning and data analysis is to find ...
research
06/18/2012

Efficient Active Algorithms for Hierarchical Clustering

Advances in sensing technologies and the growth of the internet have res...
research
09/20/2019

Online Hierarchical Clustering Approximations

Hierarchical clustering is a widely used approach for clustering dataset...
