
Scalable Hierarchical Clustering with Tree Grafting

by   Nicholas Monath, et al.

We introduce Grinch, a new algorithm for large-scale, non-greedy hierarchical clustering with general linkage functions that compute arbitrary similarity between two point sets. The key components of Grinch are its rotate and graft subroutines that efficiently reconfigure the hierarchy as new points arrive, supporting discovery of clusters with complex structure. Grinch is motivated by a new notion of separability for clustering with linkage functions: we prove that when the model is consistent with a ground-truth clustering, Grinch is guaranteed to produce a cluster tree containing the ground truth, independent of data arrival order. Our empirical results on benchmark and author coreference datasets (with standard and learned linkage functions) show that Grinch is more accurate than other scalable methods, and orders of magnitude faster than hierarchical agglomerative clustering.
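
To make the phrase "linkage function between two point sets" concrete, here is a minimal sketch of two standard linkages written as similarities (higher means more similar). It is not Grinch and does not implement the rotate or graft subroutines. The function names `average_linkage` and `single_linkage`, and the choice of negated Euclidean distance as the similarity, are assumptions made for illustration only.

```python
from itertools import product
from math import dist


def average_linkage(a, b):
    """Mean negated Euclidean distance over all cross pairs of a and b.

    Illustrative assumption: similarity is defined as -distance, so
    larger values mean the two point sets are closer together.
    """
    if not a or not b:
        raise ValueError("both point sets must be non-empty")
    total = sum(dist(p, q) for p, q in product(a, b))
    return -total / (len(a) * len(b))


def single_linkage(a, b):
    """Negated distance between the closest cross pair of a and b."""
    if not a or not b:
        raise ValueError("both point sets must be non-empty")
    return -min(dist(p, q) for p, q in product(a, b))
```

Either function maps a pair of point sets to one score, which is the interface a linkage-based hierarchical method needs. A learned linkage, as mentioned in the abstract, would replace the function body with a trained model that keeps the same two-set signature.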




An Online Hierarchical Algorithm for Extreme Clustering

Many modern clustering methods scale well to a large number of data item...

An Objective for Hierarchical Clustering in Euclidean Space and its Connection to Bisecting K-means

This paper explores hierarchical clustering in the case where pairs of p...

Clustering based on Point-Set Kernel

Measuring similarity between two objects is the core operation in existi...

Scalable Community Detection via Parallel Correlation Clustering

Graph clustering and community detection are central problems in modern ...

Clustering with Fast, Automated and Reproducible assessment applied to longitudinal neural tracking

Across many areas, from neural tracking to database entity resolution, m...

Anytime Hierarchical Clustering

We propose a new anytime hierarchical clustering method that iteratively...
