Scalable Hierarchical Clustering with Tree Grafting

by Nicholas Monath, et al.

We introduce Grinch, a new algorithm for large-scale, non-greedy hierarchical clustering with general linkage functions that compute arbitrary similarity between two point sets. The key components of Grinch are its rotate and graft subroutines, which efficiently reconfigure the hierarchy as new points arrive and support the discovery of clusters with complex structure. Grinch is motivated by a new notion of separability for clustering with linkage functions: we prove that when the model is consistent with a ground-truth clustering, Grinch is guaranteed to produce a cluster tree containing that ground-truth clustering, independent of data arrival order. Our empirical results on benchmark and author coreference datasets (with standard and learned linkage functions) show that Grinch is more accurate than other scalable methods, and orders of magnitude faster than hierarchical agglomerative clustering.
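
To make the phrase "linkage functions that compute arbitrary similarity between two point sets" concrete, here is a minimal sketch of one such function. It is not taken from the paper or its repository: the choice of average linkage, the use of cosine similarity, and the names `cosine_similarity` and `average_linkage` are all assumptions made for illustration.

```python
from itertools import product
from math import sqrt
from typing import Sequence

Point = Sequence[float]


def cosine_similarity(x: Point, y: Point) -> float:
    """Cosine similarity between two vectors (illustrative pairwise score)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = sqrt(sum(a * a for a in x))
    norm_y = sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        # Assumption: treat a zero vector as having no similarity to anything.
        return 0.0
    return dot / (norm_x * norm_y)


def average_linkage(set_a: Sequence[Point], set_b: Sequence[Point]) -> float:
    """Hypothetical linkage: mean pairwise similarity across two point sets."""
    if not set_a or not set_b:
        raise ValueError("both point sets must be non-empty")
    total = sum(cosine_similarity(x, y) for x, y in product(set_a, set_b))
    return total / (len(set_a) * len(set_b))
```

A hierarchical clustering method can call a function like this to score how similar two candidate subtrees are. Any function with the same shape (two point sets in, one score out) could be substituted, including a learned model.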




An Online Hierarchical Algorithm for Extreme Clustering

Many modern clustering methods scale well to a large number of data item...

An Objective for Hierarchical Clustering in Euclidean Space and its Connection to Bisecting K-means

This paper explores hierarchical clustering in the case where pairs of p...

Clustering based on Point-Set Kernel

Measuring similarity between two objects is the core operation in existi...

Scalable Community Detection via Parallel Correlation Clustering

Graph clustering and community detection are central problems in modern ...

FISHDBC: Flexible, Incremental, Scalable, Hierarchical Density-Based Clustering for Arbitrary Data and Distance

FISHDBC is a flexible, incremental, scalable, and hierarchical density-b...

Clustering with Fast, Automated and Reproducible assessment applied to longitudinal neural tracking

Across many areas, from neural tracking to database entity resolution, m...

PHANTOM: Curating GitHub for engineered software projects using time-series clustering

Context: Within the field of Mining Software Repositories, there are num...
