Scalable Hierarchical Clustering with Tree Grafting

12/31/2019
by Nicholas Monath, et al.

We introduce Grinch, a new algorithm for large-scale, non-greedy hierarchical clustering with general linkage functions that compute arbitrary similarity between two point sets. The key components of Grinch are its rotate and graft subroutines that efficiently reconfigure the hierarchy as new points arrive, supporting discovery of clusters with complex structure. Grinch is motivated by a new notion of separability for clustering with linkage functions: we prove that when the model is consistent with a ground-truth clustering, Grinch is guaranteed to produce a cluster tree containing the ground-truth, independent of data arrival order. Our empirical results on benchmark and author coreference datasets (with standard and learned linkage functions) show that Grinch is more accurate than other scalable methods, and orders of magnitude faster than hierarchical agglomerative clustering.
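
Two ideas in the abstract can be made concrete with a small sketch: a linkage function that scores the similarity of two point sets, and the check that a cluster tree "contains" a ground-truth clustering, meaning every ground-truth cluster is exactly the leaf set of some tree node. The code below is illustrative only and is not the Grinch algorithm: the rotate and graft subroutines are not shown, and the names `average_linkage` and `contains_clustering`, the nested-tuple tree representation, and the choice of negative squared Euclidean distance as the similarity are all assumptions made for this example.

```python
from itertools import product


def average_linkage(a, b):
    """Average pairwise similarity between point sets a and b.

    Similarity is assumed here to be negative squared Euclidean
    distance; a learned model could take the place of this function.
    """
    total = 0.0
    for p, q in product(a, b):
        total += -sum((x - y) ** 2 for x, y in zip(p, q))
    return total / (len(a) * len(b))


def leaves(tree):
    """Leaf labels under a tree given as nested tuples.

    Assumption: internal nodes are tuples, leaves are non-tuple labels.
    """
    if not isinstance(tree, tuple):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def node_leaf_sets(tree):
    """Leaf set of every node in the tree, including the root."""
    sets = {leaves(tree)}
    if isinstance(tree, tuple):
        for child in tree:
            sets |= node_leaf_sets(child)
    return sets


def contains_clustering(tree, clusters):
    """True if each cluster equals the leaf set of some tree node."""
    nodes = node_leaf_sets(tree)
    return all(frozenset(c) in nodes for c in clusters)


# Hypothetical example: five points labelled 0..4.
tree = ((0, 1), (2, (3, 4)))
print(contains_clustering(tree, [{0, 1}, {2, 3, 4}]))  # both are tree nodes
print(contains_clustering(tree, [{0, 2}, {1, 3, 4}]))  # neither is a tree node
```

Because the check depends only on which leaf sets appear in the tree, it does not care how the tree was built, which is the sense in which a guarantee of this kind can hold independently of the order in which points arrive.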

Code Repositories

grinch

Scalable Hierarchical Clustering with Tree Grafting

