Exact and Approximate Hierarchical Clustering Using A*

04/14/2021
by Craig S. Greenberg, et al.

Hierarchical clustering is a critical task in numerous domains. Many approaches are based on heuristics, and the properties of the resulting clusterings are studied post hoc. However, in several applications, there is a natural cost function that can be used to characterize the quality of the clustering. In those cases, hierarchical clustering can be seen as a combinatorial optimization problem. To that end, we introduce a new approach based on A* search. We overcome the prohibitively large search space by combining A* with a novel trellis data structure. This combination results in an exact algorithm that scales beyond the previous state of the art, from a search space with 10^12 trees to 10^15 trees, and an approximate algorithm that improves over baselines, even in enormous search spaces that contain more than 10^1000 trees. We empirically demonstrate that our method achieves substantially higher quality results than baselines for a particle physics use case and other clustering benchmarks. We describe how our method provides significantly improved theoretical bounds on the time and space complexity of A* for clustering.
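To make the idea of treating hierarchical clustering as a search problem concrete, the following is a minimal sketch of plain A* over top-down splits, and it is not the authors' algorithm: it omits the trellis data structure entirely and only works for a handful of items. The abstract does not name a cost function, so this sketch assumes a Dasgupta-style cost (each split pays the cluster size times the similarity it cuts). The names `splits`, `split_cost`, `heuristic`, `astar_hierarchy`, and the matrix `sim` are all hypothetical.

```python
import heapq
from itertools import combinations, count


def splits(cluster):
    """Yield each unordered split of a frozenset into two non-empty parts."""
    items = sorted(cluster)
    first, rest = items[0], items[1:]
    # Keeping `first` on the left side makes every split appear exactly once.
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            left = frozenset((first,) + extra)
            yield left, cluster - left


def split_cost(left, right, sim):
    """Assumed cost of one split: cluster size times the similarity it cuts."""
    cross = sum(sim[a][b] for a in left for b in right)
    return (len(left) + len(right)) * cross


def heuristic(state, sim):
    """Lower bound: every unsplit cluster pays at least its cheapest split.

    Admissible only because similarities are assumed non-negative.
    """
    return sum(
        min(split_cost(l, r, sim) for l, r in splits(c))
        for c in state
        if len(c) > 1
    )


def astar_hierarchy(n, sim):
    """Return (cost, list of splits) for a minimum-cost binary tree on n items."""
    start = frozenset([frozenset(range(n))])
    tie = count()  # tie-breaker so the heap never compares states directly
    frontier = [(heuristic(start, sim), next(tie), 0, start, ())]
    best_g = {start: 0}
    while frontier:
        _, _, g, state, tree = heapq.heappop(frontier)
        if g > best_g.get(state, float("inf")):
            continue  # stale entry: a cheaper path to this state was found
        if all(len(c) == 1 for c in state):
            return g, list(tree)  # every cluster is a leaf, so the tree is complete
        # Splits of different clusters commute, so always expand the unsplit
        # cluster containing the smallest item to avoid redundant orderings.
        target = min((c for c in state if len(c) > 1), key=min)
        for left, right in splits(target):
            new_g = g + split_cost(left, right, sim)
            new_state = (state - {target}) | {left, right}
            if new_g < best_g.get(new_state, float("inf")):
                best_g[new_state] = new_g
                f = new_g + heuristic(new_state, sim)
                heapq.heappush(
                    frontier,
                    (f, next(tie), new_g, new_state, tree + ((left, right),)),
                )
    return None


# Toy similarity matrix (made up): items 0,1 are similar, as are items 2,3.
sim = [
    [0, 5, 1, 1],
    [5, 0, 1, 1],
    [1, 1, 0, 4],
    [1, 1, 4, 0],
]
cost, tree = astar_hierarchy(len(sim), sim)
print(cost, tree)
```

Each search state here is the set of clusters still waiting to be split, and a goal state contains only singletons. The number of splits per cluster grows exponentially with its size, which is the kind of blow-up the abstract says the trellis is meant to address.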


