
Exact and Approximate Hierarchical Clustering Using A*

by Craig S. Greenberg, et al.

Hierarchical clustering is a critical task in numerous domains. Many approaches are based on heuristics, and the properties of the resulting clusterings are studied post hoc. However, in several applications there is a natural cost function that can be used to characterize the quality of the clustering. In those cases, hierarchical clustering can be seen as a combinatorial optimization problem. To that end, we introduce a new approach based on A* search. We overcome the prohibitively large search space by combining A* with a novel trellis data structure. This combination results in an exact algorithm that scales beyond the previous state of the art, from a search space with 10^12 trees to 10^15 trees, and an approximate algorithm that improves over baselines, even in enormous search spaces that contain more than 10^1000 trees. We empirically demonstrate that our method achieves substantially higher quality results than baselines for a particle physics use case and other clustering benchmarks. We describe how our method provides significantly improved theoretical bounds on the time and space complexity of A* for clustering.
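To make the size of these search spaces concrete, here is a minimal sketch. It is not the authors' A* algorithm. It counts the rooted binary hierarchies over n labeled points, which number (2n-3)!!. It then solves tiny instances exactly with an exhaustive dynamic program over every subset of points, which amounts to a fully enumerated trellis. A* with an admissible heuristic aims to reach the same optimum while exploring far less of this space. Assumptions: the objective is Dasgupta's cost on a symmetric similarity matrix `w`, since the abstract does not name a specific cost function. The function names `count_binary_hierarchies` and `exact_dasgupta_hierarchy` are hypothetical.

```python
from functools import lru_cache
from typing import List, Tuple, Union

# A hierarchy is either a leaf index or a pair of sub-hierarchies.
Tree = Union[int, Tuple["Tree", "Tree"]]


def count_binary_hierarchies(n: int) -> int:
    """Number of rooted binary trees over n labeled leaves, i.e. (2n - 3)!!."""
    if n < 1:
        raise ValueError("need at least one leaf")
    total = 1
    for k in range(3, 2 * n - 2, 2):  # multiply the odd numbers 3, 5, ..., 2n - 3
        total *= k
    return total


def exact_dasgupta_hierarchy(w: List[List[float]]) -> Tuple[float, Tree]:
    """Exhaustive dynamic program over every subset of points (hypothetical helper).

    Minimizes Dasgupta's cost: the sum over pairs (i, j) of w[i][j] times the
    number of leaves under their lowest common ancestor. It examines about 3^n
    splits in total (each costing O(n^2) here), so it only suits small n.
    """
    n = len(w)
    if n == 0:
        raise ValueError("need at least one point")

    def cross_weight(a: int, b: int) -> float:
        # Total similarity between the points in bitmask a and those in bitmask b.
        a_idx = [i for i in range(n) if a >> i & 1]
        b_idx = [j for j in range(n) if b >> j & 1]
        return sum(w[i][j] for i in a_idx for j in b_idx)

    @lru_cache(maxsize=None)
    def best(mask: int) -> Tuple[float, Tree]:
        if mask & (mask - 1) == 0:  # a single point: a leaf with zero cost
            return 0.0, mask.bit_length() - 1
        size = bin(mask).count("1")
        lowest = mask & -mask  # keep one point fixed in A so each split is seen once
        rest = mask ^ lowest
        best_cost, best_tree = float("inf"), None
        s = rest
        while True:
            a = lowest | s
            b = mask ^ a
            if b:  # skip the split that would leave B empty
                cost_a, tree_a = best(a)
                cost_b, tree_b = best(b)
                # Pairs split here have this node as their LCA, so they pay |mask|.
                cost = size * cross_weight(a, b) + cost_a + cost_b
                if cost < best_cost:
                    best_cost, best_tree = cost, (tree_a, tree_b)
            if s == 0:
                break
            s = (s - 1) & rest  # step to the next submask of rest
        return best_cost, best_tree

    return best((1 << n) - 1)


if __name__ == "__main__":
    # Two well-separated pairs: {0, 1} and {2, 3}.
    w = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
    print(count_binary_hierarchies(4))  # 15 candidate trees
    print(exact_dasgupta_hierarchy(w))  # (4.0, ((0, 1), (2, 3)))
```

The gap between the two functions is the point. For 15 points there are about 2.1 × 10^14 candidate trees, while the subset program examines only about 7 million splits. That is still exponential, which is why pruning the trellis with a search strategy matters at larger scales.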

Compact Representation of Uncertainty in Hierarchical Clustering

Hierarchical clustering is a fundamental task often used to discover mea...

From Trees to Continuous Embeddings and Back: Hyperbolic Hierarchical Clustering

Similarity-based Hierarchical Clustering (HC) is a classical unsupervise...

Learning (Re-)Starting Solutions for Vehicle Routing Problems

A key challenge in solving a combinatorial optimization problem is how t...

Hierarchical clustering in particle physics through reinforcement learning

Particle physics experiments often require the reconstruction of decay p...

Output Space Search for Structured Prediction

We consider a framework for structured prediction based on search in the...

Analyzing the Effect of Objective Correlation on the Efficient Set of MNK-Landscapes

In multiobjective combinatorial optimization, there exist two main clas...

Learning Tree Structures from Leaves For Particle Decay Reconstruction

In this work, we present a neural approach to reconstructing rooted tree...