Exact and Approximate Hierarchical Clustering Using A*

by Craig S. Greenberg, et al.

Hierarchical clustering is a critical task in numerous domains. Many approaches are based on heuristics, and the properties of the resulting clusterings are studied post hoc. However, in several applications, there is a natural cost function that can be used to characterize the quality of the clustering. In those cases, hierarchical clustering can be seen as a combinatorial optimization problem. To that end, we introduce a new approach based on A* search. We overcome the prohibitively large search space by combining A* with a novel trellis data structure. This combination results in an exact algorithm that scales beyond the previous state of the art, from a search space with 10^12 trees to one with 10^15 trees, and an approximate algorithm that improves over baselines, even in enormous search spaces that contain more than 10^1000 trees. We empirically demonstrate that our method achieves substantially higher quality results than baselines for a particle physics use case and other clustering benchmarks. We describe how our method provides significantly improved theoretical bounds on the time and space complexity of A* for clustering.
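To give a feel for the search-space sizes quoted above, the following minimal sketch counts candidate hierarchies. It assumes, which the abstract does not state explicitly, that the search space is the set of rooted binary trees over n labeled leaves; the number of such trees is the double factorial (2n-3)!!. The function names `num_binary_hierarchies` and `smallest_n_exceeding` are hypothetical and chosen only for this illustration.

```python
def num_binary_hierarchies(n: int) -> int:
    """Count rooted binary trees over n labeled leaves: (2n-3)!!.

    Assumption for illustration: each candidate clustering is a full
    binary hierarchy whose leaves are the n data points.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n-3 (empty product for n <= 2).
    for k in range(3, 2 * n - 2, 2):
        count *= k
    return count


def smallest_n_exceeding(threshold: int) -> int:
    """Return the smallest n whose tree count is strictly above threshold."""
    n = 1
    while num_binary_hierarchies(n) <= threshold:
        n += 1
    return n


if __name__ == "__main__":
    for n in (4, 10, 14, 16):
        print(n, num_binary_hierarchies(n))
    print(smallest_n_exceeding(10**12), smallest_n_exceeding(10**15))
```

Under this assumption the count grows super-exponentially: a few more leaves add orders of magnitude more trees, which is why exhaustive enumeration becomes infeasible so quickly and a guided search such as A* is attractive.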




Compact Representation of Uncertainty in Hierarchical Clustering

Hierarchical clustering is a fundamental task often used to discover mea...

From Trees to Continuous Embeddings and Back: Hyperbolic Hierarchical Clustering

Similarity-based Hierarchical Clustering (HC) is a classical unsupervise...

Learning (Re-)Starting Solutions for Vehicle Routing Problems

A key challenge in solving a combinatorial optimization problem is how t...

Hierarchical clustering in particle physics through reinforcement learning

Particle physics experiments often require the reconstruction of decay p...

A branch-and-bound feature selection algorithm for U-shaped cost functions

This paper presents the formulation of a combinatorial optimization prob...

Analyzing the Effect of Objective Correlation on the Efficient Set of MNK-Landscapes

In multiobjective combinatorial optimization, there exists two main clas...

Clustering Binary Data by Application of Combinatorial Optimization Heuristics

We study clustering methods for binary data, first defining aggregation ...