From Trees to Continuous Embeddings and Back: Hyperbolic Hierarchical Clustering

10/01/2020 ∙ by Ines Chami, et al.

Similarity-based Hierarchical Clustering (HC) is a classical unsupervised machine learning problem that has traditionally been solved with heuristic algorithms like Average-Linkage. Recently, Dasgupta reframed HC as a discrete optimization problem by introducing a global cost function measuring the quality of a given tree. In this work, we provide the first continuous relaxation of Dasgupta's discrete optimization problem with provable quality guarantees. The key idea of our method, HypHC, is to show a direct correspondence from discrete trees to continuous representations (via the hyperbolic embeddings of their leaf nodes) and back (via a decoding algorithm that maps leaf embeddings to a dendrogram), allowing us to search the space of discrete binary trees with continuous optimization. Building on analogies between trees and hyperbolic space, we derive a continuous analogue of the notion of lowest common ancestor, which leads to a continuous relaxation of Dasgupta's discrete objective. We show that, after decoding, the global minimizer of our continuous relaxation yields a discrete tree with a (1 + epsilon)-factor approximation of Dasgupta's optimal tree, where epsilon can be made arbitrarily small and controls optimization challenges. We experimentally evaluate HypHC on a variety of HC benchmarks and find that even approximate solutions found with gradient descent have better clustering quality than agglomerative heuristics or other gradient-based algorithms. Finally, we highlight the flexibility of HypHC using end-to-end training in a downstream classification task.
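
To make the discrete objective concrete, the sketch below evaluates Dasgupta's cost on a small binary tree: each pair of leaves contributes its similarity multiplied by the number of leaves under the pair's lowest common ancestor, so similar points should be split as low in the tree as possible. This is only an illustration of the cost being relaxed, not the HypHC method itself. The nested-tuple tree encoding, the `frozenset`-keyed similarity dictionary, and the names `leaves` and `dasgupta_cost` are assumptions made for this example.

```python
from itertools import combinations


def leaves(tree):
    """Return the leaf labels under a binary tree encoded as nested 2-tuples."""
    if not isinstance(tree, tuple):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)


def dasgupta_cost(tree, similarity):
    """Sum over leaf pairs of w_ij times the leaf count under their LCA."""
    if not isinstance(tree, tuple):
        return 0.0  # a single leaf contains no pairs
    left, right = tree
    left_leaves, right_leaves = leaves(left), leaves(right)
    n_here = len(left_leaves) + len(right_leaves)
    # Pairs separated at this node have this node as their lowest common ancestor.
    split_weight = sum(
        similarity[frozenset((i, j))] for i in left_leaves for j in right_leaves
    )
    return (
        n_here * split_weight
        + dasgupta_cost(left, similarity)
        + dasgupta_cost(right, similarity)
    )


# Hypothetical toy data: points 0,1 are similar, points 2,3 are similar.
similarity = {frozenset(pair): 0.25 for pair in combinations(range(4), 2)}
similarity[frozenset((0, 1))] = 1.0
similarity[frozenset((2, 3))] = 1.0

good_tree = ((0, 1), (2, 3))  # merges similar points first
bad_tree = ((0, 2), (1, 3))   # separates similar points at the root

print(dasgupta_cost(good_tree, similarity))  # 8.0
print(dasgupta_cost(bad_tree, similarity))   # 11.0
```

The tree that groups similar points near the leaves has the lower cost, which is the behaviour a continuous relaxation of this objective aims to preserve after decoding.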


