HyperAid: Denoising in hyperbolic spaces for tree-fitting and hierarchical clustering

05/19/2022
by Eli Chien, et al.

The problem of fitting distances by tree-metrics has received significant attention in the theoretical computer science and machine learning communities alike, owing to its many applications in natural language processing, phylogeny, cancer genomics and a myriad of other problem areas that involve hierarchical clustering. Although several provably exact algorithms exist for tree-metric fitting of data that inherently obeys tree-metric constraints, much less is known about how best to fit tree-metrics to data whose structure moderately (or substantially) differs from a tree. On such noisy data, most available algorithms perform poorly and often produce negative edge weights in the representative trees. Furthermore, it is currently not known how to choose the most suitable approximation objective for noisy fitting. Our contributions are as follows. First, we propose a new approach to tree-metric denoising in hyperbolic spaces (HyperAid), which transforms the original data into data that is “more” tree-like, as measured by Gromov's δ-hyperbolicity. Second, we perform an ablation study involving two choices of approximation objective: ℓ_p norms and the Dasgupta loss. Third, we integrate HyperAid with schemes for enforcing nonnegative edge weights. As a result, the HyperAid platform outperforms all other existing methods in the literature, including Neighbor Joining (NJ), TreeRep and T-REX, on both synthetic and real-world data. The synthetic data consists of edge-augmented trees and shortest-distance metrics, while the real-world datasets include Zoo, Iris, Glass, Segmentation and SpamBase; on these datasets, the average improvement with respect to NJ is 125.94%.
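To make the two quantities above concrete, here is a minimal brute-force sketch in Python. It is not HyperAid's implementation. It computes Gromov's δ for a finite metric through the four-point condition, along with an ℓ_p fitting error between an input metric and a candidate tree metric. The function names `four_point_delta` and `lp_fit_error` are hypothetical. Normalization conventions for δ differ across papers by constant factors, and here δ is taken as half the gap between the two largest pairwise sums. The example builds a small edge-augmented tree: the path 0-1-2-3 with an extra edge (0, 3), whose shortest-path metric is the 4-cycle.

```python
from itertools import combinations
from typing import Sequence

# A finite metric given as a symmetric distance matrix (list of rows).
Metric = Sequence[Sequence[float]]


def four_point_delta(d: Metric) -> float:
    """Gromov delta via the four-point condition (brute force, O(n^4)).

    For every quadruple (x, y, z, w), form the three pairings of
    distance sums, sort them, and record half the gap between the
    largest and second-largest sum. Tree metrics give 0; larger
    values mean the data is "less" tree-like.
    Convention (an assumption): delta = max (largest - second) / 2.
    """
    delta = 0.0
    for x, y, z, w in combinations(range(len(d)), 4):
        sums = sorted(
            (d[x][y] + d[z][w], d[x][z] + d[y][w], d[x][w] + d[y][z]),
            reverse=True,
        )
        delta = max(delta, (sums[0] - sums[1]) / 2)
    return delta


def lp_fit_error(d: Metric, t: Metric, p: float = 2.0) -> float:
    """l_p norm of the entrywise gap between d and a fitted metric t.

    Only the upper triangle (i < j) is used, so each pair counts once.
    """
    total = sum(abs(d[i][j] - t[i][j]) ** p
                for i, j in combinations(range(len(d)), 2))
    return total ** (1.0 / p)


# Path 0-1-2-3 (a tree metric) and the same path with edge (0, 3) added,
# whose shortest-path distances form a 4-cycle (an edge-augmented tree).
path4 = [[abs(i - j) for j in range(4)] for i in range(4)]
cycle4 = [
    [0, 1, 2, 1],
    [1, 0, 1, 2],
    [2, 1, 0, 1],
    [1, 2, 1, 0],
]

print(four_point_delta(path4))             # 0.0: a tree metric
print(four_point_delta(cycle4))            # 1.0: the extra edge breaks tree-ness
print(lp_fit_error(cycle4, path4, p=2.0))  # 2.0: only pair (0, 3) differs, by 2
```

The exhaustive loop over quadruples is only practical for small inputs. It is kept here because it follows the definition directly.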
