Hierarchical clustering with dot products recovers hidden tree structure

05/24/2023
by Annie Gray, et al.

In this paper, we offer a new perspective on the well-established agglomerative clustering algorithm, focusing on recovery of hierarchical structure. We recommend a simple variant of the standard algorithm, in which clusters are merged by maximum average dot product rather than, for example, by minimum distance or within-cluster variance. We demonstrate that the tree output by this algorithm provides a bona fide estimate of generative hierarchical structure in data, under a generic probabilistic graphical model. The key technical innovations are to understand how hierarchical information in this model translates into tree geometry that can be recovered from data, and to characterise the benefits of simultaneously growing sample size and data dimension. On real data, we show superior tree recovery performance compared with existing approaches such as UPGMA, Ward's method, and HDBSCAN.
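
To make the merge rule concrete, the following is a minimal sketch of agglomerative clustering in which the pair of clusters with the largest average pairwise dot product is merged at each step. It is an illustration based only on the description in the abstract, not the authors' implementation: the function name `dot_product_linkage`, the output format, and the naive cubic-time search are all assumptions made for readability.

```python
import numpy as np


def dot_product_linkage(X):
    """Naive agglomerative clustering with a maximum average dot product merge rule.

    Hypothetical helper for illustration only.
    Returns a list of (cluster_a, cluster_b, score) merges. Original points
    have ids 0..n-1; the cluster created by the k-th merge has id n + k.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise dot products between all data points.
    G = X @ X.T
    # Map each active cluster id to the indices of its member points.
    clusters = {i: [i] for i in range(n)}
    merges = []
    next_id = n
    while len(clusters) > 1:
        ids = sorted(clusters)
        best = None
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                # Average dot product over all cross-cluster pairs of points.
                score = G[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or score > best[0]:
                    best = (score, a, b)
        score, a, b = best
        merges.append((a, b, float(score)))
        # Replace the two merged clusters with their union.
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return merges


# Small usage example on made-up data: two loose groups of two points each.
points = np.array([[2.0, 0.0], [1.8, 0.2], [0.0, 1.0], [0.1, 0.9]])
for a, b, score in dot_product_linkage(points):
    print(f"merge {a} and {b} (average dot product {score:.2f})")
```

Note that the rule maximises a similarity rather than minimising a distance, so merge scores typically decrease as the tree is built from the leaves upward. The exhaustive search over cluster pairs is kept for clarity; a practical implementation would update affinities incrementally.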
