Statistical Inference for Cluster Trees

05/20/2016
by Jisu Kim, et al.

A cluster tree provides a highly interpretable summary of a density function by representing the hierarchy of its high-density clusters. It is estimated using the empirical tree, which is the cluster tree constructed from a density estimator. This paper addresses the basic question of quantifying our uncertainty by assessing the statistical significance of topological features of an empirical cluster tree. We first study a variety of metrics that can be used to compare different trees, analyze their properties, and assess their suitability for inference. We then propose methods to construct and summarize confidence sets for the unknown true cluster tree. We introduce a partial ordering on cluster trees, which we use to prune some of the statistically insignificant features of the empirical tree, yielding interpretable and parsimonious cluster trees. Finally, we illustrate the proposed methods on a variety of synthetic examples and demonstrate their utility in the analysis of a Graft-versus-Host Disease (GvHD) data set.
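To make the idea of an empirical cluster tree concrete, here is a minimal sketch, not the authors' implementation. It assumes one-dimensional data, a hand-written Gaussian kernel density estimator with an arbitrarily chosen bandwidth, and a regular evaluation grid. The function names `gaussian_kde_1d` and `upper_level_set_components` are hypothetical. The sketch lists the connected components of the upper level sets of the density estimate at a few levels; the nesting of those components across levels is what the tree records.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate on a 1-D grid."""
    diffs = (grid[:, None] - samples[None, :]) / bandwidth
    kernel = np.exp(-0.5 * diffs**2) / np.sqrt(2.0 * np.pi)
    return kernel.mean(axis=1) / bandwidth

def upper_level_set_components(density, level):
    """Return (start, end) grid-index pairs, inclusive, for each
    connected component of the set {density >= level}."""
    above = density >= level
    # Pad with False so components touching the grid boundary are detected.
    padded = np.concatenate(([False], above, [False]))
    changes = np.diff(padded.astype(int))
    starts = np.flatnonzero(changes == 1)
    ends = np.flatnonzero(changes == -1) - 1
    return list(zip(starts.tolist(), ends.tolist()))

# Synthetic two-mode sample (illustrative values only).
rng = np.random.default_rng(0)
samples = np.concatenate([rng.normal(-2.0, 0.5, 200),
                          rng.normal(2.0, 0.5, 200)])
grid = np.linspace(-5.0, 5.0, 501)
density = gaussian_kde_1d(samples, grid, bandwidth=0.3)  # bandwidth is an assumption

# Sweep levels strictly between 0 and the maximum of the estimate.
levels = np.linspace(0.0, density.max(), 20, endpoint=False)[1:]
component_counts = [len(upper_level_set_components(density, lam))
                    for lam in levels]
```

In this sketch, a level at which the component count rises marks a branch of the empirical tree. Whether such a branch is statistically significant is the question the paper's confidence sets and pruning are meant to answer, and the sketch does not attempt that step.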


