Statistical Properties of the Single Linkage Hierarchical Clustering Estimator

11/24/2015
by Dekang Zhu, et al.

Distance-based hierarchical clustering (HC) methods are widely used in unsupervised data analysis, but few authors account for uncertainty in the distance data. We incorporate a statistical model of this uncertainty, expressed as corruption or noise in the pairwise distances, and investigate the problem of estimating the HC, treated as an unknown parameter, from measurements. Specifically, we focus on single linkage hierarchical clustering (SLHC) and study its geometry. We prove that, under fairly reasonable conditions on the probability distribution governing the measurements, SLHC is equivalent to maximum partial profile likelihood estimation (MPPLE) with some of the information contained in the data ignored. At the same time, we show that direct evaluation of SLHC on the maximum likelihood estimate (MLE) of the pairwise distances yields a consistent estimator. Consequently, a full MLE is expected to perform better than SLHC in recovering the correct HC for the ground truth metric.
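
As an illustration of the plug-in estimator mentioned in the abstract, the following minimal sketch averages repeated noisy measurements of each pairwise distance and then runs single linkage on the averages. It assumes independent Gaussian noise with equal variance, under which the sample mean is the MLE of each distance taken separately. This noise model, the toy points, and the helper names `mean_distances` and `single_linkage` are illustrative assumptions and are not taken from the paper.

```python
import random
from itertools import combinations


def mean_distances(samples):
    """Average repeated measurements of each pairwise distance.

    Assumption: i.i.d. Gaussian noise, so the mean is the per-pair MLE.
    """
    return {pair: sum(vals) / len(vals) for pair, vals in samples.items()}


def single_linkage(dist, n):
    """Single linkage via Kruskal-style union-find on sorted distances.

    Returns a list of merges (height, cluster_a, cluster_b) in merge order.
    """
    parent = list(range(n))
    members = {i: frozenset([i]) for i in range(n)}

    def find(i):
        # Follow parent pointers to the root, halving paths on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    merges = []
    for (i, j), d in sorted(dist.items(), key=lambda kv: kv[1]):
        ri, rj = find(i), find(j)
        if ri != rj:
            # The smallest remaining distance joining two clusters
            # is exactly the single linkage merge height.
            merges.append((d, members[ri], members[rj]))
            parent[rj] = ri
            members[ri] = members[ri] | members.pop(rj)
    return merges


# Hypothetical toy data: four points on a line forming two tight pairs.
rng = random.Random(0)
points = [0.0, 1.0, 5.0, 6.0]
n = len(points)

# 50 noisy measurements per pair (noise level 0.1 is an assumption).
samples = {
    (i, j): [abs(points[i] - points[j]) + rng.gauss(0.0, 0.1) for _ in range(50)]
    for i, j in combinations(range(n), 2)
}

merges = single_linkage(mean_distances(samples), n)
for height, a, b in merges:
    print(f"{height:.2f}: {sorted(a)} + {sorted(b)}")
```

With noise this small relative to the gap between the two pairs, the last merge should join {0, 1} with {2, 3} at a height close to the true inter-group distance of 4.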

research
11/25/2015

Maximum Likelihood Estimation for Single Linkage Hierarchical Clustering

We derive a statistical model for estimation of a dendrogram from single...
research
07/24/2018

Composite likelihood estimation for a Gaussian process under fixed domain asymptotics

We study composite likelihood estimation of the covariance parameters wi...
research
06/08/2018

Tutorial: Maximum likelihood estimation in the context of an optical measurement

The method of maximum likelihood estimation (MLE) is a widely used stati...
research
12/09/2021

Spatial clustering of extreme annual precipitation in Uruguay

The main objective of this work is to study the existence of spatial pat...
research
12/07/2022

Designing Feature Vector Representations: A case study from Chemistry

We present a case study investigating feature descriptors in the context...
research
10/19/2018

Bayesian Distance Clustering

Model-based clustering is widely-used in a variety of application areas....
research
10/17/2019

Consistency of the Buckley-Osthus model and the hierarchical preferential attachment model

This paper is concerned with statistical estimation of two preferential ...
