Modern hierarchical, agglomerative clustering algorithms

09/12/2011
by Daniel Müllner, et al.

This paper presents algorithms for hierarchical, agglomerative clustering that perform most efficiently in the general-purpose setup found in modern standard software. The requirements are: (1) the input data is given by pairwise dissimilarities between data points, although extensions to vector data are also discussed; (2) the output is a "stepwise dendrogram", a data structure shared by all implementations in current standard software. We present algorithms (old and new) that perform clustering in this setting efficiently, both in an asymptotic worst-case analysis and from a practical point of view. The main contributions of this paper are: (1) a new algorithm that is suitable for any distance update scheme and performs significantly better than the existing algorithms; (2) proofs of correctness for two algorithms by Rohlf and Murtagh, which are needed in each case for a different reason; (3) well-founded recommendations for the best current algorithms for the various agglomerative clustering schemes.
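The setup described above (pairwise dissimilarities in, stepwise dendrogram out) can be illustrated with a naive "primitive" algorithm that rescans every pair of active clusters at each step and therefore needs O(N³) time. The sketch below is for intuition only, assuming a square dissimilarity matrix as input; it is not the paper's new algorithm or the Rohlf/Murtagh methods, the function names `primitive_clustering` and `lance_williams` are hypothetical, and only single, complete and average linkage are covered. Each output row `(a, b, distance, size)` records one merge, and the cluster created at step i gets label N + i, mirroring the convention of SciPy's `scipy.cluster.hierarchy.linkage`.

```python
from typing import Dict, List, Sequence, Tuple

# One row of a stepwise dendrogram: (label_a, label_b, merge_distance, new_size).
Step = Tuple[int, int, float, int]


def lance_williams(method: str, d_ka: float, d_kb: float,
                   n_a: int, n_b: int) -> float:
    """Hypothetical helper: dissimilarity between cluster k and merged a∪b.

    Single, complete and average linkage are all special cases of the
    Lance-Williams update scheme; only these three are sketched here.
    """
    if method == "single":
        return min(d_ka, d_kb)
    if method == "complete":
        return max(d_ka, d_kb)
    if method == "average":
        # Size-weighted mean of the two old dissimilarities (UPGMA).
        return (n_a * d_ka + n_b * d_kb) / (n_a + n_b)
    raise ValueError(f"unsupported method: {method}")


def primitive_clustering(D: Sequence[Sequence[float]],
                         method: str = "single") -> List[Step]:
    """Naive O(N^3) agglomerative clustering on a square dissimilarity matrix."""
    n = len(D)
    # Dissimilarities between active clusters, keyed by (smaller, larger) label.
    dist: Dict[Tuple[int, int], float] = {
        (i, j): D[i][j] for i in range(n) for j in range(i + 1, n)
    }
    size = {i: 1 for i in range(n)}
    active = set(range(n))
    steps: List[Step] = []

    def d(p: int, q: int) -> float:
        return dist[(p, q) if p < q else (q, p)]

    # N - 1 merges; the merged cluster gets the next unused label.
    for new in range(n, 2 * n - 1):
        # Scan all active pairs for the closest one (O(N^2) per step).
        labels = sorted(active)
        a, b = min(((p, q) for i, p in enumerate(labels) for q in labels[i + 1:]),
                   key=lambda pq: d(*pq))
        delta = d(a, b)
        active -= {a, b}
        # Distances to the merged cluster come from the old distances only.
        # `new` exceeds every existing label, so (k, new) is the sorted key.
        for k in active:
            dist[(k, new)] = lance_williams(method, d(k, a), d(k, b),
                                            size[a], size[b])
        size[new] = size[a] + size[b]
        active.add(new)
        steps.append((a, b, delta, size[new]))
    return steps


if __name__ == "__main__":
    points = [0.0, 1.0, 3.0, 7.0]
    D = [[abs(x - y) for y in points] for x in points]
    print(primitive_clustering(D, "single"))
    # [(0, 1, 1.0, 2), (2, 4, 2.0, 3), (3, 5, 4.0, 4)]
```

Because each merge only reads the current dissimilarities, supporting another update scheme means changing `lance_williams` alone; the full pair rescan in every step is the cost that more efficient algorithms, such as those discussed in the paper, aim to avoid.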

research
01/29/2020

Fully-Dynamic All-Pairs Shortest Paths: Improved Worst-Case Time and Space Bounds

Given a directed weighted graph G=(V,E) undergoing vertex insertions and...
research
11/27/2011

Ward's Hierarchical Clustering Method: Clustering Criterion and Agglomerative Algorithm

The Ward error sum of squares hierarchical clustering method has been ve...
research
12/15/2019

Bisect and Conquer: Hierarchical Clustering via Max-Uncut Bisection

Hierarchical Clustering is an unsupervised data analysis method which ha...
research
02/23/2021

Maximizing Agreements for Ranking, Clustering and Hierarchical Clustering via MAX-CUT

In this paper, we study a number of well-known combinatorial optimizatio...
research
06/30/2017

Agglomerative Clustering of Growing Squares

We study an agglomerative clustering problem motivated by interactive gl...
research
02/21/2020

Inverted-File k-Means Clustering: Performance Analysis

This paper presents an inverted-file k-means clustering algorithm (IVF) ...
research
10/19/2015

Clustering is Easy When ....What?

It is well known that most of the common clustering objectives are NP-ha...