Benchmarking Minimax Linkage

06/07/2019
by   Xiao Hui Tai, et al.

Minimax linkage was first introduced by Ao et al. [3] in 2004 as an alternative to the standard linkage methods used in hierarchical clustering. Minimax linkage relies on distances to a prototype for each cluster; this prototype can be thought of as a representative object in the cluster, which improves the interpretability of clustering results. Bien and Tibshirani analyzed properties of this method in 2011 [2], popularizing it within the statistics community. They also compared minimax linkage to standard linkage methods, using five data sets and two evaluation metrics (distance to prototype and misclassification rate). To expand upon their work and evaluate minimax linkage more comprehensively, our benchmark study evaluates the method with multiple performance metrics on several well-described data sets. We also make all code and data publicly available through an R package, for full reproducibility. Consistent with [2], we find that minimax linkage often produces the smallest maximum minimax radius of all linkage methods, meaning that the objects in each cluster lie close to their prototype. This holds across a range of values for the total number of clusters (k). However, it is not always the case, and special attention should be paid to the case where k is the true known value. At the true k, minimax linkage is not always the best performer on the evaluation metrics studied, including maximum minimax radius. This paper was motivated by the IFCS Cluster Benchmarking Task Force's call for clustering benchmark studies and by the white paper [5], which put forth guidelines and principles for comprehensive benchmarking in clustering. Our work is designed to be a neutral benchmark study of minimax linkage.
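
To make the prototype and radius terminology concrete, the following is a minimal Python sketch, not the authors' R package. It assumes the usual definitions associated with [2]: the radius of a cluster around one of its points is the largest distance from that point to any other member, the prototype is the member with the smallest such radius (the minimax radius), and the minimax linkage between two clusters is the minimax radius of their union. The function names, the toy points, and the use of Euclidean distance are illustrative assumptions.

```python
from math import dist  # Euclidean distance, Python 3.8+


def minimax_radius(points, members):
    """Return (prototype_index, radius) for the cluster given by `members`.

    Assumed definition: the prototype is the member whose farthest
    fellow member is closest; that farthest distance is the radius.
    """
    best_idx, best_r = None, float("inf")
    for i in members:
        # Radius of the cluster when centred on candidate point i.
        r = max(dist(points[i], points[j]) for j in members)
        if r < best_r:
            best_idx, best_r = i, r
    return best_idx, best_r


def minimax_linkage(points, g, h):
    """Minimax linkage between clusters g and h: radius of their union."""
    return minimax_radius(points, list(g) + list(h))[1]


def max_minimax_radius(points, clusters):
    """Largest minimax radius over all clusters in a partition."""
    return max(minimax_radius(points, c)[1] for c in clusters)


# Hypothetical toy data: five points on a line, split into two clusters.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
clusters = [[0, 1, 2], [3, 4]]

proto, radius = minimax_radius(points, clusters[0])
print(proto, radius)
print(max_minimax_radius(points, clusters))
print(minimax_linkage(points, clusters[0], clusters[1]))
```

In this sketch, a small maximum minimax radius for a partition means every object lies within that distance of its cluster's prototype, which is the sense of "tight" used in the abstract.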


