The Impact of Isolation Kernel on Agglomerative Hierarchical Clustering Algorithms

10/12/2020
by Xin Han, et al.

Agglomerative hierarchical clustering (AHC) is a popular clustering approach. Existing AHC methods, which are based on a distance measure, share one key issue: they have difficulty identifying adjacent clusters with varied densities, regardless of the cluster extraction method applied to the resulting dendrogram. In this paper, we identify the root cause of this issue and show that using a data-dependent kernel (instead of a distance or an existing kernel) provides an effective means to address it. We analyse the condition under which existing AHC methods fail to extract clusters effectively, and the reason why the data-dependent kernel is an effective remedy. This leads to a new approach to kernelise existing hierarchical clustering algorithms, such as traditional AHC algorithms, HDBSCAN, GDL and PHA. For each of these algorithms, our empirical evaluation shows that a recently introduced Isolation Kernel produces a higher-quality or purer dendrogram than distance, Gaussian Kernel and adaptive Gaussian Kernel.
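To make the idea of "kernelising" AHC concrete, here is a minimal sketch, not the authors' implementation. It assumes the Voronoi-diagram variant of Isolation Kernel: in each of t partitionings, psi points are sampled from the data itself, every point is assigned to the cell of its nearest sampled point, and K(x, y) is the fraction of partitionings in which x and y share a cell. Because the sampled points are denser in dense regions, cells there are smaller, which is what makes the similarity data-dependent. The kernel is then turned into the dissimilarity 1 - K and passed to SciPy's average-linkage AHC; that conversion, the helper names isolation_kernel and ik_ahc, and the default values of psi and t are illustrative assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import cdist, squareform


def isolation_kernel(X, psi=16, t=200, seed=0):
    """Sketch of a Voronoi-based Isolation Kernel similarity matrix (values in [0, 1])."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    K = np.zeros((n, n))
    for _ in range(t):
        # Sample psi points from the data set itself: this makes the kernel data-dependent.
        centres = X[rng.choice(n, size=psi, replace=False)]
        # Each point falls in the Voronoi cell of its nearest sampled point.
        cell = cdist(X, centres).argmin(axis=1)
        # Count 1 for every pair of points that share a cell in this partitioning.
        K += cell[:, None] == cell[None, :]
    # Average over the t partitionings.
    return K / t


def ik_ahc(X, n_clusters, psi=16, t=200, method="average", seed=0):
    """Hypothetical helper: AHC on the dissimilarity 1 - K instead of a distance."""
    D = 1.0 - isolation_kernel(X, psi, t, seed)
    np.fill_diagonal(D, 0.0)  # every point shares its own cell, so this is already ~0
    Z = linkage(squareform(D, checks=False), method=method)  # condensed input
    return fcluster(Z, t=n_clusters, criterion="maxclust")  # labels start at 1
```

For example, `ik_ahc(X, n_clusters=2)` cuts the dendrogram into two flat clusters. Other linkage methods, or the kernel matrix itself, could be substituted in the same way; the choice of psi controls how fine the partitioning is.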

research
09/29/2021

Breaking the curse of dimensionality with Isolation Kernel

The curse of dimensionality has been studied in different aspects. Howev...
research
06/30/2019

Nearest-Neighbour-Induced Isolation Similarity and its Impact on Density-Based Clustering

A recent proposal of data dependent similarity called Isolation Kernel/S...
research
11/27/2019

K-MACE and Kernel K-MACE Clustering

Determining the correct number of clusters (CNC) is an important task in...
research
11/29/2012

Overlapping clustering based on kernel similarity metric

Producing overlapping schemes is a major issue in clustering. Recent pro...
research
06/24/2019

Improving Stochastic Neighbour Embedding fundamentally with a well-defined data-dependent kernel

We identify a fundamental issue in the popular Stochastic Neighbour Embe...
research
07/02/2019

Isolation Kernel: The X Factor in Efficient and Effective Large Scale Online Kernel Learning

Large scale online kernel learning aims to build an efficient and scalab...
research
06/06/2016

On Robustness of Kernel Clustering

Clustering is one of the most important unsupervised problems in machine...
