A-Ward_pβ: Effective hierarchical clustering using the Minkowski metric and a fast k-means initialisation

In this paper we make two novel contributions to hierarchical clustering. First, we introduce an anomalous pattern initialisation method for hierarchical clustering algorithms, called A-Ward, which substantially reduces the time they take to converge. This method generates an initial partition with a sufficiently large number of clusters, so that the cluster merging process can start from this partition rather than from the trivial partition composed solely of singletons. Our second contribution is an extension of the Ward and Ward_p algorithms to the situation where the feature weight exponent can differ from the exponent of the Minkowski distance. This new method, called A-Ward_pβ, is able to generate a much wider variety of clustering solutions. We also demonstrate that its parameters can be estimated reasonably well by using a cluster validity index. We perform numerous experiments using data sets with two types of noise: insertion of noise features, and blurring of within-cluster values of some features. These experiments allow us to conclude that: (i) our anomalous pattern initialisation method does indeed reduce the time a hierarchical clustering algorithm takes to complete, without negatively impacting its cluster recovery ability; (ii) A-Ward_pβ provides better cluster recovery than both Ward and Ward_p.
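The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the abstract does not give the paper's formulas, so everything here is an assumption made for illustration: `ward_merge` applies the classical Ward criterion (squared Euclidean distance between centroids, scaled by cluster sizes) but starts from a supplied initial partition instead of singletons, and `weighted_minkowski` shows one plausible form of a dissimilarity in which the feature weight exponent `beta` is separate from the Minkowski exponent `p`. Both function names and all parameter names are hypothetical.

```python
import numpy as np


def weighted_minkowski(x, y, w, p, beta):
    """Assumed form: sum over features of w_v**beta * |x_v - y_v|**p.

    The weight exponent (beta) and the distance exponent (p) are
    independent parameters; setting beta == p ties them together.
    """
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    return float(np.sum(w ** beta * np.abs(x - y) ** p))


def ward_merge(X, labels, k):
    """Classical Ward merging, started from an initial partition.

    X      : (n, d) data matrix
    labels : initial cluster label per row (need not be singletons)
    k      : number of clusters at which merging stops
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels).copy()
    while True:
        ids = np.unique(labels)
        if len(ids) <= k:
            return labels
        best = None
        # Find the pair of clusters whose merge is cheapest under Ward.
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                A, B = X[labels == a], X[labels == b]
                na, nb = len(A), len(B)
                diff = A.mean(axis=0) - B.mean(axis=0)
                cost = na * nb / (na + nb) * np.sum(diff ** 2)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        labels[labels == b] = a  # merge cluster b into cluster a


# Toy usage: three initial clusters are merged down to two.
X_demo = [[0, 0], [0, 1], [10, 10], [10, 11], [0.5, 0.5]]
merged = ward_merge(X_demo, [0, 0, 1, 1, 2], k=2)
```

Starting from a partition with far fewer clusters than data points shortens the merging loop, since each iteration compares every remaining pair of clusters. This sketch recomputes centroids on every pass for readability rather than speed.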

research
06/05/2021

Cluster Analysis via Random Partition Distributions

Hierarchical and k-medoids clustering are deterministic clustering algor...
research
11/29/2017

HSC: A Novel Method for Clustering Hierarchies of Networked Data

Hierarchical clustering is one of the most powerful solutions to the pro...
research
06/07/2019

Learning Clustered Representation for Complex Free Energy Landscapes

In this paper we first analyzed the inductive bias underlying the data s...
research
04/27/2020

A Centroid Auto-Fused Hierarchical Fuzzy c-Means Clustering

Like k-means and Gaussian Mixture Model (GMM), fuzzy c-means (FCM) with ...
research
08/18/2020

EXCLUVIS: A MATLAB GUI Software for Comparative Study of Clustering and Visualization of Gene Expression Data

Clustering is a popular data mining technique that aims to partition an ...
research
03/09/2020

Probabilistic Partitive Partitioning (PPP)

Clustering is a NP-hard problem. Thus, no optimal algorithm exists, heur...
research
12/01/2020

Improving cluster recovery with feature rescaling factors

The data preprocessing stage is crucial in clustering. Features may desc...
