Recovering the number of clusters in data sets with noise features using feature rescaling factors

In this paper we introduce three methods for rescaling data sets, aiming to improve the likelihood that cluster validity indexes return the true number of spherical Gaussian clusters in the presence of additional noise features. Our methods obtain feature rescaling factors by taking into account the structure of a given data set and the intuitive idea that different features may have different degrees of relevance at different clusters. We experiment with the Silhouette (using the squared Euclidean, Manhattan, and p-th power of the Minkowski distances), Dunn's, Calinski-Harabasz and Hartigan indexes on data sets with spherical Gaussian clusters, with and without noise features. We conclude that our methods indeed increase the chances of estimating the true number of clusters in a data set.
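
To make the setting concrete, the following is a minimal sketch of how a noise feature can depress a cluster validity index and how down-weighting that feature can restore it. It uses the standard Calinski-Harabasz definition, (B / (k - 1)) / (W / (n - k)), on a toy data set. The function names (`calinski_harabasz`, `rescale`), the toy points, and the hand-picked weights are assumptions made for illustration; in particular, the weights are not the rescaling factors proposed in the paper, whose computation the abstract does not describe.

```python
def calinski_harabasz(points, labels):
    """Calinski-Harabasz index: (B / (k - 1)) / (W / (n - k)).

    B is the between-cluster and W the within-cluster sum of squared
    Euclidean distances. Higher values suggest a better partition.
    """
    n = len(points)
    dims = len(points[0])
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    k = len(clusters)
    if k < 2 or k >= n:
        raise ValueError("the index needs 2 <= k < n")

    grand_mean = [sum(p[j] for p in points) / n for j in range(dims)]
    between = 0.0
    within = 0.0
    for members in clusters.values():
        size = len(members)
        centroid = [sum(p[j] for p in members) / size for j in range(dims)]
        between += size * sum((c - g) ** 2 for c, g in zip(centroid, grand_mean))
        within += sum(
            sum((x - c) ** 2 for x, c in zip(p, centroid)) for p in members
        )
    if within == 0.0:
        raise ValueError("within-cluster dispersion is zero")
    return (between / (k - 1)) / (within / (n - k))


def rescale(points, weights):
    """Multiply each feature by its weight (hypothetical, hand-picked weights)."""
    return [tuple(x * w for x, w in zip(p, weights)) for p in points]


# Toy data: two well-separated clusters on a single informative feature.
labels = [0, 0, 1, 1]
clean = [(0.0,), (1.0,), (10.0,), (11.0,)]

# The same points with a second feature that carries no cluster structure.
noisy = [(0.0, 0.0), (1.0, 8.0), (10.0, 8.0), (11.0, 0.0)]

# Illustrative weights only: keep the informative feature, shrink the noise.
rescaled = rescale(noisy, (1.0, 0.1))

ch_clean = calinski_harabasz(clean, labels)
ch_noisy = calinski_harabasz(noisy, labels)
ch_rescaled = calinski_harabasz(rescaled, labels)

print(ch_clean, ch_noisy, ch_rescaled)
```

On this toy data the index for the true partition is 200.0 without the noise feature, falls to about 3.08 once the noise feature is added, and recovers to about 121.95 after the noise feature is down-weighted. A weak index value for the true partition is what makes it harder for such an index to single out the true number of clusters.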


