Improving cluster recovery with feature rescaling factors

The data preprocessing stage is crucial in clustering. Features may describe entities using different scales. To rectify this, one usually applies feature normalisation, which rescales features so that none of them overpowers the others in the objective function of the selected clustering algorithm. In this paper, we argue that the rescaling procedure should not treat all features identically. Instead, it should favour the features that are more meaningful for clustering. With this in mind, we introduce a feature rescaling method that takes into account the within-cluster degree of relevance of each feature. Our comprehensive simulation study, carried out on real and synthetic data, with and without noise features, demonstrates that clustering methods that use the proposed data normalisation strategy clearly outperform those that use traditional data normalisation.
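The contrast the abstract draws can be illustrated with a minimal sketch. It is not the authors' method, whose details are not given here. It assumes that an initial partition (`labels`) is already available, and that a feature's within-cluster relevance can be approximated by the normalised inverse of its within-cluster dispersion. The function names `range_normalise`, `cluster_feature_weights` and `rescale_by_cluster` are hypothetical.

```python
import numpy as np


def range_normalise(X):
    """Conventional normalisation: centre each feature, divide by its range."""
    X = np.asarray(X, dtype=float)
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0  # constant features are only centred
    return (X - X.mean(axis=0)) / span


def cluster_feature_weights(X, labels, eps=1e-12):
    """Assumed relevance measure: normalised inverse within-cluster dispersion.

    Returns the sorted cluster ids and a (clusters x features) weight matrix
    whose rows sum to 1. A feature with low dispersion inside a cluster
    receives a high weight for that cluster.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    weights = np.empty((clusters.size, X.shape[1]))
    for row, k in enumerate(clusters):
        members = X[labels == k]
        # Sum of squared deviations from the cluster centroid, per feature.
        dispersion = ((members - members.mean(axis=0)) ** 2).sum(axis=0) + eps
        inverse = 1.0 / dispersion
        weights[row] = inverse / inverse.sum()
    return clusters, weights


def rescale_by_cluster(X, labels):
    """Multiply each entity's features by the weights of its own cluster."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    clusters, weights = cluster_feature_weights(X, labels)
    rescaled = np.empty_like(X)
    for row, k in enumerate(clusters):
        mask = labels == k
        rescaled[mask] = X[mask] * weights[row]
    return rescaled


# Toy data: feature 0 separates the two clusters, feature 1 is noise.
X = np.array([[0.0, 0.0], [0.1, 1.0], [0.0, 0.5],
              [1.0, 0.2], [0.9, 0.9], [1.0, 0.4]])
labels = np.array([0, 0, 0, 1, 1, 1])

Xn = range_normalise(X)                       # treats all features identically
clusters, w = cluster_feature_weights(Xn, labels)
Z = rescale_by_cluster(Xn, labels)            # favours the informative feature
```

In this toy setting the weights for feature 0 exceed those for feature 1 in both clusters, so the noise feature contributes less to any distance computed on `Z` than on `Xn`. The rescaled data could then be passed to a clustering algorithm in place of the conventionally normalised data.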

research
10/03/2018

Real-time Clustering Algorithm Based on Predefined Level-of-Similarity

This paper proposes a centroid-based clustering algorithm which is capab...
research
09/13/2018

Discovering Features in Sr_14Cu_24O_41 Neutron Single Crystal Diffraction Data by Cluster Analysis

To address the SMC'18 data challenge, "Discovering Features in Sr_14Cu_2...
research
12/24/2019

An Entropy-based Variable Feature Weighted Fuzzy k-Means Algorithm for High Dimensional Data

This paper presents a new fuzzy k-means algorithm for the clustering of ...
research
08/07/2020

Hierarchical Clusterings of Unweighted Graphs

We study the complexity of finding an optimal hierarchical clustering of...
research
11/03/2016

A-Ward_pβ: Effective hierarchical clustering using the Minkowski metric and a fast k-means initialisation

In this paper we make two novel contributions to hierarchical clustering...
research
02/22/2016

Recovering the number of clusters in data sets with noise features using feature rescaling factors

In this paper we introduce three methods for re-scaling data sets aiming...
research
03/31/2020

A Clustering Framework for Lexical Normalization of Roman Urdu

Roman Urdu is an informal form of the Urdu language written in Roman scr...
