Feature selection or extraction decision process for clustering using PCA and FRSD

This paper concerns the critical decision of whether to extract or to select features before applying a clustering algorithm. Evaluating feature importance is not straightforward in this setting, because the most popular methods for doing so are designed for supervised learning, whereas clustering is an unsupervised method: there is no known output label to match the input data. This paper proposes a new method for choosing the more suitable dimensionality reduction approach (selection or extraction) according to the data scientist's parameters, with the aim of applying a clustering process at the end. It uses the Feature Ranking Process Based on Silhouette Decomposition (FRSD) algorithm, a Principal Component Analysis (PCA) algorithm, and a K-Means algorithm along with its metric, the Silhouette Index (SI). This paper presents five use cases based on a smart city dataset. This research also discusses the impacts, advantages, and disadvantages of each choice that can be made in this unsupervised learning process.
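The general shape of such a pipeline (PCA for extraction, a feature ranking for selection, K-Means on each reduced dataset, then the SI to compare them) can be sketched in Python. This is a minimal, NumPy-only illustration under stated assumptions and is not the authors' method. The decision rule (keep whichever reduced representation yields the higher SI), the helper names (`kmeans`, `silhouette_index`, `pca_extract`, `rank_features_univariate`, `choose_reduction`) and the parameters `k` and `n_dims` are all hypothetical. In particular, `rank_features_univariate` is a simple stand-in that scores each feature by the SI of K-Means run on that feature alone. It is not FRSD, whose silhouette decomposition is not reproduced here. The SI computed below is the mean of s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from point i to the other members of its cluster and b(i) is the smallest mean distance from i to the members of any other cluster.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's K-Means; returns one integer cluster label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every point to its nearest center (squared Euclidean distance).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each center to the mean of its points; keep it if its cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def silhouette_index(X, labels):
    """Mean silhouette s(i) = (b(i) - a(i)) / max(a(i), b(i)) over all points."""
    clusters = np.unique(labels)
    if len(clusters) < 2:
        return 0.0  # SI is undefined for a single cluster; 0 is a convention here
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            scores.append(0.0)  # singleton cluster: s(i) = 0 by convention
            continue
        a = D[i, same].sum() / (n_same - 1)  # D[i, i] = 0, so i itself is excluded
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        m = max(a, b)
        scores.append(0.0 if m == 0 else (b - a) / m)
    return float(np.mean(scores))


def pca_extract(Z, n_components):
    """Feature extraction: project centered data onto its top principal components."""
    Zc = Z - Z.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Zc, full_matrices=False)
    return Zc @ Vt[:n_components].T


def rank_features_univariate(Z, k):
    """HYPOTHETICAL stand-in for FRSD (not FRSD itself): rank feature indices by
    the SI of K-Means run on each feature alone, best first."""
    scores = [silhouette_index(Z[:, [j]], kmeans(Z[:, [j]], k)) for j in range(Z.shape[1])]
    return [int(j) for j in np.argsort(scores)[::-1]]


def choose_reduction(X, k, n_dims):
    """Hypothetical decision rule: keep whichever reduction gives the higher SI."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize; assumes no constant column
    X_ext = pca_extract(Z, n_dims)                          # extraction
    X_sel = Z[:, rank_features_univariate(Z, k)[:n_dims]]   # selection
    si_ext = silhouette_index(X_ext, kmeans(X_ext, k))
    si_sel = silhouette_index(X_sel, kmeans(X_sel, k))
    choice = "extraction (PCA)" if si_ext >= si_sel else "selection"
    return choice, si_ext, si_sel


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic data: two informative features (two separated blobs) plus three noise features.
    informative = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(5, 0.3, (30, 2))])
    X = np.hstack([informative, rng.normal(0, 1, (60, 3))])
    print(choose_reduction(X, k=2, n_dims=2))
```

On the synthetic data above, the stand-in ranking places the two informative features first. Which branch wins depends on the data, on `k` and on `n_dims`, so in this sketch the outcome is driven by the same kind of user-chosen parameters that the abstract refers to.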
