
Feature selection or extraction decision process for clustering using PCA and FRSD

This paper addresses the critical decision of whether to extract or select features before applying a clustering algorithm. Evaluating feature importance is not straightforward in this setting, because the most popular methods for doing so are designed for supervised learning, whereas clustering is unsupervised: there are no known output labels to match against the input data. The paper proposes a new method for choosing the best dimensionality reduction approach (selection or extraction) according to the data scientist's parameters, with the goal of applying a clustering process afterwards. It uses the Feature Ranking Process Based on Silhouette Decomposition (FRSD) algorithm, Principal Component Analysis (PCA), and the K-Means algorithm together with the Silhouette Index (SI) as its evaluation metric. Five use cases based on a smart city dataset are presented. The paper also discusses the impacts, advantages, and disadvantages of each choice that can be made in this unsupervised learning process.
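
To make the selection-versus-extraction comparison concrete, here is a minimal Python sketch using scikit-learn. It is not the paper's method: FRSD is not reproduced here, and the ranking step is a simplified stand-in that scores each feature by the silhouette of a K-Means clustering on that feature alone. The function names (`kmeans_silhouette`, `rank_features_by_silhouette`, `compare_reductions`), the synthetic data, and the parameter values are all assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def kmeans_silhouette(X, n_clusters=3, seed=0):
    """Cluster X with K-Means and return the mean Silhouette Index."""
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X)
    return silhouette_score(X, labels)


def rank_features_by_silhouette(X, n_clusters=3, seed=0):
    """Stand-in ranking (NOT FRSD): score each feature clustered on its own."""
    scores = [kmeans_silhouette(X[:, [j]], n_clusters, seed) for j in range(X.shape[1])]
    return np.argsort(scores)[::-1]  # best-scoring feature first


def compare_reductions(X, n_keep=2, n_clusters=3, seed=0):
    """Compare feature selection and PCA extraction at the same output dimension."""
    X = StandardScaler().fit_transform(X)

    # Selection: keep the n_keep top-ranked original features.
    top = rank_features_by_silhouette(X, n_clusters, seed)[:n_keep]
    X_sel = X[:, top]

    # Extraction: project onto the first n_keep principal components.
    X_pca = PCA(n_components=n_keep, random_state=seed).fit_transform(X)

    return {
        "selection": kmeans_silhouette(X_sel, n_clusters, seed),
        "extraction": kmeans_silhouette(X_pca, n_clusters, seed),
        "selected_features": sorted(top.tolist()),
    }


# Synthetic data (an assumption): 2 informative features plus 3 noise features.
X_blobs, _ = make_blobs(n_samples=300, centers=3, n_features=2, random_state=0)
rng = np.random.default_rng(0)
X_demo = np.hstack([X_blobs, rng.normal(size=(300, 3))])

result = compare_reductions(X_demo, n_keep=2, n_clusters=3)
print(result)
```

Selection keeps the original, interpretable features, while extraction produces linear combinations that can be harder to interpret. A comparison of this kind is one way to weigh that trade-off against clustering quality as measured by the Silhouette Index.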


06/17/2022

DPDR: A novel machine learning method for the Decision Process for Dimensionality Reduction

This paper discusses the critical decision process of extracting or sele...
12/22/2020

Unsupervised Machine learning methods for city vitality index

This paper concerns the challenge to evaluate and predict a district vit...
11/15/2022

Solving clustering as ill-posed problem: experiments with K-Means algorithm

In this contribution, the clustering procedure based on K-Means algorith...
11/17/2012

Data Clustering via Principal Direction Gap Partitioning

We explore the geometrical interpretation of the PCA based clustering al...
08/23/2021

Cube Sampled K-Prototype Clustering for Featured Data

Clustering large amount of data is becoming increasingly important in th...
11/17/2022

Data Dimension Reduction makes ML Algorithms efficient

Data dimension reduction (DDR) is all about mapping data from high dimen...
04/28/2022

Representative period selection for power system planning using autoencoder-based dimensionality reduction

Power sector capacity expansion models (CEMs) that are used for studying...