A review of systematic selection of clustering algorithms and their evaluation

06/24/2021
by Marc Wegmann et al.

Data analysis plays an indispensable role in value creation in industry. In this context, cluster analysis can explore a given dataset with little or no prior knowledge and identify unknown patterns. This becomes even more important as (big) data complexity increases along the dimensions of volume, variety, and velocity. Many tools for cluster analysis have been developed over the years, and the variety of clustering algorithms is huge. Because the choice of clustering procedure is crucial to the results of the data analysis, users need support when extracting knowledge from raw data. The objective of this paper is therefore to identify a systematic selection logic for clustering algorithms and the corresponding validation concepts. The goal is to enable potential users to choose the algorithm that best fits their needs and the properties of their underlying clustering problem. Users are also supported in selecting suitable validation concepts to make sense of the clustering results. Based on a comprehensive literature review, the paper provides assessment criteria for evaluating clustering methods and selecting validation concepts. The criteria are applied to several common algorithms, and algorithm selection is supported by pseudocode-based routines that take the underlying data structure into account.


research
09/18/2017

A Comparative Quantitative Analysis of Contemporary Big Data Clustering Algorithms for Market Segmentation in Hospitality Industry

The hospitality industry is one of the data-rich industries that receive...
research
03/01/2021

Validation of cluster analysis results on validation data: A systematic framework

Cluster analysis refers to a wide range of data analytic techniques for ...
research
04/30/2021

Performance evaluation results of evolutionary clustering algorithm star for clustering heterogeneous datasets

This article presents the data used to evaluate the performance of evolu...
research
04/20/2023

Salience-based stakeholder selection to maintain stakeholder coverage in solving the next release problem

Stakeholders quantification plays a basic role in selecting the appropri...
research
10/03/2022

Review of Clustering Methods for Functional Data

Functional data clustering is to identify heterogeneous morphological pa...
research
02/13/2021

HAWKS: Evolving Challenging Benchmark Sets for Cluster Analysis

Comprehensive benchmarking of clustering algorithms is rendered difficul...
research
05/06/2020

Integrating Prior Knowledge in Mixed Initiative Social Network Clustering

We propose a new paradigm—called PK-clustering—to help social scientists...
