Comparing clusterings and numbers of clusters by aggregation of calibrated clustering validity indexes

02/05/2020
by Serhat Emre Akhanli, et al.

A key issue in cluster analysis is the choice of an appropriate clustering method and the determination of the best number of clusters. Different clusterings are optimal on the same data set according to different criteria, and the choice of such criteria depends on the context and aim of clustering. Researchers therefore need to consider which data-analytic characteristics their intended clusters should have, for example within-cluster homogeneity, between-cluster separation, and stability. Here, a set of internal clustering validity indexes measuring different aspects of clustering quality is proposed, including some indexes from the literature. Users can choose the indexes that are relevant to the application at hand. To measure the overall quality of a clustering (for comparing clusterings from different methods and/or different numbers of clusters), the index values are calibrated so that they can be aggregated. Calibration is relative to a set of random clusterings of the same data. Two specific aggregated indexes are proposed and compared with existing indexes on simulated and real data.
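The idea of calibrating index values against random clusterings and then aggregating them can be illustrated with a small sketch. It is not the authors' method. It assumes two stand-in indexes (average within-cluster distance for homogeneity, minimum between-cluster distance for separation), uses random balanced label assignments with the same number of clusters as the reference clusterings, standardizes each index by the mean and standard deviation over those references, and combines the standardized values with a weighted mean. The index choices, the random-clustering scheme, and names such as `calibrated_aggregate` and `INDEXES` are hypothetical illustrations, not definitions from the paper or any published package.

```python
import math
import random
import statistics


def avg_within_distance(points, labels):
    """Mean Euclidean distance over all same-cluster pairs (lower = more homogeneous)."""
    total, count = 0.0, 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if labels[i] == labels[j]:
                total += math.dist(points[i], points[j])
                count += 1
    return total / count if count else 0.0


def min_between_distance(points, labels):
    """Smallest distance between points in different clusters (higher = better separated).

    Assumes at least two clusters are present; otherwise returns math.inf.
    """
    best = math.inf
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if labels[i] != labels[j]:
                best = min(best, math.dist(points[i], points[j]))
    return best


# Hypothetical index set: name -> (index function, whether larger values are better).
INDEXES = {
    "within": (avg_within_distance, False),
    "separation": (min_between_distance, True),
}


def random_clustering(n, k, rng):
    """Assumed reference clustering: a random, roughly balanced assignment of n points to k labels."""
    labels = [i % k for i in range(n)]
    rng.shuffle(labels)
    return labels


def calibrated_aggregate(points, labels, k, weights, n_random=200, seed=0):
    """Standardize each index against random clusterings, then return a weighted mean.

    Calibrated values are oriented so that larger always means better.
    """
    rng = random.Random(seed)
    refs = [random_clustering(len(points), k, rng) for _ in range(n_random)]
    score = 0.0
    for name, (index, larger_is_better) in INDEXES.items():
        ref_values = [index(points, r) for r in refs]
        mu = statistics.mean(ref_values)
        sd = statistics.stdev(ref_values) or 1.0  # guard against a degenerate reference set
        z = (index(points, labels) - mu) / sd
        score += weights[name] * (z if larger_is_better else -z)
    return score / sum(weights.values())


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
weights = {"within": 1.0, "separation": 1.0}
print(calibrated_aggregate(points, [0, 0, 0, 1, 1, 1], 2, weights))  # well-separated split
print(calibrated_aggregate(points, [0, 1, 0, 1, 0, 1], 2, weights))  # mixed split
```

In this sketch the well-separated split scores higher than the mixed one, because it has smaller within-cluster distances and a larger minimum separation than the random references used for standardization. The equal weights stand in for the user's choice of which aspects matter; to compare different numbers of clusters, each candidate would here be scored against random clusterings with its own k.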
