Clustering performance analysis using new correlation based cluster validity indices

by Nathakhun Wiroonsri, et al.

Various cluster validity measures are used to evaluate clustering results. One of the main objectives of using these measures is to find the optimal number of clusters, which is unknown in advance. Some measures work well for clusters with different densities, sizes and shapes. Yet one weakness that those validity measures share is that they sometimes provide only one clear optimal number of clusters. Because the true number is unknown, there may be more than one plausible suboptimal option that a user may wish to choose, depending on the application. We develop two new cluster validity indices based on the correlation between the actual distance between a pair of data points and the distance between the centroids of the clusters in which the two points are located. Our proposed indices consistently yield several peaks at different numbers of clusters, which overcomes the weakness stated above. Furthermore, the introduced correlation can also be used to evaluate the quality of a selected clustering result. Several experiments in different scenarios, including the well-known iris data set and a real-world marketing application, have been conducted to compare the proposed validity indices with several well-known ones.
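The correlation described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the two proposed indices are built from this correlation, so the code below only computes a plain Pearson correlation between pairwise point distances and the distances between the centroids of the clusters the points are assigned to. The function name `centroid_distance_correlation`, the use of Euclidean distance, and the choice of Pearson correlation are assumptions made for illustration, not the authors' definition.

```python
import numpy as np


def centroid_distance_correlation(X, labels):
    """Pearson correlation between pairwise point distances and the
    distances between the centroids of the points' assigned clusters.

    Illustrative assumption: Euclidean distance and Pearson correlation.
    Returns nan when all points fall in a single cluster, because every
    centroid distance is then zero and the correlation is undefined.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)

    # Map arbitrary labels to 0..k-1 and compute one centroid per cluster.
    uniq, inv = np.unique(labels, return_inverse=True)
    centroids = np.vstack([X[inv == k].mean(axis=0) for k in range(len(uniq))])

    # Indices of all unordered pairs (i < j).
    i, j = np.triu_indices(len(X), k=1)

    # Actual distance between the two points of each pair.
    point_dist = np.linalg.norm(X[i] - X[j], axis=1)

    # Distance between the centroids of the clusters the two points belong to
    # (zero when both points are in the same cluster).
    centroid_dist = np.linalg.norm(centroids[inv[i]] - centroids[inv[j]], axis=1)

    return float(np.corrcoef(point_dist, centroid_dist)[0, 1])


# Two well-separated groups of three points each (toy data for illustration).
X_demo = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
good_labels = [0, 0, 0, 1, 1, 1]   # matches the visible grouping
bad_labels = [0, 1, 0, 1, 0, 1]    # mixes the two groups

print(centroid_distance_correlation(X_demo, good_labels))
print(centroid_distance_correlation(X_demo, bad_labels))
```

On this toy data, a labeling that matches the two groups gives a correlation close to 1, while a labeling that mixes them gives a much lower value. To look for several candidate numbers of clusters, as the abstract suggests, one could evaluate such a quantity over a range of cluster counts and inspect the peaks.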




