An Effective and Efficient Approach for Clusterability Evaluation

02/22/2016
by Margareta Ackerman, et al.

Clustering is an essential data mining tool that aims to discover inherent cluster structure in data. As such, the study of clusterability, which evaluates whether data possesses such structure, is an integral part of cluster analysis. Yet, despite their central role in the theory and application of clustering, current notions of clusterability fall short in two crucial respects that render them impractical: most are computationally infeasible, and the others fail to classify the structure of real datasets. In this paper, we propose a novel approach to clusterability evaluation that is both computationally efficient and successfully captures the structure of real data. Our method applies multimodality tests to the (one-dimensional) set of pairwise distances computed from the original, potentially high-dimensional data. We present extensive analyses of our approach for both the Dip and Silverman multimodality tests on real data as well as 17,000 simulations, demonstrating the success of our approach as the first practical notion of clusterability.
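The core idea in the abstract, reducing the data to its one-dimensional set of pairwise distances and then checking that set for multiple modes, can be illustrated with a minimal Python sketch. This is not the authors' implementation and it is not a full Dip or Silverman test: it only computes the critical bandwidth (the smallest Gaussian kernel bandwidth at which the density estimate of the distances has a single mode), the statistic on which Silverman's test is built, and it omits the bootstrap calibration a real test would need. The function names, the synthetic data, the grid size, and the bisection depth are all assumptions made for illustration.

```python
import math
import random
from itertools import combinations


def pairwise_distances(points):
    """Euclidean distance between every unordered pair of points."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]


def count_modes(values, bandwidth, grid_size=100):
    """Count interior local maxima of an (unnormalized) Gaussian KDE on a grid."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + i * step for i in range(grid_size)]
    density = [
        sum(math.exp(-0.5 * ((g - v) / bandwidth) ** 2) for v in values)
        for g in grid
    ]
    return sum(
        1
        for i in range(1, grid_size - 1)
        if density[i - 1] < density[i] >= density[i + 1]
    )


def critical_bandwidth(values, iterations=20):
    """Bisect for the smallest bandwidth at which the KDE looks unimodal."""
    lo, hi = 0.0, max(values) - min(values)  # hi: bandwidth equal to the data range
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if count_modes(values, mid) <= 1:
            hi = mid  # unimodal here, so try a smaller bandwidth
        else:
            lo = mid  # still multimodal, so more smoothing is needed
    return hi


def blob(center, n, rng):
    """Hypothetical test data: n points spread uniformly in a unit square."""
    cx, cy = center
    return [(cx + rng.uniform(-0.5, 0.5), cy + rng.uniform(-0.5, 0.5)) for _ in range(n)]


rng = random.Random(0)
one_blob = blob((0.0, 0.0), 40, rng)                               # no cluster structure
two_blobs = blob((0.0, 0.0), 20, rng) + blob((10.0, 0.0), 20, rng)  # two clear clusters

h_one = critical_bandwidth(pairwise_distances(one_blob))
h_two = critical_bandwidth(pairwise_distances(two_blobs))
print(f"critical bandwidth, one blob:  {h_one:.3f}")
print(f"critical bandwidth, two blobs: {h_two:.3f}")
```

For the two-blob data, the distances split into a within-cluster group and a between-cluster group, so far more smoothing is needed before the density estimate becomes unimodal. A formal test would turn this statistic into a p-value rather than compare raw bandwidths, which depend on the scale of the data.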


