Systematic Analysis of Cluster Similarity Indices: Towards Bias-free Cluster Validation
Many cluster similarity indices are used to evaluate clustering algorithms, and choosing the best one for a particular task remains an open problem. In this paper, we perform a thorough analysis of this problem: we develop a list of desirable properties (requirements) and theoretically verify which indices satisfy them. In particular, we investigate dozens of pair-counting indices and prove that none of them satisfies all the requirements. Based on our analysis, we propose using the arccosine of the correlation coefficient as a similarity measure and prove that it satisfies all requirements but one, and that the remaining requirement is still satisfied asymptotically. This new measure can be thought of as an angle between partitions.
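As a rough illustration of the "angle between partitions" idea, the sketch below computes the arccosine of a correlation coefficient from pair counts. It rests on assumptions not spelled out in the abstract: the correlation coefficient is taken to be the Pearson correlation between the two binary vectors that record, for every pair of elements, whether the pair shares a cluster; the result is left in radians without any normalization; and the function names `pair_counts` and `partition_angle` are hypothetical.

```python
from itertools import combinations
from math import acos, sqrt


def pair_counts(a, b):
    """Count element pairs by co-membership in clusterings a and b.

    Returns (n11, n10, n01, n00): pairs together in both, together only
    in a, together only in b, and separated in both.
    """
    n11 = n10 = n01 = n00 = 0
    for i, j in combinations(range(len(a)), 2):
        same_a = a[i] == a[j]
        same_b = b[i] == b[j]
        if same_a and same_b:
            n11 += 1
        elif same_a:
            n10 += 1
        elif same_b:
            n01 += 1
        else:
            n00 += 1
    return n11, n10, n01, n00


def partition_angle(a, b):
    """Angle in radians between two clusterings given as label sequences.

    Assumption: the correlation is the Pearson correlation of the binary
    pair-indicator vectors, written here in terms of the pair counts.
    """
    if len(a) != len(b):
        raise ValueError("both clusterings must label the same elements")
    n11, n10, n01, n00 = pair_counts(a, b)
    denom = sqrt((n11 + n10) * (n11 + n01) * (n00 + n10) * (n00 + n01))
    if denom == 0:
        # One indicator vector is constant (for example, a single cluster
        # or all singletons), so the correlation is undefined.
        raise ValueError("correlation undefined for a constant pair vector")
    cc = (n11 * n00 - n10 * n01) / denom
    cc = max(-1.0, min(1.0, cc))  # guard against floating-point drift
    return acos(cc)


if __name__ == "__main__":
    # Same partition under relabelled clusters: the angle is zero.
    print(partition_angle([0, 0, 1, 1], [1, 1, 0, 0]))
    # Two "crossed" partitions of four elements.
    print(partition_angle([0, 0, 1, 1], [0, 1, 0, 1]))
```

The pair enumeration is quadratic in the number of elements, which is fine for a small illustration; for larger inputs the same four counts can be derived from a contingency table of the two labelings.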