Systematic Analysis of Cluster Similarity Indices: Towards Bias-free Cluster Validation

11/12/2019
by Martijn Gösgens, et al.

Many cluster similarity indices are used to evaluate clustering algorithms, and choosing the best one for a particular task is usually an open problem. In this paper, we perform a thorough analysis of this problem: we develop a list of desirable properties (requirements) and theoretically verify which indices satisfy them. In particular, we investigate dozens of pair-counting indices and prove that none of them satisfies all the requirements. Based on our analysis, we propose using the arccosine of the correlation coefficient as a similarity measure and prove that it satisfies all requirements except one, which is still satisfied asymptotically. This new measure can be thought of as an angle between partitions.
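As a rough illustration of the "angle between partitions" idea, the sketch below computes the four pair counts for two clusterings, takes the Pearson correlation of the resulting binary pair-indicator vectors, and returns its arccosine. It is only a minimal sketch under stated assumptions: the function names (`pair_counts`, `pair_correlation`, `partition_angle`) are hypothetical, the result is returned in radians without any normalization, and the full text may define or scale the measure differently.

```python
import math
from itertools import combinations


def pair_counts(a, b):
    """Count element pairs by whether they share a cluster in a and/or b.

    Returns (n11, n10, n01, n00): together in both, only in a, only in b,
    and in neither. Hypothetical helper, not taken from the paper.
    """
    if len(a) != len(b):
        raise ValueError("both labelings must cover the same elements")
    n11 = n10 = n01 = n00 = 0
    for i, j in combinations(range(len(a)), 2):
        same_a = a[i] == a[j]
        same_b = b[i] == b[j]
        if same_a and same_b:
            n11 += 1
        elif same_a:
            n10 += 1
        elif same_b:
            n01 += 1
        else:
            n00 += 1
    return n11, n10, n01, n00


def pair_correlation(a, b):
    """Pearson correlation between the binary pair-indicator vectors."""
    n11, n10, n01, n00 = pair_counts(a, b)
    denom = math.sqrt((n11 + n10) * (n11 + n01) * (n00 + n10) * (n00 + n01))
    if denom == 0:
        # One labeling puts all pairs together or all pairs apart,
        # so the correlation is undefined.
        raise ValueError("correlation undefined for a trivial partition")
    return (n11 * n00 - n10 * n01) / denom


def partition_angle(a, b):
    """Arccosine of the pair correlation, in radians (assumed unnormalized)."""
    cc = pair_correlation(a, b)
    # Clamp to guard against floating-point drift just outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cc)))
```

Under these assumptions, two identical clusterings (up to relabeling) give an angle of 0, while `partition_angle([0, 0, 1, 1], [0, 1, 0, 1])` has a pair correlation of -0.5 and therefore an angle of 2π/3. The quadratic loop over pairs keeps the sketch readable; a contingency-table computation would be the usual choice for large inputs.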
