Fuzzy Jaccard Index: A robust comparison of ordered lists
We propose Fuzzy Jaccard Index (FUJI) – a scale-invariant score for assessment of the similarity between two ranked/ordered lists. FUJI improves upon the Jaccard index by incorporating a membership function which takes into account the particular ranks, thus producing both more stable and more accurate similarity estimates. We provide theoretical insights into the properties of the FUJI score as well as propose an efficient algorithm for computing it. We also present empirical evidence of its performance on different synthetic scenarios. Finally, we demonstrate its utility in a typical machine learning setting – comparing feature ranking lists relevant to a given machine learning task. In real-life, and in particular high-dimensional domains, where only a small percentage of the whole feature space might be relevant, a robust and confident feature ranking leads to interpretable findings as well as efficient computation and good predictive performance. In such cases, FUJI correctly distinguishes between existing feature ranking approaches, while being more robust and efficient than the benchmark similarity scores.
READ FULL TEXT