Accuracy Evaluation of Overlapping and Multi-resolution Clustering Algorithms on Large Datasets

02/01/2019
by Artem Lutov, et al.

Clustering algorithms are key components of many data analysis and exploration systems, and their performance is evaluated using accuracy metrics. Although there is a great diversity of clustering algorithms, only a few metrics exist for measuring the accuracy of overlapping and multi-resolution clustering algorithms on large datasets. In this paper, we first discuss existing metrics, how they satisfy a set of formal constraints, and how they can be applied to specific cases. Then, we propose several optimizations and extensions of these metrics. More specifically, we introduce a new indexing technique that reduces both the runtime and the memory complexity of the Mean F1 score evaluation. Our technique can be applied to large datasets, and it is faster on a single CPU than state-of-the-art implementations running on high-performance servers. In addition, we propose several extensions of the discussed metrics that improve their effectiveness and their satisfaction of the formal constraints without affecting their efficiency. All the metrics discussed in this paper are implemented in C++ and are freely available as open-source packages that can be used either as stand-alone tools or as part of a benchmarking system to compare various clustering algorithms.
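
To make the Mean F1 score concrete, the following is a minimal Python sketch, not the authors' C++ implementation or their indexing technique. It assumes one common definition of the metric: the symmetric average of best-match F1 scores between two sets of possibly overlapping clusters. The function names `best_match_f1` and `mean_f1` are hypothetical. The element-to-cluster inverted index only illustrates the general idea of skipping cluster pairs that share no members.

```python
from collections import defaultdict


def best_match_f1(source, target):
    """Average, over clusters in `source`, of the best F1 against any cluster in `target`.

    Clusters are sets of hashable elements and may overlap.
    """
    # Inverted index (illustrative assumption): element -> ids of target clusters containing it.
    index = defaultdict(list)
    for j, cluster in enumerate(target):
        for node in cluster:
            index[node].append(j)

    total = 0.0
    for cluster in source:
        # Count shared members only with target clusters that overlap this one.
        overlap = defaultdict(int)
        for node in cluster:
            for j in index.get(node, ()):
                overlap[j] += 1
        best = 0.0
        for j, common in overlap.items():
            # F1 of two sets: 2 * |A & B| / (|A| + |B|).
            f1 = 2.0 * common / (len(cluster) + len(target[j]))
            best = max(best, f1)
        total += best
    return total / len(source) if source else 0.0


def mean_f1(a, b):
    """Symmetric mean of the best-match F1 in both directions (assumed definition)."""
    return 0.5 * (best_match_f1(a, b) + best_match_f1(b, a))


if __name__ == "__main__":
    found = [{1, 2, 3}, {3, 4}]      # overlapping clusters (element 3 is shared)
    truth = [{1, 2}, {3, 4, 5}]
    print(mean_f1(found, truth))
```

With the index, each source cluster is compared only against the target clusters it actually intersects, rather than against every target cluster. This is what keeps the all-pairs matching tractable when both clusterings are large.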

