A Critical Note on the Evaluation of Clustering Algorithms

08/10/2019
by Li Zhong, et al.

Experimental evaluation is a major research methodology for investigating clustering algorithms. For this purpose, a number of benchmark datasets have been widely used in the literature, and their quality plays an important role in the value of the research work. However, most existing studies pay little attention to the specific properties of these datasets and often treat them as black-box problems. In our work, with the help of advanced visualization and dimension reduction techniques, we show that some of the popular benchmark datasets used to evaluate clustering algorithms have potential issues that may seriously compromise research quality and may even produce completely misleading results. We suggest that significant effort be devoted to improving the current practice of experimental evaluation of clustering algorithms by conducting a principled analysis of each benchmark dataset of interest.
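The abstract does not name the specific visualization or dimension reduction techniques used. What follows is therefore only a minimal sketch under our own assumptions, not the authors' procedure. It shows one way to inspect a labelled benchmark dataset before using it to evaluate clustering algorithms. PCA serves as a stand-in for the unspecified dimension reduction step. The data are projected to two dimensions, and the sketch measures how often each point's nearest neighbour in that projection shares its ground-truth label. The helpers `pca_2d` and `neighbor_label_agreement` are hypothetical names introduced for illustration.

```python
import numpy as np

# Hypothetical helpers for illustration only; not the authors' method.

def pca_2d(X):
    """Project the rows of X onto their top two principal components (PCA via SVD)."""
    Xc = X - X.mean(axis=0)  # center each feature
    # Rows of Vt are principal directions, ordered by decreasing singular value
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T  # (n_samples, 2) embedding, e.g. for a scatter plot

def neighbor_label_agreement(Z, labels):
    """Fraction of points whose nearest neighbour in Z has the same ground-truth label.

    Assumed diagnostic: a value near chance suggests the labels are not reflected
    in the geometry that distance-based clustering algorithms actually see.
    """
    labels = np.asarray(labels)
    # Pairwise squared Euclidean distances in the embedding
    d = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d, np.inf)  # a point must not count as its own neighbour
    nearest = d.argmin(axis=1)
    return float((labels[nearest] == labels).mean())

# Usage on synthetic data: two well-separated Gaussian blobs in 5 dimensions
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 5)),
               rng.normal(8.0, 1.0, size=(50, 5))])
y = np.array([0] * 50 + [1] * 50)
Z = pca_2d(X)
print(neighbor_label_agreement(Z, y))                   # high: labels match the geometry
print(neighbor_label_agreement(Z, rng.permutation(y)))  # shuffled labels: near chance
```

On a real benchmark, a low agreement score, or a 2-D plot in which the ground-truth classes overlap heavily, is a hint that the dataset needs closer inspection. Neither is conclusive evidence on its own.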
