Document Clustering Evaluation: Divergence from a Random Baseline

08/28/2012
by Christopher M. de Vries, et al.

Divergence from a random baseline is a technique for the evaluation of document clustering. It ensures that cluster quality measures are doing useful work, so that ineffective clusterings, which provide no useful result, do not receive high scores. These concepts are defined and analysed using both intrinsic and extrinsic approaches to the evaluation of document cluster quality. These include the classical clusters-to-categories approach and a novel approach that uses ad hoc information retrieval. The divergence from a random baseline approach is able to differentiate the ineffective clusterings encountered in the INEX XML Mining track. It also appears to perform a normalisation similar to that of the Normalised Mutual Information (NMI) measure, but it can be applied to any measure of cluster quality. When it is applied to the intrinsic measure of distortion, as measured by RMSE, taking the difference from a random baseline reveals a clear optimum that is not otherwise apparent. This approach can be applied to any clustering evaluation; this paper describes its use in the context of document clustering evaluation.
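The idea can be illustrated with a small sketch. It assumes that the random baseline is built by shuffling the cluster labels across documents, which keeps the number of clusters and the cluster sizes fixed, and that the baseline score is averaged over a number of shuffles. It also assumes purity as the clusters-to-categories measure and RMSE to cluster centroids as the distortion measure. The names `purity`, `rmse`, `random_baseline`, `divergence` and the `trials`/`seed` parameters are hypothetical illustrations, not the paper's code.

```python
import math
import random
from collections import Counter


def purity(clusters, categories):
    """Clusters-to-categories purity: share of documents that belong to
    the majority category of their cluster (higher is better)."""
    by_cluster = {}
    for c, k in zip(clusters, categories):
        by_cluster.setdefault(c, Counter())[k] += 1
    majority = sum(counts.most_common(1)[0][1] for counts in by_cluster.values())
    return majority / len(clusters)


def rmse(clusters, points):
    """Distortion: root mean squared distance from each document vector
    to its cluster centroid (lower is better)."""
    groups = {}
    for c, p in zip(clusters, points):
        groups.setdefault(c, []).append(p)
    squared = 0.0
    for members in groups.values():
        dim = len(members[0])
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        squared += sum((m[d] - centroid[d]) ** 2 for m in members for d in range(dim))
    return math.sqrt(squared / len(points))


def random_baseline(clusters, rng):
    """Assumed baseline: shuffle the label assignment so the random
    clustering has the same number and sizes of clusters."""
    shuffled = list(clusters)
    rng.shuffle(shuffled)
    return shuffled


def divergence(score, clusters, reference, trials=10, seed=0, higher_is_better=True):
    """Score of the clustering minus the mean score of random baselines
    (sign flipped for measures where lower is better, such as RMSE)."""
    rng = random.Random(seed)
    actual = score(clusters, reference)
    baseline = sum(
        score(random_baseline(clusters, rng), reference) for _ in range(trials)
    ) / trials
    return actual - baseline if higher_is_better else baseline - actual


if __name__ == "__main__":
    categories = ["a", "a", "b", "b"]
    singletons = [0, 1, 2, 3]  # every document in its own cluster
    print(purity(singletons, categories))              # perfect purity
    print(divergence(purity, singletons, categories))  # but no gain over random
```

In this sketch, putting every document in its own cluster gives a purity of 1.0, yet its random baseline also scores 1.0, so the divergence is 0.0. The ineffective clustering is exposed even though the raw measure rewards it. The same holds for RMSE, which falls to zero for singleton clusters while its divergence also stays at zero.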

research
01/06/2010

Document Clustering with K-tree

This paper describes the approach taken to the XML Mining track at INEX ...
research
01/23/2017

The Impact of Random Models on Clustering Similarity

Clustering is a central approach for unsupervised learning. After cluste...
research
05/24/2018

An experimental comparison of label selection methods for hierarchical document clusters

The focus of this paper is on the evaluation of sixteen labeling methods...
research
01/06/2010

Random Indexing K-tree

Random Indexing (RI) K-tree is the combination of two algorithms for clu...
research
10/04/2021

Clustering with Respect to the Information Distance

We discuss the notion of a dense cluster with respect to the information...
research
05/29/2023

DMS: Differentiable Mean Shift for Dataset Agnostic Task Specific Clustering Using Side Information

We present a novel approach, in which we learn to cluster data directly ...
research
09/19/2010

Pair-Wise Cluster Analysis

This paper studies the problem of learning clusters which are consistent...
