Towards Automatic Clustering Analysis using Traces of Information Gain: The InfoGuide Method

01/23/2020
by Paulo Rocha, et al.

Clustering analysis has become a ubiquitous information retrieval tool in a wide range of domains, but a more automatic framework is still lacking. Although internal metrics are key to the successful retrieval of clusters, their effectiveness on real-world datasets is not fully understood, mainly because they make unrealistic assumptions about the underlying datasets. We hypothesized that capturing traces of information gain between increasingly complex clustering retrievals (InfoGuide) enables an automatic clustering analysis with improved clustering retrievals. We validated the InfoGuide hypothesis by capturing the traces of information gain with the Kolmogorov-Smirnov statistic and comparing the clusters retrieved by InfoGuide against those retrieved by other commonly used internal metrics on artificially generated, benchmark, and real-world datasets. Our results suggest that InfoGuide can enable a more automatic clustering analysis and may be more suitable for retrieving clusters in real-world datasets that display nontrivial statistical properties.
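The abstract does not spell out which distributions are compared or which clustering algorithm is used, so the following is only a rough sketch of the general idea, not the authors' procedure. It assumes a plain k-means clusterer and assumes that the "trace" is the two-sample Kolmogorov-Smirnov statistic between the nearest-centroid distance distributions obtained with k-1 and k clusters. All function names (`ks_statistic`, `kmeans`, `residuals`, `gain_trace`) are hypothetical.

```python
import bisect
import math
import random


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    for x in a + b:  # the supremum is attained at one of the sample points
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def kmeans(points, k, iters=50, seed=0):
    """Basic Lloyd's k-means (an assumed stand-in for any clustering algorithm)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            groups[nearest].append(p)
        # Recompute each center as the mean of its group; keep the old one if empty.
        centers = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
    return centers


def residuals(points, centers):
    """Distance from each point to its nearest center (assumed comparison quantity)."""
    return [min(math.dist(p, c) for c in centers) for p in points]


def gain_trace(points, k_max):
    """KS statistic between residual distributions of consecutive cluster counts."""
    trace = []
    prev = residuals(points, kmeans(points, 1))
    for k in range(2, k_max + 1):
        cur = residuals(points, kmeans(points, k))
        trace.append((k, ks_statistic(prev, cur)))
        prev = cur
    return trace


# Synthetic data: three Gaussian blobs (illustrative only, not from the paper).
rng = random.Random(42)
blobs = [(0, 0), (10, 0), (5, 9)]
points = [(rng.gauss(cx, 1), rng.gauss(cy, 1)) for cx, cy in blobs for _ in range(60)]

trace = gain_trace(points, k_max=6)
for k, d in trace:
    print(f"k={k}: KS statistic vs. k-1 = {d:.3f}")
```

Under these assumptions, a large statistic at step k suggests that adding the k-th cluster changed the residual distribution noticeably, while values near zero suggest that further clusters add little. How InfoGuide actually turns such a trace into a stopping rule is described in the full text, not here.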


