When is Clustering Perturbation Robust?

01/22/2016
by Margareta Ackerman, et al.

Clustering is a fundamental data mining tool that aims to divide data into groups of similar items. Intuition about clustering generally reflects the ideal case: exact data sets endowed with flawless dissimilarities between individual instances. In practice, however, such cases are in the minority, and clustering applications are typically characterized by noisy data sets with approximate pairwise dissimilarities. For clustering methods to be effective in practical applications, they must therefore be robust to perturbations. In this paper, we perform a formal analysis of perturbation robustness, revealing that the extent to which algorithms can exhibit this desirable characteristic is inherently limited, and identifying the types of structures that allow popular clustering paradigms to discover meaningful clusters in spite of faulty data.

