Clustering is Easy When ....What?

10/19/2015
by Shai Ben-David, et al.

It is well known that most of the common clustering objectives are NP-hard to optimize. In practice, however, clustering is routinely carried out. One approach to providing a theoretical understanding of this seeming discrepancy is to come up with notions of clusterability that distinguish realistically interesting input data from worst-case data sets. The hope is that there will be clustering algorithms that are provably efficient on such "clusterable" instances. This paper addresses the thesis that the computational hardness of clustering tasks disappears for the inputs one actually cares about; in other words, that "Clustering is difficult only when it does not matter" (the CDNM thesis for short). I wish to present a critical bird's-eye overview of the results published on this issue so far and to call attention to the gap between available and desirable results. A longer, more detailed version of this note is available as arXiv:1507.05307. I discuss which requirements should be met in order to provide formal support for the CDNM thesis, then examine existing results in view of these requirements, and list some significant unsolved research challenges in that direction.
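To make "optimizing a clustering objective" concrete, here is a minimal sketch that uses the k-means objective as one assumed example of a common clustering objective (the abstract does not single out a specific one) and minimizes it by exhaustive search over all k^n labelings of n points. This is not a method from the paper; the function names `kmeans_cost` and `exact_kmeans` are hypothetical. The exponential enumeration only illustrates the size of the search space an exact optimizer faces on arbitrary inputs; the CDNM question is whether "clusterable" inputs admit provably efficient algorithms instead.

```python
from itertools import product
from math import dist  # Python 3.8+

# Illustrative sketch (assumption): k-means is chosen as an example objective;
# neither the objective nor this search procedure is taken from the paper.

def kmeans_cost(points, labels, k):
    """Sum of squared distances from each point to the mean of its cluster."""
    cost = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue  # empty clusters contribute nothing
        dim = len(members[0])
        centroid = tuple(sum(p[i] for p in members) / len(members) for i in range(dim))
        cost += sum(dist(p, centroid) ** 2 for p in members)
    return cost

def exact_kmeans(points, k):
    """Exact minimizer by brute force: visits all k**n labelings (exponential in n)."""
    best_labels, best_cost = None, float("inf")
    for labels in product(range(k), repeat=len(points)):
        cost = kmeans_cost(points, labels, k)
        if cost < best_cost:  # strict '<' keeps the first optimal labeling found
            best_labels, best_cost = labels, cost
    return best_labels, best_cost

# Two well-separated groups of 1-D points.
points = [(0.0,), (0.1,), (0.2,), (10.0,), (10.1,), (10.2,)]
labels, cost = exact_kmeans(points, 2)
print(labels)  # (0, 0, 0, 1, 1, 1)
```

On this tiny, well-separated input the answer is the obvious split, yet the search still evaluates all 2**6 = 64 labelings to certify it; with n points and k clusters the count grows as k**n.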



research
10/04/2020

Inapproximability for Local Correlation Clustering and Dissimilarity Hierarchical Clustering

We present hardness of approximation results for Correlation Clustering ...
research
07/24/2018

Shortest path queries, graph partitioning and covering problems in worst and beyond worst case settings

In this thesis, we design algorithms for several NP-hard problems in bot...
research
04/28/2018

Clustering Perturbation Resilient Instances

Euclidean k-means is a problem that is NP-hard in the worst-case but oft...
research
07/24/2018

A Note on Clustering Aggregation

We consider the clustering aggregation problem in which we are given a s...
research
10/27/2012

Discrete Energy Minimization, beyond Submodularity: Applications and Approximations

In this thesis I explore challenging discrete energy minimization proble...
research
01/22/2016

When is Clustering Perturbation Robust?

Clustering is a fundamental data mining tool that aims to divide data in...
research
09/12/2011

Modern hierarchical, agglomerative clustering algorithms

This paper presents algorithms for hierarchical, agglomerative clusterin...
