Clustering Perturbation Resilient Instances

04/28/2018
by Amit Deshpande, et al.

Euclidean k-means is NP-hard in the worst case but is often solved efficiently by simple heuristics in practice. This has led researchers to study properties of real-world data sets that yield stable optimal clusters and admit simple, provably efficient algorithms to recover them. We consider stable instances of Euclidean k-means that have provable polynomial-time algorithms for recovering the optimal clustering. These results often rely on assumptions about the data that do not hold in practice, or their algorithms are not practical or stable enough, with running time quadratic or more in the number of points. We propose simple algorithms with running time linear in the number of points and the dimension that provably recover the optimal clustering on α-metric perturbation resilient instances of Euclidean k-means. Our results hold even when the instances satisfy α-center proximity, a weaker property that is implied by α-metric perturbation resilience. When the data contains a certain class of outliers (and only the inliers satisfy the α-center proximity property), we give an algorithm that outputs a small list of clusterings, one of which is guaranteed to recover the optimal clustering.
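The abstract does not define α-center proximity. As a rough illustration, the sketch below assumes the definition commonly used in the perturbation resilience literature: a clustering with centers μ_1, ..., μ_k satisfies α-center proximity if every point x in cluster i is more than α times farther from each other center μ_j than from its own center μ_i. The function names `mean` and `satisfies_center_proximity` are hypothetical, and the code only checks the property on a given clustering; it is not the recovery algorithm proposed in the paper.

```python
import math


def mean(points):
    """Coordinate-wise mean of a non-empty list of equal-length points."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def satisfies_center_proximity(clusters, alpha):
    """Check the assumed alpha-center proximity condition.

    clusters: list of clusters, each a non-empty list of points (tuples).
    Returns True if, for every point x in cluster i and every j != i,
    dist(x, mu_j) > alpha * dist(x, mu_i), where mu_i is the mean of cluster i.
    """
    centers = [mean(c) for c in clusters]
    for i, cluster in enumerate(clusters):
        for x in cluster:
            own = math.dist(x, centers[i])
            for j, mu in enumerate(centers):
                if j != i and math.dist(x, mu) <= alpha * own:
                    return False
    return True


# Two well-separated clusters on a line (made-up toy data).
toy = [[(0.0, 0.0), (1.0, 0.0)], [(10.0, 0.0), (11.0, 0.0)]]
print(satisfies_center_proximity(toy, alpha=2.0))
print(satisfies_center_proximity(toy, alpha=25.0))
```

In the toy data each point is 0.5 from its own center and at least 9.5 from the other center, so the condition holds for small α and fails once α is large enough. The check computes every point-to-center distance, which costs time proportional to the number of points times k times the dimension.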
