Adversarially robust clustering with optimality guarantees

06/16/2023
by Soham Jana, et al.

We consider the problem of clustering data points drawn from sub-Gaussian mixtures. Existing methods that provably achieve the optimal mislabeling error, such as the Lloyd algorithm, are usually vulnerable to outliers. In contrast, clustering methods that appear robust to adversarial perturbations are not known to satisfy the optimal statistical guarantees. We propose a simple algorithm that attains the optimal mislabeling rate even in the presence of adversarial outliers. Our algorithm achieves the optimal error rate in a constant number of iterations when a weak initialization condition is satisfied. In the absence of outliers, in fixed dimensions, our theoretical guarantees are similar to those of the Lloyd algorithm. Extensive experiments on various simulated data sets support the theoretical guarantees of our method.

research
02/28/2023

Scalable Clustering: Large Scale Unsupervised Learning of Gaussian Mixture Models with Outliers

Clustering is a widely used technique with a long and rich history in a ...
research
04/05/2019

Robust Subspace Recovery with Adversarial Outliers

We study the problem of robust subspace recovery (RSR) in the presence o...
research
12/16/2019

A Robust Spectral Clustering Algorithm for Sub-Gaussian Mixture Models with Outliers

We consider the problem of clustering datasets in the presence of arbitr...
research
05/24/2018

A Practical Algorithm for Distributed Clustering and Outlier Detection

We study the classic k-means/median clustering, which are fundamental pr...
research
06/10/2015

Fast Online Clustering with Randomized Skeleton Sets

We present a new fast online clustering algorithm that reliably recovers...
research
11/27/2017

One-Shot Coresets: The Case of k-Clustering

Scaling clustering algorithms to massive data sets is a challenging task...
research
09/01/2023

Consistency of Lloyd's Algorithm Under Perturbations

In the context of unsupervised learning, Lloyd's algorithm is one of the...
