A Robust Spectral Clustering Algorithm for Sub-Gaussian Mixture Models with Outliers

12/16/2019
by Prateek R. Srivastava, et al.

We consider the problem of clustering datasets in the presence of arbitrary outliers. Traditional clustering algorithms such as k-means and spectral clustering are known to perform poorly on datasets contaminated with even a small number of outliers. In this paper, we develop a provably robust spectral clustering algorithm that applies a simple rounding scheme to denoise a Gaussian kernel matrix built from the data points, and then uses vanilla spectral clustering to recover the cluster labels of the data points. We analyze the performance of our algorithm under the assumption that the "good" inlier data points are generated from a mixture of sub-Gaussians, while the "noisy" outlier points can come from any arbitrary probability distribution. For this general class of models, we show that the asymptotic misclassification error decays at an exponential rate in the signal-to-noise ratio, provided the number of outliers is a small fraction of the number of inlier points. Surprisingly, the derived error bound matches the best-known bound for semidefinite programs (SDPs) under the same setting without outliers. We conduct extensive experiments on a variety of simulated and real-world datasets to demonstrate that our algorithm is less sensitive to outliers than other state-of-the-art algorithms proposed in the literature, in terms of both accuracy and scalability.
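The pipeline described in the abstract (Gaussian kernel matrix, rounding, then vanilla spectral clustering) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the rounding rule, so the hard threshold `tau`, the bandwidth parameterization `theta`, the small k-means helper, and all function names here are assumptions made for the example.

```python
import numpy as np

def gaussian_kernel(X, theta):
    # K_ij = exp(-||x_i - x_j||^2 / (2 * theta^2)); this scaling is an assumption.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * theta ** 2))

def round_kernel(K, tau):
    # Hypothetical rounding rule: entries >= tau become 1, all others 0.
    return (K >= tau).astype(float)

def kmeans(U, k, n_iter=50, seed=0):
    # Plain Lloyd iterations with farthest-point initialization.
    rng = np.random.default_rng(seed)
    centers = [U[rng.integers(len(U))]]
    for _ in range(1, k):
        d = np.min([np.sum((U - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(dist, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(axis=0)
    return labels

def robust_spectral_clustering(X, k, theta, tau):
    # 1) build the kernel, 2) round it, 3) embed with the top-k eigenvectors,
    # 4) cluster the embedded rows with k-means.
    A = round_kernel(gaussian_kernel(X, theta), tau)
    _, vecs = np.linalg.eigh(A)   # eigenvalues are returned in ascending order
    U = vecs[:, -k:]              # eigenvectors of the k largest eigenvalues
    return kmeans(U, k)
```

In this sketch, a point far from every cluster keeps only its diagonal entry after rounding, so it contributes almost nothing to the leading eigenvectors; that is the intuition behind denoising the kernel before the spectral step. Note that the sketch still assigns outliers a label from 1 to k rather than flagging them.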

research
01/08/2010

Spectral clustering based on local linear approximations

In the context of clustering, we assume a generative model where each cl...
research
09/12/2023

Spectral clustering algorithm for the allometric extension model

The spectral clustering algorithm is often used as a binary clustering m...
research
02/23/2017

Spectral Clustering using PCKID - A Probabilistic Cluster Kernel for Incomplete Data

In this paper, we propose PCKID, a novel, robust, kernel function for sp...
research
08/16/2021

Robust Trimmed k-means

Clustering is a fundamental tool in unsupervised learning, used to group...
research
10/28/2015

Fast Landmark Subspace Clustering

Kernel methods obtain superb performance in terms of accuracy for variou...
research
06/16/2023

Adversarially robust clustering with optimality guarantees

We consider the problem of clustering data points coming from sub-Gaussi...
research
09/07/2019

Concentration of kernel matrices with application to kernel spectral clustering

We study the concentration of random kernel matrices around their mean. ...
