A Robust Spectral Clustering Algorithm for Sub-Gaussian Mixture Models with Outliers

12/16/2019 ∙ by Prateek R. Srivastava, et al.

We consider the problem of clustering datasets in the presence of arbitrary outliers. Traditional clustering algorithms such as k-means and spectral clustering are known to perform poorly on datasets contaminated with even a small number of outliers. In this paper, we develop a provably robust spectral clustering algorithm that applies a simple rounding scheme to denoise a Gaussian kernel matrix built from the data points, and then uses vanilla spectral clustering to recover the cluster labels of the data points. We analyze the performance of our algorithm under the assumption that the "good" inlier data points are generated from a mixture of sub-Gaussian distributions, while the "noisy" outlier points can come from any arbitrary probability distribution. For this general class of models, we show that the asymptotic misclassification error decays at an exponential rate in the signal-to-noise ratio, provided the number of outliers is a small fraction of the number of inlier points. Surprisingly, the derived error bound matches the best-known bound for semidefinite programs (SDPs) in the same setting without outliers. We conduct extensive experiments on a variety of simulated and real-world datasets to demonstrate that our algorithm is less sensitive to outliers than other state-of-the-art algorithms proposed in the literature, in terms of both accuracy and scalability.
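The pipeline described in the abstract (Gaussian kernel, rounding, vanilla spectral clustering) can be illustrated with a minimal NumPy sketch. The abstract does not specify the rounding rule, the kernel bandwidth, or the exact spectral step, so all of the following are assumptions made for illustration only: the hypothetical helpers `gaussian_kernel`, `round_kernel`, and `robust_spectral_clustering`, the bandwidth `theta`, the rule "entries above a threshold `tau` become 1, all others 0", and the choice of the top-k eigenvectors of the rounded matrix followed by Lloyd's k-means. This is not the authors' implementation. The usage part draws Gaussian inliers (a special case of sub-Gaussian) plus uniformly scattered outliers, and measures misclassification on the inliers only, up to a label swap.

```python
import numpy as np


def gaussian_kernel(X, theta):
    """K[i, j] = exp(-||x_i - x_j||^2 / theta^2); bandwidth convention is an assumption."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / theta ** 2)


def round_kernel(K, tau):
    """Hypothetical rounding step: entries above tau become 1, the rest become 0."""
    return (K > tau).astype(float)


def kmeans(U, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [U[0]]
    for _ in range(1, k):
        # Squared distance from every row to its nearest chosen center.
        d = np.min([np.sum((U - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(d)])
    centers = np.array(centers)
    labels = np.full(len(U), -1)
    for _ in range(n_iter):
        dists = np.sum((U[:, None, :] - centers[None, :, :]) ** 2, axis=2)
        new_labels = np.argmin(dists, axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = U[labels == j].mean(axis=0)
    return labels


def robust_spectral_clustering(X, k, theta, tau):
    """Sketch: Gaussian kernel -> entrywise rounding -> top-k eigenvectors -> k-means."""
    A = round_kernel(gaussian_kernel(X, theta), tau)
    _, vecs = np.linalg.eigh(A)  # A is symmetric; eigenvalues come back ascending
    U = vecs[:, -k:]             # eigenvectors of the k largest eigenvalues
    return kmeans(U, k)


# Usage: two Gaussian clusters (inliers) plus 10 outliers spread over a wide box.
rng = np.random.default_rng(0)
inliers = np.vstack([rng.normal([-5.0, 0.0], 1.0, size=(100, 2)),
                     rng.normal([5.0, 0.0], 1.0, size=(100, 2))])
truth = np.repeat([0, 1], 100)
outliers = rng.uniform(-50.0, 50.0, size=(10, 2))
X = np.vstack([inliers, outliers])

labels = robust_spectral_clustering(X, k=2, theta=2.0, tau=0.1)
pred = labels[:200]  # score only the inliers
error = min(np.mean(pred != truth), np.mean(pred != 1 - truth))  # up to label swap
print(f"inlier misclassification rate: {error:.3f}")
```

The intuition behind the rounding step in this sketch is that an isolated outlier's row in the rounded matrix is essentially a single 1 on the diagonal, so it contributes almost nothing to the leading eigenvectors; the inliers' dense blocks of ones dominate the spectrum instead. Choosing `tau` and `theta` well for real data is exactly the kind of question the paper's analysis addresses, and the values above are only tuned to this toy example.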
