Robust Subspace Clustering via Thresholding

07/18/2013
by Reinhard Heckel, et al.

The problem of clustering noisy and incompletely observed high-dimensional data points into a union of low-dimensional subspaces and a set of outliers is considered. The number of subspaces, their dimensions, and their orientations are assumed unknown. We propose a simple low-complexity subspace clustering algorithm, which applies spectral clustering to an adjacency matrix obtained by thresholding the correlations between data points. In other words, the adjacency matrix is constructed from the nearest neighbors of each data point in spherical distance. A statistical performance analysis shows that the algorithm exhibits robustness to additive noise and succeeds even when the subspaces intersect. Specifically, our results reveal an explicit tradeoff between the affinity of the subspaces and the tolerable noise level. We furthermore prove that the algorithm succeeds even when the data points are incompletely observed with the number of missing entries allowed to be (up to a log-factor) linear in the ambient dimension. We also propose a simple scheme that provably detects outliers, and we present numerical results on real and synthetic data.
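The adjacency-matrix construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' reference implementation: the function name `thresholded_adjacency`, the neighbor count `q`, the use of absolute correlations, and the symmetrization step are all assumptions made for illustration. The sketch covers only the thresholding step; the spectral clustering step, noise, missing entries, and outlier detection are left out.

```python
import numpy as np

def thresholded_adjacency(X, q):
    """Build a thresholded adjacency matrix (illustrative sketch).

    X : (m, N) array, one data point per column.
    q : number of nearest neighbors kept per point (hypothetical parameter).
    """
    # Normalize columns so that inner products become correlations.
    Xn = X / np.linalg.norm(X, axis=0, keepdims=True)
    # Assumption: use absolute correlations, so antipodal points count as close.
    C = np.abs(Xn.T @ Xn)
    np.fill_diagonal(C, 0.0)  # a point is not its own neighbor
    N = C.shape[0]
    # For each row, find the indices of the q largest correlations.
    idx = np.argpartition(-C, q - 1, axis=1)[:, :q]
    rows = np.arange(N)[:, None]
    Z = np.zeros_like(C)
    Z[rows, idx] = C[rows, idx]  # thresholding: keep only those q entries
    return Z + Z.T  # symmetrize so the result is a valid undirected graph

# Toy data: two orthogonal 3-dimensional subspaces of R^6, 10 points each.
rng = np.random.default_rng(0)
X1 = np.vstack([rng.standard_normal((3, 10)), np.zeros((3, 10))])
X2 = np.vstack([np.zeros((3, 10)), rng.standard_normal((3, 10))])
X = np.hstack([X1, X2])
A = thresholded_adjacency(X, q=3)
```

In this noiseless toy case the subspaces are orthogonal, so no edge connects points from different subspaces. The matrix `A` could then be handed to any spectral clustering routine that accepts a precomputed affinity matrix.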


research
03/15/2013

Subspace Clustering via Thresholding and Spectral Clustering

We consider the problem of clustering a set of high-dimensional data poi...
research
05/15/2013

Noisy Subspace Clustering via Thresholding

We consider the problem of clustering noisy high-dimensional data points...
research
03/13/2014

Neighborhood Selection for Thresholding-based Subspace Clustering

Subspace clustering refers to the problem of clustering high-dimensional...
research
12/19/2011

A geometric analysis of subspace clustering with outliers

This paper considers the problem of clustering a collection of unlabeled...
research
10/15/2016

Unsupervised clustering under the Union of Polyhedral Cones (UOPC) model

In this paper, we consider clustering data that is assumed to come from ...
research
06/20/2015

Filtrated Algebraic Subspace Clustering

Subspace clustering is the problem of clustering data that lie close to ...
research
08/16/2021

Provable Data Clustering via Innovation Search

This paper studies the subspace clustering problem in which data points ...
