Subspace Clustering with Missing and Corrupted Data

07/08/2017
by Zachary Charles, et al.

Subspace clustering is the process of identifying a union-of-subspaces model underlying a collection of samples and determining which sample belongs to which subspace. A popular approach, sparse subspace clustering (SSC), represents each sample as a weighted combination of the other samples and then uses the learned weights to cluster the samples. SSC has been shown to be stable when each sample is contaminated by a relatively small amount of noise. However, when a subset of the entries in each sample is corrupted by significant noise, or not observed at all, providing guarantees for subspace clustering remains an open problem. Instead of analyzing commonly used versions of SSC in the context of missing data, this paper describes a robust variant, mean absolute deviation sparse subspace clustering (MAD-SSC), and characterizes the conditions under which it provably clusters all of the observed samples correctly, even in the presence of noisy or missing data. MAD-SSC is efficiently solvable by linear programming. We show that MAD-SSC performs as predicted by the theoretical guarantees and that it performs comparably to a widely used variant of SSC in the context of missing data.
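To make the linear-programming remark concrete, here is a minimal sketch of an L1-loss self-representation step restricted to observed entries. The abstract does not state the exact MAD-SSC objective, so this is not the paper's formulation: the function name `l1_self_representation`, the trade-off weight `lam`, and the zero-filling of unobserved entries in the other samples are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import linprog


def l1_self_representation(X, mask, i, lam=10.0):
    """Hypothetical helper: express sample i using the other samples.

    Solves  min_c ||c||_1 + lam * ||x_i - X c||_1  over the entries of
    sample i that are observed, with c[i] = 0, as a linear program.
    X is (features x samples); mask is a boolean array of the same shape
    that is True where an entry is observed.
    """
    d, n = X.shape
    rows = np.flatnonzero(mask[:, i])                  # observed entries of x_i
    others = np.array([j for j in range(n) if j != i])
    # Assumption: unobserved entries of the other samples are zero-filled.
    A = np.where(mask, X, 0.0)[np.ix_(rows, others)]
    b = X[rows, i]
    m, k = A.shape
    # Split c and the residual r into nonnegative parts:
    # variables are [c_plus, c_minus, r_plus, r_minus], all >= 0,
    # subject to A (c_plus - c_minus) + (r_plus - r_minus) = b.
    cost = np.concatenate([np.ones(2 * k), lam * np.ones(2 * m)])
    A_eq = np.hstack([A, -A, np.eye(m), -np.eye(m)])
    res = linprog(cost, A_eq=A_eq, b_eq=b, bounds=(0, None))
    c = np.zeros(n)
    c[others] = res.x[:k] - res.x[k:2 * k]
    return c
```

In the usual SSC pipeline, the coefficient vectors for all samples would be stacked into a matrix, symmetrized into an affinity matrix, and passed to spectral clustering; that step is omitted here.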
