Low Rank Approximation in the Presence of Outliers
We consider the problem of principal component analysis (PCA) in the presence of outliers. Given a matrix A (d × n) and parameters k, m, the goal is to remove a set of at most m columns of A (known as outliers), so as to minimize the rank-k approximation error of the remaining matrix. While much of the work on this problem has focused on recovery of the rank-k subspace under assumptions on the inliers and outliers, we focus on the approximation problem above. Our main result shows that sampling-based methods developed in the outlier-free case give non-trivial guarantees even in the presence of outliers. Using this insight, we develop a simple algorithm that has bi-criteria guarantees. Further, unlike similar formulations for clustering, we show that bi-criteria guarantees are unavoidable for the problem, under appropriate complexity assumptions.
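To make the objective concrete, the following is a minimal sketch of the problem statement only, not the sampling-based algorithm the abstract refers to. It assumes the error is measured in squared Frobenius norm (the abstract does not name the norm), and the function names `rank_k_error` and `brute_force_outlier_pca` are hypothetical. The search enumerates every candidate outlier set, so it is exponential in m and only usable on tiny inputs.

```python
from itertools import combinations

import numpy as np


def rank_k_error(A, k):
    """Squared Frobenius error of the best rank-k approximation of A.

    By the Eckart-Young theorem this equals the sum of the squared
    singular values beyond the k largest. (Frobenius norm is an
    assumption made for this illustration.)
    """
    if A.shape[1] == 0:
        return 0.0
    s = np.linalg.svd(A, compute_uv=False)  # singular values, descending
    return float(np.sum(s[k:] ** 2))


def brute_force_outlier_pca(A, k, m):
    """Exhaustively pick at most m columns of A to remove.

    Returns (best_error, removed_column_indices). Exponential in m;
    intended only to illustrate the objective on small matrices.
    """
    n = A.shape[1]
    best_err, best_removed = rank_k_error(A, k), ()
    for size in range(1, m + 1):
        for removed in combinations(range(n), size):
            keep = [j for j in range(n) if j not in removed]
            err = rank_k_error(A[:, keep], k)
            if err < best_err:
                best_err, best_removed = err, removed
    return best_err, best_removed


# Three columns lie on one line (rank 1); the last column is an outlier.
A = np.array([[1.0, 2.0, 3.0, 0.0],
              [0.0, 0.0, 0.0, 5.0],
              [0.0, 0.0, 0.0, 0.0]])
err, removed = brute_force_outlier_pca(A, k=1, m=1)
```

In this toy instance, removing the single off-line column leaves a matrix that is exactly rank 1, so the remaining error is numerically zero. A bi-criteria method, as described in the abstract, would be allowed to remove somewhat more than m columns in exchange for an efficiently achievable error bound.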