Subspace approximation with outliers

06/30/2020
by Amit Deshpande, et al.

Given n points x_1, …, x_n ∈ R^d, an integer 1 ≤ k ≤ d, and an outlier parameter 0 ≤ α ≤ 1, the subspace approximation problem with outliers is to find a k-dimensional linear subspace of R^d that minimizes the sum of squared distances to its nearest (1-α)n points. More generally, the ℓ_p subspace approximation problem with outliers minimizes the sum of p-th powers of distances instead of the sum of squared distances. Even the case of robust PCA is non-trivial, and previous work requires additional assumptions on the input. Any multiplicative approximation algorithm for the subspace approximation problem with outliers must solve the robust subspace recovery problem, a special case in which the (1-α)n inliers in the optimal solution are promised to lie exactly on a k-dimensional linear subspace. However, robust subspace recovery is Small Set Expansion (SSE)-hard. We show how to extend dimension reduction techniques and bi-criteria approximations based on sampling to the problem of subspace approximation with outliers. To get around the SSE-hardness of robust subspace recovery, we assume that the squared-distance error of the optimal k-dimensional subspace summed over the optimal (1-α)n inliers is at least δ times its squared-distance error summed over all n points, for some 0 < δ ≤ 1 - α. With this assumption, we give an efficient algorithm to find a subset of poly(k/ϵ) log(1/δ) loglog(1/δ) points whose span contains a k-dimensional subspace that gives a multiplicative (1+ϵ)-approximation to the optimal solution. The running time of our algorithm is linear in n and d. Interestingly, our results hold even when the fraction of outliers α is large, as long as the obvious condition 0 < δ ≤ 1 - α is satisfied.
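To make the objective in the abstract concrete, here is a minimal Python sketch that evaluates the ℓ_p cost of a candidate subspace over its nearest (1-α)n points, along with the ratio of that inlier cost to the cost over all n points. It only evaluates the objective for a given subspace; it is not the paper's algorithm. The function names, the toy data, and the convention of dropping floor(αn) points as outliers are assumptions made for illustration.

```python
import numpy as np

def distances_to_subspace(X, V):
    """Euclidean distance of each row of X (n x d) to the span of V (d x k).

    V is assumed to have orthonormal columns, so (X V) V^T is the projection.
    """
    residual = X - (X @ V) @ V.T
    return np.linalg.norm(residual, axis=1)

def outlier_cost(X, V, alpha, p=2):
    """Return (inlier_cost, total_cost) for the candidate subspace span(V).

    inlier_cost sums the p-th powers of distances over the nearest points,
    after dropping floor(alpha * n) farthest points (rounding is an assumption).
    total_cost sums the same quantity over all n points.
    """
    n = X.shape[0]
    num_inliers = n - int(np.floor(alpha * n))
    dist_p = distances_to_subspace(X, V) ** p
    inlier_cost = np.sort(dist_p)[:num_inliers].sum()
    return inlier_cost, dist_p.sum()

# Hypothetical toy data: three points near the x-axis and one far outlier.
X = np.array([
    [1.0, 0.0, 0.0],
    [2.0, 0.0, 0.1],
    [-1.0, 0.0, -0.1],
    [0.0, 5.0, 5.0],
])
V = np.array([[1.0], [0.0], [0.0]])  # candidate 1-dimensional subspace (x-axis)

inlier_cost, total_cost = outlier_cost(X, V, alpha=0.25, p=2)
ratio = inlier_cost / total_cost  # compare against delta in the assumption
print(inlier_cost, total_cost, ratio)
```

In the abstract, the assumption is stated for the optimal subspace and its optimal inliers: this ratio must be at least δ. The sketch computes the same ratio for an arbitrary candidate subspace, which is only a proxy for that condition.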

research
03/20/2021

On Subspace Approximation and Subset Selection in Fewer Passes by MCMC Sampling

We consider the problem of subset selection for ℓ_p subspace approximati...
research
04/27/2018

Low Rank Approximation in the Presence of Outliers

We consider the problem of principal component analysis (PCA) in the pre...
research
04/26/2022

One-pass additive-error subset selection for ℓ_p subspace approximation

We consider the problem of subset selection for ℓ_p subspace approximati...
research
10/08/2020

Deep Learning Meets Projective Clustering

A common approach for compressing NLP networks is to encode the embeddin...
research
06/22/2006

Outlier Robust ICP for Minimizing Fractional RMSD

We describe a variation of the iterative closest point (ICP) algorithm f...
research
12/18/2010

lp-Recovery of the Most Significant Subspace among Multiple Subspaces with Outliers

We assume data sampled from a mixture of d-dimensional linear subspaces ...
research
11/20/2022

Higher-order interaction model from geometric measurements

We introduce a higher simplicial generalization of the linear consensus ...
