On Subspace Approximation and Subset Selection in Fewer Passes by MCMC Sampling

03/20/2021
by Amit Deshpande, et al.

We consider the problem of subset selection for ℓ_p subspace approximation: given n points in d dimensions, we need to pick a small, representative subset of the given points such that its span gives a (1+ϵ)-approximation to the best k-dimensional subspace, i.e., the subspace that minimizes the sum of p-th powers of the distances of all the points to it. Sampling-based subset selection techniques require adaptive sampling iterations with multiple passes over the data. Matrix sketching techniques give a single-pass (1+ϵ)-approximation for ℓ_p subspace approximation but require additional passes for subset selection. In this work, we propose an MCMC algorithm to reduce the number of passes required by previous subset selection algorithms based on adaptive sampling. For p=2, our algorithm gives subset selection of nearly optimal size in only 2 passes, whereas the number of passes required in previous work depends on k. Our algorithm picks a subset of size poly(k/ϵ) that gives a (1+ϵ)-approximation to the optimal subspace. The running time of the algorithm is nd + d poly(k/ϵ). We extend our results to the case when outliers are present in the dataset, and suggest a two-pass algorithm for this setting. Our ideas also extend to give a reduction in the number of passes required by adaptive sampling algorithms for ℓ_p subspace approximation and subset selection, for p ≥ 2.
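To make the objective and the multi-pass baseline concrete, here is a minimal Python sketch of generic adaptive sampling, the kind of scheme whose pass count the abstract says it reduces. It is not the MCMC algorithm of the paper. The function names (`residual`, `extend_basis`, `lp_error`, `adaptive_sample`), the parameters `rounds` and `per_round`, and the numerical tolerances are all assumptions made for illustration. Each round reads the data once and samples points with probability proportional to the p-th power of their distance to the span of the points chosen so far.

```python
import random


def residual(point, basis):
    """Part of `point` orthogonal to span(basis); `basis` must be orthonormal."""
    r = list(point)
    for b in basis:
        c = sum(x * y for x, y in zip(r, b))
        r = [x - c * y for x, y in zip(r, b)]
    return r


def extend_basis(basis, point, tol=1e-12):
    """Gram-Schmidt step: add the normalized residual of `point`, if non-negligible."""
    r = residual(point, basis)
    norm = sum(x * x for x in r) ** 0.5
    if norm > tol:  # tolerance is an illustrative choice
        basis.append([x / norm for x in r])


def lp_error(points, basis, p=2):
    """Sum over all points of (distance to span(basis)) ** p."""
    return sum(sum(x * x for x in residual(a, basis)) ** (p / 2) for a in points)


def adaptive_sample(points, rounds, per_round, p=2, seed=0):
    """Illustrative adaptive sampling baseline: one pass over the data per round."""
    rng = random.Random(seed)
    basis, chosen = [], []
    for _ in range(rounds):
        # One pass: p-th power of each point's distance to the current span.
        weights = [sum(x * x for x in residual(a, basis)) ** (p / 2) for a in points]
        if sum(weights) <= 1e-18:  # all points already (numerically) in the span
            break
        # Sample indices with probability proportional to the weights.
        for i in rng.choices(range(len(points)), weights=weights, k=per_round):
            chosen.append(i)
            extend_basis(basis, points[i])
    return chosen, basis
```

For example, `adaptive_sample(points, rounds=3, per_round=2)` returns the indices of the selected points together with an orthonormal basis of their span, and `lp_error(points, basis, p)` evaluates the ℓ_p objective for that span. Because the sampling weights depend on the current span, every round needs a fresh pass over the data, which is the cost the abstract refers to.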


research
04/26/2022

One-pass additive-error subset selection for ℓ_p subspace approximation

We consider the problem of subset selection for ℓ_p subspace approximati...
research
06/30/2020

Subspace approximation with outliers

The subspace approximation problem with outliers, for given n points in ...
research
04/23/2020

Non-Adaptive Adaptive Sampling on Turnstile Streams

Adaptive sampling is a useful algorithmic tool for data summarization pr...
research
12/31/2020

Exploiting Transitivity for Top-k Selection with Score-Based Dueling Bandits

We consider the problem of top-k subset selection in Dueling Bandit prob...
research
04/18/2023

New Subset Selection Algorithms for Low Rank Approximation: Offline and Online

Subset selection for the rank k approximation of an n× d matrix A offers...
research
09/09/2018

Strong Coresets for k-Median and Subspace Approximation: Goodbye Dimension

We obtain the first strong coresets for the k-median and subspace approx...
research
12/31/2021

Fast Graph Subset Selection Based on G-optimal Design

Graph sampling theory extends the traditional sampling theory to graphs ...
