Streaming and Distributed Algorithms for Robust Column Subset Selection

07/16/2021
by Shuli Jiang, et al.

We give the first single-pass streaming algorithm for Column Subset Selection with respect to the entrywise ℓ_p-norm with 1 ≤ p < 2. We study the ℓ_p-norm loss since it is often considered more robust to noise than the standard Frobenius norm. Given an input matrix A ∈ ℝ^{d×n} (n ≫ d), our algorithm achieves a multiplicative k^{1/p − 1/2} · poly(log nd)-approximation to the error with respect to the best possible column subset of size k. Furthermore, the space complexity of the streaming algorithm is optimal up to a logarithmic factor. Our streaming algorithm also extends naturally to a 1-round distributed protocol with nearly optimal communication cost. A key ingredient in our algorithms is a reduction to column subset selection in the ℓ_{p,2}-norm, which corresponds to the p-norm of the vector of Euclidean norms of each of the columns of A. This enables us to leverage strong coreset constructions for the Euclidean norm, which previously had not been applied in this context. We also give the first provable guarantees for greedy column subset selection in the ℓ_{1,2}-norm, which can be used as an alternative, practical subroutine in our algorithms. Finally, we show that our algorithms give significant practical advantages on real-world data analysis tasks.
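
To make the two norms in the abstract concrete, here is a minimal sketch in plain Python that computes the entrywise ℓ_p-norm and the ℓ_{p,2}-norm of a small matrix. The function names `entrywise_lp_norm` and `lp2_norm`, the list-of-rows matrix layout, and the example matrix are assumptions chosen for illustration; they are not taken from the paper and this is not the authors' algorithm.

```python
import math

def entrywise_lp_norm(A, p):
    """Entrywise l_p norm: (sum over all entries of |a_ij|^p)^(1/p)."""
    return sum(abs(x) ** p for row in A for x in row) ** (1.0 / p)

def lp2_norm(A, p):
    """l_{p,2} norm: the p-norm of the vector of Euclidean column norms."""
    # zip(*A) iterates over the columns of a matrix stored as a list of rows.
    col_norms = [math.sqrt(sum(x * x for x in col)) for col in zip(*A)]
    return sum(c ** p for c in col_norms) ** (1.0 / p)

# Hypothetical 2 x 2 example (d = 2 rows, n = 2 columns).
A = [[3.0, 0.0],
     [4.0, 1.0]]

# Columns are (3, 4) and (0, 1), with Euclidean norms 5 and 1.
print(lp2_norm(A, 1))           # l_{1,2} norm: 5 + 1
print(entrywise_lp_norm(A, 1))  # entrywise l_1 norm: 3 + 0 + 4 + 1
```

For p = 2 both functions return the Frobenius norm. For 1 ≤ p < 2, the standard comparison between vector norms gives ‖A‖_{p,2} ≤ ‖A‖_p ≤ d^{1/p − 1/2} ‖A‖_{p,2}; this is a general fact about norms, stated here only as background, and not a description of how the paper derives its approximation factor.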

research
04/16/2020

Average Case Column Subset Selection for Entrywise ℓ_1-Norm Loss

We study the column subset selection problem with respect to the entrywi...
research
08/16/2019

Low-rank approximation in the Frobenius norm by column and row subset selection

A CUR approximation of a matrix A is a particular type of low-rank appro...
research
06/07/2023

Fair Column Subset Selection

We consider the problem of fair column subset selection. In particular, ...
research
12/01/2018

Universal Streaming of Subset Norms

Most known algorithms in the streaming model of computation aim to appro...
research
03/15/2019

Subset Selection for Matrices with Fixed Blocks

Subset selection for matrices is the task of extracting a column sub-mat...
research
05/09/2022

Robust Parameter Identifiability Analysis via Column Subset Selection

We advocate a numerically reliable and accurate approach for practical p...
research
12/23/2018

A determinantal point process for column subset selection

Dimensionality reduction is a first step of many machine learning pipeli...
