Tight Sensitivity Bounds For Smaller Coresets

07/02/2019
by   Alaa Maalouf, et al.

An ε-coreset for Least-Mean-Squares (LMS) of a matrix A ∈ R^{n×d} is a small weighted subset of its rows that approximates the sum of squared distances from its rows to every affine k-dimensional subspace of R^d, up to a factor of 1±ε. Such coresets are useful for hyper-parameter tuning and for solving many least-mean-squares problems, such as low-rank approximation (k-SVD), k-PCA, and Lasso/Ridge/Linear regression. Coresets are also useful for handling streaming, dynamic, and distributed big data in parallel. With high probability, non-uniform sampling based on upper bounds on the importance, or sensitivity, of each row in A yields a coreset. The size of the sampled coreset is then near-linear in the sum of these sensitivity bounds. We provide algorithms that compute provably tight bounds for the sensitivity of each input row. They are based on two ingredients: (i) an iterative algorithm that computes the exact sensitivity of each point, up to arbitrarily small precision, for (non-affine) k-subspaces, and (ii) a general reduction, of independent interest, from computing sensitivities for the family of affine k-subspaces in R^d to (non-affine) (k+1)-subspaces in R^{d+1}. Experimental results on real-world datasets, including the English Wikipedia document-term matrix, show that our bounds yield significantly smaller, data-dependent coresets in practice as well. Full open-source code is also provided.
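The sampling scheme described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it covers only the special case of (non-affine) hyperplanes, k = d-1, where the sensitivity of row a_i is known to equal its statistical leverage score, sup_x (a_i·x)^2 / ||Ax||^2. The function names `leverage_scores` and `sensitivity_sample` are hypothetical, chosen here for illustration.

```python
import numpy as np

def leverage_scores(A):
    """Leverage score of each row of A (assumed nonzero, float dtype).

    For squared distances to (non-affine) hyperplanes, this equals the
    row's sensitivity: sup over x of (a_i . x)^2 / ||A x||^2.
    """
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    # Keep only directions with numerically nonzero singular values.
    tol = s.max() * max(A.shape) * np.finfo(A.dtype).eps
    rank = int(np.sum(s > tol))
    return np.sum(U[:, :rank] ** 2, axis=1)

def sensitivity_sample(A, m, rng):
    """Sample m row indices with probability proportional to sensitivity.

    Returns the indices and the weights 1 / (m * p_i), which make the
    weighted cost an unbiased estimate of the full cost ||A x||^2.
    """
    s = leverage_scores(A)
    p = s / s.sum()
    idx = rng.choice(A.shape[0], size=m, replace=True, p=p)
    w = 1.0 / (m * p[idx])
    return idx, w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((2000, 5))
    idx, w = sensitivity_sample(A, 500, rng)
    x = rng.standard_normal(5)  # normal vector of a query hyperplane
    full = np.sum((A @ x) ** 2)
    approx = np.sum(w * (A[idx] @ x) ** 2)
    print("relative error:", abs(approx - full) / full)
```

The leverage scores sum to the rank of A, so in this special case the total sensitivity, and hence the sample size needed for a fixed ε, depends on the data through its rank rather than on n.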


