Coreset Construction via Randomized Matrix Multiplication

05/29/2017
by Jiasen Yang, et al.

Coresets are small sets of points that approximate the properties of a larger point-set. For example, given a compact set S ⊆ ℝ^d, a coreset could be defined as a (weighted) subset of S that approximates the sum of squared distances from S to every linear subspace of ℝ^d. As such, coresets can be used as a proxy for the full dataset and provide an important technique for speeding up algorithms that solve problems such as principal component analysis and latent semantic indexing. In this paper, we provide a structural result that connects the construction of such coresets to approximating matrix products. This structural result implies a simple, randomized algorithm that constructs coresets whose sizes are independent of the number and dimensionality of the input points. The expected size of the resulting coresets improves on that of the state-of-the-art deterministic approach. Finally, we evaluate the proposed randomized algorithm on synthetic and real data and demonstrate that it performs well relative to its deterministic counterpart.
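
The abstract does not state which sampling distribution the randomized algorithm uses. As a minimal sketch, assuming the classical squared-row-norm sampling from randomized matrix multiplication (not necessarily the paper's distribution, nor the exact matrix it samples from), the code below treats the points of S as the rows of a matrix A and draws c rescaled rows C so that CᵀC is an unbiased estimate of AᵀA. Because the sum of squared distances from the rows of A to the subspace spanned by orthonormal columns V equals tr((I − VVᵀ) AᵀA (I − VVᵀ)), a good estimate of AᵀA also gives a good estimate of that cost for every subspace. Rescaling a row by 1/sqrt(c·p_i) is the same as giving that point weight 1/(c·p_i). The function names sample_weighted_rows and subspace_cost are hypothetical.

```python
import numpy as np


def sample_weighted_rows(A, c, rng=None):
    """Hypothetical helper: sample c rows of A i.i.d. with probability
    p_i = ||A_i||^2 / ||A||_F^2 and rescale each sampled row by
    1 / sqrt(c * p_i), so that E[C.T @ C] = A.T @ A.
    Assumes A has at least one nonzero row."""
    rng = np.random.default_rng(rng)
    row_norms_sq = np.sum(A * A, axis=1)
    p = row_norms_sq / row_norms_sq.sum()
    idx = rng.choice(A.shape[0], size=c, replace=True, p=p)
    # Sampled rows always have p_i > 0, so this division is safe.
    C = A[idx] / np.sqrt(c * p[idx])[:, None]
    return C, idx


def subspace_cost(A, V):
    """Sum of squared distances from the rows of A to span(V),
    where V has orthonormal columns."""
    residual = A - (A @ V) @ V.T
    return float(np.sum(residual * residual))


# Usage: compare the full cost with the coreset estimate for one random
# 3-dimensional subspace of R^20 (synthetic Gaussian data, for illustration).
rng = np.random.default_rng(0)
A = rng.standard_normal((5000, 20))
C, idx = sample_weighted_rows(A, c=500, rng=1)
V, _ = np.linalg.qr(rng.standard_normal((20, 3)))
full = subspace_cost(A, V)
approx = subspace_cost(C, V)
print(f"full cost: {full:.1f}, coreset estimate: {approx:.1f}")
```

The same 500 weighted rows are reused for any choice of V, which is the property that lets a coreset stand in for the full dataset across all subspaces. How large c must be for a given accuracy is what results such as the paper's make precise; this sketch makes no such guarantee.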


