Algorithmic Gaussianization through Sketching: Converting Data into Sub-gaussian Random Designs

06/21/2022
by Michał Dereziński, et al.

Algorithmic Gaussianization is a phenomenon that can arise when using randomized sketching or sampling methods to produce smaller representations of large datasets: For certain tasks, these sketched representations have been observed to exhibit many robust performance characteristics that are known to occur when a data sample comes from a sub-gaussian random design, which is a powerful statistical model of data distributions. However, this phenomenon has only been studied for specific tasks and metrics, or by relying on computationally expensive methods. We address this by providing an algorithmic framework for gaussianizing data distributions via averaging, proving that it is possible to efficiently construct data sketches that are nearly indistinguishable (in terms of total variation distance) from sub-gaussian random designs. In particular, relying on a recently introduced sketching technique called Leverage Score Sparsified (LESS) embeddings, we show that one can construct an n× d sketch of an N× d matrix A, where n≪ N, that is nearly indistinguishable from a sub-gaussian design, in time O(nnz(A)log N + nd^2), where nnz(A) is the number of non-zero entries in A. As a consequence, strong statistical guarantees and precise asymptotics available for the estimators produced from sub-gaussian designs (e.g., for least squares and Lasso regression, covariance estimation, low-rank approximation, etc.) can be straightforwardly adapted to our sketching framework. We illustrate this with a new approximation guarantee for sketched least squares, among other examples.
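The sketch-and-solve idea underlying the least squares guarantee can be illustrated with a minimal example. The snippet below uses a plain dense Gaussian sketch purely as a simple, well-known stand-in for the LESS embeddings the paper actually analyzes (LESS embeddings achieve the stated O(nnz(A) log N + nd^2) time via leverage-score-based sparsification; a dense sketch does not). All dimensions and names here are illustrative assumptions, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tall data matrix A (N x d) with N >> d, and a noisy response vector b.
N, d, n = 2000, 10, 200
A = rng.standard_normal((N, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(N)

# Sub-gaussian sketching matrix S (n x N). A dense Gaussian sketch is used
# here only as a stand-in for the sparse LESS embedding described above.
S = rng.standard_normal((n, N)) / np.sqrt(n)

# Sketch-and-solve least squares: minimize ||S(Ax - b)|| over the n x d
# sketch S @ A instead of the full N x d problem.
x_full, *_ = np.linalg.lstsq(A, b, rcond=None)
x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)

rel_err = np.linalg.norm(x_sketch - x_full) / np.linalg.norm(x_full)
print(f"relative error of sketched solution: {rel_err:.3f}")
```

Because the sketched design behaves like a sub-gaussian random design, the sketched solution stays close to the full least squares solution once n moderately exceeds d, which is the kind of statement the paper's framework makes precise.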

Related research:
- Sparse sketches with small inversion bias (11/21/2020)
- Randomized low-rank approximations beyond Gaussian random matrices (08/10/2023)
- Determinantal Point Processes in Randomized Numerical Linear Algebra (05/07/2020)
- Universality of regularized regression estimators in high dimensions (06/16/2022)
- Low Rank Approximation Directed by Leverage Scores and Computed at Sub-linear Cost (06/10/2019)
- On Learned Sketches for Randomized Numerical Linear Algebra (07/20/2020)
- Restricted eigenvalue property for corrupted Gaussian designs (05/21/2018)
