Optimal Fast Johnson-Lindenstrauss Embeddings for Large Data Sets

12/05/2017
by Stefan Bamberger, et al.

We introduce a new fast construction of a Johnson-Lindenstrauss matrix based on the composition of two embeddings. First, a fast construction by the second author, joint with Ward [arXiv:1009.0744], maps points into a space of lower, but not optimal, dimension. A subsequent transformation by a dense matrix with independent entries then reaches an optimal embedding dimension. As we show in this note, the computational cost of applying this transform simultaneously to all points in a large data set comes close to the complexity of just reading the data, under only very mild restrictions on the size of the data set. Along the way, our construction also yields the least restricted Johnson-Lindenstrauss transform of order-optimal embedding dimension known to date that allows for a fast query step, that is, a fast application to an arbitrary point that is not part of the given data set.
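
The two-stage composition described in the abstract can be illustrated with a minimal NumPy sketch. It rests on several assumptions that the abstract does not state: the first (fast) stage is instantiated here as a subsampled Walsh-Hadamard transform with random column signs, the dense second stage uses Gaussian entries, and the names `fwht` and `two_stage_embed` as well as the dimensions `m_mid` and `m_final` are hypothetical choices for illustration, not the bounds or the exact construction from the paper.

```python
import numpy as np


def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform along axis 0.

    The length of axis 0 must be a power of two.
    """
    x = np.array(x, dtype=float)  # work on a float copy
    n = x.shape[0]
    if n & (n - 1):
        raise ValueError("length of axis 0 must be a power of two")
    h = 1
    while h < n:
        # Butterfly step: combine the two halves of each block of size 2h.
        y = x.reshape(n // (2 * h), 2, h, -1)
        a = y[:, 0] + y[:, 1]
        b = y[:, 0] - y[:, 1]
        x = np.stack([a, b], axis=1).reshape(x.shape)
        h *= 2
    return x


def two_stage_embed(X, m_mid, m_final, rng):
    """Embed the columns of X (shape: ambient dim x number of points).

    Stage 1 (assumed instance): random signs, Hadamard transform,
    uniform row subsampling down to m_mid coordinates.
    Stage 2: dense Gaussian matrix down to m_final coordinates.
    """
    n = X.shape[0]
    signs = rng.choice([-1.0, 1.0], size=n)
    rows = rng.choice(n, size=m_mid, replace=False)
    # Scaling by 1/sqrt(m_mid) preserves squared norms in expectation.
    Y = fwht(signs[:, None] * X)[rows] / np.sqrt(m_mid)
    # Dense matrix with independent entries, applied to all points at once.
    G = rng.standard_normal((m_final, m_mid)) / np.sqrt(m_final)
    return G @ Y
```

For example, `two_stage_embed(X, 256, 64, np.random.default_rng(0))` maps the columns of a 1024-row matrix `X` to 64 dimensions. The dense stage is a single matrix product over the whole data set, which is where applying the transform to all points simultaneously can pay off; the fast first stage keeps that dense matrix small.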
