Sign-Full Random Projections

04/26/2018
by Ping Li, et al.

The method of 1-bit ("sign-sign") random projections has been a popular tool for efficient search and machine learning on large datasets. Given two D-dimensional data vectors $u, v \in \mathbb{R}^D$, one can generate $x = \sum_{i=1}^{D} u_i r_i$ and $y = \sum_{i=1}^{D} v_i r_i$, where $r_i \sim N(0,1)$ i.i.d. The "collision probability" is $\Pr(\mathrm{sgn}(x) = \mathrm{sgn}(y)) = 1 - \cos^{-1}(\rho)/\pi$, where $\rho = \rho(u,v)$ is the cosine similarity. We develop "sign-full" random projections by estimating $\rho$ from (e.g.) the expectation $E(\mathrm{sgn}(x)\, y) = \sqrt{2/\pi}\,\rho$, an estimate that can be substantially improved by normalizing y. For nonnegative data, we recommend an estimator based on $E(y_- 1_{x \ge 0} + y_+ 1_{x < 0})$ and its normalized version. The recommended estimator almost matches the accuracy of the (computationally expensive) maximum likelihood estimator. At high similarity ($\rho \to 1$), the asymptotic variance of the recommended estimator is only $4/(3\pi) \approx 0.4$ of that of the estimator for sign-sign projections. With a small number of projections k and high similarity, the improvement is even more substantial.
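As a rough illustration of the two identities quoted in the abstract, the sketch below inverts the collision probability to obtain a sign-sign estimate and rescales the sample mean of sgn(x)·y to obtain the basic, un-normalized sign-full estimate. It assumes u and v are scaled to unit norm so that x and y are standard normal with correlation ρ. The function names, the test vectors, and the choice of k are illustrative and not taken from the paper. The normalized and recommended estimators are not shown, since the abstract does not give their exact form.

```python
import numpy as np


def project(u, v, k, rng):
    """Project unit-normalized u and v with one shared k x D Gaussian matrix."""
    u = u / np.linalg.norm(u)  # assumption: unit norm, so x, y ~ N(0, 1)
    v = v / np.linalg.norm(v)
    R = rng.standard_normal((k, u.shape[0]))  # entries r_i ~ N(0, 1) i.i.d.
    return R @ u, R @ v


def rho_sign_sign(x, y):
    """Invert Pr(sgn(x) = sgn(y)) = 1 - arccos(rho) / pi."""
    p_hat = np.mean(np.sign(x) == np.sign(y))
    return np.cos(np.pi * (1.0 - p_hat))


def rho_sign_full(x, y):
    """Invert E(sgn(x) y) = sqrt(2 / pi) * rho (basic, un-normalized form)."""
    return np.sqrt(np.pi / 2.0) * np.mean(np.sign(x) * y)


rng = np.random.default_rng(0)
D, k = 64, 20000                 # illustrative sizes, not from the paper
u = rng.random(D)                # nonnegative data
v = u + 0.3 * rng.random(D)      # a vector highly similar to u
rho = float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

x, y = project(u, v, k, rng)
est_ss = rho_sign_sign(x, y)
est_sf = rho_sign_full(x, y)
print(f"true rho={rho:.4f}  sign-sign={est_ss:.4f}  sign-full={est_sf:.4f}")
```

Only the signs of x are used in `rho_sign_full`, so one side of the pair can be stored with 1 bit per projection while the other side keeps full precision.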

