RACE: Sub-Linear Memory Sketches for Approximate Near-Neighbor Search on Streaming Data

02/18/2019
by   Benjamin Coleman, et al.

We show, for the first time, that a sub-linear memory sketch can solve the approximate near-neighbor search problem. In particular, we develop an online sketching algorithm that compresses N vectors into a tiny sketch consisting of small arrays of counters whose size scales as O(N^b log^2 N), where b < 1 depends on the stability of the near-neighbor search. This sketch is sufficient to identify the top-v near-neighbors with high probability. To the best of our knowledge, this is the first near-neighbor search algorithm that breaks the linear memory (O(N)) barrier. We achieve sub-linear memory by combining advances in locality-sensitive hashing (LSH) based estimation, especially the recently published ACE algorithm, with compressed sensing and heavy-hitter techniques. We provide strong theoretical guarantees; in particular, our analysis sheds new light on the memory-accuracy tradeoff in the near-neighbor search setting and on the role of sparsity in compressed sensing, which could be of independent interest. We rigorously evaluate our framework, which we call RACE (Repeated ACE) data structures, on a friend recommendation task on the Google Plus graph with more than 100,000 high-dimensional vectors. RACE provides compression that is orders of magnitude better than the random-projection-based alternative, which is unsurprising given the theoretical advantage. We anticipate that RACE will enable both new theoretical perspectives on near-neighbor search and new methodologies for applications such as high-speed data mining, internet-of-things (IoT), and beyond.
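To make the "small arrays of counters" concrete, here is a minimal Python sketch of the LSH-indexed counter arrays that an ACE-style estimator relies on. It is an illustration under stated assumptions, not the authors' implementation: the class name `CounterSketch`, its parameters (`reps`, `bits`, `seed`), and the choice of signed random projections as the LSH family are all hypothetical, and the compressed sensing and heavy-hitter recovery step that RACE uses to identify individual near-neighbors is omitted.

```python
import random


class CounterSketch:
    """Hypothetical ACE-style sketch: `reps` arrays of 2**bits counters.

    Assumption: signed random projections serve as the LSH family.
    Memory is reps * 2**bits counters, independent of how many vectors are added.
    """

    def __init__(self, dim, reps=50, bits=4, seed=0):
        rng = random.Random(seed)
        # One independent set of `bits` random hyperplanes per repetition.
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]
            for _ in range(reps)
        ]
        self.counts = [[0] * (2 ** bits) for _ in range(reps)]

    def _bucket(self, rep, x):
        # Concatenate the sign bits of the projections into a bucket index.
        idx = 0
        for plane in self.planes[rep]:
            dot = sum(p * xi for p, xi in zip(plane, x))
            idx = (idx << 1) | (1 if dot >= 0 else 0)
        return idx

    def add(self, x):
        # Streaming update: one counter increment per repetition.
        for r in range(len(self.counts)):
            self.counts[r][self._bucket(r, x)] += 1

    def query(self, q):
        # Average, over repetitions, of the counter that q hashes to.
        # A larger value suggests more stored vectors close to q.
        total = sum(
            self.counts[r][self._bucket(r, q)] for r in range(len(self.counts))
        )
        return total / len(self.counts)
```

A short usage example: after `sketch = CounterSketch(dim=3)` and a stream of `sketch.add(v)` calls, `sketch.query(q)` returns a collision-count estimate that grows with the number of stored vectors at a small angle to `q`. The vectors themselves are never stored, which is the property the abstract's memory claim rests on.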

