On the Problem of p_1^-1 in Locality-Sensitive Hashing

05/25/2020
by Thomas Dybdahl Ahle, et al.

A Locality-Sensitive Hash (LSH) function is called (r, cr, p_1, p_2)-sensitive if two data points at distance less than r collide with probability at least p_1, while data points at distance greater than cr collide with probability at most p_2. These functions form the basis of the successful Indyk-Motwani algorithm (STOC 1998) for nearest neighbour problems. In particular, one may build a c-approximate nearest neighbour data structure with query time Õ(n^ρ/p_1), where ρ = log(1/p_1)/log(1/p_2) ∈ (0,1). That is sub-linear time, as long as p_1 is not too small. This is significant since most high-dimensional nearest neighbour problems suffer from the curse of dimensionality and cannot be solved exactly faster than a brute-force linear-time scan of the database. Unfortunately, the best LSH functions, including the best functions for Cosine and Jaccard Similarity, tend to have very low collision probabilities p_1 and p_2. This means that the n^ρ/p_1 query time of LSH is often not sub-linear after all, even for approximate nearest neighbours! In this paper, we improve the general Indyk-Motwani algorithm to reduce the query time of LSH to Õ(n^ρ/p_1^(1-ρ)) (and the space usage correspondingly). Since n^ρ p_1^(ρ-1) < n ⇔ p_1 > n^(-1), our algorithm always obtains sub-linear query time, for any collision probabilities at least 1/n. For p_1 and p_2 small enough, our improvement over all previous methods can be up to a factor n in both query time and space. The improvement comes from a simple change to the Indyk-Motwani algorithm, which can easily be implemented in existing software packages.
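
To make the two bounds concrete, the following minimal sketch evaluates n^ρ/p_1 and n^ρ/p_1^(1-ρ) for given parameters. It ignores the polylogarithmic factors hidden by Õ, assumes 0 < p_2 < p_1 < 1, and the function names are illustrative only; it compares the formulas from the abstract and is not an implementation of the algorithm itself.

```python
import math


def rho(p1: float, p2: float) -> float:
    """rho = log(1/p1) / log(1/p2), assuming 0 < p2 < p1 < 1."""
    return math.log(1 / p1) / math.log(1 / p2)


def classic_bound(n: int, p1: float, p2: float) -> float:
    """Classic Indyk-Motwani query bound n^rho / p1 (log factors dropped)."""
    return n ** rho(p1, p2) / p1


def improved_bound(n: int, p1: float, p2: float) -> float:
    """Improved query bound n^rho / p1^(1 - rho) (log factors dropped)."""
    r = rho(p1, p2)
    return n ** r / p1 ** (1 - r)


if __name__ == "__main__":
    # Hypothetical parameters chosen so that rho = 1/2 and p1 > 1/n.
    n, p1, p2 = 10**6, 1e-3, 1e-6
    print("rho      =", rho(p1, p2))
    print("classic  =", classic_bound(n, p1, p2))   # about n: not sub-linear
    print("improved =", improved_bound(n, p1, p2))  # well below n
```

With these hypothetical values the classic bound is roughly n itself, while the improved bound is smaller by a factor of p_1^(-ρ), which matches the ratio of the two formulas.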

