DB-LSH: Locality-Sensitive Hashing with Query-based Dynamic Bucketing

07/16/2022
by Yao Tian, et al.

Among many solutions to the high-dimensional approximate nearest neighbor (ANN) search problem, locality-sensitive hashing (LSH) is known for its sub-linear query time and robust theoretical guarantee on query accuracy. Traditional LSH methods can generate a small number of candidates quickly from hash tables but suffer from large index sizes and hash boundary problems. Recent studies that address these issues often incur extra overhead to identify eligible candidates or remove false positives, so that the query time is no longer sub-linear. To address this dilemma, in this paper we propose a novel LSH scheme called DB-LSH, which supports efficient ANN search for large high-dimensional datasets. It organizes the projected spaces with multi-dimensional indexes rather than using fixed-width hash buckets. Our approach can significantly reduce the space cost by avoiding the need to maintain many hash tables for different bucket sizes. During the query phase of DB-LSH, a small number of high-quality candidates can be generated efficiently by dynamically constructing query-based hypercubic buckets with the required widths through index-based window queries. For a dataset of n d-dimensional points with approximation ratio c, our rigorous theoretical analysis shows that DB-LSH achieves a smaller query cost of O(n^{ρ^*} d log n), where ρ^* is bounded by 1/c^α, whereas the bound is 1/c in existing work. An extensive range of experiments on real-world data demonstrates the superiority of DB-LSH over state-of-the-art methods in both efficiency and accuracy.
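The query-based bucketing idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the parameters (k, num_tables, w0, r0, max_rounds), the Gaussian projections, and the stopping rule are all assumptions chosen for illustration, and a plain linear scan stands in for the multi-dimensional index that would answer window queries in a real system.

```python
import math
import random

def make_projections(d, k, num_tables, seed=0):
    """Draw num_tables independent sets of k Gaussian projection vectors (assumed LSH family)."""
    rng = random.Random(seed)
    return [[[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]
            for _ in range(num_tables)]

def project(point, table):
    """Map a d-dimensional point to its k-dimensional projected space."""
    return [sum(a * x for a, x in zip(row, point)) for row in table]

def build_index(points, tables):
    """index[t][i] is the projection of points[i] under table t.
    A real system would store these in a multi-dimensional index;
    a list scanned linearly is only a stand-in."""
    return [[project(p, table) for p in points] for table in tables]

def window_query(projected, center, width):
    """Return ids of points inside the hypercube of the given width centered at `center`."""
    half = width / 2.0
    return [i for i, z in enumerate(projected)
            if all(abs(zj - cj) <= half for zj, cj in zip(z, center))]

def ann_query(points, tables, index, q, c=1.5, w0=1.0, r0=1.0, max_rounds=20):
    """Grow the query-centered bucket by a factor c per round until a
    candidate within c * r of the query is found (illustrative stopping rule)."""
    best_i, best_dist = None, math.inf
    seen = set()
    r = r0
    for _ in range(max_rounds):
        for table, projected in zip(tables, index):
            center = project(q, table)  # the bucket is centered on the query itself
            for i in window_query(projected, center, w0 * r):
                if i in seen:
                    continue
                seen.add(i)
                dist = math.dist(points[i], q)  # verify in the original space
                if dist < best_dist:
                    best_i, best_dist = i, dist
        if best_dist <= c * r:
            return best_i, best_dist
        r *= c  # no good candidate yet: enlarge the bucket and retry
    return best_i, best_dist
```

Because the bucket is constructed around the query at query time, a near neighbor cannot be separated from the query by a fixed bucket boundary, and a single set of projections serves every bucket width. A call such as `ann_query(points, tables, index, q)` returns the id of the best candidate found and its distance to `q`.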

research
11/24/2020

Efficient Approximate Nearest Neighbor Search for Multiple Weighted l_p≤2 Distance Functions

Nearest neighbor search is fundamental to a wide range of applications. ...
research
07/06/2021

PM-LSH: a fast and accurate in-memory framework for high-dimensional approximate NN and closest pair search

Nearest neighbor (NN) search is inherently computationally expensive in ...
research
10/22/2018

Norm-Range Partition: A Universal Catalyst for LSH based Maximum Inner Product Search (MIPS)

Recently, locality sensitive hashing (LSH) was shown to be effective for...
research
09/20/2018

Local Density Estimation in High Dimensions

An important question that arises in the study of high dimensional vecto...
research
05/25/2020

On the Problem of p_1^-1 in Locality-Sensitive Hashing

A Locality-Sensitive Hash (LSH) function is called (r,cr,p_1,p_2)-sensit...
research
03/10/2021

MP-RW-LSH: An Efficient Multi-Probe LSH Solution to ANNS in L_1 Distance

Approximate Nearest Neighbor Search (ANNS) is a fundamental algorithmic ...
research
04/11/2020

Locality-Sensitive Hashing Scheme based on Longest Circular Co-Substring

Locality-Sensitive Hashing (LSH) is one of the most popular methods for ...
