PM-LSH: a fast and accurate in-memory framework for high-dimensional approximate NN and closest pair search

by Bolong Zheng, et al.

Nearest neighbor (NN) search is inherently computationally expensive in high-dimensional spaces due to the curse of dimensionality. As a well-known solution, locality-sensitive hashing (LSH) is able to answer c-approximate NN (c-ANN) queries in sublinear time with constant probability. Existing LSH methods focus mainly on building hash bucket-based indexing such that the candidate points can be retrieved quickly. However, existing coarse-grained structures fail to offer accurate distance estimation for candidate points, which translates into additional computational overhead when unnecessary points have to be examined. This in turn reduces the performance of query processing. In contrast, we propose a fast and accurate in-memory LSH framework, called PM-LSH, that aims to compute c-ANN queries on large-scale, high-dimensional datasets. First, we adopt a simple yet effective PM-tree to index the data points. Second, we develop a tunable confidence interval to achieve accurate distance estimation and guarantee high result quality. Third, we propose an efficient algorithm on top of the PM-tree to improve the performance of computing c-ANN queries. In addition, we extend PM-LSH to support closest pair (CP) search in high-dimensional spaces. We again adopt the PM-tree to organize the points in a low-dimensional space, and we propose a branch and bound algorithm together with a radius pruning technique to improve the performance of computing c-approximate closest pair (c-ACP) queries. Extensive experiments with real-world data offer evidence that PM-LSH is capable of outperforming existing proposals with respect to both efficiency and accuracy for both NN and CP search.
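The general idea of estimating distances in a low-dimensional projected space and then verifying a small candidate set can be illustrated with a minimal sketch. It is not the PM-LSH implementation: it assumes Gaussian random projections as the hash functions, replaces the PM-tree with a plain linear scan over the projected points, and omits the confidence interval entirely. All function names and parameters (`make_projection`, `estimate_distance`, `candidates_then_verify`, `n_candidates`) are hypothetical and chosen only for illustration.

```python
import math
import random


def make_projection(d, m, seed=0):
    """Build an m x d matrix of i.i.d. N(0, 1) entries (assumed hash family)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]


def project(A, x):
    """Map a d-dimensional point to the m-dimensional projected space."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def dist(u, v):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def estimate_distance(A, x, y):
    """Estimate the original distance from the projected one.

    For Gaussian A, the squared projected distance has expectation
    m * ||x - y||^2, so dividing the projected distance by sqrt(m)
    gives an estimate of ||x - y||.
    """
    m = len(A)
    return dist(project(A, x), project(A, y)) / math.sqrt(m)


def candidates_then_verify(A, data, q, n_candidates):
    """Rank points by projected distance, then verify the top candidates.

    A linear scan stands in for the tree index here (an assumption made
    to keep the sketch short); only the n_candidates closest points in
    the projected space have their true distance computed.
    """
    pq = project(A, q)
    projected = [project(A, p) for p in data]
    order = sorted(range(len(data)), key=lambda i: dist(projected[i], pq))
    return min(order[:n_candidates], key=lambda i: dist(data[i], q))
```

In this sketch, increasing `m` tightens the distance estimate at the cost of more work per projection, and increasing `n_candidates` trades query time for result quality.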




