Norm-Range Partition: A Universal Catalyst for LSH based Maximum Inner Product Search (MIPS)

10/22/2018
by Xiao Yan, et al.

Recently, locality sensitive hashing (LSH) was shown to be effective for MIPS, and several algorithms, including L_2-ALSH, Sign-ALSH and Simple-LSH, have been proposed. In this paper, we introduce the norm-range partition technique, which partitions the original dataset into sub-datasets containing items with similar 2-norms and builds a hash index independently for each sub-dataset. We prove that norm-range partition reduces the query processing complexity for all existing LSH-based MIPS algorithms under mild conditions. The key to the performance improvement is that norm-range partition allows a smaller normalization factor to be used for most sub-datasets. For efficient query processing, we also formulate a unified framework to rank the buckets from the hash indexes of different sub-datasets. Experiments on real datasets show that norm-range partition significantly reduces the number of probed items for LSH-based MIPS algorithms when achieving the same recall.
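
The partitioning idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the equal-size split by sorted norm, the function names (norm_range_partition, simple_lsh_transform) and the synthetic data are all assumptions made for illustration. The transform appends sqrt(1 - ||x/U||^2) to the scaled item, as in Simple-LSH, with U taken as the largest norm inside each sub-dataset instead of the largest norm in the whole dataset.

```python
import math
import random


def l2_norm(vec):
    """Euclidean (2-)norm of a list of floats."""
    return math.sqrt(sum(v * v for v in vec))


def norm_range_partition(items, num_parts):
    """Group item indices into at most num_parts sub-datasets of similar 2-norm.

    Assumption: an equal-size split of the norm-sorted items. Returns the
    index lists and, for each one, its local normalization factor
    (the largest norm in that sub-dataset).
    """
    order = sorted(range(len(items)), key=lambda i: l2_norm(items[i]))
    size = math.ceil(len(order) / num_parts)
    parts = [order[s:s + size] for s in range(0, len(order), size)]
    # Each part is sorted by norm, so its last index has the largest norm.
    factors = [l2_norm(items[p[-1]]) for p in parts]
    return parts, factors


def simple_lsh_transform(vec, factor):
    """Scale vec by 1/factor and append one coordinate so the result has unit norm."""
    scaled = [v / factor for v in vec]
    residual = max(0.0, 1.0 - sum(s * s for s in scaled))  # guard against rounding
    return scaled + [math.sqrt(residual)]


# Synthetic items with widely varying norms (illustrative data only).
random.seed(0)
items = []
for _ in range(1000):
    scale = random.uniform(0.1, 5.0)
    items.append([scale * random.gauss(0.0, 1.0) for _ in range(16)])

parts, factors = norm_range_partition(items, num_parts=4)
global_factor = max(l2_norm(x) for x in items)

# One transformed sub-dataset per norm range; each would get its own hash index.
transformed = [
    [simple_lsh_transform(items[i], factor) for i in part]
    for part, factor in zip(parts, factors)
]
```

In this sketch every local factor is no larger than global_factor, and only the last sub-dataset reaches it, which is the sense in which most sub-datasets get a smaller normalization factor. Building the hash tables and ranking buckets across sub-datasets are left out.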

research
09/24/2018

Norm-Ranging LSH for Maximum Inner Product Search

Neyshabur and Srebro proposed Simple-LSH, which is the state-of-the-art ...
research
07/16/2022

DB-LSH: Locality-Sensitive Hashing with Query-based Dynamic Bucketing

Among many solutions to the high-dimensional approximate nearest neighbo...
research
06/25/2019

Pyramid: A General Framework for Distributed Similarity Search

Similarity search is a core component in various applications such as im...
research
09/11/2016

Sharing Hash Codes for Multiple Purposes

Locality sensitive hashing (LSH) is a powerful tool for sublinear-time a...
research
07/19/2018

Multi-Resolution Hashing for Fast Pairwise Summations

A basic computational primitive in the analysis of massive datasets is s...
research
10/31/2016

Numerical Facet Range Partition: Evaluation Metric and Methods

Faceted navigation is a very useful component in today's search engines....
research
06/30/2023

Hashing-Based Distributed Clustering for Massive High-Dimensional Data

Clustering analysis is of substantial significance for data mining. The ...
