Asymmetric Minwise Hashing

11/14/2014
by Anshumali Shrivastava, et al.

Minwise hashing (Minhash) is a widely used indexing scheme. It is designed to estimate set resemblance and is known to be suboptimal in many applications where the desired measure is set overlap (i.e., the inner product between binary vectors) or set containment. Minhash has an inherent bias towards smaller sets, which hurts its performance in applications where such a penalization is undesirable. In this paper, we propose asymmetric minwise hashing (MH-ALSH) to solve this problem. The new scheme uses asymmetric transformations to cancel the bias of traditional minhash towards smaller sets, making the final "collision probability" monotonic in the inner product. Our theoretical comparisons show that, for the task of retrieval with binary inner products, asymmetric minhash is provably better than traditional minhash and other recently proposed hashing algorithms for general inner products. Thus, we obtain an algorithmic improvement over existing approaches in the literature. Experimental evaluations on four publicly available high-dimensional datasets validate our claims: the proposed scheme outperforms, often significantly, other hashing algorithms on the task of near neighbor retrieval with set containment. Our proposal is simple and easy to implement in practice.
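
The bias towards smaller sets, and how an asymmetric transformation can cancel it, can be illustrated with a small sketch. The abstract does not spell out the transformation, so the code below makes an assumption: each indexed set is padded with dummy elements up to a common size `max_size`, while the query is left unchanged. The helper names `resemblance` and `pad_data` are hypothetical and chosen only for this illustration. Because the minhash collision probability equals the resemblance of the two hashed sets, the sketch computes resemblance exactly instead of sampling hash functions.

```python
def resemblance(a, b):
    """Jaccard resemblance |a & b| / |a | b|, i.e. the minhash collision probability."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pad_data(s, max_size):
    """Assumed data-side transformation: pad a set to exactly max_size elements.

    The dummy elements are tuples, so they cannot collide with the integer
    items used in this example or with anything in an unpadded query.
    """
    s = set(s)
    if len(s) > max_size:
        raise ValueError("max_size must be at least the size of the largest set")
    return s | {("pad", i) for i in range(max_size - len(s))}


query = {1, 2, 3, 4}
small = {1, 2}                      # overlap with query: 2
large = {1, 2, 3, 5, 6, 7, 8, 9}    # overlap with query: 3
max_size = 8                        # size of the largest indexed set

# Plain resemblance favors the smaller set despite its smaller overlap.
print(resemblance(query, small), resemblance(query, large))

# After padding, every indexed set has the same size, so for a fixed query
# the collision probability overlap / (max_size + |query| - overlap)
# increases with the overlap.
print(resemblance(query, pad_data(small, max_size)),
      resemblance(query, pad_data(large, max_size)))
```

Under this assumed padding, the ranking of the two candidate sets flips in favor of the one with the larger overlap, which is the behavior the abstract describes as a collision probability that is monotonic in the inner product.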


research
05/22/2014

Asymmetric LSH (ALSH) for Sublinear Time Maximum Inner Product Search (MIPS)

We present the first provably sublinear time algorithm for approximate M...
research
10/20/2014

Improved Asymmetric Locality Sensitive Hashing (ALSH) for Maximum Inner Product Search (MIPS)

Recently it was shown that the problem of Maximum Inner Product Search (...
research
01/14/2023

Weighted Minwise Hashing Beats Linear Sketching for Inner Product Estimation

We present a new approach for computing compact sketches that can be use...
research
11/23/2022

SAH: Shifting-aware Asymmetric Hashing for Reverse k-Maximum Inner Product Search

This paper investigates a new yet challenging problem called Reverse k-M...
research
11/01/2022

Asymmetric Hashing for Fast Ranking via Neural Network Measures

Fast item ranking is an important task in recommender systems. In previo...
research
06/06/2011

Hashing Algorithms for Large-Scale Learning

In this paper, we first demonstrate that b-bit minwise hashing, whose es...
research
12/06/2016

Revisiting Winner Take All (WTA) Hashing for Sparse Datasets

WTA (Winner Take All) hashing has been successfully applied in many larg...
