DartMinHash: Fast Sketching for Weighted Sets

05/23/2020
by Tobias Christiani, et al.

Weighted minwise hashing is a standard dimensionality reduction technique with applications to similarity search and large-scale kernel machines. We introduce a simple algorithm that takes a weighted set x ∈ ℝ_{≥0}^d and computes k independent minhashes in expected time O(k log k + ‖x‖_0 log(‖x‖_1 + 1/‖x‖_1)), improving upon the state-of-the-art BagMinHash algorithm (KDD '18) and representing the fastest weighted minhash algorithm for sparse data. Our experiments show running times that scale better with k and ‖x‖_0 than those of ICWS (ICDM '10) and BagMinHash, obtaining 10x speedups in common use cases. Our approach also gives rise to a technique for computing fully independent locality-sensitive hash values for (L, K)-parameterized approximate near neighbor search under weighted Jaccard similarity in optimal expected time O(LK + ‖x‖_0), improving on prior work even in the case of unweighted sets.
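
To make the setting concrete, the following is a minimal Python sketch of the quantity that weighted minhashes estimate, the weighted Jaccard similarity Σ_i min(x_i, y_i) / Σ_i max(x_i, y_i), together with a naive baseline sketch. The baseline is not DartMinHash, ICWS, or BagMinHash: it assumes non-negative integer weights, expands an element of weight w into w copies, and applies ordinary minhash, so it takes time proportional to k · ‖x‖_1. All function names here are hypothetical and chosen only for illustration.

```python
import hashlib


def weighted_jaccard(x, y):
    """Exact weighted Jaccard similarity of two dicts mapping element -> weight."""
    keys = set(x) | set(y)
    num = sum(min(x.get(i, 0), y.get(i, 0)) for i in keys)
    den = sum(max(x.get(i, 0), y.get(i, 0)) for i in keys)
    return num / den if den else 0.0


def _hash64(seed, item, copy):
    """Deterministic 64-bit hash of (seed, item, copy); an illustrative choice."""
    data = f"{seed}|{item}|{copy}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def naive_weighted_minhash(x, k):
    """Naive baseline (assumption: integer weights), NOT the paper's algorithm.

    Element i with weight w is expanded into copies (i, 0), ..., (i, w - 1);
    each of the k minhashes is the copy with the smallest hash value.
    """
    sketch = []
    for seed in range(k):
        candidates = (
            (_hash64(seed, i, c), i, c) for i, w in x.items() for c in range(w)
        )
        best = min(candidates, key=lambda t: t[0], default=None)
        sketch.append(None if best is None else (best[1], best[2]))
    return sketch


def estimate_similarity(sx, sy):
    """Fraction of sketch positions where both sketches picked the same copy."""
    matches = sum(1 for a, b in zip(sx, sy) if a is not None and a == b)
    return matches / len(sx)


if __name__ == "__main__":
    x = {"a": 2, "b": 1}
    y = {"a": 1, "b": 3}
    print("exact:", weighted_jaccard(x, y))
    sx = naive_weighted_minhash(x, 256)
    sy = naive_weighted_minhash(y, 256)
    print("estimate:", estimate_similarity(sx, sy))
```

Each sketch position collides with probability equal to the weighted Jaccard similarity of the expanded sets, so the fraction of matching positions is an unbiased estimate. The linear dependence on ‖x‖_1 in this baseline is the cost that the running-time bound in the abstract avoids.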

research
10/31/2022

Using Locality-sensitive Hashing for Rendezvous Search

The multichannel rendezvous problem is a fundamental problem for neighbo...
research
06/22/2019

Algorithms for Similarity Search and Pseudorandomness

We study the problem of approximate near neighbor (ANN) search and show ...
research
11/12/2018

A Review for Weighted MinHash Algorithms

Data similarity (or distance) computation is a fundamental research topi...
research
02/12/2018

BagMinHash - Minwise Hashing Algorithm for Weighted Sets

Minwise hashing has become a standard tool to calculate signatures which...
research
11/02/2019

ProbMinHash – A Class of Locality-Sensitive Hash Algorithms for the (Probability) Jaccard Similarity

The probability Jaccard similarity was recently proposed as a natural ge...
research
09/07/2021

C-MinHash: Rigorously Reducing K Permutations to Two

Minwise hashing (MinHash) is an important and practical algorithm for ge...
research
06/02/2018

Fast Locality Sensitive Hashing for Beam Search on GPU

We present a GPU-based Locality Sensitive Hashing (LSH) algorithm to spe...
