Subsets and Supermajorities: Unifying Hashing-based Set Similarity Search

04/08/2019
by Thomas Dybdahl Ahle, et al.

We consider the problem of designing Locality Sensitive Filters (LSF) for set overlaps, also known as maximum inner product search on binary data. We give a simple data structure that generalizes and outperforms previous algorithms such as MinHash [J. Discrete Algorithms 1998], SimHash [STOC 2002], Spherical LSF [SODA 2017], and Chosen Path [STOC 2017]. We also show matching lower bounds, using hypercontractive inequalities, for a wide range of parameters and space/time trade-offs. This answers the main open question of Christiani and Pagh [STOC 2017] on unifying the landscape of Locality Sensitive (non-data-dependent) set similarity search.
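
To make the terminology above concrete, the following is a minimal Python sketch. It is not the data structure proposed in the paper. It only illustrates two background facts: the overlap of two sets equals the inner product of their binary indicator vectors, and MinHash, one of the earlier methods named above, estimates Jaccard similarity by comparing minima under random permutations. All function names, the universe size, and the use of explicit permutations are assumptions made for illustration.

```python
import random


def indicator(s, universe_size):
    """Binary indicator vector of set s over {0, ..., universe_size - 1}."""
    return [1 if i in s else 0 for i in range(universe_size)]


def inner_product(x, y):
    """Plain inner product of two equal-length vectors."""
    return sum(xi * yi for xi, yi in zip(x, y))


def jaccard(a, b):
    """Exact Jaccard similarity |a & b| / |a | b| of two non-empty sets."""
    return len(a & b) / len(a | b)


def make_permutations(universe_size, k, seed=0):
    """k random permutations; perm[i] is the rank assigned to element i."""
    rng = random.Random(seed)
    perms = []
    for _ in range(k):
        p = list(range(universe_size))
        rng.shuffle(p)
        perms.append(p)
    return perms


def minhash_signature(s, perms):
    """For each permutation, keep the smallest rank among elements of s."""
    return [min(perm[i] for i in s) for perm in perms]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions on which the two sets agree."""
    matches = sum(1 for u, v in zip(sig_a, sig_b) if u == v)
    return matches / len(sig_a)


if __name__ == "__main__":
    universe_size = 8
    a, b = {0, 1, 2, 3}, {2, 3, 4, 5}

    # Set overlap as an inner product on binary data.
    x, y = indicator(a, universe_size), indicator(b, universe_size)
    print("overlap:", len(a & b), "inner product:", inner_product(x, y))

    # MinHash estimate of Jaccard similarity versus the exact value.
    perms = make_permutations(universe_size, k=2000)
    est = estimate_jaccard(minhash_signature(a, perms),
                           minhash_signature(b, perms))
    print("exact Jaccard:", jaccard(a, b), "MinHash estimate:", est)
```

The explicit permutations keep the sketch short and exact for a tiny universe; practical MinHash implementations typically replace them with hash functions.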

