Hardness of Bichromatic Closest Pair with Jaccard Similarity

07/04/2019
by Rasmus Pagh, et al.

Consider collections A and B of red and blue sets, respectively. Bichromatic Closest Pair is the problem of finding a pair from A×B that has similarity higher than a given threshold according to some similarity measure. Our focus here is the classic Jaccard similarity |a∩b|/|a∪b| for (a,b)∈A×B. We consider the approximate version of the problem where we are given thresholds j_1 > j_2 and wish to return a pair from A×B that has Jaccard similarity higher than j_2 if there exists a pair in A×B with Jaccard similarity at least j_1. The classic locality sensitive hashing (LSH) algorithm of Indyk and Motwani (STOC '98), instantiated with the MinHash LSH function of Broder et al., solves this problem in Õ(n^{2-δ}) time if j_1 > j_2^{1-δ}. In particular, for δ = Ω(1), the approximation ratio j_1/j_2 = 1/j_2^{δ} increases polynomially in 1/j_2. In this paper we give a corresponding hardness result. Assuming the Orthogonal Vectors Conjecture (OVC), we show that there cannot be a general solution that solves the Bichromatic Closest Pair problem in O(n^{2-Ω(1)}) time for j_1/j_2 = 1/j_2^{o(1)}. Specifically, assuming OVC, we prove that for any δ > 0 there exists an ε > 0 such that Bichromatic Closest Pair with Jaccard similarity requires time Ω(n^{2-δ}) for any choice of thresholds j_2 < j_1 < 1-δ that satisfy j_1 < j_2^{1-ε}.
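To make the problem statement concrete, the following is a minimal Python sketch, not code from the paper. It shows the Jaccard similarity, a quadratic-time brute-force baseline for the approximate problem, and a simple MinHash signature whose collision rate estimates Jaccard similarity. All function names are hypothetical, set elements are assumed to be non-negative integers smaller than 2^61 - 1, and the linear hash family is an illustrative stand-in for a min-wise independent family. The full LSH index of Indyk and Motwani is not implemented here.

```python
import random
from itertools import product
from typing import FrozenSet, List, Optional, Sequence, Tuple

IntSet = FrozenSet[int]
_P = (1 << 61) - 1  # a Mersenne prime used as the hash modulus (assumption)


def jaccard(a: IntSet, b: IntSet) -> float:
    """Jaccard similarity |a ∩ b| / |a ∪ b|; two empty sets count as identical."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def bichromatic_pair_bruteforce(
    A: Sequence[IntSet], B: Sequence[IntSet], j1: float, j2: float
) -> Optional[Tuple[IntSet, IntSet]]:
    """Quadratic baseline: scan all of A x B and keep the most similar pair.

    If some pair has similarity >= j1 (with j1 > j2), the best pair has
    similarity > j2 and is returned; otherwise None may be returned.
    """
    assert j1 > j2
    best, best_sim = None, -1.0
    for a, b in product(A, B):
        s = jaccard(a, b)
        if s > best_sim:
            best, best_sim = (a, b), s
    return best if best_sim > j2 else None


def minhash_signature(s: IntSet, k: int, seed: int = 0) -> List[int]:
    """k MinHash values of a non-empty set under hashes x -> (a*x + b) mod _P.

    The same seed must be used for every set so that signatures are comparable.
    """
    rng = random.Random(seed)
    coeffs = [(rng.randrange(1, _P), rng.randrange(_P)) for _ in range(k)]
    return [min((a * x + b) % _P for x in s) for a, b in coeffs]


def estimate_jaccard(sig_a: List[int], sig_b: List[int]) -> float:
    """Fraction of signature positions that agree; approximates Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)
```

For example, with A = [frozenset({1, 2, 3})] and B = [frozenset({2, 3, 4})], the single pair has Jaccard similarity 2/4 = 0.5, so the baseline returns it for thresholds j_1 = 0.5 and j_2 = 0.25. The baseline takes time quadratic in the number of sets, which is the running time the hardness result says cannot be improved polynomially when j_1 and j_2 are close in the sense described above.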


