Fair Near Neighbor Search: Independent Range Sampling in High Dimensions

06/05/2019
by Martin Aumüller, et al.

Similarity search is a fundamental algorithmic primitive, widely used in many computer science disciplines. There are several variants of the similarity search problem, and one of the most relevant is the r-near neighbor (r-NN) problem: given a radius r>0 and a set of points S, construct a data structure that, for any given query point q, returns a point p within distance at most r from q, if one exists. In this paper, we study the r-NN problem through the lens of fairness. We consider fairness in the sense of equal opportunity: all points within distance r of the query should have the same probability of being returned. Locality sensitive hashing (LSH), the most common approach to similarity search in high dimensions, does not provide such a fairness guarantee. To address this, we propose efficient data structures for r-NN in which all points in S that are near q have the same probability of being selected and returned by the query. Specifically, we first propose a black-box approach that, given any LSH scheme, constructs a data structure for uniformly sampling points in the neighborhood of a query. Then, we develop a data structure for fair similarity search under inner product, which requires nearly-linear space and exploits locality sensitive filters.
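
To make the fairness issue concrete, the following is a minimal sketch, not the data structure proposed in the paper. It assumes binary vectors under Hamming distance and a textbook bit-sampling LSH; the class `BitSamplingLSH`, its methods `query_first` and `query_uniform`, and all parameter values are hypothetical names chosen for illustration. `query_first` mimics a standard LSH query that returns the first near point it encounters, while `query_uniform` is a naive baseline that gathers every distinct near point among the colliding buckets and samples one uniformly. The baseline is uniform only over the near points that collide with the query in at least one table, and it pays for scanning all colliding buckets, which is the cost an efficient structure would aim to avoid.

```python
import random
from collections import Counter, defaultdict


def hamming(a, b):
    """Number of coordinates in which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


class BitSamplingLSH:
    """Hypothetical bit-sampling LSH index for binary vectors (illustration only)."""

    def __init__(self, points, dim, k, num_tables, seed=0):
        rng = random.Random(seed)
        self.points = points
        # Each table hashes a point to its values on k randomly chosen coordinates.
        self.projections = [tuple(rng.sample(range(dim), k)) for _ in range(num_tables)]
        self.tables = []
        for proj in self.projections:
            table = defaultdict(list)
            for idx, p in enumerate(points):
                table[tuple(p[i] for i in proj)].append(idx)
            self.tables.append(table)

    def candidates(self, q):
        """Yield indices of points sharing a bucket with q (may repeat across tables)."""
        for proj, table in zip(self.projections, self.tables):
            yield from table.get(tuple(q[i] for i in proj), [])

    def query_first(self, q, r):
        """Standard-style query: return the first candidate within distance r."""
        for idx in self.candidates(q):
            if hamming(self.points[idx], q) <= r:
                return idx
        return None

    def query_uniform(self, q, r, rng):
        """Naive fair baseline: uniform over the distinct near points found."""
        near = sorted({idx for idx in self.candidates(q)
                       if hamming(self.points[idx], q) <= r})
        return rng.choice(near) if near else None


# Small demo with assumed parameters: a few near points plus random far points.
data_rng = random.Random(42)
dim, r = 32, 3
q = tuple([0] * dim)
near_points = [tuple(1 if i < j else 0 for i in range(dim)) for j in range(r + 1)]
far_points = [tuple(data_rng.randint(0, 1) for _ in range(dim)) for _ in range(50)]
points = near_points + far_points

index = BitSamplingLSH(points, dim=dim, k=6, num_tables=20, seed=1)
query_rng = random.Random(7)
first_counts = Counter(index.query_first(q, r) for _ in range(1000))
uniform_counts = Counter(index.query_uniform(q, r, query_rng) for _ in range(1000))
print("first-found:", dict(first_counts))
print("uniform:    ", dict(uniform_counts))
```

In this sketch `query_first` is deterministic, so repeated queries always return the same near point, whereas `query_uniform` spreads its answers across the near points it can reach. Points closer to the query collide in more tables and are therefore more likely to be found first, which is the kind of bias the equal-opportunity notion above rules out.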

research
01/26/2021

Sampling a Near Neighbor in High Dimensions – Who is the Fairest of Them All?

Similarity search is a fundamental algorithmic primitive, widely used in...
research
06/06/2019

Near Neighbor: Who is the Fairest of Them All?

In this work we study a fair variant of the near neighbor problem. Namel...
research
07/19/2018

Optimal Las Vegas Approximate Near Neighbors in ℓ_p

We show that approximate near neighbor search in high dimensions can be ...
research
04/08/2019

Subsets and Supermajorities: Unifying Hashing-based Set Similarity Search

We consider the problem of designing Locality Sensitive Filters (LSF) fo...
research
11/13/2020

Kernel Density Estimation through Density Constrained Near Neighbor Search

In this paper we revisit the kernel density estimation problem: given a ...
research
08/30/2018

Hashing-Based-Estimators for Kernel Density in High Dimensions

Given a set of points P⊂R^d and a kernel k, the Kernel Density Estimate ...
