Near Neighbor: Who is the Fairest of Them All?
In this work we study a fair variant of the near neighbor problem. Namely, given a set of n points P and a parameter r, the goal is to preprocess the points, such that given a query point q, any point in the r-neighborhood of the query, i.e., (q,r), have the same probability of being reported as the near neighbor. We show that LSH based algorithms can be made fair, without a significant loss in efficiency. Specifically, we show an algorithm that reports a point in the r-neighborhood of a query q with almost uniform probability. The query time is proportional to O( dns(q.r) (n,c) ), and its space is O((n,c)), where (n,c) and (n,c) are the query time and space of an LSH algorithm for c-approximate near neighbor, and dns(q,r) is a function of the local density around q. Our approach works more generally for sampling uniformly from a sub-collection of sets of a given collection and can be used in a few other applications. Finally, we run experiments to show performance of our approach on real data.
READ FULL TEXT