Confirmation Sampling for Exact Nearest Neighbor Search

12/06/2018
by Tobias Christiani, et al.

Locality-sensitive hashing (LSH), introduced by Indyk and Motwani at STOC '98, has been an extremely influential framework for nearest neighbor search in high-dimensional data sets. While theoretical work has focused on the approximate nearest neighbor problem, in practice LSH data structures with suitably chosen parameters are used to solve the exact nearest neighbor problem (with some error probability). Sublinear query time is often achievable in practice even for exact nearest neighbor search, intuitively because the nearest neighbor tends to be significantly closer than the other data points. However, theory offers little advice on how to choose LSH parameters outside of pre-specified worst-case settings. We introduce the technique of confirmation sampling for solving the exact nearest neighbor problem using LSH. First, we give a general reduction that transforms a sequence of data structures, each of which finds the nearest neighbor with a small, unknown probability, into a data structure that returns the nearest neighbor with probability 1-δ, using as few queries as possible. Second, we present a new query algorithm for the LSH Forest data structure with L trees that returns the exact nearest neighbor of a query point within the same time bound as an LSH Forest of Ω(L) trees with internal parameters specifically tuned to the query and data.
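
The reduction described above can be illustrated with a rough sketch. The abstract does not state the stopping rule, so the version below assumes one plausible reading of "confirmation": keep querying data structures until the current best candidate has been returned a fixed number of times. The names `confirmation_sampling`, `probes`, and `confirmations` are hypothetical, each probe is assumed to be a function that returns one candidate point or `None`, and the choice of `confirmations` as a function of δ is left open here.

```python
import math
from typing import Callable, Iterable, Optional, Tuple

Point = Tuple[float, ...]
# Assumption: a "data structure" is modeled as a function that maps a query
# to a single candidate point, or None if it finds nothing.
Probe = Callable[[Point], Optional[Point]]


def confirmation_sampling(
    query: Point,
    probes: Iterable[Probe],
    confirmations: int,
) -> Tuple[Optional[Point], int]:
    """Query probes in order until the best candidate is seen `confirmations` times.

    Returns the best candidate found and the number of probes queried.
    This stopping rule is an assumption made for illustration only.
    """
    best: Optional[Point] = None
    best_dist = math.inf
    count = 0  # how many probes have returned the current best candidate
    used = 0   # how many probes have been queried so far

    for probe in probes:
        used += 1
        candidate = probe(query)
        if candidate is None:
            continue
        d = math.dist(query, candidate)
        if d < best_dist:
            # A strictly closer point replaces the candidate and resets the count.
            best, best_dist, count = candidate, d, 1
        elif d == best_dist:
            # Same distance as the current best: treat it as a confirmation.
            count += 1
        if count >= confirmations:
            break

    return best, used
```

With probes that first return a far point, then nothing, then the true nearest neighbor repeatedly, the loop stops as soon as the nearest neighbor has been returned `confirmations` times. If the probes run out first, the best candidate seen so far is returned unconfirmed.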

research
06/26/2018

Approximate Nearest Neighbor Search in High Dimensions

The nearest neighbor problem is defined as follows: Given a set P of n p...
research
12/15/2022

Exact fixed-radius nearest neighbor search with an application to clustering

Fixed-radius nearest-neighbor search is a common database operation that...
research
06/11/2019

Similarity Problems in High Dimensions

The main contribution of this dissertation is the introduction of new or...
research
09/22/2017

Efficient Nearest-Neighbor Search for Dynamical Systems with Nonholonomic Constraints

Nearest-neighbor search dominates the asymptotic complexity of sampling-...
research
10/18/2019

Supervised Learning Approach to Approximate Nearest Neighbor Search

Approximate nearest neighbor search is a classic algorithmic problem whe...
research
07/19/2011

Unsupervised K-Nearest Neighbor Regression

In many scientific disciplines structures in high-dimensional data have ...
