On the I/O complexity of the k-nearest neighbor problem

by Mayank Goswami, et al.
CUNY Law School
IT University of Copenhagen

We consider static, external memory indexes for exact and approximate versions of the k-nearest neighbor (k-NN) problem, and show new lower bounds under a standard indivisibility assumption:

- Polynomial space indexing schemes for high-dimensional k-NN in Hamming space cannot take advantage of block transfers: Ω(k) block reads are needed to answer a query.
- For the ℓ_∞ metric the lower bound holds even if we allow c-approximate nearest neighbors to be returned, for c ∈ (1, 3).
- The restriction to c < 3 is necessary: for every metric there exists an indexing scheme in the indexability model of Hellerstein et al. using space O(kn), where n is the number of points and B is the block size, that can retrieve k 3-approximate nearest neighbors using k/B I/Os, which is optimal.
- For specific metrics, data structures with better approximation factors are possible. For k-NN in Hamming space and every approximation factor c > 1 there exists a polynomial space data structure that returns k c-approximate nearest neighbors in k/B I/Os.

To show these lower bounds we develop two new techniques. First, to handle the fact that approximation algorithms have more freedom in deciding which result set to return, we develop a relaxed version of the λ-set workload technique of Hellerstein et al. This technique allows us to show lower bounds that hold in d ≥ n dimensions. Second, to extend the lower bounds down to d = O(k log(n/k)) dimensions, we develop a new deterministic dimension reduction technique that may be of independent interest.
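The 3-approximation upper bound can be illustrated with a small sketch. The construction below is one natural scheme consistent with the stated bounds (space O(kn), k/B I/Os); it is an assumption for illustration and not necessarily the construction used in the paper. It also assumes that "c-approximate" means every returned point lies within c times the distance from the query to its k-th nearest neighbor. The names `build_index`, `query`, and `hamming` are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[int, ...]
Metric = Callable[[Point, Point], int]


def hamming(a: Point, b: Point) -> int:
    """Hamming distance between two equal-length tuples."""
    return sum(x != y for x, y in zip(a, b))


def build_index(points: Sequence[Point], k: int, dist: Metric) -> Dict[Point, List[Point]]:
    """Store, for every data point, its k nearest data points (itself included).

    This uses n lists of k points each, i.e. space O(kn).
    Assumes the data points are distinct.
    """
    return {p: sorted(points, key=lambda x: dist(p, x))[:k] for p in points}


def query(index: Dict[Point, List[Point]], points: Sequence[Point],
          q: Point, dist: Metric, block_size: int) -> Tuple[List[Point], int]:
    """Return the stored list of q's nearest data point and the blocks read.

    The indexability model charges only for blocks read, not for locating
    them, so the search for the nearest data point is treated as free here.
    """
    nearest = min(points, key=lambda p: dist(q, p))
    result = index[nearest]
    # The list is stored contiguously, so it occupies ceil(k / B) blocks.
    ios = math.ceil(len(result) / block_size)
    return result, ios

# Why the result is 3-approximate (triangle inequality), with r the distance
# from q to its k-th nearest neighbor and p the nearest data point to q:
#   d(q, p) <= r, so the k true neighbors of q lie within 2r of p;
#   hence the k points stored for p lie within 2r of p;
#   hence every returned point x satisfies d(q, x) <= d(q, p) + d(p, x) <= 3r.
```

The argument in the closing comment uses only the triangle inequality, which is why a scheme of this shape works in any metric space, not just Hamming space.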




Approximate nearest neighbors search without false negatives for l_2 for c>√(n)

In this paper, we report progress on answering the open problem presente...

Stronger 3SUM-Indexing Lower Bounds

The 3SUM-Indexing problem was introduced as a data structure version of ...

Approximate Nearest Neighbors in Limited Space

We consider the (1+ϵ)-approximate nearest neighbor search problem: given...

Provably Adversarially Robust Nearest Prototype Classifiers

Nearest prototype classifiers (NPCs) assign to each input point the labe...

Hardness of Approximate Nearest Neighbor Search

We prove conditional near-quadratic running time lower bounds for approx...

Learning Mahalanobis Metric Spaces via Geometric Approximation Algorithms

Learning Mahalanobis metric spaces is an important problem that has foun...

Lower bounds for text indexing with mismatches and differences

In this paper we study lower bounds for the fundamental problem of text ...
