# Chromatic k-Nearest Neighbor Queries

Let P be a set of n colored points. We develop efficient data structures that store P and can answer chromatic k-nearest neighbor (k-NN) queries. Such a query consists of a query point q and a number k, and asks for the color that appears most frequently among the k points in P closest to q. Answering such queries efficiently is the key to obtaining fast k-NN classifiers. Our main aim is to obtain query times that are independent of k while using near-linear space. We show that this is possible using a combination of two data structures. The first data structure allows us to compute a region containing exactly the k nearest neighbors of a query point q, and the second data structure can then report the most frequent color in such a region. This leads to linear-space data structures with query times of O(n^{1/2} log n) for points in ℝ^1, and with query times varying between O(n^{2/3} log^{2/3} n) and O(n^{5/6} polylog n), depending on the distance measure used, for points in ℝ^2. Since these query times are still fairly large, we also consider approximations. If we are allowed to report a color that appears at least (1-ε)f^* times, where f^* is the frequency of the most frequent color, we obtain a query time of O(log n + log log_{1/(1-ε)} n) in ℝ^1 and expected query times ranging between Õ(n^{1/2} ε^{-3/2}) and Õ(n^{1/2} ε^{-5/2}) in ℝ^2 using near-linear space (ignoring polylogarithmic factors).
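To make the query definition concrete, here is a minimal brute-force sketch in Python. It is not one of the data structures described above: it scans all n points on every query and only illustrates what a chromatic k-NN query returns and what counts as an acceptable approximate answer. The function names, the `(coords, color)` input layout, and the use of Euclidean distance are assumptions made for this illustration.

```python
from collections import Counter
import heapq
import math


def chromatic_knn(points, q, k):
    """Brute-force baseline (illustrative only, not the paper's structure).

    points: list of (coords, color) pairs, coords being a tuple of floats.
    q:      query point, a tuple of the same dimension.
    Returns (color, frequency) for a most frequent color among the k points
    closest to q under Euclidean distance (an assumed distance measure).
    """
    # Select the k nearest points; this costs O(n log k) time per query.
    nearest = heapq.nsmallest(k, points, key=lambda pc: math.dist(pc[0], q))
    # Count colors among those k points and take a most frequent one.
    counts = Counter(color for _, color in nearest)
    return counts.most_common(1)[0]


def is_valid_approximation(points, q, k, color, eps):
    """Check the approximate criterion: the reported color must appear
    at least (1 - eps) * f_star times among the k nearest neighbors."""
    nearest = heapq.nsmallest(k, points, key=lambda pc: math.dist(pc[0], q))
    counts = Counter(c for _, c in nearest)
    f_star = max(counts.values())
    return counts[color] >= (1 - eps) * f_star


# Hypothetical sample data in R^1.
sample = [((0.0,), "red"), ((1.0,), "blue"), ((2.0,), "blue"),
          ((10.0,), "red"), ((11.0,), "red")]
```

With this sample, `chromatic_knn(sample, (1.5,), 3)` returns `("blue", 2)`, because the three nearest points are at 1.0, 2.0 and 0.0. The running time of this baseline grows with both n and k, which is the dependence the data structures above are designed to avoid.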
