A Theory-Based Evaluation of Nearest Neighbor Models Put Into Practice

by Hendrik Fichtenberger, et al.

In the k-nearest neighbor model (k-NN), we are given a set of points P, and we answer each query q by returning the k nearest neighbors of q in P according to some metric. This concept is crucial in many areas of data analysis and data processing, e.g., computer vision, document retrieval and machine learning. Many k-NN algorithms have been published and implemented, but often the relation between their parameters and the accuracy of the computed k-NN is not explicit. We study property testing of k-NN graphs in theory and evaluate it empirically: given a point set P ⊂ R^δ and a directed graph G = (P, E), is G a k-NN graph, i.e., does every point p ∈ P have outgoing edges to its k nearest neighbors, or is it ϵ-far from being a k-NN graph? Here, ϵ-far means that one has to change more than an ϵ-fraction of the edges in order to make G a k-NN graph. We develop a randomized algorithm with one-sided error that decides this question, i.e., a property tester for the k-NN property, with complexity O(√n · k^2 / ϵ^2) measured in terms of the number of vertices and edges it inspects, and we prove a lower bound of Ω(√(n / (ϵ k))). We evaluate our tester empirically on the k-NN models computed by various algorithms and show that it can be used to detect k-NN models with bad accuracy in significantly less time than the building time of the k-NN model.
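
To make the testing question concrete, the following is a minimal sketch of a naive sampling check with one-sided error. It is not the tester developed in the paper: it verifies each sampled vertex by brute force over all points, so it does not achieve the O(√n · k^2 / ϵ^2) bound. The function names, the sample size of ⌈2/ϵ⌉, and the assumption that all pairwise distances are distinct (so the k-NN set of each point is unique) are illustrative choices, not taken from the source.

```python
import math
import random


def knn_indices(points, i, k):
    """Brute-force k nearest neighbors of points[i], excluding i itself.

    Assumes pairwise distances are distinct, so the result is unique.
    """
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return set(others[:k])


def sample_check_knn_graph(points, out_edges, k, eps, rng=None):
    """Naive one-sided check (hypothetical helper, not the paper's algorithm).

    points    -- list of coordinate tuples
    out_edges -- dict mapping a vertex index to its list of out-neighbors
    Returns False only if a sampled vertex is found whose out-neighbors
    differ from its true k nearest neighbors; a correct k-NN graph is
    therefore always accepted.
    """
    rng = rng or random.Random(0)
    n = len(points)
    # Illustrative sample size; no formal guarantee is claimed for it.
    samples = min(n, math.ceil(2 / eps))
    for i in rng.sample(range(n), samples):
        if set(out_edges[i]) != knn_indices(points, i, k):
            return False  # witness found: reject
    return True
```

Because the check rejects only when it finds a violating vertex, a graph that really is a k-NN graph is never rejected, which is the one-sided error property named in the abstract. A graph with few violating vertices may still be accepted if the sample misses all of them.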




