A Theory-Based Evaluation of Nearest Neighbor Models Put Into Practice

10/11/2018
by Hendrik Fichtenberger, et al.

In the k-nearest neighborhood model (k-NN), we are given a set of points P, and we shall answer queries q by returning the k nearest neighbors of q in P according to some metric. This concept is crucial in many areas of data analysis and data processing, e.g., computer vision, document retrieval and machine learning. Many k-NN algorithms have been published and implemented, but often the relation between parameters and accuracy of the computed k-NN is not explicit. We study property testing of k-NN graphs in theory and evaluate it empirically: given a point set P ⊂ R^δ and a directed graph G=(P,E), is G a k-NN graph, i.e., every point p ∈ P has outgoing edges to its k nearest neighbors, or is it ϵ-far from being a k-NN graph? Here, ϵ-far means that one has to change more than an ϵ-fraction of the edges in order to make G a k-NN graph. We develop a randomized algorithm with one-sided error that decides this question, i.e., a property tester for the k-NN property, with complexity O(√(n) k^2 / ϵ^2) measured in terms of the number of vertices and edges it inspects, and we prove a lower bound of Ω(√(n / ϵ k)). We evaluate our tester empirically on the k-NN models computed by various algorithms and show that it can be used to detect k-NN models with bad accuracy in significantly less time than the building time of the k-NN model.
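To make the notion of a one-sided-error test concrete, here is a minimal Python sketch of a naive sampling check. It is not the tester from the paper: it verifies each sampled vertex by brute force in O(n) time, so it does not reach the stated O(√(n) k^2 / ϵ^2) bound. All function names (`knn_brute_force`, `vertex_is_consistent`, `naive_sampling_tester`) and the choice of Euclidean distance are assumptions made for illustration.

```python
import math
import random


def knn_brute_force(points, k):
    """Exact k-NN graph: maps each index to the set of its k nearest other indices."""
    graph = {}
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        graph[i] = set(others[:k])
    return graph


def vertex_is_consistent(points, graph, i, k):
    """True if the out-edges of vertex i form a valid set of k nearest neighbors.

    Ties are tolerated: any k distinct neighbors no farther away than the
    k-th smallest distance from i are accepted.
    """
    out = graph[i]
    if len(out) != k or i in out:
        return False
    dists = sorted(
        math.dist(points[i], points[j]) for j in range(len(points)) if j != i
    )
    kth = dists[k - 1]
    return all(math.dist(points[i], points[j]) <= kth for j in out)


def naive_sampling_tester(points, graph, k, samples, rng):
    """Accept (True) unless some uniformly sampled vertex violates the k-NN property.

    One-sided error: a correct k-NN graph is never rejected, because a
    rejection always comes with a concrete violating vertex.
    """
    for _ in range(samples):
        i = rng.randrange(len(points))
        if not vertex_is_consistent(points, graph, i, k):
            return False
    return True


if __name__ == "__main__":
    rng = random.Random(0)
    pts = [(rng.random(), rng.random()) for _ in range(200)]
    g = knn_brute_force(pts, k=5)
    print(naive_sampling_tester(pts, g, k=5, samples=20, rng=rng))
```

If a graph has many inconsistent vertices, each sample hits one with probability proportional to their share, so a number of samples on the order of 1/ϵ is a natural starting point for this naive variant. The paper's tester obtains its sublinear bound by other means that this sketch does not attempt to reproduce.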


