Efficient Distributed Algorithms for the K-Nearest Neighbors Problem

05/15/2020
by   Reza Fathi, et al.
0

The K-nearest neighbors is a basic problem in machine learning with numerous applications. In this problem, given a (training) set of n data points with labels and a query point p, we want to assign a label to p based on the labels of the K-nearest points to the query. We study this problem in the k-machine model,[%s] a model for distributed large-scale data. In this model, we assume that the n points are distributed (in a balanced fashion) among the k machines and the goal is to quickly compute answer given a query point to a machine. Our main result is a simple randomized algorithm in the k-machine model that runs in O(log K) communication rounds with high probability success (regardless of the number of machines k and the number of points n). The message complexity of the algorithm is small taking only O(klog K) messages. Our bounds are essentially the best possible for comparison-based algorithms.[%s] due to the existence of a lower bound of Ω(log n) communication rounds for finding the median of 2n elements distributed evenly among two processors by Rodeh <cit.>. We also implemented our algorithm and show that it performs well compared to an algorithm (used in practice) that sends K nearest points from each machine to a single machine which then computes the answer.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
01/22/2018

Scalable Secure Computation of Statistical Functions with Applications to k-Nearest Neighbors

Given a set S of n d-dimensional points, the k-nearest neighbors (KNN) i...
research
01/09/2018

Tight Bounds on the Round Complexity of the Distributed Maximum Coverage Problem

We study the maximum k-set coverage problem in the following distributed...
research
06/30/2018

Approximate Nearest Neighbors in Limited Space

We consider the (1+ϵ)-approximate nearest neighbor search problem: given...
research
08/06/2020

Singularly Optimal Randomized Leader Election

This paper concerns designing distributed algorithms that are singularly...
research
01/16/2023

A Distributed Palette Sparsification Theorem

Is fully decentralized graph streaming possible? We consider this questi...
research
07/14/2022

A Query-Optimal Algorithm for Finding Counterfactuals

We design an algorithm for finding counterfactuals with strong theoretic...
research
11/13/2017

Randomized Near Neighbor Graphs, Giant Components, and Applications in Data Science

If we pick n random points uniformly in [0,1]^d and connect each point t...

Please sign up or login with your details

Forgot password? Click here to reset