Distributed Nearest Neighbor Classification

by Jiexin Duan, et al.

Nearest neighbor is a popular nonparametric method for classification and regression with many appealing properties. In the big data era, the sheer volume and the spatial/temporal disparity of the data may make it infeasible to process and store the data centrally. This poses a considerable hurdle for nearest neighbor prediction, since the entire training set must be kept in memory. One effective way to overcome this issue is the distributed learning framework. Through majority voting, the distributed nearest neighbor classifier achieves the same rate of convergence as its oracle version in terms of both regret and instability, up to a multiplicative constant that depends solely on the data dimension. This multiplicative difference can be eliminated by replacing majority voting with a weighted voting scheme. In addition, we provide sharp theoretical upper bounds on the number of subsamples for which the distributed nearest neighbor classifier still reaches the optimal convergence rate. Interestingly, the weighted voting scheme allows a larger number of subsamples than majority voting does. Our findings are supported by numerical studies using both simulated and real data sets.
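The distributed scheme described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: labels are binary in {0, 1}, distances are Euclidean, the data are split into subsamples by a random partition, and the function names (`local_knn_fraction`, `distributed_knn_predict`) and the `vote="average"` option are hypothetical. The "average" option, which averages the local neighbor fractions before thresholding, is only one possible reading of the weighted voting scheme mentioned in the abstract.

```python
import numpy as np


def local_knn_fraction(X_train, y_train, x, k):
    """Fraction of class-1 labels among the k nearest neighbors of x.

    Assumes binary labels in {0, 1} and Euclidean distance.
    """
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    return y_train[nearest].mean()


def distributed_knn_predict(X, y, x, n_subsamples, k, vote="majority", seed=0):
    """Classify x by combining k-NN results from disjoint subsamples.

    vote="majority": each subsample casts a 0/1 vote, and the majority wins.
    vote="average":  local class-1 fractions are averaged, then thresholded
                     (an assumed stand-in for a weighted voting scheme).
    """
    rng = np.random.default_rng(seed)
    # Randomly partition the training indices into n_subsamples parts.
    parts = np.array_split(rng.permutation(len(X)), n_subsamples)
    fractions = np.array(
        [local_knn_fraction(X[idx], y[idx], x, k) for idx in parts]
    )
    if vote == "majority":
        local_votes = (fractions >= 0.5).astype(int)
        return int(local_votes.mean() >= 0.5)
    if vote == "average":
        return int(fractions.mean() >= 0.5)
    raise ValueError(f"unknown vote type: {vote!r}")
```

In a real deployment each subsample would live on a separate machine and only the local vote or local fraction would be communicated; here the partition is simulated in a single process for clarity.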

