Distributed Nearest Neighbor Classification

12/12/2018
by   Jiexin Duan, et al.
0

Nearest neighbor is a popular nonparametric method for classification and regression with many appealing properties. In the big data era, the sheer volume and spatial/temporal disparity of big data may prohibit centrally processing and storing the data. This has imposed considerable hurdle for nearest neighbor predictions since the entire training data must be memorized. One effective way to overcome this issue is the distributed learning framework. Through majority voting, the distributed nearest neighbor classifier achieves the same rate of convergence as its oracle version in terms of both the regret and instability, up to a multiplicative constant that depends solely on the data dimension. The multiplicative difference can be eliminated by replacing majority voting with the weighted voting scheme. In addition, we provide sharp theoretical upper bounds of the number of subsamples in order for the distributed nearest neighbor classifier to reach the optimal convergence rate. It is interesting to note that the weighted voting scheme allows a larger number of subsamples than the majority voting one. Our findings are supported by numerical studies using both simulated and real data sets.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
09/03/2019

Rates of Convergence for Large-scale Nearest Neighbor Classification

Nearest neighbor is a popular class of classification methods with many ...
research
02/14/2013

A Latent Source Model for Nonparametric Time Series Classification

For classifying time series, a nearest-neighbor approach is widely used ...
research
02/26/2022

Enhanced Nearest Neighbor Classification for Crowdsourcing

In machine learning, crowdsourcing is an economical way to label a large...
research
10/29/2008

Choice of neighbor order in nearest-neighbor classification

The kth-nearest neighbor rule is arguably the simplest and most intuitiv...
research
11/22/2017

Causal nearest neighbor rules for optimal treatment regimes

The estimation of optimal treatment regimes is of considerable interest ...
research
08/16/2023

Two Phases of Scaling Laws for Nearest Neighbor Classifiers

A scaling law refers to the observation that the test performance of a m...
research
10/11/2016

Visual Place Recognition with Probabilistic Vertex Voting

We propose a novel scoring concept for visual place recognition based on...

Please sign up or login with your details

Forgot password? Click here to reset