Rates of Convergence for Large-scale Nearest Neighbor Classification

09/03/2019
by Xingye Qiao, et al.

Nearest neighbor is a popular class of classification methods with many desirable properties. For a large data set that cannot be loaded into the memory of a single machine because of computation, communication, privacy, or ownership limitations, we consider a divide-and-conquer scheme: the entire data set is divided into small subsamples, nearest neighbor predictions are made on each subsample, and a final decision is reached by aggregating the subsample predictions through majority voting. We name this method the big Nearest Neighbor (bigNN) classifier and provide its rates of convergence under minimal assumptions, in terms of both the excess risk and the classification instability. These rates are proven to be the same as those of the oracle nearest neighbor classifier and cannot be improved. To significantly reduce the prediction time required to achieve the optimal rate, we also consider the pre-training acceleration technique applied to the bigNN method, with a proven convergence rate. We find that in the distributed setting, the optimal number of neighbors k should scale with both the total sample size and the number of partitions, and that there is a theoretical upper limit for the latter. Numerical studies verify the theoretical findings.
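
The divide-and-conquer scheme described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes binary labels coded as 0/1, Euclidean distance, a random equal-size partition, and odd values of `k` and `n_subsamples` so that no vote ties occur. The function names `knn_predict` and `bignn_predict` and all parameter names are hypothetical, and the pre-training acceleration step is not shown.

```python
import numpy as np


def knn_predict(X_train, y_train, X_query, k):
    """Plain k-NN majority vote on one (sub)sample; labels assumed to be 0/1."""
    # Squared Euclidean distances, shape (n_query, n_train).
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    # Indices of the k closest training points for each query point.
    nn_idx = np.argsort(d2, axis=1)[:, :k]
    # Majority vote among the k neighbors (k assumed odd, so no ties).
    return (y_train[nn_idx].mean(axis=1) > 0.5).astype(int)


def bignn_predict(X, y, X_query, n_subsamples, k, seed=0):
    """Sketch of the aggregation: split, predict locally, then majority-vote."""
    rng = np.random.default_rng(seed)
    # Randomly divide the full data set into n_subsamples disjoint parts.
    parts = np.array_split(rng.permutation(len(X)), n_subsamples)
    # One k-NN prediction per subsample, stacked to shape (n_subsamples, n_query).
    votes = np.stack([knn_predict(X[p], y[p], X_query, k) for p in parts])
    # Final decision: majority vote across subsamples (n_subsamples assumed odd).
    return (votes.mean(axis=0) > 0.5).astype(int)
```

In a truly distributed setting each call to `knn_predict` would run on the machine holding that subsample, and only the 0/1 votes would be communicated. The sketch takes `k` as an input and does not encode the paper's scaling rule for it.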

research
12/12/2018

Distributed Nearest Neighbor Classification

Nearest neighbor is a popular nonparametric method for classification an...
research
05/26/2014

Stabilized Nearest Neighbor Classifier and Its Statistical Properties

The stability of statistical analysis is an important indicator for repr...
research
02/26/2022

Enhanced Nearest Neighbor Classification for Crowdsourcing

In machine learning, crowdsourcing is an economical way to label a large...
research
05/20/2021

Distributed Adaptive Nearest Neighbor Classifier: Algorithm and Theory

When data is of an extraordinarily large size or physically stored in di...
research
02/05/2022

One-Nearest-Neighbor Search is All You Need for Minimax Optimal Regression and Classification

Recently, Qiao, Duan, and Cheng (2019) proposed a distributed nearest-ne...
research
05/29/2019

An adaptive nearest neighbor rule for classification

We introduce a variant of the k-nearest neighbor classifier in which k i...
research
12/03/2020

Approximate kNN Classification for Biomedical Data

We are in the era where the Big Data analytics has changed the way of in...
