K-nearest Neighbor Search by Random Projection Forests

12/31/2018
by   Donghui Yan, et al.

K-nearest neighbor (kNN) search is widely used in data mining, machine learning, statistics, and many applied domains. Inspired by the success of ensemble methods and the flexibility of tree-based methodology, we propose random projection forests (rpForests) for kNN search. rpForests finds kNNs by aggregating results from an ensemble of random projection trees, each constructed recursively through a series of carefully chosen random projections. rpForests achieves remarkable accuracy, measured by how quickly both the missing rate of kNNs and the discrepancy in the kNN distances decay, and it has very low computational complexity. Because rpForests is an ensemble, it is easy to run in parallel on multicore or clustered computers; the running time is expected to be nearly inversely proportional to the number of cores or machines. We give theoretical insight by showing that the probability that neighboring points are separated by the ensemble of random projection trees decays exponentially as the ensemble size increases. Our theory can be used to refine the choice of random projections during tree growth, and experiments show that the effect is remarkable.
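
The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`build_tree`, `build_forest`, `find_leaf`, `knn_query`), the Gaussian projection directions, the median split rule, and the `leaf_size` parameter are all assumptions made for illustration, and the paper's own rule for choosing projections and split points may differ. Each tree recursively splits the data along a random direction; a query is routed to one leaf per tree, and the union of those leaves is searched exactly.

```python
import numpy as np

def build_tree(X, idx, leaf_size, rng):
    """Recursively split the points in idx along random directions.

    A leaf is an array of point indices; an internal node is a tuple
    (direction, split, left_subtree, right_subtree).
    """
    if len(idx) <= leaf_size:
        return idx
    # Assumption: Gaussian direction and median split (illustrative choices).
    direction = rng.standard_normal(X.shape[1])
    proj = X[idx] @ direction
    order = np.argsort(proj)
    mid = len(idx) // 2
    split = 0.5 * (proj[order[mid - 1]] + proj[order[mid]])
    left = build_tree(X, idx[order[:mid]], leaf_size, rng)
    right = build_tree(X, idx[order[mid:]], leaf_size, rng)
    return (direction, split, left, right)

def build_forest(X, n_trees, leaf_size, seed=0):
    """Grow n_trees independent random projection trees over all rows of X."""
    rng = np.random.default_rng(seed)
    all_idx = np.arange(X.shape[0])
    return [build_tree(X, all_idx, leaf_size, rng) for _ in range(n_trees)]

def find_leaf(tree, x):
    """Route a query point down one tree and return the indices in its leaf."""
    while isinstance(tree, tuple):
        direction, split, left, right = tree
        tree = left if x @ direction < split else right
    return tree

def knn_query(X, forest, x, k):
    """Approximate kNN: exact search over the union of the query's leaves."""
    cand = np.unique(np.concatenate([find_leaf(t, x) for t in forest]))
    dist = np.linalg.norm(X[cand] - x, axis=1)
    order = np.argsort(dist)[:k]
    return cand[order], dist[order]

# Small usage example on synthetic data (sizes chosen arbitrarily).
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 10))
forest = build_forest(X, n_trees=10, leaf_size=20, seed=1)
query = X[0]
neighbors, dists = knn_query(X, forest, query, k=5)
```

The trees are built independently of one another, which is what makes the ensemble straightforward to distribute across cores or machines. As a simplified reading of the exponential-decay statement: if each tree independently separated a given pair of neighbors with probability p < 1, all T trees would separate it with probability p^T. The independence assumption here is only for intuition and is not taken from the paper.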


research
02/25/2023

The Effect of Points Dispersion on the k-nn Search in Random Projection Forests

Partitioning trees are efficient data structures for k-nearest neighbor ...
research
08/28/2019

Similarity Kernel and Clustering via Random Projection Forests

Similarity plays a fundamental role in many areas, including data mining...
research
09/22/2017

Efficient Nearest-Neighbor Search for Dynamical Systems with Nonholonomic Constraints

Nearest-neighbor search dominates the asymptotic complexity of sampling-...
research
10/18/2019

Supervised Learning Approach to Approximate Nearest Neighbor Search

Approximate nearest neighbor search is a classic algorithmic problem whe...
research
04/14/2014

Random forests with random projections of the output space for high dimensional multi-label classification

We adapt the idea of random projections applied to the output space, so ...
research
06/25/2020

Fast, Accurate, and Simple Models for Tabular Data via Augmented Distillation

Automated machine learning (AutoML) can produce complex model ensembles ...
research
05/09/2012

Which Spatial Partition Trees are Adaptive to Intrinsic Dimension?

Recent theory work has found that a special type of spatial partition tr...
