Randomized Near Neighbor Graphs, Giant Components, and Applications in Data Science

11/13/2017
by George C. Linderman, et al.

If we pick n random points uniformly in [0,1]^d and connect each point to its k-nearest neighbors, then it is well known that there exists a giant connected component with high probability. We prove that in [0,1]^d it suffices to connect every point to c_{d,1} log log n points chosen randomly among its c_{d,2} log n nearest neighbors to ensure a giant component of size n - o(n) with high probability. This construction yields a much sparser random graph, with ∼ n log log n instead of ∼ n log n edges, that has comparable connectivity properties. This result has nontrivial implications for problems in data science where an affinity matrix is constructed: instead of picking the k-nearest neighbors, one can often pick k' ≪ k random points out of the k-nearest neighbors without sacrificing efficiency. This can massively simplify and accelerate computation; we illustrate this with several numerical examples.
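
The construction can be tried out on a small scale. The following is a minimal sketch, not the authors' code: it samples uniform points in [0,1]^d, connects each point to roughly c1 log log n neighbors drawn at random from its roughly c2 log n nearest neighbors, and reports the size of the largest connected component. The function names, the constants c1 = c2 = 2, the brute-force neighbor search, and the union-find component count are all assumptions made for illustration; the abstract does not give values for the dimension-dependent constants.

```python
import math
import random


def random_near_neighbor_graph(points, k_near, k_pick, rng):
    """Connect each point to k_pick random choices among its k_near nearest neighbors."""
    n = len(points)
    edges = set()
    for i in range(n):
        # Brute-force neighbor search (O(n^2 log n) overall); fine for a small demo.
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(points[i], points[j]))
        candidates = order[:k_near]
        for j in rng.sample(candidates, min(k_pick, len(candidates))):
            edges.add((min(i, j), max(i, j)))  # store as an undirected edge
    return edges


def largest_component_size(n, edges):
    """Size of the largest connected component, computed with union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    sizes = {}
    for v in range(n):
        root = find(v)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values())


def demo(n=400, d=2, c1=2.0, c2=2.0, seed=0):
    """c1 and c2 are hypothetical stand-ins for the constants in the theorem."""
    rng = random.Random(seed)
    points = [tuple(rng.random() for _ in range(d)) for _ in range(n)]
    k_near = max(1, math.ceil(c2 * math.log(n)))
    k_pick = min(k_near, max(1, math.ceil(c1 * math.log(math.log(n)))))
    edges = random_near_neighbor_graph(points, k_near, k_pick, rng)
    return k_near, k_pick, len(edges), largest_component_size(n, edges)


if __name__ == "__main__":
    k_near, k_pick, n_edges, giant = demo()
    print(f"pool={k_near}, picked={k_pick}, edges={n_edges}, largest component={giant}")
```

Setting `k_pick` equal to `k_near` recovers the ordinary k-nearest neighbor graph, so the two constructions can be compared on the same point set by changing a single argument.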

