Randomized Near Neighbor Graphs, Giant Components, and Applications in Data Science
If we pick n random points uniformly in [0,1]^d and connect each point to its k-nearest neighbors, then it is well known that there exists a giant connected component with high probability. We prove that in [0,1]^d it suffices to connect every point to c_d,1n points chosen randomly among its c_d,2n-nearest neighbors to ensure a giant component of size n - o(n) with high probability. This construction yields a much sparser random graph with ∼ n n instead of ∼ n n edges that has comparable connectivity properties. This result has nontrivial implications for problems in data science where an affinity matrix is constructed: instead of picking the k-nearest neighbors, one can often pick k' ≪ k random points out of the k-nearest neighbors without sacrificing efficiency. This can massively simplify and accelerate computation, we illustrate this with several numerical examples.
READ FULL TEXT