Balancing Geometry and Density: Path Distances on High-Dimensional Data

12/17/2020
by Anna Little, et al.

New geometric and computational analyses of power-weighted shortest-path distances (PWSPDs) are presented. By illuminating the way these metrics balance density and geometry in the underlying data, we clarify their key parameters and discuss how they may be chosen in practice. Comparisons are made with related data-driven metrics, which illustrate the broader role of density in kernel-based unsupervised and semi-supervised machine learning. Computationally, we relate PWSPDs on complete weighted graphs to their analogues on weighted nearest-neighbor graphs, providing near-optimal high-probability guarantees on their equivalence. Connections with percolation theory are developed to establish estimates on the bias and variance of PWSPDs in the finite sample setting. The theoretical results are bolstered by illustrative experiments, demonstrating the versatility of PWSPDs for a wide range of data settings. Throughout the paper, our results require only that the underlying data is sampled from a low-dimensional manifold, and depend crucially on the intrinsic dimension of this manifold, rather than its ambient dimension.
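As a rough illustration of the computational claim (not the authors' code), the sketch below computes PWSPDs from one source point under one common convention: for p ≥ 1, ℓ_p(x, y) is the minimum over graph paths x = x_{i_0}, x_{i_1}, ..., x_{i_T} = y of (Σ_j ‖x_{i_j} − x_{i_{j+1}}‖^p)^{1/p}. Because t ↦ t^{1/p} is increasing, it suffices to run Dijkstra with edge weights ‖x_i − x_j‖^p and take the 1/p power at the end. The function name `pwspd_from_source`, the symmetrized kNN construction, and the toy uniform sample are assumptions made for illustration. Comparing the complete-graph and kNN-graph outputs lets one check empirically whether the two coincide for a given k.

```python
import heapq
import math
import random


def pwspd_from_source(points, source, p, k=None):
    """Hypothetical helper: PWSPD from points[source] to every point.

    Edge weight between x_i and x_j is ||x_i - x_j||**p. If k is None the
    graph is complete; otherwise each point is joined to its k nearest
    neighbors, symmetrized (an edge exists if either endpoint lists the other).
    Returns the (1/p)-th power of the minimal path cost (inf if unreachable).
    """
    n = len(points)
    if k is None:
        neighbors = [[j for j in range(n) if j != i] for i in range(n)]
    else:
        neighbors = [set() for _ in range(n)]
        for i in range(n):
            # Indices of the k nearest other points to x_i (Euclidean).
            order = sorted((j for j in range(n) if j != i),
                           key=lambda j: math.dist(points[i], points[j]))
            for j in order[:k]:
                neighbors[i].add(j)
                neighbors[j].add(i)  # symmetrize the kNN graph

    # Dijkstra on the power-weighted edge costs.
    cost = [math.inf] * n
    cost[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        c, i = heapq.heappop(heap)
        if c > cost[i]:
            continue  # stale heap entry
        for j in neighbors[i]:
            new = c + math.dist(points[i], points[j]) ** p
            if new < cost[j]:
                cost[j] = new
                heapq.heappush(heap, (new, j))
    return [c ** (1.0 / p) for c in cost]


if __name__ == "__main__":
    random.seed(0)
    pts = [(random.random(), random.random()) for _ in range(200)]
    full = pwspd_from_source(pts, 0, p=2)        # complete graph
    knn = pwspd_from_source(pts, 0, p=2, k=10)   # kNN graph
    gap = max(abs(a - b) for a, b in zip(full, knn))
    print(f"max |complete - kNN| = {gap:.2e}")
```

Since the kNN graph is a subgraph of the complete graph, its distances can only be equal or larger. With p = 1 the triangle inequality makes the direct edge optimal, so ℓ_1 reduces to Euclidean distance; larger p penalizes long hops and favors paths through dense regions, which is the density/geometry trade-off the abstract refers to.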
