Robust Inference of Manifold Density and Geometry by Doubly Stochastic Scaling

09/16/2022
by Boris Landa, et al.

The Gaussian kernel and its traditional normalizations (e.g., row-stochastic) are popular approaches for assessing similarities between data points, commonly used for manifold learning and clustering, as well as supervised and semi-supervised learning on graphs. In many practical situations, the data can be corrupted by noise that prohibits traditional affinity matrices from correctly assessing similarities, especially if the noise magnitudes vary considerably across the data, e.g., under heteroskedasticity or outliers. An alternative approach that provides a more stable behavior under noise is the doubly stochastic normalization of the Gaussian kernel. In this work, we investigate this normalization in a setting where points are sampled from an unknown density on a low-dimensional manifold embedded in high-dimensional space and corrupted by possibly strong, non-identically distributed, sub-Gaussian noise. We establish the pointwise concentration of the doubly stochastic affinity matrix and its scaling factors around certain population forms. We then utilize these results to develop several tools for robust inference. First, we derive a robust density estimator that can substantially outperform the standard kernel density estimator under high-dimensional noise. Second, we provide estimators for the pointwise noise magnitudes, the pointwise signal magnitudes, and the pairwise Euclidean distances between clean data points. Lastly, we derive robust graph Laplacian normalizations that approximate popular manifold Laplacians, including the Laplace-Beltrami operator, showing that the local geometry of the manifold can be recovered under high-dimensional noise. We exemplify our results in simulations and on real single-cell RNA-sequencing data. In the latter, we show that our proposed normalizations are robust to technical variability associated with different cell types.
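The key construction here is the doubly stochastic normalization W = D K D of the Gaussian kernel, where K_ij = exp(-||x_i - x_j||^2 / (2 eps)) and the diagonal scaling D = diag(d) is chosen so that every row and column of W sums to one. As a minimal illustrative sketch (not the authors' code), the snippet below computes this scaling with a symmetric Sinkhorn-type fixed-point iteration; the function name, the damping scheme, and the bandwidth choice eps are assumptions made for illustration.

```python
import numpy as np

def doubly_stochastic_affinity(X, eps, n_iter=1000, tol=1e-10):
    """Doubly stochastic scaling of the Gaussian kernel (symmetric Sinkhorn).

    Illustrative sketch: finds positive scaling factors d such that
    W = diag(d) @ K @ diag(d) has all row and column sums equal to 1.
    """
    # Pairwise squared Euclidean distances between the rows of X.
    sq = (X * X).sum(axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    K = np.exp(-dist2 / (2.0 * eps))  # Gaussian kernel with bandwidth eps
    d = np.ones(X.shape[0])
    for _ in range(n_iter):
        # Damped fixed-point step: at the solution d = 1 / (K @ d),
        # so the geometric mean sqrt(d / (K @ d)) leaves d unchanged.
        d_new = np.sqrt(d / (K @ d))
        if np.max(np.abs(d_new - d)) < tol:
            d = d_new
            break
        d = d_new
    W = d[:, None] * K * d[None, :]  # doubly stochastic affinity matrix
    return W, d

# Usage on toy data: row sums of W should be approximately 1.
X = np.random.default_rng(0).normal(size=(200, 5))
W, d = doubly_stochastic_affinity(X, eps=1.0)
print(W.sum(axis=1)[:3])  # ~ [1. 1. 1.]
```

Per the abstract, the scaling factors d and the affinity W concentrate around certain population forms even under strong, non-identically distributed noise; the paper's robust density, distance, and Laplacian estimators are built from these quantities.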


Related research

05/31/2020 · Doubly-Stochastic Normalization of the Gaussian Kernel is Robust to Heteroskedastic Noise
A fundamental step in many data-analysis techniques is the construction ...

06/22/2022 · Bi-stochastically normalized graph Laplacian: convergence to manifold Laplacian and robustness to outlier noise
Bi-stochastic normalization of kernelized graph affinity matrix provides...

04/16/2022 · Detection of Small Holes by the Scale-Invariant Robust Density-Aware Distance (RDAD) Filtration
A novel topological-data-analytical (TDA) method is proposed to distingu...

11/22/2021 · How do kernel-based sensor fusion algorithms behave under high dimensional noise?
We study the behavior of two kernel based sensor fusion algorithms, nonp...

01/25/2021 · Eigen-convergence of Gaussian kernelized graph Laplacian by manifold heat interpolation
This work studies the spectral convergence of graph Laplacian to the Lap...

06/22/2014 · Convex Optimization Learning of Faithful Euclidean Distance Representations in Nonlinear Dimensionality Reduction
Classical multidimensional scaling only works well when the noisy distan...

06/24/2019 · Improving Stochastic Neighbour Embedding fundamentally with a well-defined data-dependent kernel
We identify a fundamental issue in the popular Stochastic Neighbour Embe...
