Doubly-Stochastic Normalization of the Gaussian Kernel is Robust to Heteroskedastic Noise

05/31/2020
by Boris Landa, et al.

A fundamental step in many data-analysis techniques is the construction of an affinity matrix describing similarities between data points. When the data points reside in Euclidean space, a widespread approach is to form an affinity matrix by applying the Gaussian kernel to pairwise distances, and to follow it with a normalization step (e.g., the row-stochastic normalization or its symmetric variant). We demonstrate that the doubly-stochastic normalization of the Gaussian kernel with zero main diagonal (i.e., no self-loops) is robust to heteroskedastic noise. That is, the doubly-stochastic normalization is advantageous in that it automatically accounts for observations with different noise variances. Specifically, we prove that in a suitable high-dimensional setting where heteroskedastic noise does not concentrate too much in any particular direction in space, the resulting (doubly-stochastic) noisy affinity matrix converges to its clean counterpart with rate m^{-1/2}, where m is the ambient dimension. We demonstrate this result numerically, and show that, in contrast, the popular row-stochastic and symmetric normalizations behave unfavorably under heteroskedastic noise. Furthermore, we provide a prototypical example of simulated single-cell RNA sequencing data with strong intrinsic heteroskedasticity, where the advantage of the doubly-stochastic normalization for exploratory analysis is evident.
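The normalization discussed above can be illustrated with a short sketch: build the Gaussian kernel on pairwise distances, zero out the main diagonal, and then find a positive scaling vector d such that diag(d) K diag(d) has unit row and column sums, using symmetric Sinkhorn-type iterations. The function name, bandwidth parameter `eps`, and the geometric-mean update are illustrative choices, not the paper's exact procedure.

```python
import numpy as np

def doubly_stochastic_affinity(X, eps=1.0, n_iter=1000, tol=1e-8):
    """Doubly-stochastic normalization of the Gaussian kernel, zero diagonal.

    Returns W = diag(d) K diag(d), where K is the Gaussian kernel on
    pairwise squared distances with zero main diagonal (no self-loops),
    and d > 0 is found so that W has unit row (hence column) sums.
    """
    # Pairwise squared Euclidean distances, clipped at zero for safety.
    sq = np.sum(X ** 2, axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-D2 / eps)
    np.fill_diagonal(K, 0.0)  # zero main diagonal: no self-loops

    # Symmetric Sinkhorn iterations; the fixed point satisfies d = 1/(K d),
    # i.e. each row of diag(d) K diag(d) sums to one. The geometric-mean
    # update damps the oscillations of the plain iteration d <- 1/(K d).
    d = np.ones(len(X))
    for _ in range(n_iter):
        d = np.sqrt(d / (K @ d))
        if np.max(np.abs(d * (K @ d) - 1.0)) < tol:
            break
    return d[:, None] * K * d[None, :]
```

By construction the result is symmetric with zero diagonal, and its rows and columns sum to one up to the stopping tolerance.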


