Unsupervised Ground Metric Learning using Wasserstein Eigenvectors

02/11/2021
by Geert-Jan Huizing, et al.

Optimal Transport (OT) defines geometrically meaningful "Wasserstein" distances, which are used in machine learning applications to compare probability distributions. A key bottleneck, however, is the design of a "ground" cost, which should be adapted to the task under study. In most cases, supervised metric learning is not accessible, and one usually resorts to some ad hoc approach. Unsupervised metric learning is thus a fundamental problem for enabling data-driven applications of Optimal Transport. In this paper, we propose for the first time a canonical answer: we compute the ground cost as a positive eigenvector of the function that maps a cost to the pairwise OT distances between the inputs. This map is homogeneous and monotone, which frames unsupervised metric learning as a non-linear Perron-Frobenius problem. We provide criteria that ensure the existence and uniqueness of this eigenvector. In addition, we introduce a scalable computational method using entropic regularization which, in the large regularization limit, performs a principal component analysis dimensionality reduction. We showcase this method on synthetic examples and datasets. Finally, we apply it in the context of biology to the analysis of a high-throughput single-cell RNA sequencing (scRNAseq) dataset, to improve cell clustering and infer the relationships between genes in an unsupervised way.
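
To make the eigenvector idea concrete, the following is a minimal NumPy sketch of a power iteration on the "cost to pairwise OT distances" map. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the algorithm, so the alternation between a cost on features and a cost on samples, the normalization by the maximum entry, the plain Sinkhorn solver, and all function names and parameter values (`sinkhorn_cost`, `pairwise_ot`, `wasserstein_power_iteration`, `eps`, `n_iter`, `n_outer`) are hypothetical choices made for this example.

```python
import numpy as np

def sinkhorn_cost(a, b, C, eps=0.1, n_iter=200):
    """Entropic OT transport cost <P, C> between histograms a and b (assumed solver)."""
    K = np.exp(-C / eps)                 # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):              # Sinkhorn scaling iterations
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]      # approximate transport plan
    return float(np.sum(P * C))

def pairwise_ot(H, C, eps=0.1):
    """Map a ground cost C to the matrix of OT costs between the rows of H."""
    n = H.shape[0]
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):        # fill symmetrically, keep a zero diagonal
            D[i, j] = D[j, i] = sinkhorn_cost(H[i], H[j], C, eps)
    return D

def wasserstein_power_iteration(X, n_outer=5, eps=0.1):
    """Alternate between a cost on features (C) and a cost on samples (D).

    Assumption: X is a positive (samples x features) matrix. Rows of X are
    treated as histograms over features, columns as histograms over samples.
    """
    A = X / X.sum(axis=1, keepdims=True)       # samples as histograms
    B = X.T / X.T.sum(axis=1, keepdims=True)   # features as histograms
    n, m = X.shape
    C = 1.0 - np.eye(m)                        # initial cost between features
    D = 1.0 - np.eye(n)                        # initial cost between samples
    for _ in range(n_outer):
        D = pairwise_ot(A, C, eps)
        D /= D.max()                           # rescale: the map is homogeneous
        C = pairwise_ot(B, D, eps)
        C /= C.max()
    return C, D

# Toy usage on random positive data (illustrative only).
rng = np.random.default_rng(0)
X = rng.random((6, 5)) + 0.1
C, D = wasserstein_power_iteration(X, n_outer=3)
print(np.round(C, 2))
```

Rescaling after each step relies on the homogeneity mentioned in the abstract: a fixed point of the normalized iteration is a cost that the map sends to a positive multiple of itself. The nested loop over pairs is quadratic in the number of inputs and is only meant for toy sizes.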


