Unsupervised Ground Metric Learning using Wasserstein Eigenvectors

02/11/2021
by Geert-Jan Huizing, et al.

Optimal Transport (OT) defines geometrically meaningful "Wasserstein" distances, used in machine learning applications to compare probability distributions. However, a key bottleneck is the design of a "ground" cost, which should be adapted to the task under study. In most cases, supervised metric learning is not available, and one usually resorts to an ad hoc approach. Unsupervised metric learning is thus a fundamental problem for enabling data-driven applications of Optimal Transport. In this paper, we propose for the first time a canonical answer by computing the ground cost as a positive eigenvector of the function that maps a cost to the pairwise OT distances between the inputs. This map is homogeneous and monotone, which frames unsupervised metric learning as a non-linear Perron-Frobenius problem. We provide criteria that ensure the existence and uniqueness of this eigenvector. In addition, we introduce a scalable computational method using entropic regularization which, in the large-regularization limit, performs a principal component analysis (PCA) dimensionality reduction. We showcase this method on synthetic examples and datasets. Finally, we apply it in biology to the analysis of a high-throughput single-cell RNA sequencing (scRNAseq) dataset, to improve cell clustering and infer the relationships between genes in an unsupervised way.
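Because the ground cost is defined as a positive eigenvector of a homogeneous, monotone map, a power iteration is a natural way to illustrate the idea. The sketch below is our own illustration under stated assumptions, not the authors' implementation. We assume a strictly positive data matrix `X` whose rows (samples) are treated as histograms over the columns (features) and vice versa, and we look for a coupled pair of costs, `A` on samples and `B` on features, with `A` proportional to the pairwise OT distances between samples under cost `B`, and `B` proportional to those between features under cost `A`. After each step we rescale a cost so its largest entry is 1. For the exact map this loses nothing because the map is homogeneous; here it also keeps the regularization `eps` on a fixed scale relative to the cost. As a simplification, each OT distance is approximated by the transport cost of an entropic (Sinkhorn) plan, with the diagonal set to the exact value 0; the paper's regularized scheme may differ. The function names and the defaults for `eps`, `n_iter` and `n_outer` are hypothetical.

```python
import numpy as np


def sinkhorn_plan(a, b, C, eps, n_iter=200):
    """Entropy-regularized OT plan between histograms a and b (Sinkhorn scaling).

    Assumes a and b are strictly positive and n_iter >= 1.
    """
    K = np.exp(-C / eps)                  # Gibbs kernel, strictly positive
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)                 # rescale to match the column marginal b
        u = a / (K @ v)                   # rescale to match the row marginal a
    return u[:, None] * K * v[None, :]


def pairwise_ot(H, C, eps):
    """Approximate OT distances between the rows of H under ground cost C.

    Off-diagonal entries use the transport cost <P, C> of the entropic plan P
    (a simplification); the diagonal keeps the exact value W_C(h, h) = 0.
    """
    n = H.shape[0]
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            P = sinkhorn_plan(H[i], H[j], C, eps)
            D[i, j] = D[j, i] = np.sum(P * C)
    return D


def wasserstein_eigenvectors(X, eps=0.1, n_outer=20):
    """Hypothetical coupled power iteration for sample costs A and feature costs B.

    X: strictly positive array of shape (n_samples, n_features).
    """
    X = np.asarray(X, dtype=float)
    rows = X / X.sum(axis=1, keepdims=True)        # samples as histograms over features
    cols = (X / X.sum(axis=0, keepdims=True)).T    # features as histograms over samples
    n, m = X.shape
    A = 1.0 - np.eye(n)                            # returned unchanged if n_outer == 0
    B = 1.0 - np.eye(m)                            # start: every pair of features at cost 1
    for _ in range(n_outer):
        A = pairwise_ot(rows, B, eps)              # sample distances under feature cost B
        A /= A.max()                               # projective normalization
        B = pairwise_ot(cols, A, eps)              # feature distances under sample cost A
        B /= B.max()
    return A, B
```

Calling `wasserstein_eigenvectors(X)` on a small strictly positive matrix returns two symmetric cost matrices with a zero diagonal and a largest entry of 1. Updating `B` from the freshly computed `A` makes each outer step one application of the composed map on feature costs. The sketch runs a fixed number of outer iterations; in practice one would monitor how much `B` changes between steps.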
