Geometric Dataset Distances via Optimal Transport

02/07/2020
by David Alvarez-Melis, et al.
The notion of task similarity is at the core of various machine learning paradigms, such as domain adaptation and meta-learning. Current methods to quantify it are often heuristic, make strong assumptions about the label sets across tasks, and in many cases are architecture-dependent, relying on task-specific optimal parameters (e.g., they require training a model on each dataset). In this work we propose an alternative notion of distance between datasets that (i) is model-agnostic, (ii) does not involve training, (iii) can compare datasets even if their label sets are completely disjoint, and (iv) has solid theoretical footing. This distance relies on optimal transport, which provides it with rich geometry awareness, interpretable correspondences, and well-understood properties. Our results show that this novel distance provides meaningful comparisons of datasets and correlates well with transfer learning hardness across various experimental settings and datasets.
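
The abstract does not spell out how the distance is constructed, so the following is only a rough sketch of the general idea of a label-aware optimal transport cost, not the authors' method. It assumes uniform weights, two datasets of equal size, squared Euclidean feature costs, and the distance between class means as a crude stand-in for a label-to-label distance; the names `class_means` and `dataset_distance` are hypothetical. Under these assumptions the optimal transport plan is a one-to-one matching, so for tiny inputs it can be found by brute force.

```python
from itertools import permutations
from math import dist


def class_means(points, labels):
    """Return the mean feature vector of each class (hypothetical helper)."""
    groups = {}
    for p, y in zip(points, labels):
        groups.setdefault(y, []).append(p)
    return {
        y: tuple(sum(coord) / len(ps) for coord in zip(*ps))
        for y, ps in groups.items()
    }


def dataset_distance(xa, ya, xb, yb):
    """Exact OT distance between two small labeled datasets.

    Assumptions (not taken from the paper): uniform weights, equal sizes,
    and a ground cost equal to the squared feature distance plus the squared
    distance between the class means of the two labels. Labels are never
    compared directly, so the label sets may be disjoint.
    """
    if len(xa) != len(xb):
        raise ValueError("this sketch only handles datasets of equal size")
    means_a = class_means(xa, ya)
    means_b = class_means(xb, yb)

    def cost(i, j):
        feature_cost = dist(xa[i], xb[j]) ** 2
        label_cost = dist(means_a[ya[i]], means_b[yb[j]]) ** 2
        return feature_cost + label_cost

    n = len(xa)
    # With uniform weights and equal sizes, some optimal plan is a permutation,
    # so enumerating all matchings is exact (but only feasible for tiny n).
    best = min(
        sum(cost(i, perm[i]) for i in range(n))
        for perm in permutations(range(n))
    )
    return (best / n) ** 0.5


# Two toy datasets with disjoint label sets ("cat"/"dog" versus 0/1).
xa = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
ya = ["cat", "cat", "dog", "dog"]
xb = [(3.0, 4.0), (3.0, 5.0), (8.0, 9.0), (8.0, 10.0)]
yb = [0, 0, 1, 1]
d = dataset_distance(xa, ya, xb, yb)
```

Because the label cost is computed from features alone, the two label sets never need to share names. For realistic dataset sizes the brute-force search would have to be replaced by a proper optimal transport solver.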

research
04/18/2022

Hierarchical Optimal Transport for Comparing Histopathology Datasets

Scarcity of labeled histopathology data limits the applicability of deep...
research
06/12/2023

Generating Synthetic Datasets by Interpolating along Generalized Geodesics

Data for pretraining machine learning models often consists of collectio...
research
02/10/2020

CO-Optimal Transport

Optimal transport (OT) is a powerful geometric and probabilistic tool fo...
research
07/13/2020

Representation Transfer by Optimal Transport

Deep learning currently provides the best representations of complex obj...
research
03/05/2021

Unbalanced minibatch Optimal Transport; applications to Domain Adaptation

Optimal transport distances have found many applications in machine lear...
research
06/25/2018

Towards Optimal Transport with Global Invariances

Many problems in machine learning involve calculating correspondences be...
research
05/10/2023

Merging Rate of Opinions via Optimal Transport on Random Measures

The Dirichlet process has been pivotal to the development of Bayesian no...
