An Optimal Transport View on Generalization

11/08/2018
by Jingwei Zhang et al.

We derive upper bounds on the generalization error of learning algorithms based on their algorithmic transport cost: the expected Wasserstein distance between the output hypothesis and the output hypothesis conditioned on an input example. These bounds provide a novel way to study the generalization of learning algorithms from an optimal transport view and impose fewer constraints on the loss function, such as sub-gaussianity or boundedness. We further provide several upper bounds on the algorithmic transport cost in terms of total variation distance, relative entropy (KL divergence), and VC dimension, thereby further bridging optimal transport theory and information theory with statistical learning theory. Moreover, we study different conditions on the loss function under which the generalization error of a learning algorithm can be upper bounded by different probability metrics between distributions relating to the output hypothesis and/or the input data. Finally, within our established framework, we analyze generalization in deep learning and conclude that the generalization error of deep neural networks (DNNs) decreases exponentially to zero as the number of layers increases. Our analysis mainly exploits the hierarchical structure of DNNs and the contraction property of f-divergences, which may be of independent interest for analyzing other learning models with hierarchical structure.
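To make the notion of algorithmic transport cost concrete, the sketch below is an illustration only, not the paper's construction: it estimates the 1-Wasserstein distance between the output distribution of a toy stochastic learning algorithm and the same distribution conditioned on one input example being fixed. The `train` routine, the Gaussian data, and the chosen example `z` are all hypothetical assumptions made for this demo.

```python
import numpy as np

rng = np.random.default_rng(0)

def train(sample, noise=0.1):
    # Toy stochastic "algorithm": the output hypothesis is the sample
    # mean plus injected noise, so repeated runs induce a distribution
    # over hypotheses.
    return sample.mean() + noise * rng.standard_normal()

# A training set of n examples, and one example z we condition on.
data = rng.normal(0.0, 1.0, size=20)
z = 3.0  # a hypothetical, atypical input example

# Draw the hypothesis distribution unconditioned, and conditioned on
# z replacing one training point.
W = np.array([train(data) for _ in range(2000)])
data_z = np.concatenate([data[:-1], [z]])
W_z = np.array([train(data_z) for _ in range(2000)])

# For 1-D empirical samples of equal size, the 1-Wasserstein distance
# equals the mean absolute difference of the sorted samples.
cost = np.abs(np.sort(W) - np.sort(W_z)).mean()
print(f"estimated per-example transport cost: {cost:.4f}")
```

In the paper's framework this per-example distance, averaged over examples, is the quantity that upper-bounds the generalization error; a small transport cost means conditioning on any single example barely moves the hypothesis distribution.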

