
GAN and VAE from an Optimal Transport Point of View
This short article revisits some of the ideas introduced in arXiv:1701.07875 and arXiv:1705.07642 in a simple setup. This sheds some light on the connections between Variational Autoencoders (VAE), Generative Adversarial Networks (GAN) and Minimum Kantorovitch Estimators (MKE).
06/06/2017 ∙ by Aude Genevay, et al.
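The Minimum Kantorovitch Estimation idea above — fitting the parameters θ of a generative model μ_θ by minimizing the Wasserstein distance to the data distribution ν — can be sketched in one dimension, where the 2-Wasserstein distance between empirical measures has a closed form via sorting. This is a minimal, hypothetical illustration (the location model x = θ + z, the grid search and all names are illustrative choices, not from the article):

```python
import numpy as np

def w2_1d(a, b):
    # Squared 2-Wasserstein distance between two 1-D empirical measures
    # with equal sample sizes: closed form obtained by sorting both samples.
    return np.mean((np.sort(a) - np.sort(b)) ** 2)

# MKE sketch: fit the location parameter theta of the model x = theta + z,
# z ~ N(0, 1), by minimizing W2 to the data over a grid (illustrative only;
# a real estimator would use stochastic gradients on a neural generator).
rng = np.random.default_rng(0)
z = rng.standard_normal(500)          # latent samples pushed through the model
data = 3.0 + rng.standard_normal(500) # observations, true location = 3.0
thetas = np.linspace(-5.0, 5.0, 201)
losses = [w2_1d(theta + z, data) for theta in thetas]
theta_hat = thetas[int(np.argmin(losses))]
```

Here `theta_hat` lands near the true location 3.0; the same principle, with a neural generator and an approximate OT loss, underlies the GAN/VAE connections discussed in the article.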

Learning Generative Models with Sinkhorn Divergences
The ability to compare two degenerate probability distributions (i.e. two probability distributions supported on two distinct low-dimensional manifolds living in a much higher-dimensional space) is a crucial problem arising in the estimation of generative models for high-dimensional observations such as those arising in computer vision or natural language. It is known that optimal transport metrics can represent a cure for this problem, since they were specifically designed as an alternative to information divergences to handle such problematic scenarios. Unfortunately, training generative machines using OT raises formidable computational and statistical challenges, because of (i) the computational burden of evaluating OT losses, (ii) the instability and lack of smoothness of these losses, (iii) the difficulty of estimating these losses and their gradients robustly in high dimension. This paper presents the first tractable computational method to train large-scale generative models using an optimal transport loss, and tackles these three issues by relying on two key ideas: (a) entropic smoothing, which turns the original OT loss into one that can be computed using Sinkhorn fixed-point iterations; (b) algorithmic (automatic) differentiation of these iterations. These two approximations result in a robust and differentiable approximation of the OT loss with streamlined GPU execution. Entropic smoothing generates a family of losses interpolating between Wasserstein (OT) and Maximum Mean Discrepancy (MMD), thus allowing one to find a sweet spot that leverages the geometry of OT and the favorable high-dimensional sample complexity of MMD, which comes with unbiased gradient estimates. The resulting computational architecture nicely complements standard deep network generative models with a stack of extra layers implementing the loss function.
06/01/2017 ∙ by Aude Genevay, et al.
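The entropic smoothing and Sinkhorn fixed-point iterations described in the abstract can be sketched as follows. This is a minimal NumPy illustration using log-domain updates for numerical stability; the function name, the squared-Euclidean cost and the defaults are assumptions, and in practice the loop would be unrolled inside an automatic-differentiation framework (with GPU execution) so that the loss can be differentiated with respect to the generated samples:

```python
import numpy as np
from scipy.special import logsumexp

def sinkhorn_loss(x, y, eps=0.1, n_iters=100):
    """Entropy-regularized OT cost between the uniform empirical measures on
    samples x (n, d) and y (m, d), via Sinkhorn fixed-point iterations on the
    dual potentials f, g, run in the log domain for stability."""
    # Squared-Euclidean ground cost (an illustrative choice).
    C = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    n, m = C.shape
    log_mu, log_nu = -np.log(n), -np.log(m)  # uniform weights, log domain
    f, g = np.zeros(n), np.zeros(m)          # dual (Sinkhorn) potentials
    for _ in range(n_iters):
        # Log-sum-exp form of the classical Sinkhorn matrix-scaling updates.
        f = -eps * logsumexp((g[None, :] - C) / eps + log_nu, axis=1)
        g = -eps * logsumexp((f[:, None] - C) / eps + log_mu, axis=0)
    # Approximate transport plan and the associated OT cost <P, C>.
    P = np.exp((f[:, None] + g[None, :] - C) / eps + log_mu + log_nu)
    return np.sum(P * C)
```

Because every step is a smooth composition of exp, log-sum-exp and matrix products, autodiff through the unrolled iterations gives gradients of this loss "for free", which is the second key idea of the paper.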

Sample Complexity of Sinkhorn divergences
Optimal transport (OT) and maximum mean discrepancies (MMD) are now routinely used in machine learning to compare probability measures. We focus in this paper on Sinkhorn divergences (SDs), a regularized variant of OT distances which can interpolate, depending on the regularization strength ε, between OT (ε=0) and MMD (ε=∞). Although the trade-off induced by that regularization is now well understood computationally (OT, SDs and MMD require respectively O(n^3 log n), O(n^2) and O(n^2) operations given a sample size n), much less is known in terms of their sample complexity, namely the gap between these quantities when evaluated using finite samples vs. their respective densities. Indeed, while the sample complexities of OT and MMD stand at two extremes, 1/n^{1/d} for OT in dimension d and 1/√n for MMD, that of SDs has only been studied empirically. In this paper, we (i) derive a bound on the approximation error made with SDs when approximating OT as a function of the regularizer ε, (ii) prove that the optimizers of regularized OT are bounded in a Sobolev (RKHS) ball independent of the two measures and (iii) provide the first sample complexity bound for SDs, obtained by reformulating SDs as a maximization problem in an RKHS. We thus obtain a scaling in 1/√n (as in MMD), with a constant that however depends on ε, making the bridge between OT and MMD complete.
10/05/2018 ∙ by Aude Genevay, et al.
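The interpolating quantity studied above is the (debiased) Sinkhorn divergence, which corrects the entropic OT cost by its two self-comparison terms so that the divergence of a measure to itself vanishes. A minimal sketch, assuming uniform empirical measures, a squared-Euclidean cost and the ⟨P, C⟩ ("sharp") variant of the regularized cost — the names, defaults and this particular variant are illustrative choices, not necessarily the paper's exact definition:

```python
import numpy as np
from scipy.special import logsumexp

def entropic_ot(x, y, eps, n_iters=200):
    # Entropy-regularized OT cost OT_eps between the uniform empirical
    # measures on x (n, d) and y (m, d), via log-domain Sinkhorn iterations.
    C = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    n, m = C.shape
    log_mu, log_nu = -np.log(n), -np.log(m)
    f, g = np.zeros(n), np.zeros(m)
    for _ in range(n_iters):
        f = -eps * logsumexp((g[None, :] - C) / eps + log_nu, axis=1)
        g = -eps * logsumexp((f[:, None] - C) / eps + log_mu, axis=0)
    P = np.exp((f[:, None] + g[None, :] - C) / eps + log_mu + log_nu)
    return np.sum(P * C)

def sinkhorn_divergence(x, y, eps):
    # Debiased divergence: SD_eps(x, y) = OT_eps(x, y)
    #                                     - (OT_eps(x, x) + OT_eps(y, y)) / 2,
    # so that SD_eps(x, x) = 0; eps tunes the OT (small eps) vs MMD (large eps)
    # behavior discussed in the abstract.
    return entropic_ot(x, y, eps) - 0.5 * (entropic_ot(x, x, eps)
                                           + entropic_ot(y, y, eps))
```

Sweeping ε in such a sketch makes the trade-off concrete: small ε tracks the OT geometry (at the cost of the slower 1/n^{1/d}-type behavior and heavier iterations), while large ε approaches an MMD-like loss with the 1/√n sample complexity that the paper establishes for SDs.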
Aude Genevay