Gabriel Peyré

is this you? claim profile


CNRS senior researcher (DR2), working at the DMA, École Normale Supérieure. I am also an associate member of the Mokaplan INRIA/CNRS/Paris-Dauphine research group.

  • Universal Invariant and Equivariant Graph Neural Networks

    Graph Neural Networks (GNN) come in many flavors, but should always be either invariant (permutation of the nodes of the input graph does not affect the output) or equivariant (permutation of the input permutes the output). In this paper, we consider a specific class of invariant and equivariant networks, for which we prove new universality theorems. More precisely, we consider networks with a single hidden layer, obtained by summing channels formed by applying an equivariant linear operator, a pointwise non-linearity and either an invariant or equivariant linear operator. Recently, Maron et al. (2019) showed that by allowing higher-order tensorization inside the network, universal invariant GNNs can be obtained. As a first contribution, we propose an alternative proof of this result, which relies on the Stone-Weierstrass theorem for algebra of real-valued functions. Our main contribution is then an extension of this result to the equivariant case, which appears in many practical applications but has been less studied from a theoretical point of view. The proof relies on a new generalized Stone-Weierstrass theorem for algebra of equivariant functions, which is of independent interest. Finally, unlike many previous settings that consider a fixed number of nodes, our results show that a GNN defined by a single set of parameters can approximate uniformly well a function defined on graphs of varying size.

    05/13/2019 ∙ by Nicolas Keriven, et al. ∙ 13 share

    read it

  • Stochastic Deep Networks

    Machine learning is increasingly targeting areas where input data cannot be accurately described by a single vector, but can be modeled instead using the more flexible concept of random vectors, namely probability measures or more simply point clouds of varying cardinality. Using deep architectures on measures poses, however, many challenging issues. Indeed, deep architectures are originally designed to handle fixedlength vectors, or, using recursive mechanisms, ordered sequences thereof. In sharp contrast, measures describe a varying number of weighted observations with no particular order. We propose in this work a deep framework designed to handle crucial aspects of measures, namely permutation invariances, variations in weights and cardinality. Architectures derived from this pipeline can (i) map measures to measures - using the concept of push-forward operators; (ii) bridge the gap between measures and Euclidean spaces - through integration steps. This allows to design discriminative networks (to classify or reduce the dimensionality of input measures), generative architectures (to synthesize measures) and recurrent pipelines (to predict measure dynamics). We provide a theoretical analysis of these building blocks, review our architectures' approximation abilities and robustness w.r.t. perturbation, and try them on various discriminative and generative tasks.

    11/19/2018 ∙ by Gwendoline de Bie, et al. ∙ 12 share

    read it

  • Geometric Losses for Distributional Learning

    Building upon recent advances in entropy-regularized optimal transport, and upon Fenchel duality between measures and continuous functions , we propose a generalization of the logistic loss that incorporates a metric or cost between classes. Unlike previous attempts to use optimal transport distances for learning, our loss results in unconstrained convex objective functions, supports infinite (or very large) class spaces, and naturally defines a geometric generalization of the softmax operator. The geometric properties of this loss make it suitable for predicting sparse and singular distributions, for instance supported on curves or hyper-surfaces. We study the theoretical properties of our loss and show-case its effectiveness on two applications: ordinal regression and drawing generation.

    05/15/2019 ∙ by Arthur Mensch, et al. ∙ 11 share

    read it

  • Semi-dual Regularized Optimal Transport

    Variational problems that involve Wasserstein distances and more generally optimal transport (OT) theory are playing an increasingly important role in data sciences. Such problems can be used to form an examplar measure out of various probability measures, as in the Wasserstein barycenter problem, or to carry out parametric inference and density fitting, where the loss is measured in terms of an optimal transport cost to the measure of observations. Despite being conceptually simple, such problems are computationally challenging because they involve minimizing over quantities (Wasserstein distances) that are themselves hard to compute. Entropic regularization has recently emerged as an efficient tool to approximate the solution of such variational Wasserstein problems. In this paper, we give a thorough duality tour of these regularization techniques. In particular, we show how important concepts from classical OT such as c-transforms and semi-discrete approaches translate into similar ideas in a regularized setting. These dual formulations lead to smooth variational problems, which can be solved using smooth, differentiable and convex optimization problems that are simpler to implement and numerically more stable that their un-regularized counterparts. We illustrate the versatility of this approach by applying it to the computation of Wasserstein barycenters and gradient flows of spatial regularization functionals.

    11/13/2018 ∙ by Marco Cuturi, et al. ∙ 6 share

    read it

  • GAN and VAE from an Optimal Transport Point of View

    This short article revisits some of the ideas introduced in arXiv:1701.07875 and arXiv:1705.07642 in a simple setup. This sheds some lights on the connexions between Variational Autoencoders (VAE), Generative Adversarial Networks (GAN) and Minimum Kantorovitch Estimators (MKE).

    06/06/2017 ∙ by Aude Genevay, et al. ∙ 0 share

    read it

  • Learning Generative Models with Sinkhorn Divergences

    The ability to compare two degenerate probability distributions (i.e. two probability distributions supported on two distinct low-dimensional manifolds living in a much higher-dimensional space) is a crucial problem arising in the estimation of generative models for high-dimensional observations such as those arising in computer vision or natural language. It is known that optimal transport metrics can represent a cure for this problem, since they were specifically designed as an alternative to information divergences to handle such problematic scenarios. Unfortunately, training generative machines using OT raises formidable computational and statistical challenges, because of (i) the computational burden of evaluating OT losses, (ii) the instability and lack of smoothness of these losses, (iii) the difficulty to estimate robustly these losses and their gradients in high dimension. This paper presents the first tractable computational method to train large scale generative models using an optimal transport loss, and tackles these three issues by relying on two key ideas: (a) entropic smoothing, which turns the original OT loss into one that can be computed using Sinkhorn fixed point iterations; (b) algorithmic (automatic) differentiation of these iterations. These two approximations result in a robust and differentiable approximation of the OT loss with streamlined GPU execution. Entropic smoothing generates a family of losses interpolating between Wasserstein (OT) and Maximum Mean Discrepancy (MMD), thus allowing to find a sweet spot leveraging the geometry of OT and the favorable high-dimensional sample complexity of MMD which comes with unbiased gradient estimates. The resulting computational architecture complements nicely standard deep network generative models by a stack of extra layers implementing the loss function.

    06/01/2017 ∙ by Aude Genevay, et al. ∙ 0 share

    read it

  • Sensitivity Analysis for Mirror-Stratifiable Convex Functions

    This paper provides a set of sensitivity analysis and activity identification results for a class of convex functions with a strong geometric structure, that we coined "mirror-stratifiable". These functions are such that there is a bijection between a primal and a dual stratification of the space into partitioning sets, called strata. This pairing is crucial to track the strata that are identifiable by solutions of parametrized optimization problems or by iterates of optimization algorithms. This class of functions encompasses all regularizers routinely used in signal and image processing, machine learning, and statistics. We show that this "mirror-stratifiable" structure enjoys a nice sensitivity theory, allowing us to study stability of solutions of optimization problems to small perturbations, as well as activity identification of first-order proximal splitting-type algorithms. Existing results in the literature typically assume that, under a non-degeneracy condition, the active set associated to a minimizer is stable to small perturbations and is identified in finite time by optimization schemes. In contrast, our results do not require any non-degeneracy assumption: in consequence, the optimal active set is not necessarily stable anymore, but we are able to track precisely the set of identifiable strata.We show that these results have crucial implications when solving challenging ill-posed inverse problems via regularization, a typical scenario where the non-degeneracy condition is not fulfilled. Our theoretical results, illustrated by numerical simulations, allow to characterize the instability behaviour of the regularized solutions, by locating the set of all low-dimensional strata that can be potentially identified by these solutions.

    07/11/2017 ∙ by Jalal Fadili, et al. ∙ 0 share

    read it

  • Bayesian Modeling of Motion Perception using Dynamical Stochastic Textures

    A common practice to account for psychophysical biases in vision is to frame them as consequences of a dynamic process relying on optimal inference with respect to a generative model. The present study details the complete formulation of such a generative model intended to probe visual motion perception. It is first derived in a set of axiomatic steps constrained by biological plausibility. We then extend previous contributions by detailing three equivalent formulations of the Gaussian dynamic texture model. First, the composite dynamic textures are constructed by the random aggregation of warped patterns, which can be viewed as 3D Gaussian fields. Second, these textures are cast as solutions to a stochastic partial differential equation (sPDE). This essential step enables real time, on-the-fly, texture synthesis using time-discretized auto- regressive processes. It also allows for the derivation of a local motion-energy model, which corresponds to the log-likelihood of the probability density. The log-likelihoods are finally essential for the construction of a Bayesian inference framework. We use the model to probe speed perception in humans psychophysically using zoom-like changes in stimulus spatial frequency content. The likelihood is contained within the genera- tive model and we chose a slow speed prior consistent with previous literature. We then validated the fitting process of the model using synthesized data. The human data replicates previous findings that relative perceived speed is positively biased by spatial frequency increments. The effect cannot be fully accounted for by previous models, but the current prior acting on the spatio-temporal likelihoods has proved necessary in accounting for the perceptual bias.

    11/02/2016 ∙ by Jonathan Vacher, et al. ∙ 0 share

    read it

  • A Smoothed Dual Approach for Variational Wasserstein Problems

    Variational problems that involve Wasserstein distances have been recently proposed to summarize and learn from probability measures. Despite being conceptually simple, such problems are computationally challenging because they involve minimizing over quantities (Wasserstein distances) that are themselves hard to compute. We show that the dual formulation of Wasserstein variational problems introduced recently by Carlier et al. (2014) can be regularized using an entropic smoothing, which leads to smooth, differentiable, convex optimization problems that are simpler to implement and numerically more stable. We illustrate the versatility of this approach by applying it to the computation of Wasserstein barycenters and gradient flows of spacial regularization functionals.

    03/09/2015 ∙ by Marco Cuturi, et al. ∙ 0 share

    read it

  • Low Complexity Regularization of Linear Inverse Problems

    Inverse problems and regularization theory is a central theme in contemporary signal processing, where the goal is to reconstruct an unknown signal from partial indirect, and possibly noisy, measurements of it. A now standard method for recovering the unknown signal is to solve a convex optimization problem that enforces some prior knowledge about its structure. This has proved efficient in many problems routinely encountered in imaging sciences, statistics and machine learning. This chapter delivers a review of recent advances in the field where the regularization prior promotes solutions conforming to some notion of simplicity/low-complexity. These priors encompass as popular examples sparsity and group sparsity (to capture the compressibility of natural signals and images), total variation and analysis sparsity (to promote piecewise regularity), and low-rank (as natural extension of sparsity to matrix-valued data). Our aim is to provide a unified treatment of all these regularizations under a single umbrella, namely the theory of partial smoothness. This framework is very general and accommodates all low-complexity regularizers just mentioned, as well as many others. Partial smoothness turns out to be the canonical way to encode low-dimensional models that can be linear spaces or more general smooth manifolds. This review is intended to serve as a one stop shop toward the understanding of the theoretical properties of the so-regularized solutions. It covers a large spectrum including: (i) recovery guarantees and stability to noise, both in terms of ℓ^2-stability and model (manifold) identification; (ii) sensitivity analysis to perturbations of the parameters involved (in particular the observations), with applications to unbiased risk estimation ; (iii) convergence properties of the forward-backward proximal splitting scheme, that is particularly well suited to solve the corresponding large-scale regularized optimization problem.

    07/07/2014 ∙ by Samuel Vaiter, et al. ∙ 0 share

    read it

  • Model Consistency of Partly Smooth Regularizers

    This paper studies least-square regression penalized with partly smooth convex regularizers. This class of functions is very large and versatile allowing to promote solutions conforming to some notion of low-complexity. Indeed, they force solutions of variational problems to belong to a low-dimensional manifold (the so-called model) which is stable under small perturbations of the function. This property is crucial to make the underlying low-complexity model robust to small noise. We show that a generalized "irrepresentable condition" implies stable model selection under small noise perturbations in the observations and the design matrix, when the regularization parameter is tuned proportionally to the noise level. This condition is shown to be almost a necessary condition. We then show that this condition implies model consistency of the regularized estimator. That is, with a probability tending to one as the number of measurements increases, the regularized estimator belongs to the correct low-dimensional model manifold. This work unifies and generalizes several previous ones, where model consistency is known to hold for sparse, group sparse, total variation and low-rank regularizations.

    05/05/2014 ∙ by Samuel Vaiter, et al. ∙ 0 share

    read it