
Universal Invariant and Equivariant Graph Neural Networks
Graph Neural Networks (GNN) come in many flavors, but should always be either invariant (permutation of the nodes of the input graph does not affect the output) or equivariant (permutation of the input permutes the output). In this paper, we consider a specific class of invariant and equivariant networks, for which we prove new universality theorems. More precisely, we consider networks with a single hidden layer, obtained by summing channels formed by applying an equivariant linear operator, a pointwise nonlinearity and either an invariant or equivariant linear operator. Recently, Maron et al. (2019) showed that by allowing higherorder tensorization inside the network, universal invariant GNNs can be obtained. As a first contribution, we propose an alternative proof of this result, which relies on the StoneWeierstrass theorem for algebra of realvalued functions. Our main contribution is then an extension of this result to the equivariant case, which appears in many practical applications but has been less studied from a theoretical point of view. The proof relies on a new generalized StoneWeierstrass theorem for algebra of equivariant functions, which is of independent interest. Finally, unlike many previous settings that consider a fixed number of nodes, our results show that a GNN defined by a single set of parameters can approximate uniformly well a function defined on graphs of varying size.
05/13/2019 ∙ by Nicolas Keriven, et al. ∙ 13 ∙ shareread it

Stochastic Deep Networks
Machine learning is increasingly targeting areas where input data cannot be accurately described by a single vector, but can be modeled instead using the more flexible concept of random vectors, namely probability measures or more simply point clouds of varying cardinality. Using deep architectures on measures poses, however, many challenging issues. Indeed, deep architectures are originally designed to handle fixedlength vectors, or, using recursive mechanisms, ordered sequences thereof. In sharp contrast, measures describe a varying number of weighted observations with no particular order. We propose in this work a deep framework designed to handle crucial aspects of measures, namely permutation invariances, variations in weights and cardinality. Architectures derived from this pipeline can (i) map measures to measures  using the concept of pushforward operators; (ii) bridge the gap between measures and Euclidean spaces  through integration steps. This allows to design discriminative networks (to classify or reduce the dimensionality of input measures), generative architectures (to synthesize measures) and recurrent pipelines (to predict measure dynamics). We provide a theoretical analysis of these building blocks, review our architectures' approximation abilities and robustness w.r.t. perturbation, and try them on various discriminative and generative tasks.
11/19/2018 ∙ by Gwendoline de Bie, et al. ∙ 12 ∙ shareread it

Geometric Losses for Distributional Learning
Building upon recent advances in entropyregularized optimal transport, and upon Fenchel duality between measures and continuous functions , we propose a generalization of the logistic loss that incorporates a metric or cost between classes. Unlike previous attempts to use optimal transport distances for learning, our loss results in unconstrained convex objective functions, supports infinite (or very large) class spaces, and naturally defines a geometric generalization of the softmax operator. The geometric properties of this loss make it suitable for predicting sparse and singular distributions, for instance supported on curves or hypersurfaces. We study the theoretical properties of our loss and showcase its effectiveness on two applications: ordinal regression and drawing generation.
05/15/2019 ∙ by Arthur Mensch, et al. ∙ 11 ∙ shareread it

Semidual Regularized Optimal Transport
Variational problems that involve Wasserstein distances and more generally optimal transport (OT) theory are playing an increasingly important role in data sciences. Such problems can be used to form an examplar measure out of various probability measures, as in the Wasserstein barycenter problem, or to carry out parametric inference and density fitting, where the loss is measured in terms of an optimal transport cost to the measure of observations. Despite being conceptually simple, such problems are computationally challenging because they involve minimizing over quantities (Wasserstein distances) that are themselves hard to compute. Entropic regularization has recently emerged as an efficient tool to approximate the solution of such variational Wasserstein problems. In this paper, we give a thorough duality tour of these regularization techniques. In particular, we show how important concepts from classical OT such as ctransforms and semidiscrete approaches translate into similar ideas in a regularized setting. These dual formulations lead to smooth variational problems, which can be solved using smooth, differentiable and convex optimization problems that are simpler to implement and numerically more stable that their unregularized counterparts. We illustrate the versatility of this approach by applying it to the computation of Wasserstein barycenters and gradient flows of spatial regularization functionals.
11/13/2018 ∙ by Marco Cuturi, et al. ∙ 6 ∙ shareread it

GAN and VAE from an Optimal Transport Point of View
This short article revisits some of the ideas introduced in arXiv:1701.07875 and arXiv:1705.07642 in a simple setup. This sheds some lights on the connexions between Variational Autoencoders (VAE), Generative Adversarial Networks (GAN) and Minimum Kantorovitch Estimators (MKE).
06/06/2017 ∙ by Aude Genevay, et al. ∙ 0 ∙ shareread it

Learning Generative Models with Sinkhorn Divergences
The ability to compare two degenerate probability distributions (i.e. two probability distributions supported on two distinct lowdimensional manifolds living in a much higherdimensional space) is a crucial problem arising in the estimation of generative models for highdimensional observations such as those arising in computer vision or natural language. It is known that optimal transport metrics can represent a cure for this problem, since they were specifically designed as an alternative to information divergences to handle such problematic scenarios. Unfortunately, training generative machines using OT raises formidable computational and statistical challenges, because of (i) the computational burden of evaluating OT losses, (ii) the instability and lack of smoothness of these losses, (iii) the difficulty to estimate robustly these losses and their gradients in high dimension. This paper presents the first tractable computational method to train large scale generative models using an optimal transport loss, and tackles these three issues by relying on two key ideas: (a) entropic smoothing, which turns the original OT loss into one that can be computed using Sinkhorn fixed point iterations; (b) algorithmic (automatic) differentiation of these iterations. These two approximations result in a robust and differentiable approximation of the OT loss with streamlined GPU execution. Entropic smoothing generates a family of losses interpolating between Wasserstein (OT) and Maximum Mean Discrepancy (MMD), thus allowing to find a sweet spot leveraging the geometry of OT and the favorable highdimensional sample complexity of MMD which comes with unbiased gradient estimates. The resulting computational architecture complements nicely standard deep network generative models by a stack of extra layers implementing the loss function.
06/01/2017 ∙ by Aude Genevay, et al. ∙ 0 ∙ shareread it

Sensitivity Analysis for MirrorStratifiable Convex Functions
This paper provides a set of sensitivity analysis and activity identification results for a class of convex functions with a strong geometric structure, that we coined "mirrorstratifiable". These functions are such that there is a bijection between a primal and a dual stratification of the space into partitioning sets, called strata. This pairing is crucial to track the strata that are identifiable by solutions of parametrized optimization problems or by iterates of optimization algorithms. This class of functions encompasses all regularizers routinely used in signal and image processing, machine learning, and statistics. We show that this "mirrorstratifiable" structure enjoys a nice sensitivity theory, allowing us to study stability of solutions of optimization problems to small perturbations, as well as activity identification of firstorder proximal splittingtype algorithms. Existing results in the literature typically assume that, under a nondegeneracy condition, the active set associated to a minimizer is stable to small perturbations and is identified in finite time by optimization schemes. In contrast, our results do not require any nondegeneracy assumption: in consequence, the optimal active set is not necessarily stable anymore, but we are able to track precisely the set of identifiable strata.We show that these results have crucial implications when solving challenging illposed inverse problems via regularization, a typical scenario where the nondegeneracy condition is not fulfilled. Our theoretical results, illustrated by numerical simulations, allow to characterize the instability behaviour of the regularized solutions, by locating the set of all lowdimensional strata that can be potentially identified by these solutions.
07/11/2017 ∙ by Jalal Fadili, et al. ∙ 0 ∙ shareread it

Bayesian Modeling of Motion Perception using Dynamical Stochastic Textures
A common practice to account for psychophysical biases in vision is to frame them as consequences of a dynamic process relying on optimal inference with respect to a generative model. The present study details the complete formulation of such a generative model intended to probe visual motion perception. It is first derived in a set of axiomatic steps constrained by biological plausibility. We then extend previous contributions by detailing three equivalent formulations of the Gaussian dynamic texture model. First, the composite dynamic textures are constructed by the random aggregation of warped patterns, which can be viewed as 3D Gaussian fields. Second, these textures are cast as solutions to a stochastic partial differential equation (sPDE). This essential step enables real time, onthefly, texture synthesis using timediscretized auto regressive processes. It also allows for the derivation of a local motionenergy model, which corresponds to the loglikelihood of the probability density. The loglikelihoods are finally essential for the construction of a Bayesian inference framework. We use the model to probe speed perception in humans psychophysically using zoomlike changes in stimulus spatial frequency content. The likelihood is contained within the genera tive model and we chose a slow speed prior consistent with previous literature. We then validated the fitting process of the model using synthesized data. The human data replicates previous findings that relative perceived speed is positively biased by spatial frequency increments. The effect cannot be fully accounted for by previous models, but the current prior acting on the spatiotemporal likelihoods has proved necessary in accounting for the perceptual bias.
11/02/2016 ∙ by Jonathan Vacher, et al. ∙ 0 ∙ shareread it

A Smoothed Dual Approach for Variational Wasserstein Problems
Variational problems that involve Wasserstein distances have been recently proposed to summarize and learn from probability measures. Despite being conceptually simple, such problems are computationally challenging because they involve minimizing over quantities (Wasserstein distances) that are themselves hard to compute. We show that the dual formulation of Wasserstein variational problems introduced recently by Carlier et al. (2014) can be regularized using an entropic smoothing, which leads to smooth, differentiable, convex optimization problems that are simpler to implement and numerically more stable. We illustrate the versatility of this approach by applying it to the computation of Wasserstein barycenters and gradient flows of spacial regularization functionals.
03/09/2015 ∙ by Marco Cuturi, et al. ∙ 0 ∙ shareread it

Low Complexity Regularization of Linear Inverse Problems
Inverse problems and regularization theory is a central theme in contemporary signal processing, where the goal is to reconstruct an unknown signal from partial indirect, and possibly noisy, measurements of it. A now standard method for recovering the unknown signal is to solve a convex optimization problem that enforces some prior knowledge about its structure. This has proved efficient in many problems routinely encountered in imaging sciences, statistics and machine learning. This chapter delivers a review of recent advances in the field where the regularization prior promotes solutions conforming to some notion of simplicity/lowcomplexity. These priors encompass as popular examples sparsity and group sparsity (to capture the compressibility of natural signals and images), total variation and analysis sparsity (to promote piecewise regularity), and lowrank (as natural extension of sparsity to matrixvalued data). Our aim is to provide a unified treatment of all these regularizations under a single umbrella, namely the theory of partial smoothness. This framework is very general and accommodates all lowcomplexity regularizers just mentioned, as well as many others. Partial smoothness turns out to be the canonical way to encode lowdimensional models that can be linear spaces or more general smooth manifolds. This review is intended to serve as a one stop shop toward the understanding of the theoretical properties of the soregularized solutions. It covers a large spectrum including: (i) recovery guarantees and stability to noise, both in terms of ℓ^2stability and model (manifold) identification; (ii) sensitivity analysis to perturbations of the parameters involved (in particular the observations), with applications to unbiased risk estimation ; (iii) convergence properties of the forwardbackward proximal splitting scheme, that is particularly well suited to solve the corresponding largescale regularized optimization problem.
07/07/2014 ∙ by Samuel Vaiter, et al. ∙ 0 ∙ shareread it

Model Consistency of Partly Smooth Regularizers
This paper studies leastsquare regression penalized with partly smooth convex regularizers. This class of functions is very large and versatile allowing to promote solutions conforming to some notion of lowcomplexity. Indeed, they force solutions of variational problems to belong to a lowdimensional manifold (the socalled model) which is stable under small perturbations of the function. This property is crucial to make the underlying lowcomplexity model robust to small noise. We show that a generalized "irrepresentable condition" implies stable model selection under small noise perturbations in the observations and the design matrix, when the regularization parameter is tuned proportionally to the noise level. This condition is shown to be almost a necessary condition. We then show that this condition implies model consistency of the regularized estimator. That is, with a probability tending to one as the number of measurements increases, the regularized estimator belongs to the correct lowdimensional model manifold. This work unifies and generalizes several previous ones, where model consistency is known to hold for sparse, group sparse, total variation and lowrank regularizations.
05/05/2014 ∙ by Samuel Vaiter, et al. ∙ 0 ∙ shareread it
Gabriel Peyré
is this you? claim profile
CNRS senior researcher (DR2), working at the DMA, École Normale Supérieure. I am also an associate member of the Mokaplan INRIA/CNRS/ParisDauphine research group.