 # Free complete Wasserstein algebras

We present an algebraic account of the Wasserstein distances W_p on complete metric spaces. This is part of a program of a quantitative algebraic theory of effects in programming languages. In particular, we give axioms, parametric in p, for algebras over metric spaces equipped with probabilistic choice operations. The axioms say that the operations form a barycentric algebra and that the metric satisfies a property typical of the Wasserstein distance W_p. We show that the free complete such algebra over a complete metric space is that of the Radon probability measures on the space with the Wasserstein distance as metric, equipped with the usual binary convex sum operations.

## 1. Introduction

The denotational semantics of probabilistic programs generally makes use of one or another monad of probability measures. In [16, 8], Lawvere and Giry provided the first example of such a monad, over the category of measurable spaces; Giry also gave a second example, over Polish spaces. Later, in [13, 12], Jones and Plotkin provided a monad of evaluations over directed complete posets. More recently, Heunen, Kammar, et al. provided such a monad over quasi-Borel spaces.

Metric spaces (and nonexpansive maps) form another natural category to consider. The question then arises as to what metric to take over probability measures, and the Kantorovich-Wasserstein distance or, more generally, the Wasserstein distances $W_p$ of order $p \ge 1$ (generalising from $p = 1$) provide natural choices. A first result of this kind showed that the Kantorovich distance provides a monad over the subcategory of 1-bounded complete separable metric spaces.

One purpose of such monads is to make probabilistic choice operations available, and so an algebraic account of the monads in terms of these operations is of interest; we focus here on such results in the metric context. Barycentric algebras axiomatise probabilistic choice. They have binary convex combination operations $+_r$, for $r \in [0,1]$; one can think of $x +_r y$ as choosing to continue with $x$ with probability $r$ and with $y$ with probability $1 - r$. The operations are required to obey appropriate laws.

In this paper we show that the Wasserstein distance of order $p \ge 1$ yields a monad $\Delta_p$ on the category $\mathbf{CMet}$ of complete metric spaces and nonexpansive maps, and that this monad can be characterised as the free algebra monad for barycentric algebras over complete metric spaces where the barycentric operations are required to be nonexpansive and are subject to an appropriate Wasserstein condition. The probability measures in $\Delta_p(X)$ are required to be Radon (equivalently, tight) and to have finite $p$-moment. The important Kantorovich case of this result, where $p = 1$, was essentially already established in earlier work, as it is an immediate consequence of Theorem 5.2.1 there.

In other work, a somewhat different direction was taken: monads on (extended) metric spaces are defined algebraically, using a quantitative analogue of equational logic in which equations give an upper bound on the distance between two elements, and algebras are extended metric spaces equipped with nonexpansive operations. One then seeks to characterise the action of the monad on as wide a variety of spaces as possible.

This is part of a program to establish a quantitative algebraic theory of effects in programming languages. In particular, within that program we gave axioms, parametric in $p$, for algebras equipped with probabilistic choice operations, and showed that the free complete such algebras over a complete 1-bounded separable metric space $X$ are the probability measures on $X$ with the Wasserstein distance $W_p$, equipped with the usual binary convex sum operations. Comparing this result with those of this paper and of the related work above, it is worth remarking that probability measures on 1-bounded spaces automatically have finite moments, and probability measures on separable complete spaces are automatically Radon.

In Section 2, we discuss the Wasserstein distance $W_p$ of order $p$ between probability measures on a metric space $X$. In Theorem 2 we show that $W_p$ is a metric on the Radon probability measures with finite moments of order $p$. To do so, we make use of a known result that the triangle inequality holds for all probability measures if $X$ is separable. Here, and throughout the paper, we reduce the general case to the separable case using a lemma, due to Basso, which states that the support of any probability measure on a metric space is separable.

Next, in Theorem 2 we show that if $X$ is complete then the Wasserstein metric of order $p$ on the Radon probability measures with finite moments of order $p$ is also complete and is generated by the probability measures with finite support. To do this, we make use of the well-known result that if $X$ is complete and separable then the Wasserstein metric of order $p$ on all probability measures over $X$ with finite moments of order $p$ is also complete and separable, being generated by the rational measures with finite support in a countable basis of $X$ (see, e.g., [24, Theorem 6.18] and the bibliographic discussion there). The section concludes with a side-excursion, a discussion of weak convergence: Theorem 2 generalises the characterisation of the topology induced by the Wasserstein metrics in terms of weak convergence given by [24, Theorem 6.9] from complete separable spaces to all complete spaces.

In Section 3 we discuss the algebraic aspect of these spaces. In particular, in Theorem 3 we show that the Radon probability measures with finite moments of order $p$ on a complete metric space $X$, equipped with the $W_p$ metric and binary convex sums, form the free complete Wasserstein algebra of order $p$ over $X$. To do this, we first characterise the free Wasserstein algebra of order $p$ over a metric space; the characterisation follows straightforwardly from the well-known characterisation of the free barycentric algebra over a set as the natural such algebra on the finite probability measures on the set. Having done so, we pass to the case of complete metric spaces using a general theorem on completions of metric algebras, combined with Theorem 2.

Finally, we discuss some loose ends. First, there is a certain disconnect between our work and the quantitative algebraic work mentioned above, in that here we use standard metric spaces, whereas there the more general framework of extended metric spaces is employed, where distances can be infinite. We sketch how to bridge this disconnect at the end of Section 3. Second, there are other natural algebraic approaches to probability. Convex spaces are an algebraic formulation of finite convex combinations $\sum_{i=1}^n r_i x_i$. Midpoint algebras have a midpoint operation $x \oplus y$: one can think of $x \oplus y$ as providing a fair choice between $x$ and $y$, and so as being equivalent to $x +_{1/2} y$. We sketch how our algebraic characterisation of the Wasserstein monads on complete metric spaces can be rephrased in terms of either of these alternative approaches.

We affectionately dedicate this paper to Furio Honsell on the occasion of his 60th birthday.

## 2. The Wasserstein distance

We begin with some technical preliminaries on Radon probability measures and couplings and their support. For general background on probability measures on topological and metric spaces see [7, 19]. By probability measure we mean a Borel probability measure. Given such a probability measure $\mu$ on a Hausdorff space $X$, we say that a Borel set $B$ is compact inner regular (for $\mu$) if:

$$\mu(B) = \sup\{\mu(C) \mid C \text{ compact}, C \subseteq B\}$$

Then $\mu$ is Radon if all Borel sets are compact inner regular for it, and tight if $X$ is compact inner regular for it (equivalently, if for any $\varepsilon > 0$ there is a compact set $C$ such that $\mu(X \setminus C) < \varepsilon$). Every Radon measure is tight, every tight probability measure on a metric space is Radon, and every probability measure on a separable complete metric space is tight.

The support of a probability measure $\mu$ on a topological space $X$ is

$$\mathrm{supp}(\mu) \stackrel{\mathrm{def}}{=} \{x \in X \mid \mu(U) > 0 \text{ for all open } U \text{ containing } x\}$$

Note that the support is always a closed set. If $\mu$ is Radon then $\mathrm{supp}(\mu)$ has measure 1. If the support of $\mu$ is finite then $\mu$ is Radon and can be written uniquely, up to order, as a finite convex sum of Dirac measures, viz:

$$\mu = \sum_{s \in \mathrm{supp}(\mu)} \mu(s)\,\delta(s)$$

(writing $\mu(s)$ instead of $\mu(\{s\})$). We say that $\mu$ is rational if all the $\mu(s)$ are.

The following very useful lemma is due to Basso. It enables us, as he did, to establish results about probability measures on metric spaces by applying results about probability measures on separable metric spaces to their supports.

Every probability measure on a metric space has separable support.

###### Proof.

Let $\mu$ be a probability measure on a metric space $X$. For every $n \ge 1$ let $A_n$ be a maximal set of points in $\mathrm{supp}(\mu)$ with the property that all points are at distance $\ge 1/n$ apart. As any two open balls with centre in $A_n$ and radius $1/2n$ are disjoint and each such ball has $\mu$-measure $> 0$ (its centre lies in the support), $A_n$ is countable (were it uncountable, we would have an uncountable collection of positive reals with every countable subcollection having sum at most 1, and no such collection exists). By the maximality of $A_n$, any point in $\mathrm{supp}(\mu)$ is at distance $< 1/n$ from some point in $A_n$. It follows that the countable set $\bigcup_{n \ge 1} A_n$ is dense in $\mathrm{supp}(\mu)$. ∎
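The maximality argument in this proof can be mimicked on a finite sample of points. The Python below is only an illustrative sketch under stated assumptions: the points live in the Euclidean plane, the sample is finite, and the helper names `maximal_separated_subset` and `covers` are hypothetical. A greedy pass keeps a point only if it is at distance at least ε from every point kept so far; the result is maximal among ε-separated subsets, so every sample point lies within distance < ε of a kept point, which is the covering property used above with ε = 1/n.

```python
import math

def maximal_separated_subset(points, eps):
    """Greedily pick points pairwise at distance >= eps; the result is maximal."""
    chosen = []
    for x in points:
        # keep x only if it is eps-far from everything chosen so far
        if all(math.dist(x, c) >= eps for c in chosen):
            chosen.append(x)
    return chosen

def covers(points, chosen, eps):
    """Consequence of maximality: every point is at distance < eps from a chosen one."""
    return all(any(math.dist(x, c) < eps for c in chosen) for x in points)

# a finite grid of sample points in the unit square
pts = [(i / 10, j / 10) for i in range(11) for j in range(11)]
net = maximal_separated_subset(pts, 0.25)
print(len(net), covers(pts, net, 0.25))
```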

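A coupling between two probability measures $\mu$ on $X$ and $\nu$ on $Y$, where $X$ and $Y$ are topological spaces, is a probability measure $\gamma$ on $X \times Y$ whose left and right marginals (= pushforwards along the projections) are, respectively, $\mu$ and $\nu$. We gather some facts about such couplings.

Let $\gamma$ be a coupling between probability measures $\mu$ on $X$ and $\nu$ on $Y$, where $X$ and $Y$ are topological spaces. Then:

$$\mathrm{supp}(\gamma) \subseteq \mathrm{supp}(\mu) \times \mathrm{supp}(\nu)$$

Further, if $\mu$ and $\nu$ are tight, so is $\gamma$. Moreover, in the case that $X$ and $Y$ are metric spaces, if $\mu$ and $\nu$ are Radon, so is $\gamma$.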

###### Proof.

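For the first part, suppose that $\langle x, y\rangle \in \mathrm{supp}(\gamma)$ and let $U$ be an open neighbourhood of $x$. Then $U \times Y$ is an open neighbourhood of $\langle x, y\rangle$ and so $\mu(U) = \gamma(U \times Y) > 0$. So $x \in \mathrm{supp}(\mu)$. Similarly $y \in \mathrm{supp}(\nu)$.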

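For the second part, choose $\varepsilon > 0$. As $\mu$ and $\nu$ are tight, $X$ and $Y$ are compact inner regular for them. So there are compact sets $C \subseteq X$ and $D \subseteq Y$ such that $\mu(X \setminus C) < \varepsilon/2$ and $\nu(Y \setminus D) < \varepsilon/2$. Then we have:

$$\begin{aligned}
\gamma((X \times Y) \setminus (C \times D)) &= \gamma((X \times Y) \setminus ((C \times Y) \cap (X \times D)))\\
&= \gamma(((X \times Y) \setminus (C \times Y)) \cup ((X \times Y) \setminus (X \times D)))\\
&= \gamma(((X \setminus C) \times Y) \cup (X \times (Y \setminus D)))\\
&\le \gamma((X \setminus C) \times Y) + \gamma(X \times (Y \setminus D))\\
&= \mu(X \setminus C) + \nu(Y \setminus D)\\
&< \varepsilon
\end{aligned}$$

So, as $C \times D$ is compact and $\varepsilon$ was arbitrary, $X \times Y$ is compact inner regular for $\gamma$, as required. The last part is immediate as all tight probability measures on metric spaces are Radon. ∎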

We now turn to the Wasserstein distance. A probability measure $\mu$ on a metric space $X$ is said to have finite moment of order $p$, where $p \ge 1$, if, for some (equivalently all) $x_0 \in X$, the integral

$$\int d(x_0, -)^p \, d\mu$$

is finite. To see that the existence of the $p$-th moment does not depend on the choice of $x_0$, recollect the inequality $(a + b)^p \le 2^{p-1}(a^p + b^p)$ for $a, b \ge 0$ and $p \ge 1$. Then, for any $x, y, z \in X$, we have:

$$d(x, y)^p \le 2^{p-1}\left(d(x, z)^p + d(z, y)^p\right) \tag{1}$$

and the conclusion follows by taking $z$ to be $x_0$ and $x$ to be any other choice $x_1$, and integrating over $y$. Note that probability measures with finite support have finite moments of all orders.
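For completeness, here is one standard route to the inequality recalled above and to (1), sketched via convexity of $t \mapsto t^p$ on $[0, \infty)$ for $p \ge 1$; the last line is the integrated form that gives independence of the base point:

$$\begin{aligned}
\left(\tfrac{a+b}{2}\right)^p &\le \tfrac{1}{2}\left(a^p + b^p\right) && \text{(convexity, } a, b \ge 0\text{)}\\
(a+b)^p &\le 2^{p-1}\left(a^p + b^p\right)\\
d(x,y)^p \le \left(d(x,z) + d(z,y)\right)^p &\le 2^{p-1}\left(d(x,z)^p + d(z,y)^p\right) && \text{(triangle inequality, monotonicity of } t^p\text{)}\\
\int d(x_1, -)^p \, d\mu &\le 2^{p-1}\left(d(x_1, x_0)^p + \int d(x_0, -)^p \, d\mu\right)
\end{aligned}$$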

One can obtain examples of measures with countable support, and with or without finite moments, by using the fact that the Dirichlet series $\zeta(s) = \sum_{n \ge 1} n^{-s}$ converges for reals $s > 1$ and diverges for $s \le 1$. Taking the natural numbers as a metric space with the usual Euclidean metric $d(m, n) = |m - n|$, one sees that, for $q > 0$, the discrete probability measure $D_q$, where $D_q(n) = \zeta(q+1)^{-1} n^{-(q+1)}$ for $n \ge 1$ and $D_q(0) = 0$, has finite $p$-moment if, and only if, $p < q$.
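As a purely numerical illustration of this example (not a proof), the sketch below computes partial sums of the $p$-moment about $x_0 = 0$ of the measure with masses $\zeta(q+1)^{-1} n^{-(q+1)}$ for $n \ge 1$, omitting the constant factor $\zeta(q+1)^{-1}$. The function name `partial_moment` is hypothetical. For $q = 2$ the partial sums stay below $\zeta(2) = \pi^2/6$ when $p = 1$, but grow without bound when $p = 2$ (they are the harmonic numbers) and when $p = 3$.

```python
def partial_moment(p, q, N):
    """sum_{n=1}^{N} n**p * n**-(q+1): the p-moment about 0 of the example measure,
    truncated at N and without the normalising constant zeta(q+1)**-1."""
    return sum(n ** (p - q - 1) for n in range(1, N + 1))

for p in (1, 2, 3):
    print(p, [partial_moment(p, 2, N) for N in (10**2, 10**4, 10**5)])
```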

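The Wasserstein distance $W_p$ of order $p \ge 1$ is defined between probability measures $\mu$ and $\nu$ on $X$ with finite moments of order $p$ by:

$$W_p(\mu, \nu) = \left(\inf_\gamma \int d_X^p \, d\gamma\right)^{1/p}$$

where $\gamma$ runs over the couplings between $\mu$ and $\nu$. To see that it is well-defined for probability measures with finite $p$-moment, one again invokes (1), this time to get an integrable upper bound on $d_X^p$, as follows:

$$\int d_X^p \, d\gamma \le 2^{p-1}\left(\int d(-, x_0)^p \, d\gamma + \int d(x_0, -)^p \, d\gamma\right) = 2^{p-1}\left(\int d(-, x_0)^p \, d\mu + \int d(x_0, -)^p \, d\nu\right)$$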
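Computing $W_p$ means minimising over couplings, which in general is a linear program. As a minimal sketch, assuming the measures are finitely supported on the real line, the optimisation can be avoided: for $p \ge 1$ it is a standard fact of optimal transport on $\mathbb{R}$ that the monotone (north-west corner) coupling, which matches mass in increasing order of position, is optimal. The Python below relies on that fact and assumes exact `Fraction` masses summing to 1; `monotone_coupling` and `wasserstein_1d` are hypothetical helper names.

```python
from fractions import Fraction

def monotone_coupling(mu, nu):
    """Monotone (north-west corner) coupling of finitely supported measures on R,
    given as dicts point -> Fraction mass; returns triples (x, y, mass)."""
    xs, ys = sorted(mu.items()), sorted(nu.items())
    i = j = 0
    rx, ry = xs[0][1], ys[0][1]  # mass still to move from xs[i], still to fill at ys[j]
    coupling = []
    while i < len(xs) and j < len(ys):
        m = min(rx, ry)
        if m > 0:
            coupling.append((xs[i][0], ys[j][0], m))
        rx, ry = rx - m, ry - m
        if rx == 0:
            i += 1
            rx = xs[i][1] if i < len(xs) else 0
        if ry == 0:
            j += 1
            ry = ys[j][1] if j < len(ys) else 0
    return coupling

def wasserstein_1d(mu, nu, p=1):
    """W_p(mu, nu) on the real line, computed from the monotone coupling."""
    cost = sum(m * abs(x - y) ** p for x, y, m in monotone_coupling(mu, nu))
    return float(cost) ** (1 / p)

half = Fraction(1, 2)
mu = {0: half, 1: half}   # (1/2) delta(0) + (1/2) delta(1)
nu = {0: Fraction(1)}     # delta(0)
print(wasserstein_1d(mu, nu, 1), wasserstein_1d(mu, nu, 2))
```

For this pair the sketch gives $W_1 = 0.5$ and $W_2 = \sqrt{0.5}$, the only coupling being the one that sends both halves of $\mu$ to $0$.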

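The Wasserstein distance is monotonic in $p$, that is, $W_p(\mu, \nu) \le W_q(\mu, \nu)$ for $1 \le p \le q$ and $\mu, \nu$ with finite moments of order $q$. This is an immediate consequence of the inequality $\left(\int f^p \, d\mu\right)^{1/p} \le \left(\int f^q \, d\mu\right)^{1/q}$, which holds for any probability measure $\mu$ and any $f \ge 0$ with $f^q$ integrable w.r.t. $\mu$ (this inequality is itself a straightforward consequence of Hölder's inequality).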
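The inequality behind monotonicity in $p$ can be spot-checked for a finite probability vector. The function `lp_norm` below is a hypothetical helper computing $\left(\sum_i w_i |f_i|^p\right)^{1/p}$; evaluating it for increasing $p$ should give a non-decreasing sequence.

```python
def lp_norm(weights, values, p):
    """(sum_i w_i |f_i|**p)**(1/p) for a finite probability vector `weights`."""
    return sum(w * abs(v) ** p for w, v in zip(weights, values)) ** (1 / p)

w = [0.2, 0.3, 0.5]   # a probability vector
f = [1.0, 4.0, 0.5]   # values of a function f
norms = [lp_norm(w, f, p) for p in (1, 1.5, 2, 3, 8)]
print(norms)
```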

We need two lemmas relating probability measures on a metric space with probability measures on a closed subset of the space. Let $C$ be a closed subset of a metric space $X$. We write $i_*(\mu)$ for the pushforward of a finite measure $\mu$ on $C$ along the inclusion $i$ of $C$ in $X$, so $i_*(\mu)(B) = \mu(B \cap C)$; and we write $r(\nu)$ for the restriction of a finite measure $\nu$ on $X$ to a finite measure on $C$, so $r(\nu)(B) = \nu(B)$ for Borel sets $B \subseteq C$. Note that $r(i_*(\mu)) = \mu$.

Let $C$ be a closed subset of a metric space $X$.

1. If $\mu$ is a Radon probability measure on $C$, then $i_*(\mu)$ is a Radon probability measure on $X$ with the same support as $\mu$. Further, $i_*(\mu)$ has finite moment of order $p$ if $\mu$ does.

2. If $\nu$ is a Radon probability measure on $X$ with support included in $C$ then $r(\nu)$ is a Radon probability measure on $C$, and we have:

$$\nu = i_*(r(\nu))$$

If, further, $\nu$ has finite moment of order $p$, so does $r(\nu)$.

###### Proof.
1. Let $\mu$ be a Radon probability measure on $C$, and note that $C$ must then be non-empty. It is straightforward to check that $i_*(\mu)$ is a Radon probability measure on $X$ with the same support as $\mu$. Next, choosing $x_0 \in C$, as

$$\int d_X(x_0, -)^p \, d(i_*\mu) = \int d_C(x_0, -)^p \, d\mu$$

we see that $i_*(\mu)$ has finite moment of order $p$ if $\mu$ does.

2. Let $\nu$ be a Radon probability measure on $X$ with support included in $C$. It is straightforward to check that $r(\nu)$ is a Radon probability measure on $C$. Regarding the equality, for any Borel set $B$ of $X$ we have:

$$i_*(r(\nu))(B) = r(\nu)(B \cap C) = \nu(B \cap C) = \nu(B)$$

with the last equality holding as $\nu$ is Radon and the support of $\nu$ is included in $C$. Finally, $r(\nu)$ has finite moment of order $p$ if $\nu$ does, as, choosing $x_0 \in C$, we have:

$$\int d_C(x_0, -)^p \, d(r(\nu)) = \int d_X(x_0, -)^p \, d(i_*(r(\nu))) = \int d_X(x_0, -)^p \, d\nu$$

∎

Let $C$ be a closed subset of a metric space $X$. Then

1. For any Radon probability measures $\mu, \nu$ on $C$ with finite moments of order $p$ we have:

$$W_p(\mu, \nu) = W_p(i_*(\mu), i_*(\nu))$$

2. For any Radon probability measures $\mu, \nu$ on $X$ with finite moments of order $p$ whose support is included in $C$, we have:

$$W_p(\mu, \nu) = W_p(r(\mu), r(\nu))$$
###### Proof.
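1. Let $\gamma$ be a coupling between $\mu$ and $\nu$. Then $(i \times i)_*(\gamma)$ is a coupling between $i_*(\mu)$ and $i_*(\nu)$, as $(\pi_1)_*((i \times i)_*(\gamma)) = i_*((\pi_1)_*(\gamma)) = i_*(\mu)$, and similarly for $\nu$. We also have:

$$\int d_X^p \, d((i \times i)_*\gamma) = \int d_C^p \, d\gamma$$

As $\gamma$ was chosen arbitrarily, we therefore have:

$$W_p(\mu, \nu) \ge W_p(i_*(\mu), i_*(\nu))$$

For the reverse inequality, let $\gamma$ be a coupling between $i_*(\mu)$ and $i_*(\nu)$. By Lemma 2.1, $i_*(\mu)$ and $i_*(\nu)$ are Radon. So, by Lemma 2, $\gamma$ is also Radon. Also, $\mathrm{supp}(\gamma) \subseteq C \times C$, since $\mathrm{supp}(\gamma) \subseteq \mathrm{supp}(i_*(\mu)) \times \mathrm{supp}(i_*(\nu))$ and both supports are included in $C$. So, by Lemma 2.2 (applied to the closed subset $C \times C$ of $X \times X$), $r(\gamma)$ is a Radon probability measure on $C \times C$ and $\gamma = (i \times i)_*(r(\gamma))$.

Further, $r(\gamma)$ is a coupling between $\mu$ and $\nu$, for, for any Borel set $B \subseteq C$:

$$r(\gamma)(B \times C) = \gamma(B \times C) = \gamma((B \times X) \cap (C \times C)) = \gamma(B \times X) \;\; (\text{as } \mathrm{supp}(\gamma) \subseteq C \times C) = i_*(\mu)(B) = \mu(B)$$

and similarly for $\nu$. We then have:

$$\int d_X^p \, d\gamma = \int d_X^p \, d((i \times i)_*(r\gamma)) = \int d_C^p \, d(r\gamma)$$

As $\gamma$ was chosen arbitrarily, we therefore have, as required:

$$W_p(i_*(\mu), i_*(\nu)) \ge W_p(\mu, \nu)$$

2. Using the equalities $\mu = i_*(r(\mu))$ and $\nu = i_*(r(\nu))$ of Lemma 2.2, and then part (1), we have:

$$W_p(\mu, \nu) = W_p(i_*(r(\mu)), i_*(r(\nu))) = W_p(r(\mu), r(\nu))$$

∎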

With these technical lemmas established, we can now prove two theorems on spaces of Radon probability measures. For any metric space $X$ and $p \ge 1$, define $\Delta_p(X)$ to be the set of Radon probability measures on $X$ with finite moments of order $p$, equipped with the $W_p$ distance. For the first theorem we use the known result that the triangle inequality holds for all probability measures if $X$ is separable.

Let $X$ be a metric space. Then $\Delta_p(X)$ is a metric space.

###### Proof.

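First, $W_p(\mu, \mu) = 0$ for any $\mu \in \Delta_p(X)$, as $\langle \mathrm{id}, \mathrm{id}\rangle_*(\mu)$, the pushforward of $\mu$ along the diagonal map $x \mapsto \langle x, x\rangle$, is a coupling between $\mu$ and itself, and $d_X^p$ vanishes on the diagonal. For the converse, choose $\mu, \nu \in \Delta_p(X)$, and suppose $W_p(\mu, \nu) = 0$. Then $W_1(\mu, \nu) = 0$, as $W_p$ is monotonic in $p$. It follows that $\mu = \nu$ as $W_1$ is a metric on $\Delta_1(X)$ (see, e.g., [5, 1]).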

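Symmetry is evident. To show the triangle inequality, suppose that $\mu, \nu, \rho \in \Delta_p(X)$. Let $C$ be the closed set $\mathrm{supp}(\mu) \cup \mathrm{supp}(\nu) \cup \mathrm{supp}(\rho)$, separable by Lemma 2. Then $r(\mu)$, $r(\nu)$ and $r(\rho)$ are probability measures on the separable space $C$, and so by the known result cited above, we have $W_p(r(\mu), r(\rho)) \le W_p(r(\mu), r(\nu)) + W_p(r(\nu), r(\rho))$. Then, by part 2 of each of the two lemmas on closed subsets, we see that $W_p(\mu, \rho) \le W_p(\mu, \nu) + W_p(\nu, \rho)$, as required. ∎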

For the second theorem we use the result that the metric space of all probability measures with finite moments of order $p$ on a complete and separable space is also complete and separable, being generated by the rational measures with finite support in a countable basis of the space.

Let $X$ be a complete metric space. Then $\Delta_p(X)$ is a complete metric space generated by the finitely supported probability measures on $X$.

###### Proof.

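To show that $\Delta_p(X)$ is complete, let $(\mu_n)_{n \in \mathbb{N}}$ be a Cauchy sequence in $\Delta_p(X)$. Let $C$ be the closure in $X$ of $\bigcup_n \mathrm{supp}(\mu_n)$. By Lemma 2, each set $\mathrm{supp}(\mu_n)$ is separable, and so $C$ is itself separable, being the closure of a countable union of separable sets. Further, $C$ is complete as it is a closed subset of a complete space. So, by the result just cited (all probability measures on $C$ being Radon), $\Delta_p(C)$ is complete. Applying part 2 of each of the two lemmas on closed subsets, we see that $(r(\mu_n))_n$ is a Cauchy sequence in $\Delta_p(C)$. Let $\mu$ be its limit there. As Lemma 2.1 shows that $i_*$ is an isometric embedding, we see that $(i_*(r(\mu_n)))_n$ is a Cauchy sequence in $\Delta_p(X)$ with limit $i_*(\mu)$. But, by Lemma 2.2, $i_*(r(\mu_n))$ is $\mu_n$.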

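To see that the finitely supported probability measures are dense in $\Delta_p(X)$, choose $\mu \in \Delta_p(X)$ and $\varepsilon > 0$. Then, taking $C$ to be the separable closed set $\mathrm{supp}(\mu)$, we have $r(\mu) \in \Delta_p(C)$. Then, as $\Delta_p(C)$ is generated by the rational measures with finite support, there is a finitely supported probability measure $\nu$ on $C$ at distance $< \varepsilon$ from $r(\mu)$, and so we see that $i_*(\nu)$ is at distance $< \varepsilon$ from $\mu$. Finally, $i_*(\nu)$ is finitely supported as, by Lemma 2.1, it has the same support as $\nu$. ∎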

It is interesting to consider how the different Wasserstein metrics $W_p$ generate the various $\Delta_p(X)$ starting from the same basis, i.e., the probability measures with finite support. As $W_p$ is monotonic in $p$, when $p \le p'$ any sequence Cauchy in the $W_{p'}$ metric is also Cauchy in the $W_p$ metric. So the difference must be that sequences of probability measures with finite support can be Cauchy in the $W_p$ metric, but not in the $W_{p'}$ metric, when $p < p'$. Examples of this phenomenon can be found by building on Example 2:

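Set

$$D_{q,m}(n) = \begin{cases} 0 & (n = 0)\\ \zeta(q+1)^{-1}\,\dfrac{1}{n^{q+1}} & (1 \le n \le m)\\ 1 - \zeta(q+1)^{-1}\displaystyle\sum_{i=1}^{m} \frac{1}{i^{q+1}} & (n = m+1)\\ 0 & (n > m+1) \end{cases}$$

Note that $D_{q,m}$ and $D_q$ agree for $n \le m$, but the rest of the mass, which $D_q$ spreads over all $n > m$, is concentrated by $D_{q,m}$ at $m+1$. Then the sequence $(D_{q,m})_{m \ge 1}$ converges to $D_q$ in the $W_p$ metric when $p < q$, but is not Cauchy w.r.t. $W_p$ when $p \ge q$.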
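A rough numerical illustration of this example, under stated approximations: the sketch below estimates $W_p(D_{q,m}, D_{q,M})^p$ for $m < M$ using the monotone coupling on $\mathbb{N} \subseteq \mathbb{R}$ (assumed optimal for $p \ge 1$), in which all of the mass of $D_{q,M}$ on $m+1, \ldots, M+1$ is moved to $m+1$. Tails $\sum_{n > M} n^{-(q+1)}$, and hence $\zeta(q+1)$, are approximated by a finite sum completed with an integral bound. The names `tail` and `wp_power` are hypothetical. For $q = 2$, the distances between $D_{2,m}$ and $D_{2,2m}$ shrink as $m$ grows when $p = 1$ and grow when $p = 3$, in line with the claims above; this is an illustration, not a proof.

```python
def tail(s, M, cutoff=10**5):
    """Approximate sum_{n > M} n**(-s) for s > 1: exact terms up to `cutoff`,
    plus the integral bound cutoff**(1 - s) / (s - 1) for the remainder."""
    exact = sum(n ** -s for n in range(M + 1, cutoff + 1))
    return exact + cutoff ** (1 - s) / (s - 1)

def wp_power(q, m, M, p):
    """Approximate W_p(D_{q,m}, D_{q,M})**p for m < M via the monotone coupling:
    the mass of D_{q,M} on m+1, ..., M+1 is all moved to m+1."""
    s = q + 1
    zeta = 1 + tail(s, 1)  # zeta(q + 1), approximately
    inner = sum(n ** -s * (n - m - 1) ** p for n in range(m + 2, M + 1))
    return (inner + tail(s, M) * (M - m) ** p) / zeta

for p in (1, 3):
    print(p, [round(wp_power(2, m, 2 * m, p), 4) for m in (10, 100, 1000)])
```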

We pause our development to examine the topologies induced by the Wasserstein metrics. In the case of complete separable metric spaces it is known that these metrics topologise a suitable notion of weak convergence; see [23, Theorem 7.12]. It turns out that we can again use our techniques to generalise these results to all complete metric spaces.

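Given a metric space $X$ and probability measures $\mu_i$ ($i \in \mathbb{N}$) and $\mu$ on $X$, we say that the $\mu_i$ converge weakly to $\mu$, and write $\mu_i \longrightarrow \mu$, if, for all continuous bounded $f: X \to \mathbb{R}$, we have $\int f \, d\mu_i \to \int f \, d\mu$; an equivalent formulation is that $\liminf_i \mu_i(U) \ge \mu(U)$ for every open set $U$ (see [19, Theorem 6]).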

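Let $C$ be a closed subset of a metric space $X$ and let $\mu_i$ ($i \in \mathbb{N}$) and $\mu$ be probability measures on $C$. Then we have:

$$\mu_i \longrightarrow \mu \iff i_*(\mu_i) \longrightarrow i_*(\mu)$$

where $i: C \to X$ is the inclusion map.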

###### Proof.

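We use the equivalent formulation of weak convergence in terms of measures of open sets. In one direction, suppose that $\mu_i \longrightarrow \mu$. Then $i_*(\mu_i) \longrightarrow i_*(\mu)$, since for every open subset $U$ of $X$, $U \cap C$ is open in $C$, and for any probability measure $\rho$ on $C$ we have $i_*(\rho)(U) = \rho(U \cap C)$. In the other direction, suppose that $i_*(\mu_i) \longrightarrow i_*(\mu)$. Then $\mu_i \longrightarrow \mu$, since for any open set $V$ of $C$ there is an open set $U$ of $X$ such that $V = U \cap C$, and then, for any probability measure $\rho$ on $C$, we have $\rho(V) = i_*(\rho)(U)$. ∎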

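Let $C$ be a separable closed subset of a complete metric space $X$, and let $i: C \to X$ be the inclusion map. Then, for any probability measures $\mu_i$ ($i \in \mathbb{N}$) and $\mu$ on $C$, and any $x_0 \in C$, the following are equivalent, and hold independently of the choice of $x_0$:

1. $i_*(\mu_i) \longrightarrow i_*(\mu)$ and $\int d_X(x_0, -)^p \, d(i_*(\mu_i)) \to \int d_X(x_0, -)^p \, d(i_*(\mu))$.

2. $\mu_i \longrightarrow \mu$ and $\int d_C(x_0, -)^p \, d\mu_i \to \int d_C(x_0, -)^p \, d\mu$.

3. For all continuous functions $f: C \to \mathbb{R}$ such that $|f(x)| \le c(1 + d_C(x_0, x)^p)$ for all $x \in C$, for some $c \in \mathbb{R}$, one has $\int f \, d\mu_i \to \int f \, d\mu$.

4. For all continuous functions $f: X \to \mathbb{R}$ such that $|f(x)| \le c(1 + d_X(x_0, x)^p)$ for all $x \in X$, for some $c \in \mathbb{R}$, one has $\int f \, d(i_*(\mu_i)) \to \int f \, d(i_*(\mu))$.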

###### Proof.

As $C$ is separable and complete, by [23, Theorem 7.12], (2) and (3) are equivalent and hold independently of the choice of $x_0$. As, further, (4) trivially implies (1), it suffices to prove that (1) implies (2) and (3) implies (4), for any $x_0 \in C$. The first of these implications follows using Lemma 2 and the fact that $\int d_X(x_0, -)^p \, d(i_*(\mu_i)) = \int d_C(x_0, -)^p \, d\mu_i$, for all $i$, and similarly for $\mu$. For the second of these implications, one again uses Lemma 2 and notes that if $f: X \to \mathbb{R}$ is a continuous function such that $|f(x)| \le c(1 + d_X(x_0, x)^p)$ for all $x \in X$, for some $c$, then $f \circ i$ is a continuous function such that $|f(i(x))| \le c(1 + d_C(x_0, x)^p)$ for all $x \in C$, and that $\int f \, d(i_*(\mu_i)) = \int f \circ i \, d\mu_i$, and similarly for $\mu$. ∎

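Let $X$ be a complete metric space and choose $\mu_i$ ($i \in \mathbb{N}$) and $\mu$ in $\Delta_p(X)$. Then the following two conditions hold independently of the choice of $x_0 \in X$ and are equivalent:

1. $\mu_i \longrightarrow \mu$ and $\int d(x_0, -)^p \, d\mu_i \to \int d(x_0, -)^p \, d\mu$,

2. For all continuous functions $f: X \to \mathbb{R}$ such that $|f(x)| \le c(1 + d(x_0, x)^p)$ for all $x \in X$, for some $c \in \mathbb{R}$, one has $\int f \, d\mu_i \to \int f \, d\mu$.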

###### Proof.

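First, choose $x_0, x_0' \in X$. Next, choose a separable closed set $C$ containing $x_0$ and $x_0'$ and including the supports of the $\mu_i$ ($i \in \mathbb{N}$) and $\mu$ (for example, the closure of the union of these supports with $\{x_0, x_0'\}$, which is separable by Lemma 2). Then, taking $r(\mu_i)$ ($i \in \mathbb{N}$) and $r(\mu)$ in Lemma 2, we see that (1) and (2) hold independently of the choice of base point (e.g., whether choosing $x_0$ or $x_0'$) and are equivalent for any such choice, as $i_*(r(\mu_i)) = \mu_i$, for all $i$, and $i_*(r(\mu)) = \mu$. As $x_0$ and $x_0'$ were chosen arbitrarily from $X$, the conclusion follows. ∎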

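If either of the two equivalent conditions of Theorem 2 holds for $\mu_i$ ($i \in \mathbb{N}$) and $\mu$ in $\Delta_p(X)$ and any choice of $x_0 \in X$, where $X$ is a complete metric space, we say that the $\mu_i$ converge $p$-weakly to $\mu$, and write $\mu_i \longrightarrow_p \mu$.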

For any complete metric space $X$, the Wasserstein distance $W_p$ metricises $p$-weak convergence in $\Delta_p(X)$.

###### Proof.

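Choose $\mu_i$ ($i \in \mathbb{N}$) and $\mu$ in $\Delta_p(X)$; we show that $\mu_i \longrightarrow_p \mu$ iff $\mu_i$ converges to $\mu$ w.r.t. the Wasserstein distance in $\Delta_p(X)$. To this end, let $C$ be a closed separable set containing the supports of the $\mu_i$ and $\mu$ (and so necessarily nonempty), set $\nu_i = r(\mu_i)$ and $\nu = r(\mu)$, and let $x_0$ be an element of $C$. By [24, Theorem 6.9] we have that $\nu_i \longrightarrow_p \nu$ iff $\nu_i$ converges to $\nu$ w.r.t. the Wasserstein distance in $\Delta_p(C)$. So it suffices to show (i) that $\mu_i \longrightarrow_p \mu$ iff $\nu_i \longrightarrow_p \nu$, and (ii) that $\mu_i$ converges to $\mu$ w.r.t. the Wasserstein distance in $\Delta_p(X)$ iff $\nu_i$ converges to $\nu$ w.r.t. the Wasserstein distance in $\Delta_p(C)$.

That (i) holds follows from the definition of $p$-weak convergence and the equivalence between parts (1) and (2) of Lemma 2 applied to the $\nu_i$, $\nu$, and $x_0$. That (ii) holds follows from the fact that, by Lemma 2.1, the inclusion $i_*$ of $\Delta_p(C)$ in $\Delta_p(X)$ is an isometric embedding. ∎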

## 3. Wasserstein algebras

We begin with an account of barycentric algebras. We then move on to quantitative algebras, by which we mean algebras over metric spaces. After some general considerations on these we move on to Wasserstein algebras of order $p$, and our main theorem, characterising the Wasserstein monads $\Delta_p$.

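A barycentric algebra (or abstract convex set) is a set $A$ equipped with binary operations $+_r$ for every real number $r \in [0,1]$ such that the following equational laws hold:

$$\begin{aligned}
x +_1 y &= x && \text{(B1)}\\
x +_r x &= x && \text{(B2)}\\
x +_r y &= y +_{1-r} x && \text{(SC)}\\
(x +_p y) +_r z &= x +_{pr}\left(y +_{\frac{r - pr}{1 - pr}} z\right) \quad \text{provided } r < 1,\ p < 1 && \text{(SA)}
\end{aligned}$$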

SC stands for skew commutativity and SA for skew associativity. Homomorphisms of barycentric algebras are termed affine.
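A quick sanity check of these laws: the rationals, with $x +_r y = rx + (1-r)y$, form a barycentric algebra. The sketch below (hypothetical helpers `plus` and `check_laws`) verifies (B1), (B2), (SC) and (SA) exactly on a grid of weights, using `fractions.Fraction` to avoid rounding; it checks instances only and does not prove the laws.

```python
from fractions import Fraction

def plus(r, x, y):
    """x +_r y = r*x + (1 - r)*y on the rationals."""
    return r * x + (1 - r) * y

def check_laws(x, y, z, p, r):
    """Check (B1), (B2), (SC) and, when r < 1 and p < 1, (SA) exactly."""
    ok = (plus(1, x, y) == x                        # (B1)
          and plus(r, x, x) == x                    # (B2)
          and plus(r, x, y) == plus(1 - r, y, x))   # (SC)
    if r < 1 and p < 1:                             # side condition of (SA)
        lhs = plus(r, plus(p, x, y), z)
        rhs = plus(p * r, x, plus((r - p * r) / (1 - p * r), y, z))
        ok = ok and lhs == rhs
    return ok

grid = [Fraction(k, 4) for k in range(5)]   # weights 0, 1/4, 1/2, 3/4, 1
x, y, z = Fraction(1, 3), Fraction(-2), Fraction(7, 5)
print(all(check_laws(x, y, z, p, r) for p in grid for r in grid))
```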

One can inductively define finite convex sums in a barycentric algebra by:

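$$\sum_{i=1}^n r_i x_i = x_1 +_{r_1} \sum_{i=2}^n \frac{r_i}{1 - r_1} x_i$$

for $n \ge 2$ and $r_1 < 1$, with the other cases being evident. Note that affine maps preserve such finite convex sums. Finite convex sums have been axiomatised as convex spaces (or convex algebras), where the sums are required to obey the projection axioms:

$$\sum_{i=1}^n \delta_{ik} x_i = x_k$$

where $\delta_{ik}$ is the Kronecker symbol, and the barycentre axioms:

$$\sum_{i=1}^n r_i \left(\sum_{j=1}^m s_{ij} x_j\right) = \sum_{j=1}^m \left(\sum_{i=1}^n r_i s_{ij}\right) x_j$$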

With the above definition of finite sums, barycentric algebras form convex spaces, and indeed the two categories of algebras are equivalent under this correspondence.
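The inductive definition of finite convex sums can be transcribed directly. In the sketch below, assuming the barycentric algebra of rationals with $x +_r y = rx + (1-r)y$ (the helper names `plus` and `convex_sum` are hypothetical), `convex_sum` follows the recursion, with the two evident base cases $n = 1$ and $r_1 = 1$; for this algebra it agrees with the usual weighted average and satisfies the projection axioms.

```python
from fractions import Fraction

def plus(r, x, y):
    """x +_r y = r*x + (1 - r)*y on the rationals."""
    return r * x + (1 - r) * y

def convex_sum(weights, points):
    """sum_i r_i x_i, defined inductively as x_1 +_{r_1} sum_{i>=2} (r_i / (1 - r_1)) x_i."""
    r1, x1 = weights[0], points[0]
    if len(points) == 1 or r1 == 1:   # the evident base cases
        return x1
    rest = [r / (1 - r1) for r in weights[1:]]
    return plus(r1, x1, convex_sum(rest, points[1:]))

w = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
print(convex_sum(w, [0, 6, 12]))                                        # weighted average
print(convex_sum([Fraction(0), Fraction(1), Fraction(0)], [5, 9, 11]))  # projection axiom
```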

Probability measures on a measurable space, and subclasses of them, naturally form barycentric algebras under the standard binary pointwise convex combination operations:

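$$(\mu +_r \nu)(B) = r\mu(B) + (1 - r)\nu(B)$$

In particular, for any set $X$, the barycentric algebra $\Delta_{\mathrm{f}}(X)$ of probability measures over $X$ with finite support is the free barycentric algebra over $X$, with universal arrow the Dirac delta function $\delta: X \to \Delta_{\mathrm{f}}(X)$ (see [18, 20]); if $f: X \to A$ is a map to a barycentric algebra $A$, then the unique affine extension $\bar{f}$ of $f$ along $\delta$ is given by:

$$\bar{f}(\mu) = \sum_{s \in \mathrm{supp}(\mu)} \mu(s) f(s)$$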
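To make the universal property concrete, the sketch below represents finitely supported measures as Python dicts from points to exact `Fraction` masses, implements $\mu +_r \nu$ pointwise, and builds the extension $\bar{f}$ for a map $f$ into the rationals (viewed as a barycentric algebra under $rx + (1-r)y$). The names `plus_measure` and `extend`, and the choice $f(s) = s^2$, are hypothetical; the final check illustrates on one example that $\bar{f}$ extends $f$ along $\delta$ and is affine.

```python
from fractions import Fraction

def plus_measure(r, mu, nu):
    """mu +_r nu for finitely supported measures given as dicts point -> mass."""
    support = set(mu) | set(nu)
    out = {s: r * mu.get(s, 0) + (1 - r) * nu.get(s, 0) for s in support}
    return {s: m for s, m in out.items() if m != 0}  # drop points of mass 0

def extend(f):
    """The affine extension f_bar(mu) = sum over supp(mu) of mu(s) * f(s),
    for f valued in the rationals."""
    return lambda mu: sum(m * f(s) for s, m in mu.items() if m != 0)

fbar = extend(lambda s: s * s)   # hypothetical f(s) = s**2 on the integers
mu = {1: Fraction(1, 2), 2: Fraction(1, 2)}
nu = {2: Fraction(1, 4), 5: Fraction(3, 4)}
r = Fraction(1, 3)
print(fbar({3: Fraction(1)}))    # f_bar(delta(3)) = f(3)
print(fbar(plus_measure(r, mu, nu)) == r * fbar(mu) + (1 - r) * fbar(nu))
```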

Barycentric algebras originate with the work of M.H. Stone, and convex spaces with that of T. Świrszcz. For further bibliographic references and historical discussion, see, e.g., .

Turning to quantitative algebras, we work with the category $\mathbf{Met}$ of metric spaces and nonexpansive maps, and its subcategory $\mathbf{CMet}$ of complete metric spaces. These categories have all finite products, with the one-point metric space as the final object and with the max metric on binary products $X \times Y$, where:

$$d_{X \times Y}(\langle x, y\rangle, \langle x', y'\rangle) = \max\{d_X(x, x'), d_Y(y, y')\}$$

We remark that nonexpansive maps are continuous.

A finitary signature $\Sigma$ is a collection of operation symbols and an assignment of an arity to each; given such a signature, we write $f: n \in \Sigma$ to indicate that $f$ is an operation symbol of arity $n$. A (metric space) quantitative $\Sigma$-algebra is then a metric space $X$ equipped with a nonexpansive function $f_X: X^n \to X$ for each operation symbol $f: n \in \Sigma$. We often omit the suffix $X$ on the operation symbol and also confuse the metric space with the algebra. A homomorphism of $\Sigma$-algebras is a nonexpansive map $h: X \to Y$ such that for all $f: n \in \Sigma$ and $x_1, \ldots, x_n \in X$ we have:

$$h(f_X(x_1, \ldots, x_n)) = f_Y(h(x_1), \ldots, h(x_n))$$

This defines a category of quantitative $\Sigma$-algebras and homomorphisms; it has an evident subcategory of complete quantitative $\Sigma$-algebras.

For any $p \ge 1$, a Wasserstein (barycentric) algebra of order $p$ is a quantitative algebra $A$ forming a barycentric algebra such that for all $x, y, x', y' \in A$ and $r \in [0,1]$ we have:

$$d(x +_r y, x' +_r y')^p \le r\,d(x, x')^p + (1 - r)\,d(y, y')^p \tag{$*$}$$

We remark that the hypothesis of nonexpansiveness of the $+_r$ is redundant as, setting

$$m = \max\{d(x, x'), d(y, y')\}$$

we have:

$$d(x +_r y, x' +_r y') \le \left(r\,d(x, x')^p + (1 - r)\,d(y, y')^p\right)^{1/p} \le \left(r m^p + (1 - r) m^p\right)^{1/p} = m$$
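For intuition about ($*$): the real line with $d(a, b) = |a - b|$ and $x +_r y = rx + (1-r)y$ satisfies it for every $p \ge 1$, because $t \mapsto |t|^p$ is convex. The Python below (hypothetical helpers `plus` and `satisfies_star`) spot-checks the inequality on random inputs, allowing a tiny relative tolerance for floating-point rounding; it is a check of instances, not a proof.

```python
import random

def plus(r, x, y):
    """x +_r y on the real line."""
    return r * x + (1 - r) * y

def satisfies_star(x, y, x2, y2, r, p):
    """Check (*) with d(a, b) = |a - b|, allowing a tiny relative rounding error."""
    lhs = abs(plus(r, x, y) - plus(r, x2, y2)) ** p
    rhs = r * abs(x - x2) ** p + (1 - r) * abs(y - y2) ** p
    return lhs <= rhs * (1 + 1e-9) + 1e-12

random.seed(0)
trials = [(random.uniform(-5, 5), random.uniform(-5, 5), random.uniform(-5, 5),
           random.uniform(-5, 5), random.random(), random.uniform(1, 4))
          for _ in range(1000)]
print(all(satisfies_star(*t) for t in trials))
```

By contrast, the condition fails for $p = 1/2$, e.g. with $x = 1$, $y = x' = y' = 0$ and $r = 1/2$, which is one way to see why the definition requires $p \ge 1$.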

We gather some other remarks about convex combinations. Recollect that a function $f: X \to Y$ between metric spaces is $\alpha$-Hölder continuous if there is a constant $c \ge 0$ such that, for all $x, x' \in X$, we have $d(f(x), f(x')) \le c\,d(x, x')^\alpha$.

Let $A$ be a Wasserstein algebra of order $p$. Then:

1. The functions $+_r$ are $r^{1/p}$-Lipschitz in their first argument and $(1-r)^{1/p}$-Lipschitz in their second argument.

2. Considered as a function of $r$, $x +_r y$ is $\frac{1}{p}$-Hölder continuous.

3. The following generalisation of equation ($*$) to finite convex sums holds:

$$d\left(\sum_{i=1}^n r_i x_i, \sum_{i=1}^n r_i x_i'\right)^p \le \sum_{i=1}^n r_i\, d(x_i, x_i')^p$$
###### Proof.
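1. Taking $y = y'$ (resp. $x = x'$) in ($*$) we respectively obtain:

$$d(x +_r y, x' +_r y) \le r^{1/p} d(x, x') \quad\text{and}\quad d(x +_r y, x +_r y') \le (1 - r)^{1/p} d(y, y')$$

as required.

2. Fix $x, y \in A$ and choose $r, s \in [0,1]$. Suppose that $r \ge s$ and set $e = r - s$. Assuming $e < 1$, we have

$$x +_r y = x +_e \left(x +_{s/(1-e)} y\right) \quad\text{and}\quad x +_s y = y +_e \left(x +_{s/(1-e)} y\right)$$

So by part 1 we have

$$d(x +_r y, x +_s y) \le e^{1/p} d(x, y)$$

This also holds when $e = 1$, as then $r = 1$ and $s = 0$, so that $x +_r y = x$ and $x +_s y = y$. As $e = |r - s|$, we therefore have:

$$d(x +_r y, x +_s y) \le d(x, y)\,|r - s|^{1/p}$$

By symmetry this also holds when $r < s$. As $d(x, y)$ does not depend on $r$, it follows that, as a function of $r$, $x +_r y$ is $\frac{1}{p}$-Hölder continuous, as required.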

3. This is a straightforward induction. ∎

For any metric space $X$, we turn $\Delta_p(X)$ into a barycentric algebra by equipping it with the standard convex combination operations. It is evident that $\mu +_r \nu$ has finite moment of order $p$ if $\mu$ and $\nu$ do.

For any metric space $X$, $\Delta_p(X)$ forms a Wasserstein algebra of order $p$.

###### Proof.

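We need only show that for any $\mu, \mu', \nu, \nu' \in \Delta_p(X)$ and $r \in [0,1]$ we have:

$$W_p(\mu +_r \nu, \mu' +_r \nu')^p \le r\,W_p(\mu, \mu')^p + (1 - r)\,W_p(\nu, \nu')^p$$

(recalling that the Wasserstein condition implies nonexpansiveness of the operations). To this end, let $\alpha$ be a coupling between $\mu$ and $\mu'$ and let $\beta$ be a coupling between $\nu$ and $\nu'$. Then $\alpha +_r \beta$ is a coupling between $\mu +_r \nu$ and $\mu' +_r \nu'$, and we have:

$$\int d_X^p \, d(\alpha +_r \beta) = r \int d_X^p \, d\alpha + (1 - r) \int d_X^p \, d\beta$$

The conclusion then follows, as we have:

$$\begin{aligned}
W_p(\mu +_r \nu, \mu' +_r \nu')^p &= \inf_\gamma \int d_X^p \, d\gamma\\
&\le \inf_{\alpha, \beta} \int d_X^p \, d(\alpha +_r \beta)\\
&= \inf_{\alpha, \beta} \left(r \int d_X^p \, d\alpha + (1 - r) \int d_X^p \, d\beta\right)\\
&= r \inf_\alpha \int d_X^p \, d\alpha + (1 - r) \inf_\beta \int d_X^p \, d\beta\\
&= r\,W_p(\mu, \mu')^p + (1 - r)\,W_p(\nu, \nu')^p
\end{aligned}$$

∎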
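The inequality just proved can be spot-checked on finitely supported measures on the real line, under the assumption (a standard fact of one-dimensional optimal transport) that the monotone coupling is optimal there for $p \ge 1$, so that $W_p^p$ can be computed exactly with integer points and `Fraction` masses. The helpers `monotone_coupling`, `wp_cost` and `mix` are hypothetical names for this sketch.

```python
from fractions import Fraction

def monotone_coupling(mu, nu):
    """Monotone (north-west corner) coupling of finitely supported measures on R,
    given as dicts point -> Fraction mass; returns triples (x, y, mass)."""
    xs, ys = sorted(mu.items()), sorted(nu.items())
    i = j = 0
    rx, ry = xs[0][1], ys[0][1]
    coupling = []
    while i < len(xs) and j < len(ys):
        m = min(rx, ry)
        if m > 0:
            coupling.append((xs[i][0], ys[j][0], m))
        rx, ry = rx - m, ry - m
        if rx == 0:
            i += 1
            rx = xs[i][1] if i < len(xs) else 0
        if ry == 0:
            j += 1
            ry = ys[j][1] if j < len(ys) else 0
    return coupling

def wp_cost(mu, nu, p):
    """W_p(mu, nu)**p, exact for integer points, Fraction masses and integer p."""
    return sum(m * abs(x - y) ** p for x, y, m in monotone_coupling(mu, nu))

def mix(r, mu, nu):
    """The convex combination mu +_r nu, pointwise on the union of the supports."""
    return {s: r * mu.get(s, 0) + (1 - r) * nu.get(s, 0) for s in set(mu) | set(nu)}

F = Fraction
mu, mu2 = {0: F(1, 2), 4: F(1, 2)}, {1: F(1)}
nu, nu2 = {2: F(1, 3), 3: F(2, 3)}, {0: F(1, 4), 6: F(3, 4)}
r = F(2, 5)
for p in (1, 2, 3):
    lhs = wp_cost(mix(r, mu, nu), mix(r, mu2, nu2), p)
    rhs = r * wp_cost(mu, mu2, p) + (1 - r) * wp_cost(nu, nu2, p)
    print(p, lhs, rhs, lhs <= rhs)
```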

We next turn to characterising the free Wasserstein algebras of order $p$. For any metric space $X$, let $\Delta_{\mathrm{f},p}(X)$ be the sub-Wasserstein algebra of order $p$ of $\Delta_p(X)$ that consists of the finitely supported probability measures.

For any metric space $X$, $\Delta_{\mathrm{f},p}(X)$ is the free Wasserstein algebra of order $p$ over $X$, with universal arrow the Dirac delta function $\delta: X \to \Delta_{\mathrm{f},p}(X)$.

###### Proof.

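First, $\delta$ is an isometric embedding, since, for any $x, y \in X$, the only coupling between $\delta(x)$ and $\delta(y)$ is $\delta(\langle x, y\rangle)$, and so we have:

$$W_p(\delta(x), \delta(y)) = \left(d(x, y)^p\right)^{1/p} = d(x, y)$$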

Next, let $f: X \to A$ be any nonexpansive map to a Wasserstein algebra $A$ of order $p$. Ignoring the metric, $\Delta_{\mathrm{f},p}(X)$ is the free barycentric algebra over the set $X$ with unit the Dirac function, and the unique affine map $\bar{f}$ extending $f$ along the unit is given by the formula:

$$\bar{f}(\mu) = \sum_{s \in \mathrm{supp}(\mu)} \mu(s) f(s)$$

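So it only remains to show that $\bar{f}$ is nonexpansive. Let $\mu$, $\nu$ be finitely supported probability measures with respective finite non-empty supports $S$ and $T$. Let $\gamma$ be a coupling between them. Then, by Lemma 2, $\gamma$ has support included in $S \times T$. Consequently we can write it as a convex sum, as follows:

$$\gamma = \sum_{s \in S, t \in T} \gamma(\langle s, t\rangle)\,\delta(\langle s, t\rangle)$$

As $\mu$ and $\nu$ are the marginals of $\gamma$, we have:

$$\mu = \sum_{s \in S, t \in T} \gamma(\langle s, t\rangle)\,\delta(s) \quad\text{and}\quad \nu = \sum_{s \in S, t \in T} \gamma(\langle s, t\rangle)\,\delta(t)$$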

We can now calculate:

 d(¯¯¯f(μ),¯¯¯f(ν))=d(¯¯¯f(∑s∈S,t∈Tγ(⟨s,t⟩)δ(s)),¯¯¯f(∑s∈S,t∈Tγ(⟨s,t⟩)δ(t)))=d(∑s∈S,t∈Tγ(⟨s,t⟩)