# Lifting couplings in Wasserstein spaces

This paper makes mathematically precise the idea that conditional probabilities are analogous to path liftings in geometry. The idea of lifting is modelled in terms of the category-theoretic concept of a lens, which can be interpreted as a consistent choice of arrow liftings. The category we study is the one of probability measures over a given standard Borel space, with morphisms given by the couplings, or transport plans. The geometrical picture is even more apparent once we equip the arrows of the category with weights, which one can interpret as "lengths" or "costs", forming a so-called weighted category, which unifies several concepts of category theory and metric geometry. Indeed, we show that the weighted version of a lens is tightly connected to the notion of submetry in geometry. Every weighted category gives rise to a pseudo-quasimetric space via optimization over the arrows. In particular, Wasserstein spaces can be obtained from the weighted categories of probability measures and their couplings, with the weight of a coupling given by its cost. In this case, conditionals allow one to form weighted lenses, which one can interpret as "lifting transport plans, while preserving their cost".

## 1 Introduction

This paper explores a connection between conditional probabilities, and more generally disintegrations, and liftings in geometry and topology.

#### Projections and liftings.

Forming the marginal of a probability measure is a destructive operation. Given a probability measure on a product space $X\times Y$, its marginal on $X$ is often called a projection, borrowing from the language of geometry. The idea is that the marginal distribution captures only the variability in $X$, while the original joint measure may also exhibit variability in $Y$ (as well as possible correlations), in a way that’s analogous to the projection onto the $x$-axis of a geometrical figure in the $xy$-plane. Similar intuitions can be had, more generally, whenever a probability measure is pushed forward along a noninjective measurable map, or when one restricts the measure to a coarser $\sigma$-algebra.

There are several situations, in geometry and related fields, where one has a projection , and one wants to “translate motions in to corresponding motions in ”. This idea is made mathematically precise in different ways, such as the following.

• In algebraic topology, a fibration satisfies in particular a path-lifting property. Given a point and a curve in starting at , we can lift to a curve in starting at which is projected by back to .

• In metric geometry, a submetry can be seen as “lifting distances”. Consider a point , and a point which has distance from . Then we can lift to a point of which has distance from . (See also Section 3.)

• In differential topology, a bundle may be equipped with a connection, which can be interpreted as a way to lift vectors, or (locally) also paths, in a way which preserves concatenation.

We are particularly interested in liftings that preserve composition (of curves, displacements, etc.). These can be effectively modelled via the category-theoretic notion of a lens, which we recall in Section 3.

The main focus of this work is the idea that conditional distributions, and more generally disintegrations, allow one to construct a lifting similar to the ones described above. Namely, take a surjective, measurable function $f\colon X\to Y$ between measurable spaces, and consider a probability measure $p$ on $X$. Take the pushforward measure $f_*p$ on $Y$, and consider a coupling (or transport plan) from $f_*p$ to another measure $q$ on $Y$. Under what conditions can we lift the transport plan on $Y$ to a transport plan on $X$? And when does this notion of lifting preserve the composition of transport plans?

We will show that these liftings exist whenever the function is equipped with the structure of a measurable lens (Statement 5.4). This in particular includes the case of product projections, which at the level of probability measures give marginals. In that case, the lifting is given by forming a particular conditional product (Statement 5.6).

#### Cost functions, distances, and weights.

In the context of transport plans, one is often interested in the cost of a plan whenever the underlying space is equipped with a cost function, for instance, a metric. This is part of the so-called theory of (optimal) transport (see the first chapters of [villani] for an overview).

A cost function on $X$ is a function $c\colon X\times X\to[0,\infty]$. For the purposes of the present paper, we can interpret the quantity $c(x,y)$ as the “cost of going from $x$ to $y$”. We are in particular interested in cost functions such that $c(x,x)=0$, and satisfying the triangle inequality $c(x,z)\le c(x,y)+c(y,z)$. If $X$ is equipped with such a cost function, transport plans can be assigned a cost as well. The cost of a coupling between the distributions $p$ and $q$ can be interpreted as the cost of “moving the mass from the pile $p$ to the pile $q$ according to the specified plan”. If the cost function satisfies a triangle inequality, the costs of transport plans satisfy one too, whenever two plans are composed. The resulting structure is called a weighted category, or normed category (see Remark 2.2 for the terminology), a particular kind of enriched category.

Weighted categories have been around at least since [lawvere], and have recently sparked some interest in different fields of mathematics, including geometry and topological data analysis [interleaving, categorieswithnorms, schroderbernstein]. They can be considered a common generalization of categories and metric spaces. Given objects $X$ and $Y$ of a weighted category, one can consider all the weights of the morphisms $X\to Y$ and take their infimum, the “minimal cost of going from $X$ to $Y$”, and obtain this way a pseudo-quasimetric between the objects. Several metric spaces appearing in mathematics can be obtained in this way, including Wasserstein spaces, as we show in Section 4. We cannot really claim originality of this result: in the traditional construction of the Wasserstein space (for example, as in [villani, Section 6]), the triangle inequality is proven by implicitly constructing a weighted category. The same can be said for the Gromov-Hausdorff distance, as constructed for example in [burago, Proposition 7.3.16 and Exercise 7.3.26]. The notion of weighted category is therefore very “primitive”: it underlies several contemporary geometric constructions.

As we said above, the main theme of this work is the idea of lifting, which is common to both category theory and geometry, and therefore well-suited for treatment in terms of weighted categories. Indeed, lenses in category theory and submetries in metric geometry are very much related. We incorporate the two concepts by defining weighted lenses in Section 3, which are lenses where the liftings are weight-preserving.

The main result of this work shows that under some natural conditions, a metric lens between two spaces gives rise to a weighted lens between the respective categories of couplings (Statement 5.4), making mathematically precise the idea of “lifting transport plans while preserving their cost”.

#### Outline.

In Section 2 we review some of the basic concepts of weighted category theory. The definitions of embedding and of symmetry for weighted categories seem to be new, but can be seen as special cases of the enriched-categorical notion.

In Section 3 we consider lenses, the categorical structures formalizing the idea of “lifting” used in this work. We review two related notions of lens, for sets and for categories, and define weighted versions of them. We then show the link with submetries in metric geometry, which seems to be a new idea.

In Section 4 we define, for each standard Borel space , a category whose objects are probability measures on , whose morphisms are couplings, and where composition is given by means of conditional products. If is equipped with a cost function, we can turn into a weighted category (in different ways, one for each integer, analogously to spaces), which after optimization give the famous Wasserstein spaces.

In Section 5 we state and prove the main statements of this work. Statements 5.2 and 5.3 show that pushforwards of probability measures are functorial, and that pushforwards along embeddings are embeddings. The most important result is Statement 5.4, which shows that a set-based lens between two spaces gives rise to a categorical lens between the respective spaces of probability measures, effectively lifting entire couplings along the chosen lifting of points. In Statement 5.6 we show that this includes the special case of product projections, where the lifting is given by taking the conditional product. All these results come with a weighted version in the metric and pseudo-quasimetric case.

#### Acknowledgements.

I would like to thank Tobias Fritz, Slava Matveev, and Sharwin Rezagholi for the fruitful conversations which eventually led to this work, and hopefully to many more. I would also like to thank Bryce Clarke and Matthew DiMeglio for the pointers on the theory of lenses, and Sam Staton and his whole group for the support and for the interesting discussions.

## 2 Weighted categories

###### Definition 2.1.

A weighted category is a category where to each arrow $f\colon X\to Y$ we assign a value $w(f)\in[0,\infty]$ called the weight of $f$, such that

• All identities have weight zero;

• For every pair of composable arrows $f\colon X\to Y$ and $g\colon Y\to Z$, we have the “triangle inequality”

$$w(g\circ f)\le w(g)+w(f). \tag{2.1}$$

In this work we will mostly consider small categories, where the objects form a set. This will always be implicitly assumed from now on, unless otherwise stated.

###### Remark 2.2.

Some other authors use other names, such as “normed categories”. Both [grandisweights] and [interleaving] call these “weighted categories”. In particular, in [grandisweights] it is argued that these structures are more lax than what one may want from a norm, and are more similar to seminorms. Probably for similar reasons, in [schroderbernstein] these structures are called “seminormed categories”. The original works [lawvere] and [categorienormate] call them “normed categories” (the latter in Italian, “categorie normate”), while [categorieswithnorms] and [schroderbernstein] reserve that term for weighted categories which satisfy (different) special properties. (For the latter, see also our Remark 2.13.) We choose the name “weighted categories” as it seems to be the one giving rise to the least ambiguity.

Readers interested in enriched category theory [basicconcepts] can view weighted categories as enriched categories, where the base of the enrichment is the category of weighted sets [grandisweights, Section 1.4]. We will not explicitly adopt this point of view in this work, except for some conventions. The enrichment approach is worked out in detail in [grandisweights].

Every ordinary category can be considered a weighted category where all arrows have weight zero. Because of that, all the statements that we make for weighted categories also hold for ordinary categories. At the other end of the spectrum, weighted categories also generalize metric spaces, and pseudo-quasimetric spaces.

###### Example 2.3.

A pseudo-quasimetric space (more briefly, pq-metric space) or Lawvere metric space is a set $X$ together with a “cost” function $c\colon X\times X\to[0,\infty]$, such that

• For each $x\in X$, we have $c(x,x)=0$;

• For each $x$, $y$ and $z\in X$, we have the triangle inequality

$$c(x,z)\le c(x,y)+c(y,z).$$

A pq-metric space can be seen as a weighted category with exactly one arrow between any two objects $x$ and $y$, and the quantity $c(x,y)$ is the weight of the arrow.

In analogy to metric spaces, we will sometimes call $c(x,y)$ the distance. Note that in general, for a pq-metric,

• If $c(x,y)=0$, it is not necessarily the case that $x=y$;

• Symmetry is not required: $c(x,y)$ can be different from $c(y,x)$. In particular, one of the two may be zero when the other one is not;

• Infinities are allowed.
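
The two axioms, and the three ways in which a pq-metric can fail to be a metric, are easy to test on a finite set. The following is a minimal sketch; the cost functions `uphill_cost` and `one_way_cost` are hypothetical examples chosen for illustration and do not come from the text.

```python
import math
from itertools import product

def is_pq_metric(points, c):
    """Check the two pq-metric axioms for a cost function c on a finite set."""
    zero_on_diagonal = all(c(x, x) == 0 for x in points)
    triangle = all(c(x, z) <= c(x, y) + c(y, z)
                   for x, y, z in product(points, repeat=3))
    return zero_on_diagonal and triangle

# Hypothetical example: climbing costs the height difference, descending is free.
# This cost is not symmetric, and one direction can be zero when the other is not.
def uphill_cost(x, y):
    return max(y - x, 0)

# Hypothetical example with infinite values: moving up is free, moving down is impossible.
def one_way_cost(x, y):
    return 0 if x <= y else math.inf

points = [0, 1, 2, 3]
print(is_pq_metric(points, uphill_cost))
print(is_pq_metric(points, one_way_cost))
print(uphill_cost(0, 3), uphill_cost(3, 0))
```

Both functions pass the check even though neither is symmetric, and the second one takes the value infinity.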

Here is another example, again from metric geometry.

###### Example 2.4.

Consider the category where

• Objects are metric spaces;

• Morphisms are (all) functions;

• The weight of a morphism is the logarithm of its Lipschitz constant when this logarithm is positive (possibly infinite), and zero when it is negative.

This is a (large) weighted category. (One can also define multiplicative weights and avoid taking the logarithm, see [grandisweights] for more on this.)

Every weighted category gives a pq-metric space by “optimization over the arrows”, as follows.

###### Definition 2.5.

Let $C$ be a weighted category. The optimization of $C$ is the pq-metric space whose points are the objects of $C$, and where the distance is given by

$$c(X,Y)\coloneqq\inf_{f\colon X\to Y}w(f), \tag{2.2}$$

where by convention, if there are no arrows from $X$ to $Y$, the infimum over the empty set gives $\infty$. As can be easily checked, this quantity is indeed a pq-metric.

This construction was “left as exercise” in [lawvere, Introduction, pg. 139-140], and fully worked out (and generalized) in [categorienormate]. It is an instance of the “change of enrichment” construction of enriched category theory.

Note that in general the infimum is not attained.

###### Definition 2.6.

A weighted category is called optimization-complete if for every two objects and of , there exists a morphism of minimal weight. That is, the infimum in (2.2) is attained.

As functors between weighted categories it is useful to take those functors which are “weight-nonincreasing” (or “1-Lipschitz”):

###### Definition 2.7.

Let and be weighted categories. A weighted functor is a functor such that for each morphism of

 w(Ff)≤w(f).

If and are ordinary categories, and we consider them as weighted categories where all the weights are zero, a weighted functor is the same thing as an ordinary functor.

If and are pq-metric spaces and we consider them as weighted categories, a function is a weighted functor if and only if it is 1-Lipschitz. (Since arrows are unique, identities and composition are trivially preserved.) Somewhat conversely, any weighted functor between weighted categories gives a 1-Lipschitz function between the pq-metric spaces obtained via optimization.

The following definition can be seen as the many-arrow analogue of the symmetry of a metric, or as a weighted analogue of dagger categories. Similarly to the unweighted case, denote by the category obtained by reversing all arrows of , with the same weights.

###### Definition 2.8.

A symmetric weighted category is a weighted category equipped with a weighted functor , called the symmetry or dagger, such that

• For each object , (i.e.  is the identity on the objects);

• For each morphism , we have that (i.e.  is an involution, ).

In other words, in a symmetric weighted category to each morphism there is a morphism in the opposite direction, , necessarily of the same weight as . (And moreover, this correspondence preserves identities and composition).

For example, if we take a pq-metric space and consider it as a weighted category as in Example 2.3, it is symmetric as a weighted category if and only if the cost function is symmetric (for example, if it is a metric). Likewise, if a weighted category is symmetric, then the cost function of its optimization is symmetric.

### 2.1 Isomorphisms

###### Definition 2.9.

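Let $X$ and $Y$ be objects in a weighted category $C$. A quasi-isomorphism between $X$ and $Y$ is a pair of morphisms $f\colon X\to Y$ and $g\colon Y\to X$ of finite weight, such that $g\circ f=\mathrm{id}_X$ and $f\circ g=\mathrm{id}_Y$.

An isomorphism between $X$ and $Y$ is a quasi-isomorphism pair $(f,g)$ such that both $f$ and $g$ have weight zero.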

We also write , , and sometimes we denote the pair simply by . If an isomorphism (resp. quasi-isomorphism) between and exists, we say that and are isomorphic (resp. quasi-isomorphic), and we write (resp. ).

Our term “quasi-isomorphism” is unrelated to the notion in homological algebra with the same name.

###### Remark 2.10.

Just as for the definition of weighted categories, different authors use different names for weighted isomorphisms. In general, only invertible morphisms of weight zero tend to behave like “full” isomorphisms. (See the next two examples.) This intuition can be made mathematically precise: considering weighted categories as enriched categories, the invertible morphisms of weight zero are exactly the isomorphisms in the sense of enriched category theory [basicconcepts]. In [schroderbernstein] these morphisms are called norm isomorphisms, in [categorieswithnorms] they are called -isomorphisms.

###### Example 2.11.

If $X$ is a pq-metric space, considered as a weighted category, the points $x$ and $y$ are isomorphic if and only if $c(x,y)=0$ and $c(y,x)=0$. In particular, in a metric space, $x$ and $y$ are isomorphic if and only if $x=y$. The points $x$ and $y$ are quasi-isomorphic if and only if they have finite distances (both ways).

###### Example 2.12.

The isomorphisms in the category of Example 2.4 are precisely the isometries. The quasi-isomorphisms are precisely the bi-Lipschitz maps.

###### Remark 2.13.

Given isomorphic objects $X$ and $Y$ in a weighted category $C$, they will also be isomorphic in the optimization of $C$, i.e. have distance zero in both directions. Conversely, if $X$ and $Y$ are isomorphic in the optimization of $C$, they are not necessarily isomorphic in $C$: all we can say is that for every $\varepsilon>0$, there are morphisms $f\colon X\to Y$ and $g\colon Y\to X$ of weight less than $\varepsilon$. If $C$ is optimization-complete, then there are also morphisms $f\colon X\to Y$ and $g\colon Y\to X$ of weight zero realizing the two minima. However, this still does not guarantee that $f$ and $g$ are inverses. Similar considerations apply to quasi-isomorphisms.

The authors of [schroderbernstein] define a normed category as a weighted category where, in our terminology, for every two objects and we have that in if and only if in . (Recall however that the term “normed category” is used differently by different authors, see Remark 2.2.)

### 2.2 Embeddings

Embeddings of weighted categories are the common generalization of isometric embeddings of metric spaces and of fully faithful functors.

###### Definition 2.14.

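Let $C$ and $D$ be weighted categories. An embedding of $C$ into $D$ is a weighted functor $F\colon C\to D$ such that for every pair of objects $X$ and $Y$ of $C$, the function

$$\mathrm{Hom}_C(X,Y)\to\mathrm{Hom}_D(FX,FY),\qquad f\mapsto Ff,$$

is a weight-preserving bijection.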

###### Example 2.15.

If and are metric spaces, considered as weighted categories, a map is an embedding in the sense above exactly if it is an isometric embedding. Somewhat conversely, an embedding of weighted categories induces an isometric embedding of the pq-metric spaces .

In ordinary category theory, fully faithful functors are injective up to isomorphism, and in metric geometry, isometric embeddings are injective up to zero distances. Both these statements can be considered a special case of the following.

###### Proposition 2.16.

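Let $F\colon C\to D$ be an embedding of weighted categories, and let $X$ and $Y$ be objects of $C$. If $FX$ and $FY$ are isomorphic (resp. quasi-isomorphic) in $D$, then $X$ and $Y$ are already isomorphic (resp. quasi-isomorphic) in $C$.

###### Proof.

Let $f\colon FX\to FY$ be an isomorphism of $D$, with inverse $g\colon FY\to FX$, both of weight zero. Since $F$ is an embedding, it is surjective on morphisms, and so necessarily $f=F\tilde f$ and $g=F\tilde g$ for some $\tilde f\colon X\to Y$ and $\tilde g\colon Y\to X$, also of weight zero. Now,

$$F(\tilde g\circ\tilde f)=F\tilde g\circ F\tilde f=g\circ f=\mathrm{id}_{FX}=F(\mathrm{id}_X),$$

but since $F$ is injective on morphisms, we have $\tilde g\circ\tilde f=\mathrm{id}_X$. Similarly, $\tilde f\circ\tilde g=\mathrm{id}_Y$. Therefore $\tilde f$ and $\tilde g$ form an isomorphism pair between $X$ and $Y$.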

The quasi-isomorphism case works analogously. ∎

## 3 Lenses and submetries

In the setting of weighted categories, we can bring together the concept of lens (from category theory) and the concept of submetry (from metric geometry).

When it comes to lenses, different authors mean different specific concepts under the same name. Here we will use two (related) flavors of lenses: set-based lenses and delta lenses, and we will provide weighted versions of both.

In our context, we can interpret lenses, as well as submetries, in terms of “lifting”. Indeed, suppose that we have two sets and , and a “projection” of onto . We sketch the situation in the following picture.

Given a point of , as in the picture above, we can project it down to a point of . If we now move from to a different point in , can we lift back to a point of which “corresponds to in a canonical way”? Here, “canonical” may mean different things in different contexts. For metric spaces we may want, for example, that . In other situations, we may want a form of parallel transport instead. Lenses and submetries make this intuition precise in these different settings. In particular, set-based lenses can be seen as lifting points, while delta lenses can be seen as lifting paths, in a way that preserves identities and composition.

### 3.1 Lifting points

Here we use the notion of (set-based) lens given for example in [setbased].

###### Definition 3.1.

A set-based lens consists of

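• Two sets $X$ and $Y$;

• A (necessarily surjective) function $f\colon X\to Y$;

• A function $\varphi\colon X\times Y\to X$,

such that the following equations are satisfied for every $x\in X$ and $y,y'\in Y$,

$$f(\varphi(x,y))=y,\qquad \varphi(x,f(x))=x,\qquad \varphi(\varphi(x,y),y')=\varphi(x,y') \tag{3.1}$$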

or equivalently, such that the following diagrams commute,

 (3.2)

where the maps are the product projections onto the respective factors.

We will refer to the conditions (3.1) (or equivalently, (3.2)) as the lens laws. In particular we call them, in order, the lifting law, the identity law, and the composition law. We can interpret the maps and the laws as follows.

• The map $f$ is a projection from $X$ to $Y$;

• The map $\varphi$ takes a point $x\in X$ and a point $y\in Y$, and gives a lifting of $y$ back to a point $\varphi(x,y)\in X$ which in some sense “corresponds to $x$, but above $y$”;

• The lifting law says that indeed, the chosen lifting is in the preimage $f^{-1}(y)$, i.e. “lies above $y$”;

• The identity law says that the point over $f(x)$ corresponding to $x$ is $x$ itself;

• The composition law says that the procedure can be iterated, and the resulting lift does not change.

The definition of set-based lens can be given in any category with products. For example, we are interested in measurable lenses, which are set-based lenses where and are measurable spaces, and the functions and are measurable functions. Note that this notion of lens in an arbitrary category is not the same as the notion of “internal lens” given, for example in [bryce]. More on that later.

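Lenses can be composed. Indeed, if $(f,\varphi)$ is a lens between $X$ and $Y$ and $(g,\psi)$ is a lens between $Y$ and $Z$, the composite lens between $X$ and $Z$ is given by

$$g\circ f\colon X\to Z$$

and

$$X\times Z\to X,\qquad (x,z)\mapsto\varphi\big(x,\psi(f(x),z)\big).$$

This can be interpreted in terms of sequential liftings: given $x\in X$ and $z\in Z$, we first lift $z$ to $Y$ using $\psi$, and then lift the result to $X$ using $\varphi$.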
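
The sequential-lifting description can be tried out on finite sets. The sketch below assumes the first lens is written as a pair (f, φ) from X to Y and the second as a pair (g, ψ) from Y to Z; these names, the helper functions, and the example sets are all hypothetical choices made for illustration.

```python
from itertools import product

def compose(lens_xy, lens_yz):
    """Compose two set-based lenses, each given as a (projection, lifting) pair."""
    f, phi = lens_xy
    g, psi = lens_yz
    def projection(x):
        return g(f(x))
    def lifting(x, z):
        # First lift z to Y above the point f(x), then lift the result to X above x.
        return phi(x, psi(f(x), z))
    return projection, lifting

def satisfies_lens_laws(X, Y, f, phi):
    """Check the lifting, identity and composition laws on finite sets."""
    lifting = all(f(phi(x, y)) == y for x, y in product(X, Y))
    identity = all(phi(x, f(x)) == x for x in X)
    composition = all(phi(phi(x, y), y2) == phi(x, y2)
                      for x, y, y2 in product(X, Y, Y))
    return lifting and identity and composition

# Hypothetical example: X = triples, Y = pairs, Z = single coordinates.
Z = [0, 1]
Y = list(product(Z, [0, 1, 2]))
X = [(a, b, c) for (a, b) in Y for c in [0, 1]]

lens_xy = (lambda x: (x[0], x[1]), lambda x, y: (y[0], y[1], x[2]))
lens_yz = (lambda y: y[0], lambda y, z: (z, y[1]))

gf, lift = compose(lens_xy, lens_yz)
print(satisfies_lens_laws(X, Z, gf, lift))
print(lift((0, 2, 1), 1))
```

In this example the composite lifting replaces only the first coordinate and keeps the other two, as one would expect from two successive product projections.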

Let’s now recall the notion of submetry. Traditionally, it is given as follows. (In the Riemannian geometry literature, a similar definition is sometimes used in which the condition is required to hold only for radii below some fixed bound. We will not use that notion here.)

###### Definition 3.2.

Let $X$ and $Y$ be metric spaces. Denote by $B_X(x,r)$ the closed ball in $X$ of center $x$ and radius $r$, and define $B_Y$ analogously. A map $f\colon X\to Y$ is a submetry if for every $x\in X$ and for every radius $r$ we have that

$$f(B_X(x,r))=B_Y(f(x),r).$$

This definition can be rephrased in the following equivalent way, which is closer to our idea of “lifting”.

###### Definition 3.3.

Let $X$ and $Y$ be metric spaces. A function $f\colon X\to Y$ is a submetry if it is 1-Lipschitz, and for every $x\in X$ and $y'\in Y$ there exists $x'\in X$ such that

$$f(x')=y'\quad\text{and}\quad d(x,x')=d(f(x),y').$$

This definition allows us to draw a link between submetries and lenses.

###### Definition 3.4.

A metric lens is a lens $(f,\varphi)$ where $X$ and $Y$ are metric spaces, $f$ is 1-Lipschitz, and $f$ and $\varphi$ satisfy the requirement that

$$d(x,\varphi(x,y))=d(f(x),y). \tag{3.3}$$

Note that this is not the internal notion of lens in a category of metric spaces: $\varphi$ is not required to be 1-Lipschitz, or even continuous.

Every metric lens is canonically a submetry. Indeed, let $(f,\varphi)$ be a metric lens, and let $x\in X$ and $y'\in Y$. A point $x'\in X$ such that

$$f(x')=y'\quad\text{and}\quad d(x,x')=d(f(x),y')$$

is simply given by $\varphi(x,y')$. Here is a partial converse to this fact.

###### Proposition 3.5.

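Let $f\colon X\to Y$ be a submetry. For every $x\in X$ and $y'\in Y$, choose an $x'\in X$ such that

$$f(x')=y'\quad\text{and}\quad d(x,x')=d(f(x),y').$$

Denote this chosen $x'$ by $\varphi(x,y')$, so that we have a function $\varphi\colon X\times Y\to X$.

Then the pair $(f,\varphi)$ satisfies the lifting and identity axioms of lenses (3.1).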

###### Proof.

The lifting law says that

$$f(\varphi(x,y))=y,$$

which is just the condition that $f(x')=y'$.

The identity law says that

$$\varphi(x,f(x))=x.$$

To prove it, note that by the condition on distances,

$$d(x,\varphi(x,f(x)))=d(f(x),f(x))=0,$$

therefore $\varphi(x,f(x))=x$. ∎

Therefore, one can view a metric lens as “a submetry with a choice of liftings, compatible with composition”.

One can generalize the notion of metric lens to arbitrary pq-metric spaces. (Note that for pq-metric spaces a “submetry” only satisfies the identity law up to distance zero.)

###### Example 3.6.

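A trivial, but important example of lens is given by product projections. Consider sets $Y$ and $Z$, set $X=Y\times Z$ and take the projection on the first factor,

$$f\colon Y\times Z\to Y,\qquad (y,z)\mapsto y.$$

We can construct a lens using the canonical lifting

$$\varphi\colon (Y\times Z)\times Y\to Y\times Z,\qquad \big((y,z),y'\big)\mapsto (y',z).$$

One can interpret this as “choosing the point in $Y\times Z$ which lies over $y'$, but which is at the same height as $(y,z)$”. The three lens laws are easily checked.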
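
As a sanity check, the three lens laws can be verified by brute force for a product projection on small finite sets. This is a minimal sketch; the sets, the function names, and the deliberately faulty lifting `phi_reset` are hypothetical and only serve to show what each law rules out.

```python
from itertools import product

def lens_laws(X, Y, f, phi):
    """Report which of the three lens laws hold on finite sets X and Y."""
    return {
        "lifting": all(f(phi(x, y)) == y for x, y in product(X, Y)),
        "identity": all(phi(x, f(x)) == x for x in X),
        "composition": all(phi(phi(x, y), y2) == phi(x, y2)
                           for x, y, y2 in product(X, Y, Y)),
    }

Y = [0, 1, 2]
Z = ["a", "b"]
X = list(product(Y, Z))

def f(x):
    # Projection onto the first factor.
    return x[0]

def phi(x, y_new):
    # Canonical lifting: move to the new base point, keep the same "height".
    return (y_new, x[1])

def phi_reset(x, y_new):
    # Faulty lifting (for comparison): forgets the height and always uses "a".
    return (y_new, "a")

print(lens_laws(X, Y, f, phi))
print(lens_laws(X, Y, f, phi_reset))
```

The faulty lifting still lands above the right point and is stable under iteration, but it breaks the identity law, since lifting a point over its own projection can change it.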

###### Example 3.7.

One can instantiate the construction of the previous example in any category with products, and more generally in any monoidal category with projections. For example, in the category of metric spaces (or pq-metric spaces) and 1-Lipschitz maps, we get a metric lens. This happens as long as we equip $Y\times Z$ with a metric such that for every $y,y'\in Y$ and $z\in Z$,

$$d((y,z),(y',z))=d(y,y').$$

This is true for the product metric, as well as for most of the metrics of interest on $Y\times Z$.
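
The metric lens condition for a product projection can be checked numerically on a small integer grid. The sketch below is only an illustration: the grid, the three candidate metrics, and the function names are assumptions of this example, and the "stretched" metric is included as a case where the condition fails.

```python
from itertools import product

def d_Y(y1, y2):
    return abs(y1 - y2)

def d_max(p, q):
    # Maximum (sup) metric on the product.
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

def d_sum(p, q):
    # Sum (taxicab) metric on the product.
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def d_stretched(p, q):
    # A metric in which horizontal displacements count twice.
    return 2 * abs(p[0] - q[0]) + abs(p[1] - q[1])

def f(x):
    return x[0]

def phi(x, y_new):
    return (y_new, x[1])

def is_metric_lens(X, Y, d_X):
    """Check that f is 1-Lipschitz and that liftings preserve distances."""
    one_lipschitz = all(d_Y(f(a), f(b)) <= d_X(a, b) for a, b in product(X, X))
    cost_preserving = all(d_X(x, phi(x, y)) == d_Y(f(x), y)
                          for x, y in product(X, Y))
    return one_lipschitz and cost_preserving

Y = list(range(4))
X = list(product(Y, range(3)))

for name, metric in [("max", d_max), ("sum", d_sum), ("stretched", d_stretched)]:
    print(name, is_metric_lens(X, Y, metric))
```

With the stretched metric the projection is still 1-Lipschitz, but a horizontal move in the product costs twice the distance travelled in the base, so the liftings are no longer weight-preserving.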

### 3.2 Lifting paths

We now turn to categories and weighted categories. This time we don’t just want to lift points, but also paths, or arrows. We will consider a weighted analogue of the structure which in [bryce] is called delta lens for ordinary categories and internal lens for internal categories.

We recall the original definition, and then present the weighted version.

###### Definition 3.8.

Let and be categories. Let be a functor, let be a morphism of , and let be an object of with . A lifting of at is an object of such that , together with an arrow of such that .

###### Definition 3.9.

Let and be categories. A delta lens from consists of

• A functor ;

• For each morphism of and each object of with , a chosen lifting of such that moreover , and such that

• The identities are lifted to identities, i.e. , and

• The choice of liftings preserves composition. That is, given and , consider the chosen lifting of , and then, at its codomain consider the lifting of . We require that .

A famous particular case of lens is given by split Grothendieck opfibrations, where liftings are moreover initial in some sense. See for example [bryce] for more on this.

Here is the weighted analogue of liftings.

###### Definition 3.10.

Let and be weighted categories. Let be a weighted functor, let be a morphism of , and let be an object of with . A weight-preserving lifting of at is a lifting of such that .

A weighted lens is defined similarly to a delta lens, but where all the liftings are required to be weight-preserving.

Consider now pq-metric spaces $X$ and $Y$ as weighted categories as in Example 2.3. A weighted lens between them consists of

• A 1-Lipschitz map $f\colon X\to Y$;

• For every $x\in X$ and $y\in Y$, a chosen lifting of $y$ to a point $\varphi(x,y)\in X$ such that $d(x,\varphi(x,y))=d(f(x),y)$.

This is, in other words, a metric lens in the sense of Definition 3.4.

Delta lenses can also be composed, analogously to set-based lenses. For the details, see [bryce].

## 4 The category of couplings

Here we briefly review some basic notions of optimal transport theory, mostly to fix the notation. All the measurable spaces we consider in this work are going to be standard Borel: measurable spaces whose sigma-algebra can be obtained as the Borel sigma-algebra of a Polish space, that is, of a complete, separable metric space.

For a measurable space, denote by its -algebra, and by the space of probability measures over it. If and are measurable spaces, denote by their Cartesian product, equipped with the product sigma-algebra.

###### Definition 4.1.

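Let $X$ and $Y$ be measurable spaces. Let $p$ and $q$ be probability measures on $X$ and $Y$, respectively. A coupling of $p$ and $q$ is a probability measure $s$ on $X\times Y$ such that its marginals on $X$ and $Y$ are $p$ and $q$, respectively.

We denote by $\Gamma(p,q)$ the space of couplings of $p$ and $q$.

Given a coupling $s$ of $p$ and $q$, and if $A\subseteq X$ and $B\subseteq Y$ are measurable subsets, we will write $s(A\times B)$ as shorthand for the measure of the “cylinder”

$$s\big(\pi_1^{-1}(A)\cap\pi_2^{-1}(B)\big)=s\big((A\times Y)\cap(X\times B)\big),$$

where $\pi_1\colon X\times Y\to X$ and $\pi_2\colon X\times Y\to Y$ are the product projections.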

###### Definition 4.2.

Let $X$ be a measurable space, and let $c\colon X\times X\to[0,\infty]$ be a pq-metric. Let $p$ and $q$ be Borel probability measures on $X$, and let $s$ be a coupling of $p$ and $q$. For an integer $k\ge 1$, the $k$-cost of the coupling $s$ is the quantity

$$\mathrm{cost}_k(s)\coloneqq\sqrt[k]{\int_{X^2}c(x,y)^k\,s(dx\,dy)}.$$

Given a standard Borel space $X$, and fixing $k$, we now construct a category whose objects are probability measures on $X$, and whose morphisms are couplings with the weight given by the $k$-cost.

Given a probability measure $p$ on $X$, the identity coupling $I_p$ is given, intuitively, by “a copy of $p$ supported on the diagonal”. Rigorously,

$$I_p(A\times A')\coloneqq p(A\cap A').$$

Therefore, the cost of the identity is zero for every $k$:

$$\mathrm{cost}_k(I_p)^k=\int_{X^2}c(x,y)^k\,I_p(dx\,dy)=\int_X c(x,x)^k\,p(dx)=0.$$
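
On a finite space the integral defining the cost becomes a finite sum, which makes the definition easy to experiment with. The following is a minimal sketch in which a coupling is stored as a dictionary from pairs of points to masses; this representation, the function names, and the example couplings are assumptions of this illustration.

```python
def cost(s, c, k):
    """k-cost of a finitely supported coupling s, given as {(x, y): mass}."""
    total = sum(c(x, y) ** k * mass for (x, y), mass in s.items())
    return total ** (1 / k)

def identity_coupling(p):
    """A copy of p supported on the diagonal."""
    return {(x, x): mass for x, mass in p.items()}

def c(x, y):
    # Hypothetical cost function: the usual distance on the integers.
    return abs(x - y)

p = {0: 0.5, 1: 0.5}

# Two hypothetical transport plans starting from p.
shift = {(0, 1): 0.5, (1, 2): 0.5}                  # move every unit of mass one step
spread = {(0, 0): 0.25, (0, 2): 0.25, (1, 1): 0.5}  # move a quarter of the mass two steps

for k in (1, 2):
    print(k, cost(identity_coupling(p), c, k), cost(shift, c, k), cost(spread, c, k))
```

The identity coupling has cost zero for every k, while the cost of the plan `spread` depends on k, since higher values of k penalize long displacements more strongly.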

In order to define composites, we have to use conditional probabilities.

### 4.1 Composition

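Let $X$ and $Y$ be measurable spaces. Recall that a Markov kernel from $X$ to $Y$ is a mapping assigning to each point $x\in X$ and each measurable subset $B\subseteq Y$ a number

$$\kappa(B|x)\in[0,1]$$

such that, for each $x\in X$, the assignment $B\mapsto\kappa(B|x)$ is a probability measure, and for each measurable $B\subseteq Y$, the assignment $x\mapsto\kappa(B|x)$ is a measurable function.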

Markov kernels are often used to denote conditional probabilities, and this works particularly well in standard Borel spaces. We use the following standard “conditioning” result, which will underlie most of the constructions in the rest of this work. (For a reference, see for example [fremlin, Section 452] and [bogachev, Section 10.4].)

###### Theorem 4.3.

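Let $X$ and $Y$ be standard Borel spaces, let $p$ and $q$ be probability measures on $X$ and $Y$ respectively, and let $s$ be a coupling of $p$ and $q$. There exists a Markov kernel $\overrightarrow{s}$ from $X$ to $Y$ such that for every pair of measurable subsets $A\subseteq X$ and $B\subseteq Y$,

$$\int_A\overrightarrow{s}(B|x)\,p(dx)=s(A\times B);$$

Moreover, such a kernel is unique $p$-almost surely.

We call $\overrightarrow{s}$ the conditional distribution of $Y$ given $X$ (associated to the coupling $s$).

One can also condition on $Y$, obtaining a kernel from $Y$ to $X$. We denote this by $\overleftarrow{s}$. We then have a form of Bayes’ theorem, saying that for $A\subseteq X$ and $B\subseteq Y$ measurable,

$$\int_A\overrightarrow{s}(B|x)\,p(dx)=s(A\times B)=\int_B\overleftarrow{s}(A|y)\,q(dy).$$

A conditional of the identity coupling is given by the indicator function:

$$\overrightarrow{I_p}(A|x)=1_A(x),$$

or equivalently by the map assigning to $x$ the Dirac delta measure $\delta_x$.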

Let’s now turn to the composition of couplings. This is a standard construction: it is used in optimal transport under the name “gluing” [villani, Chapter 1], in statistics, where it’s related to the “conditional product” construction [cprod], and in categorical probability (see [simpson, Example 6.2] and the references therein, as well as the more general construction of [markov, Definition 12.8]).

###### Definition 4.4.

Let , and be probability measures on the standard Borel spaces , and , respectively. Let be a coupling of and , and let be a coupling of and . The distribution on is defined by

for all measurable subsets and .

We have that the marginal of on is

and the same can be done for , so that is a coupling of and . Moreover, does not depend on the choice of the conditionals and , since they are unique -almost everywhere.

The composite coupling gives, up to measure zero, the (Chapman-Kolmogorov) composition of conditionals. That is, for -almost surely , and for all measurable ,

$$\overrightarrow{t\circ s}(C|x) = \int_Y \overrightarrow{t}(C|y)\,\overrightarrow{s}(dy|x).$$

###### Proposition 4.5.

Let be a standard Borel space, and let be a measurable pq-metric. Let , and let be couplings of and and of and , respectively. Then

$$\mathrm{cost}_k(t\circ s) \le \mathrm{cost}_k(s) + \mathrm{cost}_k(t).$$

###### Proof.

Using the triangle inequality for the cost function and then Minkowski’s inequality,

$$
\begin{aligned}
\mathrm{cost}_k(t\circ s) &= \sqrt[k]{\int_{X\times X} c(x,z)^k\,(t\circ s)(dx\,dz)} \\
&\le \sqrt[k]{\int_{X\times X\times X} \big(c(x,y)+c(y,z)\big)^k\,\overleftarrow{s}(dx|y)\,\overrightarrow{t}(dz|y)\,q(dy)} \\
&\le \mathrm{cost}_k(s)+\mathrm{cost}_k(t).
\end{aligned}
$$

∎

Therefore, couplings and their costs form a weighted category, one for each .

###### Proposition 4.6.

Let be a standard Borel space, and let be a measurable pq-metric. For each , the set of probability measures forms a weighted category, where the arrows are couplings with $k$-cost as their weight. We will denote the weighted category by .

###### Proof.

The only properties that need to be checked are associativity and unitality of the composition. Note that these need to hold strictly, not just, for example, almost surely. For unitality, let , and let . We have that for all measurable sets $A, B \subseteq X$,

$$(s\circ I_p)(A\times B) = \int_X 1_A(x)\,\overrightarrow{s}(B|x)\,p(dx) = \int_A \overrightarrow{s}(B|x)\,p(dx) = s(A\times B),$$

and similarly for , so that . For associativity, let , and consider composable couplings , , and . Then for all measurable sets $A, B \subseteq X$,

$$
\begin{aligned}
((u\circ t)\circ s)(A\times B) &= \int_X \overleftarrow{s}(A|y)\,\overrightarrow{u\circ t}(B|y)\,q(dy) \\
&= \int_X \overleftarrow{t\circ s}(A|z)\,\overrightarrow{u}(B|z)\,m(dz) \\
&= (u\circ(t\circ s))(A\times B).
\end{aligned}
$$

∎
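The cost inequality and the strict unit laws can be checked numerically on a small example. This is a sketch under stated assumptions: the space is the hypothetical three-point set {0, 1, 2} with cost |x − y|, couplings are square matrices of exact fractions, and the helper names `compose`, `cost` and `identity` are not taken from the original.

```python
from fractions import Fraction as F

def compose(s, t):
    """Glue s (from p to q) and t (from q to m) along the middle marginal q."""
    n = len(s)
    q = [sum(s[x][y] for x in range(n)) for y in range(n)]
    return [[sum(s[x][y] * t[y][z] / q[y] for y in range(n) if q[y] > 0)
             for z in range(n)] for x in range(n)]

def cost(s, c, k):
    """k-cost of a coupling: k-th root of the integral of c^k against s."""
    n = len(s)
    total = sum(c(x, y) ** k * s[x][y] for x in range(n) for y in range(n))
    return float(total) ** (1.0 / k)

def identity(p):
    """Identity coupling I_p: the measure p placed on the diagonal."""
    n = len(p)
    return [[p[x] if x == y else F(0) for y in range(n)] for x in range(n)]

c = lambda x, y: abs(x - y)          # illustrative cost on X = {0, 1, 2}
s = [[F(1, 4), F(1, 4), F(0)],
     [F(0), F(1, 4), F(1, 4)],
     [F(0), F(0), F(0)]]
t = [[F(0), F(1, 4), F(0)],
     [F(1, 4), F(0), F(1, 4)],
     [F(0), F(1, 4), F(0)]]
p = [sum(row) for row in s]                              # first marginal of s
q = [sum(s[x][y] for x in range(3)) for y in range(3)]   # second marginal of s
ts = compose(s, t)
```

In this example the unit laws hold as equalities of matrices, including at the point of zero mass, which mirrors the remark that unitality has to hold strictly and not only almost surely.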

Optimization on the arrows gives the following pq-metrics,

$$c_k(p,q) = \inf_{s\in\Gamma(p,q)} \mathrm{cost}_k(s) = \inf_{s\in\Gamma(p,q)} \sqrt[k]{\int_{X^2} c(x,y)^k\,s(dx\,dy)}, \tag{4.1}$$

which are the celebrated Wasserstein pq-metrics. Moreover:

• If the cost function on is symmetric, then is a symmetric weighted category, and hence (4.1) is a symmetric cost function as well (i.e. a pseudometric);

• It is well known that the infimum in (4.1) is actually attained in many cases [villani, Theorem 4.1], and in that case is an optimization-complete weighted category.

• If we see as a weighted category (with unique morphisms between any two points), the map assigning to each point the Dirac measure is an embedding of weighted categories.
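On a two-point space the optimization in (4.1) can be carried out by hand, because the couplings of two given measures form a one-parameter family. The sketch below assumes a hypothetical space {0, 1} with c(0,0) = c(1,1) = 0 and possibly different values c(0,1) and c(1,0); the function name and the numbers are illustrative only.

```python
from fractions import Fraction as F

def wasserstein_two_points(a, b, c01, c10, k=1):
    """c_k(p, q) for p = (a, 1-a) and q = (b, 1-b) on {0, 1}.

    Every coupling has the form [[m, a-m], [b-m, 1-a-b+m]] for a single
    parameter m = s(0,0), so the infimum is a one-dimensional problem.
    """
    def cost_to_the_k(m):
        # Only the off-diagonal entries contribute, since c vanishes on the diagonal.
        return c01 ** k * (a - m) + c10 ** k * (b - m)

    lo, hi = max(F(0), a + b - 1), min(a, b)
    # The k-th power of the cost is affine in m, so the infimum is at an endpoint.
    return float(min(cost_to_the_k(lo), cost_to_the_k(hi))) ** (1.0 / k)

forward = wasserstein_two_points(F(1, 2), F(1, 4), c01=2, c10=3)
backward = wasserstein_two_points(F(1, 4), F(1, 2), c01=2, c10=3)
dirac = wasserstein_two_points(F(1), F(0), c01=2, c10=3)
```

With an asymmetric cost the two directions differ, so the result is only a quasimetric, while for Dirac measures the value reduces to the underlying cost c(0,1), in line with the embedding of points as Dirac measures.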

## 5 Main statements

Here are now some results involving the constructions defined in the previous sections. In Section 5.1 we show that pushforwards of probability measures are functorial, and that the pushforward of an embedding is an embedding of (weighted) categories. Most importantly, in Section 5.2 we show that lenses between the underlying spaces induce (weighted) lenses between the respective categories of couplings, where the liftings are formed by suitable conditionals.

### 5.1 Pushforward measures

Recall that given a measurable function and a probability measure , the pushforward of along is the probability measure given by

$$f_\sharp p(B) \coloneqq p(f^{-1}(B))$$

for all measurable subsets . The assignment gives then a function .

Examples of pushforward measures are given by the marginal projections: the marginalization is the pushforward along the product projection .
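For finitely supported measures the pushforward simply adds up the mass of all points with the same image. A minimal sketch, assuming measures stored as dictionaries from points to exact fractions (the function name `pushforward` and the sample coupling are hypothetical):

```python
from fractions import Fraction as F
from collections import defaultdict

def pushforward(f, p):
    """Pushforward of a finitely supported measure p = {point: mass} along f."""
    out = defaultdict(F)             # Fraction() is 0, so missing keys start at 0
    for x, mass in p.items():
        out[f(x)] += mass
    return dict(out)

# An illustrative coupling on {0, 1} x {'a', 'b'}.
s = {(0, 'a'): F(1, 4), (0, 'b'): F(1, 4), (1, 'b'): F(1, 2)}

# Marginals are pushforwards along the two product projections.
first_marginal = pushforward(lambda xy: xy[0], s)
second_marginal = pushforward(lambda xy: xy[1], s)
```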

First, an auxiliary result.

###### Lemma 5.1.

Let be a measurable function between standard Borel spaces. Let be a coupling of and . A conditional for the pushforward coupling between and satisfies

$$\overrightarrow{f^2_\sharp s}(B'|f(x)) = \overrightarrow{s}(f^{-1}(B')|x)$$

for -almost all and all measurable .

###### Proof.

For all measurable subsets , we have that

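$$
\begin{aligned}
\int_{f^{-1}(B)} \overrightarrow{f^2_\sharp s}(B'|f(x))\,p(dx) &= \int_B \overrightarrow{f^2_\sharp s}(B'|y)\,f_\sharp p(dy) \\
&= f^2_\sharp s(B\times B') \\
&= s(f^{-1}(B)\times f^{-1}(B')) \\
&= \int_{f^{-1}(B)} \overrightarrow{s}(f^{-1}(B')|x)\,p(dx).
\end{aligned}
$$

∎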

Now we turn to pushforwards.

###### Proposition 5.2.

Let and be standard Borel spaces. Let be a measurable function. The pushforward mapping induces a functor between the respective categories of couplings, where the pushforward of a coupling is given by the map

or more briefly, .

Moreover, if and are equipped with measurable cost functions, and is 1-Lipschitz, then is a weighted functor for all .

Explicitly, given and measurable subsets the function gives,

$$f^2_\sharp s(B\times B') = s(f^{-1}(B)\times f^{-1}(B')).$$

###### Proof.

For the identity coupling , we have that

$$f^2_\sharp I_p(B\times B') = p(f^{-1}(B)\cap f^{-1}(B')) = p(f^{-1}(B\cap B')) = I_{f_\sharp p}(B\times B')$$

for all measurable subsets . For composition, let , let be a coupling from to , and let be a coupling between and . By Lemma 5.1,

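$$
\begin{aligned}
f^2_\sharp(t\circ s)(B\times B') &= (t\circ s)(f^{-1}(B)\times f^{-1}(B')) \\
&= \int_X \overleftarrow{s}(f^{-1}(B)|x)\,\overrightarrow{t}(f^{-1}(B')|x)\,q(dx) \\
&= \int_X \overleftarrow{f^2_\sharp s}(B|f(x))\,\overrightarrow{f^2_\sharp t}(B'|f(x))\,q(dx) \\
&= \int_Y \overleftarrow{f^2_\sharp s}(B|y)\,\overrightarrow{f^2_\sharp t}(B'|y)\,f_\sharp q(dy) \\
&= \big((f^2_\sharp t)\circ(f^2_\sharp s)\big)(B\times B'),
\end{aligned}
$$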

again for all measurable subsets . Therefore is functorial.

Suppose now that and are equipped with cost functions and . The (-th power of the) cost of is

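$$
\begin{aligned}
\mathrm{cost}_k(f^2_\sharp s)^k &= \int_{Y^2} c_Y(y,y')^k\,f^2_\sharp s(dy\,dy') \\
&= \int_{X^2} c_Y(f(x),f(x'))^k\,s(dx\,dx') \\
&\le \int_{X^2} c_X(x,x')^k\,s(dx\,dx') \\
&= \mathrm{cost}_k(s)^k.
\end{aligned}
$$

∎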
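The cost estimate at the end of the proof can be illustrated on finitely supported couplings. The sketch below only checks that estimate; it assumes the hypothetical spaces {0, 1, 2, 3} and {0, 1} with cost |x − x'|, and the map x ↦ x // 2, which is 1-Lipschitz for these costs. All names are illustrative.

```python
from fractions import Fraction as F
from collections import defaultdict

def push_coupling(f, s):
    """Pushforward of a coupling s = {(x, x2): mass} along f x f."""
    out = defaultdict(F)
    for (x, x2), mass in s.items():
        out[(f(x), f(x2))] += mass
    return dict(out)

def cost_to_the_k(s, c, k):
    """k-th power of the k-cost: the integral of c^k against s."""
    return sum(c(x, x2) ** k * mass for (x, x2), mass in s.items())

c = lambda a, b: abs(a - b)      # same formula used as cost on both spaces
f = lambda x: x // 2             # 1-Lipschitz map {0, 1, 2, 3} -> {0, 1}

s = {(0, 1): F(1, 2), (1, 3): F(1, 4), (2, 2): F(1, 4)}
fs = push_coupling(f, s)
```

In this example the k-th power of the cost drops from 3/2 to 1/4 for k = 2, consistent with the pushforward along a 1-Lipschitz map not increasing the weight of an arrow.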

It is easy to see that the assignment also preserves identities and composition, and so we have functors between the following (large) categories:

• For the unweighted case, from the category of standard Borel spaces and measurable maps, to the category of (small) categories and functors;

• For the weighted case, from the category of standard Borel spaces with measurable cost functions and cost-nonincreasing measurable maps, to the category of (small) weighted categories and weighted functors.

This assignment also preserves embeddings, as the next proposition shows.

###### Proposition 5.3.

Let and be standard Borel spaces, and let be a measurable embedding (i.e. an injective function, such that can be written as the pullback -algebra of along the inclusion ). Then the map is a full embedding of categories.

If moreover and are equipped with measurable cost functions and is an isometric embedding, then is an embedding of weighted categories for every .

###### Proof.

Let . Consider the restriction of to the couplings of and , call it again . We have to prove that this new mapping is a (weight-preserving) bijection. To prove that it is injective, consider and suppose that . Since is a measurable embedding, for all measurable subsets we can find subsets such that and . Therefore, for all measurable subsets ,

$$r(A\times A') = r(i^{-1}(B)\times i^{-1}(B')) = i^2_\sharp r(B\times B') = i^2_\sharp r'(B\times B') = r'(A\times A'),$$

so that .

To prove surjectivity, let be a coupling of and . Since is a measurable embedding, for each measurable subset , the (non-inverse) image is again measurable. Construct now the coupling by

$$r(A\times A') \coloneqq s(i(A)\times i(A')),$$

so that . Therefore is a bijection.

Suppose now that and have cost functions and , and that is an isometric embedding. Then given ,

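$$
\begin{aligned}
\mathrm{cost}_k(i^2_\sharp r)^k &= \int_{Y^2} c_Y(y,y')^k\,i^2_\sharp r(dy\,dy') \\
&= \int_{X^2} c_Y(i(x),i(x'))^k\,r(dx\,dx') \\
&= \int_{X^2} c_X(x,x')^k\,r(dx\,dx') \\
&= \mathrm{cost}_k(r)^k,
\end{aligned}
$$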

which means that the bijection is weight-preserving. ∎

### 5.2 Lenses to lenses

We now come to the main result of this work. Namely, we show that measurable lenses give rise to lenses between the categories of couplings. Again, conditionals are the key construction, as the formula (5.1) will show.

###### Theorem 5.4.

Let be a measurable lens between standard Borel spaces. There is a categorical lens between and where

• The projection is the usual pushforward of measures;

• The lifting takes and a coupling whose first marginal has to equal , and returns the coupling given by

$$\tilde\varphi_\sharp(p,s)(A\times A') \coloneqq \int_A \int_Y 1_{A'}(\varphi(x,y))\,\overrightarrow{s}(dy|f(x))\,p(dx) \tag{5.1}$$

for all measurable subsets .

Moreover, if and are equipped with measurable cost functions and is a measurable metric lens, is a weighted lens between and for each .
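The lifting formula (5.1) can be made concrete for finitely supported measures. The sketch below assumes a hypothetical product lens: the total space consists of pairs (y, e), the projection `f` forgets e, and `phi` replaces the first component while keeping e. The costs are |y − y'| on the base and |y − y'| plus a 0/1 penalty for changing e on the total space, so that moving along a lift costs exactly as much as the move in the base. None of these choices come from the original text.

```python
from fractions import Fraction as F
from collections import defaultdict

def f(x):
    """Projection of the hypothetical lens: forget the second component."""
    return x[0]

def phi(x, y):
    """Lifting of the hypothetical lens: move to y, keep the second component."""
    return (y, x[1])

def lift(p, s):
    """Finite version of (5.1).

    p is {x: mass} with positive masses; s is {(y, y2): mass} whose first
    marginal is the pushforward of p along f.
    """
    fp = defaultdict(F)
    for x, mass in p.items():
        fp[f(x)] += mass
    lifted = defaultdict(F)
    for x, mass in p.items():
        for (y, y2), syy in s.items():
            if y == f(x):
                # p(x) times the conditional s(y2 | f(x)), placed at (x, phi(x, y2)).
                lifted[(x, phi(x, y2))] += mass * syy / fp[y]
    return dict(lifted)

def marginal_and_projection(r):
    """First marginal of r, and the pushforward of r along f x f."""
    first, projected = defaultdict(F), defaultdict(F)
    for (x, x2), mass in r.items():
        first[x] += mass
        projected[(f(x), f(x2))] += mass
    return dict(first), dict(projected)

c_Y = lambda y, y2: abs(y - y2)
c_X = lambda x, x2: abs(x[0] - x2[0]) + (0 if x[1] == x2[1] else 1)

p = {(0, 'u'): F(1, 2), (0, 'v'): F(1, 4), (1, 'u'): F(1, 4)}
s = {(0, 0): F(1, 4), (0, 2): F(1, 2), (1, 2): F(1, 4)}
r = lift(p, s)
first, projected = marginal_and_projection(r)
cost_s = sum(c_Y(y, y2) * m for (y, y2), m in s.items())
cost_r = sum(c_X(x, x2) * m for (x, x2), m in r.items())
```

In this example the lifted coupling starts at `p`, projects back to `s`, and has the same 1-cost as `s`, which is the finite picture of lifting a transport plan while preserving its cost.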

In order to better understand the formula (5.1), we can look at its conditional form,

$$\overrightarrow{\tilde\varphi_\sharp(s)}(A|x) = \int_Y 1_A(\varphi(x,y))\,\overrightarrow{s}(dy|f(x)). \tag{5.2}$$

First of all, note that this conditional does not depend on (the joint does). Now (5.2) says that the probability of transitioning from to a point of is intuitively obtained as follows:

• For each , we look at the probability (given by ) of going from to ;

• We lift each point to the point as prescribed by the lens . We can do so measurably;

• We construct a kernel that maps to with the same probability as maps to ;

• We look at the probability, as prescribed by this new kernel, that