# Transport type metrics on the space of probability measures involving singular base measures

We develop the theory of a metric, which we call the ν-based Wasserstein metric and denote by W_ν, on the set of probability measures 𝒫(X) on a domain X ⊆ ℝ^m. This metric is based on a slight refinement of the notion of generalized geodesics with respect to a base measure ν and is relevant in particular for the case when ν is singular with respect to m-dimensional Lebesgue measure; it is also closely related to the concept of linearized optimal transport. The ν-based Wasserstein metric is defined in terms of an iterated variational problem involving optimal transport to ν; we also characterize it in terms of integrations of classical Wasserstein distance between the conditional probabilities and through limits of certain multi-marginal optimal transport problems. As we vary the base measure ν, the ν-based Wasserstein metric interpolates between the usual quadratic Wasserstein distance and a metric associated with the uniquely defined generalized geodesics obtained when ν is sufficiently regular. When ν concentrates on a lower dimensional submanifold of ℝ^m, we prove that the variational problem in the definition of the ν-based Wasserstein distance has a unique solution. We establish geodesic convexity of the usual class of functionals and of the set of source measures μ such that optimal transport between μ and ν satisfies a strengthening of the generalized nestedness condition introduced in [15]. We also present two applications of the ideas introduced here. First, our dual metric is used to prove convergence of an iterative scheme to solve a variational problem arising in game theory. We also use the multi-marginal formulation to characterize solutions to the multi-marginal problem by an ordinary differential equation, yielding a new numerical method for it.


## 1 Introduction

Given probability measures $\mu_0$ and $\mu_1$ on a bounded domain $X \subseteq \mathbb{R}^m$, the Wasserstein distance between them is defined in terms of the infimal value in the Monge-Kantorovich optimal transport problem; that is,

$$W_2(\mu_0,\mu_1) := \sqrt{\inf_{\pi\in\Pi(\mu_0,\mu_1)}\int_{X\times X}|x_0-x_1|^2\,d\pi(x_0,x_1)} \tag{1}$$

where the infimum is over the set $\Pi(\mu_0,\mu_1)$ of joint measures on $X\times X$ whose marginals are $\mu_0$ and $\mu_1$.

Among the many important properties of the Wasserstein distance (reviewed in [18], [19] and [20], for example) is the fact that it is a metric on the set $\mathcal{P}(X)$ of probability measures on $X$. In turn, the geodesics induced by this metric, known as displacement interpolants, play key roles in many problems, both theoretical and applied. Variants of displacement interpolants, known as generalized geodesics, are a natural and important tool in the analysis of problems involving a fixed base measure $\nu$. For example, a variety of problems in game theory, economics and urban planning involve minimizing functionals on $\mathcal{P}(X)$ involving optimal transport to a fixed $\nu$. In particular, we mention the optimal transport based formulation of Cournot-Nash equilibria in game theory of Blanchet-Carlier, in which $\nu$ parameterizes a population of players and one searches for their equilibrium distribution of strategies, parameterized by $\mu \in \mathcal{P}(X)$ [3], [4], [5]. When $\nu$ is absolutely continuous with respect to Lebesgue measure on $X$, these interpolants are uniquely defined, and are in fact geodesics for a metric under which $\mathcal{P}(X)$ is isomorphic to a subset of the Hilbert space $L^2(\nu)$; that is, under this isomorphism, generalized geodesics are mapped to line segments.

On the other hand, it is natural in certain problems to consider singular base measures², and our goal here is to initiate the development of a framework to study these. We study a metric on an appropriate subset of $\mathcal{P}(X)$ for each choice of base measure $\nu$, which we call the $\nu$-based Wasserstein metric; essentially, this metric arises from minimizing the average squared distance among all couplings of $\mu_0$ and $\mu_1$ corresponding to generalized geodesics. It is also very closely related to the concept of linear optimal transport, introduced in [21]³. Linear optimal transport was mainly used in [21] for image comparison problems, and the base measure $\nu$ was often taken to be discrete. In that work, and, to the best of our knowledge, in the considerable literature on linear optimal transport following from it, the theoretical properties of the metric, especially for base measures supported on lower dimensional submanifolds, to which we pay special attention below, have largely been left undeveloped. Geodesics with respect to the $\nu$-based Wasserstein metric will always be generalized geodesics with respect to $\nu$, but, for different structures of $\nu$, very different geometry is induced. In particular, when $\nu$ is absolutely continuous, we obtain the Hilbert space geometry discussed above, whereas when $\nu$ is a Dirac mass, we obtain the Wasserstein metric, independently of the point $y$ parameterizing $\nu=\delta_y$ within this class. We pay special attention to the cases in between, in particular when $\nu$ concentrates on a lower dimensional submanifold of $\mathbb{R}^m$, in which case the problem has a natural interplay with the unequal dimensional optimal transport problem explored in [11], [15] and [16]. In the particular case when $\nu$ concentrates on a line segment, we show that our metric coincides with the layerwise-Wasserstein distance introduced in [14] to analyze anisotropic data such as plants' root shapes.

² For instance, in game theory problems derived from spatial economics, $\nu$ may represent a population of players, parametrized by their location $y$ [5]; it is often the case that the population is essentially concentrated along a one dimensional subset, such as a major highway or railroad.

³ In fact, in the case we are most interested in, when there exists a unique optimal transport between $\nu$ and each of the measures to be compared, the $\nu$-based Wasserstein metric coincides with the metric derived from solving the linear optimal transport problem in [21]. They differ slightly for more general measures; in this case, the $\nu$-based Wasserstein metric is in fact only a semi-metric.

We establish three equivalent characterizations of the $\nu$-based Wasserstein metric, roughly speaking:

1. as an optimal transport problem restricted to couplings which are correlated along $\nu$ (we take this as the definition);

2. by optimally coupling conditional probabilities of $\mu_0$ and $\mu_1$ after disintegrating with respect to optimal transport to $\nu$, and

3. as limits of multi-marginal optimal transport between $\mu_0$, $\mu_1$ and $\nu$.

In many cases of interest, we establish uniqueness of the corresponding geodesics. We also establish geodesic convexity of several functionals which are often studied in optimal transport research. Using the standard terminology (see Section 5 for the definition of these terms), convexity of potential energies, interaction energies and the Wasserstein distance to $\nu$ follows immediately from known results, while convexity of the internal energy requires a new proof. We note that this applies in particular to the layerwise-Wasserstein distance, yielding a far reaching improvement to Corollary 4.2 in [14]. We also show that when $\nu$ concentrates on a lower dimensional submanifold, the set of measures $\mu$ for which the model satisfies a strengthening of the generalized nestedness condition is geodesically convex (we recall that nestedness, introduced in [11], and its higher dimensional generalization from [15] are important properties of unequal dimensional optimal transport problems; when present, they greatly simplify analysis of these problems).

We also introduce a class of metrics which is in a certain sense dual to the $\nu$-based Wasserstein metric, relevant in the case when the measure $\mu$ on $X$ is fixed, and one would like to interpolate between measures on a fixed, lower dimensional submanifold $Y$. This is often the case in a variety of applications. The seemingly natural choice, generalized geodesics with base $\mu$, which essentially interpolates between Kantorovich potentials on the $X$ side, does not generally result in interpolants supported on $Y$. Here, we instead compare and interpolate between Kantorovich potentials on the $Y$ side in order to compare and interpolate between measures on $Y$. This is in a certain sense complementary, or dual, to our original metric on $\mathcal{P}(X)$ (which involves comparing potentials on $Y$ in order to compare measures on $X$, with an additional embedded optimization problem, since potentials on $Y$ do not uniquely determine measures supported on $X$).

We also present two applications of the ideas introduced here. First, we identify conditions under which equilibria in certain game theoretic models are fixed points of a contractive mapping, implying uniqueness of the equilibrium (although this is easily deduced by other methods as well) and, perhaps more importantly, that it can be computed by iterating the mapping. This iteration had already been introduced as a method of computation by Blanchet-Carlier in one dimension, but without a proof of convergence, and in higher dimensions, with a proof of convergence but for simpler interaction terms [4]. Here, we prove that the relevant mapping is a contraction with respect to a variant of the dual metric described above⁴.

⁴ In fact, the actual metric used in the proof differs slightly from the dual metric in general, although they coincide under certain conditions.

Second, we use the characterization of our metric in terms of limits of multi-marginal optimal transport to establish an ordinary differential equation for solutions to the multi-marginal problem and analyze its properties. We note that, even for absolutely continuous base measures, this connection between generalized geodesics and multi-marginal problems does not seem to have been observed before. Since the initial condition in our differential equation arises from solving two-marginal optimal transport problems (which are generally much easier to solve than multi-marginal problems), this ODE yields a numerical method to solve the multi-marginal problem which has certain advantages over existing methods.

The manuscript is organized as follows. In the next section we introduce the $\nu$-based Wasserstein metric, establish several of its basic properties, and also introduce our class of dual metrics. In Section 3, we recall relevant facts about unequal dimensional optimal transport, and prove a new lemma on the structure of optimal plans which will be crucial in subsequent sections. In the fourth section, we identify conditions under which the variational problem arising in the definition of the $\nu$-based Wasserstein metric has a unique solution, and establish a result on the structure of geodesics for the $\nu$-based Wasserstein metric. We use this structure in Section 5 to establish geodesic convexity results. In the sixth section, we identify conditions under which certain game theoretic equilibria can be characterized by fixed points of a contractive mapping with respect to an appropriate metric, while the last section is reserved for the development of an ordinary differential equation description of multi-marginal optimal transport.

## 2 Definition and basic properties

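In what follows, $\nu$ will be a fixed reference measure on a bounded domain $X \subseteq \mathbb{R}^m$. Given measures $\mu_0,\mu_1\in\mathcal{P}(X)$, we denote by $\Pi(\nu,\mu_i)$ the set of probability measures on $X\times X$ whose marginals are $\nu$ and $\mu_i$, for $i=0,1$, and by $\Pi(\nu,\mu_0,\mu_1)$ the set of probability measures on $X\times X\times X$ with $\nu$, $\mu_0$ and $\mu_1$ as marginals. For a measure $\gamma\in\mathcal{P}(X\times X\times X)$, we will denote its first, second and third marginals by $\gamma_y$, $\gamma_{x_0}$ and $\gamma_{x_1}$, respectively. Similarly, we will denote by $\gamma_{yx_0}$, $\gamma_{yx_1}$ and $\gamma_{x_0x_1}$ its projections onto the appropriate product $X\times X$; for example, $\gamma_{yx_0}$ is the push forward of $\gamma$ under the mapping $(y,x_0,x_1)\mapsto(y,x_0)$.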

We let $\Pi_{opt}(\nu,\mu_i)$ be the set of optimal couplings between $\nu$ and $\mu_i$ with respect to optimal transport for the quadratic cost function; that is:

$$\Pi_{opt}(\nu,\mu_i) := \operatorname*{argmin}_{\pi\in\Pi(\nu,\mu_i)}\int_{X\times X}|x_i-y|^2\,d\pi(y,x_i) \tag{2}$$

where $|\cdot|$ denotes the Euclidean norm. We note that we are especially interested here in the case where $\nu$ is singular with respect to Lebesgue measure. Even in this case, $\Pi_{opt}(\nu,\mu_i)$ will very often consist of a single probability measure; in fact, by Brenier's theorem this is the case as soon as $\mu_i$ is absolutely continuous with respect to Lebesgue measure [7]. It will turn out that our notion of distance below is not a metric on all of $\mathcal{P}(X)$, but is when restricted to the set of probability measures on $X$ for which the solution to the optimal transport problem to $\nu$ is unique; that is, the set of $\mu\in\mathcal{P}(X)$ such that $\Pi_{opt}(\nu,\mu)$ is a singleton.

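We can then define our metric with base point $\nu$ as follows.

###### Definition 1.

Let $\nu\in\mathcal{P}(X)$. For $\mu_0,\mu_1\in\mathcal{P}(X)$, we define the $\nu$-based Wasserstein distance as

$$W_\nu(\mu_0,\mu_1) := \sqrt{\inf_{\substack{\gamma\in\mathcal{P}(X\times X\times X)\\ \gamma_{yx_i}\in\Pi_{opt}(\nu,\mu_i),\ i=0,1}}\int_{X\times X\times X}|x_0-x_1|^2\,d\gamma(y,x_0,x_1)}. \tag{3}$$

Note that the glueing lemma (see [18, Lemma 5.5], for example) implies the existence of a $\gamma\in\mathcal{P}(X\times X\times X)$ such that $\gamma_{yx_i}\in\Pi_{opt}(\nu,\mu_i)$ for $i=0,1$; therefore, $W_\nu$ is well defined. Standard arguments imply the existence of a minimizing $\gamma$ in (3).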

###### Remark 2.

This definition is closely related to the concept of linear optimal transport, introduced in [21]. The difference is that in linear optimal transport, a fixed optimal transport $\pi_i\in\Pi_{opt}(\nu,\mu_i)$ is selected for each $\mu_i$ (see equation (3) in [21]), whereas in the definition of $W_\nu$ one minimizes over the entire set $\Pi_{opt}(\nu,\mu_i)$. For measures $\mu_i$ such that $\Pi_{opt}(\nu,\mu_i)$ is a singleton, the two concepts clearly coincide, and it is on this set that $W_\nu$ yields a metric (see Lemma 3 below). Outside of this set, $W_\nu$ still yields a semi-metric, whereas linear optimal transport might be better described as defining a metric on the selected couplings $\pi_i$, since it is dependent on these choices.

The proof of the following lemma is very similar to the proof that the classical Wasserstein distance is a metric (see, for example, [18, Proposition 5.1]). Recall that a semi-metric satisfies the symmetry and identity of indiscernibles axioms in the definition of a metric, but need not satisfy the triangle inequality.

###### Lemma 3.

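$W_\nu$ is a semi-metric on $\mathcal{P}(X)$. It is a metric on the set of measures $\mu\in\mathcal{P}(X)$ for which $\Pi_{opt}(\nu,\mu)$ is a singleton.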

###### Proof.

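It is immediate that $W_\nu(\mu_0,\mu_1)\ge 0$, with equality if and only if $\mu_0=\mu_1$, and that $W_\nu(\mu_0,\mu_1)=W_\nu(\mu_1,\mu_0)$. Therefore, $W_\nu$ is a semi-metric.

It remains to show that $W_\nu$ is a metric on the set of measures with a unique optimal coupling to $\nu$; we must only verify the triangle inequality. Let $\mu_0$, $\mu_1$ and $\mu_2$ belong to this set. Let $\gamma_1$ and $\gamma_2$ in $\mathcal{P}(X\times X\times X)$ be optimal couplings in (3) for the pairs $(\mu_0,\mu_1)$ and $(\mu_0,\mu_2)$; that is, $W_\nu^2(\mu_0,\mu_1)=\int_{X\times X\times X}|x_0-x_1|^2\,d\gamma_1(y,x_0,x_1)$ and $W_\nu^2(\mu_0,\mu_2)=\int_{X\times X\times X}|x_0-x_2|^2\,d\gamma_2(y,x_0,x_2)$. Now note that $(\gamma_1)_{yx_0}$ and $(\gamma_2)_{yx_0}$ are both optimal transports between $\nu$ and $\mu_0$; by the uniqueness assumption, we therefore have $(\gamma_1)_{yx_0}=(\gamma_2)_{yx_0}$. The glueing lemma (the version in [19], Lemma 7.6, is sufficiently general) then implies the existence of a measure $\gamma\in\mathcal{P}(X\times X\times X\times X)$ such that $\gamma_{yx_0x_1}=\gamma_1$ and $\gamma_{yx_0x_2}=\gamma_2$. We note that $\gamma_{yx_1x_2}$ satisfies $(\gamma_{yx_1x_2})_{yx_1}=(\gamma_1)_{yx_1}\in\Pi_{opt}(\nu,\mu_1)$ and $(\gamma_{yx_1x_2})_{yx_2}=(\gamma_2)_{yx_2}\in\Pi_{opt}(\nu,\mu_2)$, so it is a competitor in (3) for the pair $(\mu_1,\mu_2)$. Therefore, we have

$$\begin{aligned}
W_\nu(\mu_1,\mu_2) &\le \sqrt{\int_{X\times X\times X}|x_1-x_2|^2\,d\gamma_{yx_1x_2}(y,x_1,x_2)}\\
&= \sqrt{\int_{X\times X\times X\times X}|x_1-x_2|^2\,d\gamma(y,x_0,x_1,x_2)}\\
&= \|x_1-x_2\|_{L^2(\gamma)}\\
&\le \|x_1-x_0\|_{L^2(\gamma)}+\|x_2-x_0\|_{L^2(\gamma)}\\
&= \sqrt{\int_{X\times X\times X\times X}|x_1-x_0|^2\,d\gamma(y,x_0,x_1,x_2)}+\sqrt{\int_{X\times X\times X\times X}|x_0-x_2|^2\,d\gamma(y,x_0,x_1,x_2)}\\
&= \sqrt{\int_{X\times X\times X}|x_1-x_0|^2\,d\gamma_1(y,x_0,x_1)}+\sqrt{\int_{X\times X\times X}|x_0-x_2|^2\,d\gamma_2(y,x_0,x_2)}\\
&= W_\nu(\mu_0,\mu_1)+W_\nu(\mu_0,\mu_2).
\end{aligned}$$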

The following example confirms that the triangle inequality can fail if we do not restrict to measures with a unique optimal coupling to $\nu$.

###### Example 4.

(Failure of the triangle inequality outside of ) Let , and take , , , , for some . Then any measure is optimal between and . The only optimal transport plan between and , on the other hand, maps to and to . Similarly, the only optimal transport plan between and , maps to and to . We therefore compute that

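$$W_\nu(\mu_0,\mu_1)=\varepsilon=W_\nu(\mu_0,\mu_2)$$

but

$$W_\nu(\mu_1,\mu_2)=2.$$

Therefore, $W_\nu(\mu_1,\mu_2)>W_\nu(\mu_0,\mu_1)+W_\nu(\mu_0,\mu_2)$ for $\varepsilon<1$, and so the triangle inequality fails.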
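The failure of the triangle inequality can be checked numerically. The following Python sketch uses a hypothetical configuration of four two-atom measures in $\mathbb{R}^2$, chosen so that it reproduces the values $\varepsilon$, $\varepsilon$ and $2$ above. The specific atoms, the helper names `optimal_maps` and `w_nu`, and the restriction to couplings induced by permutations of equally weighted atoms are assumptions of the sketch and are not taken from the example. The restriction loses nothing in this configuration, because `mu1` and `mu2` each have a unique optimal plan to `nu`, so the objective in (3) is linear in the one remaining coupling and attains its minimum at a permutation.

```python
from itertools import permutations
from math import isclose, sqrt

def sq_dist(p, q):
    """Squared Euclidean distance between two points of R^2."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def optimal_maps(base, target):
    """All permutations s minimizing sum_k |base[k] - target[s[k]]|^2.

    Atoms carry equal mass, so every permutation is a transport map.
    """
    costs = {
        s: sum(sq_dist(base[k], target[s[k]]) for k in range(len(base)))
        for s in permutations(range(len(target)))
    }
    best = min(costs.values())
    return [s for s, c in costs.items() if isclose(c, best, abs_tol=1e-12)]

def w_nu(base, a, b):
    """nu-based distance, minimizing only over permutation-induced couplings
    (an assumption of this sketch, not the general definition (3))."""
    n = len(base)
    return sqrt(min(
        sum(sq_dist(a[s[k]], b[t[k]]) for k in range(n)) / n
        for s in optimal_maps(base, a)
        for t in optimal_maps(base, b)
    ))

eps = 0.1
nu = [(0.0, 1.0), (0.0, -1.0)]
mu0 = [(1.0, 0.0), (-1.0, 0.0)]    # every coupling with nu is optimal
mu1 = [(1.0, eps), (-1.0, -eps)]   # unique optimal plan to nu
mu2 = [(-1.0, eps), (1.0, -eps)]   # unique optimal plan to nu

d01 = w_nu(nu, mu0, mu1)
d02 = w_nu(nu, mu0, mu2)
d12 = w_nu(nu, mu1, mu2)
print(d01, d02, d12, d12 > d01 + d02)
```

The freedom in coupling `mu0` to `nu` is what lets `mu0` sit close to both `mu1` and `mu2`, while the rigid couplings of `mu1` and `mu2` keep those two far apart.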

Recall that, in general, a curve $z_t$ parametrized by $t\in[0,1]$ in a metric space $(Z,d)$ is a minimizing geodesic if $d(z_s,z_t)=|s-t|\,d(z_0,z_1)$ for all $s,t\in[0,1]$. It is well known that for any $\mu_0,\mu_1\in\mathcal{P}(X)$, where $\mu_0$ is absolutely continuous with respect to Lebesgue measure, there is a unique Wasserstein geodesic joining them, also known as the displacement interpolant and given by $\mu_t=((1-t)I+tT)_\#\mu_0$, where $T$ is the Brenier map between $\mu_0$ and $\mu_1$.

We pause now to describe the $\nu$-based Wasserstein distance for several simple examples of choices for $\nu$.

###### Example 5.

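Let $\nu$ be absolutely continuous with respect to $m$-dimensional Lebesgue measure on $X$. Then by Brenier's theorem [7] there exist unique optimal couplings of the form $\pi_i=(I,\tilde T_i)_\#\nu$ in $\Pi_{opt}(\nu,\mu_i)$, and therefore the only measure $\gamma$ with $\gamma_{yx_i}=\pi_i$ for $i=0,1$ is $\gamma=(I,\tilde T_0,\tilde T_1)_\#\nu$. We then have

$$W_\nu^2(\mu_0,\mu_1)=\int_X|\tilde T_0(y)-\tilde T_1(y)|^2\,d\nu(y)$$

so that the metric space $(\mathcal{P}(X),W_\nu)$ is isometric to a subset of the Hilbert space $L^2(\nu)$. Geodesics for this metric take the form $\mu_t=((1-t)\tilde T_0+t\tilde T_1)_\#\nu$; these are the standard generalized geodesics found in, for example, Definition 7.31 of [18].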
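To illustrate this Hilbert space structure, the following Python sketch replaces $\nu$ by equally weighted sample points and represents each measure by the values of its map at those points. The maps `T0` and `T1` are hypothetical inputs (they are not checked for optimality), and `l2_dist` and `interpolate` are illustrative names. The sketch checks that the interpolation $(1-t)\tilde T_0+t\tilde T_1$ moves at constant speed in $L^2(\nu)$, which is the geodesic property recalled above.

```python
from math import sqrt

def l2_dist(S, T):
    """L^2(nu) distance between two maps sampled at n equally weighted points."""
    n = len(S)
    return sqrt(sum((s[0] - t[0]) ** 2 + (s[1] - t[1]) ** 2
                    for s, t in zip(S, T)) / n)

def interpolate(T0, T1, t):
    """Pointwise interpolation (1 - t) * T0 + t * T1 of the sampled maps."""
    return [((1 - t) * p[0] + t * q[0], (1 - t) * p[1] + t * q[1])
            for p, q in zip(T0, T1)]

# Hypothetical values of the two maps at three sample points of nu.
T0 = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
T1 = [(1.0, 1.0), (3.0, 0.0), (0.0, -2.0)]

total = l2_dist(T0, T1)
s, t = 0.25, 0.75
partial = l2_dist(interpolate(T0, T1, s), interpolate(T0, T1, t))
print(partial, abs(t - s) * total)  # the two numbers agree up to rounding
```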

###### Example 6.

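At the other extreme, suppose $\nu=\delta_y$ is a Dirac mass. Then for any coupling $\pi\in\Pi(\mu_0,\mu_1)$, the measure $\gamma=\delta_y\otimes\pi$ has $\gamma_{yx_i}=\delta_y\otimes\mu_i\in\Pi_{opt}(\nu,\mu_i)$. Since

$$\int_{X\times X\times X}|x_0-x_1|^2\,d\gamma(y,x_0,x_1)=\int_{X\times X}|x_0-x_1|^2\,d\pi(x_0,x_1)$$

we have

$$W_\nu^2(\mu_0,\mu_1)=\inf_{\pi\in\Pi(\mu_0,\mu_1)}\int_{X\times X}|x_0-x_1|^2\,d\pi(x_0,x_1)$$

which is exactly the square of the standard quadratic Wasserstein metric.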

In this paper, we will be especially interested in the cases in between these extremes, when $\nu$ is singular with respect to Lebesgue measure but not a Dirac mass. One of the simplest such cases is the following example.

###### Example 7.

Suppose that $\nu$ concentrates on a line segment and is absolutely continuous with respect to one dimensional Hausdorff measure. It then turns out that the $\nu$-based Wasserstein distance coincides with the layerwise-Wasserstein distance introduced in [14]. The proof of this fact is slightly more involved than the previous two examples, and is included as a separate proposition below (Proposition 10; the definition of the layerwise-Wasserstein distance is recalled in equation (6) below as well).

As we show below, $W_\nu$ is also related to the following multi-marginal optimal transport problem. Fix $\varepsilon>0$ and set

$$MM^\varepsilon_\nu(\mu_0,\mu_1):=\inf_{\gamma\in\Pi(\nu,\mu_0,\mu_1)}\int_{X\times X\times X}\left[\varepsilon|x_0-x_1|^2+|x_0-y|^2+|x_1-y|^2\right]d\gamma(y,x_0,x_1). \tag{4}$$

The following result establishes two different characterizations of the $\nu$-based Wasserstein distance.

###### Proposition 8.

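The following hold:

1. $W_\nu^2(\mu_0,\mu_1)=\int_X W_2^2(\mu_0^y,\mu_1^y)\,d\nu(y)$, where $\mu_i^y$ is the conditional probability given $y$ of the optimal coupling $\pi_i$ between $\nu$ and $\mu_i$.

2. Furthermore, any weak limit point $\bar\gamma$ as $\varepsilon\to 0$ of minimizers $\gamma_\varepsilon$ of the multi-marginal problem (4) is an optimal coupling between $\mu_0$ and $\mu_1$ for the problem (3) defining $W_\nu$.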

###### Proof.

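The first part is almost immediate: for fixed $\pi_i\in\Pi_{opt}(\nu,\mu_i)$, if $\gamma$ is disintegrated with respect to $\nu$, $d\gamma(y,x_0,x_1)=d\gamma^y(x_0,x_1)\,d\nu(y)$, then $\gamma_{yx_i}=\pi_i$ is equivalent to the conditional probability $\gamma^y$ on $X\times X$ having the conditional probabilities $\mu_i^y$ of the $\pi_i$ as marginals for almost every fixed $y$. We can therefore rewrite the integral in (3) as $\int_X\int_{X\times X}|x_0-x_1|^2\,d\gamma^y(x_0,x_1)\,d\nu(y)$. This is minimized by taking $\gamma^y$ to be an optimal coupling between the $\mu_i^y$ for almost every $y$, which yields the desired formula.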

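Turning to the second point, let $\gamma$ be any competitor in the definition of $W_\nu$; that is, assume $\gamma_{yx_i}\in\Pi_{opt}(\nu,\mu_i)$ for $i=0,1$. Since this clearly implies $\gamma\in\Pi(\nu,\mu_0,\mu_1)$, optimality of $\gamma_\varepsilon$ in the multi-marginal problem yields

$$\int_{X\times X\times X}\left[\varepsilon|x_0-x_1|^2+|x_0-y|^2+|x_1-y|^2\right]d\gamma_\varepsilon(y,x_0,x_1)\le\int_{X\times X\times X}\left[\varepsilon|x_0-x_1|^2+|x_0-y|^2+|x_1-y|^2\right]d\gamma(y,x_0,x_1). \tag{5}$$

Taking the limit as $\varepsilon\to 0$ gives

$$\int_{X\times X\times X}\left[|x_0-y|^2+|x_1-y|^2\right]d\bar\gamma(y,x_0,x_1)\le\int_{X\times X\times X}\left[|x_0-y|^2+|x_1-y|^2\right]d\gamma(y,x_0,x_1),$$

or

$$\int_{X\times X}|x_0-y|^2\,d\bar\gamma_{yx_0}(y,x_0)+\int_{X\times X}|x_1-y|^2\,d\bar\gamma_{yx_1}(y,x_1)\le\int_{X\times X}|x_0-y|^2\,d\gamma_{yx_0}(y,x_0)+\int_{X\times X}|x_1-y|^2\,d\gamma_{yx_1}(y,x_1)$$

which immediately implies the optimality of the two-fold marginals $\bar\gamma_{yx_i}$ of $\bar\gamma$ in (2), for $i=0,1$.

Furthermore, the optimality of the two-fold marginals of $\gamma$ means that

$$\int_{X\times X\times X}\left[|x_0-y|^2+|x_1-y|^2\right]d\gamma(y,x_0,x_1)\le\int_{X\times X\times X}\left[|x_0-y|^2+|x_1-y|^2\right]d\gamma_\varepsilon(y,x_0,x_1);$$

combined with (5), this implies that we must have, for all $\varepsilon>0$,

$$\int_{X\times X\times X}|x_0-x_1|^2\,d\gamma_\varepsilon(y,x_0,x_1)\le\int_{X\times X\times X}|x_0-x_1|^2\,d\gamma(y,x_0,x_1).$$

Passing to the limit gives

$$\int_{X\times X\times X}|x_0-x_1|^2\,d\bar\gamma(y,x_0,x_1)\le\int_{X\times X\times X}|x_0-x_1|^2\,d\gamma(y,x_0,x_1).$$

Since this holds for every $\gamma$ with $\gamma_{yx_i}\in\Pi_{opt}(\nu,\mu_i)$, it implies the desired result.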
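The characterization of $W_\nu$ through conditional probabilities lends itself to direct computation when everything is discrete. The Python sketch below assumes that the disintegrations of the optimal couplings with respect to $\nu$ are already known and that, for each atom $y$ of $\nu$, both conditional probabilities are uniform on the same number of points, so that each conditional problem reduces to an assignment problem solved here by brute force. The function names and the sample data are hypothetical.

```python
from itertools import permutations

def w2_sq_uniform(a, b):
    """Squared W_2 between uniform measures on equally many points of R^2.

    For uniform weights an optimal plan can be taken to be a permutation,
    so brute-force enumeration is exact (and only practical for small n).
    """
    n = len(a)
    return min(
        sum((a[k][0] - b[s[k]][0]) ** 2 + (a[k][1] - b[s[k]][1]) ** 2
            for k in range(n)) / n
        for s in permutations(range(n))
    )

def w_nu_sq(nu_weights, cond0, cond1):
    """Sum over atoms y of nu(y) * W_2^2(mu_0^y, mu_1^y)."""
    return sum(w * w2_sq_uniform(c0, c1)
               for w, c0, c1 in zip(nu_weights, cond0, cond1))

# Hypothetical data: nu has two atoms; cond_i[k] lists the atoms of the
# conditional probability of mu_i given the k-th atom of nu.
nu_weights = [0.5, 0.5]
cond0 = [[(0.0, 0.0), (0.0, 1.0)], [(0.0, 2.0), (0.0, 3.0)]]
cond1 = [[(1.0, 0.0), (1.0, 1.0)], [(0.0, 3.0), (0.0, 2.0)]]

value = w_nu_sq(nu_weights, cond0, cond1)
print(value)
```

Mass is never exchanged between different atoms of $\nu$: each conditional problem is solved separately and the results are averaged against $\nu$.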

###### Remark 9.

In this paper, we will pay special attention to the case where $\nu$ is concentrated on a lower dimensional submanifold, parametrized by $f:Y\to X$, where $Y\subseteq\mathbb{R}^n$, with $n<m$, and $f$ is a smooth injection. In this case, by a slight abuse of notation, we will often consider $\nu$ to be a probability measure on $Y$, and the quadratic optimal transport problem in the definition (2) is equivalent to optimal transport between $\mu_i$ on $X$ and $\nu$ on $Y$ with cost function $c(x,y)=|x-f(y)|^2$, or, equivalently, $c(x,y)=-x\cdot f(y)$.

We next consider the case when $\nu$ concentrates on a line; we show below that in this case, $W_\nu$ corresponds to the layerwise-Wasserstein metric from [14]. Recall that the layerwise-Wasserstein distance between measures $\mu_0,\mu_1\in\mathcal{P}(X)$ is given by

$$d_{LW}^2(\mu_0,\mu_1):=W_2^2(\mu_0^V,\mu_1^V)+\int_0^1 W_2^2(\tilde\mu_0^l,\tilde\mu_1^l)\,dl \tag{6}$$

where the $\mu_i^V$ are the vertical marginals of the $\mu_i$, the $\tilde\mu_i$ are rescaled versions of the $\mu_i$, obtained by applying the cumulative distribution function of $\mu_i^V$ to the vertical coordinate, and $\tilde\mu_i=\int_0^1\tilde\mu_i^l\,dl$ is disintegrated with respect to its uniform vertical marginal (see [14] for a more detailed description).

###### Proposition 10.

Suppose $\nu$ is concentrated on a line segment (taken here along the $x^1$-axis) and is absolutely continuous with respect to $1$-dimensional Hausdorff measure. Then $W_\nu$ is equal to the layerwise-Wasserstein metric.

###### Proof.

Using the framework described in Remark 9 with $n=1$ and $f(y)=(y,0,\dots,0)$, we consider optimal transport between $\mu_i$ on $X$ and $\nu$ on $Y\subseteq\mathbb{R}$ with cost function $c(x_i,y)=-x_i^1y$. This is an index cost in the terminology of [15], and so the optimal transport problem can be solved semi-explicitly: the level set $T_i^{-1}(y)$ of the optimal transport map $T_i$ consists of the hyperplane $\{x_i: x_i^1=z_i(y)\}$, where the fixed $z_i(y)$ is chosen so that

$$\mu_i\left(\{(x_i^1,x_i^2,\dots,x_i^m): x_i^1\le z_i(y)\}\right)=\nu((-\infty,y)). \tag{7}$$

By the first part of Proposition 8, the optimal arrangement $\gamma$ in (3) then pairs the conditional probability $\mu_0^y$ on $\{x_0^1=z_0(y)\}$ with the corresponding conditional probability $\mu_1^y$ on $\{x_1^1=z_1(y)\}$.

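By Proposition 8, we then have

$$W_\nu^2(\mu_0,\mu_1)=\int_Y W_2^2(\mu_0^y,\mu_1^y)\,d\nu(y).$$

Note that $\mu_i^y=\delta_{z_i(y)}\otimes\hat\mu_i^y$, so that $W_2^2(\mu_0^y,\mu_1^y)=W_2^2(\hat\mu_0^y,\hat\mu_1^y)+(z_0(y)-z_1(y))^2$, where the measures $\hat\mu_i^y$ are measures on $\mathbb{R}^{m-1}$. Letting $F^{-1}$ be the quantile function of $\nu$, and changing variables via $l=F(y)$, we have

$$\begin{aligned}
W_\nu^2(\mu_0,\mu_1) &= \int_Y\left[W_2^2(\hat\mu_0^y,\hat\mu_1^y)+(z_0(y)-z_1(y))^2\right]d\nu(y)\\
&= \int_0^1\left[W_2^2\left(\hat\mu_0^{F^{-1}(l)},\hat\mu_1^{F^{-1}(l)}\right)+\left(z_0(F^{-1}(l))-z_1(F^{-1}(l))\right)^2\right]dl.
\end{aligned}$$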

Note that the second term is exactly the Wasserstein distance between the first marginals of and . Since the conditional probabilities are both supported on , and , we have . The last line above is then exactly the definition of the layerwise-Wasserstein metric. ∎
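A discrete analogue of this proof can be sketched in Python. Assume, hypothetically, that $\nu$ consists of $n$ equally weighted distinct atoms on the $x^1$-axis and that each $\mu_i$ consists of $n$ equally weighted atoms with distinct first coordinates. Monotone matching in the first coordinate, the discrete counterpart of (7), then determines the coupling of each $\mu_i$ with $\nu$, every layer is a single point, and the squared distance is the average squared displacement between points of equal rank. The function name `layerwise_sq` and the data are illustrative only.

```python
def layerwise_sq(points0, points1):
    """Average squared displacement after matching points by the rank of
    their first coordinate (assumes equal counts and distinct first
    coordinates within each point cloud)."""
    s0 = sorted(points0, key=lambda p: p[0])
    s1 = sorted(points1, key=lambda p: p[0])
    n = len(s0)
    return sum(
        sum((u - v) ** 2 for u, v in zip(p, q))
        for p, q in zip(s0, s1)
    ) / n

# Hypothetical two-point clouds in R^2.
points0 = [(0.0, 0.0), (1.0, 2.0)]
points1 = [(1.0, 1.0), (0.0, 3.0)]

value = layerwise_sq(points0, points1)
print(value)
```

The matching ignores the second coordinate entirely, which is why it can differ from the classical optimal coupling between the two point clouds.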

As was noted in [14], in two dimensions the layerwise-Wasserstein distance corresponds to the Knothe-Rosenblatt rearrangement, and so we immediately obtain the following.

###### Corollary 11.

Let $m=2$. Then, under the assumptions in the preceding Proposition, the optimal rearrangement $\gamma$ in (3) satisfies $\gamma_{x_0x_1}=(I,G)_\#\mu_0$, where $G$ is the Knothe-Rosenblatt rearrangement.

###### Remark 12.

One can recover a similar characterization of the Knothe-Rosenblatt rearrangement in higher dimensions by looking at iterated optimization problems along an orthogonal basis. This corresponds to the limit of a multi-marginal problem where the interactions are weighted iteratively, as we show in Appendix A. This is closely related to the main result in [8], where the Knothe-Rosenblatt rearrangement is characterized as a limit of optimal transport maps with anisotropic costs.

### 2.1 Dual metrics

We now define a class of metrics on a lower dimensional space which are dual to $W_\nu$ in a certain sense.

Given a reference measure $\sigma$ on $Y$ and a fixed $\mu$ on $X$, absolutely continuous with respect to $n$- and $m$-dimensional Lebesgue measure, respectively, and a cost function $c$ on $X\times Y$, assume that $c$ satisfies the twist condition; that is, injectivity of $y\mapsto D_xc(x,y)$ for each fixed $x$. We define

$$W^*_{\mu,\sigma,c,p}(\nu_0,\nu_1):=\|Dv_0-Dv_1\|_{L^p(\sigma)} \tag{8}$$

where $v_i$ is the $c$-concave Kantorovich potential corresponding to the optimal transport problem between $\mu$ and $\nu_i$.

###### Proposition 13.

$W^*_{\mu,\sigma,c,p}$ is a metric on the set of probability measures on $Y$ which are absolutely continuous with respect to $\sigma$.

###### Proof.

Since the norm clearly induces a metric, all that needs to be proven is that for each measure , the gradient of the Kantorovich potential is in , and that the mapping is a bijection. Since the potential is Lipschitz, absolute continuity of with respect to (and therefore Lebesgue measure) ensures that it exists and is uniquely determined almost everywhere.

Absolute continuity of with respect to , and therefore Lebesgue measure, ensures that is uniquely determined by [18, Proposition 1.15].

On the other hand, it is well known that the twist condition ensures that the unique optimizer to the optimal transport problem between and concentrates on a graph, , where is uniquely determined by the gradient of the potential via the equation . Since , we have that is uniquely determined by and therefore . The -concave Kantorovich potential is then clearly in one to one correspondence with and, therefore, it uniquely determines . Thus and are in one-to-one correspondence, as desired.

We emphasize that the motivating example of this proposition is when $c$ is the quadratic cost between $X$ and a lower dimensional submanifold $Y$, using the convention in Remark 9. In this case, $W^*_{\mu,\sigma,c,p}$ is dual to the $\nu$-based Wasserstein metric, in the sense that it relies on the Kantorovich potential on the target, rather than source, side.

The metric arises naturally in certain variational problems among measures on involving optimal transport to the fixed measure on , which is more naturally described by the potentials on than on . We expand on this viewpoint in section 6 below, where we establish a fixed point characterization of solutions to a class of these problems. In the proof, we actually use a metric which corresponds to using the mass splitting , introduced in [15] and recalled in equation (16) below, rather than potentials (these are the same when the problem satisfies the nestedness condition introduced in [15], as is the case for the optimal under appropriate conditions identified in [16] and recalled in (18) below, but not in general).

## 3 Unequal dimensional optimal transport

When the base measure $\nu$ concentrates on a lower dimensional submanifold, the definition (3) of the $\nu$-based Wasserstein metric relies on unequal dimensional optimal transport. We briefly review the known theory on this topic here, and establish a lemma which we will need later on.

For $c$ and $f$ as in Remark 9, and for $y\in Y$ and $p\in\mathbb{R}^n$, we define, as in [15],

$$X_1(y,p):=\{x\in X: D_yc(x,y)=-x\cdot Df(y)=p\}. \tag{9}$$

Note that, since our cost is linear in $x$, each $X_1(y,p)$ is an affine $(m-n)$-dimensional submanifold of $X$, provided that $Df(y)$ has full rank. For a fixed $y$, the sets $X_1(y,p)$ are parallel to one another as $p$ varies.

The following lemma shows that when we disintegrate the optimizer with respect to $\nu$, the conditional probabilities are absolutely continuous with respect to $(m-n)$-dimensional Hausdorff measure, and provides a formula for their densities.

###### Lemma 14.

Consider optimal transport between probability measures $\mu$ on $X\subseteq\mathbb{R}^m$ and $\nu$ on $Y\subseteq\mathbb{R}^n$, both absolutely continuous with respect to Lebesgue measure, with $m>n$ and cost function $c(x,y)=-x\cdot f(y)$, where $f$ is injective and nondegenerate, that is, $Df(y)$ has full rank everywhere.

Then there is a unique optimal measure in (3), which concentrates on the graph of a function . The Kantorovich potential is differentiable for almost every , and at each such point is concentrated on the affine manifold . The conditional probability (which is the marginal of from ) corresponding to the disintegration of with respect to is absolutely continuous with respect to - dimensional Hausdorff measure on for almost every . Furthermore, there is a subset of full volume such that this conditional probability is concentrated on with density given by

 ¯¯¯μy(x)=¯¯¯μ(x)/[¯¯¯ν(y)JTy