# Quantitative Stability and Error Estimates for Optimal Transport Plans

Optimal transport maps and plans between two absolutely continuous measures μ and ν can be approximated by solving semi-discrete or fully-discrete optimal transport problems. These two problems ensue from approximating μ or both μ and ν by Dirac measures. Extending an idea from [Gigli, On Hölder continuity-in-time of the optimal transport map towards measures along a curve], we characterize how transport plans change under perturbation of both μ and ν. We apply this insight to prove error estimates for semi-discrete and fully-discrete algorithms in terms of errors solely arising from approximating measures. We obtain weighted L^2 error estimates for both types of algorithms with a convergence rate O(h^1/2). This coincides with the rate in [Berman, Convergence rates for discretized Monge–Ampère equations and quantitative stability of Optimal Transport, Theorem 5.4] for semi-discrete methods, but the error notion is different.

## Authors

• 24 publications
• 12 publications
10/23/2019

### Two Stage Algorithm for Semi-Discrete Optimal Transport on Disconnected Domains

In this paper we present a two-stage algorithm to solve the semi-discret...
06/10/2022

### Meta Optimal Transport

We study the use of amortized optimization to predict optimal transport ...
03/09/2022

### A new implementation of the geometric method for solving the Eady slice equations

We present a new implementation of the geometric method of Cullen Pu...
07/04/2021

### Rates of Estimation of Optimal Transport Maps using Plug-in Estimators via Barycentric Projections

Optimal transport maps between two probability distributions μ and ν on ...
01/20/2021

### A Damped Newton Algorithm for Generated Jacobian Equations

Generated Jacobian Equations have been introduced by Trudinger [Disc. co...
07/15/2020

### 𝒲_∞-transport with discrete target as a combinatorial matching problem

In this short note, we show that given a cost function c, any coupling π...
08/18/2021

### Quantitative Uniform Stability of the Iterative Proportional Fitting Procedure

We establish the uniform in time stability, w.r.t. the marginals, of the...
##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1. Introduction

The optimal transportation problem (OT), first proposed by Monge [32] and later generalized by Kantorovich [27], has been extensively studied from the theoretical point of view [10, 11, 13, 14, 15, 23, 42, 45]. It has a wide variety of applications in economics [17, 4], fluid mechanics [11], meteorology [18, 19], image processing [41, 40], transportation [46, 16], and optics design [24, 39]. In this section, we briefly introduce the basic theory and numerical methods of OT, and point out our contribution to the numerical analysis of OT in this paper.

### 1.1. Introduction to Optimal Transport

If are subdomains of and

are given probability measures, the Monge formulation of OT is to find an optimal map

which minimizes the transport cost, i.e.

 (1.1) infT:T#μ=ν∫Xc(x,T(x))dμ(x),

where the cost function is given. Hereafter, denotes the push-forward of measure through , namely means that for any measurable set , we have , or equivalently

 (1.2) ∫Xϕ(T(x))dμ(x)=∫Yϕ(y)dν(y)

for all continuous functions . Since this problem is difficult to study, and sometimes the optimal map does not exist, Kantorovich [27] generalized the notion of transport map, and considered the following problem:

 (1.3) infγ∈Π(μ,ν)∫X×Yc(x,y)dγ(x,y),

where is the set of transport plans between and , namely

 Π(μ,ν):={γ∈P(X×Y):(π1)#γ=μ,(π2)#γ=ν}.

Here and are the projections defined as . This definition implies that for and any measurable sets , we have . For and , let be the set of probability measures with bounded moment, i.e.

 (1.4) Pp(Rd):={μ∈Pp(Rd):∫Rd|x|pdμ(x)<∞},

which clearly contains those probabilities measures with bounded support. If , then for all because

 (1.5) ∫R2d(|x|p+|y|p)dγ(x,y)=∫Rd|x|pdμ(x)+∫Rd|y|pdν(y)<∞.

Moreover, if , the Kantorovich problem (1.3) always has a finite minimum value. In fact, this defines the well-known Wasserstein metric on :

 Wp(μ,ν):=(minγ∈Π(μ,ν)∫Rd×Rd|x−y|pdγ(x,y))1/p∀μ,ν∈Pp(Rd).

In addition, for quadratic cost, i.e. , [10] shows that there exists a unique optimal map for a convex function ( is uniquely determined -a.e.) provided that gives no mass to -surfaces of class . If are absolutely continuous measures with respect to Lebesgue measure with densities , then

 μ(S)=∫Sf(x)dx=ν(∇φ(S))=∫∇φ(S)g(y)dy=∫Sg(∇φ(x))detD2φ(x)dx,

whence satisfies the generalized Monge-Ampère equation

 (1.6) g(∇φ(x))detD2φ(x)=f(x)∀x∈X,

with second type boundary condition [21, section 4.6] provided that the domains are chosen so that:

 (1.7) X={x∈Rd:f(x)>0},Y={y∈Rd:g(y)>0}.

The optimal transport map induces an optimal transport plan given by , where denotes the identity map, namely

 γ(A)=μ{x∈X:(x,T(x))∈A}

for all measurable sets . Similar results also hold for any , and for general costs satisfying a twist condition [45, Chapter 10].

### 1.2. Numerical Methods for OT and Our Contribution

In view of the numerous and diverse applications of OT, developing fast and sound numerical methods for OT is of paramount importance. Several algorithms do exist, ranging from those inspired by PDEs and variational techniques for absolutely continuous measures and [1, 5, 7, 8, 22, 26, 30, 38] to those approximating one or both measures by Dirac masses and then solving the approximate OT [6, 12, 20, 28, 29, 31, 36, 44, 43]. However, intrinsic difficulties make their numerical analysis far from complete.

In this paper, we develop stability and error analyses for quadratic costs that accounts for the effect of approximating measures and by Dirac masses. To this end, we extend the stability estimates of Gigli [23], originally derived for optimal maps, to optimal plans . We do not develop new techniques to approximate .

If at least one of the two measures is discrete, then it is possible to solve for the exact transport map numerically without further approximations, which includes semi-discrete algorithms [2, 28, 29, 31] and fully-discrete methods [12, 44, 43]. For these methods, since their errors solely come from approximating absolutely continuous measures with discrete measures, our results directly lead to error estimates. We also point out that an error estimate for a semi-discrete method was recently obtained by Berman [9], but there are no such results for fully-discrete schemes.

We now describe our results. Let be compact sets of and be absolutely continuous measures with respect to the Lebesgue measure with densities , respectively. Let

 μh=N∑i=1fiδxi,νh=M∑j=1gjδyj

be approximations of governed by a parameter , which means the Wasserstein distances satisfy

 W2(μ,μh)≤h,W2(ν,νh)≤h.

To make this more concrete, we briefly introduce one way to obtain an approximation of . Choose such that , where denotes the open ball with radius centered at . Then consider Voronoi tessellations: let

 (1.8) Vi:={x∈X:|x−xi|=min1≤j≤N|x−xj|}

be the Voronoi cell for and

 fi:=μ(Vi),μh:=N∑i=1fiδxi.

Notice that we may assume since otherwise for we could just drop the Dirac measure at . Define the map such that for all . It can be easily checked that this map is well defined and satisfies a.e. in because the intersection between and for is of zero Lebesgue measure and ; in particular for all . Therefore is a transport map from to and thus

 (1.9) W2(μ,μh)≤(∫X|x−Uh(x)|2dμ(x))1/2≤(∫Xh2dμ(x))1/2=h.

Similarly, we could approximate the absolutely continuous measure with density and bounded support with satisfying .

The semi-discrete algorithms solve the OT between and upon finding a nodal function such that

 (1.10) fi=ν(Fi)=∫Fig(y)dy,Fi:=∂φh(xi)∩Y,

where the discrete subdifferential is given by

 ∂φh(xi):={y∈Rd:φh(xj)≥φh(xi)+y⋅(xj−xi)∀j=1,⋯,N}.

This type of discretization was introduced by Oliker and Prussner [37] for Dirichlet boundary conditions whereas error estimates have been derived in [35, 33]. The set coincides with the subdifferential of the convex envelope associated with [35, Lemma 2.1]. The function is piecewise linear and induces a mesh with nodes whose elements may be quite elongated; see [35, Section 2.2]. We also refer to Berman [9], who has derived error estimates for the second type boundary condition involving .

Denote the barycenter of with respect to the measure by , namely , and define the map such that for all . Under proper assumptions on measures , to be stated in Section 2, we prove a weighted error estimate in creftypecap 4.2 for the exact optimal transport map

 (1.11) (N∑i=1fi|T(xi)−Th(xi)|2)1/2≤C(μ,ν)h1/2.

This rate of convergence coincides with that in [9, Theorem 5.4] for , but the error notion is different; we refer to Section 4 for details. Our approach is taylored to discrete transport plans and thus extends to fully-discrete methods.

Fully-discrete methods aim to find the discrete optimal transport plan

 γh=N∑i=1M∑j=1γh,ijδxiδyj

between and through the constrained minimization problem

 (1.12) minγhN,M∑i,j=1γh,ijcij:γh,ij≥0,N∑i=1γh,ij=gj,M∑j=1γh,ij=fi.

If we construct the map from for , then we also obtain a weighted error estimate in creftypecap 5.1 for the optimal map

 (1.13) (N∑i=1fi|T(xi)−Th(xi)|2)1/2≤C(μ,ν)h1/2,

under suitable assumptions on measures described in Section 2. This is a new error estimate for fully-discrete schemes, and the convergence rate in (1.13) is the same as (1.11) for semi-discrete methods.

### 1.3. Outline

In Section 2 we introduce the notion of -regularity and show that it leads to -regularity of the transport map . We prove in Section 3 that regularity implies a stability bound characterizing how the optimal transport plan changes under perturbations of both and ; this hinges on an idea from [23]. We measure the change of transport plans using either a weighted norm in creftypecap 3.5 or the Wasserstein metric in creftypecap 3.9. We apply the stability bound to derive error estimates in Sections 4 and 5. For semi-discrete schemes, we obtain weighted error estimates in creftypecap 4.1 and creftypecap 4.2 of Section 4 with a convergence rate . We also compare our geometric estimate of creftypecap 4.2 with a similar one due to Berman [9]; however, the two error notions are different. Moreover, we obtain in creftypecap 5.1 of Section 5 an entirely new error estimate of order for fully-discrete schemes with errors measured in both the weighted norm and the Wasserstein distance.

## 2. Regularity and Nondegeneracy

We now discuss the assumptions we make on measures in order to prove quantitative stability bounds for optimal transport plans and error estimates for numerical methods.

###### Assumption 2.1 (λ-regularity).

We say that is regular for if is a convex function such that is the optimal transport map from to and is smooth in the sense that:

 (2.1) φ((1−t)x0+tx1)+t(1−t)λ2|x0−x1|2≥(1−t)φ(x0)+tφ(x1),

for any and , i.e. is convex. We also say is regular if there exists a such that is regular.

We will see below that (2.1) implies ; note that if , then for all . Although we require to be defined in the whole in creftypecap 2.1, this is not restrictive because any convex function defined in a bounded convex set can be extended as in [21, Theorem 4.23]

 Eφ(z)=supx∈X,p∈∂φ(x)φ(x)+⟨p,z−x⟩.

Then one can show that in , and satisfies (2.1) for any if (2.1) holds for and any .

Now we introduce the Legendre transform of , which is defined by

 (2.2) φ∗(y)=supz∈Rd⟨y,z⟩−φ(z).

Since is convex, given we readily get for all

 φ(x)+⟨y,z−x⟩≤φ(z)⇒⟨y,z⟩−φ(z)≤⟨y,x⟩−φ(x)

whence in view of (2.2) the following two key properties of are valid:

 φ∗(y)=⟨y,x⟩−φ(x)<+∞∀y∈∂φ(x),

and if and only if . To see that implies , let be arbitrary and notice that (2.2) for yields

 φ∗(y)+⟨x,z−y⟩=⟨x,z⟩−φ(x)≤φ∗(z)⇒x∈∂φ∗(y).

Consequently, if and are of class , then is the inverse of the transport map . Moreover,

 (2.3) −∞<∫Yφ∗(y)dν(y)<+∞,

provided is –regular. In fact, since , in view of (1.2) we have

 ∫Yφ∗(y)dν(y)=∫Xφ∗(∇φ(x))dμ(x)=∫X(⟨∇φ(x),x⟩−φ(x))dμ(x).

We exploit the convexity of and to obtain

 φ(0)−λ2|x|2≤φ(x)−⟨∇φ(x),x⟩≤φ(0).

Since , (1.4) imply that the following integrals are finite and yield (2.3)

 −∫Xφ(0)dμ(x)≤∫X(⟨∇φ(x),x⟩−φ(x))dμ(x)≤∫X(−φ(0)+λ2|x|2)dμ(x).

The following lemma states that –regularity of a convex function is equivalent to uniform convexity of its Legendre transform [3, Proposition 2.6].

###### Lemma 2.2 (λ–regularity vs uniform convexity).

If is convex, then the following statements are valid:

1. If is smooth, then its Legendre transform must satisfy

 (2.4) φ∗((1−t)y0+ty1)+t(1−t)2λ|y0−y1|2≤(1−t)φ∗(y0)+tφ∗(y1)

for all and , i.e. is convex.

2. If is convex, then its Legendre transform is smooth.

###### Proof.

For completeness, we repeat the proof of [3, Proposition 2.6] to show (a); the proof of (b) is similar. Given and , for any we set and and use (2.2) to write

 (1−t)φ∗(y0) +tφ∗(y1)≥(1−t)⟨y0,x0⟩+t⟨y1,x1⟩−(1−t)φ(x0)−tφ(x1) ≥(1−t)⟨y0,x0⟩+t⟨y1,x1⟩−φ(xt)−t(1−t)λ2|v|2 =⟨yt,x0+tv⟩−φ(x0+tv)+t(1−t)⟨y1−y0,v⟩−t(1−t)λ2|v|2.

We exploit that are arbitrary. Taking supremum with respect to , we get

 (1−t)φ∗(y0)+tφ∗(y1)≥φ∗(yt)+t(1−t)⟨y1−y0,v⟩−t(1−t)λ2|v|2.

Maximizing the last two terms with respect to , we obtain and

 (1−t)φ∗(y0)+tφ∗(y1)≥φ∗(yt)+t(1−t)2λ|y0−y1|2,

which is the asserted inequality (2.4). That is convex is a straightforward calculation that completes the proof. ∎

###### Lemma 2.3 (W2∞–regularity).

A –smooth convex map is of class and the Lipschitz constant of is , namely

 (2.5)
###### Proof.

According to creftypecap 2.2(a), the function is convex. If and , then

 x1−1λy1∈∂ψ(y1)⇒ψ(y2)≥ψ(y1)+⟨x1−1λy1,y2−y1⟩, x2−1λy2∈∂ψ(y2)⇒ψ(y1)≥ψ(y2)+⟨x2−1λy2,y1−y2⟩.

Adding the two inequalities and rearranging terms yields

 1λ|y1−y2|2≤⟨y1−y2,x1−x2⟩≤|y1−y2||x1−x2|.

This implies the assertion (2.5). ∎

It is now apparent from creftypecap 2.3 that the constant dictates the stability of the transport map ; is also the stability constant that appears in our error estimates of Sections 4 and 5. If , then (2.5) is equivalent to for all . As we have already mentioned at the end of Section 1.1, the optimal map from to for quadratic cost is given by under suitable assumptions on . By Caffarelli’s regularity results [13, 14, 15], if both and are uniformly convex domains of with boundary and the densities of are bounded away from and , then the solution of the corresponding second boundary value Monge-Ampère problem (1.6) is of class . This implies that is regular for some if we extend to . Moreover, the same assumptions imply that is regular for the Legendre transform of and some .

## 3. Quantitative Stability of Optimal Transport Plans

In this section, we generalize the quantitative stability bounds of Gigli [23, Corollary 3.4] for optimal transport maps, and show some consequences of our theorem under suitable assumptions on measures and . They will be useful in Sections 4 and 5 to derive error estimates. The stability estimate in [23] is based on bounding the difference between an optimal transport map and another feasible transport map by the difference between their transport costs. Proposition 3.1 below is just a generalization of this important property in that it replaces transport maps by transport plans.

###### Proposition 3.1 (stability of transport plans).

Let be regular for , and be the optimal transport map from to . Then, for any transport plan , there holds

 (3.1) ∫X×Y|y −T(x)|2dγ(x,y) ≤λ(∫X×Y|y−x|2dγ(x,y)−∫X|T(x)−x|2dμ(x)).
###### Proof.

We proceed in the same way as in [23]. If is the Legendre transform of , then is finite from (2.3). Since , using (1.2), we have

 0=∫Yφ∗(y)dν(y)−∫Xφ∗(T(x))dμ(x)=∫X×Y(φ∗(y)−φ∗(T(x))) dγ(x,y).

In view of creftypecap 2.3 (-regularity), the map is of class . It thus follows that for all because . Using

 2⟨z,z−y⟩=|z|2−|y2|+|z−y|2∀y,z∈Rd,

together with creftypecap 2.2 (-regularity vs uniform convexity), namely is convex, we further obtain that

 0 =∫X×Y(φ∗(y)−φ∗(T(x))) dγ(x,y) ≥∫X×Y⟨x, y−T(x)⟩ dγ(x,y)+12λ∫X×Y|y−T(x)|2 dγ(x,y).

Noticing that

 2⟨x, y−T(x)⟩=|T(x)−x|2−|y−x|2−|T(x)|2+|y|2

we deduce

 2∫X×Y⟨x, y −T(x)⟩ dγ(x,y)=∫X|T(x)−x|2dμ(x) −∫X×Y|y−x|2dγ(x,y)−∫X|T(x)|2dμ(x)+∫Y|y|2dν(y).

In view of (1.2) the last two terms are equal and cancel out. Consequently,

 2∫X×Y⟨x, y−T(x)⟩ dγ(x,y)=∫X|T(x)−x|2dμ(x)−∫X×Y|y−x|2dγ(x,y),

whence

 0≥∫X|T(x)−x|2dμ−∫X×Y|y−x|2dγ(x,y)+1λ∫X×Y|y−T(x)|2 dγ(x,y).

Rearranging the equation above gives (3.1). ∎

###### Remark 3.2 (interpretation of (3.1)).

Notice that the right hand side in (3.1) without factor is the difference of transport costs between the given transport plan and the optimal transport map . The factor acts as a stability constant.

###### Remark 3.3 (stability of transport maps).

We point out that the left hand side of (3.1) can be seen as the square of a weighted -error between the transport plan and the optimal map . To understand this, notice that if the plan in creftypecap 3.1 is induced by a map , i.e. , then (3.1) can be written as

 ∫X|S(x)−T(x)|2dμ(x)≤λ(∫X|S(x)−x|2dμ(x)−∫X|T(x)−x|2dμ(x)),

which is the same as [23, Proposition 3.3]. This is an estimate of .

Let us recall the gluing lemma for measures (see [45, p.23]), which plays an important role in proving the triangle inequality of Wasserstein distance in the theory of optimal transport. We refer to [42, Lemma 5.5] for a proof.

###### Lemma 3.4 (gluing of measures).

Let be probability spaces for , with , and , . Then there exists (at least) one measure such that and , where is the projection defined as .

We use this lemma to glue the measures with their approximations as follows. Given optimal transport plans between and , between and , and between and , we let be a probability measure so that

 (3.2) (π1,2)#Γh=αh,(π2,3)#Γh=γh,(π3,4)#Γh=βh,

where and ; creftypecap 3.4 guarantees the existence of . This in particular implies that projections on the coordinate satisfy

 (3.3) (π1)#Γh=μ,(π2)#Γh=μh,(π3)#Γh=νh,(π4)#Γh=ν,

along with . Moreover, the following formula holds

 (3.4) ∫X1×X2×X3×X4F(x1,x2)dΓh(x1,x2,x3,x4)=∫X1×X2F(x1,x2)dαh(x1,x2),

for and similar expressions are valid for , and . If , and , then according to (1.5).

Now we state and prove our main perturbation estimates for transport plans.

###### Theorem 3.5 (perturbation of optimal transport plans).

Let be the optimal transport map from to and be regular for . Let be approximations of , let be two transport plans between and with -errors

 (3.5) eαh:=(∫X2|x−x′|2dαh(x,x′))12,eβh:=(∫Y2|y′−y|2dβh(y′,y))12,

and let . If is an optimal transport plan between and , then

 (3.6) (∫X×Y|T(x′)−y′|2dγh(x′,y′))12≤2λ12 e12h(W2(μ,ν)+eh)12+λeαh+eβh.
###### Remark 3.6 (interpretation of (3.6)).

We emphasize that, in view of Remark 3.3 (stability of transport maps), the left hand side of (3.6) can be viewed as the square of the -error between the optimal map and the transport plan . The quantity is a measure of the approximation errors between and . A typical choice of is to take the optimal transport plans between and respectively, which implies .

###### Remark 3.7 (non-uniqueness).

In general, an optimal transport plan between and may be neither unique nor induced by a map, no matter how small