    # Two Stage Algorithm for Semi-Discrete Optimal Transport on Disconnected Domains

In this paper we present a two-stage algorithm to solve the semi-discrete Optimal Transport Problem in the case where the support of the source measure is disconnected. We establish global linear convergence and local superlinear convergence. We also find convergence of the associated Laguerre cells in the vein of <cit.>.

## Authors

08/30/2019

### A Newton algorithm for semi-discrete optimal transport with storage fees and quantitative convergence of cells

In this paper we will continue analysis of the semi-discrete optimal tra...
03/02/2020

### Optimal transport: discretization and algorithms

This chapter describes techniques for the numerical resolution of optima...
04/11/2020

### Quantitative Stability and Error Estimates for Optimal Transport Plans

Optimal transport maps and plans between two absolutely continuous measu...
07/05/2017

### An algorithm for optimal transport between a simplex soup and a point cloud

We propose a numerical method to find the optimal transport map between ...
06/09/2022

### Symmetrized semi-discrete optimal transport

Interpolating between measures supported by polygonal or polyhedral doma...
11/11/2019

### Optimal partitions and Semi-discrete optimal transport

In the current book I suggest an off-road path to the subject of optimal...
12/01/2020

### Computation of Optimal Transport with Finite Volumes

We construct Two-Point Flux Approximation (TPFA) finite volume schemes t...
##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1. Introduction

We begin by recalling the semi-discrete optimal transport problem. Let , be compact and a fixed collection of finite points, along with a cost function

. We also fix Borel probability measures

with and . Furthermore is absolutely continuous with respect to Lebesgue measure and we denote its density by . We set .

We want to find a measurable mapping such that for any measurable , and satisfies

 (1) ∫Xc(x,T(x))dμ=min\tiT#μ=ν∫Xc(x,\tiT(x))dμ.

In this paper our goal is to propose and show convergence of a two stage algorithm. The first stage is a regularized gradient descent which achieves global linear convergence with respect to the regularized problem. The second stage is a damped Newton algorithm similar to that of [KMT19] which has superlinear local convergence. The main difference between our algorithm and that of [KMT19] is ours doesn’t require to be connected. Without a connected support the algorithm in [KMT19] may not converge at all.

Furthermore, we will show that the Laguerre cells, , for the approximate optimizers constructed by our algorithm converge both in the sense and in the sense of Hausdorff distance. This result is analogous to that of [BK19]. While the convergence follows directly from the results there, the Hausdorff convergence result in [BK19] heavily relies on the connectedness of .

## 2. Setup

### 2.1. Notations and Conventions

Here we collect some notations and conventions for the remainder of the paper. We fix positive integers with and a collection . We will write

to denote the vector in

whose components are all . For any vector , we will write its components as superscripts so is the -th component of . We reserve the notation for the -Euclidean norm, i.e. . We use for the and Euclidean norms, i.e. and . We use to denote -dimensional Lebesgue measure and to denote -dimensional Hausdorff measure. Also we use the notation

 Λ={λ∈\RN:λi∈[0,1],N∑i=1λi=1}

for the set of all admissible weight vectors.

We will assume that the cost function satisfies the following standard conditions:

 (Reg) c(⋅,yi) ∈C2(X),∀i∈{1,…,N}, (Twist) ∇xc(x,yi) ≠∇xc(x,yk), ∀x∈X, i≠k.

We also assume the following condition, originally studied by Loeper in [Loe09]. The cost, , is said to satisfy Loeper’s condition if for each there exists a convex set and a diffeomorphism such that

 (QC) ∀ t∈\R, 1≤k,i≤N, {p∈Yi∣−c(\cExpip,yk)+c(\cExpip,yi)≤t} is convex.

See Remark 2.1 below for further discussion of these conditions.

We also say that a set is -convex with respect to if is a convex set for every .

For any and , we define the th Laguerre cell associated to as the set

 \Lagi(ψ):={x∈X∣c(x,yi)+ψi=minic(x,yi)+ψi}.

We also define the function by

 G(ψ):=(G1(ψ),…,GN(ψ))=(μ(\Lag1(ψ)),…,μ(\LagN(ψ))),

and denote for any ,

 Kϵ:={ψ∈\RN∣Gi(ψ)>ϵ, ∀i∈{1,…,N}}.

The above conditions (Reg), (Twist), (QC) are the same ones assumed in [KMT19] and [BK19]. As is also mentioned in those papers, (Reg) and (Twist) are standard conditions in optimal transport. Furthermore, (QC) holds if is a finite set sampled from from a continuous space, and is a cost function satisfying what is known as the Ma-Trudinger-Wang condition (first introduced in a strong form in [MTW05], and in [TW09] in a weaker form).

If is absolutely continuous with respect to Lebesgue measure (or even if doesn’t give mass to small sets), then (Twist) implies that the Laguerre cells are pairwise -almost disjoint. In this case the generalized Brenier’s theorem [Vil09, Theorem 10.28], tells us that for any vector , the map defined by whenever is a minimizer in the optimal transport problem (1), where the source measure is and the target measure is defined by .

We define .

Throughout the rest of the paper we will we assume that is compact and -convex with respect to and that , the density of , is -Hölder continuous.

In order to obtain our convergence results we will need to be at least locally connected in a quantitative way. First we recall a definition.

A non-zero measure on a compact set satisfies a -Poincaré-Wirtinger inequality for some if there exists a constant such that for any ,

 \normf−∫ZfdμLq(μ)≤\Cpw\norm∇fL1(μ).

For brevity, we will write this as “ satisfies a -PW inequality on ”. If is defined on a larger set then then the phrase “ satisfies a -PW inequality on ” is understood to mean that and the restriction of to satisfies a -PW inequality.

We remark that some version of all of our results hold with only -PW inequalities but will give better constants.

We now define a local version of the PW inequality that will serve as our quantitative measure of the local connectivity of .

A probability measure on a compact set satisfies a local -Poincaré-Wirtinger inequality if there are compact which are pairwise disjoint and -almost cover , i.e. such that satisfies a -PW inequality on each with constant .

We define the local PW constant to be the -th power mean of the , i.e. . Note that .

When we have we recover the global connectivity assumption of [KMT19, Definition 1.3]. Throughout the paper we use the notation and the vector .

In order to obtain convergence of our algorithm we require that optimal map , “splits up” the . In order to assure this we assume that the subsets sums of and are separated. For any vector we introduce the notation for the set of subset sums of , i.e

 SV={∑a∈Aa:A⊂{V1,…,Vk}}

and the notation for the set of proper subset sums of , i.e.

 SpropV={∑a∈Aa:A⊊{V1,…,Vk},A≠∅}.

With this notation, we say that the subset sums of and are separated if .

Note that for any fixed with fixed decomposition , the collection of all such that our condition is not satisfied (i.e. ) is a “small set” in the sense that it has Hausdorff dimension whereas has Hausdorff dimension . In particular if is randomly chosen from (with respect to any probability measure that is absolutely continuous with respect to ), then with probability 1, we have . We defer a formal proof of this till Proposition 8.

For the remainder of the paper we use the notation for any . Note that since , is never empty and so is always defined. We defer a discussion of methods to compute and/or bound to section 8.

It is well-known that the optimal transport problem has a dual problem with strong duality (we refer the reader to [Vil03, Chapter 1] for background),

 (2) min\tiT#μ=ν∫Xc(x,\tiT(x))dμ=supψ∈\RN∫X(minjc(x,yj)+ψj)\dmu−\innerψλ.

Furthermore given any dual optimizer the map from Remark 2.1 is a optimal transport map for the primal problem. Because of this we are able to optimize the finite dimensional dual problem instead of the infinite dimensional primal problem. We define Kantorovich’s functional as in [KMT19, Theorem 1.1]

 Φ(ψ):=∫X(minic(x,yi)+ψi)\dmu−\innerψλ=∑i∫\Lagi(ψ)(c(x,yi)+ψi)\dmu−\innerψλ.

Our algorithm will find approximate maximizers of . We also recall that .

### 2.2. Previous Results

There are many papers that apply a Newton-type algorithm to solve semi-discrete optimal transport problems and Monge-Ampère equations. In [OP88] the authors prove local convergence of a Newton Method for solving a semi-discrete Monge-Ampère equation. Global convergence is proved in [Mir15].

For the semi-discrete optimal transport problem the authors of [KMT19] give a Newton method similar to our Algorithm 1, in the case when the source measure is given by a continuous probability density and has connected support. An analogous result for the case when is supported on a union of simplexes is given in [MMT18], however this paper also requires a connectedness condition on the support of .

In [BK19], Kitagawa and the author present a Newton method to solve a generalization of the semi-discrete optimal transport problem in which there is a storage fee. Our results do not require a connectedness condition on the support of , however we require a strict convexity assumption on the storage fee function that doesn’t hold for the classical semi-discrete optimal transport problem. Furthermore we show that the Laguerre cells associated to the approximate transport maps converge quantitatively both in the sense of -symmetric distance and in the sense of Hausdorff distance.

### 2.3. Outline of the Paper

In section 3 we obtain quantitative bounds on the strong concavity of . In section 4 we exploit these bounds to obtain local convergence of a damped Newton algorithm. In section 5 we obtain global convergence of a gradient descent algorithm applied to a regularized version of . In section 6 we discuss convergence of our two stage algorithm. In section 7 we obtain quantitative convergence of the associated Laguerre cells. In section 8

we describe some methods to compute and/or estimate the parameter

from the initial data.

## 3. Spectral Estimates

In this section we work toward quantitative bounds on the strong concavity of .

To start off we recall some notation in order to remain consistent with [KMT19]. We will write to denote the interior of the set . Given an absolutely continuous measure , where is continuous, and a set with Lipschitz boundary, we will write

 \abs∂Aρ,Xi: =∫∂A∩\interior(Xi)ρdHn−1,\absAρ:=μ(A).

The next lemma is virtually identical to [BK19, Lemma 7.4]. We omit the proof.

Suppose that satisfies a -PW inequality on . Then

 infA⊂Xi\abs∂Aρ,Ximin(\absAρ,\absXi∖Aρ)1/q≥121/qκi,

where the infimum is over whose boundary is Lipschitz with finite -measure, and .

Recall from [KMT19] that is negative semidefinite (as it is the Hessian of a concave function). Furthermore, recall that is in the kernel of

, so the largest eigenvalue of

is . We now work toward the following estimate on the second largest eigenvalue.

Suppose that satisfies a local -PW inequality on with constant . Let be fixed. Assume that the subset sums of and are separated i.e., . Then the second largest eigenvalue of is bounded away from zero. In particular it is bounded by where .

At this point, given some we recall the construction of from [KMT19, Section 5.3]. is the simple weighted graph whose vertex set is the collection , and for any and , there is an edge between of weight given by

 wij:=DiGj(ψ)=DjGi(ψ)=∫\Lagi,j(ψ)ρ(x)\norm∇xc(x,yi)−∇xc(x,yj)dHn−1(x)

where we have used the notation

 \Lagi,j(ψ):=\Lagi(ψ)∩\Lagj(ψ)

for , . Under the conditions of Theorem 3, is connected by edges of weight at least .

###### Proof.

Suppose by contradiction that the proposition is false. This implies that removing all edges with weight strictly less than yields a disconnected graph. In other words, we can write where , and are disjoint, such that every edge connecting a vertex in to a vertex in has weight strictly less than . Letting we see that

 \abs∂Aρ,X≤∑{(i,j)∣yi∈W1, yj∈W2}2C∇wij<22−1/qM1/qN2κd1/qG(ψ)\absW1\absW2≤22−1/qM1/qN2κd1/qG(ψ)N24=2−1/qM1/qκd1/qG(ψ).

Now for each we claim that . If or then the claim is trivial. Otherwise the claim follows from Lemma 3, after noting that .

Now note that by construction

 (3) μ(A)=∑yi∈W1μ(\Lagi(ψ))=∑yi∈W1G(ψ)i∈SG(ψ).

On the other hand we can partition the into two types. Let be the index set so that if and only if . We see that

 μ(A)=M∑i=1μ(A∩Xi)=∑i∈Iμ(A∩Xi)+∑i∉Iμ(A∩Xi)=∑i∈Iμ(Xi)−μ(Xi∖A)+∑i∉Iμ(A∩Xi).

Hence

 \absμ(A)−∑i∈Iμ(Xi) ≤∑i∈Iμ(Xi∖A)+∑i∉Iμ(A∩Xi) <∑i∈IκqiMκqdG(ψ)+∑i∉IκqiMκqdG(ψ)=1MκqM∑i=1κqidG(ψ)=dG(ψ).

By definition . Furthermore since both and are nonempty we have , recalling (3). Hence we see that

 dG(ψ)=dist(SpropG(ψ),Sχ)

which is a clear contradiction. ∎

At this point Theorem 3 follows via the exact same method of proof as [BK19, Theorem 8.1]. We omit the proof.

## 4. Convergence of Newton Algorithm

First we recall the standard damped Newton Algorithm (similar to the one proposed in [KMT19]).

We remark that unlike the algorithm proposed in [KMT19] we do not impose a condition on the cells not collapsing. This is because we will prove local convergence for the above algorithm and within the zone of convergence the error reduction requirement will automatically imply that the cells don’t collapse.

Our main result for this section is that Algorithm 1 converges locally with superlinear rate.

Suppose that satisfies a local -PW inequality on . If then Algorithm 1 converges with linear rate and locally with rate .

For , we have .

###### Proof.

Let . We claim that . Indeed if then

 \absx−∑i∈Iλi2=\abs∑i∈I(λi1−λi2)≤∑i∈I\absλi1−λi2≤\normλ1−λ21.

If we apply this to then we get that there is so that . Hence

 dist(Spropλ2,Sχ) ≤dist(y,Sχ) ≤\absx−y+dist(x,Sχ) =dist(Spropλ1,Sχ)+\absx−y ≤dist(Spropλ1,Sχ)+\normλ1−λ21

and so we get . A symmetric argument proves the opposite inequality.

With the above lemma, Theorem 3 gives us that is locally well-conditioned near its maximum.

Suppose that satisfies a local -PW inequality on and . On the set we have that is uniformly and strongly concave, except in the direction . In particular on

 \normΦC2,α

and

 D2Φ≤−23−1/qC∇M1/qN4κ(dλ−η)1/qP

where

is the orthogonal projection onto the hyperplane perpendicular to

and is a constant depending only on , and (the cost).

###### Proof.

Fix . Since and each , we have that each . Furthermore since

 \absGi(ψ)−λi≤\norm∇Φ(ψ)∞≤\norm∇Φ(ψ)1<η

we obtain that and so . In other words

 (4) Jη⊂Kdλ−η.

Hence the first result follows by careful tracing through the proof of [KMT19, Theorem 4.1].

Now for the second claim Lemma 4 gives that

 \absdG(ψ)−dλ≤\normG(ψ)−λ1=\norm∇Φ(ψ)1≤η.

In particular . The second result now follows from Theorem 3.

We briefly note that this proposition gives us uniqueness of the maximizer of up to a multiple of .

If are two maximizers of then there is so that .

###### Proof.

Since is concave and are two maximizers, must be constant along the line segment joining and . In particular if then and so is a maximizer of . Hence for all . In particular we can conclude .

Now since is a maximizer we have so the conditions of Proposition 4 are satisfied. Hence we have where is the orthogonal projection onto the hyperplane perpendicular to . Hence we get which implies the claim.

Given any maximizer, , of we see that is also a maximizer. In particular given any maximizer, , we can construct the maximizer which is a maximizer of that satisfies .

Conversely 4 tells us that there is a unique that maximizes and satisfies . For the remainder of the paper we shall refer to this unique maximizer as .

It is well-known that Proposition 4 gives local convergence of the standard damped Newton Algorithm. However for convenience of the reader we will give a self contained proof.

If then the iterates of Algorithm 1 satisfy

 \norm∇Φ(ψk+1)≤(1−\conjτk2)\norm∇Φ(ψk)

where

 \conjτk=min(β1+1αδL1α\norm∇Φ(ψ)21α,1)

and

 δ :=12minλi L ≤Cδ−2 β ≥23−2/qC∇M1/qN4κd1/qλ

where is a constant depending only on , and (the cost).

Furthermore once we have we get

 \norm∇Φ(ψk+1)≤L\norm∇Φ(ψ)1+αβ1+α.
###### Proof.

The proof is very similar to that of [KMT19, Proposition 6.1]. However we will give all the details.

Note that implies .

For this proof define . Note and so, by (4), (Note that it is NOT true in general that ; our choice of is the largest choice for which this is guaranteed to work). Furthermore is the norm of on and is the bound of the second largest eigenvalue of on . By Proposition 4 we have

 β≥23−1/qC∇M1/qN4κ(dλ2)1/q=23−2/qC∇M1/qN4κd1/qλ.

Also by carefully tracing through the proofs in [KMT19] we have .

We analyze a single iteration of the Algorithm. Define .

Let . We see that

 \normv≤\norm∇Φ(ψ)β,

as is -concave in the direction orthogonal to .

Also define . Let be the first exit time from . We have that

 δ2≤\normG(ψτ1)−G(ψ)≤Lτ1\normv≤Lβ\norm∇Φ(ψ)τ1

and so

 τ1≥βδ2L\norm∇Φ(ψ).

Applying Taylor’s formula to we get

 (5) ∇Φ(ψτ)=∇Φ(ψ−τv)=∇Φ(ψ)−τ(D∇Φ(ψ))v+R(τ)=∇Φ(ψ)−τ∇Φ(ψ)+R(τ)

where

 \normR(τ) =\norm∫τ0(D∇Φ(ψσ)−D∇Φ(ψ))vdσ =\norm∫τ0DG(ψσ)v−DG(ψ)vdσ ≤∫τ0L\normψσ−ψα\normvdσ =∫τ0L\normσvα\normvdσ =L\normvα+1∫τ0σαdσ =L\normvα+1τα+1α+1 (6) ≤L\norm∇Φ(ψ)1+αβ1+ατ1+α

for .

Finally we establish the error reduction estimates. (5) gives

 ∇Φ(ψτ)=(1−τ)∇Φ(ψ)+R(τ)

so we have

 \norm∇Φ(ψτ)≤(1−τ2)\norm∇Φ(ψ)

provided that

 \normR(τ)≤τ2\norm∇Φ(ψ).

Again using (6) this will be true provided

 τ≤min(τ1,β1+1αL1α\norm∇Φ(ψ)21α)=:τ2.

Hence we see that if we set , then the claim is true. Furthermore as the error goes to zero, eventually we must have . When this happens (5) gives

 ∇Φ(ψ1)=R(1)

and so we get the super-linear convergence.

## 5. Convergence of Gradient Descent Algorithm

In order to remedy the fact that the above algorithm only has local convergence we propose a regularized version in order to get within a close enough error.

We define the regularized Kantorovich’s functional

 \tiΦ(ψ)=Φ(ψ)−γ2\normψ2

where is some parameter.

We see that

 ∇\tiΦ(ψ)=∇Φ(ψ)−γψ=G(ψ)−λ−γψ

and

 D2\tiΦ(ψ)=D2Φ(ψ)−γI=DG(ψ)−γI.

In particular we note that is strongly -concave. If we let be the Lipschitz constant of ( for some universal constant ) then has Lipschitz gradient with constant . Hence is well-conditioned.

The projection of in the direction is bounded by

 M0:=√N$$maxx∈X(maxy∈Yc(x,y)−miny∈Yc(x,y))$$.

In other words if and then .

###### Proof.

Since there are some indices so that and . Suppose for sake of contradiction that . Then we must have . In particular there is some index so that .

We first look at the case where . In this case

 \Lagj(ψ) ={x∈X∣c(x,yj)+ψj=minic(x,yi)+ψi} ⊂{x∈X∣c(x,yj)+ψj≤c(x,yj1)+ψj1} ⊂{x∈X∣ψj≤c(x,yj1)−c(x,yj)}=∅

and so we get which contradicts . A similar argument handles the case where .

We remark that since is compact and is continuous, and so the bound isn’t vacuous.

Since is strongly concave it has a unique maximizer. Although it may not be true that this maximizer is in we show that it is still bounded.

Let be the unique maximizer of . Then .

###### Proof.

Let be the maximizer of constructed in Remark 4 so that . Then and so . In particular . But since is the maximizer of we have