1. Introduction
We begin by recalling the semi-discrete optimal transport problem. Let $X \subset \mathbb{R}^n$ be compact and $Y = \{y_1, \dots, y_N\} \subset \mathbb{R}^n$ a fixed collection of finitely many points, along with a cost function
$c : X \times Y \to \mathbb{R}$. We also fix Borel probability measures
$\mu$ and $\nu$ with $\operatorname{supp}\mu \subseteq X$ and $\nu = \sum_{i=1}^N \nu^i \delta_{y_i}$, $\nu^i > 0$ for every $i$. Furthermore $\mu$ is absolutely continuous with respect to Lebesgue measure and we denote its density by $\rho$. We set $\Omega := \operatorname{supp}\mu$. We want to find a measurable mapping $T : X \to Y$ such that $\mu(T^{-1}(E)) = \nu(E)$ for any measurable $E \subseteq Y$ (i.e. $T_\#\mu = \nu$), and $T$ satisfies

(1)  $\displaystyle\int_X c(x, T(x))\,d\mu(x) \;=\; \min_{S \,:\, S_\#\mu = \nu}\ \int_X c(x, S(x))\,d\mu(x).$
In this paper our goal is to propose and show convergence of a two-stage algorithm. The first stage is a regularized gradient descent which achieves global linear convergence with respect to the regularized problem. The second stage is a damped Newton algorithm similar to that of [KMT19] which has superlinear local convergence. The main difference between our algorithm and that of [KMT19] is that ours doesn't require $\Omega$ to be connected. Without a connected support the algorithm in [KMT19] may not converge at all.
Furthermore, we will show that the Laguerre cells, $\operatorname{Lag}_i(\psi)$, for the approximate optimizers constructed by our algorithm converge both in the sense of $\mu$-symmetric difference and in the sense of Hausdorff distance. This result is analogous to that of [BK19]. While the symmetric difference convergence follows directly from the results there, the Hausdorff convergence result in [BK19] heavily relies on the connectedness of $\Omega$.
2. Setup
2.1. Notations and Conventions
Here we collect some notations and conventions for the remainder of the paper. We fix positive integers $n$ and $N$ with $N \ge 2$, and a collection $Y = \{y_1, \dots, y_N\} \subset \mathbb{R}^n$. We will write $\mathbb{1}$
to denote the vector in $\mathbb{R}^N$
whose components are all $1$. For any vector $v$, we will write its components as superscripts, so $v^i$ is the $i$-th component of $v$. We reserve the notation $|\cdot|$ for the $2$-Euclidean norm, i.e. $|v| = \big(\sum_i (v^i)^2\big)^{1/2}$. We use $|\cdot|_1$ and $|\cdot|_\infty$ for the $1$ and $\infty$ Euclidean norms, i.e. $|v|_1 = \sum_i |v^i|$ and $|v|_\infty = \max_i |v^i|$. We use $\mathcal{L}^n$ to denote $n$-dimensional Lebesgue measure and $\mathcal{H}^s$ to denote $s$-dimensional Hausdorff measure. Also we use the notation
$\mathcal{A} := \{\psi \in \mathbb{R}^N : \mu(\operatorname{Lag}_i(\psi)) > 0 \text{ for all } 1 \le i \le N\}$
for the set of all admissible weight vectors, where the Laguerre cells $\operatorname{Lag}_i(\psi)$ are defined below.
We will assume that the cost function satisfies the following standard conditions:

(Reg)  $c(\cdot, y) \in C^{1,1}(X)$ for every $y \in Y$;

(Twist)  $\nabla_x c(x, y_i) \neq \nabla_x c(x, y_j)$ for every $x \in X$ and $i \neq j$.
We also assume the following condition, originally studied by Loeper in [Loe09]. The cost, $c$, is said to satisfy Loeper's condition if for each $y \in Y$ there exists a convex set $X_y \subseteq \mathbb{R}^n$ and a diffeomorphism $\exp_y^c : X_y \to X$ such that

(QC)  for every $t \in \mathbb{R}$ and $y' \in Y$, the set $\big\{p \in X_y : c(\exp_y^c(p), y) - c(\exp_y^c(p), y') \le t\big\}$ is convex.
See Remark 2.1 below for further discussion of these conditions.
We also say that a set $A \subseteq X$ is $c$-convex with respect to $Y$ if $(\exp_y^c)^{-1}(A)$ is a convex set for every $y \in Y$.
For any $\psi \in \mathbb{R}^N$ and $1 \le i \le N$, we define the $i$th Laguerre cell associated to $\psi$ as the set
$\operatorname{Lag}_i(\psi) := \{x \in X : c(x, y_i) - \psi^i \le c(x, y_j) - \psi^j \text{ for all } 1 \le j \le N\}.$
We also define the function $G : \mathbb{R}^N \to \mathbb{R}^N$ by
$G^i(\psi) := \mu\big(\operatorname{Lag}_i(\psi)\big),$
and denote for any $\epsilon > 0$,
$\mathcal{A}_\epsilon := \{\psi \in \mathbb{R}^N : G^i(\psi) \ge \epsilon \text{ for all } 1 \le i \le N\},$
so that $\mathcal{A} = \bigcup_{\epsilon > 0} \mathcal{A}_\epsilon$.
The above conditions (Reg), (Twist), (QC) are the same ones assumed in [KMT19] and [BK19]. As is also mentioned in those papers, (Reg) and (Twist) are standard conditions in optimal transport. Furthermore, (QC) holds if $Y$ is a finite set sampled from a continuous space, and $c$ is a cost function satisfying what is known as the Ma–Trudinger–Wang condition (first introduced in a strong form in [MTW05], and in [TW09] in a weaker form).
If $\mu$ is absolutely continuous with respect to Lebesgue measure (or even if $\mu$ doesn't give mass to small sets), then (Twist) implies that the Laguerre cells are pairwise $\mu$-almost disjoint. In this case the generalized Brenier's theorem [Vil09, Theorem 10.28] tells us that for any vector $\psi \in \mathbb{R}^N$, the map $T_\psi$ defined by $T_\psi(x) := y_i$ whenever $x \in \operatorname{Lag}_i(\psi)$ is a minimizer in the optimal transport problem (1), where the source measure is $\mu$ and the target measure is $\nu_\psi$ defined by $\nu_\psi := \sum_{i=1}^N G^i(\psi)\,\delta_{y_i}$.
We define the $c$-transform $\psi^c(x) := \min_{1 \le i \le N}\big(c(x, y_i) - \psi^i\big)$.
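To make these objects concrete, the following sketch (our own illustration, not part of the paper) approximates the cell masses $G^i(\psi)$ by replacing $\mu$ with weighted samples, for the quadratic cost $c(x,y) = \frac12|x-y|^2$; all function names here are hypothetical.

```python
import numpy as np

def laguerre_assign(x, y, psi):
    """For each sample x_m, the index i minimizing c(x_m, y_i) - psi^i,
    i.e. the Laguerre cell containing x_m (quadratic cost assumed)."""
    C = 0.5 * np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=2)  # c(x_m, y_i)
    return np.argmin(C - psi[None, :], axis=1)

def G(x, w, y, psi):
    """Monte-Carlo approximation of G^i(psi) = mu(Lag_i(psi)),
    with mu represented by weighted samples (x_m, w_m)."""
    idx = laguerre_assign(x, y, psi)
    return np.bincount(idx, weights=w, minlength=len(psi))

# Example: mu ~ uniform samples on [0,1]^2, N = 3 target points.
rng = np.random.default_rng(0)
x = rng.random((10_000, 2)); w = np.full(10_000, 1e-4)  # weights sum to 1
y = rng.random((3, 2)); psi = np.zeros(3)
print(G(x, w, y, psi))  # approximate cell masses; entries sum to ~1
# The map T_psi sends each sample to its cell's point: y[laguerre_assign(x, y, psi)]
```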
Throughout the rest of the paper we will assume that $\Omega$ is compact and $c$-convex with respect to $Y$, and that $\rho$, the density of $\mu$, is $\alpha$-Hölder continuous for some $\alpha \in (0, 1]$.
In order to obtain our convergence results we will need $\Omega$ to be at least locally connected in a quantitative way. First we recall a definition.
Definition. A non-zero measure $\mu$ on a compact set $E \subset \mathbb{R}^n$ satisfies a $q$-Poincaré–Wirtinger inequality for some $q \in [1, \infty)$ if there exists a constant $C > 0$ such that for any $f \in C^1(E)$,
$\Big\| f - \frac{1}{\mu(E)}\int_E f\,d\mu \Big\|_{L^q(\mu)} \;\le\; C\,\|\nabla f\|_{L^q(\mu)}.$
For brevity, we will write this as “$\mu$ satisfies a $q$-PW inequality on $E$”. If $\mu$ is defined on a larger set than $E$ then the phrase “$\mu$ satisfies a $q$-PW inequality on $E$” is understood to mean that $\mu(E) > 0$ and the restriction of $\mu$ to $E$ satisfies a $q$-PW inequality.
We remark that some version of all of our results holds with only $1$-PW inequalities, but larger $q$ will give better constants.
We now define a local version of the PW inequality that will serve as our quantitative measure of the local connectivity of $\Omega$.
Definition. A probability measure $\mu$ on a compact set $E$ satisfies a local $q$-Poincaré–Wirtinger inequality if there are compact sets $X_1, \dots, X_K \subseteq E$ which are pairwise disjoint and $\mu$-almost cover $E$, i.e. $\mu\big(E \setminus \bigcup_{k=1}^K X_k\big) = 0$, such that $\mu$ satisfies a $q$-PW inequality on each $X_k$ with constant $C_k$.
We define the local PW constant $C_{\mathrm{loc}}$ to be the $q$-th power mean of the $C_k$, i.e. $C_{\mathrm{loc}} := \big(\frac{1}{K}\sum_{k=1}^K C_k^q\big)^{1/q}$. Note that $C_{\mathrm{loc}} \le \max_k C_k$.
When $K = 1$ we recover the global connectivity assumption of [KMT19, Definition 1.3]. Throughout the paper we use the notation $m_k := \mu(X_k)$ and the vector $\mathbf{m} := (m_1, \dots, m_K)$.
In order to obtain convergence of our algorithm we require that the optimal map $T$ “splits up” the $X_k$. In order to assure this we assume that the subset sums of $\mathbf{m}$ and $\nu$ are separated. For any vector $v \in \mathbb{R}^d$ we introduce the notation $S(v)$ for the set of subset sums of $v$, i.e.
$S(v) := \Big\{ \sum_{k \in A} v^k : A \subseteq \{1, \dots, d\} \Big\},$
and the notation $S'(v)$ for the set of proper subset sums of $v$, i.e.
$S'(v) := \Big\{ \sum_{k \in A} v^k : \emptyset \neq A \subsetneq \{1, \dots, d\} \Big\}.$
With this notation, we say that the subset sums of $\mathbf{m}$ and $\nu$ are separated if $S(\mathbf{m}) \cap S'(\nu) = \emptyset$.
Note that for any fixed $\mu$ with fixed decomposition $\{X_k\}_{k=1}^K$, the collection of all $\nu$ such that our condition is not satisfied (i.e. $S(\mathbf{m}) \cap S'(\nu) \neq \emptyset$) is a “small set” in the sense that it has Hausdorff dimension $N - 2$, whereas the simplex of all possible $\nu$ has Hausdorff dimension $N - 1$. In particular if $\nu$ is randomly chosen from this simplex (with respect to any probability measure that is absolutely continuous with respect to $\mathcal{H}^{N-1}$), then with probability 1, we have $S(\mathbf{m}) \cap S'(\nu) = \emptyset$. We defer a formal proof of this till Proposition 8.
For the remainder of the paper we use the notation $\epsilon(v) := d\big(S(\mathbf{m}), S'(v)\big)$ for any vector $v$, and write $\epsilon_* := \epsilon(\nu)$. Note that since $N \ge 2$, $S'(\nu)$ is never empty and so $\epsilon_*$ is always defined. We defer a discussion of methods to compute and/or bound $\epsilon_*$ to section 8.
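Since $S(\mathbf{m})$ and $S'(\nu)$ are finite sets, $\epsilon_*$ can be computed by brute-force enumeration when $K$ and $N$ are small (the cost grows like $2^K \cdot 2^N$). A minimal sketch of our own:

```python
from itertools import combinations

def subset_sums(v, proper=False):
    """All subset sums of v; proper=True excludes the empty and full subsets."""
    n = len(v)
    sizes = range(1, n) if proper else range(0, n + 1)
    return {sum(c) for k in sizes for c in combinations(v, k)}

def separation(m, nu):
    """epsilon_* = min distance between S(m) and the proper subset sums S'(nu)."""
    return min(abs(s - t) for s in subset_sums(m) for t in subset_sums(nu, proper=True))

# Two components of equal mass against three target masses:
print(separation([0.5, 0.5], [0.3, 0.3, 0.4]))  # ~0.1, so the subset sums are separated
```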
It is well-known that the optimal transport problem (1) has a dual problem with strong duality (we refer the reader to [Vil03, Chapter 1] for background),

(2)  $\displaystyle\min_{T_\#\mu = \nu} \int_X c(x, T(x))\,d\mu(x) \;=\; \max_{\psi \in \mathbb{R}^N}\ \Big( \int_X \psi^c(x)\,d\mu(x) + \sum_{i=1}^N \psi^i \nu^i \Big).$
Furthermore given any dual optimizer $\psi$, the map $T_\psi$ from Remark 2.1 is an optimal transport map for the primal problem. Because of this we are able to optimize the finite dimensional dual problem instead of the infinite dimensional primal problem. We define Kantorovich's functional as in [KMT19, Theorem 1.1],
$\mathcal{K}(\psi) := \int_X \psi^c(x)\,d\mu(x) + \sum_{i=1}^N \psi^i\nu^i = \sum_{i=1}^N \int_{\operatorname{Lag}_i(\psi)} \big(c(x, y_i) - \psi^i\big)\rho(x)\,dx + \sum_{i=1}^N \psi^i\nu^i.$
Our algorithm will find approximate maximizers of $\mathcal{K}$. We also recall that $\nabla\mathcal{K}(\psi) = \nu - G(\psi)$.
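Both $\mathcal{K}$ and $\nabla\mathcal{K} = \nu - G$ are cheap to approximate with the same weighted samples used in the earlier sketch (again our own hypothetical illustration, quadratic cost):

```python
import numpy as np

def kantorovich(x, w, y, psi, nu):
    """Monte-Carlo approximation of K(psi) = int psi^c dmu + <psi, nu>."""
    C = 0.5 * np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=2)
    psi_c = np.min(C - psi[None, :], axis=1)  # the c-transform psi^c(x_m)
    return np.dot(w, psi_c) + np.dot(psi, nu)

def grad_kantorovich(x, w, y, psi, nu):
    """nabla K(psi) = nu - G(psi), with G from the earlier sketch."""
    return nu - G(x, w, y, psi)
```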
2.2. Previous Results
There are many papers that apply a Newton-type algorithm to solve semi-discrete optimal transport problems and Monge–Ampère equations. In [OP88] the authors prove local convergence of a Newton method for solving a semi-discrete Monge–Ampère equation. Global convergence is proved in [Mir15].
For the semi-discrete optimal transport problem, the authors of [KMT19] give a Newton method similar to our Algorithm 1 in the case when the source measure is given by a continuous probability density and has connected support. An analogous result for the case when the source measure is supported on a union of simplices is given in [MMT18]; however, this paper also requires a connectedness condition on the support of the source measure.
In [BK19], Kitagawa and the author present a Newton method to solve a generalization of the semi-discrete optimal transport problem in which there is a storage fee. The results there do not require a connectedness condition on the support of $\mu$; however, they require a strict convexity assumption on the storage fee function that doesn't hold for the classical semi-discrete optimal transport problem. Furthermore, it is shown there that the Laguerre cells associated to the approximate transport maps converge quantitatively both in the sense of $\mu$-symmetric difference and in the sense of Hausdorff distance.
2.3. Outline of the Paper
In section 3 we obtain quantitative bounds on the strong concavity of $\mathcal{K}$. In section 4 we exploit these bounds to obtain local convergence of a damped Newton algorithm. In section 5 we obtain global convergence of a gradient descent algorithm applied to a regularized version of $\mathcal{K}$. In section 6 we discuss convergence of our two stage algorithm. In section 7 we obtain quantitative convergence of the associated Laguerre cells. In section 8 we describe some methods to compute and/or estimate the parameter $\epsilon_*$ from the initial data.
3. Spectral Estimates
In this section we work toward quantitative bounds on the strong concavity of $\mathcal{K}$.
To start off we recall some notation in order to remain consistent with [KMT19]. We will write $\operatorname{int}(A)$ to denote the interior of the set $A$. Given an absolutely continuous measure $d\mu = \rho\,d\mathcal{L}^n$, where $\rho$ is continuous, and a set $A$ with Lipschitz boundary, we will write
$\mu^{n-1}(\partial A) := \int_{\partial A} \rho\,d\mathcal{H}^{n-1}.$
The next lemma is virtually identical to [BK19, Lemma 7.4]. We omit the proof.

Lemma. Suppose that $\mu$ satisfies a $q$-PW inequality on $E$ with constant $C$. Then
$\inf_A\ \frac{\mu^{n-1}\big(\partial A \cap \operatorname{int}(E)\big)}{\min\big(\mu(A),\ \mu(E \setminus A)\big)} \;\ge\; \frac{1}{C},$
where the infimum is over $A \subseteq E$ whose boundary is Lipschitz with finite $\mathcal{H}^{n-1}$-measure, and $0 < \mu(A) < \mu(E)$.
Recall from [KMT19] that $D^2\mathcal{K}(\psi)$ is negative semidefinite (as it is the Hessian of a concave function). Furthermore, recall that $\mathbb{1}$ is in the kernel of
$D^2\mathcal{K}(\psi)$, so the largest eigenvalue of
$D^2\mathcal{K}(\psi)$ is $0$. We now work toward the following estimate on the second largest eigenvalue.

Theorem. Suppose that $\mu$ satisfies a local $q$-PW inequality on $\Omega$ with pieces $X_1, \dots, X_K$ and constants $C_k$. Let $\psi \in \mathcal{A}_\epsilon$ with $|G(\psi) - \nu|_1 \le \epsilon_*/2$ be fixed. Assume that the subset sums of $\mathbf{m}$ and $\nu$ are separated, i.e., $\epsilon_* > 0$. Then the second largest eigenvalue of $D^2\mathcal{K}(\psi)$ is bounded away from zero. In particular it is bounded by $-\Lambda$, where $\Lambda > 0$ is a constant depending only on $\epsilon$, $\epsilon_*$, $N$, $\max_k C_k$, $\mu$, and $c$.
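The theorem can be sanity-checked numerically: approximate $D^2\mathcal{K}(\psi) = -DG(\psi)$ by finite differences of the sampled $G$ from the earlier sketches and inspect the spectrum. This rough diagnostic is our own (finite differences of a Monte-Carlo quantity are noisy) and is not part of the paper's method:

```python
import numpy as np

def hessian_fd(x, w, y, psi, h=1e-3):
    """Central finite-difference approximation of D^2 K(psi) = -DG(psi)."""
    N = len(psi)
    H = np.zeros((N, N))
    for j in range(N):
        e = np.zeros(N); e[j] = h
        H[:, j] = -(G(x, w, y, psi + e) - G(x, w, y, psi - e)) / (2 * h)
    return (H + H.T) / 2  # symmetrize

# Reusing x, w, y, psi from the first sketch: eigenvalues in descending order.
lam = np.sort(np.linalg.eigvalsh(hessian_fd(x, w, y, psi)))[::-1]
# lam[0] should be ~0 (kernel direction 1); lam[1] <= -Lambda under the theorem.
```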
At this point, given some $\psi \in \mathcal{A}_\epsilon$, we recall the construction of the weighted graph $G_\psi$ from [KMT19, Section 5.3]. $G_\psi$ is the simple weighted graph whose vertex set is the collection $\{1, \dots, N\}$, and for any $i$ and $j \neq i$, there is an edge between $i, j$ of weight given by
$w_{ij} := \mu^{n-1}\big(\operatorname{Lag}_{ij}(\psi)\big),$
where we have used the notation
$\operatorname{Lag}_{ij}(\psi) := \operatorname{Lag}_i(\psi) \cap \operatorname{Lag}_j(\psi)$
for $1 \le i, j \le N$, $i \neq j$.

Proposition. Under the conditions of Theorem 3, $G_\psi$ is connected by edges of weight at least $w_0 := \dfrac{\epsilon_*}{4 N^2 \max_k C_k}$.
Proof.
Suppose by contradiction that the proposition is false. This implies that removing all edges with weight strictly less than $w_0$ yields a disconnected graph. In other words, we can write $\{1, \dots, N\} = B_1 \cup B_2$ where $B_1$, and $B_2$ are nonempty and disjoint, such that every edge connecting a vertex in $B_1$ to a vertex in $B_2$ has weight strictly less than $w_0$. Letting $A := \bigcup_{i \in B_1}\operatorname{Lag}_i(\psi)$ we see that
$\mu^{n-1}\big(\partial A \cap \operatorname{int}(\Omega)\big) \le \sum_{i \in B_1,\ j \in B_2} w_{ij} < N^2 w_0.$
Now for each $k$ we claim that
$\min\big(\mu(A \cap X_k),\ \mu(X_k \setminus A)\big) \le C_k\,\mu^{n-1}\big(\partial A \cap \operatorname{int}(X_k)\big).$
If $\mu(A \cap X_k) = 0$ or $\mu(X_k \setminus A) = 0$ then the claim is trivial. Otherwise the claim follows from Lemma 3, after noting that $\partial(A \cap X_k) \cap \operatorname{int}(X_k) \subseteq \partial A \cap \operatorname{int}(X_k)$.
Now note that by construction

(3)  $\mu(A) = \sum_{i \in B_1} G^i(\psi), \qquad \mu(\Omega \setminus A) = \sum_{i \in B_2} G^i(\psi).$

On the other hand we can partition the $X_k$ into two types. Let $I \subseteq \{1, \dots, K\}$ be the index set so that $k \in I$ if and only if $\mu(A \cap X_k) \ge \mu(X_k \setminus A)$. We see that
$\Big|\mu(A) - \sum_{k \in I} m_k\Big| \le \sum_{k=1}^K \min\big(\mu(A \cap X_k),\ \mu(X_k \setminus A)\big) \le \max_k C_k \sum_{k=1}^K \mu^{n-1}\big(\partial A \cap \operatorname{int}(X_k)\big).$
Hence
$\Big|\mu(A) - \sum_{k \in I} m_k\Big| \le \max_k C_k\;\mu^{n-1}\big(\partial A \cap \operatorname{int}(\Omega)\big) < \max_k C_k\, N^2 w_0 = \frac{\epsilon_*}{4}.$
By definition $\sum_{k \in I} m_k \in S(\mathbf{m})$. Furthermore since both $B_1$ and $B_2$ are nonempty we have $\sum_{i \in B_1}\nu^i \in S'(\nu)$ and, recalling (3), $\big|\mu(A) - \sum_{i \in B_1}\nu^i\big| \le |G(\psi) - \nu|_1 \le \frac{\epsilon_*}{2}$. Hence we see that
$\epsilon_* = d\big(S(\mathbf{m}), S'(\nu)\big) \le \frac{\epsilon_*}{4} + \frac{\epsilon_*}{2} < \epsilon_*,$
which is a clear contradiction. ∎
4. Convergence of Newton Algorithm
First we recall the standard damped Newton Algorithm (similar to the one proposed in [KMT19]).
- Step 1:
Compute the Newton direction $d_k$, i.e. solve
$D^2\mathcal{K}(\psi_k)\,d_k = -\nabla\mathcal{K}(\psi_k), \qquad \langle d_k, \mathbb{1}\rangle = 0.$
- Step 2:
Determine the minimum $\ell \in \mathbb{N}$ such that $\psi_{k,\ell} := \psi_k + 2^{-\ell} d_k$ satisfies
$\big|\nabla\mathcal{K}(\psi_{k,\ell})\big| \le \big(1 - 2^{-(\ell+1)}\big)\,\big|\nabla\mathcal{K}(\psi_k)\big|.$
- Step 3:
Set $\psi_{k+1} := \psi_k + 2^{-\ell} d_k$ and $k \leftarrow k + 1$.
We remark that unlike the algorithm proposed in [KMT19] we do not impose a lower bound preventing the Laguerre cells from collapsing. This is because we will prove local convergence for the above algorithm, and within the zone of convergence the error reduction requirement will automatically imply that the cells don't collapse.
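For concreteness, here is a compact sketch of Algorithm 1 in the sampled setting of the earlier snippets. It is our own illustration: the pseudo-inverse stands in for solving the Newton system on $\mathbb{1}^\perp$, and a serious implementation would use the exact facet-integral Hessian of [KMT19] rather than finite differences:

```python
import numpy as np

def damped_newton(x, w, y, psi0, nu, tol=1e-6, max_iter=50):
    psi = psi0.copy()
    for _ in range(max_iter):
        g = grad_kantorovich(x, w, y, psi, nu)
        if np.linalg.norm(g) <= tol:
            break
        H = hessian_fd(x, w, y, psi)            # Step 1: Newton direction
        d = -np.linalg.pinv(H) @ g
        d -= d.mean()                           # keep <d, 1> = 0
        ell = 0                                 # Step 2: backtracking line search
        while ell < 30 and np.linalg.norm(
            grad_kantorovich(x, w, y, psi + 2.0**-ell * d, nu)
        ) > (1 - 2.0 ** -(ell + 1)) * np.linalg.norm(g):
            ell += 1                            # halve the step until error drops
        psi = psi + 2.0**-ell * d               # Step 3
    return psi
```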
Our main result for this section is that Algorithm 1 converges locally with superlinear rate.
Theorem. Suppose that $\mu$ satisfies a local $q$-PW inequality on $\Omega$ and that the subset sums of $\mathbf{m}$ and $\nu$ are separated. If $|\nabla\mathcal{K}(\psi_0)| \le \delta$, with $\delta$ as in Proposition 4 below, then Algorithm 1 converges with linear rate, and locally with rate $1 + \alpha$.
We start with a simple lemma.
Lemma. For $\psi, \tilde\psi \in \mathbb{R}^N$, we have $\sup_{x \in X}\big|\psi^c(x) - \tilde\psi^c(x)\big| \le |\psi - \tilde\psi|_\infty$.
Proof.
Let $x \in X$. We claim that $\psi^c(x) \le \tilde\psi^c(x) + |\psi - \tilde\psi|_\infty$. Indeed if $x \in \operatorname{Lag}_i(\varphi)$ then
$\varphi^c(x) = c(x, y_i) - \varphi^i.$
If we apply this to $\tilde\psi$ then we get that there is $i$ so that $\tilde\psi^c(x) = c(x, y_i) - \tilde\psi^i$. Hence
$\psi^c(x) \le c(x, y_i) - \psi^i \le c(x, y_i) - \tilde\psi^i + |\psi - \tilde\psi|_\infty,$
and so we get $\psi^c(x) \le \tilde\psi^c(x) + |\psi - \tilde\psi|_\infty$. A symmetric argument proves the opposite inequality.
∎
With the above lemma, Theorem 3 gives us that $\mathcal{K}$ is locally well-conditioned near its maximum.
Proposition. Suppose that $\mu$ satisfies a local $q$-PW inequality on $\Omega$ and the subset sums of $\mathbf{m}$ and $\nu$ are separated. Set $\delta := \frac{1}{2\sqrt N}\min\big(\min_i \nu^i,\ \epsilon_*\big)$. On the set $\mathcal{B} := \{\psi \in \mathbb{R}^N : |\nabla\mathcal{K}(\psi)| \le \delta\}$ we have that $\mathcal{K}$ is uniformly smooth and strongly concave, except in the direction $\mathbb{1}$. In particular on $\mathcal{B}$
$\|D^2\mathcal{K}(\psi)\| \le L$
and
$\langle D^2\mathcal{K}(\psi)v, v\rangle \le -\Lambda\,|Pv|^2 \quad \text{for all } v \in \mathbb{R}^N,$
where $P$
is the orthogonal projection onto the hyperplane perpendicular to
$\mathbb{1}$, and each of $L$ and $\Lambda$ is a constant depending only on $\mu$, $\nu$ and $c$ (the cost).

Proof.
Fix $\psi \in \mathcal{B}$. Since $|\nabla\mathcal{K}(\psi)|_1 \le \sqrt N\,|\nabla\mathcal{K}(\psi)| \le \sqrt N\,\delta$ and each $G^i(\psi) = \nu^i - \partial_{\psi^i}\mathcal{K}(\psi)$, we have that each $G^i(\psi) \ge \nu^i - \sqrt N\,\delta \ge \frac12\min_j \nu^j > 0$. Furthermore since
$|G(\psi) - \nu|_1 = |\nabla\mathcal{K}(\psi)|_1 \le \sqrt N\,\delta \le \frac{\epsilon_*}{2},$
we obtain that $G(\psi)$ is within $\epsilon_*/2$ of $\nu$ and so $\psi \in \mathcal{A}_\epsilon$ with $\epsilon := \frac12\min_j \nu^j$. In other words

(4)  $\mathcal{B} \subseteq \mathcal{A}_{\frac12\min_j\nu^j} \cap \big\{\psi : |G(\psi) - \nu|_1 \le \tfrac{\epsilon_*}{2}\big\}.$

Hence the first result follows by careful tracing through the proof of [KMT19, Theorem 4.1].
Now for the second claim, Lemma 4 and (4) give that
$\psi \in \mathcal{A}_{\frac12\min_j\nu^j} \quad\text{and}\quad |G(\psi) - \nu|_1 \le \frac{\epsilon_*}{2}.$
In particular the hypotheses of Theorem 3 are satisfied at every $\psi \in \mathcal{B}$. The second result now follows from Theorem 3.
∎
We briefly note that this proposition gives us uniqueness of the maximizer of $\mathcal{K}$ up to a multiple of $\mathbb{1}$.

Corollary. If $\psi_1, \psi_2$ are two maximizers of $\mathcal{K}$ then there is $s \in \mathbb{R}$ so that $\psi_1 = \psi_2 + s\mathbb{1}$.
Proof.
Since $\mathcal{K}$ is concave and $\psi_1, \psi_2$ are two maximizers, $\mathcal{K}$ must be constant along the line segment joining $\psi_1$ and $\psi_2$. In particular if $\psi_t := (1-t)\psi_1 + t\psi_2$ for $t \in [0,1]$ then $\mathcal{K}(\psi_t) = \mathcal{K}(\psi_1)$ and so $\psi_t$ is a maximizer of $\mathcal{K}$. Hence $\nabla\mathcal{K}(\psi_t) = 0$ for all $t \in [0,1]$. In particular we can conclude $\langle D^2\mathcal{K}(\psi_t)(\psi_2 - \psi_1),\ \psi_2 - \psi_1\rangle = 0$.
Now since $\psi_t$ is a maximizer we have $\nabla\mathcal{K}(\psi_t) = 0$, so the conditions of Proposition 4 are satisfied. Hence we have $\langle D^2\mathcal{K}(\psi_t)v, v\rangle \le -\Lambda|Pv|^2$ where $P$ is the orthogonal projection onto the hyperplane perpendicular to $\mathbb{1}$. Hence we get $P(\psi_2 - \psi_1) = 0$, which implies the claim.
∎
Remark. Given any maximizer, $\psi$, of $\mathcal{K}$ we see that $\psi + s\mathbb{1}$ is also a maximizer for any $s \in \mathbb{R}$. In particular given any maximizer, $\psi$, we can construct the maximizer $\bar\psi := \psi - \frac{\langle \psi, \mathbb{1}\rangle}{N}\mathbb{1}$, which is a maximizer of $\mathcal{K}$ that satisfies $\langle \bar\psi, \mathbb{1}\rangle = 0$.
Conversely Corollary 4 tells us that there is a unique $\psi$ that maximizes $\mathcal{K}$ and satisfies $\langle \psi, \mathbb{1}\rangle = 0$. For the remainder of the paper we shall refer to this unique maximizer as $\psi^*$.
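In code, the normalization from this remark is a one-line projection onto $\mathbb{1}^\perp$ (a hypothetical helper, matching the earlier sketches):

```python
import numpy as np

def normalize(psi: np.ndarray) -> np.ndarray:
    """Shift by a multiple of the all-ones vector so that <psi, 1> = 0."""
    return psi - psi.mean()
```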
It is well-known that Proposition 4 gives local convergence of the standard damped Newton algorithm. However, for the convenience of the reader we will give a self-contained proof.
Proposition. If $\psi_0 \in \mathcal{B}$ then the iterates of Algorithm 1 satisfy
$|\nabla\mathcal{K}(\psi_{k+1})| \le \Big(1 - \frac{\tau}{4}\Big)\,|\nabla\mathcal{K}(\psi_k)|,$
where
$\tau := \min\Big(1,\ \frac{\epsilon_0\,\Lambda}{L\,|\nabla\mathcal{K}(\psi_0)|},\ \frac{\Lambda^{(1+\alpha)/\alpha}}{(2C)^{1/\alpha}\,|\nabla\mathcal{K}(\psi_0)|}\Big)$
and
$\epsilon_0 := \frac12\min\big(\min_i G^i(\psi_0),\ \min_i \nu^i\big),$
where $C$ is a constant depending only on $\mu$, $\nu$ and $c$ (the cost).
Furthermore once we have $|\nabla\mathcal{K}(\psi_k)| \le \min\big(\frac{\epsilon_0\Lambda}{L},\ \frac{\Lambda^{(1+\alpha)/\alpha}}{(2C)^{1/\alpha}}\big)$ we get
$|\nabla\mathcal{K}(\psi_{k+1})| \le \frac{C}{\Lambda^{1+\alpha}}\,|\nabla\mathcal{K}(\psi_k)|^{1+\alpha}.$
Proof.
The proof is very similar to that of [KMT19, Proposition 6.1]. However we will give all the details.
Note that $\psi_0 \in \mathcal{B}$ implies $\min_i G^i(\psi_0) > 0$.
For this proof recall $\epsilon_0 = \frac12\min\big(\min_i G^i(\psi_0),\ \min_i \nu^i\big)$ from the statement. Note $\epsilon_0 \le \frac12\min_i \nu^i$ and so, by (4), $\mathcal{B} \subseteq \mathcal{A}_{\epsilon_0}$. (Note that it is NOT true in general that the iterates satisfy $\min_i G^i(\psi_k) \ge \min_i G^i(\psi_0)$; our choice of $\epsilon_0$ is the largest choice for which this is guaranteed to work.) Furthermore $L$ is the norm of $D^2\mathcal{K}$ on $\mathcal{A}_{\epsilon_0}$ and $\Lambda$ is the bound of the second largest eigenvalue of $D^2\mathcal{K}$ on $\mathcal{A}_{\epsilon_0}$. By Proposition 4 we have
$\|D^2\mathcal{K}(\psi)\| \le L \quad\text{and}\quad \langle D^2\mathcal{K}(\psi)v, v\rangle \le -\Lambda|Pv|^2 \qquad \text{for } \psi \in \mathcal{A}_{\epsilon_0},\ v \in \mathbb{R}^N.$
Also by carefully tracing through the proofs in [KMT19] we have $\|D^2\mathcal{K}(\psi) - D^2\mathcal{K}(\tilde\psi)\| \le C\,|\psi - \tilde\psi|^\alpha$ for $\psi, \tilde\psi \in \mathcal{A}_{\epsilon_0}$.
We analyze a single iteration of the Algorithm. Define $d_k$ to be the Newton direction of Step 1, so that $D^2\mathcal{K}(\psi_k)\,d_k = -\nabla\mathcal{K}(\psi_k)$ and $\langle d_k, \mathbb{1}\rangle = 0$.
Let $\psi_{k,t} := \psi_k + t\,d_k$. We see that
$|d_k| \le \frac{1}{\Lambda}\,|\nabla\mathcal{K}(\psi_k)|$
as $\mathcal{K}$ is $\Lambda$-concave in the direction orthogonal to $\mathbb{1}$.
Also define $e(t) := |\nabla\mathcal{K}(\psi_{k,t})|$. Let $t^*$ be the first exit time of $\psi_{k,t}$ from $\mathcal{A}_{\epsilon_0}$. We have that
$|G(\psi_{k,t}) - G(\psi_k)| \le L\,t\,|d_k| \qquad (0 \le t \le t^*),$
and so
$t^* \ge \frac{\epsilon_0}{L\,|d_k|} \ge \frac{\epsilon_0\,\Lambda}{L\,e(0)}.$
Applying Taylor's formula to $t \mapsto \nabla\mathcal{K}(\psi_{k,t})$ we get

(5)  $\nabla\mathcal{K}(\psi_{k,t}) = (1 - t)\,\nabla\mathcal{K}(\psi_k) + R(t),$

where

(6)  $|R(t)| \le C\,t^{1+\alpha}\,|d_k|^{1+\alpha} \le \frac{C}{\Lambda^{1+\alpha}}\,t^{1+\alpha}\,e(0)^{1+\alpha}$

for $0 \le t \le \min(1, t^*)$.
Finally we establish the error reduction estimates. (5) gives
$e(t) \le (1 - t)\,e(0) + |R(t)|,$
so we have
$e(t) \le \Big(1 - \frac{t}{2}\Big)\,e(0)$
provided that
$|R(t)| \le \frac{t}{2}\,e(0).$
Again using (6) this will be true provided
$t \le \Big(\frac{\Lambda^{1+\alpha}}{2C\,e(0)^\alpha}\Big)^{1/\alpha}.$
Hence we see that if we set $\tau$ as in the statement, then (since $e(0) \le |\nabla\mathcal{K}(\psi_0)|$ at every iteration) every step size $2^{-\ell} \le \tau$ is accepted in Step 2; the minimal accepted $\ell$ therefore satisfies $2^{-\ell} > \tau/2$, and the claim is true. Furthermore as the error goes to zero, eventually we must have $\ell = 0$. When this happens (5) gives
$|\nabla\mathcal{K}(\psi_{k+1})| \le |R(1)| \le \frac{C}{\Lambda^{1+\alpha}}\,|\nabla\mathcal{K}(\psi_k)|^{1+\alpha},$
and so we get the super-linear convergence.
∎
5. Convergence of Gradient Descent Algorithm
In order to remedy the fact that the above algorithm only converges locally, we propose a regularized version of the functional whose gradient descent will bring us within a close enough error.
We define the regularized Kantorovich's functional
$\mathcal{K}_\eta(\psi) := \mathcal{K}(\psi) - \frac{\eta}{2}\,|\psi|^2,$
where $\eta > 0$ is some parameter.
We see that
$\nabla\mathcal{K}_\eta(\psi) = \nu - G(\psi) - \eta\,\psi$
and
$D^2\mathcal{K}_\eta(\psi) = D^2\mathcal{K}(\psi) - \eta\,\mathrm{Id}.$
In particular we note that $\mathcal{K}_\eta$ is strongly $\eta$-concave. If we let $L$ be the Lipschitz constant of $\nabla\mathcal{K}$ (which is finite, with $L \le C$ for a constant $C$ depending only on $\mu$, $\nu$ and $c$) then $\mathcal{K}_\eta$ has Lipschitz gradient with constant $L + \eta$. Hence $\mathcal{K}_\eta$ is well-conditioned.
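Because $\mathcal{K}_\eta$ is $\eta$-strongly concave with an $(L+\eta)$-Lipschitz gradient, plain gradient ascent with step size $1/(L+\eta)$ converges linearly from any initialization. A sketch of this first stage, in the same sampled setting as before (our own illustration; the step size and stopping rule are placeholders):

```python
import numpy as np

def regularized_ascent(x, w, y, psi0, nu, eta, L, tol=1e-3, max_iter=10_000):
    """Gradient ascent on K_eta(psi) = K(psi) - (eta/2)|psi|^2."""
    psi, step = psi0.copy(), 1.0 / (L + eta)
    for _ in range(max_iter):
        g = grad_kantorovich(x, w, y, psi, nu) - eta * psi  # nabla K_eta
        if np.linalg.norm(g) <= tol:
            break
        psi = psi + step * g
    return psi  # warm start for the damped Newton stage
```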
Lemma. The projection of any maximizer of $\mathcal{K}$ onto the hyperplane perpendicular to $\mathbb{1}$ is bounded by $M := \max_{i,j}\sup_{x \in X}\big(c(x, y_i) - c(x, y_j)\big)$.
In other words if $\nabla\mathcal{K}(\psi) = 0$ and $\langle \psi, \mathbb{1}\rangle = 0$ then $|\psi|_\infty \le M$.
Proof.
Since $\langle \psi, \mathbb{1}\rangle = 0$ there are some indices $i_+, i_-$ so that $\psi^{i_+} \ge 0$ and $\psi^{i_-} \le 0$. Suppose for sake of contradiction that $|\psi|_\infty > M$. Then we must have $\max_i \psi^i > M$ or $\min_i \psi^i < -M$. In particular there is some index $i_0$ so that $|\psi^{i_0}| > M$.
We first look at the case where $\psi^{i_0} < -M$. In this case, for every $x \in X$,
$c(x, y_{i_0}) - \psi^{i_0} > c(x, y_{i_0}) + M \ge c(x, y_{i_+}) \ge c(x, y_{i_+}) - \psi^{i_+},$
and so we get $\operatorname{Lag}_{i_0}(\psi) = \emptyset$, hence $\partial_{\psi^{i_0}}\mathcal{K}(\psi) = \nu^{i_0} > 0$, which contradicts $\nabla\mathcal{K}(\psi) = 0$. A similar argument handles the case where $\psi^{i_0} > M$.
∎
We remark that since $X$ is compact and $c$ is continuous, $M < \infty$, and so the bound isn't vacuous.
Since $\mathcal{K}_\eta$ is strongly concave it has a unique maximizer. Although it may not be true that this maximizer is in $\{\psi : \langle \psi, \mathbb{1}\rangle = 0\}$ we show that it is still bounded.

Lemma. Let $\psi_\eta$ be the unique maximizer of $\mathcal{K}_\eta$. Then $|\psi_\eta| \le \sqrt N\,M$.
Proof.
Let $\psi^*$ be the maximizer of $\mathcal{K}$ constructed in Remark 4, so that $\langle \psi^*, \mathbb{1}\rangle = 0$. Then $|\psi^*|_\infty \le M$ and so $|\psi^*| \le \sqrt N\,M$. In particular $|\psi^*|^2 \le N M^2$. But since $\psi_\eta$ is the maximizer of $\mathcal{K}_\eta$ we have
$\mathcal{K}(\psi_\eta) - \frac{\eta}{2}|\psi_\eta|^2 = \mathcal{K}_\eta(\psi_\eta) \ge \mathcal{K}_\eta(\psi^*) = \mathcal{K}(\psi^*) - \frac{\eta}{2}|\psi^*|^2.$
Since $\mathcal{K}(\psi^*) \ge \mathcal{K}(\psi_\eta)$, rearranging gives $|\psi_\eta|^2 \le |\psi^*|^2 \le N M^2$. ∎