The Optimal Transport (OT) problem has a long history in mathematics and operations research, where it was originally used to find the optimal cost of transporting mass from one distribution to another (Villani, 2003). Over the last decade, OT has emerged as one of the most important tools for solving practical problems in statistics and machine learning (Peyré & Cuturi, 2019). Recently, the Unbalanced Optimal Transport (UOT) problem between two measures of possibly different masses has been used in several applications in computational biology (Schiebinger et al., 2019), computational imaging (Lee et al., 2019), and machine learning and statistics (Frogner et al., 2015; Janati et al., 2019).
The UOT problem is a regularized version of the Kantorovich formulation which places penalty functions on the marginal distributions based on some divergence (Liero et al., 2018).
components, the OT problem can be recast as a linear programming problem. The benchmark methods for solving the OT problem are interior-point methods of which the most practical complexity isdeveloped by (Pele & Werman, 2009). Recently, (Lee & Sidford, 2014) used Laplacian linear system algorithms to improve the complexity of interior-point methods to . However, the interior-point methods are not scalable when is large.
To deal with the scalability of computing the OT, (Cuturi, 2013) proposed to regularize its objective function by the entropy of the transportation plan, which results in the entropic regularized OT. One of the most popular algorithms for solving the entropic regularized OT is the Sinkhorn algorithm (Sinkhorn, 1974), which was shown by (Altschuler et al., 2017) to have a complexity of when used to approximate the OT within an -accuracy. In the same article, (Altschuler et al., 2017) developed a greedy version of the Sinkhorn algorithm, named the Greenkhorn algorithm, that has a better practical performance than the Sinkhorn algorithm. Later, the complexity of the Greenkhorn algorithm was improved to by a deeper analysis in (Lin et al., 2019b). To accelerate Sinkhorn and Greenkhorn algorithms, (Lin et al., 2019a) introduced Randkhorn and Gandkhorn algorithms that have complexity upper bounds of . These complexities are better than those of Sinkhorn and Greenkhorn algorithms in terms of the desired accuracy . A different line of algorithms for solving the OT problem is based on primal-dual algorithms. These algorithms include accelerated primal-dual gradient descent algorithm (Dvurechensky et al., 2018), accelerated primal-dual mirror descent algorithm (Lin et al., 2019b), and accelerated primal-dual coordinate descent algorithm (Guo et al., 2019). These primal-dual algorithms all have complexity upper bounds of , which are better than those of Sinkhorn and Greenkhorn algorithms in terms of . Recently, (Jambulapati et al., 2019; Blanchet et al., 2018) developed algorithms with complexity upper bounds of , which are believed to be optimal, based on either a dual extrapolation framework with area-convex mirror mapping or some black-box and specialized graph algorithms. However, these algorithms are quite difficult to implement. Therefore, they are less competitive than Sinkhorn and Greenkhorn algorithms in practice.
Our Contribution. While the complexity theory for OT is rather well understood, that for UOT is still nascent. In this paper, we establish the complexity of approximating UOT between two discrete measures with at most components. We focus on the setting where the penalty functions are Kullback-Leibler divergences. Similar to the entropic regularized OT, in order to account for the scalability of computing UOT, we also consider an entropic version of UOT, which we refer to as entropic regularized UOT. The Sinkhorn algorithm is widely used to solve the entropic regularized UOT (Chizat et al., 2016); however, its complexity for approximating the UOT has not been studied. Our contribution is to prove that the Sinkhorn algorithm has a complexity of
This complexity is close to the probably optimal one by a factor of logarithm of and .
The main difference between finding an -approximation solution for OT and UOT by the Sinkhorn algorithm is that the Sinkhorn algorithm for OT knows when it is close to the solution because of the constraints on the marginals, while the UOT does not have that advantage. Despite lacking that useful property, the UOT enjoys more freedom resulting in some interesting equations that relate the optimal value of the primal function to the masses of two measures (see Lemma 4). Those equations together with the geometric convergence of the dual solution prove the almost linear time convergence to an -approximation solution of the UOT.
Organization. The remainder of the paper is organized as follows. In Section 2, we provide a setup for the regularized UOT in primal and dual forms, respectively. Based on the dual form, we show the dual solution has a geometric convergence rate in Section 3. We also show in Section 3 that the Sinkhorn algorithm for the UOT has a complexity of order . Section 4 presents some empirical results confirming the complexity of the Sinkhorn algorithm. Finally, we conclude with Section 5.
Notation. We let stand for the set while stands for the set of all vectors in with nonnegative components for any . For a vector and , we denote as its -norm and as the diagonal matrix with on the diagonal. stands for a vector of length with all of its components equal to . refers to a partial gradient of with respect to . Lastly, given the dimension and accuracy , the notation stands for the upper bound where is independent of and . Similarly, the notation indicates that the previous inequality may depend on a logarithmic function of and , and where .
2 Unbalanced Optimal Transport with entropic regularization
In this section, we present the primal and dual forms of the entropic regularized UOT problem and define an -approximation for the solution of the unregularized UOT.
For any two positive vectors and , the UOT problem takes the form where
is a cost matrix, is a given regularization parameter and the divergence between vectors and is defined as
and is called the transportation plan. When and , the UOT problem becomes the standard OT problem. Similar to the original OT problem, the exact computation of UOT is expensive and does not scale with the dimension . Inspired by the recent success of the entropic regularized OT problem as an efficient approximation of the OT problem, we also consider the entropic version of the UOT problem (Frogner et al., 2015) of finding , where
is the regularization parameter and is the entropic regularization defined by
For each , the entropic regularized UOT problem is strongly convex.
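To make the two primal objectives concrete, the following minimal sketch evaluates them on a small example. The formulas are assumptions here, taken from the standard KL-penalized formulation rather than from the displayed equations: $f(X) = \langle C, X\rangle + \tau\,\mathrm{KL}(X\mathbf{1}\,\|\,a) + \tau\,\mathrm{KL}(X^\top\mathbf{1}\,\|\,b)$ with the generalized divergence $\mathrm{KL}(x\|y) = \sum_i \left(x_i \log(x_i/y_i) - x_i + y_i\right)$, and $g(X) = f(X) - \eta H(X)$ with $H(X) = -\sum_{i,j} X_{ij}(\log X_{ij} - 1)$. All function names and the example data are illustrative.

```python
import math

def kl(x, y):
    """Generalized KL divergence between positive vectors (total masses may differ)."""
    return sum(xi * math.log(xi / yi) - xi + yi for xi, yi in zip(x, y))

def entropy(X):
    """Entropic term H(X) = -sum_ij X_ij (log X_ij - 1), for strictly positive X."""
    return -sum(xij * (math.log(xij) - 1.0) for row in X for xij in row)

def marginals(X):
    """Row sums and column sums of the plan X."""
    rows = [sum(row) for row in X]
    cols = [sum(col) for col in zip(*X)]
    return rows, cols

def uot_objective(X, C, a, b, tau):
    """Assumed unregularized UOT objective f(X)."""
    rows, cols = marginals(X)
    cost = sum(c * x for crow, xrow in zip(C, X) for c, x in zip(crow, xrow))
    return cost + tau * kl(rows, a) + tau * kl(cols, b)

def entropic_uot_objective(X, C, a, b, tau, eta):
    """Assumed entropic regularized UOT objective g(X) = f(X) - eta * H(X)."""
    return uot_objective(X, C, a, b, tau) - eta * entropy(X)

# Small example: the two input measures have different total masses (3.0 vs 2.0).
C = [[0.0, 1.0], [1.0, 0.0]]
a = [1.0, 2.0]
b = [1.5, 0.5]
X = [[0.8, 0.1], [0.6, 0.4]]  # an arbitrary positive plan, not an optimum
f_val = uot_objective(X, C, a, b, tau=1.0)
g_val = entropic_uot_objective(X, C, a, b, tau=1.0, eta=0.5)
```

Because the marginals of `X` are only penalized, not constrained, any positive matrix is feasible, which is the structural difference from balanced OT that the later analysis relies on.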
For any , we call an -approximation transportation plan if the following holds
where is an optimal transportation plan for the UOT problem (1).
We aim to develop an algorithm to obtain -approximation transportation plan for the UOT problem (1). In order to do that, we consider the Fenchel-Legendre dual form of entropic regularized UOT, which is given by
Since and are given non-negative vectors, finding the optimal solution for the above objective is equivalent to finding the optimal solution for the following objective
Problem (4) is referred to as dual entropic regularized UOT.
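As an illustration of the dual problem, the sketch below evaluates a dual objective and checks that exact block minimization never increases it. The form used, $h(u, v) = \eta \sum_{i,j} \exp\!\big((u_i + v_j - C_{ij})/\eta\big) + \tau \langle e^{-u/\tau}, a\rangle + \tau \langle e^{-v/\tau}, b\rangle$, is an assumption based on the standard dual of KL-penalized entropic UOT; the helper names and example data are hypothetical.

```python
import math

def dual_objective(u, v, C, a, b, eta, tau):
    """Assumed dual objective h(u, v) to be minimized."""
    n, m = len(a), len(b)
    plan_mass = sum(math.exp((u[i] + v[j] - C[i][j]) / eta)
                    for i in range(n) for j in range(m))
    pen_a = sum(ai * math.exp(-ui / tau) for ai, ui in zip(a, u))
    pen_b = sum(bj * math.exp(-vj / tau) for bj, vj in zip(b, v))
    return eta * plan_mass + tau * pen_a + tau * pen_b

def block_update(other, cost_rows, mass, eta, tau):
    """Exact minimizer of h over one block of variables, with the other block fixed."""
    scale = eta * tau / (eta + tau)
    new_block = []
    for i in range(len(mass)):
        s = sum(math.exp((other[j] - cost_rows[i][j]) / eta) for j in range(len(other)))
        new_block.append(scale * (math.log(mass[i]) - math.log(s)))
    return new_block

C = [[0.0, 1.0], [1.0, 0.0]]
C_t = [list(col) for col in zip(*C)]  # transposed cost, used for the v-update
a = [1.0, 2.0]
b = [1.5, 0.5]
eta, tau = 0.5, 1.0

u, v = [0.0, 0.0], [0.0, 0.0]
values = [dual_objective(u, v, C, a, b, eta, tau)]
for _ in range(10):
    u = block_update(v, C, a, eta, tau)
    values.append(dual_objective(u, v, C, a, b, eta, tau))
    v = block_update(u, C_t, b, eta, tau)
    values.append(dual_objective(u, v, C, a, b, eta, tau))
```

Since $h$ is convex in each block and each update solves the corresponding zero-gradient condition exactly, the recorded `values` form a non-increasing sequence.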
3 Complexity analysis of approximating unbalanced optimal transport
In this section, we provide a complexity analysis of the Sinkhorn algorithm for approximating the UOT solution. We start with some notation and useful quantities, followed by the lemmas and main theorems.
3.1 Notations and assumptions
We first denote , . For each , its corresponding optimal transport in the dual form (4) is denoted by , where .
The corresponding solution in (2) is denoted by . Let , and .
Let be the solution returned at the -th iteration of the Sinkhorn algorithm and be the optimal solution of (4). Following the above scheme, we also define , and , correspondingly. Additionally, we define to be the optimal solution of the unregularized objective (1) and .
Unlike in the balanced OT, the optimal solutions of the entropic regularized UOT, and our complexity analysis overall, also depend on the masses and the regularization parameter . We will assume the following simple regularity conditions throughout the paper.
, are positive constants.
is a matrix of non-negative entries.
Before presenting the main theorem and analysis, for convenience, we define some quantities that will be used in our analysis and quantify their magnitudes under the regularity conditions.
List of quantities:
As we shall see, the quantities and are used to establish the convergence rate of . We now consider the order of and . Since the order of the penalty function is and should be small for a good approximation, is often chosen such that is sufficiently small. Hence, we can assume the dominant factor in the second term of is . If is a positive constant, then we can expect that be as small as for a constant . In this case, . Overall, we can assume that and if and are positive constants, then and .
3.2 Sinkhorn algorithm
The Sinkhorn algorithm (Chizat et al., 2016) alternately minimizes the dual function in (4) with respect to and . Suppose we are at iteration for and even. By setting the gradient to we can see that, given fixed , the update that minimizes the function in (4) satisfies
Multiplying both sides by , we get:
Similarly with fixed and
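The alternating updates can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reproduction of Algorithm 1: it assumes the standard closed-form update $u_i = \frac{\eta\tau}{\eta+\tau}\big[\log a_i - \log \sum_j \exp((v_j - C_{ij})/\eta)\big]$ (and symmetrically for $v$), and it performs one $u$-update and one $v$-update per pass rather than indexing even and odd iterations separately. The function name `sinkhorn_uot` and the example data are hypothetical.

```python
import math

def sinkhorn_uot(C, a, b, eta, tau, num_passes):
    """Alternating dual updates for KL-penalized entropic UOT (illustrative sketch)."""
    n, m = len(a), len(b)
    u, v = [0.0] * n, [0.0] * m
    scale = eta * tau / (eta + tau)
    for _ in range(num_passes):
        # Minimize the dual over u with v fixed (zero-gradient condition in closed form).
        for i in range(n):
            s = sum(math.exp((v[j] - C[i][j]) / eta) for j in range(m))
            u[i] = scale * (math.log(a[i]) - math.log(s))
        # Minimize the dual over v with the new u fixed.
        for j in range(m):
            s = sum(math.exp((u[i] - C[i][j]) / eta) for i in range(n))
            v[j] = scale * (math.log(b[j]) - math.log(s))
    # Recover the transportation plan from the dual variables.
    X = [[math.exp((u[i] + v[j] - C[i][j]) / eta) for j in range(m)] for i in range(n)]
    return u, v, X

C = [[0.0, 1.0], [1.0, 0.0]]
a = [1.0, 2.0]
b = [1.5, 0.5]
tau = 1.0
u, v, X = sinkhorn_uot(C, a, b, eta=0.5, tau=tau, num_passes=300)
row_sums = [sum(row) for row in X]
col_sums = [sum(col) for col in zip(*X)]
```

At convergence the marginals of `X` are not equal to `a` and `b`; under the assumed update they instead satisfy `row_sums[i] ≈ a[i] * exp(-u[i] / tau)` and `col_sums[j] ≈ b[j] * exp(-v[j] / tau)`, which is the fixed-point relation between the dual solution and the marginals.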
We now present the main theorems.
Theorem 1 establishes a convergence rate for the dual solution . The convergence is geometric, similar to the result of (Sejourne et al., ). However, we obtain a specific upper bound for the convergence rate which depends explicitly on the number of components and on all other parameters of the masses and penalty function. The convergence rate of Theorem 1 plays an important role in the complexity analysis of the next theorem.
The next corollary sums up the complexity of Algorithm 1.
Assume that conditions (A1-A2) hold and that and . Then the complexity of Algorithm 1 is
which is also .
Proof of Corollary 1.
By the assumptions on the order of , and the definition of in (10), we have
Overall, we obtain
Multiplying with arithmetic operations per iteration, we obtain the final complexity. ∎
In comparison to the best known OT complexity of similar order of , i.e. (Dvurechensky et al., 2018), our complexity for the OT is better by a factor of . Meanwhile, among the practical algorithms for OT which have a similar order of , i.e. Gandkhorn and Randkhorn, our bound is better by a factor of .
3.3 Analysis of the Sinkhorn algorithm
The analysis for Unbalanced Optimal Transport is different from that of Optimal Transport, since and are no longer probability measures. The proof of Theorem 1 requires the convergence rate of and an upper bound on the supremum norm of the optimal dual solution , the latter of which is presented in Lemma 3.
The optimal solution of (4) satisfies the following equations:
Since is a fixed point of the update in Algorithm 1, we get
This directly leads to the stated equality for , and that for can be obtained similarly. ∎
Assume that the regularity conditions (A1-A2) hold. Then the following are true
The proof is given in the appendix.
The sup norm of the optimal solution and is bounded by:
where is defined in (5).
We start with the equations for the solution in Lemma 1, i.e.
which can be rewritten as
The second term can be bounded as follows
thus leading to
Choosing such that , combining with the fact that , we have
Assuming without loss of generality that , we easily obtain the stated bound. ∎
Proof of Theorem 1.
We first consider the case when is even. From the update of in Algorithm 1, we have:
This leads to . Similarly, we obtain . Combining the two inequalities yields
Repeating all the above arguments alternately, we have
Note that for even, then
These two results lead to .
Similarly, for odd we obtain .
Thus the above inequality is true for all . Using the fact that and Lemma 3, we obtain the conclusion. ∎
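The mechanism behind the geometric rate can be checked numerically. The sketch below assumes the closed-form update $u_i = \frac{\eta\tau}{\eta+\tau}\big[\log a_i - \log \sum_j \exp((v_j - C_{ij})/\eta)\big]$; because log-sum-exp is 1-Lipschitz in the sup norm, this map shrinks sup-norm distances between dual vectors by a factor of at most $\tau/(\tau+\eta)$. That factor is derived from the assumed update and is meant only as an illustration; the exact constants of Theorem 1 are not reproduced here, and all names and data are hypothetical.

```python
import math

def update_u(v, C, a, eta, tau):
    """One exact u-update for a fixed v (assumed closed form)."""
    scale = eta * tau / (eta + tau)
    new_u = []
    for i in range(len(a)):
        s = sum(math.exp((v[j] - C[i][j]) / eta) for j in range(len(v)))
        new_u.append(scale * (math.log(a[i]) - math.log(s)))
    return new_u

def sup_dist(x, y):
    """Sup-norm distance between two vectors."""
    return max(abs(p - q) for p, q in zip(x, y))

C = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0, 0.0]]
a = [1.0, 2.0, 0.5]
eta, tau = 0.5, 1.0

# Two arbitrary dual vectors; the second could stand in for the optimal one.
v1 = [0.3, -0.2, 0.1]
v2 = [-0.5, 0.4, 0.0]

before = sup_dist(v1, v2)
after = sup_dist(update_u(v1, C, a, eta, tau), update_u(v2, C, a, eta, tau))
rate = tau / (tau + eta)  # assumed per-update contraction factor
```

Taking `v2` to be the optimal dual vector, for which the update returns the optimal `u`, gives the one-step inequality that the proof iterates alternately over `u` and `v`.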
3.4 Proof of the main theorem
Assume that the function attains its minimum at , then
Similarly, assume that attains its minimum at , then
Both equations in Lemma 4 establish the relationships between the optimal solutions of (1) and (2) with other parameters. Those relationships are very useful for analysing the behaviour of the optimal solution of UOT, because the UOT does not have any conditions on the marginals as the OT does. Consequences of Lemma 4 include Corollary 2 which provides upper bounds for and of (1) and (2) as well as bounds for the entropic functions in the proof of Theorem 2. The key idea of the proof surprisingly comes from the fact that the UOT solution does not have to meet the marginal constraints. We now present the proof of Lemma 4 and defer the proof of Corollary 2 to the Appendix.
Consider the function , where ,
For the KL term of , we have:
Similarly, we get
For the entropic penalty term,
Putting all results together, we obtain
Taking the derivative of with respect to ,
The function is well-defined for all . We know that attains its minimum at . Substituting into the above equation, we obtain
The second claim is proved in the same way. ∎
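The scaling argument can be verified numerically for the entropic objective. The sketch below relies on assumptions that are not confirmed by the displayed equations: it takes $g(X) = \langle C, X\rangle + \tau\,\mathrm{KL}(X\mathbf{1}\|a) + \tau\,\mathrm{KL}(X^\top\mathbf{1}\|b) - \eta H(X)$ with $H(X) = -\sum_{i,j} X_{ij}(\log X_{ij} - 1)$. Differentiating $t \mapsto g(tX^*)$ at $t = 1$ under these definitions yields $g(X^*) + (2\tau + \eta)\,\|X^*\|_1 = \tau(\alpha + \beta)$, where $\alpha$ and $\beta$ denote the total masses of $a$ and $b$. The helper names are hypothetical, and the minimizer is approximated by running alternating dual updates to numerical convergence.

```python
import math

def kl(x, y):
    """Generalized KL divergence between positive vectors."""
    return sum(xi * math.log(xi / yi) - xi + yi for xi, yi in zip(x, y))

def entropic_objective(X, C, a, b, tau, eta):
    """Assumed entropic UOT objective g(X) = <C, X> + tau*KL + tau*KL - eta*H(X)."""
    rows = [sum(row) for row in X]
    cols = [sum(col) for col in zip(*X)]
    cost = sum(c * x for crow, xrow in zip(C, X) for c, x in zip(crow, xrow))
    neg_entropy = sum(x * (math.log(x) - 1.0) for row in X for x in row)
    return cost + tau * kl(rows, a) + tau * kl(cols, b) + eta * neg_entropy

def solve_entropic_uot(C, a, b, eta, tau, num_passes=300):
    """Approximate the minimizer of g by alternating dual updates (sketch)."""
    n, m = len(a), len(b)
    u, v = [0.0] * n, [0.0] * m
    scale = eta * tau / (eta + tau)
    for _ in range(num_passes):
        for i in range(n):
            s = sum(math.exp((v[j] - C[i][j]) / eta) for j in range(m))
            u[i] = scale * (math.log(a[i]) - math.log(s))
        for j in range(m):
            s = sum(math.exp((u[i] - C[i][j]) / eta) for i in range(n))
            v[j] = scale * (math.log(b[j]) - math.log(s))
    return [[math.exp((u[i] + v[j] - C[i][j]) / eta) for j in range(m)] for i in range(n)]

C = [[0.0, 1.0], [1.0, 0.0]]
a = [1.0, 2.0]
b = [1.5, 0.5]
eta, tau = 0.5, 1.0

X_star = solve_entropic_uot(C, a, b, eta, tau)
total_mass = sum(x for row in X_star for x in row)  # ||X*||_1
lhs = entropic_objective(X_star, C, a, b, tau, eta) + (2 * tau + eta) * total_mass
rhs = tau * (sum(a) + sum(b))
residual = abs(lhs - rhs)
```

The identity ties the optimal value to the total transported mass without any marginal constraint, which is what makes it usable for bounding the mass of the optimal plan.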
Assume that conditions (A1-A2) hold and is sufficiently small. We have the following bounds on and :
Next, we use the condition for in Theorem 2 to bound some relevant quantities at the -th iteration of the Sinkhorn algorithm.
We are now ready to construct a proof for Theorem 2.
Proof of Theorem 2.
From the definitions of and , we have
since , as is the optimal solution of (2). The above two terms can be bounded separately as follows:
Upper bound of .
We first show the following inequalities
for any such that and .
Indeed, rewriting as
and using , we thus obtain (15).
Now apply the lower bound of (15) to
where the second inequality is due to being convex and by Corollary 2 and the third inequality is due to .
Similarly, apply the upper bound of (15) to
By combining the two results, we have
where is defined in (9).
Upper bound of .
where with denoting element-wise multiplication.
Denote . By Lemma 4,
Writing , following some derivations using the above equations of and and the definitions of and , we get
By part of Lemma 5, the first term is bounded by .