Let $\mu$ and $\nu$ be probability measures supported on a bounded convex set $\Omega \subset \mathbb{R}^d$. The quadratic Wasserstein distance $W_2(\mu,\nu)$ is defined by
\[
W_2(\mu,\nu) = \left( \inf_{\pi \in \Pi(\mu,\nu)} \int_{\Omega \times \Omega} |x - y|^2 \, d\pi(x,y) \right)^{1/2},
\]
where the infimum is taken over all transference plans $\pi \in \Pi(\mu,\nu)$ (probability measures on $\Omega \times \Omega$ such that $\pi(A \times \Omega) = \mu(A)$ and $\pi(\Omega \times B) = \nu(B)$ for all measurable sets $A, B \subseteq \Omega$). Computing the quadratic Wasserstein distance is a nonlinear problem. In this paper, we consider the case where $\mu$ and $\nu$ have smooth positive densities $f$ and $g$ with respect to Lebesgue measure: $d\mu = f\,dx$ and $d\nu = g\,dx$. In this case, there is a classical local linearization of $W_2$ based on a weighted negative homogeneous Sobolev norm, which is derived by linearizing a general Monge–Ampère equation, see §2. In particular, the weighted negative homogeneous Sobolev norm $\|\cdot\|_{\dot{H}^{-1}(\mu)}$ is defined for functions $u$ such that $\int_\Omega u \, dx = 0$ by
\[
\|u\|_{\dot{H}^{-1}(\mu)} = \sup \left\{ \int_\Omega u\, v \, dx \;:\; \|\nabla v\|_{L^2(\mu)} \le 1 \right\}. \tag{2}
\]
That is, the space $\dot{H}^{-1}(\mu)$ is the dual space of $\dot{H}^{1}(\mu)$. Under fairly general conditions, if $\nu = \mu + \delta\nu$ denotes a perturbation of $\mu$, then (informally speaking) we have
\[
W_2(\mu, \mu + \delta\nu) = \|\delta\nu\|_{\dot{H}^{-1}(\mu)} + \text{higher order terms},
\]
see for example [Theorem 7.2.6] for a precise statement. Under stronger assumptions, if $g = f + \varepsilon h$, then the error term can be shown to be $O(\varepsilon^2)$, see §3.5 for details. In this paper, we study how to leverage the connection between the $\dot{H}^{-1}(\mu)$-norm and the $W_2$ metric for computational purposes. In particular, our computational approach is based on a connection between the $\dot{H}^{-1}(\mu)$-norm and the Witten Laplacian, which is a Schrödinger operator of the form
\[
H = -\Delta + V,
\]
where $V$ is a potential that depends on the density $f$, see Figure 1. We show that this connection provides a method of computation whose cost can be controlled by the amount of regularization used when defining the potential, see §4 for details.
For the case of probability distributions on the unit square represented by arrays, we present code for computing this linearization of $W_2$ based on the Witten Laplacian, and we present a number of numerical examples, see §5. This Witten Laplacian perspective leads to several potential applications; in particular, a method of defining an embedding and methods of smoothing, both discussed in §6.
The quadratic Wasserstein distance is an instance of Monge–Kantorovich optimal transport, whose study was initiated by Monge in 1781 and generalized by Kantorovich in 1942. The theory of optimal transport has been developed by many authors; for a summary, see the book by Villani. Recently, due to new applications in data science and machine learning, developing methods to compute and approximate optimal transport distances has become an important area of research in applied mathematics; see the surveys by Peyré and Cuturi, and Santambrogio.
In this paper, we focus on the connection between the $W_2$ metric and the $\dot{H}^{-1}(\mu)$-norm, which can be used to approximate $W_2$, see §2.3 and §3.5. The connection between $W_2$ and $\dot{H}^{-1}(\mu)$ follows from the work of Brenier in 1987, who discovered that under appropriate conditions the solution to Monge–Kantorovich optimal transport with a quadratic cost can be expressed using a map that pushes forward one measure to the other; moreover, this map is the gradient of a convex function, see §2.1. Brenier's theorem reduces the problem of computing the Wasserstein distance to solving a generalized Monge–Ampère equation. Linearizing this general Monge–Ampère equation gives rise to an elliptic equation, which corresponds to a weighted negative homogeneous Sobolev norm, see §2.3 below or see [§4.1.2, §7.6 of ]. The connection between Wasserstein distance and negative homogeneous Sobolev norms has been considered by several different authors, as discussed in the following section.
1.3. Related work
Several authors have considered the connection between negative homogeneous Sobolev norms and the $W_2$ metric in a variety of contexts. In analysis, this connection has been used to establish estimates by many authors, see for example [13, 15, 23, 26, 27]. Moreover, this connection also arises in the study of how measures change under heat diffusion, see [5, 21, 30]. The weighted negative homogeneous Sobolev norm has also been considered in connection to maximum mean discrepancy (MMD), a technique that can be used to compute a distance between point clouds and has many applications in machine learning, see [1, 19, 20]. Authors have also considered this connection in papers focused on computing the $W_2$ metric, see [4, 9, 14]. Moreover, the negative homogeneous Sobolev norm has been considered in several applications to seismic imaging and image processing, see [8, 10, 31].
We emphasize three related works. First, Peyre establishes estimates for the $W_2$ metric in terms of the unweighted negative homogeneous Sobolev norm $\|\cdot\|_{\dot{H}^{-1}}$. Let $\mu$ and $\nu$ be probability measures on $\Omega$ with densities $f$ and $g$ with respect to Lebesgue measure. The result of Peyre says that if $c \le f, g \le C$ for constants $0 < c \le C$, then
\[
C^{-1/2}\, \|f - g\|_{\dot{H}^{-1}} \le W_2(\mu,\nu) \le 2\, c^{-1/2}\, \|f - g\|_{\dot{H}^{-1}};
\]
a discussion and concise proof of this result can be found in [§5.5.2 of ]. Informally, this result says that if $\mu$ and $\nu$ have their mass spread out over $\Omega$, then the Wasserstein distance is equivalent to the unweighted negative homogeneous Sobolev norm of $f - g$; as a consequence, the weighted Sobolev norm is most interesting for probability distributions that have regions of high and low density.
Second, Engquist, Ren, and Yang study the application of Wasserstein distance to inverse data matching. In particular, they compare the effectiveness of several different Sobolev norms for their applications; we note that the definitions of the norms they consider differ from the norm (2) that we consider, see the discussion in §3.6 below, but their results do indicate that negative homogeneous Sobolev norms may not be an appropriate substitute for the $W_2$ metric in some applications.
2. Preliminaries and motivation
In this section, we briefly summarize material from Chapters 0.1, 0.2, 2.1, 2.3, 4.1, 4.2, and 7.6 of the book by Villani. In order to make these preliminaries as concise as possible, we state all definitions and theorems for our special case of interest; in particular, we assume that all probability measures have smooth densities with respect to Lebesgue measure, and we restrict our attention to transport with respect to a quadratic cost function. This section is organized as follows: we consider the Monge–Kantorovich transport problem and Brenier's theorem in §2.1 and the Monge–Ampère equation in §2.2, and then discuss linearization of the $W_2$ metric in §2.3. The main purpose of these preliminaries is to provide background for (13), stated at the end of §2.3, which clarifies the statement that the $\dot{H}^{-1}(\mu)$-norm is a linearization of the $W_2$ metric.
2.1. Monge–Kantorovich optimal transport and Brenier’s theorem
Let $\Omega \subset \mathbb{R}^d$ be a bounded convex set, and let $\mu$ and $\nu$ be probability measures on $\Omega$ that have positive smooth densities $f$ and $g$, respectively, with respect to the Lebesgue measure. A transference plan $\pi$ is a probability measure on $\Omega \times \Omega$ such that
\[
\pi(A \times \Omega) = \mu(A) \qquad\text{and}\qquad \pi(\Omega \times B) = \nu(B)
\]
for all measurable subsets $A, B$ of $\Omega$. We denote the set of all transference plans by $\Pi(\mu,\nu)$, and define the transportation cost with respect to the quadratic cost function by
\[
I[\pi] = \int_{\Omega \times \Omega} |x - y|^2 \, d\pi(x,y).
\]
In this case, the Monge–Kantorovich optimal transport cost is defined by
\[
\mathcal{T}(\mu,\nu) = \inf_{\pi \in \Pi(\mu,\nu)} I[\pi]. \tag{4}
\]
Since we are considering a quadratic transportation cost, the Monge–Kantorovich optimal transport cost is the square of the $W_2$ metric:
\[
W_2(\mu,\nu)^2 = \mathcal{T}(\mu,\nu).
\]
Under the above conditions, Brenier’s theorem states that the Monge–Kantorovich optimization problem (4) has a unique solution $\pi$ satisfying
\[
d\pi(x,y) = d\mu(x)\,\delta[y = \nabla\psi(x)], \tag{6}
\]
where $\psi$ is a convex function such that $\nabla\psi_{\#}\mu = \nu$. Here $\#$ denotes the push-forward, and $\delta$ denotes a Dirac distribution. To be clear, given a map $T : \Omega \to \Omega$, the push-forward $T_{\#}\mu$ is defined by the relation $T_{\#}\mu(B) = \mu(T^{-1}(B))$ for all measurable sets $B$, or equivalently, by the relation
\[
\int_\Omega \zeta(T(x)) \, d\mu(x) = \int_\Omega \zeta(y) \, d\big(T_{\#}\mu\big)(y).
\]
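In dimension one, Brenier's map is the monotone rearrangement of quantiles, which gives a closed form for $W_2$ that is convenient for sanity checks. The following minimal sketch (our own Python illustration, not the code presented with this paper; the grid and function names are ours) computes $W_2$ for densities on $[0,1]$:

```python
import numpy as np

def w2_1d(f, g, x):
    """Quadratic Wasserstein distance between densities f and g on a 1-D grid.

    Uses the one-dimensional closed form
        W_2(mu, nu)^2 = int_0^1 |F^{-1}(s) - G^{-1}(s)|^2 ds,
    where F and G are the cumulative distribution functions of f and g.
    """
    dx = x[1] - x[0]
    F = np.cumsum(f) * dx                  # CDF of mu
    G = np.cumsum(g) * dx                  # CDF of nu
    s = (np.arange(2000) + 0.5) / 2000     # quantile levels in (0, 1)
    Finv = np.interp(s, F, x)              # F^{-1}(s) by linear interpolation
    Ginv = np.interp(s, G, x)
    return np.sqrt(np.mean((Finv - Ginv) ** 2))

# Example: uniform on [0, 1/2] vs uniform on [1/2, 1]; every quantile moves
# by exactly 1/2, so W_2 = 1/2.
x = (np.arange(1000) + 0.5) / 1000
f = np.where(x < 0.5, 2.0, 0.0)
g = np.where(x >= 0.5, 2.0, 0.0)
dist = w2_1d(f, g, x)
```

Here the optimal map is simply the shift $T(x) = x + 1/2$, the gradient of the convex function $\psi(x) = x^2/2 + x/2$.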
2.2. Monge–Ampère equation
Recall that by assumption $\mu$ and $\nu$ have densities $f$ and $g$, respectively, with respect to the Lebesgue measure. We can express (6) as
\[
\int_\Omega \zeta(\nabla\psi(x))\, f(x)\, dx = \int_\Omega \zeta(y)\, g(y)\, dy
\]
for all bounded continuous functions $\zeta$ on $\Omega$. Changing variables $y = \nabla\psi(x)$ on the right-hand side and using the fact that $\zeta$ is arbitrary gives
\[
f(x) = g(\nabla\psi(x)) \det D^2\psi(x).
\]
Upon rearranging this equation as
\[
\det D^2\psi(x) = \frac{f(x)}{g(\nabla\psi(x))},
\]
it is clear that it is an instance of the general Monge–Ampère equation
\[
\det D^2\psi(x) = F(x, \nabla\psi(x)).
\]
2.3. Linearization of the general Monge–Ampère equation
To linearize this general Monge–Ampère equation we assume that $f$ is positive, that $\nabla\psi$ is close to the identity, and that $g$ is close to $f$. More precisely, assume
\[
\psi(x) = \tfrac{1}{2}|x|^2 + \varepsilon\,\varphi(x) \qquad\text{and}\qquad g = f + \varepsilon\,h,
\]
so that
\[
\det D^2\psi(x) = 1 + \varepsilon\,\Delta\varphi(x) + \varepsilon^2 R_1(x), \qquad g(\nabla\psi(x)) = f(x) + \varepsilon\big(h(x) + \nabla f(x)\cdot\nabla\varphi(x)\big) + \varepsilon^2 R_2(x),
\]
where $R_1, R_2$ are remainder functions depending on $\varepsilon$, $\varphi$, $f$, $h$. It follows from rearranging terms in the Monge–Ampère equation $f = g(\nabla\psi)\det D^2\psi$ that, to leading order in $\varepsilon$,
\[
-\nabla\cdot\big(f(x)\nabla\varphi(x)\big) = h(x) + O(\varepsilon).
\]
Informally, combining this elliptic equation with the definition of the $\dot{H}^{-1}(\mu)$-norm gives
\[
W_2(\mu,\nu) = \|f - g\|_{\dot{H}^{-1}(\mu)} + \text{higher order terms}. \tag{13}
\]
3. Characterizing the operator $L$
So far we have presented background material that motivates why the quadratic Wasserstein distance is related to the operator $L$ defined by
\[
L u = -\nabla\cdot(f\,\nabla u).
\]
In this section, we discuss the connection between the operator $L$, the weighted negative homogeneous Sobolev norm $\|\cdot\|_{\dot{H}^{-1}(\mu)}$, and the quadratic Wasserstein distance in detail. The section is organized as follows. We start, in §3.1, by stating and proving a version of Green's first identity that $L$ satisfies. Second, in §3.2 we state a result connecting the $\dot{H}^{-1}(\mu)$-norm to the solution of an elliptic boundary value problem involving $L$. Third, in §3.3, we give a characterization of the $\dot{H}^{-1}(\mu)$-norm in terms of a divergence optimization problem. Fourth, in §3.4 we consider a characterization of the $\dot{H}^{-1}(\mu)$-norm involving the Witten Laplacian, which can be derived from $L$ by a change of variables. Fifth, in §3.5 we provide a more precise version of the statement that the $\dot{H}^{-1}(\mu)$-norm is a linearization of the $W_2$ metric. Finally, in §3.6, we discuss weighted Sobolev norms in relation to the Fourier transform.
3.1. Green’s first identity analog for $L$
Let $\Omega$ be a bounded convex domain in $\mathbb{R}^d$, let $u$ be a once differentiable function, and let $v$ be a twice differentiable function. Green's first identity states that
\[
\int_\Omega u\,\Delta v\, dx + \int_\Omega \nabla u\cdot\nabla v\, dx = \oint_{\partial\Omega} u\,\frac{\partial v}{\partial n}\, dS,
\]
where $\Delta = \nabla\cdot\nabla$ denotes the Laplacian, and $n$ is an exterior unit normal to the surface element $dS$.
Proposition 3.1. Suppose that $\mu$ is a measure on $\Omega$ which has a density $f$ with respect to Lebesgue measure: $d\mu = f\,dx$. Further assume that $f$ is once differentiable, $f > 0$ on $\overline\Omega$, and recall that $Lv = -\nabla\cdot(f\nabla v)$. Then,
\[
\int_\Omega u\, L v\, dx = \int_\Omega \nabla u\cdot\nabla v\, f\, dx \tag{15}
\]
whenever $u = 0$ on $\partial\Omega$ or $f\,\frac{\partial v}{\partial n} = 0$ on $\partial\Omega$.
3.2. Elliptic boundary value problem
Let $\Omega$ be a bounded convex domain in $\mathbb{R}^d$, and suppose that $\mu$ is a measure on $\Omega$ which has a density $f$ with respect to Lebesgue measure: $d\mu = f\,dx$. Assume that $f$ is once differentiable and $f > 0$ on $\overline\Omega$. For functions $u$ such that $\int_\Omega u\,dx = 0$ we define the $\dot{H}^{-1}(\mu)$-norm by
\[
\|u\|_{\dot{H}^{-1}(\mu)} = \sup\left\{ \int_\Omega u\, v\, dx \;:\; \|\nabla v\|_{L^2(\mu)} \le 1 \right\}.
\]
The following result characterizes the $\dot{H}^{-1}(\mu)$-norm in terms of an elliptic boundary value problem involving the operator $L$.

Proposition 3.2. We have
\[
\|u\|_{\dot{H}^{-1}(\mu)}^2 = \int_\Omega u\,\varphi\, dx = \int_\Omega \varphi\, L\varphi\, dx = \int_\Omega |\nabla\varphi|^2\, f\, dx, \tag{16}
\]
where $\varphi$ is a solution to the elliptic boundary value problem
\[
L\varphi = u \ \text{in } \Omega, \qquad f\,\frac{\partial\varphi}{\partial n} = 0 \ \text{on } \partial\Omega. \tag{17}
\]
Proof of Proposition 3.2.
The second and third equalities in (16) are a direct consequence of the definition of $L$ and Proposition 3.1, so we only need to show that $\|u\|_{\dot{H}^{-1}(\mu)}^2 = \int_\Omega u\,\varphi\,dx$. Assume that $\varphi$ is a solution to (17), and let $v$ be any function with $\|\nabla v\|_{L^2(\mu)} \le 1$. Substituting $u = L\varphi$, it follows from (15) that
\[
\int_\Omega u\, v\, dx = \int_\Omega \nabla v\cdot\nabla\varphi\, f\, dx. \tag{18}
\]
Using the Cauchy–Schwarz inequality gives
\[
\int_\Omega u\, v\, dx \le \|\nabla v\|_{L^2(\mu)}\, \|\nabla\varphi\|_{L^2(\mu)} \le \|\nabla\varphi\|_{L^2(\mu)},
\]
which implies that
\[
\|u\|_{\dot{H}^{-1}(\mu)} \le \|\nabla\varphi\|_{L^2(\mu)}.
\]
On the other hand, choosing $v = \varphi / \|\nabla\varphi\|_{L^2(\mu)}$ in (18) we have
\[
\int_\Omega u\, \frac{\varphi}{\|\nabla\varphi\|_{L^2(\mu)}}\, dx = \|\nabla\varphi\|_{L^2(\mu)},
\]
so we conclude that $\|u\|_{\dot{H}^{-1}(\mu)} = \|\nabla\varphi\|_{L^2(\mu)}$ and hence $\|u\|_{\dot{H}^{-1}(\mu)}^2 = \int_\Omega u\,\varphi\,dx$, as was to be shown. ∎
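Proposition 3.2 suggests a direct way to compute the $\dot{H}^{-1}(\mu)$-norm: discretize the operator and solve the boundary value problem. A minimal one-dimensional finite-difference sketch (our own illustration, not the paper's code; the grid, names, and test case are ours), using a conservative discretization of $-(f\varphi')'$ with zero-flux boundary conditions:

```python
import numpy as np

def h_neg1_norm_sq(u, f, dx):
    """|| u ||_{H^{-1}(mu)}^2 via Proposition 3.2: solve L phi = u, where
    (L phi)_i discretizes -(f phi')' with zero-flux boundary conditions on a
    cell-centered grid, then return the quadrature of u * phi."""
    n = len(u)
    fm = 0.5 * (f[:-1] + f[1:])            # density at cell interfaces
    L = np.zeros((n, n))
    for i in range(n):
        if i > 0:                          # flux through left interface
            L[i, i] += fm[i - 1]
            L[i, i - 1] -= fm[i - 1]
        if i < n - 1:                      # flux through right interface
            L[i, i] += fm[i]
            L[i, i + 1] -= fm[i]
    L /= dx ** 2
    # L is singular (constants are in its null space); lstsq returns the
    # minimum-norm least-squares solution, which is mean-zero.
    phi = np.linalg.lstsq(L, u - u.mean(), rcond=None)[0]
    return np.sum(u * phi) * dx

# Example with f = 1 and u = cos(pi x): the exact value is 1 / (2 pi^2).
n = 400
dx = 1.0 / n
x = (np.arange(n) + 0.5) * dx
val = h_neg1_norm_sq(np.cos(np.pi * x), np.ones(n), dx)
```

For $f \equiv 1$ the problem reduces to a Neumann Poisson equation, and $u = \cos(\pi x)$ is an eigenfunction, which makes the answer easy to check by hand.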
3.3. Divergence formulation
The $\dot{H}^{-1}(\mu)$-norm can also be formulated as an optimization problem over vector fields satisfying a divergence condition. We note that our computational approach does not directly use this divergence formulation; we state the following result for completeness and for its connection to other methods.
Proposition 3.3. We have
\[
\|u\|_{\dot{H}^{-1}(\mu)}^2 = \min_{w} \int_\Omega \frac{|w(x)|^2}{f(x)}\, dx,
\]
where the minimum is taken over vector fields $w$ with continuous first-order partial derivatives that satisfy
\[
-\nabla\cdot w = u \ \text{in } \Omega, \qquad w\cdot n = 0 \ \text{on } \partial\Omega. \tag{19}
\]
Note that it will become clear from the proof that it would suffice to assume that (19) holds in a weak sense.
First, observe that if $\varphi$ is a solution to the elliptic boundary value problem (17), then $w_0 = f\nabla\varphi$ is admissible for the minimization since
\[
-\nabla\cdot w_0 = -\nabla\cdot(f\nabla\varphi) = u \ \text{in } \Omega,
\]
and $w_0\cdot n = f\,\frac{\partial\varphi}{\partial n} = 0$ on $\partial\Omega$. And from Proposition 3.2 we have
\[
\int_\Omega \frac{|w_0|^2}{f}\, dx = \int_\Omega |\nabla\varphi|^2\, f\, dx = \|u\|_{\dot{H}^{-1}(\mu)}^2.
\]
To complete the proof it suffices to show that every admissible $w$ satisfies
\[
\int_\Omega \frac{|w|^2}{f}\, dx \ge \int_\Omega \frac{|w_0|^2}{f}\, dx.
\]
If we define $w_1 = w - w_0$, then $w_1$ satisfies
\[
\nabla\cdot w_1 = 0 \ \text{in } \Omega, \qquad w_1\cdot n = 0 \ \text{on } \partial\Omega,
\]
and
\[
\int_\Omega \frac{|w|^2}{f}\, dx = \int_\Omega \frac{|w_0|^2}{f}\, dx + 2\int_\Omega \frac{w_0\cdot w_1}{f}\, dx + \int_\Omega \frac{|w_1|^2}{f}\, dx.
\]
To complete the proof we will show that $\int_\Omega f^{-1}\, w_0\cdot w_1\, dx = 0$. Observe that
\[
\int_\Omega \frac{w_0\cdot w_1}{f}\, dx = \int_\Omega \nabla\varphi\cdot w_1\, dx = \int_\Omega \nabla\cdot(\varphi\, w_1)\, dx - \int_\Omega \varphi\,\nabla\cdot w_1\, dx.
\]
The second integral on the right hand side is zero since $\nabla\cdot w_1 = 0$ in $\Omega$. The first integral on the right hand side is zero since by the Gauss divergence theorem
\[
\int_\Omega \nabla\cdot(\varphi\, w_1)\, dx = \oint_{\partial\Omega} \varphi\, w_1\cdot n\, dS,
\]
and $w_1\cdot n = 0$ on $\partial\Omega$ by assumption. This completes the proof. ∎
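In dimension one the divergence constraint determines the vector field uniquely, so the minimization in the divergence formulation collapses to a single quadrature. A brief sketch under our own discretization assumptions (not the paper's code):

```python
import numpy as np

def h_neg1_norm_div(u, f, x):
    """|| u ||_{H^{-1}(mu)} on [0, 1] via the divergence formulation.

    In 1-D the constraint -w' = u with w(0) = w(1) = 0 has the unique
    solution w(x) = -int_0^x u(t) dt (w(1) = 0 because u has mean zero),
    so the minimization reduces to the quadrature of w^2 / f.
    """
    dx = x[1] - x[0]
    w = -np.cumsum(u) * dx                 # w(x) = -\int_0^x u(t) dt
    return np.sqrt(np.sum(w ** 2 / f) * dx)

# Example with f = 1 and u = cos(pi x): exact value sqrt(1 / (2 pi^2)).
x = (np.arange(4000) + 0.5) / 4000
val = h_neg1_norm_div(np.cos(np.pi * x), np.ones_like(x), x)
```

This one-dimensional collapse of the minimization is one reason quantities related to $W_2$ simplify dramatically in dimension one.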
3.4. Witten Laplacian formulation
The $\dot{H}^{-1}(\mu)$-norm can also be defined in terms of a boundary value problem involving the Witten Laplacian $H$, which is a Schrödinger operator of the form
\[
H = -\Delta + V, \tag{20}
\]
where $V$ is a potential depending on $f$ defined by
\[
V = \frac{1}{2}\frac{\Delta f}{f} - \frac{1}{4}\frac{|\nabla f|^2}{f^2}.
\]
Alternatively, the Witten Laplacian can be defined by the similarity transform
\[
H u = f^{-1/2}\, L\big(f^{-1/2} u\big), \tag{21}
\]
which symmetrizes the problem in the sense that the resulting operator is self-adjoint with respect to the standard (unweighted) $L^2$ inner product; for a discussion of the Witten Laplacian and some spectral estimates see . In the following proposition, we give an elliptic equation involving $H$ that can be used to compute the $\dot{H}^{-1}(\mu)$-norm; this proposition is an immediate consequence of the fact that the Schrödinger operator definition (20) is consistent with the similarity transform definition (21).
Proposition 3.4. Using the notation $\tilde{u} = f^{-1/2} u$ and $\tilde{\varphi} = f^{1/2}\varphi$, we have
\[
\|u\|_{\dot{H}^{-1}(\mu)}^2 = \int_\Omega \tilde{u}\,\tilde{\varphi}\, dx,
\]
where $\tilde{\varphi}$ is the solution to the elliptic boundary value problem:
\[
H\tilde{\varphi} = \tilde{u} \ \text{in } \Omega, \qquad f\,\frac{\partial}{\partial n}\big(f^{-1/2}\tilde{\varphi}\big) = 0 \ \text{on } \partial\Omega.
\]
This formulation is an immediate consequence of the identity
\[
H\big(f^{1/2}\varphi\big) = f^{-1/2}\, L\varphi,
\]
which is straightforward to verify: expanding $H(f^{1/2}\varphi) = -\Delta(f^{1/2}\varphi) + V f^{1/2}\varphi$ gives
\[
H\big(f^{1/2}\varphi\big) = -f^{1/2}\Delta\varphi - f^{-1/2}\,\nabla f\cdot\nabla\varphi - \varphi\,\Delta f^{1/2} + V f^{1/2}\varphi,
\]
and after canceling terms (recall that $V f^{1/2} = \Delta f^{1/2}$) we have
\[
H\big(f^{1/2}\varphi\big) = -f^{1/2}\Delta\varphi - f^{-1/2}\,\nabla f\cdot\nabla\varphi = -f^{-1/2}\,\nabla\cdot(f\nabla\varphi) = f^{-1/2}\, L\varphi.
\]
From the above calculation, it is also clear that
\[
\int_\Omega \tilde{u}\,\tilde{\varphi}\, dx = \int_\Omega u\,\varphi\, dx,
\]
since the factors $f^{-1/2}$ and $f^{1/2}$ cancel. ∎
Remark 3.1 (Alternate form of the potential).
Using the fact that
\[
\Delta f^{1/2} = \left( \frac{1}{2}\frac{\Delta f}{f} - \frac{1}{4}\frac{|\nabla f|^2}{f^2} \right) f^{1/2},
\]
we can write $V$ as
\[
V = \frac{\Delta f^{1/2}}{f^{1/2}}.
\]
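The equality of the two expressions for the potential is easy to check numerically. A small sketch (our own illustration on a periodic grid, with finite differences standing in for the derivatives; the density is a hypothetical smooth example):

```python
import numpy as np

# Check numerically that the two expressions for the Witten potential agree:
#   V = (1/2) (Lap f) / f - (1/4) |grad f|^2 / f^2   and   V = (Lap sqrt f) / sqrt f.
n = 512
dx = 1.0 / n
x = np.arange(n) * dx
f = 1.0 + 0.5 * np.sin(2 * np.pi * x)    # smooth positive density (unnormalized)

def d2(v):   # periodic second difference
    return (np.roll(v, -1) - 2 * v + np.roll(v, 1)) / dx ** 2

def d1(v):   # periodic centered first difference
    return (np.roll(v, -1) - np.roll(v, 1)) / (2 * dx)

V1 = 0.5 * d2(f) / f - 0.25 * d1(f) ** 2 / f ** 2
V2 = d2(np.sqrt(f)) / np.sqrt(f)
err = np.max(np.abs(V1 - V2))            # agreement up to discretization error
```

Note that $V$ is invariant under rescaling $f \mapsto c f$, so the normalization of the density does not affect the potential.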
3.5. Linearization remainder estimate
Recall that previously in §2.3 we gave the informal estimate
\[
W_2(\mu,\nu) \approx \|f - g\|_{\dot{H}^{-1}(\mu)}.
\]
The purpose of this section is to make this informal statement more precise; we emphasize that the result proved in this section is for illustrative purposes: results involving weaker assumptions and weaker regularity conditions are possible.
Let $\mu$ and $\nu$ be probability measures with densities $f$ and $g$ with respect to the Lebesgue measure: $d\mu = f\,dx$ and $d\nu = g\,dx$. Assume that
\[
g = f + \varepsilon\, h, \tag{22}
\]
where $f$ is a smooth function satisfying $f > 0$ on $\overline\Omega$. Further, assume that
\[
\psi(x) = \tfrac{1}{2}|x|^2 + \varepsilon\,\varphi(x),
\]
where $\varphi$ is a smooth function.
Assume that $\psi$ satisfies the nonlinear equation
\[
f(x) = g(\nabla\psi(x))\, \det D^2\psi(x).
\]
Let $\eta$ be the solution to the elliptic boundary value problem
\[
L\eta = h \ \text{in } \Omega, \qquad f\,\frac{\partial\eta}{\partial n} = 0 \ \text{on } \partial\Omega.
\]
We have the following result.
We demonstrate this result numerically in §5.3.
Proof of Proposition 3.5.
By the Lagrange remainder formulation of Taylor's Theorem and (22), we have
where the remainder functions can be expressed by
where are points on the line segment between and . It follows that
Multiplying both sides of this equation by and integrating over gives
Using (15) to rewrite the left hand side gives
We can bound the norm of by the norm of and the inverse of the smallest positive eigenvalue of ; therefore, we can complete the proof by using Cauchy-Schwarz and almost everywhere bounds on all other quantities. ∎
We note that it is possible to obtain various estimates from (25) in terms of norms of the relevant quantities instead of almost everywhere bounds. Moreover, results that guarantee bounds on the solution of general Monge–Ampère equations could be used to provide bounds on these quantities; for example, see .
3.6. Fourier transform and weighted Sobolev norms
In this section, we discuss a family of weighted Sobolev norms defined by Engquist, Ren, and Yang using the Fourier transform
\[
\hat{u}(\xi) = \int_{\mathbb{R}^d} u(x)\, e^{-2\pi i\, x\cdot\xi}\, dx.
\]
In  the authors compare the metric to a family of weighted Sobolev norms defined in Fourier domain for the purpose of inverse data matching; in particular, in [Eq. 8 and Remark 2.1 of ] they define
where and denote the Fourier transform of and , respectively, and denotes convolution
and the -norm defined in (2) by
also see [§7.6 of ]. First, observe that if and is the constant function, then is the Dirac delta distribution and
where the final inequality follows from the Plancherel theorem. However, if and is arbitrary, then in general
where denotes the inverse Fourier transform of , and is assumed to have density with respect to Lebesgue measure. It does follow from the Plancherel theorem that
However, in general, the two functions are not equal, and thus in general their integrals against a given weight are not equal; in particular, the integrals can be very different when the weight is localized in space. Roughly speaking, the issue is that taking the absolute value does not commute with taking the convolution. It is possible to define the $\dot{H}^{-1}(\mu)$-norm in the Fourier domain; however, this does not seem to lead to a viable way to compute the dual norm except in dimension $d = 1$; we note that the quadratic Wasserstein distance also has a simple characterization in dimension $d = 1$, see Remark 3.3.
4. Computation and regularization
In this section, we consider the connection between the $\dot{H}^{-1}(\mu)$-norm and the Witten Laplacian, see §3.4, from a computational point of view. Recall that the Witten Laplacian $H$ is a Schrödinger operator of the form
\[
H = -\Delta + V,
\]
where $V$ is a potential. Roughly speaking, the advantage of considering this formulation is that all the complexity of the problem has been distilled into the potential $V$, which can be regularized to manage the computational cost. This section is organized as follows. First, in §4.1 we consider a spectral decomposition of the Laplacian by its Neumann eigenfunctions and define the fractional Laplacian. Second, in §4.2, we change variables using fractional Laplacians to precondition our elliptic equation involving $H$. Third, in §4.3, we observe how using the heat equation to define a smoothed version of $V$ can control the condition number of our problem. Finally, in §4.4 we discuss the computational cost of the described method.
4.1. Spectral decomposition of the Laplacian
Suppose that $\Omega$ is a bounded convex domain. Recall that $\lambda$ is a Neumann eigenvalue of the Laplacian on $\Omega$ if there is a corresponding eigenfunction $e$ such that
\[
-\Delta e = \lambda\, e \ \text{in } \Omega, \qquad \nabla e\cdot n = 0 \ \text{on } \partial\Omega,
\]
where $\partial\Omega$ denotes the boundary of $\Omega$, and $n$ denotes an exterior unit normal to the boundary. The Neumann eigenvalues of the Laplacian are nonnegative real numbers that satisfy
\[
0 = \lambda_0 < \lambda_1 \le \lambda_2 \le \cdots \nearrow \infty,
\]
and the corresponding eigenfunctions $e_0, e_1, e_2, \ldots$ form an orthogonal basis of square integrable functions on $\Omega$. We can use this basis of Neumann eigenfunctions to define the fractional Laplacian $(-\Delta)^s$ for $s \in \mathbb{R}$ and $u = \sum_{j \ge 0} \alpha_j e_j$ by
\[
(-\Delta)^s u = \alpha_0\, e_0 + \sum_{j \ge 1} \lambda_j^s\, \alpha_j\, e_j,
\]
where we include the constant term $\alpha_0 e_0$, independent of $s$, so that the operator is well-defined and invertible for both positive and negative $s$. With this definition, the operator $(-\Delta)^{1/2}$ is invertible, which will become relevant in the following section; in particular, we will use the invertible operator $(-\Delta)^{-1/2}$ to precondition the elliptic equation involving $H$.
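On a one-dimensional cell-centered grid, the Neumann eigenfunctions are cosines (the DCT-II basis), which makes the fractional Laplacian easy to apply. A sketch under these assumptions (the names and the discretization are ours, not the paper's code):

```python
import numpy as np

def neumann_eigenbasis(n, dx):
    """Orthonormal eigenpairs of the discrete Neumann Laplacian on a
    cell-centered 1-D grid: the eigenvectors are the DCT-II cosines
    e_k(x_i) = cos(k pi (i + 1/2) / n)."""
    i, k = np.arange(n), np.arange(n)
    E = np.cos(np.pi * np.outer(i + 0.5, k) / n)
    E[:, 0] /= np.sqrt(n)                  # normalize the constant mode
    E[:, 1:] *= np.sqrt(2.0 / n)           # normalize the oscillatory modes
    lam = (2.0 - 2.0 * np.cos(np.pi * k / n)) / dx ** 2
    return E, lam

def frac_laplacian(u, s, E, lam):
    """(-Delta)^s u; the constant mode is passed through unchanged so that
    the operator is invertible for every real s."""
    c = E.T @ u                            # cosine coefficients
    scale = np.ones_like(lam)
    scale[1:] = lam[1:] ** s
    return E @ (scale * c)

n = 200
E, lam = neumann_eigenbasis(n, 1.0 / n)
u = np.random.default_rng(0).standard_normal(n)
v = frac_laplacian(frac_laplacian(u, 0.5, E, lam), -0.5, E, lam)
# (-Delta)^{-1/2} inverts (-Delta)^{1/2}, so v recovers u
```

In practice one would apply the transform with a fast DCT rather than a dense matrix; the dense form above is only meant to mirror the eigenfunction definition.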
4.2. Preconditioning the elliptic equation
Recall that our goal is to solve the elliptic equation
\[
H\tilde{\varphi} = \tilde{u}.
\]
It follows that
\[
(-\Delta)^{-1/2}\, H\, (-\Delta)^{-1/2} = I - P + (-\Delta)^{-1/2}\, V\, (-\Delta)^{-1/2},
\]
where $P$ denotes the projection onto the space of constant functions,
\[
P u = \frac{1}{|\Omega|} \int_\Omega u(x)\, dx,
\]
and $I$ denotes the identity operator. We remark that the projection $P$ is necessary in the definition of the preconditioned operator since we have defined $(-\Delta)^{-1/2}$ to preserve constant functions, while the Laplacian destroys constant functions.
Since $(-\Delta)^{-1/2}$ is invertible, the dimension of the null space of $(-\Delta)^{-1/2} H (-\Delta)^{-1/2}$ is the same as the dimension of the null space of $H$, which is $1$-dimensional. In particular, we have
\[
H f^{1/2} = f^{-1/2}\, L\big(f^{-1/2} f^{1/2}\big) = f^{-1/2}\, L 1 = 0.
\]
Let $\lambda_1(H)$ and $\lambda_1(B)$ denote the smallest positive eigenvalue of $H$ and of the preconditioned operator $B = (-\Delta)^{-1/2} H (-\Delta)^{-1/2}$, respectively. If $w$ is a normalized eigenvector associated with $\lambda_1(B)$, then it follows from the Courant–Fischer Theorem that $\lambda_1(B)$ admits a lower bound in terms of $\lambda_1(H)$ and $w$. In the following, we treat $\lambda_1(B)$ as a fixed constant, which empirically we find is the case. Under this assumption, the condition number of $B$ on the space of functions orthogonal to its null space is bounded by the operator norm of $B$, which satisfies
\[
\|B\| \le \|I - P\| + \big\|(-\Delta)^{-1/2}\, V\, (-\Delta)^{-1/2}\big\| \le 1 + \max\big(1, \lambda_1^{-1}\big)\, \|V\|_{L^\infty}.
\]
If $f$ is an arbitrary smooth positive function, then $V$ could still take very large values, which could make our problem ill-conditioned. In the following section, we introduce a definition of $V$ that includes smoothing, which can be used to control its $L^\infty$-norm.
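The effect of the preconditioning can be checked numerically. The following sketch (our own 1-D discretization, not the paper's code; the density is a hypothetical example) builds $H$ and the conjugated operator and compares condition numbers on the complement of the null space:

```python
import numpy as np

# Compare the conditioning of H = -Delta + V with that of the conjugated
# operator B = (-Delta)^{-1/2} H (-Delta)^{-1/2} on a 1-D cell-centered grid.
n = 200
dx = 1.0 / n
x = (np.arange(n) + 0.5) * dx

# Orthonormal Neumann eigenpairs (DCT-II cosines) of the discrete Laplacian.
k = np.arange(n)
E = np.cos(np.pi * np.outer(np.arange(n) + 0.5, k) / n)
E[:, 0] /= np.sqrt(n)
E[:, 1:] *= np.sqrt(2.0 / n)
lam = (2.0 - 2.0 * np.cos(np.pi * k / n)) / dx ** 2
NegLap = E @ np.diag(lam) @ E.T            # discrete -Delta (Neumann)

f = np.exp(np.cos(2 * np.pi * x))          # smooth positive density (unnormalized)
sqf = np.sqrt(f)
V = -(NegLap @ sqf) / sqf                  # V = (Delta sqrt f) / sqrt f
H = NegLap + np.diag(V)                    # H sqf = 0 by construction

scale = np.ones(n)
scale[1:] = lam[1:] ** -0.5
InvSqrt = E @ np.diag(scale) @ E.T         # (-Delta)^{-1/2}, constants preserved
B = InvSqrt @ H @ InvSqrt

def cond_on_complement(A):
    """Condition number of the symmetric matrix A restricted to the
    orthogonal complement of its one-dimensional null space."""
    w = np.sort(np.abs(np.linalg.eigvalsh(A)))
    return w[-1] / w[1]                    # w[0] is the (near-)zero mode
```

The unpreconditioned operator inherits the $O(1/\mathrm{dx}^2)$ spectral spread of the discrete Laplacian, while the conjugated operator's spectrum is compressed toward $1$ up to the contribution of $V$.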
4.3. Smoothing when defining the potential
Recall that the potential $V$ in the definition of $H = -\Delta + V$ can be defined by
\[
V = \frac{\Delta f^{1/2}}{f^{1/2}}.
\]
The basic idea is to run the heat equation on $f^{1/2}$ and use the resulting smoothed function to define the potential $V$. Given a function $u$, we define a $1$-parameter family of norms parameterized by $t \ge 0$ as follows. Let $\lambda_j$ and $e_j$ denote the Neumann Laplacian eigenvalues and eigenfunctions, see §4.1. We define the Neumann heat kernel applied to a function $u = \sum_{j \ge 0} \alpha_j e_j$ by
\[
e^{t\Delta} u = \sum_{j \ge 0} e^{-\lambda_j t}\, \alpha_j\, e_j.
\]
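The heat-kernel smoothing of $f^{1/2}$ and its effect on the resulting potential $\Delta e^{t\Delta} f^{1/2} / e^{t\Delta} f^{1/2}$ can be sketched as follows (our own 1-D illustration with a hypothetical sharply peaked density; the smoothing times are arbitrary choices):

```python
import numpy as np

# Smoothing the potential by the Neumann heat semigroup on a 1-D grid.
n = 400
dx = 1.0 / n
x = (np.arange(n) + 0.5) * dx
k = np.arange(n)
E = np.cos(np.pi * np.outer(np.arange(n) + 0.5, k) / n)   # DCT-II cosines
E[:, 0] /= np.sqrt(n)
E[:, 1:] *= np.sqrt(2.0 / n)
lam = (2.0 - 2.0 * np.cos(np.pi * k / n)) / dx ** 2
NegLap = E @ np.diag(lam) @ E.T            # discrete -Delta (Neumann)

def heat(u, t):
    """Neumann heat semigroup e^{t Delta} u via the cosine eigenbasis."""
    return E @ (np.exp(-lam * t) * (E.T @ u))

f = 0.05 + np.exp(-(x - 0.5) ** 2 / 0.001)  # sharply peaked density (unnormalized)
sqf = np.sqrt(f)

def potential(t):
    """Smoothed potential V_t = (Delta e^{t Delta} sqrt f) / (e^{t Delta} sqrt f)."""
    s = heat(sqf, t)
    return -(NegLap @ s) / s

sup0 = np.max(np.abs(potential(0.0)))
sup1 = np.max(np.abs(potential(1e-4)))
sup2 = np.max(np.abs(potential(1e-3)))
# increasing t tames the sup-norm of the potential
```

Because the heat semigroup is positivity-preserving and the density is bounded away from zero, the denominator never vanishes, and the sup-norm of the potential (and hence the conditioning bound of §4.2) decreases as $t$ increases.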
Next, we define the smoothed potential $V_t$ by
\[
V_t = \frac{\Delta\, e^{t\Delta} f^{1/2}}{e^{t\Delta} f^{1/2}},
\]
and define the corresponding operator $H_t$ by
\[
H_t = -\Delta + V_t.
\]
By §3.4, the operator $H_t$ defines a $\dot{H}^{-1}(\mu_t)$-norm, where $\mu_t$ is the measure with density proportional to $\big(e^{t\Delta} f^{1/2}\big)^2$. Observe that if $t = 0$ then $\mu_0 = \mu$, while in the limit $t \to \infty$ the density of $\mu_t$ tends to a constant. In particular, we have