 # Understanding the Topology and the Geometry of the Persistence Diagram Space via Optimal Partial Transport

We consider a generalization of persistence diagrams, namely Radon measures supported on the upper half plane for which we define natural extensions of Wasserstein and bottleneck distances between persistence diagrams. Such measures naturally appear in topological data analysis when considering continuous representations of persistence diagrams (e.g. persistence surfaces) but also as limits for laws of large numbers on persistence diagrams or as expectations of probability distributions on the persistence diagrams space. Introducing a formalism originating from the theory of optimal partial transport, we build a convenient framework to prove topological properties of this new space, which will also hold for the closed subspace of persistence diagrams. New results include a characterization of convergence with respect to Wasserstein metrics, and the existence of barycenters (Fréchet means) for any distribution of diagrams. We also showcase the strength of this framework by providing several statistical results made meaningful thanks to this new formalism.


## 1 Introduction

Topological Data Analysis (TDA) is an emerging field in data analysis that has found applications in computer vision, material science [21, 25] and shape analysis [10, 37], to name a few. The aim of TDA is to provide interpretable descriptors of the underlying topology of a given object. One of the most used (and theoretically studied) descriptors in TDA is the persistence diagram. This descriptor consists of a locally finite multiset of points in the upper half-plane $\Omega := \{(t_1,t_2)\in\mathbb{R}^2,\ t_2 > t_1\}$, each point in the diagram corresponding informally to a topological feature (connected component, loop, hole) of a given object. The space of persistence diagrams is usually equipped with partial matching metrics $d_p$, sometimes called Wasserstein distances [17, Chapter VIII.2]: for $p \in [1,\infty)$ and two persistence diagrams $a$ and $b$, define

$$d_p(a,b) := \left(\inf_{\pi\in\Gamma(a,b)} \sum_{x\in a\cup\partial\Omega} \|x-\pi(x)\|^p\right)^{1/p}, \tag{1}$$

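where $\Gamma(a,b)$ is the set of partial matchings between $a$ and $b$, i.e. bijections between $a\cup\partial\Omega$ and $b\cup\partial\Omega$, $\partial\Omega$ being the boundary of $\Omega$, namely the diagonal $\{(t,t),\ t\in\mathbb{R}\}$ (see Figure 1). When $p=\infty$, we recover the so-called bottleneck distance: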

$$d_\infty(a,b) := \inf_{\pi\in\Gamma(a,b)} \max_{x\in a\cup\partial\Omega} \|x-\pi(x)\|. \tag{2}$$

Figure 1: An example of optimal partial matching between two diagrams. The bottleneck distance between these two diagrams is, by definition, the length of the longest edge in this matching, while their Wasserstein distance $d_p$ is the $p$-th root of the sum of all edge lengths to the power $p$.
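To make Eqs. (1) and (2) concrete, here is a minimal Python sketch for two finite diagrams. It assumes the Euclidean norm (the choice fixed in Section 3) and diagrams stored as plain lists of (birth, death) tuples. The helper names are hypothetical, and the search is exhaustive: it enumerates all $(n+m)!$ augmented assignments, so it is only meant for a handful of points, not as a practical solver.

```python
import itertools
import math


def dist_to_diag(x):
    """Euclidean distance from x = (birth, death) to the diagonal {(t, t)}."""
    return abs(x[1] - x[0]) / math.sqrt(2)


def matching_edge_lengths(a, b):
    """Yield the edge lengths of every partial matching between the finite
    diagrams a and b (lists of (birth, death) tuples).

    Augmented assignment: rows are the points of a followed by len(b) diagonal
    slots, columns are the points of b followed by len(a) diagonal slots.
    Matching a point with a diagonal slot means sending it to its orthogonal
    projection; diagonal-to-diagonal pairs are free and contribute nothing.
    """
    n, m = len(a), len(b)
    for perm in itertools.permutations(range(n + m)):
        lengths = []
        for row, col in enumerate(perm):
            if row < n and col < m:      # a[row] matched with b[col]
                lengths.append(math.dist(a[row], b[col]))
            elif row < n:                # a[row] matched with the diagonal
                lengths.append(dist_to_diag(a[row]))
            elif col < m:                # b[col] matched with the diagonal
                lengths.append(dist_to_diag(b[col]))
        yield lengths


def wasserstein_diagrams(a, b, p=2.0):
    """d_p(a, b) of Eq. (1), by exhaustive search (tiny diagrams only)."""
    best = min(sum(length ** p for length in lengths)
               for lengths in matching_edge_lengths(a, b))
    return best ** (1.0 / p)


def bottleneck_diagrams(a, b):
    """d_inf(a, b) of Eq. (2): minimise the longest edge over all matchings."""
    return min(max(lengths, default=0.0)
               for lengths in matching_edge_lengths(a, b))
```

For instance, a single point $(0,2)$ against the empty diagram gives $d_p = \sqrt{2}$ for every $p$: the only matching sends the point to its projection on the diagonal.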

An equivalent viewpoint, developed in [11, Chapter 3], is to consider a persistence diagram as a measure of the form $a = \sum_{x\in X} n_x\delta_x$, where $X\subset\Omega$ is locally finite and $n_x$ is a positive integer for all $x\in X$, so that $a$ is a locally finite measure supported on $\Omega$ with integer mass on each point of its support. Considering persistence diagrams from this perspective suggests studying more general Radon measures supported on the upper half-plane $\Omega$ (a short reminder about Radon measure theory is provided in Appendix A). Such objects appear naturally in several applications, e.g. when taking representations of persistence diagrams such as persistence surfaces, studying laws of large numbers for persistence diagrams [15, 16, 19], linear expectations of random persistence diagrams, or when estimating barycenters of persistence diagrams.

In Section 3, we study the space $\mathcal{M}$ of Radon measures supported on $\Omega$. For $1\le p<\infty$ (the case $p=\infty$ is studied in Section 3.3), we define the persistence of $\mu\in\mathcal{M}$ as

$$\mathrm{Pers}_p(\mu) := \int_\Omega d(x,\partial\Omega)^p\,d\mu(x), \tag{3}$$

where $d(x,\partial\Omega)$ is the distance from a point $x\in\Omega$ to the diagonal $\partial\Omega$, and we define

$$\mathcal{M}^p := \{\mu\in\mathcal{M},\ \mathrm{Pers}_p(\mu)<\infty\}. \tag{4}$$
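For a discrete measure $\mu = \sum_i w_i\delta_{x_i}$, Eq. (3) reduces to a weighted sum. The sketch below (hypothetical helper names) uses the Euclidean distance to the diagonal, $d((t_1,t_2),\partial\Omega) = |t_2-t_1|/\sqrt{2}$; unit weights correspond to a persistence diagram, arbitrary nonnegative weights to a more general persistence measure.

```python
import math


def dist_to_diag(x):
    """Euclidean distance from x = (t1, t2) to the diagonal."""
    return abs(x[1] - x[0]) / math.sqrt(2)


def persistence(points, weights=None, p=2.0):
    """Pers_p(mu) of Eq. (3) for mu = sum_i weights[i] * delta_{points[i]}.

    With the default unit weights, mu is a persistence diagram (integer masses);
    other nonnegative weights give a general discrete persistence measure.
    """
    if weights is None:
        weights = [1.0] * len(points)
    if len(weights) != len(points):
        raise ValueError("points and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("a (positive) Radon measure has nonnegative weights")
    return sum(w * dist_to_diag(x) ** p for x, w in zip(points, weights))
```

Any finite list gives a finite value, so membership in $\mathcal{M}^p$ only becomes a real constraint for measures with infinitely many atoms or a continuous part, e.g. points accumulating near the diagonal or escaping to infinity.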

Let $\mathcal{D}^p\subset\mathcal{M}^p$ be the space of persistence diagrams with finite persistence $\mathrm{Pers}_p$. We equip $\mathcal{M}^p$ with the metrics $D_p$ (see Definition 2.1), originally introduced in a work of Figalli and Gigli [18]. These metrics appear as extensions of the metrics $d_p$, as both coincide on $\mathcal{D}^p$ (Proposition 3.2). Elements of the metric space $(\mathcal{M}^p, D_p)$ are referred to as persistence measures in the following. As $\mathcal{D}^p$ is closed in $\mathcal{M}^p$ (Corollary 3.1), most properties of $\mathcal{M}^p$ hold for $\mathcal{D}^p$ too (e.g. being Polish, Proposition 3.3).

A sequence $(\mu_n)_n$ of Radon measures is said to converge vaguely to a measure $\mu$, denoted by $\mu_n\xrightarrow{v}\mu$, if for any continuous compactly supported function $f:\Omega\to\mathbb{R}$, $\int_\Omega f\,d\mu_n\to\int_\Omega f\,d\mu$. We prove the following equivalence between convergence for the metric $D_p$ and vague convergence:

###### Theorem 3.4.

Let $1\le p<\infty$. Let $\mu,\mu_1,\mu_2,\dots$ be measures in $\mathcal{M}^p$. Then,

$$D_p(\mu_n,\mu)\to0 \iff \begin{cases}\mu_n\xrightarrow{v}\mu,\\ \mathrm{Pers}_p(\mu_n)\to\mathrm{Pers}_p(\mu).\end{cases} \tag{5}$$

This characterization in particular holds for persistence diagrams in $\mathcal{D}^p$. This result is analogous to the characterization of convergence of probability measures in the Wasserstein space (see [39, Theorem 6.9]). A proof for Radon measures supported on a bounded set can be found in [18, Proposition 2.7]. Our contribution is to extend this result to unbounded sets, in particular to the upper half-plane $\Omega$. As a corollary of this theorem, we obtain the following general result on the continuity of linear representations of persistence measures (and diagrams).

###### Corollary 3.2.

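Let $1\le p<\infty$, $k\ge1$, and $f:\Omega\to\mathbb{R}^k$ be a continuous bounded function. The feature map $F:\mathcal{M}^p\to\mathbb{R}^k$ defined by $F(\mu) := \int_\Omega f(x)\,d(x,\partial\Omega)^p\,d\mu(x)$ is continuous with respect to $D_p$.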
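As an illustration of such linear feature maps, the following sketch evaluates $F(\mu) = \int f(x)\,d(x,\partial\Omega)^p\,d\mu(x)$ for a discrete measure, in the real-valued case $k=1$. The Gaussian bump used as $f$ and all function names are hypothetical choices; any continuous bounded $f$ would do. The weight $d(x,\partial\Omega)^p$ is what makes a new point very close to the diagonal, which barely moves $\mu$ in $D_p$, also barely move $F(\mu)$.

```python
import math


def dist_to_diag(x):
    """Euclidean distance from x = (t1, t2) to the diagonal."""
    return abs(x[1] - x[0]) / math.sqrt(2)


def gaussian_bump(x, center=(0.0, 1.0), bandwidth=0.5):
    """A hypothetical continuous bounded f: Omega -> R (values in (0, 1])."""
    sq_norm = (x[0] - center[0]) ** 2 + (x[1] - center[1]) ** 2
    return math.exp(-sq_norm / (2.0 * bandwidth ** 2))


def linear_feature(points, weights, f, p=1.0):
    """F(mu) = int f(x) d(x, dOmega)^p dmu(x) for mu = sum_i w_i delta_{x_i}."""
    return sum(w * f(x) * dist_to_diag(x) ** p
               for x, w in zip(points, weights))
```

Adding the point $(0.5, 0.5+10^{-6})$ to $\mu = \delta_{(0,1)}$ changes $F(\mu)$ by less than $10^{-6}$ here; without the factor $d(x,\partial\Omega)^p$, the same point would shift the feature by roughly $f(x)\approx 0.37$.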

This new result can be compared to the recent works [26, Proposition 3.2] and [15, Theorem 3], which show that linear feature maps can have more regularity (e.g. Lipschitz or Hölder continuity) under additional assumptions. We also show in Section 3.2 that the problem of computing the metric $D_p$ between two measures of finite mass can be turned into the well-known problem of computing a Wasserstein distance (see Section 2) between two measures with the same mass, a result with practical implications for the computation of distances between diagrams.

Section 4 studies Fréchet means (i.e. barycenters, see Definition 4.1) of probability distributions of persistence measures. In the specific case of persistence diagrams, the study of Fréchet means was initiated in [31, 36], where the authors prove the existence of barycenters for certain types of distributions [31, Theorem 28]. We show that this existence result actually holds for any distribution of persistence measures (with finite $p$-th moment), and adapt it to persistence diagrams. Namely, we prove the following results:

###### Theorem 4.3.

For any probability distribution $P$ supported on $\mathcal{M}^p$ with finite $p$-th moment, the set of barycenters of $P$ is not empty.

###### Theorem 4.4.

If is supported on and has a finite -th moment, then admits a barycenter in .

Theorem 4.3 follows the work of [29, Theorem 2] (itself following the seminal paper of Agueh and Carlier), where the authors prove the existence of Fréchet means of probability measures endowed with the Wasserstein metric. We adapt this result to the space of Radon measures endowed with the $D_p$ metrics.

Section 5 presents two applications of Theorem 3.4: Proposition 5.1 gives a law of large numbers in terms of the metric $D_p$, while Proposition 5.3 states a stability result between (Čech) diagrams and input point clouds in a random setting.

## 2 Elements of optimal transport

In this section, $(\Omega, d)$ denotes a Polish metric space. Optimal transport is a widely developed theory providing tools to study and compare probability measures supported on $\Omega$ [38, 39, 33].

### 2.1 Wasserstein distances

Given two probability measures $\mu,\nu$ supported on $\Omega$, the Wasserstein-$p$ distance ($1\le p<\infty$) induced by the metric $d$ between these probability measures is defined as

$$W_{p,d}(\mu,\nu) := \left(\inf_{\pi\in\Pi(\mu,\nu)} \iint_{\Omega\times\Omega} d(x,y)^p\,d\pi(x,y)\right)^{1/p}, \tag{6}$$

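where $\Pi(\mu,\nu)$ denotes the set of transport plans between $\mu$ and $\nu$, that is, the set of measures on $\Omega\times\Omega$ with respective marginals $\mu$ and $\nu$. When there is no ambiguity on the distance used, we simply write $W_p$ instead of $W_{p,d}$. For $W_p(\mu,\nu)$ to be finite, $\mu$ and $\nu$ are required to have a finite $p$-th moment, that is, there exists $x_0\in\Omega$ such that $\int_\Omega d(x,x_0)^p\,d\mu(x)$ (resp. $\int_\Omega d(x,x_0)^p\,d\nu(x)$) is finite. The set of such probability measures, endowed with the metric $W_p$, is referred to as $\mathcal{W}^p(\Omega)$.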
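A minimal sketch of Eq. (6) in the simplest discrete case: two uniform empirical measures with the same number of atoms. For such measures an optimal plan can be taken to be a permutation (a consequence of the Birkhoff–von Neumann theorem), so exhaustive search over permutations is exact, although only practical for a few atoms. The function name is hypothetical and the ground metric defaults to the Euclidean one.

```python
import itertools
import math


def wasserstein_uniform(xs, ys, p=2.0, dist=math.dist):
    """W_{p,d} of Eq. (6) between (1/n) sum_i delta_{xs[i]} and (1/n) sum_j delta_{ys[j]}.

    With uniform weights and equal sizes, some optimal plan is a permutation
    (Birkhoff-von Neumann), so minimising over permutations is exact.
    Exponential in n: for illustration only.
    """
    if len(xs) != len(ys):
        raise ValueError("both measures need the same number of atoms")
    n = len(xs)
    if n == 0:
        raise ValueError("a probability measure needs at least one atom")
    best = min(
        sum(dist(x, ys[j]) ** p for x, j in zip(xs, perm))
        for perm in itertools.permutations(range(n))
    )
    return (best / n) ** (1.0 / p)
```

Each atom carries mass $1/n$, which is why the optimal total cost is divided by $n$ before taking the $p$-th root.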

### 2.2 Extension to Radon measures 

Classic optimal transport only deals with probability measures, that is, up to a renormalization factor, positive measures with the same mass. In [18], Figalli and Gigli propose to extend Wasserstein distances to Radon measures supported on an open proper subset $\Omega$ of $\mathbb{R}^d$, whose boundary is denoted by $\partial\Omega$ (and $\overline\Omega := \Omega\cup\partial\Omega$), by considering the following optimal partial transport problem:

###### Definition 2.1.

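[18, Problem 1.1] Let $1\le p<\infty$. Let $\mu,\nu$ be two Radon measures supported on $\Omega$ satisfying $\int_\Omega d(x,\partial\Omega)^p\,d\mu(x)<\infty$ (resp. $\int_\Omega d(x,\partial\Omega)^p\,d\nu(x)<\infty$). The set of admissible transport plans (or couplings) $\mathrm{Adm}(\mu,\nu)$ is defined as the set of Radon measures $\pi$ on $\overline\Omega\times\overline\Omega$ satisfying: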

$$\text{for all Borel sets } A,B\subset\Omega,\quad \pi(A\times\overline\Omega) = \mu(A) \ \text{ and } \ \pi(\overline\Omega\times B) = \nu(B).$$

The cost of $\pi\in\mathrm{Adm}(\mu,\nu)$ is defined as

$$C_p(\pi) := \iint_{\overline\Omega\times\overline\Omega} d(x,y)^p\,d\pi(x,y). \tag{7}$$

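The distance $D_p$ is then defined as

$$D_p(\mu,\nu) := \left(\inf_{\pi\in\mathrm{Adm}(\mu,\nu)} C_p(\pi)\right)^{1/p}. \tag{8}$$

Plans realizing the infimum in (8) are called optimal. The set of optimal transport plans is denoted by $\mathrm{Opt}_p(\mu,\nu)$.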

The following definition shows how to build an element of $\mathrm{Adm}(\mu,\nu)$ given a map $f:\overline\Omega\to\overline\Omega$ satisfying a balance condition.

###### Definition 2.2.

Let $\mu,\nu$ be as in Definition 2.1. Consider $f:\overline\Omega\to\overline\Omega$ satisfying, for all Borel sets $B\subset\Omega$,

$$\mu(f^{-1}(B)\cap\Omega) + \nu(B\cap f(\partial\Omega)) = \nu(B). \tag{9}$$

Define, for all Borel sets $A,B\subset\overline\Omega$,

$$\pi(A\times B) = \mu(f^{-1}(B)\cap\Omega\cap A) + \nu(\Omega\cap B\cap f(A\cap\partial\Omega)). \tag{10}$$

Then $\pi$ is called the transport plan induced by the transport map $f$.

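One can easily check that we indeed have $\pi(A\times\overline\Omega) = \mu(A)$ and $\pi(\overline\Omega\times B) = \nu(B)$ for any Borel sets $A,B\subset\Omega$, so that $\pi\in\mathrm{Adm}(\mu,\nu)$ (see Figure 2).

Figure 2: A transport map $f$ must satisfy that the mass $\nu(B)$ (light blue) is the sum of the mass $\mu(f^{-1}(B)\cap\Omega)$ given by $\mu$ that is transported by $f$ onto $B$ (light red) and the mass $\nu(B\cap f(\partial\Omega))$ coming from $\partial\Omega$ and transported by $f$ onto $B$.
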
###### Remark 2.1.

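Since there is no constraint on $\pi(\partial\Omega\times\partial\Omega)$, one may always assume that a plan $\pi$ satisfies $\pi(\partial\Omega\times\partial\Omega) = 0$, so that plans in $\mathrm{Adm}(\mu,\nu)$ are supported on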

$$E_\Omega := (\overline\Omega\times\overline\Omega)\setminus(\partial\Omega\times\partial\Omega). \tag{11}$$

Under the assumption that $\Omega$ is bounded, and assuming $p=2$ (but the authors mention that their proofs work for any finite $p\ge1$), it is proved in [18] that:

• [18, Theorem 2.2]: $D_p$ is indeed a distance over Radon measures supported on $\Omega$.

• [18, Proposition 2.7]: Given Radon measures $\mu,\mu_1,\mu_2,\dots$, we have

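$$D_p(\mu_n,\mu)\to0 \iff \begin{cases}\mu_n\xrightarrow{v}\mu,\\ \int_\Omega d(x,\partial\Omega)^p\,d\mu_n(x)\to\int_\Omega d(x,\partial\Omega)^p\,d\mu(x).\end{cases}$$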

## 3 Structure of the persistence measures and diagrams spaces

For the remainder, we fix $\Omega := \{(t_1,t_2)\in\mathbb{R}^2,\ t_2>t_1\}$, endowed with the Euclidean metric $d$.

### 3.1 General properties of Mp

It is assumed for now that $1\le p<\infty$. The case $p=\infty$ is studied in Section 3.3. The first proposition states preliminary results on the problem stated in Definition 2.1.

###### Proposition 3.1.

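Let $\mu,\nu\in\mathcal{M}$. The set of transport plans $\mathrm{Adm}(\mu,\nu)$ is sequentially compact for the vague topology on $E_\Omega$. Moreover, if $\mu,\nu\in\mathcal{M}^p$, then for this topology:

• $\pi\in\mathrm{Adm}(\mu,\nu)\mapsto C_p(\pi)$ is lower semi-continuous.

• $\mathrm{Opt}_p(\mu,\nu)$ is a non-empty sequentially compact set.

• $D_p$ is lower semi-continuous, in the sense that for sequences $(\mu_n)_n$, $(\nu_n)_n$ in $\mathcal{M}^p$ satisfying $\mu_n\xrightarrow{v}\mu$ and $\nu_n\xrightarrow{v}\nu$, we have $D_p(\mu,\nu)\le\liminf_{n\to\infty}D_p(\mu_n,\nu_n)$.

Moreover, $D_p$ is a distance on $\mathcal{M}^p$.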

These properties are mentioned in [18, pages 4-5] in the bounded case, and adapt straightforwardly to our framework. For the sake of completeness, we provide a detailed proof in Appendix B.

###### Remark 3.1.

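If a (Borel) measure $\mu$ satisfies $\mathrm{Pers}_p(\mu)<\infty$, then for any Borel set $A\subset\Omega$ satisfying $d(A,\partial\Omega) := \inf_{x\in A} d(x,\partial\Omega) > 0$, we have:

$$\mu(A)\,d(A,\partial\Omega)^p \le \int_A d(x,\partial\Omega)^p\,d\mu(x) \le \mathrm{Pers}_p(\mu), \tag{12}$$

so that $\mu(A)<\infty$. In particular, since every compact subset of $\Omega$ is at positive distance from $\partial\Omega$, $\mu$ is automatically a Radon measure.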
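Equation (12) is a Markov-type inequality. The hypothetical sketch below checks it numerically on a diagram whose points $(0,2^{-k})$ accumulate at the diagonal: the number of points can be made arbitrarily large while $\mathrm{Pers}_2$ stays bounded (close to $2/3$), and the mass of $A_t = \{x\in\Omega,\ d(x,\partial\Omega)\ge t\}$ never exceeds $\mathrm{Pers}_2(\mu)/t^2$.

```python
import math


def dist_to_diag(x):
    """Euclidean distance from x = (t1, t2) to the diagonal."""
    return abs(x[1] - x[0]) / math.sqrt(2)


def mass_far_from_diagonal(points, weights, t):
    """mu(A_t) for the Borel set A_t = {x in Omega : d(x, dOmega) >= t}."""
    return sum(w for x, w in zip(points, weights) if dist_to_diag(x) >= t)


def mass_bound(points, weights, t, p=2.0):
    """Pers_p(mu) / t^p, an upper bound on mu(A_t) by Eq. (12),
    since d(A_t, dOmega) >= t > 0."""
    pers = sum(w * dist_to_diag(x) ** p for x, w in zip(points, weights))
    return pers / t ** p
```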

The following lemma gives a simple way to approximate a persistence measure (resp. diagram) by persistence measures (resp. diagrams) of finite mass.

###### Lemma 3.1.

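Let $\mu\in\mathcal{M}^p$. Fix $r>0$, and let $A_r := \{x\in\Omega,\ d(x,\partial\Omega)\le r^{-1}\}$. Let $\mu^{(r)}$ be the restriction of $\mu$ to $\Omega\setminus A_r$. Then $D_p(\mu^{(r)},\mu)\to0$ when $r\to\infty$. Similarly, if $a\in\mathcal{D}^p$, we have $d_p(a^{(r)},a)\to0$.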

###### Proof.

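Let $\pi$ be the transport plan induced by the map equal to the identity on $\overline\Omega\setminus A_r$ and to the orthogonal projection onto $\partial\Omega$ on $A_r$. As $\pi$ is suboptimal, one has:

$$D_p^p(\mu,\mu^{(r)}) \le C_p(\pi) = \int_{A_r} d(x,\partial\Omega)^p\,d\mu(x) = \mathrm{Pers}_p(\mu)-\mathrm{Pers}_p(\mu^{(r)}).$$

Thus, by the monotone convergence theorem applied to $\mu$ with the functions $f_r: x\mapsto d(x,\partial\Omega)^p\,\mathbb{1}\{x\in\Omega\setminus A_r\}$, which increase to $x\mapsto d(x,\partial\Omega)^p$, we get $D_p(\mu,\mu^{(r)})\to0$ as $r\to\infty$. Similar arguments show that $d_p(a,a^{(r)})\to0$ as $r\to\infty$. ∎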
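The plan used in the proof of Lemma 3.1 can be evaluated directly on a discrete measure. In this sketch (hypothetical names, Euclidean distance to the diagonal), `truncation_cost` returns $C_p(\pi) = \mathrm{Pers}_p(\mu)-\mathrm{Pers}_p(\mu^{(r)})$, an upper bound on $D_p^p(\mu,\mu^{(r)})$ that decreases to $0$ as $r$ grows.

```python
import math


def dist_to_diag(x):
    """Euclidean distance from x = (t1, t2) to the diagonal."""
    return abs(x[1] - x[0]) / math.sqrt(2)


def persistence(points, weights, p=2.0):
    """Pers_p(mu) for mu = sum_i w_i delta_{x_i}."""
    return sum(w * dist_to_diag(x) ** p for x, w in zip(points, weights))


def truncate(points, weights, r):
    """mu^(r): restriction of mu to Omega minus A_r, where A_r = {d(x, dOmega) <= 1/r}."""
    kept = [(x, w) for x, w in zip(points, weights) if dist_to_diag(x) > 1.0 / r]
    return [x for x, _ in kept], [w for _, w in kept]


def truncation_cost(points, weights, r, p=2.0):
    """Cost C_p of the plan in the proof of Lemma 3.1: atoms outside A_r stay
    in place, atoms in A_r are projected onto the diagonal.
    It upper-bounds D_p(mu, mu^(r))^p."""
    return sum(w * dist_to_diag(x) ** p
               for x, w in zip(points, weights) if dist_to_diag(x) <= 1.0 / r)
```

On a diagram such as $\{(0,2^{-k})\}_{0\le k<30}$, the bound is already below $10^{-10}$ for $r = 10^6$, while $\mu^{(r)}$ keeps only finitely many atoms.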

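The following proposition is central in our work: it shows that the metrics $D_p$ are extensions of the metrics $d_p$.

###### Proposition 3.2.

For $a,b\in\mathcal{D}^p$, $D_p(a,b) = d_p(a,b)$.

###### Proof of Proposition 3.2.

Let $a,b\in\mathcal{D}^p$ be two persistence diagrams. The case where $a$ and $b$ have a finite number of points is already treated in [28, Proposition 1].

In the general case, let $r>0$. Due to (12), the diagrams $a^{(r)}$ and $b^{(r)}$ defined in Lemma 3.1 have a finite mass (thus a finite number of points). Therefore, $D_p(a^{(r)},b^{(r)}) = d_p(a^{(r)},b^{(r)})$. By Lemma 3.1 and the triangle inequality, as $r\to\infty$, the left-hand side converges to $D_p(a,b)$ while the right-hand side converges to $d_p(a,b)$, giving the conclusion. ∎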

###### Proposition 3.3.

The space $(\mathcal{M}^p, D_p)$ is a Polish space.

As for Proposition 3.1, this proposition appears in [18, Proposition 2.7] in the bounded case, and its proof is straightforwardly adapted to our framework. For the sake of completeness, we provide a detailed proof in Appendix B.

We now state one of our main results: a characterization of convergence in $(\mathcal{M}^p, D_p)$.

###### Theorem 3.4.

Let $\mu,\mu_1,\mu_2,\dots$ be measures in $\mathcal{M}^p$. Then,

$$D_p(\mu_n,\mu)\to0 \iff \begin{cases}\mu_n\xrightarrow{v}\mu,\\ \mathrm{Pers}_p(\mu_n)\to\mathrm{Pers}_p(\mu).\end{cases} \tag{13}$$

This result is analogous to the characterization of convergence of probability measures in the Wasserstein space (see [39, Theorem 6.9]) and can be found in [18, Proposition 2.7] in the case where the ground space is bounded. While the proof of the direct implication can be adapted (it can be found in Appendix B), a new proof is needed for the converse implication.

###### Proof of the converse implication.

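Let $\mu,\mu_1,\mu_2,\dots$ be elements of $\mathcal{M}^p$ and assume that $\mu_n\xrightarrow{v}\mu$ and $\mathrm{Pers}_p(\mu_n)\to\mathrm{Pers}_p(\mu)$. Since $D_p(\mu_n,\mu)\le D_p(\mu_n,0)+D_p(0,\mu) = \mathrm{Pers}_p(\mu_n)^{1/p}+\mathrm{Pers}_p(\mu)^{1/p}$, the sequence $(D_p(\mu_n,\mu))_n$ is bounded. Thus, if we show that $(D_p(\mu_n,\mu))_n$ admits $0$ as its unique accumulation point, then the convergence holds. Up to extracting a subsequence, we may assume that $(D_p(\mu_n,\mu))_n$ converges to some limit. Let $\pi_n\in\mathrm{Opt}_p(\mu_n,\mu)$ be corresponding optimal transport plans. The vague convergence of $(\mu_n)_n$, together with Proposition A.1, implies that $(\pi_n)_n$ is relatively compact with respect to the vague convergence on $E_\Omega$. Let $\pi$ be the limit of any converging subsequence of $(\pi_n)_n$, whose indices are still denoted by $n$. As $\mu_n\xrightarrow{v}\mu$, standard arguments of optimal transport theory assert that $\pi$ is necessarily in $\mathrm{Opt}_p(\mu,\mu)$ (see [18, Proposition 2.3]), i.e. $\pi$ is supported on the diagonal $\{(x,x),\ x\in\overline\Omega\}$ of $\overline\Omega\times\overline\Omega$. The vague convergence of $(\mu_n)_n$ and the convergence of $\mathrm{Pers}_p(\mu_n)$ to $\mathrm{Pers}_p(\mu)$ imply that, for a given compact set $K\subset\Omega$, whose complement in $\Omega$ is denoted by $K^c$ and whose interior is denoted by $\mathring K$, we have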

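$$\begin{aligned}
\limsup_{n\to\infty}\int_{K^c} d(x,\partial\Omega)^p\,d\mu_n(x) &= \limsup_{n\to\infty}\left(\mathrm{Pers}_p(\mu_n)-\int_K d(x,\partial\Omega)^p\,d\mu_n(x)\right)\\
&\le \mathrm{Pers}_p(\mu)-\int_{\mathring K} d(x,\partial\Omega)^p\,d\mu(x) \quad\text{by the Portmanteau theorem}\\
&= \int_{\overline{K^c}} d(x,\partial\Omega)^p\,d\mu(x).
\end{aligned}$$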

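Since $\mathrm{Pers}_p(\mu)<\infty$, the right-hand side can be made arbitrarily small by choosing $K$ large enough. Therefore, for any $\varepsilon>0$, there exists some compact set $K\subset\Omega$ with

$$\limsup_{n\to\infty}\int_{K^c} d(x,\partial\Omega)^p\,d\mu_n(x)<\varepsilon \quad\text{and}\quad \int_{K^c} d(x,\partial\Omega)^p\,d\mu(x)<\varepsilon. \tag{14}$$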

For a compact set $K\subset\Omega$, consider the following transport plan $\tilde\pi_n$ (informally, what went from $K$ to $K^c$ and from $K^c$ to $K$ is now transported onto the diagonal, while everything else is unchanged):

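$$\begin{cases}
\tilde\pi_n = \pi_n & \text{on } K^2\sqcup(K^c)^2,\\
\tilde\pi_n = 0 & \text{on } K\times K^c\sqcup K^c\times K,\\
\tilde\pi_n(A\times B) = \pi_n(A\times(K^c\sqcup B)) & \text{for } A\subset K,\ B\subset\partial\Omega,\\
\tilde\pi_n(A\times B) = \pi_n(A\times(K\sqcup B)) & \text{for } A\subset K^c,\ B\subset\partial\Omega,\\
\tilde\pi_n(A\times B) = \pi_n((K^c\sqcup A)\times B) & \text{for } A\subset\partial\Omega,\ B\subset K,\\
\tilde\pi_n(A\times B) = \pi_n((K\sqcup A)\times B) & \text{for } A\subset\partial\Omega,\ B\subset K^c.
\end{cases} \tag{15}$$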

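Note that $\tilde\pi_n\in\mathrm{Adm}(\mu_n,\mu)$: for instance, for a Borel set $A\subset K$, $\tilde\pi_n(A\times\overline\Omega) = \pi_n(A\times K)+\pi_n(A\times(K^c\sqcup\partial\Omega)) = \pi_n(A\times\overline\Omega) = \mu_n(A)$, and it is shown likewise that the other constraints are satisfied. As $\tilde\pi_n$ is suboptimal, $D_p^p(\mu_n,\mu)\le C_p(\tilde\pi_n)$. The latter integral is equal to a sum of different terms, and we show that each of them converges to $0$. Assume that the compact set $K$ belongs to an increasing sequence of compact sets $(K_k)_k$ of $\Omega$ such that $\bigcup_k\mathring K_k = \Omega$ and $\mu(\partial K_k) = 0$ for all $k$.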

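• We have $\iint_{K^2} d(x,y)^p\,d\tilde\pi_n(x,y) = \iint_{K^2} d(x,y)^p\,d\pi_n(x,y)$. The $\limsup$ of this integral is at most $\iint_{K^2} d(x,y)^p\,d\pi(x,y)$ by the Portmanteau theorem (applied to the sequence of measures $(d(x,y)^p\,d\pi_n(x,y))_n$), and, recalling that $\pi$ is supported on the diagonal of $\overline\Omega\times\overline\Omega$, this integral is equal to $0$.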

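• Figalli and Gigli [18, Proposition 2.3] show that an optimal transport plan, such as $\pi_n$, must be supported on $\{(x,y)\in\overline\Omega\times\overline\Omega,\ d(x,y)^p\le d(x,\partial\Omega)^p+d(y,\partial\Omega)^p\}$. It follows that

$$\begin{aligned}
\iint_{(K^c)^2} d(x,y)^p\,d\tilde\pi_n(x,y) &= \iint_{(K^c)^2} d(x,y)^p\,d\pi_n(x,y)\\
&\le \int_{K^c} d(x,\partial\Omega)^p\,d\mu_n(x)+\int_{K^c} d(y,\partial\Omega)^p\,d\mu(y).
\end{aligned}$$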

Taking the $\limsup$ in $n$, and then letting $K$ grow to $\Omega$ along the sequence $(K_k)_k$, this quantity converges to $0$ by (14).

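• We have $\iint_{K\times\partial\Omega} d(x,y)^p\,d\tilde\pi_n(x,y) = \iint_{K\times K^c} d(x,\partial\Omega)^p\,d\pi_n(x,y)+\iint_{K\times\partial\Omega} d(x,y)^p\,d\pi_n(x,y)$. By the Portmanteau theorem applied to the sequence $(d(x,\partial\Omega)^p\,d\pi_n(x,y))_n$, the $\limsup$ of the first term is at most $\iint_{K\times\overline{K^c}} d(x,\partial\Omega)^p\,d\pi(x,y) = \int_{\partial K} d(x,\partial\Omega)^p\,d\mu(x)$, which vanishes since $\pi$ is supported on the diagonal of $\overline\Omega\times\overline\Omega$ and $\mu(\partial K) = 0$. Applying once again the Portmanteau theorem to the second term (with the sequence $(d(x,y)^p\,d\pi_n(x,y))_n$), and using that $\pi$ is supported on the diagonal of $\overline\Omega\times\overline\Omega$, the $\limsup$ of the second term is at most $\iint_{K\times\partial\Omega} d(x,y)^p\,d\pi(x,y) = 0$ (recall that $K\cap\partial\Omega = \emptyset$). Therefore, the $\limsup$ of the integral is equal to $0$.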

• The three remaining terms (corresponding to the last three lines of definition (15)) are treated similarly to this last case.

Finally, we have proven that $(D_p(\mu_n,\mu))_n$ is bounded and that the limit of any converging subsequence is $0$. It follows that $D_p(\mu_n,\mu)\to0$. ∎

###### Remark 3.2.

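The assumption $\mathrm{Pers}_p(\mu_n)\to\mathrm{Pers}_p(\mu)$ is crucial to obtain $D_p$-convergence from vague convergence. For example, the sequence defined by $\mu_n := \delta_{(n,n+1)}$ converges vaguely to $\mu := 0$ and $(\mathrm{Pers}_p(\mu_n))_n$ does converge (as it is constant, equal to $2^{-p/2}$), while $D_p(\mu_n,\mu) = 2^{-1/2}\not\to0$. This does not contradict Theorem 3.4 since $\mathrm{Pers}_p(\mu) = 0\neq2^{-p/2}$.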
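A numerical look at this phenomenon, with the choice $\mu_n = \delta_{(n,n+1)}$ and a hat function as compactly supported test function (both hypothetical illustrations, as are the function names): the integral of the test function against $\mu_n$ vanishes once the atom leaves its support, while $\mathrm{Pers}_2(\mu_n) = 1/2$ and $D_2(\mu_n,0) = 2^{-1/2}$ stay constant. The computation of $D_p(\mu_n,0)$ uses that every admissible plan towards the zero measure must send the atom to the diagonal, the cheapest target being its orthogonal projection.

```python
import math


def dist_to_diag(x):
    """Euclidean distance from x = (t1, t2) to the diagonal."""
    return abs(x[1] - x[0]) / math.sqrt(2)


def hat(x, center=(0.0, 1.0), radius=2.0):
    """Continuous test function, supported in the closed ball B(center, radius)."""
    return max(0.0, 1.0 - math.dist(x, center) / radius)


def escaping_dirac(n, p=2.0):
    """For mu_n = delta_{(n, n+1)}, return (int hat dmu_n, Pers_p(mu_n), D_p(mu_n, 0))."""
    x = (float(n), float(n) + 1.0)
    pers = dist_to_diag(x) ** p
    # Every admissible plan from mu_n to the zero measure sends the atom to the
    # diagonal; the optimal one uses the orthogonal projection, so
    # D_p(mu_n, 0)^p = Pers_p(mu_n).
    return hat(x), pers, pers ** (1.0 / p)
```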

The direct implication of Theorem 3.4 implies in particular that the topology induced by the metric $D_p$ is stronger than the vague topology. As a consequence, the following corollary holds, using Proposition A.5 (the set of persistence diagrams is closed in $\mathcal{M}$ for the vague topology).

###### Corollary 3.1.

$\mathcal{D}^p$ is closed in $\mathcal{M}^p$ for the metric $D_p$.

We recover in particular that the space $(\mathcal{D}^p, d_p)$ is a Polish space (using Propositions 3.2 and 3.3), a result already proved in [31, Theorems 7 and 12] with a different approach.

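For a persistence measure $\mu\in\mathcal{M}^p$, let $\mu^p$ be the measure (with finite mass) defined by

$$\mu^p(A) := \int_A d(x,\partial\Omega)^p\,d\mu(x) \quad\text{for all Borel sets } A\subset\Omega. \tag{16}$$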

###### Corollary 3.2.

Let $\mu,\mu_1,\mu_2,\dots$ be measures in $\mathcal{M}^p$. Then $D_p(\mu_n,\mu)\to0$ if and only if $(\mu_n^p)_n$ converges weakly to $\mu^p$ (weak convergence is defined in Appendix A).

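In particular, if $f:\Omega\to\mathbb{R}^k$ ($k\ge1$) is a continuous bounded function, then the feature map $F$ defined by $F(\mu) := \int_\Omega f(x)\,d(x,\partial\Omega)^p\,d\mu(x)$ is continuous with respect to $D_p$.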

###### Proof.

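Consider $\mu,\mu_1,\mu_2,\dots\in\mathcal{M}^p$ and assume that $D_p(\mu_n,\mu)\to0$. By Theorem 3.4, $\mu_n\xrightarrow{v}\mu$ and $\mathrm{Pers}_p(\mu_n)\to\mathrm{Pers}_p(\mu)$. Since, for any compactly supported continuous function $f$, the map $x\mapsto f(x)\,d(x,\partial\Omega)^p$ is also continuous and compactly supported, $\mu_n\xrightarrow{v}\mu$ implies in particular that $\mu_n^p\xrightarrow{v}\mu^p$. By Proposition A.3, this vague convergence, along with the convergence of the masses $\mu_n^p(\Omega) = \mathrm{Pers}_p(\mu_n)\to\mathrm{Pers}_p(\mu) = \mu^p(\Omega)$, implies that $(\mu_n^p)_n$ converges weakly to $\mu^p$. ∎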

We end this section with a characterization of relatively compact sets in .

###### Proposition 3.5.

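A set $F\subset\mathcal{M}^p$ is relatively compact in $(\mathcal{M}^p, D_p)$ if and only if the set $\{\mu^p,\ \mu\in F\}$ is tight and $\sup_{\mu\in F}\mathrm{Pers}_p(\mu)<\infty$.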

###### Proof.

From Corollary 3.2, the relative compactness of a set $F$ for the metric $D_p$ is equivalent to the relative compactness of the set $\{\mu^p,\ \mu\in F\}$ for the weak convergence. Recall that all $\mu^p$ have finite mass, as $\mu^p(\Omega) = \mathrm{Pers}_p(\mu)<\infty$. Therefore, one can use Prokhorov's theorem (Proposition A.2) to conclude. ∎

###### Remark 3.3.

This characterization is equivalent to the one described in [31, Theorem 21] for persistence diagrams. The notions of off-diagonally birth-death boundedness and uniformness introduced by the authors are rephrased using the notion of tightness, standard in measure theory.

### 3.2 Persistence measures in the finite setting

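In practice, many statistical results regarding persistence diagrams are stated for sets of diagrams with a uniformly bounded number of points [27, 9], and the specific properties of $\mathcal{M}^p$ in this setting are therefore of interest. For $N\ge1$, introduce the subset $\mathcal{M}^p_N$ of $\mathcal{M}^p$ defined as $\mathcal{M}^p_N := \{\mu\in\mathcal{M}^p,\ \mu(\Omega)\le N\}$, and the set of finite persistence measures $\mathcal{M}^p_f := \bigcup_{N\ge1}\mathcal{M}^p_N$. Define similarly the set $\mathcal{D}^p_N$ (resp. $\mathcal{D}^p_f$).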

###### Proposition 3.6.

$\mathcal{M}^p_f$ (resp. $\mathcal{D}^p_f$) is dense in $\mathcal{M}^p$ (resp. $\mathcal{D}^p$) for the metric $D_p$.

###### Proof.

This is a straightforward consequence of Lemma 3.1. ∎

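Let $\tilde\Omega := \Omega\sqcup\{\partial\Omega\}$ be the quotient of $\overline\Omega$ by $\partial\Omega$, i.e. we encode the diagonal by a single point. The distance $d$ on $\overline\Omega$ naturally induces a function (still denoted by $d$) on $\tilde\Omega$, with $d(x,\{\partial\Omega\}) := d(x,\partial\Omega)$. However, $d$ is not a distance on $\tilde\Omega$, since one can have $d(x,y) > d(x,\partial\Omega)+d(y,\partial\Omega)$, which violates the triangle inequality through the point $\{\partial\Omega\}$. Define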

$$\rho(x,y) := \min\{d(x,y),\ d(x,\partial\Omega)+d(y,\partial\Omega)\}. \tag{17}$$

It is straightforward to check that $\rho$ is a distance on $\tilde\Omega$ and that $(\tilde\Omega,\rho)$ is a Polish space. One can then define the Wasserstein distance $W_{p,\rho}$ with respect to $\rho$ between finite measures $\tilde\mu,\tilde\nu$ on $\tilde\Omega$ with the same mass: $W_{p,\rho}(\tilde\mu,\tilde\nu)^p$ is the infimum of $\iint_{\tilde\Omega\times\tilde\Omega}\rho(x,y)^p\,d\pi(x,y)$ over transport plans $\pi$ with marginals $\tilde\mu$ and $\tilde\nu$ (see Section 2.1). The following theorem states that computing the metric $D_p$ between two persistence measures with finite mass can be turned into computing the Wasserstein distance between two measures supported on $\tilde\Omega$ with the same (finite) mass.

###### Theorem 3.7.

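Let $\mu,\nu\in\mathcal{M}^p_f$ and $r\ge\mu(\Omega)+\nu(\Omega)$. Define $\tilde\mu := \mu+(r-\mu(\Omega))\,\delta_{\{\partial\Omega\}}$ and $\tilde\nu := \nu+(r-\nu(\Omega))\,\delta_{\{\partial\Omega\}}$, two measures on $\tilde\Omega$ of mass $r$. Then $D_p(\mu,\nu) = W_{p,\rho}(\tilde\mu,\tilde\nu)$.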
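Theorem 3.7 suggests a simple way to compute $D_p$ (hence $d_p$, by Proposition 3.2) between finite diagrams: pad each diagram with copies of the point $\{\partial\Omega\}$ up to total mass $r = |a|+|b|$ and solve a balanced transport problem for $\rho$. The sketch below is a hypothetical illustration, not the authors' implementation: since both padded measures consist of $r$ unit atoms, an optimal plan can be taken to be a permutation, so it uses exhaustive search (tiny inputs only) and cross-checks the result against a direct enumeration of partial matchings as in Eq. (1). The names `DIAG`, `rho`, `d_p_via_rho` and `d_p_direct` are assumptions of this sketch.

```python
import itertools
import math

DIAG = "diagonal"  # stands for the single point {dOmega} of the quotient space


def dist_to_diag(x):
    """Euclidean distance from x = (t1, t2) to the diagonal."""
    return abs(x[1] - x[0]) / math.sqrt(2)


def rho(x, y):
    """Distance (17) on the quotient space; DIAG stands for {dOmega}."""
    if x == DIAG and y == DIAG:
        return 0.0
    if x == DIAG:
        return dist_to_diag(y)
    if y == DIAG:
        return dist_to_diag(x)
    return min(math.dist(x, y), dist_to_diag(x) + dist_to_diag(y))


def d_p_via_rho(a, b, p=2.0):
    """D_p(a, b) for finite diagrams through Theorem 3.7 with r = |a| + |b|.

    Both padded measures are sums of r unit atoms, so an optimal plan can be
    taken to be a permutation; exhaustive search keeps the sketch exact.
    """
    r = len(a) + len(b)
    a_tilde = list(a) + [DIAG] * (r - len(a))
    b_tilde = list(b) + [DIAG] * (r - len(b))
    best = min(
        sum(rho(x, b_tilde[j]) ** p for x, j in zip(a_tilde, perm))
        for perm in itertools.permutations(range(r))
    )
    return best ** (1.0 / p)


def d_p_direct(a, b, p=2.0):
    """d_p(a, b) of Eq. (1), enumerating partial matchings via an augmented assignment."""
    n, m = len(a), len(b)
    best = math.inf
    for perm in itertools.permutations(range(n + m)):
        cost = 0.0
        for row, col in enumerate(perm):
            if row < n and col < m:      # a[row] matched with b[col]
                cost += math.dist(a[row], b[col]) ** p
            elif row < n:                # a[row] sent to the diagonal
                cost += dist_to_diag(a[row]) ** p
            elif col < m:                # b[col] taken from the diagonal
                cost += dist_to_diag(b[col]) ** p
        best = min(best, cost)
    return best ** (1.0 / p)
```

In practice one would replace the permutation search by an assignment solver on the $r\times r$ cost matrix $(\rho(\tilde x_i,\tilde y_j)^p)_{i,j}$; the padding step is the part that Theorem 3.7 justifies.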

###### Proof.

We first introduce a lemma that makes explicit the correspondence between $\mathrm{Adm}(\mu,\nu)$ and $\Pi(\tilde\mu,\tilde\nu)$.

###### Lemma 3.2.

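Let $\mu,\nu\in\mathcal{M}^p_f$ and $r\ge\mu(\Omega)+\nu(\Omega)$. Let $\tilde\mu,\tilde\nu$ be defined as in Theorem 3.7, and let $s:\overline\Omega\to\partial\Omega$ be the orthogonal projection on the diagonal.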

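1. Define $T(\mu,\nu)$ as the set of plans $\pi\in\mathrm{Adm}(\mu,\nu)$ satisfying $\pi(A\times B) = \pi((A\cap s^{-1}(B))\times\partial\Omega)$ along with $\pi(B\times A) = \pi(\partial\Omega\times(A\cap s^{-1}(B)))$ for all Borel sets $A\subset\Omega$, $B\subset\partial\Omega$. Then, $\mathrm{Opt}_p(\mu,\nu)\subset T(\mu,\nu)$.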

2. Let $\pi\in T(\mu,\nu)$ be such that $\pi(\partial\Omega\times\partial\Omega) = 0$. Define $\iota(\pi)$ by, for Borel sets $A,B\subset\Omega$,

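$$\begin{cases}
\iota(\pi)(A\times B) = \pi(A\times B),\\
\iota(\pi)(A\times\{\partial\Omega\}) = \pi(A\times\partial\Omega),\\
\iota(\pi)(\{\partial\Omega\}\times B) = \pi(\partial\Omega\times B),\\
\iota(\pi)(\{\partial\Omega\}\times\{\partial\Omega\}) = r-\mu(\Omega)-\pi(\partial\Omega\times\Omega)\ge0.
\end{cases} \tag{18}$$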

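Then, $\iota(\pi)\in\Pi(\tilde\mu,\tilde\nu)$ and $\iint_{\tilde\Omega\times\tilde\Omega} d(x,y)^p\,d\iota(\pi)(x,y) = C_p(\pi)$.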

3. Let $\tilde\pi\in\Pi(\tilde\mu,\tilde\nu)$. Define $\kappa(\tilde\pi)$ by,

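$$\begin{cases}
\kappa(\tilde\pi)(A\times B) = \tilde\pi(A\times B) & \text{for Borel sets } A,B\subset\Omega,\\
\kappa(\tilde\pi)(A\times B) = \tilde\pi((A\cap s^{-1}(B))\times\{\partial\Omega\}) & \text{for Borel sets } A\subset\Omega,\ B\subset\partial\Omega,\\
\kappa(\tilde\pi)(A\times B) = \tilde\pi(\{\partial\Omega\}\times(B\cap s^{-1}(A))) & \text{for Borel sets } A\subset\partial\Omega,\ B\subset\Omega,\\
\kappa(\tilde\pi)(\partial\Omega\times\partial\Omega) = 0.
\end{cases}$$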

Then, $\kappa(\tilde\pi)\in\mathrm{Adm}(\mu,\nu)$.

###### Proof.
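1. Consider $\pi\in\mathrm{Opt}_p(\mu,\nu)$, and define $\pi'$ that coincides with $\pi$ on $\Omega\times\Omega$ and forces the mass transported onto the diagonal to be transported onto its orthogonal projection: more precisely, for all Borel sets $A\subset\Omega$ and $B\subset\partial\Omega$, $\pi'(A\times B) := \pi((A\cap s^{-1}(B))\times\partial\Omega)$ and $\pi'(B\times A) := \pi(\partial\Omega\times(A\cap s^{-1}(B)))$. Note that $\pi'\in\mathrm{Adm}(\mu,\nu)$. Since $s(x)$ is the unique minimizer of $y\mapsto d(x,y)$ on $\partial\Omega$, it follows that $C_p(\pi')\le C_p(\pi)$, with equality if and only if $\pi = \pi'$, and thus $\pi\in T(\mu,\nu)$.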

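2. Write $\tilde\pi := \iota(\pi)$. The mass $\tilde\pi(\{\partial\Omega\}\times\{\partial\Omega\})$ is nonnegative, since $\pi(\partial\Omega\times\Omega)\le\nu(\Omega)$ and $r\ge\mu(\Omega)+\nu(\Omega)$. One has, for all Borel sets $A\subset\Omega$,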

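$$\tilde\pi(A\times\tilde\Omega) = \tilde\pi(A\times\Omega)+\tilde\pi(A\times\{\partial\Omega\}) = \pi(A\times\Omega)+\pi(A\times\partial\Omega) = \pi(A\times\overline\Omega) = \mu(A) = \tilde\mu(A).$$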

Similarly, $\tilde\pi(\tilde\Omega\times B) = \tilde\nu(B)$ for all Borel sets $B\subset\Omega$. Observe also that

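$$\tilde\pi(\{\partial\Omega\}\times\tilde\Omega) = \tilde\pi(\{\partial\Omega\}\times\{\partial\Omega\})+\tilde\pi(\{\partial\Omega\}\times\Omega) = r-\mu(\Omega) = \tilde\mu(\{\partial\Omega\}).$$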

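Similarly, $\tilde\pi(\tilde\Omega\times\{\partial\Omega\}) = \tilde\nu(\{\partial\Omega\})$. It follows that $\tilde\pi\in\Pi(\tilde\mu,\tilde\nu)$, so that $\iota$ is well defined. Observe that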

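$$\begin{aligned}
\iint_{\tilde\Omega\times\tilde\Omega} d(x,y)^p\,d\tilde\pi(x,y) &= \iint_{\Omega\times\Omega} d(x,y)^p\,d\pi(x,y)+\iint_{\Omega\times\partial\Omega} d(x,\partial\Omega)^p\,d\pi(x,y)+\iint_{\partial\Omega\times\Omega} d(y,\partial\Omega)^p\,d\pi(x,y)+0\\
&= C_p(\pi) \quad\text{as } \pi\in T(\mu,\nu).
\end{aligned}$$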
3. Write $\pi := \kappa(\tilde\pi)$. For a Borel set $A\subset\Omega$,

 π(A×¯¯¯¯Ω)