On the geometrical properties of the coherent matching distance in 2D persistent homology

01/20/2018 ∙ by Andrea Cerri, et al. ∙ Université de Saint-Boniface, University of Bologna, Consiglio Nazionale delle Ricerche

In this paper we study a new metric for comparing Betti numbers functions in bidimensional persistent homology, based on coherent matchings, i.e. families of matchings that vary in a continuous way. We prove some new results about this metric, including its stability. In particular, we show that the computation of this distance is strongly related to suitable filtering functions associated with lines of slope 1, thus underlining the key role of these lines in the study of bidimensional persistence. In order to prove these results, we introduce and study the concepts of extended Pareto grid for a normal filtering function and of transport of a matching. As a by-product, we obtain a theoretical framework for managing the phenomenon of monodromy in 2D persistent homology.

Introduction

The classical approach to persistent homology is based on the study of the homological changes of the sublevel sets $X_u=\{x\in X:\varphi(x)\preceq u\}$ of a topological space $X$ filtered by means of a continuous function $\varphi:X\to\mathbb{R}^n$, when $u$ varies in $\mathbb{R}^n$ (here $\preceq$ denotes the componentwise order). This theory is interesting from both the theoretical and the applied point of view, since the function $\varphi$ can be used to describe both topological properties of $X$ and data defined on this space. A description of persistent homology and its use can be found in [19].

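The case $n=2$ is intrinsically more difficult to study than the case $n=1$ and calls for the development of new mathematical ideas and methods. One of these methods consists in a reduction from the $2$-dimensional to the $1$-dimensional case by means of a family of functions $\varphi^*_{(a,b)}:X\to\mathbb{R}$, with $a\in\,]0,1[$ and $b\in\mathbb{R}$ (cf. [8]), defined by setting $\varphi^*_{(a,b)}(x):=\max\left\{\frac{\varphi_1(x)-b}{a},\frac{\varphi_2(x)+b}{1-a}\right\}$. Each pair $(a,b)$ identifies the positive slope line $r_{(a,b)}$ in $\mathbb{R}^2$ defined by the parametric equation $(u,v)=t\,(a,1-a)+(b,-b)$, $t\in\mathbb{R}$. The function $\varphi^*_{(a,b)}$ allows one to represent the set $\{x\in X:\varphi_1(x)\le ta+b,\ \varphi_2(x)\le t(1-a)-b\}$ as the set $\{x\in X:\varphi^*_{(a,b)}(x)\le t\}$, which describes a $1$-dimensional filtration of $X$ for $t$ varying in $\mathbb{R}$. For technical reasons, we normalize the function $\varphi^*_{(a,b)}$ by setting $\varphi_{(a,b)}:=\min\{a,1-a\}\cdot\varphi^*_{(a,b)}$. In plain words, the previous 1D filtration is obtained in two steps. First, $X$ is projected to the plane by means of $\varphi$. Then, for each $t$, we take the subset of the points whose images lie to the bottom left of the point $t\,(a,1-a)+(b,-b)$ of $r_{(a,b)}$ (see Figure 1). It is well known that, in each degree, the collection of the 1D Betti numbers functions associated with the 1D filtrations defined by the filtering functions $\varphi_{(a,b)}$ is equivalent to the 2D Betti numbers function of $\varphi$ [8].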
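The reduction can be made concrete with a minimal Python sketch. It assumes that $\varphi$ is known only through its values $(\varphi_1(x),\varphi_2(x))$ on finitely many sample points $x$. The helper names `phi_star`, `phi_ab` and `same_sublevel_set` are hypothetical. The last helper checks, on given data, that the points lying below and to the left of $t\,(a,1-a)+(b,-b)$ are exactly those in the sublevel set $\{\varphi^*_{(a,b)}\le t\}$.

```python
def phi_star(p1: float, p2: float, a: float, b: float) -> float:
    """Unnormalized value at a point x with phi(x) = (p1, p2)."""
    if not 0.0 < a < 1.0:
        raise ValueError("a must lie in the open interval (0, 1)")
    return max((p1 - b) / a, (p2 + b) / (1.0 - a))


def phi_ab(p1: float, p2: float, a: float, b: float) -> float:
    """Normalized value min{a, 1 - a} * phi_star(p1, p2, a, b)."""
    return min(a, 1.0 - a) * phi_star(p1, p2, a, b)


def same_sublevel_set(values, a, b, t):
    """Compare the two descriptions of the sublevel set at parameter t.

    `values` is a list of pairs (phi1(x), phi2(x)), one per sample point x.
    """
    u, v = t * a + b, t * (1.0 - a) - b  # point of the line r_(a,b) at parameter t
    box = {i for i, (p1, p2) in enumerate(values) if p1 <= u and p2 <= v}
    sub = {i for i, (p1, p2) in enumerate(values) if phi_star(p1, p2, a, b) <= t}
    return box == sub
```

The normalized function $\varphi_{(a,b)}$ is obtained from $\varphi^*_{(a,b)}$ by rescaling with the positive factor $\min\{a,1-a\}$. Both functions therefore induce the same filtration up to a reparameterization of $t$.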

Figure 1. The 1D filtration defined by the function $\varphi_{(a,b)}$. The light blue set is the sublevel set associated with the value $t$.

After fixing $(a,b)$, each $1$-dimensional filtration associated with the function $\varphi_{(a,b)}$ defines a persistence diagram $\mathrm{Dgm}(\varphi_{(a,b)})$. This is the set of pairs $(u_i,v_i)$ describing the time of birth and the time of death of the $i$th homological class in degree $k$ along the filtration associated with $\varphi_{(a,b)}$. Suppose two filtering functions $\varphi,\psi:X\to\mathbb{R}^2$ are given. A common way to compare the two collections $\{\mathrm{Dgm}(\varphi_{(a,b)})\}$ and $\{\mathrm{Dgm}(\psi_{(a,b)})\}$ consists in computing the supremum, over $(a,b)$, of the classical bottleneck distance between the persistence diagrams $\mathrm{Dgm}(\varphi_{(a,b)})$ and $\mathrm{Dgm}(\psi_{(a,b)})$. This idea leads to a metric $D_{match}$ between the aforementioned families of persistence diagrams (cf. [3, 8]). We observe that, in principle, a small change of the pair $(a,b)$ can cause a large change in the “optimal” matching, that is, the matching realizing the bottleneck distance between $\mathrm{Dgm}(\varphi_{(a,b)})$ and $\mathrm{Dgm}(\psi_{(a,b)})$. In other words, the definition of $D_{match}$ is based on a family of optimal matchings that is not required to change continuously with respect to the pair $(a,b)$.

Experiments concerning the computation of this distance reveal an interesting phenomenon: in many examples, the supremum defining $D_{match}$ is attained for lines with $a=1/2$. Figure 2 illustrates two of these examples.

Figure 2. The bottleneck distance between the persistence diagrams $\mathrm{Dgm}(\varphi_{(a,b)})$ and $\mathrm{Dgm}(\psi_{(a,b)})$ for two different pairs of $\mathbb{R}^2$-valued functions, represented as a function of $(a,b)$. The colors correspond to the value of the bottleneck distance at each point $(a,b)$, with red meaning higher values and blue lower values. We can observe that the maximum value is attained at a point with $a=1/2$. More details about the considered functions can be found in [3].

A natural question arises: does the property illustrated in these examples always hold for the distance $D_{match}$?

Unfortunately, we are not able to answer this question directly, because the definition of $D_{match}$ lacks geometrical properties. The metric $D_{match}$ is rather simple to define, and to approximate by considering a suitable family of filtering functions associated with lines of positive slope. Nevertheless, it has two main drawbacks. First, it ignores the natural link between the homological properties of filtrations associated with lines that are close to each other; as a consequence, part of the interesting homological information is lost. Second, its intrinsically discontinuous definition makes its properties difficult to study.

For these reasons, in the previous paper [10] we introduced a new matching distance between 2D persistence diagrams (i.e. families of persistence diagrams associated with the lines $r_{(a,b)}$ as $(a,b)$ changes). This distance is called the coherent matching distance and is based on matchings that change “coherently” with the filtrations we take into account. In other words, the basic idea consists in considering only matchings between the persistence diagrams $\mathrm{Dgm}(\varphi_{(a,b)})$ and $\mathrm{Dgm}(\psi_{(a,b)})$ that change continuously with respect to the pair $(a,b)$. This requirement is both natural and useful, and this paper is devoted to the exploration of its main consequences.

First of all, the idea of “coherent matching” leads to the discovery of an interesting phenomenon of monodromy. When we require the matchings to change continuously, we have to avoid the pairs $(a,b)$ at which the persistence diagram $\mathrm{Dgm}(\varphi_{(a,b)})$ contains double points; these are called singular pairs. This is done by choosing a connected open set $U$ of regular (non-singular) pairs in the parameter space, and assuming that $(a,b)\in U$. In doing so, we can preserve the “identity” of the points in the persistence diagram and follow them as we move in the parameter space. From this, the concept of a continuously changing family of matchings easily arises. Interestingly, turning around a singular pair can produce a permutation of the points in the considered persistence diagram, so that the considered filtering function is associated with a monodromy group. A basic example of this monodromy phenomenon is given by a piecewise-linear filtering function $\varphi$ whose values are prescribed at a few points. The function is extended linearly on the segments joining these points, and taken with constant slope on the two outer half-lines. The graph of $\varphi$ is shown in Figure 3.

Figure 3. The graph of $\varphi$ in our basic example of monodromy for 2D persistent homology.

For a suitable pair $(\bar a,\bar b)$, the persistence diagram of the function $\varphi_{(\bar a,\bar b)}$ in the considered degree contains a double point, so that $(\bar a,\bar b)$ is a singular pair for $\varphi$ in that degree. If we move around the point $(\bar a,\bar b)$ in the parameter space, we can see that two points of the persistence diagram exchange their positions. For more details about this example we refer the interested reader to the paper [9]. This example can easily be adapted to obtain a smooth filtering function defined on a smooth closed manifold that exhibits a similar monodromy phenomenon.

As a consequence, our definition of “coherent matching” must take a monodromy group into account. In this paper this is done by defining a transport operator $T_\gamma$ along a path $\gamma$ from $(a,b)$ to $(a',b')$ in the set $U$. The operator continuously transports each matching $\sigma$ between the persistence diagrams $\mathrm{Dgm}(\varphi_{(a,b)})$, $\mathrm{Dgm}(\psi_{(a,b)})$ to a matching between the persistence diagrams $\mathrm{Dgm}(\varphi_{(a',b')})$, $\mathrm{Dgm}(\psi_{(a',b')})$. The existence of monodromy implies that the transport of $\sigma$ depends not only on the pairs $(a,b)$ and $(a',b')$, but also on the path $\gamma$ we consider.

By introducing the transport operator $T_\gamma$, we can define the coherent cost $\mathrm{cohcost}(\sigma)$ as the supremum of the classical cost of the matchings that can be obtained from $\sigma$ by means of every possible transport operator $T_\gamma$ over $U$.

Once this is done, the definition of the coherent matching distance is straightforward. Let two filtering functions $\varphi,\psi$ be given, and assume that $U$ contains none of their singular pairs in the considered degree $k$. Then $C_U(\varphi,\psi)$ is the infimum of the coherent costs of the matchings between the sets $\mathrm{Dgm}(\varphi_{(\bar a,\bar b)})$ and $\mathrm{Dgm}(\psi_{(\bar a,\bar b)})$ computed in degree $k$, for a pair $(\bar a,\bar b)\in U$ arbitrarily fixed. We also prove that this definition does not depend on the choice of $(\bar a,\bar b)$.

A key point in our paper is Theorem 5.4 in Section 5. Consider the function $\gamma\mapsto\mathrm{cost}(T_\gamma(\sigma))$, where $\gamma$ varies among the paths in $U$. The theorem states that this function attains its global maximum when the endpoint of $\gamma$ belongs to the vertical line $a=1/2$ or to the boundary of $U$. This result follows from the maximum principle for the coherent transport (Theorem 5.2). It casts new light on the abundance of examples where the supremum defining the classical distance $D_{match}$ is attained for lines with $a=1/2$. In our opinion, this result is a strong signal that the coherent matching distance should be preferred to the classical matching distance, both in theory and in applications, since it allows one to manage the parameter space more efficiently. We observe that the value $a=1/2$ identifies the planar lines with slope $1$. We think that the filtering functions associated with these lines are worth further study in 2D persistent homology, since they appear to encapsulate most of the relevant information. It is interesting to point out that these lines also play an important role in the paper [13], although in a different context. Moreover, the direction of the vector $(1,1)$ has a key role in the definition of the interleaving distance between multidimensional persistence modules [22]. The fact that lines of slope $1$ appear in various different approaches suggests to us that they deserve further study. We observe that for $a=1/2$ the function $\varphi_{(a,b)}$ coincides with the function $\max\{\varphi_1-b,\varphi_2+b\}$. Our research therefore suggests that this collection of filtering functions could play an important role in 2D persistent homology. Incidentally, this is also supported by the classical upper bound for the distance between the 2-dimensional persistent Betti numbers, that is, $\|\varphi-\psi\|_\infty$. Fixing $a=1/2$, this bound can be replaced by $\max_{b\in\mathbb{R}}\|\varphi_{(1/2,b)}-\psi_{(1/2,b)}\|_\infty$ (Proposition 2.3).
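For $a=1/2$, the identity stated above follows directly from the definition of the normalized function. The short derivation below assumes the definition $\varphi_{(a,b)}(x)=\min\{a,1-a\}\cdot\max\left\{\frac{\varphi_1(x)-b}{a},\frac{\varphi_2(x)+b}{1-a}\right\}$:

```latex
\varphi_{(1/2,b)}(x)
  = \min\left\{\tfrac12,\tfrac12\right\}\cdot
    \max\left\{\frac{\varphi_1(x)-b}{1/2},\ \frac{\varphi_2(x)+b}{1/2}\right\}
  = \tfrac12\cdot 2\,\max\{\varphi_1(x)-b,\ \varphi_2(x)+b\}
  = \max\{\varphi_1(x)-b,\ \varphi_2(x)+b\}.
```

In particular, for $b=0$ one obtains the function $\max\{\varphi_1,\varphi_2\}$.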

We conclude by observing that, while our research highlights the importance of the lines of slope $1$, this does not mean that lines with a different slope are useless in 2D persistent homology. As we will show, the construction of matchings that change coherently with filtrations defined by lines of slope $1$ compels us to use lines with slope different from $1$ as well. This is due to the need to avoid lines possibly corresponding to singular pairs. Furthermore, the phenomenon of monodromy can appear only if lines with a slope different from $1$ are also considered. These facts justify our approach, which is based on the use of every line of positive slope.

Our paper is devoted to illustrating the theoretical model that we have sketched in this introduction. This will require the use of several new concepts and the proof of many properties related to these concepts, so that a by-product of our research is the development of a new theoretical framework to manage 2D persistent homology, based on the concept of extended Pareto grid.

The outline of the paper is as follows. In Section 1 we recall the necessary mathematical background. In Section 2 we illustrate the 2D setting for persistent Betti numbers functions. In Section 3 we introduce the concept of extended Pareto grid as the main mathematical tool in our approach, and prove several results paving the way to the mathematical framework illustrated in the following sections. In Section 4 we introduce the concept of transport of a matching together with its main properties, and present the definition of the coherent 2-dimensional matching distance, also proving its stability. In Section 5 we prove the maximum principle for the coherent transport and present our main result on the coherent matching distance in 2D persistent homology (Theorem 5.4). In Section 6 we conclude the paper by illustrating the relation between the coherent matching distance and the classical matching distance.

Related literature

Studying the persistence properties of vector-valued functions is usually referred to as multidimensional persistence. These concepts were first investigated in [21] with respect to homotopy groups. Multidimensional persistence modules were then considered in [7], and subsequently studied in other papers including [6] and the recent [22, 23]. Another approach to the multidimensional setting is the one proposed in [4]. Focusing on 0th homology, the authors introduce a procedure that reduces the multidimensional case to the 1-dimensional setting by using a suitable family of derived real-valued filtering functions. This result has been partially extended in [5], namely to any homology degree but restricted to max-tame filtering functions, and then further refined in [8] for continuous filtering functions. This approach leads to the definition of a multidimensional matching distance between persistent Betti numbers functions and to algorithms for its computation (cf. [3, 11]). More recently, the interleaving distance between multidimensional persistence modules has been formally introduced and discussed in [22]. However, according to the author of [22], the question of whether and how this last distance can be computed or approximated remains open, which justifies the study of other metrics such as the one we propose in this paper. Along the same line of thought, some recent papers have been devoted to the computation of bounds for the interleaving distance [1, 2, 16]. The phenomenon of monodromy in 2D persistent homology has been described and studied in [9].

1. Mathematical setting

In what follows we will assume that $\varphi=(\varphi_1,\varphi_2)$ is a continuous map from a finitely triangulable topological space $X$ to the real plane $\mathbb{R}^2$.

1.1. Persistent Betti numbers

As a reference for multidimensional persistent Betti numbers we use [8]. According to the main topic of this paper, we will also stick to the notations and working assumptions adopted in [9]. In particular, we build on the strategy adopted in the latter paper to study certain instances of monodromy for multidimensional persistent Betti numbers. Roughly, the idea is to reduce the problem to the analysis of a collection of persistent Betti numbers associated with real-valued functions, and to their compact representation in terms of persistence diagrams.

We use the following notations: $\Delta^+$ is the open set $\{(u,v)\in\mathbb{R}^2:u<v\}$, and $\Delta$ represents the diagonal $\{(u,v)\in\mathbb{R}^2:u=v\}$. We can extend $\Delta^+$ with points at infinity of the kind $(u,\infty)$, where $u\in\mathbb{R}$; we denote this set by $\Delta^*$. Consider a continuous function $\varphi:X\to\mathbb{R}$ and any $u,v\in\mathbb{R}$ with $u<v$. The inclusion map of the sublevel set $X_u:=\{x\in X:\varphi(x)\le u\}$ into the sublevel set $X_v$ induces a homomorphism from the $k$th homology group of $X_u$ into the $k$th homology group of $X_v$. The image of this homomorphism is called the $k$th persistent homology group of $(X,\varphi)$ at $(u,v)$, and is denoted by $H_k^{(u,v)}(X,\varphi)$. In other words, the group $H_k^{(u,v)}(X,\varphi)$ contains all and only the homology classes of $k$-cycles born before or at $u$ and still alive at $v$. By assuming that coefficients are chosen in a field $\mathbb{K}$, we get that homology groups are vector spaces. Therefore, they can be completely described by their dimension, leading to the following definition [18].

Definition 1.1 (Persistent Betti Numbers).

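The persistent Betti numbers function of $\varphi$ in degree $k$, briefly PBN, is the function $\beta_\varphi:\Delta^+\to\mathbb{N}\cup\{\infty\}$ defined by

$$\beta_\varphi(u,v):=\dim H_k^{(u,v)}(X,\varphi).$$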

Under the above requirements for $\varphi$, it is possible to show that $\beta_\varphi(u,v)$ is finite for all $(u,v)\in\Delta^+$ [8]. Obviously, for each $k\in\mathbb{N}$ we have a different PBN of $\varphi$ (which might be denoted by $\beta_{\varphi,k}$, say), but for the sake of notational simplicity we omit any reference to $k$.

Following [8], we use Čech homology, and refer the reader to that paper for a detailed explanation of why this homology theory is preferred to others. For the present work, it is sufficient to recall that, with Čech homology, the PBNs of a real-valued function can be completely described by the corresponding persistence diagrams. Formally, a persistence diagram can be defined via the notion of multiplicity [14, 20]. Following the convention used for PBNs, any reference to $k$ will be dropped in the sequel.

Definition 1.2 (Multiplicity).

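The multiplicity $\mu(u,v)$ of $(u,v)\in\Delta^+$ is the finite, non-negative number given by

$$\mu(u,v):=\min_{\substack{\varepsilon>0\\ u+\varepsilon<v-\varepsilon}}\Big(\beta_\varphi(u+\varepsilon,v-\varepsilon)-\beta_\varphi(u-\varepsilon,v-\varepsilon)-\beta_\varphi(u+\varepsilon,v+\varepsilon)+\beta_\varphi(u-\varepsilon,v+\varepsilon)\Big).$$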

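The multiplicity $\mu(u,\infty)$ of $(u,\infty)$, with $u\in\mathbb{R}$, is the finite, non-negative number given by

$$\mu(u,\infty):=\min_{\substack{\varepsilon>0\\ u+\varepsilon<v}}\Big(\beta_\varphi(u+\varepsilon,v)-\beta_\varphi(u-\varepsilon,v)\Big).$$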

Definition 1.3 (Persistence Diagram).

The persistence diagram $\mathrm{Dgm}(\varphi)$ is the multiset of all points $p\in\Delta^*$ such that $\mu(p)>0$, counted with their multiplicity, union the singleton $\{\Delta\}$, where the point $\Delta$ is counted with infinite multiplicity.

Each point $p\in\mathrm{Dgm}(\varphi)\cap\Delta^+$ will be called proper, while each point $p\in\mathrm{Dgm}(\varphi)\cap(\Delta^*\setminus\Delta^+)$ will be called a point at infinity or an improper point.
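For a function on a segment, as in Figure 4, the degree-0 persistence diagram can be computed without evaluating multiplicities explicitly. The following Python sketch is not taken from this paper. It assumes that the segment is discretized as a path graph whose vertices carry the values of $\varphi$, and that each edge enters the filtration at the larger of its two endpoint values. It uses the standard union-find "elder rule": when two components merge, the younger one dies. Points are repeated according to multiplicity, the single point at infinity is $(\min\varphi,\infty)$, and $\Delta$ is left implicit. The function name `dgm0_path` is hypothetical.

```python
import math


def dgm0_path(values):
    """Degree-0 persistence diagram of a function sampled on a path graph.

    Vertex i carries values[i]; the edge {i, i+1} enters the filtration at
    max(values[i], values[i+1]). Proper points (birth, death) are returned in
    sorted order, followed by the unique point at infinity (min(values), inf).
    """
    n = len(values)
    parent = list(range(n))  # union-find forest; each root is its component's oldest vertex

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    alive = [False] * n
    points = []
    for i in sorted(range(n), key=lambda k: values[k]):
        alive[i] = True
        for j in (i - 1, i + 1):
            if 0 <= j < n and alive[j]:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # elder rule: the component born later dies at values[i]
                young, old = (ri, rj) if values[ri] > values[rj] else (rj, ri)
                if values[young] < values[i]:  # drop zero-persistence pairs
                    points.append((values[young], values[i]))
                parent[young] = old
    return sorted(points) + [(min(values), math.inf)]
```

On the values `[0.0, 2.0, 1.0, 3.0, 0.5]` the sketch returns `[(0.5, 3.0), (1.0, 2.0), (0.0, inf)]`. The local minima at 1.0 and 0.5 die at the local maxima 2.0 and 3.0, while the global minimum produces the point at infinity.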

Remark 1.4.

In the literature, persistence diagrams are usually defined to contain every single point of the diagonal $\Delta$, instead of one point representing the whole diagonal with infinite multiplicity. The two definitions are equivalent, but we prefer the latter because it simplifies our exposition, in particular a definition given in Section 3.

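We endow $\bar\Delta^*:=\Delta^*\cup\{\Delta\}$ with the following extended metric $d$. We define

$$d\big((u,v),(u',v')\big):=\min\left\{\max\{|u-u'|,|v-v'|\},\ \max\left\{\frac{v-u}{2},\frac{v'-u'}{2}\right\}\right\}\tag{1.1}$$

for every $(u,v),(u',v')\in\Delta^*$, with the following conventions about points at infinity:

- $\infty-y=y-\infty=\infty$ when $y\neq\infty$, and $\infty-\infty=0$;
- $\frac{\infty}{2}=\infty$ and $|\infty|=\infty$;
- $\min\{c,\infty\}=c$ and $\max\{c,\infty\}=\infty$.

Furthermore, we set $d\big((u,v),\Delta\big):=\frac{v-u}{2}$ if $(u,v)\in\Delta^+$, $d\big((u,\infty),\Delta\big):=\infty$ if $u\in\mathbb{R}$, and $d(\Delta,\Delta):=0$.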
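The conventions about points at infinity matter in an implementation, because floating-point arithmetic gives $\infty-\infty=\mathrm{NaN}$ rather than $0$. The following is a minimal Python sketch of (1.1). It assumes that points of $\Delta^*$ are encoded as pairs `(u, v)`, with `v = math.inf` for points at infinity, and that the point $\Delta$ is encoded by the hypothetical constant `DELTA = None`.

```python
import math

DELTA = None  # hypothetical encoding of the point Delta representing the diagonal


def d(p, q):
    """Extended metric (1.1) on Delta-bar-star.

    Points of Delta^* are pairs (u, v) with u < v, where v may be math.inf.
    """
    if p is DELTA and q is DELTA:
        return 0.0
    if p is DELTA or q is DELTA:
        u, v = q if p is DELTA else p
        return (v - u) / 2  # (v - u)/2 for proper points, inf for points at infinity
    (u1, v1), (u2, v2) = p, q
    # convention inf - inf = 0 (plain floating point would give nan here)
    dv = 0.0 if math.isinf(v1) and math.isinf(v2) else abs(v1 - v2)
    move = max(abs(u1 - u2), dv)                    # cost of moving p onto q
    annihilate = max((v1 - u1) / 2, (v2 - u2) / 2)  # cost of sending both onto Delta
    return min(move, annihilate)
```

Python's `min` and `max` already satisfy $\min\{c,\infty\}=c$ and $\max\{c,\infty\}=\infty$. The only convention that needs explicit handling is $\infty-\infty=0$.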

Persistence diagrams are stable under the bottleneck distance (a.k.a. matching distance). Roughly, small changes in the considered function $\varphi$ induce small changes in the position of the points of $\mathrm{Dgm}(\varphi)$ that are far from the diagonal, and possibly produce variations close to the diagonal [14, 15]. A visual intuition of this fact is given in Figure 4. Formally, we have the following definition:

Definition 1.5 (Bottleneck distance).

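Let $\mathrm{Dgm}_1$, $\mathrm{Dgm}_2$ be two persistence diagrams. For each bijection $\sigma$ between $\mathrm{Dgm}_1$ and $\mathrm{Dgm}_2$ we set $\mathrm{cost}(\sigma):=\max_{p\in\mathrm{Dgm}_1}d(p,\sigma(p))$. The bottleneck distance $d_B(\mathrm{Dgm}_1,\mathrm{Dgm}_2)$ is defined as

$$d_B(\mathrm{Dgm}_1,\mathrm{Dgm}_2):=\min_{\sigma}\mathrm{cost}(\sigma),$$

where $\sigma$ varies among all the bijections between $\mathrm{Dgm}_1$ and $\mathrm{Dgm}_2$.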

In practice, the distance $d$ defined in (1.1) compares the cost of moving a point $p$ to a point $p'$ with the cost of annihilating both of them by moving $p$ and $p'$ onto $\Delta$, and takes the cheaper of the two. Therefore, $d(p,p')$ can be considered a measure of the minimum cost of moving $p$ to $p'$ along two different paths.

Sometimes in the literature the definition of $\mathrm{cost}(\sigma)$ is given by means of a supremum instead of a maximum, and the bottleneck distance is introduced as an infimum instead of a minimum. We stress that both presentations are correct, as pointed out in [8]. In other words, there always exist a matching $\bar\sigma$ and a point $\bar p$ such that $d_B(\mathrm{Dgm}_1,\mathrm{Dgm}_2)=\mathrm{cost}(\bar\sigma)=d(\bar p,\bar\sigma(\bar p))$. The matching $\bar\sigma$ is called an optimal matching between $\mathrm{Dgm}_1$ and $\mathrm{Dgm}_2$.
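For very small diagrams, an optimal matching can be found by exhaustive search. The following Python sketch is illustrative only, and its names are hypothetical. It pads each diagram with as many copies of $\Delta$ as the other diagram has points. With this padding, every permutation of the padded lists corresponds to a bijection between the two persistence diagrams. Conversely, every bijection has the same cost as some permutation, because surplus copies of $\Delta$ are matched to each other at cost $0$. The running time is factorial, so this only serves to make the definition concrete.

```python
import itertools
import math

DELTA = None  # hypothetical encoding of the diagonal point Delta


def dist(p, q):
    """Extended metric (1.1); DELTA stands for the diagonal point."""
    if p is DELTA and q is DELTA:
        return 0.0
    if p is DELTA or q is DELTA:
        u, v = q if p is DELTA else p
        return (v - u) / 2  # inf for points at infinity
    (u1, v1), (u2, v2) = p, q
    dv = 0.0 if math.isinf(v1) and math.isinf(v2) else abs(v1 - v2)
    return min(max(abs(u1 - u2), dv), max((v1 - u1) / 2, (v2 - u2) / 2))


def bottleneck(dgm1, dgm2):
    """Brute-force bottleneck distance between two small persistence diagrams.

    Each diagram is a list of (birth, death) pairs repeated according to
    multiplicity; Delta is not listed and is added here by padding.
    """
    a = list(dgm1) + [DELTA] * len(dgm2)
    b = list(dgm2) + [DELTA] * len(dgm1)
    best = math.inf
    for perm in itertools.permutations(range(len(b))):
        cost = max((dist(a[i], b[j]) for i, j in enumerate(perm)), default=0.0)
        best = min(best, cost)
    return best
```

For instance, `bottleneck([(0.0, 4.0)], [(1.0, 3.0)])` is `1.0`. Moving the point costs 1, while sending both points to $\Delta$ would cost 2.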

The stability of persistence diagrams can then be formalized as follows [14, 15]:

Theorem 1.6 (Stability Theorem).

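Let $\varphi,\psi:X\to\mathbb{R}$ be two continuous functions. Then $d_B(\mathrm{Dgm}(\varphi),\mathrm{Dgm}(\psi))\le\|\varphi-\psi\|_\infty$.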

Figure 4. Changing the function $\varphi$ into $\psi$ induces a change in the persistence diagram. In this example, the graphs on the left represent the real-valued functions $\varphi$ and $\psi$, defined on a space $X$ (a segment). The corresponding persistence diagrams (restricted to 0th homology) are displayed on the right.

2. 2-dimensional setting

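The definition of persistent Betti numbers can easily be extended to functions taking values in $\mathbb{R}^2$ [8]. Consider a continuous function $\varphi=(\varphi_1,\varphi_2):X\to\mathbb{R}^2$ and any $u=(u_1,u_2)$, $v=(v_1,v_2)\in\mathbb{R}^2$ with $u_1<v_1$ and $u_2<v_2$. The inclusion map of the sublevel set $X_u:=\{x\in X:\varphi_1(x)\le u_1,\ \varphi_2(x)\le u_2\}$ into the sublevel set $X_v$ induces a homomorphism from the $k$th homology group of $X_u$ into the $k$th homology group of $X_v$. The image of this homomorphism is called the $k$th persistent homology group of $(X,\varphi)$ at $(u,v)$, and is denoted by $H_k^{(u,v)}(X,\varphi)$.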

Definition 2.1 (Persistent Betti Numbers in the case $n=2$).

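The persistent Betti numbers function of $\varphi$ in degree $k$, briefly PBN, is the function $\beta_\varphi$ defined on the set of pairs $(u,v)\in\mathbb{R}^2\times\mathbb{R}^2$ with $u_1<v_1$ and $u_2<v_2$ by

$$\beta_\varphi(u,v):=\dim H_k^{(u,v)}(X,\varphi).$$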

We now discuss this definition for the function $\varphi=(\varphi_1,\varphi_2)$ considered above, referring the reader to Figure 5 for a pictorial representation.

Figure 5. Correspondence between an admissible line $r_{(a,b)}$ and the persistence diagram $\mathrm{Dgm}(\varphi^*_{(a,b)})$. Left: a 1D filtration is constructed by sweeping the line $r_{(a,b)}$. The vector $(a,1-a)$ and the point $(b,-b)$ are used to parameterize this line as $t\,(a,1-a)+(b,-b)$, $t\in\mathbb{R}$. Right: the persistence diagram of the 1D filtration can be found on a planar section of the domain of the 2D persistent Betti numbers function $\beta_\varphi$.

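Let us consider the set of all lines of $\mathbb{R}^2$ that have positive slope. This set can be parameterized by the set $\Lambda^+:=\,]0,1[\,\times\mathbb{R}$. Each line $r$ is taken to the unique pair $(a,b)$ with $a\in\,]0,1[$ and $b\in\mathbb{R}$ such that $(a,1-a)$ is a direction vector of $r$ and $(b,-b)\in r$. The line will be denoted by $r_{(a,b)}$, and $\Lambda^+$ is referred to as the set of admissible lines. Each point $t\,(a,1-a)+(b,-b)$ of $r_{(a,b)}$ can be associated with the subset $X^{(a,b)}_t:=\{x\in X:\varphi_1(x)\le ta+b,\ \varphi_2(x)\le t(1-a)-b\}$. This is the set of the points of $X$ whose image by $\varphi$ lies below and to the left of $t\,(a,1-a)+(b,-b)$, as this point moves along the line $r_{(a,b)}$. As a consequence, each admissible line $r_{(a,b)}$ defines a filtration $\{X^{(a,b)}_t\}_{t\in\mathbb{R}}$ of $X$ and a persistence diagram associated with this filtration. The family of the persistence diagrams associated with the lines $r_{(a,b)}$, $(a,b)\in\Lambda^+$, is called the 2D persistence diagram of $\varphi$.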

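It is interesting to observe that the filtration defined by the line $r_{(a,b)}$ can also be defined as the sublevel set filtration induced by a suitable real-valued function. In fact, we have that $\{x\in X:\varphi_1(x)\le ta+b,\ \varphi_2(x)\le t(1-a)-b\}=\{x\in X:\varphi^*_{(a,b)}(x)\le t\}$, where $\varphi^*_{(a,b)}:X\to\mathbb{R}$ is defined by setting $\varphi^*_{(a,b)}(x):=\max\left\{\frac{\varphi_1(x)-b}{a},\frac{\varphi_2(x)+b}{1-a}\right\}$. The Reduction Theorem proved in [8] states that the persistent Betti numbers function $\beta_\varphi$ can be completely recovered from the persistent Betti numbers functions associated with the admissible lines $r_{(a,b)}$, and only from these. These functions are in turn encoded in the corresponding persistence diagrams $\mathrm{Dgm}(\varphi^*_{(a,b)})$.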

In some sense, the study of persistent homology for $\mathbb{R}^2$-valued functions can be seen as the study of the persistent homology groups associated with the filtrations defined by the lines $r_{(a,b)}$, with $(a,b)$ varying in $\Lambda^+$. It is natural to wonder which pairs $(a,b)$ are more relevant for the topological comparison of two functions $\varphi,\psi:X\to\mathbb{R}^2$. This paper is mainly devoted to underlining the particular importance of the pairs $(a,b)$ with $a=1/2$. We start from the following results, which provide an alternative, yet equivalent, formulation of the $\|\cdot\|_\infty$-distance between $\varphi$ and $\psi$:

Lemma 2.2.

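Let $\varphi,\psi:X\to\mathbb{R}^2$ be continuous, and let $\psi^*_{(a,b)}$ be defined from $\psi$ as $\varphi^*_{(a,b)}$ is defined from $\varphi$. For every $(a,b)\in\Lambda^+$ set $\varphi_{(a,b)}:=\min\{a,1-a\}\cdot\varphi^*_{(a,b)}$ and $\psi_{(a,b)}:=\min\{a,1-a\}\cdot\psi^*_{(a,b)}$. Then $\|\varphi_{(a,b)}-\psi_{(a,b)}\|_\infty\le\|\varphi-\psi\|_\infty$.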

Proof.

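For every $x\in X$ and every $(a,b)\in\Lambda^+$, we have

$$|\varphi_{(a,b)}(x)-\psi_{(a,b)}(x)|\le\min\{a,1-a\}\cdot\max\left\{\frac{|\varphi_1(x)-\psi_1(x)|}{a},\frac{|\varphi_2(x)-\psi_2(x)|}{1-a}\right\}\le\max\{|\varphi_1(x)-\psi_1(x)|,|\varphi_2(x)-\psi_2(x)|\}\le\|\varphi-\psi\|_\infty.$$

The first inequality follows from $|\max\{A,B\}-\max\{C,D\}|\le\max\{|A-C|,|B-D|\}$. The second follows from $\min\{a,1-a\}\le a$ and $\min\{a,1-a\}\le 1-a$.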
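The inequality can also be checked numerically. The following Python sketch replaces $X$ with finitely many random sample points and takes $\psi$ to be a random perturbation of $\varphi$; both choices are assumptions made only for illustration. It returns the largest observed ratio $\|\varphi_{(a,b)}-\psi_{(a,b)}\|_\infty/\|\varphi-\psi\|_\infty$ over random pairs $(a,b)$. By Lemma 2.2 this ratio never exceeds 1, up to rounding. A check of this kind is a sanity test, not a proof, and the function names are hypothetical.

```python
import random


def phi_ab(p1, p2, a, b):
    """Normalized value min{a, 1-a} * max{(p1 - b)/a, (p2 + b)/(1 - a)}."""
    return min(a, 1.0 - a) * max((p1 - b) / a, (p2 + b) / (1.0 - a))


def lemma_2_2_ratio(n_points=50, n_pairs=200, seed=0):
    """Largest observed ||phi_ab - psi_ab||_inf / ||phi - psi||_inf on random data."""
    rng = random.Random(seed)
    phi = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(n_points)]
    psi = [(p1 + rng.uniform(-1, 1), p2 + rng.uniform(-1, 1)) for p1, p2 in phi]
    # max-norm distance between the two R^2-valued functions on the sample
    sup_norm = max(max(abs(p1 - q1), abs(p2 - q2))
                   for (p1, p2), (q1, q2) in zip(phi, psi))
    worst = 0.0
    for _ in range(n_pairs):
        a, b = rng.uniform(0.01, 0.99), rng.uniform(-10, 10)
        diff = max(abs(phi_ab(*p, a, b) - phi_ab(*q, a, b)) for p, q in zip(phi, psi))
        worst = max(worst, diff / sup_norm)
    return worst
```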

Proposition 2.3.

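Let $\varphi,\psi:X\to\mathbb{R}^2$ be two continuous functions. Then

$$\|\varphi-\psi\|_\infty=\max_{b\in\mathbb{R}}\|\varphi_{(1/2,b)}-\psi_{(1/2,b)}\|_\infty.$$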

Proof.

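By Lemma 2.2, we know that if $a=1/2$ then $\|\varphi_{(1/2,b)}-\psi_{(1/2,b)}\|_\infty\le\|\varphi-\psi\|_\infty$ for every $b\in\mathbb{R}$. Therefore we have that

$$\sup_{b\in\mathbb{R}}\|\varphi_{(1/2,b)}-\psi_{(1/2,b)}\|_\infty\le\|\varphi-\psi\|_\infty.$$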

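Let us take a point $\bar x\in X$ such that $\|\varphi-\psi\|_\infty=\max\{|\varphi_1(\bar x)-\psi_1(\bar x)|,|\varphi_2(\bar x)-\psi_2(\bar x)|\}$; such a point exists because $X$ is compact. We can assume that $\|\varphi-\psi\|_\infty=\varphi_1(\bar x)-\psi_1(\bar x)$. The other cases are analogous, up to exchanging the roles of $\varphi$ and $\psi$ or of the two components. If $b\le\frac{\varphi_1(\bar x)-\varphi_2(\bar x)}{2}$ then $\varphi_1(\bar x)-b\ge\varphi_2(\bar x)+b$, so that $\varphi_{(1/2,b)}(\bar x)=\max\{\varphi_1(\bar x)-b,\varphi_2(\bar x)+b\}=\varphi_1(\bar x)-b$.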

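Furthermore, if we also assume that $b\le\frac{\psi_1(\bar x)-\psi_2(\bar x)}{2}$, then $\psi_1(\bar x)-b\ge\psi_2(\bar x)+b$ and $\psi_{(1/2,b)}(\bar x)=\psi_1(\bar x)-b$. It follows that, for every $b\le\min\left\{\frac{\varphi_1(\bar x)-\varphi_2(\bar x)}{2},\frac{\psi_1(\bar x)-\psi_2(\bar x)}{2}\right\}$,

$$\|\varphi_{(1/2,b)}-\psi_{(1/2,b)}\|_\infty\ge\varphi_{(1/2,b)}(\bar x)-\psi_{(1/2,b)}(\bar x)=\varphi_1(\bar x)-\psi_1(\bar x)=\|\varphi-\psi\|_\infty.$$

Together with the previous inequality, this shows that the supremum is attained and equals $\|\varphi-\psi\|_\infty$.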
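The proof is constructive: it exhibits a value of $b$ at which the maximum is attained. The following Python sketch carries out this construction on a finite sample of $X$, which is an assumption made only for illustration; the function names are hypothetical. It also covers the case in which $\|\varphi-\psi\|_\infty$ is realized by the second component. In that case $b$ must be taken large instead of small.

```python
def half_max(p, b):
    """phi_(1/2,b) at a point x with phi(x) = p = (p1, p2): max{p1 - b, p2 + b}."""
    return max(p[0] - b, p[1] + b)


def prop_2_3_witness(phi, psi):
    """Follow the proof of Proposition 2.3 on finitely many sample points.

    phi, psi: lists of pairs (first component, second component), one per point.
    Returns (sup_norm, b, gap) with gap = |phi_(1/2,b)(xbar) - psi_(1/2,b)(xbar)|.
    """
    sup_norm, xbar, comp = -1.0, None, None
    for i, (p, q) in enumerate(zip(phi, psi)):
        for c in (0, 1):
            if abs(p[c] - q[c]) > sup_norm:
                sup_norm, xbar, comp = abs(p[c] - q[c]), i, c
    p, q = phi[xbar], psi[xbar]
    h_phi, h_psi = (p[0] - p[1]) / 2, (q[0] - q[1]) / 2
    # comp == 0: b below both half-differences, so p1 - b and q1 - b win the maxima;
    # comp == 1: b above both half-differences, so p2 + b and q2 + b win instead.
    b = min(h_phi, h_psi) if comp == 0 else max(h_phi, h_psi)
    return sup_norm, b, abs(half_max(p, b) - half_max(q, b))
```

In both branches the returned `gap` equals `sup_norm` (up to rounding), as the proof predicts.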

2.0.1. 2-dimensional matching distance

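Assume now that we have two continuous functions $\varphi,\psi:X\to\mathbb{R}^2$. We consider the persistence diagrams $\mathrm{Dgm}(\varphi^*_{(a,b)})$, $\mathrm{Dgm}(\psi^*_{(a,b)})$ associated with the admissible line $r_{(a,b)}$, and normalize them by multiplying their points by $\min\{a,1-a\}$. This is equivalent to considering the normalized persistence diagrams $\mathrm{Dgm}(\varphi_{(a,b)})$, $\mathrm{Dgm}(\psi_{(a,b)})$, with $\varphi_{(a,b)}:=\min\{a,1-a\}\cdot\varphi^*_{(a,b)}$ and $\psi_{(a,b)}:=\min\{a,1-a\}\cdot\psi^*_{(a,b)}$, respectively. The 2-dimensional matching distance [3] is then defined as

$$D_{match}(\varphi,\psi):=\sup_{(a,b)\in\Lambda^+}d_B\big(\mathrm{Dgm}(\varphi_{(a,b)}),\mathrm{Dgm}(\psi_{(a,b)})\big),$$

with $d_B$ denoting the bottleneck distance between the normalized persistence diagrams $\mathrm{Dgm}(\varphi_{(a,b)})$ and $\mathrm{Dgm}(\psi_{(a,b)})$.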
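Since $D_{match}$ is a supremum, evaluating the bottleneck distance on any finite set of pairs $(a,b)$ gives a lower bound. The following Python sketch is deliberately naive and is not the algorithm of [3, 11]. It assumes degree $0$ and models $X$ as a path graph whose vertices carry the values of $\varphi$ and $\psi$, with each edge entering the filtration at the larger endpoint value. It computes bottleneck distances by exhaustive search, so it is only usable on tiny inputs. All names are hypothetical.

```python
import itertools
import math


def dgm0_path(values):
    """Degree-0 diagram of a sublevel filtration on a path graph (elder rule)."""
    n = len(values)
    parent = list(range(n))  # each root is the oldest vertex of its component

    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i

    alive, points = [False] * n, []
    for i in sorted(range(n), key=lambda k: values[k]):
        alive[i] = True
        for j in (i - 1, i + 1):
            if 0 <= j < n and alive[j]:
                ri, rj = find(i), find(j)
                if ri != rj:
                    young, old = (ri, rj) if values[ri] > values[rj] else (rj, ri)
                    if values[young] < values[i]:
                        points.append((values[young], values[i]))
                    parent[young] = old
    return points + [(min(values), math.inf)]


def dist(p, q):
    """Extended metric (1.1); None stands for the diagonal point Delta."""
    if p is None and q is None:
        return 0.0
    if p is None or q is None:
        u, v = q if p is None else p
        return (v - u) / 2
    (u1, v1), (u2, v2) = p, q
    dv = 0.0 if math.isinf(v1) and math.isinf(v2) else abs(v1 - v2)
    return min(max(abs(u1 - u2), dv), max((v1 - u1) / 2, (v2 - u2) / 2))


def bottleneck(d1, d2):
    """Brute-force bottleneck distance (factorial time, tiny diagrams only)."""
    a, b = list(d1) + [None] * len(d2), list(d2) + [None] * len(d1)
    return min(max((dist(a[i], b[j]) for i, j in enumerate(perm)), default=0.0)
               for perm in itertools.permutations(range(len(b))))


def normalized(values, a, b):
    """Pointwise phi_(a,b) = min{a, 1-a} * max{(phi1 - b)/a, (phi2 + b)/(1 - a)}."""
    return [min(a, 1 - a) * max((p1 - b) / a, (p2 + b) / (1 - a)) for p1, p2 in values]


def dmatch_grid(phi, psi, a_values, b_values):
    """Maximum of d_B over a finite grid of (a, b): a lower bound for D_match."""
    return max(bottleneck(dgm0_path(normalized(phi, a, b)), dgm0_path(normalized(psi, a, b)))
               for a in a_values for b in b_values)
```

In line with Corollary 2.5, the returned value should not exceed the largest difference $|\varphi_i-\psi_i|$ over the vertices. Adding pairs to the grid can only increase the returned value.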

Remark 2.4.

It is common in the literature (cf. [8]) to regard the 2-dimensional matching distance as a distance between two 2-dimensional persistent Betti numbers functions (or 2D persistence diagrams). In this paper, to simplify the exposition, it will be regarded as a pseudo-distance between the functions $\varphi$ and $\psi$ themselves, denoted by $D_{match}(\varphi,\psi)$. The same convention will apply to the coherent matching distance, which will be defined in Section 4.3.

By Lemma 2.2 and the Stability Theorem 1.6 the next result immediately follows.

Corollary 2.5.

$D_{match}(\varphi,\psi)\le\|\varphi-\psi\|_\infty$.

Remark 2.6.

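The introduction of normalized persistence diagrams in the definition of $D_{match}$ is crucial to obtain a stable pseudo-metric (cf. [8, Thm. 4.4]). Indeed, Lemma 2.2 and the Stability Theorem 1.6 imply that the bottleneck distance $d_B(\mathrm{Dgm}(\varphi_{(a,b)}),\mathrm{Dgm}(\psi_{(a,b)}))$ is less than or equal to $\|\varphi-\psi\|_\infty$. We stress that this is not true for the distance $d_B(\mathrm{Dgm}(\varphi^*_{(a,b)}),\mathrm{Dgm}(\psi^*_{(a,b)}))$, since the unnormalized functions carry the factor $1/\min\{a,1-a\}$, which is unbounded on $\Lambda^+$.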

2.0.2. Monodromy in 2-dimensional persistent homology

Each function $\varphi_{(a,b)}$ depends continuously on the parameters $a$ and $b$ with respect to the max-norm. By the Stability Theorem 1.6, the set of points in $\mathrm{Dgm}(\varphi_{(a,b)})$ therefore depends continuously on $a$ and $b$. Analogously, the set of points in $\mathrm{Dgm}(\psi_{(a,b)})$ depends continuously on $a$ and $b$. Suppose that $\sigma_{(a,b)}$ is an optimal matching, i.e. one of the matchings achieving the bottleneck distance $d_B(\mathrm{Dgm}(\varphi_{(a,b)}),\mathrm{Dgm}(\psi_{(a,b)}))$. Given the above arguments, a natural question arises: does $\sigma_{(a,b)}$ change continuously under variations of $a$ and $b$? In other words, we wonder whether it is possible to straightforwardly introduce a notion of coherence for optimal matchings with respect to the elements of $\Lambda^+$.

Perhaps surprisingly, the answer is no. A first obstruction arises when we try to continuously extend a matching $\sigma_{(a,b)}$. At an admissible pair $(a,b)$ for which either $\mathrm{Dgm}(\varphi_{(a,b)})$ or $\mathrm{Dgm}(\psi_{(a,b)})$ has points with multiplicity greater than 1, the identity of points in the (normalized) persistence diagrams is not preserved. In other words, we cannot follow the path of a point of a persistence diagram when it collides with another point of the same persistence diagram. On the one hand, this problem can be solved by fixing a degree $k$ and replacing $\Lambda^+$ with its subset $\Lambda^+_{\mathrm{reg}}(\varphi)\cap\Lambda^+_{\mathrm{reg}}(\psi)$. Here $\Lambda^+_{\mathrm{reg}}(\varphi)$ is the set of all points $(a,b)\in\Lambda^+$ such that in degree $k$ the persistence diagram $\mathrm{Dgm}(\varphi_{(a,b)})$ does not contain multiple points. Throughout the rest of the paper, we will use the following terminology:

  • singular pairs for $\varphi$ in degree $k$ are the pairs $(a,b)\in\Lambda^+\setminus\Lambda^+_{\mathrm{reg}}(\varphi)$;

  • regular pairs for $\varphi$ in degree $k$ are the pairs $(a,b)\in\Lambda^+_{\mathrm{reg}}(\varphi)$.

An analogous convention holds for the singular and regular pairs for $\psi$.
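Deciding whether a pair is regular only requires inspecting the persistence diagram at that pair. The following is a minimal Python sketch. It assumes that a diagram is given as a list of `(birth, death)` pairs repeated according to multiplicity, with $\Delta$ not listed. The second helper, which is hypothetical, returns the smallest max-norm distance between listed points. In floating-point computations this value can flag pairs that are numerically close to being singular.

```python
import itertools
import math
from collections import Counter


def has_multiple_points(dgm):
    """True if some point (proper or at infinity) has multiplicity > 1."""
    return any(m > 1 for m in Counter(dgm).values())


def separation(dgm):
    """Smallest max-norm distance between two listed points (inf if fewer than 2).

    Points at infinity are compared through their birth only (inf - inf taken as 0).
    """
    best = math.inf
    for (u1, v1), (u2, v2) in itertools.combinations(dgm, 2):
        dv = 0.0 if math.isinf(v1) and math.isinf(v2) else abs(v1 - v2)
        best = min(best, max(abs(u1 - u2), dv))
    return best
```

With this encoding, a pair $(a,b)$ is regular for $\varphi$ in the chosen degree exactly when `has_multiple_points` returns `False` for $\mathrm{Dgm}(\varphi_{(a,b)})$.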

On the other hand, however, continuously extending a matching $\sigma_{(a,b)}$ presents some problems even in this setting. Roughly, the process of extending $\sigma_{(a,b)}$ along a path $\gamma$ depends on the homotopy class of $\gamma$ relative to its endpoints. This phenomenon is referred to as monodromy in 2-dimensional persistent homology, and was studied for the first time in [9]. In the following we will show how to overcome this issue in order to define a coherent modification of the standard 2-dimensional matching distance $D_{match}$.

There are two different ways to alleviate the difficulty caused by the monodromy phenomenon in order to construct a coherent 2-dimensional matching distance. We can transport matchings by moving along paths in a covering of the parameter space, or we can define the transport of matchings along paths in the parameter space itself. In this paper we choose the latter approach.

3. The extended Pareto grid and its main properties

In order to proceed, we will assume that $X$ is a closed smooth manifold $M$ and that our filtering function $\varphi$ is sufficiently regular, in the sense described in this section. Unless otherwise stated, we will also assume that a degree $k$ has been fixed for the computation of persistence diagrams.

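Let $\varphi=(\varphi_1,\varphi_2)$ be a smooth map from a closed $C^\infty$-manifold $M$ of dimension $r\ge 2$ to the real plane $\mathbb{R}^2$. Choose a Riemannian metric on $M$, so that we can define the gradients $\nabla\varphi_1$ and $\nabla\varphi_2$. The Jacobi set $\mathbb{J}(\varphi)$ is the set of all points $p\in M$ at which the gradients of $\varphi_1$ and $\varphi_2$ are linearly dependent, namely $\nabla\varphi_1(p)=\lambda\nabla\varphi_2(p)$ or $\nabla\varphi_2(p)=\lambda\nabla\varphi_1(p)$ for some $\lambda\in\mathbb{R}$. In particular, if $\lambda\le 0$ the point $p$ is said to be a critical Pareto point for $\varphi$. The set of all critical Pareto points of $\varphi$ is denoted by $\mathbb{J}_P(\varphi)$ and is a subset of the Jacobi set $\mathbb{J}(\varphi)$. Obviously, $\mathbb{J}_P(\varphi)$ contains both the critical points of $\varphi_1$ and the critical points of $\varphi_2$ (take $\lambda=0$).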
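Numerically, membership in $\mathbb{J}(\varphi)$ and in $\mathbb{J}_P(\varphi)$ can be tested from the two gradients at a point. The Python sketch below assumes that the gradients are given as coordinate vectors in an orthonormal frame at $p$; the tolerance is a hypothetical choice. Linear dependence is detected through the Gram determinant $|\nabla\varphi_1|^2|\nabla\varphi_2|^2-(\nabla\varphi_1\cdot\nabla\varphi_2)^2$, which vanishes exactly when the gradients are dependent. A dependent pair is Pareto critical exactly when $\nabla\varphi_1\cdot\nabla\varphi_2\le 0$. This condition covers both a vanishing gradient ($\lambda=0$) and opposite directions ($\lambda<0$).

```python
def classify_point(g1, g2, tol=1e-12):
    """Classify p from g1 = grad phi1(p) and g2 = grad phi2(p).

    Returns "regular" (gradients independent), "jacobi" (p in the Jacobi set
    but not Pareto critical) or "pareto" (p is a critical Pareto point).
    """
    dot = sum(x * y for x, y in zip(g1, g2))
    n1 = sum(x * x for x in g1)
    n2 = sum(y * y for y in g2)
    gram = n1 * n2 - dot * dot  # >= 0 by Cauchy-Schwarz, 0 iff g1, g2 dependent
    if gram > tol * max(1.0, n1 * n2):
        return "regular"
    # dependent gradients: Pareto critical iff lambda <= 0, i.e. dot <= 0
    return "pareto" if dot <= tol else "jacobi"
```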

We assume that

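  • No point $p\in M$ exists such that both $\nabla\varphi_1(p)$ and $\nabla\varphi_2(p)$ vanish;

  • $\mathbb{J}(\varphi)$ is a smoothly embedded 1-manifold in $M$ consisting of finitely many components, each one diffeomorphic to a circle;

  • $\mathbb{J}_P(\varphi)$ is a 1-dimensional closed submanifold of $M$, with boundary in $\mathbb{J}(\varphi)$.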

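We consider the set $\mathbb{C}(\varphi)$ of cusp points of $\varphi$, that is, the points of $\mathbb{J}(\varphi)$ at which the restriction of $\varphi$ to $\mathbb{J}(\varphi)$ fails to be an immersion. In other words, $\mathbb{C}(\varphi)$ is the subset of $\mathbb{J}(\varphi)$ at which both $\nabla\varphi_1$ and $\nabla\varphi_2$ are orthogonal to $\mathbb{J}(\varphi)$.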

We also assume that

The connected components of $\mathbb{J}_P(\varphi)\setminus\mathbb{C}(\varphi)$ are finite in number, each one being diffeomorphic to an interval. With respect to any parameterization of each component, one of $\varphi_1$ and $\varphi_2$ is strictly increasing and the other is strictly decreasing. Each component can meet critical points of $\varphi_1$ and $\varphi_2$ only at its endpoints.

In [24] (see also [17]) it is proved that the previous properties are generic in the set of smooth maps from $M$ to $\mathbb{R}^2$.

The last assumption implies that the connected components of $\mathbb{J}_P(\varphi)\setminus\mathbb{C}(\varphi)$ are open, closed, or semi-open arcs in $M$. Following the notation used in [24], they will be referred to as critical intervals of $\varphi$. Suppose an endpoint of a critical interval actually belongs to that critical interval, and hence is not a cusp point. Then it is a critical point for either $\varphi_1$ or $\varphi_2$. We denote the critical intervals of $\varphi$ by $\alpha_1,\dots,\alpha_s$, and parameterize these arcs arbitrarily, that is, $\alpha_i:I_i\to M$, with $I_i$ equal to $]0,1[$, or $[0,1]$, or $]0,1]$, or $[0,1[$. Our assumptions also imply that both the set of critical points of $\varphi_1$ and the set of critical points of $\varphi_2$ are finite.

3.1. The extended Pareto grid

Our purpose is to establish a formal link between the position of the points of $\mathrm{Dgm}(\varphi_{(a,b)})$ for a function $\varphi$ and the intersections of the admissible line $r_{(a,b)}$ with a particular subset of the plane $\mathbb{R}^2$, called the extended Pareto grid of $\varphi$, which we define here.

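Let us list the critical points $h_1,\dots,h_p$ of $\varphi_1$ and the critical points $v_1,\dots,v_q$ of $\varphi_2$. Our first assumption guarantees that $\{h_1,\dots,h_p\}\cap\{v_1,\dots,v_q\}=\emptyset$. Consider the following closed half-lines:

  • for each critical point $h_i$ of $\varphi_1$, the half-line $\{(x,y)\in\mathbb{R}^2:x=\varphi_1(h_i),\ y\ge\varphi_2(h_i)\}$;

  • for each critical point $v_j$ of $\varphi_2$, the half-line $\{(x,y)\in\mathbb{R}^2:x\ge\varphi_1(v_j),\ y=\varphi_2(v_j)\}$.

The extended Pareto grid $\Gamma(\varphi)$ is defined to be the union of $\varphi(\mathbb{J}_P(\varphi))$ with these closed half-lines. The closures of the images of critical intervals of $\varphi$ will be called proper contours of $\varphi$ associated with those critical intervals of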