Persistent topology is the study of the properties of filtered topological spaces. From the very beginning, it has been applied to shape comparison Fr91 ; UrVe97 ; VeUr96 ; VeUrFr93 . In this context, data are frequently represented by continuous $\mathbb{R}^m$-valued functions defined on a topological space $X$. As simple examples among many others, these functions can describe the coloring of a 3D object, the coordinates of the points in a planar curve, or the grey-levels in an X-ray CT image. Each continuous function $\varphi: X\to\mathbb{R}^m$ is called a filtering function and naturally induces a (multi)filtration on $X$, made by the sublevel sets of $\varphi$. Persistent topology allows us to analyse the data represented by each filtering function by examining how much the topological properties of its sublevel sets “persist” when we go through the filtration. The main mathematical tool to perform this analysis is persistent homology EdHa08 . This theory describes the birth and death of $k$-dimensional holes when we move along the considered filtration of the space $X$. When the filtering function takes its values in $\mathbb{R}$ we can look at it as a time, and the distance between the birthdate and deathdate of a hole is defined to be its persistence. The more persistent a hole is, the more important it is for shape comparison, since holes with small persistence are usually due to noise.
An important property of classical persistent homology is that if a self-homeomorphism $g: X\to X$ is given, then the filtering functions $\varphi$ and $\varphi\circ g$ cannot be distinguished from each other by computing the persistent homology of the filtrations induced by $\varphi$ and $\varphi\circ g$. As pointed out in ReKoGu11 , this is a relevant issue in the applications where the functions $\varphi$ and $\varphi\circ g$ cannot be considered equivalent. This happens, e.g., when each filtering function describes a grey-level image, since the images respectively described by $\varphi$ and $\varphi\circ g$ may have completely different appearances. A simple instance of this problem is illustrated in Figure 1.
Therefore, a natural question arises: How can we adapt persistent homology in order to prevent invariance with respect to the group $\mathrm{Homeo}(X)$ of all self-homeomorphisms of the topological space $X$, maintaining just the invariance under the action of the self-homeomorphisms that belong to a proper subgroup $G$ of $\mathrm{Homeo}(X)$? For example, the comparison of the letters illustrated in Figure 1 should require just the invariance with respect to the group of similarities of $\mathbb{R}^2$, since they all are equivalent with respect to the group $\mathrm{Homeo}(\mathbb{R}^2)$. We point out that the depicted letters are constructed from thick lines and therefore have some width, in contrast to geometrical lines.
One could think of solving the previous problem by using other filtering functions, possibly defined on different topological spaces. For example, we could extract the boundaries of the letters in Figure 1 and consider the distance from the center of mass of each boundary as a new filtering function. This approach presents some drawbacks:
It “forgets” most of the information contained in the image $\varphi$ that we are considering, confining itself to examine the boundary of the letter represented by $\varphi$. If the boundary is computed by taking a single level of $\varphi$, this is also in contrast with the general spirit of persistent homology.
It usually requires an extra computational cost (e.g., to extract the boundaries of the letters in our previous example).
It can produce a different topological space for each new filtering function (e.g., the letters of the alphabet can have non-homeomorphic boundaries). Working with several topological spaces instead of just one can be a disadvantage.
It is not clear how we can translate the invariance that we need into the choice of new filtering functions defined on new topological spaces.
The purpose of this paper is to present a possible solution for the previously described problem. It is based on a dual approach to the invariance with respect to a subgroup $G$ of $\mathrm{Homeo}(X)$, and consists in changing the direct study of the group $G$ into the study of how the operators that are invariant under the action of $G$ act on classical persistent homology. This change of perspective reveals interesting mathematical properties, allowing us to treat $G$ as a variable in our applications. According to this method, the shape properties and the invariance group can be determined separately, depending on our task. The operators that we consider in this paper act on the space of admissible filtering functions and, in some sense, can be interpreted as the “glasses” we use to look at the data. Their use allows us to combine persistent homology and the invariance with respect to the group $G$, extending the range of application of classical persistent homology to the cases in which we are interested in $G$-invariance rather than in $\mathrm{Homeo}(X)$-invariance.
The idea of applying operators to filtering functions before computing persistent homology has already been considered in previous papers. For example, in ChEd11 convolutions have been used to get a bound for the norm of persistence diagrams of a diffusing function. Furthermore, in ReKoGu11 scale space persistence has been shown to be useful to detect critical points of a function by examining the evolution of their homological persistence values through the scale space. As for combining persistent homology and transformation groups, the problem of measuring the invariance of a signal with respect to a group of translations (i.e. the study of its periodicity or quasi-periodicity) has been studied in deSkVe12 ; PeHa* , using embedding operators. However, our approach requires us to consider just a particular kind of operators (i.e. non-expanding $G$-invariant operators on the set of admissible filtering functions), and faces the more general problem of adapting persistent homology to any group of self-homeomorphisms of a topological space.
For another approach to this problem, using quite a different method, we refer the reader to Fr12 .
1.1 Our main idea in a nutshell
After choosing a set $\Phi$ of admissible filtering functions from the topological space $X$ to $\mathbb{R}$, and a subgroup $G$ of $\mathrm{Homeo}(X)$, we consider the set $\mathcal{F}(\Phi,G)$ of all non-expanding $G$-invariant operators $F:\Phi\to\Phi$. Basically, our idea consists in comparing two functions $\varphi_1,\varphi_2\in\Phi$ by computing the supremum of the bottleneck distances between the classical persistence diagrams of the filtering functions $F(\varphi_1)$ and $F(\varphi_2)$, varying $F$ in $\mathcal{F}(\Phi,G)$. In our paper we prove that this approach is well-defined, $G$-invariant, stable and computable (under suitable assumptions).
1.2 Outline of the paper
Our paper is organized as follows. In Section 2 we introduce some concepts that will be used in the paper and recall some basic facts about persistent homology. In Section 3 we prove our main results concerning the theoretical properties of our method (Theorems 14, 15 and 16). In Section 4 we illustrate the application of our technique to an experiment concerning 1D-signals. In Section 5 a possible application to image retrieval is outlined. A short discussion concludes the paper.
2 Mathematical setting
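Let us consider a (non-empty) triangulable metric space $X$ with nontrivial homology in degree $k$. This last assumption is always satisfied for $k=0$ and unrestrictive for $k\ge 1$, since we can embed $X$ in a larger triangulable space $Y$ with nontrivial homology in degree $k$, and substitute $X$ with $Y$. Let $C^0(X,\mathbb{R})$ be the set of all continuous functions from $X$ to $\mathbb{R}$, endowed with the topology induced by the sup-norm $\|\cdot\|_\infty$. Let $\Phi$ be a topological subspace of $C^0(X,\mathbb{R})$, containing at least the set of all constant functions. The functions in the topological space $\Phi$ will be called admissible filtering functions on $X$.
We assume that a subgroup $G$ of the group $\mathrm{Homeo}(X)$ of all homeomorphisms from $X$ onto $X$ is given, acting on the set $\Phi$ by composition on the right (i.e., the action of $g\in G$ takes each function $\varphi\in\Phi$ to the function $\varphi\circ g\in\Phi$). We do not require $G$ to be a proper subgroup of $\mathrm{Homeo}(X)$, so the equality $G=\mathrm{Homeo}(X)$ can possibly hold. It is easy to check that $\mathrm{Homeo}(X)$ is a topological group with respect to the topology of uniform convergence. Indeed, we can check that if two sequences $(f_i)$, $(g_i)$ converge to $f$ and $g$ in $\mathrm{Homeo}(X)$, respectively, then the sequence $(g_i\circ f_i)$ converges to $g\circ f$ in $\mathrm{Homeo}(X)$. Furthermore, if a sequence $(g_i)$ converges to $g$ in $\mathrm{Homeo}(X)$, then the sequence $(g_i^{-1})$ converges to $g^{-1}$ in $\mathrm{Homeo}(X)$.
We also notice that if two sequences $(\varphi_r)$, $(g_r)$ in $\Phi$ and $G$ are given, converging to $\varphi$ in $\Phi$ and to $g$ in $G$, respectively, we have that $\|\varphi\circ g-\varphi_r\circ g_r\|_\infty\le\|\varphi\circ g-\varphi\circ g_r\|_\infty+\|\varphi\circ g_r-\varphi_r\circ g_r\|_\infty$. Since $(g_r)$ converges uniformly to $g$ in $G$ and $\varphi$ is uniformly continuous on the compact space $X$, $\lim_{r\to\infty}\|\varphi\circ g-\varphi\circ g_r\|_\infty=0$. Moreover, $\|\varphi\circ g_r-\varphi_r\circ g_r\|_\infty=\|\varphi-\varphi_r\|_\infty$, due to the invariance of the sup-norm under composition of the function inside the norm with homeomorphisms. Since $(\varphi_r)$ converges uniformly to $\varphi$ in $\Phi$, $\lim_{r\to\infty}\|\varphi-\varphi_r\|_\infty=0$. Hence $\lim_{r\to\infty}\|\varphi\circ g-\varphi_r\circ g_r\|_\infty=0$ and $\lim_{r\to\infty}\varphi_r\circ g_r=\varphi\circ g$.
Therefore, the right action of $G$ on the set $\Phi$ is continuous.
If $S$ is a subset of $\mathrm{Homeo}(X)$, the set $\{\varphi\circ s: \varphi\in\Phi, s\in S\}$ will be denoted by the symbol $\Phi\circ S$. Obviously, $\Phi\circ G=\Phi$.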
The pseudo-distance $d_G:\Phi\times\Phi\to\mathbb{R}$ is defined by setting
$$d_G(\varphi_1,\varphi_2):=\inf_{g\in G}\max_{x\in X}\left|\varphi_1(x)-\varphi_2(g(x))\right|.$$
It is called the ($1$-dimensional) natural pseudo-distance associated with the group $G$ acting on $\Phi$.
The term “$1$-dimensional” refers to the fact that the filtering functions are real-valued. The concepts considered in this paper can be easily extended to the case of $\mathbb{R}^m$-valued filtering functions, by substituting the absolute value in $\mathbb{R}$ with the max-norm in $\mathbb{R}^m$. However, the use of $\mathbb{R}^m$-valued filtering functions would require the introduction of a technical machinery that is beyond the purposes of our research (cf., e.g., CeDFFe13 ), in order to adapt the bottleneck distance to the new setting. Therefore, for the sake of simplicity, in this paper we will just consider the $1$-dimensional case.
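The definition of the natural pseudo-distance can be made concrete with a small numerical sketch. It assumes a setting that is not taken from the paper: the space is a circle sampled at `n` equally spaced points, and the group is the finite group of rotations by whole grid steps, so that the infimum becomes a minimum over `n` shifts. The names `sup_dist`, `rotate` and `natural_pseudo_distance` are hypothetical helpers introduced only for this illustration.

```python
import math

def sup_dist(f, h):
    """Sup-norm distance between two functions sampled on the same grid."""
    return max(abs(a - b) for a, b in zip(f, h))

def rotate(f, s):
    """Samples of f∘g_s, where g_s rotates the sampled circle by s grid steps."""
    n = len(f)
    return [f[(i + s) % n] for i in range(n)]

def natural_pseudo_distance(f1, f2):
    """min over all grid rotations g of max_x |f1(x) - f2(g(x))|."""
    return min(sup_dist(f1, rotate(f2, s)) for s in range(len(f2)))

n = 360  # illustrative grid size (an assumption of this sketch)
phi = [math.sin(2 * math.pi * i / n) for i in range(n)]
psi = rotate(phi, n // 4)          # psi = phi∘g for a quarter turn g
chi = [0.5 * v for v in phi]       # a function outside the orbit of phi

d_orbit = natural_pseudo_distance(phi, psi)   # 0.0: phi and psi lie in one orbit
d_sup = sup_dist(phi, psi)                    # about 1.414, the plain sup-norm distance
d_other = natural_pseudo_distance(phi, chi)   # about 0.5
```

In this toy setting the pseudo-distance vanishes on functions that differ by a rotation, while the sup-norm distance between the same two functions stays large.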
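We observe that the max-norm distance $d_\infty$ on $\Phi$, defined by setting $d_\infty(\varphi_1,\varphi_2):=\max_{x\in X}|\varphi_1(x)-\varphi_2(x)|$, is just the natural pseudo-distance $d_G$ in the case that $G$ is the trivial group, containing only the identity homeomorphism and acting on $\Phi$. Moreover, the definition of $d_G$ immediately implies that if $G_1$ and $G_2$ are subgroups of $\mathrm{Homeo}(X)$ acting on $\Phi$ and $G_1\subseteq G_2$, then $d_{G_2}(\varphi_1,\varphi_2)\le d_{G_1}(\varphi_1,\varphi_2)$ for every $\varphi_1,\varphi_2\in\Phi$. As a consequence, the following double inequality holds, for every subgroup $G$ of $\mathrm{Homeo}(X)$ and every $\varphi_1,\varphi_2\in\Phi$ (see also Theorem 5.2 in CeDFFe13 ):
$$d_{\mathrm{Homeo}(X)}(\varphi_1,\varphi_2)\le d_G(\varphi_1,\varphi_2)\le d_\infty(\varphi_1,\varphi_2).$$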
The proof that $d_G$ is a pseudo-metric does use the assumption that $G$ is a group, and we can give a simple example of a subset $H$ of $\mathrm{Homeo}(X)$ for which the function $d_H$ is not a pseudo-distance on $\Phi$. In order to do that, let us set , and consider the set $H$ containing just the identity and the counterclockwise rotation of radians. Obviously, $H$ is a subset, but not a subgroup, of $\mathrm{Homeo}(X)$. We have that (because ) and (because ), but
Therefore the triangle inequality does not hold, so that $d_H$ is not a pseudo-distance on $\Phi$.
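The failure of the triangle inequality for a subset that is not a subgroup can be reproduced numerically. The sketch below is an illustrative instance chosen for this purpose, not necessarily the one used in the text: the circle is sampled at `n` points, the subset `H` contains the identity and a quarter turn, and the three functions are a sine wave and two of its rotated copies. All names are hypothetical.

```python
import math

n = 360  # grid size; a quarter turn is a shift by n // 4 (assumption of this sketch)

def shift(f, s):
    """Samples of f∘g_s for the rotation g_s by s grid steps."""
    return [f[(i + s) % n] for i in range(n)]

def sup_dist(f, h):
    return max(abs(a - b) for a, b in zip(f, h))

def d_subset(f1, f2, shifts):
    """min over the allowed shifts s of max_i |f1[i] - f2[i + s]|."""
    return min(sup_dist(f1, shift(f2, s)) for s in shifts)

H = [0, n // 4]                       # identity and quarter turn: not a group
G = [0, n // 4, n // 2, 3 * n // 4]   # the group generated by the quarter turn

phi1 = [math.sin(2 * math.pi * i / n) for i in range(n)]
phi2 = shift(phi1, -(n // 4))   # chosen so that phi2∘rho = phi1
phi3 = shift(phi2, -(n // 4))   # chosen so that phi3∘rho = phi2

d12 = d_subset(phi1, phi2, H)   # 0.0
d23 = d_subset(phi2, phi3, H)   # 0.0
d13 = d_subset(phi1, phi3, H)   # about 1.414 > d12 + d23
d13_group = d_subset(phi1, phi3, G)   # 0.0 once the subset is closed into a group
```

With `H` the value `d13` exceeds `d12 + d23`, while replacing `H` by the generated group `G` restores the triangle inequality on this triple.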
The rationale for using the natural pseudo-distance is that pattern recognition is usually based on comparing properties that are described by functions defined on a topological space. These properties are often the only accessible data, implying that every discrimination should be based on them. The fundamental assumption is that two objects cannot be distinguished if they share the same properties with respect to a given observer (cf. BiDFFa08 ).
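In order to proceed, we consider the set $\mathcal{F}(\Phi,G)$ of all operators $F$ that verify the following properties:
1. $F$ is a function from $\Phi$ to $\Phi$;
2. $F(\varphi\circ g)=F(\varphi)\circ g$ for every $\varphi\in\Phi$ and every $g\in G$;
3. $\|F(\varphi_1)-F(\varphi_2)\|_\infty\le\|\varphi_1-\varphi_2\|_\infty$ for every $\varphi_1,\varphi_2\in\Phi$ (i.e. $F$ is non-expansive).
Obviously, $\mathcal{F}(\Phi,G)$ is not empty, since it contains at least the identity operator.
Properties 1 and 2 show that $F$ is a $G$-operator, referring to the right action of $G$ on $\Phi$.
The operators that we are considering are not required to be linear. However, due to the non-expansivity property, the operators in $\mathcal{F}(\Phi,G)$ are $1$-Lipschitz and hence are continuous.
In this paper, we shall say that a pseudo-metric $\hat{d}$ on $\Phi$ is strongly $G$-invariant if it is invariant under the action of $G$ with respect to each variable, i.e., if $\hat{d}(\varphi_1,\varphi_2)=\hat{d}(\varphi_1\circ g,\varphi_2)=\hat{d}(\varphi_1,\varphi_2\circ g)=\hat{d}(\varphi_1\circ g,\varphi_2\circ g)$ for every $\varphi_1,\varphi_2\in\Phi$ and every $g\in G$.
It is easily seen that the natural pseudo-distance $d_G$ is strongly $G$-invariant.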
Take $X=S^1$, $G$ equal to the group of all rotations of $S^1$, and $\Phi$ equal to the set of all continuous functions from $S^1$ to $\mathbb{R}$. As an example of an operator in $\mathcal{F}(\Phi,G)$ we can consider the operator defined by setting for every $\varphi\in\Phi$ and every $x\in S^1$, where denotes the point obtained from $x$ by rotating $S^1$ by a fixed angle . It is easy to check that is a non-expansive $G$-invariant (linear) operator defined on $\Phi$. An example of a non-expansive $G$-invariant non-linear operator defined on $\Phi$ is given by the operator defined by setting for every $\varphi\in\Phi$ and every $x\in S^1$.
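Membership in the set of non-expansive invariant operators can be tested numerically on sampled functions. The sketch below uses two hypothetical operators chosen for illustration (they are assumptions of this sketch, not a transcription of the operators in the example): the average of a function with a rotated copy of itself, and the addition of the constant 1. The circle is sampled at `n` points and the group is replaced by the finite group of grid rotations.

```python
import math

n = 360  # illustrative grid size

def shift(f, s):
    """Samples of f∘g_s for the rotation g_s by s grid steps."""
    return [f[(i + s) % n] for i in range(n)]

def sup_dist(f, h):
    return max(abs(a - b) for a, b in zip(f, h))

def average_with_rotation(f, a=45):
    """Hypothetical linear operator: mean of f and of f rotated by a grid steps."""
    return [0.5 * (f[i] + f[(i + a) % n]) for i in range(n)]

def add_one(f):
    """Hypothetical non-linear (affine) operator: f + 1."""
    return [v + 1.0 for v in f]

phi1 = [math.sin(2 * math.pi * i / n) for i in range(n)]
phi2 = [math.cos(4 * math.pi * i / n) for i in range(n)]

results = {}
for F in (average_with_rotation, add_one):
    # Property 2: F(phi∘g) must equal F(phi)∘g for every grid rotation g.
    equivariant = all(
        sup_dist(F(shift(phi1, s)), shift(F(phi1), s)) == 0.0 for s in range(n)
    )
    # Property 3, checked on one pair only (a small tolerance absorbs rounding).
    non_expansive = sup_dist(F(phi1), F(phi2)) <= sup_dist(phi1, phi2) + 1e-12
    results[F.__name__] = (equivariant, non_expansive)
```

A check on a single pair of functions can only refute non-expansivity, never prove it; here it serves as a sanity test of the definitions.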
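This simple statement holds (the symbol $\mathbf{0}$ denotes the function taking the value $0$ everywhere):
$\|F(\varphi)\|_\infty\le\|\varphi\|_\infty+\|F(\mathbf{0})\|_\infty$ for every $F\in\mathcal{F}(\Phi,G)$ and every $\varphi\in\Phi$.
$\|F(\varphi)\|_\infty=\|F(\varphi)-F(\mathbf{0})+F(\mathbf{0})\|_\infty\le\|F(\varphi)-F(\mathbf{0})\|_\infty+\|F(\mathbf{0})\|_\infty\le\|\varphi-\mathbf{0}\|_\infty+\|F(\mathbf{0})\|_\infty=\|\varphi\|_\infty+\|F(\mathbf{0})\|_\infty$, since $F$ is non-expansive.
If $\mathcal{F}$ is a subset of $\mathcal{F}(\Phi,G)$ and $\Phi$ is bounded with respect to $d_\infty$, then we can consider the function
$$d_{\mathcal{F}}(F_1,F_2):=\sup_{\varphi\in\Phi}\|F_1(\varphi)-F_2(\varphi)\|_\infty$$ from $\mathcal{F}\times\mathcal{F}$ to $\mathbb{R}$.
If $\mathcal{F}$ is a non-empty subset of $\mathcal{F}(\Phi,G)$ and $\Phi$ is bounded then the function $d_{\mathcal{F}}$ is a distance on $\mathcal{F}$.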
See Appendix B.
The $\sup$ in the definition of $d_{\mathcal{F}}$ cannot be replaced with $\max$. As an example, consider the case , , equal to the group containing just the identity and the homeomorphism taking each point to , equal to the constant function taking everywhere the value , and equal to the constant function taking everywhere the value . Both and are non-expansive $G$-operators. We have that , but no function exists, such that . To prove this, we firstly observe that
for any .
Let us consider a sequence of continuous functions , such that and . We have that
so that . Hence .
In order to have , the equality should hold. This is clearly impossible, hence no function exists, such that .
2.1 Persistent homology
Before proceeding, we recall some basic definitions and facts in persistent homology. For a more detailed and formal treatment, we refer the interested reader to EdHa08 ; BiDFFa08 ; CaZo09 ; ChCo*09 . Roughly speaking, persistent homology describes the changes of the homology groups of the sub-level sets $X_t=\varphi^{-1}((-\infty,t])$ varying $t$ in $\mathbb{R}$, where $\varphi$ is a real-valued continuous function defined on a topological space $X$. The parameter $t$ can be seen as an increasing time, whose change produces the birth and death of $k$-dimensional holes in the sub-level set $X_t$. For $k=0,1,2$, the expression “$k$-dimensional holes” refers to connected components, tunnels and voids, respectively. The distance between the birthdate and deathdate of a hole is defined to be its persistence. The more persistent a hole is, the more important it is for shape comparison, since holes with small persistence are usually due to noise.
Persistent homology can be introduced in several different settings, including the one of simplicial complexes and simplicial homology, and the one of topological spaces and singular homology. As for the link between the discrete and the topological settings, we refer the interested reader to CaEtFr13 ; DFFr13 . In this paper we will consider the topological setting and the singular homology functor $H$. An elementary introduction to singular homology can be found in Ha02 .
The concept of persistence can be formalized by the definition of persistent homology group with respect to the function $\varphi$:
If $u,v\in\mathbb{R}$ and $u<v$, we can consider the inclusion $i$ of the sub-level set $X_u=\varphi^{-1}((-\infty,u])$ into $X_v=\varphi^{-1}((-\infty,v])$. Such an inclusion induces a homomorphism $i_k:H_k(X_u)\to H_k(X_v)$ between the homology groups of $X_u$ and $X_v$ in degree $k$. The group $i_k(H_k(X_u))$ is called the $k$-th persistent homology group with respect to the function $\varphi$, computed at the point $(u,v)$. The rank $\rho_k(\varphi)(u,v)$ of this group is called the $k$-th persistent Betti number function with respect to the function $\varphi$, computed at the point $(u,v)$.
It is easy to check that the persistent homology groups (and hence also the persistent Betti number functions) are invariant under the action of $\mathrm{Homeo}(X)$. For further discussion see Appendix A.
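The invariance stated above can be observed on a discrete stand-in for the topological setting. The sketch below is an assumption-laden illustration, not the paper's construction: a function on the circle is replaced by its samples on a cycle graph, and degree-0 sublevel-set persistence is computed with a union-find structure and the usual elder rule. Rotations and the reflection of the cycle play the role of homeomorphisms. The function name `persistence0_cycle` is hypothetical.

```python
def persistence0_cycle(values):
    """Degree-0 sublevel-set persistence of samples on a cycle graph.

    Returns the sorted list of (birth, death) pairs with positive persistence
    and the birth value of the single component that never dies.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    parent = {}   # union-find forest over the vertices added so far
    birth = {}    # root vertex -> birth value of its component

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    pairs = []
    for i in order:
        parent[i] = i
        birth[i] = values[i]
        for j in ((i - 1) % n, (i + 1) % n):
            if j in parent:
                ri, rj = find(i), find(j)
                if ri != rj:
                    # Elder rule: the component born later dies at this value.
                    young, old = (ri, rj) if birth[ri] > birth[rj] else (rj, ri)
                    if values[i] > birth[young]:
                        pairs.append((birth[young], values[i]))
                    parent[young] = old
    return sorted(pairs), values[order[0]]

signal = [0, 3, 1, 4, 2, 5]
rotated = signal[2:] + signal[:2]   # signal composed with a rotation of the cycle
mirrored = signal[::-1]             # signal composed with a reflection

base = persistence0_cycle(signal)   # ([(1, 3), (2, 4)], 0)
```

The three sampled functions differ pointwise, yet `persistence0_cycle` returns the same pairs for all of them, which is the discrete counterpart of the invariance of persistent Betti number functions.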
A classical way to describe persistent Betti number functions (up to subsets of measure zero of their domain) is given by persistence diagrams. Another equivalent description is given by barcodes (cf. CaZo09 ). The $k$-th persistence diagram is the set of all pairs $(b_j,d_j)$, where $b_j$ and $d_j$ are the birthdate and the deathdate of the $j$-th $k$-dimensional hole, respectively. When a hole never dies, we set its deathdate equal to $\infty$. For technical reasons, the points $(t,t)$ are added to each persistence diagram. Two persistence diagrams $D_1,D_2$ can be compared by computing the smallest possible value, over all matchings between their points, of the maximum movement that is necessary to change $D_1$ into $D_2$, measured with respect to the maximum norm. This metric naturally induces a pseudo-metric $d_{\mathrm{match}}$ on the sets of the persistent Betti number functions. We recall that a pseudo-metric is just a metric without the property assuring that if two points have a null distance then they must coincide. For a formal definition of persistence diagram and of the distance (named bottleneck distance) that is used to compare persistence diagrams, we refer the reader to EdHa08 . For more details about the existence of pairs of different persistent Betti number functions that are associated with the same persistence diagram, we refer the interested reader to CeDFFe13 .
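For very small diagrams the bottleneck distance can be computed by exhaustive search, which makes the informal description above concrete. The sketch below is a brute-force illustration under stated assumptions: only off-diagonal points with finite deathdate are handled, each diagram is padded with copies of the diagonal, and all bijections are enumerated, so the cost grows factorially and the code is not meant for real data. The names `_cost` and `bottleneck` are hypothetical.

```python
from itertools import permutations
import math

def _cost(p, q):
    """Max-norm cost of matching p with q; None stands for the diagonal."""
    if p is None and q is None:
        return 0.0
    if p is None:
        return (q[1] - q[0]) / 2.0   # distance from q to the diagonal
    if q is None:
        return (p[1] - p[0]) / 2.0   # distance from p to the diagonal
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

def bottleneck(d1, d2):
    """Brute-force bottleneck distance between two small finite diagrams,
    given as lists of (birth, death) pairs with finite death."""
    a = list(d1) + [None] * len(d2)   # pad with diagonal slots
    b = list(d2) + [None] * len(d1)
    best = math.inf
    for perm in permutations(range(len(b))):
        worst = max((_cost(a[i], b[j]) for i, j in enumerate(perm)), default=0.0)
        best = min(best, worst)
    return best

d_close = bottleneck([(0, 4)], [(0, 3)])            # 1.0
d_empty = bottleneck([(0, 4)], [])                  # 2.0, the point moves to the diagonal
d_noise = bottleneck([(0, 4), (1, 1.5)], [(0, 3)])  # 1.0, the short-lived point is absorbed
```

Efficient algorithms replace the enumeration with a search over candidate thresholds combined with bipartite matching; the exhaustive version is kept here only because it mirrors the definition directly.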
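A key property of the distance $d_{\mathrm{match}}$ is its stability with respect to $d_\infty$ and $d_{\mathrm{Homeo}(X)}$, stated in the following result.
If $k$ is a natural number and $\varphi,\psi\in\Phi$, then $$d_{\mathrm{match}}(\rho_k(\varphi),\rho_k(\psi))\le d_{\mathrm{Homeo}(X)}(\varphi,\psi)\le d_\infty(\varphi,\psi),$$ where $\rho_k(\varphi)$ denotes the $k$-th persistent Betti number function with respect to $\varphi$.
The proof of the inequality $d_{\mathrm{match}}(\rho_k(\varphi),\rho_k(\psi))\le d_\infty(\varphi,\psi)$ in Theorem 11 can be found in CoEdHa07 (Main Theorem) for the case of tame filtering functions and in CeDFFe13 (Theorem 3.13) for the general case of continuous functions. The statement of Theorem 11 easily follows from the definition of $d_{\mathrm{Homeo}(X)}$ (see Theorem 5.2 in CeDFFe13 ). Theorem 11 also shows that the natural pseudo-distance $d_{\mathrm{Homeo}(X)}$ yields a stability result for persistence diagrams that is better than the classical one, involving $d_\infty$. Figure 2 illustrates this fact, displaying two filtering functions such that .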
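2.2 Strongly $G$-invariant comparison of filtering functions via persistent homology
Let us fix a non-empty subset $\mathcal{F}$ of $\mathcal{F}(\Phi,G)$. For every fixed $k$, we can consider the following pseudo-metric $D^{\mathcal{F},k}_{\mathrm{match}}$ on $\Phi$:
$$D^{\mathcal{F},k}_{\mathrm{match}}(\varphi_1,\varphi_2):=\sup_{F\in\mathcal{F}}d_{\mathrm{match}}\big(\rho_k(F(\varphi_1)),\rho_k(F(\varphi_2))\big)$$ for every $\varphi_1,\varphi_2\in\Phi$, where $\rho_k(\psi)$ denotes the $k$-th persistent Betti number function with respect to the function $\psi$. We will usually omit the index $k$, when its value is clear from the context or not influential.
$D^{\mathcal{F}}_{\mathrm{match}}$ is a strongly $G$-invariant pseudo-metric on $\Phi$.
Theorem 11 and the non-expansivity of every $F\in\mathcal{F}$ imply that $$d_{\mathrm{match}}\big(\rho_k(F(\varphi_1)),\rho_k(F(\varphi_2))\big)\le\|F(\varphi_1)-F(\varphi_2)\|_\infty\le\|\varphi_1-\varphi_2\|_\infty.$$
Therefore $D^{\mathcal{F}}_{\mathrm{match}}$ is a pseudo-metric, since it is the supremum of a family of pseudo-metrics that are bounded at each pair $(\varphi_1,\varphi_2)$. Moreover, for every $\varphi_1,\varphi_2\in\Phi$ and every $g\in G$ $$D^{\mathcal{F}}_{\mathrm{match}}(\varphi_1,\varphi_2\circ g)=\sup_{F\in\mathcal{F}}d_{\mathrm{match}}\big(\rho_k(F(\varphi_1)),\rho_k(F(\varphi_2\circ g))\big)=\sup_{F\in\mathcal{F}}d_{\mathrm{match}}\big(\rho_k(F(\varphi_1)),\rho_k(F(\varphi_2)\circ g)\big)=\sup_{F\in\mathcal{F}}d_{\mathrm{match}}\big(\rho_k(F(\varphi_1)),\rho_k(F(\varphi_2))\big)=D^{\mathcal{F}}_{\mathrm{match}}(\varphi_1,\varphi_2)$$
because of Property 2 in the definition of $\mathcal{F}(\Phi,G)$ and the invariance of persistent homology under the action of homeomorphisms (Remark 10). Due to the fact that the function $D^{\mathcal{F}}_{\mathrm{match}}$ is symmetric, this is sufficient to guarantee that $D^{\mathcal{F}}_{\mathrm{match}}$ is strongly $G$-invariant.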
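A method to approximate $D^{\mathcal{F}}_{\mathrm{match}}$ is given by the next proposition.
Assume $\Phi$ bounded. Let $\mathcal{F}^*=\{F_1,\ldots,F_m\}$ be a finite subset of $\mathcal{F}$. If for every $F\in\mathcal{F}$ at least one index $i\in\{1,\ldots,m\}$ exists, such that $d_{\mathcal{F}}(F_i,F)\le\varepsilon$, then $$\left|D^{\mathcal{F}^*}_{\mathrm{match}}(\varphi_1,\varphi_2)-D^{\mathcal{F}}_{\mathrm{match}}(\varphi_1,\varphi_2)\right|\le 2\varepsilon$$
for every $\varphi_1,\varphi_2\in\Phi$.
Let us assume $F\in\mathcal{F}$ and $d_{\mathcal{F}}(F_i,F)\le\varepsilon$. Because of the definition of $d_{\mathcal{F}}$, for any $\varphi_1,\varphi_2\in\Phi$ we have that $\|F_i(\varphi_1)-F(\varphi_1)\|_\infty\le\varepsilon$ and $\|F_i(\varphi_2)-F(\varphi_2)\|_\infty\le\varepsilon$. Hence $$d_{\mathrm{match}}\big(\rho_k(F_i(\varphi_1)),\rho_k(F(\varphi_1))\big)\le\varepsilon\quad\text{and}\quad d_{\mathrm{match}}\big(\rho_k(F_i(\varphi_2)),\rho_k(F(\varphi_2))\big)\le\varepsilon$$
because of the stability of persistent homology (Theorem 11). It follows that $$\left|d_{\mathrm{match}}\big(\rho_k(F_i(\varphi_1)),\rho_k(F_i(\varphi_2))\big)-d_{\mathrm{match}}\big(\rho_k(F(\varphi_1)),\rho_k(F(\varphi_2))\big)\right|\le 2\varepsilon.$$
The thesis of our proposition immediately follows from the definitions of $D^{\mathcal{F}}_{\mathrm{match}}$ and $D^{\mathcal{F}^*}_{\mathrm{match}}$.
Therefore, if we can cover $\mathcal{F}$ by a finite set of balls of radius $\varepsilon$, centered at points of $\mathcal{F}$, the approximation of $D^{\mathcal{F}}_{\mathrm{match}}(\varphi_1,\varphi_2)$ can be reduced to the computation of the maximum of a finite set of bottleneck distances between persistence diagrams, which are well-known to be computable by means of efficient algorithms.
This fact leads us to study the properties of the topological space $\mathcal{F}(\Phi,G)$. We will do that in the next section.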
3 Main theoretical results
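We start by proving that the pseudo-metric $D^{\mathcal{F}}_{\mathrm{match}}$ is stable with respect to both the natural pseudo-distance associated with the group $G$ and the sup-norm.
If $\mathcal{F}$ is a non-empty subset of $\mathcal{F}(\Phi,G)$, then $D^{\mathcal{F}}_{\mathrm{match}}\le d_G\le d_\infty$.
For every $F\in\mathcal{F}(\Phi,G)$, every $g\in G$ and every $\varphi_1,\varphi_2\in\Phi$, we have that $$d_{\mathrm{match}}\big(\rho_k(F(\varphi_1)),\rho_k(F(\varphi_2))\big)=d_{\mathrm{match}}\big(\rho_k(F(\varphi_1)),\rho_k(F(\varphi_2)\circ g)\big)=d_{\mathrm{match}}\big(\rho_k(F(\varphi_1)),\rho_k(F(\varphi_2\circ g))\big)\le\|F(\varphi_1)-F(\varphi_2\circ g)\|_\infty\le\|\varphi_1-\varphi_2\circ g\|_\infty.$$
The first equality follows from the invariance of persistent homology under the action of $\mathrm{Homeo}(X)$ (Remark 10), and the second equality follows from the fact that $F$ is a $G$-operator. The first inequality follows from the stability of persistent homology (Theorem 11), while the second inequality follows from the non-expansivity of $F$.
It follows that, if $\mathcal{F}\subseteq\mathcal{F}(\Phi,G)$, then for every $g\in G$ and every $\varphi_1,\varphi_2\in\Phi$ we have $D^{\mathcal{F}}_{\mathrm{match}}(\varphi_1,\varphi_2)\le\|\varphi_1-\varphi_2\circ g\|_\infty$. Taking the infimum over $g\in G$, we get
$$D^{\mathcal{F}}_{\mathrm{match}}(\varphi_1,\varphi_2)\le d_G(\varphi_1,\varphi_2)\le d_\infty(\varphi_1,\varphi_2)$$ for every $\varphi_1,\varphi_2\in\Phi$.
The natural pseudo-distance $d_G$ and the pseudo-distance $D^{\mathcal{F}}_{\mathrm{match}}$ are defined in completely different ways. The former is based on a variational approach involving the set of all homeomorphisms in $G$, while the latter refers only to a comparison of persistent homologies depending on a family of $G$-invariant operators. Therefore, the next result, stating that the two pseudo-distances coincide when $\mathcal{F}=\mathcal{F}(\Phi,G)$, may appear unexpected.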
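For every $\psi\in\Phi$ let us consider the operator $F_\psi:\Phi\to\Phi$ defined by setting $F_\psi(\varphi)$ equal to the constant function taking everywhere the value $d_G(\varphi,\psi)$, for every $\varphi\in\Phi$ (i.e., $F_\psi(\varphi)(x):=d_G(\varphi,\psi)$ for any $x\in X$).
We observe that
$F_\psi$ is a $G$-operator on $\Phi$, because the strong invariance of the natural pseudo-distance $d_G$ with respect to the group $G$ (Remark 4) implies that if $\varphi\in\Phi$ and $g\in G$, then $F_\psi(\varphi\circ g)(x)=d_G(\varphi\circ g,\psi)=d_G(\varphi,\psi)=F_\psi(\varphi)(g(x))$, for every $x\in X$.
$F_\psi$ is non-expansive, because $$\|F_\psi(\varphi_1)-F_\psi(\varphi_2)\|_\infty=\left|d_G(\varphi_1,\psi)-d_G(\varphi_2,\psi)\right|\le d_G(\varphi_1,\varphi_2)\le\|\varphi_1-\varphi_2\|_\infty.$$
For every $\varphi_1,\varphi_2,\psi\in\Phi$ we have that $$d_{\mathrm{match}}\big(\rho_k(F_\psi(\varphi_1)),\rho_k(F_\psi(\varphi_2))\big)=\left|d_G(\varphi_1,\psi)-d_G(\varphi_2,\psi)\right|.$$
Indeed, apart from the trivial points on the line $\{(u,v)\in\mathbb{R}^2:u=v\}$, the persistence diagram associated with $\rho_k(F_\psi(\varphi_1))$ contains only the point $(d_G(\varphi_1,\psi),\infty)$, while the persistence diagram associated with $\rho_k(F_\psi(\varphi_2))$ contains only the point $(d_G(\varphi_2,\psi),\infty)$. Both the points have the same multiplicity, which equals the (non-null) $k$-th Betti number of $X$.
Setting $\psi=\varphi_2$, we have that $$d_{\mathrm{match}}\big(\rho_k(F_\psi(\varphi_1)),\rho_k(F_\psi(\varphi_2))\big)=d_G(\varphi_1,\varphi_2).$$
As a consequence, we have that $$D^{\mathcal{F}(\Phi,G)}_{\mathrm{match}}(\varphi_1,\varphi_2)\ge d_G(\varphi_1,\varphi_2).$$
By applying Theorem 14, we get $D^{\mathcal{F}(\Phi,G)}_{\mathrm{match}}(\varphi_1,\varphi_2)=d_G(\varphi_1,\varphi_2)$ for every $\varphi_1,\varphi_2\in\Phi$.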
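The role of the constant-valued operators in this argument can be illustrated numerically. The sketch below works under assumptions that are not in the paper: the space is a circle sampled at `n` points, the group is the finite group of grid rotations, and the bottleneck distance between the diagrams of two constant functions is taken to be the absolute difference of their values, as in the argument above. The names `d_G`, `F` and `d_match_constants` are hypothetical.

```python
import math

n = 360  # illustrative grid size

def shift(f, s):
    return [f[(i + s) % n] for i in range(n)]

def sup_dist(f, h):
    return max(abs(a - b) for a, b in zip(f, h))

def d_G(f1, f2):
    """Natural pseudo-distance for the finite group of grid rotations."""
    return min(sup_dist(f1, shift(f2, s)) for s in range(n))

def F(psi, phi):
    """F_psi(phi): the constant function with value d_G(phi, psi)."""
    return [d_G(phi, psi)] * n

def d_match_constants(c1, c2):
    """Bottleneck distance between the diagrams of two constant functions:
    each has one essential point (c, inf), so the distance is |c1 - c2|."""
    return abs(c1[0] - c2[0])

phi1 = [math.sin(2 * math.pi * i / n) for i in range(n)]
phi2 = [0.5 * math.sin(2 * math.pi * (i + 30) / n) for i in range(n)]
psi = [math.cos(4 * math.pi * i / n) for i in range(n)]

# Choosing psi = phi2 makes the single operator F_psi attain d_G(phi1, phi2).
attained = d_match_constants(F(phi2, phi1), F(phi2, phi2))
# Any other psi gives a value that cannot exceed d_G(phi1, phi2).
other = d_match_constants(F(psi, phi1), F(psi, phi2))
target = d_G(phi1, phi2)   # about 0.5
```

In this finite model the supremum over operators is therefore already reached by one explicit operator, which mirrors the lower bound used in the proof.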
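If the metric space $(\Phi,d_\infty)$ is compact, then also the metric space $(\mathcal{F}(\Phi,G),d_{\mathcal{F}(\Phi,G)})$ is compact.
Since $\Phi$ is bounded, Proposition 7 guarantees that the distance $d_{\mathcal{F}(\Phi,G)}$ is defined. Furthermore, $(\mathcal{F}(\Phi,G),d_{\mathcal{F}(\Phi,G)})$ is a metric space, hence it will suffice to prove that it is sequentially compact. Therefore, let us assume that a sequence $(F_i)$ in $\mathcal{F}(\Phi,G)$ is given.
Since $(\Phi,d_\infty)$ is a compact (and hence separable) metric space, we can find a countable and dense subset $\Phi^*=\{\varphi_j\}_{j\in\mathbb{N}}$ of $\Phi$. We can extract a subsequence $(F_{i_h})$ from $(F_i)$, such that for every fixed index $j$ the sequence $(F_{i_h}(\varphi_j))$ converges to a function in $\Phi$ with respect to the sup-norm. (This follows by recalling that $F_i(\varphi_j)\in\Phi$ for every index $i$, with $\Phi$ compact, and by applying a classical diagonalization argument.)
Now, let us consider the operator $\bar{F}:\Phi\to\Phi$ defined in the following way.
We define $\bar{F}$ on $\Phi^*$ by setting $\bar{F}(\varphi_j):=\lim_{h\to\infty}F_{i_h}(\varphi_j)$ for each $\varphi_j\in\Phi^*$.
Then we extend $\bar{F}$ to $\Phi$ as follows. For each $\varphi\in\Phi$ we choose a sequence $(\varphi_{j_r})$ in $\Phi^*$, converging to $\varphi$ in $\Phi$, and set $\bar{F}(\varphi):=\lim_{r\to\infty}\bar{F}(\varphi_{j_r})$. We claim that such a limit exists in $\Phi$ and does not depend on the sequence that we have chosen, converging to $\varphi$ in $\Phi$. In order to prove that the previous limit exists, we observe that for every $r,s\in\mathbb{N}$ $$\|\bar{F}(\varphi_{j_r})-\bar{F}(\varphi_{j_s})\|_\infty=\lim_{h\to\infty}\|F_{i_h}(\varphi_{j_r})-F_{i_h}(\varphi_{j_s})\|_\infty\le\|\varphi_{j_r}-\varphi_{j_s}\|_\infty\qquad(3.1)$$
because each operator $F_{i_h}$ is non-expansive.
Since the sequence $(\varphi_{j_r})$ converges to $\varphi$ in $\Phi$, it follows that $(\bar{F}(\varphi_{j_r}))$ is a Cauchy sequence. The compactness of $\Phi$ implies that $(\bar{F}(\varphi_{j_r}))$ converges in $\Phi$.
If another sequence $(\varphi_{k_r})$ is given in $\Phi^*$, converging to $\varphi$ in $\Phi$, then for every index $r$ $$\|\bar{F}(\varphi_{j_r})-\bar{F}(\varphi_{k_r})\|_\infty\le\|\varphi_{j_r}-\varphi_{k_r}\|_\infty$$
and the proof goes as in (3.1) with $\varphi_{j_s}$ replaced by $\varphi_{k_r}$.
Since both $(\varphi_{j_r})$ and $(\varphi_{k_r})$ converge to $\varphi$, it follows that $\lim_{r\to\infty}\bar{F}(\varphi_{j_r})=\lim_{r\to\infty}\bar{F}(\varphi_{k_r})$. Therefore the definition of $\bar{F}(\varphi)$ does not depend on the sequence that we have chosen, converging to $\varphi$.
Now we have to prove that $\bar{F}\in\mathcal{F}(\Phi,G)$, i.e., that $\bar{F}$ verifies the three properties defining this set of operators.
We have already seen that $\bar{F}:\Phi\to\Phi$.
For every $\varphi,\varphi'\in\Phi$ we can consider two sequences $(\varphi_{j_r})$, $(\varphi_{k_r})$ in $\Phi^*$, converging to $\varphi$ and $\varphi'$ in $\Phi$, respectively. Due to the fact that the operators $F_{i_h}$ are non-expansive, we have that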