# Combining persistent homology and invariance groups for shape comparison

In many applications concerning the comparison of data expressed by R^m-valued functions defined on a topological space X, the invariance with respect to a given group G of self-homeomorphisms of X is required. While persistent homology is quite efficient in the topological and qualitative comparison of this kind of data when the invariance group G is the group Homeo(X) of all self-homeomorphisms of X, this theory is not tailored to manage the case in which G is a proper subgroup of Homeo(X), and its invariance appears too general for several tasks. This paper proposes a way to adapt persistent homology in order to get invariance just with respect to a given group of self-homeomorphisms of X. The main idea consists in a dual approach, based on considering the set of all G-invariant non-expanding operators defined on the space of the admissible filtering functions on X. Some theoretical results concerning this approach are proven and two experiments are presented. An experiment illustrates the application of the proposed technique to compare 1D-signals, when the invariance is expressed by the group of affinities, the group of orientation-preserving affinities, the group of isometries, the group of translations and the identity group. Another experiment shows how our technique can be used for image comparison.

## 1 Introduction

Persistent topology consists in the study of the properties of filtered topological spaces. From the very beginning, it has been applied to shape comparison Fr91; UrVe97; VeUr96; VeUrFr93. In this context, data are frequently represented by continuous $\mathbb{R}^m$-valued functions defined on a topological space $X$. As simple examples among many others, these functions can describe the coloring of a 3D object, the coordinates of the points in a planar curve, or the grey-levels in an X-ray CT image. Each continuous function $\varphi: X\to\mathbb{R}^m$ is called a filtering function and naturally induces a (multi)filtration on $X$, made by the sublevel sets of $\varphi$. Persistent topology allows us to analyse the data represented by each filtering function by examining how much the topological properties of its sublevel sets “persist” when we go through the filtration. The main mathematical tool to perform this analysis is given by persistent homology EdHa08. This theory describes the birth and death of $k$-dimensional holes when we move along the considered filtration of the space $X$. When the filtering function takes its values in $\mathbb{R}$ we can look at it as a time, and the distance between the birthdate and deathdate of a hole is defined to be its persistence. The more persistent a hole is, the more important it is for shape comparison, since holes with small persistence are usually due to noise.

An important property of classical persistent homology consists in the fact that if a self-homeomorphism $g: X\to X$ is given, then the filtering functions $\varphi$ and $\varphi\circ g$ cannot be distinguished from each other by computing the persistent homology of the filtrations induced by $\varphi$ and $\varphi\circ g$. As pointed out in ReKoGu11, this is a relevant issue in the applications where the functions $\varphi$ and $\varphi\circ g$ cannot be considered equivalent. This happens, e.g., when each filtering function describes a grey-level image, since the images respectively described by $\varphi$ and $\varphi\circ g$ may have completely different appearances. A simple instance of this problem is illustrated in Figure 1.

Therefore, a natural question arises: How can we adapt persistent homology in order to prevent invariance with respect to the group $\mathrm{Homeo}(X)$ of all self-homeomorphisms of the topological space $X$, maintaining just the invariance under the action of the self-homeomorphisms that belong to a proper subgroup $G$ of $\mathrm{Homeo}(X)$? For example, the comparison of the letters illustrated in Figure 1 should require just the invariance with respect to the group of similarities of $\mathbb{R}^2$, since they all are equivalent with respect to the group $\mathrm{Homeo}(\mathbb{R}^2)$. We point out that the depicted letters are constructed from thick lines and therefore have some width, in contrast to geometrical lines.

One could think of solving the previous problem by using other filtering functions, possibly defined on different topological spaces. For example, we could extract the boundaries of the letters in Figure 1 and consider the distance from the center of mass of each boundary as a new filtering function. This approach presents some drawbacks:

1. It “forgets” most of the information contained in the image $\varphi$ that we are considering, confining itself to examining the boundary of the letter represented by $\varphi$. If the boundary is computed by taking a single level of $\varphi$, this is also in contrast with the general spirit of persistent homology.

2. It usually requires an extra computational cost (e.g., to extract the boundaries of the letters in our previous example).

3. It can produce a different topological space for each new filtering function (e.g., the letters of the alphabet can have non-homeomorphic boundaries). Working with several topological spaces instead of just one can be a disadvantage.

4. It is not clear how we can translate the invariance that we need into the choice of new filtering functions defined on new topological spaces.

The purpose of this paper is to present a possible solution for the previously described problem. It is based on a dual approach to the invariance with respect to a subgroup $G$ of $\mathrm{Homeo}(X)$, and consists in changing the direct study of the group $G$ into the study of how the operators that are invariant under the action of $G$ act on classical persistent homology. This change of perspective reveals interesting mathematical properties, allowing us to treat $G$ as a variable in our applications. According to this method, the shape properties and the invariance group can be determined separately, depending on our task. The operators that we consider in this paper act on the space of admissible filtering functions and, in some sense, can be interpreted as the “glasses” we use to look at the data. Their use allows us to combine persistent homology and the invariance with respect to the group $G$, extending the range of application of classical persistent homology to the cases in which we are interested in $G$-invariance rather than in $\mathrm{Homeo}(X)$-invariance.

The idea of applying operators to filtering functions before computing persistent homology has already been considered in previous papers. For example, in ChEd11 convolutions have been used to get a bound for the norm of persistence diagrams of a diffusing function. Furthermore, in ReKoGu11 scale space persistence has been shown to be useful for detecting critical points of a function by examining the evolution of their homological persistence values through the scale space. As for combining persistent homology and transformation groups, the invariance of a signal with respect to a group of translations (i.e. its periodicity or quasi-periodicity) has been studied in deSkVe12; PeHa*, using embedding operators. However, our approach requires considering just a particular kind of operators (i.e. non-expanding $G$-invariant operators on the set of admissible filtering functions), and faces the more general problem of adapting persistent homology to any group of self-homeomorphisms of a topological space.

For another approach to this problem, using quite a different method, we refer the reader to Fr12 .

### 1.1 Our main idea in a nutshell

After choosing a set $\Phi$ of admissible filtering functions from the topological space $X$ to $\mathbb{R}$, and a subgroup $G$ of $\mathrm{Homeo}(X)$, we consider the set $\mathcal{F}(\Phi,G)$ of all non-expanding $G$-invariant operators $F:\Phi\to\Phi$. Basically, our idea consists in comparing two functions $\varphi_1,\varphi_2\in\Phi$ by computing the supremum of the bottleneck distances between the classical persistence diagrams of the filtering functions $F(\varphi_1)$ and $F(\varphi_2)$, varying $F$ in $\mathcal{F}(\Phi,G)$. In our paper we prove that this approach is well-defined, $G$-invariant, stable and computable (under suitable assumptions).

### 1.2 Outline of the paper

Our paper is organized as follows. In Section 2 we introduce some concepts that will be used in the paper and recall some basic facts about persistent homology. In Section 3 we prove our main results concerning the theoretical properties of our method (Theorems 14, 15 and 16). In Section 4 we illustrate the application of our technique to an experiment concerning 1D-signals. In Section 5 a possible application to image retrieval is outlined. A short discussion concludes the paper.

## 2 Mathematical setting

Let us consider a (non-empty) triangulable metric space $X$ with nontrivial homology in degree $k$. This last assumption is always satisfied for $k=0$ and unrestrictive for $k\ge 1$, since we can embed $X$ in a larger triangulable space $Y$ with nontrivial homology in degree $k$, and substitute $X$ with $Y$. Let $C^0(X,\mathbb{R})$ be the set of all continuous functions from $X$ to $\mathbb{R}$, endowed with the topology induced by the sup-norm $\|\cdot\|_\infty$. Let $\Phi$ be a topological subspace of $C^0(X,\mathbb{R})$, containing at least the set of all constant functions. The functions in the topological space $\Phi$ will be called admissible filtering functions on $X$.

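We assume that a subgroup $G$ of the group $\mathrm{Homeo}(X)$ of all homeomorphisms from $X$ onto $X$ is given, acting on the set $\Phi$ by composition on the right (i.e., the action of $g\in G$ takes each function $\varphi\in\Phi$ to the function $\varphi\circ g\in\Phi$). We do not require $G$ to be a proper subgroup of $\mathrm{Homeo}(X)$, so the equality $G=\mathrm{Homeo}(X)$ can possibly hold. It is easy to check that $G$ is a topological group with respect to the topology of uniform convergence. Indeed, we can check that if two sequences $(f_i)$, $(g_i)$ converge to $f$ and $g$ in $G$, respectively, then the sequence $(g_i\circ f_i)$ converges to $g\circ f$ in $G$. Furthermore, if a sequence $(g_i)$ converges to $g$ in $G$, then the sequence $(g_i^{-1})$ converges to $g^{-1}$ in $G$.

We also notice that if two sequences $(\varphi_i)$, $(g_i)$ in $\Phi$ and $G$ are given, converging to $\varphi$ in $\Phi$ and to $g$ in $G$, respectively, we have that $\|\varphi_i\circ g_i-\varphi\circ g\|_\infty\le\|\varphi_i\circ g_i-\varphi\circ g_i\|_\infty+\|\varphi\circ g_i-\varphi\circ g\|_\infty$. Since $(g_i)$ converges uniformly to $g$ and $\varphi$ is uniformly continuous on the compact space $X$, $\lim_{i\to\infty}\|\varphi\circ g_i-\varphi\circ g\|_\infty=0$. Moreover, $\|\varphi_i\circ g_i-\varphi\circ g_i\|_\infty=\|\varphi_i-\varphi\|_\infty$, due to the invariance of the sup-norm under composition of the function inside the norm with homeomorphisms. Since $(\varphi_i)$ converges uniformly to $\varphi$ in $\Phi$, $\lim_{i\to\infty}\|\varphi_i-\varphi\|_\infty=0$. Hence $\lim_{i\to\infty}\|\varphi_i\circ g_i-\varphi\circ g\|_\infty=0$ and $\lim_{i\to\infty}\varphi_i\circ g_i=\varphi\circ g$.

Therefore, the right action of $G$ on the set $\Phi$ is continuous.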

If $S$ is a subset of $\mathrm{Homeo}(X)$, the set $\{\varphi\circ s:\varphi\in\Phi,\ s\in S\}$ will be denoted by the symbol $\Phi\circ S$. Obviously, $\Phi\circ G=\Phi$.

We can consider the natural pseudo-distance on the space (cf. FrMu99 ; DoFr04 ; DoFr07 ; DoFr09 ; CaFaLa* ):

###### Definition 1

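The pseudo-distance $d_G:\Phi\times\Phi\to\mathbb{R}$ is defined by setting

$$d_G(\varphi_1,\varphi_2)=\inf_{g\in G}\max_{x\in X}\left|\varphi_1(x)-\varphi_2(g(x))\right|.$$

It is called the ($1$-dimensional) natural pseudo-distance associated with the group $G$ acting on $\Phi$.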

The term “$1$-dimensional” refers to the fact that the filtering functions are real-valued. The concepts considered in this paper can be easily extended to the case of $\mathbb{R}^m$-valued filtering functions, by substituting the absolute value in $\mathbb{R}$ with the max-norm in $\mathbb{R}^m$. However, the use of $\mathbb{R}^m$-valued filtering functions would require the introduction of a technical machinery that is beyond the purposes of our research (cf., e.g., CeDFFe13), in order to adapt the bottleneck distance to the new setting. Therefore, for the sake of simplicity, in this paper we will just consider the $1$-dimensional case.

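We observe that the max-norm distance $d_\infty$ on $\Phi$, defined by setting $d_\infty(\varphi_1,\varphi_2):=\max_{x\in X}|\varphi_1(x)-\varphi_2(x)|$, is just the natural pseudo-distance $d_G$ in the case that $G$ is the trivial group $\{\mathrm{id}\}$, containing only the identity homeomorphism and acting on $\Phi$. Moreover, the definition of $d_G$ immediately implies that if $G_1$ and $G_2$ are subgroups of $\mathrm{Homeo}(X)$ acting on $\Phi$ and $G_1\subseteq G_2$, then $d_{G_2}(\varphi_1,\varphi_2)\le d_{G_1}(\varphi_1,\varphi_2)$ for every $\varphi_1,\varphi_2\in\Phi$. As a consequence, the following double inequality holds, for every subgroup $G$ of $\mathrm{Homeo}(X)$ and every $\varphi_1,\varphi_2\in\Phi$ (see also Theorem 5.2 in CeDFFe13):

$$d_{\mathrm{Homeo}(X)}(\varphi_1,\varphi_2)\le d_G(\varphi_1,\varphi_2)\le d_\infty(\varphi_1,\varphi_2).$$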
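
As a rough illustration of the natural pseudo-distance and of its monotonicity with respect to the group, the following sketch replaces $X$ by $n$ equally spaced samples on a circle and $G$ by groups of cyclic shifts. The discretization, the choice of groups and the names `d_G` and `d_inf` are assumptions made here for illustration only; they are not part of the paper's setting.

```python
import math

def d_inf(phi1, phi2):
    """Max-norm distance between two sampled functions of equal length."""
    return max(abs(a - b) for a, b in zip(phi1, phi2))

def d_G(phi1, phi2, shifts):
    """Natural pseudo-distance for a group of cyclic shifts (illustrative).

    `shifts` lists the shift amounts s; each acts as g(x) = (x + s) mod n,
    so that (phi2 o g)(x) = phi2[(x + s) % n].
    """
    n = len(phi1)
    return min(
        max(abs(phi1[x] - phi2[(x + s) % n]) for x in range(n))
        for s in shifts
    )

n = 12
phi1 = [math.sin(2 * math.pi * x / n) for x in range(n)]
phi2 = [math.sin(2 * math.pi * (x + 3) / n) for x in range(n)]  # phi1 advanced by 3 samples

trivial = [0]                 # identity only: d_G reduces to d_inf
quarter_turns = [0, 3, 6, 9]  # subgroup generated by a shift of 3 samples
all_shifts = list(range(n))   # the full cyclic group of order 12

d_triv = d_G(phi1, phi2, trivial)
d_quarter = d_G(phi1, phi2, quarter_turns)
d_all = d_G(phi1, phi2, all_shifts)
```

In this toy example the two signals differ by a shift that belongs to `quarter_turns`, so the pseudo-distance drops to (numerically) zero as soon as the group contains that shift, while the trivial group gives back the max-norm distance.
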
###### Remark 2

The proof that is a pseudo-metric does use the assumption that is a group, and we can give a simple example of a subset of for which the function is not a pseudo-distance on . In order to do that, let us set , and consider the set containing just the identity and the counterclockwise rotation of radians. Obviously, is a subset, but not a subgroup of . We have that (because ) and (because ), but

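$$\mu_S(\sin\theta,-\sin\theta)=\min\left\{\|\sin\theta-(-\sin\theta)\|_\infty,\ \|\sin\theta-(-\sin(\rho(\theta)))\|_\infty\right\}=\|\sin\theta+\cos\theta\|_\infty=\sqrt{2}.$$

Therefore the triangle inequality does not hold, so that $\mu_S$ is not a pseudo-distance on $\Phi$.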
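
The value $\sqrt{2}$ in the displayed computation can be checked numerically on a sample grid. The sketch below assumes that $\mu_S(\varphi_1,\varphi_2)$ is the minimum over $s\in S$ of $\|\varphi_1-\varphi_2\circ s\|_\infty$, as in the display, and samples the circle at 360 points so that the rotation by $\pi/2$ is an index shift. The intermediate function $-\cos\theta$ used to exhibit the failure of the triangle inequality is a choice made here for illustration and is not necessarily the one used in the remark.

```python
import math

N = 360                      # samples on the circle; N // 4 steps is a quarter turn
theta = [2 * math.pi * i / N for i in range(N)]

def rotate(values, steps):
    """Samples of phi o rho, where rho advances the angle by `steps` samples."""
    return [values[(i + steps) % N] for i in range(N)]

def mu_S(phi1, phi2, steps_in_S):
    """Minimum over s in S of the sup-norm of phi1 - phi2 o s (on the grid)."""
    return min(
        max(abs(a - b) for a, b in zip(phi1, rotate(phi2, k)))
        for k in steps_in_S
    )

sin_vals = [math.sin(t) for t in theta]
neg_sin_vals = [-math.sin(t) for t in theta]
neg_cos_vals = [-math.cos(t) for t in theta]   # illustrative intermediate function

S = [0, N // 4]              # identity and rotation by pi/2: a subset, not a group

direct = mu_S(sin_vals, neg_sin_vals, S)
via_neg_cos = mu_S(sin_vals, neg_cos_vals, S) + mu_S(neg_cos_vals, neg_sin_vals, S)
```

Here `direct` is close to $\sqrt{2}$ while `via_neg_cos` is numerically zero, which is exactly the kind of violation of the triangle inequality that the remark describes.
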

The rationale of using the natural pseudo-distance is that pattern recognition is usually based on comparing properties that are described by functions defined on a topological space. These properties are often the only accessible data, implying that every discrimination should be based on them. The fundamental assumption is that two objects cannot be distinguished if they share the same properties with respect to a given observer (cf. BiDFFa08).

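In order to proceed, we consider the set $\mathcal{F}(\Phi,G)$ of all operators $F$ that verify the following properties:

1. $F$ is a function from $\Phi$ to $\Phi$;

2. $F(\varphi\circ g)=F(\varphi)\circ g$ for every $\varphi\in\Phi$ and every $g\in G$;

3. $\|F(\varphi_1)-F(\varphi_2)\|_\infty\le\|\varphi_1-\varphi_2\|_\infty$ for every $\varphi_1,\varphi_2\in\Phi$ (i.e. $F$ is non-expansive).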

Obviously, $\mathcal{F}(\Phi,G)$ is not empty, since it contains at least the identity operator.

Properties 1 and 2 show that $F$ is a $G$-operator, referring to the right action of $G$ on $\Phi$.
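
The defining properties can be tested numerically on a toy model. The sketch below assumes a circle sampled at $n$ points, with the group of cyclic shifts playing the role of $G$; the three operators (`local_average`, `constant_max`, `doubling`) and the helper names are hypothetical examples chosen here, not operators taken from the paper. The first two satisfy Properties 2 and 3 on the random samples, while `doubling` commutes with the shifts but is expansive.

```python
import random

def shift(phi, s):
    """phi o g for the cyclic shift g(x) = (x + s) mod n."""
    n = len(phi)
    return [phi[(x + s) % n] for x in range(n)]

def sup_dist(a, b):
    """Sup-norm distance between two sampled functions."""
    return max(abs(u - v) for u, v in zip(a, b))

def local_average(phi):
    """Mean of each sample with its two cyclic neighbours."""
    n = len(phi)
    return [(phi[(x - 1) % n] + phi[x] + phi[(x + 1) % n]) / 3 for x in range(n)]

def constant_max(phi):
    """Constant function taking everywhere the value max(phi)."""
    return [max(phi)] * len(phi)

def doubling(phi):
    """Commutes with shifts but doubles distances, so it is NOT non-expansive."""
    return [2 * v for v in phi]

def commutes_with_shifts(F, samples, n, tol=1e-12):
    """Property 2 on the samples: F(phi o g) = F(phi) o g for every shift g."""
    return all(
        sup_dist(F(shift(phi, s)), shift(F(phi), s)) <= tol
        for phi in samples for s in range(n)
    )

def is_non_expansive(F, samples, tol=1e-12):
    """Property 3 on the samples: ||F(p) - F(q)|| <= ||p - q||."""
    return all(
        sup_dist(F(p), F(q)) <= sup_dist(p, q) + tol
        for p in samples for q in samples
    )

random.seed(0)
n = 8
samples = [[random.uniform(0, 1) for _ in range(n)] for _ in range(20)]
```

Such a check can only refute membership in $\mathcal{F}(\Phi,G)$ on the tested samples; it does not prove it.
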

###### Remark 3

The operators that we are considering are not required to be linear. However, due to the non-expansivity property, the operators in $\mathcal{F}(\Phi,G)$ are $1$-Lipschitz and hence are continuous.

In this paper, we shall say that a pseudo-metric $\hat d$ on $\Phi$ is strongly $G$-invariant if it is invariant under the action of $G$ with respect to each variable, i.e., if $\hat d(\varphi_1,\varphi_2)=\hat d(\varphi_1\circ g,\varphi_2)=\hat d(\varphi_1,\varphi_2\circ g)$ for every $\varphi_1,\varphi_2\in\Phi$ and every $g\in G$.

###### Remark 4

It is easily seen that the natural pseudo-distance $d_G$ is strongly $G$-invariant.

###### Example 5

Take , equal to the group of all rotations of , and equal to the set of all continuous functions from to . As an example of an operator in we can consider the operator defined by setting for every and every , where denotes the point obtained from by rotating of a fixed angle . It is easy to check that is a non-expansive -invariant (linear) operator defined on . An example of a non-expansive -invariant non-linear operator defined on is given by the operator defined by setting for every and every .

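This simple statement holds (the symbol $\mathbf{0}$ denotes the function taking the value $0$ everywhere):

###### Proposition 6

$\|F(\varphi)\|_\infty\le\|\varphi\|_\infty+\|F(\mathbf{0})\|_\infty$ for every $F\in\mathcal{F}(\Phi,G)$ and every $\varphi\in\Phi$.

###### Proof

$\|F(\varphi)\|_\infty=\|F(\varphi)-F(\mathbf{0})+F(\mathbf{0})\|_\infty\le\|F(\varphi)-F(\mathbf{0})\|_\infty+\|F(\mathbf{0})\|_\infty\le\|\varphi-\mathbf{0}\|_\infty+\|F(\mathbf{0})\|_\infty=\|\varphi\|_\infty+\|F(\mathbf{0})\|_\infty$, since $F$ is non-expansive.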

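If $\mathcal{F}$ is a subset of $\mathcal{F}(\Phi,G)$ and $\Phi$ is bounded with respect to $d_\infty$, then we can consider the function

$$d_{\mathcal{F}}(F_1,F_2):=\sup_{\varphi\in\Phi}\|F_1(\varphi)-F_2(\varphi)\|_\infty$$

from $\mathcal{F}\times\mathcal{F}$ to $\mathbb{R}$.

###### Proposition 7

If $\mathcal{F}$ is a non-empty subset of $\mathcal{F}(\Phi,G)$ and $\Phi$ is bounded then the function $d_{\mathcal{F}}$ is a distance on $\mathcal{F}$.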

See Appendix B.

###### Remark 8

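The $\sup$ in the definition of $d_{\mathcal{F}}$ cannot be replaced with $\max$. As an example, consider the case $X=[0,1]$, $\Phi$ equal to the set of all continuous functions from $[0,1]$ to $[0,1]$, $G$ equal to the group containing just the identity and the homeomorphism taking each point $x$ to $1-x$, $F_1(\varphi)$ equal to the constant function taking everywhere the value $\max\varphi$, and $F_2(\varphi)$ equal to the constant function taking everywhere the value $\int_0^1\varphi(x)\,dx$. Both $F_1$ and $F_2$ are non-expansive $G$-operators. We have that $d_{\mathcal{F}}(F_1,F_2)=1$, but no function $\varphi\in\Phi$ exists, such that $\|F_1(\varphi)-F_2(\varphi)\|_\infty=1$. To prove this, we firstly observe that

$$1\ge\max F_1(\varphi)=\min F_1(\varphi)=\max\varphi\ge\max F_2(\varphi)=\min F_2(\varphi)=\int_0^1\varphi(x)\,dx\ge 0$$

for any $\varphi\in\Phi$.

Obviously,

$$d_{\mathcal{F}}(F_1,F_2)=\sup_{\varphi\in\Phi}\left|\max\varphi-\int_0^1\varphi(x)\,dx\right|=\sup_{\varphi\in\Phi}\left(\max\varphi-\int_0^1\varphi(x)\,dx\right)\le 1.$$

Let us consider a sequence of continuous functions $(\varphi_i:[0,1]\to[0,1])$, such that $\max\varphi_i=1$ and $\int_0^1\varphi_i(x)\,dx\le 1/i$. We have that

$$\|F_1(\varphi_i)-F_2(\varphi_i)\|_\infty=\left|\max\varphi_i-\int_0^1\varphi_i(x)\,dx\right|=\max\varphi_i-\int_0^1\varphi_i(x)\,dx\ge 1-1/i$$

so that $d_{\mathcal{F}}(F_1,F_2)\ge 1$. Hence $d_{\mathcal{F}}(F_1,F_2)=1$.

In order to have $\|F_1(\varphi)-F_2(\varphi)\|_\infty=1$, the equality $\max\varphi-\int_0^1\varphi(x)\,dx=1$ should hold. Since $0\le\varphi\le 1$, this would force $\max\varphi=1$ and $\int_0^1\varphi(x)\,dx=0$, and the latter implies $\varphi\equiv 0$ by continuity. This is clearly impossible, hence no function $\varphi\in\Phi$ exists, such that $\|F_1(\varphi)-F_2(\varphi)\|_\infty=1$.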
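
One concrete sequence with the properties required in the remark is $\varphi_i(x)=\max(0,1-ix)$, whose graph is a triangle of base $1/i$ and height $1$. This particular choice is an assumption made here for illustration; the remark only needs some sequence with $\max\varphi_i=1$ and $\int_0^1\varphi_i\le 1/i$. The sketch below evaluates $\max\varphi_i-\int_0^1\varphi_i$ exactly with rational arithmetic.

```python
from fractions import Fraction

def gap(i):
    """max(phi_i) - integral of phi_i for phi_i(x) = max(0, 1 - i*x) on [0, 1].

    phi_i is continuous with values in [0, 1] and max(phi_i) = phi_i(0) = 1.
    Its graph is a triangle of base 1/i and height 1, so the integral is 1/(2i).
    """
    max_phi = Fraction(1)
    integral = Fraction(1, 2 * i)
    return max_phi - integral

indices = (1, 10, 100, 1000)
gaps = [gap(i) for i in indices]
```

The gaps increase towards $1$ without ever reaching it, in agreement with the fact that the supremum is not attained.
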

### 2.1 Persistent homology

Before proceeding, we recall some basic definitions and facts in persistent homology. For a more detailed and formal treatment, we refer the interested reader to EdHa08; BiDFFa08; CaZo09; ChCo*09. Roughly speaking, persistent homology describes the changes of the homology groups of the sub-level sets $X_t=\varphi^{-1}((-\infty,t])$ varying $t$ in $\mathbb{R}$, where $\varphi$ is a real-valued continuous function defined on a topological space $X$. The parameter $t$ can be seen as an increasing time, whose change produces the birth and death of $k$-dimensional holes in the sub-level set $X_t$. For $k=0,1,2$, the expression “$k$-dimensional holes” refers to connected components, tunnels and voids, respectively. The distance between the birthdate and deathdate of a hole is defined to be its persistence. The more persistent a hole is, the more important it is for shape comparison, since holes with small persistence are usually due to noise.

Persistent homology can be introduced in several different settings, including the one of simplicial complexes and simplicial homology, and the one of topological spaces and singular homology. As for the link between the discrete and the topological settings, we refer the interested reader to CaEtFr13; DFFr13. In this paper we will consider the topological setting and the singular homology functor $H$. An elementary introduction to singular homology can be found in Ha02.

The concept of persistence can be formalized by the definition of persistent homology group with respect to the function $\varphi:X\to\mathbb{R}$:

###### Definition 9

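If $u,v\in\mathbb{R}$ and $u<v$, we can consider the inclusion $i$ of $X_u$ into $X_v$. Such an inclusion induces a homomorphism $i_k:H_k(X_u)\to H_k(X_v)$ between the homology groups of $X_u$ and $X_v$ in degree $k$. The group $PH_k^\varphi(u,v):=i_k(H_k(X_u))$ is called the $k$-th persistent homology group with respect to the function $\varphi:X\to\mathbb{R}$, computed at the point $(u,v)$. The rank $r_k(\varphi)(u,v)$ of this group is called the $k$-th persistent Betti number function with respect to the function $\varphi$, computed at the point $(u,v)$.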

###### Remark 10

It is easy to check that the persistent homology groups (and hence also the persistent Betti number functions) are invariant under the action of $\mathrm{Homeo}(X)$. For further discussion see Appendix A.

A classical way to describe persistent Betti number functions (up to subsets of measure zero of their domain) is given by persistence diagrams. Another equivalent description is given by barcodes (cf. CaZo09). The $k$-th persistence diagram is the set of all pairs $(b_j,d_j)$, where $b_j$ and $d_j$ are the birthdate and the deathdate of the $j$-th $k$-dimensional hole, respectively. When a hole never dies, we set its deathdate equal to $\infty$. For technical reasons, the points $(t,t)$ are added to each persistence diagram. Two persistence diagrams $D_1,D_2$ can be compared by computing the maximum movement of their points that is necessary to change $D_1$ into $D_2$, measured with respect to the maximum norm. This metric $d_{match}$ naturally induces a pseudo-metric on the sets of the persistent Betti number functions. We recall that a pseudo-metric is just a metric without the property assuring that if two points have a null distance then they must coincide. For a formal definition of persistence diagram and of the distance (named bottleneck distance) that is used to compare persistence diagrams, we refer the reader to EdHa08. For more details about the existence of pairs of different persistent Betti number functions that are associated with the same persistence diagram, we refer the interested reader to CeDFFe13.

A key property of the distance $d_{match}$ is its stability with respect to $d_\infty$ and $d_{\mathrm{Homeo}(X)}$, stated in the following result.

###### Theorem 11

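If $k$ is a natural number and $\varphi_1,\varphi_2\in\Phi$, then

$$d_{match}(r_k(\varphi_1),r_k(\varphi_2))\le d_{\mathrm{Homeo}(X)}(\varphi_1,\varphi_2)\le d_\infty(\varphi_1,\varphi_2).$$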

The proof of the inequality $d_{match}(r_k(\varphi_1),r_k(\varphi_2))\le d_\infty(\varphi_1,\varphi_2)$ in Theorem 11 can be found in CoEdHa07 (Main Theorem) for the case of tame filtering functions and in CeDFFe13 (Theorem 3.13) for the general case of continuous functions. The statement of Theorem 11 easily follows from the definition of $d_{\mathrm{Homeo}(X)}$ (see Theorem 5.2 in CeDFFe13). Theorem 11 also shows that the natural pseudo-distance $d_{\mathrm{Homeo}(X)}$ allows us to obtain a stability result for persistence diagrams that is better than the classical one, involving $d_\infty$. Figure 2 illustrates this fact, displaying two filtering functions $\varphi_1,\varphi_2$ for which $d_{\mathrm{Homeo}(X)}$ gives a sharper bound than $d_\infty$.
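
To make persistence diagrams and the bottleneck distance concrete, here is a minimal sketch for degree $0$ and for a 1D signal sampled on a path graph. It is an illustration under stated assumptions, not the algorithm used in the paper: `persistence_0d` uses a union-find with the usual elder rule, and `bottleneck` is a brute-force search over matchings that is only usable for very small diagrams. Both function names are introduced here.

```python
import math
from itertools import permutations

def persistence_0d(values):
    """Degree-0 sublevel-set persistence pairs of a signal sampled on a path graph."""
    n = len(values)
    parent = [None] * n          # None marks vertices not yet in the filtration
    birth = {}                   # root -> birth value of its component

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    pairs = []
    for x in sorted(range(n), key=lambda i: values[i]):
        parent[x] = x
        birth[x] = values[x]
        for y in (x - 1, x + 1):
            if 0 <= y < n and parent[y] is not None:
                rx, ry = find(x), find(y)
                if rx == ry:
                    continue
                # Elder rule: the component born later dies at the current level.
                young, old = (rx, ry) if birth[rx] > birth[ry] else (ry, rx)
                if birth[young] < values[x]:      # skip zero-persistence pairs
                    pairs.append((birth[young], values[x]))
                parent[young] = old
    pairs.append((min(values), math.inf))         # the component that never dies
    return sorted(pairs)

def bottleneck(diag1, diag2):
    """Brute-force bottleneck distance; only meant for very small diagrams."""
    ess1 = sorted(b for b, d in diag1 if d == math.inf)
    ess2 = sorted(b for b, d in diag2 if d == math.inf)
    if len(ess1) != len(ess2):
        return math.inf
    fin1 = [p for p in diag1 if p[1] != math.inf]
    fin2 = [p for p in diag2 if p[1] != math.inf]
    left = fin1 + [None] * len(fin2)              # None = matched to the diagonal
    right = fin2 + [None] * len(fin1)

    def cost(p, q):
        if p is None and q is None:
            return 0.0
        if p is None or q is None:
            b, d = p if q is None else q
            return (d - b) / 2                    # max-norm distance to the diagonal
        return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

    best_finite = min(
        max((cost(p, q) for p, q in zip(left, perm)), default=0.0)
        for perm in permutations(right)
    )
    best_essential = max((abs(a - b) for a, b in zip(ess1, ess2)), default=0.0)
    return max(best_finite, best_essential)

phi1 = [0, 2, 1, 3, 0.5]
phi2 = [0.2, 1.8, 1.1, 3.1, 0.4]                  # a small perturbation of phi1
diagram1 = persistence_0d(phi1)
diagram2 = persistence_0d(phi2)
d_match = bottleneck(diagram1, diagram2)
d_inf = max(abs(a - b) for a, b in zip(phi1, phi2))
```

For these two signals the computed bottleneck distance does not exceed the sup-norm distance, as the stability inequality of Theorem 11 predicts. For realistic data sizes a dedicated library should replace the brute-force matching.
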

### 2.2 Strongly G-invariant comparison of filtering functions via persistent homology

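Let us fix a non-empty subset $\mathcal{F}$ of $\mathcal{F}(\Phi,G)$. For every fixed $k$, we can consider the following pseudo-metric $D^{\mathcal{F},k}_{match}$ on $\Phi$:

$$D^{\mathcal{F},k}_{match}(\varphi_1,\varphi_2):=\sup_{F\in\mathcal{F}}d_{match}(r_k(F(\varphi_1)),r_k(F(\varphi_2)))$$

for every $\varphi_1,\varphi_2\in\Phi$, where $r_k(\varphi)$ denotes the $k$-th persistent Betti number function with respect to the function $\varphi$. We will usually omit the index $k$, when its value is clear from the context or not influential.

###### Proposition 12

$D^{\mathcal{F}}_{match}$ is a strongly $G$-invariant pseudo-metric on $\Phi$.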

###### Proof

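Theorem 11 and the non-expansivity of every $F\in\mathcal{F}$ imply that

$$d_{match}(r_k(F(\varphi_1)),r_k(F(\varphi_2)))\le\|F(\varphi_1)-F(\varphi_2)\|_\infty\le\|\varphi_1-\varphi_2\|_\infty.$$

Therefore $D^{\mathcal{F}}_{match}$ is a pseudo-metric, since it is the supremum of a family of pseudo-metrics that are bounded at each pair $(\varphi_1,\varphi_2)$. Moreover, for every $\varphi_1,\varphi_2\in\Phi$ and every $g\in G$

$$\begin{aligned}D^{\mathcal{F}}_{match}(\varphi_1,\varphi_2\circ g)&:=\sup_{F\in\mathcal{F}}d_{match}(r_k(F(\varphi_1)),r_k(F(\varphi_2\circ g)))\\&=\sup_{F\in\mathcal{F}}d_{match}(r_k(F(\varphi_1)),r_k(F(\varphi_2)\circ g))\\&=\sup_{F\in\mathcal{F}}d_{match}(r_k(F(\varphi_1)),r_k(F(\varphi_2)))=D^{\mathcal{F}}_{match}(\varphi_1,\varphi_2)\end{aligned}$$

because of Property 2 in the definition of $\mathcal{F}(\Phi,G)$ and the invariance of persistent homology under the action of homeomorphisms (Remark 10). Due to the fact that the function $D^{\mathcal{F}}_{match}$ is symmetric, this is sufficient to guarantee that $D^{\mathcal{F}}_{match}$ is strongly $G$-invariant.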

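### 2.3 Approximating $D^{\mathcal{F}}_{match}$

A method to approximate $D^{\mathcal{F}}_{match}$ is given by the next proposition.

###### Proposition 13

Assume $\Phi$ bounded. Let $\mathcal{F}^*=\{F_1,\ldots,F_m\}$ be a finite subset of $\mathcal{F}$. If for every $F\in\mathcal{F}$ at least one index $i\in\{1,\ldots,m\}$ exists, such that $d_{\mathcal{F}}(F_i,F)\le\epsilon$, then

$$\left|D^{\mathcal{F}^*}_{match}(\varphi_1,\varphi_2)-D^{\mathcal{F}}_{match}(\varphi_1,\varphi_2)\right|\le 2\epsilon$$

for every $\varphi_1,\varphi_2\in\Phi$.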

###### Proof

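Let us assume $F\in\mathcal{F}$ and $d_{\mathcal{F}}(F_i,F)\le\epsilon$. Because of the definition of $d_{\mathcal{F}}$, for any $\varphi_1,\varphi_2\in\Phi$ we have that $\|F_i(\varphi_1)-F(\varphi_1)\|_\infty\le\epsilon$ and $\|F_i(\varphi_2)-F(\varphi_2)\|_\infty\le\epsilon$. Hence

$$d_{match}(r_k(F_i(\varphi_1)),r_k(F(\varphi_1)))\le\epsilon\quad\text{and}\quad d_{match}(r_k(F_i(\varphi_2)),r_k(F(\varphi_2)))\le\epsilon$$

because of the stability of persistent homology (Theorem 11). It follows, by the triangle inequality for $d_{match}$, that

$$\left|d_{match}(r_k(F_i(\varphi_1)),r_k(F_i(\varphi_2)))-d_{match}(r_k(F(\varphi_1)),r_k(F(\varphi_2)))\right|\le 2\epsilon.$$

The thesis of our proposition immediately follows from the definitions of $D^{\mathcal{F}}_{match}$ and $D^{\mathcal{F}^*}_{match}$.

Therefore, if we can cover $\mathcal{F}$ by a finite set of balls of radius $\epsilon$, centered at points of $\mathcal{F}$, the approximation of $D^{\mathcal{F}}_{match}(\varphi_1,\varphi_2)$ can be reduced to the computation of the maximum of a finite set of bottleneck distances between persistence diagrams, which are well-known to be computable by means of efficient algorithms.

This fact leads us to study the properties of the topological space $\mathcal{F}(\Phi,G)$. We will do that in the next section.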

## 3 Main theoretical results

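We start by proving that the pseudo-metric $D^{\mathcal{F}}_{match}$ is stable with respect to both the natural pseudo-distance associated with the group $G$ and the sup-norm.

###### Theorem 14

If $\emptyset\neq\mathcal{F}\subseteq\mathcal{F}(\Phi,G)$, then $D^{\mathcal{F}}_{match}\le d_G\le d_\infty$.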

###### Proof

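For every $F\in\mathcal{F}(\Phi,G)$, every $g\in G$ and every $\varphi_1,\varphi_2\in\Phi$, we have that

$$\begin{aligned}d_{match}(r_k(F(\varphi_1)),r_k(F(\varphi_2)))&=d_{match}(r_k(F(\varphi_1)),r_k(F(\varphi_2)\circ g))\\&=d_{match}(r_k(F(\varphi_1)),r_k(F(\varphi_2\circ g)))\\&\le\|F(\varphi_1)-F(\varphi_2\circ g)\|_\infty\le\|\varphi_1-\varphi_2\circ g\|_\infty.\end{aligned}$$

The first equality follows from the invariance of persistent homology under the action of $\mathrm{Homeo}(X)$ (Remark 10), and the second equality follows from the fact that $F$ is a $G$-operator. The first inequality follows from the stability of persistent homology (Theorem 11), while the second inequality follows from the non-expansivity of $F$.

It follows that, if $\mathcal{F}\subseteq\mathcal{F}(\Phi,G)$, then for every $g\in G$ and every $\varphi_1,\varphi_2\in\Phi$

$$D^{\mathcal{F}}_{match}(\varphi_1,\varphi_2)\le\|\varphi_1-\varphi_2\circ g\|_\infty.$$

Hence,

$$D^{\mathcal{F}}_{match}(\varphi_1,\varphi_2)\le\inf_{g\in G}\|\varphi_1-\varphi_2\circ g\|_\infty\le\|\varphi_1-\varphi_2\|_\infty=d_\infty(\varphi_1,\varphi_2)$$

for every $\varphi_1,\varphi_2\in\Phi$, where the infimum equals $d_G(\varphi_1,\varphi_2)$ by Definition 1.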

The natural pseudo-distance $d_G$ and the pseudo-distance $D^{\mathcal{F}}_{match}$ are defined in completely different ways. The former is based on a variational approach involving the set of all homeomorphisms in $G$, while the latter refers only to a comparison of persistent homologies depending on a family of $G$-invariant operators. Therefore, the next result may appear unexpected.

###### Theorem 15

If $\mathcal{F}=\mathcal{F}(\Phi,G)$, then $D^{\mathcal{F}}_{match}=d_G$.

###### Proof

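For every $\psi\in\Phi$ let us consider the operator $F_\psi:\Phi\to\Phi$ defined by setting $F_\psi(\varphi)$ equal to the constant function taking everywhere the value $d_G(\varphi,\psi)$, for every $\varphi\in\Phi$ (i.e., $F_\psi(\varphi)(x):=d_G(\varphi,\psi)$ for any $x\in X$).

We observe that

i) $F_\psi$ is a $G$-operator on $\Phi$, because the strong invariance of the natural pseudo-distance $d_G$ with respect to the group $G$ (Remark 4) implies that if $\varphi\in\Phi$ and $g\in G$, then $F_\psi(\varphi\circ g)(x)=d_G(\varphi\circ g,\psi)=d_G(\varphi,\psi)=F_\psi(\varphi)(g(x))=(F_\psi(\varphi)\circ g)(x)$, for every $x\in X$.

ii) $F_\psi$ is non-expansive, because $\|F_\psi(\varphi_1)-F_\psi(\varphi_2)\|_\infty=|d_G(\varphi_1,\psi)-d_G(\varphi_2,\psi)|\le d_G(\varphi_1,\varphi_2)\le\|\varphi_1-\varphi_2\|_\infty$.

Therefore, $F_\psi\in\mathcal{F}(\Phi,G)$.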

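For every $\varphi_1,\varphi_2,\psi\in\Phi$ we have that

$$d_{match}(r_k(F_\psi(\varphi_1)),r_k(F_\psi(\varphi_2)))=|d_G(\varphi_1,\psi)-d_G(\varphi_2,\psi)|.$$

Indeed, apart from the trivial points on the line $\{(u,v):u=v\}$, the persistence diagram associated with $r_k(F_\psi(\varphi_1))$ contains only the point $(d_G(\varphi_1,\psi),\infty)$, while the persistence diagram associated with $r_k(F_\psi(\varphi_2))$ contains only the point $(d_G(\varphi_2,\psi),\infty)$. Both the points have the same multiplicity, which equals the (non-null) $k$-th Betti number of $X$.

Setting $\psi=\varphi_2$, we have that

$$d_{match}(r_k(F_{\varphi_2}(\varphi_1)),r_k(F_{\varphi_2}(\varphi_2)))=d_G(\varphi_1,\varphi_2).$$

As a consequence, we have that

$$D^{\mathcal{F}}_{match}(\varphi_1,\varphi_2)\ge d_G(\varphi_1,\varphi_2).$$

By applying Theorem 14, we get $D^{\mathcal{F}}_{match}(\varphi_1,\varphi_2)=d_G(\varphi_1,\varphi_2)$ for every $\varphi_1,\varphi_2\in\Phi$.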
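
The key step of this proof can be replayed on a toy model. The sketch below assumes a circle sampled at $n$ points with $G$ the group of all cyclic shifts, and works in degree $0$, where the diagram of a constant function on a connected space has the single point $(c,\infty)$, so that the bottleneck distance between two such diagrams is $|c_1-c_2|$. The names `d_G`, `dmatch_constant` and `lower_bound_via` are introduced here for illustration.

```python
import random

def d_G(phi1, phi2):
    """Natural pseudo-distance for the group of all cyclic shifts of n samples."""
    n = len(phi1)
    return min(
        max(abs(phi1[x] - phi2[(x + s) % n]) for x in range(n))
        for s in range(n)
    )

def dmatch_constant(c1, c2):
    """Bottleneck distance between the degree-0 diagrams of two constant
    functions on a connected space: each has the single point (c, infinity)."""
    return abs(c1 - c2)

def lower_bound_via(psi, phi1, phi2):
    """d_match between the diagrams of F_psi(phi1) and F_psi(phi2),
    where F_psi(phi) is the constant function with value d_G(phi, psi)."""
    return dmatch_constant(d_G(phi1, psi), d_G(phi2, psi))

random.seed(1)
n = 10
phi1 = [random.random() for _ in range(n)]
phi2 = [random.random() for _ in range(n)]
others = [[random.random() for _ in range(n)] for _ in range(50)]

target = d_G(phi1, phi2)
attained = lower_bound_via(phi2, phi1, phi2)      # the choice psi = phi2
never_exceeds = all(
    lower_bound_via(psi, phi1, phi2) <= target + 1e-12 for psi in others
)
```

The operator $F_{\varphi_2}$ alone already realizes the value $d_G(\varphi_1,\varphi_2)$, while the other operators $F_\psi$ tested here never exceed it, consistently with Theorem 14.
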

The following two results (Theorem 16 and Corollary 18) hold when the metric space $(\Phi,d_\infty)$ is compact.

###### Theorem 16

If the metric space $(\Phi,d_\infty)$ is compact, then also the metric space $(\mathcal{F}(\Phi,G),d_{\mathcal{F}})$ is compact.

###### Proof

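Since $\Phi$ is bounded, Proposition 7 guarantees that the distance $d_{\mathcal{F}}$ is defined. Furthermore, $(\mathcal{F}(\Phi,G),d_{\mathcal{F}})$ is a metric space, hence it will suffice to prove that it is sequentially compact. Therefore, let us assume that a sequence $(F_i)$ in $\mathcal{F}(\Phi,G)$ is given.

Since $\Phi$ is a compact (and hence separable) metric space, we can find a countable and dense subset $\Phi^*=\{\varphi_j\}_{j\in\mathbb{N}}$ of $\Phi$. We can extract a subsequence $(F_{i_h})$ from $(F_i)$, such that for every fixed index $j$ the sequence $(F_{i_h}(\varphi_j))$ converges to a function in $\Phi$ with respect to the sup-norm. (This follows by recalling that $F_i(\varphi_j)\in\Phi$ for every index $i$, with $\Phi$ compact, and by applying a classical diagonalization argument.)

Now, let us consider the operator $\bar F:\Phi\to\Phi$ defined in the following way.

We define $\bar F$ on $\Phi^*$ by setting $\bar F(\varphi_j):=\lim_{h\to\infty}F_{i_h}(\varphi_j)$ for each $\varphi_j\in\Phi^*$.

Then we extend $\bar F$ to $\Phi$ as follows. For each $\varphi\in\Phi$ we choose a sequence $(\varphi_{j_r})$ in $\Phi^*$, converging to $\varphi$ in $\Phi$, and set $\bar F(\varphi):=\lim_{r\to\infty}\bar F(\varphi_{j_r})$. We claim that such a limit exists in $\Phi$ and does not depend on the sequence that we have chosen, converging to $\varphi$ in $\Phi$. In order to prove that the previous limit exists, we observe that for every $r,s\in\mathbb{N}$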

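$$\begin{aligned}\left\|\bar F(\varphi_{j_r})-\bar F(\varphi_{j_s})\right\|_\infty&=\left\|\lim_{h\to\infty}F_{i_h}(\varphi_{j_r})-\lim_{h\to\infty}F_{i_h}(\varphi_{j_s})\right\|_\infty\\&=\lim_{h\to\infty}\left\|F_{i_h}(\varphi_{j_r})-F_{i_h}(\varphi_{j_s})\right\|_\infty\\&\le\lim_{h\to\infty}\left\|\varphi_{j_r}-\varphi_{j_s}\right\|_\infty=\left\|\varphi_{j_r}-\varphi_{j_s}\right\|_\infty\end{aligned}\tag{3.1}$$

because each operator $F_{i_h}$ is non-expansive.

Since the sequence $(\varphi_{j_r})$ converges to $\varphi$ in $\Phi$, it follows that $(\bar F(\varphi_{j_r}))$ is a Cauchy sequence. The compactness of $\Phi$ implies that $(\bar F(\varphi_{j_r}))$ converges in $\Phi$.

If another sequence $(\varphi_{k_r})$ is given in $\Phi^*$, converging to $\varphi$ in $\Phi$, then for every index $r$

$$\left\|\bar F(\varphi_{j_r})-\bar F(\varphi_{k_r})\right\|_\infty\le\left\|\varphi_{j_r}-\varphi_{k_r}\right\|_\infty$$

and the proof goes as in (3.1) with $\varphi_{j_s}$ replaced by $\varphi_{k_r}$.

Since both $(\varphi_{j_r})$ and $(\varphi_{k_r})$ converge to $\varphi$, it follows that $\lim_{r\to\infty}\bar F(\varphi_{j_r})=\lim_{r\to\infty}\bar F(\varphi_{k_r})$. Therefore the definition of $\bar F(\varphi)$ does not depend on the sequence that we have chosen, converging to $\varphi$.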

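Now we have to prove that $\bar F\in\mathcal{F}(\Phi,G)$, i.e., that $\bar F$ verifies the three properties defining this set of operators.

We have already seen that $\bar F:\Phi\to\Phi$.

For every $\varphi,\varphi'\in\Phi$ we can consider two sequences $(\varphi_{j_r})$, $(\varphi_{k_r})$ in $\Phi^*$, converging to $\varphi$ and $\varphi'$ in $\Phi$, respectively. Due to the fact that the operators $F_{i_h}$ are non-expansive, we have that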

 ∥∥¯F(φ)−¯F(φ′)∥∥∞=∥∥limr→∞¯F(φjr)−limr→∞¯F(φkr)∥∥∞=∥∥∥limr→∞limh→∞(Fih(φjr))−limr→∞limh→∞(Fi