The amount of available data in both academia and industry has grown exponentially over the past few decades, making data analysis ubiquitous in many different fields of science. Machine learning has proved to be one of the most prominent fields of data science, leading to astounding results in various applications, such as image and signal processing. Topological Data Analysis (TDA) [Carlsson09a] is one specific field of machine learning, which focuses more on complex rather than big data. The general assumption of TDA is that data is actually sampled from geometric or low-dimensional domains, whose geometric features are relevant to the analysis. These geometric features are usually encoded in a mathematical object called a persistence diagram, which is roughly a set of points in the plane, each point representing a topological feature whose size is encoded in the coordinates of the point. Persistence diagrams have been shown to bring complementary information to other traditional descriptors in many different applications, often leading to large improvements in results. This is also due to the so-called stability properties of persistence diagrams, which state that persistence diagrams computed on similar data are also very close in the diagram distances [Cohen07, Bauer13b, Chazal16a].
Unfortunately, the use of persistence diagrams in machine learning methods is not straightforward, since many algorithms expect data to be Euclidean vectors, while persistence diagrams are sets of points with possibly different cardinalities. Moreover, the diagram distances used to compare persistence diagrams are computed with optimal matchings, and are thus quite different from Euclidean metrics. The usual way to cope with such difficult data is to use kernel methods. A kernel is a symmetric function on the data whose evaluation on a pair of data points equals the scalar product of the images of these points under a feature map into a Hilbert space, called the Reproducing Kernel Hilbert Space of the kernel. Many algorithms can be kernelized, such as PCA and SVM, allowing one to handle non-Euclidean data as soon as a kernel or a feature map is available.
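To make the relation between a kernel and its feature map concrete, here is a minimal sketch on plain Euclidean data rather than on persistence diagrams. The homogeneous quadratic kernel and the function names are illustrative choices, not objects taken from this article. The sketch checks that evaluating the kernel agrees with the scalar product of explicit feature vectors, and that distances in the feature space can be obtained from kernel evaluations alone.

```python
import math

def feature_map(x):
    """Explicit feature map of the quadratic kernel on R^2 (illustrative choice)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def kernel(x, y):
    """k(x, y) = <x, y>^2, computed without building any feature vector."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2

def dot(u, v):
    """Euclidean scalar product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))

def feature_distance(x, y):
    """Distance between the images of x and y, from kernel evaluations only."""
    squared = kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)
    return math.sqrt(max(squared, 0.0))

x, y = (1.0, 2.0), (3.0, -1.0)
print(kernel(x, y), dot(feature_map(x), feature_map(y)))
print(feature_distance(x, y))
```

The same identity is what allows algorithms such as kernel PCA or SVM to operate on persistence diagrams once a kernel is available, even when the feature map is only implicit.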
Hence, the question of defining a feature map into a Hilbert space has been intensively studied in the past few years, and, as of today, various methods can be implemented, either into finite or infinite dimensional Hilbert spaces [Bubenik15, Carriere15a, Reininghaus15, Kusano16, Adams17, Carriere17e, Hofer17]. Since persistence diagrams are known to enjoy stability properties, it is also natural to ask for the same guarantee for their embeddings. Indeed, all feature maps defined in the literature satisfy a stability property stating that the Hilbert distance between the images of the persistence diagrams is upper bounded by the diagram distances. A more difficult question is whether a lower bound also holds. Even though one attempt has already been made to show such a lower bound for the so-called Sliced Wasserstein distance in [Carriere17e], the question remains open in general.
In this article, we tackle the general question of defining bi-Lipschitz embeddings of persistence diagrams into separable Hilbert spaces. More precisely, we show that:
Such a bi-Lipschitz embedding does not exist if the Hilbert space is finite dimensional (Theorem 4.4).
Finally, we also provide experimental evidence of this behavior by computing the metric distortions of various feature maps for persistence diagrams with increasing cardinalities.
Feature maps for persistence diagrams can be divided into two classes, depending on whether the corresponding Hilbert space is finite or infinite dimensional.
In the infinite dimensional case, the first attempt was the one proposed in [Bubenik15], in which persistence diagrams are turned into functions, called Landscapes, by computing the homological rank functions given by the persistence diagram points. Another common way to define a feature map is to see the points of the persistence diagrams as centers of Gaussians with a fixed bandwidth, weighted by the distance of the point to the diagonal. This is the approach originally advocated in [Reininghaus15], and later generalized in [Kusano17], leading to the so-called Persistence Scale Space and Persistence Weighted Gaussian feature maps. Another possibility is to define a Gaussian-like feature map by using the Sliced Wasserstein distance between persistence diagrams, which is conditionally negative definite. This implicit feature map, called the Sliced Wasserstein map, was defined in [Carriere17e].
In the finite dimensional case, many different possibilities are available. One may consider evaluating a family of tropical polynomials on the persistence diagram [Verovsek16], taking the sorted vector of the pairwise distances between the persistence diagram points [Carriere15a], or computing the coefficients of a complex polynomial whose roots are given by the persistence diagram points [diFabio15]. Another line of work was proposed in [Adams17] by discretizing the Persistence Scale Space feature map. The idea is to discretize the plane into a fixed grid, and then compute a value for each pixel by integrating Gaussian functions centered on the persistence diagram points. Finally, persistence diagrams have been incorporated in deep learning frameworks in [Hofer17].
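The grid-based discretization described above can be sketched as follows. This is a simplified illustration in the spirit of [Adams17], not a reproduction of that method: the change of coordinates to (birth, persistence), the weighting of each Gaussian by the persistence of its point, the grid bounds and the bandwidth are all assumptions made for the example. Each pixel value is the exact integral of the weighted Gaussians over that pixel, obtained from the Gaussian cumulative distribution function.

```python
import math

def normal_cdf(t, mu, sigma):
    """Cumulative distribution function of a 1D Gaussian."""
    return 0.5 * (1.0 + math.erf((t - mu) / (sigma * math.sqrt(2.0))))

def persistence_image(diagram, resolution=10, lo=0.0, hi=1.0, sigma=0.1):
    """Flattened resolution x resolution image of a diagram (list of (birth, death)).

    Assumptions of this sketch: points are moved to (birth, persistence)
    coordinates and each isotropic Gaussian is weighted by the persistence.
    """
    step = (hi - lo) / resolution
    edges = [lo + i * step for i in range(resolution + 1)]

    def bin_masses(center):
        # Mass of a 1D Gaussian centered at `center` in each grid interval.
        return [normal_cdf(edges[i + 1], center, sigma)
                - normal_cdf(edges[i], center, sigma)
                for i in range(resolution)]

    image = [[0.0] * resolution for _ in range(resolution)]
    for birth, death in diagram:
        pers = death - birth
        mass_x = bin_masses(birth)
        mass_y = bin_masses(pers)
        for row in range(resolution):
            for col in range(resolution):
                # An isotropic 2D Gaussian integrates as a product of 1D masses.
                image[row][col] += pers * mass_y[row] * mass_x[col]
    return [value for row in image for value in row]

vector = persistence_image([(0.3, 0.8), (0.1, 0.2)])
print(len(vector))
```

Whatever the cardinality of the input diagram, the output has the same fixed length, which is what makes such vectorizations convenient for standard machine learning pipelines.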
2.1 Persistence Diagrams
Persistent homology is a technique of TDA coming from algebraic topology that allows the user to compute and encode topological information of datasets in a compact descriptor called the persistence diagram. Given a dataset, often given in the form of a point cloud in Euclidean space, and a continuous real-valued function on this dataset, the persistence diagram of the function can be computed under mild conditions (the function has to be tame, see [Chazal16a] for more details), and consists of a finite set of points with multiplicities in the half-plane above the diagonal. This set of points is computed from the family of sublevel sets of the function, that is, the sets of points whose function value does not exceed a given threshold. More precisely, persistence diagrams encode the different topological events that occur as the threshold increases from −∞ to +∞. Such topological events include creation and merging of connected components and cycles in every dimension; see Figure 1. Intuitively, persistent homology records, for each topological feature that appears in the family of sublevel sets, the value at which the feature appears, called the birth value, and the value at which it gets merged or filled in, called the death value. These values are then used as coordinates for a corresponding point in the persistence diagram. Note that several features may have the same birth and death values, so points in the persistence diagram have multiplicities. Moreover, since the birth value of a feature never exceeds its death value, these points are always located above the diagonal. A general intuition about persistence diagrams is that the distance of a point to the diagonal is a direct measure of its relevance: if a point is close to the diagonal, it means that the corresponding cycle got filled in right after its appearance, thus suggesting that it is likely due to noise in the dataset. On the contrary, points that are far away from the diagonal represent cycles with a significant life span, and are more likely to be relevant for the analysis. We refer the interested reader to [Edelsbrunner10, Oudot15] for more details about persistent homology.
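As an illustration of how birth and death values arise from sublevel sets, the following sketch treats the simplest possible setting: a function sampled along a line (a path graph), where only connected components are tracked. The function name and the union-find bookkeeping are illustrative assumptions; general persistent homology software handles arbitrary dimensions and complexes. Vertices enter the sublevel set in increasing order of value, each local minimum gives birth to a component, and when two components meet, the one born later dies.

```python
def sublevel_persistence_0d(values):
    """Finite (birth, death) pairs of connected components for a function
    sampled on a path: vertex i is adjacent to vertices i - 1 and i + 1.

    The component of the global minimum never dies and is omitted here.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    parent = [None] * n  # None means: not yet in the sublevel set
    birth = {}           # root vertex -> birth value of its component

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    diagram = []
    for v in order:
        parent[v] = v
        birth[v] = values[v]
        for u in (v - 1, v + 1):
            if 0 <= u < n and parent[u] is not None:
                ru, rv = find(u), find(v)
                if ru == rv:
                    continue
                # The component born later dies when the two meet at values[v].
                young, old = (ru, rv) if birth[ru] > birth[rv] else (rv, ru)
                if values[v] > birth[young]:
                    diagram.append((birth[young], values[v]))
                parent[young] = old
    return sorted(diagram)

print(sublevel_persistence_0d([0.0, 2.0, 1.0, 3.0, 0.5]))
```

In this example the local minima at values 1.0 and 0.5 create components that merge into the component of the global minimum at thresholds 2.0 and 3.0 respectively, giving one diagram point per merge.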
Let be the space of persistence diagrams with countable number of points. More formally, can be equivalently defined as a functional space , where each point is a point in the corresponding persistence diagram with multiplicity . Let be the space of persistence diagrams with less than points, i.e., . Let be the space of persistence diagrams included in , i.e., . Finally, let be the space of persistence diagrams with less than points included in , i.e., . Obviously, we have the following sequences of (strict) inclusions: , and .
Persistence diagrams can be efficiently compared using the diagram distances, which is a family of distances parametrized by an integer that rely on the computation of partial matchings. Recall that two persistence diagrams and may have different number of points. A partial matching between and is a subset of . It comes along with (resp. ), which is the set of points of (resp. ) that are not matched to a point of (resp. ) by . The -cost of is given as:
The -diagram distance is then defined as the cost of the best partial matching:
Given two persistence diagrams and , the -diagram distance is defined as:
Note that in the literature, these distances are often called the Wasserstein distances between persistence diagrams. Here, we follow the denomination of [Carriere17e]. In particular, taking a maximum instead of a sum in the definition of the cost,
allows one to add one more distance to the family, the bottleneck distance .
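For very small diagrams, the diagram distances can be computed by exhaustively enumerating partial matchings, as in the sketch below. It assumes a common convention: matched pairs are compared in the sup norm, unmatched points are charged their sup-norm distance to the diagonal, the costs raised to the power p are summed and the p-th root is taken at the end, and a maximum replaces the sum for the bottleneck distance. These conventions and the function names are assumptions of the example. The enumeration is exponential in the number of points, so practical computations rely on optimal assignment solvers such as the Hera library cited later in the article.

```python
import math

def sup_dist(p, q):
    """Sup-norm distance between two points of the plane."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

def dist_to_diagonal(p):
    """Sup-norm distance from a point (birth, death) to the diagonal."""
    return abs(p[1] - p[0]) / 2.0

def diagram_distance(D1, D2, p=1.0):
    """Brute-force p-diagram distance; p = math.inf gives the bottleneck distance."""
    use_max = math.isinf(p)

    def term(x):
        return x if use_max else x ** p

    def combine(a, b):
        return max(a, b) if use_max else a + b

    def best(i, unused):
        # `unused` holds the indices of the points of D2 that are still unmatched.
        if i == len(D1):
            cost = 0.0
            for j in unused:
                cost = combine(cost, term(dist_to_diagonal(D2[j])))
            return cost
        # Option 1: leave D1[i] unmatched, i.e. send it to the diagonal.
        result = combine(term(dist_to_diagonal(D1[i])), best(i + 1, unused))
        # Option 2: match D1[i] with one of the remaining points of D2.
        for j in unused:
            rest = tuple(k for k in unused if k != j)
            candidate = combine(term(sup_dist(D1[i], D2[j])), best(i + 1, rest))
            result = min(result, candidate)
        return result

    total = best(0, tuple(range(len(D2))))
    return total if use_max else total ** (1.0 / p)

D1, D2 = [(0.0, 4.0)], [(1.0, 4.0), (2.0, 3.0)]
print(diagram_distance(D1, D2, 1.0), diagram_distance(D1, D2, math.inf))
```

In this example the best matching pairs (0, 4) with (1, 4) and sends (2, 3) to the diagonal, which illustrates how diagrams with different numbers of points are compared.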
A useful property of persistence diagrams is stability. Indeed, it is well known in the literature that persistence diagrams computed from close functions are close themselves in the bottleneck distance:
Theorem 2.2 ([Cohen07, Chazal16a]).
Given two tame functions , one has the following inequality:
In other words, the map is 1-Lipschitz. Note that stability results exist as well for the other diagram distances, but these results are weaker than the above Lipschitz condition, and they require more conditions—see [Oudot15].
2.2 Bi-Lipschitz embeddings.
The main question that we address in this article is that of preserving the metric properties of persistence diagrams when using embeddings into Hilbert spaces. For instance, one may ask the images of persistence diagrams under a feature map into a Hilbert space to be stable as well. A natural question is then whether a lower bound also holds, i.e., whether the feature map is a bi-Lipschitz embedding between and .
Let and be two metric spaces. A bi-Lipschitz embedding between and is a map such that there exist constants such that:
for any . The metrics and are called strongly equivalent, and the constants and are called the lower and upper metric distortion bounds respectively. If , is called an isometric embedding.
Note that this definition is equivalent to the commonly used definition that additionally requires .
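On a finite sample, the metric distortion bounds of a map can be estimated by taking the extreme values of the ratio between distances after and before applying the map. The sketch below assumes the usual orientation of the definition, in which the distance between images is bounded below and above by constant multiples of the distance in the source space. The function names and the toy example (the plane with the sup norm mapped identically to the plane with the Euclidean norm) are illustrative only. Such empirical values only bound the true constants from one side: the true lower constant can be smaller and the true upper constant larger than what a finite sample reveals.

```python
import itertools
import math

def distortion_bounds(points, d_source, d_target, embed):
    """Smallest and largest ratio d_target(embed(x), embed(y)) / d_source(x, y)
    over all pairs of distinct sample points."""
    ratios = []
    for x, y in itertools.combinations(points, 2):
        dx = d_source(x, y)
        if dx > 0:
            ratios.append(d_target(embed(x), embed(y)) / dx)
    return min(ratios), max(ratios)

def sup_norm_dist(x, y):
    return max(abs(a - b) for a, b in zip(x, y))

def euclidean_dist(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

sample = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 2.0)]
lower, upper = distortion_bounds(sample, sup_norm_dist, euclidean_dist, lambda x: x)
print(lower, upper)
```

For this toy map the ratios stay between 1 and the square root of 2, so the identity is bi-Lipschitz between the two norms; the article studies what happens to such ratios when the source space is a space of persistence diagrams.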
Finding an isometric embedding of persistence diagrams into a Hilbert space is impossible since geodesics are unique in a Hilbert space while this is not the case for persistence diagrams, as shown in the proof of Proposition 2.4 in [Turner14].
For feature maps that are bounded, i.e., those maps such that there exists a constant for which for all , it is obviously impossible to find a bi-Lipschitz embedding. This includes, for instance, the Sliced Wasserstein (SW) feature map [Carriere17e], which is defined implicitly from a Gaussian-like function. However, note that if the SW feature map is restricted to a set of persistence diagrams which are close to each other with respect to the SW distance, then the distance in the Hilbert space corresponding to the SW feature map is actually equivalent to the square root of the SW distance. Hence, we include the square root of the SW distance in our experiment in Section 5.
3 Mapping into separable Hilbert spaces
In our first main result, we use separability to determine whether a bi-Lipschitz embedding can exist between the space of persistence diagrams and a Hilbert space.
A metric space is called separable if it has a dense countable subset.
For instance, the following three Hilbert spaces (equipped with their canonical metrics) are separable: , and , where is separable. The two following results describe well-known properties of separable spaces.
Any subspace of a separable metric space is separable as well.
Let and be two metric spaces, and assume there is a bi-Lipschitz embedding , with Lipschitz constants and . Then is separable if and only if is separable.
The following lemma shows that for a feature map which is bi-Lipschitz when restricted to , the limits of the corresponding constants can actually be used to study the general metric distortion in .
Let and let be a metric on persistence diagrams such that is continuous with respect to on . Let
Since is nonincreasing and is nonincreasing with respect to and , we define:
We define , , similarly, since is nondecreasing with respect to and . Then the following inequalities hold:
Note that , , , , and may be equal to or , so it does not necessarily hold that and are strongly equivalent on , or .
We only prove the last inequality, since the proof extends verbatim to the other two. Pick any two persistence diagrams . Let be an optimal partial matching achieving , where (resp. ) is either in (resp. ) or in (resp. ). Given , we define two sequences of persistence diagrams and recursively with and:
Let us define
Note that both and are nondecreasing. We have and thus:
Now, since when , we have by continuity of , and similarly . Hence, we have and with the triangle inequality. We finally obtain the desired inequality by letting in (2). ∎
A corollary of the previous results is that even if a feature map taking values in a separable Hilbert space might be bi-Lipschitz when restricted to , the corresponding bounds have to go to 0 or +∞ as soon as the domain of the feature map is not separable.
Let be a feature map defined on a non-separable subspace of persistence diagrams containing every , i.e., for each . Assume takes values in a separable Hilbert space , and that is bi-Lipschitz on each with constants . Then either or when .
Many feature maps defined in the literature, such as the Persistence Weighted Gaussian feature map [Kusano17] or the Landscape feature map [Bubenik15], actually take values in the separable function space , where is the upper half-plane . Hence, to illustrate how Theorem 3.5 applies to these feature maps, we now provide two lemmata. In the first one, we define a set which is not separable with respect to , and in the second one, we show that is actually included in the domain of these feature maps.
Consider the sequence of points , and define the set , where is the set of sequences with values in , with: . Then is not separable.
First note that since the sequences can have infinite support, the spaces and are not countable.
Let be the equivalence relation on defined with:
where denotes the symmetric difference of sets. Since the set of sequences with finite support is countable, it follows that each equivalence class is countable as well. In particular, this means that the set of equivalence classes is uncountable, since otherwise would be countable as a countable union of countable equivalence classes.
We now prove the result by contradiction. Assume that is separable, and let be the corresponding dense countable subset of . Let . Then for each , there is at least one sequence such that and . We now claim that every such satisfies . Indeed, assume and let . Then, since , we would have
which is not possible. Hence, this means that . However, we showed that is uncountable, meaning that is uncountable as well, which leads to a contradiction since is countable by assumption. ∎
We now show that the Persistence Weighted Gaussian and the Landscape feature maps are well-defined on the set . Let us first formally define these feature maps.
Given , , let be the triangular function defined with if and 0 otherwise. Then, given a persistence diagram , let , where kmax denotes the -th largest element. The Landscape feature map is defined as:
Let be a weight function and . The Persistence Weighted Gaussian feature map is defined as:
Let be the weight function . Let be the set of persistence diagrams defined in Lemma 3.6. Then:
Let be the sequence defined with if and otherwise. To show the desired result, it suffices to show that and are Cauchy sequences in . Let , and let us study for each feature map.
Case . We have the following inequalities:
The result simply follows from the fact that is convergent and Cauchy.
Case . Since all triangular functions, as defined in Definition 3.7, have disjoint support, it follows that the only non-zero lambda function is , where is a triangular function defined with if and 0 otherwise. See Figure 2.
Hence, we have the following inequalities:
Again, the result follows from the fact that is convergent and Cauchy. ∎
Proposition 3.9 shows that Theorem 3.5 applies (with the metric between persistence diagrams) to the Persistence Weighted Gaussian feature map with weight function —actually, any weight function that is equivalent to when goes to 0—and the Landscape feature map. In particular, any lower bound for these maps has to go to 0 when since an upper bound exists for these maps due to their stability properties—see Corollary 15 in [Bubenik15] and Proposition 3.4 in [Kusano17].
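For readers who want to experiment with the Landscape feature map of Definition 3.7, here is a minimal pointwise evaluation. It assumes the standard normalization of [Bubenik15], in which the triangular function of a point (b, d) at t is the larger of 0 and the smaller of t - b and d - t; the normalization used in this article may differ by a change of coordinates. The function names are illustrative.

```python
def triangle(point, t):
    """Triangular function attached to a diagram point (birth, death)."""
    birth, death = point
    return max(0.0, min(t - birth, death - t))

def landscape(diagram, k, t):
    """k-th landscape function at t: k-th largest triangle value (k starts at 1)."""
    values = sorted((triangle(p, t) for p in diagram), reverse=True)
    return values[k - 1] if k <= len(values) else 0.0

nested = [(0.0, 4.0), (1.0, 3.0)]
print(landscape(nested, 1, 2.0), landscape(nested, 2, 2.0))

# When the triangles have pairwise disjoint supports, as in the proof of
# Proposition 3.9, only the first landscape function is non-zero.
disjoint = [(0.0, 1.0), (2.0, 3.0)]
grid = [i / 10.0 for i in range(31)]
print(max(landscape(disjoint, 2, t) for t in grid))
```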
4 Mapping into finite-dimensional Hilbert spaces
In our second main result, we show that more can be said about feature maps into (equipped with the Euclidean metric), using the so-called Assouad dimension. This covers all vectorization methods for persistence diagrams that we described in the related work.
The following definition and example are taken from paragraph 10.13 of [Heinonen01].
Let be a metric space. Given a subset and , let be the least number of open balls of radius less than or equal to that can cover . The Assouad dimension of is:
Intuitively, the Assouad dimension measures the number of open balls needed to cover an open ball of larger radius. For example, the Assouad dimension of a finite-dimensional Euclidean space equals its dimension. Moreover, the Assouad dimension is preserved by bi-Lipschitz embeddings.
Proposition 4.2 (Lemma 9.6 in [Robinson10]).
Let and be metric spaces with a bi-Lipschitz embedding . Then .
We now show that cannot be embedded into with bi-Lipschitz embeddings. The proof of this fact is a consequence of the following lemma:
Let , , and . Then .
Let denote an open ball with . We want to show that, for any and , it is possible to find a persistence diagram , a radius and a factor such that the number of open balls of radius at most needed to cover is strictly larger than . To this end, we pick arbitrary and . The idea of the proof is to define as the empty diagram, and to derive a lower bound on the number of balls with radius needed to cover by considering persistence diagrams with one point evenly distributed on the line such that the distance between two consecutive points is in the -distance. Indeed, the pairwise distance between any two such persistence diagrams is sufficiently large so that they must belong to different balls. Then we can control the number of persistence diagrams, and thus the number of balls, by taking sufficiently small.
More formally, let . We want to show that we have at least balls in the cover, meaning that . Let and . We define a cover of with open balls of radius less than centered on a family as follows:
We now define particular persistence diagrams which all lie in different elements of the cover (3). For any , we let denote the persistence diagram containing only the point . It is clear that each is in . See Figure 3.
Moreover, since , it also follows that .
Hence, according to (3), for each there exists an integer such that . Finally, note that . Indeed, assuming that there are such that , and since the distance between and is always obtained by matching their points to the diagonal, we reach a contradiction with the following application of the triangle inequality:
This observation shows that there are at least different open balls in the cover (3), which concludes the proof. ∎
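The counting argument can be checked numerically in a simplified setting. The sketch below is not the exact construction of the proof: it uses the bottleneck distance only, for which the distance between two one-point diagrams has a closed form (match the two points, or send both to the diagonal, whichever is cheaper), and it places the points at sup-norm distance delta from the diagonal with spacing delta inside the unit square. These choices and the function names are assumptions of the example. All such diagrams lie within distance delta of the empty diagram, yet they are pairwise at distance delta, so any ball of radius smaller than delta / 2 contains at most one of them.

```python
import itertools

def one_point_bottleneck(p, q):
    """Bottleneck distance between the one-point diagrams {p} and {q}."""
    matched = max(abs(p[0] - q[0]), abs(p[1] - q[1]))
    to_diagonal = max((p[1] - p[0]) / 2.0, (q[1] - q[0]) / 2.0)
    return min(matched, to_diagonal)

def separated_family(delta, side=1.0):
    """Points (k * delta, (k + 2) * delta) inside [0, side]^2, k = 0, 1, ...

    Each point is at sup-norm distance delta from the diagonal.
    """
    points = []
    k = 0
    while (k + 2) * delta <= side:
        points.append((k * delta, (k + 2) * delta))
        k += 1
    return points

for delta in (1.0 / 8, 1.0 / 64, 1.0 / 512):
    family = separated_family(delta)
    pairwise = {one_point_bottleneck(p, q)
                for p, q in itertools.combinations(family, 2)}
    # The family size grows like 1 / delta while all pairwise distances equal delta.
    print(delta, len(family), pairwise)
```

Since the ratio between the radius of the ball around the empty diagram and the radius of the covering balls stays fixed while the number of required balls grows without bound as delta shrinks, no finite exponent can control the covering numbers, which is the mechanism behind the infinite Assouad dimension.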
Let and . Then, for any and , there is no bi-Lipschitz embedding between and .
Interestingly, the integers and are independent in Theorem 4.4: even if one restricts to persistence diagrams with only one point, it is still impossible to find a bi-Lipschitz embedding into , whatever is.
In this section, we illustrate our main results by computing the lower metric distortion bounds for the main stable feature maps in the literature. We use persistence diagrams with an increasing number of points to experimentally observe the convergence of this bound to 0, as described in Theorem 3.5. More precisely, we generate 100 persistence diagrams for each cardinality in a range going from 10 to 1000 by uniformly sampling points in the unit upper half-square . See Figure 4 for an illustration.
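The sampling step can be sketched as follows, under the assumption that the unit upper half-square denotes the part of the unit square lying on or above the diagonal. Sorting two independent uniform coordinates yields a uniform sample of that triangle. The function name, the seed and the list of cardinalities are illustrative and do not reproduce the exact experimental protocol.

```python
import random

def sample_diagram(cardinality, rng):
    """Sample `cardinality` points uniformly in {(x, y) in [0, 1]^2 : y >= x}."""
    points = []
    for _ in range(cardinality):
        u, v = rng.random(), rng.random()
        # Ordering the two coordinates folds the unit square onto the upper triangle.
        points.append((min(u, v), max(u, v)))
    return points

rng = random.Random(0)  # fixed seed, for reproducibility of the sketch
cardinalities = [10, 100, 1000]  # illustrative subset of the range 10 to 1000
diagrams = {n: [sample_diagram(n, rng) for _ in range(100)] for n in cardinalities}
print({n: len(batch) for n, batch in diagrams.items()})
```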
Then, we consider the following feature maps:
the Persistence Weighted Gaussian with unit bandwidth (PWG) [Kusano17],
the Persistence Scale Space with unit bandwidth (PSS) [Reininghaus15],
the Landscape (LS) [Bubenik15],
the Persistence Image with resolution 10 x 10 and unit bandwidth (IM) [Adams17],
the Topological Vector with 10 dimensions (TV) [Carriere15a].
Since most of these feature maps enjoy stability properties with respect to the first diagram distance , we compute the ratios between the metrics in the Hilbert spaces corresponding to these feature maps and . Moreover, we also look at the ratio induced by the square root of the Sliced Wasserstein distance (SW) [Carriere17e], as suggested by Remark 2.5. All feature maps were computed with the sklearn-tda library (https://github.com/MathieuCarriere/sklearn_tda), which uses Hera (https://bitbucket.org/grey_narn/hera) [Kerber17] as backend to compute the first diagram distances between pairs of persistence diagrams. These ratios are then displayed as boxplots in Figure 5.
It is clear from Figure 5 that the extreme values of these ratios (the upper tail of the ratio distributions) increase with the cardinality of the persistence diagrams, as expected from Theorem 3.5. This is especially interesting in the case of the Sliced Wasserstein distance: the lower bound proved in [Carriere17e] increases with the number of points in the diagrams, and whether this bound is tight, i.e., whether a lower bound that is oblivious to the number of points can be derived, is still an open question. Figure 5 suggests empirically that no such cardinality-independent bound exists. It is also interesting to notice that the divergence speed of these ratios differs from one feature map to another. More precisely, the metric distortion bounds seem to increase linearly with the cardinalities for the TV and LS feature maps and the Sliced Wasserstein distance, while they increase much more slowly for the other feature maps.
In this article, we provided two important theoretical results about the embedding of persistence diagrams in separable Hilbert spaces, which is a common technique in TDA to feed machine learning algorithms with persistence diagrams. Indeed, most of the recent attempts have defined feature maps for persistence diagrams into Hilbert spaces and showed that these maps are stable with respect to the first diagram distance, while leaving open whether a lower bound holds as well. In this work, we proved that this is never the case if the Hilbert space is finite dimensional, and that such a lower bound has to go to zero with the number of points for most other feature maps in the literature. We also provided experiments that confirm this result, by showing a clear increase of the metric distortion with the number of points for persistence diagrams generated uniformly in the unit upper half-square.