1 Introduction
The amount of available data in both academia and industry has increased exponentially over the past few decades, making data analysis ubiquitous in many different fields of science. Machine learning has proved to be one of the most prominent fields of data science, leading to astounding results in various applications, such as image and signal processing. Topological Data Analysis (TDA) [CAR09] is one specific field of machine learning, which focuses on complex rather than big data. The general assumption of TDA is that data is actually sampled from geometric or low-dimensional domains, whose geometric features are relevant to the analysis. These geometric features are usually encoded in a mathematical object called the persistence diagram, which is roughly a set of points in the plane, each point representing a topological feature whose size is encoded in the coordinates of the point. Persistence diagrams have been shown to bring information complementary to other traditional descriptors in many different applications, often leading to large improvements in results. This is also due to the so-called stability properties of persistence diagrams, which state that persistence diagrams computed on similar data are also very close in the diagram distances [CEH07, BL15, CdG+16].
Unfortunately, the use of persistence diagrams in machine learning methods is not straightforward, since many algorithms expect data to be Euclidean vectors, while persistence diagrams are sets of points with possibly different cardinalities. Moreover, the diagram distances used to compare persistence diagrams are computed with optimal matchings, and are thus quite different from Euclidean metrics. The usual way to cope with such difficult data is to use kernel methods. A kernel is a symmetric function on the data whose evaluation on a pair of data points equals the scalar product of the images of these points under a feature map into a Hilbert space, called the Reproducing Kernel Hilbert Space of the kernel. Many algorithms, such as PCA and SVM, can be kernelized, allowing one to handle non-Euclidean data as soon as a kernel or a feature map is available.
Hence, the question of defining a feature map into a Hilbert space has been intensively studied in the past few years, and, as of today, various methods can be implemented, into either finite- or infinite-dimensional Hilbert spaces [BUB15, COO15, RHB+15, KFH16, AEK+17, CCO17, HKN+17]. Since persistence diagrams are known to enjoy stability properties, it is also natural to ask for the same guarantee for their embeddings. Accordingly, all feature maps defined in the literature satisfy a stability property stating that the Hilbert distance between the images of the persistence diagrams is upper bounded by the diagram distances. A more difficult question is whether a lower bound also holds. Even though one attempt has already been made to show such a lower bound for the so-called Sliced Wasserstein distance in [CCO17], the question remains open in general.
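The defining identity of a kernel also gives access to distances between images: for a kernel $k$ with feature map $\phi$, $\|\phi(x)-\phi(y)\|^2 = k(x,x)+k(y,y)-2k(x,y)$. This is how the Hilbert distance induced by an implicit feature map can be evaluated, and hence how its metric distortion can be measured, without ever computing the feature map itself. The following is a minimal Python sketch; the function names are illustrative, and the linear kernel is used only because its feature map (the identity) is explicit, so the result can be checked against the Euclidean distance.

```python
import math

def kernel_distance(k, x, y):
    """Hilbert distance between the images of x and y, using kernel values only:
    ||phi(x) - phi(y)||^2 = k(x, x) + k(y, y) - 2 k(x, y)."""
    sq = k(x, x) + k(y, y) - 2.0 * k(x, y)
    return math.sqrt(max(sq, 0.0))  # clip tiny negative values caused by round-off

def linear_kernel(u, v):
    # Kernel whose feature map is the identity on R^n: k(u, v) = <u, v>
    return sum(a * b for a, b in zip(u, v))

print(kernel_distance(linear_kernel, (0.0, 0.0), (3.0, 4.0)))  # Euclidean distance 5.0
```

Any kernel on persistence diagrams can be passed as `k` in place of `linear_kernel`.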
Contributions.
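In this article, we tackle the general question of defining bi-Lipschitz embeddings of persistence diagrams into separable Hilbert spaces. More precisely, we show that:

- such a bi-Lipschitz embedding does not exist if the Hilbert space is finite dimensional (Theorem 4.4),
- if the Hilbert space is separable and the domain of the feature map is not, the metric distortion bounds of a feature map that is bi-Lipschitz on persistence diagrams with at most $N$ points must degenerate as $N$ grows, i.e., the lower bound goes to 0 or the upper bound goes to infinity (Theorem 3.5).
Finally, we also provide experimental evidence of this behavior by computing the metric distortions of various feature maps for persistence diagrams with increasing cardinalities.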
Related work.
Feature maps for persistence diagrams can be classified into two classes, depending on whether the corresponding Hilbert space is finite or infinite dimensional.
In the infinite-dimensional case, the first attempt was proposed in [BUB15], in which persistence diagrams are turned into functions, called Landscapes, by computing the homological rank functions given by the persistence diagram points. Another common way to define a feature map is to see the points of the persistence diagrams as centers of Gaussians with a fixed bandwidth, weighted by the distance of the point to the diagonal. This is the approach originally advocated in [RHB+15], and later generalized in [KFH18], leading to the so-called Persistence Scale Space and Persistence Weighted Gaussian feature maps. Another possibility is to define a Gaussian-like feature map by using the Sliced Wasserstein distance between persistence diagrams, which is conditionally negative definite. This implicit feature map, called the Sliced Wasserstein map, was defined in [CCO17].
In the finite-dimensional case, many different possibilities are available. One may evaluate a family of tropical polynomials on the persistence diagram [KAL18], take the sorted vector of the pairwise distances between the persistence diagram points [COO15], or compute the coefficients of a complex polynomial whose roots are given by the persistence diagram points [DF15]. Another line of work was proposed in [AEK+17] by discretizing the Persistence Scale Space feature map. The idea is to discretize the plane into a fixed grid, and then compute a value for each pixel by integrating Gaussian functions centered on the persistence diagram points. Finally, persistence diagrams have been incorporated into deep learning frameworks in [HKN+17], in which Gaussian functions (whose means and variances are optimized by the neural network during training) are integrated against persistence diagrams seen as discrete measures.
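The grid discretization described above can be sketched in pure Python, assuming isotropic Gaussians with a common bandwidth `sigma` centered directly at the (birth, death) points and a rectangular grid over an assumed region. Because the integral of a product of 1D Gaussians over a rectangle factorizes, each pixel value is a product of two differences of Gaussian CDFs, computed here with `math.erf`. `persistence_image` and its parameters are illustrative names, not the API of [AEK+17] or of any library; [AEK+17] additionally works in birth-persistence coordinates and weights each Gaussian, so a weight is accepted here only as an optional callable.

```python
import math

def persistence_image(diagram, sigma=0.1, bounds=(0.0, 1.0, 0.0, 1.0), resolution=(10, 10), weight=None):
    """Grid of pixel values: each pixel integrates Gaussians centered on the diagram points.

    diagram    : list of (birth, death) pairs
    bounds     : (x_min, x_max, y_min, y_max) of the discretized region (assumed)
    resolution : (number of columns, number of rows)
    weight     : optional callable giving a weight to each point (1 if None)
    """
    x_min, x_max, y_min, y_max = bounds
    nx, ny = resolution
    dx, dy = (x_max - x_min) / nx, (y_max - y_min) / ny

    def cdf(t, mu):
        # CDF at t of a 1D Gaussian with mean mu and standard deviation sigma
        return 0.5 * (1.0 + math.erf((t - mu) / (sigma * math.sqrt(2.0))))

    image = [[0.0] * nx for _ in range(ny)]
    for b, d in diagram:
        w = 1.0 if weight is None else weight((b, d))
        for j in range(ny):
            y0 = y_min + j * dy
            mass_y = cdf(y0 + dy, d) - cdf(y0, d)      # Gaussian mass of row j (death axis)
            for i in range(nx):
                x0 = x_min + i * dx
                mass_x = cdf(x0 + dx, b) - cdf(x0, b)  # Gaussian mass of column i (birth axis)
                image[j][i] += w * mass_x * mass_y    # isotropic Gaussian factorizes over axes
    return image
```

For a single point well inside the grid and a small bandwidth, the pixel values sum to almost 1, since almost all of the Gaussian mass falls inside the region.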
2 Background
2.1 Persistence Diagrams
Persistent homology is a technique of TDA coming from algebraic topology that allows the user to compute and encode topological information of datasets in a compact descriptor called the persistence diagram. Given a dataset $X$, often given in the form of a point cloud in $\mathbb{R}^d$, and a continuous real-valued function $f\colon X\to\mathbb{R}$, the persistence diagram of $f$ can be computed under mild conditions (the function has to be tame, see [CdG+16] for more details), and consists of a finite set of points with multiplicities in the upper-diagonal half-plane $\{(x,y)\in\mathbb{R}^2 : y\geq x\}$. This set of points is computed from the family of sublevel sets of $f$, that is, the sets of the form $F_\alpha=f^{-1}((-\infty,\alpha])$, for some $\alpha\in\mathbb{R}$. More precisely, persistence diagrams encode the different topological events that occur as $\alpha$ increases from $-\infty$ to $+\infty$. Such topological events include the creation and merging of connected components and cycles in every dimension; see Figure 1. Intuitively, persistent homology records, for each topological feature that appears in the family of sublevel sets, the value $\alpha_b$ at which the feature appears, called the birth value, and the value $\alpha_d$ at which it gets merged or filled in, called the death value. These values are then used as coordinates $(\alpha_b,\alpha_d)$ for a corresponding point in the persistence diagram. Note that several features may have the same birth and death values, so points in the persistence diagram have multiplicities. Moreover, since $\alpha_d\geq\alpha_b$, these points are always located above the diagonal $\Delta=\{(x,x) : x\in\mathbb{R}\}$. A general intuition about persistence diagrams is that the distance of a point to $\Delta$ is a direct measure of its relevance: if a point is close to $\Delta$, the corresponding cycle got filled in right after its appearance, which suggests that it is likely due to noise in the dataset. On the contrary, points that are far away from $\Delta$ represent cycles with a significant life span, and are more likely to be relevant for the analysis. We refer the interested reader to [EH10, OUD15] for more details about persistent homology.
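For a function sampled on the vertices of a path graph (a 1D signal), the 0-dimensional part of this construction, which tracks the connected components of the sublevel sets, can be sketched with a union-find structure: vertices enter the sublevel set in increasing order of value, and when two components merge at value $\alpha$, the one born later dies (the so-called elder rule), producing the point (birth, $\alpha$). This is a minimal sketch under these assumptions only (0-dimensional homology, path-graph connectivity, zero-persistence pairs dropped, the oldest component given death value $+\infty$); it is not a general persistent homology algorithm, and `sublevel_persistence_0d` is a hypothetical name.

```python
def sublevel_persistence_0d(values):
    """0-dimensional persistence pairs (birth, death) of the sublevel sets of a
    function given by its values on a path graph (vertex i is adjacent to i + 1)."""
    n = len(values)
    if n == 0:
        return []
    parent = list(range(n))
    birth = list(values)        # birth[root] = birth value of the component of root
    active = [False] * n
    pairs = []

    def find(i):
        # Union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Sweep alpha upwards: vertices enter the sublevel set in increasing order of value
    for i in sorted(range(n), key=lambda j: values[j]):
        active[i] = True
        for j in (i - 1, i + 1):
            if 0 <= j < n and active[j]:
                young, old = find(i), find(j)
                if young == old:
                    continue
                if birth[young] < birth[old]:
                    young, old = old, young
                # Elder rule: the younger component dies when it merges at alpha = values[i]
                if birth[young] < values[i]:  # drop zero-persistence pairs
                    pairs.append((birth[young], values[i]))
                parent[young] = old
    # The oldest component never dies
    pairs.append((min(values), float("inf")))
    return sorted(pairs)
```

For instance, `sublevel_persistence_0d([0, 2, 1, 3])` returns `[(0, inf), (1, 2)]`: the local minimum at value 1 creates a component that merges into the older one at value 2.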
Notation.
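Let $\mathcal{D}$ be the space of persistence diagrams with a countable number of points. More formally, $\mathcal{D}$ can be equivalently defined as a functional space $\mathcal{D}=\{\mu\colon\mathbb{R}^2_{\mathrm{up}}\to\mathbb{N} : \mathrm{supp}(\mu)\text{ is countable}\}$, where $\mathbb{R}^2_{\mathrm{up}}=\{(x,y)\in\mathbb{R}^2 : y>x\}$ and each point $p\in\mathrm{supp}(\mu)$ is a point in the corresponding persistence diagram with multiplicity $\mu(p)$. Let $\mathcal{D}_N$ be the space of persistence diagrams with at most $N$ points, i.e., $\mathcal{D}_N=\{\mu\in\mathcal{D} : \sum_p\mu(p)\leq N\}$. Let $\mathcal{D}^L$ be the space of persistence diagrams included in $[-L,L]^2$, i.e., $\mathcal{D}^L=\{\mu\in\mathcal{D} : \mathrm{supp}(\mu)\subseteq[-L,L]^2\}$. Finally, let $\mathcal{D}_N^L$ be the space of persistence diagrams with at most $N$ points included in $[-L,L]^2$, i.e., $\mathcal{D}_N^L=\mathcal{D}_N\cap\mathcal{D}^L$. Obviously, we have the following sequences of (strict) inclusions: $\mathcal{D}_N^L\subset\mathcal{D}_N\subset\mathcal{D}$, and $\mathcal{D}_N^L\subset\mathcal{D}^L\subset\mathcal{D}$.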
Diagram distances.
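Persistence diagrams can be efficiently compared using the diagram distances, a family of distances parametrized by an integer $p\geq1$ that rely on the computation of partial matchings. Recall that two persistence diagrams $D$ and $D'$ may have different numbers of points. A partial matching $\Gamma$ between $D$ and $D'$ is a subset of $D\times D'$ in which each point of $D$ and each point of $D'$ appears at most once. It comes with $U_D$ (resp. $U_{D'}$), the set of points of $D$ (resp. $D'$) that are not matched to a point of $D'$ (resp. $D$) by $\Gamma$. The cost of $\Gamma$ is given as:
$$\mathrm{cost}_p(\Gamma)=\Big(\sum_{(x,y)\in\Gamma}\|x-y\|_\infty^p+\sum_{x\in U_D}\|x-\pi_\Delta(x)\|_\infty^p+\sum_{y\in U_{D'}}\|y-\pi_\Delta(y)\|_\infty^p\Big)^{1/p},$$
where $\pi_\Delta$ denotes the orthogonal projection onto the diagonal $\Delta$.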
The diagram distance $d_p$ is then defined as the cost of the best partial matching:
Definition 2.1.
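Given two persistence diagrams $D$ and $D'$, the diagram distance $d_p$ is defined as:
$$d_p(D,D')=\inf_{\Gamma}\ \mathrm{cost}_p(\Gamma),$$
where the infimum is taken over all partial matchings $\Gamma$ between $D$ and $D'$.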
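Note that in the literature, these distances are often called the Wasserstein distances between persistence diagrams. Here, we follow the denomination of [CCO17]. In particular, taking a maximum instead of a sum in the definition of the cost,
$$\mathrm{cost}_\infty(\Gamma)=\max\Big\{\sup_{(x,y)\in\Gamma}\|x-y\|_\infty,\ \sup_{x\in U_D}\|x-\pi_\Delta(x)\|_\infty,\ \sup_{y\in U_{D'}}\|y-\pi_\Delta(y)\|_\infty\Big\},$$
allows one to add one more distance to the family, the bottleneck distance $d_\infty(D,D')=\inf_\Gamma\mathrm{cost}_\infty(\Gamma)$.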
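To make Definition 2.1 concrete, here is a brute-force sketch for tiny diagrams (a few points each). Every point is matched either to a point of the other diagram or to the diagonal, which is encoded by padding each side with "diagonal slots" and trying all permutations; unmatched points pay their $\ell_\infty$ distance to the diagonal, $(y-x)/2$ for a point $(x,y)$. This is exponential in the number of points and only meant to check the definition on small examples; practical implementations such as Hera [KMN17] rely on much faster matching algorithms. The function names are ours.

```python
import itertools
import math

def diag_dist(p):
    # l_inf distance from p = (x, y) to its orthogonal projection on the diagonal
    return abs(p[1] - p[0]) / 2.0

def linf(p, q):
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

def diagram_distance(D1, D2, p=1.0):
    """Brute-force d_p between two small diagrams; p = math.inf gives the bottleneck distance."""
    n1, n2 = len(D1), len(D2)
    size = n1 + n2  # side 1: D1 points + n2 diagonal slots; side 2: D2 points + n1 diagonal slots
    best = math.inf
    for perm in itertools.permutations(range(size)):
        costs = []
        for i, j in enumerate(perm):
            if i < n1 and j < n2:
                costs.append(linf(D1[i], D2[j]))  # matched pair of off-diagonal points
            elif i < n1:
                costs.append(diag_dist(D1[i]))    # point of D1 sent to the diagonal
            elif j < n2:
                costs.append(diag_dist(D2[j]))    # point of D2 sent to the diagonal
            else:
                costs.append(0.0)                 # diagonal slot matched to diagonal slot
        if p == math.inf:
            cost = max(costs, default=0.0)
        else:
            cost = sum(c ** p for c in costs) ** (1.0 / p)
        best = min(best, cost)
    return best

print(diagram_distance([(0.0, 1.0)], [(10.0, 11.0)], 1))         # both points sent to the diagonal
print(diagram_distance([(0.0, 1.0)], [(10.0, 11.0)], math.inf))  # bottleneck distance
```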
Stability.
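A useful property of persistence diagrams is stability. Indeed, it is well known in the literature that persistence diagrams computed from close functions are themselves close in the bottleneck distance:
Theorem 2.2 ([CEH07, CdG+16]).
Given two tame functions $f,g\colon X\to\mathbb{R}$ with persistence diagrams $\mathrm{Dg}(f)$ and $\mathrm{Dg}(g)$, we have $d_\infty(\mathrm{Dg}(f),\mathrm{Dg}(g))\leq\|f-g\|_\infty$.
In other words, the map $f\mapsto\mathrm{Dg}(f)$ is 1-Lipschitz. Note that stability results exist as well for the other diagram distances, but these results are weaker than the above Lipschitz condition, and they require more conditions; see [OUD15].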
2.2 Bi-Lipschitz embeddings.
The main question that we address in this article is that of preserving the metric properties of persistence diagrams when using embeddings into Hilbert spaces. For instance, one may ask for the images of persistence diagrams under a feature map $\Phi\colon\mathcal{D}\to\mathcal{H}$ into a Hilbert space $\mathcal{H}$ to be stable as well, i.e., $\|\Phi(D)-\Phi(D')\|_{\mathcal{H}}\leq B\,d_p(D,D')$ for some constant $B$. A natural question is then whether a lower bound also holds, i.e., whether the feature map $\Phi$ is a bi-Lipschitz embedding between $(\mathcal{D},d_p)$ and $\mathcal{H}$.
Definition 2.3.
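Let $(X,d_X)$ and $(Y,d_Y)$ be two metric spaces. A bi-Lipschitz embedding between $(X,d_X)$ and $(Y,d_Y)$ is a map $\Phi\colon X\to Y$ such that there exist constants $0<A\leq B<+\infty$ such that:
$$A\,d_X(x,x')\leq d_Y(\Phi(x),\Phi(x'))\leq B\,d_X(x,x'),$$
for any $x,x'\in X$. The metrics $d_X$ and $d_Y(\Phi(\cdot),\Phi(\cdot))$ are called strongly equivalent, and the constants $A$ and $B$ are called the lower and upper metric distortion bounds, respectively. If $A=B=1$, $\Phi$ is called an isometric embedding.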
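Note that this definition is equivalent to the commonly used definition that additionally requires $A=1/B$: given $0<A\leq B$, both inequalities still hold with $A$ and $B$ replaced by $1/K$ and $K$, where $K=\max(B,1/A)$.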
Remark 2.4.
Finding an isometric embedding of persistence diagrams into a Hilbert space is impossible since geodesics are unique in a Hilbert space while this is not the case for persistence diagrams, as shown in the proof of Proposition 2.4 in [TMM+14].
Remark 2.5.
For feature maps that are bounded, i.e., those maps $\Phi$ such that there exists a constant $C>0$ for which $\|\Phi(D)\|_{\mathcal{H}}\leq C$ for all $D\in\mathcal{D}$, it is obviously impossible to find a bi-Lipschitz embedding, since the diagram distances are unbounded. This includes, for instance, the Sliced Wasserstein (SW) feature map [CCO17], which is defined implicitly from a Gaussian-like function. However, note that if the SW feature map is restricted to a set of persistence diagrams which are close to each other with respect to the SW distance, then the distance in the Hilbert space corresponding to the SW feature map is actually equivalent to the square root of the SW distance. Hence, we include the square root of the SW distance in our experiments in Section 5.
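The two behaviors in this remark can be checked with elementary algebra. Assuming the Gaussian-like kernel $k(D,D')=\exp(-\mathrm{SW}(D,D')/(2\sigma^2))$ of [CCO17], we have $k(D,D)=1$, hence $\|\Phi(D)-\Phi(D')\|^2=2-2\exp(-\mathrm{SW}(D,D')/(2\sigma^2))$, which is at most 2 (boundedness) and behaves like $\mathrm{SW}(D,D')/\sigma^2$ when SW is small (square-root behavior). The sketch below only evaluates this closed form from a given SW value; it does not compute the SW distance itself, and `sw_feature_distance` is a hypothetical helper.

```python
import math

def sw_feature_distance(sw, sigma=1.0):
    """||Phi(D) - Phi(D')|| for the kernel k = exp(-SW / (2 sigma^2)), using
    ||Phi(D) - Phi(D')||^2 = k(D, D) + k(D', D') - 2 k(D, D') = 2 - 2 k(D, D')."""
    # expm1 keeps precision when SW is tiny: 2 - 2 exp(-x) = -2 expm1(-x)
    return math.sqrt(max(0.0, -2.0 * math.expm1(-sw / (2.0 * sigma ** 2))))

for sw in (1e-6, 1e-2, 1.0, 1e2):
    dist = sw_feature_distance(sw)
    # The ratio to sqrt(SW) tends to 1 / sigma = 1 as SW -> 0, while dist never exceeds sqrt(2)
    print(f"SW = {sw:g}: distance = {dist:.6f}, distance / sqrt(SW) = {dist / math.sqrt(sw):.6f}")
```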
3 Mapping into separable Hilbert spaces
In our first main result, we use separability to determine whether a bi-Lipschitz embedding can exist between the space of persistence diagrams and a Hilbert space.
Definition 3.1.
A metric space is called separable if it has a dense countable subset.
For instance, the following three Hilbert spaces (equipped with their canonical metrics) are separable: $\mathbb{R}^d$, $\ell^2(\mathbb{N})$, and $L^2(\Omega)$, where $\Omega$ is separable. The following two results describe well-known properties of separable spaces.
Proposition 3.2.
Any subspace of a separable metric space is separable as well.
Proposition 3.3.
Let $(X,d_X)$ and $(Y,d_Y)$ be two metric spaces, and assume there is a bi-Lipschitz embedding $\Phi\colon X\to Y$, with Lipschitz constants $A$ and $B$. Then $X$ is separable if and only if $\Phi(X)$ is separable.
The following lemma shows that, for a feature map which is bi-Lipschitz when restricted to $\mathcal{D}_N^L$, the limits of the corresponding constants can actually be used to study the general metric distortion in $\mathcal{D}$.
Lemma 3.4.
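Let $\Phi\colon\mathcal{D}\to\mathcal{H}$ be a feature map into a Hilbert space $\mathcal{H}$ and let $d$ be a metric on persistence diagrams such that $\Phi$ is continuous with respect to $d$ on $\mathcal{D}$. Let
$$A_N^L=\inf_{\substack{D,D'\in\mathcal{D}_N^L\\ D\neq D'}}\frac{\|\Phi(D)-\Phi(D')\|_{\mathcal{H}}}{d(D,D')}\qquad\text{and}\qquad B_N^L=\sup_{\substack{D,D'\in\mathcal{D}_N^L\\ D\neq D'}}\frac{\|\Phi(D)-\Phi(D')\|_{\mathcal{H}}}{d(D,D')}.$$
Since $\mathcal{D}_N^L$ grows with $N$ and $L$, $A_N^L$ is nonincreasing with respect to both $N$ and $L$, and we define:
$$A_N=\lim_{L\to+\infty}A_N^L,\qquad A^L=\lim_{N\to+\infty}A_N^L,\qquad A=\lim_{N,L\to+\infty}A_N^L.$$
We define $B_N$, $B^L$, $B$ similarly, since $B_N^L$ is nondecreasing with respect to $N$ and $L$. Then the following inequalities hold:
$$A_N\,d(D,D')\leq\|\Phi(D)-\Phi(D')\|_{\mathcal{H}}\leq B_N\,d(D,D')\quad\text{for all }D,D'\in\mathcal{D}_N,$$
$$A^L\,d(D,D')\leq\|\Phi(D)-\Phi(D')\|_{\mathcal{H}}\leq B^L\,d(D,D')\quad\text{for all }D,D'\in\mathcal{D}^L,$$
$$A\,d(D,D')\leq\|\Phi(D)-\Phi(D')\|_{\mathcal{H}}\leq B\,d(D,D')\quad\text{for all }D,D'\in\mathcal{D}.$$
Note that $A_N$, $A^L$, $A$, $B_N$, $B^L$, and $B$ may be equal to $0$ or $+\infty$, so it does not necessarily hold that $d$ and $\|\Phi(\cdot)-\Phi(\cdot)\|_{\mathcal{H}}$ are strongly equivalent on $\mathcal{D}_N$, $\mathcal{D}^L$, or $\mathcal{D}$.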
Proof.
We only prove the last inequality, since the proof extends verbatim to the other two. Pick any two persistence diagrams . Let be an optimal partial matching achieving , where (resp. ) is either in (resp. ) or in (resp. ). Given , we define two sequences of persistence diagrams and recursively with and:
Let us define
Note that both and are nondecreasing. We have and thus:
(2) 
Assuming  (note that this is always true if ; even though it is not clear whether this assumption holds in the general case, it is satisfied for the spaces of persistence diagrams defined in our subsequent results, Lemma 3.6 and Proposition 3.9), it follows that  by continuity of . We finally obtain the desired inequality by letting  in (2). ∎
A corollary of the previous results is that, even if a feature map taking values in a separable Hilbert space is bi-Lipschitz when restricted to each $\mathcal{D}_N^L$, the corresponding bounds have to go to 0 or $+\infty$ as soon as the domain of the feature map is not separable.
Theorem 3.5.
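Let $\Phi$ be a feature map defined on a nonseparable subspace $\mathcal{S}$ of persistence diagrams containing every $\mathcal{D}_N$, i.e., $\mathcal{D}_N\subseteq\mathcal{S}$ for each $N$. Assume $\Phi$ takes values in a separable Hilbert space $\mathcal{H}$, and that $\Phi$ is bi-Lipschitz on each $\mathcal{D}_N$ with constants $A_N\leq B_N$. Then either $A_N\to0$ or $B_N\to+\infty$ when $N\to+\infty$.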
Many feature maps defined in the literature, such as the Persistence Weighted Gaussian feature map [KFH18] or the Landscape feature map [BUB15], actually take values in the separable function space $L^2(\mathbb{H})$, where $\mathbb{H}$ is the upper half-plane $\{(x,y)\in\mathbb{R}^2 : y>x\}$. Hence, to illustrate how Theorem 3.5 applies to these feature maps, we now provide two lemmata. In the first one, we define a set of persistence diagrams which is not separable, and in the second one, we show that this set is actually included in the domain of these feature maps.
Lemma 3.6.
Consider the sequence of points , and define the set , where is the set of sequences with values in , with: . Then is not separable.
Proof.
First note that since the sequences can have infinite support, the spaces and are not countable.
Let be the equivalence relation on defined with:
where denotes the symmetric difference of sets. Since the set of sequences with finite support is countable, it follows that each equivalence class is countable as well. In particular, this means that the set of equivalence classes is uncountable, since otherwise would be countable as a countable union of countable equivalence classes.
We now prove the result by contradiction. Assume that is separable, and let be the corresponding dense countable subset of . Let . Then for each , there is at least one sequence such that and . We now claim that every such satisfies . Indeed, assume and let . Then, since , we would have
which is not possible. Hence, this means that . However, we showed that is uncountable, meaning that is uncountable as well, which leads to a contradiction since is countable by assumption. ∎
We now show that the Persistence Weighted Gaussian and the Landscape feature maps are well-defined on the set of persistence diagrams defined in Lemma 3.6. Let us first formally define these feature maps.
Definition 3.7.
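Given $p=(x,y)\in\mathbb{R}^2$ with $y>x$, let $\Lambda_p\colon\mathbb{R}\to\mathbb{R}$ be the triangular function defined with $\Lambda_p(t)=\min\{t-x,\,y-t\}$ if $t\in[x,y]$ and 0 otherwise. Then, given a persistence diagram $D$, let $\lambda_k(t)=\mathrm{kmax}_{p\in D}\,\Lambda_p(t)$, where kmax denotes the $k$th largest element. The Landscape feature map is defined as:
$$\Phi_{\mathrm{LS}}(D)=(\lambda_k)_{k\geq1}.$$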
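Pointwise evaluation of the landscape functions follows directly from this definition. The sketch below assumes that a point with multiplicity is simply listed several times and returns 0 when the diagram has fewer than $k$ points; `triangle` and `landscape` are illustrative names, not the API of [BUB15] or of any library.

```python
def triangle(p, t):
    """Triangular function Lambda_p(t) = min(t - x, y - t) on [x, y] and 0 elsewhere, for p = (x, y)."""
    x, y = p
    return min(t - x, y - t) if x <= t <= y else 0.0

def landscape(diagram, k, t):
    """lambda_k(t): k-th largest value of Lambda_p(t) over the points p of the diagram (k >= 1)."""
    values = sorted((triangle(p, t) for p in diagram), reverse=True)
    return values[k - 1] if k <= len(values) else 0.0

D = [(0.0, 4.0), (1.0, 3.0)]
print([landscape(D, k, 2.0) for k in (1, 2, 3)])  # [2.0, 1.0, 0.0]
```

With points whose intervals $[x,y]$ are disjoint, such as `[(0.0, 1.0), (2.0, 3.0)]`, every $\lambda_k$ with $k\geq2$ vanishes identically, since at most one triangular function is nonzero at any $t$.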
Definition 3.8.
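Let $w\colon\mathbb{H}\to\mathbb{R}_+$ be a weight function and $\sigma>0$. The Persistence Weighted Gaussian feature map is defined as:
$$\Phi_{\mathrm{PWG}}(D)(z)=\sum_{p\in D}w(p)\,\exp\Big(-\frac{\|z-p\|_2^2}{2\sigma^2}\Big),\qquad z\in\mathbb{H}.$$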
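Distances between Persistence Weighted Gaussian feature maps can be sketched in closed form under two simplifying assumptions: unnormalized Gaussians as above, and integration over all of $\mathbb{R}^2$ rather than over the upper half-plane. Since $|z-p|^2+|z-q|^2=2|z-\tfrac{p+q}{2}|^2+\tfrac{|p-q|^2}{2}$, the $L^2(\mathbb{R}^2)$ inner product of two Gaussians centered at $p$ and $q$ is $\pi\sigma^2\exp(-|p-q|^2/(4\sigma^2))$. The functions and the example weight (half the persistence, i.e., the distance to the diagonal) are hypothetical illustrations, not the weight used in Proposition 3.9.

```python
import math

def pwg_inner(D1, D2, weight, sigma):
    """L^2(R^2) inner product of Phi(D1) and Phi(D2), where
    Phi(D)(z) = sum_{p in D} weight(p) * exp(-|z - p|^2 / (2 sigma^2)).
    Closed form: the integral over R^2 of the Gaussians centered at p and q
    equals pi * sigma^2 * exp(-|p - q|^2 / (4 sigma^2))."""
    total = 0.0
    for p in D1:
        for q in D2:
            sq = (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
            total += weight(p) * weight(q) * math.exp(-sq / (4.0 * sigma ** 2))
    return math.pi * sigma ** 2 * total

def pwg_distance(D1, D2, weight, sigma=0.1):
    """||Phi(D1) - Phi(D2)|| in L^2(R^2), expanded into inner products."""
    sq = (pwg_inner(D1, D1, weight, sigma) + pwg_inner(D2, D2, weight, sigma)
          - 2.0 * pwg_inner(D1, D2, weight, sigma))
    return math.sqrt(max(sq, 0.0))  # clip tiny negative round-off

# Hypothetical weight for illustration: l_inf distance of the point to the diagonal
def half_persistence(p):
    return (p[1] - p[0]) / 2.0

print(pwg_distance([(0.0, 1.0), (0.2, 0.3)], [(0.0, 1.0)], half_persistence))
```

With a weight that vanishes on the diagonal, points close to the diagonal (here $(0.2,0.3)$) contribute little to the distance.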
Proposition 3.9.
Let be the weight function . Let be the set of persistence diagrams defined in Lemma 3.6. Then:
Proof.
Let be the sequence defined with if and otherwise. To show the desired result, it suffices to show that and are Cauchy sequences in . Let , and let us study for each feature map.

Case . We have the following inequalities:
The result simply follows from the fact that is convergent and Cauchy.

Case . Since all triangular functions, as defined in Definition 3.7, have disjoint supports, it follows that the only nonzero landscape function is , where is a triangular function defined with if and 0 otherwise. See Figure 2.
Hence, we have the following inequalities:
Again, the result follows from the fact that is convergent and Cauchy. ∎
Proposition 3.9 shows that Theorem 3.5 applies (with the metric between persistence diagrams considered in Lemma 3.6) to the Persistence Weighted Gaussian feature map with the weight function of Proposition 3.9 (actually, to any weight function that is equivalent to it near the diagonal) and to the Landscape feature map. In particular, any lower bound for these maps has to go to 0 when $N\to+\infty$, since an upper bound exists for these maps due to their stability properties; see Corollary 15 in [BUB15] and Proposition 3.4 in [KFH18].
4 Mapping into finite-dimensional Hilbert spaces
In our second main result, we show that more can be said about feature maps into $\mathbb{R}^d$ (equipped with the Euclidean metric), using the so-called Assouad dimension. This covers all the vectorization methods for persistence diagrams that we described in the related work.
Assouad dimension.
The following definition and example are taken from paragraph 10.13 of [HEI01].
Definition 4.1.
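Let $(X,d)$ be a metric space. Given a subset $E\subseteq X$ and $r>0$, let $N_r(E)$ be the least number of open balls of radius less than or equal to $r$ that can cover $E$. The Assouad dimension of $(X,d)$ is:
$$\dim_A(X,d)=\inf\big\{s\geq0 : \exists\,C>0\ \text{such that}\ N_{\lambda r}(B(x,r))\leq C\lambda^{-s}\ \text{for all}\ x\in X,\ r>0,\ \lambda\in(0,1]\big\}.$$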
Intuitively, the Assouad dimension measures the number of open balls needed to cover an open ball of larger radius. For example, the Assouad dimension of $\mathbb{R}^d$ (with the Euclidean metric) is $d$. Moreover, the Assouad dimension is preserved by bi-Lipschitz embeddings.
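As a sanity check of the example $\dim_A(\mathbb{R}^d)=d$, the following short derivation works with the sup-norm $\|\cdot\|_\infty$ on $\mathbb{R}^d$, which is bi-Lipschitz equivalent to the Euclidean norm, so the Assouad dimension is the same. Here $N_\rho(E)$ is the least number of open balls of radius at most $\rho$ covering $E$, $B_\infty(x,r)$ is the open sup-norm ball, $0<\lambda\leq1$, and $m=\lceil1/\lambda\rceil$. The grid construction is a standard argument, not taken from [HEI01].

```latex
% Upper bound: grid of centers with spacing \lambda r. Rounding each coordinate of
% (y - x) / (\lambda r) to the nearest integer k_i gives an error of at most \lambda r / 2 < \lambda r,
% and |k_i| < 1/\lambda + 1/2 \le m + 1/2, hence |k_i| \le m.
B_\infty(x,r) \subseteq \bigcup_{k \in \{-m,\dots,m\}^d} B_\infty\big(x + \lambda r\,k,\ \lambda r\big)
\quad\Longrightarrow\quad
N_{\lambda r}\big(B_\infty(x,r)\big) \le (2m+1)^d \le \Big(\frac{2}{\lambda} + 3\Big)^d \le 5^d\,\lambda^{-d}.

% Lower bound: compare Lebesgue volumes, using \mathrm{vol}(B_\infty(y,\rho)) = (2\rho)^d.
N_{\lambda r}\big(B_\infty(x,r)\big)\,(2\lambda r)^d \ \ge\ (2r)^d
\quad\Longrightarrow\quad
N_{\lambda r}\big(B_\infty(x,r)\big) \ge \lambda^{-d}.
```

Hence the bound $N_{\lambda r}(B_\infty(x,r))\leq C\lambda^{-s}$ holds for all $x$, $r$, $\lambda$ with $C=5^d$ and $s=d$, while for any $s<d$ the lower bound $\lambda^{-d}$ eventually exceeds $C\lambda^{-s}$ as $\lambda\to0$. The Assouad dimension is therefore exactly $d$.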
Proposition 4.2 (Lemma 9.6 in [ROB10]).
Let $(X,d_X)$ and $(Y,d_Y)$ be metric spaces with a bi-Lipschitz embedding $\Phi\colon X\to Y$. Then $\dim_A(X,d_X)=\dim_A(\Phi(X),d_Y)$.
Non-embeddability.
We now show that $\mathcal{D}_N^L$ cannot be embedded into $\mathbb{R}^d$ with bi-Lipschitz embeddings. This fact is a consequence of the following lemma:
Lemma 4.3.
Let $N\geq1$, $L>0$, and $p\in\mathbb{N}^*\cup\{\infty\}$. Then $\dim_A(\mathcal{D}_N^L,d_p)=+\infty$.
Proof.
Let denote an open ball with . We want to show that, for any and , it is possible to find a persistence diagram , a radius and a factor such that the number of open balls of radius at most needed to cover is strictly larger than . To this end, we pick arbitrary and . The idea of the proof is to define as the empty diagram, and to derive a lower bound on the number of balls with radius needed to cover by considering persistence diagrams with one point evenly distributed on the line such that the distance between two consecutive points is in the distance. Indeed, the pairwise distance between any two such persistence diagrams is sufficiently large so that they must belong to different balls. Then we can control the number of persistence diagrams, and thus the number of balls, by taking sufficiently small.
More formally, let . We want to show that we have at least balls in the cover, meaning that . Let and . We define a cover of with open balls of radius less than centered on a family as follows:
(3) 
We now define particular persistence diagrams which all lie in different elements of the cover (3). For any , we let denote the persistence diagram containing only the point . It is clear that each is in . See Figure 3.
Moreover, since , it also follows that .
Hence, according to (3), for each there exists an integer such that . Finally, note that . Indeed, assuming that there are such that , and since the distance between and is always obtained by matching their points to the diagonal, we reach a contradiction with the following application of the triangle inequality:
This observation shows that there are at least different open balls in the cover (3), which concludes the proof. ∎
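The counting argument behind Lemma 4.3 can be illustrated numerically for the bottleneck distance $d_\infty$, using the fact that the bottleneck distance between two one-point diagrams $\{p\}$ and $\{q\}$ is $\min(\|p-q\|_\infty,\max(d(p,\Delta),d(q,\Delta)))$, where $d(p,\Delta)=(y-x)/2$ for $p=(x,y)$. In this sketch, the family of points $(is, is+h)$ with spacing $s=h/2$, the box $[0,1]^2$, and the radius $r=h$ are our own illustrative choices, not the exact construction of the proof above. All these diagrams lie in $B(\emptyset,r)$ and are pairwise at distance $h/2$, so no open ball of radius $r/4$ contains two of them, while their number grows like $1/h$. Since the ratio $\lambda=1/4$ is fixed, no bound of the form $C\lambda^{-s}$ can hold for all radii.

```python
import itertools

def one_point_bottleneck(p, q):
    """Bottleneck distance between the one-point diagrams {p} and {q}."""
    match = max(abs(p[0] - q[0]), abs(p[1] - q[1]))                 # match p with q
    to_diagonal = max((p[1] - p[0]) / 2.0, (q[1] - q[0]) / 2.0)     # send both to the diagonal
    return min(match, to_diagonal)

def separated_one_point_diagrams(h, L=1.0):
    """Hypothetical family of points (i*s, i*s + h) with spacing s = h/2, kept inside [0, L]^2."""
    s = h / 2.0
    pts = []
    i = 0
    while i * s + h <= L:
        pts.append((i * s, i * s + h))
        i += 1
    return pts

for h in (0.5, 0.05, 0.005):
    pts = separated_one_point_diagrams(h)
    # Each {p} is at distance h/2 < r = h from the empty diagram, and any two are at distance h/2,
    # so an open ball of radius r/4 = h/4 can contain at most one of them.
    assert all(one_point_bottleneck(p, q) >= h / 2 - 1e-12 for p, q in itertools.combinations(pts, 2))
    print(f"r = {h}: at least {len(pts)} balls of radius r/4 are needed to cover B(empty, r)")
```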
Theorem 4.4.
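Let $N\geq1$ and $L>0$. Then, for any $p\in\mathbb{N}^*\cup\{\infty\}$ and $d\in\mathbb{N}$, there is no bi-Lipschitz embedding between $(\mathcal{D}_N^L,d_p)$ and $(\mathbb{R}^d,\|\cdot\|_2)$.
Interestingly, the integers $N$ and $d$ are independent in Theorem 4.4: even if one restricts to persistence diagrams with only one point, it is still impossible to find a bi-Lipschitz embedding into $\mathbb{R}^d$, whatever $d$ is.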
5 Experiments
In this section, we illustrate our main results by computing the lower metric distortion bounds for the main stable feature maps in the literature. We use persistence diagrams with increasing numbers of points to experimentally observe the convergence of this bound to 0, as described in Theorem 3.5. More precisely, we generate 100 persistence diagrams for each cardinality in a range going from 10 to 1000 by uniformly sampling points in the unit upper half-square $\{(x,y)\in[0,1]^2 : y\geq x\}$. See Figure 4 for an illustration.
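Sampling uniformly in the unit upper half-square is simple: the minimum and maximum of two independent uniform variables on $[0,1]$ are uniformly distributed on the triangle $\{0\leq x\leq y\leq1\}$. The following is a minimal sketch using only the standard library; the function name, the seed, and the list of cardinalities are illustrative choices, not the exact experimental setup.

```python
import random

def sample_uniform_diagram(n_points, rng):
    """Draw n_points uniformly in the unit upper half-square {(x, y) : 0 <= x <= y <= 1}."""
    diagram = []
    for _ in range(n_points):
        a, b = rng.random(), rng.random()
        # (min, max) of two i.i.d. U[0, 1] variables is uniform on the triangle x <= y
        diagram.append((min(a, b), max(a, b)))
    return diagram

rng = random.Random(0)           # fixed seed for reproducibility (illustrative)
cardinalities = [10, 100, 1000]  # illustrative subset of the range 10 to 1000
diagrams = {n: [sample_uniform_diagram(n, rng) for _ in range(100)] for n in cardinalities}
```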
Then, we consider the following feature maps:
Since most of these feature maps enjoy stability properties with respect to the first diagram distance $d_1$, we compute the ratios between the metrics in the Hilbert spaces corresponding to these feature maps and $d_1$. Moreover, we also look at the ratio induced by the square root of the Sliced Wasserstein distance (SW) [CCO17], as suggested by Remark 2.5. All feature maps were computed with the sklearn_tda library (https://github.com/MathieuCarriere/sklearn_tda), which uses Hera (https://bitbucket.org/grey_narn/hera) [KMN17] as a backend to compute the first diagram distances between pairs of persistence diagrams. These ratios are then displayed as boxplots in Figure 5.
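The quantities compared in this experiment can be sketched generically: given a collection of diagrams, a distance in the Hilbert space, and the diagram distance $d_1$, collect the ratios over all pairs. The smallest ratio of feature distance to $d_1$ is an empirical upper estimate of the lower distortion bound, and the largest is an empirical lower estimate of the upper bound; taking the reciprocal ratio turns a lower bound that goes to 0 into an upper tail that grows. The helper below is hypothetical and independent of sklearn_tda; the callables stand in for whatever feature-map distance and diagram distance are used.

```python
import itertools
import math

def distortion_ratios(items, numerator_dist, denominator_dist):
    """Ratios numerator_dist(a, b) / denominator_dist(a, b) over all unordered pairs
    of items with a positive denominator."""
    ratios = []
    for a, b in itertools.combinations(items, 2):
        den = denominator_dist(a, b)
        if den > 0:
            ratios.append(numerator_dist(a, b) / den)
    return ratios

# Toy usage on real numbers: like the square root of SW, the metric sqrt(|a - b|)
# is not bi-Lipschitz equivalent to |a - b|, so the ratios spread out as pairs get closer.
points = [0.0, 1.0, 4.0]
r = distortion_ratios(points, lambda a, b: math.sqrt(abs(a - b)), lambda a, b: abs(a - b))
print(min(r), max(r))  # empirical lower / upper distortion estimates for this sample
```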
It is clear from Figure 5 that the extreme values of these ratios (the upper tails of the ratio distributions) increase with the cardinality of the persistence diagrams, as expected from Theorem 3.5. This is especially interesting in the case of the Sliced Wasserstein distance, since it is still open whether the lower bound proved in [CCO17], whose distortion increases with the number of points in the diagrams, is tight, i.e., whether a lower bound that is oblivious to the number of points could be derived. Figure 5 suggests that, empirically, this is not the case. It is also interesting to notice that the divergence speed of these ratios differs from one feature map to another. More precisely, the metric distortion bounds seem to increase linearly with the cardinalities for the TV and LS feature maps and for the Sliced Wasserstein distance, while they increase at a much lower speed for the other feature maps.
6 Conclusion
In this article, we provided two important theoretical results about the embedding of persistence diagrams into separable Hilbert spaces, which is a common technique in TDA to feed machine learning algorithms with persistence diagrams. Indeed, most of the recent attempts have defined feature maps for persistence diagrams into Hilbert spaces, showed that these maps were stable with respect to the first diagram distance, and left open whether a lower bound holds as well. In this work, we proved that such a lower bound never holds if the Hilbert space is finite dimensional, and that it has to go to zero with the number of points for most other feature maps in the literature. We also provided experiments that confirm this result, by showing a clear increase of the metric distortion with the number of points for persistence diagrams generated uniformly in the unit upper half-square.
References
[AEK+17] (2017) Persistence Images: A Stable Vector Representation of Persistent Homology. Journal of Machine Learning Research 18 (8), pp. 1–35.
[BL15] (2015) Induced matchings and the algebraic stability of persistence barcodes. Journal of Computational Geometry 6 (2), pp. 162–191.
[BUB15] (2015) Statistical Topological Data Analysis using Persistence Landscapes. Journal of Machine Learning Research 16, pp. 77–102.
[CAR09] (2009) Topology and data. Bulletin of the American Mathematical Society 46, pp. 255–308.
[CCO17] (2017) Sliced Wasserstein Kernel for Persistence Diagrams. In Proceedings of the 34th International Conference on Machine Learning.
[COO15] (2015) Stable Topological Signatures for Points on 3D Shapes. Computer Graphics Forum 34.
[CdG+16] (2016) The Structure and Stability of Persistence Modules. Springer.
[CEH07] (2007) Stability of Persistence Diagrams. Discrete and Computational Geometry 37 (1), pp. 103–120.
[DF15] (2015) Comparing persistence diagrams through complex vectors. In Image Analysis and Processing — ICIAP 2015, pp. 294–305.
[EH10] (2010) Computational Topology: an introduction. AMS Bookstore.
[HEI01] (2001) Lectures on Analysis on Metric Spaces. Springer.
[HKN+17] (2017) Deep Learning with Topological Signatures. In Advances in Neural Information Processing Systems 30, pp. 1633–1643.
[KAL18] (2018) Tropical coordinates on the space of persistence barcodes. Foundations of Computational Mathematics.
[KMN17] (2017) Geometry helps to compare persistence diagrams. Journal of Experimental Algorithmics 22, pp. 1.4:1–1.4:20.
[KFH16] (2016) Persistence Weighted Gaussian Kernel for Topological Data Analysis. In Proceedings of the 33rd International Conference on Machine Learning, pp. 2004–2013.
[KFH18] (2018) Kernel method for persistence diagrams via kernel embedding and weight factor. Journal of Machine Learning Research 18 (189), pp. 1–41.
[OUD15] (2015) Persistence Theory: From Quiver Representations to Data Analysis. Mathematical Surveys and Monographs, American Mathematical Society.
[RHB+14] (2014) A Stable Multi-Scale Kernel for Topological Machine Learning. CoRR abs/1412.6821.
[RHB+15] (2015) A Stable Multi-Scale Kernel for Topological Machine Learning. In IEEE Conference on Computer Vision and Pattern Recognition.
[ROB10] (2010) Dimensions, Embeddings, and Attractors. Cambridge Tracts in Mathematics, Vol. 186, Cambridge University Press.
[TMM+14] (2014) Fréchet Means for Distributions of Persistence Diagrams. Discrete and Computational Geometry 52 (1), pp. 44–70.