1 Introduction
Motivation
Modern data often arrives in complex forms that complicate its analysis. For example, high-dimensional data cannot be visualized directly, whereas relational data such as graphs lack the natural vectorized structure required by various machine learning models (Bhagat et al., 2011; Kazemi and Poole, 2018; Goyal and Ferrara, 2018). Representation learning aims to derive mathematically and computationally convenient representations to process and learn from such data. However, obtaining an effective representation is often challenging, for example, due to the accumulation of noise in high-dimensional biological expression data (Vandaele et al., 2021). In other examples, such as community detection in social networks, graph embeddings struggle to clearly separate communities due to the few interconnections between them. In such cases, expert prior knowledge of the topological model may improve learning from, visualizing, and interpreting the data. Unfortunately, a general tool for incorporating prior topological knowledge in representation learning is lacking.
In this paper, we introduce such a tool under the name of topological regularization. Here, we build on the recently developed differentiation frameworks for optimizing data to capture topological properties of interest (Gabrielsson et al., 2020; Solomon et al., 2021; Carriere et al., 2021). However, such topological optimization has been poorly studied within the context of representation learning. For example, the topological losses used are indifferent to any structure other than topological, such as neighborhood information, which may be useful for learning. Therefore, topological optimization often destroys natural and informative properties of the data in favor of the topological loss.
Our proposed method of topological regularization resolves this by learning an embedding representation that incorporates the topological prior. As we will see in this paper, these priors can be directly postulated through topological loss functions. For example, if the prior is that the data lies on a circular model, we design a loss function that is lower whenever a more prominent cycle is present in the embedding. By extending previously suggested topological losses to fit a wider set of models, we show that topological regularization effectively embeds data according to a variety of topological priors, ranging from clusters, cycles, and flares to any combination of these.
Related Work
Certain methods that incorporate topological information into representation learning have already been developed. For example, Deep Embedded Clustering (Xie et al., 2016) simultaneously learns feature representations and cluster assignments using deep neural networks. Constrained embeddings of Euclidean data on spheres have also been studied by Bai et al. (2015). However, such methods often require extensive development for one particular kind of input data and topological model. In contrast, incorporating topological optimization into representation learning provides a simple yet versatile approach to combining data embedding with topological priors, which applies to any input data as long as the output is a point cloud embedding.
Topological autoencoders (Moor et al., 2020) already combine topological optimization with a data embedding procedure. The main difference is that there, the topological information used for optimization is obtained from the original high-dimensional data rather than passed as a prior. While this may sound like a major advantage (and it certainly can be, as shown by Moor et al. (2020)), obtaining such topological information relies heavily on distances between observations, which are often meaningless and unstable in high dimensions (Aggarwal et al., 2001). Furthermore, certain constructions, such as the filtration obtained from the Delaunay triangulation (which we use extensively and which is further discussed in Appendix A), are expensive to obtain from high-dimensional data (Cignoni et al., 1998) and are best computed from the low-dimensional embedding.
Contributions
We include sufficient background on persistent homology, the main tool behind topological optimization, in Appendix A (all of its concepts important for this paper are summarized in Figure 1). We summarize the existing idea behind topological optimization of point clouds (Section 2.1). We then introduce a new set of losses to represent a wider variety of topological models in a natural manner (Section 2.2); these can be used to topologically regularize any embedding whose result (not necessarily its input) is a point cloud (Section 2). We include experiments on synthetic and real data that show the usefulness and versatility of topological regularization, and that provide additional insights into the performance of data embedding methods (Section 3). We discuss open problems in topological representation learning and conclude in Section 4.
2 Methods
The main purpose of this paper is to present a method to incorporate prior topological knowledge in a point cloud embedding $E \in \mathbb{R}^{n \times d}$ (dimensionality reduction, graph embedding, …) of a data set $X$. As will become clear below, these topological priors can be directly postulated through topological loss functions $\mathcal{L}_{\mathrm{top}}$. The goal is then to find an embedding $E$ that minimizes a total loss
$$\mathcal{L}_{\mathrm{tot}}(E) := \mathcal{L}_{\mathrm{emb}}(E) + \lambda_{\mathrm{top}}\,\mathcal{L}_{\mathrm{top}}(E), \tag{1}$$
where $\mathcal{L}_{\mathrm{emb}}$ is a loss that aims to preserve structural attributes of the original data, and $\lambda_{\mathrm{top}} > 0$ controls the strength of topological regularization. Note that $X$ itself is not required to be a point cloud, or to reside in the same space as $E$, which is especially useful for representation learning.
In this section, we mainly focus on topological optimization of point clouds, that is, the loss $\mathcal{L}_{\mathrm{top}}$. The basic idea behind this recently introduced method, as presented by Gabrielsson et al. (2020), is illustrated in Section 2.1. We also show that direct topological optimization may neglect important structural information such as neighborhoods, which can effectively be resolved through (1). Hence, as we will also see in Section 3, while representation learning may benefit from topological losses for incorporating prior topological knowledge, topological optimization itself may also benefit from other structural losses so as to represent the topological prior more faithfully. Nevertheless, some topological models remain difficult to represent in a natural manner through topological optimization. Therefore, we introduce a new set of topological losses and provide an overview of how different topological models can be postulated through them in Section 2.2. Experiments with, and comparisons to, topological regularization of embeddings through (1) are presented in Section 3.
2.1 Background on Topological Optimization of Point Clouds
Topological optimization is performed through a topological loss function evaluated on the persistence diagram(s) of the data (Carlsson, 2009). These diagrams, obtained through a method termed persistent homology and further discussed in Appendix A, summarize all topological holes (connected components, cycles, voids, …) in the data, from the finest to the coarsest, as illustrated in Figure 1.
While methods that learn from persistent homology are now both well developed and diverse (Pun et al., 2018), optimizing the data representation for its persistent homology has only recently been gaining attention (Gabrielsson et al., 2020; Solomon et al., 2021; Carriere et al., 2021). Persistent homology has a rather abstract mathematical foundation within the field of algebraic topology (Hatcher, 2002), and its computation is inherently combinatorial (Zomorodian and Carlsson, 2005). This complicates working with usual derivatives for optimization. To accommodate this, topological optimization makes use of Clarke subderivatives (Clarke, 1990), whose applicability to persistence builds on arguments from o-minimal geometry (van den Dries, 1998; Carriere et al., 2021). Fortunately, thanks to the recent work of Gabrielsson et al. (2020) and Carriere et al. (2021), powerful tools for topological optimization have been developed for software libraries such as PyTorch and TensorFlow, allowing their application without deeper knowledge of these mathematical subjects.
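Mathematically, topological optimization optimizes the data representation with respect to the topological information summarized by its persistence diagram(s) $D$. We use the same approach as Gabrielsson et al. (2020), where all (birth, death) tuples $(b_k, d_k)$ in $D$ are first ordered by decreasing persistence $d_k - b_k$. The points with $d_k = \infty$ (these are usually plotted on top of the diagram, as in Figure 1(b)) form the essential part $D_{\mathrm{ess}}$ of $D$. The points with finite coordinates form the regular part $D_{\mathrm{reg}}$ of $D$. For indices $i \le j$ in $\mathbb{N} \cup \{\infty\}$ and functions $g_{\mathrm{ess}}: \mathbb{R} \to \mathbb{R}$ and $g_{\mathrm{reg}}: \mathbb{R}^2 \to \mathbb{R}$, we can now define a topological loss function
$$\mathcal{L}_{\mathrm{top}}(D) := \sum_{k=i,\ (b_k, d_k) \in D_{\mathrm{ess}}}^{j} g_{\mathrm{ess}}(b_k) + \sum_{k=i,\ (b_k, d_k) \in D_{\mathrm{reg}}}^{j} g_{\mathrm{reg}}(b_k, d_k). \tag{2}$$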
It turns out that for many useful choices of $g_{\mathrm{ess}}$ and $g_{\mathrm{reg}}$, $\mathcal{L}_{\mathrm{top}}$ has a well-defined Clarke subdifferential with respect to the parameters defining the filtration from which the persistence diagram $D$ is obtained. In this paper, we consistently use the α filtration shown in Figure 1(a) (see Appendix A for its formal definition), and these parameters are entire point clouds of size $n$ in the $d$-dimensional Euclidean space. $\mathcal{L}_{\mathrm{top}}$ can then be optimized with respect to these parameters through standard stochastic subgradient algorithms (Carriere et al., 2021).
Throughout this paper, we only use the regular part of the diagram (this coincides with letting $g_{\mathrm{ess}} \equiv 0$), and let $g_{\mathrm{reg}}$ be proportional to the persistence function. Because the points are ordered by persistence, $\mathcal{L}_{\mathrm{top}}$ is now a function of persistence on $D$, i.e., it is invariant to permutations of the points in $D$ (Carriere et al., 2021). The factor of proportionality $\mu \in \{1, -1\}$ indicates whether we want to minimize ($\mu = 1$) or maximize ($\mu = -1$) persistence, i.e., the prominence of the topological holes, and thus how well clusters, cycles, …, are (not) represented. The topological loss function in (2) then reduces to
$$\mathcal{L}_{\mathrm{top}}(E) := \mathcal{L}_{\mathrm{top}}(D) = \mu \sum_{k=i,\ d_k < \infty}^{j} (d_k - b_k). \tag{3}$$
Here, the data matrix $E$ (in this paper, the embedding) defines the diagram $D$ through persistent homology of the α filtration of $E$ and a persistence (topological hole) dimension to optimize for.
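To make (2) and (3) concrete, the following Python sketch evaluates them for the 0-dimensional persistence of a small point cloud. It is a minimal, evaluation-only illustration (no subgradients, hypothetical function names), not the PyTorch code used in this paper. It relies on the standard fact that, for the α filtration, every vertex is born at time 0 and the 0-dimensional deaths are the squared half-lengths of the edges of a Euclidean minimum spanning tree, since the α complex at time $t$ has the same connected components as the union of balls of radius $\sqrt{t}$ around the points.

```python
import math


def zero_dim_alpha_diagram(points):
    """0-dimensional persistence diagram of the alpha filtration of a point cloud.

    Every vertex is born at time 0. Deaths are the squared half-lengths of the
    edges of a Euclidean minimum spanning tree (built with Prim's algorithm),
    and one component never dies, giving the essential point (0, inf).
    Points are returned in order of decreasing persistence d_k - b_k.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # shortest distance from each vertex to the tree
    best[0] = 0.0
    diagram = [(0.0, math.inf)]
    for step in range(n):
        # add the vertex closest to the current tree
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if step > 0:
            # death time contributed by the MST edge that attaches u
            diagram.append((0.0, (best[u] / 2.0) ** 2))
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], math.dist(points[u], points[v]))
    diagram.sort(key=lambda p: p[1] - p[0], reverse=True)
    return diagram


def top_loss_general(diagram, i, j, g_ess, g_reg):
    """Loss (2): g_ess(b_k) for essential points and g_reg(b_k, d_k) for regular
    points, summed over the 1-based indices i <= k <= j of the ordered diagram."""
    total = 0.0
    for k, (b, d) in enumerate(diagram, start=1):
        if i <= k <= j:
            total += g_ess(b) if math.isinf(d) else g_reg(b, d)
    return total


def top_loss(diagram, i, j, mu):
    """Loss (3): regular part only (g_ess = 0), g_reg = mu * persistence."""
    return top_loss_general(diagram, i, j, lambda b: 0.0, lambda b, d: mu * (d - b))
```

For three collinear points at 0, 1, and 3, the finite deaths are 0.25 and 1.0; `top_loss(D, 2, math.inf, 1)` then sums all finite persistence (a connectivity loss), while `top_loss(D, 2, 2, -1)` rewards a prominent second component.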
For example, consider (3) with $i = 2$, $j = \infty$, and $\mu = 1$, restricted to 0-dimensional persistence (measuring the prominence of connected components) of the α filtration. Figure 2 shows the data from Figure 1 optimized for this loss function for various numbers of epochs. The optimized point cloud quickly resembles a single connected component for smaller numbers of epochs. This is the single goal of the loss (3), which neglects all other structural properties of the data, such as its underlying cycles (e.g., the circular hole in the ‘R’) or local neighborhoods. Larger numbers of epochs mainly affect the scale of the data. While this scale has an absolute effect on the total persistence, the point cloud visually represents a single connected component equally well. We also observe that while local neighborhoods are preserved well during the first epochs, simply by the nature of the topological optimization procedure, they are increasingly distorted for larger numbers of epochs.
2.2 Newly Proposed Topological Loss Functions
In this paper, the prior topological knowledge incorporated into the point cloud data matrix embedding $E$ is directly postulated through a topological loss function. For example, letting $D$ be the 0-dimensional persistence diagram of $E$, and choosing $i = 2$, $j = \infty$, and $\mu = 1$ in (3), corresponds to the criterion that $E$ should represent one closely connected component, as illustrated in Figure 2. Therefore, we often regard a topological loss as a topological prior, and vice versa.
Unfortunately, although persistent homology effectively measures the prominence of topological holes, topological optimization is often ineffective at representing such holes in a natural manner. An extreme example is clusters, even though they are captured through the simplest form of persistence, i.e., 0-dimensional. This is shown in Figure 3, where we sampled data from two Gaussian distributions centered at different means (Figure 3(a)). Optimizing the point cloud for (at least) two clusters can be done by defining $\mathcal{L}_{\mathrm{top}}$ as in (3), letting $D$ be the 0-dimensional persistence diagram of $E$, $i = j = 2$, and $\mu = -1$. However, we observe that topological optimization simply displaces one single point away from all other points (Figure 3(b)). Note that, purely topologically, this is indeed a correct representation of two clusters.
To encourage more natural holes, we propose to conduct the topological optimization for the loss
$$\mathcal{L}_{\mathrm{top}}(E) := \mathbb{E}_{S}\left[\mathcal{L}_{\mathrm{top}}(S)\right], \tag{4}$$
where the expectation is taken over random subsets $S \subseteq E$ containing a fraction $f_S$ of the points of $E$, and $\mathcal{L}_{\mathrm{top}}(S)$ is defined as in (3). In practice, during each optimization iteration, $\mathbb{E}_{S}[\mathcal{L}_{\mathrm{top}}(S)]$ is approximated by the mean of $\mathcal{L}_{\mathrm{top}}(S)$ evaluated over $n_S$ random samples $S$ of $E$. The idea behind this approach is that a topological model that is naturally present in the data should be represented well by many subsets of the data. Figure 3 shows the result for a fixed sampling fraction $f_S$ and number of samples $n_S$. The new data representation already visualizes the clusters well and far more naturally. An added benefit of the new loss (4) is that topological optimization can be conducted significantly faster for reasonably low $f_S$, as the α filtration and persistent homology are evaluated on smaller samples.
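The sampling approximation of (4) can be sketched generically in Python. This is a hypothetical helper, not the paper's code: it assumes subsets of size `round(f_s * n)` drawn without replacement, and accepts any loss callable, for instance the 0-dimensional loss (3) with $i = j = 2$ and $\mu = -1$ for two clusters.

```python
import random
import statistics


def subsampled_loss(points, loss_fn, f_s, n_s, rng=None):
    """Monte Carlo estimate of (4): the mean of loss_fn over n_s random subsets,
    each containing a fraction f_s of the points (drawn without replacement)."""
    rng = rng if rng is not None else random.Random()
    m = max(1, round(f_s * len(points)))  # subset size
    return statistics.fmean(loss_fn(rng.sample(points, m)) for _ in range(n_s))
```

Because every call evaluates the loss only on `m` points, the persistence computations inside `loss_fn` run on smaller inputs, which is where the speed-up for low $f_S$ comes from.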
In summary, various topological priors can now be formulated through topological losses as follows.
$m$-dimensional holes
Optimizing for $m$-dimensional holes ($m = 0$ for clusters, $m = 1$ for cycles, …) can generally be done through (3) or (4), by letting $D$ be the corresponding $m$-dimensional persistence diagram. The indices $i$ and $j$ in the summation express how many holes one wants: exactly, at least, or at most a given number. Finally, $\mu$ can be chosen to either decrease ($\mu = 1$) or increase ($\mu = -1$) persistence.
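One way to read the role of $i$, $j$, and $\mu$ is the hypothetical helper below, which is our illustration rather than part of the method. "At least $c$ holes" maximizes the persistence of the $c$-th most prominent hole, "at most $c$" minimizes the persistence of all less prominent ones, and "exactly $c$" combines both terms. It assumes the α filtration, for which only one 0-dimensional point is essential; that point occupies index $k = 1$ and counts as the first cluster, so the same rule applies to clusters and to higher-dimensional holes.

```python
import math


def hole_count_terms(count, mode):
    """Hypothetical helper: translate 'at least / at most / exactly `count` holes'
    into (i, j, mu) terms for loss (3). Diagram points are indexed from k = 1 in
    order of decreasing persistence; for 0-dimensional alpha diagrams the single
    essential point (the component that never dies) is k = 1."""
    at_least = (count, count, -1)        # maximize persistence of the count-th hole
    at_most = (count + 1, math.inf, 1)   # minimize persistence of all further holes
    if mode == "at_least":
        return [at_least]
    if mode == "at_most":
        return [at_most]
    if mode == "exactly":
        return [at_least, at_most]
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, "at most one connected component" yields $(i, j, \mu) = (2, \infty, 1)$, and "at least two clusters" yields $(2, 2, -1)$; summing the resulting terms gives the linear combinations discussed below.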
Flares
Persistent homology is invariant to certain topological changes. For example, both a linear ‘I’-structured model and a bifurcating ‘Y’-structured model consist of one connected component and no higher-dimensional holes. These models are indistinguishable based on their (persistent) homology, even though they are topologically different in terms of their singular points.
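Capturing such additional topological phenomena is possible through a refinement of persistent homology under the name of functional persistence, also well discussed and illustrated by Carlsson (2014). The idea is that instead of evaluating persistent homology on a data matrix $X$, we evaluate it on a subset $\{x \in X : f(x) \le \tau\}$ for a well-chosen function $f: X \to \mathbb{R}$ and hyperparameter $\tau \in \mathbb{R}$.
Inspired by this approach, for a point cloud $E$, we propose the topological loss
$$\mathcal{L}_{\mathrm{top}}(E) := \mathcal{L}_{\mathrm{top}}\left(\{x \in E : f_E(x) \le \tau\}\right), \tag{5}$$
where $f_E$ is a real-valued function on $E$, possibly dependent on $E$ itself (which changes during optimization), $\tau \in \mathbb{R}$ is a hyperparameter, and $\mathcal{L}_{\mathrm{top}}$ on the right-hand side is an ordinary topological loss as defined by (3). In particular, we will focus on the case where $f_E$ equals a scaled centrality measure on $E$:
$$f_E(x) := 1 - \frac{\lVert x - \bar{E} \rVert}{\max_{y \in E} \lVert y - \bar{E} \rVert}, \qquad \bar{E} := \frac{1}{|E|} \sum_{y \in E} y. \tag{6}$$
For $\tau \ge 1$, $\{x \in E : f_E(x) \le \tau\} = E$. For sufficiently small $\tau$, $\mathcal{L}_{\mathrm{top}}$ evaluates only on the points ‘far away’ from the center of $E$. As we will see in the experiments below, this is especially useful in conjunction with 0-dimensional persistence to optimize for flares in the point cloud representation.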
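The thresholding in (5) can be sketched as follows. This hypothetical Python sketch takes the scaled centrality to be one minus the distance to the mean, normalized by the largest such distance (one way to scale a centrality measure to $[0, 1]$, assumed here), and applies an arbitrary loss callable to the selected points.

```python
import math


def scaled_centrality(points):
    """Assumed form of f_E: 1 - ||x - mean(E)|| / max_y ||y - mean(E)||, in [0, 1];
    it equals 1 at the mean and 0 at the point farthest from it."""
    dim = len(points[0])
    mean = [sum(p[c] for p in points) / len(points) for c in range(dim)]
    dists = [math.dist(p, mean) for p in points]
    far = max(dists) or 1.0  # all points coincide: avoid dividing by zero
    return [1.0 - d / far for d in dists]


def functional_loss(points, loss_fn, tau):
    """Loss (5): apply an ordinary loss only to the points with f_E(x) <= tau."""
    f = scaled_centrality(points)
    return loss_fn([p for p, fx in zip(points, f) if fx <= tau])
```

Since $f_E$ is recomputed from the current point cloud, the selected subset can change from one optimization step to the next, which is the dependence of $f_E$ on $E$ mentioned above.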
Combinations
Naturally, through linear combination of loss functions, different topological priors can be combined, e.g., if we want the represented model to both be connected and include a cycle.
3 Experiments
In this section, we show how our proposed topological regularization of data embeddings (1) leads to a powerful and versatile approach for representation learning. In particular, we show that
- embeddings benefit from prior topological knowledge through topological regularization;
- conversely, topological optimization may also benefit from incorporating structural information as captured through embedding losses, leading to higher-quality representations;
- subsequent learning tasks may benefit from expert prior topological knowledge.
In Section 3.1, we show how topological regularization improves standard PCA dimensionality reduction and allows a better understanding of its performance when noise accumulates over many dimensions. In Section 3.2, we present applications to high-dimensional single-cell trajectory data and graph embedding. Quantitative results are discussed in Section 3.3.
Topological optimization was performed in PyTorch, using code adapted from Gabrielsson et al. (2020). Appendix B discusses a supplementary graph embedding experiment where we embed the Harry Potter network according to a circular prior. Data sizes, hyperparameters, losses, and optimization times are summarized in Tables 1 & 2. All code for this project is available at https://dropbox.com/sh/2n1z9fnh436869e/AAC5LMKIxi7CiCCwILAPgBXDa?dl=0.
3.1 Synthetic Data
We sampled points uniformly from the unit circle and then added 500-dimensional noise to the resulting data matrix $X$, where the noise in each dimension is sampled uniformly from a fixed interval. Since the additional noisy features are irrelevant to the topological (circular) model, an ideal projection embedding is the restriction of the data to its first two coordinates (Figure 4(a)).
However, it is probabilistically unlikely that the irrelevant features will have zero contribution to a PCA embedding of the data (Figure 4(b)). Measuring the importance of each feature as the sum of its two absolute contributions (the loadings) to the projection, we observe that most of the 498 irrelevant features have a small nonzero effect on the PCA embedding (Figure 5). Intuitively, each added feature slightly tilts the projection plane away from the plane spanned by the first two coordinates. As a result, the circular hole is less prominent in the PCA embedding of the data.
We can regularize this embedding using a topological loss function measuring the persistence of the most prominent 1-dimensional hole in the embedding ($i = j = 1$ and $\mu = -1$ in (3)). For a simple PyTorch-compatible implementation, we used $\mathcal{L}_{\mathrm{emb}}(W) := \lVert X - XWW^\top \rVert^2$, so as to minimize the reconstruction error between $X$ and its linear projection obtained through $W$. To this, we added the loss $\lVert W^\top W - I \rVert^2$, which encourages orthonormality of the matrix $W$ to be optimized, initialized with the PCA loadings. The resulting embedding $E = XW$ is shown in Figure 4(d) and better captures the circular hole (with $\lambda_{\mathrm{top}}$ as in Table 1). Furthermore, we see that irrelevant features now more often contribute less to the embedding according to $W$ (Figure 5).
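The two non-topological terms of this PCA objective can be written out in plain Python. This sketch uses hypothetical helper names and no autograd; it assumes $X$ is column-centered and that the orthonormality penalty enters with unit weight. In the experiment, these terms are minimized jointly with $\lambda_{\mathrm{top}}$ times the topological loss of $E = XW$.

```python
def matmul(A, B):
    """Plain-Python product of two matrices given as nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def frob_sq(A):
    """Squared Frobenius norm."""
    return sum(x * x for row in A for x in row)


def pca_losses(X, W):
    """Return the embedding E = X W, the reconstruction error ||X - X W W^T||^2,
    and the orthonormality penalty ||W^T W - I||^2."""
    E = matmul(X, W)
    recon = matmul(E, transpose(W))
    l_rec = frob_sq([[x - r for x, r in zip(xr, rr)] for xr, rr in zip(X, recon)])
    gram = matmul(transpose(W), W)
    k = len(gram)
    l_orth = frob_sq([[gram[a][b] - (1.0 if a == b else 0.0) for b in range(k)]
                      for a in range(k)])
    return E, l_rec, l_orth
```

When $W$ selects the first two coordinates, the penalty vanishes and the reconstruction error equals the squared norm of the remaining coordinates, which in the synthetic experiment is exactly the irrelevant noise.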
3.2 Real Data
Circular Cell Trajectory Data
We considered a single-cell trajectory data set of 264 cells in a 6812-dimensional gene expression space (Cannoodt et al., 2018; Saelens et al., 2019). The ground truth model, which can be considered a snapshot of the cells at a fixed time, is a circular model connecting three distinct cell groups through cell differentiation. Vandaele (2020) has shown that real single-cell data with such models are difficult to embed in a circular manner.
To explore this, we repeated the experiment of Section 3.1 on this data with the same losses, where the topological loss is now replaced by its expected version (4) with $f_S = 0.25$ and $n_S = 1$. From Figure 6(a), we see that while the ordinary PCA embedding does somewhat respect the positioning of the cell groups (marked by their color), it indeed struggles to embed the data in a manner that visualizes the circular hole. However, as shown in Figure 6(c), by topologically regularizing the embedding we are able to embed the data in a much more circular manner (with $\lambda_{\mathrm{top}}$ as in Table 1).
Bifurcating Cell Trajectory Data
We considered a second cell trajectory data set of 154 cells in a 1770-dimensional expression space (Cannoodt et al., 2018). The ground truth here is a bifurcating model connecting four different cell groups through cell differentiation. This time, however, we used the UMAP loss for the embeddings. We used a topological loss $\mathcal{L}_{\mathrm{top}} := \mathcal{L}_{\mathrm{conn}} + \mathcal{L}_{\mathrm{flare}}$, where $\mathcal{L}_{\mathrm{conn}}$ measures the total (sum of) finite 0-dimensional persistence in the embedding to encourage connectedness of the representation, and $\mathcal{L}_{\mathrm{flare}}$ is as in (5), measuring the persistence of the third most prominent 0-dimensional hole in $\{x \in E : f_E(x) \le \tau\}$, where $f_E$ is as in (6). Thus, $\mathcal{L}_{\mathrm{flare}}$ is used to optimize for a ‘flare’ with (at least) three clusters away from the embedding mean. We observe that while the ordinary UMAP embedding is more ‘blobby’ (Figure 7(a)), the topologically regularized embedding is more constrained towards a connected bifurcating shape (Figure 7(c)).
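The composite loss $\mathcal{L}_{\mathrm{conn}} + \mathcal{L}_{\mathrm{flare}}$ can be sketched in a self-contained, evaluation-only way. In this hypothetical Python sketch, 0-dimensional α persistence is again obtained from Euclidean minimum spanning tree edges (births at 0, deaths equal to squared half edge lengths), the scaled centrality is assumed to be one minus the normalized distance to the mean, and $\tau$ is left as a free parameter.

```python
import math


def zero_dim_persistences(points):
    """Finite 0-dim persistences of the alpha filtration (all births are 0),
    in decreasing order: squared half-lengths of Euclidean MST edges (Prim)."""
    if len(points) < 2:
        return []
    best = {v: math.dist(points[0], points[v]) for v in range(1, len(points))}
    deaths = []
    while best:
        u = min(best, key=best.get)  # vertex closest to the current tree
        deaths.append((best.pop(u) / 2.0) ** 2)
        for v in best:
            best[v] = min(best[v], math.dist(points[u], points[v]))
    return sorted(deaths, reverse=True)


def far_points(points, tau):
    """Points with assumed scaled centrality 1 - ||x - mean|| / max ||y - mean|| <= tau."""
    dim = len(points[0])
    mean = [sum(p[c] for p in points) / len(points) for c in range(dim)]
    dists = [math.dist(p, mean) for p in points]
    far = max(dists) or 1.0
    return [p for p, d in zip(points, dists) if 1.0 - d / far <= tau]


def bifurcation_loss(points, tau):
    """L_conn + L_flare: total finite 0-dim persistence of E (to be minimized),
    minus the persistence of the third most prominent 0-dim hole of the far points."""
    l_conn = sum(zero_dim_persistences(points))
    flare = zero_dim_persistences(far_points(points, tau))
    # The essential component is the most prominent hole, so the third most
    # prominent one is the second largest finite persistence.
    l_flare = -flare[1] if len(flare) > 1 else 0.0
    return l_conn + l_flare
```

The two terms pull in different directions on purpose: $\mathcal{L}_{\mathrm{conn}}$ keeps the whole embedding connected, while $\mathcal{L}_{\mathrm{flare}}$ only separates the outlying points into (at least) three branches.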
For comparison, we conducted topological optimization of the initialized UMAP embedding for the loss $\mathcal{L}_{\mathrm{top}}$, without the UMAP embedding loss. The resulting embedding is more fragmented (Figure 7(b)). We thus see that topological optimization may also benefit from the embedding loss.
Graph Embedding
The topological loss in (1) can be evaluated on any embedding and does not require a point cloud as original input. We can thus use topological regularization for embedding a graph $G$, to learn a representation of the nodes of $G$ in $\mathbb{R}^d$ that respects properties of $G$ well.
To explore this, we considered the Karate network (Zachary, 1977), a well-known and well-studied network within graph mining that consists of two different communities. The communities are represented by two key figures (John A. and Mr. Hi), as shown in Figure 8(a). To embed the graph, we used a DeepWalk variant adapted from Dagar et al. (2020). While the ordinary DeepWalk embedding (Figure 8(b)) respects the ordering of points according to their communities well, the two communities remain close to each other. We thus regularized this embedding using the topological loss defined by (4), where $\mathcal{L}_{\mathrm{top}}(S)$ measures the persistence of the second most prominent 0-dimensional hole, and $f_S = 0.25$, $n_S = 10$. The resulting embedding (Figure 8(d)) now nearly perfectly separates the two ground truth communities present in the graph.
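Topological optimization of the initialized DeepWalk embedding with the same topological loss but without the DeepWalk loss creates some natural community structure, but also results in a few outliers (Figure 8(c)). Thus, although our introduced loss (4) enables more natural topological modeling to some extent, we again observe that using it in conjunction with embedding losses, i.e., our proposed method of topological regularization, leads to the best qualitative results.
Table 1: Data sizes, embedding methods, learning rates (lr), numbers of epochs, regularization strengths $\lambda_{\mathrm{top}}$, and optimization times without (w/o top.) and with (with top.) the topological loss.
data | size | method | lr | epochs | $\lambda_{\mathrm{top}}$ | w/o top. | with top.
--- | --- | --- | --- | --- | --- | --- | ---
Synthetic Cycle | | PCA | 1e-1 | 500 | 1e-1 | 1s | 5s
Cell Cycle | 264 × 6812 | PCA | 5e-4 | 1000 | 1e-2 | 1s | 35s
Cell Bifurcating | 154 × 1770 | UMAP | 1e-1 | 100 | 1e-1 | 1s | 8s
Karate | | DeepWalk | 1e-2 | 50 | 5e-1 | 29s | 29s
Harry Potter | | InnerProd | 1e-1 | 100 | 1e-1 | 36s | 34s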
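Table 2: Topological loss functions, dimensions of the optimized holes, and the sampling fraction $f_S$ and number of samples $n_S$ used for (4).
data | top. loss function | dimension of hole | $f_S$ | $n_S$
--- | --- | --- | --- | ---
Synthetic Cycle | (3): most prominent cycle | 1 | N/A | N/A
Cell Cycle | (4): most prominent cycle | 1 | 0.25 | 1
Cell Bifurcating | (3) + (5): $\mathcal{L}_{\mathrm{conn}} + \mathcal{L}_{\mathrm{flare}}$ | 0, 0 | N/A | N/A
Karate | (4): second most prominent 0-dimensional hole | 0 | 0.25 | 10
Harry Potter | circular prior (Appendix B) | 1 | N/A | N/A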
3.3 Quantitative Evaluation
Table 3 summarizes the embedding and topological losses we obtained for the ordinary embeddings, the topologically optimized embeddings (initialized with the ordinary embeddings, but not using the embedding loss), and the topologically regularized embeddings. As one would expect, topological regularization balances the embedding losses between those of the ordinary and the topologically optimized embeddings. More interestingly, topological regularization may actually result in a lower topological loss than topological optimization alone, here in particular for the synthetic cycle data and the Harry Potter graph. This suggests that combining topological information with other structural information may facilitate convergence to the correct embedding model, as we also qualitatively confirmed for these data sets (see also Appendix B). We also observe larger differences in the obtained topological losses than in the embedding losses with and without regularization. This suggests that the optimum region of the embedding loss may be rather flat with respect to the topological loss. Thus, slight shifts in the local embedding optimum, e.g., caused by noise, may result in much worse topological embedding models, which can be resolved through topological regularization.
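Table 3: Embedding and topological losses of the ordinary embeddings (ord. emb.), the topologically optimized embeddings (top. opt.), and the topologically regularized embeddings (top. reg.) for the Synthetic Cycle, Cell Cycle, Cell Bifurcating, Karate, and Harry Potter data.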
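Table 4: Test performance of models trained on the ordinary (ord. emb.), topologically optimized (top. opt.), and topologically regularized (top. reg.) embeddings, measured by the coefficient of determination $R^2$ for the Synthetic Cycle data and by accuracy for the Cell Cycle, Cell Bifurcating, and Karate data.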
We also evaluated the quality of the embedding visualizations presented in this section by assessing how informative they are for predicting the ground truth data labels. For the Synthetic Cycle data, these labels are the 2D coordinates of the noise-free data on the unit circle, and we used a multi-output support vector regression model. For the cell trajectory data and the Karate network, we used the ground truth cell groupings and community assignments, respectively, and a support vector machine model. All points in the 2D embeddings were then split into 90% for training and 10% for testing. Subsequently, we used 5-fold cross-validation on the training data to tune the regularization hyperparameter $C$. All other settings were the scikit-learn defaults. The performance of the final tuned and trained model was then evaluated on the test data, through the coefficient of determination $R^2$ for the regression problem and the accuracy for all classification problems. Finally, we repeated this entire experiment 100 times. The averaged test performance metrics and their standard deviations are summarized in Table 4. From this, we observe that topological regularization consistently leads to the more informative visualization embeddings.
4 Discussion and Conclusion
We proposed a new approach for representation learning under the name of topological regularization, which builds on the recently developed differentiation frameworks for topological optimization. This led to a versatile and effective way of embedding data according to expert prior topological knowledge, directly postulated through (some newly introduced) topological loss functions.
A clear limitation of topological regularization is that expert prior topological knowledge is not always available. How to select the best out of a list of topological priors is thus open to further research. Furthermore, designing topological loss functions currently requires some understanding of persistent homology, and it may be useful to study how to facilitate that design process for lay users. From a foundational perspective, our work provides new research opportunities in extending the developed theory for topological optimization (Carriere et al., 2021) to our newly introduced losses and their integration into data embeddings. Finally, topological optimization based on combinatorial structures other than the α complex may be of both theoretical and practical interest. For example, point cloud optimization based on graph approximations such as the minimum spanning tree (Vandaele et al., 2021), or varying the functional threshold $\tau$ in the loss (5) alongside the filtration time (Chazal et al., 2009), may lead to new topological loss functions with fewer hyperparameters.
Nevertheless, through our approach, we already provided new and important insights into the performance of embedding methods, such as their potential inability to converge to the correct topological model due to the flatness of the embedding loss near its (local) optimum with respect to the topological loss. Furthermore, we quantitatively showed that including prior topological knowledge is a promising way to improve subsequent, even non-topological, learning tasks. In conclusion, topological regularization enables both improving and better understanding representation learning methods, for which we provided and thoroughly illustrated the first directions in this paper.
References
Aggarwal et al. (2001). On the surprising behavior of distance metrics in high dimensional space. In International Conference on Database Theory, pp. 420–434.
Bai et al. (2015). Constrained best Euclidean distance embedding on a sphere: a matrix optimization approach. SIAM Journal on Optimization 25 (1), pp. 439–467.
Bhagat et al. (2011). Node classification in social networks. In Social Network Data Analytics, pp. 115–148.
Cannoodt et al. (2018). Single-cell omics datasets containing a trajectory. Zenodo.
Carlsson (2009). Topology and data. Bulletin of the American Mathematical Society 46 (2), pp. 255–308.
Carlsson (2014). Topological pattern recognition for point cloud data. Acta Numerica 23, pp. 289–368.
Carriere et al. (2021). Optimizing persistent homology based functions. In International Conference on Machine Learning, pp. 1294–1303.
Chazal et al. (2009). Gromov-Hausdorff stable signatures for shapes using persistence. In Computer Graphics Forum, Vol. 28, pp. 1393–1403.
Cignoni et al. (1998). DeWall: a fast divide and conquer Delaunay triangulation algorithm in E^d. Computer-Aided Design 30 (5), pp. 333–341.
Clarke (1990). Optimization and nonsmooth analysis. SIAM.
Dagar et al. (2020). graph_nets. GitHub.
Gabrielsson et al. (2020). A topology layer for machine learning. In International Conference on Artificial Intelligence and Statistics, pp. 1553–1563.
Goyal and Ferrara (2018). Graph embedding techniques, applications, and performance: a survey. Knowledge-Based Systems 151, pp. 78–94.
Hatcher (2002). Algebraic topology. Cambridge University Press.
Kazemi and Poole (2018). Simple embedding for link prediction in knowledge graphs. In Advances in Neural Information Processing Systems, pp. 4284–4295.
Moor et al. (2020). Topological autoencoders. In International Conference on Machine Learning, pp. 7045–7054.
Otter et al. (2017). A roadmap for the computation of persistent homology. EPJ Data Science 6 (1), pp. 17.
Pun et al. (2018). Persistent-homology-based machine learning and its applications: a survey. arXiv preprint arXiv:1811.00252.
Rendle et al. (2020). Neural collaborative filtering vs. matrix factorization revisited. In Fourteenth ACM Conference on Recommender Systems, pp. 240–248.
Saelens et al. (2019). A comparison of single-cell trajectory inference methods. Nature Biotechnology 37, pp. 1.
Solomon et al. (2021). A fast and robust method for global topological functional optimization. In International Conference on Artificial Intelligence and Statistics, pp. 109–117.
The GUDHI Project (2021). GUDHI user and reference manual. 3.4.1 edition, GUDHI Editorial Board.
van den Dries (1998). Tame topology and o-minimal structures. Vol. 248, Cambridge University Press.
Vandaele et al. (2021). Stable topological signatures for metric trees through graph approximations. Pattern Recognition Letters 147, pp. 85–92.
Vandaele et al. (2020). Mining topological structure in graphs through forest representations. Journal of Machine Learning Research 21 (215), pp. 1–68.
Vandaele (2020). Topological data analysis of metric graphs for evaluating cell trajectory data representations. Master's Thesis, Ghent University, Faculty of Sciences.
Xie et al. (2016). Unsupervised deep embedding for clustering analysis. In International Conference on Machine Learning, pp. 478–487.
Zachary (1977). An information flow model for conflict and fission in small groups. Journal of Anthropological Research 33, pp. 452–473.
Zomorodian and Carlsson (2005). Computing persistent homology. Discrete & Computational Geometry 33 (2), pp. 249–274.
Appendix A Introduction to Persistent Homology
Persistent homology quantifies the change in topological holes (connected components, loops, voids, …) across a filtration, which is a nested sequence $K_0 \subseteq K_1 \subseteq \dots \subseteq K$ of subcomplexes of an initial simplicial complex $K$. A simplicial complex can be seen as a generalization of a graph that, apart from nodes (0-simplices) and edges (1-simplices), also includes higher-dimensional simplices such as triangles (2-simplices), tetrahedra (3-simplices), …, with the added constraint that if $K$ contains a simplex $\sigma$, every nonempty face $\sigma' \subseteq \sigma$ must also be contained in $K$. A simplex is commonly written as the set of its vertices, $\sigma = \{v_0, \dots, v_k\}$, and its dimension is by definition $|\sigma| - 1 = k$.
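The closure condition and the nesting of a filtration can be checked mechanically. The following Python sketch uses hypothetical helper names and represents each simplex as a tuple of vertex labels.

```python
from itertools import combinations


def is_simplicial_complex(simplices):
    """True if the set of simplices is closed under taking nonempty faces."""
    K = {frozenset(s) for s in simplices}
    return all(
        frozenset(face) in K
        for s in K
        for r in range(1, len(s))
        for face in combinations(s, r)
    )


def dimension(simplex):
    """A simplex {v_0, ..., v_k} has dimension |simplex| - 1 = k."""
    return len(set(simplex)) - 1


def is_filtration(complexes):
    """True if every K_t is a simplicial complex and K_0 ⊆ K_1 ⊆ ... ."""
    sets = [{frozenset(s) for s in K} for K in complexes]
    nested = all(a <= b for a, b in zip(sets, sets[1:]))
    return nested and all(is_simplicial_complex(K) for K in complexes)
```

A filled triangle needs all three vertices and all three edges to be a valid complex; dropping one edge while keeping the triangle violates closure.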
An example filtration is shown in Figure 0(a) in the main paper. Here, the initial complex $K$ is the Delaunay triangulation of a point cloud data set $X \subset \mathbb{R}^d$ (here $d = 2$). This triangulation, i.e., simplicial complex, is a subdivision of the convex hull of $X$ into simplices such that any two simplices intersect in a common face or not at all, such that the vertices of the simplices are contained in $X$, and such that no point in $X$ lies inside the circum(hyper)sphere of any simplex. Note that this complex is also shown in Figure 0(a) in the main paper (at time $t = \infty$).
The filtration constructed from $K$ in Figure 0(a) is the α-filtration. Here, every simplex $\sigma \in K$ is assigned a filtration value $f(\sigma)$, which equals the square of the circumradius of $\sigma$ if its (smallest) circumsphere contains no vertices other than those in $\sigma$, in which case $\sigma$ is said to be Gabriel; otherwise, $f(\sigma)$ is the minimum of the filtration values of the simplices containing $\sigma$ that make it not Gabriel. At time $t$, the complex in the filtration includes all simplices $\sigma$ with filtration value $f(\sigma) \leq t$. Although this is not required to understand the basic ideas presented in the main paper, we refer the interested reader to The GUDHI Project (2021) for a good overview of how the α-filtration is constructed.
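A simplified sketch of this assignment rule for planar point clouds, in plain Python. It assumes the Delaunay triangles are supplied as index triples (they are not computed here), only handles dimensions 0 to 2, and follows the description above rather than GUDHI's actual implementation; the names `alpha_values` and `complex_at` are hypothetical.

```python
from itertools import combinations


def circumcenter(a, b, c):
    """Circumcenter of a non-degenerate triangle in the plane."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    ux = ((ax**2 + ay**2) * (by - cy) + (bx**2 + by**2) * (cy - ay)
          + (cx**2 + cy**2) * (ay - by)) / d
    uy = ((ax**2 + ay**2) * (cx - bx) + (bx**2 + by**2) * (ax - cx)
          + (cx**2 + cy**2) * (bx - ax)) / d
    return ux, uy


def sq_dist(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def alpha_values(points, triangles):
    """Filtration values f(sigma) for vertices, edges, and triangles.

    `triangles` is assumed to be a Delaunay triangulation of `points`,
    given as index triples; it is not computed here.
    """
    f = {frozenset({i}): 0.0 for i in range(len(points))}  # vertices at time 0
    for tri in triangles:  # top-dimensional simplices: squared circumradius
        a, b, c = (points[i] for i in tri)
        f[frozenset(tri)] = sq_dist(a, circumcenter(a, b, c))
    edges = {frozenset(e) for tri in triangles for e in combinations(tri, 2)}
    for e in edges:
        i, j = sorted(e)
        mid = ((points[i][0] + points[j][0]) / 2, (points[i][1] + points[j][1]) / 2)
        r2 = sq_dist(points[i], mid)  # squared radius of the smallest circumcircle
        # Triangles containing e whose third vertex lies inside that circle,
        # i.e., the cofaces that make e not Gabriel.
        blockers = [f[frozenset(t)] for t in triangles
                    if e <= frozenset(t)
                    and sq_dist(points[(set(t) - e).pop()], mid) < r2]
        f[e] = min(blockers) if blockers else r2
    return f


def complex_at(f, t):
    """The complex at time t: all simplices with filtration value at most t."""
    return {s for s, v in f.items() if v <= t}


pts = [(0.0, 0.0), (2.0, 0.0), (1.0, 0.2)]  # an obtuse triangle
f = alpha_values(pts, [(0, 1, 2)])
for s in sorted(f, key=f.get):
    print(sorted(s), round(f[s], 3))
```

In this example the long edge between (0, 0) and (2, 0) is not Gabriel, because the third vertex lies inside its smallest circumcircle; it therefore only enters together with the triangle, at the triangle's squared circumradius 6.76, while the two short edges enter at 0.26.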
What is most important is that the filtration constructed from a point cloud $X$ captures the topological properties of the underlying model of $X$ well. For example, in Figure 0(a), we see that at some time in the filtration, the simplicial complex includes four connected components, one for each of the letters ‘I’, ‘C’, ‘L’, and ‘R’. We also see that at some time, the complex captures the cycle in the letter ‘R’, and later, it captures the larger cycle composed by the letters ‘C’ and ‘L’. These correspond to topological holes in the underlying model of $X$. A 0-dimensional hole is a gap between components, a 1-dimensional hole is a cycle or a loop, a 2-dimensional hole is a void, and in general, a $k$-dimensional hole can be regarded as the inside of a $k$-sphere. Here, true topological holes, i.e., those of the underlying model, tend to persist longer in the filtration.
Persistent homology tracks and quantifies these topological holes, and its results are commonly visualized by means of a persistence diagram. A persistence diagram contains a tuple $(b, d)$ for each topological hole of a fixed dimension that is born at time $b$ and dies at time $d$ in a filtration. Persistence diagrams for different dimensions of holes in the same data are usually plotted on top of each other, as in Figure 0(b) in the main paper. Holes that persist longer, i.e., with a larger persistence $d - b$, correspond to points that lie further above the diagonal of the diagram, and capture more prominent topological properties of the underlying model. Tuples for which $d = \infty$, which occur when a hole never dies in the filtration (e.g., from some point on, the filtration always remains connected, so one connected component never dies), are usually plotted on top of the diagram.
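To make the tuples in a persistence diagram concrete, the following plain-Python sketch computes them with the standard boundary-matrix column reduction over Z/2. It is unoptimized and meant for tiny complexes only; it is not how GUDHI or other libraries implement persistence, and the names `persistence_diagrams` and `square` are hypothetical. The input maps each simplex to its filtration value, in which faces never exceed their cofaces.

```python
from itertools import combinations


def persistence_diagrams(filtration):
    """Persistence pairs (dim, birth, death) of a filtered simplicial complex.

    `filtration` maps each simplex (a frozenset of vertices) to its value.
    Uses the standard boundary-matrix reduction with Z/2 coefficients.
    """
    # Order simplices by filtration value, placing faces before cofaces on ties.
    order = sorted(filtration, key=lambda s: (filtration[s], len(s)))
    index = {s: i for i, s in enumerate(order)}
    # Each column holds the row indices of a simplex's boundary faces (mod 2).
    columns = [
        {index[frozenset(face)] for face in combinations(sorted(s), len(s) - 1)}
        if len(s) > 1 else set()
        for s in order
    ]
    pivot_of = {}  # lowest nonzero row -> index of the column that owns it
    pairs = []
    for j, col in enumerate(columns):
        while col and max(col) in pivot_of:
            col ^= columns[pivot_of[max(col)]]  # add an earlier column mod 2
        if col:  # simplex j kills the hole created by simplex max(col)
            i = max(col)
            pivot_of[i] = j
            birth, death = filtration[order[i]], filtration[order[j]]
            if death > birth:  # drop zero-persistence pairs
                pairs.append((len(order[i]) - 1, birth, death))
    paired = set(pivot_of) | set(pivot_of.values())
    for i, s in enumerate(order):
        if i not in paired:  # a hole that never dies
            pairs.append((len(s) - 1, filtration[s], float("inf")))
    return sorted(pairs)


# A square whose boundary closes at time 2 and is filled in at time 3.
square = {
    frozenset({0}): 0, frozenset({1}): 0, frozenset({2}): 0, frozenset({3}): 0,
    frozenset({0, 1}): 1, frozenset({1, 2}): 1, frozenset({2, 3}): 1,
    frozenset({0, 3}): 2,
    frozenset({0, 2}): 3, frozenset({0, 1, 2}): 3, frozenset({0, 2, 3}): 3,
}
print(persistence_diagrams(square))
```

For this filtered square, the sketch reports three 0-dimensional pairs $(0, 1)$ (components merging), one essential component $(0, \infty)$, and a single 1-dimensional hole $(2, 3)$, which is born when the loop closes and dies when the two triangles fill it.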
Note that topological optimization, that is, optimizing the data representation with respect to its persistence diagram(s), which is one of the main tools for our proposed method of topological regularization, is especially effective when conducted through the α-filtration constructed from a low-dimensional data embedding matrix $\mathbf{E} \in \mathbb{R}^{n \times d}$. In particular, when the embedding dimension $d$ is low, e.g., for data visualization applications (which were also the focus of the experiments section in the main paper), the α-filtration can be rapidly constructed from $\mathbf{E}$, whereas its computational cost increases exponentially for larger dimensions $d$. A potential solution to this is to use Vietoris–Rips filtrations instead. These are filtrations constructed from the Vietoris–Rips complex of the data $\mathbf{E}$, which includes a simplex for every possible subset of points in $\mathbf{E}$; in practice, the dimensions of these simplices are capped at the homology dimension of interest plus one. While Vietoris–Rips complexes, and thus their filtrations, can be constructed more rapidly in higher dimensions, they tend to include far more simplices than the α-filtration, which inherently complicates the subsequent computation of persistent homology. Thus, optimizing the loss for topological regularization (equation (1) in the main paper) is most efficient through α-filtrations for low-dimensional data embedding matrices $\mathbf{E}$. For more details on the computational cost of persistent homology as well as the associated filtrations, we refer to Otter et al. (2017).
Appendix B Supplementary Experiments
We considered an additional experiment on the Harry Potter graph obtained from https://github.com/hzjken/character-network. This graph is composed of characters from the Harry Potter novel (the nodes in the graph), and edges marking friendly relationships between them (Figure 9). Only the largest connected component is used. This graph has previously been analyzed by Vandaele et al. (2020), who identified a circular model therein that transitions between the ‘good’ and ‘evil’ characters from the novel.
To embed the Harry Potter graph, we used a simple graph embedding model where the sigmoid of the inner product between embedded nodes captures the (Bernoulli) probability of an edge occurrence (Rendle et al., 2020). Thus, this probability will be high for nodes close to each other in the embedding, and low for more distant nodes. These probabilities are then optimized to match the binary edge indicator vector. Figure 9(a) shows the result of this embedding, along with the circular model presented by Vandaele et al. (2020). For clarity, character labels are only annotated for a subset of the nodes (the same as by Vandaele et al. (2020)). We furthermore regularized this embedding using a topological loss function that measures the persistence of the most prominent 1-dimensional hole in the embedding (see also Table 2 in the main paper), the result of which is shown in Figure 9(c). Interestingly, the topologically regularized embedding now better captures the circularity of the model identified by Vandaele et al. (2020), and focuses more on distributing the characters along it. Note that although this model is included in the visualizations, it is not used to derive the embeddings, nor is it derived from them.
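A minimal NumPy sketch of such a sigmoid inner-product model, fitted with binary cross-entropy over all node pairs and plain gradient descent. The loss normalization, optimizer, step size, and embedding dimension are our own assumptions (the settings used for Figure 9 are not given here), the helper `embedding_loss_and_grad` is hypothetical, and a 6-node cycle serves as a toy stand-in for the Harry Potter graph. The topological loss term used for regularization is not included.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def embedding_loss_and_grad(E, A, eps=1e-12):
    """Mean binary cross-entropy of the model P[i, j] = sigmoid(<E_i, E_j>).

    E: (n, k) node embedding; A: (n, n) symmetric 0/1 adjacency matrix.
    Every unordered node pair i < j is scored once.
    """
    n = A.shape[0]
    iu = np.triu_indices(n, k=1)
    p = sigmoid(E @ E.T)[iu]  # predicted edge probabilities
    a = A[iu]                 # binary edge indicator vector
    loss = -np.mean(a * np.log(p + eps) + (1 - a) * np.log(1 - p + eps))
    # dLoss/dlogit = (p - a) / #pairs; symmetrizing gives dLoss/dE = G @ E.
    G = np.zeros((n, n))
    G[iu] = (p - a) / a.size
    G = G + G.T
    return loss, G @ E


# Toy graph: a cycle on 6 nodes.
n = 6
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0

rng = np.random.default_rng(0)
E = rng.normal(scale=0.1, size=(n, 2))
initial_loss, _ = embedding_loss_and_grad(E, A)
for _ in range(1000):
    loss, grad = embedding_loss_and_grad(E, A)
    E -= 0.5 * grad  # plain gradient descent step
final_loss, _ = embedding_loss_and_grad(E, A)
print(f"loss: {initial_loss:.3f} -> {final_loss:.3f}")
```

Scoring each unordered pair once keeps the loss and gradient symmetric in the two endpoints; in practice, larger graphs are usually handled with sampled non-edges rather than all pairs.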
For comparison, Figure 9(b) shows the result of optimizing the initialized ordinary graph embedding for the same topological loss, but without the graph embedding loss. We observe that this results in a sparse, enlarged cycle. Most characters are positioned poorly along the circular model and are concentrated in a small region. Interestingly, even though we only optimized for the topological loss here, that loss is actually less optimal, i.e., higher, than when we applied topological regularization (see also Table 4 in the main paper). This is also a result of the sparsity of the circle, which leads to a larger birth time, and thus a lower persistence, of the corresponding hole.
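The link between sparsity and birth time can be illustrated with an idealized worked example of our own (not taken from the experiments). Assume $n \geq 3$ points evenly spaced on a circle of radius $r$ and an α-filtration with squared radii as filtration values. The cycle is born when the polygon edges enter, at $(r \sin(\pi/n))^2$, and dies at $r^2$, when the triangles filling it enter, so its persistence $r^2 \cos^2(\pi/n)$ shrinks as $n$ decreases, i.e., as the circle becomes sparser at a fixed radius. The function name below is hypothetical.

```python
import math


def regular_cycle_persistence(n, r=1.0):
    """(birth, death, persistence) of the 1-dimensional hole formed by n >= 3
    points evenly spaced on a circle of radius r, assuming an alpha filtration
    that uses squared radii as filtration values.
    """
    birth = (r * math.sin(math.pi / n)) ** 2  # polygon edges: (half length)^2
    death = r ** 2                            # filling triangles: circumradius r
    return birth, death, death - birth


for n in (4, 8, 32):
    b, d, p = regular_cycle_persistence(n)
    print(f"n={n:2d}  birth={b:.3f}  death={d:.3f}  persistence={p:.3f}")
```

At a fixed unit radius, going from 32 to 4 points raises the birth time from about 0.01 to 0.5 and halves the persistence. Enlarging the circle also raises the death time, so this example isolates only the effect of sparsity.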