1 Introduction
Graph-structured data are ubiquitous, and learning from graphs is important in various scientific and industrial domains (Zhang et al., 2020; Wu et al., 2020). In this paper, we focus on the link prediction task, i.e., learning to predict whether an edge exists between two target nodes, conditioned on their attributes and local connectivity (Liben-Nowell and Kleinberg, 2007; Schlichtkrull et al., 2018; Lü and Zhou, 2011). Link prediction is an important step in knowledge discovery in various applications, e.g., recommendation systems (Koren et al., 2009; Adamic and Adar, 2003), knowledge graph completion (Teru et al., 2020), protein-protein interactions (Coulomb et al., 2005), and gene prediction (Nagarajan et al., 2015).

Classic link prediction methods (Barabási and Albert, 1999; Zhou et al., 2009; Brin and Page, 2012) use handcrafted connectivity features and enforce strong assumptions about the distribution of links, e.g., that nodes with similar connectivity features tend to be connected. Better performance has been achieved by comparing the similarity of nodes in an embedding space (Perozzi et al., 2014), which encodes more global connectivity information. In recent years, graph neural networks (GNNs) have achieved state-of-the-art link prediction performance, as they exploit graph connectivity information and node attributes in a completely data-driven manner. However, even for GNNs, graph connectivity information such as node degrees remains beneficial: it provides valuable contextual information for the graph convolutional operations (Kipf and Welling, 2016; Qiu et al., 2018).
For link prediction, an effective strategy is to directly model the interaction between the two target nodes. The path distance of nearby nodes from the target nodes has been shown to be useful (Zhang and Chen, 2018). However, these distance-based methods mainly focus on the "closeness" between target nodes and do not explicitly model the "richness of connections" between them. Figure 1 shows two examples with the same distance encoding, yet the connections in the right example are much richer than those in the left one. It is conceivable that nodes with a wealth of multi-hop connections have a better chance of sharing an edge.
To exploit the richness of connections between nodes, we propose a novel method based on the theory of persistent homology (Edelsbrunner et al., 2000; Edelsbrunner and Harer, 2010), which encodes high-order structural information of the graph via algebraic topology. The theory captures topological structures of arbitrary shape and scale, e.g., connected components and loops, and encodes them in a robust-to-noise manner. To predict whether two given target nodes are connected, our method explicitly counts the number of loops within their vicinity. This count essentially measures the complexity of connections between the two nodes, which can be indicative of the existence of links. The method not only counts the number of loops, but also measures the range of distance from the target nodes for each loop. See Figure 2 for an illustration. This rich set of structural information is mapped into a topological feature space and is integrated into a graph neural network. As will be shown in experiments, our topological loop-counting graph neural network achieves better performance for link prediction.
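As a minimal illustration of "richness of connections", the number of independent loops of a graph (its first Betti number) can be counted directly from the node and edge counts and the number of connected components. The sketch below is illustrative only, not part of the proposed method:

```python
def loop_count(num_nodes, edges):
    # First Betti number of a graph: |E| - |V| + number of connected components,
    # computed with a small union-find over the edges.
    parent = list(range(num_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = num_nodes
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            components -= 1
    return len(edges) - num_nodes + components

# A 4-cycle has one independent loop; adding a chord creates a second one.
assert loop_count(4, [(0, 1), (1, 2), (2, 3), (3, 0)]) == 1
assert loop_count(4, [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]) == 2
```

Persistent homology refines this raw count by additionally recording at which filter values each loop appears and disappears.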
Persistent homology has been used for learning with graphs (Zhao and Wang, 2019; Hofer et al., 2020, 2017; Carrière et al., 2020). However, most existing works use it as a global structural feature for the whole graph. These global features, although proven useful for graph classification tasks, cannot describe the interaction between a pair of nodes.
In this paper, we propose a pairwise topological feature to capture the richness of the interaction between a specific pair of target nodes. We compute topological information within the vicinity of the target nodes, i.e., the intersection of their k-hop neighborhoods. It has been shown that such a local enclosing subgraph carries sufficient information for link prediction (Zhang and Chen, 2018). To measure the saliency of topological structures, we also introduce a distance-based filter function that measures the interaction between nodes. These choices ensure that our pairwise topological feature is informative of the interaction between the targets.
We propose the topological loop-counting graph neural network (TLC-GNN) by injecting the pairwise topological feature into the latent representation of a graph neural network. Our method achieves state-of-the-art performance in link prediction. Another contribution of this paper is on the computational side. To capture the full information of loops, we use extended persistent homology (Cohen-Steiner et al., 2009). The commonly used algorithm is based on a matrix reduction procedure similar to Gaussian elimination and is cubic in the input graph/subgraph size. (In theory, the fastest algorithm for the persistence diagram has the same complexity as matrix multiplication (Edelsbrunner and Harer, 2010; Milosavljević et al., 2011), i.e., O(n^ω), in which ω < 2.373 (Alman and Williams, 2021).) Since the computation needs to be executed on all training/validation/testing pairs of nodes, we could significantly benefit from a faster algorithm. In this paper, we propose a novel algorithm for extended persistent homology, specific to graphs. Instead of the expensive matrix reduction, our algorithm operates directly on the graph and is quadratic in the input graph/subgraph size. This algorithm is not application specific and can be applied to other graph learning problems (Zhao and Wang, 2019; Hofer et al., 2020, 2017; Carrière et al., 2020).
In summary, our contribution is threefold:


We introduce a pairwise topological feature based on persistent homology to measure the complexity of the interaction between nodes. We compute the topological feature specific to the target nodes using a carefully designed filter function and domain of computation.

We use the pairwise topological feature to enhance the latent representation of a graph neural network and achieve state-of-the-art link prediction results on various benchmarks.

We propose a general-purpose fast algorithm to compute extended persistent homology on graphs. The time complexity is improved from cubic to quadratic in the input size. It applies to many other persistent-homology-based learning methods for graphs.
Outline. In Section 2, we briefly review existing work on link prediction and on learning with topological information. In Section 3, we present the details of extended persistent homology and our model, TLC-GNN. In Section 4, we introduce a faster algorithm for extended persistent homology and prove its correctness. In Section 5, we evaluate our method on synthetic and real-world benchmarks.
2 Related Work
Link prediction methods. Early works (Barabási and Albert, 1999; Zhou et al., 2009; Brin and Page, 2012; Jeh and Widom, 2002) predict links based on node similarity scores measured within the local neighborhood of the two target nodes. These methods tend to make strong assumptions about the link distribution and do not generalize well. Graph embeddings have been used to encode more global structural information for link prediction (Koren et al., 2009; Airoldi et al., 2008; Perozzi et al., 2014; Tang et al., 2015; Qiu et al., 2018). However, these methods rely only on graph connectivity and do not take full advantage of node attributes.
In recent years, new GNN-based methods have been proposed to jointly leverage graph connectivity and node attributes. Zhang and Chen (2018) show that local enclosing subgraphs contain sufficient information and propose a GNN to leverage it. Considering the non-Euclidean nature of graph metrics, one may generalize graph convolution to the hyperbolic space (Chami et al., 2019; Zhu et al., 2020). However, most existing methods use either limited structural information or node embeddings to represent edge features. They do not explicitly model the advanced topological information that arises in node interaction.
Learning with topological features. Persistent homology (Edelsbrunner et al., 2000; Edelsbrunner and Harer, 2010) captures structural information from data using the language of algebraic topology (Munkres, 2018). It captures multi-scale topological structures in a provably robust manner (Cohen-Steiner et al., 2007). Different methods for learning with persistent homology information have been proposed, such as direct vectorization (Adams et al., 2017), kernel machines (Reininghaus et al., 2015; Kusano et al., 2016; Carriere et al., 2017), convolutional neural networks (Hofer et al., 2017), and topological losses (Chen et al., 2019; Hu et al., 2019; Hofer et al., 2019).

For graph-structured data, topological features have been used for node classification (Zhao et al., 2020) and graph classification (Zhao and Wang, 2019; Hofer et al., 2020; Carrière et al., 2020). However, these existing methods cannot model interactions between nodes as desired in link prediction tasks. Bhatia et al. (2018) also use persistent homology for link prediction, but their method only exploits 0-dimensional topology, i.e., whether the target nodes are connected within the local neighborhood. This cannot capture the complexity of connections as we intend to model.
3 Link Prediction with Persistent Homology
In this section, we introduce our topological loop-counting graph neural network (TLC-GNN), which computes persistent homology on a chosen subgraph and incorporates the resulting pairwise topological feature into a graph neural network. The input of the model includes two target nodes and a subgraph encoding their topological information. The output is the probability that an edge exists between the two target nodes.
In Sections 3.1 and 3.2, we briefly introduce extended persistent homology and its computation, respectively. In Section 3.3, we illustrate how to combine the topological feature with a standard graph neural network.
3.1 Extended Persistent Homology
In this section, we provide a brief introduction to extended persistent homology and refer the reader to (Edelsbrunner et al., 2000; Cohen-Steiner et al., 2009) for details. In the setting of graphs, the data only contain 0-dimensional (connected components) and 1-dimensional (loops) topological structures. (In graphs, there are no triangles; as a result, all loops are topological structures, i.e., non-bounding cycles.) We define simplices to be all elements of the graph, i.e., nodes and edges. The combinatorial relationship between simplices determines the topological structure of the graph, and persistent homology counts these topological structures. Moreover, persistent homology measures the saliency of each topological structure with respect to a scalar function defined on all simplices, called the filter function. For example, let V and E be the sets of nodes and edges, and let X = V ∪ E be the set of simplices. The filter function f(v) of a node v can be defined as the sum of its distances to the two target nodes. We can then further define the filter function of an edge (u, v) as f((u, v)) = max(f(u), f(v)).
Given the filter function f on all simplices of a graph, we can define the sublevel set X_a = {x ∈ X : f(x) ≤ a}. Here a is a threshold on the filter function, and X_a is the subset of X whose simplices have filter value not greater than a. As the threshold a increases from −∞ to +∞, we obtain the ascending filtration of X: ∅ = X_{a_0} ⊆ X_{a_1} ⊆ ⋯ ⊆ X. An example is shown in the first half of Figure 2 (b).
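Concretely, given filter values on the nodes, the ascending filtration is just an ordering of simplices by filter value, with each edge entering at the maximum value of its two endpoints. A minimal sketch (function and variable names are illustrative):

```python
def ascending_filtration(node_filter, edges):
    """Order simplices by increasing filter value.

    node_filter: dict mapping node -> filter value. An edge enters the
    sublevel set at the maximum filter value of its two endpoints.
    """
    simplices = [(f, 0, (v,)) for v, f in node_filter.items()]
    simplices += [(max(node_filter[u], node_filter[v]), 1, (u, v))
                  for u, v in edges]
    simplices.sort()  # at equal values, nodes (dim 0) enter before edges (dim 1)
    return [s for _, _, s in simplices]

f = {"a": 1, "b": 2, "c": 3}
filt = ascending_filtration(f, [("a", "b"), ("b", "c")])
assert filt == [("a",), ("b",), ("a", "b"), ("c",), ("b", "c")]
```

The descending filtration is obtained analogously by sorting in decreasing order, with each edge entering at the minimum value of its endpoints.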
As the threshold a increases, the sublevel set grows from empty to X, and topological structures gradually appear (are born) and disappear (die). In the example of Figure 2, two connected components appear when the first two nodes enter the filtration; one of them disappears when an edge connecting the two components enters. Later, two loops appear when their closing edges enter.
Extended persistence. With the setting above, however, some structures, such as the whole connected component and all the loops, never die. To address this limitation of ordinary persistent homology, Cohen-Steiner et al. (2009) propose extended persistence by introducing another filtration based on the superlevel set X^a = {x ∈ X : f(x) ≥ a}. Letting a decrease from +∞ to −∞, we obtain the descending filtration of X: ∅ = X^{b_0} ⊆ X^{b_1} ⊆ ⋯ ⊆ X. (Technically, the second half of the filtration is taken with relative homology groups.) An example descending filtration is shown in the second half of Figure 2 (b). Note that in the descending filtration, the filter function of an edge is set differently, i.e., f((u, v)) = min(f(u), f(v)).
For the loops and the whole connected component of a graph, the death time can then be defined as the filter value in the superlevel set at which the structure appears again. For instance, the loop in Figure 2 is born when its closing edge enters the ascending filtration and dies when the corresponding structure reappears in the descending filtration. It is a bit counterintuitive that the death time can be smaller than the birth time.
After capturing the birth and death times of all topological structures, we encode them into a 2D point set called a persistence diagram. Each topological structure corresponds to one persistence point in the diagram, whose x- and y-coordinates are its birth and death times. Figure 2 (c) shows an extended persistence diagram.
After obtaining the extended persistence diagram, we can encode it into a vectorized feature called a persistence image (Adams et al., 2017), which converts a persistence diagram into a vector of fixed dimension. Further details of the persistence image are available in the appendix.
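The idea behind the persistence image can be sketched in a few lines of numpy: each diagram point becomes a Gaussian bump on a discretized grid, and the grid is flattened into a fixed-length vector. The grid size, bounds, and persistence weighting below are illustrative choices, not the paper's exact parameters:

```python
import numpy as np

def persistence_image(diagram, resolution=8, sigma=0.5, bound=4.0):
    # Each point (birth, death) becomes a Gaussian bump centered at
    # (birth, persistence), weighted by its persistence; the discretized
    # grid is flattened into a fixed-length feature vector.
    grid = np.linspace(0.0, bound, resolution)
    xs, ys = np.meshgrid(grid, grid)
    img = np.zeros_like(xs)
    for birth, death in diagram:
        pers = abs(death - birth)
        img += pers * np.exp(-((xs - birth) ** 2 + (ys - pers) ** 2)
                             / (2 * sigma ** 2))
    return img.ravel()

vec = persistence_image([(1.0, 3.0), (2.0, 2.5)])
assert vec.shape == (64,)   # fixed dimension regardless of diagram size
assert vec.sum() > 0
```

Because the output dimension is fixed, diagrams with different numbers of points map to vectors of the same length, which is what allows them to be fed into a standard neural network.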
3.2 Matrix Reduction Algorithm for Extended Persistence Diagram
In this section, we introduce the algorithm for computing extended persistence diagrams. Let n be the total number of simplices (n = |V| + |E|). We write the ascending sequence of simplices as the simplices of X ordered by increasing filter value. (Without loss of generality, we assume the filter values of all nodes are distinct.) Similarly, we write the descending sequence as the simplices ordered by decreasing filter value. Every simplex appears once in the ascending sequence and once in the descending sequence.
To compute the extended persistence diagram, we use a binary-valued matrix M to encode the adjacency relationship between nodes and edges. M is a 2n × 2n matrix consisting of four n × n blocks. Every column or row of M corresponds to a simplex: the first n columns of M correspond to the ascending sequence of simplices, and the last n columns correspond to the descending sequence; the rows are arranged in the same way. The ascending block encodes the relationship between the simplices in the ascending sequence: similar to the incidence matrix of a graph, an entry is 1 iff its row simplex is a boundary of its column simplex, i.e., a node adjacent to the corresponding edge. The matrix M of Figure 2 is shown in Figure 3.

The descending block is defined similarly, except that it encodes the relationship of the simplices in the descending sequence. The permutation block connects the two sequences: an entry is 1 iff its row and column denote the same simplex. The remaining block is zero-valued. In the appendix, we provide a complete example of the matrix M.

Algorithm 1 reduces M from left to right. To describe the reduction process, let low(j) be the maximum row index i for which M[i, j] = 1; if column j is zero, low(j) is undefined. M is reduced if low(j_1) ≠ low(j_2) for any two non-zero columns j_1 ≠ j_2; whenever two columns share the same lowest entry, the earlier column is added to the later one, where "add" is the mod-2 sum of the binary column vectors. After the matrix reduction is finished, we can read off the persistence diagram: each pair (low(j), j) of simplex indices corresponds to an extended persistence pair, and taking the filter values of the corresponding simplices yields the birth and death times of a persistence point. To better illustrate Algorithm 1, we provide the reduction process for Figure 2 in the appendix.
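The left-to-right reduction can be sketched compactly by representing each column as the set of its non-zero row indices. The snippet below illustrates ordinary persistence reduction on a small boundary matrix, not the full extended-persistence matrix M:

```python
def reduce_columns(columns):
    """Standard mod-2 matrix reduction; columns are sets of row indices.
    Returns the (low-row, column) persistence pairs."""
    low_to_col = {}   # lowest non-zero row -> index of the column owning it
    pairs = []
    for j in range(len(columns)):
        col = set(columns[j])
        while col and max(col) in low_to_col:
            col ^= columns[low_to_col[max(col)]]  # mod-2 column addition
        columns[j] = col
        if col:
            low_to_col[max(col)] = j
            pairs.append((max(col), j))
    return pairs

# Triangle graph: nodes 0, 1, 2, then edges (0,1) -> col 3, (0,2) -> col 4,
# (1,2) -> col 5. Node columns are empty; edge columns list their endpoints.
cols = [set(), set(), set(), {0, 1}, {0, 2}, {1, 2}]
pairs = reduce_columns(cols)
assert pairs == [(1, 3), (2, 4)]
assert cols[5] == set()   # column 5 reduces to zero: this edge creates a loop
```

Each column is reduced by repeatedly adding earlier columns with the same lowest entry, which is exactly why the overall cost is cubic in the number of simplices.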
3.3 TLC-GNN for Link Prediction
We have introduced extended persistent homology, its computation, and its vectorization. Next, we explain how our method computes the topological feature for a specific pair of target nodes and combines it with a GNN for link prediction.
To characterize the interaction between two target nodes, we need to choose a vicinity graph and a corresponding filter function. Given a weighted graph, we define the filter value of a node as the sum of its distances to the two target nodes. Existing works using persistent homology usually exploit topological information from the whole graph (Zhao and Wang, 2019; Hofer et al., 2020) and thus cannot be directly used in our task. Besides, topological structures captured by global persistent homology can be irrelevant to the target nodes if they are far away. As justified in (Zhang and Chen, 2018), local enclosing subgraphs already contain enough information for link prediction. Consequently, we exploit the intersection of the k-hop neighborhoods of the two target nodes u and v: let G = (V, E) be the whole graph, where V and E are the sets of vertices and edges, and let V' be the intersection of the k-hop neighborhoods of u and v. The subgraph of interest is G' = (V', E'), where E' consists of the edges of E with both endpoints in V'. We compute the persistence image of the subgraph generated by each candidate link, and denote by PI(u, v) the persistence image of the subgraph generated by target nodes u and v.
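Extracting the vicinity subgraph amounts to a breadth-first search from each target node followed by an intersection. A plain-Python sketch under the notation above (keeping the two targets themselves in the subgraph is a design choice of this sketch):

```python
from collections import deque

def k_hop(adj, source, k):
    # Nodes within k hops of `source` (BFS over an adjacency dict).
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return set(dist)

def enclosing_subgraph(adj, u, v, k=1):
    # Intersection of the k-hop neighborhoods, plus the targets themselves,
    # together with the induced edges.
    keep = (k_hop(adj, u, k) & k_hop(adj, v, k)) | {u, v}
    edges = {(a, b) for a in keep for b in adj[a] if b in keep and a < b}
    return keep, edges

adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2]}
nodes, edges = enclosing_subgraph(adj, 0, 3, k=1)
assert nodes == {0, 1, 2, 3}
assert edges == {(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)}
```

The persistence image would then be computed on this induced subgraph with the distance-based filter function.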
In our setting, the model used to obtain node embeddings is a classic multi-layer graph convolutional network (GCN) (Kipf and Welling, 2016). Messages are passed between nodes to update their feature representations. After an L-layer GCN, the embedding of a node can be viewed as a combination of the node representations from its L-hop neighborhood. Let H^(l) denote the node representation matrix in the l-th layer; H^(0) is the matrix of input node features, and H^(L) is the node embedding matrix of the final layer.
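One GCN propagation layer can be sketched with dense numpy matrices as follows. This is a simplified illustration of the standard GCN rule, not the authors' implementation:

```python
import numpy as np

def gcn_layer(adj, h, w):
    # One GCN step: relu(D^{-1/2} (A + I) D^{-1/2} H W).
    a_hat = adj + np.eye(adj.shape[0])          # adjacency with self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm @ h @ w, 0.0)        # ReLU activation

rng = np.random.default_rng(0)
adj = np.array([[0., 1., 0.],
                [1., 0., 1.],
                [0., 1., 0.]])
h = rng.normal(size=(3, 4))   # input node features
w = rng.normal(size=(4, 2))   # learned weight matrix
out = gcn_layer(adj, h, w)
assert out.shape == (3, 2)
assert (out >= 0).all()
```

Stacking L such layers lets each node aggregate information from its L-hop neighborhood.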
To compute the representations of all nodes in the (l+1)-th layer, H^(l+1) = σ(D̃^(−1/2) Ã D̃^(−1/2) H^(l) W^(l)), where Ã denotes the adjacency matrix with self-loops inserted, D̃ is the diagonal degree matrix of Ã, σ represents the activation function, and W^(l) is a learned matrix that maps node embeddings from the dimension of layer l to that of layer l+1. Equivalently, in the node-wise formulation, each node aggregates the normalized representations of its neighborhood, including itself.

For given target nodes u and v, the node embeddings obtained through the GCN can be viewed as node features, and the persistence image PI(u, v) can be viewed as an edge feature. To combine the two effectively, we use a modified Fermi-Dirac decoder (Krioukov et al., 2010; Nickel and Kiela, 2017): the difference between the two node embeddings is concatenated with PI(u, v) and passed to a two-layer Multi-Layer Perceptron (MLP), which outputs a single value, interpreted as the distance between the two target nodes; the final probability that an edge exists between u and v is obtained by applying the Fermi-Dirac function to this distance. We then train TLC-GNN by minimizing the cross-entropy loss with negative sampling.

4 A Faster Algorithm for Extended Persistence Diagram
In this section, we propose a new algorithm for extended persistent homology. Recall that Algorithm 1 reduces the matrix M. The new algorithm avoids the explicit construction and reduction of M. Instead, we create and maintain a rooted tree while going through the descending filtration. When processing a new edge, by inspecting its relationship with the current tree, we can find the corresponding persistence pair efficiently. Afterward, we update the tree and continue with the next edge in the descending filtration. In Section 4.1, we explain the algorithm in detail. In Section 4.2, we prove its correctness.
4.1 A Faster Algorithm
The faster algorithm is shown in Algorithm 2. For 0-dimensional topology, we run the existing union-find algorithm (Edelsbrunner and Harer, 2010) once for the ascending filtration and once for the descending filtration; this part is guaranteed to be correct. In the following, we therefore mainly focus on 1-dimensional extended persistence.
Take Figure 4 as an example. With the union-find algorithm (Edelsbrunner et al., 2000; Cohen-Steiner et al., 2006), we can obtain the set of positive edges and the set of negative edges in the descending filtration. Recall that positive edges give rise to new loops, while negative edges do not. In the descending filtration, every positive edge will be paired with a certain edge of the ascending filtration.
To find all the persistence pairs, we construct a tree from all the negative edges and the nodes. Every time we add a positive edge to the tree, a loop appears. The newly added edge is paired with the edge in this loop that appears latest in the ascending filtration. We then delete the paired edge and update the tree iteratively.
In Figure 4, all the negative edges and the nodes form the tree in (a). A positive edge is then added, and we can discover the loop it forms. Within this loop, we find the edge that appears latest in the ascending filtration (the ascending sequence of simplices is shown in Figure 3), pair it with the newly added edge, and record the corresponding persistence point. After that, we delete the paired edge to obtain the tree in (c). The next positive edge is added and processed in the same way, yielding another persistence pair. Collecting all such pairs gives the final 1-dimensional extended persistence diagram.
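The pairing-and-swap step on the tree can be sketched in plain Python. This is an illustrative reimplementation with hypothetical helper names, where `order` maps each (sorted) edge to its position in the ascending filtration:

```python
from collections import deque

def tree_path_edges(tree_adj, u, v):
    # The unique path between u and v in a tree, as a list of sorted edges.
    prev = {u: None}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if x == v:
            break
        for y in tree_adj[x]:
            if y not in prev:
                prev[y] = x
                queue.append(y)
    path = []
    while prev[v] is not None:
        path.append(tuple(sorted((prev[v], v))))
        v = prev[v]
    return path

def pair_positive_edges(tree_adj, positive_edges, order):
    pairs = []
    for u, v in positive_edges:              # in descending-filtration order
        loop = tree_path_edges(tree_adj, u, v)        # the loop closed by (u, v)
        paired = max(loop, key=order.__getitem__)     # latest ascending edge
        pairs.append((paired, tuple(sorted((u, v)))))
        a, b = paired                        # swap: drop paired edge, keep tree
        tree_adj[a].remove(b); tree_adj[b].remove(a)
        tree_adj[u].append(v); tree_adj[v].append(u)
    return pairs

# Path tree 0-1-2-3; the positive edge (0, 3) closes a loop.
tree = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
order = {(0, 1): 0, (1, 2): 1, (2, 3): 2, (0, 3): 3}
assert pair_positive_edges(tree, [(0, 3)], order) == [((2, 3), (0, 3))]
```

Each positive edge is processed with one tree traversal, which is the source of the quadratic overall complexity discussed below.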
Complexity of Algorithm 2. For every positive edge, the time cost to find the loop and to form the new path is linear in the number of nodes. The total cost over all positive edges is therefore the product of the node and edge counts, and the union-find passes take near-linear time. The final computation cost is thus quadratic in the input size, much more efficient than the cubic cost of the matrix reduction algorithm.
4.2 The Correctness of Algorithm 2
In this section, we prove that the proposed faster algorithm is correct, i.e., that it produces the same output as the matrix reduction algorithm. Formally, we state the theorem as follows.
We briefly explain the idea of the proof. A complete proof will be provided in the appendix.
The 0-dimensional persistent homology is computed using the known union-find algorithm, so we only need to prove correctness for the 1-dimensional topology, i.e., loops. Recall that each loop is created by an edge in the ascending filtration and destroyed by an edge in the descending filtration. We show that the edge pairs found by our new algorithm are the same as the pairs found by the matrix reduction algorithm. Denote by R the reduced matrix of M, with the corresponding blocks.
We classify edges in the descending filtration into negative descending edges and positive descending edges. A negative descending edge destroys a connected component created by a descending node; in the reduced matrix R, its lowest entry falls into the descending block. A positive descending edge adds a new loop to the superlevel set; for extended persistence, it destroys the loop created by an edge in the ascending filtration, and its lowest entry falls into the block whose rows correspond to ascending simplices. See Figure 2 (c) for an illustration of the different types of simplex pairs and their corresponding persistence points in the extended diagram.

The core of our proof is to show that for each positive descending edge, the pair found by the new algorithm is equivalent to the pair produced by the reduction algorithm. We prove this by induction: going through the positive descending edges in the order of the descending filtration, we show that for each such edge, the pairing produced by our algorithm is the same as the reduction result. Specifically, we prove that:


After the descending part of the column of the i-th positive descending edge has been reduced, the remaining entries in the ascending rows constitute a unique loop in the current tree, i.e., the tree obtained after processing the first i − 1 positive edges.

The lowest entry of the reduced column is not paired by any previous edge, and thus will be paired with the current positive descending edge. Indeed, it corresponds to the last edge of the loop w.r.t. the ascending ordering.

Updating the tree is equivalent to adding the reduced column to all later descending columns with a non-zero entry at its lowest row. Although this affects the final reduced matrix, it does not change the resulting simplex pairings.
5 Experiments
Method | Cora | Citeseer | PubMed | Photo | Computers
GCN (Kipf and Welling, 2016) | 90.47±0.240* | 82.56±1.920* | 89.56±3.660* | 91.82±0.000 | 87.75±0.000
HGCN (Chami et al., 2019) | 92.90±0.100* | 95.25±0.000 | 96.30±0.000* | 95.40±0.000 | 93.61±0.000
GIL (Zhu et al., 2020) | 98.28±0.850* | 99.85±0.450* | 95.49±0.160* | 97.11±0.007 | 95.89±0.010
SEAL (Zhang and Chen, 2018) | 91.33±0.057 | 89.81±0.023 | 92.42±0.119 | 97.83±0.013 | 96.75±0.015
PEGN (Zhao et al., 2020) | 96.30±0.002 | 98.93±0.001 | 96.59±0.002 | 96.61±0.001 | 95.06±0.001
TLC-GNN (node-wise) | 97.56±0.005 | 98.81±0.002 | 97.62±0.003 | 96.26±0.026 | 96.50±0.021
TLC-GNN (DRNL) | 98.34±0.002 | 98.70±0.002 | 98.78±0.001 | 97.83±0.003 | 97.49±0.004
TLC-GNN (Ricci) | 98.09±0.002 | 98.67±0.002 | 98.58±0.001 | 98.37±0.001 | 98.34±0.001

Table 1: Mean and standard deviation of ROC-AUC on real-world data. "*": results copied from (Chami et al., 2019; Zhu et al., 2020).

We evaluate our method on synthetic and real-world graph datasets. We compare with different SOTA link prediction baselines. Furthermore, we evaluate the efficiency of the proposed faster algorithm for extended persistent homology.
Baseline methods. We compare our framework TLC-GNN with different link prediction methods. We compare with popular GNN models such as GCN (Kipf and Welling, 2016) and GAT (Veličković et al., 2017): we use them for node embedding and then adopt the Fermi-Dirac decoder (Krioukov et al., 2010; Nickel and Kiela, 2017) to predict whether there is a link between two nodes. We also compare with several SOTA methods: SEAL (Zhang and Chen, 2018), which utilizes local distances and pretrained transductive node features for link prediction; HGCN (Chami et al., 2019), which introduces hyperbolic space to deal with scale-free graphs; and GIL (Zhu et al., 2020), which takes advantage of both Euclidean and hyperbolic geometries.
To demonstrate the importance of pairwise topological features, we also compare with two baseline methods using node-wise topological features, i.e., a topological feature for each node. PEGN (Zhao et al., 2020) extracts topological features for each node using its neighborhood and uses them to recalibrate the graph convolution. PEGN was originally designed for node classification; similar to GCN and GAT, we adapt it to link prediction via the Fermi-Dirac decoder. A second baseline, TLC-GNN (node-wise), is a modification of our method TLC-GNN (Ricci): instead of the pairwise topological feature for a pair of target nodes, we concatenate their node-wise topological features, inject the result into the node embeddings, and then use an MLP to predict.
For all settings, we randomly split the edges into 85/5/10% for training, validation, and test sets. To compute the distance-based filter function, we need a graph metric; we use the hop distance and the Ollivier-Ricci curvature (Ni et al., 2018), respectively. The filter function with hop distance is the same as Double-Radius Node Labeling (DRNL) (Zhang and Chen, 2018). For the Ollivier-Ricci curvature, we compute the curvature of all training edges and use them as edge weights, adding 1 to the curvature of every edge to avoid negative weights. With this weighted graph, we define the filter value of each node as its total shortest-path distance to the two target nodes. Recall that we use the intersection of the k-hop neighborhoods of the two target nodes to compute pairwise topological features. The choice of k is either 1 or 2, depending on the size and density of the graph.
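The distance-based filter function on a weighted graph can be sketched with Dijkstra's algorithm. This is a plain-Python sketch; in the paper's Ricci variant, the edge weights would be the shifted Ollivier-Ricci curvatures:

```python
import heapq

def dijkstra(adj, source):
    # adj: dict node -> list of (neighbor, weight); returns shortest distances.
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def distance_filter(adj, u, v):
    # Filter value of a node: total shortest distance to the two targets.
    du, dv = dijkstra(adj, u), dijkstra(adj, v)
    return {n: du[n] + dv[n] for n in adj if n in du and n in dv}

adj = {0: [(1, 1.0)], 1: [(0, 1.0), (2, 2.0)], 2: [(1, 2.0)]}
f = distance_filter(adj, 0, 2)
assert f[1] == 1.0 + 2.0   # node 1: distance 1 to node 0 plus 2 to node 2
assert f[0] == 3.0 and f[2] == 3.0
```

These filter values are exactly what the ascending and descending filtrations of Section 3.1 are built on.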
5.1 Synthetic Experiments
We generate synthetic data using a graph-theoretical model called the Stochastic Block Model (SBM) (Holland et al., 1983). Specifically, we create random graphs with 1000 nodes, forming 5 equal-size communities. Edges are randomly sampled with intra-community probability p and inter-community probability q. We randomly create 12 graphs, with p and q each ranging over a small set of values. Furthermore, we assign each node a randomly created feature vector of dimension 100 and use these as the input of all the methods. For pairwise persistence diagrams, we use intersections of 1-hop neighborhoods for all the graphs.
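Sampling such an SBM graph can be sketched as follows. The block size and the p, q values below are illustrative (the paper's exact parameter grid is not reproduced here):

```python
import random

def sample_sbm(block_sizes, p_intra, q_inter, seed=0):
    # Each node pair is connected with probability p_intra if the two nodes
    # lie in the same block, and q_inter otherwise.
    rng = random.Random(seed)
    block = [b for b, size in enumerate(block_sizes) for _ in range(size)]
    n = len(block)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            prob = p_intra if block[i] == block[j] else q_inter
            if rng.random() < prob:
                edges.append((i, j))
    return n, edges

n, edges = sample_sbm([20] * 5, p_intra=0.5, q_inter=0.01, seed=0)
assert n == 100
assert len(edges) > 0
```

With p much larger than q, most sampled edges fall inside communities, which is what makes intra-community link prediction feasible.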
We run the methods on each synthetic graph 10 times and report the mean area under the ROC curve (ROC-AUC) score. More details can be found in the appendix.
Results. Figure 5 reports the results of 6 methods. We observe that TLC-GNN outperforms nearly all the SOTA baselines on all the synthetic graphs, which justifies the benefit of the pairwise topological feature. GIL sometimes performs well, but is unstable overall. GCN and GAT are no better than random guessing, which implies that node degree alone does not provide enough structural information for link prediction.
5.2 RealWorld Benchmarks
We use a variety of datasets: (a) Cora, Citeseer, and PubMed (Sen et al., 2008) are standard citation network benchmarks. (b) Photo and Computers (Shchur et al., 2018) are graphs derived from Amazon shopping records. More details can be found in the appendix.
Method | Cora | Citeseer | PubMed | Photo | Computers
Alg. 1 | 0.0025 | 0.0018 | 0.0068 | 1.6557 | 4.6531
Alg. 2 | 0.0009 | 0.0008 | 0.0027 | 1.1176 | 2.7033
For pairwise persistence diagrams, we compute intersections of 1-hop neighborhoods for Photo and Computers due to their high density, and intersections of 2-hop neighborhoods for Cora, Citeseer, and PubMed. Following prior works (Chami et al., 2019; Zhu et al., 2020), we evaluate link prediction using the ROC-AUC score on the test set. We run the methods on each graph 50 times.
Results. Table 1 summarizes the performance of all methods. On small and sparse graphs like Cora and Citeseer, our proposed method is on par with SOTA methods. On large and dense graphs such as PubMed, Photo, and Computers, however, our method is superior. This implies that the high-order topological feature is more effective on large and dense graphs, which tend to have rich and heterogeneous structures.
Ablation study. Compared with models using node-wise persistent homology (PEGN and TLC-GNN (node-wise)), TLC-GNN generally performs better. This shows that for the link prediction task, the proposed pairwise persistent homology carries crucial structural information for modeling node interaction, whereas node-wise topological features cannot.
5.3 Algorithm Efficiency
To verify the computational benefit of the proposed new algorithm (Algorithm 2), we compare it with the classic matrix reduction algorithm (Algorithm 1), as implemented in the Dionysus package (Morozov). Both implementations are in Python. (The code of Dionysus is compiled with Cython, a C extension for Python.) We compare the two algorithms on all real-world benchmarks and report the average running time for computing the pairwise topological feature of each edge (in seconds). More details are available in the appendix.
6 Conclusion
In this paper, we propose a novel pairwise topological feature for link prediction based on the theory of persistent homology, together with a GNN model that leverages this feature. Experiments show that our approach outperforms state-of-the-art methods on various benchmarks, especially on large and dense graphs. In addition, we propose a novel algorithm that computes extended persistence diagrams on graphs more efficiently, and we verify its correctness and efficiency. The algorithm can be generalized to other graph learning tasks.
References
 Friends and neighbors on the web. Social networks 25 (3), pp. 211–230. Cited by: §1.

Persistence images: a stable vector representation of persistent homology.
The Journal of Machine Learning Research
18 (1), pp. 218–252. Cited by: §2, §3.1, A.1 Persistence Image, A.1 Persistence Image.  Mixed membership stochastic blockmodels. Journal of machine learning research 9 (Sep), pp. 1981–2014. Cited by: §2.
 A refined laser method and faster matrix multiplication. In Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 522–539.
 Emergence of scaling in random networks. Science 286(5439), pp. 509–512.
 Understanding and predicting links in graphs: a persistent homology perspective. arXiv preprint arXiv:1811.04049.
 Reprint of: the anatomy of a large-scale hypertextual web search engine. Computer Networks 56(18), pp. 3825–3833.
 PersLay: a neural network layer for persistence diagrams and new graph topological signatures. In International Conference on Artificial Intelligence and Statistics, pp. 2786–2796.
 Sliced Wasserstein kernel for persistence diagrams. In International Conference on Machine Learning, pp. 664–673.
 Hyperbolic graph convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 4868–4879.
 A topological regularizer for classifiers via persistent homology. In The 22nd International Conference on Artificial Intelligence and Statistics, pp. 2573–2582.
 Stability of persistence diagrams. Discrete & Computational Geometry 37(1), pp. 103–120.
 Extending persistence using Poincaré and Lefschetz duality. Foundations of Computational Mathematics 9(1), pp. 79–103.
 Vines and vineyards by updating persistence in linear time. In Proceedings of the Twenty-Second Annual Symposium on Computational Geometry, pp. 119–126.
 Gene essentiality and the topology of protein interaction networks. Proceedings of the Royal Society B: Biological Sciences 272(1573), pp. 1721–1725.
 Computational topology: an introduction. American Mathematical Society.
 Topological persistence and simplification. In Proceedings of the 41st Annual Symposium on Foundations of Computer Science, pp. 454–463.
 Graph filtration learning. In International Conference on Machine Learning, pp. 4314–4323.
 Connectivity-optimized representation learning via persistent homology. In International Conference on Machine Learning, pp. 2751–2760.
 Deep learning with topological signatures. In Proceedings of the 31st International Conference on Neural Information Processing Systems, pp. 1633–1643.
 Stochastic blockmodels: first steps. Social Networks 5(2), pp. 109–137.
 Topology-preserving deep image segmentation. In Advances in Neural Information Processing Systems, pp. 5657–5668.
 SimRank: a measure of structural-context similarity. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 538–543.
 Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907.
 Matrix factorization techniques for recommender systems. Computer 42(8), pp. 30–37.
 Hyperbolic geometry of complex networks. Physical Review E 82(3), 036106.
 Persistence weighted Gaussian kernel for topological data analysis. In International Conference on Machine Learning, pp. 2004–2013.
 The link-prediction problem for social networks. Journal of the American Society for Information Science and Technology 58(7), pp. 1019–1031.
 Link prediction in complex networks: a survey. Physica A: Statistical Mechanics and its Applications 390(6), pp. 1150–1170.
 Zigzag persistent homology in matrix multiplication time. In Proceedings of the Twenty-Seventh Annual Symposium on Computational Geometry, pp. 216–225.
 Dionysus2. https://www.mrzv.org/software/dionysus2/
 Elements of algebraic topology. CRC Press.
 Predicting future scientific discoveries based on a networked analysis of the past literature. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 2019–2028.
 Network alignment by discrete Ollivier-Ricci flow. In International Symposium on Graph Drawing and Network Visualization, pp. 447–462.
 Poincaré embeddings for learning hierarchical representations. In Advances in Neural Information Processing Systems, pp. 6338–6347.
 DeepWalk: online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 701–710.
 Network embedding as matrix factorization: unifying DeepWalk, LINE, PTE, and node2vec. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 459–467.
 A stable multi-scale kernel for topological machine learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4741–4748.
 Modeling relational data with graph convolutional networks. In European Semantic Web Conference, pp. 593–607.
 Collective classification in network data. AI Magazine 29(3), pp. 93–93.
 Pitfalls of graph neural network evaluation. arXiv preprint arXiv:1811.05868.
 LINE: large-scale information network embedding. In Proceedings of the 24th International Conference on World Wide Web, pp. 1067–1077.
 Inductive relation prediction by subgraph reasoning. In International Conference on Machine Learning, pp. 9448–9457.
 Visualizing data using t-SNE. Journal of Machine Learning Research 9(11).
 Graph attention networks. arXiv preprint arXiv:1710.10903.
 A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems.
 Link prediction based on graph neural networks. In Advances in Neural Information Processing Systems, pp. 5165–5175.
 Deep learning on graphs: a survey. IEEE Transactions on Knowledge and Data Engineering.
 Learning metrics for persistence-based summaries and applications for graph classification. In Advances in Neural Information Processing Systems, pp. 9859–9870.
 Persistence enhanced graph neural network. In International Conference on Artificial Intelligence and Statistics, pp. 2896–2906.
 Predicting missing links via local information. The European Physical Journal B 71(4), pp. 623–630.
 Graph geometry interaction learning. Advances in Neural Information Processing Systems 33.
Appendix
In this appendix, we provide (1) additional details on the persistence image; (2) additional examples illustrating the matrix reduction algorithm for extended persistent homology; (3) a complete proof of the correctness of the proposed faster algorithm; and (4) additional experimental details and qualitative results.
A.1 Persistence Image
In recent years, efforts have been made to map persistence diagrams into representations amenable to machine learning. The persistence image (Adams et al., 2017) is one such approach, converting a persistence diagram into a vector; it is the representation used in our model. Let $T: \mathbb{R}^2 \to \mathbb{R}^2$ be the linear transformation $T(x, y) = (x, y - x)$ of persistence points. Given a persistence diagram $D$, $T(D)$ is the transformed diagram. For any $u \in \mathbb{R}^2$, let $\phi_u$ be the 2D Gaussian function with mean $u$ and standard deviation $\sigma$. Let $f: \mathbb{R}^2 \to \mathbb{R}$ be a non-negative weight function on the persistence plane. Given a persistence diagram $D$, its persistence surface is defined as $\rho_D(z) = \sum_{u \in T(D)} f(u)\,\phi_u(z)$. Fixing a grid on the plane with $n$ pixels, the persistence image is the collection of pixels $I(\rho_D)_p = \iint_p \rho_D \, dy \, dx$; being a vector, it can be directly used in machine learning tasks. The stability of the persistence image under perturbation is proven in (Adams et al., 2017). In our setting, $f$ is a piecewise-linear weighting function of the persistence $p_u$ of a transformed point $u$: $f(u) = 0$ if $p_u \le 0$, $f(u) = p_u / p_{\max}$ if $0 < p_u < p_{\max}$, and $f(u) = 1$ if $p_u \ge p_{\max}$.
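As a concrete sketch of the construction above (not the paper's implementation), a persistence image can be approximated by evaluating the weighted Gaussian surface at pixel centers instead of integrating over pixels. The grid range `[0, 1]`, `resolution`, `sigma`, and `p_max` below are illustrative assumptions.

```python
import numpy as np

def persistence_image(diagram, resolution=8, sigma=0.1, p_max=1.0):
    """Rasterize a persistence diagram into a persistence-image vector.

    diagram: list of (birth, death) points.  Each point is transformed
    to (birth, persistence) coordinates, weighted by a piecewise-linear
    function of its persistence, and spread with an isotropic 2D
    Gaussian; the surface is sampled at pixel centers (an approximation
    of the pixel integrals).
    """
    pts = np.array([(b, d - b) for b, d in diagram], dtype=float)
    # Piecewise-linear weight: 0 at persistence <= 0, 1 at persistence >= p_max.
    w = np.clip(pts[:, 1] / p_max, 0.0, 1.0)
    xs = np.linspace(0.0, 1.0, resolution)
    ys = np.linspace(0.0, 1.0, resolution)
    gx, gy = np.meshgrid(xs, ys)
    img = np.zeros_like(gx)
    for (bx, py), weight in zip(pts, w):
        img += weight * np.exp(-((gx - bx) ** 2 + (gy - py) ** 2)
                               / (2 * sigma ** 2))
    return img.flatten()

vec = persistence_image([(0.1, 0.5), (0.2, 0.9)])  # 8x8 grid -> 64-dim vector
```

The piecewise-linear weight suppresses points near the diagonal (low persistence), which are typically noise.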
A.2 Examples of the Matrix Reduction Algorithm
The reduction matrix $M$ for Figure 6 is shown in Figure 7. Recall that $M$ is a binary-valued matrix encoding the adjacency relationship between nodes and edges. $M$ consists of four blocks: $M = \begin{pmatrix} M_a & P \\ 0 & M_d \end{pmatrix}$. Every column or row of $M$ corresponds to a simplex. In particular, the first half of the columns of $M$ correspond to the ascending sequence of simplices, and the last half correspond to the descending sequence of simplices. The setting is the same for the rows of $M$. Matrix $M_a$ encodes the boundary relationship between the simplices in the ascending sequence: similar to the incidence matrix of a graph, $M_a[i, j] = 1$ iff simplex $i$ is a boundary of simplex $j$, i.e., simplex $i$ is a node adjacent to the edge $j$.
$M_d$ is defined similarly, except that it encodes the boundary relationship of the simplices in the descending sequence, i.e., $M_d[i, j] = 1$ iff simplex $i$ is a boundary of simplex $j$ in that sequence. $P$ stores the permutation that connects the two sequences of simplices, i.e., $P[i, j] = 1$ iff row $i$ and column $j$ denote the same simplex. The remaining block $0$ is a zero-valued matrix.
Algorithm 3 reduces the columns of the matrix $M$ from left to right. If we allow certain flexibility in the reduction ordering, we can separate the algorithm into three phases: the reduction of $M_a$ (Phase 1), the reduction of $M_d$ (Phase 2), and the reduction of $P$ (Phase 3) (Cohen-Steiner et al., 2009). We call a simplex positive if its corresponding column is zero after reduction, and negative if its corresponding column is nonzero after reduction. In our setting, all nodes, as well as edges that give rise to a loop, are positive simplices; edges that destroy a connected component are negative simplices. Every simplex is either positive or negative (Edelsbrunner et al., 2000; Edelsbrunner and Harer, 2010). Notice that the positive and negative edges in the ascending filtration are not necessarily the same as the positive and negative edges in the descending filtration.
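The left-to-right reduction and the positive/negative classification can be sketched as a generic mod-2 column reduction; this is an illustrative sketch, not the paper's optimized code, and the triangle example at the end is hypothetical.

```python
def low(col):
    """Row index of the lowest nonzero entry of a binary column, or -1 if zero."""
    nz = [i for i, v in enumerate(col) if v]
    return nz[-1] if nz else -1

def reduce_columns(columns):
    """Left-to-right mod-2 column reduction of a binary boundary matrix.

    columns: list of columns, each a list of 0/1 entries.  Returns the
    reduced columns and the indices of positive columns (those that
    reduce to zero).  Each nonzero reduced column j gives the
    persistence pair (low(j), j).
    """
    cols = [c[:] for c in columns]
    for j in range(len(cols)):
        while low(cols[j]) != -1:
            conflicts = [k for k in range(j) if low(cols[k]) == low(cols[j])]
            if not conflicts:
                break
            k = conflicts[0]
            cols[j] = [(a + b) % 2 for a, b in zip(cols[j], cols[k])]  # mod-2 sum
    positive = {j for j, c in enumerate(cols) if low(c) == -1}
    return cols, positive

# Hypothetical triangle: rows are nodes a, b, c; columns are edges ab, bc, ca.
cols, positive = reduce_columns([[1, 1, 0], [0, 1, 1], [1, 0, 1]])
# The last edge ca closes a loop, so its column reduces to zero (positive edge).
```

On the triangle, the first two edges pair with nodes (negative edges), while the third column reduces to zero, matching the intuition that an edge closing a loop is positive.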
In the following paragraphs, we walk through the matrix reduction algorithm by describing the three phases in turn.
A.2.1 Phase 1
Phase 1 is the matrix reduction of $M_a$. In $M_a$, all the columns corresponding to nodes and all the rows corresponding to edges are zero, so they have no impact on the matrix reduction. After deleting these rows and columns, the matrix $M_a$ is shown below:
For simplicity, we define $\mathrm{low}(e)$ of an edge $e$ as the maximum row $v$ (a node) for which $M_a[v, e] = 1$. Notice that $\mathrm{low}$ is originally defined on row and column indices; here we replace each index with the simplex it represents, i.e., we write the edge $e$ in place of its column index and the node $v$ in place of its row index.
From left to right, whenever we find an earlier column $e_i$ and a later column $e_j$ with $\mathrm{low}(e_i) = \mathrm{low}(e_j)$, we add the column of $e_i$ to the column of $e_j$; here "add" means the mod-2 sum of the two binary vectors. Each addition either lowers $\mathrm{low}(e_j)$, after which we continue comparing against earlier columns, or reduces the column of $e_j$ to zero, in which case $e_j$ is left unpaired. The reduced matrix $M_a$ after Phase 1 is shown below, from which we obtain the persistence pairs $(\mathrm{low}(e), e)$, each pairing a node with the edge that destroys its connected component.
Recall that $M_a$ encodes the relationship between the simplices in the ascending sequence, where the filter value of an edge $uv$ is defined as $f(uv) = \max(f(u), f(v))$. Thus each persistence pair $(v, e)$ yields the persistence point $(f(v), f(e))$; we remove the persistence points whose birth and death times coincide to obtain the final persistence points.
Recall that a simplex is positive if its corresponding column is zero after reduction, and negative otherwise. The edges whose columns reduce to zero are therefore the positive edges of the ascending sequence, and the remaining edges are the negative ones.
A.2.2 Phase 2
Phase 2 is the matrix reduction of $M_d$. Notice that $\mathrm{low}$ is defined on $M_d$ just as on $M_a$, and that Phase 2 influences not only $M_d$ but also $P$ (Cohen-Steiner et al., 2009). In $M_d$, all the columns of nodes and all the rows of edges are zero. In $P$, the columns and rows of nodes have no impact on the reduction process or on the pairing for 1-dimensional topology. Therefore, for simplicity, we delete these rows and columns and obtain the condensed matrices $M_d$ and $P$:
From left to right, we proceed exactly as in Phase 1: whenever two columns share the same $\mathrm{low}$ value, we add the earlier column to the later one (as a mod-2 sum), until the later column either obtains a new, unique $\mathrm{low}$ value or becomes zero, in which case its edge is left unpaired. Notice that whenever we add one column to another in $M_d$, we also add the corresponding columns in $P$, so the columns of $P$ change accordingly. The matrices $M_d$ and $P$ after Phase 2 therefore become:
Similar to the process of discovering persistence points and positive and negative edges in Phase 1, we find that all the persistence pairs of Phase 2 are born and die at the same time, so Phase 2 contributes no persistence point. The edges whose columns in $M_d$ reduce to zero are the positive edges of the descending sequence; the remaining edges are the negative ones.
Notice that before Phase 3, all the persistence points are 0-dimensional; in other words, they only record the births and deaths of connected components. The 1-dimensional extended persistence pairs are produced in Phase 3, where each positive edge of the descending filtration is paired with an edge of the ascending filtration, so that the saliency of loops can be measured.
A.2.3 Phase 3
Phase 3 is the reduction of $P$. Only the columns of positive descending edges (edges whose column in $M_d$ is zero) are reduced in this phase, so the values in $M_d$ are not affected. Similar to the definition of $\mathrm{low}$ in Phase 1, we define $\mathrm{low}(e)$ here as the maximum row $e'$ (an ascending edge) for which $P[e', e] = 1$.
From left to right, we only consider the positive descending edges, and we add an earlier column to a later one whenever the two columns share the same $\mathrm{low}$ value (again as a mod-2 sum).
As a consequence, we obtain the matrices $M_d$ and $P$ after Phase 3:
The extended persistence pairs can then be read off. Notice that in an extended persistence pair, the latter edge is a positive edge of the descending filtration, and the former edge is its paired edge in the ascending filtration. Recall that the filter value of an edge in the ascending sequence is defined as the maximum of the filter values of its nodes, $f(uv) = \max(f(u), f(v))$, whereas for an edge in the descending filtration it is defined as the minimum, $f(uv) = \min(f(u), f(v))$. As a consequence, an extended persistence pair $(e_1, e_2)$ with $e_1 = u_1 v_1$ and $e_2 = u_2 v_2$ yields the extended persistence point $\big(\max(f(u_1), f(v_1)),\ \min(f(u_2), f(v_2))\big)$.
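Under the conventions above (max over endpoints in the ascending filtration, min in the descending one), mapping an extended pair of edges to its diagram point can be sketched as follows; the node names and filter values in the usage lines are made up for illustration.

```python
def extended_point(f, edge_asc, edge_desc):
    """Map an extended persistence pair (ascending edge, descending edge)
    to its diagram point (birth, death).

    f: dict mapping each node to its filter value.
    """
    (u1, v1), (u2, v2) = edge_asc, edge_desc
    birth = max(f[u1], f[v1])  # ascending filter value of the first edge
    death = min(f[u2], f[v2])  # descending filter value of the second edge
    return birth, death

# Toy example with made-up filter values.
f = {"a": 0.0, "b": 1.0, "c": 2.0}
point = extended_point(f, ("a", "b"), ("b", "c"))  # → (1.0, 1.0)
```

Note that for loops the "death" coordinate can be smaller than the "birth" coordinate, which is what distinguishes extended persistence points from ordinary ones.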
After the whole matrix reduction algorithm, we obtain the ordinary persistence diagram for 0-dimensional topological structures (connected components) and the extended persistence diagram for 1-dimensional topological structures (loops). To obtain the 0-dimensional extended persistence diagram, the birth and death of the whole connected component are also recorded, that is, the minimum and maximum filter values $\min_v f(v)$ and $\max_v f(v)$; the 0-dimensional extended persistence diagram thus contains the point $(\min_v f(v), \max_v f(v))$, as shown in Figure 6 (c).
A.3 Correctness of the Faster Algorithm
In this section, we provide a complete proof of the correctness of the faster algorithm. For convenience, we restate the matrix reduction algorithm (Alg. 3), the proposed faster algorithm (Alg. 4), and the main theorem (Theorem 2).
To prove Theorem 2, the core of our proof is to show that for each positive descending edge, the pair found by the new algorithm is identical to the pair resulting from the matrix reduction algorithm. We prove this by induction: going through the positive descending edges in the order of the descending filtration, we show that for each positive edge, which creates a new loop, the extended persistence pair produced by our algorithm is the same as the reduction result.
In Lemma 3, we prove that after reducing the $M_d$ part of the column of a given positive descending edge $e$, i.e., once that part becomes zero, the remaining entries of the column of $e$ in $P$ constitute a unique loop in the graph.
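A small sketch of why such a loop is easy to recover in practice: when a positive edge arrives, a path between its endpoints already exists in the graph built so far, and that path plus the new edge is the loop. The adjacency structure and node names below are hypothetical, and uniqueness of the path assumes the existing connection between the endpoints is tree-like.

```python
from collections import deque

def loop_created_by(adj, edge):
    """Return the node path that `edge` closes into a loop.

    adj: dict node -> set of neighbours among the edges added so far.
    A BFS finds a path between the endpoints of `edge`; that path plus
    `edge` is the loop the new positive edge creates.
    """
    u, v = edge
    parent = {u: None}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if x == v:
            break
        for y in adj.get(x, ()):
            if y not in parent:
                parent[y] = x
                queue.append(y)
    path = [v]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path  # nodes from v back to u; edge (u, v) closes the loop

# Hypothetical: edges ab and ac already exist; adding bc closes a triangle.
adj = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}}
cycle = loop_created_by(adj, ("b", "c"))  # path c -> a -> b
```

This is only an intuition aid; the lemma itself argues about the column entries in $P$ rather than an explicit graph search.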