Persistence Homology for Link Prediction: An Interactive View

02/20/2021 ∙ by Zuoyu Yan, et al.

Link prediction is an important learning task for graph-structured data. In this paper, we propose a novel topological approach to characterize interactions between two nodes. Our topological feature, based on extended persistent homology, encodes rich structural information regarding the multi-hop paths connecting nodes. Based on this feature, we propose a graph neural network method that outperforms state-of-the-art methods on different benchmarks. As another contribution, we propose a novel algorithm to more efficiently compute the extended persistence diagrams for graphs. This algorithm can be generally applied to accelerate many other topological methods for graph learning tasks.


1 Introduction

Graph-structured data is ubiquitous, and learning from graphs is important in various scientific and industrial domains (Zhang et al., 2020; Wu et al., 2020). In this paper, we focus on the link prediction task, i.e., learning to predict whether an edge exists between two target nodes, conditioned on their attributes and local connectivity (Liben-Nowell and Kleinberg, 2007; Schlichtkrull et al., 2018; Lü and Zhou, 2011). Link prediction is an important step of knowledge discovery in various applications, e.g., recommendation systems (Koren et al., 2009; Adamic and Adar, 2003), knowledge graph completion (Teru et al., 2020), protein-protein interactions (Coulomb et al., 2005), and gene prediction (Nagarajan et al., 2015).

Classic link prediction methods (Barabási and Albert, 1999; Zhou et al., 2009; Brin and Page, 2012) use hand-crafted connectivity features and enforce strong assumptions on the distribution of links, e.g., that nodes with similar connectivity features tend to be connected. Better performance has been achieved by comparing the similarity of nodes in an embedding space (Perozzi et al., 2014), which encodes more global connectivity information. In recent years, graph neural networks (GNNs) have achieved state-of-the-art link prediction performance, as they exploit graph connectivity information and node attributes in a completely data-driven manner. However, even for GNNs, graph connectivity information such as node degrees is beneficial: it provides valuable contextual information for the graph convolutional operations (Kipf and Welling, 2016; Qiu et al., 2018).

For link prediction, an effective strategy is the direct modeling of the interaction between two target nodes. The path distance of nearby nodes from the target nodes has been shown useful (Zhang and Chen, 2018). However, these distance-based methods mainly focus on the “closeness” between target nodes, but do not explicitly model the “richness of connections” between them. In Figure 1, we show examples with the same distance encoding. But the connections in the right example are much richer than in the left example. It is conceivable that nodes with a wealth of multi-hop connections have a better chance to share an edge.

Figure 1: Two example graphs. In both cases, black nodes have the same distance from the red target nodes (either (2,1) or (1,2)). But the richness of the connections is very different. On the left, there are only three connections between the target nodes. On the right, there are many more connections between the targets.
Figure 2: An illustration of extended persistent homology. (a) The input graph with a given filter function on the nodes. (b) The ascending and descending filtrations of the input graph. The brown and blue bars correspond to the life spans of connected components and loops, respectively. The first four figures show the ascending filtration, and the last four show the descending filtration. In the ascending filtration, the filter value of an edge is the maximum of its endpoints' values, while in the descending filtration it is the minimum. (c) The resulting extended persistence diagram, where red and blue markers correspond to 0-dimensional and 1-dimensional topological structures. The two blue markers correspond to the two loops; the range of the filter function over each loop is encoded as the coordinates of its marker.

To exploit the richness of connections between nodes, we propose a novel method based on the theory of persistent homology (Edelsbrunner et al., 2000; Edelsbrunner and Harer, 2010), which encodes high-order structural information of the graph via algebraic topology. The theory captures topological structures of arbitrary shape and scale, e.g., connected components and loops, and encodes them in a robust-to-noise manner. To predict whether two given nodes u and v are connected, our method explicitly counts the number of loops within their vicinity. This count essentially measures the complexity of connections between u and v, which can be indicative of the existence of a link. The method not only counts the loops, but also measures, for each loop, its range of distance from u and v. See Figure 2 for an illustration. This rich set of structural information is mapped into a topological feature space and is integrated into a graph neural network. As will be shown in the experiments, our topological loop-counting graph neural network achieves better performance for link prediction.

Persistent homology has been used for learning with graphs (Zhao and Wang, 2019; Hofer et al., 2020, 2017; Carrière et al., 2020). However, most existing works use it as a global structural feature for the whole graph. These global features, although proven useful for graph classification tasks, cannot describe the interaction between a pair of nodes.

In this paper, we propose a pairwise topological feature to capture the richness of the interaction between a specific pair of target nodes. We compute topological information within the vicinity of the target nodes, i.e., the intersection of their k-hop neighborhoods. It has been shown that such a local enclosing graph carries sufficient information for link prediction (Zhang and Chen, 2018). To measure the saliency of topological structures, we also introduce a distance-based filter function that measures the interaction between nodes. These choices ensure that our pairwise topological feature is informative of the interaction between the targets.

We propose the topological loop-counting graph neural network (TLC-GNN) by injecting the pairwise topological feature into the latent representation of a graph neural network. Our method achieves state-of-the-art performance in link prediction. Another contribution of this paper is on the computational side. To capture the full information of loops, we use extended persistent homology (Cohen-Steiner et al., 2009). The commonly used algorithm is based on a matrix reduction similar to Gaussian elimination and is cubic in the input graph/subgraph size, i.e., O(n^3) for n simplices.[1] Since the computation needs to be executed on all training/validation/testing pairs of nodes, we could significantly benefit from a faster algorithm. In this paper, we propose a novel algorithm for extended persistent homology, specific to graphs. Instead of the expensive matrix reduction, our algorithm operates directly on the graph and is quadratic in the input graph/subgraph size, i.e., O(n^2). This algorithm is not application specific and can be applied to other graph learning problems (Zhao and Wang, 2019; Hofer et al., 2020, 2017; Carrière et al., 2020).

[1] In theory, the fastest algorithm for persistence diagrams has the same complexity as matrix multiplication (Edelsbrunner and Harer, 2010; Milosavljević et al., 2011), i.e., O(n^ω), in which ω < 2.373 (Alman and Williams, 2021).

In summary, our contribution is three-fold:


  • We introduce a pairwise topological feature based on persistent homology to measure the complexity of interaction between nodes. We compute the topological feature specific to the target nodes using a carefully designed filter function and domain of computation.

  • We use the pairwise topological feature to enhance the latent representation of a graph neural network and achieve state-of-the-art link prediction results on various benchmarks.

  • We propose a general-purpose fast algorithm to compute extended persistent homology on graphs. The time complexity is improved from cubic to quadratic in the input size. It applies to many other persistence-homology-based learning methods for graphs.

Outline. In Section 2, we briefly introduce existing works on link prediction and on learning with topological information. In Section 3, we present details of extended persistent homology and our model, TLC-GNN. In Section 4, we introduce a faster algorithm for extended persistent homology and prove its correctness. In Section 5, we evaluate our method on synthetic and real-world benchmarks.

2 Related Work

Link prediction methods. Early works (Barabási and Albert, 1999; Zhou et al., 2009; Brin and Page, 2012; Jeh and Widom, 2002) predict links based on node similarity scores measured within the local neighborhood of the two target nodes. These methods tend to have strong assumptions on the link distribution and do not generalize well. Graph embeddings have been used to encode more global structural information for link prediction (Koren et al., 2009; Airoldi et al., 2008; Perozzi et al., 2014; Tang et al., 2015; Qiu et al., 2018). However, these methods only rely on graph connectivity and do not take full advantage of node attributes.

In recent years, new GNN-based methods have been proposed to jointly leverage graph connectivity and node attributes. Zhang and Chen (2018) show that local enclosing subgraphs contain sufficient information, and propose a GNN to leverage such information. Considering the non-Euclidean nature of graph metrics, one may generalize graph convolution to the hyperbolic space (Chami et al., 2019; Zhu et al., 2020). However, most existing methods use either limited structural information or node embeddings to represent edge features. They do not explicitly model the advanced topological information that arises in node interaction.

Learning with topological features. Persistent homology (Edelsbrunner et al., 2000; Edelsbrunner and Harer, 2010) captures structural information from the data using the language of algebraic topology (Munkres, 2018). It captures multi-scale topological structures in a provably robust manner (Cohen-Steiner et al., 2007). Different learning methods for persistent homology information have been proposed, such as direct vectorization (Adams et al., 2017), kernel machines (Reininghaus et al., 2015; Kusano et al., 2016; Carriere et al., 2017), convolutional neural networks (Hofer et al., 2017), and topological losses (Chen et al., 2019; Hu et al., 2019; Hofer et al., 2019).

For graph-structured data, topological features have been used for node classification (Zhao et al., 2020) and graph classification (Zhao and Wang, 2019; Hofer et al., 2020; Carrière et al., 2020). However, these existing methods cannot model interactions between nodes as desired in link prediction tasks. Bhatia et al. (2018) also use persistent homology for link prediction, but their method only exploits 0-dimensional topology, i.e., whether the target nodes are connected or not within the local neighborhood. This cannot capture the complexity of connections that we intend to model.

3 Link Prediction with Persistence Homology

In this section, we introduce our topological loop-counting graph neural network (TLC-GNN), which computes persistent homology on a chosen subgraph and incorporates the resulting pairwise topological feature into a graph neural network. The input of the model consists of two target nodes and a subgraph encoding their topological information. The output is the probability that an edge exists between the two target nodes.

In Sections 3.1 and 3.2, we briefly introduce extended persistent homology and its computation, respectively. In Section 3.3, we illustrate how to combine the topological feature with a standard graph neural network.

3.1 Extended Persistence Homology

In this section, we provide a brief introduction to extended persistent homology and refer the reader to (Edelsbrunner et al., 2000; Cohen-Steiner et al., 2009) for details. In the setting of graphs, the data only contain 0-dimensional (connected components) and 1-dimensional (loops) topological structures.[2] We define simplices to be all elements of the graph, including nodes and edges. The combinatorial relationship between simplices determines the topological structure of the graph, and persistent homology can count these topological structures. Besides, persistent homology measures the saliency of all topological structures in view of a scalar function defined on all simplices, called the filter function. For example, let V and E be the sets of nodes and edges, and let X = V ∪ E be the set of simplices. The filter function f(v) of a node v can be defined as the sum of its distances to the two target nodes. We can then define the filter function of an edge uv as f(uv) = max{f(u), f(v)}.

[2] In graphs, there are no triangles. As a result, all loops are topological structures (non-bounding cycles).

Given the filter function f on all the simplices of a graph, we can define the sublevel set X_a = {x ∈ X : f(x) ≤ a}. Here a is a threshold on the filter function f, and X_a is the subset of X whose simplices have filter values not greater than a. As the threshold a increases from −∞ to +∞, we obtain the ascending filtration of X: ∅ ⊆ X_{a_1} ⊆ X_{a_2} ⊆ ⋯ ⊆ X. An example is shown in the first half of Figure 2 (b).

As the threshold a increases, the sublevel set grows from empty to X, and new topological structures gradually appear (are born) and disappear (die). For example, in Figure 2, two connected components are born early in the ascending filtration, and one of them dies when the two components merge. Two loops are born later in the ascending filtration, each at the edge that completes it. A small sketch of this construction is given below.
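To make the sublevel-set construction concrete, here is a minimal sketch (our illustrative Python, with hypothetical filter values, not the paper's code) that builds the ascending filtration of a small graph using the edge rule f(uv) = max{f(u), f(v)}:

# A minimal sketch: ascending filtration via sublevel sets.
def ascending_filtration(nodes, edges, f_node):
    # f_node: dict mapping node -> filter value
    f_edge = {e: max(f_node[e[0]], f_node[e[1]]) for e in edges}
    # All simplices sorted by filter value (nodes enter before edges at ties).
    simplices = sorted(
        [(f_node[v], 0, v) for v in nodes] + [(f, 1, e) for e, f in f_edge.items()]
    )
    filtration, current = [], []
    for value, _, simplex in simplices:
        current = current + [simplex]
        filtration.append((value, list(current)))  # sublevel set X_a for a = value
    return filtration

# Toy example (hypothetical values):
nodes = ["u", "v", "w"]
edges = [("u", "v"), ("v", "w"), ("u", "w")]
f_node = {"u": 1.0, "v": 2.0, "w": 3.0}
for a, sublevel in ascending_filtration(nodes, edges, f_node):
    print(a, sublevel)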

Extended persistence. However, with the settings above, some structures, such as the whole connected component and all the loops, never disappear. To address this limitation of ordinary persistent homology, Cohen-Steiner et al. (2009) propose extended persistence by introducing another filtration based on the superlevel set X^a = {x ∈ X : f(x) ≥ a}.[3] Letting a decrease from +∞ to −∞, we obtain the descending filtration of X: ∅ ⊆ X^{a_n} ⊆ ⋯ ⊆ X. An example descending filtration is shown in the second half of Figure 2 (b). Note that in the descending filtration, the filter function of an edge is set differently, i.e., f(uv) = min{f(u), f(v)}.

[3] Technically, in the second half of the filtration, one uses relative homology groups with respect to the superlevel sets.

For loops and the whole connected component of a graph, the death time can be defined as the filter value at which the structure appears again in the superlevel-set (descending) filtration. For instance, each loop in Figure 2 is born at the filter value of the edge that completes it in the ascending filtration and dies at the filter value at which it reappears in the descending filtration; its extended persistence point records these two values. It is a bit counter-intuitive that the death time can be smaller than the birth time.

After capturing the birth and death times of all the topological structures, we encode them into a 2D point set called the persistence diagram. Each topological structure corresponds to one persistence point in the diagram, whose x and y coordinates are its birth and death times. Figure 2 (c) shows an extended persistence diagram.

After obtaining the extended persistence diagram, we can encode it into a vectorized feature called persistence image (Adams et al., 2017). Persistence image is an approach to convert persistence diagrams to vectors with a fixed dimension. Further details of the persistence image are available in the appendix.

3.2 Matrix Reduction Algorithm for Extended Persistence Diagram

In this section, we introduce the algorithm to compute extended persistence diagrams. Let N be the total number of simplices (N = |V| + |E|). We write σ_1, …, σ_N for the ascending sequence of simplices of X, i.e., the simplices sorted by increasing filter value.[4] Similarly, we write σ'_1, …, σ'_N for the descending sequence of simplices of X, sorted by decreasing filter value. Every simplex appears once in the ascending sequence and once in the descending sequence.

[4] Without loss of generality, we assume the filter function values of all nodes are distinct.

To compute the extended persistence diagram, we need a binary-valued matrix M that encodes the adjacency relationship between nodes and edges. M is a 2N × 2N matrix consisting of four N × N blocks: an ascending block, a permutation block, a descending block, and a zero block. Every column and row of M corresponds to a simplex. In particular, the first N columns of M correspond to the ascending sequence of simplices σ_1, …, σ_N, and the last N columns correspond to the descending sequence σ'_1, …, σ'_N. The setting is the same for the rows of M. The ascending block encodes the boundary relationship between the simplices of the ascending sequence: similar to the incidence matrix of a graph, its entry (i, j) is 1 iff σ_i is a boundary of σ_j, i.e., σ_i is a node adjacent to the edge σ_j. The matrix M for Figure 2 is shown in Figure 3.

Figure 3: The matrix M for the example of Figure 2.

The descending block is defined similarly, except that it encodes the boundary relationship of the simplices in the descending sequence, i.e., its entry (i, j) is 1 iff σ'_i is a boundary of σ'_j. The permutation block stores the permutation that connects the two sequences of simplices, i.e., its entry (i, j) is 1 iff σ_i and σ'_j denote the same simplex. The remaining block is zero-valued. In the appendix, we provide a complete example of the matrix M.

Algorithm 1 reduces M from left to right. To describe the reduction process, let low(j) be the maximum row index i for which M(i, j) = 1; if column j is zero, then low(j) is undefined. M is reduced if low(j) ≠ low(j') for any two non-zero columns j ≠ j'. Notice that "add" here means the mod-2 sum of two binary column vectors. After the matrix reduction is finished, we record the persistence diagram: each pair of simplex indices (low(j), j) corresponds to an extended persistence pair, and taking the filter values of the corresponding simplices gives the birth and death times of a persistence point. Slightly abusing notation, we denote the birth and death times by the filter values of the simplices in row low(j) and column j, respectively. To better illustrate Algorithm 1, we provide the reduction process for Figure 2 in the appendix; a short sketch of the column reduction also follows the pseudocode below.

  Input: filter function f, graph G = (V, E)
  Persistence Diagram PD ← ∅
  M ← build reduction matrix from f and G
  for j = 1 to 2N do
     while ∃ j' < j with low(j') = low(j) do
        add column j' to column j
     end while
     if column j is non-zero, add the pair (low(j), j) to PD
  end for
  Output: Persistence Diagram PD
Algorithm 1 Matrix Reduction
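For concreteness, the following is a minimal Python sketch of this standard left-to-right column reduction; the sparse column encoding and variable names are ours, not the paper's implementation.

# Minimal sketch of the column reduction in Algorithm 1.
# Columns are represented as sets of row indices with value 1; "adding" a column
# is a symmetric difference (mod-2 sum).
def reduce_matrix(columns):
    # columns: list of sets, columns[j] = {i : M[i, j] == 1}; mutated in place
    pairs = []
    low_to_col = {}  # maps low(j') -> j' for already-reduced non-zero columns
    for j, col in enumerate(columns):
        col = set(col)
        while col and max(col) in low_to_col:
            col ^= columns[low_to_col[max(col)]]   # mod-2 column addition
        columns[j] = col
        if col:
            low = max(col)
            low_to_col[low] = j
            pairs.append((low, j))                 # persistence pair (low(j), j)
    return pairs

# Toy usage with a hypothetical boundary-style matrix:
cols = [set(), set(), {0, 1}, {0, 1}]
print(reduce_matrix(cols))   # -> [(1, 2)]; column 3 reduces to zero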

3.3 TLC-GNN for Link Prediction

We have introduced extended persistent homology, its computation, and its vectorization. Next, we explain how our method computes the topological feature for a specific pair of target nodes and combines it with a GNN for link prediction.

To characterize the interaction between two target nodes, we need to choose a vicinity graph and a corresponding filter function. Given a weighted graph, we can define the filter value of a node as the sum of its distances to the two target nodes. Existing works using persistent homology usually exploit topological information from the whole graph (Zhao and Wang, 2019; Hofer et al., 2020) and thus cannot be directly used in our task. Besides, topological structures captured by global persistent homology can be irrelevant to the target nodes if they are far away. As justified in (Zhang and Chen, 2018), local enclosing subgraphs already contain enough information for link prediction. Consequently, we exploit the intersection of the k-hop neighborhoods of the two target nodes: let G = (V, E) be the whole graph, where V and E are the sets of vertices and edges. Let N_k(u) and N_k(v) be the k-hop neighborhoods of the target nodes u and v, and let V_{uv} = N_k(u) ∩ N_k(v) be their intersection. The subgraph of interest is G_{uv} = (V_{uv}, E_{uv}), where E_{uv} = {(p, q) ∈ E : p, q ∈ V_{uv}}. We compute the persistence image of the subgraphs generated by all candidate links, and define PI(u, v) as the persistence image of the subgraph generated by the target nodes u and v. A sketch of the subgraph extraction is shown below.
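As an illustration, the following sketch extracts the vicinity subgraph and the distance-based filter function; it assumes NetworkX and an unweighted graph, and the function and variable names are ours, not the paper's.

# Sketch: extract the vicinity subgraph G_uv and the distance-based filter function.
import networkx as nx

def vicinity_subgraph(G, u, v, k=2):
    du = nx.single_source_shortest_path_length(G, u, cutoff=k)
    dv = nx.single_source_shortest_path_length(G, v, cutoff=k)
    common = set(du) & set(dv)                   # intersection of k-hop neighborhoods
    H = G.subgraph(common).copy()
    f_node = {w: du[w] + dv[w] for w in common}  # filter value: sum of distances
    # Edge filter values for the ascending filtration: max of endpoint values.
    f_edge = {(p, q): max(f_node[p], f_node[q]) for p, q in H.edges()}
    return H, f_node, f_edge

# Hypothetical usage:
G = nx.karate_club_graph()
H, f_node, f_edge = vicinity_subgraph(G, 0, 33, k=2)
print(H.number_of_nodes(), H.number_of_edges())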

In our setting, the model used to obtain node embeddings is a classic L-layer graph convolutional network (GCN) (Kipf and Welling, 2016). Messages are passed between nodes to update their feature representations. After the L-layer GCN, the embedding of a node can be viewed as a combination of the node representations from its L-hop neighborhood. Let H^(l) denote the matrix of node representations in the l-th layer, for l = 0, …, L, with one row per node, where n is the number of nodes. Here H^(0) = X is the matrix of input node features, and H^(L) contains the node embeddings of the final layer.

To compute the representations of all the nodes, the l-th layer applies H^(l+1) = σ(D^{-1/2} Â D^{-1/2} H^(l) W^(l)), where Â = A + I denotes the adjacency matrix with inserted self-loops and D is the diagonal matrix with D_ii = Σ_j Â_ij. σ denotes the activation function, and W^(l) is a learned matrix that maps node embeddings from dimension d_l to d_{l+1}. The nodewise formulation for a node u is h_u^(l+1) = σ(Σ_{w ∈ Ñ(u)} (1/√(D_uu D_ww)) W^(l)ᵀ h_w^(l)), where Ñ(u) is the neighborhood of node u including itself.

For given target nodes u and v, the node embeddings h_u and h_v obtained through the GCN can be viewed as node features, and the persistence image PI(u, v) can be viewed as an edge feature. To combine the two effectively, we use a modified Fermi-Dirac decoder (Krioukov et al., 2010; Nickel and Kiela, 2017): a vector measuring the distinction between the two node embeddings is computed, concatenated with PI(u, v), and passed to a two-layer multi-layer perceptron (MLP) that outputs a single value. We treat this value as the distance d(u, v) between the two target nodes, and the final probability that an edge exists between u and v is obtained by applying the Fermi-Dirac function to d(u, v). We then train TLC-GNN by minimizing the cross-entropy loss with negative sampling. A sketch of this prediction head is given below.
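A minimal PyTorch-style sketch of such a prediction head follows. It assumes the "distinction" is the element-wise squared difference of the embeddings and uses the standard Fermi-Dirac form 1/(exp((d − r)/t) + 1); the layer sizes, hyperparameters, and names are ours, not the paper's.

import torch
import torch.nn as nn

class TLCDecoderSketch(nn.Module):
    # Hypothetical head: combines node embeddings with a persistence image.
    def __init__(self, emb_dim, pi_dim, hidden=64, r=2.0, t=1.0):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(emb_dim + pi_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )
        self.r, self.t = r, t

    def forward(self, h_u, h_v, pi_uv):
        distinction = (h_u - h_v) ** 2          # assumed "distinction" of embeddings
        d = self.mlp(torch.cat([distinction, pi_uv], dim=-1))  # scalar distance
        return torch.sigmoid(-(d - self.r) / self.t)  # Fermi-Dirac: 1/(exp((d-r)/t)+1)

# Hypothetical usage:
head = TLCDecoderSketch(emb_dim=16, pi_dim=25)
prob = head(torch.randn(16), torch.randn(16), torch.randn(25))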

4 A Faster Algorithm for Extended Persistence Diagram

In this section, we propose a new algorithm for extended persistent homology. Recall that Algorithm 1 reduces the matrix M. In the new algorithm, we avoid the explicit construction and reduction of M. Instead, we create and maintain a rooted tree while going through the descending filtration. When processing a new edge, by inspecting its relationship with the current tree, we can find the corresponding persistence pair efficiently. Afterward, we update the tree and continue with the next edge of the descending filtration. In Section 4.1, we explain the algorithm in detail. In Section 4.2, we prove the correctness of the proposed algorithm.

4.1 A Faster Algorithm

  Input: filter function f_d of the descending filtration, graph G = (V, E), filter function f_a of the ascending filtration
  0-dim PD, E_pos, E_neg = Union-Find(G, f_a, f_d)
  Tree T = E_neg + all the nodes
  1-dim PD = ∅
  for e = (u, v) in E_pos do
     assume P_u is the path from u to the root of T, and P_v is the path from v to the root of T
     loop L ← (P_u Δ P_v) + e
     let e' be the edge of L that appears latest in the ascending filtration; add the point (f_a(e'), f_d(e)) to the 1-dim PD
     T ← T − e'
     T ← T + e
  end for
  Output: 0-dim PD, 1-dim PD
Algorithm 2 A Faster Algorithm for Extended Persistence Diagram

Figure 4: An illustration of Algorithm 2, based on Figure 2.

The faster algorithm is shown in Algorithm 2. For 0-dimensional topology, we run the existing union-find algorithm (Edelsbrunner and Harer, 2010) once for the ascending filtration and once for the descending filtration. This algorithm is guaranteed to be correct; a sketch of one such union-find pass is given below. In the following, we therefore focus on 1-dimensional extended persistence.
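For reference, the union-find pass over a single filtration can be sketched as follows. This is the textbook procedure (our illustrative Python, not the paper's implementation): it returns 0-dimensional pairs and classifies each edge as negative (it merges two components) or positive (it closes a loop).

# Sketch: union-find over edges sorted by filter value (ascending filtration).
# For the descending filtration, sort edges decreasingly and flip the birth comparison.
def union_find_pass(nodes, edges, f_node, f_edge):
    parent = {v: v for v in nodes}
    birth = dict(f_node)                 # each node is born at its own filter value

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    pairs, negative, positive = [], [], []
    for e in sorted(edges, key=lambda e: f_edge[e]):
        ru, rv = find(e[0]), find(e[1])
        if ru == rv:
            positive.append(e)           # edge closes a loop
            continue
        negative.append(e)               # edge merges two components
        # Elder rule: the younger component dies at this edge.
        young, old = (ru, rv) if birth[ru] > birth[rv] else (rv, ru)
        pairs.append((birth[young], f_edge[e]))
        parent[young] = old
    return pairs, negative, positive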

Take Figure 4 as an example. With the union-find algorithm (Edelsbrunner et al., 2000; Cohen-Steiner et al., 2006), we can procure the set of positive edges and the set of negative edges of the descending filtration. Recall that positive edges give rise to new loops, while negative edges do not, and in the descending filtration every positive edge will be paired with an edge of the ascending filtration.

In order to find all the persistence pairs, we construct a tree from all the negative edges and the nodes. Every time we add a positive edge to the tree, a loop appears. The newly added edge and the edge of this loop that appears latest in the ascending filtration form a persistence pair. We then delete the paired edge from the tree and update the tree iteratively; a sketch of this loop-pairing step follows.
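The loop-pairing step can be sketched as follows (our illustrative Python, with edges keyed by frozenset; it assumes the current tree spans both endpoints of every positive edge):

# Sketch of the loop-pairing step of Algorithm 2.
def pair_positive_edges(nodes, negative_edges, positive_edges, asc_order, f_a, f_d):
    # asc_order, f_a, f_d: dicts keyed by frozenset({u, v})
    adj = {v: set() for v in nodes}
    for u, v in negative_edges:
        adj[u].add(v); adj[v].add(u)

    def tree_path(u, v):                  # edges on the u-v path in the current tree
        prev, queue = {u: None}, [u]
        while queue:
            x = queue.pop(0)
            for y in adj[x]:
                if y not in prev:
                    prev[y] = x
                    queue.append(y)
        edges, x = [], v
        while prev[x] is not None:
            edges.append(frozenset({prev[x], x}))
            x = prev[x]
        return edges

    diagram = []
    for u, v in positive_edges:           # processed in descending-filtration order
        e = frozenset({u, v})
        loop = tree_path(u, v) + [e]
        paired = max(loop, key=lambda d: asc_order[d])   # latest in ascending order
        diagram.append((f_a[paired], f_d[e]))
        p, q = tuple(paired)              # update the tree: swap paired edge for e
        adj[p].discard(q); adj[q].discard(p)
        adj[u].add(v); adj[v].add(u)
    return diagram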

Look at Figure 4: all the negative edges and the nodes form the tree in (a). The first positive edge is then added, and we can discover the loop it forms. In this loop, we find the edge that appears latest in the ascending filtration (the sequence of simplices in the ascending filtration is shown in Figure 3). The positive edge is paired with this edge, and we obtain the first persistence point.

After that, we delete the paired edge to get (c). The second positive edge is then added; with the same process, we discover the loop it forms and its paired edge, which gives the second persistence point. These two points form the final 1-dimensional extended persistence diagram.

Complexity of Algorithm 2. For every positive edge, the time cost to find the loop and to update the tree is O(n), where n denotes the number of nodes and m the number of edges. So the time cost over all positive edges is O(mn). The union-find passes take nearly linear time in the number of edges. Therefore the overall computation cost is O(mn), which is quadratic in the input graph/subgraph size. This is much more efficient than the cubic complexity of the matrix reduction algorithm.

4.2 The Correctness of Algorithm 2

In this section, we prove the proposed faster algorithm is correct, i.e., it produces the same output as the matrix reduction algorithm. Formally, we state the theorem as follows.

Theorem 1.

Algorithm 2 outputs the same extended persistence diagram as Algorithm 1.

We briefly explain the idea of the proof. A complete proof will be provided in the appendix.

The 0-dimensional persistent homology is computed using the known union-find algorithm. We only need to prove the correctness for the 1-dimensional topology, i.e., loops. Recall that each loop is created by an edge in the ascending filtration and destroyed by an edge in the descending filtration. We plan to show that the pairs of edges found by our new algorithm are the same as the pairs found by the matrix reduction algorithm. Denote by R the reduced matrix of M and refer to its blocks as before.

We classify edges of the descending filtration into negative descending edges and positive descending edges. A negative descending edge destroys a connected component created by a descending node; in the reduced matrix R, the lowest entry of its column falls into the rows of the descending sequence. A positive descending edge adds a new loop to the superlevel set; for extended persistence, it destroys the loop created by an edge of the ascending filtration, and the lowest entry of its reduced column falls into the rows of the ascending sequence. See Figure 2 (c) for an illustration of the different types of simplex pairs and their corresponding persistence points in the extended diagram.

The core of our proof is to show that, for each positive descending edge, the pair found by the new algorithm is equivalent to the pair found by the matrix reduction algorithm. We prove this by induction: going through the positive descending edges in the order of the descending filtration, we show that for the i-th such edge the pairing produced by our algorithm is the same as the reduction result. We prove that:


  • After reducing the descending-row part of the column of the i-th positive descending edge, i.e., once those entries are all zero, the remaining entries of the column in the ascending rows constitute a unique loop in T_{i−1}. Here T_{i−1} is the tree after processing the first i − 1 positive edges.

  • The lowest entry of the reduced column corresponds to an ascending edge that has not been paired by any previous edge, and thus it is paired with the current positive descending edge. Indeed, it is the last edge of the loop with respect to the ascending ordering.

  • Updating the tree is equivalent to adding the reduced column of the current edge to all descending columns with a nonzero entry at the row of the paired ascending edge. Although this will affect the final reduced matrix, it will not change the resulting simplex pairings.

This proves that our new algorithm has the same output as the existing matrix reduction algorithm. We have shown that the new algorithm is much more efficient in terms of complexity (Section 4.1). We will also validate the benefit empirically in Section 5.

5 Experiments


Method Cora Citeseer PubMed Photo Computers
GCN (Kipf and Welling, 2016) 90.47±0.240* 82.56±1.920* 89.56±3.660* 91.82±0.000 87.75±0.000
HGCN (Chami et al., 2019) 92.90±0.100* 95.25±0.000 96.30±0.000* 95.40±0.000 93.61±0.000
GIL (Zhu et al., 2020) 98.28±0.850* 99.85±0.450* 95.49±0.160* 97.11±0.007 95.89±0.010
SEAL (Zhang and Chen, 2018) 91.33±0.057 89.81±0.023 92.42±0.119 97.83±0.013 96.75±0.015
PEGN (Zhao et al., 2020) 96.30±0.002 98.93±0.001 96.59±0.002 96.61±0.001 95.06±0.001
TLC-GNN (nodewise) 97.56±0.005 98.81±0.002 97.62±0.003 96.26±0.026 96.50±0.021
TLC-GNN (DRNL) 98.34±0.002 98.70±0.002 98.78±0.001 97.83±0.003 97.49±0.004
TLC-GNN (Ricci) 98.09±0.002 98.67±0.002 98.58±0.001 98.37±0.001 98.34±0.001
Table 1: Mean and standard deviation of ROC-AUC on real-world data. "*": results copied from (Chami et al., 2019; Zhu et al., 2020).

We evaluate our method on synthetic and real-world graph datasets. We compare with different SOTA link prediction baselines. Furthermore, we evaluate the efficiency of the proposed faster algorithm for extended persistent homology.

Baseline methods. We compare our framework TLC-GNN with different link prediction methods. We compare with popular GNN models such as GCN (Kipf and Welling, 2016) and GAT (Veličković et al., 2017). We use them for node embedding and then adopt the Fermi-Dirac decoder (Krioukov et al., 2010; Nickel and Kiela, 2017) to predict whether there is a link between two nodes. We also compare with several SOTA methods: SEAL (Zhang and Chen, 2018), which utilizes local distances and pre-trained transductive node features for link prediction; HGCN (Chami et al., 2019), which introduces hyperbolic space to deal with scale-free graphs; and GIL (Zhu et al., 2020), which takes advantage of both Euclidean and hyperbolic geometries.

To demonstrate the importance of pairwise topological features, we also compare with two baseline methods using node-wise topological features, i.e., a topological feature for each node. PEGN (Zhao et al., 2020) extracts topological features for each node using its neighborhood; the node-wise topological feature is then used to re-calibrate the graph convolution. PEGN was originally designed for node classification, so, similar to GCN and GAT, we adapt it to link prediction via the Fermi-Dirac decoder. A second baseline, TLC-GNN (nodewise), is a modification of our method TLC-GNN (Ricci): instead of the pairwise topological feature for a pair of target nodes, we concatenate their node-wise topological features, inject them into the node embeddings, and then use an MLP to predict.

For all settings, we randomly split edges into 85%/5%/10% for training, validation, and test sets. To compute the distance-based filter function, we need a graph metric; we use hop-distance and Ollivier-Ricci curvature (Ni et al., 2018), respectively. The filter function with hop-distance is the same as Double-Radius Node Labeling (DRNL) (Zhang and Chen, 2018). For Ollivier-Ricci curvature, we compute the curvature of all training edges and use it as edge weights, adding 1 to the curvature of every edge to avoid negative weights. With this weighted graph, we define the filter value of each node as the total shortest-path distance from the two target nodes (see the sketch below). Recall that we use the intersection of the k-hop neighborhoods of the two target nodes to compute pairwise topological features; the choice of k is either 1 or 2, depending on the size and density of the graph.
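As an illustration of this filter-function construction (our sketch, assuming the edge weights, e.g., 1 + Ollivier-Ricci curvature, have already been attached to the graph):

# Sketch: distance-based filter function on a weighted graph.
import networkx as nx

def distance_filter(G, u, v, weight="weight"):
    du = nx.single_source_dijkstra_path_length(G, u, weight=weight)
    dv = nx.single_source_dijkstra_path_length(G, v, weight=weight)
    # Filter value of a node: total shortest-path distance to the two targets.
    return {w: du[w] + dv[w] for w in G.nodes() if w in du and w in dv}

# Hypothetical usage: unit weights stand in for 1 + Ollivier-Ricci curvature.
G = nx.karate_club_graph()
nx.set_edge_attributes(G, 1.0, "weight")
f = distance_filter(G, 0, 33)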

5.1 Synthetic Experiments

We generate synthetic data using a graph-theoretical model called the Stochastic Block Model (SBM) (Holland et al., 1983). To be specific, we create random graphs with 1000 nodes forming 5 equal-size communities. Edges are randomly sampled with intra-community probability p and inter-community probability q. We randomly create 12 graphs from different combinations of p and q (see the sketch below). Furthermore, we assign each node a randomly created feature vector of dimension 100 and use it as the input to all the methods. For pairwise persistence diagrams, we use intersections of 1-hop neighborhoods for all the graphs.
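A minimal sketch of this data generation is given below; the probabilities used in the usage line are placeholders, since the exact ranges swept in the paper are not reproduced here.

# Sketch: generate an SBM graph with 5 equal-size communities and random features.
import numpy as np
import networkx as nx

def make_sbm_graph(p_intra, q_inter, n_nodes=1000, n_blocks=5, feat_dim=100, seed=0):
    sizes = [n_nodes // n_blocks] * n_blocks
    probs = [[p_intra if i == j else q_inter for j in range(n_blocks)]
             for i in range(n_blocks)]
    G = nx.stochastic_block_model(sizes, probs, seed=seed)
    rng = np.random.default_rng(seed)
    features = rng.normal(size=(n_nodes, feat_dim))  # random node features
    return G, features

# Placeholder probabilities (the paper sweeps a grid of p and q values):
G, X = make_sbm_graph(p_intra=0.1, q_inter=0.01)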

We run the methods on each synthetic graph 10 times and report the mean area under the ROC curve (ROC-AUC) as the result. More details can be found in the appendix.

Figure 5: Heatmap of ROC-AUC scores for different methods on synthetic data. For each method, we generate a heatmap of its performance on the 12 synthetic graphs with different combinations of p and q. We also report in the title of each heatmap the average performance over the 12 graphs.

Results. Figure 5 reports the results of 6 methods. We observe that TLC-GNN outperforms nearly all the SOTA baselines on all the synthetic graphs. This justifies the benefit of the pairwise topological feature. GIL sometimes performs well but is unstable overall. GCN and GAT are no better than random guessing, which implies that node degree alone cannot provide enough structural information for link prediction.

5.2 Real-World Benchmarks

We use a variety of datasets: (a) Cora, Citeseer and PubMed (Sen et al., 2008) are standard benchmarks describing citation networks. (b) Photo and Computers (Shchur et al., 2018) are graphs related to Amazon shopping records. More details can be found in the appendix.

Method Cora Citeseer PubMed Photo Computers
Alg. 1 0.0025 0.0018 0.0068 1.6557 4.6531
Alg. 2 0.0009 0.0008 0.0027 1.1176 2.7033
Table 2: Computational time (seconds per edge) evaluation.

For pairwise persistence diagrams, we compute intersections of 1-hop neighborhoods for Photo and Computers due to their high density. We use intersections of 2-hop neighborhoods for Cora, Citeseer, and PubMed. Following prior works (Chami et al., 2019; Zhu et al., 2020), we evaluate link prediction using the ROC-AUC score on the test set. We run the methods on each graph 50 times to get the results.

Results. Table 1 summarizes the performance of all methods. On small and sparse graphs like Cora and Citeseer, our proposed method is on par with SOTA methods. Nevertheless, on large and dense graphs such as PubMed, Photo, and Computers, our method is superior to others. This implies that the high-order topological feature is more effective in these large and dense graphs, which tend to have rich and heterogeneous structures.

Ablation study. Compared with models using nodewise persistent homology (PEGN and TLC-GNN (nodewise)), TLC-GNN generally performs better. This shows that, for the link prediction task, the proposed pairwise persistent homology carries crucial structural information for modeling node interaction, whereas the nodewise topological feature cannot.

5.3 Algorithm Efficiency

To verify the computational benefit of the proposed new algorithm (Algorithm 2), we compare it with the classic matrix reduction algorithm (Algorithm 1), as implemented in the Dionysus package (Morozov). Both implementations are written in Python.[5] We compare the two algorithms on all real-world benchmarks and report the average running time for computing the pairwise topological feature of each edge (in seconds). More details are available in the appendix.

[5] The code of Dionysus is transformed into Cython, a C extension for Python.

Results. As is shown in Table 2, the proposed Algorithm 2 achieves 1.5 to 3 times speedup compared with the matrix reduction algorithm (Algorithm 1) on all benchmarks.

6 Conclusion

In this paper, we propose a novel pairwise topological feature for link prediction, based on the theory of persistent homology. We also introduce a GNN model to leverage this topological feature. Experiments show that our approach outperforms state-of-the-art methods on various benchmarks, especially on large and dense graphs. Besides, we propose a novel algorithm to more efficiently calculate extended persistence diagrams on graphs. We verify the correctness and efficiency of the algorithm, which can be generalized to other graph learning tasks.

References

  • L. A. Adamic and E. Adar (2003) Friends and neighbors on the web. Social networks 25 (3), pp. 211–230. Cited by: §1.
  • H. Adams, T. Emerson, M. Kirby, R. Neville, C. Peterson, P. Shipman, S. Chepushtanova, E. Hanson, F. Motta, and L. Ziegelmeier (2017) Persistence images: a stable vector representation of persistent homology. The Journal of Machine Learning Research 18 (1), pp. 218–252. Cited by: §2, §3.1, A.1 Persistence Image, A.1 Persistence Image.
  • E. M. Airoldi, D. M. Blei, S. E. Fienberg, and E. P. Xing (2008) Mixed membership stochastic blockmodels. Journal of machine learning research 9 (Sep), pp. 1981–2014. Cited by: §2.
  • J. Alman and V. V. Williams (2021) A refined laser method and faster matrix multiplication. In Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 522–539. Cited by: footnote 1.
  • A. Barabási and R. Albert (1999) Emergence of scaling in random networks. science 286 (5439), pp. 509–512. Cited by: §1, §2.
  • S. Bhatia, B. Chatterjee, D. Nathani, and M. Kaul (2018) Understanding and predicting links in graphs: a persistent homology perspective. arXiv preprint arXiv:1811.04049. Cited by: §2.
  • S. Brin and L. Page (2012) Reprint of: the anatomy of a large-scale hypertextual web search engine. Computer networks 56 (18), pp. 3825–3833. Cited by: §1, §2.
  • M. Carrière, F. Chazal, Y. Ike, T. Lacombe, M. Royer, and Y. Umeda (2020) PersLay: a neural network layer for persistence diagrams and new graph topological signatures. In International Conference on Artificial Intelligence and Statistics, pp. 2786–2796. Cited by: §1, §1, §2.
  • M. Carriere, M. Cuturi, and S. Oudot (2017) Sliced wasserstein kernel for persistence diagrams. In International Conference on Machine Learning, pp. 664–673. Cited by: §2.
  • I. Chami, Z. Ying, C. Ré, and J. Leskovec (2019) Hyperbolic graph convolutional neural networks. In Advances in neural information processing systems, pp. 4868–4879. Cited by: §2, §5.2, Table 1, §5, A.4.2 Detailed Experiment Settings, A.4.2 Detailed Experiment Settings.
  • C. Chen, X. Ni, Q. Bai, and Y. Wang (2019) A topological regularizer for classifiers via persistent homology. In The 22nd International Conference on Artificial Intelligence and Statistics, pp. 2573–2582. Cited by: §2.
  • D. Cohen-Steiner, H. Edelsbrunner, and J. Harer (2007) Stability of persistence diagrams. Discrete & computational geometry 37 (1), pp. 103–120. Cited by: §2.
  • D. Cohen-Steiner, H. Edelsbrunner, and J. Harer (2009) Extending persistence using poincaré and lefschetz duality. Foundations of Computational Mathematics 9 (1), pp. 79–103. Cited by: §1, §3.1, §3.1, A.2.2 Phase 2, A.2 Examples of the Matrix Reduction Algorithm.
  • D. Cohen-Steiner, H. Edelsbrunner, and D. Morozov (2006) Vines and vineyards by updating persistence in linear time. In Proceedings of the twenty-second annual symposium on Computational geometry, pp. 119–126. Cited by: §4.1.
  • S. Coulomb, M. Bauer, D. Bernard, and M. Marsolier-Kergoat (2005) Gene essentiality and the topology of protein interaction networks. Proceedings of the Royal Society B: Biological Sciences 272 (1573), pp. 1721–1725. Cited by: §1.
  • H. Edelsbrunner and J. Harer (2010) Computational topology: an introduction. American Mathematical Soc.. Cited by: §1, §2, §4.1, A.2 Examples of the Matrix Reduction Algorithm, A.3 Correctness of the Faster Algorithm, footnote 1.
  • H. Edelsbrunner, D. Letscher, and A. Zomorodian (2000) Topological persistence and simplification. In Proceedings 41st annual symposium on foundations of computer science, pp. 454–463. Cited by: §1, §2, §3.1, §4.1, A.2 Examples of the Matrix Reduction Algorithm.
  • C. Hofer, F. Graf, B. Rieck, M. Niethammer, and R. Kwitt (2020) Graph filtration learning. In International Conference on Machine Learning, pp. 4314–4323. Cited by: §1, §1, §2, §3.3.
  • C. Hofer, R. Kwitt, M. Niethammer, and M. Dixit (2019) Connectivity-optimized representation learning via persistent homology. In International Conference on Machine Learning, pp. 2751–2760. Cited by: §2.
  • C. Hofer, R. Kwitt, M. Niethammer, and A. Uhl (2017) Deep learning with topological signatures. In Proceedings of the 31st International Conference on Neural Information Processing Systems, pp. 1633–1643. Cited by: §1, §1, §2.
  • P. W. Holland, K. B. Laskey, and S. Leinhardt (1983) Stochastic blockmodels: first steps. Social networks 5 (2), pp. 109–137. Cited by: §5.1.
  • X. Hu, F. Li, D. Samaras, and C. Chen (2019) Topology-preserving deep image segmentation. In Advances in neural information processing systems, pp. 5657–5668. Cited by: §2.
  • G. Jeh and J. Widom (2002) SimRank: a measure of structural-context similarity. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 538–543. Cited by: §2.
  • T. N. Kipf and M. Welling (2016) Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907. Cited by: §1, §3.3, Table 1, §5.
  • Y. Koren, R. Bell, and C. Volinsky (2009) Matrix factorization techniques for recommender systems. Computer 42 (8), pp. 30–37. Cited by: §1, §2.
  • D. Krioukov, F. Papadopoulos, M. Kitsak, A. Vahdat, and M. Boguná (2010) Hyperbolic geometry of complex networks. Physical Review E 82 (3), pp. 036106. Cited by: §3.3, §5.
  • G. Kusano, Y. Hiraoka, and K. Fukumizu (2016) Persistence weighted gaussian kernel for topological data analysis. In International Conference on Machine Learning, pp. 2004–2013. Cited by: §2.
  • D. Liben-Nowell and J. Kleinberg (2007) The link-prediction problem for social networks. Journal of the American society for information science and technology 58 (7), pp. 1019–1031. Cited by: §1.
  • L. Lü and T. Zhou (2011) Link prediction in complex networks: a survey. Physica A: statistical mechanics and its applications 390 (6), pp. 1150–1170. Cited by: §1.
  • N. Milosavljević, D. Morozov, and P. Skraba (2011) Zigzag persistent homology in matrix multiplication time. In Proceedings of the twenty-seventh Annual Symposium on Computational Geometry, pp. 216–225. Cited by: footnote 1.
  • D. Morozov. Dionysus2. Note: https://www.mrzv.org/software/dionysus2/ Cited by: §5.3.
  • J. R. Munkres (2018) Elements of algebraic topology. CRC press. Cited by: §2.
  • M. Nagarajan, A. D. Wilkins, B. J. Bachman, I. B. Novikov, S. Bao, P. J. Haas, M. E. Terrón-Díaz, S. Bhatia, A. K. Adikesavan, J. J. Labrie, et al. (2015) Predicting future scientific discoveries based on a networked analysis of the past literature. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 2019–2028. Cited by: §1.
  • C. Ni, Y. Lin, J. Gao, and X. Gu (2018) Network alignment by discrete ollivier-ricci flow. In International Symposium on Graph Drawing and Network Visualization, pp. 447–462. Cited by: §5, A.4.3 Visualization of Extended Persistence.
  • M. Nickel and D. Kiela (2017) Poincaré embeddings for learning hierarchical representations. In Advances in neural information processing systems, pp. 6338–6347. Cited by: §3.3, §5.
  • B. Perozzi, R. Al-Rfou, and S. Skiena (2014) Deepwalk: online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 701–710. Cited by: §1, §2.
  • J. Qiu, Y. Dong, H. Ma, J. Li, K. Wang, and J. Tang (2018) Network embedding as matrix factorization: unifying deepwalk, line, pte, and node2vec. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 459–467. Cited by: §1, §2.
  • J. Reininghaus, S. Huber, U. Bauer, and R. Kwitt (2015) A stable multi-scale kernel for topological machine learning. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4741–4748. Cited by: §2.
  • M. Schlichtkrull, T. N. Kipf, P. Bloem, R. Van Den Berg, I. Titov, and M. Welling (2018) Modeling relational data with graph convolutional networks. In European semantic web conference, pp. 593–607. Cited by: §1.
  • P. Sen, G. Namata, M. Bilgic, L. Getoor, B. Galligher, and T. Eliassi-Rad (2008) Collective classification in network data. AI magazine 29 (3), pp. 93–93. Cited by: §5.2, item 1.
  • O. Shchur, M. Mumme, A. Bojchevski, and S. Günnemann (2018) Pitfalls of graph neural network evaluation. arXiv preprint arXiv:1811.05868. Cited by: §5.2, item 2.
  • J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei (2015) Line: large-scale information network embedding. In Proceedings of the 24th international conference on world wide web, pp. 1067–1077. Cited by: §2.
  • K. Teru, E. Denis, and W. Hamilton (2020) Inductive relation prediction by subgraph reasoning. In International Conference on Machine Learning, pp. 9448–9457. Cited by: §1.
  • L. Van der Maaten and G. Hinton (2008) Visualizing data using t-SNE. Journal of Machine Learning Research 9 (11). Cited by: A.4.3 Visualization of Extended Persistence.
  • P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio (2017) Graph attention networks. arXiv preprint arXiv:1710.10903. Cited by: §5.
  • Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and S. Y. Philip (2020) A comprehensive survey on graph neural networks. IEEE transactions on neural networks and learning systems. Cited by: §1.
  • M. Zhang and Y. Chen (2018) Link prediction based on graph neural networks. In Advances in Neural Information Processing Systems, pp. 5165–5175. Cited by: §1, §1, §2, §3.3, Table 1, §5, §5.
  • Z. Zhang, P. Cui, and W. Zhu (2020) Deep learning on graphs: a survey. IEEE Transactions on Knowledge and Data Engineering. Cited by: §1.
  • Q. Zhao and Y. Wang (2019) Learning metrics for persistence-based summaries and applications for graph classification. In Advances in Neural Information Processing Systems, pp. 9859–9870. Cited by: §1, §1, §2, §3.3.
  • Q. Zhao, Z. Ye, C. Chen, and Y. Wang (2020) Persistence enhanced graph neural network. In International Conference on Artificial Intelligence and Statistics, pp. 2896–2906. Cited by: §2, Table 1, §5.
  • T. Zhou, L. Lü, and Y. Zhang (2009) Predicting missing links via local information. The European Physical Journal B 71 (4), pp. 623–630. Cited by: §1, §2.
  • S. Zhu, S. Pan, C. Zhou, J. Wu, Y. Cao, and B. Wang (2020) Graph geometry interaction learning. Advances in Neural Information Processing Systems 33. Cited by: §2, §5.2, Table 1, §5, A.4.2 Detailed Experiment Settings, A.4.2 Detailed Experiment Settings.

Appendix

In this appendix, we provide (1) additional details for persistence image; (2) additional examples to illustrate the matrix reduction algorithm for extended persistent homology; (3) a complete proof of the correctness of the proposed faster algorithm; and (4) additional experimental details and qualitative results.

A.1 Persistence Image

In recent years, efforts have been made to map persistence diagrams into representations valuable to machine learning tasks. The persistence image (Adams et al., 2017) is one such approach to convert persistence diagrams to fixed-dimensional vectors, and it is the one used in our model. Let T(x, y) = (x, y − x) be a linear transformation of persistence points. Given a persistence diagram D, T(D) is the transformed diagram. For any point u ∈ T(D), φ_u is the 2D Gaussian function with mean u and standard deviation σ.

Let ω be a non-negative weight function on the persistence plane. Given a persistence diagram D, its persistence surface is defined as ρ_D(z) = Σ_{u ∈ T(D)} ω(u) φ_u(z). Fixing a grid in the plane with n pixels, the persistence image is the collection of pixel values, each obtained by integrating ρ_D over the corresponding pixel; it is therefore a vector of fixed dimension n and can be used directly in machine learning tasks. The stability of the persistence image under perturbation has been proven in (Adams et al., 2017). In our setting, ω is a piecewise-linear weighting function. A small numerical sketch of this vectorization is given below.
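The following NumPy sketch is our illustration, not the paper's code: it evaluates the weighted Gaussian surface at grid points rather than integrating over pixels (a common approximation), with an arbitrary resolution, bandwidth, and a simple linear weight.

# Sketch: persistence image from a diagram of (birth, death) points.
import numpy as np

def persistence_image(diagram, resolution=5, sigma=0.1, extent=(0.0, 1.0)):
    pts = np.asarray(diagram, dtype=float)
    pts = np.stack([pts[:, 0], pts[:, 1] - pts[:, 0]], axis=1)  # T(x, y) = (x, y - x)
    weights = np.maximum(pts[:, 1], 0.0)          # simple linear weight on persistence
    lo, hi = extent
    xs = np.linspace(lo, hi, resolution)
    ys = np.linspace(lo, hi, resolution)
    img = np.zeros((resolution, resolution))
    for (bx, by), w in zip(pts, weights):
        gx = np.exp(-((xs - bx) ** 2) / (2 * sigma ** 2))
        gy = np.exp(-((ys - by) ** 2) / (2 * sigma ** 2))
        img += w * np.outer(gy, gx)               # evaluate the Gaussian on the grid
    return img.flatten()                          # fixed-dimension vector

vec = persistence_image([(0.1, 0.6), (0.2, 0.3)])
print(vec.shape)   # (25,)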

Figure 6: An illustration of extended persistent homology. (a) The input graph with a given filter function on the nodes. (b) The ascending and descending filtrations of the input graph. The brown and blue bars correspond to the life spans of connected components and loops, respectively. The first four figures show the ascending filtration, and the last four show the descending filtration. In the ascending filtration, the filter value of an edge is the maximum of its endpoints' values, while in the descending filtration it is the minimum. (c) The resulting extended persistence diagram, where red and blue markers correspond to 0-dimensional and 1-dimensional topological structures. The two blue markers correspond to the two loops; the range of the filter function over each loop is encoded as the coordinates of its marker.
Figure 7: Reduction matrix for Figure 6

A.2 Examples of the Matrix Reduction Algorithm

The reduction matrix M for Figure 6 is shown in Figure 7. Recall that M is a binary-valued matrix that encodes the adjacency relationship between nodes and edges. M is a 2N × 2N matrix consisting of four N × N blocks: an ascending block, a permutation block, a descending block, and a zero block. Every column and row of M corresponds to a simplex. In particular, the first N columns of M correspond to the ascending sequence of simplices σ_1, …, σ_N, and the last N columns correspond to the descending sequence σ'_1, …, σ'_N. The setting is the same for the rows of M. The ascending block encodes the boundary relationship between the simplices of the ascending sequence: similar to the incidence matrix of a graph, its entry (i, j) is 1 iff σ_i is a boundary of σ_j, i.e., σ_i is a node adjacent to the edge σ_j.

The descending block is defined similarly, except that it encodes the boundary relationship of the simplices in the descending sequence, i.e., its entry (i, j) is 1 iff σ'_i is a boundary of σ'_j. The permutation block stores the permutation that connects the two sequences of simplices, i.e., its entry (i, j) is 1 iff σ_i and σ'_j denote the same simplex. The remaining block is zero-valued.

Algorithm 3 reduces the columns of the matrix M from left to right. If we allow certain flexibility in the reduction ordering, we can separate the algorithm into three phases: the reduction of the ascending block (Phase 1), the reduction of the descending block (Phase 2), and the reduction of the permutation block (Phase 3) (Cohen-Steiner et al., 2009). We define a simplex as positive if its corresponding column is zero after reduction, and negative if its corresponding column is not zero after reduction. In our setting, all the nodes, as well as edges that give rise to a loop, are positive simplices, while edges that destroy a connected component are negative simplices. All simplices are either negative or positive (Edelsbrunner et al., 2000; Edelsbrunner and Harer, 2010). Notice that the positive and negative edges of the ascending filtration are not the same as the positive and negative edges of the descending filtration.

In the following paragraphs, we introduce the whole matrix reduction process by going through the three phases successively.

A.2.1 Phase 1

Phase 1 is the matrix reduction of the ascending block. In this block, all the columns corresponding to nodes and all the rows corresponding to edges are zero, so they have no impact on the matrix reduction algorithm. After deleting these rows and columns, the condensed matrix is shown below:

For simplicity, we define low(e) of an edge column e as the node occupying the maximum row for which the entry of column e is 1. Notice that low was originally defined on row indices; here we replace each index with the simplex it represents, i.e., we refer to rows and columns by their simplices.

From left to right, we can find that , and , thus we add column of to column of : . Notice that “add” here means the mod-2 sum of the two binary vectors. Thus .

Then we can find that , , we add the column of to the column of : , thus . Again we add the column of to the column of : . Notice here the column value for is the value after matrix reduction. Therefore is not paired.

Similarly, we add the column of and column of to column of and get . Then is not paired. After Phase 1, matrix is shown below. And we can obtain the persistence pair: , , .

Recall that encodes the relationship between all the simplices in the ascending sequence. In the ascending sequence, the filter function for an edge is defined as . Thus we can infer the persistence points from the persistence pairs: , , . We remove the persistence points whose birth time and death time are the same, and get the final persistence point: .

Recall that a simplex is defined as positive if its corresponding column is zero after reduction, and negative if its corresponding column is not zero after reduction. So the positive edges in the ascending sequence are and , the negative edges in the ascending sequence are , , and .

A.2.2 Phase 2

Phase 2 is the matrix reduction of the descending block. Notice that low is defined as in Phase 1, and Phase 2 influences not only the descending block but also the permutation block (Cohen-Steiner et al., 2009). In the descending block, all the columns of nodes and all the rows of edges are zero; in the permutation block, the columns of nodes and the rows of nodes have no impact on the reduction process or on the pairing for 1-dimensional topology. Therefore, for simplicity, we delete these rows and columns and obtain the condensed matrices:

From left to right, , , . We add the column of to the column of : . Then , and we add column of to column of : . Therefore is not paired. Notice that when we add column of and column of to column of in , we are also adding these columns in . Consequently the column of in will become: .

Then we continue the matrix reduction algorithm, , and . Then we add column of to column of : , . Again we add column of to column of : . Therefore is not paired, and column of in becomes . Matrix and will therefore become:


Similar to the process in Phase 1 for discovering persistence points, negative edges, and positive edges, we find that all the persistence pairs here appear and disappear at the same time; thus there is no persistence point. The descending sequence contains two positive edges and three negative edges.

Notice that before Phase 3, all the persistence points are 0-dimensional, i.e., they only record the birth and death of connected components. The 1-dimensional extended persistence pairs are found in Phase 3, where each positive edge of the descending filtration is paired with an edge of the ascending filtration, so that the saliency of loops can be measured.

A.2.3 Phase 3

Phase 3 is the reduction process of the permutation block. Only positive descending edges (edges whose column in the descending block is zero) will be reduced; thus the values in the descending block will not be influenced. Similar to the definition of low in Phase 1, we define low(e) of an edge column e as the ascending edge occupying the maximum row for which the entry of column e is 1.

From left to right, we only consider the positive descending edges. We find that , . We add column of to column of : .

As a consequence, we obtain the matrices after Phase 3:

This yields two extended persistence pairs. Notice that in an extended persistence pair, the latter edge is the positive edge of the descending filtration, and the former edge is its paired edge of the ascending filtration. Recall that the filter value of an edge in the ascending sequence is defined as the maximum of the filter values of its endpoints, while its filter value in the descending filtration is defined as the minimum of the filter values of its endpoints. As a consequence, we obtain the two extended persistence points of the loops.

After the whole matrix reduction algorithm, we can get the ordinary persistence diagram for 0-dimensional topological structures (connected components): and the extended persistence diagram for 1-dimensional topological structures (loops): . To get the 0-dimensional extended persistence diagram, the birth and death of the whole connected component is also recorded, that is, the minimum and the maximum filter value . So the 0-dimensional extended persistence diagram is , as shown in Figure 6 (c).

A.3 Correctness of the Faster Algorithm

In this section, we provide a complete proof of the correctness of the faster algorithm. For convenience, we restate the matrix reduction algorithm as Algorithm 3 and the proposed faster algorithm as Algorithm 4. We also restate the main theorem as Theorem 2.

Theorem 2.

Algorithm 4 outputs the same extended persistence diagram as Algorithm 3.

To prove Theorem 2, the core of our proof is to show that for each positive descending edge, the pair found by the new algorithm is equivalent to the pair resulting from the matrix reduction algorithm. We prove this by induction: as we go through all positive descending edges in the descending filtration, we show that for each positive edge, which creates a new loop, the extended persistence pair produced by our algorithm is the same as the reduction result.

In Lemma 3, we prove that after reducing the descending-row part of the column of a given positive descending edge, i.e., once those entries are all zero, the remaining entries of the column in the ascending rows constitute a unique loop in the current tree.

1:  Input: filter function f, graph G = (V, E)
2:  Persistence Diagram PD ← ∅
3:  M ← build reduction matrix from f and G