Graph Partitioning and Graph Neural Network based Hierarchical Graph Matching for Graph Similarity Computation

05/16/2020 ∙ by Haoyan Xu, et al. ∙ Chongqing University, Zhejiang University

Graph similarity computation aims to predict a similarity score between a pair of graphs to facilitate downstream applications, such as finding the chemical compounds most similar to a query compound or few-shot 3D action recognition. Recently, several graph similarity computation models based on neural networks have been proposed, relying on either graph-level interaction or node-level comparison. However, as the number of nodes in a graph increases, these models inevitably suffer from reduced representation ability or excessive time complexity. Motivated by this observation, we propose a graph partitioning and graph neural network based model, called PSimGNN, to resolve this issue. Specifically, each input graph is first partitioned into a set of subgraphs to directly extract local structural features. Next, a learnable embedding function maps each subgraph to an embedding vector. Then, some of the subgraph pairs are selected for node-level comparison to supplement the subgraph-level embeddings with fine-grained information. Finally, coarse-grained interaction information among subgraphs and fine-grained comparison information among nodes in different subgraphs are integrated to predict the final similarity score. Using approximate Graph Edit Distance (GED) as the graph similarity metric, experimental results on graph datasets of different sizes demonstrate that PSimGNN outperforms state-of-the-art methods in graph similarity computation tasks. The code will be released when this paper is published.




1. Introduction

Graph similarity computation and graph matching techniques have been widely used in various fields, such as recommendation systems (Wu et al., 2015) and computer vision (Horaud and Skordas, 1989; Pelillo et al., 1999). Graph Edit Distance (GED) (Bunke, 1983) and Maximum Common Subgraph (MCS) (Bunke and Shearer, 1998) are two common metrics for evaluating how similar two graphs are. Traditional graph similarity computation methods such as A* (Riesen et al., 2013), Hungarian (Kuhn, 1955; Riesen and Bunke, 2009), VJ (Fankhauser et al., 2011; Jonker and Volgenant, 1987) and Beam (Neuhaus et al., 2006) operate directly on the node and edge characteristics of the graphs. These exact and approximate algorithms for computing the GED or MCS between two graphs have high time complexity and are hard to scale to large graphs in real applications such as graph similarity search. As shown in Figure 1, in graph similarity search, efficiently retrieving the graphs in a database that approximately match or are similar to the query graph is challenging but meaningful.

Figure 1. Graph similarity search. Given a database consisting of a set of node- and edge-labeled graphs, graph similarity search aims to find all graphs in the database that are similar to a user-given query graph.

With the rapid development of deep learning technology, Graph Neural Networks (GNNs) provide a new solution for similarity computation and matching of graph structures. Graph deep learning models automatically extract the structural characteristics of a graph through GNN layers and embed each node into a low-dimensional vector containing both its own feature information and its local connectivity information. SimGNN (Bai et al., 2019), GSimCNN (Bai et al., 2018) and GMN (Li et al., 2019) are some representative graph deep learning based models for graph similarity computation. During the training stage, these models fit the GED or MCS ground truth (label) in a supervised way and learn a mapping from a pair of input graphs to a similarity score. At test time or in actual applications, deep learning models are time-efficient compared with traditional graph similarity computation methods. However, these deep learning models either contain a pairwise node comparison process, which needs at least quadratic time with respect to the number of nodes in the two graphs, or directly embed the whole graph into a single graph-level vector. Embedding the whole graph into a vector and using the similarity between vectors as the similarity of the corresponding graph pair is time-efficient but loses much node-level information. Graph similarity computation is challenging not only because of the need to perceive the whole graph, but also because of the need for careful comparison between the information of the nodes.

In this paper, we focus on large graph similarity computation and try to address the following two challenges. First, although many graph similarity computation models based on neural networks have been proposed, few of them can effectively calculate the similarity between graphs with a large number of nodes, or be adjusted to different tasks to trade off accuracy against time complexity. Second, for a graph with a large number of nodes, the local information of the graph, such as the structural characteristics of its subgraphs, should also be considered directly. However, none of the existing models address these challenges in detail.

Although large graph similarity computation is still a very challenging issue, we notice that with the advent of ever larger instances in applications such as scientific simulation, social networks, or road networks, graph partitioning (GP) becomes more and more important, multifaceted, and challenging (Buluç et al., 2016). Inspired by these applications that are based on graph partitioning methods, we propose an end-to-end model, PSimGNN, i.e., Partition based Similarity Computation via Graph Neural Networks, to address these challenges. First, our model partitions each input graph into a set of subgraphs to directly extract the local structural features. Second, a learnable embedding function maps every subgraph to an embedding vector, and coarse-grained similarity computation is conducted by computing the similarity of each pair of subgraph-level embedding vectors. Third, our model can select some of these subgraph pairs for node-level comparison to supplement the subgraph-level embeddings with fine-grained information. Finally, the model integrates coarse-grained interaction information between subgraphs and fine-grained comparison information between nodes in different subgraph pairs to predict the final similarity score. Our model is especially effective for structural similarity computation on large graphs, compared with other state-of-the-art graph similarity computation models, which need at least quadratic time with respect to the number of nodes in the two graphs. Our major contributions are:

  • We are the first to propose a graph partitioning based framework to address the challenging problem of similarity computation between large graphs. This framework achieves a good trade-off between accuracy and efficiency.

  • We propose a novel model that extracts and aggregates local information effectively to conduct subgraph-level comparison. This addresses the limited representation ability or high time complexity of many graph deep learning based similarity computation models.

  • We conduct extensive experiments on a very popular graph similarity/distance metric, GED, using datasets of different sizes. Results from these experiments, together with theoretical analysis, demonstrate the effectiveness and efficiency of the PSimGNN model in graph similarity computation tasks.

Figure 2. The general architecture of our model. The red arrows denote the data flow for subgraph-level interaction and the blue arrows denote the data flow for node-level comparison. After graph partitioning, only the top $m$ subgraph pairs with the highest similarity scores undergo node-level comparison.

2. Related Work

In this section, we introduce the concepts of graph partitioning, graph neural networks and graph similarity that are used in this paper.

2.1. Graph Partitioning

Graph partitioning is an effective way to reduce complexity or enable parallelization (Buluç et al., 2016). In mathematics, a graph partition is the reduction of a graph to a smaller graph by partitioning its set of nodes into mutually exclusive groups. The partitioned graph may be better suited for analysis and problem-solving than the original. Finding a partition that simplifies graph analysis is a hard problem, but one that has applications to scientific computing, VLSI circuit design, and task scheduling in multiprocessor computers, among others (Andreev and Racke, 2006). Recently, the graph partition problem has gained importance due to its applications for clustering and detection of cliques in social, pathological and biological networks (Buluç et al., 2016).

2.2. Graph Neural Networks

Graph Neural Networks (GNNs) are an effective framework for representation learning of graphs, which can directly operate on the graph structure. GNNs follow a neighborhood aggregation scheme, where the representation vector of a node is computed by recursively aggregating and transforming the representation vectors of its neighboring nodes. Many GNN variants have been proposed and have achieved state-of-the-art results on both node and graph classification tasks. However, despite GNNs revolutionizing graph representation learning, there is limited understanding of their representational properties. Studies (Xu et al., 2018) have shown that popular GNN variants (such as graph convolutional networks and GraphSAGE (Hamilton et al., 2017)) have limited discriminative power and cannot learn to distinguish certain simple graph structures. Therefore, in our experiments we use the Graph Isomorphism Network (GIN) (Xu et al., 2018), which is the most expressive model in this class of GNNs and is as powerful as the Weisfeiler-Lehman graph isomorphism test (Shervashidze et al., 2011).

2.3. GED & MCS

Graph Edit Distance (GED) (Bunke, 1983) can be considered an extension of the String Edit Distance (Levenshtein, 1966) metric; it is defined as the minimum cost required to convert one graph into another through a sequence of graph edit operations. Maximum Common Subgraph (MCS) (Bunke and Shearer, 1998) is equivalent to GED under a particular cost function (Bunke, 1997). These are the two most common ways to calculate the similarity of graphs or the distance between graphs, which is the core operation of graph similarity search and many other applications. However, this core operation, computing the GED or MCS between two graphs, is known to be NP-complete (Bunke and Shearer, 1998; Zeng et al., 2009). For a pair of graphs with more than 16 nodes, even the state-of-the-art algorithms cannot reliably compute the exact GED within reasonable time (Blumenthal and Gamper, 2018). So, instead of calculating the exact similarity, some methods find approximate values in a fast and heuristic way. However, these methods usually require complicated design, and their time complexity is still sub-exponential or polynomial in the number of nodes in the graphs. Examples include Hungarian (Kuhn, 1955; Riesen and Bunke, 2009), VJ (Fankhauser et al., 2011; Jonker and Volgenant, 1987) and Beam (Neuhaus et al., 2006).

3. The Proposed Approach: PSimGNN

In this section, we formally define the problem of graph similarity computation and then introduce the proposed method PSimGNN, i.e., Partition based Similarity Computation via Graph Neural Networks, an end-to-end neural network-based method for the graph similarity computation problem. PSimGNN consists of four parts: (1) graph partitioning; (2) subgraph-level embedding interaction; (3) node-level comparison; (4) graph similarity score computation. An overview of PSimGNN is shown in Figure 2.

3.1. Problem Definition

We define an undirected and unweighted graph $G = (V, E)$, where $V$ is a set of nodes and $E$ is a set of edges. Node features are given by $X \in \mathbb{R}^{N \times d}$, where $N = |V|$ and $d$ is the dimension of the node feature vectors. We transform GED into a similarity metric ranging between 0 and 1. Our goal is to learn a neural network based function that takes two graphs as input and outputs a similarity score that can be transformed back to GED through a one-to-one mapping.
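The section does not spell out the GED-to-similarity mapping. One common choice, used by SimGNN (Bai et al., 2019), normalizes GED by the mean number of nodes of the two graphs and then applies an exponential. The sketch below assumes that mapping; the function names are hypothetical and the mapping may differ from the one used for PSimGNN.

```python
import math


def ged_to_similarity(ged, n1, n2):
    """Map a GED value to a similarity in (0, 1].

    Assumed SimGNN-style mapping: normalize by the mean graph size,
    then apply exp(-x). n1 and n2 are the node counts of the two graphs.
    """
    normalized = ged / ((n1 + n2) / 2.0)
    return math.exp(-normalized)


def similarity_to_ged(sim, n1, n2):
    """Invert ged_to_similarity (the mapping is one-to-one for fixed sizes)."""
    return -math.log(sim) * (n1 + n2) / 2.0


score = ged_to_similarity(4, 60, 62)
recovered = similarity_to_ged(score, 60, 62)
```

Identical graphs (GED 0) map to a similarity of 1, and larger edit distances decay towards 0, so a predicted score can be converted back to a GED estimate when needed.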

Figure 3. The workflow of Fluid Communities for three communities (red, green and blue). Each node assigned to a community is labeled with the density of that community. The update rule is evaluated on each step for the node highlighted in black. Only the schematic results of the first iteration are given here.

3.2. Graph Partitioning

Most neural network based graph similarity computation models use appropriate mechanisms to generate graph-level embeddings and node-level embeddings, and calculate the similarity score between graphs by combining a coarse-grained graph-level interaction with a fine-grained node-level comparison. However, for graphs with a large number of nodes, these approaches may have several limitations:

  • A graph-level embedding alone may have limited ability to represent the whole graph; sometimes we have to pay attention to local structural characteristics.

  • Due to the large number of nodes, node-level comparison has high time complexity, and matching too many pairs of distant nodes also introduces noise.

To overcome these limitations and better reflect the local structural characteristics of large graphs, we partition each graph into subgraphs using a graph partitioning method. In our experiments, the Fluid Communities algorithm (Parés et al., 2017) shows the best performance. The graph partitioning consists of three steps:

  • Step-1: Choose $k$ nodes randomly in the graph as the initial nodes of the $k$ communities.

  • Step-2: Iterate over all nodes in a random order and update the community of each node based on its own community and the communities of its neighbours.

  • Step-3: Repeat step-2 until convergence.

The entire process is illustrated in Figure 3. At all times, each community has a total density of 1, which is equally distributed among the nodes it contains. If a node changes its community, the node densities of the affected communities are adjusted immediately. When a complete iteration over all nodes is done without any node changing the community it belongs to, the algorithm has converged and returns. Through Fluid Communities, we obtain a series of connected subgraphs (or communities) that reflect local features, so that the similarity computation at the subgraph level and node level can be performed later.
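The three steps above can be sketched in plain Python. This is a simplified illustration of the Fluid Communities update rule under stated assumptions, not the implementation used in the experiments: the graph is assumed to be connected and given as a hypothetical adjacency dictionary, and ties are broken in favour of the node's current community.

```python
import random
from collections import defaultdict


def fluid_communities(adj, k, max_iter=100, seed=0):
    """Partition a connected undirected graph into k communities.

    adj maps each node to the set of its neighbours (assumed input format).
    """
    rng = random.Random(seed)
    nodes = list(adj)
    community = {}            # node -> community id
    size = defaultdict(int)   # community id -> number of member nodes
    # Step 1: k random nodes seed the k communities.
    for cid, v in enumerate(rng.sample(nodes, k)):
        community[v] = cid
        size[cid] = 1

    for _ in range(max_iter):
        changed = False
        rng.shuffle(nodes)    # Step 2: visit nodes in random order.
        for v in nodes:
            # Each community has total density 1, shared equally by its
            # members; sum the densities seen on v and its neighbours.
            score = defaultdict(float)
            for u in [v, *adj[v]]:
                if u in community:
                    score[community[u]] += 1.0 / size[community[u]]
            if not score:
                continue      # no community has reached v yet
            best = max(score.values())
            candidates = [c for c, s in score.items() if best - s < 1e-10]
            if community.get(v) in candidates:
                continue      # current community is still a best choice
            new = rng.choice(candidates)
            if v in community:
                size[community[v]] -= 1
            community[v] = new
            size[new] += 1
            changed = True
        if not changed:       # Step 3: stop once a full pass changes nothing.
            break

    groups = defaultdict(set)
    for v, cid in community.items():
        groups[cid].add(v)
    return list(groups.values())


# Usage: a ring of 12 nodes split into 3 communities.
ring = {i: {(i - 1) % 12, (i + 1) % 12} for i in range(12)}
parts = fluid_communities(ring, 3)
```

Because a community's densities always sum to 1, a node that is alone in its community can never be outvoted, so no community disappears during the updates.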

3.3. Subgraph-Level Embedding Interaction

A good graph-level embedding efficiently preserves the structural information of a graph, and the similarity between two graphs can be computed from the interaction of the two graph-level embeddings. For graphs with a large number of nodes, comparing the subgraphs generated by a graph partitioning method (such as Fluid Communities in our experiments) better reflects the local similarity between two large graphs. The entire process involves the following three parts: (1) Subgraph node embedding, which embeds the nodes of each subgraph into vectors that encode their structural information; (2) Subgraph embedding, which embeds each subgraph into one graph-level vector, taking context information into account through attention-based node aggregation; (3) Subgraph-subgraph interaction, which receives two subgraph-level embeddings and returns an interaction score representing the similarity between the subgraphs. Next, these subgraph interaction scores are further reduced to a final similarity score through a Multilayer Perceptron; this score represents the similarity of the pair of large graphs. The parameters involved in these three steps are updated during training by comparing the final similarity score with the ground truth similarity score.

3.3.1. Part I: Subgraph Node Embedding

Among the existing graph neural network methods, we choose the Graph Isomorphism Network (GIN) (Xu et al., 2018) because it can not only efficiently gather information from neighboring nodes like Graph Convolutional Networks (GCN) (Kipf and Welling, 2016; Defferrard et al., 2016) and GraphSAGE (Hamilton et al., 2017), but also learn accurate structural information. The hidden layer can be written as follows:

$$h_v^{(k)} = \mathrm{MLP}^{(k)}\Big(\big(1 + \epsilon^{(k)}\big)\, h_v^{(k-1)} + \sum_{u \in \mathcal{N}(v)} h_u^{(k-1)}\Big)$$

where $h_v^{(k)}$ is the $k$-th layer node embedding for node $v$, $\mathrm{MLP}$ denotes a Multilayer Perceptron (Pal and Mitra, 1992), $\epsilon^{(k)}$ is a learnable parameter and $\mathcal{N}(v)$ represents the neighbor nodes of node $v$.

For graphs with unlabeled nodes, we assign every node the same label, so that all nodes start from the same constant initial representation. After multiple GIN layers (3 layers in our experiments), the node-level embeddings are fed into the attention module described below.
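As an illustration of the GIN update on an unlabeled graph, the following minimal sketch applies one layer to a three-node path. The `toy_mlp` function is a hypothetical stand-in for a learned MLP, and `eps` is fixed here although it is learnable in the model.

```python
def gin_layer(h, adj, mlp, eps=0.0):
    """One GIN update: h_v <- MLP((1 + eps) * h_v + sum of neighbour embeddings).

    h maps node -> embedding (list of floats); adj maps node -> neighbours.
    """
    out = {}
    for v, hv in h.items():
        agg = [(1.0 + eps) * x for x in hv]
        for u in adj[v]:
            agg = [a + b for a, b in zip(agg, h[u])]
        out[v] = mlp(agg)
    return out


def toy_mlp(x):
    # Stand-in for a learned MLP: a fixed affine map followed by ReLU.
    return [max(0.0, 0.5 * xi + 0.1) for xi in x]


# Unlabeled path graph 0 - 1 - 2: every node starts from the same constant.
adj = {0: [1], 1: [0, 2], 2: [1]}
h0 = {v: [1.0] for v in adj}
h1 = gin_layer(h0, adj, toy_mlp)
```

Even though all nodes start identical, the sum aggregation already separates the middle node (two neighbours) from the two end nodes (one neighbour each) after a single layer.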

3.3.2. Part II: Subgraph Embedding

In this model, we use an attention mechanism to generate each subgraph-level embedding as a weighted sum of node embeddings. Instead of averaging the embeddings of all nodes, or weighting each node according to its degree, our attention module focuses more on the nodes that better represent the structural information of the whole subgraph.

After learning the node-level embeddings, the node embeddings in a subgraph can be expressed as $U \in \mathbb{R}^{N \times D}$, where $N$ is the number of nodes in the subgraph, $D$ is the dimension of each node embedding, and the $n$-th row $u_n$ is the embedding of node $n$. The representation of the whole subgraph, the global context $c$, is a non-linear transformation of the average node embedding: $c = \tanh\big(\frac{1}{N} W \sum_{n=1}^{N} u_n\big)$, where $W \in \mathbb{R}^{D \times D}$ is a learnable weight matrix that adjusts the focus of the attention mechanism according to the specific similarity computation task.

Through learning the weight matrix, $c$ provides the global structure and feature information of the subgraph suited to a given similarity measure. Based on $c$, we can calculate an attention weight for each node. For node $n$, in order to make it aware of the global information, we take the inner product between its node embedding $u_n$ and $c$. That is to say, nodes that better express the features of the subgraph are given higher weights. The sigmoid function $\sigma$ is applied to the result to ensure that the attention weight lies in $(0, 1)$. Finally, the subgraph embedding $h$ is the weighted sum of the node embeddings:

$$h = \sum_{n=1}^{N} \sigma\big(u_n^{\top} c\big)\, u_n$$

where $u_n^{\top} c$ is the dot product between the two vectors.
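The attention-based aggregation can be sketched as follows. This is a minimal illustration that assumes a tanh non-linearity for the global context (as in SimGNN) and uses a fixed identity matrix in place of the learned weight matrix; `attention_pool` is a hypothetical name.

```python
import math


def attention_pool(U, W):
    """Attention-weighted sum of node embeddings.

    U: list of N node embeddings, each a list of D floats.
    W: D x D weight matrix (learned in the model, fixed in this sketch).
    Returns the subgraph embedding and the per-node attention weights.
    """
    n, d = len(U), len(U[0])
    mean = [sum(u[k] for u in U) / n for k in range(d)]
    # Global context: non-linear transform of the average node embedding.
    c = [math.tanh(sum(W[r][k] * mean[k] for k in range(d))) for r in range(d)]
    # Attention weight of each node: sigmoid of its inner product with c.
    weights = [
        1.0 / (1.0 + math.exp(-sum(x * y for x, y in zip(u, c)))) for u in U
    ]
    h = [sum(a * u[k] for a, u in zip(weights, U)) for k in range(d)]
    return h, weights


U = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
h, weights = attention_pool(U, W)
```

In this toy input the third node is the most aligned with the average embedding, so it receives the largest attention weight.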

3.3.3. Part III: Subgraph-subgraph Interaction

Through the node embedding and attention mechanism described above, we obtain the subgraph-level embeddings. Good node embedding and attention mechanisms should embed subgraphs with similar structures and similar features at nearby positions in the embedding space, so the distance between them should be relatively small. Here we use the $L_2$ distance to measure the similarity between a pair of subgraph embeddings $h_i$ and $h_j$:

$$s_{ij} = \lVert h_i - h_j \rVert_2$$

where $\lVert h_i - h_j \rVert_2$ is the 2-norm of $h_i - h_j$.

Each graph in a pair of large graphs $G_1$, $G_2$ is partitioned into $K$ subgraphs, and the similarity between every subgraph of $G_1$ and every subgraph of $G_2$ is calculated using the method described above. This yields $K^2$ similarity scores, and a Multilayer Perceptron ($\mathrm{MLP}$) maps these scores to the final similarity score that characterizes the similarity between the pair of large graphs:

$$s(G_1, G_2) = \mathrm{MLP}\Big(\big\Vert_{i=1}^{K} \big\Vert_{j=1}^{K}\; s_{ij}\Big)$$

where $\Vert$ is the concatenation operation, $s(G_1, G_2)$ represents the similarity score between the pair of large graphs and $s_{ij}$ represents the similarity score between the $i$-th subgraph of $G_1$ and the $j$-th subgraph of $G_2$.
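To make the coarse-grained step concrete, the sketch below computes the pairwise distances between two sets of subgraph embeddings and flattens them into the vector that would be fed to the MLP. The helper names are hypothetical and the learned MLP itself is omitted.

```python
import math


def pairwise_l2(H1, H2):
    """L2 distance between every subgraph embedding of G1 and of G2.

    H1, H2: lists of subgraph embeddings (lists of floats of equal length).
    Returns a len(H1) x len(H2) matrix as nested lists.
    """
    return [[math.dist(a, b) for b in H2] for a in H1]


def flatten(matrix):
    """Concatenate the rows into one vector (the assumed MLP input)."""
    return [x for row in matrix for x in row]


H1 = [[0.0, 0.0], [3.0, 4.0]]
H2 = [[0.0, 0.0], [3.0, 4.0]]
scores = flatten(pairwise_l2(H1, H2))
```

With two subgraphs per graph this produces four scores; matching subgraphs sit at distance 0 and the mismatched pairs at distance 5.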

3.4. Node-Level Comparison

Figure 4. Propagation layer in node-level comparison. Each round of iteration is based on the embeddings of the previous round and the node-level comparison within each graph and between graphs.

Only considering subgraph-level embedding interaction may lose some fine-grained node-level information, so we design the following node-level comparison component utilizing the interaction between nodes in subgraphs.

This component accepts a pair of subgraphs as its input and calculates the similarity between them by comparing nodes within each subgraph and between the two subgraphs. An overview of the interaction is shown in Figure 4, where $h^{(t)}$ represents the node embeddings in each subgraph after the $t$-th propagation layer. We assume that the input subgraph pair is $G_1$, $G_2$, with node sets $V_1$, $V_2$ and edge sets $E_1$, $E_2$, respectively. After $t$ iterations within and between the subgraphs, the embedding of node $i$ is $h_i^{(t)}$. In each interaction within a subgraph, the influence of node $j$ on node $i$ is:

$$m_{j \to i} = f_{\mathrm{message}}\big(h_i^{(t)}, h_j^{(t)}\big), \quad \forall (i, j) \in E_1 \cup E_2$$

An attention mechanism gives different weights to the nodes $j$ of the other subgraph to indicate their importance to node $i$, where $s_h$ is a similarity function between embeddings:

$$a_{j \to i} = \frac{\exp\big(s_h(h_i^{(t)}, h_j^{(t)})\big)}{\sum_{j'} \exp\big(s_h(h_i^{(t)}, h_{j'}^{(t)})\big)}$$

Through this attention mechanism, we magnify the influence between similar nodes in a pair of subgraphs, and use $\mu_{j \to i}$ to represent the interaction between node $i$ and node $j$ in different subgraphs:

$$\mu_{j \to i} = a_{j \to i}\big(h_i^{(t)} - h_j^{(t)}\big), \quad \forall i \in V_1, j \in V_2 \ \text{or}\ i \in V_2, j \in V_1$$

After obtaining the interaction information within each subgraph and between the pair of subgraphs, we merge the $t$-th round node information with it to generate the $(t+1)$-th round node information:

$$h_i^{(t+1)} = f_{\mathrm{node}}\Big(h_i^{(t)}, \sum_{j} m_{j \to i}, \sum_{j'} \mu_{j' \to i}\Big)$$

After $T$ rounds of iteration, we get the embedding of each node, denoted as $h_i^{(T)}$, and then, through a self-attention aggregation layer $f_G$, a subgraph-level embedding:

$$h_G = f_G\big(\{h_i^{(T)}\}_{i \in V}\big)$$

After obtaining the fine-grained embedding of each subgraph, we use the $L_2$ distance to measure the similarity of the pair of subgraphs, which is expressed as:

$$s^{f}(G_1, G_2) = \lVert h_{G_1} - h_{G_2} \rVert_2$$
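The cross-subgraph part of this comparison can be illustrated with a short sketch in the style of GMN (Li et al., 2019). It assumes a dot-product similarity inside a softmax attention and returns, for each node of the first subgraph, the attention-weighted difference to the nodes of the second subgraph; the function name and the choice of similarity are assumptions, and the within-subgraph messages and learned update functions are omitted.

```python
import math


def cross_attention(H1, H2):
    """For each node i in H1, compute sum_j a_ji * (h_i - h_j) over j in H2.

    H1, H2: lists of node embeddings (lists of floats of equal length).
    The attention a_ji is a softmax over dot products (assumed similarity).
    """
    out = []
    for hi in H1:
        logits = [sum(x * y for x, y in zip(hi, hj)) for hj in H2]
        top = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(l - top) for l in logits]
        total = sum(exps)
        att = [e / total for e in exps]
        # Since the weights sum to 1, the result equals h_i minus the
        # attention-weighted average of the other subgraph's embeddings.
        attended = [
            sum(a * hj[k] for a, hj in zip(att, H2)) for k in range(len(hi))
        ]
        out.append([x - y for x, y in zip(hi, attended)])
    return out


same = cross_attention([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]])
different = cross_attention([[1.0, 0.0]], [[0.0, 1.0]])
```

When a node finds an identical counterpart in the other subgraph the difference vanishes, whereas a mismatch leaves a non-zero signal that later layers can use.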


3.5. Graph Similarity Score Computation

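It is worth mentioning that, through the previous graph partitioning, each large graph is partitioned into $K$ subgraphs, so there are $K^2$ pairs of subgraphs. Here, we sort the subgraph-level similarities obtained before, and only the $m$ pairs with the highest similarity scores undergo node-level comparison. We use an $\mathrm{MLP}$ to integrate the coarse-grained scores and the fine-grained scores and obtain the final similarity between the large graphs:

$$\hat{s}(G_1, G_2) = \mathrm{MLP}\big(s^{c} \,\Vert\, s^{f}\big)$$

where $s^{c}$ collects the $K^2$ coarse-grained subgraph-level scores, $s^{f}$ collects the $m$ fine-grained node-level scores, and $\Vert$ is the concatenation operation.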
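After the similarity score, $\hat{s}_{ij}$, is predicted for a graph pair $(G_i, G_j)$, it is compared with the ground truth similarity score, $s(G_i, G_j)$, using the following mean squared error loss function:

$$\mathcal{L} = \frac{1}{|\mathcal{D}|} \sum_{(i, j) \in \mathcal{D}} \big(\hat{s}_{ij} - s(G_i, G_j)\big)^2$$

where $\mathcal{D}$ is the set of training graph pairs and $|\mathcal{D}|$ is the total number of training graph pairs.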
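The pair selection and the training loss can be sketched as below. The sketch assumes that "highest similarity" corresponds to the smallest embedding distance; `top_m_pairs` and `mse_loss` are hypothetical helper names, and the learned MLP that merges the scores is not shown.

```python
def top_m_pairs(dist, m):
    """Indices (i, j) of the m subgraph pairs with the smallest distance.

    dist: K x K matrix (nested lists) of subgraph-level distances.
    Smaller distance is treated as higher similarity (an assumption).
    """
    pairs = [(d, i, j) for i, row in enumerate(dist) for j, d in enumerate(row)]
    pairs.sort()
    return [(i, j) for _, i, j in pairs[:m]]


def mse_loss(predicted, target):
    """Mean squared error between predicted and ground-truth similarities."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


dist = [[0.5, 2.0], [1.0, 0.1]]
selected = top_m_pairs(dist, 2)      # pairs sent to node-level comparison
loss = mse_loss([0.5, 1.0], [0.0, 1.0])
```

Setting `m` to 0 selects no pair, which reduces the model to the purely coarse-grained variant.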

3.6. Time Complexity Analysis

For a pair of input graphs $G_1$ and $G_2$ with $E_1$, $E_2$ edges and $N_1$, $N_2$ nodes, respectively, we first evaluate the time complexity of several types of models that are commonly used in graph similarity computation. Then we analyze the time complexity of PSimGNN and discuss how graph partitioning reduces it. Note that, because each type of model has many variants, we only consider the simplest cases.

3.6.1. Embedding models

An embedding model calculates the similarity between graphs by generating graph-level embeddings. Assuming the simplest case here, we visit every edge only once and perform two computational operations on the two nodes it connects, which contributes to the local topology features. Thus the computational complexity for these cases is $O(E_1 + E_2)$.

3.6.2. Matching models

A matching model calculates the similarity between graphs by matching (graph-level interaction or node-level comparison). Assuming the simplest case here, we compute the relationship across $G_1$ and $G_2$. This part involves $O(N_1 N_2)$ computational operations because we have to calculate the connection between every node in $G_1$ and all nodes in $G_2$. Among the common matching models, both SimGNN and GSimCNN pad fake nodes to the smaller graph in the node-level comparison to emphasize the size difference, and GMN also has interaction between nodes within each graph, so the final time complexity is $O(\max(N_1, N_2)^2)$.
3.6.3. PSimGNN

The time complexity of PSimGNN can be analyzed in three parts.

(1) Graph Partitioning. In our model, we choose Fluid Communities as the graph partitioning method. As analyzed in Section 3.2, it updates node information based on neighboring nodes, i.e., along the edges incident to each node, so it belongs to the fastest and most scalable family of algorithms in the literature, with a linear computational complexity of $O(E)$ (Parés et al., 2017). Note that the partitioned subgraphs can be pre-computed and stored, and in the setting of graph similarity search, the unseen query graph only needs to be partitioned once to obtain its subgraphs.

(2) Subgraph-level Embedding Interaction. The time complexity associated with the generation of node-level and subgraph-level embeddings is $O(E)$ (Kipf and Welling, 2016). Assuming that each graph is partitioned into $K$ subgraphs and the embedding dimension at the subgraph level is $D$, we use the $L_2$ distance to measure the similarity between embeddings, so the time complexity of the subgraph interaction part is $O(DK^2)$. As mentioned above, the subgraph-level and node-level embeddings can also be saved in advance, which greatly reduces the time of a graph similarity query.

(3) Node-level Comparison. According to the similarity scores obtained by subgraph-level interaction, we select the top $m$ subgraph pairs with the highest similarity scores for node-level comparison. After partitioning, the average number of nodes in each subgraph is $N_1/K$ or $N_2/K$. As analyzed in Section 3.6.2, the average node-level comparison time complexity of one pair of subgraphs is $O(\max(N_1, N_2)^2 / K^2)$. Since we choose $m$ pairs, the total time complexity of this part is $O(m \max(N_1, N_2)^2 / K^2)$, or $O(m N^2 / K^2)$ with $N = \max(N_1, N_2)$, where $m \in [0, K^2]$.

The parameter $m$ can be used as a hyperparameter to adjust the trade-off between accuracy and time. When $m$ is set to zero, our model only calculates the coarse-grained subgraph similarity, and the time complexity of the model is $O(E)$, where $E$ is the number of edges in the large graph. When $m$ is set to $K^2$, our model performs fine-grained similarity calculation for every pair of subgraphs, and the time complexity of the model is $O(N^2)$, where $N$ is the number of nodes in the large graph. For time-critical applications, we can perform only coarse-grained matching between subgraphs; for applications with relatively high accuracy requirements, we can perform fine-grained node-level comparison to improve model performance. Therefore, the trade-off between time and accuracy can be chosen according to the specific application scenario.
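A small back-of-the-envelope calculation shows how the number of selected pairs controls the node-level cost. The sketch assumes that each graph is split into equally sized subgraphs and counts only node-to-node comparisons; the function name is hypothetical.

```python
def node_comparisons(n1, n2, k, m):
    """Approximate number of node-to-node comparisons for m subgraph pairs.

    Assumes each graph with n1 (n2) nodes is split into k equal subgraphs,
    so one subgraph pair costs (n1 / k) * (n2 / k) comparisons.
    """
    if not 0 <= m <= k * k:
        raise ValueError("m must lie in [0, k^2]")
    return m * (n1 / k) * (n2 / k)


coarse_only = node_comparisons(200, 200, 10, 0)     # 0.0
a_few_pairs = node_comparisons(200, 200, 10, 5)     # 2000.0
all_pairs = node_comparisons(200, 200, 10, 100)     # 40000.0, i.e. 200 * 200
```

For two graphs of 200 nodes split into 10 subgraphs each, comparing 5 pairs costs about 2,000 node comparisons, against 40,000 when every pair is compared, which matches the full quadratic cost.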

3.6.4. Similarity search

In the similarity search problem, we assume for simplicity that we have a database consisting of $S$ graphs, each of which has $N$ nodes and $E$ edges. The task is to compute the similarity between every graph in the database and an incoming new graph (also with $N$ nodes and $E$ edges). With embedding models, we can compute the feature vectors of all graphs in the database in advance. When the new graph arrives, we encode it into its feature vector and compute the similarity from the feature vectors only, so the time consumption is $O(E + S)$. With matching models, we must forward one pair of graphs at a time because of the computation across graphs, so the time consumption is very high, $O(S N^2)$. Combining the parts analyzed in Section 3.6.3, the time complexity of our framework is $O\big(E + S(DK^2 + mN^2/K^2)\big)$. When $m$ is small, the time complexity approaches $O(E + S D K^2)$; when $m$ is large, it approaches $O(E + S N^2)$. This also reflects the adjustability of our model. It is worth mentioning that our model is not suitable for very dense graphs, because it is difficult to obtain subgraphs that reflect local information well. So in our discussion we assume sparse graphs, i.e., $E \ll N^2$.

| Dataset | Graph Meaning | #Graphs | #Pairs | Min #Nodes | Max #Nodes | Avg #Nodes | Min #Edges | Max #Edges | Avg #Edges |
|---|---|---|---|---|---|---|---|---|---|
| AIDS | Chemical Compounds | 700 | 490K | 2 | 10 | 8.90 | 1 | 14 | 8.80 |
| LINUX | Program Dependency Graphs | 1000 | 1M | 4 | 10 | 7.58 | 3 | 13 | 6.94 |
| IMDB | Actor/Actress Ego-Networks | 1500 | 2.25M | 7 | 89 | 13.00 | 12 | 1467 | 65.95 |
| BA-60 | Barabási–Albert graph with 60 nodes | 200 | 40K | 54 | 65 | 59.50 | 54 | 66 | 60.06 |
| BA-100 | Barabási–Albert graph with 100 nodes | 200 | 40K | 96 | 105 | 100.01 | 96 | 107 | 100.56 |
| BA-200 | Barabási–Albert graph with 200 nodes | 200 | 40K | 192 | 205 | 199.63 | 193 | 206 | 200.16 |

Table 1. Statistics of datasets.
Figure 5. Node degree distribution of the BA-model datasets: (a) BA-60, (b) BA-100, (c) BA-200.

4. Experiments

4.1. Datasets

In this section, we first introduce a graph similarity computation dataset based on the Barabási–Albert preferential attachment model (BA-model) (Jeong et al., 2003), which consists of three sub-datasets, BA-60, BA-100 and BA-200, named according to the average number of nodes per graph. We then compare our released BA-model dataset with other well-known datasets used for graph similarity computation.

4.1.1. Barabási–Albert model dataset

Here we introduce the concept of the Barabási–Albert model (BA-model), the rules for generating a Barabási–Albert graph (BA-graph), and how our datasets are produced. The BA-model (Jeong et al., 2003) is an algorithm for generating random scale-free networks using a preferential attachment mechanism. Several natural and human-made systems, including the Internet, the world wide web, citation networks, and some social networks, are thought to be approximately scale-free and contain a few nodes (called hubs) with unusually high degree compared to the other nodes of the network. The BA-model tries to explain the existence of such nodes in real networks, and it incorporates two important general concepts that exist widely in real networks: growth and preferential attachment. Growth means that the number of nodes in the network increases over time, and preferential attachment means that the more connected a node is, the more likely it is to receive new links. Nodes with a higher degree have a stronger ability to grab links added to the network.

The BA-model begins with an initial connected network of $m_0$ nodes. New nodes are added to the network one at a time. Each new node is connected to $m \le m_0$ existing nodes with a probability that is proportional to the number of links that the existing nodes already have. Formally, the probability $p_i$ that the new node is connected to node $i$ is $p_i = k_i / \sum_j k_j$ (Albert and Barabási, 2002), where $k_i$ is the degree of node $i$ and the sum is taken over all pre-existing nodes $j$ (i.e., the denominator is twice the current number of edges in the network). Heavily linked nodes ("hubs") tend to quickly accumulate even more links, while nodes with only a few links are unlikely to be chosen as the destination for a new link. The new nodes have a "preference" to attach themselves to the already heavily linked nodes.
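The growth and preferential attachment rules can be sketched with a short generator. This is a simplified illustration, not the code used to build the datasets: it assumes the initial network is a star on `m + 1` nodes and draws distinct targets by rejection sampling; `barabasi_albert` is a hypothetical helper name.

```python
import random


def barabasi_albert(n, m, seed=0):
    """Return the edge list of a BA-style graph with n nodes.

    Each new node attaches to m distinct existing nodes, chosen with
    probability proportional to their current degree.
    """
    rng = random.Random(seed)
    # Assumed initial connected network: a star with centre 0 and m leaves.
    edges = [(0, v) for v in range(1, m + 1)]
    # Every node appears in `stubs` once per incident edge, so a uniform
    # draw from it selects node i with probability k_i / sum_j k_j.
    stubs = [x for e in edges for x in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in targets:
            edges.append((new, t))
            stubs.extend((new, t))
    return edges


edges = barabasi_albert(60, 1)
```

With one edge per new node the result is a tree with 59 edges on 60 nodes, so the number of edges stays close to the number of nodes, as in the sparse BA-60 graphs.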

Our datasets are made up of basic graphs and derivative graphs obtained by trimming them, which addresses several problems:

  • When graphs with a large number of nodes are generated randomly, there is a high probability that the generated graphs are dissimilar to each other, which results in an uneven similarity distribution after normalization.

  • Due to the large number of nodes in each graph, an approximate GED algorithm cannot guarantee that the calculated similarity fully reflects the similarity of the graph pair. We therefore record the number of trimming steps while generating each derivative graph, and take the minimum of this number and the value calculated by the approximation algorithm as the GED to the basic graph, thereby obtaining a more accurate similarity.

  • By trimming for different numbers of steps, we can generate graphs with different similarities, which is more conducive to graph similarity query experiments.

There are three types of trimming methods here: delete a leaf node, add a node and add an edge. Since deleting an edge may have a greater impact on the result of the generated graph, we will not consider this method. We try to trim the base graph without changing the global features of the base graph to generate more similar graph pairs. In this way, we get three datasets according to the following generation rule.

A BA-graph of $n$ nodes is grown by attaching new nodes, each with $m$ edges that are preferentially attached to existing nodes with high degree. We set $n$ to 60, 100, and 200, respectively, with $m$ fixed to 1, to generate the basic graphs. Each sub-dataset contains two basic graphs, and each basic graph is trimmed to different GEDs: for each basic graph, we generate 99 trimmed graphs with GED in the range of 1 to 10. So each sub-dataset consists of two basic graphs and 198 trimmed graphs.

4.1.2. Comparison with Other Datasets

In public datasets such as AIDS (Liang and Zhao, 2017) and LINUX (Wang et al., 2012), the number of nodes in each graph is relatively small and local structures are not obvious, so the characteristics of the entire graph can be easily represented. IMDB (Yanardag and Vishwanathan, 2015) (named "IMDB-MULTI") contains some graphs with a large number of nodes. However, these graphs are relatively dense, and the many edges between nodes make the local structures less obvious. We therefore did not choose these three datasets.

In view of this, we synthesized three BA-datasets, whose graphs not only have a large number of nodes but also exhibit obvious local structures thanks to the characteristics of the BA-model. Table 1 compares the different datasets for graph similarity computation, and Figure 5 shows the node degree distribution of the BA-model datasets. From these, we can see that the ratio of the average number of nodes to the average number of edges in the BA-model datasets is approximately 1. The graphs are relatively sparse and therefore suitable for extracting local structural features by graph partitioning. The degree distribution indicates that most nodes have relatively low degrees and only a few have high degrees; these high-degree nodes have a greater probability of becoming the center node of a subgraph. The partitioning results of two graphs in the BA-60 dataset are shown in Figure 6. Through graph partitioning, obvious local structural features can be extracted, which is also a characteristic of our BA-model datasets.

Figure 6. Two examples of graph partition from the BA-60 dataset: (a) 20604gexf and (b) 20806gexf. Different colors represent different subgraphs.

Figure 7. Generating a derivative graph with a GED of 5 from the basic graph. Real trimming does not need to use all three methods; only one possible case is shown here.

4.2. Data Preprocessing

For each dataset, we randomly split 60%, 20% and 20% of all graphs as training set, validation set and test set respectively.

Due to the large number of nodes in our datasets, the A* algorithm (Riesen et al., 2013) cannot be used to calculate the exact GED. Instead, we use the smallest distance calculated by three well-known approximation algorithms: Hungarian (Kuhn, 1955; Riesen and Bunke, 2009), VJ (Fankhauser et al., 2011; Jonker and Volgenant, 1987) and Beam (Neuhaus et al., 2006). However, these algorithms also struggle to guarantee accuracy on graphs of this size, so we additionally use the GED value generated when trimming the graph as another indicator. Each time a graph is trimmed, we obtain a GED value that records the trimming steps. In this experiment, whenever a leaf node is deleted, the edge connecting it is deleted as well, so the GED between the derived graph and the basic graph increases by 2; whenever an edge is added, the GED increases by 1.

As shown in Figure 7, when generating a derivative graph with a specific GED from the basic graph, we randomly select among the above three methods (deleting a leaf node, adding a leaf node or adding an edge) to generate a sequence of operations whose accumulated GED equals the target GED value. We take the minimum of the trimming GED and the GEDs calculated by the three approximation algorithms as the final GED value. The minimum is taken instead of the average because each of these values is an upper bound on the true GED: the real GED must be less than or equal to every value generated here.
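The trimming procedure and the final GED rule might look roughly as follows. This is a hedged sketch, not the authors' code: the names `trim`, `final_ged` and `COST` are hypothetical, graphs are plain adjacency dictionaries with integer node ids, and the cost of 2 for adding a leaf node (one node plus one edge) is an assumption, since the text only states the costs for deleting a leaf and adding an edge.

```python
import random

# Edit cost per trimming operation. Deleting a leaf costs 2 (node + edge) and
# adding an edge costs 1, as stated in the text. The cost of 2 for adding a
# leaf is an assumption made for this sketch.
COST = {"delete_leaf": 2, "add_leaf": 2, "add_edge": 1}


def trim(adj, target_ged, seed=None):
    """Apply random trimming operations whose costs sum to target_ged.

    adj maps each integer node id to the set of its neighbours.
    Returns the trimmed adjacency and the list of applied operations.
    """
    rng = random.Random(seed)
    g = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    ops, spent = [], 0
    next_id = max(g) + 1
    while spent < target_ged:
        remaining = target_ged - spent
        leaves = [u for u in g if len(g[u]) == 1]
        non_edges = [(u, v) for u in g for v in g if u < v and v not in g[u]]
        feasible = []
        if non_edges:
            feasible.append("add_edge")
        if remaining >= 2:
            feasible.append("add_leaf")
            if leaves and len(g) > 2:
                feasible.append("delete_leaf")
        if not feasible:
            raise ValueError("no trimming operation fits the remaining budget")
        op = rng.choice(feasible)
        if op == "add_edge":
            u, v = rng.choice(non_edges)
            g[u].add(v)
            g[v].add(u)
        elif op == "add_leaf":
            parent = rng.choice(sorted(g))
            g[next_id] = {parent}
            g[parent].add(next_id)
            next_id += 1
        else:  # delete_leaf: remove the node together with its single edge
            leaf = rng.choice(leaves)
            (parent,) = g.pop(leaf)
            g[parent].discard(leaf)
        ops.append(op)
        spent += COST[op]
    return g, ops


def final_ged(trim_ged, approx_geds):
    """Tightest available upper bound on the true GED."""
    return min([trim_ged, *approx_geds])
```

Because random operations can partly cancel each other (for example, adding a leaf and later deleting it), the accumulated trimming cost is only an upper bound on the true GED, which is why it is combined with the approximate values through a minimum.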

In order to convert the calculated GED into the similarity score required by our model, we first normalize the GED by $\mathrm{nGED}(G_1, G_2) = \frac{\mathrm{GED}(G_1, G_2)}{(|G_1| + |G_2|)/2}$, where $|G_i|$ represents the total number of nodes in graph $G_i$. We then use the exponential function $\lambda(x) = e^{-x}$ to map the normalized GED into the range (0, 1], which represents the similarity of the graph pair. The more similar the two graphs, the smaller the GED and the closer the similarity is to 1.
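As a small worked example of this conversion, the sketch below assumes that the GED is divided by the average node count of the two graphs (the convention used by SimGNN) and that the exponential mapping is exp(-x). The function name `ged_to_similarity` is hypothetical.

```python
import math


def ged_to_similarity(ged, n1, n2):
    """Map a GED value to a similarity score in (0, 1].

    Assumption: the GED is normalized by the average number of nodes of the
    two graphs and then mapped through exp(-x).
    """
    if n1 + n2 <= 0:
        raise ValueError("graphs must contain at least one node in total")
    nged = ged / ((n1 + n2) / 2)
    return math.exp(-nged)


if __name__ == "__main__":
    # Two 60-node graphs at GED 1 and GED 10 (the range used for trimming).
    print(ged_to_similarity(1, 60, 60))
    print(ged_to_similarity(10, 60, 60))
```

Identical graphs (GED 0) map to a similarity of exactly 1, and the score decreases monotonically as the GED grows.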

4.3. Baseline Methods

Our baselines include three categories of methods: fast approximate GED calculation algorithms, graph embedding based models and graph matching network based models.


  • The first category of baselines includes three classic algorithms for GED calculation. (1) Hungarian (Kuhn, 1955; Riesen and Bunke, 2009) is a cubic-time algorithm based on the Hungarian Algorithm for bipartite graph matching. (2) VJ (Fankhauser et al., 2011; Jonker and Volgenant, 1987) is also a cubic-time algorithm, based on the algorithm of Volgenant and Jonker. (3) Beam search (Beam) (Neuhaus et al., 2006) is a variant of the A* algorithm that runs in sub-exponential time.

  • The second category of baselines includes two graph embedding based models, GCN-Mean and GCN-Max. Both embed graphs into vectors using GCN, and then use the similarity computed from these vectors as the similarity of the graph pair.

  • The third category of baselines includes three graph matching network based models. (1) SimGNN (Bai et al., 2019) and (2) GSimCNN (Bai et al., 2018) combine the embedding of the whole graph with node-level comparison. (3) GMN (Li et al., 2019) compares node information within and between graphs to calculate similarity.

Our method also belongs to the third category of methods, using graph matching based networks to calculate the similarities of graph pairs.

4.4. Parameter Settings

For the architecture of our model, PSimGNN, we partition each large graph into a fixed number of subgraphs (3 here), which gives 9 subgraph pairs for each graph pair. Among these 9 subgraph pairs, the 0, 3 or 9 pairs with the highest similarity scores are selected for node-level comparison. We call the resulting variants PSimGNN-up (only subgraph-level interactions are involved in the computation), PSimGNN-1/3 (one third, i.e. 3, of the subgraph pairs participate in the node-level comparison) and PSimGNN (all 9 subgraph pairs participate in the node-level comparison).
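The pair-selection step can be illustrated with a short sketch. It assumes the subgraph-level similarity scores are already available as a 3 x 3 matrix; the function name `select_top_pairs` and the example scores are hypothetical.

```python
def select_top_pairs(scores, k):
    """Return the k (i, j) index pairs with the highest subgraph-level scores.

    scores[i][j] is assumed to hold the similarity between subgraph i of the
    first graph and subgraph j of the second graph.
    """
    pairs = [(i, j) for i in range(len(scores)) for j in range(len(scores[i]))]
    if not 0 <= k <= len(pairs):
        raise ValueError("k must lie between 0 and the number of pairs")
    # Stable sort: ties keep their original row-major order.
    pairs.sort(key=lambda p: scores[p[0]][p[1]], reverse=True)
    return pairs[:k]


if __name__ == "__main__":
    scores = [[0.9, 0.1, 0.2],
              [0.3, 0.8, 0.4],
              [0.5, 0.6, 0.7]]
    for k in (0, 3, 9):  # no pairs, one third of the pairs, all pairs
        print(k, select_top_pairs(scores, k))
```

Setting k to 0, 3 or 9 corresponds to the three variants described above: no node-level comparison, comparison on the three most similar pairs, and comparison on all pairs.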

We set the number of GIN (Xu et al., 2018) layers to 3, and use the Parametric Rectified Linear Unit (PReLU) (He et al., 2015) as the activation function. For the initial node representations, we adopt the constant encoding scheme for the BA-datasets, since their nodes are unlabeled, as mentioned in Section 3.2.1. The output dimensions of the 1st, 2nd and 3rd GIN layers are 64, 32 and 16, respectively. We use a fully connected layer to reduce the dimension of the similarity vector obtained from the subgraph-level interaction from 9 to 8, and another fully connected layer to change the dimension of the similarity vector obtained from the node-level comparison from 3 to 8. Finally, 4 fully connected layers are used to reduce the dimension of the concatenated results of the subgraph-level interaction and the node-level comparison modules from 16 to 8, 8 to 4, 4 to 2, and 2 to 1.
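To make the dimensions of the fully connected layers concrete, the sketch below traces a forward pass through the fusion part only, using plain Python lists and untrained random weights. It is not the authors' implementation: the class name `FusionHead`, the weight initialization, the placement of the activation between the final layers and the fixed PReLU slope of 0.25 (the slope is learned in a real PReLU) are all assumptions.

```python
import random


def linear(x, w, b):
    """Fully connected layer: one output per row of the weight matrix."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def init(in_dim, out_dim, rng):
    """Random (untrained) weights and zero biases for an in_dim -> out_dim layer."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return w, [0.0] * out_dim


def prelu(x, slope=0.25):
    # Assumption: fixed negative slope instead of a learned parameter.
    return [v if v > 0 else slope * v for v in x]


class FusionHead:
    """9 -> 8 and 3 -> 8 projections, then 16 -> 8 -> 4 -> 2 -> 1."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.sub_fc = init(9, 8, rng)    # subgraph-level similarity vector
        self.node_fc = init(3, 8, rng)   # node-level comparison vector
        self.mlp = [init(i, o, rng) for i, o in [(16, 8), (8, 4), (4, 2), (2, 1)]]

    def forward(self, sub_scores, node_scores):
        # Concatenate the two 8-dimensional projections into 16 dimensions.
        h = linear(sub_scores, *self.sub_fc) + linear(node_scores, *self.node_fc)
        for idx, (w, b) in enumerate(self.mlp):
            h = linear(h, w, b)
            if idx < len(self.mlp) - 1:
                h = prelu(h)
        return h[0]


if __name__ == "__main__":
    head = FusionHead(seed=0)
    print(head.forward([0.5] * 9, [0.5] * 3))
```

The node-level input has 3 entries here, matching the variant in which 3 subgraph pairs are compared at node level.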

For training, we set the batch size to 128, use the Adam algorithm (Kingma and Ba, 2014) for optimization, and set the initial learning rate to 0.001. We set the number of training iterations to 2000, and choose the best model based on the lowest validation loss.

4.5. Evaluation Metrics

We use two metrics to evaluate the similarity computation results of the models: Mean Squared Error (MSE), which measures the average squared difference between the calculated similarities and the ground-truth similarities, and Mean Absolute Error (MAE), which measures the average absolute deviation of the calculated similarities from the ground-truth similarities.
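Both error metrics are straightforward to compute. The following is a minimal sketch in which the function names and the example values are hypothetical.

```python
def mse(pred, true):
    """Mean squared error between predicted and ground-truth similarities."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred)


def mae(pred, true):
    """Mean absolute error between predicted and ground-truth similarities."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(pred)


if __name__ == "__main__":
    pred = [0.5, 0.7]
    true = [0.4, 0.9]
    print(mse(pred, true), mae(pred, true))
```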

For the ranking results, we also use Spearman's Rank Correlation Coefficient ($\rho$) (Spearman, 1961) and Kendall's Rank Correlation Coefficient ($\tau$) (Kendall, 1938) to evaluate how well the predicted ranking matches the true ranking, as well as Precision at $k$ (p@$k$). p@$k$ is computed by taking the intersection of the predicted top $k$ results and the ground-truth top $k$ results, divided by $k$. Compared with p@$k$, $\rho$ and $\tau$ better reflect the global ranking instead of focusing on the top $k$ results.
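The three ranking metrics can be sketched with the standard library alone. The function names are hypothetical, and two simplifications are assumed: Kendall's coefficient is computed in its tau-a form (no tie correction), and ties in the top-k selection are broken by input order.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_rho(pred, true):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(pred), average_ranks(true))


def kendall_tau(pred, true):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs."""
    n, s = len(pred), 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (pred[i] - pred[j]) * (true[i] - true[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)


def precision_at_k(pred, true, k):
    """Overlap of predicted and true top-k items, divided by k."""
    def top(scores):
        return set(sorted(range(len(scores)), key=lambda i: -scores[i])[:k])
    return len(top(pred) & top(true)) / k


if __name__ == "__main__":
    true = [0.9, 0.8, 0.7, 0.6]
    pred = [0.8, 0.9, 0.6, 0.7]
    print(spearman_rho(pred, true), kendall_tau(pred, true),
          precision_at_k(pred, true, 2))
```

In this example the top two items are retrieved correctly even though they are swapped, so p@2 is 1 while both rank correlations are below 1, which illustrates why the correlation coefficients reflect the global ranking better than p@k.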

| Method | MSE | MAE | ρ | τ | p@10 | p@20 |
|---|---|---|---|---|---|---|
| Hungarian | 18.62 | 33.22 | 75.98 | 57.72 | 74.25 | 84.75 |
| VJ | 25.87 | 39.48 | 3.29 | 2.29 | 35.00 | 50.50 |
| Beam | 5.88 | 12.93 | 85.80 | 74.34 | 67.75 | 90.00 |
| GCN-Mean | 0.58 | 5.39 | 75.64 | 53.29 | 58.00 | 86.88 |
| GCN-Max | 1.37 | 9.14 | 74.61 | 52.30 | 54.50 | 86.62 |
| SimGNN | 0.78 | 6.58 | 77.30 | 56.78 | 71.00 | 88.87 |
| GSimCNN | 0.60 | 5.61 | 80.78 | 60.47 | 67.75 | 90.50 |
| GMN | 0.27 | 3.82 | 76.36 | 54.67 | 60.00 | 89.00 |
| PSimGNN-up | 0.44 | 4.80 | 78.92 | 57.63 | 59.50 | 88.37 |
| PSimGNN-1/3 | 0.32 | 4.07 | 80.43 | 60.31 | 70.50 | 88.00 |
| PSimGNN | 0.20 | 3.39 | 84.49 | 66.15 | 78.50 | 91.87 |

Table 2. Results on BA-60 dataset. The best results of the neural network-based models, as well as the traditional methods that exceed these results are bolded.

| Method | MSE | MAE | ρ | τ | p@10 | p@20 |
|---|---|---|---|---|---|---|
| Hungarian | 20.54 | 34.38 | 81.10 | 60.36 | 61.00 | 99.00 |
| VJ | 27.39 | 40.46 | 58.37 | 41.56 | 46.25 | 82.62 |
| Beam | 11.40 | 20.68 | 78.67 | 62.83 | 62.75 | 90.00 |
| GCN-Mean | 1.25 | 9.09 | 76.39 | 53.38 | 56.50 | 100 |
| GCN-Max | 1.20 | 8.54 | 76.17 | 53.04 | 52.50 | 99.88 |
| SimGNN | 0.80 | 6.93 | 76.37 | 53.83 | 58.00 | 100.00 |
| GSimCNN | 0.23 | 3.25 | 82.33 | 61.69 | 67.00 | 100.00 |
| GMN | 0.15 | 2.71 | 77.22 | 54.50 | 53.25 | 100.00 |
| PSimGNN-up | 0.50 | 4.24 | 77.71 | 55.33 | 53.50 | 100.00 |
| PSimGNN-1/3 | 0.12 | 2.51 | 79.65 | 57.81 | 57.75 | 100.00 |
| PSimGNN | 0.11 | 2.41 | 80.14 | 58.44 | 61.25 | 100.00 |

Table 3. Results on BA-100 dataset.

| Method | MSE | MAE | ρ | τ | p@10 | p@20 |
|---|---|---|---|---|---|---|
| Hungarian | 25.91 | 37.94 | 79.38 | 58.10 | 64.25 | 94.00 |
| VJ | 31.44 | 42.68 | 61.91 | 43.10 | 48.50 | 80.38 |
| Beam | 18.60 | 28.79 | 77.24 | 65.21 | 56.00 | 83.50 |
| GCN-Mean | 2.37 | 12.78 | 73.47 | 49.46 | 50.00 | 95.00 |
| GCN-Max | 2.28 | 10.76 | 74.99 | 51.69 | 53.75 | 94.25 |
| SimGNN | 0.84 | 6.19 | 73.47 | 48.89 | 52.75 | 95.13 |
| GSimCNN | 0.32 | 3.58 | 79.68 | 56.82 | 59.00 | 95.00 |
| GMN | 0.12 | 2.66 | 79.58 | 57.87 | 60.25 | 95.00 |
| PSimGNN-up | 0.08 | 4.53 | 74.95 | 51.58 | 46.75 | 95.13 |
| PSimGNN-1/3 | 0.07 | 2.14 | 76.36 | 53.29 | 52.50 | 96.00 |
| PSimGNN | 0.06 | 1.96 | 79.16 | 57.24 | 55.75 | 97.63 |

Table 4. Results on BA-200 dataset.

4.6. Results and Analysis

The effectiveness results on the three datasets can be found in Tables 2, 3 and 4. The two embedding models we use, GCN-Mean and GCN-Max, generally perform worse than the matching models, which suggests that the representation ability of a single embedding is limited. As the number of nodes per graph increases, the limitation of using one vector to characterize the entire graph becomes more obvious and the results get worse, which is consistent with our analysis in Section 3.2. PSimGNN-up uses only the subgraph-level embeddings and achieves the same level of results as the other matching models, which supports the effectiveness of our framework. PSimGNN-1/3, which uses one third of the subgraph pairs for node-level comparison, achieves better results than PSimGNN-up on almost all evaluation metrics. Our model, PSimGNN, consistently achieves the best or second-best results under most evaluation metrics across the three datasets among the neural network based methods. This implies that our model not only introduces a more flexible framework, but also reaches the same level of accuracy as other neural network based models. Moreover, as the number of subgraph pairs used for node-level comparison increases, the model contains more information and the corresponding evaluation results improve, which is in line with our expectations.

The ranking results of VJ on the BA-60 dataset are extremely poor, and all three traditional methods have very high MSE and MAE. These results show the limitations of traditional methods for graphs with a large number of nodes. As for the BA-100 dataset, p@20 reaches 100% for most neural network based models. This is because, when the test set of 40 graphs was randomly split, exactly 20 graphs came from basic graph 1 and the other 20 from basic graph 2. Being able to distinguish these graphs perfectly also demonstrates the strong performance of the neural network based models.

PSimGNN does not achieve the best results on some metrics for the BA-100 and BA-200 datasets. This may be due to the randomness of graph partitioning: some subgraphs do not capture the local structural features well when they are partitioned. Improving the stability and accuracy of the partitioning is left for future work.

5. Conclusion and Future Directions

This work sits at the intersection of graph neural networks, graph similarity computation and graph partitioning, and takes a first step towards large graph similarity computation via graph partitioning and a novel neural network based approach, PSimGNN. The central idea of the proposed method, which takes any two graphs as input and outputs their similarity score, is to solve the problem of large graph similarity computation from the perspective of subgraphs. The experimental results show that PSimGNN achieves competitive accuracy and time complexity by introducing graph partitioning.

There are several directions for future work: (1) Since the nodes in the three BA datasets used in this paper have no attributes, we need to find suitable datasets with a large number of nodes and node attributes to verify our model. (2) We should also introduce a mechanism to deal with edge attributes. In chemistry, not only the atoms but also the bonds of a chemical compound are usually labeled, so it is useful to incorporate edge labels into our model. (3) Given the constraint that the exact GEDs of large graphs cannot be computed, we can only use approximate GEDs, and the approximation algorithms become less accurate as the number of graph nodes grows. It would be interesting to see how a model trained only on the exact GEDs between partitioned subgraphs generalizes to larger graphs.


We thank Yunsheng Bai and Derek Xu for valuable discussions.


  • [1] R. Albert and A. Barabási (2002) Statistical mechanics of complex networks. Reviews of modern physics 74 (1), pp. 47. Cited by: §4.1.1.
  • [2] K. Andreev and H. Racke (2006) Balanced graph partitioning. Theory of Computing Systems 39 (6), pp. 929–939. Cited by: §2.1.
  • [3] Y. Bai, H. Ding, S. Bian, T. Chen, Y. Sun, and W. Wang (2019) Simgnn: a neural network approach to fast graph similarity computation. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, pp. 384–392. Cited by: §1, 3rd item.
  • [4] Y. Bai, H. Ding, Y. Sun, and W. Wang (2018) Convolutional set matching for graph similarity. arXiv preprint arXiv:1810.10866. Cited by: §1, 3rd item.
  • [5] D. B. Blumenthal and J. Gamper (2018) On the exact computation of the graph edit distance. Pattern Recognition Letters. Cited by: §2.3.
  • [6] A. Buluç, H. Meyerhenke, I. Safro, P. Sanders, and C. Schulz (2016) Recent advances in graph partitioning. In Algorithm Engineering, pp. 117–158. Cited by: §1, §2.1.
  • [7] H. Bunke and K. Shearer (1998) A graph distance metric based on the maximal common subgraph. Pattern recognition letters 19 (3-4), pp. 255–259. Cited by: §1, §2.3.
  • [8] H. Bunke (1983) What is the distance between graphs. Bulletin of the EATCS 20, pp. 35–39. Cited by: §1, §2.3.
  • [9] H. Bunke (1997) On a relation between graph edit distance and maximum common subgraph. Pattern Recognition Letters 18 (8), pp. 689–694. Cited by: §2.3.
  • [10] M. Defferrard, X. Bresson, and P. Vandergheynst (2016) Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in neural information processing systems, pp. 3844–3852. Cited by: §3.3.1.
  • [11] S. Fankhauser, K. Riesen, and H. Bunke (2011) Speeding up graph edit distance computation through fast bipartite matching. In International Workshop on Graph-Based Representations in Pattern Recognition, pp. 102–111. Cited by: §1, §2.3, 1st item, §4.2.
  • [12] W. Hamilton, Z. Ying, and J. Leskovec (2017) Inductive representation learning on large graphs. In Advances in neural information processing systems, pp. 1024–1034. Cited by: §2.2, §3.3.1.
  • [13] K. He, X. Zhang, S. Ren, and J. Sun (2015) Delving deep into rectifiers: surpassing human-level performance on imagenet classification. In Proceedings of the IEEE international conference on computer vision, pp. 1026–1034. Cited by: §4.4.
  • [14] R. Horaud and T. Skordas (1989) Stereo correspondence through feature grouping and maximal cliques. IEEE Transactions on Pattern Analysis and Machine Intelligence 11 (11), pp. 1168–1180. Cited by: §1.
  • [15] H. Jeong, Z. Néda, and A. Barabási (2003) Measuring preferential attachment in evolving networks. EPL (Europhysics Letters) 61 (4), pp. 567. Cited by: §4.1.1, §4.1.
  • [16] R. Jonker and A. Volgenant (1987) A shortest augmenting path algorithm for dense and sparse linear assignment problems. Computing 38 (4), pp. 325–340. Cited by: §1, §2.3, 1st item, §4.2.
  • [17] M. G. Kendall (1938) A new measure of rank correlation. Biometrika 30 (1/2), pp. 81–93. Cited by: §4.5.
  • [18] D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §4.4.
  • [19] T. N. Kipf and M. Welling (2016) Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907. Cited by: §3.3.1, §3.6.3.
  • [20] H. W. Kuhn (1955) The hungarian method for the assignment problem. Naval research logistics quarterly 2 (1-2), pp. 83–97. Cited by: §1, §2.3, 1st item, §4.2.
  • [21] V. I. Levenshtein (1966) Binary codes capable of correcting deletions, insertions, and reversals. In Soviet physics doklady, Vol. 10, pp. 707–710. Cited by: §2.3.
  • [22] Y. Li, C. Gu, T. Dullien, O. Vinyals, and P. Kohli (2019) Graph matching networks for learning the similarity of graph structured objects. arXiv preprint arXiv:1904.12787. Cited by: §1, 3rd item.
  • [23] Y. Liang and P. Zhao (2017) Similarity search in graph databases: a multi-layered indexing approach. In 2017 IEEE 33rd International Conference on Data Engineering (ICDE), pp. 783–794. Cited by: §4.1.2.
  • [24] M. Neuhaus, K. Riesen, and H. Bunke (2006) Fast suboptimal algorithms for the computation of graph edit distance. In Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR), pp. 163–172. Cited by: §1, §2.3, 1st item, §4.2.
  • [25] S. K. Pal and S. Mitra (1992) Multilayer perceptron, fuzzy sets, classification. Cited by: §3.3.1.
  • [26] F. Parés, D. G. Gasulla, A. Vilalta, J. Moreno, E. Ayguadé, J. Labarta, U. Cortés, and T. Suzumura (2017) Fluid communities: a competitive, scalable and diverse community detection algorithm. In International Conference on Complex Networks and their Applications, pp. 229–240. Cited by: §3.2, §3.6.3.
  • [27] M. Pelillo, K. Siddiqi, and S. W. Zucker (1999) Matching hierarchical structures using association graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence 21 (11), pp. 1105–1120. Cited by: §1.
  • [28] K. Riesen and H. Bunke (2009) Approximate graph edit distance computation by means of bipartite graph matching. Image and Vision computing 27 (7), pp. 950–959. Cited by: §1, §2.3, 1st item, §4.2.
  • [29] K. Riesen, S. Emmenegger, and H. Bunke (2013) A novel software toolkit for graph edit distance computation. In International Workshop on Graph-Based Representations in Pattern Recognition, pp. 142–151. Cited by: §1, §4.2.
  • [30] N. Shervashidze, P. Schweitzer, E. J. Van Leeuwen, K. Mehlhorn, and K. M. Borgwardt (2011) Weisfeiler-lehman graph kernels. Journal of Machine Learning Research 12 (77), pp. 2539–2561. Cited by: §2.2.
  • [31] C. Spearman (1961) The proof and measurement of association between two things. Cited by: §4.5.
  • [32] X. Wang, X. Ding, A. K. Tung, S. Ying, and H. Jin (2012) An efficient graph indexing method. In 2012 IEEE 28th International Conference on Data Engineering, pp. 210–221. Cited by: §4.1.2.
  • [33] B. Wu, J. Xiao, and J. Chen (2015) Friend recommendation by user similarity graph based on interest in social tagging systems. In International Conference on Intelligent Computing, pp. 375–386. Cited by: §1.
  • [34] K. Xu, W. Hu, J. Leskovec, and S. Jegelka (2018) How powerful are graph neural networks?. arXiv preprint arXiv:1810.00826. Cited by: §2.2, §3.3.1, §4.4.
  • [35] P. Yanardag and S. Vishwanathan (2015) Deep graph kernels. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1365–1374. Cited by: §4.1.2.
  • [36] Z. Zeng, A. K. Tung, J. Wang, J. Feng, and L. Zhou (2009) Comparing stars: on approximating graph edit distance. Proceedings of the VLDB Endowment 2 (1), pp. 25–36. Cited by: §2.3.