Hierarchical and Fast Graph Similarity Computation via Graph Coarsening and Deep Graph Learning

05/14/2020 ∙ by Haoyan Xu, et al. ∙ 5

In this work, we are interested in the large graph similarity computation problem, which is one of the most important graph-based problems. Traditional techniques to compute the exact or approximate values of Graph Edit Distance (GED) and Maximum Common Subgraph (MCS) require at least polynomial time complexity over node numbers thus are not able to handle this problem when the numbers of nodes are large. Recently the develop of deep learning techniques provide a promising solution for this problem by training a network which is able to encode graphs to feature vectors and then compute similarity based on feature vectors. However, when we look into these techniques and classify them to embedding models and matching models, problems arise. Embedding models can be quite fast but perform poorly due to the lack of interaction across graphs while matching models involve this for much better performance but satisfy far more on time consumption. Similar to the process of large biological molecular identification, where we first maps the whole molecular to molecular groups and then identify them based on the "abstracted smaller molecular", the feature aggregation across two whole graphs is always redundant, especially when the number of nodes is large. Thus we here present a novel framework for large graph similarity computation problem. We first embed and coarsen the large graphs to "abstracted smaller graphs" with denser local topology, similar to molecular groups in biological concept. Then we aggregate both the internal features in "abstracted smaller graphs" and external features across "abstracted smaller graph pair", leading to feature vectors for each graph, with which we calculate the final similarity score. Experiments demonstrate that our proposed framework outperforms state-of-the-art methods in graph similarity computation tasks and has significant improvement in time efficiency.



There are no comments yet.


page 1

page 2

page 3

page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

With strong representative abilities, graphs have a broad range of applications in a wide variety of fields, including social network study, computational chemistry gilmer2017neural , and biomedical image analysis ktena2017distance . Recently, especially with the development of deep learning technique, there aroused surging interests on graph-related problems including graph similarity computation, one of the fundamental challenges.

Traditionally, Graph Edit Distance (GED) 6313167 and Maximum Common Subgraph (MCS) BUNKE1998255 are two dominant metrics developed for graph similarity computation problem. GED is the least number of elementary operations (nodes or edges insertion, removal, substitution) to transform one graph to the other. MCS evaluates graph similarity with an induced subgraph of the pair of graphs with as many vertices as possible (Maximum common induced subgraph) or a subgraph isomorphic to the pair of graphs with as many edges as possible (Maximum common edge subgraph). Existing algorithms for GED or MCS computation can be divided into two classes. The first one includes algorithms like A* for the exact values. The exact values for sure help us better evaluate the similarity between graphs, however, it is indeed an NP-hard problem and requires exponential time complexity. The second class, which includes algorithms like Beam, VJ and Hungarian, only computes the approximate values and saves time in return. However, these algorithms still run with polynomial or even sub-exponential time complexity.

To date, graph neural networks (GNNs) have been proved to be potential data-driven solution for various graph-related tasks with relatively high accuracy and far lower time cost in inference. Various graph convolution layers like GCN

NIPS2016_6081 , GAT velivckovic2017graph and GIN Xu2018HowPA are proposed and demonstrated powerful in embedding node features with both original labels and local topology. Also, as the structure of graph is quite different from that of image, advanced pooling techniques on graphs are developed, including TopKPooling gao2019graph , SAGPool lee2019self , DiffPool ying2018hierarchical and MemoryPool Khasahmadi2020Memory-Based .

When we look at deep learning techniques in graph similarity computation problem, there always involves two stages: embedding, which map graphs to feature vectors and similarity computation based on these feature vectors. And we can clearly classify these techniques into two categories with the different methods in first stage as illustrated in fig. 1. The first category, exemplified by Hierarchically Mean and Hierarchical Max defferrard2016convolutional , directly maps graphs to feature vectors by hierarchically coarsening the graphs while the second category embraced by GMNs li2019graph embeds pair of graphs at the same time with a cross-graph matching mechanism. Other techniques like simGNN bai2018simgnn and GRAPHSIM bai2020learning are similar to the second category except that them interactively embed two graphs to one feature vector but not two for similarity computation. For simplicity, in the remaining content of this paper we will call the first category embedding models and the second matching models.

On the one hand, embedding models can be quite faster than matching models at inference stage. Assuming we have graphs, embedding models first embed all graphs to feature vectors for times and only compute similarity times based on feature vectors while matching models have to forward whole graphs times. Taking graph similarity search problem as another example, in which we need to compute the similarity of a new graph with all graphs in our database, embedding models can save the feature vectors of graphs in the database in advance, map the new graph to its feature vector and compute similarity only with feature vectors. Meanwhile matching models need to forward the whole new graphs with all the other graphs in the database every time when there comes a new graph, which is absolutely time-consuming. Details about time complexity analysis will be shown in Section 3. On the other hand, as the feature vectors generated by embedding models do not involve interaction between graphs, the performance of embedding models are worse than matching models.

Intuitively, matching across two whole graph is always redundant, especially when we talk about large graphs, like large biological or chemical molecular, where similarity between graphs largely depends on "abstracted smaller graphs", or we can call them molecular groups in biology and chemistry. Thus as illustrated in fig. 2, we propose our hierarchical matching-based graph similarity computation model, in which graph embedding layers and pooling layers are applied to encode and coarsen the graphs to "abstracted smaller graphs" and then matching models are used on "abstracted smaller graph pairs" to compute similarity between graphs. In the first two stages of our model, there is no interaction across graphs so this part can be finished in advance in inference time or similarity search problem in order to save time. And the matching model part guarantees the performance, which is even slightly better than to directly apply the matching models.

Thus we highlight our main contributions as follows:

  • We propose the first framework, which makes a trade off between embedding and matching models, to address the challenging problem of similarity computation between large graphs.

  • We provide an improved graph pooling layer based on Memory-based Pooling Layer introduced in Khasahmadi2020Memory-Based and achieve significantly better performance on graph similarity computation problem.

  • Our framework shows significant improvement in time complexity as compared to matching models and achieves even slightly better performance than matching models, thus far better performance than embedding models.

  • We conduct extensive experiments on both real graph datasets and self-built datasets consisted of large graphs, which will be detailed in Section 4 to demonstrate the effectiveness and efficiency of our proposed model.

The remaining of this paper will be structured as follows. In Section 2, we introduce our model in detail. Then we conduct analysis about time complexity of our proposed model and the former algorithm in Section 3. Details about experiments are respectively shown in Section 4. And we make a conclusion in Section 5.

2 Proposed Framework

We will use bold font for matrix and tilt font for function in this part. The right superscript of a matrix stands for graph number indicator and the right subscript stands for the stage of the matrix. For example, means the first input graph and means the second graph after encoding.

As illustrated in fig. 2, there are three stages in our proposed model: encoding, coarsening (or say pooling) and matching. Assuming we get two input node sets denoted as and (the number of nodes in each graph is always different) as well as the adjacency matrix and , the whole model can be described as below. Note that in each stage, we first give the overall equation on how the stage works and then make detailed explanations on the overall equation.

Figure 2:

An overview of our model. Given two input graphs, first encode them respectively to obtain topological information. Then generate multiple memory heads by linear transformation, which will be aggregated with a convolution operation and get coarsened graphs. Using cross-graph mechanism, we implement fine-grained comparison and each graph’s feature vector is calculated and then fed into cosine similarity computation to generate final similarity score.

2.1 Encoding

In the first stage, we employ encoding layers, , on the two input graphs respectively for embedding feature of nodes and transform the dimension of two node set respectively to and , just as shown in equation 1 below:


where and is consisted of a graph convolution layer and a non-linear activation as equation 2 shows.


We use the classical for here and graph embedding layers can be GCN defferrard2016convolutional , GAT velivckovic2017graph or GIN Xu2018HowPA .

2.2 Coarsening (or Pooling)

What this stage does is to pool the encoded graphs, and , to "abstracted smaller graphs", and . As mentioned before, our pooling layer is inspired by Memory-based Pooling Layer introduced in Khasahmadi2020Memory-Based and we make significant improvement over their structure. The overall transformation is shown in equation 3, where is an assignment matrix and is a trainable matrix standing for a linear transformation.


We will go through how we calculate the assignment matrix . Similar to the mechanism in Khasahmadi2020Memory-Based , we first generate memory heads (or say memory keys, standing for multiple batches of centroids for pooling operation) . Different from Khasahmadi2020Memory-Based , which proposes to randomly generate the memory heads at the very beginning and make them trainable, we apply linear transformation to transform to a matrix of and then reshape it to , as shown in equation 4. That is to say that our generation of memory keys is dependent on the input node set of the pooling layer, which makes more sense.


Then we compute the respective relationship between memory head and . This portion is also different from Khasahmadi2020Memory-Based , where a normalized Student’s t-distribution is deployed. We here apply a normalized reciprocal of cosine similarity here, as described in equation 5, where is a very small number for avoiding the division by zero.


We finally aggregate the information of relationship . As equation 6 shows, we concatenate and perform a trainable weighted sum ( convolution) () and a softmax activation () to the concatenated matrix, leading to our assignment matrix .


2.3 Matching

This part similar to Graph Matching Network li2019graph aims to compute similarity based on the interaction of two coarsened graphs. There are three propagators and one aggregator to transform the pooled graphs to seperate feature vectors (D is the dimension for final feature vector). Cosine similarity computation is applied subsequently for the final similarity calculation. Equation 7 is an overall description of this part.


Here in the propagators , we first propagate features inside respective graph with and across graphs with and then apply to embrace all these features including the original input. As shown in equation 8, an extract function is first applied on , to extract node pairs in single graph with edge indicated by , generating matrix . Then , which involves linear transformation layers , is deployed on . in equation 9 first generate a relationship mask the relationship matrix across the graph pair with , which simply involve matrix multiply for the normalized graphs and a softmax activation (), and then apply the mask to and subtract with the masked . After these two steps, we again concatenate all the feature we have and apply linear transformation layers in .


As for the aggregator, we use the following module proposed in (Li et al., 2015).


3 Time Complexity Analysis

For a pair of input graphs and , we can assume that they separately have , edges and the final embedded feature vectors for both graphs are and . Then we can analyse the time complexity over , and

. Note that due to the fact that there exists a lot of variance for each model, we here analyse the simplest cases for each category and the real time consumption will be presented later in the

Section 4. In the first three parts here, we discuss the time consumption from each input graph to its feature vector for embedding models, matching models and our framework. Finally in the fourth part, we use similarity search as an example to specify the time consumption comparisons in applications.

3.1 Embedding models

Considering the simplest case here for embedding models, we only visit every edge once and deploy two computational operational on the two nodes it connect, which contributes to the feature of local topology. Thus the computation complexity for these cases is .

3.2 Matching models

Assuming the simplest case here for matching models, we first compute the relationship across and . This part involves computational operations because we have to calculate the connection between every node in to all nodes in . Then for each input graph, we also visit every edge once and deploy two computational operational on the two nodes it connect, which also makes contribution to the feature of local topology. Thus the computation complexity for these cases is .

3.3 Our framework

For our framework, we take the denotation in Section 2, that is the number of nodes in coarsened graphs is . In the embedding stage, we similarly have to visit every edge once and compute twice. Then in the pooling part, we make an transformation from dimensional space to dimensional space, which costs us computational operations. Finally in the matching stage, like what have been analysed in the previous section, we need operations. Thus the resulting complexity is .

3.4 Graph similarity search

In similarity search problem, we assume that we have a database consisted of graphs, each of which has nodes and edges, for simplicity. What we need to finish is to compute the similarity between all the graphs in the database and an incoming new graph (also with nodes and edges). In embedding models, we can compute all the feature vectors for graphs in the database in advance. And then when the new graph comes, we encode it to its feature vector, which cost us , and only compute similarity based on the feature vectors. Thus the time consumption is . In matching models, we can only forward pairs of graphs every time because the computation across graphs can not be finished at the very beginning. Thus the time consumption is extremely high at . Then the time complexity for our framework is . In our settings, is around , thus our framework costs far less time than matching models do, especially when the node number of the graphs is very large.

4 Experiments and Results

4.1 Dataset Information

Currently the number of nodes is relatively small in each graph in most of the existing graph datasets used in graph similarity computation, such as AIDS, LINUX, etc. Thus the characteristics of the entire graphs can be easily characterized and there is not that much need to extract "abstracted smaller graphs" by our coarsening step for better similarity computation results. From this point of view, we here use two types of datasets for evaluating the performance of our framework. The first one is relatively large graphs (with 15 or more nodes) in IMDB dataset. The second one is graphs created artificially with more nodes. We will show how we process the IMDB dataset and how we create the artificial dataset in the following parts.

4.1.1 Processing IMDB dataset

In this paper, we focus on similarity computation of large graphs, so we filter the original IMDB dataset and choose all the graphs that have 15 or more nodes. The new dataset is called IMDB-L.

4.1.2 Artificial dataset information

To generate an artificial dataset, we need to first generate graphs and then give ground truth similarity for graph pairs. If we generate graphs with a large number of nodes randomly, all these graphs might be highly dissimilar to each other thus the similarity distribution of the graph pairs in the dataset will be uneven (only a very small portion of graph pairs are similar). To make the similarity distribution more uniform, we first generate a small number of basic graphs, and then generate some more similar derived graphs by pruning the basic graphs. We apply two different categories of rules here for generating the datasets: Barabási–Albert preferential attachment model (BA model) jeong2003measuring and Erdős-Rényi graph (ER model) erdos1959random bollobas2001random . The detailed description about this two models will be presented respectively in appendix B. We here generate 4 datasets here: BA-60, BA-100, BA-200 and ER-100, where the first two characters stand for generation rule and behind is a number n standing for the approximate number of nodes in the graph in that dataset. As for the ground truth calculation, A star algorithm cannot be used to calculate the ged distance because of the large number of nodes in our dataset. While there exists three well-known approximation algorithms, Hungarian and VJ, and Beam, they are not able to ensure the accuracy. Thus we propose to use the minimum among four evaluation indicators: the three values calculate respectively by Hungarian, VJ, and Beam and the GED value we obtain while generating the graph, which will be detailed in appendix C. Note that we take the minimum value here but not the average value due to the fact that GED is the upper bound and the real GED value must be less than or equal to the GED generated here. We finally convert this minimum indicator to the similarity score with a normalization of the minimums followed by a exponential function to map the normalized value to . This conversion is shown in equation 12, where represents the total number of nodes in graph .

Method BA-60 BA-100 BA-200
hungarian 186.19 332.20 >100 205.38 343.83 >100 259.06 379.38 >100
vj 258.73 294.83 >100 273.94 404.61 >100 314.45 426.86 >100
beam 59.14 129.26 >100 114.02 206.76 >100 186.03 287.88 >100
GCN-Mean 5.85 53.92 2.10 12.53 90.88 2.37 23.66 127.82 2.87
GCN-Max 13.66 91.38 2.11 12.04 85.44 2.34 22.77 107.58 2.96
SimGNN 8.57 63.80 4.68 6.23 47.80 5.19 3.06 32.77 8.41
GSimCNN 5.97 56.06 3.31 2.28 32.54 4.37 3.16 35.88 8.25
GMN 2.66 38.20 9.23 1.47 27.06 14.98 1.17 26.63 45.71
CoSimGNN 1.83 31.55 4.57 1.03 23.16 4.86 0.50 17.44 5.72
CoSim-CNN 2.50 35.53 3.18 1.49 27.20 3.45 0.53 18.44 4.27
CoSim-Mem 3.14 39.70 4.46 1.68 29.45 4.74 0.41 16.18 5.46
  • The results for MSE and MAE are in

  • The results for time consumption over the whole test set is in seconds.

Table 1: Results on BA-model datasets
Method ER-100 IMDB-L
hungarian 236.15 421.66 >100 2.67 16.60 >100
vj 275.22 463.53 >100 7.68 22.73 >100
beam 104.73 226.42 >100 0.39 3.93 >100
GCN-Mean 16.58 92.98 2.42 22.17 55.35 9.72
GCN-Max 79.09 211.07 2.38 47.14 123.16 9.10
SimGNN 6.37 45.30 5.53 7.42 33.74 25.05
GSimCNN 3.53 35.59 4.57 5.01 30.43 18.26
GMN 2.97 29.58 16.00 3.34 23.50 57.15
CoSimGNN 1.16 26.11 4.82 2.29 21.78 17.52
CoSim-CNN 2.78 33.36 3.50 10.37 38.03 12.77
CoSim-Mem 1.94 31.22 4.80 5.12 29.46 17.68
  • The results for MSE and MAE are in

  • The results for time consumption over the whole test set is in seconds.

Table 2: Results on ER-model and IMDB-L datasets()

4.2 Experiment Settings

4.2.1 Data splitting

For each dataset, we randomly split 60%, 20% and 20% of all graphs as training set, validation set and test set respectively at the beginning and then fix these three parts through the whole process of experiments.

4.2.2 Evaluation Metrics

We apply six metrics to evaluate all the models: TIME, Mean Squared Error (MSE), Mean Absolute Error (MAE), Spearman’s Rank Correlation Coefficient () spearman1961proof , Kendall’s Rank Correlation Coefficient () kendall1938new and Precision at k (p@k). TIME is least time a model need to compute similarity over the whole test set. For embedding model, we first map all the graphs to feature vectors and then only compute similarity based on them. For our model, we first map all the graphs to coarsened graphs and every time forward a pair of coarsened graphs for matching stage. And we simply forward all pairs of graphs for matching models because interaction across whole graphs is required. MSE measures the average squared difference between the predicted similarities and the ground-truth similarities. and also measures how well the predicted results match the ground-truth. We compute the intersection of predicted top k similar results and the ground-truth top k similar results and divide it by k for p@k. Due to the limited length of this article, we present the results of MSE, DEV and TIME in the major part of this paper and the results for , and p@k will be shown in the appendix A.

4.2.3 Experiment Device

Intel(R) i5-8600K@3.60GHz cpu MHz 800.015, 6 cores

4.2.4 Baseline

There are three types of baselines involved in this part. The first category is traditional methods for GED computation, where we include A*-Beamsearch (Beam) 10.1007/11815921_17 , Hungarian kuhn1955hungarian riesen2009approximate , and Vj jonker1987shortest 10.1007/978-3-642-20844-7_11 . Beam is one of the variants of A* algorithm and its time complexity is sub-exponential. Hungarian, which is based on the Hungarian Algorithm for bipartite graph matching, and Vj based on the Volgenant and Jonker algorithm, are two algorithms in cubic-time. The second category is the embedding models. We include GCN-Mean and GCN-Max defferrard2016convolutional here. The third category is the matching models. We here involve SimGNN bai2018simgnn , GSimCNN bai2018convolutional and GMN li2019graph .

4.2.5 Setting in our proposed framework

We here provide experiments on three kinds of variants of our framework to demonstrate its ability in graph similarity computation problem. CosimGNN stands for our most powerful model, in which we first apply GIN Xu2018HowPA in the embedding stage, then our proposed pooling layer and finally our proposed matching mechanism. Cosim-Mem is a model set to compare the performance of the MemPool Khasahmadi2020Memory-Based and our proposed pooling layer. Thus the only difference between the Cosim-Mem and CosimGNN is that the pooling layer of Cosim-Mem is MemPool Khasahmadi2020Memory-Based instead of our proposed pooling layer. And finally, we propose a fastest model Cosim-CNN, in which we deploy GIN Xu2018HowPA in the embedding stage then our proposed pooling layer and finally the matching mechanism in bai2018convolutional . And the parameters in our framework is as below: , (same as all baselines), , , , and .

4.2.6 Training settings

On BA datasets, we train each model for 2000 iterations. On ER and IMDB-L dataset, we train each model for 10000 and 5000 iterations respectively because it is more difficult for all the models to obtain a good approximation on these two datasets. And batch size is set to 128 for all models.

4.3 Results and Analysis

Statistic results are shown respectively in Table 1 and 2. We here highlight best results respectively among four different categories: traditional methods, embedding models, matching models and our proposed models, for better understanding the results. We can find that the our framework always outperforms all the traditional methods, embedding models and matching models on MSE and DEV. And the least time consumption among three categories are highlighted in Table 1 and 2. We can find that our CoSim-CNN runs faster than any matching models. When we take a comparisons among BA datasets, we can find that our time consumption keeps low as the node number increases. And the gaps on time consumption between our models and matching models keep increasing. Then we can take a look at the Cosim-Mem and CosimGNN models and we will find out that in four out of the all 5 datasets, our pooling layer performs better than the Mem-Pool layer Khasahmadi2020Memory-Based .

5 Conclusion

In this paper, we propose a framework for large graphs similarity computation, which first embeds graphs to a feature space, then coarsen (pool) the graphs and then compute the final feature vectors with interaction across graphs. We conduct experiments on various baselines and evaluation metrics to demonstrate that our novel framework not only makes a trade off between embedding models and matching models and reduce the time consumption of matching models, but it is also able to achieve highest accuracy among current graph similarity computation techniques.


  • [1] Réka Albert and Albert-László Barabási. Statistical mechanics of complex networks. Reviews of modern physics, 74(1):47, 2002.
  • [2] Yunsheng Bai, Hao Ding, Song Bian, Ting Chen, Yizhou Sun, and Wei Wang. Simgnn: A neural network approach to fast graph similarity computation, 2018.
  • [3] Yunsheng Bai, Hao Ding, Ken Gu, Yizhou Sun, and Wei Wang. Learning-based efficient graph similarity computation via multi-scale convolutional set matching.
  • [4] Yunsheng Bai, Hao Ding, Yizhou Sun, and Wei Wang. Convolutional set matching for graph similarity. arXiv preprint arXiv:1810.10866, 2018.
  • [5] Béla Bollobás and Bollobás Béla. Random graphs. Number 73. Cambridge university press, 2001.
  • [6] Horst Bunke and Kim Shearer. A graph distance metric based on the maximal common subgraph. Pattern Recognition Letters, 19(3):255 – 259, 1998.
  • [7] Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems 29, pages 3844–3852. Curran Associates, Inc., 2016.
  • [8] Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in neural information processing systems, pages 3844–3852, 2016.
  • [9] Paul Erdos. On random graphs. Publicationes mathematicae, 6:290–297, 1959.
  • [10] Stefan Fankhauser, Kaspar Riesen, and Horst Bunke. Speeding up graph edit distance computation through fast bipartite matching. In Xiaoyi Jiang, Miquel Ferrer, and Andrea Torsello, editors, Graph-Based Representations in Pattern Recognition, pages 102–111, Berlin, Heidelberg, 2011. Springer Berlin Heidelberg.
  • [11] Hongyang Gao and Shuiwang Ji. Graph u-nets, 2019.
  • [12] Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, and George E. Dahl. Neural message passing for quantum chemistry, 2017.
  • [13] Hawoong Jeong, Zoltan Néda, and Albert-László Barabási. Measuring preferential attachment in evolving networks. EPL (Europhysics Letters), 61(4):567, 2003.
  • [14] Roy Jonker and Anton Volgenant. A shortest augmenting path algorithm for dense and sparse linear assignment problems. Computing, 38(4):325–340, 1987.
  • [15] Maurice G Kendall. A new measure of rank correlation. Biometrika, 30(1/2):81–93, 1938.
  • [16] Amir Hosein Khasahmadi, Kaveh Hassani, Parsa Moradi, Leo Lee, and Quaid Morris. Memory-based graph networks. In International Conference on Learning Representations, 2020.
  • [17] Sofia Ira Ktena, Sarah Parisot, Enzo Ferrante, Martin Rajchl, Matthew Lee, Ben Glocker, and Daniel Rueckert. Distance metric learning using graph convolutional networks: Application to functional brain networks, 2017.
  • [18] Harold W Kuhn. The hungarian method for the assignment problem. Naval research logistics quarterly, 2(1-2):83–97, 1955.
  • [19] Junhyun Lee, Inyeop Lee, and Jaewoo Kang. Self-attention graph pooling. arXiv preprint arXiv:1904.08082, 2019.
  • [20] Yujia Li, Chenjie Gu, Thomas Dullien, Oriol Vinyals, and Pushmeet Kohli. Graph matching networks for learning the similarity of graph structured objects, 2019.
  • [21] Michel Neuhaus, Kaspar Riesen, and Horst Bunke. Fast suboptimal algorithms for the computation of graph edit distance. In Dit-Yan Yeung, James T. Kwok, Ana Fred, Fabio Roli, and Dick de Ridder, editors, Structural, Syntactic, and Statistical Pattern Recognition, pages 163–172, Berlin, Heidelberg, 2006. Springer Berlin Heidelberg.
  • [22] Kaspar Riesen and Horst Bunke. Approximate graph edit distance computation by means of bipartite graph matching. Image and Vision computing, 27(7):950–959, 2009.
  • [23] A. Sanfeliu and K. Fu. A distance measure between attributed relational graphs for pattern recognition. IEEE Transactions on Systems, Man, and Cybernetics, SMC-13(3):353–362, 1983.
  • [24] Charles Spearman. The proof and measurement of association between two things. 1961.
  • [25] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. Graph attention networks. arXiv preprint arXiv:1710.10903, 2017.
  • [26] Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks? ArXiv, abs/1810.00826, 2018.
  • [27] Rex Ying, Jiaxuan You, Christopher Morris, Xiang Ren, William L. Hamilton, and Jure Leskovec. Hierarchical graph representation learning with differentiable pooling, 2018.

Appendix A Results for , and p@k

Method BA-60 BA-100
p@10 p@20 p@10 p@20
hungarian 0.5772 0.7598 0.7425 0.8438 0.6036 0.8110 0.6100 0.9900
vj 0.0229 0.0329 0.3500 0.5050 0.4156 0.5837 0.4625 0.8262
beam 0.7434 0.8580 0.6775 0.9000 0.6283 0.7867 0.6275 0.9000
GCN-Mean 0.5329 0.7564 0.5800 0.8688 0.5338 0.7639 0.5650 1.0000
GCN-Max 0.5230 0.7461 0.5450 0.8662 0.5304 0.7617 0.5250 0.9987
SimGNN 0.5678 0.7730 0.7100 0.8887 0.5383 0.7637 0.5800 1.0000
GSimCNN 0.6047 0.8078 0.6775 0.9050 0.6169 0.8232 0.6700 1.0000
GMN 0.5467 0.7636 0.6000 0.8900 0.5450 0.7722 0.5325 1.0000
CoSimGNN 0.6369 0.8273 0.6950 0.9112 0.6168 0.8210 0.6850 1.0000
CoSim-CNN 0.6059 0.8057 0.7475 0.9050 0.6335 0.8332 0.7100 1.0000
CoSim-Mem 0.5727 0.7881 0.6050 0.8950 0.5585 0.7840 0.5325 1.0000
Table 3: Results on BA-model datasets
Method BA-200 ER-100
p@10 p@20 p@10 p@20
hungarian 0.5810 0.7938 0.6425 0.9400 0.3757 0.5404 0.6600 0.7400
vj 0.4310 0.6191 0.4850 0.8037 0.2876 0.4030 0.5950 0.6637
beam 0.6521 0.7724 0.5600 0.8350 0.4255 0.5261 0.5800 0.6937
GCN-Mean 0.4946 0.7347 0.5000 0.9500 0.4708 0.6704 0.5175 0.7862
GCN-Max 0.5169 0.7499 0.5375 0.9425 0.4313 0.6244 0.5150 0.7738
SimGNN 0.4889 0.7347 0.5275 0.9513 0.5848 0.7212 0.6575 0.8177
GSimCNN 0.5682 0.7968 0.5900 0.9500 0.6085 0.8047 0.6950 0.8500
GMN 0.5787 0.7958 0.6025 0.9500 0.6471 0.8256 0.8000 0.8338
CoSimGNN 0.5814 0.8010 0.6200 0.9500 0.6141 0.8154 0.7650 0.8325
CoSim-CNN 0.6020 0.8092 0.6850 0.9512 0.5998 0.8012 0.7050 0.8150
CoSim-Mem 0.5857 0.8006 0.5925 0.9512 0.5820 0.7787 0.6000 0.8212
Table 4: Results on BA-model datasets
Method IMDB-L
p@10 p@20
hungarian 0.8318 0.9309 0.7487 0.8092
vj 0.8334 0.9332 0.7460 0.8151
beam 0.9040 0.9601 0.9092 0.9072
GCN-Mean 0.3620 0.4664 0.4868 0.7026
GCN-Max 0.1736 0.2462 0.3908 0.4460
SimGNN 0.3935 0.5270 0.5566 0.6158
GSimCNN 0.4987 0.6626 0.6210 0.6414
GMN 0.5498 0.6866 0.6732 0.6908
CoSimGNN 0.5709 0.7164 0.6671 0.7039
CoSim-CNN 0.5499 0.6855 0.6421 0.7032
CoSim-Mem 0.4777 0.6141 0.5697 0.6414
Table 5: Results on BA-model datasets

Appendix B BA and ER model

b.1 Barabási–Albert preferential attachment model.

Here we introduce the concept of BA-model, the rules for generating a BA-graph, and how our datasets are produced. The Barabási–Albert (BA) model jeong2003measuring is an algorithm for generating random scale-free networks using a preferential attachment mechanism. Several natural and human-made systems, including the Internet, the world wide web, citation networks, and some social networks are thought to be approximately scale-free and certainly contain few nodes (called hubs) with unusually high degree as compared to the other nodes of the network. The BA-model tries to explain the existence of such nodes in real networks and it incorporates two important general concepts: growth and preferantial attachment, which exist widely in real networks. Growth means that the number of nodes in the network increases over time and preferential attachment means that the more connected a node is, the more likely it is to rceive new links. Nodes with a higher degree have a stronger ability to grab links added to the network.

The BA-model begins with an initial connected network of nodes. New nodes are added to the network one at a time. Each new node is connected to

existing nodes with a probability that is proportional to the number of links that the existing nodes already have. Formally, the probability

that the new node is connected to node is albert2002statistical , where is the degree of node and the sum is made over all pre-existing nodes (i.e. the denominator results in twice the current number of edges in the network). Heavily linked nodes ("hubs") tend to quickly accumulate even more links, while nodes with only a few links are unlikely to be chosen as the destination for a new link. The new nodes have a "preference" to attach themselves to the already heavily linked nodes.

Our dataset is made up of some basic graphs and derivative graphs that have been trimmed, which solve several problems:

  • When generating a graph with a large number of nodes randomly, there is a high probability that the generated graphs are dissimilar between each other, which result in an uneven similarity distribution after the normalization described in equation 12.

  • Due to the large number of nodes in each graph, the approximate GED algorithm cannot guarantee that the calculated similarity can fully reflect the similarity of the graph pairs. We trim and generate derivative graphs while recording the number of trimming steps. These steps and the values calculated by the approximation algorithm take the minimum value as the GED with the basic graph, thereby obtaining a more accurate similarity.

  • By trimming different steps, we can generate graphs with different similarities, which is more conducive to the experiment of graph similarity query.

There are three types of trimming methods here: delete a leaf node, add an edge, and add a node. Since deleting an edge may have a greater impact on the result of the generated graph, we will not consider this method. We try to trim the base graph without changing the global features of the base graph to generate more similar graph pairs. In this way, we get three datasets according to the following generation rule.

A BA-graph of nodes is grown by attaching new nodes each with edges that are preferentially attached to existing nodes with high degree. We set to be 60, 100, and 200, respectively, and is fixed to 1, to generate basic graphs. Each sub-dataset generates two basic graphs, and each base graph is trimmed with different ged distances. For each basic graph, generate 99 trimmed graphs in the range of ged distance 1 to 10. So each sub-dataset consists of two basic graphs and 198 trimmed graphs.

b.2 Erdős-Rényi graph or a binomial graph model.

In game theory, Erdős-Rényi model stands for either of two similar models for generating random graphs, named respectively after mathematicians Paul Erdős and Alfréd Rényi. The definition of these two models are as follows:

  • In the model, a graph is chosen uniformly at random from the collection of all graphs which have n nodes and M edges. For example, in the G(3, 2) model, each of the three possible graphs on three vertices and two edges are included with probability 1/3.

  • In the G(n, p) model, a graph is constructed by connecting nodes randomly. Each edge is included in the graph with probability p independent from every other edge. Equivalently, all graphs with n nodes and M edges have equal probability of . The parameter p in this model can be thought of as a weighting function; as p increases from 0 to 1, the model becomes more and more likely to include graphs with more edges and less and less likely to include graphs with fewer edges. In particular, the case p = 0.5 corresponds to the case where all graphs on n vertices are chosen with equal probability.

There are three types of trimming methods here: delete a leaf node, add an edge, and add a node. Since deleting an edge may have a greater impact on the result of the generated graph, we will not consider this method. We try to trim the base graph without changing the global features of the base graph to generate more similar graph pairs. In this way, we get three datasets according to the following generation rule.

A ER-graph connects each pair of n nodes with probability p. Like the BA model, we set n to be 100, and adjust the value of p so that the graph is not too dense. We generate the base graphs and trim them in the same way, so each dataset consists of two basic graphs and 198 pruned graphs too.

b.3 Statistic of datasets

We here present all the statistic of datasets in Table 6.

Dataset #Graphs #Pairs Min #Nodes Max #Nodes Avg #Nodes Min #Edges Max #Edges Avg #Edges
BA-60 200 40K 54 65 59.50 54 66 60.06
BA-100 200 40K 96 105 100.01 96 107 100.56
BA-200 200 40K 192 205 199.63 193 206 200.16
ER-100 200 40K 94 104 99.90 98 107 101.43
IMDB-L 381 145K 15 89 24.10 33 1467 166.58
Table 6: Statistics of datasets.

Appendix C Details about the GED value obtained when converting basic graphs to derived ones.

As shown in Figure 3, when generating a derivative graph with a specific GED from the basic graph, we randomly select among the above three methods (randomly delete a leaf node, add a leaf node or add an edge) to generate a set of operations, and the sum of GED accumulated by all operations is the specific GED value. As proposed formerly, we then take the minimum value of the trimming GED and the calculated GED with the three algorithms as the final GED value.

Figure 3: Generate a derivative graph with a GED of 6 from the basic graph. It is not necessary to use all three methods in real trimming. Here is only one case.