1 Introduction
Learning a general similarity metric between arbitrary pairs of graph-structured objects is one of the key challenges in machine learning. Such learning problems arise in a variety of applications, ranging from graph search in graph-based databases Yan and Han (2002), to few-shot 3D action recognition Guo et al. (2018) and malware detection Wang et al. (2019). Conceptually, classical exact or inexact graph matching techniques Bunke and Allermann (1983); Caetano et al. (2009); Riesen et al. (2010) provide a strong tool for learning graph similarity. However, these methods usually either require input graphs of similar sizes, or rely mainly on the graph structures to find a correspondence between the nodes of different graphs, without taking node representations or features into account. In contrast, in this paper, we consider the graph matching problem of learning a mapping between a pair of graph inputs $(G_1, G_2)$ and a similarity score $y$, based on a set of training triplets of graph input pairs and scalar output scores $\{(G_1^i, G_2^i, y_i)\}_{i=1}^{K}$ drawn from some fixed but unknown probability distribution in real applications.
Although graph neural networks (GNNs) have recently been demonstrated to be a powerful class of neural networks for learning node embeddings of graphs on tasks ranging from node classification and graph classification to graph generation Bronstein et al. (2017); Li et al. (2016); Kipf and Welling (2017); Hamilton et al. (2017); Velickovic et al. (2018), there has been relatively little study on learning graph similarity with GNNs. A simple yet straightforward approach is to use a GNN to encode each graph as a vector and combine the two vectors to make a decision. This approach can be effective, as the graph-level vectors contain important information about a pair of graphs, but it has one obvious limitation: it ignores finer-grained interactions among different-level embeddings of the two graphs. Very recently, a few attempts have been made to take low-level interactions into account, either by considering the histogram information or spatial patterns (with CNNs) of the node-wise similarity matrix of node embeddings Bai et al. (2019, 2020), or by improving the node embeddings of one graph by incorporating the implicit attentive neighbors of another graph Li et al. (2019). However, two significant challenges can make these graph matching networks ineffective: i) how to learn different-level granularities (global level and local level) of interactions between a pair of graphs; ii) how to effectively learn richer cross-level interactions between the nodes of one graph and another whole graph.

Inspired by these observations, in this paper, we propose a hierarchical^{1} graph matching network (HGMN) for computing the graph similarity between any pair of graph-structured objects. HGMN jointly learns graph representations and a graph matching metric function for computing graph similarity in an end-to-end fashion. It consists of a novel node-graph matching network for effectively learning cross-level interaction features between the nodes of one graph and another whole graph, and a siamese graph neural network for learning global-level interaction features between two graphs. A final small prediction network consumes the feature vectors from both cross-level and global-level interactions to perform either graph-graph classification or graph-graph regression tasks.

^{1}The terminology "hierarchical" here means different-level granularities of interactions between a pair of graphs, which is different from the meaning of "hierarchical pooling" operations in Ying et al. (2018).
Recently proposed work computes graph similarity by considering either graph-graph classification tasks (with labels $y \in \{-1, 1\}$) Li et al. (2019), or graph-graph regression tasks (with similarity scores $y \in [0, 1]$) Bai et al. (2019, 2020). To demonstrate the effectiveness of our model, we systematically evaluate the performance of HGMN on four datasets for both the graph-graph classification and regression tasks. Note that the graph-graph classification tasks here are different from the general graph classification tasks Ying et al. (2018); Ma et al. (2019), which assign a label to a single graph; our graph-graph classification tasks learn a binary label (i.e., similar or dissimilar) for a pair of graphs instead of one graph. Another important aspect is that previous work does not consider the impact of the size of the input graphs, which often plays an important role in determining the performance of graph similarity learning. Motivated by this observation, we consider three different ranges of graph sizes to evaluate the robustness of the models. In addition, to bridge the gap of the lack of standard datasets for graph similarity learning, we create one new dataset from a real application, used together with a previously released dataset by Xu et al. (2017), for the graph-graph classification tasks. Both code and data are available at https://github.com/kleincup/HGMN. In brief, we highlight our main contributions as follows:

We propose a hierarchical graph matching network (HGMN) for computing the graph similarity between any pair of graph-structured objects. HGMN jointly learns graph representations and a graph matching metric function for computing graph similarity in an end-to-end fashion.

In particular, we propose a novel node-graph matching network for effectively capturing the cross-level interactions between a node embedding of one graph and a corresponding attentive graph-level embedding of the other graph.

Comprehensive experiments demonstrate that HGMN consistently outperforms state-of-the-art graph similarity learning baselines on different tasks (i.e., classification and regression) and also exhibits stronger robustness as the sizes of the two input graphs increase.
2 Problem Formulation
In this section, we briefly introduce the problem formulation. Given a pair of graph inputs $(G_1, G_2)$, the aim of graph similarity learning in this paper is to produce a similarity score $y$. The graph $G_1 = (V_1, E_1)$ is represented as a set of $N$ nodes $v_i \in V_1$ with a feature matrix $X_1 \in \mathbb{R}^{N \times d}$, and edges (binary or weighted) formulating an adjacency matrix $A_1 \in \mathbb{R}^{N \times N}$ and a degree matrix $D_1$. Similarly, the graph $G_2 = (V_2, E_2)$ is represented as a set of $M$ nodes with a feature matrix $X_2 \in \mathbb{R}^{M \times d}$, an adjacency matrix $A_2 \in \mathbb{R}^{M \times M}$, and a degree matrix $D_2$. Note that when performing the graph-graph classification tasks, $y \in \{-1, 1\}$ is the class label; when performing the graph-graph regression tasks, $y \in [0, 1]$ is the similarity score. We train our model based on a set of training triplets of structured input pairs and scalar output scores $\{(G_1^i, G_2^i, y_i)\}_{i=1}^{K}$.
3 Hierarchical Graph Matching Networks
In this section, we introduce the two key components of HGMN: Siamese Graph Neural Networks (SGNN) and Node-Graph Matching Networks (NGMN). We first discuss SGNN for learning the global-level interactions between two graphs and then outline NGMN for effectively learning the cross-level node-graph interactions between the nodes of one graph and another whole graph. The overall model architecture of HGMN is shown in Figure 1.
3.1 SGNN for Global-level Interaction Learning
The graph-level embeddings contain important information about a graph. Therefore, learning the graph-level interactions between two graphs is an important component of learning their similarity. To capture the global-level interaction features between two graphs, we employ SGNN, which is based on the Siamese Network architecture Bromley et al. (1994) that has achieved great success in many applications, such as visual recognition Bertinetto et al. (2016); Varior et al. (2016) and sentence similarity analysis He et al. (2015); Mueller and Thyagarajan (2016). In general, SGNN consists of 3 components: 1) node embedding layers; 2) graph-level embedding aggregation layers; 3) graph-graph matching and prediction layers.
Node Embedding Layers. We use a three-layer graph convolutional network (GCN) within the siamese network to generate the node embeddings of both graphs $G_1$ and $G_2$,

$H^{(l+1)} = \sigma\big(\tilde{A}\, H^{(l)} W^{(l)}\big), \quad \tilde{A} = \hat{D}^{-1/2} (A + I) \hat{D}^{-1/2}, \quad l = 0, 1, 2, \qquad (1)$

where $\sigma(\cdot)$ is the activation function, $H^{(0)} = X_1$ or $X_2$, $\tilde{A}$ is the normalized Laplacian-smoothed adjacency matrix built from $(A_1, D_1)$ or $(A_2, D_2)$ depending on whether the input is $G_1$ or $G_2$, and the $W^{(l)}$ are the hidden weight matrices of each layer. Note that the twin networks share the GCN parameters when training on the pair of graphs $(G_1, G_2)$. The number of GCN layers required may depend on the graph data of the real application.
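To make the weight sharing concrete, below is a minimal PyTorch sketch of this shared three-layer GCN encoder; the class and argument names are illustrative assumptions (not the authors' released code), and a dense adjacency matrix is used for brevity.

```python
import torch
import torch.nn as nn

class GCNEncoder(nn.Module):
    """Three-layer GCN shared by both branches of the siamese network (Eq. 1)."""
    def __init__(self, in_dim, hid_dim=100, num_layers=3):
        super().__init__()
        dims = [in_dim] + [hid_dim] * num_layers
        self.weights = nn.ParameterList(
            [nn.Parameter(torch.empty(dims[l], dims[l + 1])) for l in range(num_layers)])
        for w in self.weights:
            nn.init.xavier_uniform_(w)

    def forward(self, X, A):
        # Symmetrically normalized adjacency with self-loops:
        # A_hat = D^{-1/2} (A + I) D^{-1/2}
        A_tilde = A + torch.eye(A.size(0), device=A.device)
        d_inv_sqrt = A_tilde.sum(dim=1).pow(-0.5)
        A_hat = d_inv_sqrt.unsqueeze(1) * A_tilde * d_inv_sqrt.unsqueeze(0)
        H = X
        for W in self.weights:
            H = torch.relu(A_hat @ H @ W)  # one propagation step of Eq. (1)
        return H  # node embeddings, one row per node

# Calling the same module on (X1, A1) and (X2, A2) realizes the parameter
# sharing of the twin networks.
```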
Graph-level Embedding Aggregation Layers. With the computed node embeddings $H_1$ and $H_2$ for each graph, we need to aggregate them to formulate the corresponding graph-level embeddings $\mathbf{h}_{G_1}$ and $\mathbf{h}_{G_2}$,

$\mathbf{h}_{G} = \mathrm{Aggregate}\big(\{\mathbf{h}_v : v \in G\}\big). \qquad (2)$
We employ different aggregation functions, such as element-wise max/mean pooling (Max/Avg), element-wise max/mean pooling applied after a fully connected layer on the node embeddings (FCMax/FCAvg), and an LSTM-based aggregator. Although the LSTM is not permutation-invariant on a set of node embeddings, it may be more expressive in aggregation and has been applied in previous work Hamilton et al. (2017); Zhang et al. (2019).
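As an illustration, the FCMax aggregator of Eq. (2) could be sketched as follows (a minimal example; the names and the ReLU choice are assumptions, not the released implementation):

```python
import torch
import torch.nn as nn

class FCMaxAggregator(nn.Module):
    """FCMax: a node-wise fully connected layer followed by element-wise
    max pooling over the nodes -- one instance of Aggregate(.) in Eq. (2)."""
    def __init__(self, dim=100):
        super().__init__()
        self.fc = nn.Linear(dim, dim)

    def forward(self, H):  # H: (num_nodes, dim) node embeddings
        return torch.relu(self.fc(H)).max(dim=0).values  # (dim,) graph embedding
```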
Graph-Graph Matching & Prediction Layers. After the graph-level embeddings $\mathbf{h}_{G_1}$ and $\mathbf{h}_{G_2}$ are computed for $G_1$ and $G_2$, we use the resulting graph embeddings to compute the graph similarity score of $(G_1, G_2)$. As it is common to employ cosine similarity in classification tasks Xu et al. (2017); Gu et al. (2018), we directly compute the cosine similarity of the two graph-level embeddings,

$\hat{y} = \cos\big(\mathbf{h}_{G_1}, \mathbf{h}_{G_2}\big). \qquad (3)$
In contrast, the outputs of the regression tasks are continuous values in the range $[0, 1]$. Thus, for the regression tasks, we first concatenate the two graph embeddings into $[\mathbf{h}_{G_1}; \mathbf{h}_{G_2}]$, employ standard fully connected layers to gradually project the dimension of the resulting vector down to 1, and finally apply the sigmoid function to enforce a similarity score in the range $[0, 1]$,

$\hat{y} = \mathrm{sigmoid}\big(\mathrm{MLP}([\mathbf{h}_{G_1}; \mathbf{h}_{G_2}])\big). \qquad (4)$
For both tasks, we train the SGNN model with the mean squared error loss function to compare the computed similarity score $\hat{y}$ with the ground-truth similarity score $y$, i.e., $\mathcal{L} = \frac{1}{|\mathcal{D}|} \sum_{(G_1, G_2, y) \in \mathcal{D}} (\hat{y} - y)^2$.
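A minimal sketch of these matching and prediction layers (the `mlp` head and its dimensions are assumptions for illustration):

```python
import torch
import torch.nn.functional as F

def predict_similarity(hg1, hg2, task, mlp=None):
    """Eq. (3) for graph-graph classification, Eq. (4) for regression.
    `mlp` is an assumed stack of fully connected layers projecting the
    concatenated embedding down to a single scalar."""
    if task == "classification":
        return F.cosine_similarity(hg1, hg2, dim=-1)                      # Eq. (3)
    return torch.sigmoid(mlp(torch.cat([hg1, hg2], dim=-1))).squeeze(-1)  # Eq. (4)

# Both tasks are trained with the mean squared error against the
# ground-truth score y, i.e., loss = F.mse_loss(y_hat, y).
```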
3.2 NGMN for Cross-level Node-Graph Interaction Learning

Although global-level interaction learning captures the important structural and feature information of two graphs to some extent, it is not sufficient, since it ignores the cross-level interactions between parts of the two graphs. In particular, existing work has considered either global-level graph-graph interactions or low-level node-node interactions, ignoring the rich cross-level interactions between the nodes of one graph and another whole graph. Inspired by these observations, we propose a novel node-graph matching network to effectively learn these cross-level interaction features, and we describe each part in detail below.
Node Embedding Layers. As described in Section 3.1, we employ the three-layer GCN to generate node embeddings $H_1$ and $H_2$ for graphs $G_1$ and $G_2$. Conceptually, the node embedding layers of NGMN could be an independent GCN or a GCN shared with SGNN. As shown in Figure 1, our NGMN shares the same graph encoder (i.e., GCN) with SGNN for two reasons: i) it reduces the number of parameters by half, which helps mitigate possible overfitting; ii) it maintains the consistency of the resulting node embeddings for both NGMN and SGNN, potentially leading to better-aligned global-level and cross-level interaction features.
Node-Graph Matching Layers. This layer is the key part of NGMN; it effectively learns the cross-level interactions between the nodes of one graph and another whole graph. There are generally two steps in this layer: i) calculate the attentive graph-level embedding of one graph with respect to each node of the other; ii) compare the node embeddings of one graph with the associated attentive graph-level embedding of the other whole graph, producing a similarity feature vector. To build tighter interactions between the two graphs when learning the graph-level embedding of each other, we first calculate the cross-graph attention coefficients $\alpha_{i,j}$ between node $v_i$ in $G_1$ and all nodes $v_j$ in $G_2$; similarly, we calculate the cross-graph attention coefficients $\beta_{j,i}$ between node $v_j$ in $G_2$ and all nodes $v_i$ in $G_1$. These two kinds of cross-graph attention coefficients are computed independently with an attention function $f_a$,

$\alpha_{i,j} = f_a\big(\mathbf{h}_i^{(1)}, \mathbf{h}_j^{(2)}\big), \qquad \beta_{j,i} = f_a\big(\mathbf{h}_j^{(2)}, \mathbf{h}_i^{(1)}\big), \qquad (5)$
where $f_a(\cdot, \cdot)$ is the attention function for computing the similarity score. For simplicity, we use the cosine function in our experiments, but other similarity metrics can be adopted as well. Then, we compute the attentive graph-level embeddings as the weighted average of the node embeddings of the other graph,

$\bar{\mathbf{h}}_i^{(2)} = \frac{\sum_{j=1}^{M} \alpha_{i,j}\, \mathbf{h}_j^{(2)}}{\sum_{j=1}^{M} \alpha_{i,j}}, \qquad \bar{\mathbf{h}}_j^{(1)} = \frac{\sum_{i=1}^{N} \beta_{j,i}\, \mathbf{h}_i^{(1)}}{\sum_{i=1}^{N} \beta_{j,i}}. \qquad (6)$
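A possible vectorized sketch of Eqs. (5)-(6), assuming cosine attention and normalization by the sum of the coefficients (a softmax would be an equally plausible normalization):

```python
import torch.nn.functional as F

def attentive_graph_embeddings(H1, H2):
    """For every node of G1, attend over all nodes of G2 and return the
    weighted average of G2's node embeddings (call with swapped arguments
    for the other direction)."""
    # alpha[i, j] = cos(h_i^(1), h_j^(2))  -- Eq. (5)
    alpha = F.cosine_similarity(H1.unsqueeze(1), H2.unsqueeze(0), dim=-1)  # (N, M)
    # Weighted average of the other graph's node embeddings  -- Eq. (6)
    alpha = alpha / alpha.sum(dim=1, keepdim=True).clamp(min=1e-8)
    return alpha @ H2  # (N, d): one attentive embedding of G2 per node of G1
```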
Next, we define a multi-perspective matching function $f_m$ that computes a similarity feature vector by comparing two vectors from multiple perspectives,

$\mathbf{m} = f_m(\mathbf{h}_1, \mathbf{h}_2; W), \quad \text{with} \quad m_k = f_s\big(\mathbf{w}_k \odot \mathbf{h}_1,\; \mathbf{w}_k \odot \mathbf{h}_2\big), \quad k = 1, \dots, P, \qquad (7)$

where $\mathbf{m}$ is a $P$-dimensional similarity feature vector, $W \in \mathbb{R}^{P \times d}$ is a trainable weight matrix, and each row $\mathbf{w}_k$ represents one perspective, with $P$ the total number of perspectives. Notably, $f_s$ could be any similarity function; we use the cosine similarity metric in our experiments. It is worth noting that the proposed $f_m$ essentially shares a similar spirit with multi-head attention Vaswani et al. (2017), with the difference that multi-head attention uses $P$ weight matrices instead of $P$ weight vectors. A sketch of $f_m$ is given below.
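As referenced above, a minimal sketch of $f_m$ with cosine as $f_s$ (the shapes follow the definitions in Eq. (7); variable names are illustrative):

```python
import torch
import torch.nn.functional as F

def multi_perspective_match(h1, h2, W):
    """Eq. (7): compare two d-dim vectors from P perspectives.
    W is the trainable (P, d) weight matrix; its k-th row re-weights both
    inputs element-wise before a cosine comparison."""
    v1 = W * h1.unsqueeze(0)  # (P, d): w_k * h1 for every perspective k
    v2 = W * h2.unsqueeze(0)  # (P, d): w_k * h2
    return F.cosine_similarity(v1, v2, dim=-1)  # (P,) similarity feature vector
```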
We can therefore use $f_m$ to compare the $i$-th or $j$-th node embedding of one graph with the corresponding attentive graph-level embedding of the other to capture the cross-level node-graph interactions. The resulting similarity feature vectors (w.r.t. node $v_i$ in $G_1$ or node $v_j$ in $G_2$) are computed by

$\mathbf{m}_i^{(1)} = f_m\big(\mathbf{h}_i^{(1)}, \bar{\mathbf{h}}_i^{(2)}; W\big), \qquad \mathbf{m}_j^{(2)} = f_m\big(\mathbf{h}_j^{(2)}, \bar{\mathbf{h}}_j^{(1)}; W\big). \qquad (8)$
After performing node-graph matching over all nodes of both $G_1$ and $G_2$, the newly produced interaction feature matrices $M_1 \in \mathbb{R}^{N \times P}$ and $M_2 \in \mathbb{R}^{M \times P}$ are ready to be fed into the aggregation layers; a sketch of this step follows.
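Putting the two previous sketches together, the whole matching step of Eq. (8) might look like this (a naive per-node loop for clarity; a batched implementation would vectorize it):

```python
import torch

def node_graph_matching(H1, H2, W):
    """Compare every node embedding with the attentive graph-level
    embedding of the other graph, yielding the interaction feature
    matrices M1 (N, P) and M2 (M, P) of Eq. (8)."""
    H2_att = attentive_graph_embeddings(H1, H2)  # (N, d), defined above
    H1_att = attentive_graph_embeddings(H2, H1)  # (M, d)
    M1 = torch.stack([multi_perspective_match(H1[i], H2_att[i], W)
                      for i in range(H1.size(0))])
    M2 = torch.stack([multi_perspective_match(H2[j], H1_att[j], W)
                      for j in range(H2.size(0))])
    return M1, M2
```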
Aggregation Layers. To aggregate the cross-level interaction feature matrix produced by the node-graph matching layer, we employ a BiLSTM Hochreiter and Schmidhuber (1997) over the unordered feature embeddings,

$\mathbf{h}_{G} = \mathrm{BiLSTM}\big(\{\mathbf{m}_i\}_{i=1}^{|G|}\big), \qquad (9)$
where $\mathbf{h}_G$ is computed by concatenating the last hidden vectors of the two directions and represents the aggregated graph-level embedding for each of $G_1$ and $G_2$. Although other aggregators can also be used, our extensive experiments show that the BiLSTM aggregator achieves consistently better performance than the alternatives (see Appendix A.4). Similar LSTM-type aggregators have also been employed in previous work Hamilton et al. (2017); Zhang et al. (2019).
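A minimal sketch of this BiLSTM aggregator (the hidden size and single-graph batch handling are illustrative assumptions):

```python
import torch
import torch.nn as nn

class BiLSTMAggregator(nn.Module):
    """Eq. (9): aggregate a (num_nodes, P) interaction feature matrix into a
    single graph embedding by concatenating the last hidden states of the
    forward and backward LSTM directions."""
    def __init__(self, in_dim, hid_dim=100):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hid_dim, batch_first=True, bidirectional=True)

    def forward(self, M):                        # M: (num_nodes, in_dim)
        _, (h_n, _) = self.lstm(M.unsqueeze(0))  # h_n: (2, 1, hid_dim)
        return torch.cat([h_n[0, 0], h_n[1, 0]], dim=-1)  # (2 * hid_dim,)
```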
Prediction Layers. After the aggregated graph embeddings $\mathbf{h}_{G_1}$ and $\mathbf{h}_{G_2}$ are obtained, we use these two embeddings to compute the similarity score of $(G_1, G_2)$. As in the prediction layer of SGNN, we use Equations (3) and (4) to predict the similarity score for the classification and regression tasks, respectively. We also use the same mean squared error loss function for model training.
3.3 Discussions on Our Full Model – HGMN
The full model HGMN combines the advantages of both SGNN and NGMN to capture both the global-level graph-graph interaction features and the cross-level node-graph interaction features between two graphs. For the final prediction layer of HGMN, we thus have a total of six aggregated graph embedding vectors: two from SGNN (one per graph) and four from NGMN (the forward and backward BiLSTM aggregation outputs for each graph).
Complexity. The computational complexity of SGNN is $O(|E|d + |V|d^2)$ per graph, where the dominant computation is the sparse matrix-matrix operations in Equation (1). The computational complexity of NGMN is $O(NMd + (N + M)Pd)$, where the most computationally intensive operations are in Equations (6), (7), and (8). These complexities are highly comparable to those of recently proposed models Bai et al. (2019, 2020); Li et al. (2019).
4 Experiments
4.1 Datasets, Experimental Setup, and Baselines
Classification Datasets: we evaluate our model on the task of detecting whether two binary functions are similar (i.e., $y = 1$) or dissimilar (i.e., $y = -1$), which lies at the heart of many binary security problems Feng et al. (2016); Xu et al. (2017); Ding et al. (2019). As we represent binaries with control flow graphs, detecting the similarity between two binaries can be cast as learning the similarity score between two control flow graphs $G_1$ and $G_2$. We prepare two datasets generated from two popular open-source projects: FFmpeg and OpenSSL. In addition, existing work does not consider the impact of graph size on performance; however, we find that the larger the graphs are, the worse the performance is. It is therefore important to evaluate the robustness of graph similarity networks in this setting, so we further split each dataset into 3 sub-datasets ([3, 200], [20, 200], and [50, 200]) according to the range of graph sizes.^{2}

^{2}Although there are many benchmarks for general graph classification tasks, these cannot be directly used in our graph-graph classification tasks, as we cannot simply treat two graphs with the same label as "similar".
Table 1: Statistics of the datasets.

Tasks | Datasets | Graph Size | #Graphs | #Functions | Avg #Nodes | Avg #Edges | #Features
Classification | FFmpeg | [3, 200] | 83,008 | 10,376 | 18.83 | 27.02 | 6
 | | [20, 200] | 31,696 | 7,668 | 51.02 | 75.88 |
 | | [50, 200] | 10,824 | 3,178 | 90.93 | 136.83 |
 | OpenSSL | [3, 200] | 73,953 | 4,249 | 15.73 | 21.97 | 6
 | | [20, 200] | 15,800 | 1,073 | 44.89 | 67.15 |
 | | [50, 200] | 4,308 | 338 | 83.68 | 127.75 |
Regression | AIDS700 | – | 700 | – | 8.90 | 8.80 | 29
 | LINUX1000 | – | 1000 | – | 7.58 | 6.94 | 1
Regression Datasets: we evaluate our model on learning the graph edit distance (GED) Zeng et al. (2009); Gao et al. (2010); Riesen (2015), which measures the structural similarity between two graphs. Formally, GED is defined as the cost of the least expensive sequence of edit operations that transform one graph into another, where an edit operation can be an insertion or a deletion of a node or an edge. In our experiments, we normalize the GED into a similarity score $y = \exp(-\mathrm{nGED}(G_1, G_2)) \in (0, 1]$ (see Appendix A.1.2 for details), and evaluate the models on the two datasets AIDS700 and LINUX1000 from Bai et al. (2019). Table 1 shows the statistics of all datasets, with more details in Appendix A.1.
Implementation Details. We implement our models in PyTorch 1.1 Paszke et al. (2017) and train them with the Adam optimizer Kingma and Ba (2015). We use 3 GCN layers, each with an output dimension of 100, and set the number of perspectives $P$ to 100. For the classification tasks, we train the model for 100 epochs with a learning rate of 0.5e-3. At each epoch, we build the pairwise training data as follows: for each graph $G$ in the training subset, we obtain one positive pair $(G, G_{pos})$ and a corresponding negative pair $(G, G_{neg})$, where $G_{pos}$ is randomly selected from all control flow graphs compiled from the same source function as $G$, and $G_{neg}$ is selected from the graphs of other functions (a sketch of this procedure is given below). By default, each mini-batch includes 5 positive and 5 negative pairs. For the regression tasks, we train the model for 10,000 iterations with a mini-batch of 128 graph pairs and a learning rate of 5e-3. Each pair is a tuple $(G_1, G_2, y)$, where $y$ is the ground-truth normalized GED similarity between $G_1$ and $G_2$. Note that all experiments are conducted on a PC equipped with 8 Intel Xeon 2.2GHz CPU cores and one NVIDIA GTX 1080 Ti GPU. Other model settings and experimental details can be found in Appendix A.2.1.

Baseline Methods:^{3} i) SimGNN Bai et al. (2019) adopts a GCN to encode node features and applies 2 strategies to model the similarity between two graphs, one based on the interaction between the two graph-level embeddings and the other based on histogram features from the two sets of node embeddings; ii) GMN Li et al. (2019) employs a variant of message passing neural networks and improves the node embeddings of one graph by incorporating the information of attentive neighborhoods of the other graph; iii) GraphSim Bai et al. (2020) extends SimGNN by turning the two sets of node embeddings into a similarity matrix and then processing the matrix with CNNs Krizhevsky et al. (2012). Detailed experimental settings are given in Appendix A.2.2.

^{3}As the three baseline methods only consider classification tasks or regression tasks, we slightly adjust the last layer or the loss function of each baseline in order to make fair comparisons on both tasks.
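As referenced above, the per-epoch pair construction for the classification tasks could be sketched as follows; `graphs_by_function`, which maps each source function to the list of CFGs compiled from it, is a hypothetical data layout, not the released pipeline:

```python
import random

def build_classification_pairs(graphs_by_function):
    """Build one positive and one negative pair per training graph."""
    pairs, funcs = [], list(graphs_by_function)
    for fname, cfgs in graphs_by_function.items():
        if len(cfgs) < 2:  # need another compilation of the same source
            continue       # function to form a positive pair
        for g in cfgs:
            g_pos = random.choice([c for c in cfgs if c is not g])
            other = random.choice([f for f in funcs if f != fname])
            g_neg = random.choice(graphs_by_function[other])
            pairs.append((g, g_pos, +1))  # similar
            pairs.append((g, g_neg, -1))  # dissimilar
    return pairs
```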
Note that we have two variants of the full model HGMN: HGMN (FCMax) and HGMN (BiLSTM), where the SGNN part uses either the FCMax or the BiLSTM aggregator, respectively. We repeat all experiments 5 times and report the mean and standard deviation of the results, with the best performance marked in bold.

4.2 Comparison with Baseline Methods
Comparison on the Graph-Graph Classification Tasks. For the graph-graph classification tasks, we measure the Area Under the ROC Curve (AUC) Bradley (1997) of the different models. As shown in Table 2, our models (both the full model HGMN and its key component NGMN) clearly achieve state-of-the-art performance on all 6 sub-datasets of FFmpeg and OpenSSL. In particular, as the graph size increases, both HGMN and NGMN show better and more robust performance than the state-of-the-art methods. In addition, compared with SGNN (Max), NGMN is superior by a large margin, demonstrating the benefit of the multi-perspective node-graph matching mechanism that captures the cross-level interaction features between the node embeddings of one graph and the graph-level embedding of the other. HGMN (i.e., NGMN+SGNN) further improves on NGMN by adding the global-level interaction features learned by SGNN (see more experiments with SGNN under other aggregation functions in Appendix A.3).
Table 2: AUC scores (%) on the graph-graph classification tasks.

Model | FFmpeg | | | OpenSSL | |
 | [3, 200] | [20, 200] | [50, 200] | [3, 200] | [20, 200] | [50, 200]
SimGNN | 95.38±0.76 | 94.31±1.01 | 93.45±0.54 | 95.96±0.31 | 93.58±0.82 | 94.25±0.85
GMN | 94.15±0.62 | 95.92±1.38 | 94.76±0.45 | 96.43±0.61 | 93.03±3.81 | 93.91±1.65
GraphSim | 97.46±0.30 | 96.49±0.28 | 94.48±0.73 | 96.84±0.54 | 94.97±0.98 | 93.66±1.84
SGNN | 93.92±0.07 | 93.82±0.28 | 85.15±1.39 | 91.07±0.10 | 88.94±0.47 | 82.10±0.51
NGMN | 97.73±0.11 | 98.29±0.21 | 96.81±0.96 | 96.56±0.12 | 97.60±0.29 | 92.89±1.31
HGMN (FCMax) | 98.07±0.06 | 98.29±0.10 | 97.83±0.11 | 96.87±0.24 | 97.59±0.24 | 95.58±1.13
HGMN (BiLSTM) | 97.56±0.38 | 98.12±0.04 | 97.16±0.53 | 96.90±0.10 | 97.31±1.07 | 95.87±0.88
Comparison on the Graph-Graph Regression Tasks. For the regression tasks of computing the normalized GED between two graphs, we evaluate the models using the Mean Squared Error ($mse$), Spearman's Rank Correlation Coefficient ($\rho$) Spearman (1904), Kendall's Rank Correlation Coefficient ($\tau$) Kendall (1938), and precision at $k$ (p@$k$). The results on both the AIDS700 and LINUX1000 datasets are summarized in Table 3. Although GraphSim performs better than the other two baselines, our models (the full model HGMN and its key component NGMN) outperform all baselines on both datasets in terms of most evaluation metrics. Moreover, compared with SGNN (Max), NGMN achieves much better performance (see Appendix A.3 for more). This highlights the importance of the proposed node-graph matching mechanism, which effectively captures the cross-level node-graph interactions between the nodes of one graph and another whole graph. HGMN (i.e., SGNN+NGMN) further improves on NGMN by adding the global-level interaction features learned by SGNN.
Table 3: Results on the graph-graph regression tasks.

Datasets | Model | mse (10^-3) | ρ | τ | p@10 | p@20
AIDS700 | SimGNN | 1.376±0.066 | 0.824±0.009 | 0.665±0.011 | 0.400±0.023 | 0.489±0.024
 | GMN | 4.610±0.365 | 0.672±0.036 | 0.497±0.032 | 0.200±0.018 | 0.263±0.018
 | GraphSim | 1.919±0.060 | 0.849±0.008 | 0.693±0.010 | 0.446±0.027 | 0.525±0.021
 | SGNN | 2.822±0.149 | 0.765±0.005 | 0.588±0.004 | 0.289±0.016 | 0.373±0.012
 | NGMN | 1.191±0.048 | 0.904±0.003 | 0.749±0.005 | 0.465±0.011 | 0.538±0.007
 | HGMN (FCMax) | 1.205±0.039 | 0.904±0.002 | 0.749±0.003 | 0.457±0.014 | 0.532±0.016
 | HGMN (BiLSTM) | 1.169±0.036 | 0.905±0.002 | 0.751±0.003 | 0.456±0.019 | 0.539±0.018
LINUX1000 | SimGNN | 2.479±1.038 | 0.912±0.031 | 0.791±0.046 | 0.635±0.328 | 0.650±0.283
 | GMN | 2.571±0.519 | 0.906±0.023 | 0.763±0.035 | 0.888±0.036 | 0.856±0.040
 | GraphSim | 0.471±0.043 | 0.976±0.001 | 0.931±0.003 | 0.956±0.006 | 0.942±0.007
 | SGNN | 11.832±0.698 | 0.566±0.022 | 0.404±0.017 | 0.226±0.106 | 0.492±0.190
 | NGMN | 1.561±0.020 | 0.945±0.002 | 0.814±0.003 | 0.743±0.085 | 0.741±0.086
 | HGMN (FCMax) | 1.575±0.627 | 0.946±0.019 | 0.817±0.034 | 0.807±0.117 | 0.784±0.108
 | HGMN (BiLSTM) | 0.439±0.143 | 0.985±0.005 | 0.919±0.016 | 0.955±0.011 | 0.943±0.014
4.3 Ablation Studies
Different Attention Functions. As discussed in Section 3.2, the proposed multi-perspective matching function $f_m$ shares a similar spirit with the multi-head attention mechanism Vaswani et al. (2017), which makes it interesting to compare them. We therefore investigate the impact of these two attention mechanisms on the proposed NGMN model, with the classification results shown in Table 4. Interestingly, the proposed multi-perspective attention consistently outperforms multi-head attention by quite a large margin. We suspect that this is because multi-perspective attention uses $P$ weight vectors ($P \times d$ parameters) rather than $P$ weight matrices ($P \times d^2$ parameters); for example, with $P = d = 100$ this amounts to $10^4$ versus $10^6$ attention parameters, and the smaller parameterization may significantly reduce potential overfitting.
Table 4: AUC scores (%) of NGMN with multi-perspective versus multi-head attention.

Model | FFmpeg | | | OpenSSL | |
 | [3, 200] | [20, 200] | [50, 200] | [3, 200] | [20, 200] | [50, 200]
Multi-Perspective | 97.73±0.11 | 98.29±0.21 | 96.81±0.96 | 96.56±0.12 | 97.60±0.29 | 92.89±1.31
Multi-Head | 91.18±5.91 | 77.49±5.21 | 68.15±6.97 | 92.81±5.21 | 85.43±5.76 | 56.87±7.53
Different Numbers of Perspectives. We further investigate the impact of the number of perspectives adopted by the node-graph matching layer of NGMN on the classification tasks. Following the same settings as the previous experiments, we only change the number of perspectives $P$ of NGMN. From Table 5, it is clear that the AUC score of NGMN does not consistently increase as the number of perspectives grows. We thus conclude that the model performance is not sensitive to the number of perspectives (from 50 to 150), and we set $P = 100$ by default.
Table 5: AUC scores (%) of NGMN with different numbers of perspectives $P$ (from 50 to 150).

Model | FFmpeg | | | OpenSSL | |
 | [3, 200] | [20, 200] | [50, 200] | [3, 200] | [20, 200] | [50, 200]
NGMN ($P = 50$) | 98.11±0.14 | 97.76±0.14 | 96.93±0.52 | 97.38±0.11 | 97.03±0.84 | 93.38±3.03
NGMN ($P = 75$) | 97.99±0.09 | 97.94±0.14 | 97.41±0.05 | 97.09±0.25 | 98.66±0.11 | 92.10±4.37
NGMN ($P = 100$) | 97.73±0.11 | 98.29±0.21 | 96.81±0.96 | 96.56±0.12 | 97.60±0.29 | 92.89±1.31
NGMN ($P = 125$) | 98.10±0.03 | 98.06±0.08 | 97.26±0.36 | 96.73±0.33 | 98.67±0.11 | 96.03±2.08
NGMN ($P = 150$) | 98.32±0.05 | 98.11±0.07 | 97.92±0.09 | 96.50±0.31 | 98.04±0.03 | 97.13±0.36
Different GNNs. We investigate the impact of different GNNs, including GraphSAGE Hamilton et al. (2017), GIN Xu et al. (2019), and GGNN Li et al. (2016), adopted in the node embedding layer of our NGMN model for both the classification and regression tasks. Table 6 presents the results on the classification tasks (see Table 12 in Appendix A.5 for the regression results). In general, the performance of the different GNNs is quite similar across all datasets and both task types, which indicates that our model is not sensitive to the choice of GNN in the node embedding layers. An interesting observation is that NGMN-GGNN performs even better than our default NGMN-GCN on both the FFmpeg and OpenSSL datasets. This suggests that our model can be further improved by adopting more advanced GNN models or by choosing the most appropriate GNN for a given application task.
Table 6: AUC scores (%) of NGMN with different GNNs in the node embedding layer.

Model | FFmpeg | | | OpenSSL | |
 | [3, 200] | [20, 200] | [50, 200] | [3, 200] | [20, 200] | [50, 200]
NGMN-GCN (Ours) | 97.73±0.11 | 98.29±0.21 | 96.81±0.96 | 96.56±0.12 | 97.60±0.29 | 92.89±1.31
NGMN-GraphSAGE | 97.31±0.56 | 98.21±0.13 | 97.88±0.15 | 96.13±0.30 | 97.30±0.72 | 93.66±3.87
NGMN-GIN | 97.97±0.08 | 98.06±0.22 | 94.66±4.01 | 96.98±0.20 | 97.42±0.48 | 92.29±2.23
NGMN-GGNN | 98.42±0.41 | 99.77±0.07 | 97.93±1.18 | 99.35±0.06 | 98.51±1.04 | 94.17±7.74
5 Related Work
Conventional Graph Matching. In general, graph matching can be categorized into exact graph matching and error-tolerant graph matching. Exact graph matching aims to find a strict correspondence between two (in large parts) identical graphs, while error-tolerant graph matching allows matching between completely non-identical graphs Riesen (2015). In real-world applications, the constraint of exact graph matching is too rigid, and thus a substantial amount of work has addressed the error-tolerant graph matching problem, which is usually quantified by a specific similarity metric, such as GED, maximum common subgraph (MCS) Bunke (1997), or an even coarser binary similarity, depending on the application. Both GED and MCS are well-studied NP-hard problems that suffer from exponential computational complexity and huge memory requirements for exact solutions in practice Bunke (1997); McGregor (1982); Zeng et al. (2009); Blumenthal and Gamper (2018).
Graph Similarity Computation and Graph Matching Networks. Given the great significance and challenge of computing graph similarity, various approximation methods have been proposed to improve accuracy and efficiency, including traditional heuristic methods Gao et al. (2010); Zeng et al. (2009); Riesen (2015); Wu et al. (2019); Yoshida et al. (2019); Wu et al. (2018) and the recent data-driven graph matching networks Bai et al. (2019, 2020); Li et al. (2019) detailed as baselines in Section 4.1. Our research belongs to the family of graph matching networks but differs from prior work in two main aspects. First, unlike prior work that only considers graph-level or node-level interaction features, our HGMN model captures the richer interactions between the nodes of one graph and another whole graph. Second, our work is the first to systematically evaluate performance on both graph-graph classification and graph-graph regression tasks, as well as the impact of the size of the input graphs.

6 Conclusion and Future Work
In this paper, we presented a novel hierarchical graph matching network (HGMN) for computing the graph similarity between any pair of graph-structured objects. Our model jointly learns graph embeddings and a data-driven graph matching metric for computing graph similarity in an end-to-end fashion. We further proposed a new node-graph matching network for effectively learning cross-level interactions between two graphs, beyond low-level node-node and global-level graph-graph interactions. Our extensive experimental results corroborated the superior performance of HGMN over state-of-the-art baselines on both graph-graph classification and regression tasks. One interesting future direction is to adapt HGMN to different real-world applications, such as unknown malware detection, text matching and entailment, and knowledge graph question answering.
References
[1] Bai et al. (2019) SimGNN: a neural network approach to fast graph similarity computation. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, pp. 384-392.
[2] Bai et al. (2020) Learning-based efficient graph similarity computation via multi-scale convolutional set matching. In Thirty-Fourth AAAI Conference on Artificial Intelligence.
[3] Bertinetto et al. (2016) Fully-convolutional siamese networks for object tracking. In European Conference on Computer Vision, pp. 850-865.
[4] Blumenthal and Gamper (2018) On the exact computation of the graph edit distance. Pattern Recognition Letters.
[5] Bradley (1997) The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition.
[6] Bromley et al. (1994) Signature verification using a "siamese" time delay neural network. In Advances in Neural Information Processing Systems, pp. 737-744.
[7] Bronstein et al. (2017) Geometric deep learning: going beyond Euclidean data. IEEE Signal Processing Magazine 34 (4), pp. 18-42.
[8] Bunke and Allermann (1983) Inexact graph matching for structural pattern recognition. Pattern Recognition Letters 1 (4), pp. 245-253.
[9] Bunke (1997) On a relation between graph edit distance and maximum common subgraph. Pattern Recognition Letters 18 (8), pp. 689-694.
[10] Caetano et al. (2009) Learning graph matching. IEEE Transactions on Pattern Analysis and Machine Intelligence 31 (6), pp. 1048-1058.
[11] Ding et al. (2019) Asm2Vec: boosting static representation robustness for binary clone search against code obfuscation and compiler optimization. In IEEE Symposium on Security and Privacy (S&P).
[12] Feng et al. (2016) Scalable graph-based bug search for firmware images. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security.
[13] Gao et al. (2010) A survey of graph edit distance. Pattern Analysis and Applications 13 (1), pp. 113-129.
[14] Gu et al. (2018) Deep code search. In 2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE), pp. 933-944.
[15] Guo et al. (2018) Neural graph matching networks for few-shot 3D action recognition. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 653-669.
[16] Hamilton et al. (2017) Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems.
[17] Hart et al. (1968) A formal basis for the heuristic determination of minimum cost paths. IEEE Transactions on Systems Science and Cybernetics 4 (2), pp. 100-107.
[18] He et al. (2015) Multi-perspective sentence similarity modeling with convolutional neural networks. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 1576-1586.
[19] Hochreiter and Schmidhuber (1997) Long short-term memory. Neural Computation.
[20] Kendall (1938) A new measure of rank correlation. Biometrika.
[21] Kingma and Ba (2015) Adam: a method for stochastic optimization. In International Conference on Learning Representations.
[22] Kipf and Welling (2017) Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations.
[23] Krizhevsky et al. (2012) ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 1097-1105.
[24] Li et al. (2019) Graph matching networks for learning the similarity of graph structured objects. In ICML.
[25] Li et al. (2016) Gated graph sequence neural networks. In International Conference on Learning Representations.
[26] Ma et al. (2019) Graph convolutional networks with EigenPooling. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 723-731.
[27] McGregor (1982) Backtrack search algorithms and the maximal common subgraph problem. Software: Practice and Experience 12 (1), pp. 23-34.
[28] Mueller and Thyagarajan (2016) Siamese recurrent architectures for learning sentence similarity. In Thirtieth AAAI Conference on Artificial Intelligence.
[29] Paszke et al. (2017) Automatic differentiation in PyTorch.
[30] Riesen et al. (2013) A novel software toolkit for graph edit distance computation. In International Workshop on Graph-Based Representations in Pattern Recognition, pp. 142-151.
[31] Riesen et al. (2010) Exact and inexact graph matching: methodology and applications. In Managing and Mining Graph Data, pp. 217-247.
[32] Riesen (2015) Structural pattern recognition with graph edit distance. In Advances in Computer Vision and Pattern Recognition.
[33] Spearman (1904) The proof and measurement of association between two things. American Journal of Psychology.
[34] Varior et al. (2016) Gated siamese convolutional neural network architecture for human re-identification. In European Conference on Computer Vision, pp. 791-808.
[35] Vaswani et al. (2017) Attention is all you need. In Advances in Neural Information Processing Systems, pp. 5998-6008.
[36] Velickovic et al. (2018) Graph attention networks. In International Conference on Learning Representations.
[37] Wang et al. (2019) Heterogeneous graph matching networks for unknown malware detection. In Proceedings of the International Joint Conference on Artificial Intelligence.
[38] Wang et al. (2012) An efficient graph indexing method. In 2012 IEEE 28th International Conference on Data Engineering.
[39] Wu et al. (2018) D2KE: from distance to kernel and embedding. arXiv preprint arXiv:1802.04956.
[40] Wu et al. (2019) Scalable global alignment graph kernel using random features: from node embedding to graph embedding. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1418-1428.
[41] Xu et al. (2019) How powerful are graph neural networks? In International Conference on Learning Representations.
[42] Xu et al. (2017) Neural network-based graph embedding for cross-platform binary code similarity detection. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security.
[43] Yan and Han (2002) gSpan: graph-based substructure pattern mining. In Proceedings of the IEEE International Conference on Data Mining, pp. 721-724.
[44] Ying et al. (2018) Hierarchical graph representation learning with differentiable pooling. In Advances in Neural Information Processing Systems, pp. 4800-4810.
[45] Yoshida et al. (2019) Learning interpretable metric between graphs: convex formulation and computation with graph mining. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1026-1036.
[46] Zeng et al. (2009) Comparing stars: on approximating graph edit distance. Proceedings of the VLDB Endowment 2 (1), pp. 25-36.
[47] Zhang et al. (2019) Heterogeneous graph neural network. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining.
Appendix A Appendix
A.1 Datasets
A.1.1 Classification Datasets
In our evaluation, two binary functions compiled from the same source code but under different settings (architectures, compilers, optimization levels, etc.) are considered to be semantically similar to each other. Note that one source-code function, after being compiled with different settings, can generate various binary functions. To learn similarity scores from pairs of binary functions, we represent the binary functions with control flow graphs, whose nodes represent basic blocks (a basic block is a sequence of instructions without jumps) and whose edges represent the control flow paths between these basic blocks. Thus, detecting the similarity between two binary functions can be cast as the problem of learning the similarity score $y$ between two control flow graphs $G_1$ and $G_2$, where $y = 1$ indicates that $G_1$ and $G_2$ are similar and $y = -1$ indicates that they are dissimilar. We prepare two benchmark datasets generated from two popular open-source projects, FFmpeg and OpenSSL, to evaluate our model on the graph-graph classification tasks.
For FFmpeg, we prepare the corresponding control flow graph (CFG) dataset as a benchmark for detecting binary function similarity. First, we compile FFmpeg 4.1.4 using 2 different compilers (gcc 5.4.0 and clang 3.8.0) and 4 different compiler optimization levels (O0-O3), generating 8 different binary files. Second, these 8 binaries are disassembled using IDA Pro,^{4} which produces CFGs for all disassembled functions. Finally, for each basic block in the CFGs, we extract 6 block-level numeric features as the initial node representation using IDAPython (a Python-based plugin in IDA Pro).

^{4}IDA Pro disassembler, https://www.hex-rays.com/products/ida/index.shtml.
OpenSSL is built from OpenSSL (v1.0.1f and v1.0.1u) using gcc 5.4 for three different architectures (x86, MIPS, and ARM) and four different optimization levels (O0-O3). The OpenSSL dataset that we evaluate was previously released by [42] and is publicly available,^{5} with the same 6 block-level numeric features prepared.

^{5}https://github.com/xiaojunxu/dnn-binary-code-similarity.
Overall, for both the FFmpeg and OpenSSL datasets, each node in the CFGs is initialized with 6 block-level numeric features: # of string constants, # of numeric constants, # of total instructions, # of transfer instructions, # of call instructions, and # of arithmetic instructions.
A.1.2 Regression Datasets
Instead of directly computing the graph edit distance (GED) between two graphs $G_1$ and $G_2$, we try to learn a similarity score $y$, which is the exponential of the normalized GED and lies in the range $(0, 1]$. To be specific, $y = \exp(-\mathrm{nGED}(G_1, G_2))$ with $\mathrm{nGED}(G_1, G_2) = \frac{\mathrm{GED}(G_1, G_2)}{(|G_1| + |G_2|)/2}$, where $|G_1|$ and $|G_2|$ denote the numbers of nodes of $G_1$ and $G_2$, and $\mathrm{nGED}$ and $\mathrm{GED}$ denote the normalized and unnormalized graph edit distances between $G_1$ and $G_2$, respectively.
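As a worked example with illustrative numbers: if $\mathrm{GED}(G_1, G_2) = 2$, $|G_1| = 8$, and $|G_2| = 10$, then $\mathrm{nGED} = 2 / \frac{8 + 10}{2} = 2/9 \approx 0.222$, and the similarity score is $y = e^{-0.222} \approx 0.80$.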
We employ the AIDS700 and LINUX1000 datasets released by [1], which are publicly available.^{6} Each dataset contains a set of graph pairs together with their ground-truth GED scores, which are computed by exponential-time exact GED computation algorithms [17, 30]. As the ground-truth GEDs of another dataset, IMDB-MULTI, are provided only as inexact approximations, we do not consider this dataset in our experiments.

^{6}https://github.com/yunshengb/SimGNN.
AIDS700 is a subset of the AIDS dataset, a collection of AIDS antiviral screen chemical compounds from the Developmental Therapeutics Program (DTP) at the National Cancer Institute (NCI).^{7} Originally, AIDS contains 42,687 chemical compounds, each of which can be represented as a graph with atoms as nodes and bonds as edges. To avoid calculating the ground-truth GED between graphs with large numbers of nodes, the authors of [1] created the AIDS700 dataset, which contains 700 graphs with 10 or fewer nodes. For each graph in AIDS700, every node is labeled with the element type of its atom and every edge is unlabeled (i.e., bond features are ignored).

^{7}https://wiki.nci.nih.gov/display/NCIDTPdata/AIDS+Antiviral+Screen+Data
LINUX1000 is likewise a subset of the LINUX dataset introduced in [38]. The original LINUX dataset is a collection of 48,747 program dependence graphs generated from the Linux kernel. Each graph is a static representation of the data flow and control dependency within one function, with each node assigned to one statement and each edge describing the dependency between two statements. For the same reason as above (avoiding ground-truth GED computation between graphs with many nodes), the LINUX1000 dataset used in [1] is randomly selected and contains 1000 graphs with 10 or fewer nodes. For each graph in LINUX1000, both nodes and edges are unlabeled.
Table 7 provides more detailed statistics for both the classification and regression datasets. In our evaluation, for the classification tasks, we split each dataset into three disjoint subsets of binary functions for training/validation/testing. For the regression tasks, we first split the graphs of each dataset into training, validation, and testing sets, and then build the pairwise training/validation/testing data as in previous work [1].
Table 7: Detailed statistics of the datasets; node, edge, and degree statistics are reported as (min / max / avg).

Tasks | Datasets | Graph Size | #Graphs | #Functions | #Nodes | #Edges | Avg. Degree
Classification | FFmpeg | [3, 200] | 83,008 | 10,376 | (3 / 200 / 18.83) | (2 / 332 / 27.02) | (1.25 / 4.33 / 2.59)
 | | [20, 200] | 31,696 | 7,668 | (20 / 200 / 51.02) | (20 / 352 / 75.88) | (1.90 / 4.33 / 2.94)
 | | [50, 200] | 10,824 | 3,178 | (50 / 200 / 90.93) | (52 / 352 / 136.83) | (2.00 / 4.33 / 3.00)
 | OpenSSL | [3, 200] | 73,953 | 4,249 | (3 / 200 / 15.73) | (1 / 376 / 21.97) | (0.12 / 3.95 / 2.44)
 | | [20, 200] | 15,800 | 1,073 | (20 / 200 / 44.89) | (2 / 376 / 67.15) | (0.12 / 3.95 / 2.95)
 | | [50, 200] | 4,308 | 338 | (50 / 200 / 83.68) | (52 / 376 / 127.75) | (2.00 / 3.95 / 3.04)
Regression | AIDS700 | – | 700 | – | (2 / 10 / 8.90) | (1 / 14 / 8.80) | (1.00 / 2.80 / 1.96)
 | LINUX1000 | – | 1000 | – | (4 / 10 / 7.58) | (3 / 13 / 6.94) | (1.50 / 2.60 / 1.81)
A.2 More Experimental Setup
A.2.1 Other experimental settings for our models
For SGNN, we use three GCN layers in the node embedding layer, each with an output dimension of 100. We use ReLU as the activation function, along with a dropout layer after each GCN layer with a dropout rate of 0.1. In the graph-level embedding aggregation layer of SGNN, we can employ different aggregation functions (i.e., Max, FCMax, Avg, FCAvg, and BiLSTM), as stated in Section 3.1. For NGMN, we set the number of perspectives $P$ to 100. We also tried the different aggregation functions used for SGNN and found that BiLSTM consistently performs better than the others (see Appendix A.4). Thus, we take BiLSTM as the default aggregation function for NGMN and make its hidden size equal to the dimension of the node embeddings. For each graph, we concatenate the last hidden vectors of the two directions of the BiLSTM, which results in a 200-dimensional vector as the graph embedding.

A.2.2 Detailed experimental settings for baseline models
In principle, we follow the same experimental settings as in the baselines' original papers and adjust a few settings to fit the specific tasks. For instance, SimGNN was originally designed for graph-graph regression tasks, so we modify the final layer of its architecture so that it can be fairly evaluated on graph-graph classification tasks. The detailed experimental settings of all three baseline methods for both classification and regression tasks are given as follows.
SimGNN: SimGNN first adopts a three-layer GCN to encode each node of a pair of graphs into a vector. Then, SimGNN employs a two-stage strategy to model the similarity between the two graphs: i) it uses a Neural Tensor Network (NTN) module to relate the two graph-level embeddings, which are aggregated by a node attention mechanism; ii) it uses histogram features extracted from the pairwise node-node similarity scores. Finally, the features learned by the two-stage strategy are concatenated and fed into multiple fully connected layers to obtain the final prediction.
For the graph-graph regression tasks, the output dimensions of the three GCN layers are 64, 32, and 16, respectively. The number of slices $K$ in the NTN and the number of histogram bins are both set to 16. Four fully connected layers are employed to reduce the dimension of the concatenated results from 32 to 16, 16 to 8, 8 to 4, and 4 to 1. For training, the mean squared error (MSE) loss is used with the Adam optimizer. The learning rate is set to 0.001 and the batch size to 128. We set the number of iterations to 10,000 and select the best model based on the lowest validation loss.
To fairly compare our model with SimGNN on the graph-graph classification tasks, we adjust the settings of SimGNN as follows. We follow the same architecture as in the regression tasks, except that the output dimension of the last fully connected layer is set to 2, and we apply a softmax over the output of SimGNN to obtain the predicted binary label. For training, we use the cross-entropy loss and set the number of epochs to 100. The other training hyperparameters are kept the same as in the regression tasks.
GMN: the core idea of GMN is to improve the node embeddings of one graph by incorporating the implicit neighbors of the other graph through a soft attention mechanism. GMN follows a model architecture similar to neural message passing networks, with three components: an encoder layer that maps nodes and edges to initial feature vectors, a propagation layer that further updates the node embeddings through the proposed strategies, and an aggregator that computes a graph-level representation for each graph.
For the graph-graph classification tasks, we use a 1-layer MLP as the node/edge encoder and set the number of propagation rounds to 5. The dimension of the node features is set to 32 and the dimension of the graph-level representation to 128. The Hamming distance is employed to compute the distance between the two graph-level representation vectors. Based on the Hamming distance, we train the model with the margin-based pairwise loss for 100 epochs, validating once per epoch. The Adam optimizer is used with a learning rate of 0.001 and a batch size of 10.
To enable fair comparisons with GMN on the graph-graph regression tasks, we adjust the GMN architecture by concatenating the graph-level representations of the two graphs and feeding the result into four fully connected layers, as in SimGNN, so that the final output dimension is reduced to 1. For training, we use the mean squared error loss with a batch size of 128. The other settings remain the same as in the classification tasks.
GraphSim: the main idea of GraphSim is to convert the graph similarity computation problem into a pattern recognition problem. GraphSim first employs a GCN to generate node embeddings for the pair of graphs, then turns the two sets of node embeddings into a similarity matrix consisting of the pairwise node-node similarity scores, feeds these matrices into convolutional neural networks (CNNs), and finally passes the concatenated CNN outputs through multiple fully connected layers to obtain the final predicted graph-graph similarity score.
For the graph-graph regression tasks, three GCN layers are employed, with output dimensions of 128, 64, and 32, respectively. The CNN architecture follows the original GraphSim configuration of alternating convolutional and max-pooling layers (the exact kernel and pooling sizes follow Bai et al. (2020)). Eight fully connected layers are used to reduce the dimension of the concatenated CNN results from 384 to 256, 256 to 128, 128 to 64, 64 to 32, 32 to 16, 16 to 8, 8 to 4, and 4 to 1. For training, the mean squared error (MSE) loss is used with the Adam optimizer. The learning rate is set to 0.001 and the batch size to 128. As with SimGNN, we set the number of iterations to 10,000 and select the best model based on the lowest validation loss.
To make a fair comparison of our model with GraphSim, we also adjust GraphSim to handle the graph-graph classification tasks. We follow the same architecture as in the regression tasks, except that seven fully connected layers are used instead of eight. The output dimension of the final fully connected layer is set to 2, and we apply a softmax over it to obtain the predicted binary label. For training, we use the cross-entropy loss and set the number of epochs to 100. The other training hyperparameters are kept the same as in the regression tasks.
A.2.3 Detailed experimental setup for different GNNs
When performing experiments on how different GNNs affect the performance of NGMN, we only replace the GCN with GraphSAGE, GIN, or GGNN, using the geometric deep learning library PyTorch Geometric.^{8} More specifically, for GraphSAGE, we use a 3-layer GraphSAGE GNN with all output dimensions set to 100. For GIN, we use 3 GIN modules, each with a 1-layer MLP with output dimension 100 as the learnable function. For GGNN, we use 3 one-layer propagation models to replace the 3 GCN layers of our original setting and also set their output dimensions to 100.

^{8}https://pytorch-geometric.readthedocs.io
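A minimal sketch of how the node embedding layer can be swapped between these GNNs with PyTorch Geometric (the factory function itself is an illustrative assumption; only the layer constructors are PyTorch Geometric API):

```python
import torch.nn as nn
from torch_geometric.nn import GCNConv, SAGEConv, GINConv, GatedGraphConv

def make_gnn_layers(name, in_dim, out_dim=100, num_layers=3):
    """Return a stack of message-passing layers matching the settings above."""
    def dim(i):  # the first layer consumes the raw node features
        return in_dim if i == 0 else out_dim
    if name == "gcn":
        layers = [GCNConv(dim(i), out_dim) for i in range(num_layers)]
    elif name == "sage":
        layers = [SAGEConv(dim(i), out_dim) for i in range(num_layers)]
    elif name == "gin":   # 1-layer MLP as the learnable function
        layers = [GINConv(nn.Linear(dim(i), out_dim)) for i in range(num_layers)]
    elif name == "ggnn":  # one propagation step per module, fixed width
        layers = [GatedGraphConv(out_dim, num_layers=1) for _ in range(num_layers)]
    else:
        raise ValueError(f"unknown GNN: {name}")
    return nn.ModuleList(layers)
```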
A.3 SGNN with different aggregation functions for both classification & regression tasks

To further compare our models with the SGNN variants, we train and evaluate several SGNN models with different aggregation functions: Max, FCMax, Avg, FCAvg, and BiLSTM. The classification and regression results are summarized in Table 8 and Table 9, respectively. For both task types, our models (the full model HGMN and its key component NGMN) show statistically significant improvements over all SGNN variants, which indicates the advantage of the proposed node-graph matching network.
Table 8: AUC scores (%) of SGNN with different aggregation functions on the classification tasks.

Model | FFmpeg | | | OpenSSL | |
 | [3, 200] | [20, 200] | [50, 200] | [3, 200] | [20, 200] | [50, 200]
SGNN (BiLSTM) | 96.92±0.13 | 97.62±0.13 | 96.35±0.33 | 95.24±0.06 | 96.30±0.27 | 93.99±0.62
SGNN (Max) | 93.92±0.07 | 93.82±0.28 | 85.15±1.39 | 91.07±0.10 | 88.94±0.47 | 82.10±0.51
SGNN (FCMax) | 95.37±0.04 | 96.29±0.14 | 95.98±0.32 | 92.64±0.15 | 93.79±0.17 | 93.21±0.82
SGNN (Avg) | 95.61±0.05 | 96.09±0.05 | 96.70±0.13 | 92.89±0.09 | 93.90±0.24 | 94.12±0.35
SGNN (FCAvg) | 95.18±0.03 | 95.74±0.15 | 96.43±0.16 | 92.70±0.09 | 93.72±0.19 | 93.49±0.30
NGMN | 97.73±0.11 | 98.29±0.21 | 96.81±0.96 | 96.56±0.12 | 97.60±0.29 | 92.89±1.31
HGMN (Max) | 97.44±0.32 | 97.84±0.40 | 97.22±0.36 | 94.77±1.80 | 97.44±0.26 | 94.06±1.60
HGMN (FCMax) | 98.07±0.06 | 98.29±0.10 | 97.83±0.11 | 96.87±0.24 | 97.59±0.24 | 95.58±1.13
HGMN (BiLSTM) | 97.56±0.38 | 98.12±0.04 | 97.16±0.53 | 96.90±0.10 | 97.31±1.07 | 95.87±0.88
Table 9: Regression results of SGNN with different aggregation functions.

Datasets | Model | mse (10^-3) | ρ | τ | p@10 | p@20
AIDS700 | SGNN (BiLSTM) | 1.422±0.044 | 0.881±0.005 | 0.718±0.006 | 0.376±0.020 | 0.472±0.014
 | SGNN (Max) | 2.822±0.149 | 0.765±0.005 | 0.588±0.004 | 0.289±0.016 | 0.373±0.012
 | SGNN (FCMax) | 3.114±0.114 | 0.735±0.009 | 0.554±0.008 | 0.278±0.021 | 0.364±0.017
 | SGNN (Avg) | 1.453±0.015 | 0.876±0.002 | 0.712±0.002 | 0.353±0.007 | 0.444±0.012
 | SGNN (FCAvg) | 1.658±0.067 | 0.857±0.007 | 0.689±0.008 | 0.305±0.018 | 0.399±0.021
 | NGMN | 1.191±0.048 | 0.904±0.003 | 0.749±0.005 | 0.465±0.011 | 0.538±0.007
 | HGMN (Max) | 1.210±0.020 | 0.900±0.002 | 0.743±0.003 | 0.461±0.012 | 0.534±0.009
 | HGMN (FCMax) | 1.205±0.039 | 0.904±0.002 | 0.749±0.003 | 0.457±0.014 | 0.532±0.016
 | HGMN (BiLSTM) | 1.169±0.036 | 0.905±0.002 | 0.751±0.003 | 0.456±0.019 | 0.539±0.018
LINUX1000 | SGNN (BiLSTM) | 2.140±1.668 | 0.935±0.050 | 0.825±0.100 | 0.978±0.012 | 0.965±0.007
 | SGNN (Max) | 11.832±0.698 | 0.566±0.022 | 0.404±0.017 | 0.226±0.106 | 0.492±0.190
 | SGNN (FCMax) | 17.795±0.406 | 0.362±0.021 | 0.252±0.015 | 0.239±0.000 | 0.241±0.000
 | SGNN (Avg) | 2.343±0.453 | 0.933±0.012 | 0.790±0.017 | 0.778±0.048 | 0.811±0.050
 | SGNN (FCAvg) | 3.211±0.318 | 0.909±0.004 | 0.757±0.008 | 0.831±0.163 | 0.813±0.159
 | NGMN | 1.561±0.020 | 0.945±0.002 | 0.814±0.003 | 0.743±0.085 | 0.741±0.086
 | HGMN (Max) | 1.054±0.086 | 0.962±0.003 | 0.850±0.008 | 0.877±0.054 | 0.883±0.047
 | HGMN (FCMax) | 1.575±0.627 | 0.946±0.019 | 0.817±0.034 | 0.807±0.117 | 0.784±0.108
 | HGMN (BiLSTM) | 0.439±0.143 | 0.985±0.005 | 0.919±0.016 | 0.955±0.011 | 0.943±0.014
A.4 NGMN with different aggregation functions for both classification & regression tasks

We investigate the impact of different aggregation functions adopted in the aggregation layer of NGMN for both classification and regression tasks. Following the default settings of the previous experiments, we only change the aggregation layer of NGMN, using six possible aggregation functions: Max, FCMax, Avg, FCAvg, LSTM, and BiLSTM. As can be observed from Table 10 and Table 11, BiLSTM offers superior performance on all datasets for both task types in terms of most evaluation metrics. We therefore take BiLSTM as the default aggregation function for NGMN and fix it for the NGMN part of the HGMN models.
Table 10: AUC scores (%) of NGMN with different aggregation functions on the classification tasks.

Model | FFmpeg | | | OpenSSL | |
 | [3, 200] | [20, 200] | [50, 200] | [3, 200] | [20, 200] | [50, 200]
NGMN (Max) | 73.74±8.30 | 73.85±1.76 | 77.72±2.07 | 67.14±2.70 | 63.31±3.29 | 63.02±2.77
NGMN (FCMax) | 97.28±0.08 | 96.61±0.17 | 96.65±0.30 | 95.37±0.19 | 96.08±0.48 | 95.90±0.73
NGMN (Avg) | 85.92±1.07 | 83.29±4.49 | 85.52±1.42 | 80.10±4.59 | 70.81±3.41 | 66.94±4.33
NGMN (FCAvg) | 95.93±0.21 | 73.90±0.70 | 94.22±0.06 | 93.38±0.80 | 94.52±1.16 | 94.71±0.86
NGMN (LSTM) | 97.16±0.42 | 97.02±0.99 | 84.65±6.73 | 96.30±0.69 | 97.51±0.82 | 89.41±8.40
NGMN (BiLSTM) | 97.73±0.11 | 98.29±0.21 | 96.81±0.96 | 96.56±0.12 | 97.60±0.29 | 92.89±1.31
Table 11: Regression results of NGMN with different aggregation functions.

Datasets | Model | mse (10^-3) | ρ | τ | p@10 | p@20
AIDS700 | NGMN (Max) | 2.378±0.244 | 0.813±0.015 | 0.642±0.013 | 0.578±0.199 | 0.583±0.169
 | NGMN (FCMax) | 2.220±1.547 | 0.808±0.145 | 0.656±0.122 | 0.425±0.078 | 0.504±0.064
 | NGMN (Avg) | 1.524±0.161 | 0.880±0.010 | 0.717±0.012 | 0.408±0.044 | 0.474±0.027
 | NGMN (FCAvg) | 1.281±0.075 | 0.895±0.006 | 0.737±0.008 | 0.453±0.015 | 0.527±0.016
 | NGMN (LSTM) | 1.290±0.037 | 0.895±0.004 | 0.737±0.005 | 0.448±0.007 | 0.520±0.012
 | NGMN (BiLSTM) | 1.191±0.048 | 0.904±0.003 | 0.749±0.005 | 0.465±0.011 | 0.538±0.007
LINUX1000 | NGMN (Max)* | 16.921±0.000 | – | – | – | –
 | NGMN (FCMax) | 4.793±0.262 | 0.829±0.006 | 0.665±0.011 | 0.764±0.170 | 0.767±0.166
 | NGMN (Avg) | 4.050±0.594 | 0.888±0.008 | 0.719±0.012 | 0.501±0.093 | 0.536±0.112
 | NGMN (FCAvg) | 6.953±0.195 | 0.897±0.004 | 0.736±0.005 | 0.499±0.126 | 0.509±0.129
 | NGMN (LSTM) | 1.535±0.096 | 0.945±0.004 | 0.813±0.007 | 0.695±0.064 | 0.698±0.081
 | NGMN (BiLSTM) | 1.561±0.020 | 0.945±0.002 | 0.814±0.003 | 0.743±0.085 | 0.741±0.086

* As none of the repeated runs in this setting converged during training, the corresponding result metrics could not be calculated.
A.5 NGMN with different GNNs on regression tasks

As a supplement to Table 6 in Section 4.3, Table 12 shows the results of GCN versus GraphSAGE/GIN/GGNN in NGMN on the regression tasks.
Table 12: Regression results of NGMN with different GNNs in the node embedding layer.

Datasets | Model | mse (10^-3) | ρ | τ | p@10 | p@20
AIDS700 | NGMN-GCN (Ours) | 1.191±0.048 | 0.904±0.003 | 0.749±0.005 | 0.465±0.011 | 0.538±0.007
 | NGMN-GraphSAGE | 1.275±0.054 | 0.901±0.006 | 0.745±0.008 | 0.448±0.016 | 0.533±0.014
 | NGMN-GIN | 1.367±0.085 | 0.889±0.008 | 0.729±0.010 | 0.400±0.022 | 0.492±0.021
 | NGMN-GGNN | 1.870±0.082 | 0.871±0.004 | 0.706±0.005 | 0.388±0.015 | 0.457±0.017
LINUX1000 | NGMN-GCN (Ours) | 1.561±0.020 | 0.945±0.002 | 0.814±0.003 | 0.743±0.085 | 0.741±0.086
 | NGMN-GraphSAGE | 2.784±0.705 | 0.915±0.019 | 0.767±0.028 | 0.682±0.183 | 0.693±0.167
 | NGMN-GIN | 1.126±0.164 | 0.963±0.006 | 0.858±0.015 | 0.792±0.068 | 0.821±0.035
 | NGMN-GGNN | 2.068±0.991 | 0.938±0.028 | 0.815±0.055 | 0.628±0.189 | 0.654±0.176