Convolutional Neural Networks for Fast Approximation of Graph Edit Distance

09/10/2018 ∙ by Yunsheng Bai, et al. ∙ Purdue University

Graph Edit Distance (GED) computation is a core operation of many widely-used graph applications, such as graph classification, graph matching, and graph similarity search. However, computing the exact GED between two graphs is NP-complete. Most current approximate algorithms are based on solving a combinatorial optimization problem, which involves complicated design and high time complexity. In this paper, we propose a novel end-to-end neural network based approach to GED approximation, aiming to alleviate the computational burden while preserving good performance. The proposed approach, named GSimCNN, turns GED computation into a learning problem. Each graph is considered as a set of nodes, represented by learnable embedding vectors. The GED computation is then considered as a two-set matching problem, where a higher matching score leads to a lower GED. A Convolutional Neural Network (CNN) based approach is proposed to tackle the set matching problem. We test our algorithm on three real graph datasets, and our model achieves significant performance enhancement against state-of-the-art approximate GED computation algorithms.


1 Introduction

Recent years have seen growing importance of graph-based applications in the domains of chemistry, bioinformatics, recommender systems, social network study, program static analysis, etc. One of the fundamental problems related to graphs is the computation of distance/similarity between two graphs, which is a core operation for graph similarity search and graph database analysis. Among various definitions of graph distance/similarity, GED [Bunke1983] is one of the most widely adopted metrics. Besides its popularity in graph similarity search [Wang et al.2012, Liang and Zhao2017], GED has also been adopted in graph classification [Riesen and Bunke2008, Riesen and Bunke2009], handwriting recognition [Fischer et al.2013], image indexing [Xiao et al.2008], etc. However, the computation of the GED between two graphs is known to be NP-complete [Bunke and Shearer1998], and even state-of-the-art algorithms cannot reliably compute the exact GED within reasonable time between graphs with more than 16 nodes [Blumenthal and Gamper2018].

Figure 1: Our model transforms each graph into a set of node embeddings and compares two sets based on CNNs. Colors denote node labels, and numbers denote node ids.

Faced with the great significance yet huge difficulty of computing the exact GED between two graphs, a flurry of approximate algorithms have been proposed with a trade-off between speed and accuracy. However, these methods usually require rather complicated design and implementation based on discrete optimization or combinatorial search. The time complexity is usually polynomial or even sub-exponential in the number of nodes in the graphs, such as HED [Fischer et al.2015], Hungarian [Riesen and Bunke2009], VJ [Fankhauser, Riesen, and Bunke2011], A*-Beamsearch (Beam) [Neuhaus, Riesen, and Bunke2006], etc.

In this paper, we propose a novel approach to speed up GED computation. Instead of directly computing an approximate GED with a combinatorial algorithm, our solution turns the computation into a learning problem. Specifically, we design an end-to-end model based on Convolutional Neural Networks (CNNs) that predicts the similarity between two graphs. During training, the model parameters are learned by minimizing the difference between the predicted similarity scores and the ground truth, where each training data point is a pair of graphs together with their true similarity score, transformed and normalized from their pre-computed exact GED. During testing, we feed the neural network with any pair of graphs and obtain a predicted similarity score. We name this approach GSimCNN, i.e., Graph Similarity Computation via Convolutional Neural Networks.

GSimCNN enjoys the key advantage of efficiency due to the nature of neural network computation. As for effectiveness, CNNs have been successfully applied to natural language sentence matching [Hu et al.2014, He and Lin2016]. Similar to these studies, one can first transform each node into an embedding, so that graph similarity computation essentially becomes a two-set matching problem, where each element in a set is a node represented by an embedding vector. By forming a similarity matrix between the two sets via computing the inner products of every pair of node embeddings across the two graphs, a deep, non-linear transformation, such as a CNN, is then expected to output a matching score between the two graphs. The whole idea is illustrated in Fig. 1.

However, the GED computation on graphs brings up the following challenges which prevent the direct usage of CNNs.

  1. Permutation invariance. The same graph can be represented by different adjacency matrices by permuting the order of nodes, and our algorithm should not be sensitive to such permutation.

  2. Spatial locality preservation. A major assumption for the CNN architecture is that the input data has spatial locality, i.e., nearby data points are more similar to each other. Making our embedding-based similarity matrix preserve such spatial locality is key to the successful application of CNNs.

  3. Graph size invariance. The CNN architecture requires fixed-size input. Handling graphs of different sizes is another challenge.

To tackle these challenges, GSimCNN is proposed, which (1) adopts an ordering scheme to alleviate the issues related to the node ordering permutation and spatial locality, (2) uses a padding technique to handle graphs of varying sizes. GSimCNN runs in quadratic time with respect to the number of nodes of the larger of the two graphs, which is among the most efficient state-of-the-art approaches to approximate GED computation.

Moreover, GSimCNN can be considered as an implicit and trainable solver for the set matching problem, which learns a mapping function from a graph pair to their GED and yields a solution for any graph pair with much lower computational complexity. The connection between GSimCNN and several existing graph matching algorithms is provided. Our contributions can be summarized as follows:

  • We address the problem of GED computation by considering it as a learning problem, and propose a novel CNN based approach, called GSimCNN, as the solution. To the best of our knowledge, we are among the first to adopt neural networks to tackle this challenging and classic problem. Running in quadratic time, our model, GSimCNN, is among the most efficient state-of-the-art algorithms for GED computation.

  • We provide theoretical connections of GSimCNN with optimal assignment graph kernels and bipartite graph matching to justify the adoption of CNNs in GSimCNN, and provide a new direction for the more general problem of set matching.

  • We conduct extensive experiments on three real network datasets to demonstrate the significant performance enhancement of the proposed approach.

2 Problem Definition

Graph Edit Distance (GED). Formally, the edit distance between $\mathcal{G}_1$ and $\mathcal{G}_2$, denoted by $GED(\mathcal{G}_1, \mathcal{G}_2)$, is the number of edit operations in the optimal alignment that transforms $\mathcal{G}_1$ into $\mathcal{G}_2$, where an edit operation on a graph is an insertion or deletion of a vertex/edge or the relabelling of a vertex (although other variants of GED exist [Riesen, Emmenegger, and Bunke2013], we adopt this basic version). Intuitively, if two graphs are identical (isomorphic), their GED is 0. Fig. 2 shows an example of the GED between two simple graphs.

GED is transformed to a similarity metric ranging between 0 and 1 (more details can be found in Section 6.1). Our goal is to learn a neural network based function that takes two graphs as input and outputs the similarity score that can be mapped back to GED.

Figure 2: The GED between the graph to the left and the graph to the right is 3, as the transformation needs 3 edit operations: two edge deletions, and an edge insertion.
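To make the definition concrete, the following sketch (ours, not part of the original paper) computes the exact GED of two toy labeled graphs with networkx, using unit costs that match the basic edit model adopted here. Exact search of this kind is only feasible for very small graphs, which is precisely what motivates the approximate approach.

```python
# A minimal sketch: exact GED on toy graphs with networkx (exponential-time search).
import networkx as nx

g1 = nx.Graph()
g1.add_nodes_from([(0, {"label": "C"}), (1, {"label": "C"}), (2, {"label": "O"})])
g1.add_edges_from([(0, 1), (1, 2), (0, 2)])

g2 = nx.Graph()
g2.add_nodes_from([(0, {"label": "C"}), (1, {"label": "C"}), (2, {"label": "O"})])
g2.add_edges_from([(0, 1), (1, 2)])

# Nodes are considered equal only when their labels agree; vertex/edge
# insertions and deletions use networkx's default unit costs.
ged = nx.graph_edit_distance(
    g1, g2,
    node_match=lambda a, b: a["label"] == b["label"],
)
print(ged)  # 1.0 here: g2 is g1 with one edge deleted
```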

3 The Proposed Approach: GSimCNN

GSimCNN consists of the following sequential stages: 1) multiple Graph Convolutional Network layers generate vector representations for each node in the two graphs; 2) the Similarity Matrix Generation layer computes the inner products between the embeddings of every pair of nodes, resulting in a similarity matrix capturing the node-node interaction scores; 3) Convolutional Neural Network layers convert the similarity computation problem into a pattern recognition problem, providing features to a fully connected network that produces the final predicted graph-graph similarity score.

3.1 Stage I: Node Embedding Generation

In Stage I, our goal is to represent a graph as a set of embedding vectors. We adopt Graph Convolutional Networks (GCN) [Kipf and Welling2016] to generate node-level embeddings; GCN is an inductive method that can be applied to unseen nodes. Different node types are represented by different colors and are one-hot encoded as the initial node representations. For graphs with unlabeled nodes, we use the same constant vector as the initial representation.

The core operation, graph convolution, operates on the representation of a node $n$, denoted as $u_n$, and is defined as follows:

$$u_n^{(l+1)} = \sigma\Big( \sum_{m \in \mathcal{N}(n)} \frac{1}{\sqrt{d_m d_n}}\, u_m^{(l)} W^{(l)} + b^{(l)} \Big) \qquad (1)$$

where $\mathcal{N}(n)$ is the set of the first-order neighbors of node $n$ plus node $n$ itself, $d_n$ is the degree of node $n$ plus 1, $W^{(l)}$ is the weight matrix associated with the $l$-th GCN layer, $b^{(l)}$ is the bias, and $\sigma(\cdot)$ is the activation function.

The graph convolution operation aggregates the features from the first-order neighbors of the node. Sequentially stacking $L$ layers would cause the final representation of a node to include information from its $L$-th order neighbors. In other words, the more GCN layers, the larger the scale of the learned embeddings.
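As an illustration of Eq. (1), the following numpy code (a minimal sketch under our own notation, not the authors' implementation) applies the described graph convolution: neighbor aggregation with $1/\sqrt{d_m d_n}$ normalization, a learnable linear map, and a ReLU activation.

```python
# A minimal numpy sketch of the graph convolution in Eq. (1).
import numpy as np

def gcn_layer(A, X, W, b):
    """A: (n, n) adjacency matrix, X: (n, D_in) node features,
    W: (D_in, D_out) weights, b: (D_out,) bias."""
    A_hat = A + np.eye(A.shape[0])              # add self-loops: N(n) includes n itself
    d = A_hat.sum(axis=1)                       # degree of each node plus 1
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt    # 1 / sqrt(d_m * d_n) weighting
    return np.maximum(A_norm @ X @ W + b, 0.0)  # ReLU activation

# Toy graph: 4 nodes, one-hot initial features (as for labeled nodes), 3 stacked layers.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
X = np.eye(4)
for _ in range(3):                              # 3 GCN layers, as in the experiments
    X = gcn_layer(A, X, rng.normal(scale=0.1, size=(X.shape[1], 16)), np.zeros(16))
print(X.shape)  # (4, 16): one 16-dim embedding per node
```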

3.2 Stage II: Similarity Matrix Generation

Once a graph is represented as a set of node embeddings, we can calculate the inner products between all possible pairs of node embeddings in the two graphs, resulting in a similarity matrix.

Different from pixels of images or words of sentences, nodes of a graph typically lack ordering. A different node ordering would lead to a different similarity matrix. Moreover, the CNNs require spatial locality preservation as described in Section 1. To alleviate these two issues, we utilize the breadth-first-search (BFS) node-ordering scheme proposed in GraphRNN [You et al.2018] to sort and reorder the node embeddings. Since BFS is performed on the graph, nearby nodes are ordered close to each other. It is worth noting that the BFS ordering scheme achieves a reasonable trade-off between efficiency and uniqueness of ordering, as the canonical graph ordering is NP-complete [Niepert, Ahmed, and Kutzkov2016], and the BFS ordering only requires quadratic operations in the worst case (i.e. complete graphs) [You et al.2018].

Besides the issues related to node permutation and spatial locality, the graph size invariance challenge must be addressed as well. One can fix the number of nodes in each graph by padding fake nodes to a pre-defined number.
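The sketch below (our illustration; helper names such as bfs_order are hypothetical) puts Stage II together: node embeddings are reordered by a BFS traversal, padded with zero vectors for fake nodes up to a fixed size, and multiplied to form the node-node similarity matrix.

```python
# A sketch of Stage II: BFS ordering, padding with fake nodes, inner-product matrix.
import numpy as np
import networkx as nx

def bfs_order(G, source=None):
    """Return node ids in BFS order; nearby nodes end up close in the ordering."""
    source = source if source is not None else next(iter(G.nodes))
    return [source] + [v for _, v in nx.bfs_edges(G, source)]

def similarity_matrix(G1, X1, G2, X2, pad_to=10):
    """X1, X2: node-embedding matrices aligned with G1.nodes / G2.nodes."""
    idx1 = {v: i for i, v in enumerate(G1.nodes)}
    idx2 = {v: i for i, v in enumerate(G2.nodes)}
    H1 = X1[[idx1[v] for v in bfs_order(G1)]]
    H2 = X2[[idx2[v] for v in bfs_order(G2)]]
    H1 = np.vstack([H1, np.zeros((pad_to - H1.shape[0], H1.shape[1]))])  # pad fake nodes
    H2 = np.vstack([H2, np.zeros((pad_to - H2.shape[0], H2.shape[1]))])
    return H1 @ H2.T                            # (pad_to, pad_to) node-node inner products

G1, G2 = nx.path_graph(4), nx.cycle_graph(5)
X1 = np.random.default_rng(1).normal(size=(4, 16))
X2 = np.random.default_rng(2).normal(size=(5, 16))
print(similarity_matrix(G1, X1, G2, X2).shape)  # (10, 10)
```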

3.3 Stage III: CNN Based Similarity Score Computation

Once the BFS ordering and resizing are performed, the similarity matrix is ready to be processed by CNNs. If each matrix is treated as an image, then the task of graph similarity measurement can be viewed as an image processing problem in which the goal is to discover the optimal node matching pattern in the image by applying CNNs.

At the end, the result is fed into multiple fully connected layers, so that a final similarity score $\hat{s}_{ij}$ is generated for the graph pair $\mathcal{G}_i$ and $\mathcal{G}_j$. The mean squared error loss function is used to train our model:

$$\mathcal{L} = \frac{1}{|\mathcal{D}|} \sum_{(i,j) \in \mathcal{D}} \big( \hat{s}_{ij} - s(GED(\mathcal{G}_i, \mathcal{G}_j)) \big)^2 \qquad (2)$$

where $\mathcal{D}$ is the set of training graph pairs, and $s(\cdot)$ is a one-to-one function that transforms the true GED into the true similarity score.
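A compact PyTorch sketch of Stage III is given below. It is an illustrative stand-in rather than the exact architecture of Section 6.3: the similarity matrix is treated as a one-channel image, passed through a small CNN and fully connected layers, and trained with the mean squared error loss of Eq. (2).

```python
# An illustrative Stage III head: similarity matrix -> CNN -> FC -> similarity score.
import torch
import torch.nn as nn

class Stage3Head(nn.Module):
    def __init__(self):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.fc = nn.Sequential(nn.Linear(32 * 2 * 2, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, sim):                           # sim: (batch, 1, 10, 10)
        z = self.cnn(sim).flatten(1)
        return torch.sigmoid(self.fc(z)).squeeze(-1)  # predicted similarity in (0, 1)

model = Stage3Head()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
sim = torch.randn(8, 1, 10, 10)                    # a batch of similarity matrices
s_true = torch.rand(8)                             # ground-truth similarities from exact GED
loss = nn.functional.mse_loss(model(sim), s_true)  # Eq. (2)
loss.backward(); opt.step()
print(float(loss))
```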

3.4 Time Complexity Analysis

The time complexity of generating node embeddings for a graph is $O(|E|)$ [Kipf and Welling2016], where $|E|$ is its number of edges. The similarity matrix generation, including the BFS ordering and the matrix multiplication, has time complexity $O(n_1 n_2)$, where $n_1$ and $n_2$ are the numbers of nodes of the two graphs. Note that we can take advantage of GPU acceleration for the dense matrix multiplication. The overall time complexity of GSimCNN along with several baseline methods is shown in Table 1.

Model      Reference                              Time Complexity
A*         [Hart, Nilsson, and Raphael1968]       exponential
Beam       [Neuhaus, Riesen, and Bunke2006]       sub-exponential
Hungarian  [Riesen and Bunke2009]                 O(n^3)
VJ         [Fankhauser, Riesen, and Bunke2011]    O(n^3)
HED        [Fischer et al.2015]                   O(n^2)
GSimCNN    this paper                             O(n^2)
Table 1: Time complexity comparison (n denotes the number of nodes of the larger graph).

4 Connections with Set Matching

In this section, we look at GSimCNN from the perspective of set matching, by making theoretical connections with two types of graph matching methods: optimal assignment kernels for graph classification and bipartite graph matching for GED computation. Although we focus on graphs, set matching has broader applications in computer networking (e.g. Internet content delivery) [Maggs and Sitaraman2015], internet advertising (e.g. advertisement auctions) [Edelman, Ostrovsky, and Schwarz2007], biometrics (e.g. facial recognition) [Leng, Moutafis, and Kakadiaris2015], etc.

4.1 Connection with Optimal Assignment Kernels

Graph kernels measure the similarity between two graphs, and have been extensively applied to the task of graph classification. Formally speaking, a valid kernel on a set $\mathcal{X}$ is a function $k: \mathcal{X} \times \mathcal{X} \rightarrow \mathbb{R}$ such that there is a real Hilbert space $\mathcal{H}$ (feature space) and a feature map function $\phi: \mathcal{X} \rightarrow \mathcal{H}$ such that $k(x, y) = \langle \phi(x), \phi(y) \rangle$ for every $x$ and $y$ in $\mathcal{X}$, where $\langle \cdot, \cdot \rangle$ denotes the inner product of $\mathcal{H}$. Among different families of graph kernels, optimal assignment kernels establish a correspondence between the parts of two graphs, and have many variants [Johansson and Dubhashi2015, Kriege, Giscard, and Wilson2016, Nikolentzos, Meladianos, and Vazirgiannis2017]. Let $\mathcal{B}(V_1, V_2)$ denote the set of all bijections between two sets of nodes $V_1$ and $V_2$, and let $k(u, v)$ denote a base kernel that measures the similarity between two nodes $u$ and $v$; an optimal assignment graph kernel is then defined as

$$K_A(\mathcal{G}_1, \mathcal{G}_2) = \max_{\pi \in \mathcal{B}(V_1, V_2)} \sum_{u \in V_1} k(u, \pi(u)) \qquad (3)$$

Intuitively, the optimal assignment graph kernel maximizes the total similarity between the assigned parts. If the two sets are of different cardinality, one can add new objects $z$ with $k(z, \cdot) = 0$ to the smaller set [Kriege, Giscard, and Wilson2016].
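For illustration (our example, with the inner product as an assumed base kernel), the optimal bijection in Eq. (3) can be computed with the Hungarian algorithm; the smaller set is padded with zero vectors so that the added objects contribute zero similarity.

```python
# A sketch of the optimal assignment kernel of Eq. (3) with an inner-product base kernel.
import numpy as np
from scipy.optimize import linear_sum_assignment

def optimal_assignment_kernel(U, V):
    """U: (n1, d), V: (n2, d) node embeddings of two graphs."""
    n = max(len(U), len(V))
    U = np.vstack([U, np.zeros((n - len(U), U.shape[1]))])  # pad with objects z, k(z, .) = 0
    V = np.vstack([V, np.zeros((n - len(V), V.shape[1]))])
    K = U @ V.T                                  # base kernel: inner product k(u, v)
    rows, cols = linear_sum_assignment(-K)       # Hungarian algorithm, maximizing similarity
    return K[rows, cols].sum()

U = np.random.default_rng(0).normal(size=(5, 16))
V = np.random.default_rng(1).normal(size=(4, 16))
print(optimal_assignment_kernel(U, V))
```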

Let us take the Earth Mover's Distance (EMD) kernel [Nikolentzos, Meladianos, and Vazirgiannis2017] as an example, since it is among the methods most similar to GSimCNN. It treats a graph as a bag of node embedding vectors, but instead of utilizing the pairwise inner products between node embeddings to approximate GED, it computes the optimal "travel cost" between two graphs, where the cost is defined as the $\ell_2$ distance between node embeddings. Given two graphs with node embeddings $\{u_i\}_{i=1}^{n_1}$ and $\{v_j\}_{j=1}^{n_2}$, it solves the following transportation problem [Rubner, Tomasi, and Guibas2000]:

$$\min_{T \geq 0} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} T_{ij} \, \lVert u_i - v_j \rVert_2 \quad \text{s.t.} \quad \sum_{j=1}^{n_2} T_{ij} = \frac{1}{n_1} \ \forall i, \quad \sum_{i=1}^{n_1} T_{ij} = \frac{1}{n_2} \ \forall j \qquad (4)$$

where $T$ denotes the flow matrix, with $T_{ij}$ being how much of node $u_i$ in $\mathcal{G}_1$ travels to node $v_j$ in $\mathcal{G}_2$. It has been shown that if $n_1 = n_2 = n$, the optimal solution $T^*$ satisfies $T^*_{ij} \in \{0, \frac{1}{n}\}$ [Balinski1961], satisfying the optimal bijection requirement of the assignment kernel. Even if $n_1 \neq n_2$, this can still be regarded as approximating an assignment problem [Fan, Su, and Guibas2017].
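The transportation problem in Eq. (4) can be written as a small linear program; the sketch below (ours, using scipy's generic LP solver rather than a dedicated EMD solver) recovers both the optimal cost and the flow matrix $T$, whose entries are 0 or $1/n$ when the two graphs have the same number of nodes.

```python
# A sketch of the transportation problem in Eq. (4), solved as a linear program.
import numpy as np
from scipy.optimize import linprog

def emd(U, V):
    n1, n2 = len(U), len(V)
    C = np.linalg.norm(U[:, None, :] - V[None, :, :], axis=-1).ravel()  # L2 costs, row-major
    # Equality constraints: row sums equal 1/n1, column sums equal 1/n2.
    A_eq = np.zeros((n1 + n2, n1 * n2))
    for i in range(n1):
        A_eq[i, i * n2:(i + 1) * n2] = 1.0
    for j in range(n2):
        A_eq[n1 + j, j::n2] = 1.0
    b_eq = np.concatenate([np.full(n1, 1.0 / n1), np.full(n2, 1.0 / n2)])
    res = linprog(C, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return res.fun, res.x.reshape(n1, n2)        # optimal cost and flow matrix T

U = np.random.default_rng(0).normal(size=(4, 8))
V = np.random.default_rng(1).normal(size=(4, 8))
cost, T = emd(U, V)
print(cost, np.round(T, 2))                      # with n1 == n2, entries of T are 0 or 1/n
```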

To show the relation between the EMD kernel and our approach, we consider GSimCNN as a mapping function that, given two graphs with node embeddings $\{u_i\}_{i=1}^{n_1}$ and $\{v_j\}_{j=1}^{n_2}$, produces one score $\hat{s}$ as the predicted similarity score, which is compared against the ground-truth similarity score:

$$\hat{s} = f_{\Theta}\big(\{u_i\}_{i=1}^{n_1}, \{v_j\}_{j=1}^{n_2}\big) \qquad (5)$$

where $f_{\Theta}$ represents the Similarity Matrix Generation and CNN layers with parameters $\Theta$, which can potentially be replaced by any neural network transformation.

To further see the connection, we consider one CNN layer with one filter of size $N$ by $N$ without nonlinear activation, where $N = \max(n_1, n_2)$. Then Eq. 5 becomes:

$$\hat{s} = \sum_{i=1}^{N} \sum_{j=1}^{N} F_{ij} \, \langle u_i, v_j \rangle \qquad (6)$$

where $F$ is the convolutional filter, which can be viewed as a "soft" matching version of $T$ in Problem (4), relaxing all the constraints. Each convolution operation of the CNN can be seen as discovering local matching scores ($F_{ij} \langle u_i, v_j \rangle$); from a global perspective, the CNN selects the best local matchings via the max pooling strategy and combines the local optimal solutions to obtain a global solution.
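A few lines of numpy make the relaxation in Eq. (6) explicit (our illustration): with a single filter and no nonlinearity, the CNN output is simply a weighted sum of the entries of the similarity matrix.

```python
# A tiny illustration of Eq. (6): a single N-by-N filter acts as a "soft" flow matrix.
import numpy as np

rng = np.random.default_rng(0)
N = 5
U = rng.normal(size=(N, 16))                     # node embeddings of G1 (padded to N)
V = rng.normal(size=(N, 16))                     # node embeddings of G2 (padded to N)
F = rng.normal(size=(N, N))                      # learnable convolutional filter

S = U @ V.T                                      # similarity matrix <u_i, v_j>
score = np.sum(F * S)                            # Eq. (6): sum_ij F_ij <u_i, v_j>
print(score)
```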

Compared with the EMD kernel, our method has two benefits. (1) The mapping function $f_{\Theta}$ and the node embeddings $\{u_i\}$ and $\{v_j\}$ are simultaneously learned through backpropagation, while the kernel method solves the assignment problem to obtain $T^*$ and uses fixed node embeddings, e.g. generated by the decomposition of the graph Laplacian matrix. Thus, GSimCNN is suitable for learning an approximation of the GED metric, while the kernel method is not. (2) The best average time complexity of solving Problem (4) is $O(N^3 \log N)$ [Pele and Werman2009], where $N$ denotes the total number of nodes in the two graphs, while the convolution operation runs in $O(N^2)$ time.

4.2 Connection with Bipartite Graph Matching

Among the existing approximate GED computation algorithms, Hungarian [Riesen and Bunke2009] and VJ [Fankhauser, Riesen, and Bunke2011] are two classic ones based on bipartite graph matching. Similar to the optimal assignment kernels, Hungarian and VJ also find an optimal match between the nodes of two graphs. However, different from the EMD kernel, the assignment problem has stricter constraints: one node in $\mathcal{G}_1$ can only be mapped to one node in $\mathcal{G}_2$. Thus, the entries in the assignment matrix $P$ are either 0 or 1, denoting the operations transforming $\mathcal{G}_1$ into $\mathcal{G}_2$, where $P \in \{0, 1\}^{N \times N}$ with $N = n_1 + n_2$. The assignment problem takes the following form:

$$\min_{P} \sum_{i=1}^{N} \sum_{j=1}^{N} C_{ij} P_{ij} \quad \text{s.t.} \quad \sum_{j=1}^{N} P_{ij} = 1 \ \forall i, \quad \sum_{i=1}^{N} P_{ij} = 1 \ \forall j, \quad P_{ij} \in \{0, 1\} \qquad (7)$$

The cost matrix $C$ reflects the GED model, and is defined as follows:

$$C = \begin{bmatrix} \mathbf{C}_{sub} & \mathbf{C}_{del} \\ \mathbf{C}_{ins} & \mathbf{0} \end{bmatrix}, \quad (\mathbf{C}_{sub})_{ij} = c_{i,j}, \quad (\mathbf{C}_{del})_{ij} = \begin{cases} c_{i,\epsilon} & i = j \\ \infty & i \neq j \end{cases}, \quad (\mathbf{C}_{ins})_{ij} = \begin{cases} c_{\epsilon,j} & i = j \\ \infty & i \neq j \end{cases}$$

where $c_{i,j}$ denotes the cost of a substitution, $c_{i,\epsilon}$ denotes the cost of a node deletion, and $c_{\epsilon,j}$ denotes the cost of a node insertion. Note that the assignment matrix $P$ directly maps to the operations of the cost matrix defined above. According to our GED definition described in Section 2, $c_{i,j} = 0$ if the labels of node $i$ and node $j$ are the same, and 1 otherwise; $c_{i,\epsilon} = c_{\epsilon,j} = 1$.

Exactly solving this constrained optimization program would yield the exact GED solution [Fankhauser, Riesen, and Bunke2011], but it is NP-complete since it is equivalent to finding an optimal matching in a complete bipartite graph [Riesen and Bunke2009].

To efficiently solve the assignment problem, the Hungarian algorithm [Kuhn1955] and the Volgenant Jonker (VJ) [Jonker and Volgenant1987] algorithm are commonly used, which both run in cubic time. In contrast, GSimCNN takes advantage of the exact solutions of the instances of this problem during the training stage, and computes the approximate GED during testing in quadratic time, without the need for solving any optimization problem for a new graph pair.
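The following sketch (our simplified version with node costs only; the actual Hungarian and VJ baselines also fold edge costs into the matrix entries) builds the $(n_1+n_2) \times (n_1+n_2)$ cost matrix described above and solves the assignment problem of Eq. (7) with the Hungarian algorithm.

```python
# A simplified sketch of the assignment formulation in Eq. (7), node costs only.
import numpy as np
from scipy.optimize import linear_sum_assignment

INF = 1e9                                        # forbids deleting node i "as" node j

def assignment_cost(labels1, labels2):
    n1, n2 = len(labels1), len(labels2)
    C = np.zeros((n1 + n2, n1 + n2))
    # Substitution block: 0 if labels match, 1 otherwise (relabelling cost).
    C[:n1, :n2] = np.array([[0.0 if a == b else 1.0 for b in labels2] for a in labels1])
    # Deletion block: cost 1 on the diagonal, infinite elsewhere.
    C[:n1, n2:] = INF * (1.0 - np.eye(n1)) + np.eye(n1)
    # Insertion block: cost 1 on the diagonal, infinite elsewhere.
    C[n1:, :n2] = INF * (1.0 - np.eye(n2)) + np.eye(n2)
    # The epsilon-to-epsilon block stays 0.
    rows, cols = linear_sum_assignment(C)        # optimal 0/1 assignment matrix P
    return C[rows, cols].sum()

print(assignment_cost(["C", "C", "O"], ["C", "N"]))  # 1 relabel + 1 deletion = 2.0
```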

5 Related Work

5.1 Network/Graph Embedding

Over the years, there have been a great number of works dealing with the representation of nodes [Hamilton, Ying, and Leskovec2017] and graphs [Ying et al.2018]. A large number of graph-based applications have been tackled by neural network based methods, most of which are framed as node-level prediction tasks [Xu et al.2018] or single graph classification [Ying et al.2018]. In this work, we consider the task of graph similarity computation, which falls under the general problem of graph matching.

5.2 Text and Graph Matching with Neural Networks

The use of CNNs to compare two natural language sentences has been explored [Hu et al.2014, He and Lin2016], yet we are still among the first to adopt them for the computation of graph-graph similarities. As for graph matching using neural networks in general, the only works we are aware of are the following. (1) Ktena et al. apply the GCN model (denoted as "GCNMax" and "GCNMean" in Section 6) to model the similarity between functional brain networks [Ktena et al.2017]. However, their model is trained by maximizing the similarity between graphs belonging to the same class for the task of graph classification. (2) Riba et al. employ a message passing neural network to model graph-graph similarities [Riba et al.2018], but the goal is to classify whether a graph pair is similar or not, which is a classification task. To the best of our knowledge, we are among the first to adopt graph neural networks to predict the GED of two graphs, which is essentially a regression task.

6 Experiments

6.1 Datasets

Three real-world graph datasets are used for the experiments. A concise summary can be found in Table 2. Specifically, AIDS consists of 42,687 chemical compounds from the Developmental Therapeutics Program at NCI/NIH, out of which we randomly select 700 small graphs, where each node is labeled with one of 29 types. LINUX [Wang et al.2012] consists of 48,747 Program Dependence Graphs (PDG) generated from the Linux kernel. IMDB [Yanardag and Vishwanathan2015] consists of 1500 ego-networks of movie actors/actresses, with nodes representing people and edges representing collaboration relationships. The nodes of LINUX and IMDB are unlabeled.

Dataset #Graphs #Pairs Min Max Mean Std
AIDS 700 490K 2 10 8.9 1.4
LINUX 1000 1M 4 10 7.7 1.5
IMDB 1500 2.25M 7 89 13.0 8.5
Table 2: Statistics of the datasets. "Min", "Max", "Mean", and "Std" refer to the minimum, maximum, mean, and standard deviation of the graph sizes (number of nodes), respectively.

For each dataset, we randomly split 60%, 20%, and 20% of all the graphs as training set, validation set, and testing set, respectively. For each graph in the testing set, we treat it as a query graph, and let the model compute the similarity between the query graph and every graph in the training and validation sets.

Since graphs from AIDS and LINUX are relatively small, the exponential-time exact GED computation algorithm, A* [Hart, Nilsson, and Raphael1968], is used to compute the GEDs between all the graph pairs. For the IMDB dataset, however, A* can no longer be used, as “no currently available algorithm manages to reliably compute GED within reasonable time between graphs with more than 16 nodes” [Blumenthal and Gamper2018]. Instead, we take the minimum distance computed by Beam [Neuhaus, Riesen, and Bunke2006], Hungarian [Riesen and Bunke2009], and VJ [Fankhauser, Riesen, and Bunke2011]. The minimum is taken because their returned GEDs are guaranteed to be upper bounds of the true GEDs. Incidentally, the ICPR 2016 Graph Distance Contest 222https://gdc2016.greyc.fr/ also adopts this approach to handle large graphs.

To transform ground-truth GEDs into ground-truth similarity scores to train our model, we first normalize the GEDs: $nGED(\mathcal{G}_1, \mathcal{G}_2) = \frac{GED(\mathcal{G}_1, \mathcal{G}_2)}{(|\mathcal{G}_1| + |\mathcal{G}_2|)/2}$, where $|\mathcal{G}|$ denotes the number of nodes of $\mathcal{G}$ [Qureshi, Ramel, and Cardot2007], and then adopt the exponential function $s(x) = e^{-x}$ to transform the normalized GED into a similarity score in the range of $(0, 1]$.
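In code, the label transformation is a one-liner (a sketch based on the formula reconstructed above):

```python
# Normalize GED by the average graph size, then map it to (0, 1] exponentially.
import math

def ged_to_similarity(ged, n1, n2):
    nged = ged / ((n1 + n2) / 2.0)               # normalized GED
    return math.exp(-nged)                       # similarity in (0, 1]; equals 1 iff GED = 0

print(ged_to_similarity(3, 5, 5))
```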

6.2 Baselines

Our baselines include two types of approaches, fast approximate GED computation algorithms and neural network based models.

The first category of baselines includes four classic algorithms for GED computation, whose time complexities can be found in Table 1. (1) A*-Beamsearch (Beam), (2) Hungarian, and (3) VJ return upper bounds of the true GEDs; (2) and (3) are described in Section 4.2. (4) HED [Fischer et al.2015] is based on Hausdorff matching, and yields GEDs smaller than or equal to the actual GEDs.

The second category of baselines includes the following neural network architectures. (1) EmbAvg, (2) GCNMean, and (3) GCNMax [Defferrard, Bresson, and Vandergheynst2016] are three neural network architectures that take the dot product of the graph-level embeddings of the two graphs to produce the similarity score. (1) simply takes the unweighted average of node embeddings, while (2) and (3) are the original GCN architectures based on graph coarsening with mean and max pooling, respectively. (4) GSimCNN is our complete model.

6.3 Parameter Settings

For the proposed model, to make a fair comparison with baselines, we use a single network architecture on all the datasets, and run the model using exactly the same test graphs as used in the baselines. We set the number of GCN layers to 3, and use ReLU as the activation function.

For the padding scheme, graphs in AIDS and LINUX are padded to 10 nodes, and graphs in IMDB are padded to 90 nodes. For the resizing scheme, all the similarity matrices are resized to 10 by 10. For the CNNs, we use the following architecture: conv(6,1,1,16), maxpool(2), conv(6,1,16,32), maxpool(2), conv(5,1,32,64), maxpool(2), conv(5,1,64,128), maxpool(3), conv(5,1,128,128), maxpool(3) (“conv(window size, kernel stride, input channels, output channels)”; “maxpool(pooling size)”).
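For concreteness, the listed architecture can be rendered in PyTorch as follows (a sketch with assumed details that the text does not specify, namely 'same' padding, ceil-mode pooling, and ReLU activations between layers):

```python
# An assumed PyTorch rendering of the conv/maxpool stack listed above.
import torch
import torch.nn as nn

def conv(window, stride, c_in, c_out):
    # conv(window size, kernel stride, input channels, output channels)
    return nn.Conv2d(c_in, c_out, kernel_size=window, stride=stride, padding="same")

cnn = nn.Sequential(
    conv(6, 1, 1, 16),   nn.ReLU(), nn.MaxPool2d(2, ceil_mode=True),
    conv(6, 1, 16, 32),  nn.ReLU(), nn.MaxPool2d(2, ceil_mode=True),
    conv(5, 1, 32, 64),  nn.ReLU(), nn.MaxPool2d(2, ceil_mode=True),
    conv(5, 1, 64, 128), nn.ReLU(), nn.MaxPool2d(3, ceil_mode=True),
    conv(5, 1, 128, 128), nn.ReLU(), nn.MaxPool2d(3, ceil_mode=True),
)
x = torch.randn(4, 1, 10, 10)                    # 4 resized similarity matrices
print(cnn(x).shape)                              # torch.Size([4, 128, 1, 1])
```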

We conduct all the experiments on a single machine with an Intel i7-6800K CPU and one Nvidia Titan GPU. As for training, we set the batch size to 128, use the Adam algorithm for optimization [Kingma and Ba2015], and fix the initial learning rate to 0.001. We set the number of iterations to 15000, and select the best model based on the lowest validation loss.

6.4 Evaluation Metrics

Mean Squared Error (mse). The mean squared error measures the average squared difference between the computed similarities and the ground-truth similarities.

We also adopt the following metrics to evaluate the ranking results. Spearman's Rank Correlation Coefficient ($\rho$) [Spearman1904] and Kendall's Rank Correlation Coefficient ($\tau$) [Kendall1938] measure how well the predicted ranking results match the true ranking results. Precision at $k$ (p@$k$) is computed by taking the intersection of the predicted top $k$ results and the ground-truth top $k$ results, divided by $k$. Compared with p@$k$, $\rho$ and $\tau$ evaluate the global ranking results instead of focusing on the top $k$ results.
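The metrics are straightforward to compute; the sketch below (ours) uses scipy for $\rho$ and $\tau$ and a small helper for p@$k$.

```python
# Ranking metrics: Spearman's rho, Kendall's tau, and precision at k.
import numpy as np
from scipy.stats import spearmanr, kendalltau

def precision_at_k(pred_sim, true_sim, k):
    pred_topk = set(np.argsort(-pred_sim)[:k])   # indices of the k most similar graphs
    true_topk = set(np.argsort(-true_sim)[:k])
    return len(pred_topk & true_topk) / k

rng = np.random.default_rng(0)
true_sim = rng.random(100)                       # similarities of one query to 100 graphs
pred_sim = true_sim + 0.05 * rng.normal(size=100)
print(spearmanr(pred_sim, true_sim).correlation,
      kendalltau(pred_sim, true_sim).correlation,
      precision_at_k(pred_sim, true_sim, 10))
```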

6.5 Results

Effectiveness

The effectiveness results on the three datasets can be found in Table 3, 4, and 5. Our full model, GSimCNN, consistently achieves the best results on all metrics across the three datasets. It is worth noting that even our single level model with the padding scheme is very competitive with the baseline GED computation models, suggesting that the adoption of CNNs is quite effective.

Method mse ($\times 10^{-3}$) $\rho$ $\tau$ p@10 p@20
Beam
Hungarian
VJ
HED
EmbAvg
GCNMean
GCNMax
GSimCNN 0.807 0.863 0.714 0.514 0.580
Table 3: Results on AIDS.
Method mse ($\times 10^{-3}$) $\rho$ $\tau$ p@10 p@20
Beam
Hungarian
VJ
HED
EmbAvg
GCNMean
GCNMax
GSimCNN 0.141 0.988 0.951 0.983 0.961
Table 4: Results on LINUX.
Method mse ($\times 10^{-3}$) $\rho$ $\tau$ p@10 p@20
HED
EmbAvg
GCNMean
GCNMax
GSimCNN 6.455 0.661 0.562 0.552 0.598
Table 5: Results on IMDB. Beam, Hungarian, and VJ together are used to determine the ground-truth results.

6.6 Case Studies

We demonstrate three example queries, one from each dataset, in Fig. 6. In each demo, the top row depicts the query along with the ground-truth ranking results, labeled with their normalized GEDs to the query. The bottom row shows the graphs returned by our model, each with its rank shown at the top. GSimCNN is able to retrieve graphs similar to the query. For example, in the case of LINUX (Fig. 6b), the top 6 results are exactly the graphs isomorphic to the query.

(a) On AIDS. Different colors represent different node labels.
(b) On LINUX.
(c) On IMDB.
Figure 6: Query case studies.

7 Conclusion

We tackle the classic yet challenging problem of computing the Graph Edit Distance between two graphs. A novel CNN based neural network model is proposed, which takes any two graphs as input and outputs their similarity score. Our model, GSimCNN, transforms each graph into a set of node embeddings, and performs set matching to produce their similarity score. Theoretical comparisons of our model with graph matching methods provide insights into our adoption of CNNs, and provide a new direction for future research on the general problem of set matching.

Our model, GSimCNN, runs in quadratic time with respect to the maximum number of nodes in the two graphs, making it among the fastest state-of-the-art GED computation methods. It also achieves state-of-the-art accuracy on three real graph datasets.

References

  • [Balinski1961] Balinski, M. L. 1961. Fixed-cost transportation problems. Naval Research Logistics Quarterly 8(1):41–54.
  • [Blumenthal and Gamper2018] Blumenthal, D. B., and Gamper, J. 2018. On the exact computation of the graph edit distance. Pattern Recognition Letters.
  • [Bunke and Shearer1998] Bunke, H., and Shearer, K. 1998. A graph distance metric based on the maximal common subgraph. Pattern recognition letters 19(3-4):255–259.
  • [Bunke1983] Bunke, H. 1983. What is the distance between graphs. Bulletin of the EATCS 20:35–39.
  • [Defferrard, Bresson, and Vandergheynst2016] Defferrard, M.; Bresson, X.; and Vandergheynst, P. 2016. Convolutional neural networks on graphs with fast localized spectral filtering. In NIPS, 3844–3852.
  • [Edelman, Ostrovsky, and Schwarz2007] Edelman, B.; Ostrovsky, M.; and Schwarz, M. 2007. Internet advertising and the generalized second-price auction: Selling billions of dollars worth of keywords. American Economic Review 97(1):242–259.
  • [Fan, Su, and Guibas2017] Fan, H.; Su, H.; and Guibas, L. J. 2017. A point set generation network for 3d object reconstruction from a single image. In CVPR, volume 2,  6.
  • [Fankhauser, Riesen, and Bunke2011] Fankhauser, S.; Riesen, K.; and Bunke, H. 2011. Speeding up graph edit distance computation through fast bipartite matching. In GbRPR, 102–111. Springer.
  • [Fischer et al.2013] Fischer, A.; Suen, C. Y.; Frinken, V.; Riesen, K.; and Bunke, H. 2013. A fast matching algorithm for graph-based handwriting recognition. In GbRPR, 194–203. Springer.
  • [Fischer et al.2015] Fischer, A.; Suen, C. Y.; Frinken, V.; Riesen, K.; and Bunke, H. 2015. Approximation of graph edit distance based on hausdorff matching. Pattern Recognition 48(2):331–343.
  • [Hamilton, Ying, and Leskovec2017] Hamilton, W.; Ying, Z.; and Leskovec, J. 2017. Inductive representation learning on large graphs. In NIPS, 1024–1034.
  • [Hart, Nilsson, and Raphael1968] Hart, P. E.; Nilsson, N. J.; and Raphael, B. 1968. A formal basis for the heuristic determination of minimum cost paths. IEEE Transactions on Systems Science and Cybernetics 4(2):100–107.
  • [He and Lin2016] He, H., and Lin, J. 2016. Pairwise word interaction modeling with deep neural networks for semantic similarity measurement. In NAACL HLT, 937–948.
  • [Hu et al.2014] Hu, B.; Lu, Z.; Li, H.; and Chen, Q. 2014. Convolutional neural network architectures for matching natural language sentences. In NIPS, 2042–2050.
  • [Johansson and Dubhashi2015] Johansson, F. D., and Dubhashi, D. 2015. Learning with similarity functions on graphs using matchings of geometric embeddings. In KDD, 467–476. ACM.
  • [Jonker and Volgenant1987] Jonker, R., and Volgenant, A. 1987. A shortest augmenting path algorithm for dense and sparse linear assignment problems. Computing 38(4):325–340.
  • [Kendall1938] Kendall, M. G. 1938. A new measure of rank correlation. Biometrika 30(1/2):81–93.
  • [Kingma and Ba2015] Kingma, D. P., and Ba, J. 2015. Adam: A method for stochastic optimization. ICLR.
  • [Kipf and Welling2016] Kipf, T. N., and Welling, M. 2016. Semi-supervised classification with graph convolutional networks. ICLR.
  • [Kriege, Giscard, and Wilson2016] Kriege, N. M.; Giscard, P.-L.; and Wilson, R. 2016. On valid optimal assignment kernels and applications to graph classification. In NIPS, 1623–1631.
  • [Ktena et al.2017] Ktena, S. I.; Parisot, S.; Ferrante, E.; Rajchl, M.; Lee, M.; Glocker, B.; and Rueckert, D. 2017. Distance metric learning using graph convolutional networks: Application to functional brain networks. In MICCAI, 469–477. Springer.
  • [Kuhn1955] Kuhn, H. W. 1955. The hungarian method for the assignment problem. Naval research logistics quarterly 2(1-2):83–97.
  • [Leng, Moutafis, and Kakadiaris2015] Leng, M.; Moutafis, P.; and Kakadiaris, I. A. 2015. Joint prototype and metric learning for set-to-set matching: Application to biometrics. In BTAS, 1–8.
  • [Liang and Zhao2017] Liang, Y., and Zhao, P. 2017. Similarity search in graph databases: A multi-layered indexing approach. In ICDE, 783–794. IEEE.
  • [Maggs and Sitaraman2015] Maggs, B. M., and Sitaraman, R. K. 2015. Algorithmic nuggets in content delivery. ACM SIGCOMM Computer Communication Review 45(3):52–66.
  • [Neuhaus, Riesen, and Bunke2006] Neuhaus, M.; Riesen, K.; and Bunke, H. 2006. Fast suboptimal algorithms for the computation of graph edit distance. In S+SSPR, 163–172. Springer.
  • [Niepert, Ahmed, and Kutzkov2016] Niepert, M.; Ahmed, M.; and Kutzkov, K. 2016. Learning convolutional neural networks for graphs. In ICML, 2014–2023.
  • [Nikolentzos, Meladianos, and Vazirgiannis2017] Nikolentzos, G.; Meladianos, P.; and Vazirgiannis, M. 2017. Matching node embeddings for graph similarity. In AAAI, 2429–2435.
  • [Pele and Werman2009] Pele, O., and Werman, M. 2009. Fast and robust earth mover’s distances. In ICCV, volume 9, 460–467.
  • [Qureshi, Ramel, and Cardot2007] Qureshi, R. J.; Ramel, J.-Y.; and Cardot, H. 2007. Graph based shapes representation and recognition. In GbRPR, 49–60. Springer.
  • [Riba et al.2018] Riba, P.; Fischer, A.; Lladós, J.; and Fornés, A. 2018. Learning graph distances with message passing neural networks. In ICPR, 2239–2244.
  • [Riesen and Bunke2008] Riesen, K., and Bunke, H. 2008. IAM graph database repository for graph based pattern recognition and machine learning. In S+SSPR, 287–297. Springer.
  • [Riesen and Bunke2009] Riesen, K., and Bunke, H. 2009. Approximate graph edit distance computation by means of bipartite graph matching. Image and Vision computing 27(7):950–959.
  • [Riesen, Emmenegger, and Bunke2013] Riesen, K.; Emmenegger, S.; and Bunke, H. 2013. A novel software toolkit for graph edit distance computation. In GbRPR, 142–151. Springer.
  • [Rubner, Tomasi, and Guibas2000] Rubner, Y.; Tomasi, C.; and Guibas, L. J. 2000. The earth mover's distance as a metric for image retrieval. International Journal of Computer Vision 40(2):99–121.
  • [Spearman1904] Spearman, C. 1904. The proof and measurement of association between two things. The American journal of psychology 15(1):72–101.
  • [Wang et al.2012] Wang, X.; Ding, X.; Tung, A. K.; Ying, S.; and Jin, H. 2012. An efficient graph indexing method. In ICDE, 210–221. IEEE.
  • [Xiao et al.2008] Xiao, B.; Gao, X.; Tao, D.; and Li, X. 2008. Hmm-based graph edit distance for image indexing. International Journal of Imaging Systems and Technology 18(2-3):209–218.
  • [Xu et al.2018] Xu, K.; Li, C.; Tian, Y.; Sonobe, T.; Kawarabayashi, K.-i.; and Jegelka, S. 2018. Representation learning on graphs with jumping knowledge networks. ICML.
  • [Yanardag and Vishwanathan2015] Yanardag, P., and Vishwanathan, S. 2015. Deep graph kernels. In SIGKDD, 1365–1374. ACM.
  • [Ying et al.2018] Ying, R.; You, J.; Morris, C.; Ren, X.; Hamilton, W. L.; and Leskovec, J. 2018. Hierarchical graph representation learning with differentiable pooling. NIPS.
  • [You et al.2018] You, J.; Ying, R.; Ren, X.; Hamilton, W.; and Leskovec, J. 2018. Graphrnn: Generating realistic graphs with deep auto-regressive models. In ICML, 5694–5703.