1 A Multi-Scale Convolutional Model for Pairwise Graph Similarity
We introduce GSimCNN (Graph Similarity Computation via Convolutional Neural Networks) for predicting the similarity score between two graphs. As the core operation of graph similarity search, pairwise graph similarity computation is a challenging problem due to the NP-hard nature of computing many graph distance/similarity metrics.
We demonstrate our model using the Graph Edit Distance (GED) bunke1983distance as the example metric. It is defined as the number of edit operations in the optimal alignments that transform one graph into the other, where an edit operation can be an insertion or a deletion of a node/edge, or relabelling of a node. It is NP-hard zeng2009comparing and costly to compute in practice blumenthal2018exact .
The key idea is to turn the pairwise graph distance computation problem into a learning problem. This new approach not only offers parallelizability and efficiency due to the nature of neural computation, but also achieves significant improvement over state-of-the-art GED approximation algorithms.
Definitions We are given an undirected, unweighted graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ with $N = |\mathcal{V}|$ nodes. Node features are summarized in an $N \times D$ matrix $\boldsymbol{H}$. We transform GED into a similarity metric ranging between 0 and 1. Our goal is to learn a neural-network-based function that takes two graphs as input and outputs a similarity score that can be transformed back to GED through a one-to-one mapping.
GSimCNN GSimCNN consists of the following sequential stages: 1) Multi-Scale Graph Convolutional Network kipf2016semi layers generate vector representations for each node in the two graphs at different scales; 2) Graph Interaction layers compute the inner products between the embeddings of every pair of nodes in the two graphs, resulting in multiple similarity matrices capturing the node-node interaction scores at different scales; 3) Convolutional Neural Network layers convert the similarity computation problem into a pattern recognition problem, and provide multi-scale features to a fully connected network to obtain a final predicted graph-graph similarity score. An overview of our model is illustrated in Fig. 2.
1.1 Stage I: Multi-Scale GCN Layers
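In Stage I, we generate node embeddings by multi-layer GCNs, where each layer is defined as kipf2016semi :
$$\mathrm{conv}(\boldsymbol{x}_i) = \mathrm{ReLU}\Big(\sum_{j \in \mathcal{N}(i)} \frac{1}{\sqrt{d_i d_j}} \boldsymbol{x}_j \boldsymbol{W}^{(l)} + \boldsymbol{b}^{(l)}\Big) \tag{1}$$
Here, $\mathcal{N}(i)$ is the set of all first-order neighbors of node $i$ plus node $i$ itself; $d_i$ is the degree of node $i$ plus 1; $\boldsymbol{W}^{(l)}$ is the weight matrix associated with the $l$-th GCN layer; $\boldsymbol{b}^{(l)}$ is the bias; and $\mathrm{ReLU}(x) = \max(0, x)$ is the activation function.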
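As a minimal sketch of one such layer, the following pure-Python function aggregates each node's own features and those of its first-order neighbors, scales each contribution by one over the square root of the product of the two (degree plus one) values, applies a weight matrix and bias, and finishes with ReLU. The function name `gcn_layer`, the dictionary-based graph representation, and the toy path graph are illustrative assumptions, not the authors' implementation.

```python
import math

def gcn_layer(adj, feats, weight, bias):
    """One GCN layer: ReLU(sum_{j in N(i)} x_j W / sqrt(d_i * d_j) + b).

    adj    : dict node -> set of neighbor nodes (undirected, no self-loops)
    feats  : dict node -> list of D input features
    weight : D x D_out matrix given as a list of rows
    bias   : list of D_out values
    """
    d_out = len(bias)
    # d_i is the degree of node i plus 1 (accounts for the node itself).
    deg = {i: len(adj[i]) + 1 for i in adj}
    out = {}
    for i in adj:
        agg = [0.0] * len(feats[i])
        # N(i): first-order neighbors of i plus i itself.
        for j in adj[i] | {i}:
            scale = 1.0 / math.sqrt(deg[i] * deg[j])
            for k, value in enumerate(feats[j]):
                agg[k] += scale * value
        # Linear transform followed by ReLU.
        z = [sum(agg[k] * weight[k][c] for k in range(len(agg))) + bias[c]
             for c in range(d_out)]
        out[i] = [max(0.0, v) for v in z]
    return out

# Toy path graph 0 - 1 - 2 with one-hot node labels (hypothetical data).
adj = {0: {1}, 1: {0, 2}, 2: {1}}
feats = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 0.0]}
identity = [[1.0, 0.0], [0.0, 1.0]]
h1 = gcn_layer(adj, feats, identity, [0.0, 0.0])
```

Calling `gcn_layer` again on `h1` would give a second-scale representation; keeping the output of every call is what provides the multi-scale embeddings used in the later stages.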
In Fig. 2, different node types are represented by different colors and one-hot encoded as the initial node representation. For graphs with unlabeled nodes, we use the same constant vector as the initial representation. As shown in kipf2016variational and hamilton2017inductive , the graph convolution operation aggregates the features from the first-order neighbors of the node. Stacking $K$ GCN layers would enable the final representation of a node to include its $K$-th order neighbors.
Multi-Scale GCN A potential issue with a deep GCN structure is that the embeddings may become too coarse after aggregating neighbors from multiple scales. The problem is especially severe when the two graphs are very similar, as the differences mainly lie in small substructures. Because structural differences may occur at different scales, we extract the output of each GCN layer and construct multi-scale interaction matrices, which will be described in the next stage.
1.2 Stage II: Graph Interaction Layers
We calculate the inner products between all possible pairs of node embeddings between two graphs from different GCN layers, resulting in multiple similarity matrices $\boldsymbol{S}$. Since we later use CNNs to process these matrices, we first reorder the node embeddings with the breadth-first-search (BFS) node-ordering scheme proposed in you2018graphrnn , which runs in quadratic time in the worst case.
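A BFS node ordering can be sketched as follows. The choice of start node, the tie-breaking by node id, and the handling of disconnected components are assumptions made for this illustration; the cited scheme may differ in those details.

```python
from collections import deque

def bfs_order(adj, start):
    """Return the nodes of a graph in breadth-first order.

    adj   : dict node -> set of neighbor nodes
    start : node at which the traversal begins (assumed to be given)
    """
    order, seen = [], set()
    # Assumption: after the start component, remaining components are
    # visited in increasing node id so that every node gets a position.
    for root in [start] + sorted(adj):
        if root in seen:
            continue
        seen.add(root)
        queue = deque([root])
        while queue:
            u = queue.popleft()
            order.append(u)
            # Assumption: ties among neighbors are broken by node id.
            for v in sorted(adj[u]):
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
    return order

# Hypothetical 4-node graph.
adj = {0: {2, 3}, 1: {3}, 2: {0}, 3: {0, 1}}
order = bfs_order(adj, 0)
```

The rows of a node embedding matrix would then be permuted according to `order` before the inner products are taken.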
Max Padding Suppose $\mathcal{G}_1$ and $\mathcal{G}_2$ contain $N_1$ and $N_2$ nodes, respectively. To reflect the difference in graph sizes in the similarity matrix, we pad $|N_1 - N_2|$ rows of zeros to the node embedding matrix of the smaller of the two graphs, so that both graphs contain $\max(N_1, N_2)$ nodes.
To apply CNNs to the similarity matrices, we apply bilinear interpolation, an image resampling technique thevenaz2000image , to resize every similarity matrix. The resulting similarity matrix has fixed shape $M \times M$, where $M$ is a hyperparameter controlling the degree of loss of information due to the resampling.
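The following equation summarizes a single Graph Interaction Layer:
$$\boldsymbol{S} = \mathrm{RES}_M\big(\widetilde{\boldsymbol{H}}_1 \widetilde{\boldsymbol{H}}_2^{\top}\big) \tag{2}$$
where $\widetilde{\boldsymbol{H}}_i \in \mathbb{R}^{\max(N_1, N_2) \times D}$, $i \in \{1, 2\}$, is the padded node embedding matrix with zero or $|N_1 - N_2|$ nodes padded, and $\mathrm{RES}_M(\cdot): \mathbb{R}^{\max(N_1, N_2) \times \max(N_1, N_2)} \rightarrow \mathbb{R}^{M \times M}$ is the resizing function.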
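The three steps of a Graph Interaction Layer (zero padding, pairwise inner products, bilinear resizing) can be sketched in plain Python as below. The helper names are hypothetical, and the corner-aligned sampling convention used in `resize_bilinear` is an assumption, since the text does not specify which bilinear variant is used.

```python
def pad_rows(h, n):
    """Append rows of zeros so that the embedding matrix has n rows."""
    d = len(h[0])
    return [row[:] for row in h] + [[0.0] * d for _ in range(n - len(h))]

def inner_products(h1, h2):
    """Similarity matrix whose (i, j) entry is the dot product of rows i and j."""
    return [[sum(a * b for a, b in zip(x, y)) for y in h2] for x in h1]

def resize_bilinear(s, m):
    """Resize matrix s to m x m by bilinear interpolation.

    Assumption: corner-aligned sampling, i.e. output corners map exactly
    onto input corners.
    """
    n_rows, n_cols = len(s), len(s[0])

    def coord(k, n):
        return k * (n - 1) / (m - 1) if m > 1 else 0.0

    out = []
    for r in range(m):
        y = coord(r, n_rows)
        y0 = int(y)
        y1 = min(y0 + 1, n_rows - 1)
        fy = y - y0
        row = []
        for c in range(m):
            x = coord(c, n_cols)
            x0 = int(x)
            x1 = min(x0 + 1, n_cols - 1)
            fx = x - x0
            top = s[y0][x0] * (1 - fx) + s[y0][x1] * fx
            bottom = s[y1][x0] * (1 - fx) + s[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out

def graph_interaction(h1, h2, m):
    """Pad the smaller graph, take all inner products, resize to m x m."""
    n = max(len(h1), len(h2))
    return resize_bilinear(inner_products(pad_rows(h1, n), pad_rows(h2, n)), m)

# Hypothetical embeddings: a 3-node graph and a 2-node graph, D = 2.
h1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
h2 = [[1.0, 0.0], [0.0, 2.0]]
s3 = graph_interaction(h1, h2, 3)
s5 = graph_interaction(h1, h2, 5)
```

In `s3` the last column is all zeros, which is how the padding encodes the size difference between the two graphs before any resizing takes place.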
1.3 Stage III: CNN and Dense Layers
The similarity matrices at different scales are processed by multiple independent CNNs, turning the task of graph similarity measurement into an image processing problem. The CNN filters detect the optimal node matching patterns in the image, and max pooling selects the best matching. The CNN results are concatenated and fed into multiple fully connected layers, so that a final similarity score $\hat{s}_{ij}$ is generated for the graph pair $\mathcal{G}_i$ and $\mathcal{G}_j$. The mean squared error loss function is used to train the model.
2 Set Matching Based Graph Similarity Computation
Through GCN transformation, GSimCNN encodes the link structure around each node into its vector representation, and thus regards a graph as a set of node embeddings. It essentially reduces the link structure, and simplifies the graph similarity/distance computation into matching two sets. In this section, we formally define the general approach of using set matching to compute graph similarity/distance, and provide detailed theoretical analysis in Appendix A.
Graph transforming function: A graph transforming function $f(\cdot)$ transforms a graph $\mathcal{G}$ into a set of objects, $\mathcal{M}$.
Set matching function: A set matching function $g(\cdot, \cdot)$ takes two sets as input, and returns a score denoting the degree of matching between the two input sets.
In fact, the forward pass of GSimCNN can be interpreted as a two-step procedure: 1. applying a GCN-based graph transforming function; 2. applying a CNN-based set matching function. Appendix A compares GSimCNN with two types of graph distance algorithms, which sheds light on why GSimCNN works effectively.
3 Experiments on Graph Similarity Search
Graph similarity search is among the most important graph-based applications, e.g. finding the chemical compounds that are most similar to a query compound. The goal of these experiments is to demonstrate that GSimCNN can alleviate the computational burden while preserving good GED approximation performance. We train GSimCNN on three real graph datasets zeng2009comparing ; wang2012efficient ; yanardag2015deep , whose details can be found in Appendix B. We make the datasets used in this paper publicly available at https://drive.google.com/drive/folders/1BFj66jqzR_VlWgASEfNMHwAQZ967HV0W?usp=sharing.
We compare methods based on their ability to correctly compute the pairwise graph similarity and rank the database graphs for user query graphs. The training and validation sets contain 60% and 20% of graphs, respectively, and serve as the database graphs. The validation set is used for optimization of hyperparameters. The test set contains 20% of graphs, treated as the query graphs.
We compare against two sets of baselines: (1) Combinatorial optimization-based algorithms for approximate GED computation: Beam neuhaus2006fast , VJ fankhauser2011speeding , Hungarian riesen2009approximate , HED fischer2015approximation ; (2) Neural network based models: Siamese MPNN ribalearning , EmbAvg, GCNMean, GCNMax defferrard2016convolutional (see Appendix C for details).
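To transform ground-truth GEDs into ground-truth similarity scores to train our model, we first normalize the GEDs: $\mathrm{nGED}(\mathcal{G}_1, \mathcal{G}_2) = \frac{\mathrm{GED}(\mathcal{G}_1, \mathcal{G}_2)}{(|\mathcal{G}_1| + |\mathcal{G}_2|)/2}$, where $|\mathcal{G}_i|$ denotes the number of nodes of $\mathcal{G}_i$ qureshi2007graph , and then adopt the exponential function $\lambda(x) = e^{-x}$, a one-to-one function, to transform the normalized GED into a similarity score in the range of $(0, 1]$.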
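The transformation and its inverse can be sketched as follows. This sketch assumes that the normalization divides the GED by the average number of nodes of the two graphs and that the exponential function is $e^{-x}$; the function names are hypothetical.

```python
import math

def normalized_ged(ged, n1, n2):
    """Divide the GED by the average node count of the two graphs (assumed form)."""
    return ged / ((n1 + n2) / 2.0)

def ged_to_similarity(ged, n1, n2):
    """Map a GED to a similarity score in (0, 1] via exp(-nGED)."""
    return math.exp(-normalized_ged(ged, n1, n2))

def similarity_to_ged(sim, n1, n2):
    """Invert the mapping: recover the GED from a similarity score."""
    return -math.log(sim) * (n1 + n2) / 2.0

# A GED of 3 between graphs with 4 and 6 nodes gives nGED = 0.6.
sim = ged_to_similarity(3, 4, 6)
recovered = similarity_to_ged(sim, 4, 6)
```

Because the mapping is one-to-one, a predicted similarity score can always be converted back into a predicted GED for a given pair of graph sizes.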
Effectiveness The results on the three datasets can be found in Table 1. We report Mean Squared Error (mse), Kendall’s Rank Correlation Coefficient ($\tau$) kendall1938new and Precision at $k$ (p@$k$) for each model on the test set. Query demos are shown in Fig. 6. In each demo, the top row depicts the query along with the ground-truth ranking results, labeled with their normalized GEDs to the query. The bottom row shows the graphs returned by our model, each with its rank shown at the top. GSimCNN is able to retrieve graphs similar to the query.
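For concreteness, the two ranking metrics can be sketched as below. The tau variant shown ignores ties and the p@k definition compares the two top-k sets; both are common conventions assumed here, and the exact evaluation code used for Table 1 is not given in the text.

```python
from itertools import combinations

def kendall_tau(true_scores, pred_scores):
    """Kendall's tau without tie correction: (concordant - discordant) / pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(true_scores)), 2):
        sign = (true_scores[i] - true_scores[j]) * (pred_scores[i] - pred_scores[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(true_scores) * (len(true_scores) - 1) // 2
    return (concordant - discordant) / n_pairs

def precision_at_k(true_scores, pred_scores, k):
    """Fraction of the predicted top-k items that are in the true top-k."""
    def top_k(scores):
        ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
        return set(ranked[:k])
    return len(top_k(true_scores) & top_k(pred_scores)) / k

# Hypothetical similarity scores of four database graphs to one query.
true_sim = [0.9, 0.7, 0.5, 0.1]
pred_sim = [0.8, 0.6, 0.7, 0.2]
tau = kendall_tau(true_sim, pred_sim)
p_at_2 = precision_at_k(true_sim, pred_sim, 2)
```

Both metrics are computed per query over the database graphs and would then be averaged across queries.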
* On AIDS and LINUX, A* is used as the ground truth. On the largest dataset, IMDB, A* runs too slowly; since Beam, Hungarian, and VJ are guaranteed to return upper bounds of the exact GEDs, we take the minimum of the three as the ground truth. This approach has been adopted by the ICPR 2016 Graph Distance Contest: https://gdc2016.greyc.fr/.
The results are based on the split ratio of 6:2:2. We repeated the experiment 10 times on AIDS, and the standard deviation of mse is.
Efficiency In Fig. 3, the results are averaged across queries and in wall time. EmbAvg is the fastest method among all, but its performance is poor, since it simply takes the dot product between two graph-level embeddings (average of node embeddings) as the predicted similarity score. Beam and Hungarian run fast on LINUX, but due to their higher time complexity as shown in Table 2, they scale poorly on the largest dataset, IMDB. In general, neural network based models benefit from the parallelizability and acceleration provided by GPU, and in particular, our model GSimCNN achieves the best trade-off between running time and performance.
Future work will investigate the generation of edit sequences for better interpretability of the predicted similarity, the effects of the usage of other node embedding methods, e.g. GraphSAGE hamilton2017inductive , and the adoption of other graph similarity metrics.
Statement of Overlapping Work
At the time of submission, most content of this paper is under review for AAAI 2019.
A Connections with Set Matching
In this section, we present GSimCNN from the perspective of set matching, by making theoretical connections with two types of graph matching methods: optimal assignment kernels for graph classification and bipartite graph matching for GED computation. In fact, beyond graphs, set matching has broader applications in Computer Networking (e.g. Internet content delivery) maggs2015algorithmic , Computer Vision (e.g. semantic visual matching) zanfir2018deep , Bioinformatics (e.g. protein alignment) zaslavskiy2009global , Internet Advertising (e.g. advertisement auctions) edelman2005advertising , Labor Markets (e.g. intern host matching) roth1984medical , etc. This opens many possibilities for future work and suggests the potential impact of GSimCNN beyond the graph learning community.
a.1 Connection with Optimal Assignment Kernels
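Graph kernels measure the similarity between two graphs, and have been extensively applied to the task of graph classification. Formally speaking, a valid kernel on a set $\mathcal{X}$ is a function $k: \mathcal{X} \times \mathcal{X} \rightarrow \mathbb{R}$ such that there is a real Hilbert space (feature space) $\mathcal{H}$ and a feature map function $\phi: \mathcal{X} \rightarrow \mathcal{H}$ such that $k(x, y) = \langle \phi(x), \phi(y) \rangle$ for every $x$ and $y$ in $\mathcal{X}$, where $\langle \cdot, \cdot \rangle$ denotes the inner product of $\mathcal{H}$.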
Among different families of graph kernels, optimal assignment kernels establish the correspondence between parts of the two graphs, and have many variants frohlich2005optimal ; johansson2015learning ; kriege2016valid ; nikolentzos2017matching . Let $\mathcal{B}(\mathcal{G}_1, \mathcal{G}_2)$ denote the set of all bijections between two sets of nodes, and let $k(x, y)$ denote a base kernel that measures the similarity between two nodes $x$ and $y$. An optimal assignment graph kernel is defined as
$$K_{\mathcal{B}}^{k}(\mathcal{G}_1, \mathcal{G}_2) = \max_{B \in \mathcal{B}(\mathcal{G}_1, \mathcal{G}_2)} W(B), \quad \text{where } W(B) = \sum_{(x, y) \in B} k(x, y) \tag{3}$$
Intuitively, the optimal assignment graph kernels maximize the total similarity between the assigned parts. If the two sets are of different cardinalities, one can add new objects $z$ with $k(z, x) = 0$ to the smaller set kriege2016valid .
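To make the definition concrete, the sketch below evaluates an optimal assignment kernel by brute force over all bijections, padding the smaller set with dummy objects whose base-kernel value is zero. The function names and the dot-product base kernel are illustrative assumptions, and the factorial running time restricts it to very small sets.

```python
from itertools import permutations

def dot(x, y):
    """Dot-product base kernel between two node vectors (an assumed choice)."""
    return sum(a * b for a, b in zip(x, y))

def optimal_assignment_kernel(xs, ys, base_kernel):
    """Maximum total base-kernel value over all bijections between xs and ys.

    The smaller set is padded with None, a dummy object whose similarity
    to everything is 0.
    """
    n = max(len(xs), len(ys))
    xs = list(xs) + [None] * (n - len(xs))
    ys = list(ys) + [None] * (n - len(ys))

    def k(x, y):
        return 0.0 if x is None or y is None else base_kernel(x, y)

    # Each permutation of ys corresponds to one bijection xs[i] -> perm[i].
    return max(sum(k(x, y) for x, y in zip(xs, perm))
               for perm in permutations(ys))

# Hypothetical node vectors.
k_equal = optimal_assignment_kernel([(1, 0), (0, 1)], [(0, 2), (3, 0)], dot)
k_padded = optimal_assignment_kernel([(1, 0)], [(0, 2), (3, 0)], dot)
```

Practical implementations replace the enumeration with a polynomial-time assignment solver; the enumeration is used here only because it mirrors the definition directly.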
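Let us take the Earth Mover’s Distance (EMD) kernel nikolentzos2017matching as an example, since it is among the methods most similar to our proposed approach. It treats a graph as a bag of node embedding vectors, but instead of utilizing the pairwise inner products between node embeddings to approximate GED, it computes the optimal “travel cost” between two graphs, where the cost is defined as the $\ell_2$ distance between node embeddings. Given two graphs with node embeddings $\boldsymbol{X} \in \mathbb{R}^{N_1 \times D}$ and $\boldsymbol{Y} \in \mathbb{R}^{N_2 \times D}$, it solves the following transportation problem rubner2000earth :
$$\min \sum_{i=1}^{N_1} \sum_{j=1}^{N_2} T_{ij} \lVert \boldsymbol{x}_i - \boldsymbol{y}_j \rVert_2 \quad \text{subject to} \quad \sum_{i=1}^{N_1} T_{ij} = \frac{1}{N_2} \;\; \forall j, \quad \sum_{j=1}^{N_2} T_{ij} = \frac{1}{N_1} \;\; \forall i, \quad T_{ij} \geq 0 \tag{4}$$
where $\boldsymbol{T} \in \mathbb{R}^{N_1 \times N_2}$ denotes the flow matrix, with $T_{ij}$ being how much of node $i$ in $\mathcal{G}_1$ travels (or “flows”) to node $j$ in $\mathcal{G}_2$. In other words, the EMD between two graphs is the minimum amount of “work” that needs to be done to transform one graph to another, where the optimal transportation plan is encoded by $\boldsymbol{T}$.
It has been shown that if $N_1 = N_2 = N$, the optimal solution satisfies $T_{ij} \in \{0, 1/N\}$ balinski1961fixed , satisfying the optimal bijection requirement of the assignment kernel. Even if $N_1 \neq N_2$, this can still be regarded as approximating an assignment problem fan2017point .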
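For two graphs with the same number of nodes, the bijection property means the EMD can be computed by searching over node permutations. The sketch below assumes each node carries equal mass $1/N$ and uses the Euclidean distance as the travel cost; it enumerates all permutations, so it is only meant for tiny illustrative inputs.

```python
import math
from itertools import permutations

def emd_equal_size(xs, ys):
    """EMD between two equally sized sets of embeddings with uniform mass 1/N.

    With equal sizes an optimal flow can be taken as a bijection, so the
    EMD equals the cheapest total matching distance divided by N.
    """
    n = len(xs)
    if n != len(ys):
        raise ValueError("this sketch only covers sets of equal size")
    best = min(sum(math.dist(x, y) for x, y in zip(xs, perm))
               for perm in permutations(ys))
    return best / n

# Hypothetical 2-D node embeddings of two 2-node graphs.
emd = emd_equal_size([(0.0, 0.0), (1.0, 0.0)], [(1.0, 0.0), (0.0, 1.0)])
```

Here the cheapest bijection matches the two identical points at zero cost and the remaining pair at distance 1, so the total work is 1 and the EMD is 0.5.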
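To show the relation between the EMD kernel and our approach, we consider GSimCNN as a mapping function that, given two graphs with node embeddings $\boldsymbol{X}$ and $\boldsymbol{Y}$, produces one score as the predicted similarity score, which is compared against the ground-truth similarity score $s(\mathcal{G}_1, \mathcal{G}_2)$:
$$\min \big( h_{\Theta}(\boldsymbol{X}, \boldsymbol{Y}) - s(\mathcal{G}_1, \mathcal{G}_2) \big)^2 \tag{5}$$
where $h_{\Theta}(\boldsymbol{X}, \boldsymbol{Y})$ represents the Graph Interaction and CNN layers but can potentially be replaced by any neural network transformation.
To further see the connection, we consider one CNN layer with one filter of size $N$ by $N$, where $N = \max(N_1, N_2)$. Then Eq. 5 becomes:
$$\min \Big( \sum_{i=1}^{N} \sum_{j=1}^{N} \Theta_{ij} \, \boldsymbol{x}_i^{\top} \boldsymbol{y}_j - s(\mathcal{G}_1, \mathcal{G}_2) \Big)^2 \tag{6}$$
where $\boldsymbol{\Theta} \in \mathbb{R}^{N \times N}$ is the convolutional filter.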
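The single-filter case can be sketched as a weighted sum over the node-node inner products, with one filter weight per entry of the similarity matrix. The sketch assumes the filter covers the whole matrix and that no activation follows it; the function name and the example filter are hypothetical.

```python
def single_filter_response(xs, ys, theta):
    """Response of one full-size filter applied to the similarity matrix.

    xs, ys : lists of node embedding vectors (already padded to equal length)
    theta  : filter weights, theta[i][j] multiplies the inner product of
             xs[i] and ys[j]
    """
    total = 0.0
    for i, x in enumerate(xs):
        for j, y in enumerate(ys):
            total += theta[i][j] * sum(a * b for a, b in zip(x, y))
    return total

def squared_error(prediction, target):
    """Squared difference between predicted and ground-truth similarity."""
    return (prediction - target) ** 2

# Hypothetical embeddings; an identity filter only scores the pairs (i, i).
xs = [(1.0, 0.0), (0.0, 1.0)]
ys = [(1.0, 0.0), (0.0, 1.0)]
identity_filter = [[1.0, 0.0], [0.0, 1.0]]
response = single_filter_response(xs, ys, identity_filter)
```

In this view the filter weights play a role similar to the flow matrix of the EMD kernel, except that they are learned from data rather than obtained by solving an optimization problem for each pair.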
Compared with the EMD kernel, our method has two benefits. (1) The mapping function $h_{\Theta}$ and the node embeddings $\boldsymbol{X}$ and $\boldsymbol{Y}$ are simultaneously learned through backpropagation, while the kernel method solves the assignment problem to obtain $\boldsymbol{T}$ and uses fixed node embeddings $\boldsymbol{X}$ and $\boldsymbol{Y}$, e.g. generated by the decomposition of the graph Laplacian matrix. Thus, GSimCNN is suitable for learning an approximation of the GED graph distance metric, while the kernel method is not. The typical usage of a graph kernel is to feed the graph-graph similarities into an SVM classifier for graph classification. (2) The best average time complexity of solving Eq. 4 scales as $O(n^3 \log n)$ pele2009fast , where $n$ denotes the total number of nodes in the two graphs, while the convolution operation takes $O((\max(N_1, N_2))^2)$ time.
a.2 Connection with Bipartite Graph Matching
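Among the existing approximate GED computation algorithms, Hungarian riesen2009approximate and VJ fankhauser2011speeding are two classic ones based on bipartite graph matching. Similar to the optimal assignment kernels, Hungarian and VJ also find an optimal match between the nodes of two graphs. However, different from the EMD kernel, the assignment problem has stricter constraints: one node in $\mathcal{G}_1$ can be mapped to only one other node in $\mathcal{G}_2$. Thus, the entries in the assignment matrix $\boldsymbol{A} \in \mathbb{R}^{N' \times N'}$ are either 0 or 1, denoting the operations transforming $\mathcal{G}_1$ into $\mathcal{G}_2$, where $N' = N_1 + N_2$. The assignment problem takes the following form:
$$\min \sum_{i=1}^{N'} \sum_{j=1}^{N'} A_{ij} C_{ij} \quad \text{subject to} \quad \sum_{i=1}^{N'} A_{ij} = 1 \;\; \forall j, \quad \sum_{j=1}^{N'} A_{ij} = 1 \;\; \forall i, \quad A_{ij} \in \{0, 1\} \tag{7}$$
The cost matrix $\boldsymbol{C} \in \mathbb{R}^{N' \times N'}$ reflects the GED model, and is defined as follows:
$$\boldsymbol{C} = \begin{bmatrix} C_{1,1} & \cdots & C_{1,N_2} & C_{1,\epsilon} & \cdots & \infty \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ C_{N_1,1} & \cdots & C_{N_1,N_2} & \infty & \cdots & C_{N_1,\epsilon} \\ C_{\epsilon,1} & \cdots & \infty & 0 & \cdots & 0 \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ \infty & \cdots & C_{\epsilon,N_2} & 0 & \cdots & 0 \end{bmatrix} \tag{8}$$
where $C_{i,j}$ denotes the cost of a substitution, $C_{i,\epsilon}$ denotes the cost of a node deletion, and $C_{\epsilon,j}$ denotes the cost of a node insertion. According to our GED definition, $C_{i,j} = 0$ if the labels of node $i$ and node $j$ are the same, and 1 otherwise; $C_{i,\epsilon} = C_{\epsilon,j} = 1$.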
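A sketch of the square cost matrix and a brute-force solver for tiny inputs is given below. It assumes the usual four-block layout (substitutions, deletions on a diagonal, insertions on a diagonal, and a zero block), uses node label costs only as stated in the text, and leaves out any edge-related cost terms; the function names are hypothetical.

```python
from itertools import permutations

INF = float("inf")

def build_cost_matrix(labels1, labels2):
    """(N1 + N2) x (N1 + N2) node cost matrix for bipartite GED matching."""
    n1, n2 = len(labels1), len(labels2)
    size = n1 + n2
    cost = [[0.0] * size for _ in range(size)]
    for i in range(n1):
        # Top-left block: substitution costs (0 if labels match, else 1).
        for j in range(n2):
            cost[i][j] = 0.0 if labels1[i] == labels2[j] else 1.0
        # Top-right block: deleting node i costs 1, other entries are forbidden.
        for j in range(n1):
            cost[i][n2 + j] = 1.0 if i == j else INF
    for i in range(n2):
        # Bottom-left block: inserting node i costs 1, other entries are forbidden.
        for j in range(n2):
            cost[n1 + i][j] = 1.0 if i == j else INF
        # Bottom-right block keeps its initial value 0.
    return cost

def solve_assignment_bruteforce(cost):
    """Minimum total cost over all one-to-one row/column assignments."""
    n = len(cost)
    return min(sum(cost[i][perm[i]] for i in range(n))
               for perm in permutations(range(n)))

# Hypothetical node labels: the second graph has one extra node labeled "N".
cost = build_cost_matrix(["C", "O"], ["C", "N", "O"])
best = solve_assignment_bruteforce(cost)
```

In this example the cheapest assignment substitutes the two matching labels at no cost and inserts the extra node at cost 1. Hungarian and VJ solve the same assignment in polynomial time rather than by enumeration.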
Exactly solving this constrained optimization program would yield the exact GED solution fankhauser2011speeding , but it is NP-complete since it is equivalent to finding an optimal matching in a complete bipartite graph riesen2009approximate .
To efficiently solve the assignment problem, the Hungarian algorithm kuhn1955hungarian and the Volgenant Jonker (VJ) jonker1987shortest algorithm are commonly used, which both run in cubic time. In contrast, GSimCNN takes advantage of the exact solutions of this problem during the training stage, and computes the approximate GED during testing in quadratic time, without the need for solving any optimization problem for a new graph pair.
a.3 Summary of Connections with Set Matching
To sum up, our model, GSimCNN, represents a new approach to modeling the similarities between graphs, by first transforming each graph into a set of node embeddings, where embeddings encode the link structure around each node, and then matching two sets of node embeddings. The entire model can be trained in an end-to-end fashion. In contrast, the other two approaches in Table 2 also model the graph-graph similarity by viewing a graph as a set, but suffer from limited learnability and cannot be trained end-to-end. Due to its neural network nature, the convolutional set matching approach enjoys flexibility and thus has the potential to be extended to solve other set matching problems.
|Approach||Example(s)||Graph Transforming Function Output||Set Matching Function|
|Optimal Assignment Kernels||EMD kernel nikolentzos2017matching||Node Embedding||Solver of Eq. 4|
|Bipartite Graph Matching||Hungarian riesen2009approximate , VJ fankhauser2011speeding||Nodes of $\mathcal{G}$||Solver of Eq. 7|
|Convolutional Set Matching||GSimCNN||Node Embedding||Graph Interaction + CNNs|
B Dataset Description
Three real-world graph datasets are used for the experiments. A concise summary can be found in Table 3.
AIDS AIDS is a collection of antivirus screen chemical compounds from the Developmental Therapeutics Program at NCI/NIH 7 (https://wiki.nci.nih.gov/display/NCIDTPdata/AIDS+Antiviral+Screen+Data), and has been used in several existing works on graph similarity search zeng2009comparing ; wang2012efficient . It contains 42,687 chemical compound structures with hydrogen atoms omitted. We randomly select 700 graphs, each of which has 10 or fewer nodes.
LINUX The LINUX dataset was originally introduced in wang2012efficient . It consists of 48,747 Program Dependence Graphs (PDG) generated from the Linux kernel. Each graph represents a function, where a node represents one statement and an edge represents the dependency between the two statements. We randomly select 1000 graphs, each of which has 10 or fewer nodes.
IMDB The IMDB dataset yanardag2015deep (named “IMDB-MULTI”) consists of 1500 ego-networks of movie actors/actresses, where there is an edge if the two people appear in the same movie. To test the scalability and efficiency of our proposed approach, we use the full dataset without any selection.
Since the GED computation is pairwise, it is necessary to take the number of pairs into consideration. There are 294K, 0.6M and 1.35M training graph pairs in total in the AIDS, LINUX and IMDB datasets, respectively.
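The reported pair counts are numerically consistent with pairing each training graph (60% of a dataset) with every graph in that dataset. This reading is our assumption rather than something the text states, and the short check below only verifies the arithmetic under it.

```python
def training_pairs(num_graphs, train_percent=60):
    """Pairs formed by each training graph with every graph in the dataset.

    Assumption: the training set holds train_percent% of the graphs and
    each training graph is paired with all num_graphs graphs.
    """
    num_train = num_graphs * train_percent // 100
    return num_train * num_graphs

# Dataset sizes as given in the dataset descriptions.
pairs = {name: training_pairs(n)
         for name, n in [("AIDS", 700), ("LINUX", 1000), ("IMDB", 1500)]}
```

Under this assumption the counts come out as 294,000, 600,000 and 1,350,000, matching the figures quoted above.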
|LINUX||Program Dependence Graphs||1||1000||4||10||7.7||1.5|
C Baseline Details
Our baselines include two types of approaches, fast approximate GED computation algorithms and neural network based models.
The first category of baselines includes four classic algorithms for GED computation. (1) A*-Beamsearch (Beam), (2) Hungarian, and (3) VJ return upper bounds of the true GEDs. (2) and (3) are described in Section A.2. (4) HED fischer2015approximation is based on Hausdorff matching, and yields GEDs smaller than or equal to the actual GEDs. Therefore, Beam, Hungarian, and VJ are used to determine the ground truth for IMDB without considering HED.
The second category of baselines includes four neural network architectures: (1) Siamese MPNN ribalearning , (2) EmbAvg, (3) GCNMean and (4) GCNMax defferrard2016convolutional . (1) generates all the pairwise node embedding similarity scores, and for each node, it finds the node in the other graph with the highest similarity score. It simply sums up all these similarity scores as the final result. (2), (3), and (4) take the dot product of the graph-level embeddings of the two graphs to produce the similarity score. (2) takes the unweighted average of node embeddings as the graph embedding. (3) and (4) adopt the original GCN architectures based on graph coarsening with mean and max pooling, respectively, to obtain the graph embedding. GSimCNN is our complete model with three levels of comparison granularity.
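The scoring rules of two of these baselines can be sketched from the descriptions above, taking node embeddings as given. The function names are hypothetical, and the best-match rule is one reading of the Siamese MPNN description (summing over the nodes of both graphs), not the original implementation.

```python
def dot(x, y):
    """Dot product of two vectors."""
    return sum(a * b for a, b in zip(x, y))

def emb_avg_score(h1, h2):
    """EmbAvg-style score: dot product of the mean node embeddings."""
    def mean(h):
        return [sum(col) / len(h) for col in zip(*h)]
    return dot(mean(h1), mean(h2))

def best_match_sum_score(h1, h2):
    """For every node in either graph, take its best match in the other
    graph and sum these scores (assumed reading of the baseline)."""
    sims = [[dot(x, y) for y in h2] for x in h1]
    from_g1 = sum(max(row) for row in sims)
    from_g2 = sum(max(col) for col in zip(*sims))
    return from_g1 + from_g2

# Hypothetical node embeddings of two 2-node graphs.
h1 = [[1.0, 0.0], [0.0, 1.0]]
h2 = [[1.0, 0.0], [1.0, 1.0]]
avg_score = emb_avg_score(h1, h2)
match_score = best_match_sum_score(h1, h2)
```

Neither rule has trainable parameters after the node embeddings are computed, which is the limitation that the CNN and dense layers of GSimCNN are meant to address.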
D Parameter Setting and Implementation Details
For the proposed model, we use the same network architecture on all the datasets. We set the number of GCN layers to 3, and use ReLU as the activation function. For the resizing scheme, all the similarity matrices are resized to 10 by 10. For the CNNs, we use the following architecture: conv(6,1,1,16), maxpool(2), conv(6,1,16,32), maxpool(2), conv(5,1,32,64), maxpool(2), conv(5,1,64,128), maxpool(3), conv(5,1,128,128), maxpool(3) (“conv(window size, kernel stride, input channels, output channels)”; “maxpool(pooling size)”).
GSimCNN is implemented using TensorFlow, and tested on a single machine with an Intel i7-6800K CPU and one Nvidia Titan GPU. As for training, we set the batch size to 128, use the Adam algorithm for optimization kingma2014adam , and fix the initial learning rate to 0.001. We set the number of iterations to 15000, and select the best model based on the lowest validation loss.
E Discussion and Result Analysis
e.1 Comparison between GSimCNN and Baselines
The classic algorithms for GED computation (e.g. A*, Beam, Hungarian, VJ, HED, etc.) usually require rather complicated design and implementation based on discrete optimization or combinatorial search. In contrast, GSimCNN is learnable and can take advantage of the exact solutions of GED computation during the training stage. Regarding time complexity, GSimCNN computes the approximate GED in quadratic time, without the need for solving any optimization problem for a new graph pair. Strictly speaking, GSimCNN predicts a similarity score, which can then be mapped back to the corresponding GED.
For the simple neural network based approaches:
1. EmbAvg, GCNMean and GCNMax all calculate the similarity score based on the inner product of graph embeddings. EmbAvg first takes the unweighted average of node embeddings to obtain the graph embedding, while GCNMean and GCNMax adopt graph coarsening with mean and max pooling, respectively, to obtain the graph embedding. The potential issues with these models are: (1) They fail to leverage the information from fine-grained node-level comparisons; (2) They calculate the final result based on the inner product between two graph embeddings, without any module to learn the graph-level interactions. In contrast, GSimCNN constructs multi-scale similarity matrices to encode the node-level interaction at different scales, and adopts CNNs to detect the matching patterns afterwards.
2. Siamese MPNN also computes all the pairwise node embedding similarity scores, like GSimCNN. However, it goes through all the nodes in both graphs, and for every node, it finds the node in the other graph with the highest similarity score, and simply sums up all these similarity scores as the final result with no trainable components afterwards. Our model, instead, is equipped with learnable CNN kernels and dense layers to extract matching patterns with similarity scores from multiple scales.
For efficiency, notice that A* can no longer be used to provide the ground truth for IMDB, the largest dataset, as “no currently available algorithm manages to reliably compute GED within reasonable time between graphs with more than 16 nodes” blumenthal2018exact . This not only shows the significance of time reduction for computing pairwise graph similarities/distances, but also highlights the challenges of creating a fast and accurate graph similarity search system.
As seen in Table 1 and Fig. 4 in the main paper, GSimCNN strikes a good balance between effectiveness and efficiency. Specifically, GSimCNN achieves the smallest error, the best ranking performance, and great time reduction on the task of graph similarity search.
e.2 Comparison between GSimCNN and Simple Variants of GSimCNN
The effectiveness of multiple levels of comparison over a single level can be seen from the performance boost in the last two rows of Table 4. The improvement is especially significant on IMDB, which can be attributed to the large average graph size, as seen from Table 3. The large variance of graph sizes associated with IMDB also favors the proposed resizing scheme over the padding scheme, which is also reflected in the results. On AIDS and LINUX, the use of resizing does not improve the performance much, but on IMDB, the improvement is much more significant.
F Extra Visualization
A few more visualizations are included in Fig. 10, 14, 18, 22, and 26. In each figure, two similarity matrices are visualized, the left showing the similarity matrix between the query graph and the most similar graph, the right showing the similarity matrix between the query and the least similar graph.
-  Xiaoli Wang, Xiaofeng Ding, Anthony KH Tung, Shanshan Ying, and Hai Jin. An efficient graph indexing method. In ICDE, pages 210–221. IEEE, 2012.
-  H Bunke. What is the distance between graphs. Bulletin of the EATCS, 20:35–39, 1983.
-  Zhiping Zeng, Anthony KH Tung, Jianyong Wang, Jianhua Feng, and Lizhu Zhou. Comparing stars: On approximating graph edit distance. PVLDB, 2(1):25–36, 2009.
-  David B Blumenthal and Johann Gamper. On the exact computation of the graph edit distance. Pattern Recognition Letters, 2018.
-  Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. ICLR, 2016.
-  Thomas N Kipf and Max Welling. Variational graph auto-encoders. NIPS Workshop on Bayesian Deep Learning, 2016.
-  Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. In NIPS, pages 1024–1034, 2017.
-  Jiaxuan You, Rex Ying, Xiang Ren, William Hamilton, and Jure Leskovec. Graphrnn: Generating realistic graphs with deep auto-regressive models. In ICML, pages 5694–5703, 2018.
-  Philippe Thévenaz, Thierry Blu, and Michael Unser. Image interpolation and resampling. Handbook of medical imaging, processing and analysis, 1(1):393–420, 2000.
-  Pinar Yanardag and SVN Vishwanathan. Deep graph kernels. In SIGKDD, pages 1365–1374. ACM, 2015.
-  Michel Neuhaus, Kaspar Riesen, and Horst Bunke. Fast suboptimal algorithms for the computation of graph edit distance. In S+SSPR, pages 163–172. Springer, 2006.
-  Stefan Fankhauser, Kaspar Riesen, and Horst Bunke. Speeding up graph edit distance computation through fast bipartite matching. In GbRPR, pages 102–111. Springer, 2011.
-  Kaspar Riesen and Horst Bunke. Approximate graph edit distance computation by means of bipartite graph matching. Image and Vision computing, 27(7):950–959, 2009.
-  Andreas Fischer, Ching Y Suen, Volkmar Frinken, Kaspar Riesen, and Horst Bunke. Approximation of graph edit distance based on hausdorff matching. Pattern Recognition, 48(2):331–343, 2015.
-  Pau Riba, Andreas Fischer, Josep Lladós, and Alicia Fornés. Learning graph distances with message passing neural networks. In ICPR, pages 2239–2244, 2018.
-  Michaël Defferrard, Xavier Bresson, and Pierre Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In NIPS, pages 3844–3852, 2016.
-  Rashid Jalal Qureshi, Jean-Yves Ramel, and Hubert Cardot. Graph based shapes representation and recognition. In GbRPR, pages 49–60. Springer, 2007.
-  Maurice G Kendall. A new measure of rank correlation. Biometrika, 30(1/2):81–93, 1938.
-  Peter E Hart, Nils J Nilsson, and Bertram Raphael. A formal basis for the heuristic determination of minimum cost paths. IEEE transactions on Systems Science and Cybernetics, 4(2):100–107, 1968.
-  Bruce M Maggs and Ramesh K Sitaraman. Algorithmic nuggets in content delivery. ACM SIGCOMM Computer Communication Review, 45(3):52–66, 2015.
-  Andrei Zanfir and Cristian Sminchisescu. Deep learning of graph matching. In CVPR, pages 2684–2693, 2018.
-  Mikhail Zaslavskiy, Francis Bach, and Jean-Philippe Vert. Global alignment of protein–protein interaction networks by graph matching methods. Bioinformatics, 25(12):i259–i267, 2009.
-  Benjamin Edelman, Michael Ostrovsky, and Michael Schwarz. Internet advertising and the generalized second-price auction: Selling billions of dollars worth of keywords. American Economic Review, 97(1):242–259, March 2007.
-  Alvin E Roth. The evolution of the labor market for medical interns and residents: A case study in game theory. Journal of Political Economy, 92:991–1016, 1984.
-  Holger Fröhlich, Jörg K Wegner, Florian Sieker, and Andreas Zell. Optimal assignment kernels for attributed molecular graphs. In ICML, pages 225–232. ACM, 2005.
-  Fredrik D Johansson and Devdatt Dubhashi. Learning with similarity functions on graphs using matchings of geometric embeddings. In KDD, pages 467–476. ACM, 2015.
-  Nils M Kriege, Pierre-Louis Giscard, and Richard Wilson. On valid optimal assignment kernels and applications to graph classification. In NIPS, pages 1623–1631, 2016.
-  Giannis Nikolentzos, Polykarpos Meladianos, and Michalis Vazirgiannis. Matching node embeddings for graph similarity. In AAAI, pages 2429–2435, 2017.
-  Yossi Rubner, Carlo Tomasi, and Leonidas J Guibas. The earth mover’s distance as a metric for image retrieval. International journal of computer vision, 40(2):99–121, 2000.
-  Michel L Balinski. Fixed-cost transportation problems. Naval Research Logistics Quarterly, 8(1):41–54, 1961.
-  Haoqiang Fan, Hao Su, and Leonidas J Guibas. A point set generation network for 3d object reconstruction from a single image. In CVPR, volume 2, page 6, 2017.
-  Ofir Pele and Michael Werman. Fast and robust earth mover’s distances. In ICCV, volume 9, pages 460–467, 2009.
-  Harold W Kuhn. The hungarian method for the assignment problem. Naval research logistics quarterly, 2(1-2):83–97, 1955.
-  Roy Jonker and Anton Volgenant. A shortest augmenting path algorithm for dense and sparse linear assignment problems. Computing, 38(4):325–340, 1987.
-  Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. ICLR, 2015.