Graph edit distance (GED) is a popular similarity measure between graphs, which lies at the core of many vision and pattern recognition tasks including image matching [ChoICCV13], signature verification [MaergnerPRL19], scene-graph edition [ChenECCV20], drug discovery [RiesenSSPR08], and case-based reasoning [ZeyenICCBR20]. In general, GED algorithms aim to find an optimal edit path from the source graph to the target graph with minimum edit cost, which is inherently an NP-complete combinatorial problem [AbuICPRAM15]:
$$GED(\mathcal{G}_1, \mathcal{G}_2) = \min_{(e_1, \dots, e_l) \in \gamma(\mathcal{G}_1, \mathcal{G}_2)} \sum_{i=1}^{l} c(e_i) \tag{1}$$
where $\gamma(\mathcal{G}_1, \mathcal{G}_2)$ denotes the set of all possible “edit paths” transforming the source graph $\mathcal{G}_1$ into the target graph $\mathcal{G}_2$, and $c(e_i)$ measures the cost of edit operation $e_i$.
Exact GED solvers [AbuICPRAM15, RiesenMLG07] are guaranteed to find the optimal solution under dynamic condition, at the cost of poor scalability on large graphs, and these exact solvers rely heavily on heuristics to estimate the corresponding graph similarity based on the current partial solution. Recent efforts in deep graph similarity learning [BaiWSDM19, BaiAAAI20, LiICML19] adopt graph neural networks [KipfICLR17, ScarselliNN09] to directly regress graph similarity scores without explicitly incorporating the intrinsic combinatorial nature of GED, and hence fail to recover the edit path. However, the edit path is often of central interest in many applications [ChenECCV20, ChoICCV13], and most GED works [AbuICPRAM15, RiesenIVC09, FankGBRPR11, ZengVLDB09, RiesenMLG07] still focus on finding the edit path itself.
As graph sizes grow, more scalable GED solvers are called for, which are also expected to recover the exact edit path. However, existing methods cannot offer both merits at once. As discussed above, deep learning-based solvers have difficulty in recovering the edit path, while learning-free methods suffer from scalability issues. In this paper, we aim to design a hybrid solver that combines the best of the two worlds.
Specifically, we resort to the A* algorithm [RiesenMLG07], which is a popular solution among open-source GED software [ChangICDE20, RiesenGithub], and we adopt neural networks to predict similarity scores which are used to guide the A* search, in replacement of the manually designed heuristics in traditional A*. We want to highlight our proposed Graph Edit Neural Network (GENN) in two aspects regarding dynamic programming concepts. Firstly, we propose to reuse the previous embedding information given a graph modification (e.g. node deletion), since the graph nodes are deleted progressively among the states of the A* search tree (to distinguish the “nodes” in graphs from the “nodes” in the search tree, we use the name “state” for the ones in the search tree). Secondly, we propose to learn a more effective heuristic that avoids unnecessary exploration of suboptimal branches and thereby achieves a significant speed-up.
The contributions made in this paper are:
1) We propose the first (to the best of our knowledge) deep network solver for GED, where a search tree state selection heuristic is learned by dynamic graph embedding. It outperforms traditional heuristics in efficacy.
2) We devise a graph embedding method in the spirit of dynamic programming that reuses previous computation to the utmost extent. In this sense, our method can be naturally integrated with the A* procedure, where a dynamic graph similarity prediction is involved after each graph modification, achieving much lower complexity compared to vanilla graph embeddings.
3) Experimental results on real-world graph data show that our learning-based approach achieves higher accuracy than state-of-the-art manually designed inexact solvers [FankGBRPR11, RiesenIVC09]. It also runs much faster than A* exact GED solvers [BergmannInfoSys14, RiesenMLG07] that perform exhaustive search to ensure the global optimum, with comparable accuracy.
2 Related Work
2.1 Traditional GED Solvers
Exact GED solvers. For small-scale problems, an exhaustive search can be used to find the global optimum. Exact methods are mostly based on tree-search algorithms such as the A* algorithm [RiesenMLG07], whereby a priority queue is maintained for all pending states to search, and the visiting order is controlled by the cost of the current partial edit path and a heuristic prediction of the edit distance between the remaining subgraphs [RiesenIVC09, ZengVLDB09]. Other combinatorial optimization techniques, e.g. depth-first branch-and-bound [AbuICPRAM15] and linear programming lower bounds [LerougePR17], can also be adopted to prune unnecessary branches in the search tree. However, exact GED methods are too time-consuming and suffer from poor scalability on large graphs [AbuPR17].
Inexact GED solvers aim to mitigate the scalability issue by predicting sub-optimal solutions in (usually) polynomial time. To our knowledge, bipartite matching based methods [FankGBRPR11, RiesenIVC09, ZengVLDB09] so far show a competitive trade-off between time and accuracy, where edge edition costs are encoded into node costs and the resulting bipartite matching problem can be solved in polynomial time by either the Hungarian [KuhnNavalResearch55, RiesenIVC09] or the Volgenant-Jonker [FankGBRPR11, VJ87] algorithm. Beam search [RiesenGithub] is the greedy version of the exact A* algorithm. Another line of work, namely approximate graph matching [ChoECCV10, JiangNIPS17, WangPAMI17, YanICMR16, YuNIPS18, ZhouCVPR12], is closely related to inexact GED, and there are efforts adopting graph matching methods, e.g. IPFP [LeordeanuNIPS09], to solve GED problems [BougleuxICPR16]. Two drawbacks of inexact solvers are that they rely heavily on human knowledge and that their solution quality is relatively poor.
2.2 Deep Graph Similarity Learning
Regression-based Similarity Learning. The recent success of machine learning on non-Euclidean data (i.e. graphs) via GNNs [FeyCVPR18, KipfICLR17, ScarselliNN09, ZhouArxiv18] has encouraged researchers to design approximators for graph similarity measurements such as GED. SimGNN [BaiWSDM19] first formulates graph similarity learning as a regression task, where its GCN [KipfICLR17] and attention [VaswaniNIPS17] layers are supervised by GED scores solved by A* [RiesenGithub]. Bai et al. [BaiAAAI20] extend their previous work by processing a multi-scale node-wise similarity map using CNNs. Li et al. [LiICML19] propose a cross-graph module in feed-forward GNNs which elaborates similarity learning. Such a scheme is also adopted in information retrieval, where [DaiSIGIR20] adopts a convolutional net to predict the edit cost between texts. However, none of these regression models can predict an edit path, which is mandatory in the GED problem.
Deep Graph Matching. Graph matching is another combinatorial problem closely related to GED. Since the seminal work [ZanfirCVPR18], there has been increasing attention on developing deep learning graph matching approaches [FeyICLR20, JiangArxiv19, WangICCV19], and many researchers [RolinekECCV20, WangICCV19, WangArxiv19, YuICLR20] have started to take a combinatorial view of graph matching learning rather than treating it as a regression task. Compared to graph similarity learning methods, deep graph matching methods can predict the edit path, but they are designed to match similarly structured graphs and lack particular mechanisms to handle node/edge insertions/deletions. Therefore, modification is needed to fit deep graph matching methods to GED, which is beyond the scope of this paper.
2.3 Dynamic Graph Embedding
The major line of graph embedding methods [FeyCVPR18, KipfICLR17, ScarselliNN09, ZhouArxiv18] assumes that graphs are static, which limits their application to real-world graphs that evolve over time. A line of work, namely dynamic graph embedding [pareja2020evolvegcn, manessi2020dynamic, zheng2019addgraph], aims to solve this issue, whereby recurrent neural networks (RNNs) are typically combined with GNNs to capture the temporal information in graph evolution. The applications include graph sequence classification [manessi2020dynamic], dynamic link prediction [pareja2020evolvegcn], and anomaly detection [zheng2019addgraph]. Dynamic graph embedding is also encountered in our GED learning task. However, none of the aforementioned works can be applied to our setting, where the graph structure evolves across different states of the search tree instead of across time steps.
3 Our Approach
3.1 Preliminaries on A* Algorithm for GED
To exactly solve the GED problem, researchers usually adopt tree-search based algorithms which traverse all possible combinations of edit operations. Among them, A* algorithm is rather popular [RiesenIVC09, RiesenGithub, RiesenMLG07, ChangICDE20] and we base our learning method on it. In this section, we introduce notations for GED and discuss the key components in A* algorithm.
GED aims to find the optimal edit path with minimum edit cost that transforms the source graph $\mathcal{G}_1$ into the target graph $\mathcal{G}_2$, where $|\mathcal{G}_1| = n_1$ and $|\mathcal{G}_2| = n_2$. We denote $V_1 = \{u_1, \dots, u_{n_1}\}$ and $V_2 = \{v_1, \dots, v_{n_2}\}$ as the nodes in the source graph and the target graph, respectively, and $\epsilon$ as the “void node”. Possible node edit operations include node substitution $u_i \to v_j$, node insertion $\epsilon \to v_j$ and node deletion $u_i \to \epsilon$, and the cost of each operation is defined by the problem. As shown in Fig. 2, the edge editions can be induced given node editions, therefore only node editions are explicitly considered in the A* algorithm. Note that node substitution can be viewed as node-to-node matching between two graphs, and node deletion/insertion can be viewed as matching nodes in the source/target graph to the void node, respectively. The concepts “matching” and “edition” are used interchangeably throughout this paper.
Alg. 1 illustrates a standard A* algorithm in line with [RiesenIVC09, RiesenMLG07]. A priority queue is maintained where each state of the search tree contains a partial solution $p$ to the GED problem. As shown in Fig. 2, the priority of each state is defined as the summation of two metrics: $g(p)$, the cost of the current partial solution, which can be computed exactly, and $h(p)$, the heuristic prediction of the GED between the unmatched subgraphs. A* always explores the state with minimum $g(p) + h(p)$ at each iteration, and optimality is guaranteed if $h(p) \le h^{opt}(p)$ holds for all partial solutions [RiesenIVC09], where $h^{opt}(p)$ means the optimal edit cost between the unmatched subgraphs.
A proper $h(p)$ is rather important to speed up the algorithm, and we discuss three variants of A* accordingly: 1) If $h(p) = h^{opt}(p)$, one can directly find the optimal path greedily. However, computing $h^{opt}(p)$ requires another exponential-time solver, which is intractable. 2) Heuristics can be utilized to predict $h(p)$ where $0 \le h(p) \le h^{opt}(p)$. The Hungarian bipartite heuristic [RiesenMLG07] is among the best-performing heuristics, with time complexity $O((n_1 + n_2)^3)$. In our experiments, Hungarian-A* [RiesenMLG07] is adopted as the baseline traditional exact solver. 3) Plain-A* is the simplest, where $h(p) = 0$ always holds, and such a strategy introduces no overhead when computing $h(p)$. However, the search tree may become too large without any “look ahead” on the future cost.
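To make the role of the two priority terms concrete, the following is a minimal Python sketch of A* for GED with a pluggable heuristic. It is an illustration under stated assumptions, not the paper's Alg. 1 or its Cython implementation: graphs are hypothetical `(labels, edges)` pairs with undirected edges stored as `frozenset` pairs, all insertion/deletion costs are 1, node substitution costs 1 only when labels differ, and edge substitution is free. The names `partial_cost` and `a_star_ged` are made up for this example.

```python
import heapq
from itertools import count


def partial_cost(mapping, g1, g2):
    """Exact cost g(p) of a partial edit path under the assumed unit costs.

    mapping[i] is the target node matched to source node i, or None if
    source node i is deleted. Edge editions are induced by node editions.
    """
    labels1, edges1 = g1
    labels2, edges2 = g2
    cost = 0
    for i, v in enumerate(mapping):
        if v is None:
            cost += 1  # node deletion
        elif labels1[i] != labels2[v]:
            cost += 1  # node substitution with a different label
        for j in range(i):
            w = mapping[j]
            in1 = frozenset((i, j)) in edges1
            in2 = v is not None and w is not None and frozenset((v, w)) in edges2
            cost += in1 != in2  # induced edge deletion or insertion
    if len(mapping) == len(labels1):
        # Complete path: every unused target node and its edges are inserted.
        used = {v for v in mapping if v is not None}
        cost += len(labels2) - len(used)
        cost += sum(1 for e in edges2 if not e <= used)
    return cost


def a_star_ged(g1, g2, heuristic=lambda mapping: 0):
    """A* over node editions; the default heuristic h(p) = 0 is plain A*."""
    n1, n2 = len(g1[0]), len(g2[0])
    tie = count()  # tie-breaker so the heap never compares mappings
    start_cost = partial_cost((), g1, g2)
    open_queue = [(start_cost, next(tie), start_cost, ())]
    expanded = 0
    while open_queue:
        _, _, g, mapping = heapq.heappop(open_queue)
        if len(mapping) == n1:
            return g, mapping, expanded
        expanded += 1
        used = {v for v in mapping if v is not None}
        candidates = [v for v in range(n2) if v not in used] + [None]
        for v in candidates:
            child = mapping + (v,)
            g_child = partial_cost(child, g1, g2)
            h_child = 0 if len(child) == n1 else heuristic(child)
            heapq.heappush(open_queue, (g_child + h_child, next(tie), g_child, child))
    raise RuntimeError("search space exhausted without a complete edit path")
```

Passing a different `heuristic` callable changes only the visiting order. With a heuristic that never overestimates the remaining cost, the first complete state popped is optimal; a learned heuristic that may overestimate trades this guarantee for a smaller search tree, which is the trade-off discussed in this section.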
The recent success of graph similarity learning [BaiWSDM19, BaiAAAI20, LiICML19] inspires us to predict a high-quality $h(p)$ which is close to $h^{opt}(p)$ in a cost-efficient manner via learning. In this paper, we propose to mitigate the scalability issue of A* by predicting $h(p)$ via dynamic graph embedding networks, where $h(p)$ is efficiently learned and predicted and the suboptimal branches in A* are pruned. It is worth noting that we break the optimality condition $h(p) \le h^{opt}(p)$, but the loss of accuracy is acceptable, as shown in experiments.
3.2 Graph Edit Neural Network
An overview of our proposed Graph Edit Neural Network-based A* (GENN-A*) learning algorithm is shown in Fig. 3. Our GENN-A* can be split into node embedding module (Sec. 3.2.1), dynamic embedding technique (Sec. 3.2.2), graph similarity prediction module (Sec. 3.2.3) and finally the training procedure (Sec. 3.2.4).
3.2.1 Node Embedding Module
The overall pipeline of our GENN is built in line with SimGNN [BaiWSDM19], and we remove the redundant histogram module in SimGNN in consideration of efficiency. Given input graphs, node embeddings are computed via GNNs.
Firstly, the node embeddings are initialized as the one-hot encoding of the node degree. For graphs with node labels (e.g. molecule graphs), we encode the node labels by one-hot vector and concatenate it to the degree embedding. The edges can be initialized as weighted or unweighted according to different definitions of graphs.
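As a small illustration of this initialization, the sketch below builds the initial node features with NumPy. The function name `initial_node_features` and its parameters (`max_degree`, `num_labels`) are hypothetical; it assumes degrees and labels are given as integer indices, and that the label encoding is appended after the degree encoding.

```python
import numpy as np


def initial_node_features(degrees, max_degree, labels=None, num_labels=0):
    """One-hot degree encoding, optionally concatenated with one-hot labels."""
    degree_onehot = np.eye(max_degree + 1)[np.asarray(degrees, dtype=int)]
    if labels is None:
        return degree_onehot  # unlabeled graphs: degree encoding only
    label_onehot = np.eye(num_labels)[np.asarray(labels, dtype=int)]
    return np.concatenate([degree_onehot, label_onehot], axis=1)
```

For a labeled graph with three nodes of degrees 1, 2, 1 and two possible labels, `initial_node_features([1, 2, 1], 3, [0, 1, 0], 2)` yields a 3 × 6 feature matrix.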
GNN backbone. Based on different types of graph data, Graph Convolutional Network (GCN) [KipfICLR17] is utilized for ordinary graph data (e.g. molecule graphs and program graphs) and SplineCNN [FeyCVPR18] is adopted for graphs built from 2D images, considering the recent success of adopting spline kernels to learn geometric features [FeyICLR20, RolinekECCV20]. The node embeddings obtained by the GNN backbone are cached for further efficient dynamic graph embedding. We build three GNN layers for our GENN in line with [BaiWSDM19].
3.2.2 Dynamic Embedding with A* Search Tree
A* is inherently a dynamic programming (DP) algorithm where matched nodes in partial solutions are progressively masked. When solving GED, each state of A* contains a partial solution, and in our method embedding networks are adopted to predict the edit distance between the two unmatched subgraphs. At each state, one more node is masked out in the unmatched subgraph compared to its parent state. Such a DP setting differs from existing so-called dynamic graph embedding problems [pareja2020evolvegcn, manessi2020dynamic, zheng2019addgraph] and calls for efficient cues, since the prediction of $h(p)$ is encountered at every state of the search tree. In this section, we discuss and compare three possible dynamic embedding approaches, among which our proposed GENN is built on DP concepts.
Vanilla GNN. The trivial way of handling the dynamic condition is that when the graph is modified, a complete feed-forward pass is called for all nodes in the new graph. However, such practice involves redundant computation, which is discussed as follows. We denote $n$ as the number of nodes, $d$ as the embedding dimension, and $K$ as the number of GNN layers. Assuming a fully-connected graph as the worst case, the time complexity of vanilla GNN is $O(n^2 d K)$ and no caching is needed.
Exact Dynamic GNN. As shown in the second row of Fig. 4, when a node is masked, only the embeddings of neighboring nodes are affected. If we cache all intermediate embeddings of the forward pass, one can compute the exact embedding at a minimum computational cost. Based on the message-passing nature of GNNs, at the $k$-th convolution layer, only the $k$-hop neighbors of the masked node are updated. However, the worst-case time complexity is still $O(n^2 d K)$ (for fully-connected graphs), and it requires $O(n d K)$ memory to cache all convolution layers. If all possible subgraphs are cached for best time efficiency, the memory cost grows to $O(n 2^n d K)$, which is unacceptable. Experimental results show that the speed-up of this strategy is negligible on our testbed.
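The locality argument above can be sketched with a breadth-first search: for each layer, it lists the nodes whose cached embedding would have to be recomputed after masking one node. This is a hedged illustration only; the function `nodes_to_update` and the adjacency-list input format are assumptions made for this example, not part of the paper's implementation.

```python
from collections import deque


def nodes_to_update(adjacency, masked, num_layers):
    """For layers k = 1..K, list the nodes within k hops of the masked node.

    adjacency maps each node to an iterable of its neighbors. The masked
    node itself is excluded, since it is removed from the graph.
    """
    dist = {masked: 0}
    queue = deque([masked])
    while queue:
        u = queue.popleft()
        if dist[u] == num_layers:
            continue  # nodes farther than K hops are never affected
        for w in adjacency[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return [
        sorted(n for n, d in dist.items() if 0 < d <= k)
        for k in range(1, num_layers + 1)
    ]
```

On a sparse graph the per-layer update sets stay small, but on a fully-connected graph every remaining node is within one hop of the masked node, so all embeddings are recomputed at every layer, which matches the worst case stated above.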
Our GENN. As shown in the last row of Fig. 4, we first perform a forward convolution pass and cache the embeddings of the last convolution layer. During the A* algorithm, if some nodes are masked out, we simply delete their embeddings from the last convolution layer and feed the remaining embeddings into the similarity prediction module. Our GENN involves a single forward pass, whose cost is negligible; the time complexity of loading caches is simply $O(1)$ and the memory consumption of caching is $O(n d)$.
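A minimal NumPy sketch of this caching scheme is given below. The class `EmbeddingCache` is a hypothetical name introduced for illustration, and the cached array stands in for the output of the last convolution layer; the actual GENN implementation may organize its cache differently.

```python
import numpy as np


class EmbeddingCache:
    """Caches last-layer node embeddings computed once per input graph."""

    def __init__(self, last_layer_embeddings):
        # Shape (n, d): one row per node of the original, unmasked graph.
        self._embeddings = np.asarray(last_layer_embeddings, dtype=float)

    def unmatched(self, matched_nodes):
        """Return the cached rows of nodes not yet matched in the state."""
        keep = np.ones(len(self._embeddings), dtype=bool)
        keep[np.array(sorted(matched_nodes), dtype=int)] = False
        return self._embeddings[keep]
```

Each A* state then calls `unmatched(...)` with its own set of matched nodes and passes the result to the similarity prediction module, so no convolution layer is re-run during the search.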
Our design of the caching scheme of GENN is mainly inspired by DP: given modification on the input graph (node deletion in our A* search case), the DP algorithm reuses the previous results for further computations in consideration of best efficiency. In our GENN, the node embeddings are cached for similarity computation on its subgraphs. In addition, DP algorithms tend to minimize the exploration space for best efficiency, and our learned prunes sub-optimal branches more aggressively than traditional heuristics which speeds up the A* solver.
3.2.3 Graph Similarity Prediction
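After obtaining the embedding vectors from the cache, the attention module and the neural tensor network are called to predict the similarity score. For notational simplicity, our discussion here is based on the full-sized, original input graphs.
Attention module for graph-level embedding. Given node-level embeddings, the graph-level embedding is obtained through an attention mechanism [VaswaniNIPS17]. We denote $\mathbf{X}_1 \in \mathbb{R}^{n_1 \times d}$ and $\mathbf{X}_2 \in \mathbb{R}^{n_2 \times d}$ as the node embeddings from the GNN backbone. The global keys are obtained by mean aggregation followed by a nonlinear transform:
$$\bar{\mathbf{X}}_1 = \mathrm{mean}(\mathbf{X}_1), \quad \bar{\mathbf{X}}_2 = \mathrm{mean}(\mathbf{X}_2) \tag{2}$$
$$\mathbf{k}_1 = \tanh(\bar{\mathbf{X}}_1 \mathbf{W}_1), \quad \mathbf{k}_2 = \tanh(\bar{\mathbf{X}}_2 \mathbf{W}_1) \tag{3}$$
where $\mathrm{mean}(\cdot)$ is performed on the first dimension (node dimension) and $\mathbf{W}_1 \in \mathbb{R}^{d \times d}$ is the learnable attention weight. Aggregation coefficients are computed from $\mathbf{k}_1, \mathbf{k}_2$ and $\mathbf{X}_1, \mathbf{X}_2$:
$$\mathbf{c}_1 = \delta(\mathbf{X}_1 \mathbf{k}_1^\top \cdot \alpha), \quad \mathbf{c}_2 = \delta(\mathbf{X}_2 \mathbf{k}_2^\top \cdot \alpha) \tag{4}$$
where $\alpha$ is the scaling factor and $\delta(\cdot)$ means sigmoid. The graph-level embedding is obtained by weighted summation of node embeddings based on the aggregation coefficients $\mathbf{c}_1, \mathbf{c}_2$:
$$\mathbf{g}_1 = \mathbf{c}_1^\top \mathbf{X}_1, \quad \mathbf{g}_2 = \mathbf{c}_2^\top \mathbf{X}_2 \tag{5}$$
Neural Tensor Network for similarity prediction. A Neural Tensor Network (NTN) [SocherNIPS13] is adopted to measure the similarity between $\mathbf{g}_1$ and $\mathbf{g}_2$:
$$s(\mathcal{G}_1, \mathcal{G}_2) = f\left(\mathbf{g}_1 \mathbf{W}_2^{[1:t]} \mathbf{g}_2^\top + \mathbf{W}_3\, \mathrm{cat}(\mathbf{g}_1, \mathbf{g}_2)^\top + \mathbf{b}\right) \tag{6}$$
where $\mathbf{W}_2 \in \mathbb{R}^{d \times d \times t}$, $\mathbf{W}_3 \in \mathbb{R}^{t \times 2d}$ and $\mathbf{b} \in \mathbb{R}^{t}$ are learnable, the first term means computing $\mathbf{g}_1 \mathbf{W}_2[:, :, i]\, \mathbf{g}_2^\top$ for all $i \in [1 \dots t]$ and then stacking them, $f: \mathbb{R}^t \to (0, 1)$ denotes a fully-connected layer with sigmoid activation, and $\mathrm{cat}(\cdot)$ means concatenation along the last dimension. $t$ controls the number of channels in the NTN and is set empirically.
In line with [BaiWSDM19], the model prediction lies within $(0, 1)$, which represents a normalized graph similarity score with the following connection to GED:
$$s(\mathcal{G}_1, \mathcal{G}_2) = \exp\left(-\frac{2 \cdot GED(\mathcal{G}_1, \mathcal{G}_2)}{n_1 + n_2}\right) \tag{7}$$
For a partial edit path encountered in the A* algorithm, the predicted similarity score $s(p)$ can be transformed to $h(p)$ following Eq. 7:
$$h(p) = -0.5\,(n_1' + n_2') \log s(p) \tag{8}$$
where $n_1' + n_2'$ means the number of nodes in the unmatched subgraphs. The time complexities of attention and NTN are $O((n_1' + n_2') d^2)$ and $O(d^2 t)$, respectively. Since the convolution layers are called only once, which is negligible, and the time complexity of loading the cached GENN embedding is $O(1)$, the overall time complexity of each prediction is $O((n_1' + n_2') d^2 + d^2 t)$. Our time complexity is comparable to the best-known learning-free prediction of $h(p)$ [RiesenMLG07], which is $O((n_1' + n_2')^3)$.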
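The prediction pipeline of this section (attention pooling, NTN, and the conversion from similarity score to heuristic value) can be sketched in NumPy as follows. This is an illustrative forward pass with randomly initialized or user-supplied weights, not the trained GENN; the function names, the argument `alpha`, and the final-layer parameters `w_out` and `b_out` are assumptions introduced for this example, and node embeddings are assumed to be non-empty.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def graph_embedding(x, w1, alpha):
    """Attention pooling: node embeddings (n, d) -> graph embedding (d,)."""
    key = np.tanh(x.mean(axis=0) @ w1)   # global key from mean aggregation
    coeff = sigmoid(alpha * (x @ key))   # one aggregation coefficient per node
    return coeff @ x                     # weighted sum of node embeddings


def ntn_similarity(g1, g2, w2, w3, b, w_out, b_out):
    """Neural tensor network followed by a sigmoid output layer.

    Assumed shapes: w2 (d, d, t), w3 (t, 2d), b (t,), w_out (t,).
    """
    bilinear = np.einsum("i,ijt,j->t", g1, w2, g2)  # t stacked bilinear forms
    hidden = bilinear + w3 @ np.concatenate([g1, g2]) + b
    return sigmoid(w_out @ hidden + b_out)          # similarity in (0, 1)


def similarity_to_heuristic(s, num_unmatched_nodes):
    """Invert s = exp(-2 * GED / (n1 + n2)) to obtain a GED estimate."""
    return -0.5 * num_unmatched_nodes * np.log(s)
```

As a sanity check on the conversion, a similarity of `exp(-0.6)` between unmatched subgraphs with 10 nodes in total corresponds to a predicted remaining edit cost of 3.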
| Method | Edit path | AIDS mse (×10⁻³) | AIDS ρ | AIDS p@10 | LINUX mse (×10⁻³) | LINUX ρ | LINUX p@10 | Willow-Cars mse (×10⁻³) | Willow-Cars ρ | Willow-Cars p@10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Beam Search [RiesenGithub] | ✓ | 12.090 | 0.609 | 0.481 | 9.268 | 0.827 | 0.973 | 1.820 | 0.815 | 0.725 |

Table 1. The evaluation metrics are those defined and used by [BaiWSDM19, BaiAAAI20]: mse stands for the mean squared error between the predicted similarity score and the ground-truth similarity score; ρ means Spearman's correlation between prediction and ground truth; p@10 means the precision of finding the closest graph among the predicted top 10 most similar ones. Willow-Cars is not compared with deep learning methods because optimal GED labels are not available for the training set. The AIDS and LINUX peer method results are quoted from [BaiAAAI20].
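For readers who want to reproduce these metrics, a hedged NumPy sketch follows. It assumes that `pred` and `truth` hold the similarity scores of one query against all candidate graphs, that Spearman's correlation is computed without tie handling, and that p@k is the overlap between the predicted and ground-truth top-k sets divided by k; the exact protocol of [BaiWSDM19, BaiAAAI20] may differ in such details.

```python
import numpy as np


def mse(pred, truth):
    """Mean squared error between predicted and ground-truth scores."""
    pred, truth = np.asarray(pred, dtype=float), np.asarray(truth, dtype=float)
    return float(np.mean((pred - truth) ** 2))


def spearman_rho(pred, truth):
    """Spearman's rank correlation, assuming no tied values."""
    rank_pred = np.argsort(np.argsort(pred))
    rank_truth = np.argsort(np.argsort(truth))
    return float(np.corrcoef(rank_pred, rank_truth)[0, 1])


def precision_at_k(pred, truth, k=10):
    """Fraction of the true top-k most similar graphs found in the predicted top-k."""
    top_pred = set(np.argsort(-np.asarray(pred))[:k].tolist())
    top_truth = set(np.argsort(-np.asarray(truth))[:k].tolist())
    return len(top_pred & top_truth) / k
```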
3.2.4 Supervised Dynamic Graph Learning
The training of our GENN consists of two steps: Firstly, GENN weights are initialized with graph similarity score labels from the training dataset. Secondly, the model is finetuned with the optimal edit path solved by A* algorithm. The detailed training procedure is listed in Alg. 2.
Following deep graph similarity learning peer methods [BaiWSDM19, BaiAAAI20], our GENN weights are supervised by ground truth labels provided by the dataset. For datasets with relatively small graphs, optimal GED scores can be solved as ground truth labels. In cases where optimal GEDs are not available, we can build the training set based on other meaningful measurements, e.g. adopting semantic node matching ground truth to compute GED labels.
We further propose a finetuning scheme of GENN to better suit the A* setting. However, tuning GENN with the states of the search tree means we require labels of $h^{opt}(p)$, while solving $h^{opt}(p)$ for an arbitrary partial edit path is again NP-complete. Instead of solving as many $h^{opt}(p)$ as needed, here we propose an efficient way of obtaining multiple $h^{opt}(p)$ labels by solving the GED only once.
Theorem 1. Given an optimal edit path $p^*$ and the corresponding $GED(p^*)$, for any partial edit path $p \subseteq p^*$, there holds $g(p) + h^{opt}(p) = GED(p^*)$.
Proof. If $g(p) + h^{opt}(p) > GED(p^*)$, then the minimum edit cost following $p$ is larger than $GED(p^*)$, therefore $p$ is not a partial optimal edit path, which violates $p \subseteq p^*$. If $g(p) + h^{opt}(p) < GED(p^*)$, it means that there exists a better edit path whose cost is smaller than $GED(p^*)$, which violates the condition that $p^*$ is the optimal edit path. Thus, $g(p) + h^{opt}(p) = GED(p^*)$. ∎
Based on Theorem 1, there holds $h^{opt}(p) = GED(p^*) - g(p)$ for any partial optimal edit path. Therefore, if we solve an optimal $p^*$ with $m$ node editions, $(2^m - 1)$ optimal partial edit paths can be used for finetuning. In experiments, we randomly select 200 graph pairs for finetuning since we find this adequate for convergence.
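The label-generation step can be sketched in Python as below. This is a simplified illustration, not the procedure of Alg. 2: it assumes unit costs with free edge substitution, represents an edit path as a list of `(source_node, target_node)` pairs where `None` plays the role of the void node, and takes for granted that the supplied path is optimal. The names `path_cost` and `finetuning_labels` are hypothetical.

```python
from itertools import combinations


def path_cost(ops, g1, g2):
    """Cost of a (partial) edit path under the assumed unit costs.

    Graphs are (labels, edges) pairs with edges stored as frozenset pairs.
    Edge editions are induced by every pair of node editions in the path.
    """
    labels1, edges1 = g1
    labels2, edges2 = g2
    cost = 0
    for a, (u, v) in enumerate(ops):
        if u is None or v is None:
            cost += 1  # node insertion or deletion
        elif labels1[u] != labels2[v]:
            cost += 1  # node substitution with a different label
        for u2, v2 in ops[:a]:
            in1 = u is not None and u2 is not None and frozenset((u, u2)) in edges1
            in2 = v is not None and v2 is not None and frozenset((v, v2)) in edges2
            cost += in1 != in2  # induced edge insertion or deletion
    return cost


def finetuning_labels(optimal_path, g1, g2):
    """Return (partial path, remaining optimal cost) for every non-empty subset."""
    optimal_path = tuple(optimal_path)
    ged = path_cost(optimal_path, g1, g2)
    labels = []
    for size in range(1, len(optimal_path) + 1):
        for partial in combinations(optimal_path, size):
            # Remaining cost is the total GED minus the cost already paid.
            labels.append((partial, ged - path_cost(partial, g1, g2)))
    return labels
```

Since the number of subsets grows exponentially with the path length, a practical implementation would likely sample partial paths rather than enumerate all of them.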
4.1 Settings and Datasets
We evaluate our learning-based A* method on three challenging real-world datasets: AIDS, LINUX [WangICDE12], and Willow dataset [ChoICCV13].
The AIDS dataset contains chemical compounds evaluated for evidence of anti-HIV activity (https://wiki.nci.nih.gov/display/NCIDTPdata/AIDS+Antiviral+Screen+Data). The AIDS dataset is pre-processed by [BaiWSDM19], who remove graphs with more than 10 nodes, and the optimal GED between any two graphs is provided. Following [BaiWSDM19], we define the node edition cost $c(u_i \to v_j) = 1$ if $u_i, v_j$ are different atoms, else $c(u_i \to v_j) = 0$. The node insertion and deletion costs are both defined as 1. The edges are regarded as non-attributed, therefore the edge substitution cost is 0 and the edge insertion/deletion cost is 1.
The LINUX dataset is proposed by [WangICDE12] and contains Program Dependency Graphs (PDG) from the LINUX kernel. The authors of [BaiWSDM19] also provide a pre-processed version where graphs have at most 10 nodes and optimal GED values are provided as ground truth. All nodes and edges are unattributed, therefore the substitution cost is 0 and the insertion/deletion cost is 1.
The Willow dataset is originally proposed by [ChoICCV13] for the semantic image keypoint matching problem, and we validate the performance of our GENN-A* on computer vision problems with the Willow dataset. All images from the same category share 10 common semantic keypoints. The “Cars” category is selected in our experiment. With the Willow-Cars dataset, graphs are built from 2D keypoint positions by Delaunay triangulation, and the edge edition cost is defined as $c(E_i \to E_j) = |E_i - E_j|$, where $E_i, E_j$ are the lengths of the two edges. Edge insertion/deletion costs of $E_i$ are defined as $E_i$. All edge lengths are normalized by 300 for numerical concerns. The node substitution has 0 cost, and the node insertion/deletion costs are infinite, therefore node insertion/deletion are prohibited. We build the training set labels by computing the GED based on the semantic keypoint matching relationship, and it is worth noting that such GEDs are different from the optimal ones. However, experimental results show that such supervision is adequate to initialize the model weights of GENN.
Among all three datasets, LINUX has the simplest definition of edit costs. In comparison, AIDS has attributed nodes and Willow dataset has attributed edges, making these two datasets more challenging than LINUX dataset. In line with [BaiWSDM19], we split all datasets by 60% for training, 20% for validation, and 20% for testing.
Our GENN-A* is implemented with PyTorch Geometric [FeyICLR19] and the A* algorithm is implemented with Cython [Cython] in consideration of performance. We adopt GCN [KipfICLR17] for the AIDS and LINUX datasets and SplineCNN [FeyCVPR18] for 2D Euclidean data from Willow-Cars (#kernels=16). The numbers of feature channels are set to 64, 32, and 16 for the three GNN layers. The Adam optimizer [adam] is used with 0.001 learning rate and weight decay. We set batch size=128 for LINUX and AIDS, and 16 for Willow. All experiments are run on our workstation with an Intel i7-7820X@3.60GHz and 64GB memory. Parallelization techniques, e.g. multi-threading and GPU parallelism, are not considered in our experiment.
[Table 3: inference time comparison; columns: Vanilla GNN, Exact Dynamic GNN, GENN (ours), Hungarian [RiesenMLG07]]
4.2 Peer Methods
Hungarian-A* [RiesenMLG07] is selected as the exact solver baseline, where Hungarian bipartite matching is used to predict $h(p)$. We reimplement Hungarian-A* based on our Cython implementation for fair comparison. We also select the Hungarian solver [RiesenIVC09] as the traditional inexact solver baseline in our experiments. It is worth noting that Hungarian bipartite matching can be adopted either as the heuristic in the A* algorithm (Hungarian heuristic for A*) or to provide a fast sub-optimal solution to GED (Hungarian solver), and readers should distinguish between these two methods. Other inexact solvers are also considered, including Beam search [RiesenGithub], which is the greedy version of A*, and VJ [FankGBRPR11], which is a variant of the Hungarian solver.
For regression-based deep graph similarity learning methods, we compare SimGNN [BaiWSDM19], GMN [LiICML19] and GraphSim [BaiAAAI20]. Our GENN backbone can be viewed as a simplified version from these methods, because the time efficiency with dynamic graphs is our main concern.
4.3 Results and Discussions
The evaluation on the AIDS, LINUX, and Willow-Cars datasets, in line with [BaiAAAI20], is presented in Tab. 1, where the problem is defined as querying a graph in the test dataset from all graphs in the training set. The similarity score is defined as in Eq. 7. Our regression model GENN has comparable performance against the state of the art with a simplified pipeline, and our GENN-A* performs best among all inexact GED solvers. We would like to point out that mse may not be a fair measurement when comparing GED solvers with regression-based models. Firstly, GED solvers can predict edit paths, while such a feature is not supported by regression-based models. Secondly, the solutions of GED solvers are upper bounds of the optimal values, but regression-based graph similarity models [BaiWSDM19, BaiAAAI20, LiICML19] predict GED values on both sides of the optimum. In fact, one can reduce the mse of GED solvers by adding a bias to the predicted GED values, which is exactly what the regression models are doing.
The number of states which have been added to OPEN in Alg. 1 is plotted in Fig. 5, where our GENN-A* significantly reduces the search tree size compared to Hungarian-A*. Such search-tree reduction results in the speed-up of the A* algorithm, as shown in Tab. 2. Both pieces of evidence show that our GENN learns a stronger $h(p)$ than the Hungarian heuristic [RiesenMLG07], whereby redundant explorations of suboptimal solutions are pruned. We further compare the inference time of the three discussed dynamic graph embedding methods in Tab. 3, where our GENN runs comparably fast against the Hungarian heuristic, despite the overhead of calling PyTorch functions from Cython. Exact Dynamic GNN is even slower than the vanilla version, since its frequent caching and loading operations may consume additional time. It is worth noting that further speed-up can be achieved by implementing all algorithms in C++ and adopting parallelism techniques, but these are beyond the scope of this paper.
In Fig. 7 we show the scatter plots of GENN-A* and the inexact Hungarian solver [RiesenIVC09] as GED solvers, as well as GENN and the Hungarian heuristic as the prediction methods for $h(p)$. Our GENN-A* benefits from the more accurate prediction of $h(p)$ by GENN, solving the majority of problem instances to optimality. We also visualize a query example on Willow-Cars images in Fig. 6, produced by our GENN-A*.
This paper has presented a hybrid approach for solving the classic graph edit distance (GED) problem by integrating a dynamic graph embedding network for similarity score prediction into the edit path search procedure. Our approach inherits the good interpretability of classic GED solvers, as it can recover the explicit edit path between two graphs, while it achieves better cost-efficiency by replacing the manual heuristics with the fast embedding module. Our learning-based A* algorithm can reduce the search tree size and save running time, at the cost of a small loss in accuracy.