1 Introduction
Graph matching refers to the problem of establishing meaningful structural correspondences of nodes between two or more graphs by taking both node similarities and pairwise edge similarities into account (Wang et al., 2019b). Since graphs are natural representations for encoding relational data, the problem of graph matching lies at the heart of many real-world applications. For example, comparing molecules in cheminformatics (Kriege et al., 2019b), matching protein networks in bioinformatics (Sharan & Ideker, 2006; Singh et al., 2008), linking user accounts in social network analysis (Zhang & Philip, 2015), and tracking objects, matching 2D/3D shapes or recognizing actions in computer vision (Vento & Foggia, 2012) can be formulated as a graph matching problem.
The problem of graph matching has been heavily investigated in theory (Grohe et al., 2018) and practice (Conte et al., 2004), usually by relating it to domain-agnostic distances such as the graph edit distance (Stauffer et al., 2017) and the maximum common subgraph problem (Bunke & Shearer, 1998), or by formulating it as a quadratic assignment problem (Yan et al., 2016). Since all three underlying problems are NP-hard, solving them to optimality may not be tractable for large-scale, real-world instances. Moreover, these purely combinatorial approaches do not adapt to the given data distribution and often do not consider continuous node embeddings which can provide crucial information about node semantics.
Recently, various neural architectures have been proposed to tackle the task of graph matching (Zanfir & Sminchisescu, 2018; Wang et al., 2019b; Zhang & Lee, 2019; Xu et al., 2019d, b; Derr et al., 2019; Zhang et al., 2019a; Heimann et al., 2018) or graph similarity (Bai et al., 2018, 2019; Li et al., 2019) in a data-dependent fashion. However, these approaches are either only capable of computing similarity scores between whole graphs (Bai et al., 2018, 2019; Li et al., 2019), rely on an inefficient global matching procedure (Zanfir & Sminchisescu, 2018; Wang et al., 2019b; Xu et al., 2019d; Li et al., 2019), or do not generalize to unseen graphs (Xu et al., 2019b; Derr et al., 2019; Zhang et al., 2019a). Moreover, they might be prone to match neighborhoods between graphs inconsistently by only taking localized embeddings into account (Zanfir & Sminchisescu, 2018; Wang et al., 2019b; Zhang & Lee, 2019; Xu et al., 2019d; Derr et al., 2019; Heimann et al., 2018).
Here, we propose a fully-differentiable graph matching procedure which aims to reach a data-driven neighborhood consensus between matched node pairs without the need to solve any optimization problem during inference. In addition, our approach is purely local, i.e., it operates on fixed-size neighborhoods around nodes, and is sparsity-aware, i.e., it takes the sparsity of the underlying structures into account. Hence, our approach scales well to large input domains, and can be trained in an end-to-end fashion to adapt to a given data distribution. Finally, our approach improves upon the state-of-the-art on several real-world applications from the fields of computer vision and entity alignment on knowledge graphs.
2 Problem Definition
A graph $\mathcal{G} = (\mathcal{V}, \mathbf{A}, \mathbf{X}, \mathbf{E})$ consists of a finite set of nodes $\mathcal{V} = \{1, \dots, n\}$, an adjacency matrix $\mathbf{A} \in \{0,1\}^{n \times n}$, a node feature matrix $\mathbf{X} \in \mathbb{R}^{n \times d}$, and an optional (sparse) edge feature matrix $\mathbf{E}$. For a subset of nodes $\mathcal{U} \subseteq \mathcal{V}$, $\mathcal{G}[\mathcal{U}]$ denotes the subgraph of $\mathcal{G}$ induced by $\mathcal{U}$. We refer to $\mathcal{N}_T(i) = \{ j \in \mathcal{V} : d(i,j) \le T \}$ as the $T$-hop neighborhood around node $i$, where $d(i,j)$ denotes the shortest-path distance in $\mathcal{G}$. A node coloring is a function $\mathcal{V} \to \Sigma$ with arbitrary codomain $\Sigma$.
The problem of graph matching refers to establishing node correspondences between two graphs. Formally, we are given two graphs, a source graph $\mathcal{G}_s = (\mathcal{V}_s, \mathbf{A}_s, \mathbf{X}_s, \mathbf{E}_s)$ and a target graph $\mathcal{G}_t = (\mathcal{V}_t, \mathbf{A}_t, \mathbf{X}_t, \mathbf{E}_t)$, w.l.o.g. $|\mathcal{V}_s| = n \le m = |\mathcal{V}_t|$, and are interested in finding a correspondence matrix $\mathbf{S} \in \{0,1\}^{n \times m}$ which minimizes an objective subject to the one-to-one mapping constraints $\mathbf{S} \mathbf{1}_m = \mathbf{1}_n$ and $\mathbf{S}^{\top} \mathbf{1}_n \le \mathbf{1}_m$. As a result, $\mathbf{S}$ infers an injective mapping $\pi \colon \mathcal{V}_s \to \mathcal{V}_t$ which maps each node in $\mathcal{G}_s$ to a node in $\mathcal{G}_t$.
Typically, graph matching is formulated as an edge-preserving, quadratic assignment problem (Anstreicher, 2003; Gold & Rangarajan, 1996; Caetano et al., 2009; Cho et al., 2013), i.e.,

$$\arg\max_{\mathbf{S}} \sum_{i, i' \in \mathcal{V}_s} \sum_{j, j' \in \mathcal{V}_t} [\mathbf{A}_s]_{i,i'} \, [\mathbf{A}_t]_{j,j'} \, \mathbf{S}_{i,j} \, \mathbf{S}_{i',j'} \quad (1)$$
subject to the one-to-one mapping constraints mentioned above. This formulation is based on the intuition of finding correspondences based on neighborhood consensus (Rocco et al., 2018), which shall prevent adjacent nodes in the source graph from being mapped to different regions in the target graph. Formally, a neighborhood consensus is reached if for all node pairs $(i, j) \in \mathcal{V}_s \times \mathcal{V}_t$ with $\mathbf{S}_{i,j} = 1$, it holds that for every node $i' \in \mathcal{N}_1(i)$ there exists a node $j' \in \mathcal{N}_1(j)$ such that $\mathbf{S}_{i',j'} = 1$.
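For a hard (integer) correspondence, the consensus criterion above can be checked directly by counting source edges whose endpoints are not mapped to adjacent target nodes. A minimal sketch (the function name and toy graphs are illustrative, not part of the paper's implementation):

```python
import numpy as np

def neighborhood_consensus_violations(A_s, A_t, mapping):
    """Count source edges (i, i2) whose images (mapping[i], mapping[i2])
    are not adjacent in the target graph; zero means full consensus.
    `mapping` is a dict {source node -> target node}."""
    violations = 0
    for i, i2 in zip(*np.nonzero(A_s)):
        if A_t[mapping[int(i)], mapping[int(i2)]] == 0:
            violations += 1
    return violations

# Toy example: a 3-cycle matched onto an isomorphic 3-cycle.
A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
good = {0: 1, 1: 2, 2: 0}  # an isomorphism -> no violations
print(neighborhood_consensus_violations(A, A, good))  # 0
```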
In this work, we consider the problem of supervised and semi-supervised matching of graphs while employing the intuition of neighborhood consensus as an inductive bias in our model. In the supervised setting, we are given pairwise ground-truth correspondences for a set of graphs and want our model to generalize to unseen graph pairs. In the semi-supervised setting, source and target graphs are fixed, and ground-truth correspondences are only given for a small subset of nodes. However, we are allowed to make use of the complete graph structures.
3 Methodology
In the following, we describe our proposed end-to-end, deep graph matching architecture in detail. See Figure 1 for a high-level illustration. The method consists of two stages: a local feature matching procedure followed by an iterative refinement strategy using synchronous message passing networks. The aim of the feature matching step, see Section 3.1, is to compute initial correspondence scores based on the similarity of local node embeddings. The second step is an iterative refinement strategy, see Sections 3.2 and 3.3, which aims to reach neighborhood consensus for correspondences using a differentiable validator for graph isomorphism. Finally, in Section 3.4, we show how to scale our method to large, real-world inputs.
3.1 Local Feature Matching
We model our local feature matching procedure in close analogy to related approaches (Bai et al., 2018, 2019; Wang et al., 2019b; Zhang & Lee, 2019; Wang & Solomon, 2019) by computing similarities between nodes in the source graph and the target graph based on node embeddings. That is, given latent node embeddings $\mathbf{H}_s = \Psi_{\theta_1}(\mathbf{X}_s, \mathbf{A}_s, \mathbf{E}_s)$ and $\mathbf{H}_t = \Psi_{\theta_1}(\mathbf{X}_t, \mathbf{A}_t, \mathbf{E}_t)$ computed by a shared neural network $\Psi_{\theta_1}$ for source graph $\mathcal{G}_s$ and target graph $\mathcal{G}_t$, respectively, we obtain initial soft correspondences as $\mathbf{S}^{(0)} = \mathrm{sinkhorn}(\mathbf{H}_s \mathbf{H}_t^{\top})$.
Here, sinkhorn normalization is applied to obtain rectangular doubly-stochastic correspondence matrices that fulfill the constraints $\mathbf{S} \mathbf{1}_m = \mathbf{1}_n$ and $\mathbf{S}^{\top} \mathbf{1}_n \le \mathbf{1}_m$ (Sinkhorn & Knopp, 1967; Adams & Zemel, 2011; Cour et al., 2006).
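This normalization is classically obtained by alternating row- and column-wise rescaling. A small numpy sketch (iteration count and stabilization are illustrative choices, not the paper's exact implementation):

```python
import numpy as np

def sinkhorn(scores, iters=50):
    """Approximately project a score matrix onto the set of doubly-
    stochastic matrices by alternating row and column normalization
    (Sinkhorn & Knopp, 1967)."""
    S = np.exp(scores - scores.max())  # softmax-style stabilization
    for _ in range(iters):
        S = S / S.sum(axis=1, keepdims=True)  # rows sum to 1
        S = S / S.sum(axis=0, keepdims=True)  # columns sum to 1
    return S / S.sum(axis=1, keepdims=True)   # end on a row step
```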
We interpret the $i$-th row vector $\mathbf{S}_{i,:}$ as a discrete distribution over potential correspondences in $\mathcal{G}_t$ for each node $i \in \mathcal{V}_s$. We train $\Psi_{\theta_1}$ in a discriminative, supervised fashion against ground-truth correspondences $\pi^{\mathrm{gt}}$ by minimizing the negative log-likelihood of correct correspondence scores $\mathbf{S}^{(0)}_{i, \pi^{\mathrm{gt}}(i)}$.

We implement $\Psi_{\theta_1}$ as a Graph Neural Network (GNN) to obtain localized, permutation equivariant vectorial node representations (Bronstein et al., 2017; Hamilton et al., 2017; Battaglia et al., 2018; Goyal & Ferrara, 2018). Formally, a GNN follows a neural message passing scheme (Gilmer et al., 2017) and updates its node features $\mathbf{h}^{(\ell)}_i$ in layer $\ell$ by aggregating localized information via

$$\mathbf{h}^{(\ell)}_i = \mathrm{UPDATE}^{(\ell)}\Big( \mathbf{h}^{(\ell-1)}_i, \mathrm{AGGREGATE}^{(\ell)}\big( \{\!\!\{\, \mathbf{h}^{(\ell-1)}_j : j \in \mathcal{N}_1(i) \,\}\!\!\} \big) \Big) \quad (2)$$

where $\mathbf{h}^{(0)}_i = \mathbf{x}_i$ and $\{\!\!\{ \cdot \}\!\!\}$ denotes a multiset. The recent work in the fields of geometric deep learning and relational representation learning provides a large number of operators to choose from (Kipf & Welling, 2017; Gilmer et al., 2017; Veličković et al., 2018; Schlichtkrull et al., 2018; Xu et al., 2019c), which allows for precise control of the properties of extracted features.

3.2 Synchronous Message Passing for Neighborhood Consensus
Due to the purely local nature of the used node embeddings, our feature matching procedure is prone to finding false correspondences which are locally similar to the correct one. Formally, those cases pose a violation of the neighborhood consensus criterion employed in Equation (1). Since finding a global optimum is hard, we aim to detect violations of the criterion in local neighborhoods and resolve them in an iterative fashion.
We utilize graph neural networks to detect these violations in a neighborhood consensus step and iteratively refine correspondences $\mathbf{S}^{(l)}$, $l \in \{0, \dots, L\}$, starting from $\mathbf{S}^{(0)}$. Key to the proposed algorithm is the following observation: The soft correspondence matrix $\mathbf{S} \in [0,1]^{n \times m}$ is a map between the node function space $\mathbb{R}^n$ of $\mathcal{G}_s$ and the node function space $\mathbb{R}^m$ of $\mathcal{G}_t$. Therefore, we can use $\mathbf{S}$ to pass node functions $\mathbf{x}_s \in \mathbb{R}^n$, $\mathbf{x}_t \in \mathbb{R}^m$ along the soft correspondences by

$$\mathbf{x}'_t = \mathbf{S}^{\top} \mathbf{x}_s \qquad \text{and} \qquad \mathbf{x}'_s = \mathbf{S} \, \mathbf{x}_t \quad (3)$$

to obtain functions $\mathbf{x}'_t \in \mathbb{R}^m$, $\mathbf{x}'_s \in \mathbb{R}^n$ in the other domain, respectively.
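In matrix form, this transport of node functions is just multiplication with the correspondence matrix or its transpose. A small numpy illustration with a hypothetical 3-by-3 soft correspondence:

```python
import numpy as np

# A soft correspondence matrix S (rows: source nodes, columns: target
# nodes, each row a distribution) acts as a linear map between node
# function spaces: S.T @ x_s carries a source node function to the
# target graph, and S @ x_t carries a target function back.
S = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.1, 0.9]])
x_s = np.eye(3)           # injective node coloring (indicator functions)
x_on_target = S.T @ x_s   # distributed coloring on the target graph
print(x_on_target.sum())  # total mass is preserved: 3.0
```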
Then, our consensus method works as follows: Using $\mathbf{S}^{(l)}$, we first map node indicator functions, given as an injective node coloring in the form of an identity matrix $\mathbf{I}_n$, from $\mathcal{G}_s$ to $\mathcal{G}_t$. Then, we distribute this coloring in corresponding neighborhoods by performing synchronous message passing on both graphs via a shared graph neural network $\Psi_{\theta_2}$, i.e.,

$$\mathbf{O}_s = \Psi_{\theta_2}(\mathbf{I}_n, \mathbf{A}_s, \mathbf{E}_s) \qquad \text{and} \qquad \mathbf{O}_t = \Psi_{\theta_2}\big( \mathbf{S}^{(l)\top} \mathbf{I}_n, \mathbf{A}_t, \mathbf{E}_t \big) \quad (4)$$
We can compare the results of both GNNs to recover a vector $\mathbf{d}_{i,j} = \mathbf{o}^{s}_i - \mathbf{o}^{t}_j$ which measures the neighborhood consensus between node pairs $(i,j) \in \mathcal{V}_s \times \mathcal{V}_t$. This measure can be used to perform trainable updates of the correspondence scores

$$\hat{\mathbf{S}}^{(l+1)}_{i,j} = \hat{\mathbf{S}}^{(l)}_{i,j} + \Phi_{\theta_3}(\mathbf{d}_{i,j}), \qquad \mathbf{S}^{(l+1)} = \mathrm{sinkhorn}\big( \hat{\mathbf{S}}^{(l+1)} \big) \quad (5)$$

based on an MLP $\Phi_{\theta_3}$. The process can be applied $L$ times to iteratively improve the consensus in neighborhoods. The final objective combines both the feature matching error of $\mathbf{S}^{(0)}$ and the neighborhood consensus error of $\mathbf{S}^{(L)}$. This objective is fully differentiable and can hence be optimized in an end-to-end fashion using stochastic gradient descent. Overall, the consensus stage distributes global node colorings to resolve ambiguities and false matchings made in the first stage of our architecture by only using purely local operators. Since an initial matching is needed to test for neighborhood consensus, this task cannot be fulfilled by $\Psi_{\theta_1}$ alone, which stresses the importance of our two-stage approach.

The following two theorems show that $\mathbf{d}_{i,j}$ is a good measure of how well local neighborhoods around $i$ and $j$ are matched by the soft correspondence between $\mathcal{G}_s$ and $\mathcal{G}_t$. The proofs can be found in Appendix B and C, respectively.
Theorem 1.
Let $\mathcal{G}_s$ and $\mathcal{G}_t$ be two isomorphic graphs and let $\Psi_{\theta_2}$ be a permutation equivariant GNN, i.e., $\Psi_{\theta_2}(\mathbf{P} \mathbf{X}, \mathbf{P} \mathbf{A} \mathbf{P}^{\top}) = \mathbf{P} \, \Psi_{\theta_2}(\mathbf{X}, \mathbf{A})$ for any permutation matrix $\mathbf{P}$. If $\mathbf{S}$ encodes an isomorphism between $\mathcal{G}_s$ and $\mathcal{G}_t$, then $\mathbf{d}_{i, \pi(i)} = \mathbf{0}$ for all $i \in \mathcal{V}_s$.
Theorem 2.
Let $\mathcal{G}_s$ and $\mathcal{G}_t$ be two graphs and let $\Psi_{\theta_2}$ be a permutation equivariant and $T$-layered GNN for which both $\mathrm{UPDATE}^{(\ell)}$ and $\mathrm{AGGREGATE}^{(\ell)}$ are injective for all $\ell \in \{1, \dots, T\}$. If $\mathbf{d}_{i, \pi(i)} = \mathbf{0}$, then the submatrix of $\mathbf{S}$ restricted to $\mathcal{N}_T(i) \times \mathcal{N}_T(\pi(i))$ is a permutation matrix describing an isomorphism between the $T$-hop subgraph around $i \in \mathcal{V}_s$ and the $T$-hop subgraph around $\pi(i) \in \mathcal{V}_t$. Moreover, if $\mathbf{d}_{i, \pi(i)} = \mathbf{0}$ for all $i \in \mathcal{V}_s$, then $\mathbf{S}$ denotes a full isomorphism between $\mathcal{G}_s$ and $\mathcal{G}_t$.
Hence, a GNN that satisfies both criteria in Theorems 1 and 2 provides equal node embeddings $\mathbf{o}^{s}_i$ and $\mathbf{o}^{t}_j$ if and only if nodes in a local neighborhood are correctly matched to each other. A value $\mathbf{d}_{i,j} \neq \mathbf{0}$ indicates the existence of inconsistent matchings in the local neighborhoods around $i$ and $j$, and can hence be used to refine the correspondence score $\mathbf{S}_{i,j}$.
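To make the mechanics concrete, the refinement loop can be sketched end-to-end on raw adjacency matrices. In this toy version a single fixed propagation step, $\Psi(\mathbf{X}, \mathbf{A}) = \mathbf{A}\mathbf{X}$, stands in for the trainable network $\Psi_{\theta_2}$, and a squared-distance penalty replaces the learned update $\Phi_{\theta_3}$; all names and constants are illustrative:

```python
import numpy as np

def softmax_rows(M):
    E = np.exp(M - M.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

def refine(scores, A_s, A_t, iters=10, alpha=1.0):
    """Toy consensus refinement: distribute indicator colorings with one
    fixed propagation step on each graph, compare the results, and
    penalize pairs whose neighborhoods disagree."""
    n = A_s.shape[0]
    S = softmax_rows(scores)
    for _ in range(iters):
        O_s = A_s @ np.eye(n)          # colorings propagated on G_s
        O_t = A_t @ (S.T @ np.eye(n))  # mapped colorings propagated on G_t
        # d[i, j] measures the neighborhood consensus of pair (i, j)
        d = ((O_s[:, None, :] - O_t[None, :, :]) ** 2).sum(-1)
        S = softmax_rows(scores - alpha * d)
    return S
```

On a path graph matched against itself with a small bias toward the identity, the loop keeps reinforcing the consensus-consistent assignment instead of drifting to a locally similar but inconsistent one.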
Note that both requirements, permutation equivariance and injectivity, are easily fulfilled: (1) All common graph neural network architectures following the message passing scheme of Equation (2) are equivariant due to the use of permutation invariant neighborhood aggregators. (2) Injectivity of graph neural networks is a heavily discussed topic in recent literature. It can be fulfilled by using a GNN that is as powerful as the Weisfeiler-Lehman (WL) heuristic (Weisfeiler & Lehman, 1968) in distinguishing graph structures, e.g., by using sum aggregation in combination with MLPs on the multiset of neighboring node features, cf. (Xu et al., 2019c; Morris et al., 2019).

3.3 Relation to the Graduated Assignment Algorithm
Theoretically, we can relate our proposed approach to classical graph matching techniques that consider a doubly-stochastic relaxation of the problem defined in Equation (1), cf. (Lyzinski et al., 2016) and Appendix F for more details. A seminal work following this method is the graduated assignment algorithm (Gold & Rangarajan, 1996). By starting from an initial feasible solution $\mathbf{S}^{(0)}$, a new solution $\mathbf{S}^{(l+1)}$ is iteratively computed from $\mathbf{S}^{(l)}$ by approximately solving a linear assignment problem according to

$$\mathbf{S}^{(l+1)} = \arg\max_{\mathbf{S}} \sum_{i \in \mathcal{V}_s} \sum_{j \in \mathcal{V}_t} \mathbf{S}_{i,j} \, \mathbf{Q}^{(l)}_{i,j}, \qquad \mathbf{Q}^{(l)} = \mathbf{A}_s \mathbf{S}^{(l)} \mathbf{A}_t, \quad (6)$$

where $\mathbf{Q}^{(l)}$ denotes the gradient of Equation (1) at $\mathbf{S}^{(l)}$.¹ The $\arg\max$ operator is implemented by applying sinkhorn normalization on rescaled inputs, where the scaling factor grows in every iteration to increasingly encourage integer solutions. Our approach also resembles the approximation of the linear assignment problem via sinkhorn normalization.

¹For clarity of presentation, we closely follow the original formulation of the method for simple graphs but ignore the edge similarities and adapt the constant factor of the gradient according to our objective function.
Moreover, the gradient $\mathbf{Q}^{(l)} = \mathbf{A}_s \mathbf{S}^{(l)} \mathbf{A}_t$ is closely related to our neighborhood consensus scheme for the particular simple, non-trainable GNN instantiation $\Psi(\mathbf{X}, \mathbf{A}, \mathbf{E}) = \mathbf{A} \mathbf{X}$. Given $\mathbf{O}_s = \mathbf{A}_s \mathbf{I}_n$ and $\mathbf{O}_t = \mathbf{A}_t \mathbf{S}^{(l)\top}$, we obtain $\mathbf{Q}^{(l)}_{i,j} = \langle \mathbf{o}^{s}_i, \mathbf{o}^{t}_j \rangle$ by substitution. Instead of updating $\mathbf{S}^{(l)}$ based on the similarity between $\mathbf{o}^{s}_i$ and $\mathbf{o}^{t}_j$ obtained from a fixed-function GNN $\Psi$, we choose to update correspondence scores via trainable neural networks $\Psi_{\theta_2}$ and $\Phi_{\theta_3}$ based on the difference between $\mathbf{o}^{s}_i$ and $\mathbf{o}^{t}_j$. This allows us to interpret our model as a deep, parameterized generalization of the graduated assignment algorithm. In addition, specifying node and edge attribute similarities in graph matching is often difficult and complicates its computation (Zhou & De la Torre, 2016; Zhang et al., 2019c), whereas our approach naturally supports continuous node and edge features via established GNN models. We experimentally verify the benefits of using trainable neural networks instead of the fixed-function variant in Appendix D.
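For comparison, the classical graduated assignment iteration can be sketched as follows for simple graphs with uniform initialization; the annealing rate, Sinkhorn depth, and function name are illustrative:

```python
import numpy as np

def graduated_assignment(A_s, A_t, iters=30, beta0=1.0, rate=1.1):
    """Sketch of Gold & Rangarajan's graduated assignment: repeatedly
    form the QAP gradient A_s @ Q @ A_t, exponentiate it with a growing
    inverse temperature beta, and project back to (near) doubly-
    stochastic form with Sinkhorn normalization."""
    n, m = A_s.shape[0], A_t.shape[0]
    Q = np.full((n, m), 1.0 / m)
    beta = beta0
    for _ in range(iters):
        grad = A_s @ Q @ A_t  # gradient of the relaxed QAP objective at Q
        S = np.exp(beta * (grad - grad.max(axis=1, keepdims=True)))
        for _ in range(20):   # approximate the argmax by Sinkhorn projection
            S = S / S.sum(axis=1, keepdims=True)
            S = S / S.sum(axis=0, keepdims=True)
        Q = S / S.sum(axis=1, keepdims=True)
        beta *= rate          # anneal toward an integer solution
    return Q
```

On small graphs with distinctive degree structure (e.g., a triangle with a pendant node), the iteration locks the structurally unambiguous nodes onto their images under the hidden permutation.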
3.4 Scaling to Large Input
We apply a number of optimizations to our proposed algorithm to make it scale to large input domains. See Algorithm 1 in Appendix A for the final optimized algorithm.
Sparse correspondences.
We propose to sparsify initial correspondences by filtering out low-score correspondences before neighborhood consensus takes place. That is, we sparsify $\mathbf{S}^{(0)}$ by computing only the top-$k$ correspondences of each source node with the help of the KeOps library (Charlier et al., 2019) without ever storing its dense version, reducing its required memory footprint from $\mathcal{O}(nm)$ to $\mathcal{O}(nk)$. In addition, the time complexity of the refinement phase then scales with the number of edges $|\mathcal{E}_s|$ and $|\mathcal{E}_t|$ in $\mathcal{G}_s$ and $\mathcal{G}_t$, respectively, rather than being quadratic in the number of nodes. Note that sparsifying initial correspondences assumes that the feature matching procedure ranks the correct correspondence within the top-$k$ elements for each node. Hence, also optimizing the initial feature matching loss is crucial, and training can be further accelerated by training only against sparsified correspondences with the ground-truth entries included.
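The top-$k$ bookkeeping itself is straightforward; a numpy sketch for illustration (the paper uses KeOps so that the dense score matrix is never materialized, which this toy version does not reproduce):

```python
import numpy as np

def topk_sparsify(scores, k):
    """Keep only the k highest-scoring candidate targets per source node,
    returning index and value arrays of shape (n, k) instead of a dense
    n-by-m matrix: memory O(n*k) instead of O(n*m)."""
    idx = np.argpartition(-scores, k - 1, axis=1)[:, :k]
    val = np.take_along_axis(scores, idx, axis=1)
    return idx, val
```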
Replacing node indicator functions.
Although applying $\Psi_{\theta_2}$ on node indicator functions $\mathbf{I}_n$ is computationally efficient, it requires a parameter complexity that grows with the number of nodes $n$. Hence, we propose to replace node indicator functions with randomly drawn node functions $\mathbf{R}^{(l)} \in \mathbb{R}^{n \times r}$, where $r \ll n$, in iteration $l$. By sampling from a continuous distribution, node indicator functions are still guaranteed to be injective almost surely (DeGroot & Schervish, 2012). Note that Theorem 1 still holds because it does not impose any restrictions on the function space. Theorem 2 does not necessarily hold anymore, but we expect our refinement strategy to resolve any ambiguities by resampling $\mathbf{R}^{(l)}$ in every iteration $l$. We verify this empirically in Section 4.1.
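A quick sketch of the replacement: rows drawn from a continuous distribution are pairwise distinct with probability 1, so injectivity of the coloring is preserved at a fraction of the memory (sizes below are illustrative):

```python
import numpy as np

# Instead of the n x n identity I_n (one-hot node indicators), draw a
# random n x r matrix with r << n. Rows sampled from a continuous
# distribution are pairwise distinct with probability 1, so the node
# coloring stays injective while memory drops from O(n^2) to O(n*r).
rng = np.random.default_rng(0)
n, r = 1000, 32
R = rng.normal(size=(n, r))
print(len({tuple(row) for row in R}))  # 1000 distinct colorings
```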
Softmax normalization.
The sinkhorn normalization fulfills the requirements of rectangular doubly-stochastic solutions. However, it may eventually push correspondences to inconsistent integer solutions very early on, from which the neighborhood consensus method cannot effectively recover. Furthermore, it is inherently inefficient to compute and runs the risk of vanishing gradients (Zhang et al., 2019b). Here, we propose to relax this constraint by only applying row-wise softmax normalization on $\hat{\mathbf{S}}$, and expect our supervised refinement procedure to naturally resolve violations of $\mathbf{S}^{\top} \mathbf{1}_n \le \mathbf{1}_m$ on its own by reranking false correspondences via neighborhood consensus. Experimentally, we show that row-wise normalization is sufficient for our algorithm to converge to the correct solution, cf. Section 4.1.
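The relaxed normalization is then a plain row-wise softmax; a minimal sketch:

```python
import numpy as np

def row_softmax(scores):
    """Row-wise normalization only: every source node gets a proper
    distribution over target nodes, while the column constraint is left
    for the supervised refinement stage to resolve implicitly."""
    E = np.exp(scores - scores.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)
```

Unlike the full Sinkhorn projection, this requires a single pass and keeps gradients well-behaved, at the cost of temporarily allowing many-to-one assignments.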
Number of refinement iterations.
Instead of holding the number of refinement iterations fixed, we propose to use different values $L_{\mathrm{train}}$ and $L_{\mathrm{test}}$, with $L_{\mathrm{train}} \le L_{\mathrm{test}}$, for training and testing, respectively. This does not only speed up training runtime, but it also encourages the refinement procedure to reach convergence with as few steps as necessary, while we can run the refinement procedure until convergence during testing. We show empirically that decreasing $L_{\mathrm{train}}$ does not affect the convergence abilities of our neighborhood consensus procedure during testing, cf. Section 4.1.
4 Experiments
We verify our method on three different tasks. We first show the benefits of our approach in an ablation study on synthetic graphs (Section 4.1), and apply it to the real-world tasks of supervised keypoint matching in natural images (Sections 4.2 and 4.3) and semi-supervised cross-lingual knowledge graph alignment (Section 4.4) afterwards. All dataset statistics can be found in Appendix H.
Our method is implemented in PyTorch (Paszke et al., 2017) using the PyTorch Geometric (Fey & Lenssen, 2019) and the KeOps (Charlier et al., 2019) libraries. Our implementation can process sparse mini-batches with parallel GPU acceleration and minimal memory footprint in all algorithm steps. For all experiments, optimization is done via Adam (Kingma & Ba, 2015) with a fixed learning rate. We use similar architectures for $\Psi_{\theta_1}$ and $\Psi_{\theta_2}$, except that we omit dropout (Srivastava et al., 2014) in $\Psi_{\theta_2}$. For all experiments, we report Hits@$k$ to evaluate and compare our model to previous lines of work, where Hits@$k$ measures the proportion of correctly matched entities ranked in the top $k$.
4.1 Ablation Study on Synthetic Graphs
In our first experiment, we evaluate our method on synthetic graphs where we aim to learn a matching for pairs of graphs in a supervised fashion. Each pair of graphs consists of an undirected Erdős & Rényi (1959) source graph $\mathcal{G}_s$ with $n$ nodes and edge probability $p$, and a target graph $\mathcal{G}_t$ which is constructed from $\mathcal{G}_s$ by removing edges with probability $p_s$ without disconnecting any nodes (Heimann et al., 2018). Training and evaluation is done on separate sets of graphs for each configuration of $n$, $p$ and $p_s$. In Appendix E, we perform additional experiments to also verify the robustness of our approach towards node addition or removal.

Architecture and parameters.
We implement the graph neural network operators $\Psi_{\theta_1}$ and $\Psi_{\theta_2}$ by stacking three layers ($T = 3$) of the GIN operator (Xu et al., 2019c)

$$\mathbf{h}^{(\ell)}_i = \mathrm{MLP}^{(\ell)}\Big( \big(1 + \epsilon^{(\ell)}\big) \cdot \mathbf{h}^{(\ell-1)}_i + \sum_{j \in \mathcal{N}_1(i)} \mathbf{h}^{(\ell-1)}_j \Big) \quad (7)$$

due to its expressiveness in distinguishing raw graph structures. The number of layers and the hidden dimensionality of all MLPs are held fixed across experiments, and we apply ReLU activation (Glorot et al., 2011) and batch normalization (Ioffe & Szegedy, 2015) after each of its layers. Input features are initialized with one-hot encodings of node degrees. We employ a Jumping Knowledge style concatenation (Xu et al., 2018) to compute final node representations. We train and test our procedure with $L_{\mathrm{train}}$ and $L_{\mathrm{test}}$ refinement iterations, respectively.

Results.
Figures 2(a) and 2(b) show the matching accuracy Hits@1 for different choices of $p$ and $p_s$. We observe that the purely local matching approach via $\Psi_{\theta_1}$ starts decreasing in performance as the structural noise increases. This also holds when applying global sinkhorn normalization on $\mathbf{S}^{(0)}$. However, our proposed two-stage architecture is able to recover all correspondences, independent of the applied structural noise $p_s$. This applies to both variants discussed in the previous sections, i.e., our initial formulation based on node indicator functions and sinkhorn normalization, and our optimized architecture using random node indicator sampling and row-wise normalization. This highlights the overall benefits of applying matching consensus and justifies the usage of the enhancements made towards scalability in Section 3.4.
In addition, Figure 2(c) visualizes the test error for a varying number of refinement iterations $L$. We observe that even when training to non-convergence, our procedure is still able to converge by increasing the number of iterations during testing.
Moreover, Figure 2(d) shows the performance of our refinement strategy when operating on sparsified top-$k$ correspondences. In contrast to its dense version, it cannot match all nodes correctly due to the poor initial feature matching quality. However, it consistently converges to the perfect solution of Hits@1 $=$ Hits@$k$ in case the correct match is included in the initial top-$k$ ranking of correspondences. Hence, with increasing $k$, we can recover most of the correct correspondences, making it an excellent option to scale our algorithm to large graphs, cf. Section 4.4.
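The synthetic pair construction used throughout this ablation can be sketched as follows; the function name and parameter values are placeholders, while the guard against disconnecting nodes follows the description above:

```python
import numpy as np

def make_pair(n, p_edge, p_noise, rng):
    """Build an undirected Erdos-Renyi source graph and a noisy target
    copy obtained by deleting each edge with probability p_noise while
    never dropping a node's degree to zero."""
    upper = np.triu(rng.random((n, n)) < p_edge, k=1)
    A_s = (upper | upper.T).astype(int)
    A_t = A_s.copy()
    for i, j in zip(*np.nonzero(np.triu(A_t, k=1))):
        if rng.random() < p_noise and A_t[i].sum() > 1 and A_t[j].sum() > 1:
            A_t[i, j] = A_t[j, i] = 0  # delete the edge in both directions
    return A_s, A_t
```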
4.2 Supervised Keypoint Matching in Natural Images
We perform experiments on the PascalVOC (Everingham et al., 2010) dataset with Berkeley annotations (Bourdev & Malik, 2009) and the WILLOW-ObjectClass dataset (Cho et al., 2013), which contain sets of image categories with labeled keypoint locations. For PascalVOC, we follow the experimental setups of Zanfir & Sminchisescu (2018) and Wang et al. (2019b) and use the training and test splits provided by Choy et al. (2016). We prefilter the dataset to exclude difficult, occluded and truncated objects, and require examples to have at least one keypoint. The PascalVOC dataset contains instances of varying scale, pose and illumination, with a varying number of keypoints per image. In contrast, the WILLOW-ObjectClass dataset contains at least 40 images with consistent orientations for each of its five categories, and each image consists of exactly 10 keypoints. Following the experimental setup of peer methods (Cho et al., 2013; Wang et al., 2019b), we pretrain our model on PascalVOC and finetune it over 20 random splits with 20 per-class images used for training. We construct graphs via the Delaunay triangulation of keypoints. For fair comparison with Zanfir & Sminchisescu (2018) and Wang et al. (2019b), input features of keypoints are given by the concatenated output of relu4_2 and relu5_1 of a VGG16 (Simonyan & Zisserman, 2014) pretrained on ImageNet (Deng et al., 2009).
Architecture and parameters.
We adopt SplineCNN (Fey et al., 2018) as our graph neural network operator

$$\mathbf{h}^{(\ell)}_i = \sigma\Big( \mathbf{W}^{(\ell)} \mathbf{h}^{(\ell-1)}_i + \sum_{j \in \mathcal{N}_1(i)} \Phi^{(\ell)}_{\theta}(\mathbf{e}_{j,i}) \cdot \mathbf{h}^{(\ell-1)}_j \Big) \quad (8)$$

whose trainable B-spline based kernel function $\Phi^{(\ell)}_{\theta}$ is conditioned on edge features $\mathbf{e}_{j,i}$ between node pairs. To align our results with the related work, we evaluate both isotropic and anisotropic edge features, which are given as normalized relative distances and 2D Cartesian coordinates, respectively. For SplineCNN, we use a fixed B-spline kernel size in each dimension and a fixed hidden dimensionality, and apply ReLU as our non-linearity $\sigma$. Our network architecture consists of two convolutional layers ($T = 2$), followed by dropout and a final linear layer. During training, we form pairs between any two training examples of the same category, and evaluate our model by sampling a fixed number of test graph pairs belonging to the same category.
Results.
Method  Aero  Bike  Bird  Boat  Bottle  Bus  Car  Cat  Chair  Cow  Table  Dog  Horse  MBike  Person  Plant  Sheep  Sofa  Train  TV  Mean  

GMN  31.1  46.2  58.2  45.9  70.6  76.5  61.2  61.7  35.5  53.7  58.9  57.5  56.9  49.3  34.1  77.5  57.1  53.6  83.2  88.6  57.9  
PCAGM  40.9  55.0  65.8  47.9  76.9  77.9  63.5  67.4  33.7  66.5  63.6  61.3  58.9  62.8  44.9  77.5  67.4  57.5  86.7  90.9  63.8  
isotropic 
34.7  42.6  41.5  50.4  50.3  72.2  60.1  59.4  24.6  38.1  86.2  47.7  56.3  37.6  35.4  58.0  45.8  74.8  64.1  75.3  52.8  
45.8  58.2  45.5  57.6  68.2  82.1  75.3  60.2  31.7  52.9  88.2  56.2  68.2  50.7  46.5  66.3  58.8  89.0  85.1  79.9  63.3  
45.3  57.1  54.9  54.7  71.7  82.6  75.3  65.9  31.6  50.8  86.1  56.9  67.1  53.1  49.2  77.3  59.2  91.7  82.0  84.2  64.8  
isotropic 
44.3  62.0  48.4  53.9  73.3  80.4  72.2  64.2  30.3  52.7  79.4  56.6  62.3  56.2  47.5  74.0  59.8  79.9  81.9  83.0  63.1  
46.5  63.7  54.9  60.9  79.4  84.1  76.4  68.3  38.5  61.5  80.6  59.7  69.8  58.4  54.3  76.4  64.5  95.7  87.9  81.3  68.1  
50.1  65.4  55.7  65.3  80.0  83.5  78.3  69.7  34.7  60.7  70.4  59.9  70.0  62.2  56.1  80.2  70.3  88.8  81.1  84.3  68.3  
anisotropic 
34.3  45.9  37.3  47.7  53.3  75.2  64.5  61.7  27.7  40.5  85.9  46.6  50.2  39.0  37.3  58.0  49.2  82.9  65.0  74.2  53.8  
44.6  51.2  50.7  58.5  72.3  83.3  76.6  65.6  31.0  57.5  91.7  55.4  69.5  56.2  47.5  85.1  57.9  92.3  86.7  85.9  66.0  
48.7  57.2  47.0  65.3  73.9  87.6  76.7  70.0  30.0  55.5  92.8  59.5  67.9  56.9  48.7  87.2  58.3  94.9  87.9  86.0  67.6  
anisotropic 
42.1  57.5  49.6  59.4  83.8  84.0  78.4  67.5  37.3  60.4  85.0  58.0  66.0  54.1  52.6  93.9  60.2  85.6  87.8  82.5  67.3  
45.5  67.6  56.5  66.8  86.9  85.2  84.2  73.0  43.6  66.0  92.3  64.0  79.8  56.6  56.1  95.4  64.4  95.0  91.3  86.3  72.8  
47.0  65.7  56.8  67.6  86.9  87.7  85.3  72.6  42.9  69.1  84.5  63.8  78.1  55.6  58.4  98.0  68.4  92.2  94.5  85.5  73.0 
Method  Face  Motorbike  Car  Duck  Winebottle  

GMN (Zanfir & Sminchisescu, 2018)  99.3  71.4  74.3  82.8  76.7  
PCAGM (Wang et al., 2019b)  100.0  76.7  84.0  93.5  96.9  
isotropic  98.07 ± 0.79  48.97 ± 4.62  65.30 ± 3.16  66.02 ± 2.51  77.72 ± 3.32  
100.00 ± 0.00  67.28 ± 4.93  85.07 ± 3.93  83.10 ± 3.61  92.30 ± 2.11  
100.00 ± 0.00  68.57 ± 3.94  82.75 ± 5.77  84.18 ± 4.15  90.36 ± 2.42  
isotropic  99.62 ± 0.28  73.47 ± 3.32  77.47 ± 4.92  77.10 ± 3.25  88.04 ± 1.38  
100.00 ± 0.00  92.05 ± 3.49  90.05 ± 5.10  88.98 ± 2.75  97.14 ± 1.41  
100.00 ± 0.00  92.05 ± 3.24  90.28 ± 4.67  88.97 ± 3.49  97.14 ± 1.83  
anisotropic  98.47 ± 0.61  49.28 ± 4.31  64.95 ± 3.52  66.17 ± 4.08  78.08 ± 2.61  
100.00 ± 0.00  76.28 ± 4.77  86.70 ± 3.25  83.22 ± 3.52  93.65 ± 1.64  
100.00 ± 0.00  76.57 ± 5.28  89.00 ± 3.88  84.78 ± 2.73  95.29 ± 2.22  
anisotropic  99.96 ± 0.06  91.90 ± 2.30  91.28 ± 4.89  86.58 ± 2.99  98.25 ± 0.71  
100.00 ± 0.00  98.80 ± 1.58  96.53 ± 1.55  93.22 ± 3.77  99.87 ± 0.31  
100.00 ± 0.00  99.40 ± 0.80  95.53 ± 2.93  93.00 ± 2.71  99.39 ± 0.70
Hits@1 (%) with standard deviations on the WILLOW-ObjectClass dataset.

We follow the experimental setup of Wang et al. (2019b) and train our models using negative log-likelihood due to its superior performance in contrast to the displacement loss used in Zanfir & Sminchisescu (2018). We evaluate our complete architecture using isotropic and anisotropic GNNs, and include ablation results obtained from using $\Psi_{\theta_1}$ only for the local node matching procedure. Results of Hits@1 are shown in Tables 1 and 2 for PascalVOC and WILLOW-ObjectClass, respectively. We visualize qualitative results of our method in Appendix I.
We observe that our refinement strategy is able to significantly outperform competing methods as well as our non-refined baselines. On the WILLOW-ObjectClass dataset, our refinement stage at least reduces the error of the initial model ($\Psi_{\theta_1}$ only) by half across all categories. The benefits of the second stage are even more crucial when starting from a weaker initial feature matching baseline, with large overall improvements on PascalVOC. However, good initial matchings do help our consensus stage to improve its performance further, as indicated by the usage of task-specific isotropic or anisotropic GNNs for $\Psi_{\theta_1}$.
4.3 Supervised Geometric Keypoint Matching
We also verify our approach by tackling the geometric feature matching problem, where we only make use of point coordinates and no additional visual features are available. Here, we follow the experimental training setup of Zhang & Lee (2019), and test the generalization capabilities of our model on the PascalPF dataset (Ham et al., 2016). For training, we generate a synthetic set of graph pairs: We first randomly sample 30–60 source points uniformly from , and add Gaussian noise from
to these points to obtain the target points. Furthermore, we add 0–20 outliers from
to each point cloud. Finally, we construct graphs by connecting each node with its nearest neighbors (). We train our unmodified anisotropic keypoint architecture from Section 4.2 with input until it has seen synthetic examples.Results.
We evaluate our trained model on the PascalPF dataset (Ham et al., 2016), which consists of image pairs within 20 classes, with the number of keypoints ranging from 4 to 17. Results of Hits@1 are shown in Table 3. Overall, our consensus architecture improves upon the state-of-the-art results of Zhang & Lee (2019) on almost all categories, even though our baseline $\Psi_{\theta_1}$ is weaker than the results reported in Zhang & Lee (2019), which shows the benefits of applying our consensus stage. In addition, it shows that our method works well even without taking any visual information into account.
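The synthetic training-pair construction described in this section can be sketched as follows; the choice of $k$, the noise scale, and the outlier count are illustrative stand-ins for the elided values:

```python
import numpy as np

def knn_graph(points, k):
    """Connect each point to its k nearest neighbors (symmetrized)."""
    d = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)          # no self-loops
    nbrs = np.argsort(d, axis=1)[:, :k]
    A = np.zeros((len(points), len(points)), dtype=int)
    for i, js in enumerate(nbrs):
        A[i, js] = 1
    return A | A.T

def synthetic_pair(rng, k=8, noise=0.02, n_out=10):
    """Sample source points in [0, 1]^2, jitter them for the target,
    add outliers to both clouds, and build kNN graphs on each."""
    n = rng.integers(30, 61)             # 30-60 inlier points
    src = rng.random((n, 2))
    tgt = src + rng.normal(scale=noise, size=src.shape)
    src = np.vstack([src, rng.random((n_out, 2))])
    tgt = np.vstack([tgt, rng.random((n_out, 2))])
    return knn_graph(src, k), knn_graph(tgt, k)
```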
Method  Aero  Bike  Bird  Boat  Bottle  Bus  Car  Cat  Chair  Cow  Table  Dog  Horse  MBike  Person  Plant  Sheep  Sofa  Train  TV  Mean  

(Zhang & Lee, 2019)  76.1  89.8  93.4  96.4  96.2  97.1  94.6  82.8  89.3  96.7  89.7  79.5  82.6  83.5  72.8  76.7  77.1  97.3  98.2  99.5  88.5  
Ours  69.2  87.7  77.3  90.4  98.7  98.3  92.5  91.6  94.7  79.4  95.8  90.1  80.0  79.5  72.5  98.0  76.5  89.6  93.4  97.8  87.6  
81.3  92.2  94.2  98.8  99.3  99.1  98.6  98.2  99.6  94.1  100.0  99.4  86.6  86.6  88.7  100.0  100.0  100.0  100.0  99.3  95.8  
81.1  92.0  94.7  100.0  99.3  99.3  98.9  97.3  99.4  93.4  100.0  99.1  86.3  86.2  87.7  100.0  100.0  100.0  100.0  99.3  95.7 
4.4 Semi-supervised Cross-lingual Knowledge Graph Alignment
We evaluate our model on the DBP15K datasets (Sun et al., 2017), which link entities of the Chinese, Japanese and French knowledge graphs of DBpedia into the English version and vice versa. Each dataset contains exactly 15,000 links between equivalent entities, and we split those links into training and testing following previous works. For obtaining entity input features, we follow the experimental setup of Xu et al. (2019d): We retrieve monolingual fastText embeddings (Bojanowski et al., 2017) for each language separately, and align those into the same vector space afterwards (Lample et al., 2018). We use the sum of word embeddings as the final entity input representation (although more sophisticated approaches are just as conceivable).
Architecture and parameters.
Our graph neural network operator mostly matches the one proposed in Xu et al. (2019d), where the direction of edges is retained, but not their specific relation type:

$$\mathbf{h}^{(\ell)}_i = \sigma\Big( \mathbf{W}^{(\ell)}_1 \mathbf{h}^{(\ell-1)}_i + \sum_{(j,i) \in \mathcal{E}} \mathbf{W}^{(\ell)}_2 \mathbf{h}^{(\ell-1)}_j + \sum_{(i,j) \in \mathcal{E}} \mathbf{W}^{(\ell)}_3 \mathbf{h}^{(\ell-1)}_j \Big) \quad (9)$$

We use ReLU followed by dropout as our non-linearity $\sigma$, and obtain final node representations by concatenating the per-layer outputs. We use a three-layer GNN ($T = 3$) both for obtaining initial similarities and for refining alignments. Training is performed using negative log-likelihood in a semi-supervised fashion: For each training node $i$ in $\mathcal{G}_s$, we train sparsely by using the corresponding ground-truth node in $\mathcal{G}_t$, the top-$k$ entries in $\mathbf{S}_{i,:}$ and randomly sampled entities in $\mathcal{G}_t$. For the refinement phase, we update the sparse top-$k$ correspondence matrix $L$ times. For efficiency reasons, we train $\Psi_{\theta_1}$ and $\Psi_{\theta_2}$ sequentially for a fixed number of epochs each.

Results.
Results.
Method  ZHEN  ENZH  JAEN  ENJA  FREN  ENFR  

@1  @10  @1  @10  @1  @10  @1  @10  @1  @10  @1  @10  
GCN (Wang et al., 2018)  41.25  74.38  36.49  69.94  39.91  74.46  38.42  71.81  37.29  74.49  36.77  73.06  
BootEA (Sun et al., 2018)  62.94  84.75  62.23  85.39  65.30  87.44  
MuGNN (Cao et al., 2019)  49.40  84.40  50.10  85.70  49.60  87.00  
NAEA (Zhu et al., 2019)  65.01  86.73  64.14  87.27  67.32  89.43  
RDGCN (Wu et al., 2019)  70.75  84.55  76.74  89.54  88.64  95.72  
GMNN (Xu et al., 2019d)  67.93  78.48  65.28  79.64  73.97  87.15  71.29  84.63  89.38  95.25  88.18  94.75  
58.53  78.04  54.99  74.25  59.18  79.16  55.40  75.53  76.07  91.54  74.89  90.57  
Ours (sparse)  67.59  87.47  64.38  83.56  71.95  89.74  68.88  86.84  83.36  96.03  82.16  95.28  
80.12  87.47  76.77  83.56  84.80  89.74  81.09  86.84  93.34  96.03  91.95  95.28 
We report Hits@1 and Hits@10 to evaluate and compare our model to previous lines of work, see Table 4. In addition, we report results of a simple baseline which matches nodes purely based on initial word embeddings, and a variant of our model without the refinement of initial correspondences. Our approach improves upon the state-of-the-art on all categories, with significant gains in Hits@1. In addition, our refinement strategy consistently improves upon the Hits@1 of initial correspondences by a significant margin, while results of Hits@10 are shared because the refinement operates only on sparsified top-$k$ initial correspondences. Due to the scalability of our approach, we can easily apply a multitude of refinement iterations while still retaining large hidden feature dimensionalities.
5 Limitations
Our experimental results demonstrate that the proposed approach effectively solves challenging real-world problems. However, the expressive power of GNNs is closely related to the Weisfeiler-Lehman (WL) heuristic for graph isomorphism testing (Xu et al., 2019c; Morris et al., 2019), whose power and limitations are well understood (Arvind et al., 2015). Our method generally inherits these limitations. Hence, one possible limitation is that whenever two nodes are assigned the same color by WL, our approach may fail to converge to one of the possible solutions. For example, there may exist two nodes i and j with equal neighborhood sets N(i) = N(j). One can easily see that the feature matching procedure generates equal initial correspondence distributions S[:,i] = S[:,j], resulting in the same mapped node indicator functions from the source graph to nodes i and j, respectively. Since both nodes share the same neighborhood, the refinement stage also produces the same distributed functions for both nodes. As a result, both column vectors S[:,i] and S[:,j] receive the same update, leading to non-convergence. In theory, one might resolve these ambiguities by adding a small amount of noise to the node features. However, the amount of feature noise already present in real-world datasets ensures that this scenario is unlikely to occur.
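This ambiguity is easy to reproduce with an untrained sum-aggregation message-passing scheme (illustrative only, not the paper's architecture): two nodes with identical neighborhoods and identical input features receive identical embeddings, and hence identical correspondence columns, while a small feature perturbation breaks the tie.

```python
import numpy as np

def propagate(adj, x, steps=3):
    """Untrained sum-aggregation message passing: x <- x + A @ x."""
    for _ in range(steps):
        x = x + adj @ x
    return x

# Nodes 0 and 1 have identical neighborhoods {2, 3} and identical inputs,
# so no equivariant message-passing scheme can distinguish them.
adj = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [1, 1, 0, 0],
                [1, 1, 0, 0]], dtype=float)
x = np.ones((4, 2))
out = propagate(adj, x)
assert np.allclose(out[0], out[1])   # identical embeddings -> ambiguous match

# A small feature perturbation breaks the tie.
rng = np.random.default_rng(0)
out_noisy = propagate(adj, x + 1e-3 * rng.standard_normal(x.shape))
assert not np.allclose(out_noisy[0], out_noisy[1])
```

The difference between the two perturbed nodes is preserved exactly under this update, which is why even tiny feature noise resolves the ambiguity.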
6 Related Work
Identifying correspondences between the nodes of two graphs has been studied in various domains, and an extensive body of literature exists. Closely related problems are summarized under the terms maximum common subgraph (Kriege et al., 2019b), network alignment (Zhang, 2016), graph edit distance (Chen et al., 2019) and graph matching (Yan et al., 2016). We refer the reader to Appendix F for a detailed discussion of the related work on these problems. Recently, graph neural networks have become a focus of research, leading to various proposed deep graph matching techniques (Wang et al., 2019b; Zhang & Lee, 2019; Xu et al., 2019d; Derr et al., 2019). In Appendix G, we present a detailed overview of the related work in this field while highlighting individual differences and similarities to our proposed graph matching consensus procedure.
7 Conclusion
We presented a two-stage neural architecture for learning node correspondences between graphs in a supervised or semi-supervised fashion. Our approach is aimed towards reaching a neighborhood consensus between matchings, and can resolve violations of this criterion in an iterative fashion. In addition, we proposed enhancements to let our algorithm scale to large input domains. We evaluated our architecture on real-world datasets, on which it consistently improved upon the state-of-the-art.
Acknowledgements
This work has been supported by the German Research Association (DFG) within the Collaborative Research Center SFB 876 "Providing Information by Resource-Constrained Analysis", projects A6 and B2.
References
 Adams & Zemel (2011) R. P. Adams and R. S. Zemel. Ranking via sinkhorn propagation. CoRR, abs/1106.1925, 2011.
 Aflalo et al. (2015) Y. Aflalo, A. Bronstein, and R. Kimmel. On convex relaxation of graph isomorphism. Proceedings of the National Academy of Sciences, 112(10), 2015.
 Anstreicher (2003) K. Anstreicher. Recent advances in the solution of quadratic assignment problems. Mathematical Programming, 97, 2003.
 Arvind et al. (2015) V. Arvind, J. Köbler, G. Rattan, and O. Verbitsky. On the power of color refinement. In Fundamentals of Computation Theory, 2015.
 Bai et al. (2018) Y. Bai, H. Ding, Y. Sun, and W. Wang. Convolutional set matching for graph similarity. In NeurIPSW, 2018.
 Bai et al. (2019) Y. Bai, H. Ding, S. Bian, T. Chen, Y. Sun, and W. Wang. SimGNN: A neural network approach to fast graph similarity computation. In WSDM, 2019.
 Battaglia et al. (2018) P. W. Battaglia, J. B. Hamrick, V. Bapst, A. SanchezGonzalez, V. F. Zambaldi, M. Malinowski, A. Tacchetti, D. Raposo, A. Santoro, R. Faulkner, Ç. Gülçehre, F. Song, A. J. Ballard, J. Gilmer, G. E. Dahl, A. Vaswani, K. Allen, C. Nash, V. Langston, C. Dyer, N. Heess, D. Wierstra, P. Kohli, M. Botvinick, O. Vinyals, Y. Li, and R. Pascanu. Relational inductive biases, deep learning, and graph networks. CoRR, abs/1806.01261, 2018.
 Bayati et al. (2013) M. Bayati, D. F. Gleich, A. Saberi, and Y. Wang. Message-passing algorithms for sparse network alignment. ACM Transactions on Knowledge Discovery from Data, 7(1), 2013.
 Bento & Ioannidis (2018) J. Bento and S. Ioannidis. A family of tractable graph distances. In SDM, 2018.
 Bojanowski et al. (2017) P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5, 2017.
 Bougleux et al. (2017) S. Bougleux, L. Brun, V. Carletti, P. Foggia, B. Gaüzère, and M. Vento. Graph edit distance as a quadratic assignment problem. Pattern Recognition Letters, 87, 2017.
 Bourdev & Malik (2009) L. Bourdev and J. Malik. Poselets: Body part detectors trained using 3D human pose annotations. In ICCV, 2009.
 Bronstein et al. (2017) M. M. Bronstein, J. Bruna, Y. LeCun, A. Szlam, and P. Vandergheynst. Geometric deep learning: Going beyond Euclidean data. IEEE Signal Processing Magazine, 2017.
 Bunke (1997) H. Bunke. On a relation between graph edit distance and maximum common subgraph. Pattern Recognition Letters, 18(8), 1997.
 Bunke & Shearer (1998) H. Bunke and K. Shearer. A graph distance metric based on the maximal common subgraph. Pattern Recognition Letters, 19(4), 1998.
 Caetano et al. (2009) T. S. Caetano, J. J. McAuley, L. Cheng, Q. V. Le, and A. J. Smola. Learning graph matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(6), 2009.
 Cao et al. (2019) Y. Cao, Z. Liu, C. Li, Z. Liu, J. Li, and T. Chua. Multi-channel graph neural network for entity alignment. In ACL, 2019.
 Charlier et al. (2019) B. Charlier, J. Feydy, and J. Glaunès. KeOps. https://github.com/getkeops/keops, 2019.
 Chen et al. (2019) X. Chen, H. Huo, J. Huan, and J. S. Vitter. An efficient algorithm for graph edit distance computation. Knowledge-Based Systems, 163, 2019.
 Cho et al. (2013) M. Cho, K. Alahari, and J. Ponce. Learning graphs to match. In ICCV, 2013.
 Choy et al. (2016) C. B. Choy, J. Gwak, S. Savarese, and M. Chandraker. Universal correspondence network. In NIPS, 2016.

 Conte et al. (2004) D. Conte, P. Foggia, C. Sansone, and M. Vento. Thirty years of graph matching in pattern recognition. International Journal of Pattern Recognition and Artificial Intelligence, 18, 2004.
 Cortés et al. (2019) X. Cortés, D. Conte, and H. Cardot. Learning edit cost estimation models for graph edit distance. Pattern Recognition Letters, 125, 2019.
 Cour et al. (2006) T. Cour, P. Srinivasan, and J. Shi. Balanced graph matching. In NIPS, 2006.
 DeGroot & Schervish (2012) M. H. DeGroot and M. J. Schervish. Probability and Statistics. Addison-Wesley, 2012.
 Deng et al. (2009) J. Deng, W. Dong, R. Socher, L. J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, 2009.
 Derr et al. (2019) T. Derr, H. Karimi, X. Liu, J. Xu, and J. Tang. Deep adversarial network alignment. CoRR, abs/1902.10307, 2019.
 Egozi et al. (2013) A. Egozi, Y. Keller, and H. Guterman. A probabilistic approach to spectral graph matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1), 2013.
 Erdős & Rényi (1959) P. Erdős and A. Rényi. On random graphs I. Publicationes Mathematicae Debrecen, 6, 1959.
 Everingham et al. (2010) M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The Pascal visual object classes (VOC) challenge. In IJCV, 2010.
 Fey & Lenssen (2019) M. Fey and J. E. Lenssen. Fast graph representation learning with PyTorch Geometric. In ICLRW, 2019.
 Fey et al. (2018) M. Fey, J. E. Lenssen, F. Weichert, and H. Müller. SplineCNN: Fast geometric deep learning with continuous Bspline kernels. In CVPR, 2018.
 Garey & Johnson (1979) M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman, 1979.
 Gilmer et al. (2017) J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl. Neural message passing for quantum chemistry. In ICML, 2017.
 Glorot et al. (2011) X. Glorot, A. Bordes, and Y. Bengio. Deep sparse rectifier neural networks. In AISTATS, 2011.
 Gold & Rangarajan (1996) S. Gold and A. Rangarajan. A graduated assignment algorithm for graph matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(4), 1996.
 Gori et al. (2005) M. Gori, M. Maggini, and L. Sarti. Exact and approximate graph matching using random walks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(7), 2005.
 Gouda & Hassaan (2016) K. Gouda and M. Hassaan. CSI_GED: An efficient approach for graph edit similarity computation. In ICDE, 2016.
 Goyal & Ferrara (2018) P. Goyal and E. Ferrara. Graph embedding techniques, applications, and performance: A survey. Knowledge-Based Systems, 151, 2018.
 Grohe et al. (2018) M. Grohe, G. Rattan, and G. J. Woeginger. Graph similarity and approximate isomorphism. In Mathematical Foundations of Computer Science, 2018.
 Grover & Leskovec (2016) A. Grover and J. Leskovec. Node2Vec: Scalable feature learning for networks. In SIGKDD, 2016.

 Halimi et al. (2019) O. Halimi, O. Litany, E. Rodolà, A. M. Bronstein, and R. Kimmel. Self-supervised learning of dense shape correspondence. In CVPR, 2019.
 Ham et al. (2016) B. Ham, M. Cho, C. Schmid, and J. Ponce. Proposal flow. In CVPR, 2016.
 Hamilton et al. (2017) W. L. Hamilton, R. Ying, and J. Leskovec. Representation learning on graphs: Methods and applications. IEEE Data Engineering Bulletin, 40(3), 2017.
 Heimann et al. (2018) M. Heimann, H. Shen, T. Safavi, and D. Koutra. REGAL: Representation learning-based graph alignment. In CIKM, 2018.
 Ioffe & Szegedy (2015) S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML, 2015.
 Jaggi (2013) M. Jaggi. Revisiting FrankWolfe: Projectionfree sparse convex optimization. In ICML, 2013.
 Kann (1992) V. Kann. On the approximability of the maximum common subgraph problem. In STACS, 1992.
 Kersting et al. (2014) K. Kersting, M. Mladenov, R. Garnett, and M. Grohe. Power iterated color refinement. In AAAI, 2014.
 Kingma & Ba (2015) D. P. Kingma and J. L. Ba. Adam: A method for stochastic optimization. In ICLR, 2015.
 Kipf & Welling (2017) T. N. Kipf and M. Welling. Semi-supervised classification with graph convolutional networks. In ICLR, 2017.
 Klau (2009) G. W. Klau. A new graphbased method for pairwise global network alignment. BMC Bioinformatics, 10, 2009.
 Kollias et al. (2012) G. Kollias, S. Mohammadi, and A. Grama. Network similarity decomposition (NSD): A fast and scalable approach to network alignment. IEEE Transactions on Knowledge and Data Engineering, 24(12), 2012.
 Kriege et al. (2019a) N. M. Kriege, P. L. Giscard, F. Bause, and R. C. Wilson. Computing optimal assignments in linear time for approximate graph matching. In ICDM, 2019a.
 Kriege et al. (2019b) N. M. Kriege, L. Humbeck, and O. Koch. Chemical similarity and substructure searches. In Encyclopedia of Bioinformatics and Computational Biology. Academic Press, 2019b.
 Lample et al. (2018) G. Lample, A. Conneau, M. Ranzato, L. Denoyer, and H. Jégou. Word translation without parallel data. In ICLR, 2018.
 Leordeanu & Hebert (2005) M. Leordeanu and M. Hebert. A spectral technique for correspondence problems using pairwise constraints. In ICCV, 2005.
 Leordeanu et al. (2009) M. Leordeanu, M. Hebert, and R. Sukthankar. An integer projected fixed point method for graph matching and MAP inference. In NIPS, 2009.

Lerouge et al. (2017)
J. Lerouge, Z. AbuAisheh, R. Raveaux, P. Héroux, and S. Adam.
New binary linear programming formulation to compute the graph edit distance.
Pattern Recognition, 72, 2017.  Li et al. (2019) Y. Li, C. Gu, T. Dullien, O. Vinyals, and P. Kohli. Graph matching networks for learning the similarity of graph structured objects. In ICML, 2019.
 Litany et al. (2017) O. Litany, T. Remez, E. Rodolà, A. M. Bronstein, and M. M. Bronstein. Deep functional maps: Structured prediction for dense shape correspondence. In ICCV, 2017.
 Lyzinski et al. (2016) V. Lyzinski, D. E. Fishkind, M. Fiori, J. T. Vogelstein, C. E. Priebe, and G. Sapiro. Graph matching: Relax at your own risk. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(1), 2016.
 Matula (1978) D. W. Matula. Subtree isomorphism in O(n^{5/2}). In Algorithmic Aspects of Combinatorics, volume 2. Elsevier, 1978.
 Morris et al. (2019) C. Morris, M. Ritzert, M. Fey, W. L. Hamilton, J. E. Lenssen, G. Rattan, and M. Grohe. Weisfeiler and Leman go neural: Higherorder graph neural networks. In AAAI, 2019.
 Murphy et al. (2019) R. L. Murphy, B. Srinivasan, V. Rao, and B. Ribeiro. Relational pooling for graph representations. In ICML, 2019.
 Ovsjanikov et al. (2012) M. Ovsjanikov, M. BenChen, J. Solomon, A. Butscher, and L. J. Guibas. Functional maps: A flexible representation of maps between shapes. ACM Transactions on Graphics, 31(4), 2012.
 Page et al. (1999) L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the web. Technical report, Stanford InfoLab, 1999.
 Paszke et al. (2017) A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer. Automatic differentiation in PyTorch. In NIPSW, 2017.
 Peyré et al. (2016) G. Peyré, M. Cuturi, and J. Solomon. Gromov-Wasserstein averaging of kernel and distance matrices. In ICML, 2016.
 Riesen & Bunke (2009) K. Riesen and H. Bunke. Approximate graph edit distance computation by means of bipartite graph matching. Image and Vision Computing, 27(7), 2009.
 Riesen et al. (2015a) K. Riesen, M. Ferrer, R. Dornberger, and H. Bunke. Greedy graph edit distance. In Machine Learning and Data Mining in Pattern Recognition, 2015a.
 Riesen et al. (2015b) K. Riesen, M. Ferrer, A. Fischer, and H. Bunke. Approximation of graph edit distance in quadratic time. In Graph-Based Representations in Pattern Recognition, 2015b.
 Rocco et al. (2018) I. Rocco, M. Cimpoi, R. Arandjelović, A. Torii, T. Pajdla, and J. Sivic. Neighbourhood consensus networks. In NeurIPS, 2018.
 Rodolà et al. (2017) E. Rodolà, L. Cosmo, M. M. Bronstein, A. Torsello, and D. Cremers. Partial functional correspondence. Computer Graphics Forum, 36(1), 2017.
 Sanfeliu & Fu (1983) A. Sanfeliu and K. S. Fu. A distance measure between attributed relational graphs for pattern recognition. IEEE Transactions on Systems, Man, and Cybernetics, 13(3), 1983.
 Sattler et al. (2009) T. Sattler, B. Leibe, and L. Kobbelt. SCRAMSAC: Improving RANSAC’s efficiency with a spatial consistency filter. In ICCV, 2009.
 Schlichtkrull et al. (2018) M. S. Schlichtkrull, T. N. Kipf, P. Bloem, R. van den Berg, I. Titov, and M. Welling. Modeling relational data with graph convolutional networks. In ESWC, 2018.

 Schmid & Mohr (1997) C. Schmid and R. Mohr. Local gray-value invariants for image retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(5), 1997.
 Sharan & Ideker (2006) R. Sharan and T. Ideker. Modeling cellular machinery through biological network comparison. Nature Biotechnology, 24(4), 2006.
 Simonyan & Zisserman (2014) K. Simonyan and A. Zisserman. Very deep convolutional networks for largescale image recognition. In ICLR, 2014.
 Singh et al. (2008) R. Singh, J. Xu, and B. Berger. Global alignment of multiple protein interaction networks with application to functional orthology detection. Proceedings of the National Academy of Sciences, 2008.
 Sinkhorn & Knopp (1967) R. Sinkhorn and P. Knopp. Concerning nonnegative matrices and doubly stochastic matrices. Pacific Journal of Mathematics, 21(2), 1967.
 Sivic & Zisserman (2003) J. Sivic and A. Zisserman. Video Google: A text retrieval approach to object matching in videos. In ICCV, 2003.
 Srivastava et al. (2014) N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15, 2014.
 Stauffer et al. (2017) M. Stauffer, T. Tschachtli, A. Fischer, and K. Riesen. A survey on applications of bipartite graph edit distance. In Graph-Based Representations in Pattern Recognition, 2017.
 Sun et al. (2017) Z. Sun, W. Hu, and C. Li. Crosslingual entity alignment via joint attributepreserving embedding. In ISWC, 2017.
 Sun et al. (2018) Z. Sun, W. Hu, Q. Zhang, and Y. Qu. Bootstrapping entity alignment with knowledge graph embedding. In IJCAI, 2018.
 Swoboda et al. (2017) P. Swoboda, C. Rother, H. Abu Alhaija, D. Kainmueller, and B. Savchynskyy. A study of Lagrangean decompositions and dual ascent solvers for graph matching. In CVPR, 2017.
 Tinhofer (1991) G. Tinhofer. A note on compact graphs. Discrete Applied Mathematics, 30(2), 1991.
 Umeyama (1988) S. Umeyama. An eigendecomposition approach to weighted graph matching problems. IEEE Transactions on Pattern Analysis and Machine Intelligence, 10(5), 1988.
 Veličković et al. (2018) P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio. Graph attention networks. In ICLR, 2018.
 Vento & Foggia (2012) M. Vento and P. Foggia. Graph matching techniques for computer vision. Graph-Based Methods in Computer Vision: Developments and Applications, 1, 2012.
 Wang et al. (2019a) F. Wang, N. Xue, Y. Zhang, G. Xia, and M. Pelillo. A functional representation for graph matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019a.
 Wang et al. (2019b) R. Wang, J. Yan, and X. Yang. Learning combinatorial embedding networks for deep graph matching. In ICCV, 2019b.
 Wang & Solomon (2019) Y. Wang and J. M. Solomon. Deep closest point: Learning representations for point cloud registration. In ICCV, 2019.
 Wang et al. (2018) Z. Wang, Q. Lv, X. Lan, and Y. Zhang. Cross-lingual knowledge graph alignment via graph convolutional networks. In EMNLP, 2018.
 Weisfeiler & Lehman (1968) B. Weisfeiler and A. A. Lehman. A reduction of a graph to a canonical form and an algebra arising during this reduction. Nauchno-Technicheskaya Informatsia, 2(9), 1968.
 Wu et al. (2019) Y. Wu, X. Liu, Y. Feng, Z. Wang, R. Yan, and D. Zhao. Relation-aware entity alignment for heterogeneous knowledge graphs. In IJCAI, 2019.
 Xu et al. (2019a) H. Xu, D. Luo, and L. Carin. Scalable Gromov-Wasserstein learning for graph partitioning and matching. CoRR, abs/1905.07645, 2019a.
 Xu et al. (2019b) H. Xu, D. Luo, H. Zha, and L. Carin. Gromov-Wasserstein learning for graph matching and node embedding. In ICML, 2019b.
 Xu et al. (2018) K. Xu, C. Li, Y. Tian, T. Sonobe, K. Kawarabayashi, and S. Jegelka. Representation learning on graphs with jumping knowledge networks. In ICML, 2018.
 Xu et al. (2019c) K. Xu, W. Hu, J. Leskovec, and S. Jegelka. How powerful are graph neural networks? In ICLR, 2019c.
 Xu et al. (2019d) K. Xu, L. Wang, M. Yu, Y. Feng, Y. Song, Z. Wang, and D. Yu. Cross-lingual knowledge graph alignment via graph matching neural network. In ACL, 2019d.
 Yan et al. (2016) J. Yan, X. C. Yin, W. Lin, C. Deng, H. Zha, and X. Yang. A short survey of recent advances in graph matching. In ICMR, 2016.
 Zanfir & Sminchisescu (2018) A. Zanfir and C. Sminchisescu. Deep learning of graph matching. In CVPR, 2018.
 Zaslavskiy et al. (2009) M. Zaslavskiy, F. Bach, and J. P. Vert. A path following algorithm for the graph matching problem. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(12), 2009.
 Zhang (2016) S. Zhang and H. Tong. FINAL: Fast attributed network alignment. In SIGKDD, 2016.
 Zhang & Philip (2015) J. Zhang and S. Y. Philip. Multiple anonymized social networks alignment. In ICDM, 2015.
 Zhang et al. (2019a) W. Zhang, K. Shu, H. Liu, and Y. Wang. Graph neural networks for user identity linkage. CoRR, abs/1903.02174, 2019a.
 Zhang et al. (2019b) Y. Zhang, A. PrügelBennett, and J. Hare. Learning representations of sets through optimized permutations. In ICLR, 2019b.
 Zhang & Lee (2019) Z. Zhang and W. S. Lee. Deep graphical feature learning for the feature matching problem. In ICCV, 2019.
 Zhang et al. (2019c) Z. Zhang, Y. Xiang, L. Wu, B. Xue, and A. Nehorai. KerGM: Kernelized graph matching. In NeurIPS, 2019c.
 Zhou & De la Torre (2016) F. Zhou and F. De la Torre. Factorized graph matching. In CVPR, 2016.

 Zhu et al. (2017) J. Y. Zhu, T. Park, P. Isola, and A. A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In ICCV, 2017.
 Zhu et al. (2019) Q. Zhu, X. Zhou, J. Wu, J. Tan, and L. Guo. Neighborhood-aware attentional representation for multilingual knowledge graphs. In IJCAI, 2019.
Appendix A Optimized Graph Matching Consensus Algorithm
Our final optimized algorithm is given in Algorithm 1.
Appendix B Proof for Theorem 1
Proof.
Since Ψ is permutation equivariant, it holds for any permutation matrix P and any node feature matrix X that Ψ(PX, PAP⊤) = PΨ(X, A). With X' = PX and A' = PAP⊤, it follows that Ψ(X', A') = PΨ(X, A).
Hence, row π(i) of Ψ(X', A') equals row i of Ψ(X, A) for any node i, resulting in identical node representations up to the permutation π. ∎
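The permutation-equivariance property used in this proof can be checked numerically. The sketch below uses a simple sum-aggregation layer as a stand-in for the paper's GNN (illustrative only): for a random permutation matrix P, applying P to the inputs and then the layer gives the same result as applying the layer first and then P.

```python
import numpy as np

def psi(x, adj):
    """A simple permutation-equivariant GNN layer (sum aggregation)."""
    return np.tanh(x + adj @ x)

rng = np.random.default_rng(1)
n, d = 5, 3
x = rng.standard_normal((n, d))
adj = (rng.random((n, n)) < 0.4).astype(float)

# A random permutation matrix P: row i selects node perm[i].
perm = rng.permutation(n)
P = np.eye(n)[perm]

# Equivariance: psi(P X, P A P^T) = P psi(X, A).
lhs = psi(P @ x, P @ adj @ P.T)
rhs = P @ psi(x, adj)
assert np.allclose(lhs, rhs)
```

The same check works for any layer built from neighborhood aggregation and node-wise transformations, since both operations commute with node reorderings.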
Appendix C Proof for Theorem 2
Proof.
Let T denote the number of GNN layers. Then, the T-layered GNN maps both T-hop neighborhoods around nodes and