Deep Graph Matching Consensus

by Matthias Fey, et al.
TU Dortmund

This work presents a two-stage neural architecture for learning and refining structural correspondences between graphs. First, we use localized node embeddings computed by a graph neural network to obtain an initial ranking of soft correspondences between nodes. Secondly, we employ synchronous message passing networks to iteratively re-rank the soft correspondences to reach a matching consensus in local neighborhoods between graphs. We show, theoretically and empirically, that our message passing scheme computes a well-founded measure of consensus for corresponding neighborhoods, which is then used to guide the iterative re-ranking process. Our purely local and sparsity-aware architecture scales well to large, real-world inputs while still being able to recover global correspondences consistently. We demonstrate the practical effectiveness of our method on real-world tasks from the fields of computer vision and entity alignment between knowledge graphs, on which we improve upon the current state-of-the-art. Our source code is available under deep-graph-matching-consensus.



1 Introduction

Graph matching refers to the problem of establishing meaningful structural correspondences of nodes between two or more graphs by taking both node similarities and pairwise edge similarities into account (Wang et al., 2019b). Since graphs are natural representations for encoding relational data, the problem of graph matching lies at the heart of many real-world applications. For example, comparing molecules in cheminformatics (Kriege et al., 2019b), matching protein networks in bioinformatics (Sharan & Ideker, 2006; Singh et al., 2008), linking user accounts in social network analysis (Zhang & Philip, 2015), and tracking objects, matching 2D/3D shapes or recognizing actions in computer vision (Vento & Foggia, 2012) can be formulated as a graph matching problem.

The problem of graph matching has been heavily investigated in theory (Grohe et al., 2018) and practice (Conte et al., 2004), usually by relating it to domain-agnostic distances such as the graph edit distance (Stauffer et al., 2017) and the maximum common subgraph problem (Bunke & Shearer, 1998), or by formulating it as a quadratic assignment problem (Yan et al., 2016). Since all three approaches are $\mathcal{NP}$-hard, solving them to optimality may not be tractable for large-scale, real-world instances. Moreover, these purely combinatorial approaches do not adapt to the given data distribution and often do not consider continuous node embeddings which can provide crucial information about node semantics.

Recently, various neural architectures have been proposed to tackle the task of graph matching (Zanfir & Sminchisescu, 2018; Wang et al., 2019b; Zhang & Lee, 2019; Xu et al., 2019d, b; Derr et al., 2019; Zhang et al., 2019a; Heimann et al., 2018) or graph similarity (Bai et al., 2018, 2019; Li et al., 2019) in a data-dependent fashion. However, these approaches are either only capable of computing similarity scores between whole graphs (Bai et al., 2018, 2019; Li et al., 2019), rely on an inefficient global matching procedure (Zanfir & Sminchisescu, 2018; Wang et al., 2019b; Xu et al., 2019d; Li et al., 2019), or do not generalize to unseen graphs (Xu et al., 2019b; Derr et al., 2019; Zhang et al., 2019a). Moreover, they might be prone to match neighborhoods between graphs inconsistently by only taking localized embeddings into account (Zanfir & Sminchisescu, 2018; Wang et al., 2019b; Zhang & Lee, 2019; Xu et al., 2019d; Derr et al., 2019; Heimann et al., 2018).

Here, we propose a fully-differentiable graph matching procedure which aims to reach a data-driven neighborhood consensus between matched node pairs without the need to solve any optimization problem during inference. In addition, our approach is purely local, i.e., it operates on fixed-size neighborhoods around nodes, and is sparsity-aware, i.e., it takes the sparsity of the underlying structures into account. Hence, our approach scales well to large input domains, and can be trained in an end-to-end fashion to adapt to a given data distribution. Finally, our approach improves upon the state-of-the-art on several real-world applications from the fields of computer vision and entity alignment on knowledge graphs.

2 Problem Definition

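A graph $\mathcal{G} = (\mathcal{V}, \mathbf{A}, \mathbf{X}, \mathbf{E})$ consists of a finite set of nodes $\mathcal{V} = \{1, 2, \ldots\}$, an adjacency matrix $\mathbf{A} \in \{0, 1\}^{|\mathcal{V}| \times |\mathcal{V}|}$, a node feature matrix $\mathbf{X} \in \mathbb{R}^{|\mathcal{V}| \times \cdot}$, and an optional (sparse) edge feature matrix $\mathbf{E} \in \mathbb{R}^{|\mathcal{V}| \times |\mathcal{V}| \times \cdot}$. For a subset of nodes $\mathcal{S} \subseteq \mathcal{V}$, $\mathcal{G}[\mathcal{S}]$ denotes the subgraph of $\mathcal{G}$ induced by $\mathcal{S}$. We refer to $\mathcal{N}_T(i) = \{ j \in \mathcal{V} : d(i, j) \leq T \}$ as the $T$-hop neighborhood around node $i \in \mathcal{V}$, where $d(i, j)$ denotes the shortest-path distance in $\mathcal{G}$. A node coloring is a function $\mathcal{V} \to \Sigma$ with arbitrary codomain $\Sigma$.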

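The problem of graph matching refers to establishing node correspondences between two graphs. Formally, we are given two graphs, a source graph $\mathcal{G}_s = (\mathcal{V}_s, \mathbf{A}_s, \mathbf{X}_s, \mathbf{E}_s)$ and a target graph $\mathcal{G}_t = (\mathcal{V}_t, \mathbf{A}_t, \mathbf{X}_t, \mathbf{E}_t)$, w.l.o.g. $|\mathcal{V}_s| \leq |\mathcal{V}_t|$, and are interested in finding a correspondence matrix $\mathbf{S} \in \{0, 1\}^{|\mathcal{V}_s| \times |\mathcal{V}_t|}$ which minimizes an objective subject to the one-to-one mapping constraints $\sum_{j \in \mathcal{V}_t} S_{i,j} = 1 \; \forall i \in \mathcal{V}_s$ and $\sum_{i \in \mathcal{V}_s} S_{i,j} \leq 1 \; \forall j \in \mathcal{V}_t$. As a result, $\mathbf{S}$ infers an injective mapping $\pi \colon \mathcal{V}_s \to \mathcal{V}_t$ which maps each node in $\mathcal{G}_s$ to a node in $\mathcal{G}_t$.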

Typically, graph matching is formulated as an edge-preserving, quadratic assignment problem (Anstreicher, 2003; Gold & Rangarajan, 1996; Caetano et al., 2009; Cho et al., 2013), i.e.,

$$\underset{\mathbf{S}}{\operatorname{argmax}} \sum_{i, i' \in \mathcal{V}_s} \sum_{j, j' \in \mathcal{V}_t} A^{(s)}_{i,i'} \, A^{(t)}_{j,j'} \, S_{i,j} \, S_{i',j'} \tag{1}$$

subject to the one-to-one mapping constraints mentioned above. This formulation is based on the intuition of finding correspondences based on neighborhood consensus (Rocco et al., 2018), which shall prevent adjacent nodes in the source graph from being mapped to different regions in the target graph. Formally, a neighborhood consensus is reached if for all node pairs $(i, j) \in \mathcal{V}_s \times \mathcal{V}_t$ with $S_{i,j} = 1$, it holds that for every node $i' \in \mathcal{N}_1(i)$ there exists a node $j' \in \mathcal{N}_1(j)$ such that $S_{i',j'} = 1$.
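To make the two notions above concrete, the following minimal sketch evaluates the edge-preserving quadratic objective and tests the neighborhood consensus criterion for a hard (0/1) correspondence matrix. The function names are hypothetical, graphs are given as dense adjacency lists of lists, and the 1-hop neighborhood is assumed to include the node itself (shortest-path distance at most 1).

```python
from itertools import product

def qap_objective(A_s, A_t, S):
    """Edge-preservation score: sum of A_s[i][i2] * A_t[j][j2] * S[i][j] * S[i2][j2]."""
    n_s, n_t = len(A_s), len(A_t)
    total = 0
    for i, i2 in product(range(n_s), repeat=2):
        for j, j2 in product(range(n_t), repeat=2):
            total += A_s[i][i2] * A_t[j][j2] * S[i][j] * S[i2][j2]
    return total

def in_closed_neighborhood(A, u, v):
    """True if v is u itself or adjacent to u (assumed meaning of the 1-hop neighborhood)."""
    return u == v or A[u][v] == 1

def has_neighborhood_consensus(A_s, A_t, S):
    """Check that neighbors of every matched source node land near its matched target node."""
    n_s, n_t = len(A_s), len(A_t)
    for i, j in product(range(n_s), range(n_t)):
        if S[i][j] != 1:
            continue
        for i2 in range(n_s):
            if not in_closed_neighborhood(A_s, i, i2):
                continue
            # i2 must be matched to some node in the neighborhood of j
            if not any(in_closed_neighborhood(A_t, j, j2) and S[i2][j2] == 1
                       for j2 in range(n_t)):
                return False
    return True

# Toy example: the path graph 0-1-2 matched against itself.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
swapped = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]  # maps 0 -> 1, 1 -> 0, 2 -> 2

print(qap_objective(path, path, identity), has_neighborhood_consensus(path, path, identity))
print(qap_objective(path, path, swapped), has_neighborhood_consensus(path, path, swapped))
```

The identity matching preserves all edges and reaches consensus, whereas the swapped matching loses the edge between nodes 1 and 2 and violates the criterion.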

In this work, we consider the problem of supervised and semi-supervised matching of graphs while employing the intuition of neighborhood consensus as an inductive bias into our model. In the supervised setting, we are given pair-wise ground-truth correspondences for a set of graphs and want our model to generalize to unseen graph pairs. In the semi-supervised setting, source and target graphs are fixed, and ground-truth correspondences are only given for a small subset of nodes. However, we are allowed to make use of the complete graph structures.

3 Methodology

In the following, we describe our proposed end-to-end, deep graph matching architecture in detail. See Figure 1 for a high-level illustration. The method consists of two stages: a local feature matching procedure followed by an iterative refinement strategy using synchronous message passing networks. The aim of the feature matching step, see Section 3.1, is to compute initial correspondence scores based on the similarity of local node embeddings. The second step is an iterative refinement strategy, see Sections 3.2 and 3.3, which aims to reach neighborhood consensus for correspondences using a differentiable validator for graph isomorphism. Finally, in Section 3.4, we show how to scale our method to large, real-world inputs.

Figure 1: High-level illustration of our two-stage neighborhood consensus architecture. Node features are first locally matched based on a graph neural network $\Psi_{\theta_1}$, before their correspondence scores get iteratively refined based on neighborhood consensus. Here, an injective node coloring of $\mathcal{G}_s$ is transferred to $\mathcal{G}_t$ via $\mathbf{S}$, and distributed by $\Psi_{\theta_2}$ on both graphs. Updates on $\mathbf{S}$ are performed by a neural network $\Phi_{\theta_3}$ based on pair-wise color differences.

3.1 Local Feature Matching

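We model our local feature matching procedure in close analogy to related approaches (Bai et al., 2018, 2019; Wang et al., 2019b; Zhang & Lee, 2019; Wang & Solomon, 2019) by computing similarities between nodes in the source graph $\mathcal{G}_s$ and the target graph $\mathcal{G}_t$ based on node embeddings. That is, given latent node embeddings $\mathbf{H}_s = \Psi_{\theta_1}(\mathbf{X}_s, \mathbf{A}_s, \mathbf{E}_s)$ and $\mathbf{H}_t = \Psi_{\theta_1}(\mathbf{X}_t, \mathbf{A}_t, \mathbf{E}_t)$ computed by a shared neural network $\Psi_{\theta_1}$ for source graph $\mathcal{G}_s$ and target graph $\mathcal{G}_t$, respectively, we obtain initial soft correspondences as

$$\mathbf{S}^{(0)} = \operatorname{sinkhorn}(\hat{\mathbf{S}}^{(0)}) \in [0, 1]^{|\mathcal{V}_s| \times |\mathcal{V}_t|} \quad \text{with} \quad \hat{\mathbf{S}}^{(0)} = \mathbf{H}_s \mathbf{H}_t^{\top} \in \mathbb{R}^{|\mathcal{V}_s| \times |\mathcal{V}_t|}.$$

Here, sinkhorn normalization is applied to obtain rectangular doubly-stochastic correspondence matrices that fulfill the constraints $\sum_{j \in \mathcal{V}_t} S_{i,j} = 1 \; \forall i \in \mathcal{V}_s$ and $\sum_{i \in \mathcal{V}_s} S_{i,j} \leq 1 \; \forall j \in \mathcal{V}_t$ (Sinkhorn & Knopp, 1967; Adams & Zemel, 2011; Cour et al., 2006).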

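We interpret the $i$-th row vector $\mathbf{S}^{(0)}_{i,:} \in [0, 1]^{|\mathcal{V}_t|}$ as a discrete distribution over potential correspondences in $\mathcal{G}_t$ for each node $i \in \mathcal{V}_s$. We train $\Psi_{\theta_1}$ in a discriminative, supervised fashion against ground truth correspondences $\pi_{gt}(\cdot)$ by minimizing the negative log-likelihood of correct correspondence scores $\mathcal{L}^{(\text{initial})} = -\sum_{i \in \mathcal{V}_s} \log\big(S^{(0)}_{i, \pi_{gt}(i)}\big)$.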
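The local matching stage can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the authors' implementation: embeddings are compared by inner products, the rectangular normalization pads the score matrix with slack rows so that alternating row and column normalization yields rows summing to one and columns summing to at most one, and all function names are hypothetical.

```python
import math

def similarity(H_s, H_t):
    """Pairwise inner products between source and target node embeddings."""
    return [[sum(a * b for a, b in zip(hs, ht)) for ht in H_t] for hs in H_s]

def sinkhorn(scores, num_iters=50):
    """Alternate column and row normalization on exp(scores), padded with slack rows."""
    n_s, n_t = len(scores), len(scores[0])
    assert n_s <= n_t
    top = max(max(row) for row in scores)  # subtract the maximum for numerical stability
    P = [[math.exp(v - top) for v in row] for row in scores]
    P += [[1.0] * n_t for _ in range(n_t - n_s)]  # slack rows (an assumption of this sketch)
    for _ in range(num_iters):
        col_sums = [sum(row[j] for row in P) for j in range(n_t)]
        P = [[v / c for v, c in zip(row, col_sums)] for row in P]
        P = [[v / sum(row) for v in row] for row in P]
    return P[:n_s]  # drop the slack rows again

def nll(S, pi_gt):
    """Negative log-likelihood of the ground-truth correspondence in every row."""
    return -sum(math.log(S[i][pi_gt[i]]) for i in range(len(S)))

# Two source nodes, three target nodes; source node i should match target node i.
H_s = [[1.0, 0.0], [0.0, 1.0]]
H_t = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
S0 = sinkhorn(similarity(H_s, H_t))
loss = nll(S0, pi_gt=[0, 1])
print([[round(v, 3) for v in row] for row in S0], round(loss, 3))
```

Each row of `S0` is a distribution over target nodes, and the loss decreases as more probability mass moves onto the ground-truth column of each row.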

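We implement $\Psi_{\theta_1}$ as a Graph Neural Network (GNN) to obtain localized, permutation equivariant vectorial node representations (Bronstein et al., 2017; Hamilton et al., 2017; Battaglia et al., 2018; Goyal & Ferrara, 2018). Formally, a GNN follows a neural message passing scheme (Gilmer et al., 2017) and updates its node features $\vec{h}_i^{(t-1)}$ in layer $t$ by aggregating localized information via

$$\vec{a}_i^{(t)} = \text{AGGREGATE}^{(t)}\Big( \big\{\!\big\{ \big(\vec{h}_j^{(t-1)}, \vec{e}_{j,i}\big) : j \in \mathcal{N}(i) \big\}\!\big\} \Big), \qquad \vec{h}_i^{(t)} = \text{UPDATE}^{(t)}\Big( \vec{h}_i^{(t-1)}, \vec{a}_i^{(t)} \Big) \tag{2}$$

where $\vec{h}_i^{(0)} = \vec{x}_i \in \mathbf{X}$, $\mathcal{N}(i)$ denotes the neighbors of node $i$, and $\{\!\{ \ldots \}\!\}$ denotes a multiset. The recent work in the fields of geometric deep learning and relational representation learning provides a large number of operators to choose from (Kipf & Welling, 2017; Gilmer et al., 2017; Veličković et al., 2018; Schlichtkrull et al., 2018; Xu et al., 2019c), which allows for precise control of the properties of extracted features.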
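As an illustration of the message passing scheme, the sketch below implements one layer with sum aggregation and an additive update, and then checks permutation equivariance on a toy graph. Edge features and trainable weights are omitted for brevity, and the function names are hypothetical.

```python
def sum_aggregate(messages, dim):
    """Permutation invariant aggregation over the multiset of neighbor features."""
    return [sum(m[d] for m in messages) for d in range(dim)]

def add_update(h, a):
    """Combine a node's own feature with its aggregated neighborhood."""
    return [hd + ad for hd, ad in zip(h, a)]

def message_passing_layer(H, A, aggregate=sum_aggregate, update=add_update):
    """One round of message passing on node features H with adjacency matrix A."""
    n, dim = len(H), len(H[0])
    new_H = []
    for i in range(n):
        messages = [H[j] for j in range(n) if A[j][i]]  # features of all neighbors j of i
        new_H.append(update(H[i], aggregate(messages, dim)))
    return new_H

# Path graph 0-1-2 with two-dimensional node features.
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
H = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
H1 = message_passing_layer(H, A)

# Relabel the nodes: new node k is old node perm[k].
perm = [2, 0, 1]
H_perm = [H[p] for p in perm]
A_perm = [[A[p][q] for q in perm] for p in perm]
H1_perm = message_passing_layer(H_perm, A_perm)
print(H1, H1_perm == [H1[p] for p in perm])
```

Because the aggregator ignores the order of incoming messages, relabeling the nodes before the layer gives the same result as relabeling its output.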

3.2 Synchronous Message Passing for Neighborhood Consensus

Due to the purely local nature of the used node embeddings, our feature matching procedure is prone to finding false correspondences which are locally similar to the correct one. Formally, those cases pose a violation of the neighborhood consensus criteria employed in Equation (1). Since finding a global optimum is $\mathcal{NP}$-hard, we aim to detect violations of the criteria in local neighborhoods and resolve them in an iterative fashion.

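We utilize graph neural networks to detect these violations in a neighborhood consensus step and iteratively refine correspondences $\mathbf{S}^{(l)}$, $l \in \{0, \ldots, L\}$, starting from $\mathbf{S}^{(0)}$. Key to the proposed algorithm is the following observation: The soft correspondence matrix $\mathbf{S} \in [0, 1]^{|\mathcal{V}_s| \times |\mathcal{V}_t|}$ is a map from the node function space $L(\mathcal{G}_s)$ to the node function space $L(\mathcal{G}_t)$. Therefore, we can use $\mathbf{S}$ to pass node functions $\vec{x}_s \in L(\mathcal{G}_s)$, $\vec{x}_t \in L(\mathcal{G}_t)$ along the soft correspondences by

$$\vec{x}_t' = \mathbf{S}^{\top} \vec{x}_s \quad \text{and} \quad \vec{x}_s' = \mathbf{S} \, \vec{x}_t$$

to obtain functions $\vec{x}_t' \in L(\mathcal{G}_t)$, $\vec{x}_s' \in L(\mathcal{G}_s)$ in the other domain, respectively.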

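Then, our consensus method works as follows: Using $\mathbf{S}^{(l)}$, we first map node indicator functions, given as an injective node coloring $\mathcal{V}_s \to \{0, 1\}^{|\mathcal{V}_s|}$ in the form of an identity matrix $\mathbf{I}_{|\mathcal{V}_s|}$, from $\mathcal{G}_s$ to $\mathcal{G}_t$. Then, we distribute this coloring in corresponding neighborhoods by performing synchronous message passing on both graphs via a shared graph neural network $\Psi_{\theta_2}$, i.e.,

$$\mathbf{O}_s = \Psi_{\theta_2}\big(\mathbf{I}_{|\mathcal{V}_s|}, \mathbf{A}_s, \mathbf{E}_s\big) \quad \text{and} \quad \mathbf{O}_t = \Psi_{\theta_2}\big(\mathbf{S}^{(l)\top} \mathbf{I}_{|\mathcal{V}_s|}, \mathbf{A}_t, \mathbf{E}_t\big).$$

We can compare the results of both GNNs to recover a vector $\vec{d}_{i,j} = \vec{o}_i^{(s)} - \vec{o}_j^{(t)}$ which measures the neighborhood consensus between node pairs $(i, j) \in \mathcal{V}_s \times \mathcal{V}_t$. This measure can be used to perform trainable updates of the correspondence scores

$$S^{(l+1)}_{i,j} = \operatorname{sinkhorn}\big(\hat{\mathbf{S}}^{(l+1)}\big)_{i,j} \quad \text{with} \quad \hat{S}^{(l+1)}_{i,j} = \hat{S}^{(l)}_{i,j} + \Phi_{\theta_3}\big(\vec{d}_{i,j}\big)$$

based on an MLP $\Phi_{\theta_3}$. The process can be applied $L$ times to iteratively improve the consensus in neighborhoods. The final objective $\mathcal{L} = \mathcal{L}^{(\text{initial})} + \mathcal{L}^{(\text{refined})}$ with $\mathcal{L}^{(\text{refined})} = -\sum_{i \in \mathcal{V}_s} \log\big(S^{(L)}_{i, \pi_{gt}(i)}\big)$ combines both the feature matching error and neighborhood consensus error. This objective is fully-differentiable and can hence be optimized in an end-to-end fashion using stochastic gradient descent. Overall, the consensus stage distributes global node colorings to resolve ambiguities and false matchings made in the first stage of our architecture by only using purely local operators. Since an initial matching is needed to test for neighborhood consensus, this task cannot be fulfilled by $\Psi_{\theta_1}$ alone, which stresses the importance of our two-stage approach.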
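A single consensus iteration can be sketched as follows. The sketch makes several simplifying assumptions that are not part of the method itself: the shared GNN is replaced by a fixed one-layer operator that adds the summed neighbor colors to each node's own color, the trainable update network is replaced by the negative squared norm of the color difference, and correspondences are normalized row-wise with a softmax instead of sinkhorn. All function names are hypothetical.

```python
import math

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(M):
    return [list(col) for col in zip(*M)]

def row_softmax(M):
    out = []
    for row in M:
        top = max(row)
        e = [math.exp(v - top) for v in row]
        out.append([v / sum(e) for v in e])
    return out

def psi(X, A):
    """Stand-in GNN: every node adds the sum of its neighbors' colors to its own."""
    AX = matmul(A, X)
    return [[x + ax for x, ax in zip(x_row, ax_row)] for x_row, ax_row in zip(X, AX)]

def phi(d):
    """Stand-in for the trainable update network: penalize large color differences."""
    return -sum(v * v for v in d)

def consensus_step(S_hat, A_s, A_t):
    """Return refined (unnormalized) correspondence scores after one iteration."""
    n_s, n_t = len(A_s), len(A_t)
    S = row_softmax(S_hat)
    I = [[1.0 if i == j else 0.0 for j in range(n_s)] for i in range(n_s)]
    O_s = psi(I, A_s)                        # colors distributed on the source graph
    O_t = psi(matmul(transpose(S), I), A_t)  # colors transferred via S^T, then distributed
    return [[S_hat[i][j] + phi([a - b for a, b in zip(O_s[i], O_t[j])])
             for j in range(n_t)] for i in range(n_s)]

# Path graph 0-1-2 matched against itself; source node 2 is initially ambiguous.
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
S_hat = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 1.0, 1.0]]
refined = consensus_step(S_hat, A, A)
matches = [max(range(3), key=lambda j: row[j]) for row in refined]
print(matches)
```

In this toy case, source node 2 initially scores target nodes 1 and 2 equally. Since its neighbor (node 1) is confidently matched, the transferred colors disagree far less around target node 2, so the tie is broken in favor of the consistent match.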

The following two theorems show that $\vec{d}_{i,j}$ is a good measure of how well local neighborhoods around $i$ and $j$ are matched by the soft correspondence between $\mathcal{G}_s$ and $\mathcal{G}_t$. The proofs can be found in Appendix B and C, respectively.

Theorem 1.

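Let $\mathcal{G}_s$ and $\mathcal{G}_t$ be two isomorphic graphs and let $\Psi_{\theta_2}$ be a permutation equivariant GNN, i.e., $\mathbf{P}^{\top} \Psi_{\theta_2}(\mathbf{X}, \mathbf{A}, \mathbf{E}) = \Psi_{\theta_2}(\mathbf{P}^{\top} \mathbf{X}, \mathbf{P}^{\top} \mathbf{A} \mathbf{P}, \mathbf{P}^{\top} \mathbf{E} \mathbf{P})$ for any permutation matrix $\mathbf{P} \in \{0, 1\}^{|\mathcal{V}| \times |\mathcal{V}|}$. If $\mathbf{S} \in \{0, 1\}^{|\mathcal{V}_s| \times |\mathcal{V}_t|}$ encodes an isomorphism between $\mathcal{G}_s$ and $\mathcal{G}_t$, then $\vec{d}_{i, \pi(i)} = \vec{0}$ for all $i \in \mathcal{V}_s$.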

Theorem 2.

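Let $\mathcal{G}_s$ and $\mathcal{G}_t$ be two graphs and let $\Psi_{\theta_2}$ be a permutation equivariant and $T$-layered GNN for which both $\text{AGGREGATE}^{(t)}$ and $\text{UPDATE}^{(t)}$ are injective for all $t \in \{1, \ldots, T\}$. If $\vec{d}_{i,j} = \vec{0}$, then the resulting submatrix $\mathbf{S}_{\mathcal{N}_T(i), \mathcal{N}_T(j)}$ is a permutation matrix describing an isomorphism between the $T$-hop subgraph $\mathcal{G}_s[\mathcal{N}_T(i)]$ around $i \in \mathcal{V}_s$ and the $T$-hop subgraph $\mathcal{G}_t[\mathcal{N}_T(j)]$ around $j \in \mathcal{V}_t$. Moreover, if $\vec{d}_{i, \operatorname{argmax} \mathbf{S}_{i,:}} = \vec{0}$ for all $i \in \mathcal{V}_s$, then $\mathbf{S}$ denotes a full isomorphism between $\mathcal{G}_s$ and $\mathcal{G}_t$.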

Hence, a GNN that satisfies both criteria in Theorem 1 and 2 provides equal node embeddings $\vec{o}_i^{(s)}$ and $\vec{o}_j^{(t)}$ if and only if nodes in a local neighborhood are correctly matched to each other. A value $\vec{d}_{i,j} \neq \vec{0}$ indicates the existence of inconsistent matchings in the local neighborhoods around $i$ and $j$, and can hence be used to refine the correspondence score $S_{i,j}$.
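The statement of Theorem 1 can be checked numerically on a toy example. The sketch below assumes a fixed, permutation equivariant one-layer operator (each node adds the summed colors of its neighbors to its own color) in place of a trained GNN; the graph, the permutation, and all names are illustrative.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(M):
    return [list(col) for col in zip(*M)]

def psi(X, A):
    """Permutation equivariant stand-in GNN: own color plus summed neighbor colors."""
    AX = matmul(A, X)
    return [[x + ax for x, ax in zip(x_row, ax_row)] for x_row, ax_row in zip(X, AX)]

def consensus_differences(S, A_s, A_t):
    """D[i][j] is the difference between the colors seen at source i and target j."""
    n_s = len(A_s)
    I = [[1.0 if i == j else 0.0 for j in range(n_s)] for i in range(n_s)]
    O_s = psi(I, A_s)
    O_t = psi(matmul(transpose(S), I), A_t)
    return [[[a - b for a, b in zip(o_s, o_t)] for o_t in O_t] for o_s in O_s]

def permutation_matrix(mapping):
    n = len(mapping)
    return [[1.0 if mapping[i] == j else 0.0 for j in range(n)] for i in range(n)]

# Source graph: path 0-1-2-3. Target graph: the same path with relabeled nodes.
n = 4
A_s = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
pi = [2, 0, 3, 1]  # isomorphism: source node i corresponds to target node pi[i]
A_t = [[0] * n for _ in range(n)]
for i in range(n):
    for j in range(n):
        A_t[pi[i]][pi[j]] = A_s[i][j]

D = consensus_differences(permutation_matrix(pi), A_s, A_t)
iso_ok = all(v == 0 for i in range(n) for v in D[i][pi[i]])

# A matching that swaps the images of nodes 0 and 1 is not an isomorphism.
wrong = [pi[1], pi[0], pi[2], pi[3]]
D_wrong = consensus_differences(permutation_matrix(wrong), A_s, A_t)
wrong_flagged = any(v != 0 for i in range(n) for v in D_wrong[i][wrong[i]])
print(iso_ok, wrong_flagged)
```

For the true isomorphism all difference vectors along matched pairs vanish, while the inconsistent matching produces at least one non-zero difference that a refinement step could act on.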

Note that both requirements, permutation equivariance and injectivity, are easily fulfilled: (1) All common graph neural network architectures following the message passing scheme of Equation (2) are equivariant due to the use of permutation invariant neighborhood aggregators. (2) Injectivity of graph neural networks is a heavily discussed topic in recent literature. It can be fulfilled by using a GNN that is as powerful as the Weisfeiler & Lehman (1968) (WL) heuristic in distinguishing graph structures, e.g., by using sum aggregation in combination with MLPs on the multiset of neighboring node features, cf. (Xu et al., 2019c; Morris et al., 2019).

3.3 Relation to the Graduated Assignment Algorithm

Theoretically, we can relate our proposed approach to classical graph matching techniques that consider a doubly-stochastic relaxation of the problem defined in Equation (1), cf. (Lyzinski et al., 2016) and Appendix F for more details. A seminal work following this method is the graduated assignment algorithm (Gold & Rangarajan, 1996). By starting from an initial feasible solution $\mathbf{S}^{(0)}$, a new solution $\mathbf{S}^{(l+1)}$ is iteratively computed from $\mathbf{S}^{(l)}$ by approximately solving a linear assignment problem according to

$$\mathbf{S}^{(l+1)} \leftarrow \underset{\mathbf{S}}{\operatorname{softassign}} \sum_{i \in \mathcal{V}_s} \sum_{j \in \mathcal{V}_t} Q_{i,j} \, S_{i,j} \quad \text{with} \quad Q_{i,j} = 2 \sum_{i' \in \mathcal{V}_s} \sum_{j' \in \mathcal{V}_t} A^{(s)}_{i,i'} \, A^{(t)}_{j,j'} \, S^{(l)}_{i',j'}$$

where $\mathbf{Q}$ denotes the gradient of Equation (1) at $\mathbf{S}^{(l)}$. (Footnote 1: For clarity of presentation, we closely follow the original formulation of the method for simple graphs but ignore the edge similarities and adapt the constant factor of the gradient according to our objective function.) The softassign operator is implemented by applying sinkhorn normalization on rescaled inputs, where the scaling factor grows in every iteration to increasingly encourage integer solutions. Our approach also resembles the approximation of the linear assignment problem via sinkhorn normalization.

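Moreover, the gradient $\mathbf{Q}$ is closely related to our neighborhood consensus scheme for the particular simple, non-trainable GNN instantiation $\Psi(\mathbf{X}, \mathbf{A}, \mathbf{E}) = \mathbf{A}\mathbf{X}$. Given $\mathbf{O}_s = \mathbf{A}_s \mathbf{I}_{|\mathcal{V}_s|} = \mathbf{A}_s$ and $\mathbf{O}_t = \mathbf{A}_t \mathbf{S}^{\top} \mathbf{I}_{|\mathcal{V}_s|} = \mathbf{A}_t \mathbf{S}^{\top}$, we obtain $\mathbf{Q} = 2 \, \mathbf{O}_s \mathbf{O}_t^{\top}$ by substitution. Instead of updating $\mathbf{S}^{(l)}$ based on the similarity between $\mathbf{O}_s$ and $\mathbf{O}_t$ obtained from a fixed-function GNN $\Psi$, we choose to update correspondence scores via trainable neural networks $\Psi_{\theta_2}$ and $\Phi_{\theta_3}$ based on the difference between $\mathbf{O}_s$ and $\mathbf{O}_t$. This allows us to interpret our model as a deep parameterized generalization of the graduated assignment algorithm. In addition, specifying node and edge attribute similarities in graph matching is often difficult and complicates its computation (Zhou & De la Torre, 2016; Zhang et al., 2019c), whereas our approach naturally supports continuous node and edge features via established GNN models. We experimentally verify the benefits of using trainable neural networks $\Psi_{\theta_2}$ instead of $\Psi(\mathbf{X}, \mathbf{A}, \mathbf{E}) = \mathbf{A}\mathbf{X}$ in Appendix D.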
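The substitution argument can be verified numerically. The sketch below computes the gradient of the quadratic objective entry by entry (twice the sum of products of source adjacency, target adjacency and current correspondence scores, assuming undirected graphs) and compares it with the product of the two color matrices produced by the fixed operator that multiplies node colors by the adjacency matrix. Function names and the toy graphs are illustrative.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(M):
    return [list(col) for col in zip(*M)]

def gradient_by_definition(A_s, A_t, S):
    """Q[i][j] = 2 * sum over i2, j2 of A_s[i][i2] * A_t[j][j2] * S[i2][j2]."""
    n_s, n_t = len(A_s), len(A_t)
    return [[2 * sum(A_s[i][i2] * A_t[j][j2] * S[i2][j2]
                     for i2 in range(n_s) for j2 in range(n_t))
             for j in range(n_t)] for i in range(n_s)]

def gradient_by_message_passing(A_s, A_t, S):
    """Q = 2 * O_s O_t^T with the fixed operator Psi(X, A) = A X."""
    n_s = len(A_s)
    I = [[1.0 if i == j else 0.0 for j in range(n_s)] for i in range(n_s)]
    O_s = matmul(A_s, I)                        # A_s I = A_s
    O_t = matmul(A_t, matmul(transpose(S), I))  # A_t S^T I = A_t S^T
    return [[2 * v for v in row] for row in matmul(O_s, transpose(O_t))]

# Source: path 0-1-2. Target: cycle 0-1-2-3. S is an arbitrary soft correspondence.
A_s = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
A_t = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
S = [[0.7, 0.1, 0.1, 0.1], [0.2, 0.5, 0.2, 0.1], [0.1, 0.1, 0.6, 0.2]]

Q_def = gradient_by_definition(A_s, A_t, S)
Q_mp = gradient_by_message_passing(A_s, A_t, S)
max_diff = max(abs(a - b) for ra, rb in zip(Q_def, Q_mp) for a, b in zip(ra, rb))
print(max_diff)
```

Both routes agree up to floating-point error, which is the sense in which the fixed-function variant of the consensus step reproduces the graduated assignment gradient.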

3.4 Scaling to Large Input

We apply a number of optimizations to our proposed algorithm to make it scale to large input domains. See Algorithm 1 in Appendix A for the final optimized algorithm.

Sparse correspondences.

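We propose to sparsify initial correspondences $\mathbf{S}^{(0)}$ by filtering out low score correspondences before neighborhood consensus takes place. That is, we sparsify $\mathbf{S}^{(0)}$ by computing top $k$ correspondences with the help of the KeOps library (Charlier et al., 2019) without ever storing its dense version, reducing its required memory footprint from $\mathcal{O}(|\mathcal{V}_s| |\mathcal{V}_t|)$ to $\mathcal{O}(k |\mathcal{V}_s|)$. In addition, the time complexity of the refinement phase is reduced from $\mathcal{O}(|\mathcal{V}_s| |\mathcal{V}_t| + |\mathcal{E}_s| + |\mathcal{E}_t|)$ to $\mathcal{O}(k |\mathcal{V}_s| + |\mathcal{E}_s| + |\mathcal{E}_t|)$, where $|\mathcal{E}_s|$ and $|\mathcal{E}_t|$ denote the number of edges in $\mathcal{G}_s$ and $\mathcal{G}_t$, respectively. Note that sparsifying initial correspondences assumes that the feature matching procedure ranks the correct correspondence within the top $k$ elements for each node $i \in \mathcal{V}_s$. Hence, also optimizing the initial feature matching loss $\mathcal{L}^{(\text{initial})}$ is crucial, and can be further accelerated by training only against sparsified correspondences with ground-truth entries $\operatorname{top}_k\big(\mathbf{S}^{(0)}_{i,:}\big) \cup \big\{ S^{(0)}_{i, \pi_{gt}(i)} \big\}$.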
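The sparsification step can be illustrated with a small pure-Python sketch. Unlike the approach described above, it starts from a dense score matrix and therefore does not reproduce the memory savings obtained with KeOps; it only shows which entries survive. The function name and the way the ground-truth column is appended during training are assumptions.

```python
def topk_sparsify(S, k, pi_gt=None):
    """Keep the k highest-scoring target nodes per source node as (column, score) pairs.

    If ground-truth correspondences pi_gt are given (training), the correct
    column is appended whenever it is not already among the top k.
    """
    sparse = []
    for i, row in enumerate(S):
        cols = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        if pi_gt is not None and pi_gt[i] not in cols:
            cols.append(pi_gt[i])
        sparse.append([(j, row[j]) for j in cols])
    return sparse

S = [[0.10, 0.60, 0.05, 0.25],
     [0.40, 0.10, 0.30, 0.20]]

test_time = topk_sparsify(S, k=2)
train_time = topk_sparsify(S, k=2, pi_gt=[2, 0])
print(test_time)
print(train_time)
```

At test time the correct match of the first source node (column 2) is lost, which is exactly the failure mode the text warns about when the initial ranking is poor; during training it is kept so that the loss can still be evaluated.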

Replacing node indicator functions.

Although applying $\Psi_{\theta_2}$ on node indicator functions $\mathbf{I}_{|\mathcal{V}_s|}$ is computationally efficient, it requires a parameter complexity of $\mathcal{O}(|\mathcal{V}_s|)$. Hence, we propose to replace node indicator functions $\mathbf{I}_{|\mathcal{V}_s|}$ with randomly drawn node functions $\mathbf{R}_s^{(l)} \sim \mathcal{N}(0, 1)$, where $\mathbf{R}_s^{(l)} \in \mathbb{R}^{|\mathcal{V}_s| \times R}$ with $R \ll |\mathcal{V}_s|$, in iteration $l$. By sampling from a continuous distribution, node indicator functions are still guaranteed to be injective (DeGroot & Schervish, 2012). Note that Theorem 1 still holds because it does not impose any restrictions on the function space $L(\mathcal{G}_s)$. Theorem 2 does not necessarily hold anymore, but we expect our refinement strategy to resolve any ambiguities by re-sampling $\mathbf{R}_s^{(l)}$ in every iteration $l$. We verify this empirically in Section 4.1.

Softmax normalization.

The sinkhorn normalization fulfills the requirements of rectangular doubly-stochastic solutions. However, it may eventually push correspondences to inconsistent integer solutions very early on from which the neighborhood consensus method cannot effectively recover. Furthermore, it is inherently inefficient to compute and runs the risk of vanishing gradients (Zhang et al., 2019b). Here, we propose to relax this constraint by only applying row-wise softmax normalization on $\hat{\mathbf{S}}^{(l)}$, and expect our supervised refinement procedure to naturally resolve violations of $\sum_{i \in \mathcal{V}_s} S_{i,j} \leq 1$ on its own by re-ranking false correspondences via neighborhood consensus. Experimentally, we show that row-wise normalization is sufficient for our algorithm to converge to the correct solution, cf. Section 4.1.
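The relaxed constraint is easy to see on a small example. The sketch below applies a row-wise softmax to scores in which two source nodes prefer the same target node; every row remains a valid distribution, but the shared column receives more than one unit of mass. The function name and the scores are illustrative.

```python
import math

def row_softmax(scores):
    """Normalize every row independently into a probability distribution."""
    out = []
    for row in scores:
        top = max(row)  # subtract the maximum for numerical stability
        e = [math.exp(v - top) for v in row]
        out.append([v / sum(e) for v in e])
    return out

# Both source nodes initially prefer target node 0.
S_hat = [[3.0, 0.0, 0.0],
         [3.0, 0.0, 0.0]]
S = row_softmax(S_hat)

row_sums = [sum(row) for row in S]
col_sums = [sum(row[j] for row in S) for j in range(3)]
print(row_sums, col_sums)
```

The column constraint is violated for target node 0, and it is left to the subsequent consensus iterations to re-rank one of the two competing source nodes.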

Number of refinement iterations.

Instead of holding $L$ fixed, we propose to differ the number of refinement iterations $L^{(\text{train})}$ and $L^{(\text{test})}$, $L^{(\text{train})} \ll L^{(\text{test})}$, for training and testing, respectively. This does not only speed up training runtime, but it also encourages the refinement procedure to reach convergence with as few steps as necessary while we can run the refinement procedure until convergence during testing. We show empirically that decreasing $L^{(\text{train})}$ does not affect the convergence abilities of our neighborhood consensus procedure during testing, cf. Section 4.1.

4 Experiments

We verify our method on three different tasks. We first show the benefits of our approach in an ablation study on synthetic graphs (Section 4.1), and apply it to the real-world tasks of supervised keypoint matching in natural images (Sections 4.2 and 4.3) and semi-supervised cross-lingual knowledge graph alignment (Section 4.4) afterwards. All dataset statistics can be found in Appendix H.

Our method is implemented in PyTorch (Paszke et al., 2017) using the PyTorch Geometric (Fey & Lenssen, 2019) and the KeOps (Charlier et al., 2019) libraries. Our implementation can process sparse mini-batches with parallel GPU acceleration and minimal memory footprint in all algorithm steps. For all experiments, optimization is done via Adam (Kingma & Ba, 2015) with a fixed learning rate of . We use similar architectures for $\Psi_{\theta_1}$ and $\Psi_{\theta_2}$ except that we omit dropout (Srivastava et al., 2014) in $\Psi_{\theta_2}$. For all experiments, we report Hits@$k$ to evaluate and compare our model to previous lines of work, where Hits@$k$ measures the proportion of correctly matched entities ranked in the top $k$.
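The evaluation metric can be computed as in the following minimal sketch, which assumes a dense score matrix with one row per source node and a list of ground-truth target indices. The function name and the toy scores are illustrative.

```python
def hits_at_k(S, pi_gt, k):
    """Fraction of source nodes whose correct match is ranked among the top k scores."""
    hits = 0
    for i, row in enumerate(S):
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        hits += pi_gt[i] in ranked[:k]
    return hits / len(S)

S = [[0.7, 0.2, 0.1],
     [0.5, 0.3, 0.2],
     [0.3, 0.6, 0.1]]
pi_gt = [0, 1, 2]  # source node i should be matched to target node i

h1 = hits_at_k(S, pi_gt, k=1)
h2 = hits_at_k(S, pi_gt, k=2)
h3 = hits_at_k(S, pi_gt, k=3)
print(h1, h2, h3)
```

Only the first node is ranked correctly at the top position, the second enters at rank two, and the third only at rank three, so the metric grows monotonically with the cutoff.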

4.1 Ablation Study on Synthetic Graphs

In our first experiment, we evaluate our method on synthetic graphs where we aim to learn a matching for pairs of graphs in a supervised fashion. Each pair of graphs consists of an undirected Erdős & Rényi (1959) graph $\mathcal{G}_s$ with $|\mathcal{V}_s|$ nodes and edge probability $p$, and a target graph $\mathcal{G}_t$ which is constructed from $\mathcal{G}_s$ by removing edges with probability $p_s$ without disconnecting any nodes (Heimann et al., 2018). Training and evaluation is done on graphs each for different configurations. In Appendix E, we perform additional experiments to also verify the robustness of our approach towards node addition or removal.
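A pair of synthetic graphs of this kind could be generated as sketched below. The sketch interprets "without disconnecting any nodes" as never removing an edge that would leave one of its endpoints without any neighbor, which is an assumption; the function names, the seed, and the parameter values in the example are likewise illustrative and not taken from the experimental configuration.

```python
import random

def erdos_renyi(n, p, rng):
    """Undirected random graph: every node pair is connected with probability p."""
    return {(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p}

def degrees(edges, n):
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return deg

def remove_edges(edges, n, p_s, rng):
    """Drop each edge with probability p_s unless an endpoint would become isolated."""
    deg = degrees(edges, n)
    kept = set(edges)
    for i, j in sorted(edges):  # sorted for a reproducible traversal order
        if rng.random() < p_s and deg[i] > 1 and deg[j] > 1:
            kept.remove((i, j))
            deg[i] -= 1
            deg[j] -= 1
    return kept

rng = random.Random(0)
n = 50
E_s = erdos_renyi(n, p=0.2, rng=rng)          # source graph
E_t = remove_edges(E_s, n, p_s=0.3, rng=rng)  # noisy target graph on the same node set
deg_s, deg_t = degrees(E_s, n), degrees(E_t, n)
print(len(E_s), len(E_t))
```

Since the target graph shares the node set of the source graph, the ground-truth correspondence is the identity; a random relabeling of the target nodes could be added if a non-trivial ground truth is desired.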

Architecture and parameters.

We implement the graph neural network operators $\Psi_{\theta_1}$ and $\Psi_{\theta_2}$ by stacking three layers ($T = 3$) of the GIN operator (Xu et al., 2019c)

$$\vec{h}_i^{(t+1)} = \text{MLP}^{(t+1)}\Big( (1 + \epsilon) \cdot \vec{h}_i^{(t)} + \sum_{j \to i} \vec{h}_j^{(t)} \Big)$$

due to its expressiveness in distinguishing raw graph structures. The number of layers and hidden dimensionality of all MLPs is set to and , respectively, and we apply ReLU activation (Glorot et al., 2011) and Batch normalization (Ioffe & Szegedy, 2015) after each of its layers. Input features are initialized with one-hot encodings of node degrees. We employ a Jumping Knowledge style concatenation $\vec{h}_i = \mathbf{W} \big[ \vec{h}_i^{(1)}, \ldots, \vec{h}_i^{(T)} \big]$ (Xu et al., 2018) to compute final node representations $\vec{h}_i$. We train and test our procedure with $L^{(\text{train})}$ and $L^{(\text{test})}$ refinement iterations, respectively.

Figure 2: The performance of our method on synthetic data with structural noise (panels (a) to (d)).

Figures 2(a) and 2(b) show the matching accuracy Hits@1 for different choices of $|\mathcal{V}_s|$ and $p$. We observe that the purely local matching approach via $\operatorname{softmax}(\hat{\mathbf{S}}^{(0)})$ starts decreasing in performance with the structural noise $p_s$ increasing. This also holds when applying global sinkhorn normalization on $\hat{\mathbf{S}}^{(0)}$. However, our proposed two-stage architecture can recover all correspondences, independent of the applied structural noise $p_s$. This applies to both variants discussed in the previous sections, i.e., our initial formulation $\operatorname{sinkhorn}(\hat{\mathbf{S}}^{(L)})$, and our optimized architecture using random node indicator sampling and row-wise normalization $\operatorname{softmax}(\hat{\mathbf{S}}^{(L)})$. This highlights the overall benefits of applying matching consensus and justifies the usage of the enhancements made towards scalability in Section 3.4.

In addition, Figure 2(c) visualizes the test error for a varying number of iterations $L^{(\text{test})}$. We observe that even when training to non-convergence, our procedure is still able to converge by increasing the number of iterations during testing.

Moreover, Figure 2(d) shows the performance of our refinement strategy when operating on sparsified top $k$ correspondences. In contrast to its dense version, it cannot match all nodes correctly due to the poor initial feature matching quality. However, it consistently converges to the perfect solution of Hits@1 $\approx$ Hits@$k$ in case the correct match is included in the initial top $k$ ranking of correspondences. Hence, with increasing $k$, we can recover most of the correct correspondences, making it an excellent option to scale our algorithm to large graphs, cf. Section 4.4.

4.2 Supervised Keypoint Matching in Natural Images

We perform experiments on the PascalVOC (Everingham et al., 2010) with Berkeley annotations (Bourdev & Malik, 2009) and WILLOW-ObjectClass (Cho et al., 2013) datasets which contain sets of image categories with labeled keypoint locations. For PascalVOC, we follow the experimental setups of Zanfir & Sminchisescu (2018) and Wang et al. (2019b) and use the training and test splits provided by Choy et al. (2016). We pre-filter the dataset to exclude difficult, occluded and truncated objects, and require examples to have at least one keypoint, resulting in and annotated images for training and testing, respectively. The PascalVOC dataset contains instances of varying scale, pose and illumination, and the number of keypoints ranges from to . In contrast, the WILLOW-ObjectClass dataset contains at least 40 images with consistent orientations for each of its five categories, and each image consists of exactly 10 keypoints. Following the experimental setup of peer methods (Cho et al., 2013; Wang et al., 2019b), we pre-train our model on PascalVOC and fine-tune it over 20 random splits with 20 per-class images used for training. We construct graphs via the Delaunay triangulation of keypoints. For fair comparison with Zanfir & Sminchisescu (2018) and Wang et al. (2019b), input features of keypoints are given by the concatenated output of relu4_2 and relu5_1 of a pre-trained VGG16 (Simonyan & Zisserman, 2014) on ImageNet (Deng et al., 2009).

Architecture and parameters.

We adopt SplineCNN (Fey et al., 2018) as our graph neural network operator

$$\vec{h}_i^{(t+1)} = \sigma\Big( \mathbf{W}^{(t+1)} \vec{h}_i^{(t)} + \sum_{j \to i} \Phi_{\theta}^{(t+1)}\big(\vec{e}_{j,i}\big) \cdot \vec{h}_j^{(t)} \Big)$$

whose trainable B-spline based kernel function $\Phi_{\theta}(\cdot)$ is conditioned on edge features $\vec{e}_{j,i}$ between node-pairs. To align our results with the related work, we evaluate both isotropic and anisotropic edge features which are given as normalized relative distances and 2D Cartesian coordinates, respectively. For SplineCNN, we use a kernel size of in each dimension, a hidden dimensionality of , and apply as our non-linearity function $\sigma$. Our network architecture consists of two convolutional layers ($T = 2$), followed by dropout with probability , and a final linear layer. During training, we form pairs between any two training examples of the same category, and evaluate our model by sampling a fixed number of test graph pairs belonging to the same category.

Method Aero Bike Bird Boat Bottle Bus Car Cat Chair Cow Table Dog Horse M-Bike Person Plant Sheep Sofa Train TV Mean
GMN 31.1 46.2 58.2 45.9 70.6 76.5 61.2 61.7 35.5 53.7 58.9 57.5 56.9 49.3 34.1 77.5 57.1 53.6 83.2 88.6 57.9
PCA-GM 40.9 55.0 65.8 47.9 76.9 77.9 63.5 67.4 33.7 66.5 63.6 61.3 58.9 62.8 44.9 77.5 67.4 57.5 86.7 90.9 63.8

34.7 42.6 41.5 50.4 50.3 72.2 60.1 59.4 24.6 38.1 86.2 47.7 56.3 37.6 35.4 58.0 45.8 74.8 64.1 75.3 52.8
45.8 58.2 45.5 57.6 68.2 82.1 75.3 60.2 31.7 52.9 88.2 56.2 68.2 50.7 46.5 66.3 58.8 89.0 85.1 79.9 63.3
45.3 57.1 54.9 54.7 71.7 82.6 75.3 65.9 31.6 50.8 86.1 56.9 67.1 53.1 49.2 77.3 59.2 91.7 82.0 84.2 64.8

44.3 62.0 48.4 53.9 73.3 80.4 72.2 64.2 30.3 52.7 79.4 56.6 62.3 56.2 47.5 74.0 59.8 79.9 81.9 83.0 63.1
46.5 63.7 54.9 60.9 79.4 84.1 76.4 68.3 38.5 61.5 80.6 59.7 69.8 58.4 54.3 76.4 64.5 95.7 87.9 81.3 68.1
50.1 65.4 55.7 65.3 80.0 83.5 78.3 69.7 34.7 60.7 70.4 59.9 70.0 62.2 56.1 80.2 70.3 88.8 81.1 84.3 68.3

34.3 45.9 37.3 47.7 53.3 75.2 64.5 61.7 27.7 40.5 85.9 46.6 50.2 39.0 37.3 58.0 49.2 82.9 65.0 74.2 53.8
44.6 51.2 50.7 58.5 72.3 83.3 76.6 65.6 31.0 57.5 91.7 55.4 69.5 56.2 47.5 85.1 57.9 92.3 86.7 85.9 66.0
48.7 57.2 47.0 65.3 73.9 87.6 76.7 70.0 30.0 55.5 92.8 59.5 67.9 56.9 48.7 87.2 58.3 94.9 87.9 86.0 67.6

42.1 57.5 49.6 59.4 83.8 84.0 78.4 67.5 37.3 60.4 85.0 58.0 66.0 54.1 52.6 93.9 60.2 85.6 87.8 82.5 67.3
45.5 67.6 56.5 66.8 86.9 85.2 84.2 73.0 43.6 66.0 92.3 64.0 79.8 56.6 56.1 95.4 64.4 95.0 91.3 86.3 72.8
47.0 65.7 56.8 67.6 86.9 87.7 85.3 72.6 42.9 69.1 84.5 63.8 78.1 55.6 58.4 98.0 68.4 92.2 94.5 85.5 73.0
Table 1: Hits@1 (%) on the PascalVOC dataset with Berkeley keypoint annotations.
Method Face Motorbike Car Duck Winebottle
GMN (Zanfir & Sminchisescu, 2018) 99.3 71.4 74.3 82.8 76.7
PCA-GM (Wang et al., 2019b) 100.0 76.7 84.0 93.5 96.9
isotropic  98.07 ± 0.79  48.97 ± 4.62  65.30 ± 3.16  66.02 ± 2.51  77.72 ± 3.32
  100.00 ± 0.00  67.28 ± 4.93  85.07 ± 3.93  83.10 ± 3.61  92.30 ± 2.11
  100.00 ± 0.00  68.57 ± 3.94  82.75 ± 5.77  84.18 ± 4.15  90.36 ± 2.42
isotropic  99.62 ± 0.28  73.47 ± 3.32  77.47 ± 4.92  77.10 ± 3.25  88.04 ± 1.38
  100.00 ± 0.00  92.05 ± 3.49  90.05 ± 5.10  88.98 ± 2.75  97.14 ± 1.41
  100.00 ± 0.00  92.05 ± 3.24  90.28 ± 4.67  88.97 ± 3.49  97.14 ± 1.83
anisotropic  98.47 ± 0.61  49.28 ± 4.31  64.95 ± 3.52  66.17 ± 4.08  78.08 ± 2.61
  100.00 ± 0.00  76.28 ± 4.77  86.70 ± 3.25  83.22 ± 3.52  93.65 ± 1.64
  100.00 ± 0.00  76.57 ± 5.28  89.00 ± 3.88  84.78 ± 2.73  95.29 ± 2.22
anisotropic  99.96 ± 0.06  91.90 ± 2.30  91.28 ± 4.89  86.58 ± 2.99  98.25 ± 0.71
  100.00 ± 0.00  98.80 ± 1.58  96.53 ± 1.55  93.22 ± 3.77  99.87 ± 0.31
  100.00 ± 0.00  99.40 ± 0.80  95.53 ± 2.93  93.00 ± 2.71  99.39 ± 0.70
Table 2: Hits@1 (%) with standard deviations on the WILLOW-ObjectClass dataset.

We follow the experimental setup of Wang et al. (2019b) and train our models using negative log-likelihood due to its superior performance in contrast to the displacement loss used in Zanfir & Sminchisescu (2018). We evaluate our complete architecture using isotropic and anisotropic GNNs for different numbers of refinement iterations $L$, and include ablation results obtained from using $\Psi_{\theta_1} = \text{MLP}$ for the local node matching procedure. Results of Hits@1 are shown in Tables 1 and 2 for PascalVOC and WILLOW-ObjectClass, respectively. We visualize qualitative results of our method in Appendix I.

We observe that our refinement strategy is able to significantly outperform competing methods as well as our non-refined baselines. On the WILLOW-ObjectClass dataset, our refinement stage at least reduces the error of the initial model ($L = 0$) by half across all categories. The benefits of the second stage are even more crucial when starting from a weaker initial feature matching baseline ($\Psi_{\theta_1} = \text{MLP}$), with overall improvements of up to percentage points on PascalVOC. However, good initial matchings do help our consensus stage to improve its performance further, as indicated by the usage of task-specific isotropic or anisotropic GNNs for $\Psi_{\theta_1}$.

4.3 Supervised Geometric Keypoint Matching

We also verify our approach by tackling the geometric feature matching problem, where we only make use of point coordinates and no additional visual features are available. Here, we follow the experimental training setup of Zhang & Lee (2019), and test the generalization capabilities of our model on the PascalPF dataset (Ham et al., 2016). For training, we generate a synthetic set of graph pairs: We first randomly sample 30–60 source points uniformly from , and add Gaussian noise from to these points to obtain the target points. Furthermore, we add 0–20 outliers from to each point cloud. Finally, we construct graphs by connecting each node with its $k$-nearest neighbors (). We train our unmodified anisotropic keypoint architecture from Section 4.2 with input until it has seen synthetic examples.


We evaluate our trained model on the PascalPF dataset (Ham et al., 2016) which consists of image pairs within 20 classes, with the number of keypoints ranging from 4 to 17. Results of Hits@1 are shown in Table 3. Overall, our consensus architecture improves upon the state-of-the-art results of Zhang & Lee (2019) on almost all categories, while our baseline is weaker than the results reported in Zhang & Lee (2019), showing the benefits of applying our consensus stage. In addition, it shows that our method also works well when not taking any visual information into account.

Method Aero Bike Bird Boat Bottle Bus Car Cat Chair Cow Table Dog Horse M-Bike Person Plant Sheep Sofa Train TV Mean
(Zhang & Lee, 2019) 76.1 89.8 93.4 96.4 96.2 97.1 94.6 82.8 89.3 96.7 89.7 79.5 82.6 83.5 72.8 76.7 77.1 97.3 98.2 99.5 88.5
Ours 69.2 87.7 77.3 90.4 98.7 98.3 92.5 91.6 94.7 79.4 95.8 90.1 80.0 79.5 72.5 98.0 76.5 89.6 93.4 97.8 87.6
81.3 92.2 94.2 98.8 99.3 99.1 98.6 98.2 99.6 94.1 100.0 99.4 86.6 86.6 88.7 100.0 100.0 100.0 100.0 99.3 95.8
81.1 92.0 94.7 100.0 99.3 99.3 98.9 97.3 99.4 93.4 100.0 99.1 86.3 86.2 87.7 100.0 100.0 100.0 100.0 99.3 95.7
Table 3: Hits@1 (%) on the PascalPF dataset using a synthetic training setup.

4.4 Semi-supervised Cross-lingual Knowledge Graph Alignment

We evaluate our model on the DBP15K datasets (Sun et al., 2017) which link entities of the Chinese, Japanese and French knowledge graphs of DBpedia into the English version and vice versa. Each dataset contains exactly links between equivalent entities, and we split those links into training and testing following upon previous works. For obtaining entity input features, we follow the experimental setup of Xu et al. (2019d): We retrieve monolingual fastText embeddings (Bojanowski et al., 2017) for each language separately, and align those into the same vector space afterwards (Lample et al., 2018). We use the sum of word embeddings as the final entity input representation (although more sophisticated approaches are just as conceivable).

Architecture and parameters.

Our graph neural network operator mostly matches the one proposed in Xu et al. (2019d) where the direction of edges is retained, but not their specific relation type:


We use followed by dropout with probability as our non-linearity , and obtain final node representations via . We use a three-layer GNN () both for obtaining initial similarities and for refining alignments with dimensionality and , respectively. Training is performed using negative log likelihood in a semi-supervised fashion: For each training node in , we train sparsely by using the corresponding ground-truth node in , the top entries in and randomly sampled entities in . For the refinement phase, we update the sparse top correspondence matrix times. For efficiency reasons, we train and sequentially for epochs each.
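The sparse training objective described above can be illustrated as follows. This is a minimal sketch under stated assumptions: the helper names `sparse_candidates` and `sparse_nll` are hypothetical, negatives are drawn uniformly from the columns outside the top candidates, and the loss is a softmax negative log likelihood restricted to the candidate set.

```python
import math
import random
from typing import List, Sequence


def sparse_candidates(row: Sequence[float], target: int, k: int,
                      num_negatives: int, rng: random.Random) -> List[int]:
    """Top-k columns, plus the ground truth, plus random negative columns."""
    top_k = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
    candidates = set(top_k) | {target}
    remaining = [j for j in range(len(row)) if j not in candidates]
    candidates |= set(rng.sample(remaining, min(num_negatives, len(remaining))))
    return sorted(candidates)


def sparse_nll(row: Sequence[float], target: int, candidates: Sequence[int]) -> float:
    """Negative log likelihood of the target under a softmax over the candidates."""
    peak = max(row[j] for j in candidates)  # subtract the maximum for stability
    log_norm = peak + math.log(sum(math.exp(row[j] - peak) for j in candidates))
    return log_norm - row[target]


rng = random.Random(0)
row = [2.0, 0.5, 1.0, -1.0, 0.0, 3.0]  # hypothetical similarity scores of one source node
candidates = sparse_candidates(row, target=2, k=2, num_negatives=1, rng=rng)
print(candidates, sparse_nll(row, 2, candidates))
```

Restricting the softmax to a small candidate set keeps the cost per training node independent of the size of the target graph, which is the point of training sparsely.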

@1 @10 @1 @10 @1 @10 @1 @10 @1 @10 @1 @10
GCN (Wang et al., 2018) 41.25 74.38 36.49 69.94 39.91 74.46 38.42 71.81 37.29 74.49 36.77 73.06
BootEA (Sun et al., 2018) 62.94 84.75 62.23 85.39 65.30 87.44
MuGNN (Cao et al., 2019) 49.40 84.40 50.10 85.70 49.60 87.00
NAEA (Zhu et al., 2019) 65.01 86.73 64.14 87.27 67.32 89.43
RDGCN (Wu et al., 2019) 70.75 84.55 76.74 89.54 88.64 95.72
GMNN (Xu et al., 2019d) 67.93 78.48 65.28 79.64 73.97 87.15 71.29 84.63 89.38 95.25 88.18 94.75
58.53 78.04 54.99 74.25 59.18 79.16 55.40 75.53 76.07 91.54 74.89 90.57
Ours (sparse) 67.59 87.47 64.38 83.56 71.95 89.74 68.88 86.84 83.36 96.03 82.16 95.28
80.12 87.47 76.77 83.56 84.80 89.74 81.09 86.84 93.34 96.03 91.95 95.28
Table 4: Hits@1 (%) and Hits@10 (%) on the DBP15K dataset.

We report Hits@1 and Hits@10 to evaluate and compare our model to previous lines of work, see Table 4. In addition, we report results of a simple three-layer which matches nodes purely based on initial word embeddings, and a variant of our model without the refinement of initial correspondences (). Our approach improves upon the state-of-the-art on all categories with gains of up to percentage points. In addition, our refinement strategy consistently improves upon the Hits@1 of initial correspondences by a significant margin, while results of Hits@10 are shared due to the refinement operating only on sparsified top initial correspondences. Due to the scalability of our approach, we can easily apply a multitude of refinement iterations while still retaining large hidden feature dimensionalities.
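For reference, Hits@k counts a source node as correct when its ground-truth target is among the k highest-scoring candidates. The following plain-Python sketch assumes a dense score matrix given as nested lists; the function name `hits_at_k` and the toy scores are illustrative only.

```python
from typing import Sequence


def hits_at_k(scores: Sequence[Sequence[float]], targets: Sequence[int], k: int) -> float:
    """Fraction of rows whose ground-truth column is among the k largest scores."""
    hits = 0
    for row, target in zip(scores, targets):
        # Indices of the k highest-scoring candidates for this source node.
        top_k = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        hits += target in top_k
    return hits / len(targets)


# Toy example: 3 source nodes, 4 candidate target nodes (hypothetical scores).
scores = [
    [0.1, 0.7, 0.1, 0.1],  # ground truth 1 is ranked first
    [0.4, 0.3, 0.2, 0.1],  # ground truth 1 is ranked second
    [0.1, 0.2, 0.3, 0.4],  # ground truth 0 is ranked last
]
targets = [1, 1, 0]
print(hits_at_k(scores, targets, k=1))
print(hits_at_k(scores, targets, k=2))
```

Under this definition, re-ranking scores inside a fixed candidate set of size k cannot change Hits@k for that k, which is consistent with the shared Hits@10 values reported above.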

5 Limitations

Our experimental results demonstrate that the proposed approach effectively solves challenging real-world problems. However, the expressive power of GNNs is closely related to the WL heuristic for graph isomorphism testing (Xu et al., 2019c; Morris et al., 2019), whose power and limitations are well understood (Arvind et al., 2015). Our method generally inherits these limitations. Hence, one possible limitation is that whenever two nodes are assigned the same color by WL, our approach may fail to converge to one of the possible solutions. For example, there may exist two nodes with equal neighborhood sets . One can easily see that the feature matching procedure generates equal initial correspondence distributions , resulting in the same mapped node indicator functions from to nodes and , respectively. Since both nodes share the same neighborhood, also produces the same distributed functions . As a result, both column vectors and receive the same update, leading to non-convergence. In theory, one might resolve these ambiguities by adding a small amount of noise to . However, the general amount of feature noise present in real-world datasets already ensures that this scenario is unlikely to occur.
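The ambiguity described above can be reproduced on a toy graph. The sketch below uses a simple hand-written aggregation (own feature plus the mean of neighbor features) as a stand-in for a GNN layer, and perturbs the input features with Gaussian noise; both choices are assumptions made for illustration and not the architecture used in the experiments.

```python
import random
from typing import Dict, List


def message_passing(x: List[float], neighbors: Dict[int, List[int]]) -> List[float]:
    """One round of a toy GNN layer: own feature plus the mean of neighbor features."""
    return [x[i] + sum(x[j] for j in neighbors[i]) / len(neighbors[i]) for i in range(len(x))]


# Nodes 0 and 1 have identical features and the same neighborhood {2, 3}.
neighbors = {0: [2, 3], 1: [2, 3], 2: [0, 1], 3: [0, 1]}
x = [1.0, 1.0, 0.5, 2.0]

h = message_passing(message_passing(x, neighbors), neighbors)
print(h[0] == h[1])  # the two nodes remain indistinguishable

# Assumption of this sketch: small Gaussian noise on the inputs breaks the tie.
rng = random.Random(0)
x_noisy = [value + rng.gauss(0.0, 1e-3) for value in x]
h_noisy = message_passing(message_passing(x_noisy, neighbors), neighbors)
print(h_noisy[0] != h_noisy[1])
```

Any number of further rounds leaves nodes 0 and 1 identical in the noise-free case, since both receive the same messages in every round.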

6 Related Work

Identifying correspondences between the nodes of two graphs has been studied in various domains, and an extensive body of literature exists. Closely related problems are summarized under the terms maximum common subgraph (Kriege et al., 2019b), network alignment (Zhang, 2016), graph edit distance (Chen et al., 2019) and graph matching (Yan et al., 2016). We refer the reader to Appendix F for a detailed discussion of the related work on these problems. Recently, graph neural networks have become a focus of research, leading to various proposed deep graph matching techniques (Wang et al., 2019b; Zhang & Lee, 2019; Xu et al., 2019d; Derr et al., 2019). In Appendix G, we present a detailed overview of the related work in this field and highlight individual differences from and similarities to our proposed graph matching consensus procedure.

7 Conclusion

We presented a two-stage neural architecture for learning node correspondences between graphs in a supervised or semi-supervised fashion. Our approach aims to reach a neighborhood consensus between matchings, and can resolve violations of this criterion in an iterative fashion. In addition, we proposed enhancements that let our algorithm scale to large input domains. We evaluated our architecture on real-world datasets, on which it consistently improved upon the state-of-the-art.


Acknowledgments

This work has been supported by the German Research Association (DFG) within the Collaborative Research Center SFB 876 Providing Information by Resource-Constrained Analysis, projects A6 and B2.


References

  • Adams & Zemel (2011) R. P. Adams and R. S. Zemel. Ranking via Sinkhorn propagation. CoRR, abs/1106.1925, 2011.
  • Aflalo et al. (2015) Y. Aflalo, A. Bronstein, and R. Kimmel. On convex relaxation of graph isomorphism. Proceedings of the National Academy of Sciences, 112(10), 2015.
  • Anstreicher (2003) K. Anstreicher. Recent advances in the solution of quadratic assignment problems. Mathematical Programming, 97, 2003.
  • Arvind et al. (2015) V. Arvind, J. Köbler, G. Rattan, and O. Verbitsky. On the power of color refinement. In Fundamentals of Computation Theory, 2015.
  • Bai et al. (2018) Y. Bai, H. Ding, Y. Sun, and W. Wang. Convolutional set matching for graph similarity. In NeurIPS-W, 2018.
  • Bai et al. (2019) Y. Bai, H. Ding, S. Bian, T. Chen, Y. Sun, and W. Wang. SimGNN: A neural network approach to fast graph similarity computation. In WSDM, 2019.
  • Battaglia et al. (2018) P. W. Battaglia, J. B. Hamrick, V. Bapst, A. Sanchez-Gonzalez, V. F. Zambaldi, M. Malinowski, A. Tacchetti, D. Raposo, A. Santoro, R. Faulkner, Ç. Gülçehre, F. Song, A. J. Ballard, J. Gilmer, G. E. Dahl, A. Vaswani, K. Allen, C. Nash, V. Langston, C. Dyer, N. Heess, D. Wierstra, P. Kohli, M. Botvinick, O. Vinyals, Y. Li, and R. Pascanu. Relational inductive biases, deep learning, and graph networks. CoRR, abs/1806.01261, 2018.
  • Bayati et al. (2013) M. Bayati, D. F. Gleich, A. Saberi, and Y. Wang. Message-passing algorithms for sparse network alignment. ACM Transactions on Knowledge Discovery from Data, 7(1), 2013.
  • Bento & Ioannidis (2018) J. Bento and S. Ioannidis. A family of tractable graph distances. In SDM, 2018.
  • Bojanowski et al. (2017) P. Bojanowski, E. Grave, A. Joulin, and T. Mikolov. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5, 2017.
  • Bougleux et al. (2017) S. Bougleux, L. Brun, V. Carletti, P. Foggia, B. Gaüzère, and M. Vento. Graph edit distance as a quadratic assignment problem. Pattern Recognition Letters, 87, 2017.
  • Bourdev & Malik (2009) L. Bourdev and J. Malik. Poselets: Body part detectors trained using 3D human pose annotations. In ICCV, 2009.
  • Bronstein et al. (2017) M. M. Bronstein, J. Bruna, Y. LeCun, A. Szlam, and P. Vandergheynst. Geometric deep learning: Going beyond euclidean data. In Signal Processing Magazine, 2017.
  • Bunke (1997) H. Bunke. On a relation between graph edit distance and maximum common subgraph. Pattern Recognition Letters, 18(8), 1997.
  • Bunke & Shearer (1998) H. Bunke and K. Shearer. A graph distance metric based on the maximal common subgraph. Pattern Recognition Letters, 19(4), 1998.
  • Caetano et al. (2009) T. S. Caetano, J. J. McAuley, L. Cheng, Q. V. Le, and A. J. Smola. Learning graph matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(6), 2009.
  • Cao et al. (2019) Y. Cao, Z. Liu, C. Li, Z. Liu, J. Li, and T. Chua. Multi-channel graph neural network for entity alignment. In ACL, 2019.
  • Charlier et al. (2019) B. Charlier, J. Feydy, and Glaunès. KeOps, 2019.
  • Chen et al. (2019) X. Chen, H. Huo, J. Huan, and J. S. Vitter. An efficient algorithm for graph edit distance computation. Knowledge-Based Systems, 163, 2019.
  • Cho et al. (2013) M. Cho, K. Alahari, and J. Ponce. Learning graphs to match. In ICCV, 2013.
  • Choy et al. (2016) C. B. Choy, J. Gwak, S. Savarese, and M. Chandraker. Universal correspondence network. In NIPS, 2016.
  • Conte et al. (2004) D. Conte, P. Foggia, C. Sansone, and M. Vento. Thirty years of graph matching in pattern recognition. International Journal of Pattern Recognition and Artificial Intelligence, 18, 2004.
  • Cortés et al. (2019) X. Cortés, D. Conte, and H. Cardot. Learning edit cost estimation models for graph edit distance. Pattern Recognition Letters, 125, 2019.
  • Cour et al. (2006) T. Cour, P. Srinivasan, and J. Shi. Balanced graph matching. In NIPS, 2006.
  • DeGroot & Schervish (2012) M. H. DeGroot and M. J. Schervish. Probability and Statistics. Addison-Wesley, 2012.
  • Deng et al. (2009) J. Deng, W. Dong, R. Socher, L. J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, 2009.
  • Derr et al. (2019) T. Derr, H. Karimi, X. Liu, J. Xu, and J. Tang. Deep adversarial network alignment. CoRR, abs/1902.10307, 2019.
  • Egozi et al. (2013) A. Egozi, Y. Keller, and H. Guterman. A probabilistic approach to spectral graph matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1), 2013.
  • Erdős & Rényi (1959) P. Erdős and A. Rényi. On random graphs I. Publicationes Mathematicae Debrecen, 6, 1959.
  • Everingham et al. (2010) M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The Pascal visual object classes (VOC) challenge. In IJCV, 2010.
  • Fey & Lenssen (2019) M. Fey and J. E. Lenssen. Fast graph representation learning with PyTorch Geometric. In ICLR-W, 2019.
  • Fey et al. (2018) M. Fey, J. E. Lenssen, F. Weichert, and H. Müller. SplineCNN: Fast geometric deep learning with continuous B-spline kernels. In CVPR, 2018.
  • Garey & Johnson (1979) M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman, 1979.
  • Gilmer et al. (2017) J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl. Neural message passing for quantum chemistry. In ICML, 2017.
  • Glorot et al. (2011) X. Glorot, A. Bordes, and Y. Bengio. Deep sparse rectifier neural networks. In AISTATS, 2011.
  • Gold & Rangarajan (1996) S. Gold and A. Rangarajan. A graduated assignment algorithm for graph matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(4), 1996.
  • Gori et al. (2005) M. Gori, M. Maggini, and L. Sarti. Exact and approximate graph matching using random walks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(7), 2005.
  • Gouda & Hassaan (2016) K. Gouda and M. Hassaan. CSI_GED: An efficient approach for graph edit similarity computation. In ICDE, 2016.
  • Goyal & Ferrara (2018) P. Goyal and E. Ferrara. Graph embedding techniques, applications, and performance: A survey. Knowledge-Based Systems, 151, 2018.
  • Grohe et al. (2018) M. Grohe, G. Rattan, and G. J. Woeginger. Graph similarity and approximate isomorphism. In Mathematical Foundations of Computer Science, 2018.
  • Grover & Leskovec (2016) A. Grover and J. Leskovec. Node2Vec: Scalable feature learning for networks. In SIGKDD, 2016.
  • Halimi et al. (2019) O. Halimi, O. Litany, E. Rodolà, A. M. Bronstein, and R. Kimmel. Self-supervised learning of dense shape correspondence. In CVPR, 2019.
  • Ham et al. (2016) B. Ham, M. Cho, C. Schmid, and J. Ponce. Proposal flow. In CVPR, 2016.
  • Hamilton et al. (2017) W. L. Hamilton, R. Ying, and J. Leskovec. Representation learning on graphs: Methods and applications. IEEE Data Engineering Bulletin, 40(3), 2017.
  • Heimann et al. (2018) M. Heimann, H. Shen, T. Safavi, and D. Koutra. REGAL: Representation learning-based graph alignment. In CIKM, 2018.
  • Ioffe & Szegedy (2015) S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML, 2015.
  • Jaggi (2013) M. Jaggi. Revisiting Frank-Wolfe: Projection-free sparse convex optimization. In ICML, 2013.
  • Kann (1992) V. Kann. On the approximability of the maximum common subgraph problem. In STACS, 1992.
  • Kersting et al. (2014) K. Kersting, M. Mladenov, R. Garnett, and M. Grohe. Power iterated color refinement. In AAAI, 2014.
  • Kingma & Ba (2015) D. P. Kingma and J. L. Ba. Adam: A method for stochastic optimization. In ICLR, 2015.
  • Kipf & Welling (2017) T. N. Kipf and M. Welling. Semi-supervised classification with graph convolutional networks. In ICLR, 2017.
  • Klau (2009) G. W. Klau. A new graph-based method for pairwise global network alignment. BMC Bioinformatics, 10, 2009.
  • Kollias et al. (2012) G. Kollias, S. Mohammadi, and A. Grama. Network similarity decomposition (NSD): A fast and scalable approach to network alignment. IEEE Transactions on Knowledge and Data Engineering, 24(12), 2012.
  • Kriege et al. (2019a) N. M. Kriege, P. L. Giscard, F. Bause, and R. C. Wilson. Computing optimal assignments in linear time for approximate graph matching. In ICDM, 2019a.
  • Kriege et al. (2019b) N. M. Kriege, L. Humbeck, and O. Koch. Chemical similarity and substructure searches. In Encyclopedia of Bioinformatics and Computational Biology. Academic Press, 2019b.
  • Lample et al. (2018) G. Lample, A. Conneau, M. Ranzato, L. Denoyer, and H. Jégou. Word translation without parallel data. In ICLR, 2018.
  • Leordeanu & Hebert (2005) M. Leordeanu and M. Hebert. A spectral technique for correspondence problems using pairwise constraints. In ICCV, 2005.
  • Leordeanu et al. (2009) M. Leordeanu, M. Hebert, and R. Sukthankar. An integer projected fixed point method for graph matching and MAP inference. In NIPS, 2009.
  • Lerouge et al. (2017) J. Lerouge, Z. Abu-Aisheh, R. Raveaux, P. Héroux, and S. Adam. New binary linear programming formulation to compute the graph edit distance. Pattern Recognition, 72, 2017.
  • Li et al. (2019) Y. Li, C. Gu, T. Dullien, O. Vinyals, and P. Kohli. Graph matching networks for learning the similarity of graph structured objects. In ICML, 2019.
  • Litany et al. (2017) O. Litany, T. Remez, E. Rodolà, A. M. Bronstein, and M. M. Bronstein. Deep functional maps: Structured prediction for dense shape correspondence. In ICCV, 2017.
  • Lyzinski et al. (2016) V. Lyzinski, D. E. Fishkind, M. Fiori, J. T. Vogelstein, C. E. Priebe, and G. Sapiro. Graph matching: Relax at your own risk. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(1), 2016.
  • Matula (1978) D. W. Matula. Subtree isomorphism in . In Algorithmic Aspects of Combinatorics, volume 2. Elsevier, 1978.
  • Morris et al. (2019) C. Morris, M. Ritzert, M. Fey, W. L. Hamilton, J. E. Lenssen, G. Rattan, and M. Grohe. Weisfeiler and Leman go neural: Higher-order graph neural networks. In AAAI, 2019.
  • Murphy et al. (2019) R. L. Murphy, B. Srinivasan, V. Rao, and B. Ribeiro. Relational pooling for graph representations. In ICML, 2019.
  • Ovsjanikov et al. (2012) M. Ovsjanikov, M. Ben-Chen, J. Solomon, A. Butscher, and L. J. Guibas. Functional maps: A flexible representation of maps between shapes. ACM Transactions on Graphics, 31(4), 2012.
  • Page et al. (1999) L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the web. Technical report, Stanford InfoLab, 1999.
  • Paszke et al. (2017) A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer. Automatic differentiation in PyTorch. In NIPS-W, 2017.
  • Peyré et al. (2016) G. Peyré, M. Cuturi, and J. Solomon. Gromov-Wasserstein averaging of kernel and distance matrices. In ICML, 2016.
  • Riesen & Bunke (2009) K. Riesen and H. Bunke. Approximate graph edit distance computation by means of bipartite graph matching. Image and Vision Computing, 27(7), 2009.
  • Riesen et al. (2015a) K. Riesen, M. Ferrer, R. Dornberger, and H. Bunke. Greedy graph edit distance. In Machine Learning and Data Mining in Pattern Recognition, 2015a.
  • Riesen et al. (2015b) K. Riesen, M. Ferrer, A. Fischer, and H. Bunke. Approximation of graph edit distance in quadratic time. In Graph-Based Representations in Pattern Recognition, 2015b.
  • Rocco et al. (2018) I. Rocco, M. Cimpoi, R. Arandjelović, A. Torii, T. Pajdla, and J. Sivic. Neighbourhood consensus networks. In NeurIPS, 2018.
  • Rodolà et al. (2017) E. Rodolà, L. Cosmo, M. M. Bronstein, A. Torsello, and D. Cremers. Partial functional correspondence. Computer Graphics Forum, 36(1), 2017.
  • Sanfeliu & Fu (1983) A. Sanfeliu and K. S. Fu. A distance measure between attributed relational graphs for pattern recognition. IEEE Transactions on Systems, Man, and Cybernetics, 13(3), 1983.
  • Sattler et al. (2009) T. Sattler, B. Leibe, and L. Kobbelt. SCRAMSAC: Improving RANSAC’s efficiency with a spatial consistency filter. In ICCV, 2009.
  • Schlichtkrull et al. (2018) M. S. Schlichtkrull, T. N. Kipf, P. Bloem, R. van den Berg, I. Titov, and M. Welling. Modeling relational data with graph convolutional networks. In ESWC, 2018.
  • Schmid & Mohr (1997) C. Schmid and R. Mohr. Local grayvalue invariants for image retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(5), 1997.
  • Sharan & Ideker (2006) R. Sharan and T. Ideker. Modeling cellular machinery through biological network comparison. Nature Biotechnology, 24(4), 2006.
  • Simonyan & Zisserman (2014) K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2014.
  • Singh et al. (2008) R. Singh, J. Xu, and B. Berger. Global alignment of multiple protein interaction networks with application to functional orthology detection. In National Academy of Sciences, 2008.
  • Sinkhorn & Knopp (1967) R. Sinkhorn and P. Knopp. Concerning nonnegative matrices and doubly stochastic matrices. Pacific Journal of Mathematics, 21(2), 1967.
  • Sivic & Zisserman (2003) J. Sivic and A. Zisserman. Video Google: A text retrieval approach to object matching in videos. In ICCV, 2003.
  • Srivastava et al. (2014) N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15, 2014.
  • Stauffer et al. (2017) M. Stauffer, T. Tschachtli, A. Fischer, and K. Riesen. A survey on applications of bipartite graph edit distance. In Graph-Based Representations in Pattern Recognition, 2017.
  • Sun et al. (2017) Z. Sun, W. Hu, and C. Li. Cross-lingual entity alignment via joint attribute-preserving embedding. In ISWC, 2017.
  • Sun et al. (2018) Z. Sun, W. Hu, Q. Zhang, and Y. Qu. Bootstrapping entity alignment with knowledge graph embedding. In IJCAI, 2018.
  • Swoboda et al. (2017) P. Swoboda, C. Rother, H. A. Ahljaija, D. Kainmueller, and B. Savchynskyy. A study of lagrangean decompositions and dual ascent solvers for graph matching. In CVPR, 2017.
  • Tinhofer (1991) G. Tinhofer. A note on compact graphs. Discrete Applied Mathematics, 30(2), 1991.
  • Umeyama (1988) S. Umeyama. An eigendecomposition approach to weighted graph matching problems. IEEE Transactions on Pattern Analysis and Machine Intelligence, 10(5), 1988.
  • Veličković et al. (2018) P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio. Graph attention networks. In ICLR, 2018.
  • Vento & Foggia (2012) M. Vento and P. Foggia. Graph matching techniques for computer vision. Graph-Based Methods in Computer Vision: Developments and Applications, 1, 2012.
  • Wang et al. (2019a) F. Wang, N. Xue, Y. Zhang, G. Xia, and M. Pelillo. A functional representation for graph matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019a.
  • Wang et al. (2019b) R. Wang, J. Yan, and X. Yang. Learning combinatorial embedding networks for deep graph matching. In ICCV, 2019b.
  • Wang & Solomon (2019) Y. Wang and J. M. Solomon. Deep closest point: Learning representations for point cloud registration. In ICCV, 2019.
  • Wang et al. (2018) Z. Wang, Q. Lv, X. Lan, and Y. Zhang. Cross-lingual knowledge graph alignment via graph convolutional networks. In EMNLP, 2018.
  • Weisfeiler & Lehman (1968) B. Weisfeiler and A. A. Lehman. A reduction of a graph to a canonical form and an algebra arising during this reduction. Nauchno-Technicheskaya Informatsia, 2(9), 1968.
  • Wu et al. (2019) Y. Wu, X. Liu, Y. Feng, Z. Wang, R. Yan, and D. Zhao. Relation-aware entity alignment for heterogeneous knowledge graphs. In IJCAI, 2019.
  • Xu et al. (2019a) H. Xu, D. Luo, and L. Carin. Scalable Gromov-Wasserstein learning for graph partitioning and matching. CoRR, abs/1905.07645, 2019a.
  • Xu et al. (2019b) H. Xu, D. Luo, H. Zha, and L. Carin. Gromov-Wasserstein learning for graph matching and node embedding. In ICML, 2019b.
  • Xu et al. (2018) K. Xu, C. Li, Y. Tian, T. Sonobe, K. Kawarabayashi, and S. Jegelka. Representation learning on graphs with jumping knowledge networks. In ICML, 2018.
  • Xu et al. (2019c) K. Xu, W. Hu, J. Leskovec, and S. Jegelka. How powerful are graph neural networks? In ICLR, 2019c.
  • Xu et al. (2019d) K. Xu, L. Wang, M. Yu, Y. Feng, Y. Song, Z. Wang, and D. Yu. Cross-lingual knowledge graph alignment via graph matching neural network. In ACL, 2019d.
  • Yan et al. (2016) J. Yan, X. C. Yin, W. Lin, C. Deng, H. Zha, and X. Yang. A short survey of recent advances in graph matching. In ICMR, 2016.
  • Zanfir & Sminchisescu (2018) A. Zanfir and C. Sminchisescu. Deep learning of graph matching. In CVPR, 2018.
  • Zaslavskiy et al. (2009) M. Zaslavskiy, F. Bach, and J. P. Vert. A path following algorithm for the graph matching problem. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(12), 2009.
  • Zhang (2016) S. Zhang and H. Tong. FINAL: Fast attributed network alignment. In SIGKDD, 2016.
  • Zhang & Philip (2015) J. Zhang and S. Y. Philip. Multiple anonymized social networks alignment. In ICDM, 2015.
  • Zhang et al. (2019a) W. Zhang, K. Shu, H. Liu, and Y. Wang. Graph neural networks for user identity linkage. CoRR, abs/1903.02174, 2019a.
  • Zhang et al. (2019b) Y. Zhang, A. Prügel-Bennett, and J. Hare. Learning representations of sets through optimized permutations. In ICLR, 2019b.
  • Zhang & Lee (2019) Z. Zhang and W. S. Lee. Deep graphical feature learning for the feature matching problem. In ICCV, 2019.
  • Zhang et al. (2019c) Z. Zhang, Y. Xiang, L. Wu, B. Xue, and A. Nehorai. KerGM: Kernelized graph matching. In NeurIPS, 2019c.
  • Zhou & De la Torre (2016) F. Zhou and F. De la Torre. Factorized graph matching. In CVPR, 2016.
  • Zhu et al. (2017) J. Y. Zhu, T. Park, P. Isola, and A. A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In ICCV, 2017.
  • Zhu et al. (2019) Q. Zhu, X. Zhou, J. Wu, J. Tan, and L. Guo. Neighborhood-aware attentional representation for multilingual knowledge graphs. In IJCAI, 2019.

Appendix A Optimized Graph Matching Consensus Algorithm

Our final optimized algorithm is given in Algorithm 1:

Input: , , hidden node dimensionality , sparsity parameter , number of consensus iterations , number of random functions
Output: Sparse soft correspondence matrix with non-zero entries
Compute node embeddings
Compute node embeddings
Local feature matching
Sparsify to top candidates
for  in  do
      Normalize scores
      Sample random node function
      Map random node functions from to
      Distribute function on
      Distribute function on
      Compute neighborhood consensus measure
      Perform trainable correspondence update
end for
Normalize scores
Algorithm 1 Optimized graph matching consensus algorithm
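The control flow of Algorithm 1 can be sketched in plain Python as follows. This is a simplified illustration and not the released implementation: node embeddings are passed in precomputed, `propagate` is a hand-written stand-in for the consensus GNN, and the trainable correspondence update is replaced by a fixed penalty on the squared consensus difference. All function and parameter names are hypothetical.

```python
import math
import random
from typing import Dict, List

Matrix = List[List[float]]
Graph = Dict[int, List[int]]


def propagate(x: Matrix, neighbors: Graph) -> Matrix:
    """Toy stand-in for the consensus GNN: own features plus mean of neighbor features."""
    out = []
    for i, row in enumerate(x):
        nbrs = neighbors[i]
        out.append([row[c] + sum(x[j][c] for j in nbrs) / max(len(nbrs), 1)
                    for c in range(len(row))])
    return out


def softmax(scores: Dict[int, float]) -> Dict[int, float]:
    """Normalize the scores of one source node over its candidate targets."""
    peak = max(scores.values())
    exp = {j: math.exp(s - peak) for j, s in scores.items()}
    total = sum(exp.values())
    return {j: e / total for j, e in exp.items()}


def consensus_matching(h_s: Matrix, h_t: Matrix, nbr_s: Graph, nbr_t: Graph,
                       k: int, steps: int, num_functions: int,
                       rng: random.Random) -> List[Dict[int, float]]:
    # Local feature matching: inner products between node embeddings.
    dense = [[sum(a * b for a, b in zip(u, v)) for v in h_t] for u in h_s]
    # Sparsify: keep only the top-k candidate targets per source node.
    scores = [{j: row[j] for j in sorted(range(len(row)), key=row.__getitem__,
                                         reverse=True)[:k]} for row in dense]
    for _ in range(steps):
        probs = [softmax(s) for s in scores]
        # Sample random node functions on the source graph.
        r_s = [[rng.gauss(0.0, 1.0) for _ in range(num_functions)] for _ in h_s]
        # Map them to the target graph through the soft correspondences.
        r_t = [[0.0] * num_functions for _ in h_t]
        for i, p in enumerate(probs):
            for j, w in p.items():
                for c in range(num_functions):
                    r_t[j][c] += w * r_s[i][c]
        # Distribute the functions in both graphs.
        o_s, o_t = propagate(r_s, nbr_s), propagate(r_t, nbr_t)
        # Consensus measure and update (fixed penalty instead of a trained network).
        for i, s in enumerate(scores):
            for j in s:
                d2 = sum((a - b) ** 2 for a, b in zip(o_s[i], o_t[j]))
                s[j] -= d2 / num_functions
    return [softmax(s) for s in scores]


# Two copies of the path graph 0 - 1 - 2 with well-separated toy embeddings.
path = {0: [1], 1: [0, 2], 2: [1]}
h = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]
result = consensus_matching(h, h, path, path, k=2, steps=3, num_functions=4,
                            rng=random.Random(0))
print([max(p, key=p.get) for p in result])
```

Storing only the top candidates per source node keeps memory proportional to the number of source nodes times k, which is the purpose of the sparsification step.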

Appendix B Proof for Theorem 1


Since is permutation equivariant, it holds for any node feature matrix that . With and , it follows that

Hence, it shows that for any node , resulting in . ∎
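The permutation equivariance that the proof relies on can be checked numerically on a toy example. The sketch below uses a hand-written sum-aggregation layer as an assumed stand-in for a GNN; `layer`, `relabel`, and the example graph are illustrative only.

```python
from typing import Dict, List, Tuple


def layer(x: List[float], neighbors: Dict[int, List[int]]) -> List[float]:
    """Toy message passing layer: own feature plus the sum of neighbor features."""
    return [x[i] + sum(x[j] for j in neighbors[i]) for i in range(len(x))]


def relabel(x: List[float], neighbors: Dict[int, List[int]],
            perm: List[int]) -> Tuple[List[float], Dict[int, List[int]]]:
    """Rename node i to perm[i] in both the features and the neighbor lists."""
    x_new = [0.0] * len(x)
    nbrs_new = {}
    for i, p in enumerate(perm):
        x_new[p] = x[i]
        nbrs_new[p] = [perm[j] for j in neighbors[i]]
    return x_new, nbrs_new


neighbors = {0: [1], 1: [0, 2], 2: [1]}
x = [1.0, 2.0, 4.0]
perm = [2, 0, 1]  # node 0 -> 2, node 1 -> 0, node 2 -> 1

out = layer(x, neighbors)
x_p, neighbors_p = relabel(x, neighbors, perm)
out_p = layer(x_p, neighbors_p)

# Equivariance: permuting the input permutes the output in the same way.
print(all(out_p[perm[i]] == out[i] for i in range(len(x))))
```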

Appendix C Proof for Theorem 2


Let be . Then, the -layered GNN maps both -hop neighborhoods around nodes and to the same vectorial representation: