This work presents a two-stage neural architecture for learning and refining structural correspondences between graphs. First, we use localized node embeddings computed by a graph neural network to obtain an initial ranking of soft correspondences between nodes. Second, we employ synchronous message passing networks to iteratively re-rank the soft correspondences to reach a matching consensus in local neighborhoods between graphs. We show, theoretically and empirically, that our message passing scheme computes a well-founded measure of consensus for corresponding neighborhoods, which is then used to guide the iterative re-ranking process. Our purely local and sparsity-aware architecture scales well to large, real-world inputs while still being able to recover global correspondences consistently. We demonstrate the practical effectiveness of our method on real-world tasks from the fields of computer vision and entity alignment between knowledge graphs, on which we improve upon the current state-of-the-art. Our source code is available under https://github.com/rusty1s/deep-graph-matching-consensus.
Graph matching refers to the problem of establishing meaningful structural correspondences of nodes between two or more graphs by taking both node similarities and pairwise edge similarities into account (Wang et al., 2019b). Since graphs are natural representations for encoding relational data, the problem of graph matching lies at the heart of many real-world applications. For example, comparing molecules in cheminformatics (Kriege et al., 2019b), matching protein networks in bioinformatics (Sharan & Ideker, 2006; Singh et al., 2008), linking user accounts in social network analysis (Zhang & Philip, 2015), and tracking objects, matching 2D/3D shapes or recognizing actions in computer vision (Vento & Foggia, 2012) can be formulated as a graph matching problem.
The problem of graph matching has been heavily investigated in theory (Grohe et al., 2018) and practice (Conte et al., 2004), usually by relating it to domain-agnostic distances such as the graph edit distance (Stauffer et al., 2017) and the maximum common subgraph problem (Bunke & Shearer, 1998), or by formulating it as a quadratic assignment problem (Yan et al., 2016). Since all three approaches are NP-hard, solving them to optimality may not be tractable for large-scale, real-world instances. Moreover, these purely combinatorial approaches do not adapt to the given data distribution and often do not consider continuous node embeddings which can provide crucial information about node semantics.
Recently, various neural architectures have been proposed to tackle the task of graph matching (Zanfir & Sminchisescu, 2018; Wang et al., 2019b; Zhang & Lee, 2019; Xu et al., 2019d, b; Derr et al., 2019; Zhang et al., 2019a; Heimann et al., 2018) or graph similarity (Bai et al., 2018, 2019; Li et al., 2019) in a data-dependent fashion. However, these approaches are either only capable of computing similarity scores between whole graphs (Bai et al., 2018, 2019; Li et al., 2019), rely on an inefficient global matching procedure (Zanfir & Sminchisescu, 2018; Wang et al., 2019b; Xu et al., 2019d; Li et al., 2019), or do not generalize to unseen graphs (Xu et al., 2019b; Derr et al., 2019; Zhang et al., 2019a). Moreover, they might be prone to match neighborhoods between graphs inconsistently by only taking localized embeddings into account (Zanfir & Sminchisescu, 2018; Wang et al., 2019b; Zhang & Lee, 2019; Xu et al., 2019d; Derr et al., 2019; Heimann et al., 2018).
Here, we propose a fully-differentiable graph matching procedure which aims to reach a data-driven neighborhood consensus between matched node pairs without the need to solve any optimization problem during inference. In addition, our approach is purely local, i.e., it operates on fixed-size neighborhoods around nodes, and is sparsity-aware, i.e., it takes the sparsity of the underlying structures into account. Hence, our approach scales well to large input domains, and can be trained in an end-to-end fashion to adapt to a given data distribution. Finally, our approach improves upon the state-of-the-art on several real-world applications from the fields of computer vision and entity alignment on knowledge graphs.
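A graph $\mathcal{G} = (\mathcal{V}, \boldsymbol{A}, \boldsymbol{X}, \boldsymbol{E})$ consists of a finite set of nodes $\mathcal{V} = \{1, 2, \ldots\}$, an adjacency matrix $\boldsymbol{A} \in \{0, 1\}^{|\mathcal{V}| \times |\mathcal{V}|}$, a node feature matrix $\boldsymbol{X} \in \mathbb{R}^{|\mathcal{V}| \times \cdot}$, and an optional (sparse) edge feature matrix $\boldsymbol{E} \in \mathbb{R}^{|\mathcal{V}| \times |\mathcal{V}| \times \cdot}$. For a subset of nodes $\mathcal{S} \subseteq \mathcal{V}$, $\mathcal{G}[\mathcal{S}]$ denotes the subgraph of $\mathcal{G}$ induced by $\mathcal{S}$. We refer to $\mathcal{N}_T(i) = \{ j \in \mathcal{V} : d(i, j) \leq T \}$ as the $T$-hop neighborhood around node $i \in \mathcal{V}$, where $d$ denotes the shortest-path distance in $\mathcal{G}$. A node coloring is a function $\mathcal{V} \to \Sigma$ with arbitrary codomain $\Sigma$.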
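The problem of graph matching refers to establishing node correspondences between two graphs. Formally, we are given two graphs, a source graph $\mathcal{G}_s = (\mathcal{V}_s, \boldsymbol{A}_s, \boldsymbol{X}_s, \boldsymbol{E}_s)$ and a target graph $\mathcal{G}_t = (\mathcal{V}_t, \boldsymbol{A}_t, \boldsymbol{X}_t, \boldsymbol{E}_t)$, w.l.o.g. $|\mathcal{V}_s| \leq |\mathcal{V}_t|$, and are interested in finding a correspondence matrix $\boldsymbol{S} \in \{0, 1\}^{|\mathcal{V}_s| \times |\mathcal{V}_t|}$ which minimizes an objective subject to the one-to-one mapping constraints $\sum_{j \in \mathcal{V}_t} S_{i,j} = 1 \ \forall i \in \mathcal{V}_s$ and $\sum_{i \in \mathcal{V}_s} S_{i,j} \leq 1 \ \forall j \in \mathcal{V}_t$. As a result, $\boldsymbol{S}$ infers an injective mapping $\pi \colon \mathcal{V}_s \to \mathcal{V}_t$ which maps each node in $\mathcal{G}_s$ to a node in $\mathcal{G}_t$.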
Typically, graph matching is formulated as an edge-preserving, quadratic assignment problem (Anstreicher, 2003; Gold & Rangarajan, 1996; Caetano et al., 2009; Cho et al., 2013), i.e.,
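$$\underset{\boldsymbol{S}}{\operatorname{argmax}} \sum_{i, i' \in \mathcal{V}_s} \; \sum_{j, j' \in \mathcal{V}_t} A^{(s)}_{i,i'} \, A^{(t)}_{j,j'} \, S_{i,j} \, S_{i',j'} \tag{1}$$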
subject to the one-to-one mapping constraints mentioned above. This formulation is based on the intuition of finding correspondences based on neighborhood consensus (Rocco et al., 2018), which is meant to prevent adjacent nodes in the source graph from being mapped to different regions in the target graph. Formally, a neighborhood consensus is reached if for all node pairs $(i, j) \in \mathcal{V}_s \times \mathcal{V}_t$ with $S_{i,j} = 1$, it holds that for every node $i' \in \mathcal{N}_1(i)$ there exists a node $j' \in \mathcal{N}_1(j)$ such that $S_{i',j'} = 1$.
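To make the edge-preserving objective concrete, the following minimal sketch scores a candidate injective mapping by how many source edges it sends onto target edges, and finds the best mapping by exhaustive search. The helper names (`edge_preservation_score`, `brute_force_match`) and the toy graphs are illustrative assumptions and are not part of the authors' implementation; brute force is only feasible for tiny graphs and serves purely as an illustration of the quantity being maximized.

```python
from itertools import permutations


def edge_preservation_score(adj_s, adj_t, mapping):
    """Sum of A_s[i][k] * A_t[mapping[i]][mapping[k]] over all source node pairs."""
    n = len(adj_s)
    return sum(
        adj_s[i][k] * adj_t[mapping[i]][mapping[k]]
        for i in range(n)
        for k in range(n)
    )


def brute_force_match(adj_s, adj_t):
    """Try every injective mapping from source to target nodes (tiny graphs only)."""
    candidates = permutations(range(len(adj_t)), len(adj_s))
    return max(candidates, key=lambda m: edge_preservation_score(adj_s, adj_t, m))


# Source: path 0 - 1 - 2. Target: the same path with relabeled nodes (1 - 0 - 2).
adj_s = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
adj_t = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]

best = brute_force_match(adj_s, adj_t)
best_score = edge_preservation_score(adj_s, adj_t, best)
print(best, best_score)
```

In this toy case the center of the source path must be mapped to the center of the target path, and both edges are preserved in each direction.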
In this work, we consider the problem of supervised and semi-supervised matching of graphs while employing the intuition of neighborhood consensus as an inductive bias into our model. In the supervised setting, we are given pair-wise ground-truth correspondences for a set of graphs and want our model to generalize to unseen graph pairs. In the semi-supervised setting, source and target graphs are fixed, and ground-truth correspondences are only given for a small subset of nodes. However, we are allowed to make use of the complete graph structures.
In the following, we describe our proposed end-to-end, deep graph matching architecture in detail. See Figure 1 for a high-level illustration. The method consists of two stages: a local feature matching procedure followed by an iterative refinement strategy using synchronous message passing networks. The aim of the feature matching step, see Section 3.1, is to compute initial correspondence scores based on the similarity of local node embeddings. The second step is an iterative refinement strategy, see Sections 3.2 and 3.3, which aims to reach neighborhood consensus for correspondences using a differentiable validator for graph isomorphism. Finally, in Section 3.4, we show how to scale our method to large, real-world inputs.
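We model our local feature matching procedure in close analogy to related approaches (Bai et al., 2018, 2019; Wang et al., 2019b; Zhang & Lee, 2019; Wang & Solomon, 2019) by computing similarities between nodes in the source graph $\mathcal{G}_s$ and the target graph $\mathcal{G}_t$ based on node embeddings. That is, given latent node embeddings $\boldsymbol{H}_s = \Psi_{\theta_1}(\boldsymbol{X}_s, \boldsymbol{A}_s, \boldsymbol{E}_s)$ and $\boldsymbol{H}_t = \Psi_{\theta_1}(\boldsymbol{X}_t, \boldsymbol{A}_t, \boldsymbol{E}_t)$ computed by a shared neural network $\Psi_{\theta_1}$ for source graph $\mathcal{G}_s$ and target graph $\mathcal{G}_t$, respectively, we obtain initial soft correspondences as
$$\boldsymbol{S}^{(0)} = \operatorname{sinkhorn}(\hat{\boldsymbol{S}}^{(0)}) \in [0, 1]^{|\mathcal{V}_s| \times |\mathcal{V}_t|} \quad \text{with} \quad \hat{\boldsymbol{S}}^{(0)} = \boldsymbol{H}_s \boldsymbol{H}_t^{\top} \in \mathbb{R}^{|\mathcal{V}_s| \times |\mathcal{V}_t|}.$$
Here, sinkhorn normalization is applied to obtain rectangular doubly-stochastic correspondence matrices that fulfill the constraints $\sum_{j \in \mathcal{V}_t} S_{i,j} = 1 \ \forall i \in \mathcal{V}_s$ and $\sum_{i \in \mathcal{V}_s} S_{i,j} \leq 1 \ \forall j \in \mathcal{V}_t$ (Sinkhorn & Knopp, 1967; Adams & Zemel, 2011; Cour et al., 2006).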
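As a rough illustration of this feature matching step, the sketch below computes inner-product similarities between two small sets of node embeddings and applies Sinkhorn-style alternating column and row normalization. It assumes square inputs, exponentiated scores and a fixed number of iterations; the function names and the toy embeddings are hypothetical and not taken from the authors' code.

```python
import math


def inner_products(h_s, h_t):
    """Similarity scores between every source and every target embedding."""
    return [[sum(a * b for a, b in zip(u, v)) for v in h_t] for u in h_s]


def sinkhorn(scores, num_iters=100):
    """Alternate column and row normalization of exp(scores)."""
    mat = [[math.exp(v) for v in row] for row in scores]
    for _ in range(num_iters):
        col_sums = [sum(col) for col in zip(*mat)]
        mat = [[v / col_sums[j] for j, v in enumerate(row)] for row in mat]
        # Row normalization comes last so that each row is a distribution.
        mat = [[v / sum(row) for v in row] for row in mat]
    return mat


h_s = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy source embeddings
h_t = [[0.9, 0.1], [0.1, 0.8], [1.0, 1.1]]  # toy target embeddings

S0 = sinkhorn(inner_products(h_s, h_t))
for row in S0:
    print([round(v, 3) for v in row])
```

Each row of the result can be read as a distribution over candidate target nodes for one source node, and the column sums approach one as the iteration converges.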
We interpret the $i$-th row vector $\boldsymbol{S}^{(0)}_{i,:} \in [0, 1]^{|\mathcal{V}_t|}$ as a discrete distribution over potential correspondences in $\mathcal{G}_t$ for each node $i \in \mathcal{V}_s$. We train $\Psi_{\theta_1}$ in a discriminative, supervised fashion against ground truth correspondences $\pi_{\mathrm{gt}}(\cdot)$ by minimizing the negative log-likelihood of correct correspondence scores $\mathcal{L}^{(\mathrm{initial})} = -\sum_{i \in \mathcal{V}_s} \log\big(S^{(0)}_{i, \pi_{\mathrm{gt}}(i)}\big)$.

We implement $\Psi_{\theta_1}$ as a Graph Neural Network (GNN) to obtain localized, permutation equivariant vectorial node representations (Bronstein et al., 2017; Hamilton et al., 2017; Battaglia et al., 2018; Goyal & Ferrara, 2018). Formally, a GNN follows a neural message passing scheme (Gilmer et al., 2017) and updates its node features $\vec{h}_i^{(t-1)}$ in layer $t$ by aggregating localized information via
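$$\vec{a}_i^{(t)} = \operatorname{AGGREGATE}^{(t)}\Big( \big\{\!\!\big\{ \big(\vec{h}_j^{(t-1)}, \vec{e}_{j,i}\big) : j \in \mathcal{N}_1(i) \big\}\!\!\big\} \Big), \qquad \vec{h}_i^{(t)} = \operatorname{UPDATE}^{(t)}\Big( \vec{h}_i^{(t-1)}, \vec{a}_i^{(t)} \Big) \tag{2}$$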
where $\vec{h}_i^{(0)} = \vec{x}_i \in \boldsymbol{X}$ and $\{\!\{ \ldots \}\!\}$ denotes a multiset. The recent work in the fields of geometric deep learning and relational representation learning provides a large number of operators to choose from (Kipf & Welling, 2017; Gilmer et al., 2017; Veličković et al., 2018; Schlichtkrull et al., 2018; Xu et al., 2019c), which allows for precise control of the properties of extracted features.

Due to the purely local nature of the used node embeddings, our feature matching procedure is prone to finding false correspondences which are locally similar to the correct one. Formally, those cases pose a violation of the neighborhood consensus criteria employed in Equation (1). Since finding a global optimum is NP-hard, we aim to detect violations of the criteria in local neighborhoods and resolve them in an iterative fashion.
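We utilize graph neural networks to detect these violations in a neighborhood consensus step and iteratively refine correspondences $\boldsymbol{S}^{(l)}$, $l \in \{0, \ldots, L\}$, starting from $\boldsymbol{S}^{(0)}$. Key to the proposed algorithm is the following observation: The soft correspondence matrix $\boldsymbol{S} \in [0, 1]^{|\mathcal{V}_s| \times |\mathcal{V}_t|}$ is a map from the node function space $L(\mathcal{G}_s)$ to the node function space $L(\mathcal{G}_t)$. Therefore, we can use $\boldsymbol{S}$ to pass node functions $\vec{x}^{(s)} \in L(\mathcal{G}_s)$, $\vec{x}^{(t)} \in L(\mathcal{G}_t)$ along the soft correspondences by
$$\vec{x}^{\prime(t)} = \boldsymbol{S}^{\top} \vec{x}^{(s)} \quad \text{and} \quad \vec{x}^{\prime(s)} = \boldsymbol{S} \, \vec{x}^{(t)} \tag{3}$$
to obtain functions $\vec{x}^{\prime(t)} \in L(\mathcal{G}_t)$, $\vec{x}^{\prime(s)} \in L(\mathcal{G}_s)$ in the other domain, respectively.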
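The following small sketch illustrates how a correspondence matrix transports node functions between the two graphs: multiplying by the transpose pushes source functions to the target, and multiplying by the matrix itself pulls target functions back to the source. The matrix `S`, the helper functions and the toy values are assumptions chosen for illustration; a hard permutation is used so the result is easy to read, but the same products apply to soft correspondences.

```python
def transpose(mat):
    return [list(col) for col in zip(*mat)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


# Hard correspondences: source 0 -> target 2, source 1 -> target 0, source 2 -> target 1.
S = [[0, 0, 1],
     [1, 0, 0],
     [0, 1, 0]]

# Node indicator functions of the source graph (identity matrix as a node coloring).
identity = [[1 if i == j else 0 for j in range(3)] for i in range(3)]

# Push the source coloring to the target graph: S^T I.
colors_on_target = matmul(transpose(S), identity)

# Pull a scalar target function back to the source graph: S x_t.
x_t = [[10.0], [20.0], [30.0]]
pulled_to_source = matmul(S, x_t)

print(colors_on_target)
print(pulled_to_source)
```

Here target node 2 receives the color of source node 0, and source node 0 receives the function value stored at target node 2.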
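Then, our consensus method works as follows: Using $\boldsymbol{S}^{(l)}$, we first map node indicator functions, given as an injective node coloring $\mathcal{V}_s \to \{0, 1\}^{|\mathcal{V}_s|}$ in the form of an identity matrix $\boldsymbol{I}_{|\mathcal{V}_s|}$, from $\mathcal{G}_s$ to $\mathcal{G}_t$. Then, we distribute this coloring in corresponding neighborhoods by performing synchronous message passing on both graphs via a shared graph neural network $\Psi_{\theta_2}$, i.e.,
$$\boldsymbol{O}_s = \Psi_{\theta_2}\big(\boldsymbol{I}_{|\mathcal{V}_s|}, \boldsymbol{A}_s, \boldsymbol{E}_s\big) \quad \text{and} \quad \boldsymbol{O}_t = \Psi_{\theta_2}\big(\boldsymbol{S}^{(l)\top} \boldsymbol{I}_{|\mathcal{V}_s|}, \boldsymbol{A}_t, \boldsymbol{E}_t\big). \tag{4}$$
We can compare the results of both GNNs to recover a vector $\vec{d}_{i,j} = \vec{o}_i^{(s)} - \vec{o}_j^{(t)}$ which measures the neighborhood consensus between node pairs $(i, j) \in \mathcal{V}_s \times \mathcal{V}_t$. This measure can be used to perform trainable updates of the correspondence scores
$$S^{(l+1)}_{i,j} = \operatorname{sinkhorn}\big(\hat{\boldsymbol{S}}^{(l+1)}\big)_{i,j} \quad \text{with} \quad \hat{S}^{(l+1)}_{i,j} = \hat{S}^{(l)}_{i,j} + \Phi_{\theta_3}\big(\vec{d}_{i,j}\big) \tag{5}$$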
based on an MLP $\Phi_{\theta_3}$. The process can be applied $L$ times to iteratively improve the consensus in neighborhoods. The final objective $\mathcal{L} = \mathcal{L}^{(\mathrm{initial})} + \mathcal{L}^{(\mathrm{refined})}$ with $\mathcal{L}^{(\mathrm{refined})} = -\sum_{i \in \mathcal{V}_s} \log\big(S^{(L)}_{i, \pi_{\mathrm{gt}}(i)}\big)$ combines both the feature matching error and neighborhood consensus error. This objective is fully-differentiable and can hence be optimized in an end-to-end fashion using stochastic gradient descent. Overall, the consensus stage distributes global node colorings to resolve ambiguities and false matchings made in the first stage of our architecture by only using purely local operators. Since an initial matching is needed to test for neighborhood consensus, this task cannot be fulfilled by $\Psi_{\theta_2}$ alone, which stresses the importance of our two-stage approach.

The following two theorems show that $\vec{d}_{i,j}$ is a good measure of how well local neighborhoods around $i$ and $j$ are matched by the soft correspondence between $\mathcal{G}_s$ and $\mathcal{G}_t$. The proofs can be found in Appendix B and C, respectively.
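Theorem 1. Let $\mathcal{G}_s$ and $\mathcal{G}_t$ be two isomorphic graphs and let $\Psi_{\theta_2}$ be a permutation equivariant GNN, i.e., $\boldsymbol{P}^{\top} \Psi_{\theta_2}(\boldsymbol{X}, \boldsymbol{A}, \boldsymbol{E}) = \Psi_{\theta_2}(\boldsymbol{P}^{\top} \boldsymbol{X}, \boldsymbol{P}^{\top} \boldsymbol{A} \boldsymbol{P}, \boldsymbol{P}^{\top} \boldsymbol{E} \boldsymbol{P})$ for any permutation matrix $\boldsymbol{P} \in \{0, 1\}^{|\mathcal{V}| \times |\mathcal{V}|}$. If $\boldsymbol{S} \in \{0, 1\}^{|\mathcal{V}_s| \times |\mathcal{V}_t|}$ encodes an isomorphism between $\mathcal{G}_s$ and $\mathcal{G}_t$, then $\vec{d}_{i, \pi(i)} = \vec{0}$ for all $i \in \mathcal{V}_s$.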
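Theorem 2. Let $\mathcal{G}_s$ and $\mathcal{G}_t$ be two graphs and let $\Psi_{\theta_2}$ be a permutation equivariant and $T$-layered GNN for which both $\operatorname{AGGREGATE}^{(t)}$ and $\operatorname{UPDATE}^{(t)}$ are injective for all $t \in \{1, \ldots, T\}$. If $\vec{d}_{i,j} = \vec{0}$, then the resulting submatrix $\boldsymbol{S}_{\mathcal{N}_T(i), \mathcal{N}_T(j)}$ is a permutation matrix describing an isomorphism between the $T$-hop subgraph $\mathcal{G}_s[\mathcal{N}_T(i)]$ around $i \in \mathcal{V}_s$ and the $T$-hop subgraph $\mathcal{G}_t[\mathcal{N}_T(j)]$ around $j \in \mathcal{V}_t$. Moreover, if $\vec{d}_{i, \operatorname{argmax} \boldsymbol{S}_{i,:}} = \vec{0}$ for all $i \in \mathcal{V}_s$, then $\boldsymbol{S}$ denotes a full isomorphism between $\mathcal{G}_s$ and $\mathcal{G}_t$.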
Hence, a GNN $\Psi_{\theta_2}$ that satisfies both criteria in Theorem 1 and 2 provides equal node embeddings $\vec{o}_i^{(s)}$ and $\vec{o}_j^{(t)}$ if and only if nodes in a local neighborhood are correctly matched to each other. A value $\vec{d}_{i,j} \neq \vec{0}$ indicates the existence of inconsistent matchings in the local neighborhoods around $i$ and $j$, and can hence be used to refine the correspondence score $\hat{S}_{i,j}$.
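The consensus measure can be illustrated on a toy example. The sketch below uses a hand-written, non-trainable, one-layer stand-in for the shared GNN (each node adds its own color to the sum of its neighbors' colors), which is an assumption made for illustration and not the trainable network described here. With a correspondence matrix that encodes an isomorphism, the difference vectors of matched pairs vanish; with an inconsistent matching, at least one of them does not.

```python
def transpose(mat):
    return [list(col) for col in zip(*mat)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def simple_gnn(x, adj):
    """Permutation equivariant one-layer stand-in: X + A X (sum aggregation)."""
    ax = matmul(adj, x)
    return [[u + v for u, v in zip(rx, rax)] for rx, rax in zip(x, ax)]


def consensus_differences(S, adj_s, adj_t):
    """d[i][j] = o_s[i] - o_t[j], with source colors mapped to the target via S^T."""
    n_s = len(adj_s)
    identity = [[1 if i == j else 0 for j in range(n_s)] for i in range(n_s)]
    o_s = simple_gnn(identity, adj_s)
    o_t = simple_gnn(matmul(transpose(S), identity), adj_t)
    return [[[a - b for a, b in zip(o_s[i], o_t[j])] for j in range(len(adj_t))]
            for i in range(n_s)]


adj_s = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # path 0 - 1 - 2
adj_t = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]  # path 1 - 0 - 2

pi_good = [1, 0, 2]                          # an isomorphism
S_good = [[1 if j == pi_good[i] else 0 for j in range(3)] for i in range(3)]
S_bad = [[1 if i == j else 0 for j in range(3)] for i in range(3)]  # center -> leaf

d_good = consensus_differences(S_good, adj_s, adj_t)
d_bad = consensus_differences(S_bad, adj_s, adj_t)
print([d_good[i][pi_good[i]] for i in range(3)])
print(d_bad[1][1])
```

The non-zero entries for the inconsistent matching are exactly the signal that a trainable update could use to lower the corresponding score.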
Note that both requirements, permutation equivariance and injectivity, are easily fulfilled: (1) All common graph neural network architectures following the message passing scheme of Equation (2) are equivariant due to the use of permutation invariant neighborhood aggregators. (2) Injectivity of graph neural networks is a heavily discussed topic in recent literature. It can be fulfilled by using a GNN that is as powerful as the Weisfeiler & Lehman (1968) (WL) heuristic in distinguishing graph structures, e.g., by using sum aggregation in combination with MLPs on the multiset of neighboring node features, cf. (Xu et al., 2019c; Morris et al., 2019).

Theoretically, we can relate our proposed approach to classical graph matching techniques that consider a doubly-stochastic relaxation of the problem defined in Equation (1), cf. (Lyzinski et al., 2016) and Appendix F for more details. A seminal work following this method is the graduated assignment algorithm (Gold & Rangarajan, 1996). By starting from an initial feasible solution $\boldsymbol{S}^{(0)}$, a new solution $\boldsymbol{S}^{(l+1)}$ is iteratively computed from $\boldsymbol{S}^{(l)}$ by approximately solving a linear assignment problem according to
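$$\boldsymbol{S}^{(l+1)} \leftarrow \underset{\boldsymbol{S}}{\operatorname{softassign}} \sum_{i \in \mathcal{V}_s} \sum_{j \in \mathcal{V}_t} Q_{i,j} \, S_{i,j} \quad \text{with} \quad Q_{i,j} = 2 \sum_{i' \in \mathcal{V}_s} \sum_{j' \in \mathcal{V}_t} A^{(s)}_{i,i'} \, A^{(t)}_{j,j'} \, S^{(l)}_{i',j'} \tag{6}$$
where $\boldsymbol{Q}$ denotes the gradient of Equation (1) at $\boldsymbol{S}^{(l)}$. (Footnote 1: For clarity of presentation, we closely follow the original formulation of the method for simple graphs but ignore the edge similarities and adapt the constant factor of the gradient according to our objective function.) The softassign operator is implemented by applying sinkhorn normalization on rescaled inputs, where the scaling factor grows in every iteration to increasingly encourage integer solutions. Our approach also resembles the approximation of the linear assignment problem via sinkhorn normalization.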
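Moreover, the gradient $\boldsymbol{Q}$ is closely related to our neighborhood consensus scheme for the particular simple, non-trainable GNN instantiation $\Psi(\boldsymbol{X}, \boldsymbol{A}, \boldsymbol{E}) = \boldsymbol{A} \boldsymbol{X}$. Given $\boldsymbol{O}_s = \boldsymbol{A}_s \boldsymbol{I}_{|\mathcal{V}_s|} = \boldsymbol{A}_s$ and $\boldsymbol{O}_t = \boldsymbol{A}_t \boldsymbol{S}^{(l)\top} \boldsymbol{I}_{|\mathcal{V}_s|} = \boldsymbol{A}_t \boldsymbol{S}^{(l)\top}$, we obtain $\boldsymbol{Q} = 2 \, \boldsymbol{O}_s \boldsymbol{O}_t^{\top}$ by substitution. Instead of updating $\boldsymbol{S}^{(l)}$ based on the similarity between $\boldsymbol{O}_s$ and $\boldsymbol{O}_t$ obtained from a fixed-function GNN $\Psi$, we choose to update correspondence scores via trainable neural networks $\Psi_{\theta_2}$ and $\Phi_{\theta_3}$ based on the difference between $\boldsymbol{O}_s$ and $\boldsymbol{O}_t$. This allows us to interpret our model as a deep parameterized generalization of the graduated assignment algorithm. In addition, specifying node and edge attribute similarities in graph matching is often difficult and complicates its computation (Zhou & De la Torre, 2016; Zhang et al., 2019c), whereas our approach naturally supports continuous node and edge features via established GNN models. We experimentally verify the benefits of using trainable neural networks instead of $\Psi(\boldsymbol{X}, \boldsymbol{A}, \boldsymbol{E}) = \boldsymbol{A} \boldsymbol{X}$ in Appendix D.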
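The relation between the gradient of the quadratic objective and the fixed-function GNN can be checked numerically. The sketch below, which assumes undirected graphs with symmetric adjacency matrices and uses hypothetical helper names, compares the closed form $2 \boldsymbol{A}_s \boldsymbol{S} \boldsymbol{A}_t^{\top}$ against central finite differences of the objective and against the product of the two "GNN outputs" for the non-trainable instantiation that multiplies features by the adjacency matrix.

```python
def transpose(mat):
    return [list(col) for col in zip(*mat)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def objective(adj_s, adj_t, S):
    """Relaxed quadratic objective: sum of A_s[i][k] * A_t[j][m] * S[i][j] * S[k][m]."""
    n_s, n_t = len(adj_s), len(adj_t)
    return sum(adj_s[i][k] * adj_t[j][m] * S[i][j] * S[k][m]
               for i in range(n_s) for k in range(n_s)
               for j in range(n_t) for m in range(n_t))


def gradient(adj_s, adj_t, S):
    """Closed form for symmetric adjacency matrices: Q = 2 A_s S A_t^T."""
    prod = matmul(matmul(adj_s, S), transpose(adj_t))
    return [[2 * v for v in row] for row in prod]


def finite_difference(adj_s, adj_t, S, i, j, eps=1e-4):
    plus = [row[:] for row in S]
    minus = [row[:] for row in S]
    plus[i][j] += eps
    minus[i][j] -= eps
    return (objective(adj_s, adj_t, plus) - objective(adj_s, adj_t, minus)) / (2 * eps)


adj_s = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
adj_t = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
S = [[0.5, 0.3, 0.2], [0.2, 0.6, 0.2], [0.3, 0.1, 0.6]]

Q = gradient(adj_s, adj_t, S)

# Fixed-function GNN Psi(X, A) = A X applied to identity colors on both graphs.
O_s = adj_s                          # A_s I
O_t = matmul(adj_t, transpose(S))    # A_t S^T I
Q_from_gnn = [[2 * v for v in row] for row in matmul(O_s, transpose(O_t))]
print(Q)
```

Because the objective is quadratic in the correspondence matrix, the central difference agrees with the closed form up to floating-point rounding.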
We apply a number of optimizations to our proposed algorithm to make it scale to large input domains. See Algorithm 1 in Appendix A for the final optimized algorithm.
We propose to sparsify initial correspondences by filtering out low score correspondences before neighborhood consensus takes place. That is, we sparsify $\boldsymbol{S}^{(0)}$ by computing top $k$ correspondences with the help of the KeOps library (Charlier et al., 2019) without ever storing its dense version, reducing its required memory footprint from $\mathcal{O}(|\mathcal{V}_s| |\mathcal{V}_t|)$ to $\mathcal{O}(k |\mathcal{V}_s|)$. In addition, the time complexity of the refinement phase is reduced from $\mathcal{O}(|\mathcal{V}_s| |\mathcal{V}_t| + |\mathcal{E}_s| + |\mathcal{E}_t|)$ to $\mathcal{O}(k |\mathcal{V}_s| + |\mathcal{E}_s| + |\mathcal{E}_t|)$, where $|\mathcal{E}_s|$ and $|\mathcal{E}_t|$ denote the number of edges in $\mathcal{G}_s$ and $\mathcal{G}_t$, respectively. Note that sparsifying initial correspondences assumes that the feature matching procedure ranks the correct correspondence within the top $k$ elements for each node $i \in \mathcal{V}_s$. Hence, also optimizing the initial feature matching loss is crucial, and can be further accelerated by training only against sparsified correspondences with ground-truth entries $S^{(0)}_{i, \pi_{\mathrm{gt}}(i)}$.
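A minimal sketch of the top-k filtering idea is shown below. Unlike the KeOps-based approach described above, this illustrative version starts from a dense score matrix, so it only demonstrates the resulting sparse structure and not the memory savings; the function name and the dictionary-per-row representation are assumptions.

```python
import heapq


def topk_sparsify(scores, k):
    """Keep only the k highest-scoring target candidates for every source node."""
    sparse = []
    for row in scores:
        top_indices = heapq.nlargest(k, range(len(row)), key=row.__getitem__)
        sparse.append({j: row[j] for j in top_indices})
    return sparse


scores = [[0.1, 0.9, 0.3, 0.5],
          [0.7, 0.2, 0.8, 0.1]]

sparse_scores = topk_sparsify(scores, k=2)
print(sparse_scores)
```

Each source node now stores only `k` candidate entries, so later refinement steps touch a number of entries proportional to `k` times the number of source nodes instead of the full product of both node counts.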
Although applying $\Psi_{\theta_2}$ on node indicator functions $\boldsymbol{I}_{|\mathcal{V}_s|}$ is computationally efficient, it requires a parameter complexity of $\mathcal{O}(|\mathcal{V}_s|)$. Hence, we propose to replace node indicator functions $\boldsymbol{I}_{|\mathcal{V}_s|}$ with randomly drawn node functions $\boldsymbol{R}_s^{(l)} \sim \mathcal{N}(0, 1)$, where $\boldsymbol{R}_s^{(l)} \in \mathbb{R}^{|\mathcal{V}_s| \times R}$ with $R \ll |\mathcal{V}_s|$, in iteration $l$. By sampling from a continuous distribution, node indicator functions are still guaranteed to be injective (DeGroot & Schervish, 2012). Note that Theorem 1 still holds because it does not impose any restrictions on the function space $L(\mathcal{G}_s)$. Theorem 2 does not necessarily hold anymore, but we expect our refinement strategy to resolve any ambiguities by re-sampling $\boldsymbol{R}_s^{(l)}$ in every iteration $l$. We verify this empirically in Section 4.1.
The sinkhorn normalization fulfills the requirements of rectangular doubly-stochastic solutions. However, it may eventually push correspondences to inconsistent integer solutions very early on from which the neighborhood consensus method cannot effectively recover. Furthermore, it is inherently inefficient to compute and runs the risk of vanishing gradients (Zhang et al., 2019b). Here, we propose to relax this constraint by only applying row-wise softmax normalization on $\hat{\boldsymbol{S}}^{(l)}$, and expect our supervised refinement procedure to naturally resolve violations of $\sum_{i \in \mathcal{V}_s} S_{i,j} \leq 1$ on its own by re-ranking false correspondences via neighborhood consensus. Experimentally, we show that row-wise normalization is sufficient for our algorithm to converge to the correct solution, cf. Section 4.1.
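The effect of relaxing the column constraint can be seen in a small sketch. Row-wise softmax normalization, written here in plain Python with a hypothetical helper name, guarantees that every row is a distribution but leaves column sums unconstrained, so two source nodes may initially concentrate their mass on the same target node.

```python
import math


def row_softmax(scores):
    """Normalize every row independently; columns are left unconstrained."""
    out = []
    for row in scores:
        shift = max(row)  # subtract the maximum for numerical stability
        exps = [math.exp(v - shift) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


# Both source nodes prefer target node 0.
scores = [[2.0, 0.0],
          [2.0, 0.0]]

S_rows = row_softmax(scores)
column_sums = [sum(col) for col in zip(*S_rows)]
print(S_rows, column_sums)
```

The first column sum exceeds one in this example, which is the kind of violation the refinement procedure is expected to resolve by re-ranking.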
Instead of holding $L$ fixed, we propose to vary the number of refinement iterations $L^{(\mathrm{train})}$ and $L^{(\mathrm{test})}$, $L^{(\mathrm{train})} \ll L^{(\mathrm{test})}$, for training and testing, respectively. This not only speeds up training, but also encourages the refinement procedure to reach convergence with as few steps as necessary, while we can still run the refinement procedure until convergence during testing. We show empirically that decreasing $L^{(\mathrm{train})}$ does not affect the convergence abilities of our neighborhood consensus procedure during testing, cf. Section 4.1.
We verify our method on three different tasks. We first show the benefits of our approach in an ablation study on synthetic graphs (Section 4.1), and apply it to the real-world tasks of supervised keypoint matching in natural images (Sections 4.2 and 4.3) and semi-supervised cross-lingual knowledge graph alignment (Section 4.4) afterwards. All dataset statistics can be found in Appendix H.
Our method is implemented in PyTorch (Paszke et al., 2017) using the PyTorch Geometric (Fey & Lenssen, 2019) and the KeOps (Charlier et al., 2019) libraries. Our implementation can process sparse mini-batches with parallel GPU acceleration and minimal memory footprint in all algorithm steps. For all experiments, optimization is done via Adam (Kingma & Ba, 2015) with a fixed learning rate of . We use similar architectures for $\Psi_{\theta_1}$ and $\Psi_{\theta_2}$ except that we omit dropout (Srivastava et al., 2014) in $\Psi_{\theta_1}$. For all experiments, we report Hits@$k$ to evaluate and compare our model to previous lines of work, where Hits@$k$ measures the proportion of correctly matched entities ranked in the top $k$.
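For reference, the evaluation metric can be sketched in a few lines. The function below is an illustrative assumption about how such a metric can be computed from a dense score matrix and a list of ground-truth target indices; it is not taken from the authors' code.

```python
def hits_at_k(scores, ground_truth, k):
    """Fraction of source nodes whose true match is among the k best-scored targets."""
    hits = 0
    for i, row in enumerate(scores):
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if ground_truth[i] in ranked[:k]:
            hits += 1
    return hits / len(scores)


scores = [[0.7, 0.2, 0.1],
          [0.5, 0.3, 0.2],
          [0.1, 0.3, 0.6]]
ground_truth = [0, 1, 2]

hits1 = hits_at_k(scores, ground_truth, k=1)
hits2 = hits_at_k(scores, ground_truth, k=2)
print(hits1, hits2)
```

In this toy example the second source node ranks its true match only second, so it counts towards Hits@2 but not towards Hits@1.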
In our first experiment, we evaluate our method on synthetic graphs where we aim to learn a matching for pairs of graphs in a supervised fashion. Each pair of graphs consists of an undirected Erdős & Rényi (1959) graph $\mathcal{G}_s$ with $|\mathcal{V}_s|$ nodes and edge probability $p$, and a target graph $\mathcal{G}_t$ which is constructed from $\mathcal{G}_s$ by removing edges with probability $p_s$ without disconnecting any nodes (Heimann et al., 2018). Training and evaluation is done on graphs each for different configurations . In Appendix E, we perform additional experiments to also verify the robustness of our approach towards node addition or removal.

We implement the graph neural network operators $\Psi_{\theta_1}$ and $\Psi_{\theta_2}$ by stacking three layers ($T = 3$) of the GIN operator (Xu et al., 2019c)
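$$\vec{h}_i^{(t+1)} = \operatorname{MLP}^{(t+1)}\Big( (1 + \epsilon^{(t+1)}) \cdot \vec{h}_i^{(t)} + \sum_{j \to i} \vec{h}_j^{(t)} \Big) \tag{7}$$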
due to its expressiveness in distinguishing raw graph structures. The number of layers and hidden dimensionality of all MLPs is set to and , respectively, and we apply ReLU activation (Glorot et al., 2011) and Batch normalization (Ioffe & Szegedy, 2015) after each of its layers. Input features are initialized with one-hot encodings of node degrees. We employ a Jumping Knowledge style concatenation (Xu et al., 2018) to compute final node representations. We train and test our procedure with and refinement iterations, respectively.

Figures 2(a) and 2(b) show the matching accuracy Hits@1 for different choices of $|\mathcal{V}_s|$ and $p$. We observe that the performance of the purely local matching approach via $\operatorname{softmax}(\hat{\boldsymbol{S}}^{(0)})$ decreases as the structural noise $p_s$ increases. This also holds when applying global sinkhorn normalization on $\hat{\boldsymbol{S}}^{(0)}$. However, our proposed two-stage architecture can recover all correspondences, independent of the applied structural noise $p_s$. This applies to both variants discussed in the previous sections, i.e., our initial formulation $\operatorname{sinkhorn}(\hat{\boldsymbol{S}}^{(L)})$, and our optimized architecture using random node indicator sampling and row-wise normalization $\operatorname{softmax}(\hat{\boldsymbol{S}}^{(L)})$. This highlights the overall benefits of applying matching consensus and justifies the usage of the enhancements made towards scalability in Section 3.4.
In addition, Figure 2(c) visualizes the test error for a varying number of iterations $L^{(\mathrm{test})}$. We observe that even when training to non-convergence, our procedure is still able to converge by increasing the number of iterations during testing.
Moreover, Figure 2(d) shows the performance of our refinement strategy when operating on sparsified top $k$ correspondences. In contrast to its dense version, it cannot match all nodes correctly due to the poor initial feature matching quality. However, it consistently converges to the perfect solution of Hits@1 $\approx$ Hits@$k$ in case the correct match is included in the initial top $k$ ranking of correspondences. Hence, with increasing $k$, we can recover most of the correct correspondences, making it an excellent option to scale our algorithm to large graphs, cf. Section 4.4.
We perform experiments on the PascalVOC (Everingham et al., 2010) with Berkeley annotations (Bourdev & Malik, 2009) and WILLOW-ObjectClass (Cho et al., 2013) datasets which contain sets of image categories with labeled keypoint locations. For PascalVOC, we follow the experimental setups of Zanfir & Sminchisescu (2018) and Wang et al. (2019b) and use the training and test splits provided by Choy et al. (2016). We pre-filter the dataset to exclude difficult, occluded and truncated objects, and require examples to have at least one keypoint, resulting in and annotated images for training and testing, respectively. The PascalVOC dataset contains instances of varying scale, pose and illumination, and the number of keypoints ranges from to . In contrast, the WILLOW-ObjectClass dataset contains at least 40 images with consistent orientations for each of its five categories, and each image consists of exactly 10 keypoints. Following the experimental setup of peer methods (Cho et al., 2013; Wang et al., 2019b), we pre-train our model on PascalVOC and fine-tune it over 20 random splits with 20 per-class images used for training. We construct graphs via the Delaunay triangulation of keypoints. For fair comparison with Zanfir & Sminchisescu (2018) and Wang et al. (2019b), input features of keypoints are given by the concatenated output of relu4_2 and relu5_1 of a pre-trained VGG16 (Simonyan & Zisserman, 2014) on ImageNet (Deng et al., 2009).
We adopt SplineCNN (Fey et al., 2018) as our graph neural network operator
$$\vec{h}_i^{(t+1)} = \sigma\Big( \boldsymbol{W}^{(t+1)} \vec{h}_i^{(t)} + \sum_{j \to i} \Phi_{\theta}^{(t+1)}(\vec{e}_{j,i}) \cdot \vec{h}_j^{(t)} \Big) \tag{8}$$
whose trainable B-spline based kernel function $\Phi_{\theta}(\cdot)$ is conditioned on edge features $\vec{e}_{j,i}$ between node pairs. To align our results with the related work, we evaluate both isotropic and anisotropic edge features which are given as normalized relative distances and 2D Cartesian coordinates, respectively. For SplineCNN, we use a kernel size of in each dimension, a hidden dimensionality of , and apply as our non-linearity function $\sigma$. Our network architecture consists of two convolutional layers ($T = 2$), followed by dropout with probability , and a final linear layer. During training, we form pairs between any two training examples of the same category, and evaluate our model by sampling a fixed number of test graph pairs belonging to the same category.
Table 1: Hits@1 (%) on the PascalVOC dataset.

Method | Aero | Bike | Bird | Boat | Bottle | Bus | Car | Cat | Chair | Cow | Table | Dog | Horse | M-Bike | Person | Plant | Sheep | Sofa | Train | TV | Mean
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
GMN | 31.1 | 46.2 | 58.2 | 45.9 | 70.6 | 76.5 | 61.2 | 61.7 | 35.5 | 53.7 | 58.9 | 57.5 | 56.9 | 49.3 | 34.1 | 77.5 | 57.1 | 53.6 | 83.2 | 88.6 | 57.9
PCA-GM | 40.9 | 55.0 | 65.8 | 47.9 | 76.9 | 77.9 | 63.5 | 67.4 | 33.7 | 66.5 | 63.6 | 61.3 | 58.9 | 62.8 | 44.9 | 77.5 | 67.4 | 57.5 | 86.7 | 90.9 | 63.8
$\Psi_{\theta_1}$ = MLP, isotropic, $L = 0$ | 34.7 | 42.6 | 41.5 | 50.4 | 50.3 | 72.2 | 60.1 | 59.4 | 24.6 | 38.1 | 86.2 | 47.7 | 56.3 | 37.6 | 35.4 | 58.0 | 45.8 | 74.8 | 64.1 | 75.3 | 52.8
$\Psi_{\theta_1}$ = MLP, isotropic, $L = 10$ | 45.8 | 58.2 | 45.5 | 57.6 | 68.2 | 82.1 | 75.3 | 60.2 | 31.7 | 52.9 | 88.2 | 56.2 | 68.2 | 50.7 | 46.5 | 66.3 | 58.8 | 89.0 | 85.1 | 79.9 | 63.3
$\Psi_{\theta_1}$ = MLP, isotropic, $L = 20$ | 45.3 | 57.1 | 54.9 | 54.7 | 71.7 | 82.6 | 75.3 | 65.9 | 31.6 | 50.8 | 86.1 | 56.9 | 67.1 | 53.1 | 49.2 | 77.3 | 59.2 | 91.7 | 82.0 | 84.2 | 64.8
$\Psi_{\theta_1}$ = GNN, isotropic, $L = 0$ | 44.3 | 62.0 | 48.4 | 53.9 | 73.3 | 80.4 | 72.2 | 64.2 | 30.3 | 52.7 | 79.4 | 56.6 | 62.3 | 56.2 | 47.5 | 74.0 | 59.8 | 79.9 | 81.9 | 83.0 | 63.1
$\Psi_{\theta_1}$ = GNN, isotropic, $L = 10$ | 46.5 | 63.7 | 54.9 | 60.9 | 79.4 | 84.1 | 76.4 | 68.3 | 38.5 | 61.5 | 80.6 | 59.7 | 69.8 | 58.4 | 54.3 | 76.4 | 64.5 | 95.7 | 87.9 | 81.3 | 68.1
$\Psi_{\theta_1}$ = GNN, isotropic, $L = 20$ | 50.1 | 65.4 | 55.7 | 65.3 | 80.0 | 83.5 | 78.3 | 69.7 | 34.7 | 60.7 | 70.4 | 59.9 | 70.0 | 62.2 | 56.1 | 80.2 | 70.3 | 88.8 | 81.1 | 84.3 | 68.3
$\Psi_{\theta_1}$ = MLP, anisotropic, $L = 0$ | 34.3 | 45.9 | 37.3 | 47.7 | 53.3 | 75.2 | 64.5 | 61.7 | 27.7 | 40.5 | 85.9 | 46.6 | 50.2 | 39.0 | 37.3 | 58.0 | 49.2 | 82.9 | 65.0 | 74.2 | 53.8
$\Psi_{\theta_1}$ = MLP, anisotropic, $L = 10$ | 44.6 | 51.2 | 50.7 | 58.5 | 72.3 | 83.3 | 76.6 | 65.6 | 31.0 | 57.5 | 91.7 | 55.4 | 69.5 | 56.2 | 47.5 | 85.1 | 57.9 | 92.3 | 86.7 | 85.9 | 66.0
$\Psi_{\theta_1}$ = MLP, anisotropic, $L = 20$ | 48.7 | 57.2 | 47.0 | 65.3 | 73.9 | 87.6 | 76.7 | 70.0 | 30.0 | 55.5 | 92.8 | 59.5 | 67.9 | 56.9 | 48.7 | 87.2 | 58.3 | 94.9 | 87.9 | 86.0 | 67.6
$\Psi_{\theta_1}$ = GNN, anisotropic, $L = 0$ | 42.1 | 57.5 | 49.6 | 59.4 | 83.8 | 84.0 | 78.4 | 67.5 | 37.3 | 60.4 | 85.0 | 58.0 | 66.0 | 54.1 | 52.6 | 93.9 | 60.2 | 85.6 | 87.8 | 82.5 | 67.3
$\Psi_{\theta_1}$ = GNN, anisotropic, $L = 10$ | 45.5 | 67.6 | 56.5 | 66.8 | 86.9 | 85.2 | 84.2 | 73.0 | 43.6 | 66.0 | 92.3 | 64.0 | 79.8 | 56.6 | 56.1 | 95.4 | 64.4 | 95.0 | 91.3 | 86.3 | 72.8
$\Psi_{\theta_1}$ = GNN, anisotropic, $L = 20$ | 47.0 | 65.7 | 56.8 | 67.6 | 86.9 | 87.7 | 85.3 | 72.6 | 42.9 | 69.1 | 84.5 | 63.8 | 78.1 | 55.6 | 58.4 | 98.0 | 68.4 | 92.2 | 94.5 | 85.5 | 73.0
Method | Face | Motorbike | Car | Duck | Winebottle
---|---|---|---|---|---
GMN (Zanfir & Sminchisescu, 2018) | 99.3 | 71.4 | 74.3 | 82.8 | 76.7
PCA-GM (Wang et al., 2019b) | 100.0 | 76.7 | 84.0 | 93.5 | 96.9
$\Psi_{\theta_1}$ = MLP, isotropic, $L = 0$ | 98.07 ± 0.79 | 48.97 ± 4.62 | 65.30 ± 3.16 | 66.02 ± 2.51 | 77.72 ± 3.32
$\Psi_{\theta_1}$ = MLP, isotropic, $L = 10$ | 100.00 ± 0.00 | 67.28 ± 4.93 | 85.07 ± 3.93 | 83.10 ± 3.61 | 92.30 ± 2.11
$\Psi_{\theta_1}$ = MLP, isotropic, $L = 20$ | 100.00 ± 0.00 | 68.57 ± 3.94 | 82.75 ± 5.77 | 84.18 ± 4.15 | 90.36 ± 2.42
$\Psi_{\theta_1}$ = GNN, isotropic, $L = 0$ | 99.62 ± 0.28 | 73.47 ± 3.32 | 77.47 ± 4.92 | 77.10 ± 3.25 | 88.04 ± 1.38
$\Psi_{\theta_1}$ = GNN, isotropic, $L = 10$ | 100.00 ± 0.00 | 92.05 ± 3.49 | 90.05 ± 5.10 | 88.98 ± 2.75 | 97.14 ± 1.41
$\Psi_{\theta_1}$ = GNN, isotropic, $L = 20$ | 100.00 ± 0.00 | 92.05 ± 3.24 | 90.28 ± 4.67 | 88.97 ± 3.49 | 97.14 ± 1.83
$\Psi_{\theta_1}$ = MLP, anisotropic, $L = 0$ | 98.47 ± 0.61 | 49.28 ± 4.31 | 64.95 ± 3.52 | 66.17 ± 4.08 | 78.08 ± 2.61
$\Psi_{\theta_1}$ = MLP, anisotropic, $L = 10$ | 100.00 ± 0.00 | 76.28 ± 4.77 | 86.70 ± 3.25 | 83.22 ± 3.52 | 93.65 ± 1.64
$\Psi_{\theta_1}$ = MLP, anisotropic, $L = 20$ | 100.00 ± 0.00 | 76.57 ± 5.28 | 89.00 ± 3.88 | 84.78 ± 2.73 | 95.29 ± 2.22
$\Psi_{\theta_1}$ = GNN, anisotropic, $L = 0$ | 99.96 ± 0.06 | 91.90 ± 2.30 | 91.28 ± 4.89 | 86.58 ± 2.99 | 98.25 ± 0.71
$\Psi_{\theta_1}$ = GNN, anisotropic, $L = 10$ | 100.00 ± 0.00 | 98.80 ± 1.58 | 96.53 ± 1.55 | 93.22 ± 3.77 | 99.87 ± 0.31
$\Psi_{\theta_1}$ = GNN, anisotropic, $L = 20$ | 100.00 ± 0.00 | 99.40 ± 0.80 | 95.53 ± 2.93 | 93.00 ± 2.71 | 99.39 ± 0.70
Table 2: Hits@1 (%) with standard deviations on the WILLOW-ObjectClass dataset.

We follow the experimental setup of Wang et al. (2019b) and train our models using negative log-likelihood due to its superior performance in contrast to the displacement loss used in Zanfir & Sminchisescu (2018). We evaluate our complete architecture using isotropic and anisotropic GNNs for $L \in \{0, 10, 20\}$, and include ablation results obtained from using $\Psi_{\theta_1} = \mathrm{MLP}$ for the local node matching procedure. Results of Hits@1 are shown in Table 1 and 2 for PascalVOC and WILLOW-ObjectClass, respectively. We visualize qualitative results of our method in Appendix I.
We observe that our refinement strategy is able to significantly outperform competing methods as well as our non-refined baselines. On the WILLOW-ObjectClass dataset, our refinement stage at least reduces the error of the initial model ($L = 0$) by half across all categories. The benefits of the second stage are even more crucial when starting from a weaker initial feature matching baseline ($\Psi_{\theta_1} = \mathrm{MLP}$), with overall improvements of up to percentage points on PascalVOC. However, good initial matchings do help our consensus stage to improve its performance further, as indicated by the usage of task-specific isotropic or anisotropic GNNs for $\Psi_{\theta_1}$.
We also verify our approach by tackling the geometric feature matching problem, where we only make use of point coordinates and no additional visual features are available. Here, we follow the experimental training setup of Zhang & Lee (2019), and test the generalization capabilities of our model on the PascalPF dataset (Ham et al., 2016). For training, we generate a synthetic set of graph pairs: We first randomly sample 30–60 source points uniformly from , and add Gaussian noise from to these points to obtain the target points. Furthermore, we add 0–20 outliers from to each point cloud. Finally, we construct graphs by connecting each node with its $k$-nearest neighbors (). We train our unmodified anisotropic keypoint architecture from Section 4.2 with input until it has seen synthetic examples.

We evaluate our trained model on the PascalPF dataset (Ham et al., 2016) which consists of image pairs within 20 classes, with the number of keypoints ranging from 4 to 17. Results of Hits@1 are shown in Table 3. Overall, our consensus architecture improves upon the state-of-the-art results of Zhang & Lee (2019) on almost all categories while our baseline is weaker than the results reported in Zhang & Lee (2019), showing the benefits of applying our consensus stage. In addition, it shows that our method also works well even when not taking any visual information into account.
Table 3: Hits@1 (%) on the PascalPF dataset.

Method | Aero | Bike | Bird | Boat | Bottle | Bus | Car | Cat | Chair | Cow | Table | Dog | Horse | M-Bike | Person | Plant | Sheep | Sofa | Train | TV | Mean
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
(Zhang & Lee, 2019) | 76.1 | 89.8 | 93.4 | 96.4 | 96.2 | 97.1 | 94.6 | 82.8 | 89.3 | 96.7 | 89.7 | 79.5 | 82.6 | 83.5 | 72.8 | 76.7 | 77.1 | 97.3 | 98.2 | 99.5 | 88.5
Ours, $L = 0$ | 69.2 | 87.7 | 77.3 | 90.4 | 98.7 | 98.3 | 92.5 | 91.6 | 94.7 | 79.4 | 95.8 | 90.1 | 80.0 | 79.5 | 72.5 | 98.0 | 76.5 | 89.6 | 93.4 | 97.8 | 87.6
Ours, $L = 10$ | 81.3 | 92.2 | 94.2 | 98.8 | 99.3 | 99.1 | 98.6 | 98.2 | 99.6 | 94.1 | 100.0 | 99.4 | 86.6 | 86.6 | 88.7 | 100.0 | 100.0 | 100.0 | 100.0 | 99.3 | 95.8
Ours, $L = 20$ | 81.1 | 92.0 | 94.7 | 100.0 | 99.3 | 99.3 | 98.9 | 97.3 | 99.4 | 93.4 | 100.0 | 99.1 | 86.3 | 86.2 | 87.7 | 100.0 | 100.0 | 100.0 | 100.0 | 99.3 | 95.7
We evaluate our model on the DBP15K datasets (Sun et al., 2017) which link entities of the Chinese, Japanese and French knowledge graphs of DBpedia into the English version and vice versa. Each dataset contains exactly 15,000 links between equivalent entities, and we split those links into training and testing following previous works. For obtaining entity input features, we follow the experimental setup of Xu et al. (2019d): We retrieve monolingual fastText embeddings (Bojanowski et al., 2017) for each language separately, and align those into the same vector space afterwards (Lample et al., 2018). We use the sum of word embeddings as the final entity input representation (although more sophisticated approaches are just as conceivable).
Our graph neural network operator mostly matches the one proposed in Xu et al. (2019d) where the direction of edges is retained, but not their specific relation type:
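$$\vec{h}_i^{(t+1)} = \sigma\Big( \boldsymbol{W}_1^{(t+1)} \vec{h}_i^{(t)} + \sum_{j \to i} \boldsymbol{W}_2^{(t+1)} \vec{h}_j^{(t)} + \sum_{i \to j} \boldsymbol{W}_3^{(t+1)} \vec{h}_j^{(t)} \Big) \tag{9}$$
We use followed by dropout with probability as our non-linearity $\sigma$, and obtain final node representations via . We use a three-layer GNN ($T = 3$) both for obtaining initial similarities and for refining alignments with dimensionality and , respectively. Training is performed using negative log likelihood in a semi-supervised fashion: For each training node $i$ in $\mathcal{V}_s$, we train $\mathcal{L}^{(\mathrm{initial})}$ sparsely by using the corresponding ground-truth node in $\mathcal{V}_t$, the top entries in $\boldsymbol{S}_{i,:}$ and randomly sampled entities in $\mathcal{V}_t$. For the refinement phase, we update the sparse top correspondence matrix times. For efficiency reasons, we train $\mathcal{L}^{(\mathrm{initial})}$ and $\mathcal{L}^{(\mathrm{refined})}$ sequentially for epochs each.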
Method | ZH→EN @1 | ZH→EN @10 | EN→ZH @1 | EN→ZH @10 | JA→EN @1 | JA→EN @10 | EN→JA @1 | EN→JA @10 | FR→EN @1 | FR→EN @10 | EN→FR @1 | EN→FR @10
---|---|---|---|---|---|---|---|---|---|---|---|---
GCN (Wang et al., 2018) | 41.25 | 74.38 | 36.49 | 69.94 | 39.91 | 74.46 | 38.42 | 71.81 | 37.29 | 74.49 | 36.77 | 73.06
BootEA (Sun et al., 2018) | 62.94 | 84.75 | | | 62.23 | 85.39 | | | 65.30 | 87.44 | |
MuGNN (Cao et al., 2019) | 49.40 | 84.40 | | | 50.10 | 85.70 | | | 49.60 | 87.00 | |
NAEA (Zhu et al., 2019) | 65.01 | 86.73 | | | 64.14 | 87.27 | | | 67.32 | 89.43 | |
RDGCN (Wu et al., 2019) | 70.75 | 84.55 | | | 76.74 | 89.54 | | | 88.64 | 95.72 | |
GMNN (Xu et al., 2019d) | 67.93 | 78.48 | 65.28 | 79.64 | 73.97 | 87.15 | 71.29 | 84.63 | 89.38 | 95.25 | 88.18 | 94.75
Baseline (initial word embeddings only) | 58.53 | 78.04 | 54.99 | 74.25 | 59.18 | 79.16 | 55.40 | 75.53 | 76.07 | 91.54 | 74.89 | 90.57
Ours (sparse), initial correspondences | 67.59 | 87.47 | 64.38 | 83.56 | 71.95 | 89.74 | 68.88 | 86.84 | 83.36 | 96.03 | 82.16 | 95.28
Ours (sparse), refined correspondences | 80.12 | 87.47 | 76.77 | 83.56 | 84.80 | 89.74 | 81.09 | 86.84 | 93.34 | 96.03 | 91.95 | 95.28
We report Hits@1 and Hits@10 to evaluate our model and compare it to previous lines of work (see Table 4). In addition, we report results of a simple three-layer which matches nodes purely based on initial word embeddings, and a variant of our model without the refinement of initial correspondences (). Our approach improves upon the state-of-the-art on all categories with gains of up to percentage points. Moreover, our refinement strategy consistently improves the Hits@1 of the initial correspondences by a significant margin, while the Hits@10 results are unchanged because the refinement only re-ranks the sparsified top initial correspondences. Due to the scalability of our approach, we can easily apply a multitude of refinement iterations while still retaining large hidden feature dimensionalities.
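For reference, Hits@k can be computed as in the following sketch. The function name `hits_at_k` and the toy score matrices are illustrative assumptions. The second matrix only reorders candidates inside an existing top-2 set, which illustrates why re-ranking a fixed candidate set can change Hits@1 but not Hits@k for that set.

```python
def hits_at_k(scores, ground_truth, k):
    """Fraction of source nodes whose ground-truth target is among the k highest-scoring candidates."""
    hits = 0
    for row, gt in zip(scores, ground_truth):
        top_k = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        hits += gt in top_k
    return hits / len(ground_truth)


# Hypothetical correspondence scores: one row per source node, one column per target node.
initial = [
    [0.9, 0.1, 0.0],
    [0.2, 0.3, 0.5],
    [0.4, 0.5, 0.1],
]
ground_truth = [0, 1, 0]

# Re-ranking inside the top-2 of the last row: targets 0 and 1 swap places.
refined = [
    [0.9, 0.1, 0.0],
    [0.2, 0.3, 0.5],
    [0.5, 0.4, 0.1],
]

initial_hits1 = hits_at_k(initial, ground_truth, 1)
refined_hits1 = hits_at_k(refined, ground_truth, 1)
initial_hits2 = hits_at_k(initial, ground_truth, 2)
refined_hits2 = hits_at_k(refined, ground_truth, 2)
```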
Our experimental results demonstrate that the proposed approach effectively solves challenging real-world problems. However, the expressive power of GNNs is closely related to the WL heuristic for graph isomorphism testing (Xu et al., 2019c; Morris et al., 2019), whose power and limitations are well understood (Arvind et al., 2015). Our method generally inherits these limitations. Hence, one possible limitation is that whenever two nodes are assigned the same color by WL, our approach may fail to converge to one of the possible solutions. For example, there may exist two nodes with equal neighborhood sets . One can easily see that the feature matching procedure generates equal initial correspondence distributions , resulting in the same mapped node indicator functions from to nodes and , respectively. Since both nodes share the same neighborhood, also produces the same distributed functions . As a result, both column vectors and receive the same update, leading to non-convergence. In theory, one might resolve these ambiguities by adding a small amount of noise to . However, the general amount of feature noise present in real-world datasets already ensures that this scenario is unlikely to occur.
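The ambiguity described above can be reproduced on a toy graph. The sketch below uses a weight-free sum aggregation as a stand-in for a GNN layer and assumes that the noise is added to the input node features; both choices, along with the name `sum_aggregate`, are assumptions made for illustration only.

```python
import random


def sum_aggregate(features, neighbors):
    """One message passing step without learned weights: h_i = x_i + sum of neighbor features."""
    out = []
    for i, x in enumerate(features):
        h = list(x)
        for j in neighbors[i]:
            h = [a + b for a, b in zip(h, features[j])]
        out.append(h)
    return out


# Nodes 0 and 1 are both connected to exactly nodes 2 and 3, so N(0) == N(1).
neighbors = [[2, 3], [2, 3], [0, 1], [0, 1]]
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

# Identical features and identical neighborhoods give indistinguishable embeddings.
h = sum_aggregate(features, neighbors)
tied = h[0] == h[1]

# A small amount of feature noise breaks the tie.
rng = random.Random(0)
noisy = [[x + rng.gauss(0.0, 1e-3) for x in row] for row in features]
h_noisy = sum_aggregate(noisy, neighbors)
tie_broken = h_noisy[0] != h_noisy[1]
```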
Identifying correspondences between the nodes of two graphs has been studied in various domains, and an extensive body of literature exists. Closely related problems are summarized under the terms maximum common subgraph (Kriege et al., 2019b), network alignment (Zhang, 2016), graph edit distance (Chen et al., 2019) and graph matching (Yan et al., 2016). We refer the reader to Appendix F for a detailed discussion of the related work on these problems. Recently, graph neural networks have become a focus of research, leading to various proposed deep graph matching techniques (Wang et al., 2019b; Zhang & Lee, 2019; Xu et al., 2019d; Derr et al., 2019). In Appendix G, we present a detailed overview of the related work in this field while highlighting individual differences and similarities to our proposed graph matching consensus procedure.
We presented a two-stage neural architecture for learning node correspondences between graphs in a supervised or semi-supervised fashion. Our approach aims to reach a neighborhood consensus between matchings and can resolve violations of this criterion iteratively. In addition, we proposed enhancements that let our algorithm scale to large input domains. We evaluated our architecture on real-world datasets, on which it consistently improved upon the state-of-the-art.
This work has been supported by the German Research Association (DFG) within the Collaborative Research Center SFB 876 Providing Information by Resource-Constrained Analysis, projects A6 and B2.
Our final optimized algorithm is given in Algorithm 1:
Since is permutation equivariant, it holds for any node feature matrix that . With and , it follows that
Hence, it shows that for any node , resulting in . ∎
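The permutation equivariance used in this argument can be checked numerically. The sketch below uses a weight-free sum aggregation as a hypothetical stand-in for the GNN; the names `sum_layer` and `permute_graph` and the toy graph are assumptions. It verifies that relabeling the nodes before applying the layer gives the same result as relabeling the layer output.

```python
def sum_layer(features, neighbors):
    """h_i = x_i + sum over j in N(i) of x_j, a weight-free stand-in for a GNN layer."""
    return [
        [x + sum(features[j][d] for j in neighbors[i]) for d, x in enumerate(row)]
        for i, row in enumerate(features)
    ]


def permute_graph(features, neighbors, perm):
    """Relabel nodes so that new node i corresponds to old node perm[i]."""
    inv = {old: new for new, old in enumerate(perm)}
    new_features = [features[old] for old in perm]
    new_neighbors = [[inv[j] for j in neighbors[old]] for old in perm]
    return new_features, new_neighbors


features = [[1.0, 2.0], [0.0, 1.0], [3.0, 0.0], [2.0, 2.0]]
neighbors = [[1, 2], [0], [0, 3], [2]]  # undirected path 1 - 0 - 2 - 3
perm = [2, 0, 3, 1]

out = sum_layer(features, neighbors)
p_features, p_neighbors = permute_graph(features, neighbors, perm)
out_perm = sum_layer(p_features, p_neighbors)

# Equivariance: the layer applied to the relabeled graph equals the relabeled layer output.
is_equivariant = out_perm == [out[old] for old in perm]
```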
Let be . Then, the -layered GNN maps both -hop neighborhoods around nodes and to the same vectorial representation: