Source code and datasets for IJCAI 2019 paper: Relation-Aware Entity Alignment for Heterogeneous Knowledge Graphs.
Entity alignment is the task of linking entities with the same real-world identity from different knowledge graphs (KGs), which has been recently dominated by embedding-based methods. Such approaches work by learning KG representations so that entity alignment can be performed by measuring the similarities between entity embeddings. While promising, prior works in the field often fail to properly capture complex relation information that commonly exists in multi-relational KGs, leaving much room for improvement. In this paper, we propose a novel Relation-aware Dual-Graph Convolutional Network (RDGCN) to incorporate relation information via attentive interactions between the knowledge graph and its dual relation counterpart, and further capture neighboring structures to learn better entity representations. Experiments on three real-world cross-lingual datasets show that our approach delivers better and more robust results over the state-of-the-art alignment methods by learning better KG representations.READ FULL TEXT VIEW PDF
Entity alignment is a viable means for integrating heterogeneous knowled...
Learning knowledge graph (KG) embeddings has received increasing attenti...
We study the problem of embedding-based entity alignment between knowled...
Research on link prediction in knowledge graphs has mainly focused on st...
In this work, we focus on the problem of entity alignment in Knowledge G...
The Entity Linking (EL) approaches have been a long-standing research fi...
Representation Learning of words and Knowledge Graphs (KG) into low
Source code and datasets for IJCAI 2019 paper: Relation-Aware Entity Alignment for Heterogeneous Knowledge Graphs.
Knowledge graphs (KGs) are the building blocks for various AI applications like question-answering [Zhang et al.2018], text classification [Wang et al.2016], recommendation systems [Zhang et al.2016], etc. Knowledge in KGs is usually organized into triples of head entity, relation, tail entity
. There are considerable works on knowledge representation learning to construct distributed representations for both entities and relations. Exemplary works are the so calledtrans-family methods like TransE [Bordes et al.2013], TransH [Wang et al.2014], and PTransE [Lin et al.2015], which interpret a relation as the translation operating on the embeddings of its head entity and tail entity.
However, KGs are usually incomplete, and different KGs are often complementary to each other. This makes a compelling case to design a technique that can integrate heterogeneous knowledge among different KGs. An effective way for doing this is Entity Alignment. There have been existing efforts devoted to embed different KGs towards entity alignment. Most of them, like JE [Hao et al.2016], MTransE [Chen et al.2017], JAPE [Sun et al.2017], IPTransE [Zhu et al.2017] and BootEA [Sun et al.2018], rely on trans-family models to learn entity representations according to a set of prior alignments. The most recent work [Wang et al.2018], takes a different approach by utilizing the Graph Convolutional Networks (GCNs) [Kipf and Welling2017] to jointly represent multiple KG entities, showing a new, promising direction for entity alignment.
Compared with conventional feature based methods [Sarasua et al.2012, Mahdisoltani et al.2013], embedding-based methods have the advantage of requiring less human involvement in feature construction and can be scaled to large KGs. However, there are still several hurdles that prevent wider adoption of embedding-based approaches. First, as mentioned above, most existing methods use trans-family models as the backbone to embed KGs, which are constrained by the assumption . This strong assumption makes it inefficient for the model to capture more complex relation information in multi-relational graphs.
As a motivation example, Figure 1 shows a real-world example from the DBP15K [Sun et al.2017] dataset. Prior study [Li et al.2018b] shows that trans-family methods cannot capture the triangular structures depicted in the diagram. For instance, for the structure of Figure 1(a), TransE requires the three formulas , and to hold at the same time. However, to satisfy the former two equations, we would have , which is contradictory to the third equation . Accordingly, the alignment performance will inevitably be compromised if the KG representations are learned with the trans-family, since more complex structures such as triangular ones frequently appear in multi-relational graphs.
The GCN-based model [Wang et al.2018] represents a leap forward for embedding-based entity alignment. However, this approach is also unable to properly model relation information. Since the vanilla GCN operates on the undirected and unlabeled graphs, a GCN-based model would ignore the useful relation information of KGs. Although the Relational Graph Convolutional Networks (R-GCNs) [Schlichtkrull et al.2018] could be used to model multi-relational graphs, an R-GCN simply employs one weight matrix for each relation and would require an excessive set of parameters for real-world KGs that often contain thousands of relations. This drawback makes it difficult to learn an effective R-GCN model. Dual-Primal Graph CNN (DPGCNN) [Monti et al.2018] offers a new solution for the problem. DPGCNN alternates convolution operations on the graph and its dual graph, whose vertices correspond to the edges of the original graph, and iteratively applies a graph attention mechanism to enhance primal edge representations using its dual graph. Compared with GCNs and R-GCNs, the DPGCNN can better explore complex edge structures and produce better KG representations.
Inspired by the DPGCNN, in this paper, we propose a novel Relation-aware Dual-Graph Convolutional Network (RDGCN) to tackle the challenge of proper capturing and integration for relation information. While the DPGCNN serves a good starting point, applying it to learn KG representations is not trivial. Doing so requires us to find a way to better approximate relation representations and characterize the relationship between different KG relations. We address this by extending the DPGCNN to develop a weighted model, and explore the head/tail representations initialized with entity names as a proxy to capture relation information without excessive model parameters that are often hard to train.
As a departure from GCNs and R-GCNs, our RDGCN allows multiple rounds of interactions between the primal entity graph and its dual relation graph, enabling the model to effectively incorporate more complex relation information into entity representations. To further integrate neighboring structural information, we also extend GCNs with highway gates.
We evaluate our RDGCN on three real-world datasets. Experimental results show that RDGCN can effectively address the challenges mentioned above and significantly outperforms 6 recently proposed approaches on all datasets. The key contribution of this work is a novel DPGCNN-based model for learning robust KG representations. Our work is the first to extend DPGCNN for entity alignment, which yields significantly better performance over the state-of-the-art alternatives.
Recently, there has been an increasing interest in extending neural networks to deal with graphs. There have been many encouraging works which are often categorized as spectral approaches[Bruna et al.2014, Defferrard et al.2016, Kipf and Welling2017] and spatial approaches [Atwood and Towsley2016, Hamilton et al.2017, Veličković et al.2018]. The GCNs [Kipf and Welling2017]
have recently emerged as a powerful deep learning-based approach for many NLP tasks like semantic role labeling[Marcheggiani and Titov2017]Bastings et al.2017]. Furthermore, as an extension of GCNs, the R-GCNs [Schlichtkrull et al.2018] have recently been proposed to model relational data and have been successfully exploited in link prediction and entity classification. Recently, the graph attention networks (GATs) [Veličković et al.2018] have been proposed and achieved state-of-the-art performance. The DPGCNN [Monti et al.2018] discussed in Section 1 generalizes GAT model and achieves better performance on vertex classification, link prediction, and graph-guided matrix completion tasks.
Inspired by the capability of DPGCNN on determining neighborhood-aware edge features, we propose the first relation-aware multi-graph learning framework for entity alignment.
Previous approaches of entity alignment usually require intensive expert participation [Sarasua et al.2012] to design model features [Mahdisoltani et al.2013] or an external source contributed by other users [Wang et al.2017]. Recently, embedding-based methods [Hao et al.2016, Chen et al.2017, Sun et al.2017, Zhu et al.2017, Sun et al.2018, Wang et al.2018] have been proposed to address this issue. In addition, NTAM [Li et al.2018a] is a non-translational approach that utilizes a probabilistic model for the alignment task. KDCoE [Chen et al.2018]
is a semi-supervised learning approach for co-training multilingual KG embeddings and the embeddings of entity descriptions.
As a departure from prior work, our approach directly models the relation information by constructing the dual relation graph. As we will show later in the paper, doing so improves the learned entity embeddings which in turn lead to more accurate alignment.
Formally, a KG is represented as , where are the sets of entities, relations and triples, respectively. Let and be two heterogeneous KGs to be aligned. That is, an entity in may have its counterpart in in a different language or in different surface names. As a starting point, we can collect a small number of equivalent entity pairs between and as the alignment seeds . We define the entity alignment task as automatically finding more equivalent entities using the alignment seeds. Those known aligned entity pairs can be used as training data.
In order to better incorporate relation information to the entity representations, given the input KG (i.e., the primal graph), we first construct its dual relation graph whose vertices denote the relations in the original primal graph, and then, we utilize a graph attention mechanism to encourage interactions between the dual relation graph and the primal graph. The resulting vertex representations in primal graph are then fed to GCN [Kipf and Welling2017] layers with highway gates to capture the neighboring structural information. The final entity representations will be used to determine whether two entities should be aligned. Figure 2 provides an overview architecture of our model.
Without loss of generality, we put and together as the primal graph , where the vertex set is the union of all entities in and , and the edge set is the union of all edges/triples in and . Note that we do not connect the alignment seeds in , thus and are disconnected in .
Given the primal graph , its dual relation graph is constructed as follows: 1) for each type of relation in , there will be a vertex in , thus ; 2) if two relations, and , share the same head or tail entities in , then we create an edge in connecting and .
Different from the original design of dual graph, here we expect the dual relation graph to be more expressive about the relationship between different s in . We thus weight each edge in with a weight according to how likely the two relations and share similar heads or tails in , computed as:
where and are the sets of head and tail entities for relation in respectively. Here, the overhead for constructing the dual graph is proportional to the number of relation types in the primal graph. In our cases, it takes less than two minutes to construct the graphs for each evaluation dataset.
Our goal of introducing dual relation graph is to better incorporate relation information into the primal graph representations. To this end, we propose to apply a graph attention mechanism (GAT) to obtain vertex representations for the dual relation graph and the primal graph iteratively, where the attention mechanism helps to prompt interactions between the two graphs. Each dual-primal interaction contains two layers, the dual attention layer and the primal attention layer. Note that we can stack multiple interactions for mutual improvement on both graphs.
Let denote the input dual vertex representation matrix, where each row corresponds to a vertex in the dual relation graph . Different from the vanilla GAT [Veličković et al.2018], we compute the dual attention scores using the primal vertex features (computed by Eq. 8) produced by the primal attention layer from previous interaction module:
where denotes the -dimensional output representation at dual vertex (corresponding to relation ); denotes the dual representation of vertex ; is the set of neighbor indices of ; is the dual attention score; is a fully connected layer mapping the -dimensional input into a scalar;is the Leaky ReLU; is the concatenation operation; is the relation representation for relation in obtained from the previous primal attention layer.
Note that within our graph embedding based framework, we are not able to provide relation representations directly, due to limited training data. We thus approximate the relation representation for by concatenating its averaged head and tail entity representations in as:
where and are the output representations of the -th head entity and -th tail entity of relation from the previous primal attention layer.
A special case is when the current dual attention layer is the first layer of our model, we do not have in Eq. 3 produced by the previous dual attention layer, therefore, use an initial dual vertex representation produced by Eq. 5 with the initial primal vertex representations . Similarly, will be obtained with the initial primal as well.
In this layer, when applying GAT on the primal graph, we can compute the primal attention scores using the dual vertex representations in , which actually correspond to the relations in the primal graph . In this way, we are able to influence the primal vertex embeddings using the relation representations produced by the dual attention layer.
Specifically, we use to denote the input primal vertex representation matrix. For an entity in primal graph , its representation can be computed by:
where denotes the dual representation for (the relation between entity and ) obtained from ; is the primal attention score; is the set of neighbor indices of entity in ; is a fully connected layer mapping the -dimensional input into a scalar and is the primal layer activation function.
In our model, the initial representation matrix for the primal vertices, , can be initialized using entity names, which provide important evidence for entity alignment. We therefore preserve the evidence explicitly by mixing the initial representations with the output of primal attention layer:
where denotes the final output representation of the interaction module for entity in ; is a weighting parameter for the -th primal attention layer.
After multiple rounds of interaction between the dual relation graph and the primal graph, we are able to collect relation-aware entity representations from the primal graph. Next, we apply two-layer GCNs [Kipf and Welling2017] with highway gates to the resulting primal graph to further incorporating evidence from their neighboring structures.
In each GCN layer with entity representations as input, the output representations can be computed as:
where is the adjacency matrix of the primal graph with added self-connections and
is an identity matrix;and is a layer-specific trainable weight matrix; is the activation function ReLU. We treat as an undirected graph when constructing , in order to allow the information to flow in both directions.
In addition, to control the noise accumulated across layers and preserve useful relation information learned from interactions, following the method described in [Rahimi et al.2018], we introduce layer-wise gates between GCN layers, which is similar in spirit to the highway networks [Srivastava et al.2015]:
where is the input to layer ;
is a sigmoid function;is element-wise multiplication; and
are the weight matrix and bias vector for the transform gate.
With the final entity representations collected from the output of GCN layers, entity alignment can be performed by simply measuring the distance between two entities. Specifically, the distance, , between two entities, from and from can be calculated as:
For training, we expect the distance between aligned entity pairs to be as close as possible, and the distance between negative entity pairs to be as far as possible. We thus utilize a margin-based scoring function as the training objective:
where is a margin hyper-parameter; is our alignment seeds and is the set of negative instances.
Rather than random sampling, we look for challenging negative samples to train our model. Given a positive aligned pair , we choose the -nearest entities of (or ) according to Eq. 12 in the embedding space to replace (or ) as the negative instances.
We evaluate our approach on three large-scale cross-lingual datasets from DBP15K [Sun et al.2017]. These datasets are built upon Chinese, English, Japanese and French versions of DBpedia. Each dataset contains data from two KGs in different languages and provides 15K pre-aligned entity pairs. Table 1 gives the statistics of the datasets. We use the same training/testing split with previous works [Sun et al.2018], 30% for training and 70% for testing. Our source code and datasets are freely available online 222https://github.com/StephanieWyt/RDGCN.
We compare our approach against 6 more recent alignment methods that we have mentioned in Section 1: JE [Hao et al.2016], MTransE [Chen et al.2017], JAPE [Sun et al.2017], IPTransE [Zhu et al.2017], BootEA [Sun et al.2018] and GCN [Wang et al.2018], where the BootEA achieves the best performance on DBP15K.
To evaluate different components of our model, we provide four implementation variants of RDGCN for ablation studies, including (1) GCN-s: a two-layered GCN with entity name initialization but no highway gates; (2) R-GCN-s: a two-layered R-GCN [Schlichtkrull et al.2018] with entity name initialization; (3) HGCN-s: a two-layered GCN with entity name initialization and highway gates; (4) RD: an implementation of two dual-primal interaction modules, but without the subsequent GCN layers.
The configuration we used is: , , and
. The dimensions of hidden representations in dual and primal attention layers are, , and . All dimensions of hidden representations in GCN layers are 300. The learning rate is set to 0.001 and we sample
negative pairs every 10 epochs. In order to utilize entity names in different KGs for better initialization, we use Google Translate to translate Chinese, Japanese, and French entity names into English, and then use pre-trained English word vectorsglove.840B.300d 333http://nlp.stanford.edu/projects/glove/ to construct the input entity representations for the primal graph. Note that Google Translate can not guarantee accurate translations for named entities without any context. We manually check 100 English translations for Japanese/Chinese entity names, and find around 20% of English translations as incorrect, posing further challenges for our model.
Table 2 shows the performance of all compared approaches on the evaluation datasets. By using a bootstrapping process to iteratively explore many unlabeled data, BootEA gives the best Hits@10 score on DBP15K and clearly outperforms GCN and other translation-based models. It is not surprising that GCN outperforms most translation-based models, i.e., JE, MTransE, JAPE and IPTransE. By performing graph convolution over an entity’s neighbors, GCN is able to capture more structural characteristics of knowledge graphs, especially when using more GCN layers, while the translation assumption in translation-based models focuses more on the relationship among heads, tails and relations.
We observe that RDGCN gives the best performance across all metrics and datasets, except for Hits@10 on DBP15K where the performance of RDGCN is second to BootEA with a marginally lower score (84.55 vs 84.75). While BootEA serves a strong baseline by showing what can be achieved by exploiting many unlabeled data, our RDGCN has the advantage of requiring less prior alignment data to learn better representations. We believe that a bootstrapping process can further improve the performance of RDGCN, and we leave this for future work. Later in Section 6.3, we show that RDGCN maintains consistent performance and significantly outperforms BootEA when the training dataset size is reduced. The good performance of RDGCN is largely attributed to its capability for learning relation-aware embeddings.
As shown in Table 2, GCN-s considerably improves GCN in all datasets, resulting in a 17.2% increase on Hits@1 on DBP15K. As mentioned in Section 5, the three cross-lingual datasets require us to handle cross-lingual data through rough machine translations, which is likely to introduce lots of noise (80% accuracy in our pilot study). But our improvement over GCN shows that although noisy in nature, those rough translations can still provide useful evidence to capture, thus should not be ignored.
R-GCN is an extension of GCN by explicitly modeling the KG relations, but in our experiments, we observe that GCN-s achieves better performance than R-GCN-s on all datasets. As discussed in Section 1, R-GCN usually requires much more training data to learn an effective model due to its large number of parameters, and the available training data in our evaluation might not be sufficient for fully unlocking the potential of R-GCN.
Comparing HGCN-s with GCN-s, we can see that HGCN-s greatly boosts the performance of GCN-s after employing the layer-wise highway gates, e.g., over 30% improvement of Hits@1 on DBP15K. This is mainly due to their capability of preventing noisy vertices from driving the KG representations.
When comparing HGCN-s with RDGCN, we can see that the dual-primal interaction modules are crucial to the performance: removing the dual and primal attention layers leads to a drop of 1.1% on Hits@1 and 2.02% on Hits@10 on DBP15K. The interaction modules explore the relation characteristics of KGs by introducing the approximate relation information and fully integrate the relation and entity information after multiple interactions between the dual relation graph and the primal graph. The results show that effective modeling and use of relation information is beneficial for entity alignment.
Comparing RD with RDGCN, there is a significant drop in performance when removing the GCN layers from our model, e.g., the Hits@1 of RD and RDGCN differ by 8.94% on DBP15K. This is not surprising, because the dual-primal graph interactions are designed to integrate KG relation information, while the GCN layers can effectively capture the neighboring structural information of KGs. These two key components are, to some extent, complementary to each other, and should be combined together to learn better relation-aware representations.
Figure 3(d) shows the performance of RDGCN and BootEA, the state-of-the-art alignment model, on the testing instances with triangular structures. We can see that the alignment accuracy of our RDGCN for entities with triangular structures is significantly higher than that of BootEA in all three datasets, showing that RDGCN can better deal with the complex relation information.
We further compare our RDGCN with BootEA by varying the proportion of pre-aligned entities from 10% to 40% with a step of 10%. As expected, the results of both models on all three datasets gradually improve with an increased amount of prior alignment information. According to Figure 3(a-c), our RDGCN consistently outperforms BootEA, and seems to be insensitive to the proportion of prior alignments. When only using 10% of the pre-aligned entity pairs as training data, RDGCN still achieves promising results. For example, RDGCN using 10% of prior alignments achieves 86.35% for Hits@1 on DBP15K. This result translates to a 17.79% higher Hits@1 score over BootEA when BootEA uses 40% of prior alignments. These results further confirm the robustness of our model, especially with limited prior alignments.
Figure 4 shows an example in DBP15K and the target entity pair, and , should not be aligned. The competitive translation-based models, including BootEA, give lower distance scores for and , suggesting that these two entities should be aligned. This is because those models fail to address the specific relation information associated with the three aligned neighboring entities. For this example, both and indicate the person Chiang_Ching-kuo, but has the relation parents with , while has the relation children with . Utilizing such information, a better alignment model should produce a larger distance score for the two entities despite they have similar neighbors. By carefully considering the relation information during the dual-primal interactions, our RDGCN gives a larger distance score, leading to the correct alignment result.
This paper presents a novel Relation-aware Dual-Graph Convolutional Network for entity alignment over heterogeneous KGs. Our approach is designed to explore complex relation information that commonly exists in multi-relational KGs. By modeling the attentive interactions between the primal graph and dual relation graph, our model is able to incorporate relation information with neighboring structural information through gated GCN layers, and learn better entity representations for alignment. Compared to the state-of-the-art methods, our model uses less training data but achieves the best alignment performance across three real-world datasets.
This work is supported in part by the NSFC (Grant No. 61672057, 61672058, 61872294), the National Hi-Tech R&D Program of China (No. 2018YFC0831905), and a UK Royal Society International Collaboration Grant (IE161012). For any correspondence, please contact Yansong Feng.
Diffusion-convolutional neural networks.In NIPS, pages 1993–2001, 2016.
Knowledge graph embedding by translating on hyperplanes.In AAAI, 2014.