Recurrent Skipping Networks for Entity Alignment

by Lingbing Guo et al.
Nanjing University

We consider the problem of learning knowledge graph (KG) embeddings for entity alignment (EA). Current methods use embedding models that mainly focus on triple-level learning and lack the ability to capture long-term dependencies existing in KGs. Consequently, embedding-based EA methods rely heavily on the amount of prior (known) alignment, because the identity information in the prior alignment cannot be efficiently propagated from one KG to another. In this paper, we propose RSN4EA (recurrent skipping networks for EA), which leverages biased random walk sampling for generating long paths across KGs and models the paths with a novel recurrent skipping network (RSN). RSN integrates the conventional recurrent neural network (RNN) with residual learning and can largely improve the convergence speed and performance with only a few more parameters. We evaluated RSN4EA on a series of datasets constructed from real-world KGs. Our experimental results showed that it outperformed a number of state-of-the-art embedding-based EA methods and also achieved comparable performance for KG completion.



Knowledge graphs (KGs) have become one of the most important resources for many areas, e.g., question answering and recommendation. Many KGs are created and maintained by different parties and in various languages, which makes them inevitably heterogeneous. Entity alignment (EA) aims to address this problem. It finds entities in two KGs referring to the same real-world object.

Recently, a number of methods have started to leverage representation learning techniques for EA [Chen et al.2017, Sun, Hu, and Li2017, Sun et al.2018, Chen et al.2018]. Most of them are based on a classical KG embedding model called TransE [Bordes et al.2013], which interprets each triple $(s, r, o)$ in a KG as $\mathbf{s} + \mathbf{r} \approx \mathbf{o}$, where $s$ and $o$ denote the subject and object entities, respectively, and $r$ denotes the relation label between them. However, these methods may have difficulty modeling multi-relational triples [Lin et al.2015a]. Moreover, they only concern triple-level embeddings, i.e., they train a triple using only the embeddings of $s$, $r$ and $o$. Although the information of multi-hop neighbors can be passed during several rounds of mini-batches via back propagation [Wang et al.2017], the efficiency is severely affected, especially when the information has to cross KGs. A path-based method, IPTransE [Zhu et al.2017], tries to learn inferences among relations, but it still concentrates on triple-level embedding learning. The long-term dependencies of entities are thus ignored by the current methods. For EA, triple-level embedding learning limits the propagation of identity information across KGs, especially for entities that are not well connected with other entities or are far away from the entities in prior alignment (i.e., entity alignment known ahead of time). Also, because triple-level learning only uses the triples involved in prior alignment to deliver information across KGs, the current methods rely heavily on the amount of prior alignment.

KGs can be regarded as multi-relational graphs and triples are just paths of length 1. If a KG embedding model is capable of being aware of the associations among entities in long paths, the trained embeddings would contain much richer information and thus help EA. However, none of the current EA methods takes modeling KG paths into consideration. To model KG paths, there exist two challenges that need to be solved. The first one is how to obtain these paths. A KG may have millions (even billions) of triples and the number of its paths is also huge. It is difficult, if not impossible, to use all of them for training. The second challenge is how to model these paths. The edges in the paths have labels and directions. We cannot simply ignore them when modeling the dependencies among entities.

In this paper, we propose a new method, called RSN4EA (recurrent skipping networks for EA), which employs random walk sampling to efficiently sample paths across KGs, and models the paths with a novel recurrent skipping network (RSN). Studies in network representation learning [Perozzi, Al-Rfou, and Skiena2014, Grover and Leskovec2016] show that an appropriate sampling method reduces computational complexity and often brings good performance, so sampling paths from KGs is also worth exploring. Compared with networks, which typically consider edges without labels or directions, KGs have more complex graph structures. Furthermore, our problem requires propagating identity information through paths across KGs. To deal with these issues, we design a biased random walk sampling method to smoothly control the depth and cross-KG biases of the generated paths.

To model paths or sentences, Skip-gram [Mikolov, Yih, and Zweig2013] is widely used in the natural language processing area. It can efficiently encode neighboring information into embeddings, which is important for discovering clusters or communities of related nodes (words). However, Skip-gram does not consider the order of nodes, while relations in KGs have different directions and a large number of labels. The recurrent neural network (RNN) is a popular sequential model. It assumes that the next element only depends on the current input and the previous hidden state. However, this assumption is inadequate for KG path modeling. Take the path $e_1 \xrightarrow{r_1} e_2 \xrightarrow{r_2} e_3$ for example: RNN uses the input $r_2$ and the previous hidden state to infer $e_3$. However, all the context of $r_2$ is mixed in that hidden state, which overlooks the importance of $e_2$. Note that this path is also constituted by two triples. To predict the object entity of $(e_2, r_2, ?)$, both $e_2$ and $r_2$ should be weighted more heavily than the other elements. To achieve this, we combine the idea of residual learning [He et al.2016] with RNN to let the output hidden state of $r_2$ learn a residual between the subject $e_2$ and the desired prediction $e_3$, which leads to our recurrent skipping network (RSN).

To evaluate RSN4EA, we built a series of datasets from real-world KGs. Previous work did not carefully consider the density and degree distributions of its datasets, which makes the datasets used in those experiments much denser than the original KGs. Also, their sampling methods are not clearly described. In this paper, we created four pairs of datasets, which were sampled with a reliable method and cover mono-/cross-lingual scenarios and normal/high density.

The main contributions of this paper are listed below:

  • We propose RSN4EA, an end-to-end framework for EA, which is capable of capturing long-term dependencies existing in KGs.

  • We design a biased random walk sampling method specific to EA, which generates desired paths with controllable depth and cross-KG biases.

  • To address the limitations of RNN for KG path modeling, we present RSN, which leverages the idea of residual learning and can largely improve the convergence speed and performance.

  • To demonstrate the feasibility of our method, we carried out EA experiments on datasets with different densities and languages. The results showed that our method consistently outperformed the existing methods. Also, RSN4EA achieved comparable performance for KG completion.

Related Work

We divide the related work into three areas: KG representation learning, embedding-based EA and network representation learning. We discuss them in the rest of this section.

KG Representation Learning

KG representation learning has been widely studied in recent years [Wang et al.2017]. One of the most famous translational methods is TransE [Bordes et al.2013], which models a triple $(s, r, o)$ as $\mathbf{s} + \mathbf{r} \approx \mathbf{o}$. TransE works well for one-to-one relationships, but fails to model more complex relationships like one-to-many and many-to-many. TransR [Lin et al.2015a] tries to solve this problem by introducing a relation-specific matrix $\mathbf{M}_r$ to project $\mathbf{s}$ and $\mathbf{o}$ by $\mathbf{s}\mathbf{M}_r$ and $\mathbf{o}\mathbf{M}_r$. PTransE [Lin et al.2015b] leverages path information to learn inferences among relations. For example, if there exist two triples $(e_1, r_1, e_2)$ and $(e_2, r_2, e_3)$, which form a path in the KG, and another triple $(e_1, r_3, e_3)$ holds simultaneously, PTransE models the path information by learning $\mathbf{r}_1 \circ \mathbf{r}_2 \approx \mathbf{r}_3$, where $\circ$ denotes the operator used to merge $\mathbf{r}_1$ and $\mathbf{r}_2$. KG completion is the most prevalent task for KG representation learning, and there also exist some non-translational methods that are particularly tailored for KG completion [Trouillon et al.2016, Dettmers et al.2018].
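To make the translational scoring functions above concrete, here is a minimal NumPy sketch. It assumes the L2 distance as the dissimilarity, row-vector embeddings (so TransR projects with `s @ M_r`), and element-wise addition as PTransE's merge operator, which is one of several composition choices; the function names are hypothetical helpers, not code from the cited papers.

```python
import numpy as np

def transe_score(s, r, o):
    """TransE dissimilarity ||s + r - o||; lower means a more plausible triple."""
    return float(np.linalg.norm(s + r - o))

def transr_score(s, r, o, M_r):
    """TransR: project entities into the relation space with M_r, then translate by r."""
    return float(np.linalg.norm(s @ M_r + r - o @ M_r))

def ptranse_path_score(r1, r2, r3):
    """PTransE with addition as the merge operator: r1 + r2 should approximate r3."""
    return float(np.linalg.norm((r1 + r2) - r3))

# Toy example with 2-d embeddings.
s = np.array([1.0, 0.0])
r = np.array([0.0, 1.0])
print(transe_score(s, r, s + r))  # a perfect translation scores 0.0
print(transe_score(s, r, s) > 0)  # a wrong object scores higher: True
```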

Embedding-based Entity Alignment

Existing embedding-based EA methods are usually based on TransE. Specifically, MTransE [Chen et al.2017] separately trains the entity embeddings of two KGs and learns various transformations to align the embeddings. JAPE [Sun, Hu, and Li2017] is also based on TransE but learns the embeddings of two KGs in a unified space. Additionally, JAPE leverages attributes to refine entity embeddings. IPTransE [Zhu et al.2017] employs an iterative process on the original PTransE [Lin et al.2015b] for EA. Different from our method, it still concentrates on triple-level learning and does not consider the dependencies among entities in KG paths. BootEA [Sun et al.2018] takes bootstrapping into consideration and uses a sophisticated strategy to update alignment during iterations. KDCoE [Chen et al.2018] leverages co-training to separately train on entity relations and entity descriptions. As with bootstrapping, propagating alignment between the two views may introduce errors. Moreover, it requires extra resources like pre-trained multilingual word embeddings and descriptions.

Because all the aforementioned methods use TransE-like models as the basic model, they are not capable of capturing long-term dependencies in KGs and the identity information propagating between different KGs is also limited.

Network Representation Learning

DeepWalk [Perozzi, Al-Rfou, and Skiena2014] is one of the most well-known models in the network representation learning area. It uses uniform random walks to sample paths in a network, and applies Skip-Gram [Mikolov, Yih, and Zweig2013] to model the generated paths. Skip-Gram learns the embedding of a node by maximizing the probabilities of its neighbors, which captures the information among the nodes. node2vec [Grover and Leskovec2016] proposes biased random walks to refine the process of sampling paths from a network. It smoothly controls the node selection strategy to make the random walks explore neighbors in a breadth-first-search as well as a depth-first-search fashion. In this paper, the proposed EA-specific random walk sampling is inspired by node2vec, but concentrates on generating long and cross-KG paths.

The methods in the network representation learning area mainly focus on discovering clusters or communities of related nodes. However, they are not directly applicable to EA, since EA requires identifying aligned entities across two KGs.

Method Overview

A KG is defined as a directed multi-relational graph whose nodes correspond to entities and whose edges are of the form $(s, r, o)$ (denoted as $s \xrightarrow{r} o$), each of which indicates that there exists a relation of name $r$ between the entities $s$ and $o$.

EA is the task of finding entities in two KGs that refer to the same real-world object. In many cases (e.g., Linked Open Data), a subset of aligned entities, called prior alignment, is known as training data. Based on it, many existing methods, such as [Zhu et al.2017, Sun, Hu, and Li2017, Sun et al.2018], merge the two KGs into a connected joint graph and learn entity embeddings on it.

Figure 1 illustrates the architecture of our method, which accepts two KGs as input and adopts an end-to-end framework for aligning the entities between them. The main modules in the framework are described as follows:

  • Biased random walk sampling. To leverage graph sampling for EA, we first create a joint graph between the two KGs by copying the edges of each entity in prior alignment to its aligned counterpart. Additionally, since the relation directions between entities are often arbitrary, we add a virtual reverse relation, denoted by $r^-$, for each existing relation $r$. Thus, the object entity in a triple can follow the reverse relation to reach the subject entity. Figure 1 exemplifies the joint graph of KG$_1$ and KG$_2$ with reverse relations.

    Then, we conduct biased random walk sampling on the joint graph to explore longer and cross-KG paths. We describe the details in the next section. Finally, each path, e.g., $e_1 \xrightarrow{r_1} e_2 \xrightarrow{r_2} e_3$, is converted into a KG sequence $(e_1, r_1, e_2, r_2, e_3)$ and fed to the next module.

  • Recurrent skipping network (RSN).

    RNN is a natural and flexible model for sequential data. However, it is not aware of the different element types (“entity” vs. “relation”) in KG sequences or of the basic KG structural units (i.e., triples). To cope with these issues, we propose RSN, which distinguishes entities from relations, and leverages the idea of residual learning by letting a subject entity skip its connection to directly participate in the object entity prediction. We present RSN in detail later. Each output of RSN is passed to the type-based noise contrastive estimation (NCE) for learning to predict the next element.

  • Type-based noise contrastive estimation. NCE [Gutmann and Hyvärinen2010] is a very popular estimation method in natural language processing, which samples a small number of negative classes to approximate the normalized distribution over all classes. As mentioned above, entities and relations are of different types. So, we design a type-based method to sample negative examples according to element types, and use different weight matrices and biases to calculate the logits for the two types of elements, respectively. Through back propagation, the embedding of each input element is not only learned from predicting its next element, but also associated with the other elements along the KG sequence.

  • Embedding-based EA.

    With entity embeddings from the two KGs learned in a unified space, given a source entity, its aligned target entity can be discovered by searching the nearest neighbors in this space using the cosine similarity.
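A possible way to build the joint graph described in the first module is sketched below. The exact copying rule is not spelled out, so this sketch assumes one reading: every triple touching an entity in prior alignment is duplicated with that entity replaced by its counterpart, and every edge gets a reverse edge whose relation name carries a hypothetical `^-` suffix. The function name and the adjacency-list layout are also hypothetical.

```python
from collections import defaultdict

def build_joint_graph(kg1_triples, kg2_triples, prior_alignment):
    """Build a joint adjacency list: entity -> list of (relation, neighbor).

    kg1_triples / kg2_triples: iterables of (subject, relation, object).
    prior_alignment: iterable of (kg1_entity, kg2_entity) pairs.
    """
    counterpart = {}
    for e1, e2 in prior_alignment:
        counterpart[e1] = e2
        counterpart[e2] = e1

    adj = defaultdict(list)
    seen = set()

    def add_edge(s, r, o):
        if (s, r, o) not in seen:  # avoid duplicate edges
            seen.add((s, r, o))
            adj[s].append((r, o))

    for s, r, o in list(kg1_triples) + list(kg2_triples):
        # Copy the edge to the aligned counterparts of s and/or o (if any);
        # dict.fromkeys deduplicates while keeping a deterministic order.
        for s_ in dict.fromkeys((s, counterpart.get(s, s))):
            for o_ in dict.fromkeys((o, counterpart.get(o, o))):
                add_edge(s_, r, o_)
                add_edge(o_, r + "^-", s_)  # virtual reverse relation r^-
    return dict(adj)

adj = build_joint_graph(
    [("a1", "bornIn", "b1")], [("a2", "birthPlace", "c2")], [("a1", "a2")]
)
print(adj["a2"])  # [('bornIn', 'b1'), ('birthPlace', 'c2')]
```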

Figure 1: Architecture of the proposed method
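The type-based negative sampling from the architecture above can be sketched as a per-step loss. Note that this sketch swaps in the simpler negative-sampling objective (a logistic loss on one positive and k sampled negatives) instead of full NCE with a noise-distribution correction, and it assumes negatives are drawn uniformly from the vocabulary of the target's type. All names are hypothetical.

```python
import numpy as np

def log_sigmoid(x):
    # numerically stable log(sigmoid(x)) = -log(1 + exp(-x))
    return -np.logaddexp(0.0, -x)

def type_based_nce_loss(h, target, target_type, params, k, rng):
    """Negative-sampling loss for predicting one element of a KG sequence.

    h: output vector of the sequence model at this step, shape (d,).
    target: index of the true next element within its type's vocabulary.
    target_type: "entity" or "relation".
    params: dict mapping type -> (W, b), W of shape (vocab_size, d).
    """
    W, b = params[target_type]  # type-specific weights and biases
    candidates = np.delete(np.arange(W.shape[0]), target)
    neg = rng.choice(candidates, size=k, replace=False)  # same-type negatives
    pos_logit = W[target] @ h + b[target]
    neg_logits = W[neg] @ h + b[neg]
    return float(-(log_sigmoid(pos_logit) + log_sigmoid(-neg_logits).sum()))

rng = np.random.default_rng(0)
params = {
    "entity": (np.eye(2), np.zeros(2)),
    "relation": (np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]), np.zeros(3)),
}
good = type_based_nce_loss(np.array([5.0, 0.0]), 0, "relation", params, k=2, rng=rng)
bad = type_based_nce_loss(np.array([-5.0, 0.0]), 0, "relation", params, k=2, rng=rng)
print(good < bad)  # a state pointing at the true relation gets a lower loss: True
```

Keeping separate `(W, b)` per type means an entity target is never contrasted against relations, which mirrors the type-based design described above.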

Biased Random Walk Sampling for EA

Random walks have been used as the sampling methods in network representation learning for a long time [Perozzi, Al-Rfou, and Skiena2014]. KGs share a lot of features with networks, such as large scale and sparsity. In this section, we present a biased random walk sampling method specific to EA, which can efficiently explore long and cross-KG sequences.

Random Walk Sampling

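Given a start entity $c_0$ in the joint graph, an unbiased random walk method obtains the probability distribution of next entities by the following equation:

$$
\Pr(c_{i+1} = y \mid c_i = x) =
\begin{cases}
\dfrac{\pi_{xy}}{Z}, & \text{if } \exists r: x \xrightarrow{r} y,\\
0, & \text{otherwise},
\end{cases}
\tag{1}
$$

where $c_i$ denotes the $i$-th node in this walk, starting from $c_0$. $x \xrightarrow{r} y$ denotes an arbitrary relation $r$ from the current entity $x$ to the next entity $y$. $\pi_{xy}$ is the unnormalized transition probability between $x$ and $y$, and $Z$ is the normalizing constant.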
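The unbiased transition rule above can be sketched as a short sampler. This assumes an unweighted joint graph, so every outgoing edge has the same unnormalized weight and the normalizer is the number of outgoing edges (with multi-edges, a neighbor reachable via several relations is proportionally more likely). Counting the walk length in entities is also an assumption, and `uniform_random_walk` is a hypothetical name.

```python
import random

def uniform_random_walk(adj, start, num_entities, rng):
    """Unbiased walk: at each entity, pick one outgoing edge uniformly at random.

    adj: dict entity -> list of (relation, next_entity).
    Returns a KG sequence (e_0, r_1, e_1, r_2, e_2, ...).
    """
    seq, cur = [start], start
    for _ in range(num_entities - 1):
        edges = adj.get(cur, [])
        if not edges:  # dead end: stop early
            break
        rel, nxt = rng.choice(edges)
        seq.extend([rel, nxt])
        cur = nxt
    return seq

adj = {"a": [("r1", "b")], "b": [("r2", "c"), ("r3", "a")], "c": []}
print(uniform_random_walk(adj, "a", 4, random.Random(42)))
```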

Biased Random Walk Sampling

The above random walk method chooses next entities from a uniform distribution. When KGs are modeled with triples as the basic training unit, the information of nearby entities can be updated via back propagation across different mini-batches. However, delivering the information of farther entities with triples alone is difficult and inefficient, so capturing longer paths in KGs becomes helpful.

To achieve this, we employ the second-order random walk sampling method of [Grover and Leskovec2016] and propose a depth bias to smoothly control the depths of sampled paths. Formally, given an entity $x$ in the joint graph, the depth bias between $x$'s previous entity $t$ and next entity $y$, denoted by $b_d(t, y)$, is defined as follows:

$$
b_d(t, y) =
\begin{cases}
\alpha, & \text{if } d(t, y) = 2,\\
1 - \alpha, & \text{if } d(t, y) < 2,
\end{cases}
\tag{2}
$$

where $d(t, y)$ calculates the shortest path distance between $t$ and $y$, and its value must be one of $\{0, 1, 2\}$. Hyper-parameter $\alpha \in (0, 1)$ controls the depths of random walks. To favor longer paths, we let $\alpha > 0.5$. For multi-edges, we assign them equal biases.
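Assuming the depth bias takes the value α when the candidate next entity y is two hops away from the previous entity t and 1 − α otherwise, it can be computed locally without a full shortest-path search: y is a neighbor of t's successor x, so d(t, y) can only be 0, 1 or 2. The sketch also assumes the joint graph stores a reverse edge for every edge, so checking t's outgoing neighbors suffices. Function names are hypothetical.

```python
def hop_distance(adj, t, y):
    """Shortest-path distance d(t, y) for y adjacent to a neighbor of t (0, 1 or 2).

    adj: entity -> list of (relation, neighbor); assumed to contain reverse edges.
    """
    if y == t:
        return 0
    if any(nbr == y for _, nbr in adj.get(t, [])):
        return 1
    return 2

def depth_bias(adj, t, y, alpha):
    """alpha if the walk moves two hops away from t, otherwise 1 - alpha."""
    return alpha if hop_distance(adj, t, y) == 2 else 1.0 - alpha

adj = {
    "t": [("r", "x"), ("r2", "z")],
    "x": [("r^-", "t"), ("r3", "y"), ("r4", "z")],
    "y": [("r3^-", "x")],
    "z": [("r2^-", "t"), ("r4^-", "x")],
}
# The walk came from t and now sits at x; score each candidate next entity.
print({y: depth_bias(adj, "t", y, alpha=0.8) for _, y in adj["x"]})
```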

Let us take Figure 1 as an example. Consider a random walk that just traversed edge $(t, x)$ and now resides at $x$. The walk now needs to decide on the next step, so it evaluates the transition probabilities $\pi_{xy}$ on edges $(x, y)$ leading from $x$. We set the unnormalized transition probability to $\pi_{xy} = b_d(t, y) \cdot w_{xy}$, where $w_{xy}$ is the static edge weight. In the case of unweighted graphs, $w_{xy} = 1$.

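Furthermore, specific to EA, we propose a cross-KG bias to favor paths connecting two KGs. Formally, given an entity $x$ in the joint graph, the cross-KG bias between $x$'s previous entity $t$ and next entity $y$, denoted by $b_c(t, y)$, is defined as follows:

$$
b_c(t, y) =
\begin{cases}
\beta, & \text{if } \mathrm{KG}(t) \neq \mathrm{KG}(y),\\
1 - \beta, & \text{if } \mathrm{KG}(t) = \mathrm{KG}(y),
\end{cases}
\tag{3}
$$

where $\mathrm{KG}(\cdot)$ returns the KG that an entity belongs to, and $\beta \in (0, 1)$ is a hyper-parameter controlling the preference of random walks for crossing the two KGs. To favor cross-KG paths, we let $\beta > 0.5$. Similar to the depth bias, using the previous and next entities avoids walking back and forth between only two entities in different KGs.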

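Finally, we combine $b_d$ and $b_c$ into the overall bias $b(t, y)$ and perform random walk sampling based on it:

$$
b(t, y) = b_d(t, y) \cdot b_c(t, y), \qquad \pi_{xy} = b(t, y) \cdot w_{xy}.
\tag{4}
$$

Recall the above example. According to the overall bias, the walk at $x$ most strongly prefers candidate next entities that are both two hops away from $t$ and located in a different KG from $t$. The sampled walk is then converted into a KG sequence as described in the method overview.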
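Putting the two biases together, a second-order biased walk can be sketched as follows. This is an illustration under stated assumptions: the overall bias is the product of a depth bias (α if the candidate is two hops from the previous entity, else 1 − α) and a cross-KG bias (β if the candidate lies in a different KG from the previous entity, else 1 − β), edges are unweighted, and the first step is unbiased because there is no previous entity yet. Keeping α and β strictly inside (0, 1) keeps all weights positive, which `random.choices` requires. All names are hypothetical.

```python
import random

def biased_random_walk(adj, kg_of, start, num_entities, alpha, beta, rng):
    """Second-order walk with depth and cross-KG biases (unweighted edges).

    adj: entity -> list of (relation, next_entity) over the joint graph.
    kg_of: entity -> KG identifier.
    """
    seq, prev, cur = [start], None, start
    for _ in range(num_entities - 1):
        edges = adj.get(cur, [])
        if not edges:
            break
        weights = []
        for _, y in edges:
            if prev is None:  # first step: no previous entity, no bias
                weights.append(1.0)
                continue
            near = y == prev or any(n == y for _, n in adj.get(prev, []))
            b_depth = 1.0 - alpha if near else alpha
            b_cross = beta if kg_of[y] != kg_of[prev] else 1.0 - beta
            weights.append(b_depth * b_cross)  # overall bias times w_xy = 1
        rel, nxt = rng.choices(edges, weights=weights, k=1)[0]
        seq.extend([rel, nxt])
        prev, cur = cur, nxt
    return seq

adj = {
    "t": [("r1", "x")],
    "x": [("r1^-", "t"), ("r2", "y1"), ("r3", "y2")],
    "y1": [],
    "y2": [],
}
kg_of = {"t": 1, "x": 1, "y1": 2, "y2": 1}
rng = random.Random(0)
counts = {"t": 0, "y1": 0, "y2": 0}
for _ in range(2000):
    counts[biased_random_walk(adj, kg_of, "t", 3, 0.8, 0.8, rng)[-1]] += 1
print(counts)  # y1 (two hops away, other KG) should dominate
```

With α = β = 0.8, the unnormalized weights at `x` are 0.04 for going back to `t`, 0.64 for `y1` and 0.16 for `y2`, so `y1` is chosen about three quarters of the time.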

Recurrent Skipping Networks

In this section, we first describe the conventional RNN. Then, we propose our RSN and discuss its characteristics.

Recurrent Neural Networks

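RNN is a popular class of artificial neural networks that performs well on sequential data. Given a KG sequence $(x_1, x_2, \ldots, x_T)$ as input, an RNN recurrently processes it with the following equation:

$$
\mathbf{h}_t = \tanh(\mathbf{W}_h \mathbf{h}_{t-1} + \mathbf{W}_x \mathbf{x}_t + \mathbf{b}),
\tag{5}
$$

where $\mathbf{h}_t$ is the output hidden state at time step $t$, $\mathbf{x}_t$ is the embedding of $x_t$, $\mathbf{W}_h$ and $\mathbf{W}_x$ are the weight matrices, and $\mathbf{b}$ is the bias.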
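The recurrence of a conventional RNN over a KG sequence can be sketched in a few lines of NumPy. This is an illustration only, assuming the update h_t = tanh(W_h h_{t-1} + W_x x_t + b) with a zero initial state; the implementation described in this paper uses multi-layer LSTM units instead. The names `rnn_forward`, `W_h`, `W_x` and `b` are hypothetical.

```python
import numpy as np

def rnn_forward(xs, W_h, W_x, b):
    """Run h_t = tanh(W_h h_{t-1} + W_x x_t + b) over a sequence of embeddings.

    xs: array of shape (T, d_in); returns hidden states of shape (T, d_h).
    """
    h = np.zeros(W_h.shape[0])  # assumed zero initial state
    states = []
    for x in xs:
        h = np.tanh(W_h @ h + W_x @ x + b)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
T, d_in, d_h = 5, 4, 3  # e.g. the KG sequence (e1, r1, e2, r2, e3)
hs = rnn_forward(
    rng.normal(size=(T, d_in)),
    rng.normal(size=(d_h, d_h)),
    rng.normal(size=(d_h, d_in)),
    np.zeros(d_h),
)
print(hs.shape)  # (5, 3)
```

Note that the state used to predict the element after a relation is a single vector summarizing the whole prefix, which is exactly the limitation discussed next.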

RNN is capable of using a few parameters to cope with input of any length. It has achieved state-of-the-art performance in many areas. However, there still exist a few limitations when RNN is used to process KG sequences.

First, the elements in a KG sequence are of two different types, namely “entity” and “relation”, which always appear in alternating order. However, the conventional RNN treats them as elements of the same type, like words or nodes, which makes capturing the information in KG sequences less effective.

Second, every KG sequence is constituted by triples, but these basic structural units are overlooked by RNN. Specifically, let $x_t$ denote a relation in a KG sequence and $(x_{t-1}, x_t, x_{t+1})$ denote a triple involving $x_t$. As shown in Eq. (5), to predict $x_{t+1}$, RNN combines the hidden state $\mathbf{h}_{t-1}$ and the current input $\mathbf{x}_t$, where $\mathbf{h}_{t-1}$ mixes the information of all the previous elements $x_1, \ldots, x_{t-1}$. However, the information of the subject entity $x_{t-1}$ in the triple is expected to be weighted more heavily.

Improving RNN with the Skipping Mechanism

To better model KG sequences and remedy this limitation of the conventional RNN, we propose the recurrent skipping network (RSN), which refines RNN with a simple but effective skipping mechanism.

The basic idea of RSN is to shortcut the current input entity so that it directly participates in predicting its object entity. In other words, an input element in a KG sequence whose type is “entity” not only contributes to predicting its next relation, but also directly takes part in predicting its object entity. Figure 1 shows an RSN example.

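Formally, given a KG sequence $(x_1, x_2, \ldots, x_T)$ as input, the skipping operation for an RSN is formulated as follows:

$$
\mathbf{h}'_t =
\begin{cases}
\mathbf{h}_t, & \text{if } x_t \in \mathcal{E},\\
\mathbf{S}\,[\mathbf{h}_t; \mathbf{x}_{t-1}], & \text{if } x_t \in \mathcal{R},
\end{cases}
\tag{6}
$$

where $\mathcal{E}$ and $\mathcal{R}$ denote the sets of entities and relations, $\mathbf{h}'_t$ denotes the output hidden state of the RSN at time step $t$, and $\mathbf{h}_t$ denotes the corresponding RNN output. $[\cdot;\cdot]$ denotes concatenation and $\mathbf{S}$ is the weight matrix, so $\mathbf{S}\,[\mathbf{h}_t; \mathbf{x}_{t-1}]$ is a weighted sum of $\mathbf{h}_t$ and the subject entity embedding $\mathbf{x}_{t-1}$. In this paper, we select weighted sum for the skipping operation, but other combination methods can be supported as well.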
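A minimal NumPy sketch of the skipping operation follows, built on a vanilla tanh RNN for illustration (the paper's implementation uses LSTM units). It assumes the weighted sum is parameterized by a single matrix `S` applied to the concatenation of h_t and x_{t-1}, which is equivalent to S_1 h_t + S_2 x_{t-1}; this parameterization and all names are assumptions. The skip only changes the output; the recurrent state carried forward is the plain RNN state.

```python
import numpy as np

def rsn_forward(xs, is_relation, W_h, W_x, b, S):
    """RNN recurrence plus a skip connection at relation positions.

    xs: (T, d) element embeddings of a KG sequence (entity, relation, entity, ...).
    is_relation: list of T booleans marking relation positions.
    S: (d_h, d_h + d) weight matrix applied to [h_t; x_{t-1}].
    Returns the RSN outputs h'_t, shape (T, d_h).
    """
    h = np.zeros(W_h.shape[0])
    outputs = []
    for t, x in enumerate(xs):
        h = np.tanh(W_h @ h + W_x @ x + b)  # conventional RNN state h_t
        if is_relation[t] and t > 0:
            # weighted sum of h_t and the subject entity x_{t-1}
            outputs.append(S @ np.concatenate([h, xs[t - 1]]))
        else:
            outputs.append(h)  # entities: h'_t = h_t
    return np.stack(outputs)

rng = np.random.default_rng(1)
xs = rng.normal(size=(5, 2))  # e.g. (e1, r1, e2, r2, e3)
S_pick = np.hstack([np.zeros((2, 2)), np.eye(2)])  # ignore h_t, copy x_{t-1}
out = rsn_forward(xs, [False, True, False, True, False],
                  rng.normal(size=(2, 2)), rng.normal(size=(2, 2)),
                  np.zeros(2), S_pick)
print(np.allclose(out[1], xs[0]), np.allclose(out[3], xs[2]))  # True True
```

The degenerate `S_pick` makes the skip path visible: at each relation position the output is exactly the subject entity's embedding, so a learned `S` only has to model how the object differs from that subject.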

Explanation of RSN.

Intuitively, RSN explicitly distinguishes entities from relations, and allows subject entities to skip their connections to directly participate in object entity prediction. Behind this simple skipping operation lies a deeper explanation, namely residual learning.

Let $F(x)$ be an original mapping, where $x$ denotes the input, and let $H(x)$ be the expected mapping. Compared to directly optimizing $F(x)$ to fit $H(x)$, residual learning hypothesizes that it is easier to optimize $F(x)$ to fit the residual part $H(x) - x$. In the extreme case where an identity mapping is optimal (i.e., $H(x) = x$), pushing the residual to zero would be much easier than fitting an identity mapping by a stack of nonlinear layers [He et al.2016].

Different from ResNet [He et al.2016] or recurrent residual network (RRN) [Wang and Tian2016], which were proposed to help train very deep networks, RSN employs residual learning on “shallow” networks. The skipping connections do not link the previous input to the very deep layers, but only concentrate on each triple in a KG sequence.

Specifically, given a KG sequence $(x_1, x_2, \ldots, x_T)$, where $(x_{t-1}, x_t, x_{t+1})$ forms a triple, RRN leverages residual learning by regarding the process at each time step as a mini-residual network with the previous hidden state of the RNN as input. Take time step $t$ for example: RRN regards $\mathbf{h}_{t-1}$ as input and learns the residual $H(\mathbf{h}_{t-1}) - \mathbf{h}_{t-1}$, where $H(\cdot)$ denotes the expected mapping for $\mathbf{h}_t$. It still ignores the KG structure, i.e., that $x_{t-1}$ should be weighted more heavily when predicting $x_{t+1}$.

In contrast, RSN leverages residual learning in a new way. Instead of using the input as the subtrahend ($H(\mathbf{h}_{t-1}) - \mathbf{h}_{t-1}$), it directly chooses the subject entity $\mathbf{x}_{t-1}$ as the subtrahend. Making the output hidden state $\mathbf{h}'_t$ fit $\mathbf{x}_{t+1}$ directly may be hard, but learning the residual between $\mathbf{x}_{t+1}$ and $\mathbf{x}_{t-1}$ may be easier, which is the key characteristic of RSN.

Experiments and Results

We evaluated RSN4EA for EA using a variety of real-world datasets. In this section, we report the results compared with several state-of-the-art embedding-based EA methods. Since RSN4EA is capable of learning KG embeddings, we also conducted experiments to assess its performance on KG completion [Bordes et al.2013], which is a classical task for KG representation learning.


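| Datasets | Sources | #Rel. (normal) | #Rel tr. (normal) | #Rel. (dense) | #Rel tr. (dense) |
|---|---|---|---|---|---|
| DBP-WD | DBpedia (English) | 248 | 38,256 | 219 | 67,954 |
| | Wikidata (English) | 148 | 39,605 | 137 | 76,034 |
| DBP-YG | DBpedia (English) | 219 | 33,571 | 206 | 71,257 |
| | YAGO3 (English) | 30 | 34,660 | 30 | 97,131 |
| EN-FR | DBpedia (English) | 230 | 35,139 | 218 | 71,587 |
| | DBpedia (French) | 181 | 32,827 | 171 | 66,283 |
| EN-DE | DBpedia (English) | 225 | 38,281 | 207 | 56,983 |
| | DBpedia (German) | 118 | 37,069 | 117 | 59,848 |

#Rel. and #Rel tr. denote the numbers of relations and relation triples, respectively. We also extracted the attribute triples of the sampled entities from the original KGs.
Table 1: Statistics of the datasets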

Although the datasets used by existing methods [Chen et al.2017, Sun, Hu, and Li2017, Sun et al.2018] are all sampled from real-world KGs, such as DBpedia and Wikidata, their density and degree distributions are quite different from those of the original KGs. We argue that this discrepancy may prevent a comprehensive and accurate understanding of embedding-based EA. In this paper, we propose a segment-based random PageRank (SRP) sampling method, which can smoothly control the density of sampled datasets.

Random PageRank sampling is an efficient algorithm for large graph sampling [Leskovec and Faloutsos2006]. It samples nodes according to their PageRank weights and can assign higher biases to more valuable entities. However, due to the nature of PageRank, it also favors high-degree nodes. To fulfill our requirements on KG sampling, we divided the entities in a KG into segments according to their degrees and performed sampling separately within each segment. To ensure that the degree distributions of the sampled datasets follow those of the original KGs, we used the Kolmogorov-Smirnov (K-S) test to measure the difference, and applied the same expected K-S threshold to all the datasets.
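Since SRP is only described at a high level, the sketch below fills in hypothetical choices to show the moving parts: degree segments are equal-size quantile buckets, each segment keeps the same fraction of its nodes, nodes within a segment are drawn without replacement with probability proportional to PageRank (computed by plain power iteration), and the two-sample K-S statistic is the maximum gap between empirical CDFs. A faithful check would recompute degrees inside the sampled subgraph; here `ks_statistic` simply compares any two degree samples. All function names are hypothetical.

```python
import numpy as np

def pagerank(A, damping=0.85, iters=100):
    """Power-iteration PageRank on a dense adjacency matrix (A[i, j] = 1 if i -> j)."""
    n = A.shape[0]
    out_deg = A.sum(axis=1, keepdims=True)
    # Row-stochastic transitions; dangling nodes jump uniformly.
    P = np.where(out_deg > 0, A / np.maximum(out_deg, 1), 1.0 / n)
    pr = np.full(n, 1.0 / n)
    for _ in range(iters):
        pr = (1 - damping) / n + damping * (P.T @ pr)
    return pr

def segment_pagerank_sample(degrees, pr, n_segments, frac, rng):
    """Split nodes into degree segments, then sample frac of each segment by PageRank."""
    order = np.argsort(degrees, kind="stable")
    picked = []
    for seg in np.array_split(order, n_segments):
        k = int(round(frac * len(seg)))
        if k == 0:
            continue
        p = pr[seg] / pr[seg].sum()
        picked.extend(rng.choice(seg, size=k, replace=False, p=p).tolist())
    return np.array(sorted(picked))

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return float(np.max(np.abs(cdf_a - cdf_b)))

rng = np.random.default_rng(0)
A = (rng.random((200, 200)) < 0.05).astype(float)  # toy random directed graph
np.fill_diagonal(A, 0.0)
deg = A.sum(axis=0) + A.sum(axis=1)
nodes = segment_pagerank_sample(deg, pagerank(A), n_segments=5, frac=0.5, rng=rng)
print(len(nodes), ks_statistic(deg, deg[nodes]))
```

Sampling per degree segment is what counteracts PageRank's preference for high-degree nodes: low-degree segments are guaranteed their share of the sample regardless of their PageRank mass.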

Based on the above sampling method, we obtained four pairs of datasets to evaluate the performance of the embedding-based EA methods. The detailed statistics are shown in Table 1. Each dataset contains nearly 15,000 entities. The normal datasets follow the density of the original KGs. For the dense datasets, we randomly deleted entities with low degrees in the original KGs to double the average degree, and then conducted sampling. Therefore, the dense datasets are more similar to the datasets used by the existing methods [Chen et al.2017, Sun, Hu, and Li2017, Sun et al.2018]. Figure 2 shows the degree distributions of the source KGs and of the datasets sampled by different methods; our normal datasets best represent the original KGs.

Figure 2: Degree distributions of the datasets extracted by different methods ((a) DBpedia; (b) Wikidata)
| Methods | DBP-WD (normal) Hits@1 | Hits@10 | MRR | DBP-WD (dense) Hits@1 | Hits@10 | MRR | DBP-YG (normal) Hits@1 | Hits@10 | MRR | DBP-YG (dense) Hits@1 | Hits@10 | MRR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MTransE | 22.3 | 50.1 | 0.32 | 38.9 | 68.7 | 0.49 | 24.6 | 54.0 | 0.34 | 22.8 | 51.3 | 0.32 |
| IPTransE | 23.1 | 51.7 | 0.33 | 43.5 | 74.5 | 0.54 | 22.7 | 50.0 | 0.32 | 23.6 | 51.3 | 0.33 |
| JAPE | 21.9 | 50.1 | 0.31 | 39.3 | 70.5 | 0.50 | 23.3 | 52.7 | 0.33 | 26.8 | 57.3 | 0.37 |
| KDCoE | 24.6 | 51.5 | 0.34 | 56.5 | 83.1 | 0.65 | 22.7 | 47.0 | 0.31 | 56.8 | 80.4 | 0.64 |
| BootEA | 32.3 | 63.1 | 0.42 | 67.8 | 91.2 | 0.76 | 31.3 | 62.5 | 0.42 | 68.2 | 89.8 | 0.76 |
| RSN4EA | 38.8 | 65.7 | 0.49 | 76.3 | 92.4 | 0.83 | 40.0 | 67.5 | 0.50 | 82.6 | 95.8 | 0.87 |

The best results are marked in bold; the same applies to the following tables.
Table 2: Entity alignment results on monolingual datasets
| Methods | EN-FR (normal) Hits@1 | Hits@10 | MRR | EN-FR (dense) Hits@1 | Hits@10 | MRR | EN-DE (normal) Hits@1 | Hits@10 | MRR | EN-DE (dense) Hits@1 | Hits@10 | MRR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MTransE | 25.1 | 55.1 | 0.35 | 37.7 | 70.0 | 0.49 | 31.2 | 58.6 | 0.40 | 34.7 | 62.0 | 0.44 |
| IPTransE | 25.5 | 55.7 | 0.36 | 42.9 | 78.3 | 0.55 | 31.3 | 59.2 | 0.41 | 34.0 | 63.2 | 0.44 |
| JAPE | 25.6 | 56.2 | 0.36 | 40.7 | 72.7 | 0.52 | 32.0 | 59.9 | 0.41 | 37.5 | 66.1 | 0.47 |
| KDCoE | 22.1 | 47.4 | 0.33 | 54.5 | 85.1 | 0.65 | 34.1 | 56.9 | 0.42 | 58.7 | 79.9 | 0.66 |
| BootEA | 31.3 | 62.9 | 0.42 | 64.8 | 91.9 | 0.74 | 44.2 | 70.1 | 0.53 | 66.5 | 87.1 | 0.73 |
| RSN4EA | 34.7 | 63.1 | 0.44 | 75.6 | 92.5 | 0.82 | 48.7 | 72.0 | 0.57 | 73.9 | 89.0 | 0.79 |

Table 3: Entity alignment results on cross-lingual datasets

Implementation Details

We built RSN4EA using TensorFlow. The embeddings and weight matrices were initialized with the Xavier initializer, and the embedding size was set to 256. We used a two-layer LSTM [Hochreiter and Schmidhuber1997] with Dropout [Srivastava et al.2014], and conducted batch normalization [Ioffe and Szegedy2015] for both the input and output of an RSN. We used the Adam optimizer [Kingma and Ba2015] with mini-batch size 512 and learning rate 0.003. We trained an RSN for up to 30 epochs. The random walk biases were set to favor deep and cross-KG paths, and the walk length was set to 15. The source code, datasets and results will be available online.

For the comparative methods, we used the source code provided in their papers except for KDCoE, which has not released its source code yet; we implemented KDCoE ourselves. We made our best effort to tune the hyper-parameters of each method for optimal performance. Following the previous work [Sun, Hu, and Li2017, Sun et al.2018], we used 30% of the reference alignment as prior alignment and chose Hits@1, Hits@10 and mean reciprocal rank (MRR) as evaluation metrics.
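The evaluation protocol can be sketched as follows: each source entity's counterpart is retrieved by cosine similarity in the unified embedding space, and Hits@k and MRR are computed from the rank of the gold counterpart. This is a hypothetical helper, assuming row i of the source and target matrices form a test pair and that ties are broken optimistically (the tie-breaking rule is not stated in the text).

```python
import numpy as np

def evaluate_alignment(src_emb, tgt_emb, ks=(1, 10)):
    """Row i of src_emb and row i of tgt_emb are a gold test pair.

    Returns (hits, mrr) where hits[k] is Hits@k in percent.
    """
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sim = src @ tgt.T  # cosine similarity matrix
    gold = np.diag(sim)[:, None]
    # rank = 1 + number of candidates strictly more similar than the gold one
    ranks = 1 + (sim > gold).sum(axis=1)
    hits = {k: 100.0 * float(np.mean(ranks <= k)) for k in ks}
    mrr = float(np.mean(1.0 / ranks))
    return hits, mrr

src = np.eye(3)
tgt = np.eye(3)[[1, 0, 2]]  # first two targets swapped
print(evaluate_alignment(src, tgt))
```

In the toy call, the first two source entities rank their gold counterparts second and the third ranks it first, giving Hits@1 of one third (in percent) and an MRR of 2/3.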

Results on Entity Alignment

Tables 2 and 3 depict the EA results on monolingual and cross-lingual datasets, respectively. RSN4EA outperformed the existing EA methods on every dataset and metric, which we attribute to capturing long-term dependencies by paths.

Generally, the heterogeneity between different KGs is more severe than that between different language versions of the same KG. A key step for embedding-based EA methods is to embed the information of entities in different KGs into a unified space, so aligning entities between different KGs is more difficult for them. By establishing long-term dependencies, RSN4EA captured richer information of KGs and learned more accurate embeddings, leading to more significant improvement on the more heterogeneous datasets (DBP-WD and DBP-YG).

The two tables also demonstrate that the embedding-based EA methods are sensitive to the density. The performance of all the methods on the normal datasets is significantly lower than that on the dense datasets. Although the normal datasets are more difficult, RSN4EA still showed considerable advantages compared with the other methods, since it used long paths to capture implicit connections among entities and represented them in the embeddings.

It is worth noting that RSN4EA showed larger advantages in terms of Hits@1 and MRR. This is because Hits@1 only considers completely correct results, and MRR also favors top-ranked results. As mentioned above, RSN4EA embedded the long-term dependencies into the learned embeddings, which contain richer information to help identify aligned entities in different KGs. The better performance on these two metrics supports this point.

Results on KG Completion

| Methods | Hits@1 | Hits@10 | MRR |
|---|---|---|---|
| TransE [Bordes et al.2013] | 13.3 | 40.9 | 0.22 |
| TransR [Lin et al.2015a] | 10.9 | 38.2 | 0.20 |
| ComplEx [Trouillon et al.2016] | 15.2 | 41.9 | 0.24 |
| NeuralLP [Yang, Yang, and Cohen2017] | – | 36.2 | 0.24 |
| ConvE [Dettmers et al.2018] | 23.9 | 49.1 | 0.31 |
| RSN4EA (w/o cross-KG bias) | 20.0 | 43.6 | 0.28 |

Some baseline results were obtained by executing the provided source code ourselves, because some metrics were not reported in the literature. “–” denotes unknown results, because we could not obtain the source code.
Table 4: KG completion results on FB15K-237

Since RSN4EA learns KG embeddings for EA, it is also interesting to apply it to KG completion [Bordes et al.2013], which is one of the most prevalent tasks for KG representation learning. To do so, we removed the cross-KG bias during random walk sampling and conducted the KG completion experiment. Specifically, for a triple $(s, r, o)$, KG completion aims to predict the object entity $o$ given $(s, r, ?)$ or predict the subject entity $s$ given $(?, r, o)$.

FB15K and WN18 are the most widely used benchmark datasets for KG completion [Bordes et al.2013]. However, recent studies [Toutanova and Chen2015, Dettmers et al.2018] showed that these two datasets suffer from test set leakage, as many test triples can be inferred from inverse triples in the training set. To address this issue, a new dataset called FB15K-237 was recommended, and we used this dataset to assess RSN4EA in our experiments.

The experimental results are shown in Table 4. ConvE, a method tailored to KG completion, obtained the best results on FB15K-237, followed by our RSN4EA. It is worth noting that, although predicting entities given one triple is not the primary goal of RSN4EA, it still achieved comparable or better performance than many methods focusing on KG completion, which indicates the potential of leveraging KG paths for learning embeddings.

Further Analysis

Figure 3: Hits@1 results w.r.t. epochs required by alternative networks to converge

Comparison with Alternative Networks

To assess the feasibility of RSN, we conducted experiments to compare it with RNN and RRN. Both RNN and RRN were implemented using the same multi-layer LSTM units, Dropout and batch normalization.

The comparison results are shown in Figure 3. Since RNN and RRN did not consider the structure of KG paths, their embedding learning converged very slowly. Compared with RNN, RSN achieved better performance at only a fraction of the time cost, which indicates that this particular residual structure is essential for RSN4EA. Furthermore, RRN is a generic network that incorporates residual learning into the conventional RNN, but it achieved only a small improvement over RNN. This implies that simply combining residual learning with RNN cannot significantly help KG sequence modeling.

Sensitivity to Proportion of Prior Alignment

The proportion of prior alignment may significantly influence the performance of embedding-based EA methods. However, a large amount of prior alignment may not be available in practice. We tested the performance of RSN4EA and BootEA (the second-best method in our previous experiments) with the proportion of prior alignment decreasing from 50% to 10% in steps of 10%.

Due to space limitations, we only depict the results on DBP-WD in Figure 4. The performance of both methods dropped steadily as the proportion of prior alignment decreased, but the curves of RSN4EA are gentler than those of BootEA. Specifically, on the normal dataset, over the four proportion intervals, RSN4EA lost 7.4%, 8.2%, 16.5% and 30.2% on Hits@1, respectively, while BootEA lost 11.8%, 12.0%, 22.3% and 49.8%, which shows that RSN4EA is more stable. Additionally, when the proportion dropped to 10%, the Hits@1 of RSN4EA on the normal dataset was almost twice that of BootEA, which indicates that modeling paths helps RSN4EA propagate identity information across KGs more effectively and reduces its dependence on the amount of prior alignment.
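If each reported loss is read as a relative drop in Hits@1 from one proportion to the next (an assumption; the text does not say whether the losses are relative or absolute), the four intervals compound multiplicatively. The short calculation below uses only the percentages stated above to show what fraction of its Hits@1 at 50% each method would retain at 10% under that reading.

```python
from math import prod

# Per-interval Hits@1 losses on DBP-WD (normal), as reported above,
# assumed here to be relative drops between consecutive proportions.
losses = {
    "RSN4EA": [0.074, 0.082, 0.165, 0.302],
    "BootEA": [0.118, 0.120, 0.223, 0.498],
}


def retained(interval_losses):
    """Fraction of the starting Hits@1 kept after compounding relative losses."""
    return prod(1.0 - x for x in interval_losses)


for method, xs in losses.items():
    print(f"{method}: retains {retained(xs):.1%} of its Hits@1 at 50%")
# RSN4EA: retains 49.5% of its Hits@1 at 50%
# BootEA: retains 30.3% of its Hits@1 at 50%
```

Under this reading, RSN4EA keeps roughly half of its Hits@1 while BootEA keeps less than a third, which is consistent with the gap reported at the 10% proportion.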

Sensitivity to Random Walk Length

We also examined how the random walk length affects EA performance. As shown in Figure 5, on all eight datasets, Hits@1 increased sharply as the length grew from 5 to 15, which indicates that modeling longer paths helps learn better KG embeddings. Performance then approached saturation between lengths 15 and 25. Therefore, for efficiency, the results reported in Tables 2 and 3 are based on length 15.
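For readers unfamiliar with path sampling, the sketch below generates a path by walking over triples and emitting alternating entities and relations, e.g. (e0, r0, e1, r1, e2, ...). It is a plain uniform random walk over outgoing edges, not the biased walk used by RSN4EA, and counting "length" as the number of tokens (entities plus relations) is our assumption; the helper names and toy triples are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
Adjacency = Dict[str, List[Tuple[str, str]]]


def build_adjacency(triples: List[Triple]) -> Adjacency:
    """Map each head entity to its outgoing (relation, tail) edges."""
    adj: Adjacency = defaultdict(list)
    for h, r, t in triples:
        adj[h].append((r, t))
    return adj


def random_walk(adj: Adjacency, start: str, length: int,
                rng: random.Random) -> List[str]:
    """Uniform walk returning at most `length` alternating entity/relation tokens.

    The walk stops early at an entity with no outgoing edges.
    """
    path = [start]
    current = start
    while len(path) + 2 <= length and adj.get(current):
        relation, nxt = rng.choice(adj[current])
        path.extend([relation, nxt])
        current = nxt
    return path


triples = [("a", "r1", "b"), ("b", "r2", "c"), ("c", "r3", "a"), ("b", "r4", "a")]
adj = build_adjacency(triples)
print(random_walk(adj, "a", 7, random.Random(42)))
```

Longer walks expose the model to multi-hop dependencies, but each extra step adds sequence length to every training example, which is the efficiency trade-off behind choosing length 15.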

Figure 4: Hits@1 results w.r.t. proportion of prior alignment. (a) DBP-WD (normal); (b) DBP-WD (dense).
Figure 5: Hits@1 results w.r.t. random walk length. (a) Normal datasets; (b) Dense datasets.

Conclusion and Future Work

In this paper, we proposed RSN4EA, which employs biased random walks to sample paths specific to EA and leverages RSN to learn KG embeddings. Our experimental results showed that RSN4EA outperformed existing embedding-based EA methods and that RSN outperformed the alternative RNN and RRN networks. RSN4EA also achieved comparable performance on KG completion.

In future work, we plan to continue exploring KG sequence learning. First, KGs often contain rich textual information such as names and descriptions, which can be modeled with character- or word-level sequential models. Since RSN models KGs sequentially, a unified sequential model that learns KG embeddings from all such information is worth studying. Second, in addition to paths, neighboring information provides another type of context that may also help learn KG embeddings. We plan to integrate this neighboring context to further improve performance.


  • [Bordes et al.2013] Bordes, A.; Usunier, N.; Garcia-Durán, A.; Weston, J.; and Yakhnenko, O. 2013. Translating embeddings for modeling multi-relational data. In Proc. of the 26th International Conference on Neural Information Processing Systems, 2787–2795.
  • [Chen et al.2017] Chen, M.; Tian, Y.; Yang, M.; and Zaniolo, C. 2017. Multilingual knowledge graph embeddings for cross-lingual knowledge alignment. In Proc. of the 26th International Joint Conference on Artificial Intelligence, 1511–1517.
  • [Chen et al.2018] Chen, M.; Tian, Y.; Chang, K.-W.; Skiena, S.; and Zaniolo, C. 2018. Co-training embeddings of knowledge graphs and entity descriptions for cross-lingual entity alignment. In Proc. of the 27th International Joint Conference on Artificial Intelligence, 3998–4004.
  • [Dettmers et al.2018] Dettmers, T.; Minervini, P.; Stenetorp, P.; and Riedel, S. 2018. Convolutional 2D knowledge graph embeddings. In Proc. of the 32nd AAAI Conference on Artificial Intelligence, 1811–1818.
  • [Grover and Leskovec2016] Grover, A., and Leskovec, J. 2016. node2vec: Scalable feature learning for networks. In Proc. of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 855–864.
  • [Gutmann and Hyvärinen2010] Gutmann, M., and Hyvärinen, A. 2010. Noise-contrastive estimation: A new estimation principle for unnormalized statistical models. In Proc. of the 13th International Conference on Artificial Intelligence and Statistics, 297–304.
  • [He et al.2016] He, K.; Zhang, X.; Ren, S.; and Sun, J. 2016. Deep residual learning for image recognition. In Proc. of the 29th IEEE Conference on Computer Vision and Pattern Recognition, 770–778.
  • [Hochreiter and Schmidhuber1997] Hochreiter, S., and Schmidhuber, J. 1997. Long short-term memory. Neural Computation 9(8):1735–1780.
  • [Ioffe and Szegedy2015] Ioffe, S., and Szegedy, C. 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proc. of the 32nd International Conference on Machine Learning, 448–456.
  • [Kingma and Ba2015] Kingma, D. P., and Ba, J. 2015. Adam: A method for stochastic optimization. In Proc. of the 3rd International Conference on Learning Representations.
  • [Leskovec and Faloutsos2006] Leskovec, J., and Faloutsos, C. 2006. Sampling from large graphs. In Proc. of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 631–636.
  • [Lin et al.2015a] Lin, Y.; Liu, Z.; Sun, M.; Liu, Y.; and Zhu, X. 2015a. Learning entity and relation embeddings for knowledge graph completion. In Proc. of the 29th AAAI Conference on Artificial Intelligence, 2181–2187.
  • [Lin et al.2015b] Lin, Y.; Liu, Z.; Luan, H.-B.; Sun, M.; Rao, S.; and Liu, S. 2015b. Modeling relation paths for representation learning of knowledge bases. In Proc. of the 53rd Annual Meeting of the Association for Computational Linguistics, 705–714.
  • [Mikolov, Yih, and Zweig2013] Mikolov, T.; Yih, W.; and Zweig, G. 2013. Linguistic regularities in continuous space word representations. In Proc. of the 11th Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 746–751.
  • [Perozzi, Al-Rfou, and Skiena2014] Perozzi, B.; Al-Rfou, R.; and Skiena, S. 2014. DeepWalk: Online learning of social representations. In Proc. of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 701–710.
  • [Srivastava et al.2014] Srivastava, N.; Hinton, G. E.; Krizhevsky, A.; Sutskever, I.; and Salakhutdinov, R. 2014. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research 15(1):1929–1958.
  • [Sun et al.2018] Sun, Z.; Hu, W.; Zhang, Q.; and Qu, Y. 2018. Bootstrapping entity alignment with knowledge graph embedding. In Proc. of the 27th International Joint Conference on Artificial Intelligence, 4396–4402.
  • [Sun, Hu, and Li2017] Sun, Z.; Hu, W.; and Li, C. 2017. Cross-lingual entity alignment via joint attribute-preserving embedding. In Proc. of the 16th International Semantic Web Conference, 628–644.
  • [Toutanova and Chen2015] Toutanova, K., and Chen, D. 2015. Observed versus latent features for knowledge base and text inference. In Proc. of the 3rd Workshop on Continuous Vector Space Models and their Compositionality, 57–66. Beijing, China: ACL.
  • [Trouillon et al.2016] Trouillon, T.; Welbl, J.; Riedel, S.; Gaussier, É.; and Bouchard, G. 2016. Complex embeddings for simple link prediction. In Proc. of the 33rd International Conference on Machine Learning, 2071–2080.
  • [Wang and Tian2016] Wang, Y., and Tian, F. 2016. Recurrent residual learning for sequence classification. In Proc. of the 13th Conference on Empirical Methods in Natural Language Processing, 938–943.
  • [Wang et al.2017] Wang, Q.; Mao, Z.; Wang, B.; and Guo, L. 2017. Knowledge graph embedding: A survey of approaches and applications. IEEE Transactions on Knowledge and Data Engineering 29(12):2724–2743.
  • [Yang, Yang, and Cohen2017] Yang, F.; Yang, Z.; and Cohen, W. W. 2017. Differentiable learning of logical rules for knowledge base reasoning. In Proc. of the 30th International Conference on Neural Information Processing Systems, 2316–2325.
  • [Zhu et al.2017] Zhu, H.; Xie, R.; Liu, Z.; and Sun, M. 2017. Iterative entity alignment via joint knowledge embeddings. In Proc. of the 26th International Joint Conference on Artificial Intelligence, 4258–4264.