Introduction
Knowledge graphs (KGs) have become one of the most important resources in many areas, such as question answering and recommendation. Many KGs are created and maintained by different parties and in various languages, which makes them inevitably heterogeneous. Entity alignment (EA) aims to address this heterogeneity by finding entities in two KGs that refer to the same real-world object.
Recently, a number of methods have started to leverage representation learning techniques for EA [Chen et al.2017, Sun, Hu, and Li2017, Sun et al.2018, Chen et al.2018]. Most of them are based on a classical KG embedding model called TransE [Bordes et al.2013], which interprets each triple $(s, r, o)$ in a KG as $\mathbf{s} + \mathbf{r} \approx \mathbf{o}$, where $s$ and $o$ denote the subject and object entities respectively, and $r$ denotes the relation label between them. However, these methods may suffer from the problem of modeling multi-relational triples [Lin et al.2015a]. Moreover, they only consider triple-level embeddings, i.e., they train each triple using only the embeddings of $s$, $r$ and $o$. Although the information of multi-hop neighbors can be passed along over several rounds of mini-batches via back propagation [Wang et al.2017], efficiency would be severely affected, especially when crossing KGs. A path-based method, IPTransE [Zhu et al.2017], tries to learn inferences among relations, but it still concentrates on triple-level embedding learning. Current methods ignore the long-term dependencies of entities. For EA, triple-level embedding learning limits the propagation of identity information across KGs, especially for entities that are poorly connected to other entities or far away from the entities in prior alignment (i.e., entity alignment known ahead of time). Also, because triple-level learning only uses the triples involved in prior alignment to deliver information across KGs, current methods rely heavily on the amount of prior alignment.
KGs can be regarded as multi-relational graphs, in which triples are just paths of length 1. If a KG embedding model is aware of the associations among entities in long paths, the trained embeddings would contain much richer information and thus help EA. However, none of the current EA methods takes KG path modeling into consideration. Modeling KG paths poses two challenges. The first is how to obtain these paths. A KG may have millions (even billions) of triples, and the number of its paths is larger still. It is difficult, if not impossible, to use all of them for training. The second is how to model these paths. The edges in the paths have labels and directions, which cannot simply be ignored when modeling the dependencies among entities.
In this paper, we propose a new method, called RSN4EA (recurrent skipping networks for EA), which employs random walk sampling to efficiently sample paths across KGs and models the paths with a novel recurrent skipping network (RSN). Work on network representation learning [Perozzi, Al-Rfou, and Skiena2014, Grover and Leskovec2016] shows that an appropriate sampling method reduces computational complexity and often brings good performance, so sampling paths from KGs is also worth exploring. Compared with networks, which typically consider edges without labels or directions, KGs have more complex graph structures. Furthermore, our problem requires propagating identity information through paths across KGs. To deal with these issues, we design a biased random walk sampling method that smoothly controls the depth and cross-KG biases of the generated paths.
To model paths or sentences, Skip-gram [Mikolov, Yih, and Zweig2013] is widely used in natural language processing. It efficiently encodes neighboring information into embeddings, which is important for discovering clusters or communities of related nodes (words). However, Skip-gram does not consider the order of nodes, while relations in KGs have directions and a large number of labels. The recurrent neural network (RNN) is a popular sequential model. It assumes that the next element depends only on the current input and the previous hidden state. However, this assumption is inadequate for KG path modeling. Take the path $(e_1, r_1, e_2, r_2, e_3)$ for example: RNN uses the input $r_2$ and the previous hidden state to infer $e_3$. However, all the context of $r_2$ is mixed into that hidden state, which overlooks the importance of $e_2$. Note that this path is also constituted by two triples. To predict the object entity of $(e_2, r_2, ?)$, both $e_2$ and $r_2$ should be given more weight than the other elements. To achieve this, we combine the idea of residual learning [He et al.2016] with RNN to let the output hidden state of $r_2$ learn a residual between the subject $e_2$ and the desired prediction $e_3$, which leads to our recurrent skipping network (RSN).
To evaluate RSN4EA, we built a series of datasets from real-world KGs. Previous work did not carefully consider the density and degree distributions of its datasets, which makes the datasets used in those experiments much denser than the original KGs. Their sampling methods are also only vaguely described. In this paper, we created four pairs of datasets, which were sampled with a reliable method and cover mono-/cross-lingual scenarios and normal/high density.
The main contributions of this paper are listed below:

We propose RSN4EA, an end-to-end framework for EA, which is capable of capturing long-term dependencies in KGs.

We design a biased random walk sampling method specific to EA, which generates desired paths with controllable depth and cross-KG biases.

To remedy the shortcoming of RNN in KG path modeling, we present RSN, which leverages the idea of residual learning and can largely improve convergence speed and performance.

To demonstrate the feasibility of our method, we carried out EA experiments on datasets with different densities and languages. The results show that our method consistently outperformed existing methods. RSN4EA also achieved comparable performance on KG completion.
Related Work
We divide the related work into three areas: KG representation learning, embedding-based EA and network representation learning. We discuss them in the rest of this section.
KG Representation Learning
KG representation learning has been widely studied in recent years [Wang et al.2017]. One of the most famous translational methods is TransE [Bordes et al.2013], which models a triple $(s, r, o)$ as $\mathbf{s} + \mathbf{r} \approx \mathbf{o}$. TransE works well for one-to-one relationships, but fails to model more complex relationships such as one-to-many and many-to-many. TransR [Lin et al.2015a] tries to solve this problem by introducing a relation-specific matrix $\mathbf{W}_r$ to project the entities, i.e., $\mathbf{W}_r\mathbf{s} + \mathbf{r} \approx \mathbf{W}_r\mathbf{o}$. PTransE [Lin et al.2015b] leverages path information to learn inferences among relations. For example, if there exist two triples $(e_1, r_1, e_2)$ and $(e_2, r_2, e_3)$, which form a path in the KG, and another triple $(e_1, r_3, e_3)$ holds simultaneously, PTransE models the path information by learning $\mathbf{r}_1 \circ \mathbf{r}_2 \approx \mathbf{r}_3$, where $\circ$ denotes the operator used to merge $\mathbf{r}_1$ and $\mathbf{r}_2$. KG completion is the most prevalent task for KG representation learning, and there also exist some non-translational methods that are particularly tailored for KG completion [Trouillon et al.2016, Dettmers et al.2018].
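As a concrete reference point for the translational assumption $\mathbf{s} + \mathbf{r} \approx \mathbf{o}$, the sketch below computes a TransE-style distance and the margin-based ranking loss of [Bordes et al.2013]. Plain Python lists stand in for embedding vectors; the function names, the L2 norm and the margin values are illustrative choices, not details taken from this paper.

```python
import math

def transe_distance(s, r, o):
    """L2 distance ||s + r - o||; a smaller value means a more plausible triple."""
    return math.sqrt(sum((si + ri - oi) ** 2 for si, ri, oi in zip(s, r, o)))

def margin_loss(pos, neg, margin=1.0):
    """Hinge loss: a true triple should be at least `margin` closer than a corrupted one."""
    return max(0.0, margin + transe_distance(*pos) - transe_distance(*neg))

s, r = [0.0, 0.0], [1.0, 0.0]
o_true, o_corrupt = [1.0, 0.0], [0.0, 1.0]
print(transe_distance(s, r, o_true))  # 0.0
print(margin_loss((s, r, o_true), (s, r, o_corrupt), margin=2.0))  # 2 - sqrt(2)
```

The sketch also hints at why TransE struggles with one-to-many relations: if both $(s, r, o_1)$ and $(s, r, o_2)$ hold, driving both distances to zero forces $\mathbf{o}_1 = \mathbf{o}_2 = \mathbf{s} + \mathbf{r}$.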
Embedding-based Entity Alignment
Existing embedding-based EA methods are usually based on TransE. Specifically, MTransE [Chen et al.2017] separately trains the entity embeddings of two KGs and learns various transformations to align the embeddings. JAPE [Sun, Hu, and Li2017] is also based on TransE but learns the embeddings of two KGs in a unified space. Additionally, JAPE leverages attributes to refine entity embeddings. IPTransE [Zhu et al.2017] employs an iterative process on the original PTransE [Lin et al.2015b] for EA. Different from our method, it still concentrates on triple-level learning and does not consider the dependencies among entities in KG paths. BootEA [Sun et al.2018] takes bootstrapping into consideration and uses a sophisticated strategy to update alignment during iterations. KDCoE [Chen et al.2018] leverages co-training to separately train on entity relations and entity descriptions. As with bootstrapping, propagating alignment between the two components may introduce errors. Moreover, it requires extra resources such as pre-trained multilingual word embeddings and descriptions.
Because all the aforementioned methods use TransE-like models as the basic model, they are not capable of capturing long-term dependencies in KGs, and the identity information propagated between different KGs is also limited.
Network Representation Learning
DeepWalk [Perozzi, Al-Rfou, and Skiena2014] is one of the most well-known models in network representation learning. It uses uniform random walks to sample paths in a network and applies Skip-gram [Mikolov, Yih, and Zweig2013] to model the generated paths. Skip-gram learns the embedding of a node by maximizing the probabilities of its neighbors, which captures the information among the nodes. node2vec [Grover and Leskovec2016] proposes biased random walks to refine the process of sampling paths from a network. It smoothly controls the node selection strategy so that the random walks explore neighbors in both a breadth-first-search and a depth-first-search fashion. In this paper, the proposed EA-specific random walk sampling is inspired by node2vec, but concentrates on generating long and cross-KG paths.
The methods in network representation learning mainly focus on discovering clusters or communities of related nodes. However, they are inappropriate for EA, since EA requires identifying aligned entities across two KGs.
Method Overview
A KG is defined as a directed multi-relational graph whose nodes correspond to entities and whose edges are of the form $(s, r, o)$ (denoted as $s \xrightarrow{r} o$), each of which indicates that there exists a relation of name $r$ between the entities $s$ and $o$.
EA is the task of finding entities in two KGs that refer to the same real-world object. In many cases (e.g., Linked Open Data), a subset of aligned entities, called prior alignment, is known in advance and used as training data. Based on it, many existing methods, such as [Zhu et al.2017, Sun, Hu, and Li2017, Sun et al.2018], merge the two KGs into a connected joint graph and learn entity embeddings on it.
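Merging two KGs through prior alignment, combined with the virtual reverse relations used for the joint graph in Figure 1, can be sketched as follows. This is a minimal illustration under stated assumptions: triples are `(subject, relation, object)` string tuples, the `"^-"` suffix for reverse relations and the function name are hypothetical, and an edge whose two endpoints are both aligned is copied once per endpoint rather than to both counterparts at once.

```python
from collections import defaultdict

def build_joint_graph(kg1, kg2, prior_alignment):
    """Merge two KGs into one adjacency map through prior alignment.

    kg1, kg2: iterables of (subject, relation, object) triples.
    prior_alignment: iterable of (entity_in_kg1, entity_in_kg2) pairs.
    Returns {entity: [(relation, neighbor), ...]}, reverse edges included.
    """
    triples = list(kg1) + list(kg2)
    counterpart = {}
    for e1, e2 in prior_alignment:
        counterpart[e1], counterpart[e2] = e2, e1
    # Copy every edge touching an aligned entity to its counterpart, so the
    # two KGs become connected and walks can cross between them.
    copied = []
    for s, r, o in triples:
        if s in counterpart:
            copied.append((counterpart[s], r, o))
        if o in counterpart:
            copied.append((s, r, counterpart[o]))
    graph = defaultdict(list)
    for s, r, o in sorted(set(triples + copied)):  # sorted for deterministic order
        graph[s].append((r, o))
        graph[o].append((r + "^-", s))  # hypothetical name for the reverse relation
    return dict(graph)

g = build_joint_graph([("a1", "r", "b1")], [("a2", "q", "c2")], [("a1", "a2")])
print(g["a1"])  # [('q', 'c2'), ('r', 'b1')]
```

After merging, `a1` (from the first KG) has an edge to `c2` (from the second KG), which is exactly what lets later random walks cross between the two graphs.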
Figure 1 illustrates the architecture of our method, which accepts two KGs as input and adopts an end-to-end framework for aligning the entities between them. The main modules of the framework are described as follows:

Biased random walk sampling. To leverage graph sampling for EA, we first create a joint graph of the two KGs by copying the edges of each entity in prior alignment to its counterpart. Additionally, since the relation directions between entities are often arbitrary, we add a virtual reverse relation $r^-$ for each existing relation $r$. Thus, the object entity in a triple can follow the reverse relation to reach the subject entity. Figure 1 exemplifies the joint graph of KG$_1$ and KG$_2$ with reverse relations.
Then, we conduct biased random walk sampling on the joint graph to explore longer and cross-KG paths. We describe the details in the next section. Finally, each path, e.g., $e_1 \xrightarrow{r_1} e_2 \xrightarrow{r_2} e_3$, is converted into a KG sequence $(e_1, r_1, e_2, r_2, e_3)$ and fed to the next module.

Recurrent skipping network (RSN).
RNN is natural and flexible for processing sequential data. However, it is not aware of the different element types (“entity” vs. “relation”) in KG sequences or of the basic KG structural units (i.e., triples). To cope with these issues, we propose RSN, which distinguishes entities from relations and leverages the idea of residual learning by letting a subject entity skip its connection to directly participate in predicting the object entity. We present RSN in detail later. Each output of RSN is passed to the type-based noise contrastive estimation (NCE) module, which learns to predict the next element.

Type-based noise contrastive estimation. NCE [Gutmann and Hyvärinen2010] is a popular estimation method in natural language processing, which samples a small number of negative classes to approximate the full distribution over all classes. As mentioned above, entities and relations are of different types. Therefore, we design a type-based method that samples negative examples according to element types, and we use different weight matrices and biases to calculate the logits for the two types of elements. Through back propagation, the embedding of each input element is learned not only from predicting its next element, but also in association with the other elements along the KG sequence.

Embedding-based EA.
With the entity embeddings of the two KGs learned in a unified space, the aligned target entity of a given source entity can be found by searching for its nearest neighbor in this space using cosine similarity.
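A brute-force version of this nearest-neighbor search is sketched below. The dictionary-of-lists embedding format and the function names are illustrative assumptions; with thousands of entities per KG, a practical implementation would compute the similarities with batched matrix operations instead of Python loops.

```python
import math

def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def align(source_emb, target_emb):
    """Map each source entity to its most cosine-similar target entity."""
    return {
        s: max(target_emb, key=lambda t: cosine(vec, target_emb[t]))
        for s, vec in source_emb.items()
    }

src = {"x": [1.0, 0.0], "y": [0.0, 1.0]}
tgt = {"p": [0.9, 0.1], "q": [0.1, 0.9]}
print(align(src, tgt))  # {'x': 'p', 'y': 'q'}
```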
Biased Random Walk Sampling for EA
Random walks have long been used as sampling methods in network representation learning [Perozzi, Al-Rfou, and Skiena2014]. KGs share many features with networks, such as large scale and sparsity. In this section, we present a biased random walk sampling method specific to EA, which can efficiently explore long and cross-KG sequences.
Random Walk Sampling
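Given a start entity $u$ in the joint graph, an unbiased random walk obtains the probability distribution of the next entity by the following equation:
$$P(c_i = x \mid c_{i-1} = v) = \begin{cases} \dfrac{\pi_{vx}}{Z}, & \text{if } v \xrightarrow{r} x, \\ 0, & \text{otherwise}, \end{cases} \qquad (1)$$
where $c_i$ denotes the $i$-th node in this walk and $c_0 = u$; $v \xrightarrow{r} x$ denotes an arbitrary relation $r$ from the current entity $v$ to the next entity $x$; $\pi_{vx}$ is the unnormalized transition probability between $v$ and $x$; and $Z$ is the normalizing constant.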
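With every unnormalized transition probability set to 1, Eq. (1) reduces to picking one of the current entity's outgoing edges uniformly at random, as in the sketch below. It assumes the joint graph is stored as a hypothetical adjacency map `{entity: [(relation, neighbor), ...]}` and counts each edge separately, so parallel edges to the same neighbor raise that neighbor's probability; that treatment of multi-edges is a design choice of the sketch.

```python
import random

def uniform_walk(graph, start, num_entities, rng=None):
    """Unbiased random walk over entities (Eq. (1) with every pi_vx = 1).

    Each outgoing edge (relation, neighbor) of the current entity is chosen
    with probability 1/Z, where Z is the number of outgoing edges.
    """
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < num_entities:
        edges = graph.get(walk[-1], [])
        if not edges:  # dead end: stop early
            break
        _, nxt = rng.choice(edges)
        walk.append(nxt)
    return walk

g = {"a": [("r", "b")], "b": [("r^-", "a")]}
print(uniform_walk(g, "a", 4, random.Random(0)))  # ['a', 'b', 'a', 'b']
```

Because the reverse relation `r^-` leads straight back to `a`, the unbiased walk happily oscillates between two entities; this is the behavior the biases below are designed to discourage.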
Biased Random Walk Sampling
The above random walk method chooses the next entity from a uniform distribution. When modeling KGs, the basic training unit is the triple, which means that the information of nearby entities can be updated via back propagation in different mini-batches. However, delivering the information of farther entities only through triples is difficult and inefficient. Capturing longer paths in KGs therefore becomes helpful.
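To achieve this, we employ the second-order random walk sampling method of [Grover and Leskovec2016] and propose a depth bias to smoothly control the depths of sampled paths. Formally, given an entity $c_i$ in the joint graph, the depth bias between $c_i$'s previous entity $c_{i-1}$ and next entity $c_{i+1}$, denoted by $b_d(c_{i-1}, c_{i+1})$, is defined as follows:
$$b_d(c_{i-1}, c_{i+1}) = \begin{cases} \alpha, & \text{if } d(c_{i-1}, c_{i+1}) = 2, \\ 1 - \alpha, & \text{otherwise}, \end{cases} \qquad (2)$$
where $d(\cdot, \cdot)$ calculates the shortest path distance, whose value must be one of $\{0, 1, 2\}$. The hyperparameter $\alpha \in (0, 1)$ controls the depths of random walks. To favor longer paths, we let $\alpha > 0.5$. For multi-edges, we treat their biases as equal.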
Take Figure 1 as an example. Consider a random walk that has just traversed edge $(t, v)$ and now resides at $v$ (i.e., $t = c_{i-1}$ and $v = c_i$). The walk now needs to decide on the next step, so it evaluates the transition probabilities $\pi_{vx}$ on the edges $(v, x)$ leading from $v$. We set the unnormalized transition probability to $\pi_{vx} = b_d(t, x) \cdot w_{vx}$, where $w_{vx}$ is the static edge weight. In the case of unweighted graphs, $w_{vx} = 1$.
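The sketch below computes the normalized transition probabilities for one step of this second-order walk. It assumes the depth bias takes the node2vec-like form $b_d = \alpha$ when $d(t, x) = 2$ and $1 - \alpha$ otherwise, with $\alpha \in (0, 1)$; uses unweighted edges ($w_{vx} = 1$); and relies on the joint graph containing reverse relations, so that scanning $t$'s outgoing edges is enough to decide whether $d(t, x) = 1$. The adjacency-map format and function names are hypothetical.

```python
def distance_up_to_2(graph, t, x):
    """Shortest-path distance d(t, x), capped at 2 (values 0, 1 or 2)."""
    if x == t:
        return 0
    if any(n == x for _, n in graph.get(t, [])):
        return 1
    return 2

def depth_bias(graph, t, x, alpha):
    """Favor next entities two hops away from the previous entity t."""
    return alpha if distance_up_to_2(graph, t, x) == 2 else 1.0 - alpha

def step_probabilities(graph, t, v, alpha):
    """Normalized P(next = x) for a walk that moved t -> v, with pi_vx = b_d(t, x) * 1."""
    pis = [(x, depth_bias(graph, t, x, alpha)) for _, x in graph[v]]
    z = sum(p for _, p in pis)
    probs = {}
    for x, p in pis:  # parallel edges to the same x accumulate
        probs[x] = probs.get(x, 0.0) + p / z
    return probs

g = {
    "t": [("r", "v"), ("r2", "x1")],
    "v": [("r^-", "t"), ("s", "x1"), ("s", "x2")],
    "x1": [("r2^-", "t"), ("s^-", "v")],
    "x2": [("s^-", "v")],
}
print(step_probabilities(g, "t", "v", alpha=0.8))
```

With $\alpha = 0.8$, going back to `t` (distance 0) and moving to `x1` (distance 1 from `t`) each receive weight 0.2, while `x2` (distance 2) receives 0.8, so the walk moves away from where it came from with probability about 2/3.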
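Furthermore, specific to EA, we propose a cross-KG bias to favor paths connecting the two KGs. Formally, given an entity $c_i$ in the joint graph, the cross-KG bias between $c_i$'s previous entity $c_{i-1}$ and next entity $c_{i+1}$, denoted by $b_c(c_{i-1}, c_{i+1})$, is defined as follows:
$$b_c(c_{i-1}, c_{i+1}) = \begin{cases} \beta, & \text{if } c_{i-1} \text{ and } c_{i+1} \text{ are in different KGs}, \\ 1 - \beta, & \text{otherwise}, \end{cases} \qquad (3)$$
where $\beta \in (0, 1)$ is a hyperparameter controlling the preference of random walks for crossing the two KGs. To favor cross-KG paths, we let $\beta > 0.5$. As with the depth bias, using the previous and next entities avoids walking back and forth between only two entities in different KGs.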
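Finally, we combine $b_d$ and $b_c$ into the overall bias $b$ and perform random walk sampling based on it:
$$b(c_{i-1}, c_{i+1}) = b_d(c_{i-1}, c_{i+1}) \cdot b_c(c_{i-1}, c_{i+1}). \qquad (4)$$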
Recall the above example. According to the overall bias, the walk at prefers and in KG to in KG. A KG sequence converted from this walk would be .
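Putting the pieces together, the sketch below generates one KG sequence under the overall bias. It assumes $b_d$ and $b_c$ take the forms $\alpha$/$1-\alpha$ and $\beta$/$1-\beta$ and are multiplied as in Eq. (4), with unit edge weights. The `kg_of` map (entity to source KG), the uniform first step (there is no previous entity yet) and the function name are illustrative assumptions, not details from the paper.

```python
import random

def biased_walk(graph, kg_of, start, num_entities, alpha, beta, rng=None):
    """Second-order biased random walk returning a KG sequence (e1, r1, e2, ...).

    graph: {entity: [(relation, neighbor), ...]}, reverse edges included.
    kg_of: {entity: kg_id}, a hypothetical map from entity to its source KG.
    Assumes b_d = alpha if d(prev, next) == 2 else 1 - alpha,
            b_c = beta if prev and next lie in different KGs else 1 - beta,
    and an overall bias b = b_d * b_c with unit edge weights.
    """
    rng = rng or random.Random()

    def dist(t, x):  # shortest distance capped at 2
        if x == t:
            return 0
        return 1 if any(n == x for _, n in graph.get(t, [])) else 2

    seq, prev, cur = [start], None, start
    for _ in range(num_entities - 1):
        edges = graph.get(cur, [])
        if not edges:  # dead end
            break
        if prev is None:  # first step: no previous entity, so choose uniformly
            weights = [1.0] * len(edges)
        else:
            weights = []
            for _, x in edges:
                b_d = alpha if dist(prev, x) == 2 else 1.0 - alpha
                b_c = beta if kg_of[prev] != kg_of[x] else 1.0 - beta
                weights.append(b_d * b_c)
        rel, nxt = rng.choices(edges, weights=weights, k=1)[0]
        seq += [rel, nxt]
        prev, cur = cur, nxt
    return seq

g = {
    "a1": [("r", "b1"), ("q", "c2")],
    "a2": [("r", "b1"), ("q", "c2")],
    "b1": [("r^-", "a1"), ("r^-", "a2")],
    "c2": [("q^-", "a1"), ("q^-", "a2")],
}
kg_of = {"a1": 1, "b1": 1, "a2": 2, "c2": 2}
print(biased_walk(g, kg_of, "a1", num_entities=4, alpha=0.9, beta=0.9, rng=random.Random(0)))
```

With $\alpha = \beta = 0.9$, a walk that arrived at `b1` from `a1` gives the edge back to `a1` weight $0.1 \times 0.1 = 0.01$, but the edge to `a2` (two hops from `a1` and in the other KG) weight $0.9 \times 0.9 = 0.81$, so it almost always crosses into the second KG.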
Recurrent Skipping Networks
In this section, we first describe the conventional RNN. Then, we propose our RSN and discuss its characteristics.
Recurrent Neural Networks
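RNN is a popular class of artificial neural networks that performs well on sequential data. Given a KG sequence $(x_1, x_2, \ldots, x_T)$ as input, an RNN recurrently processes it with the following equation:
$$\mathbf{h}_t = \tanh(\mathbf{W}_h \mathbf{h}_{t-1} + \mathbf{W}_x \mathbf{x}_t + \mathbf{b}), \qquad (5)$$
where $\mathbf{h}_t$ is the output hidden state at time step $t$, $\mathbf{W}_h$ and $\mathbf{W}_x$ are the weight matrices, and $\mathbf{b}$ is the bias.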
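For reference, the recurrence of Eq. (5) can be sketched in plain Python as a tanh RNN with a zero initial state. This only illustrates the recurrence under the assumption of a tanh activation; the implementation described later uses multi-layer LSTM cells in TensorFlow, and the helper names here are hypothetical.

```python
import math

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def rnn(xs, w_h, w_x, b):
    """h_t = tanh(W_h h_{t-1} + W_x x_t + b) with h_0 = 0; returns [h_1, ..., h_T]."""
    h = [0.0] * len(b)
    outputs = []
    for x in xs:
        pre = [p + q + c for p, q, c in zip(matvec(w_h, h), matvec(w_x, x), b)]
        h = [math.tanh(z) for z in pre]
        outputs.append(h)
    return outputs

identity = [[1.0, 0.0], [0.0, 1.0]]
hs = rnn([[1.0, 0.0], [0.0, 0.0]], w_h=identity, w_x=identity, b=[0.0, 0.0])
print(hs[1])  # the first input still shows up here, but only through h_{t-1}
```

The example makes the limitation discussed next tangible: everything an element contributed earlier reaches later steps only after being squashed into the single state $\mathbf{h}_{t-1}$.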
RNN can use a small number of parameters to handle input of any length, and it has achieved state-of-the-art performance in many areas. However, a few limitations remain when RNN is used to process KG sequences.
First, the elements in a KG sequence are of two different types, namely “entity” and “relation”, which always appear in alternating order. However, the conventional RNN regards them as elements of the same type, like words or nodes, which makes capturing the information in KG sequences less effective.
Second, every KG sequence is constituted by triples, but RNN overlooks these basic structural units. Specifically, let $x_t$ denote a relation in a KG sequence and $(x_{t-1}, x_t, x_{t+1})$ denote a triple involving $x_t$. As shown in Eq. (5), to predict $x_{t+1}$, RNN combines the hidden state $\mathbf{h}_{t-1}$ and the current input $\mathbf{x}_t$, where $\mathbf{h}_{t-1}$ is a mix of the information of all the previous elements $x_1, \ldots, x_{t-1}$. However, the information of $x_{t-1}$, the subject of the triple, is expected to receive more weight.
Improving RNN with the Skipping Mechanism
To better model KG sequences and remedy this semantic shortcoming of the conventional RNN, we propose the recurrent skipping network (RSN), which refines RNN with a simple but effective skipping mechanism.
The basic idea of RSN is to shortcut the current input entity so that it directly participates in predicting its object entity. In other words, an input element of type “entity” in a KG sequence not only contributes to predicting its next relation, but also directly takes part in predicting its object entity. Figure 1 shows an RSN example.
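Formally, given a KG sequence $(x_1, x_2, \ldots, x_T)$ as input, the skipping operation for an RSN is formulated as follows:
$$\mathbf{h}'_t = \begin{cases} \mathbf{h}_t, & \text{if } x_t \text{ is an entity}, \\ \mathbf{S}_1\mathbf{h}_t + \mathbf{S}_2\mathbf{x}_{t-1}, & \text{if } x_t \text{ is a relation}, \end{cases} \qquad (6)$$
where $\mathbf{h}'_t$ denotes the output hidden state of the RSN at time step $t$, $\mathbf{h}_t$ denotes the corresponding RNN output, and $\mathbf{S}_1$ and $\mathbf{S}_2$ are weight matrices. In this paper, we select the weighted sum as the skipping operation, but other combination methods can be supported as well.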
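A minimal sketch of the skipping operation follows. It assumes the weighted-sum form $\mathbf{h}'_t = \mathbf{S}_1\mathbf{h}_t + \mathbf{S}_2\mathbf{x}_{t-1}$ at relation positions (with hypothetical weight matrices $\mathbf{S}_1$, $\mathbf{S}_2$) and $\mathbf{h}'_t = \mathbf{h}_t$ at entity positions, sequences that start with an entity and alternate entity/relation, a tanh RNN underneath, and a recurrence that carries the plain RNN state $\mathbf{h}_t$ (not $\mathbf{h}'_t$) forward.

```python
import math

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def rsn(xs, w_h, w_x, b, s1, s2):
    """Outputs h'_t for a KG sequence (e1, r1, e2, r2, ...).

    Entities sit at even positions and relations at odd positions.
    At a relation position, h'_t = S1 h_t + S2 x_{t-1}, so the subject entity
    skips directly into the output used to predict the object entity;
    at entity positions, h'_t = h_t.
    """
    h = [0.0] * len(b)
    outputs = []
    for t, x in enumerate(xs):
        pre = [p + q + c for p, q, c in zip(matvec(w_h, h), matvec(w_x, x), b)]
        h = [math.tanh(z) for z in pre]  # plain RNN state, carried to step t+1
        if t % 2 == 1:
            out = [p + q for p, q in zip(matvec(s1, h), matvec(s2, xs[t - 1]))]
        else:
            out = h
        outputs.append(out)
    return outputs

# 1-D toy run: with W_h = 0 the RNN state at the relation step is tanh(0) = 0,
# so the RSN output there is exactly the subject entity's embedding.
outs = rsn([[0.5], [0.0], [0.2]], w_h=[[0.0]], w_x=[[1.0]], b=[0.0],
           s1=[[1.0]], s2=[[1.0]])
print(outs[1])  # [0.5]
```

In the toy run a plain RNN would output `[0.0]` at the relation step; the skip connection is what lets the subject entity reach the prediction of its object unchanged.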
Explanation of RSN.
Intuitively, RSN explicitly distinguishes entities from relations, and allows subject entities to skip their connections to directly participate in object entity prediction. Behind this simple skipping operation lies a deeper explanation, namely residual learning.
Let $F(\mathbf{x})$ be an original mapping, where $\mathbf{x}$ denotes the input, and let $H(\mathbf{x})$ be the expected mapping. Compared to directly optimizing $F(\mathbf{x})$ to fit $H(\mathbf{x})$, residual learning hypothesizes that it is easier to optimize $F(\mathbf{x})$ to fit the residual part $H(\mathbf{x}) - \mathbf{x}$. In the extreme case where an identity mapping is optimal (i.e., $H(\mathbf{x}) = \mathbf{x}$), pushing the residual to zero would be much easier than fitting an identity mapping with a stack of nonlinear layers [He et al.2016].
Unlike ResNet [He et al.2016] and the recurrent residual network (RRN) [Wang and Tian2016], which were proposed to help train very deep networks, RSN employs residual learning in “shallow” networks. The skipping connections do not link the previous input to very deep layers, but only concentrate on each triple in a KG sequence.
Specifically, given a KG sequence $(x_1, \ldots, x_{t-1}, x_t, x_{t+1}, \ldots)$, where $(x_{t-1}, x_t, x_{t+1})$ forms a triple, RRN leverages residual learning by regarding the process at each time step as a mini residual network with the previous hidden state of the RNN as input. Take time step $t$ for example: RRN regards $\mathbf{h}_{t-1}$ as input and learns the residual $H(\mathbf{h}_{t-1}) - \mathbf{h}_{t-1}$, where $H(\cdot)$ denotes the expected mapping for $\mathbf{h}_t$. It still ignores the KG structure, namely that the subject entity $x_{t-1}$ deserves more weight for predicting $x_{t+1}$.
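In contrast, RSN leverages residual learning in a new way. Instead of using the input $\mathbf{h}_{t-1}$ as the subtrahend (as in $H(\mathbf{h}_{t-1}) - \mathbf{h}_{t-1}$), it directly chooses the subject entity $\mathbf{x}_{t-1}$ as the subtrahend. Making the output hidden state $\mathbf{h}_t$ fit $\mathbf{x}_{t+1}$ may be hard, but learning the residual between $\mathbf{x}_{t+1}$ and $\mathbf{x}_{t-1}$ may be easier, which is the key characteristic of RSN.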
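Informally, the three networks differ in what is trained to match at a relation position $x_t$, whose next element is the object entity $x_{t+1}$. Here $F$ is the learned residual branch and $H$ the expected mapping, as in the residual-learning discussion above; the RSN lines assume the skip is a weighted sum with hypothetical matrices $\mathbf{S}_1$ and $\mathbf{S}_2$; and $\approx$ is read as "is trained toward":

```latex
\begin{aligned}
\text{RNN:}\quad & \mathbf{h}_t \approx \mathbf{x}_{t+1} \\
\text{RRN:}\quad & F(\mathbf{h}_{t-1}) \approx H(\mathbf{h}_{t-1}) - \mathbf{h}_{t-1} \\
\text{RSN:}\quad & \mathbf{S}_1 \mathbf{h}_t \approx \mathbf{x}_{t+1} - \mathbf{S}_2 \mathbf{x}_{t-1} \\
\text{RSN with } \mathbf{S}_1 = \mathbf{S}_2 = \mathbf{I}:\quad & \mathbf{h}_t \approx \mathbf{x}_{t+1} - \mathbf{x}_{t-1}
\end{aligned}
```

The last line makes the contrast explicit: the recurrent state of RSN only has to model the difference between the object and subject entities, whereas the subtrahend in RRN is the previous hidden state, which carries no triple-specific structure.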
Experiments and Results
We evaluated RSN4EA for EA on a variety of real-world datasets. In this section, we report its results compared with several state-of-the-art embedding-based EA methods. Since RSN4EA learns KG embeddings, we also conducted experiments to assess its performance on KG completion [Bordes et al.2013], a classical task for KG representation learning.
Datasets
Datasets  Sources  Normal  Dense  
#Rel.  #Rel tr.  #Rel.  #Rel tr.  
DBP-WD  DBpedia (English)  248  38,256  219  67,954 
Wikidata (English)  148  39,605  137  76,034  
DBP-YG  DBpedia (English)  219  33,571  206  71,257 
YAGO3 (English)  30  34,660  30  97,131  
EN-FR  DBpedia (English)  230  35,139  218  71,587 
DBpedia (French)  181  32,827  171  66,283  
EN-DE  DBpedia (English)  225  38,281  207  56,983 
DBpedia (German)  118  37,069  117  59,848  
We also extracted the attribute triples of the sampled entities from the original KGs. 
Although the datasets used by existing methods [Chen et al.2017, Sun, Hu, and Li2017, Sun et al.2018] are all sampled from real-world KGs, such as DBpedia and Wikidata, their density and degree distributions are quite different from those of the original KGs. We argue that this may prevent a comprehensive and accurate understanding of embedding-based EA. In this paper, we propose a segment-based random PageRank (SRP) sampling method, which can smoothly control the density of sampled datasets.
Random PageRank sampling is an efficient algorithm for large graph sampling [Leskovec and Faloutsos2006]. It samples nodes according to their PageRank weights and can assign higher biases to more valuable entities. However, due to the nature of PageRank, it also favors high-degree nodes. To fulfill our requirements on KG sampling, we divided the entities in a KG into segments according to their degrees and performed sampling separately for each segment. To ensure that the degree distributions of the sampled datasets follow those of the original KGs, we used the Kolmogorov-Smirnov (KS) test to measure the difference. We set our expectation to for all the datasets.
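The two ingredients that distinguish SRP from plain random PageRank sampling, degree-based segmentation and a KS check on degree distributions, can be sketched as follows. The segment boundaries and the function names are hypothetical; the text does not specify the boundaries, the per-segment PageRank sampling or the exact KS threshold, so those parts are omitted here rather than guessed.

```python
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D = max_x |F_a(x) - F_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # Both empirical CDFs only jump at observed values, so checking those suffices.
    for x in set(a) | set(b):
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d

def degree_segments(degrees, bounds):
    """Split entities into degree segments.

    bounds are hypothetical cut points: (2, 5, 10) yields the segments
    deg < 2, 2 <= deg < 5, 5 <= deg < 10 and deg >= 10.
    """
    segments = [[] for _ in range(len(bounds) + 1)]
    for entity, deg in degrees.items():
        segments[bisect_right(bounds, deg)].append(entity)
    return segments

print(degree_segments({"a": 1, "b": 3, "c": 12}, (2, 5, 10)))  # [['a'], ['b'], [], ['c']]
print(ks_statistic([1, 2, 3, 3], [1, 2, 3, 3]))  # 0.0
```

Sampling each segment separately keeps low-degree entities from being crowded out by high-degree ones, and a small KS statistic between the original and sampled degree lists indicates that the sample preserves the original degree distribution.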
Based on the above sampling method, we obtained four pairs of datasets to evaluate the performance of embedding-based EA methods. The detailed statistics are shown in Table 1. Each dataset contains nearly 15,000 entities. The normal datasets follow the density of the original KGs. For the dense datasets, we randomly deleted low-degree entities from the original KGs until the average degree doubled, and then conducted sampling. Therefore, the dense datasets are more similar to the datasets used by existing methods [Chen et al.2017, Sun, Hu, and Li2017, Sun et al.2018]. Figure 2 shows the degree distributions of the source KGs and of the datasets sampled by different methods. Our normal datasets best represent the original KGs.
Methods  DBP-WD (normal)  DBP-WD (dense)  DBP-YG (normal)  DBP-YG (dense)  

Hits@1  Hits@10  MRR  Hits@1  Hits@10  MRR  Hits@1  Hits@10  MRR  Hits@1  Hits@10  MRR  
MTransE  22.3  50.1  0.32  38.9  68.7  0.49  24.6  54.0  0.34  22.8  51.3  0.32 
IPTransE  23.1  51.7  0.33  43.5  74.5  0.54  22.7  50.0  0.32  23.6  51.3  0.33 
JAPE  21.9  50.1  0.31  39.3  70.5  0.50  23.3  52.7  0.33  26.8  57.3  0.37 
KDCoE  24.6  51.5  0.34  56.5  83.1  0.65  22.7  47.0  0.31  56.8  80.4  0.64 
BootEA  32.3  63.1  0.42  67.8  91.2  0.76  31.3  62.5  0.42  68.2  89.8  0.76 
RSN4EA  38.8  65.7  0.49  76.3  92.4  0.83  40.0  67.5  0.50  82.6  95.8  0.87 
The best results are marked in bold; the same applies to the following tables. 
Methods  EN-FR (normal)  EN-FR (dense)  EN-DE (normal)  EN-DE (dense)  

Hits@1  Hits@10  MRR  Hits@1  Hits@10  MRR  Hits@1  Hits@10  MRR  Hits@1  Hits@10  MRR  
MTransE  25.1  55.1  0.35  37.7  70.0  0.49  31.2  58.6  0.40  34.7  62.0  0.44 
IPTransE  25.5  55.7  0.36  42.9  78.3  0.55  31.3  59.2  0.41  34.0  63.2  0.44 
JAPE  25.6  56.2  0.36  40.7  72.7  0.52  32.0  59.9  0.41  37.5  66.1  0.47 
KDCoE  22.1  47.4  0.33  54.5  85.1  0.65  34.1  56.9  0.42  58.7  79.9  0.66 
BootEA  31.3  62.9  0.42  64.8  91.9  0.74  44.2  70.1  0.53  66.5  87.1  0.73 
RSN4EA  34.7  63.1  0.44  75.6  92.5  0.82  48.7  72.0  0.57  73.9  89.0  0.79 
Implementation Details
We built RSN4EA using TensorFlow. The embeddings and weight matrices were initialized with the Xavier initializer, and the embedding size was set to 256. We used a two-layer LSTM [Hochreiter and Schmidhuber1997] with Dropout [Srivastava et al.2014], and applied batch normalization [Ioffe and Szegedy2015] to both the input and the output of an RSN. We used the Adam optimizer [Kingma and Ba2015] with mini-batch size 512 and learning rate 0.003. We trained an RSN for up to 30 epochs. The random walk biases were set to , and the walk length was set to 15. The source code, datasets and results will be available online.
For the comparative methods, we used the source code provided with their papers, except for KDCoE, whose source code has not been released yet; we implemented KDCoE ourselves. We made our best effort to tune the hyperparameters of each method for optimal performance. Following previous work [Sun, Hu, and Li2017, Sun et al.2018], we used 30% of the reference alignment as prior alignment and chose Hits@1, Hits@10 and mean reciprocal rank (MRR) as evaluation metrics.
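For readers reproducing the evaluation, the metrics can be computed from the rank of each true counterpart among all candidates, as in the sketch below. The tie-handling rule (a tie does not push the true entity down) and the function names are assumptions; published implementations differ on how ties are treated.

```python
def rank_of_truth(scores, truth):
    """1-based rank of `truth` among candidates sorted by descending score.
    Ties are resolved in favor of the true candidate (an assumption)."""
    return 1 + sum(1 for c, s in scores.items() if c != truth and s > scores[truth])

def hits_and_mrr(ranks, ks=(1, 10)):
    """Hits@k in percent and MRR as a fraction, the units used in Tables 2 and 3."""
    n = len(ranks)
    hits = {k: 100.0 * sum(r <= k for r in ranks) / n for k in ks}
    mrr = sum(1.0 / r for r in ranks) / n
    return hits, mrr

print(rank_of_truth({"p": 0.9, "q": 0.5, "u": 0.7}, "q"))  # 3
hits, mrr = hits_and_mrr([1, 2, 20])
print(round(hits[1], 1), round(hits[10], 1), round(mrr, 2))  # 33.3 66.7 0.52
```

Because MRR adds $1/r$ for every test pair, a method that often places the true counterpart at rank 1 gains much more MRR than one that merely keeps it inside the top 10, which is why Hits@1 and MRR tend to move together.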
Results on Entity Alignment
Tables 2 and 3 show the EA results on the monolingual and cross-lingual datasets, respectively. The results indicate that capturing long-term dependencies through paths enables RSN4EA to outperform the existing EA methods.
Generally, the heterogeneity between different KGs is more severe than that between different language versions of the same KG. A key module of embedding-based EA methods embeds the information of entities in different KGs into a unified space, so aligning entities across different KGs is more difficult for these methods. By establishing long-term dependencies, RSN4EA captured richer information from the KGs and learned more accurate embeddings, leading to larger improvements on the more heterogeneous datasets (DBP-WD and DBP-YG).
The two tables also show that embedding-based EA methods are sensitive to density. Most methods performed considerably worse on the normal datasets than on the dense ones; the exceptions are MTransE and IPTransE on DBP-YG, whose Hits@1 changed by less than two points. Although the normal datasets are more difficult, RSN4EA still showed considerable advantages over the other methods, since it used long paths to capture implicit connections among entities and encoded them in the embeddings.
It is worth noting that RSN4EA showed a larger advantage in terms of Hits@1 and MRR. This is because Hits@1 only counts exactly correct results, and MRR also favors top-ranked results. As mentioned above, RSN4EA embedded long-term dependencies into the learned embeddings, which contain richer information to help identify aligned entities in different KGs. The better performance on these two metrics supports this point.
Results on KG Completion
Methods  Hits@1  Hits@10  MRR 

TransE [Bordes et al.2013]  13.3  40.9  0.22 
TransR [Lin et al.2015a]  10.9  38.2  0.20 
ComplEx [Trouillon et al.2016]  15.2  41.9  0.24 
NeuralLP [Yang, Yang, and Cohen2017]  –  36.2  0.24 
ConvE [Dettmers et al.2018]  23.9  49.1  0.31 
RSN4EA (w/o cross-KG bias)  20.0  43.6  0.28 
“” denotes the methods executed by ourselves using the provided source code, because some metrics were not reported in the literature.  
“–” denotes unknown results, because we could not obtain the source code. 
Since RSN4EA learns KG embeddings for EA, it is also interesting to apply it to KG completion [Bordes et al.2013], which is one of the most prevalent tasks for KG representation learning. To do so, we removed the cross-KG bias from the random walk sampling and conducted the KG completion experiment. Specifically, for a triple $(s, r, o)$, KG completion aims to predict the object entity $o$ given $(s, r, ?)$ or the subject entity $s$ given $(?, r, o)$.
FB15K and WN18 are the most widely used benchmark datasets for KG completion [Bordes et al.2013]. However, recent studies [Toutanova and Chen2015, Dettmers et al.2018] showed that these two datasets suffer from test set leakage. To address this issue, a new dataset called FB15K-237 was recommended, and we used it to assess RSN4EA in our experiments.
The experimental results are shown in Table 4. ConvE, a method tailored to KG completion, obtained the best results on FB15K-237, followed by our RSN4EA. It is worth noting that, although predicting the entities of a single triple is not the primary goal of RSN4EA, it still achieved comparable or better performance than many methods focusing on KG completion, which indicates the potential of leveraging KG paths for learning embeddings.
Further Analysis
Comparison with Alternative Networks
To assess the effectiveness of RSN, we compared it with RNN and RRN. Both RNN and RRN were implemented using the same multi-layer LSTM units, Dropout and batch normalization.
The comparison results are shown in Figure 3. Since RNN and RRN do not consider the structure of KG paths, their embedding learning converged very slowly. Compared with RNN, RSN achieved better performance with only time cost, which indicates that this particular residual structure is essential for RSN4EA. Furthermore, RRN is a generic network that incorporates residual learning into the conventional RNN, but it achieved only a small improvement over RNN. This implies that simply combining residual learning with RNN does not significantly help KG sequence modeling.
Sensitivity to Proportion of Prior Alignment
The proportion of prior alignment may significantly influence the performance of embedding-based EA methods. However, a large amount of prior alignment may not be available in practice. We tested the performance of RSN4EA and BootEA (the second-best method in our previous experiments) with the proportion of prior alignment decreasing from 50% to 10% in steps of 10%.
Due to space limitations, we only show the results on the DBP-WD datasets in Figure 4. The performance of both methods dropped continuously as the proportion of prior alignment decreased. However, the curves of RSN4EA are flatter than those of BootEA. Specifically, on the normal dataset, over the four proportion intervals, RSN4EA lost 7.4%, 8.2%, 16.5% and 30.2% on Hits@1, respectively, while BootEA lost 11.8%, 12.0%, 22.3% and 49.8%, which demonstrates that RSN4EA is more stable. Additionally, when the proportion was reduced to 10%, the Hits@1 result of RSN4EA on the normal dataset was almost twice that of BootEA, which indicates that modeling paths helps RSN4EA propagate identity information across KGs more effectively and alleviates its dependence on the proportion of prior alignment.
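As a rough consistency check on these numbers, the Hits@1 values at 30% prior alignment in Table 2 (38.8 for RSN4EA and 32.3 for BootEA on normal DBP-WD) can be combined with the last two reported losses. The sketch assumes the losses are relative (not absolute) drops listed in the order 50%→40%, 40%→30%, 30%→20%, 20%→10%; absolute drops cannot be intended, since 38.8 − 16.5 − 30.2 would be negative.

```python
def apply_relative_losses(start, losses_pct):
    """Apply successive relative losses (in percent) to a starting score."""
    score = start
    for loss in losses_pct:
        score *= 1.0 - loss / 100.0
    return score

# Hits@1 at 30% prior alignment, DBP-WD (normal) column of Table 2.
rsn_30, boot_30 = 38.8, 32.3
# Assumed to be the 30%->20% and 20%->10% losses, in that order.
rsn_10 = apply_relative_losses(rsn_30, [16.5, 30.2])
boot_10 = apply_relative_losses(boot_30, [22.3, 49.8])
print(f"RSN4EA ~ {rsn_10:.1f}, BootEA ~ {boot_10:.1f}, ratio ~ {rsn_10 / boot_10:.1f}")
# prints: RSN4EA ~ 22.6, BootEA ~ 12.6, ratio ~ 1.8
```

Under these assumptions the implied ratio of about 1.8 agrees with the statement that RSN4EA's Hits@1 at 10% is almost twice that of BootEA.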
Sensitivity to Random Walk Length
We also examined how the random walk length affects EA performance. As shown in Figure 5, on all eight datasets, Hits@1 increased sharply as the length grew from 5 to 15, which indicates that modeling longer paths helps learn KG embeddings and obtain better performance. Furthermore, the performance approached saturation for lengths from 15 to 25. Therefore, for efficiency, the results reported in Tables 2 and 3 are based on length 15.
Conclusion and Future Work
In this paper, we proposed RSN4EA, which employs biased random walks to sample paths specific to EA and leverages RSN to learn KG embeddings. Our experimental results showed that RSN4EA not only outperformed the existing embedding-based EA methods, but also achieved superior performance compared with RNN and RRN. It also worked well for KG completion.
In future work, we plan to continue exploring KG sequence learning. First, KGs often contain rich textual information such as names and descriptions, which can be modeled with character- or word-level sequential models. Since RSN models KGs in a sequential manner, it is worth studying a unified sequential model that learns KG embeddings from all this information. Second, in addition to paths, neighboring information provides another type of context and may also be helpful for learning KG embeddings. We look forward to integrating the neighboring context to further improve performance.
References
[Bordes et al.2013] Bordes, A.; Usunier, N.; Garcia-Durán, A.; Weston, J.; and Yakhnenko, O. 2013. Translating embeddings for modeling multi-relational data. In Proc. of the 26th International Conference on Neural Information Processing Systems, 2787–2795.
[Chen et al.2017] Chen, M.; Tian, Y.; Yang, M.; and Zaniolo, C. 2017. Multilingual knowledge graph embeddings for cross-lingual knowledge alignment. In Proc. of the 26th International Joint Conference on Artificial Intelligence, 1511–1517.
[Chen et al.2018] Chen, M.; Tian, Y.; Chang, K.-W.; Skiena, S.; and Zaniolo, C. 2018. Co-training embeddings of knowledge graphs and entity descriptions for cross-lingual entity alignment. In Proc. of the 27th International Joint Conference on Artificial Intelligence, 3998–4004.
[Dettmers et al.2018] Dettmers, T.; Minervini, P.; Stenetorp, P.; and Riedel, S. 2018. Convolutional 2D knowledge graph embeddings. In Proc. of the 32nd AAAI Conference on Artificial Intelligence, 1811–1818.
[Grover and Leskovec2016] Grover, A., and Leskovec, J. 2016. node2vec: Scalable feature learning for networks. In Proc. of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 855–864.
[Gutmann and Hyvärinen2010] Gutmann, M., and Hyvärinen, A. 2010. Noise-contrastive estimation: A new estimation principle for unnormalized statistical models. In Proc. of the 13th International Conference on Artificial Intelligence and Statistics, 297–304.
[He et al.2016] He, K.; Zhang, X.; Ren, S.; and Sun, J. 2016. Deep residual learning for image recognition. In Proc. of the 29th IEEE Conference on Computer Vision and Pattern Recognition, 770–778.
[Hochreiter and Schmidhuber1997] Hochreiter, S., and Schmidhuber, J. 1997. Long short-term memory. Neural Computation 9(8):1735–1780.
[Ioffe and Szegedy2015] Ioffe, S., and Szegedy, C. 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proc. of the 32nd International Conference on Machine Learning, 448–456.
[Kingma and Ba2015] Kingma, D. P., and Ba, J. 2015. Adam: A method for stochastic optimization. In Proc. of the 3rd International Conference on Learning Representations.
[Leskovec and Faloutsos2006] Leskovec, J., and Faloutsos, C. 2006. Sampling from large graphs. In Proc. of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 631–636.
[Lin et al.2015a] Lin, Y.; Liu, Z.; Sun, M.; Liu, Y.; and Zhu, X. 2015a. Learning entity and relation embeddings for knowledge graph completion. In Proc. of the 29th AAAI Conference on Artificial Intelligence, 2181–2187.
[Lin et al.2015b] Lin, Y.; Liu, Z.; Luan, H.-B.; Sun, M.; Rao, S.; and Liu, S. 2015b. Modeling relation paths for representation learning of knowledge bases. In Proc. of the 2015 Conference on Empirical Methods in Natural Language Processing, 705–714.
[Mikolov, Yih, and Zweig2013] Mikolov, T.; Yih, W.; and Zweig, G. 2013. Linguistic regularities in continuous space word representations. In Proc. of the 11th Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 746–751.
[Perozzi, Al-Rfou, and Skiena2014] Perozzi, B.; Al-Rfou, R.; and Skiena, S. 2014. DeepWalk: Online learning of social representations. In Proc. of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 701–710.
[Srivastava et al.2014] Srivastava, N.; Hinton, G. E.; Krizhevsky, A.; Sutskever, I.; and Salakhutdinov, R. 2014. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research 15(1):1929–1958.
[Sun et al.2018] Sun, Z.; Hu, W.; Zhang, Q.; and Qu, Y. 2018. Bootstrapping entity alignment with knowledge graph embedding. In Proc. of the 27th International Joint Conference on Artificial Intelligence, 4396–4402.
[Sun, Hu, and Li2017] Sun, Z.; Hu, W.; and Li, C. 2017. Cross-lingual entity alignment via joint attribute-preserving embedding. In Proc. of the 16th International Semantic Web Conference, 628–644.
[Toutanova and Chen2015] Toutanova, K., and Chen, D. 2015. Observed versus latent features for knowledge base and text inference. In Proc. of the 3rd Workshop on Continuous Vector Space Models and their Compositionality, 57–66. Beijing, China: ACL.
[Trouillon et al.2016] Trouillon, T.; Welbl, J.; Riedel, S.; Gaussier, É.; and Bouchard, G. 2016. Complex embeddings for simple link prediction. In Proc. of the 33rd International Conference on Machine Learning, 2071–2080.
[Wang and Tian2016] Wang, Y., and Tian, F. 2016. Recurrent residual learning for sequence classification. In Proc. of the 2016 Conference on Empirical Methods in Natural Language Processing, 938–943.
[Wang et al.2017] Wang, Q.; Mao, Z.; Wang, B.; and Guo, L. 2017. Knowledge graph embedding: A survey of approaches and applications. IEEE Transactions on Knowledge and Data Engineering 29(12):2724–2743.
[Yang, Yang, and Cohen2017] Yang, F.; Yang, Z.; and Cohen, W. W. 2017. Differentiable learning of logical rules for knowledge base reasoning. In Proc. of the 30th International Conference on Neural Information Processing Systems, 2316–2325.
[Zhu et al.2017] Zhu, H.; Xie, R.; Liu, Z.; and Sun, M. 2017. Iterative entity alignment via joint knowledge embeddings. In Proc. of the 26th International Joint Conference on Artificial Intelligence, 4258–4264.