1 Introduction
Large-scale knowledge bases (KBs), such as Freebase [Bollacker et al.2008], WordNet [Miller1995], Yago [Suchanek et al.2007], and NELL [Carlson et al.2010], are critical to natural language processing applications, e.g., question answering [Dong et al.2015], relation extraction [Riedel et al.2013], and language modeling [Ahn et al.2016]. These KBs generally contain billions of facts, and each fact is organized as a triple (head entity, relation, tail entity), abbreviated as (h,r,t). However, the coverage of such KBs is still far from complete compared with real-world knowledge [Dong et al.2014]. Traditional KB completion approaches, such as Markov logic networks [Richardson and Domingos2006], suffer from feature sparsity and low efficiency.
Recently, encoding the entire knowledge base into a low-dimensional vector space to learn latent representations of entities and relations has attracted widespread attention. These knowledge embedding models offer lower complexity and higher scalability than previous approaches. Among these methods, TransE [Bordes et al.2013] is a classical neural-based model, which assumes that each relation can be regarded as a translation from head to tail and uses the score function $S(h,r,t)=\|\mathbf{h}+\mathbf{r}-\mathbf{t}\|$ to measure the plausibility of triples. TransH [Wang et al.2014] and TransR [Lin et al.2015b] are representative variants of TransE. These variants consider entities from multiple aspects and various relations on different aspects.
However, the majority of these approaches only exploit direct links that connect head and tail entities to predict potential relations between entities. These approaches do not explore the fact that relation paths, which are denoted as sequences of relations, i.e., $p=(r_1, r_2, \ldots, r_l)$, play an important role in knowledge base completion. For example, the sequence of triples (J.K. Rowling, CreatedRole, Harry Potter), (Harry Potter, Describedin, Harry Potter and the Philosopher’s Stone) can be used to infer the new fact (J.K. Rowling, WroteBook, Harry Potter and the Philosopher’s Stone), which does not appear in the original KBs. Consequently, a promising new research direction is to use relation paths to learn knowledge embeddings [Neelakantan et al.2015, Guu et al.2015, Toutanova et al.2016].
For a relation path, consistent semantics is a semantic interpretation obtained by composing the meanings of its component elements. Each relation path has its own consistent semantics. However, the consistent semantics expressed by some relation paths p is unreliable for inferring new facts about an entity pair [Lin et al.2015a]. For instance, there is a common relation path , but this path is meaningless for inferring additional relationships between h and t. Therefore, reliable relation paths are urgently needed. Moreover, their consistent semantics, which is essential for knowledge representation learning, is consistent with the semantics of relation r.
Based on this intuition, we propose a compositional learning model of relation path embedding (RPE), which extends the projection and type constraints of the specific relation to the specific path. As the path ranking algorithm (PRA) [Lao et al.2011] suggests, relation paths that end in many possible tail entities are more likely to be unreliable for the entity pair. Reliable relation paths can thus be filtered using PRA. Figure 1 illustrates the basic idea of relation-specific and path-specific projections. Each entity is projected by $\mathbf{M}_r$ and $\mathbf{M}_p$ into the corresponding relation and path spaces. These different embedding spaces hold the following hypothesis: in the relation-specific space, relation r is regarded as a translation from head h to tail t; likewise, p, the path representation obtained by composing relation embeddings, is regarded as a translation from head h to tail t in the path-specific space. We design two types of compositions to dynamically construct the path-specific projection $\mathbf{M}_p$ without extra parameters. Moreover, with slight changes to negative sampling, relation-specific and path-specific type constraints can be seamlessly incorporated into our model.
Our main contributions are as follows:
1) To reinforce the reasoning ability of knowledge embedding models, the consistent semantics and the path spaces are introduced.
2) The path-specific type constraints generated from the path space help to improve the model’s discriminability.
3) Compared with the purely data-driven mechanism used in existing knowledge embedding models, using PRA to find reliable relation paths improves the interpretability of knowledge representation learning.
The remainder of this paper is organized as follows. We first provide a brief review of related knowledge embedding models in Section 2. The details of RPE are introduced in Section 3. The experiments and analysis are reported in Section 4. Conclusions and directions for future work are reported in the final section.
2 Related Work
We first concentrate on three classical translation-based models that only consider direct links between entities. The bold lowercase letter $\mathbf{v}$ denotes a column vector, and the bold uppercase letter $\mathbf{M}$ denotes a matrix. The first translation-based model is TransE, which defines the score function as $S(h,r,t)=\|\mathbf{h}+\mathbf{r}-\mathbf{t}\|$ for each triple (h,r,t). The score is low if the triple (h,r,t) is correct and high otherwise. The embeddings are learned by optimizing a global margin-loss function. This assumption is simple and cannot handle more complex relation attributes well, i.e., 1-to-N, N-to-1, and N-to-N. To alleviate this problem, TransH projects entities onto a relation-dependent hyperplane with normal vector $\mathbf{w}_r$: $\mathbf{h}_\perp=\mathbf{h}-\mathbf{w}_r^\top\mathbf{h}\,\mathbf{w}_r$ and $\mathbf{t}_\perp=\mathbf{t}-\mathbf{w}_r^\top\mathbf{t}\,\mathbf{w}_r$ (restricting $\|\mathbf{w}_r\|_2=1$). The corresponding score function is $S(h,r,t)=\|\mathbf{h}_\perp+\mathbf{r}-\mathbf{t}_\perp\|$. TransE and TransH achieve translations on the same embedding space, whereas TransR assumes that each relation should be used to project entities into different relation-specific embedding spaces since different relations may place emphasis on different entity aspects. The projected entity vectors are $\mathbf{h}_r=\mathbf{M}_r\mathbf{h}$ and $\mathbf{t}_r=\mathbf{M}_r\mathbf{t}$; thus, the new score function is defined as $S(h,r,t)=\|\mathbf{h}_r+\mathbf{r}-\mathbf{t}_r\|$.
Another research direction focuses on improving the prediction performance by using prior knowledge in the form of relation-specific type constraints [Krompass et al.2015, Chang et al.2014, Wang et al.2015]. Note that each relation should possess Domain and Range fields to indicate the subject and object types, respectively. For example, for the relation haschildren, both the Domain and Range types are person. By exploiting these limited rules, the harmful influence of a merely data-driven pattern can be avoided. Type-constrained TransE [Krompass et al.2015] imposes these constraints on the global margin-loss function to better distinguish similar embeddings in latent space.
A third line of related work comprises PTransE [Lin et al.2015a] and the path ranking algorithm (PRA) [Lao et al.2011]. PTransE considers relation paths as translations between head and tail entities and primarily addresses two problems: 1) exploiting a variant of PRA to select reliable relation paths, and 2) exploring three path representations by compositions of relation embeddings. PRA, as one of the most promising research innovations for knowledge base completion, has also attracted considerable attention [Lao et al.2015, Gardner and Mitchell2015, Wang et al.2016, Nickel et al.2016]. PRA uses the path-constrained random walk probabilities as path features to train linear classifiers for different relations. In large-scale KBs, relation paths have great significance for enhancing the reasoning ability in more complicated situations. However, none of the aforementioned models take full advantage of the consistent semantics of relation paths.
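As a concrete illustration of the three translation-based score functions reviewed at the start of this section, the following is a minimal sketch in plain Python. The function names, the toy vectors, and the use of the Euclidean norm are assumptions made for illustration; they are not taken from any of the cited implementations.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(v):
    # Euclidean norm, chosen here for illustration only.
    return math.sqrt(dot(v, v))


def translate(h, r, t):
    # ||h + r - t||: distance between the translated head and the tail.
    return norm([hi + ri - ti for hi, ri, ti in zip(h, r, t)])


def transe_score(h, r, t):
    return translate(h, r, t)


def transh_score(h, r, t, w):
    # Project h and t onto the hyperplane with unit normal w, then translate.
    def project(e):
        scale = dot(w, e)
        return [ei - scale * wi for ei, wi in zip(e, w)]
    return translate(project(h), r, project(t))


def transr_score(h, r, t, M):
    # Map h and t into the relation space with matrix M, then translate.
    def project(e):
        return [dot(row, e) for row in M]
    return translate(project(h), r, project(t))


# Toy example (hypothetical values): one head and relation, two candidate tails.
h, r = [1.0, 0.0], [0.0, 1.0]
t1, t2 = [1.0, 1.0], [2.0, 1.0]
w = [1.0, 0.0]                              # unit normal of the hyperplane
identity = [[1.0, 0.0], [0.0, 1.0]]

print(transe_score(h, r, t1), transe_score(h, r, t2))
print(transh_score(h, r, t1, w), transh_score(h, r, t2, w))
print(transr_score(h, r, t1, identity))
```

In this toy setting TransE assigns scores 0 and 1 to the two tails, whereas the TransH projection maps both tails to the same point on the hyperplane and scores both as 0, which is the mechanism that helps with 1-to-N relations.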
3 Our Model
The consistent semantics expressed by reliable relation paths has a significant impact on learning meaningful embeddings. Here, we propose a compositional learning model of relation path embedding (RPE), which includes novel path-specific projection and type constraints. All entities constitute the entity set $\mathcal{E}$, and all relations constitute the relation set $\mathcal{R}$. RPE uses PRA to select reliable relation paths. Precisely, for a triple (h,r,t), $P_{all}=\{p_1,p_2,\ldots,p_k\}$ is the path set for the entity pair (h,t). PRA calculates $P(t \mid h, p)$, the probability of reaching t from h following the sequence of relations indicated by p, which can be recursively defined as follows:
If p is an empty path,
(1) $P(t \mid h, p) = \begin{cases} 1 & \text{if } h = t \\ 0 & \text{otherwise} \end{cases}$
If p is not an empty path, i.e., $p=(r_1,\ldots,r_l)$, then $p'$ is defined as $(r_1,\ldots,r_{l-1})$; subsequently,
(2) $P(t \mid h, p) = \sum_{t' \in Ran(p')} P(t' \mid h, p') \cdot P(t \mid t', r_l)$
$Ran(p')$ is the set of ending nodes of $p'$. RPE obtains the reliable relation path set $P_{filter}=\{p_1,p_2,\ldots,p_z\}$ by selecting relation paths whose probabilities are above a certain threshold.
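The recursive definition of the path-constrained random walk probability can be unrolled into a forward pass over the path. The sketch below assumes a uniform step probability (each successor of a node under a relation is equally likely), a toy graph stored as a dictionary, and hypothetical function names; none of these come from the original implementation.

```python
def walk_distribution(edges, h, path):
    """Distribution over end nodes of a path-constrained random walk from h."""
    dist = {h: 1.0}                      # empty path: probability 1 of staying at h
    for relation in path:
        nxt = {}
        for node, prob in dist.items():
            successors = edges.get((node, relation), [])
            for s in successors:
                # Uniform step: P(s | node, relation) = 1 / |successors|
                nxt[s] = nxt.get(s, 0.0) + prob / len(successors)
        dist = nxt
    return dist


def path_probability(edges, h, path, t):
    return walk_distribution(edges, h, path).get(t, 0.0)


def reliable_paths(edges, h, t, candidate_paths, threshold):
    # Keep only the paths whose probability of reaching t exceeds the threshold.
    return [p for p in candidate_paths
            if path_probability(edges, h, p, t) > threshold]


# Hypothetical toy graph: (node, relation) -> list of successor nodes.
edges = {
    ("h", "r1"): ["a", "b"],
    ("a", "r2"): ["t"],
    ("b", "r2"): ["t", "u"],
    ("h", "r3"): ["a", "b", "c", "d"],
    ("a", "r4"): ["t"],
}
candidates = [("r1", "r2"), ("r3", "r4")]
print(path_probability(edges, "h", ("r1", "r2"), "t"))
print(reliable_paths(edges, "h", "t", candidates, threshold=0.5))
```

Here the path (r3, r4) fans out to many intermediate nodes and reaches t with low probability, so it is discarded at a threshold of 0.5, while (r1, r2) is kept.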
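3.1 Path-specific Projection
The key idea of RPE is that the consistent semantics of reliable relation paths is similar to the semantics of the relation between an entity pair. For a triple (h,r,t), RPE exploits projection matrices $\mathbf{M}_r, \mathbf{M}_p \in \mathbb{R}^{m \times n}$ to project entity vectors $\mathbf{h}, \mathbf{t} \in \mathbb{R}^{n}$ in entity space into the corresponding relation and path spaces simultaneously (m is the dimension of relation embeddings, n is the dimension of entity embeddings, and m may differ from n). The projected vectors $(\mathbf{h}_r, \mathbf{h}_p, \mathbf{t}_r, \mathbf{t}_p)$ in their respective embedding spaces are denoted as follows:
(3) $\mathbf{h}_r = \mathbf{M}_r\mathbf{h}, \quad \mathbf{h}_p = \mathbf{M}_p\mathbf{h}$
(4) $\mathbf{t}_r = \mathbf{M}_r\mathbf{t}, \quad \mathbf{t}_p = \mathbf{M}_p\mathbf{t}$
Because relation paths are sequences of relations $p=(r_1, r_2, \ldots, r_l)$, we dynamically use the relation-specific matrices $\mathbf{M}_{r_i}$ to construct $\mathbf{M}_p$ to decrease the model complexity. Subsequently, we explore two compositions for the formation of $\mathbf{M}_p$, which are formulated as follows:
(5) $\mathbf{M}_p = \mathbf{M}_{r_1} + \mathbf{M}_{r_2} + \cdots + \mathbf{M}_{r_l}$
(6) $\mathbf{M}_p = \mathbf{M}_{r_1} \times \mathbf{M}_{r_2} \times \cdots \times \mathbf{M}_{r_l}$
where addition composition (ACOM) and multiplication composition (MCOM) represent cumulative addition and multiplication for path projection. Matrix normalization is applied on $\mathbf{M}_p$ for both compositions. The new score function is defined as follows: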
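(7) $G(h,r,t) = S(h,r,t) + \lambda\, S(h,p,t) = \|\mathbf{h}_r + \mathbf{r} - \mathbf{t}_r\| + \frac{\lambda}{Z} \sum_{p_i \in P_{filter}} P(t \mid h, p_i)\, P(r \mid p_i)\, \|\mathbf{h}_p + \mathbf{p}_i - \mathbf{t}_p\|$
where the sum runs over the reliable relation paths $P_{filter}$. For path representation $\mathbf{p}$, we use $\mathbf{p}=\mathbf{r}_1+\mathbf{r}_2+\ldots+\mathbf{r}_l$, as suggested by PTransE. $\lambda$ is the hyperparameter used to balance the knowledge embedding score S(h,r,t) and the relation path embedding score S(h,p,t). $Z=\sum_{p_i \in P_{filter}} P(t \mid h, p_i)$ is the normalization factor, and $P(r \mid p) = P(r, p) / P(p)$ is used to assist in calculating the reliability of relation paths. In the experiments, we constrain the norms of these embeddings, i.e., $\|\mathbf{h}\|_2 \le 1$, $\|\mathbf{t}\|_2 \le 1$, $\|\mathbf{r}\|_2 \le 1$, $\|\mathbf{h}_r\|_2 \le 1$, $\|\mathbf{t}_r\|_2 \le 1$, $\|\mathbf{h}_p\|_2 \le 1$, and $\|\mathbf{t}_p\|_2 \le 1$. By exploiting the consistent semantics of reliable relation paths, RPE embeds entities into the relation and path spaces simultaneously. This method improves the flexibility of RPE when modeling more complicated relation attributes.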
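To make the two compositions and the combined score more tangible, here is a minimal sketch in plain Python. Several choices are assumptions not fixed by the text: the Frobenius norm for "matrix normalization", the Euclidean norm for the translation distance, square matrices (m = n) so that MCOM is well defined, and all function and variable names.

```python
import math


def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def normalize(M):
    # Assumption: Frobenius normalization (fails for an all-zero matrix).
    size = math.sqrt(sum(x * x for row in M for x in row))
    return [[x / size for x in row] for row in M]


def compose_projection(mats, mode):
    # ACOM sums the relation-specific matrices; MCOM multiplies them.
    combine = mat_add if mode == "ACOM" else mat_mul
    M_p = mats[0]
    for M in mats[1:]:
        M_p = combine(M_p, M)
    return normalize(M_p)


def translation_norm(M, h, rel, t):
    # ||M h + rel - M t|| with the Euclidean norm (an assumption).
    Mh, Mt = mat_vec(M, h), mat_vec(M, t)
    return math.sqrt(sum((a + b - c) ** 2 for a, b, c in zip(Mh, rel, Mt)))


def rpe_score(h, r, t, M_r, paths, lam, mode="ACOM"):
    """paths: list of (relation_vectors, relation_matrices, P(t|h,p), P(r|p))."""
    score = translation_norm(M_r, h, r, t)
    if not paths:
        return score
    Z = sum(p_t for _, _, p_t, _ in paths)                # normalization factor
    path_term = 0.0
    for vecs, mats, p_t, p_r in paths:
        p_vec = [sum(col) for col in zip(*vecs)]          # p = r_1 + ... + r_l
        M_p = compose_projection(mats, mode)
        path_term += p_t * p_r * translation_norm(M_p, h, p_vec, t)
    return score + lam * path_term / Z


identity = [[1.0, 0.0], [0.0, 1.0]]
swap = [[0.0, 1.0], [1.0, 0.0]]
print(compose_projection([identity, swap], "ACOM"))
print(compose_projection([identity, swap], "MCOM"))
```

The path-specific matrix is built only from matrices the model already has, which is why the compositions add no parameters.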
3.2 Path-specific Type Constraints
In RPE, based on the semantic similarity between relations and reliable relation paths, we can naturally extend the relation-specific type constraints to novel path-specific type constraints. In type-constrained TransE, corrupted triples are drawn from a uniform distribution.
In our model, we borrow the idea from TransH and combine the two type constraints with a Bernoulli distribution. For each relation r, Domain and Range indicate the subject and object types of r. The Domain entity set is the set of entities that conform to Domain, and the Range entity set is the set of entities that conform to Range. We calculate the average number of tail entities per head entity, named teh, and the average number of head entities per tail entity, named het. The Bernoulli distribution with parameter $\frac{teh}{teh+het}$ for each relation r is combined with the two type constraints as follows: RPE samples entities from the Domain entity set to replace the head entity with probability $\frac{teh}{teh+het}$, and it samples entities from the Range entity set to replace the tail entity with probability $\frac{het}{teh+het}$. The objective function for RPE is defined as follows:
(8) 
L(h,r,t) is the loss function for triples, and L(h,,t) is the loss function for relation paths.
(9) 
(10) 
We denote $C=\{(h_i,r_i,t_i) \mid i=1,2,\ldots,t\}$ as the set of all observed triples and $C'=\{(h'_i,r_i,t_i) \cup (h_i,r_i,t'_i) \mid i=1,2,\ldots,t\}$ as the set of corrupted triples, where each element of $C'$ is obtained by randomly sampling from the entity set. $C''$, whose elements conform to the two type constraints with a Bernoulli distribution, is a subset of $C'$. Max(0, x) returns the maximum of 0 and x. $\gamma$ is the margin hyperparameter, which separates correct triples from corrupted triples. By exploiting these two kinds of prior knowledge, RPE can better distinguish similar embeddings in different embedding spaces, thus allowing it to achieve better prediction.
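The type-constrained Bernoulli sampling described in this subsection can be sketched as follows. This is an illustration under stated assumptions: the head-replacement probability teh / (teh + het) follows the TransH convention, Domain and Range are approximated by the heads and tails observed with the relation, and the function names and the fallback to the other side when no valid candidate exists are hypothetical choices.

```python
import random
from collections import defaultdict


def head_replacement_probability(triples, relation):
    """teh / (teh + het) for one relation (assumed TransH-style Bernoulli trick)."""
    tails_of, heads_of = defaultdict(set), defaultdict(set)
    for h, r, t in triples:
        if r == relation:
            tails_of[h].add(t)
            heads_of[t].add(h)
    teh = sum(len(s) for s in tails_of.values()) / len(tails_of)   # tails per head
    het = sum(len(s) for s in heads_of.values()) / len(heads_of)   # heads per tail
    return teh / (teh + het)


def corrupt(triple, triples, rng):
    """Return one type-constrained corrupted triple, or None if none exists."""
    h, r, t = triple
    known = set(triples)
    # Instance-level approximation: Domain/Range are the heads/tails seen with r.
    domain = sorted({a for a, rel, _ in triples if rel == r})
    range_ = sorted({b for _, rel, b in triples if rel == r})
    new_heads = [(e, r, t) for e in domain if (e, r, t) not in known]
    new_tails = [(h, r, e) for e in range_ if (h, r, e) not in known]
    replace_head = rng.random() < head_replacement_probability(triples, r)
    # Hypothetical fallback: use the other side if the chosen side has no candidate.
    pool = (new_heads or new_tails) if replace_head else (new_tails or new_heads)
    return rng.choice(pool) if pool else None


# Hypothetical toy data: a relation that tends towards 1-to-N.
triples = [("a", "r", "x"), ("a", "r", "y"), ("a", "r", "z"), ("b", "r", "x")]
rng = random.Random(0)
print(head_replacement_probability(triples, "r"))
print(corrupt(("b", "r", "x"), triples, rng))
```

Restricting the candidates to entities of the right type yields harder negatives than uniform sampling over all entities, which is the effect the type constraints aim for.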
3.3 Training Details
We adopt stochastic gradient descent (SGD) to minimize the objective function. TransE or RPE (initial) can be exploited for the initializations of all entities and relations. The score function of RPE (initial) is as follows:
(11) 
We also employ this score function in our experiment as a baseline. The projection matrices are initialized as identity matrices. RPE holds the local closed-world assumption (LCWA) [Dong et al.2014], where each relation’s domain and range types are based on the instance level. Their type information is provided by KBs or the entities shown in observed triples.
Note that each relation r has the reverse relation $r^{-1}$; therefore, to increase supplemental path information, RPE utilizes the reverse relation paths. For example, for the relation path , its reverse relation path can be defined as .
For each iteration, we randomly sample a correct triple (h,r,t) with its reverse $(t,r^{-1},h)$, and the final score function of our model is defined as follows:
(12) 
Theoretically, the length of a relation path can be set arbitrarily, but in the implementation, we prefer a small value to reduce the time required to enumerate all relation paths. Moreover, as suggested by the path-constrained random walk probability $P(t \mid h, p)$, as the path length increases, $P(t \mid h, p)$ becomes smaller and the relation path is more likely to be discarded.
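The use of reverse relations and short paths can be illustrated with a small sketch that augments a triple set with reverse triples and enumerates paths of length 2. The "^-1" suffix for reverse relations and the function names are hypothetical conventions chosen for this example; the two triples are the ones used in the introduction.

```python
from collections import defaultdict


def add_reverse_triples(triples):
    # The "^-1" suffix is a hypothetical naming convention for reverse relations.
    return list(triples) + [(t, r + "^-1", h) for h, r, t in triples]


def two_hop_paths(triples, h, t):
    """All relation paths (r1, r2) leading from h to t via one intermediate node."""
    out = defaultdict(list)
    for a, r, b in triples:
        out[a].append((r, b))
    return sorted({(r1, r2)
                   for r1, mid in out.get(h, [])
                   for r2, end in out.get(mid, [])
                   if end == t})


def reverse_path(path):
    # Reverse the order and invert every relation: (r1, r2) -> (r2^-1, r1^-1).
    return tuple(r + "^-1" for r in reversed(path))


book = "Harry Potter and the Philosopher's Stone"
triples = [
    ("J.K. Rowling", "CreatedRole", "Harry Potter"),
    ("Harry Potter", "Describedin", book),
]
augmented = add_reverse_triples(triples)
print(two_hop_paths(augmented, "J.K. Rowling", book))
print(two_hop_paths(augmented, book, "J.K. Rowling"))
```

Adding the reverse triples doubles the number of triples and makes the path from the book back to the author available as well.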
4 Experiments
To examine the retrieval and discriminative ability of our model, RPE is evaluated on two standard subtasks of knowledge base completion: link prediction and triple classification. We also present a case study on consistent semantics learned by our method to further highlight the importance of relation paths for knowledge representation learning.
4.1 Datasets
We evaluate our model on two classical large-scale knowledge bases: Freebase and WordNet. Freebase is a large collaborative knowledge base that contains billions of facts about the real world, such as the triple (Beijing, Locatedin, China), which describes the fact that Beijing is located in China. WordNet is a large lexical knowledge base of English, in which each entity is a synset that expresses a distinct concept, and each relationship is a conceptual-semantic or lexical relation. We use two subsets of Freebase, FB15K and FB13 [Bordes et al.2013], and one subset of WordNet, WN11 [Socher et al.2013]. Table 1 presents the statistics of the datasets, where the columns give the numbers of entities, relations, and triples in the training, validation, and test sets.
Table 1: Statistics of the datasets.
Dataset  #Ent  #Rel  #Train  #Valid  #Test
FB15K  14951  1345  483142  50000  59071
FB13  75043  13  316232  5908  23733 
WN11  38696  11  112581  2609  10544 
Table 2: Link prediction results on FB15K.
Model  Mean Rank (Raw)  Mean Rank (Filter)  Hits@10 % (Raw)  Hits@10 % (Filter)
TransE [Bordes et al.2013]  243  125  34.9  47.1 
TransH (unif) [Wang et al.2014]  211  84  42.5  58.5 
TransH (bern) [Wang et al.2014]  212  87  45.7  64.4 
TransR (unif) [Lin et al.2015b]  226  78  43.8  65.5 
TransR (bern) [Lin et al.2015b]  198  77  48.2  68.7 
PTransE (ADD, 2-hop) [Lin et al.2015a]  200  54  51.8  83.4
PTransE (MUL, 2-hop) [Lin et al.2015a]  216  67  47.4  77.7
PTransE (ADD, 3-hop) [Lin et al.2015a]  207  58  51.4  84.6
TranSparse (separate, S, unif) [Ji et al.2016]  211  63  50.1  77.9 
TranSparse (separate, S, bern) [Ji et al.2016]  187  82  53.3  79.5 
RPE (initial)  207  58  50.8  82.2 
RPE (PC)  196  77  49.1  72.6 
RPE (ACOM)  171  41  52.0  85.5 
RPE (MCOM)  183  43  52.2  81.7 
RPE (PC + ACOM)  184  42  51.1  84.2 
RPE (PC + MCOM)  186  43  51.7  76.5 
In our model, each triple is paired with its reverse triple so that reverse relation paths can be used. Therefore, the total number of triples is twice the number in the original datasets. Our model exploits the LCWA. In this case, we utilize the type information provided by [Xie et al.2016] for FB15K. As for FB13 and WN11, we do not depend on auxiliary data, and the domain and range of each relation are approximated by the triples from the original datasets.
Table 3: Hits@10 (%) on FB15K by mapping properties of relations, for predicting head entities and tail entities.
Model  Head 1-to-1  Head 1-to-N  Head N-to-1  Head N-to-N  Tail 1-to-1  Tail 1-to-N  Tail N-to-1  Tail N-to-N
TransE [Bordes et al.2013]  43.7  65.7  18.2  47.2  43.7  19.7  66.7  50.0 
TransH (unif) [Wang et al.2014]  66.7  81.7  30.2  57.4  63.7  30.1  83.2  60.8 
TransH (bern) [Wang et al.2014]  66.8  87.6  28.7  64.5  65.5  39.8  83.3  67.2 
TransR (unif) [Lin et al.2015b]  76.9  77.9  38.1  66.9  76.2  38.4  76.2  69.1 
TransR (bern) [Lin et al.2015b]  78.8  89.2  34.1  69.2  79.2  37.4  90.4  72.1 
PTransE (ADD, 2-hop) [Lin et al.2015a]  91.0  92.8  60.9  83.8  91.2  74.0  88.9  86.4
PTransE (MUL, 2-hop) [Lin et al.2015a]  89.0  86.8  57.6  79.8  87.8  71.4  72.2  80.4
PTransE (ADD, 3-hop) [Lin et al.2015a]  90.1  92.0  58.7  86.1  90.7  70.7  87.5  88.7
TranSparse (separate, S, unif) [Ji et al.2016]  82.3  85.2  51.3  79.6  82.3  59.8  84.9  82.1 
TranSparse (separate, S, bern) [Ji et al.2016]  86.8  95.5  44.3  80.9  86.6  56.6  94.4  83.3 
RPE (initial)  83.9  93.6  60.1  78.2  82.2  66.8  92.2  80.6 
RPE (PC)  82.6  92.7  44.0  71.2  82.6  64.6  81.2  75.8 
RPE (ACOM)  92.5  96.6  63.7  87.9  92.5  79.1  95.1  90.8 
RPE (MCOM)  91.2  95.8  55.4  87.2  91.2  66.3  94.2  89.9 
RPE (PC + ACOM)  89.5  94.3  63.2  84.2  89.1  77.0  89.7  87.6 
RPE (PC + MCOM)  89.3  95.6  45.2  84.2  89.7  62.8  94.1  87.7 
4.2 Link Prediction
The link prediction task consists of predicting the possible h or t for test triples when h or t is missing. FB15K is employed for this task.
4.2.1 Evaluation Protocol
We follow the same evaluation procedures as used in [Bordes et al.2013, Wang et al.2014, Lin et al.2015b]. First, for each test triple (h,r,t), we replace h or t with every entity in the entity set. Second, each corrupted triple is scored by the corresponding score function S(h,r,t). The final step is to rank the original correct entity among these scores in ascending order, since a lower score indicates a more plausible triple.
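The ranking procedure can be sketched as follows, assuming that lower scores indicate more plausible triples, as stated for the translation-based score functions in Section 2. The function names and the toy scores are hypothetical; the optional `known_true` argument and the aggregation helper anticipate the Filter setting and the two metrics defined in the next paragraph.

```python
def rank_of_correct(score_fn, triple, entities, side="tail", known_true=frozenset()):
    """Rank of the correct entity when one side of the triple is replaced."""
    h, r, t = triple
    correct_score = score_fn(h, r, t)
    rank = 1
    for e in entities:
        candidate = (h, r, e) if side == "tail" else (e, r, t)
        if candidate == triple or candidate in known_true:
            continue                     # skip the test triple and filtered triples
        if score_fn(*candidate) < correct_score:
            rank += 1                    # a corrupted triple scored as more plausible
    return rank


def mean_rank_and_hits(ranks, k=10):
    mean_rank = sum(ranks) / len(ranks)
    hits = 100.0 * sum(1 for x in ranks if x <= k) / len(ranks)
    return mean_rank, hits


# Hypothetical toy score function that depends only on the tail entity.
toy_scores = {"a": 0.1, "b": 0.5, "c": 0.3, "d": 0.9}


def toy_score(h, r, t):
    return toy_scores[t]


test_triple = ("h", "r", "c")
raw_rank = rank_of_correct(toy_score, test_triple, list(toy_scores))
filtered_rank = rank_of_correct(toy_score, test_triple, list(toy_scores),
                                known_true={("h", "r", "a")})
print(raw_rank, filtered_rank)
```

In the toy example the corrupted triple with tail "a" outranks the correct one, but it is a known true triple, so removing it improves the rank of the correct entity.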
Two evaluation metrics are reported: the average rank of correct entities (Mean Rank) and the proportion of correct entities ranked in the top 10 (Hits@10). Note that if a corrupted triple already exists in the knowledge base, then it should not be considered to be incorrect. We therefore remove these corrupted triples before ranking and call this setting "Filter". If these corrupted triples are retained, then we call this setting "Raw". In both settings, better latent representations of entities and relations yield a lower mean rank and a higher Hits@10. Because we use the same dataset, the baseline results reported in [Lin et al.2015b, Lin et al.2015a, Ji et al.2016] are directly used for comparison.
4.2.2 Implementation
We select the dimension of entity embeddings n and relation embeddings m among {20, 50, 100, 120}, the margin $\gamma_1$ among {1, 2, 3, 4, 5}, the margin $\gamma_2$ among {3, 4, 5, 6, 7, 8}, the learning rate for SGD among {0.01, 0.005, 0.0025, 0.001, 0.0001}, the batch size B among {20, 120, 480, 960, 1440, 4800}, and the balance factor $\lambda$ among {0.5, 0.8, 1, 1.5, 2}. The threshold for selecting reliable relation paths is chosen from {0.01, 0.02, 0.04, 0.05} to reduce the calculation of meaningless paths.
Grid search is used to determine the optimal parameters. The best configurations for RPE (ACOM) are n=100, m=100, $\gamma_1$=2, $\gamma_2$=5, a learning rate of 0.0001, B=4800, $\lambda$=1, and a threshold of 0.05. We select RPE (initial) to initialize our model, set the path length as 2, take the L norm for the score function, and train our model for 500 epochs.
4.2.3 Analysis of Results
Table 2 reports the results of link prediction, in which the first column lists the translation-based models, the variants of PTransE, and our models. The numbers in bold are the best performance, and n-hop indicates the path length n that PTransE exploits. We denote RPE with only path-specific constraints as RPE (PC). From the results, we can observe the following. 1) Our models significantly outperform the classical knowledge embedding models (TransE, TransH, TransR, and TranSparse) and PTransE on FB15K in terms of mean rank and Hits@10 (filter). The results demonstrate that the path-specific projection can explore further implications of relation paths, which are crucial for knowledge base completion. 2) RPE (PC) improves only slightly compared with the baselines. We believe that this result is primarily because RPE (PC) only focuses on local information provided by related embeddings, ignoring some global information compared with the approach of randomly selecting corrupted entities. In terms of mean rank, RPE (ACOM) achieves the best performance, with 14.5% and 24.1% error reduction compared with PTransE (ADD, 2-hop) in the raw and filter settings, respectively. In terms of Hits@10, RPE (ACOM) brings only small improvements. RPE with path-specific type constraints and projection (RPE (PC + ACOM) and RPE (PC + MCOM)) is a compromise between RPE (PC) and RPE (ACOM).
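As a check on the error-reduction figures, assuming they are relative reductions in mean rank of RPE (ACOM) (171 raw, 41 filter in Table 2) with respect to the PTransE (ADD, 2-hop) row (200 raw, 54 filter), the arithmetic works out as follows:

```latex
\[
\frac{200 - 171}{200} = 14.5\%,
\qquad
\frac{54 - 41}{54} \approx 24.1\%
\]
```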
Table 3 presents the evaluation results separated by mapping properties of relations on FB15K. The mapping properties of relations follow the same rules as in [Bordes et al.2013], and the metrics are Hits@10 on head and tail entities. From Table 3, we can conclude the following. 1) RPE (ACOM) outperforms all baselines for all mapping properties of relations. In particular, for the 1-to-N, N-to-1, and N-to-N types of relations that plague knowledge embedding models, RPE (ACOM) achieves relative improvements of 4.1%, 4.6%, and 4.9% on head entity prediction and 6.9%, 7.0%, and 5.1% on tail entity prediction compared with the previous state-of-the-art performance achieved by PTransE (ADD, 2-hop). 2) RPE (MCOM) does not perform as well as RPE (ACOM), and we believe that this result is because RPE’s path representation is not consistent with RPE (MCOM)’s composition of projections. Although RPE (PC) improves little compared with PTransE, we will show the effectiveness of relation-specific and path-specific type constraints in triple classification. 3) We use the relation-specific projection to construct path-specific ones dynamically; then, entities are encoded into relation-specific and path-specific spaces simultaneously. The experiments are similar to link prediction, and the experimental results further demonstrate the better expressibility of our model.
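The percentage gains quoted for Table 3 appear to be relative improvements rather than differences in percentage points. The short sketch below recomputes them from the RPE (ACOM) and PTransE (ADD, 2-hop) rows of Table 3 under that assumption; the helper name is hypothetical.

```python
def relative_gain(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return 100.0 * (new - old) / old


# Hits@10 pairs from Table 3: (RPE (ACOM), PTransE (ADD, 2-hop)).
head = {"1-to-N": (96.6, 92.8), "N-to-1": (63.7, 60.9), "N-to-N": (87.9, 83.8)}
tail = {"1-to-N": (79.1, 74.0), "N-to-1": (95.1, 88.9), "N-to-N": (90.8, 86.4)}

head_gain = {k: round(relative_gain(*v), 1) for k, v in head.items()}
tail_gain = {k: round(relative_gain(*v), 1) for k, v in tail.items()}
print(head_gain)
print(tail_gain)
```

The rounded values reproduce the figures given in the text for both head and tail prediction.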
Table 4: Triple classification accuracy (%) on WN11, FB13, and FB15K.
Model  WN11  FB13  FB15K
TransE (unif) [Bordes et al.2013]  75.9  70.9  77.8 
TransE (bern) [Bordes et al.2013]  75.9  81.5  85.3 
TransH (unif) [Wang et al.2014]  77.7  76.5  78.4 
TransH (bern) [Wang et al.2014]  78.8  83.3  85.8 
TransR (unif) [Lin et al.2015b]  85.5  74.7  79.2 
TransR (bern) [Lin et al.2015b]  85.9  82.5  87.0 
PTransE (ADD, 2-hop) [Lin et al.2015a]  80.9  73.5  83.4
PTransE (MUL, 2-hop) [Lin et al.2015a]  79.4  73.6  79.3
PTransE (ADD, 3-hop) [Lin et al.2015a]  80.7  73.3  82.9
RPE (initial)  80.2  73.0  68.8 
RPE (PC)  83.8  77.4  77.9 
RPE (ACOM)  84.7  80.9  85.4 
RPE (MCOM)  83.6  76.2  85.1 
RPE (PC + ACOM)  86.8  84.3  89.8 
RPE (PC + MCOM)  85.7  83.0  87.5 
Table 5: Relations and their most relevant relation paths for two entity pairs from Freebase.
entity pair  (sociology, George Washington University)
relation  /education/field_of_study/students_majoring./education/education/institution
relation paths  a: /education/field_of_study/students_majoring./education/education/student /people/person/education./education/education/institution; b: /people/person/education./education/education/major_field_of_study /education/educational_institution/students_graduates./education/education/student
entity pair  (Planet of the Apes, art director)
relation  /education/field_of_study/students_majoring./education/education/institution
relation paths  a: /film/film/sequel /film/film_job/films_with_this_crew_job./film/film_crew_gig/film; b: /film/film/prequel /film/film/other_crew./film/film_crew_gig/film_crew_role
4.3 Triple Classification
We conduct the task of triple classification on three benchmark datasets: FB15K, FB13 and WN11. Triple classification aims to predict whether a given triple (h,r,t) is true, which is a binary classification problem.
4.3.1 Evaluation Protocol
We set a different threshold for each relation to perform this task. For a test triple (h,r,t), if its score S(h,r,t) is below the threshold for relation r, then we predict it as a positive one; otherwise, it is negative. The thresholds are obtained by maximizing the classification accuracy on the validation set.
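A minimal sketch of this protocol is given below. The choice of candidate thresholds (midpoints between neighbouring validation scores, plus one value below and one above all scores), the function names, and the toy validation data are assumptions made for illustration.

```python
from collections import defaultdict


def best_threshold(scored):
    """scored: list of (score, is_positive). Returns (threshold, accuracy)."""
    values = sorted({s for s, _ in scored})
    # Candidate cuts: below all scores, between neighbours, above all scores.
    cuts = [values[0] - 1.0]
    cuts += [(a + b) / 2 for a, b in zip(values, values[1:])]
    cuts.append(values[-1] + 1.0)
    best_cut, best_acc = cuts[0], -1.0
    for cut in cuts:
        # A triple is predicted positive when its score is below the cut.
        acc = sum((s < cut) == y for s, y in scored) / len(scored)
        if acc > best_acc:
            best_cut, best_acc = cut, acc
    return best_cut, best_acc


def fit_thresholds(validation):
    """validation: list of (relation, score, is_positive)."""
    by_relation = defaultdict(list)
    for relation, score, label in validation:
        by_relation[relation].append((score, label))
    return {rel: best_threshold(items)[0] for rel, items in by_relation.items()}


def classify(relation, score, thresholds):
    return score < thresholds[relation]     # below the threshold => positive


# Hypothetical validation scores for two relations.
validation = [
    ("r1", 0.1, True), ("r1", 0.2, True), ("r1", 0.6, False), ("r1", 0.9, False),
    ("r2", 1.0, True), ("r2", 3.0, False),
]
thresholds = fit_thresholds(validation)
print(classify("r1", 0.3, thresholds), classify("r2", 2.5, thresholds))
```

Fitting one threshold per relation matters because score scales can differ considerably between relations, as the two toy relations show.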
4.3.2 Implementation
We directly compare our model with prior work using the results for knowledge embedding models reported in [Lin et al.2015b] for WN11 and FB13. Because [Lin et al.2015a] does not evaluate PTransE’s performance on this task, we use the code of PTransE that is released in [Lin et al.2015a] to complete it. FB13 and WN11 already contain negative samples. For FB15K, we use the same process to produce negative samples, as suggested by [Socher et al.2013]. The hyperparameter intervals are the same as for link prediction. The best configurations for RPE (PC + ACOM) are as follows: n=50, m=50, $\gamma_1$=5, $\gamma_2$=6, a learning rate of 0.0001, B=1440, $\lambda$=0.8, and a threshold of 0.05, taking the L norm on WN11; n=100, m=100, $\gamma_1$=3, $\gamma_2$=6, a learning rate of 0.0001, B=960, $\lambda$=0.8, and a threshold of 0.05, taking the L norm on FB13; and n=100, m=100, $\gamma_1$=4, $\gamma_2$=5, a learning rate of 0.0001, B=4800, $\lambda$=1, and a threshold of 0.05, taking the L norm on FB15K. We exploit RPE (initial) for initialization, and we set the path length as 2 and the maximum number of epochs as 500.
4.3.3 Analysis of Results
Table 4 lists the results for triple classification on different datasets, and the evaluation metric is classification accuracy. The results demonstrate that 1) RPE (PC + ACOM) achieves the best performance on all datasets, taking good advantage of path-specific projection and type constraints; 2) RPE (PC) improves the accuracy of RPE (initial) by 4.5%, 6.0%, and 13.2% in relative terms on WN11, FB13, and FB15K, respectively, with the largest gain on FB15K; thus, we consider that lengthening the distances between similar entities in embedding space is essential for specific problems. The experimental results also indicate that although LCWA can compensate for missing type information, real relation-type information is predominant.
4.4 Case Study of Consistent Semantics
As shown in Table 5, for two entity pairs (sociology, George Washington University) and (Planet of the Apes, art director) from Freebase, RPE provides two relations and the four most relevant relation paths (each relation is mapped to two relation paths, denoted as a and b), which are considered to have similar semantics to their respective relations. However, this type of consistent semantics of reliable relation paths cannot be achieved by translation-based models, such as Trans(E, H, R), because translation-based models only exploit the direct links and do not consider relation path information, such as the reliable relation paths in rows 3 and 6 of Table 5. In contrast, RPE can obtain reliable relation paths with their consistent semantics, and it extends the projection and type constraints of the specific relation to the specific path. Furthermore, the experimental results demonstrate that by explicitly using the additional semantics, RPE consistently and significantly outperforms state-of-the-art knowledge embedding models in the two benchmark tasks (link prediction and triple classification).
5 Conclusions and Future Work
In this paper, we propose a compositional learning model of relation path embedding (RPE) for knowledge base completion. To the best of our knowledge, this is the first time that a path-specific projection has been proposed, and it simultaneously embeds each entity into relation and path spaces to learn more meaningful embeddings. Moreover, we put forward novel path-specific type constraints based on relation-specific constraints to better distinguish similar embeddings in the latent space.
In the future, we plan to 1) incorporate other potential semantic information into the relation path modeling, such as the information provided by those intermediate nodes connected by relation paths, and 2) explore relation path embedding in other applications associated with knowledge bases, such as distant supervision for relation extraction and question answering over knowledge bases.
Acknowledgments
The authors are grateful for the support of the National Natural Science Foundation of China (No. 61572228, No. 61472158, No. 61300147, and No. 61602207), the Science Technology Development Project from Jilin Province (No. 20140520070JH and No. 20160101247JC), the Premier-Discipline Enhancement Scheme supported by Zhuhai Government and Premier Key-Discipline Enhancement Scheme supported by Guangdong Government Funds, and the Smart Society Collaborative Project funded by the European Commission’s 7th Framework ICT Programme for Research and Technological Development under Grant Agreement No. 600854.
References
 [Ahn et al.2016] Sungjin Ahn, Heeyoul Choi, Tanel Pärnamaa, and Yoshua Bengio. A neural knowledge language model. CoRR, abs/1608.00318, 2016.
 [Bollacker et al.2008] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. Freebase: a collaboratively created graph database for structuring human knowledge. In Proceedings of KDD, pages 1247–1250, 2008.
 [Bordes et al.2013] Antoine Bordes, Nicolas Usunier, Alberto García-Durán, Jason Weston, and Oksana Yakhnenko. Translating embeddings for modeling multi-relational data. In Proceedings of NIPS, pages 2787–2795, 2013.
 [Carlson et al.2010] Andrew Carlson, Justin Betteridge, Bryan Kisiel, Burr Settles, Estevam Hruschka, and Tom Mitchell. Toward an architecture for never-ending language learning. In Proceedings of AAAI, pages 1306–1313, 2010.

 [Chang et al.2014] Kai-Wei Chang, Wen-tau Yih, Bishan Yang, and Christopher Meek. Typed tensor decomposition of knowledge bases for relation extraction. In Proceedings of EMNLP, pages 1568–1579, 2014.
 [Dong et al.2014] Xin Dong, Evgeniy Gabrilovich, Geremy Heitz, Wilko Horn, Ni Lao, Kevin Murphy, Thomas Strohmann, Shaohua Sun, and Wei Zhang. Knowledge vault: a web-scale approach to probabilistic knowledge fusion. In Proceedings of KDD, pages 601–610, 2014.
 [Dong et al.2015] Li Dong, Furu Wei, Ming Zhou, and Ke Xu. Question answering over Freebase with multi-column convolutional neural networks. In Proceedings of ACL, pages 260–269, 2015.
 [Gardner and Mitchell2015] Matthew Gardner and Tom Mitchell. Efficient and expressive knowledge base completion using subgraph feature extraction. In Proceedings of EMNLP, pages 1488–1498, 2015.
 [Guu et al.2015] Kelvin Guu, John Miller, and Percy Liang. Traversing knowledge graphs in vector space. In Proceedings of EMNLP, pages 318–327, 2015.
 [Ji et al.2016] Guoliang Ji, Kang Liu, Shizhu He, and Jun Zhao. Knowledge graph completion with adaptive sparse transfer matrix. In Proceedings of AAAI, pages 985–991, 2016.
 [Krompass et al.2015] Denis Krompass, Stephan Baier, and Volker Tresp. Type-constrained representation learning in knowledge graphs. In Proceedings of ISWC, pages 640–655, 2015.
 [Lao et al.2011] Ni Lao, Tom Mitchell, and William W. Cohen. Random walk inference and learning in a large scale knowledge base. In Proceedings of EMNLP, pages 529–539, 2011.
 [Lao et al.2015] Ni Lao, Einat Minkov, and William W. Cohen. Learning relational features with backward random walks. In Proceedings of ACL, pages 666–675, 2015.
 [Lin et al.2015a] Yankai Lin, Zhiyuan Liu, Huan-Bo Luan, Maosong Sun, Siwei Rao, and Song Liu. Modeling relation paths for representation learning of knowledge bases. In Proceedings of EMNLP, pages 705–714, 2015.
 [Lin et al.2015b] Yankai Lin, Zhiyuan Liu, Maosong Sun, Yang Liu, and Xuan Zhu. Learning entity and relation embeddings for knowledge graph completion. In AAAI, pages 2181–2187, 2015.
 [Miller1995] George A. Miller. WordNet: a lexical database for English. Communications of the ACM, 38:39–41, 1995.
 [Neelakantan et al.2015] Arvind Neelakantan, Benjamin Roth, and Andrew McCallum. Compositional vector space models for knowledge base completion. In Proceedings of ACL, pages 156–166, 2015.

 [Nickel et al.2016] Maximilian Nickel, Kevin Murphy, Volker Tresp, and Evgeniy Gabrilovich. A review of relational machine learning for knowledge graphs. Proceedings of the IEEE, 104:11–33, 2016.
 [Richardson and Domingos2006] Matthew Richardson and Pedro M. Domingos. Markov logic networks. Machine Learning, 62:107–136, 2006.
 [Riedel et al.2013] Sebastian Riedel, Limin Yao, Andrew McCallum, and Benjamin M. Marlin. Relation extraction with matrix factorization and universal schemas. In Proceedings of HLT-NAACL, pages 74–84, 2013.
 [Socher et al.2013] Richard Socher, Danqi Chen, Christopher D. Manning, and Andrew Y. Ng. Reasoning with neural tensor networks for knowledge base completion. In Proceedings of NIPS, pages 926–934, 2013.
 [Suchanek et al.2007] Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum. Yago: a core of semantic knowledge. In Proceedings of WWW, pages 697–706, 2007.
 [Toutanova et al.2016] Kristina Toutanova, Victoria Lin, Wen-tau Yih, Hoifung Poon, and Chris Quirk. Compositional learning of embeddings for relation paths in knowledge base and text. In Proceedings of ACL, pages 1434–1444, 2016.
 [Wang et al.2014] Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen. Knowledge graph embedding by translating on hyperplanes. In Proceedings of AAAI, pages 1112–1119, 2014.
 [Wang et al.2015] Quan Wang, Bin Wang, and Li Guo. Knowledge base completion using embeddings and rules. In Proceedings of IJCAI, pages 1859–1865, 2015.
 [Wang et al.2016] Quan Wang, Jing Liu, Yuanfei Luo, Bin Wang, and Chin-Yew Lin. Knowledge base completion via coupled path ranking. In Proceedings of ACL, pages 1308–1318, 2016.
 [Xie et al.2016] Ruobing Xie, Zhiyuan Liu, and Maosong Sun. Representation learning of knowledge graphs with hierarchical types. In Proceedings of IJCAI, 2016.