Knowledge Graph Embedding with Linear Representation for Link Prediction

by Yanhui Peng, et al.
Nanjing University

Knowledge graph (KG) embedding aims to represent entities and relations in KGs as vectors in a continuous vector space. An embedding model that covers as many connectivity patterns and mapping properties of relations as possible is likely to bring more benefits to applications. In this paper, we propose a novel embedding model, LineaRE, which is capable of modeling four connectivity patterns (symmetry, antisymmetry, inversion, and composition) and four mapping properties of relations (one-to-one, one-to-many, many-to-one, and many-to-many). Specifically, in our model, a relation is a linear function of two entities represented as low-dimensional vectors, with two weight vectors and a bias vector. Since the vectors are defined in a real number space and the scoring function of the model is linear, our model is simple and scalable to large KGs. Experimental results on four datasets show that the proposed LineaRE significantly outperforms existing state-of-the-art models on the link prediction task.



1 Introduction

The construction and applications of knowledge graphs (KGs) have attracted much attention in recent years. Many KGs, such as WordNet [9], DBpedia [8], and Freebase [1], have been built and successfully applied to several AI domains, including information retrieval [16], recommender systems [19], question answering [5, 6], and natural language processing [17]. A large KG stores billions of factual triplets in the form of a directed graph, where each triplet (head entity, relation, tail entity) (denoted by $(h, r, t)$ in this paper) stands for an edge with two end nodes in the graph, indicating that a specific relationship exists between the head and tail entities. On a graph with this kind of symbolic representation, algorithms that compute semantic relationships between entities usually have high computational complexity and lack scalability. Knowledge graph embedding was therefore proposed to improve computational efficiency. By embedding entities and relations into a low-dimensional vector space, we can efficiently implement operations such as computing the semantic similarity between entities, which is of considerable significance to the completion, reasoning, and applications of KGs.

Quite a few methods [2, 12, 4, 10] have been proposed for knowledge graph embedding. Given a KG, these methods first assign one or more vectors (or matrices) to each entity and relation, then define a scoring function to measure the plausibility of each triplet, and finally maximize the global plausibility of all triplets. Thus, scoring functions play a critical role in the methods, which determine the capability and computational complexity of models. The capability of a model is primarily influenced by the variety of connectivity patterns and mapping properties of relations it can model. In a KG, following [10], we have four connectivity patterns of relations:

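  • Symmetry (Antisymmetry). A relation $r$ is symmetric (antisymmetric) if $\forall x, y$: $r(x, y) \Rightarrow r(y, x)$ ($r(x, y) \Rightarrow \neg r(y, x)$).

  • Inversion. Relation $r_1$ is inverse to relation $r_2$ if $\forall x, y$: $r_2(x, y) \Rightarrow r_1(y, x)$.

  • Composition. Relation $r_1$ is composed of relation $r_2$ and relation $r_3$ if $\forall x, y, z$: $r_2(x, y) \wedge r_3(y, z) \Rightarrow r_1(x, z)$.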
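The patterns above can be read as implications over the set of observed triplets. The following is a minimal sketch of that reading on a toy KG; the entity and relation names and the helper functions are hypothetical and are not part of the paper.

```python
# Toy KG as a set of (head, relation, tail) triplets; all names are illustrative.
TRIPLES = {
    ("a", "sibling", "b"), ("b", "sibling", "a"),
    ("tom", "born_in", "nyc"), ("nyc", "located_in", "usa"),
    ("tom", "nationality", "usa"),
    ("cat", "hypernym", "animal"), ("animal", "hyponym", "cat"),
}

def pairs(triples, rel):
    """Return the set of (x, y) pairs connected by relation `rel`."""
    return {(h, t) for h, r, t in triples if r == rel}

def is_symmetric(triples, rel):
    """r(x, y) implies r(y, x) for every observed pair."""
    p = pairs(triples, rel)
    return all((y, x) in p for x, y in p)

def is_antisymmetric(triples, rel):
    """r(x, y) implies that r(y, x) is not observed."""
    p = pairs(triples, rel)
    return all((y, x) not in p for x, y in p)

def is_inverse(triples, r1, r2):
    """r1 is inverse to r2 if r2(x, y) implies r1(y, x)."""
    p1, p2 = pairs(triples, r1), pairs(triples, r2)
    return all((y, x) in p1 for x, y in p2)

def is_composition(triples, r1, r2, r3):
    """r1 is composed of r2 and r3 if r2(x, y) and r3(y, z) imply r1(x, z)."""
    p1, p2, p3 = pairs(triples, r1), pairs(triples, r2), pairs(triples, r3)
    return all((x, z) in p1 for x, y in p2 for y2, z in p3 if y == y2)
```

On this toy set, `sibling` is symmetric, `born_in` is antisymmetric, `hyponym` is inverse to `hypernym`, and `nationality` is a composition of `born_in` and `located_in`. A real KG is incomplete, so such checks only hold approximately in practice.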

Also, following [2], we have four mapping properties of relations:

  • One-to-One (1-to-1). Relation $r$ is 1-to-1 if a head can appear with at most one tail.

  • One-to-Many (1-to-N). Relation $r$ is 1-to-N if a head can appear with many tails.

  • Many-to-One (N-to-1). Relation $r$ is N-to-1 if many heads can appear with the same tail.

  • Many-to-Many (N-to-N). Relation $r$ is N-to-N if many heads can appear with many tails.

An embedding method that models as many connectivity patterns and mapping properties as possible would potentially benefit applications. For example, in a link prediction task, suppose a model has learned that the relation Nationality is a composition of BornIn and LocatedIn. When the triplets (Tom, BornIn, New York) and (New York, LocatedIn, United States) both hold, it can infer that the triplet (Tom, Nationality, United States) holds. As a negative example, if a method cannot model the N-to-1 mapping property, it will probably treat Leonardo DiCaprio and Kate Winslet as the same entity when it reads the triplets (Leonardo DiCaprio, ActorIn, Titanic) and (Kate Winslet, ActorIn, Titanic).

In this paper, we propose a novel method, namely linear representation embedding (LineaRE), which interprets a relation as a linear function of the head and tail entities. Specifically, our model represents each entity as a low-dimensional vector (denoted by $\mathbf{h}$ or $\mathbf{t}$), and each relation as two weight vectors and a bias vector (denoted by $\mathbf{w}_r^1$, $\mathbf{w}_r^2$, and $\mathbf{b}_r$), where $\mathbf{h}, \mathbf{t}, \mathbf{w}_r^1, \mathbf{w}_r^2, \mathbf{b}_r \in \mathbb{R}^k$. Given a golden triplet $(h, r, t)$, we expect the equation $\mathbf{w}_r^1 \circ \mathbf{h} + \mathbf{b}_r = \mathbf{w}_r^2 \circ \mathbf{t}$ to hold, where $\circ$ denotes the Hadamard (element-wise) product. Tables 1 and 2 summarize the scoring functions and the modeling capabilities of some state-of-the-art KG embedding methods, respectively. Table 1 shows that the parameters of ComplEx and RotatE are defined in complex number spaces, while those of the others (including our model) are defined in real number spaces. Compared with most of the other models, the scoring function of LineaRE is simpler. Table 2 shows that some models (such as TransE and RotatE) are better at modeling connectivity patterns but do not consider complex mapping properties. In contrast, others (TransH and DistMult) are better at modeling complex mapping properties but sacrifice some capability to model connectivity patterns. LineaRE has the most comprehensive modeling capability.

The contributions of the paper are three-fold: (1) We propose a novel LineaRE method for KG embedding, which is simple and can cover all the above connectivity patterns and mapping properties. (2) We provide formal mathematical proofs to demonstrate the modeling capabilities of LineaRE. (3) We conduct extensive experiments to evaluate our LineaRE on the task of link prediction on several benchmark datasets. The experimental results show that LineaRE has significant improvements compared with the existing state-of-the-art methods.

Figure 1: Illustrations of LineaRE modeling connectivity patterns and complex mapping properties. Panels: (a) Symmetry; (b) Antisymmetry; (c) Inversion; (d) Composition; (e) 1-to-N.

2 Related Work

Knowledge graph embedding models can be roughly categorized into two groups [14]: translational models and multiplicative models.

Translational Models.

Given a triplet $(h, r, t)$, TransE [2] interprets the relation $r$ as a translation from the head entity $h$ to the tail entity $t$. When a relation is symmetric, its vector will be represented by $\mathbf{0}$, so TransE is unable to distinguish different symmetric relations. In addition, TransE has issues in dealing with 1-to-N, N-to-1, and N-to-N relations. TransH [15] was proposed to address the issues of TransE in modeling complex relations; it interprets each relation as a translating operation on a hyperplane. However, such an operation cannot model the inversion and composition patterns.
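The limitation of TransE on symmetric relations can be checked numerically. The sketch below assumes the L1 form of the TransE distance and uses made-up two-dimensional embeddings; it is an illustration, not the authors' code.

```python
def transe_distance(h, r, t):
    """L1 distance ||h + r - t||_1; the choice of the L1 norm is an assumption."""
    return sum(abs(hi + ri - ti) for hi, ri, ti in zip(h, r, t))

# Hypothetical embeddings of two entities.
h, t = [1.0, 2.0], [3.0, -1.0]

# A translation vector that fits (h, r, t) exactly ...
r_fit = [ti - hi for hi, ti in zip(h, t)]
forward = transe_distance(h, r_fit, t)    # 0.0: the forward triplet fits
backward = transe_distance(t, r_fit, h)   # 10.0: the reversed triplet does not

# ... whereas the zero vector scores both directions identically,
# which is why every symmetric relation collapses onto the same vector.
r_zero = [0.0, 0.0]
same_both_ways = transe_distance(h, r_zero, t) == transe_distance(t, r_zero, h)
```

With `r_zero`, both directions get the same distance (5.0 here), but that distance is zero only if the two entity vectors coincide.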

Multiplicative Models.

DistMult [18] is a bilinear model. For a triplet $(h, r, t)$, the relation $r$ is represented as a diagonal matrix to capture pairwise interactions between the components of $\mathbf{h}$ and $\mathbf{t}$ along the same dimension. However, this simple model can only deal with symmetric relations. ComplEx [12] was proposed to address the issues of DistMult in modeling antisymmetric relations by introducing complex-valued embeddings. Unfortunately, ComplEx is still not capable of modeling the composition pattern, and the space and time complexity of the model are considerably increased.
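Why a diagonal bilinear score can only express symmetric relations is easy to see in code: the score is a sum of per-dimension products, so swapping head and tail cannot change it. A minimal sketch with made-up vectors:

```python
def distmult_score(h, r, t):
    """Bilinear score h^T diag(r) t = sum_i h_i * r_i * t_i."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))

# Hypothetical embeddings (values chosen to be exact in floating point).
h = [0.5, -1.0, 2.0]
r = [1.5, 0.25, -0.5]
t = [2.0, 4.0, 1.0]

forward = distmult_score(h, r, t)
backward = distmult_score(t, r, h)
```

Here `forward` and `backward` are both -0.5, so a triplet and its reverse are always judged equally plausible.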

Other Models.

ConvE [4] is a multi-layer convolutional network model. The 2D convolution is able to extract feature interactions between the two embeddings $\mathbf{h}$ and $\mathbf{r}$. RotatE [10] represents entities and relations as complex vectors, and interprets the relation as a rotation from the head entity to the tail entity for a triplet. RotatE can model all the above connectivity patterns, but does not consider the complex mapping properties.

3 Our Method

In this section, we will introduce our proposed LineaRE model. First, we mathematically prove the powerful modeling capabilities of LineaRE. Then, we introduce the loss function used in our method.

3.1 Linear Representation Embedding

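We provide the details of our proposed LineaRE in this part. We represent each entity as a low-dimensional vector ($\mathbf{h}$ or $\mathbf{t}$), and each relation as two weight vectors ($\mathbf{w}_r^1$, $\mathbf{w}_r^2$) and a bias vector ($\mathbf{b}_r$), where $\mathbf{h}, \mathbf{t}, \mathbf{w}_r^1, \mathbf{w}_r^2, \mathbf{b}_r \in \mathbb{R}^k$. In LineaRE, a relation ($\mathbf{w}_r^1$, $\mathbf{w}_r^2$, $\mathbf{b}_r$) defines straight lines, one per dimension, in the rectangular coordinate system with $h$ and $t$ as axes. Therefore, we call our model linear representation embedding. Given a golden triplet $(h, r, t)$, we expect that:

$$\mathbf{w}_r^1 \circ \mathbf{h} + \mathbf{b}_r = \mathbf{w}_r^2 \circ \mathbf{t},$$

where $\circ$ denotes the Hadamard (element-wise) product. The scoring function of LineaRE is:

$$f_r(\mathbf{h}, \mathbf{t}) = \lVert \mathbf{w}_r^1 \circ \mathbf{h} + \mathbf{b}_r - \mathbf{w}_r^2 \circ \mathbf{t} \rVert.$$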
The connectivity patterns and mapping properties of relations are implicit in the properties of the straight lines. Formally, we have main results as follows:

Theorem 1.

LineaRE can model symmetry, antisymmetry, inversion and composition patterns.


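With $h$ and $t$ as the axes, LineaRE represents each dimension of a relation as a straight line in the rectangular coordinate system. Figure 1 illustrates LineaRE in the one-dimensional case.

  • Symmetry (Each straight line of the relation is symmetrical about $t = h$, Figure 1(a)).

    Suppose both $r(h, t)$ and $r(t, h)$ hold, i.e., $\mathbf{w}_r^1 \circ \mathbf{h} + \mathbf{b}_r = \mathbf{w}_r^2 \circ \mathbf{t}$ and $\mathbf{w}_r^1 \circ \mathbf{t} + \mathbf{b}_r = \mathbf{w}_r^2 \circ \mathbf{h}$. When $\mathbf{w}_r^1 = -\mathbf{w}_r^2$ holds, both equations reduce to $\mathbf{w}_r^1 \circ (\mathbf{h} + \mathbf{t}) + \mathbf{b}_r = \mathbf{0}$, then $\mathbf{h} + \mathbf{t} = \mathbf{c}$, where $\mathbf{c}$ is a constant vector. When $\mathbf{w}_r^1 = \mathbf{w}_r^2$ and $\mathbf{b}_r = \mathbf{0}$ hold, both equations reduce to $\mathbf{w}_r^1 \circ \mathbf{h} = \mathbf{w}_r^1 \circ \mathbf{t}$, then $\mathbf{h} = \mathbf{t}$. To sum up, when in every dimension either $w_r^1 = -w_r^2$, or $w_r^1 = w_r^2$ and $b_r = 0$ (with non-zero weights), LineaRE can model the symmetry pattern.

  • Antisymmetry (There exist some straight lines in the relation that are not symmetrical about $t = h$, Figure 1(b)).

    When, in some dimension, $w_r^1 \neq -w_r^2$ and it is not the case that both $w_r^1 = w_r^2$ and $b_r = 0$, LineaRE can model the antisymmetry pattern.

  • Inversion (The straight lines of $r_1$ and $r_2$ along the same dimension are symmetrical about $t = h$, Figure 1(c)).

    That is, the slopes of the straight lines along the same dimension in $r_1$ and $r_2$ are mutually reciprocal, and the intercepts are symmetrical about $t = h$.

  • Composition (Composition of linear functions, Figure 1(d)).

    $r_2$ is a linear mapping from $x$ to $y$, and $r_3$ is a linear mapping from $y$ to $z$; a new linear mapping from $x$ to $z$ (i.e., $r_1$) can then be obtained by combining $r_2$ and $r_3$.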
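The proof sketch can be checked numerically. The code below is a minimal illustration, assuming a relation is stored as a tuple `(w1, w2, b)` of per-dimension lists, that the residual is measured with the L1 norm, and that all weights are non-zero. The helper names (`apply_relation`, `invert`, `compose`) are hypothetical and do not come from the paper.

```python
def lineare_score(rel, h, t):
    """Sum over dimensions of |w1*h + b - w2*t|; the L1 norm is an assumption."""
    w1, w2, b = rel
    return sum(abs(a * x + c - d * y) for a, d, c, x, y in zip(w1, w2, b, h, t))

def apply_relation(rel, h):
    """Solve w1*h + b = w2*t for t (assumes every entry of w2 is non-zero)."""
    w1, w2, b = rel
    return [(a * x + c) / d for a, d, c, x in zip(w1, w2, b, h)]

def invert(rel):
    """Swap the roles of head and tail: w2*t - b = w1*h."""
    w1, w2, b = rel
    return (w2, w1, [-c for c in b])

def compose(rel2, rel3):
    """Relation r1 such that r2(x, y) and r3(y, z) imply r1(x, z)."""
    (a2, d2, c2), (a3, d3, c3) = rel2, rel3
    w1 = [p * q for p, q in zip(a2, a3)]
    w2 = [p * q for p, q in zip(d2, d3)]
    b = [q * c + p * e for q, c, p, e in zip(a3, c2, d2, c3)]
    return (w1, w2, b)

# Symmetry: w1 = -w2 in dimension 0; w1 = w2 with b = 0 in dimension 1.
sym = ([1.0, 1.0], [-1.0, 1.0], [4.0, 0.0])
h, t = [1.0, 5.0], [-5.0, 5.0]

# Inversion and composition on two made-up relations.
r2 = ([2.0, 1.0], [1.0, 2.0], [1.0, 0.0])
r3 = ([1.0, 4.0], [2.0, 1.0], [0.0, 2.0])
x = [1.0, 2.0]
y = apply_relation(r2, x)   # r2(x, y) holds exactly
z = apply_relation(r3, y)   # r3(y, z) holds exactly
r1 = compose(r2, r3)        # composed linear relation
```

In this example `sym` scores zero for both `(h, t)` and `(t, h)`, `invert(r2)` scores zero for `(y, x)`, and `r1` scores zero for `(x, z)`. The composed parameters follow from substituting the solution for `y` into the equation of `r3`.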

Theorem 2.

LineaRE can model 1-to-1, 1-to-N, N-to-1 and N-to-N relations.


1-to-1: obviously, LineaRE can model 1-to-1 relations. 1-to-N: as shown in Figure 1(e), the straight line is one dimension of relation $r$ and is close to the $t$ axis. Let $\varepsilon$ be the maximum error; a given $h$ can then appear with multiple $t$ values with low errors, namely those satisfying $|w_r^1 h + b_r - w_r^2 t| \le \varepsilon$. The closer the straight line is to the $t$ axis, the larger the range of $t$ values is. Thus, multiple tail entities appearing with the same head entity can be appropriately far away from each other in such dimensions, while in other dimensions these tail entities are closer to each other. Similarly, there exist some straight lines close to the $h$ axis in N-to-1 relations. N-to-N relations contain both straight lines close to the $h$ axis and straight lines close to the $t$ axis. ∎
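The 1-to-N argument can be made concrete in one dimension: for a fixed head value and error budget, the admissible tail values form an interval whose width grows as the tail weight shrinks, that is, as the line gets steeper and closer to the t axis. The sketch below assumes the per-dimension residual |w1*h + b - w2*t| and uses made-up numbers; `tail_range` and `eps` are illustrative names.

```python
def tail_range(w1, w2, b, h, eps):
    """Interval of t with |w1*h + b - w2*t| <= eps in one dimension (w2 != 0)."""
    center = (w1 * h + b) / w2
    half_width = eps / abs(w2)
    return center - half_width, center + half_width

# Same head value and error budget, two different tail weights.
shallow = tail_range(w1=1.0, w2=1.0, b=0.0, h=2.0, eps=0.5)    # slope 1
steep = tail_range(w1=1.0, w2=0.125, b=0.0, h=2.0, eps=0.5)    # slope 8

shallow_width = shallow[1] - shallow[0]
steep_width = steep[1] - steep[0]
```

The steep line admits an interval eight times wider than the shallow one, leaving room for many distinct tails of the same head in that dimension.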

Corollary 1.

TransE model is a special case of LineaRE.


Let $\mathbf{w}_r^1 = \mathbf{w}_r^2 = \mathbf{1}$; then LineaRE becomes TransE, i.e., TransE defines a relation as straight lines with a constant slope of one, which is a special case of LineaRE. ∎
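The corollary can be verified directly: with both weight vectors set to all ones, the LineaRE residual coincides with the TransE residual. The sketch below assumes the L1 norm for both scores and uses made-up embeddings.

```python
def lineare_score(w1, w2, b, h, t):
    """Sum over dimensions of |w1*h + b - w2*t|; the L1 norm is an assumption."""
    return sum(abs(a * x + c - d * y) for a, d, c, x, y in zip(w1, w2, b, h, t))

def transe_score(r, h, t):
    """TransE distance ||h + r - t||_1 (L1 form assumed)."""
    return sum(abs(x + c - y) for c, x, y in zip(r, h, t))

# Hypothetical embeddings; the bias vector plays the role of the TransE relation vector.
h, t, r = [1.0, 2.0], [3.0, -1.0], [0.5, 0.5]
ones = [1.0] * len(h)

lineare_value = lineare_score(ones, ones, r, h, t)
transe_value = transe_score(r, h, t)
```

Both values equal 5.0 here, and they agree for any choice of `h`, `t`, and `r`.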

3.2 Loss Function

A KG only contains positive triplets, and the way to construct a negative sample is to randomly replace the head or tail entity of an observed triplet, which is called negative sampling. Many negative sampling methods have been proposed [3, 20, 13], among which the self-adversarial negative sampling method [10] dynamically adjusts the weight of negative samples according to their scores as the training goes on. We adopt this negative sampling technique. Specifically, the weight (i.e., probability distribution) of negative triplets for a golden triplet is as follows:


where $\alpha$ is the temperature of sampling.

Then, we define the logistic loss function for an observed triplet and its negative samples:


where $\gamma$ is a parameter that can adjust the margin between positive and negative sample scores; $\lambda$ is the regularization coefficient; $\mathcal{E}$ is the set of entities in the KG. Adam [7] is used as the optimizer.
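A minimal sketch of how self-adversarial weighting and a softplus-based logistic loss can fit together is given below. The exact formulas are not reproduced here, so every detail is an assumption: scores are treated as distances (lower is better), weights are a softmax over negated distances scaled by the temperature `alpha`, the softplus has a sharpness parameter `beta`, and the regularization term is omitted.

```python
import math

def softplus(x, beta=1.0):
    """softplus_beta(x) = log(1 + exp(beta * x)) / beta."""
    return math.log1p(math.exp(beta * x)) / beta

def negative_weights(neg_dists, alpha):
    """Softmax over -alpha * distance: closer (harder) negatives weigh more."""
    logits = [-alpha * d for d in neg_dists]
    m = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def triplet_loss(pos_dist, neg_dists, gamma, alpha, beta):
    """Assumed form: push the positive distance below gamma and negatives above it."""
    weights = negative_weights(neg_dists, alpha)
    pos_term = softplus(pos_dist - gamma, beta)
    neg_term = sum(w * softplus(gamma - d, beta) for w, d in zip(weights, neg_dists))
    return pos_term + neg_term

# Made-up distances for one positive triplet and two negatives.
weights = negative_weights([1.0, 3.0], alpha=1.0)
good = triplet_loss(1.0, [5.0, 6.0], gamma=3.0, alpha=1.0, beta=1.0)
bad = triplet_loss(4.0, [5.0, 6.0], gamma=3.0, alpha=1.0, beta=1.0)
```

In a training loop the weights would normally be treated as constants, with no gradient flowing through them, as in the self-adversarial scheme of [10].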

| Dataset | # Entities | # Relations | # Train | # Valid | # Test |
| --- | --- | --- | --- | --- | --- |
| FB15k | 14,951 | 1,345 | 483,142 | 50,000 | 59,071 |
| WN18 | 40,943 | 18 | 141,442 | 5,000 | 5,000 |
| FB15k-237 | 14,541 | 237 | 272,115 | 17,535 | 20,466 |
| WN18RR | 40,943 | 11 | 86,835 | 3,034 | 3,134 |

Table 3: Statistical information of the datasets used in experiments.
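
| Model | FB15k MR | FB15k MRR | FB15k Hits@1 | FB15k Hits@3 | FB15k Hits@10 | WN18 MR | WN18 MRR | WN18 Hits@1 | WN18 Hits@3 | WN18 Hits@10 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| TransE | 35 | .729 | .638 | .798 | .873 | 184 | .798 | .713 | .869 | .949 |
| TransH | 36 | .731 | .641 | .800 | .873 | 372 | .796 | .717 | .856 | .948 |
| DistMult | 59 | .789 | .730 | .830 | .887 | 496 | .810 | .694 | .922 | .949 |
| ComplEx | 63 | .809 | .757 | .846 | .894 | 531 | .948 | .945 | .949 | .953 |
| ConvE | 64 | .745 | .670 | .801 | .873 | 504 | .942 | .935 | .947 | .955 |
| RotatE | 40 | .786 | .723 | .835 | .884 | 264 | .949 | .943 | .953 | .960 |
| LineaRE | 36 | .839 | .799 | .864 | .906 | 170 | .952 | .947 | .955 | .961 |

Table 4: Link prediction results on FB15k and WN18.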
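
| Model | FB15k-237 MR | FB15k-237 MRR | FB15k-237 Hits@1 | FB15k-237 Hits@3 | FB15k-237 Hits@10 | WN18RR MR | WN18RR MRR | WN18RR Hits@1 | WN18RR Hits@3 | WN18RR Hits@10 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| TransE | 172 | .334 | .238 | .371 | .523 | 2933 | .196 | .021 | .317 | .529 |
| TransH | 168 | .339 | .243 | .375 | .531 | 4736 | .210 | .018 | .387 | .473 |
| DistMult | 301 | .311 | .225 | .341 | .485 | 6580 | .424 | .397 | .433 | .476 |
| ComplEx | 376 | .313 | .227 | .342 | .486 | 6671 | .446 | .416 | .462 | .503 |
| ConvE | 246 | .316 | .239 | .350 | .491 | 5277 | .46 | .39 | .43 | .48 |
| RotatE | 174 | .338 | .245 | .373 | .526 | 3536 | .477 | .429 | .493 | .569 |
| LineaRE | 168 | .353 | .258 | .389 | .545 | 1887 | .486 | .445 | .500 | .571 |

Table 5: Link prediction results on FB15k-237 and WN18RR.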
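
| Metric | Model | Head 1-to-1 | Head 1-to-N | Head N-to-1 | Head N-to-N | Tail 1-to-1 | Tail 1-to-N | Tail N-to-1 | Tail N-to-N |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Hits@10 | TransE | .913 | .974 | .622 | .880 | .895 | .705 | .967 | .908 |
| Hits@10 | TransH | .914 | .973 | .612 | .883 | .894 | .680 | .967 | .910 |
| Hits@10 | DistMult | .925 | .965 | .657 | .890 | .923 | .821 | .949 | .917 |
| Hits@10 | ComplEx | .928 | .962 | .673 | .897 | .934 | .831 | .950 | .923 |
| Hits@10 | RotatE | .933 | .973 | .630 | .894 | .933 | .709 | .965 | .922 |
| Hits@10 | LineaRE | .926 | .972 | .723 | .905 | .913 | .837 | .965 | .932 |
| MRR | TransE | .736 | .925 | .489 | .721 | .731 | .582 | .903 | .744 |
| MRR | TransH | .731 | .922 | .470 | .728 | .730 | .559 | .905 | .751 |
| MRR | DistMult | .813 | .922 | .526 | .793 | .805 | .683 | .886 | .817 |
| MRR | ComplEx | .820 | .928 | .557 | .819 | .815 | .717 | .890 | .838 |
| MRR | RotatE | .859 | .938 | .511 | .790 | .857 | .627 | .906 | .814 |
| MRR | LineaRE | .825 | .938 | .618 | .842 | .817 | .751 | .919 | .865 |

Table 6: The detailed link prediction results by relation category on FB15k. "Head" columns report results for predicting the head entity and "Tail" columns for predicting the tail entity.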

Figure 2: Investigation of some relation embeddings. (a) _similar_to: angles between the straight lines of _similar_to and the $h$ axis; (b) _hypernym & _hyponym: angles between the straight lines of _hypernym and _hyponym; (c) angles between the straight lines of $\mathit{for}_2$ and the composition of $\mathit{winner}$ and $\mathit{for}_1$; (d) _hyponym: angles between the straight lines of _hyponym and the $h$ axis.

4 Experiments

In this section, we conduct extensive experiments to evaluate the proposed LineaRE method.

4.1 Datasets

Four widely used benchmark datasets are used in our experiments: FB15k [2], WN18 [2], FB15k-237 [11] and WN18RR [4]. The statistical information of these datasets is summarized in Table 3.

FB15k is a subset of Freebase and WN18 is a subset of WordNet; [10] showed that the key to link prediction on both of them is to model the symmetry, antisymmetry, and inversion patterns. FB15k-237 and WN18RR are subsets of FB15k and WN18, respectively. The main connectivity patterns in FB15k-237 are symmetry, antisymmetry, and composition. There is almost no composition pattern in WN18RR, so its main connectivity patterns are symmetry and antisymmetry.

4.2 Experimental Settings

We use link prediction as a canonical task to evaluate KG embedding models, because it reflects the level to which a model preserves the global and local structural information of KGs. Specifically, we let $\mathcal{T}$ be the test set and $\mathcal{E}$ be the set of all entities in the dataset. For each test triplet $(h, r, t) \in \mathcal{T}$, we replace the tail entity $t$ by each entity $e_i \in \mathcal{E}$ in turn, forming candidate triplets $\{(h, r, e_i)\}$. Some candidate triplets may exist in the dataset (training, validation, or test set), and it is common practice to delete them (except the current test triplet). The model is then used to calculate the dissimilarity of these corrupted triplets and sort them in ascending order. Finally, the rank of the correct entity is stored. The prediction process for the head entity is the same.
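The filtered ranking protocol described above can be sketched as follows. The function name, the toy one-dimensional "embedding", and the tie-breaking rule (a candidate outranks the true tail only if it is strictly better) are assumptions made for illustration.

```python
def filtered_tail_rank(h, r, t, entities, known_triples, dissimilarity):
    """Rank of the true tail among candidates, skipping other known triplets."""
    true_score = dissimilarity(h, r, t)
    rank = 1
    for e in entities:
        if e == t or (h, r, e) in known_triples:
            continue  # filter out candidates that are themselves true facts
        if dissimilarity(h, r, e) < true_score:
            rank += 1  # a corrupted triplet looks more plausible than the true one
    return rank

# Toy model: entities live on a line and the relation is a fixed offset.
POS = {"a": 0.0, "b": 1.0, "c": 1.2, "d": 3.0}
OFFSET = {"r": 1.0}

def toy_dissimilarity(h, r, e):
    return abs(POS[h] + OFFSET[r] - POS[e])

entities = list(POS)
raw_rank = filtered_tail_rank("a", "r", "c", entities, set(), toy_dissimilarity)
filtered_rank = filtered_tail_rank(
    "a", "r", "c", entities, {("a", "r", "b")}, toy_dissimilarity
)
```

Without filtering, the known fact `("a", "r", "b")` pushes the test tail `c` to rank 2; with filtering, `c` is ranked first.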

Evaluation Protocol.

We report several standard evaluation metrics: the Mean of those predicted Ranks (MR), the Mean of those predicted Reciprocal Ranks (MRR), and Hits@N (i.e., the proportion of correct entities ranked in the top N, where N = 1, 3, 10). A lower MR is better, while higher MRR and Hits@N are better.
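Given the list of ranks collected over all test predictions, the three metrics reduce to a few lines. This is a minimal sketch; the function name and the example ranks are made up.

```python
def link_prediction_metrics(ranks, ns=(1, 3, 10)):
    """Compute MR, MRR, and Hits@N from a list of 1-based ranks."""
    n = len(ranks)
    metrics = {
        "MR": sum(ranks) / n,
        "MRR": sum(1.0 / r for r in ranks) / n,
    }
    for k in ns:
        # Fraction of test cases whose correct entity is ranked in the top k.
        metrics[f"Hits@{k}"] = sum(r <= k for r in ranks) / n
    return metrics

metrics = link_prediction_metrics([1, 2, 4, 20])
```

For these four ranks, MR is 6.75, MRR is 0.45, and Hits@1, Hits@3, and Hits@10 are 0.25, 0.5, and 0.75. A single badly ranked case dominates MR while barely affecting MRR.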


We compare the performance of LineaRE with that of six state-of-the-art models on link prediction tasks. For fairness, all the models except ConvE use the same negative sampling technique (self-adversarial negative sampling proposed by [10]), and the hyperparameters of different models are selected from the same ranges. Because ConvE is quite different from the other models in principle, we take its experimental results directly from the original paper [4].

Hyperparameter Settings.

The hyperparameters are selected according to performance on the validation set via grid search. We set the ranges of hyperparameters as follows: temperature of sampling {0.5, 1.0}, fixed margin {6, 9, 12, 15, 18, 24, 30}, softplus parameter {0.75, 1.0, 1.25}, embedding size {125, 250, 500, 1000}, batch size {512, 1024, 2048}, and number of negative samples {128, 256, 512, 1024}. The optimal configurations for LineaRE, listed as (temperature, softplus parameter, margin, embedding size, batch size, number of negative samples), are: (1.0, 1.25, 15, 1000, 2048, 128) on FB15k; (0.5, 1.25, 6, 500, 1024, 512) on WN18; (0.5, 1.0, 12, 1000, 2048, 128) on FB15k-237; and (0.5, 1.0, 12, 1000, 2048, 128) on WN18RR.
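To give a sense of the size of this search space, the grid can be enumerated as below. The dictionary keys are descriptive names chosen for this sketch, not identifiers from the authors' code.

```python
from itertools import product

# Hyperparameter ranges as listed in the text; key names are illustrative.
GRID = {
    "temperature": [0.5, 1.0],
    "margin": [6, 9, 12, 15, 18, 24, 30],
    "softplus_beta": [0.75, 1.0, 1.25],
    "embedding_size": [125, 250, 500, 1000],
    "batch_size": [512, 1024, 2048],
    "num_negatives": [128, 256, 512, 1024],
}

def iter_configs(grid):
    """Yield one dict per point of the full Cartesian grid."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

configs = list(iter_configs(GRID))

# The configuration reported as optimal on FB15k.
best_fb15k = {
    "temperature": 1.0, "margin": 15, "softplus_beta": 1.25,
    "embedding_size": 1000, "batch_size": 2048, "num_negatives": 128,
}
```

The full grid contains 2 × 7 × 3 × 4 × 3 × 4 = 2016 configurations per dataset, each of which would be scored on the validation set.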

4.3 Experimental Results

4.3.1 Main Results

The main results on FB15k and WN18 are summarized in Table 4. LineaRE outperforms all the previous state-of-the-art models on almost all metrics; the only exception is MR on FB15k, where TransE performs slightly better than LineaRE. Table 5 summarizes the results on FB15k-237 and WN18RR. No previous model performs better than LineaRE on any metric. Table 6 summarizes the detailed results by relation category on FB15k, which shows that LineaRE achieves the best performance on complex relations.

Footnote 1: Following [15], for each relation $r$, we compute the average number of tails per head ($tph_r$) and the average number of heads per tail ($hpt_r$). If $tph_r < 1.5$ and $hpt_r < 1.5$, $r$ is treated as one-to-one; if $tph_r \ge 1.5$ and $hpt_r \ge 1.5$, $r$ is treated as many-to-many; if $tph_r \ge 1.5$ and $hpt_r < 1.5$, $r$ is treated as one-to-many; if $tph_r < 1.5$ and $hpt_r \ge 1.5$, $r$ is treated as many-to-one.
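The relation categories used in Table 6 can be computed from the triplets themselves. The sketch below assumes the commonly used threshold of 1.5 on the average number of tails per head and heads per tail, following [15]; the function name and the toy triplets are illustrative, and the relation is assumed to occur at least once.

```python
from collections import defaultdict

def relation_category(triples, rel, threshold=1.5):
    """Classify `rel` as 1-to-1, 1-to-N, N-to-1, or N-to-N from observed triplets."""
    tails_per_head = defaultdict(set)
    heads_per_tail = defaultdict(set)
    for h, r, t in triples:
        if r == rel:
            tails_per_head[h].add(t)
            heads_per_tail[t].add(h)
    # Average number of distinct tails per head and heads per tail.
    tph = sum(len(v) for v in tails_per_head.values()) / len(tails_per_head)
    hpt = sum(len(v) for v in heads_per_tail.values()) / len(heads_per_tail)
    if tph < threshold and hpt < threshold:
        return "1-to-1"
    if tph >= threshold and hpt >= threshold:
        return "N-to-N"
    if tph >= threshold:
        return "1-to-N"
    return "N-to-1"

TRIPLES = [
    ("Leonardo DiCaprio", "ActorIn", "Titanic"),
    ("Kate Winslet", "ActorIn", "Titanic"),
    ("Paris", "CapitalOf", "France"),
    ("Rome", "CapitalOf", "Italy"),
]
```

On this toy data, `ActorIn` has one tail per head but two heads per tail, so it falls into the N-to-1 category, while `CapitalOf` is 1-to-1.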

4.3.2 Analysis of Results

Then, we analyze the performances of these models with respect to the connectivity patterns and mapping properties in detail (Refer to Table 2, which summarizes the modeling capabilities of these models):

Symmetry (RotatE and TransE).

Among these methods, the difference between RotatE and TransE is only that the former can model symmetric relations and the latter cannot. The performance of RotatE is significantly better than that of TransE, because there are many symmetric relations in all datasets, especially in WN18RR.

Antisymmetry and Inversion (ComplEx and DistMult).

Complex embeddings enable ComplEx to model two more connectivity patterns (antisymmetry and inversion) than DistMult. The former performs better than the latter on all datasets, especially on WN18, which contains a large number of antisymmetric triplets and inverse triplets.

Composition (LineaRE and ComplEx).

ComplEx can model not only all connectivity patterns except composition but also complex mapping properties, which lets it achieve very good performance on all datasets other than FB15k-237, where the main connectivity pattern is composition. DistMult, which cannot model the composition pattern either, also performs poorly on FB15k-237. The difference between LineaRE and ComplEx is that LineaRE is capable of modeling the composition pattern; thus, our model performs better, especially on FB15k-237.

Complex mapping properties (LineaRE and RotatE).

RotatE has a powerful modeling capability for all the above connectivity patterns, which makes it perform well on these datasets. However, it is still inferior to our LineaRE because our LineaRE has the same modeling capability in connectivity patterns as RotatE and moreover, LineaRE can deal with complex mapping properties that RotatE cannot handle. On the relatively more complex dataset FB15k, our LineaRE gains a more prominent advantage. Table 6 shows that DistMult, ComplEx and LineaRE, which are capable of modeling complex mapping properties, perform well on 1-to-N (predicting tail), N-to-1 (predicting head), and N-to-N relations, while RotatE and TransE perform worse.

The performance of TransH is worse than expected. We briefly state our conjecture. It is well known that two points determine a straight line. When two relations have many entities in common, their hyperplanes will be the same, resulting in many entities being restricted to a straight line. This leads to poor performance, especially on KGs like WordNet. In the original paper [15], the experimental results of TransH on WN18 are also inferior to those of TransE.

4.3.3 Investigation of Relation Embeddings

To verify our theoretical analysis of the modeling capabilities of LineaRE in Section 3.1, we investigate some relevant relation embeddings (500 dimensions on WN18 and 1000 dimensions on FB15k-237). Figure 2(a) counts the angles between the straight lines corresponding to relation _similar_to in WN18 and the $h$ axis. Almost all of the 500 angles are equal to or close to 45° or 135°. Relations _hypernym and _hyponym in WN18 are a pair of inverse relations. We first invert one of them about $t = h$, and then compute and count the angles between the straight lines of the two relations along the same dimensions. Figure 2(b) shows that most angles are equal to or close to 0° or 180°. In FB15k-237, $\mathit{for}_2$ is a composition of $\mathit{winner}$ and $\mathit{for}_1$ (see Footnote 2). We compute the angles between the composite straight lines and the lines of $\mathit{for}_2$ along the same dimensions. Figure 2(c) shows that the composition of $\mathit{winner}$ and $\mathit{for}_1$ is very similar to $\mathit{for}_2$. For the 1-to-N relation _hyponym, Figure 2(d) shows that there are more straight lines close to the $t$ axis than to the $h$ axis. Besides, the investigation of straight line intercepts is consistent with our theoretical analysis.

Footnote 2: $\mathit{for}_1$ represents relation /award/award_nominee/award_nominations./award/award_nomination/nominated_for, $\mathit{winner}$ represents relation /award/award_category/winners./award/award_honor/award_winner, and $\mathit{for}_2$ represents /award/award_category/nominees./award/award_nomination/nominated_for.
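The angle statistics can be computed per dimension from the two weight vectors of a relation. The sketch below assumes that the line in one dimension is w1*h + b = w2*t, drawn with h on the horizontal axis, so its direction vector is (w2, w1). The function name and the example weights are made up.

```python
import math

def line_angle_deg(w1, w2):
    """Angle in [0, 180) between the line w1*h + b = w2*t and the h axis."""
    return math.degrees(math.atan2(w1, w2)) % 180.0

# Hypothetical weights of a symmetric relation: each dimension has w1 = w2 or w1 = -w2.
w1 = [1.0, 0.8, -0.5, 2.0]
w2 = [1.0, -0.8, -0.5, -2.0]
angles = [line_angle_deg(a, d) for a, d in zip(w1, w2)]

# Count how many lines lie within 1 degree of 45 or 135 degrees.
near_diagonal = sum(min(abs(x - 45.0), abs(x - 135.0)) < 1.0 for x in angles)
```

A slope of +1 gives 45° and a slope of -1 gives 135°, the two values around which the angles of a symmetric relation are expected to cluster. A line with a very small tail weight approaches 90°, i.e., it is close to the t axis.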

5 Conclusion

In this paper, we proposed a novel KG embedding method LineaRE, which models connectivity patterns and mapping properties of relations in linear representation. Extensive experimental results on the task of link prediction show that the LineaRE model significantly outperforms existing state-of-the-art models on four widely used datasets. A deep investigation into the relation embeddings further verifies our theoretical analysis of its modeling capabilities. The source code is available at


References

  • [1] K. Bollacker, C. Evans, P. Paritosh, T. Sturge, and J. Taylor (2008) Freebase: a collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pp. 1247–1250.
  • [2] A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, and O. Yakhnenko (2013) Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems, pp. 2787–2795.
  • [3] L. Cai and W. Y. Wang (2017) KBGAN: adversarial learning for knowledge graph embeddings. arXiv preprint arXiv:1711.04071.
  • [4] T. Dettmers, P. Minervini, P. Stenetorp, and S. Riedel (2018) Convolutional 2D knowledge graph embeddings. In Thirty-Second AAAI Conference on Artificial Intelligence.
  • [5] Y. Hao, Y. Zhang, K. Liu, S. He, Z. Liu, H. Wu, and J. Zhao (2017) An end-to-end model for question answering over knowledge base with cross-attention combining global knowledge. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 221–231.
  • [6] X. Huang, J. Zhang, D. Li, and P. Li (2019) Knowledge graph embedding based question answering. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, pp. 105–113.
  • [7] D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  • [8] J. Lehmann, R. Isele, M. Jakob, A. Jentzsch, D. Kontokostas, P. N. Mendes, S. Hellmann, M. Morsey, P. Van Kleef, S. Auer, et al. (2015) DBpedia – a large-scale, multilingual knowledge base extracted from Wikipedia. Semantic Web 6 (2), pp. 167–195.
  • [9] G. A. Miller (1995) WordNet: a lexical database for English. Communications of the ACM 38 (11), pp. 39–41.
  • [10] Z. Sun, Z. Deng, J. Nie, and J. Tang (2019) RotatE: knowledge graph embedding by relational rotation in complex space. arXiv preprint arXiv:1902.10197.
  • [11] K. Toutanova and D. Chen (2015) Observed versus latent features for knowledge base and text inference. In Proceedings of the 3rd Workshop on Continuous Vector Space Models and their Compositionality, pp. 57–66.
  • [12] T. Trouillon, J. Welbl, S. Riedel, É. Gaussier, and G. Bouchard (2016) Complex embeddings for simple link prediction. In International Conference on Machine Learning, pp. 2071–2080.
  • [13] P. Wang, S. Li, and R. Pan (2018) Incorporating GAN for negative sampling in knowledge representation learning. In Thirty-Second AAAI Conference on Artificial Intelligence.
  • [14] Q. Wang, Z. Mao, B. Wang, and L. Guo (2017) Knowledge graph embedding: a survey of approaches and applications. IEEE Transactions on Knowledge and Data Engineering 29 (12), pp. 2724–2743.
  • [15] Z. Wang, J. Zhang, J. Feng, and Z. Chen (2014) Knowledge graph embedding by translating on hyperplanes. In Twenty-Eighth AAAI Conference on Artificial Intelligence.
  • [16] C. Xiong, R. Power, and J. Callan (2017) Explicit semantic ranking for academic search via knowledge graph embedding. In Proceedings of the 26th International Conference on World Wide Web, pp. 1271–1279.
  • [17] B. Yang and T. Mitchell (2019) Leveraging knowledge bases in LSTMs for improving machine reading. arXiv preprint arXiv:1902.09091.
  • [18] B. Yang, W. Yih, X. He, J. Gao, and L. Deng (2014) Embedding entities and relations for learning and inference in knowledge bases. arXiv preprint arXiv:1412.6575.
  • [19] F. Zhang, N. J. Yuan, D. Lian, X. Xie, and W. Ma (2016) Collaborative knowledge base embedding for recommender systems. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 353–362.
  • [20] Y. Zhang, Q. Yao, Y. Shao, and L. Chen (2019) NSCaching: simple and efficient negative sampling for knowledge graph embedding. In 2019 IEEE 35th International Conference on Data Engineering (ICDE), pp. 614–625.