STransE: a novel embedding model of entities and relationships in knowledge bases

06/27/2016 ∙ by Dat Quoc Nguyen, et al. ∙ CSIRO, Macquarie University

Knowledge bases of real-world facts about entities and their relationships are useful resources for a variety of natural language processing tasks. However, because knowledge bases are typically incomplete, it is useful to be able to perform link prediction or knowledge base completion, i.e., predict whether a relationship not in the knowledge base is likely to be true. This paper combines insights from several previous link prediction models into a new embedding model STransE that represents each entity as a low-dimensional vector, and each relation by two matrices and a translation vector. STransE is a simple combination of the SE and TransE models, but it obtains better link prediction performance on two benchmark datasets than previous embedding models. Thus, STransE can serve as a new baseline for the more complex models in the link prediction task.




1 Introduction

Knowledge bases (KBs), such as WordNet [Fellbaum1998], YAGO [Suchanek et al.2007], Freebase [Bollacker et al.2008] and DBpedia [Lehmann et al.2015], represent relationships between entities as triples (head entity, relation, tail entity). Even very large knowledge bases are still far from complete [Socher et al.2013, West et al.2014]. Link prediction or knowledge base completion systems [Nickel et al.2016a] predict which triples not in a knowledge base are likely to be true [Taskar et al.2004, Bordes et al.2011]. A variety of different kinds of information is potentially useful here, including information extracted from external corpora [Riedel et al.2013, Wang et al.2014a] and the other relationships that hold between the entities [Angeli and Manning2013, Zhao et al.2015]. For example, Toutanova et al. [2015] used information from the external ClueWeb-12 corpus to significantly enhance performance.

While integrating a wide variety of information sources can produce excellent results [Das et al.2017], there are several reasons for studying simpler models that directly optimize a score function for the triples in a knowledge base, such as the one presented here. First, additional information sources might not be available, e.g., for knowledge bases in specialized domains. Second, models that don't exploit external resources are simpler and thus typically much faster to train than the more complex models using additional information. Third, the more complex models that exploit external information are typically extensions of these simpler models, and are often initialized with parameters estimated by such simpler models, so improvements to the simpler models should yield corresponding improvements to the more complex models as well.

Model | Score function f_r(h, t) | Opt.
SE | ||W_{r,1}h − W_{r,2}t|| ; W_{r,1}, W_{r,2} ∈ R^{k×k} | SGD
Unstructured | ||h − t|| | SGD
TransE | ||h + r − t|| ; r ∈ R^k | SGD
DISTMULT | h^T W_r t ; W_r is a diagonal matrix ∈ R^{k×k} | AdaGrad
NTN | u_r^T tanh(h^T M_r t + W_{r,1}h + W_{r,2}t + b_r) ; u_r, b_r ∈ R^d ; M_r ∈ R^{k×k×d} ; W_{r,1}, W_{r,2} ∈ R^{d×k} | L-BFGS
TransH | ||(I − w_r w_r^T)h + r − (I − w_r w_r^T)t|| ; r, w_r ∈ R^k ; I: identity matrix of size k×k | SGD
TransD | ||(I + r_p h_p^T)h + r − (I + r_p t_p^T)t|| ; r, r_p ∈ R^n ; h_p, t_p ∈ R^k ; I: identity matrix of size n×k | AdaDelta
TransR | ||W_r h + r − W_r t|| ; W_r ∈ R^{n×k} ; r ∈ R^n | SGD
TranSparse | ||W_r^h(θ_r^h)h + r − W_r^t(θ_r^t)t|| ; W_r^h, W_r^t sparse matrices ∈ R^{n×k} with sparseness θ_r^h, θ_r^t ; r ∈ R^n | SGD
Our STransE | ||W_{r,1}h + r − W_{r,2}t|| ; W_{r,1}, W_{r,2} ∈ R^{k×k} ; r ∈ R^k | SGD
Table 1: The score functions f_r(h, t) and the optimization methods (Opt.) of several prominent embedding models for KB completion. In all of these models the entities h and t are represented by vectors h and t respectively.

Embedding models for KB completion associate entities and/or relations with dense feature vectors or matrices. Such models obtain state-of-the-art performance [Nickel et al.2011, Bordes et al.2011, Bordes et al.2012, Bordes et al.2013, Socher et al.2013, Wang et al.2014b, Guu et al.2015] and generalize to large KBs [Krompaß et al.2015]. Table 1 summarizes a number of prominent embedding models for KB completion.

Let (h, r, t) represent a triple. In all of the models discussed here, the head entity h and the tail entity t are represented by vectors h and t respectively. The Unstructured model [Bordes et al.2012] assumes that h ≈ t. As the Unstructured model does not take the relationship into account, it cannot distinguish different relation types. The Structured Embedding (SE) model [Bordes et al.2011] extends the Unstructured model by assuming that h and t are similar only in a relation-dependent subspace. It represents each relation r with two matrices W_{r,1} and W_{r,2}, which are chosen so that W_{r,1}h ≈ W_{r,2}t. The TransE model [Bordes et al.2013] is inspired by models such as Word2Vec [Mikolov et al.2013], where relationships between words often correspond to translations in latent feature space. The TransE model represents each relation by a translation vector r, which is chosen so that h + r ≈ t.
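To make the relationship between these three baselines concrete, the following numpy sketch (illustrative only; we use the ℓ1 norm, and all vector/matrix names are ours) writes down their score functions side by side:

```python
import numpy as np

def unstructured_score(h, t):
    # Unstructured: ||h - t||; ignores the relation entirely.
    return np.linalg.norm(h - t, ord=1)

def se_score(h, t, W1, W2):
    # SE: ||W_{r,1} h - W_{r,2} t||; similarity holds only in a
    # relation-dependent subspace defined by two matrices.
    return np.linalg.norm(W1 @ h - W2 @ t, ord=1)

def transe_score(h, r, t):
    # TransE: ||h + r - t||; the relation acts as a translation vector.
    return np.linalg.norm(h + r - t, ord=1)
```

Note that SE with identity matrices reduces to the Unstructured model, and a perfect TransE triple (h + r = t) scores exactly zero.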

The primary contribution of this paper is to show that two very simple relation-prediction models, SE and TransE, can be combined into a single model, which we call STransE (source code is publicly available). Specifically, we use relation-specific matrices W_{r,1} and W_{r,2} as in the SE model to identify the relation-dependent aspects of both h and t, and use a vector r as in the TransE model to describe the relationship between h and t in this subspace. That is, our new KB completion model STransE chooses W_{r,1}, W_{r,2} and r so that W_{r,1}h + r ≈ W_{r,2}t: a TransE-style relationship holds in some relation-dependent subspace, and crucially, this subspace may involve very different projections of the head h and tail t. So W_{r,1} and W_{r,2} can highlight, suppress, or even change the sign of, relation-specific attributes of h and t. For example, for the "purchases" relationship, certain attributes of individuals (e.g., age, gender, marital status) are presumably strongly correlated with very different attributes of objects (e.g., sports car, washing machine and the like).

As we show below, STransE performs better than the SE and TransE models and other state-of-the-art link prediction models on two standard link prediction datasets, WN18 and FB15k, so it can serve as a new baseline for KB completion. We expect that STransE will also be able to serve as the basis for extended models that exploit a wider variety of information sources, just as TransE does.

2 Our approach

Let E denote the set of entities and R the set of relation types. For each triple (h, r, t), where h, t ∈ E and r ∈ R, the STransE model defines a score function f_r(h, t) measuring its implausibility. Our goal is to choose the entity vectors, relation vectors and relation matrices such that the score of a plausible triple is smaller than the score of an implausible triple. We define the STransE score function as follows:

f_r(h, t) = ||W_{r,1}h + r − W_{r,2}t||

using either the ℓ1 or the ℓ2-norm (the choice is made using validation data; in our experiments we found that the ℓ1 norm gave slightly better results). To learn the vectors and matrices we minimize the following margin-based objective function:

L = Σ_{(h,r,t) ∈ G} Σ_{(h′,r,t′) ∈ G′_{(h,r,t)}} [γ + f_r(h, t) − f_r(h′, t′)]_+

where [x]_+ = max(0, x), γ is the margin hyper-parameter, G is the training set consisting of correct triples, and G′_{(h,r,t)} = {(h′, r, t) | h′ ∈ E, (h′, r, t) ∉ G} ∪ {(h, r, t′) | t′ ∈ E, (h, r, t′) ∉ G} is the set of incorrect triples generated by corrupting a correct triple (h, r, t) ∈ G.

We use Stochastic Gradient Descent (SGD) to minimize L, and impose the following constraints during training: ||h||₂ ≤ 1, ||r||₂ ≤ 1, ||W_{r,1}h||₂ ≤ 1 and ||W_{r,2}t||₂ ≤ 1.
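The score function and margin-based objective above can be sketched in a few lines of numpy (an illustrative sketch, not the authors' released implementation; shapes and names are ours):

```python
import numpy as np

def stranse_score(h, r, t, W1, W2, ord=1):
    # f_r(h, t) = ||W_{r,1} h + r - W_{r,2} t||, using the l1 or l2 norm.
    return np.linalg.norm(W1 @ h + r - W2 @ t, ord=ord)

def margin_loss(score_correct, score_corrupt, gamma):
    # One term of the objective: [gamma + f(correct) - f(corrupt)]_+ .
    # The hinge is zero once the correct triple scores at least gamma
    # lower (i.e. more plausible) than the corrupted one.
    return max(0.0, gamma + score_correct - score_corrupt)
```

In training, each SGD step would evaluate this hinge on a correct triple paired with a corrupted one and follow its gradient, projecting the vectors back onto the unit ball to enforce the norm constraints.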

3 Related work

Table 1 summarizes related embedding models for link prediction and KB completion. The models differ in the score functions and the algorithms used to optimize the margin-based objective function, e.g., SGD, AdaGrad [Duchi et al.2011], AdaDelta [Zeiler2012] and L-BFGS [Liu and Nocedal1989].

DISTMULT [Yang et al.2015] is based on the Bilinear model [Nickel et al.2011, Bordes et al.2012, Jenatton et al.2012], where each relation is represented by a diagonal rather than a full matrix. The neural tensor network (NTN) model [Socher et al.2013] uses a bilinear tensor operator to represent each relation, while ProjE [Shi and Weninger2017] can be viewed as a simplified version of NTN with diagonal matrices. Similar quadratic forms are used to model entities and relations in KG2E [He et al.2015], ComplEx [Trouillon et al.2016], TATEC [García-Durán et al.2016] and RSTE [Tay et al.2017]. In addition, HolE [Nickel et al.2016b] uses circular correlation—a compositional operator—which can be interpreted as a compression of the tensor product.

The TransH model [Wang et al.2014b] associates each relation with a relation-specific hyperplane and uses a projection vector to project entity vectors onto that hyperplane. TransD [Ji et al.2015] and TransR/CTransR [Lin et al.2015b] extend the TransH model by using two projection vectors and a matrix, respectively, to project entity vectors into a relation-specific space. TransD learns a relation-role-specific mapping just as STransE does, but represents this mapping by projection vectors rather than by the full matrices used in STransE. The lppTransD model [Yoon et al.2016] extends TransD by using two projection vectors to represent each relation. In fact, our STransE model and TranSparse [Ji et al.2016] can be viewed as direct extensions of the TransR model, in which head and tail entities are associated with their own projection matrices, rather than sharing a single matrix as in TransR and CTransR.

Recently, several authors have shown that relation paths between entities in KBs provide richer information and improve relationship prediction [Lin et al.2015a, García-Durán et al.2015, Guu et al.2015, Wang et al.2016, Feng et al.2016, Liu et al.2016, Niepert2016, Wei et al.2016, Toutanova et al.2016, Nguyen et al.2016]. In addition, Nickel et al. [2016a] review other approaches for learning from KBs and multi-relational data.

4 Experiments

For link prediction evaluation, we conduct experiments and compare the performance of our STransE model with published results on the benchmark WN18 and FB15k datasets [Bordes et al.2013]. Information about these datasets is given in Table 2.

Dataset #E #R #Train #Valid #Test
WN18 40,943 18 141,442 5,000 5,000
FB15k 14,951 1,345 483,142 50,000 59,071
Table 2: Statistics of the experimental datasets used in this study (and previous works). #E is the number of entities, #R is the number of relation types, and #Train, #Valid and #Test are the numbers of triples in the training, validation and test sets, respectively.
Method | Raw: WN18 (MR, H10, MRR) / FB15k (MR, H10, MRR) | Filtered: WN18 (MR, H10, MRR) / FB15k (MR, H10, MRR)
SE [Bordes et al.2011] 1011 68.5 - 273 28.8 - 985 80.5 - 162 39.8 -
Unstructured [Bordes et al.2012] 315 35.3 - 1074 4.5 - 304 38.2 - 979 6.3 -
TransE [Bordes et al.2013] 263 75.4 - 243 34.9 - 251 89.2 - 125 47.1 -
TransH [Wang et al.2014b] 401 73.0 - 212 45.7 - 303 86.7 - 87 64.4 -
TransR [Lin et al.2015b] 238 79.8 - 198 48.2 - 225 92.0 - 77 68.7 -
CTransR [Lin et al.2015b] 231 79.4 - 199 48.4 - 218 92.3 - 75 70.2 -
KG2E [He et al.2015] 342 80.2 - 174 48.9 - 331 92.8 - 59 74.0 -
TransD [Ji et al.2015] 224 79.6 - 194 53.4 - 212 92.2 - 91 77.3 -
lppTransD [Yoon et al.2016] 283 80.5 - 195 53.0 - 270 94.3 - 78 78.7 -
TranSparse [Ji et al.2016] 223 80.1 - 187 53.5 - 211 93.2 - 82 79.5 -
TATEC [García-Durán et al.2016] - - - - - - - - - 58 76.7 -
NTN [Socher et al.2013] - - - - - - - 66.1 0.53 - 41.4 0.25
DISTMULT [Yang et al.2015] - - - - - - - 94.2 0.83 - 57.7 0.35
HolE [Nickel et al.2016b] - - 0.616 - - 0.232 - 94.9 0.938 - 73.9 0.524
Our STransE 217 80.9 0.469 219 51.6 0.252 206 93.4 0.657 69 79.7 0.543
rTransE [García-Durán et al.2015] - - - - - - - - - 50 76.2 -
PTransE [Lin et al.2015a] - - - 207 51.4 - - - - 58 84.6 -
GAKE [Feng et al.2016] - - - 228 44.5 - - - - 119 64.8 -
Gaifman [Niepert2016] - - - - - - 352 93.9 - 75 84.2 -
Hiri [Liu et al.2016] - - - - - - - 90.8 0.691 - 70.3 0.603
NLFeat [Toutanova and Chen2015] - - - - - - - 94.3 0.940 - 87.0 0.822
TEKE_H [Wang and Li2016] 127 80.3 - 212 51.2 - 114 92.9 - 108 73.0 -
SSP [Xiao et al.2017] 168 81.2 - 163 57.2 - 156 93.2 - 82 79.0 -
Table 3: Link prediction results. MR, H10 and MRR denote the evaluation metrics mean rank, Hits@10 (in %) and mean reciprocal rank, respectively. "NLFeat" abbreviates Node+LinkFeat. The results for NTN [Socher et al.2013] listed in this table are taken from Yang et al. [2015], since NTN was originally evaluated on different datasets.

4.1 Task and evaluation protocol

The link prediction task [Bordes et al.2011, Bordes et al.2012, Bordes et al.2013] predicts the head or tail entity given the relation type and the other entity, i.e. predicting h given (?, r, t) or predicting t given (h, r, ?), where ? denotes the missing element. The results are evaluated using the ranking induced by the score function f_r(h, t) on test triples.

For each test triple (h, r, t), we corrupted it by replacing either h or t by each of the possible entities in turn, and then ranked these candidates in ascending order of their implausibility score f_r(h, t). This is the "Raw" setting protocol. For the "Filtered" setting protocol described in Bordes et al. [2013], we removed any corrupted triples that appear in the knowledge base, to avoid cases where a corrupted triple that is in fact correct might be ranked higher than the test triple. The "Filtered" setting thus provides a clearer view of ranking performance. Following Bordes et al. [2013], we report the mean rank and Hits@10 (i.e., the proportion of test triples for which the target entity is ranked in the top 10 predictions) for each model. In addition, we report the mean reciprocal rank, which is commonly used in information retrieval. In both "Raw" and "Filtered" settings, a lower mean rank, a higher mean reciprocal rank, or a higher Hits@10 indicates better link prediction performance.
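The ranking protocol above can be illustrated with a small sketch (helper names and data layout are ours; we assume lower score means more plausible). It computes the rank of the target entity in the Raw or Filtered setting, and the three reported metrics:

```python
def rank_of_target(scores_by_entity, target, known_true=frozenset(), filtered=False):
    # scores_by_entity: {entity: implausibility score} over all candidates
    # obtained by replacing the head (or tail) of one test triple.
    # In the "Filtered" setting, candidates that also form a true triple in
    # the KB are skipped, so they cannot push the target entity down.
    target_score = scores_by_entity[target]
    rank = 1
    for entity, score in scores_by_entity.items():
        if entity == target:
            continue
        if filtered and entity in known_true:
            continue
        if score < target_score:
            rank += 1
    return rank

def metrics(ranks):
    # Mean rank, Hits@10 (in %), and mean reciprocal rank over all queries.
    n = len(ranks)
    mean_rank = sum(ranks) / n
    hits_at_10 = 100.0 * sum(r <= 10 for r in ranks) / n
    mrr = sum(1.0 / r for r in ranks) / n
    return mean_rank, hits_at_10, mrr
```

For instance, if a true corrupted candidate outscores the test triple, the Raw rank is 2 but the Filtered rank is 1.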

Following TransR [Lin et al.2015b], TransD [Ji et al.2015], rTransE [García-Durán et al.2015], PTransE [Lin et al.2015a], TATEC [García-Durán et al.2016] and TranSparse [Ji et al.2016], we used the entity and relation vectors produced by TransE [Bordes et al.2013] to initialize the entity and relation vectors in STransE, and we initialized the relation matrices with identity matrices. We applied the "Bernoulli" trick also used in previous work for generating head or tail entities when sampling incorrect triples [Wang et al.2014b, Lin et al.2015b, He et al.2015, Ji et al.2015, Lin et al.2015a, Yoon et al.2016, Ji et al.2016]. We ran SGD for 2,000 epochs to estimate the model parameters. Following Bordes et al. [2013], we used a grid search on the validation set to choose either the ℓ1 or ℓ2 norm in the score function f_r, as well as to set the SGD learning rate λ, the margin hyper-parameter γ and the vector size k. The lowest filtered mean rank on the validation set was obtained when using the ℓ1 norm in f_r on both WN18 and FB15k; the best-performing values of λ, γ and k differed between the two datasets.
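The "Bernoulli" trick of Wang et al. [2014b] can be sketched as follows (an illustrative sketch; the per-relation averages tph, or tails per head, and hpt, or heads per tail, are assumed precomputed from the training set, and the container names are ours):

```python
import random

def corrupt(triple, entities, tph, hpt, rng=random):
    # "Bernoulli" sampling: for relation r, replace the head with probability
    # tph[r] / (tph[r] + hpt[r]) and the tail otherwise. For a 1-M relation
    # (high tph), corrupting the head is less likely to accidentally produce
    # a triple that is actually true, reducing false-negative training pairs.
    h, r, t = triple
    p_head = tph[r] / (tph[r] + hpt[r])
    if rng.random() < p_head:
        return (rng.choice(entities), r, t)
    return (h, r, rng.choice(entities))
```

In the extreme cases the behavior is deterministic: a relation with hpt = 0 always has its head corrupted, and one with tph = 0 always has its tail corrupted.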

4.2 Main results

Table 3 compares the link prediction results of our STransE model with results reported in prior work, using the same experimental setup. The first 15 rows report the performance of the models that do not exploit information about alternative paths between head and tail entities. The next 5 rows report results of the models that exploit information about relation paths. The last 3 rows present results for the models which make use of textual mentions derived from a large external corpus.

It is clear that the models with additional external corpus information obtained the best results. In future work we plan to extend the STransE model to incorporate such additional information. Table 3 also shows that the models employing path information generally achieve better results than models that do not use such information. Among the models exploiting neither path information nor external information, the STransE model produces the lowest (best) filtered mean rank on WN18 and the highest filtered Hits@10 and mean reciprocal rank on FB15k. Compared to the closely related models SE, TransE, TransR, CTransR, TransD and TranSparse, our STransE model does better than these models on both WN18 and FB15k.

Following Bordes et al. [2013], Table 4 analyzes the Hits@10 results on FB15k with respect to relation categories defined as follows: for each relation type r, we computed the average number of heads per pair (r, t) and the average number of tails per pair (h, r). If both averages are below 1.5, then r is labeled 1-1. If the average number of heads is at least 1.5 and the average number of tails is below 1.5, then r is labeled M-1. If the average number of heads is below 1.5 and the average number of tails is at least 1.5, then r is labeled 1-M. If both averages are at least 1.5, then r is labeled M-M. 1.4%, 8.9%, 14.6% and 75.1% of the test triples belong to a relation type classified as 1-1, 1-M, M-1 and M-M, respectively.
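The category assignment reduces to a small helper (illustrative sketch; we assume the standard 1.5 cutoff from Bordes et al. [2013], with the two per-relation averages computed beforehand from the training triples):

```python
def relation_category(avg_heads_per_tail, avg_tails_per_head, threshold=1.5):
    # A side is "1" if, on average, fewer than `threshold` heads (or tails)
    # correspond to the other element of the pair; otherwise it is "M".
    head_side = '1' if avg_heads_per_tail < threshold else 'M'
    tail_side = '1' if avg_tails_per_head < threshold else 'M'
    return head_side + '-' + tail_side
```

For example, a relation averaging 3 heads per tail but about 1 tail per head is labeled M-1.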

Method Predicting head Predicting tail
1-1 1-M M-1 M-M 1-1 1-M M-1 M-M
SE 35.6 62.6 17.2 37.5 34.9 14.6 68.3 41.3
Unstr. 34.5 2.5 6.1 6.6 34.3 4.2 1.9 6.6
TransE 43.7 65.7 18.2 47.2 43.7 19.7 66.7 50.0
TransH 66.8 87.6 28.7 64.5 65.5 39.8 83.3 67.2
TransR 78.8 89.2 34.1 69.2 79.2 37.4 90.4 72.1
CTransR 81.5 89.0 34.7 71.2 80.8 38.6 90.1 73.8
KG2E 92.3 94.6 66.0 69.6 92.6 67.9 94.4 73.4
TATEC 79.3 93.2 42.3 77.2 78.5 51.5 92.7 80.7
TransD 86.1 95.5 39.8 78.5 85.4 50.6 94.4 81.2
lppTransD 86.0 94.2 54.4 82.2 79.7 43.2 95.3 79.7
TranSparse 86.8 95.5 44.3 80.9 86.6 56.6 94.4 83.3
STransE 82.8 94.2 50.4 80.1 82.4 56.9 93.4 83.1
Table 4: Hits@10 (in %) by the relation category on FB15k. “Unstr.” abbreviates Unstructured.

Table 4 shows that, in comparison to prior models not using path information, STransE obtains the second highest Hits@10 result for the M-M relation category, 0.5% absolute below the corresponding Hits@10 result of TranSparse. However, STransE obtains a 2.5% higher Hits@10 result than TranSparse for M-1. In addition, STransE also performs better than TransD for the 1-M and M-1 relation categories. We believe the improved performance of the STransE model is due to its use of full matrices, rather than just projection vectors as in TransD. This permits STransE to model diverse and complex relation categories (such as 1-M, M-1 and especially M-M) better than TransD and other similar models. However, STransE is not as good as TransD for the 1-1 relations. Perhaps the extra parameters in STransE hurt performance in this case (note that 1-1 relations are relatively rare, so STransE still does better overall).

5 Conclusion and future work

This paper presented a new embedding model for link prediction and KB completion. Our STransE combines insights from several simpler embedding models, specifically the Structured Embedding model [Bordes et al.2011] and the TransE model [Bordes et al.2013], by using a low-dimensional vector and two projection matrices to represent each relation. STransE, while conceptually simple, produces highly competitive results on standard link prediction evaluations, and scores better than the embedding-based models it builds on. Thus it is a suitable candidate to serve as a future baseline for more complex models in the link prediction task.

In future work we plan to extend STransE to exploit relation path information in knowledge bases, in a manner similar to Lin et al. [2015a], Guu et al. [2015] or Nguyen et al. [2016].


This research was supported by a Google award through the Natural Language Understanding Focused Program, and under the Australian Research Council’s Discovery Projects funding scheme (project number DP160102156).

NICTA is funded by the Australian Government through the Department of Communications and the Australian Research Council through the ICT Centre of Excellence Program. The first author is supported by an International Postgraduate Research Scholarship and a NICTA NRPA Top-Up Scholarship.


  • [Angeli and Manning2013] Gabor Angeli and Christopher Manning. 2013. Philosophers are Mortal: Inferring the Truth of Unseen Facts. In Proceedings of the Seventeenth Conference on Computational Natural Language Learning, pages 133–142.
  • [Bollacker et al.2008] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. 2008. Freebase: A Collaboratively Created Graph Database for Structuring Human Knowledge. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pages 1247–1250.
  • [Bordes et al.2011] Antoine Bordes, Jason Weston, Ronan Collobert, and Yoshua Bengio. 2011. Learning Structured Embeddings of Knowledge Bases. In Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, pages 301–306.
  • [Bordes et al.2012] Antoine Bordes, Xavier Glorot, Jason Weston, and Yoshua Bengio. 2012. A Semantic Matching Energy Function for Learning with Multi-relational Data. Machine Learning, 94(2):233–259.
  • [Bordes et al.2013] Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. 2013. Translating Embeddings for Modeling Multi-relational Data. In Advances in Neural Information Processing Systems 26, pages 2787–2795.
  • [Das et al.2017] Rajarshi Das, Arvind Neelakantan, David Belanger, and Andrew McCallum. 2017. Chains of Reasoning over Entities, Relations, and Text using Recurrent Neural Networks. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics.
  • [Duchi et al.2011] John Duchi, Elad Hazan, and Yoram Singer. 2011. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. The Journal of Machine Learning Research, 12:2121–2159.
  • [Fellbaum1998] Christiane D. Fellbaum. 1998. WordNet: An Electronic Lexical Database. MIT Press.
  • [Feng et al.2016] Jun Feng, Minlie Huang, Yang Yang, and Xiaoyan Zhu. 2016. GAKE: Graph Aware Knowledge Embedding. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 641–651.
  • [García-Durán et al.2015] Alberto García-Durán, Antoine Bordes, and Nicolas Usunier. 2015. Composing Relationships with Translations. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 286–290.
  • [García-Durán et al.2016] Alberto García-Durán, Antoine Bordes, Nicolas Usunier, and Yves Grandvalet. 2016. Combining Two and Three-Way Embedding Models for Link Prediction in Knowledge Bases. Journal of Artificial Intelligence Research, 55:715–742.
  • [Guu et al.2015] Kelvin Guu, John Miller, and Percy Liang. 2015. Traversing Knowledge Graphs in Vector Space. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 318–327.
  • [He et al.2015] Shizhu He, Kang Liu, Guoliang Ji, and Jun Zhao. 2015. Learning to Represent Knowledge Graphs with Gaussian Embedding. In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, pages 623–632.
  • [Jenatton et al.2012] Rodolphe Jenatton, Nicolas L. Roux, Antoine Bordes, and Guillaume R Obozinski. 2012. A latent factor model for highly multi-relational data. In Advances in Neural Information Processing Systems 25, pages 3167–3175.
  • [Ji et al.2015] Guoliang Ji, Shizhu He, Liheng Xu, Kang Liu, and Jun Zhao. 2015. Knowledge Graph Embedding via Dynamic Mapping Matrix. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 687–696.
  • [Ji et al.2016] Guoliang Ji, Kang Liu, Shizhu He, and Jun Zhao. 2016. Knowledge Graph Completion with Adaptive Sparse Transfer Matrix. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, pages 985–991.
  • [Krompaß et al.2015] Denis Krompaß, Stephan Baier, and Volker Tresp. 2015. Type-Constrained Representation Learning in Knowledge Graphs. In Proceedings of the 14th International Semantic Web Conference, pages 640–655.
  • [Lehmann et al.2015] Jens Lehmann, Robert Isele, Max Jakob, Anja Jentzsch, Dimitris Kontokostas, Pablo N. Mendes, Sebastian Hellmann, Mohamed Morsey, Patrick van Kleef, Sören Auer, and Christian Bizer. 2015. DBpedia - A Large-scale, Multilingual Knowledge Base Extracted from Wikipedia. Semantic Web, 6(2):167–195.
  • [Lin et al.2015a] Yankai Lin, Zhiyuan Liu, Huanbo Luan, Maosong Sun, Siwei Rao, and Song Liu. 2015a. Modeling Relation Paths for Representation Learning of Knowledge Bases. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 705–714.
  • [Lin et al.2015b] Yankai Lin, Zhiyuan Liu, Maosong Sun, Yang Liu, and Xuan Zhu. 2015b. Learning Entity and Relation Embeddings for Knowledge Graph Completion. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, pages 2181–2187.
  • [Liu and Nocedal1989] D. C. Liu and J. Nocedal. 1989. On the Limited Memory BFGS Method for Large Scale Optimization. Mathematical Programming, 45(3):503–528.
  • [Liu et al.2016] Qiao Liu, Liuyi Jiang, Minghao Han, Yao Liu, and Zhiguang Qin. 2016. Hierarchical Random Walk Inference in Knowledge Graphs. In Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 445–454.
  • [Mikolov et al.2013] Tomas Mikolov, Wen-tau Yih, and Geoffrey Zweig. 2013. Linguistic Regularities in Continuous Space Word Representations. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 746–751.
  • [Nguyen et al.2016] Dat Quoc Nguyen, Kairit Sirts, Lizhen Qu, and Mark Johnson. 2016. Neighborhood Mixture Model for Knowledge Base Completion. In Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning, pages 40–50.
  • [Nickel et al.2011] Maximilian Nickel, Volker Tresp, and Hans-Peter Kriegel. 2011. A Three-Way Model for Collective Learning on Multi-Relational Data. In Proceedings of the 28th International Conference on Machine Learning, pages 809–816.
  • [Nickel et al.2016a] Maximilian Nickel, Kevin Murphy, Volker Tresp, and Evgeniy Gabrilovich. 2016a. A Review of Relational Machine Learning for Knowledge Graphs. Proceedings of the IEEE, 104(1):11–33.
  • [Nickel et al.2016b] Maximilian Nickel, Lorenzo Rosasco, and Tomaso Poggio. 2016b. Holographic embeddings of knowledge graphs. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, pages 1955–1961.
  • [Niepert2016] Mathias Niepert. 2016. Discriminative Gaifman Models. In Advances in Neural Information Processing Systems 29, pages 3405–3413.
  • [Riedel et al.2013] Sebastian Riedel, Limin Yao, Andrew McCallum, and Benjamin M. Marlin. 2013. Relation Extraction with Matrix Factorization and Universal Schemas. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 74–84.
  • [Shi and Weninger2017] Baoxu Shi and Tim Weninger. 2017. ProjE: Embedding Projection for Knowledge Graph Completion. In Proceedings of the 31st AAAI Conference on Artificial Intelligence.
  • [Socher et al.2013] Richard Socher, Danqi Chen, Christopher D Manning, and Andrew Ng. 2013. Reasoning With Neural Tensor Networks for Knowledge Base Completion. In Advances in Neural Information Processing Systems 26, pages 926–934.
  • [Suchanek et al.2007] Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum. 2007. YAGO: A Core of Semantic Knowledge. In Proceedings of the 16th International Conference on World Wide Web, pages 697–706.
  • [Taskar et al.2004] Ben Taskar, Ming-Fai Wong, Pieter Abbeel, and Daphne Koller. 2004. Link Prediction in Relational Data. In Advances in Neural Information Processing Systems 16, pages 659–666.
  • [Tay et al.2017] Yi Tay, Anh Tuan Luu, Siu Cheung Hui, and Falk Brauer. 2017. Random Semantic Tensor Ensemble for Scalable Knowledge Graph Link Prediction. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, pages 751–760.
  • [Toutanova and Chen2015] Kristina Toutanova and Danqi Chen. 2015. Observed Versus Latent Features for Knowledge Base and Text Inference. In Proceedings of the 3rd Workshop on Continuous Vector Space Models and their Compositionality, pages 57–66.
  • [Toutanova et al.2015] Kristina Toutanova, Danqi Chen, Patrick Pantel, Hoifung Poon, Pallavi Choudhury, and Michael Gamon. 2015. Representing Text for Joint Embedding of Text and Knowledge Bases. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1499–1509.
  • [Toutanova et al.2016] Kristina Toutanova, Victoria Lin, Wen-tau Yih, Hoifung Poon, and Chris Quirk. 2016. Compositional Learning of Embeddings for Relation Paths in Knowledge Base and Text. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1434–1444.
  • [Trouillon et al.2016] Théo Trouillon, Johannes Welbl, Sebastian Riedel, Éric Gaussier, and Guillaume Bouchard. 2016. Complex Embeddings for Simple Link Prediction. In Proceedings of the 33rd International Conference on Machine Learning, pages 2071–2080.
  • [Wang and Li2016] Zhigang Wang and Juan-Zi Li. 2016. Text-Enhanced Representation Learning for Knowledge Graph. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, pages 1293–1299.
  • [Wang et al.2014a] Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen. 2014a. Knowledge Graph and Text Jointly Embedding. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1591–1601.
  • [Wang et al.2014b] Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen. 2014b. Knowledge Graph Embedding by Translating on Hyperplanes. In Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, pages 1112–1119.
  • [Wang et al.2016] Quan Wang, Jing Liu, Yuanfei Luo, Bin Wang, and Chin-Yew Lin. 2016. Knowledge Base Completion via Coupled Path Ranking. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1308–1318.
  • [Wei et al.2016] Zhuoyu Wei, Jun Zhao, and Kang Liu. 2016. Mining Inference Formulas by Goal-Directed Random Walks. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 1379–1388.
  • [West et al.2014] Robert West, Evgeniy Gabrilovich, Kevin Murphy, Shaohua Sun, Rahul Gupta, and Dekang Lin. 2014. Knowledge Base Completion via Search-based Question Answering. In Proceedings of the 23rd International Conference on World Wide Web, pages 515–526.
  • [Xiao et al.2017] Han Xiao, Minlie Huang, and Xiaoyan Zhu. 2017. SSP: semantic space projection for knowledge graph embedding with text descriptions. In Proceedings of the 31st AAAI Conference on Artificial Intelligence.
  • [Yang et al.2015] Bishan Yang, Wen-tau Yih, Xiaodong He, Jianfeng Gao, and Li Deng. 2015. Embedding Entities and Relations for Learning and Inference in Knowledge Bases. In Proceedings of the International Conference on Learning Representations.
  • [Yoon et al.2016] Hee-Geun Yoon, Hyun-Je Song, Seong-Bae Park, and Se-Young Park. 2016. A Translation-Based Knowledge Graph Embedding Preserving Logical Property of Relations. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 907–916.
  • [Zeiler2012] Matthew D. Zeiler. 2012. ADADELTA: An Adaptive Learning Rate Method. CoRR, abs/1212.5701.
  • [Zhao et al.2015] Yu Zhao, Sheng Gao, Patrick Gallinari, and Jun Guo. 2015. Knowledge Base Completion by Learning Pairwise-Interaction Differentiated Embeddings. Data Mining and Knowledge Discovery, 29(5):1486–1504.