Inducing Interpretability in Knowledge Graph Embeddings

12/10/2017 ∙ by Chandrahas, et al. ∙ Indian Institute of Science

We study the problem of inducing interpretability in KG embeddings. Specifically, we explore the Universal Schema (Riedel et al., 2013) and propose a method to induce interpretability. Many vector space models have been proposed for the problem; however, most of these methods do not address the interpretability (semantics) of individual dimensions. In this work, we study this problem and propose a method for inducing interpretability in KG embeddings using entity co-occurrence statistics. The proposed method significantly improves interpretability while maintaining comparable performance on other KG tasks.




1 Introduction

Knowledge Graphs (KGs) such as Freebase and WordNet have become important resources for supporting many AI applications such as web search and Q&A. They store a collection of facts in the form of a graph. The nodes in the graph represent real-world entities such as Roger Federer, Tennis, and United States, while the edges represent relationships between them.

These KGs have grown huge, but they are still not complete (Toutanova et al., 2015). Hence, the task of inferring new facts becomes important. Many vector space models have been proposed that can perform reasoning over KGs efficiently (Bordes et al., 2011; Wang et al., 2014; Lin et al., 2015; Socher et al., 2013; Riedel et al., 2013; Toutanova et al., 2015). These methods learn representations for entities and relations as vectors in a vector space, capturing global information about the KG. The task of KG inference is then defined as operations over these vectors. Some of these methods, such as Riedel et al. (2013) and Toutanova et al. (2015), can exploit additional text data apart from the KG, resulting in better representations.

Although these methods have shown good performance in applications, they do not address the problem of understanding the semantics of individual dimensions of the KG embedding. Recent work by Xiao et al. (2016) addressed the problem of learning semantic features for KGs. However, they do not directly use vector space modeling.

In this work, we focus on incorporating interpretability in KG embeddings. Specifically, we aim to learn interpretable embeddings for KG entities by incorporating additional entity co-occurrence statistics from text data. This work is motivated by Lau et al. (2014), who presented automated methods for evaluating topics learned via topic modeling methods. We adapt these measures for the vector space model and propose a method to directly maximize them while learning KG embeddings. To the best of our knowledge, this work presents the first regularization term which induces interpretability in KG embeddings.

2 Related Work

Several methods have been proposed for learning KG embeddings. They differ in the modeling of entities and relations, the usage of text data, and the interpretability of the learned embeddings. We summarize some of these methods in the following sections.

2.1 Vector-space models for KG Embeddings

A very effective and powerful set of models is based on translation vectors. These models represent entities as vectors in $d$-dimensional space and relations as translation vectors from head entity to tail entity, in either the same or a projected space. TransE (Bordes et al., 2011) is one of the initial works, which was later improved by many others (Wang et al., 2014; Lin et al., 2015; Xiao et al., 2015b; Xiao et al., 2015a; Ji et al., 2015; Fan et al., 2014). There are also methods that can incorporate text data while learning KG embeddings. Riedel et al. (2013) is one such method, which assumes a combined universal schema of relations from the KG as well as text. Toutanova et al. (2015) further improve performance by sharing parameters among similar textual relations.

2.2 Interpretability of Embedding

While vector space models perform well in many tasks, the semantics of the learned representations are not directly clear. This problem was addressed for word embeddings by Murphy et al. (2012), who proposed a set of constraints inducing interpretability. However, its adaptation for KG embeddings has not been addressed. Recent work by Xiao et al. (2016) addressed a similar problem, learning coherent semantic features for entities and relations in a KG. Our method differs from theirs in two aspects. First, we use vector space modeling, leading directly to KG embeddings, while they need to infer KG embeddings from their probabilistic model. Second, we incorporate additional information about entities, which helps in learning interpretable embeddings.

3 Proposed Method

We are interested in inducing interpretability in KG embeddings, and regularization is a natural way to do so. We therefore look for novel regularizers for KG embeddings and explore a measure of coherence proposed in Lau et al. (2014). This measure allows automated evaluation of the quality of topics learned by topic modeling methods by using additional Pointwise Mutual Information (PMI) for word pairs. It was also shown to have high correlation with human evaluation of topics.

Based on this measure of coherence, we propose a regularization term. This term can be used with existing KG embedding methods (e.g., Riedel et al. (2013)) for inducing interpretability. It is described in the following sections.

3.1 Coherence

In topic models, the coherence of a topic can be determined by the semantic relatedness among the top entities within the topic. This idea can also be used in vector space models by treating dimensions of the vector space as topics. With this assumption, we can use the measure of coherence defined in the following section for evaluating the interpretability of the embeddings.


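3.1.1 Coherence@k

Coherence@k has been shown to have high correlation with human interpretability of topics learned via various topic modeling methods (Lau et al., 2014). Hence, we can expect interpretable embeddings by maximizing it.

Coherence for the top $k$ entities along dimension $l$ is defined as follows:

$$\mathrm{Coherence@}k^{(l)} = \sum_{i=2}^{k} \sum_{j=1}^{i-1} p_{ij} \tag{1}$$

where $p_{ij}$ is the PMI score between entities $e_i$ and $e_j$ extracted from text data. Coherence@k for the entity embedding matrix $\theta_e$ is defined as the average over all $d$ dimensions:

$$\mathrm{Coherence@}k = \frac{1}{d} \sum_{l=1}^{d} \mathrm{Coherence@}k^{(l)} \tag{2}$$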
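As a rough illustration of this measure, the sketch below sums the pairwise PMI among the top-k entities of each dimension and averages over dimensions. The function name `coherence_at_k`, the nested-list layout of the embedding matrix, and the dense symmetric PMI matrix are all assumptions made for the example, not details taken from the paper.

```python
from itertools import combinations


def coherence_at_k(embeddings, pmi, k):
    """Average, over dimensions, of the summed pairwise PMI among top-k entities.

    embeddings: list of n entity vectors, each a list of d floats (assumed layout).
    pmi: n x n symmetric nested list; pmi[i][j] is the PMI of entities i and j.
    """
    n, d = len(embeddings), len(embeddings[0])
    total = 0.0
    for dim in range(d):
        # Indices of the k entities with the largest value along this dimension.
        top = sorted(range(n), key=lambda i: embeddings[i][dim], reverse=True)[:k]
        # Sum PMI over every unordered pair of those entities.
        total += sum(pmi[i][j] for i, j in combinations(top, 2))
    return total / d
```

With k = 5 this would correspond to the Coherence@5 column reported in Table 1, under the assumptions above.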


3.1.2 Inducing coherence while learning embeddings

We want to learn an embedding matrix $\theta_e$ which has high coherence (i.e., which maximizes Coherence@k). Since $\theta_e$ changes during training, the set of top $k$ entities along each dimension varies over iterations. Hence, directly maximizing Coherence@k is difficult.

An alternative approach is to promote higher values for entity pairs having a high PMI score $p_{ij}$. This will result in an embedding matrix $\theta_e$ with a high value of Coherence@k, since high-PMI entity pairs are more likely to be among the top $k$ entities.

This idea can be captured by the following coherence term:

$$\mathcal{C}(\theta_e, P) = \sum_{i=2}^{n} \sum_{j=1}^{i-1} \left\| v(e_i)^\top v(e_j) - p_{ij} \right\|^2 \tag{3}$$

where $P$ is the entity-pair PMI matrix, $n$ is the number of entities, and $v(e)$ denotes the vector for entity $e$. This term can be used in the objective function defined in Equation 6.
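A minimal sketch of such a coherence term follows, assuming it penalizes the squared difference between the dot product of two entity vectors and their PMI score, summed over all entity pairs. The name `coherence_term` and the nested-list inputs are hypothetical.

```python
def coherence_term(embeddings, pmi):
    """Sum over entity pairs (j < i) of (v(e_i) . v(e_j) - p_ij)^2.

    embeddings: list of n entity vectors (assumed layout).
    pmi: n x n nested list of PMI scores.
    """
    n = len(embeddings)
    total = 0.0
    for i in range(1, n):
        for j in range(i):
            dot = sum(a * b for a, b in zip(embeddings[i], embeddings[j]))
            total += (dot - pmi[i][j]) ** 2
    return total
```

Minimizing this quantity pulls the dot product of each pair toward its PMI, so pairs with high PMI tend to share large values along the same dimensions.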

3.2 Entity Model (Model-E)

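We use the Entity Model proposed in Riedel et al. (2013) for learning KG embeddings. This model assumes a vector $v(e)$ for each entity and two vectors $v_s(r)$ and $v_o(r)$ for each relation of the KG. The score for the triple $(e_s, r, e_o)$ is given by

$$f(e_s, r, e_o) = v(e_s)^\top v_s(r) + v(e_o)^\top v_o(r) \tag{4}$$

Training these vectors requires incorrect triples, so we use the closed world assumption. For each triple $t \in \mathcal{T}$, we create two negative triples $t_o^-$ and $t_s^-$ by corrupting the object and subject of the triple, respectively, such that the corrupted triples do not appear in the training, test, or validation data. The loss for a triple pair is defined as $\mathrm{loss}(t, t^-) = -\log \sigma\big(f(t) - f(t^-)\big)$. Then, the aggregate loss function is defined as

$$L(\theta_e, \theta_r, \mathcal{T}) = \frac{1}{|\mathcal{T}|} \sum_{t \in \mathcal{T}} \big(\mathrm{loss}(t, t_o^-) + \mathrm{loss}(t, t_s^-)\big) \tag{5}$$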
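The scoring and loss computation can be sketched as follows. This assumes the score is the sum of two dot products (subject vector with the relation's subject-side vector, object vector with its object-side vector) and that the pairwise loss is the negative log-sigmoid of the score margin. All function names are illustrative.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def score(v_subj, v_obj, rel_subj, rel_obj):
    """Assumed Model-E score: v(e_s) . v_s(r) + v(e_o) . v_o(r)."""
    return _dot(v_subj, rel_subj) + _dot(v_obj, rel_obj)


def pair_loss(f_pos, f_neg):
    """Assumed pairwise loss: -log(sigmoid(f_pos - f_neg))."""
    # log(1 + exp(-x)) equals -log(sigmoid(x)); fine for moderate margins.
    return math.log1p(math.exp(-(f_pos - f_neg)))


def aggregate_loss(pos_scores, neg_obj_scores, neg_subj_scores):
    """Mean over triples of the object-corrupted plus subject-corrupted losses."""
    per_triple = [
        pair_loss(p, no) + pair_loss(p, ns)
        for p, no, ns in zip(pos_scores, neg_obj_scores, neg_subj_scores)
    ]
    return sum(per_triple) / len(per_triple)
```

Under this form of the score, the ranking of candidate objects for a fixed subject and relation depends only on the object-side dot product.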


3.3 Objective

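The overall loss function can be written as follows:

$$L(\theta_e, \theta_r, \mathcal{T}) + \lambda_c\, \mathcal{C}(\theta_e, P) + \lambda_r\, \mathcal{R}(\theta_e, \theta_r) \tag{6}$$

where $\mathcal{R}(\theta_e, \theta_r) = \frac{1}{2}\left(\|\theta_e\|^2 + \|\theta_r\|^2\right)$ is the $L_2$ regularization term, and $\lambda_c$ and $\lambda_r$ are hyper-parameters controlling the trade-off among the different terms in the objective function.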
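To make the trade-off concrete, the sketch below combines a precomputed triple loss and coherence term with a regularizer. It assumes the regularizer is a standard L2 penalty (half the sum of squared parameters); the names `l2_penalty`, `objective`, `lam_c`, and `lam_r` are hypothetical.

```python
def l2_penalty(*param_matrices):
    """Half the sum of squared entries over all given matrices (assumed L2 form)."""
    return 0.5 * sum(x * x for m in param_matrices for row in m for x in row)


def objective(triple_loss, coherence, theta_e, theta_r, lam_c, lam_r):
    """Triple loss plus weighted coherence term plus weighted L2 penalty."""
    return triple_loss + lam_c * coherence + lam_r * l2_penalty(theta_e, theta_r)
```

Setting `lam_c` to zero recovers the baseline objective without the coherence term, which is one way to compare the two configurations on equal footing.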

4 Experiments and Results

4.1 Datasets

We use the FB15k-237 dataset (Toutanova and Chen, 2015) for experiments. It contains 14,541 entities and 237 relations. The triples are split into training, validation, and test sets having 272,115, 17,535, and 20,466 triples, respectively. For extracting entity co-occurrences, we use the textual relations used in Toutanova et al. (2015). This data contains around 3.7 million textual triples, which we use for calculating PMI for entity pairs.
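The exact PMI computation is not spelled out here, so the following is only one plausible sketch. It treats each textual triple as a co-occurrence of its subject and object and estimates probabilities by relative frequency over triples. The function name and the input format are assumptions.

```python
import math
from collections import Counter


def pmi_from_triples(triples):
    """Estimate PMI for entity pairs that co-occur in textual triples.

    triples: iterable of (subject, textual_relation, object) tuples.
    Returns a dict mapping a sorted (entity_a, entity_b) pair to its PMI.
    """
    pair_counts = Counter()
    entity_counts = Counter()
    total = 0
    for subj, _relation, obj in triples:
        if subj == obj:
            continue  # skip self-pairs; they carry no co-occurrence signal
        pair_counts[tuple(sorted((subj, obj)))] += 1
        entity_counts[subj] += 1
        entity_counts[obj] += 1
        total += 1
    # PMI(a, b) = log( p(a, b) / (p(a) * p(b)) ) with counts normalized by total.
    return {
        (a, b): math.log(count * total / (entity_counts[a] * entity_counts[b]))
        for (a, b), count in pair_counts.items()
    }
```

Pairs that never co-occur are simply absent from the result; how to score them (for example, as zero) is a separate design choice.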

4.2 Experimental Setup

We use the method proposed in Riedel et al. (2013) as the baseline. Please refer to Section 3.2 for more details. For evaluating the learned embeddings, we test them on different tasks. All hyper-parameters are tuned using performance (MRR) on validation data. We use 100 dimensions after cross-validating among 50, 100, and 200 dimensions. The regularization weights $\lambda_r$ and $\lambda_c$, for $L_2$ and coherence regularization respectively, are each selected from a set of candidate values. We use multiple random initializations sampled from a Gaussian distribution. For optimization, we use gradient descent and stop when the gradient becomes zero up to a fixed number of decimal places. The final performance measures are reported on test data.

4.3 Results

In the following sections, we compare the performance of the proposed method with the baseline method on different tasks. Please refer to Table 1 for results.

4.3.1 Interpretability

For evaluating interpretability, we use Coherence@k (Equation 2) and automated and manual word intrusion tests. In the word intrusion test (Chang et al., 2009), the top $k$ entities along a dimension are mixed with the bottom-most entity (the intruder) in that dimension and shuffled. Then multiple (3 in our case) human annotators are asked to find the intruder. We use majority voting to finalize one intruder. Amazon Mechanical Turk was used for crowdsourcing the task, and we used randomly selected dimensions for evaluation. For automated word intrusion (Lau et al., 2014), we calculate the following score for all entities:

$$\mathrm{AutoWI}(e_i) = \sum_{j \neq i} p_{ij} \tag{7}$$

where $p_{ij}$ are the PMI scores and the sum runs over the other entities in the shuffled set. The entity having the least score is identified as the intruder. We report the fraction of dimensions for which we were able to identify the intruder correctly.
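The automated test can be sketched as below, assuming each candidate's score is its summed PMI with the other candidates along the same dimension. The function names and the nested-list inputs are hypothetical, and the sketch requires more than k entities.

```python
def detect_intruder(candidates, pmi):
    """Return the candidate whose summed PMI with the other candidates is lowest."""
    def total_pmi(i):
        return sum(pmi[i][j] for j in candidates if j != i)
    return min(candidates, key=total_pmi)


def intruder_detection_rate(embeddings, pmi, k):
    """Fraction of dimensions where the bottom-most entity is picked as intruder."""
    n, d = len(embeddings), len(embeddings[0])
    hits = 0
    for dim in range(d):
        order = sorted(range(n), key=lambda i: embeddings[i][dim], reverse=True)
        # Top-k entities plus the bottom-most entity, which is the true intruder.
        candidates = order[:k] + [order[-1]]
        hits += detect_intruder(candidates, pmi) == order[-1]
    return hits / d
```

Shuffling is omitted because the automated scorer does not depend on candidate order; it matters only for human annotators.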

As we can see in Table 1, the proposed method achieves better values for Coherence@5 as a direct consequence of the regularization term, which maximizes coherence between appropriate entities. Performance on the word intrusion task also improves drastically, as the intruder along each dimension is much easier to identify because the top entities for each dimension group together more conspicuously.

Method | Link Prediction: MRR, MR, Hits@10 (%) | Triple Classification: AUC (%), Accuracy (%) | AutoWI@5 (%), Coherence@5, Manual WI (%)
Table 1: Results on test data. The proposed method significantly improves interpretability while maintaining comparable performance on KG tasks (Section 4.3).
Baseline (Top 5)
- Jurist, Pipe organ, USA, Lions Gate Entertainment, UK
- Guitar, 71st Academy Awards, Jurist, Piano, Bass guitar
- Actor, Official Website, Screenwriter, Film Producer, USA
- Jurist, USA, Marriage, Male, UK
- Pipe organ, Official Website, Actor, Film Producer, Screenwriter
Proposed Method (Top 5)
- Juris Doctor, Business Administration, Biology, Psychology, BS
- Bachelor of Arts, PhD, Bachelor’s degree, BS, MS
- European Union, Europe, Netherlands, Portugal, Government
- UK, Hollywood, DVD, London, Europe
- Hollywood, Academy Awards, USA, DVD, Los Angeles
Table 2: Top 5 and bottom most entities for randomly selected dimensions. As we see, the proposed method produces more coherent entities compared to the baseline. Incoherent entities are marked in bold face. [1]

[1] We have used abbreviations for BS (Bachelor of Science), MS (Master of Science), UK (United Kingdom) and USA (United States of America). They appear in full form in the data.

4.3.2 Link Prediction

In this experiment, we test the model’s ability to predict the best object entity for a given subject entity and relation. For each of the triples, we fix the subject and the relation and rank all entities (within the same category as the true object entity) based on their score according to Equation 4. We report the Mean Rank (MR) and Mean Reciprocal Rank (MRR) of the true object entity, and Hits@10 (the percentage of times the true object entity is ranked in the top 10).
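These ranking metrics can be computed as in the sketch below. The helper names are illustrative, and ties are broken optimistically (only strictly higher scores push the true entity down), which is an assumption rather than something stated in the text.

```python
def rank_of_true(scores, true_index):
    """1-based rank of the true candidate; a higher score means a better rank."""
    return 1 + sum(s > scores[true_index] for s in scores)


def ranking_metrics(ranks, k=10):
    """Return (Mean Rank, Mean Reciprocal Rank, Hits@k as a percentage)."""
    n = len(ranks)
    mean_rank = sum(ranks) / n
    mrr = sum(1.0 / r for r in ranks) / n
    hits_at_k = 100.0 * sum(r <= k for r in ranks) / n
    return mean_rank, mrr, hits_at_k
```

For example, `rank_of_true([0.1, 0.9, 0.5], 2)` gives rank 2, since exactly one candidate scores higher than the true one.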

Since the objective of the coherence regularization term is tangential to that of the original loss function, it is not expected to affect performance on the link prediction task. However, the results show a marginal drop in MRR, as the coherence term gives credibility to triples that are otherwise deemed incorrect by the closed world assumption.

4.3.3 Triple Classification

In this experiment, we test the model on classifying correct and incorrect triples. For finding incorrect triples, we corrupt the object entity with a randomly selected entity within the same category. For classification, we use validation data to find the best threshold for each relation by training an SVM classifier and later use this threshold for classifying test triples. We report the mean accuracy and mean AUC over all relations.
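As a simplified stand-in for the SVM-based threshold selection, the sketch below searches exhaustively over the observed validation scores of one relation and keeps the cut-off with the highest accuracy. This is a swapped-in technique for illustration, not the procedure used in the experiments, and the function name is hypothetical.

```python
def best_threshold(scores, labels):
    """Pick the cut-off maximizing accuracy for 'correct if score >= cut-off'.

    scores: validation-triple scores for a single relation.
    labels: matching booleans, True for correct triples.
    """
    best_t, best_acc = None, -1.0
    for t in sorted(set(scores)):
        acc = sum((s >= t) == y for s, y in zip(scores, labels)) / len(scores)
        if acc > best_acc:  # keep the lowest threshold among ties
            best_t, best_acc = t, acc
    return best_t, best_acc
```

The chosen threshold would then be applied unchanged to that relation's test triples, and accuracy averaged over relations.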

We observe that the proposed method achieves slightly better performance on triple classification, with a small improvement in accuracy. The PMI information adds more evidence to the correct triples which are related in text data, generating a better threshold that more accurately distinguishes correct and incorrect triples.

4.4 Qualitative Analysis of Results

Since our aim is to induce interpretability in representations, in this section, we evaluate the embeddings learned by the baseline as well as the proposed method. For both methods, we select some dimensions randomly and present top 5 entities along those dimensions. The results are presented in Table 2.

As we can see from the results, the proposed method produces more coherent entities than the baseline method.

5 Conclusion and Future Works

In this work, we proposed a method for inducing interpretability in KG embeddings using a coherence regularization term. We evaluated the proposed and the baseline method on the interpretability of the learned embeddings. We also evaluated the methods on different KG tasks and compared their performance. We found that the proposed method achieves better interpretability while maintaining comparable performance on KG tasks. As next steps, we plan to evaluate the generalizability of the method with more recent KG embeddings.


References

  • Bordes et al. (2011) Antoine Bordes, Jason Weston, Ronan Collobert, and Yoshua Bengio. 2011. Learning structured embeddings of knowledge bases. In Conference on Artificial Intelligence. EPFL-CONF-192344.
  • Chang et al. (2009) Jonathan Chang, Jordan L Boyd-Graber, Sean Gerrish, Chong Wang, and David M Blei. 2009. Reading tea leaves: How humans interpret topic models. In NIPS. Volume 31, pages 1–9.
  • Fan et al. (2014) Miao Fan, Qiang Zhou, Emily Chang, and Thomas Fang Zheng. 2014. Transition-based knowledge graph embedding with relational mapping properties. In PACLIC. Pages 328–337.
  • Ji et al. (2015) Guoliang Ji, Shizhu He, Liheng Xu, Kang Liu, and Jun Zhao. 2015. Knowledge graph embedding via dynamic mapping matrix. In ACL (1). Pages 687–696.
  • Lau et al. (2014) Jey Han Lau, David Newman, and Timothy Baldwin. 2014. Machine reading tea leaves: Automatically evaluating topic coherence and topic model quality. In EACL. Pages 530–539.
  • Lin et al. (2015) Yankai Lin, Zhiyuan Liu, Maosong Sun, Yang Liu, and Xuan Zhu. 2015. Learning entity and relation embeddings for knowledge graph completion. In AAAI. Pages 2181–2187.
  • Murphy et al. (2012) Brian Murphy, Partha Pratim Talukdar, and Tom Mitchell. 2012. Learning effective and interpretable semantic models using non-negative sparse embedding. In International Conference on Computational Linguistics (COLING 2012), Mumbai, India.
  • Riedel et al. (2013) Sebastian Riedel, Limin Yao, Andrew McCallum, and Benjamin M Marlin. 2013. Relation extraction with matrix factorization and universal schemas. In NAACL HLT 2013. Pages 74–84.
  • Socher et al. (2013) Richard Socher, Danqi Chen, Christopher D Manning, and Andrew Ng. 2013. Reasoning with neural tensor networks for knowledge base completion. In Advances in Neural Information Processing Systems. Pages 926–934.
  • Toutanova and Chen (2015) Kristina Toutanova and Danqi Chen. 2015. Observed versus latent features for knowledge base and text inference. In 3rd Workshop on Continuous Vector Space Models and Their Compositionality. ACL – Association for Computational Linguistics.
  • Toutanova et al. (2015) Kristina Toutanova, Danqi Chen, Patrick Pantel, Hoifung Poon, Pallavi Choudhury, and Michael Gamon. 2015. Representing Text for Joint Embedding of Text and Knowledge Bases. In Empirical Methods in Natural Language Processing (EMNLP). ACL – Association for Computational Linguistics.
  • Wang et al. (2014) Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen. 2014. Knowledge graph embedding by translating on hyperplanes. In AAAI. Citeseer, pages 1112–1119.
  • Xiao et al. (2015a) Han Xiao, Minlie Huang, Yu Hao, and Xiaoyan Zhu. 2015a. TransA: An adaptive approach for knowledge graph embedding. arXiv preprint arXiv:1509.05490.
  • Xiao et al. (2015b) Han Xiao, Minlie Huang, Yu Hao, and Xiaoyan Zhu. 2015b. TransG: A generative mixture model for knowledge graph embedding. arXiv preprint arXiv:1509.05488.
  • Xiao et al. (2016) Han Xiao, Minlie Huang, and Xiaoyan Zhu. 2016. Knowledge semantic representation: A generative model for interpretable knowledge graph embedding. arXiv preprint arXiv:1608.07685.