1 Introduction
Knowledge Graphs (KGs) such as Freebase and WordNet have become important resources for supporting many AI applications like web search and question answering. They store a collection of facts in the form of a graph. The nodes in the graph represent real-world entities such as Roger Federer, Tennis, and United States, while the edges represent relationships between them.
These KGs have grown huge, but they are still far from complete (Toutanova et al., 2015). Hence the task of inferring new facts becomes important. Many vector space models have been proposed which can perform reasoning over KGs efficiently (Bordes et al., 2011; Wang et al., 2014; Lin et al., 2015; Socher et al., 2013; Riedel et al., 2013; Toutanova et al., 2015). These methods learn representations for entities and relations as vectors in a vector space, capturing global information about the KG. The task of KG inference is then defined as operations over these vectors. Some of these methods, such as Riedel et al. (2013) and Toutanova et al. (2015), are capable of exploiting additional text data apart from the KG, resulting in better representations.
Although these methods have shown good performance in applications, they do not address the problem of understanding the semantics of individual dimensions of the KG embedding. A recent work (Xiao et al., 2016) addressed the problem of learning semantic features for KGs; however, they do not directly use vector space modeling.
In this work, we focus on incorporating interpretability in KG embeddings. Specifically, we aim to learn interpretable embeddings for KG entities by incorporating additional entity co-occurrence statistics from text data. This work is motivated by Lau et al. (2014), who presented automated methods for evaluating topics learned via topic modeling methods. We adapt these measures for the vector space model and propose a method to directly maximize them while learning KG embeddings. To the best of our knowledge, this work presents the first regularization term which induces interpretability in KG embeddings.
2 Related Work
Several methods have been proposed for learning KG embeddings. They differ in their modeling of entities and relations, their usage of text data, and the interpretability of the learned embeddings. We summarize some of these methods in the following sections.
2.1 Vector space models for KG embeddings
A very effective and powerful set of models is based on translation vectors. These models represent entities as vectors in a d-dimensional space and relations as translation vectors from head entity to tail entity, in either the same or a projected space. TransE (Bordes et al., 2011) is one of the initial works, which was later improved by many others (Wang et al., 2014; Lin et al., 2015; Xiao et al., 2015b; Xiao et al., 2015a; Ji et al., 2015; Fan et al., 2014). There are also methods which are able to incorporate text data while learning KG embeddings. Riedel et al. (2013) is one such method, which assumes a combined universal schema of relations from the KG as well as text. Toutanova et al. (2015) further improve performance by sharing parameters among similar textual relations.
2.2 Interpretability of embeddings
While vector space models perform well in many tasks, the semantics of the learned representations are not directly clear. This problem for word embeddings was addressed by Murphy et al. (2012), who proposed a set of constraints for inducing interpretability. However, its adaptation for KG embeddings has not been addressed. A recent work (Xiao et al., 2016) addressed a similar problem, learning coherent semantic features for entities and relations in a KG. Our method differs from theirs in two aspects. Firstly, we use vector space modeling, which leads directly to KG embeddings, while they need to infer KG embeddings from their probabilistic model. Secondly, we incorporate additional information about entities, which helps in learning interpretable embeddings.
3 Proposed Method
We are interested in inducing interpretability in KG embeddings, and regularization is a principled way to do it, so we look at novel regularizers for KG embeddings. In particular, we explore a measure of coherence proposed in Lau et al. (2014). This measure allows automated evaluation of the quality of topics learned by topic modeling methods, using additional Pointwise Mutual Information (PMI) statistics for word pairs. It was also shown to have high correlation with human evaluation of topics.
Based on this measure of coherence, we propose a regularization term. This term can be used with existing KG embedding methods (e.g., Riedel et al. (2013)) for inducing interpretability. It is described in the following sections.
3.1 Coherence
In topic models, the coherence of a topic can be determined by the semantic relatedness among the top entities within the topic. This idea can also be used in vector space models by treating the dimensions of the vector space as topics. With this assumption, we can use the measure of coherence defined in the following section for evaluating the interpretability of the embeddings.
3.1.1 Coherence@n
Coherence@n has been shown to have high correlation with human interpretability of topics learned via various topic modeling methods (Lau et al., 2014). Hence, we can expect interpretable embeddings by maximizing it.
Coherence@n for the top n entities along dimension d is defined as follows:

Coherence@n(d) = \sum_{i=2}^{n} \sum_{j=1}^{i-1} p_{ij}    (1)

where p_{ij} is the PMI score between entities e_i and e_j, extracted from text data. Coherence@n for the entity embedding matrix \theta is defined as the average over all its D dimensions:

Coherence@n(\theta) = \frac{1}{D} \sum_{d=1}^{D} Coherence@n(d)    (2)
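The two equations above can be sketched in code. The following is a minimal illustration (function and variable names are our own, not from the paper's implementation), assuming `theta` is a num_entities x num_dims embedding matrix and `pmi` a symmetric entity-pair PMI matrix:

```python
import numpy as np

def coherence_at_n(theta, pmi, n=5):
    """Average Coherence@n over all dimensions (Equations 1 and 2).
    theta: (num_entities, num_dims) embedding matrix.
    pmi:   (num_entities, num_entities) symmetric PMI matrix."""
    num_dims = theta.shape[1]
    total = 0.0
    for d in range(num_dims):
        # top-n entities along dimension d (largest embedding values)
        top = np.argsort(-theta[:, d])[:n]
        # Equation (1): sum of pairwise PMI among the top-n entities
        coh_d = sum(pmi[top[i], top[j]]
                    for i in range(1, n) for j in range(i))
        total += coh_d
    # Equation (2): average over all dimensions
    return total / num_dims
```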
3.1.2 Inducing coherence while learning embeddings
We want to learn an embedding matrix \theta which has high coherence (i.e., which maximizes Coherence@n). Since \theta changes during training, the set of top n entities along each dimension varies over iterations. Hence, directly maximizing Coherence@n is tricky.
An alternative approach is to promote higher values of v(e_i)^\top v(e_j) for entity pairs having a high PMI score p_{ij}. This results in an embedding matrix \theta with a high value of Coherence@n, since high-PMI entity pairs are more likely to be among the top entities together.
This idea can be captured by the following coherence term:

C(\theta; P) = \sum_{i=2}^{|E|} \sum_{j=1}^{i-1} p_{ij} \, v(e_i)^\top v(e_j)    (3)

where P is the entity-pair PMI matrix and v(e) denotes the vector for entity e. This term can be used in the objective function defined in Equation 6.
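As a sketch, the coherence term can be computed as a PMI-weighted sum of pairwise inner products, each unordered entity pair counted once. This is one plausible reading of Equation (3); the names below are illustrative:

```python
import numpy as np

def coherence_term(theta, pmi):
    """PMI-weighted sum of pairwise inner products (Equation 3):
    rewards a high inner product v(e_i)^T v(e_j) for entity pairs
    with a high PMI score p_ij."""
    # theta @ theta.T gives all pairwise inner products v_i^T v_j
    gram = theta @ theta.T
    # weight each pair by its PMI; upper triangle counts each
    # unordered pair exactly once
    return float(np.sum(np.triu(pmi * gram, k=1)))
```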
3.2 Entity Model (ModelE)
We use the Entity Model proposed in Riedel et al. (2013) for learning KG embeddings. This model learns a vector v(e) for each entity e and two vectors v_s(r) and v_o(r) for each relation r of the KG. The score for a triple (s, r, o) is given by:

score(s, r, o) = v_s(r)^\top v(s) + v_o(r)^\top v(o)    (4)
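The Model E score is just a sum of two dot products, as in Equation (4). A minimal sketch:

```python
import numpy as np

def score_model_e(v_s, v_o, r_s, r_o):
    """Model E score for a triple (s, r, o), Equation (4): the
    relation's subject vector dotted with the subject embedding,
    plus its object vector dotted with the object embedding."""
    return float(np.dot(r_s, v_s) + np.dot(r_o, v_o))
```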
Training these vectors requires incorrect triples, so we use the closed-world assumption. For each triple t = (s, r, o), we create two negative triples t'_o = (s, r, o') and t'_s = (s', r, o) by corrupting the object and subject of the triple respectively, such that the corrupted triples do not appear in the training, test or validation data. The loss for a triple pair is defined as the pairwise logistic loss loss(t, t') = -\log \sigma(score(t) - score(t')). Then, the aggregate loss function is defined as

L(T) = \frac{1}{|T|} \sum_{t \in T} \left( loss(t, t'_o) + loss(t, t'_s) \right)    (5)

where T is the set of training triples.
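The negative sampling and the pairwise loss described above can be sketched as follows. This is a hypothetical illustration, not the paper's code; the pairwise logistic loss -log(sigmoid(pos - neg)) is an assumption about the exact loss form:

```python
import random
import numpy as np

def pairwise_loss(pos_score, neg_score):
    """Pairwise logistic loss -log(sigmoid(pos - neg)), one common
    choice for the per-pair loss aggregated in Equation (5).
    log1p(exp(-x)) is a numerically stable -log(sigmoid(x))."""
    return float(np.log1p(np.exp(-(pos_score - neg_score))))

def corrupt_triple(triple, entities, known_triples):
    """Create two negative triples by corrupting the object and the
    subject, resampling until the corrupted triple is unseen in the
    training/validation/test data (closed-world assumption)."""
    s, r, o = triple
    def sample(corrupt_obj):
        while True:
            e = random.choice(entities)
            cand = (s, r, e) if corrupt_obj else (e, r, o)
            if cand not in known_triples and cand != triple:
                return cand
    # first: object-corrupted negative, second: subject-corrupted
    return sample(True), sample(False)
```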
3.3 Objective
The overall objective function can be written as follows:

\min_{\theta} \; L(T) + \lambda_\theta \lVert\theta\rVert_2^2 - \lambda_c \, C(\theta; P)    (6)

where C(\theta; P) is the coherence regularization term (Equation 3) and \lambda_c and \lambda_\theta are hyperparameters controlling the tradeoff among the different terms in the objective function.
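The three terms of the objective combine as sketched below; the sign convention (the coherence term is maximized, hence subtracted) is our assumption, and the names are illustrative:

```python
def objective(data_loss, l2_norm_sq, coherence, lam_theta, lam_c):
    """Overall objective of Equation (6): data loss, plus an L2
    penalty weighted by lambda_theta, minus the coherence term
    weighted by lambda_c (subtracted because it is maximized)."""
    return data_loss + lam_theta * l2_norm_sq - lam_c * coherence
```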
4 Experiments and Results
4.1 Datasets
We use the FB15k-237 (Toutanova and Chen, 2015) dataset for our experiments. It contains 14,541 entities and 237 relations. The triples are split into training, validation and test sets having 272,115, 17,535 and 20,466 triples respectively. For extracting entity co-occurrences, we use the textual relations used in Toutanova et al. (2015). This provides around 3.7 million textual triples, which we use for calculating PMI for entity pairs.
4.2 Experimental Setup
We use the method proposed in Riedel et al. (2013) as the baseline (see Section 3.2 for details). For evaluating the learned embeddings, we test them on different tasks. All the hyperparameters, including the L2 regularization weight \lambda_\theta and the coherence regularization weight \lambda_c, are tuned using performance (MRR) on the validation data. We use 100 dimensions after cross-validating among 50, 100 and 200 dimensions. We use multiple random initializations sampled from a Gaussian distribution. For optimization, we use gradient descent and stop when the gradient becomes zero up to a few decimal places. The final performance measures are reported on the test data.

4.3 Results
In the following sections, we compare the performance of the proposed method with the baseline method on different tasks. Please refer to Table 1 for the results.
4.3.1 Interpretability
For evaluating interpretability, we use Coherence@n (Equation 2) along with automated and manual word intrusion tests. In the word intrusion test (Chang et al., 2009), the top k entities along a dimension are mixed with the bottom-most entity (the intruder) in that dimension and shuffled. Multiple (3 in our case) human annotators are then asked to find the intruder, and we use majority voting to finalize one intruder. Amazon Mechanical Turk was used for crowdsourcing the task, and we used randomly selected dimensions for evaluation. For automated word intrusion (Lau et al., 2014), we calculate the following score for each entity among the candidates:
AutoWI(e_i) = \sum_{j=1, j \neq i}^{k+1} p_{ij}    (7)

where p_{ij} are the PMI scores. The entity having the least score is identified as the intruder. We report the fraction of dimensions for which we were able to identify the intruder correctly.
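The automated intruder detection of Equation (7) amounts to picking the candidate with the smallest total PMI against the others. A minimal sketch, assuming `pmi` is a dict-of-dicts of pairwise PMI scores:

```python
def find_intruder(candidates, pmi):
    """Automated word intrusion (Equation 7): score each candidate
    by the sum of its PMI with every other candidate; the entity
    with the least score is flagged as the intruder."""
    def score(e):
        return sum(pmi[e][f] for f in candidates if f != e)
    return min(candidates, key=score)
```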
As we can see in Table 1, the proposed method achieves better values of Coherence@5 as a direct consequence of the regularization term, which maximizes coherence among appropriate entities. Performance on the word intrusion task also improves drastically, as the intruder along each dimension becomes much easier to identify once the top entities of each dimension group together more conspicuously.
Table 1:

  Link Prediction:        MRR | MR | Hits@10 (%)
    Baseline:
    Proposed:

  Triple Classification:  AUC (%) | Accuracy (%)
    Baseline:
    Proposed:

  Interpretability:       AutoWI@5 (%) | Coherence@5 | Manual WI (%)
    Baseline:
    Proposed:
Table 2: Top 5 entities along randomly selected dimensions.

Baseline:
  Jurist, Pipe organ, USA, Lions Gate Entertainment, UK
  Guitar, 71st Academy Awards, Jurist, Piano, Bass guitar
  Actor, Official Website, Screenwriter, Film Producer, USA
  Jurist, USA, Marriage, Male, UK
  Pipe organ, Official Website, Actor, Film Producer, Screenwriter

Proposed Method:
  Juris Doctor, Business Administration, Biology, Psychology, BS
  Bachelor of Arts, PhD, Bachelor's degree, BS, MS
  European Union, Europe, Netherlands, Portugal, Government
  UK, Hollywood, DVD, London, Europe
  Hollywood, Academy Awards, USA, DVD, Los Angeles
4.3.2 Link Prediction
In this experiment, we test the model’s ability to predict the best object entity for a given subject entity and relation. For each of the test triples, we fix the subject and the relation and rank all entities (within the same category as the true object entity) based on their score according to Equation 4. We report the Mean Rank (MR) and Mean Reciprocal Rank (MRR) of the true object entity, and Hits@10 (the percentage of times the true object entity is ranked in the top 10).
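Given the rank of the true object entity for each test triple, the three reported metrics can be computed as follows (a minimal sketch; names are our own):

```python
import numpy as np

def ranking_metrics(ranks, k=10):
    """Compute Mean Rank, MRR and Hits@k (as a percentage) from the
    ranks of the true object entities in the link prediction task."""
    ranks = np.asarray(ranks, dtype=float)
    mr = float(ranks.mean())            # Mean Rank
    mrr = float((1.0 / ranks).mean())   # Mean Reciprocal Rank
    hits = float((ranks <= k).mean() * 100.0)  # Hits@k in percent
    return mr, mrr, hits
```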
Since the objective of the coherence regularization term is tangential to that of the original loss function, it is not expected to affect performance on the link prediction task. The results show only a marginal drop in MRR, as the coherence term lends credibility to triples that are otherwise deemed incorrect under the closed-world assumption.
4.3.3 Triple Classification
In this experiment, we test the model on classifying correct and incorrect triples. For generating incorrect triples, we corrupt the object entity with a randomly selected entity from the same category. For classification, we use the validation data to find the best threshold for each relation by training an SVM classifier, and later use this threshold for classifying the test triples. We report the mean accuracy and mean AUC over all relations.
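The per-relation thresholding can be illustrated with a simple accuracy-maximizing search over candidate thresholds. This is a simplified stand-in for the per-relation SVM classifier mentioned above, not the paper's actual procedure:

```python
def best_threshold(scores, labels):
    """Pick the score threshold maximizing classification accuracy
    on validation triples of a single relation. Triples scoring at
    or above the threshold are classified as correct."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(scores)):
        acc = sum((s >= t) == bool(y)
                  for s, y in zip(scores, labels)) / len(labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t
```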
We observe that the proposed method achieves slightly better performance for triple classification, improving the accuracy. The PMI information adds more evidence for the correct triples which are related in the text data, generating a better threshold that more accurately distinguishes correct from incorrect triples.
4.4 Qualitative Analysis of Results
Since our aim is to induce interpretability in the representations, in this section we qualitatively evaluate the embeddings learned by the baseline as well as the proposed method. For both methods, we select some dimensions randomly and present the top 5 entities along those dimensions. The results are presented in Table 2.
As we can see from the results, the proposed method produces more coherent entities than the baseline method.
5 Conclusion and Future Work
In this work, we proposed a method for inducing interpretability in KG embeddings using a coherence regularization term. We evaluated the proposed and the baseline method on the interpretability of the learned embeddings. We also evaluated the methods on different KG tasks and compared their performance. We found that the proposed method achieves better interpretability while maintaining comparable performance on KG tasks. As next steps, we plan to evaluate the generalizability of the method with more recent KG embedding methods.
References

Antoine Bordes, Jason Weston, Ronan Collobert, and Yoshua Bengio. 2011. Learning structured embeddings of knowledge bases. In AAAI Conference on Artificial Intelligence.
Jonathan Chang, Jordan L. Boyd-Graber, Sean Gerrish, Chong Wang, and David M. Blei. 2009. Reading tea leaves: How humans interpret topic models. In NIPS, volume 31, pages 1–9.
Miao Fan, Qiang Zhou, Emily Chang, and Thomas Fang Zheng. 2014. Transition-based knowledge graph embedding with relational mapping properties. In PACLIC, pages 328–337.
Guoliang Ji, Shizhu He, Liheng Xu, Kang Liu, and Jun Zhao. 2015. Knowledge graph embedding via dynamic mapping matrix. In ACL (1), pages 687–696.
Jey Han Lau, David Newman, and Timothy Baldwin. 2014. Machine reading tea leaves: Automatically evaluating topic coherence and topic model quality. In EACL, pages 530–539.
Yankai Lin, Zhiyuan Liu, Maosong Sun, Yang Liu, and Xuan Zhu. 2015. Learning entity and relation embeddings for knowledge graph completion. In AAAI, pages 2181–2187.
Brian Murphy, Partha Pratim Talukdar, and Tom Mitchell. 2012. Learning effective and interpretable semantic models using non-negative sparse embedding. In International Conference on Computational Linguistics (COLING 2012), Mumbai, India. http://aclweb.org/anthology/C/C12/C12-1118.pdf.
Sebastian Riedel, Limin Yao, Andrew McCallum, and Benjamin M. Marlin. 2013. Relation extraction with matrix factorization and universal schemas. In NAACL HLT 2013, pages 74–84.
Richard Socher, Danqi Chen, Christopher D. Manning, and Andrew Ng. 2013. Reasoning with neural tensor networks for knowledge base completion. In Advances in Neural Information Processing Systems, pages 926–934.
Kristina Toutanova and Danqi Chen. 2015. Observed versus latent features for knowledge base and text inference. In 3rd Workshop on Continuous Vector Space Models and their Compositionality. Association for Computational Linguistics.
Kristina Toutanova, Danqi Chen, Patrick Pantel, Hoifung Poon, Pallavi Choudhury, and Michael Gamon. 2015. Representing text for joint embedding of text and knowledge bases. In Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics.
Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen. 2014. Knowledge graph embedding by translating on hyperplanes. In AAAI, pages 1112–1119.
Han Xiao, Minlie Huang, Yu Hao, and Xiaoyan Zhu. 2015a. TransA: An adaptive approach for knowledge graph embedding. arXiv preprint arXiv:1509.05490.
Han Xiao, Minlie Huang, Yu Hao, and Xiaoyan Zhu. 2015b. TransG: A generative mixture model for knowledge graph embedding. arXiv preprint arXiv:1509.05488.
Han Xiao, Minlie Huang, and Xiaoyan Zhu. 2016. Knowledge semantic representation: A generative model for interpretable knowledge graph embedding. arXiv preprint arXiv:1608.07685.