Knowledge Graph Embedding Bi-Vector Models for Symmetric Relation

05/23/2019 · Jinkui Yao, et al.

Knowledge graph embedding (KGE) models have been proposed to improve the performance of knowledge graph reasoning. However, a general phenomenon occurs in most KGE models: as training progresses, the vectors of symmetric relations tend toward the zero vector whenever the ratio of symmetric triples in the dataset is high enough. This phenomenon causes subsequent tasks on symmetric relations, e.g. link prediction, to fail. The root cause of the problem is that these KGE models do not utilize the semantic information of symmetric relations. We propose KGE bi-vector models, which represent each symmetric relation as a pair of vectors, significantly increasing the capability of processing symmetric relations. To verify the models, we generate benchmark datasets based on FB15k and WN18 by completing the symmetric relation triples. The experimental results clearly affirm the effectiveness and superiority of our models against the baselines.


1 Introduction

The knowledge graph, a structured knowledge base, represents facts about the world in a form that computers can easily process. As the basis of question answering, knowledge inference, and other applications, the knowledge graph has received extensive attention from academia and industry.

In recent years, knowledge graph reasoning has made significant progress. There are two main branches, logical reasoning and representation learning, each with its own advantages and disadvantages. Logical reasoning rests on a rigorous mathematical foundation but struggles with the computational bottleneck of combinatorial explosion. Knowledge representation learning, which is based on statistics, has attracted more attention thanks to the recent development of machine learning and deep learning, but it is limited by the incompleteness and the scale of the knowledge base.

Usually, each fact in the knowledge graph is represented by a triple $(h, r, t)$, where $h$ and $t$ are the head entity and the tail entity, respectively, and $r$ is the relation between them.


Figure 1: The symmetric relation spouse.

For example, the triple (Trump, spouse, Melania) means that Trump's spouse is Melania, in which Trump is the head entity, spouse is the relation, and Melania is the tail entity. Semantically, the relation spouse is symmetric, as shown in Figure 1: (Trump, spouse, Melania) and (Melania, spouse, Trump) hold simultaneously.

KGE aims to embed the entities and relations into low-dimensional real vectors and then learn their representations. TransE [1] is the earliest KGE model and has spawned a series of models called the Trans series models, or Trans models. Most Trans models are based on vector addition, which makes them difficult to apply well to symmetric relations.

We propose bi-vector models that extend the Trans models for symmetric relations. Different from the Trans models, which use a single vector to represent each entity or relation, we adopt a bi-vector to represent a symmetric relation. The score functions of the two subvectors are calculated separately. As the training epochs increase, the two subvectors separate step by step, and the models can then distinguish the two directions of the symmetric relation.

We construct two benchmark datasets, FB15k-SYM and WN18-SYM, for running the bi-vector models. The experimental results show that our method can effectively improve the triple prediction accuracy of symmetric relations. The main contributions of this paper are as follows.

  1. We propose bi-vector models which improve the prediction accuracy of symmetric relations.

  2. We combine the symmetric semantic information of relations with KGE, which offers a new approach to knowledge graph reasoning.

  3. We run the models on the extended benchmark datasets and verify their effectiveness and advantages.

2 Related Works

We extend three popular KGE models, TransE, TransH [2] and TransD [3], using bi-vectors. Therefore, we first introduce them.

2.1 TransE, TransH and TransD

• TransE, the first KGE model proposed, regards relation $r$ as a translation from entity $h$ to entity $t$: the translated head $h + r$ should lie in the nearest neighborhood of $t$. The score function is defined as

$f_r(h, t) = \|h + r - t\|$  (1)

where $\|\cdot\|$ is usually the $L_1$ or $L_2$ norm. TransE can solve 1-1 relations effectively, but it is not suitable for handling 1-n, n-1 and n-n relations.

• TransH projects entities $h$ and $t$ onto the hyperplane on which relation $r$ is located. Before calculating the score function, TransH computes the projections $h_\perp = h - w_r^\top h\, w_r$ and $t_\perp = t - w_r^\top t\, w_r$, where $w_r$ is the unit normal vector of the hyperplane. The score function is

$f_r(h, t) = \|h_\perp + r - t_\perp\|$  (2)

where $\|\cdot\|$ is usually the $L_2$ norm. TransH is more accurate than TransE in terms of the recognition rate of 1-n, n-1 and n-n relations.

• TransD holds that combinations of entities and relations can distinguish relations more finely. Each entity-relation combination corresponds to an association (mapping) matrix, $M_{rh} = r_p h_p^\top + I$ and $M_{rt} = r_p t_p^\top + I$, where $h_p$, $t_p$ and $r_p$ are projection vectors. The score function uses the product of the entity and the association matrix, in the form $h_\perp = M_{rh} h$, $t_\perp = M_{rt} t$, and is defined as

$f_r(h, t) = \|h_\perp + r - t_\perp\|$  (3)

where $\|\cdot\|$ is usually the $L_2$ norm.
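To make Equations (1)-(3) concrete, here is a minimal numpy sketch of the three score functions. The helper names, and the simplifying assumption that entities and relations share one embedding dimension, are ours; this is an illustration of the published formulas, not the authors' code.

```python
import numpy as np

def transe_score(h, r, t, ord=2):
    """Eq. (1): distance between the translated head h + r and the tail t."""
    return np.linalg.norm(h + r - t, ord=ord)

def transh_score(h, r, t, w_r):
    """Eq. (2): project h and t onto the hyperplane with unit normal w_r,
    then measure the translation distance on that hyperplane."""
    h_perp = h - np.dot(w_r, h) * w_r
    t_perp = t - np.dot(w_r, t) * w_r
    return np.linalg.norm(h_perp + r - t_perp, ord=2)

def transd_score(h, r, t, h_p, r_p, t_p):
    """Eq. (3): map h and t with the association matrices
    M_rh = r_p h_p^T + I and M_rt = r_p t_p^T + I (square here,
    assuming equal entity and relation dimensions)."""
    I = np.eye(len(r))
    M_rh = np.outer(r_p, h_p) + I
    M_rt = np.outer(r_p, t_p) + I
    return np.linalg.norm(M_rh @ h + r - M_rt @ t, ord=2)
```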

2.2 Other Models

• Translation based methods. In addition to TransE(H, D), which we have already mentioned, translation based methods cover the following models. TransR [4] builds independent embedding spaces for entities and relations, in which entities $h, t \in \mathbb{R}^k$ and relation $r \in \mathbb{R}^d$. A projection matrix $M_r \in \mathbb{R}^{k \times d}$ is set, and the score function is defined as $f_r(h, t) = \|M_r h + r - M_r t\|$. TranSparse [5] sets two separate sparse relation matrices $M_r^h$ and $M_r^t$ to deal with the issue of sparse data; its score function is defined as $f_r(h, t) = \|M_r^h h + r - M_r^t t\|$. TransF reduces the cost of calculating the relation projection by modeling the projection matrices as combinations of base subspaces, weighted by relation-specific coefficients.

• Tensor based methods. DistMult [6] adopts a relation-specific diagonal matrix to represent the characteristics of a relation. The score function is a bilinear function, $f_r(h, t) = h^\top \mathrm{diag}(r)\, t$, in which the scores of positive triples should be higher than those of negative triples. HolE [7] employs circular correlation to create holographic compositional representations, and has the advantages of computational efficiency and representational scalability. RESCAL [8] adopts tensor factorization to estimate the latent structure of each relation. ComplEx [9] embeds the entities and relations into complex space and then computes the loss value.
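As an illustration of the tensor based scores, here is a small numpy sketch of the DistMult bilinear score and the ComplEx score. This is our own sketch of the published formulas, not code from the cited papers.

```python
import numpy as np

def distmult_score(h, r, t):
    """DistMult: bilinear score h^T diag(r) t, higher is better."""
    return np.sum(h * r * t)

def complex_score(h, r, t):
    """ComplEx: Re(<h, r, conj(t)>) with complex-valued embeddings."""
    return np.real(np.sum(h * r * np.conj(t)))

# Note: DistMult scores a triple and its mirror identically,
# since h^T diag(r) t == t^T diag(r) h.
h, r, t = np.random.default_rng(0).normal(size=(3, 8))
assert np.isclose(distmult_score(h, r, t), distmult_score(t, r, h))
```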

• Other related methods. SE [10] defines two relation-specific matrices $M_{r,1}$ and $M_{r,2}$ for the head and the tail, and defines the score function as $f_r(h, t) = \|M_{r,1} h - M_{r,2} t\|$. Many other KGE models try various embedding methods, such as the Neural Tensor Network (NTN) [11], Semantic Matching Energy (SME) [12], SLM, TransA, lppTransD, etc.

However, these works do not utilize the semantic information of relation properties. We believe that the semantic information of relation properties is valuable and can improve the performance of KGE models.

3 Methodology

In order to overcome the lack of support for symmetric relations in KGE, we make the following efforts. First, we describe the defect of the Trans models in handling symmetric relations and analyze its causes. Then, we propose three new models that extend the Trans models to improve the handling of symmetric relations, named TransE-SYM, TransH-SYM and TransD-SYM. Finally, we give the definition of the loss functions for these models.

3.1 Problems and causes

A knowledge graph can be represented as a set of ordered triples of entities and relations. Each relation in a knowledge graph is essentially a binary relation, which may have the properties of symmetry, anti-symmetry, reflexivity, anti-reflexivity and transitivity. This paper focuses on the symmetry property of relations. In the graph, a symmetric relation has two directed edges in opposite directions.

KGE represents each relation, including symmetric relations, as a single low-dimensional real vector. However, a single vector cannot represent two opposite directions.

We take TransE as an example to illustrate the problem with symmetric relations. TransE learns the embeddings from the equation $h + r \approx t$ when the triple $(h, r, t)$ holds. TransE's score function is defined as $f_r(h, t) = \|h + r - t\|$. When $f_r(h, t) \to 0$, it means $h + r \approx t$.

Assume that there is a symmetric relation $r$ and a triple $(h, r, t)$ in $G$; then $f_r(h, t) \to 0$, i.e. $h + r \approx t$. Since $r$ is symmetric, the symmetric triple $(t, r, h)$ should hold too, satisfying $f_r(t, h) \to 0$, i.e. $t + r \approx h$.

Obviously, $h + r \approx t$ and $t + r \approx h$ can both hold if and only if $r$ is the additive identity of the vector space, i.e. $r \approx \mathbf{0}$; this conclusion contradicts the conditions of the TransE model.

Take the symmetric relation spouse as an example, shown in Figure 1. When the fact (Trump, spouse, Melania) holds, the fact (Melania, spouse, Trump) holds too. Let $e_m$, $e_t$ and $r$ denote the entities Melania and Trump and the relation spouse, respectively. Then,

$e_t + r \approx e_m$  (4)

$e_m + r \approx e_t$  (5)

Adding Equation (4) and Equation (5), we have

$2r \approx \mathbf{0}$  (6)

According to the KGE premise, a relation should be a non-zero real vector, and Equation (6) contradicts this condition. The root cause of the above problem is that the symmetric relation is represented by a single vector, and a single vector cannot express the semantic bifurcation of a symmetric relation.
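Under our own toy setup (random vectors, plain gradient descent on the squared TransE distances of a symmetric triple pair, no negative sampling), a few lines of numpy reproduce this collapse: the relation vector shrinks geometrically toward zero.

```python
import numpy as np

rng = np.random.default_rng(0)
h, r, t = rng.normal(size=(3, 16))

# Minimize ||h + r - t||^2 + ||t + r - h||^2, the combined TransE loss
# for a symmetric triple (h, r, t) and its mirror (t, r, h).
lr = 0.05
for step in range(500):
    g_fwd = h + r - t        # residual of the forward triple
    g_bwd = t + r - h        # residual of the mirrored triple
    h -= lr * 2 * (g_fwd - g_bwd)
    t -= lr * 2 * (g_bwd - g_fwd)
    r -= lr * 2 * (g_fwd + g_bwd)   # note g_fwd + g_bwd == 2r

print(np.linalg.norm(r))  # ~0: the symmetric relation collapses to zero
```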


Figure 2: The two subvectors of the symmetric relation are denoted $r_1$ and $r_2$, and the results of their score functions can be viewed as the distances $d_1$ and $d_2$. The subvector with the shorter distance, i.e. the smaller score value, is selected as the subvector to train.

3.2 Our Method

Aiming at these problems, bi-vector models for symmetric relations are presented in this study.

A knowledge graph is $G = (E, R)$, where $E$ and $R$ are the entity set and the relation set, respectively.

Symmetric relation: if $h$ and $t$ are entities of the knowledge graph $G$, $r$ is a relation of $G$, and $(h, r, t) \in G$, $(t, r, h) \in G$, then the relation $r$ is a symmetric relation.

Different from most KGE models, which represent entities and relations as single vectors, we represent a symmetric relation as a bi-vector with two subvectors, $r_1$ and $r_2$. Then, in each epoch of learning, the score functions of the two subvectors are calculated, and the better (smaller) score is selected as the current result. Let $f_r(h, t)$ be the score function of a Trans series model, as shown in Equation (7),

$f_r(h, t) = \min\big(f_{r_1}(h, t),\, f_{r_2}(h, t)\big)$  (7)

We have extended three different Trans models, which differ in their respective score functions. In TransE, the score function is $f_r(h, t) = \|h + r - t\|$, where $\|\cdot\|$ is the $L_1$ or $L_2$ norm, and the score functions of the subvectors are shown in Equation array (8),

$f_{r_1}(h, t) = \|h + r_1 - t\|, \qquad f_{r_2}(h, t) = \|h + r_2 - t\|$  (8)

The selected score $f_r(h, t)$ is then substituted into the following loss function,

$L = \sum_{(h,r,t) \in S} \sum_{(h',r,t') \in S'} \big[\gamma + f_r(h, t) - f_r(h', t')\big]_+$  (9)

where $\gamma$ denotes the margin hyperparameter, $[x]_+$ denotes $\max(0, x)$, and $S$ and $S'$ are the sets of positive triples and corrupted negative triples, respectively. Similarly, the score functions of the TransH model are shown in Equation array (10),

$f_{r_1}(h, t) = \|h_\perp + r_1 - t_\perp\|, \qquad f_{r_2}(h, t) = \|h_\perp + r_2 - t_\perp\|$  (10)

The score functions of the TransD model are shown in Equation array (11),

$f_{r_1}(h, t) = \|M_{rh} h + r_1 - M_{rt} t\|, \qquad f_{r_2}(h, t) = \|M_{rh} h + r_2 - M_{rt} t\|$  (11)

The loss functions of these models are calculated according to Equation (9).
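The following is a minimal sketch of the bi-vector selection step for TransE-SYM, under our reading of Equations (7)-(9): each symmetric relation carries two subvectors, the one with the smaller score is used for the current triple, and that score feeds the margin loss. The names and the toy usage are our own illustration, not the authors' released code.

```python
import numpy as np

def transe_score(h, r, t):
    """L1 variant of Eq. (1)."""
    return np.linalg.norm(h + r - t, ord=1)

def bivector_score(h, r1, r2, t):
    """Eqs. (7)/(8): score the triple with both subvectors of a
    symmetric relation and keep the better (smaller) one."""
    return min(transe_score(h, r1, t), transe_score(h, r2, t))

def margin_loss(pos_score, neg_score, gamma=1.0):
    """Eq. (9), one term: [gamma + f(positive) - f(negative)]_+ ."""
    return max(0.0, gamma + pos_score - neg_score)

# Toy usage: a positive symmetric triple and a corrupted negative one.
rng = np.random.default_rng(1)
h, t, t_neg = rng.normal(size=(3, 16))
r1, r2 = rng.normal(size=(2, 16))
pos = bivector_score(h, r1, r2, t)
neg = bivector_score(h, r1, r2, t_neg)
print(margin_loss(pos, neg))
```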

4 Experiments and results

Dataset     #Ent     #Rel    #Train/#Test/#Valid      %SYM (train/test/valid)   %SYM after completion
FB15k       14,951   1,345   483,142/50,000/59,071    7.15/0.94/0.744           8.69/8.41/8.34
FB15k-237   14,541   237     272,115/17,535/20,466    12.48/1.44/1.13           14.97/2.65/2.58
FB13        75,043   13      316,232/5,908/23,733     1.31/0.00/0.00            1.42/0.00/0.00
WN18        40,943   18      141,442/5,000/5,000      20.97/0.52/0.72           22.38/19.07/19.01
WN11        38,696   11      112,581/2,609/10,544     1.41/0.06/0.00            1.54/0.20/0.08
WN18RR      40,943   11      86,835/3,134/3,034       34.15/0.83/1.19           36.05/27.38/27.98
Table 1: Statistics of several popular datasets. #Ent is the number of entities, #Rel is the number of relations, #Train/#Test/#Valid are the sizes of the train/test/valid splits, %SYM is the percentage of symmetric triples in each split, and %SYM after completion is that percentage after the missing symmetric triples are complemented.

4.1 Dataset analysis and preprocessing

In this study, we compared and analyzed the commonly used knowledge graph embedding benchmark datasets FB15k, FB15k-237, FB13, WN18, WN11 and WN18RR. FB15k, FB15k-237 and FB13 are extracted from Freebase [13], a large-scale common sense knowledge base providing general facts about the world. Freebase was acquired by Google and is still maintained. WN18, WN11 and WN18RR are extracted from WordNet [14] and provide semantic knowledge of words.

We count the ratio of symmetric relations in each dataset, as shown in Table 1. It can be seen that the proportions of symmetric data in the WN18 and FB15k datasets are relatively high.

The proportion of symmetric data for a relation $r$ is denoted $p_r$ in this paper. We regard $r$ as a symmetric relation when $p_r$ exceeds a threshold (in this paper, the threshold is set to 0.5).

As shown in Table 2, in WN18 the relation _verb_group has 1139 triples, of which 1060 are symmetric, so the ratio of symmetric triples is about 0.93. Semantically, _verb_group denotes verb grouping, which is obviously a symmetric relation. From the perspective of the data distribution, the symmetry ratio of this relation is 0.93, so we regard it as symmetric.

In order to simplify the problem, in this paper symmetry is judged only by the data distribution. We complement the missing symmetric triples of each symmetric relation in the dataset. More formally: if relation $r$ in knowledge graph $G$ is symmetric, then for every pair of entities, if $(h, r, t) \in G$ and $(t, r, h) \notin G$, we add $(t, r, h)$ to $G$. A sketch of this preprocessing follows.
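Below is a minimal sketch of the preprocessing, assuming triples are stored as (head, relation, tail) tuples; the symmetry-ratio threshold of 0.5 follows the paper, while the helper names are ours.

```python
from collections import defaultdict

def symmetry_ratio(triples):
    """Per relation: fraction of triples (h, r, t) whose mirror (t, r, h)
    is also present in the dataset."""
    by_rel = defaultdict(set)
    for h, r, t in triples:
        by_rel[r].add((h, t))
    return {r: sum((t, h) in pairs for h, t in pairs) / len(pairs)
            for r, pairs in by_rel.items()}

def complete_symmetric(triples, threshold=0.5):
    """Add the missing mirror triples for every relation whose
    symmetry ratio exceeds the threshold."""
    ratios = symmetry_ratio(triples)
    existing = set(triples)
    added = [(t, r, h) for h, r, t in triples
             if ratios[r] > threshold and (t, r, h) not in existing]
    return list(triples) + added

# Toy usage: ratio for spouse is 2/3 > 0.5, so the mirror is added.
kg = [("a", "spouse", "b"), ("b", "spouse", "a"), ("c", "spouse", "d")]
print(complete_symmetric(kg))  # appends ("d", "spouse", "c")
```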

Dataset  Relation                                                          SYM    ALL    Ratio
FB15k    /military/military_combatant/force_deployments/…/combatant       78     84     0.929
FB15k    /base/fight/crime_type/p…/crime/criminal_conviction/guilty_of    20     21     0.952
FB15k    /base/twinnedtowns/twinned_town/…/town_twinning/twinned_towns    20     21     0.952
FB15k    /base/contractbridge/…/bridge_tournament_standings/second_place  18     19     0.947
FB15k    /sports/sports_position/…/sports-_team_roster/position           108    127    0.850
WN18     _derivationally_related_form                                     27694  29716  0.931
WN18     _verb_group                                                      1060   1139   0.931
WN18     _similar_to                                                      74     81     0.914
WN18     _also_see                                                        830    1300   0.638

Table 2: Examples of symmetric relations in FB15k and WN18. SYM is the number of symmetric triples of the relation, ALL is the total number of triples of the relation, and Ratio is the proportion of symmetric triples among all triples of the relation.

4.2 Benchmarks

In order to show the superiority of our models, we compare against the following benchmark KGE models.

• TransE is the most widely used KGE model and also the earliest proposed one.

• TransH projects $h$ and $t$ onto the hyperplane where $r$ is located, to handle 1-n, n-1 and n-n relations.

• TransD uses the entity-relation matrix to obtain a more fine-grained distinction of relations.

4.3 Verification problem

In order to verify the problem of the Trans models described in Section 3.1, we designed the following experiment; the steps are as follows.

  1. Training Trans models. We train the TransE, TransH and TransD models on the datasets whose symmetric triples were completed in Section 4.1.

  2. Constructing test datasets. We randomly select symmetric relations and entities in FB15k and WN18 to construct test sets named FB15k-test-circle and WN18-test-circle, each containing 10,000 symmetric triples. The triples in the test sets have the form $(e, r, e)$, where $r$ is a symmetric relation and $e$ is any entity (a generation sketch follows this list).

  3. Experimental results. According to Section 3.1, if the symmetric triples hold during training, the relation vector tends to zero. We run the test sets on the trained models, and the experimental results are shown in Table 3. Almost all of the randomly generated triples are predicted to be true: these models completely fail in dealing with symmetric relations.
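A sketch of how such a circle test set can be generated, under our reading of the construction above (the function and sampling scheme are our own, hypothetical illustration):

```python
import random

def build_circle_test_set(entities, symmetric_relations, n=10000, seed=0):
    """Sample n circle triples (e, r, e): the head and tail are the same
    entity, joined by a symmetric relation. Once r has collapsed toward
    the zero vector, TransE scores ||e + r - e|| ~ 0, so every such
    triple is (wrongly) judged true."""
    rng = random.Random(seed)
    heads = rng.choices(entities, k=n)
    rels = rng.choices(symmetric_relations, k=n)
    return [(e, r, e) for e, r in zip(heads, rels)]
```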

Model Train Dataset Test Dataset MR MRR H10 H3 H1
TransE FB15k-SYM FB15k-test-circle 1.000 1.000 1.000 1.000 1.000
TransH FB15k-SYM FB15k-test-circle 1.000 1.000 1.000 1.000 1.000
TransD FB15k-SYM FB15k-test-circle 1.000 1.000 1.000 1.000 1.000
TransE WN18-SYM WN18-test-circle 1.000 1.000 1.000 1.000 1.000
TransH WN18-SYM WN18-test-circle 1.000 1.000 1.000 1.000 1.000
TransD WN18-SYM WN18-test-circle 1.000 1.000 1.000 1.000 1.000
Table 3: Circle triple test results.

4.4 Experimental results

We propose three bi-vector Trans models, named TransE-SYM, TransH-SYM and TransD-SYM. The experimental code is implemented on top of the open source project OpenKE [15]. These models are run on the datasets with completed symmetric relations and achieve good results. The experimental results are shown in Table 4: the bi-vector models are superior to the original models on the link prediction metrics.

             FB15k-SYM                             WN18-SYM
Model        MR    MRR    H10    H3     H1         MR    MRR    H10    H3     H1
TransE       66    0.490  0.683  0.461  0.206      493   0.371  0.711  0.544  0.087
TransE-SYM   51    0.534  0.772  0.598  0.329      467   0.485  0.836  0.705  0.246
TransH       80    0.380  0.747  0.539  0.162      688   0.426  0.926  0.828  0.026
TransH-SYM   49    0.432  0.784  0.612  0.344      601   0.577  0.931  0.845  0.120
TransD       185   0.265  0.519  0.297  0.148      711   0.416  0.928  0.787  0.145
TransD-SYM   72    0.642  0.774  0.543  0.335      210   0.886  0.941  0.866  0.374
Table 4: Experimental results for link prediction on FB15k-SYM and WN18-SYM.
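For reference, a small sketch of how the reported metrics (MR, MRR, and Hits@k, abbreviated H10/H3/H1 in the tables) are conventionally computed from the rank of the correct entity in link prediction; these are the standard definitions, not code from the paper.

```python
import numpy as np

def ranking_metrics(ranks):
    """Compute the reported metrics from the 1-based rank of the correct
    entity for each test triple."""
    ranks = np.asarray(ranks, dtype=float)
    return {
        "MR":  ranks.mean(),          # mean rank (lower is better)
        "MRR": (1.0 / ranks).mean(),  # mean reciprocal rank
        "H10": (ranks <= 10).mean(),  # Hits@10
        "H3":  (ranks <= 3).mean(),   # Hits@3
        "H1":  (ranks <= 1).mean(),   # Hits@1
    }

print(ranking_metrics([1, 2, 5, 11]))
```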

5 Conclusion

This paper introduces symmetry semantics into KGE models and points out a defect in how state-of-the-art KGE models learn symmetric relations. The bi-vector models we propose improve the low recognition rate of symmetric relations in Trans models.

References

  • [1] Antoine Bordes, Nicolas Usunier, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In NIPS.
  • [2] Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen. 2014. Knowledge graph embedding by translating on hyperplanes. In AAAI.
  • [3] Guoliang Ji, Shizhu He, Liheng Xu, Kang Liu, and Jun Zhao. 2015. Knowledge graph embedding via dynamic mapping matrix. In ACL.
  • [4] Yankai Lin, Zhiyuan Liu, Maosong Sun, Yang Liu, and Xuan Zhu. 2015. Learning entity and relation embeddings for knowledge graph completion. In AAAI.
  • [5] Guoliang Ji, Kang Liu, Shizhu He, and Jun Zhao. 2016. Knowledge graph completion with adaptive sparse transfer matrix. In AAAI.

  • [6] Bishan Yang, Wentau Yih, Xiaodong He, Jianfeng Gao, and Li Deng. 2015. Embedding entities and relations for learning and inference in knowledge bases. In ICLR.
  • [7] Maximilian Nickel, Lorenzo Rosasco, and Tomaso Poggio. 2016. Holographic embeddings of knowledge graphs. In AAAI.
  • [8] Maximilian Nickel, Volker Tresp, and Hans-Peter Kriegel. 2011. A three-way model for collective learning on multi-relational data. In ICML.
  • [9] Théo Trouillon, Johannes Welbl, Sebastian Riedel, Éric Gaussier, and Guillaume Bouchard. 2016. Complex embeddings for simple link prediction. In ICML.
  • [10] Antoine Bordes, Jason Weston, Ronan Collobert, and Yoshua Bengio. 2012. Learning structured embeddings of knowledge bases. In AAAI.
  • [11] Richard Socher, Danqi Chen, Christopher D. Manning, and Andrew Y. Ng. 2013. Reasoning with neural tensor networks for knowledge base completion. In NIPS.
  • [12] Antoine Bordes, Xavier Glorot, and Jason Weston. 2012. Joint learning of words and meaning representations for open-text semantic parsing. In International Conference on Artificial Intelligence and Statistics.
  • [13] Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. 2008. Freebase: a collaboratively created graph database for structuring human knowledge. In SIGMOD.
  • [14] George A. Miller. 1994. WordNet: a lexical database for English. In The Workshop on Human Language Technology.
  • [15] Xu Han, Shulin Cao, Xin Lv, Yankai Lin, Zhiyuan Liu, Maosong Sun, and Juanzi Li. 2018. OpenKE: An open toolkit for knowledge embedding. In Proceedings of EMNLP.