On the Knowledge Graph Completion Using Translation Based Embedding: The Loss Is as Important as the Score

09/02/2019 ∙ by Mojtaba Nayyeri, et al. ∙ University of Bonn

Knowledge graphs (KGs) represent the world's facts in structured form. KG completion exploits the existing facts in a KG to discover new ones. The translation-based embedding model (TransE) is a prominent formulation for KG completion. Despite the efficiency of TransE in memory and time, it suffers from several limitations in encoding relation patterns such as symmetric, reflexive, and many-to-many relations. Most attempts to tackle this problem have revolved around revising the score function of TransE, i.e., proposing more complicated score functions such as Trans(A, D, G, H, R, etc.) to mitigate the limitations. In this paper, we tackle the problem from a different perspective. We pose theoretical investigations of the main limitations of TransE in the light of the loss function rather than the score function. To the best of our knowledge, this has not been comprehensively investigated so far. We show that, with a proper selection of the loss function for training the TransE model, the main limitations of the model are mitigated. This is explained by setting an upper-bound for the scores of positive samples, which defines the region of truth (i.e., the region in which a triple is considered positive by the model). Our theoretical proofs, together with experimental results, fill the gap between the capability of the translation-based class of embedding models and the loss function. The theories emphasize the importance of the selection of the loss function for training the models. Our experimental evaluations on different loss functions used for training the models justify our theoretical proofs and confirm the importance of the loss function for performance.


1 Introduction

Knowledge comprises commonsense facts and other information accumulated from different sources. Throughout history, civilizations have evolved due to increases in knowledge. Over time, humans have identified many relations among different entities. Therefore, the development of proper knowledge representation (KR) and management systems is essential.

The aim of KR is to study how beliefs can be represented in an explicit, symbolic notation suitable for automated reasoning. The Knowledge Graph (KG) is a recent direction in KR. KGs are usually represented as a set of triples $(h, r, t)$, where $h$ and $t$ are entities and $r$ is a relation, e.g., a triple such as (Berlin, capitalOf, Germany). Entities and relations are nodes and edges in the graph, respectively.
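For illustration, a minimal Python sketch of how a KG is commonly stored as indexed $(h, r, t)$ tuples before embedding (the toy triples below are hypothetical; real benchmarks such as FB15K or WN18 are loaded the same way):

```python
# Hypothetical toy triples; benchmark datasets are handled identically.
triples = [
    ("Berlin", "capitalOf", "Germany"),
    ("Germany", "locatedIn", "Europe"),
]
# Map entity and relation names to integer ids (sorted for determinism).
entities = {e: i for i, e in enumerate(sorted({h for h, _, _ in triples} | {t for _, _, t in triples}))}
relations = {r: i for i, r in enumerate(sorted({r for _, r, _ in triples}))}
indexed = [(entities[h], relations[r], entities[t]) for h, r, t in triples]
print(indexed)
```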

KGs are inherently incomplete, making the prediction of missing links an ever-relevant task. Among the different approaches used for KG completion, KG embedding (KGE) has recently received growing attention. KGE embeds entities and relations as low-dimensional vectors. To measure the degree of plausibility of a triple, a scoring function is defined over the embeddings.

TransE, the translation-based embedding model Bordes et al. (2013), is one of the most widely used KGEs. The original assumption of TransE is that $\mathbf{h} + \mathbf{r} \approx \mathbf{t}$ holds for every positive triple $(h, r, t)$, where $\mathbf{h}, \mathbf{r}, \mathbf{t}$ are the embedding vectors of head, relation and tail, respectively.

TransE and its many variants, like TransH Wang et al. (2014) and TransR Lin et al. (2015b), underperform greatly compared to the current state-of-the-art embedding models due to the inherent limitations of their scoring functions.

Recent work has highlighted the main limitations of translation-based models. Wang et al. (2018) reveal that TransE cannot encode a relation pattern which is neither reflexive nor irreflexive. Sun et al. (2019) prove that TransE is incapable of encoding symmetric relations. Wang et al. (2014) add that TransE cannot properly encode reflexive, one-to-many, many-to-one and many-to-many relations.

TransH, TransR and TransD Wang et al. (2014); Lin et al. (2015b); Ji et al. (2015) can handle the mentioned problems of TransE (i.e., one-to-many, many-to-one, many-to-many and reflexive) by projecting entities to a relation-specific space before applying the translation. Kazemi and Poole (2018) investigate three additional limitations of the TransE, FTransE Feng et al. (2016), STransE Nguyen et al. (2016), TransH and TransR models: (i) if the models encode a reflexive relation $r$, they automatically encode it as symmetric; (ii) if the models encode a reflexive relation $r$, they automatically encode it as transitive; and (iii) if an entity $e_1$ has relation $r$ with every entity in a set $\Delta$ and an entity $e_2$ has relation $r$ with one of the entities in $\Delta$, then $e_2$ must have the relation $r$ with every entity in $\Delta$.

The mentioned works investigated these limitations by focusing on the capability of the scoring functions to encode relation patterns. However, we prove that the selection of the loss function affects the boundary of the score function; consequently, the selection of the loss function significantly affects the limitations. Therefore, the above-mentioned theories about the limitations of translation-based embedding models in encoding relation patterns are inaccurate. We pose new theories about the limitations of TransX models that take the loss function into account. To the best of our knowledge, this is the first time that the effect of the loss function is investigated to prove theories about the limitations of translation-based models.

In a nutshell, the key contributions of this paper are as follows: (i) We show that different loss functions enforce different upper-bounds and lower-bounds for the scores of positive and negative samples, respectively. This implies that existing theories about the limitations of TransX models are inaccurate. We introduce new theories accordingly and prove that a proper selection of the loss function mitigates the main limitations. (ii) We reformulate the existing loss functions and their optimization problems as a standard, generalized constrained optimization problem. This makes clear which losses are suitable under which conditions. (iii) Using symmetric relation patterns, we obtain the optimal upper-bound on the scores of positive triples that enables the encoding of symmetric patterns. (iv) We show the reason behind the positive effect of normalizing entity vectors on performance when a margin ranking loss is used to train TransX models.

2 Related Work

In this paper we extend the score function of TransE from the real space to the complex space, which, according to our theories, is less restrictive in encoding relation patterns. Accordingly, in this section, we review the score functions of TransE and some of its variants. The existing limitations of translation-based embedding models emphasized in recent works are then reviewed in the next section. These limitations will be reinvestigated in the light of score and loss functions in Section 4.

TransE Bordes et al. (2013) is one of the earliest KGE models and is efficient in both time and space. The score function of TransE is defined as $f_r(h, t) = \|\mathbf{h} + \mathbf{r} - \mathbf{t}\|$.

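As a reference point for the models discussed below, a minimal PyTorch sketch of the TransE score (the norm order `p` is left as a parameter, since both L1 and L2 are common choices):

```python
import torch

def transe_score(h: torch.Tensor, r: torch.Tensor, t: torch.Tensor, p: int = 2) -> torch.Tensor:
    """TransE dissimilarity f(h, r, t) = ||h + r - t||_p; lower means more plausible."""
    return (h + r - t).norm(p=p, dim=-1)

# Toy usage with random 5-dimensional embeddings.
h, r, t = torch.randn(3, 5)
print(transe_score(h, r, t))
```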
TransH Wang et al. (2014) projects each entity vector $\mathbf{e}$ to the relation space ($\mathbf{e}_{\perp} = \mathbf{e} - \mathbf{w}_r^{\top}\mathbf{e}\,\mathbf{w}_r$, where $\mathbf{w}_r$ is the normal vector of the hyperplane of relation $r$). The score function is defined as $f_r(h, t) = \|\mathbf{h}_{\perp} + \mathbf{d}_r - \mathbf{t}_{\perp}\|$. TransH can encode reflexive, one-to-many, many-to-one and many-to-many relations. However, recent theories Kazemi and Poole (2018) prove that encoding reflexive relations results in encoding both symmetric and transitive ones, which is undesired.

TransR Lin et al. (2015b) projects each entity $\mathbf{e}$ to the relation space by using a matrix provided for each relation ($\mathbf{e}_{\perp} = \mathbf{M}_r\mathbf{e}$). TransR uses the same scoring function as TransH.

TransD Ji et al. (2015) provides two vectors for each individual entity and relation ($\mathbf{h}, \mathbf{h}_p, \mathbf{r}, \mathbf{r}_p, \mathbf{t}, \mathbf{t}_p$). Head and tail entities are projected by using the following matrices:

$$\mathbf{M}_{rh} = \mathbf{r}_p\mathbf{h}_p^{\top} + \mathbf{I}, \qquad \mathbf{M}_{rt} = \mathbf{r}_p\mathbf{t}_p^{\top} + \mathbf{I}.$$

The score function of TransD is similar to the score function of TransH.

RotatE Sun et al. (2019) rotates the head entity to the tail entity by using the relation. RotatE embeds entities and relations in the complex space. With additional constraints on the norms of the entity vectors, the model degenerates to TransE. The scoring function of RotatE is $f_r(h, t) = \|\mathbf{h} \circ \mathbf{r} - \mathbf{t}\|$, where $|r_i| = 1$ for every element of $\mathbf{r}$ and $\circ$ is the element-wise (Hadamard) product.

TorusE Ebisu and Ichise (2018) fixes the problem of regularization in TransE by applying translation on a compact Lie group. The model has several variants including mapping from torus to Complex space. In this case, the model is regarded as a very special case of RotatE Sun et al. (2019) that applies rotation instead of translation in the target Complex space. According to Sun et al. (2019), TorusE is not defined on the entire Complex space. Therefore, it has less representation capacity. TorusE needs a very big embedding dimension (10000 as reported in Ebisu and Ichise (2018)) which is a limitation.

3 The Main Limitations of Translation-based Embedding Models

Here we review the six limitations of translation-based embedding models in encoding relation patterns (e.g., reflexive, symmetric) mentioned in the literature: Wang et al. (2014); Kazemi and Poole (2018); Wang et al. (2018); Sun et al. (2019).

Limitation L1: Wang et al. (2014): TransE cannot encode reflexive relations.

Limitation L2 Wang et al. (2018): if TransE encodes a relation $r$ which is neither reflexive nor irreflexive, the following equations should hold simultaneously: $\mathbf{e}_1 + \mathbf{r} = \mathbf{e}_1$ for the reflexive instances and $\mathbf{e}_2 + \mathbf{r} \neq \mathbf{e}_2$ for the irreflexive ones. Therefore, both $\mathbf{r} = \mathbf{0}$ and $\mathbf{r} \neq \mathbf{0}$ should hold, which results in a contradiction. In this regard, TransE cannot encode a relation which is neither reflexive nor irreflexive.

Limitation L3 Sun et al. (2019): if a relation $r$ is symmetric, the following equations should hold: $\mathbf{h} + \mathbf{r} = \mathbf{t}$ and $\mathbf{t} + \mathbf{r} = \mathbf{h}$. Therefore, $\mathbf{r} = \mathbf{0}$ and $\mathbf{h} = \mathbf{t}$, so TransE cannot properly encode symmetric relations.

The following limitations are held for TransE, FTransE Feng et al. (2016), STransE Nguyen et al. (2016), TransH and TransR.

Limitation L4 Kazemi and Poole (2018): if a relation $r$ is reflexive on $\Delta \subseteq \mathcal{E}$, where $\mathcal{E}$ is the set of all entities in the KG, $r$ must also be symmetric.

Limitation L5 Kazemi and Poole (2018): if $r$ is reflexive on $\Delta \subseteq \mathcal{E}$, $r$ must also be transitive.

Limitation L6 Kazemi and Poole (2018): if entity $e_1$ has relation $r$ with every entity in $\Delta \subseteq \mathcal{E}$ and entity $e_2$ has relation $r$ with one of the entities in $\Delta$, then $e_2$ must have the relation $r$ with every entity in $\Delta$.

4 Our Model

TransE and its variants underperform compared to other embedding models due to the limitations enumerated in Section 3. In this section, we reinvestigate these limitations. We show that the corresponding theoretical proofs are inaccurate, propose new theories, and prove that each of the limitations is resolved by revising either the scoring function or the loss. To this end, we first propose a new variant of TransE, TransComplEx. TransComplEx with a proper selection of the loss function mitigates the limitations, as discussed below.

4.1 TransComplEx: Translational Embedding Model in Complex Space

Inspired by Trouillon et al. (2016), in this section we propose TransComplEx, which translates the head entity vector toward the conjugate of the tail entity vector using the relation vector in the complex space. The score function is defined as follows:

$$f_r(h, t) = \|\operatorname{Re}(\mathbf{h}) + \operatorname{Re}(\mathbf{r}) - \operatorname{Re}(\mathbf{t})\| + \|\operatorname{Im}(\mathbf{h}) + \operatorname{Im}(\mathbf{r}) + \operatorname{Im}(\mathbf{t})\| \qquad (1)$$

where $\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{C}^d$ are complex vectors, i.e., each element of the vectors is a complex number. For example, the $i$-th element of the vector $\mathbf{h}$ is $h_i = \operatorname{Re}(h_i) + i\,\operatorname{Im}(h_i)$. $\operatorname{Re}(\cdot)$ and $\operatorname{Im}(\cdot)$ denote the real and imaginary parts of a complex number, respectively. The complex vector $\mathbf{h}$ thus consists of real and imaginary vector parts, i.e., $\mathbf{h} = \operatorname{Re}(\mathbf{h}) + i\,\operatorname{Im}(\mathbf{h})$. $\bar{\mathbf{t}}$ is the conjugate of the complex vector $\mathbf{t}$, so the score corresponds to the translation $\mathbf{h} + \mathbf{r} \approx \bar{\mathbf{t}}$.
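A minimal PyTorch sketch of Eq. (1) as reconstructed above, treating the real and imaginary parts as separate real tensors (the default L1 norm is an illustrative assumption):

```python
import torch

def transcomplex_score(h_re, h_im, r_re, r_im, t_re, t_im, p=1):
    """Score of a triple (lower = more plausible):
    f = ||Re(h) + Re(r) - Re(t)||_p + ||Im(h) + Im(r) + Im(t)||_p,
    i.e., the translation h + r ~ conj(t) split into real and imaginary parts."""
    real_residual = h_re + r_re - t_re   # real part of h + r - conj(t)
    imag_residual = h_im + r_im + t_im   # conjugation flips the sign of Im(t)
    return real_residual.norm(p=p, dim=-1) + imag_residual.norm(p=p, dim=-1)

# Toy usage: a batch of one triple in a 4-dimensional complex space.
h_re, h_im, r_re, r_im, t_re, t_im = torch.randn(6, 1, 4)
print(transcomplex_score(h_re, h_im, r_re, r_im, t_re, t_im))
```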

Advantages of TransComplEx:

i) Compared to TransE and its variants, TransComplEx has fewer limitations in encoding different relation patterns. The theories and proofs are provided in the next part.

ii) Using the conjugate of the tail vector in the formulation enables the model to distinguish the role of an entity as subject from its role as object. This cannot be properly captured by TransE and its variants.

iii) Consider triples such as ($a$, likes, $b$) and ($b$, playsFor, $t$): the fact that $b$ plays for the team $t$ may affect whether the person $a$ likes the team. This type of information cannot be properly captured by models such as the CP decomposition Hitchcock (1927), where two independent vectors are provided for an entity, one for its subject role and one for its object role Kazemi and Poole (2018). In contrast, our model uses the same real and imaginary vectors for an entity whether it is used as subject or object. Therefore, TransComplEx can properly capture the dependency between two triples in which the same entity appears as subject and object.

iv) ComplEx Trouillon et al. (2016) has a much higher computational cost compared to TransComplEx: it needs to compute eight vector multiplications to obtain the score of a triple, while our model only needs four vector additions/subtractions. In the experiments section, we show that TransComplEx outperforms ComplEx on various datasets.

4.2 Reinvestigation of the Limitations of Translation-based Embedding Models

The aim of this part is to analyze the limitations of translation-based embedding models (including TransComplEx) by considering the effect of both the score and the loss function. Different loss functions provide different upper-bounds and lower-bounds for the scores of positive and negative triples, respectively. Therefore, the loss function affects the ability of the models to encode relation patterns. To investigate the limitations, we redefine the conditions under which a triple is considered positive or negative by defining upper-bounds and lower-bounds for the scores.

Let $f_r(h, t)$ and $f_r(h', t')$ be the scores of a positive triple $(h, r, t)$ and a negative triple $(h', r, t')$, respectively. The negative triple $(h', r, t')$ is generated by corrupting either the head or the tail of the triple $(h, r, t)$, as mentioned in Bordes et al. (2013). Four conditions are defined as follows:

$$\begin{aligned}
\text{(a)}\quad & f_r(h, t) = 0 \;\text{ and }\; f_r(h', t') > 0,\\
\text{(b)}\quad & f_r(h, t) = \gamma_1 \;\text{ and }\; f_r(h', t') \geq \gamma_2,\\
\text{(c)}\quad & f_r(h, t) \leq \gamma_1 \;\text{ and }\; f_r(h', t') \geq \gamma_2,\\
\text{(d)}\quad & f_r(h, t) \leq f_r(h', t') - \gamma,
\end{aligned} \qquad (2)$$

where $\gamma_2 > \gamma_1 \geq 0$ and $\gamma > 0$.

Figure 1: The region of truth for a triple: a triple is positive if (a) its residual vector (i.e., $\mathbf{h} + \mathbf{r} - \mathbf{t}$) becomes $\mathbf{0}$; (b) its residual vector lies on the border of a sphere with radius $\gamma_1$; (c) its residual vector lies inside a sphere with radius $\gamma_1$; (d) its residual vector lies inside a sphere with per-triple radius $\gamma_1^{h,t}$.

Figure 2: Necessary condition for encoding a symmetric relation: (a) when $\gamma_1 < \|\mathbf{u}\|$, the model cannot encode the symmetric relation (there are no common points between the two hyperspheres); (b) when $\gamma_1 = 0$, the intersection of the two hyperspheres is at most a point and requires $\mathbf{u} = \mathbf{0}$; $\mathbf{u} = \mathbf{0}$ means the embedding vectors of all entities should be the same, so the symmetric relation cannot be encoded; (c) if $\gamma_1 > 0$ (with $\|\mathbf{u}\| \leq \gamma_1$), the symmetric relation can be encoded because there are several points in the intersection of the two hyperspheres.

Figure 1 visualizes the different conditions mentioned above. Condition (a) indicates that a triple is positive if $\mathbf{h} + \mathbf{r} = \mathbf{t}$ holds. It means that the length of the residual vector, i.e., $\mathbf{h} + \mathbf{r} - \mathbf{t}$, is zero. It is the most strict condition expressing being positive. The authors in Sun et al. (2019); Kazemi and Poole (2018) consider this condition to prove their theories.

Condition (b) considers a triple to be positive if its residual vector lies on a hyper-sphere with radius $\gamma_1$. It is less restrictive than condition (a), which uses a single point to express being positive. The loss function that satisfies conditions (a) ($\gamma_1 = 0$) and (b) ($\gamma_1 > 0$) is as follows:

$$\mathcal{L}_{b} = \sum_{(h,r,t)\in S^{+}} \big|\, f_r(h,t) - \gamma_1 \big| \;+\; \sum_{(h',r,t')\in S^{-}} \max\big(0,\; \gamma_2 - f_r(h',t')\big) \qquad (3)$$

Condition (c) considers a triple to be positive if its residual vector lies inside a hyper-sphere with radius $\gamma_1$. The loss function that satisfies condition (c) is as follows Nayyeri et al. (2019):

$$\mathcal{L}_{c} = \sum_{(h,r,t)\in S^{+}} \max\big(0,\; f_r(h,t) - \gamma_1\big) \;+\; \sum_{(h',r,t')\in S^{-}} \max\big(0,\; \gamma_2 - f_r(h',t')\big) \qquad (4)$$

Remark: the loss function defined in Zhou et al. (2017a) is slightly different from the loss (3). The former slides the margin while the latter fixes the margin by including a lower-bound for the scores of negative triples. Both losses put an upper-bound on the scores of positive triples.

Condition (d) is similar to (c), but it provides a different upper-bound $\gamma_1^{h,t} = f_r(h', t') - \gamma$ for each triple. The margin ranking loss satisfies condition (d). The loss is defined as:

$$\mathcal{L}_{d} = \sum_{(h,r,t)\in S^{+}} \sum_{(h',r,t')\in S^{-}} \max\big(0,\; f_r(h,t) + \gamma - f_r(h',t')\big) \qquad (5)$$

where $\gamma > 0$ is the margin. Considering the conditions (a), (b), (c) and (d), we investigate the limitations L1, …, L6. We prove that the existing theories are invalid under some conditions.
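To make the conditions concrete, minimal PyTorch sketches of a limit-based loss for condition (c) and the margin ranking loss for condition (d); summation over the batch and the function names are illustrative assumptions:

```python
import torch

def limit_based_loss(pos_scores, neg_scores, gamma1, gamma2):
    """Condition (c): upper-bound gamma1 on positive scores and lower-bound
    gamma2 on negative scores (gamma2 > gamma1); cf. loss (4)."""
    return (torch.relu(pos_scores - gamma1).sum()
            + torch.relu(gamma2 - neg_scores).sum())

def margin_ranking_loss(pos_scores, neg_scores, gamma):
    """Condition (d): the per-triple upper-bound f(neg) - gamma 'slides'
    with the negative score; cf. loss (5)."""
    return torch.relu(pos_scores + gamma - neg_scores).sum()
```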

Limitation L1: Lemma 1: Let assumption (a) hold; then TransE and TransComplEx cannot infer a reflexive relation pattern. With assumptions (b), (c) and (d), however, this is no longer true and the models can infer reflexive relation patterns.

Proof: the proofs are provided in the supplementary material file.

Limitation L2: Lemma 2: 1) TransComplEx can infer a relation pattern which is neither reflexive nor irreflexive with conditions (b), (c) and (d). 2) TransE cannot infer a relation pattern which is neither reflexive nor irreflexive.

Limitation L3: Lemma 3: 1) TransComplEx can infer symmetric patterns with conditions (a), (b), (c) and (d). 2) TransE cannot infer symmetric patterns with condition (a). 3) TransE can infer a relation pattern which is symmetric with condition (b).

Proof: the proofs of 1) and 2) are included in the supplementary material.

3) For TransE with condition (b), there is

$$\|\mathbf{h} + \mathbf{r} - \mathbf{t}\| = \gamma_1 \qquad (6)$$
$$\|\mathbf{t} + \mathbf{r} - \mathbf{h}\| = \gamma_1 \qquad (7)$$

The necessary condition for encoding a symmetric relation is $\|\mathbf{h} + \mathbf{r} - \mathbf{t}\| = \|\mathbf{t} + \mathbf{r} - \mathbf{h}\|$. This implies $\|\mathbf{r} + (\mathbf{h} - \mathbf{t})\| = \|\mathbf{r} - (\mathbf{h} - \mathbf{t})\|$. Let $\mathbf{u} = \mathbf{h} - \mathbf{t}$; by definition we have $\|\mathbf{r} + \mathbf{u}\| = \|\mathbf{r} - \mathbf{u}\| = \gamma_1$.

We have

$$\|\mathbf{r} + \mathbf{u}\|^2 - \|\mathbf{r} - \mathbf{u}\|^2 = 4\,\mathbf{r}^{\top}\mathbf{u} = 0. \qquad (8)$$

Regarding (8), $\mathbf{r} \perp \mathbf{u}$ and therefore

$$\gamma_1^2 = \|\mathbf{r} + \mathbf{u}\|^2 = \|\mathbf{r}\|^2 + \|\mathbf{u}\|^2.$$

To avoid contradiction, $\gamma_1 \neq 0$. If $\gamma_1 > 0$, we have $\|\mathbf{r}\|^2 = \gamma_1^2 - \|\mathbf{u}\|^2 \geq 0$. Therefore, TransE can encode a symmetric pattern with condition (b) if $\gamma_1 > 0$ and $\|\mathbf{u}\| \leq \gamma_1$. Figure 2 shows the different conditions for encoding a symmetric relation.
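The feasibility of this construction can be checked numerically; a small sketch with an arbitrarily chosen $\mathbf{u} \perp \mathbf{r}$ satisfying $\|\mathbf{r}\|^2 + \|\mathbf{u}\|^2 = \gamma_1^2$ (the concrete values are hypothetical):

```python
import numpy as np

gamma1 = 1.0
u = np.array([0.6, 0.0, 0.0])   # u = h - t, with ||u|| <= gamma1
r = np.array([0.0, 0.8, 0.0])   # r orthogonal to u, ||r||^2 = gamma1^2 - ||u||^2
# Both residuals of the symmetric pair land exactly on the sphere of radius gamma1.
print(np.linalg.norm(r + u), np.linalg.norm(r - u))   # -> 1.0 1.0
```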

Limitation L4: Lemma 4: 1) Let (a) hold. Limitation L4 holds for both TransE and TransComplEx. 2) Limitation L4 is not valid when assumptions (b), (c) and (d) hold.

Limitation L5: Lemma 5: 1) Under condition (a), the limitation L5 holds for both TransE and TransComplEx. 2) Under conditions (b), (c) and (d), L5 is not valid for both TransE and TransComplEx.

Limitation L6: Lemma 6: 1) With condition (a), the limitation L6 holds for both TransE and TransComplEx. 2) With conditions (b), (c) and (d), the limitation L6 does not hold for the models.

4.3 Encoding Relation Patterns in TransComplEx

Most KGE models learn from triples alone. Recent work incorporates relation patterns such as transitivity and symmetry on top of the triples to further improve the performance of the models. For example, ComplEx-NNE+AER Ding et al. (2018) encodes the implication pattern in the ComplEx model. RUGE Guo et al. (2018) injects first-order Horn clause rules into an embedding model. SimplE Kazemi and Poole (2018) captures symmetric, antisymmetric and inverse patterns by weight tying in the model. Inspired by Minervini et al. (2017) and considering the score function of TransComplEx, in this part we derive formulae for equivalence, symmetric, inverse and implication patterns to be used as regularization terms in the optimization problem. Therefore, the model incorporates different relation patterns while optimizing the embeddings.

Symmetric: In order to encode a symmetric relation $r$, the following should hold: if $(h, r, t)$ is a positive triple, then $(t, r, h)$ is positive as well, i.e., $\mathbf{h} + \mathbf{r} \approx \bar{\mathbf{t}}$ and $\mathbf{t} + \mathbf{r} \approx \bar{\mathbf{h}}$.

According to the definition of the score function of TransComplEx, the real parts yield $\operatorname{Re}(\mathbf{h}) + \operatorname{Re}(\mathbf{r}) \approx \operatorname{Re}(\mathbf{t})$ and $\operatorname{Re}(\mathbf{t}) + \operatorname{Re}(\mathbf{r}) \approx \operatorname{Re}(\mathbf{h})$, i.e., $\operatorname{Re}(\mathbf{r}) \approx \mathbf{0}$, while the two imaginary-part equations coincide. Therefore the algebraic formula $\|\operatorname{Re}(\mathbf{r})\|$ is proposed as a penalty term to encode the relation. Using a similar argument as for symmetric, the following formulae are derived for equivalence, implication and inverse:

Equivalence: Let $r_1$ and $r_2$ be equivalent relations, i.e., $(h, r_1, t) \Leftrightarrow (h, r_2, t)$; we obtain $\operatorname{Re}(\mathbf{r}_1) = \operatorname{Re}(\mathbf{r}_2)$ and $\operatorname{Im}(\mathbf{r}_1) = \operatorname{Im}(\mathbf{r}_2)$, penalized by $\|\operatorname{Re}(\mathbf{r}_1) - \operatorname{Re}(\mathbf{r}_2)\| + \|\operatorname{Im}(\mathbf{r}_1) - \operatorname{Im}(\mathbf{r}_2)\|$.

Implication: Let $(h, r_1, t) \Rightarrow (h, r_2, t)$; we obtain the same algebraic constraint as for equivalence applied in one direction, since by the triangle inequality $f_{r_2}(h, t) \leq f_{r_1}(h, t) + \|\operatorname{Re}(\mathbf{r}_2) - \operatorname{Re}(\mathbf{r}_1)\| + \|\operatorname{Im}(\mathbf{r}_2) - \operatorname{Im}(\mathbf{r}_1)\|$.

Inverse: Let $r_2$ be the inverse of $r_1$, i.e., $(h, r_1, t) \Leftrightarrow (t, r_2, h)$; from $\mathbf{h} + \mathbf{r}_1 \approx \bar{\mathbf{t}}$ and $\mathbf{t} + \mathbf{r}_2 \approx \bar{\mathbf{h}}$ we obtain $\operatorname{Re}(\mathbf{r}_2) = -\operatorname{Re}(\mathbf{r}_1)$ and $\operatorname{Im}(\mathbf{r}_2) = \operatorname{Im}(\mathbf{r}_1)$, penalized by $\|\operatorname{Re}(\mathbf{r}_1) + \operatorname{Re}(\mathbf{r}_2)\| + \|\operatorname{Im}(\mathbf{r}_1) - \operatorname{Im}(\mathbf{r}_2)\|$.
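A hedged PyTorch sketch of these pattern penalties; the function names and the L1 norm are illustrative assumptions, and the formulae follow the constraints reconstructed above rather than a confirmed reference implementation:

```python
import torch

def symmetric_penalty(r_re: torch.Tensor) -> torch.Tensor:
    # Symmetric relation: the derived constraint is Re(r) = 0.
    return r_re.norm(p=1)

def equivalence_penalty(r1_re, r1_im, r2_re, r2_im):
    # Equivalent relations: r1 = r2 in both real and imaginary parts.
    return (r1_re - r2_re).norm(p=1) + (r1_im - r2_im).norm(p=1)

def inverse_penalty(r1_re, r1_im, r2_re, r2_im):
    # Inverse relations: Re(r2) = -Re(r1) and Im(r2) = Im(r1).
    return (r1_re + r2_re).norm(p=1) + (r1_im - r2_im).norm(p=1)
```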

Finally, the following optimization problem should be solved:

$$\min_{\theta} \; \mathcal{L}(\theta) + \lambda\, \Phi(\theta) \qquad (9)$$

where $\theta$ denotes the embedding parameters, $\mathcal{L}$ is one of the losses (3), (4) or (5), and $\Phi$ is one of the derived formulae mentioned above, added as a regularization term with weight $\lambda$.
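A hedged sketch of one gradient step on objective (9), assuming `loss_fn` is one of the losses above already closed over its margins (e.g., via `functools.partial`) and `pattern_penalties` is a list of penalty terms computed from the current relation embeddings:

```python
def training_step(optimizer, loss_fn, pos_scores, neg_scores, pattern_penalties, lam):
    """One step on Eq. (9): triple loss plus lambda-weighted pattern penalties.
    pos_scores/neg_scores must be computed from embeddings with requires_grad=True."""
    objective = loss_fn(pos_scores, neg_scores) + lam * sum(pattern_penalties)
    optimizer.zero_grad()
    objective.backward()
    optimizer.step()
    return float(objective)
```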

| Model | FB15k MR | FB15k MRR | FB15k Hits@10 | WN18 MR | WN18 MRR | WN18 Hits@10 |
|---|---|---|---|---|---|---|
| TransE Bordes et al. (2013) | 125 | - | 47.1 | 251 | - | 89.2 |
| TransH (bern) Wang et al. (2014)* | 87 | - | 64.4 | 388 | - | 82.3 |
| TransR (bern) Lin et al. (2015b)* | 77 | - | 68.7 | 225 | - | 92.0 |
| TransD (bern) Ji et al. (2015)* | 91 | - | 77.3 | 212 | - | 92.2 |
| TransE-RS (bern) Zhou et al. (2017b)* | 63 | - | 72.1 | 371 | - | 93.7 |
| TransH-RS (bern) Zhou et al. (2017b)* | 77 | - | 75.0 | 357 | - | 94.5 |
| TorusE Ebisu and Ichise (2019) | - | 73.3 | 83.2 | - | 94.7 | 95.4 |
| TorusE (with WNP) Ebisu and Ichise (2019) | - | 75.1 | 83.5 | - | 94.7 | 95.4 |
| R-GCN Schlichtkrull et al. (2018)+ | - | 65.1 | 82.5 | - | 81.4 | 95.5 |
| ConvE Dettmers et al. (2018)++ | 51 | 68.9 | 85.1 | 504 | 94.2 | 95.5 |
| ComplEx Trouillon et al. (2016)++ | 106 | 67.5 | 82.6 | 543 | 94.1 | 94.7 |
| ANALOGY Liu et al. (2017)++ | 121 | 72.2 | 84.3 | - | 94.2 | 94.7 |
| SimplE Kazemi and Poole (2018) | - | 72.7 | 83.8 | - | 94.2 | 94.7 |
| SimplE+ Fatemi et al. (2018) | - | 72.5 | 84.1 | - | 93.7 | 93.9 |
| PTransE Lin et al. (2015a) | 58 | - | 84.6 | - | - | - |
| KALE Guo et al. (2016) | 73 | 52.3 | 76.2 | 241 | 53.2 | 94.4 |
| RUGE Guo et al. (2018) | 97 | 76.8 | 86.5 | - | - | - |
| ComplEx-NNE+AER Ding et al. (2018) | 116 | 80.3 | 87.4 | 450 | 94.3 | 94.8 |
| RPTransComplEx3 | 38 | 70.5 | 88.3 | 451 | 92.7 | 94.8 |
| RPTransComplEx4 | 38 | 72.4 | 88.8 | 275 | 92.4 | 95.4 |
| RPTransComplEx5 | 59 | 61.7 | 82.2 | 547 | 94.0 | 94.7 |
| TransComplEx4 | 38 | 68.2 | 87.5 | 284 | 92.2 | 95.5 |
| TransE4 | 46 | 64.8 | 87.2 | 703 | 68.7 | 94.5 |

Table 1: Link prediction results. Rows 1-8: translation-based models with no injected relation patterns. Rows 9-12: basic models with no injected relation patterns. Rows 13-17: models which encode relation patterns. Results labeled with *, + and ++ are taken from Zhou et al. (2017b), Ebisu and Ichise (2019) and Akrami et al. (2018), respectively, while the rest are taken from the original papers/code. Dashes: results could not be obtained.
| Model | FB15k-237 MR | FB15k-237 MRR | FB15k-237 Hits@10 | WN18RR MR | WN18RR MRR | WN18RR Hits@10 |
|---|---|---|---|---|---|---|
| TransE Bordes et al. (2013)+ | - | 25.7 | 42.0 | - | 18.2 | 44.4 |
| DistMult Bordes et al. (2013)+ | - | 24.1 | 41.9 | - | 43.0 | 49.0 |
| ComplEx Trouillon et al. (2016)+ | - | 24.0 | 41.9 | - | 44.0 | 51.0 |
| R-GCN Schlichtkrull et al. (2018)+ | - | 24.8 | 41.7 | - | - | - |
| ConvE Dettmers et al. (2018)+ | - | 31.6 | 49.1 | - | 46.0 | 48.0 |
| TorusE Ebisu and Ichise (2019) | - | 30.5 | 48.4 | - | 45.2 | 51.2 |
| TorusE (with WNP) Ebisu and Ichise (2019) | - | 30.7 | 48.5 | - | 46.0 | 53.4 |
| RPTransComplEx3 | 210 | 27.7 | 46.4 | - | - | - |
| RPTransComplEx4 | 226 | 31.9 | 49.5 | - | - | - |
| RPTransComplEx5 | 216 | 25.3 | 43.8 | - | - | - |
| TransComplEx4 | 223 | 31.7 | 49.3 | 4081 | 38.9 | 49.8 |
| TransE4 | 205 | 27.2 | 45.3 | 3850 | 20.0 | 47.5 |

Table 2: Link prediction results. Rows 1-7: basic models with no injected relation patterns. Results labeled with + are taken from Ebisu and Ichise (2019) while the rest are taken from the original papers/code. Dashes: results could not be obtained.

5 Experiments and Evaluations

In this section, we evaluate the performance of our model, TransComplEx, with different loss functions on the link prediction task. The aim of the task is to complete a triple by predicting the missing head or tail entity, i.e., answering $(h, r, ?)$ or $(?, r, t)$. Filtered Mean Rank (MR), Mean Reciprocal Rank (MRR) and Hits@10 are used for evaluation Wang et al. (2017); Lin et al. (2015b).

Datasets.

We use two datasets extracted from Freebase Bollacker et al. (2008) (i.e., FB15K Bordes et al. (2013) and FB15K-237 Toutanova and Chen (2015)) and two others extracted from WordNet Miller (1995) (i.e., WN18 Bordes et al. (2013) and WN18RR Dettmers et al. (2018)). FB15K and WN18 are earlier datasets which have been used extensively to compare the performance of KGEs. FB15K-237 and WN18RR are two datasets considered more challenging, obtained after removing inverse patterns from FB15K and WN18. Guo et al. (2018) and Ding et al. (2018) extracted different relation patterns from FB15K and WN18, respectively. The relation patterns are provided together with their confidence levels. We drop the relation patterns with a confidence level below 0.8. In total, we use 454 and 14 relation patterns for FB15K and WN18, respectively. We do grounding for symmetric and transitive relation patterns. Thanks to the formulation of the score function, grounding is not needed for inverse, implication and equivalence patterns.

Experimental Setup.

We implement TransComplEx with the losses 3, 4 and 5, and TransE with the loss 4, in PyTorch. Adagrad is used as the optimizer. We generate 100 mini-batches in each iteration. The hyperparameter corresponding to the score function is the embedding dimension $d$. We add slack variables to the losses 3 and 4 to obtain a soft margin as in Nayyeri et al. (2019). The loss 4 is rewritten as follows Nayyeri et al. (2019):

$$\mathcal{L}_{soft} = \sum_{(h,r,t)\in S^{+}} \max\big(0,\, f_r(h,t) - \gamma_1\big) \;+\; \sum_{(h',r,t')\in S^{-}} \Big( \lambda_0\, \xi_{h',t'}^{2} + \max\big(0,\, \gamma_2 - f_r(h',t') - \xi_{h',t'}\big) \Big) \qquad (10)$$

where $\xi_{h',t'} \geq 0$ are per-triple slack variables that allow (potentially false) negative samples to violate the margin at a quadratic cost. We search for the loss hyperparameters ($\gamma_1$, $\gamma_2$, $\lambda_0$), the number of negative samples generated per positive, the embedding dimension and the learning rate on the validation set. All hyperparameters are adjusted by early stopping on the validation set according to MRR. RPTransComplEx$i$ denotes the TransComplEx model which is trained by loss function $i$ ($i \in \{3, 4, 5\}$); RP indicates that relation patterns are injected during learning by regularizing the derived formulae (see (9)). TransComplEx$i$ refers to our model trained with loss $i$ without regularizing the relation pattern formulae. The same notation is used for TransE. The optimal configurations were selected per dataset accordingly.
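The negative sampling referenced above follows the corruption procedure of Bordes et al. (2013); a minimal sketch (the 50/50 head-or-tail choice and the `num_neg` parameter are illustrative assumptions):

```python
import random

def corrupt(triple, entity_ids, true_triples, num_neg=1):
    """Uniform corruption: replace the head or the tail with a random entity,
    filtering out candidates that are known true triples."""
    h, r, t = triple
    negatives = []
    while len(negatives) < num_neg:
        e = random.choice(entity_ids)
        candidate = (e, r, t) if random.random() < 0.5 else (h, r, e)
        if candidate not in true_triples:
            negatives.append(candidate)
    return negatives

# Toy usage with hypothetical integer ids.
print(corrupt((0, 0, 1), [0, 1, 2, 3], {(0, 0, 1)}))
```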

Results.

Table 1 presents a comparison of TransComplEx and its relation-pattern-encoded variants (RPTransComplEx) with three classes of embedding models: translation-based models (e.g., TransX, TorusE), relation-pattern-encoded models (e.g., RUGE, ComplEx-NNE+AER, SimplE, SimplE+), and other state-of-the-art embedding models (e.g., ConvE, ComplEx, ANALOGY). To investigate our theoretical proofs concerning the effect of the loss function, we train TransComplEx with different loss functions. As previously discussed, FB15K-237 and WN18RR are two more challenging datasets provided recently. Therefore, in order to have a better evaluation, Table 2 presents a comparison of our models with state-of-the-art embedding methods on these two datasets. For WN18RR, we do not encode any relation patterns. The results labeled with *, + and ++ are taken from Zhou et al. (2017b), Ebisu and Ichise (2019) and Akrami et al. (2018), respectively.

Discussion of Results.

According to the FB15K part of Table 1, RPTransComplEx trained with the loss 4 significantly outperforms all translation-based embedding models, including the recent TorusE. Note that TorusE is trained with embedding dimension 10000, while our model uses an embedding dimension of at most 200. Compared to relation-pattern-encoded embedding models, including the recent ComplEx-NNE+AER, RUGE, SimplE and SimplE+, our model outperforms them in terms of MR and Hits@10. Moreover, the model significantly outperforms popular embedding models including ConvE and ComplEx. According to our theories, the loss 4 has fewer limitations than the loss 3. This is consistent with our results, where RPTransComplEx4 outperforms RPTransComplEx3. TransComplEx without encoded relation patterns still obtains accuracy as good as state-of-the-art models. TransComplEx outperforms TransE when both are trained with the loss 4, in terms of MR, MRR and Hits@10, which is consistent with our theories (the TransComplEx score function has fewer limitations than that of TransE). Regarding the results on WN18, the accuracy of TransComplEx is very close to that of the state-of-the-art models. Encoding relation patterns does not improve performance on WN18 because the models already learn the relation patterns well from the data. The loss 5 provides different upper-bounds and lower-bounds for the scores of positive and negative triples per triple, and the margin can also slide. Therefore, the accuracy is degraded Zhou et al. (2017b). Generally, the loss 4 achieves better performance, which is consistent with our theoretical results.

As shown in the FB15K-237 part of Table 2, both with and without encoded relation patterns, TransComplEx trained with the loss 4 outperforms all the baselines in terms of MRR and Hits@10. TransComplEx4 outperforms TransE4, showing the effectiveness of our proposed score function. Regarding WN18RR, TorusE performs better than our model; however, its results are obtained with a very large embedding dimension (10000).

6 Conclusion

In this paper, we reinvestigated the main limitations of translation-based embedding models from two aspects: the score and the loss. We showed that existing theories about the limitations of these models are inaccurate because the effect of the loss function has been ignored. Accordingly, we presented new theories about the limitations that take the effects of both the score and the loss functions into account. We proposed TransComplEx, a new variant of TransE which is provably less limited than TransE. The model was trained using various loss functions on standard datasets, including FB15K, FB15K-237, WN18 and WN18RR. According to the experiments, TransComplEx significantly outperformed translation-based embedding models. Moreover, TransComplEx achieved competitive performance compared to state-of-the-art embedding models while being more efficient in time and memory. The experimental results confirmed the presented theories about the limitations.

References

  • Akrami et al. (2018) Farahnaz Akrami, Lingbing Guo, Wei Hu, and Chengkai Li. 2018. Re-evaluating embedding-based knowledge graph completion methods. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pages 1779–1782. ACM.
  • Bollacker et al. (2008) Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. 2008. Freebase: a collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data, pages 1247–1250. ACM.
  • Bordes et al. (2013) Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In Advances in neural information processing systems, pages 2787–2795.
  • Dettmers et al. (2018) Tim Dettmers, Pasquale Minervini, Pontus Stenetorp, and Sebastian Riedel. 2018. Convolutional 2d knowledge graph embeddings. In Thirty-Second AAAI Conference on Artificial Intelligence.
  • Ding et al. (2018) Boyang Ding, Quan Wang, Bin Wang, and Li Guo. 2018. Improving knowledge graph embedding using simple constraints. arXiv preprint arXiv:1805.02408.
  • Ebisu and Ichise (2018) Takuma Ebisu and Ryutaro Ichise. 2018. Toruse: Knowledge graph embedding on a lie group. In Thirty-Second AAAI Conference on Artificial Intelligence.
  • Ebisu and Ichise (2019) Takuma Ebisu and Ryutaro Ichise. 2019. Generalized translation-based embedding of knowledge graph. IEEE Transactions on Knowledge and Data Engineering.
  • Fatemi et al. (2018) Bahare Fatemi, Siamak Ravanbakhsh, and David Poole. 2018. Improved knowledge graph embedding using background taxonomic information. arXiv preprint arXiv:1812.03235.
  • Feng et al. (2016) Jun Feng, Minlie Huang, Mingdong Wang, Mantong Zhou, Yu Hao, and Xiaoyan Zhu. 2016. Knowledge graph embedding by flexible translation. In Fifteenth International Conference on the Principles of Knowledge Representation and Reasoning.
  • Guo et al. (2016) Shu Guo, Quan Wang, Lihong Wang, Bin Wang, and Li Guo. 2016. Jointly embedding knowledge graphs and logical rules. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 192–202.
  • Guo et al. (2018) Shu Guo, Quan Wang, Lihong Wang, Bin Wang, and Li Guo. 2018. Knowledge graph embedding with iterative guidance from soft rules. In Thirty-Second AAAI Conference on Artificial Intelligence.
  • Hitchcock (1927) Frank L Hitchcock. 1927. The expression of a tensor or a polyadic as a sum of products. Journal of Mathematics and Physics, 6(1-4):164–189.
  • Ji et al. (2015) Guoliang Ji, Shizhu He, Liheng Xu, Kang Liu, and Jun Zhao. 2015. Knowledge graph embedding via dynamic mapping matrix. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), volume 1, pages 687–696.
  • Kazemi and Poole (2018) Seyed Mehran Kazemi and David Poole. 2018. Simple embedding for link prediction in knowledge graphs. In Advances in Neural Information Processing Systems, pages 4284–4295.
  • Lin et al. (2015a) Yankai Lin, Zhiyuan Liu, Huanbo Luan, Maosong Sun, Siwei Rao, and Song Liu. 2015a. Modeling relation paths for representation learning of knowledge bases. arXiv preprint arXiv:1506.00379.
  • Lin et al. (2015b) Yankai Lin, Zhiyuan Liu, Maosong Sun, Yang Liu, and Xuan Zhu. 2015b. Learning entity and relation embeddings for knowledge graph completion. In Twenty-ninth AAAI conference on artificial intelligence.
  • Liu et al. (2017) Hanxiao Liu, Yuexin Wu, and Yiming Yang. 2017. Analogical inference for multi-relational embeddings. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, pages 2168–2178. JMLR.org.
  • Miller (1995) George A Miller. 1995. Wordnet: a lexical database for english. Communications of the ACM, 38(11):39–41.
  • Minervini et al. (2017) Pasquale Minervini, Luca Costabello, Emir Muñoz, Vít Nováček, and Pierre-Yves Vandenbussche. 2017. Regularizing knowledge graph embeddings via equivalence and inversion axioms. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 668–683. Springer.
  • Nayyeri et al. (2019) Mojtaba Nayyeri, Sahar Vahdati, Jens Lehmann, and Hamed Shariat Yazdi. 2019. Soft marginal transe for scholarly knowledge graph completion. arXiv preprint arXiv:1904.12211.
  • Nguyen et al. (2016) Dat Quoc Nguyen, Kairit Sirts, Lizhen Qu, and Mark Johnson. 2016. Stranse: a novel embedding model of entities and relationships in knowledge bases. arXiv preprint arXiv:1606.08140.
  • Schlichtkrull et al. (2018) Michael Schlichtkrull, Thomas N Kipf, Peter Bloem, Rianne Van Den Berg, Ivan Titov, and Max Welling. 2018. Modeling relational data with graph convolutional networks. In European Semantic Web Conference, pages 593–607. Springer.
  • Sun et al. (2019) Zhiqing Sun, Zhi-Hong Deng, Jian-Yun Nie, and Jian Tang. 2019. Rotate: Knowledge graph embedding by relational rotation in complex space. arXiv preprint arXiv:1902.10197.
  • Toutanova and Chen (2015) Kristina Toutanova and Danqi Chen. 2015. Observed versus latent features for knowledge base and text inference. In Proceedings of the 3rd Workshop on Continuous Vector Space Models and their Compositionality, pages 57–66.
  • Trouillon et al. (2016) Théo Trouillon, Johannes Welbl, Sebastian Riedel, Éric Gaussier, and Guillaume Bouchard. 2016. Complex embeddings for simple link prediction. In International Conference on Machine Learning, pages 2071–2080.
  • Wang et al. (2017) Quan Wang, Zhendong Mao, Bin Wang, and Li Guo. 2017. Knowledge graph embedding: A survey of approaches and applications. IEEE Transactions on Knowledge and Data Engineering, 29(12):2724–2743.
  • Wang et al. (2018) Yanjie Wang, Rainer Gemulla, and Hui Li. 2018. On multi-relational link prediction with bilinear models. In Thirty-Second AAAI Conference on Artificial Intelligence.
  • Wang et al. (2014) Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen. 2014. Knowledge graph embedding by translating on hyperplanes. In Twenty-Eighth AAAI Conference on Artificial Intelligence.
  • Zhou et al. (2017a) Xiaofei Zhou, Qiannan Zhu, Ping Liu, and Li Guo. 2017a. Learning knowledge embeddings by combining limit-based scoring loss. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pages 1009–1018. ACM.
  • Zhou et al. (2017b) Xiaofei Zhou, Qiannan Zhu, Ping Liu, and Li Guo. 2017b. Learning knowledge embeddings by combining limit-based scoring loss. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pages 1009–1018. ACM.

Appendix A Supplementary Material

The proofs of the lemmas are provided as follows:

Lemma 1: 1) Let assumption (a) hold, $r$ be a reflexive relation and $(h, r, t)$ be a true fact. If TransE and TransComplEx infer the relation pattern of $r$ as reflexive, they automatically infer $(t, r, h)$ as a true fact. 2) The models can infer reflexive relation patterns with conditions (b), (c) and (d).

Proof: 1) Let $r$ be a reflexive relation and condition (a) hold. For TransE, for every entity $e$ on which $r$ is reflexive, we have

$$\mathbf{e} + \mathbf{r} = \mathbf{e}. \qquad (11)$$

Therefore, the relation vector collapses to a null vector ($\mathbf{r} = \mathbf{0}$). As a consequence of $\mathbf{r} = \mathbf{0}$, if $(h, r, t)$ is a fact, then $h$ and $t$ will have the same embedding vectors, which automatically results in $(t, r, h)$ being a fact.

For TransComplEx with condition (a), the following equations are obtained for every entity $e$ on which $r$ is reflexive:

$$\operatorname{Re}(\mathbf{e}) + \operatorname{Re}(\mathbf{r}) = \operatorname{Re}(\mathbf{e}), \qquad \operatorname{Im}(\mathbf{e}) + \operatorname{Im}(\mathbf{r}) = -\operatorname{Im}(\mathbf{e}). \qquad (12)$$

Therefore, $\operatorname{Re}(\mathbf{r}) = \mathbf{0}$ and all entities which are connected by the relation will have the same imaginary part $\operatorname{Im}(\mathbf{e}) = -\operatorname{Im}(\mathbf{r})/2$, which causes the same problem mentioned for TransE (i.e., $(t, r, h)$ is inferred to be a true triple whenever $(h, r, t)$ is a fact).

2) Let (b) hold. For a reflexive relation $r$, we have $\|\mathbf{e} + \mathbf{r} - \mathbf{e}\| = \|\mathbf{r}\| = \gamma_1$, which is not a contradiction. For TransE, we get a non-null relation vector with $\|\mathbf{r}\| = \gamma_1 > 0$. Therefore, TransE with condition (b) is capable of encoding reflexive relations. The condition (b) is a special case of (c) and (d). Therefore, by the same token, it is proved that TransE and TransComplEx can encode reflexive relations.

Lemma 2: 1) Let assumption (b), (c) or (d) hold. TransComplEx can infer a relation pattern which is neither reflexive nor irreflexive. 2) TransE cannot infer such a relation pattern.

Proof: 1) Let the relation $r$ be neither reflexive nor irreflexive. There exist two triples $(e_1, r, e_1)$ and $(e_2, r, e_2)$ that are positive and negative, respectively. Therefore the following inequalities hold:

$$f_r(e_1, e_1) \leq \gamma_1, \qquad f_r(e_2, e_2) \geq \gamma_2. \qquad (13)$$

Equation (13) is rewritten for TransComplEx as follows:

$$\|\operatorname{Re}(\mathbf{r})\| + \|2\operatorname{Im}(\mathbf{e}_1) + \operatorname{Im}(\mathbf{r})\| \leq \gamma_1, \qquad \|\operatorname{Re}(\mathbf{r})\| + \|2\operatorname{Im}(\mathbf{e}_2) + \operatorname{Im}(\mathbf{r})\| \geq \gamma_2. \qquad (14)$$

For TransE in real space, the score of $(e, r, e)$ is $\|\mathbf{r}\|$ for every entity $e$, so $\|\mathbf{r}\| \leq \gamma_1$ and $\|\mathbf{r}\| \geq \gamma_2 > \gamma_1$ cannot hold simultaneously. Therefore, TransE in real space cannot encode a relation which is neither reflexive nor irreflexive. In contrast, TransE in complex space can satisfy (14) by a proper assignment of the imaginary parts of the entities, since the score of $(e, r, e)$ depends on $\operatorname{Im}(\mathbf{e})$. Therefore, theoretically, TransComplEx can infer a relation which is neither reflexive nor irreflexive.

Lemma 3: 1) TransComplEx can infer symmetric patterns with conditions (a), (b), (c) and (d). 2) TransE cannot infer symmetric patterns with condition (a). 3) TransE can infer a relation pattern which is symmetric and reflexive with conditions (b), (c) and (d).

Proof: 1), 2) Let $r$ be a symmetric relation and (a) hold. We have

$$f_r(h, t) = 0 \quad \text{and} \quad f_r(t, h) = 0. \qquad (15)$$

Trivially, for TransE we have

$$\mathbf{h} + \mathbf{r} = \mathbf{t} \quad \text{and} \quad \mathbf{t} + \mathbf{r} = \mathbf{h}. \qquad (16)$$

For TransE in real space, there is $2\mathbf{r} = \mathbf{0}$. Therefore, $\mathbf{r} = \mathbf{0}$ and $\mathbf{h} = \mathbf{t}$. It means that TransE cannot infer symmetric relations with condition (a). For TransComplEx, the real parts likewise give $\operatorname{Re}(\mathbf{r}) = \mathbf{0}$; additionally, we have $\operatorname{Im}(\mathbf{h}) + \operatorname{Im}(\mathbf{r}) = -\operatorname{Im}(\mathbf{t})$ and $\operatorname{Im}(\mathbf{t}) + \operatorname{Im}(\mathbf{r}) = -\operatorname{Im}(\mathbf{h})$, which are one and the same equation.

It concludes $\operatorname{Im}(\mathbf{r}) = -\operatorname{Im}(\mathbf{h}) - \operatorname{Im}(\mathbf{t})$, which does not force $\mathbf{h} = \mathbf{t}$. Therefore, TransE in complex space with condition (a) can infer symmetric relations. Because (a) is a special case of (b) and (c), TransComplEx can infer symmetric relations in all conditions.

3) For TransE with condition (b), there is

$$\|\mathbf{h} + \mathbf{r} - \mathbf{t}\| = \gamma_1 \qquad (17)$$
$$\|\mathbf{t} + \mathbf{r} - \mathbf{h}\| = \gamma_1 \qquad (18)$$

The necessary condition for encoding a symmetric relation is $\|\mathbf{h} + \mathbf{r} - \mathbf{t}\| = \|\mathbf{t} + \mathbf{r} - \mathbf{h}\|$. This implies $\|\mathbf{r} + (\mathbf{h} - \mathbf{t})\| = \|\mathbf{r} - (\mathbf{h} - \mathbf{t})\|$. Let $\mathbf{u} = \mathbf{h} - \mathbf{t}$; by (18) we have $\|\mathbf{r} + \mathbf{u}\| = \|\mathbf{r} - \mathbf{u}\| = \gamma_1$.

We have

$$\|\mathbf{r} + \mathbf{u}\|^2 - \|\mathbf{r} - \mathbf{u}\|^2 = 4\,\mathbf{r}^{\top}\mathbf{u} = 0. \qquad (19)$$

Regarding (19), $\mathbf{r} \perp \mathbf{u}$ and therefore

$$\gamma_1^2 = \|\mathbf{r} + \mathbf{u}\|^2 = \|\mathbf{r}\|^2 + \|\mathbf{u}\|^2.$$

To avoid contradiction, $\gamma_1 \neq 0$. If $\gamma_1 > 0$, we have $\|\mathbf{r}\|^2 = \gamma_1^2 - \|\mathbf{u}\|^2 \geq 0$. Therefore, TransE can encode a symmetric pattern with condition (b) if $\gamma_1 > 0$ and $\|\mathbf{u}\| \leq \gamma_1$. From the proof for condition (b), we conclude that TransE can also encode symmetric patterns under conditions (c) and (d).

Lemma 4: 1) Let (a) hold. Limitation L4 holds for both TransE and TransComplEx. 2) Limitation L4 is not valid when assumptions (b), (c) and (d) hold.

Proof: 1) The proof of the lemma with condition (a) for TransE is given in Kazemi and Poole (2018). For TransComplEx, the proof is analogous. 2) Now, we prove that the limitation L4 is not valid when (b) holds.

Let condition (b) hold and the relation $r$ be reflexive; we have $\|\mathbf{r}\| = \gamma_1$.

Let $(h, r, t)$ be a positive triple with $\mathbf{u} = \mathbf{h} - \mathbf{t}$, so $\|\mathbf{r} + \mathbf{u}\| = \gamma_1$. To violate the limitation L4, the triple $(t, r, h)$ should be negative, i.e., $f_r(t, h) = \|\mathbf{r} - \mathbf{u}\| \geq \gamma_2$.

Considering $\|\mathbf{r}\| = \|\mathbf{r} + \mathbf{u}\| = \gamma_1$, such an assignment exists; for example, $\mathbf{u} = -2\mathbf{r}$ gives $\|\mathbf{r} + \mathbf{u}\| = \gamma_1$ and $\|\mathbf{r} - \mathbf{u}\| = 3\gamma_1$, which satisfies $\|\mathbf{r} - \mathbf{u}\| \geq \gamma_2$ whenever $\gamma_2 \leq 3\gamma_1$.

Therefore, the limitation L4 is not valid, i.e., if a relation is reflexive, it may not be symmetric. TransE is a special case of TransComplEx and condition (b) is a special case of condition (c). Therefore, using conditions (b), (c) and (d), the limitation L4 is not valid for TransE and TransComplEx.

Lemma 5: 1) Under condition (a), the limitation L5 holds for both TransE and TransComplEx. 2) Under conditions (b), (c) and (d), L5 is not valid for both TransE and TransComplEx.

Proof:

1) Under condition (a), the equation $\mathbf{h} + \mathbf{r} = \mathbf{t}$ holds exactly. Therefore, according to Kazemi and Poole (2018), the model has the limitation L5.

2) If a relation $r$ is reflexive, with condition (b) we have $\|\mathbf{r}\| = \gamma_1$. Therefore, for positive triples $(e_1, r, e_2)$ and $(e_2, r, e_3)$, let

$$\mathbf{u}_1 = \mathbf{e}_1 + \mathbf{r} - \mathbf{e}_2, \qquad \mathbf{u}_2 = \mathbf{e}_2 + \mathbf{r} - \mathbf{e}_3, \qquad \|\mathbf{u}_1\| = \|\mathbf{u}_2\| = \gamma_1. \qquad (20)$$

We need to show that the following inequality wouldn't give a contradiction:

$$f_r(e_1, e_3) = \|\mathbf{e}_1 + \mathbf{r} - \mathbf{e}_3\| \geq \gamma_2.$$

From (20) we have $\mathbf{e}_1 + \mathbf{r} - \mathbf{e}_3 = \mathbf{u}_1 + \mathbf{u}_2 - \mathbf{r}$, whose norm can be as large as $3\gamma_1$ (e.g., for $\mathbf{u}_1 = \mathbf{u}_2 = -\mathbf{r}$), which is not a contradiction.

Therefore, with conditions (b) and (c), the limitation L5 is not valid for both TransE and TransComplEx.

Limitation L6: Lemma 6: 1) With condition (a), the limitation L6 holds for both TransE and TransComplEx. 2) With conditions (b), (c) and (d), the limitation L6 does not hold for the models.

Proof: 1) With condition (a), the limitation L6 is proved in Kazemi and Poole (2018). 2) Considering the assumption of L6 and the condition (b), we have

$$\|\mathbf{e}_1 + \mathbf{r} - \mathbf{e}_i\| = \gamma_1 \;\;\text{for all } e_i \in \Delta, \qquad \|\mathbf{e}_2 + \mathbf{r} - \mathbf{e}_j\| = \gamma_1 \;\;\text{for some } e_j \in \Delta. \qquad (21)$$

We show the condition under which $f_r(e_2, e_i) = \|\mathbf{e}_2 + \mathbf{r} - \mathbf{e}_i\| \geq \gamma_2$ holds for some $e_i \in \Delta$.

Substituting (21) in $\mathbf{e}_2 + \mathbf{r} - \mathbf{e}_i$, we have $\mathbf{e}_2 + \mathbf{r} - \mathbf{e}_i = (\mathbf{e}_2 + \mathbf{r} - \mathbf{e}_j) + (\mathbf{e}_1 + \mathbf{r} - \mathbf{e}_i) - (\mathbf{e}_1 + \mathbf{r} - \mathbf{e}_j)$, whose norm can be as large as $3\gamma_1$ and can therefore exceed $\gamma_2$.

Therefore, the limitation L6 is not valid with conditions (b), (c) and (d).

Figure 3 shows that the limitation L6 is invalidated by a proper selection of the loss function.

Figure 3: Investigation of L6 with condition (c): the limitation is not valid, because the triple $(e_2, r, e_i)$ can get a score that makes it considered negative while the triples $(e_1, r, e_i)$ and $(e_2, r, e_j)$ are positive.