Rot-Pro: Modeling Transitivity by Projection in Knowledge Graph Embedding

by Tengwei Song, et al.
Beihang University

Knowledge graph embedding models learn representations of the entities and relations in knowledge graphs in order to predict missing links (relations) between entities. Their effectiveness is deeply affected by their ability to model and infer different relation patterns such as symmetry, asymmetry, inversion, composition and transitivity. Although existing models can already model many of these relation patterns, transitivity, a very common relation pattern, is still not fully supported. In this paper, we first show theoretically that transitive relations can be modeled with projections. We then propose the Rot-Pro model, which combines projection and relational rotation. We prove that Rot-Pro can infer all of the above relation patterns. Experimental results show that the proposed Rot-Pro model effectively learns the transitivity pattern and achieves state-of-the-art results on the link prediction task on datasets containing transitive relations.




1 Introduction

Knowledge graph embedding (KGE) aims to learn low-dimensional dense vectors that represent the entities and relations in knowledge graphs (KGs). It is widely used in recommendation systems, question answering, and dialogue systems qa1 ; qa2 ; qa3 . The general intuition of KGE is to model and infer relations between entities in knowledge graphs, which exhibit complex patterns such as symmetry, asymmetry, inversion, composition, and transitivity, as shown in Table 1.

Many studies are dedicated to finding a method that is able to model various relation patterns transe ; distmult ; CompLex ; rotate . TransE models relations as translations and aims to model the inversion and composition patterns; DistMult can model symmetric relations by capturing interactions between head and tail entities and relations. One representative recent model is RotatE (rotate, ), which has been proved able to model the symmetry, asymmetry, inversion and composition patterns by modeling each relation as a rotation in the complex plane. However, none of these models can model all five relation patterns; in particular, none can model the transitivity pattern.

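Relation pattern   Definition
Symmetry           if $(e_1, r, e_2)$, then $(e_2, r, e_1)$
Asymmetry          if $(e_1, r, e_2)$, then $\neg (e_2, r, e_1)$
Inversion          if $(e_1, r_1, e_2)$ and $r_2 = r_1^{-1}$, then $(e_2, r_2, e_1)$
Composition        if $(e_1, r_1, e_2) \wedge (e_2, r_2, e_3)$, then $(e_1, r_3, e_3)$
Transitivity       if $(e_1, r, e_2)$ and $(e_2, r, e_3)$, then $(e_1, r, e_3)$
Table 1: Common relation patterns.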

This paper focuses on modeling the transitivity pattern. We show theoretically that transitive relations can be modeled with idempotent transformations, i.e. projections valenza2012linear . Any projection matrix is similar to a diagonal matrix whose diagonal elements are $0$ or $1$. We design the projection by constraining the similarity matrix to be a rotation matrix, which leaves fewer parameters to learn.

In order to model not only transitivity but also the other relation patterns shown in Table 1, we propose the Rot-Pro model, which combines projection and relational rotation. We prove theoretically that Rot-Pro can infer the symmetry, asymmetry, inversion, composition, and transitivity patterns. Experimental results show that the Rot-Pro model can effectively learn the transitivity pattern. The Rot-Pro model achieves state-of-the-art results on the link prediction task on the Countries dataset, which contains transitive relations, and outperforms other models on the YAGO3-10 and FB15k-237 datasets.

2 Related work

Table 2: The supported relation patterns (symmetry, asymmetry, inversion, composition, transitivity) of several models rotate .

There are mainly two types of knowledge graph embedding models, either using translation transformation or linear transformation between head and tail entities.

Trans-series models.

Trans-series models, which are well known in the KGE area, are essentially translation-based models. TransE (transe, ) proposed a pure translation distance-based score function, which assumes that the sum of the embeddings of the head entity $h$ and the relation $r$ should be close to the embedding of the tail entity $t$. This simple approach is effective in capturing composition, asymmetric and inversion relations, but has difficulty handling 1-to-N, N-to-1 and N-to-N relations.

To overcome these issues, many variants and extensions of TransE have been proposed. TransH (transh, ) projects entities and relations onto relation-specific hyperplanes, which allows an entity to have different projections in different relations. TransR (transr, ) introduces relation-specific spaces and builds entity and relation embeddings in separate spaces. TransD (transd, ) simplifies TransR by constructing dynamic mapping matrices. For the purpose of model optimization, some models relax the requirement on translational distance. TransA (transa, ) replaces Euclidean distance with Mahalanobis distance to enable more adaptive metric learning, and a recent model, TransMS (transMS, ), transmits multi-directional semantics for complex relations.

The variants of TransE improve the capability of the models to handle 1-to-N, N-to-1 and N-to-N relations and to model symmetric and asymmetric relations effectively, but they are no longer able to model composition and inversion relations, because they apply linear transformations to the head and tail entities separately before modeling the relation as a translation. BoxE (abboud2020boxe, ), a recent trans-series model, embeds entities as points and relations as sets of boxes, yielding a model that can express multiple relation patterns including composition and inversion, but it cannot express transitivity.

Bilinear models.

Bilinear models represent each relation as a linear transformation matrix from the head entity to the tail entity. The relation patterns that such a model can infer depend on the properties of the relation matrix $M_r$. RESCAL (Rescal, ) represents each relation as a full matrix that models the pairwise interactions between entities and relation. The score of a fact $(h, r, t)$ is defined by a bilinear function: $\mathbf{h}^\top M_r \mathbf{t}$. DistMult (distmult, ) simplifies RESCAL by restricting $M_r$ to diagonal matrices. Therefore, it cannot handle relation patterns other than symmetry. HolE (hole, ) combines the expressive power of RESCAL with the simplicity of DistMult by introducing circular correlation. HolE can express multiple types of relations, since circular correlation is not commutative.
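As a small illustration of why a diagonal relation matrix restricts DistMult to symmetric relations, the sketch below evaluates the bilinear score with a diagonal matrix in plain Python. The function name `distmult_score` and the toy vectors are assumptions made for this example; they are not taken from any of the cited implementations.

```python
def distmult_score(head, relation_diag, tail):
    """Bilinear score h^T diag(r) t for real-valued embeddings."""
    return sum(h * r * t for h, r, t in zip(head, relation_diag, tail))

# Toy embeddings (hypothetical values chosen for illustration).
h = [0.5, -1.0, 2.0]
t = [1.5, 0.25, -0.5]
r = [2.0, -3.0, 0.5]

# Swapping head and tail leaves the score unchanged, so any relation
# scored this way is necessarily treated as symmetric.
forward = distmult_score(h, r, t)
backward = distmult_score(t, r, h)
```

Because the score is a sum of element-wise products, it cannot distinguish $(h, r, t)$ from $(t, r, h)$, whatever the relation vector is.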

Recently, some KGE models have begun to model relation patterns explicitly. Dihedral (dihedral, ) models relations in KGs with representations of the dihedral group, whose properties support relation patterns such as symmetry. To go beyond Euclidean space, ComplEx (CompLex, ) first introduced a complex vector space, which can capture both symmetric and asymmetric relations. RotatE (rotate, ) also works in complex space and can additionally capture the inversion and composition patterns by introducing a rotational Hadamard product. QuatE (zhang2019quaternion, ) extends RotatE using a quaternion inner product and gains more expressive semantic learning capability. ATTH (atth, ) proposes a low-dimensional hyperbolic knowledge graph embedding method, which captures logical patterns such as symmetry and asymmetry.

However, none of the existing models is capable of modeling the transitivity pattern. We are the first to show that transitivity can be modeled with projections, and we prove that the proposed model is able to infer all five relation patterns shown in Table 1.

3 Rot-Pro: Modeling Relation Patterns by Projection and Rotation

3.1 Preliminary

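RotatE is a representative approach that models a relation as an element-wise rotation from the embedding of the head entity to the embedding of the tail entity in complex vector space. This can be denoted as $\mathbf{e}_t = R_r(\mathbf{e}_h)$, where $\boldsymbol{\theta}_r$ is the embedding of relation $r$ and $R_r$ is the rotation function that rotates the $i$th element of $\mathbf{e}_h$ by a phase given by the $i$th element of $\boldsymbol{\theta}_r$. For each embedding $\mathbf{e}$, let $\mathrm{Re}(e^{(i)})$ and $\mathrm{Im}(e^{(i)})$ be the real part and the imaginary part of its $i$th dimension. The rotation transformation in the $i$th dimension is defined by an orthogonal matrix whose determinant is $1$, as follows:

$$\begin{bmatrix} \mathrm{Re}(e_t^{(i)}) \\ \mathrm{Im}(e_t^{(i)}) \end{bmatrix} = \begin{bmatrix} \cos\theta_r^{(i)} & -\sin\theta_r^{(i)} \\ \sin\theta_r^{(i)} & \cos\theta_r^{(i)} \end{bmatrix} \begin{bmatrix} \mathrm{Re}(e_h^{(i)}) \\ \mathrm{Im}(e_h^{(i)}) \end{bmatrix} \tag{1}$$

It has been proved that RotatE can infer the symmetry, asymmetry, inversion, and composition relation patterns rotate . However, it cannot infer the transitivity pattern, as we explain in what follows.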

(a) Transitive Chain
(b) Limitation of TransE
(c) Limitation of RotatE
Figure 1: Illustration of transitive chain and the limitation of TransE and RotatE on representing transitivity pattern.

3.2 Representation of transitive relation

Relation $r$ is a transitive relation if, for any instances $(e_1, r, e_2)$ and $(e_2, r, e_3)$ of relation $r$, $(e_1, r, e_3)$ is also an instance of $r$. For convenience of illustration, we define the transitive chain of a transitive relation as follows.

Definition 1.

A transitive chain of $r$ is defined as a chain of instances $(e_1, r, e_2), (e_2, r, e_3), \dots, (e_{n-1}, r, e_n)$ of $r$, where $e_1, \dots, e_n$ are different entities.

For the transitive closure of a transitive chain, every two entities in the chain should be connected by the transitive relation $r$. Hence, for a chain of $n$ entities, it can be represented as a fully connected directed graph with $n(n-1)/2$ edges. It can be proved that a transitive relation can be represented as the union of the transitive closures of all its transitive chains. Thus, the representation of transitive relations can be reduced to the representation of transitive chains.

An example of a transitive chain and its transitive closure is shown in Figure 1(a), where $(e_1, r, e_2)$, $(e_2, r, e_3)$ and $(e_3, r, e_4)$ form a transitive chain, $(e_1, r, e_3)$ and $(e_2, r, e_4)$ are instances derived by transitivity via one hop, and $(e_1, r, e_4)$ is the only instance derived via two hops.
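The counting argument for a four-entity chain can be checked with a few lines of Python. This is only an illustrative sketch: the entity names and the helper names `chain_closure` and `hops_needed` are hypothetical, and "hops" is counted as the number of intermediate entities skipped, matching the one-hop and two-hop usage in the example.

```python
from itertools import combinations

def chain_closure(chain):
    """All (head, tail) pairs implied by a transitive chain e1 -> e2 -> ... -> en."""
    return list(combinations(chain, 2))

def hops_needed(chain, head, tail):
    """Number of intermediate entities skipped; 0 for an original chain edge."""
    return chain.index(tail) - chain.index(head) - 1

chain = ["e1", "e2", "e3", "e4"]  # hypothetical entity names
closure = chain_closure(chain)

# Group the closure edges by how many hops are needed to derive them.
by_hops = {}
for head, tail in closure:
    by_hops.setdefault(hops_needed(chain, head, tail), []).append((head, tail))
```

For four entities the closure has six edges: three chain edges, two derived via one hop, and one derived via two hops.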

Due to the special nature of transitivity, current models are unable to model it effectively in vector space. For instance, TransE (Figure 1(b)) regards a relation as a translation between the head and tail entities, so modeling transitivity requires the translation to be a zero vector, which forces the embeddings of all entities in a transitive chain to be the same. Thus, it cannot model transitivity. In RotatE (Figure 1(c)), modeling transitivity requires the relational rotation phase in each dimension to be $2n\pi$ ($n = 0, 1, 2, \dots$), which also forces the embeddings of the entities in a transitive chain to be the same.
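The RotatE limitation can be reproduced numerically for a single complex dimension. The sketch below is a minimal illustration using Python's built-in complex numbers; the helper names `rotate` and `transitivity_gap` and the sample values are assumptions for this example, not part of RotatE's code.

```python
import cmath
import math

def rotate(e, phase):
    """RotatE-style rotation of one complex embedding dimension."""
    return e * cmath.exp(1j * phase)

def transitivity_gap(e1, phase):
    """Distance between rotate(e1) and e3, where e2 = rotate(e1) and e3 = rotate(e2).

    A gap of zero means the derived triple (e1, r, e3) is satisfied as well.
    """
    e2 = rotate(e1, phase)
    e3 = rotate(e2, phase)
    return abs(rotate(e1, phase) - e3)

gap_generic = transitivity_gap(1 + 0j, math.pi / 3)      # non-zero gap
gap_trivial = transitivity_gap(1 + 2j, 2 * math.pi)      # vanishes up to rounding
collapsed = abs(rotate(1 + 2j, 2 * math.pi) - (1 + 2j))  # e2 coincides with e1
```

With a generic phase the derived triple is violated, and the only phase that closes the gap is a multiple of $2\pi$, which maps every entity in the chain onto the same point.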

Our solution.

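Based on the observation on transitive closures of transitive chains, in each transitive chain $(e_1, r, e_2), \dots, (e_{n-1}, r, e_n)$, for each entity $e_i$, the instance $(e_i, r, e_{i+k+1})$ can be derived by transitivity via $k$ hops ($k \geq 1$). If we model each relation $r$ as a kind of transformation $T_r$, then it requires $T_r(\mathbf{e}_h) = \mathbf{e}_t$ for each relation instance $(h, r, t)$. Therefore, the transformation of a transitive relation must satisfy $T_r^{k} = T_r$ ($k \geq 1$), i.e. the result of transforming an entity embedding multiple times is equivalent to that of transforming it once. This inspires us to model the transitivity pattern in terms of an idempotent transformation (projection valenza2012linear ), which has the same property. For each relation $r$, let $S_r^{(i)}$ be an invertible matrix on the $i$th dimension; a general orthogonal projection is defined by the idempotent matrix:

$$M_r^{(i)} = S_r^{(i)} \begin{bmatrix} a^{(i)} & 0 \\ 0 & b^{(i)} \end{bmatrix} \left(S_r^{(i)}\right)^{-1}, \qquad a^{(i)}, b^{(i)} \in \{0, 1\} \tag{2}$$

Without loss of generality, we simply set $S_r^{(i)}$ to be a rotation matrix, which rotates the original axis by a phase $\theta_p^{(i)}$. The orthogonal projection defined by $M_r^{(i)}$ is performed in the new axis after rotation:

$$M_r^{(i)} = \begin{bmatrix} \cos\theta_p^{(i)} & -\sin\theta_p^{(i)} \\ \sin\theta_p^{(i)} & \cos\theta_p^{(i)} \end{bmatrix} \begin{bmatrix} a^{(i)} & 0 \\ 0 & b^{(i)} \end{bmatrix} \begin{bmatrix} \cos\theta_p^{(i)} & -\sin\theta_p^{(i)} \\ \sin\theta_p^{(i)} & \cos\theta_p^{(i)} \end{bmatrix}^{-1} \tag{3}$$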

In the rest of paper, we will omit the dimensional indices in and for simplicity. In this way, for entities in a transitive chain, we have (). This implies that , which does not force the entity embeddings to be the same. The embeddings can be different to each other and have the same projected vector under .
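The two properties used here, idempotency of the projection and the fact that distinct embeddings can share one projected vector, can be checked on a single two-dimensional (real, imaginary) pair. The sketch below assumes the projection has the form rotation, diagonal 0/1 matrix, inverse rotation; the names `projection_matrix`, `matmul`, `apply` and the numeric values are hypothetical.

```python
import math

def projection_matrix(theta_p, a, b):
    """2x2 matrix R(theta_p) @ diag(a, b) @ R(theta_p)^T, written out by hand."""
    c, s = math.cos(theta_p), math.sin(theta_p)
    return [[a * c * c + b * s * s, (a - b) * c * s],
            [(a - b) * c * s, a * s * s + b * c * c]]

def matmul(m, n):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def apply(m, v):
    """Apply a 2x2 matrix to a 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

theta_p = 0.7                       # hypothetical projection phase
M = projection_matrix(theta_p, 1, 0)
MM = matmul(M, M)                   # idempotency: MM should equal M

# Two different points on a line perpendicular to the rotated axis.
v1 = [1.0, 2.0]
v2 = [v1[0] - 3.0 * math.sin(theta_p), v1[1] + 3.0 * math.cos(theta_p)]
p1, p2 = apply(M, v1), apply(M, v2)  # both land on the same projected vector
```

Applying `M` twice gives the same result as applying it once, and `v1` and `v2` differ while their projections agree, which is the behaviour a transitive chain needs.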

3.3 The Rot-Pro model

Model formulation. In order to model not only transitivity but also the other relation patterns shown in Table 1, we combine the above projection-based representation for transitivity with the relational rotation-based representation for symmetry, asymmetry, inversion, and composition. We propose Rot-Pro, which models relations as relational rotations on the projected entity embeddings in the complex space $\mathbb{C}$. For each triple $(h, r, t)$, with $p_r$ denoting the projection and $R_r$ the relational rotation of relation $r$, the Rot-Pro model requires that

$$p_r(\mathbf{e}_t) = R_r\big(p_r(\mathbf{e}_h)\big) \tag{4}$$
We demonstrate in the following theorem that Rot-Pro enables the modeling and inferring of all the five types of relation patterns introduced above.

Theorem 1.

Rot-Pro can infer the symmetry, asymmetry, inversion, composition, and transitivity patterns.


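(1) Let $a = 1$ and $b = 1$. Then the projection matrix becomes an identity matrix and the projection $p_r$ becomes an identity transformation, so our model reduces to the RotatE model. Thus, Rot-Pro can also infer the symmetry, asymmetry, inversion and composition patterns as RotatE does rotate .

(2) Here we construct the solutions in the Rot-Pro model for transitive relations. Let $a = 1$ and $b = 0$; then $p_r$ is a projection onto the real axis as shown in Figure 2. (We can also set $a = 0$ and $b = 1$; then $p_r$ is a projection onto the imaginary axis.) As discussed in the previous section, to model the transitivity of relation $r$, the relational rotation applied to the projected entity embeddings in a transitive chain must be idempotent. Therefore, the phase of the relational rotation can only be $2n\pi$ ($n = 0, 1, 2, \dots$), and $p_r(\mathbf{e}_i) = p_r(\mathbf{e}_j)$ for any entities $e_i$, $e_j$ in the chain.

According to Equation 4, the following equation is expected to hold, where $\theta_p$ is the phase of the rotation that defines the projection:

$$\begin{bmatrix} \cos\theta_p & -\sin\theta_p \\ \sin\theta_p & \cos\theta_p \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} \cos\theta_p & -\sin\theta_p \\ \sin\theta_p & \cos\theta_p \end{bmatrix}^{-1} \left( \begin{bmatrix} \mathrm{Re}(e_i) \\ \mathrm{Im}(e_i) \end{bmatrix} - \begin{bmatrix} \mathrm{Re}(e_j) \\ \mathrm{Im}(e_j) \end{bmatrix} \right) = \mathbf{0} \tag{5}$$

The above equation holds iff

$$\cos\theta_p\,\mathrm{Re}(e_i) + \sin\theta_p\,\mathrm{Im}(e_i) = \cos\theta_p\,\mathrm{Re}(e_j) + \sin\theta_p\,\mathrm{Im}(e_j) \tag{6}$$

This equation holds if for any $\mathbf{e}$ in the transitive chain,

$$\cos\theta_p\,\mathrm{Re}(e) + \sin\theta_p\,\mathrm{Im}(e) = c \tag{7}$$

where $c$ is a constant. That is, all these entity embeddings in the transitive chain are located on the line defined by Equation (7) in the complex plane, as shown in Figure 2.

Here, different values of $c$ can represent different transitive chains. In summary, we construct the solutions for representing transitivity in the Rot-Pro model, i.e. $a = 1$, $b = 0$, $\theta_r = 2n\pi$, and any entity embedding $\mathbf{e}$ in the chain satisfies $\cos\theta_p\,\mathrm{Re}(e) + \sin\theta_p\,\mathrm{Im}(e) = c$, where $c$ is a constant. ∎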

Score function.

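For each triple $(h, r, t)$, the distance function of the Rot-Pro model is defined as follows, with $p_r$ the projection and $R_r$ the relational rotation of relation $r$:

$$d_r(h, t) = \left\| R_r\big(p_r(\mathbf{e}_h)\big) - p_r(\mathbf{e}_t) \right\| \tag{8}$$

The score function is $f_r(h, t) = -d_r(h, t)$.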

Figure 2: The representation of transitivity pattern in complex plane.
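A distance of this kind can be sketched per complex dimension as "project, rotate, compare". The code below is an illustrative sketch only: it assumes the projection is a rotation of the axes by a phase, a 0/1 scaling of the two axes, and a rotation back, and it assumes the per-dimension moduli are summed. The names `project`, `rot_pro_distance`, `rot_pro_score` and all parameter names are hypothetical and are not the authors' implementation.

```python
import cmath

def project(e, theta_p, a, b):
    """Projection of one complex dimension: rotate the axes by theta_p,
    scale the two axes by (a, b), then rotate back."""
    u = cmath.exp(1j * theta_p)
    local = e * u.conjugate()          # coordinates in the rotated axes
    return complex(a * local.real, b * local.imag) * u

def rot_pro_distance(head, tail, theta_p, a, b, theta_r):
    """Sum over dimensions of |rotate(project(h)) - project(t)| (assumed norm)."""
    total = 0.0
    for h, t, tp, ai, bi, tr in zip(head, tail, theta_p, a, b, theta_r):
        rotated = project(h, tp, ai, bi) * cmath.exp(1j * tr)
        total += abs(rotated - project(t, tp, ai, bi))
    return total

def rot_pro_score(head, tail, theta_p, a, b, theta_r):
    """Score as the negated distance."""
    return -rot_pro_distance(head, tail, theta_p, a, b, theta_r)

# Transitive-style setting: a=1, b=0, zero rotation phase. Head and tail
# differ, but share the same real part, so the distance is zero.
d_transitive = rot_pro_distance([2 + 1j], [2 - 5j], [0.0], [1], [0], [0.0])

# With a=b=1 the projection is the identity and the model acts like a pure rotation.
d_rotation = rot_pro_distance([1 + 0j], [1j], [0.3], [1], [1], [cmath.pi / 2])
```

The first call shows two distinct embeddings being matched under a transitive-style parameter choice; the second shows the reduction to a plain rotation when both diagonal entries are 1.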

3.4 Optimization objective

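In the training process, we adopt self-adversarial negative sampling, which has been shown to be an effective optimization approach for KGE (rotate, ). The negative sampling loss with self-adversarial training is defined as:

$$L = -\log \sigma\big(\gamma - d_r(h, t)\big) - \sum_{i=1}^{n} p(h'_i, r, t'_i) \log \sigma\big(d_r(h'_i, t'_i) - \gamma\big) \tag{9}$$

where $\gamma$ is a fixed margin, $\sigma$ is the sigmoid function, $(h'_i, r, t'_i)$ is the $i$th negative instance and $p(h'_i, r, t'_i)$ is the distribution for negative sampling (rotate, ).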
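For readers who want to see the loss computed on concrete numbers, here is a minimal sketch of a self-adversarial negative sampling loss in the style of RotatE, written with the Python standard library. It assumes the negative-sample weights are a softmax over negated distances scaled by a temperature; the function names and the `temperature` parameter are choices made for this example and are unrelated to the hyper-parameters of the projection penalty.

```python
import math

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def negative_weights(neg_dists, temperature=1.0):
    """Softmax over negated distances: closer (harder) negatives weigh more."""
    scores = [-temperature * d for d in neg_dists]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_adversarial_loss(pos_dist, neg_dists, gamma, temperature=1.0):
    """Margin-based loss for one positive triple and its negative samples."""
    weights = negative_weights(neg_dists, temperature)
    pos_term = -log_sigmoid(gamma - pos_dist)
    neg_term = -sum(w * log_sigmoid(d - gamma)
                    for w, d in zip(weights, neg_dists))
    return pos_term + neg_term

loss_good = self_adversarial_loss(1.0, [5.0, 9.0], gamma=3.0)
loss_bad = self_adversarial_loss(4.0, [5.0, 9.0], gamma=3.0)
```

The loss shrinks when the positive triple sits well inside the margin and the negatives sit well outside it, so `loss_good` is smaller than `loss_bad`.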

In addition, to ensure that the learned matrix is a projection, the values of $a$ and $b$ in Equation 2 should be restricted to $0$ or $1$. To enforce this constraint, we propose a projection penalty loss as follows:


Here is the number of relations, is the Hadamard product, and , where if , otherwise . Here and are hyper-parameters. We define to impose more penalty to values which are far from or than that of values which are close to or .

Let be a hyper-parameter, the total loss is defined as the weighted average of the above two losses.


4 Experiments

4.1 Datasets

We evaluate the Rot-Pro model on four well-known benchmarks. In general, FB15k-237 and WN18RR are two widely-used benchmarks and YAGO3-10 and Countries are two benchmarks with abundant relation patterns including transitivity.

  • FB15k-237: Freebase (Freebase, ) contains information about people, media, and geographical locations. FB15k is a subset of Freebase, and FB15k-237 (fb15k-237, ) is a modified version of FB15k that excludes inverse relations to resolve a flaw in FB15k (conve, ). It contains 14,541 entities, 237 relations, and 272,115 training triples.

  • WN18RR: WN18RR (conve, ) is a subset of WN18 (transe, ) from WordNet (Wordnet, ). WordNet is a dataset that characterizes associations between English words. Compared with WN18, WN18RR retains most of the symmetric, asymmetric and compositional relations, while removing the inversion relations. It contains 40,943 entities, 11 relations, and 86,835 training triples.

  • YAGO3-10: YAGO (yago, ) is a dataset which integrates vocabulary definitions of WordNet with classification system of Wikipedia. YAGO3-10 yago3 is a subset of YAGO, which contains 123,182 entities, 37 relations and 1,079,040 training triples. According to the ontology of YAGO3, it contains almost all common relation patterns.

  • Countries: Countries (countries, ) is a relatively small-scale dataset, which contains 2 relations and 272 entities (244 countries, 5 regions and 23 sub-regions). The two relations of Countries are locatedIn and neighborOf, which are transitive and symmetric relations respectively. The Countries dataset has 3 tasks, each requiring inferring a composition pattern with increasing length and difficulty.

4.2 Evaluation protocol

We evaluate the KGE models on three common evaluation metrics: mean rank (MR), mean reciprocal rank (MRR), and top-$k$ Hit Ratio (Hit@$k$). For each valid triple $(h, r, t)$ in the test set, we replace either $h$ or $t$ with every other entity in the dataset to create corrupted triples for the link prediction task. Following previous work (transe, ; conve, ; convkb, ; zhang2019quaternion, ; kbat, ), all the models are evaluated in a filtered setting, i.e. corrupted triples that appear in the training, validation, or test sets are removed during ranking. The valid triple and the filtered corrupted triples are ranked in ascending order based on their prediction scores. Lower MR, higher MRR or higher Hit@$k$ indicates better performance.
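The filtered ranking protocol and the three metrics can be sketched as follows. This is a simplified illustration that assumes candidates are ranked by a distance where lower is better; the function names `filtered_rank` and `summarize` and the toy numbers are hypothetical.

```python
def filtered_rank(true_distance, corrupted, known_true):
    """Rank of the valid triple among corrupted candidates (1 is best).

    `corrupted` maps a candidate entity to the distance of its corrupted
    triple; candidates in `known_true` form triples that already exist in
    the training, validation, or test set and are skipped.
    """
    better = sum(1 for entity, dist in corrupted.items()
                 if entity not in known_true and dist < true_distance)
    return better + 1

def summarize(ranks, k=10):
    """Mean rank, mean reciprocal rank, and Hit@k over a list of ranks."""
    n = len(ranks)
    return {
        "MR": sum(ranks) / n,
        "MRR": sum(1.0 / r for r in ranks) / n,
        f"Hit@{k}": sum(1 for r in ranks if r <= k) / n,
    }

# Candidate "a" scores better than the valid triple but is a known true
# triple, so it is filtered out and only "b" outranks the valid triple.
rank = filtered_rank(0.5, {"a": 0.1, "b": 0.3, "c": 0.9}, known_true={"a"})
metrics = summarize([1, 2, 10], k=3)
```

Filtering matters because a corrupted triple that happens to be true should not count against the model.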

4.3 Experiment setup

With the hyper-parameters introduced, we train Rot-Pro using a grid search of hyper-parameters: fixed margin in Equation 9 , weights tuning hyper-parameters for loss, , value of in Equation 10 , value of in Equation  10 . Both the real and imaginary parts of the entity embeddings are uniformly initialized, and the phases of the relational rotations are initialized between . In some settings, the phases of the relational rotations are also normalized to between during training.

4.4 Main results

We compare Rot-Pro with several state-of-the-art models, including TransE (transe, ), DistMult (distmult, ), ComplEx (CompLex, ), ConvE (conve, ), as well as RotatE (rotate, ) and BoxE (abboud2020boxe, ), to show empirically the importance of being able to model and infer more relation patterns for the task of predicting missing links. Table 3 summarizes our results on FB15k-237 and WN18RR, where the results of baseline models are taken from Sun et al. (rotate, ) and Abboud et al. (abboud2020boxe, ). We can see that Rot-Pro outperforms the baseline models on most evaluation metrics. Compared to RotatE, the improvement of Rot-Pro is limited, since these two datasets do not define enough transitive relations, but the results are still comparable with those of the other baseline models.

Table 4 summarizes our results on YAGO3-10 and Countries, which contain transitive relations. Hence the improvement of Rot-Pro over RotatE and other linear transformation models is much more significant. Specifically, Rot-Pro obtains a better AUC-PR result than existing state-of-the-art approaches, which indicates that Rot-Pro can effectively infer relation patterns such as transitivity, symmetry and composition. As a translation transformation model, BoxE outperforms Rot-Pro on YAGO3-10 on most evaluation metrics, which indicates that it is also a strong KGE model for inferring multiple relation patterns. However, the performance of BoxE on specific transitivity test sets is still not comparable with that of Rot-Pro; additional experiments can be found in the appendix.

                         FB15k-237                           WN18RR
Model                    MR    MRR   Hit@1  Hit@3  Hit@10    MR    MRR   Hit@1  Hit@3  Hit@10
TransE (transe, )        357   .294  -      -      .465      3384  .226  -      -      .501
DistMult (distmult, )    254   .241  .155   .263   .419      5110  .43   .39    .44    .49
ComplEx (CompLex, )      339   .247  .158   .275   .428      5261  .44   .41    .46    .51
ConvE (conve, )          244   .325  .237   .356   .501      4187  .43   .40    .44    .52
RotatE (rotate, )        177   .338  .241   .375   .533      3340  .476  .428   .492   .571
BoxE abboud2020boxe      163   .337  .238   .347   .538      3207  .451  .400   .472   .541
Rot-Pro                  201   .344  .246   .383   .540      2815  .457  .397   .482   .577
Table 3: Link prediction results on FB15k-237 and WN18RR.
                         YAGO3-10                            Countries (AUC-PR)
Model                    MR    MRR   Hit@1  Hit@3  Hit@10    S1    S2    S3
DistMult (distmult, )    5926  .34   .24    .38    .54       1.00  0.72  0.52
ComplEx (CompLex, )      6351  .36   .26    .40    .55       0.97  0.57  0.43
ConvE (conve, )          1671  .44   .35    .49    .62       1.00  0.99  0.86
RotatE (rotate, )        1767  .495  .402   .550   .670      1.00  1.00  0.95
BoxE abboud2020boxe      1022  .560  .484   .608   .691      -     -     -
Rot-Pro                  1797  .542  .443   .596   .699      1.00  1.00  0.998
Table 4: Link prediction results on YAGO3-10 and Countries.

4.5 Validation of learned representation of transitive relations

We conduct further analysis on the Rot-Pro model to verify that the model can actually learn the representations of transitive relations and that these representations have the expected theoretical properties.

(a) ()-Init
(b) ()-Init
(c) ()-Train
Figure 3: Distributions of relational rotation phases. The -axis is the relational rotation phases. The -axis is the number of dimensions of relation embeddings that have non-trivial projection before rotation with a specific phase, i.e. the embedding of parameter and are or .

To do this, we first investigate the distributions of relational rotation phases in all dimensions of entity embeddings obtained by training on YAGO3-10. According to our theoretical analysis, we expect the model could learn to represent transitivity, i.e. for any non-trivial projection (i.e. or ), the corresponding phase of relational rotation should be . The experimental results are shown in Figure 3(a). It can be observed that the Rot-Pro model does learn the relational rotation phases and as expected. However, it also learns the unexpected relational rotation phase . Further experiments reveal that by turning the initialization range of the relational rotation phases, the problem of learning unexpected relational rotation phase could be mitigated. By changing the initialization range of relational rotation phases from to , the number of relational rotation phases becomes significantly less. When we further restrict the relational rotation phases to during training, almost all relational rotation phases become or .

(a) Result on transitive test sets
(b) Result on transitive test sets
(c) Rot-Pro--Init
(d) Rot-Pro--Train
Figure 4: (a) shows the Hit@10 results of the RotatE and Rot-Pro models on three test sets for transitivity. (b) and (c) show the representation of four entities in a transitive chain in two variants of Rot-Pro models with different constraints of relational rotation phase.

The results above are also reflected in the quantitative test. To fully understand the impact of changing initialization range on the performance of the Rot-Pro model on modelling transitive relations, we construct three sub-test sets S1, S2, S3 of YAGO3-10 for evaluation, which consist of a single transitive relation isLocatedIn. Test set contains instances of isLocatedIn in the original test set. Test set is obtained by applying the transitivity once on instances of isLocatedIn in the YAGO3-10 dataset. Test set is constructed similarly to , except by applying the transitivity at least twice. We take the RotatE model as baseline, and compare it with three variant of Rot-Pro models with different settings: the first one with relational rotation phase initialized in , the second one with relational rotation phase initialized in , the third one with relational rotation phase restricted in during training. The experimental result is shown in Figure 4(a). It shows that tuning of initialization range also largely improves the performance of the Rot-Pro model, which coincides with the improvement of learning correct representations of transitive relations. All the variant of Rot-Pro models outperform RotatE significantly, especially when the relational rotation phase is restricted to during training.
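One plausible way to build such derived test sets is to separate pairs that follow from a single application of transitivity from pairs that need two or more. The sketch below is an assumed reading of that construction, not the authors' script; the function names are hypothetical, and the example pairs are the isLocatedIn triples quoted in the caption of Figure 5.

```python
def compose_once(pairs):
    """Pairs (h, t2) obtained by joining two known pairs (h, t) and (t, t2)."""
    by_head = {}
    for h, t in pairs:
        by_head.setdefault(h, set()).add(t)
    return {(h, t2) for h, t in pairs for t2 in by_head.get(t, ())}

def transitive_closure(pairs):
    """Repeatedly compose pairs until no new pair appears."""
    closed = set(pairs)
    while True:
        new = compose_once(closed) - closed
        if not new:
            return closed
        closed |= new

def split_by_applications(pairs):
    """Return (pairs derived with one application, pairs needing two or more)."""
    base = set(pairs)
    once = compose_once(base) - base
    more = transitive_closure(base) - base - once
    return once, more

located_in = [
    ("Florida_State_University", "United_States"),
    ("United_States", "North_America"),
    ("North_America", "Americas"),
]
once, more = split_by_applications(located_in)
```

On this chain, one application yields two new pairs and a second application yields the single remaining pair that links the two ends of the chain.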

(a) Example illustration
(b) Variation of loss
Figure 5: (a) is an example of miss-placed four entities on a transitive chain, which consist of three triples: : (Florida_State_University, isLocatedIn, United_States), : (United_States, isLocatedIn, North_America), : (North_America, isLocatedIn, Americas). (b) is the variation of loss for these triples.

We also visualize one dimension of embeddings of three entities connected by a transitive relation isLocatedIn in YAGO3-10. Figure 4(c) shows the visualization of entity embeddings of the Rot-Pro model trained with initialization range , which contains a miss placed entity embedding. While Figure 4(d) is the visualization of embeddings of entities of the Rot-Pro model trained by restricting the relational rotation phase to during training, where all entity embeddings in the transitive chain are placed correctly as expected. We can see that these vectors are basically fit in a line and can almost be projected to the same vector in the rotated axis.


The Rot-Pro model with no additional restrictions on the relational rotation phase may learn a phase which does not exactly meet our expectation. A possible representations of four entities in a transitive chain with relational rotation phase is illustrated in Figure 5(a), in which four out of the six instances of transitive relation are correctly represented, while the other two instances and are not. Obviously, this is not an optimal solution for the model, and the reason is likely to be that the model falls into a local optimum during the learning process. To demonstrate this, we plot the variation of loss for three triples in a transitive chain with the relational rotation phase range over . The result is shown in Figure 5(b). We can find that there is indeed a local optimum at , and the global optimum is at and , which is consistent with our conjecture.

4.6 Limitation

(a) ()-Init
(b) ()-Init
(c) ()-Train
Figure 6: The distribution of relational rotation phases of three Rot-Pro variants over all dimensions of a specific symmetric relation isMarriedTo. The meaning of and axes is the same as Figure 3.

According to the experimental results, Rot-Pro is sensitive to the range of relational rotation phases, and hence prone to fall into the local optimum solution. Though it can learn the idempotency of transitivity correctly by enforcing addition constraints on training, however, such constraints have negative impact on the learning of other relation patterns, such as symmetry. We can find in Figure 6(a) that for a symmetric relation, the relational rotation phases learned by a Rot-Pro without phase constraint are either , or , which indicates that it has similar capability of modeling and inferring symmetry relation pattern as RotatE. By narrowing of the range of relational rotation phases, the histogram on the symmetry relation is gradually disrupted as shown in Figure 6(b) and 6(c). Therefore, a trade-off should be made between the better modeling and inferring of transitivity and the other relation patterns. Such limitation might be further optimized through learning each relation pattern separately and integrate through mechanisms such as attention, which we will study in future works.

5 Conclusion and Future Work

In this paper, we showed theoretically that transitive relations can be modeled with projections, which are idempotent transformations. We also proved that the proposed Rot-Pro model is able to infer the symmetry, asymmetry, inversion, composition, and transitivity patterns. Our experimental results showed empirically that the Rot-Pro model can effectively learn the transitivity pattern. Our model also has the potential to be improved by extending the complex space to higher-dimensional spaces, such as quaternion space zhang2019quaternion ; orthogonal . While the proofs of expressiveness in many previous works focus on each relation pattern separately, it is worthwhile to investigate further whether a model can handle all common relation patterns simultaneously, considering that a single relation may exhibit multiple relation patterns and that different relation patterns may have complex interactions in knowledge graphs.

This work was supported by the National Key Research and Development Plan of China (Grant No. 2018AAA0102301) and the National Natural Science Foundation of China (Grant No. 61690202).


  • [1] Ralph Abboud, İsmail İlkan Ceylan, Thomas Lukasiewicz, and Tommaso Salvatori. BoxE: A box embedding model for knowledge base completion, 2020.
  • [2] X. He J. Gao B. Yang, W.-t. Yih and L. Deng. Embedding entities and relations for learning and inference in knowledge bases. In ICLR, pages 1–13, 2015.
  • [3] Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems 26, pages 2787–2795, 2013.
  • [4] Ines Chami, Adva Wolf, Da-Cheng Juan, Frederic Sala, Sujith Ravi, and Christopher Ré. Low-dimensional hyperbolic knowledge graph embeddings. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 6901–6914. Association for Computational Linguistics, July 2020.
  • [5] L. Chen, G. Zeng, Q. Zhang, X. Chen, and D. Wu. Question answering over knowledgebase with attention-based lstm networks and knowledge embeddings. In 2017 IEEE 16th International Conference on Cognitive Informatics and Cognitive Computing (ICCI*CC), pages 243–246, 2017.
  • [6] Dai Quoc Nguyen, Tu Dinh Nguyen, Dat Quoc Nguyen, and Dinh Phung. A novel embedding model for knowledge base completion based on convolutional neural network. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL), volume 2, pages 327–333, 2018.
  • [7] Kamal Singh Dennis Diefenbach and Pierre Maret. WDAqua-core1: a question answering service for RDF knowledge bases. In Companion of the The Web Conference 2018 on The Web Conference (WWW), page 1087–1091, 2018.
  • [8] Sameer Singh Guillaume Bouchard and Theo Trouillon. On approximate reasoning capabilities of low-rank vector spaces. In AAAI Spring Syposium on Knowledge Representation and Reasoning (KRR): Integrating Symbolic and Neural Approaches, 2015.
  • [9] Guoliang Ji, Shizhu He, Liheng Xu, Kang Liu, and Jun Zhao. Knowledge graph embedding via dynamic mapping matrix. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, pages 687–696, 2015.
  • [10] H. Xiao, M. Huang, Y. Hao, and X. Zhu. TransA: An adaptive approach for knowledge graph embedding. In Proceedings of AAAI Conference on Artificial Intelligence, pages 1–7, 2015.
  • [11] K. Bollacker, C. Evans, P. Paritosh, T. Sturge, and J. Taylor. Freebase: a collaboratively created graph database for structuring human knowledge. In SIGMOD, pages 1247–1250, 2008.
  • [12] M. Nickel, L. Rosasco, and T. Poggio. Holographic embeddings of knowledge graphs. In Proceedings of AAAI Conference on Artificial Intelligence, pages 1955–1961, 2016.
  • [13] F. Mahdisoltani, J. Biega, and F. M. Suchanek. YAGO3: A knowledge base from multilingual wikipedias. In Proceedings of CIDR, 2015.
  • [14] Maximilian Nickel, Volker Tresp, and Hans-Peter Kriegel. A three-way model for collective learning on multi-relational data. In Proceedings of the 28th International Conference on Machine Learning (ICML), 2011.
  • [15] George A. Miller. WordNet: a lexical database for English. Communications of the ACM, 38(11):39–41, 1995.
  • [16] Deepak Nathani, Jatin Chauhan, Charu Sharma, and Manohar Kaul. Learning attention-based embeddings for relation prediction in knowledge graphs. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4710–4723, 2019.
  • [17] S. He, C. Liu, K. Liu, and J. Zhao. Generating natural answers by incorporating copying and retrieving mechanisms in sequence-to-sequence learning. In ACL, pages 199–208, 2017.
  • [18] S. Yang, J. Tian, H. Zhang, J. Yan, H. He, and Y. Jin. TransMS: knowledge graph embedding for complex relations by multidirectional semantics. In Proceedings of IJCAI, pages 1935–1942, 2019.
  • [19] Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum. Yago: A core of semantic knowledge. In Proceedings of the 16th International Conference on World Wide Web, pages 697–706, 2007.
  • [20] Zhiqing Sun, Zhi-Hong Deng, Jian-Yun Nie, and Jian Tang. RotatE: Knowledge graph embedding by relational rotation in complex space. In International Conference on Learning Representations, 2019.
  • [21] T. Trouillon, J. Welbl, S. Riedel, E. Gaussier, and G. Bouchard. Complex embeddings for simple link prediction. In Proceedings of the 33rd International Conference on Machine Learning (ICML), pages 2071–2080, 2016.
  • [22] Yun Tang, Jing Huang, Guangtao Wang, Xiaodong He, and Bowen Zhou. Orthogonal relation transforms with graph context modeling for knowledge graph embedding. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 2713–2722, Online, July 2020. Association for Computational Linguistics.
  • [23] Tim Dettmers, Pasquale Minervini, Pontus Stenetorp, and Sebastian Riedel. Convolutional 2D knowledge graph embeddings. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence, 2018.
  • [24] Kristina Toutanova and Danqi Chen. Observed versus latent features for knowledge base and text inference. In Proceedings of the 3rd Workshop on Continuous Vector Space Models and their Compositionality, pages 57–66, 2015.
  • [25] R.J. Valenza. Linear Algebra: An Introduction to Abstract Mathematics. Undergraduate Texts in Mathematics. Springer New York, 2012.
  • [26] Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen. Knowledge graph embedding by translating on hyperplanes. In AAAI Conference on Artificial Intelligence, 2014.
  • [27] C. Xu and R. Li. Relation embedding with dihedral group in knowledge graph. In Proceedings of ACL, pages 263–272, 2019.
  • [28] Yankai Lin, Zhiyuan Liu, Maosong Sun, Yang Liu, and Xuan Zhu. Learning entity and relation embeddings for knowledge graph completion. In Proceedings of the 29th AAAI Conference on Artificial Intelligence, pages 2181–2187, 2015.
  • [29] Shuai Zhang, Yi Tay, Lina Yao, and Qi Liu. Quaternion knowledge graph embedding. In Advances in Neural Information Processing Systems, pages 2731–2741, 2019.

Appendix A Proof of a property of transitive relation

In Section 3.2 of the submitted paper, we use the conclusion that “the transitive relation can be represented as the union of transitive closures of all transitive chains.” We prove it here as the following lemma.

Lemma 1.

A transitive relation can be represented as the union of transitive closures of all transitive chains.


Any given transitive relation $r$ can be represented as a directed graph $G$ which satisfies that, for any vertices $e_i$ and $e_j$, if there is a path from $e_i$ to $e_j$, then there is an edge connecting $e_i$ to $e_j$ directly. If there is a path from $e_i$ to $e_j$ whose length is larger than $1$ (i.e. $e_i \to e_{k_1} \to \dots \to e_{k_n} \to e_j$ with $n \ge 1$), then the edge connecting $e_i$ and $e_j$ directly can be derived through transitivity, i.e. the transitive chain $r(e_i, e_{k_1}), r(e_{k_1}, e_{k_2}), \dots, r(e_{k_n}, e_j)$ implies that $r(e_i, e_j)$. By removing every edge $(e_i, e_j)$ for which $e_i$ and $e_j$ are also connected by a path longer than $1$, we obtain a new graph $G'$.

For any instance $r(e_i, e_j)$, if it is an edge in $G'$, then $r(e_i, e_j)$ is a transitive chain itself and hence it is in the transitive closure of itself; otherwise, $r(e_i, e_j)$ is an edge removed from $G$, and hence there is a path from $e_i$ to $e_j$ in $G'$ whose length is larger than $1$, so $r(e_i, e_j)$ can be derived through transitivity based on the path, i.e. $r(e_i, e_j)$ is in the transitive closure of the path (transitive chain). Hence any instance of a transitive relation is in a transitive closure of a transitive chain. Thus, a transitive relation can be represented as the union of transitive closures of all transitive chains. ∎
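The construction in the proof can be checked on a small example. The following is a minimal sketch, not part of the paper's code: it assumes a finite, irreflexive, acyclic transitive relation stored as a set of pairs, and the helper names (`transitive_closure`, `reduce_edges`, `chains`, `chain_closure`) are hypothetical. It removes every edge implied by a longer path, enumerates the remaining transitive chains, and confirms that the union of their transitive closures recovers the original relation.

```python
def transitive_closure(edges):
    """Smallest transitive relation containing `edges` (fixpoint iteration)."""
    closure = set(edges)
    while True:
        implied = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if implied <= closure:
            return closure
        closure |= implied


def reduce_edges(relation):
    """Drop every edge (u, v) that is also implied by a path of length > 1.

    Because `relation` is assumed transitive, such a path exists exactly
    when some intermediate node w satisfies (u, w) and (w, v).
    """
    nodes = {x for edge in relation for x in edge}
    return {
        (u, v)
        for (u, v) in relation
        if not any((u, w) in relation and (w, v) in relation
                   for w in nodes - {u, v})
    }


def chains(edges):
    """Yield every directed path (transitive chain) of an acyclic edge set."""
    successors = {}
    for u, v in edges:
        successors.setdefault(u, []).append(v)

    def extend(path):
        yield path
        for nxt in successors.get(path[-1], []):
            yield from extend(path + [nxt])

    for u, v in edges:
        yield from extend([u, v])


def chain_closure(path):
    """Transitive closure of a single chain: all ordered pairs along it."""
    return {(path[i], path[j])
            for i in range(len(path)) for j in range(i + 1, len(path))}


# Toy transitive relation (illustrative data only).
base = {("a", "b"), ("b", "c"), ("c", "d"), ("a", "e")}
relation = transitive_closure(base)
reduced = reduce_edges(relation)

union_of_closures = set()
for path in chains(reduced):
    union_of_closures |= chain_closure(path)

print(sorted(reduced))
print(union_of_closures == relation)
```

On this toy input the reduced graph contains only the four base edges, and the union of the chain closures equals the full relation, as the lemma states.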

Appendix B Statistics and split of datasets.

The datasets used in our experiments are open-source and can be obtained from the source code of RotatE [20]. Table 5 shows the statistics of these datasets, where the numbers of training triples in the S1, S2, and S3 datasets of Countries are separated by ’/’.

| Dataset | Entities | Relations | Train | Valid | Test |
|---|---|---|---|---|---|
| FB15k-237 | 14,541 | 237 | 272,115 | 17,535 | 20,466 |
| WN18RR | 40,943 | 11 | 86,835 | 3,034 | 3,134 |
| YAGO3-10 | 123,182 | 37 | 1,079,040 | 5,000 | 5,000 |
| Countries | 271 | 2 | 985 / 1,063 / 1,111 | 24 | 24 |

Table 5: Statistics of datasets
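Statistics of this kind can be recomputed from the split files. The sketch below is illustrative only: it assumes each split is a list of tab-separated `head<TAB>relation<TAB>tail` lines, and the function names and toy triples are hypothetical rather than taken from the RotatE source code.

```python
def parse_triples(lines):
    """Parse tab-separated (head, relation, tail) lines, skipping blanks."""
    triples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        head, relation, tail = line.split("\t")
        triples.append((head, relation, tail))
    return triples


def dataset_statistics(splits):
    """Count distinct entities and relations, plus the triples per split."""
    entities, relations = set(), set()
    stats = {}
    for name, triples in splits.items():
        stats[name] = len(triples)
        for head, relation, tail in triples:
            entities.update((head, tail))
            relations.add(relation)
    stats["entities"] = len(entities)
    stats["relations"] = len(relations)
    return stats


# Toy in-memory splits standing in for train/valid/test files (made-up data).
splits = {
    "train": parse_triples(["a\tlocatedIn\tb", "b\tlocatedIn\tc",
                            "a\tneighborOf\td"]),
    "valid": parse_triples(["a\tlocatedIn\tc"]),
    "test": parse_triples(["d\tlocatedIn\tc", ""]),
}
stats = dataset_statistics(splits)
print(stats)
```

To apply it to real data, the lists of lines would be replaced by the contents of the corresponding split files.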

Appendix C Computational resources

Our model is implemented in Python 3.6 using PyTorch 1.1.0. Experiments are performed on a workstation with an Intel Xeon Gold 5118 2.30GHz CPU and an NVIDIA Tesla V100 16GB GPU.

Appendix D Hyper-parameter settings

We list the best hyper-parameter settings of Rot-Pro on the above datasets in Table 6. The settings of dimension and batch size are the same as in RotatE [20].

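| Dataset | Dimension | Batch size | Fixed margin | | | |
|---|---|---|---|---|---|---|
| FB15k-237 | 1000 | 1024 | 9.0 | 0.000001 | 1.5 | 0.001 |
| WN18RR | 500 | 512 | 4.0 | 0.000001 | 1.3 | 0.0003 |
| YAGO3-10 | 500 | 1024 | 16.0 | 0.000001 | 1.5 | 0.0005 |
| Countries | 500 | 512 | 0.1 | 0.000001 | 1.5 | 0.0005 |

Table 6: Hyper-parameter settings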

Appendix E Transitivity performance comparison with BoxE

The full expressiveness of BoxE refers to its ability to express a set of inference patterns that includes symmetry, anti-symmetry, inversion, composition, hierarchy, intersection, and mutual exclusion. However, it does not model or infer the transitivity pattern. To verify this, we conducted further experiments on the three sub-test sets S1, S2, and S3 that we sampled from YAGO3-10, as described in the paper. The experimental results are listed in Table 7.

| | S1 BoxE | S1 Rot-Pro | S2 BoxE | S2 Rot-Pro | S3 BoxE | S3 Rot-Pro |
|---|---|---|---|---|---|---|
| MR | .343 | .337 | .290 | .328 | .381 | .447 |
| Hit@1 | .255 | .247 | .262 | .235 | .349 | .337 |
| Hit@3 | .385 | .376 | .291 | .366 | .385 | .517 |
| Hit@10 | .504 | .512 | .342 | .522 | .439 | .626 |

Table 7: Link prediction result of BoxE and Rot-Pro on S1, S2, S3 test sets.
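For readers unfamiliar with rank-based link prediction metrics, the sketch below shows how Hit@k and a reciprocal-rank average are commonly computed from the rank of the correct entity for each test triple. It assumes the usual definitions; the function names and the ranks are made up for illustration, and the exact evaluation protocol behind Table 7 is not restated in this appendix.

```python
def hits_at_k(ranks, k):
    """Fraction of test triples whose correct entity is ranked in the top k."""
    return sum(1 for rank in ranks if rank <= k) / len(ranks)


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all test triples (ranks start at 1)."""
    return sum(1.0 / rank for rank in ranks) / len(ranks)


# Hypothetical ranks of the correct entity for four test triples.
ranks = [1, 2, 5, 20]

for k in (1, 3, 10):
    print(f"Hit@{k}: {hits_at_k(ranks, k):.3f}")
print(f"MRR: {mean_reciprocal_rank(ranks):.4f}")
```

Both quantities lie in [0, 1], and higher values indicate better ranking of the correct entities.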