1 Introduction
Knowledge graphs (KGs) live at the heart of many semantic applications (e.g., question answering, search, and natural language processing). KGs enable not only powerful relational reasoning but also the ability to learn structural representations. Reasoning with KGs has been an extremely productive research direction, with many innovations leading to improvements in many downstream applications. However, real-world KGs are usually incomplete. As such, completing KGs and predicting missing links between entities have gained growing interest. Learning low-dimensional representations of entities and relations for KGs is an effective solution for this task.
Learning KG embeddings in the complex space has been proven to be a highly effective inductive bias, largely owing to its intrinsic asymmetrical properties. This is demonstrated by the ComplEx embedding method which infers new relational triplets with the asymmetrical Hermitian product.
In this paper, we move beyond complex representations, exploring hypercomplex space for learning KG embeddings. More concretely, quaternion embeddings are utilized to represent entities and relations. Each quaternion embedding is a vector in the hypercomplex space with three imaginary components $\mathbf{i}, \mathbf{j}, \mathbf{k}$, as opposed to the standard complex space with a single real component and a single imaginary component $\mathbf{i}$. We propose a new scoring function, where the head entity is rotated by the relational quaternion embedding through the Hamilton product. This is followed by a quaternion inner product with the tail entity.
There are numerous benefits to this formulation. (1) The Hamilton operator provides a greater extent of expressiveness compared to the complex Hermitian operator and the inner product in Euclidean space. The Hamilton operator forges inter-latent interactions between all of the real and imaginary components, resulting in a highly expressive model. (2) Quaternion representations are highly desirable for parameterizing smooth rotation and spatial transformations in vector space. They are generally considered robust to shear/scaling noise and perturbations (i.e., numerically stable rotations) and avoid the problem of gimbal lock. Moreover, quaternion rotations have two planes of rotation (a plane of rotation is an abstract object used to describe or visualize rotations in space), while complex rotations only work on a single plane, giving the model more degrees of freedom. (3) Our QuatE framework subsumes the ComplEx method, concurrently inheriting its attractive properties such as its ability to model symmetry, antisymmetry and inversion. (4) Our model can match or even reduce the number of parameters while outperforming previous work.
Experimental results demonstrate that our method achieves state-of-the-art performance on four well-established knowledge graph completion benchmarks (WN18, FB15K, WN18RR, and FB15K237).
2 Related Work
Knowledge graph embeddings have attracted intense research focus in recent years, and a myriad of embedding methodologies have been proposed. We roughly divide previous work into translational models and semantic matching models based on the scoring function, i.e., the composition over head and tail entities and relations.
Translational methods, popularized by TransE (Bordes et al., 2013), are widely used embedding methods that interpret relation vectors as translations in vector space, i.e., $h + r \approx t$. A number of models aiming to improve TransE have been proposed subsequently. TransH (Wang et al., 2014) introduces relation-specific hyperplanes with a normal vector. TransR (Lin et al., 2015) further introduces relation-specific spaces by modelling entities and relations in distinct spaces with a shared projection matrix. TransD (Ji et al., 2015) uses independent projection vectors for each entity and relation and can reduce the amount of computation compared to TransR. TorusE (Ebisu and Ichise, 2018) defines embeddings and a distance function on a compact Lie group, the torus, and shows better accuracy and scalability. The recent state-of-the-art method, RotatE (Sun et al., 2019), proposes a rotation-based translational method with complex-valued embeddings.
On the other hand, semantic matching models include bilinear models, such as RESCAL (Nickel et al., 2011), DistMult (Yang et al., 2014), HolE (Nickel et al., 2016) and ComplEx (Trouillon et al., 2016), and neural-network-based models. These methods measure plausibility by matching the latent semantics of entities and relations. In RESCAL, each relation is represented with a square matrix, while DistMult replaces it with a diagonal matrix in order to reduce the complexity. SimplE (Kazemi and Poole, 2018) is also a simple yet effective bilinear approach for knowledge graph embedding. HolE explores holographic reduced representations and makes use of circular correlation to capture rich interactions between entities. ComplEx embeds entities and relations in complex space and utilizes the Hermitian product to model antisymmetric patterns, which has been shown to be immensely helpful in learning KG representations. The scoring function of ComplEx is isomorphic to that of HolE (Trouillon and Nickel, 2017). Neural-network-based methods have also been adopted: Neural Tensor Network (Socher et al., 2013) and ERMLP (Dong et al., 2014) are two representative methodologies. More recently, convolutional neural networks (Dettmers et al., 2018), graph convolutional networks (Schlichtkrull et al., 2018) and deep memory networks (Wang et al., 2018) have also shown promising performance on this task.
Different from previous work, QuatE takes advantage of quaternion representations (e.g., their geometrical meaning and rich representation capability) to enable rich and expressive semantic matching between head and tail entities, assisted by relational rotation quaternions. Our framework subsumes DistMult and ComplEx, with the capability to generalize to more advanced hypercomplex spaces. QuatE utilizes the concept of geometric rotation. Unlike RotatE, which has only one plane of rotation, QuatE has two planes of rotation. QuatE is a semantic matching model, while RotatE is a translational model. We also point out that the composition property introduced in TransE and RotatE can have detrimental effects on the KG embedding task.
Quaternions are a hypercomplex number system first described by Hamilton (Hamilton, 1844), with applications in a wide variety of areas including astronautics, robotics, computer visualisation, animation and special effects in movies, and navigation. Lately, quaternions have attracted attention in the field of machine learning. Quaternion recurrent neural networks (QRNNs) obtain better performance with fewer free parameters than traditional RNNs on the phoneme recognition task. Quaternion representations are also useful for enhancing the performance of convolutional neural networks on multiple tasks such as automatic speech recognition (Parcollet et al., ) and image classification (Gaudet and Maida, 2018; Parcollet et al., 2018a). Quaternion multilayer perceptrons (Parcollet et al., 2016) and quaternion autoencoders (Parcollet et al., 2017) also outperform standard MLPs and autoencoders. In a nutshell, the major motivation behind these models is that quaternions enable neural networks to encode latent inter- and intra-dependencies between multidimensional input features, thus leading to more compact interactions and better representation capability.
3 Hamilton’s Quaternions
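Quaternion (Hamilton, 1844) is a representative hypercomplex number system, extending the traditional complex number system to four-dimensional space. A quaternion $Q$ consists of one real component and three imaginary components, defined as $Q = a + b\mathbf{i} + c\mathbf{j} + d\mathbf{k}$, where $a, b, c, d$ are real numbers and $\mathbf{i}, \mathbf{j}, \mathbf{k}$ are imaginary units. $\mathbf{i}$, $\mathbf{j}$ and $\mathbf{k}$ are square roots of $-1$, satisfying Hamilton’s rules: $\mathbf{i}^2 = \mathbf{j}^2 = \mathbf{k}^2 = \mathbf{i}\mathbf{j}\mathbf{k} = -1$. More useful relations can be derived from these rules, such as $\mathbf{i}\mathbf{j} = \mathbf{k}$, $\mathbf{j}\mathbf{i} = -\mathbf{k}$, $\mathbf{j}\mathbf{k} = \mathbf{i}$, $\mathbf{k}\mathbf{i} = \mathbf{j}$, $\mathbf{k}\mathbf{j} = -\mathbf{i}$ and $\mathbf{i}\mathbf{k} = -\mathbf{j}$. Figure 1(b) shows the products of the quaternion imaginary units. Evidently, multiplication between imaginary units is non-commutative. Some widely used operations of quaternion algebra are introduced as follows:
Conjugate: The conjugate of a quaternion $Q$ is defined as $\bar{Q} = a - b\mathbf{i} - c\mathbf{j} - d\mathbf{k}$.
Norm: The norm of a quaternion is defined as $|Q| = \sqrt{a^2 + b^2 + c^2 + d^2}$.
Inner Product: The quaternion inner product between $Q_1 = a_1 + b_1\mathbf{i} + c_1\mathbf{j} + d_1\mathbf{k}$ and $Q_2 = a_2 + b_2\mathbf{i} + c_2\mathbf{j} + d_2\mathbf{k}$ is obtained by taking the inner products between corresponding scalar and imaginary components and summing up the four inner products:
$$Q_1 \cdot Q_2 = \langle a_1, a_2 \rangle + \langle b_1, b_2 \rangle + \langle c_1, c_2 \rangle + \langle d_1, d_2 \rangle \tag{1}$$
Hamilton Product (Quaternion Multiplication): The Hamilton product is composed of all the standard multiplications of factors in quaternions and follows the distributive law, defined as:
$$\begin{aligned} Q_1 \otimes Q_2 = {} & (a_1 a_2 - b_1 b_2 - c_1 c_2 - d_1 d_2) + (a_1 b_2 + b_1 a_2 + c_1 d_2 - d_1 c_2)\,\mathbf{i} \\ & + (a_1 c_2 - b_1 d_2 + c_1 a_2 + d_1 b_2)\,\mathbf{j} + (a_1 d_2 + b_1 c_2 - c_1 b_2 + d_1 a_2)\,\mathbf{k} \end{aligned} \tag{2}$$
which determines another quaternion. The Hamilton product is not commutative. Spatial rotations can be modelled with the quaternion Hamilton product. Multiplying a quaternion, $Q_2$, by another quaternion, $Q_1$, has the effect of scaling $Q_1$ by the magnitude of $Q_2$ followed by a special type of rotation in four dimensions. As such, we can also rewrite the above equation as:
$$Q_1 \otimes Q_2 = Q_1 \otimes |Q_2| \left( \frac{Q_2}{|Q_2|} \right) \tag{3}$$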
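As a minimal sketch of the operations above, the following pure-Python snippet represents a quaternion as a 4-tuple `(a, b, c, d)`. The function names (`conjugate`, `norm`, `inner`, `hamilton`) are illustrative choices, not part of the original work. It checks Hamilton's rules on the imaginary units and compares the norm of a product with the product of the norms.

```python
import math

def conjugate(q):
    """Conjugate a - bi - cj - dk of q = (a, b, c, d)."""
    a, b, c, d = q
    return (a, -b, -c, -d)

def norm(q):
    """Euclidean norm sqrt(a^2 + b^2 + c^2 + d^2)."""
    return math.sqrt(sum(x * x for x in q))

def inner(q1, q2):
    """Quaternion inner product: sum of component-wise products."""
    return sum(x * y for x, y in zip(q1, q2))

def hamilton(q1, q2):
    """Hamilton product q1 (x) q2; note that it is not commutative."""
    a1, b1, c1, d1 = q1
    a2, b2, c2, d2 = q2
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )

# Imaginary units i, j, k as quaternions with integer coefficients.
I, J, K = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)

print(hamilton(I, J), hamilton(J, I))  # (0, 0, 0, 1) (0, 0, 0, -1): ij = k, ji = -k

q1, q2 = (1.0, 2.0, 3.0, 4.0), (0.5, -1.0, 2.0, 0.25)
# The norm is multiplicative, which is why a unit quaternion acts as a pure rotation.
print(norm(hamilton(q1, q2)), norm(q1) * norm(q2))
```

The last line prints two values that agree up to floating-point error, illustrating the scaling-then-rotation reading of the Hamilton product.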
4 Method
4.1 Quaternion Representations for Knowledge Graph Embeddings
Suppose that we have a knowledge graph $\mathcal{G}$ consisting of $N$ entities and $M$ relations. $\mathcal{E}$ and $\mathcal{R}$ denote the sets of entities and relations, respectively. The training set consists of triplets $(h, r, t)$, where $h, t \in \mathcal{E}$ and $r \in \mathcal{R}$. We use $\Omega$ and $\Omega' = \mathcal{E} \times \mathcal{R} \times \mathcal{E} - \Omega$ to denote the set of observed triplets and the set of unobserved triplets, respectively. $Y_{hrt} \in \{-1, 1\}$ represents the corresponding label of the triplet $(h, r, t)$. The goal of knowledge graph embeddings is to embed entities and relations into a continuous low-dimensional space, while preserving graph relations and semantics.
In this paper, we propose learning effective representations for entities and relations with quaternions. We leverage the expressive rotational capability of quaternions. Unlike RotatE, which has only one plane of rotation (i.e., the complex plane, shown in Figure 1(a)), QuatE has two planes of rotation. Compared to Euler angles, quaternions avoid the problem of gimbal lock (loss of one degree of freedom). Quaternions are also more efficient and numerically stable than rotation matrices. The proposed method can be summarized in two steps: (1) rotate the head quaternion using the unit relation quaternion; (2) take the quaternion inner product between the rotated head quaternion and the tail quaternion to score each triplet.
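Quaternion Embeddings of Knowledge Graphs
More specifically, we use a quaternion matrix $Q \in \mathbb{H}^{N \times k}$ to denote the entity embeddings and $W \in \mathbb{H}^{M \times k}$ to denote the relation embeddings, where $k$ is the dimension of the embeddings. Given a triplet $(h, r, t)$, the head entity $h$ and the tail entity $t$ correspond to $Q_h = \{a_h + b_h\mathbf{i} + c_h\mathbf{j} + d_h\mathbf{k} : a_h, b_h, c_h, d_h \in \mathbb{R}^k\}$ and $Q_t = \{a_t + b_t\mathbf{i} + c_t\mathbf{j} + d_t\mathbf{k} : a_t, b_t, c_t, d_t \in \mathbb{R}^k\}$, respectively, while the relation $r$ is represented by $W_r = \{a_r + b_r\mathbf{i} + c_r\mathbf{j} + d_r\mathbf{k} : a_r, b_r, c_r, d_r \in \mathbb{R}^k\}$.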
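Hamilton-Product-Based Relational Rotation
We first normalize the relation quaternion $W_r$ to a unit quaternion $W_r^{\triangleleft} = p + q\mathbf{i} + u\mathbf{j} + v\mathbf{k}$ to eliminate the scaling effect by dividing $W_r$ by its norm:
$$W_r^{\triangleleft}(p, q, u, v) = \frac{W_r}{|W_r|} = \frac{a_r + b_r\mathbf{i} + c_r\mathbf{j} + d_r\mathbf{k}}{\sqrt{a_r^2 + b_r^2 + c_r^2 + d_r^2}} \tag{4}$$
We visualize a unit quaternion in Figure 1(c) by projecting it into 3D space. We keep the unit hypersphere which passes through $\mathbf{i}, \mathbf{j}, \mathbf{k}$ in place. The unit quaternion can be projected inside, onto, or outside the unit hypersphere depending on the value of its real part.
Secondly, we rotate the head entity $Q_h$ by taking the Hamilton product between it and $W_r^{\triangleleft}$:
$$\begin{aligned} Q_h'(a_h', b_h', c_h', d_h') = Q_h \otimes W_r^{\triangleleft} = {} & (a_h \circ p - b_h \circ q - c_h \circ u - d_h \circ v) \\ & + (a_h \circ q + b_h \circ p + c_h \circ v - d_h \circ u)\,\mathbf{i} \\ & + (a_h \circ u - b_h \circ v + c_h \circ p + d_h \circ q)\,\mathbf{j} \\ & + (a_h \circ v + b_h \circ u - c_h \circ q + d_h \circ p)\,\mathbf{k} \end{aligned} \tag{5}$$
where $\circ$ denotes the element-wise multiplication between two vectors. Right-multiplication by a unit quaternion is a right-isoclinic rotation on the quaternion $Q_h$. We can also swap $Q_h$ and $W_r^{\triangleleft}$ and do a left-isoclinic rotation, which does not fundamentally change the geometrical meaning. Isoclinic rotation is a special case of double plane rotation where the angles for each plane are equal.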
Scoring Function and Loss
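We apply the quaternion inner product as the scoring function:
$$\phi(h, r, t) = Q_h' \cdot Q_t = \langle a_h', a_t \rangle + \langle b_h', b_t \rangle + \langle c_h', c_t \rangle + \langle d_h', d_t \rangle \tag{6}$$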
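The two steps (normalize and rotate, then take the inner product) can be sketched in plain Python as follows. This is an illustrative sketch rather than the authors' implementation: an embedding is assumed to be a tuple of four lists of length $k$ (real part followed by the three imaginary parts), the function names are hypothetical, and every relation quaternion is assumed to be nonzero.

```python
import math

def normalize(w):
    """Divide each of the k relation quaternions by its norm (assumed nonzero)."""
    norms = [math.sqrt(a * a + b * b + c * c + d * d) for a, b, c, d in zip(*w)]
    return tuple([v / n for v, n in zip(comp, norms)] for comp in w)

def rotate(head, w_unit):
    """Element-wise Hamilton product of the head with the unit relation."""
    rows = []
    for ah, bh, ch, dh, p, q, u, v in zip(*head, *w_unit):
        rows.append((
            ah * p - bh * q - ch * u - dh * v,
            ah * q + bh * p + ch * v - dh * u,
            ah * u - bh * v + ch * p + dh * q,
            ah * v + bh * u - ch * q + dh * p,
        ))
    # Transpose back to four component lists of length k.
    return tuple(list(col) for col in zip(*rows))

def score(head, rel, tail):
    """Inner product between the rotated head and the tail."""
    rotated = rotate(head, normalize(rel))
    return sum(x * y for rc, tc in zip(rotated, tail) for x, y in zip(rc, tc))

# Toy embeddings with k = 2 (values are arbitrary).
head = ([1.0, 0.5], [0.0, -1.0], [2.0, 0.0], [0.0, 0.25])
rel = ([2.0, 0.0], [0.0, 3.0], [0.0, 0.0], [0.0, 4.0])
tail = ([0.5, 1.0], [1.0, 0.0], [0.0, -2.0], [1.5, 0.5])
print(score(head, rel, tail))
```

Because the relation is normalized first, rescaling `rel` by any positive constant leaves the score unchanged, and the rotation preserves the norm of each head quaternion.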
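Following Trouillon et al. (2016), we formulate the task as a classification problem, and the model parameters are learned by minimizing the following regularized logistic loss:
$$L(Q, W) = \sum_{r(h,t) \in \Omega \cup \Omega^-} \log\left(1 + \exp\left(-Y_{hrt}\,\phi(h, r, t)\right)\right) + \lambda_1 \|Q\|_2^2 + \lambda_2 \|W\|_2^2 \tag{7}$$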
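A minimal sketch of the regularized logistic loss in Equation (7) is given below. It assumes squared $\ell_2$ regularization over flattened parameter lists and labels in $\{-1, +1\}$; the function names and the flat-list representation are illustrative assumptions, and a numerically stable softplus is used in place of a direct `log(1 + exp(x))`.

```python
import math

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def regularized_logistic_loss(scores, labels, entity_params, relation_params,
                              lambda1, lambda2):
    """Logistic loss over scored triplets plus squared L2 penalties.

    scores: model scores for observed and sampled negative triplets
    labels: +1 for observed triplets, -1 for sampled negatives
    entity_params / relation_params: flattened embedding coefficients
    """
    data_term = sum(softplus(-y * s) for s, y in zip(scores, labels))
    reg_q = sum(v * v for v in entity_params)
    reg_w = sum(v * v for v in relation_params)
    return data_term + lambda1 * reg_q + lambda2 * reg_w

# One positive triplet scored high, one negative triplet scored low.
print(regularized_logistic_loss([2.0, -1.5], [1, -1], [0.1, -0.2], [0.3], 0.05, 0.05))
```

A positive triplet with a large score, or a negative triplet with a very negative score, contributes almost nothing to the data term, so minimizing the loss pushes the two groups apart.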
In Equation (7), we use the $\ell_2$ norm with regularization rates $\lambda_1$ and $\lambda_2$ to regularize $Q$ and $W$, respectively. $\Omega^-$ is sampled from the unobserved set $\Omega'$ using the uniform negative sampling strategy. Note that the loss function is in Euclidean space, as we take the summation of all components when computing the scoring function in Equation (6). We utilise Adagrad (Duchi et al., 2011) for optimization.
Initialization
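We adopt the initialization algorithm in (Parcollet et al., 2018b), tailored for quaternion-valued networks, to speed up model efficiency and convergence (Glorot and Bengio, 2010). The initialization of entities and relations follows the rule:
$$w_{real} = \varphi \cos(\theta), \quad w_i = \varphi\, Q^{\triangleleft}_{img_i} \sin(\theta), \quad w_j = \varphi\, Q^{\triangleleft}_{img_j} \sin(\theta), \quad w_k = \varphi\, Q^{\triangleleft}_{img_k} \sin(\theta) \tag{8}$$
Here $w_{real}, w_i, w_j, w_k$ denote the scalar and imaginary coefficients, respectively. $\theta$ is randomly generated from the interval $[-\pi, \pi]$. $Q^{\triangleleft}_{img}$ is a normalized quaternion whose scalar part is zero. $\varphi$ is randomly generated from the interval $\left[-\frac{1}{\sqrt{2k}}, \frac{1}{\sqrt{2k}}\right]$, reminiscent of the He initialization (He et al., 2015).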
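The initialization rule can be sketched as below. This is a hedged illustration, not the reference implementation: it assumes the magnitude is drawn uniformly from $[-1/\sqrt{2k}, 1/\sqrt{2k}]$ and the angle uniformly from $[-\pi, \pi]$, and it draws the purely imaginary unit quaternion by normalizing three Gaussian samples, which is an assumption of this sketch. The function name is hypothetical.

```python
import math
import random

def init_quaternion_embedding(k, rng):
    """Return four coefficient lists (real, i, j, k parts), each of length k."""
    bound = 1.0 / math.sqrt(2 * k)
    real, img_i, img_j, img_k = [], [], [], []
    for _ in range(k):
        phi = rng.uniform(-bound, bound)          # magnitude
        theta = rng.uniform(-math.pi, math.pi)    # angle
        # Unit quaternion with zero scalar part (assumed Gaussian direction).
        while True:
            x, y, z = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
            n = math.sqrt(x * x + y * y + z * z)
            if n > 0.0:
                break
        x, y, z = x / n, y / n, z / n
        real.append(phi * math.cos(theta))
        img_i.append(phi * x * math.sin(theta))
        img_j.append(phi * y * math.sin(theta))
        img_k.append(phi * z * math.sin(theta))
    return real, img_i, img_j, img_k

rng = random.Random(0)
emb = init_quaternion_embedding(8, rng)
print([round(v, 4) for v in emb[0]])
```

By construction, the norm of each initialized quaternion equals the absolute value of its sampled magnitude, so it never exceeds $1/\sqrt{2k}$.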
Model  Scoring Function  Parameters  

TransE  
HolE  
DistMult  
ComplEx  
RotatE  
TorusE  
QuatE 
4.2 Discussion
Table 1 summarizes several popular knowledge graph embedding models, including scoring functions, parameters and time complexities. TransE, HolE and DistMult use Euclidean embeddings, while ComplEx and RotatE operate in the complex space. In contrast, our model operates in the quaternion space.
Capability in Modeling Symmetry, Antisymmetry and Inversion. The flexibility and representational power of quaternions enable us to model major relation patterns with ease. Similar to ComplEx, our model can model both symmetric and antisymmetric relations. The symmetry property of QuatE can be proved by setting the imaginary parts of $W_r$ to zero. One can easily check that the scoring function is antisymmetric when the imaginary parts are nonzero.
As for the inversion pattern, we can utilize the conjugation of quaternions. Conjugation is an involution and is its own inverse. One can easily check that:
$$Q_h \otimes W_r^{\triangleleft} \cdot Q_t = Q_t \otimes \bar{W}_r^{\triangleleft} \cdot Q_h \tag{9}$$
The detailed proof of antisymmetry and inversion can be found in the appendix.
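The three patterns can also be checked numerically. The sketch below uses a single quaternion per entity and relation ($k = 1$) with arbitrary example values; the helper names are illustrative, and a numerical check of this kind complements the proof rather than replacing it.

```python
import math

def hamilton(q1, q2):
    a1, b1, c1, d1 = q1
    a2, b2, c2, d2 = q2
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )

def conjugate(q):
    a, b, c, d = q
    return (a, -b, -c, -d)

def unit(q):
    n = math.sqrt(sum(x * x for x in q))
    return tuple(x / n for x in q)

def score(h, r, t):
    """Rotate h by the unit relation quaternion, then take the inner product with t."""
    rotated = hamilton(h, unit(r))
    return sum(x * y for x, y in zip(rotated, t))

h = (0.3, -1.2, 0.7, 2.0)
t = (1.1, 0.4, -0.6, 0.9)
r = (0.5, 1.5, -2.0, 0.25)       # general relation, nonzero imaginary parts
r_real = (2.0, 0.0, 0.0, 0.0)    # relation with zero imaginary parts

print("symmetric when imaginary parts are zero:",
      math.isclose(score(h, r_real, t), score(t, r_real, h)))
print("antisymmetric in general:",
      not math.isclose(score(h, r, t), score(t, r, h)))
print("inversion via conjugation:",
      math.isclose(score(h, r, t), score(t, conjugate(r), h)))
```

All three lines print `True` for these values: swapping head and tail leaves the score unchanged only for the real-valued relation, while conjugating the relation restores equality in the general case.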
As for the composition patterns, both TransE and RotatE have fixed composition methods (Sun et al., 2019). TransE composes two relations using addition ($r_1 + r_2$) and RotatE uses the Hadamard product ($r_1 \circ r_2$). We argue that it is unreasonable to fix the composition patterns, as there might exist multiple composition patterns even in a single knowledge graph. For example, suppose there are three persons $x, y, z$. If $y$ is the elder sister (denoted as $r_1$) of $x$ and $z$ is the elder brother (denoted as $r_2$) of $y$, we can easily infer that $z$ is the elder brother of $x$. The relation between $z$ and $x$ is $r_2$ instead of $r_1 + r_2$ or $r_1 \circ r_2$, violating the two composition methods of TransE and RotatE.
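Connection to DistMult and ComplEx. Quaternions have more degrees of freedom compared to complex numbers. Here we show that the QuatE framework can be seen as a generalization of ComplEx. If we set the coefficients of the imaginary units $\mathbf{j}$ and $\mathbf{k}$ to zero, we get complex embeddings as in ComplEx, and the Hamilton product degrades to complex number multiplication. We further remove the normalization of the relational quaternion, obtaining the following equation:
$$\begin{aligned} \phi(h, r, t) & = Q_h \otimes W_r \cdot Q_t = (a_h + b_h\mathbf{i}) \otimes (a_r + b_r\mathbf{i}) \cdot (a_t + b_t\mathbf{i}) \\ & = \left[(a_h \circ a_r - b_h \circ b_r) + (a_h \circ b_r + b_h \circ a_r)\,\mathbf{i}\right] \cdot (a_t + b_t\mathbf{i}) \\ & = \langle a_r, a_h, a_t \rangle + \langle a_r, b_h, b_t \rangle + \langle b_r, a_h, b_t \rangle - \langle b_r, b_h, a_t \rangle \end{aligned} \tag{10}$$
where $\langle a, b, c \rangle = \sum_i a_i b_i c_i$ denotes the standard component-wise multilinear dot product. Equation (10) recovers the form of ComplEx. This framework brings another mathematical interpretation for ComplEx instead of just taking the real part of the Hermitian product. Another interesting finding is that the Hermitian product is not necessary to formulate the scoring function of ComplEx.
If we remove the imaginary parts of all quaternions and remove the normalization step, the scoring function becomes $\phi(h, r, t) = \langle a_h, a_r, a_t \rangle$, degrading to DistMult in this case.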
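The reduction to ComplEx and DistMult can be verified numerically. The sketch below is illustrative only: it evaluates the unnormalized quaternion score with the $\mathbf{j}$ and $\mathbf{k}$ coefficients set to zero and compares it with the real part of the Hermitian product computed with Python's built-in complex numbers. The helper names and example values are hypothetical.

```python
import math

def quat_score_unnormalized(head, rel, tail):
    """Element-wise Hamilton product of head and relation, then inner product with tail."""
    total = 0.0
    for ah, bh, ch, dh, ar, br, cr, dr, at, bt, ct, dt in zip(*head, *rel, *tail):
        total += (ah * ar - bh * br - ch * cr - dh * dr) * at
        total += (ah * br + bh * ar + ch * dr - dh * cr) * bt
        total += (ah * cr - bh * dr + ch * ar + dh * br) * ct
        total += (ah * dr + bh * cr - ch * br + dh * ar) * dt
    return total

def complex_score(head, rel, tail):
    """ComplEx score: real part of sum_i w_i * h_i * conj(t_i)."""
    total = 0.0
    for ah, bh, ar, br, at, bt in zip(head[0], head[1], rel[0], rel[1], tail[0], tail[1]):
        total += (complex(ar, br) * complex(ah, bh) * complex(at, bt).conjugate()).real
    return total

zeros = [0.0, 0.0, 0.0]
head = ([1.0, -2.0, 0.5], [0.5, 1.0, -1.0], zeros, zeros)
rel = ([2.0, 0.0, 1.0], [-1.0, 3.0, 0.5], zeros, zeros)
tail = ([0.0, 1.0, 2.0], [1.0, -1.0, 0.5], zeros, zeros)

print(quat_score_unnormalized(head, rel, tail), complex_score(head, rel, tail))

# Dropping all imaginary parts leaves the DistMult trilinear product.
head_r, rel_r, tail_r = (head[0], zeros, zeros, zeros), (rel[0], zeros, zeros, zeros), (tail[0], zeros, zeros, zeros)
distmult = sum(h * r * t for h, r, t in zip(head[0], rel[0], tail[0]))
print(quat_score_unnormalized(head_r, rel_r, tail_r), distmult)
```

Each printed pair agrees up to floating-point error, matching the algebra in Equation (10).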
Octonions. Apart from quaternions, we can also extend our framework to Octonions (hypercomplex numbers with one real part and seven imaginary parts) and even Sedenions (hypercomplex numbers with one real part and fifteen imaginary parts). For completeness, we provide the details and results of Octonion embeddings in the appendix.
5 Experiments and Results
5.1 Experimental Setup
Datasets Description: We conducted experiments on four widely used benchmarks, WN18, FB15K, WN18RR and FB15K237, whose statistics are summarized in Table 2. WN18 (Bordes et al., 2013) is extracted from WordNet (https://wordnet.princeton.edu/), a lexical database for the English language, where words are interlinked by means of conceptual-semantic and lexical relations. WN18RR (Dettmers et al., 2018) is a subset of WN18, with inverse relations removed. FB15K (Bordes et al., 2013) contains relation triples from Freebase, a large tuple database with structured general human knowledge. FB15K237 (Toutanova and Chen, 2015) is a subset of FB15K, with inverse relations removed.
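Table 2: Statistics of the datasets ($N$: number of entities, $M$: number of relations).

| Dataset | N | M | #training | #validation | #test | avg. #degree |
|---|---|---|---|---|---|---|
| WN18 | 40943 | 18 | 141442 | 5000 | 5000 | 3.45 |
| WN18RR | 40943 | 11 | 86835 | 3034 | 3134 | 2.19 |
| FB15K | 14951 | 1345 | 483142 | 50000 | 59071 | 32.31 |
| FB15K237 | 14541 | 237 | 272115 | 17535 | 20466 | 18.71 |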
Evaluation Protocol: Three popular evaluation metrics are used: Mean Rank (MR), Mean Reciprocal Rank (MRR) and Hit ratio with cut-off values $n = 1, 3, 10$. MR measures the average rank of all correct entities, with a lower value representing better performance. MRR is the average inverse rank of the correct entities. Hit@n measures the proportion of correct entities ranked in the top $n$. Following Bordes et al. (2013), filtered results are reported to avoid possibly flawed evaluation.
Baselines: We compared QuatE with a number of strong baselines. For Translational Distance Models, we reported TransE (Bordes et al., 2013) and two recent extensions, TorusE (Ebisu and Ichise, 2018) and RotatE (Sun et al., 2019). For Semantic Matching Models, we reported DistMult (Yang et al., 2014), HolE (Nickel et al., 2016), ComplEx (Trouillon et al., 2016), SimplE (Kazemi and Poole, 2018), ConvE (Dettmers et al., 2018), RGCN (Schlichtkrull et al., 2018) and NKGE (ConvE based) (Wang et al., 2018).
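The ranking metrics of the evaluation protocol (MR, MRR and Hit@n, in the filtered setting) can be sketched as follows. This is an illustrative implementation under simple assumptions: higher scores mean more plausible triplets, ties are broken in favour of the correct entity, and the function names are hypothetical.

```python
def filtered_rank(scores, true_index, known_true=()):
    """Rank of the correct candidate, ignoring other candidates known to be true.

    scores: one score per candidate entity
    true_index: index of the correct entity for this test triplet
    known_true: indices of other entities that also form observed triplets
    """
    skip = set(known_true)
    target = scores[true_index]
    better = sum(
        1 for i, s in enumerate(scores)
        if i != true_index and i not in skip and s > target
    )
    return better + 1

def mean_rank(ranks):
    return sum(ranks) / len(ranks)

def mean_reciprocal_rank(ranks):
    return sum(1.0 / r for r in ranks) / len(ranks)

def hits_at(ranks, n):
    return sum(1 for r in ranks if r <= n) / len(ranks)

# Candidate 0 outranks the target but is itself a known true answer, so it is filtered out.
print(filtered_rank([0.9, 0.8, 0.7, 0.1], true_index=2, known_true=[0]))  # 2

ranks = [1, 2, 4, 10]
print(mean_rank(ranks), hits_at(ranks, 3), hits_at(ranks, 10))  # 4.25 0.5 1.0
print(mean_reciprocal_rank(ranks))
```

Filtering matters because a high-scoring candidate that is itself a correct answer should not count against the model.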
Implementation Details: We implemented our model using PyTorch (https://pytorch.org/) and tested it on a single GPU. The hyperparameters are determined by grid search. The best models are selected by early stopping on the validation set. The embedding size is tuned amongst . Regularization rate and are searched in . Learning rate is fixed to without further tuning. The number of negatives () per training sample is selected from . We create batches for all the datasets. For fair comparison, techniques such as reciprocal relations (adding inverse relations for each triplet; Lacroix et al., 2018) and self-adversarial negative sampling (Sun et al., 2019) are out of the scope of this paper and are not used. For most baselines, we report the results in the original papers, and exceptions are provided with references. For RotatE (without self-adversarial negative sampling), we use the best hyperparameter settings provided in the paper to reproduce the results. We also report the results of RotatE with self-adversarial negative sampling and denote it as aRotatE.
5.2 Results
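Table 3: Link prediction results on WN18 and FB15K.

| Model | WN18 MR | WN18 MRR | WN18 Hit@10 | WN18 Hit@3 | WN18 Hit@1 | FB15K MR | FB15K MRR | FB15K Hit@10 | FB15K Hit@3 | FB15K Hit@1 |
|---|---|---|---|---|---|---|---|---|---|---|
| TransE | - | 0.495 | 0.943 | 0.888 | 0.113 | - | 0.463 | 0.749 | 0.578 | 0.297 |
| DistMult | 655 | 0.797 | 0.946 | - | - | 42.2 | 0.798 | 0.893 | - | - |
| HolE | - | 0.938 | 0.949 | 0.945 | 0.930 | - | 0.524 | 0.739 | 0.759 | 0.599 |
| ComplEx | - | 0.941 | 0.947 | 0.945 | 0.936 | - | 0.692 | 0.840 | 0.759 | 0.599 |
| ConvE | 374 | 0.943 | 0.956 | 0.946 | 0.935 | 51 | 0.657 | 0.831 | 0.723 | 0.558 |
| RGCN+ | - | 0.819 | 0.964 | 0.929 | 0.697 | - | 0.696 | 0.842 | 0.760 | 0.601 |
| SimplE | - | 0.942 | 0.947 | 0.944 | 0.939 | - | 0.727 | 0.838 | 0.773 | 0.660 |
| NKGE | 336 | 0.947 | 0.957 | 0.949 | 0.942 | 56 | 0.73 | 0.871 | 0.790 | 0.650 |
| TorusE | - | 0.947 | 0.954 | 0.950 | 0.943 | - | 0.733 | 0.832 | 0.771 | 0.674 |
| RotatE | 184 | 0.947 | 0.961 | 0.953 | 0.938 | 32 | 0.699 | 0.872 | 0.788 | 0.585 |
| aRotatE | 309 | 0.949 | 0.959 | 0.952 | 0.944 | 40 | 0.797 | 0.884 | 0.830 | 0.746 |
| QuatE | 162 | 0.950 | 0.959 | 0.954 | 0.945 | 17 | 0.782 | 0.900 | 0.835 | 0.711 |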
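Table 4: Link prediction results on WN18RR and FB15K237.

| Model | WN18RR MR | WN18RR MRR | WN18RR Hit@10 | WN18RR Hit@3 | WN18RR Hit@1 | FB15K237 MR | FB15K237 MRR | FB15K237 Hit@10 | FB15K237 Hit@3 | FB15K237 Hit@1 |
|---|---|---|---|---|---|---|---|---|---|---|
| TransE | 3384 | 0.226 | 0.501 | - | - | 357 | 0.294 | 0.465 | - | - |
| DistMult | 5110 | 0.43 | 0.49 | 0.44 | 0.39 | 254 | 0.241 | 0.419 | 0.263 | 0.155 |
| ComplEx | 5261 | 0.44 | 0.51 | 0.46 | 0.41 | 339 | 0.247 | 0.428 | 0.275 | 0.158 |
| ConvE | 4187 | 0.43 | 0.52 | 0.44 | 0.40 | 244 | 0.325 | 0.501 | 0.356 | 0.237 |
| RGCN+ | - | - | - | - | - | - | 0.249 | 0.417 | 0.264 | 0.151 |
| NKGE | 4170 | 0.45 | 0.526 | 0.465 | 0.421 | 237 | 0.33 | 0.510 | 0.365 | 0.241 |
| RotatE | 3277 | 0.470 | 0.565 | 0.488 | 0.422 | 185 | 0.297 | 0.480 | 0.328 | 0.205 |
| aRotatE | 3340 | 0.476 | 0.571 | 0.492 | 0.428 | 177 | 0.338 | 0.533 | 0.375 | 0.241 |
| QuatE | 2314 | 0.488 | 0.582 | 0.508 | 0.438 | 87 | 0.348 | 0.550 | 0.382 | 0.248 |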
The empirical results on four datasets are reported in Table 3 and Table 4. QuatE performs extremely competitively compared to the existing state-of-the-art models across all metrics. As a quaternion-valued method, QuatE outperforms the two representative complex-valued models ComplEx and RotatE. The performance gains over RotatE also confirm the advantages of quaternion rotation over rotation in the complex plane. Although we do not use self-adversarial negative sampling for QuatE, it still outperforms aRotatE.
Table 5: MRR for each relation on WN18RR.

| Relation Name | RotatE | QuatE |
|---|---|---|
| hypernym | 0.148 | 0.173 |
| derivationally_related_form | 0.947 | 0.953 |
| instance_hypernym | 0.318 | 0.364 |
| also_see | 0.585 | 0.629 |
| member_meronym | 0.232 | 0.232 |
| synset_domain_topic_of | 0.341 | 0.468 |
| has_part | 0.184 | 0.233 |
| member_of_domain_usage | 0.318 | 0.441 |
| member_of_domain_region | 0.200 | 0.193 |
| verb_group | 0.943 | 0.924 |
| similar_to | 1.000 | 1.000 |
On the WN18 dataset, QuatE outperforms all the baselines on all metrics except Hit@10. RGCN+ achieves a high value on Hit@10, yet is surpassed by most models on the other four metrics. The four recent models NKGE, TorusE, RotatE, and aRotatE achieve comparable results. QuatE also achieves the best results on the FB15K dataset on MR, Hit@10 and Hit@3, while the second-best results are scattered amongst RotatE, aRotatE and DistMult. We are well aware of the good results of DistMult reported in (Kadlec et al., 2017), yet they used a very large negative sampling size. QuatE outperforms RotatE on all metrics on FB15K. The results also demonstrate that QuatE can effectively capture the symmetry, antisymmetry and inversion patterns, since they account for a large portion of the relations in these two datasets.
As shown in Table 4, QuatE achieves a large performance gain over existing state-of-the-art models on the two datasets where trivial inverse relations are removed. On WN18RR, in which there are a number of symmetric relations, aRotatE is the second best, while other baselines are relatively weaker. The key competitors on FB15K237, where a large number of composition patterns exist, are NKGE and aRotatE. Table 5 summarizes the MRR for each relation on WN18RR, confirming the superior representation capability of quaternions in modelling different types of relations.
Methods with fixed composition patterns such as TransE and RotatE are relatively weak at times. We also observe a large margin on MR for all datasets: QuatE roughly halves the second-best MR on the two Freebase datasets. We conclude that QuatE can rank ground-truth triplets higher on average.
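Table 6: Number of free parameters.

| Model | TorusE | RotatE | QuatE |
|---|---|---|---|
| Space | $\mathbb{T}^k$ | $\mathbb{C}^k$ | $\mathbb{H}^k$ |
| WN18 | 409.61M | 40.95M | 40.96M |
| FB15K | 162.96M | 31.25M | 26.08M |
| WN18RR | - | 40.95M | 16.38M |
| FB15K237 | - | 29.32M | 5.82M |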
5.3 Model Analysis
Number of Free Parameters Comparison. Table 6 compares the number of parameters of QuatE with two recent competitive baselines: RotatE and TorusE. TorusE uses a very large embedding dimension for both WN18 and FB15K. This number is even close to the number of entities in FB15K, which we think is not preferable since our original intention is to embed entities and relations into a lower-dimensional space. QuatE largely reduces the parameter size of the complex-valued counterpart RotatE (and aRotatE), saving up to roughly 80% of the parameters (29.32M versus 5.82M on FB15K237) while maintaining superior performance.
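Table 7: Analysis on different variants of the scoring function. The same hyperparameters as QuatE are used.

| Analysis | WN18 MRR | WN18 Hit@10 | FB15K MRR | FB15K Hit@10 | WN18RR MRR | WN18RR Hit@10 | FB15K237 MRR | FB15K237 Hit@10 |
|---|---|---|---|---|---|---|---|---|
| $Q_h \otimes W_r \cdot Q_t$ | 0.936 | 0.951 | 0.686 | 0.866 | 0.415 | 0.482 | 0.272 | 0.463 |
| $W_r \cdot (Q_h \otimes Q_t)$ | 0.784 | 0.945 | 0.599 | 0.809 | 0.401 | 0.471 | 0.263 | 0.446 |
| $(Q_h \otimes W_r^{\triangleleft}) \cdot (Q_t \otimes V_r^{\triangleleft})$ | 0.947 | 0.958 | 0.787 | 0.889 | 0.477 | 0.563 | 0.344 | 0.539 |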
Ablation Study on Quaternion Normalization. We remove the normalization step in QuatE and use the original relation quaternion $W_r$ to project the head entity. From Table 7, we clearly observe that normalizing the relation to a unit quaternion is a critical step for the embedding performance. This is likely because scaling effects in non-unit quaternions are detrimental.
Hamilton Products between Head and Tail Entities. We reformulate the scoring function of QuatE following the original formulation of ComplEx. We take the Hamilton product between the head and tail quaternions and consider the relation quaternion as a weight. Thus, we have $\phi(h, r, t) = W_r \cdot (Q_h \otimes Q_t)$. As a result, the geometric property of relational rotation is lost, which leads to poor performance as shown in Table 7.
Additional Rotational Quaternion for Tail Entity. We hypothesize that adding an additional relation quaternion to the tail entity might bring the model more representation capability. So we revise the scoring function to $(Q_h \otimes W_r^{\triangleleft}) \cdot (Q_t \otimes V_r^{\triangleleft})$, where $V_r$ represents the rotational quaternion for the tail entity. From Table 7, we observe that it achieves competitive results without extensive tuning. However, it might cause some loss of efficiency.
6 Conclusion
In this paper, we design a new knowledge graph embedding model which operates on the quaternion space with well-defined mathematical and physical meaning. Our model is advantageous in its capability to model several key relation patterns, its expressiveness with higher degrees of freedom, and its good generalization. Empirical evaluations on four well-established datasets show that QuatE achieves overall state-of-the-art performance, outperforming multiple recent strong baselines with even fewer free parameters.
References
 Bordes et al. [2013] Antoine Bordes, Nicolas Usunier, Alberto GarciaDuran, Jason Weston, and Oksana Yakhnenko. Translating embeddings for modeling multirelational data. In Advances in neural information processing systems, pages 2787–2795, 2013.

 Dettmers et al. [2018] Tim Dettmers, Pasquale Minervini, Pontus Stenetorp, and Sebastian Riedel. Convolutional 2d knowledge graph embeddings. In ThirtySecond AAAI Conference on Artificial Intelligence, 2018.
 Dong et al. [2014] Xin Dong, Evgeniy Gabrilovich, Geremy Heitz, Wilko Horn, Ni Lao, Kevin Murphy, Thomas Strohmann, Shaohua Sun, and Wei Zhang. Knowledge vault: A webscale approach to probabilistic knowledge fusion. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 601–610. ACM, 2014.
 Duchi et al. [2011] John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12(Jul):2121–2159, 2011.
 Ebisu and Ichise [2018] Takuma Ebisu and Ryutaro Ichise. Toruse: Knowledge graph embedding on a lie group. In ThirtySecond AAAI Conference on Artificial Intelligence, 2018.
 Gaudet and Maida [2018] Chase J Gaudet and Anthony S Maida. Deep quaternion networks. In 2018 International Joint Conference on Neural Networks (IJCNN), pages 1–8. IEEE, 2018.
 Glorot and Bengio [2010] Xavier Glorot and Yoshua Bengio. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the thirteenth international conference on artificial intelligence and statistics, pages 249–256, 2010.
 Hamilton [1844] William Rowan Hamilton. Lxxviii. on quaternions; or on a new system of imaginaries in algebra: To the editors of the philosophical magazine and journal. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 25(169):489–495, 1844.

 He et al. [2015] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving deep into rectifiers: Surpassing humanlevel performance on imagenet classification. In Proceedings of the IEEE international conference on computer vision, pages 1026–1034, 2015.
 Ji et al. [2015] Guoliang Ji, Shizhu He, Liheng Xu, Kang Liu, and Jun Zhao. Knowledge graph embedding via dynamic mapping matrix. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), volume 1, pages 687–696, 2015.
 Kadlec et al. [2017] Rudolf Kadlec, Ondrej Bajgar, and Jan Kleindienst. Knowledge base completion: Baselines strike back. ACL 2017, page 69, 2017.
 Kazemi and Poole [2018] Seyed Mehran Kazemi and David Poole. Simple embedding for link prediction in knowledge graphs. In Advances in Neural Information Processing Systems, pages 4289–4300, 2018.
 Lacroix et al. [2018] Timothee Lacroix, Nicolas Usunier, and Guillaume Obozinski. Canonical tensor decomposition for knowledge base completion. In International Conference on Machine Learning, pages 2869–2878, 2018.
 Lin et al. [2015] Yankai Lin, Zhiyuan Liu, Maosong Sun, Yang Liu, and Xuan Zhu. Learning entity and relation embeddings for knowledge graph completion. In Twentyninth AAAI conference on artificial intelligence, 2015.
 Nguyen et al. [2017] Dai Quoc Nguyen, Tu Dinh Nguyen, Dat Quoc Nguyen, and Dinh Phung. A novel embedding model for knowledge base completion based on convolutional neural network. arXiv preprint arXiv:1712.02121, 2017.
 Nickel et al. [2011] Maximilian Nickel, Volker Tresp, and HansPeter Kriegel. A threeway model for collective learning on multirelational data. In ICML, volume 11, pages 809–816, 2011.
 Nickel et al. [2016] Maximilian Nickel, Lorenzo Rosasco, and Tomaso Poggio. Holographic embeddings of knowledge graphs. In Thirtieth Aaai conference on artificial intelligence, 2016.
 Parcollet et al. [2016] T. Parcollet, M. Morchid, P. Bousquet, R. Dufour, G. Linarès, and R. De Mori. Quaternion neural networks for spoken language understanding. In 2016 IEEE Spoken Language Technology Workshop (SLT), pages 362–368, Dec 2016. doi: 10.1109/SLT.2016.7846290.
 [19] Titouan Parcollet, Ying Zhang, Mohamed Morchid, Chiheb Trabelsi, Georges Linarès, Renato De Mori, and Yoshua Bengio. Quaternion convolutional neural networks for endtoend automatic speech recognition. arXiv preprint arXiv:1806.07789.
 Parcollet et al. [2017] Titouan Parcollet, Mohamed Morchid, and Georges Linarès. Quaternion denoising encoderdecoder for theme identification of telephone conversations. In INTERSPEECH, 2017.
 Parcollet et al. [2018a] Titouan Parcollet, Mohamed Morchid, and Georges Linarès. Quaternion convolutional neural networks for heterogeneous image processing. CoRR, abs/1811.02656, 2018a. URL http://arxiv.org/abs/1811.02656.
 Parcollet et al. [2018b] Titouan Parcollet, Mirco Ravanelli, Mohamed Morchid, Georges Linarès, Chiheb Trabelsi, Renato De Mori, and Yoshua Bengio. Quaternion recurrent neural networks. The International Conference on Learning Representations, abs/1806.04418, 2018b.
 Schlichtkrull et al. [2018] Michael Schlichtkrull, Thomas N Kipf, Peter Bloem, Rianne Van Den Berg, Ivan Titov, and Max Welling. Modeling relational data with graph convolutional networks. In European Semantic Web Conference, pages 593–607. Springer, 2018.
 Socher et al. [2013] Richard Socher, Danqi Chen, Christopher D Manning, and Andrew Ng. Reasoning with neural tensor networks for knowledge base completion. In Advances in neural information processing systems, pages 926–934, 2013.
 Sun et al. [2019] Zhiqing Sun, ZhiHong Deng, JianYun Nie, and Jian Tang. Rotate: Knowledge graph embedding by relational rotation in complex space. 2019.
 Toutanova and Chen [2015] Kristina Toutanova and Danqi Chen. Observed versus latent features for knowledge base and text inference. In Proceedings of the 3rd Workshop on Continuous Vector Space Models and their Compositionality, pages 57–66, 2015.
 Trouillon and Nickel [2017] Théo Trouillon and Maximilian Nickel. Complex and holographic embeddings of knowledge graphs: a comparison. arXiv preprint arXiv:1707.01475, 2017.
 Trouillon et al. [2016] Théo Trouillon, Johannes Welbl, Sebastian Riedel, Éric Gaussier, and Guillaume Bouchard. Complex embeddings for simple link prediction. In International Conference on Machine Learning, pages 2071–2080, 2016.
 Wang et al. [2018] Kai Wang, Yu Liu, Xiujuan Xu, and Dan Lin. Knowledge graph embedding with entity neighbors and deep memory network. arXiv preprint arXiv:1808.03752, 2018.
 Wang et al. [2014] Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen. Knowledge graph embedding by translating on hyperplanes. In TwentyEighth AAAI conference on artificial intelligence, 2014.
 Yang et al. [2014] Bishan Yang, Wentau Yih, Xiaodong He, Jianfeng Gao, and Li Deng. Embedding entities and relations for learning and inference in knowledge bases. arXiv preprint arXiv:1412.6575, 2014.
7 Appendix
7.1 Proof of Antisymmetry and Inversion
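Proof of antisymmetry pattern
In order to prove the antisymmetry pattern, we need to prove the following inequality when the imaginary components are nonzero:
$$Q_h \otimes W_r^{\triangleleft} \cdot Q_t \neq Q_t \otimes W_r^{\triangleleft} \cdot Q_h \tag{11}$$
Writing $W_r^{\triangleleft} = p + q\mathbf{i} + u\mathbf{j} + v\mathbf{k}$ and $\langle x, y, z \rangle = \sum_i x_i y_i z_i$, we first expand the left term:
$$\begin{aligned} Q_h \otimes W_r^{\triangleleft} \cdot Q_t = {} & \langle a_h, p, a_t \rangle - \langle b_h, q, a_t \rangle - \langle c_h, u, a_t \rangle - \langle d_h, v, a_t \rangle \\ & + \langle a_h, q, b_t \rangle + \langle b_h, p, b_t \rangle + \langle c_h, v, b_t \rangle - \langle d_h, u, b_t \rangle \\ & + \langle a_h, u, c_t \rangle - \langle b_h, v, c_t \rangle + \langle c_h, p, c_t \rangle + \langle d_h, q, c_t \rangle \\ & + \langle a_h, v, d_t \rangle + \langle b_h, u, d_t \rangle - \langle c_h, q, d_t \rangle + \langle d_h, p, d_t \rangle \end{aligned}$$
We then expand the right term:
$$\begin{aligned} Q_t \otimes W_r^{\triangleleft} \cdot Q_h = {} & \langle a_t, p, a_h \rangle - \langle b_t, q, a_h \rangle - \langle c_t, u, a_h \rangle - \langle d_t, v, a_h \rangle \\ & + \langle a_t, q, b_h \rangle + \langle b_t, p, b_h \rangle + \langle c_t, v, b_h \rangle - \langle d_t, u, b_h \rangle \\ & + \langle a_t, u, c_h \rangle - \langle b_t, v, c_h \rangle + \langle c_t, p, c_h \rangle + \langle d_t, q, c_h \rangle \\ & + \langle a_t, v, d_h \rangle + \langle b_t, u, d_h \rangle - \langle c_t, q, d_h \rangle + \langle d_t, p, d_h \rangle \end{aligned}$$
We can easily see that those two terms are not equal, as the signs of some terms are not the same. For example, $\langle a_h, q, b_t \rangle$ enters the left term with a positive sign and the right term with a negative sign.
Proof of inversion pattern
To prove the inversion pattern, we need to prove that:
$$Q_h \otimes W_r^{\triangleleft} \cdot Q_t = Q_t \otimes \bar{W}_r^{\triangleleft} \cdot Q_h \tag{12}$$
With $\bar{W}_r^{\triangleleft} = p - q\mathbf{i} - u\mathbf{j} - v\mathbf{k}$, we expand the right term:
$$\begin{aligned} Q_t \otimes \bar{W}_r^{\triangleleft} \cdot Q_h = {} & \langle a_t, p, a_h \rangle + \langle b_t, q, a_h \rangle + \langle c_t, u, a_h \rangle + \langle d_t, v, a_h \rangle \\ & - \langle a_t, q, b_h \rangle + \langle b_t, p, b_h \rangle - \langle c_t, v, b_h \rangle + \langle d_t, u, b_h \rangle \\ & - \langle a_t, u, c_h \rangle + \langle b_t, v, c_h \rangle + \langle c_t, p, c_h \rangle - \langle d_t, q, c_h \rangle \\ & - \langle a_t, v, d_h \rangle - \langle b_t, u, d_h \rangle + \langle c_t, q, d_h \rangle + \langle d_t, p, d_h \rangle \end{aligned}$$
Comparing term by term with the expansion of $Q_h \otimes W_r^{\triangleleft} \cdot Q_t$ (the left term in the proof of antisymmetry), we can easily check the equality of these two terms.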
7.2 Hyperparameters Settings
We list the best hyperparameter settings of QuatE on the benchmark datasets:

WN18:

FB15K:

WN18RR:

FB15K237:
7.3 Octonion for Knowledge Graph embedding
As mentioned in Section 4, we can generalize our framework to the Octonion space. Here, we use OctonionE to denote this method; details are given in the following text.
Octonions are hypercomplex numbers with seven imaginary components. The Octonion algebra, or Cayley algebra, defines operations between Octonion numbers. An Octonion is represented in the form $O_1 = x_0 + x_1\mathbf{e}_1 + x_2\mathbf{e}_2 + \dots + x_7\mathbf{e}_7$, where $\mathbf{e}_1, \dots, \mathbf{e}_7$ are imaginary units which are square roots of $-1$. The multiplication rules are encoded in the Fano Plane (shown in Figure 2). Multiplying two neighboring elements on a line results in the third element on that same line. Moving with the arrows gives a positive answer and moving against the arrows gives a negative answer.
The conjugate of an Octonion is defined as: $\bar{O}_1 = x_0 - x_1\mathbf{e}_1 - x_2\mathbf{e}_2 - \dots - x_7\mathbf{e}_7$.
The norm of an Octonion is defined as: $|O_1| = \sqrt{x_0^2 + x_1^2 + \dots + x_7^2}$.
Given another Octonion $O_2 = y_0 + y_1\mathbf{e}_1 + y_2\mathbf{e}_2 + \dots + y_7\mathbf{e}_7$, we derive the multiplication rule with the Fano Plane.
(13) 
We can also consider Octonions as a combination of two Quaternions. The scoring function of OctonionE remains the same as that of QuatE, with octonion-valued embeddings:
$$\phi(h, r, t) = Q_h \otimes W_r^{\triangleleft} \cdot Q_t, \quad Q_h, W_r, Q_t \in \mathbb{O}^k \tag{14}$$
The results of OctonionE on the WN18 and WN18RR datasets are given below. We observe that OctonionE performs on par with QuatE. It seems that extending the model to the Octonion space does not give additional benefits. Octonions lose some algebraic properties such as associativity, which might bring some side effects to the model.
| Dataset | Model | MR | MRR | Hit@10 | Hit@3 | Hit@1 |
|---|---|---|---|---|---|---|
| WN18 | OctonionE | 182 | 0.950 | 0.959 | 0.954 | 0.944 |
| WN18RR | OctonionE | 2098 | 0.486 | 0.582 | 0.508 | 0.435 |
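To illustrate the view of an Octonion as a pair of Quaternions and the loss of associativity noted above, the sketch below builds the octonion product with one common form of the Cayley-Dickson construction, $(a, b)(c, d) = (ac - \bar{d}b,\; da + b\bar{c})$. The sign convention of the Fano Plane in Figure 2 may differ from this one, and all names here are illustrative assumptions.

```python
def q_mul(q1, q2):
    """Hamilton product of two quaternions given as 4-tuples."""
    a1, b1, c1, d1 = q1
    a2, b2, c2, d2 = q2
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )

def q_conj(q):
    a, b, c, d = q
    return (a, -b, -c, -d)

def q_add(q1, q2):
    return tuple(x + y for x, y in zip(q1, q2))

def q_sub(q1, q2):
    return tuple(x - y for x, y in zip(q1, q2))

def o_mul(x, y):
    """Cayley-Dickson product of octonions stored as pairs of quaternions."""
    (a, b), (c, d) = x, y
    first = q_sub(q_mul(a, c), q_mul(q_conj(d), b))
    second = q_add(q_mul(d, a), q_mul(b, q_conj(c)))
    return (first, second)

def o_norm_sq(x):
    """Squared norm: sum of squares of all eight coefficients."""
    return sum(v * v for part in x for v in part)

ZERO, ONE = (0, 0, 0, 0), (1, 0, 0, 0)
I, J, K = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
e1, e2, e4 = (I, ZERO), (J, ZERO), (ZERO, ONE)  # three imaginary octonion units

left = o_mul(o_mul(e1, e2), e4)
right = o_mul(e1, o_mul(e2, e4))
print(left)   # ((0, 0, 0, 0), (0, 0, 0, 1))
print(right)  # ((0, 0, 0, 0), (0, 0, 0, -1))
```

The two groupings differ by a sign, so the product is not associative, while the norm remains multiplicative (the squared norm of a product equals the product of the squared norms).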