1 Introduction
Embeddings of knowledge graphs have received significant attention due to their excellent performance on tasks such as link prediction and entity resolution. In this short paper, we compare two state-of-the-art knowledge graph embeddings whose equivalence has recently been established, namely ComplEx and HolE [Nickel, Rosasco, and Poggio, 2016; Trouillon et al., 2016; Hayashi and Shimbo, 2017]. First, we briefly review both models and discuss how their scoring functions are equivalent. We then analyze the discrepancy between the results reported in the original articles, and show experimentally that it is likely due to the use of different loss functions. In further experiments, we evaluate the ability of both models to embed symmetric and antisymmetric patterns. Finally, we discuss advantages and disadvantages of both models and the conditions under which one would be preferable to the other.
2 Equivalence of Complex and Holographic Embeddings
In this section, we will briefly review Holographic and Complex embeddings and discuss the equivalence of their scoring functions.
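Let $\mathcal{G} = (\mathcal{E}, \mathcal{R}, \mathcal{T})$ be a knowledge graph, which consists of entities $\mathcal{E}$, relation types $\mathcal{R}$ and observed triples $\mathcal{T} \subseteq \mathcal{E} \times \mathcal{R} \times \mathcal{E}$. Furthermore, let $\mathcal{D} = \{(x_i, y_i)\}_i$ be a training set, which associates with each possible triple $x_i \in \mathcal{E} \times \mathcal{R} \times \mathcal{E}$ its truth value $y_i \in \{-1, +1\}$. That is, for a possible triple $x_i = (s, p, o)$ with $s, o \in \mathcal{E}$ and $p \in \mathcal{R}$ it holds that
$$y_i = \begin{cases} +1, & \text{if } x_i \in \mathcal{T} \\ -1, & \text{otherwise.} \end{cases}$$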
For knowledge graphs with a large number of possible triples, we employ negative sampling as proposed by Bordes et al. [2013]. The objective of knowledge graph completion is then to learn a scoring function $\phi_p(s, o)$ for any $s, o \in \mathcal{E}$ and $p \in \mathcal{R}$ which predicts the truth value of possible triples. We will write and .
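For notational convenience, we define the trilinear product of three complex vectors as:
$$\langle a, b, c \rangle = \sum_{j} a_j b_j c_j = a^\top (b \odot c)$$
where $a, b, c \in \mathbb{C}^K$, and $\odot$ denotes the Hadamard product, i.e. the element-wise product between two vectors of the same length.
In the following, we will consider the discrete Fourier transform (DFT) of purely real vectors only: $\mathcal{F} : \mathbb{R}^K \to \mathbb{C}^K$. For $j \in \{0, \ldots, K-1\}$:
$$\mathcal{F}(x)_j = \sum_{k=0}^{K-1} x_k \, e^{-2i\pi \frac{jk}{K}} \tag{1}$$
where $\mathcal{F}(x)_j \in \mathbb{C}$ is the $j$-th value in the resulting complex vector $\mathcal{F}(x) \in \mathbb{C}^K$. Note that the components in Equation 1 are indexed from 0 to $K-1$.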
Holographic Embeddings
The holographic embeddings model (HolE) [Nickel, Rosasco, and Poggio, 2016] represents relations and entities with real-valued embeddings $r_p, e_s, e_o \in \mathbb{R}^K$, and scores a triple $(s, p, o)$ with the dot product between the embedding of the relation $p$ and the circular correlation $\star$ of the embeddings of entities $s$ and $o$:
$$\phi^h_p(s, o) = r_p^\top (e_s \star e_o) \tag{2}$$
The circular correlation can be written with the discrete Fourier transform (DFT),
$$e_s \star e_o = \mathcal{F}^{-1}\left(\overline{\mathcal{F}(e_s)} \odot \mathcal{F}(e_o)\right) \tag{3}$$
where $\mathcal{F}^{-1}$ is the inverse DFT. In this case, the embedding vectors are real, $r_p, e_s, e_o \in \mathbb{R}^K$, and so is the result of the inverse DFT, since the circular correlation of real-valued vectors results in a real-valued vector.
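As a minimal sketch of Equations 2 and 3, the following NumPy snippet computes the circular correlation both from its definition and through the DFT, and checks that the two agree. The function names are hypothetical and chosen for illustration only; `np.fft.fft` uses the same sign convention as Equation 1.

```python
import numpy as np

def circular_correlation_naive(a, b):
    """Circular correlation from its definition:
    result[k] = sum_i a[i] * b[(i + k) mod K], in O(K^2) time."""
    K = len(a)
    return np.array([sum(a[i] * b[(i + k) % K] for i in range(K))
                     for k in range(K)])

def circular_correlation_fft(a, b):
    """Same vector via the DFT (Equation 3), in O(K log K) time.
    For real inputs the imaginary part is zero up to rounding."""
    return np.fft.ifft(np.conj(np.fft.fft(a)) * np.fft.fft(b)).real

def hole_score(r_p, e_s, e_o):
    """HolE score (Equation 2) for real-valued embeddings."""
    return float(np.dot(r_p, circular_correlation_fft(e_s, e_o)))

rng = np.random.default_rng(0)
r_p, e_s, e_o = rng.standard_normal((3, 8))  # three random vectors, K = 8
assert np.allclose(circular_correlation_naive(e_s, e_o),
                   circular_correlation_fft(e_s, e_o))
print(hole_score(r_p, e_s, e_o))
```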
Complex Embeddings
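The complex embeddings model (ComplEx) [Trouillon et al., 2016, 2017] represents relations and entities with complex-valued embeddings $r_p, e_s, e_o \in \mathbb{C}^K$, and scores a triple $(s, p, o)$ with the real part of the trilinear product of the corresponding embeddings:
$$\phi^c_p(s, o) = \mathrm{Re}\left(\langle r_p, e_s, \overline{e_o} \rangle\right) \tag{4}$$
where $r_p, e_s, e_o \in \mathbb{C}^K$ are complex vectors, and $\overline{e_o}$ is the complex conjugate of the vector $e_o$.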
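The ComplEx score can be sketched in a few lines of NumPy. The second function expands the real part of the trilinear product into four real-valued trilinear products, which makes the linear cost in the embedding dimension explicit. Function names are hypothetical.

```python
import numpy as np

def complex_score(r_p, e_s, e_o):
    """ComplEx score: real part of the trilinear product <r_p, e_s, conj(e_o)>."""
    return float(np.real(np.sum(r_p * e_s * np.conj(e_o))))

def complex_score_real_arithmetic(r_p, e_s, e_o):
    """Same score written with real-valued operations only."""
    rr, ri = r_p.real, r_p.imag
    sr, si = e_s.real, e_s.imag
    o_r, o_i = e_o.real, e_o.imag
    return float(np.sum(rr * sr * o_r) + np.sum(rr * si * o_i)
                 + np.sum(ri * sr * o_i) - np.sum(ri * si * o_r))

r_p = np.array([1 + 1j, 2 + 0j])
e_s = np.array([1 + 0j, 0 + 1j])
e_o = np.array([0 + 1j, 1 + 0j])
# Both formulations give the same value for this triple of embeddings.
assert np.isclose(complex_score(r_p, e_s, e_o),
                  complex_score_real_arithmetic(r_p, e_s, e_o))
```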
Equivalence
The equivalence of HolE and ComplEx has recently been shown by Hayashi and Shimbo [2017]. In the following, we briefly discuss this equivalence of both models and how it can be derived. For completeness, a full proof similar to that of Hayashi and Shimbo [2017] is included in Appendix A.
First, to derive the connection between HolE and ComplEx, consider Parseval’s Theorem:
Theorem 1.
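Suppose $x, y \in \mathbb{R}^K$ are real vectors. Then $x^\top y = \frac{1}{K} \mathcal{F}(x)^\top \overline{\mathcal{F}(y)}$.
Using Theorem 1 as well as Equations 3 and 2, we can then rewrite the scoring function of HolE as:
$$\phi^h_p(s, o) = r_p^\top (e_s \star e_o) = r_p^\top \mathcal{F}^{-1}\left(\overline{\mathcal{F}(e_s)} \odot \mathcal{F}(e_o)\right) = \frac{1}{K} \mathcal{F}(r_p)^\top \overline{\left(\overline{\mathcal{F}(e_s)} \odot \mathcal{F}(e_o)\right)} \tag{5}$$
$$= \frac{1}{K} \left\langle \mathcal{F}(r_p), \mathcal{F}(e_s), \overline{\mathcal{F}(e_o)} \right\rangle \tag{6}$$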
Hence, for HolE we could directly learn complex embeddings instead of learning real-valued embeddings and mapping them into the frequency domain and back. However, to ensure that the trilinear product of these complex embeddings is a real number, we would either need to enforce on them the same symmetry constraints that arise from the DFTs or, alternatively, take only the real part of the trilinear product. We show in Appendix A that these are two ways of performing the same operation, which establishes that the scoring functions of ComplEx and HolE are equivalent up to a constant factor.
Furthermore, both models have equal memory complexity: the equivalent complex vectors are half as long (see the proof in Appendix A) but, at a given floating-point precision, require twice as much memory as real-valued vectors of the same length. However, the complex formulation of the scoring function reduces the time complexity from $\mathcal{O}(K \log K)$ (quasilinear) to $\mathcal{O}(K)$ (linear).
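The rewriting of the HolE score in the frequency domain can be checked numerically. The sketch below, with hypothetical function names, compares the time-domain HolE score with the trilinear product of the DFTs divided by the embedding dimension, and also checks Parseval's theorem in the form stated in Theorem 1.

```python
import numpy as np

def hole_score_time(r_p, e_s, e_o):
    """HolE score via circular correlation computed with FFT and inverse FFT."""
    corr = np.fft.ifft(np.conj(np.fft.fft(e_s)) * np.fft.fft(e_o)).real
    return float(np.dot(r_p, corr))

def hole_score_freq(r_p, e_s, e_o):
    """HolE score as (1/K) times the trilinear product of the DFTs."""
    F = np.fft.fft
    tri = np.sum(F(r_p) * F(e_s) * np.conj(F(e_o)))
    # The conjugate symmetry of DFTs of real vectors makes the sum real.
    assert abs(tri.imag) < 1e-8 * max(1.0, abs(tri))
    return float(tri.real) / len(r_p)

rng = np.random.default_rng(0)
for K in (7, 16):
    r_p, e_s, e_o = rng.standard_normal((3, K))
    # Parseval: x^T y = (1/K) F(x)^T conj(F(y)) for real x, y.
    parseval = np.sum(np.fft.fft(r_p) * np.conj(np.fft.fft(e_s))).real / K
    assert np.isclose(np.dot(r_p, e_s), parseval)
    assert np.isclose(hole_score_time(r_p, e_s, e_o),
                      hole_score_freq(r_p, e_s, e_o))
```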
3 Loss Functions & Predictive Abilities
The experimental results of HolE and ComplEx as reported by Nickel, Rosasco, and Poggio [2016] and Trouillon et al. [2016] agreed on the WN18 data set, but diverged significantly on FB15K [Bordes et al., 2014], although both scoring functions are equivalent. Since the main difference in the experimental settings was the use of different loss functions (margin loss versus logistic loss), we analyze in this section whether the discrepancy in results can be attributed to this fact. For this purpose, we implemented both loss functions for the complex representation within the same framework, and compared the results on the WN18 and FB15K data sets.
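First, note that in both data sets, only positive training triples are provided. Negative examples are generated by corrupting the subject or object entity of each positive triple, as described in Bordes et al. [2013]. In the original HolE publication [Nickel, Rosasco, and Poggio, 2016], a pairwise margin loss is optimized over each positive $(s, p, o)$ and its corrupted negative $(s', p, o')$:
$$\mathcal{L}(\Theta) = \sum_{\left((s,p,o),\,(s',p,o')\right)} \max\left(0,\; \gamma + \sigma\left(\phi_p(s', o')\right) - \sigma\left(\phi_p(s, o)\right)\right) \tag{7}$$
where $\Theta$ denotes the embedding parameters, $\gamma$ is the margin hyperparameter, and $\sigma$ the standard logistic function. The entity embeddings are also constrained to unit norm for all entities in $\mathcal{E}$.
In Trouillon et al. [2016], by contrast, the generated negatives are merged into the training set $\Omega$ at each batch sampling, and the negative log-likelihood is optimized with $L^2$ regularization:
$$\mathcal{L}(\Theta) = \sum_{\left((s,p,o),\, y\right) \in \Omega} \log\left(1 + \exp\left(-y\, \phi_p(s, o)\right)\right) + \lambda \lVert \Theta \rVert_2^2 \tag{8}$$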
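The two objectives can be sketched as follows, given precomputed scores. This is an illustrative NumPy version rather than the implementation used in the experiments: the function names are hypothetical, the margin loss assumes one negative score aligned with each positive score, and the regularizer is assumed to be the squared $L^2$ norm of all parameters.

```python
import numpy as np

def sigmoid(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + np.exp(-x))

def margin_loss(pos_scores, neg_scores, gamma):
    """Pairwise margin loss: neg_scores[i] is the score of the corrupted
    triple generated for the positive triple with score pos_scores[i]."""
    return float(np.sum(np.maximum(
        0.0, gamma + sigmoid(neg_scores) - sigmoid(pos_scores))))

def neg_log_likelihood(scores, labels, params, lam):
    """Logistic loss over positives and negatives with labels in {-1, +1},
    plus lam times the squared L2 norm of all parameter arrays (assumed form)."""
    data_term = np.sum(np.logaddexp(0.0, -labels * scores))  # log(1 + exp(-y * phi))
    reg_term = lam * sum(np.sum(np.abs(p) ** 2) for p in params)
    return float(data_term + reg_term)

pos = np.array([2.0, 0.5])
neg = np.array([-1.0, 0.7])
print(margin_loss(pos, neg, gamma=0.5))
print(neg_log_likelihood(np.concatenate([pos, neg]),
                         np.array([1.0, 1.0, -1.0, -1.0]),
                         params=[np.ones(4)], lam=0.01))
```

Using `np.logaddexp` avoids overflow in the exponential for large negative margins.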
Optimization is conducted with stochastic gradient descent, AdaGrad [Duchi, Hazan, and Singer, 2011], and early stopping, as described in Trouillon et al. [2016]. A single corrupted negative triple is generated for each positive training triple. The results are reported for the best validated models after grid search on the following values: $K \in \{10, 20, 50, 100, 150, 200\}$, $\lambda \in \{0.1, 0.03, 0.01, 0.003, 0.001, 0.0003, 0.0\}$ for the log-likelihood loss, and $\gamma \in \{0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0\}$ for the max-margin loss. The raw and filtered mean reciprocal ranks (MRR), as well as the filtered hits at 1, 3 and 10, are reported in Table 1.

Table 1: Results on WN18 and FB15K for the margin loss and the negative log-likelihood (NegLL) loss.

| Loss | WN18 MRR (filtered) | WN18 MRR (raw) | WN18 Hits at 1 | WN18 Hits at 3 | WN18 Hits at 10 | FB15K MRR (filtered) | FB15K MRR (raw) | FB15K Hits at 1 | FB15K Hits at 3 | FB15K Hits at 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Margin | 0.938 | 0.605 | 0.932 | 0.942 | 0.949 | 0.541 | 0.298 | 0.411 | 0.627 | 0.757 |
| NegLL | 0.941 | 0.587 | 0.936 | 0.945 | 0.947 | 0.639 | 0.250 | 0.523 | 0.725 | 0.825 |
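For readers unfamiliar with the metrics in Table 1, the sketch below shows one common way to compute raw and filtered ranks, MRR and hits at $k$ for a single query. It assumes the usual convention of Bordes et al. [2013], in which the filtered setting removes all other candidates known to be true before ranking; the function names and the toy scores are hypothetical.

```python
import numpy as np

def rank_of_true(scores, true_idx, known_true=()):
    """Rank (1 = best) of candidate true_idx by descending score.
    known_true lists other candidates known to be correct; they are
    removed before ranking (filtered setting). Empty means raw ranking."""
    scores = np.asarray(scores, dtype=float)
    keep = np.ones(len(scores), dtype=bool)
    keep[np.array(list(known_true), dtype=int)] = False
    keep[true_idx] = True  # never filter out the candidate being ranked
    return int(np.sum(scores[keep] > scores[true_idx])) + 1

def mean_reciprocal_rank(ranks):
    return float(np.mean(1.0 / np.asarray(ranks, dtype=float)))

def hits_at(ranks, k):
    return float(np.mean(np.asarray(ranks) <= k))

scores = [0.9, 0.8, 0.7, 0.1]  # scores of four candidate entities
raw = rank_of_true(scores, true_idx=2)
filtered = rank_of_true(scores, true_idx=2, known_true=[0])
print(raw, filtered)
```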
The margin loss results are consistent with the HolE results originally reported in Nickel, Rosasco, and Poggio [2016], which confirms the equivalence of the scoring functions and supports the hypothesis that the loss was responsible for the difference in previously reported results. The log-likelihood results are also coherent: the higher scores reported on FB15K in Trouillon et al. [2016] are due to the use of more than one generated negative sample for each positive training triple. Here, we generated a single negative sample for each positive one in order to keep the comparison between the two losses fair. The max-margin loss achieves a better raw MRR (rankings without removing the training samples) on both data sets, but much worse filtered metrics on FB15K, suggesting that this loss can be more prone to overfitting.
4 Scoring Function & Symmetry
The results in Section 3 suggest that the choice of scoring function, i.e., ComplEx or HolE, does not affect the predictive abilities of the model. An additional important question is whether one of the models is, in practice, better suited for modeling certain types of relations. In particular, for symmetric relations, HolE needs to learn embeddings for which the imaginary part after the DFT is close to zero. ComplEx, on the other hand, can learn such representations easily as it operates directly in the complex domain. Whether this difference between the models translates into differences in practice matters for the learning of both symmetric and antisymmetric relations. Relations are symmetric when triples have the same truth value under permutation of the subject and object entities: $y_{(s,p,o)} = y_{(o,p,s)}$ for all $s, o \in \mathcal{E}$, whereas facts of antisymmetric relations have inverse truth values: $y_{(s,p,o)} = -y_{(o,p,s)}$. To evaluate this question experimentally, we reproduced the joint learning of synthetic symmetric and antisymmetric relations described in Trouillon et al. [2016] with both scoring functions. We used the log-likelihood loss as all negatives are observed.
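The role of the imaginary part can be illustrated directly on the ComplEx scoring function: a relation embedding that is purely real yields a score that is symmetric in subject and object, while a purely imaginary one yields an antisymmetric score. The snippet below is a small numerical check of this property on random embeddings; variable names are hypothetical.

```python
import numpy as np

def complex_score(r_p, e_s, e_o):
    """ComplEx score: real part of the trilinear product <r_p, e_s, conj(e_o)>."""
    return float(np.real(np.sum(r_p * e_s * np.conj(e_o))))

rng = np.random.default_rng(0)
K = 5
e_a = rng.standard_normal(K) + 1j * rng.standard_normal(K)
e_b = rng.standard_normal(K) + 1j * rng.standard_normal(K)

r_sym = rng.standard_normal(K) + 0j    # purely real relation embedding
r_anti = 1j * rng.standard_normal(K)   # purely imaginary relation embedding

# Swapping subject and object leaves the score unchanged for r_sym ...
assert np.isclose(complex_score(r_sym, e_a, e_b), complex_score(r_sym, e_b, e_a))
# ... and flips its sign for r_anti.
assert np.isclose(complex_score(r_anti, e_a, e_b), -complex_score(r_anti, e_b, e_a))
```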
We randomly generated a symmetric matrix and an antisymmetric matrix. Jointly, they represent a tensor with one slice per relation. To ensure that all test values are predictable, the upper triangular parts of the matrices are always kept in the training set, and the diagonals are unobserved. We conducted 5-fold cross-validation on the lower-triangular matrices, using the upper-triangular parts plus 3 folds for training, one fold for validation and one fold for testing. The regularization parameter is validated among the same values as in the previous experiment.
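A possible way to generate such synthetic data is sketched below. Several choices here are assumptions made for illustration and are not taken from the text: the number of entities `n` is a free parameter, entries are drawn uniformly from $\{-1, +1\}$, unobserved diagonal entries are encoded as 0, and the relation index is placed on the first axis of the array.

```python
import numpy as np

def make_symmetric_antisymmetric(n, rng):
    """Return an array of shape (2, n, n): slice 0 is a random symmetric
    sign matrix, slice 1 a random antisymmetric one. Diagonals are 0,
    which stands for 'unobserved' in this sketch."""
    upper_sym = np.triu(rng.choice([-1, 1], size=(n, n)), k=1)
    upper_anti = np.triu(rng.choice([-1, 1], size=(n, n)), k=1)
    sym = upper_sym + upper_sym.T      # mirror the upper triangle
    anti = upper_anti - upper_anti.T   # mirror with flipped sign
    return np.stack([sym, anti])

def lower_triangle_folds(n, n_folds, rng):
    """Split the strictly lower-triangular cells into n_folds random folds,
    each given as a pair (rows, cols) of index arrays."""
    rows, cols = np.tril_indices(n, k=-1)
    perm = rng.permutation(len(rows))
    return [(rows[f], cols[f]) for f in np.array_split(perm, n_folds)]

rng = np.random.default_rng(0)
tensor = make_symmetric_antisymmetric(10, rng)
folds = lower_triangle_folds(10, 5, rng)
# Upper triangles plus three folds would form the training set,
# with one fold for validation and one for testing.
train_folds, valid_fold, test_fold = folds[:3], folds[3], folds[4]
```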
Figure 1 shows the best cross-validated average precision (area under the precision-recall curve) for the two scoring functions for ranks ranging up to 50. Both models manage to perfectly model symmetry and antisymmetry. As the ComplEx model has twice as many parameters for a given rank, it reaches a perfect average precision at half the rank. This confirms that the representation of the scoring function does not affect the learning abilities of the models in practice.
5 Discussion
We have demonstrated that the scoring functions of the HolE and ComplEx models are directly proportional. This extends the existence property of the ComplEx model over all knowledge graphs [Trouillon et al., 2017] to the HolE model. We also showed experimentally that the difference between the reported results of the two models was due to the use of different loss functions, and specifically that the log-likelihood loss can produce a large improvement in predictive performance over the more often used margin loss. We have also shown that Complex and Holographic embeddings can be trained equally well on symmetric and antisymmetric patterns.
All these things being equal, an interesting question is then in which settings one of the two models is preferable. Complex embeddings have an advantage in terms of time complexity as they scale linearly with the embedding dimension, whereas Holographic embeddings scale quasilinearly. An advantage of Holographic embeddings, however, is that the embeddings remain strictly in the real domain, which makes it easier to use them in other real-valued machine learning models. In contrast, Complex embeddings cannot easily be transformed to real-valued vectors and used without loss of information, namely the specific way the real and imaginary parts interact in algebraic operations. Complex-valued models to which Complex embeddings can be directly input are emerging in machine learning [Trabelsi et al., 2017; Danihelka et al., 2016], but this path is yet to be explored for other relational learning problems. Hence, if the task of interest is link prediction, Complex embeddings offer an improved runtime complexity on the order of $\mathcal{O}(K)$. If the embeddings should be used in further machine learning models, e.g. for entity classification, Holographic embeddings provide better compatibility with existing real-valued methods.
Furthermore, while the choice of the loss is of little consequence on the WN18 data set, our experiments showed that the log-likelihood loss performed significantly better on FB15K. While much research attention has been given to scoring functions in link prediction, little has been said about the losses, and the max-margin loss has been used in most of the existing work [Bordes et al., 2013; Yang et al., 2015; Riedel et al., 2013]. An interesting direction for future work is therefore a more detailed study of loss functions for knowledge graph embeddings, especially in light of the highly skewed label distribution and the open-world assumption, which are characteristic of knowledge graphs but unusual in standard machine learning settings.
Acknowledgments
This work was supported in part by the Association Nationale de la Recherche et de la Technologie through the CIFRE grant 2014/0121.
Appendix A Proof of Equivalence
In this section, we provide the full proof for the equivalence of both models. Note that a similar proof has recently been derived by Hayashi and Shimbo [2017].
We start from Equation 5 and show that there always exist corresponding real-valued holographic embeddings and complex embeddings such that the scoring functions of HolE and ComplEx are directly proportional, i.e. they are mathematically equal up to a constant multiplier $c \in \mathbb{R}$: $\phi^h_p(s, o) = c\, \phi^c_p(s, o)$. The key idea is to show that the symmetry structure of vectors resulting from the Fourier transform of real-valued vectors is such that the trilinear product between these structured vectors is equal to the real part of the trilinear product of their first halves.
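First, we derive a property of the DFT on real vectors $x \in \mathbb{R}^K$, showing that the resulting complex vector $\mathcal{F}(x)$ has a partially symmetric structure, for $j \in \{1, \ldots, K-1\}$:
$$\mathcal{F}(x)_{K-j} = \sum_{k=0}^{K-1} x_k \, e^{-2i\pi \frac{(K-j)k}{K}} = \sum_{k=0}^{K-1} x_k \, e^{-2i\pi k} \, e^{2i\pi \frac{jk}{K}}$$
and given that $k$ is an integer, $e^{-2i\pi k} = 1$:
$$\mathcal{F}(x)_{K-j} = \sum_{k=0}^{K-1} x_k \, e^{2i\pi \frac{jk}{K}}$$
and since $x_k \in \mathbb{R}$,
$$\mathcal{F}(x)_{K-j} = \overline{\sum_{k=0}^{K-1} x_k \, e^{-2i\pi \frac{jk}{K}}} = \overline{\mathcal{F}(x)_j} \tag{9}$$
Two special cases arise. The first one is $j = 0$, which is not concerned by the above symmetry property:
$$s(x) := \mathcal{F}(x)_0 = \sum_{k=0}^{K-1} x_k \in \mathbb{R} \tag{10}$$
The second one is $j = K/2$ when $K$ is even:
$$t(x) := \mathcal{F}(x)_{K/2} = \sum_{k=0}^{K-1} x_k \, e^{-i\pi k} = \sum_{k=0}^{K-1} (-1)^k x_k \in \mathbb{R} \tag{11}$$
From Equations 11, 10 and 9, we write the general form of the Fourier transform $\mathcal{F}(x)$ of a real vector $x \in \mathbb{R}^K$:
$$\mathcal{F}(x) = \begin{cases} \left[\, s(x) \;\; x' \;\; \overline{{x'}^{\leftarrow}} \,\right] & \text{if } K \text{ is odd} \\ \left[\, s(x) \;\; x' \;\; t(x) \;\; \overline{{x'}^{\leftarrow}} \,\right] & \text{if } K \text{ is even} \end{cases} \tag{12}$$
where $x' = [\mathcal{F}(x)_1, \ldots, \mathcal{F}(x)_{\lceil K/2 \rceil - 1}]$, with $x' \in \mathbb{C}^{\lceil K/2 \rceil - 1}$, and ${x'}^{\leftarrow}$ is $x'$ in reversed order: ${x'}^{\leftarrow} = [x'_{\lceil K/2 \rceil - 1}, \ldots, x'_1]$.
We can then expand the trilinear product in Equation 6, first with $K$ being odd:
$$\begin{aligned} \frac{1}{K} \left\langle \mathcal{F}(r_p), \mathcal{F}(e_s), \overline{\mathcal{F}(e_o)} \right\rangle &= \frac{1}{K} \left( s(r_p)\, s(e_s)\, s(e_o) + \left\langle r'_p, e'_s, \overline{e'_o} \right\rangle + \left\langle \overline{{r'_p}^{\leftarrow}}, \overline{{e'_s}^{\leftarrow}}, {e'_o}^{\leftarrow} \right\rangle \right) \\ &= \frac{1}{K} \left( s(r_p)\, s(e_s)\, s(e_o) + \left\langle r'_p, e'_s, \overline{e'_o} \right\rangle + \overline{\left\langle r'_p, e'_s, \overline{e'_o} \right\rangle} \right) \\ &= \frac{1}{K} \left( s(r_p)\, s(e_s)\, s(e_o) + 2\, \mathrm{Re}\left( \left\langle r'_p, e'_s, \overline{e'_o} \right\rangle \right) \right) \\ &= \frac{2}{K}\, \mathrm{Re}\left( \left\langle r''_p, e''_s, \overline{e''_o} \right\rangle \right) \end{aligned} \tag{13}$$
where $x'' = \left[\, \frac{s(x)}{\sqrt[3]{2}} \;\; x' \,\right] \in \mathbb{C}^{\lceil K/2 \rceil}$. The derivation is similar when $K$ is even, with double prime vectors being $x'' = \left[\, \frac{s(x)}{\sqrt[3]{2}} \;\; x' \;\; \frac{t(x)}{\sqrt[3]{2}} \,\right] \in \mathbb{C}^{K/2 + 1}$.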
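The proportionality can be checked numerically. The sketch below maps each real embedding to the first half of its DFT and verifies that the HolE score equals $2/K$ times the ComplEx score of the mapped vectors, for one odd and one even $K$. The helper names are hypothetical, and dividing the real-valued first component (and, for even $K$, the component at index $K/2$) by the cube root of 2 is an assumption of this sketch: it is one way to make the three-way product contribute with weight one half.

```python
import numpy as np

def hole_score(r_p, e_s, e_o):
    """HolE score via FFT-based circular correlation."""
    corr = np.fft.ifft(np.conj(np.fft.fft(e_s)) * np.fft.fft(e_o)).real
    return float(np.dot(r_p, corr))

def complex_score(r_p, e_s, e_o):
    """ComplEx score: real part of the trilinear product <r_p, e_s, conj(e_o)>."""
    return float(np.real(np.sum(r_p * e_s * np.conj(e_o))))

def to_complex_embedding(x):
    """Map a real vector of length K to a complex vector of length K // 2 + 1."""
    K = len(x)
    f = np.fft.fft(x)
    # Conjugate symmetry of the DFT of a real vector: F(x)[K - j] = conj(F(x)[j]).
    assert np.allclose(f[1:], np.conj(f[1:][::-1]))
    half = f[: K // 2 + 1].copy()
    half[0] /= np.cbrt(2.0)        # real component at index 0
    if K % 2 == 0:
        half[-1] /= np.cbrt(2.0)   # real component at index K / 2 (even K only)
    return half

rng = np.random.default_rng(0)
for K in (7, 8):
    r_p, e_s, e_o = rng.standard_normal((3, K))
    lhs = hole_score(r_p, e_s, e_o)
    rhs = (2.0 / K) * complex_score(*(to_complex_embedding(v) for v in (r_p, e_s, e_o)))
    assert np.isclose(lhs, rhs)
```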
As mentioned in Section 2, the complex vectors equivalent to the real vectors are half as long, but at a given floating-point precision take twice as much memory as real-valued vectors of the same length. Both models hence have the exact same memory complexity.
References
Bordes et al. [2013] Bordes, A.; Usunier, N.; Garcia-Duran, A.; Weston, J.; and Yakhnenko, O. 2013. Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems, 2787–2795.
Bordes et al. [2014] Bordes, A.; Glorot, X.; Weston, J.; and Bengio, Y. 2014. A semantic matching energy function for learning with multi-relational data. Machine Learning 94(2):233–259.
Danihelka et al. [2016] Danihelka, I.; Wayne, G.; Uria, B.; Kalchbrenner, N.; and Graves, A. 2016. Associative long short-term memory. arXiv preprint arXiv:1602.03032.
Duchi, Hazan, and Singer [2011] Duchi, J.; Hazan, E.; and Singer, Y. 2011. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12:2121–2159.
Hayashi and Shimbo [2017] Hayashi, K., and Shimbo, M. 2017. On the equivalence of holographic and complex embeddings for link prediction. arXiv preprint arXiv:1702.05563.
Nickel, Rosasco, and Poggio [2016] Nickel, M.; Rosasco, L.; and Poggio, T. A. 2016. Holographic embeddings of knowledge graphs. In AAAI Conference on Artificial Intelligence, 1955–1961.
Riedel et al. [2013] Riedel, S.; Yao, L.; McCallum, A.; and Marlin, B. M. 2013. Relation extraction with matrix factorization and universal schemas. In Human Language Technologies: Conference of the North American Chapter of the Association for Computational Linguistics, 74–84.
Trabelsi et al. [2017] Trabelsi, C.; Bilaniuk, O.; Serdyuk, D.; Subramanian, S.; Santos, J. F.; Mehri, S.; Rostamzadeh, N.; Bengio, Y.; and Pal, C. J. 2017. Deep complex networks. arXiv preprint arXiv:1705.09792.
Trouillon et al. [2016] Trouillon, T.; Welbl, J.; Riedel, S.; Gaussier, É.; and Bouchard, G. 2016. Complex embeddings for simple link prediction. In International Conference on Machine Learning, volume 48, 2071–2080.
Trouillon et al. [2017] Trouillon, T.; Dance, C. R.; Welbl, J.; Riedel, S.; Gaussier, É.; and Bouchard, G. 2017. Knowledge graph completion via complex tensor factorization. arXiv preprint arXiv:1702.06879, to appear in the Journal of Machine Learning Research.
Yang et al. [2015] Yang, B.; Yih, W.-t.; He, X.; Gao, J.; and Deng, L. 2015. Embedding entities and relations for learning and inference in knowledge bases. In International Conference on Learning Representations.