1 Introduction
The construction and applications of knowledge graphs (KGs) have attracted much attention in recent years. Many KGs, such as WordNet [9], DBpedia [8], and Freebase [1], have been built and successfully applied to AI domains including information retrieval [16], recommender systems [19], question answering [5, 6], and natural language processing [17]. A large KG stores billions of factual triplets in the form of a directed graph, where each triplet of the form (head entity, relation, tail entity) (denoted by (h, r, t) in this paper) stands for an edge with two end nodes, indicating that a specific relationship exists between the head and tail entities. On a graph with this kind of symbolic representation, algorithms that compute semantic relationships between entities usually have high computational complexity and lack scalability. Knowledge graph embedding is therefore proposed to improve computational efficiency: by embedding entities and relations into a low-dimensional vector space, we can efficiently implement operations such as computing the semantic similarity between entities, which is of considerable significance to the completion, reasoning, and applications of KGs.

Quite a few methods [2, 12, 4, 10] have been proposed for knowledge graph embedding. Given a KG, these methods first assign one or more vectors (or matrices) to each entity and relation, then define a scoring function to measure the plausibility of each triplet, and finally maximize the global plausibility of all triplets. Scoring functions thus play a critical role in these methods: they determine both the capability and the computational complexity of a model. The capability of a model is primarily influenced by the variety of connectivity patterns and mapping properties of relations it can model. In a KG, following [10], we have four connectivity patterns of relations:

Symmetry (Antisymmetry). A relation r is symmetric (antisymmetric) if ∀ h, t: (h, r, t) ⇒ (t, r, h) ( (h, r, t) ⇒ ¬(t, r, h) ).

Inversion. Relation r1 is inverse to relation r2 if ∀ h, t: (h, r1, t) ⇒ (t, r2, h).

Composition. Relation r1 is composed of relation r2 and relation r3 if ∀ x, y, z: (x, r2, y) ∧ (y, r3, z) ⇒ (x, r1, z).
Also, following [2], we have four mapping properties of relations:

One-to-One (1-to-1). Relation r is 1-to-1 if a head can appear with at most one tail.

One-to-Many (1-to-N). Relation r is 1-to-N if a head can appear with many tails.

Many-to-One (N-to-1). Relation r is N-to-1 if many heads can appear with the same tail.

Many-to-Many (N-to-N). Relation r is N-to-N if many heads can appear with many tails.
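As an illustration, the mapping property of a relation can be estimated directly from a triple set. The sketch below (our own, not from the paper) uses the conventional averages of tails-per-head (tph) and heads-per-tail (hpt); the 1.5 decision threshold follows common practice in the literature and is an assumption here:

```python
from collections import defaultdict

def mapping_category(triples, threshold=1.5):
    """Categorize each relation as 1-to-1 / 1-to-N / N-to-1 / N-to-N
    from a list of (head, relation, tail) triples, using the average
    number of tails per head (tph) and heads per tail (hpt)."""
    tails_of = defaultdict(set)   # (r, h) -> set of tails
    heads_of = defaultdict(set)   # (r, t) -> set of heads
    rels = set()
    for h, r, t in triples:
        rels.add(r)
        tails_of[(r, h)].add(t)
        heads_of[(r, t)].add(h)
    cats = {}
    for r in rels:
        head_groups = [v for (rr, _), v in tails_of.items() if rr == r]
        tail_groups = [v for (rr, _), v in heads_of.items() if rr == r]
        tph = sum(len(v) for v in head_groups) / len(head_groups)
        hpt = sum(len(v) for v in tail_groups) / len(tail_groups)
        if tph < threshold and hpt < threshold:
            cats[r] = "1-to-1"
        elif tph >= threshold and hpt < threshold:
            cats[r] = "1-to-N"
        elif tph < threshold and hpt >= threshold:
            cats[r] = "N-to-1"
        else:
            cats[r] = "N-to-N"
    return cats
```

For example, a relation in which every head pairs with two tails but every tail has a single head is classified as 1-to-N.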
If an embedding method could model as many connectivity patterns and mapping properties as possible, it would potentially benefit downstream applications. For example, in a link prediction task, suppose a model has learned that relation Nationality is a composition of BornIn and LocatedIn. When the triplets (Tom, BornIn, New York) and (New York, LocatedIn, United States) both hold, it can infer that the triplet (Tom, Nationality, United States) holds. As a negative instance, if a method cannot model the N-to-1 mapping property, it probably treats Leonardo DiCaprio and Kate Winslet as the same entity when it reads the triplets (Leonardo DiCaprio, ActorIn, Titanic) and (Kate Winslet, ActorIn, Titanic).

In this paper, we propose a novel method, namely linear representation embedding (LineaRE), which interprets a relation as a linear function of the head and tail entities. Specifically, our model represents each entity as a low-dimensional vector (denoted by h or t), and each relation as two weight vectors and a bias vector (denoted by w_r^1, w_r^2, and b_r), where h, t, w_r^1, w_r^2, b_r ∈ ℝ^d. Given a golden triplet (h, r, t), we expect the equation w_r^1 ∘ h + b_r = w_r^2 ∘ t to hold, where ∘ denotes the Hadamard (element-wise) product. Tables 1 and 2 summarize the scoring functions and the modeling capabilities of some state-of-the-art KG embedding methods, respectively. Table 1 shows that the parameters of ComplEx and RotatE are defined in complex number spaces, while those of the others (including our model) are defined in real number spaces. Compared with most of the other models, the scoring function of LineaRE is simpler. Table 2 shows that some models (such as TransE and RotatE) are better at modeling connectivity patterns but do not consider complex mapping properties. In contrast, some others (TransH and DistMult) are better at modeling complex mapping properties but sacrifice some capability to model connectivity patterns. Our LineaRE has the most comprehensive modeling capability.
The contributions of the paper are threefold: (1) We propose a novel LineaRE method for KG embedding, which is simple and can cover all the above connectivity patterns and mapping properties. (2) We provide formal mathematical proofs to demonstrate the modeling capabilities of LineaRE. (3) We conduct extensive experiments to evaluate LineaRE on the task of link prediction on several benchmark datasets. The experimental results show that LineaRE achieves significant improvements over the existing state-of-the-art methods.
2 Related Work
Knowledge graph embedding models can be roughly categorized into two groups [14]: translational models and multiplicative models.
Translational Models.
Given a triplet (h, r, t), TransE [2] interprets the relation as a translation r from the head entity h to the tail entity t. When a relation is symmetric, its vector will be represented by 0 (the zero vector), so TransE cannot distinguish different symmetric relations. In addition, TransE has issues in dealing with 1-to-N, N-to-1, and N-to-N relations. TransH [15] was proposed to address the issues of TransE in modeling complex relations; it interprets each relation as a translating operation on a relation-specific hyperplane. However, such an operation cannot model the inversion and composition patterns.
Multiplicative Models.
DistMult [18] is a bilinear model. For a triplet (h, r, t), the relation is represented as a diagonal matrix to capture pairwise interactions between the components of h and t along the same dimension. However, this simple model can only deal with symmetric relations. ComplEx [12] was proposed to address the issue of DistMult in modeling antisymmetric relations by introducing complex-valued embeddings. Unfortunately, ComplEx is still not capable of modeling the composition pattern, and its space and time complexity are considerably increased.
Other Models.
ConvE [4] is a multi-layer convolutional network model; its 2D convolution extracts feature interactions between the head-entity and relation embeddings h and r. RotatE [10] represents entities and relations as complex vectors and, for a triplet, interprets the relation as a rotation from the head entity to the tail entity. RotatE can model all the above connectivity patterns but does not consider the complex mapping properties.
3 Our Method
In this section, we introduce our proposed LineaRE model. We first present the model and mathematically prove its powerful modeling capabilities, and then describe the loss function used in our method.
3.1 Linear Representation Embedding
We provide the details of our proposed LineaRE in this part. We represent each entity as a low-dimensional vector (h or t), and each relation as two weight vectors (w_r^1, w_r^2) and a bias vector (b_r), where h, t, w_r^1, w_r^2, b_r ∈ ℝ^d. In LineaRE, a relation (w_r^1, w_r^2, b_r) defines d straight lines, one per dimension, in the rectangular coordinate system with h and t as axes. Therefore, we call our model linear representation embedding. Given a golden triplet (h, r, t), we expect that:
w_r^1 ∘ h + b_r = w_r^2 ∘ t,  (1)
where ∘ denotes the Hadamard (element-wise) product. The scoring function of LineaRE is:
f_r(h, t) = ‖w_r^1 ∘ h + b_r − w_r^2 ∘ t‖,  (2)

where ‖·‖ denotes a vector norm; a lower score indicates a more plausible triplet.
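As a concrete sketch (not the reference implementation), the scoring function can be written in a few lines of NumPy; the choice of the L1 norm here is an assumption:

```python
import numpy as np

def linea_re_score(h, t, w1, w2, b):
    """LineaRE dissimilarity score: the residual of the linear
    constraint w1 ∘ h + b = w2 ∘ t, measured with the L1 norm
    (norm choice is an assumption). Lower score = more plausible."""
    return np.abs(w1 * h + b - w2 * t).sum(axis=-1)
```

If a triplet satisfies the constraint of Eq. (1) exactly, its score is zero; any perturbation of the tail embedding increases the score.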
The connectivity patterns and mapping properties of relations are implicit in the properties of these straight lines. Formally, we have the following main results:
Theorem 1.
LineaRE can model the symmetry, antisymmetry, inversion, and composition patterns.
Proof.
With h and t as the axes, LineaRE represents each dimension of a relation as a straight line in the rectangular coordinate system. Figure 1 illustrates LineaRE in the one-dimensional case.

Symmetry (Each straight line of the relation is symmetrical about the line t = h, Figure 1(a)).
When w_r^1 = −w_r^2 holds, the constraint becomes t = −h + c, which is symmetrical about t = h for any intercept, where c is a constant vector. When w_r^1 = w_r^2 and b_r = 0 hold, the constraint becomes t = h, which is also symmetrical about t = h. To sum up, when w_r^1 = −w_r^2 or (w_r^1 = w_r^2 and b_r = 0), LineaRE can model the symmetry pattern.
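This case analysis is easy to verify numerically. The snippet below (a sanity check of our own, not part of the paper) confirms with random vectors that both parameter settings make the score invariant under swapping h and t:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
h, t = rng.normal(size=d), rng.normal(size=d)

def score(h, t, w1, w2, b):
    # LineaRE residual with an L1 norm (norm choice is an assumption)
    return np.abs(w1 * h + b - w2 * t).sum()

w1 = rng.normal(size=d)
b = rng.normal(size=d)

# Case w1 = -w2: the score is symmetric in (h, t) for any bias vector.
assert np.isclose(score(h, t, w1, -w1, b), score(t, h, w1, -w1, b))

# Case w1 = w2 and b = 0: also symmetric.
assert np.isclose(score(h, t, w1, w1, 0.0), score(t, h, w1, w1, 0.0))
```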

Antisymmetry (There exist some straight lines not symmetrical about t = h in the relation, Figure 1(b)).
When w_r^1 ≠ −w_r^2 and (w_r^1 ≠ w_r^2 or b_r ≠ 0) in some dimension, LineaRE can model the antisymmetry pattern.

Inversion (The straight lines of r1 and r2 along the same dimension are symmetrical about t = h, Figure 1(c)).
That is, the slopes of the straight lines along the same dimension in r1 and r2 are mutually reciprocal, and the intercepts are symmetrical about t = h.

Composition (Composition of linear functions, Figure 1(d)).
r2 is a linear mapping from x to y, and r3 is a linear mapping from y to z; then a new linear mapping from x to z (i.e., r1) can be obtained by composing r2 and r3.
∎
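The composition argument can also be checked numerically: substituting the line of r2 into the line of r3 yields the parameters of the composed relation. The closed-form expressions for w1a, w1b, and b1 below are derived by that substitution and are our own illustration (variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8
# Relation r2 maps x -> y via w2a∘x + b2 = w2b∘y; r3 maps y -> z likewise.
w2a, w2b = rng.uniform(0.5, 2, d), rng.uniform(0.5, 2, d)
w3a, w3b = rng.uniform(0.5, 2, d), rng.uniform(0.5, 2, d)
b2, b3 = rng.normal(size=d), rng.normal(size=d)

x = rng.normal(size=d)
y = (w2a * x + b2) / w2b          # apply r2
z = (w3a * y + b3) / w3b          # apply r3

# Parameters of the composed relation r1 = r3 ∘ r2, by substitution:
w1a = w3a * w2a
w1b = w3b * w2b
b1 = w3a * b2 + b3 * w2b

# (x, r1, z) satisfies the LineaRE constraint exactly.
assert np.allclose(w1a * x + b1, w1b * z)
```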
Theorem 2.
LineaRE can model 1-to-1, 1-to-N, N-to-1, and N-to-N relations.
Proof.
1-to-1: obviously, LineaRE can model 1-to-1 relations. 1-to-N: as shown in Figure 1(e), the straight line is one dimension of relation r that is close to the t-axis. Let ε be the maximum admissible error; then a given h can appear with multiple t values with low errors, where the admissible values satisfy |w_r^1 h + b_r − w_r^2 t| ≤ ε. The closer the straight line is to the t-axis, the larger the range of t values is. Thus, multiple tail entities appearing with the same head entity can be appropriately far away from each other in such dimensions, while in the other dimensions these tail entities are closer to each other. Similarly, there exist some straight lines close to the h-axis in N-to-1 relations. N-to-N relations contain both straight lines close to the t-axis and straight lines close to the h-axis. ∎
Corollary 1.
The TransE model is a special case of LineaRE.
Proof.
Let w_r^1 = w_r^2 = 1; then our LineaRE becomes TransE, i.e., TransE defines a relation as straight lines with a constant slope of one, which is a special case of LineaRE. ∎
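A quick numerical check of the corollary (our own sketch): with both weight vectors set to all-ones and the bias playing the role of the translation vector r, the LineaRE residual coincides with TransE's ‖h + r − t‖:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 8
h, r, t = rng.normal(size=d), rng.normal(size=d), rng.normal(size=d)
ones = np.ones(d)

# LineaRE score with w1 = w2 = 1 and b = r (L1 norm assumed)
linea_re = np.abs(ones * h + r - ones * t).sum()
# TransE score
trans_e = np.abs(h + r - t).sum()

assert np.isclose(linea_re, trans_e)
```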
3.2 Loss Function
A KG contains only positive triplets; a common way to construct a negative sample is to randomly replace the head or tail entity of an observed triplet, which is called negative sampling. Many negative sampling methods have been proposed [3, 20, 13], among which the self-adversarial negative sampling method [10] dynamically adjusts the weights of negative samples according to their scores as training proceeds. We adopt this negative sampling technique. Specifically, the weight (i.e., probability) of a negative triplet (h'_j, r, t'_j) for a golden triplet (h, r, t) is as follows:
p(h'_j, r, t'_j) = exp(−α f_r(h'_j, t'_j)) / Σ_i exp(−α f_r(h'_i, t'_i)),  (3)
where α is the temperature of sampling.
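A minimal sketch of these sampling weights, assuming the score f_r is a distance (lower = more plausible), so harder negatives with smaller scores receive larger weight:

```python
import numpy as np

def neg_weights(neg_scores, alpha=1.0):
    """Self-adversarial weights over negative samples (Eq. 3).
    neg_scores are distances (lower = more plausible), hence the
    negative sign inside the softmax; alpha is the temperature."""
    logits = -alpha * np.asarray(neg_scores, dtype=float)
    logits -= logits.max()          # numerical stability
    w = np.exp(logits)
    return w / w.sum()
```

The weights form a probability distribution over the negatives of a given positive triplet.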
Then, we define the logistic loss function for an observed triplet and its negative samples:
L = −log σ(γ − f_r(h, t)) − Σ_{j=1}^{n} p(h'_j, r, t'_j) log σ(f_r(h'_j, t'_j) − γ),  (4)

L_final = L + λ Σ_{e ∈ E} ‖e‖_2^2,  (5)
where σ is the sigmoid function; γ is a parameter that can adjust the margin between positive and negative sample scores; λ is the regularization coefficient; E is the set of entities in the KG. Adam [7] is used as the optimizer.
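Putting the pieces together, here is a sketch of the weighted logistic loss for one positive triplet and its negatives (our own illustration: the regularization term of Eq. 5 is omitted, and the weights are treated as constants with no gradient, as in the self-adversarial scheme of [10]):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def self_adv_loss(pos_score, neg_scores, gamma=12.0, alpha=1.0):
    """Self-adversarial logistic loss (Eq. 4), for distance-based
    scores where lower = more plausible."""
    neg_scores = np.asarray(neg_scores, dtype=float)
    logits = -alpha * neg_scores
    p = np.exp(logits - logits.max())
    p /= p.sum()                                  # weights of Eq. (3)
    pos_term = -np.log(sigmoid(gamma - pos_score))
    neg_term = -(p * np.log(sigmoid(neg_scores - gamma))).sum()
    return pos_term + neg_term
```

A well-fit positive (small score) with well-separated negatives (large scores) yields a small loss.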
Table 3: Statistics of the datasets.
Datasets  #E  #R  #Train  #Valid  #Test
FB15k  14,951  1,345  483,142  50,000  59,071
WN18  40,943  18  141,442  5,000  5,000
FB15k-237  14,541  237  272,115  17,535  20,466
WN18RR  40,943  11  86,835  3,034  3,134
Table 4: Link prediction results on FB15k and WN18.
Model  FB15k  WN18
MR  MRR  Hits@1  Hits@3  Hits@10  MR  MRR  Hits@1  Hits@3  Hits@10
TransE  35  .729  .638  .798  .873  184  .798  .713  .869  .949 
TransH  36  .731  .641  .800  .873  372  .796  .717  .856  .948 
DistMult  59  .789  .730  .830  .887  496  .810  .694  .922  .949 
ComplEx  63  .809  .757  .846  .894  531  .948  .945  .949  .953 
ConvE  64  .745  .670  .801  .873  504  .942  .935  .947  .955 
RotatE  40  .786  .723  .835  .884  264  .949  .943  .953  .960 
LineaRE  36  .839  .799  .864  .906  170  .952  .947  .955  .961 
Table 5: Link prediction results on FB15k-237 and WN18RR.
Model  FB15k-237  WN18RR
MR  MRR  Hits@1  Hits@3  Hits@10  MR  MRR  Hits@1  Hits@3  Hits@10
TransE  172  .334  .238  .371  .523  2933  .196  .021  .317  .529 
TransH  168  .339  .243  .375  .531  4736  .210  .018  .387  .473 
DistMult  301  .311  .225  .341  .485  6580  .424  .397  .433  .476 
ComplEx  376  .313  .227  .342  .486  6671  .446  .416  .462  .503 
ConvE  246  .316  .239  .350  .491  5277  .46  .39  .43  .48 
RotatE  174  .338  .245  .373  .526  3536  .477  .429  .493  .569 
LineaRE  168  .353  .258  .389  .545  1887  .486  .445  .500  .571 
Table 6: Detailed link prediction results on FB15k by relation category.
Rel. Cat  1-to-1  1-to-N  N-to-1  N-to-N  1-to-1  1-to-N  N-to-1  N-to-N
Task  Predicting Head (Hits@10)  Predicting Tail (Hits@10)
TransE  .913  .974  .622  .880  .895  .705  .967  .908 
TransH  .914  .973  .612  .883  .894  .680  .967  .910 
DistMult  .925  .965  .657  .890  .923  .821  .949  .917 
ComplEx  .928  .962  .673  .897  .934  .831  .950  .923 
RotatE  .933  .973  .630  .894  .933  .709  .965  .922 
LineaRE  .926  .972  .723  .905  .913  .837  .965  .932 
Task  Predicting Head (MRR)  Predicting Tail (MRR)  
TransE  .736  .925  .489  .721  .731  .582  .903  .744 
TransH  .731  .922  .470  .728  .730  .559  .905  .751 
DistMult  .813  .922  .526  .793  .805  .683  .886  .817 
ComplEx  .820  .928  .557  .819  .815  .717  .890  .838 
RotatE  .859  .938  .511  .790  .857  .627  .906  .814 
LineaRE  .825  .938  .618  .842  .817  .751  .919  .865 
4 Experiments
In this section, we conduct extensive experiments to evaluate the proposed LineaRE method.
4.1 Datasets
Four widely used benchmark datasets are used in our experiments: FB15k [2], WN18 [2], FB15k-237 [11], and WN18RR [4]. The statistics of these datasets are summarized in Table 3.
FB15k is a subset of Freebase, while WN18 is a subset of WordNet; [10] showed that the key to link prediction on both of them is modeling the symmetry, antisymmetry, and inversion patterns. FB15k-237 and WN18RR are subsets of FB15k and WN18, respectively. The main connectivity patterns in FB15k-237 are symmetry, antisymmetry, and composition. There is almost no composition pattern in WN18RR; thus, the main connectivity patterns in WN18RR are symmetry and antisymmetry.
4.2 Experimental Settings
We use link prediction as a canonical task to evaluate KG embedding models, because it reflects how well a model preserves the global and local structural information of a KG. Specifically, let T be the test set and E be the set of all entities in the dataset. For each test triplet (h, r, t) ∈ T, we replace the tail entity t by each entity e ∈ E in turn, forming candidate triplets {(h, r, e)}. Some candidate triplets may already exist in the dataset (training, validation, or test set), and it is common practice to filter them out (except the current test triplet). The model is then used to calculate the dissimilarity of these candidate triplets and sort them in ascending order; eventually, the rank of the correct entity is recorded. The prediction process for the head entity is analogous.
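The filtered ranking protocol described above can be sketched as follows; `score_fn` stands for any dissimilarity function (function and variable names here are our own), and the metric definitions match the evaluation protocol below:

```python
import numpy as np

def filtered_tail_ranks(test_triples, all_triples, entities, score_fn):
    """Filtered link-prediction protocol (tail side): score every
    candidate (h, r, e), discard other known true triples, and record
    the ascending-order rank of the correct tail. score_fn is a
    dissimilarity (lower = better), e.g. a LineaRE score."""
    known = set(all_triples)
    ranks = []
    for h, r, t in test_triples:
        scored = []
        for e in entities:
            if e != t and (h, r, e) in known:
                continue                  # filter other true triples
            scored.append((score_fn(h, r, e), e))
        scored.sort()
        rank = 1 + next(i for i, (_, e) in enumerate(scored) if e == t)
        ranks.append(rank)
    return ranks

def metrics(ranks):
    """Mean Rank, Mean Reciprocal Rank, and Hits@10 from a rank list."""
    ranks = np.asarray(ranks, dtype=float)
    return {"MR": ranks.mean(),
            "MRR": (1.0 / ranks).mean(),
            "Hits@10": (ranks <= 10).mean()}
```

Head prediction is the mirror image: replace the head entity instead of the tail.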
Evaluation Protocol.
We report several standard evaluation metrics: the Mean Rank (MR), the Mean Reciprocal Rank (MRR), and Hits@N (i.e., the proportion of correct entities ranked in the top N, where N ∈ {1, 3, 10}). A lower MR is better, while higher MRR and Hits@N are better.
Baselines.
We compare the performance of our LineaRE with that of six state-of-the-art models on link prediction tasks. For fairness, all the models except ConvE use the same negative sampling technique (the self-adversarial negative sampling proposed by [10]), and the hyperparameters of the different models are selected from the same ranges. Because ConvE differs substantially from the other models in principle, we take its experimental results directly from the original paper [4].
Hyperparameter Settings.
The hyperparameters are selected according to performance on the validation set via grid search. We set the hyperparameter ranges as follows: temperature of sampling α ∈ {0.5, 1.0}, fixed margin γ ∈ {6, 9, 12, 15, 18, 24, 30}, β in softplus ∈ {0.75, 1.0, 1.25}, embedding size d ∈ {125, 250, 500, 1000}, batch size b ∈ {512, 1024, 2048}, and number of negative samples n ∈ {128, 256, 512, 1024}. The optimal configurations for LineaRE are: α=1.0, β=1.25, γ=15, d=1000, b=2048, and n=128 on FB15k; α=0.5, β=1.25, γ=6, d=500, b=1024, and n=512 on WN18; α=0.5, β=1.0, γ=12, d=1000, b=2048, and n=128 on FB15k-237; α=0.5, β=1.0, γ=12, d=1000, b=2048, and n=128 on WN18RR.
4.3 Experimental Results
4.3.1 Main Results
The main results on FB15k and WN18 are summarized in Table 4. LineaRE significantly outperforms all the previous state-of-the-art models on almost all the metrics; the only exception is that TransE performs slightly better than LineaRE in MR on FB15k. Table 5 summarizes the results on FB15k-237 and WN18RR; no previous model outperforms LineaRE on any metric. Table 6 summarizes the detailed results by relation category on FB15k (following [15], for each relation r we compute the average number of tails per head, tph, and the average number of heads per tail, hpt; if tph < 1.5 and hpt < 1.5, r is treated as one-to-one; if tph ≥ 1.5 and hpt ≥ 1.5, as many-to-many; if tph ≥ 1.5 and hpt < 1.5, as one-to-many; and if tph < 1.5 and hpt ≥ 1.5, as many-to-one). The table shows that LineaRE achieves the best performance on complex relations.
4.3.2 Analysis of Results
Next, we analyze the performance of these models with respect to connectivity patterns and mapping properties in detail (refer to Table 2, which summarizes the modeling capabilities of these models):
Symmetry (RotatE and TransE).
Among these methods, the only difference between RotatE and TransE is that the former can model symmetric relations while the latter cannot. RotatE performs significantly better than TransE because there are many symmetric relations in all the datasets, especially in WN18RR.
Antisymmetry and Inversion (ComplEx and DistMult).
Complex embeddings enable ComplEx to model two more connectivity patterns (antisymmetry and inversion) than DistMult. The former performs better than the latter on all datasets, especially on WN18, which contains a large number of antisymmetric triplets and inverse triplets.
Composition (LineaRE and ComplEx).
ComplEx can model all the above connectivity patterns except composition, and it can also model complex mapping properties, which enables it to achieve very good performance on all datasets other than FB15k-237, where the main connectivity pattern is composition. Likewise, DistMult, which cannot model the composition pattern, also performs poorly on FB15k-237. The difference between our LineaRE and ComplEx is that LineaRE is capable of modeling the composition pattern; thus, our model performs better, especially on FB15k-237.
Complex Mapping Properties (LineaRE and RotatE).
RotatE has a powerful capability for modeling all the above connectivity patterns, which makes it perform well on these datasets. However, it is still inferior to our LineaRE: LineaRE has the same modeling capability for connectivity patterns as RotatE and, moreover, can deal with the complex mapping properties that RotatE cannot handle. On the relatively more complex dataset FB15k, LineaRE gains a more prominent advantage. Table 6 shows that DistMult, ComplEx, and LineaRE, which are capable of modeling complex mapping properties, perform well on 1-to-N (predicting tail), N-to-1 (predicting head), and N-to-N relations, while RotatE and TransE perform worse.
The performance of TransH is worse than expected. We offer a brief conjecture: since two points determine a straight line, when two relations share many common entities, their hyperplanes will coincide, restricting many entities to a straight line. This leads to poor performance, especially on KGs like WordNet. In the original paper [15], the results of TransH on WN18 are likewise inferior to those of TransE.
4.3.3 Investigation of Relation Embeddings
To verify our theoretical analysis of the modeling capabilities of LineaRE in Section 3.1, we investigate some relevant relation embeddings (500 dimensions on WN18 and 1000 dimensions on FB15k-237). Figure 2(a) counts the angles between the straight lines corresponding to a symmetric relation in WN18 and the h-axis: almost all of the 500 angles are equal to or close to 45° or 135°. For a pair of inverse relations in WN18, we first reflect one of them about t = h and then compute the angles between the straight lines of the two relations along the same dimensions; Figure 2(b) shows that most angles are equal to or close to 0° or 180°. In FB15k-237, r1 is a composition of r2 and r3, where r1 denotes the relation /award/award_nominee/award_nominations./award/award_nomination/nominated_for, r2 denotes /award/award_category/winners./award/award_honor/award_winner, and r3 denotes /award/award_category/nominees./award/award_nomination/nominated_for. We compute the angles between the composite straight lines and the lines of r1 along the same dimensions; Figure 2(c) shows that the composition of r2 and r3 is very similar to r1. For 1-to-N relations, Figure 2(d) shows that there are more straight lines close to the t-axis than to the h-axis. Besides, the investigation of the straight-line intercepts is consistent with our theoretical analysis.
5 Conclusion
In this paper, we proposed a novel KG embedding method, LineaRE, which models the connectivity patterns and mapping properties of relations via linear representations. Extensive experimental results on the task of link prediction show that LineaRE significantly outperforms existing state-of-the-art models on four widely used datasets. A deeper investigation of the relation embeddings further verifies our theoretical analysis of its modeling capabilities. The source code is available at https://github.com/pengyanhui/LineaRE.
References
[1] (2008) Freebase: a collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pp. 1247–1250.
[2] (2013) Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems, pp. 2787–2795.
[3] (2017) KBGAN: adversarial learning for knowledge graph embeddings. arXiv preprint arXiv:1711.04071.
[4] (2018) Convolutional 2D knowledge graph embeddings. In Thirty-Second AAAI Conference on Artificial Intelligence.
[5] (2017) An end-to-end model for question answering over knowledge base with cross-attention combining global knowledge. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 221–231.
[6] (2019) Knowledge graph embedding based question answering. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, pp. 105–113.
[7] (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.
[8] (2015) DBpedia: a large-scale, multilingual knowledge base extracted from Wikipedia. Semantic Web 6(2), pp. 167–195.
[9] (1995) WordNet: a lexical database for English. Communications of the ACM 38(11), pp. 39–41.
[10] (2019) RotatE: knowledge graph embedding by relational rotation in complex space. arXiv preprint arXiv:1902.10197.
[11] (2015) Observed versus latent features for knowledge base and text inference. In Proceedings of the 3rd Workshop on Continuous Vector Space Models and their Compositionality, pp. 57–66.
[12] (2016) Complex embeddings for simple link prediction. In International Conference on Machine Learning, pp. 2071–2080.
[13] (2018) Incorporating GAN for negative sampling in knowledge representation learning. In Thirty-Second AAAI Conference on Artificial Intelligence.
[14] (2017) Knowledge graph embedding: a survey of approaches and applications. IEEE Transactions on Knowledge and Data Engineering 29(12), pp. 2724–2743.
[15] (2014) Knowledge graph embedding by translating on hyperplanes. In Twenty-Eighth AAAI Conference on Artificial Intelligence.
[16] (2017) Explicit semantic ranking for academic search via knowledge graph embedding. In Proceedings of the 26th International Conference on World Wide Web, pp. 1271–1279.
[17] (2019) Leveraging knowledge bases in LSTMs for improving machine reading. arXiv preprint arXiv:1902.09091.
[18] (2014) Embedding entities and relations for learning and inference in knowledge bases. arXiv preprint arXiv:1412.6575.
[19] (2016) Collaborative knowledge base embedding for recommender systems. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 353–362.
[20] (2019) NSCaching: simple and efficient negative sampling for knowledge graph embedding. In 2019 IEEE 35th International Conference on Data Engineering (ICDE), pp. 614–625.