1 Introduction
Knowledge graphs (KGs) such as DBpedia, Freebase and YAGO encode structured information in the form of a multi-relational directed graph in which nodes represent entities and edges represent relations between nodes. In its simplest form, a KG is a collection of triples $(h, r, t)$, where $h$ and $t$ are the head and tail entities (nodes), respectively, and $r$ is the relation (edge) between them. For instance, (Albert Einstein, coauthor, Boris Podolsky) is a triple. Link prediction, entity resolution and link-based clustering are among the most common tasks in KG analysis [Nickel et al.2016].
Knowledge Graph Embeddings (KGEs) have become one of the most promising approaches for KG analysis [Wang et al.2017]. The assumption is that there are global features which explain the existence of triples in a KG, and embedding models try to capture those features using (typically low-dimensional) vectors known as embeddings. A KGE model therefore assigns vectors $(\mathbf{e}_h, \mathbf{e}_r, \mathbf{e}_t)$ to the symbolic entities and relations $(h, r, t)$. The vectors are initialized randomly and updated by solving an optimization problem. To measure the degree of plausibility of a triple $(h, r, t)$, a scoring function is defined. The function takes the embedding vectors of the triple and returns a value indicating the plausibility of the triple. KGEs have a wide range of downstream applications such as recommender systems, question answering, and sentiment analysis.
Several KGE models have been proposed so far. Earlier works such as TransE [Bordes et al.2013], RESCAL [Nickel et al.2012] and E-MLP [Socher et al.2013] focus only on the existing triples as inputs for the link prediction task (predicting missing relations between entities). Due to the intrinsic incompleteness of KGs, relying only on triples may not deliver the best performance. Recent works such as ComplEx-NNE+AER and RUGE have investigated the usage of background knowledge such as logical rules in order to enhance performance [Guo et al.2018, Ding et al.2018].
To exploit rules, the inherent incapability of some existing models to encode rules is an obstacle. For instance, different variants of translational approaches such as TransE, FTransE, STransE, TransH and TransR have restrictions in encoding reflexive, symmetric and transitive relations [Kazemi and Poole2018]. Considering TransE as a concrete example, the main intuition is that the embedding vector of the tail is the sum of the embedding vectors of the head and relation, i.e. $\mathbf{e}_h + \mathbf{e}_r \approx \mathbf{e}_t$. Once $r$ is assumed to be symmetric, e.g. "coauthor", we have both $\mathbf{e}_h + \mathbf{e}_r = \mathbf{e}_t$ and $\mathbf{e}_t + \mathbf{e}_r = \mathbf{e}_h$, which results in $\mathbf{e}_r = \mathbf{0}$. Therefore, TransE cannot capture symmetry: it collapses every symmetric relation into the null vector, resulting in the same embedding vectors for the linked entities (i.e. $\mathbf{e}_h = \mathbf{e}_t$).
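The collapse can be checked numerically. The following sketch (toy random vectors, not from any real KG) solves the two symmetry constraints for $\mathbf{e}_r$ in the least-squares sense and confirms that the only consistent relation vector is the null vector:

```python
import numpy as np

# Toy check of why TransE collapses symmetric relations (e_h + e_r ≈ e_t).
# For a symmetric relation r, both (h, r, t) and (t, r, h) must score well:
#   e_h + e_r = e_t   and   e_t + e_r = e_h.
rng = np.random.default_rng(0)
d = 4
e_h, e_t = rng.normal(size=d), rng.normal(size=d)

# Solve the stacked linear system for e_r in the least-squares sense:
#   I * e_r = e_t - e_h
#   I * e_r = e_h - e_t
A = np.vstack([np.eye(d), np.eye(d)])
b = np.concatenate([e_t - e_h, e_h - e_t])
e_r, *_ = np.linalg.lstsq(A, b, rcond=None)

assert np.allclose(e_r, 0.0)  # the best fit is the null vector
# With e_r = 0, the constraint e_h + e_r = e_t then forces e_h = e_t,
# i.e. all entities linked by a symmetric relation collapse together.
```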
Due to the incompleteness of KGs, even if one adds the groundings of rules to a KG, there is still no guarantee that a capable embedding model learns the associated rules. That means we need to properly inject rules into the learning process of a capable model. This issue has also been highlighted in recent works [Guo et al.2018, Ding et al.2018], but has not been investigated deeply in the literature. Therefore, the capability of a model to support rules, as well as how rules are injected, i.e. the encoding technique, are the main challenges. Existing KGE models have addressed only one of these challenges, which causes two issues:
a) Solely focusing on encoding techniques and disregarding the capability of a model carries the risk that the model is expected to learn a rule which it is not capable of encoding. For example, RUGE proposes a general optimization framework to iteratively inject first-order logical rules into an embedding model, with ComplEx as the base model for rule injection. However, ComplEx is not capable of encoding the composition pattern [Sun et al.2019]. A similar issue can be found in [Minervini et al.], who use function-free Horn clause rules to regularize an embedding model by including an inconsistency loss. The loss measures the degree of violation of the assumption on adversarially generated examples. Although their framework nicely encodes Horn rules, they inject the rules into DistMult, which is not capable of encoding asymmetric rules: because the DistMult score is symmetric in head and tail, injecting a rule such as $r_1(x, y) \Rightarrow r_2(x, y)$ also wrongly injects $r_2(y, x)$. Therefore, the model considers many false triples as positive.
b) Solely focusing on the capability of a model and disregarding encoding techniques carries the risk that the model does not properly encode rules due to the incompleteness of KGs. For example, RotatE [Sun et al.2019] is proven to be capable of encoding inverse, symmetric (asymmetric) and composition rules without providing any rule injection mechanism. The authors [Sun et al.2019] show that their model properly encodes the rules. However, the results are obtained by generating a large number of negative samples (e.g. 1000) together with a very big embedding dimension (e.g. 1000). Such a big setting requires a very powerful computational infrastructure, which adversely limits applicability. Apart from the lack of an encoding technique, RotatE is not fully expressive, i.e. the model is incapable of encoding some rules, e.g. reflexive ones.
In contrast to previous works, this paper addresses and contributes to both of the previously highlighted points, i.e. the capability and the encoding technique, to avoid the mentioned issues. Regarding capability, our first contribution is a new neural embedding model (LogicENN) which is capable enough to encode rules, i.e. function-free clauses with predicates of arity at most 2. Moreover, LogicENN avoids grounding for two logical rules: the implication and equivalence relations. As the second contribution, we prove that LogicENN is fully expressive, i.e. for any ground truth of clauses of the above form, there exists a LogicENN model (with embedding vectors) that represents that ground truth. To the best of our knowledge, this is the first time that theoretical proofs are provided for the expressiveness of a neural-network-based embedding model. This proof reassures us in injecting different Horn rules into the model (the encoding technique). Regarding the encoding technique, our third contribution is that we additionally derive formulae for enforcing the model to learn different relations, including (anti)symmetric, implication, equivalence, inverse, transitive, composition and negation as well as irreflexive. To our knowledge, our model is the first that can encode these rules and also provides a practical solution for encoding them.
2 Related Works
We investigate the related works in light of the two main issues mentioned in the previous section, i.e. i) the capability of a model to encode rules and ii) the encoding techniques. Moreover, we briefly review the relevant neural-network-based models and show that, in contrast to LogicENN, they are not able to avoid grounding for the implication and equivalence relationships.
Considering capability, [Kazemi and Poole2018] reports that TransE, FTransE, STransE, TransH and TransR have restrictions in encoding rules. More concretely, TransE is incapable of encoding reflexive, symmetric and transitive relations [Yoon et al.2016, Wang et al.2014], and DistMult [Yang et al.2015] cannot capture antisymmetric relations. The CP decomposition cannot encode both symmetric and antisymmetric relations [Trouillon et al.2017]. The expressiveness of different bilinear models has also been investigated from a ranking perspective of their scoring matrices [Wang et al.2018].
Although the score functions of DistMult and ComplEx are similar, ComplEx can encode symmetric and antisymmetric relations due to the algebraic properties of complex numbers [Trouillon et al.2016]. SimplE [Kazemi and Poole2018] is one of the recent embedding models proven to be fully expressive; moreover, conditions for encoding the symmetric, antisymmetric and inverse patterns are derived. Although SimplE is fully expressive, two vectors must be provided for each entity/relation, which doubles the space. RotatE [Sun et al.2019] is able to encode the symmetric, antisymmetric, inverse and composition patterns. Although RotatE is shown to properly encode these patterns, many negative samples must be generated together with a very big embedding dimension, which is a serious limitation when the model is trained on a large-scale KG.
Regarding encoding techniques, various approaches have been introduced in the literature, of which we review the most relevant. As a preprocessing step, [Rocktäschel et al.2015] iteratively infer new facts based on rules until no new facts can be inferred from a KG. They then regard both the ground atoms and the existing rules as the set of rules to be learned; accordingly, their marginal probabilities are included in the training set and the loss function is minimized. KALE [Guo et al.2016] uses a margin ranking loss over logical formulae as well as triple facts and jointly learns triples and formulae. In order not to rely on propositionalization for implication, [Demeester et al.2016] propose a lifted rule injection method. [Minervini et al.2017] derive formulae for the inverse and equivalence rules according to the score functions of TransE, ComplEx and DistMult; the obtained formulae are added to the objective as regularization terms. Other methods that consider relation paths, which are closely connected to rules, are well studied in the literature, e.g. [Neelakantan et al.2015, Lin et al.2015a, Guu et al.2015]. There are also other ways of encoding rules. RUGE [Guo et al.2018] presents a generic (model-independent) framework to inject rules with confidence scores into an embedding model; the rules are encoded as constraints of an optimization problem. One of the main disadvantages of RUGE is that it needs groundings of all rules: to inject a rule, its variables must be replaced by all entities for which the corresponding triples exist in the KG. In contrast to RUGE, [Ding et al.2018] follows a model-dependent approach for rule injection, encoding non-negativity and entailment as constraints in ComplEx. It is shown [Ding et al.2018] that this model-dependent approach outperforms the generic approach of [Guo et al.2018] on the FB15k dataset. However, [Ding et al.2018] can only inject the implication rule, which is a limitation.
As mentioned, LogicENN, in contrast to other relevant neural-network-based models, avoids grounding for the implication and equivalence relationships. E-MLP and NTN [Socher et al.2013], ER-MLP [Dong et al.2014], ConvE [Dettmers et al.2018] and ConvKB [Nguyen et al.2018] are among the most successful such models in the literature. The main common characteristic of all these models is that $h$, $r$ and $t$ are treated as inputs or weights of hidden layers, while in LogicENN $h$ and $t$ are inputs and $r$ is associated with the output of the network. Having relations encoded as inputs or hidden-layer weights requires that all groundings of the rules be fed into the network. A detailed explanation of why LogicENN is capable of avoiding grounding is given in Section 3.
To sum up, many models, such as the translation-based ones, are incapable of encoding some rules. The models reported to be capable can either learn rules from the existing triples in a KG or be enforced to learn them by properly injecting the rules into their formulation. The former kind still runs the risk of not properly learning rules, as the data in KGs are known to be very incomplete; injecting rules therefore enhances the learning performance of these models. Regarding full expressiveness (FE), SimplE, RESCAL, HolE etc. are FE under some conditions [Wang et al.2018].
3 The LogicENN Approach
In this section we introduce our model and contribute to both the capability and the encoding technique. We first present LogicENN, a neural embedding model which is capable of encoding rules, and prove that it is fully expressive. We then discuss how the rules can be algebraically formulated and injected into the model, and finally present our optimization approach for learning rules with LogicENN.
This work considers clauses of the form "premise $\Rightarrow$ conclusion", in which the conclusion is an atom and the premise is a conjunction of several atoms. Atoms are triples of the type $r(x, y)$, where $x, y$ are variables and $r$ is a known relation in the KG. We refer to such clauses as rules from now on.
3.1 The Proposed Neural Embedding Model
It is known that using the same embedding space to represent both entities and relations is less competitive than considering two separate spaces [Lin et al.2015b]. This motivates us to consider a neural network (NN) in which entities and relations are embedded in two different spaces. Another motivation is that the previously reviewed NN approaches encode relations in the input layer or consider them as input weights of a hidden layer, which is restrictive for avoiding grounding when one considers the implication relationship.
We consider entity pairs as input and relations as output. More precisely, we consider the embeddings of entity pairs, $(\mathbf{e}_h, \mathbf{e}_t)$, as input, which together with the weights are randomly initialized in the beginning. During learning, LogicENN optimizes both the weights and the embeddings of the entities according to its loss function. The output weights of the network are the embeddings of relations, and the hidden-layer weights are shared between all entities and relations, as shown in Figure 1. Although LogicENN takes embedding pairs of entities as input, it learns the embedding of each individual entity as a unique vector. This is in contrast to some matrix factorization approaches which lose information by binding embedding vectors in the form of entity-entity or entity-relation pairs [Nickel et al.2016].
We denote the score function of a given triple $(h, r, t)$ by $f_r(h, t)$, or more compactly by $f_r$. Without loss of generality, we use a single hidden layer for the NN to show the theoretical capabilities of LogicENN, and we define its score as:

$$f_r(h, t) \;=\; \sum_{l=1}^{L} \beta_l^{r}\, \varphi\big(\mathbf{w}_l^{\top}[\mathbf{e}_h; \mathbf{e}_t] + b_l\big) \;=\; \big\langle \Phi(\mathbf{e}_h, \mathbf{e}_t), \boldsymbol{\beta}^{r} \big\rangle \qquad (1)$$

where $L$ is the number of nodes in the hidden layer, and $\mathbf{w}_l$ (with bias $b_l$) and $\beta_l^{r}$ are the input and output weights of the $l$th hidden node, respectively. The vectors $\boldsymbol{\beta}^{r} = (\beta_1^{r}, \ldots, \beta_L^{r})$ are the output weights of the network, which are in fact the embeddings of the relations; this is because a linear function acts as the activation function in the last layer. $\varphi(\mathbf{w}_l^{\top}[\mathbf{e}_h; \mathbf{e}_t] + b_l)$ is the output of the $l$th hidden node, and $\Phi(\mathbf{e}_h, \mathbf{e}_t)$ is the feature mapping of the hidden layer of the network, which is shared between all relations. $\mathbf{e}_h, \mathbf{e}_t \in \mathbb{R}^{d}$ are the embedding vectors of head and tail, respectively, where $d$ is the embedding dimension; therefore $f_r(h, t) \in \mathbb{R}$. Finally, $\varphi$ is an activation function and $\langle \cdot, \cdot \rangle$ is the inner product.

Table 6: Rules, their definitions and formulations. The last column gives the equivalent regularization forms used as penalty terms in Equation (2).

| Rule | Definition | Formulation based on score function | Formulation based on NN | Equivalent regularization form |
|---|---|---|---|---|
| Equivalence | $r_1(x,y) \Leftrightarrow r_2(x,y)$ | $f_{r_1}(x,y) = f_{r_2}(x,y)$ | $\boldsymbol{\beta}^{r_1} = \boldsymbol{\beta}^{r_2}$ | |
| Symmetric | $r(x,y) \Rightarrow r(y,x)$ | $f_{r}(x,y) = f_{r}(y,x)$ | | |
| Asymmetric | $r(x,y) \Rightarrow \neg r(y,x)$ | NC | | |
| Negation | $r_1(x,y) \Rightarrow \neg r_2(x,y)$ | NC | | |
| Implication | $r_1(x,y) \Rightarrow r_2(x,y)$ | $f_{r_1}(x,y) \le f_{r_2}(x,y)$ | $\boldsymbol{\beta}^{r_1} \le \boldsymbol{\beta}^{r_2}$ | |
| Inverse | $r_1(x,y) \Leftrightarrow r_2(y,x)$ | $f_{r_1}(x,y) = f_{r_2}(y,x)$ | | |
| Reflexivity | $r(x,x)$ | NC | | |
| Irreflexive | $\neg r(x,x)$ | NC | | |
| Transitivity | $r(x,y) \wedge r(y,z) \Rightarrow r(x,z)$ | | | |
| Composition | $r_1(x,y) \wedge r_2(y,z) \Rightarrow r_3(x,z)$ | | | |
Due to the shared hidden layer in its design, LogicENN is efficient in space complexity: it requires $\mathcal{O}(n_e d + n_r L)$ parameters, where $n_e$ and $n_r$ are the numbers of entities and relations, respectively.
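The score of Eq. (1) with a single hidden layer can be sketched as follows; all names, dimensions, the ReLU activation and the example relation are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

# Minimal sketch of the Eq. (1) score with one hidden layer. The hidden
# layer (W, b) is shared by all relations; each relation r owns only its
# output weights beta[r], which serve as the relation embedding.
d, L = 8, 16                       # entity dim, number of hidden nodes
rng = np.random.default_rng(1)
W = rng.normal(size=(L, 2 * d))    # shared input weights
b = rng.normal(size=L)             # shared biases
beta = {"coauthor": rng.normal(size=L)}   # one output vector per relation

def phi(e_h, e_t):
    """Shared feature map Phi(e_h, e_t) of the hidden layer."""
    return relu(W @ np.concatenate([e_h, e_t]) + b)

def score(e_h, rel, e_t):
    """f_rel(h, t) = <Phi(e_h, e_t), beta_rel> (linear output layer)."""
    return phi(e_h, e_t) @ beta[rel]

e_alice, e_bob = rng.normal(size=d), rng.normal(size=d)
s = score(e_alice, "coauthor", e_bob)
# Parameter count matches the stated space complexity:
# one d-vector per entity plus one L-vector per relation,
# plus the shared hidden layer, i.e. O(n_e * d + n_r * L).
```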
3.2 Capability of the Proposed Network
As mentioned in Section 1, if a model is not fully expressive, it might wrongly be expected to learn a rule which it is incapable of encoding. Therefore, investigating the theory behind the expressiveness of an embedding model is indeed important. Accordingly, we now prove that LogicENN is fully expressive, i.e. capable of representing every ground truth over the entities and relations in a KG.
Let $\mathcal{N}_L$ be the set of all possible neural networks with $L$ hidden nodes as defined by (1). Therefore, the set of all possible networks with an arbitrary number of hidden nodes is $\mathcal{N} = \bigcup_{L \ge 1} \mathcal{N}_L$. Let $C(\mathcal{X})$ denote the set of continuous functions over a set $\mathcal{X}$. Let $\mathcal{E}$ be the set of entities, and let $e \in \mathcal{E}$ be an entity with embedding vector $\mathbf{e} \in \mathcal{X}$. We also assume that $\mathcal{X}$ is a compact set. We have the following theorem.
Theorem 1.
Let $\mathcal{N}$, the set of all possible networks defined as above, be dense in $C(\mathcal{X})$, where $d$ is an arbitrary embedding dimension. Given any ground truth in a KG with true facts, there exists a LogicENN in $\mathcal{N}$, with embedding dimension $d$, that can represent the ground truth. The same holds when $\mathcal{N}$ is dense in $C(\mathcal{X}_1 \times \mathcal{X}_2)$, where $\mathcal{X}_1 \times \mathcal{X}_2$ is the Cartesian product of two compact sets.
The proof of the theorem, as well as a more detailed technical discussion, is included in the supplementary materials of the paper.
3.3 Formulating Rules
Let $a$ and $b$ be two grounded atoms of a clause as defined at the beginning of Section 3, and let the truth values of $a$ and $b$ be denoted by $\pi(a)$ and $\pi(b)$, respectively. To model the truth values of negation, conjunction, disjunction and implication of $a$ and $b$, we define $\pi(\neg a)$, $\pi(a \wedge b)$ and $\pi(a \vee b)$ as in [Guo et al.2018], but we use our own definition for $\pi(a \Rightarrow b)$. These can be used to derive formulations of the rules based both on the score function and on the NN, as shown in Table 6.
As an example, consider the implication rule $r_1(x, y) \Rightarrow r_2(x, y)$. Using the truth values above, we can infer $f_{r_1}(h, t) \le f_{r_2}(h, t)$. By (1), we get $\langle \Phi(\mathbf{e}_h, \mathbf{e}_t), \boldsymbol{\beta}^{r_1} \rangle \le \langle \Phi(\mathbf{e}_h, \mathbf{e}_t), \boldsymbol{\beta}^{r_2} \rangle$. Provided that our activation function is positive, i.e. $\varphi > 0$, we will have $\boldsymbol{\beta}^{r_1} \le \boldsymbol{\beta}^{r_2}$ (elementwise). The latter formula is independent of $h$ and $t$, which means we do not need any grounding for implication. The same procedure shows that we can avoid grounding for equivalence. However, for the other rules this is not possible.
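This grounding-free condition can be verified numerically. The following sketch uses random weights and a sigmoid activation as one possible non-negative $\varphi$ (all names and sizes are illustrative) and checks that the elementwise constraint on the relation embeddings forces the score inequality for every sampled entity pair:

```python
import numpy as np

# Check, on random data, that the grounding-free implication condition
# beta_r1 <= beta_r2 (elementwise) forces f_r1(h,t) <= f_r2(h,t) for
# EVERY entity pair, provided the activation is non-negative.
rng = np.random.default_rng(2)
d, L, n = 6, 12, 50
W = rng.normal(size=(L, 2 * d))
b = rng.normal(size=L)

def phi(pair):
    # Sigmoid hidden layer: Phi >= 0 by construction.
    return 1.0 / (1.0 + np.exp(-(W @ pair + b)))

beta_r1 = rng.normal(size=L)
beta_r2 = beta_r1 + rng.uniform(0.0, 1.0, size=L)   # beta_r1 <= beta_r2

pairs = rng.normal(size=(n, 2 * d))                 # random (e_h; e_t) pairs
f1 = np.array([phi(p) @ beta_r1 for p in pairs])
f2 = np.array([phi(p) @ beta_r2 for p in pairs])
assert np.all(f1 <= f2)  # r1(x,y) => r2(x,y) holds for all pairs, no grounding
```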
Using the truth values defined above, we can derive the formulation of a rule based on the score function (e.g. $f_{r_1} \le f_{r_2}$ for implication) in the 3rd column of Table 6, and its equivalent formulation based on the NN of (1) in the 4th column. Assume $1$ indicates True and $0$ indicates False. We now state the necessary and sufficient conditions for LogicENN to infer the various rules; the proof follows a procedure similar to the one above for implication. The detailed proof is provided in the supplementary materials of the paper.
Theorem 2. The conditions listed in the 3rd and 4th columns of Table 6 are necessary and sufficient for LogicENN to infer the corresponding rules.
Since KGs may contain wrong data or facts with low truth confidence [Ding et al.2018], the assumption of exact truth values in Theorem 2 is too rigid in practice. Therefore, as shown in Table 6, we introduce a slack variable $\mu$ that allows enough flexibility to deal with uncertainty in the KG; its value is inferred through a validation step of learning. Although slack variables improve flexibility, due to grounding we would have too many of them. Therefore, at the implementation level we decided to use one slack variable per rule type, e.g. one $\mu$ was used for all equivalence rules (see the last column of Table 6). This enables the model to mitigate the negative effect of rule uncertainty by considering the average uncertainty per rule type. Experimental results show the effectiveness of including one slack variable per rule type. During the experiments, we obtained the hyperparameters corresponding to each rule type sequentially through the validation step. Therefore, instead of searching over $\prod_i n_i$ hyperparameter combinations for rule injection, we search over $\sum_i n_i$ combinations, where $n_i$ refers to the number of candidates for the slack variable of the $i$th rule type. Experimental results confirm that this approach achieves satisfactory performance as well as a significant reduction in the search space.
3.4 Rule Injection and Optimization
To inject rules into the embeddings and weights of (1), we define the following optimization problem:

$$\min_{\Theta} \;\; \sum_{(h, r, t) \in \mathbb{S}} \log\big(1 + \exp(-y_{hrt}\, f_r(h, t))\big) \;+\; \sum_{i} \lambda_i \sum_{g=1}^{G_i} \mathcal{P}_i^{g} \qquad (2)$$

subject to the slack variables $\mu_i \ge 0$,
where $\mathbb{S}$ is the set of all positive and negative samples, and $y_{hrt}$ is the label of $(h, r, t)$, which is set to $1$ for positive and $-1$ for negative samples. For negative samples that get big scores, the model suppresses them by enforcing a big loss value, using formula No. 5 of [Sun et al.2019]. $\mathcal{P}_i^{g}$ refers to the $g$th grounding of the $i$th rule, $G_i$ is the number of its groundings, and $\lambda_i$ is a regularization coefficient.
In (2), we use a negative log-likelihood loss with regularization over logical rules. The loss, i.e. the first term, focuses on learning the facts in the KG, while the second term, i.e. the regularization, injects the rules into the learning process. The regularization terms are provided as penalties in the last column of Table 6.
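A hedged sketch of this objective follows. The function names and example penalty values are illustrative; in the actual model, the penalties would be the per-grounding terms from the last column of Table 6 (e.g. a hinge on the violation of an implication constraint):

```python
import numpy as np

def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return np.log1p(np.exp(-np.abs(x))) + np.maximum(x, 0.0)

# Sketch of the Eq. (2) objective: a negative log-likelihood term over
# positive/negative triples plus weighted rule penalties. `scores` are
# f_r(h, t) values, `labels` are +1 / -1, and `penalties` stand in for
# the grounded rule-violation terms.
def objective(scores, labels, penalties, lam):
    nll = softplus(-labels * scores).mean()        # log(1 + exp(-y * f))
    reg = lam * np.mean(penalties) if len(penalties) else 0.0
    return nll + reg

scores = np.array([2.0, 1.5, -1.0, -3.0])          # toy triple scores
labels = np.array([1.0, 1.0, -1.0, -1.0])          # positives / negatives
penalties = np.array([0.0, 0.2])                   # toy rule violations
loss = objective(scores, labels, penalties, lam=0.1)
```

Note how a confidently wrong triple (a negative with a high score) yields a large `softplus` term, which is the suppression behaviour described above.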
4 Experiments and Discussions
Table 2: Link prediction results. The first six columns are FB15k (raw: MR, Hits@10, MRR; filtered: FMR, FHits@10, FMRR); the last six are WN18 in the same order.

| Model | MR | Hits@10 | MRR | FMR | FHits@10 | FMRR | MR | Hits@10 | MRR | FMR | FHits@10 | FMRR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TransE [Bordes et al.2013] | 201 | 43.4 | 18.4 | 70 | 61.8 | 30.7 | 263 | 75.4 | | 251 | 89.2 | |
| DistMult [Yang et al.2015] | 279 | 50.0 | 25.5 | 120.4 | 84.2 | 70.5 | | | | 655 | 94.6 | 79.7 |
| ComplEx [Trouillon et al.2016] | 266 | 48.5 | 23.0 | 106 | 82.6 | 67.5 | 573 | 82.1 | 58.7 | 543 | 94.7 | 94.1 |
| ANALOGY [Liu et al.2017] | 279 | 50.5 | 26.0 | 121 | 84.3 | 72.2 | | | 65.7 | | 94.7 | 94.2 |
| ConvE [Dettmers et al.2018] | 191 | 52.5 | 27.2 | 51 | 85.1 | 68.9 | | | | 504 | 95.5 | 94.2 |
| SimplE [Kazemi and Poole2018] | | | 24.2 | | 83.8 | 72.7 | | | 58.8 | | 94.7 | 94.2 |
| RotatE [Sun et al.2019] | 162 | 57.5 | 31.0 | 74 | 80.6 | 61.8 | 636 | 84.2 | 66.2 | 627 | 94.6 | 93.0 |
| QuatE [Zhang et al.2019] | 182 | 52.6 | 27.0 | 37 | 79.1 | 56.1 | 402 | 81.9 | 58.0 | 386 | 95.7 | 92.8 |
| PTransE [Lin et al.2015a] | 207 | 51.4 | | 58 | 84.2 | | | | | | | |
| KALE [Guo et al.2016] | 225 | 47.5 | 21.3 | 73 | 76.2 | 52.3 | 252 | 83.3 | 39.5 | 241 | 94.4 | 53.2 |
| RUGE [Guo et al.2018] | 203 | 55.3 | 28.5 | 97 | 86.5 | 76.8 | | | | | | |
| ComplEx-NNE+AER [Ding et al.2018] | 193 | 57.3 | 29.3 | 116 | 87.4 | 80.3 | 481 | 83.5 | 61.9 | 450 | 94.8 | 94.3 |
| LogicENN (our work) | 175 | 66.9 | 40.2 | 112 | 87.4 | 76.6 | 368 | 84.2 | 66.3 | 357 | 94.8 | 92.3 |
To show the capability of LogicENN, we evaluated it on the link prediction task. The task is to complete a triple $(h, r, t)$ when $h$ or $t$ is missing, i.e. to predict $t$ given $(h, r)$ or $h$ given $(r, t)$. For evaluation, we use Mean Rank (MR), Mean Reciprocal Rank (MRR) and Hits@10 in the raw and filtered settings, as reported in [Wang et al.2017, Lin et al.2015b].
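The filtered metrics can be computed as in the following sketch; the toy scorer and helper names are ours, not from the paper's code:

```python
import numpy as np

# Illustrative computation of the filtered link-prediction metrics.
# For a test triple (h, r, t) we score every candidate tail, mask out
# OTHER tails already known to be true ("filtered" setting), and record
# the rank of the correct tail.
def filtered_tail_rank(scores, true_tail, known_tails):
    s = scores.copy()
    mask = [t for t in known_tails if t != true_tail]
    s[mask] = -np.inf                       # drop competing true answers
    # rank = 1 + number of candidates scoring strictly higher
    return 1 + int(np.sum(s > s[true_tail]))

rng = np.random.default_rng(3)
n_entities = 100
ranks = []
for _ in range(20):
    scores = rng.normal(size=n_entities)    # toy model scores
    true_tail = int(rng.integers(n_entities))
    scores[true_tail] += 3.0                # pretend the model is decent
    ranks.append(filtered_tail_rank(scores, true_tail, known_tails=[0, 1]))

ranks = np.array(ranks, dtype=float)
mr = ranks.mean()                           # Mean Rank
mrr = (1.0 / ranks).mean()                  # Mean Reciprocal Rank
hits10 = (ranks <= 10).mean()               # Hits@10
```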
Datasets.
We used FB15k and WN18 with the settings reported in [Bordes et al.2013]. We used the rules reported in [Guo et al.2018] for FB15k, and the rules in [Guo et al.2016] for WN18. The confidence levels of the rules were required to be no less than 0.8. In total, we used 454 rules for FB15k and 14 rules for WN18. As both datasets are reported [Dettmers et al.2018] to contain inverses of triples in their test sets, it is argued that the performance increase of some models might be due to the models learning from the inverses rather than from the graph itself. Using these datasets to compare only rule-based models is fine, as all of them use rules, e.g. the experiments of RUGE [Guo et al.2018]. However, when one wants to compare rule-based with non-rule-based models, it is better to use a dataset without many inverses. As FB15k-237 has already addressed this, we used it to compare LogicENN with other non-rule-based models.
To formulate the rules, we categorized them according to their definitions in Table 6. We did grounding for all rules except those denoted by NC, as well as equivalence and implication, since LogicENN does not need grounding for them by formulation (see Sec. 3). To maximize the utility of inference, like RUGE, we take as valid groundings those whose premise triples are observed in the training set while their conclusion triples are not.
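This grounding criterion can be illustrated with a toy training set; the relation names here are made up for illustration:

```python
# A grounding of an implication rule r1(x, y) => r2(x, y) is kept when
# its premise triple is observed in the training set but its conclusion
# triple is not (the RUGE-style criterion described above).
train = {
    ("einstein", "coauthor", "podolsky"),
    ("einstein", "collaborator", "podolsky"),
    ("podolsky", "coauthor", "rosen"),
}

def valid_groundings(rule, triples):
    r1, r2 = rule  # premise relation, conclusion relation
    return [
        (h, t)
        for (h, r, t) in triples
        if r == r1 and (h, r2, t) not in triples
    ]

# coauthor(x, y) => collaborator(x, y):
# (einstein, podolsky) is skipped because its conclusion already exists;
# (podolsky, rosen) is a valid grounding to penalize during training.
groundings = valid_groundings(("coauthor", "collaborator"), train)
assert groundings == [("podolsky", "rosen")]
```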
Table 3: Link prediction on FB15k when each rule is injected separately into the ReLU and Sig variants of LogicENN.

| Rule | MR (ReLU) | MR (Sig) | Hit@10 (ReLU) | Hit@10 (Sig) | MRR (ReLU) | MRR (Sig) |
|---|---|---|---|---|---|---|
| With No Rule | 320 | 430 | 42.2 | 36.6 | 18.1 | 15.2 |
| Inverse | 187 | 180 | 62.1 | 59.3 | 37.7 | 35.2 |
| Implication | 321 | 421 | 40.5 | 37.1 | 18.2 | 16.4 |
| Symmetry | 299 | 387 | 42.2 | 37.9 | 18.5 | 17.2 |
| Equivalence | 302 | 330 | 41.7 | 38.4 | 19.0 | 17.7 |
| Composition | 303 | 391 | 41.0 | 37.1 | 18.1 | 16.0 |
Table 4: Link prediction results on FB15k-237.

| Model | Raw MR | Raw Hits@10 | Filtered MR | Filtered Hits@10 |
|---|---|---|---|---|
| ComplEx | 620 | 25.4 | 457 | 45.7 |
| ComplEx-N3 | 553 | 29.7 | 421 | 50.0 |
| ConvE | 489 | 28.4 | 246 | 49.1 |
| ASR-ComplEx | 570 | 26.3 | 420 | 46.1 |
| RotatE | 374 | 33.3 | 258 | 47.1 |
| QuatE | 354 | 32.2 | 161 | 48.3 |
| LogicENN | 454 | 34.7 | 424 | 47.3 |
Experimental Setup.
To select the structure of the model, we tried different settings for the number of neurons/layers and the types of activation functions. Two of the best settings, which we refer to as the ReLU and Sig variants of LogicENN, both had 3 hidden layers with 1k, 2k and 200 neurons, respectively. The 4th layer was the output layer, where the number of neurons was equal to the number of relations in the dataset. In the ReLU variant we used ReLU on all hidden layers; in the Sig variant we used Sigmoid as the activation function between layers and ReLU on the last hidden layer. Both variants were combined with each of the rules we used; when a model was armed with all existing rules for a dataset, we refer to it as the corresponding all-rules variant. Moreover, we also report a variant in which we added the reverse of triples to the target dataset, as also done in [Lacroix et al.2018].
We implemented the models in PyTorch and used the Adam optimizer for training. We selected the optimal hyperparameters of our models by early validation stopping according to MRR on the validation set, and restricted the number of iterations to 2000. For the basic models that integrate no rules, we created 100 mini-batches on each dataset. We tuned the embedding dimensionality $d$, the learning rate $\eta$ and the ratio $n$ of negative over positive training samples. The optimal configuration for both variants is $d = 200$, $\eta = 0.001$, $n = 8$ on FB15k, and $d = 200$, $\eta = 0.001$, $n = 5$ on WN18. Based on these optimal configurations, we further tuned the regularization coefficients and slack variables for the different types of rules (see Table 6) to obtain the optimal hyperparameters of the all-rules variants. For the ReLU variant, the optimal values of these coefficients were 0.05, 1, 0.5, 5, 0.1 and 3 on FB15k, and 0.01 and 0.1 on WN18. For the Sig variant they were 0.05, 0.5, 0.1, 3, 0.1 and 3 on FB15k, and 0.005 and 0.1 on WN18.

Results.
Table 2 compares LogicENN with eight state-of-the-art embedding models as basic models, which use only the observed triples in a KG and rely on no rules. We also took PTransE, KALE, RUGE and ComplEx-NNE+AER as additional baselines; they encode relation paths or logical rules, like LogicENN. Among them, the first two are extensions of TransE, while the rest are extensions of ComplEx.
To compare both the raw and filtered results of LogicENN and the baselines, we take the results of the first five baseline models on FB15k as reported by [Akrami et al.2018], and use the code provided by [Guo et al.2016, Guo et al.2018, Ding et al.2018] for KALE, ComplEx, RUGE and ComplEx-NNE+AER to produce the raw results with the optimal configurations reported in the original papers. We also ran the code of RotatE to obtain its results. Because RotatE [Sun et al.2019] uses complex vectors, we set its embedding dimension to 130 (260 adjustable parameters per entity) and generated 10 negative samples to have a fair comparison with our method. The results of QuatE [Zhang et al.2019] were obtained by running its code with embedding dimension 60 and 10 negative samples, without using type constraints, again for a fair comparison. We set the embedding dimension to 60 (240 adjustable parameters) because QuatE provides 4 vectors for each entity. The other results in Table 2 are taken from the original papers.
As discussed previously, FB15k and WN18 contain inverses of triples in their test sets. To show the performance of LogicENN in comparison with non-rule-based methods, we ran experiments on FB15k-237, which is reported not to contain many reverse triples. Table 4 shows the comparison of our method with other non-rule-based models in this regard.
Discussion of Results.
As shown in Table 2, LogicENN outperformed all embedding models on FB15k in the raw setting in terms of MR, Hits@10 and MRR. In the filtered setting it also performs better in terms of FHits@10 and is very close to RUGE (the second-best performing model) in terms of FMRR. On WN18, our model achieved the best performance in raw Hits@10 and raw MRR. In terms of FHits@10, only ConvE and QuatE outperformed our model.
To investigate whether the inclusion of logical rules improves the performance of our model, we added each rule to the naked LogicENN separately. Table 3 shows the improvements obtained by adding each rule. As shown, the inclusion of each rule improves the performance of the naked model. For FB15k, the best improvement is obtained by the inverse rule, which is the most common rule in FB15k. Both variants of the model performed better when rules were added.
The two most recent methods, RUGE and ComplEx-NNE+AER, use the ComplEx score function to encode rules. As Table 2 shows, the performance of ComplEx on raw Hits@10 was 48.5%, which the rule encodings of RUGE and ComplEx-NNE+AER improved by less than 10% (to 55.3% and 57.3%, respectively). In contrast, our method without any rule encoding achieved around 40% (Table 3), which jumped to around 67% (Table 2) when rules were encoded. Our method thus improved by around 27%, more than double the improvement of its competitors. We can therefore conclude that the encoding techniques of Table 6 properly encode rules.
Figure 2 shows that the model has properly learned the equivalence and implication relations. To see this, recall from Section 3.3 that we can avoid grounding for the implication rule; the resulting formula was $\boldsymbol{\beta}^{r_1} \le \boldsymbol{\beta}^{r_2}$, i.e. $\boldsymbol{\beta}^{r_1} - \boldsymbol{\beta}^{r_2} \le 0$. A similar argument implies $\boldsymbol{\beta}^{r_1} - \boldsymbol{\beta}^{r_2} = 0$ for equivalence. Therefore, if the model has properly learned an implication (equivalence), the difference of the embedding vectors of the two relations should contain negative (zero) elements. In Figure 2, the x-axis represents the means of the elements of the differences $\boldsymbol{\beta}^{r_1} - \boldsymbol{\beta}^{r_2}$ and the y-axis represents their variances. As depicted, the points associated with the equivalence relations accumulate around the origin, and the points associated with the implication relations have negative means. This shows that LogicENN has properly encoded these rules without using grounding for the 30 implication and 68 equivalence relations in FB15k.
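The Figure 2 diagnostic can be reproduced on synthetic relation embeddings as follows; the vectors here are simulated to mimic a trained model, not learned from data:

```python
import numpy as np

# Toy version of the Figure 2 diagnostic: for each rule, summarise the
# elementwise difference beta_r1 - beta_r2 by its mean and variance.
# Implication r1 => r2 should yield a negative mean; equivalence should
# cluster near the origin (mean ~ 0, variance ~ 0).
rng = np.random.default_rng(4)
L = 32
beta_r2 = rng.normal(size=L)
beta_impl = beta_r2 - rng.uniform(0.1, 0.5, size=L)    # simulates beta_r1 <= beta_r2
beta_equiv = beta_r2 + rng.normal(scale=1e-3, size=L)  # simulates beta_r1 ~= beta_r2

def summarise(b1, b2):
    diff = b1 - b2
    return diff.mean(), diff.var()

m_impl, v_impl = summarise(beta_impl, beta_r2)
m_eq, v_eq = summarise(beta_equiv, beta_r2)
assert m_impl < 0.0                        # implication: negative elements
assert abs(m_eq) < 0.01 and v_eq < 0.01    # equivalence: near the origin
```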
As Table 4 shows, LogicENN outperforms all state-of-the-art models in terms of raw Hits@10. We should note that the originally reported result of ComplEx-N3 used an embedding dimension of 2k [Lacroix et al.2018]. To have a fair and equal comparison, we reran their code with the same setting we used for all of our experiments, i.e. an embedding dimension of 200.
5 Conclusion and Future Work
In this work we introduced a new neural embedding model (LogicENN) which is able to encode different rules. We proved that LogicENN is fully expressive and derived algebraic formulae to enforce it to learn different relations. We also showed how rules can be properly injected into the learning process.
Our extensive experiments on different benchmarks show that LogicENN outperformed all embedding models on FB15k in the raw setting and performed very well in the filtered setting. For WN18, the model performed better than almost all others in the raw setting and very close to the best models in the filtered setting. On FB15k-237, the model was better than the non-rule-based models on raw Hits@10.
The expressiveness of other kinds of neural models, as well as the necessary and sufficient conditions for injecting rules, are targets of future work.
References
 [Akrami et al.2018] F. Akrami, L. Guo, W. Hu, and C. Li. Re-evaluating embedding-based knowledge graph completion methods. In ACM CIKM, 2018.
 [Bordes et al.2013] A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, and O. Yakhnenko. Translating embeddings for modeling multi-relational data. In NIPS, 2013.
 [Demeester et al.2016] T. Demeester, T. Rocktäschel, and S. Riedel. Lifted rule injection for relation embeddings. arXiv:1606.08359, 2016.
 [Dettmers et al.2018] T. Dettmers, P. Minervini, P. Stenetorp, and S. Riedel. Convolutional 2d knowledge graph embeddings. In AAAI, 2018.
 [Ding et al.2018] B. Ding, Q. Wang, B. Wang, and L. Guo. Improving knowledge graph embedding using simple constraints. In ACL, 2018.
 [Dong et al.2014] X. Dong, E. Gabrilovich, G. Heitz, W. Horn, Ni Lao, K. Murphy, T. Strohmann, S. Sun, and W. Zhang. Knowledge vault: A web-scale approach to probabilistic knowledge fusion. In ACM SIGKDD, 2014.
 [Guan et al.2018] S. Guan, X. Jin, Y. Wang, and X. Cheng. Shared embedding based neural networks for knowledge graph completion. In 27th ACMCIKM, 2018.
 [Guo et al.2016] S. Guo, Q. Wang, L. Wang, B. Wang, and Li Guo. Jointly embedding knowledge graphs and logical rules. In EMNLP, 2016.
 [Guo et al.2018] S. Guo, Q. Wang, L. Wang, B. Wang, and Li Guo. Knowledge graph embedding with iterative guidance from soft rules. In AAAI, 2018.
 [Guu et al.2015] K. Guu, J. Miller, and P. Liang. Traversing knowledge graphs in vector space. In EMNLP, 2015.
 [Han et al.2018] X. Han, C. Zhang, T. Sun, Y. Ji, and Z. Hu. A triplebranch neural network for knowledge graph embedding. IEEE Access, 6, 2018.
 [Huang et al.2000] G.B Huang, Y.Q Chen, and H.A Babri. Classification ability of single hidden layer feedforward neural networks. IEEE TNN, 11(3), 2000.
 [Kazemi and Poole2018] S.M Kazemi and D. Poole. Simple embedding for link prediction in knowledge graphs. arXiv:1802.04868, 2018.
 [Kuttler2011] Kenneth Kuttler. Multivariable calculus, applications and theory. 2011.
 [Lacroix et al.2018] T. Lacroix, N. Usunier, and G. Obozinski. Canonical tensor decomposition for knowledge base completion. arXiv:1806.07297, 2018.
 [Lin et al.2015a] Y. Lin, Z. Liu, H. Luan, M. Sun, S. Rao, and S. Liu. Modeling relation paths for representation learning of knowledge bases. In EMNLP, 2015.
 [Lin et al.2015b] Y. Lin, Z. Liu, M. Sun, Y. Liu, and X. Zhu. Learning entity and relation embeddings for knowledge graph completion. In AAAI, volume 15, 2015.
 [Liu et al.2017] H. Liu, Y. Wu, and Y. Yang. Analogical inference for multirelational embeddings. In ICML, 2017.
 [Minervini et al.] P. Minervini, T. Demeester, T. Rocktäschel, and S. Riedel. Adversarial sets for regularising neural link predictors. arXiv:1707.07596.
 [Minervini et al.2017] P. Minervini, L. Costabello, E. Muñoz, V. Nováček, and P.Y Vandenbussche. Regularizing knowledge graph embeddings via equivalence and inversion axioms. In ECML PKDD, 2017.
 [Neelakantan et al.2015] A. Neelakantan, B. Roth, and A. McCallum. Compositional vector space models for knowledge base completion. arXiv:1504.06662, 2015.

 [Nguyen et al.2018] D. Q. Nguyen, T. D. Nguyen, D. Q. Nguyen, and D. Phung. A novel embedding model for knowledge base completion based on convolutional neural network. In NAACL-HLT, 2018.
 [Nickel et al.2012] M. Nickel, V. Tresp, and H.-P. Kriegel. Factorizing YAGO: scalable machine learning for linked data. In 21st Conf. on World Wide Web. ACM, 2012.
 [Nickel et al.2016] M. Nickel, K. Murphy, V. Tresp, and E. Gabrilovich. A review of relational machine learning for knowledge graphs. Proc. of the IEEE, 104(1), 2016.
 [Rocktäschel et al.2015] T. Rocktäschel, S. Singh, and S. Riedel. Injecting logical background knowledge into embeddings for relation extraction. In NAACL-HLT, 2015.
 [Shi and Weninger2017] B. Shi and T. Weninger. ProjE: Embedding projection for knowledge graph completion. In AAAI, volume 17, 2017.

 [Socher et al.2013] R. Socher, D. Chen, C. D. Manning, and A. Ng. Reasoning with neural tensor networks for knowledge base completion. In NIPS, 2013.
 [Sun et al.2019] Z. Sun, Z. Deng, J. Nie, and J. Tang. RotatE: Knowledge graph embedding by relational rotation in complex space. In ICLR, 2019.
 [Trouillon et al.2016] T. Trouillon, J. Welbl, S. Riedel, É. Gaussier, and G. Bouchard. Complex embeddings for simple link prediction. In ICML, 2016.
 [Trouillon et al.2017] T. Trouillon, C. Dance, É. Gaussier, J. Welbl, S. Riedel, et al. Knowledge graph completion via complex tensor factorization. JMLR, 18(1), 2017.

 [Wang et al.2014] Z. Wang, J. Zhang, J. Feng, and Z. Chen. Knowledge graph embedding by translating on hyperplanes. In AAAI, volume 14, 2014.
 [Wang et al.2017] Q. Wang, Z. Mao, B. Wang, and L. Guo. Knowledge graph embedding: A survey of approaches and applications. IEEE TKDE, 29(12), 2017.
 [Wang et al.2018] Y. Wang, R. Gemulla, and H. Li. On multi-relational link prediction with bilinear models. In AAAI, 2018.
 [Yang et al.2015] B. Yang, W.-t. Yih, X. He, J. Gao, and L. Deng. Embedding entities and relations for learning and inference in knowledge bases. In ICLR, 2015.
 [Yoon et al.2016] H.-G. Yoon, H.-J. Song, S.-B. Park, and S.-Y. Park. A translation-based knowledge graph embedding preserving logical property of relations. In NAACL-HLT, 2016.
 [Zhang et al.2019] S. Zhang, Y. Tay, L. Yao, and Q. Liu. Quaternion knowledge graph embedding. arXiv:1904.10281, 2019.
6 Supplementary Materials for the Paper: LogicENN
This section contains supplementary materials for our paper “LogicENN: A Neural Based Knowledge Graphs Embedding Model with Logical Rules”.
In Section 6.1, we first review the relevant neural-based models and describe their similarities to and differences from LogicENN. Then, in Section 6.2, we provide our proposed theorems with their detailed proofs.
6.1 LogicENN vs. State-of-the-art Neural-Based Models
This section describes the relevant neural-network-based embedding models. We divide them into models that do not incorporate logical rules into their embeddings and those that do. We then compare their score functions with that of LogicENN and discuss how LogicENN differs from other state-of-the-art models.
Before proceeding, we first define the relevant notation. Vectors in $\mathbb{R}^d$ are denoted by bold lowercase letters (e.g. $\mathbf{0}$ and $\mathbf{1}$ for the vectors of zeros and ones, respectively), matrices by bold capital letters, and tensors by underlined bold capital letters. $\mathbf{W}$ and $\mathbf{M}$ are weight matrices which have no dependency on $h$, $r$ or $t$, while $\mathbf{W}_r$ ($\underline{\mathbf{W}}_r$) means that the weight matrix (tensor) is associated with relation $r$. Moreover, $\bar{\mathbf{h}}$ and $\bar{\mathbf{r}}$ are 2D reshapes of $\mathbf{h}$ and $\mathbf{r}$, respectively.
6.1.1 Neural Network Based Embedding Models
E-MLP.
E-MLP is a standard multi-layer perceptron (MLP) for KGE. It uses one neural network per relation in the KG, which results in high space complexity.
ER-MLP.
[Dong et al.2014] In contrast to E-MLP, which uses one neural network per relation, ER-MLP shares its weights among all entities and relations. The relation embedding is given as an input to the network.
NTN.
[Socher et al.2013] employs a 3-way tensor in the hidden layer to better capture the interactions between the features of the two entities. The score function of NTN is as follows:

$$f(h,r,t) = \mathbf{u}_r^{\top} \tanh\!\left(\mathbf{h}^{\top}\underline{\mathbf{W}}_r\,\mathbf{t} + \mathbf{V}_r[\mathbf{h};\mathbf{t}] + \mathbf{b}_r\right) \qquad (3)$$

where $\underline{\mathbf{W}}_r$ and $\mathbf{b}_r$ are the 3-way relation-specific tensor and the bias of the hidden layer, respectively.
ConvE.
[Dettmers et al.2018] is a multi-layer convolutional network for link prediction. The score function of ConvE is as follows:

$$f(h,r,t) = g\!\left(\mathrm{vec}\!\left(g\!\left([\bar{\mathbf{h}};\bar{\mathbf{r}}] * \boldsymbol{\omega}\right)\right)\mathbf{W}\right)\mathbf{t} \qquad (4)$$

where $\boldsymbol{\omega}$ is the convolutional filter, $\mathbf{W}$ is a linear transformation matrix, and $g$ is an activation function.
ConvKB.
[Nguyen et al.2018] is a multi-layer convolutional network for link prediction with the following score function:

$$f(h,r,t) = \mathbf{w}^{\top}\,\mathrm{concat}\!\left(g\!\left([\mathbf{h},\mathbf{r},\mathbf{t}] * \boldsymbol{\Omega}\right)\right) \qquad (5)$$
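As an illustration of the ConvKB-style score above, the NumPy sketch below treats $[\mathbf{h},\mathbf{r},\mathbf{t}]$ as a $d \times 3$ matrix and applies $1\times 3$ filters row-wise; the filter count, ReLU activation, and random initialization are illustrative assumptions, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(3)
d, n_filters = 8, 4  # embedding dimension and number of filters (illustrative)

h, r, t = rng.normal(size=(3, d))
A = np.stack([h, r, t], axis=1)          # d x 3 input matrix [h, r, t]
omega = rng.normal(size=(n_filters, 3))  # each 1x3 filter spans one row of A
w = rng.normal(size=n_filters * d)       # shared output weight vector

def convkb_score(A):
    """Apply the 1x3 filters row-wise, concatenate the ReLU feature maps,
    and take a dot product with the shared weight vector."""
    feature_maps = np.maximum(0.0, A @ omega.T)  # shape (d, n_filters)
    return float(w @ feature_maps.T.ravel())     # concat filter-major

s = convkb_score(A)
```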
Method | Score Function $f(h,r,t)$
E-MLP | $\mathbf{u}_r^{\top}\, g(\mathbf{W}_r[\mathbf{h};\mathbf{t}])$
ER-MLP | $\mathbf{w}^{\top}\, g(\mathbf{M}[\mathbf{h};\mathbf{r};\mathbf{t}])$
NTN | $\mathbf{u}_r^{\top} \tanh(\mathbf{h}^{\top}\underline{\mathbf{W}}_r\,\mathbf{t} + \mathbf{V}_r[\mathbf{h};\mathbf{t}] + \mathbf{b}_r)$
ConvE | $g(\mathrm{vec}(g([\bar{\mathbf{h}};\bar{\mathbf{r}}] * \boldsymbol{\omega}))\mathbf{W})\,\mathbf{t}$
ConvKB | $\mathbf{w}^{\top}\,\mathrm{concat}(g([\mathbf{h},\mathbf{r},\mathbf{t}] * \boldsymbol{\Omega}))$
SENN.
[Guan et al.2018] defines three multi-layer neural networks with ReLU activation functions for head, relation and tail prediction. It then integrates them into one loss function to train the model.
TBNN.
[Han et al.2018] is a triple-branch neural network in which parallel branched layers are defined on top of an interaction layer, and the embedding of each element of a KG is specified by its multi-restriction. The loss function is defined based on the scores of the three elements of each triple.
ProjE.
[Shi and Weninger2017] is a two-layer neural network. The first layer is a combination layer that operates on the tail and relation, and the second layer is a projection layer that projects the vector obtained from the first layer onto the candidate-entity matrix. The candidate-entity matrix is a subset of the entity matrix whose entities can be sampled in different ways.
In short, Table 5 lists the score functions of the different neural-network-based embedding models.
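To make the shared-weight design of ER-MLP concrete, here is a minimal NumPy sketch; the dimensions, random initialization, and tanh activation are illustrative assumptions, not the original model's settings:

```python
import numpy as np

rng = np.random.default_rng(0)
d, hidden = 8, 16  # embedding and hidden-layer sizes (illustrative)

# ER-MLP shares one network across all relations: the relation embedding
# is simply part of the input vector [h; r; t].
M = rng.normal(size=(hidden, 3 * d))  # shared first-layer weights
w = rng.normal(size=hidden)           # shared output weights

def er_mlp_score(h, r, t):
    """Score a triple with a single shared MLP (tanh hidden layer)."""
    x = np.concatenate([h, r, t])
    return float(w @ np.tanh(M @ x))

h, r, t = rng.normal(size=(3, d))
s = er_mlp_score(h, r, t)
```

Because `M` and `w` carry no relation index, the parameter count does not grow with the number of relations, in contrast to the per-relation networks of E-MLP and NTN.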
6.1.2 KG Embedding Models with Logical Rules
RUGE.
[Guo et al.2018] provides a general framework to iteratively inject logical rules into KGEs. Given a set of soft logical rules, each consisting of a rule and its confidence value, the rules are represented as mathematical constraints used to obtain soft labels for unlabeled triples. An optimization problem is then solved to update the embedding vectors based on the hard- and soft-labeled triples. The framework is used to train the ComplEx model as a case study.
ComplEx-NNE+AER.
[Ding et al.2018] is a model-dependent approach that derives a formula for the entailment rule to avoid grounding in ComplEx. The model outperforms RUGE on FB15K in terms of mean rank and Hit@k.
6.1.3 Comparison of LogicENN with Other Models
LogicENN uses the scoring function given in Equation (1). We formulate the score function to separate the entity and relation spaces. This enables the model to map entity pairs through a universal hidden-layer mapping $\boldsymbol{\phi}(\mathbf{h},\mathbf{t})$. Since we prove that $\boldsymbol{\phi}$ is universal, we can share it among all relations. Because $\boldsymbol{\phi}$ is used by several relations, the model has fewer parameters. Neural embedding models such as NTN and E-MLP do not share the parameters of the hidden layer; therefore, a separate neural network is used for each relation. ER-MLP feeds the entity pair as well as the relation into the neural network. Including the relation in the hidden layer prevents the model from avoiding grounding for the implication rule. The same problem occurs for ConvE and ConvKB. Moreover, the full expressiveness of ConvE and ConvKB has not been investigated yet.
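The separation of entity and relation spaces can be sketched as follows; this is a minimal illustration of the design, with all dimensions, names, and the ReLU activation chosen for the example rather than taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
d, L, n_rel = 8, 32, 5  # embedding dim, hidden width, number of relations

# Shared hidden layer phi: it depends only on the entity pair, never on the
# relation, so it can be reused by every relation in the KG.
W = rng.normal(size=(L, 2 * d))
b = rng.normal(size=L)

def phi(h, t):
    """Entity-pair features; ReLU keeps them nonnegative."""
    return np.maximum(0.0, W @ np.concatenate([h, t]) + b)

# Per-relation output weights beta_r are the only relation-specific parameters.
beta = rng.normal(size=(n_rel, L))

def score(h, r_idx, t):
    """Score of a triple: inner product of shared features and beta_r."""
    return float(beta[r_idx] @ phi(h, t))
```

Only the `beta` rows grow with the number of relations, which is what keeps the parameter count low compared to one full network per relation.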
Regarding encoding techniques, we derive formulae for the proposed NN to encode function-free Horn clause rules. For the implication and equivalence rules, we approximate the original formulation in a way that avoids grounding. Since we proved that our model is fully expressive, it can encode all Horn clause rules.
Regarding the last column of Table 1, we add slack variables to better handle uncertainty during the injection of rules into the embeddings. The uncertainty stems from the fact that KGs contain false positive triples.
6.2 Theorems and Proofs
In this section we state the theorems which prove the full expressiveness of our proposed model LogicENN.
Theorem 3.
Let $\mathcal{N}$ be the set of all possible networks defined as above, and let $\mathcal{N}$ be dense in $C(\mathbb{R}^{2d})$, where $d$ is an arbitrary embedding dimension. Given any ground truth over a KG with $n$ true facts, there exists a LogicENN in $\mathcal{N}$ with embedding dimension $d$ that can represent this ground truth. The same holds when $\mathcal{N}$ is dense in $C(K_1 \times K_2)$, where $K_1 \times K_2$ is the Cartesian product of two compact sets.
Proof.
By the assumption of the theorem, $K_1$ and $K_2$ are compact sets, and $K_1 \times K_2$ is a compact set, since the Cartesian product of two compact sets is also compact [Kuttler2011]. By Lemma 2.1 in [Huang et al.2000], given $n$ disjoint regions, there exists at least one continuous function $f$ that takes the value $c_i$ on the $i$-th region, where the $c_i$ are arbitrary distinct constant values. Therefore, for a ground truth with $n$ facts, there exists a continuous function $f$ that represents the ground truth. Because $\mathcal{N}$ is dense in $C(\mathbb{R}^{2d})$ or $C(K_1 \times K_2)$, there exists at least one neural network in $\mathcal{N}$ that approximates the function $f$. We conclude that there exists a LogicENN that can represent the ground truth. ∎
Remark: The density assumption on $\mathcal{N}$ in Theorem 3 depends on the activation function in Equation (1) of the paper. When the activation function is continuous, bounded, and nonconstant, $\mathcal{N}$ is dense in $C(K)$ for every compact set $K$. When it is unbounded and nonconstant, $\mathcal{N}$ is dense in $L^p(\mu)$ for every finite measure $\mu$; in this case, the compactness condition can be removed. For nonpolynomial activation functions that are locally essentially bounded, $\mathcal{N}$ is dense in $C(K)$.
Rule | Definition | Formulation based on score function (the equivalent regularization forms are denoted in Equation (2))
Equivalence | $(h,r_1,t) \leftrightarrow (h,r_2,t)$ | $f(h,r_1,t) = f(h,r_2,t)$
Symmetric | $(h,r,t) \rightarrow (t,r,h)$ | $f(h,r,t) = f(t,r,h)$
Asymmetric | $(h,r,t) \rightarrow \neg (t,r,h)$ | NC
Negation | $(h,r_1,t) \rightarrow \neg (h,r_2,t)$ | NC
Implication | $(h,r_1,t) \rightarrow (h,r_2,t)$ | $f(h,r_1,t) \le f(h,r_2,t)$
Inverse | $(h,r_1,t) \rightarrow (t,r_2,h)$ | $f(h,r_1,t) = f(t,r_2,h)$
Reflexivity | $(h,r,h)$ | NC
Irreflexive | $\neg (h,r,h)$ | NC
Transitivity | $(h,r,t) \wedge (t,r,u) \rightarrow (h,r,u)$ | $f(h,r,t)\, f(t,r,u)\, (f(h,r,u) - 1) \ge 0$
Composition | $(h,r_1,t) \wedge (t,r_2,u) \rightarrow (h,r_3,u)$ | $f(h,r_1,t)\, f(t,r_2,u)\, (f(h,r_3,u) - 1) \ge 0$
Theorem 4.
Proof for the Equivalence Relation.
Based on the theorem statement, we want to show that LogicENN can infer the equivalence rule $(h,r_1,t) \leftrightarrow (h,r_2,t)$ if and only if $\langle \boldsymbol{\phi}(\mathbf{h},\mathbf{t}), \boldsymbol{\beta}_{r_1} - \boldsymbol{\beta}_{r_2} \rangle = 0$.
If $r_1$ and $r_2$ are equivalent relations, we have $(h,r_1,t) \leftrightarrow (h,r_2,t)$.
Without loss of generality, let $\gamma^{+}$ and $\gamma^{-}$ denote the score thresholds of the NN for positive and negative triples, respectively. For equivalence, both triples must be true or false simultaneously. Therefore, if $f(h,r_1,t) \ge \gamma^{+}$ then $f(h,r_2,t) \ge \gamma^{+}$, and if $f(h,r_1,t) \le \gamma^{-}$ then $f(h,r_2,t) \le \gamma^{-}$. We conclude that $f(h,r_1,t) = f(h,r_2,t)$.
From Equation (1) in the paper, we have $f(h,r,t) = \langle \boldsymbol{\phi}(\mathbf{h},\mathbf{t}), \boldsymbol{\beta}_r \rangle$. Therefore, we have $\langle \boldsymbol{\phi}(\mathbf{h},\mathbf{t}), \boldsymbol{\beta}_{r_1} - \boldsymbol{\beta}_{r_2} \rangle = 0$.
∎
Proof for the Symmetric Relation.
Based on the theorem statement, we want to show that LogicENN can infer the symmetric rule $(h,r,t) \rightarrow (t,r,h)$ if and only if $\langle \boldsymbol{\phi}(\mathbf{h},\mathbf{t}) - \boldsymbol{\phi}(\mathbf{t},\mathbf{h}), \boldsymbol{\beta}_r \rangle = 0$.
If $r$ is a symmetric relation, we have $(h,r,t) \rightarrow (t,r,h)$.
For a symmetric relation, both triples must be true or false simultaneously. Therefore, if $f(h,r,t) \ge \gamma^{+}$ then $f(t,r,h) \ge \gamma^{+}$, and if $f(h,r,t) \le \gamma^{-}$ then $f(t,r,h) \le \gamma^{-}$. We conclude $f(h,r,t) = f(t,r,h)$.
From Equation (1) in the paper, we have $f(h,r,t) = \langle \boldsymbol{\phi}(\mathbf{h},\mathbf{t}), \boldsymbol{\beta}_r \rangle$ and $f(t,r,h) = \langle \boldsymbol{\phi}(\mathbf{t},\mathbf{h}), \boldsymbol{\beta}_r \rangle$.
Therefore, we have $\langle \boldsymbol{\phi}(\mathbf{h},\mathbf{t}), \boldsymbol{\beta}_r \rangle = \langle \boldsymbol{\phi}(\mathbf{t},\mathbf{h}), \boldsymbol{\beta}_r \rangle$.
We conclude that $\langle \boldsymbol{\phi}(\mathbf{h},\mathbf{t}) - \boldsymbol{\phi}(\mathbf{t},\mathbf{h}), \boldsymbol{\beta}_r \rangle = 0$.
∎
Proof for the Implication Relation.
Based on the theorem statement, we want to show that LogicENN can infer the implication rule $(h,r_1,t) \rightarrow (h,r_2,t)$ if and only if $\langle \boldsymbol{\phi}(\mathbf{h},\mathbf{t}), \boldsymbol{\beta}_{r_1} - \boldsymbol{\beta}_{r_2} \rangle \le 0$.
If $r_1$ implies $r_2$, we have $(h,r_1,t) \rightarrow (h,r_2,t)$.
To satisfy the implication rule, if $f(h,r_1,t) \ge \gamma^{+}$ then $f(h,r_2,t) \ge \gamma^{+}$, and if $f(h,r_1,t) \le \gamma^{-}$ then $f(h,r_2,t) \le \gamma^{-}$ or $f(h,r_2,t) \ge \gamma^{+}$. We conclude $f(h,r_1,t) \le f(h,r_2,t)$.
From Equation (1) in the paper, we have $\langle \boldsymbol{\phi}(\mathbf{h},\mathbf{t}), \boldsymbol{\beta}_{r_1} \rangle \le \langle \boldsymbol{\phi}(\mathbf{h},\mathbf{t}), \boldsymbol{\beta}_{r_2} \rangle$.
We conclude that $\langle \boldsymbol{\phi}(\mathbf{h},\mathbf{t}), \boldsymbol{\beta}_{r_1} - \boldsymbol{\beta}_{r_2} \rangle \le 0$; when $\boldsymbol{\phi}(\mathbf{h},\mathbf{t}) \ge \mathbf{0}$, the elementwise constraint $\boldsymbol{\beta}_{r_1} \le \boldsymbol{\beta}_{r_2}$ is sufficient for this to hold.
∎
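This ordering argument can be sanity-checked numerically; the sketch below assumes nonnegative features (e.g. from a ReLU layer) and an elementwise ordering of the relation weight vectors, with all values illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
L = 16  # feature width (illustrative)

# Nonnegative entity-pair features, as produced by a ReLU hidden layer.
phi_ht = np.abs(rng.normal(size=L))

# Enforce the elementwise constraint beta_r1 <= beta_r2.
beta_r2 = rng.normal(size=L)
beta_r1 = beta_r2 - np.abs(rng.normal(size=L))

f_r1 = float(beta_r1 @ phi_ht)
f_r2 = float(beta_r2 @ phi_ht)

# The implication ordering of scores holds for any nonnegative phi.
assert f_r1 <= f_r2
```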
Proof for the Transitivity Relation.
To prove transitivity, we use the truth table of the rule $(h,r,t) \wedge (t,r,u) \rightarrow (h,r,u)$. Considering Equation (1), we assume that a score of $1$ denotes True and a score of $0$ denotes False.
In the following conditions, the rule is True:
If $(h,r,t)$ is True and $(t,r,u)$ is True, then $(h,r,u)$ is True.
If $(h,r,t)$ is False and $(t,r,u)$ is True, then the rule is True.
If $(h,r,t)$ is True and $(t,r,u)$ is False, then the rule is True.
If $(h,r,t)$ is False and $(t,r,u)$ is False, then the rule is True.
Otherwise, the rule is False.
The constraint $f(h,r,t)\, f(t,r,u)\, \big(f(h,r,u) - 1\big) \ge 0$ follows the truth table.
∎
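The truth-table argument can be checked exhaustively; the sketch below assumes a product-form constraint $f_1 f_2 (f_3 - 1) \ge 0$ with scores restricted to $\{0, 1\}$:

```python
from itertools import product

def rule_holds(x, y, z):
    """Truth value of (x AND y) -> z, with 1 = True and 0 = False."""
    return bool((not (x and y)) or z)

def constraint(x, y, z):
    """Score-level constraint: x * y * (z - 1) >= 0."""
    return x * y * (z - 1) >= 0

# The constraint is satisfied on exactly the rows where the rule is true:
# it is violated only for x = y = 1, z = 0.
for x, y, z in product([0, 1], repeat=3):
    assert constraint(x, y, z) == rule_holds(x, y, z)
```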
The proofs for the Asymmetric, Negation, Inverse, Reflexivity, Irreflexivity and Composition relations are done similarly.