InteractE: Improving Convolution-based Knowledge Graph Embeddings by Increasing Feature Interactions

by   Shikhar Vashishth, et al.

Most existing knowledge graphs suffer from incompleteness, which can be alleviated by inferring missing links based on known facts. One popular way to accomplish this is to generate low-dimensional embeddings of entities and relations, and use these to make inferences. ConvE, a recently proposed approach, applies convolutional filters on 2D reshapings of entity and relation embeddings in order to capture rich interactions between their components. However, the number of interactions that ConvE can capture is limited. In this paper, we analyze how increasing the number of these interactions affects link prediction performance, and utilize our observations to propose InteractE. InteractE is based on three key ideas: feature permutation, a novel feature reshaping, and circular convolution. Through extensive experiments, we find that InteractE outperforms state-of-the-art convolutional link prediction baselines on FB15k-237. Further, InteractE achieves an MRR score that is 9%, 7.5%, and 23% better than ConvE on the FB15k-237, WN18RR, and YAGO3-10 datasets, respectively. The results validate our central hypothesis that increasing feature interaction is beneficial to link prediction performance. We make the source code of InteractE available to encourage reproducible research.







AAAI 2020 - InteractE: Improving Convolution-based Knowledge Graph Embeddings by Increasing Feature Interactions


1 Introduction

Knowledge graphs (KGs) are structured representations of facts, where nodes represent entities and edges represent relationships between them. This can be represented as a collection of triples $(s, r, o)$, each representing a relation $r$ between a "subject-entity" $s$ and an "object-entity" $o$. Some real-world knowledge graphs include Freebase [freebase], WordNet [wordnet], YAGO [yago], and NELL [nell]. KGs find application in a variety of tasks, such as relation extraction [distant_supervision2009], question answering [qa_kg_1, qa_kg_2], recommender systems [kb-recommender] and dialog systems [kg_in_dialog].

However, most existing KGs are incomplete [kg_incomp1]. The task of link prediction alleviates this drawback by inferring missing facts based on the known facts in a KG. A popular approach for solving this problem involves learning a low-dimensional representation for all entities and relations and utilizing them to predict new facts. In general, most existing link prediction methods learn to embed KGs by optimizing a score function which assigns higher scores to true facts than invalid ones. These score functions can be classified as translation distance based [transe, transg, transh] or semantic matching based [hole, analogy].

Figure 1: Overview of InteractE. Given entity and relation embeddings ($e_s$ and $e_r$ respectively), InteractE generates multiple permutations of these embeddings and reshapes them using a "Checkered" reshaping function ($\phi_{chk}$). Depth-wise circular convolution is employed to convolve each of the reshaped permutations, which are then flattened and fed to a fully-connected layer to generate the predicted object embedding. Please refer to Section 5 for details.

Recently, neural networks have also been utilized to learn the score function [neural_tensor_network, chandrahas2017, conve]. The motivation behind these approaches is that shallow methods like TransE [transe] and DistMult [distmult] are limited in their expressiveness. As noted in [conve], the only way to remedy this is to increase the size of their embeddings, which leads to an enormous increase in the number of parameters and hence limits their scalability to larger knowledge graphs.

Convolutional Neural Networks (CNN) have the advantage of using multiple layers, thus increasing their expressive power, while at the same time remaining parameter-efficient. [conve] exploit these properties and propose ConvE - a model which applies convolutional filters on stacked 2D reshapings of entity and relation embeddings. Through this, they aim to increase the number of interactions between components of these embeddings.

In this paper, we conclusively establish that increasing the number of such interactions is beneficial to link prediction performance, and show that the number of interactions that ConvE can capture is limited. We propose InteractE, a novel CNN based KG embedding approach which aims to further increase the interaction between relation and entity embeddings. Our contributions are summarized as follows:

  1. We propose InteractE, a method that augments the expressive power of ConvE through three key ideas: feature permutation, "checkered" feature reshaping, and circular convolution.

  2. We provide a precise definition of an interaction, and theoretically analyze InteractE to show that it increases interactions compared to ConvE. Further, we establish a correlation between the number of heterogeneous interactions (refer to Def. 4.2) and link prediction performance.

  3. Through extensive evaluation on various link prediction datasets, we demonstrate InteractE’s effectiveness (Section 9).

We have made the source code of InteractE and the datasets used in the paper available as supplementary material.

2 Related Work

Non-Neural: Starting with TransE [transe], there have been multiple proposed approaches that use simple operations like dot products and matrix multiplications to compute a score function. Most approaches embed entities as vectors, whereas for relations, vector [transe, hole], matrix [distmult, analogy] and tensor representations have been explored. For modeling uncertainty of learned representations, Gaussian distributions [gaussian_kg, transg] have also been utilized. Methods like TransE [transe] and TransH [transh] utilize a translational objective for their score function, while DistMult [distmult] and ComplEx [complex] use a bilinear diagonal based model.

Neural Network based: Recently, Neural Network (NN) based score functions have also been proposed. Neural Tensor Network [neural_tensor_network] combines entity and relation embeddings by a relation-specific tensor which is given as input to a non-linear hidden layer for computing the score. [kg_incomp1, chandrahas2017] also utilize a Multi-Layer Perceptron for modeling the score function.

Convolution based: Convolutional Neural Networks (CNN) have also been employed for embedding Knowledge Graphs. ConvE [conve] uses convolutional filters over reshaped subject and relation embeddings to compute an output vector and compares this with all other entities in the knowledge graph. [sacn_paper] propose ConvTransE, a variant of the ConvE score function. They eschew 2D reshaping in favor of directly applying convolution on the stacked subject and relation embeddings. Further, they propose SACN, which utilizes weighted graph convolution along with ConvTransE.

ConvKB [convkb] is another convolution based method which applies convolutional filters of width 1 on the stacked subject, relation and object embeddings for computing the score. As noted in [sacn_paper], although ConvKB was claimed to be superior to ConvE, its performance is not consistent across different datasets and metrics. Further, there have been concerns raised about the validity of its evaluation procedure. Hence, we do not compare against it in this paper. A survey of all variants of existing KG embedding techniques can be found in [survey2016nickel, survey2017].

3 Background

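KG Link Prediction: Given a Knowledge Graph (KG) $\mathcal{G} = (\mathcal{E}, \mathcal{R}, \mathcal{T})$, where $\mathcal{E}$ and $\mathcal{R}$ denote the set of entities and relations, and $\mathcal{T}$ denotes the triples (facts) of the form $(s, r, o) \in \mathcal{E} \times \mathcal{R} \times \mathcal{E}$, the task of link prediction is to predict new facts $(s', r', o')$ such that $s', o' \in \mathcal{E}$ and $r' \in \mathcal{R}$, based on the existing facts in the KG. Formally, the task can be modeled as a ranking problem, where the goal is to learn a function $\psi(s, r, o)$ which assigns higher scores to true or likely facts than invalid ones.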

Table 1: The scoring functions of various knowledge graph embedding methods. Here, $e_s, e_r, e_o \in \mathbb{R}^d$, except for ComplEx and RotatE, where they are complex vectors in $\mathbb{C}^d$. The operators used are circular correlation, convolution, the Hadamard product, and depth-wise circular convolution.

Most existing KG embedding approaches define an encoding for all entities and relations, i.e., $e_s, e_r$ for all $s \in \mathcal{E}$, $r \in \mathcal{R}$. Then, a score function $\psi(s, r, o)$ is defined to measure the validity of triples. Table 1 lists some of the commonly used score functions. Finally, to learn the entity and relation representations, an optimization problem is solved for maximizing the plausibility of the triples in the KG.

ConvE: In this paper, we build upon ConvE [conve], which models interaction between entities and relations using 2D Convolutional Neural Networks (CNN). The score function used is defined as follows:

$$\psi(s, r, o) = f(\mathrm{vec}(f([\bar{e}_s; \bar{e}_r] \star w))\, W)\, e_o$$

where $\bar{e}_s$, $\bar{e}_r$ denote 2D reshapings of $e_s$, $e_r$, and $\star$ denotes the convolution operation. The 2D reshaping enhances the interaction between entity and relation embeddings, which has been found to be helpful for learning better representations [hole].

4 Notation and Definitions

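Let $e_s = (a_1, \ldots, a_d)$ and $e_r = (b_1, \ldots, b_d)$, where $a_i, b_i \in \mathbb{R}$ for all $i$, be an entity and a relation embedding respectively, and let $w \in \mathbb{R}^{k \times k}$ be a convolutional kernel of size $k$. Further, we define that a matrix $M_k \in \mathbb{R}^{k \times k}$ is a $k$-submatrix of another matrix $N$ if there exist $i, j$ such that $M_k = N_{i:i+k,\, j:j+k}$. We denote this by $M_k \subseteq N$.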

Definition 4.1.

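(Reshaping Function) A reshaping function $\phi : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}^{m \times n}$ transforms embeddings $e_s$ and $e_r$ into a matrix $\phi(e_s, e_r)$, where $m \times n = 2d$. For conciseness, we abuse notation and represent $\phi(e_s, e_r)$ by $\phi$. We define three types of reshaping functions.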

  • Stack ($\phi_{stk}$) reshapes each of $e_s$ and $e_r$ into a matrix of shape $(m/2) \times n$, and stacks them along their height to yield an $m \times n$ matrix (Fig. 2a). This is the reshaping function used in [conve].

  • Alternate ($\phi_{alt}^{\tau}$) reshapes $e_s$ and $e_r$ into matrices of shape $(m/2) \times n$, and stacks $\tau$ rows of $e_s$ and $e_r$ alternately. In other words, as we decrease $\tau$, the "frequency" with which rows of $e_s$ and $e_r$ alternate increases. We denote $\phi_{alt}^{1}$ as $\phi_{alt}$ for brevity (Fig. 2b).

  • Chequer ($\phi_{chk}$) arranges $e_s$ and $e_r$ such that no two adjacent cells are occupied by components of the same embedding (Fig. 2c).
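The three reshaping functions can be illustrated with a minimal sketch. It assumes embeddings are plain Python lists of equal length, that the output width `n` divides that length, and that "adjacent" means sharing a side; the function names `stack`, `alternate`, and `chequer` are illustrative and not taken from the authors' code.

```python
def _rows(v, n):
    """Split a flat embedding into consecutive rows of width n."""
    return [v[i:i + n] for i in range(0, len(v), n)]

def stack(e_s, e_r, n):
    """Stack: all rows of e_s on top, all rows of e_r below."""
    return _rows(e_s, n) + _rows(e_r, n)

def alternate(e_s, e_r, n, tau=1):
    """Alternate: tau rows of e_s, then tau rows of e_r, repeated."""
    rows_s, rows_r = _rows(e_s, n), _rows(e_r, n)
    out = []
    for i in range(0, len(rows_s), tau):
        out += rows_s[i:i + tau] + rows_r[i:i + tau]
    return out

def chequer(e_s, e_r, n):
    """Chequer: cells with even (row + col) take e_s, the others take e_r."""
    it_s, it_r = iter(e_s), iter(e_r)
    m = (len(e_s) + len(e_r)) // n
    return [[next(it_s) if (i + j) % 2 == 0 else next(it_r) for j in range(n)]
            for i in range(m)]

# Hypothetical toy embeddings with d = 8, reshaped into 4 x 4 matrices.
e_s = [f"a{i}" for i in range(1, 9)]
e_r = [f"b{i}" for i in range(1, 9)]
for name, mat in [("stack", stack(e_s, e_r, 4)),
                  ("alternate", alternate(e_s, e_r, 4)),
                  ("chequer", chequer(e_s, e_r, 4))]:
    print(name)
    for row in mat:
        print("  " + " ".join(row))
```

With `tau` equal to the number of rows per embedding, `alternate` reduces to `stack`, which matches the description of the alternation frequency falling as the parameter grows.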

Definition 4.2.

(Interaction) An interaction is defined as a triple $(x, y, M_k)$, such that $M_k \subseteq \phi(e_s, e_r)$ is a $k$-submatrix of the reshaped input embeddings; $x, y \in M_k$ and are distinct components of $e_s$ or $e_r$. The number of interactions $N(\phi, k)$ is defined as the cardinality of the set of all possible triples. Note that $\phi$ can be replaced with $\Omega(\phi)$ for some padding function $\Omega$.

An interaction $(x, y, M_k)$ is called heterogeneous if $x$ and $y$ are components of $e_s$ and $e_r$ respectively, or vice versa. Otherwise, it is called homogeneous. We denote the number of heterogeneous and homogeneous interactions as $N_{het}(\phi, k)$ and $N_{homo}(\phi, k)$ respectively. For example, the counts for a single submatrix are determined by how many of its cells hold components of $e_s$ and how many hold components of $e_r$. Please note that the total number of heterogeneous and homogeneous interactions in a reshaping function is constant, i.e., $N_{het}(\phi, k) + N_{homo}(\phi, k) = N(\phi, k)$ does not depend on how the components are arranged.
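To make the counting concrete, the following sketch enumerates interactions on a toy 4 x 4 layout. It is built on assumptions: each cell is labeled "s" or "r" according to the embedding it comes from, pairs within a window are counted once (unordered), and no padding is applied. The helper `count_interactions` is hypothetical.

```python
from itertools import combinations

def count_interactions(labels, k):
    """Return (heterogeneous, homogeneous) pair counts over all k x k windows."""
    m, n = len(labels), len(labels[0])
    het = homo = 0
    for i in range(m - k + 1):
        for j in range(n - k + 1):
            cells = [labels[i + a][j + b] for a in range(k) for b in range(k)]
            for x, y in combinations(cells, 2):
                if x != y:
                    het += 1   # one cell from e_s, the other from e_r
                else:
                    homo += 1  # both cells from the same embedding
    return het, homo

# Label layouts for the three reshapings of a 4 x 4 matrix.
stk = [["s"] * 4, ["s"] * 4, ["r"] * 4, ["r"] * 4]
alt = [["s"] * 4, ["r"] * 4, ["s"] * 4, ["r"] * 4]
chk = [["s" if (i + j) % 2 == 0 else "r" for j in range(4)] for i in range(4)]

for k in (2, 3):
    for name, layout in [("stack", stk), ("alternate", alt), ("chequer", chk)]:
        print(k, name, count_interactions(layout, k))
```

For every layout the two counts sum to the same total for a given `k`, so raising the heterogeneous count necessarily lowers the homogeneous one.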

Figure 2: Different types of reshaping functions we analyze in this paper. Please refer to Section 4 for more details.

5 InteractE Overview

Recent methods [distmult, hole] have demonstrated that expressiveness of a model can be enhanced by increasing the possible interactions between embeddings. ConvE [conve] also exploits the same principle albeit in a limited way, using convolution on 2D reshaped embeddings. InteractE extends this notion of capturing entity and relation feature interactions using the following three ideas:

  • Feature Permutation: Instead of using one fixed order of the input, we utilize multiple permutations to capture more possible interactions.

  • Checkered Reshaping: We substitute the simple feature reshaping of ConvE with checkered reshaping and prove its superiority over other possibilities.

  • Circular Convolution: Compared to the standard convolution, circular convolution captures more feature interactions, as depicted in Figure 3. The convolution is performed in a depth-wise manner [depthwise_convolution] on different input permutations.

6 InteractE Details

In this section, we provide a detailed description of the various components of InteractE. The overall architecture is depicted in Fig. 1. InteractE learns a $d$-dimensional vector representation for each entity and relation in the knowledge graph.

6.1 Feature Permutation

To capture a variety of heterogeneous interactions, InteractE first generates $t$ random permutations of both $e_s$ and $e_r$, denoted by $\mathcal{P}_t = [(e_s^1, e_r^1); \ldots; (e_s^t, e_r^t)]$. Note that with high probability, the sets of interactions within $\phi(e_s^i, e_r^i)$ for different $i$ are disjoint. This is evident because the number of distinct interactions across all possible permutations is very large. So, for $t$ different permutations, we can expect the total number of interactions to be approximately $t$ times the number of interactions for one permutation.
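A minimal sketch of the permutation step follows. It assumes the index orderings are sampled once and then reused for every embedding pair, which the text does not state explicitly; `sample_permutations` and `apply_permutations` are hypothetical helper names.

```python
import random

def sample_permutations(d, t, seed=0):
    """Sample t independent index orderings for e_s and for e_r."""
    rng = random.Random(seed)
    perms = []
    for _ in range(t):
        idx_s, idx_r = list(range(d)), list(range(d))
        rng.shuffle(idx_s)
        rng.shuffle(idx_r)
        perms.append((idx_s, idx_r))
    return perms

def apply_permutations(e_s, e_r, perms):
    """Reorder the components of e_s and e_r according to each sampled ordering."""
    return [([e_s[i] for i in ps], [e_r[i] for i in pr]) for ps, pr in perms]

# Hypothetical toy embeddings with d = 8 and t = 3 permutations.
e_s = [f"a{i}" for i in range(1, 9)]
e_r = [f"b{i}" for i in range(1, 9)]
perms = sample_permutations(d=8, t=3)
for perm_s, perm_r in apply_permutations(e_s, e_r, perms):
    print(perm_s, perm_r)
```

Each permuted pair contains exactly the same components as the original embeddings; only their positions, and therefore which components end up inside the same convolution window, change.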

6.2 Checkered Reshaping

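Next, we apply the reshaping operation $\phi(e_s^i, e_r^i)$ to each of the $t$ permutations $(e_s^i, e_r^i)$, $i \in \{1, \ldots, t\}$, and define $\phi(\mathcal{P}_t) = [\phi(e_s^1, e_r^1); \ldots; \phi(e_s^t, e_r^t)]$. ConvE [conve] uses $\phi_{stk}(\cdot)$ as a reshaping function, which has limited interaction capturing ability. On the basis of Proposition 7.3, we choose to utilize $\phi_{chk}(\cdot)$ as the reshaping function in InteractE, which captures maximum heterogeneous interactions between entity and relation features.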

6.3 Circular Convolution

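Motivated by our analysis in Proposition 7.4, InteractE uses circular convolution, which further increases interactions compared to the standard convolution. This has been successfully applied for tasks like image recognition [omnidirectionalwang2018]. Circular convolution on a 2-dimensional input $I \in \mathbb{R}^{m \times n}$ with a filter $w \in \mathbb{R}^{k \times k}$ is defined as:

$$[I \star w]_{p,q} = \sum_{i=-\lfloor k/2 \rfloor}^{\lfloor k/2 \rfloor} \sum_{j=-\lfloor k/2 \rfloor}^{\lfloor k/2 \rfloor} I_{[p-i]_m,\,[q-j]_n}\, w_{i,j}$$

where $[x]_n$ denotes $x$ modulo $n$ and $\lfloor \cdot \rfloor$ denotes the floor function. Figure 3 and Proposition 7.4 show how circular convolution captures more interactions compared to standard convolution with zero padding.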

InteractE stacks each reshaped permutation as a separate channel. For convolving permutations, we apply circular convolution in a depth-wise manner [depthwise_convolution]. Although different sets of filters can be applied for each permutation, in practice we find that sharing filters across channels works better as it allows a single set of kernel weights to be trained on more input instances.
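Circular convolution and its depth-wise application with a shared filter can be sketched in plain Python. This is an illustration under assumptions, not the authors' implementation: the filter size is odd, inputs are lists of lists, out-of-range indices wrap around via modulo, and `circular_conv2d` and `depthwise_shared` are hypothetical names.

```python
def circular_conv2d(x, w):
    """Circular 2D convolution of an m x n input with an odd-sized k x k filter."""
    m, n, k = len(x), len(x[0]), len(w)
    h = k // 2
    out = [[0.0] * n for _ in range(m)]
    for p in range(m):
        for q in range(n):
            for i in range(-h, h + 1):
                for j in range(-h, h + 1):
                    # Indices wrap around instead of reading zero padding.
                    out[p][q] += x[(p - i) % m][(q - j) % n] * w[i + h][j + h]
    return out

def depthwise_shared(channels, w):
    """Apply one shared filter to every channel separately (no cross-channel mixing)."""
    return [circular_conv2d(c, w) for c in channels]

x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
ones = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
shift = [[0, 0, 0], [0, 0, 0], [0, 1, 0]]  # single tap at offset (i, j) = (1, 0)

print(circular_conv2d(x, ones))   # every output cell sees all nine inputs
print(circular_conv2d(x, shift))  # rows rotate down by one, wrapping around
print(len(depthwise_shared([x, x], ones)))
```

With zero padding, the all-ones filter would see fewer than nine real inputs at the border cells; with wrap-around every window is fully populated, which is the effect the section attributes to circular convolution.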

6.4 Score Function

The output of each circular convolution is flattened and concatenated into a vector. InteractE then projects this vector to the embedding space ($\mathbb{R}^d$). Formally, the score function used in InteractE is defined as follows:

$$\psi(s, r, o) = g(\mathrm{vec}(f(\phi(\mathcal{P}_t) \circledast w))\, W)\, e_o$$

where $\phi(\mathcal{P}_t)$ denotes the reshaped permutations, $\circledast$ denotes depth-wise circular convolution, $\mathrm{vec}(\cdot)$ denotes vector concatenation, $e_o$ represents the object entity embedding matrix and $W$ is a learnable weight matrix. Functions $f$ and $g$ are chosen to be ReLU and sigmoid respectively. For training, we use the standard binary cross entropy loss with label smoothing.

Figure 3: Circular convolution induces more interactions than standard convolution. The shaded region depicts where the filter is applied. Please refer to Section 6.3 for more details.

7 Theoretical Analysis

In this section, we analyze multiple variants of 2D reshaping with respect to the number of interactions they induce. We also examine the advantages of using circular padded convolution over the standard convolution.

For simplicity, we restrict our analysis to the case where the output of the reshaping function is a square matrix, i.e., $m = n$. Note that our results can be extended to the general case as well. Proofs of all propositions herein are included in the supplementary material.

Proposition 7.1.

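For any kernel $w$ of size $k$, under a size condition that takes one form if $k$ is odd and another if $k$ is even, the following statement holds: $N_{het}(\phi_{alt}, k) \geq N_{het}(\phi_{stk}, k)$.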

Proposition 7.2.

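For any kernel $w$ of size $k$ and for all $\tau < \tau'$ ($\tau, \tau' \in \mathbb{N}$), the following statement holds: $N_{het}(\phi_{alt}^{\tau}, k) \geq N_{het}(\phi_{alt}^{\tau'}, k)$.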

Proposition 7.3.

For any kernel $w$ of size $k$ and for all reshaping functions $\phi$, the following statement holds: $N_{het}(\phi_{chk}, k) \geq N_{het}(\phi, k)$.

Proposition 7.4.

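Let $\Omega_0$, $\Omega_c$ denote zero padding and circular padding functions respectively, for some padding size $p$. Then for any reshaping function $\phi$, $N_{het}(\Omega_c(\phi), k) \geq N_{het}(\Omega_0(\phi), k)$.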
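The comparison between zero padding and circular padding can be checked numerically on a small case. The sketch below rests on assumptions: a 4 x 4 chequer layout, a kernel of size 3, padding of one cell on each side, unordered pairs, and zero-padded cells treated as belonging to neither embedding. The helpers `pad` and `n_het` are hypothetical.

```python
from itertools import combinations

def pad(labels, h, circular):
    """Pad a label matrix by h cells per side, with wrap-around or with None."""
    m, n = len(labels), len(labels[0])
    out = []
    for i in range(-h, m + h):
        row = []
        for j in range(-h, n + h):
            if circular:
                row.append(labels[i % m][j % n])
            elif 0 <= i < m and 0 <= j < n:
                row.append(labels[i][j])
            else:
                row.append(None)  # zero padding carries no embedding component
        out.append(row)
    return out

def n_het(labels, k):
    """Count heterogeneous pairs over all k x k windows, ignoring padded cells."""
    m, n = len(labels), len(labels[0])
    total = 0
    for i in range(m - k + 1):
        for j in range(n - k + 1):
            cells = [labels[i + a][j + b] for a in range(k) for b in range(k)]
            total += sum(1 for x, y in combinations(cells, 2)
                         if x is not None and y is not None and x != y)
    return total

chequer = [["s" if (i + j) % 2 == 0 else "r" for j in range(4)] for i in range(4)]
k = 3
zero = n_het(pad(chequer, k // 2, circular=False), k)
circ = n_het(pad(chequer, k // 2, circular=True), k)
print(zero, circ)  # 168 320
```

Border windows under zero padding contain only four or six real cells, so they contribute fewer pairs than the fully populated windows produced by wrap-around.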

| Model | FB15k-237 MRR | FB15k-237 MR | FB15k-237 H@10 | FB15k-237 H@1 | WN18RR MRR | WN18RR MR | WN18RR H@10 | WN18RR H@1 | YAGO3-10 MRR | YAGO3-10 MR | YAGO3-10 H@10 | YAGO3-10 H@1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DistMult [distmult] | .241 | 254 | .419 | .155 | .430 | 5110 | .49 | .39 | .34 | 5926 | .54 | .24 |
| ComplEx [complex] | .247 | 339 | .428 | .158 | .44 | 5261 | .51 | .41 | .36 | 6351 | .55 | .26 |
| R-GCN [r_gcn] | .248 | - | .417 | .151 | - | - | - | - | - | - | - | - |
| KBGAN [kbgan] | .278 | - | .458 | - | .214 | - | .472 | - | - | - | - | - |
| KBLRN [kblrn] | .309 | 209 | .493 | .219 | - | - | - | - | - | - | - | - |
| ConvTransE [sacn_paper] | .33 | - | .51 | .24 | .46 | - | .52 | .43 | - | - | - | - |
| SACN [sacn_paper] | .35 | - | .54 | .26 | .47 | - | .54 | .43 | - | - | - | - |
| RotatE [rotate] | .338 | 177 | .533 | .241 | .476 | 3340 | .571 | .428 | .495 | 1767 | .670 | .402 |
| ConvE [conve] | .325 | 244 | .501 | .237 | .43 | 4187 | .52 | .40 | .44 | 1671 | .62 | .35 |
| InteractE (Proposed Method) | .354 | 172 | .535 | .263 | .463 | 5202 | .528 | .430 | .541 | 2375 | .687 | .462 |

Table 2: Link prediction results of several models evaluated on FB15k-237, WN18RR and YAGO3-10. We find that InteractE outperforms all other methods across metrics on FB15k-237 and in 3 out of 4 settings on YAGO3-10. Since InteractE generalizes ConvE, we highlight performance comparison between the two methods specifically in the table above. Please refer to Section 9.1 for more details.

8 Experimental Setup

8.1 Datasets

In our experiments, following [conve, rotate], we evaluate on the three most commonly used link prediction datasets. Summary statistics of the datasets are presented in Table 3.

  • FB15k-237 [toutanova] is an improved version of the FB15k [transe] dataset in which all inverse relations are deleted to prevent direct inference of test triples by reversing training triples.

  • WN18RR [conve] is a subset of WN18 [transe] derived from WordNet [wordnet], with inverse relations deleted as in FB15k-237.

  • YAGO3-10 is a subset of YAGO3 [yago] that consists of entities with at least 10 relations. Its triples mostly describe attributes of people.

8.2 Evaluation protocol

Following [transe], we use the filtered setting, i.e., while evaluating on test triples, we filter out all the valid triples from the candidate set, which is generated by either corrupting the head or tail entity of a triple. The performance is reported on the standard evaluation metrics: Mean Reciprocal Rank (MRR), Mean Rank (MR), Hits@1, and Hits@10. We report results averaged across runs. We note that the variance is very low on all the metrics and hence omit it.
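The filtered ranking and the reported metrics can be sketched as follows. The sketch makes assumptions: candidate scores are given as a list indexed by entity id, ties are broken optimistically (only strictly higher scores push the target down), and `filtered_rank` and `summarize` are hypothetical helper names.

```python
def filtered_rank(scores, target, known_true):
    """Rank of `target` after removing other entities known to form valid triples."""
    rank = 1
    for cand, score in enumerate(scores):
        if cand == target or cand in known_true:
            continue  # filtered setting: other valid answers do not count against us
        if score > scores[target]:
            rank += 1
    return rank

def summarize(ranks, ks=(1, 10)):
    """Compute MRR, MR and Hits@k from a list of ranks."""
    n = len(ranks)
    metrics = {"MRR": sum(1.0 / r for r in ranks) / n, "MR": sum(ranks) / n}
    for k in ks:
        metrics[f"Hits@{k}"] = sum(r <= k for r in ranks) / n
    return metrics

# Toy query: entity 0 scores higher than the target but is itself a valid answer.
scores = [0.9, 0.8, 0.7, 0.1]
print(filtered_rank(scores, target=2, known_true={0}))  # 2
print(summarize([1, 2, 4]))
```

In a full evaluation, one rank is computed per test triple for head corruption and one for tail corruption, and the metrics are averaged over all of them.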

8.3 Baselines

In our experiments, we compare InteractE against a variety of baselines which can be categorized as:

  • Non-neural: Methods that use simple vector based operations for computing score. For instance, DistMult [distmult], ComplEx [complex], KBGAN [kbgan], KBLRN [kblrn] and RotatE [rotate].

  • Neural: Methods which leverage a non-linear neural network based architecture in their scoring function. This includes R-GCN [r_gcn], ConvE [conve], ConvTransE [sacn_paper], and SACN [sacn_paper].

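| Dataset | # Entities | # Relations | # Train triples | # Valid triples | # Test triples |
|---|---|---|---|---|---|
| FB15k-237 | 14,541 | 237 | 272,115 | 17,535 | 20,466 |
| WN18RR | 40,943 | 11 | 86,835 | 3,034 | 3,134 |
| YAGO3-10 | 123,182 | 37 | 1,079,040 | 5,000 | 5,000 |

Table 3: Details of the datasets used. Please see Section 8.1 for more details.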

9 Results

In this section, we attempt to answer the questions below:

  • How does InteractE perform in comparison to the existing approaches? (Section 9.1)

  • What is the effect of different feature reshaping and circular convolution on link prediction performance? (Section 9.2)

  • How does the performance of our model vary with the number of feature permutations? (Section 9.3)

  • What is the performance of InteractE on different relation types? (Section 9.4)

(a) FB15k-237 dataset
(b) WN18RR dataset
Figure 4: Performance with different feature reshaping and convolution operations on validation data of FB15k-237 and WN18RR. Stack and Alt denote Stacked and Alternate reshaping as defined in Section 4. As we decrease $\tau$, the number of heterogeneous interactions increases (refer to Proposition 7.2). The results empirically verify our theoretical claim in Section 7 and validate the central thesis of this paper that increasing heterogeneous interactions improves link prediction performance. Please refer to Section 9.2 for more details.

9.1 Performance Comparison

In order to evaluate the effectiveness of InteractE, we compare it against the existing knowledge graph embedding methods listed in Section 8.3. The results on three standard link prediction datasets are summarized in Table 2. The scores of all the baselines are taken directly from the values reported in the papers [conve, rotate, sacn_paper, kbgan, kblrn]. Since our model builds on ConvE, we specifically compare against it, and find that InteractE outperforms ConvE on all metrics for FB15k-237 and on three out of four metrics on WN18RR and YAGO3-10, where only MR is worse (Table 2). On average, InteractE obtains an improvement of 9%, 7.5%, and 23% on FB15k-237, WN18RR, and YAGO3-10 on MRR over ConvE. This validates our hypothesis that increasing heterogeneous interactions helps improve performance on link prediction. For YAGO3-10, we observe that the MR obtained from InteractE is worse than that of ConvE although it outperforms ConvE on all other metrics. A similar trend has been observed in [conve, rotate].
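The relative MRR gains over ConvE can be recomputed from the Table 2 entries with a few lines of arithmetic. The dictionary below copies the ConvE and InteractE MRR values from that table; small deviations from percentages quoted in the prose are expected because the table entries are rounded.

```python
# (ConvE MRR, InteractE MRR) per dataset, copied from Table 2.
mrr = {
    "FB15k-237": (0.325, 0.354),
    "WN18RR": (0.43, 0.463),
    "YAGO3-10": (0.44, 0.541),
}

# Relative improvement in percent: 100 * (new - baseline) / baseline.
gains = {name: 100 * (ours - base) / base for name, (base, ours) in mrr.items()}

for name, gain in gains.items():
    print(f"{name}: {gain:.1f}%")
# FB15k-237: 8.9%
# WN18RR: 7.7%
# YAGO3-10: 23.0%
```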

Compared to other baseline methods, InteractE outperforms them on FB15k-237 across all the metrics and on 3 out of 4 metrics on the YAGO3-10 dataset. The below-par performance of InteractE on WN18RR can be attributed to the fact that this dataset is more suitable for shallow models, as it has very low average relation-specific in-degree. This is consistent with the observations of [conve].

9.2 Effect of Feature Reshaping and Circular Convolution

In this section, we empirically test the effectiveness of the different reshaping techniques we analyzed in Section 7. For this, we evaluate different variants of InteractE on validation data of FB15k-237 and WN18RR with the number of feature permutations held fixed. We omit the analysis on YAGO3-10 given its large size. The results are summarized in Figure 4. We find that the performance with Stacked reshaping is the worst, and it improves when we replace it with alternate reshaping. This observation is consistent with our findings in Proposition 7.1. Further, we find that MRR improves on decreasing the value of $\tau$ in alternate reshaping, which empirically validates Proposition 7.2. Finally, we observe that checkered reshaping gives the best performance across all reshaping functions for most scenarios, thus justifying Proposition 7.3.

We also compare the impact of using circular and standard convolution on link prediction performance. The MRR scores are reported in Figure 4. The results show that circular convolution is consistently better than the standard convolution. This also verifies our statement in Proposition 7.4. Overall, we find that increasing interaction helps improve performance on the link prediction task, thus validating the central thesis of our paper.

Figure 5: Performance on the validation data of FB15k-237, WN18RR, and YAGO3-10 with different numbers of feature permutations. We find that although increasing the number of permutations improves performance, it saturates as we exceed a certain limit. Please see Section 9.3 for details.
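| Task | Relation | RotatE MRR | RotatE MR | RotatE H@10 | ConvE MRR | ConvE MR | ConvE H@10 | InteractE MRR | InteractE MR | InteractE H@10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Head Pred | 1-1 | 0.498 | 359 | 0.593 | 0.374 | 223 | 0.505 | 0.386 | 175 | 0.547 |
| Head Pred | 1-N | 0.092 | 614 | 0.174 | 0.091 | 700 | 0.17 | 0.106 | 573 | 0.192 |
| Head Pred | N-1 | 0.471 | 108 | 0.674 | 0.444 | 73 | 0.644 | 0.466 | 69 | 0.647 |
| Head Pred | N-N | 0.261 | 141 | 0.476 | 0.261 | 158 | 0.459 | 0.276 | 148 | 0.476 |
| Tail Pred | 1-1 | 0.484 | 307 | 0.578 | 0.366 | 261 | 0.51 | 0.368 | 308 | 0.547 |
| Tail Pred | 1-N | 0.749 | 41 | 0.674 | 0.762 | 33 | 0.878 | 0.777 | 27 | 0.881 |
| Tail Pred | N-1 | 0.074 | 578 | 0.138 | 0.069 | 682 | 0.15 | 0.074 | 625 | 0.141 |
| Tail Pred | N-N | 0.364 | 90 | 0.608 | 0.375 | 100 | 0.603 | 0.395 | 92 | 0.617 |

Table 4: Link prediction results by relation category on FB15k-237 dataset for RotatE, ConvE, and InteractE. Following (Wang et al., 2014b), the relations are categorized into one-to-one (1-1), one-to-many (1-N), many-to-one (N-1), and many-to-many (N-N). We observe that InteractE is effective at capturing complex relations compared to RotatE. Refer to Section 9.4 for details.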

9.3 Effect of Feature Permutations

In this section, we analyze the effect of increasing the number of feature permutations on InteractE’s performance on validation data of FB15k-237, WN18RR, and YAGO3-10. The overall results are summarized in Figure 5. We observe that as the number of permutations increases, MRR remains the same on FB15k-237 but improves on the WN18RR and YAGO3-10 datasets. However, performance degrades once the number of permutations is increased beyond a certain limit. We hypothesize that this is due to over-parameterization of the model. Moreover, since the number of relevant interactions is finite, increasing the number of permutations could become redundant beyond a limit.

9.4 Evaluation on different Relation Types

In this section, we analyze the performance of InteractE on different relation categories of FB15k-237. We chose FB15k-237 for analysis over other datasets because of its larger and more diverse set of relations. Following [kg_relation_cat], we classify the relations based on the average number of tails per head and heads per tail into four categories: one-to-one, one-to-many, many-to-one, and many-to-many. The results are presented in Table 4. Overall, we find that InteractE is effective at modeling complex relation types like one-to-many and many-to-many, whereas RotatE captures simple relations like one-to-one better. This demonstrates that an increase in interaction allows the model to capture more complex relationships.
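The relation categorization can be sketched as below. The cut-off of 1.5 for deciding whether a side counts as "many" is an assumption borrowed from common practice in earlier work and is not stated in this section; `categorize` is a hypothetical helper, and the triples are made up for illustration.

```python
from collections import defaultdict

def categorize(triples, threshold=1.5):
    """Label each relation 1-1, 1-N, N-1 or N-N from average tails/head and heads/tail."""
    tails = defaultdict(lambda: defaultdict(set))  # relation -> head -> set of tails
    heads = defaultdict(lambda: defaultdict(set))  # relation -> tail -> set of heads
    for s, r, o in triples:
        tails[r][s].add(o)
        heads[r][o].add(s)
    categories = {}
    for r in tails:
        tails_per_head = sum(len(v) for v in tails[r].values()) / len(tails[r])
        heads_per_tail = sum(len(v) for v in heads[r].values()) / len(heads[r])
        head_side = "N" if heads_per_tail >= threshold else "1"  # assumed cut-off
        tail_side = "N" if tails_per_head >= threshold else "1"
        categories[r] = f"{head_side}-{tail_side}"
    return categories

# Made-up triples for illustration only.
triples = [
    ("paris", "capital_of", "france"), ("berlin", "capital_of", "germany"),
    ("ann", "born_in", "oslo"), ("bob", "born_in", "oslo"), ("cat", "born_in", "oslo"),
    ("pat", "has_child", "kim"), ("pat", "has_child", "lee"),
]
print(categorize(triples))
```

Once every relation has a label, the test triples can be grouped by label and the head and tail prediction metrics computed per group, as in Table 4.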

10 Conclusion

In this paper, we propose InteractE, a novel knowledge graph embedding method which alleviates the limitations of ConvE by capturing additional heterogeneous feature interactions. InteractE is able to achieve this by utilizing three central ideas, namely feature permutation, checkered feature reshaping, and circular convolution. Through extensive experiments, we demonstrate that InteractE achieves a consistent improvement on link prediction performance on multiple datasets. We also theoretically analyze the effectiveness of the components of InteractE, and provide empirical validation of our hypothesis that increasing heterogeneous feature interaction is beneficial for link prediction with ConvE. This work demonstrates a possible scope for improving existing knowledge graph embedding methods by leveraging rich heterogeneous interactions.