Knowledge Hypergraphs: Extending Knowledge Graphs Beyond Binary Relations

06/01/2019 ∙ by Bahare Fatemi, et al. ∙ Element AI Inc, The University of British Columbia

Knowledge graphs store facts using relations between pairs of entities. In this work, we address the question of link prediction in knowledge bases where each relation is defined on any number of entities. We represent facts in a knowledge hypergraph: a knowledge graph where relations are defined on two or more entities. While there exist techniques (such as reification) that convert the non-binary relations of a knowledge hypergraph into binary ones, current embedding-based methods for knowledge graph completion do not work well out of the box for knowledge graphs obtained through these techniques. Thus we introduce HypE, a convolution-based embedding method for knowledge hypergraph completion. We also develop public benchmarks and baselines for our task and show experimentally that HypE is more effective than proposed baselines and existing methods.




1 Introduction

Knowledge graphs are graph-structured knowledge bases that store facts about the world. Such graph structures have applications in several tasks such as search (singhal) and automatic question answering (watson). A large number of knowledge graphs have been created and are publicly available, such as NELL (carlson2010toward), Freebase (bollacker2008freebase), and Google Knowledge Vault (dong2014knowledge). Since accessing and storing all the facts in the world is difficult, knowledge graphs are incomplete; the goal of link prediction (or knowledge graph completion) in knowledge graphs is to predict unknown links or relationships between entities based on existing ones. More precisely, knowledge graphs are directed graphs with nodes as entities and labeled edges as relations. Each edge is directed from the head entity to the tail entity. A knowledge graph can be represented as a set of triples, which we denote by $r(h, t)$ for a relation $r$ with head $h$ and tail $t$, and which represent information as a collection of binary relations.

Embedding-based models (nickel2012factorizing; nguyen2017overview; wang2017knowledge) have proved to be effective for knowledge graph completion. These approaches learn embeddings for entities and relations. To find out if $r(h, t)$ is a fact (i.e. is true), such models define a function that embeds relation $r$ and entities $h$ and $t$, and produces the probability that $r(h, t)$ is a fact. Such embedding-based methods are successful, but they make the strong assumption that all relations are binary (defined between exactly two entities). m-TransH (m-TransH) observe that in Freebase more than 1/3rd of the entities participate in non-binary relations (defined on more than two entities). We observe in addition, that % of the relations in Freebase are non-binary.

In this paper, we define a Knowledge Hypergraph as a generalization of a knowledge graph where relations are defined on two or more entities. We then introduce HypE, an embedding-based method for knowledge hypergraph completion that predicts new relations among entities of the hypergraph. When an entity appears in a fact, HypE computes a new representation for it from the entity embedding and learned position-dependent convolution filters. While convolutions are used mainly in vision tasks, ConvE (ConvE) and balazevic2018hypernetwork (balazevic2018hypernetwork) motivate their use beyond vision by highlighting that convolutions are parameter-efficient, fast to compute on a GPU, and have various robust methods to prevent overfitting. We evaluate the proposed method on standard binary and non-binary datasets.

The contributions of this work are: (1) HypE, an embedding-based method for knowledge hypergraph completion that outperforms the baselines for beyond-binary knowledge graphs, (2) a set of baselines for knowledge hypergraph completion, obtained by extending current embedding-based methods and introducing new ones, and (3) two new knowledge hypergraphs obtained from subsets of Freebase, and that can serve as a new benchmark for evaluating knowledge hypergraph completion methods.

2 Motivation and Related Work

Knowledge hypergraph completion is a relatively under-explored area, with most of the effective models designed for binary relations. We motivate our work on link prediction for relations defined on more than two entities by showing that simply adjusting current approaches to work with hypergraphs does not yield satisfactory results. Existing models can be used in the non-binary setting in either of two possible ways: (1) extending known models to work with non-binary relational data, or (2) converting the non-binary relations into binary ones (using methods such as reification or star-to-clique (m-TransH)), and then applying known knowledge graph completion methods.

(a) DEGREE_FROM_UNIVERSITY defined on three facts.
(b) Reifying non-binary relations with three additional entities.
(c) Converting non-binary relations into cliques.
Figure 1: Converting non-binary relations into binary ones. In this example, the three facts in the original graph (a) show that Turing received his PhD from Princeton and his undergraduate degree from King’s College Cambridge. Figures (b) and (c) show two methods of converting this ternary relation into three binary ones.

In the first case, the only known example that extends a known model to work directly with non-binary relations is m-TransH (m-TransH), which is an extension of TransH (TransH). We describe m-TransH briefly later in this section, and compare its performance to that of our model in Section 6.

The second case is about adjusting the dataset to work with current knowledge graph models. We describe two approaches that could be used to convert the non-binary relations of a knowledge hypergraph into binary ones. The first approach is reification: to reify means to “make into an entity”. In order to reify a fact with a relation defined on $k$ entities, we first create a new entity and then create $k$ facts, each defining a relation between the new entity and one of the $k$ entities in the given fact. See Figure 1(b). The second approach is star-to-clique, which converts a fact defined on $k$ entities into $\binom{k}{2}$ facts with distinct relations between all pairs of entities in the fact. See Figure 1(c).

Both conversion approaches have their caveats when current embedding-based methods are applied to the resulting graphs. Consider the example in Figure 1. The three facts in this example (Figure 1(a)) pertain to the relation DEGREE_FROM_UNIVERSITY, and show that Turing received his PhD from Princeton and his undergraduate degree from King’s College Cambridge, while Michelle Obama received her undergraduate degree from Princeton. When we reify the hypergraph in this example (Figure 1(b)), we add three new reified entities. Although this reified knowledge graph has only binary relations and can be used to train an existing model for knowledge graph completion, at test time we first need to reify the test samples and define a way to embed the newly created entities, about which we have very little information (see Section 5.2 for an example of how we define embeddings for reified entities and how this method compares to our results). On the other hand, when we transform non-binary relations in a knowledge hypergraph into binary ones through the star-to-clique method, we lose some of the information that we had in the original hypergraph. In Figure 1(c), we can tell that Turing has graduate and undergraduate degrees and that he attended Princeton and King’s College Cambridge; but it is no longer clear which degree was granted by which institution.
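The two conversions and the information loss of star-to-clique can be illustrated with a small sketch on the example of Figure 1. The Python below is illustrative only: the function names `reify` and `star_to_clique`, the `id0`-style names for reified entities, and the derived binary relation names are assumptions made for this example and are not taken from the paper or its datasets.

```python
from itertools import combinations

def reify(facts):
    """Replace each k-ary fact by k binary facts through a new entity."""
    triples = []
    for idx, (relation, entities) in enumerate(facts):
        new_entity = f"id{idx}"  # hypothetical name for the reified entity
        for pos, entity in enumerate(entities, start=1):
            # one binary relation per argument position (naming is assumed)
            triples.append((f"{relation}_{pos}", new_entity, entity))
    return triples

def star_to_clique(facts):
    """Replace each k-ary fact by binary facts between all pairs of its entities."""
    triples = set()
    for relation, entities in facts:
        for (i, a), (j, b) in combinations(enumerate(entities, start=1), 2):
            triples.add((f"{relation}_{i}_{j}", a, b))
    return triples

facts = [
    ("DEGREE_FROM_UNIVERSITY", ("Turing", "PhD", "Princeton")),
    ("DEGREE_FROM_UNIVERSITY", ("Turing", "Undergraduate", "Kings College Cambridge")),
    ("DEGREE_FROM_UNIVERSITY", ("Michelle Obama", "Undergraduate", "Princeton")),
]

reified = reify(facts)
clique = star_to_clique(facts)

# A tuple that is NOT in the original hypergraph.
false_fact = ("DEGREE_FROM_UNIVERSITY", ("Turing", "Undergraduate", "Princeton"))
# Every pairwise triple of the false fact already exists in the clique graph,
# so the clique graph cannot tell it apart from a true fact.
indistinguishable = star_to_clique([false_fact]) <= clique
```

In this sketch `indistinguishable` evaluates to `True`: each pair of entities in the false fact co-occurs in some true fact, which is the loss of information described above.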

Other existing methods that relate to our work in this paper can be grouped into three main categories: knowledge graph completion, knowledge hypergraph completion, and learning on hypergraphs. In the remainder of this section, we briefly discuss these approaches.

Knowledge graph completion. Embedding-based models have proved effective for knowledge graphs where all relations are binary. These approaches can be grouped in three main categories: translational, bilinear, and deep models. Translational approaches (TransE; TransH) represent relations as translations in the embedding space. For instance, if a triple is true, then the embedding of the head entity plus the embedding of the relation is close to the embedding of the tail. Bilinear approaches (DistMult; trouillon2016complex; kazemi2018simple) define the score of a triple as the product $\mathbf{h}^{\top} \mathbf{R}\, \mathbf{t}$ (where $\mathbf{h}$ and $\mathbf{t}$ are the vector embeddings of the head and tail entities, and $\mathbf{R}$ is the matrix embedding of the relation between them). Finally, deep models (nickel2011three; socher2013reasoning) use neural networks to learn embeddings for each of head, relation, and tail, and compute a score for every triple.

Knowledge hypergraph completion. There is a large family of link prediction models based on soft first-order logic rules (richardson2006markov; de2007problog; kazemi2014relational). While these models can easily handle variable arity relations and have the advantage of being interpretable, they are known to only learn a subset of patterns that exist in knowledge graphs, and thus are limited in their learning capacity (nickel2016review). The model presented in this paper is different from such completion methods, as our method is embedding-based, and consequently is more powerful than soft-rule approaches. The embedding-based work that is closest to our work is m-TransH (m-TransH) which extends TransH (TransH) to knowledge hypergraph completion. kazemi2018simple (Proposition 2) prove that TransH and other variants of translational approaches are not fully expressive and have restrictions in modeling relations. Similar to TransH, m-TransH is not fully expressive as it inherits the restrictions of TransH in modeling relations. We show in Section 4.1 that our proposed model is fully expressive and compare it to m-TransH in Section 6.

Learning on hypergraphs. Hypergraph learning has been employed to model high-order correlations among data in many computer vision tasks, such as video object segmentation (huang2009video) and modeling image relationships and image ranking (huang2010image). There is also a line of work extending graph neural network frameworks to hypergraph neural networks (HGNN) and hypergraph convolution networks (HGCN). However, these models are designed for settings where the hypergraph is undirected, with edges that are not labeled (no relations). Knowledge hypergraphs have a different setup, in which predicting a link between (ordered) entities is also a function of the relation combining them. As there is no clear or easy way of extending these graph neural network models to our knowledge hypergraph setting, we do not consider them as baselines for our experiments.

3 Definition and Notation

A world consists of a finite set of entities $\mathcal{E}$, a finite set of relations $\mathcal{R}$, and a set of tuples $\tau$ defined over $\mathcal{E}$ and $\mathcal{R}$. Each tuple in $\tau$ is of the form $r(e_1, e_2, \dots, e_k)$ where $r \in \mathcal{R}$ is a relation and each $e_i \in \mathcal{E}$ is an entity, for all $i = 1, 2, \dots, k$. We define the arity $|r|$ of a relation $r$ as the number of arguments that the relation takes; the arity is fixed for each relation. A world specifies what is true: all the tuples in $\tau$ are true, and the tuples that are not in $\tau$ are false. A knowledge hypergraph consists of the entities and relations of the world, and a subset of the tuples $\tau' \subseteq \tau$. Link prediction in knowledge hypergraphs is the problem of predicting the missing tuples in $\tau'$, that is, finding the tuples $\tau \setminus \tau'$.

An embedding is a function that converts an entity or a relation into a vector (or sometimes a higher order tensor) over a field (typically the real numbers). We use bold lower-case for vectors, that is, $\mathbf{e}$ is an embedding of entity $e$, and $\mathbf{r}$ is an embedding of a relation $r$.

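Let $\mathbf{v}_1, \dots, \mathbf{v}_k$ be a set of vectors. The variadic function $\mathrm{concat}(\mathbf{v}_1, \dots, \mathbf{v}_k)$ outputs the concatenation of its input vectors. The 1D convolution operator $*$ takes as input a vector $\mathbf{v}$ and a convolution weight filter $\omega$, and outputs the convolution $\mathbf{v} * \omega$ of $\mathbf{v}$ with the filter $\omega$. We define the variadic function $\odot()$ to be the sum of the element-wise product of its input vectors, namely $\odot(\mathbf{v}_1, \dots, \mathbf{v}_k) = \sum_{i=1}^{\ell} \mathbf{v}_1^{(i)} \mathbf{v}_2^{(i)} \cdots \mathbf{v}_k^{(i)}$ where each vector $\mathbf{v}_j$ has the same length $\ell$, and $\mathbf{v}_j^{(i)}$ is the $i$-th element of vector $\mathbf{v}_j$.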
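The three operators defined above can be written in a few lines. The following is a minimal pure-Python sketch; the function names `concat`, `conv1d`, and `odot` are chosen for this illustration, and the convolution is implemented without padding and without flipping the filter (the cross-correlation convention common in deep learning libraries), which is an assumption rather than something the text specifies.

```python
def concat(*vectors):
    """Concatenate any number of vectors (lists of numbers)."""
    out = []
    for v in vectors:
        out.extend(v)
    return out

def conv1d(vector, filt, stride=1):
    """1D convolution without padding; the filter is not flipped (assumption)."""
    n_out = (len(vector) - len(filt)) // stride + 1
    return [
        sum(vector[k * stride + j] * filt[j] for j in range(len(filt)))
        for k in range(n_out)
    ]

def odot(*vectors):
    """Sum of the element-wise product of equal-length vectors."""
    length = len(vectors[0])
    if any(len(v) != length for v in vectors):
        raise ValueError("all vectors must have the same length")
    total = 0
    for i in range(length):
        prod = 1
        for v in vectors:
            prod *= v[i]
        total += prod
    return total

joined = concat([1, 2], [3])                     # [1, 2, 3]
feature_map = conv1d([1, 2, 3, 4], [1, 1])       # [3, 5, 7]
strided_map = conv1d([1, 2, 3, 4], [1, 1], 2)    # [3, 7]
triple_sum = odot([1, 2], [3, 4], [5, 6])        # 1*3*5 + 2*4*6 = 63
```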

For the task of knowledge graph completion, an embedding-based model defines a function $\phi$ that takes a tuple $x$ as input, and generates a prediction, e.g., a probability (or score) of the tuple being true. A model is fully expressive if given any complete world (full assignment of truth values to all tuples), there exists an assignment of values to the embeddings of the entities and relations that accurately separates the tuples that are true in the world from those that are false.

4 HypE: a Knowledge Hypergraph Embedding Method

In this work we propose HypE, a novel embedding-based method for link prediction in knowledge hypergraphs. The idea at the core of our model is that the way an entity representation is used to make predictions is affected by the role that the entity plays in a given relation. In the example in Figure 1, Turing plays the role of a student at a university, but he may have a different role (e.g. ‘professor’) in another relation. This means that the way we use Turing’s embedding may need to be different for computing predictions for each of these roles.

In several embedding-based methods for knowledge graph completion, such as canonical polyadic (CP; lacroix2018canonical), ComplEx (trouillon2016complex), and SimplE (kazemi2018simple), the prediction depends on the position of an entity in a relation. In particular, SimplE learns two embedding vectors $\mathbf{e}^{(1)}$ and $\mathbf{e}^{(2)}$ for an entity $e$ and two embedding vectors $\mathbf{r}^{(1)}$ and $\mathbf{r}^{(2)}$ for a relation $r$, and computes the score of a triple $r(e_1, e_2)$ as $\odot(\mathbf{r}^{(1)}, \mathbf{e}_1^{(1)}, \mathbf{e}_2^{(2)}) + \odot(\mathbf{r}^{(2)}, \mathbf{e}_2^{(1)}, \mathbf{e}_1^{(2)})$. SimplE can be viewed as a special case of HypE. In what follows, we first provide our formulation of SimplE in terms of convolutions with fixed position-dependent filters; we then build on this formulation to lay out the details of HypE.

SimplE embeds each entity  as a single vector and each relation  as a single vector

. It considers four convolutional filters with stride 

as and for head and and for tail. When an entity  appears as head, SimplE uses and when an entity  appears as tail, it uses . In this setting, SimplE computes the score of as .

Instead of using fixed position-dependent filters, HypE learns the filters from the data. The main advantage of learning the filters is that it facilitates extending the formulation to beyond binary, as we can have different (learned) filters for each position. Thus, our model learns embeddings for entities and relations, as well as convolutional weight filters that transform the entity embeddings depending on the position of each in a given relation. For each fact, the transformed entity embeddings are then combined with the embedding of the relation to produce a score, e.g., a probability value that the input tuple is true. The architecture of HypE is summarized in Figure 2.

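Let $n$, $l$, $d$, and $s$ denote the number of filters per position, the filter-length, the embedding dimension and the stride of the convolution, respectively. Let $\omega_i \in \mathbb{R}^{n \times l}$ be the convolutional filters associated with an entity at position $i$, and let $\omega_{ij} \in \mathbb{R}^{l}$ be the $j$th row of $\omega_i$. We denote by $P \in \mathbb{R}^{nq \times d}$ the projection matrix, where $q = \lfloor (d - l)/s \rfloor + 1$ is the feature map size. For a given tuple, define $f(\mathbf{e}, i) = \mathrm{concat}(\mathbf{e} * \omega_{i1}, \dots, \mathbf{e} * \omega_{in})\, P$ to be a function that returns a vector of size $d$ based on the entity embedding $\mathbf{e}$ and its position $i$ in the tuple. Thus, each entity embedding $\mathbf{e}$ appearing at position $i$ in a given tuple is convolved with the set of position-specific filters $\omega_i$ to give $n$ feature maps of size $q$. All $n$ feature maps corresponding to an entity are concatenated to a vector of size $nq$ and projected to the embedding space by multiplying it by $P$. The projected vectors of entities and the embedding of the relation are then combined by an inner-product to define the score function:

$$\phi(r(e_1, \dots, e_{|r|})) = \odot(\mathbf{r}, f(\mathbf{e}_1, 1), \dots, f(\mathbf{e}_{|r|}, |r|)) \qquad (1)$$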

(a) Function $f(\mathbf{e}, i)$
(b) Function $\phi$
Figure 2: Visualization of the HypE architecture. (a) The function $f(\mathbf{e}, i)$ takes an entity embedding and the position at which the entity appears in the given tuple, and returns a vector. (b) The function $\phi$ takes a tuple as input. The transformed entity embeddings are then combined with the relation embedding $\mathbf{r}$.
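The scoring pipeline described in this section (position-specific convolution, concatenation of the feature maps, projection, and a sum of element-wise products with the relation embedding) can be sketched in pure Python. This is a hedged illustration and not the authors' implementation: the function names, the toy filters, and the toy projection matrix are all assumptions, and the parameters that HypE would learn are fixed by hand here.

```python
def conv1d(vector, filt, stride):
    """1D convolution without padding (filter not flipped; an assumption)."""
    n_out = (len(vector) - len(filt)) // stride + 1
    return [
        sum(vector[k * stride + j] * filt[j] for j in range(len(filt)))
        for k in range(n_out)
    ]

def odot(*vectors):
    """Sum of the element-wise product of equal-length vectors."""
    total = 0
    for column in zip(*vectors):
        prod = 1
        for value in column:
            prod *= value
        total += prod
    return total

def transform(entity_vec, filters, projection, stride):
    """Convolve with each filter, concatenate the maps, then project."""
    features = []
    for filt in filters:
        features.extend(conv1d(entity_vec, filt, stride))
    dim = len(projection[0])
    # row vector (length n*q) times matrix (n*q x d)
    return [
        sum(features[a] * projection[a][b] for a in range(len(features)))
        for b in range(dim)
    ]

def hype_score(rel_vec, entity_vecs, filters_by_position, projection, stride):
    """Combine the relation embedding with the position-transformed entities."""
    transformed = [
        transform(e, filters_by_position[i], projection, stride)
        for i, e in enumerate(entity_vecs)
    ]
    return odot(rel_vec, *transformed)

# Toy setting: embedding size 4, one filter of length 2 per position, stride 2,
# so each feature map has size (4 - 2) // 2 + 1 = 2 and the projection is 2 x 4.
filters_by_position = [[[1, 0]], [[0, 1]]]
projection = [[1, 0, 1, 0], [0, 1, 0, 1]]
r = [1, 1, 1, 1]
e1 = [1, 2, 3, 4]
e2 = [2, 0, 1, 1]

score_12 = hype_score(r, [e1, e2], filters_by_position, projection, 2)
score_21 = hype_score(r, [e2, e1], filters_by_position, projection, 2)
```

With these hand-picked values the two argument orders give different scores (6 and 16), which illustrates how position-specific filters make the score depend on the role each entity plays in the tuple.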

4.1 Full Expressivity

Full expressivity of models has been the focus of several studies (simpleplus; trouillon2017knowledge; xu2018powerful). A model that is not fully expressive can easily underfit the training data, because some worlds cannot be represented by any assignment of its parameters. The following theorem establishes the full expressivity of HypE. We defer its proof to the Appendix.

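Theorem 1 (Expressivity)

Let $\tau$ be a set of true tuples defined over entities $\mathcal{E}$ and relations $\mathcal{R}$, and let $\alpha = \max_{r \in \mathcal{R}} |r|$ be the maximum arity of the relations in $\mathcal{R}$. There exists a HypE model with embedding vectors of size at most $\max(\alpha|\tau|, \alpha)$ that assigns $1$ to the tuples in $\tau$ and $0$ to others.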

4.2 Objective Function and Training

To learn a HypE model, we use stochastic gradient descent with mini-batches. In each learning iteration, we take in a batch of positive tuples from the knowledge hypergraph. As we only have positive instances available, we also need to train our model on negative instances. For this purpose, for each positive instance, we produce a set of negative instances. For negative sample generation, we follow the contrastive approach of Bordes et al. (TransE) for knowledge graphs and extend it to knowledge hypergraphs: for each tuple, we produce a set of negative samples of size $N|r|$ by replacing each of the entities in the tuple with $N$ random entities, one at a time. Here, $N$ is the ratio of negative samples in our training set, and is a hyperparameter.
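The negative sampling procedure described above can be sketched as follows. This is an illustrative sketch under stated assumptions: the function name `negative_samples`, the `(relation, entities)` representation of a tuple, and the choice not to check whether a corrupted tuple happens to be a true fact are all assumptions made for this example.

```python
import random

def negative_samples(fact, entities, ratio, rng):
    """Corrupt each position of the tuple `ratio` times, one position at a time."""
    relation, args = fact
    samples = []
    for pos in range(len(args)):
        # candidates exclude the original entity at this position
        candidates = [e for e in entities if e != args[pos]]
        for _ in range(ratio):
            corrupted = list(args)
            corrupted[pos] = rng.choice(candidates)
            samples.append((relation, tuple(corrupted)))
    return samples

rng = random.Random(0)  # fixed seed so the sketch is reproducible
entities = ["a", "b", "c", "d", "e"]
fact = ("r", ("a", "b", "c"))
negatives = negative_samples(fact, entities, ratio=2, rng=rng)
```

For a tuple of arity 3 and a negative ratio of 2, the sketch produces 6 corrupted tuples, each differing from the original tuple in exactly one position.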

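Given a knowledge hypergraph defined on $\tau'$, we let $\tau'_{train}$, $\tau'_{test}$, and $\tau'_{valid}$ denote the train, test, and validation sets, respectively, so that $\tau' = \tau'_{train} \cup \tau'_{test} \cup \tau'_{valid}$. For any tuple $x$ in $\tau'$, we let $neg(x)$ be a function that generates a set of negative samples through the process described above. We define the following cross entropy loss, which is a combination of softmax and negative log likelihood loss, and has been shown to be effective for link prediction (baselines-strike):

$$\mathcal{L}(\mathbf{r}, \mathbf{e}) = \sum_{x' \in \tau'_{train}} -\log\left( \frac{e^{\phi(x')}}{e^{\phi(x')} + \sum_{x \in neg(x')} e^{\phi(x)}} \right)$$

Here, $\mathbf{r}$ represents relation embeddings, $\mathbf{e}$ represents entity embeddings, and $\phi$ is the function given by equation (1) that maps a tuple to a score.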
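The loss for a single positive tuple, a softmax over the positive score and the scores of its negative samples followed by a negative log-likelihood, can be computed as in the sketch below. The function name `tuple_loss` is hypothetical, and the max-subtraction step is a standard numerical-stability measure added for this example rather than something stated in the text.

```python
import math

def tuple_loss(pos_score, neg_scores):
    """Negative log of the softmax probability assigned to the positive tuple."""
    scores = [pos_score] + list(neg_scores)
    shift = max(scores)  # subtract the maximum for numerical stability
    log_normalizer = shift + math.log(sum(math.exp(s - shift) for s in scores))
    return log_normalizer - pos_score

def batch_loss(scored_batch):
    """Sum the per-tuple losses over (positive score, negative scores) pairs."""
    return sum(tuple_loss(pos, negs) for pos, negs in scored_batch)

even_odds = tuple_loss(0.0, [0.0])          # positive and negative tie
confident = tuple_loss(10.0, [0.0, 0.0])    # positive scored far above negatives
total = batch_loss([(0.0, [0.0]), (10.0, [0.0, 0.0])])
```

When the positive tuple and a single negative receive the same score the loss equals $\log 2$, and it approaches zero as the positive score grows relative to the negatives.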

5 Experimental Setup

In this section, we introduce the datasets and the baselines we use to compare to our proposed model. At the end of the section, we discuss the evaluation metrics and implementation details.

5.1 Datasets

We conduct experiments on a total of five different datasets (two containing only binary relations, and three that also contain non-binary relations; see Table 1 for the arities in each dataset). For the experiments on datasets with binary relations, we use two standard benchmarks for knowledge graph completion: WN18 (WN18) and FB15k (TransE). WN18 is a subset of Wordnet (miller1995wordnet) and FB15k is a subset of Freebase (bollacker2008freebase). We use the train, validation, and test split proposed by TransE.

The experiments on knowledge hypergraph completion are conducted on three datasets. The first is JF17K proposed by m-TransH; as no validation set is proposed for JF17K, we randomly select 20% of the train set as validation. We also create two datasets FB-auto and m-FB15K from Freebase. Note first that Freebase is a reified dataset; that is, it is created from a knowledge base having facts with relations defined on two or more entities. To obtain a knowledge hypergraph from Freebase, we perform an inverse reification process by following the steps below. Table 1 summarizes the statistics of the datasets.

  1. From Freebase, remove the facts that have relations defined on a single entity, or that contain numbers or enumeration as entities.

  2. Convert the triples in Freebase that share the same reified entity into a single higher-arity fact. For example, three triples that were originally created by the addition of one (unique) reified entity now represent a single fact defined on three entities.

  3. Create the FB-auto dataset by selecting the resulting facts whose subject is ‘automotive’.

  4. Create the m-FB15K dataset by following a strategy similar to that proposed by TransE (TransE): select the resulting facts that pertain to entities present in the Wikilinks database (wikilinks).

  5. Split the facts in each of FB-auto and m-FB15K randomly into train, test, and validation sets.
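The conversion of triples that share a reified entity into one higher-arity fact can be sketched as below. This is a hypothetical illustration: the function name `unreify`, the assumption that the set of reified entity identifiers is known in advance, the ordering of arguments by binary relation name, and the way the combined relation name is formed are all assumptions of this example and are not taken from the Freebase processing used in the paper.

```python
from collections import defaultdict

def unreify(triples, reified_ids):
    """Merge binary triples that share a reified entity into one tuple."""
    groups = defaultdict(list)
    kept = []
    for relation, head, tail in triples:
        if head in reified_ids:
            groups[head].append((relation, tail))
        else:
            kept.append((relation, (head, tail)))  # ordinary binary fact
    merged = []
    for edges in groups.values():
        edges.sort()  # assumed convention: argument order follows relation name
        name = "+".join(rel for rel, _ in edges)  # hypothetical combined name
        merged.append((name, tuple(entity for _, entity in edges)))
    return kept + merged

triples = [
    ("r1", "id0", "Turing"),
    ("r2", "id0", "PhD"),
    ("r3", "id0", "Princeton"),
    ("born_in", "Turing", "London"),
]
facts = unreify(triples, reified_ids={"id0"})
```

In this toy input, the three triples attached to `id0` collapse into one ternary fact, while the ordinary binary triple is kept unchanged.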

| Dataset | #entities | #relations | #train | #valid | #test | arities |
|---|---|---|---|---|---|---|
| WN18 | 40,943 | 18 | 141,442 | 5,000 | 5,000 | {2} |
| FB15k | 14,951 | 1,345 | 483,142 | 50,000 | 59,071 | {2} |
| JF17K | 29,177 | 327 | 77,733 | - | 24,915 | {2, 3, 4, 5, 6, 7} |
| FB-auto | 3,410 | 8 | 6,778 | 2,255 | 2,180 | {2, 4, 5} |
| m-FB15K | 10,314 | 71 | 415,375 | 39,348 | 38,797 | {2, 3, 4, 5} |

Table 1: Statistics on the datasets.

5.2 Baselines

To compare our results to those of existing work, we first need baselines for knowledge hypergraph completion. We obtain these either by extending current knowledge graph completion models or by introducing new ones. The baselines we introduce in this work are grouped into the following three categories: (1) methods that work with binary relations and that are easily extendable to higher arity: r-SimplE, m-DistMult, and m-CP; (2) simple (but non-trivial) extensions of current methods: m-SimplE, Shift1Left; and (3) existing methods that can handle higher-arity relations: m-TransH. Below we give some details about these proposed baselines:

r-SimplE: To test how well a model trained on reified data performs in practice, we converted higher-arity relations in the train set to binary relations through reification. We then use the SimplE model (that we call r-SimplE) to train and test on this reified data. In this setting, at test time higher-arity relations are first reified to a set of binary relations; this process creates new auxiliary entities for which the model has no learned embeddings. To embed the auxiliary entities for the prediction step, we use the observations we have about them at test time. For example, a higher-arity relation $r(e_1, e_2, e_3)$ is reified at test time by being replaced by three facts: $r_1(id, e_1)$, $r_2(id, e_2)$, and $r_3(id, e_3)$. When predicting the tail entity of $r_1(id, e_1)$, we use the other two reified facts to learn an embedding for entity $id$. Because $id$ is added only to help represent the higher-arity relations as a set of binary relations, we only do tail prediction for reified relations.

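m-DistMult: DistMult (DistMult) defines a score function $\phi(r(e_1, e_2)) = \odot(\mathbf{r}, \mathbf{e}_1, \mathbf{e}_2)$. To accommodate non-binary relations, we redefine this score function as $\phi(r(e_1, \dots, e_k)) = \odot(\mathbf{r}, \mathbf{e}_1, \dots, \mathbf{e}_k)$.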

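m-CP: Canonical Polyadic (CP) decomposition (CP) embeds each entity $e$ as two vectors $\mathbf{e}^{(1)}$ and $\mathbf{e}^{(2)}$, and each relation $r$ as a single vector $\mathbf{r}$. CP defines the score function $\phi(r(e_1, e_2)) = \odot(\mathbf{r}, \mathbf{e}_1^{(1)}, \mathbf{e}_2^{(2)})$. We extend CP to a variant (m-CP) that accommodates non-binary relations, and which embeds each entity $e$ as $\alpha$ different vectors $\mathbf{e}^{(1)}, \dots, \mathbf{e}^{(\alpha)}$, where $\alpha = \max_{r \in \mathcal{R}} |r|$. m-CP computes the score of a tuple as $\phi(r(e_1, \dots, e_k)) = \odot(\mathbf{r}, \mathbf{e}_1^{(1)}, \dots, \mathbf{e}_k^{(k)})$.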
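The difference between the two baselines above can be seen in a small sketch: m-DistMult uses one vector per entity, so its score does not change when arguments are reordered, whereas m-CP uses a separate vector for each position. The function names and the toy embedding values below are assumptions made for illustration only.

```python
def odot(*vectors):
    """Sum of the element-wise product of equal-length vectors."""
    total = 0
    for column in zip(*vectors):
        prod = 1
        for value in column:
            prod *= value
        total += prod
    return total

def m_distmult_score(rel_vec, entity_vecs):
    """One embedding per entity, regardless of position."""
    return odot(rel_vec, *entity_vecs)

def m_cp_score(rel_vec, entity_position_vecs):
    """Each entity has one embedding per position; use the one for its position."""
    chosen = [vecs[i] for i, vecs in enumerate(entity_position_vecs)]
    return odot(rel_vec, *chosen)

r = [1, 1]

# m-DistMult: reordering the arguments leaves the score unchanged.
a, b, c = [1, 2], [3, 4], [5, 6]
dm_abc = m_distmult_score(r, [a, b, c])
dm_bac = m_distmult_score(r, [b, a, c])

# m-CP: position-specific vectors make the score order-sensitive.
A = [[1, 0], [0, 1], [1, 1]]  # vectors of entity A for positions 1, 2, 3
B = [[1, 1], [1, 0], [0, 1]]
C = [[0, 1], [1, 1], [1, 0]]
cp_abc = m_cp_score(r, [A, B, C])
cp_bac = m_cp_score(r, [B, A, C])
```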

m-SimplE: SimplE (kazemi2018simple) embeds each entity as two vectors and , and each relation as two vectors and . We reformulate SimplE as embedding each and as vectors and , and defining the score as . Here, shifts vector to the left by steps and returns length of vector . The encoding of SimplE is a special instance of the above encoding, with and . The m-SimplE score function is defined as where .

Shift1Left: Shift1Left works similar to m-SimplE. Shift1Left shifts entity embeddings to the left and computes the score with .

5.3 Evaluation Metrics

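Given a knowledge hypergraph on $\tau'$, we evaluate various completion methods using a train and test set $\tau'_{train}$ and $\tau'_{test}$. We use two evaluation metrics: Hit@t and Mean Reciprocal Rank (MRR). Both these measures rely on the ranking of a tuple $x \in \tau'_{test}$ within a set of corrupted tuples. For each tuple $r(e_1, \dots, e_k)$ in $\tau'_{test}$ and each entity position $i$ in the tuple, we generate $|\mathcal{E}| - 1$ corrupted tuples by replacing the entity $e_i$ with each of the entities in $\mathcal{E} \setminus \{e_i\}$. For example, by corrupting entity $e_i$, we would obtain a new tuple $r(e_1, \dots, e_i^c, \dots, e_k)$ where $e_i^c \in \mathcal{E} \setminus \{e_i\}$. Let the set of corrupted tuples, plus $r(e_1, \dots, e_k)$, be denoted by $\theta_i(r(e_1, \dots, e_k))$. Let $\mathrm{rank}_i(r(e_1, \dots, e_k))$ be the ranking of $r(e_1, \dots, e_k)$ within $\theta_i(r(e_1, \dots, e_k))$ based on the score $\phi(x)$ for each $x \in \theta_i(r(e_1, \dots, e_k))$. In an ideal knowledge hypergraph completion method, the rank is $1$ among all corrupted tuples. We compute the MRR as $\frac{1}{K} \sum_{r(e_1, \dots, e_k) \in \tau'_{test}} \sum_{i=1}^{k} \frac{1}{\mathrm{rank}_i(r(e_1, \dots, e_k))}$ where $K = \sum_{r(e_1, \dots, e_k) \in \tau'_{test}} |r|$ is the number of prediction tasks. Hit@t measures the proportion of tuples in $\tau'_{test}$ that rank among the top $t$ in their corresponding corrupted sets. We follow TransE (TransE) and remove all corrupted tuples that are in $\tau'$ from our computation of MRR and Hit@t.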
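The evaluation protocol above (corrupt one position at a time, drop corrupted tuples that are known facts, rank the test tuple by score) can be sketched in pure Python. The function names, the toy score table, and the tie-handling rule (only strictly higher scores push the rank down) are assumptions of this illustration.

```python
def filtered_rank(fact, position, entities, score_fn, known_facts):
    """Rank of a test tuple among its corruptions at one position (filtered)."""
    relation, args = fact
    true_score = score_fn(fact)
    rank = 1
    for entity in entities:
        if entity == args[position]:
            continue
        corrupted = (relation, args[:position] + (entity,) + args[position + 1:])
        if corrupted in known_facts:
            continue  # filtered setting: skip corruptions that are true facts
        if score_fn(corrupted) > true_score:  # ties do not count (assumption)
            rank += 1
    return rank

def mrr_and_hits(test_facts, entities, score_fn, known_facts, t):
    """MRR and Hit@t over all (tuple, position) prediction tasks."""
    ranks = [
        filtered_rank(fact, i, entities, score_fn, known_facts)
        for fact in test_facts
        for i in range(len(fact[1]))
    ]
    mrr = sum(1.0 / rank for rank in ranks) / len(ranks)
    hits = sum(1 for rank in ranks if rank <= t) / len(ranks)
    return mrr, hits

entities = ["a", "b", "c"]
known = {("r", ("a", "b")), ("r", ("c", "b"))}
toy_scores = {
    ("r", ("a", "b")): 0.5,
    ("r", ("b", "b")): 0.9,
    ("r", ("c", "b")): 0.99,  # a known fact, so it is filtered out
    ("r", ("a", "a")): 0.1,
    ("r", ("a", "c")): 0.2,
}
score_fn = lambda fact: toy_scores.get(fact, 0.0)

mrr, hit_at_1 = mrr_and_hits([("r", ("a", "b"))], entities, score_fn, known, t=1)
```

In this toy case the test tuple ranks second when its first argument is corrupted and first when its second argument is corrupted, giving an MRR of 0.75 and a Hit@1 of 0.5.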

5.4 Implementation Details

We implement HypE and the baselines in PyTorch (pytorch). We use Adagrad (adagrad) as the optimizer and dropout (srivastava2014dropout) to regularize our model and baselines. We tune our hyperparameters over the validation set, and fix the maximum number of epochs, the batch size, the embedding size, and the negative ratio. We compute the MRR of the models over the validation set at regular epoch intervals and select the epoch that gives the best result. The learning rate and dropout rate of all models are tuned. HypE additionally has the number of filters per position, the filter length, and the stride as hyperparameters. We select the hyperparameters of HypE and the baselines via the same grid search based on MRR on the validation set. The code of the proposed model, the baselines, and the datasets will be available upon acceptance of the paper.

| Model | JF17K MRR | JF17K Hit@1 | JF17K Hit@3 | JF17K Hit@10 | FB-auto MRR | FB-auto Hit@1 | FB-auto Hit@3 | FB-auto Hit@10 | m-FB15K MRR | m-FB15K Hit@1 | m-FB15K Hit@3 | m-FB15K Hit@10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| r-SimplE | 0.102 | 0.069 | 0.112 | 0.168 | 0.106 | 0.082 | 0.115 | 0.147 | 0.051 | 0.042 | 0.054 | 0.070 |
| m-DistMult | 0.490 | 0.398 | 0.539 | 0.659 | 0.789 | 0.751 | 0.816 | 0.850 | 0.700 | 0.627 | 0.736 | 0.841 |
| m-CP | 0.521 | 0.443 | 0.560 | 0.665 | 0.797 | 0.768 | 0.815 | 0.855 | 0.744 | 0.687 | 0.769 | 0.857 |
| Shift1Left | 0.528 | 0.444 | 0.571 | 0.685 | 0.799 | 0.770 | 0.826 | 0.851 | 0.729 | 0.664 | 0.760 | 0.854 |
| m-SimplE | 0.530 | 0.447 | 0.573 | 0.687 | 0.800 | 0.771 | 0.820 | 0.851 | 0.727 | 0.661 | 0.759 | 0.856 |
| m-TransH (m-TransH) | 0.775 | 0.770 | 0.782 | 0.783 | 0.808 | 0.775 | 0.812 | 0.862 | 0.723 | 0.718 | 0.720 | 0.725 |
| HypE (Ours) | 0.827 | 0.799 | 0.858 | 0.859 | 0.907 | 0.839 | 0.963 | 1.00 | 0.795 | 0.730 | 0.815 | 0.875 |

Table 2: Knowledge hypergraph completion results on JF17K, FB-auto and m-FB15K for baselines and the proposed method. The prefixes ‘r’ and ‘m’ in the model names stand for reification and multi-arity respectively. Our method outperforms the baselines on all datasets.

6 Experiments

In this section, we evaluate HypE on binary and non-binary facts. On the knowledge hypergraph datasets, our method outperforms m-TransH and the proposed baselines. As an ablation study, we break down the performance of the models across arities. To assess the performance of our method on binary relations, we evaluate it on WN18 and FB15K, as the methods we compare against report results only on these datasets.

6.1 Knowledge Hypergraph Completion Results

Table 2 shows the knowledge hypergraph completion results for the proposed baselines and HypE across three datasets. Our model outperforms the baselines on JF17K, FB-auto, and m-FB15K: in terms of MRR, HypE reaches 0.827 on JF17K and 0.907 on FB-auto (against 0.775 and 0.808 for m-TransH, the strongest baseline on these datasets), and 0.795 on m-FB15K (against 0.744 for m-CP). These results demonstrate the advantage of HypE when higher-arity relations are available. The results also show that reification for the r-SimplE model does not work well; this is probably because the reification process introduces auxiliary entities that appear in very few facts, from which the model is not able to learn an appropriate embedding. Comparing the results of r-SimplE and m-SimplE, we can also see that extending a model works better than reification when higher-arity relations are present.

6.2 Ablation Study on Different Arities

For each of the baselines and HypE, we break down the performance across relations with different arities. Table 3 shows Hit@10 of the models for each arity in JF17K. We observe that HypE outperforms the baselines in all arities except arity 5. Its improved performance for relations with arity more than 2 may be one reason why its performance on binary relations improves as well. Note that HypE is designed to handle relations of any arity.

| Model | Arity 2 | Arity 3 | Arity 4 | Arity 5 | Arity 6 | All |
|---|---|---|---|---|---|---|
| r-SimplE | 0.478 | 0.025 | 0.015 | 0.022 | 0.000 | 0.168 |
| m-DistMult | 0.484 | 0.680 | 0.851 | 0.960 | 0.792 | 0.659 |
| m-CP | 0.465 | 0.705 | 0.840 | 0.968 | 0.969 | 0.665 |
| Shift1Left | 0.497 | 0.725 | 0.856 | 0.974 | 0.271 | 0.685 |
| m-SimplE | 0.498 | 0.718 | 0.857 | 0.976 | 0.583 | 0.687 |
| m-TransH (m-TransH) | 0.748 | 0.865 | 0.744 | 0.964 | 0.803 | 0.783 |
| HypE (Ours) | 0.906 | 0.870 | 0.972 | 0.863 | 1.00 | 0.859 |

Table 3: Breakdown performance of Hit@10 across relations with different arities on JF17K.

6.3 Knowledge Graph Completion Results

Table 4 shows link prediction results on WN18 and FB15K. Baseline results are taken from the original papers, except those of m-TransH, which we implemented ourselves. To be fair when comparing our model to the baselines, we follow the setup of kazemi2018simple (kazemi2018simple) with the same grid search approach, and we set the HypE hyperparameters so that the models have the same number of parameters. This makes our results directly comparable to those of knowledge graph completion methods. The results show that HypE outperforms m-TransH on WN18 and FB15K. As we show in Section 4, SimplE can be formulated as a special case of HypE. This is also reflected in the results, as SimplE and HypE obtain comparable outcomes in the binary setting when they have the same number of parameters.

| Model | WN18 MRR | WN18 Hit@1 | WN18 Hit@3 | WN18 Hit@10 | FB15k MRR | FB15k Hit@1 | FB15k Hit@3 | FB15k Hit@10 |
|---|---|---|---|---|---|---|---|---|
| CP (CP) | 0.074 | 0.049 | 0.080 | 0.125 | 0.326 | 0.219 | 0.376 | 0.532 |
| TransH (TransH) | - | - | - | 0.867 | - | - | - | 0.585 |
| m-TransH (m-TransH) | 0.671 | 0.495 | 0.839 | 0.923 | 0.351 | 0.228 | 0.427 | 0.559 |
| DistMult (DistMult) | 0.822 | 0.728 | 0.914 | 0.936 | 0.654 | 0.546 | 0.733 | 0.824 |
| SimplE (kazemi2018simple) | 0.942 | 0.939 | 0.944 | 0.947 | 0.727 | 0.660 | 0.773 | 0.838 |
| HypE (Ours) | 0.934 | 0.927 | 0.940 | 0.944 | 0.725 | 0.648 | 0.777 | 0.856 |

Table 4: Knowledge graph completion results on WN18 and FB15K for baselines and our model. Our model performs similarly to the best baselines for knowledge graphs with binary relations.

7 Conclusion and Future Work

In this paper, we represent facts as a knowledge hypergraph: a graph where labeled edges are defined on two or more nodes. We propose HypE, a knowledge hypergraph completion method that embeds entities and relations, and predicts new links in a knowledge hypergraph. We introduce baselines for completing knowledge hypergraphs by extending current methods and introducing new ones. We also introduce two datasets for evaluating knowledge hypergraph completion methods by compiling subsets of Freebase. Based on our benchmarks, HypE achieves results comparable to the baselines for knowledge graph completion (having only binary relations), and outperforms the state of the art by a large margin for knowledge hypergraphs (having relations that are defined on two or more entities).


Appendix A

Proof of Theorem 1

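Theorem 1 (Expressivity) Let $\tau$ be a set of true tuples defined over entities $\mathcal{E}$ and relations $\mathcal{R}$, and let $\alpha = \max_{r \in \mathcal{R}} |r|$ be the maximum arity of the relations in $\mathcal{R}$. Then there exists a HypE model with embedding vectors each of size at most $\max(\alpha|\tau|, \alpha)$ that assigns $1$ to the tuples in $\tau$ and $0$ to tuples not in $\tau$.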


To prove the theorem, we show an assignment of embedding values for each of the entities and relations in $\tau$ such that the scoring function of HypE is as follows:

$$\phi(x) = \begin{cases} 1 & \text{if } x \in \tau \\ 0 & \text{otherwise} \end{cases}$$

We begin the proof by first describing the embeddings of each of the entities and relations in HypE; we then proceed to show that with such an embedding, HypE can represent any world accurately.

Let us first assume that $|\tau| > 0$ and let $f_p$ be the $p$th fact in $\tau$. We let each entity $e \in \mathcal{E}$ be represented with a vector of length $\alpha|\tau|$ in which the $p$th block of $\alpha$-bits is the one-hot representation of $e$ in fact $f_p$: if $e$ appears in fact $f_p$ at position $i$, then the $i$th bit of the $p$th block is set to $1$, and to $0$ otherwise. Each relation $r \in \mathcal{R}$ is then represented as a vector of length $|\tau|$ whose $p$th bit is equal to $1$ if fact $f_p$ is defined on relation $r$, and $0$ otherwise.

HypE defines different convolutional weight filters for each entity position within a tuple. As we have at most $\alpha$ possible positions, we define each convolutional filter $\omega_i$ as a vector of length $\alpha$ where the $i$th bit is set to $1$ and all others to $0$, for each $i = 1, 2, \dots, \alpha$. When the scoring function $\phi$ is applied to some tuple $x$, for each entity position $i$ in $x$, convolution filter $\omega_i$ is applied to the entity at position $i$ in the tuple as a first step; the $\odot()$ function is then applied to the resulting vectors and the relation embedding to obtain a score.

Given any tuple $x$, we want to show that $\phi(x) = 1$ if $x \in \tau$ and $0$ otherwise.

Figure 3: An example of an embedding where , and is the third fact in

First assume that $x = f_p$ is the $p$th fact in $\tau$ that is defined on relation $r$ and entities $e_1, \dots, e_{|r|}$, where $e_i$ is the entity at position $i$. Convolving each $\mathbf{e}_i$ with $\omega_i$ results in a vector of length $|\tau|$ where the $p$th bit is equal to $1$ (since both $\omega_i$ and the $p$th block of $\mathbf{e}_i$ have a $1$ at the $i$th position) (see Figure 3). Then, as a first step, function $\odot()$ computes the element-wise multiplication between the embedding of relation $r$ (that has $1$ at position $p$) and all of the convolved entity vectors (each having $1$ at position $p$); this results in a vector of length $|\tau|$ where the $p$th bit is set to $1$ and all other bits set to $0$. Finally, $\odot()$ sums the outcome of the resulting products to give us a score of $1$.

To show that $\phi(x) = 0$ when $x \notin \tau$, we prove the contrapositive, namely that if $\phi(x) \neq 0$, then $x$ must be a fact in $\tau$. We proceed by contradiction. Assume that there exists a tuple $x \notin \tau$ such that $\phi(x) \neq 0$. This means that at the time of computing the element-wise product in the $\odot()$ function, there was a position $j$ at which all input vectors to $\odot()$ had a value of $1$. This can happen only when (1) applying the convolution filter $\omega_i$ to each of the entities in $x$ produces a vector having $1$ at position $j$, and (2) the embedding of relation $r$ has $1$ at position $j$.

The first case can happen only if all entities of $x$ appear in the $j$th fact $f_j$, each at the same position as in $x$; the second case happens only if relation $r$ appears in $f_j$. But if all entities of $x$ as well as its relation appear in fact $f_j$ at the same positions, then $x = f_j \in \tau$, contradicting our assumption. Therefore, if $\phi(x) \neq 0$, then $x$ must be a fact in $\tau$.

To complete the proof, we consider the case when $|\tau| = 0$. In this case, since there are no facts, all entities and relations are represented by zero-vectors of length $\alpha$. Then, for any tuple $x$, $\phi(x) = 0$. This completes the proof.
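The construction used in the proof can be checked on the example of Figure 1 with a short sketch. The code below is an illustration under stated assumptions: the function names are hypothetical, the set of facts is assumed to be non-empty, the one-hot filters are applied with a stride equal to the block length, and the projection step of HypE is treated as the identity.

```python
def build_embeddings(facts):
    """One-hot block embeddings from the proof; assumes a non-empty fact list."""
    alpha = max(len(args) for _, args in facts)   # maximum arity
    n_facts = len(facts)
    entities = {e for _, args in facts for e in args}
    relations = {r for r, _ in facts}
    ent = {e: [0] * (alpha * n_facts) for e in entities}
    rel = {r: [0] * n_facts for r in relations}
    for p, (r, args) in enumerate(facts):
        rel[r][p] = 1                       # fact p is defined on relation r
        for i, e in enumerate(args):
            ent[e][p * alpha + i] = 1       # e appears at position i of fact p
    return ent, rel, alpha

def score(fact, ent, rel, alpha):
    """Apply the one-hot position filters, then sum the element-wise products."""
    r, args = fact
    total = 0
    for p in range(len(rel[r])):
        prod = rel[r][p]
        for i, e in enumerate(args):
            # filter i with stride alpha selects bit i of block p
            prod *= ent[e][p * alpha + i]
        total += prod
    return total

facts = [
    ("DEGREE_FROM_UNIVERSITY", ("Turing", "PhD", "Princeton")),
    ("DEGREE_FROM_UNIVERSITY", ("Turing", "Undergraduate", "Kings College Cambridge")),
    ("DEGREE_FROM_UNIVERSITY", ("Michelle Obama", "Undergraduate", "Princeton")),
]
ent, rel, alpha = build_embeddings(facts)

true_scores = [score(f, ent, rel, alpha) for f in facts]
false_score = score(
    ("DEGREE_FROM_UNIVERSITY", ("Turing", "Undergraduate", "Princeton")),
    ent, rel, alpha,
)
```

Each of the three facts receives a score of 1, while the tuple stating that Turing received his undergraduate degree from Princeton receives 0, even though every pair of its entities co-occurs in some true fact.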