5* Knowledge Graph Embeddings with Projective Transformations

06/08/2020 · Mojtaba Nayyeri, et al. · University of Oxford, University of Bonn, Fraunhofer

Performing link prediction using knowledge graph embedding (KGE) models is a popular approach for knowledge graph completion. Such link predictions are performed by measuring the likelihood of links in the graph via a transformation function that maps nodes via edges into a vector space. Since the complex structure of the real world is reflected in multi-relational knowledge graphs, the transformation functions need to be able to represent this complexity. However, most of the existing transformation functions in embedding models have been designed in Euclidean geometry and only cover one or two simple transformations. Therefore, they are prone to underfitting and limited in their ability to embed complex graph structures. The area of projective geometry, however, fully covers inversion, reflection, translation, rotation, and homothety transformations. We propose a novel KGE model, which supports those transformations and subsumes other state-of-the-art models. The model has several favorable theoretical properties and outperforms existing approaches on widely used link prediction benchmarks.


1 Introduction

Knowledge graphs (KGs) have been successful in a range of AI tasks including question answering, data integration, and recommender systems. The main characteristic of KGs lies in their graph-based knowledge representation structure in the form of (head,relation,tail) triples, where head and tail are entities (nodes) with a relation (edge) between them. The usage of this graph structure addresses many of the previous challenges of machine learning for heterogeneous data with complex structure. However, KGs are usually incomplete, which directly affects the performance of learning models on various downstream learning tasks.

One approach to deal with the knowledge graph incompleteness problem is to predict missing links based on the existing ones. This can be done via knowledge graph embeddings (KGEs). Every KGE model uses a transformation function to map entities of the graph through relations into a vector space and scores the plausibility of triples via a score function. The performance of KGE models heavily relies on the design of their score function, which in turn defines the type of transformation they support. Such transformations determine the extent to which a model is able to learn complex motifs and patterns formed by combinations of nodes and edges in the KG.

A systematic analysis of existing KGEs shows that most of them have been designed in Euclidean geometry and usually support only a single transformation type (often translation or rotation). This limits their ability to embed complex graph structures. A brief overview of state-of-the-art KGE models and their support for different transformation types is given in Table 1. While all existing models cover at most two transformation types, projective geometry provides a uniform way to simultaneously represent five transformation types, namely translation, rotation, homothety, inversion, and reflection. The combination of these transformation types results in various transformation functions (parabolic, circular, elliptic, hyperbolic, and loxodromic). Consequently, projective transformations subsume all five transformation types and the transformation functions they induce.

Models     Tran.  Rot.  Hom.  Inv.  Refl.
TransE       ✓     -     -     -     -
RotatE       -     ✓     -     -     -
ComplEx      -     ✓     ✓     -     -
QuatE        -     ✓     -     -     -
5*E          ✓     ✓     ✓     ✓     ✓
Table 1: Supported transformations of KGE models.

Our core contribution is a new five-star embedding model, i.e. a model that simultaneously supports these five transformation types and consequently transformation functions of various shapes. Furthermore, we formally show that this model, dubbed 5*E, (a) is fully expressive (as defined in wang2018multi ); (b) subsumes the KGE models DistMult, RotatE, pRotatE, TransE, and ComplEx; and (c) allows learning composition, inverse, reflexive and symmetric relation patterns. Our evaluation on standard link prediction benchmarks shows that 5*E outperforms existing models.

2 Preliminaries and Background

2.1 Knowledge Graph Embeddings

A KG is a multi-relational directed graph $\mathcal{G} = (\mathcal{E}, \mathcal{R}, \mathcal{T})$, where $\mathcal{E}$ and $\mathcal{R}$ are the sets of nodes (entities) and edges (relations between entities), respectively. The set $\mathcal{T} \subseteq \mathcal{E} \times \mathcal{R} \times \mathcal{E}$ contains all triples of the form (head, relation, tail), e.g. (Paris, CapitalOf, France).

In order to apply learning methods on KGs, certain models are employed to transform KGs into a vector space. Knowledge Graph Embeddings (KGEs) are one of the most used techniques, which are based on learning vector representations of the entities ($\mathbf{h}, \mathbf{t}$) and relations ($\mathbf{r}$) of a KG. Specifically, a vector representation denoted by $(\mathbf{h}, \mathbf{r}, \mathbf{t})$ is learned by the model per triple $(h, r, t)$, where $\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathcal{V}^d$ and $\mathcal{V}$ is a vector space. TransE bordes2013transe considers $\mathcal{V} = \mathbb{R}$, in ComplEx complex2016trouillon and RotatE sun2019rotate $\mathcal{V} = \mathbb{C}$ (complex space) is used, and QuatE quate2019zhang uses $\mathcal{V} = \mathbb{H}$ (quaternion space). In this paper, we choose a projective space to embed the graph, i.e. $\mathcal{V} = \mathbb{CP}^1$ (the complex projective line, which is introduced later).

Most KGE models are defined via a relation-specific transformation function which maps head entities to tail entities, i.e. $f_r : \mathbf{h} \mapsto \mathbf{t}$. On top of such a transformation function, a score function $s(h, r, t)$ is defined to measure the plausibility of triples. Generally, the formulation of a score function is either distance-based, e.g. $s = -\|f_r(\mathbf{h}) - \mathbf{t}\|$, or similarity-based, e.g. $s = \langle f_r(\mathbf{h}), \mathbf{t} \rangle$.
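To make the two formulations concrete, the following minimal Python sketch (not from the paper; embeddings and names are illustrative) contrasts a distance-based score with a similarity-based one:

```python
import numpy as np

def distance_score(h, r, t):
    # Distance-based formulation (TransE-style): the less negative, the more plausible.
    return -np.linalg.norm((h + r) - t)

def similarity_score(h, r, t):
    # Similarity-based formulation (DistMult-style): element-wise relational
    # transformation of the head, then an inner product with the tail.
    return np.dot(h * r, t)

h, r, t = np.random.randn(3, 50)  # toy 50-dimensional embeddings
print(distance_score(h, r, t), similarity_score(h, r, t))
```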

2.2 Projective Geometry

Projective geometry uses homogeneous coordinates, which represent $n$-dimensional coordinates with $n + 1$ numbers (i.e. use one additional parameter). For example, a point $(x, y)$ in 2D Cartesian coordinates becomes $[x : y : 1]$ in homogeneous coordinates, where any non-zero scalar multiple $[\lambda x : \lambda y : \lambda]$ refers to the same point. In the case of 1-dimensional real numbers, $x$ becomes $[x : 1]$, where $[\lambda x : \lambda]$ with $\lambda \neq 0$ refers to the same point. The key elements of projective geometry are as follows:

A projective line is a space in which a projective geometry is defined. A projective geometry requires a point at infinity in order to satisfy the axiom that “two parallel lines intersect at infinity”. Therefore, an extended line $\hat{\mathbb{R}} = \mathbb{R} \cup \{\infty\}$ (where $\mathbb{R}$ is the real line) is realized with $\mathbb{R}$ and a point at infinity (which topologically is a circle). More concretely, the projective line is a set $\mathbb{F} \cup \{\infty\}$ with an additional member $\infty$ denoting the point at infinity. The projective line is real ($\mathbb{RP}^1$) when $\mathbb{F} = \mathbb{R}$. In the case of $\mathbb{F} = \mathbb{C}$, where $\mathbb{C}$ is the complex plane, the set $\mathbb{C} \cup \{\infty\} = \hat{\mathbb{C}}$ denotes the complex projective line $\mathbb{CP}^1$.

The Riemann Sphere is the extended complex plane with a point at infinity. More precisely, it is built from the plane of complex numbers wrapped around a sphere whose south and north poles denote $0$ and $\infty$, respectively. In projective geometry, every complex projective line is a Riemann sphere. The Riemann sphere is employed as a tool for projective transformations, as shown in Figure 1.

A Projective Transformation is a mapping of the Riemann sphere to itself. Let $[x_1 : x_2]$ be the homogeneous coordinates of a point in $\mathbb{CP}^1$. A projective transformation in $\mathbb{CP}^1$ is expressed by a matrix multiplication richter2011perspectives ; salomon2007transformations as

$\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$    (1)

where the matrix must be invertible ($ad - bc \neq 0$). By identifying $x \in \mathbb{C}$ with $[x : 1]$, a projective transformation is represented by a fractional expression through a sequence of homogenization, transformation, and dehomogenization as

$x \mapsto [x : 1] \mapsto [ax + b : cx + d] \mapsto \frac{ax + b}{cx + d}$    (2)

where the mapping $g : \hat{\mathbb{C}} \to \hat{\mathbb{C}}$ is defined as

$g(x) = \frac{ax + b}{cx + d}, \quad a, b, c, d \in \mathbb{C},\ ad - bc \neq 0.$    (3)

The resulting mapping introduced in Equation 3 describes all Möbius transformations.

The Möbius Group is the set of all Möbius transformations, which forms a projective linear group $PGL(2, \mathbb{C})$, i.e., the group of all invertible $2 \times 2$ complex matrices (identified up to a non-zero scalar factor) with the operation of matrix multiplication on a projective space. The group is also denoted by $\mathrm{Aut}(\hat{\mathbb{C}})$ as it is the automorphism group of the Riemann sphere, or equivalently $\mathrm{Aut}(\hat{\mathbb{C}}) \cong PGL(2, \mathbb{C})$.
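As a hedged illustration of the group structure described above (names and matrices are ours, not from the paper), the following sketch applies a Möbius transformation numerically and checks that composing two transformations corresponds to multiplying their matrices:

```python
import numpy as np

def mobius(M, x):
    # Apply g(x) = (a*x + b) / (c*x + d) for an invertible 2x2 complex matrix M.
    (a, b), (c, d) = M
    return (a * x + b) / (c * x + d)

M1 = np.array([[1 + 1j, 2.0], [0.5j, 1.0]])  # arbitrary invertible matrices
M2 = np.array([[2.0, -1j], [1.0, 3.0]])
x = 0.7 - 0.2j

# Composition of Möbius transformations corresponds to matrix multiplication,
# which is why they form the projective linear group PGL(2, C).
assert np.isclose(mobius(M1, mobius(M2, x)), mobius(M1 @ M2, x))
```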

(a) circular
(b) elliptic
(c) hyperbolic
(d) loxodromic
(e) parabolic
Figure 1: Default transformation functions on the Riemann sphere (first row) and the Möbius flow of each transformation after projection onto the complex plane (second row).

2.3 Variants of Möbius Transformations

Every Möbius transformation has at most two fixed points $\gamma_1, \gamma_2$ on the Riemann sphere, obtained by solving $g(\gamma) = \gamma$ richter2011perspectives , which gives

$\gamma_{1,2} = \frac{(a - d) \pm \sqrt{(a - d)^2 + 4bc}}{2c}.$    (4)

Depending on the number of fixed points, Möbius transformations form parabolic (one fixed point) as well as circular, elliptic, hyperbolic, and loxodromic (two fixed points) transformation functions (see Figure 1, upper row, and Table 2 for detailed conditions). All transformations of each type form a subgroup which is isomorphic to the group indicated in the 'Isomorphic to' column of Table 2.

The illustration in the lower row of Figure 1 gives insight into the way the Möbius transformation induces the five transformation types (translation, rotation, inversion, reflection and homothety). Given a grid, the transformation is performed by (a) a stereographic projection from the complex plane to the Riemann sphere, (b) moving the sphere, and (c) a stereographic projection from the sphere back to the plane. Each transformation has a characteristic constant $k = \lambda e^{i\alpha}$ which determines the sparsity/density of the transformation. $\lambda$ is an expansion factor which indicates how repulsive the fixed point $\gamma_1$ is and how attractive the second fixed point $\gamma_2$ is. $\alpha$ is a rotation factor, determining the degree to which a transformation rotates the plane counter-clockwise around $\gamma_1$ and clockwise around $\gamma_2$.

Type         Condition (ad - bc = 1)      Characteristic constant k         Isomorphic to (normal form)
Parabolic    (a + d)^2 = 4                k = 1                             z -> z + b (translations)
Circular     a + d = 0                    k = -1                            z -> -z
Elliptic     (a + d)^2 in [0, 4)          |k| = 1, k != 1                   z -> e^{i alpha} z (rotations)
Hyperbolic   (a + d)^2 in (4, inf)        k = lambda, lambda > 0, != 1      z -> lambda z (dilations)
Loxodromic   (a + d)^2 in C \ [0, 4]      k = lambda e^{i alpha}            z -> k z
Table 2: Types of Möbius transformations and their conditions (matrix normalized so that ad - bc = 1).
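The trace-based conditions in Table 2 can be sketched in code; the classifier below is illustrative (assuming the matrix is first normalized to determinant 1) and not an implementation from the paper:

```python
import numpy as np

def classify_mobius(M, tol=1e-9):
    # Normalize so that det = ad - bc = 1, then classify by the squared trace.
    A = np.asarray(M, dtype=complex)
    A = A / np.sqrt(np.linalg.det(A))
    tr2 = np.trace(A) ** 2
    if abs(tr2.imag) > tol:
        return "loxodromic"
    s = tr2.real
    if abs(s - 4) < tol:
        return "parabolic"
    if abs(s) < tol:
        return "circular"
    if 0 <= s < 4:
        return "elliptic"
    if s > 4:
        return "hyperbolic"
    return "loxodromic"  # negative real squared trace is also loxodromic

print(classify_mobius([[1, 1], [0, 1]]))       # parabolic: z -> z + 1
print(classify_mobius([[0, 1], [-1, 0]]))      # circular: z -> -1/z (an involution)
print(classify_mobius([[2, 0], [0, 0.5]]))     # hyperbolic: z -> 4z
```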

3 Related Work

KGE models can be classified according to their embedding space. We will first cover KGEs operating in Euclidean space and then describe related work for other geometric spaces.

Euclidean Knowledge Graph Embedding Models A large number of KGE models such as TransE bordes2013transe and its variants ji2015knowledge ; lin2015learning ; wang2014knowledge as well as RotatE sun2019rotate are designed using translational or rotational (Hadamard product) score functions in Euclidean space. The score and loss functions of these models optimize the embedding vectors in a way that maximizes the plausibility of triples, which is measured by the distance between the rotated/translated head and the tail vectors. Some embedding models such as DistMult yang2014embeddingDistmult , ComplEx complex2016trouillon , QuatE quate2019zhang , and RESCAL nickel2011three , including our proposed model, are designed based on element-wise multiplication of the transformed head and tail. In this case, the plausibility of triples is measured based on the angle between the transformed head and tail. A third category of KGE models uses neural networks (NNs) as score functions, such as ConvE dettmers2018convolutional and NTN socher2013reasoning .

Non-Euclidean Knowledge Graph Embedding Models The aforementioned KGE models are restricted to Euclidean space, which limits their ability to embed complex structures. Some recent efforts investigated other spaces for embedding structures, often simpler structures than KGs. For example, the hyperbolic space has been extensively studied for scale-free networks. In recent work, learning continuous hierarchies from unstructured similarity scores using the Lorentz model was investigated nickel2018learning . In balazevic2019multi , an embedding model dubbed MuRP is proposed that embeds multi-relational KGs on a Poincaré ball ji2016knowledge . MuRP focuses only on resolving the problem of embedding KGs with multiple simultaneous hierarchies. Overall, while the advantages of projective geometry are evident in a wide variety of application domains, including computer vision and robotics, to our knowledge no prior work has investigated it in the context of knowledge graph embeddings.

4 Method

Our method 5*E inherits the five main pillars of projective transformations, namely translation, rotation, homothety, inversion and reflection. The pipeline for performing the transformation includes the following steps: (1) an element-wise stereographic projection that maps the head entity from the complex plane onto the Riemann sphere; (2) a relation-specific transformation that moves the Riemann sphere into a new position and/or orientation; (3) a stereographic projection that projects the transformed head from the Riemann sphere back onto the complex plane; and (4) the computation of a complex inner product between the transformed head and the tail.
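The two projection steps of this pipeline can be sketched as follows (a minimal illustration under the usual unit-sphere convention; the relation-specific movement of the sphere in step (2) is what 5*E learns and is omitted here):

```python
import numpy as np

def plane_to_sphere(z):
    # Inverse stereographic projection: complex plane point -> point on the unit Riemann sphere.
    x, y, d = z.real, z.imag, 1 + abs(z) ** 2
    return np.array([2 * x / d, 2 * y / d, (abs(z) ** 2 - 1) / d])

def sphere_to_plane(p):
    # Stereographic projection from the north pole (0, 0, 1) back to the complex plane.
    return (p[0] + 1j * p[1]) / (1 - p[2])

z = 0.3 + 0.8j
assert np.isclose(sphere_to_plane(plane_to_sphere(z)), z)
```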

4.1 Model Formulation

Embedding of Knowledge Graphs on a Complex Projective Line Let $d$ be the embedding dimension. Given a triple $(h, r, t)$, the head and tail entities are embedded into a $d$-dimensional complex projective line, i.e. $\mathbf{h}, \mathbf{t} \in (\mathbb{CP}^1)^d$. A relation $r$ is embedded into a $d$-dimensional vector where each element is a $2 \times 2$ complex matrix; equivalently, $\mathbf{r}$ contains four complex vectors $\mathbf{a}, \mathbf{b}, \mathbf{c}, \mathbf{d} \in \mathbb{C}^d$. With $h_i, t_i, a_i, b_i, c_i, d_i$ we refer to the $i$th elements of $\mathbf{h}, \mathbf{t}, \mathbf{a}, \mathbf{b}, \mathbf{c}, \mathbf{d}$, respectively.

Relation-specific Transformation In Section 2.2, we showed that for a projective transformation on the complex projective line, there exists an equivalent transformation on the Riemann sphere. We present our model formulation from both perspectives, as this allows for a more comprehensive understanding.

Möbius Representation of Transformation: We use a relation-specific Möbius transformation to map the head entity ($\mathbf{h}$) from a source complex plane to a target complex plane ($f_r(\mathbf{h})$). The transformation is performed via stereographic projections to and from the Riemann sphere and a transformation of the sphere itself. To do so, we compute the element-wise transformation

$f_{r_i}(h_i) = \frac{a_i h_i + b_i}{c_i h_i + d_i}, \quad a_i d_i - b_i c_i \neq 0.$    (5)

This results in the relation-specific transformed head entity $f_r(\mathbf{h}) = (f_{r_1}(h_1), \dots, f_{r_d}(h_d))$.
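A minimal sketch of this element-wise relation-specific transformation (variable names are ours; the invertibility constraint $a_i d_i - b_i c_i \neq 0$ is noted in a comment but not enforced):

```python
import numpy as np

dim = 100  # illustrative embedding dimension

# Hypothetical complex embeddings: head entity h and relation parameters a, b, c, d.
h = np.random.randn(dim) + 1j * np.random.randn(dim)
a, b, c, d = np.random.randn(4, dim) + 1j * np.random.randn(4, dim)

def relation_transform(h, a, b, c, d):
    # Element-wise relation-specific Möbius transformation (cf. Equation 5).
    # In a real model, a*d - b*c != 0 must be ensured element-wise.
    return (a * h + b) / (c * h + d)

h_transformed = relation_transform(h, a, b, c, d)
```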

Projective Representation of Transformation: Using homogeneous coordinates, we can also represent the Möbius transformation from Equation 5 as a projective transformation:

$\begin{pmatrix} y_{i1} \\ y_{i2} \end{pmatrix} = \begin{pmatrix} a_i & b_i \\ c_i & d_i \end{pmatrix} \begin{pmatrix} h_i \\ 1 \end{pmatrix}$    (6)

where each matrix $M_{r_i} = \begin{pmatrix} a_i & b_i \\ c_i & d_i \end{pmatrix}$ is invertible, i.e. $\det(M_{r_i}) = a_i d_i - b_i c_i \neq 0$. Over all dimensions, Equation 6 can be written in matrix form as $Y = M_r \begin{pmatrix} \mathbf{h} \\ \mathbf{1} \end{pmatrix}$, where $M_r$ collects the per-dimension matrices and $\mathbf{1}$ is a vector with all elements being 1.
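A quick numerical check (illustrative values, not from the paper's code) that dehomogenizing the matrix form of Equation 6 recovers the fractional form of Equation 5:

```python
import numpy as np

a, b, c, d = 1 + 2j, 0.5j, -1.0, 2 - 1j  # one dimension of a relation embedding
h = 0.4 - 0.9j                           # one dimension of a head embedding

# Projective form: multiply the homogeneous coordinates [h, 1] by the relation matrix.
y1, y2 = np.array([[a, b], [c, d]]) @ np.array([h, 1.0])

# Dehomogenizing y1/y2 recovers the element-wise Möbius form of Equation 5.
assert np.isclose(y1 / y2, (a * h + b) / (c * h + d))
```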

Score Function The correctness of a triple in a KG is measured by the similarity between the relation-specific transformed head $f_r(\mathbf{h})$ and the tail $\mathbf{t}$. The model aims to minimize the angle between $f_r(\mathbf{h})$ and the tail $\mathbf{t}$, i.e. their product $\langle f_r(\mathbf{h}), \bar{\mathbf{t}} \rangle$ is maximized for positive triples. For sampled negative triples, it is conversely minimized. Overall, the score function of 5*E is

$s(h, r, t) = \mathrm{Re}\!\left( \sum_{i=1}^{d} \frac{a_i h_i + b_i}{c_i h_i + d_i}\, \bar{t}_i \right)$    (7)

where $\mathrm{Re}(x)$ is the function that returns the real part of the complex number $x$, and $\bar{t}_i$ denotes the complex conjugate of $t_i$.
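A hedged sketch of this score function; the conventions (in particular the conjugation of the tail) follow our reading of the text and may differ in detail from the authors' implementation:

```python
import numpy as np

def score_5starE(h, t, a, b, c, d):
    # Möbius-transform the head element-wise, then take the real part of the
    # Hermitian inner product with the tail (cf. Equation 7).
    transformed = (a * h + b) / (c * h + d)
    return np.real(np.sum(transformed * np.conj(t)))

dim = 100
h, t = np.random.randn(2, dim) + 1j * np.random.randn(2, dim)
a, b, c, d = np.random.randn(4, dim) + 1j * np.random.randn(4, dim)
print(score_5starE(h, t, a, b, c, d))
```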

4.2 Theoretical Analysis

We first show that 5*E is a composition of translation, rotation, homothety, inversion and reflection transformations. We then prove that 5*E is fully expressive and subsumes various popular and state-of-the-art KGE models, namely TransE, DistMult, ComplEx, RotatE, and pRotatE. Further details, including all proofs, are in the supplementary material.

Möbius – Composition of Five Transformations The Möbius transformation in Equation 5 is a composition of a series of subsequent transformations $f_1, f_2, f_3$ and $f_4$ covering the five transformation types, as shown in kisil2012geometry :

$f_{r_i}(h_i) = (f_4 \circ f_3 \circ f_2 \circ f_1)(h_i)$    (8)

where $f_1(x) = x + \frac{d_i}{c_i}$ (translation by $\frac{d_i}{c_i}$), $f_2(x) = \frac{1}{x}$ (inversion and reflection w.r.t. the real axis), $f_3(x) = \frac{b_i c_i - a_i d_i}{c_i^2}\, x$ (homothety and rotation), and $f_4(x) = x + \frac{a_i}{c_i}$ (translation by $\frac{a_i}{c_i}$). This shows that 5*E is capable of performing the five transformations simultaneously.
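The decomposition in Equation 8 can be verified numerically; the following sketch (coefficients chosen arbitrarily with a non-zero c) checks that the four steps reproduce the full Möbius transformation:

```python
import numpy as np

a, b, c, d = 2 + 1j, -0.5, 1 - 1j, 0.3j  # arbitrary invertible coefficients with c != 0
z = 1.2 - 0.4j

def f1(z): return z + d / c                      # translation by d/c
def f2(z): return 1 / z                          # inversion composed with reflection w.r.t. the real axis
def f3(z): return ((b * c - a * d) / c**2) * z   # homothety and rotation
def f4(z): return z + a / c                      # translation by a/c

# The composition reproduces the full Möbius transformation (cf. Equation 8).
assert np.isclose(f4(f3(f2(f1(z)))), (a * z + b) / (c * z + d))
```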

Subsumption of Other KGE Models

Definition 1 (from wang2018multi ).

A model $M_1$ subsumes a model $M_2$ when any scoring of the triples of a KG measured by model $M_2$ can also be obtained by model $M_1$.

We can formally show that 5*E subsumes various state-of-the-art models:

Proposition 1.

5*E with variants of its score function subsumes DistMult, pRotatE, RotatE, TransE and ComplEx. Specifically, 5*E subsumes DistMult, ComplEx and pRotatE with its original score function, and subsumes RotatE and TransE with a distance-based variant of the score function (changing the inner product to a distance).
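As an illustration of the ComplEx/DistMult cases of Proposition 1 (a sketch under our reading of the score function; the TransE and RotatE cases require the distance-based variant and are not shown), degenerating the Möbius parameters recovers the ComplEx score:

```python
import numpy as np

def score_5starE(h, t, a, b, c, d):
    transformed = (a * h + b) / (c * h + d)
    return np.real(np.sum(transformed * np.conj(t)))

def score_complex(h, t, r):
    # ComplEx: Re(<r * h, conj(t)>)
    return np.real(np.sum(r * h * np.conj(t)))

dim = 50
h, t, r = np.random.randn(3, dim) + 1j * np.random.randn(3, dim)

# Setting b = c = 0 and d = 1 degenerates the Möbius transformation to an
# element-wise multiplication, recovering ComplEx (and DistMult when all
# embeddings are restricted to the reals).
zeros, ones = np.zeros(dim), np.ones(dim)
assert np.isclose(score_5starE(h, t, r, zeros, zeros, ones), score_complex(h, t, r))
```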

Definition 2 (from kazemi2018simple ).

A model is fully expressive if there exist assignments to the embeddings of the entities and relations that accurately separate correct triples from incorrect ones for any given ground truth.

Corollary 1.

The 5*E model is fully expressive.

Inference of Patterns
For relations which exhibit patterns of the form premise $\Rightarrow$ conclusion, where the premise can be a conjunction of several triples, a model is said to be able to infer those patterns if the implication holds for the score function, i.e. if the scores of all triples in the premise are positive then the score of the conclusion must be positive. We investigated the inference ability of 5*E for specific patterns including reflexive, symmetric and inverse relations as well as composition.

Proposition 2.

Let $r_1, r_2$ be relations and $r_3$ (e.g. UncleOf) be a composition of $r_1$ and $r_2$. 5*E infers the composition pattern under corresponding constraints on the relation embeddings (given with the proofs in the supplementary material).

Proposition 3.

Let $r_2$ be the inverse of $r_1$. 5*E infers this pattern under a corresponding constraint on the relation embeddings (given in the supplementary material).

Proposition 4.

Let $r$ be symmetric. 5*E infers the symmetric pattern under a corresponding condition on the relation embedding (given in the supplementary material).

Proposition 5.

Let $r$ be a reflexive relation. In dimension $d$, 5*E infers reflexive patterns with distinct representations of the entities if the fixed points of the relation-specific transformation are non-identical.

TransE only infers composition and inverse patterns, and RotatE is capable of inferring all the mentioned patterns but it is not fully expressive. ComplEx infers these patterns and is fully expressive. However, it has less flexibility in learning complex structures due to using only rotation and homothety.

Discussion on Other Model Properties

5*E inherits various important properties of projective transformations as well as Möbius transformations. Because the projective linear group is isomorphic to the Möbius group, i.e., $PGL(2, \mathbb{C}) \cong \mathrm{Aut}(\hat{\mathbb{C}})$ kisil2012geometry , the properties mentioned for Equation 6 are also valid for Equation 5. We investigate the inherited properties of 5*E from two perspectives: capturing local similarities of nodes, and capturing structural groups.

Capturing Local Similarities The similarity of nodes in a KG is local, i.e. nodes in a neighborhood are more likely to be semantically similar faerman2018lasagne ; hamilton2017representation than nodes at a greater distance. A projective transformation is a bijective conformal mapping, i.e. it preserves angles locally but not necessarily lengths. It also preserves orientation after mapping kisil2012geometry . Therefore, 5*E is capable of capturing similarity by preserving angles locally via a relation-specific transformation of nodes.

Furthermore, the map $\pi : GL(2, \mathbb{C}) \to \mathrm{Aut}(\hat{\mathbb{C}})$, where $GL(2, \mathbb{C})$ is the general linear group, which transfers a matrix into a Möbius transformation, is a group homomorphism. If the determinant is restricted to 1, then $\pi$ becomes limited to a mapping from the special linear group $SL(2, \mathbb{C})$ to the Möbius group that preserves volume and orientation.

In the context of KGs, after a relation-specific transformation (Equation 6 or, equivalently, Equation 5) of nodes in the head position to nodes in the tail position, the relative distance of nodes can be preserved. From this ability, we expect that 5*E is able to propagate structural similarity from one group of nodes to another.

Capturing Structural Groups When going beyond $SL(2, \mathbb{C})$ by allowing determinants different from 1, the volume and orientation are changed after the transformation. Therefore, 5*E is more flexible on KGs with varied graph structures than current KGE models, as those are not able to change volume and orientation. Additionally, the characteristic of a projective transformation of mapping lines to circles or circles to lines kisil2012geometry increases the flexibility of the model. This enables covering structural transformations of various shapes (see Section 5). This strong flexibility is obtained by properly mixing the transformation types mentioned in Equation 8 and Table 1.

5 Experiments and Results

Experimental Setup Following the best practices for evaluating embedding models, we consider the most widely used metrics, namely Mean Reciprocal Rank (MRR) and Hits@n. We evaluated our model on four widely used benchmark datasets, namely FB15k, FB15k-237, WN18, and WN18RR. We compare against the best performing models on those benchmarks, namely TransE bordes2013transe , RotatE sun2019rotate , TuckER tucker2019balavzevic , ComplEx complex2016trouillon , QuatE quate2019zhang , MuRP balazevic2019multi , ConvE dettmers2018convolutional and SimplE kazemi2018simple . We developed our model on top of a standard framework lacroix2018canonical , applied the 1-N scoring loss with N3 regularization, and added the reverse counterpart of each triple to the training set. All details on the metrics, training datasets and hyperparameters are in the supplementary material.
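The following sketch illustrates the two training ingredients mentioned here, N3 regularization and reciprocal (reverse) triples, under assumed conventions; the regularization weight and triple encoding are illustrative, not the paper's actual configuration:

```python
import numpy as np

def n3_regularizer(*factors, weight=1e-2):
    # N3 (nuclear 3-norm) regularization as in lacroix2018canonical:
    # penalize the cubed moduli of the embedding factors.
    return weight * sum(np.sum(np.abs(f) ** 3) for f in factors)

def add_reciprocal_triples(triples, num_relations):
    # Add a reverse counterpart (t, r + num_relations, h) for every training triple.
    return triples + [(t, r + num_relations, h) for (h, r, t) in triples]

train = [(0, 0, 1), (1, 1, 2)]
print(add_reciprocal_triples(train, num_relations=2))
```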

Model         WN18                                   WN18RR
              MRR     Hits@1  Hits@3  Hits@10        MRR     Hits@1  Hits@3  Hits@10
TransE        0.495   0.113   0.888   0.943          0.226   -       -       0.501
RotatE        0.949   0.944   0.952   0.959          0.476   0.428   0.492   0.571
TuckER        0.953   0.949   0.955   0.958          0.470   0.443   0.482   0.526
ComplEx       0.941   0.936   0.945   0.940          0.440   0.410   0.460   0.510
QuatE         0.950   0.944   0.954   0.960          0.482   0.436   0.499   0.572
SimplE        0.942   0.939   0.944   0.947          -       -       -       -
ConvE         0.943   0.935   0.946   0.956          0.430   0.400   0.440   0.520
MuRP          -       -       -       -              0.481   0.440   0.495   0.566
5*E d = 500   0.952   0.947   0.955   0.962          0.491   0.444   0.506   0.589
5*E d = 100   0.950   0.945   0.953   0.959          0.469   0.410   0.496   0.583

Model         FB15k                                  FB15k-237
              MRR     Hits@1  Hits@3  Hits@10        MRR     Hits@1  Hits@3  Hits@10
TransE        0.463   0.297   0.578   0.749          0.294   -       -       0.465
RotatE        0.699   0.585   0.788   0.872          0.327   0.233   0.363   0.517
TuckER        0.795   0.741   0.833   0.892          0.358   0.266   0.394   0.544
ComplEx       0.692   0.599   0.759   0.840          0.247   0.158   0.275   0.428
QuatE         0.833   0.800   0.859   0.900          0.366   0.271   0.401   0.556
SimplE        0.727   0.660   0.773   0.838          -       -       -       -
ConvE         0.657   0.558   0.723   0.831          0.325   0.237   0.356   0.501
MuRP          -       -       -       -              0.335   0.243   0.367   0.518
5*E d = 500   0.816   0.775   0.843   0.890          0.359   0.265   0.395   0.547
5*E d = 100   0.732   0.658   0.780   0.859          0.348   0.257   0.382   0.533

Table 3: Link prediction results on WN18 and WN18RR as well as FB15k and FB15k-237.

Results and Discussion. The evaluation results are shown in Table 3, which includes results for 5*E with embedding dimensions of 100 and 500. Results for the other models are taken from quate2019zhang , except for TuckER and MuRP, which are taken from tucker2019balavzevic and balazevic2019multi . We first look at the WN18 and WN18RR benchmarks. Our model outperforms all state-of-the-art models across all metrics on WN18RR. This is visible, for example, in Hits@10, where 5*E reaches around 0.590 whereas TransE as a translation-based model reaches 0.501, RotatE as a rotation-based model reaches 0.571, and TuckER reaches 0.526. On WN18, our model outperforms the other models on Hits@3 and Hits@10 while being close to the best on MRR and Hits@1. Here, it should be considered that the only model performing better, QuatE, used an embedding dimension of 1000. Generally, we observe that 5*E already obtains strong results on WN18 with a low embedding dimension of 100, which is lower than the dimensions used by the other models.

On the FB15k datasets, we observe that 5*E outperforms TransE, RotatE, ComplEx, SimplE and MuRP on FB15k-237. Our model performs close to TuckER. QuatE outperforms our model, which may be due to its higher embedding dimension (1000). The same pattern can be seen on FB15k, except for TuckER, which 5*E outperforms by a considerable margin on MRR, Hits@1 and Hits@3.

(a) Original Grid
(b) hasPart relation
(c) partOf relation
(d) hypernym
(e) hyponym
Figure 2: Learned 5*E embeddings for selected relations in WN18RR.

Learned Transformation Types. Each relation in the KG is represented as a set of projective transformations in 5*E (one projective transformation per dimension). Figure 2 shows the transformation types learned by 5*E for WN18RR relations, in a grid view. The original, plain view of the grid is given in sub-figure (a) for comparison of the changes after the transformations, and (b) to (e) show specific relations in WN18RR. Here we highlight the analysis of the results for some example relations:

Inversion: In sub-figure (b), the lines (same-color points) of the original grid are mapped to circles or curves (see Section 4.2) after the relation-specific transformation of the hasPart relation. This is also visible in sub-figures (d) and (e) for the hypernym and hyponym relations.
Rotation and Reflection: By comparing the direction of the lines with the same color (e.g., red) in the original grid and in all examples of the transformed grids, we conclude that the learned transformations cover rotation, for example for hypernym and hyponym. We can also interpret the result for the hasPart relation as a counter-clockwise rotation followed by a reflection w.r.t. the real axis.
Translation: In sub-figure (b), the grid is moved downwards and slightly to the right along the real and imaginary axes for the hasPart relation, which represents a translation. However, this is not the case for the hypernym relation.
Homothety: Semantically, the pairs (hypernym, hyponym) and (hasPart, partOf) form inverse patterns (see Proposition 3). We see that the transformed grids of hypernym and hyponym differ only w.r.t. rotation. The scale is not changed, so the determinants of the two projective matrices are 1 (no homothety, see Section 4.2). Comparing the hasPart and partOf grids, the scale is changed, so the determinants of those two projective matrices should not be equal to one. This shows that both of those transformations cover homothety.

Learned Transformation Functions. Figure 3 illustrates the learned transformation functions for various relations in WN18RR. Sub-figures (a) and (b) both refer to the hyponym relation; however, the depicted transformation functions differ in shape, being hyperbolic and elliptic, respectively. This confirms the flexibility of the model in embedding various graph structures as well as the diversity in the density/sparsity of the flow (e.g., for the hyponym relation). We also observed that when a pair of relations forms an inverse pattern (in the same dimension), the model mainly learns the same transformation function but with opposite direction.

(a) Hyperbolic/hyponym
(b) Elliptic/hyponym
(c) Loxodromic/hypernym
(d) Circular/memberOf
Figure 3: Learned 5*E transformation functions for relations in WN18RR.

6 Conclusion

In this paper, we introduce 5*E, a new knowledge graph embedding model which operates on the complete set of projective transformations. We build the model on well-researched mathematical foundations and show that it subsumes other state-of-the-art embedding models. Furthermore, we prove that the model is fully expressive. By supporting a wider range of transformations than previous models, it can embed KGs with more complex structures, supports a wider range of relational patterns and can suitably handle areas of the KG with varying density. Our experimental evaluation on four well-established datasets shows that the model outperforms multiple recent strong baselines.

References