Multi-relational Poincaré Graph Embeddings

by Ivana Balažević et al.

Hyperbolic embeddings have recently gained attention in machine learning due to their ability to represent hierarchical data more accurately and succinctly than their Euclidean analogues. However, multi-relational knowledge graphs often exhibit multiple simultaneous hierarchies, which current hyperbolic models do not capture. To address this, we propose a model that embeds multi-relational graph data in the Poincaré ball model of hyperbolic space. Our Multi-Relational Poincaré model (MuRP) learns relation-specific parameters to transform entity embeddings by Möbius matrix-vector multiplication and Möbius addition. Experiments on the hierarchical WN18RR knowledge graph show that our multi-relational Poincaré embeddings outperform their Euclidean counterpart and existing embedding methods on the link prediction task, particularly at lower dimensionality.




1 Introduction

Hyperbolic space can be thought of as a continuous analogue of discrete trees, making it suitable for modelling hierarchical data structures Sarkar (2011); De Sa et al. (2018). Various types of hierarchical data have recently been embedded in hyperbolic space Nickel and Kiela (2017, 2018); Gulcehre et al. (2019); Tifrea et al. (2019), requiring relatively few dimensions and achieving promising results on downstream tasks. This demonstrates the advantage of modelling tree-like structures in spaces with constant negative curvature (hyperbolic) over zero-curvature spaces (Euclidean). More recently, tools needed to construct hyperbolic neural networks have been developed Ganea et al. (2018a); Bécigneul and Ganea (2019), facilitating the use of hyperbolic embeddings in downstream tasks.

Certain data structures, such as knowledge graphs, often exhibit multiple hierarchies simultaneously. For example, lion is near the top of the animal food chain but near the bottom in a tree of taxonomic mammal types Miller (1995). Despite the widespread use of hyperbolic geometry in representation learning, the only existing approach to embedding hierarchical multi-relational graph data in hyperbolic space Suzuki et al. (2019) does not outperform Euclidean models. The difficulty with representing multi-relational data in hyperbolic space lies in finding a way to represent entities (nodes), shared across relations, such that they form a different hierarchy under different relations, e.g. nodes near the root of the tree under one relation may be leaf nodes under another. Further, many state-of-the-art approaches to modelling multi-relational data, such as DistMult Yang et al. (2015), ComplEx Trouillon et al. (2016), and TuckER Balažević et al. (2019) (i.e. bilinear models), rely on inner product as a similarity measure and there is no clear correspondence to the Euclidean inner product in hyperbolic space Tifrea et al. (2019) by which these models can be converted. Existing translational approaches that use Euclidean distance to measure similarity, such as TransE Bordes et al. (2013) and STransE Nguyen et al. (2016), can be converted to the hyperbolic domain, but do not currently compete with the bilinear models in terms of predictive performance. However, it has recently been shown in the closely related field of word embeddings Allen and Hospedales (2019) that the difference (i.e. relation) between word pairs that form analogies manifests as a vector offset, justifying a translational approach to modelling relations.

In this paper, we propose MuRP, a theoretically inspired method to embed hierarchical multi-relational data in the Poincaré ball model of hyperbolic space. By considering the surface area of a hypersphere of increasing radius centered at a particular point, Euclidean space can be seen to “grow” polynomially, whereas in hyperbolic space the equivalent growth is exponential De Sa et al. (2018). Therefore, moving outwards from the root of a tree, there is more “room” to separate leaf nodes in hyperbolic space than in Euclidean. MuRP learns relation-specific parameters that transform entity embeddings by Möbius matrix-vector multiplication and Möbius addition Ungar (2001). The model outperforms not only its Euclidean counterpart, but also current state-of-the-art models on the link prediction task on the hierarchical WN18RR dataset. We also show that our Poincaré embeddings require far fewer dimensions than Euclidean embeddings to achieve comparable performance. We visualize the learned embeddings and analyze the properties of the Poincaré model compared to its Euclidean analogue, such as convergence rate, performance per relation, and influence of embedding dimensionality.
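As a worked illustration of the growth argument, consider the two-dimensional case (chosen here only for concreteness): the circumference of a circle of radius $r$ in the Euclidean plane versus the hyperbolic plane of curvature $-1$.

```latex
\begin{align*}
C_{\mathbb{E}}(r) &= 2\pi r, \\
C_{\mathbb{H}}(r) &= 2\pi \sinh(r) = \pi\left(e^{r} - e^{-r}\right) \approx \pi e^{r} \quad \text{for large } r.
\end{align*}
```

A tree with branching factor $b$ has $b^{\ell}$ nodes at depth $\ell$, which is also exponential in $\ell$, so the available hyperbolic circumference can keep pace with the number of leaves while the Euclidean one cannot.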

2 Background and preliminaries

Multi-relation link prediction  A knowledge graph is a multi-relational graph representation of a collection of facts $\mathcal{F}$ (or triples) of the form $(e_s, r, e_o) \in \mathcal{E} \times \mathcal{R} \times \mathcal{E}$, where $\mathcal{E}$ denotes the set of entities and $\mathcal{R}$ denotes the set of binary relations between them. The presence of $(e_s, r, e_o) \in \mathcal{F}$ indicates that subject entity $e_s$ is related to object entity $e_o$ by relation $r$. In a multi-relational graph representation of $\mathcal{F}$, nodes correspond to entities and typed directed edges represent relations, i.e. nodes for $e_s$ and $e_o$ are linked by a directed edge of type $r$ if and only if $(e_s, r, e_o) \in \mathcal{F}$. Given a set of known facts $\mathcal{F}$, the task of multi-relational link prediction is to predict which triples are true. A perfect encoding of $\mathcal{F}$ would simply recall known facts. However, knowledge graphs are typically incomplete, so the aim is to infer other facts that are true but missing from $\mathcal{F}$. Typically, a score function $\phi : \mathcal{E} \times \mathcal{R} \times \mathcal{E} \to \mathbb{R}$ is learned that assigns a score $s = \phi(e_s, r, e_o)$ to each triple, indicating the strength of prediction that a particular triple corresponds to a true fact. A non-linearity, such as the logistic sigmoid function, is often used to convert the score to a predicted probability $p = \sigma(s) \in [0, 1]$ of the triple being true.

Knowledge graph relations exhibit multiple properties, such as symmetry, asymmetry, and transitivity. Certain knowledge graph relations, such as “hypernym” and “has_part”, induce a hierarchical structure over entities, suggesting that embedding them in hyperbolic rather than Euclidean space may lead to improved representations Sarkar (2011); Nickel and Kiela (2017, 2018); Ganea et al. (2018b); Tifrea et al. (2019). Based on this intuition, we focus on embedding multi-relational knowledge graph data in hyperbolic space.

(a) Poincaré disk geodesics.

(b) Model decision boundary.

(c) Spheres of influence.
Figure 1: (a) Geodesics in the Poincaré disk, indicating the shortest paths between pairs of points. (b) The model predicts the triple as true and as false. (c) Each entity embedding has a sphere of influence, whose radius is determined by the embedding’s bias.

Hyperbolic geometry of the Poincaré ball  The Poincaré ball model is one of five isometric models of hyperbolic geometry Cannon et al. (1997), each offering different perspectives for performing mathematical operations in hyperbolic space. The isometry means there exists a one-to-one distance-preserving mapping from the metric space of one model $(\mathcal{X}, d_{\mathcal{X}})$ onto that of another $(\mathcal{Y}, d_{\mathcal{Y}})$, where $\mathcal{X}, \mathcal{Y}$ are sets and $d_{\mathcal{X}}, d_{\mathcal{Y}}$ distance functions, or metrics, providing a notion of equivalence between the models.

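The Poincaré ball $(\mathbb{B}_c^d, g^{\mathbb{B}})$ of radius $1/\sqrt{c}$, $c > 0$, is a $d$-dimensional manifold $\mathbb{B}_c^d = \{x \in \mathbb{R}^d : c\|x\|^2 < 1\}$ equipped with the Riemannian metric $g^{\mathbb{B}}$ which is conformal to the Euclidean metric $g^{\mathbb{E}} = \mathbf{I}_d$ (i.e. angle-preserving with respect to the Euclidean space Ganea et al. (2018a)) with the conformal factor $\lambda_x^c = 2/(1 - c\|x\|^2)$, i.e. $g^{\mathbb{B}} = (\lambda_x^c)^2 g^{\mathbb{E}}$. The distance between two points $x, y \in \mathbb{B}_c^d$ is measured along a geodesic (i.e. shortest path between the points, see Figure 1(a)) and is given by:

$$d_{\mathbb{B}}(x, y) = \frac{2}{\sqrt{c}} \tanh^{-1}\left(\sqrt{c}\, \|{-x} \oplus_c y\|\right), \tag{1}$$

where $\|\cdot\|$ denotes the Euclidean norm and $\oplus_c$ represents Möbius addition Ungar (2001); Ganea et al. (2018a):

$$x \oplus_c y = \frac{\left(1 + 2c\langle x, y\rangle + c\|y\|^2\right) x + \left(1 - c\|x\|^2\right) y}{1 + 2c\langle x, y\rangle + c^2 \|x\|^2 \|y\|^2}, \tag{2}$$

with $\langle \cdot, \cdot \rangle$ being the Euclidean inner product.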
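The distance and Möbius addition can be sketched in a few lines of plain Python. This is an illustrative sketch, not the authors' implementation: the function names, the list-of-floats vector representation and the clamping constant are assumptions made here.

```python
import math

def dot(x, y):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    """Euclidean norm."""
    return math.sqrt(dot(x, x))

def mobius_add(x, y, c=1.0):
    """Möbius addition of x and y in the Poincaré ball with parameter c."""
    xy, x2, y2 = dot(x, y), dot(x, x), dot(y, y)
    denom = 1 + 2 * c * xy + c * c * x2 * y2
    a = (1 + 2 * c * xy + c * y2) / denom  # coefficient of x
    b = (1 - c * x2) / denom               # coefficient of y
    return [a * xi + b * yi for xi, yi in zip(x, y)]

def poincare_distance(x, y, c=1.0):
    """Geodesic distance between two points inside the ball."""
    neg_x = [-xi for xi in x]
    diff_norm = norm(mobius_add(neg_x, y, c))
    # Clamp just below 1 so atanh stays finite (an assumption for
    # numerical safety, not part of the definition).
    arg = min(math.sqrt(c) * diff_norm, 1 - 1e-15)
    return 2 / math.sqrt(c) * math.atanh(arg)
```

For example, with `c = 1` the distance from the origin to `[0.5, 0.0]` is `2 * atanh(0.5)`, which equals `ln 3`, already noticeably larger than the Euclidean distance of 0.5.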

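Each point $x \in \mathbb{B}_c^d$ has a tangent space $T_x\mathbb{B}_c^d$, a $d$-dimensional vector space, that is a local first-order approximation of the manifold $\mathbb{B}_c^d$ around $x$, which for the Poincaré ball is a $d$-dimensional Euclidean space, i.e. $T_x\mathbb{B}_c^d = \mathbb{R}^d$. The exponential map $\exp_x^c : T_x\mathbb{B}_c^d \to \mathbb{B}_c^d$ allows one to move on the manifold from $x$ in the direction of a vector $v \in T_x\mathbb{B}_c^d$, tangential to $\mathbb{B}_c^d$ at $x$. The inverse is the logarithmic map $\log_x^c : \mathbb{B}_c^d \to T_x\mathbb{B}_c^d$. For the Poincaré ball, these are defined Ganea et al. (2018a) as:

$$\exp_x^c(v) = x \oplus_c \left( \tanh\left(\sqrt{c}\, \frac{\lambda_x^c \|v\|}{2}\right) \frac{v}{\sqrt{c}\, \|v\|} \right), \tag{3}$$

$$\log_x^c(y) = \frac{2}{\sqrt{c}\, \lambda_x^c} \tanh^{-1}\left(\sqrt{c}\, \|{-x} \oplus_c y\|\right) \frac{-x \oplus_c y}{\|{-x} \oplus_c y\|}. \tag{4}$$

Ganea et al. (2018a) show that matrix-vector multiplication in hyperbolic space (Möbius matrix-vector multiplication) can be obtained by projecting a point $x \in \mathbb{B}_c^d$ onto the tangent space at $0 \in \mathbb{B}_c^d$ with $\log_0^c(x)$, performing matrix multiplication by $M \in \mathbb{R}^{k \times d}$ in the Euclidean tangent space, and projecting back to $\mathbb{B}_c^k$ via the exponential map at $0$, i.e.:

$$M \otimes_c x = \exp_0^c\left(M \log_0^c(x)\right). \tag{5}$$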
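The exponential map, logarithmic map and Möbius matrix-vector multiplication can be sketched as follows. This is a minimal pure-Python sketch under stated assumptions: vectors are lists of floats, matrices are lists of rows, all points lie strictly inside the ball, and the helper names are chosen here for illustration.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))

def mobius_add(x, y, c=1.0):
    xy, x2, y2 = dot(x, y), dot(x, x), dot(y, y)
    denom = 1 + 2 * c * xy + c * c * x2 * y2
    a = (1 + 2 * c * xy + c * y2) / denom
    b = (1 - c * x2) / denom
    return [a * xi + b * yi for xi, yi in zip(x, y)]

def conformal_factor(x, c=1.0):
    """lambda_x^c = 2 / (1 - c * ||x||^2)."""
    return 2 / (1 - c * dot(x, x))

def exp_map(x, v, c=1.0):
    """Move from x along tangent vector v, landing back on the ball."""
    n = norm(v)
    if n == 0:
        return list(x)
    lam = conformal_factor(x, c)
    scale = math.tanh(math.sqrt(c) * lam * n / 2) / (math.sqrt(c) * n)
    return mobius_add(x, [scale * vi for vi in v], c)

def log_map(x, y, c=1.0):
    """Tangent vector at x that points towards y (inverse of exp_map)."""
    u = mobius_add([-xi for xi in x], y, c)
    n = norm(u)
    if n == 0:
        return [0.0] * len(x)
    lam = conformal_factor(x, c)
    scale = 2 / (math.sqrt(c) * lam) * math.atanh(math.sqrt(c) * n) / n
    return [scale * ui for ui in u]

def mobius_matvec(M, x, c=1.0):
    """exp_0(M log_0(x)): multiply in the tangent space at the origin."""
    tangent = log_map([0.0] * len(x), x, c)
    transformed = [dot(row, tangent) for row in M]
    return exp_map([0.0] * len(transformed), transformed, c)
```

As a quick check with `c = 1`, stretching `[0.5, 0.0]` by the matrix `2I` gives `[0.8, 0.0]` rather than the Euclidean `[1.0, 0.0]`, so the result stays inside the unit ball.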


3 Related work

3.1 Hyperbolic geometry

Embedding hierarchical data in hyperbolic space has recently gained popularity in representation learning. Nickel and Kiela (2017) first embedded the transitive closure of the WordNet noun hierarchy in the Poincaré ball (in the transitive closure, each node in a directed graph is connected not only to its children, but to every descendant, i.e. all nodes to which there exists a directed path from the starting node), showing that low-dimensional hyperbolic embeddings can significantly outperform higher-dimensional Euclidean embeddings in terms of both representation capacity and generalization ability. The same authors subsequently embedded hierarchical data in the Lorentz model of hyperbolic geometry Nickel and Kiela (2018).

Ganea et al. (2018a) introduced Hyperbolic Neural Networks, connecting hyperbolic geometry with deep learning. They build on the definitions for Möbius addition, Möbius scalar multiplication, exponential and logarithmic maps of Ungar (2001) to derive expressions for linear layers, bias translation and application of non-linearity in the Poincaré ball. Hyperbolic analogues of several other algorithms have been developed since, such as Poincaré Glove Tifrea et al. (2019) and Hyperbolic Attention Networks Gulcehre et al. (2019). More recently, Gu et al. (2019) note that data can be non-uniformly hierarchical and learn embeddings on a product manifold with components of different curvature: spherical, hyperbolic and Euclidean. To our knowledge, only Riemannian TransE Suzuki et al. (2019) seeks to embed multi-relational data in hyperbolic space, but the Riemannian translation method fails to outperform Euclidean baselines.

3.2 Link prediction for knowledge graphs

Bilinear models typically represent relations as linear transformations acting on entity vectors. An early model, RESCAL Nickel et al. (2011), optimizes a score function $\phi(e_s, r, e_o) = \mathbf{e}_s^\top \mathbf{M}_r \mathbf{e}_o$, containing the bilinear product between the subject entity embedding $\mathbf{e}_s$, a full rank relation matrix $\mathbf{M}_r$ and the object entity embedding $\mathbf{e}_o$. RESCAL is prone to overfitting due to the number of parameters per relation being quadratic relative to the number per entity. DistMult Yang et al. (2015) is a special case of RESCAL with diagonal relation matrices, reducing parameters per relation and controlling overfitting. However, due to its symmetry, DistMult cannot model asymmetric relations. ComplEx Trouillon et al. (2016) extends DistMult to the complex domain, enabling asymmetry to be modelled. TuckER Balažević et al. (2019) performs a Tucker decomposition of the tensor of triples, which enables information sharing between different relations via the core tensor. The authors show each of the linear models above to be a special case of TuckER.

Translational models regard a relation as a translation (or vector offset) from the subject to the object entity embeddings. These models include TransE Bordes et al. (2013) and its many successors, e.g. FTransE Feng et al. (2016), STransE Nguyen et al. (2016). The score function for translational models typically considers Euclidean distance between the translated subject entity embedding and the object entity embedding.

4 Multi-relational Poincaré embeddings

A set of entities can form different hierarchies under different relations. In the WordNet knowledge graph Miller (1995), the “hypernym”, “has_part” and “member_meronym” relations each induce different hierarchies over the same set of entities. For example, the noun chair is a parent node to different chair types (e.g. folding_chair, armchair) under the relation “hypernym” and both chair and its types are parent nodes to parts of a typical chair (e.g. backrest, leg) under the relation “has_part”. An ideal embedding model should capture all hierarchies simultaneously.

Score function  Bilinear multi-relational models measure similarity between the subject entity embedding (after relation-specific transformation) and an object entity embedding using the Euclidean inner product Nickel et al. (2011); Yang et al. (2015); Trouillon et al. (2016); Balažević et al. (2019). However, a clear correspondence to the Euclidean inner product does not exist in hyperbolic space Tifrea et al. (2019). The Euclidean inner product can be expressed as a function of Euclidean distances and norms, i.e. $\langle x, y \rangle = \frac{1}{2}\left(-d_{\mathbb{E}}(x, y)^2 + \|x\|^2 + \|y\|^2\right)$. Noting this, in Poincaré Glove, Tifrea et al. (2019) absorb squared norms into biases and replace the Euclidean with the Poincaré distance to obtain the hyperbolic version of Glove Pennington et al. (2014).

Separately, it has recently been shown in the closely related field of word embeddings that statistics pertaining to analogies naturally contain linear structures Allen and Hospedales (2019), explaining why similar linear structure appears amongst word embeddings of Word2Vec Mikolov et al. (2013a, b); Levy and Goldberg (2014). Analogies are word relationships of the form “$a$ is to $b$ as $c$ is to $d$”, such as “man is to woman as king is to queen”, and are in principle not restricted to two pairs (e.g. “…as brother is to sister”). It can be seen that analogies have much in common with relations in multi-relational graphs, as a difference between pairs of words (or entities) common to all pairs, e.g. if $(e_s, r, e_o)$ and $(e_s', r, e_o')$ hold, then we could say “$e_s$ is to $e_o$ as $e_s'$ is to $e_o'$”. Of particular relevance is the demonstration that the common difference, i.e. relation, between the word pairs (e.g. (man, woman) and (king, queen)) manifests as a common vector offset Allen and Hospedales (2019), justifying the previously heuristic translational approach to modelling relations.

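Inspired by these two ideas, we define the basis score function for multi-relational graph embedding:

$$\phi(e_s, r, e_o) = -d\big(\mathbf{e}_s^{(r)}, \mathbf{e}_o^{(r)}\big)^2 + b_s + b_o = -d\big(\mathbf{R}\mathbf{e}_s, \mathbf{e}_o + \mathbf{r}\big)^2 + b_s + b_o, \tag{6}$$

where $d$ is a distance function, $\mathbf{e}_s, \mathbf{e}_o \in \mathbb{R}^d$ are the embeddings and $b_s, b_o \in \mathbb{R}$ scalar biases of the subject and object entities $e_s$ and $e_o$ respectively. $\mathbf{R} \in \mathbb{R}^{d \times d}$ is a diagonal relation matrix and $\mathbf{r} \in \mathbb{R}^d$ a translation vector (i.e. vector offset) of relation $r$. $\mathbf{e}_s^{(r)} = \mathbf{R}\mathbf{e}_s$ and $\mathbf{e}_o^{(r)} = \mathbf{e}_o + \mathbf{r}$ represent the subject and object entity embeddings after applying the respective relation-specific transformations, a stretch by $\mathbf{R}$ to $\mathbf{e}_s$ and a translation by $\mathbf{r}$ to $\mathbf{e}_o$.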
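With the Euclidean distance plugged in, the basis score function reduces to a few elementwise operations. The sketch below is illustrative only: the function name `basis_score_euclidean` and the representation of the diagonal relation matrix as a list of its diagonal entries are assumptions made here.

```python
def basis_score_euclidean(e_s, e_o, R_diag, r, b_s, b_o):
    """Score = -||R e_s - (e_o + r)||^2 + b_s + b_o.

    R_diag holds the diagonal of the relation matrix, so the stretch
    of the subject embedding is an elementwise product.
    """
    adj_s = [Ri * si for Ri, si in zip(R_diag, e_s)]  # stretched subject
    adj_o = [oi + ri for oi, ri in zip(e_o, r)]       # translated object
    sq_dist = sum((a - b) ** 2 for a, b in zip(adj_s, adj_o))
    return -sq_dist + b_s + b_o
```

For instance, with `e_s = [1, 2]`, `R_diag = [2, 0.5]`, `e_o = [1, 1]` and `r = [1, 0]`, the two relation-adjusted embeddings coincide at `[2, 1]`, so the score is simply the sum of the biases.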

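Hyperbolic model  Taking the hyperbolic analogue of Equation 6, we define the score function for our Multi-Relational Poincaré (MuRP) model as:

$$\phi_{\mathrm{MuRP}}(e_s, r, e_o) = -d_{\mathbb{B}}\big(\mathbf{h}_s^{(r)}, \mathbf{h}_o^{(r)}\big)^2 + b_s + b_o = -d_{\mathbb{B}}\big(\exp_0^c(\mathbf{R} \log_0^c(\mathbf{h}_s)),\ \mathbf{h}_o \oplus_c \mathbf{r}_h\big)^2 + b_s + b_o, \tag{7}$$

where $\mathbf{h}_s, \mathbf{h}_o \in \mathbb{B}_c^d$ are hyperbolic embeddings of the subject and object entities $e_s$ and $e_o$ respectively, and $\mathbf{r}_h \in \mathbb{B}_c^d$ is a hyperbolic translation vector of relation $r$. The relation-adjusted subject entity embedding $\mathbf{h}_s^{(r)}$ is obtained by Möbius matrix-vector multiplication: the original subject entity embedding $\mathbf{h}_s$ is projected to the tangent space of the Poincaré ball at $\mathbf{0}$ with $\log_0^c$, transformed by the diagonal relation matrix $\mathbf{R} \in \mathbb{R}^{d \times d}$, and then projected back to the Poincaré ball by $\exp_0^c$. The relation-adjusted object entity embedding $\mathbf{h}_o^{(r)}$ is obtained by Möbius addition of the relation vector $\mathbf{r}_h$ to the object entity embedding $\mathbf{h}_o$. Since the relation matrix $\mathbf{R}$ is diagonal, the number of parameters of MuRP increases linearly with the number of entities and relations, making it scalable to large knowledge graphs. To obtain the predicted probability of a fact being true, we apply the logistic sigmoid to the score, i.e. $\sigma(\phi_{\mathrm{MuRP}}(e_s, r, e_o))$.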
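The MuRP score can be sketched end to end in plain Python. This is a hedged sketch rather than the released PyTorch code: the function names, the list-based vectors, the diagonal stored as `R_diag` and the clamping constant are all assumptions made for illustration.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))

def mobius_add(x, y, c=1.0):
    xy, x2, y2 = dot(x, y), dot(x, x), dot(y, y)
    denom = 1 + 2 * c * xy + c * c * x2 * y2
    a = (1 + 2 * c * xy + c * y2) / denom
    b = (1 - c * x2) / denom
    return [a * xi + b * yi for xi, yi in zip(x, y)]

def exp0(v, c=1.0):
    """Exponential map at the origin (where the conformal factor is 2)."""
    n = norm(v)
    if n == 0:
        return list(v)
    s = math.tanh(math.sqrt(c) * n) / (math.sqrt(c) * n)
    return [s * vi for vi in v]

def log0(y, c=1.0):
    """Logarithmic map at the origin."""
    n = norm(y)
    if n == 0:
        return list(y)
    s = math.atanh(math.sqrt(c) * n) / (math.sqrt(c) * n)
    return [s * yi for yi in y]

def poincare_distance(x, y, c=1.0):
    diff = norm(mobius_add([-xi for xi in x], y, c))
    arg = min(math.sqrt(c) * diff, 1 - 1e-15)  # keep atanh finite
    return 2 / math.sqrt(c) * math.atanh(arg)

def murp_score(h_s, h_o, R_diag, r_h, b_s, b_o, c=1.0):
    """-d_B(exp_0(R log_0(h_s)), h_o (+) r_h)^2 + b_s + b_o."""
    tangent = log0(h_s, c)
    adj_s = exp0([Ri * ti for Ri, ti in zip(R_diag, tangent)], c)
    adj_o = mobius_add(h_o, r_h, c)
    return -poincare_distance(adj_s, adj_o, c) ** 2 + b_s + b_o

def sigmoid(s):
    """Logistic sigmoid, turning a score into a predicted probability."""
    return 1 / (1 + math.exp(-s))
```

A predicted probability is then `sigmoid(murp_score(...))`; when the two relation-adjusted embeddings coincide, the score equals `b_s + b_o`.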

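To directly compare the properties of hyperbolic embeddings with the Euclidean, we implement the Euclidean version of Equation 6 with $d\big(\mathbf{e}_s^{(r)}, \mathbf{e}_o^{(r)}\big) = d_{\mathbb{E}}\big(\mathbf{e}_s^{(r)}, \mathbf{e}_o^{(r)}\big)$. We refer to this model as the Multi-Relational Euclidean (MuRE) model.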

Geometric intuition  We see from Equation 6 that the biases $b_s, b_o$ determine the radius of a hypersphere decision boundary centered at $\mathbf{e}_s^{(r)}$. Entities $e_s$ and $e_o$ are predicted to be related by $r$ if relation-adjusted $\mathbf{e}_o^{(r)}$ falls within a hypersphere of radius $\sqrt{b_s + b_o}$ (see Figure 1(b)). Since biases are subject and object entity-specific, each subject-object pair induces a different decision boundary. The relation-specific parameters $\mathbf{R}$ and $\mathbf{r}$ determine the position of the relation-adjusted embeddings, but the radius of the entity-specific decision boundary is independent of the relation. The score function in Equation 6 resembles the score functions of existing translational models Bordes et al. (2013); Feng et al. (2016); Nguyen et al. (2016), with the main difference being the entity-specific biases, which can be seen to change the geometry of the model. Rather than considering an entity as a point in space, each bias defines an entity-specific sphere of influence surrounding the center given by the embedding vector (see Figure 1(c)). The overlap between spheres measures relatedness between entities. We can thus think of each relation as moving the spheres of influence in space, so that only the spheres of subject and object entities that are connected under that relation overlap.
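The radius of the decision boundary can be read off from the score function in a few steps. The derivation below assumes (this threshold is not stated explicitly in the text) that a triple is predicted true when its sigmoid probability exceeds $1/2$, and that $b_s + b_o > 0$:

```latex
\begin{align*}
\sigma\big(\phi(e_s, r, e_o)\big) > \tfrac{1}{2}
&\iff \phi(e_s, r, e_o) > 0 \\
&\iff -d\big(\mathbf{e}_s^{(r)}, \mathbf{e}_o^{(r)}\big)^2 + b_s + b_o > 0 \\
&\iff d\big(\mathbf{e}_s^{(r)}, \mathbf{e}_o^{(r)}\big) < \sqrt{b_s + b_o}.
\end{align*}
```

If $b_s + b_o \le 0$ the inequality has no solution, so under this reading no object would be predicted as related for that pair of biases.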

4.1 Training and Riemannian optimization

To train both models, we generate $k$ negative samples for each true triple $(e_s, r, e_o)$, where we corrupt either the subject or the object entity with a randomly chosen entity from the set of all entities $\mathcal{E}$. Both models are trained to minimize the Bernoulli negative log-likelihood loss:

$$\mathcal{L}(y, p) = -\frac{1}{N} \sum_{i=1}^{N} \left( y^{(i)} \log\big(p^{(i)}\big) + \big(1 - y^{(i)}\big) \log\big(1 - p^{(i)}\big) \right), \tag{8}$$

where $p$ is the predicted probability, $y$ is the binary label indicating whether a sample is positive or negative and $N$ is the number of training samples.
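Negative sampling and the loss can be sketched as follows. This is an illustrative sketch: the function names are hypothetical, the 50/50 choice between corrupting the subject and the object is an assumption (the text only says that either one is corrupted), and the `eps` clamp is added here for numerical safety.

```python
import math
import random

def corrupt(triple, entities, k, rng):
    """Return k negative samples by replacing the subject or the object
    of a true triple with a randomly chosen entity."""
    s, r, o = triple
    negatives = []
    for _ in range(k):
        e = rng.choice(entities)
        if rng.random() < 0.5:          # assumed: corrupt each side equally often
            negatives.append((e, r, o))
        else:
            negatives.append((s, r, e))
    return negatives

def bernoulli_nll(labels, probs, eps=1e-12):
    """Mean Bernoulli negative log-likelihood over N samples."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)   # avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(labels)
```

A typical use would be `corrupt(("lion", "hypernym", "mammal"), entities, k, random.Random(0))`, labelling the true triple 1 and each corrupted triple 0 before computing the loss. Note that a randomly corrupted triple may occasionally coincide with a true fact; this sketch does not filter such cases.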

For fairness of comparison, we optimize the Euclidean model using stochastic gradient descent (SGD) and the hyperbolic model using Riemannian stochastic gradient descent (RSGD) Bonnabel (2013). We note that the Riemannian equivalent of adaptive optimization methods has recently been developed Bécigneul and Ganea (2019), but leave replacing SGD and RSGD with their adaptive equivalent to future work. To compute the Riemannian gradient $\nabla_R \mathcal{L}$, the Euclidean gradient $\nabla_E \mathcal{L}$ is multiplied by the inverse of the Poincaré metric tensor:

$$\nabla_R \mathcal{L} = \frac{1}{(\lambda_\theta^c)^2} \nabla_E \mathcal{L}. \tag{9}$$

Instead of the Euclidean update step $\theta \leftarrow \theta - \eta \nabla_R \mathcal{L}$, a first order approximation of the true Riemannian update, we use the exponential map at $\theta$ to project the gradient $\nabla_R \mathcal{L}$ onto its corresponding geodesic on the Poincaré ball and compute the Riemannian update:

$$\theta \leftarrow \exp_\theta^c\left(-\eta \nabla_R \mathcal{L}\right), \tag{10}$$

where $\eta$ denotes the learning rate.
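A single RSGD step on one parameter vector can be sketched as below. This is a minimal sketch under stated assumptions: the Euclidean gradient is supplied by the caller, vectors are lists of floats, and the function names are chosen here for illustration rather than taken from the released code.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))

def mobius_add(x, y, c=1.0):
    xy, x2, y2 = dot(x, y), dot(x, x), dot(y, y)
    denom = 1 + 2 * c * xy + c * c * x2 * y2
    a = (1 + 2 * c * xy + c * y2) / denom
    b = (1 - c * x2) / denom
    return [a * xi + b * yi for xi, yi in zip(x, y)]

def exp_map(x, v, c=1.0):
    """Exponential map at x applied to tangent vector v."""
    n = norm(v)
    if n == 0:
        return list(x)
    lam = 2 / (1 - c * dot(x, x))
    scale = math.tanh(math.sqrt(c) * lam * n / 2) / (math.sqrt(c) * n)
    return mobius_add(x, [scale * vi for vi in v], c)

def rsgd_step(theta, euclid_grad, lr, c=1.0):
    """Rescale the Euclidean gradient by the inverse metric tensor,
    then follow the geodesic via the exponential map at theta."""
    lam = 2 / (1 - c * dot(theta, theta))
    riem_grad = [g / lam ** 2 for g in euclid_grad]
    return exp_map(theta, [-lr * g for g in riem_grad], c)
```

Because the update goes through the exponential map, the new point remains inside the ball in exact arithmetic. In floating point, very large steps near the boundary can round to norm 1, so practical implementations often clip the result slightly inside the ball; that safeguard is omitted here.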

5 Experiments

To evaluate both Poincaré and Euclidean models, we first test their performance on the knowledge graph link prediction task using standard WN18RR and FB15k-237 datasets:

FB15k-237 Toutanova et al. (2015) is a subset of Freebase, a database of real world facts, created from FB15k Bordes et al. (2013) by removing the inverse of many relations from validation and test sets to make the dataset more challenging. FB15k-237 contains 14,541 entities and 237 relations.

WN18RR Dettmers et al. (2018) is a subset of WordNet, a hierarchical database of relations between words, created in the same way as FB15k-237 from WN18 Bordes et al. (2013). WN18RR contains 40,943 entities and 11 relations.

We evaluate each triple from the test set as in Bordes et al. (2013): we generate $n_e$ (where $n_e$ denotes number of entities in the dataset) evaluation triples for each test triple by keeping the subject entity $e_s$ and relation $r$ fixed and replacing the object entity $e_o$ with all possible entities $\mathcal{E}$, and similarly keeping $e_o$ and $r$ fixed and varying $e_s$. The scores obtained for each evaluation triple are ranked. All true triples are removed from the evaluation triples apart from the current test triple, i.e. the commonly used filtered setting Bordes et al. (2013). We evaluate our models using the evaluation metrics standard across the link prediction literature: mean reciprocal rank (MRR) and hits@$k$, $k \in \{1, 3, 10\}$. Mean reciprocal rank is the average of the inverse of a mean rank assigned to the true triple over all $n_e$ evaluation triples. Hits@$k$ measures the percentage of times the true triple appears in the top $k$ ranked evaluation triples.
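The filtered ranking protocol and the two metrics can be sketched as follows. The function names and the dictionary-based score representation are assumptions made for illustration, and ties are broken optimistically here (only strictly higher scores push the rank down), which is one of several possible conventions.

```python
def filtered_rank(scores, target, known_true):
    """Rank of the target entity among candidates for one test query.

    scores: dict mapping each candidate entity to its score.
    known_true: other entities that also form true triples for this
    query; they are skipped, as in the filtered setting.
    """
    target_score = scores[target]
    rank = 1
    for entity, score in scores.items():
        if entity == target or entity in known_true:
            continue
        if score > target_score:
            rank += 1
    return rank

def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all test queries."""
    return sum(1 / r for r in ranks) / len(ranks)

def hits_at_k(ranks, k):
    """Fraction of queries whose true entity is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)
```

For example, with scores `{"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.1}`, target `"c"` and `"a"` already known to be true, the filtered rank is 2 instead of the raw rank 3. `hits_at_k` returns a fraction; multiply by 100 to report a percentage.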

5.1 Implementation details

We implement both models in PyTorch and make our code publicly available. We choose the learning rate from by MRR on the validation set and find that the best learning rate is for WN18RR and for FB15k-237 for both models. We initialize all embeddings near the origin where distances are small in hyperbolic space, similar to Nickel and Kiela (2017). We set the batch size to 128 and the number of negative samples to . In all experiments, we set the curvature of MuRP to , since preliminary experiments showed that any material change reduced performance.

5.2 Link prediction results

Table 1 shows the results obtained for both datasets. As expected, MuRE performs slightly better on the non-hierarchical FB15k-237 dataset, whereas MuRP outperforms on WN18RR which contains hierarchical relations (as shown in Section 5.3). Both MuRE and MuRP outperform previous state-of-the-art models on WN18RR on all metrics apart from hits@1, where MuRP obtains the second best overall result. In fact, even at relatively low embedding dimensionality, this is maintained, demonstrating the ability of hyperbolic models to succinctly represent multiple hierarchies. On FB15k-237, MuRE is outperformed only by TuckER Balažević et al. (2019), a model capable of multi-task learning between relations, which is highly advantageous on that dataset due to a large number of relations compared to WN18RR and thus relatively little data per relation in some cases.

WN18RR FB15k-237
MRR Hits@10 Hits@3 Hits@1 MRR Hits@10 Hits@3 Hits@1
TransE Bordes et al. (2013)
DistMult Yang et al. (2015)
ComplEx Trouillon et al. (2016)
Neural LP Yang et al. (2017)
MINERVA Das et al. (2018)
ConvE Dettmers et al. (2018)
ComplEx-N3 Lacroix et al. (2018)
M-Walk Shen et al. (2018)
TuckER Balažević et al. (2019)
RotatE Sun et al. (2019)
Table 1: Link prediction results on WN18RR and FB15k-237. Best results in bold and underlined, second best in bold. We report results for ComplEx-N3 Lacroix et al. (2018) at to ensure comparability with MuRE and MuRP in terms of the overall number of parameters (original paper reports results at ). The RotatE Sun et al. (2019) results are reported without their self-adversarial negative sampling (see Appendix H in the original paper) for fair comparison, given that it is not specific to that model only.

5.3 MuRE vs MuRP

Effect of dimensionality  We compare the MRR achieved by MuRE and MuRP on WN18RR for embeddings of different dimensionalities. As expected, the difference between MRRs is greatest at lower embedding dimensionality (see Figure 2(a)).

Convergence rate  Figure 2(b) shows the MRR per epoch for MuRE and MuRP on the WN18RR training and validation sets, showing that MuRP also converges faster.

(a) MRR per embedding dimensionality.
(b) MRR convergence rate per epoch.
Figure 2: (a) MRR log-log graph for MuRE and MuRP for different embeddings sizes on WN18RR. (b) Comparison of the MRR convergence rate for MuRE and MuRP on the WN18RR training (dashed line) and validation (solid line) sets with embeddings of size and learning rate 50.

Performance per relation  Since not every relation in WN18RR induces a hierarchical structure over the entities, we report the Krackhardt hierarchy score (Khs) Krackhardt (2014) of the entity graph formed by each relation to obtain a measure of the hierarchy induced by each relation. The score is defined only for directed networks and measures the proportion of node pairs $(v_i, v_j)$ where there exists a directed path $v_i \to v_j$, but not $v_j \to v_i$ (see Appendix A for further details). The score takes a value of one for all directed acyclic graphs, and zero for cycles and cliques. We also report the length of the longest path (i.e. tree depth) for hierarchical relations, as both need to be considered. To gain insight as to which relations benefit most from embedding entities in hyperbolic space, we compare Hits@10 per relation of MuRE and MuRP for entity embeddings of low dimensionality. From Table 2 we see that both models achieve comparable performance on non-hierarchical, symmetric relations with the Krackhardt hierarchy score 0, such as “similar_to” and “verb_group”, whereas MuRP generally outperforms MuRE on hierarchical relations. We also see that the difference between the performances of MuRE and MuRP is generally larger for relations that form deeper trees, fitting the hypothesis that hyperbolic space is of most benefit for modelling hierarchical relations.
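The hierarchy score described above can be computed from the edge list of a single relation. The sketch below follows the one-sentence description in the text (Appendix A gives the authors' precise definition); the function names, the restriction to pairs of distinct nodes and the value returned for a graph with no connected pairs are assumptions made here.

```python
def reachable(adj, start):
    """All nodes reachable from start by a directed path of length >= 1."""
    seen = set()
    stack = list(adj.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(adj.get(node, ()))
    return seen

def krackhardt_hierarchy_score(edges):
    """Proportion of connected ordered pairs (u, v), u != v, with a
    directed path u -> v but no directed path v -> u."""
    adj, nodes = {}, set()
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        nodes.update((u, v))
    reach = {n: reachable(adj, n) for n in nodes}
    connected = asymmetric = 0
    for u in nodes:
        for v in reach[u]:
            if v == u:
                continue
            connected += 1
            if u not in reach[v]:
                asymmetric += 1
    # Assumed convention: a graph with no connected pairs scores 0.
    return asymmetric / connected if connected else 0.0
```

A tree such as `[("a", "b"), ("a", "c"), ("b", "d")]` scores 1.0, while a directed cycle such as `[("a", "b"), ("b", "c"), ("c", "a")]` scores 0.0, matching the two extremes mentioned in the text.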

Computing the Krackhardt hierarchy score for FB15k-237, we find that of the relations have , however, the average of longest path lengths over those relations is with only relations having paths longer than 2, meaning that the vast majority of relational sub-graphs consist of directed edges between pairs of nodes, rather than a tree.

Relation Name MuRE MuRP Khs Longest Path
Table 2: Comparison of hits@10 per relation for MuRE and MuRP on WN18RR for .

Biases vs embedding vector norms  We plot the norms versus the biases for MuRP and MuRE in Figure 3. This shows an overall correlation between embedding vector norm and bias (or radius of the sphere of influence) for both MuRE and MuRP. This makes sense intuitively, as the sphere of influence increases to “fill out the space” in regions that are less cluttered, i.e. further from the origin.

Figure 3: Scatter plot of norms vs biases for MuRP (left) and MuRE (right). Entities with larger embedding vector norms generally have larger biases for both MuRE and MuRP.
(a) MuRP
(b) MuRE
Figure 4: Learned 40-dimensional MuRP and MuRE embeddings for WN18RR relation “has_part”, projected to 2 dimensions. indicates the subject entity embedding, indicates true positive object entities predicted by the model, true negatives, false positives and false negatives. Lightly shaded blue and red points indicate object entity embeddings before applying the relation-specific transformation. The line in the left figure indicates the boundary of the Poincaré disk. The supposed false positives predicted by MuRP are actually true facts missing from the dataset (e.g. malaysia).

Spatial layout  In Figure 4, we show a 40-dimensional subject embedding for the word asia and a random subset of 1500 object embeddings for the hierarchical WN18RR relation “has_part”, projected to 2 dimensions so that distances and angles of object entity embeddings relative to the subject entity embedding are preserved (see Appendix B for details of the projection method). We show subject and object entity embeddings before and after relation-specific transformation. For both MuRE and MuRP, we see that applying the relation-specific transformation separates true object entities from false ones. However, in the Poincaré model, where distances increase further from the origin, embeddings are moved further towards the boundary of the disk, where, loosely speaking, there is more space to separate and therefore distinguish them.

Quality of learned embeddings  Here we analyze the false positives and false negatives predicted by both models. MuRP predicts 15 false positives and 0 false negatives, whereas MuRE predicts only 2 false positives and 1 false negative, so seemingly performs better. However, inspecting the false positives predicted by MuRP, we find they are all countries on the Asian continent (e.g. sri_lanka, palestine, malaysia, sakartvelo, thailand), so are actually correct, but missing from the dataset. MuRE’s predicted false positives (philippines and singapore) are both also correct but missing, whereas the false negative (bahrain) is indeed falsely predicted. We note that this suggests current evaluation methods may be unreliable.

6 Conclusion and future work

We introduce a novel, theoretically inspired, translational method for embedding multi-relational graph data in the Poincaré ball model of hyperbolic geometry. Our multi-relational Poincaré model MuRP learns relation-specific parameters to transform entity embeddings by Möbius matrix-vector multiplication and Möbius addition. We show that MuRP outperforms its Euclidean counterpart MuRE and existing models on the link prediction task on the hierarchical WN18RR knowledge graph dataset, and requires far lower dimensionality to achieve comparable performance to its Euclidean analogue. We analyze various properties of the Poincaré model compared to its Euclidean analogue and provide insight through a visualization of the learned embeddings.

Future work may include investigating the impact of recently introduced Riemannian adaptive optimization methods compared to Riemannian SGD. Also, given not all relations in a knowledge graph are hierarchical, we may look into combining the Euclidean and hyperbolic models to produce mixed-curvature embeddings that best fit the curvature of the data.


Acknowledgements

We thank Rik Sarkar, Ivan Titov, Jonathan Mallinson and Eryk Kopczyński for helpful comments on this manuscript. Ivana Balažević and Carl Allen were supported by the Centre for Doctoral Training in Data Science, funded by EPSRC (grant EP/L016427/1) and the University of Edinburgh.


References

  • Allen and Hospedales [2019] Carl Allen and Timothy Hospedales. Analogies Explained: Towards Understanding Word Embeddings. In International Conference on Machine Learning, 2019.
  • Balažević et al. [2019] Ivana Balažević, Carl Allen, and Timothy M Hospedales. TuckER: Tensor Factorization for Knowledge Graph Completion. arXiv preprint arXiv:1901.09590, 2019.
  • Bécigneul and Ganea [2019] Gary Bécigneul and Octavian-Eugen Ganea. Riemannian Adaptive Optimization Methods. In International Conference on Learning Representations, 2019.
  • Bonnabel [2013] Silvere Bonnabel. Stochastic Gradient Descent on Riemannian Manifolds. IEEE Transactions on Automatic Control, 2013.
  • Bordes et al. [2013] Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. Translating Embeddings for Modeling Multi-relational Data. In Advances in Neural Information Processing Systems, 2013.
  • Cannon et al. [1997] James W Cannon, William J Floyd, Richard Kenyon, Walter R Parry, et al. Hyperbolic Geometry. Flavors of Geometry, 31:59–115, 1997.
  • Das et al. [2018] Rajarshi Das, Shehzaad Dhuliawala, Manzil Zaheer, Luke Vilnis, Ishan Durugkar, Akshay Krishnamurthy, Alex Smola, and Andrew McCallum. Go for a Walk and Arrive at the Answer: Reasoning over Paths in Knowledge Bases Using Reinforcement Learning. In International Conference on Learning Representations, 2018.
  • De Sa et al. [2018] Christopher De Sa, Albert Gu, Christopher Ré, and Frederic Sala. Representation Tradeoffs for Hyperbolic Embeddings. In International Conference on Machine Learning, 2018.
  • Dettmers et al. [2018] Tim Dettmers, Pasquale Minervini, Pontus Stenetorp, and Sebastian Riedel. Convolutional 2D Knowledge Graph Embeddings. In Association for the Advancement of Artificial Intelligence, 2018.
  • Feng et al. [2016] Jun Feng, Minlie Huang, Mingdong Wang, Mantong Zhou, Yu Hao, and Xiaoyan Zhu. Knowledge Graph Embedding by Flexible Translation. In KR, pages 557–560, 2016.
  • Ganea et al. [2018a] Octavian Ganea, Gary Bécigneul, and Thomas Hofmann. Hyperbolic Neural Networks. In Advances in Neural Information Processing Systems, 2018a.
  • Ganea et al. [2018b] Octavian-Eugen Ganea, Gary Bécigneul, and Thomas Hofmann. Hyperbolic Entailment Cones for Learning Hierarchical Embeddings. In International Conference on Machine Learning, 2018b.
  • Gu et al. [2019] Albert Gu, Frederic Sala, Beliz Gunel, and Christopher Ré. Learning Mixed-Curvature Representations in Product Spaces. In International Conference on Learning Representations, 2019.
  • Gulcehre et al. [2019] Caglar Gulcehre, Misha Denil, Mateusz Malinowski, Ali Razavi, Razvan Pascanu, Karl Moritz Hermann, Peter Battaglia, Victor Bapst, David Raposo, Adam Santoro, and Nando de Freitas. Hyperbolic Attention Networks. In International Conference on Learning Representations, 2019.
  • Krackhardt [2014] David Krackhardt. Graph Theoretical Dimensions of Informal Organizations. In Computational organization theory. Psychology Press, 2014.
  • Lacroix et al. [2018] Timothée Lacroix, Nicolas Usunier, and Guillaume Obozinski. Canonical Tensor Decomposition for Knowledge Base Completion. In International Conference on Machine Learning, 2018.
  • Levy and Goldberg [2014] Omer Levy and Yoav Goldberg. Linguistic Regularities in Sparse and Explicit Word Representations. In Conference on Computational Natural Language Learning, 2014.
  • Mikolov et al. [2013a] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. Distributed Representations of Words and Phrases and their Compositionality. In Advances in Neural Information Processing Systems, 2013a.
  • Mikolov et al. [2013b] Tomas Mikolov, Wen-tau Yih, and Geoffrey Zweig. Linguistic Regularities in Continuous Space Word Representations. In Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2013b.
  • Miller [1995] George A Miller. WordNet: a Lexical Database for English. Communications of the ACM, 1995.
  • Nguyen et al. [2016] Dat Quoc Nguyen, Kairit Sirts, Lizhen Qu, and Mark Johnson. STransE: a Novel Embedding Model of Entities and Relationships in Knowledge Bases. In Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2016.
  • Nickel et al. [2011] Maximilian Nickel, Volker Tresp, and Hans-Peter Kriegel. A Three-Way Model for Collective Learning on Multi-Relational Data. In International Conference on Machine Learning, 2011.
  • Nickel and Kiela [2017] Maximillian Nickel and Douwe Kiela. Poincaré Embeddings For Learning Hierarchical Representations. In Advances in Neural Information Processing Systems, 2017.
  • Nickel and Kiela [2018] Maximillian Nickel and Douwe Kiela. Learning Continuous Hierarchies in the Lorentz Model of Hyperbolic Geometry. In International Conference on Machine Learning, 2018.
  • Pennington et al. [2014] Jeffrey Pennington, Richard Socher, and Christopher Manning. GloVe: Global Vectors for Word Representation. In Empirical Methods in Natural Language Processing, 2014.
  • Sarkar [2011] Rik Sarkar. Low Distortion Delaunay Embedding of Trees in Hyperbolic Plane. In International Symposium on Graph Drawing, 2011.
  • Shen et al. [2018] Yelong Shen, Jianshu Chen, Po-Sen Huang, Yuqing Guo, and Jianfeng Gao. M-Walk: Learning to Walk over Graphs using Monte Carlo Tree Search. In Advances in Neural Information Processing Systems, 2018.
  • Sun et al. [2019] Zhiqing Sun, Zhi-Hong Deng, Jian-Yun Nie, and Jian Tang. RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space. In International Conference on Learning Representations, 2019.
  • Suzuki et al. [2019] Atsushi Suzuki, Yosuke Enokida, and Kenji Yamanishi. Riemannian TransE: Multi-relational Graph Embedding in Non-Euclidean Space, 2019.
  • Tifrea et al. [2019] Alexandru Tifrea, Gary Bécigneul, and Octavian-Eugen Ganea. Poincaré GloVe: Hyperbolic Word Embeddings. In International Conference on Learning Representations, 2019.
  • Toutanova et al. [2015] Kristina Toutanova, Danqi Chen, Patrick Pantel, Hoifung Poon, Pallavi Choudhury, and Michael Gamon. Representing Text for Joint Embedding of Text and Knowledge Bases. In Empirical Methods in Natural Language Processing, 2015.
  • Trouillon et al. [2016] Théo Trouillon, Johannes Welbl, Sebastian Riedel, Éric Gaussier, and Guillaume Bouchard. Complex Embeddings for Simple Link Prediction. In International Conference on Machine Learning, 2016.
  • Ungar [2001] Abraham A Ungar. Hyperbolic Trigonometry and its Application in the Poincaré Ball Model of Hyperbolic Geometry. Computers & Mathematics with Applications, 41(1-2):135–147, 2001.
  • Yang et al. [2015] Bishan Yang, Wen-tau Yih, Xiaodong He, Jianfeng Gao, and Li Deng. Embedding Entities and Relations for Learning and Inference in Knowledge Bases. In International Conference on Learning Representations, 2015.
  • Yang et al. [2017] Fan Yang, Zhilin Yang, and William W Cohen. Differentiable Learning of Logical Rules for Knowledge Base Reasoning. In Advances in Neural Information Processing Systems, 2017.

Appendix A Krackhardt hierarchy score

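Let $R \in \mathbb{R}^{n \times n}$ be the binary reachability matrix of a directed graph $G$ with $n$ nodes, with $R_{i,j} = 1$ if there exists a directed path from node $i$ to node $j$ and $R_{i,j} = 0$ otherwise. The Krackhardt hierarchy score of $G$ (Krackhardt [2014]) is defined as:

$$\mathrm{Khs}_G = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} \mathbb{1}(R_{i,j} = 1 \wedge R_{j,i} = 0)}{\sum_{i=1}^{n} \sum_{j=1}^{n} \mathbb{1}(R_{i,j} = 1)}$$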
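
As a rough illustration of the score described above, the following sketch computes the fraction of reachable ordered node pairs that are not reachable in the reverse direction. It is not taken from the paper's code: the function names, the adjacency-matrix input, the use of Warshall's transitive closure to obtain the reachability matrix, and the choice to return 0 for a graph with no reachable pairs are all assumptions made for this example.

```python
import numpy as np


def reachability_matrix(adjacency):
    """Binary reachability matrix via Warshall's transitive closure.

    `adjacency[i][j]` is truthy if there is a directed edge i -> j.
    (Hypothetical helper; not part of the original paper.)
    """
    reach = np.asarray(adjacency).astype(bool)  # astype returns a copy
    for k in range(reach.shape[0]):
        # i reaches j if it already does, or if i reaches k and k reaches j.
        reach |= reach[:, [k]] & reach[[k], :]
    return reach


def krackhardt_hierarchy_score(adjacency):
    """Share of reachable ordered pairs (i, j) where j cannot reach i."""
    reach = reachability_matrix(adjacency)
    reachable = int(reach.sum())
    if reachable == 0:
        return 0.0  # assumption: score of a graph with no paths is set to 0
    one_way = int((reach & ~reach.T).sum())
    return one_way / reachable


# A directed chain 0 -> 1 -> 2 and a two-node cycle 0 <-> 1.
chain = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
cycle = [[0, 1], [1, 0]]
chain_score = krackhardt_hierarchy_score(chain)
cycle_score = krackhardt_hierarchy_score(cycle)
```

Under these assumptions, a purely one-directional graph such as the chain scores 1 and a fully reciprocated graph such as the cycle scores 0. Nodes that lie on a cycle can reach themselves, so they contribute diagonal entries to the denominator because the sum runs over all index pairs.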


Appendix B Dimensionality reduction method

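To project high-dimensional embeddings to 2 dimensions for visualization purposes, we use the following method to compute dimensions $x$ and $y$ for the projection $\mathbf{e}'_i$ of entity $\mathbf{e}_i$:

  • $e'_{i,x} = \frac{\mathbf{e}_s^\top}{\|\mathbf{e}_s\|} \mathbf{e}_i$, $i \in \{s, o_1, \dots, o_{N_o}\}$, where $\mathbf{e}_s$ is the original high-dimensional subject entity embedding and $N_o$ is the number of object entity embeddings.

  • $e'_{i,y} = \sqrt{\|\mathbf{e}_i\|^2 - (e'_{i,x})^2}$.

This projects the reference subject entity embedding onto the $x$-axis ($\mathbf{e}'_s = (\|\mathbf{e}_s\|, 0)$) and all object entity embeddings are positioned relative to it, according to their $x$-component aligned with the subject entity and their “remaining” $y$-component.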
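
The projection described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function name `project_to_2d`, the argument names, and the clamping of small negative values before the square root (to guard against floating-point round-off) are assumptions made for this example.

```python
import numpy as np


def project_to_2d(subject, entities):
    """Project embeddings to 2D relative to a reference subject embedding.

    x: component of each embedding along the subject direction.
    y: norm of the "remaining" component orthogonal to that direction.
    (Hypothetical helper; names are not from the original paper.)
    """
    subject = np.asarray(subject, dtype=float)
    entities = np.atleast_2d(np.asarray(entities, dtype=float))
    unit_subject = subject / np.linalg.norm(subject)
    x = entities @ unit_subject
    squared_norms = np.sum(entities ** 2, axis=1)
    # Clamp at zero: round-off can make the difference slightly negative.
    y = np.sqrt(np.maximum(squared_norms - x ** 2, 0.0))
    return np.stack([x, y], axis=1)


# The subject itself plus two embeddings orthogonal to it.
e_s = np.array([3.0, 4.0, 0.0])
points = project_to_2d(e_s, [e_s, [0.0, 0.0, 2.0], [4.0, -3.0, 0.0]])
```

In this sketch the subject lands on the x-axis at a distance equal to its norm, and each projected point keeps the norm of its original embedding. Because the y-coordinate is a norm, it is never negative, so all projected points fall in the upper half-plane.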