Hyperbolic Disk Embeddings for Directed Acyclic Graphs

02/12/2019 ∙ by Ryota Suzuki, et al.

Obtaining continuous representations of structural data such as directed acyclic graphs (DAGs) has gained attention in machine learning and artificial intelligence. However, embedding complex DAGs in which the numbers of both ancestors and descendants of nodes grow exponentially is difficult. To tackle this problem, we develop Disk Embeddings, a framework for embedding DAGs into quasi-metric spaces. Existing state-of-the-art methods, Order Embeddings and Hyperbolic Entailment Cones, are instances of Disk Embeddings in Euclidean space and on spheres, respectively. Furthermore, we propose a novel method, Hyperbolic Disk Embeddings, to handle exponential growth of relations. The results of our experiments show that our Disk Embedding models outperform existing methods, especially on complex DAGs other than trees.


1 Introduction

Methods for obtaining appropriate feature representations of symbolic objects are currently a major concern in machine learning and artificial intelligence. Studies exploiting embeddings for highly diverse domains or tasks have recently been reported for graphs Grover & Leskovec (2016); Goyal & Ferrara (2018); Cai et al. (2018), knowledge bases Nickel et al. (2011); Bordes et al. (2013); Wang et al. (2017), and social networks Hoff et al. (2002); Cui et al. (2018).

In particular, studies aiming to embed linguistic instances into geometric spaces have led to substantial innovations in natural language processing (NLP) Mikolov et al. (2013); Pennington et al. (2014); Kiros et al. (2015). Currently, word embedding methods are indispensable for NLP tasks, because it has been shown that the use of pre-learned word embeddings for model initialization leads to improvements in a variety of tasks Kim (2014). In these methods, symbolic objects are embedded in low-dimensional Euclidean vector spaces such that the symmetric distances between semantically similar concepts are small.

In this paper, we focus on modeling asymmetric transitive relations and directed acyclic graphs (DAGs). Hierarchies are well-studied asymmetric transitive relations that can be expressed using a tree structure. The tree structure has a single root node, and the number of children increases exponentially according to the distance from the root. Asymmetric relations that do not exhibit such properties cannot be expressed as a tree structure, but they can be represented as DAGs, which are used extensively to model dependencies between objects and information flow. For example, in genealogy, a family tree can be interpreted as a DAG, where an edge represents a parent-child relationship and a node denotes a family member Kirkpatrick (2011). Similarly, the commit objects of a distributed revision system (e.g., Git) also form a DAG (see https://git-scm.com/docs/user-manual). In citation networks, citation graphs can be regarded as DAGs, with a document as a node and the citation relationship between documents as an edge Price (1965). In causality, DAGs have been used to control confounding in epidemiological studies Robins (1987); Merchant & Pitiphat (2002) and as a means to study the formal understanding of causation Spirtes et al. (2000); Pearl (2003); Dawid (2010).

Recently, a few methods for embedding asymmetric transitive relations have been proposed. Nickel & Kiela reported pioneering research on embedding symbolic objects in hyperbolic spaces rather than Euclidean spaces and proposed Poincaré Embeddings Nickel & Kiela (2017, 2018). This approach is based on the fact that hyperbolic spaces can embed any weighted tree while largely preserving its metric Gromov (1987); Bowditch (2005); Sarkar (2011). Vendrov et al. developed Order Embeddings, which attempt to embed the partially ordered structure of a hierarchy in Euclidean spaces Vendrov et al. (2016). This method embeds relations among symbolic objects in the reversed product order in Euclidean spaces, which is a type of partial ordering. Objects are represented as inclusive relations of orthants in Euclidean spaces. Furthermore, inheriting from Poincaré Embeddings and Order Embeddings, Ganea et al. proposed Hyperbolic Entailment Cones, which embed symbolic objects as convex cones in hyperbolic spaces Ganea et al. (2018). The authors used polar coordinates of the Poincaré ball and designed their cones such that the number of descendants increases exponentially as one moves away from the origin. Although the researchers reported that their approach can handle any DAG, their method is not suitable for embedding complex graphs in which the numbers of both ancestors and descendants grow exponentially, and their experiments were conducted using only hierarchical graphs. Recently, Dong et al. suggested a concept of embedding hierarchies into $n$-disks (balls) in Euclidean space Dong et al. (2019). However, their method cannot be applied to generic DAGs, as it is a bottom-up algorithm that can only be applied to trees, which were previously known to be expressible by disks in a Euclidean plane Stapleton et al. (2011).

In this paper, we propose Disk Embedding, a general framework for embedding DAGs in metric spaces, or more generally, quasi-metric spaces. Disk Embedding can be considered as a generalization of the aforementioned state-of-the-art methods: Order Embedding Vendrov et al. (2016) and Hyperbolic Entailment Cones Ganea et al. (2018) are equivalent to Disk Embedding in Euclidean spaces and on spheres, respectively. Moreover, extending this framework to a hyperbolic geometry, we propose Hyperbolic Disk Embedding. Because this method maintains the translational symmetry of hyperbolic spaces and uses exponential growth as a function of the radius, it can process data with an exponential increase in both ancestors and descendants. Furthermore, we construct a learning theory for general Disk Embedding using the Riemannian stochastic gradient descent (RSGD) and derive closed-form expressions of the RSGD for frequently used geometries, including Euclidean, spherical, and hyperbolic geometries.

Experimentally, we demonstrate that our Disk Embedding models outperform all of the baseline methods, especially for DAGs other than trees. We compared our approach against three baseline methods: Poincaré Embeddings Nickel & Kiela (2017), Order Embeddings Vendrov et al. (2016), and Hyperbolic Entailment Cones Ganea et al. (2018).

Our contributions are as follows:

  • To embed DAGs in metric spaces, we propose a general framework, Disk Embedding, and systemize its learning method.

  • We prove that the existing state-of-the-art methods can be interpreted as special cases of Disk Embedding.

  • We propose Hyperbolic Disk Embedding by extending our Disk Embedding to graphs with bi-directional exponential growth.

  • Through experiments, we confirm that Disk Embedding models outperform the existing methods, particularly for general DAGs.

2 Mathematical preliminaries

2.1 Partially ordered sets

As discussed by Nickel & Kiela (2018), a concept hierarchy forms a partially ordered set (poset), which is a set $X$ equipped with a reflexive, transitive, and antisymmetric binary relation $\preceq$. We extend this idea for application to a general DAG. Considering the reachability from one node to another in a DAG, we obtain a partial ordering.

Partial ordering is essentially equivalent to the inclusive relation between certain subsets called lower cones (or lower closures), $C_{\preceq}(x) = \{y \in X \mid y \preceq x\}$.

Proposition 1

Let $(X, \preceq)$ be a poset. Then, $x \preceq y$ holds if and only if $C_{\preceq}(x) \subseteq C_{\preceq}(y)$.
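As a small illustration of Proposition 1, the following sketch builds the reachability order of a toy DAG and checks the statement by brute force. The graph, the function names, and the convention that descendants are the smaller elements are assumptions made for this example only.

```python
from itertools import product

def reachability(nodes, edges):
    """Map each node to the set containing itself and all of its descendants."""
    children = {u: set() for u in nodes}
    for parent, child in edges:
        children[parent].add(child)
    reach = {}
    for start in nodes:
        seen, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for v in children[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        reach[start] = seen
    return reach

# Toy DAG (hypothetical): edges point from parent to child; "d" has two parents.
nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]

# Lower cone C(x) = {y : y precedes x}, assuming descendants are the smaller elements.
lower_cone = reachability(nodes, edges)

def precedes(x, y):
    """x precedes y when x is y itself or a descendant of y."""
    return x in lower_cone[y]

# Proposition 1: x precedes y exactly when C(x) is a subset of C(y).
prop1_holds = all(
    precedes(x, y) == (lower_cone[x] <= lower_cone[y])
    for x, y in product(nodes, repeat=2)
)
```

Because "d" has two parents, this graph is not a tree, yet the equivalence between the order and cone inclusion still holds.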

Embedding DAGs in continuous space can be interpreted as mapping DAG nodes to lower cones with a certain volume that contain their descendants.

Order isomorphism. A pair of posets $(X, \preceq)$ and $(X', \preceq')$ is order isomorphic if there exists a bijection $f\colon X \to X'$ that preserves the ordering, i.e., $x \preceq y \Leftrightarrow f(x) \preceq' f(y)$. We further consider that two embedded expressions of a DAG are equivalent if they are order isomorphic.

2.2 Metric and quasi-metric spaces

A metric space is a set $X$ in which a metric function $d\colon X \times X \to \mathbb{R}$ satisfying the following four properties is defined: non-negativity: $d(x, y) \ge 0$; identity of indiscernibles: $d(x, y) = 0 \Leftrightarrow x = y$; subadditivity: $d(x, y) + d(y, z) \ge d(x, z)$; symmetry: $d(x, y) = d(y, x)$.

For generalization, we consider a quasi-metric as a natural extension of a metric that satisfies the above properties except for the symmetry Goubault-Larrecq (2013a, b). In a Euclidean space $\mathbb{R}^n$, we can obtain various quasi-metrics as follows:

Proposition 2 (Polyhedral quasi-metric)

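Let $W = \{w_i\}$ be a finite set of vectors in $\mathbb{R}^n$ such that $\mathrm{coni}(W) = \mathbb{R}^n$, where $\mathrm{coni}(W)$ denotes the conical hull of $W$. Let

$$d_W(x, y) = \max_i \left\{ w_i^\top (x - y) \right\}. \qquad (1)$$

Then $d_W$ is a quasi-metric in $\mathbb{R}^n$.

The assumption $\mathrm{coni}(W) = \mathbb{R}^n$ in Prop. 2 is equivalent to the condition that the convex polytope spanned by $W$ contains the origin in its interior. The shape of the disk $\{p \mid d_W(p, x) \le r\}$ for fixed $x$ and $r$ is a polytope whose facets are perpendicular to the vectors of $W$. For instance, let $\{e_k\}$ be the standard basis and $W = \{\pm e_k\}$; then, $d_W$ becomes the uniform distance $d_\infty(x, y) = \max_k |x_k - y_k|$, whose disks form hypercubes.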
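The polyhedral quasi-metric can be sketched in a few lines, assuming the form $d_W(x, y) = \max_i w_i^\top (x - y)$. The two vector sets below are illustrative choices: the signed standard basis, which gives the uniform distance, and three unit vectors at 120 degrees, which give a genuinely asymmetric quasi-metric.

```python
import math

def polyhedral_quasi_metric(W):
    """Return d_W(x, y) = max_i w_i . (x - y) for a finite list of vectors W."""
    def d_W(x, y):
        diff = [xk - yk for xk, yk in zip(x, y)]
        return max(sum(wk * dk for wk, dk in zip(w, diff)) for w in W)
    return d_W

# W = {+e_k, -e_k} in the plane: d_W is the uniform (Chebyshev) distance.
W_cube = [(1, 0), (-1, 0), (0, 1), (0, -1)]
d_cube = polyhedral_quasi_metric(W_cube)

# Three unit vectors at 120 degrees: they positively span the plane,
# but the set is not closed under negation, so d_W is not symmetric.
angles = [math.pi / 2 + k * 2 * math.pi / 3 for k in range(3)]
W_tri = [(math.cos(a), math.sin(a)) for a in angles]
d_tri = polyhedral_quasi_metric(W_tri)

forward = d_tri((0.0, 1.0), (0.0, 0.0))   # approximately 1.0
backward = d_tri((0.0, 0.0), (0.0, 1.0))  # approximately 0.5
```

With the triangular set, the disks are triangles, and the distance from one point to another differs from the distance back.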

2.3 Formal disks

Let $(X, d)$ be a quasi-metric space and consider a closed disk $D(x, r) = \{p \in X \mid d(p, x) \le r\}$, which is a closed set in the sense of the topology induced by $d$. In a Euclidean space with the ordinary Euclidean distance, the inclusive relation of two closed disks is characterized as follows:

$$D(x, r) \subseteq D(y, s) \iff \|x - y\| \le s - r. \qquad (2)$$

Because this relation is a set inclusion relation, it forms a poset, as discussed above.

In a general metric space, we introduce formal disks (balls), which were first introduced by Weihrauch & Schreiber and studied as a computational model of metric spaces Blanck (1997); Edalat & Heckmann (1998); Heckmann (1999). (This notion is generally called a formal ball, but we call it a formal disk in this paper, even in high-dimensional spaces, for clarity.) Formal disks are also naturally extended to quasi-metric spaces Goubault-Larrecq (2013b). Let $(X, d)$ be a quasi-metric space, where a formal disk is defined as a pair $(x, r) \in X \times \mathbb{R}_+$ of a center $x$ and a radius $r$. The binary relation $\preceq$ of formal disks such that

$$(x, r) \preceq (y, s) \iff d(x, y) \le s - r \qquad (3)$$

is a partial ordering Edalat & Heckmann (1998). When defining a partial order with (3), it is straightforward to see that the radii of formal disks need not be positive.

We define $B(X) = X \times \mathbb{R}$ as the collection of generalized formal disks Tsuiki & Hattori; Goubault-Larrecq (2013a) that are allowed to have a negative radius. Lower cones of formal disks are shown in Figure 1. The cut of the lower cone at $r = 0$ is a closed disk in $X$. The negative radius of a generalized formal disk can be regarded as the radius of the cut of the upper cone.

The following properties hold for generalized formal disks.

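Translational symmetry.

For all $a \in \mathbb{R}$,

$$(x, r) \preceq (y, s) \iff (x, r + a) \preceq (y, s + a). \qquad (4)$$

Reversibility.

If $d$ is symmetric (i.e., $d$ is a metric),

$$(x, r) \preceq (y, s) \iff (y, -s) \preceq (x, -r). \qquad (5)$$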

These properties reflect the reversibility of the graph and symmetry between generations, which are important when embedding DAGs in posets of formal disks.
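The two properties can be checked numerically with a minimal sketch. It assumes the ordering $(x, r) \preceq (y, s) \Leftrightarrow d(x, y) \le s - r$ with the Euclidean distance; the helper names `shift` and `reverse` and the example disks are hypothetical.

```python
import math

def precedes(disk_a, disk_b, d=math.dist):
    """(x, r) precedes (y, s) when d(x, y) <= s - r (assumed ordering)."""
    (x, r), (y, s) = disk_a, disk_b
    return d(x, y) <= s - r

def shift(disk, a):
    """Translate a generalized formal disk along the radial axis."""
    x, r = disk
    return (x, r + a)

def reverse(disk):
    """Flip the sign of the radius; this swaps the roles of ancestor and descendant."""
    x, r = disk
    return (x, -r)

child = ((1.0, 0.0), 0.5)
parent = ((0.0, 0.0), 2.0)

# Translational symmetry: shifting both radii by the same amount keeps the order,
# even when the shifted radii become negative.
shifted_ok = precedes(shift(child, -3.0), shift(parent, -3.0))

# Reversibility: negating both radii reverses the order (d is symmetric here).
reversed_ok = precedes(reverse(parent), reverse(child))
```

Both checks rely only on the difference of radii, which is why negative radii cause no difficulty.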

Figure 1: Lower cones (solid blue lines) of generalized formal disks in . holds and has a negative radius .

2.4 Riemannian manifold

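A Riemannian manifold $(M, g)$ is a manifold $M$ with a collection of inner products $g_x\colon T_xM \times T_xM \to \mathbb{R}$, called a Riemannian metric. Let $\gamma\colon [0, 1] \to M$ be a smooth curve, where the length of $\gamma$ is calculated by:

$$\ell(\gamma) = \int_0^1 \sqrt{g_{\gamma(t)}\bigl(\gamma'(t), \gamma'(t)\bigr)}\, dt.$$

The infimum of the curve length, $d_M(x, y) = \inf_\gamma \ell(\gamma)$, taken over curves from $x$ to $y$, becomes a metric (and, consequently, a quasi-metric) on $M$.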

Geodesic: If a Riemannian manifold $M$ is complete, for every two points $x$ and $y$, there exists a curve that connects $x$ and $y$ with a minimal length, which is called a geodesic. Geodesics are Riemannian analogs of straight lines in Euclidean spaces.

Exponential map: An exponential map $\exp_x\colon T_xM \to M$ is defined as $\exp_x(v) = \gamma(1)$, where $\gamma$ is the unique geodesic satisfying $\gamma(0) = x$ with an initial tangent vector $\gamma'(0) = v$. The map can also be interpreted as the destination reached after a unit time when traveling along the geodesic from $x$ at an initial velocity of $v$. Therefore, $d_M(x, \exp_x(v)) = \|v\|_x$ holds as long as this geodesic is length-minimizing, where $\|v\|_x = \sqrt{g_x(v, v)}$.

2.5 Hyperbolic geometry

A hyperbolic geometry is a uniquely characterized complete, simply connected geometry with constant negative curvature. Hyperbolic geometries are described by several models that are identical up to isometry. The frequently used models in machine learning include the Poincaré ball model Nickel & Kiela (2017); Ganea et al. (2018) and the Lorentz model Nickel & Kiela (2018).

Poincaré ball model

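$\mathbb{D}^n$ is a Riemannian manifold defined on the interior of an $n$-dimensional unit ball, where distances are calculated as

$$d_{\mathbb{D}}(x, y) = \operatorname{arcosh}\left(1 + \frac{2\|x - y\|^2}{(1 - \|x\|^2)(1 - \|y\|^2)}\right) \qquad (6)$$

and its geodesics are arcs that perpendicularly intersect the boundary sphere.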

Lorentz model

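$\mathbb{L}^n$ is defined on an $n$-dimensional hyperboloid embedded in $(n+1)$-dimensional Euclidean space, where the distance is calculated as

$$d_{\mathbb{L}}(x, y) = \operatorname{arcosh}\left(-\langle x, y \rangle_{\mathbb{L}}\right), \qquad (7)$$

where $\langle x, y \rangle_{\mathbb{L}} = -x_0 y_0 + \sum_{k=1}^{n} x_k y_k$.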
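To make the statement that the two models are identical up to isometry concrete, the sketch below implements the standard distance formulas of both models and the usual map from the ball to the hyperboloid. The function names are chosen for this example, and the test points are arbitrary.

```python
import math

def poincare_distance(x, y):
    """Distance in the Poincare ball model (points with Euclidean norm < 1)."""
    sq = lambda v: sum(c * c for c in v)
    diff = sq([a - b for a, b in zip(x, y)])
    return math.acosh(1.0 + 2.0 * diff / ((1.0 - sq(x)) * (1.0 - sq(y))))

def lorentz_inner(u, v):
    """Lorentzian inner product: -u0*v0 + sum_k uk*vk."""
    return -u[0] * v[0] + sum(a * b for a, b in zip(u[1:], v[1:]))

def lorentz_distance(u, v):
    """Distance on the hyperboloid; the clamp guards against rounding below 1."""
    return math.acosh(max(1.0, -lorentz_inner(u, v)))

def ball_to_hyperboloid(p):
    """Standard isometry from the Poincare ball to the hyperboloid."""
    s = sum(c * c for c in p)
    return [(1.0 + s) / (1.0 - s)] + [2.0 * c / (1.0 - s) for c in p]

p, q = (0.1, 0.2), (-0.3, 0.4)
d_ball = poincare_distance(p, q)
d_hyperboloid = lorentz_distance(ball_to_hyperboloid(p), ball_to_hyperboloid(q))
```

The two values agree up to floating-point error, so either model can be used for computing distances.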

Translational symmetry

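Hyperbolic geometry is symmetric with respect to translations, i.e., geometric properties including distances do not change even if all points are translated at the same time. In the Poincaré ball model, a translation that maps the origin to $y$ is obtained as

$$x \mapsto \frac{\bigl(1 + 2\langle x, y\rangle + \|x\|^2\bigr)\, y + \bigl(1 - \|y\|^2\bigr)\, x}{1 + 2\langle x, y\rangle + \|x\|^2 \|y\|^2}. \qquad (8)$$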
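Translational symmetry can be checked numerically. The sketch below assumes that the translation is given by the standard Möbius addition on the unit ball, which maps the origin to $y$; the function names and test points are illustrative.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def poincare_distance(x, y):
    diff = [a - b for a, b in zip(x, y)]
    return math.acosh(1.0 + 2.0 * dot(diff, diff) / ((1.0 - dot(x, x)) * (1.0 - dot(y, y))))

def translate(y, x):
    """Moebius translation of x by y in the unit ball; sends the origin to y."""
    xy, xx, yy = dot(x, y), dot(x, x), dot(y, y)
    denom = 1.0 + 2.0 * xy + xx * yy
    return [((1.0 + 2.0 * xy + xx) * yk + (1.0 - yy) * xk) / denom
            for xk, yk in zip(x, y)]

y = [0.5, -0.2]                      # translation target for the origin
a, b = [0.1, 0.2], [-0.3, 0.4]       # two arbitrary points in the ball

before = poincare_distance(a, b)
after = poincare_distance(translate(y, a), translate(y, b))
origin_image = translate(y, [0.0, 0.0])
```

The distance between the two points is unchanged by the translation, which is the property that makes every point of the space a valid origin.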

3 Disk Embedding models

In this section, we introduce Disk Embeddings as a general platform for embedding DAGs in quasi-metric spaces. We first define Disk Embeddings and evaluate their representability. Second, we introduce Hyperbolic Disk Embeddings and discuss their qualitative nature. Third, we derive loss functions and an optimization method to obtain Disk Embeddings, and finally we give closed-form expressions for some commonly used geometries.

3.1 Disk Embeddings and its representability

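Let $C = \{c_i\}$ be a set of entities with a partial ordering relation $\preceq$, and let $(X, d)$ be a quasi-metric space. Disk Embeddings are defined as a map $c_i \mapsto (x_i, r_i)$ from entities to generalized formal disks of $X$ such that $c_i \preceq c_j$ if and only if $(x_i, r_i) \preceq (x_j, r_j)$.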

We can use various types of quasi-metric spaces $(X, d)$, where $X$ can represent spherical or hyperbolic spaces, and $d$ can be a unique quasi-metric. Assuming that $X$ is an ordinary 2D Euclidean plane and that formal disks have positive radii, Disk Embeddings become an inclusive relation of closed disks, which is equivalent to an Euler diagram with circles, except that only inclusions are meaningful and intersections carry no meaning (Figure 2). Stapleton et al. studied the drawability of 2D Euler diagrams and demonstrated that a certain class of diagrams, termed pierced diagrams, can be drawn with circles Stapleton et al. (2011). This result suggests that our Disk Embeddings will be effective for certain classes of problems, even with only three dimensions (two for centers and one for radii). As Dong et al. mentioned, a tree structure can obviously be embedded in $n$-disks because it is the simplest pierced diagram, in which none of the circles intersect each other Dong et al. (2019). In higher-dimensional Euclidean spaces, Disk Embeddings have a greater representability than in 2D because all graphs that are representable in 2D can also be embedded in higher dimensions via the natural injection that pads the center with zeros, $((x_1, x_2), r) \mapsto ((x_1, x_2, 0, \dots, 0), r)$.
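As a concrete instance of the definition, the following sketch places a four-node tree in the Euclidean plane as nested circles and verifies that disk inclusion reproduces the ancestor relation exactly. The node names, coordinates, and radii are hand-picked for illustration and are not learned.

```python
import math
from itertools import product

# Hypothetical hand-made embedding: name -> (center, radius).
disks = {
    "root": ((0.0, 0.0), 4.0),
    "a": ((-2.0, 0.0), 1.5),
    "b": ((2.0, 0.0), 1.5),
    "a1": ((-2.0, 0.5), 0.5),
}
parent = {"a": "root", "b": "root", "a1": "a"}

def is_descendant_or_self(u, v):
    """True when u equals v or v lies on the path from u up to the root."""
    while u is not None:
        if u == v:
            return True
        u = parent.get(u)
    return False

def disk_precedes(u, v):
    """Formal disk order, assumed as d(x_u, x_v) <= r_v - r_u."""
    (x, r), (y, s) = disks[u], disks[v]
    return math.dist(x, y) <= s - r

# The embedding is faithful when both relations agree on every ordered pair.
embedding_is_faithful = all(
    is_descendant_or_self(u, v) == disk_precedes(u, v)
    for u, v in product(disks, repeat=2)
)
```

The sibling disks "a" and "b" touch neither each other's interior nor each other's order, which corresponds to the remark that only inclusions are meaningful.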

3.2 Hyperbolic Disk Embeddings

We now introduce Disk Embeddings for a hyperbolic geometry. In a hyperbolic geometry of two or more dimensions, it is known that the area of a disk increases as an exponential function of the radius. Thus, the lower cone section, shown in Figure 1, increases exponentially as the generation moves from parent to child in the DAG. In addition, it should be noted that both the inner and outer sides of the lower cones become wider. For instance, when , the region also increases exponentially. This property of Hyperbolic Disk Embeddings is suitable for embedding graphs in which the number of descendants increases rapidly.

Considering the reversibility of Disk Embeddings (5), the above property is established for not only the descendant direction but also the ancestor direction. In other words, not only lower cones but also upper cones show an exponential extension. In addition, considering the translational symmetry in hyperbolic spaces (8) and along the radial axis (4), the same result holds for lower and upper cones starting from any node in the graph. Thus, in Hyperbolic Disk Embeddings, complex threads that repeatedly intersect and separate from each other in a complex DAG can be accurately expressed.

Figure 2: Disk Embeddings in 2D Euclidean space with non-negative radii are equivalent to 2D Euler diagrams of circles.

3.3 Loss functions

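We define a protrusion parameter between two formal disks $(x_i, r_i)$ and $(x_j, r_j)$ as

$$l_{ij} = d(x_i, x_j) + r_i - r_j. \qquad (9)$$

Because $(x_i, r_i) \preceq (x_j, r_j)$ implies $l_{ij} \le 0$, an appropriate representation can be obtained by learning such that $l_{ij}$ is small for positive examples and large for negative examples.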

Although various loss functions can be used, we adopt the following margin loss in this study, similar to that adopted by Vendrov et al. (2016); Ganea et al. (2018):

(10)

where and is an arbitrary energy function such that if and only if . Naturally, we can simply set

(11)

in which case, the loss function corresponds to the loss functions used by Vendrov et al. (2016) and Ganea et al. (2018), except that the gradient of them vanishes at in negative samples (see Section 3.4).
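A margin loss of this kind can be sketched as follows. This is an illustration under stated assumptions rather than the exact loss of the paper: the protrusion is taken as $d(x_i, x_j) + r_i - r_j$, the energy is set equal to the protrusion, positives are penalized by a hinge on the energy, and negatives by a hinge on the margin minus the energy. The margin value and the example disks are hypothetical.

```python
import math

def protrusion(disk_i, disk_j, d=math.dist):
    """Assumed form: d(x_i, x_j) + r_i - r_j; non-positive when disk_i precedes disk_j."""
    (xi, ri), (xj, rj) = disk_i, disk_j
    return d(xi, xj) + ri - rj

def margin_loss(disks, positives, negatives, margin=1.0):
    """Hinge on the protrusion for positives, hinge on (margin - protrusion) for negatives."""
    hinge = lambda v: max(0.0, v)
    pos = sum(hinge(protrusion(disks[i], disks[j])) for i, j in positives)
    neg = sum(hinge(margin - protrusion(disks[i], disks[j])) for i, j in negatives)
    return pos + neg

disks = {
    "animal": ((0.0, 0.0), 2.0),
    "dog": ((1.0, 0.0), 0.5),
    "car": ((5.0, 0.0), 0.5),
}

good = margin_loss(disks, positives=[("dog", "animal")], negatives=[("car", "animal")])
violated_positive = margin_loss(disks, positives=[("car", "animal")], negatives=[])
violated_negative = margin_loss(disks, positives=[], negatives=[("dog", "animal")])
```

Because the energy is not clipped at zero in this sketch, a negative pair that lies strictly inside the parent disk still contributes a non-zero loss, here 1.5, and therefore a non-zero gradient.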

3.4 Riemannian optimization

Given that is a general Riemannian manifold, we must account for the Riemannian metric when optimizing the loss function. In this case, we utilize the RSGD method Bonnabel (2013), which is similar to the ordinary SGD except that the Riemannian gradient is used instead of the gradient and an exponential map is used for updates.

To execute the RSGD on , we must determine the Riemannian metric in the product space and calculate the corresponding Riemannian gradient. To maintain translational symmetry Eq.(4) along the radial axis, the Riemannian gradient on should have the following form

(12)

where is the Riemannian gradient operator on and and are positive constants. Furthermore, multiplying and by a common positive factor corresponds to changing the learning rate ; thus, we can set without loss of generality. Then the update formula of RSGD for parameters is given as follows:

(13)

Although this is not always the case for general Disk Embeddings, if the quasi-metric for formal disks is equal to the distance induced from the Riemannian metric in , the gradient of has special form:

(14)

where and are the initial and final tangent vectors of the unit speed geodesic connecting from to . In addition, for frequently used geometries, we here present closed form expressions as follows.

Euclidean geometry.

The RSGD in Euclidean spaces is equivalent to the standard SGD, where the gradient of the distance is $\nabla_x \|x - y\| = (x - y) / \|x - y\|$ and the exponential map is $\exp_x(v) = x + v$.
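For the Euclidean case, one descent step on the protrusion of a violating positive pair can be sketched as below. The sketch assumes the protrusion $d(x_i, x_j) + r_i - r_j$, equal weighting of the center and radius gradients, distinct centers, and a hypothetical learning rate.

```python
import math

def protrusion(xi, ri, xj, rj):
    """Assumed form: Euclidean distance between centers plus r_i minus r_j."""
    return math.dist(xi, xj) + ri - rj

def sgd_step_positive(xi, ri, xj, rj, lr=0.1):
    """One gradient descent step that lowers the protrusion of a positive pair.

    Assumes xi != xj so that the unit direction is well defined.
    """
    dist = math.dist(xi, xj)
    unit = [(a - b) / dist for a, b in zip(xi, xj)]   # gradient of d w.r.t. x_i
    new_xi = [a - lr * u for a, u in zip(xi, unit)]   # exp map is plain addition
    new_xj = [b + lr * u for b, u in zip(xj, unit)]   # gradient w.r.t. x_j is -unit
    return new_xi, ri - lr, new_xj, rj + lr           # d/dr_i = +1, d/dr_j = -1

xi, ri, xj, rj = [3.0, 0.0], 1.0, [0.0, 0.0], 1.0
before = protrusion(xi, ri, xj, rj)
after = protrusion(*sgd_step_positive(xi, ri, xj, rj))
```

Each of the four parameters contributes a decrease equal to the learning rate, so the protrusion drops from 3.0 to about 2.6 in this example.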

Spherical geometry.

The RSGD on an $n$-sphere with the spherical distance is conducted using the following formulae,
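As a hedged sketch, the standard expressions for the unit sphere are $d(x, y) = \arccos\langle x, y\rangle$, the Riemannian gradient $-(y - \langle x, y\rangle x)/\sqrt{1 - \langle x, y\rangle^2}$, and $\exp_x(v) = \cos(\|v\|)\,x + \sin(\|v\|)\,v/\|v\|$. The code below uses these textbook forms, which are assumed here rather than taken from the source, with illustrative function names.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sphere_distance(x, y):
    """Great-circle distance between unit vectors; the clamp guards against rounding."""
    return math.acos(max(-1.0, min(1.0, dot(x, y))))

def sphere_distance_grad(x, y):
    """Riemannian gradient of d(x, y) with respect to x (assumes x != +y and x != -y)."""
    c = dot(x, y)
    s = math.sqrt(1.0 - c * c)
    return [-(yk - c * xk) / s for xk, yk in zip(x, y)]

def sphere_exp(x, v):
    """Exponential map on the unit sphere for a tangent vector v at x."""
    n = math.sqrt(dot(v, v))
    if n == 0.0:
        return list(x)
    return [math.cos(n) * xk + math.sin(n) * vk / n for xk, vk in zip(x, v)]

x, y = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]
grad = sphere_distance_grad(x, y)
x_new = sphere_exp(x, [-0.1 * g for g in grad])   # one descent step of size 0.1
```

Since the gradient is a unit tangent vector pointing away from $y$, a step of size 0.1 against it shortens the distance by 0.1 while keeping the point on the sphere.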

Hyperbolic geometry.

We use the Lorentz model Nickel & Kiela (2018) for Hyperbolic Disk Embeddings because the RSGD in the Lorentz model involves considerably simpler formulae and has a greater numerical stability compared to the Poincaré ball model. In the Lorentz model, the gradient of the distance and the exponential map are computed as follows,
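A minimal sketch of the corresponding step in the Lorentz model follows. It uses the standard forms, assumed here rather than copied from the source: the Riemannian gradient of the distance is $-(y + \langle x, y\rangle_{\mathbb{L}}\, x)/\sqrt{\langle x, y\rangle_{\mathbb{L}}^2 - 1}$ and the exponential map is $\exp_x(v) = \cosh(\|v\|_{\mathbb{L}})\,x + \sinh(\|v\|_{\mathbb{L}})\,v/\|v\|_{\mathbb{L}}$.

```python
import math

def lorentz_inner(u, v):
    """Lorentzian inner product: -u0*v0 + sum_k uk*vk."""
    return -u[0] * v[0] + sum(a * b for a, b in zip(u[1:], v[1:]))

def lorentz_distance(x, y):
    return math.acosh(max(1.0, -lorentz_inner(x, y)))

def lorentz_distance_grad(x, y):
    """Riemannian gradient of d(x, y) with respect to x (assumes x != y)."""
    c = lorentz_inner(x, y)              # c <= -1 for points on the hyperboloid
    s = math.sqrt(c * c - 1.0)
    return [-(yk + c * xk) / s for xk, yk in zip(x, y)]

def lorentz_exp(x, v):
    """Exponential map on the hyperboloid for a tangent vector v at x."""
    n = math.sqrt(max(0.0, lorentz_inner(v, v)))
    if n == 0.0:
        return list(x)
    return [math.cosh(n) * xk + math.sinh(n) * vk / n for xk, vk in zip(x, v)]

x = [1.0, 0.0, 0.0]                                  # base point of the hyperboloid
y = [math.cosh(1.0), math.sinh(1.0), 0.0]            # at hyperbolic distance 1 from x
grad = lorentz_distance_grad(x, y)
x_new = lorentz_exp(x, [-0.25 * g for g in grad])    # one descent step of size 0.25
```

The update involves only hyperbolic functions of the step length and no division by a quantity that vanishes near a boundary, which is consistent with the simplicity and stability argument given for choosing this model.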

4 Equivalence of Disk Embedding models


Figure 3: Equivalence of Disk Embeddings and existing methods. (a) Order Embeddings Vendrov et al. (2016) and (b) Hyperbolic Entailment Cones Ganea et al. (2018).

In this section, we illustrate the relationship between our Disk Embeddings and the current state-of-the-art methods. We demonstrate that the embedding methods for DAGs in metric spaces are equivalent to Disk Embeddings by projecting lower cones into appropriate subspaces.

4.1 Order Embeddings Vendrov et al. (2016)

Vendrov et al. proposed Order Embeddings, in which a partial ordering relation is embedded in the reversed product order on $\mathbb{R}^n$,

$$x \preceq y \iff x_i \ge y_i \ \text{ for all } i. \qquad (15)$$

In Order Embeddings, the shape of the lower cone is an orthant.

As shown in Figure 3(a), we consider a projection onto a hyperplane $H$ that is isometric to $\mathbb{R}^{n-1}$. The shape of the cross-section of the lower cone with $H$ is an $(n-1)$-simplex (a 1-simplex is a line segment, a 2-simplex is a regular triangle, a 3-simplex is a regular tetrahedron, and so on). Thus, we can consider the relation (15) as an inclusive relation between the corresponding simplexes. By using a polyhedral quasi-metric (1), we show that this is equivalent to Disk Embeddings in $H$ with an additional constraint that forces all entities to be descendants of the origin.

Theorem 1

Order Embeddings (15) is order isomorphic to Euclidean Disk Embeddings with quasi-metric via a smooth map ,

(16)

where is a projection matrix onto , and .

The energy function used for their loss function is

$$E(x, y) = \left\| \max(0,\, y - x) \right\|^2, \qquad (17)$$

which cannot be directly expressed using the protrusion parameter defined in (9), but can be well approximated by a lower bound.
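The relation and energy of Order Embeddings can be written out directly, following the formulation of Vendrov et al. (2016): the reversed product order and the squared norm of the coordinate-wise violation. The function names and sample vectors are illustrative.

```python
def order_precedes(x, y):
    """Reversed product order: x precedes y when every coordinate of x is >= that of y."""
    return all(xk >= yk for xk, yk in zip(x, y))

def order_energy(x, y):
    """Squared norm of max(0, y - x); zero exactly when x precedes y."""
    return sum(max(0.0, yk - xk) ** 2 for xk, yk in zip(x, y))

satisfied = order_energy((3.0, 2.0), (1.0, 2.0))   # relation holds, energy is 0.0
violated = order_energy((1.0, 2.0), (3.0, 1.0))    # first coordinate violates by 2
```

Because the energy is quadratic in the violation, its gradient shrinks as the violation approaches zero and is exactly zero whenever the relation holds.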

Theorem 2

Energy function (17) has a lower bound:

(18)

and the equality holds if and only if .

In contrast to (11), (18) has a quadratic form, which may cause exponential decay instead of crossing zero and a vanishing gradient in even for negative samples, making the optimization inefficient.

4.2 Hyperbolic Entailment Cones Ganea et al. (2018)

Ganea et al. (2018) developed a method for embedding cones extending in the radial direction of the Poincaré ball model. The embedding relation is expressed as follows

$$y \preceq x \iff \Xi(x, y) \le \psi(x), \qquad (19)$$

where $\Xi(x, y)$ is the angle $\pi - \angle Oxy$ and $\psi(x)$ is the angle between the axis and the generatrix.

The authors focused on the polar coordinates of the Poincaré ball, in which rotational symmetry around the origin is assumed and the position in the hierarchical structure is mapped to the radial component. Thus, they implicitly assumed a non-trivial separation of the hyperbolic space into $S^{n-1} \times \mathbb{R}_+$. To illustrate this point, we consider projections of entailment cones onto the border of the Poincaré ball, $\partial\mathbb{D}^n$. As shown in Figure 3(b), the projections of the entailment cones form disks in an $(n-1)$-sphere, and relation (19) is represented as the inclusion of the corresponding disks.

Theorem 3

Hyperbolic Cones (19) are order isomorphic to Disk Embeddings on -sphere via a smooth map: ,

(20)

where .

Their energy function is given as

$$E(x, y) = \max\bigl(0,\ \Xi(x, y) - \psi(x)\bigr) \qquad (21)$$

and is also represented in the format of Disk Embeddings, by using only distances between centers and the radii of formal disks.

Theorem 4

Energy function (21) has the following form:

(22)

where

(23)

and .

As can be easily seen from (22), the energy is linearly approximated by:

(24)

around for fixed . Equation (24) is similar to (11) except that the coefficient and gradient vanishing at as observed for (17).

4.3 Vector Embedding models

Ordinary embedding models based on similarity, e.g., word2vec Mikolov et al. (2013) and Poincaré Embeddings Nickel & Kiela (2017, 2018), can be seen as an application of Disk Embeddings in which radius information is neglected.

Nickel & Kiela argued that when embedding in hyperbolic spaces, general concepts can be placed closer to the origin by learning with a loss function based on distances between points. However, because a hyperbolic space has translational symmetry, there is no special point and any point can be the origin. Thus, simultaneously translating all of the points does not change the loss function, and which node is closer to the origin is determined only by the initial vectors. Furthermore, an approach in which distances from the origin are interpreted as levels in a hierarchy is not suitable for complex DAGs in which both ancestors and descendants grow exponentially, with no single root.

5 Experiments

In this section we evaluate Disk Embedding models for various metric spaces including Euclidean, spherical and hyperbolic geometries.

Dataset Nodes Edges Ancestors Descendants
WordNet 82,115 743,086 9.2 82114.0
Table 1: Dataset statistics. The average number of ancestors (descendants) of leaf (root) nodes are shown.

Column headers give the percentage of transitive closure (non-basic) edges in training.

                                 | Embedding Dimension           | Embedding Dimension
                                 | 0%     10%    25%    50%      | 0%     10%    25%    50%
WordNet nouns
Our Euclidean Disk Embeddings    | 35.6%  38.9%  42.5%  45.1%    | 45.6%  54.0%  65.8%  72.0%
Our Hyperbolic Disk Embeddings   | 32.9%  69.1%  81.3%  83.1%    | 36.5%  79.7%  90.5%  94.2%
Our Spherical Disk Embeddings    | 37.5%  84.8%  90.5%  93.4%    | 42.0%  86.4%  91.5%  93.9%
Hyperbolic Entailment Cones      | 29.2%  80.0%  87.1%  92.8%    | 32.4%  84.9%  90.8%  93.8%
Order Embeddings                 | 34.4%  70.6%  75.9%  82.1%    | 43.0%  69.7%  79.4%  84.1%
Poincaré Embeddings              | 28.1%  69.4%  78.3%  83.9%    | 29.0%  71.5%  82.1%  85.4%
WordNet nouns reversed
Our Euclidean Disk Embeddings    | 35.4%  38.7%  42.3%  44.6%    | 46.6%  55.9%  67.3%  70.6%
Our Hyperbolic Disk Embeddings   | 30.8%  49.0%  66.8%  78.5%    | 32.1%  53.7%  79.1%  88.2%
Our Spherical Disk Embeddings    | 34.8%  59.0%  76.8%  84.9%    | 38.0%  60.6%  83.1%  90.1%
Hyperbolic Entailment Cones      | 17.3%  57.5%  71.8%  75.7%    | 20.5%  61.9%  73.1%  75.8%
Order Embeddings                 | 32.9%  33.8%  34.8%  35.8%    | 34.7%  36.7%  38.8%  41.4%
Poincaré Embeddings              | 26.0%  48.4%  48.8%  51.4%    | 27.4%  49.7%  50.9%  51.9%
Table 2: Test F1 results for various models. Hyperbolic Entailment Cones were proposed by Ganea et al. (2018), Order Embeddings by Vendrov et al. (2016), and Poincaré Embeddings by Nickel & Kiela (2017).

5.1 Datasets

For evaluation we use the following network datasets.

WordNet® Miller (1995) (https://wordnet.princeton.edu/): a large lexical database that provides hypernymy relations. In our experiments, we used the noun closure for evaluating hierarchical data.

The statistics for the dataset are shown in Table 1. The WordNet noun dataset is an example of a tree-like hierarchical network, characterized by a highly limited number of ancestors, whereas the number of descendants is considerably large. We also used data obtained by inverting the relations in the WordNet noun dataset. Because the reverse of a DAG is also a DAG, this data serves as an example of a DAG that is not tree-like.

5.2 Training and evaluation details

We conducted learning by using pairs of nodes $(c_i, c_j)$ connected by edges of these graph data as positive examples such that $c_i \preceq c_j$. Because these data only have positive pairs, we randomly sampled negative pairs for each iteration of RSGD.

For evaluating the learned representation, we use the F1 score for a binary classification of whether a randomly selected pair $(c_i, c_j)$ satisfies the transitive relation $c_i \preceq c_j$, in other words, whether the two nodes are connected by a directed path in the corresponding direction in the DAG.
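The evaluation metric can be sketched as a plain F1 computation over predicted and true pair labels. The function name and the toy labels are hypothetical; in practice the predictions would come from checking the learned order relation on each sampled pair.

```python
def f1_score(predicted, actual):
    """F1 for binary labels given as equal-length sequences of booleans."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)

# Toy example: True means "the pair is predicted / known to satisfy the relation".
predicted = [True, True, True, False, False]
actual = [True, True, False, True, False]
score = f1_score(predicted, actual)   # 2 true positives, 1 false positive, 1 false negative
```

F1 is a natural choice here because negative pairs vastly outnumber positive pairs when node pairs are sampled at random, so plain accuracy would be dominated by the negatives.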

5.3 Baselines

We also evaluated Poincaré Embeddings Nickel & Kiela (2017), Order Embeddings Vendrov et al. (2016), and Hyperbolic Entailment Cones Ganea et al. (2018) as baseline methods. For these baseline methods, we used the implementation reported by Ganea et al. (https://github.com/dalab/hyperbolic_cones). In addition, experimental conditions such as hyperparameters are designed to be nearly identical to those of the experiments conducted by Ganea et al. Considering that Hyperbolic Cones Ganea et al. (2018) use Poincaré Embeddings Nickel & Kiela (2017) for pretraining, we also apply this approach to Spherical Disk Embeddings (which are equivalent to Hyperbolic Cones, as shown in Theorem 3) to make a fair comparison. Although Poincaré Embeddings Nickel & Kiela (2017) is a method used for learning symmetric relations based on similarity, it can also be used to estimate asymmetric relations by using a heuristic score,

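$$\mathrm{score}(\text{is-a}(u, v)) = -\bigl(1 + \alpha(\|v\| - \|u\|)\bigr)\, d(u, v). \qquad (25)$$

The parameter $\alpha$ is determined by maximizing the F1 score on validation data, which are sampled separately from the test data.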

5.4 Results and discussion

Table 2 shows the F1 score on each dataset. As shown in Section 4.2, our Spherical Disk Embeddings and Hyperbolic Entailment Cones are equivalent. Accordingly, our Spherical Disk Embeddings reach almost the same results as Hyperbolic Entailment Cones on WordNet nouns. The slight improvement of our model can be explained by the change of the loss function. WordNet nouns reversed is generated from WordNet by reversing the directions of all edges; it is still a DAG, but no longer tree-like. On this data, our Disk Embedding models clearly outperformed the existing methods because our methods maintain reversibility (Eq. (5)), while the existing methods implicitly assume a hierarchical structure.

6 Conclusion

We introduced Disk Embeddings, a new framework for embedding DAGs in quasi-metric spaces that generalizes the state-of-the-art methods. Furthermore, extending this framework to a hyperbolic geometry, we proposed Hyperbolic Disk Embedding. Experimentally, we demonstrated that our methods outperform all of the baseline methods, especially for DAGs other than trees.

For future work, large-scale experiments on complex DAGs such as citation networks, in which both ancestors and descendants increase rapidly, are desired; in such settings, the exponential nature of Hyperbolic Disk Embedding is expected to show its core strength.

For reproducibility, our source code for the experiments is publicly available online (GitHub URL will be here in camera ready version. For review, see the anonymized version in supplemental materials.). The datasets we used are also available online. See Section 5.1.

References

  • Blanck (1997) Blanck, J. Domain Representability of Metric Spaces. Annals of Pure and Applied Logic, 83(3):225–247, 1997. ISSN 0168-0072. doi: 10.1016/S0168-0072(96)00017-6.
  • Bonnabel (2013) Bonnabel, S. Stochastic Gradient Descent on Riemannian Manifolds. IEEE Transactions on Automatic Control, 58(9):2217–2229, 2013.
  • Bordes et al. (2013) Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., and Yakhnenko, O. Translating Embeddings for Modeling Multi-Relational Data. In Advances in Neural Information Processing Systems 26 (NIPS), pp. 2787–2795, 2013.
  • Bowditch (2005) Bowditch, B. H. A Course on Geometric Group Theory, 2005.
  • Cai et al. (2018) Cai, H., Zheng, V. W., and Chang, K. A Comprehensive Survey of Graph Embedding: Problems, Techniques and Applications. IEEE Transactions on Knowledge and Data Engineering, 2018.
  • Cui et al. (2018) Cui, P., Wang, X., Pei, J., and Zhu, W. A Survey on Network Embedding. IEEE Transactions on Knowledge and Data Engineering, 2018.
  • Dawid (2010) Dawid, A. P. Beware of the DAG! In Proceedings of Workshop on Causality: Objectives and Assessment at NIPS 2008, volume 6, pp. 59–86, 2010.
  • Dong et al. (2019) Dong, T., Cremers, O., Jin, H., Li, J., Bauckhage, C., Cremers, A. B., Speicher, D., and Zimmermann, J. Encoding Category Trees Into Word-Embeddings using Geometric Approach. In Proceedings of the Seventh International Conference on Learning Representations (ICLR), to appear, 2019. URL https://openreview.net/forum?id=rJlWOj0qF7.
  • Edalat & Heckmann (1998) Edalat, A. and Heckmann, R. A Computational Model for Metric Spaces. Theoretical Computer Science, 193(1):53–73, 1998. ISSN 0304-3975. doi: 10.1016/S0304-3975(96)00243-5.
  • Ganea & Hofmann (2017) Ganea, O.-E. and Hofmann, T. Deep Joint Entity Disambiguation with Local Neural Attention. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 2619–2629. Association for Computational Linguistics, 2017.
  • Ganea et al. (2018) Ganea, O.-E., Bécigneul, G., and Hofmann, T. Hyperbolic Entailment Cones for Learning Hierarchical Embeddings. In Proceedings of the 35th International Conference on Machine Learning (ICML), 2018.
  • Goubault-Larrecq (2013a) Goubault-Larrecq, J. Metrics, Quasi-metrics, Hemi-metrics, pp. 203–259. New Mathematical Monographs. Cambridge University Press, 2013a. doi: 10.1017/CBO9781139524438.006.
  • Goubault-Larrecq (2013b) Goubault-Larrecq, J. A Few Pearls in the Theory of Quasi-Metric Spaces. In Proceedings of the 28th Summer Conference on Topology and Its Applications, Nipissing University, North Bay, Ontario, Canada, 2013b.
  • Goyal & Ferrara (2018) Goyal, P. and Ferrara, E. Graph Embedding Techniques, Applications, and Performance: A Survey. Knowledge-Based Systems, 151:78–94, 2018.
  • Gromov (1987) Gromov, M. Hyperbolic Groups. In Essays in Group Theory, pp. 75–263. Springer, 1987.
  • Grover & Leskovec (2016) Grover, A. and Leskovec, J. node2vec: Scalable Feature Learning for Networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 855–864. ACM, 2016.
  • Heckmann (1999) Heckmann, R. Approximation of Metric Spaces by Partial Metric Spaces. Applied Categorical Structures, 7(1):71–83, 1999. ISSN 1572-9095. doi: 10.1023/A:1008684018933.
  • Hoff et al. (2002) Hoff, P. D., Raftery, A. E., and Handcock, M. S. Latent Space Approaches to Social Network Analysis. Journal of the American Statistical Association, 97(460):1090–1098, 2002.
  • Kim (2014) Kim, Y. Convolutional Neural Networks for Sentence Classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1746–1751, 2014.
  • Kirkpatrick (2011) Kirkpatrick, B. B. Haplotypes Versus Genotypes on Pedigrees. Algorithms for Molecular Biology, 6(1):10, 2011.
  • Kiros et al. (2015) Kiros, R., Zhu, Y., Salakhutdinov, R. R., Zemel, R., Urtasun, R., Torralba, A., and Fidler, S. Skip-Thought Vectors. In Advances in Neural Information Processing Systems 28 (NeurIPS), pp. 3294–3302, 2015.
  • Merchant & Pitiphat (2002) Merchant, A. T. and Pitiphat, W. Directed Acyclic Graphs (DAGs): An Aid to Assess Confounding in Dental Research. Community Dentistry and Oral Epidemiology, 30(6):399–404, 2002.
  • Mikolov et al. (2013) Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., and Dean, J. Distributed Representations of Words and Phrases and their Compositionality. In Advances in Neural Information Processing Systems 26 (NeurIPS), pp. 3111–3119, 2013.
  • Miller (1995) Miller, G. A. WordNet: A Lexical Database for English. Communications of the ACM, 38(11):39–41, 1995.
  • Nickel & Kiela (2017) Nickel, M. and Kiela, D. Poincaré Embeddings for Learning Hierarchical Representations. In Advances in Neural Information Processing Systems 30 (NeurIPS), pp. 6338–6347, 2017.
  • Nickel & Kiela (2018) Nickel, M. and Kiela, D. Learning Continuous Hierarchies in the Lorentz Model of Hyperbolic Geometry. In Proceedings of the 35th International Conference on Machine Learning (ICML), 2018.
  • Nickel et al. (2011) Nickel, M., Tresp, V., and Kriegel, H.-P. A Three-Way Model for Collective Learning on Multi-Relational Data. In Proceedings of the 28th International Conference on Machine Learning (ICML), volume 11, pp. 809–816, 2011.
  • Pearl (2003) Pearl, J. Causality: Models, Reasoning, and Inference. Econometric Theory, 19(675-685):46, 2003.
  • Pennington et al. (2014) Pennington, J., Socher, R., and Manning, C. Glove: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543, 2014.
  • Price (1965) Price, D. J. D. S. Networks of Scientific Papers. Science, pp. 510–515, 1965.
  • Robins (1987) Robins, J. A Graphical Approach to the Identification and Estimation of Causal Parameters in Mortality Studies with Sustained Exposure Periods. Journal of Chronic Diseases, 40:139S–161S, 1987.
  • Sarkar (2011) Sarkar, R. Low Distortion Delaunay Embedding of Trees in Hyperbolic Plane. In International Symposium on Graph Drawing, pp. 355–366. Springer, 2011.
  • Spirtes et al. (2000) Spirtes, P., Glymour, C. N., Scheines, R., Heckerman, D., Meek, C., Cooper, G., and Richardson, T. Causation, Prediction, and Search. MIT Press, 2000.
  • Stapleton et al. (2011) Stapleton, G., Zhang, L., Howse, J., and Rodgers, P. Drawing Euler Diagrams with Circles: The Theory of Piercings. IEEE Transactions on Visualization and Computer Graphics, 17(7):1020–1032, 2011. ISSN 1077-2626. doi: 10.1109/TVCG.2010.119.
  • Tsuiki & Hattori (2008) Tsuiki, H. and Hattori, Y. Lawson Topology of the Space of Formal Balls and the Hyperbolic Topology. Theoretical Computer Science, 405(1):198–205, 2008. ISSN 0304-3975. doi: 10.1016/j.tcs.2008.06.034.
  • Vendrov et al. (2016) Vendrov, I., Kiros, R., Fidler, S., and Urtasun, R. Order-Embeddings of Images and Language. In Proceedings of the 4th International Conference on Learning Representations (ICLR), 2016.
  • Wang et al. (2017) Wang, Q., Mao, Z., Wang, B., and Guo, L. Knowledge Graph Embedding: A Survey of Approaches and Applications. IEEE Transactions on Knowledge and Data Engineering, 29(12):2724–2743, 2017.
  • Weihrauch & Schreiber (1981) Weihrauch, K. and Schreiber, U. Embedding Metric Spaces into CPO’s. Theoretical Computer Science, 16(1):5–24, 1981. ISSN 0304-3975. doi: 10.1016/0304-3975(81)90027-X.

Appendix A Proof of Proposition 1

From the definition of , if and only if . Then, we will show that .

This is obvious because holds.

For arbitrary , follows from the definition of . Likewise, follows . Then, holds by transitivity, which implies that .

Appendix B Proof of Proposition 2

B.1 Non-negativity

We will demonstrate this proposition by contradiction. Assume ; then, holds for all . From the assumption , there exists such that . Therefore,

(B.1)

Considering and , leads to a contradiction.

B.2 Identity of indiscernibles

If , holds for all . Considering and in (B.1), we obtain ; then, .

B.3 Subadditivity

Appendix C Proof of Theorem 1

Condition (15) is equivalent to . Thus, we will show that if .

Let ; then,

Here, considering , we find

(C.2)

Appendix D Proof of Theorem 2

By using a uniform norm in (17) instead of a Euclidean norm,

(D.3)

We used (C.2) for the third equation of (D.3). From the inequality between the uniform norm and the Euclidean norm , we find

The equality holds if

i.e.,
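The norm inequality used in this proof can be checked numerically. The following is a minimal sketch, assuming plain Python lists as vectors; the helper names `uniform_norm` and `euclidean_norm` and the example vectors are illustrative and do not come from the paper. It shows that the uniform norm never exceeds the Euclidean norm, and that the two coincide when at most one component is non-zero.

```python
import math


def uniform_norm(v):
    """Uniform (max) norm: the largest absolute component of v."""
    return max(abs(c) for c in v)


def euclidean_norm(v):
    """Euclidean (L2) norm of v."""
    return math.sqrt(sum(c * c for c in v))


# A generic vector: the uniform norm is strictly smaller.
generic = [3.0, -4.0, 1.0]
# A vector with a single non-zero component: the two norms coincide.
axis_aligned = [0.0, -2.5, 0.0]

for v in (generic, axis_aligned):
    print(v, uniform_norm(v), euclidean_norm(v))
```

For the generic vector the gap between the two norms is strict, which is the situation in which the bound in the proof is not tight.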

Appendix E Proof of Theorem 3

We first prove Theorem 4 and then use its results to prove Theorem 3; see Appendix F first.

By eliminating from (F.5) and (F.11), we obtain

from which (20) follows.

The equivalence of ordering (3) and (19) is directly derived from Theorem 4 since

(E.4)

Appendix F Proof of Theorem 4


Figure 4: Hyperbolic Entailment cones.

To obtain (22), we present in Figure 4 as a function of , , and . Let , , and be

and and be

(F.5)
(F.6)

Assume that in the Euclidean triangle ; then, . Thus,

(F.7)

By applying the law of cosines to , it is shown that

(F.8)

By eliminating from (F.7) and (F.8) and substituting (F.5), we have

(F.9)

In addition, from the assumption of Hyperbolic Cones Ganea et al. (2018),

(F.10)

Comparing the right-hand sides of equations (F.9) and (F.10), we have

(F.11)

where .

In the same manner, we have

(F.12)

By substituting (F.11) into (F.10),

(F.13)

Applying the law of sines and the law of cosines to the hyperbolic triangle , we have

(F.14)
(F.15)
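As a sanity check on the hyperbolic law of cosines invoked here, the sketch below verifies the standard identity cosh c = cosh a cosh b − sinh a sinh b cos γ for a triangle in the Poincaré disk with one vertex at the origin. It assumes curvature −1 and uses the fact that the Poincaré model is conformal, so the angle at the origin equals the Euclidean angle. The function names and the sample points are illustrative assumptions, not the paper's notation.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points of the unit Poincare disk (curvature -1)."""
    diff2 = (u[0] - v[0]) ** 2 + (u[1] - v[1]) ** 2
    nu2 = u[0] ** 2 + u[1] ** 2
    nv2 = v[0] ** 2 + v[1] ** 2
    return math.acosh(1.0 + 2.0 * diff2 / ((1.0 - nu2) * (1.0 - nv2)))


def polar(r, theta):
    """Point of the disk at Euclidean radius r and angle theta."""
    return (r * math.cos(theta), r * math.sin(theta))


origin = (0.0, 0.0)
gamma = 0.7                  # angle at the origin between the two sides
p = polar(0.3, 0.2)
q = polar(0.6, 0.2 + gamma)

a = poincare_distance(origin, p)   # side from the origin to p
b = poincare_distance(origin, q)   # side from the origin to q
c = poincare_distance(p, q)        # side opposite the angle gamma

lhs = math.cosh(c)
rhs = math.cosh(a) * math.cosh(b) - math.sinh(a) * math.sinh(b) * math.cos(gamma)
print(lhs, rhs)  # the two values should agree up to rounding error
```

Placing one vertex at the origin keeps the check simple, because geodesics through the origin are straight Euclidean segments; a general triangle would first need to be moved by an isometry of the disk.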

By eliminating from (F.14) and (F.15), and substituting (F.11) and (F.12), it is finally shown that

(F.16)

where


Figure 5: Marginal ReLU loss function.

Appendix G Euclidean Entailment Cones

Like Hyperbolic Cones Ganea et al. (2018), Euclidean Cones can also be regarded as Disk Embeddings. Here, we show that in Euclidean entailment cones is also represented by Rx, Ry, and D.

Let , , and be

(G.17), (G.18), and (G.19) are determined by applying the law of sines to and :

(G.17)
(G.18)
(G.19)

Moreover, for Euclidean entailment cones,

(G.20)
(G.21)

By applying the law of cosines to , we obtain :

(G.23)

We represent as , , , and . By eliminating , , , and from (G.17) to (G.23), it is finally shown that

(G.24)

where and .
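The Euclidean law of sines and law of cosines used in this appendix can be confirmed on a concrete triangle. The sketch below is only an illustration: the vertices `A`, `B`, `C` and the helpers `side_lengths` and `angle_at` are hypothetical and are not the points or symbols of the derivation above. Angles are computed independently from dot products, so the two laws are tested rather than assumed.

```python
import math


def side_lengths(A, B, C):
    """Return (a, b, c): lengths of the sides opposite vertices A, B, and C."""
    return math.dist(B, C), math.dist(A, C), math.dist(A, B)


def angle_at(P, Q, R):
    """Interior angle at vertex P of triangle PQR, computed from the dot product."""
    v = (Q[0] - P[0], Q[1] - P[1])
    w = (R[0] - P[0], R[1] - P[1])
    dot = v[0] * w[0] + v[1] * w[1]
    return math.acos(dot / (math.hypot(*v) * math.hypot(*w)))


A, B, C = (0.0, 0.0), (4.0, 0.0), (1.0, 3.0)
a, b, c = side_lengths(A, B, C)
alpha, beta, gamma = angle_at(A, B, C), angle_at(B, C, A), angle_at(C, A, B)

# Law of sines: the three ratios a/sin(alpha), b/sin(beta), c/sin(gamma) agree.
ratios = (a / math.sin(alpha), b / math.sin(beta), c / math.sin(gamma))
# Law of cosines: c^2 = a^2 + b^2 - 2ab cos(gamma).
c_squared = a * a + b * b - 2.0 * a * b * math.cos(gamma)

print(ratios, c_squared, c * c)
```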

Appendix H Marginal ReLU

Figure 5 illustrates the loss function that we used, given in Eq. (10).
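Since Eq. (10) is not reproduced in this appendix, the following is a minimal sketch of a margin-based ReLU (hinge-style) loss of the kind commonly used for order-preserving embeddings. The exact form is an assumption and may differ from the authors' definition; the function name `margin_relu_loss`, the argument `energy`, and the default `margin` value are all hypothetical.

```python
def relu(x):
    """Rectified linear unit: max(0, x)."""
    return max(0.0, x)


def margin_relu_loss(energy, is_positive, margin=1.0):
    """Assumed hinge-style loss on a scalar energy value.

    Positive pairs are penalized only when the energy is above zero.
    Negative pairs are penalized only when the energy is below the margin.
    """
    if is_positive:
        return relu(energy)
    return relu(margin - energy)


# Positive pair that already satisfies the constraint: no loss.
print(margin_relu_loss(-0.5, is_positive=True))
# Positive pair that violates the constraint: loss grows linearly.
print(margin_relu_loss(0.5, is_positive=True))
# Negative pair that is already separated by more than the margin: no loss.
print(margin_relu_loss(1.5, is_positive=False))
```

Under this assumed form, the loss is piecewise linear with a flat region, so pairs that already satisfy their constraint contribute no gradient.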

Appendix I Loss functions for Hyperbolic Entailment Cones in Disk Embedding format

Figure 6 illustrates the values of the energy function (22) for with fixed .


Figure 6: Values of for with fixed .