Low-rank approximations of hyperbolic embeddings

March 18, 2019 · Pratik Jawanpuria et al.

The hyperbolic manifold is a smooth manifold of constant negative curvature. While the hyperbolic manifold is well studied in the literature, it has recently gained interest in the machine learning and natural language processing communities due to its usefulness in modeling continuous hierarchies. Tasks with hierarchical structures are ubiquitous in those fields, and there is a general interest in learning hyperbolic representations or embeddings for such tasks. Additionally, embeddings of closely related tasks may share a low-dimensional subspace. In this work, we propose to learn hyperbolic embeddings such that they also lie in a low-dimensional subspace. In particular, we consider the problem of learning a low-rank factorization of hyperbolic embeddings. We cast these problems as manifold optimization problems and propose computationally efficient algorithms. Empirical results illustrate the efficacy of the proposed approach.


1 Introduction

Learning hyperbolic representations of entities has gained recent interest in the machine learning community [1, 2, 3, 4, 5]. In particular, hyperbolic embeddings have been shown to be well-suited for various natural language processing problems [3, 6, 7] that require modeling hierarchical structures such as knowledge graphs, hypernymy hierarchies, organization hierarchies, and taxonomies, among others. The reason is that learning representations in the hyperbolic space provides a principled approach for integrating the structural information encoded in such (discrete) entities into a continuous space.

The hyperbolic space is a non-Euclidean space with constant negative curvature. The latter property makes volumes grow exponentially with the radius, even in dimensions as low as two. Hence, the hyperbolic space has been considered for modeling trees and complex networks, among others [8, 9]. Figure 1(a) is an example of representing a part of the mammal taxonomy tree in a hyperbolic space (the two-dimensional Poincaré ball). Hyperbolic embeddings (numerical representations of entities) have been considered in several applications such as question answering systems [10], recommender systems [11, 12], link prediction [13, 14], natural language inference [15], vertex classification [16], and machine translation [17].

In this paper, we consider the setting in which an additional low-rank structure may also exist among the learned hyperbolic embeddings. Such a setting may arise when the hierarchical entities are closely related. We propose to learn a low-rank approximation of the given (high dimensional) hyperbolic embeddings. Conceptually, we model high dimensional hyperbolic embeddings as a product of a low-dimensional subspace and low-dimensional hyperbolic embeddings. The optimization problem is cast on the product of the Stiefel and hyperbolic manifolds. We develop an efficient Riemannian trust-region algorithm for solving it. We evaluate the proposed approach on real-world datasets, on the problem of reconstructing taxonomy hierarchies from the embeddings. We observe that the performance of the proposed approach matches that of the original embeddings even in low-rank settings.

The outline of the paper is as follows. Section 2 discusses two popular models for representing the hyperbolic space in the Euclidean setting. In Section 3, we present our formulation to approximate given hyperbolic embeddings in a low-rank setting. The optimization algorithm is discussed in Section 4. The experimental results are presented in Section 5, and Section 6 concludes the paper.

2 Background

In this section, we briefly discuss the basic concepts of hyperbolic geometry. Interested readers may refer to [18, 19] for more details.

The hyperbolic space (of dimension $n$) is a Riemannian manifold with constant negative sectional curvature. Similar to the Euclidean and spherical spaces, it is isotropic. However, the Euclidean space is flat (zero curvature) and the spherical space is positively curved. As a result of the negative curvature, the circumference and area of a circle in the hyperbolic space grow exponentially with the radius. In contrast, in the Euclidean setting, the circumference and area of a circle grow only linearly and quadratically, respectively, with the radius. Hence, hyperbolic spaces expand faster than Euclidean spaces. Informally, hyperbolic spaces may be viewed as a continuous counterpart to discrete trees, as the metric properties of a two-dimensional hyperbolic space and a $b$-ary tree (a tree with branching factor $b$) are similar. Hence, trees can be embedded into a two-dimensional hyperbolic space while keeping the overall distortion arbitrarily small. In contrast, Euclidean spaces cannot attain this even with an unbounded number of dimensions.

Since the hyperbolic space cannot be represented within a Euclidean space without distortion, several (equivalent) models exist for representing it for computational purposes. The models are conformal to the Euclidean space, and points in one model can be transformed to another model while preserving geometric properties such as distances (the transformations between models are isometries). However, no single model captures all the properties of hyperbolic geometry. Two hyperbolic models, in particular, have received much interest recently in the machine learning community: the Poincaré ball model and the hyperboloid model.


Figure 1: (a) An example of the hyperbolic space (the two-dimensional Poincaré ball model $\mathbb{B}^2$) being used to represent a mammal taxonomy. This taxonomy is a part of WordNet [20]; (b) A tree embedded in $\mathbb{B}^2$. The two subtrees from the root are regular trees. All the edges have the same hyperbolic length, computed using (1); (c) The Poincaré disk ($\mathbb{B}^2$) may be viewed as a stereographic projection of the hyperboloid model ($\mathbb{H}^2$). Two points lie on $\mathbb{H}^2$ and are projected onto $\mathbb{B}^2$; the maroon curve is the geodesic between them on $\mathbb{H}^2$, which projects to the blue geodesic path between their images on $\mathbb{B}^2$. Figure best viewed in color.

2.1 Poincaré ball model

The Poincaré ball $\mathbb{B}^n$ is an $n$-dimensional hyperbolic space defined as the interior of the $n$-dimensional unit (Euclidean) ball:
$$\mathbb{B}^n = \{\mathbf{x} \in \mathbb{R}^n : \|\mathbf{x}\| < 1\},$$
where $\|\cdot\|$ denotes the Euclidean norm. The distance between two points $\mathbf{x}, \mathbf{y} \in \mathbb{B}^n$ in the Poincaré ball model is given by
$$d_{\mathbb{B}}(\mathbf{x}, \mathbf{y}) = \cosh^{-1}\!\left(1 + 2\,\frac{\|\mathbf{x} - \mathbf{y}\|^2}{(1 - \|\mathbf{x}\|^2)(1 - \|\mathbf{y}\|^2)}\right), \qquad (1)$$

and the Poincaré norm is given by
$$\|\mathbf{x}\|_{\mathbb{B}} = d_{\mathbb{B}}(\mathbf{0}, \mathbf{x}) = 2\tanh^{-1}(\|\mathbf{x}\|).$$

We observe that the distance between a pair of points near the boundary of the Poincaré ball (Euclidean norm close to unity) grows much faster than the distance between points close to the center (Euclidean norm close to zero). In addition, the distance within the Poincaré ball varies smoothly with respect to the points $\mathbf{x}$ and $\mathbf{y}$. These properties are helpful for embedding discrete hierarchical structures such as trees in hyperbolic spaces and obtaining continuous embeddings that respect the tree metric structure. For instance, the origin of the Poincaré ball may be mapped to the root node of the tree, as the root node is relatively close to all other nodes (points). The leaf nodes can be placed near the boundary to ensure they are relatively distant from other leaf nodes. Additionally, the shortest path between a pair of points usually passes through a point closer to the origin, just as the shortest path between two nodes in a tree passes through their common ancestors. Figure 1(b) shows a Poincaré disk ($\mathbb{B}^2$) embedding a tree with two regular subtrees.
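For concreteness, the following is a minimal numpy sketch of the Poincaré distance (1) and the Poincaré norm as reconstructed above. The function names and the small numerical check are ours, not part of the original formulation.

```python
import numpy as np

def poincare_distance(x, y, eps=1e-9):
    """Hyperbolic distance (1) between two points inside the unit ball."""
    sq = np.sum((x - y) ** 2)
    denom = (1.0 - np.sum(x ** 2)) * (1.0 - np.sum(y ** 2))
    return np.arccosh(1.0 + 2.0 * sq / (denom + eps))

def poincare_norm(x):
    """Distance of x from the origin of the ball: 2 * artanh(||x||)."""
    return 2.0 * np.arctanh(np.linalg.norm(x))

# Points near the boundary are much farther apart than points near the center.
a, b = np.array([0.05, 0.0]), np.array([0.0, 0.05])
c, d = np.array([0.95, 0.0]), np.array([0.0, 0.95])
print(poincare_distance(a, b), poincare_distance(c, d))  # the second is far larger
```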

2.2 Hyperboloid model

Let $\mathbf{x} = [x_0, x_1, \ldots, x_n]^\top \in \mathbb{R}^{n+1}$ and $\mathbf{y} = [y_0, y_1, \ldots, y_n]^\top \in \mathbb{R}^{n+1}$. The Lorentz scalar product of the two vectors $\mathbf{x}$ and $\mathbf{y}$ is defined as
$$\langle \mathbf{x}, \mathbf{y} \rangle_{\mathcal{L}} = \mathbf{x}^\top G \mathbf{y} = -x_0 y_0 + \sum_{i=1}^{n} x_i y_i, \qquad (2)$$
where $G = \begin{bmatrix} -1 & \mathbf{0}^\top \\ \mathbf{0} & I_n \end{bmatrix}$ is an $(n+1) \times (n+1)$ diagonal matrix, $\mathbf{0}$ is the $n$-dimensional zero column vector, and $I_n$ is the $n \times n$ identity matrix.

The hyperboloid model, also known as the Lorentz model of hyperbolic geometry, is given by
$$\mathbb{H}^n = \{\mathbf{x} \in \mathbb{R}^{n+1} : \langle \mathbf{x}, \mathbf{x} \rangle_{\mathcal{L}} = -1,\ x_0 > 0\}.$$

The model represents the upper sheet of an $n$-dimensional hyperboloid. From the constraint set, it can be observed that if $\mathbf{x} \in \mathbb{H}^n$, then $x_0 = \sqrt{1 + \sum_{i=1}^{n} x_i^2} \geq 1$.

The distance between two points $\mathbf{x}, \mathbf{y} \in \mathbb{H}^n$ in the hyperboloid model is given by
$$d_{\mathbb{H}}(\mathbf{x}, \mathbf{y}) = \cosh^{-1}\!\left(-\langle \mathbf{x}, \mathbf{y} \rangle_{\mathcal{L}}\right). \qquad (3)$$

As stated earlier, the Poincaré ball and the hyperboloid models are equivalent, and a mapping exists from one model to the other [21]. A point $\mathbf{x} = [x_0, x_1, \ldots, x_n]^\top$ on the hyperboloid $\mathbb{H}^n$ can be mapped to the Poincaré ball $\mathbb{B}^n$ by
$$p(\mathbf{x}) = \frac{[x_1, \ldots, x_n]^\top}{x_0 + 1}.$$

The reverse mapping, from $\mathbf{y} \in \mathbb{B}^n$ to $\mathbb{H}^n$, is defined as follows:
$$p^{-1}(\mathbf{y}) = \frac{\left[\,1 + \|\mathbf{y}\|^2,\ 2\mathbf{y}^\top\right]^\top}{1 - \|\mathbf{y}\|^2}.$$

Figure 1(c) shows a two-dimensional hyperboloid model $\mathbb{H}^2$. It can be observed that the Poincaré disk $\mathbb{B}^2$ is obtained as a stereographic projection of $\mathbb{H}^2$.
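The following numpy sketch illustrates the Lorentz scalar product (2), the hyperboloid distance (3), and the two mappings between the models as reconstructed above. The helper names and the round-trip check are ours.

```python
import numpy as np

def lorentz_inner(x, y):
    """Lorentz scalar product (2): -x_0*y_0 + sum_{i>=1} x_i*y_i."""
    return -x[0] * y[0] + np.dot(x[1:], y[1:])

def hyperboloid_distance(x, y):
    """Hyperbolic distance (3) between two points on the hyperboloid."""
    return np.arccosh(np.clip(-lorentz_inner(x, y), 1.0, None))

def to_poincare(x):
    """Map a hyperboloid point x = (x_0, x_1, ..., x_n) to the Poincare ball."""
    return x[1:] / (x[0] + 1.0)

def to_hyperboloid(p):
    """Inverse map from a Poincare ball point p to the hyperboloid."""
    sq = np.sum(p ** 2)
    return np.concatenate(([1.0 + sq], 2.0 * p)) / (1.0 - sq)

# Illustrative check: the lifted point satisfies <x, x>_L = -1 and the two
# maps invert each other.
p = np.array([0.3, -0.2])
x = to_hyperboloid(p)
print(np.isclose(lorentz_inner(x, x), -1.0), np.allclose(to_poincare(x), p))
```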

3 Low-rank parameterization in hyperbolic space

As discussed earlier, hyperbolic embeddings are typically suitable for representing elements of hierarchical structures such as nodes of trees [3] and complex networks [8], to name a few. When the task involves closely related hierarchical concepts, an additional low-rank structure may also exist among such hyperbolic embeddings. In this section, we propose a novel low-rank parameterization for hyperbolic embeddings. It should be noted that, unlike for Euclidean embeddings, incorporating a low-rank structure in the hyperbolic framework is non-trivial because of the hyperboloid constraints.

Let $Z \in \mathbb{R}^{(n+1) \times m}$ be a matrix whose columns represent $n$-dimensional hyperbolic embeddings $\mathbf{z}_i \in \mathbb{H}^n$ corresponding to $m$ elements from a given hierarchical structure. For notational convenience, we represent $Z$ and its $i$-th column $\mathbf{z}_i$ as follows:
$$Z = \begin{bmatrix} z_{01} & \cdots & z_{0m} \\ \mathbf{z}_{11} & \cdots & \mathbf{z}_{1m} \end{bmatrix}, \qquad \mathbf{z}_i = \begin{bmatrix} z_{0i} \\ \mathbf{z}_{1i} \end{bmatrix}, \quad z_{0i} \in \mathbb{R},\ \mathbf{z}_{1i} \in \mathbb{R}^n.$$

We propose to approximate $\mathbf{z}_i$ with a low-dimensional hyperbolic embedding $\mathbf{w}_i \in \mathbb{H}^r$ such that the approximation of $\mathbf{z}_i$ shares a latent low-dimensional subspace with the approximation of $\mathbf{z}_j$ (corresponding to $\mathbf{w}_j$), for all $j$. Mathematically, we propose the following rank-$r$ approximation $\mathbf{x}_i$ for $\mathbf{z}_i$:
$$\mathbf{x}_i = \begin{bmatrix} w_{0i} \\ U \mathbf{w}_{1i} \end{bmatrix},$$
where $\mathbf{w}_i = [w_{0i}; \mathbf{w}_{1i}] \in \mathbb{H}^r$, $w_{0i} \in \mathbb{R}$, $\mathbf{w}_{1i} \in \mathbb{R}^r$, and $U \in \mathbb{R}^{n \times r}$ has orthonormal columns ($U^\top U = I_r$). We discuss below the consequences of the proposed model.

Firstly, we obtain $\mathbf{x}_i \in \mathbb{H}^n$. This is because $U^\top U = I_r$ implies $\|U \mathbf{w}_{1i}\| = \|\mathbf{w}_{1i}\|$, and hence
$$\langle \mathbf{x}_i, \mathbf{x}_i \rangle_{\mathcal{L}} = -w_{0i}^2 + \|U \mathbf{w}_{1i}\|^2 = -w_{0i}^2 + \|\mathbf{w}_{1i}\|^2 = \langle \mathbf{w}_i, \mathbf{w}_i \rangle_{\mathcal{L}} = -1,$$
where the last equality follows as $\mathbf{w}_i \in \mathbb{H}^r$. Secondly, the matrix $X_1 = [\mathbf{x}_{11}, \ldots, \mathbf{x}_{1m}]$ (corresponding to $Z_1 = [\mathbf{z}_{11}, \ldots, \mathbf{z}_{1m}]$) is modeled as a low-rank matrix, as we approximate $Z_1$ by $U W_1$, where $W_1 = [\mathbf{w}_{11}, \ldots, \mathbf{w}_{1m}] \in \mathbb{R}^{r \times m}$. Thirdly, the space complexity of the embeddings reduces from $O((n+1)m)$ (for $Z$) to $O(nr + (r+1)m)$ (for $U$ and $W = [\mathbf{w}_1, \ldots, \mathbf{w}_m]$).
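Using the notation introduced above, the following minimal numpy sketch lifts a low-dimensional hyperbolic embedding to $\mathbb{H}^n$ and verifies numerically that the lifted point satisfies the hyperboloid constraint. The helper names are ours.

```python
import numpy as np

def lift(U, w):
    """Map w = (w_0, w_1) in H^r to x = (w_0, U @ w_1) in H^n,
    where U is an n x r matrix with orthonormal columns."""
    return np.concatenate(([w[0]], U @ w[1:]))

# Illustrative check that the lifted point stays on the hyperboloid.
rng = np.random.default_rng(1)
n, r = 10, 3
U, _ = np.linalg.qr(rng.standard_normal((n, r)))    # point on the Stiefel manifold
w1 = rng.standard_normal(r)
w = np.concatenate(([np.sqrt(1.0 + w1 @ w1)], w1))   # w in H^r by construction
x = lift(U, w)
print(np.isclose(-x[0] ** 2 + x[1:] @ x[1:], -1.0))  # <x, x>_L = -1
```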

We propose to learn the proposed low-rank parameterization of $Z$ by solving the optimization problem:
$$\min_{U, \{\mathbf{w}_i\}} \ \sum_{i=1}^{m} \ell(\mathbf{x}_i, \mathbf{z}_i) \qquad (4)$$
$$\text{subject to} \quad U^\top U = I_r, \quad \mathbf{w}_i \in \mathbb{H}^r \ \ \forall i,$$
where $\ell$ is a loss function that measures the quality of the proposed approximation. Let the function $f$ denote the objective function in (4), i.e., $f(U, \{\mathbf{w}_i\}) = \sum_{i} \ell(\mathbf{x}_i, \mathbf{z}_i)$.

We discuss the following three choices of $\ell$:

  1. $\ell(\mathbf{x}_i, \mathbf{z}_i) = \|\mathbf{z}_{1i} - U \mathbf{w}_{1i}\|^2$: we penalize the Euclidean distance between $\mathbf{z}_{1i}$ and $\mathbf{x}_{1i} = U \mathbf{w}_{1i}$. This is because $z_{0i}$ and $w_{0i}$ are determined from the hyperboloid constraint given $\mathbf{z}_{1i}$ and $\mathbf{w}_{1i}$, respectively. We obtain a closed-form solution of (4) with this loss function, and the solution involves computing a rank-$r$ singular value decomposition of $Z_1$ (a sketch is given after this list). In Section 5, we denote this approach by the term Method-1.

  2. $\ell(\mathbf{x}_i, \mathbf{z}_i) = \|\mathbf{z}_i - \mathbf{x}_i\|^2$: we penalize the Euclidean distance between the (full) hyperbolic embeddings $\mathbf{z}_i$ and $\mathbf{x}_i$. This approach is denoted by the term Method-2 in Section 5.

  3. $\ell(\mathbf{x}_i, \mathbf{z}_i) = d_{\mathbb{H}}(\mathbf{x}_i, \mathbf{z}_i)^2$: since the columns of $Z$ and $X$ are hyperbolic embeddings, we penalize the hyperbolic distance (3) between the corresponding embeddings. We denote it by the term Method-3.
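For Method-1, the closed-form solution mentioned in option 1) can be sketched as follows in numpy, under our reconstruction of the parameterization (the variable names are ours): a truncated SVD of the lower block $Z_1$ gives $U$ and $W_1$, and the first coordinates of the low-dimensional embeddings are then restored from the hyperboloid constraint.

```python
import numpy as np

def method1_closed_form(Z, r):
    """Closed-form low-rank approximation (Method-1) of an (n+1) x m matrix Z
    of hyperboloid embeddings: rank-r SVD of the last n rows, then restore the
    first coordinate from the hyperboloid constraint."""
    Z1 = Z[1:, :]                                  # n x m block below the first row
    U_full, s, Vt = np.linalg.svd(Z1, full_matrices=False)
    U = U_full[:, :r]                              # n x r, orthonormal columns
    W1 = np.diag(s[:r]) @ Vt[:r, :]                # r x m, equals U.T @ Z1
    W0 = np.sqrt(1.0 + np.sum(W1 ** 2, axis=0))    # hyperboloid constraint in H^r
    W = np.vstack([W0, W1])                        # (r+1) x m low-dim embeddings
    X = np.vstack([W0, U @ W1])                    # (n+1) x m reconstructed embeddings
    return U, W, X
```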

It should be noted that the problem (4) is a nonlinear and non-convex optimization problem, but it has well-studied structured constraints. In particular, the structured constraints can be cast as Riemannian manifolds. In the next section, we propose a Riemannian trust-region algorithm for solving (4) with the loss functions discussed in options 2) and 3) above.

4 Optimization

It should be noted that the variable $U$ in (4) belongs to the Stiefel manifold $\mathrm{St}(n, r) := \{U \in \mathbb{R}^{n \times r} : U^\top U = I_r\}$ [22], and the variable $\mathbf{w}_i$ belongs to the $r$-dimensional hyperbolic manifold $\mathbb{H}^r$ for all $i$. Consequently, the constraint set of the proposed optimization problem (4) is a smooth manifold $\mathcal{M} := \mathrm{St}(n, r) \times \underbrace{\mathbb{H}^r \times \cdots \times \mathbb{H}^r}_{m\ \text{copies}}$, which is the Cartesian product of the Stiefel manifold and $m$ hyperbolic manifolds of dimension $r$. The problem (4), therefore, boils down to the manifold optimization problem:

$$\min_{x \in \mathcal{M}} \ f(x), \qquad (5)$$

where $x$ has the representation $x = (U, \{\mathbf{w}_i\}_{i=1}^{m})$ and $f : \mathcal{M} \to \mathbb{R}$ is a smooth function.

We tackle the problem (5) in the Riemannian optimization framework, which translates it into an unconstrained optimization problem over the nonlinear manifold $\mathcal{M}$, now endowed with a Riemannian geometry [23]. In particular, the Riemannian geometry imposes a metric (inner product) structure on $\mathcal{M}$, which in turn allows us to generalize notions like the shortest distance between points (on the manifold) or the translation of vectors on manifolds. Following this framework, many of the standard nonlinear optimization algorithms in the Euclidean space, e.g., steepest descent and trust-regions, generalize to Riemannian manifolds in a systematic manner. The Riemannian framework also allows the development of computationally efficient algorithms on manifolds [23].

Both the Stiefel and hyperbolic manifolds are Riemannian manifolds, and their geometries have been individually well studied in the literature [13, 23]. Consequently, the manifold of interest $\mathcal{M}$ also has a Riemannian structure.

Below we list some of the basic optimization-related notions that are required to solve (5) with the Riemannian trust-region algorithm, which exploits second-order information. The development of these notions follows the general treatment of manifold optimization discussed in [23, Chapter 7]. The Stiefel manifold related expressions follow from [23]. The hyperbolic manifold related expressions follow from [13].

4.1 Metric and tangent space notions

Optimization on $\mathcal{M}$ is worked out on the tangent space, which is the linearization of $\mathcal{M}$ at a specific point. It is a vector space associated with each element of the manifold.

As $\mathcal{M}$ is a product space, its tangent space is also the product space of the tangent spaces of the Stiefel and hyperbolic manifolds. The characterization of the tangent space at $x = (U, \{\mathbf{w}_i\})$ has the form:
$$T_x\mathcal{M} = \left\{(\xi_U, \{\xi_{\mathbf{w}_i}\}) : \mathrm{sym}(U^\top \xi_U) = 0 \ \text{and}\ \langle \mathbf{w}_i, \xi_{\mathbf{w}_i} \rangle_{\mathcal{L}} = 0 \ \ \forall i\right\}, \qquad (6)$$
where $\mathrm{sym}(A) := (A + A^\top)/2$ extracts the symmetric part of a matrix.

As discussed above, to impose a Riemannian structure on $\mathcal{M}$, a smooth metric (inner product) definition is required at each element of the manifold. A natural choice of the metric on $\mathcal{M}$ is the summation of the individual Riemannian metrics on the Stiefel and hyperbolic manifolds. More precisely, we have
$$\langle \xi_x, \eta_x \rangle_x = \langle \xi_U, \eta_U \rangle + \sum_{i=1}^{m} \langle \xi_{\mathbf{w}_i}, \eta_{\mathbf{w}_i} \rangle_{\mathcal{L}}, \qquad (7)$$
where $\xi_x = (\xi_U, \{\xi_{\mathbf{w}_i}\})$ and $\eta_x = (\eta_U, \{\eta_{\mathbf{w}_i}\})$ belong to $T_x\mathcal{M}$, $\langle A, B \rangle := \mathrm{trace}(A^\top B)$ is the standard inner product, and $\langle \cdot, \cdot \rangle_{\mathcal{L}}$ is the Lorentz inner product (2).

It should be emphasized that the metric in (7) endows the manifold $\mathcal{M}$ with a Riemannian structure and allows us to develop the other optimization notions in a straightforward manner.

One important ingredient required in optimization is the notion of an orthogonal projection operator from the ambient space onto the tangent space $T_x\mathcal{M}$. Exploiting the product and Riemannian structure of $\mathcal{M}$, the projection operator is obtained as the Cartesian product of the individual tangent-space projection operators on the Stiefel and hyperbolic manifolds, both of which are well known. Specifically, if $\zeta_x = (\zeta_U, \{\zeta_{\mathbf{w}_i}\})$, then its projection onto the tangent space at $x = (U, \{\mathbf{w}_i\})$ is given by
$$\Pi_x(\zeta_x) = \left(\zeta_U - U\,\mathrm{sym}(U^\top \zeta_U),\ \ \{\zeta_{\mathbf{w}_i} + \langle \mathbf{w}_i, \zeta_{\mathbf{w}_i} \rangle_{\mathcal{L}}\, \mathbf{w}_i\}\right). \qquad (8)$$
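A minimal numpy sketch of the metric (7) and the projection (8) as reconstructed above is given below. We stack the hyperbolic embeddings $\{\mathbf{w}_i\}$ column-wise into an $(r+1) \times m$ matrix $W$; this stacked convention and the helper names are ours.

```python
import numpy as np

def metric(xi, eta):
    """Riemannian metric (7): Frobenius inner product on the Stiefel part plus
    the Lorentz inner product on each column of the hyperbolic part."""
    xi_U, xi_W = xi
    eta_U, eta_W = eta
    lorentz = -xi_W[0] * eta_W[0] + np.sum(xi_W[1:] * eta_W[1:], axis=0)
    return np.sum(xi_U * eta_U) + np.sum(lorentz)

def project(U, W, zeta_U, zeta_W):
    """Orthogonal projection (8) onto the tangent space at (U, W): remove the
    symmetric part for the Stiefel component and the Lorentz-normal component
    for each column of the (r+1) x m hyperbolic block W."""
    sym = 0.5 * (U.T @ zeta_U + zeta_U.T @ U)
    xi_U = zeta_U - U @ sym
    inner = -W[0] * zeta_W[0] + np.sum(W[1:] * zeta_W[1:], axis=0)  # <w_i, zeta_i>_L
    xi_W = zeta_W + W * inner                                       # zeta + <w, zeta>_L * w
    return xi_U, xi_W
```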

4.2 Retraction

An optimization algorithm on a manifold requires computing a search direction and then moving along it. While the computation of the search direction follows from the notions in Section 4.1, in this section we develop the notion of “moving” along a search direction on the manifold. This is characterized by the retraction operation, which is a generalization of the exponential map (that follows the geodesic) on the manifold. The retraction operator takes a tangent vector at $x$ and outputs an element on the manifold by approximating the geodesic [23, Definition 4.1.1].

Exploiting the product structure of $\mathcal{M}$, a natural expression of the retraction operator is obtained as the Cartesian product of the individual retraction operations on the Stiefel and hyperbolic manifolds. If $\xi_x = (\xi_U, \{\xi_{\mathbf{w}_i}\}) \in T_x\mathcal{M}$, then the retraction operation is given by
$$R_x(\xi_x) = \left(\mathrm{uf}(U + \xi_U),\ \ \left\{\cosh(\|\xi_{\mathbf{w}_i}\|_{\mathcal{L}})\, \mathbf{w}_i + \sinh(\|\xi_{\mathbf{w}_i}\|_{\mathcal{L}})\, \frac{\xi_{\mathbf{w}_i}}{\|\xi_{\mathbf{w}_i}\|_{\mathcal{L}}}\right\}\right), \qquad (9)$$
where $\|\xi_{\mathbf{w}_i}\|_{\mathcal{L}} := \sqrt{\langle \xi_{\mathbf{w}_i}, \xi_{\mathbf{w}_i} \rangle_{\mathcal{L}}}$ for all $i$, and $\mathrm{uf}(\cdot)$ extracts the orthogonal factor of a matrix, i.e., $\mathrm{uf}(A) = A (A^\top A)^{-1/2}$.
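The retraction (9) can be sketched in numpy as follows, under the same stacked $(r+1) \times m$ convention as before: the polar factor handles the Stiefel part, and the exponential map of the hyperboloid (as in [13]) handles each hyperbolic column. The implementation details (SVD-based polar factor, numerical guard) are ours.

```python
import numpy as np

def retract(U, W, xi_U, xi_W):
    """Retraction (9): polar factor for the Stiefel part, exponential map of
    the hyperboloid for each column of the hyperbolic part."""
    # Stiefel: uf(U + xi_U) = orthogonal factor of the polar decomposition.
    A = U + xi_U
    P, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_new = P @ Vt                                   # = A (A^T A)^{-1/2}
    # Hyperboloid (column-wise): exp_w(xi) = cosh(t) w + sinh(t) xi / t,
    # with t = sqrt(<xi, xi>_L) for a tangent vector xi at w.
    t_sq = -xi_W[0] ** 2 + np.sum(xi_W[1:] ** 2, axis=0)
    t = np.sqrt(np.maximum(t_sq, 1e-16))             # guard against t = 0
    W_new = np.cosh(t) * W + (np.sinh(t) / t) * xi_W
    return U_new, W_new
```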

4.3 Riemannian gradient and Hessian computations

Finally, we require the expressions of the Riemannian gradient and Hessian of $f$ on $\mathcal{M}$. To this end, we first compute the derivatives of $f$ in the Euclidean space. Let $\nabla_x f$ be the first derivative of $f$ at $x$, and let $\mathrm{D}\nabla_x f[\xi_x]$ denote its Euclidean directional derivative along $\xi_x$. The expressions of the partial derivatives for the squared Euclidean distance based loss functions mentioned in Section 3 are straightforward to compute. When the loss function is based on the squared hyperbolic distance (3), the expressions for $\nabla_x f$ and $\mathrm{D}\nabla_x f[\xi_x]$ are discussed in [24].

Once the partial derivatives of $f$ are known, converting them to their Riemannian counterparts on $\mathcal{M}$ follows from the theory of Riemannian optimization [23, Chapter 3]. The expressions are
$$\text{Riemannian gradient:}\quad \mathrm{grad}_x f = \Pi_x(\mathrm{egrad}_x f), \qquad (10)$$
$$\text{Riemannian Hessian:}\quad \mathrm{Hess}_x f[\xi_x] = \Pi_x\!\left(\mathrm{D}\,\mathrm{grad}_x f[\xi_x]\right),$$
where $\Pi_x$ is the orthogonal projection operator (8) and $\mathrm{egrad}_x f$ is the Euclidean derivative $\nabla_x f$ expressed with respect to the chosen metric; for the hyperbolic components this amounts to premultiplying the partial derivative by $G^{-1} = G$ to account for the Lorentz metric [13].
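As a small illustration, the conversion of Euclidean partial derivatives into the Riemannian gradient (10) can be sketched as below, reusing the `project` helper from the sketch in Section 4.1 (the $G$-rescaling of the hyperbolic part follows [13]; the function name is ours).

```python
import numpy as np

def riemannian_gradient(U, W, egrad_U, egrad_W):
    """Riemannian gradient (10): rescale the hyperbolic partial derivative by
    G^{-1} (= G) to account for the Lorentz metric, then project both parts
    onto the tangent space at (U, W) with the projector (8)."""
    egrad_W = egrad_W.copy()
    egrad_W[0] = -egrad_W[0]                 # multiply by G = diag(-1, 1, ..., 1)
    return project(U, W, egrad_U, egrad_W)   # project: tangent-space projector (8)
```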

4.4 Riemannian trust-region algorithm

The Riemannian trust-region (TR) algorithm approximates the function $f$ with a second-order model at every iteration. The second-order model (called the trust-region sub-problem) makes use of the Riemannian gradient and Hessian computations shown in Section 4.3. The trust-region sub-problem is then solved efficiently (using an iterative quadratic optimization solver, e.g., the truncated conjugate gradient algorithm) to obtain a candidate search direction. If the candidate search direction leads to an appreciable decrease in the function $f$, then it is accepted; else it is rejected [23, Chapter 7]. Algorithm 1 summarizes the key steps of the proposed trust-region algorithm for solving (5).

Input: $n$-dimensional hyperbolic embeddings $Z$ and rank $r$.
Initialize $x = (U, \{\mathbf{w}_i\}) \in \mathcal{M}$.
repeat
     1: Compute the objective $f(x)$ and its Euclidean derivatives.
     2: Riemannian TR step: compute a search direction that minimizes the trust-region sub-problem. It makes use of $\nabla_x f$ and its directional derivative, and their Riemannian counterparts (10).
     3: Update $x$ (retraction step) using (9).
until convergence
Output: $U$ and $\{\mathbf{w}_i\}$.
Algorithm 1 Riemannian trust-region algorithm for (5)
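The paper solves (5) with the second-order trust-region method summarized above. As a simplified, first-order illustration of the same ingredients (not the paper's algorithm), the sketch below runs Riemannian gradient descent on the Method-2 loss, reusing the `project`, `retract`, and `riemannian_gradient` helpers from the earlier sketches; the fixed step size, initialization, and helper names are our assumptions.

```python
import numpy as np

def method2_loss_and_egrad(U, W, Z):
    """Squared Euclidean loss between Z and its approximation X (Method-2),
    together with the Euclidean partial derivatives w.r.t. U and W."""
    W0, W1 = W[0], W[1:]
    X = np.vstack([W0, U @ W1])
    R = X - Z                                             # residual, (n+1) x m
    loss = np.sum(R ** 2)
    egrad_U = 2.0 * R[1:] @ W1.T                          # n x r
    egrad_W = np.vstack([2.0 * R[0], 2.0 * U.T @ R[1:]])  # (r+1) x m
    return loss, egrad_U, egrad_W

def riemannian_gd(Z, r, steps=200, lr=1e-3, seed=0):
    """First-order illustration of the manifold ingredients (gradient step in
    the tangent space followed by retraction), not the trust-region solver."""
    rng = np.random.default_rng(seed)
    n, m = Z.shape[0] - 1, Z.shape[1]
    U, _ = np.linalg.qr(rng.standard_normal((n, r)))
    W1 = 1e-2 * rng.standard_normal((r, m))
    W = np.vstack([np.sqrt(1.0 + np.sum(W1 ** 2, axis=0)), W1])
    for _ in range(steps):
        loss, egrad_U, egrad_W = method2_loss_and_egrad(U, W, Z)
        grad_U, grad_W = riemannian_gradient(U, W, egrad_U, egrad_W)
        U, W = retract(U, W, -lr * grad_U, -lr * grad_W)
    return U, W
```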

4.5 Computational complexity

The manifold-related ingredients cost $O(nr^2 + r^3 + rm)$. For example, the computation of the Riemannian gradient in (10) involves only the tangent space projection operation, which costs $O(nr^2 + rm)$. Similarly, the retraction operation costs $O(nr^2 + r^3 + rm)$.

The computation of $f$ and its derivatives costs $O(nrm)$ (for all three choices of the loss function in Section 3). The overall computational cost per iteration of our implementation is, therefore, $O(nrm + nr^2 + r^3)$.

4.6 Numerical implementation

We use the Matlab toolbox Manopt [25] to implement Algorithm 1 for (5). Manopt comes with a well-implemented generic Riemannian trust-region solver, which can be used appropriately to solve (5) by providing the necessary optimization-related ingredients mentioned earlier. The Matlab codes are available at https://pratikjawanpuria.com.

5 Experiments

In this section, we evaluate the performance of the proposed low-rank parameterization of hyperbolic embeddings. In particular, we compare the quality of the low-rank hyperbolic embeddings obtained by minimizing the three different loss functions discussed in Section 3.

Experimental setup and evaluation metric

We are provided with hyperbolic embeddings corresponding to a hierarchical entity such as the nodes of a tree or a graph. We also have the ground truth information of the given tree (or graph). Let $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ represent the ground truth, where $\mathcal{V}$ is the set of nodes and $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$ is the set of edges between the nodes. Hyperbolic embeddings can be employed to reconstruct the ground truth, since a low hyperbolic distance (3) between a pair of nodes implies a high probability of an edge between them. However, such a reconstruction may also incorporate errors such as missing an edge or adding a non-existent edge.

We measure the quality of the hyperbolic embeddings as follows: let $u$ and $v$ be a pair of nodes in $\mathcal{V}$ such that $(u, v) \in \mathcal{E}$. Let $\mathbf{x}_u$ and $\mathbf{x}_v$ be the hyperbolic embeddings corresponding to $u$ and $v$, respectively. We compute the hyperbolic distance (3) $d_{\mathbb{H}}(\mathbf{x}_u, \mathbf{x}_v)$ and rank it among the distances corresponding to all untrue edges from $u$, i.e., $\{d_{\mathbb{H}}(\mathbf{x}_u, \mathbf{x}_{v'}) : (u, v') \notin \mathcal{E}\}$. We then compute the mean average precision (MAP) of the ranking. The MAP score is a commonly employed metric for evaluating graph embeddings [3, 13, 26, 27]. Overall, we assess the quality of the proposed low-rank approximation by comparing the MAP score of the original high dimensional embeddings with that of the low-rank embeddings.
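A hedged sketch of this evaluation is given below. It assumes the embeddings are supplied as the columns of an $(n+1) \times m$ hyperboloid matrix `X` and the ground truth as a dict mapping each node index to the set of its true neighbours (both conventions are ours), and it computes standard average precision over the full ranking, which may differ in minor details from the exact ranking-against-untrue-edges protocol described above.

```python
import numpy as np

def map_score(X, edges):
    """Mean average precision of reconstructing the edge set from hyperbolic
    distances: for each node u, rank all other nodes by distance to u and
    compute the average precision of u's true neighbours under that ranking."""
    G = np.diag(np.concatenate(([-1.0], np.ones(X.shape[0] - 1))))
    D = np.arccosh(np.clip(-(X.T @ G @ X), 1.0, None))   # pairwise distances (3)
    ap = []
    for u, neighbours in edges.items():
        if not neighbours:
            continue
        order = np.argsort(D[u])
        order = order[order != u]                         # exclude the node itself
        hits = np.isin(order, list(neighbours))
        ranks = np.nonzero(hits)[0] + 1                   # 1-based ranks of true edges
        precisions = np.arange(1, len(ranks) + 1) / ranks
        ap.append(precisions.mean())
    return float(np.mean(ap))
```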

We obtain the original hyperbolic embeddings from the implementation provided by [3]. It should be noted that [3] learns the hyperbolic embeddings in the Poincaré ball model, and we employ the transformation discussed in Section 2 to obtain embeddings corresponding to the hyperboloid model. It should be mentioned that though [13] directly learns hyperbolic embeddings in the hyperboloid model, its implementation is not available.

Rank Method-1 Method-2 Method-3
Table 1: Mean average precision (MAP) score obtained by the proposed approaches on the mammal dataset.

Datasets

We perform experiments on the mammal and noun subtrees of the WordNet database [20]. WordNet is a lexical database that, among other things, provides relations between pairs of concepts.

The ‘mammal’ dataset has ‘mammal’ as the root node, with the ‘is-a’ (hypernymy) relationship defining the edges. As an example, it has relationships such as ‘rodent’ is-a ‘mammal’, ‘squirrel’ is-a ‘rodent’, etc. Hence, there exists an edge from the ‘mammal’ node to the ‘rodent’ node and from the ‘rodent’ node to the ‘squirrel’ node. The WordNet mammal subtree consists of nodes and edges. A part of this subtree is displayed in Figure 1(a).

Similarly, the ‘noun’ dataset is also a subtree of the WordNet database. Examples in this subtree include ‘photograph’ is-a ‘object’, ‘bronchitis’ is-a ‘disease’, ‘disease’ is-a ‘entity’, etc. It consists of nodes and edges.

Results

We compare the performance of the proposed low-rank approximation of hyperbolic embeddings with the three loss functions discussed in Section 3. Table 1 reports the results on the mammal dataset with different values of the rank $r$. The original high dimensional hyperbolic embeddings for the mammal subtree achieve a MAP score of . We observe that all three methods are able to obtain MAP scores very close to those of the original embeddings at rank . In addition, Method-1 and Method-2 perform well even in the very low-rank setting (). This hints that penalizing the Euclidean distance may be more suitable than the hyperbolic distance (3) for approximating hyperbolic embeddings when the given rank is very small.

The results on the noun dataset are reported in Table 2. This dataset is challenging because of its scale and relatively low reconstruction performance of the original hyperbolic embeddings. The original -dimensional hyperbolic embeddings for the noun subtree achieve a MAP score of . We observe that at rank our methods are able to get within of the performance obtained by the original embeddings.

Rank Method-1 Method-2 Method-3
Table 2: Mean average precision (MAP) score obtained by the proposed approaches on the noun dataset.

6 Conclusion and Future work

Recently, hyperbolic embeddings have gained popularity in many machine learning applications because of their ability to model complex networks. In this paper, we have looked at scenarios where hyperbolic embeddings are potentially high dimensional and have studied how to compress them using a low-rank factorization model. While low-rank decomposition of Euclidean embeddings is well known, that of hyperbolic embeddings has not been well studied. To this end, we have proposed a systematic approach to compute low-rank approximations of hyperbolic embeddings. Our approach decomposes a high dimensional hyperbolic embedding ($\mathbf{z}_i \in \mathbb{H}^n$) into the product of a low-dimensional subspace ($U$) and a smaller dimensional hyperbolic embedding ($\mathbf{w}_i \in \mathbb{H}^r$).

We modeled the learning problem as an optimization problem on manifolds. Various optimization-related notions were presented to implement a Riemannian trust-region algorithm. Our experiments showed the benefit of the proposed low-rank approximations on real-world datasets.

As a future research direction, we would like to explore how low-rank hyperbolic embeddings are useful in downstream applications. Another research direction could be developing methods to determine a “good” rank for hyperbolic embeddings.

References

  • [1] M. M. Bronstein, J. Bruna, Y. LeCun, A. Szlam, and P. Vandergheynst. Geometric deep learning: Going beyond Euclidean data. IEEE Signal Processing Magazine, 34(4):18–42, 2017.
  • [2] A. Muscoloni, J. M. Thomas, S. Ciucci, G. Bianconi, and C. V. Cannistraci. Machine learning meets complex networks via coalescent embedding in the hyperbolic space. Nature Communications, 8(1):1615, 2017.
  • [3] M. Nickel and D. Kiela. Poincaré embeddings for learning hierarchical representations. In Neural Information Processing Systems Conference (NIPS), 2017.
  • [4] F. Sala, C. D. Sa, A. Gu, and C. Ré. Representation tradeoffs for hyperbolic embeddings. In International Conference on Machine Learning (ICML), 2018.
  • [5] A. Gu, F. Sala, B. Gunel, and C. Ré. Learning mixed-curvature representations in product spaces. In International Conference on Learning Representations (ICLR), 2019.
  • [6] B. Dhingra, C. J. Shallue, M. Norouzi, A. M. Dai, and G. E. Dahl. Embedding text in hyperbolic spaces. In Twelfth Workshop on Graph-Based Methods for Natural Language Processing (ACL), pages 59–69, 2018.
  • [7] A. Tifrea, G. Bécigneul, and O.-E. Ganea. Poincaré glove: Hyperbolic word embeddings. In International Conference on Learning Representations (ICLR), 2019.
  • [8] D. Krioukov, F. Papadopoulos, M. Kitsak, A. Vahdat, and M. Boguñá. Hyperbolic geometry of complex networks. Physical Review E, 82(3):036106, 2010.
  • [9] M. Hamann. On the tree-likeness of hyperbolic spaces. Mathematical Proceedings of the Cambridge Philosophical Society, 164(2):345–361, 2018.
  • [10] Y. Tay, L. A. Tuan, and S. C. Hui. Hyperbolic representation learning for fast and efficient neural question answering. In Web Search and Data Mining (WSDM), 2018.
  • [11] T. D. Q. Vinh, Y. Tay, S. Zhang, G. Cong, and X.-L. Li. Hyperbolic recommender systems. Technical report, arXiv:1809.01703, 2018.
  • [12] B. P. Chamberlain, S. R. Hardwick, D. R. Wardrope, F. Dzogang, F. Daolio, and S. Vargas. Scalable hyperbolic recommender systems. Technical report, arXiv:1902.08648, 2019.
  • [13] M. Nickel and D. Kiela. Learning continuous hierarchies in the Lorentz model of hyperbolic geometry. In International Conference on Machine Learning (ICML), 2018.
  • [14] O.-E. Ganea, G. Bécigneul, and T. Hofmann. Hyperbolic entailment cones for learning hierarchical embeddings. In International Conference on Machine Learning (ICML), 2018.
  • [15] O.-E. Ganea, G. Bécigneul, and T. Hofmann. Hyperbolic neural networks. In Neural Information Processing Systems Conference (NIPS), 2018.
  • [16] B. P. Chamberlain, J. R. Clough, and M. P. Deisenroth. Neural embeddings of graphs in hyperbolic space. Technical report, arXiv:1705.10359, 2017.
  • [17] C. Gulcehre, M. Denil, M. Malinowski, A. Razavi, R. Pascanu, K. M. Hermann, P. Battaglia, V. Bapst, D. Raposo, A. Santoro, and N. Freitas. Hyperbolic attention networks. In International Conference on Learning Representations (ICLR), 2019.
  • [18] J. Anderson. Hyperbolic Geometry. Springer-Verlag London, 2005.
  • [19] J. Ratcliffe. Foundations of Hyperbolic Manifolds. Springer-Verlag New York, 2006.
  • [20] G. A. Miller. WordNet: A lexical database for English. Communications of the ACM, 38(11):39–41, 1995.
  • [21] B. Wilson and M. Leimeister. Gradient descent in hyperbolic space. Technical report, arXiv:1805.08207, 2018.
  • [22] A. Edelman, T. A. Arias, and S. T. Smith. The geometry of algorithms with orthogonality constraints. SIAM Journal on Matrix Analysis and Applications, 20(2):303–353, 1998.
  • [23] P.-A. Absil, R. Mahony, and R. Sepulchre. Optimization Algorithms on Matrix Manifolds. Princeton University Press, Princeton, NJ, 2008.
  • [24] X. Pennec. Hessian of the Riemannian squared distance. Technical report, Université Côte d’Azur and Inria Sophia-Antipolis Méditerranée, 2017.
  • [25] N. Boumal, B. Mishra, P.-A. Absil, and R. Sepulchre. Manopt, a Matlab toolbox for optimization on manifolds. Journal of Machine Learning Research, 15(Apr):1455–1459, 2014.
  • [26] A. Bordes, N. Usunier, A. García-Durán, J. Weston, and O. Yakhnenko. Translating embeddings for modeling multi-relational data. In Neural Information Processing Systems Conference (NIPS), 2013.
  • [27] M. Nickel, L. Rosasco, and T. A. Poggio. Holographic embeddings of knowledge graphs. In AAAI Conference on Artificial Intelligence, 2016.