1 Introduction
Learning hyperbolic representations of entities has gained recent interest in the machine learning community [1, 2, 3, 4, 5]. In particular, hyperbolic embeddings have been shown to be well-suited for various natural language processing problems [3, 6, 7] that require modeling hierarchical structures such as knowledge graphs, hypernymy hierarchies, organization hierarchies, and taxonomies, among others. This is because learning representations in the hyperbolic space provides a principled approach for integrating the structural information encoded in such (discrete) entities into a continuous space.
The hyperbolic space is a non-Euclidean space with constant negative curvature. The latter property enables it to grow exponentially even in dimensions as low as two. Hence, the hyperbolic space has been considered for modeling trees and complex networks, among others [8, 9]. Figure 1(a) is an example of representing part of the mammal taxonomy tree in a hyperbolic space (the two-dimensional Poincaré ball). Hyperbolic embeddings (numerical representations of entities in hyperbolic space) have been considered in several applications such as question answering [10], recommender systems [11, 12], link prediction [13, 14], natural language inference [15], vertex classification [16], and machine translation [17].
In this paper, we consider the setting in which an additional low-rank structure may also exist among the learned hyperbolic embeddings. Such a setting may arise when the hierarchical entities are closely related. We propose to learn a low-rank approximation of the given (high-dimensional) hyperbolic embeddings. Conceptually, we model high-dimensional hyperbolic embeddings as a product of a low-dimensional subspace and low-dimensional hyperbolic embeddings. The optimization problem is cast on the product of the Stiefel and hyperbolic manifolds. We develop an efficient Riemannian trust-region algorithm for solving it. We evaluate the proposed approach on real-world datasets, on the problem of reconstructing taxonomy hierarchies from the embeddings. We observe that the performance of the proposed approach matches that of the original embeddings even in low-rank settings.
The outline of the paper is as follows. Section 2 discusses two popular models for representing the hyperbolic space in the Euclidean setting. In Section 3, we present our formulation to approximate given hyperbolic embeddings in a low-rank setting. The optimization algorithm is discussed in Section 4. The experimental results are presented in Section 5, and Section 6 concludes the paper.
2 Background
In this section, we briefly discuss the basic concepts of hyperbolic geometry. Interested readers may refer to [18, 19] for more details.
The hyperbolic space (of dimension $n$) is a Riemannian manifold with constant negative sectional curvature. Similar to the Euclidean and spherical spaces, it is isotropic. However, the Euclidean space is flat (zero curvature) and the spherical space is positively curved. As a result of the negative curvature, the circumference and area of a circle in the hyperbolic space grow exponentially with the radius. In contrast, the circumference and area of a circle in the Euclidean setting grow linearly and quadratically, respectively, with the radius. Hence, hyperbolic spaces expand faster than Euclidean spaces. Informally, hyperbolic spaces may be viewed as a continuous counterpart to discrete trees, as the metric properties of a two-dimensional hyperbolic space and a $b$-ary tree (a tree with branching factor $b$) are similar. Hence, trees can be embedded into a two-dimensional hyperbolic space while keeping the overall distortion arbitrarily small. In contrast, Euclidean spaces cannot attain this result even with an unbounded number of dimensions.
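As a quick numerical illustration of this growth (our sketch, not from the paper), one can compare the circumference of a circle of radius $r$ in the hyperbolic plane of curvature $-1$, which is $2\pi\sinh(r)$, with its Euclidean counterpart $2\pi r$:

```python
import math

def hyperbolic_circumference(r):
    # In the hyperbolic plane (curvature -1), a circle of radius r
    # has circumference 2*pi*sinh(r): exponential growth in r.
    return 2 * math.pi * math.sinh(r)

def euclidean_circumference(r):
    # In the Euclidean plane, the circumference 2*pi*r grows only linearly.
    return 2 * math.pi * r

# The ratio diverges quickly: hyperbolic "room" grows exponentially.
for r in [1, 5, 10]:
    ratio = hyperbolic_circumference(r) / euclidean_circumference(r)
    print(f"r={r:2d}  hyperbolic/Euclidean circumference ratio = {ratio:.1f}")
```

At radius 10 the hyperbolic circle is already over a thousand times longer than the Euclidean one, which is why trees (whose number of nodes grows exponentially with depth) fit so naturally.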
Since the hyperbolic space cannot be represented within the Euclidean space without distortion, several (equivalent) models exist for representing hyperbolic spaces for computational purposes. Points in one model can be transformed to points in another model while preserving geometric properties such as distances (the transformations are isometries). However, no single model captures all the properties of hyperbolic geometry. Two hyperbolic models, in particular, have recently received much interest in the machine learning community: the Poincaré ball model and the hyperboloid model.
2.1 Poincaré ball model
The Poincaré ball $\mathbb{B}^n$ is an $n$-dimensional hyperbolic space defined as the interior of the $n$-dimensional unit (Euclidean) ball:
$$\mathbb{B}^n = \{\mathbf{x} \in \mathbb{R}^n : \|\mathbf{x}\| < 1\},$$
where $\|\cdot\|$ denotes the Euclidean norm. The distance between two points $\mathbf{x}, \mathbf{y} \in \mathbb{B}^n$ in the Poincaré ball model is given by
$$d_{\mathbb{B}}(\mathbf{x}, \mathbf{y}) = \operatorname{arcosh}\!\left(1 + \frac{2\|\mathbf{x}-\mathbf{y}\|^2}{(1-\|\mathbf{x}\|^2)(1-\|\mathbf{y}\|^2)}\right) \qquad (1)$$
and the Poincaré norm is given by
$$\|\mathbf{x}\|_{\mathbb{B}} = d_{\mathbb{B}}(\mathbf{0}, \mathbf{x}) = 2\operatorname{artanh}(\|\mathbf{x}\|).$$
We observe that the distance between a pair of points near the boundary of the Poincaré ball (Euclidean norm close to unity) grows much faster than the distance between points close to the center (Euclidean norm close to zero). In addition, the distance within the Poincaré ball varies smoothly with respect to the points $\mathbf{x}$ and $\mathbf{y}$. These properties are helpful for embedding discrete hierarchical structures such as trees in hyperbolic spaces and obtaining continuous embeddings that respect the tree metric structure. For instance, the origin of the Poincaré ball may be mapped to the root node of the tree, as the root node is relatively close to all other nodes. The leaf nodes can be placed near the boundary to ensure that they are relatively distant from other leaf nodes. Additionally, the shortest path between a pair of points is usually via a point closer to the origin, just as the shortest path between two nodes in a tree is via their parent nodes. Figure 1(b) shows a Poincaré disk ($\mathbb{B}^2$) embedding a tree with two regular subtrees.
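The boundary-versus-center behavior described above can be checked numerically. The following sketch (our own illustrative code) implements the Poincaré distance (1) and compares two pairs of points with the same Euclidean separation:

```python
import numpy as np

def poincare_distance(x, y):
    """Distance (1) in the Poincare ball model."""
    sq = np.sum((x - y) ** 2)
    denom = (1 - np.sum(x ** 2)) * (1 - np.sum(y ** 2))
    return np.arccosh(1 + 2 * sq / denom)

# Two pairs of points with the same Euclidean separation 0.1:
center_pair = poincare_distance(np.array([0.0, 0.0]), np.array([0.1, 0.0]))
boundary_pair = poincare_distance(np.array([0.85, 0.0]), np.array([0.95, 0.0]))

# The pair near the boundary is several times farther apart in hyperbolic
# distance, even though both pairs look equally close in Euclidean terms.
print(center_pair, boundary_pair)
```

The distance from the origin also matches the Poincaré norm $2\operatorname{artanh}(\|\mathbf{x}\|)$, which is an easy consistency check on the implementation.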
2.2 Hyperboloid model
Let $\mathbf{x} = (x_0, x_1, \ldots, x_n)$ and $\mathbf{y} = (y_0, y_1, \ldots, y_n)$ be vectors in $\mathbb{R}^{n+1}$. The Lorentz scalar product of two vectors $\mathbf{x}$ and $\mathbf{y}$ is defined as
$$\langle \mathbf{x}, \mathbf{y} \rangle_{\mathcal{L}} = \mathbf{x}^\top H \mathbf{y}, \qquad (2)$$
where $H$ is the $(n+1) \times (n+1)$ diagonal matrix
$$H = \begin{bmatrix} -1 & \mathbf{0}_n^\top \\ \mathbf{0}_n & I_n \end{bmatrix},$$
$\mathbf{0}_n$ is the $n$-dimensional zero column vector, and $I_n$ is the $n \times n$ identity matrix.
The hyperboloid model, also known as the Lorentz model of hyperbolic geometry, is given by
$$\mathbb{H}^n = \{\mathbf{x} \in \mathbb{R}^{n+1} : \langle \mathbf{x}, \mathbf{x} \rangle_{\mathcal{L}} = -1, \ x_0 > 0\}.$$
The model represents the upper sheet of an $n$-dimensional hyperboloid. From the constraint set, it can be observed that if $\mathbf{x} \in \mathbb{H}^n$, then $x_0 = \sqrt{1 + x_1^2 + \cdots + x_n^2} \geq 1$.
The distance between two points $\mathbf{x}, \mathbf{y} \in \mathbb{H}^n$ in the hyperboloid model is given by
$$d_{\mathbb{H}}(\mathbf{x}, \mathbf{y}) = \operatorname{arcosh}(-\langle \mathbf{x}, \mathbf{y} \rangle_{\mathcal{L}}). \qquad (3)$$
As stated earlier, the Poincaré ball and hyperboloid models are equivalent, and a mapping exists from one model to the other [21]. A point $\mathbf{x} = (x_0, x_1, \ldots, x_n)$ on the hyperboloid can be mapped to the Poincaré ball by
$$p(x_0, x_1, \ldots, x_n) = \frac{(x_1, \ldots, x_n)}{x_0 + 1}.$$
The reverse mapping, $p^{-1} : \mathbb{B}^n \to \mathbb{H}^n$, is defined as follows:
$$p^{-1}(\mathbf{y}) = \frac{(1 + \|\mathbf{y}\|^2, \ 2\mathbf{y})}{1 - \|\mathbf{y}\|^2}.$$
Figure 1(c) shows the two-dimensional hyperboloid model $\mathbb{H}^2$. It can be observed that the Poincaré disk is obtained as a stereographic projection of $\mathbb{H}^2$.
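The equivalence of the two models can be verified numerically. The sketch below (our own illustrative code) implements the Lorentz inner product (2), the hyperboloid distance (3), and the two mappings above, and checks that distances agree across models:

```python
import numpy as np

def lorentz_inner(x, y):
    # <x, y>_L = -x0*y0 + <rest, rest>, i.e., x^T H y with H = diag(-1, I).
    return -x[0] * y[0] + np.dot(x[1:], y[1:])

def hyperboloid_distance(x, y):
    # Distance (3) on the upper hyperboloid sheet.
    return np.arccosh(-lorentz_inner(x, y))

def to_poincare(x):
    # Map a hyperboloid point (x0, x1, ..., xn) into the Poincare ball.
    return x[1:] / (x[0] + 1)

def to_hyperboloid(y):
    # Inverse map from the Poincare ball back to the hyperboloid.
    sq = np.sum(y ** 2)
    return np.concatenate(([1 + sq], 2 * y)) / (1 - sq)

def poincare_distance(u, v):
    sq = np.sum((u - v) ** 2)
    return np.arccosh(1 + 2 * sq / ((1 - np.sum(u**2)) * (1 - np.sum(v**2))))

def lift(v):
    # Put (v1, ..., vn) on the hyperboloid by setting x0 = sqrt(1 + ||v||^2).
    return np.concatenate(([np.sqrt(1 + np.sum(v**2))], v))

x, y = lift(np.array([0.3, -0.2])), lift(np.array([-1.0, 0.7]))
# The two models are isometric: distances agree after mapping.
print(hyperboloid_distance(x, y), poincare_distance(to_poincare(x), to_poincare(y)))
```

A useful sanity check is that `to_hyperboloid(to_poincare(x))` recovers `x`, and that every lifted point satisfies $\langle \mathbf{x}, \mathbf{x} \rangle_{\mathcal{L}} = -1$.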
3 Low-rank parameterization in hyperbolic space
As discussed earlier, hyperbolic embeddings are typically suitable for representing elements of hierarchical structures such as nodes of trees [3] and complex networks [8], to name a few. When the task involves closely related hierarchical concepts, an additional low-rank structure may also exist among such hyperbolic embeddings. In this section, we propose a novel low-rank parameterization for hyperbolic embeddings. It should be noted that, unlike for Euclidean embeddings, incorporating a low-rank structure in the hyperbolic framework is nontrivial because of the hyperboloid constraints.
Let $W \in \mathbb{R}^{(n+1) \times m}$ be a matrix whose columns represent $(n+1)$-dimensional hyperbolic embeddings corresponding to $m$ elements from a given hierarchical structure. For notational convenience, we represent $W$ and its $i$-th column $\mathbf{w}_i \in \mathbb{H}^n$ as follows:
$$W = [\mathbf{w}_1, \ldots, \mathbf{w}_m], \qquad \mathbf{w}_i = \begin{bmatrix} w_{i0} \\ \tilde{\mathbf{w}}_i \end{bmatrix}, \quad \tilde{\mathbf{w}}_i \in \mathbb{R}^n.$$
We propose to approximate each $\mathbf{w}_i$ via a low-dimensional hyperbolic embedding $\mathbf{z}_i \in \mathbb{H}^r$ such that all the approximations share a common latent low-dimensional subspace. Mathematically, we propose the following rank-$r$ approximation $\mathbf{x}_i$ for $\mathbf{w}_i$:
$$\mathbf{w}_i \approx \mathbf{x}_i = \begin{bmatrix} z_{i0} \\ U \tilde{\mathbf{z}}_i \end{bmatrix},$$
where $U \in \mathbb{R}^{n \times r}$ has orthonormal columns ($U^\top U = I_r$), $\mathbf{z}_i = (z_{i0}, \tilde{\mathbf{z}}_i) \in \mathbb{H}^r$, and $r \leq n$. We discuss below the consequences of the proposed model.
Firstly, we obtain $\mathbf{x}_i \in \mathbb{H}^n$. This is because $U^\top U = I_r$ implies $\|U \tilde{\mathbf{z}}_i\| = \|\tilde{\mathbf{z}}_i\|$, and hence $\langle \mathbf{x}_i, \mathbf{x}_i \rangle_{\mathcal{L}} = -z_{i0}^2 + \|\tilde{\mathbf{z}}_i\|^2 = \langle \mathbf{z}_i, \mathbf{z}_i \rangle_{\mathcal{L}} = -1$ follows from the hyperboloid constraint on $\mathbf{z}_i$. Secondly, the matrix $\tilde{W} = [\tilde{\mathbf{w}}_1, \ldots, \tilde{\mathbf{w}}_m]$ is modeled as a low-rank matrix, as we approximate $\tilde{W}$ as $U\tilde{Z}$, where $\tilde{Z} = [\tilde{\mathbf{z}}_1, \ldots, \tilde{\mathbf{z}}_m] \in \mathbb{R}^{r \times m}$. Thirdly, the space complexity of the embeddings reduces from $O((n+1)m)$ (for $W$) to $O(nr + (r+1)m)$ (for $U$ and the $\mathbf{z}_i$).
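The constraint-preservation argument above can be verified numerically. In this illustrative sketch (variable names are ours, not the paper's), combining a low-dimensional hyperboloid point with an orthonormal subspace yields a valid high-dimensional hyperboloid point:

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 10, 3  # ambient and low-rank hyperbolic dimensions

# U: an n x r matrix with orthonormal columns (a point on the Stiefel manifold).
U, _ = np.linalg.qr(rng.standard_normal((n, r)))

# z = (z0, z_tilde): a low-dimensional hyperboloid point, with
# z0 = sqrt(1 + ||z_tilde||^2) so that <z, z>_L = -1.
z_tilde = rng.standard_normal(r)
z0 = np.sqrt(1 + z_tilde @ z_tilde)

# The proposed high-dimensional embedding keeps z0 and rotates z_tilde by U.
x = np.concatenate(([z0], U @ z_tilde))

# Because U^T U = I_r, ||U z_tilde|| = ||z_tilde||, so x lies on H^n.
lorentz = -x[0] ** 2 + x[1:] @ x[1:]
print(lorentz)  # -1.0 up to floating point
```

The point is that no extra projection step is needed: the parameterization satisfies the hyperboloid constraint by construction.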
We propose to learn the proposed low-rank parameterization of $W$ by solving the optimization problem:
$$\min_{U, \{\mathbf{z}_i\}} \ \sum_{i=1}^{m} \ell(\mathbf{w}_i, \mathbf{x}_i) \qquad (4)$$
$$\text{subject to} \quad U^\top U = I_r, \quad \mathbf{z}_i \in \mathbb{H}^r \ \text{for all } i,$$
where $\ell$ is a loss function that measures the quality of the proposed approximation. Let $f$ denote the objective function in (4).
We discuss the following three choices of $\ell$:
1) $\ell(\mathbf{w}_i, \mathbf{x}_i) = \|\tilde{\mathbf{w}}_i - U\tilde{\mathbf{z}}_i\|^2$: we penalize the Euclidean distance between $\tilde{\mathbf{w}}_i$ and $U\tilde{\mathbf{z}}_i$. This suffices because $w_{i0}$ and $z_{i0}$ are determined from the hyperboloid constraint given $\tilde{\mathbf{w}}_i$ and $\tilde{\mathbf{z}}_i$, respectively. We obtain a closed-form solution of (4) with this loss function, and the solution involves computing a rank-$r$ singular value decomposition of $\tilde{W}$. In Section 5, we denote this approach by the term Method1.
2) $\ell(\mathbf{w}_i, \mathbf{x}_i) = \|\mathbf{w}_i - \mathbf{x}_i\|^2$: we penalize the Euclidean distance between the (full) hyperbolic embeddings $\mathbf{w}_i$ and $\mathbf{x}_i$. This approach is denoted by the term Method2 in Section 5.
3) $\ell(\mathbf{w}_i, \mathbf{x}_i) = d_{\mathbb{H}}(\mathbf{w}_i, \mathbf{x}_i)^2$: since the columns of $W$ and $X = [\mathbf{x}_1, \ldots, \mathbf{x}_m]$ are hyperbolic embeddings, we penalize the (squared) hyperbolic distance (3) between the corresponding embeddings. We denote it by the term Method3.
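For the first loss, the closed-form solution can be sketched as follows. This is our illustrative reading of Method1 on synthetic data: take a rank-$r$ truncated SVD of the bottom block $\tilde{W}$ (the Eckart–Young optimum in Frobenius norm) and lift each column back onto the hyperboloid via the constraint. The exact factor split used in the paper may differ in details:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, r = 20, 50, 4

# Synthetic "given" hyperbolic embeddings W: columns w_i = (w_i0, w_tilde_i),
# with the first coordinate determined by the hyperboloid constraint.
W_tilde = rng.standard_normal((n, m))
W0 = np.sqrt(1 + np.sum(W_tilde ** 2, axis=0))
W = np.vstack([W0, W_tilde])

# Method1-style closed form: the best rank-r approximation of W_tilde
# in Frobenius norm is its truncated SVD (Eckart-Young theorem).
Uf, s, Vt = np.linalg.svd(W_tilde, full_matrices=False)
U = Uf[:, :r]                       # n x r subspace (a Stiefel point)
Z_tilde = np.diag(s[:r]) @ Vt[:r]   # r x m low-dimensional coordinates

# Lift each approximated column back onto the hyperboloid via the constraint.
Z0 = np.sqrt(1 + np.sum(Z_tilde ** 2, axis=0))
X = np.vstack([Z0, U @ Z_tilde])    # rank-r approximation of W

err = np.linalg.norm(W_tilde - U @ Z_tilde) / np.linalg.norm(W_tilde)
print(f"relative error on the bottom block: {err:.3f}")
```

Every column of `X` satisfies the hyperboloid constraint by construction, and the bottom block of `X` has rank at most $r$.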
It should be noted that the problem (4) is a nonlinear and nonconvex optimization problem, but it has well-studied structured constraints. In particular, the structured constraints can be cast as Riemannian manifolds. In the next section, we propose a Riemannian trust-region algorithm for solving (4) with the loss functions discussed in options 2) and 3) above.
4 Optimization
It should be noted that the variable $U$ in (4) belongs to the Stiefel manifold $\mathrm{St}(n, r)$ of $n \times r$ matrices with orthonormal columns [22], and each variable $\mathbf{z}_i$ belongs to the $r$-dimensional hyperbolic manifold $\mathbb{H}^r$. Consequently, the constraint set of the proposed optimization problem (4) is a smooth manifold $\mathcal{M} = \mathrm{St}(n, r) \times (\mathbb{H}^r)^m$, which is the Cartesian product of the Stiefel manifold and $m$ hyperbolic manifolds. The problem (4), therefore, boils down to the manifold optimization problem:
$$\min_{\mathbf{x} \in \mathcal{M}} f(\mathbf{x}), \qquad (5)$$
where $\mathbf{x}$ has the representation $\mathbf{x} = (U, \mathbf{z}_1, \ldots, \mathbf{z}_m)$ and $f : \mathcal{M} \to \mathbb{R}$ is a smooth function.
We tackle the problem (5) in the Riemannian optimization framework, which translates it into an unconstrained optimization problem over the nonlinear manifold $\mathcal{M}$, now endowed with a Riemannian geometry [23]. In particular, the Riemannian geometry imposes a metric (inner product) structure on $\mathcal{M}$, which in turn allows us to generalize notions like the shortest distance between points (on the manifold) or the translation of vectors on manifolds. Following this framework, many of the standard nonlinear optimization algorithms in the Euclidean space, e.g., steepest descent and trust-regions, generalize to Riemannian manifolds in a systematic manner. The Riemannian framework allows the development of computationally efficient algorithms on manifolds [23].
Both the Stiefel and hyperbolic manifolds are Riemannian manifolds, and their geometries have been individually well studied in the literature [13, 23]. Consequently, the manifold of interest $\mathcal{M}$ also has a Riemannian structure.
Below, we list some of the basic optimization-related notions that are required to solve (5) with the Riemannian trust-region algorithm, which exploits second-order information. The development of these notions follows the general treatment of manifold optimization discussed in [23, Chapter 7]. The Stiefel manifold related expressions follow from [23]. The hyperbolic manifold related expressions follow from [13].
4.1 Metric and tangent space notions
Optimization on $\mathcal{M}$ is worked out on the tangent space $T_{\mathbf{x}}\mathcal{M}$, which is the linearization of $\mathcal{M}$ at a specific point $\mathbf{x}$. It is a vector space associated with each element of the manifold.
As $\mathcal{M}$ is a product space, its tangent space is also the product of the tangent spaces of the Stiefel and hyperbolic manifolds. The characterization of the tangent space has the form:
$$T_{\mathbf{x}}\mathcal{M} = \{(\xi_U, \xi_1, \ldots, \xi_m) : \mathrm{sym}(U^\top \xi_U) = 0, \ \langle \mathbf{z}_i, \xi_i \rangle_{\mathcal{L}} = 0 \ \text{for all } i\}, \qquad (6)$$
where $\mathrm{sym}(A) = (A + A^\top)/2$ extracts the symmetric part of a matrix.
As discussed above, to impose a Riemannian structure on $\mathcal{M}$, a smooth metric (inner product) definition is required at each element of the manifold. A natural choice of the metric on $\mathcal{M}$ is the sum of the individual Riemannian metrics on the Stiefel and hyperbolic manifolds. More precisely, we have
$$\langle \xi, \eta \rangle_{\mathbf{x}} = \langle \xi_U, \eta_U \rangle + \sum_{i=1}^{m} \langle \xi_i, \eta_i \rangle_{\mathcal{L}}, \qquad (7)$$
where $\xi = (\xi_U, \xi_1, \ldots, \xi_m)$ and $\eta = (\eta_U, \eta_1, \ldots, \eta_m)$ are tangent vectors at $\mathbf{x}$, $\langle \cdot, \cdot \rangle$ is the standard inner product, and $\langle \cdot, \cdot \rangle_{\mathcal{L}}$ is the Lorentz inner product (2).
It should be emphasized that the metric in (7) endows the manifold $\mathcal{M}$ with a Riemannian structure and allows us to develop various other optimization notions in a straightforward manner.
One important ingredient required in optimization is the notion of an orthogonal projection operator from the ambient space onto the tangent space $T_{\mathbf{x}}\mathcal{M}$. Exploiting the product and Riemannian structure of $\mathcal{M}$, the projection operator is obtained as the Cartesian product of the individual tangent space projection operators on the Stiefel and hyperbolic manifolds, both of which are well known. Specifically, if $(A, \mathbf{a}_1, \ldots, \mathbf{a}_m)$ is an element of the ambient space, then its projection onto the tangent space $T_{\mathbf{x}}\mathcal{M}$ is given by
$$\Pi_{\mathbf{x}}(A, \mathbf{a}_1, \ldots, \mathbf{a}_m) = \big(A - U\,\mathrm{sym}(U^\top A), \ \mathbf{a}_1 + \langle \mathbf{z}_1, \mathbf{a}_1 \rangle_{\mathcal{L}}\, \mathbf{z}_1, \ \ldots, \ \mathbf{a}_m + \langle \mathbf{z}_m, \mathbf{a}_m \rangle_{\mathcal{L}}\, \mathbf{z}_m\big). \qquad (8)$$
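The two component projections in (8) can be sanity-checked numerically. This sketch (our code, assuming the standard embedded-geometry formulas for the Stiefel and hyperboloid projections) verifies that the projected vectors satisfy the tangency conditions of (6):

```python
import numpy as np

def sym(A):
    # Symmetric part of a square matrix.
    return (A + A.T) / 2

def proj_stiefel(U, A):
    # Orthogonal projection of ambient A onto the tangent space at U.
    return A - U @ sym(U.T @ A)

def lorentz_inner(x, y):
    return -x[0] * y[0] + x[1:] @ y[1:]

def proj_hyperboloid(z, a):
    # Projection onto T_z H^r w.r.t. the Lorentz metric; uses <z, z>_L = -1.
    return a + lorentz_inner(z, a) * z

rng = np.random.default_rng(2)
n, r = 8, 3
U, _ = np.linalg.qr(rng.standard_normal((n, r)))
xi = proj_stiefel(U, rng.standard_normal((n, r)))

z_tilde = rng.standard_normal(r)
z = np.concatenate(([np.sqrt(1 + z_tilde @ z_tilde)], z_tilde))
eta = proj_hyperboloid(z, rng.standard_normal(r + 1))

# Tangency checks from (6): sym(U^T xi) = 0 and <z, eta>_L = 0.
print(np.max(np.abs(sym(U.T @ xi))), lorentz_inner(z, eta))
```

The hyperboloid case is a one-line computation: since $\langle \mathbf{z}, \mathbf{z} \rangle_{\mathcal{L}} = -1$, adding $\langle \mathbf{z}, \mathbf{a} \rangle_{\mathcal{L}}\, \mathbf{z}$ exactly cancels the component of $\mathbf{a}$ along $\mathbf{z}$ in the Lorentz metric.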
4.2 Retraction
An optimization algorithm on a manifold requires computing a search direction and then moving along it. While the computation of the search direction follows from the notions in Section 4.1, in this section we develop the notion of "moving" along a search direction on the manifold. This is characterized by the retraction operation, which is a generalization of the exponential map (that follows the geodesic) on the manifold. The retraction operator takes a tangent vector at $\mathbf{x}$ and outputs an element of the manifold by approximating the geodesic [23, Definition 4.1.1].
Exploiting the product structure of $\mathcal{M}$, a natural expression of the retraction operator is obtained as the Cartesian product of the individual retraction operations on the Stiefel and hyperbolic manifolds. If $\xi = (\xi_U, \xi_1, \ldots, \xi_m) \in T_{\mathbf{x}}\mathcal{M}$, then the retraction operation is given by
$$R_{\mathbf{x}}(\xi) = \big(\mathrm{uf}(U + \xi_U), \ \exp_{\mathbf{z}_1}(\xi_1), \ \ldots, \ \exp_{\mathbf{z}_m}(\xi_m)\big), \qquad (9)$$
where $\exp_{\mathbf{z}_i}(\xi_i) = \cosh(\|\xi_i\|_{\mathcal{L}})\, \mathbf{z}_i + \sinh(\|\xi_i\|_{\mathcal{L}})\, \xi_i / \|\xi_i\|_{\mathcal{L}}$ with $\|\xi_i\|_{\mathcal{L}} = \sqrt{\langle \xi_i, \xi_i \rangle_{\mathcal{L}}}$, for all $i$, and $\mathrm{uf}(\cdot)$ extracts the orthogonal factor of a matrix, i.e., $\mathrm{uf}(A) = A (A^\top A)^{-1/2}$.
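The retraction can be checked to stay on the manifold. The following sketch assumes the standard orthogonal-factor retraction on the Stiefel manifold and the exponential map on the hyperboloid (our illustrative code):

```python
import numpy as np

def uf(A):
    # Orthogonal factor of A, uf(A) = A (A^T A)^(-1/2), computed via SVD.
    Uf, _, Vt = np.linalg.svd(A, full_matrices=False)
    return Uf @ Vt

def lorentz_inner(x, y):
    return -x[0] * y[0] + x[1:] @ y[1:]

def exp_hyperboloid(z, xi):
    # Exponential map at z for a tangent vector xi; tangent vectors on the
    # hyperboloid are spacelike, so <xi, xi>_L > 0.
    nrm = np.sqrt(lorentz_inner(xi, xi))
    return np.cosh(nrm) * z + np.sinh(nrm) * xi / nrm

rng = np.random.default_rng(3)
n, r = 8, 3

# Stiefel part: retract U + xi back to orthonormal columns.
U, _ = np.linalg.qr(rng.standard_normal((n, r)))
xi = rng.standard_normal((n, r)) * 0.1
U_new = uf(U + xi)

# Hyperbolic part: move along a projected tangent direction.
z_tilde = rng.standard_normal(r)
z = np.concatenate(([np.sqrt(1 + z_tilde @ z_tilde)], z_tilde))
a = rng.standard_normal(r + 1) * 0.1
eta = a + lorentz_inner(z, a) * z  # tangent projection at z
z_new = exp_hyperboloid(z, eta)

print(np.allclose(U_new.T @ U_new, np.eye(r)),
      np.isclose(lorentz_inner(z_new, z_new), -1.0))
```

Both outputs land exactly back on their manifolds: `U_new` has orthonormal columns and `z_new` satisfies $\langle \mathbf{z}, \mathbf{z} \rangle_{\mathcal{L}} = -1$, because $\cosh^2 - \sinh^2 = 1$.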
4.3 Riemannian gradient and Hessian computations
Finally, we require the expressions of the Riemannian gradient and Hessian of $f$ on $\mathcal{M}$. To this end, we first compute the derivatives of $f$ in the Euclidean space. Let $\nabla f$ be the first derivative of $f$, and let $\mathrm{D}\nabla f[\xi]$ be its Euclidean directional derivative along $\xi$. The Riemannian gradient is obtained by projecting the (suitably transformed) Euclidean gradient onto the tangent space:
$$\mathrm{grad}\, f(\mathbf{x}) = \Pi_{\mathbf{x}}\big(\nabla_U f, \ H \nabla_{\mathbf{z}_1} f, \ \ldots, \ H \nabla_{\mathbf{z}_m} f\big), \qquad (10)$$
where $H$ is the matrix in (2), which accounts for the Lorentz metric on the hyperbolic components. The expressions of the partial derivatives for the squared Euclidean distance based loss functions mentioned in Section 3 are straightforward to compute. When the loss function is based on the squared hyperbolic distance (3), the expressions for the gradient and its directional derivative are discussed in [24].
4.4 Riemannian trust-region algorithm
The Riemannian trust-region (TR) algorithm approximates the function with a second-order model at every iteration. The second-order model (called the trust-region subproblem) makes use of the Riemannian gradient and Hessian computations shown in Section 4.3. The trust-region subproblem is then solved efficiently (using an iterative quadratic optimization solver, e.g., the truncated conjugate gradient algorithm) to obtain a candidate search direction. If the candidate search direction leads to an appreciable decrease in the function $f$, it is accepted; else it is rejected [23, Chapter 7]. Algorithm 1 summarizes the key steps of the proposed trust-region algorithm for solving (5).
Input: $(n+1)$-dimensional hyperbolic embeddings $W$ and rank $r$.
Initialize $\mathbf{x} = (U, \mathbf{z}_1, \ldots, \mathbf{z}_m) \in \mathcal{M}$.
repeat
1: Compute $f(\mathbf{x})$.
2: Riemannian TR step: compute a search direction that minimizes the trust-region subproblem. It makes use of the Euclidean gradient and its directional derivative, and their Riemannian counterparts (10).
3: Update $\mathbf{x}$ (retraction step) using (9).
until convergence
Output: $U$ and $\{\mathbf{z}_i\}_{i=1}^{m}$.
4.5 Computational complexity
The manifold-related ingredients cost $O(nr^2 + rm)$. For example, the computation of the Riemannian gradient in (10) involves only the tangent space projection operation, which costs $O(nr^2 + rm)$. Similarly, the retraction operation costs $O(nr^2 + rm)$.
The computation of $f$ and its derivatives costs $O(nrm)$ (for all three choices of the loss function in Section 3). The overall computational cost per iteration of our implementation is, therefore, $O(nrm + nr^2)$.
4.6 Numerical implementation
We use the Matlab toolbox Manopt [25] to implement Algorithm 1 for (5). Manopt comes with a well-implemented generic Riemannian trust-region solver, which can be used to solve (5) by providing the necessary optimization-related ingredients mentioned earlier. The Matlab codes are available at https://pratikjawanpuria.com.
5 Experiments
In this section, we evaluate the performance of the proposed low-rank parameterization of hyperbolic embeddings. In particular, we compare the quality of the low-rank hyperbolic embeddings obtained by minimizing the three different loss functions discussed in Section 3.
Experimental setup and evaluation metric
We are provided with hyperbolic embeddings corresponding to a hierarchical entity such as the nodes of a tree or a graph. We also have the ground truth information of the given tree (or graph). Let $G = (V, E)$ represent the ground truth, where $V$ is the set of nodes and $E \subseteq V \times V$ is the set of edges between the nodes. Hyperbolic embeddings can be employed to reconstruct the ground truth, since a low hyperbolic distance (3) between a pair of nodes implies a high probability of an edge between them. However, such a reconstruction may also incorporate errors such as missing an edge or adding a nonexistent edge.
We measure the quality of the hyperbolic embeddings as follows: let $u$ and $v$ be a pair of nodes in $V$ such that $(u, v) \in E$. Let $\mathbf{x}_u$ and $\mathbf{x}_v$ be the hyperbolic embeddings corresponding to $u$ and $v$, respectively. We compute the hyperbolic distance (3) between $\mathbf{x}_u$ and $\mathbf{x}_v$ and rank it among the distances corresponding to all untrue edges from $u$, i.e., $\{d_{\mathbb{H}}(\mathbf{x}_u, \mathbf{x}_{v'}) : (u, v') \notin E\}$. We then compute the mean average precision (MAP) of the ranking. The MAP score is a commonly employed metric for evaluating graph embeddings [3, 13, 26, 27]. Overall, we assess the quality of the proposed low-rank approximation by comparing the MAP score of the original high-dimensional embeddings with that of the low-rank embeddings.
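The MAP evaluation can be sketched on a toy graph. This is our illustrative implementation of the common graph-reconstruction MAP (in the style used by [3, 13]): for each node, rank all other nodes by embedding distance and average the precision at each true neighbor's rank. Names and the toy data are ours:

```python
import numpy as np

def average_precision(dists_from_u, neighbor_ids, u):
    # Rank every other node by distance from u (ascending, self excluded).
    order = [v for v in np.argsort(dists_from_u) if v != u]
    hits, precisions = 0, []
    for rank, v in enumerate(order, start=1):
        if v in neighbor_ids:
            hits += 1
            precisions.append(hits / rank)  # precision at this neighbor's rank
    return np.mean(precisions)

def map_score(dist, edges, n):
    # dist: n x n symmetric matrix of pairwise embedding distances.
    neighbors = [set() for _ in range(n)]
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    aps = [average_precision(dist[u], neighbors[u], u)
           for u in range(n) if neighbors[u]]
    return float(np.mean(aps))

# Toy check: a 4-node path graph 0-1-2-3 embedded on a line, so that every
# true neighbor is ranked before every non-neighbor -> a perfect MAP of 1.0.
pos = np.array([[0.0], [1.0], [2.0], [3.0]])
dist = np.abs(pos - pos.T)
edges = [(0, 1), (1, 2), (2, 3)]
print(map_score(dist, edges, 4))  # -> 1.0
```

In the paper's setting, `dist` would be the pairwise hyperbolic distances (3) between the learned (or low-rank approximated) embeddings.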
We obtain the original hyperbolic embeddings from the implementation provided by [3]. It should be noted that [3] learns hyperbolic embeddings in the Poincaré ball model; we employ the transformation discussed in Section 2 to obtain the corresponding embeddings in the hyperboloid model. It should be mentioned that though [13] directly learns hyperbolic embeddings in the hyperboloid model, its implementation is not available.
Table 1: MAP scores of Method1, Method2, and Method3 on the mammal dataset for different values of the rank.
Datasets
We perform experiments on the mammal and noun subtrees of the WordNet database [20]. WordNet is a lexical database that, among other things, provides relations between pairs of concepts.
The ‘mammal’ dataset has mammal as the root node, with the ‘is-a’ (hypernymy) relationship defining the edges. For example, it has relationships such as ‘rodent’ is-a ‘mammal’ and ‘squirrel’ is-a ‘rodent’. Hence, there exists an edge from the ‘mammal’ node to the ‘rodent’ node and from the ‘rodent’ node to the ‘squirrel’ node. The WordNet mammal subtree consists of nodes and edges. A part of this subtree is displayed in Figure 1(a).
Similarly, the ‘noun’ dataset is also a subtree of the WordNet database. Examples in this subtree include ‘photograph’ is-a ‘object’, ‘bronchitis’ is-a ‘disease’, and ‘disease’ is-a ‘entity’. It consists of nodes and edges.
Results
We compare the performance of the proposed low-rank approximation of hyperbolic embeddings with the three loss functions discussed in Section 3. Table 1 reports the results on the mammal dataset with different values of the rank. The original hyperbolic embeddings for the mammal subtree achieve a MAP score of . We observe that all three methods obtain MAP scores very close to those of the original embeddings at moderate ranks. In addition, Method1 and Method2 perform well even in the very low-rank setting. This hints that penalizing with the Euclidean distance may be more suitable than the hyperbolic distance (3) for approximating hyperbolic embeddings when the given rank is very small.
The results on the noun dataset are reported in Table 2. This dataset is challenging because of its scale and the relatively low reconstruction performance of the original hyperbolic embeddings. The original hyperbolic embeddings for the noun subtree achieve a MAP score of . We observe that at rank our methods are able to get within of the performance obtained by the original embeddings.
Table 2: MAP scores of Method1, Method2, and Method3 on the noun dataset for different values of the rank.
6 Conclusion and Future work
Recently, hyperbolic embeddings have gained popularity in many machine learning applications because of their ability to model complex networks. In this paper, we have looked at scenarios where hyperbolic embeddings are potentially high dimensional and have shown how to compress them using a low-rank factorization model. While low-rank decompositions of Euclidean embeddings are well known, those of hyperbolic embeddings have not been well studied. To this end, we have proposed a systematic approach to compute low-rank approximations of hyperbolic embeddings. Our approach decomposes a high-dimensional hyperbolic embedding into a product of a low-dimensional subspace and a smaller-dimensional hyperbolic embedding.
We modeled the learning problem as an optimization problem on manifolds. Various optimization-related notions were presented to implement a Riemannian trust-region algorithm. Our experiments showed the benefit of the proposed low-rank approximations on real-world datasets.
As a future research direction, we would like to explore how low-rank hyperbolic embeddings are useful in downstream applications. Another research direction could be developing methods to compute a "good" rank for hyperbolic embeddings.
References

[1] M. M. Bronstein, J. Bruna, Y. LeCun, A. Szlam, and P. Vandergheynst. Geometric deep learning: Going beyond Euclidean data. IEEE Signal Processing Magazine, 34(4):18–42, 2017.
[2] A. Muscoloni, J. M. Thomas, S. Ciucci, G. Bianconi, and C. V. Cannistraci. Machine learning meets complex networks via coalescent embedding in the hyperbolic space. Nature Communications, 8(1):1615, 2017.
[3] M. Nickel and D. Kiela. Poincaré embeddings for learning hierarchical representations. In Neural Information Processing Systems Conference (NIPS), 2017.
[4] F. Sala, C. D. Sa, A. Gu, and C. Ré. Representation tradeoffs for hyperbolic embeddings. In International Conference on Machine Learning (ICML), 2018.
[5] A. Gu, F. Sala, B. Gunel, and C. Ré. Learning mixed-curvature representations in product spaces. In International Conference on Learning Representations (ICLR), 2019.
[6] B. Dhingra, C. J. Shallue, M. Norouzi, A. M. Dai, and G. E. Dahl. Embedding text in hyperbolic spaces. In Twelfth Workshop on Graph-Based Methods for Natural Language Processing (ACL), pages 59–69, 2018.
[7] A. Tifrea, G. Bécigneul, and O.-E. Ganea. Poincaré GloVe: Hyperbolic word embeddings. In International Conference on Learning Representations (ICLR), 2019.
[8] D. Krioukov, F. Papadopoulos, M. Kitsak, A. Vahdat, and M. Boguñá. Hyperbolic geometry of complex networks. Physical Review E, 82(3):036106, 2010.
[9] M. Hamann. On the tree-likeness of hyperbolic spaces. Mathematical Proceedings of the Cambridge Philosophical Society, 164(2):345–361, 2018.
[10] Y. Tay, L. A. Tuan, and S. C. Hui. Hyperbolic representation learning for fast and efficient neural question answering. In Web Search and Data Mining (WSDM), 2018.
[11] T. D. Q. Vinh, Y. Tay, S. Zhang, G. Cong, and X.-L. Li. Hyperbolic recommender systems. Technical report, arXiv:1809.01703, 2018.
[12] B. P. Chamberlain, S. R. Hardwick, D. R. Wardrope, F. Dzogang, F. Daolio, and S. Vargas. Scalable hyperbolic recommender systems. Technical report, arXiv:1902.08648, 2019.
[13] M. Nickel and D. Kiela. Learning continuous hierarchies in the Lorentz model of hyperbolic geometry. In International Conference on Machine Learning (ICML), 2018.
[14] O.-E. Ganea, G. Bécigneul, and T. Hofmann. Hyperbolic entailment cones for learning hierarchical embeddings. In International Conference on Machine Learning (ICML), 2018.
[15] O.-E. Ganea, G. Bécigneul, and T. Hofmann. Hyperbolic neural networks. In Neural Information Processing Systems Conference (NIPS), 2018.
[16] B. P. Chamberlain, J. R. Clough, and M. P. Deisenroth. Neural embeddings of graphs in hyperbolic space. Technical report, arXiv:1705.10359, 2017.
[17] C. Gulcehre, M. Denil, M. Malinowski, A. Razavi, R. Pascanu, K. M. Hermann, P. Battaglia, V. Bapst, D. Raposo, A. Santoro, and N. de Freitas. Hyperbolic attention networks. In International Conference on Learning Representations (ICLR), 2019.
[18] J. Anderson. Hyperbolic Geometry. Springer-Verlag London, 2005.
[19] J. Ratcliffe. Foundations of Hyperbolic Manifolds. Springer-Verlag New York, 2006.
[20] G. A. Miller. WordNet: A lexical database for English. Communications of the ACM, 38(11):39–41, 1995.
[21] B. Wilson and M. Leimeister. Gradient descent in hyperbolic space. Technical report, arXiv:1805.08207, 2018.
[22] A. Edelman, T. A. Arias, and S. T. Smith. The geometry of algorithms with orthogonality constraints. SIAM Journal on Matrix Analysis and Applications, 20(2):303–353, 1998.
[23] P.-A. Absil, R. Mahony, and R. Sepulchre. Optimization Algorithms on Matrix Manifolds. Princeton University Press, Princeton, NJ, 2008.
[24] X. Pennec. Hessian of the Riemannian squared distance. Technical report, Université Côte d'Azur and Inria Sophia-Antipolis Méditerranée, 2017.
[25] N. Boumal, B. Mishra, P.-A. Absil, and R. Sepulchre. Manopt, a Matlab toolbox for optimization on manifolds. Journal of Machine Learning Research, 15(Apr):1455–1459, 2014.
[26] A. Bordes, N. Usunier, A. García-Durán, J. Weston, and O. Yakhnenko. Translating embeddings for modeling multi-relational data. In Neural Information Processing Systems Conference (NIPS), 2013.
[27] M. Nickel, L. Rosasco, and T. A. Poggio. Holographic embeddings of knowledge graphs. In AAAI Conference on Artificial Intelligence, 2016.