1 Introduction
Hyperbolic space is roomy. It can embed hierarchical structures uniformly and with arbitrarily low distortion [1, 2]. Euclidean space cannot achieve comparably low distortion even using an unbounded number of dimensions [3].
Embedding objects in hyperbolic spaces has found a myriad of applications in exploratory science, from visualizing hierarchical structures such as social networks and link prediction for symbolic data [4, 5] to natural language processing [6, 7], brain networks [8], gene ontologies [9], and recommender systems [10, 11]. Commonly in these applications there is a treelike data structure which encodes similarity between a number of entities. We experimentally observe some relational information about the structure, and the data mining task is to find a geometric representation of the entities consistent with the experimental information. In other words, the task is to compute an embedding. This concept is closely related to classical distance geometry problems and multidimensional scaling (MDS) [12] in Euclidean spaces [13, 14].
The observations can be metric or nonmetric. Metric observations convey (inexact) distances; for example, in internet distance embedding a small subset of nodes with complete distance information are used to estimate the remaining distances [15]. Nonmetric observations tell us which pairs of entities are closer and which are further apart. The measure of closeness is typically derived from domain knowledge; for example, word embedding algorithms aim to relate semantically close words and their topics [16, 17].
In scientific applications it is desirable to compute good low-dimensional hyperbolic embeddings. Insisting on low dimension not only facilitates visualization, but also promotes simple explanations of the phenomenon under study. However, in most works that leverage hyperbolic geometry the embedding technique is not the primary focus, and the related computations are often ad hoc. The situation is different in the Euclidean case, where the notions of MDS, Euclidean distance matrices (EDMs), and their characterization in terms of positive semidefinite Gram matrices play a central role in the design and analysis of algorithms [13, 18].
In this paper, we focus on computing low-dimensional hyperbolic embeddings. While there exists a strong link between Euclidean geometry and positive (semi)definiteness, we prove that what we call hyperbolic distance matrices (HDMs) can also be characterized via semidefinite constraints. Unlike in the Euclidean case, the hyperbolic analog of the Euclidean Gram matrix is a linear combination of two rank-constrained Gramians. Together with a new spectral factorization method to directly estimate the hyperbolic points, this characterization gives rise to flexible embedding algorithms which can handle diverse constraints and mix metric and nonmetric data.
1.1 Related Work
The usefulness of hyperbolic space stems from its ability to efficiently represent the geometry of complex networks [19, 20]. Embedding metric graphs with underlying hyperbolic geometry has applications in word embedding [16, 17], geographic routing [21], routing in dynamical graphs [22], odor embedding [23], internet network embedding for delay estimation and server selection [15, 24], to name a few. In the literature such problems are known as hyperbolic multidimensional scaling [25].
There exist Riemannian gradient-based approaches [26, 5, 7] which can be utilized to directly estimate such embeddings from metric measurements [27]. We emphasize that these methods are iterative and only guaranteed to return a locally optimal solution. In Section 3.3, we prove a closed-form spectral factorization method that finds the points from an HDM.
Nonmetric (or order) embedding has been proposed to learn visual-semantic hierarchies from ordered input pairs by embedding symbolic objects into a low-dimensional space [28]. In the Euclidean case, stochastic triplet embeddings [29], crowd kernels [30], and generalized nonmetric MDS [31] are some well-known order embedding algorithms. For embedding hierarchical structures, Ganea et al. model order relations as a family of nested geodesically convex cones [32]. Zhou et al. [23] show that odors can be efficiently embedded in hyperbolic space provided that the similarity between odors is based on the statistics of their co-occurrence within natural mixtures.
1.2 Contributions
We summarize our main contributions as follows:

Semidefinite characterization of HDMs: We introduce HDMs as an elegant tool to formalize distance problems in hyperbolic space; this is analogous to Euclidean distance matrices (EDMs). We derive a semidefinite characterization of HDMs by studying the properties of hyperbolic Gram matrices, the matrices of Lorentzian (indefinite) inner products of points in a hyperbolic space.

A flexible algorithm for hyperbolic distance geometry problems (HDGPs):
We use the semidefinite characterization to propose a flexible embedding algorithm based on semidefinite programming. It allows us to seamlessly combine metric and nonmetric problems in one framework and to handle a diverse set of constraints. The nonmetric and metric measurements enter as linear and quadratic constraints, respectively.

Spectral factorization and projection: We compute the final hyperbolic embeddings with a simple, closed-form spectral factorization method. We also propose a suboptimal method to find a low-rank approximation of the hyperbolic Gramian in the desired dimension.
Table 1: Parallels between Euclidean and hyperbolic distance geometry tools.

Euclidean                                     Hyperbolic
Euclidean distance matrix (EDM)               Hyperbolic distance matrix (HDM)
Gramian                                       H-Gramian
Semidefinite relaxation to complete an EDM    Semidefinite relaxation to complete an HDM
Spectral factorization of a Gramian           Spectral factorization of an H-Gramian
  to estimate the points                        to estimate the points
1.3 Paper Organization
We first briefly review the analytical models of hyperbolic space and formalize hyperbolic distance geometry problems (HDGPs) in Section 2. Our framework parallels semidefinite approaches for Euclidean distance problems, as per Table 1. In the ’Loid model, we define hyperbolic distance matrices to compactly encode hyperbolic distance measurements. We show that an HDM can be characterized in terms of the matrix of indefinite inner products, the hyperbolic Gramian. In Section 3, we propose a semidefinite representation of hyperbolic Gramians, and in turn HDMs. We cast HDGPs as rank-constrained semidefinite programs, which are then convexified by relaxing the rank constraints. We develop a spectral method to find a suboptimal low-rank approximation of the hyperbolic Gramian in the correct embedding dimension. Lastly, we propose a closed-form factorization method to estimate the embedded points. This framework lets us tackle a variety of embedding problems, as shown in Section 4, with real (odors) and synthetic (random trees) data. The proofs of propositions and derivations of the proposed algorithms are given in the appendix.
Notation
We use bold lowercase letters for vectors, $\mathbf{x}$, and capital letters for matrices, $X$. We enumerate the elements of $\mathbf{x}$ as $x_i$, and let $[N]$ be short for the set $\{1, \dots, N\}$. We define the set of asymmetric pairs as $\mathcal{T} = \{(i, j) : i < j,\ i, j \in [N]\}$. We denote the $i$-th standard basis vector in $\mathbb{R}^n$ by $\mathbf{e}_i$. The norms $\|\cdot\|_F$ and $\|\cdot\|$ are the Frobenius and operator norms, and $\|\cdot\|_{2,1}$ is the mixed norm that sums the $\ell_2$ norms of the columns. We write $\hat{\mathbb{E}}$ for the empirical expectation of a random variable, and we use projections onto the span of the top eigenvectors of positive semidefinite matrices in the low-rank approximation step. Finally, $\mathbf{0}$ and $\mathbf{1}$ are the all-zero and all-ones vectors of appropriate dimensions.

2 Hyperbolic Distance Geometry Problems
2.1 Hyperbolic Space
Hyperbolic space is a simply connected Riemannian manifold with constant negative curvature [33, 34]. In comparison, Euclidean and elliptic geometries are spaces with zero (flat) and constant positive curvature, respectively. There are five isometric models for hyperbolic space: half-space, Poincaré (interior of the disk), hemisphere, Klein, and ’Loid [33] (Table 2 and Figure 1). Each provides unique insights into the properties of hyperbolic geometry.
Table 2: Models of $d$-dimensional hyperbolic space.
In the machine learning community the most popular models of hyperbolic geometry are Poincaré and ’Loid. We work in the ’Loid model as it has a simple, tractable distance function. It lets us cast the HDGP (formally defined in Section 2.2) as a rank-constrained semidefinite program. Importantly, it also leads to a closed-form embedding by a spectral method. For better visualization, however, we map the final embedded points to the Poincaré model via the stereographic projection; see Sections 4 and 2.1.2.

2.1.1 ’Loid Model
Let $\mathbf{x}$ and $\mathbf{y}$ be vectors in $\mathbb{R}^{d+1}$ with $d \ge 1$. The Lorentzian inner product of $\mathbf{x}$ and $\mathbf{y}$ is defined as

$[\mathbf{x}, \mathbf{y}] = \mathbf{x}^\top H \mathbf{y}$,    (1)

where

$H = \begin{bmatrix} -1 & \mathbf{0}^\top \\ \mathbf{0} & I_d \end{bmatrix}$.    (2)

This is an indefinite inner product on $\mathbb{R}^{d+1}$. The Lorentzian inner product has almost all the properties of ordinary inner products, except that $[\mathbf{x}, \mathbf{x}]$ can be positive, zero, or negative. The vector space $\mathbb{R}^{d+1}$ equipped with the Lorentzian inner product (1) is called a Lorentzian space. In a Lorentzian space we can define notions similar to the Gram matrix, adjoint, and unitary matrices known from Euclidean spaces, as follows.
Definition (H-adjoint [35]). The H-adjoint $A^{[*]}$ of an arbitrary matrix $A \in \mathbb{R}^{(d+1) \times (d+1)}$ is characterized by

$[A\mathbf{x}, \mathbf{y}] = [\mathbf{x}, A^{[*]}\mathbf{y}]$ for all $\mathbf{x}, \mathbf{y} \in \mathbb{R}^{d+1}$.

Equivalently,

$A^{[*]} = H A^\top H$.    (3)

Definition (H-unitary matrix [35]). An invertible matrix $U$ is called H-unitary if $U^{[*]} = U^{-1}$.

The ’Loid model of $d$-dimensional hyperbolic space is a Riemannian manifold $\mathbb{L}^d = (L^d, g)$, where

$L^d = \{\mathbf{x} \in \mathbb{R}^{d+1} : [\mathbf{x}, \mathbf{x}] = -1,\ x_0 > 0\}$

and $g = H$ is the Riemannian metric.
Definition (Lorentz Gramian, H-Gramian). Let the columns of $X = [\mathbf{x}_1, \dots, \mathbf{x}_N]$ be the positions of $N$ points in $\mathbb{R}^{d+1}$ (resp. $\mathbb{L}^d$). We define their corresponding Lorentz Gramian (resp. H-Gramian) as

$G = \big[\,[\mathbf{x}_i, \mathbf{x}_j]\,\big]_{i,j=1}^{N} = X^\top H X$,

where $H$ is the indefinite matrix given by (2). The subtle difference between the Lorentz Gramian (defined for points in $\mathbb{R}^{d+1}$) and the H-Gramian (defined only on the manifold $\mathbb{L}^d$) will be important for the low-rank projection and the spectral factorization steps in Section 3. The indefinite inner product (1) also determines the distance between $\mathbf{x}, \mathbf{y} \in \mathbb{L}^d$, as

$d(\mathbf{x}, \mathbf{y}) = \operatorname{acosh}(-[\mathbf{x}, \mathbf{y}])$.    (4)
2.1.2 Poincaré Model
In the Poincaré model the points reside in the unit Euclidean ball, as shown in Table 2. The distance between two points $\mathbf{x}$ and $\mathbf{y}$ in the unit ball is given by

$d(\mathbf{x}, \mathbf{y}) = \operatorname{acosh}\left(1 + 2\,\frac{\|\mathbf{x} - \mathbf{y}\|^2}{(1 - \|\mathbf{x}\|^2)(1 - \|\mathbf{y}\|^2)}\right)$.    (5)
The isometric map between the ’Loid and the Poincaré models is called the stereographic projection. For $\mathbf{x} = (x_0, x_1, \dots, x_d)^\top \in \mathbb{L}^d$, we have

$\mathbf{y} = \frac{(x_1, \dots, x_d)^\top}{1 + x_0}$.    (6)

The inverse of the stereographic projection is given by

$\mathbf{x} = \frac{\big(1 + \|\mathbf{y}\|^2,\ 2\mathbf{y}^\top\big)^\top}{1 - \|\mathbf{y}\|^2}$.    (7)
The isometry between the ’Loid and Poincaré models makes them equivalent in their embedding capabilities. However, the Poincaré model facilitates visualization of the embedded points in a bounded disk, whereas the ’Loid model is an unbounded space.
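For concreteness, here is a minimal numpy sketch of these maps; the function names are ours and the clipping guards are implementation details:

```python
import numpy as np

def lorentz_inner(x, y):
    """Lorentzian inner product [x, y] = -x_0*y_0 + <x_1:, y_1:>, as in (1)."""
    return -x[0] * y[0] + x[1:] @ y[1:]

def loid_distance(x, y):
    """Hyperbolic distance (4) between two points on the 'Loid."""
    return np.arccosh(max(-lorentz_inner(x, y), 1.0))  # clip guards round-off

def to_poincare(x):
    """Stereographic projection (6) from the 'Loid to the Poincare ball."""
    return x[1:] / (1.0 + x[0])

def from_poincare(y):
    """Inverse stereographic projection (7), back onto the 'Loid."""
    s = y @ y
    return np.concatenate(([1.0 + s], 2.0 * y)) / (1.0 - s)
```

A quick sanity check: `from_poincare(to_poincare(x))` recovers `x` for any point on the ’Loid, and distances computed before and after the projection agree, reflecting the isometry between the two models.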
2.2 Hyperbolic Distance Problems
In a metric hyperbolic distance problem, we want to find a point set $\mathbf{x}_1, \dots, \mathbf{x}_N \in \mathbb{L}^d$ such that

$d(\mathbf{x}_i, \mathbf{x}_j) = d_{ij}$

for a subset of measured distances $d_{ij}$.
In many applications we have access to the true distances only through an unknown nonlinear map $f$; examples are connectivity strength of neurons [36] or odor co-occurrence statistics [23]. If all we know is that $f$ is a monotonically increasing function, then only the ordinal information remains intact: $d(\mathbf{x}_i, \mathbf{x}_j) \le d(\mathbf{x}_k, \mathbf{x}_l)$ if and only if $f(d(\mathbf{x}_i, \mathbf{x}_j)) \le f(d(\mathbf{x}_k, \mathbf{x}_l))$. This leads to nonmetric problems in which the measurements are in the form of binary comparisons [31].
For a set of binary distance comparisons of the form $d(\mathbf{x}_i, \mathbf{x}_j) \le d(\mathbf{x}_k, \mathbf{x}_l)$, we define the set of ordinal distance measurements as the set $\mathcal{O}$ of index quadruples $(i, j, k, l)$ realizing such comparisons.
We are now in a position to give a unified definition of metric and nonmetric embedding problems in a hyperbolic space.
Problem 1 (HDGP).
A hyperbolic distance geometry problem aims to find $\mathbf{x}_1, \dots, \mathbf{x}_N \in \mathbb{L}^d$, given

a subset of pairwise distances, $d_{ij}$ for $(i, j) \in \Omega$, such that $d(\mathbf{x}_i, \mathbf{x}_j) = d_{ij}$,

and/or a subset of ordinal distance measurements, $(i, j, k, l) \in \mathcal{O}$, such that $d(\mathbf{x}_i, \mathbf{x}_j) \le d(\mathbf{x}_k, \mathbf{x}_l)$,

where $\Omega \subseteq [N] \times [N]$ and $\mathcal{O} \subseteq [N]^4$.
We denote the complete sets of metric and nonmetric measurements by $\bar{\Omega}$ and $\bar{\mathcal{O}}$.
3 Hyperbolic Distance Matrices
We now introduce hyperbolic distance matrices, in analogy with Euclidean distance matrices, to compactly encode the interpoint distances of a set of points $\mathbf{x}_1, \dots, \mathbf{x}_N \in \mathbb{L}^d$. The hyperbolic distance matrix (HDM) corresponding to the list of points $X = [\mathbf{x}_1, \dots, \mathbf{x}_N]$ is defined as $D = D(X)$.
The $(i, j)$-th element of $D$ is the hyperbolic distance between $\mathbf{x}_i$ and $\mathbf{x}_j$, given by $d_{ij} = d(\mathbf{x}_i, \mathbf{x}_j)$, with $d_{ii} = 0$ for all $i \in [N]$.
HDMs are characterized by Lorentzian inner products, which allows us to leverage the definition of an H-Gramian (Section 2.1.1). Given points $X$ with H-Gramian $G$, we compactly write the HDM corresponding to $X$ as

$D = \operatorname{acosh}^{\circ}(-G)$,    (8)

where $\operatorname{acosh}^{\circ}$ is the elementwise $\operatorname{acosh}$ operator.
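As a sketch, assuming the points are stacked as the columns of $X \in \mathbb{R}^{(d+1) \times N}$:

```python
import numpy as np

def h_gramian(X):
    """H-Gramian G = X^T H X for points stacked as the columns of X."""
    H = np.diag(np.concatenate(([-1.0], np.ones(X.shape[0] - 1))))
    return X.T @ H @ X

def hdm(X):
    """Hyperbolic distance matrix (8): D = acosh(-G), applied elementwise."""
    return np.arccosh(np.maximum(-h_gramian(X), 1.0))  # clip guards the diagonal
```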
We now state our first main result: a semidefinite characterization of H-Gramians. This is a key step in casting HDGPs as rank-constrained semidefinite programs.

Proposition 3 (Semidefinite characterization of H-Gramians). Let $G$ be the hyperbolic Gram matrix for a set of points $\mathbf{x}_1, \dots, \mathbf{x}_N \in \mathbb{L}^d$. Then,

$G = G^+ - G^-$,

where $G^+ \succeq 0$ with $\operatorname{rank} G^+ \le d$, and $G^- \succeq 0$ with $\operatorname{rank} G^- \le 1$; moreover, $\operatorname{diag}(G) = -\mathbf{1}$ and $g_{ij} \le -1$ for all $i, j \in [N]$.
Conversely, any matrix $G$ that satisfies the above conditions is a hyperbolic Gramian for a set of points in $\mathbb{L}^d$. The proof is given in Section 6.1.
3.1 Solving for the H-Gramians
While Problem 1 could be formalized directly in the domain of point positions $X$, this approach is unfavorable as the optimization domain, $\mathbb{L}^d$, is a nonconvex set. What is more, the hyperbolic distances

$d(\mathbf{x}_i, \mathbf{x}_j) = \operatorname{acosh}(-[\mathbf{x}_i, \mathbf{x}_j])$    (9)

are nonlinear functions of $X$ with an unbounded gradient [25]. Similar issues arise when computing embeddings in other spaces, such as Euclidean spaces [14] or the space of polynomial trajectories [37]. A particularly effective strategy in the Euclidean case is the semidefinite relaxation, which relies on the simple fact that the Euclidean Gramian is positive semidefinite. We thus proceed by formulating a semidefinite relaxation for hyperbolic embeddings based on Proposition 3.
Solving the HDGP involves two steps, summarized in Algorithm 1:

(1) Complete and denoise the HDM via a semidefinite program;

(2) Compute an embedding of the clean HDM; we propose a closed-form spectral factorization method.

Note that step (2) is independent of step (1): given accurate hyperbolic distances, spectral factorization will give the points that reproduce them. However, since the semidefinite relaxation might give a Gramian with a higher rank than desired, eigenvalue thresholding in step (2) might move the points off of $\mathbb{L}^d$. That is because eigenvalue thresholding can violate the necessary condition for the unit hyperbolic norm, $[\mathbf{x}_n, \mathbf{x}_n] = -1$, of Proposition 3. We fix this by projecting each individual point to $\mathbb{L}^d$. The spectral factorization and the projection are summarized in Algorithm 3.

Let $\widetilde{D}$ be the measured noisy and incomplete HDM, with unknown entries replaced by zeroes. We define the mask matrix $W = [w_{ij}]$, with $w_{ij} = 1$ if $(i, j) \in \Omega$ and $w_{ij} = 0$ otherwise.
This mask matrix lets us compute the loss only at those entries that were actually measured. We use the semidefinite characterization of hyperbolic Gramians in Proposition 3 to complete and denoise the measured HDM, and eventually solve the HDGP.
Although the set of hyperbolic Gramians for a given embedding dimension is nonconvex due to the rank constraints, discarding the rank constraints results in a straightforward semidefinite relaxation.
minimize     a low-rank-promoting surrogate for $\operatorname{rank} G^+ + \operatorname{rank} G^-$ (see (10))
w.r.t.       $G^+ \succeq 0$, $G^- \succeq 0$
subject to   the metric, nonmetric, and ’Loid constraints listed below
Table 3: Variations of the semidefinite relaxation (cost functions and parameters) and their applications.

Low-rank hyperbolic embedding [38, 39, 40, 41]
Ordinal outlier removal [42, 43, 44]
Robust hierarchical embedding [5, 45]
Anomaly detection in weighted graphs [46]
However, if we convexify the problem by simply discarding the rank constraints, then every pair $(G^+, G^-)$ with $G^+ - G^- = G$ becomes a valid solution: adding any positive semidefinite matrix to both $G^+$ and $G^-$ leaves $G$ unchanged. We can eliminate this ambiguity by promoting low-rank solutions for $G^+$ and $G^-$. While directly minimizing

$\operatorname{rank} G^+ + \operatorname{rank} G^-$    (10)

is NP-hard [47], there exist many approaches to make (10) computationally tractable, such as trace norm minimization [48], iteratively reweighted least squares minimization [40], or the log-det heuristic [41] that minimizes the following smooth surrogate for (10):

$\log\det(G^+ + \delta I) + \log\det(G^- + \delta I)$,

where $\delta$ is a small regularization constant. This objective function is linearized as

$\operatorname{tr}\big((G_k^+ + \delta I)^{-1} G^+\big) + \operatorname{tr}\big((G_k^- + \delta I)^{-1} G^-\big)$

for the $k$-th iterates $G_k^+$ and $G_k^-$, and can be iteratively minimized (in practice, we choose a diminishing sequence of $\delta$). In our numerical experiments we use trace norm minimization unless otherwise stated. Then, we enforce the data fidelity objectives and the properties of the embedding space (Proposition 3) in the form of a variety of constraints:

Metric embedding: The quadratic constraint

$\|W \circ (G + \cosh^{\circ}(\widetilde{D}))\|_F^2 \le \epsilon$

makes sure the hyperbolic Gramian $G$ accurately reproduces the given distance data; by (8), $d(\mathbf{x}_i, \mathbf{x}_j) = \tilde{d}_{ij}$ is equivalent to $g_{ij} = -\cosh(\tilde{d}_{ij})$.

Nonmetric embedding: The ordinal measurement constraint $d(\mathbf{x}_i, \mathbf{x}_j) \le d(\mathbf{x}_k, \mathbf{x}_l)$ is simply a linear constraint of the form

$\operatorname{tr}(A_{ijkl} G) \le 0$,

where $A_{ijkl} = \frac{1}{2}(\mathbf{e}_k\mathbf{e}_l^\top + \mathbf{e}_l\mathbf{e}_k^\top - \mathbf{e}_i\mathbf{e}_j^\top - \mathbf{e}_j\mathbf{e}_i^\top)$, since $g \mapsto \operatorname{acosh}(-g)$ is decreasing. In practice, we replace this constraint by $\operatorname{tr}(A_{ijkl} G) \le -\epsilon$, for a small $\epsilon > 0$, to avoid trivial solutions.

’Loid model: The unit hyperbolic norm, $[\mathbf{x}_n, \mathbf{x}_n] = -1$, appears as a simple linear constraint,

$\operatorname{diag}(G) = -\mathbf{1}$,

which guarantees that the embedded points reside on the two-sheeted hyperboloid. Finally, the entrywise condition $g_{ij} \le -1$ enforces all embedded points to belong to the same hyperbolic sheet, i.e., $\mathbf{x}_n \in \mathbb{L}^d$ for all $n \in [N]$. A cvxpy sketch of the resulting program follows this list.
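Here is a minimal cvxpy sketch of the relaxation with the trace-norm surrogate and metric constraints; the function and variable names are ours, and the fixed tolerance `eps` is a simplification of the full program:

```python
import cvxpy as cp
import numpy as np

def complete_hdm(D_tilde, W, eps=1e-3):
    """Complete and denoise a measured HDM via the relaxed semidefinite program."""
    N = D_tilde.shape[0]
    G_plus = cp.Variable((N, N), PSD=True)
    G_minus = cp.Variable((N, N), PSD=True)
    G = G_plus - G_minus                      # candidate hyperbolic Gramian
    constraints = [
        cp.diag(G) == -1,                     # unit hyperbolic norm, diag(G) = -1
        G <= -1,                              # same-sheet condition, g_ij <= -1
        # metric fidelity on measured entries only: g_ij = -cosh(d_ij)
        cp.sum_squares(cp.multiply(W, G + np.cosh(D_tilde))) <= eps,
    ]
    objective = cp.Minimize(cp.trace(G_plus) + cp.trace(G_minus))  # trace surrogate
    cp.Problem(objective, constraints).solve()
    return G_plus.value - G_minus.value
```

Ordinal measurements would enter the same `constraints` list as linear inequalities of the form $\operatorname{tr}(A_{ijkl} G) \le -\epsilon$.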
This framework can serve as a bedrock for a multitude of other data fidelity objectives. We can seamlessly incorporate outlier removal schemes by introducing slack variables into the objective function and constraints [42, 43, 44]. For example, the modified objective function

$\operatorname{tr} G^+ + \operatorname{tr} G^- + \lambda \sum_{(i,j,k,l) \in \mathcal{O}} \xi_{ijkl}$

can be minimized subject to $\operatorname{tr}(A_{ijkl} G) \le -\epsilon + \xi_{ijkl}$ and $\xi_{ijkl} \ge 0$ as a means of removing outlier comparisons (we allow some comparisons to be violated; see Section 4.3 for an example).
We can similarly implement outlier detection in metric embedding problems. As an example, we can adapt the outlier pursuit algorithm [49]. Consider the measured Gramian of a point set with a few outliers,

$\widetilde{G} = G + C + N$,

where $G$ is the outlier-free hyperbolic Gramian, $C$ is a matrix with only a few nonzero columns, and $N$ represents the measurement noise. Outlier pursuit aims to minimize a convex surrogate for the sum of $\operatorname{rank} G$ and the number of nonzero columns of $C$; more details and options are given in Table 3.
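A sketch of this outlier-pursuit variant under the model above; the regularization weight `lam` and the least-squares handling of the noise term are our assumptions:

```python
import cvxpy as cp

def outlier_pursuit(G_meas, lam=1.0):
    """Decompose a measured H-Gramian into structure plus column-sparse outliers."""
    N = G_meas.shape[0]
    G_plus = cp.Variable((N, N), PSD=True)
    G_minus = cp.Variable((N, N), PSD=True)
    C = cp.Variable((N, N))                      # outlier columns
    residual = G_meas - (G_plus - G_minus) - C   # leftover measurement noise
    objective = cp.Minimize(
        cp.trace(G_plus) + cp.trace(G_minus)     # surrogate for rank(G)
        + lam * cp.mixed_norm(C.T, 2, 1)         # surrogate for nonzero columns of C
        + cp.sum_squares(residual)               # penalize the noise term
    )
    cp.Problem(objective, [cp.diag(G_plus - G_minus) == -1]).solve()
    return G_plus.value - G_minus.value, C.value
```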
3.2 Low-rank Approximation of H-Gramians
From Proposition 3, it is clear that the rank of a hyperbolic Gramian of points in $\mathbb{L}^d$ is at most $d + 1$. However, the H-Gramian estimated by the semidefinite relaxation in Algorithm 2 does not necessarily have the correct rank. Therefore, we want to find its best rank-$(d+1)$ approximation, namely the H-Gramian $\widehat{G}$ minimizing

$\|\widehat{G} - G\|_F$.    (11)

In Algorithm 3 we propose a simple but suboptimal procedure to solve this low-rank approximation problem. Unlike iterative refinement algorithms based on optimization on manifolds [38], our proposed method is one-shot. It is based on the spectral factorization of the estimated hyperbolic Gramian and involves the following steps:

Step 1: We find a set of points in $\mathbb{R}^{d+1}$ whose Lorentz Gramian best approximates $G$; see Section 2.1.1 and Algorithm 3. In other words, we relax the optimization domain of (11) from H-Gramians to Lorentz Gramians.

Step 2: We project each point onto $\mathbb{L}^d$. This gives us an approximate rank-$(d+1)$ hyperbolic Gramian $\widehat{G}$; see Figure 2 and Section 6.3.

The first step of the low-rank approximation of a hyperbolic Gramian can be interpreted as finding the positions of points in $\mathbb{R}^{d+1}$ (not necessarily on $\mathbb{L}^d$) whose Lorentz Gramian best approximates $G$; the per-point projection of Step 2 is sketched below.
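A minimal sketch of the projection, assuming we simply keep the spatial coordinates of each point and recompute its first coordinate (the exact projection derived in Section 6.3 may differ):

```python
import numpy as np

def project_to_loid(X):
    """Project each column of X onto L^d by recomputing its first coordinate
    so that [x, x] = -1 and x_0 > 0, keeping the spatial coordinates fixed."""
    X = X.copy()
    X[0, :] = np.sqrt(1.0 + np.sum(X[1:, :] ** 2, axis=0))
    return X
```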
3.3 Spectral Factorization of H-Gramians
To finally compute the point locations, we derive a spectral factorization method to estimate point positions from their Lorentz Gramian (the factorization step of Algorithm 3). This method exploits the fact that Lorentz Gramians have only one nonpositive eigenvalue (see Section 6.2 in the appendix), as detailed in the following proposition. Let $G$ be a Lorentz Gramian of rank $d + 1$, with eigenvalue decomposition $G = U \Lambda U^\top$, where $\Lambda$ retains the nonzero eigenvalues, ordered so that the single nonpositive eigenvalue comes first (an H-Gramian is a Lorentz Gramian). Then $X = |\Lambda|^{1/2} U^\top$ satisfies $X^\top H X = G$, and the factorization is unique up to an H-unitary matrix $R$. The proof is given in Section 6.2. Note that regardless of the choice of $R$, $RX$ reproduces $G$ and thus the corresponding distances. This is the rigid motion ambiguity familiar from the Euclidean case [33]. If we start with an H-Gramian of the wrong rank, we need to follow the spectral factorization by Step 2, where we project each point onto $\mathbb{L}^d$. This heuristic is suboptimal, but it is nevertheless appealing since it only requires a single one-shot calculation, as detailed in Section 6.3.
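A sketch of the factorization, assuming $G$ has exactly one nonpositive eigenvalue; the eigenvalue selection and the sheet-sign flip are implementation details of ours:

```python
import numpy as np

def spectral_factorization(G, d):
    """Estimate X (columns in R^{d+1}) with X^T H X ~ G from a Lorentz Gramian."""
    eigvals, U = np.linalg.eigh(G)              # eigenvalues in ascending order
    # Keep the most negative eigenvalue plus the d largest ones (assumes N > d).
    idx = np.concatenate(([0], np.arange(len(eigvals) - d, len(eigvals))))
    lam, U = eigvals[idx], U[:, idx]
    X = np.sqrt(np.abs(lam))[:, None] * U.T     # scale rows by sqrt(|lambda|)
    if X[0].sum() < 0:                          # eigenvector sign is arbitrary;
        X[0] = -X[0]                            # flip toward the upper sheet
    return X
```

In the full pipeline, `project_to_loid` from Section 3.2 is applied to the columns of `X` afterward whenever the input Gramian had the wrong rank.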
4 Experimental Results
In this section we numerically demonstrate different properties of Algorithm 1 in solving HDGPs. In a general hyperbolic embedding problem, we have a mix of metric and nonmetric distance measurements which can be noisy and incomplete.
4.1 Missing Measurements
Missing measurements are a common problem in hyperbolic embeddings of concept hierarchies. For example, hyperbolic embeddings of words based on Hearst-like patterns rely on co-occurrence probabilities of word pairs in a corpus such as WordNet [50]. These patterns are sparse since word pairs must be detected in the right configuration [7]. In perceptual embedding problems, we ask individuals to rate pairwise similarities for a set of objects. It may be difficult to collect and embed all pairwise comparisons in applications with a large number of objects [31].
The proposed semidefinite relaxation gives a simple way to handle missing measurements. The metric sampling density of a measured HDM is the ratio of the number of missing measurements to the total number of pairwise distances. We want to find the probability of successful estimation given a sampling density. In practice, we fix the embedding dimension $d$ and the number of points $N$, and randomly generate a point set in $\mathbb{L}^d$. A trial is successful if we can solve the HDGP for noise-free measurements and a random mask of a fixed size so that the estimated hyperbolic Gramian has a small relative error.
We repeat this for $T$ trials, and empirically estimate the success probability as $\hat{p} = S/T$, where $S$ is the number of successful trials. We repeat the experiment for different values of $N$ and $d$; see Figure 3.
For nonmetric embedding applications, we want a consistent embedding under missing ordinal measurements. The ordinal sampling density of a randomly selected set of ordinal measurements is defined analogously, as the fraction of all possible comparisons that are observed. For a point set, we define the average relative error of the estimated HDMs as the empirical expectation, over random ordinal sets of a given density, of the relative error between the HDM estimated from the sampled ordinal measurements and the true HDM. We repeat the experiment for different realizations of the random point set (Figure 3). We observe that, across different embedding dimensions, the maximum allowed fraction of missing measurements for a consistent and accurate estimation increases with the number of points.
4.2 Weighted Tree Embedding
Treelike hierarchical data occurs commonly in natural scenarios. In this section, we compare the embedding quality of weighted trees in hyperbolic space against a Euclidean baseline.
We generate a random tree with $N$ nodes, a bounded maximum degree (the most likely maximum degree for random trees of this size [51]), and i.i.d. random edge weights. Let $D$ be the distance matrix for the tree, where the distance between two nodes is defined as the total weight of the path joining them. A sketch of this construction is given below.
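The sketch uses networkx; the edge-weight distribution is an assumption (the original choice is not recoverable here), and we do not enforce the maximum-degree cap:

```python
import networkx as nx
import numpy as np

def random_weighted_tree_distances(n, seed=0):
    """Random labeled tree with random edge weights and its path-metric matrix."""
    rng = np.random.default_rng(seed)
    T = nx.random_tree(n, seed=seed)   # nx.random_labeled_tree on networkx >= 3.2
    for u, v in T.edges:
        T[u][v]["weight"] = rng.uniform(0.1, 1.0)  # assumed weight distribution
    # Distance between two nodes: total weight of the unique path joining them.
    return nx.floyd_warshall_numpy(T, weight="weight")
```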
For the hyperbolic embedding, we apply Algorithm 2 with the log-det heuristic objective to acquire a low-rank embedding. The Euclidean embedding of $D$, on the other hand, is the solution to the following semidefinite relaxation:

minimize     $\|\operatorname{edm}(G) - D^{\circ 2}\|_F^2$    (12)
w.r.t.       $G \succeq 0$
subject to   $G\mathbf{1} = \mathbf{0}$,

where $\operatorname{edm}(G) = \operatorname{diag}(G)\mathbf{1}^\top + \mathbf{1}\operatorname{diag}(G)^\top - 2G$, and $D^{\circ 2}$ is the entrywise square of $D$. This SDR yields a minimum-error embedding of $D$, since the embedded points can reside in an arbitrary-dimensional Euclidean space.
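A cvxpy sketch of (12), with $\operatorname{edm}(\cdot)$ spelled out; the function and variable names are ours:

```python
import cvxpy as cp
import numpy as np

def euclidean_sdr(D):
    """Euclidean Gramian relaxation (12) for a complete distance matrix D."""
    N = D.shape[0]
    G = cp.Variable((N, N), PSD=True)
    ones = np.ones((N, 1))
    dg = cp.reshape(cp.diag(G), (N, 1))
    edm = dg @ ones.T + ones @ dg.T - 2 * G   # edm(G)
    prob = cp.Problem(cp.Minimize(cp.sum_squares(edm - D ** 2)),
                      [G @ ones == 0])        # center the points at the origin
    prob.solve()
    return G.value
```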
The embedding methods based on semidefinite relaxation are generally accompanied by a projection step to account for a potentially incorrect embedding dimension. For hyperbolic embedding problems, this step is summarized in Algorithm 3, whereas for Euclidean problems it is simply a singular value thresholding of the Gramian. Note that the SDRs always find a full-dimensional embedding for a set of $N$ points; see Algorithm 2 and (12). In this experiment, we define the optimal embedding dimension as the smallest dimension whose relative embedding error comes within a small tolerance of that of the full-dimensional embedding, where the error compares $D$ with $\widehat{D}_d$, the distance matrix of the points embedded in $\mathbb{L}^d$ (or $\mathbb{R}^d$). This way, we accurately represent the estimated distance matrix in a low-dimensional space. Finally, we define the relative (or normalized) error of embedding in a $d$-dimensional space as $\|D - \widehat{D}_d\|_F / \|D\|_F$.
We repeat the experiment for randomly generated trees with a varying number of vertices. The hyperbolic embedding yields a smaller average relative embedding error compared to the Euclidean embedding; see Figure 4. It should also be noted that the hyperbolic embedding has a lower optimal embedding dimension, even though the low-rank hyperbolic Gramian approximation is suboptimal.
4.3 Odor Embedding
In this section, we want to compare hyperbolic and Euclidean nonmetric embeddings of olfactory data following the work of Zhou et al. [23]. We conduct identical experiments in each space, and compare embedding quality of points from Algorithm 2 in hyperbolic space to its semidefinite relaxation counterpart in Euclidean space, namely generalized nonmetric MDS [31].
We use an olfactory dataset comprising monomolecular odor concentrations measured from blueberries [52], with a number of odors measured across a collection of fruit samples. Like Zhou et al. [23], we begin by computing correlations between odor concentrations across samples. The correlation coefficient between odors $i$ and $j$ is the Pearson correlation

$\rho_{ij} = \frac{\sum_{s=1}^{S} (c_{is} - \bar{c}_i)(c_{js} - \bar{c}_j)}{\sqrt{\sum_{s=1}^{S} (c_{is} - \bar{c}_i)^2} \sqrt{\sum_{s=1}^{S} (c_{js} - \bar{c}_j)^2}}$,

where $c_{is}$ is the concentration of the $i$-th odor in the $s$-th fruit sample, $S$ is the total number of fruit samples, and $\bar{c}_i = \frac{1}{S}\sum_{s=1}^{S} c_{is}$.
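Since $\rho$ is the ordinary Pearson correlation, the whole matrix can be computed in one call; `C` is our name for the odor-by-sample concentration matrix:

```python
import numpy as np

def odor_correlations(C):
    """Pearson correlations between odors; C[i, s] is the concentration of
    odor i in fruit sample s."""
    return np.corrcoef(C)   # rho[i, j] = correlation of odors i and j
```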
The goal is to find an embedding $\mathbf{x}_1, \dots, \mathbf{x}_N$ for the odors (in $\mathbb{L}^d$ or $\mathbb{R}^d$) such that more strongly correlated odors sit closer together, i.e.,

$d(\mathbf{x}_i, \mathbf{x}_j) \le d(\mathbf{x}_k, \mathbf{x}_l)$ whenever $\rho_{ij} \ge \rho_{kl}$.

The total number of distinct comparisons grows rapidly with the number of points, reaching into the millions here. In this experiment, we therefore choose a random subset of the comparisons at a fixed sampling density (in hyperbolic embedding, this is the ratio of the number of ordinal measurements to the number of variables).
We ensure the embedded points do not collapse by imposing a minimum distance constraint, $d(\mathbf{x}_i, \mathbf{x}_j) \ge d_{\min}$ for all $i \ne j$; this corresponds to a simple linear constraint in the proposed formulation. An ideal order embedding accurately reconstructs the missing comparisons. We calculate the percentage of correctly reconstructed distance comparisons against the complete ordinal set corresponding to a $d$-dimensional embedding, as sketched below.
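A sketch of this score; `ordinal_set` is our name for the set of index quadruples $(i, j, k, l)$ encoding $d_{ij} \le d_{kl}$:

```python
def comparison_accuracy(D_hat, ordinal_set):
    """Percentage of comparisons (i, j, k, l), meaning d_ij <= d_kl, that the
    distance matrix D_hat of the embedded points reproduces."""
    hits = sum(D_hat[i, j] <= D_hat[k, l] for (i, j, k, l) in ordinal_set)
    return 100.0 * hits / len(ordinal_set)
```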
A simple regularization technique helps to remove outlier measurements and improve the generalization accuracy of the embedding algorithms. We introduce a parameter that permits the SDR algorithms to dismiss at most a given percentage of the measurements, via the slack-variable formulation of Section 3.1.
Table 4: Percentage of correctly reconstructed distance comparisons for hyperbolic and Euclidean embeddings.
In Figure 5, we show the embedded points in the hyperbolic and Euclidean spaces with different levels of allowable violated measurements. We observe in Table 4 that hyperbolic space better represents the structure of the olfactory data compared to a Euclidean space of the same dimension. This is despite the fact that the number of measurements per variable favors the Euclidean embedding, and that the low-rank approximation of hyperbolic Gramians is suboptimal. Moreover, if we remove a small number of outliers, we can produce more accurate embeddings. These results corroborate the statistical analysis of Zhou et al. [23] (based on the Betti curve behavior of the underlying clique topology [36]), which aims to identify the geometry of the olfactory space.
5 Conclusion
We introduced hyperbolic distance matrices, an analog of Euclidean distance matrices, to encode pairwise distances in the ’Loid model of hyperbolic geometry. As in the Euclidean case, although the definition of hyperbolic distance matrices is simple, analyzing their properties gives rise to powerful algorithms based on semidefinite programming. We proposed a semidefinite relaxation that is essentially plug-and-play: it easily handles a variety of metric and nonmetric constraints, outlier removal, and missing information, and it can serve as a template for different applications. Finally, we proposed a closed-form spectral factorization algorithm to estimate the point positions from hyperbolic Gramians. Several important questions are left open, most notably the role of isometries in the ’Loid model and related concepts such as Procrustes analysis.
Reproducible Research
The research described in this paper is completely reproducible. The experimental code is written in Python with the help of the cvxpy environment [53]. Code, data, and documentation to reproduce the experimental results are available at https://github.com/swingresearch/hdm.
Acknowledgements
We would like to thank Lav Varshney for bringing our attention to hyperbolic geometry and for his critical reading of and comments on the manuscript.
6 Appendices
6.1 Proof of Proposition 3
A hyperbolic Gramian can be written as $G = X^\top H X$ for $X = [\mathbf{x}_1, \dots, \mathbf{x}_N]$ with columns in $\mathbb{L}^d$. Writing $\mathbf{x}_n = (x_{0n}, \tilde{\mathbf{x}}_n^\top)^\top$, collecting the first coordinates in $\mathbf{x}_0 \in \mathbb{R}^N$ and the remaining rows of $X$ in $\widetilde{X} \in \mathbb{R}^{d \times N}$, let us rewrite it as

$G = \widetilde{X}^\top \widetilde{X} - \mathbf{x}_0 \mathbf{x}_0^\top = G^+ - G^-$,

where $G^+ = \widetilde{X}^\top \widetilde{X}$ and $G^- = \mathbf{x}_0\mathbf{x}_0^\top$ are positive semidefinite matrices. We have $\operatorname{rank} G^+ \le d$ and $\operatorname{rank} G^- \le 1$, and $[\mathbf{x}_n, \mathbf{x}_n] = -1$ gives $\operatorname{diag}(G) = -\mathbf{1}$. On the other hand, we have

$-g_{ij} = x_{0i}x_{0j} - \tilde{\mathbf{x}}_i^\top \tilde{\mathbf{x}}_j \overset{(a)}{=} \sqrt{(1 + \|\tilde{\mathbf{x}}_i\|^2)(1 + \|\tilde{\mathbf{x}}_j\|^2)} - \tilde{\mathbf{x}}_i^\top \tilde{\mathbf{x}}_j \overset{(b)}{\ge} 1 + \|\tilde{\mathbf{x}}_i\|\|\tilde{\mathbf{x}}_j\| - \tilde{\mathbf{x}}_i^\top \tilde{\mathbf{x}}_j \ge 1$,

where $g_{ij}$ is the $(i,j)$-th element of $G$, (a) is due to $x_{0n} = \sqrt{1 + \|\tilde{\mathbf{x}}_n\|^2}$, and (b) results from $\sqrt{(1+a^2)(1+b^2)} \ge 1 + ab$ together with the Cauchy-Schwarz inequality. The equality holds for $\tilde{\mathbf{x}}_i = \tilde{\mathbf{x}}_j$, which yields the condition $g_{ij} \le -1$.

Conversely, let $G = G^+ - G^-$, where $G^+, G^- \succeq 0$, $\operatorname{rank} G^+ \le d$, $\operatorname{rank} G^- \le 1$, $\operatorname{diag}(G) = -\mathbf{1}$, and $g_{ij} \le -1$ for all $i, j \in [N]$. Let us write $G^+ = \widetilde{X}^\top \widetilde{X}$ for some $\widetilde{X} \in \mathbb{R}^{d \times N}$ and $G^- = \mathbf{x}_0 \mathbf{x}_0^\top$ for some $\mathbf{x}_0 \in \mathbb{R}^N$. Then, we define

$X = \begin{bmatrix} \mathbf{x}_0^\top \\ \widetilde{X} \end{bmatrix}$,

whose columns satisfy $[\mathbf{x}_n, \mathbf{x}_n] = g_{nn} = -1$ for all $n \in [N]$. By construction, we have $G = X^\top H X$.
Finally, $g_{ij} \le -1$ guarantees that $\mathbf{x}_n \in \mathbb{L}^d$ for all $n \in [N]$, after a global sign flip of $\mathbf{x}_0$ if necessary. We prove the contrapositive statement. Let $\mathbf{x}_i$ and $\mathbf{x}_j$ belong to different hyperbolic sheets, e.g., $x_{0i} > 0 > x_{0j}$. Then,

$-g_{ij} = x_{0i}x_{0j} - \tilde{\mathbf{x}}_i^\top \tilde{\mathbf{x}}_j \overset{(a)}{\le} -\sqrt{(1 + \|\tilde{\mathbf{x}}_i\|^2)(1 + \|\tilde{\mathbf{x}}_j\|^2)} + \|\tilde{\mathbf{x}}_i\|\|\tilde{\mathbf{x}}_j\| \le -1$,

where (a) is due to the Cauchy-Schwarz inequality. This is in contradiction with the condition $g_{ij} \le -1$. Therefore, all points belong to the same hyperbolic sheet, namely $\mathbb{L}^d$.
6.2 Derivations for Algorithm 3
Let $G$ be a hyperbolic Gramian, with eigenvalue decomposition

$G = U \Lambda U^\top$,    (13)

where $\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_N)$ is ordered such that

the single nonpositive eigenvalue, $\lambda_1 \le 0$, comes first,

and $\lambda_2 \ge \lambda_3 \ge \dots \ge \lambda_N \ge 0$ are the remaining eigenvalues in decreasing order.

The best rank-$(d+1)$ Lorentz Gramian approximation of $G$, in the Frobenius-norm sense, is given by

$\widehat{G} = U_{d+1} \Lambda_{d+1} U_{d+1}^\top$,

where $\Lambda_{d+1} = \operatorname{diag}(\lambda_1, \lambda_2, \dots, \lambda_{d+1})$ and $U_{d+1}$ collects the corresponding eigenvectors.