
Geometry of Comparisons

Many data analysis problems can be cast as distance geometry problems in space forms: Euclidean, elliptic, or hyperbolic spaces. We ask: what can be said about the dimension of the underlying space form if we are only given a subset of comparisons between pairwise distances, without computing an actual embedding? To study this question, we define the ordinal capacity of a metric space. Ordinal capacity measures how well a space can accommodate a given set of ordinal measurements. We prove that the ordinal capacity of a space form is related to its dimension and curvature sign, and provide a lower bound on the embedding dimension of non-metric graphs in terms of the ordinal spread of their sub-cliques. Computer experiments on random graphs, the Bitcoin trust network, and olfactory data illustrate the theory.


1 Introduction

Distances reveal the geometry of their underlying space. Even distance comparisons carry valuable information. Consider a set of points in space . In non-metric embedding problems, we have measurements of the form

where is the distance between and , and is an unknown non-linear and monotonically increasing (or decreasing) function. In such problems, only the order of these measurements is useful. We can interpret them as distance comparisons, since

In this paper, we address the following question:

What can we say about a space from distance comparisons alone?

Euclidean distance geometry problems (DGPs) have a rich history, with applications ranging from robotics [1, 2] and wireless sensor networks [3] to molecular conformations [4] and dimensionality reduction [5]. Typically, we want to find a representation for a set of measured distances in a Euclidean space [6]. Beyond Euclidean DGPs, there has recently been a surge in applications of hyperbolic geometry in data analysis, most notably as a natural space to work with hierarchical data. Social networks [7], gene ontologies [8], the Hearst graph of hypernyms [9], and olfactory data [10] are all examples of hierarchical data structures. Similarly, spherical embedding aims to embed a set of objects on a (hyper)sphere given their dissimilarities [11]. Spherical embedding problems have various applications in astronomy [12], distance problems on Earth [13], and texture mapping [14]. To compute an embedding, we have to know the geometry of the embedding space.

Euclidean, spherical and hyperbolic geometry are categorical examples of constant curvature spaces, known as space forms. A space form is characterized by its curvature and dimension. For non-metric embedding problems posed in space forms, we want to characterize these two properties from the measured distance comparisons. However, it is impossible to infer the magnitude of a space form’s curvature only based on distance comparisons. In other words, if a set of distance comparisons is realizable in a space form with curvature (or ), then we can find an equivalent embedding in a space form with curvature ( or ) for any positive .

In the literature, a related problem is to detect intrinsic structure in neural activity, invariant under nonlinear monotone transformations of measurements. Giusti et al. [15] propose a method based on the clique topology of the graph of correlations between pairs of neurons. The clique topology of a weighted graph describes the behavior of cycles in its order complex as a function of edge densities, also known as Betti curves. (Footnote 2: The order complex of a complete, weighted graph is a sequence of graphs where is the graph having vertices and no edges, has a single edge corresponding to the highest edge weight of , and each subsequent graph has an additional edge for the next-highest edge weight [15].) The statistical behavior of Betti curves can help distinguish random and geometric structures of size in Euclidean space. Zhou et al. [10] generalize this statistical approach to hyperbolic space.

Main contributions

In this paper, we propose a distribution-free approach to determine a lower bound on the embedding dimension of space forms with only distance comparisons. We show that ordering of distances inferred from comparisons contains information about the dimension of space forms. We introduce the ordinal capacity of a metric space, defined as follows: Ordinal capacity of a metric space is the maximum number of points such that

We prove that ordinal capacity characterizes the admissible patterns of ordinal measurements. Intuitively, in a Euclidean space with fixed dimension , we claim only a specific pattern of distance comparisons

is realizable. We show that the ordinal capacity of a space form is only related to its dimension and curvature sign. Then, we define the ordinal spread for a point set to describe the appearance pattern of vertex pairs in the sorted distance list

We prove that -point ordinal spread of space forms — the maximum ordinal spread of its point sets — is related to their ordinal capacity. The theoretical bounds on the -point ordinal spread of space forms give us a practical test to find a minimum Euclidean (and spherical) embedding dimension.

Notation

For any two numbers , we let and be their maximum and minimum. We use small letters for vectors, , and capital letters for matrices, . We denote the -th standard basis vector in by , and let be short for the set . For vectors , their dot product is denoted by , and their Lorentzian inner product is . Finally, and are all-zero and all-one vectors of appropriate dimensions. Let be a subset of a metric space , and ; we define

The cardinality of a discrete set is denoted by . Graph-theoretic notation simplifies the main results of this paper. For a graph , we denote its edge set as . Let be a complete -partite graph with part sizes . The Turán graph [16] is a complete -partite graph with vertices, and part sizes . (Footnote 3: From , we have , .)

Then, . (Footnote 4: This is simplified from .) For , we assume the graph is complete and .

2 Non-metric distance problems in space forms

A space form is a complete, connected Riemannian manifold of dimension and constant sectional curvature. The hyperbolic, Euclidean and (hyper)spherical spaces are famous examples of space forms with constant negative, zero and positive curvatures. Space forms are equivalent to spherical, Euclidean, or hyperbolic spaces up to an isomorphism [17], see Table 1.

Hyperbolic (’Loid Model) Euclidean Hyperspherical
Table 1: Space form with distance function , and sectional curvature of a tangent subspace at point , . The curvature magnitude scales pairwise distances.

In general, distance geometry problems aim to find an embedding for a set of distance-related measurements in a metric space. They can be metric [18], non-metric [19], or unlabeled [20] depending on the data modality and application domain. In this paper, we focus on non-metric distance problems in space forms.

Problem 1.

Let be a space form with distance function . A non-metric space form distance geometry problem aims to find , given a subset of ordinal distance measurements such that

(1)

where .

For noise-free measurements, we can fully encode distance comparisons in a sorted list, namely

(2)

This list is not necessarily unique. A deterministic or a randomized binary sort algorithm needs at least pairwise comparisons to uniquely sort the distance list [21]. In this paper, we assume that such a list always exists, and is unique.
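As an illustration of how a sorted distance list can be assembled from comparisons alone, the following minimal sketch sorts all point pairs using only a comparison oracle. The names `sort_pairs_by_comparisons`, `oracle`, `hidden`, and `phi` are hypothetical and not from the paper; the hidden coordinates and the monotone function exist only to simulate measurements, and the sorting routine never reads their values.

```python
import functools
import itertools
import math


def sort_pairs_by_comparisons(n_points, compare):
    """Sort all index pairs (i, j), i < j, from largest to smallest distance.

    `compare(p, q)` is a hypothetical oracle that returns a negative, zero,
    or positive number when pair p is closer than, tied with, or farther
    apart than pair q. No distance value is ever passed to this function.
    """
    pairs = list(itertools.combinations(range(n_points), 2))
    return sorted(pairs, key=functools.cmp_to_key(compare), reverse=True)


# Simulated ground truth (assumption for the demo): four points on a line.
hidden = [0.0, 1.0, 3.0, 7.0]


def phi(t):
    # Any strictly increasing function; it changes values but not their order.
    return math.log1p(t) ** 3


def oracle(p, q):
    y_p = phi(abs(hidden[p[0]] - hidden[p[1]]))
    y_q = phi(abs(hidden[q[0]] - hidden[q[1]]))
    return (y_p > y_q) - (y_p < y_q)


distance_list = sort_pairs_by_comparisons(len(hidden), oracle)
# distance_list == [(0, 3), (1, 3), (2, 3), (0, 2), (1, 2), (0, 1)]
```

Replacing `phi` with any other strictly increasing function leaves `distance_list` unchanged, which is the sense in which only the order of the measurements is useful.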

Consider a set of points that form a centered, regular pentagon shown in Figure 1 . The sorted distance list could be . We can summarize the labels appearing in the sorted distance list in the label matrix ,

(3)

where the -th column represents the labels appearing at the -th position in the sorted distance list. The label matrix summarizes the appearance pattern of individual points in the distance list (2). Alternatively, we can represent this sequence in a binary label matrix . If appears in the -th position of the distance list (2), then . Otherwise, . In Figure 2, we show the binary label matrix (in green) associated with the ordered distance list (3). In the next section, we use the binary label matrix of measured data in Problem 1 to extract useful information about the geometry of the underlying space.
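A minimal sketch of the binary label matrix, under two assumptions that are ours: the distance list is sorted from largest to smallest, and the matrix has one row per position in the list and one column per point. The function names and the four example points are hypothetical.

```python
import itertools

import numpy as np


def sorted_distance_list(points):
    """Return index pairs (i, j), i < j, sorted by decreasing Euclidean distance."""
    pts = np.asarray(points, dtype=float)
    pairs = list(itertools.combinations(range(len(pts)), 2))
    dists = np.array([np.linalg.norm(pts[i] - pts[j]) for i, j in pairs])
    order = np.argsort(-dists, kind="stable")
    return [pairs[k] for k in order]


def binary_label_matrix(pairs, n_points):
    """B[k, n] = 1 if point n appears at position k of the sorted list, else 0.

    Assumed layout: rows index positions in the list, columns index points.
    """
    B = np.zeros((len(pairs), n_points), dtype=int)
    for k, (i, j) in enumerate(pairs):
        B[k, i] = 1
        B[k, j] = 1
    return B


# Example point set (an assumption for illustration, not the paper's pentagon).
points = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [5.0, 1.0]]
pairs = sorted_distance_list(points)
B = binary_label_matrix(pairs, len(points))
```

Each row of `B` contains exactly two ones (the two endpoints of one distance), and each column sums to the number of distances incident to that point.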

Figure 1: : An example of a point configuration in . : Point sets in spherical , Euclidean and hyperbolic (Poincaré disk ) spaces with maximum (-th) ordinal spread.

3 Ordinal Spread

We consider identifying the embedding space in Problem 1. Specifically, we want to characterize the dimension of the space form given a set of binary distance comparisons of the form (1). We focus on inferring geometrical information through the binary label matrix associated with (1). Any such inference must be invariant with respect to arbitrary permutations of point labels. It will be useful to devise a canonical procedure to relabel point sets. We assign to each point a unique number in that corresponds to its first appearance in the sorted distance list. For the point set shown in Figure 1 , we have

according to (3). The following sequence shows the appearance order of points in the sorted distance list,

The canonical ordering of the labels assigns points and to the largest distance, to the second largest distance, etc. This procedure is illustrated as follows,

We can summarize this procedure as permuting the columns of binary label matrix by a permutation operator , see Figure 2.
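The relabeling step can be sketched as a column permutation that orders points by the row of their first appearance. This is a minimal illustration on a small hand-written binary label matrix for four points, with rows as positions and columns as points (our assumed layout). The function name `canonical_permutation` is hypothetical, and ties between the two endpoints of the same distance are broken by the original label, which is an arbitrary choice.

```python
import numpy as np


def canonical_permutation(B):
    """Order the columns of B (points) by the row of their first appearance."""
    first = np.argmax(B != 0, axis=0)  # index of the first nonzero row per column
    return np.argsort(first, kind="stable")


# Rows correspond to the sorted list (2,3), (0,3), (1,3), (1,2), (0,2), (0,1).
B = np.array([
    [0, 0, 1, 1],
    [1, 0, 0, 1],
    [0, 1, 0, 1],
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
])

perm = canonical_permutation(B)
B_canonical = B[:, perm]
first_rows = np.argmax(B_canonical != 0, axis=0)
```

After the permutation, the first-appearance rows in `first_rows` are non-decreasing from left to right, so new labels enter the list in canonical order.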

Figure 2: Binary label matrix , before and after permutation .

This relabeling procedure can give us intuitions to extract geometrical information from a distance list. For instance, we show that the appearance pattern of new labels in the sorted distance list bears geometrical implications. Let us formalize this intuition by introducing -th ordinal spread for a point set. The -th ordinal spread of the point set is defined as

for the ordered distance list . Simply, we write , where no confusion can arise.

The -th ordinal spread of a point set is if the first appearance of the -th label in the ordered distance list happens in position . In other words, we have

For the point set shown in Figure 1- with label matrix (3), we have

Proposition 3. For any metric space and points , we have

  • , ,
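The ordinal spread of a concrete point set can be computed directly from its sorted distance list. The sketch below assumes Euclidean points, a list sorted from largest to smallest distance, and 1-indexed positions; `ordinal_spread` is a hypothetical helper name. The example is a regular pentagon with an extra point at its center, where every pentagon side is longer than the circumradius.

```python
import itertools
import math

import numpy as np


def ordinal_spread(points):
    """Return the sorted first-appearance positions of all labels.

    Entry n-1 of the result is the position (1-indexed) at which the n-th
    new label appears in the distance list sorted from largest to smallest.
    """
    pts = np.asarray(points, dtype=float)
    pairs = list(itertools.combinations(range(len(pts)), 2))
    dists = np.array([np.linalg.norm(pts[i] - pts[j]) for i, j in pairs])
    order = np.argsort(-dists, kind="stable")
    first = {}
    for position, k in enumerate(order, start=1):
        for label in pairs[k]:
            first.setdefault(label, position)  # keep only the first appearance
    return sorted(first.values())


angles = 2 * np.pi * np.arange(5) / 5
pentagon = np.column_stack([np.cos(angles), np.sin(angles)])
points = np.vstack([pentagon, [0.0, 0.0]])  # pentagon plus its center
alpha = ordinal_spread(points)
```

The first two entries of `alpha` are 1 because the largest distance introduces two labels at once. Here the center is the last label to appear, after all 10 pentagon distances, so the last entry is 11. This matches the counting argument in Section 7.1: the last label has N-1 incident distances, so it cannot first appear later than position C(N-1, 2) + 1.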

Let us devise an experiment to show how the -th ordinal spread can distinguish space forms. We randomly generate i.i.d. points from absolutely continuous distributions with full support in hyperbolic (’Loid ), Euclidean () and spherical () spaces. (Footnote 5: We use normal and uniform distributions for Euclidean and spherical spaces. For hyperbolic space, we project a normally distributed onto the hyperboloid sheet, i.e. .) For trials, we plot the -th ordinal spread for each realization , see Figure 3. We find the empirical maximum of to be a sensitive indicator of the geometry of the underlying space. While the emerging pattern of ’s depends on the distribution of the point sets, the behavior of the empirical maximum of the -th ordinal spread is robust to the choice of point set distribution, as it converges to its supremum almost surely. Therefore, we introduce the -point ordinal spread of a metric space, a novel concept that categorizes space forms by their ability to realize extremal ordinal patterns, in the sense of the following definition.
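A minimal sketch of this kind of experiment in the hyperbolic case. It assumes the standard hyperboloid ('Loid) model of curvature -1, in which a point is written as (sqrt(1 + |z|^2), z) and the distance is arccosh of the negated Lorentzian inner product. The sampling scheme, the function names, the seed, and the trial count are our assumptions and are not taken from the paper.

```python
import math

import numpy as np


def sample_hyperboloid(n_points, dim, rng):
    """Lift normally distributed z in R^dim to the hyperboloid sheet."""
    z = rng.standard_normal((n_points, dim))
    x0 = np.sqrt(1.0 + np.sum(z**2, axis=1, keepdims=True))
    return np.hstack([x0, z])


def hyperboloid_distances(X):
    """Pairwise distances arccosh(-[x, y]) with [x, y] = -x0*y0 + <x_rest, y_rest>."""
    G = X[:, 1:] @ X[:, 1:].T - np.outer(X[:, 0], X[:, 0])
    return np.arccosh(np.maximum(-G, 1.0))  # clip guards against round-off


def last_ordinal_spread(D):
    """Position (1-indexed) where the last new label enters the sorted list."""
    n = D.shape[0]
    iu = np.triu_indices(n, k=1)
    order = np.argsort(-D[iu], kind="stable")  # largest distance first
    seen = set()
    for position, k in enumerate(order, start=1):
        seen.update((iu[0][k], iu[1][k]))
        if len(seen) == n:
            return position


rng = np.random.default_rng(0)
n_points, dim, trials = 6, 2, 200
spreads = []
for _ in range(trials):
    X = sample_hyperboloid(n_points, dim, rng)
    spreads.append(last_ordinal_spread(hyperboloid_distances(X)))
empirical_max = max(spreads)
```

Swapping `hyperboloid_distances` for a Euclidean or spherical distance matrix gives the corresponding curves; `empirical_max` is the quantity whose behavior across space forms the text describes.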

Figure 3: The -th ordinal spread of randomly generated points in -dimensional space forms.

Let be a metric space. The -point ordinal spread of is defined as

By definition, the ordinal spread number of a space form depends on extremal configurations of point sets. In Figure 1 , we show point sets with maximum (-th) ordinal spread of , see Section 3. In the next section, we introduce ordinally dense subsets, and show how they determine the -point ordinal spread of space forms.

4 Ordinal Capacity

Let be a set of distinct points in metric space . If

then we say that is an ordinally dense subset of , or in short . This definition formalizes the point configurations with maximum ordinal spread. A set of points is ordinally dense in if and only if it has a subset of points whose pairwise distances are all larger than (or equal to) their distances to the -th point. In other words, we have

The existence of an ordinally dense subset of size depends on the curvature sign and dimension of space forms. Hence, we want to find the maximum number of ordinally dense points in space forms. The ordinal capacity for a metric space is defined as

The ordinal capacity is an indicator of the capability of a metric space to accommodate different patterns of point labels. For space forms, this concept is intimately related to the famous spherical cap packing problem [22], as the proof of the following result shows (see Section 7.2).

Theorem 4. The ordinal capacity for a space form is given by

where . The ordinal capacity of a hyperbolic space is infinite. This implies that there exists an ordinally dense point set for any . In the Poincaré model, a centered -gon with an extra point in the center is an ordinally dense set, see Figure 1 . In comparison, Euclidean and spherical spaces have a finite ordinal capacity, increasing exponentially with their dimension as given in Table 2. (Footnote 6: Their ordinal capacities have a lower bound of the form [23], see Section 7.2.) In Figure 1 , we show a regular hexagon with an extra point in the center. All pairwise distances in the hexagon are larger than or equal to their distances to the center. This point set configuration in fact achieves .
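The polygon examples can be checked numerically. The sketch below tests the stated characterization of ordinal density: some point exists such that all pairwise distances among the remaining points are at least as large as their distances to it. The function names, the tolerance, the polygon sizes, and the radius 0.99 are our assumptions; the Poincaré disk distance used is the standard formula for curvature -1.

```python
import numpy as np


def euclidean(x, y):
    return float(np.linalg.norm(x - y))


def poincare(x, y):
    """Standard Poincare disk distance (curvature -1)."""
    num = 2.0 * np.sum((x - y) ** 2)
    den = (1.0 - np.sum(x**2)) * (1.0 - np.sum(y**2))
    return float(np.arccosh(1.0 + num / den))


def is_ordinally_dense(points, dist, tol=1e-9):
    """True if some point c has all other pairwise distances >= distances to c."""
    pts = [np.asarray(p, dtype=float) for p in points]
    n = len(pts)
    for c in range(n):
        others = [i for i in range(n) if i != c]
        to_c = max(dist(pts[i], pts[c]) for i in others)
        among = min(dist(pts[i], pts[j]) for i in others for j in others if i < j)
        if among >= to_c - tol:  # tolerance admits the equality case
            return True
    return False


def polygon_with_center(n_vertices, radius):
    ang = 2 * np.pi * np.arange(n_vertices) / n_vertices
    ring = radius * np.column_stack([np.cos(ang), np.sin(ang)])
    return np.vstack([ring, [0.0, 0.0]])


hexagon_ok = is_ordinally_dense(polygon_with_center(6, 1.0), euclidean)
heptagon_ok = is_ordinally_dense(polygon_with_center(7, 1.0), euclidean)
hyperbolic_ok = is_ordinally_dense(polygon_with_center(12, 0.99), poincare)
euclid_12_ok = is_ordinally_dense(polygon_with_center(12, 0.99), euclidean)
```

In the Euclidean plane the hexagon with its center passes (with equality between side length and circumradius) while the heptagon fails. The same 12-gon that fails under the Euclidean distance passes under the Poincaré distance once its vertices are close enough to the boundary of the disk.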

Table 2: Numerical values for .

The ordinal capacity cannot be used to distinguish between and . However, it is possible to refine the ordinal capacity of spherical space if we only consider point sets with ; see Section 7.2.3.

Theorem. The -point ordinal spread of a space form is given by

This theorem gives a universal upper bound on the ordinal spread of point sets. We can use it to find a bound on the minimum dimension for embedding in a space form. In practice, given a set of non-metric measurements associated with point set , we calculate the empirical -point ordinal spread as

(4)

where . Then, we can find a lower bound for Euclidean (or spherical) embedding dimension by computing

(5)

The ordinal capacity of hyperbolic spaces is infinite, regardless of their dimension. Hence, this test cannot be used to give a lower bound on the dimension of hyperbolic space, as it always gives .

5 Numerical Results

In this section, we numerically illustrate a geometrical intuition for ordinal capacity number of Euclidean and hyperbolic spaces. Then, we experiment with popular real-world datasets, namely olfactory data [24] and Bitcoin Trust Network [25].

5.1 Stylized Experiments

We generate i.i.d. point sets from a normal distribution in -dimensional hyperbolic and Euclidean spaces. (Footnote 7: In the ’Loid model, we generate a random point where is normally distributed.) For trials, we plot the ordinal spread of each realization and varying sizes of point sets . The maximum ordinal spread of the generated point sets gives an estimate for the -point ordinal spread of Euclidean and hyperbolic spaces, see Section 3. We repeat this experiment by fixing a point in the center of the coordinate system, and projecting the remaining points to their circumscribed circle, i.e. point sets where

where . The random points yield a more accurate estimate for the and , see Figure 4. We also show that the individual points in sets with maximum ordinal spread accumulate on non-overlapping spherical caps of the circle, see Section 7.3. In the proof of Theorem 4, we show that there are strictly non-overlapping spherical caps for -dimensional Euclidean space, whereas this number is infinite for hyperbolic spaces. Therefore, the ordinal capacity of a space is equal to the total number of such caps plus the center point. The estimated -point ordinal spread of Euclidean space is close to the theoretical bound, e.g., we have , whereas the theoretical bound is . Finally, the estimated -point ordinal spread of a hyperbolic space matches its theoretical bound of .

Figure 4: Ordinal spread of i.i.d. point sets in and , in top and bottom rows. For a fixed , we show the point set with the maximum ordinal spread – for Figures and , for Figures and . The partitions in Figures and resemble the ordinally dense point sets shown in Figure 1 .

5.2 Geometry of Similarity Graphs

Generally, in non-metric embedding problems, the measurements are in the form of similarities (or dissimilarities) between a set of entities. In this section, we experiment with the olfactory [24] and Bitcoin Trust Network [25] datasets. The olfactory dataset contains mono-molecular odor concentrations of blueberries. There are odors across a total of fruit samples. The cross-correlations between mono-odor concentrations across samples represent the similarity measurements. The embedding goal is to find a representation for odors in a space form, such that

We summarize these distance comparisons in a non-increasing list of distances,

We randomly select up to different sub-cliques of size . In Figure 5 , we show the ordinal spread of each sub-clique. The maximum ordinal spread of these sub-cliques, , serves as a test for the -point ordinal spread of underlying space, (4). We compare with the theoretical values of , see (5). In this experiment, we show that the minimum dimension of Euclidean (and spherical) space must be at least .
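The sub-clique test can be sketched as follows: sample random subsets of entities, compute the ordinal spread of each from the order of the dissimilarities alone, and keep the maximum. The toy data, the function names, and the choice of negated correlation as a dissimilarity are our assumptions (any order-reversing map of the similarities gives the same result). The theoretical values needed for the comparison in (5) are not reproduced here.

```python
import math

import numpy as np


def last_ordinal_spread(D):
    """Position (1-indexed) where the last new label enters the sorted list."""
    n = D.shape[0]
    iu = np.triu_indices(n, k=1)
    order = np.argsort(-D[iu], kind="stable")  # largest dissimilarity first
    seen = set()
    for position, k in enumerate(order, start=1):
        seen.update((iu[0][k], iu[1][k]))
        if len(seen) == n:
            return position


def empirical_ordinal_spread(D, clique_size, n_cliques, rng):
    """Maximum ordinal spread over randomly sampled sub-cliques."""
    n = D.shape[0]
    best = 0
    for _ in range(n_cliques):
        idx = rng.choice(n, size=clique_size, replace=False)
        best = max(best, last_ordinal_spread(D[np.ix_(idx, idx)]))
    return best


# Toy stand-in for a similarity graph: correlations between 12 random signals.
rng = np.random.default_rng(1)
data = rng.standard_normal((40, 12))           # samples x entities
similarity = np.corrcoef(data, rowvar=False)   # 12 x 12 correlation matrix
dissimilarity = -similarity                    # higher similarity = closer

clique_size = 6
estimate = empirical_ordinal_spread(dissimilarity, clique_size, 100, rng)
```

Because the estimate is a maximum over sampled sub-cliques, it can only increase as more sub-cliques are drawn, so the resulting dimension bound is conservative for small sample counts.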

Keeping a record of Bitcoin users’ reputation prevents transactions with fraudulent users. The Bitcoin OTC trust network is a weighted who-trusts-whom graph of people [25]. There are members in the network. The member rates another member with an integer between (total distrust) and (total trust). This is normalized to a non-negative number in interval, , and interpreted as the probability that user trusts user . For a network with nodes, there could be up to such trust probabilities. (Footnote 8: We assume each member trusts itself with probability of , and in general.) If is unavailable for a pair , we replace it with the average trust probability of the network. To embed such probabilities, we relate the distance between two users to a function of their probability of mutual trust, i.e.

where .

Similarly, we randomly choose up to different sub-cliques of size . In Figure 5 , we show the ordinal spread of each sub-clique, along with their maximum value. The theoretical values for again suggest that the Euclidean embedding dimension must be at least . This estimate could be improved by sampling more sub-cliques, since the total number of sub-cliques grows rapidly with their size.

Figure 5: The ordinal spread of randomly chosen sub-cliques of size in olfactory dataset and Bitcoin Trust Network.

6 Conclusion

In this paper, we focus on inferring the geometry of space forms only from distance comparisons between a set of entities. We introduce novel notions such as ordinal capacity and spread for a metric space, as well as ordinally dense discrete sets. We provide a theoretical lower bound for the embedding dimension of Euclidean and spherical spaces. Our geometrical approach to studying embedding spaces in non-metric problems brings a new perspective to the design of similar algorithms. Future work includes finding a useful upper bound for embedding dimensions, and generalizing the results to hyperbolic spaces.

Broader Impact

This work provides a theoretical framework to identify the underlying geometry of space forms from distance comparisons. The authors believe that this study does not have any future societal impacts.

7 Appendices

7.1 Proof of Proposition 3

From Section 3, the values for and are trivial. The lower bound for simply follows from the uniqueness of pairwise distances. To put it formally, we have

For the upper bound, is maximum when all smallest pairwise distances are incident to a unique point; for example, see Figure 1 . The total length of the distance list is . Therefore, we have

7.2 Proof of Theorem 4

Let us separately consider hyperbolic, Euclidean, and spherical spaces.

7.2.1 Hyperbolic space

Let , and be a set of parameterized points in ’Loid model of -dimensional hyperbolic space, such that

where , and . For an example, see Figure 6. Therefore,

Therefore, for any , there exists a such that . Hence,

Figure 6: An example of parameterized points in and in .

7.2.2 Euclidean space

There is a set of points in such that

where and .

Proof.

Let be a set of points in such that

or . Without loss of generality, we assume and . Let and . We want to show that . Following the definition of ordinal spread, we have

where holds with equality if appears last in the sorted distance list, is due to . To prove inequality , let for distinct . Then,

where follows from , , , and follows from the symmetry in the argument. Therefore, we have

Hence, is an ordinally dense subset of . ∎

From Section 7.2.2, we want to find an ordinally dense set of points in such that

and . From the definition of ordinal spread, we have

Therefore, we can find the maximum number of ordinally dense points by solving a spherical cap packing problem, see Figure 7.

Figure 7: Spherical -cap packing on the surface of a unit sphere .

Let be the -dimensional unit sphere in . We define the spherical -cap as

for any .

The maximum number of non-overlapping is defined as

Therefore, we have