N-sphere chord length distribution

11/20/2014 ∙ by Panagiotis Sidiropoulos, et al. ∙ UCL

This work studies the distribution of the chord length when both chord ends lie on an N-dimensional hypersphere (N ≥ 2). By connecting this distribution to the recently derived closed-form expression for the surface of a hyperspherical cap [3], closed-form expressions of both the probability density function and the cumulative distribution function are obtained. These are followed by a discussion of the basic properties of the distribution, including its dependence on the hypersphere dimension. Additionally, the distribution of the dot product of unit vectors, a problem closely related to the chord length, is derived.


I Introduction

Determining the chord length distribution when the chord ends lie on a given locus is a challenging problem that remains only partially solved. Recently, however, closed-form solutions have been established for several special locus types, such as a regular polygon [1], a parallelogram [6] and a cube [5]. Additionally, properties of the distribution in more generic scenarios have been examined and modelled, such as the average chord length in a compact convex subset of an n-dimensional Euclidean space [2].

The chord length distribution of a hypersphere is still not available, even though it is of mathematical interest and can also be employed in a multitude of applications. For example, since normalised vectors lie by definition on a hypersphere of radius $R = 1$, the $N$-sphere chord length distribution could be used as a “randomness” benchmark, indicating whether a set of normalised vectors has been selected uniformly and independently. Furthermore, in applications where the input data are known only through their distances, the detection of $N$-dimensional hyperspheres through their “chord distribution footprint” could provide a lower bound on the dimension of the unknown input data. Moreover, this distribution could be used to train classifiers with positive examples only, in cases where the negative examples can be modelled as uniformly and independently selected distances.

Recently, S. Li derived a closed-form expression for the surface of a hyperspherical cap [3], in which the cap surface is given as a fraction of the total hypersphere surface. In this work it is proven that the cumulative chord length distribution of a hypersphere reduces to the ratio of the hyperspherical-cap surface over the total hypersphere surface. This proof, along with the resulting probability density function (pdf) and cumulative distribution function (cdf), is given in Section II, followed by the properties of the new distribution, including its dependence on the hypersphere dimension $N$, in Section III. The distribution of the dot product of unit vectors, which is determined from the chord length distribution, is examined in Section IV, while Section V concludes this work.

II N-sphere chord length distribution

Let $p_i$, $i = 1, \dots, M$, be points selected uniformly and independently from the surface of an $N$-dimensional hypersphere of radius $R$, i.e., $\|p_i\| = R$. The pairwise Euclidean distances of the $p_i$ generate a set of distances $d_{ij} = \|p_i - p_j\|$ ($i < j$). We would like to determine the distribution that the distance values $d$ follow. Since the distances are invariant to the coordinate system, the latter is selected so that the first end of the chord is always the point $p = (0, \dots, 0, R)$. The second end of the chord then determines the chord length.
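
The sampling setup described above can be sketched numerically. The following is a minimal Python sketch, assuming that uniform points are obtained by normalising vectors of independent standard Gaussian coordinates (a standard construction that is not stated in this paper). The names `sample_on_sphere` and `pairwise_chords`, the seed and the sample sizes are illustrative choices.

```python
import math
import random

def sample_on_sphere(n_dim, radius, rng):
    """Draw one point uniformly from the hypersphere ||p|| = radius in R^n_dim."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(n_dim)]
        norm = math.sqrt(sum(x * x for x in g))
        if norm > 0.0:  # guard against the (practically impossible) zero vector
            return [radius * x / norm for x in g]

def pairwise_chords(points):
    """Return the Euclidean distances of all unordered pairs of points."""
    chords = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            chords.append(math.dist(points[i], points[j]))
    return chords

rng = random.Random(0)
points = [sample_on_sphere(5, 1.0, rng) for _ in range(200)]
chords = pairwise_chords(points)
```

The list `chords` can then be histogrammed and compared against the closed-form distribution derived in this section.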

When $N = 2$ the hypersphere is a circle and the objective reduces to the known (e.g., [7]) circle chord length distribution, whose pdf $f_2(d)$ and cdf $P_2(d)$ are given by the following formulas:

$$f_2(d) = \frac{1}{\pi R \sqrt{1 - \frac{d^2}{4R^2}}} \tag{1}$$
$$P_2(d) = \frac{1}{\pi} \arccos\left(1 - \frac{d^2}{2R^2}\right) \tag{2}$$

However, in the general case (i.e., when $N > 2$) there are no closed-form expressions for either the pdf $f_N(d)$ or the cdf $P_N(d)$. In this paper we introduce these formulas. To this end, we first need to prove the following:

Proposition II.1.

The locus of the $N$-sphere points that have a constant distance $d$ from a fixed point on it is an $(N-1)$-sphere of radius $r = d\sqrt{1 - \frac{d^2}{4R^2}}$.

Proof.

Let the fixed point $p$ have coordinates $(0, \dots, 0, R)$. Then the points $(x_1, \dots, x_N)$ of the hypersphere that have distance $d$ from $p$ have coordinates that satisfy the following two equations:

$$\sum_{i=1}^{N} x_i^2 = R^2 \tag{3}$$
$$\sum_{i=1}^{N-1} x_i^2 + (x_N - R)^2 = d^2 \tag{4}$$

Subtracting Eq. (4) from Eq. (3) gives

$$x_N = R - \frac{d^2}{2R} \tag{5}$$

Substituting into Eq. (3) we get

$$\sum_{i=1}^{N-1} x_i^2 = d^2 - \frac{d^4}{4R^2} \tag{6}$$

The $(N-1)$-sphere of Eq. (6), whose radius is $r = d\sqrt{1 - \frac{d^2}{4R^2}}$, is the intersection of the $N$-sphere with the hyperplane $x_N = R - \frac{d^2}{2R}$. Since $x_N$ decreases monotonically with $d$, all points of the $N$-sphere with distance $d'$ from $p$, $d' < d$, lie “north” of the $(N-1)$-sphere (i.e., have a larger $x_N$ value than the points on it), while all points with distance $d'$ from $p$, $d' > d$, lie “south” of the $(N-1)$-sphere. It follows that the above hyperplane cuts the hypersphere into two parts, defined by the distance from $p$ being smaller or larger than $d$. Since a hyperspherical cap is by definition the portion of a hypersphere that is cut off by a hyperplane, the two parts are hyperspherical caps. Consequently, it has been shown that

Proposition II.2.

The locus of the $N$-sphere points that have distance $d' \le d$ from a point on it is a hyperspherical cap of radius $r = d\sqrt{1 - \frac{d^2}{4R^2}}$.

Since the maximum distance $d$, the cap radius $r$ and the cap height $h$ form a right triangle (Fig. 1), the height of the cap is

$$h = \sqrt{d^2 - r^2} = \frac{d^2}{2R} \tag{7}$$
Fig. 1: A hyperspherical cap and the relation between the maximum distance $d$ from a point $p$, the hyperspherical cap height $h$ and its radius $r$.
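
The geometry of the cap can be checked with a few lines of arithmetic. The sketch below assumes the relations $x_N = R - d^2/(2R)$, $r = d\sqrt{1 - d^2/(4R^2)}$ and $h = d^2/(2R)$ that follow from the derivation above; the function name `cap_geometry` is illustrative.

```python
import math

def cap_geometry(d, radius):
    """Return (x_n, cap_radius, cap_height) for a chord of length d.

    x_n is the last coordinate of the cutting hyperplane, measured along
    the axis through the fixed point p = (0, ..., 0, radius).
    """
    x_n = radius - d * d / (2.0 * radius)
    cap_radius = d * math.sqrt(1.0 - d * d / (4.0 * radius * radius))
    cap_height = d * d / (2.0 * radius)
    return x_n, cap_radius, cap_height

x_n, cap_radius, cap_height = cap_geometry(1.2, 2.0)
# Right triangle of Fig. 1: cap_radius^2 + cap_height^2 == d^2
right_triangle_gap = cap_radius**2 + cap_height**2 - 1.2**2
```

For $d = 1.2$ and $R = 2$ the residual `right_triangle_gap` vanishes up to rounding, and `x_n + cap_height` recovers the radius.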

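S. Li recently proved [3] that the surface of a hyperspherical cap that is not larger than a hyper-hemisphere (i.e., $h \le R$) is given by the following formula:

$$A_N^{cap}(R) = \frac{1}{2} A_N(R)\, I_{\sin^2\phi}\left(\frac{N-1}{2}, \frac{1}{2}\right) \tag{8}$$

In Eq. (8), $N$ is the hypersphere dimension, $R$ its radius, $A_N(R)$ the hypersphere surface, $\phi$ the colatitude angle [3] and $I_x(a, b)$ the regularised incomplete beta function given by

$$I_x(a, b) = \frac{B(x; a, b)}{B(a, b)} = \frac{\int_0^x t^{a-1}(1-t)^{b-1}\,dt}{\int_0^1 t^{a-1}(1-t)^{b-1}\,dt} \tag{9}$$

The colatitude angle, the sphere radius and the hyperspherical cap height satisfy the following [3]:

$$\cos\phi = \frac{R - h}{R} \tag{10}$$

By replacing $h$ from Eq. (7) it follows that

$$\sin^2\phi = 1 - \left(1 - \frac{d^2}{2R^2}\right)^2 = \frac{d^2}{R^2} - \frac{d^4}{4R^4} \tag{11}$$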

Finally, from Proposition II.2, Eq. (8) and Eq. (11) it is deduced that

Proposition II.3.

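The cumulative distribution function of the $N$-sphere chord length is

$$P_N(d) = \begin{cases} \frac{1}{2} I_{\frac{d^2}{R^2} - \frac{d^4}{4R^4}}\left(\frac{N-1}{2}, \frac{1}{2}\right), & 0 \le d \le \sqrt{2}R \\ 1 - \frac{1}{2} I_{\frac{d^2}{R^2} - \frac{d^4}{4R^4}}\left(\frac{N-1}{2}, \frac{1}{2}\right), & \sqrt{2}R < d \le 2R \end{cases} \tag{12}$$

Proposition II.4.

The probability density function of the $N$-sphere chord length is:

$$f_N(d) = \frac{d}{R^2\, B\left(\frac{N-1}{2}, \frac{1}{2}\right)} \left(\frac{d^2}{R^2} - \frac{d^4}{4R^4}\right)^{\frac{N-3}{2}}, \quad 0 \le d \le 2R \tag{13}$$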
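
A minimal numerical sketch of the two functions follows. The pdf is evaluated from the closed form $f_N(d) = \frac{d}{R^2 B(\frac{N-1}{2}, \frac{1}{2})}\left(\frac{d^2}{R^2} - \frac{d^4}{4R^4}\right)^{(N-3)/2}$. For the cdf, the sketch avoids an incomplete beta routine by integrating over the angle $\theta$ between the two chord ends, using $d = 2R\sin(\theta/2)$ and a density proportional to $\sin^{N-2}\theta$. This angular reformulation, the Simpson rule and all function names are assumptions of the sketch rather than part of the derivation above.

```python
import math

def chord_pdf(d, n_dim, radius=1.0):
    """Closed-form pdf of the chord length for 0 < d < 2 * radius."""
    a = (n_dim - 1) / 2.0
    log_beta = math.lgamma(a) + math.lgamma(0.5) - math.lgamma(a + 0.5)
    x = (d / radius) ** 2 - (d / radius) ** 4 / 4.0
    return d / radius**2 * x ** ((n_dim - 3) / 2.0) / math.exp(log_beta)

def _sin_power_integral(upper, power, steps=2000):
    """Composite Simpson estimate of the integral of sin(t)**power on [0, upper]."""
    if upper <= 0.0:
        return 0.0
    width = upper / steps  # steps must be even for Simpson's rule
    total = math.sin(0.0) ** power + math.sin(upper) ** power
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * math.sin(i * width) ** power
    return total * width / 3.0

def chord_cdf(d, n_dim, radius=1.0):
    """Probability that the chord length is at most d (numerical estimate)."""
    ratio = min(1.0, max(0.0, d / (2.0 * radius)))
    theta = 2.0 * math.asin(ratio)  # angle subtended by a chord of length d
    power = n_dim - 2
    return _sin_power_integral(theta, power) / _sin_power_integral(math.pi, power)
```

For example, `chord_cdf(1.0, 3)` evaluates the probability that two random points on the unit sphere in three dimensions are less than one radius apart.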

III Some properties of the normalised vector distance distribution

Equations (12) and (13) give the cdf and the pdf of the $N$-sphere chord length distribution, respectively. In this section, some of its basic properties are examined. Furthermore, we discuss how these properties vary with $N$, especially when $N \to \infty$.

III-A Basic properties

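Eq. (13) implies that the pdfs $f_2(d)$, $f_3(d)$ and $f_4(d)$ are as follows:

$$f_2(d) = \frac{1}{\pi R \sqrt{1 - \frac{d^2}{4R^2}}} \tag{14}$$
$$f_3(d) = \frac{d}{2R^2} \tag{15}$$
$$f_4(d) = \frac{2d^2}{\pi R^3} \sqrt{1 - \frac{d^2}{4R^2}} \tag{16}$$

while, from Eq. (12), the cdfs $P_2(d)$, $P_3(d)$ and $P_4(d)$ are

$$P_2(d) = \frac{1}{\pi} \arccos\left(1 - \frac{d^2}{2R^2}\right) \tag{17}$$
$$P_3(d) = \frac{d^2}{4R^2} \tag{18}$$
$$P_4(d) = \frac{1}{\pi} \arccos\left(1 - \frac{d^2}{2R^2}\right) - \frac{d}{\pi R}\left(1 - \frac{d^2}{2R^2}\right)\sqrt{1 - \frac{d^2}{4R^2}} \tag{19}$$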
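
The low-dimensional special cases can be cross-checked by differentiating each cdf numerically and comparing it with the corresponding pdf. The sketch below assumes the forms $f_2 = 1/(\pi R\sqrt{1 - d^2/(4R^2)})$, $f_3 = d/(2R^2)$, $f_4 = \frac{2d^2}{\pi R^3}\sqrt{1 - d^2/(4R^2)}$ and $P_2 = \frac{1}{\pi}\arccos(1 - \frac{d^2}{2R^2})$, $P_3 = d^2/(4R^2)$, $P_4 = P_2 - \frac{d}{\pi R}(1 - \frac{d^2}{2R^2})\sqrt{1 - d^2/(4R^2)}$, which are our own evaluation of the general formulas; the function names are illustrative.

```python
import math

def cdf_closed(d, n_dim, radius=1.0):
    """Closed-form cdf of the chord length for N = 2, 3, 4."""
    u = d / radius
    base = math.acos(1.0 - u * u / 2.0) / math.pi
    if n_dim == 2:
        return base
    if n_dim == 3:
        return u * u / 4.0
    if n_dim == 4:
        return base - u / math.pi * (1.0 - u * u / 2.0) * math.sqrt(1.0 - u * u / 4.0)
    raise ValueError("only N = 2, 3, 4 are covered by this sketch")

def pdf_closed(d, n_dim, radius=1.0):
    """Closed-form pdf of the chord length for N = 2, 3, 4."""
    u = d / radius
    root = math.sqrt(1.0 - u * u / 4.0)
    if n_dim == 2:
        return 1.0 / (math.pi * radius * root)
    if n_dim == 3:
        return u / (2.0 * radius)
    if n_dim == 4:
        return 2.0 * u * u * root / (math.pi * radius)
    raise ValueError("only N = 2, 3, 4 are covered by this sketch")

def derivative_gap(d, n_dim, radius=1.0, step=1e-5):
    """Absolute gap between a central difference of the cdf and the pdf."""
    slope = (cdf_closed(d + step, n_dim, radius) - cdf_closed(d - step, n_dim, radius)) / (2.0 * step)
    return abs(slope - pdf_closed(d, n_dim, radius))
```

A small value of `derivative_gap` at several interior points indicates that each pdf and cdf pair is mutually consistent.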

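When $d = \sqrt{2}R$, then $\frac{d^2}{R^2} - \frac{d^4}{4R^4} = 1$. Since $I_1(a, b) = 1$, Eq. (12) implies that $P_N(\sqrt{2}R) = \frac{1}{2}$, i.e., that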

Proposition III.1.

The median value of the $N$-sphere chord length distribution is independent of $N$ and equal to $\sqrt{2}R$.

The cumulative distribution functions and the probability density functions for several values of $N$ are shown in Figs. 2 and 3, respectively. In order to compute the distribution moments, we use a variable transform such as $u = \frac{d^2}{4R^2}$, which leads to the following formula:

$$E[d^k] = \int_0^{2R} t^k f_N(t)\,dt = (2R)^k\, 2^{N-2}\, \frac{B\left(\frac{N+k-1}{2}, \frac{N-1}{2}\right)}{B\left(\frac{N-1}{2}, \frac{1}{2}\right)} \tag{20}$$
Fig. 2: The cumulative distribution functions for six values of $N$, panels (a) to (f).
Fig. 3: The probability density functions for six values of $N$, panels (a) to (f).

Eq. (20) gives the $k$-th order moment of the $N$-sphere chord length distribution. For $k = 1$, the mean value is obtained:

Proposition III.2.

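The mean value of the $N$-sphere chord length distribution is

$$\bar{d} = 2^{N-1} R\, \frac{B\left(\frac{N}{2}, \frac{N-1}{2}\right)}{B\left(\frac{N-1}{2}, \frac{1}{2}\right)} = \frac{2^{N-1} R\, \Gamma\left(\frac{N}{2}\right)^2}{\sqrt{\pi}\, \Gamma\left(N - \frac{1}{2}\right)} \tag{21}$$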
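
The moments can be evaluated stably through log-gamma functions. The sketch below assumes the moment formula $E[d^k] = (2R)^k\, 2^{N-2}\, B(\frac{N+k-1}{2}, \frac{N-1}{2}) / B(\frac{N-1}{2}, \frac{1}{2})$, obtained by integrating $d^k$ against the pdf of Eq. (13); the function names are illustrative.

```python
import math

def log_beta(a, b):
    """Natural logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def chord_moment(k, n_dim, radius=1.0):
    """k-th raw moment of the chord length on an n_dim-dimensional hypersphere."""
    log_ratio = (log_beta((n_dim + k - 1) / 2.0, (n_dim - 1) / 2.0)
                 - log_beta((n_dim - 1) / 2.0, 0.5))
    return (2.0 * radius) ** k * math.exp((n_dim - 2) * math.log(2.0) + log_ratio)

mean_circle = chord_moment(1, 2)   # known circle value 4R/pi
mean_sphere = chord_moment(1, 3)   # known sphere value 4R/3
```

Setting `k = 0` returns 1, which is a quick check that the pdf is normalised.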

For the second order moment, on the other hand, the following proposition holds:

Proposition III.3.

The second order moment of the $N$-sphere chord length distribution is independent of $N$ and equal to $2R^2$.

Proof.

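By replacing $k = 2$ in Eq. (20) we obtain the following:

$$E[d^2] = 2^N R^2\, \frac{B\left(\frac{N+1}{2}, \frac{N-1}{2}\right)}{B\left(\frac{N-1}{2}, \frac{1}{2}\right)} = \frac{2^N R^2\, \Gamma\left(\frac{N}{2}\right) \Gamma\left(\frac{N+1}{2}\right)}{\sqrt{\pi}\, \Gamma(N)} \tag{22}$$

where $\Gamma$ is the gamma function. It is known that for all $x > 0$ the following property (the Legendre duplication formula) holds [8]:

$$\Gamma(x)\, \Gamma\left(x + \frac{1}{2}\right) = 2^{1-2x} \sqrt{\pi}\, \Gamma(2x) \tag{23}$$

By applying Eq. (23) with $x = \frac{N}{2}$ to Eq. (22) it follows that $E[d^2] = 2R^2$. ∎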

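Propositions III.2 and III.3 imply that the variance $\sigma^2 = E[d^2] - \bar{d}^2$ is

$$\sigma^2 = 2R^2 - \frac{2^{2N-2} R^2\, \Gamma\left(\frac{N}{2}\right)^4}{\pi\, \Gamma\left(N - \frac{1}{2}\right)^2} \tag{24}$$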

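Finally, it should be noted that the Bertrand problem [4], which refers to the probability of a random chord being larger than the radius, can be answered by replacing $d = R$ in Eq. (12), i.e.,

$$P_N(R) = \frac{1}{2} I_{\frac{3}{4}}\left(\frac{N-1}{2}, \frac{1}{2}\right) \tag{25}$$

which is the probability of a chord being shorter than the radius; the probability of it being longer is $1 - P_N(R)$.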

Fig. 4 shows $P_N(R)$ as a function of $N$. More results on the dependence of the chord length distribution on the hypersphere dimension are given in the next sub-section.

Fig. 4: The probability that the chord length of a hypersphere is smaller than the hypersphere radius, as a function of the hypersphere dimension.
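
As a worked illustration (our own evaluation, not taken from the figure), note that $d = R$ gives the argument $\frac{d^2}{R^2} - \frac{d^4}{4R^4} = \frac{3}{4}$. Using the elementary values $I_x(\frac{1}{2}, \frac{1}{2}) = \frac{2}{\pi}\arcsin\sqrt{x}$ and $I_x(1, \frac{1}{2}) = 1 - \sqrt{1 - x}$, together with the standard recurrence $I_x(a+1, b) = I_x(a, b) - \frac{x^a (1-x)^b}{a\,B(a, b)}$, the probability that a chord is shorter than the radius in the three lowest dimensions is

$$P_2(R) = \frac{1}{2} \cdot \frac{2}{\pi} \arcsin\frac{\sqrt{3}}{2} = \frac{1}{3}, \qquad P_3(R) = \frac{1}{2}\left(1 - \frac{1}{2}\right) = \frac{1}{4}, \qquad P_4(R) = \frac{1}{3} - \frac{\sqrt{3}}{4\pi} \approx 0.1955$$

so the probability decreases as the dimension grows.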

III-B Dependence on the hypersphere dimension

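The cdf of Eq. (12) can be computed recursively using the following formula for regularised incomplete beta functions:

$$I_x(a+1, b) = I_x(a, b) - \frac{x^a (1-x)^b}{a\, B(a, b)} \tag{26}$$

In the $N$-sphere chord length distribution case $a = \frac{N-1}{2}$, $b = \frac{1}{2}$ and $x = \frac{d^2}{R^2} - \frac{d^4}{4R^4}$, so that increasing $a$ by one corresponds to increasing $N$ by two. $x$ is a variable that takes values in the interval $[0, 1]$ when $d \in [0, 2R]$. This signifies that the term subtracted in the right part of Eq. (26) is always positive in the interval $(0, \sqrt{2}R)$, i.e., that the cdf values that correspond to a fixed $d$ decrease with $N$. Similarly, when $d \in (\sqrt{2}R, 2R)$ the cdf is $1 - \frac{1}{2} I_x(a, b)$, so the same term enters with the opposite sign, which means that the cdf values that correspond to a fixed $d$ increase with $N$. Summarising: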

Proposition III.4.

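If $P_N(d)$ is the probability that a chord length is smaller than $d$, when both chord ends lie on a hypersphere of radius $R$ and dimension $N$ then

  • If $0 < d < \sqrt{2}R$, then $P_{N+2}(d) < P_N(d)$,

  • If $\sqrt{2}R < d < 2R$, then $P_{N+2}(d) > P_N(d)$.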
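
The recursive evaluation can be sketched as follows. The code assumes the standard recurrence $I_x(a+1, b) = I_x(a, b) - \frac{x^a(1-x)^b}{a\,B(a,b)}$ with $a = \frac{N-1}{2}$, $b = \frac{1}{2}$ and $x = \frac{d^2}{R^2} - \frac{d^4}{4R^4}$, started from the elementary base cases $I_x(\frac{1}{2}, \frac{1}{2}) = \frac{2}{\pi}\arcsin\sqrt{x}$ (for $N = 2$) and $I_x(1, \frac{1}{2}) = 1 - \sqrt{1 - x}$ (for $N = 3$). The base cases and function names are choices of this sketch.

```python
import math

def reg_inc_beta_half(x, n_dim):
    """I_x((N-1)/2, 1/2) for integer N >= 2, via the incomplete beta recurrence."""
    if n_dim % 2 == 0:
        a, value = 0.5, 2.0 / math.pi * math.asin(math.sqrt(x))  # base case N = 2
    else:
        a, value = 1.0, 1.0 - math.sqrt(1.0 - x)                 # base case N = 3
    target = (n_dim - 1) / 2.0
    while a < target:
        beta_ab = math.exp(math.lgamma(a) + math.lgamma(0.5) - math.lgamma(a + 0.5))
        value -= x ** a * math.sqrt(1.0 - x) / (a * beta_ab)
        a += 1.0  # one recurrence step raises the dimension by two
    return value

def chord_cdf(d, n_dim, radius=1.0):
    """Cdf of the chord length, using the two-branch form around sqrt(2) * radius."""
    u2 = (d / radius) ** 2
    x = min(1.0, max(0.0, u2 - u2 * u2 / 4.0))
    half = 0.5 * reg_inc_beta_half(x, n_dim)
    return half if u2 <= 2.0 else 1.0 - half
```

Evaluating `chord_cdf` at a fixed `d` for increasing `n_dim` shows the cdf falling below the median distance and rising above it.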

Proposition III.4 can be used to bound the degrees of freedom in cases where the only input data are dissimilarity scores, if a process for generating uniform and independent scores is available. Moreover, the above proposition implies that as the dimension increases the pdf of Eq. (13) becomes more concentrated around $\sqrt{2}R$ (Fig. 3).

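The latter is established through the following property of the mean value $\bar{d}$:

Proposition III.5.

$$\lim_{N \to \infty} \bar{d} = \sqrt{2}R$$

Proof.

The Stirling approximation for factorials suggests that

$$n! \approx \sqrt{2\pi n} \left(\frac{n}{e}\right)^n \tag{27}$$

or, equivalently for the gamma function, $\Gamma(z) \approx \sqrt{\frac{2\pi}{z}} \left(\frac{z}{e}\right)^z$. Through the Stirling approximation, Eq. (21) becomes

$$\bar{d} \approx \frac{2^{N-1} R}{\sqrt{\pi}} \cdot \frac{2\pi \left(\frac{N}{2}\right)^{N-1} e^{-N}}{\sqrt{2\pi} \left(N - \frac{1}{2}\right)^{N-1} e^{-N + \frac{1}{2}}} = \sqrt{2}R\, e^{-\frac{1}{2}} \left(1 + \frac{1}{2N-1}\right)^{N-1} \tag{28}$$

i.e.,

$$\lim_{N \to \infty} \bar{d} = \sqrt{2}R\, e^{-\frac{1}{2}} e^{\frac{1}{2}} = \sqrt{2}R \tag{29}$$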
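
The limit can also be observed numerically. The sketch below assumes the Gamma-function form of the mean, $\bar{d} = 2^{N-1} R\, \Gamma(\frac{N}{2})^2 / (\sqrt{\pi}\, \Gamma(N - \frac{1}{2}))$, and the second moment $2R^2$; it works with log-gamma values so that large $N$ does not overflow. The function names are illustrative.

```python
import math

def chord_mean(n_dim, radius=1.0):
    """Mean chord length, evaluated in log space to stay finite for large N."""
    log_mean = ((n_dim - 1) * math.log(2.0)
                + 2.0 * math.lgamma(n_dim / 2.0)
                - 0.5 * math.log(math.pi)
                - math.lgamma(n_dim - 0.5))
    return radius * math.exp(log_mean)

def chord_variance(n_dim, radius=1.0):
    """Variance of the chord length: second moment 2 R^2 minus the squared mean."""
    return 2.0 * radius**2 - chord_mean(n_dim, radius) ** 2

# Gap between the mean and sqrt(2) * R for growing dimension (R = 1)
gaps = [math.sqrt(2.0) - chord_mean(n) for n in (2, 16, 256, 10000)]
```

The entries of `gaps` shrink towards zero as the dimension increases, and `chord_variance` shrinks with them.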

It follows from Propositions III.3 and III.5 that the variance $\sigma^2$ tends to zero as the dimension approaches infinity, i.e.,

Proposition III.6.

$$\lim_{N \to \infty} \sigma^2 = 0$$

Figures 5 and 6 show $\bar{d}$ and $\sigma^2$ as a function of $N$, respectively. From Proposition III.6 it follows that as the dimension increases the hypersphere chord lengths are concentrated within a small range around $\sqrt{2}R$. In order to give a quantitative measure of the chord length range, the difference between the largest and the smallest $q$-quantile is employed. Note that this distance is the length of the interval that contains the fraction $\frac{q-2}{q}$ of most central values. The results are shown in Table I. Table I demonstrates that when $N$ is large most chord lengths lie within a small range around $\sqrt{2}R$. For example, 87.5% of the chord lengths of a 256-sphere lie inside an interval of length $0.1358R$ around $\sqrt{2}R$, i.e., an interval that spans only about 6.8% of the available distance range.

Fig. 5: The mean chord distance $\bar{d}$ as a function of the hypersphere dimension $N$.
Fig. 6: The chord distance variance $\sigma^2$ as a function of the hypersphere dimension $N$.
N     q = 3 (33.3%)   q = 4 (50%)   q = 6 (66.7%)   q = 8 (75%)   q = 16 (87.5%)
2     0.7321          1.0824        1.4142          1.5714        1.7943
3     0.4783          0.7321        1.0092          1.1637        1.4365
4     0.3781          0.5839        0.8173          0.9534        1.2101
8     0.2377          0.3703        0.5261          0.6210        0.8102
16    0.1597          0.2494        0.3563          0.4222        0.5581
32    0.1102          0.1724        0.2468          0.2930        0.3890
64    0.0770          0.1205        0.1727          0.2052        0.2731
128   0.0542          0.0848        0.1215          0.1444        0.1925
256   0.0382          0.0598        0.0857          0.1019        0.1358
TABLE I: Quantile ranges (for $R = 1$) for different $N$ and $q$. In parentheses is the percentage of distribution values that lie inside the range.
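
The quantile ranges can be reproduced with a tabulated inverse cdf. The sketch below assumes the angular form of the distribution (angle $\theta$ between the chord ends with density proportional to $\sin^{N-2}\theta$ and $d = 2R\sin(\theta/2)$), a midpoint-rule cumulative sum and linear interpolation; these numerical choices and the name `quantile_range` are ours and are not the procedure used to produce Table I.

```python
import math

def quantile_range(n_dim, q, radius=1.0, grid=20000):
    """Distance between the (q-1)/q and 1/q quantiles of the chord length."""
    step = math.pi / grid
    cumulative = [0.0]
    for i in range(grid):  # midpoint rule for the integral of sin(t)**(N-2)
        cumulative.append(cumulative[-1] + math.sin((i + 0.5) * step) ** (n_dim - 2) * step)
    total = cumulative[-1]

    def chord_at(prob):
        """Chord length whose cdf value equals prob (0 < prob < 1)."""
        target = prob * total
        lo, hi = 0, grid
        while hi - lo > 1:  # binary search for the bracketing grid cell
            mid = (lo + hi) // 2
            if cumulative[mid] <= target:
                lo = mid
            else:
                hi = mid
        frac = (target - cumulative[lo]) / (cumulative[hi] - cumulative[lo])
        theta = (lo + frac) * step
        return 2.0 * radius * math.sin(theta / 2.0)

    return chord_at(1.0 - 1.0 / q) - chord_at(1.0 / q)
```

For instance, `quantile_range(2, 3)` corresponds to the first entry of Table I and `quantile_range(256, 16)` to the last one.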

An intuitive analysis of how the distribution changes with $N$ can be made by considering the 3-dimensional sphere as a globe. In this case, the fixed point $p$ is the North Pole, and the points at distance $\sqrt{2}R$ from $p$ are the points on the equator. Seen in this light, the concentration of the distribution around $\sqrt{2}R$ denotes an expansion of the equatorial areas at the expense of the high-latitude areas. In the limit $N \to \infty$, “almost all” points of the hypersphere lie on the equator (meaning that the set of points at distance $\sqrt{2}R$ from $p$ contains infinitely more points than the set of points at a distance either larger or smaller than $\sqrt{2}R$).

IV Unitary vector dot product distribution

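In the hypersphere of radius $R = 1$ the distance $d$ of two points $p_1$ and $p_2$ and the dot product $s$ of the vectors $\vec{Op_1}$ and $\vec{Op_2}$ (where $O$ is the centre of the hypersphere) are connected via the following relation:

$$d^2 = 2 - 2s \tag{30}$$

Since the distance distribution is already known, the dot product distribution can be derived through Eq. (30). In fact, if $Q_N(s)$ is the cdf of the unitary dot product then

$$Q_N(s) = \Pr\left(d^2 \ge 2 - 2s\right) \tag{31}$$

The values of $d$ are always non-negative, thus the above equation can be reformulated as

$$Q_N(s) = 1 - P_N\left(\sqrt{2 - 2s}\right) \tag{32}$$

By replacing $d = \sqrt{2 - 2s}$ and $R = 1$ in Eq. (12), for which $\frac{d^2}{R^2} - \frac{d^4}{4R^4} = 1 - s^2$, it follows that

$$Q_N(s) = \begin{cases} \frac{1}{2} I_{1 - s^2}\left(\frac{N-1}{2}, \frac{1}{2}\right), & -1 \le s < 0 \\ 1 - \frac{1}{2} I_{1 - s^2}\left(\frac{N-1}{2}, \frac{1}{2}\right), & 0 \le s \le 1 \end{cases} \tag{33}$$

The corresponding pdf is

$$q_N(s) = \frac{\left(1 - s^2\right)^{\frac{N-3}{2}}}{B\left(\frac{N-1}{2}, \frac{1}{2}\right)}, \quad -1 \le s \le 1 \tag{34}$$

The pdf is symmetric around $s = 0$, thus all odd-order moments (among which, the mean) are independent of $N$ and equal to 0. For the even-order moments, i.e., when $k = 2m$, the following formula holds:

$$E[s^{2m}] = \frac{B\left(m + \frac{1}{2}, \frac{N-1}{2}\right)}{B\left(\frac{N-1}{2}, \frac{1}{2}\right)} = \frac{\Gamma\left(m + \frac{1}{2}\right) \Gamma\left(\frac{N}{2}\right)}{\sqrt{\pi}\, \Gamma\left(m + \frac{N}{2}\right)} \tag{35}$$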
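
The dot product pdf and its even moments are straightforward to evaluate. The sketch below assumes the forms $q_N(s) = (1 - s^2)^{(N-3)/2} / B(\frac{N-1}{2}, \frac{1}{2})$ and $E[s^{2m}] = \Gamma(m + \frac{1}{2})\Gamma(\frac{N}{2}) / (\sqrt{\pi}\,\Gamma(m + \frac{N}{2}))$, which follow from the chord length distribution with $R = 1$; the function names are illustrative.

```python
import math

def dot_pdf(s, n_dim):
    """Pdf of the dot product of two random unit vectors, for -1 < s < 1."""
    a = (n_dim - 1) / 2.0
    log_beta = math.lgamma(a) + math.lgamma(0.5) - math.lgamma(a + 0.5)
    return (1.0 - s * s) ** ((n_dim - 3) / 2.0) / math.exp(log_beta)

def dot_even_moment(m, n_dim):
    """Moment of order 2m of the dot product of two random unit vectors."""
    log_value = (math.lgamma(m + 0.5) + math.lgamma(n_dim / 2.0)
                 - 0.5 * math.log(math.pi) - math.lgamma(m + n_dim / 2.0))
    return math.exp(log_value)

second_moments = {n: dot_even_moment(1, n) for n in (2, 3, 16, 256)}
```

The second moment evaluates to $1/N$, so the dot product of two random unit vectors concentrates around zero as the dimension grows.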

V Conclusions

In this paper, the probability distribution followed by the $N$-sphere chord lengths was introduced. After deriving the pdf and the cdf, some of its basic properties were examined, including its dependence on the dimension of the hypersphere. Starting from it, the distribution of the dot product of two randomly selected unit vectors was also derived.

References

  • [1] U. Basel. Random chords and points distances in regular polygons. Acta Mathematica Universitatis Comenianae, 83(1):1–18, 2014.
  • [2] B. Burgstaller and F. Pillichshammer. The average distance between two points. Bulletin of the Australian Mathematical Society, 80:353–359, 2009.
  • [3] S. Li. Concise formulas for the area and volume of a hyperspherical cap. Asian Journal of Mathematics and Statistics, 4(1):66–70, 2011.
  • [4] A. Papoulis. Probability, Random Variables, and Stochastic Processes. McGraw-Hill, 1984.
  • [5] J. Philip. The probability distribution of the distance between two random points in a box. Technical report, Royal Institute of Technology, Stockholm, 2007.
  • [6] S. Ren. Chord length distribution of parallelograms. Master’s thesis, China: Wuhan University of Science and Technology, May 2012.
  • [7] E. Weisstein. Circle line picking. From MathWorld, a Wolfram Web Resource. http://mathworld.wolfram.com/CircleLinePicking.html.
  • [8] E. Weisstein. Gamma function. From MathWorld, a Wolfram Web Resource. http://mathworld.wolfram.com/GammaFunction.html.