Determining the distribution of chord lengths when the chord ends lie on a given locus is a challenging problem that is only partially solved at present. Recently, however, researchers have established closed-form solutions for several special loci, such as the chord length distribution of a regular polygon, a parallelogram and a cube. In addition, properties of the distribution in more generic scenarios have been examined and modelled, such as the average chord length in a compact convex subset of an n-dimensional Euclidean space.
The chord length distribution of a hypersphere is still not available, although it is of mathematical interest and can be employed in a multitude of applications. For example, since normalised vectors lie by definition on a hypersphere of radius $1$, the $N$-sphere chord length distribution could be used as a “randomness” benchmark that hints whether a set of normalised vectors has been selected uniformly and independently. Furthermore, in applications where the input data are known only through their distances, the detection of $N$-dimensional hyperspheres through their “chord distribution footprint” could provide a lower bound on the dimension of the unknown input data. Finally, this distribution could be used to train classifiers with positive examples only, in cases where the negative examples can be modelled as uniformly and independently selected distances.
Recently, S. Li derived a closed-form expression for the surface area of a hyperspherical cap. In this formula the hyperspherical-cap surface is given as a fraction of the total hypersphere surface. In this work it is proven that the chord length distribution of a hypersphere reduces to the ratio of the hyperspherical-cap surface to the total hypersphere surface. This proof, along with the resulting probability density function (pdf) and cumulative distribution function (cdf), is introduced in Section II, followed by the properties of the new distribution, as well as its dependence on the hypersphere dimension $N$, in Section III. The distribution of the dot product of unitary vectors, which is determined from the chord length distribution, is examined in Section IV, while Section V concludes this work.
II N-sphere chord length distribution
Let $\mathbf{p}_i$ be points selected uniformly and independently from the surface of an $N$-dimensional hypersphere of radius $R$, i.e., $\|\mathbf{p}_i\| = R$. The pairwise Euclidean distances of the points $\mathbf{p}_i$ generate a set of distances $d$ ($0 \le d \le 2R$). We would like to estimate the distribution that the distance values follow. Since the distances are invariant to the choice of coordinate system, the latter is assumed to be selected so that the first end of the chord is always the point $(0, \dots, 0, R)$. The second end of the chord then determines the chord length.
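The sampling setup described above can be sketched in a few lines. This is a minimal illustration and not part of the original derivation: the helper names `sample_on_sphere` and `pairwise_chords` are hypothetical, and uniformity on the sphere is obtained by normalising Gaussian vectors, which is a standard technique assumed here.

```python
import math
import random

def sample_on_sphere(n_dim, radius=1.0, rng=random):
    """Draw one point uniformly from the surface of a sphere in R^n_dim."""
    # Normalising an isotropic Gaussian vector gives a uniformly distributed direction.
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(n_dim)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0.0:
            return [radius * c / norm for c in v]

def pairwise_chords(points):
    """Return the Euclidean distances of all unordered point pairs."""
    return [math.dist(points[i], points[j])
            for i in range(len(points))
            for j in range(i + 1, len(points))]

# Illustrative values only: 50 points on a sphere of radius 2 in R^5.
rng = random.Random(0)
points = [sample_on_sphere(5, radius=2.0, rng=rng) for _ in range(50)]
chords = pairwise_chords(points)
```

Every chord produced this way lies between 0 and twice the radius, which is the support of the distribution studied in the rest of this section.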
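When $N = 2$, the hypersphere is a circle and the problem reduces to the known circle chord length distribution (e.g., Weisstein, Circle Line Picking). In this case the pdf $f(d)$ and the cdf $P(d)$ are given by the following formulas:

$$f(d) = \frac{1}{\pi R \sqrt{1 - \frac{d^2}{4R^2}}} \tag{1}$$

$$P(d) = \frac{2}{\pi} \arcsin\left(\frac{d}{2R}\right) \tag{2}$$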
However, in the general case (i.e., when $N > 2$) there are no closed-form expressions for either the pdf $f(d)$ or the cdf $P(d)$. In this paper we introduce these formulas. To this end, we first need to prove that
The locus of the points of the $N$-sphere that have a constant distance $d$ from a fixed point on it is an $(N-1)$-sphere of radius $a = d\sqrt{1 - \frac{d^2}{4R^2}}$.
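Let the fixed point $p$ have coordinates $(0, \dots, 0, R)$. Then the points $(x_1, \dots, x_N)$ of the hypersphere that have distance $d$ from $p$ have coordinates that satisfy the following two equations:

$$\sum_{i=1}^{N} x_i^2 = R^2 \tag{3}$$

$$\sum_{i=1}^{N-1} x_i^2 + (x_N - R)^2 = d^2 \tag{4}$$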
Subtracting the second of the above equations from the first, it follows that

$$x_N = R - \frac{d^2}{2R} \tag{5}$$
By substituting this value of $x_N$ into Eq. (3) we get that

$$\sum_{i=1}^{N-1} x_i^2 = d^2 - \frac{d^4}{4R^2} \tag{6}$$
The $(N-1)$-sphere of Eq. (6) is the intersection of the $N$-sphere with the hyperplane $x_N = R - \frac{d^2}{2R}$. Since $x_N$ decreases monotonically as the distance from $p$ increases, all points of the $N$-sphere with distance $d'$ from $p$, $d' < d$, lie “north” of the $(N-1)$-sphere (i.e., have a larger $x_N$ value than the points on it), while all points with distance $d'$ from $p$, $d' > d$, lie “south” of the $(N-1)$-sphere. It can be deduced that the above hyperplane cuts the hypersphere into two parts, defined by the distance from $p$ being smaller or larger than $d$, respectively. By definition, a hyperspherical cap is the portion of a hypersphere that is cut off by a hyperplane, thus the two parts above are hyperspherical caps. Consequently, it has been shown that
The locus of the points of the $N$-sphere that have distance at most $d$ from a point on it is a hyperspherical cap of radius $a = d\sqrt{1 - \frac{d^2}{4R^2}}$.
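The locus argument can be checked numerically. The sketch below assumes the fixed chord end is the pole $(0, \dots, 0, R)$ and that a point at distance $d$ from it has last coordinate $R - d^2/(2R)$ and squared ring radius $d^2 - d^4/(4R^2)$. The function name `check_locus` and all parameter values are illustrative assumptions.

```python
import math
import random

def check_locus(n_dim=6, radius=1.5, trials=1000, seed=1):
    """Largest deviation from x_N = R - d^2/(2R) and from the squared ring radius."""
    rng = random.Random(seed)
    pole = [0.0] * (n_dim - 1) + [radius]  # assumed fixed end of every chord
    worst = 0.0
    for _ in range(trials):
        # Uniform point on the sphere via a normalised Gaussian vector.
        v = [rng.gauss(0.0, 1.0) for _ in range(n_dim)]
        norm = math.sqrt(sum(c * c for c in v))
        x = [radius * c / norm for c in v]
        d = math.dist(x, pole)
        # Height of the intersecting hyperplane.
        height_err = abs(x[-1] - (radius - d * d / (2.0 * radius)))
        # Squared radius of the lower-dimensional sphere the point lies on.
        ring_sq = sum(c * c for c in x[:-1])
        ring_err = abs(ring_sq - (d * d - d ** 4 / (4.0 * radius ** 2)))
        worst = max(worst, height_err, ring_err)
    return worst
```

Calling `check_locus()` should return a value at the level of floating-point rounding, since both relations are exact identities for points on the sphere.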
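Since the maximum distance $d$, the cap radius $a$ (with $a^2 = d^2 - \frac{d^4}{4R^2}$) and the cap height $h$ form a right triangle (Fig. 1), the height of the cap is

$$h = \sqrt{d^2 - a^2} = \frac{d^2}{2R} \tag{7}$$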
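It was recently proven by S. Li that the surface area of a hyperspherical cap that is smaller than a hyper-hemisphere (i.e., $\phi \le \pi/2$) is given by the following formula:

$$A_N^{cap} = \frac{1}{2} A_N \, I_{\sin^2\phi}\left(\frac{N-1}{2}, \frac{1}{2}\right)$$

where $A_N$ is the surface area of the whole hypersphere, $\phi$ is the colatitude angle of the cap and $I_x(\alpha, \beta)$ is the regularised incomplete beta function.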
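The colatitude angle $\phi$, the sphere radius $R$ and the hyperspherical cap height $h$ satisfy the following relation:

$$h = (1 - \cos\phi) R, \quad \text{i.e.,} \quad \sin^2\phi = \frac{2Rh - h^2}{R^2}$$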
By substituting $h = \frac{d^2}{2R}$ from Eq. (7) it follows that

$$\sin^2\phi = \frac{d^2}{R^2} - \frac{d^4}{4R^4}$$
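The cumulative distribution function of the $N$-sphere chord length is

$$P(d) = \begin{cases} \frac{1}{2} I_{\frac{d^2}{R^2} - \frac{d^4}{4R^4}}\left(\frac{N-1}{2}, \frac{1}{2}\right), & 0 \le d \le \sqrt{2}R \\ 1 - \frac{1}{2} I_{\frac{d^2}{R^2} - \frac{d^4}{4R^4}}\left(\frac{N-1}{2}, \frac{1}{2}\right), & \sqrt{2}R < d \le 2R \end{cases} \tag{12}$$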
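The probability density function of the $N$-sphere chord length is:

$$f(d) = \frac{d}{R^2 \, B\left(\frac{N-1}{2}, \frac{1}{2}\right)} \left(\frac{d^2}{R^2} - \frac{d^4}{4R^4}\right)^{\frac{N-3}{2}}, \quad 0 \le d \le 2R \tag{13}$$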
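As a rough numerical illustration, the pdf can be coded directly and the cdf obtained from it by quadrature. The sketch assumes the pdf has the form $f(d) = \frac{d}{R^2 B(\frac{N-1}{2}, \frac{1}{2})} \left(\frac{d^2}{R^2} - \frac{d^4}{4R^4}\right)^{\frac{N-3}{2}}$ on $[0, 2R]$. Simpson integration is used in place of the regularised incomplete beta function because the Python standard library does not provide the latter. The names `beta`, `chord_pdf` and `chord_cdf` are hypothetical.

```python
import math

def beta(a, b):
    """Beta function computed through log-gamma for numerical stability."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def chord_pdf(d, n_dim, radius=1.0):
    """Assumed pdf of the chord length for a sphere in R^n_dim (n_dim >= 3)."""
    if n_dim < 3:
        raise ValueError("this sketch covers n_dim >= 3 only")
    if d < 0.0 or d > 2.0 * radius:
        return 0.0
    x = (d / radius) ** 2 - (d / radius) ** 4 / 4.0
    x = max(x, 0.0)  # guard against rounding slightly below zero near d = 2R
    norm = radius ** 2 * beta((n_dim - 1) / 2.0, 0.5)
    return d / norm * x ** ((n_dim - 3) / 2.0)

def chord_cdf(d, n_dim, radius=1.0, steps=2000):
    """cdf obtained by Simpson integration of chord_pdf over [0, d]."""
    d = min(max(d, 0.0), 2.0 * radius)
    if d == 0.0:
        return 0.0
    h = d / steps  # steps must be even for the Simpson rule
    total = chord_pdf(0.0, n_dim, radius) + chord_pdf(d, n_dim, radius)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * chord_pdf(i * h, n_dim, radius)
    return total * h / 3.0
```

For $N = 3$ and $R = 1$ the pdf reduces to $d/2$, so `chord_cdf(1.0, 3)` should be close to $1/4$, and for any dimension the value at $d = \sqrt{2}R$ should be close to $1/2$.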
III Some properties of the normalised vector distance distribution
Equations (12) and (13) give the cdf and the pdf of the $N$-sphere chord length distribution, respectively. In this section, some of its basic properties are examined. Furthermore, we discuss how these properties vary with $N$, especially when $N \to \infty$.
III-A Basic properties
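Eq. (13) suggests that, for $N > 3$, the pdf values $f(0)$, $f(\sqrt{2}R)$ and $f(2R)$ are as follows:

$$f(0) = 0, \quad f(\sqrt{2}R) = \frac{\sqrt{2}}{R \, B\left(\frac{N-1}{2}, \frac{1}{2}\right)}, \quad f(2R) = 0$$

while the cdf values $P(0)$, $P(\sqrt{2}R)$ and $P(2R)$ are

$$P(0) = 0, \quad P(\sqrt{2}R) = \frac{1}{2}, \quad P(2R) = 1$$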
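When $d = \sqrt{2}R$, then $\frac{d^2}{R^2} - \frac{d^4}{4R^4} = 1$. Since $I_1(\alpha, \beta) = 1$, Eq. (12) implies that $P(\sqrt{2}R) = \frac{1}{2}$, i.e., that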
The median value of the $N$-sphere chord length distribution is independent of $N$ and equal to $\sqrt{2}R$.
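, respectively. In order to estimate the distribution moments, we use the variable transform $t = \frac{d^2}{4R^2}$, which leads to the following formula:

$$E[d^k] = (2R)^k \, 2^{N-2} \, \frac{B\left(\frac{N+k-1}{2}, \frac{N-1}{2}\right)}{B\left(\frac{N-1}{2}, \frac{1}{2}\right)} \tag{20}$$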
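Eq. (20) gives the $k$-th order moment of the $N$-sphere chord length distribution. For $k = 1$, it yields the mean value:
The mean value of the $N$-sphere chord length distribution is

$$E[d] = \frac{2^{N-1} R}{\sqrt{\pi}} \, \frac{\Gamma^2\left(\frac{N}{2}\right)}{\Gamma\left(N - \frac{1}{2}\right)} \tag{21}$$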
In contrast, for the second-order moment the following proposition holds:
The second-order moment of the $N$-sphere chord length distribution is independent of $N$ and equal to $2R^2$.
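By setting $k = 2$ in Eq. (20) we obtain the following:

$$E[d^2] = 2^N R^2 \, \frac{B\left(\frac{N+1}{2}, \frac{N-1}{2}\right)}{B\left(\frac{N-1}{2}, \frac{1}{2}\right)} = \frac{2^N R^2}{\sqrt{\pi}} \, \frac{\Gamma\left(\frac{N}{2}\right) \Gamma\left(\frac{N+1}{2}\right)}{\Gamma(N)} \tag{22}$$

where $\Gamma$ is the gamma function. It is known that for all $z$ the following property (the Legendre duplication formula) holds:

$$\Gamma(z) \, \Gamma\left(z + \frac{1}{2}\right) = 2^{1-2z} \sqrt{\pi} \, \Gamma(2z)$$

By setting $z = \frac{N}{2}$ and substituting in Eq. (22) it follows that $E[d^2] = 2R^2$. ∎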
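The moment results can be checked numerically with a few lines of code. The sketch assumes that the $k$-th raw moment has the Beta-function form $E[d^k] = (2R)^k \, 2^{N-2} \, B(\frac{N+k-1}{2}, \frac{N-1}{2}) / B(\frac{N-1}{2}, \frac{1}{2})$, which follows from integrating the pdf after the substitution $t = d^2/(4R^2)$. The function names are hypothetical, and log-gamma is used to avoid overflow for large $N$.

```python
import math

def log_beta(a, b):
    """Logarithm of the Beta function."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def chord_moment(k, n_dim, radius=1.0):
    """Assumed k-th raw moment of the chord length for a sphere in R^n_dim."""
    log_ratio = (log_beta((n_dim + k - 1) / 2.0, (n_dim - 1) / 2.0)
                 - log_beta((n_dim - 1) / 2.0, 0.5))
    # 2^(N-2) is folded into the exponent so that large N does not overflow.
    return (2.0 * radius) ** k * math.exp((n_dim - 2) * math.log(2.0) + log_ratio)

# Mean, second moment and variance for a few illustrative dimensions (R = 1).
for n_dim in (2, 3, 10, 100, 1000):
    mean = chord_moment(1, n_dim)
    second = chord_moment(2, n_dim)
    print(n_dim, mean, second, second - mean ** 2)
```

The second column of the printout should stay at $2R^2$ for every dimension, while the mean should approach $\sqrt{2}R$ and the variance should shrink as $N$ grows.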
Fig. 4 demonstrates as a function of $N$. More results on the dependence of the chord length distribution on the hypersphere dimension are given in the next sub-section.
III-B Dependence on the hypersphere dimension
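The cdf of Eq. (12) can be recursively estimated using the following formula for regularised incomplete beta functions:

$$I_x(\alpha + 1, \beta) = I_x(\alpha, \beta) - \frac{x^{\alpha} (1 - x)^{\beta}}{\alpha \, B(\alpha, \beta)} \tag{26}$$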
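In the $N$-sphere chord length distribution case, $\alpha = \frac{N-1}{2}$, $\beta = \frac{1}{2}$ and $x = \frac{d^2}{R^2} - \frac{d^4}{4R^4}$, so that increasing $\alpha$ by 1 corresponds to increasing $N$ by 2. $x$ is a variable that takes values in the interval [0,1] when $0 \le d \le 2R$. This signifies that the second term of the right part of Eq. (26) always reduces the value of $I_x$. Hence, in the interval $0 < d < \sqrt{2}R$, where the cdf equals $\frac{1}{2} I_x$, the cdf values that correspond to a fixed $d$ decrease with $N$. Similarly, when $\sqrt{2}R < d < 2R$, where the cdf equals $1 - \frac{1}{2} I_x$, the same term enters with the opposite sign, which means that the cdf values that correspond to a fixed $d$ increase with $N$. Summarising: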
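If $P_N(d)$ is the probability that a chord length is smaller than $d$, when both chord ends lie on a hypersphere of radius $R$ and dimension $N$, then
If $0 < d < \sqrt{2}R$, then $P_N(d) > P_{N+2}(d)$,
If $\sqrt{2}R < d < 2R$, then $P_N(d) < P_{N+2}(d)$.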
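The ordering of the cdf values across dimensions can be illustrated numerically. The sketch below does not use the beta-function recursion. Instead it assumes the equivalent and well-known fact that the angle $\theta$ between two uniformly selected points has a density proportional to $\sin^{N-2}\theta$, with $d = 2R\sin(\theta/2)$, and integrates that density with the Simpson rule. The function name `chord_cdf` is hypothetical.

```python
import math

def chord_cdf(d, n_dim, radius=1.0, steps=2000):
    """P(chord length <= d), integrating the angle density ~ sin^(N-2)."""
    def integral(upper):
        h = upper / steps  # Simpson rule, steps is even
        total = math.sin(0.0) ** (n_dim - 2) + math.sin(upper) ** (n_dim - 2)
        for i in range(1, steps):
            total += (4 if i % 2 else 2) * math.sin(i * h) ** (n_dim - 2)
        return total * h / 3.0
    ratio = min(max(d / (2.0 * radius), 0.0), 1.0)
    theta = 2.0 * math.asin(ratio)  # angle subtended by a chord of length d
    return integral(theta) / integral(math.pi)

# Illustrative comparison for unit radius: below sqrt(2) the cdf drops with N,
# above sqrt(2) it rises with N.
for d in (1.0, 1.8):
    print(d, [chord_cdf(d, n) for n in (3, 5, 7, 9)])
```

At $d = \sqrt{2}R$ the angle is $\pi/2$, so the function should return $1/2$ for every dimension, which matches the median result of Section III-A.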
The above proposition can be used to bound the degrees of freedom in cases where the only input data are dissimilarity scores, provided that a process for generating uniform and independent scores is available. Moreover, it implies that as the dimension increases the pdf of Eq. (13) becomes more concentrated around $\sqrt{2}R$ (Fig. 3).
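The latter is established through the following property of the mean value $E[d]$:

$$\lim_{N \to \infty} E[d] = \sqrt{2}R$$

The Stirling approximation for factorials suggests that

$$\Gamma(z + 1) \approx \sqrt{2\pi z} \left(\frac{z}{e}\right)^z$$

Through the Stirling approximation, Eq. (21) becomes

$$E[d] \approx \sqrt{2}R \quad \text{for large } N$$

Since the second-order moment is constant and equal to $2R^2$, the variance $2R^2 - E^2[d]$ tends to 0 as $N \to \infty$.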
Figures 5 and 6 show and as a function of $N$, respectively. From proposition III.6 it follows that as the dimension increases the hypersphere chord lengths are concentrated within a small range around $\sqrt{2}R$. In order to give a quantitative measure of the chord length range, the difference between the largest and the smallest -quantile is employed. Note that this distance denotes the length of the interval from which the most central values are selected. The results are shown in Table I. Table I demonstrates that when most chord lengths lie within a small range around $\sqrt{2}R$. For example, the of chord lengths of a 256-sphere lie inside an interval of length around $\sqrt{2}R$, i.e., an interval that spans only of the available distance range.
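The length of such a central interval can be estimated numerically. The sketch below is not the procedure used to produce Table I. It assumes that the range is measured as the distance between the two quantiles that enclose a given central fraction of the probability mass, and it finds these quantiles by bisection on the angle between the chord ends, whose density is assumed proportional to $\sin^{N-2}\theta$. The names `angle_cdf` and `central_interval_length` are hypothetical.

```python
import math

def angle_cdf(theta, n_dim, steps=1000):
    """P(angle between the chord ends <= theta), angle density ~ sin^(N-2)."""
    def integral(upper):
        h = upper / steps  # Simpson rule, steps is even
        total = math.sin(0.0) ** (n_dim - 2) + math.sin(upper) ** (n_dim - 2)
        for i in range(1, steps):
            total += (4 if i % 2 else 2) * math.sin(i * h) ** (n_dim - 2)
        return total * h / 3.0
    return integral(theta) / integral(math.pi)

def central_interval_length(fraction, n_dim, radius=1.0):
    """Length of the chord-length interval holding the central `fraction` of the mass."""
    def quantile(q):
        lo, hi = 0.0, math.pi
        for _ in range(50):  # bisection on the monotone angle cdf
            mid = 0.5 * (lo + hi)
            if angle_cdf(mid, n_dim) < q:
                lo = mid
            else:
                hi = mid
        theta = 0.5 * (lo + hi)
        return 2.0 * radius * math.sin(theta / 2.0)  # chord for that angle
    tail = (1.0 - fraction) / 2.0
    return quantile(1.0 - tail) - quantile(tail)
```

For instance, `central_interval_length(0.9, 256)` gives the length of the interval that holds the central 90% of the chord lengths of a 256-sphere of unit radius, which can be compared with the full range of length 2. The 90% level is an illustrative choice and is not taken from Table I.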
An intuitive analysis of how the distribution changes with $N$ can be made by considering the 3-dimensional sphere as a globe. In this case, the fixed point $p$ is at the North Pole, and the points having distance $\sqrt{2}R$ from $p$ are the points on the equator. Seen in this light, the concentration of the distribution around $\sqrt{2}R$ denotes an expansion of the equatorial areas at the expense of the high-latitude areas. In the limit, when $N \to \infty$, “almost all” points of the hypersphere lie on the equator (meaning that the set of points whose distance from $p$ lies within an arbitrarily small range around $\sqrt{2}R$ has infinitely more points than the set of points whose distance from $p$ is either larger or smaller than that range).
IV Unitary vector dot product distribution
In the hypersphere of radius $1$ the distance $d$ of two points $p_1$ and $p_2$ and the dot product $z$ of the vectors $\vec{Op_1}$ and $\vec{Op_2}$ (where $O$ is the centre of the hypersphere) are connected via the following relation:

$$d^2 = 2 - 2z \tag{30}$$
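Since the distance distribution is already known, the dot product distribution can be estimated through Eq. (30). In particular, if $G(z)$ is the cdf of the unitary dot product then

$$G(z) = \mathrm{Prob}\left(1 - \frac{d^2}{2} \le z\right) = \mathrm{Prob}\left(d^2 \ge 2 - 2z\right)$$

The $d$ values are always non-negative, thus the above equation can be reformulated as

$$G(z) = \mathrm{Prob}\left(d \ge \sqrt{2 - 2z}\right) = 1 - P\left(\sqrt{2 - 2z}\right)$$

By setting $d = \sqrt{2 - 2z}$ and $R = 1$ in Eq. (12), for which $\frac{d^2}{R^2} - \frac{d^4}{4R^4} = 1 - z^2$, it follows that

$$G(z) = \begin{cases} \frac{1}{2} I_{1 - z^2}\left(\frac{N-1}{2}, \frac{1}{2}\right), & -1 \le z < 0 \\ 1 - \frac{1}{2} I_{1 - z^2}\left(\frac{N-1}{2}, \frac{1}{2}\right), & 0 \le z \le 1 \end{cases}$$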
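The corresponding pdf is

$$g(z) = \frac{(1 - z^2)^{\frac{N-3}{2}}}{B\left(\frac{N-1}{2}, \frac{1}{2}\right)}, \quad -1 \le z \le 1$$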
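The pdf is symmetric around $z = 0$, thus all odd-order moments (among which the mean) are independent of $N$ and equal to 0. For the even-order moments, i.e., when $k = 2m$, the following formula holds:

$$E[z^{2m}] = \frac{B\left(m + \frac{1}{2}, \frac{N-1}{2}\right)}{B\left(\frac{1}{2}, \frac{N-1}{2}\right)} = \frac{\Gamma\left(m + \frac{1}{2}\right) \Gamma\left(\frac{N}{2}\right)}{\sqrt{\pi} \, \Gamma\left(m + \frac{N}{2}\right)}$$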
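The moments of the dot product can be compared against a small simulation. The sketch assumes the even-order moments have the closed form $E[z^{2m}] = \Gamma(m + \frac{1}{2}) \Gamma(\frac{N}{2}) / (\sqrt{\pi} \, \Gamma(m + \frac{N}{2}))$, which gives $E[z^2] = 1/N$. The function names, the sample size and the seed are illustrative assumptions.

```python
import math
import random

def dot_even_moment(m, n_dim):
    """Assumed moment of order 2m of the dot product of two random unit vectors."""
    log_val = (math.lgamma(m + 0.5) + math.lgamma(n_dim / 2.0)
               - 0.5 * math.log(math.pi) - math.lgamma(m + n_dim / 2.0))
    return math.exp(log_val)

def sampled_dot_products(n_dim, count, seed=0):
    """Dot products of independent pairs of uniformly selected unit vectors."""
    rng = random.Random(seed)
    def unit():
        # Normalised Gaussian vector: uniform direction on the unit sphere.
        v = [rng.gauss(0.0, 1.0) for _ in range(n_dim)]
        norm = math.sqrt(sum(c * c for c in v))
        return [c / norm for c in v]
    return [sum(a * b for a, b in zip(unit(), unit())) for _ in range(count)]

# Empirical versus closed-form second moment for an illustrative dimension.
zs = sampled_dot_products(8, 20000)
empirical_second = sum(z * z for z in zs) / len(zs)
print(empirical_second, dot_even_moment(1, 8))
```

The empirical mean of the samples should be close to 0 and the empirical second moment close to $1/N$, in line with the symmetry argument above.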
V Conclusions
In this paper, the probability distribution that the $N$-sphere chord lengths follow was introduced. After deriving the pdf and the cdf, some of its basic properties were examined, among which its dependence on the dimension of the hypersphere. Starting from it, the distribution of the dot product of two randomly selected unitary vectors was also derived.
References
-  U. Bäsel. Random chords and point distances in regular polygons. Acta Mathematica Universitatis Comenianae, 83(1):1–18, 2014.
-  B. Burgstaller and F. Pillichshammer. The average distance between two points. Bulletin of the Australian Mathematical Society, 80:353–359, 2009.
-  S. Li. Concise formulas for the area and volume of a hyperspherical cap. Asian Journal of Mathematics and Statistics, 4(1):66–70, 2011.
-  A. Papoulis. Probability, Random Variables, and Stochastic Processes. McGraw-Hill, 1984.
-  J. Philip. The probability distribution of the distance between two random points in a box. Technical report, Royal Institute of Technology, Stockholm, 2007.
-  S. Ren. Chord length distribution of parallelograms. Master’s thesis, China: Wuhan University of Science and Technology, May 2012.
-  E. Weisstein. Circle Line Picking. From MathWorld, a Wolfram Web Resource. http://mathworld.wolfram.com/CircleLinePicking.html.
-  E. Weisstein. Gamma Function. From MathWorld, a Wolfram Web Resource. http://mathworld.wolfram.com/GammaFunction.html.