# Reappraising the distribution of the number of edge crossings of graphs on a sphere

Many real transportation and mobility networks have their vertices placed on the surface of the Earth. In such embeddings, the edges laid on that surface may cross. In his pioneering research, Moon analyzed the distribution of the number of crossings on complete graphs and complete bipartite graphs whose vertices are located uniformly at random on the surface of a sphere, assuming that vertex placements are independent of each other. Here we revise his derivation of the variance of the number of crossings in the light of recent theoretical developments on the variance of crossings and computer simulations. We show that Moon's formula underestimates the true variance and provide exact formulae.


## 1 Introduction

The shape of our planet can be approximated by a sphere with a radius of about miles. Many real transportation and mobility networks have vertices located on the surface of that sphere. These are examples of spatial networks, networks whose vertices are embedded in a space. In many transportation and mobility networks, the surface of the sphere is simplified as a projection on a plane, while in some other cases, e.g., air transportation networks [13, 10, 11], such an approximation is often not possible due to the long distances involved.

When vertices are embedded in some space, edges may cross. While crossings are exceptional in many spatial networks, to the point of being negligible, crossings can also be scarce but not negligible in one-dimensional layouts of certain networks: syntactic dependency structures and RNA secondary structures, where vertices are arranged linearly (distributed along a line) [8, 5]. The former are networks whose vertices represent the words of a sentence and whose edges represent syntactic dependencies between them. These have become the de facto standard to represent the syntactic structure of sentences in computational linguistics and the fuel of many quantitative studies [14, 19]. In RNA secondary structures, vertices are nucleotides A, G, U, and C, and edges are Watson-Crick (A-U, G-C) and (U-G) base pairs. In these one-dimensional networks, two edges cross whenever the positions of their endpoints are interleaved in the sequence.

Statistical properties of C, the number of edge crossings of a graph $G$, have been studied in generic embeddings, denoted as $*$, that meet three mathematical conditions: (1) only independent edges can cross (edges that do not share vertices), (2) two independent edges can cross in at most one point, and (3) if several edges of the graph, say $k$ edges, cross at exactly the same point then the amount of crossings equals $\binom{k}{2}$. In our view, generic embeddings are two-fold: a space and a statistical distribution of the vertices in such space. In the setting studied here, the space is the surface of a sphere while the distribution of the vertices on that surface is uniformly random. Compact formulae for the expectation and the variance have been obtained. Here we apply such a framework to revise the problem of calculating the distribution of C in arrangements of vertices on the surface of a sphere. We use $\mathbb{E}_*[C]$ to denote the expectation of the number of crossings C, and $\mathbb{V}_*[C]$ to denote the variance of C in a generic layout $*$.

In his pioneering research, J. W. Moon analyzed the properties of the distribution of C in uniformly random spherical arrangements (rsa), where vertices are arranged on the surface of a sphere uniformly at random and independently of each other, and edges become geodesics on the sphere's surface. Specifically, Moon studied $\mathbb{E}_{\mathrm{rsa}}[C]$ and $\mathbb{V}_{\mathrm{rsa}}[C]$, the expectation and variance of C in the random spherical layout, for two kinds of graphs: complete graphs of $n$ vertices, $K_n$, and complete bipartite graphs, $K_{n_1,n_2}$, with $n_1$ vertices in one partition and $n_2$ vertices in the other. His derivations of $\mathbb{E}_{\mathrm{rsa}}[C]$ are straightforward. In the notation of the generic framework, Moon obtained that the expectation of C is

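$$\mathbb{E}_{\mathrm{rsa}}[C] = q\,\delta_{\mathrm{rsa}},$$

where $q$ is the number of pairs of independent edges and $\delta_{\mathrm{rsa}}$ is the probability that two independent edges cross. Indeed, $q$ is a handle for the size of the set $Q$, consisting of the pairs of independent edges of a graph [18, 2]. Thanks to

$$\delta_{\mathrm{rsa}} = \frac{1}{8}, \tag{1}$$

$$|Q(K_n)| = \frac{1}{2}\binom{n}{2}\binom{n-2}{2}$$

and

$$|Q(K_{n_1,n_2})| = 2\binom{n_1}{2}\binom{n_2}{2},$$

Moon obtained

$$\mathbb{E}_{\mathrm{rsa}}[C(K_n)] = \frac{1}{16}\binom{n}{2}\binom{n-2}{2}, \qquad \mathbb{E}_{\mathrm{rsa}}[C(K_{n_1,n_2})] = \frac{1}{4}\binom{n_1}{2}\binom{n_2}{2}.$$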
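The counts and expectations above are easy to evaluate for small graphs. The following is a minimal sketch in Python; the function names are illustrative choices and do not come from the source.

```python
from fractions import Fraction
from math import comb

# Probability that two independent edges cross in a random spherical arrangement, eq. (1).
DELTA_RSA = Fraction(1, 8)

def q_complete(n):
    """Number of pairs of independent edges of the complete graph K_n."""
    return comb(n, 2) * comb(n - 2, 2) // 2

def q_complete_bipartite(n1, n2):
    """Number of pairs of independent edges of the complete bipartite graph K_{n1,n2}."""
    return 2 * comb(n1, 2) * comb(n2, 2)

def expected_crossings_rsa(q):
    """Expected number of crossings, q * delta_rsa."""
    return q * DELTA_RSA

if __name__ == "__main__":
    for n in (4, 5, 6):
        print("K_%d" % n, q_complete(n), expected_crossings_rsa(q_complete(n)))
    print("K_{3,3}", q_complete_bipartite(3, 3),
          expected_crossings_rsa(q_complete_bipartite(3, 3)))
```

Exact rational arithmetic is used so that the results can be compared directly with the closed-form expressions.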

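He also derived formulae for the variance of C for these two kinds of graphs, i.e.

$$\mathbb{V}^{(M)}_{\mathrm{rsa}}[C(K_n)] = \frac{1}{2}\binom{n}{2}\binom{n-2}{2}\left[1\cdot\frac{7}{64} + 2\binom{n-4}{2}\cdot\frac{\pi^2-8}{64\pi^2} + 4(n-4)\cdot\frac{\pi^2-8}{64\pi^2} + 2\cdot\frac{-1}{64}\right],$$

which simplifies to

$$\mathbb{V}^{(M)}_{\mathrm{rsa}}[C(K_n)] = 3\binom{n}{4}\left[\frac{5}{64} + \frac{\pi^2-8}{64\pi^2}(n-4)(n-1)\right] \tag{2}$$

and also

$$\mathbb{V}^{(M)}_{\mathrm{rsa}}[C(K_{n_1,n_2})] = \frac{1}{16\pi^2}\binom{n_1}{2}\binom{n_2}{2}\left[(n_1-1)(n_2-1)(\pi^2-8) + 2(\pi^2+4)\right], \tag{3}$$

where the superscript $(M)$ is used to distinguish Moon's work from our own derivations. Here we revise equations 2 and 3 in light of computer simulations and a recently introduced theoretical framework to investigate $\mathbb{V}_*[C]$.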
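The simplification leading to (2) can be checked numerically. The sketch below evaluates both forms of Moon's formula for complete graphs, together with (3); the function names are ours and the code only restates the expressions printed above.

```python
from math import comb, isclose, pi

# Coefficient (pi^2 - 8) / (64 pi^2) that recurs in Moon's formula.
K = (pi ** 2 - 8) / (64 * pi ** 2)

def moon_var_complete_long(n):
    """Moon's variance for K_n before simplification (n >= 4)."""
    pairs = 0.5 * comb(n, 2) * comb(n - 2, 2)
    return pairs * (7 / 64 + 2 * comb(n - 4, 2) * K + 4 * (n - 4) * K - 2 / 64)

def moon_var_complete(n):
    """Moon's variance for K_n, eq. (2)."""
    return 3 * comb(n, 4) * (5 / 64 + K * (n - 4) * (n - 1))

def moon_var_complete_bipartite(n1, n2):
    """Moon's variance for K_{n1,n2}, eq. (3)."""
    bracket = (n1 - 1) * (n2 - 1) * (pi ** 2 - 8) + 2 * (pi ** 2 + 4)
    return comb(n1, 2) * comb(n2, 2) * bracket / (16 * pi ** 2)

if __name__ == "__main__":
    for n in range(4, 12):
        assert isclose(moon_var_complete_long(n), moon_var_complete(n))
    print(moon_var_complete(10), moon_var_complete_bipartite(5, 5))
```

The two forms agree because $\frac{1}{2}\binom{n}{2}\binom{n-2}{2} = 3\binom{n}{4}$ and $2\binom{n-4}{2} + 4(n-4) = (n-4)(n-1)$.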

In section 4, we review Moon's calculations and compare them with our own with the help of numerical estimates of $\mathbb{V}_{\mathrm{rsa}}[C]$ in complete and complete bipartite graphs (sections 4.1 and 4.2, respectively). These numerical estimates confirm the correctness of our derivations and show that Moon's formulae (2) and (3) significantly underestimate the true variance. Section 5 discusses our findings and attempts to shed light on the origins of the inaccuracy of (2) and (3). Section 6 details all the numerical methods involved in the numerical calculations. This section is placed after the discussion to make the presentation of the main arguments more streamlined.

## 2 The variance of C in generic layouts

In the generic framework, C was defined as a summation of pairwise crossings between independent edges, i.e.

$$C = \sum_{\{e_1,e_2\}\in Q} \alpha(e_1,e_2), \tag{4}$$

where $\alpha(e_1,e_2)$ is an indicator random variable that equals 1 whenever the independent edges $e_1$ and $e_2$ cross in the given layout. This definition was used to derive the expectation of C as

$$\mathbb{E}_*[C] = q\,\delta_*, \tag{5}$$

where

$$\delta_* = \mathbb{E}_*[\alpha(e_1,e_2)] \tag{6}$$

for two independent edges $e_1$ and $e_2$ embedded in the layout. Hereafter, an edge is a set of two vertices. Since $\alpha$ is an indicator random variable, $\delta_*$ is the probability that two independent edges cross in the given layout $*$. For example, in Moon's random spherical arrangement $\delta_{\mathrm{rsa}} = 1/8$. Therefore, using (5) we obtain

$$\mathbb{E}_{\mathrm{rsa}}[C] = \frac{1}{8}q. \tag{7}$$

Moreover, in uniformly random linear arrangements (rla), where the vertices of a graph are placed along a linear sequence uniformly at random, $\delta_{\mathrm{rla}} = 1/3$, and then

$$\mathbb{E}_{\mathrm{rla}}[C] = \frac{1}{3}q.$$
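The value 1/3 for linear arrangements can be confirmed by exhaustive enumeration, since only the relative order of the four endpoints matters. The sketch below is an illustration with names of our own choosing; it applies the interleaving criterion for crossings in one-dimensional layouts described in the introduction.

```python
from fractions import Fraction
from itertools import permutations

def edges_cross_linear(pos, e1, e2):
    """True if two independent edges have interleaved endpoints.

    pos[v] is the position of vertex v in the linear arrangement.
    """
    lo1, hi1 = sorted((pos[e1[0]], pos[e1[1]]))
    lo2, hi2 = sorted((pos[e2[0]], pos[e2[1]]))
    return lo1 < lo2 < hi1 < hi2 or lo2 < lo1 < hi2 < hi1

def delta_rla():
    """Exact crossing probability of two independent edges over all 4! orderings."""
    e1, e2 = (0, 1), (2, 3)
    arrangements = list(permutations(range(4)))
    crossings = sum(edges_cross_linear(p, e1, e2) for p in arrangements)
    return Fraction(crossings, len(arrangements))

if __name__ == "__main__":
    print(delta_rla())
```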

The same definition of C was used again to study the variance of C in a general layout, similarly to the way Moon did for the particular case of random spherical arrangements. There, the variance of C was expressed compactly as a summation over products between graph-dependent terms, the $f_\omega$'s, and layout-dependent terms, the $\mathbb{E}_*[\gamma_\omega]$'s, i.e.

$$\mathbb{V}_*[C] = \sum_{\omega\in\Omega} f_\omega\,\mathbb{E}_*[\gamma_\omega]. \tag{8}$$

Formally, the type of a product is obtained by applying a function $T$ to a pair $(q_1,q_2)$ of elements of $Q$, and then

$$f_\omega = \sum_{q_1\in Q}\;\sum_{\substack{q_2\in Q \\ T(q_1,q_2)=\omega}} 1.$$

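The key to understanding (8) lies in the layout-dependent terms, $\mathbb{E}_*[\gamma_\omega]$, actually a shorthand for

$$\mathbb{E}_*[\gamma_\omega] = \mathbb{E}_*\left[\alpha(e_1,e_2)\,\alpha(e_3,e_4) \mid T(\{e_1,e_2\},\{e_3,e_4\}) = \omega\right] - \delta_*^2, \tag{9}$$

where $\omega$ is the type of the product $\alpha(e_1,e_2)\alpha(e_3,e_4)$ for $\{e_1,e_2\},\{e_3,e_4\}\in Q$. The type of product is determined by the vertices forming the edges of both pairs, as explained in detail in table 1. The set of all distinct types of products is

$$\Omega = \{00, 01, 021, 022, 03, 04, 12, 13, 24\} \tag{10}$$

following the encoding of each type in table 1.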

The amount of products of type $\omega$ in the given graph $G$ satisfies

$$f_\omega = a_\omega\, n_G(F_\omega),$$

where $a_\omega$ is a positive integer constant that depends only on $\omega$, and $n_G(F_\omega)$ is the number of subgraphs of $G$ isomorphic to $F_\omega$, the graph defined by the edges involved in the product of type $\omega$. Figure 1 depicts all these graphs. Table 2 shows the values of the $a_\omega$ and the formal definition of each $F_\omega$.

For the sake of brevity, we use the shorthand

$$p_{*,\omega} = \mathbb{E}_*\left[\alpha(e_1,e_2)\,\alpha(e_3,e_4) \mid T(\{e_1,e_2\},\{e_3,e_4\}) = \omega\right]. \tag{11}$$

Since $\alpha$ is an indicator random variable, $p_{*,\omega}$ is the probability that both pairs of independent edges cross in a generic embedding $*$.

Figure 1: The subgraphs corresponding to each type of product of the form $\alpha(e_1,e_2)\alpha(e_3,e_4)$. The type $\omega\in\Omega$ (10) is indicated below them. In the product $\alpha(e_1,e_2)\alpha(e_3,e_4)$, $e_1$ and $e_2$ share the same color and $e_3$ and $e_4$ share another color. Equally-colored edges are to cross when calculating $\alpha(e_1,e_2)\alpha(e_3,e_4)$ and, by definition, belong to the same element of $Q$. Bi-colored edges (as in types 12, 13 and 24) belong to both elements of $Q$.

Therefore, in order to calculate the variance of C of a graph $G$ in a given layout, one only needs to know the values of the $f_\omega$'s in $G$ (which are independent of the layout) and the values of the $\mathbb{E}_*[\gamma_\omega]$ in the given layout (which are independent of the graph). The values of the $f_\omega$'s in complete and complete bipartite graphs, shown in table 3, have made it possible to derive expressions for the variance of C that are valid for a generic embedding $*$. The substitution of these values in (8) yields

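$$\begin{aligned}
\mathbb{V}_*[C(K_n)] = 3\binom{n}{4}\Big( &(n-4)(n-5)\big(\mathbb{E}_*[\gamma_{12}] + 4(\mathbb{E}_*[\gamma_{021}] + \mathbb{E}_*[\gamma_{022}])\big) \\
&+ 4(n-4)\big(\mathbb{E}_*[\gamma_{13}] + 2\mathbb{E}_*[\gamma_{03}]\big) \\
&+ 2\mathbb{E}_*[\gamma_{04}] + \mathbb{E}_*[\gamma_{24}] \Big)
\end{aligned} \tag{12}$$

and, likewise,

$$\begin{aligned}
\mathbb{V}_*[C(K_{n_1,n_2})] = {} & 2\big(\mathbb{E}_*[\gamma_{24}] + \mathbb{E}_*[\gamma_{04}]\big)\binom{n_1}{2}\binom{n_2}{2} \\
&+ 12\big(\mathbb{E}_*[\gamma_{03}] + \mathbb{E}_*[\gamma_{13}]\big)\left[\binom{n_1}{3}\binom{n_2}{2} + \binom{n_1}{2}\binom{n_2}{3}\right] \\
&+ 36\big(\mathbb{E}_*[\gamma_{12}] + \mathbb{E}_*[\gamma_{022}] + 2\mathbb{E}_*[\gamma_{021}]\big)\binom{n_1}{3}\binom{n_2}{3} \\
&+ 24\,\mathbb{E}_*[\gamma_{022}]\left[\binom{n_1}{2}\binom{n_2}{4} + \binom{n_1}{4}\binom{n_2}{2}\right].
\end{aligned} \tag{13}$$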
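Equations (12) and (13) translate directly into code once the layout-dependent terms are available. The sketch below is one possible transcription; the dictionary-based interface and the function names are our own choices. The example uses the spherical values $\mathbb{E}_{\mathrm{rsa}}[\gamma_{24}] = 7/64$ and $\mathbb{E}_{\mathrm{rsa}}[\gamma_{04}] = -1/64$ derived in section 3 and sets the remaining terms to zero as placeholders only, which is harmless for $K_4$ and $K_{2,2}$ because their coefficients vanish in those graphs.

```python
from fractions import Fraction
from math import comb

def variance_complete(n, gamma):
    """Eq. (12). gamma maps a type such as "12" to the layout term E_*[gamma_omega]."""
    return 3 * comb(n, 4) * (
        (n - 4) * (n - 5) * (gamma["12"] + 4 * (gamma["021"] + gamma["022"]))
        + 4 * (n - 4) * (gamma["13"] + 2 * gamma["03"])
        + 2 * gamma["04"]
        + gamma["24"]
    )

def variance_complete_bipartite(n1, n2, gamma):
    """Eq. (13), with the same convention for gamma."""
    return (
        2 * (gamma["24"] + gamma["04"]) * comb(n1, 2) * comb(n2, 2)
        + 12 * (gamma["03"] + gamma["13"])
        * (comb(n1, 3) * comb(n2, 2) + comb(n1, 2) * comb(n2, 3))
        + 36 * (gamma["12"] + gamma["022"] + 2 * gamma["021"])
        * comb(n1, 3) * comb(n2, 3)
        + 24 * gamma["022"]
        * (comb(n1, 2) * comb(n2, 4) + comb(n1, 4) * comb(n2, 2))
    )

# Placeholder zeros for every type except 04 and 24 (NOT the true spherical values).
GAMMA_SMALL = {t: Fraction(0) for t in ("12", "021", "022", "13", "03")}
GAMMA_SMALL["24"] = Fraction(7, 64)
GAMMA_SMALL["04"] = Fraction(-1, 64)

if __name__ == "__main__":
    print(variance_complete(4, GAMMA_SMALL))
    print(variance_complete_bipartite(2, 2, GAMMA_SMALL))
```

For $K_4$ the three pairs of independent edges are mutually exclusive, so C is a Bernoulli variable with success probability 3/8 and variance 15/64, which is what (12) returns for $n = 4$.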

A step further consists of instantiating the equations above, replacing $*$ by a uniformly random linear arrangement (rla). After calculating the values of the $\mathbb{E}_{\mathrm{rla}}[\gamma_\omega]$ and substituting them into (12), one obtains

$$\mathbb{V}_{\mathrm{rla}}[C(K_n)] = 0$$

as expected. Interestingly, the same approach on (13) produces

$$\mathbb{V}_{\mathrm{rla}}[C(K_{n_1,n_2})] = \frac{1}{90}\binom{n_1}{2}\binom{n_2}{2}\left((n_1+n_2)^2 + n_1 + n_2\right).$$
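Both results for linear arrangements can be verified by brute force on small graphs, enumerating every ordering of the vertices and counting interleaved pairs of independent edges. The following sketch does so with exact arithmetic; the helper names are illustrative and the enumeration is only feasible for a handful of vertices.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import comb

def crossings_linear(edges, pos):
    """Number of pairs of independent edges whose endpoints are interleaved."""
    count = 0
    for (a, b), (c, d) in combinations(edges, 2):
        if len({a, b, c, d}) < 4:
            continue  # edges sharing a vertex cannot cross
        lo1, hi1 = sorted((pos[a], pos[b]))
        lo2, hi2 = sorted((pos[c], pos[d]))
        if lo1 < lo2 < hi1 < hi2 or lo2 < lo1 < hi2 < hi1:
            count += 1
    return count

def exact_variance_rla(n_vertices, edges):
    """Exact variance of C over all orderings of the vertices."""
    values = [crossings_linear(edges, p) for p in permutations(range(n_vertices))]
    mean = Fraction(sum(values), len(values))
    return Fraction(sum(v * v for v in values), len(values)) - mean ** 2

def complete(n):
    return list(combinations(range(n), 2))

def complete_bipartite(n1, n2):
    return [(i, n1 + j) for i in range(n1) for j in range(n2)]

def formula_bipartite(n1, n2):
    """Closed form for the variance in K_{n1,n2} under rla, as given above."""
    n = n1 + n2
    return Fraction(comb(n1, 2) * comb(n2, 2) * (n * n + n), 90)

if __name__ == "__main__":
    print(exact_variance_rla(5, complete(5)))
    for n1, n2 in ((2, 2), (2, 3), (3, 3)):
        print(n1, n2, exact_variance_rla(n1 + n2, complete_bipartite(n1, n2)),
              formula_bipartite(n1, n2))
```

The zero variance for complete graphs reflects the fact that every set of four vertices contributes exactly one crossing in any linear ordering, so C is constant.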

The next section shows how analogous results can be obtained when replacing $*$ by a random spherical arrangement, which turns out to be a more complex case.

## 3 The variance of C in spherical random arrangements

Here we aim to calculate the values $\mathbb{E}_{\mathrm{rsa}}[\gamma_\omega]$ so as to establish the foundations to derive an arithmetic expression for the variance of C in uniformly random spherical arrangements of complete and complete bipartite graphs. Recall that, in this layout, vertices are distributed on the surface of a sphere uniformly at random, and edges become geodesics on that surface, i.e., great arcs (see section 6.2 for a detailed account of what we consider valid arc-arc intersections). We use the $\mathbb{E}_{\mathrm{rsa}}[\gamma_\omega]$'s to instantiate equations 12 and 13 and then obtain arithmetic expressions for $\mathbb{V}_{\mathrm{rsa}}[C(K_n)]$ and $\mathbb{V}_{\mathrm{rsa}}[C(K_{n_1,n_2})]$ (section 3.4). Each $\mathbb{E}_{\mathrm{rsa}}[\gamma_\omega]$ is calculated via (9): once $p_{\mathrm{rsa},\omega}$ has been obtained as defined in (11), $\delta_{\mathrm{rsa}}^2$ is subtracted.

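We first calculate the $\mathbb{E}_{\mathrm{rsa}}[\gamma_\omega]$ for the simplest cases. In the generic framework we have that $p_{*,24} = \delta_*$ and $p_{*,00} = p_{*,01} = \delta_*^2$. Since $\delta_{\mathrm{rsa}} = 1/8$,

$$p_{\mathrm{rsa},24} = \frac{1}{8}, \qquad p_{\mathrm{rsa},00} = p_{\mathrm{rsa},01} = \frac{1}{64}$$

and, following (9), we obtain

$$\mathbb{E}_{\mathrm{rsa}}[\gamma_{24}] = \frac{7}{64}, \tag{14}$$

$$\mathbb{E}_{\mathrm{rsa}}[\gamma_{00}] = \mathbb{E}_{\mathrm{rsa}}[\gamma_{01}] = 0. \tag{15}$$

Furthermore $p_{\mathrm{rsa},04} = 0$, and then

$$\mathbb{E}_{\mathrm{rsa}}[\gamma_{04}] = -\frac{1}{64}. \tag{16}$$

Notice that, given a pair of independent edges on four vertices, a product of type 04 combines it with one of the two other ways of matching the same four vertices into two independent edges, which gives two configurations of type 04. For the product to be 1, both indicator variables need to be 1, one for each pair of edges. However, if the edges of the first pair cross, then it is impossible that the edges of either of the other two pairs cross.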

So far we have calculated $p_{\mathrm{rsa},\omega}$ and $\mathbb{E}_{\mathrm{rsa}}[\gamma_\omega]$ for $\omega\in\{00, 01, 04, 24\}$ using arguments that can be applied to most layouts. Now we move on to $p_{\mathrm{rsa},\omega}$ for $\omega\in\{12, 021, 022, 03, 13\}$ using an ad hoc approach for the spherical layout that is based on spherical trigonometry and integration over surfaces. Such surfaces are delimited by the edges that make up type $\omega$ (table 1).

We proceed gradually towards that aim. First, we introduce the relevant background from spherical trigonometry (section 3.1). Second, we propose a derivation for $\delta_{\mathrm{rsa}}$ that is more detailed than that of Moon (section 3.2). Finally, we proceed with the remaining types of products, namely 12, 021, 13, 03 and 022 (section 3.3).

### 3.1 Spherical trigonometry

We first recall some definitions and properties of spherical trigonometry. Henceforth, we assume a unit-radius sphere. Let $A$, $B$ and $C$ be three points on a sphere of center $O$, such that $A$, $B$ and $C$ do not lie all together on a plane containing $O$.

Points $A$, $B$, $C$ define a spherical triangle, denoted by $\mathrm{tr}(A,B,C)$, whose vertices are $A$, $B$ and $C$, and whose sides are the geodesics joining $B$ with $C$, $A$ with $C$, and $A$ with $B$. Let $\alpha$, $\beta$ and $\gamma$ denote the length of the sides $BC$, $AC$ and $AB$, respectively (figure 2(a)).

For every point $A$, let $A'$ denote its antipodal vertex, that is, $A$ and $A'$ are diametrically opposite. Thus, the segment $AA'$ goes through the center $O$. The semicircle on the sphere containing $A$, $B$ and $A'$, and the semicircle on the sphere containing $A$, $C$ and $A'$ delimit two disjoint regions on the sphere. The wedge $w(A;B,C)$ is the one with smaller area. The angle of $w(A;B,C)$ is the angle defined by the planes containing the semicircles delimiting the wedge (figure 2(b)). The angles at vertices $A$, $B$ and $C$ of a spherical triangle $\mathrm{tr}(A,B,C)$ are the angles $a$, $b$ and $c$ of the wedges $w(A;B,C)$, $w(B;A,C)$ and $w(C;A,B)$, respectively (figure 2(a)).

Figure 2: (a) A spherical triangle with vertices $A$, $B$ and $C$; sides $\alpha$, $\beta$ and $\gamma$, and angles $a$, $b$ and $c$. (b) The wedge $w(A;B,C)$ of angle $\alpha$.

With this notation, the following formula relates the lengths $\alpha$, $\beta$ of two sides with the angles $b$ and $c$ of the spherical triangle

$$\cot(\beta)\sin(\alpha) = \cos(\alpha)\cos(c) + \sin(c)\cot(b). \tag{17}$$

Using this equation, from the length of two sides and the angle at the shared vertex of a spherical triangle, the remaining angles can be calculated. Concretely,

$$\cot(b) = \frac{\cot(\beta)\sin(\alpha) - \cos(\alpha)\cos(c)}{\sin(c)}$$

and, analogously,

$$\cot(a) = \frac{\cot(\alpha)\sin(\beta) - \cos(\beta)\cos(c)}{\sin(c)}.$$

Since this relation is used often in our calculations, let us define a function $g$ such that for any real numbers $x$, $y$, $z$,

$$g(x,y,z) = \operatorname{arccot}\left(\frac{\cot(x)\sin(y) - \cos(y)\cos(z)}{\sin(z)}\right). \tag{18}$$
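A small Python sketch of (18) follows. The only choice not fixed by the text is the branch of the inverse cotangent: here we assume values in $(0,\pi)$, which matches the range of the angles of a spherical triangle. With that assumption, `g(x, y, z)` returns the angle opposite the side of length `x` when sides `x` and `y` enclose the angle `z`.

```python
import math

def arccot(t):
    """Inverse cotangent with values in (0, pi); this branch is an assumption."""
    return math.atan2(1.0, t)

def g(x, y, z):
    """Eq. (18): angle opposite side x, given sides x and y enclosing angle z."""
    cot_x = math.cos(x) / math.sin(x)
    return arccot((cot_x * math.sin(y) - math.cos(y) * math.cos(z)) / math.sin(z))

if __name__ == "__main__":
    # Octant triangle: all sides and all angles equal pi/2.
    print(g(math.pi / 2, math.pi / 2, math.pi / 2))
    # Face of a regular spherical tetrahedron: sides arccos(-1/3), angles 2*pi/3.
    s = math.acos(-1.0 / 3.0)
    print(g(s, s, 2 * math.pi / 3))
```

Using `math.atan2(1.0, t)` rather than `math.atan(1.0 / t)` avoids a division by zero at `t = 0` and returns obtuse angles for negative arguments.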

Let $S(R)$ denote the area of a region $R$ of the sphere of radius 1. It is well-known that the area of a wedge is twice the angle of the wedge, and the area of a spherical triangle is $a + b + c - \pi$, where $a$, $b$ and $c$ are the angles at vertices $A$, $B$ and $C$, respectively.

### 3.2 The probability that two edges cross

Let $A$, $B$ and $C$ be three fixed points on the sphere and let $P$ be a random point. The geodesic $AP$ crosses $BC$ if and only if $P$ lies in the spherical triangle $\mathrm{tr}(A',B,C)$ (figure 3(a)). Hence, the probability of this occurring is the area of the spherical triangle divided by the area of the sphere's surface,

$$\begin{aligned}
\Pr[AP \text{ and } BC \text{ cross}] &= \frac{S(\mathrm{tr}(A',B,C))}{4\pi} = \frac{S(w(A;B,C)) - S(\mathrm{tr}(A,B,C))}{4\pi} \\
&= \frac{2a - (a+b+c-\pi)}{4\pi} = \frac{a-b-c+\pi}{4\pi}.
\end{aligned} \tag{19}$$

Figure 3: (a) Given three points $A$, $B$ and $C$, and a random point $P$, the probability that the geodesic $AP$ crosses the side $BC$ is proportional to the area of the spherical triangle $\mathrm{tr}(A',B,C)$. (b) Given two random points $A$ and $B$, the probability that the geodesic $AB$ has length $\alpha$ is proportional to the length of the circle that goes through $B$ and lies on the plane perpendicular to the radius $CA$; notice that this circle has radius $\sin\alpha$ if the radius of the sphere is 1.

Let $\alpha$ denote the length of the geodesic defined by two random points on the sphere. The density function of $\alpha$ is

$$f(\alpha) = \frac{1}{2}\sin(\alpha). \tag{20}$$

Indeed, we know that $\int_0^\pi f(\alpha)\,d\alpha = 1$ and that $f(\alpha)$ must be proportional to the length of the circle obtained by intersecting the sphere with the plane containing one of the points and perpendicular to the line that goes through the other point and the center of the sphere (figure 3(b)). Taking into account these facts, we conclude that $f$ satisfies (20).
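The normalization argument can be written out in two lines. The constant $k$ below is a symbol introduced only for this illustration; the circle of points at geodesic distance $\alpha$ from a fixed point has radius $\sin\alpha$ and hence length $2\pi\sin\alpha$.

```latex
\begin{align*}
f(\alpha) &= k \cdot 2\pi \sin(\alpha), \\
1 = \int_0^{\pi} f(\alpha)\, d\alpha
  &= 2\pi k \int_0^{\pi} \sin(\alpha)\, d\alpha = 4\pi k
  \quad\Longrightarrow\quad k = \frac{1}{4\pi},
  \qquad f(\alpha) = \frac{1}{2}\sin(\alpha).
\end{align*}
```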

Let $s_1$ be a geodesic of length $\alpha$. The probability that two random points lie on the different hemispheres determined by $s_1$ is $1/2$, and the probability that the geodesic determined by two random points on different hemispheres crosses a given arc of length $\alpha$ is $\alpha/(2\pi)$. Hence, the probability that a geodesic $s_2$ defined by two random points crosses a geodesic $s_1$ of length $\alpha$ is

$$\Pr[s_2 \text{ crosses } s_1 \mid \text{length of } s_1 \text{ equal to } \alpha] = \frac{\alpha}{4\pi}. \tag{21}$$

Hence, the probability that two random edges on the sphere cross is, as already derived by Moon (1),

$$\delta_{\mathrm{rsa}} = \int_0^\pi \frac{\alpha}{4\pi} f(\alpha)\,d\alpha = \int_0^\pi \frac{\alpha}{8\pi}\sin(\alpha)\,d\alpha = \frac{1}{8}. \tag{22}$$
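The value 1/8 can also be approximated by simulation. The sketch below samples four independent uniform points on the unit sphere and tests whether the two minor great arcs they define intersect. The intersection test is our own simple formulation and is not necessarily the procedure of section 6.2; degenerate inputs (for instance, both arcs on the same great circle) are not treated specially because they occur with probability zero.

```python
import math
import random

def random_unit_vector(rng):
    """Uniform point on the unit sphere, obtained by normalising a Gaussian vector."""
    while True:
        v = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-12:
            return tuple(x / norm for x in v)

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

def on_minor_arc(p, a, b, normal):
    """True if p, a point on the great circle through a and b, lies between a and b."""
    return dot(cross(a, p), normal) >= 0.0 and dot(cross(p, b), normal) >= 0.0

def arcs_cross(a, b, c, d):
    """True if the minor arcs ab and cd intersect."""
    n_ab, n_cd = cross(a, b), cross(c, d)
    p = cross(n_ab, n_cd)          # the two great circles meet at +p and -p
    q = tuple(-x for x in p)
    return any(on_minor_arc(x, a, b, n_ab) and on_minor_arc(x, c, d, n_cd)
               for x in (p, q))

def estimate_delta_rsa(samples=40000, seed=12345):
    """Monte Carlo estimate of the probability that two random edges cross."""
    rng = random.Random(seed)
    hits = sum(arcs_cross(*(random_unit_vector(rng) for _ in range(4)))
               for _ in range(samples))
    return hits / samples

if __name__ == "__main__":
    print(estimate_delta_rsa())
```

With 40,000 samples the standard error of the estimate is roughly 0.002, so the output should be read as an approximation of (22) rather than a confirmation to many digits.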

### 3.3 The $p_{\mathrm{rsa},\omega}$'s and the $\mathbb{E}_{\mathrm{rsa}}[\gamma_\omega]$'s

Consider a product of type $\omega$ as described in table 1 (see also figure 1). Below we calculate $p_{\mathrm{rsa},\omega}$ for each of the remaining types $\omega$. Table 4 summarizes the results that are derived next analytically and confirms their accuracy with the help of computer simulations. For a better understanding of the explanations below, we refer the reader to table 1 (where we describe the types of products) and figure 1 (which illustrates the graphs characterizing each type).

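#### Case $\omega = 12$.

Let $\alpha$ be the length of the edge shared by the two pairs. By (20) and (21), the probability that two random edges both cross the edge of length $\alpha$ is

$$p_{\mathrm{rsa},12} = \int_0^\pi \left(\frac{\alpha}{4\pi}\right)^2 f(\alpha)\,d\alpha = \frac{1}{32\pi^2}\int_0^\pi \alpha^2\sin(\alpha)\,d\alpha = \frac{\pi^2-4}{32\pi^2},$$

and then

$$\mathbb{E}_{\mathrm{rsa}}[\gamma_{12}] = \frac{\pi^2-8}{64\pi^2}. \tag{23}$$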
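The one-dimensional integrals in (22) and in the case of type 12 can be checked with a few lines of numerical quadrature. The sketch below uses a hand-written composite Simpson rule so that it needs nothing beyond the standard library; the helper names are ours.

```python
import math

def simpson(func, a, b, n=200):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

def density(alpha):
    """Density of the geodesic length between two random points, eq. (20)."""
    return 0.5 * math.sin(alpha)

def cross_given_length(alpha):
    """Probability that a random edge crosses a given edge of length alpha, eq. (21)."""
    return alpha / (4 * math.pi)

delta_rsa = simpson(lambda t: cross_given_length(t) * density(t), 0.0, math.pi)
p_12 = simpson(lambda t: cross_given_length(t) ** 2 * density(t), 0.0, math.pi)
gamma_12 = p_12 - delta_rsa ** 2

if __name__ == "__main__":
    print(delta_rsa, 1 / 8)
    print(p_12, (math.pi ** 2 - 4) / (32 * math.pi ** 2))
    print(gamma_12, (math.pi ** 2 - 8) / (64 * math.pi ** 2))
```

Each printed pair shows the numerical value next to the corresponding closed form, so any transcription error in the formulas would show up as a visible mismatch.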

#### Case ω=021.

Recall that , , and , with pairwise distinct. The relative position of , and can be given by three independent parameters: the length of the geodesic , the length of the geodesic and the angle of the wedge (figure 4 (a)).

On the one hand, given a random point , the probability that the edge crosses the edge whenever can be derived using (19) for the triangle when , and . Besides, this probability is the same for the opposite angle . On the other hand, by (21), the probability that the edge crosses the edge of length is . Therefore,

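$$\begin{aligned}
p_{\mathrm{rsa},021} &= 2\iiint_0^\pi \frac{a-b-c+\pi}{4\pi}\cdot\frac{\beta}{4\pi}\, f(\alpha) f(\beta)\,\frac{1}{2\pi}\,d\alpha\,d\beta\,dc \\
&= 2\iiint_0^\pi \frac{g(\alpha,\beta,c) - g(\beta,\alpha,c) - c + \pi}{4\pi}\cdot\frac{\beta}{32\pi^2}\sin(\alpha)\sin(\beta)\,d\alpha\,d\beta\,dc \\
&\approx 0.013
\end{aligned}$$

and thus

$$\mathbb{E}_{\mathrm{rsa}}[\gamma_{021}] = p_{\mathrm{rsa},021} - \frac{1}{64} \approx -0.003. \tag{24}$$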

#### Case ω=13.

Recall that , and , with pairwise distinct. As in the preceding case, the relative position of , and can be given by three independent parameters: the length of the geodesic , the length of the geodesic and the angle of the wedge (figure 4 (a)). Moreover, given a random point (resp. ), the probability that the edge (resp. ) crosses the edge whenever can be derived using (19) for the triangle when , and . Besides, the probability of crossing is the same for the opposite angle . Therefore,

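$$\begin{aligned}
p_{\mathrm{rsa},13} &= 2\iiint_0^\pi \left(\frac{a-b-c+\pi}{4\pi}\right)^2 f(\alpha) f(\beta)\,\frac{1}{2\pi}\,d\alpha\,d\beta\,dc \\
&= 2\iiint_0^\pi \left(\frac{g(\alpha,\beta,c) - g(\beta,\alpha,c) - c + \pi}{4\pi}\right)^2 \frac{1}{8\pi}\sin(\alpha)\sin(\beta)\,d\alpha\,d\beta\,dc \\
&\approx 0.031
\end{aligned}$$

and thus

$$\mathbb{E}_{\mathrm{rsa}}[\gamma_{13}] = p_{\mathrm{rsa},13} - \frac{1}{64} \approx 0.016. \tag{25}$$

Figure 4: (a) Types 021 and 13: the relative position of points $s$, $t$ and $u$ is determined by parameters $\alpha$, $\beta$ and $c$. (b) Type 03: the relative position of points $s$, $u$ and $v$ is determined by parameters $\alpha$, $\beta$ and $c$. (c) Type 022: the relative position of points $s$, $t$, $u$ and $w$ is determined by parameters $\alpha$, $\beta$, $\beta'$, $c$ and $c'$. The spherical triangles $\mathrm{tr}(s,u,w)$ and $\mathrm{tr}(s,u,t)$ are drawn in black-blue and in black-red, respectively.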

#### Case ω=03.

Assume that , , and , with pairwise distinct. Similarly as in the preceding case, the relative position of , and can be given by three independent parameters: the length of the geodesic , the length of the geodesic and the angle of the wedge (Figure 4(b)). Moreover, given a random point , the probability that the edge crosses whenever can be derived using (19) for the triangle when , and . Analogously, given a random point , the probability that the edge crosses can be derived using (19) for the triangle . Therefore,

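$$\begin{aligned}
p_{\mathrm{rsa},03} &= 2\iiint_0^\pi \left(\frac{a-b-c+\pi}{4\pi}\right)\left(\frac{b-a-c+\pi}{4\pi}\right) f(\alpha) f(\beta)\,\frac{1}{2\pi}\,d\alpha\,d\beta\,dc \\
&= 2\iiint_0^\pi \left(\frac{g(\alpha,\beta,c) - g(\beta,\alpha,c) - c + \pi}{4\pi}\right)\left(\frac{g(\beta,\alpha,c) - g(\alpha,\beta,c) - c + \pi}{4\pi}\right) \\
&\qquad\qquad \cdot\frac{1}{8\pi}\sin(\alpha)\sin(\beta)\,d\alpha\,d\beta\,dc \\
&\approx 0.01.
\end{aligned}$$

Thus

$$\mathbb{E}_{\mathrm{rsa}}[\gamma_{03}] = p_{\mathrm{rsa},03} - \frac{1}{64} \approx -0.0052. \tag{26}$$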
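The triple integrals for types 021, 13 and 03 share the same structure, so a single routine can approximate all three. The sketch below uses a plain midpoint rule on a coarse grid and the branch of the inverse cotangent with values in $(0,\pi)$; both are assumptions of this illustration and not necessarily the numerical method of section 6. The results should be compared with the rounded values reported above (about 0.013, 0.031 and 0.01).

```python
import math

def arccot(t):
    """Inverse cotangent with values in (0, pi); this branch is an assumption."""
    return math.atan2(1.0, t)

def g(x, y, z):
    """Eq. (18)."""
    cot_x = math.cos(x) / math.sin(x)
    return arccot((cot_x * math.sin(y) - math.cos(y) * math.cos(z)) / math.sin(z))

def cross_prob(alpha, beta, c):
    """Eq. (19) with a = g(alpha, beta, c) and b = g(beta, alpha, c)."""
    a, b = g(alpha, beta, c), g(beta, alpha, c)
    return (a - b - c + math.pi) / (4 * math.pi)

def triple_integral(integrand, n=24):
    """Midpoint rule for 2 * integral of integrand * f(alpha) f(beta) / (2 pi) over [0, pi]^3."""
    h = math.pi / n
    nodes = [(i + 0.5) * h for i in range(n)]
    total = 0.0
    for alpha in nodes:
        for beta in nodes:
            # f(alpha) f(beta) / (2 pi), with f(x) = sin(x) / 2 as in eq. (20)
            weight = 0.25 * math.sin(alpha) * math.sin(beta) / (2 * math.pi)
            for c in nodes:
                total += integrand(alpha, beta, c) * weight
    return 2 * total * h ** 3

p_021 = triple_integral(lambda al, be, c: cross_prob(al, be, c) * be / (4 * math.pi))
p_13 = triple_integral(lambda al, be, c: cross_prob(al, be, c) ** 2)
p_03 = triple_integral(lambda al, be, c: cross_prob(al, be, c) * cross_prob(be, al, c))

if __name__ == "__main__":
    for name, p in (("021", p_021), ("13", p_13), ("03", p_03)):
        print(name, round(p, 4), round(p - 1 / 64, 4))
```

The grid size `n` trades accuracy for running time; the midpoint nodes never hit the endpoints 0 and $\pi$, where the cotangent and the division by $\sin(c)$ would be undefined.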

#### Case ω=022.

Recall that , , and , with pairwise distinct. The relative position of points , , and can be given by 5 independent parameters, , , , , and , where , and are the lengths of the geodesics , and , respectively; is the angle of the wedge ; and is the angle of the wedge , with (figure 4(c)).

Let , denote the angles at vertices and , respectively, of the spherical triangle and let , denote the angles at vertices and , respectively, of the spherical triangle . For and , given two random points and , the probability that crosses and the probability that crosses can be calculated using (19) for the triangles and , respectively. Since the probability of crossing is the same for the opposites angles of and , we derive that

 prsa,022 =4∫⋯∫π0(b−a−c+π4π)(b′−a′−c′+π4π) ==f(α)f(β)(β′)(12π)2dαdβdβ′dcdc′ =4∫⋯∫π0(g(β,α,c)−g(α,β,c)−c+π4π)(g(β′,α,c′)−g(α,β′,c′)−c′+π4π