 # Analogies between the crossing number and the tangle crossing number

Tanglegrams are special graphs that consist of a pair of rooted binary trees with the same number of leaves, and a perfect matching between the two leaf-sets. These objects are of use in phylogenetics and are represented with straight-line drawings where the leaves of the two plane binary trees are on two parallel lines and only the matching edges can cross. The tangle crossing number of a tanglegram is the minimum crossing number over all such drawings and is related to biologically relevant quantities, such as the number of times a parasite switched hosts. Our main results for tanglegrams, which parallel known theorems for crossing numbers, are as follows. The removal of a single matching edge in a tanglegram with $n$ leaves decreases the tangle crossing number by at most $n-3$, and this is sharp. Additionally, if $\gamma(n)$ is the maximum tangle crossing number of a tanglegram with $n$ leaves, we prove $\frac{1}{2}\binom{n}{2}(1-o(1)) < \gamma(n) < \frac{1}{2}\binom{n}{2}$. Further, we provide an algorithm for computing non-trivial lower bounds on the tangle crossing number in $O(n^4)$ time. This lower bound may be tight, even for tanglegrams with tangle crossing number $\Theta(n^2)$.


## 1. Introduction

A drawing $D$ of a graph $G$ in the plane is a set of distinct points in the plane, one for each vertex of $G$, and a collection of simple open arcs, one for each edge of the graph, such that if $e$ is an edge of $G$ with endpoints $u$ and $v$, then the closure (in the plane) of the arc $\alpha$ representing $e$ consists precisely of $\alpha$ and the two points representing $u$ and $v$. We further require that no edge–arc intersects any vertex point. The (standard) crossing number of $D$ is the number of pairs $(p, \{\alpha, \beta\})$, where $p$ is a point of the plane and $\alpha, \beta$ are arcs of $D$ representing distinct edges of $G$ such that $p \in \alpha \cap \beta$. The crossing number of a graph is defined to be the minimum crossing number over all of its drawings.

Tanglegrams are well studied in the phylogenetics and computer science literature. A tanglegram $T$ of size $n$ is a triplet $(L, R, M)$ containing two rooted binary trees ($L$ and $R$), each with $n$ leaves, and a fixed perfect matching $M$ between the two sets of leaves. Two tanglegrams $(L_1, R_1, M_1)$ and $(L_2, R_2, M_2)$ are the same if there is a pair of tree-isomorphisms from $L_1$ to $L_2$ and from $R_1$ to $R_2$ that map each pair of matched leaves to a pair of matched leaves. A layout of a tanglegram is a straight-line plane drawing of the trees, the first drawn in the half plane $x \le 0$ with its leaves on the line $x = 0$ and the second in the half plane $x \ge 1$ with its leaves on the line $x = 1$, with a straight-line drawing of the matching edges between the leaves. The tangle crossing number $\mathrm{crt}(T)$ of a tanglegram $T$ is the minimum crossing number over all of its layouts, i.e. the minimum number of unordered pairs of crossing edges over all layouts. The tangle crossing number is related to the number of times parasites switch hosts as well as the number of horizontal gene transfers.

Though tangle crossing numbers are crossing numbers of a very specific kind of drawing of a very specific class of graphs, a number of analogies are known between tangle crossing numbers and crossing numbers. As with the crossing numbers of general graphs , computing the tangle crossing number is NP-hard , even when both trees are complete binary trees . Testing whether a graph is planar can be done in polynomial (in fact linear) time . Analogously, testing for tangle crossing number 0 can also be done in linear time . Recently, Czabarka, Székely, and Wagner  gave an analogue of Kuratowski’s Theorem  for tanglegrams, characterizing tangle crossing number 0. Clearly, for a graph with edges we have , while for a tanglegram of size , . The expected crossing number of an Erdős-Rényi random graph for for any is where is the expected number of edges , and the expected tangle crossing number of a random and uniformly selected tanglegram with leaves is  , i.e. both of these quantities are as large as possible in order of magnitude.

We continue the study of the tangle crossing number with results which parallel results for graph crossing numbers. Hliněný and Salazar  studied the crossing number of 1-edge planar graphs (i.e. graphs in which there exists an edge whose removal results in a planar graph). For each , they define a 1-edge planar graph with vertices, edges, and crossing number . We find that the behavior is quite similar for the tangle crossing number. First we establish an upper bound for given any tanglegram and any matching edge . Then for each , we define a tanglegram of size with tangle crossing number for which there is a single matching edge whose removal yields a planar subtanglegram. In summary, we prove the following theorem in Section 3:

###### Theorem 1.

For any tanglegram $T$ of size $n$ and any matching edge $e$ in $T$, let $T-e$ be the tanglegram induced by deleting the endpoints of $e$ and suppressing their (now degree two) neighbors. Then $\mathrm{crt}(T) - \mathrm{crt}(T-e) \le n-3$. This inequality is best possible, even when $T-e$ is planar.

We then examine the largest tangle crossing number of a tanglegram of size $n$ (an analogue of the crossing number of the complete graph on $n$ vertices). It is well known (e.g. by the crossing lemma or by the counting method) that the crossing number of the complete graph is $\Theta(n^4)$. We prove the following result in Section 4:

###### Theorem 2.

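For any tanglegram $T$ of size $n$, $\mathrm{crt}(T) < \frac{1}{2}\binom{n}{2}$. If $\gamma(n)$ is the maximum tangle crossing number among all tanglegrams of size $n$, then

$$\lim_{n\to\infty} \frac{\gamma(n)}{\binom{n}{2}} = \frac{1}{2}.$$

Interestingly, the structure of a size-$n$ tanglegram with maximum tangle crossing number remains unknown.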

We conclude with a polynomial time algorithm for computing lower bounds on the tangle crossing number in Section 5. Drawing random tanglegrams of size $n$ from a uniform distribution, we give computational evidence that these lower bounds are $\Theta(n^2)$ with high probability, thus matching the result of Czabarka, Székely, and Wagner [5] that such a tanglegram has tangle crossing number $\Theta(n^2)$ with high probability.

## 2. Preliminaries

Before delving into the proofs of our main theorems, we need to establish some terminology and more formal definitions. A rooted binary tree is a tree in which one vertex is designated as the root and each vertex has either 0 or 2 children. A vertex with 0 children is a leaf. A vertex with 2 children is called an internal vertex. Thinking of the root as a common ancestor to all other vertices, the notions of descendant, parent, children and sibling become clear. If $T$ is a rooted binary tree, a subset $S$ of the leaves of $T$ induces a binary subtree $T[S]$, obtained from the smallest subtree of $T$ containing $S$ by suppressing all degree 2 vertices and choosing as the root of $T[S]$ the vertex which was closest to the root of $T$. For any internal vertex $v$ of $T$, the subtree induced by the leaves which are descendants of $v$ is a clade of $T$ at $v$. If the two children of $v$ are leaves, then the corresponding clade is called a cherry.

A tanglegram layout is a straight-line drawing in the plane of two rooted binary trees, $L$ and $R$, each with $n$ leaves, and a perfect matching $M$ between their leaves (each leaf of $L$ paired with a unique leaf of $R$) having the following properties:

• A plane drawing of $L$ appears in the half plane $x \le 0$ with only the leaves of $L$ on the line $x = 0$.

• A plane drawing of $R$ appears in the half plane $x \ge 1$ with only the leaves of $R$ on the line $x = 1$.

• The matching $M$ is represented by a (straight-line) drawing of edges connecting each leaf of $L$ with the appropriate leaf of $R$.

The crossing number of such a layout is precisely the number of unordered pairs of matching edges which cross. As there are $n$ matching edges, the crossing number is clearly at most $\binom{n}{2}$.
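Counting the crossings of a fixed layout reduces to counting inversions, since two straight matching edges cross exactly when their endpoints appear in opposite orders on the two lines. The following is a minimal sketch of that count; the function name `layout_crossings` and the list/dict encoding of a layout are illustrative assumptions, not notation from the paper.

```python
from itertools import combinations


def layout_crossings(left_order, right_order, matching):
    """Count unordered pairs of matching edges that cross in one layout.

    left_order / right_order list the leaves from top to bottom on the two
    parallel lines; matching maps each left leaf to its right partner.
    (This encoding is an assumption made for illustration.)
    """
    pos_right = {leaf: i for i, leaf in enumerate(right_order)}
    # Right-hand position of each edge, read in left-hand order.
    seq = [pos_right[matching[leaf]] for leaf in left_order]
    # A pair of edges crosses exactly when it forms an inversion in seq.
    return sum(1 for i, j in combinations(range(len(seq)), 2) if seq[i] > seq[j])
```

With $n$ leaves there are $\binom{n}{2}$ pairs to inspect, which is also the largest value the function can return (attained when one leaf order is the reverse of the other).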

To transform one layout into another, we define a switch. First observe that a layout induces a total order on the leaves of $L$ by the $y$-coordinate of the leaves on the line $x = 0$. Now each internal vertex $v$ of $L$ has two children $v_1$ and $v_2$. To make a switch at $v$, redraw the tree $L$ so that in the new layout, the order of leaves $a$ and $b$ is reversed if and only if one was a leaf in the clade at $v_1$ and the other was a leaf in the clade at $v_2$. The resulting tanglegram layout displays the new drawing of $L$, an unchanged drawing of $R$, and the corresponding straight-line drawing of the matching edges connecting the appropriate pairs of leaves. Switch operations at internal vertices of $R$ are defined analogously. Observe that the switch operation defines an equivalence relation on the set of tanglegram layouts and each equivalence class will be called a tanglegram, denoted by the triple $(L, R, M)$.
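Because every layout of a tanglegram is reachable by switches, the tangle crossing number of a small tanglegram can be found by exhaustive search over all switch choices. The sketch below does this under stated assumptions: a tree is encoded as a leaf label or a pair `(child, child)`, leaf labels are not tuples, and the names `leaf_orders` and `tangle_crossing_number` are hypothetical. It enumerates $2^{n-1}$ leaf orders per tree, so it is only practical for very small $n$ and is not the method used in the paper.

```python
from itertools import combinations, product


def leaf_orders(tree):
    """Yield every leaf order obtainable by switches at internal vertices."""
    if not isinstance(tree, tuple):
        yield [tree]
        return
    first, second = tree
    for a, b in product(list(leaf_orders(first)), list(leaf_orders(second))):
        yield a + b  # children in the given order
        yield b + a  # same subtree layouts after a switch at this vertex


def inversions(seq):
    return sum(1 for i, j in combinations(range(len(seq)), 2) if seq[i] > seq[j])


def tangle_crossing_number(left_tree, right_tree, matching):
    """Brute-force minimum number of crossings over all layouts."""
    right_orders = list(leaf_orders(right_tree))
    best = None
    for left in leaf_orders(left_tree):
        for right in right_orders:
            pos = {leaf: i for i, leaf in enumerate(right)}
            crossings = inversions([pos[matching[leaf]] for leaf in left])
            if best is None or crossings < best:
                best = crossings
    return best
```

For example, with both trees equal to the complete binary tree on four leaves and the matching `{1: 'a', 2: 'c', 3: 'b', 4: 'd'}`, each left cherry is joined to both right cherries, so some crossing is forced in every layout.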

Let $T = (L, R, M)$ be a tanglegram. The size of $T$ is the size of the matching $M$ (also the number of leaves in $L$ and the number of leaves in $R$). The tangle crossing number of $T$, denoted $\mathrm{crt}(T)$, is the minimum number of pairs of edges that cross, among all layouts of $T$. If $T$ has size $n$ then one can easily deduce that $\mathrm{crt}(T) \le \binom{n}{2}$.

Given a tanglegram , a subset of induces a subtanglegram where is the subtree of induced by leaves of which are endpoints of edges in and is defined similarly.

We let $\gamma(n)$ denote the maximum tangle crossing number among all tanglegrams of size $n$. In addition, we utilize the now standard notation $[n]$ for the set $\{1, 2, \ldots, n\}$.

## 3. Subtanglegrams of one size smaller

In a tanglegram of size , the tangle crossing number is at most (Theorem 2). Given a tanglegram with tangle crossing number close to this upper bound, on average, each matching edge crosses one fourth of all the other matching edges. We explore the maximum number of crossings a single edge could contribute to the overall tangle crossing number. Phrased another way, for any tanglegram of size and subtanglegram of size , we determine the maximum value of . The result is given in Theorem 3, an upper bound which Theorem 4 shows to be tight, even for tanglegrams with planar. These two theorems together complete the proof of Theorem 1.

Throughout this section, given a tanglegram $T = (L, R, M)$ and $e \in M$, we use $T-e$ to denote the subtanglegram of $T$ induced by edges in $M \setminus \{e\}$.

###### Theorem 3.

If $T$ is a tanglegram of size $n \ge 3$ and $e$ is any matching edge of $T$, then

$$\mathrm{crt}(T) - \mathrm{crt}(T-e) \le n-3.$$

###### Proof.

We will proceed by induction on . First observe that if is a tanglegram of size at most then it is planar, and if is a tanglegram of size then  ; so the theorem is trivial when .

Let and suppose that in every tanglegram of size , each edge contributes at most to the tangle crossing number. Fix a tanglegram of size , and let be an arbitrary matching edge of . Say has endpoints in and in . Fix an optimal layout of with the fewest number of crossings.

In , let be the parent of and let be the clade at the other child of . (Similarly, define and .) There are two planar drawings of whose subdrawings of agree with the drawing of in , one with immediately above the leaves of and one with immediately below the leaves of . The ordering of the leaves of in each of these drawings of is exactly the same as the ordering of the leaves in the drawing of in . Further, one of these drawings of can be obtained from the other by making a switch at . A similar claim can be made about , , , and . Figure 1 uses dashed lines to indicate the two potential positions of and for in a drawing of .

We claim that there is a drawing of using one of these two drawings of and one of these two drawings of in which matching edge crosses at most edges. This is sufficient to complete the proof as the number of crossings between two edges of in is exactly (because the underlying drawing of remained unchanged) which implies as desired.

First observe that and each have at least one leaf. There are two cases to consider: (1) and each have exactly one leaf and they are matched in , or (2) there is a leaf in and a leaf in which are not matched with one another.

For the first case, let be the edge matching the single leaf in with the single leaf in . Consider the drawing of with above and above so that is above . Suppose, for contradiction, that participates in strictly more than crossings in this drawing of . As does not cross itself or , matching edge must cross every other edge in . Since there are no leaves between the left endpoints of and and no leaves between the right endpoints of and , it follows that also participates in crossings in this drawing of . As the drawing of was optimal, we see that contributes to the tangle crossing number of tanglegram which had size . However, by the induction hypothesis, each edge in contributes at most crossings to , a contradiction.

For the second case, let be a leaf in and be a leaf in which are not matched to each other. We say (respectively, ) is “matched upward” if the leaf to which it is matched is at least as high as the lowest leaf of (respectively, ). The leaf (respectively, ) is “matched downward” if the leaf to which it is matched is no higher than the highest leaf of (respectively, ).

Let and be the matching edges, one with endpoint and the other with endpoint . If and are both matched upward (respectively, downward), draw the vertex below (respectively, above) and the vertex below (respectively, above) . On the other hand, if is matched to a leaf higher (lower) than the leaves of and is matched to a leaf lower (higher) than the leaves of , then draw directly below (above) the leaves of and directly above (below) the leaves of . In each of these cases, the edge crosses neither nor , and therefore crosses at most other edges, from which follows. ∎

Now we prove that the inequality in Theorem 3 is best possible. To do so, we present an infinite family of tanglegrams such that has size , tangle crossing number , and there exists a matching edge such that . We say is 1-edge tangle planar as is not planar but there is a matching edge such that the subtanglegram is planar. The two binary trees in are rooted caterpillars.

###### Definition 1.

The rooted caterpillar $C_n$ of size $n$ is the unique rooted binary tree with $n$ leaves such that there are two leaves of distance $n-1$ from the root and for each $1 \le i \le n-2$ there is one leaf of distance $i$ from the root. (See Figure 2 for an example.)
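The shape of a rooted caterpillar can be made concrete with a short construction. This is an illustrative sketch only: it assumes $n \ge 2$, encodes a tree as nested pairs, and labels leaves $1, \ldots, n$ so that leaf $i$ sits at distance $i$ from the root except for the last leaf, which shares the deepest level. The function names are hypothetical.

```python
def rooted_caterpillar(n):
    """Nested-pair rooted caterpillar on leaves 1..n (assumes n >= 2)."""
    if n < 2:
        raise ValueError("this sketch needs at least two leaves")
    tree = (n - 1, n)  # the cherry formed by the two deepest leaves
    for leaf in range(n - 2, 0, -1):
        tree = (leaf, tree)  # hang one new leaf next to the current subtree
    return tree


def leaf_depths(tree, depth=0):
    """Map each leaf label to its distance from the root."""
    if not isinstance(tree, tuple):
        return {tree: depth}
    depths = {}
    for child in tree:
        depths.update(leaf_depths(child, depth + 1))
    return depths
```

For $n = 5$ the construction gives `(1, (2, (3, (4, 5))))`, with one leaf at each of the distances 1, 2, 3 and two leaves at distance 4.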

###### Definition 2.

For each , we define the caterpillar tanglegram as follows: and are copies of the rooted caterpillar . We label the leaves of as , where is the leaf’s distance from the root. Since there are precisely two leaves at distance , we arbitrarily label one of these instead. Similarly, the leaves of are labeled using . Finally, we construct the matching . (See Figure 3 for an example.)

###### Theorem 4.

For each , the caterpillar tanglegram is 1-edge tangle planar and has tangle crossing number .

###### Proof.

Note that the tanglegram is clearly a planar tanglegram (see Figure 3), so is 1-edge tangle planar. The same drawing demonstrates that . It remains to show that . Suppose, for contradiction, that there is some for which . Furthermore, let be the least index witnessing this strict inequality. One can check by computer that for , so . Since contains a subdrawing of , . There are two cases for a fixed optimal drawing of : at least one matching edge in the set is part of a crossing or else none of them are.

In the latter case, only the edges , , and have crossings, and therefore , a contradiction.

In the former case, say the edge is part of a crossing. The subtanglegram induced by is isomorphic to and has tangle crossing number at most . It follows that , which contradicts the minimality of . ∎

## 4. Maximizing the crossing number

While a single edge in a tanglegram of size $n$ can contribute up to $n-3$ to the tangle crossing number, not all matching edges can realize this many crossings in a drawing which minimizes the tangle crossing number. The aim of this section is to better understand the maximum tangle crossing number among tanglegrams of the same size. To prove Theorem 2, we begin with the first part:

###### Theorem 5.

If $T$ is a tanglegram of size $n$ then $\mathrm{crt}(T) < \frac{1}{2}\binom{n}{2}$. Consequently, $\gamma(n) < \frac{1}{2}\binom{n}{2}$.

###### Proof.

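Let $T = (L, R, M)$ be a tanglegram of size $n$. Suppose $\mathrm{crt}(T) = k$ and let $D$ be a tanglegram layout of $T$ having $k$ crossings. By making a switch at every internal vertex in $L$, we obtain a new layout $D'$ of $T$. Note that in $D'$, the plane drawing of $L$ can be viewed as a reflection of the drawing of $L$ in $D$ across a horizontal line, while the plane drawing of $R$ is the same in both $D$ and $D'$. For any unordered pair of edges $\{e, f\}$ in $M$, $e$ and $f$ cross in $D'$ if and only if they do not cross in $D$. This implies that $D'$ has exactly $\binom{n}{2} - k$ crossings. Since $\mathrm{crt}(T) = k$, every layout has at least $k$ crossings. Consequently, $k \le \binom{n}{2} - k$ and $\mathrm{crt}(T) \le \frac{1}{2}\binom{n}{2}$.

Suppose that, contrary to our statement, $\mathrm{crt}(T) = \frac{1}{2}\binom{n}{2}$. It follows from our proof so far that any layout of $T$ has exactly $\frac{1}{2}\binom{n}{2}$ crossings, and for any unordered pair of matching edges there is a layout in which they cross. Let $C$ be a cherry of $L$ with leaves $u$ and $v$ incident with matching edges $e$ and $f$. As noted above, $e$ and $f$ must cross in some layout $D$ of $T$. From $D$, we create a new layout $D'$ by making a switch at the parent of $u$ and $v$. Since $u$ and $v$ are adjacent on their line, this switch removes the crossing between $e$ and $f$ and changes no other crossing. The number of crossings in $D'$ is therefore $\frac{1}{2}\binom{n}{2} - 1$, a contradiction. ∎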
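The reflection step in this proof can be checked numerically: reversing the leaf order of one tree turns every crossing pair into a non-crossing pair and vice versa, so the two layouts together account for all $\binom{n}{2}$ pairs. The sketch below encodes a layout as the sequence of right-hand positions read in left-hand order (an assumption made for illustration; the helper names are hypothetical).

```python
from itertools import combinations
from math import comb


def inversions(seq):
    """Number of crossing pairs in a layout encoded as a position sequence."""
    return sum(1 for i, j in combinations(range(len(seq)), 2) if seq[i] > seq[j])


def reflected_total(seq):
    """Crossings of a layout plus crossings after reversing one leaf order."""
    return inversions(seq) + inversions(seq[::-1])


def best_of_pair(seq):
    """The smaller of the two counts, which never exceeds half of C(n, 2)."""
    return min(inversions(seq), inversions(seq[::-1]))
```

For the sequence `[2, 0, 3, 1]` both the layout and its reflection have 3 crossings, for a total of $\binom{4}{2} = 6$.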

To complete the proof of Theorem 2, we prove

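$$\liminf_{n\to\infty} \frac{\gamma(n)}{\binom{n}{2}} \ge \frac{1}{2}$$

by constructing for each $n$ a family $\mathcal{T}_n$ of tanglegrams of size $n$ such that for any $\varepsilon > 0$ and large enough $n$, $\mathrm{crt}(T) \ge \left(\frac{1}{2} - \varepsilon\right)\binom{n}{2}$ for all $T \in \mathcal{T}_n$.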

We begin by constructing a family for each integer . Any is the result of the following procedure: Take an arbitrary -tuple of size rooted binary trees . Label the leaves of with labels arbitrarily. For each , identify the root of with leaf in and assign labels to the leaves of . The result is the rooted binary tree with leaves. Similarly, is built from with leaf labels . The matching is defined as .

Figure 4 shows a tanglegram in . The binary trees , , are marked by dashed rectangles. The tree is the subtree of consisting of the roots of , , and their ancestors. Note that the trees and need not be isomorphic. They are only isomorphic here because there is only one binary tree, up to isomorphism, with 3 leaves. Further, for any choice of two clades in and two clades in , there is at least one pair of edges which cross.

With a well-defined set of tanglegrams for each , we now define for any integer . Fix and choose such that . Let be the set of tanglegrams of size such that if and only if there is a tanglegram with a subtanglegram of . Figure 5 shows a tanglegram in . The tanglegram with bold edges is a subtanglegram in .

###### Theorem 6.

$$\liminf_{n\to\infty} \frac{\gamma(n)}{\binom{n}{2}} \ge \frac{1}{2}.$$

###### Proof.

First we show that for each and , . Observe that for each , is a clade of at one of the leaves of . Therefore in any tanglegram layout of all the leaves of appear forming a vertical consecutive block, for each . A similar assertion holds for the leaves of , . For any , there are 4 edges with both endpoints in the clades , , , and . Because the leaves in a single clade form a vertical consecutive block in any layout, either the edges and or the edges and form a crossing. As a result, .

As the tangle crossing number of each tanglegram in is at least , the tangle crossing number of each tanglegram in with is also at least .

Let and , so . Observe that for each tanglegram , . Therefore

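$$\frac{\gamma(n)}{\binom{n}{2}} \ge \frac{\max_{T \in \mathcal{T}_n} \mathrm{crt}(T)}{\binom{n}{2}} \ge \frac{\binom{k}{2}^2}{\binom{(k+1)^2}{2}} = \frac{1}{2}\left(1 - \frac{2}{k+2}\right)\left(1 - \frac{2}{k+1}\right)^2 \ge \frac{1}{2}\left(1 - \frac{2}{\sqrt{n}+1}\right)\left(1 - \frac{2}{\sqrt{n}}\right)^2.$$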

As a result,

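$$\liminf_{n\to\infty} \frac{\gamma(n)}{\binom{n}{2}} \ge \liminf_{n\to\infty} \frac{1}{2}\left(1 - \frac{2}{\sqrt{n}+1}\right)\left(1 - \frac{2}{\sqrt{n}}\right)^2 = \frac{1}{2}.$$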

Theorems 5 and 6 complete the proof of Theorem 2.
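The algebra behind the displayed chain of inequalities can be verified with exact rational arithmetic. The sketch below assumes, as the displayed bound suggests, that $k$ is the largest integer with $k^2 \le n$; the function names are hypothetical and the code only evaluates the bound, it does not construct any tanglegram.

```python
from fractions import Fraction
from math import comb, isqrt


def ratio_lower_bound(n):
    """Evaluate C(k, 2)^2 / C((k+1)^2, 2) with k = floor(sqrt(n))."""
    k = isqrt(n)  # assumed choice: largest k with k*k <= n
    return Fraction(comb(k, 2) ** 2, comb((k + 1) ** 2, 2))


def closed_form(k):
    """The factored form (1/2) * (1 - 2/(k+2)) * (1 - 2/(k+1))^2."""
    return Fraction(1, 2) * (1 - Fraction(2, k + 2)) * (1 - Fraction(2, k + 1)) ** 2


if __name__ == "__main__":
    for n in (10, 100, 10_000, 1_000_000):
        print(n, float(ratio_lower_bound(n)))
```

The two expressions agree for every $k \ge 2$, and the printed values increase toward $1/2$ without reaching it, in line with Theorems 5 and 6.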

## 5. Lower bound of the tangle crossing number

Let $T$ be a tanglegram of size $n$. In this section, we present an algorithm which outputs a non-trivial lower bound for the tangle crossing number of $T$ in $O(n^4)$ time. As we will show, this lower bound is tight for some tanglegrams with quadratic tangle crossing number. The algorithm runs in two phases. First it partitions the leaves of each tree into clades. In the second phase the clades are used to compute the lower bound for $\mathrm{crt}(T)$. Now we describe the algorithm for partitioning the leaves of a given tree into clades, given a restriction on their size. Note that we use this algorithm independently for $L$ and $R$.

Algorithm 1 can be implemented in time. This follows from noting that step 1 requires a post-order traversal of and each of steps 2 and 3 require a pre-order traversal of . Let be the set of vertices from step 2. Note that if , then is not an ancestor of and vice versa. It is easy to see that a consequence of this property is that the collection from step 3 is a partition of the leaves of into clades. Algorithm 2 below computes the lower bound for the tangle crossing number.

Note that Algorithm 2 runs in . This follows since steps 1 and 2 take time, step 3 takes time, and step 4 takes time. To prove correctness, suppose and are clades in and suppose and are clades in . Because these are clades, any layout of will have either the edges cross the edges or the edges cross the edges. As a result, these 4 clades will contribute at least to the tangle crossing number. Thus, as done in step 4, summing these minimums over all pairs of clades from and pairs of clades from , we obtain a lower bound on .
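The two phases described above can be sketched as follows. This is a reconstruction from the surrounding prose, not the authors' listing: it assumes that the first phase returns the maximal clades with at most a given number of leaves, that a tree is a leaf label or a pair `(child, child)`, and that leaf labels are not tuples. All function names are hypothetical, and the leaf collection here is written for clarity rather than for the traversal-based running time discussed in the text.

```python
from itertools import combinations


def leaves(tree):
    """Leaves of a nested-pair tree, in drawing order."""
    if not isinstance(tree, tuple):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


def clade_partition(tree, max_size):
    """Maximal clades with at most max_size leaves (requires max_size >= 1)."""
    own = leaves(tree)
    if len(own) <= max_size:
        return [own]
    return clade_partition(tree[0], max_size) + clade_partition(tree[1], max_size)


def clade_lower_bound(left_tree, right_tree, matching, c_left, c_right):
    """Lower bound on the tangle crossing number from a clade partition."""
    left_clades = clade_partition(left_tree, c_left)
    right_clades = clade_partition(right_tree, c_right)
    right_index = {leaf: j for j, clade in enumerate(right_clades) for leaf in clade}
    # count[i][j] = number of matching edges between left clade i and right clade j
    count = [[0] * len(right_clades) for _ in left_clades]
    for i, clade in enumerate(left_clades):
        for leaf in clade:
            count[i][right_index[matching[leaf]]] += 1
    total = 0
    for a, b in combinations(range(len(left_clades)), 2):
        for c, d in combinations(range(len(right_clades)), 2):
            # Clades are consecutive blocks, so one of the two edge bundles must cross.
            total += min(count[a][c] * count[b][d], count[a][d] * count[b][c])
    return total
```

Each crossing pair of edges is charged to exactly one choice of two left clades and two right clades, which is why the sum does not overcount. Taking a single clade per tree (for example `c_left` equal to the number of leaves) gives the trivial bound 0, so the choice of size limits matters.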

One may notice that Algorithm 2 depends on the choice of and . When , the choice of is optimal for the tanglegrams in from Section 4 described for the proof of Theorem 6. For each tree in these tanglegrams, Algorithm 1 finds the clades with leaves that were used to build these trees. With this clade partition, for all . So the tangle crossing number is at least by Algorithm 2. It is not hard to find tanglegrams in with tangle crossing number exactly . Thus the output of Algorithm 2 for the family of tanglegrams is tight.

We ran simulations for different choices of and with random tanglegrams drawn from a uniform distribution. Figure 6 shows the average lower bounds when . For each , we picked tanglegrams of size uniformly at random. The random sampling algorithm is a SageMath [7] implementation of Algorithm 3 from [2, p. 253]. The source code for our implementation is available at [1]. Based on the simulations, it appears that yields better lower bounds.

Figure 6. The average lower bound for $\mathrm{crt}(T)$ for different choices of $C_L$ and $C_R$. The symbols s, m and l represent $4$, $\sqrt{n}$, and $n/2$ respectively. The curve labeled ml represents the average output with $C_L = \sqrt{n}$ and $C_R = n/2$.

In  it is shown that there exists such that a random tanglegram has tangle crossing number with high probability. Fitting the ll curve from Figure 6, the curve corresponding to , to a quadratic function via least squares yields . This suggests that the tangle crossing number of the random tanglegram is at least . For the same sample, a plot of the maximum lower bounds is fit by the curve . These two growth rates are to be compared with the upper bound of from Theorem 2.

Another way to view this process is to create an auxiliary bipartite multigraph with a vertex for each clade, in which the number of edges between two clades is the number of matching edges that join a leaf of one clade to a leaf of the other. We then restrict to straight-line drawings where the vertices of one partite set remain on the line $x = 0$ and the vertices of the other partite set lie on the line $x = 1$. The minimum crossing number over all such drawings of this multigraph provides a lower bound on the tangle crossing number of the tanglegram. However, Garey and Johnson [9] proved that even this problem on the auxiliary bipartite multigraph is NP-complete.

## 6. Open Questions and Further Work

Although the lower bound provided in Section 5 is tight for many small tanglegrams, we do not expect it to be close to the true value in general, since it is a polynomial-time bound for an NP-hard problem. One may notice that the lower bound depends on the choice of clades. While we made an arbitrary choice, we are interested in polynomial time algorithms that choose the clades so as to optimize the lower bound.

In Section 4, we provided a family of tanglegrams with crossing number asymptotically . While the tangle crossing number of tanglegrams in is at least , there are tanglegrams of size with larger tangle crossing number. Is it perhaps true that , at least for ? We remain interested in the maximum tangle crossing number over all tanglegrams of size .

## 7. Acknowledgements

The authors would like to extend their gratitude to the American Mathematical Society for organizing the Mathematics Research Community workshops where this work began. All authors were supported by the National Science Foundation under Grant Number DMS 1641020. Smith was also supported in part by NSF-DMS grant 1344199 and Székely was also supported by the NSF-DMS grants 1300547 and 1600811. Da Lozzo was supported by the U.S. Defense Advanced Research Projects Agency (DARPA) under agreement no. AFRL FA8750-15-2-0092. The views expressed are those of the authors and do not reflect the official policy or position of the Department of Defense or the U.S. Government.

## References

• [1] F. Barrera-Cruz and J. C.-H. Lin. Codes for generating tanglegrams of size uniformly.
• [2] S. C. Billey, M. Konvalinka, and F. A. Matsen IV. On the enumeration of tanglegrams and tangled chains. J. Combin. Theory Ser. A, 146:239–263, 2017.
• [3] K. Buchin, M. Buchin, J. Byrka, M. Nöllenburg, Y. Okamoto, R. Silveira, and A. Wolff. Drawing (complete) binary tanglegrams - hardness, approximation, fixed-parameter tractability. Algorithmica, 62(1-2):309–332, 2012.
• [4] A. Burt and R. Trivers. Genes in Conflict: The Biology of Selfish Genetic Elements. Belknap Press, Cambridge, MA, 2008.
• [5] É. Czabarka, L. A. Székely, and S. Wagner. Inducibility in binary trees and crossings in random tanglegrams. SIAM J. Disc. Math., 31(3):1732–1750, 2017.
• [6] É. Czabarka, L. A. Székely, and S. Wagner. A tanglegram Kuratowski theorem. (submitted) https://arxiv.org/abs/1708.00309, 2017+.
• [7] The Sage Developers. SageMath, the Sage Mathematics Software System (Version 8.0), 2017.
• [8] H. Fernau, M. Kaufmann, and M. Poths. Comparing trees via crossing minimization. In S. Sarukkai and S. Sen, editors, Proc. 25th Intern. Conf. Found. Softw. Techn. Theoret. Comput. Sci. (FSTTCS’05), LNCS vol. 3821, pages 457–469. Springer-Verlag, 2005.
• [9] M. R. Garey and D. S. Johnson. Crossing number is NP-complete. SIAM J. Alg. Disc. Meth., 4(3):312–316, 1983.
• [10] M. S. Hafner and S. A. Nadler. Phylogenetic trees support the coevolution of parasites and their hosts. Nature, 332:258–259, 1988.
• [11] P. Hliněný and G. Salazar. On the crossing number of almost planar graphs. In M. Kaufmann and D. Wagner, editors, Graph Drawing. GD 2006. LNCS vol. 4372, pages 162–173. Springer Berlin Heidelberg, 2007.
• [12] J. E. Hopcroft and R. E. Tarjan. Efficient planarity testing. J. Assoc. Comput. Mach., 21(4):549–568, 1974.
• [13] K. Kuratowski. Sur le problème des courbes gauches en topologie. Fund. Math., 15:271–283, 1930.
• [14] J. Spencer and G. Tóth. Crossing numbers of random graphs. Random Structures and Algorithms, 21:347–358, 2002.