1. Introduction
Although phylogenetic trees have been used as the standard model of evolution, phylogenetic networks have become popular amongst biologists as a tool to describe conflicting signals in data or uncertainty in evolutionary histories [4, 6, 9]. Therefore, when we wish to reconstruct the phylogenetic tree $T$ on a set $X$ of species from non-tree-like data, a natural idea would be to describe the data using a phylogenetic network $N$ on $X$ and then remove extra arcs to discover an embedding $\tilde{T}$ of $T$ inside $N$, where $\tilde{T}$ is called a 'support tree' of $N$ [6].
However, the above strategy only makes sense when $N$ is 'tree-based', namely, when $N$ is merely a tree with additional arcs [6], which is not always the case [12]. In [6], Francis and Steel provided a linear-time algorithm for finding a support tree of $N$ if $N$ is tree-based and reporting that none exists otherwise. Another linear-time algorithm for this decision problem was obtained by Zhang in [13].
While Francis and Steel's work was followed by many studies (e.g., [1, 3, 4, 5, 7, 11, 13]), Hayamizu's recent work [8] significantly advanced our understanding of how tree-based networks could be useful in contemporary phylogenetic analysis. In fact, Hayamizu's structure theorem has yielded a series of linear-time and linear-delay algorithms for many basic problems (e.g., counting, enumeration and optimisation) on support trees, and has thus enabled various data analyses using tree-based phylogenetic networks (see [8] for details).
In the present paper, we consider a so-called 'top-$k$ ranking problem', with the aim of further facilitating the application of tree-based phylogenetic networks. The problem is as follows: given a tree-based phylogenetic network $N$ where each arc $a$ exists in the true evolutionary lineage with probability $p(a)$, list the top $k$ support trees of $N$ in non-increasing order of their likelihood values. We note that this problem is an important generalisation of the top-1 ranking problem, which asks for a maximum likelihood support tree of $N$ and can be solved in linear time [8], since nearly optimal support trees can provide more biological insights than the maximum likelihood one alone.
At first glance, ranking the top $k$ support trees may seem more difficult than picking $k$ arbitrary support trees, the latter of which is possible with linear delay [8]. In this paper, however, we provide a linear-delay (i.e., optimal) algorithm for the top-$k$ ranking problem and thus reveal that the above two problems have the same time complexity, which is an interesting property of tree-based phylogenetic networks.
2. Preliminaries
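Throughout this paper, $X$ represents a non-empty finite set of present-day species. All graphs considered here are finite, simple, directed acyclic graphs. For a graph $G$, $V(G)$ and $A(G)$ denote the sets of vertices and arcs of $G$, respectively. A graph $H$ is called a subgraph of a graph $G$ if both $V(H) \subseteq V(G)$ and $A(H) \subseteq A(G)$ hold, in which case we write $H \subseteq G$. When $H \subseteq G$ but $H \neq G$, then $H$ is called a proper subgraph of $G$. When $H \subseteq G$ and $V(H) = V(G)$, $H$ is a spanning subgraph of $G$. Given a graph $G$ and a non-empty subset $A'$ of $A(G)$, $A'$ is said to induce the subgraph $G[A']$ of $G$, that is, the one whose arc-set is $A'$ and whose vertex-set consists of all ends of arcs in $A'$. For a graph $G$ with $A(G) \neq \emptyset$ and a partition $\{A_1, \dots, A_d\}$ of $A(G)$, the collection $\{G[A_1], \dots, G[A_d]\}$ of arc-induced subgraphs of $G$ is called a decomposition of $G$. For an arc $a = (u, v) \in A(G)$, $u$ and $v$ are called the tail and head of $a$ and are denoted by $\mathrm{tail}(a)$ and $\mathrm{head}(a)$, respectively. For a vertex $v$ of a graph $G$, the in-degree of $v$ in $G$, denoted by $\deg^-_G(v)$, is defined to be the cardinality of the set $\{a \in A(G) \mid \mathrm{head}(a) = v\}$. The out-degree of $v$ in $G$, denoted by $\deg^+_G(v)$, is defined in a similar manner. For any graph $G$, a vertex $v \in V(G)$ with $\deg^+_G(v) = 0$ is called a leaf of $G$.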
Definition 2.1.
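A rooted binary phylogenetic network is defined to be a finite simple directed acyclic graph $N$ with the following properties:

$N$ has a unique vertex $\rho$ with $\deg^-_N(\rho) = 0$ and $\deg^+_N(\rho) \in \{1, 2\}$;

$X$ is the set of leaves of $N$;

for any $v \in V(N) \setminus (X \cup \{\rho\})$, $\{\deg^-_N(v), \deg^+_N(v)\} = \{1, 2\}$ holds.
In Definition 2.1, the vertex $\rho$ is called the root of $N$, and a vertex $v \in V(N)$ with $\deg^-_N(v) = 2$ is called a reticulation vertex of $N$. When $N$ has no reticulation vertex, $N$ is called a rooted binary phylogenetic tree.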
Definition 2.2 ([6]).
If a rooted binary phylogenetic network $N$ has a spanning tree $\tilde{T}$ that can be obtained by inserting zero or more vertices into each arc of a rooted binary phylogenetic tree $T$, then $N$ is said to be tree-based and $\tilde{T}$ is called a support tree of $N$.
Theorem 2.3 ([6]).
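Let $N$ be a rooted binary phylogenetic network and let $S$ be a subset of $A(N)$. Then, the subgraph $N[S]$ of $N$ is a support tree of $N$ if and only if $S$ satisfies the following three conditions, in which case $S$ is called an 'admissible' arc-set of $N$. Moreover, there exists a one-to-one correspondence between support trees of $N$ and admissible arc-sets of $N$.

$S$ contains all $(u, v) \in A(N)$ with $\deg^-_N(v) = 1$ or $\deg^+_N(u) = 1$.

for any distinct $a_1, a_2 \in A(N)$ with $\mathrm{head}(a_1) = \mathrm{head}(a_2)$, exactly one of $\{a_1, a_2\}$ is in $S$.

for any distinct $a_1, a_2 \in A(N)$ with $\mathrm{tail}(a_1) = \mathrm{tail}(a_2)$, at least one of $\{a_1, a_2\}$ is in $S$.
In this paper, as the conditions in Theorem 2.3 still make sense for any subgraph $G$ of $N$, we also consider admissible arc-sets of subgraphs of $N$.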
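The three conditions of Theorem 2.3 can be checked mechanically. The following is a minimal sketch, assuming the conditions read as follows: every arc whose head has in-degree one or whose tail has out-degree one is selected; exactly one of the two arcs entering a reticulation vertex is selected; and at least one of the two arcs leaving a vertex of out-degree two is selected. The function name `is_admissible`, the representation of arcs as `(tail, head)` pairs and the small example network `ARCS` are illustrative assumptions and do not come from [6] or [8].

```python
from collections import defaultdict


def is_admissible(arcs, selected):
    """Check the three (assumed) conditions of Theorem 2.3.

    arcs: iterable of (tail, head) pairs of a binary network or subgraph.
    selected: candidate arc-set, a subset of arcs.
    """
    arcs = set(arcs)
    selected = set(selected)
    if not selected <= arcs:
        return False

    by_head = defaultdict(list)  # arcs grouped by their head
    by_tail = defaultdict(list)  # arcs grouped by their tail
    for tail, head in arcs:
        by_head[head].append((tail, head))
        by_tail[tail].append((tail, head))

    # Condition 1: arcs that are the only way into their head, or the only
    # way out of their tail, must be selected.
    for tail, head in arcs:
        forced = len(by_head[head]) == 1 or len(by_tail[tail]) == 1
        if forced and (tail, head) not in selected:
            return False

    # Condition 2: exactly one of two arcs sharing a head is selected.
    for incoming in by_head.values():
        if len(incoming) == 2 and sum(a in selected for a in incoming) != 1:
            return False

    # Condition 3: at least one of two arcs sharing a tail is selected.
    for outgoing in by_tail.values():
        if len(outgoing) == 2 and not any(a in selected for a in outgoing):
            return False
    return True


# Hypothetical network: root r, one reticulation vertex h, leaves x, y, z.
ARCS = [("r", "a"), ("a", "b"), ("a", "c"), ("b", "h"),
        ("c", "h"), ("b", "x"), ("c", "y"), ("h", "z")]
```

In this example network only the two arcs entering `h` are optional, so removing exactly one of them from `ARCS` gives an admissible arc-set, whereas keeping both or removing both does not.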
3. Known results: the structure of support trees
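Here, we summarise without proofs the relevant material in [8]. A connected subgraph $Z$ of a tree-based phylogenetic network $N$ with $m := |A(Z)| \geq 1$ is called a zigzag trail (in $N$) if there exists a permutation $(a_1, \dots, a_m)$ of $A(Z)$ such that for each $i \in [1, m-1]$, either $\mathrm{head}(a_i) = \mathrm{head}(a_{i+1})$ or $\mathrm{tail}(a_i) = \mathrm{tail}(a_{i+1})$ holds. Then, any zigzag trail $Z$ in $N$ is specified by an alternating sequence of (not necessarily distinct) vertices and distinct arcs of $N$, such as $(v_0, (v_0, v_1), v_1, (v_2, v_1), v_2, \dots, (v_m, v_{m-1}), v_m)$, which can be more concisely expressed as $v_0 > v_1 < v_2 > \dots > v_{m-1} < v_m$ or in reverse order. A zigzag trail $Z$ in $N$ is said to be maximal if $N$ contains no zigzag trail $Z'$ such that $Z$ is a proper subgraph of $Z'$. A maximal zigzag trail $Z$ with even $m$ is called a crown if $Z$ can be written in the cyclic form $v_0 < v_1 > v_2 < \dots < v_{m-1} > v_m = v_0$ and is called a fence otherwise. Furthermore, a fence $Z$ with odd $m$ is called an N-fence, in which case $Z$ can be expressed as $v_0 > v_1 < v_2 > \dots < v_{m-1} > v_m$. A fence $Z$ with even $m$ is called an M-fence if it can be written in the form $v_0 < v_1 > v_2 < \dots < v_{m-1} > v_m$, rather than $v_0 > v_1 < v_2 > \dots > v_{m-1} < v_m$.
From now on, we represent a maximal zigzag trail $Z$ by a sequence $(a_1, \dots, a_m)$ of the elements of $A(Z)$ that form the zigzag trail in this order, assuming that no confusion arises. Then, we can encode an arbitrary arc-induced subgraph of $Z$ by an $m$-dimensional vector. For example, for an N-fence $Z = (a_1, a_2, a_3, a_4, a_5)$, the subgraph of $Z$ induced by the subset $\{a_1, a_3, a_5\}$ is specified by the vector $\langle 10101 \rangle = \langle 1(01)^2 \rangle$. With this notation, we can state Hayamizu's structure theorem for tree-based phylogenetic networks, which gives an explicit characterisation of the family of all admissible arc-sets of $N$ as follows.
Theorem 3.1 ([8]).
Any tree-based phylogenetic network $N$ is uniquely decomposed into maximal zigzag trails $Z_1, \dots, Z_d$, each of which is a crown, M-fence or N-fence. Moreover, a subgraph $G$ of $N$ is a support tree of $N$ if and only if $A(G) \cap A(Z_i)$ is an admissible arc-set of $Z_i$ for any $i \in [1, d]$. Furthermore, the collection $\mathcal{T}$ of support trees of $N$ is characterised by a direct product of families $\mathcal{S}(Z_i)$ of the admissible arc-sets of $Z_i$, namely, we have $\mathcal{T} = \prod_{i=1}^{d} \mathcal{S}(Z_i)$ with
$$\mathcal{S}(Z_i) = \begin{cases} \{\langle (10)^{m_i/2} \rangle, \langle (01)^{m_i/2} \rangle\} & \text{if } Z_i \text{ is a crown;} \\ \{\langle 1(01)^{(m_i-1)/2} \rangle\} & \text{if } Z_i \text{ is an N-fence;} \\ \{\langle 1(01)^{p}(10)^{q}1 \rangle \mid p, q \in \mathbb{Z}_{\geq 0},\ p + q = (m_i - 2)/2\} & \text{if } Z_i \text{ is an M-fence,} \end{cases}$$
where $m_i := |A(Z_i)|$.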
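The families in Theorem 3.1 are small enough to enumerate directly. The sketch below assumes the encoding reported in [8]: a crown with $m$ arcs has the two alternating patterns $(10)^{m/2}$ and $(01)^{m/2}$, an N-fence has the single pattern $1(01)^{(m-1)/2}$, and an M-fence has the $m/2$ patterns $1(01)^p(10)^q1$ with $p + q = (m-2)/2$. The function names, the string encoding of vectors and the list `TRAILS` are illustrative assumptions.

```python
from math import prod


def admissible_vectors(kind, m):
    """Return the 0/1 strings encoding the admissible arc-sets of a
    maximal zigzag trail of the given kind with m arcs."""
    if kind == "crown":
        if m % 2:
            raise ValueError("a crown has an even number of arcs")
        return ["10" * (m // 2), "01" * (m // 2)]
    if kind == "N-fence":
        if m % 2 == 0:
            raise ValueError("an N-fence has an odd number of arcs")
        return ["1" + "01" * ((m - 1) // 2)]
    if kind == "M-fence":
        if m % 2 or m < 2:
            raise ValueError("an M-fence has an even number of arcs")
        half = (m - 2) // 2
        return ["1" + "01" * p + "10" * (half - p) + "1"
                for p in range(half + 1)]
    raise ValueError(f"unknown kind: {kind}")


def count_support_trees(trails):
    """Number of support trees, as the size of the direct product."""
    return prod(len(admissible_vectors(kind, m)) for kind, m in trails)


# Hypothetical decomposition into three maximal zigzag trails.
TRAILS = [("crown", 4), ("M-fence", 6), ("N-fence", 3)]
```

Because the collection of support trees is a direct product, the count for `TRAILS` is simply the product of the three family sizes (2, 3 and 1).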
4. Top support tree ranking problem
Given a tree-based phylogenetic network $N$ where each arc $a \in A(N)$ is chosen with probability $p(a)$, we can assign a ranking number to each support tree $T$ of $N$ by the likelihood value $f(T) := \prod_{a \in A(T)} p(a)$. In principle, the top-$k$ support tree ranking problem for $N$ asks for an ordered set $(T_1, \dots, T_k)$ of $k$ support trees of $N$ such that $f(T_1) \geq \dots \geq f(T_k) \geq f(T)$ holds for any support tree $T$ of $N$ other than $T_i$ ($i \in [1, k]$). However, such a ranking is not unique in general, since there can be 'ties' in the collection of support trees of $N$ as well as in the family of admissible arc-sets of each maximal zigzag trail in $N$. For convenience, we ensure the uniqueness of the ranking by using the lexicographical order on vectors as follows.
Assume that is a treebased phylogenetic network with as in Theorem 3.1 and that is any maximal zigzag trail in . We define the local ranking for to be a totally ordered set such that for any , holds if either or holds. Note that the elements of are dimensional vectors and any two of them are comparable lexicographically. From now, we identify the th element of with its local ranking number in order to write . Then, the elements of are vectors having the same dimension again and so we can break ties by using as before. Abusing the notation slightly, we call the totally ordered set the support tree ranking (for ). For any with , the top support tree ranking (for ) is defined to be a unique subsequence of the first elements of . Note that for any , one can determine in time whether or not holds [8].
Problem 4.1.
Top-$k$ support tree ranking problem
Input:
A tree-based phylogenetic network $N$ with associated probability $p$ and a positive integer $k$ not exceeding the number of support trees of $N$.
Output:
The top-$k$ support tree ranking for $N$.
5. Results
As a preliminary step, we prove the following proposition about the local ranking.
Proposition 5.1.
For any maximal zigzag trail in a treebased phylogenetic network with associated probability , the first element in the local ranking can be found in time. Moreover, given the th element in , one can find the th element in time.
Proof.
One can check in time whether is a crown, Nfence or Mfence. In the case when is a crown or Nfence, the local ranking for is trivial to compute as holds by Theorem 3.1. Assume that is an Mfence with . Also, let with for each and let for each . Then, holds for each . As one can obtain both and in time, computing the likelihood values for all requires time. This completes the proof. ∎
We define and for each . Recalling , we see that is a linear extension of the partially ordered set (i.e., implies ), where is the usual componentwise order on vectors (e.g., if and only if and ). We also note that this requires each to be an order ideal of (i.e., for any , implies ). These arguments lead to the following proposition.
Proposition 5.2.
Let be the top support tree ranking for a treebased phylogenetic network with associated probability and let be as defined above. Then, holds, and for each , there exists with .
Let be the unit vector such that th component is one and the others are all zeros. Also, for each , let be the first index such that the th component of is strictly greater than one and let . For example, gives . Then, we have the next lemma, which is illustrated in Figure 1.
Lemma 5.3.
Let be the support tree ranking for a treebased phylogenetic network with associated probability and let be a graph with and . Then, is a spanning tree of the Hasse diagram of such that is the root of and implies .
Proof.
It is clear that implies (and hence ). By construction, is a tree rooted at because holds and for each , there exists a unique element with . This completes the proof. ∎
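The tree structure in Lemma 5.3 can be illustrated on a small grid of rank vectors. The sketch below assumes that the parent of a vector is obtained by subtracting the unit vector at the first component that is strictly greater than one, and that the all-ones vector is the root. The names `parent`, `children`, `SIZES` and `GRID` are illustrative and not taken from the source.

```python
from itertools import product


def parent(x):
    """Decrease the first component of x that exceeds one.

    Returns None for the all-ones vector, which plays the role of the root.
    """
    for i, value in enumerate(x):
        if value > 1:
            return x[:i] + (value - 1,) + x[i + 1:]
    return None


def children(x, sizes):
    """Return the vectors within the grid whose parent is x."""
    result = []
    for i, value in enumerate(x):
        if value < sizes[i]:
            result.append(x[:i] + (value + 1,) + x[i + 1:])
        if value > 1:
            # Raising a later component would leave this one as the first
            # component exceeding one, so the parent would no longer be x.
            break
    return result


# Hypothetical example: three trails with 2, 3 and 2 locally ranked elements.
SIZES = (2, 3, 2)
GRID = list(product(*(range(1, s + 1) for s in SIZES)))
```

Every vector other than the root has exactly one parent, so following `children` from the root visits each vector of `GRID` once, which is what makes the structure a spanning tree rather than a general subgraph of the Hasse diagram.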
In what follows, for any , we write to mean the least element of . For any , let and . Also, for any , let , where represents a unique element with . We note that both and are possible to occur.
Lemma 5.4.
Let be the graph as in Lemma 5.3 and let be a subset of that is recursively defined by
(1) 
Then, for each , we have and for all .
Proof.
Let and for . We will show that holds for any , which completes the proof, since and for all .
For , we have
where we assume that . Note that contains . This implies
For any , we have because holds. We thus obtain
From Equation (1) and , the desired conclusion follows. ∎
We are in a position to give an algorithm for Problem 4.1. As illustrated in Table 1, the algorithm starts by setting and and then returns for each , where is iteratively updated using Equation (1).
In order to analyse the running time of the above algorithm, let us review some basics of a priority queue, which is a data structure for maintaining objects that are prioritised by their associated values. In its most basic form, a priority queue supports the operations called Insert and Delete-min, where the former adds a new object and the latter finds and deletes the object with the highest priority [2]. Implemented with a binary heap, each of these operations can be performed in $O(\log n)$ time, where $n$ denotes the number of elements in the priority queue [2].
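As a concrete illustration of the two operations, the following sketch uses Python's standard `heapq` module, which implements a binary min-heap on a list. The likelihood values and the labels `T1`, `T2`, `T3` are made up for the example; likelihoods are negated so that the largest one is treated as the highest priority.

```python
import heapq

queue = []
for likelihood, label in [(0.12, "T3"), (0.30, "T1"), (0.18, "T2")]:
    # Insert: O(log n) on a binary heap. The key is negated because
    # heapq always pops the smallest key first.
    heapq.heappush(queue, (-likelihood, label))

order = []
while queue:
    # Delete-min: removes the entry with the smallest key, that is,
    # the entry with the largest likelihood.
    negated, label = heapq.heappop(queue)
    order.append(label)
```

After the loop, `order` lists the labels from the largest likelihood to the smallest.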
Theorem 5.5.
The top-$k$ support tree ranking problem (Problem 4.1) can be solved with linear delay, and hence in $O(k|A(N)|)$ time.
Proof.
As Equation (1) implies that holds for any , holds for any . Then, if we keep the elements of each in a priority queue, time suffices to return and to delete from . Also, once and have been obtained, inserting the two elements requires time. We note that follows from . By Proposition 5.1, for each , one can compute in time, which equals time as is a decomposition of . Hence, our algorithm can return one after the other in such a way that the delay between two consecutive outputs is time. This completes the proof. ∎
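The overall idea of ranking over a direct product with a priority queue can be sketched as follows. This is a generic best-first search over rank vectors, given under stated assumptions, and is not a reproduction of Equation (1) or of the bookkeeping that yields the delay bound in the proof: here each step may insert more than two candidates and recomputes each product from scratch. It assumes that the local likelihoods of each trail are positive and already sorted in non-increasing order, and that the likelihood of a support tree is the product of its local likelihoods. The names `top_k_product`, `LOCAL` and `RANKING` are illustrative.

```python
import heapq
from math import prod


def top_k_product(local_values, k):
    """Return up to k (likelihood, rank_vector) pairs in non-increasing
    order of likelihood.

    local_values[i] lists the positive local likelihoods of the i-th
    trail in non-increasing order; rank vectors are 1-based.
    """
    sizes = [len(values) for values in local_values]

    def value(x):
        return prod(values[r - 1] for values, r in zip(local_values, x))

    root = (1,) * len(local_values)
    heap = [(-value(root), root)]  # negated keys turn heapq into a max-heap
    ranking = []
    while heap and len(ranking) < k:
        negated, x = heapq.heappop(heap)
        ranking.append((-negated, x))
        # Push the vectors whose unique parent is x, so that every vector
        # enters the heap exactly once and only after its parent is output.
        for i, r in enumerate(x):
            if r < sizes[i]:
                child = x[:i] + (r + 1,) + x[i + 1:]
                heapq.heappush(heap, (-value(child), child))
            if r > 1:
                break
    return ranking


# Hypothetical local rankings for three maximal zigzag trails.
LOCAL = [[0.5, 0.4], [0.9, 0.3, 0.2], [0.7]]
RANKING = top_k_product(LOCAL, 4)
```

The search is correct because a child never has a larger likelihood than its parent, so the best remaining vector is always among the candidates in the heap. Ties are broken by comparing rank vectors as tuples, which is one possible convention and may differ from the one used in the paper.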
Finally, we make two remarks. First, time is required to output distinct support trees of as each support tree has size . Therefore, the running time of our algorithm (as well as that of the enumeration algorithm in [8]) is , which guarantees the optimality of those algorithms. Second, as commonly in the literature (e.g., [10]), it would be natural to wonder about the time complexity of an analogue of Problem 4.1 that only asks for outputting a sequence of the differences between and ; however, we note that this problem still requires time because the size of each difference is . To illustrate this, consider a treebased phylogenetic network that is decomposed into maximal fences, each of which has only one admissible arcset, and crowns, each of which has size . The difference between any two support trees has size , which equals if is a constant.
References
 [1] M. Anaya, O. Anipchenko-Ulaj, A. Ashfaq, J. Chiu, M. Kaiser, M. S. Ohsawa, M. Owen, E. Pavlechko, K. St. John, S. Suleria, K. Thompson, and C. Yap, On determining if tree-based networks contain fixed trees, Bulletin of Mathematical Biology 78 (2016), no. 5, 961–969.
 [2] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to algorithms, MIT Press, 2009.
 [3] M. Fischer, M. Galla, L. Herbst, Y. Long, and K. Wicke, Non-binary tree-based unrooted phylogenetic networks and their relations to binary and rooted ones, arXiv:1810.06853 [q-bio.PE] (2018).
 [4] A. Francis, K. T. Huber, and V. Moulton, Tree-based unrooted phylogenetic networks, Bulletin of Mathematical Biology 80 (2018), no. 2, 404–416.
 [5] A. Francis, C. Semple, and M. Steel, New characterisations of tree-based networks and proximity measures, Advances in Applied Mathematics 93 (2018), 93–107.
 [6] A. R. Francis and M. Steel, Which phylogenetic networks are merely trees with additional arcs?, Systematic Biology 64 (2015), no. 5, 768–777.
 [7] M. Hayamizu, On the existence of infinitely many universal tree-based networks, Journal of Theoretical Biology 396 (2016), 204–206.
 [8] M. Hayamizu, A structure theorem for tree-based phylogenetic networks, arXiv:1811.05849 [math.CO] (2018).
 [9] D. H. Huson, R. Rupp, and C. Scornavacca, Phylogenetic networks: concepts, algorithms and applications, Cambridge University Press, 2010.
 [10] S. Kapoor and H. Ramesh, Algorithms for enumerating all spanning trees of undirected and weighted graphs, SIAM Journal on Computing 24 (1995), no. 2, 247–265.
 [11] J. C. Pons, C. Semple, and M. Steel, Tree-based networks: characterisations, metrics, and support trees, Journal of Mathematical Biology 78 (2019), no. 4, 899–918.
 [12] L. van Iersel, Different topological restrictions of rooted phylogenetic networks. Which make biological sense?, http://phylonetworks.blogspot.nl/2013/03/differenttopologicalrestrictionsof.html, 2013, Accessed: 2019-03-16.
 [13] L. Zhang, On tree-based phylogenetic networks, Journal of Computational Biology 23 (2016), no. 7, 553–565.
Acknowledgement
The first author acknowledges support from JST PRESTO Grant Number JPMJPR16EB.