1 Introduction
Phylogenetic networks are a generalization of phylogenetic trees that are commonly used to represent the evolutionary histories of species that cross with one another or exchange genetic material, such as plants and viruses. There are several classes of phylogenetic networks, and various methods have been devised to build them; see e.g. Elworth2019 ; steel2016phylogeny for recent surveys. Mathematically speaking, a phylogenetic network on a set $X$ of species is a directed acyclic graph with a single source, or root, such that every sink, or leaf, has indegree 1 and the set of leaves is equal to $X$. In this paper, we shall only consider recoverable, binary networks (or networks for short), that is, phylogenetic networks that satisfy a certain condition on the ancestors of $X$ and in which the root has outdegree 2 and all other non-leaf vertices have degree 3 (see Figure 1 for some examples). Precise definitions are given in Section 2.
Recently, there has been growing interest in the problem of building a network with leafset $X$ from a collection of networks, each with leafset equal to some subset of $X$, in such a way that each input network is contained in the final network. Early work on this so-called supernetwork problem focused on building networks from phylogenetic trees, that is, phylogenetic networks whose underlying graph is a tree. Several results have been presented for this problem, including algorithms for constructing networks from triplets, which are 3-leaved phylogenetic trees (e.g. huber2010practical ), and from collections of phylogenetic trees all on leafset $X$ (e.g. willson2010regular ); for a recent summary of these approaches see semple2021trinets . However, an important issue with this strategy is that phylogenetic trees do not necessarily encode phylogenetic networks, i.e., there are examples of distinct (non-isomorphic) networks that contain the same set of phylogenetic trees (see e.g. gambette2012encodings ), making it impossible to uniquely reconstruct such networks from their trees.
Motivated by this issue, in huber2013encoding it was proposed to build networks from collections of 3-leaved networks, or trinets. In that paper, the authors focused on building level-1 networks (in fact, they considered the somewhat more general class of 1-nested networks) where, in general, level-$k$ networks are networks that can be converted into a tree by deleting at most $k$ arcs from each biconnected component. In particular, they showed that level-1 networks are encoded by the trinets that they contain, and gave an algorithm for constructing a level-1 network on $X$ from its trinets that is polynomial in $|X|$ (see also trilonet for a more general algorithm). In level2trinets the encoding result was extended to the more general class of level-2 networks, and also to the distinct and quite broad class of so-called tree-child networks. Recently, in semple2021trinets it was also shown that orchard networks, which generalise tree-child networks, are encoded by their trinets, and an algorithm was given for constructing an orchard network from its trinets that is polynomial in the size of the vertex set of the network (which is not necessarily polynomial in $|X|$).
Intriguingly, in huber2015much it was shown that, as with trees, trinets do not encode networks in general. Indeed, in (semple2021trinets, p. 28) it was shown that even level-4 networks are not encoded by their trinets and, since level-2 networks are encoded by their trinets (see above), it was asked whether or not level-3 networks are encoded by their trinets (see also dagstuhl ). In the first result of this paper we answer this question: the two networks in Figure 1 are level-3 and are easily seen to be distinct and to contain the same set of trinets (see leonie ). Hence, level-$k$ networks are encoded by their trinets only if $k \leq 2$. As the algorithm in huber2013encoding can be used to uniquely reconstruct a level-1 network from its trinets, this leaves open the question of finding a polynomial algorithm for building a level-2 network from its trinets, which is the purpose of the rest of this paper. In particular, we shall present a polynomial-time algorithm (Algorithm 1) which constructs a level-2 network on $X$ from any set of trinets whose leafset union is $X$ and that is guaranteed to reconstruct a level-2 network from its set of trinets (Theorem 3). We now proceed by presenting some preliminaries, after which we shall describe our level-2 algorithm. We conclude with a brief discussion of our results.
2 Preliminaries
We refer the reader to (steel2016phylogeny, Chapter 10) for more information on the terminology and basic results on phylogenetic networks that we summarise in this section.
Definition 1.
Let $X$ be some finite set (corresponding to a set of species, say). A binary phylogenetic network (on $X$) is a directed acyclic graph with the following types of vertices: a single root with indegree 0 and outdegree 2; tree-vertices with indegree 1 and outdegree 2; reticulations with indegree 2 and outdegree 1; and leaves with indegree 1 and outdegree 0, where the leaves are in one-to-one correspondence with the elements of $X$.
Let $N$ be a binary phylogenetic network on $X$, and suppose that $u, v$ are two vertices in the vertex set of $N$. If there is a directed path from $u$ to $v$ (including the case that $u = v$), then we say that $u$ is an ancestor of $v$ and that $v$ is a descendant of $u$. When $(u, v)$ is an arc, we say that $u$ is a parent of $v$ and that $v$ is a child of $u$. We say that $(u, v)$ is a cut-arc if deleting $(u, v)$ disconnects $N$. A set is called a cut-arc set in if or is the set of descendant leaves of for some cut-arc . A cut-arc set is minimal if and there is no cut-arc set with and . A network is simple if it has no minimal cut-arc set.
Now, suppose $A \subseteq X$. A lowest stable ancestor (LSA) of $A$ in a network $N$ is a vertex $v$ such that, for all $a \in A$, all paths from the root to $a$ contain $v$, and such that there is no descendant $w$ of $v$ with $w \neq v$ that satisfies this property. It is not difficult to see that the lowest stable ancestor is always unique for any $A$ (steel2016phylogeny, p. 263). We say that $N$ is recoverable if the lowest stable ancestor of $X$ is the root of $N$. In this paper, for simplicity, we shall call a recoverable, binary phylogenetic network on $X$ a network. Only in the statements of theorems will we mention these restrictions explicitly.
A biconnected component of a network is a maximal subgraph not containing any cut-arcs. A network is level-$k$ if each biconnected component contains at most $k$ reticulations. A level-$k$ network is strictly level-$k$ if it is not level-$k'$ for any $k' < k$. This paper will mainly focus on level-2 networks; see Figure 3 for an example.
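The level of a small network can be checked mechanically from this definition. The following is a minimal sketch, not part of the algorithm of this paper: it assumes the network is given as a list of `(tail, head)` arcs and is connected, finds cut-arcs by brute force (delete an arc, test connectivity of the underlying undirected graph), and then counts indegree-2 vertices in each component that remains after all cut-arcs are removed. The function name `network_level` and the input format are illustrative assumptions.

```python
from collections import defaultdict


def network_level(arcs):
    """Largest number of reticulations in any cut-arc-free component.

    `arcs` is a list of (tail, head) pairs; parallel arcs may be repeated.
    The network is assumed to be connected.
    """
    vertices = {v for arc in arcs for v in arc}

    def connected_without(skip):
        # Undirected connectivity test that ignores the arc with index `skip`.
        adj = defaultdict(list)
        for i, (u, v) in enumerate(arcs):
            if i != skip:
                adj[u].append(v)
                adj[v].append(u)
        start = next(iter(vertices))
        seen, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return len(seen) == len(vertices)

    # An arc is a cut-arc when deleting it disconnects the network.
    non_cut = [arc for i, arc in enumerate(arcs) if connected_without(i)]

    # Union-find over the non-cut arcs groups vertices into components.
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v in non_cut:
        parent[find(u)] = find(v)

    indegree = defaultdict(int)
    for _, head in arcs:
        indegree[head] += 1

    # Count reticulations (indegree-2 vertices) per component.
    counts = defaultdict(int)
    for v in vertices:
        if indegree[v] == 2:
            counts[find(v)] += 1
    return max(counts.values(), default=0)
```

The brute-force cut-arc test is quadratic in the number of arcs, which is adequate for trinets and other small examples; a linear-time bridge-finding routine could be substituted for larger inputs.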
A network on is a trinet if and a binet if . If is a trinet or binet on then we also use to denote the set . Furthermore, for a set of trinets and/or binets , we define . We will now define the restriction of a network to a subset of , which will be used to define the set of trinets contained in a network.
Definition 2.
Let be a network on and . The restriction of to , denoted , is the network on obtained from by deleting all vertices that are not on a path from to an element of and subsequently replacing parallel arcs by single arcs and suppressing indegree-1 outdegree-1 vertices, until neither of these operations is applicable.
The set of trinets of a network on is defined as . The set of binets and trinets of a network on is defined as . Observe that can be obtained from .
We say that two networks on are equal and write if there is an isomorphism such that, for all , has the same label as .
The following theorem forms the basis for our new level-2 algorithm.
Theorem 1 (level2trinets ).
Let be a recoverable, binary level2 network on with . Then there exists no recoverable network with .
2.1 Generators
Our algorithm will make heavy use of the underlying structure of biconnected components, which is called a “generator” and defined as follows.
Definition 3.
Let $N$ be a simple network. The underlying generator of $N$ is the directed multigraph $G$ obtained from $N$ by deleting all leaves and suppressing all indegree-1 outdegree-1 vertices. The arcs and indegree-2 outdegree-0 vertices of $G$ are called sides. The arcs are also called arc sides and the indegree-2 outdegree-0 vertices are also called reticulation sides. We say that leaf $x$ is on side $s$ (or that side $s$ contains $x$) if either
(i) $s$ is a reticulation side of $G$ and the parent of $x$ in $N$, or
(ii) $s$ is an arc side of $G$ obtained by suppressing indegree-1 outdegree-1 vertices of a path $P$ in $N$ and the parent of $x$ lies on path $P$.
See Figure 2 for all underlying generators of simple level-1 and level-2 networks.
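To attach leaf $x$ to a reticulation side $s$ means adding $x$ with an arc from $s$ to $x$. To attach a list $x_1, \ldots, x_k$ of leaves to an arc side $(u, v)$ means subdividing $(u, v)$ to a path $u, w_1, \ldots, w_k, v$ with internal vertices $w_1, \ldots, w_k$ and adding leaves $x_1, \ldots, x_k$ with arcs $(w_1, x_1), \ldots, (w_k, x_k)$.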
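The two attachment operations can be sketched as follows, assuming a generator is stored as a list of `(tail, head)` arcs in which parallel arcs appear twice. The function names, the arc-list representation and the naming scheme for the new internal vertices are all illustrative assumptions rather than anything prescribed by the paper.

```python
def attach_to_reticulation_side(arcs, side, leaf):
    """Return a new arc list with `leaf` hung below reticulation side `side`."""
    return arcs + [(side, leaf)]


def attach_to_arc_side(arcs, side, leaves, prefix="w"):
    """Subdivide arc `side` = (u, v) and hang `leaves` off it, top to bottom.

    Only one copy of `side` is subdivided, so a parallel arc is left intact.
    Use a different `prefix` when subdividing the second arc of a parallel
    pair, so that the new internal vertex names do not collide.
    """
    u, v = side
    new_arcs = list(arcs)
    new_arcs.remove(side)  # removes a single occurrence
    internal = [f"{prefix}_{u}_{v}_{i}" for i in range(1, len(leaves) + 1)]
    path = [u] + internal + [v]
    new_arcs += list(zip(path, path[1:]))   # the subdivided path
    new_arcs += list(zip(internal, leaves))  # one pendant leaf per new vertex
    return new_arcs
```

For example, starting from the generator with two parallel arcs from a root `r` to a reticulation side `h`, attaching `z` to `h` and the list `["x", "y"]` to one of the arcs yields a simple level-1 network on three leaves.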
A trinet is called a crucial trinet of a simple network $N$ if it contains a leaf on each reticulation side of the underlying generator of $N$ and, for each pair of parallel arcs in this generator, a leaf on at least one of these two sides. Crucial trinets are of special interest because they have the same underlying generator as the network $N$.
Two reticulation sides of a generator are symmetric if there exists an automorphism of with . The equivalence classes under this notion of symmetry are called sets of symmetric reticulation sides.
Two arc sides , of a generator are symmetric if there exists an automorphism of with for each reticulation side and such that and . The equivalence classes under this notion of symmetry are called sets of symmetric arc sides. For an example, see Figure 2. The idea behind this definition is that the reticulation sides of are parents of leaves in . In our algorithm, we will make heavy use of crucial trinets, which contain those leaves. Since they are labelled, we can distinguish them.
3 Algorithm
3.1 Outline
We work with multisets of trinets and binets because these may arise when collapsing or restricting trinet sets. Hence, let be a multiset of binets and trinets. The high-level idea of the algorithm is to first find a minimal cut-arc set . Then we construct by collapsing to a single leaf and find a network for recursively. The next step is to construct from by restricting to the taxa in and to find a simple network for . Finally, we construct from and by replacing by . The pseudocode is in Algorithm 1.
Within our explanation of the algorithm we will also explain why in case the underlying set of is for some recoverable level2 network , the algorithm correctly reconstructs .
3.2 Finding a minimal cut-arc set
We first find minimal cut-arc sets using the digraphs which were introduced in trilonet for level-1 networks and are defined as follows. See Figure 3 for an example.
Definition 4.
Given a multiset of binets and trinets and , is the digraph with vertex set and an arc if at most trinets with have a minimal cutarc set not containing .
A sink set in a digraph is a set such that there is no arc with and . A sink set is minimal if and there is no sink set with and . A strongly connected component of a digraph is a maximal subgraph containing, for any , a directed path from to and from to .
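Sink sets that contain no smaller non-empty sink set are exactly the strongly connected components with no outgoing arc, so they can be read off the condensed digraph. The sketch below illustrates this with a two-pass (Kosaraju-style) computation of strongly connected components; it is an illustration under assumptions, not the paper's Algorithm 2. The size condition that forms part of the definition of minimality above is not applied here, so singleton components are also returned and can be filtered out by the caller as needed. Function names and the `(vertices, arcs)` input format are hypothetical.

```python
def strongly_connected_components(vertices, arcs):
    """Map each vertex to a representative of its strongly connected component."""
    succ = {v: [] for v in vertices}
    pred = {v: [] for v in vertices}
    for u, v in arcs:
        succ[u].append(v)
        pred[v].append(u)

    # First pass: record vertices in order of depth-first completion.
    order, seen = [], set()
    for root in vertices:
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(succ[root]))]
        while stack:
            v, it = stack[-1]
            for w in it:
                if w not in seen:
                    seen.add(w)
                    stack.append((w, iter(succ[w])))
                    break
            else:
                order.append(v)
                stack.pop()

    # Second pass: sweep the reversed digraph in reverse completion order.
    comp = {}
    for root in reversed(order):
        if root in comp:
            continue
        comp[root] = root
        stack = [root]
        while stack:
            v = stack.pop()
            for w in pred[v]:
                if w not in comp:
                    comp[w] = root
                    stack.append(w)
    return comp


def inclusion_minimal_sink_sets(vertices, arcs):
    """Return the strongly connected components that have no outgoing arc."""
    comp = strongly_connected_components(vertices, arcs)
    leaving = {comp[u] for u, v in arcs if comp[u] != comp[v]}
    groups = {}
    for v in vertices:
        groups.setdefault(comp[v], set()).add(v)
    return [members for rep, members in groups.items() if rep not in leaving]
```

Tarjan's algorithm, which the running-time analysis later refers to, computes the same components in a single pass; the two-pass variant is used here only because it is shorter to state.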
If is a level-1 network, minimal sink sets in correspond to minimal cut-arc sets in trilonet . To extend this result to level-2 networks, we will use the following theorem, which is a special case of (huber2019hierarchies, Theorem 7.3). It uses the closure digraph of a set of trinets, which was introduced in trilonet and is defined as follows. Its vertex set is and it has an arc if, for all , there exists a trinet on in in which is a descendant of .
Theorem 2.
huber2019hierarchies Let be a binary level-2 network on and . Then is a minimal cut-arc set of if and only if is a minimal sink set of the closure digraph .
The next lemma shows that the closure digraph is equal to if is the set of trinets of some network.
Lemma 1.
If for some network on , then .
Proof.
First let be an arc of . Assume that is not an arc of . Then there exists a such that is not a descendant of in the trinet on . We now claim that the arc entering is a cutarc of . If it is not, then there is some arc of with such that is not a descendant of and is a descendant of . This arc must lie on a path from the root to at least one of . However, it cannot be on a path from the root to or because each such path passes through . Also, it cannot be on a path from the root to because such a path does not contain any descendants of . Hence, we can conclude that is a cutarc set, which contradicts the assumption that is an arc of .
Now let be an arc of and let . Then is a descendant of in the trinet on in . Hence, is not a cutarc set. Since a minimal cutarc set contains at least two leaves, it follows that has no minimal cutarc set not containing . It now follows that is an arc of . ∎
Since we consider trinet sets that are not necessarily exactly the trinet set of some network, we cannot always simply use the digraph . In particular, it may happen that has no arcs. We therefore use the strategy described in Algorithm 2.
3.3 Constructing a simple network
Once we have found a minimal cutarc set , we need to construct the part of the network below this cutarc. To do this, we restrict to and find a simple network for .
If the underlying set of is with a level2 network and is a minimal cutarc set of , then the underlying set of is with either a tree with two leaves or a simple network.
3.3.1 The number of reticulations
Let be the fraction of the trinets in that are strictly level and let . If , we construct a network equal to a binet with maximum multiplicity in . Otherwise, if , we set the number of reticulations to 1, else we set to .
Suppose has underlying set with either a tree with two leaves or a level2 network that is simple (note that it may also be level1). If has two leaves then all binets in are equal to and the algorithm correctly constructs . Now assume . If is a simple level1 network, then , so the algorithm correctly sets the number of reticulations to . Finally, suppose is a simple strictly level2 network. Then . Moreover, at least of the trinets in are strictly level2, since any crucial trinet is strictly level2. Hence, we have and the algorithm correctly sets the number of reticulations to .
3.3.2 Leaves on reticulation sides
Let be the number of reticulations determined in the previous subsection. Let be a generator that is the underlying generator of the maximum number of strictly level trinets in . Let be the set of trinets in that have underlying generator .
For each and for each set of symmetric reticulation sides of , let denote the fraction of trinets in that have leaf on a side in . We proceed greedily as follows. Pick maximizing over all leaves that have not been assigned to a side yet and over all containing at least one side that has not been assigned a leaf yet. Assign to an arbitrary side in . Repeat until all reticulation sides have been assigned a leaf. Attach each leaf assigned to a reticulation side to this side.
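The greedy step above can be sketched as follows. The sketch assumes that the fractions have already been computed and are supplied as a dictionary keyed by `(leaf, class_index)`, and that each set of symmetric reticulation sides is given as a list of side names; both the data layout and the function name are hypothetical. Ties are broken by leaf name and class index purely to make the result deterministic, and "an arbitrary side" is taken to be the first free one.

```python
def assign_reticulation_leaves(fraction, classes):
    """Greedily assign leaves to reticulation sides.

    fraction: dict mapping (leaf, class_index) to the fraction of trinets
              that have that leaf on a side in that class.
    classes:  list of lists; classes[i] holds the side names of the i-th
              set of symmetric reticulation sides.
    Returns a dict mapping each side name to its assigned leaf.
    """
    free = [list(sides) for sides in classes]
    assignment = {}
    while any(free):
        used = set(assignment.values())
        candidates = [
            (f, leaf, i)
            for (leaf, i), f in fraction.items()
            if leaf not in used and free[i]
        ]
        if not candidates:
            break  # fewer usable leaves than sides: stop early
        # Highest fraction first; ties broken by leaf name, then class index.
        _, leaf, i = min(candidates, key=lambda c: (-c[0], c[1], c[2]))
        side = free[i].pop(0)  # "arbitrary" side: take the first free one
        assignment[side] = leaf
    return assignment
```

For instance, with a single class `["r1", "r2"]` and fractions 0.9, 0.8 and 0.1 for leaves `x`, `y` and `z`, the sketch assigns `x` to `r1` and `y` to `r2`.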
Let be the set of trinets in that have underlying generator and that have an automorphism such that each reticulation side of contains its assigned leaf. From now on, we assume that each reticulation side of the generator of each trinet in contains its assigned leaf.
Suppose the underlying set of is for some simple level2, strictly level, network . Then all strictly level trinets have the same underlying generator as . Moreover, for each set of symmetric reticulation sides, for all leaves that are on a side in in and otherwise. Hence, the algorithm correctly assigns leaves to sets of symmetric reticulation sides. It can assign leaves to an arbitrary side within this set since level2 generators have at most one set of symmetric reticulation sides (see Figure 2), and those are symmetric.
3.3.3 Leaves per set of symmetric arc sides
For each leaf that has not been assigned to a reticulation side, assign to a set of symmetric arc sides of , maximizing the fraction of trinets in that have leaf on a side in .
Suppose the underlying set of is for some simple level2 network . Then it can be argued as in the previous subsection that the algorithm assigns each leaf to the set of symmetric arc sides corresponding to its location in .
3.3.4 Leaves per arc side
Consider a set of symmetric arc sides and the set of leaves assigned to . For , let denote the set of simple trinets in containing both and , and let denote the fraction of trinets in in which and are on the same side of the underlying generator, with . We define the following score for :
The main idea of this score function is that, assuming the trinets come from some level2 network, if and only if and are on the same side.
The algorithm proceeds as follows. Create a partition of , initially consisting of only singletons. While or there exist with , pick a pair maximizing
(1) 
Merge sets and in .
Finally, assign, injectively at random, the parts of to the sides in .
Suppose the underlying set of is for some simple level2 network . The only level2 generators with symmetric arc sides (see Figure 2) are and with and with , . If are on the same side then and otherwise we have . We can now see that if are on the same side then is equal to the number of leaves on that side (since each of the three sums is equal to the number of leaves on that side) which is at least . If, on the other hand, are on different sides, then (since the first sum is and the other two sums are at least ). Hence, the algorithm correctly splits the leaves in into two sets corresponding to the leaves on side and (or and ). For generators and it does not matter which set is assigned to which side, by symmetry. For generator , this does matter. It is done randomly here and corrected if necessary in the next subsection.
3.3.5 Side alignment
The following is only necessary when the underlying generator is generator , see Figure 2, since it contains more than one set of symmetric arc sides. Call its sets of symmetric arc sides , and . We have to consider swapping sides and/or (i.e., assign the leaves assigned to to and vice versa and/or assign the leaves assigned to to and vice versa). From the four possibilities, we choose the one maximizing the following score:
(2) 
with
(3) 
Suppose the underlying set of is for some simple level2 network with underlying generator . Then we have that if or and if or vice versa. Hence, and . Therefore, choosing the assignment maximizing (2), out of all possible assignments, chooses the assignment corresponding to .
3.3.6 Ordering the leaves on the arc sides
Consider a side and the set of leaves assigned to side . Let denote the set of simple trinets in containing both and and both on the same side. Let denote the fraction of trinets in in which the parent of is an ancestor of . Let be an ordered list of leaves, which is initially empty. Find a leaf maximizing
(4) 
Append leaf to and continue until is a permutation of . The permutation then describes the ordering of the leaves on side . Attach the list of leaves to side .
Suppose the underlying set of is for some simple level2 network . For two leaves on the same arc side of , we have that if the parent of is an ancestor of and otherwise. Hence, (4) is equal to the number of leaves that have not been added to the permutation yet and are below on side , minus the number of leaves that have not been added to the permutation yet and are above on side . Therefore, the algorithm constructs the ordering of leaves on side in .
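The ordering step can be illustrated with a short sketch. Expression (4) is not legible in this copy, so the score used below is an assumption chosen to match the description in the preceding paragraph: for a candidate leaf `x`, sum over the leaves `y` not yet placed the fraction of trinets in which the parent of `x` is an ancestor of `y`, minus the fraction in which the parent of `y` is an ancestor of `x`. The function name and the dictionary `above` are hypothetical.

```python
def order_leaves_on_side(leaves, above):
    """Order the leaves of one arc side from top to bottom.

    above[(x, y)] is the fraction of trinets (containing x and y on the
    same side) in which the parent of x is an ancestor of y. Missing
    pairs are treated as 0.
    """
    remaining = sorted(leaves)  # sorted only to make tie-breaking deterministic
    order = []
    while remaining:
        def score(x):
            # Assumed form of the score: support for "x above y" minus
            # support for "y above x", over leaves not yet placed.
            return sum(
                above.get((x, y), 0.0) - above.get((y, x), 0.0)
                for y in remaining
                if y != x
            )

        best = max(remaining, key=score)
        order.append(best)
        remaining.remove(best)
    return order
```

When the fractions come from an actual network they are all 0 or 1, and the leaf selected at each step is the topmost leaf not yet placed, in line with the argument above.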
The pseudocode for constructing a simple network is in Algorithm 3.
3.4 Theoretical result
The following theorem shows that the algorithm is guaranteed to reconstruct a level-2 network from its set of trinets.
Theorem 3.
If is a recoverable, binary level2 network on with , then Algorithm 1 will output when applied to input .
Proof.
The proof is by induction on the number of vertices of .
The base case is that is a tree with leaves and vertices. Say that and that is the minimal cutarc set. In this case, the algorithm will generate (see Section 3.2). The set contains only the tree on and hence this is constructed as (see Section 3.3.1). The set contains only the tree on and hence is this tree. Combining and then gives (see Section 3.1).
Now suppose has at least vertices. Then the algorithm finds a minimal cutarc set of by Section 3.2. Let be the corresponding cutarc of and let be the subnetwork of rooted at . Let be the network obtained from by deleting all vertices of except for and labelling by . Then the underlying set of is and the underlying set of is . We have argued in Section 3.3 that the algorithm constructs (which is either a tree with two leaves or a simple network) from . If then since is recoverable and we are done. Otherwise, contains fewer vertices than . If has at least three leaves, the algorithm constructs from by induction. If has two leaves, then only contains and hence the algorithm constructs (see Section 3.3.1). In both cases, combining and gives (see Section 3.1). ∎
It remains to analyze the running time of the algorithm. Algorithm 2 can be implemented efficiently to run in time (similarly to trilonet for level1). The main idea here is to first compute , the number of trinets containing and that have a minimal cutarc set not containing . This can be done in time since we need to loop through the set of trinets only once and update the values affected by this trinet , i.e., with . Finding a minimal cutarc set in a trinet can be done in constant time as the size of each trinet is bounded by a constant (as any trinet that is not recoverable can be ignored). After that, the digraph can be constructed in time, and this only needs to be done for the smallest for which for at least one pair . The condensed digraph can be found with Tarjan’s algorithm for computing strongly connected components in time. Since the number of generators, and the number of sides of each generator, is bounded by a constant, the bottleneck of Algorithm 3 is Line 4. The values can be computed in time and the values in time. The values can be computed in time by looping through all and updating the values of with and . This last step has to be repeated times. So Algorithm 3 takes time. Computing and can be done in time since the size of the trinets is bounded by a constant. All of this has to be repeated times. Hence, the algorithm runs in time .
4 Discussion
We have presented an algorithm that, for an input set of trinets (and possibly binets) with leafset $X$, outputs a level-2 network on $X$ in polynomial time and that is guaranteed to reconstruct a level-2 network from its set of trinets. Note that a variant of this algorithm is presented in sjors . It should also be noted that our level-2 algorithm cannot be used to decide in polynomial time whether or not an arbitrary set of trinets is contained in some level-2 network. Indeed, if an arbitrary set of level-1 trinets is input into the algorithm, then it will output a level-1 network. But it is known that deciding whether or not an arbitrary set of level-1 trinets is contained in a level-1 network is NP-complete huber2017reconstructing . In addition, in light of these observations concerning level-1 trinets, our algorithm can be used to build level-1 networks for more general inputs than the level-1 TriLoNet algorithm described in trilonet , since TriLoNet's input is restricted to collections in which there is a trinet on every 3-subset of the leafset.
In terms of potential applications of our level-2 algorithm, in trilonet a method is presented to derive collections of level-1 trinets from molecular sequence data; it would be interesting to see if this approach could be extended (or a new approach developed) to derive level-2 trinets as well. We expect that this could be quite complicated as level-2 trinets (and even level-1 trinets) can be quite complex, and so it may be necessary to restrict the level-1/level-2 building blocks to some subset of the list of potential 3-leaved networks.
In another direction, in this paper we have shown that level-3 networks are not necessarily encoded by their trinets. However, Figure 1 is essentially the only case in which a level-3 network is not encoded leonie , and so it would be interesting to investigate if there is a polynomial-time algorithm for constructing level-3 networks from trinets modulo this symmetry. Alternatively, it can be shown that the collection of 4-leaved networks (or quarnets) contained in a level-3 network encodes the network leonie , and so new algorithms could potentially be developed to build level-3 networks from quarnets. In this vein, an interesting open question is whether or not a level-$k$ network is always encoded by its $(k+1)$-nets. Some partial results concerning this question are presented in frank .
References
 [1] Magnus Bordewich, Britta Dorn, Simone Linz, and Rolf Niedermeier. Algorithms and complexity in phylogenetics. In Dagstuhl Reports, volume 9. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2020.
 [2] R. A. Leo Elworth, Huw A. Ogilvie, Jiafan Zhu, and Luay Nakhleh. Advances in Computational Methods for Phylogenetic Networks in the Presence of Hybridization, pages 317–360. Springer International Publishing, Cham, 2019.
 [3] Philippe Gambette and Katharina T Huber. On encodings of phylogenetic networks of bounded level. Journal of Mathematical Biology, 65(1):157–180, 2012.
 [4] Katharina T Huber and Vincent Moulton. Encoding and constructing 1-nested phylogenetic networks with trinets. Algorithmica, 66(3):714–738, 2013.
 [5] Katharina T Huber, Vincent Moulton, and Taoyang Wu. Hierarchies from lowest stable ancestors in nonbinary phylogenetic networks. Journal of Classification, 36(2):200–231, 2019.
 [6] Katharina T Huber, Leo van Iersel, Steven Kelk, and Radoslaw Suchecki. A practical algorithm for reconstructing level-1 phylogenetic networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 8(3):635–649, 2010.
 [7] Katharina T Huber, Leo van Iersel, Vincent Moulton, Celine Scornavacca, and Taoyang Wu. Reconstructing phylogenetic level-1 networks from nondense binet and trinet sets. Algorithmica, 77(1):173–200, 2017.
 [8] Katharina T Huber, Leo van Iersel, Vincent Moulton, and Taoyang Wu. How much information is needed to infer reticulate evolutionary histories? Systematic Biology, 64(1):102–111, 2015.
 [9] Leo van Iersel and Vincent Moulton. Trinets encode tree-child and level-2 phylogenetic networks. Journal of Mathematical Biology, 68(7):1707–1729, 2014.
 [10] Frank Janisse. Encoding level-k phylogenetic networks, 2021. MSc thesis, TU Delft, http://resolver.tudelft.nl/uuid:11939b58b83440738de8b61d9a5f9a81.
 [11] Sjors Kole. Constructing level-2 phylogenetic networks from trinets, 2020. MSc thesis, TU Delft, http://resolver.tudelft.nl/uuid:c699ea63f8c840f78f0711ac055c42e0.
 [12] Leonie Nipius. Rooted binary level-3 phylogenetic networks are encoded by quarnets, 2020. BSc thesis, TU Delft, http://resolver.tudelft.nl/uuid:a9c5a8d4bc8b4d15bdbb3ed35a9fb75d.
 [13] James Oldman, Taoyang Wu, Leo van Iersel, and Vincent Moulton. TriLoNet: Piecing Together Small Networks to Reconstruct Reticulate Evolutionary Histories. Molecular Biology and Evolution, 33(8):2151–2162, 2016.
 [14] Charles Semple and Gerry Toft. Trinets encode orchard phylogenetic networks. Journal of Mathematical Biology, 83(3):1–20, 2021.
 [15] Mike Steel. Phylogeny: discrete and random processes in evolution. SIAM, 2016.
 [16] Stephen Willson. Regular networks can be uniquely constructed from their trees. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 8(3):785–796, 2010.