An algorithm for reconstructing level-2 phylogenetic networks from trinets

09/23/2021 ∙ by Leo van Iersel, et al. ∙ University of East Anglia, Delft University of Technology

Evolutionary histories for species that cross with one another or exchange genetic material can be represented by leaf-labelled, directed graphs called phylogenetic networks. A major challenge in the burgeoning area of phylogenetic networks is to develop algorithms for building such networks by amalgamating small networks into a single large network. The level of a phylogenetic network is a measure of its deviation from being a tree; the higher the level of a network, the less treelike it is. Various algorithms have been developed for building level-1 networks from small networks. However, level-1 networks may not be able to capture the complexity of some data sets. In this paper, we present a polynomial-time algorithm for constructing a rooted binary level-2 phylogenetic network from a collection of 3-leaf networks, or trinets. Moreover, we prove that the algorithm will correctly reconstruct such a network if it is given all of the trinets in the network as input. The algorithm runs in $O(t\cdot n+n^4)$ time, with $t$ the number of input trinets and $n$ the number of leaves. We also show that there is a fundamental obstruction to constructing level-3 networks from trinets, and so new approaches will need to be developed for constructing level-3 and higher-level networks.


1 Introduction

Phylogenetic networks are a generalization of phylogenetic trees that are commonly used to represent the evolutionary histories of species that cross with one another or exchange genetic material, such as plants and viruses. There are several classes of phylogenetic networks, and various ways have been devised to build them; see e.g. Elworth2019; steel2016phylogeny for recent surveys. Mathematically speaking, a phylogenetic network on a set $X$ of species is essentially a directed acyclic graph with a single source, or root, in which every sink, or leaf, has indegree 1, and whose set of leaves is equal to $X$. In this paper, we shall only consider recoverable, binary networks (or networks for short), that is, phylogenetic networks that satisfy a certain condition on the ancestors of $X$ and in which the root has outdegree 2 and all other non-leaf vertices have degree 3 (see Figure 1 for some examples). Precise definitions are given in Section 2.

Figure 1: Left: Two distinct level-3 networks $N_1$ and $N_2$ on the same set of leaves. Right: The set of trinets that is contained in both $N_1$ and $N_2$.

Recently, there has been growing interest in the problem of building a network with leaf-set $X$ from a collection of networks, each having a leaf-set equal to some subset of $X$, in such a way that the input networks are each contained in the final network. Early work on this so-called supernetwork problem focused on building networks from phylogenetic trees, that is, phylogenetic networks whose underlying graph is a tree. Several results have been presented for this problem, including algorithms for constructing networks from triplets, which are 3-leaved phylogenetic trees (e.g. huber2010practical), and from collections of phylogenetic trees all on leaf-set $X$ (e.g. willson2010regular); for a recent summary of these approaches see semple2021trinets. However, an important issue with this strategy is that phylogenetic trees do not necessarily encode phylogenetic networks, i.e., there are examples of distinct (non-isomorphic) networks that contain the same set of phylogenetic trees (see e.g. gambette2012encodings), making it impossible to uniquely reconstruct such networks from their trees.

Motivated by this issue, in huber2013encoding it was proposed to build networks from collections of 3-leaved networks, or trinets. In that paper, the authors focused on building level-1 networks (in fact, they considered the somewhat more general class of 1-nested networks), where, in general, level-$k$ networks are networks that can be converted into a tree by deleting at most $k$ arcs from each biconnected component. In particular, they showed that level-1 networks are encoded by the trinets that they contain, and gave an algorithm for constructing a level-1 network on $X$ from its trinets that is polynomial in $|X|$ (see also trilonet for a more general algorithm). In level2trinets the encoding result was extended to the more general class of level-2 networks, and also to the distinct and quite broad class of so-called tree-child networks. Recently, in semple2021trinets it was also shown that orchard networks, which generalise tree-child networks, are encoded by their trinets, and an algorithm was given for constructing an orchard network from its trinets that is polynomial in the size of the vertex set of the network (whose size is not necessarily polynomial in $|X|$).

Intriguingly, in huber2015much it was shown that, as with trees, trinets do not encode networks in general. Indeed, in (semple2021trinets, p.28) it was shown that even level-4 networks are not encoded by their trinets and, since level-2 networks are encoded by their trinets (see above), it was asked whether or not level-3 networks are encoded by their trinets (see also dagstuhl). In the first result of this paper we answer this question: the two networks $N_1$ and $N_2$ in Figure 1 are level-3 and are easily seen to be distinct and to contain the same set of trinets (see leonie). Hence, level-$k$ networks are encoded by their trinets only if $k\le 2$. As the algorithm in huber2013encoding can be used to uniquely reconstruct a level-1 network from its trinets, this leaves open the question of finding a polynomial-time algorithm for building a level-2 network from its trinets, which is the purpose of the rest of this paper. In particular, we shall present an algorithm which constructs a level-2 network on $X$ from any set of trinets whose leaf-set union is $X$, which runs in $O(t\cdot n+n^4)$ time, where $t$ is the number of input trinets and $n=|X|$ (Algorithm 1), and which is guaranteed to reconstruct a level-2 network from its set of trinets (Theorem 3). We now proceed by presenting some preliminaries, after which we shall describe our level-2 algorithm. We will conclude with a brief discussion of our results.

2 Preliminaries

We refer the reader to (steel2016phylogeny, Chapter 10) for more information on the terminology and basic results on phylogenetic networks that we summarise in this section.

Definition 1.

Let $X$ be some finite set (corresponding to a set of species, say). A binary phylogenetic network (on $X$) is a directed acyclic graph with the following types of vertices: a single root with indegree 0 and outdegree 2; tree-vertices with indegree 1 and outdegree 2; reticulations with indegree 2 and outdegree 1; and leaves with indegree 1 and outdegree 0, where the leaves are in one-to-one correspondence with the elements of $X$.

Let $N$ be a binary phylogenetic network on $X$, and suppose that $u,v$ are two vertices in the vertex set of $N$. If there is a directed path from $u$ to $v$ (including the case that $u=v$), then we say that $u$ is an ancestor of $v$ and that $v$ is a descendant of $u$. When $(u,v)$ is an arc, we say that $u$ is a parent of $v$ and that $v$ is a child of $u$. We say that an arc $(u,v)$ is a cut-arc if deleting it disconnects $N$. A set $C\subseteq X$ is called a cut-arc set in $N$ if $C=X$ or $C$ is the set of descendant leaves of $v$ for some cut-arc $(u,v)$. A cut-arc set $C$ is minimal if $|C|\ge 2$ and there is no cut-arc set $C'$ with $|C'|\ge 2$ and $C'\subsetneq C$. A network is simple if it has no minimal cut-arc set other than $X$.
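To make these definitions concrete, here is a brute-force Python sketch. It assumes a network is stored as a dictionary mapping each vertex to the list of its children (a representation chosen only for illustration; the function names are likewise hypothetical). An arc is reported as a cut-arc when deleting it disconnects the underlying undirected graph, and the cut-arc sets are the descendant leaf sets of cut-arc heads together with the full leaf set. The approach is quadratic in the number of arcs, which is fine for trinets and small examples but is not meant as an efficient implementation.

```python
def cut_arcs(children):
    """Arcs (u, v) whose deletion disconnects the underlying undirected graph.

    `children` maps each vertex to the list of its children; a pair of
    parallel arcs is listed twice. Brute force: delete each arc in turn.
    """
    arcs = [(u, v) for u, vs in children.items() for v in vs]
    vertices = {x for arc in arcs for x in arc}

    def connected(kept):
        adj = {v: set() for v in vertices}
        for u, v in kept:
            adj[u].add(v)
            adj[v].add(u)
        start = next(iter(vertices))
        seen, stack = {start}, [start]
        while stack:
            for w in adj[stack.pop()]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return len(seen) == len(vertices)

    return [a for i, a in enumerate(arcs) if not connected(arcs[:i] + arcs[i + 1:])]


def descendant_leaves(children, v):
    """Leaves (vertices without children) that are descendants of v."""
    seen, stack = {v}, [v]
    while stack:
        for w in children.get(stack.pop(), []):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return frozenset(w for w in seen if not children.get(w))


def minimal_cut_arc_sets(children):
    """Cut-arc sets C with |C| >= 2 containing no smaller such cut-arc set."""
    heads = {v for vs in children.values() for v in vs}
    root = next(v for v in children if v not in heads)
    sets = {descendant_leaves(children, root)}  # the leaf set X itself
    sets |= {descendant_leaves(children, v) for _, v in cut_arcs(children)}
    big = [c for c in sets if len(c) >= 2]
    return [c for c in big if not any(d < c for d in big)]


# Example: a level-1 network whose reticulation h has a cherry {z1, z2} below it.
net = {"r": ["p", "q"], "p": ["h", "x"], "q": ["h", "y"], "h": ["w"], "w": ["z1", "z2"]}
```

For this example network, the only minimal cut-arc set is $\{z_1,z_2\}$, the cherry hanging below the reticulation h.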

Now, suppose $Y\subseteq X$. A lowest stable ancestor (LSA) of $Y$ in $N$ is a vertex $v$ such that, for all $y\in Y$, all paths from the root to $y$ contain $v$, and such that there is no descendant $w\neq v$ of $v$ that satisfies this property. It is not difficult to see that the lowest stable ancestor is always unique for any $Y$ (steel2016phylogeny, p.263); we denote it by $\mathrm{LSA}(Y)$. We say that $N$ is recoverable if $\mathrm{LSA}(X)$ is the root of $N$. In this paper, for simplicity, we shall call a recoverable, binary phylogenetic network on $X$ a network. Only in the statements of theorems will we mention these restrictions explicitly.
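A brute-force sketch of the lowest stable ancestor and of the recoverability test, using an illustrative child-list dictionary representation (the function names are hypothetical). It relies on the observation that a vertex $v$ lies on every path from the root to $y$ exactly when $y$ is no longer reachable from the root once $v$ is removed.

```python
def reachable(children, start, removed=None):
    """Vertices reachable from `start` without passing through `removed`."""
    seen, stack = {start}, [start]
    while stack:
        for w in children.get(stack.pop(), []):
            if w != removed and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def lsa(children, root, Y):
    """Lowest vertex lying on every root-to-y path, for all y in Y."""
    vertices = set(children) | {w for ws in children.values() for w in ws}

    def on_all_paths(v, y):
        return v in (root, y) or y not in reachable(children, root, removed=v)

    stable = {v for v in vertices if all(on_all_paths(v, y) for y in Y)}
    # Vertices on all root-to-y paths form a chain, so exactly one stable
    # vertex has no other stable vertex among its descendants.
    return next(v for v in stable
                if not any(w != v and w in stable for w in reachable(children, v)))


def is_recoverable(children, root):
    leaves = {w for ws in children.values() for w in ws if not children.get(w)}
    return lsa(children, root, leaves) == root


net = {"r": ["p", "q"], "p": ["h", "x"], "q": ["h", "y"], "h": ["w"], "w": ["z1", "z2"]}
# The root has two parallel arcs into reticulation h, so LSA(X) lies below h.
bad = {"r": ["h", "h"], "h": ["t"], "t": ["x", "y"]}
```

In the second example, $\mathrm{LSA}(X)$ is the tree vertex t below the reticulation h rather than the root, so that network is not recoverable.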

A biconnected component of a network is a maximal subgraph not containing any cut-arcs. A network is level-$k$ if each biconnected component contains at most $k$ reticulations. A level-$k$ network is strictly level-$k$ if it is not level-$k'$ for any $k'<k$. This paper will mainly focus on level-2 networks; see Figure 3 for an example.
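The level of a network can be computed by deleting all cut-arcs and counting the reticulations (vertices of indegree 2) in each remaining connected component. The Python sketch below does this by brute force on an illustrative child-list representation (names are hypothetical); the value returned is the smallest $k$ for which the network is level-$k$, that is, the $k$ for which it is strictly level-$k$.

```python
from collections import Counter


def _components(vertices, arcs):
    """Connected components of the undirected graph (vertices, arcs)."""
    adj = {v: set() for v in vertices}
    for u, v in arcs:
        adj[u].add(v)
        adj[v].add(u)
    comps, seen = [], set()
    for s in vertices:
        if s in seen:
            continue
        comp, stack = {s}, [s]
        seen.add(s)
        while stack:
            for w in adj[stack.pop()]:
                if w not in seen:
                    seen.add(w)
                    comp.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps


def level(children):
    """Maximum number of reticulations in a biconnected component."""
    arcs = [(u, v) for u, vs in children.items() for v in vs]
    vertices = {x for arc in arcs for x in arc}
    indegree = Counter(v for _, v in arcs)
    base = len(_components(vertices, arcs))
    # an arc is a cut-arc if deleting it increases the number of components
    non_cut = [a for i, a in enumerate(arcs)
               if len(_components(vertices, arcs[:i] + arcs[i + 1:])) == base]
    # biconnected components are the components left after deleting all cut-arcs
    return max(sum(indegree[v] == 2 for v in comp)
               for comp in _components(vertices, non_cut))


tree = {"r": ["a", "b"]}
net = {"r": ["p", "q"], "p": ["h", "x"], "q": ["h", "y"], "h": ["w"], "w": ["z1", "z2"]}
lvl2 = {"rho": ["u", "v"], "u": ["h1", "h2"], "v": ["h1", "h2"], "h1": ["a"], "h2": ["b"]}
```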

A network on $X$ is a trinet if $|X|=3$ and a binet if $|X|=2$. If $T$ is a trinet or binet on $X$, then we also use $L(T)$ to denote the set $X$. Furthermore, for a set $\mathcal{T}$ of trinets and/or binets, we define $L(\mathcal{T})=\bigcup_{T\in\mathcal{T}}L(T)$. We will now define the restriction of a network to a subset of $X$, which will be used to define the set of trinets contained in a network.

Definition 2.

Let $N$ be a network on $X$ and $Y\subseteq X$. The restriction of $N$ to $Y$, denoted $N|_Y$, is the network on $Y$ obtained from $N$ by deleting all vertices that are not on a path from $\mathrm{LSA}(Y)$ to an element of $Y$, and subsequently replacing parallel arcs by single arcs and suppressing indegree-1 outdegree-1 vertices until neither of these operations is applicable.

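The set of trinets $\mathcal{T}(N)$ of a network $N$ on $X$ is defined as $\mathcal{T}(N)=\{N|_Y : Y\subseteq X,\ |Y|=3\}$. The set of binets and trinets $\mathcal{TB}(N)$ of a network $N$ on $X$ is defined as $\mathcal{TB}(N)=\{N|_Y : Y\subseteq X,\ 2\le|Y|\le 3\}$. Observe that $\mathcal{TB}(N)$ can be obtained from $\mathcal{T}(N)$.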
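Definition 2 translates fairly directly into code. The sketch below is an illustration on a child-list dictionary representation (all names are hypothetical): it finds $\mathrm{LSA}(Y)$ by brute force, keeps the vertices lying on a path from $\mathrm{LSA}(Y)$ to an element of $Y$, merges parallel arcs by storing children in sets, and suppresses indegree-1 outdegree-1 vertices one at a time until none remain. The function `trinets` then lists $\mathcal{T}(N)$ by restricting to every 3-subset of the leaves.

```python
from collections import defaultdict
from itertools import combinations


def _reach(ch, start, removed=None):
    seen, stack = {start}, [start]
    while stack:
        for w in ch.get(stack.pop(), ()):
            if w != removed and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def _lsa(ch, root, Y):
    verts = set(ch) | {w for ws in ch.values() for w in ws}
    stable = {v for v in verts
              if all(v in (root, y) or y not in _reach(ch, root, v) for y in Y)}
    return next(v for v in stable
                if not any(w != v and w in stable for w in _reach(ch, v)))


def restrict(ch, root, Y):
    """Return (new_root, children) describing the restriction N|_Y."""
    Y = set(Y)
    top = _lsa(ch, root, Y)
    # keep the vertices on some path from LSA(Y) to an element of Y
    keep = {v for v in _reach(ch, top) if _reach(ch, v) & Y}
    # storing children in sets merges parallel arcs automatically
    sub = {v: {w for w in ch.get(v, ()) if w in keep} for v in keep}
    while True:
        parents = defaultdict(list)
        for u, ws in sub.items():
            for w in ws:
                parents[w].append(u)
        v = next((v for v in sub if len(parents[v]) == 1 and len(sub[v]) == 1), None)
        if v is None:
            break
        # suppress v: replace u -> v -> w by u -> w, then recompute parents
        (u,), (w,) = parents[v], sub.pop(v)
        sub[u].discard(v)
        sub[u].add(w)
    return top, {v: sorted(ws) for v, ws in sub.items()}


def trinets(ch, root):
    leaves = sorted({w for ws in ch.values() for w in ws if not ch.get(w)})
    return [restrict(ch, root, Y) for Y in combinations(leaves, 3)]


tree = {"r": ["s", "d"], "s": ["t", "c"], "t": ["a", "b"]}
net = {"r": ["p", "q"], "p": ["h", "x"], "q": ["h", "y"], "h": ["w"], "w": ["z1", "z2"]}
```

For instance, restricting `tree` to $\{a,b,d\}$ deletes c and suppresses s, so the root's children become d and t.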

We say that two networks $N_1,N_2$ on $X$ are equal, and write $N_1=N_2$, if there is an isomorphism $f$ from $N_1$ to $N_2$ such that, for every leaf $v$ of $N_1$, $f(v)$ has the same label as $v$.

The following theorem forms the basis for our new level-2 algorithm.

Theorem 1 (level2trinets ).

Let $N$ be a recoverable, binary level-2 network on $X$ with $|X|\ge 3$. Then there exists no recoverable network $N'\neq N$ with $\mathcal{T}(N')=\mathcal{T}(N)$.

2.1 Generators

Our algorithm will make heavy use of the underlying structure of biconnected components, which is called a “generator” and defined as follows.

Definition 3.

Let $N$ be a simple network. The underlying generator of $N$ is the directed multigraph $G$ obtained from $N$ by deleting all leaves and suppressing all indegree-1 outdegree-1 vertices. The arcs and indegree-2 outdegree-0 vertices of $G$ are called sides. The arcs are also called arc sides, and the indegree-2 outdegree-0 vertices are also called reticulation sides. We say that leaf $x$ is on side $s$ (or that side $s$ contains $x$) if either

  • $s$ is a reticulation side of $G$ and is the parent of $x$ in $N$, or

  • $s$ is an arc side of $G$ obtained by suppressing the indegree-1 outdegree-1 vertices of a path $P$ in $N$, and the parent of $x$ lies on $P$.

See Figure 2 for all underlying generators of simple level-1 and level-2 networks.

Figure 2: The only underlying generator of a simple level-1 network and the four underlying generators of simple level-2 networks (level2trinets). One level-2 generator has three sets of symmetric arc sides, while two others have one set of symmetric arc sides each. Only one level-2 generator has symmetric reticulation sides.

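To attach leaf $x$ to a reticulation side $r$ means adding $x$ with an arc from $r$ to $x$. To attach a list $(x_1,\ldots,x_k)$ of leaves to an arc side $(u,v)$ means subdividing $(u,v)$ into a path $u,v_1,\ldots,v_k,v$ with $k$ internal vertices and adding the leaves $x_1,\ldots,x_k$ with arcs $(v_1,x_1),\ldots,(v_k,x_k)$.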
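A small sketch of the two attachment operations. It assumes a generator is described by named arc sides (so that the two parallel arcs of the level-1 generator can be told apart) and that internal vertices may be given fresh names such as `A1`, `A2`; this input format, these vertex names and the function name are illustrative only.

```python
def attach_leaves(arc_sides, arc_leaves, reticulation_leaves):
    """Return the arc list of the network obtained by attaching leaves.

    arc_sides: side name -> (tail, head); parallel arcs get distinct names.
    arc_leaves: side name -> list [x1, ..., xk] of leaves, from top to bottom.
    reticulation_leaves: reticulation side -> the leaf attached to it.
    """
    arcs = []
    for name, (u, v) in arc_sides.items():
        leaves = arc_leaves.get(name, [])
        internal = [f"{name}{i}" for i in range(1, len(leaves) + 1)]
        path = [u, *internal, v]
        arcs += list(zip(path, path[1:]))      # subdivide (u, v) into a path
        arcs += list(zip(internal, leaves))    # arcs (v_i, x_i)
    arcs += list(reticulation_leaves.items())  # arcs (r, x)
    return arcs


# The level-1 generator: root rho with two parallel arcs into reticulation r.
gen = {"A": ("rho", "r"), "B": ("rho", "r")}
arcs = attach_leaves(gen, {"A": ["a", "b"]}, {"r": "c"})
```

Here side A becomes the path rho, A1, A2, r carrying a and b, side B stays a single arc, and c hangs below the reticulation, giving a level-1 network on $\{a,b,c\}$.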

A trinet $T$ is called a crucial trinet of a simple network $N$ if it contains a leaf on each reticulation side of the underlying generator $G$ of $N$ and, for each pair of parallel arcs in $G$, a leaf on at least one of these two sides. Crucial trinets are of special interest because they have the same underlying generator as $N$.

Two reticulation sides $r,r'$ of a generator $G$ are symmetric if there exists an automorphism $\phi$ of $G$ with $\phi(r)=r'$. The equivalence classes under this notion of symmetry are called sets of symmetric reticulation sides.

Two arc sides $s,s'$ of a generator $G$ are symmetric if there exists an automorphism $\phi$ of $G$ with $\phi(r)=r$ for each reticulation side $r$ and such that $\phi$ maps $s$ to $s'$. The equivalence classes under this notion of symmetry are called sets of symmetric arc sides. For an example, see Figure 2. The idea behind this definition is that the reticulation sides of $G$ are parents of leaves in $N$. In our algorithm, we will make heavy use of crucial trinets, which contain those leaves. Since these leaves are labelled, we can distinguish the reticulation sides, which is why the automorphism is required to fix them.
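For generators this small, symmetric sides can be found by brute force over all vertex permutations. The sketch below represents a generator as a list of arcs and, as a simplifying assumption, identifies an arc side by its endpoints, so it only handles generators without parallel arcs; the function names are hypothetical. As an illustrative example it uses the level-2 generator with root ρ, tree vertices u and v, and sink reticulations r1 and r2, in which both u and v have an arc to both r1 and r2.

```python
from collections import Counter
from itertools import permutations


def automorphisms(arcs):
    """All vertex permutations that preserve the multiset of arcs."""
    vertices = sorted({x for arc in arcs for x in arc})
    target = Counter(arcs)
    for image in permutations(vertices):
        f = dict(zip(vertices, image))
        if Counter((f[u], f[v]) for u, v in arcs) == target:
            yield f


def reticulation_sides(arcs):
    """Vertices with indegree 2 and outdegree 0."""
    tails = {u for u, _ in arcs}
    indegree = Counter(v for _, v in arcs)
    return sorted(v for v, d in indegree.items() if d == 2 and v not in tails)


def symmetric_reticulation_classes(arcs):
    autos = list(automorphisms(arcs))
    return {frozenset(f[r] for f in autos) for r in reticulation_sides(arcs)}


def symmetric_arc_classes(arcs):
    rets = reticulation_sides(arcs)
    # only automorphisms fixing every reticulation side are allowed
    autos = [f for f in automorphisms(arcs) if all(f[r] == r for r in rets)]
    return {frozenset((f[u], f[v]) for f in autos) for u, v in set(arcs)}


G = [("rho", "u"), ("rho", "v"), ("u", "r1"), ("u", "r2"), ("v", "r1"), ("v", "r2")]
```

Here r1 and r2 are symmetric, and once both reticulation sides are fixed, swapping u and v still yields three sets of two symmetric arc sides each.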

3 Algorithm

3.1 Outline

We work with multisets of trinets and binets because these may arise when collapsing or restricting trinet sets. Hence, let $\mathcal{T}$ be a multiset of binets and trinets. The high-level idea of the algorithm is to first find a minimal cut-arc set $C$. Then we construct $\mathcal{T}'$ by collapsing $C$ to a single leaf $c$, and find a network $N'$ for $\mathcal{T}'$ recursively. The next step is to construct $\mathcal{T}_C$ from $\mathcal{T}$ by restricting to the taxa in $C$, and to find a simple network $N_C$ for $\mathcal{T}_C$. Finally, we construct $N$ from $N'$ and $N_C$ by replacing $c$ by $N_C$. The pseudocode is given in Algorithm 1.

Figure 3: A level-2 network $N$, its set of trinets $\mathcal{T}(N)$ and the digraph $\Omega_0(\mathcal{T}(N))$. The digraph has exactly one minimal sink set, which is also the only minimal cut-arc set in $N$.
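Data: Multiset $\mathcal{T}$ of level-2 trinets and (possibly) binets on taxon set $X$.
Result: Level-2 phylogenetic network $N$ on $X$.
1 Find a cut-arc set $C$ using Algorithm 2;
//Find network $N'$ with $C$ collapsed
2 Initialize $\mathcal{T}'=\emptyset$ and let $c$ be a new taxon;
3 for $T\in\mathcal{T}$ with $L(T)\not\subseteq C$ do
4       if $L(T)\cap C=\emptyset$ then Add $T$ to $\mathcal{T}'$;
5       else
6             Pick $x\in L(T)\cap C$;
7             Construct $T|_{(L(T)\setminus C)\cup\{x\}}$;
8             Relabel $x$ to $c$ and add the resulting trinet or binet to $\mathcal{T}'$;
11 Construct $N'$ from $\mathcal{T}'$ by recursively running this algorithm;
//Find simple network $N_C$ on $C$
12 $\mathcal{T}_C\leftarrow$ the multiset $\{T|_{L(T)\cap C} : T\in\mathcal{T},\ |L(T)\cap C|\ge 2\}$;
13 Construct a simple network $N_C$ for $\mathcal{T}_C$ using Algorithm 3;
//Combine $N'$ and $N_C$
14 if $C\neq X$ then
15       return the network constructed from $N'$ and $N_C$ by identifying $c$ with the root of $N_C$
16 else return $N_C$;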
Algorithm 1 Constructing level-2 networks from trinets
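The collapsing step (lines 2 to 8 of Algorithm 1) only depends on leaf sets together with the restriction and relabelling operations. The Python sketch below takes those two operations as caller-supplied functions (hypothetical helpers, not part of the paper). In the demo, each network is replaced by its leaf set, which is enough to see why binets and repeated trinets appear in $\mathcal{T}'$.

```python
def collapse(nets, C, leaves, restrict, relabel, c):
    """Build the multiset T' used for the recursive call (sketch).

    leaves(T) -> frozenset of taxa; restrict(T, Y) -> T|_Y;
    relabel(T, x, c) -> T with taxon x renamed to c.
    """
    C = frozenset(C)
    out = []
    for T in nets:
        L = leaves(T)
        if L <= C:
            continue                      # only networks with a taxon outside C
        inside = L & C
        if not inside:
            out.append(T)                 # unaffected by the collapse
        else:
            x = min(inside)               # any representative of C works
            out.append(relabel(restrict(T, (L - C) | {x}), x, c))
    return out


# Demo: networks are stood in for by their leaf sets, and C = {d, e}.
nets = [frozenset(s) for s in ("abd", "abe", "ade", "bde")]
demo = collapse(
    nets, {"d", "e"},
    leaves=lambda T: T,
    restrict=lambda T, Y: frozenset(Y),
    relabel=lambda T, x, c: frozenset(c if v == x else v for v in T),
    c="c",
)
```

The result contains the trinet on $\{a,b,c\}$ twice and the binets on $\{a,c\}$ and $\{b,c\}$, which is exactly why the algorithm is stated for multisets of trinets and binets.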

Within our explanation of the algorithm, we will also explain why the algorithm correctly reconstructs $N$ in case the underlying set of $\mathcal{T}$ is $\mathcal{T}(N)$ for some recoverable level-2 network $N$.

3.2 Finding a minimal cut-arc set

We first find minimal cut-arc sets using the digraphs $\Omega_i(\mathcal{T})$, which were introduced in trilonet for level-1 networks and are defined as follows. See Figure 3 for an example.

Definition 4.

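Given a multiset $\mathcal{T}$ of binets and trinets and an integer $i\ge 0$, $\Omega_i(\mathcal{T})$ is the digraph with vertex set $L(\mathcal{T})$ and an arc $(x,y)$ if at most $i$ trinets $T\in\mathcal{T}$ with $x,y\in L(T)$ have a minimal cut-arc set not containing $y$.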

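A sink set in a digraph $D$ is a set $S$ of vertices of $D$ such that there is no arc $(u,v)$ with $u\in S$ and $v\notin S$. A sink set $S$ is minimal if $|S|\ge 2$ and there is no sink set $S'$ with $|S'|\ge 2$ and $S'\subsetneq S$. A strongly connected component of a digraph is a maximal subgraph $H$ containing, for any two vertices $u,v$ of $H$, a directed path from $u$ to $v$ and a directed path from $v$ to $u$.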
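A direct sketch of Definition 4 and of the sink-set test. As an assumed input format, each input network is summarised by its leaf set and its minimal cut-arc set, which is all the digraph needs; the function names are hypothetical. The demo input is the trinet set of the tree (((a,b),c),d), whose minimal cut-arc set $\{a,b\}$ shows up as a sink set of $\Omega_0$.

```python
from itertools import permutations


def omega(nets, i):
    """Arcs (x, y) of Omega_i as a set of ordered pairs.

    nets: list of (leaf_set, minimal_cut_arc_set) pairs; binets are ignored.
    """
    leaves = set().union(*(L for L, _ in nets))
    bad = {pair: 0 for pair in permutations(sorted(leaves), 2)}
    for L, C in nets:
        if len(L) != 3:
            continue
        for x, y in permutations(L, 2):
            if y not in C:  # this trinet's minimal cut-arc set does not contain y
                bad[(x, y)] += 1
    return {pair for pair, count in bad.items() if count <= i}


def is_sink_set(S, arcs):
    """True if no arc leaves S."""
    return all(v in S for u, v in arcs if u in S)


T_tree = [
    (frozenset("abc"), frozenset("ab")),
    (frozenset("abd"), frozenset("ab")),
    (frozenset("acd"), frozenset("ac")),
    (frozenset("bcd"), frozenset("bc")),
]
arcs0 = omega(T_tree, 0)
```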

If $N$ is a level-1 network, minimal sink sets in $\Omega_0(\mathcal{T}(N))$ correspond to minimal cut-arc sets in $N$ (trilonet). To extend this result to level-2 networks, we will use the following theorem, which is a special case of (huber2019hierarchies, Theorem 7.3). It uses the closure digraph $\mathcal{C}(\mathcal{T})$ of a set $\mathcal{T}$ of trinets, which was introduced in trilonet and is defined as follows. Its vertex set is $L(\mathcal{T})$, and it has an arc $(x,y)$ if, for all $z\in L(\mathcal{T})\setminus\{x,y\}$, there exists a trinet on $\{x,y,z\}$ in $\mathcal{T}$ in which $y$ is a descendant of $\mathrm{LSA}(\{x,z\})$.

Theorem 2.

(huber2019hierarchies) Let $N$ be a binary level-2 network on $X$ and $C\subseteq X$. Then $C$ is a minimal cut-arc set of $N$ if and only if $C$ is a minimal sink set of the closure digraph $\mathcal{C}(\mathcal{T}(N))$.

The next lemma shows that the closure digraph $\mathcal{C}(\mathcal{T})$ is equal to $\Omega_0(\mathcal{T})$ if $\mathcal{T}$ is the set of trinets of some network.

Lemma 1.

If $\mathcal{T}=\mathcal{T}(N)$ for some network $N$ on $X$, then $\Omega_0(\mathcal{T})=\mathcal{C}(\mathcal{T})$.

Proof.

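First let $(x,y)$ be an arc of $\Omega_0(\mathcal{T})$. Assume that $(x,y)$ is not an arc of $\mathcal{C}(\mathcal{T})$. Then there exists a $z\in X\setminus\{x,y\}$ such that $y$ is not a descendant of $v=\mathrm{LSA}(\{x,z\})$ in the trinet $T$ on $\{x,y,z\}$. We now claim that the arc entering $v$ is a cut-arc of $T$. If it is not, then there is some arc $(u,w)$ of $T$ with $w\neq v$ such that $u$ is not a descendant of $v$ and $w$ is a descendant of $v$. This arc $(u,w)$ must lie on a path from the root to at least one of $x,y,z$. However, it cannot be on a path from the root to $x$ or $z$, because each such path passes through $v$. Also, it cannot be on a path from the root to $y$, because such a path does not contain any descendants of $v$. Hence, the arc entering $v$ is a cut-arc and $\{x,z\}$ is a cut-arc set of $T$. Thus $T$ has a minimal cut-arc set not containing $y$, which contradicts the assumption that $(x,y)$ is an arc of $\Omega_0(\mathcal{T})$.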

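Now let $(x,y)$ be an arc of $\mathcal{C}(\mathcal{T})$ and let $z\in X\setminus\{x,y\}$. Then $y$ is a descendant of $\mathrm{LSA}(\{x,z\})$ in the trinet $T$ on $\{x,y,z\}$ in $\mathcal{T}$. Hence, $\{x,z\}$ is not a cut-arc set of $T$. Since a minimal cut-arc set contains at least two leaves, it follows that $T$ has no minimal cut-arc set not containing $y$. As this holds for every choice of $z$, it now follows that $(x,y)$ is an arc of $\Omega_0(\mathcal{T})$. ∎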

Since we consider trinet sets that are not necessarily exactly the trinet set of some network, we cannot always simply use the digraph $\Omega_0(\mathcal{T})$. In particular, it may happen that $\Omega_0(\mathcal{T})$ has no arcs. We therefore use the strategy described in Algorithm 2.

Data: Multiset $\mathcal{T}$ of level-2 trinets and (possibly) binets on taxon set $X$.
Result: Set $C\subseteq X$.
1 for $i=0,1,2,\ldots$ do
2       Construct $\Omega_i(\mathcal{T})$ (see Definition 4);
3       if $\Omega_i(\mathcal{T})$ has at least one arc then
4             if $\Omega_i(\mathcal{T})$ has a strongly connected component that is a minimal sink set then
5                   return a smallest such set;
7             else
8                   Construct the condensed digraph $\Omega_i^c$ of $\Omega_i(\mathcal{T})$;
9                   Find a vertex $v$ of $\Omega_i^c$ with a minimum number of children, over all vertices with at least one child;
10                  return the set of vertices of $\Omega_i(\mathcal{T})$ corresponding to $v$ and its children;
Algorithm 2 Finding a cut-arc set
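Algorithm 2 can be sketched in Python as follows, again assuming that each input network is summarised by its leaf set and its minimal cut-arc set (an assumed input format; all names are hypothetical). Strongly connected components are computed with Tarjan's algorithm, written recursively and therefore only suitable for small inputs, and the condensed digraph is represented implicitly by the children of each component. This is an illustrative transcription rather than the authors' implementation; ties are broken arbitrarily.

```python
from itertools import permutations


def omega(nets, i):
    leaves = set().union(*(L for L, _ in nets))
    bad = {p: 0 for p in permutations(sorted(leaves), 2)}
    for L, C in nets:
        if len(L) == 3:
            for x, y in permutations(L, 2):
                if y not in C:
                    bad[(x, y)] += 1
    return {p for p, k in bad.items() if k <= i}


def sccs(vertices, arcs):
    """Tarjan's strongly connected components algorithm (recursive)."""
    succ = {v: [] for v in vertices}
    for u, v in arcs:
        succ[u].append(v)
    index, low, stack, on_stack, out = {}, {}, [], set(), []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in succ[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            out.append(frozenset(comp))

    for v in vertices:
        if v not in index:
            visit(v)
    return out


def find_cut_arc_set(nets):
    leaves = sorted(set().union(*(L for L, _ in nets)))
    if len(leaves) < 2:
        raise ValueError("need at least two taxa")
    i = 0
    while True:
        arcs = omega(nets, i)
        if arcs:
            comps = sccs(leaves, arcs)
            comp_of = {v: c for c in comps for v in c}
            # components with >= 2 vertices that no arc leaves
            sink = [c for c in comps
                    if len(c) >= 2 and all(v in c for u, v in arcs if u in c)]
            if sink:
                return min(sink, key=len)
            # condensed digraph: children of each strongly connected component
            kids = {c: {comp_of[v] for u, v in arcs if u in c and comp_of[v] != c}
                    for c in comps}
            best = min((c for c in comps if kids[c]), key=lambda c: len(kids[c]))
            return best.union(*kids[best])
        i += 1


T_tree = [
    (frozenset("abc"), frozenset("ab")),
    (frozenset("abd"), frozenset("ab")),
    (frozenset("acd"), frozenset("ac")),
    (frozenset("bcd"), frozenset("bc")),
]
```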

It follows from Theorem 2 and Lemma 1 that this process produces a minimal cut-arc set if the input set is equal to $\mathcal{T}(N)$ for some level-2 network $N$. Since $\Omega_0(\mathcal{T})$ is not affected by binets or by multiple copies of trinets, the same holds when $\mathcal{T}$ is a multiset of binets and trinets whose underlying set consists of $\mathcal{T}(N)$ together with binets of $N$.

3.3 Constructing a simple network

Once we have found a minimal cut-arc set $C$, we need to construct the part of the network below the corresponding cut-arc. To do this, we restrict $\mathcal{T}$ to $C$ and find a simple network for the resulting multiset $\mathcal{T}_C$.

If the underlying set of $\mathcal{T}$ is $\mathcal{T}(N)$ with $N$ a level-2 network and $C$ is a minimal cut-arc set of $N$, then the underlying set of $\mathcal{T}_C$ is $\mathcal{TB}(N_C)$, with $N_C=N|_C$ either a tree with two leaves or a simple network.

3.3.1 The number of reticulations

Let $p_k$ be the fraction of the trinets in $\mathcal{T}_C$ that are strictly level-$k$. If $|C|=2$, we construct a network equal to a binet with maximum multiplicity in $\mathcal{T}_C$. Otherwise, we set the number of reticulations $k$ to 1 if $p_2$ is below a threshold, and to 2 otherwise.

Suppose $\mathcal{T}_C$ has underlying set $\mathcal{TB}(N_C)$ with $N_C$ either a tree with two leaves or a level-2 network that is simple (note that it may also be level-1). If $N_C$ has two leaves, then all binets in $\mathcal{T}_C$ are equal to $N_C$ and the algorithm correctly constructs $N_C$. Now assume $|C|\ge 3$. If $N_C$ is a simple level-1 network, then $p_2=0$, so the algorithm correctly sets the number of reticulations to 1. Finally, suppose $N_C$ is a simple strictly level-2 network. Then a positive fraction of the trinets in $\mathcal{T}_C$ are strictly level-2, since any crucial trinet of $N_C$ is strictly level-2. Hence, $p_2$ reaches the threshold and the algorithm correctly sets the number of reticulations to 2.
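A sketch of this decision rule. The strict level of each trinet in $\mathcal{T}_C$ is assumed to be known already, and the cut-off for $p_2$ is exposed as a parameter `threshold`, a hypothetical name, since its exact value is not given in this text; the function names are hypothetical as well.

```python
from collections import Counter


def most_frequent_binet(binets):
    """Two-leaf case: a binet with maximum multiplicity (ties broken arbitrarily)."""
    return Counter(binets).most_common(1)[0][0]


def number_of_reticulations(trinet_levels, threshold):
    """Choose k in {1, 2} from the strict levels of the trinets in T_C.

    k = 1 if the fraction p_2 of strictly level-2 trinets is below the
    (hypothetical) threshold parameter, and k = 2 otherwise.
    """
    p2 = sum(1 for k in trinet_levels if k == 2) / len(trinet_levels)
    return 1 if p2 < threshold else 2
```

With the paper's argument in mind, any threshold that is positive but no larger than the fraction of crucial trinets would separate the two cases on exact input.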

3.3.2 Leaves on reticulation sides

Let $k$ be the number of reticulations determined in the previous subsection. Let $G$ be a generator that is the underlying generator of the maximum number of strictly level-$k$ trinets in $\mathcal{T}_C$, and let $\mathcal{T}_G$ be the set of trinets in $\mathcal{T}_C$ that have underlying generator $G$.

For each $x\in C$ and for each set of symmetric reticulation sides $R$ of $G$, let $f_R(x)$ denote the fraction of trinets in $\mathcal{T}_G$ that have leaf $x$ on a side in $R$. We proceed greedily as follows. Pick $(x,R)$ maximizing $f_R(x)$ over all leaves $x$ that have not yet been assigned to a side and over all $R$ containing at least one side that has not yet been assigned a leaf. Assign $x$ to an arbitrary side in $R$ that has not yet been assigned a leaf. Repeat until all reticulation sides have been assigned a leaf. Attach each leaf assigned to a reticulation side to this side.
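A sketch of the greedy step, assuming the fractions $f_R(x)$ have already been computed from $\mathcal{T}_G$ and are passed in as a dictionary keyed by (leaf, class) pairs, with each class mapped to its list of reticulation sides; this input format and the function name are hypothetical.

```python
def assign_reticulation_sides(frac, classes):
    """Greedy assignment of leaves to reticulation sides.

    frac[(x, R)]: fraction of trinets in T_G with leaf x on a side in class R.
    classes: class name R -> list of reticulation sides in R.
    Returns a dict mapping each reticulation side to its assigned leaf.
    """
    free = {R: list(sides) for R, sides in classes.items()}
    assigned, used = {}, set()
    while any(free.values()):
        x, R = max(((x, R) for x, R in frac if x not in used and free[R]),
                   key=lambda pair: frac[pair])
        side = free[R].pop(0)  # any free side of R will do; its sides are symmetric
        assigned[side] = x
        used.add(x)
    return assigned
```

For example, with two singleton classes and fractions favouring a for the first and b for the second, the sketch assigns a to r1 and b to r2.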

Let $\mathcal{T}'_G$ be the set of trinets in $\mathcal{T}_C$ that have underlying generator $G$ and that have an automorphism such that each reticulation side of $G$ contains its assigned leaf. From now on, we assume that each reticulation side of the generator of each trinet in $\mathcal{T}'_G$ contains its assigned leaf.

Suppose the underlying set of $\mathcal{T}_C$ is $\mathcal{TB}(N_C)$ for some simple level-2, strictly level-$k$, network $N_C$. Then all strictly level-$k$ trinets have the same underlying generator as $N_C$. Moreover, for each set $R$ of symmetric reticulation sides, $f_R(x)=1$ for all leaves $x$ that are on a side in $R$ in $N_C$, and $f_R(x)=0$ otherwise. Hence, the algorithm correctly assigns leaves to sets of symmetric reticulation sides. It can assign leaves to an arbitrary side within such a set because level-2 generators have at most one set of symmetric reticulation sides containing more than one side (see Figure 2), and the sides in such a set are symmetric.

3.3.3 Leaves per set of symmetric arc sides

For each leaf $x$ that has not been assigned to a reticulation side, assign $x$ to a set of symmetric arc sides $S$ of $G$ maximizing the fraction of trinets in $\mathcal{T}'_G$ that have leaf $x$ on a side in $S$.
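This step is a simple arg-max per leaf. In the sketch below, the fractions are assumed to be precomputed from the trinets kept after the reticulation step and passed in as a dictionary keyed by (leaf, set) pairs; this input format and the function name are hypothetical.

```python
def assign_arc_classes(frac, leaves, classes):
    """Map each remaining leaf x to the set S of symmetric arc sides
    maximising frac[(x, S)]; missing entries count as 0."""
    return {x: max(classes, key=lambda S: frac.get((x, S), 0.0)) for x in leaves}
```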

Suppose the underlying set of $\mathcal{T}_C$ is $\mathcal{TB}(N_C)$ for some simple level-2 network $N_C$. Then it can be argued as in the previous subsection that the algorithm assigns each leaf to the set of symmetric arc sides corresponding to its location in $N_C$.

3.3.4 Leaves per arc side

Consider a set of symmetric arc sides $S$ and the set of leaves $L_S$ assigned to $S$. For $x,y\in L_S$, let $\mathcal{T}_{x,y}$ denote the set of simple trinets in $\mathcal{T}_C$ containing both $x$ and $y$, and let $p(x,y)$ denote the fraction of trinets in $\mathcal{T}_{x,y}$ in which $x$ and $y$ are on the same side of the underlying generator. We define the following score for $x,y\in L_S$:

The main idea of this score function is that, assuming the trinets come from some level-2 network, the score of $x$ and $y$ is large if and only if $x$ and $y$ are on the same side.

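The algorithm proceeds as follows. Create a partition $\mathcal{P}$ of $L_S$, initially consisting of only singletons. While $|\mathcal{P}|>|S|$ or there exist $A,B\in\mathcal{P}$ whose score still indicates that they lie on the same side, pick a pair $A,B\in\mathcal{P}$ maximizing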

(1)

Merge sets $A$ and $B$ in $\mathcal{P}$.

Finally, assign the parts of $\mathcal{P}$ to the sides in $S$ injectively and at random.
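The merging procedure is a form of greedy agglomerative clustering. The sketch below follows its structure, but because equation (1) is not reproduced in this text it uses the mean pairwise score between two parts as a stand-in linkage, and it assumes that merging continues while there are more parts than sides or while some pair of parts still has positive linkage. The pairwise score `s` is supplied by the caller, and all names are hypothetical.

```python
import random
from itertools import combinations


def group_leaves(leaves, sides, s, rng=random):
    """Greedily merge leaves into at most len(sides) parts (sketch).

    s(x, y): pairwise score, assumed positive iff x and y share a side.
    The linkage below (mean pairwise score) is a stand-in for equation (1).
    """
    parts = [frozenset([x]) for x in leaves]

    def linkage(A, B):
        return sum(s(x, y) for x in A for y in B) / (len(A) * len(B))

    while len(parts) > 1:
        A, B = max(combinations(parts, 2), key=lambda pair: linkage(*pair))
        if len(parts) <= len(sides) and linkage(A, B) <= 0:
            break  # few enough parts and no pair worth merging
        parts = [P for P in parts if P not in (A, B)] + [A | B]
    # assign the parts to distinct sides chosen at random (an injective map)
    chosen = rng.sample(list(sides), len(parts))
    return dict(zip(chosen, parts))
```

With a score that is +1 for leaves sharing a side and -1 otherwise, the sketch recovers the two groups of leaves and places them on the two sides in random order.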

Suppose the underlying set of  is  for some simple level-2 network . The only level-2 generators with symmetric arc sides (see Figure 2) are and  with and with , . If  are on the same side then  and otherwise we have . We can now see that if  are on the same side then is equal to the number of leaves on that side (since each of the three sums is equal to the number of leaves on that side) which is at least . If, on the other hand,  are on different sides, then  (since the first sum is  and the other two sums are at least ). Hence, the algorithm correctly splits the leaves in  into two sets corresponding to the leaves on side  and  (or  and ). For generators  and  it does not matter which set is assigned to which side, by symmetry. For generator , this does matter. It is done randomly here and corrected if necessary in the next subsection.

3.3.5 Side alignment

The following is only necessary when the underlying generator $G$ is the generator from Figure 2 with three sets of symmetric arc sides, since it is the only generator containing more than one set of symmetric arc sides. Call its sets of symmetric arc sides $\{s_1,s_1'\}$, $\{s_2,s_2'\}$ and $\{s_3,s_3'\}$. We have to consider swapping sides $s_2$ and $s_2'$ and/or sides $s_3$ and $s_3'$ (i.e., assigning the leaves assigned to $s_2$ to $s_2'$ and vice versa, and/or assigning the leaves assigned to $s_3$ to $s_3'$ and vice versa). From the four possibilities, we choose the one maximizing the following score:

(2)

with

(3)

Suppose the underlying set of  is  for some simple level-2 network  with underlying generator . Then we have that  if  or  and if  or vice versa. Hence,  and . Therefore, choosing the assignment maximizing (2), out of all possible assignments, chooses the assignment corresponding to .

3.3.6 Ordering the leaves on the arc sides

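Consider a side $s$ and the set of leaves $L_s$ assigned to side $s$. For $x,y\in L_s$, let $\mathcal{T}_{x,y}$ denote the set of simple trinets in $\mathcal{T}_C$ containing both $x$ and $y$, with both on the same side. Let $q(x,y)$ denote the fraction of trinets in $\mathcal{T}_{x,y}$ in which the parent of $x$ is an ancestor of $y$. Let $\pi$ be an ordered list of leaves, which is initially empty. Find a leaf $x\in L_s$ that is not yet in $\pi$, maximizing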

(4)

Append leaf $x$ to $\pi$ and continue until $\pi$ is a permutation of $L_s$. The permutation $\pi$ then describes the ordering of the leaves on side $s$. Attach the list of leaves $\pi$ to side $s$.

Suppose the underlying set of $\mathcal{T}_C$ is $\mathcal{TB}(N_C)$ for some simple level-2 network $N_C$. For two leaves $x,y$ on the same arc side $s$ of $N_C$, we have that $q(x,y)=1$ if the parent of $x$ is an ancestor of $y$, and $q(x,y)=0$ otherwise. Hence, (4) is equal to the number of leaves that have not yet been added to $\pi$ and are below $x$ on side $s$, minus the number of leaves that have not yet been added to $\pi$ and are above $x$ on side $s$. This value is maximized by the highest leaf on side $s$ that is not yet in $\pi$, since no remaining leaf lies above it. Therefore, the algorithm constructs the ordering $\pi$ of the leaves on side $s$ in $N_C$.
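In the exact case just described, (4) equals the number of remaining leaves below $x$ minus the number of remaining leaves above it. The sketch below uses $\sum_{y}\bigl(q(x,y)-q(y,x)\bigr)$ over the remaining leaves $y\neq x$, which takes exactly this value when $q$ only takes the values 0 and 1; we use it as an assumed stand-in, since equation (4) itself is not reproduced here. The function name is hypothetical.

```python
def order_side(leaves, q):
    """Greedily order the leaves on one arc side, from top to bottom.

    q(x, y): fraction of trinets in which the parent of x is an ancestor of y.
    The score sum(q(x, y) - q(y, x)) over remaining y is a stand-in for (4).
    """
    remaining, order = set(leaves), []
    while remaining:
        x = max(sorted(remaining),
                key=lambda x: sum(q(x, y) - q(y, x) for y in remaining if y != x))
        order.append(x)
        remaining.remove(x)
    return order
```

For instance, if the true top-to-bottom order is a, b, c and $q$ is exact, the scores in the first round are 2, 0 and -2, so a is chosen first, then b, then c.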

The pseudo code for constructing a simple network is in Algorithm 3.

Data: Multiset $\mathcal{T}_C$ of level-2 trinets and (possibly) binets on taxon set $C$.
Result: Simple level-2 network $N_C$ on $C$.
//Determine the level $k$
1 Let $n=|C|$ and let $p_k$ be the fraction of trinets in $\mathcal{T}_C$ that are strictly level-$k$;
2 if $n=2$ then
3       return an arbitrary network with maximum multiplicity in $\mathcal{T}_C$
4 if $p_2$ is below the threshold (Section 3.3.1) then
5       $k\leftarrow 1$;
6 else
7       $k\leftarrow 2$;
//Determine the generator
8 $G\leftarrow$ the underlying generator of the maximum number of strictly level-$k$ trinets in $\mathcal{T}_C$;
9 $N_C\leftarrow G$;
//Assign leaves to reticulation sides
10 $\mathcal{T}_G\leftarrow$ the set of trinets in $\mathcal{T}_C$ that have underlying generator $G$;
11 while there is a reticulation side of $G$ that has not been assigned a leaf do
12       Find $x\in C$ that has not been assigned to a side and a set of symmetric reticulation sides $R$ that have not all been assigned a leaf, maximizing the fraction of trinets in $\mathcal{T}_G$ that have leaf $x$ on a side in $R$;
13       Assign $x$ to an arbitrary unassigned side in $R$ and attach $x$ to this side in $N_C$;
15 $\mathcal{T}'_G\leftarrow$ the set of trinets in $\mathcal{T}_C$ that have underlying generator $G$ and that have an automorphism such that each reticulation side of $G$ contains its assigned leaf;
16 Relabel the sides of the generators of the trinets in $\mathcal{T}'_G$ such that each reticulation side contains its assigned leaf;
//Assign leaves to sets of symmetric arc sides
17 for each leaf $x$ that has not been assigned to a reticulation side do
18       Assign $x$ to a set of symmetric arc sides $S$ maximizing the fraction of trinets in $\mathcal{T}'_G$ that have leaf $x$ on a side in $S$;
Continued in Algorithm 4
Algorithm 3 Constructing a simple level-2 network
//Assign leaves to arc sides
1 for each set of symmetric arc sides $S$ do
2       $\mathcal{P}\leftarrow$ partition of $L_S$ containing only singletons;
3       $p(x,y)\leftarrow$ the fraction of simple trinets containing $x,y$ in which $x,y$ are on the same side;
4       ;
5       ;
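6       while there exist $A,B\in\mathcal{P}$ whose score indicates that they lie on the same side, or $|\mathcal{P}|>|S|$ do
7             Find a pair $A,B\in\mathcal{P}$ maximizing (1);
8             Merge sets $A$ and $B$ in $\mathcal{P}$ and update the scores;
10      while there is a leaf in $L_S$ that has not been assigned to a side do
11             Pick a set $A\in\mathcal{P}$ containing a leaf that has not been assigned to a side;
12             Pick a side $s\in S$ that has not been assigned any leaves;
13             Assign the leaves from $A$ to side $s$;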
//Align sides
16 if $G$ is the generator from Figure 2 with three sets of symmetric arc sides then
17       Find bijections and maximizing ;
18       with ;
19       Assign the leaves assigned to  to , respectively;
20      
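//Order leaves on arc sides
21 for each arc side $s$ with set $L_s$ of assigned leaves do
22       $\mathcal{T}_{x,y}\leftarrow$ the set of simple trinets in $\mathcal{T}_C$ containing $x$ and $y$ on the same side;
23       $q(x,y)\leftarrow$ the fraction of trinets in $\mathcal{T}_{x,y}$ in which the parent of $x$ is an ancestor of $y$;
24       $\pi\leftarrow$ the empty list;
25       while $\pi$ is not a permutation of $L_s$ do
26             Find a leaf $x\in L_s$ not yet in $\pi$ maximizing (4) and append $x$ to $\pi$;
28      Attach the list of leaves $\pi$ to side $s$ in $N_C$;
30 return $N_C$;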
Algorithm 4 Continuation of Algorithm 3

3.4 Theoretical result

The following theorem shows that the algorithm is guaranteed to reconstruct a level-2 network from its set of trinets.

Theorem 3.

If $N$ is a recoverable, binary level-2 network on $X$ with $|X|\ge 3$, then Algorithm 1 will output $N$ when applied to input $\mathcal{T}(N)$.

Proof.

The proof is by induction on the number of vertices of .

The base case is that $N$ is a tree with three leaves and five vertices. Say that $X=\{x,y,z\}$ and that $\{x,y\}$ is the minimal cut-arc set. In this case, the algorithm will generate $C=\{x,y\}$ (see Section 3.2). The multiset $\mathcal{T}_C$ contains only the tree on $\{x,y\}$, and hence this tree is constructed as $N_C$ (see Section 3.3.1). The multiset $\mathcal{T}'$ contains only the tree on $\{c,z\}$, and hence $N'$ is this tree. Combining $N'$ and $N_C$ then gives $N$ (see Section 3.1).

Now suppose $N$ has more than five vertices. Then the algorithm finds a minimal cut-arc set $C$ of $N$ by Section 3.2. Let $(u,v)$ be the corresponding cut-arc of $N$, and let $N_C$ be the subnetwork of $N$ rooted at $v$. Let $N'$ be the network obtained from $N$ by deleting all vertices of $N_C$ except for $v$ and labelling $v$ by $c$. Then the underlying set of $\mathcal{T}_C$ is $\mathcal{TB}(N_C)$, and the underlying set of $\mathcal{T}'$ consists of $\mathcal{T}(N')$ and possibly some binets of $N'$. We have argued in Section 3.3 that the algorithm constructs $N_C$ (which is either a tree with two leaves or a simple network) from $\mathcal{T}_C$. If $C=X$, then $N=N_C$ since $N$ is recoverable, and we are done. Otherwise, $N'$ contains fewer vertices than $N$. If $N'$ has at least three leaves, the algorithm constructs $N'$ from $\mathcal{T}'$ by induction. If $N'$ has two leaves, then $\mathcal{T}'$ only contains copies of the binet $N'$, and hence the algorithm constructs $N'$ (see Section 3.3.1). In both cases, combining $N'$ and $N_C$ gives $N$ (see Section 3.1). ∎

It remains to analyse the running time of the algorithm, where $t=|\mathcal{T}|$ and $n=|X|$. Algorithm 2 can be implemented efficiently, similarly to the level-1 case in trilonet. The main idea is to first compute, for each pair of leaves $x,y$, the number $f(x,y)$ of trinets containing $x$ and $y$ that have a minimal cut-arc set not containing $y$. This requires only a single pass through the set of trinets, in which, for each trinet $T$, we update the values $f(x,y)$ with $x,y\in L(T)$. Finding a minimal cut-arc set in a trinet can be done in constant time, as the size of each trinet is bounded by a constant (any trinet that is not recoverable can be ignored). After that, the digraph $\Omega_i(\mathcal{T})$ can be constructed in $O(n^2)$ time, and this only needs to be done for the smallest $i$ for which $f(x,y)\le i$ for at least one pair $(x,y)$. The condensed digraph can be found in $O(n^2)$ time with Tarjan's algorithm for computing strongly connected components, which runs in time linear in the size of the digraph. Since the number of generators, and the number of sides of each generator, is bounded by a constant, the bottleneck of Algorithm 3 is the computation of the scores in Line 4 of its continuation, Algorithm 4. The fractions $p(x,y)$ can be computed in a single pass through the trinets, after which the scores can be computed by looping over the leaves and updating the scores of the affected pairs; this is repeated once for each set of symmetric arc sides, of which there are only a constant number. Computing $\mathcal{T}'$ and $\mathcal{T}_C$ can be done in $O(t)$ time since the size of the trinets is bounded by a constant. All of this has to be repeated at most $n$ times, since each recursive call reduces the number of leaves by at least one. Hence, the algorithm runs in $O(t\cdot n+n^4)$ time.

4 Discussion

We have presented an algorithm that, for an input set of trinets (and possibly binets) with leaf-set $X$, outputs a level-2 network on $X$ in $O(t\cdot n+n^4)$ time and that is guaranteed to reconstruct a level-2 network from its set of trinets. Note that a variant of this algorithm is presented in sjors. It should also be noted that our level-2 algorithm cannot be used to decide in polynomial time whether or not an arbitrary set of trinets is contained in some level-2 network. Indeed, if an arbitrary set of level-1 trinets is input into the algorithm, then it will output a level-1 network, but it is known that deciding whether or not an arbitrary set of level-1 trinets is contained in a level-1 network is NP-complete (huber2017reconstructing). In addition, in light of these observations concerning level-1 trinets, our algorithm can be used to build level-1 networks for more general inputs than the level-1 TriLoNet algorithm described in trilonet, since TriLoNet's input is restricted to collections in which there is a trinet on every 3-subset of the leaf-set.

In terms of potential applications of our level-2 algorithm, in trilonet a method is presented to derive collections of level-1 trinets from molecular sequence data; it would be interesting to see if this approach could be extended (or a new approach developed) to derive level-2 trinets as well. We expect that this could be quite complicated as level-2 trinets (and even level-1 trinets) can be quite complex, and so it may be necessary to restrict the level-1/level-2 building blocks to some subset of the list of potential 3-leaved networks.

In another direction, in this paper we have shown that level-3 networks are not necessarily encoded by their trinets. However, Figure 1 is essentially the only case in which a level-3 network is not encoded (leonie), and so it would be interesting to investigate whether there is a polynomial-time algorithm for constructing level-3 networks from trinets modulo this symmetry. Alternatively, it can be shown that the collection of 4-leaved networks (or quarnets) contained in a level-3 network encodes the network (leonie), and so new algorithms could potentially be developed to build level-3 networks from quarnets. In this vein, an interesting open question is whether or not a level-$k$ network is always encoded by its $(k+1)$-nets. Some partial results concerning this question are presented in frank.

References

  • [1] Magnus Bordewich, Britta Dorn, Simone Linz, and Rolf Niedermeier. Algorithms and complexity in phylogenetics. In Dagstuhl Reports, volume 9. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2020.
  • [2] R. A. Leo Elworth, Huw A. Ogilvie, Jiafan Zhu, and Luay Nakhleh. Advances in Computational Methods for Phylogenetic Networks in the Presence of Hybridization, pages 317–360. Springer International Publishing, Cham, 2019.
  • [3] Philippe Gambette and Katharina T Huber. On encodings of phylogenetic networks of bounded level. Journal of Mathematical Biology, 65(1):157–180, 2012.
  • [4] Katharina T Huber and Vincent Moulton. Encoding and constructing 1-nested phylogenetic networks with trinets. Algorithmica, 66(3):714–738, 2013.
  • [5] Katharina T Huber, Vincent Moulton, and Taoyang Wu. Hierarchies from lowest stable ancestors in nonbinary phylogenetic networks. Journal of Classification, 36(2):200–231, 2019.
  • [6] Katharina T Huber, Leo van Iersel, Steven Kelk, and Radoslaw Suchecki. A practical algorithm for reconstructing level-1 phylogenetic networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 8(3):635–649, 2010.
  • [7] Katharina T Huber, Leo van Iersel, Vincent Moulton, Celine Scornavacca, and Taoyang Wu. Reconstructing phylogenetic level-1 networks from nondense binet and trinet sets. Algorithmica, 77(1):173–200, 2017.
  • [8] Katharina T Huber, Leo van Iersel, Vincent Moulton, and Taoyang Wu. How much information is needed to infer reticulate evolutionary histories? Systematic Biology, 64(1):102–111, 2015.
  • [9] Leo van Iersel and Vincent Moulton. Trinets encode tree-child and level-2 phylogenetic networks. Journal of Mathematical Biology, 68(7):1707–1729, 2014.
  • [10] Frank Janisse. Encoding level-k phylogenetic networks, 2021. MSc thesis, TU Delft, http://resolver.tudelft.nl/uuid:11939b58-b834-4073-8de8-b61d9a5f9a81.
  • [11] Sjors Kole. Constructing level-2 phylogenetic networks from trinets, 2020. MSc thesis, TU Delft, http://resolver.tudelft.nl/uuid:c699ea63-f8c8-40f7-8f07-11ac055c42e0.
  • [12] Leonie Nipius. Rooted binary level-3 phylogenetic networks are encoded by quarnets, 2020. BSc thesis, TU Delft, http://resolver.tudelft.nl/uuid:a9c5a8d4-bc8b-4d15-bdbb-3ed35a9fb75d.
  • [13] James Oldman, Taoyang Wu, Leo van Iersel, and Vincent Moulton. TriLoNet: Piecing Together Small Networks to Reconstruct Reticulate Evolutionary Histories. Molecular Biology and Evolution, 33(8):2151–2162, 2016.
  • [14] Charles Semple and Gerry Toft. Trinets encode orchard phylogenetic networks. Journal of Mathematical Biology, 83(3):1–20, 2021.
  • [15] Mike Steel. Phylogeny: discrete and random processes in evolution. SIAM, 2016.
  • [16] Stephen Willson. Regular networks can be uniquely constructed from their trees. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 8(3):785–796, 2010.