1 Introduction
To represent evolutionary relationships among species, phylogenetic trees have long been a powerful tool. However, it is now acknowledged that not only speciation but also non-treelike processes such as hybridization and lateral gene transfer are driving forces in the evolution of certain groups of organisms (e.g. bacteria, plants, and fish) mallet16; soucy15, and phylogenetic networks are therefore becoming more widely used to represent ancestral histories. A phylogenetic network is a generalization of a rooted phylogenetic tree. More precisely, such a network is a rooted directed acyclic graph whose leaves are labeled huson10.
The following optimization problem, which is biologically relevant and mathematically challenging, motivates much of the theoretical work on reconstructing phylogenetic networks from phylogenetic trees. Given a collection of rooted binary phylogenetic trees on a set of species such that the collection correctly represents the treelike evolution of different parts of the species’ genomes, what is the smallest number of reticulation events required to simultaneously embed all trees of the collection into a phylogenetic network? Here, reticulation events collectively refer to all non-treelike events, and they are represented by vertices in a phylogenetic network whose indegree is at least two. Without any structural constraints on a phylogenetic network, it is well-known that such a collection can always be embedded into a network baroni05; semple07 and, hence, the optimization problem is well-defined. Moreover, although the problem is NP-hard bordewich07, even for collections of only two trees, several exact algorithms have been developed that, given two rooted phylogenetic trees, construct a phylogenetic network whose number of reticulation events is minimized over the space of all networks that embed both trees albrecht12; chen13; piovesan12; wu10.
Motivated by the introduction of temporal networks baroni06; moret04, which are phylogenetic networks that satisfy several time constraints, Humphries et al. humphries13; humphries13a recently investigated the special case of the aforementioned optimization problem in which the number of reticulation events is minimized over the smaller space of all temporal networks that embed a given collection of rooted binary phylogenetic trees. More precisely, in the context of their two papers, the authors considered temporal networks to be phylogenetic networks that satisfy the following three constraints:

speciation events occur successively,

reticulation events occur instantaneously, and

each nonleaf vertex has a child whose indegree is one.
The second constraint implies that the three species that are involved in a reticulation event, i.e. the new species resulting from this event and its two distinct parents, must coexist in time. Moreover, a phylogenetic network that satisfies the third constraint (but not necessarily the first two constraints) is referred to as a tree-child network in the literature cardona12. Intuitively, if a phylogenetic network is temporal, then one can assign a time stamp to each of its vertices such that the following holds for each edge of the network. If the head of the edge is a reticulation, then both end vertices of the edge are assigned the same time stamp. Otherwise, the time stamp assigned to the head of the edge is strictly greater than that assigned to its tail. Baroni et al. baroni06 showed that it can be checked in polynomial time whether or not a given phylogenetic network satisfies the first two constraints.
Humphries et al. humphries13 established a new characterization for computing the minimum number of reticulation events that is needed to simultaneously embed an arbitrarily large collection of rooted binary phylogenetic trees into a temporal network. This characterization, which is formally defined in Section 2, is in terms of cherries and the existence of a particular type of sequence on the leaves of the trees, called a cherry-picking sequence. It was shown that such a sequence exists for a collection of trees if and only if the trees in the collection can simultaneously be embedded into a temporal network (humphries13, Theorem 1). Moreover, a cherry-picking sequence for the collection can be exploited further to compute the minimum number of reticulation events that is needed over all temporal networks. Importantly, not every collection is guaranteed to have a solution, i.e. there may be no cherry-picking sequence for the collection and, hence, no temporal network that embeds all of its trees. It was left as an open problem by Humphries et al. humphries13 to analyze the computational complexity of deciding whether or not has a cherry-picking sequence for when .
In this paper, we make progress towards this question and show that it is NP-complete to decide if a collection of at least eight trees has a cherry-picking sequence. Translated into the language of phylogenetic networks, this result directly implies that it is computationally hard to decide if a collection of at least eight rooted binary phylogenetic trees can simultaneously be embedded into a temporal network. To establish our result, we use a reduction from a variant of the Intermezzo problem guttmann06. On a more positive note, we show that deciding if a collection has a cherry-picking sequence can be done in polynomial time if the number of trees and the number of cherries in each such tree are bounded by a constant. To this end, we explore connections between phylogenetic trees and automata theory and show how the problem at hand can be solved by using a deterministic finite automaton.
The remainder of the paper is organized as follows. The next section contains notation and terminology that is used throughout the paper. Section 3 establishes NP-completeness of a variant of the Intermezzo problem which is then, in turn, used in Section 4 to show that it is NP-complete to decide if a collection of at least eight trees has a cherry-picking sequence. In Section 5, we show that deciding if a collection has a cherry-picking sequence is polynomial-time solvable if the number of cherries in each tree and the size of the collection are bounded by a constant. We finish the paper with some concluding remarks in Section 6.
2 Preliminaries
This section provides notation and terminology that is used in the subsequent sections. Throughout this paper, denotes a finite set.
Phylogenetic trees. A rooted binary phylogenetic tree is a rooted tree with leaf set and, apart from the root which has degree two, all interior vertices have degree three. Furthermore, a pair of leaves of is called a cherry if and are leaves that are adjacent to a common vertex. Note that every rooted binary phylogenetic tree has at least one cherry. We denote by the number of cherries in . We now turn to a rooted binary phylogenetic tree with exactly one cherry. More precisely, we call a caterpillar if and the elements in can be ordered, say , so that is a cherry and, if denotes the parent of , then, for all , we have as an edge in , in which case we denote the caterpillar by . To illustrate, Figure 1 shows the caterpillar with cherry . Two rooted binary phylogenetic trees and are said to be isomorphic if the identity map on induces a graph isomorphism on the underlying trees.
Subtrees. Now, let be a rooted binary phylogenetic tree, and let be a subset of . The minimal rooted subtree of that connects all vertices in is denoted by . Furthermore, the rooted binary phylogenetic tree obtained from by contracting all non-root degree-2 vertices is the restriction of to and is denoted by . We also write or for short to denote . For a set of rooted binary phylogenetic trees, we write (resp. ) when referring to the set (resp. ). Lastly, a rooted binary phylogenetic tree is pendant in if it can be detached from by deleting a single edge.
Cherry-picking sequences. Let be a set of rooted binary phylogenetic trees with . We say that an ordering of the elements in , say , is a cherry-picking sequence for precisely if each with labels a leaf of a cherry in each tree that is contained in . Clearly, if , then has a cherry-picking sequence. However, if , then may or may not have a cherry-picking sequence.
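To illustrate the definition, the following Python sketch checks whether a given ordering is a cherry-picking sequence for a set of trees. It is only a sketch under stated assumptions: trees are encoded as nested pairs of string leaf labels, each element is deleted from every tree once it has been picked (so that each element except the last must label a leaf of a cherry in every remaining tree), and all function names are hypothetical rather than taken from the paper.

```python
def leaves(tree):
    """Return the set of leaf labels of a tree given as nested 2-tuples."""
    if not isinstance(tree, tuple):
        return {tree}
    return leaves(tree[0]) | leaves(tree[1])


def cherry_leaves(tree):
    """Return every label that is a leaf of some cherry of the tree."""
    if not isinstance(tree, tuple):
        return set()
    left, right = tree
    if not isinstance(left, tuple) and not isinstance(right, tuple):
        return {left, right}
    return cherry_leaves(left) | cherry_leaves(right)


def delete_leaf(tree, leaf):
    """Delete a leaf and suppress the resulting degree-2 vertex."""
    if not isinstance(tree, tuple):
        return None if tree == leaf else tree
    left, right = delete_leaf(tree[0], leaf), delete_leaf(tree[1], leaf)
    if left is None:
        return right
    if right is None:
        return left
    return (left, right)


def is_cherry_picking_sequence(trees, order):
    """Check the assumed definition: pick elements in order, deleting each one."""
    if len(set(order)) != len(order):
        return False
    if any(leaves(t) != set(order) for t in trees):
        return False
    current = list(trees)
    for x in order[:-1]:  # the last element needs no cherry
        if any(x not in cherry_leaves(t) for t in current):
            return False
        current = [delete_leaf(t, x) for t in current]
    return True


# Two trees on the leaf set {a, b, c, d} (illustrative data only).
t1 = (("a", "b"), ("c", "d"))
t2 = (("a", "c"), ("b", "d"))
print(is_cherry_picking_sequence([t1, t2], ["a", "d", "b", "c"]))  # True
print(is_cherry_picking_sequence([t1, t2], ["a", "b", "c", "d"]))  # False
```

In the second call, after a has been deleted, b is no longer in a cherry of the first tree, so the ordering is rejected.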
We now formally state the decision problem that this paper is centered around.
CPSExistence
Instance. A collection of rooted binary phylogenetic trees.
Question. Does there exist a cherry-picking sequence for ?
The significance of CPSExistence lies in its equivalence to the question of whether or not all trees of a given collection can simultaneously be embedded into a rooted phylogenetic network that satisfies the three temporal constraints described in the introduction.
Automata and languages. Let be an alphabet. A language is a subset of all possible strings (also called words) whose symbols are in . More precisely, is a subset of , where the operator is the Kleene star. A deterministic finite automaton (or automaton for short) is a tuple , where

is a finite set of states,

is a finite alphabet,

is a transition relation,

is the initial state, and

are final states.
A given automaton accepts a word if and only if is in a final state after having read all symbols from left to right, i.e.
The language that is recognized by is defined as the set of words that accepts. For the automata constructed in this paper, we have and being a total function that maps each pair of a state in and a symbol in to a state in . For a detailed introduction to automata theory and languages, see the book by Hopcroft and Ullman hopcroft79 .
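As a small illustration of the acceptance condition, the following Python sketch models a deterministic finite automaton with a total transition function and tests whether a word is accepted. The class name, field names, and the example automaton are hypothetical and serve only to make the definition concrete.

```python
from dataclasses import dataclass


@dataclass
class DFA:
    states: frozenset
    alphabet: frozenset
    delta: dict          # total map: (state, symbol) -> state
    initial: object
    finals: frozenset

    def accepts(self, word):
        """Read the word from left to right and test for a final state."""
        state = self.initial
        for symbol in word:
            if symbol not in self.alphabet:
                return False
            state = self.delta[(state, symbol)]
        return state in self.finals


# Example automaton (an assumption for illustration): it accepts exactly
# the words over {a, b} that contain an even number of the symbol 'a'.
even_a = DFA(
    states=frozenset({"even", "odd"}),
    alphabet=frozenset({"a", "b"}),
    delta={
        ("even", "a"): "odd",
        ("even", "b"): "even",
        ("odd", "a"): "even",
        ("odd", "b"): "odd",
    },
    initial="even",
    finals=frozenset({"even"}),
)

print(even_a.accepts("abba"))  # True
print(even_a.accepts("ab"))    # False
```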
3 A variant of the Intermezzo problem
In this section, we establish NP-completeness of a variant of the ordering problem Intermezzo. Let be a finite set, and let be an ordering on the elements in . For two elements and in , we write precisely if precedes in . With this notation in hand, we now formally state Intermezzo, which was shown to be NP-complete via reduction from 3SAT (guttmann06, Lemma 1).
Intermezzo
Instance. A finite set , a collection of pairs from , and a collection of pairwise-disjoint triples of distinct elements in .
Question. Does there exist a total linear ordering on the elements in such that for each in , and or for each in ?
Example. Consider the following instance of Intermezzo with three pairs and two disjoint triples (when viewed as sets):
A total linear ordering on the elements in that satisfies all constraints defined by and is
While each element can appear an unbounded number of times in the input of a given Intermezzo instance, this number is bounded from above by in the following Intermezzo variant.
DisjointIntermezzo
Instance. A finite set , collections of pairs from , and collections of triples of distinct elements in such that, for each , the elements in are pairwise disjoint.
Question. Does there exist a total linear ordering on the elements in such that
and
Let be an instance of DisjointIntermezzo, and let be an ordering on the elements of that satisfies the two ordering constraints for each pair and triple in the statement of DisjointIntermezzo. We say that is an DisjointIntermezzo ordering for .
We next show that DisjointIntermezzo is NP-complete via reduction from the following restricted version of 3SAT.
2P2N3SAT
Instance. A set of variables, and a set of clauses, where each clause is a disjunction of exactly three literals, such that each variable appears negated exactly twice and unnegated exactly twice in .
Question. Does there exist a truth assignment for that satisfies each clause in ?
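The syntactic restrictions of 2P2N3SAT can be checked mechanically. The following Python sketch does so under an assumed encoding that is not part of the paper: variables are numbered 1 to n, and a literal is a nonzero integer whose sign indicates negation (as in the DIMACS convention). The function name is hypothetical.

```python
from collections import Counter


def is_2p2n_3sat_instance(num_vars, clauses):
    """Check that every clause has exactly three literals and that every
    variable occurs exactly twice unnegated and exactly twice negated."""
    positive, negative = Counter(), Counter()
    for clause in clauses:
        if len(clause) != 3:
            return False
        for literal in clause:
            if literal == 0 or abs(literal) > num_vars:
                return False
            if literal > 0:
                positive[literal] += 1
            else:
                negative[-literal] += 1
    return all(
        positive[v] == 2 and negative[v] == 2
        for v in range(1, num_vars + 1)
    )


# Illustrative instance with three variables and four clauses.
example = [(1, 2, 3), (1, -2, -3), (-1, 2, -3), (-1, -2, 3)]
print(is_2p2n_3sat_instance(3, example))  # True
```

Since each variable contributes exactly four literal occurrences and each clause has three literals, an instance with n variables necessarily has 4n/3 clauses.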
Berman et al. (berman03, Theorem 1) established NP-completeness for 2P2N3SAT.
Theorem 3.1
4DisjointIntermezzo is NP-complete.
We show that the construction by Guttmann and Maucher (guttmann06, Lemma 1), which was used to show that Intermezzo is NP-complete via reduction from 3SAT, yields an instance of 4DisjointIntermezzo if we reduce from 2P2N3SAT.
Using the same notation as Guttmann and Maucher (guttmann06, Lemma 1), their construction is as follows. Let be an instance of 2P2N3SAT that is given by a set of variables and a set of clauses
where each . Furthermore, for , let denote the number such that . We define the following three sets:
where is an abbreviation of with . By construction, the elements in are pairwise-disjoint triples of distinct elements in and, so, the three sets , , and form an instance of Intermezzo.
Now, we show how the pairs and triples in can be partitioned into sets with , , and such that the elements in are pairwise disjoint. Recalling that is a set of pairwise-disjoint triples, we start by setting and . Furthermore, we set
and . By construction, it is easy to check that the pairs in are pairwise disjoint. Lastly, consider the remaining pairs
and observe that the only possibility for two pairs in to have a nonempty intersection is to have an element with in common. Now, since each is equal to an element in
and each element appears exactly twice negated and twice unnegated in , it follows that there is a partition of into and so that all pairs in the resulting two sets are pairwise disjoint. Setting completes the construction of an instance of 4DisjointIntermezzo. Noting that it is straightforward to compute the partition
in polynomial time and that we did not modify the construction described by Guttmann and Maucher (guttmann06, Lemma 1) itself, it follows from the same proof that has a satisfying truth assignment if and only if has a 4DisjointIntermezzo ordering. ∎
Remark. By the construction of an instance of 4DisjointIntermezzo in the proof of Theorem 3.1, we note that no pair or triple occurs twice and that, for each , we have . We will freely use these facts throughout the remainder of the paper.
4 Hardness of CPSExistence
In this section, we show that the decision problem CPSExistence is NP-complete for any collection of rooted binary phylogenetic trees on the same leaf set that consists of a constant number of at least eight trees. To establish the result, we use a reduction from 4DisjointIntermezzo.
Let be an instance of 4DisjointIntermezzo. Using the same notation as in the definition of DisjointIntermezzo, let
and let . For each , we next construct two rooted binary phylogenetic trees. Let be the subset of that precisely contains each element of that is neither contained in an element of nor contained in an element of
Furthermore, let and both be the caterpillar shown in Figure 1. Setting , let and be the two rooted binary phylogenetic trees obtained from and that result from the following four-step process.

For each in turn, replace the leaf in (resp. ) with the 3-taxon tree on the top left (resp. bottom left) in Figure 2 and increment by one.

For each with in turn, replace the leaf in (resp. ) with the 8-taxon tree on the top right (resp. bottom right) in Figure 2 and increment by one.

For each in turn, replace the leaf in and with the cherry and increment by one.

For each element in , replace the leaf label in and with .
We call the set of intermezzo trees associated with . The next observation is an immediate consequence from the above construction and the fact that, for each , the elements in and are pairwise disjoint.
Observation 4.1
For an instance of 4DisjointIntermezzo, the set of intermezzo trees associated with consists of eight pairwise nonisomorphic rooted binary phylogenetic trees whose set of leaves is .
We now establish the main result of this section.
Theorem 4.2
CPSExistence is NP-complete for collections of eight rooted binary phylogenetic trees.
Clearly, CPSExistence for is in NP because, given an ordering on the elements in , we can decide in polynomial time if is a cherry-picking sequence for . Let be an instance of 4DisjointIntermezzo, and let be the set of eight intermezzo trees that are associated with . Note that each tree in can be constructed in polynomial time and has a size that is polynomial in . The remainder of the proof essentially consists of establishing the following claim.
Claim. is a ‘yes’-instance of 4DisjointIntermezzo if and only if has a cherry-picking sequence.
First, suppose that has a cherry-picking sequence. Let be a cherry-picking sequence for , and let be the subsequence of of length that contains each element in . We next show that is a DisjointIntermezzo ordering for . Let be an element of some with , and let , with , be the unique leaf label of and such that is the leaf set of a pendant subtree of and . By construction of and , it is easily seen that exists and in . Hence, in . Turning to the triples, let be an element of some with , and let , with , be the unique leaf label of and such that is the leaf set of a pendant subtree of and . Again, by construction, exists. Let and, similarly, let . It is straightforward to check that each cherry-picking sequence for and satisfies either
Hence, as and are pendant in and , respectively, we have , or in and, consequently, in . Since the above argument holds for each pair and each triple, it follows that is a 4DisjointIntermezzo ordering for and, so, is a ‘yes’-instance.
Conversely, suppose that is a ‘yes’-instance of 4DisjointIntermezzo. Let be a 4DisjointIntermezzo ordering on the elements of . To ease reading, let
Modify as follows to obtain an ordering .

Concatenate with the sequence .

For each in , do one of the following two depending on the order of , , and in . If in , then replace with and replace with . Otherwise, if , replace with and replace with .
Since is a 4DisjointIntermezzo ordering with or for each , it follows from the construction of from that is an ordering on the elements in . It remains to show that is a cherry-picking sequence for . First, consider a pendant subtree with leaf set in and for some . By construction, is a pair in and, so, we have in and in . Second, consider a pendant subtree with leaf set in and for some . By construction, is a triple in and, so, we have either in and
in , or in and
in . Third, consider a pendant subtree with leaf set in and for some . By construction, we have in . Fourth, if , then, as has a 4DisjointIntermezzo ordering, there does not exist a pair in for some . Lastly, observe that is a suffix of and that, for any two trees, say and in , we have that and are isomorphic. Since is a 4DisjointIntermezzo ordering, it is now straightforward to check that is a cherry-picking sequence of . This establishes the proof of the claim and, thereby, the theorem. ∎
The next corollary shows that CPSExistence is NP-complete not only for a collection of eight rooted binary phylogenetic trees on the same leaf set, but also for any such collection with a fixed number of at least eight trees.
Corollary 4.3
Let be a collection of rooted binary phylogenetic trees. CPSExistence is NP-complete for any fixed with .
Clearly, CPSExistence for with is in NP. To establish the corollary, we show how one can modify the reduction that is described prior to Theorem 4.2 to obtain a set of rooted binary phylogenetic trees from an instance of 4DisjointIntermezzo.
Let be an instance of 4DisjointIntermezzo. Throughout the remainder of the proof, we assume that there exists an such that . Otherwise, since and is fixed, it follows that has a constant number of pairs and triples with and is solvable in polynomial time.
Now, let and with be a collection of pairs and triples, respectively, such that . Theorem 4.2 establishes the result for when . We may therefore assume that and consider two cases. First, suppose that is even. Replace and in with a partition of into sets. Each of the resulting new sets can be split naturally into a collection of pairs and a collection of triples of which at most one is empty. This results in
collections of pairs and triples, respectively. Now, for each and with , construct two rooted binary phylogenetic trees as described in the definition of the set of intermezzo trees associated with . This yields
pairwise nonisomorphic trees. Second, suppose that
is odd. Replace
and in with a partition of into sets. Additionally, add and . Analogous to the first case, this results in collections of pairs and triples, respectively. Again, for each and with , construct two rooted binary phylogenetic trees as described in the definition of the set of intermezzo trees associated with . Noting that the two trees for and are isomorphic, it follows that the construction yields
pairwise nonisomorphic trees. Since the proof of Theorem 4.2 generalizes to a set of intermezzo trees, the corollary now follows for both cases.∎
5 Bounding the number of cherries
The main result of this section is the following theorem.
Theorem 5.1
Let be a collection of rooted binary phylogenetic trees. Let be the maximum element in . Then solving CPSExistence for takes time
where . In particular, the running time is polynomial in if and are constant.
Let be a rooted binary phylogenetic tree. We denote by the recursively defined set of trees that contains and , and that satisfies the following property.
(P) If a tree is in and is a cherry in , then and are also contained in .
We refer to as the set of cherry-picked trees of . Intuitively, contains each tree that can be obtained from by repeatedly deleting a leaf of a cherry.
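The intuition behind the set of cherry-picked trees can be made concrete by enumerating it for a small tree. The Python sketch below is an illustration under stated assumptions only: a tree is either a string (a single leaf) or a frozenset of its two child subtrees, so that the left-right order of children is ignored; single-leaf trees are included in the enumeration, whereas further degenerate members of the set, if any, are not modeled; all names are hypothetical.

```python
def node(left, right):
    """Build an internal vertex with two unordered children."""
    return frozenset((left, right))


def cherries(tree):
    """Return the cherries of a tree as sorted pairs of leaf labels."""
    if isinstance(tree, str):
        return []
    children = list(tree)
    if all(isinstance(child, str) for child in children):
        return [tuple(sorted(children))]
    return [cherry for child in children for cherry in cherries(child)]


def delete_leaf(tree, leaf):
    """Delete a leaf and suppress the resulting degree-2 vertex."""
    if isinstance(tree, str):
        return None if tree == leaf else tree
    kept = [c for c in (delete_leaf(child, leaf) for child in tree)
            if c is not None]
    return kept[0] if len(kept) == 1 else frozenset(kept)


def cherry_picked_trees(tree):
    """Enumerate all trees reachable by repeatedly deleting a cherry leaf."""
    seen = {tree}
    stack = [tree]
    while stack:
        current = stack.pop()
        for a, b in cherries(current):
            for leaf in (a, b):
                smaller = delete_leaf(current, leaf)
                if smaller not in seen:
                    seen.add(smaller)
                    stack.append(smaller)
    return seen


# Caterpillar on three leaves with cherry {a, b} (illustrative data only).
caterpillar = node(node("a", "b"), "c")
print(len(cherry_picked_trees(caterpillar)))  # 6
```

For this caterpillar, the enumeration yields the tree itself, the two 2-leaf trees on {b, c} and {a, c}, and the three single-leaf trees.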
To establish Theorem 5.1, we consider the set of cherry-picked trees of
. First, we develop a new vector representation for each tree in
and show that the size of is at most . We then construct an automaton whose number of states is and that recognizes whether or not a word that contains each element in precisely once is a cherry-picking sequence for . Lastly, we show how to use a product automaton construction to solve CPSExistence for a set of rooted binary phylogenetic trees in time that is polynomial if the number of cherries and the number of trees in are bounded by a constant. We start with a simple lemma, which shows that deleting a leaf of a cherry never increases the number of cherries.
Lemma 5.2
Let be a rooted binary phylogenetic tree, and let be an element of a cherry in . Then,
Let be the unique element in such that is a cherry in . Observe that each cherry of other than is also a cherry of . Now, let be the parent of the parent of in , and let be the child of that is not the parent of . If is a leaf, then it is easily checked that is a cherry in and, so . On the other hand, if is not a leaf, then is not part of a cherry in and, so, . ∎
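The two cases of the proof can be observed on small examples. The following Python sketch, which assumes trees encoded as nested pairs of string labels and uses hypothetical function names, deletes each leaf of each cherry in turn and records the number of cherries before and after the deletion.

```python
def cherries(tree):
    """Return the cherries of a tree given as nested 2-tuples."""
    if not isinstance(tree, tuple):
        return []
    left, right = tree
    if not isinstance(left, tuple) and not isinstance(right, tuple):
        return [(left, right)]
    return cherries(left) + cherries(right)


def delete_leaf(tree, leaf):
    """Delete a leaf and suppress the resulting degree-2 vertex."""
    if not isinstance(tree, tuple):
        return None if tree == leaf else tree
    left, right = delete_leaf(tree[0], leaf), delete_leaf(tree[1], leaf)
    if left is None:
        return right
    if right is None:
        return left
    return (left, right)


def cherry_counts_after_deletion(tree):
    """For each cherry leaf x, return (x, cherries before, cherries after)."""
    before = len(cherries(tree))
    return [
        (x, before, len(cherries(delete_leaf(tree, x))))
        for cherry in cherries(tree)
        for x in cherry
    ]


# In the caterpillar, the sibling of the cherry's parent is the leaf c,
# so a new cherry forms and the count stays at one.
print(cherry_counts_after_deletion(((("a", "b"), "c"), "d")))
# In the balanced tree, that sibling is not a leaf, so the count drops.
print(cherry_counts_after_deletion((("a", "b"), ("c", "d"))))
```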
We now define a labeled tree that will play an important role throughout the remainder of this section. Let be a rooted binary phylogenetic tree with cherries . Obtain a tree from as follows.

Set to be .

Delete all leaves of that are not part of a cherry.

Suppress any resulting degree-2 vertex.

If the root, say , has degree one, delete .

For each cherry with , label the parent of and with , and delete the two leaves and .

Bijectively label the nonleaf vertices of with .
We call the index tree of . By construction, is a labeled rooted binary tree that is unique up to relabeling the internal vertices. To illustrate, an example of the construction of an index tree is shown in Figure 3. The next observation follows immediately from the construction of an index tree.
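The steps above can be sketched in Python as follows. This is a simplified illustration rather than the construction itself: trees are assumed to be nested pairs of string labels with at least two leaves, cherries are indexed 1, 2, ... in left-to-right order (an arbitrary choice), the labeling of the non-leaf vertices of the index tree is omitted, and the function name is hypothetical.

```python
def index_tree(tree):
    """Return (index tree, list of cherries).

    Leaves of the returned index tree are cherry indices (integers);
    internal vertices are 2-tuples and are left unlabeled here.
    """
    cherries = []

    def build(vertex):
        if not isinstance(vertex, tuple):
            return None  # a leaf outside any cherry is deleted
        left, right = vertex
        if not isinstance(left, tuple) and not isinstance(right, tuple):
            cherries.append((left, right))
            return len(cherries)  # the cherry collapses to one indexed leaf
        kept = [k for k in (build(left), build(right)) if k is not None]
        # A vertex with a single remaining child has degree 2 (or is a
        # root of degree one) and is suppressed.
        return kept[0] if len(kept) == 1 else tuple(kept)

    return build(tree), cherries


# Illustrative tree with three cherries: {a, b}, {d, e}, and {g, h}.
example = ((("a", "b"), "c"), (("d", "e"), ("f", ("g", "h"))))
print(index_tree(example))
# (1, (2, 3)) together with [('a', 'b'), ('d', 'e'), ('g', 'h')]
```

The result has one leaf per cherry, which is in line with the observation below that the size of the index tree is governed by the number of cherries.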
Observation 5.3
Let be a rooted binary phylogenetic tree, and let be the index tree associated with . The size of is . In particular, if the number of cherries in is constant, the size of is .
We next define a particular vector relative to a given set. Let be a finite set, let be an element that is not in , and let be a nonnegative integer. We call
an vector if each element in appears at most once in , each is an element in , and each is an element in . Now consider the following two vectors:
and
We say that has the suffix-property relative to if, for each , the vector component is equal to or satisfies each of the following equations
Lastly, if has the property that for each , we call the empty vector. Note that the empty vector satisfies the suffix-property relative to every vector.
Building on the definition of an vector, we now describe a vector representation of a rooted binary phylogenetic tree that can be constructed by using its index tree as a guide. Roughly, the representation associates a caterpillar-type structure to each vertex in the index tree. Let be a rooted binary phylogenetic tree, let , and let . For two vertices and in , we say that (resp. ) is an ancestor (resp. descendant) of (resp. ) if there is a directed path from to in . Throughout this section, we regard a vertex of to be an ancestor and a descendant of itself. The most recent common ancestor of is the vertex in whose set of descendants contains and no descendant of , except itself, has this property. We denote by . Now, let be the set of all cherries in . First, for each leaf in , let be the maximal pendant caterpillar in with cherry . We denote this by
where and . Second, for each nonleaf vertex labeled in with , let be the vertex in such that