1 Introduction
The systematic comparison of hierarchies is a fundamental task in a wide range of domains. Entities in biology often stand in a hierarchical relationship to one another (Hainke et al. (2012), Dutkowski et al. (2013)). A phylogenetic tree, for example, groups all descendants of a common ancestor, and matching metrics are used to evaluate the dissimilarity between such trees (Bogdanowicz and Giaro (2013)). The most common way to measure the distance between trees is the tree edit distance (Tai (1979), Yoshino and Hirata (2017)). In this work we consider unordered labeled trees, in which the order among siblings is irrelevant. The tree edit distance between two trees is defined as the cost of a minimum-cost sequence of node edit operations (insert, delete, relabel) that transforms one tree into the other. Tai (1979) showed that there is a close correspondence between Tai mappings and sequences of edit operations, so that, given an optimal Tai mapping, the tree edit distance can easily be computed.
A Tai mapping defines a one-to-one node mapping between trees that preserves ancestor relationships (see Figure 1(a)). We use $<$ and $\le$ to denote ancestor orders, i.e. $a < b$ if $a$ is an ancestor of $b$, and $a \le b$ if $a < b$ or $a = b$. Formally, for any two trees $T_1 = (V_1, E_1)$ and $T_2 = (V_2, E_2)$, a Tai mapping is defined as a mapping $M \subseteq V_1 \times V_2$ such that for any distinct $(a, b), (c, d) \in M$ we have that
$$ (a = c \iff b = d) \wedge (a < c \iff b < d) \tag{1} $$
Equation (1) combines the one-to-one property and the ancestor-order property by a conjunction. Note that a Tai mapping can equivalently be defined by merging both properties into a single equivalence, i.e. for any distinct $(a, b), (c, d) \in M$ we have that
$$ a \le c \iff b \le d \tag{2} $$
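The equivalence of (1) and (2) can be checked mechanically on small trees. The following Python sketch is our own illustration, not code from the paper: it assumes a hypothetical encoding of each tree as a dictionary mapping every node to its parent (the root maps to `None`), and it reads "for any distinct (a, b), (c, d)" as ranging over both orders of each pair. Note that (1) and (2) can disagree on a single ordered pair, for example (a, b), (a, d) with b < d, but they agree once both orders are required.

```python
from itertools import permutations, product

# Hypothetical encoding (not from the paper): node -> parent, root -> None.
T1 = {"r": None, "x": "r", "y": "r", "z": "x"}
T2 = {"s": None, "u": "s", "v": "u", "w": "s"}


def proper_ancestor(tree, a, c):
    """a < c: a lies strictly above c on the path from c to the root."""
    node = tree[c]
    while node is not None:
        if node == a:
            return True
        node = tree[node]
    return False


def leq(tree, a, c):
    """a <= c: a is c itself or one of its ancestors."""
    return a == c or proper_ancestor(tree, a, c)


def cond1(t1, t2, e, f):
    """Condition (1) for the ordered pair (e, f)."""
    (a, b), (c, d) = e, f
    return ((a == c) == (b == d)) and (
        proper_ancestor(t1, a, c) == proper_ancestor(t2, b, d))


def cond2(t1, t2, e, f):
    """Condition (2) for the ordered pair (e, f)."""
    (a, b), (c, d) = e, f
    return leq(t1, a, c) == leq(t2, b, d)


def is_tai_mapping(t1, t2, mapping, cond=cond2):
    """Check `cond` for both orders of every pair of distinct edges."""
    return all(cond(t1, t2, e, f) for e, f in permutations(mapping, 2))


# Exhaustive check on this instance: (1) and (2) agree once both orders count.
edges = list(product(T1, T2))
assert all(
    (cond1(T1, T2, e, f) and cond1(T1, T2, f, e))
    == (cond2(T1, T2, e, f) and cond2(T1, T2, f, e))
    for e, f in permutations(edges, 2))
```

For instance, `is_tai_mapping(T1, T2, [("r", "s"), ("x", "u"), ("z", "v")])` holds because both sides are chains mapped in order, while mapping the incomparable nodes `x`, `y` to the comparable nodes `u`, `v` violates (2).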
For a weight function $w : V_1 \times V_2 \to \mathbb{R}$, the weight of $M$ is defined as $w(M) = \sum_{(a, b) \in M} w(a, b)$. The problem of finding a maximum-weight Tai mapping is known to be NP-complete (Zhang et al. (1992)) and MAX SNP-hard (Zhang and Jiang (1994)). Hence, different integer linear programming (ILP) based algorithms have been proposed to compute such a mapping (Kondo et al. (2014), Do et al. (2019), Böcker et al. (2013), Hong et al. (2017)). A naive ILP formulation (Böcker et al. (2013)) contains a constraint for each pair of edges for which (2) does not hold, giving rise to $O(n_1 n_2)$ variables and $O(n_1^2 n_2^2)$ constraints, where $n_1$ and $n_2$ are the numbers of nodes of the two input trees; it therefore does not allow even moderate-sized instances to be solved in practice (Kondo et al. (2014), Böcker et al. (2013)). Hong et al. (2017) therefore divide the problem into smaller subproblems, each with fewer constraints, by utilizing a dynamic programming approach from Fukagawa et al. (2011). Do et al. (2019), on the other hand, introduced two classes of valid inequalities of the Tai mapping polytope, crossing edges clique constraints and semi-independent clique constraints, and used them as cutting planes in a branch-and-bound scheme. Their implementation achieved a 13-fold speedup compared to the naive ILP in experiments on perturbed human single-cell data. Here, we derive a class of valid inequalities that generalizes both types of previously introduced clique constraints. We formalize them as special classes of anti Tai mappings, defined as follows.

Anti Tai mapping.
A relation whose edges pairwise violate (1) is called an anti Tai mapping (see Figure 1(b)). More formally, a binary relation $A \subseteq V_1 \times V_2$ is an anti Tai mapping (strictly speaking, an anti Tai mapping is no longer a mapping, so the term should be read as anti-"Tai mapping") if for any distinct $(a, b), (c, d) \in A$ we have that
$$ \neg(a \le c \iff b \le d) \vee \neg(c \le a \iff d \le b) \tag{3} $$
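A direct transcription of (3) as a Python sketch, assuming a hypothetical encoding of each tree as a dictionary mapping every node to its parent (the root maps to `None`). The check is symmetric in the two edges because a pair must violate (2) in at least one of its two orders: for incomparable a, c with d < b, condition (2) holds for the order ((a, b), (c, d)) but fails for the reverse order, so such a pair is an anti Tai pair.

```python
from itertools import combinations

# Hypothetical encoding (not from the paper): node -> parent, root -> None.
T1 = {"r": None, "x": "r", "y": "r", "z": "x"}
T2 = {"s": None, "u": "s", "v": "u", "w": "s"}


def leq(tree, a, c):
    """a <= c: walk up from c (including c itself) and look for a."""
    node = c
    while node is not None:
        if node == a:
            return True
        node = tree[node]
    return False


def anti_pair(t1, t2, e, f):
    """Condition (3): (2) fails for at least one order of the pair."""
    (a, b), (c, d) = e, f
    return (leq(t1, a, c) != leq(t2, b, d)) or (leq(t1, c, a) != leq(t2, d, b))


def is_anti_tai(t1, t2, relation):
    """Every two distinct edges of `relation` form an anti Tai pair."""
    return all(anti_pair(t1, t2, e, f) for e, f in combinations(relation, 2))
```

With these trees, `[("x", "v"), ("y", "u")]` is an anti Tai mapping (incomparable nodes mapped to comparable ones), whereas `[("x", "u"), ("z", "v")]` is not, since that pair is perfectly compatible with a Tai mapping.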
Let $G$ be the graph on vertex set $V_1 \times V_2$ in which two distinct vertices $(a, b)$ and $(c, d)$ are adjacent if and only if (3) holds. Then a maximum-weight independent set in $G$ corresponds to a maximum-weight Tai mapping, while a maximum-weight clique corresponds to a maximum-weight anti Tai mapping. Recall that the stable set polytope, denoted by $\mathrm{STAB}(G)$, is the convex hull of the incidence vectors of stable (independent) sets of $G$, and that it is contained in the fractional stable set polytope
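$$ \mathrm{QSTAB}(G) = \Big\{ x \in \mathbb{R}_{\ge 0}^{V_1 \times V_2} : \sum_{v \in C} x_v \le 1 \ \text{for every clique } C \text{ of } G \Big\}, \tag{4} $$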
see Grötschel et al. (1993). It is known that $\mathrm{STAB}(G) = \mathrm{QSTAB}(G)$ if and only if $G$ is perfect, in which case linear optimization problems over $\mathrm{STAB}(G)$ can be solved in polynomial time (Grötschel et al. (1993)). As the problem of finding a maximum-weight Tai mapping is NP-hard (Zhang et al. (1992)), it follows that, unless P = NP, the graph $G$ defined above is not perfect in general, and hence $\mathrm{STAB}(G) \subsetneq \mathrm{QSTAB}(G)$. Heuristically, one can still use the clique constraints in (4) in a cutting-plane algorithm for solving an integer program for the maximum-weight Tai mapping problem, provided that the corresponding separation problem, which amounts to finding a maximum-weight anti Tai mapping, can be solved efficiently. While it is not known whether this separation problem is polynomially solvable in general, we answer this question in the affirmative when one of the two trees is a path. Specifically, we construct a dynamic program that computes a maximum-weight anti Tai mapping in quadratic time when one tree is a path. Furthermore, we define a semi-independent antimatching (si-antimatching) as a restricted class of anti Tai mappings and use it to provide a polynomially computable lower bound on the maximum-weight anti Tai mapping between two trees. More precisely, let $a \sim c$ denote that $a$ and $c$ lie on the same root-to-leaf path ("comparable"), i.e. $a \sim c$ if $a \le c$ or $c \le a$. Otherwise, we say that $a$ and $c$ are incomparable. A mapping $M \subseteq V_1 \times V_2$ is an si-antimatching if for any distinct $(a, b), (c, d) \in M$:

$$ (a \sim c) \oplus (b \sim d) \tag{5} $$

where $\oplus$ denotes exclusive or.
Note that any pair satisfying (5) also satisfies (3): a pair of edges whose endpoints are comparable in exactly one of the two trees can never satisfy (2) in both orders. Hence the si-edge pairs form a subgraph of the graph $G$ defined above, and every si-antimatching is an anti Tai mapping. It follows that a maximum-weight si-antimatching can be used to provide a valid clique constraint for the Tai mapping problem. Motivated by this, we introduce a dynamic program that optimally solves the maximum-weight si-antimatching problem in polynomial time. We further observe that the same dynamic program can be combined with our optimal path-tree anti Tai mapping algorithm mentioned above (that is, when one of the two trees is a path) to obtain a polynomially computable lower bound on the maximum-weight anti Tai mapping in the general case.
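To make the separation step concrete, here is a deliberately naive Python sketch (our own illustration, not the authors' implementation): given a fractional LP point `x`, it enumerates anti Tai mappings of two tiny trees by brute force and returns one maximizing the total LP value. If that value exceeds 1, the corresponding clique inequality $\sum_{e \in A} x_e \le 1$ is violated. The parent-dictionary tree encoding and the function names are hypothetical, and the exponential enumeration merely stands in for the polynomial algorithms developed in this paper.

```python
from itertools import combinations, product


def leq(tree, a, c):
    """a <= c in a tree given as node -> parent (root -> None)."""
    node = c
    while node is not None:
        if node == a:
            return True
        node = tree[node]
    return False


def conflict(t1, t2, e, f):
    """Adjacency in the conflict graph G, i.e. condition (3)."""
    (a, b), (c, d) = e, f
    return (leq(t1, a, c) != leq(t2, b, d)) or (leq(t1, c, a) != leq(t2, d, b))


def separate(t1, t2, x):
    """Brute-force separation of clique inequalities (tiny inputs only).

    Returns (value, A) where A is an anti Tai mapping maximizing the sum of
    x over its edges. Edges with x = 0 are skipped, which is harmless because
    every subset of an anti Tai mapping is again an anti Tai mapping.
    """
    support = [e for e in product(t1, t2) if x.get(e, 0.0) > 0.0]
    best_val, best = 0.0, ()
    for k in range(1, len(support) + 1):
        for cand in combinations(support, k):
            if all(conflict(t1, t2, e, f) for e, f in combinations(cand, 2)):
                val = sum(x[e] for e in cand)
                if val > best_val:
                    best_val, best = val, cand
    return best_val, best


# Every pairwise inequality x_e + x_f <= 1 holds here, but the three
# mutually conflicting edges violate the clique inequality.
T1 = {"r": None, "x": "r", "y": "r"}
T2 = {"s": None, "u": "s"}
x_star = {("x", "s"): 0.5, ("x", "u"): 0.5, ("y", "u"): 0.5}
value, clique = separate(T1, T2, x_star)
```

On this instance the sketch returns the three-edge anti Tai mapping with value 1.5, i.e. a cut that pairwise conflict constraints alone would not produce.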
The rest of the paper is structured as follows. In Section 2, we study the properties of si-antimatchings and show that an si-antimatching induces a partition of the matched vertices in each tree into a path and an independent set such that the path in one tree is mapped (by the antimatching) to the independent set in the other and vice versa. We use this structural result to derive a dynamic program for computing a maximum-weight si-antimatching. In Section 3, we give a dynamic programming formulation for the anti Tai mapping problem between a tree and a path, and combine this result with our dynamic programming idea from Section 2 to give a dynamic program that provides a lower bound on the maximum-weight anti Tai mapping between two trees. We conclude in Section 4.
2 Si-antimatching problem
In this section, we present an efficient algorithm for this special case of anti Tai mapping. Surprisingly, the si-antimatching problem turns out to be solvable in polynomial time, and we provide a dynamic program that solves it directly on two unordered labeled trees $T_1$ and $T_2$. Our dynamic program strongly relies on the decomposition theorem (Theorem 2.1), which states that every si-antimatching can be decomposed into an antichain and a path in $T_1$, and a path and an antichain in $T_2$, such that the antichain in $T_1$ maps into the path in $T_2$, and the path in $T_1$ maps into the antichain in $T_2$ (see Figure 3). We further argue that there exists an order in that decomposition, which in turn determines how an entry of the dynamic programming table is computed.
2.1 Decomposition theorem
We first show that every si-antimatching admits a simple decomposition, which forms the basis of our algorithm.
Definition 2.1.
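Let $T_1$ and $T_2$ denote two trees. We say that two edges $(a, b), (c, d) \in V_1 \times V_2$ form a semi-independent pair of edges (si-edges) if $(a \sim c) \oplus (b \sim d)$ holds true, i.e. their endpoints are comparable in exactly one of the two trees.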
Let $M$ be a set of pairwise semi-independent edges between trees $T_1$ and $T_2$. Since we allow edges with common vertices here (i.e. $M$ is not necessarily a matching), we refer to any such set as a semi-independent antimatching or, in short, si-antimatching. (Typically, in an antimatching any pair of distinct edges has a common endpoint. We use the notion of antimatching in a slightly different way: any pair of distinct edges may, but need not, have a common endpoint.)
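The si-edge condition is easy to state in code. The Python sketch below is an illustration under assumptions (a hypothetical node-to-parent dictionary per tree, hypothetical function names); it checks exhaustively on two small trees that every si-pair is also an anti Tai pair in the sense of (3), and it shows that the converse fails: two edges sharing the endpoint `x` whose other endpoints are comparable form an anti Tai pair but not an si-pair.

```python
from itertools import combinations, permutations, product

# Hypothetical encoding (not from the paper): node -> parent, root -> None.
T1 = {"r": None, "x": "r", "y": "r", "z": "x"}
T2 = {"s": None, "u": "s", "v": "u", "w": "s"}


def leq(tree, a, c):
    """a <= c: walk up from c (including c itself) and look for a."""
    node = c
    while node is not None:
        if node == a:
            return True
        node = tree[node]
    return False


def comparable(tree, a, c):
    """a ~ c: a and c lie on a common root-to-leaf path."""
    return leq(tree, a, c) or leq(tree, c, a)


def si_pair(t1, t2, e, f):
    """Condition (5): the endpoints are comparable in exactly one tree."""
    (a, b), (c, d) = e, f
    return comparable(t1, a, c) != comparable(t2, b, d)


def anti_pair(t1, t2, e, f):
    """Condition (3)."""
    (a, b), (c, d) = e, f
    return (leq(t1, a, c) != leq(t2, b, d)) or (leq(t1, c, a) != leq(t2, d, b))


def is_si_antimatching(t1, t2, edges):
    return all(si_pair(t1, t2, e, f) for e, f in combinations(edges, 2))


# Every si-pair is an anti Tai pair, so si-antimatchings are anti Tai mappings.
all_edges = list(product(T1, T2))
assert all(anti_pair(T1, T2, e, f)
           for e, f in permutations(all_edges, 2) if si_pair(T1, T2, e, f))
```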
For any and si-antimatching let . If is a singleton, e.g. , we will often omit the set notation and write . Analogously, for any let . Furthermore, let and denote all nodes in and that are incident to si-edges in . We say that a root-to-leaf path in tree is maximal if no other root-to-leaf path contains more nodes of than . Let .
Lemma 2.1.
Let denote an si-antimatching and let denote a maximal path in . Then is an antichain, i.e. any two distinct nodes in are incomparable.
Proof.
Suppose the opposite, i.e. there exist , , such that (see Figure 2). Since is maximal, there must exist such that , and , . Thus, nodes and must be comparable with nodes in and . Hence, it must be that , where the notation denotes the path in a tree between a given start node and end node, and LCA denotes the lowest common ancestor of a given set of nodes. This contradicts the fact that and cannot lie on a common path in . ∎
Lemma 2.2.
Let be an si-antimatching and be an antichain. Then either or , for some , lie on a single root-to-leaf path in .
Proof.
Suppose nodes in do not lie on a path, i.e. there exist such that . Note that contains only a single node. Let denote that unique node. Since for all it follows that and for all . Thus, all nodes in must lie on the path . ∎
Lemma 2.3.
Let denote an si-antimatching. Then there exists a maximal path such that all nodes in lie on a root-to-leaf path in .
Proof.
Let be an arbitrary maximal path and an antichain (by Lemma 2.1). Suppose not all nodes in lie on a path, i.e. there exist such that nodes in do not all lie on a single path in . Then the path containing is also maximal since otherwise there would exist , , and , such that , which is a contradiction to the fact that . Hence, there exists a single node such that for all . Therefore, is also an antichain and by Lemma 2.2 it follows that is a path and the claim of the lemma follows. ∎
We are now ready to state the main result of this section, which follows immediately from the previous lemmas:
Theorem 2.1 (Decomposition theorem).
For any si-antimatching there exists a partition of into sets , for some root-to-leaf path , and an antichain, such that is an antichain in and all lie on some root-to-leaf path in .
2.2 Dynamic programming
In the following, we explain our algorithm for the si-antimatching problem based on the decomposition theorem (Theorem 2.1). Given an si-antimatching and for let denote the lowest ancestor of in . Analogously denote for . We start by partitioning the antichains into equivalence classes.
Definition 2.2.
Partition into equivalence classes such that belong to the same equivalence class if . Analogously partition into equivalence classes .
Without loss of generality we assume that , for , and similarly , for (see Figure 4). Furthermore, for let . Given the above decomposition of our problem, the following theorem shows that there is an order that allows dynamic programming to be used.
Theorem 2.2.
Let denote an si-antimatching, and the partitionings of and , respectively, and and the equivalence classes of and , respectively. Then for
for all and .
Proof.
Assume that for some , there exists such that
for all , and . This implies that there must exist some such that
and . Note that it holds that , since we assumed that , which contradicts the fact that is an si-antimatching.
Let for some and for all . But then there
exists such that for , we have that . Note
that it holds that , since we assumed that , which again contradicts
the fact that is an si-antimatching.
∎
We now see that any si-antimatching has the structure illustrated in Figure 5. In particular, if we separate the elements of into equivalence classes of the relation , then for from distinct equivalence classes we have for all and . An analogous claim holds for . In other words, the images of distinct equivalence classes under are not interwoven, which allows a dynamic programming approach to be used.
Definition 2.3.
We will denote the set of vertices of the subtree rooted in the vertex with , the set of children of the vertex with , the root of a tree with and the unique parent of vertex with .
Before stating the main result, note the following.
Lemma 2.4.
Let , , w.l.o.g. , . Then either or .
Proof.
Assume both and . Then and , which contradicts the assumption that is an si-antimatching. ∎
Theorem 2.3.
Let be the weight of a maximum-weight antichain in the subtree rooted at vertex with weight on vertex . Then,
, where
Proof.
Let and . The union of antichains of any family of subtrees rooted in is an antichain, and so any subset of is an si-antimatching. Since we have for all , and for all , , inductively any set found by the dynamic program is an si-antimatching. Conversely, let be a nonempty si-antimatching. If , we have , where and are, respectively, the least and the greatest element of . If , w.l.o.g. . By Lemma 2.4 we may assume without loss of generality that . Partition where and . We have where and are, respectively, the least and the greatest element of . By proceeding analogously with the subtrees rooted in and and the si-antimatching , we obtain .
∎
It should be noted that it is possible to implement the algorithm from Theorem 2.3 with running time . Specifically, for a fixed the maximum-weight antichains rooted in are independent of each other, allowing us to simply accumulate them while iterating through .
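Theorem 2.3 relies on maximum-weight antichains in rooted subtrees. Such an antichain either consists of the subtree root alone or is a union of antichains chosen independently in the child subtrees, which gives the recurrence $A(v) = \max\{w(v), \sum_{c \text{ child of } v} A(c)\}$, with the empty antichain (weight 0) always available. The Python sketch below is a minimal illustration of this building block, not the paper's full si-antimatching DP; the children-dictionary encoding and the function name are hypothetical.

```python
def max_weight_antichains(children, root, w):
    """Return {v: weight of a maximum-weight antichain in the subtree of v}.

    children: node -> list of child nodes; w: node -> weight (missing = 0).
    An antichain of the subtree of v is either {v} alone or a union of
    antichains chosen independently in the child subtrees; the empty
    antichain (weight 0) is allowed, so negative weights are harmless.
    """
    order, stack = [], [root]
    while stack:                       # iterative preorder traversal
        v = stack.pop()
        order.append(v)
        stack.extend(children.get(v, ()))
    best = {}
    for v in reversed(order):          # descendants are handled before v
        below = sum(best[c] for c in children.get(v, ()))
        best[v] = max(w.get(v, 0), below)
    return best


children = {"r": ["a", "b"], "a": ["c", "d"]}
weights = {"r": 4, "a": 2, "b": 1, "c": 2, "d": 2}
print(max_weight_antichains(children, "r", weights)["r"])  # 5, from {c, d, b}
```

A single bottom-up pass suffices because each subtree value is computed once and then only added into its parent, which mirrors the accumulation argument above.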
3 Anti Tai mapping problem
We first present a quadratic algorithm for computing an optimal anti Tai mapping in the case when one of the trees consists of a single root-to-leaf path. It generalizes an algorithm from Do et al. (2019) that computes an optimal anti Tai mapping in the case when both trees consist of a single root-to-leaf path.
Theorem 3.1.
An optimal anti Tai mapping of the path and the tree rooted in satisfies
Proof.
Let be an edge selected at any point of the dynamic program. Then any element of forms an anti Tai mapping with . Inductively, any set found by the dynamic program is an anti Tai mapping. Conversely, let be an anti Tai mapping. If then for all and for all , so inductively there is a sequence of steps corresponding to . ∎
The running time of the algorithm given by Theorem 3.1 is quadratic. We see this by noting that the dynamic program has quadratically many states and that for a fixed , computing all amounts to a single tree traversal.
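Because the recurrence of Theorem 3.1 is not reproduced here, the following Python sketch uses a recurrence we derived directly from (3); it may differ in form from the authors' formulation. When one side is a path $p_1 < \dots < p_k$, any two path nodes are comparable, so two edges $(p_i, b), (p_j, d)$ form an anti Tai pair unless $i < j$ and $b < d$ (or vice versa). Assuming nonnegative weights (as for LP values in a cutting-plane loop), let $F(i, v)$ be the best weight using $p_1, \dots, p_i$ and the subtree of $v$, $S(i, v) = \sum_{c} F(i, c)$ over the children $c$ of $v$, and $H(i, v)$ the best weight of a solution using $(p_i, v)$. If the edges at $v$ start at $p_m$, everything below $v$ may only use $p_1, \dots, p_m$, which yields $H(i, v) = w(p_i, v) + \max\{S(i, v), H(i-1, v)\}$ and $F(i, v) = \max\{S(i, v), H(i, v)\}$. Each $i$ costs one bottom-up tree traversal. The encoding, the function names, and the brute-force checker are our own assumptions.

```python
from itertools import combinations, product


def path_tree_anti_tai(k, children, root, w):
    """Max weight of an anti Tai mapping between the path p_1 < ... < p_k
    (p_1 is its root) and a tree given as node -> list of children.

    w maps (i, v) to a NONNEGATIVE weight (missing entries count as 0).
    F[v] = F(i, v): best value using p_1..p_i and the subtree of v.
    H[v] = H(i, v): best value among solutions that use the edge (p_i, v).
    """
    order, stack = [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children.get(v, ()))
    post = order[::-1]                          # children before parents
    F = {v: 0.0 for v in post}                  # F(0, v) = 0
    H = {v: float("-inf") for v in post}        # no edge (p_0, v) exists
    for i in range(1, k + 1):
        F_new, H_new = {}, {}
        for v in post:
            S = sum(F_new[c] for c in children.get(v, ()))   # S(i, v)
            # Edges at v use p_m..p_i; everything below v must use p_1..p_m.
            H_new[v] = w.get((i, v), 0.0) + max(S, H[v])
            F_new[v] = max(S, H_new[v])
        F, H = F_new, H_new
    return F[root]


def brute_force(k, children, root, w):
    """Reference value: try every subset and test condition (3) directly."""
    parent = {root: None}
    for u, cs in children.items():
        for c in cs:
            parent[c] = u

    def leq_tree(a, c):
        while c is not None:
            if c == a:
                return True
            c = parent[c]
        return False

    def anti(e, f):                             # on the path, p_i <= p_j iff i <= j
        (i, b), (j, d) = e, f
        return (i <= j) != leq_tree(b, d) or (j <= i) != leq_tree(d, b)

    edges = list(product(range(1, k + 1), parent))
    best = 0
    for r in range(1, len(edges) + 1):
        for sub in combinations(edges, r):
            if all(anti(e, f) for e, f in combinations(sub, 2)):
                best = max(best, sum(w.get(e, 0) for e in sub))
    return best
```

For example, with a two-node path, a tree `r -> a`, and unit weights, only the pair $(p_1, r), (p_2, a)$ conflicts, and `path_tree_anti_tai` returns 3, matching `brute_force`.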
Theorem 2.3 therefore suggests a natural generalization of Theorem 3.1 to the case of two arbitrary trees, obtained by simply replacing with .
Corollary 3.0.
, where
3.1 Implementation
The algorithm of the preceding corollary can be implemented with an asymptotic running time of , analogously to the optimal si-antimatching algorithm of Theorem 2.3. We provide a sample implementation at Blažević (2021). For practical purposes, it is also worth noting that these algorithms admit a straightforward parallelization: specifically, all can be checked in parallel.
3.2 Anti Tai mapping on DAGs
In this section, we make two straightforward observations regarding generalizations of anti Tai mapping to directed acyclic graphs (DAGs). First, note that Theorem 3.1 cannot be extended to DAGs, since it depends on the sets of descendants of distinct children being disjoint. Observation 3.1 notes that we can, however, do somewhat better than falling back to computing anti Tai mappings on pairs of paths (as is done in Do et al. (2019)) by extending a path to an arbitrary topological ordering. Observation 3.2 relates the maximum-weight anti Tai mapping to a maximum-weight antichain in a graph constructed as the product of the two DAGs, which solves the problem in polynomial time when one of the DAGs is a path.
Observation 3.1.
Let and be directed acyclic graphs, , , and an arbitrary topological ordering of the vertices of . Then an optimal anti Tai mapping of and satisfies where
The observation follows from the fact that, by the definition of a topological ordering, we have that is incomparable with .
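Observation 3.1 rests on an elementary property of topological orderings: every ancestor of a vertex appears before it, so a vertex that comes later in the ordering is never an ancestor of an earlier one. The short Python sketch below illustrates this on a hypothetical example DAG using the standard-library `graphlib.TopologicalSorter` (Python 3.9+), which takes a mapping from each vertex to its predecessors.

```python
from graphlib import TopologicalSorter

# Hypothetical DAG, given as vertex -> set of direct predecessors.
preds = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}, "e": {"c"}}


def ancestors(preds, v):
    """All u with u < v, i.e. vertices with a directed path to v."""
    seen, stack = set(), list(preds[v])
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(preds[u])
    return seen


order = list(TopologicalSorter(preds).static_order())
pos = {v: i for i, v in enumerate(order)}

# Every ancestor of v precedes v, so a later vertex is never an ancestor
# of an earlier one.
for v in order:
    assert all(pos[u] < pos[v] for u in ancestors(preds, v))
```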
Observation 3.2.
Let and be directed acyclic graphs, , . Let where . Then the weight of an optimal anti Tai mapping of and is equal to the weight of a maximum-weight antichain in .
The observation follows from the fact that is a transitively closed directed acyclic graph and is an anti Tai mapping if and only if .
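A brute-force Python check of Observation 3.2 on a tiny instance. Because the definition of the product graph is not reproduced above, the sketch assumes that it contains an arc from $(a, b)$ to $(c, d)$ exactly when $a < c$ on the path and $b < d$ in the DAG, with $<$ read as reachability; the instance and all names are hypothetical. Since a set is an antichain if and only if no two of its elements are joined by an arc, comparing the two predicates pair by pair suffices. The equivalence relies on every two path vertices being comparable: for two general DAGs, a pair that is incomparable on both sides is an antichain pair in the product but violates (3).

```python
from itertools import permutations, product

# Hypothetical instance: a path p1 -> p2 -> p3 and a diamond-shaped DAG,
# each given as vertex -> set of direct predecessors.
path = {1: set(), 2: {1}, 3: {2}}
dag = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}


def strict_ancestors(preds, v):
    """All u with u < v, i.e. vertices with a directed path to v."""
    seen, stack = set(), list(preds[v])
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(preds[u])
    return seen


lt_p = {v: strict_ancestors(path, v) for v in path}
lt_d = {v: strict_ancestors(dag, v) for v in dag}


def leq(lt, u, v):
    return u == v or u in lt[v]


def anti(e, f):
    """Condition (3), with < read as reachability."""
    (a, b), (c, d) = e, f
    return (leq(lt_p, a, c) != leq(lt_d, b, d)) or (leq(lt_p, c, a) != leq(lt_d, d, b))


def product_arc(e, f):
    """Assumed arc of the product graph: a < c on the path and b < d in the DAG."""
    (a, b), (c, d) = e, f
    return a in lt_p[c] and b in lt_d[d]


# Anti Tai pairs coincide with pairs not joined by an arc in either direction.
for e, f in permutations(list(product(path, dag)), 2):
    assert anti(e, f) == (not product_arc(e, f) and not product_arc(f, e))
```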
4 Conclusion
In this paper we consider the problem of generating cutting planes for the natural integer programming formulation of the Tai mapping problem. Our cutting planes are based on finding a maximum-weight clique in a graph defined on the set of pairs of nodes of the two given trees; such a clique is what we call an anti Tai mapping. For the special class of si-antimatchings, we give a decomposition theorem that describes their precise structure and hence allows us to use dynamic programming to find a maximum-weight si-antimatching in polynomial time. Inspired by this result, we also obtain a dynamic program that provides a polynomially computable lower bound on the maximum-weight anti Tai mapping. Whether the latter problem is NP-hard remains an interesting open question.
References
 Tai (1979) K.-C. Tai, The tree-to-tree correction problem, J. ACM 26 (1979) 422–433.
 Zhang et al. (1992) K. Zhang, R. Statman, D. Shasha, On the editing distance between unordered labeled trees, Information Processing Letters 42 (1992) 133–139.
 Hainke et al. (2012) K. Hainke, J. Rahnenführer, R. Fried, Cumulative disease progression models for cross-sectional data: A review and comparison, Biometrical Journal 54 (2012) 617–640.
 Dutkowski et al. (2013) J. Dutkowski, M. Kramer, M. A. Surma, R. Balakrishnan, J. M. Cherry, N. J. Krogan, T. Ideker, A gene ontology inferred from molecular networks, Nature Biotechnology 31 (2013) 38–45.
 Bogdanowicz and Giaro (2013) D. Bogdanowicz, K. Giaro, On a matching distance between rooted phylogenetic trees, International Journal of Applied Mathematics and Computer Science 23 (2013) 669–684.
 Yoshino and Hirata (2017) T. Yoshino, K. Hirata, Tai mapping hierarchy for rooted labeled trees through common subforest, Theory Comput. Syst. 60 (2017) 759–783.
 Zhang and Jiang (1994) K. Zhang, T. Jiang, Some MAX SNP-hard results concerning unordered labeled trees, Inf. Process. Lett. 49 (1994) 249–254.
 Kondo et al. (2014) S. Kondo, K. Otaki, M. Ikeda, A. Yamamoto, Fast computation of the tree edit distance between unordered trees using IP solvers, 2014, pp. 156–167. doi:10.1007/978-3-319-11812-3_14.
 Do et al. (2019) V. H. Do, M. Blazevic, P. Monteagudo, L. Borozan, K. M. Elbassioni, S. Laue, F. R. Ringeling, D. Matijevic, S. Canzar, Dynamic pseudotime warping of complex single-cell trajectories, in: L. J. Cowen (Ed.), Research in Computational Molecular Biology, RECOMB 2019, Washington, DC, USA, May 5–8, 2019, Proceedings, volume 11467 of Lecture Notes in Computer Science, Springer, 2019, pp. 294–296.
 Böcker et al. (2013) S. Böcker, S. Canzar, G. W. Klau, The generalized Robinson–Foulds metric, in: A. Darling, J. Stoye (Eds.), Algorithms in Bioinformatics, Springer Berlin Heidelberg, Berlin, Heidelberg, 2013, pp. 156–169.
 Hong et al. (2017) E. Hong, Y. Kobayashi, A. Yamamoto, Improved methods for computing distances between unordered trees using integer programming, 2017. doi:10.1007/978-3-319-71147-8_4.
 Fukagawa et al. (2011) D. Fukagawa, T. Tamura, A. Takasu, E. Tomita, T. Akutsu, A clique-based method for the edit distance between unordered trees and its application to analysis of glycan structures, BMC Bioinformatics 12 (2011) S13.

 Grötschel et al. (1993) M. Grötschel, L. Lovász, A. Schrijver, Geometric Algorithms and Combinatorial Optimization, volume 2 of Algorithms and Combinatorics, second corrected ed., Springer, 1993.
 Blažević (2021) M. Blažević, An implementation of anti Tai mapping, https://github.com/krofna/antitai, 2021.