Reflections on kernelizing and computing unrooted agreement forests

12/14/2020
by Rim van Wersch, et al.

Phylogenetic trees are leaf-labelled trees used to model the evolution of species. Here we explore the practical impact of kernelization (i.e. data reduction) on the NP-hard problem of computing the TBR distance between two unrooted binary phylogenetic trees. This problem is better known in the literature as the maximum agreement forest problem, where the goal is to partition the two trees into a minimum number of common, non-overlapping subtrees. We have implemented two well-known reduction rules, the subtree and chain reductions, and five more recent, theoretically stronger reduction rules, and we compare the reduction achieved with and without the stronger rules. We find that the new rules yield smaller reduced instances and thus have clear practical added value. In many cases they also cause the TBR distance to decrease in a controlled fashion. Next, we compare the achieved reduction to the known worst-case theoretical bounds on the number of leaves of the two reduced trees, namely 15k-9 without the stronger rules and 11k-9 with them, where k is the TBR distance. In both cases we observe a far larger reduction in practice. As a by-product of our experimental framework we obtain a number of new insights into the actual computation of TBR distance. We find, for example, that very strong lower bounds on TBR distance can be obtained efficiently by randomly sampling certain carefully constructed partitions of the leaf labels, and we identify instances which seem particularly challenging to solve exactly. The reduction rules have been implemented within our new solver Tubro, which combines kernelization with an Integer Linear Programming (ILP) approach. Tubro also incorporates a number of additional features, such as a cluster reduction and a practical upper-bounding heuristic, and it can leverage combinatorial insights emerging from the proofs of correctness of the reduction rules to simplify the ILP.
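
To make the two worst-case bounds concrete, the short Python sketch below evaluates them for a given TBR distance k. It is only an illustration of the arithmetic in the abstract: the function name `kernel_leaf_bound` and its parameters are hypothetical and are not part of Tubro, and the restriction to k >= 1 is an assumption made here, since the abstract does not state the range of k for which the bounds hold.

```python
def kernel_leaf_bound(k: int, with_new_rules: bool) -> int:
    """Worst-case number of leaves in each reduced tree, per the abstract.

    k              -- TBR distance between the two trees (assumed k >= 1)
    with_new_rules -- False: subtree and chain reductions only (15k - 9)
                      True:  the five stronger rules as well   (11k - 9)
    """
    if k < 1:
        # Assumption of this sketch: the bounds are only evaluated for k >= 1.
        raise ValueError("k must be at least 1")
    coefficient = 11 if with_new_rules else 15
    return coefficient * k - 9


if __name__ == "__main__":
    # Print both bounds side by side for a few distances.
    for k in (1, 5, 10, 20):
        print(k, kernel_leaf_bound(k, False), kernel_leaf_bound(k, True))
```

The gap between the two bounds is 4k leaves, so it grows linearly with the TBR distance. These are theoretical guarantees only; the abstract reports that the reduction observed in practice is far larger than either bound suggests.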

research
11/16/2018

A tight kernel for computing the tree bisection and reconnection distance between two phylogenetic trees

In 2001 Allen and Steel showed that, if subtree and chain reduction rule...
research
06/09/2022

Deep kernelization for the Tree Bisection and Reconnnect (TBR) distance in phylogenetics

We describe a kernel of size 9k-8 for the NP-hard problem of computing t...
research
05/04/2019

New reduction rules for the tree bisection and reconnection distance

Recently it was shown that, if the subtree and chain reduction rules hav...
research
07/22/2023

Agreement forests of caterpillar trees: complexity, kernelization and branching

Given a set X of species, a phylogenetic tree is an unrooted binary tree...
research
11/14/2018

A Duality Based 2-Approximation Algorithm for Maximum Agreement Forest

We give a 2-approximation algorithm for the Maximum Agreement Forest pro...
research
02/20/2022

Cyclic generators and an improved linear kernel for the rooted subtree prune and regraft distance

The rooted subtree prune and regraft (rSPR) distance between two rooted ...
research
05/14/2020

Algorithmic Techniques for Necessary and Possible Winners

We investigate the practical aspects of computing the necessary and poss...
