The Tandem Duplication Distance is NP-hard

06/12/2019
by Manuel Lafond et al.

In computational biology, tandem duplication is an important biological phenomenon that can occur either at the genome or at the DNA level. A tandem duplication takes a copy of a genome segment and inserts it right after the segment; this can be represented as the string operation AXB → AXXB. For example, tandem exon duplications have been found in many species, such as human, fly, and worm, and have been widely studied in computational biology. The Tandem Duplication (TD) distance problem we investigate in this paper is defined as follows: given two strings S and T over the same alphabet, compute the smallest number of tandem duplications required to convert S to T. The natural question of whether the TD distance can be computed in polynomial time was posed in 2004 by Leupold et al. and had remained open, even though tandem duplications have received much attention ever since. In this paper, we prove that this problem is NP-hard. We further show that this hardness holds even if all characters of S are distinct. This restricted version is known as the exemplar TD distance, which is of special relevance in bioinformatics. One of the tools we develop for the reduction is a new problem called the Cost-Effective Subgraph, for which we obtain W[1]-hardness results that might be of independent interest. We finally show that computing the exemplar TD distance between S and T is fixed-parameter tractable. Our results open the door to many other questions, and we conclude with several open problems.
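To make the definitions concrete, the operation AXB → AXXB and the TD distance can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `tandem_duplicate` and `td_distance` are hypothetical, indices are 0-based and half-open, and the search is a plain breadth-first enumeration whose running time grows exponentially. It is not the fixed-parameter algorithm from the paper.

```python
from collections import deque


def tandem_duplicate(s, i, j):
    """Copy the segment X = s[i:j] right after itself (AXB -> AXXB)."""
    if not 0 <= i < j <= len(s):
        raise ValueError("need 0 <= i < j <= len(s)")
    return s[:j] + s[i:j] + s[j:]


def td_distance(source, target):
    """Brute-force TD distance (illustrative only, exponential time).

    Returns the smallest number of tandem duplications that turn
    `source` into `target`, or None if `target` cannot be reached.
    """
    if source == target:
        return 0
    if len(source) >= len(target):
        # Duplications only lengthen a string, so target is unreachable.
        return None

    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        current, steps = queue.popleft()
        n = len(current)
        for i in range(n):
            for j in range(i + 1, n + 1):
                # Skip duplications that would overshoot the target length;
                # longer segments starting at i overshoot as well.
                if n + (j - i) > len(target):
                    break
                nxt = tandem_duplicate(current, i, j)
                if nxt == target:
                    return steps + 1
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, steps + 1))
    return None
```

For instance, `td_distance("ab", "aabb")` needs two duplications (`ab` to `aab` to `aabb`), while `td_distance("ab", "ba")` returns `None` because duplications can never reorder characters. The exhaustive search is consistent with the hardness result: no polynomial-time method is expected in general.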


