Exemplar or Matching: Modeling DCJ Problems with Unequal Content Genome Data

05/18/2017
by Zhaoming Yin, et al.

The edit distance under the DCJ model can be computed in linear time for genomes with equal content or with Indels, but it becomes NP-hard in the presence of duplications, a problem that remains largely unsolved, especially when Indels are considered. In this paper, we compare two mainstream methods for handling duplications and combine each with Indels: one by deletion, namely the DCJ-Indel-Exemplar distance, and the other by gene matching, namely the DCJ-Indel-Matching distance. We design branch-and-bound algorithms with a set of optimization methods to compute exact distances for both. We further discuss the median problem under both distance measures, which is to find a median genome that minimizes the distances between itself and three given genomes. The Lin-Kernighan (LK) heuristic is adapted and strengthened with sub-graph decomposition and search space reduction techniques to handle median computation. A wide range of experiments on synthetic and real data sets show the pros and cons of these two distance metrics, both on their own and within median computation.
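
As a rough illustration of the linear-time baseline mentioned above, the following sketch computes the DCJ distance for the simplest case only: two circular, single-chromosome genomes with identical gene content and no duplications or Indels. It uses the standard adjacency-graph identity d = n - c, where n is the number of genes and c the number of cycles. The function names `adjacencies` and `dcj_distance` and the signed-integer encoding are assumptions made for this example, not the authors' code, and the sketch does not cover the Exemplar or Matching variants.

```python
def adjacencies(genome):
    """Map every gene extremity to its neighbour in a circular genome.

    A genome is a list of signed integers; +g is read tail -> head,
    -g is read head -> tail. (Encoding chosen for this sketch.)
    """
    def left(g):   # extremity met first when reading gene g
        return (abs(g), "t" if g > 0 else "h")

    def right(g):  # extremity met last when reading gene g
        return (abs(g), "h" if g > 0 else "t")

    adj = {}
    n = len(genome)
    for i, g in enumerate(genome):
        nxt = genome[(i + 1) % n]      # wrap around: circular chromosome
        a, b = right(g), left(nxt)
        adj[a] = b
        adj[b] = a
    return adj


def dcj_distance(genome_a, genome_b):
    """DCJ distance for equal-content circular genomes: n minus #cycles."""
    genes_a = sorted(abs(g) for g in genome_a)
    genes_b = sorted(abs(g) for g in genome_b)
    if genes_a != genes_b or len(set(genes_a)) != len(genes_a):
        raise ValueError("sketch assumes equal content and no duplicates")

    adj_a, adj_b = adjacencies(genome_a), adjacencies(genome_b)
    seen = set()
    cycles = 0
    for start in adj_a:
        if start in seen:
            continue
        cycles += 1
        x = start
        # Walk the cycle, alternating an adjacency of A with one of B.
        while x not in seen:
            seen.add(x)
            y = adj_a[x]
            seen.add(y)
            x = adj_b[y]
    return len(genome_a) - cycles


if __name__ == "__main__":
    print(dcj_distance([1, 2, 3], [1, -2, 3]))
```

Each extremity is visited once, so the running time is linear in the number of genes. Once duplicated genes are allowed, the adjacency graph is no longer uniquely defined, which is where the Exemplar and Matching formulations come in.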

research
03/07/2023

Investigating the complexity of the double distance problems

Two genomes over the same set of gene families form a canonical pair whe...
research
02/04/2019

Distances between Data Sets Based on Summary Statistics

The concepts of similarity and distance are crucial in data mining. We c...
research
04/01/2020

k-Median clustering under discrete Fréchet and Hausdorff distances

We give the first near-linear time (1+ε)-approximation algorithm for k-me...
research
03/04/2020

Pivot Selection for Median String Problem

The Median String Problem is W[1]-Hard under the Levenshtein distance, t...
research
07/07/2020

Natural family-free genomic distance

A classical problem in comparative genomics is to compute the rearrangem...
research
09/27/2019

Computing the Inversion-Indel Distance

The inversion distance, that is the distance between two unichromosomal ...
research
06/26/2019

Generalized Median Graph via Iterative Alternate Minimizations

Computing a graph prototype may constitute a core element for clustering...