Assessing the best edit in perturbation-based iterative refinement algorithms to compute the median string

12/04/2019
by P. Mirabal, et al.

Strings are a natural representation of biological data such as DNA, RNA and protein sequences. The problem of finding a string that summarizes a set of sequences has direct application in relative compression algorithms for genome and proteome analysis, where reference sequences need to be chosen. Median strings have been used as representatives of a set of strings in different domains. However, several formulations of this problem are NP-complete. As an alternative, heuristic approaches that iteratively refine an initial coarse solution by applying edit operations have been proposed. Recently, we investigated the selection of the optimal edit operations to speed up convergence without spoiling the quality of the approximated median string. We propose a novel algorithm that outperforms state-of-the-art heuristic approximations to the median string in terms of convergence speed by estimating the effect of a perturbation on the minimization of the expressions that define the median strings. We present a corpus of comparative experiments to validate these results.
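The abstract does not spell out the proposed algorithm. What follows is only a minimal sketch of the generic perturbation-based refinement scheme it builds on, not the authors' method. It makes several assumptions: unit-cost Levenshtein distance, the DNA alphabet `ACGT`, the set median as the initial coarse solution, and a best-improvement rule that tries every single edit (insertion, deletion, substitution) and keeps the one that most lowers the sum of distances to the set. All function names (`levenshtein`, `sum_of_distances`, `set_median`, `perturbations`, `refine_median`) are hypothetical. The sketch evaluates every candidate exhaustively and does not estimate a perturbation's effect in advance. That exhaustive evaluation is the cost a faster edit-selection rule would try to avoid.

```python
from typing import Iterator, Sequence

ALPHABET = "ACGT"  # assumption: DNA alphabet, chosen only for illustration


def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance via the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete ca from a
                curr[j - 1] + 1,           # insert cb into a
                prev[j - 1] + (ca != cb),  # substitute, or match at cost 0
            )
        prev = curr
    return prev[-1]


def sum_of_distances(candidate: str, strings: Sequence[str]) -> int:
    """Objective minimized by the (generalized) median string."""
    return sum(levenshtein(candidate, s) for s in strings)


def set_median(strings: Sequence[str]) -> str:
    """Set member with the smallest sum of distances: the coarse starting point."""
    return min(strings, key=lambda c: sum_of_distances(c, strings))


def perturbations(s: str, alphabet: str) -> Iterator[str]:
    """Every string exactly one edit operation away from s."""
    for i in range(len(s) + 1):
        for ch in alphabet:
            yield s[:i] + ch + s[i:]              # insertion before position i
        if i < len(s):
            yield s[:i] + s[i + 1:]               # deletion of s[i]
            for ch in alphabet:
                if ch != s[i]:
                    yield s[:i] + ch + s[i + 1:]  # substitution of s[i]


def refine_median(strings: Sequence[str], alphabet: str = ALPHABET,
                  max_iter: int = 100) -> str:
    """Hypothetical greedy refinement: apply the best single edit until none helps."""
    current = set_median(strings)
    best_cost = sum_of_distances(current, strings)
    for _ in range(max_iter):
        best_candidate = None
        for cand in perturbations(current, alphabet):
            cost = sum_of_distances(cand, strings)
            if cost < best_cost:
                best_cost, best_candidate = cost, cand
        if best_candidate is None:
            break  # local minimum: no single edit lowers the objective
        current = best_candidate
    return current


if __name__ == "__main__":
    data = ["ACGT", "AGGT", "ACGA"]
    print(refine_median(data), sum_of_distances(refine_median(data), data))
```

For the three strings in the usage example, the set median `ACGT` already has a total distance of 2, and no single edit improves on it, so refinement stops immediately. Each iteration scores on the order of |Σ|·n candidates, and each one needs N distance computations. This per-iteration cost is why choosing which edit to apply, rather than trying them all, matters for convergence speed.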
