Sequence Searching Allowing for Non-Overlapping Adjacent Unbalanced Translocations

12/02/2018
by Domenico Cantone, et al.

Unbalanced translocations are among the most frequent chromosomal alterations, accounting for 30% of all losses of heterozygosity, a major genetic event causing the inactivation of tumor suppressor genes. Despite their central role in genomic sequence analysis, little attention has been devoted to the problem of matching sequences while allowing for this kind of chromosomal alteration. In this paper we investigate the approximate string matching problem in which the edit operations are non-overlapping unbalanced translocations of adjacent factors. We first present an O(nm^3)-time and O(m^2)-space algorithm based on the dynamic-programming approach. We then improve on this result with a second solution that makes use of the Directed Acyclic Word Graph of the pattern. Specifically, we show that, under the assumptions of equiprobability and independence of characters, our algorithm has an O(n log^2_σ m) average time complexity for an alphabet of size σ, while still maintaining the O(nm^3)-time and O(m^2)-space complexity in the worst case. To the best of our knowledge, this is the first solution in the literature to the approximate string matching problem allowing for unbalanced translocations of factors.
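To make the problem statement concrete, here is a minimal brute-force sketch in Python. It assumes (our reading, not a definition quoted from the paper) that a translocation rewrites two adjacent factors uv of the pattern as vu, that the two factors may have different lengths, and that "non-overlapping" means each pattern position takes part in at most one translocation. The function names `matches_with_translocations` and `search` are hypothetical. Each length-m text window is tested with a simple prefix dynamic program; because candidate blocks are compared by slicing, the cost is roughly O(m^4) per window, so this sketch reproduces neither the O(nm^3) bound nor the DAWG-based improvement described above.

```python
def matches_with_translocations(p: str, x: str) -> bool:
    """Return True if x can be obtained from p by swapping non-overlapping
    pairs of adjacent factors (u v -> v u). Hypothetical helper, naive DP."""
    m = len(p)
    if len(x) != m:
        return False
    # ok[i] is True when x[:i] can be obtained from p[:i].
    ok = [False] * (m + 1)
    ok[0] = True
    for i in range(1, m + 1):
        # Case 1: position i-1 is not involved in any translocation.
        if ok[i - 1] and p[i - 1] == x[i - 1]:
            ok[i] = True
            continue
        # Case 2: the block p[j:i] = u v (u = p[j:k], v = p[k:i], both
        # non-empty, lengths may differ) appears in x as v u.
        for j in range(i - 1):
            if not ok[j]:
                continue
            for k in range(j + 1, i):
                u, v = p[j:k], p[k:i]
                if x[j:i] == v + u:
                    ok[i] = True
                    break
            if ok[i]:
                break
    return ok[m]


def search(p: str, t: str) -> list[int]:
    """Return every start position s such that t[s:s+len(p)] matches p
    under non-overlapping adjacent translocations (brute force over windows)."""
    m = len(p)
    return [s for s in range(len(t) - m + 1)
            if matches_with_translocations(p, t[s:s + m])]
```

For instance, `search("ab", "xbay")` reports window 1, where "ba" is "ab" with its two single-character factors swapped, and `matches_with_translocations("abcde", "cdeab")` holds because "cdeab" is the translocation of u = "ab" and v = "cde".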


research
01/03/2021

Text Searching Allowing for Non-Overlapping Adjacent Unbalanced Translocations

In this paper we investigate the approximate string matching problem whe...
research
01/13/2018

Longest Common Prefixes with k-Errors and Applications

Although real-world text datasets, such as DNA sequences, are far from b...
research
03/03/2023

On Sensitivity of Compact Directed Acyclic Word Graphs

Compact directed acyclic word graphs (CDAWGs) [Blumer et al. 1987] are a...
research
01/31/2022

Fuzzy Segmentations of a String

This article discusses a particular case of the data clustering problem,...
research
08/16/2019

Efficient Online String Matching Based on Characters Distance Text Sampling

Searching for all occurrences of a pattern in a text is a fundamental pr...
research
02/28/2018

Fast Lempel-Ziv Decompression in Linear Space

We consider the problem of decompressing the Lempel-Ziv 77 representatio...
