Text Searching Allowing for Non-Overlapping Adjacent Unbalanced Translocations

01/03/2021
by   Domenico Cantone, et al.
0

In this paper we investigate the approximate string matching problem when the allowed edit operations are non-overlapping unbalanced translocations of adjacent factors. Such kind of edit operations take place when two adjacent sub-strings of the text swap, resulting in a modified string. The two involved substrings are allowed to be of different lengths. Such large-scale modifications on strings have various applications. They are among the most frequent chromosomal alterations, accounted for 30% of all losses of heterozygosity, a major genetic event causing inactivation of cancer suppressor genes. In addition, among other applications, they are frequent modifications accounted in musical or in natural language information retrieval. However, despite of their central role in so many fields of text processing, little attention has been devoted to the problem of matching strings allowing for this kind of edit operation. In this paper we present three algorithms for solving the problem, all of them with a (nm^3) worst-case and a (m^2)-space complexity, where m and n are the length of the pattern and of the text, respectively. particular, our first algorithm is based on the dynamic-programming approach. Our second solution improves the previous one by making use of the Directed Acyclic Word Graph of the pattern. Finally our third algorithm is based on an alignment procedure. We also show that under the assumptions of equiprobability and independence of characters, our second algorithm has a (nlog^2_σ m) average time complexity, for an alphabet of size σ≥ 4.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
12/02/2018

Sequence Searching Allowing for Non-Overlapping Adjacent Unbalanced Translocations

Unbalanced translocations are among the most frequent chromosomal altera...
research
01/07/2020

Complexity Issues of String to Graph Approximate Matching

The problem of matching a query string to a directed graph, whose vertic...
research
03/03/2023

On Sensitivity of Compact Directed Acyclic Word Graphs

Compact directed acyclic word graphs (CDAWGs) [Blumer et al. 1987] are a...
research
04/23/2018

Longest Common Factor Made Fully Dynamic

In the longest common factor (LCF) problem, we are given two strings S a...
research
08/16/2019

Efficient Online String Matching Based on Characters Distance Text Sampling

Searching for all occurrences of a pattern in a text is a fundamental pr...
research
10/08/2018

Approximate Online Pattern Matching in Sub-linear Time

We consider the approximate pattern matching problem under edit distance...
research
01/31/2022

Fuzzy Segmentations of a String

This article discusses a particular case of the data clustering problem,...

Please sign up or login with your details

Forgot password? Click here to reset