Fast computation of approximate weak common intervals in multiple indeterminate strings

04/05/2023
by Daniel Doerr, et al.

In ongoing work to define a principled method for syntenic block discovery and structuring, based on homology-derived constraints and a generalization of common intervals, we faced a fundamental computational problem: how to quickly determine, among a set of indeterminate strings (strings whose elements are subsets of characters), contiguous intervals that share the vast majority of their elements, while allowing for sharing subsets of characters subsumed by others and for certain elements to be missing from certain genomes. An algorithm for this problem in the special case of determinate strings (where each element is a single character of the alphabet, i.e., "normal" strings) was described by Doerr et al., but its running time would explode if generalized to indeterminate strings. In this paper, we describe an algorithm for computing these special common intervals in time close to that of the simpler algorithm of Doerr et al., and we show that it can compute these intervals in just a couple of hours for large collections (tens to hundreds) of bacterial genomes.
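To make the problem statement concrete, the following is a minimal brute-force sketch, not the algorithm of the paper. It assumes a simplified reading of the problem: an indeterminate string is a list of character sets, and an interval of another string "matches" a reference interval if every one of its elements shares at least one character with the reference interval and at most `delta` reference characters are missing. The names `char_set`, `naive_matches`, and `delta`, as well as the toy data, are hypothetical and chosen for illustration only.

```python
from typing import FrozenSet, List, Set, Tuple

# An indeterminate string: each position holds a subset of characters.
IndeterminateString = List[FrozenSet[str]]


def char_set(s: IndeterminateString, i: int, j: int) -> Set[str]:
    """Union of the character subsets at positions i..j (inclusive)."""
    out: Set[str] = set()
    for elem in s[i:j + 1]:
        out |= elem
    return out


def naive_matches(ref: IndeterminateString, i: int, j: int,
                  other: IndeterminateString, delta: int) -> List[Tuple[int, int]]:
    """Brute-force search (assumed, simplified matching criterion).

    Returns all intervals (k, l) of `other` in which every element shares
    at least one character with ref[i..j] and at most `delta` characters
    of ref[i..j] are absent.
    """
    target = char_set(ref, i, j)
    hits: List[Tuple[int, int]] = []
    for k in range(len(other)):
        seen: Set[str] = set()
        for l in range(k, len(other)):
            shared = other[l] & target
            if not shared:
                break  # element foreign to the reference interval
            seen |= shared
            if len(target - seen) <= delta:
                hits.append((k, l))
    return hits


# Toy data (hypothetical): characters stand in for gene families.
ref = [frozenset("a"), frozenset("bc"), frozenset("d")]
other = [frozenset("x"), frozenset("ab"), frozenset("d"), frozenset("y")]

print(naive_matches(ref, 0, 2, other, delta=1))
```

Even in this toy form, every reference interval triggers a quadratic scan of each other string, which hints at why a direct generalization becomes impractical for many genomes.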


research
01/16/2020

Faster STR-EC-LCS Computation

The longest common subsequence (LCS) problem is a central problem in str...
research
06/30/2022

Computing the Parameterized Burrows–Wheeler Transform Online

Parameterized strings are a generalization of strings in that their char...
research
09/07/2020

A Fast Randomized Algorithm for Finding the Maximal Common Subsequences

Finding the common subsequences of L multiple strings has many applicati...
research
04/16/2019

Heuristic algorithms for the Longest Filled Common Subsequence Problem

At CPM 2017, Castelli et al. define and study a new variant of the Longe...
research
04/05/2020

On the Tandem Duplication Distance

A tandem duplication denotes the process of inserting a copy of a segmen...
research
08/19/2020

Modular Subset Sum, Dynamic Strings, and Zero-Sum Sets

The modular subset sum problem consists of deciding, given a modulus m, ...
research
08/19/2022

Merging Sorted Lists of Similar Strings

Merging T sorted, non-redundant lists containing M elements into a singl...
