A Compact DAG for Storing and Searching Maximal Common Subsequences

07/25/2023
by Alessio Conte, et al.

Maximal Common Subsequences (MCSs) between two strings X and Y are subsequences of both X and Y that are maximal under inclusion. MCSs relax and generalize the well-known and widely used concept of Longest Common Subsequences (LCSs), which can be seen as MCSs of maximum length. While the number of both LCSs and MCSs can be exponential in the length of the strings, LCSs have long been exploited for string and text analysis, since simple compact representations of all LCSs between two strings, built via dynamic programming or automata, have been known since the '70s. MCSs appear to have a more challenging structure: even listing them efficiently was an open problem until recently. Solving it narrowed the complexity difference between the two problems, but the gap remained significant. In this paper we close the complexity gap: we show how to build a DAG of polynomial size, in polynomial time, which allows for efficient operations on the set of all MCSs such as enumeration in Constant Amortized Time per solution (CAT), counting, and random access to the i-th element (i.e., rank and select operations). Other than improving known algorithmic results, this work paves the way for new sequence analysis methods based on MCSs.
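
To make the definitions concrete, the following is a minimal Python sketch, not the construction from the paper. It enumerates MCSs by exponential brute force, which is only feasible for very short strings, and then illustrates counting and select on a small hand-built DAG whose source-to-sink paths spell the MCSs. The function names, the `DAG` dictionary layout, and the example strings "CAB" and "ABC" are all assumptions chosen for illustration.

```python
from functools import lru_cache
from itertools import combinations


def is_subsequence(w, s):
    """Return True if w can be obtained from s by deleting characters."""
    it = iter(s)
    return all(ch in it for ch in w)


def maximal_common_subsequences(x, y):
    """Brute force (exponential): list every MCS of x and y in sorted order."""
    subs_x = {
        "".join(x[i] for i in idx)
        for k in range(len(x) + 1)
        for idx in combinations(range(len(x)), k)
    }
    common = {w for w in subs_x if is_subsequence(w, y)}
    # w is maximal if no other common subsequence contains it as a subsequence.
    return sorted(
        w for w in common
        if not any(w != z and is_subsequence(w, z) for z in common)
    )


# Hypothetical hand-built DAG for the MCSs of "CAB" and "ABC":
# each source-to-sink path spells one MCS. Edges are (label, target),
# listed in label order so that select() follows lexicographic order.
DAG = {"s": [("A", "u"), ("C", "t")], "u": [("B", "t")], "t": []}


@lru_cache(maxsize=None)
def count_paths(node):
    """Number of paths from node to the sink (one per stored string)."""
    edges = DAG[node]
    if not edges:
        return 1
    return sum(count_paths(nxt) for _, nxt in edges)


def select(i, node="s"):
    """Return the i-th (0-based) string spelled by a path from node."""
    out = []
    while DAG[node]:
        for label, nxt in DAG[node]:
            c = count_paths(nxt)
            if i < c:
                out.append(label)
                node = nxt
                break
            i -= c  # skip all strings that start with this edge
        else:
            raise IndexError("index out of range")
    return "".join(out)


print(maximal_common_subsequences("CAB", "ABC"))  # ['AB', 'C']
print(count_paths("s"), [select(i) for i in range(count_paths("s"))])
```

In this example "C" is an MCS of length 1 even though the LCS "AB" has length 2: no character can be inserted into "C" while keeping it a subsequence of both strings. The path-counting walk in `select` is a generic technique for indexing paths in a DAG; how the paper builds its DAG in polynomial time is not shown here.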

research
09/07/2020

A Fast Randomized Algorithm for Finding the Maximal Common Subsequences

Finding the common subsequences of L multiple strings has many applicati...
research
01/17/2020

Duplication with transposition distance to the root for q-ary strings

We study the duplication with transposition distance between strings of ...
research
09/16/2023

Parallel Longest Common SubSequence Analysis In Chapel

One of the most critical problems in the field of string algorithms is t...
research
04/28/2020

Approximating longest common substring with k mismatches: Theory and practice

In the problem of the longest common substring with k mismatches we are ...
research
01/16/2020

Faster STR-EC-LCS Computation

The longest common subsequence (LCS) problem is a central problem in str...
research
12/03/2022

The Chvátal-Sankoff problem: Understanding random string comparison through stochastic processes

Given two equally long, uniformly random binary strings, the expected le...
research
04/23/2018

Longest Common Factor Made Fully Dynamic

In the longest common factor (LCF) problem, we are given two strings S a...