A Fast Randomized Algorithm for Finding the Maximal Common Subsequences

09/07/2020
by Jin Cao, et al.

Finding the common subsequences of L strings has many applications in bioinformatics, computational linguistics, and information retrieval. A well-known result states that finding a Longest Common Subsequence (LCS) of L strings is NP-hard, and the computational complexity of known exact algorithms grows exponentially in L. In this paper, we develop a randomized algorithm, referred to as Random-MCS, for finding a random instance of a Maximal Common Subsequence (MCS) of multiple strings. A common subsequence is maximal if inserting any character into it no longer yields a common subsequence. LCS is the special case of MCS with the greatest length. We show that the complexity of our algorithm is linear in L, which makes it suitable for large L. Furthermore, we study the occurrence probability of a single MCS instance and demonstrate, through both theoretical and experimental studies, that the longest subsequence obtained from multiple runs of Random-MCS often yields a solution to LCS.
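The following minimal Python sketch shows the definitions of a common subsequence and a maximal common subsequence concretely. It is not the paper's Random-MCS algorithm, whose details are not given in this abstract. It is a brute-force illustration built on an assumed strategy: start from the empty string, repeatedly insert one character at a position chosen uniformly at random among the insertions that keep the result a common subsequence, and stop when no such insertion exists. By the definition of maximality, the output is then an MCS. All function names are hypothetical. Keeping the longest output over several runs imitates the multiple-run heuristic for LCS.

```python
import random
from typing import List


def is_subsequence(s: str, t: str) -> bool:
    """True if s can be obtained from t by deleting zero or more characters."""
    it = iter(t)
    # `ch in it` consumes the iterator up to and including the first match.
    return all(ch in it for ch in s)


def is_common(s: str, strings: List[str]) -> bool:
    """True if s is a subsequence of every input string."""
    return all(is_subsequence(s, t) for t in strings)


def insertion_candidates(s: str, strings: List[str]) -> List[str]:
    """All distinct common subsequences obtained by inserting one character into s."""
    # Any inserted character must occur in every string, so strings[0] suffices.
    alphabet = sorted(set(strings[0]))  # sorted for reproducible ordering
    out = set()
    for i in range(len(s) + 1):
        for ch in alphabet:
            cand = s[:i] + ch + s[i:]
            if is_common(cand, strings):
                out.add(cand)
    return sorted(out)


def is_maximal(s: str, strings: List[str]) -> bool:
    """True if s is a common subsequence to which no character can be inserted."""
    return is_common(s, strings) and not insertion_candidates(s, strings)


def random_mcs_sketch(strings: List[str], rng: random.Random) -> str:
    """Hypothetical brute-force randomized construction of one MCS (not the paper's method)."""
    s = ""
    while True:
        cands = insertion_candidates(s, strings)
        if not cands:
            return s  # no insertion keeps s common, so s is maximal
        s = rng.choice(cands)


def longest_of_runs(strings: List[str], runs: int, seed: int = 0) -> str:
    """Run the sketch several times and keep the longest MCS found (an LCS candidate)."""
    rng = random.Random(seed)
    return max((random_mcs_sketch(strings, rng) for _ in range(runs)), key=len)


if __name__ == "__main__":
    strings = ["abc", "cab"]
    print(is_maximal("c", strings))        # "c" is maximal but not longest
    print(longest_of_runs(strings, runs=20))
```

For the strings "abc" and "cab", both "ab" and "c" are maximal common subsequences, but only "ab" is an LCS. A single run can therefore stop at a shorter MCS, which is why keeping the longest result over several runs helps. Each membership check in this sketch scans all L strings, but the repeated insertion search is far more expensive than the linear-in-L algorithm the paper describes.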
