The Chvátal-Sankoff problem: Understanding random string comparison through stochastic processes

12/03/2022
by Alexander Tiskin, et al.

Given two equally long, uniformly random binary strings, the expected length of their longest common subsequence (LCS) is asymptotically proportional to the strings' length. Finding the proportionality coefficient γ, i.e. the limit of the normalised LCS length for two random binary strings of length n → ∞, is a very natural problem, first posed by Chvátal and Sankoff in 1975, and as yet unresolved. This problem has relevance to diverse fields ranging from combinatorics and algorithm analysis to coding theory and computational biology. Using methods of statistical mechanics, as well as some existing results on the combinatorial structure of LCS, we link the constant γ to the parameters of a certain stochastic particle process. These parameters are determined by a specific (large) system of polynomial equations with integer coefficients, which implies that γ is an algebraic number. Short of finding an exact closed-form solution for such a polynomial system, which appears to be unlikely, our approach essentially resolves the Chvátal-Sankoff problem, albeit in a somewhat unexpected way with a rather negative flavour.
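
To make the quantity in the abstract concrete, the following is a minimal sketch of how the normalised LCS length can be estimated empirically for finite n. It is not the method of the paper, which works with a stochastic particle process and a polynomial system; it is the textbook dynamic programme for LCS combined with plain Monte Carlo sampling. The function names `lcs_length` and `estimate_gamma`, and the parameters `n`, `trials` and `seed`, are assumptions introduced here for illustration.

```python
import random


def lcs_length(a, b):
    """Length of a longest common subsequence of sequences a and b.

    Textbook dynamic programme: O(len(a) * len(b)) time, O(len(b)) space.
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend the best LCS of the two shorter prefixes.
                curr.append(prev[j - 1] + 1)
            else:
                # Drop the last symbol of one prefix or the other.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def estimate_gamma(n, trials, seed=0):
    """Monte Carlo average of LCS(a, b) / n over uniformly random binary strings.

    This is only a finite-n sample mean (an illustrative assumption, not the
    paper's approach); it approaches the limit gamma only as n grows.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        a = [rng.getrandbits(1) for _ in range(n)]
        b = [rng.getrandbits(1) for _ in range(n)]
        total += lcs_length(a, b)
    return total / (trials * n)


if __name__ == "__main__":
    for n in (50, 200, 800):
        print(n, estimate_gamma(n, trials=20))
```

Because the expected LCS length is superadditive in n, the expected value of such a finite-n average lies below the limit γ, so estimates of this kind tend to approach γ from below and converge slowly. This is one reason simulation alone does not settle the problem.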


research
10/14/2022

Space-Efficient STR-IC-LCS Computation

One of the most fundamental methods for comparing two given strings A and...
research
06/23/2022

Longest Common Subsequence: Tabular vs. Closed-Form Equation Computation of Subsequence Probability

The Longest Common Subsequence Problem (LCS) deals with finding the long...
research
05/07/2021

Improved Approximation for Longest Common Subsequence over Small Alphabets

This paper investigates the approximability of the Longest Common Subseq...
research
06/09/2021

The zero-rate threshold for adversarial bit-deletions is less than 1/2

We prove that there exists an absolute constant δ>0 such that any binary code...
research
09/07/2020

A Fast Randomized Algorithm for Finding the Maximal Common Subsequences

Finding the common subsequences of L multiple strings has many applicati...
research
07/25/2023

A Compact DAG for Storing and Searching Maximal Common Subsequences

Maximal Common Subsequences (MCSs) between two strings X and Y are subse...
research
02/02/2018

From Clustering Supersequences to Entropy Minimizing Subsequences for Single and Double Deletions

A binary string transmitted via a memoryless i.i.d. deletion channel is ...
