Weighted Shortest Common Supersequence Problem Revisited

A weighted string, also known as a position weight matrix, is a sequence of probability distributions over some alphabet. We revisit the Weighted Shortest Common Supersequence (WSCS) problem, introduced by Amir et al. [SPIRE 2011], that is, the SCS problem on weighted strings. In the WSCS problem, we are given two weighted strings W_1 and W_2 and a threshold 1/z on probability, and we are asked to compute the shortest (standard) string S such that both W_1 and W_2 match subsequences of S (not necessarily the same) with probability at least 1/z. Amir et al. showed that this problem is NP-complete if the probabilities, including the threshold 1/z, are represented by their logarithms (encoded in binary). We present an algorithm that solves the WSCS problem for two weighted strings of length n over a constant-sized alphabet in O(n^2 √z log z) time. Notably, our upper bound matches known conditional lower bounds stating that the WSCS problem cannot be solved in O(n^{2-ε}) time or in O^*(z^{0.5-ε}) time unless there is a breakthrough improving upon long-standing upper bounds for fundamental NP-hard problems (CNF-SAT and Subset Sum, respectively). We also discover a fundamental difference between the WSCS problem and the Weighted Longest Common Subsequence (WLCS) problem, introduced by Amir et al. [JDA 2010]. We show that the WLCS problem cannot be solved in O(n^{f(z)}) time, for any function f(z), unless P=NP.
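To make the problem statement concrete, the following is a minimal brute-force sketch of the WSCS definition. It is not the algorithm from the paper and runs in exponential time. It assumes a weighted string is represented as a list of dictionaries mapping letters to probabilities, and that the threshold is given as 1/z; the function names `max_match_probability` and `brute_force_wscs` are hypothetical and chosen only for illustration.

```python
from itertools import product


def max_match_probability(weighted, s):
    """Highest probability with which `weighted` matches some subsequence of `s`.

    `weighted` is a list of dicts {letter: probability} (an assumed encoding).
    """
    m, n = len(weighted), len(s)
    # best[i][j]: best probability of matching weighted[:i] within s[:j]
    best = [[0.0] * (n + 1) for _ in range(m + 1)]
    for j in range(n + 1):
        best[0][j] = 1.0  # the empty prefix always matches with probability 1
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            skip = best[i][j - 1]  # s[j-1] is not used
            use = best[i - 1][j - 1] * weighted[i - 1].get(s[j - 1], 0.0)
            best[i][j] = max(skip, use)
    return best[m][n]


def brute_force_wscs(w1, w2, z, alphabet):
    """Shortest string in which both w1 and w2 match a subsequence
    with probability at least 1/z, or None if no such string exists.

    Illustration only: enumerates all candidates by increasing length.
    """
    threshold = 1.0 / z
    shortest = max(len(w1), len(w2))
    longest = len(w1) + len(w2)  # concatenating two solid matches suffices
    for length in range(shortest, longest + 1):
        for letters in product(alphabet, repeat=length):
            s = "".join(letters)
            if (max_match_probability(w1, s) >= threshold
                    and max_match_probability(w2, s) >= threshold):
                return s
    return None


# Hypothetical example input over the alphabet {a, b}
w1 = [{"a": 0.5, "b": 0.5}, {"a": 1.0}]
w2 = [{"b": 1.0}, {"a": 0.5, "b": 0.5}]
print(brute_force_wscs(w1, w2, 2, "ab"))  # prints "ba"
```

With z = 2, the string "ba" matches both example weighted strings with probability 0.5, so no longer supersequence is needed. The floating-point comparison is adequate for this toy input; exact rational arithmetic would be a safer choice for general probabilities.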
