Weighted Shortest Common Supersequence Problem Revisited

A weighted string, also known as a position weight matrix, is a sequence of probability distributions over some alphabet. We revisit the Weighted Shortest Common Supersequence (WSCS) problem, introduced by Amir et al. [SPIRE 2011], that is, the SCS problem on weighted strings. In the WSCS problem, we are given two weighted strings W_1 and W_2 and a threshold 1/z on probability, and we are asked to compute the shortest (standard) string S such that both W_1 and W_2 match subsequences of S (not necessarily the same) with probability at least 1/z. Amir et al. showed that this problem is NP-complete if the probabilities, including the threshold 1/z, are represented by their logarithms (encoded in binary). We present an algorithm that solves the WSCS problem for two weighted strings of length n over a constant-sized alphabet in O(n^2 √z log z) time. Notably, our upper bound matches known conditional lower bounds stating that the WSCS problem cannot be solved in O(n^(2-ε)) time or in O*(z^(0.5-ε)) time unless there is a breakthrough improving upon long-standing upper bounds for fundamental NP-hard problems (CNF-SAT and Subset Sum, respectively). We also discover a fundamental difference between the WSCS problem and the Weighted Longest Common Subsequence (WLCS) problem, introduced by Amir et al. [JDA 2010]. We show that the WLCS problem cannot be solved in O(n^f(z)) time, for any function f(z), unless P=NP.
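
To make the problem statement concrete, the following is a minimal brute-force sketch in Python. It is not the algorithm announced in the abstract: it enumerates candidate strings and takes exponential time, so it is only meant for toy inputs. It rests on several assumptions that are not in the source: a weighted string is represented as a list of dictionaries mapping letters to probabilities, the probability threshold is taken to be 1/z for a positive integer z, a string S counts as a supersequence of W when some subsequence of S of length |W| matches W with probability at least 1/z, and the function names `best_match_probability` and `wscs_bruteforce` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def best_match_probability(W, S):
    """Highest probability with which any length-|W| subsequence of S matches W.

    W is a list of dicts (letter -> probability); this representation is an
    assumption of the sketch. dp[i][j] is the best probability of matching
    the first i positions of W inside the first j letters of S.
    """
    n, m = len(W), len(S)
    dp = [[Fraction(0)] * (m + 1) for _ in range(n + 1)]
    dp[0] = [Fraction(1)] * (m + 1)  # the empty prefix always matches
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Either skip S[j-1], or align it with position i-1 of W.
            use = dp[i - 1][j - 1] * W[i - 1].get(S[j - 1], Fraction(0))
            dp[i][j] = max(dp[i][j - 1], use)
    return dp[n][m]


def wscs_bruteforce(W1, W2, z, alphabet):
    """Shortest string that is a supersequence of W1 and W2 with threshold 1/z.

    Exhaustive search over candidate lengths from max(|W1|, |W2|) up to
    |W1| + |W2|; returns None if no candidate in that range qualifies.
    """
    threshold = Fraction(1, z)
    for length in range(max(len(W1), len(W2)), len(W1) + len(W2) + 1):
        for letters in product(alphabet, repeat=length):
            S = "".join(letters)
            if (best_match_probability(W1, S) >= threshold
                    and best_match_probability(W2, S) >= threshold):
                return S
    return None


# Toy instance (made up for illustration) over the alphabet {a, b}.
W1 = [{"a": Fraction(1, 2), "b": Fraction(1, 2)}, {"a": Fraction(1)}]
W2 = [{"b": Fraction(1)}, {"a": Fraction(3, 4), "b": Fraction(1, 4)}]
print(wscs_bruteforce(W1, W2, 2, "ab"))  # ba
```

In this toy instance the string "ba" matches W_1 with probability 1/2 and W_2 with probability 3/4, so it meets the threshold 1/2 for both. Exact `Fraction` arithmetic is used so that comparisons against the threshold are not affected by floating-point rounding.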


