Practical KMP/BM Style Pattern-Matching on Indeterminate Strings

04/18/2022
by Hossein Dehghani, et al.

In this paper we describe two simple, fast, space-efficient algorithms for finding all matches of an indeterminate pattern p = p[1..m] in an indeterminate string x = x[1..n], where both p and x are defined on a "small" ordered alphabet Σ, say σ = |Σ| ≤ 9. Both algorithms depend on a preprocessing phase that replaces Σ by an integer alphabet Σ_I of size σ_I = σ and that (reversibly, in time linear in string length) maps both x and p into equivalent regular strings y and q, respectively, on Σ_I, whose maximum (indeterminate) letter can be expressed in a 32-bit word (for σ ≤ 4, thus for DNA sequences, an 8-bit representation suffices). We first describe an efficient version KMP Indet of the venerable Knuth-Morris-Pratt algorithm that finds all occurrences of q in y (that is, of p in x) but, whenever necessary, uses the prefix array rather than the border array to control shifts of the transformed pattern q along the transformed string y. We go on to describe a similar efficient version BM Indet of the Boyer-Moore algorithm that turns out to execute significantly faster than KMP Indet over a wide range of test cases. A noteworthy feature is that both algorithms require very little additional space: Θ(m) words. We conjecture that a similar approach may yield practical and efficient indeterminate equivalents to other well-known pattern-matching algorithms, in particular the several variants of Boyer-Moore.
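The abstract does not spell out the integer encoding, so the following is only a minimal sketch under an assumption: each letter of Σ is assigned a distinct small prime, an indeterminate letter becomes the product of its letters' primes, and two encoded letters match when their gcd exceeds 1. This choice is consistent with the stated bounds (the product of the first nine primes fits in 32 bits, and 2·3·5·7 = 210 fits in 8 bits), but it may differ from the paper's actual construction. The function names are hypothetical, and the matcher is a plain quadratic baseline used to illustrate matching on the encoded strings; it is not KMP Indet or BM Indet. Positions are 0-based here, whereas the abstract indexes from 1.

```python
from math import gcd

# Assumption: letters are coded as the first primes; nine suffice for sigma <= 9.
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23]


def build_code(alphabet):
    """Map each letter of a small ordered alphabet to a distinct prime."""
    letters = sorted(set(alphabet))
    if len(letters) > len(PRIMES):
        raise ValueError("this sketch supports at most 9 letters")
    return {letter: PRIMES[i] for i, letter in enumerate(letters)}


def encode(indet_string, code):
    """Encode a sequence of letter sets as a regular string of integers.

    Each position is an iterable of letters, e.g. "ac" stands for {a, c}.
    """
    encoded = []
    for position in indet_string:
        value = 1
        for letter in set(position):
            value *= code[letter]
        encoded.append(value)
    return encoded


def decode(encoded, code):
    """Invert encode(): recover the sorted letters at each position."""
    return ["".join(l for l in sorted(code) if v % code[l] == 0) for v in encoded]


def letters_match(a, b):
    """Two encoded letters match when they share at least one prime factor."""
    return gcd(a, b) > 1


def naive_find_all(y, q):
    """Baseline matcher: every 0-based position where q matches y."""
    n, m = len(y), len(q)
    return [
        i
        for i in range(n - m + 1)
        if all(letters_match(y[i + j], q[j]) for j in range(m))
    ]


code = build_code("acgt")
x = ["a", "ac", "g", "t", "cg", "g"]  # indeterminate string over DNA letters
p = ["c", "g"]                        # pattern
y, q = encode(x, code), encode(p, code)
matches = naive_find_all(y, q)
```

With this encoding, y is [2, 6, 5, 7, 15, 5] and q is [3, 5], so the baseline reports matches at positions 1 and 4. Because the encoded strings are ordinary integer arrays, a shift-based matcher could be run on them with `letters_match` in place of equality, which is the setting the abstract describes.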

