Online algorithms for finding distinct substrings with length and multiple prefix and suffix conditions

07/09/2022
by   Laurentius Leonard, et al.
0

Let two static sequences of strings P and S, representing prefix and suffix conditions respectively, be given as input for preprocessing. For the query, let two positive integers k_1 and k_2 be given, as well as a string T given in an online manner, such that T_i represents the length-i prefix of T for 1 ≤ i ≤ |T|. In this paper we are interested in computing the set 𝑎𝑛𝑠_𝑖 of distinct substrings w of T_i such that k_1 ≤ |w| ≤ k_2, and w contains some p ∈ P as a prefix and some s ∈ S as a suffix. More specifically, the counting problem is to output |𝑎𝑛𝑠_𝑖|, whereas the reporting problem is to output all elements of 𝑎𝑛𝑠_𝑖, for each iteration i. Let σ denote the alphabet size, and for a sequence of strings A, ‖ A‖=∑_u∈ A|u|. Then, we show that after O((‖ P‖ +‖ S‖)logσ)-time preprocessing, the solutions for the counting and reporting problems for each iteration up to i can be output in O(|T_i| logσ) and O(|T_i| logσ + |𝑎𝑛𝑠_𝑖|) total time. The preprocessing time can be reduced to O(‖ P‖ +‖ S‖) for integer alphabets of size polynomial with regard to ‖ P‖ +‖ S‖. Our algorithms have possible applications to network traffic classification.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset