Sliding Window String Indexing in Streams

01/23/2023
by Philip Bille, et al.

Given a string S over an alphabet Σ, the 'string indexing problem' is to preprocess S to subsequently support efficient pattern matching queries, i.e., given a pattern string P, report all occurrences of P in S. In this paper we study the 'streaming sliding window string indexing problem'. Here the string S arrives as a stream, one character at a time, and the goal is to maintain an index of the last w characters, called the 'window', for a specified parameter w. At any point in time a pattern matching query for a pattern P may arrive, also streamed one character at a time, and all occurrences of P within the current window must be returned. The streaming sliding window string indexing problem naturally captures scenarios where we want to index the most recent data (i.e., the window) of a stream while supporting efficient pattern matching. Our main result is a simple O(w) space data structure that uses O(log w) time with high probability to process each character from both the input string S and the pattern string P. Reporting the occurrences of P uses additional constant time per reported occurrence. Compared to previous work in similar scenarios, this result is the first to achieve an efficient worst-case time per character from the input stream. We also consider a delayed variant of the problem, where a query may be answered at any point within the next δ characters that arrive from either stream. We present an O(w + δ) space data structure for this problem that improves the above time bounds to O(log(w/δ)). In particular, for a delay of δ = εw we obtain an O(w) space data structure with constant time processing per character. The key idea to achieve our result is a novel and simple hierarchical structure of suffix trees of independent interest, inspired by the classic log-structured merge trees.
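To make the problem statement concrete, the following is a minimal Python sketch of the query semantics only. It is not the paper's data structure: it keeps the last w characters in a buffer and rescans them on every query, so it uses O(w) time per query rather than the O(log w) per-character bound stated above. The class name `NaiveSlidingWindowIndex` and the methods `append` and `report` are hypothetical names introduced for illustration, and reporting 0-based stream positions is an assumption.

```python
from collections import deque
from typing import List


class NaiveSlidingWindowIndex:
    """Reference semantics only (hypothetical helper): rescans the window per query."""

    def __init__(self, w: int) -> None:
        if w <= 0:
            raise ValueError("window size w must be positive")
        self.w = w
        self.window = deque(maxlen=w)  # oldest characters fall out automatically
        self.seen = 0                  # total characters consumed from the stream S

    def append(self, ch: str) -> None:
        """Consume one character of the stream S."""
        self.window.append(ch)
        self.seen += 1

    def report(self, pattern: str) -> List[int]:
        """Return 0-based stream positions where pattern starts inside the window."""
        if not pattern:
            return []
        text = "".join(self.window)
        offset = self.seen - len(text)  # stream position of the window's first character
        hits = []
        start = text.find(pattern)
        while start != -1:
            hits.append(offset + start)
            start = text.find(pattern, start + 1)  # step by 1 to keep overlapping matches
        return hits


index = NaiveSlidingWindowIndex(w=5)
for ch in "abababa":
    index.append(ch)

# The window now holds stream positions 2..6 ("ababa"); the occurrence of
# "aba" at position 0 has slid out of the window and is not reported.
print(index.report("aba"))
```

A baseline like this is useful mainly as a correctness oracle when testing a faster index, since both should report the same positions for any stream and pattern.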


research
01/23/2018

Sliding Suffix Tree

We consider a sliding window over a stream of characters from some finit...
research
07/08/2020

String Indexing for Top-k Close Consecutive Occurrences

The classic string indexing problem is to preprocess a string S into a c...
research
03/01/2019

Parallel Index-based Stream Join on a Multicore CPU

There is increasing interest in using multicore processors to accelerate...
research
07/04/2023

Sliding suffix trees simplified

Sliding suffix trees (Fiala & Greene, 1989) for an input text T over a...
research
11/15/2018

Vectorized Character Counting for Faster Pattern Matching

Many modern sequence alignment tools implement fast string matching usin...
research
04/02/2020

On Locating Paths in Compressed Cardinal Trees

A compressed index is a data structure representing a text within compre...
research
05/14/2023

Dynamic Convex Hulls under Window-Sliding Updates

We consider the problem of dynamically maintaining the convex hull of a ...
