Contextual Pattern Matching

10/14/2020
by Gonzalo Navarro, et al.

The research on indexing repetitive string collections has focused on the same search problems used for regular string collections, though they can make little sense in this scenario. For example, the basic pattern matching query "list all the positions where pattern P appears" can produce huge outputs when P appears in an area shared by many documents. All those occurrences are essentially the same. In this paper we propose a new query that can be more appropriate in these collections, which we call contextual pattern matching. The basic query of this type gives, in addition to P, a context length ℓ, and asks to report the occurrences of all distinct strings XPY, with |X|=|Y|=ℓ. While this query is easily solved in optimal time and linear space, we focus on using space related to the repetitiveness of the text collection and present the first solution of this kind. Letting r̄ be the maximum of the number of runs in the BWT of the text T[1..n] and of its reverse, our structure uses O(r̄ log(n/r̄)) space and finds the c contextual occurrences XPY of (P,ℓ) in time O(|P| + c log n). We give other space/time tradeoffs as well, for compressed and uncompressed indexes.
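To make the query concrete, the following is a minimal Python sketch of what contextual pattern matching asks for. It is a naive scan over the text, not the compressed index the abstract describes, and it only illustrates the definition. The function name `contextual_occurrences`, the 0-based positions, and the choice to skip occurrences that lie too close to either end of the text to have a full context are all assumptions made for illustration.

```python
from collections import defaultdict


def contextual_occurrences(text, pattern, ell):
    """Group occurrences of `pattern` by their context string X P Y, |X| = |Y| = ell.

    Returns a dict mapping each distinct context string to the list of
    0-based starting positions of `pattern` that occur inside that context.
    """
    groups = defaultdict(list)
    m = len(pattern)
    start = text.find(pattern)
    while start != -1:
        left = start - ell
        right = start + m + ell
        # Assumption: an occurrence without ell characters on both sides
        # has no full context and is skipped.
        if left >= 0 and right <= len(text):
            groups[text[left:right]].append(start)
        # Advance by one position so overlapping occurrences are found too.
        start = text.find(pattern, start + 1)
    return dict(groups)


result = contextual_occurrences("xabcy_xabcy_zabcw", "abc", 1)
# "abc" occurs three times, but only in two distinct contexts.
print(len(result))
for context, positions in result.items():
    print(context, positions)
```

In this toy input the pattern has three occurrences but only c = 2 distinct contexts, which is the kind of output reduction the query is meant to give on repetitive collections.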

research
02/20/2019

Fast, Small, and Simple Document Listing on Repetitive Text Collections

Document listing on string collections is the task of finding all docume...
research
06/03/2021

Position Heaps for Cartesian-tree Matching on Strings and Tries

The Cartesian-tree pattern matching is a recently introduced scheme of p...
research
12/18/2020

The Parameterized Suffix Tray

Let Σ and Π be disjoint alphabets, respectively called the static alphab...
research
03/01/2022

Quantum jumbled pattern matching

Let S_1, S_2 ∈Σ^* strings, we say that S_1 jumble match S_2 if they are ...
research
04/12/2018

Fast Prefix Search in Little Space, with Applications

It has been shown in the indexing literature that there is an essential ...
research
04/09/2021

A Probabilistic Framework for Lexicon-based Keyword Spotting in Handwritten Text Images

Query by String Keyword Spotting (KWS) is here considered as a key techn...
research
06/28/2020

Random Access in Persistent Strings

We consider compact representations of collections of similar strings th...
