Pattern Matching on Grammar-Compressed Strings in Linear Time

11/09/2021
by Moses Ganardi, et al.

The most fundamental problem considered in algorithms for text processing is pattern matching: given a pattern p of length m and a text t of length n, does p occur in t? Multiple versions of this basic question have been considered, and by now we know algorithms that are fast both in practice and in theory. However, the rapid increase in the amount of generated and stored data creates the need to design algorithms that operate directly on compressed representations of data. In the compressed pattern matching problem we are given a compressed representation of the text, with n being the length of the compressed representation and N being the length of the text, and an uncompressed pattern of length m. The most challenging scenario (and yet a relevant one when working with highly repetitive data, say biological information) is when the chosen compression method is capable of describing a string of exponential length (in the size of its representation). An elegant formalism for such a compression method is that of straight-line programs, which are simply context-free grammars describing exactly one string. While it has been known that the compressed pattern matching problem can be solved in O(m + n log N) time for this compression method, designing a linear-time algorithm remained open. We resolve this open question by presenting an O(n + m) time algorithm that, given a context-free grammar of size n that produces a single string t and a pattern p of length m, decides whether p occurs in t as a substring. To this end, we devise improved solutions for the weighted ancestor problem and the substring concatenation problem.
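To make the straight-line program formalism concrete, the following minimal Python sketch represents a grammar as a dictionary whose rules are either a single terminal character or a pair of previously defined nonterminals. The representation, the function names (`expanded_length`, `expand`, `naive_occurs`) and the two example grammars are assumptions made for illustration only; in particular, `naive_occurs` is the decompress-and-search baseline, whose cost grows with N, and is not the O(n + m) algorithm announced in the abstract.

```python
from typing import Dict, Tuple, Union

# Hypothetical representation: a rule is a terminal character
# or a pair (left, right) of nonterminals defined earlier.
Rule = Union[str, Tuple[str, str]]
Grammar = Dict[str, Rule]


def expanded_length(rules: Grammar, start: str) -> int:
    """Length N of the derived string, using one pass over the n rules."""
    lengths: Dict[str, int] = {}
    # Assumes rules are stored children-first (dicts keep insertion order).
    for name, rule in rules.items():
        if isinstance(rule, str):
            lengths[name] = 1
        else:
            left, right = rule
            lengths[name] = lengths[left] + lengths[right]
    return lengths[start]


def expand(rules: Grammar, start: str) -> str:
    """Naive decompression; the output has length N, possibly exponential in n."""
    text: Dict[str, str] = {}
    for name, rule in rules.items():
        if isinstance(rule, str):
            text[name] = rule
        else:
            text[name] = text[rule[0]] + text[rule[1]]
    return text[start]


def naive_occurs(rules: Grammar, start: str, pattern: str) -> bool:
    """Decompress-and-search baseline, NOT the linear-time algorithm."""
    return pattern in expand(rules, start)


def doubling_slp(k: int) -> Grammar:
    """k + 1 rules deriving the string 'a' repeated 2**k times."""
    rules: Grammar = {"X0": "a"}
    for i in range(1, k + 1):
        rules[f"X{i}"] = (f"X{i - 1}", f"X{i - 1}")
    return rules


def fibonacci_slp(k: int) -> Grammar:
    """Rules F1 = b, F2 = a, Fi = F(i-1) F(i-2) for 3 <= i <= k."""
    rules: Grammar = {"F1": "b", "F2": "a"}
    for i in range(3, k + 1):
        rules[f"F{i}"] = (f"F{i - 1}", f"F{i - 2}")
    return rules
```

With `doubling_slp(60)`, a grammar of 61 rules describes a string of length 2**60, which `expanded_length` reports without ever materialising the text. This gap between n and N is why decompressing first is impractical and why algorithms that work on the grammar itself are of interest.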


research
11/22/2017

Compressed Indexing with Signature Grammars

The compressed indexing problem is to preprocess a string S of length n ...
research
03/02/2018

Fine-Grained Complexity of Analyzing Compressed Data: Quantifying Improvements over Decompress-And-Solve

Can we analyze data without decompressing it? As our data keeps growing,...
research
07/01/2021

Compression by Contracting Straight-Line Programs

In grammar-based compression a string is represented by a context-free g...
research
08/12/2020

Cadences in Grammar-Compressed Strings

Cadences are structurally maximal arithmetic progressions of indices cor...
research
02/19/2020

Fast and linear-time string matching algorithms based on the distances of q-gram occurrences

Given a text T of length n and a pattern P of length m, the string match...
research
07/23/2018

Data Race Detection on Compressed Traces

We consider the problem of detecting data races in program traces that h...
research
03/13/2023

Optimal Square Detection Over General Alphabets

Squares (fragments of the form xx, for some string x) are arguably the m...
