Constant-delay enumeration for SLP-compressed documents

09/25/2022
by Martin Muñoz, et al.

We study the problem of enumerating the results of a query over a compressed document. The compression model we use is straight-line programs (SLPs), which are context-free grammars that produce a single string. For our queries we use a model called Annotated Automata, an extension of regular automata that allows annotations on letters. This model extends the notion of Regular Spanners, as it allows arbitrarily long outputs. Our main result is an algorithm that evaluates such a query by enumerating all results with output-linear delay after a preprocessing phase that takes time linear in the size of the SLP and cubic in the size of the automaton. This improves on Schmid and Schweikardt's result, which, with the same preprocessing time, enumerates with a delay that is logarithmic in the size of the uncompressed document. We achieve this through a persistent data structure named Enumerable Compact Sets with Shifts, which guarantees output-linear delay under certain restrictions. These results imply constant-delay enumeration algorithms in the context of regular spanners. Further, we use an extension of annotated automata with succinctly encoded annotations to save an exponential factor compared with previous results on constant-delay enumeration over vset automata. Lastly, we extend our results in the same fashion as Schmid and Schweikardt did, allowing complex document editing while maintaining the constant-delay guarantee.
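One standard way to obtain a preprocessing bound of the shape stated above (linear in the SLP, cubic in the automaton) is a single bottom-up pass that records, for every nonterminal, which pairs of automaton states are connected by the string that nonterminal derives. The Python sketch below illustrates that idea for plain, non-annotated NFAs and only decides acceptance. It is not the authors' algorithm and does not build Enumerable Compact Sets with Shifts. The dictionary encoding of the SLP, the helper names (`expand`, `compose`, `reach_matrices`, `accepts`) and the example automata are all hypothetical.

```python
from typing import Dict, List, Set, Tuple, Union

# Hypothetical encoding: a rule is either a single letter (terminal rule) or a
# pair (left, right) of nonterminal names meaning "left followed by right".
Rule = Union[str, Tuple[str, str]]
Matrix = List[List[bool]]
Delta = Set[Tuple[int, str, int]]  # NFA transitions (p, letter, q), states 0..n-1


def expand(slp: Dict[str, Rule], x: str) -> str:
    """Decompress nonterminal x. Exponential in the worst case; for checking only."""
    rule = slp[x]
    if isinstance(rule, str):
        return rule
    return expand(slp, rule[0]) + expand(slp, rule[1])


def compose(a: Matrix, b: Matrix, n: int) -> Matrix:
    """Boolean matrix product in O(n^3): p reaches r iff p reaches some q via a
    and q reaches r via b."""
    return [[any(a[p][q] and b[q][r] for q in range(n)) for r in range(n)]
            for p in range(n)]


def reach_matrices(slp: Dict[str, Rule], n: int, delta: Delta) -> Dict[str, Matrix]:
    """Compute one n x n reachability matrix per nonterminal, bottom-up.

    Assumes the dict lists each nonterminal after the nonterminals its rule uses
    (Python dicts keep insertion order), so a single pass suffices and the total
    cost is O(|SLP| * n^3).
    """
    mats: Dict[str, Matrix] = {}
    for x, rule in slp.items():
        if isinstance(rule, str):
            # Terminal rule: p -> q iff the NFA has the transition (p, letter, q).
            mats[x] = [[(p, rule, q) in delta for q in range(n)] for p in range(n)]
        else:
            left, right = rule
            mats[x] = compose(mats[left], mats[right], n)
    return mats


def accepts(slp: Dict[str, Rule], start: str, n: int, delta: Delta,
            initial: Set[int], final: Set[int]) -> bool:
    """Decide whether the NFA accepts the string derived from start."""
    m = reach_matrices(slp, n, delta)[start]
    return any(m[p][q] for p in initial for q in final)


# S derives "ab" repeated 8 times (16 letters) with only 6 rules.
slp = {"A": "a", "B": "b", "C": ("A", "B"), "D": ("C", "C"),
       "E": ("D", "D"), "S": ("E", "E")}

# NFAs over {a, b}: "contains the factor ba" and "contains the factor aa".
contains_ba = {(0, "a", 0), (0, "b", 0), (0, "b", 1), (1, "a", 2),
               (2, "a", 2), (2, "b", 2)}
contains_aa = {(0, "a", 0), (0, "b", 0), (0, "a", 1), (1, "a", 2),
               (2, "a", 2), (2, "b", 2)}

print(expand(slp, "S"))                             # abababababababab
print(accepts(slp, "S", 3, contains_ba, {0}, {2}))  # True
print(accepts(slp, "S", 3, contains_aa, {0}, {2}))  # False
```

Each nonterminal costs one n×n Boolean product, so the pass runs in O(|SLP|·n³) time and never decompresses the document; `expand` is included only to check the example. Enumerating annotated outputs with output-linear delay needs the additional machinery the abstract describes, which this sketch does not attempt.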
