Grammar-Compressed Indexes with Logarithmic Search Time

04/01/2020
by Francisco Claude, et al.

Let a text T[1..n] be the only string generated by a context-free grammar with g (terminal and nonterminal) symbols, and of size G (measured as the sum of the lengths of the right-hand sides of the rules). Such a grammar, called a grammar-compressed representation of T, can be encoded using essentially G lg g bits. We introduce the first grammar-compressed index that uses O(G lg n) bits and can find the occ occurrences of patterns P[1..m] in time O((m^2+occ) lg G). We implement the index and demonstrate its practicality in comparison with the state of the art, on highly repetitive text collections.
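To make the quantities n, g and G concrete, here is a minimal Python sketch on a hypothetical toy grammar. The rules, the start symbol and all helper names (grammar_size, symbol_count, expand, encoding_bits) are illustrative assumptions, not taken from the paper. The bit count uses a plain fixed-width code of ceil(lg g) bits per right-hand-side symbol, which is our simplifying assumption of what "essentially G lg g bits" means; the real encoding may differ in lower-order terms. The sketch only illustrates the representation, not the index or its search algorithm.

```python
from functools import lru_cache
from math import ceil, log2

# Hypothetical toy grammar: every nonterminal has exactly one rule and the
# rules are acyclic, so the grammar generates exactly one string T.
# Any symbol that is not a key of RULES is a terminal.
RULES = {
    "S": ["A", "A", "B"],
    "A": ["B", "a"],
    "B": ["a", "b"],
}
START = "S"


def grammar_size(rules):
    """G: the sum of the lengths of all right-hand sides."""
    return sum(len(rhs) for rhs in rules.values())


def symbol_count(rules):
    """g: the number of distinct terminal and nonterminal symbols."""
    symbols = set(rules)
    for rhs in rules.values():
        symbols.update(rhs)
    return len(symbols)


def expand(rules, start):
    """Decompress the grammar into the single text T it generates."""

    @lru_cache(maxsize=None)
    def exp(sym):
        if sym not in rules:
            return sym  # terminal symbol expands to itself
        return "".join(exp(s) for s in rules[sym])

    return exp(start)


def encoding_bits(rules):
    """Assumed fixed-width estimate: G symbols of ceil(lg g) bits each."""
    return grammar_size(rules) * ceil(log2(symbol_count(rules)))


if __name__ == "__main__":
    T = expand(RULES, START)
    # Prints: abaabaab 8 7 5 21
    print(T, len(T), grammar_size(RULES), symbol_count(RULES), encoding_bits(RULES))
```

Here the text has n = 8 symbols while the grammar has G = 7; on highly repetitive texts, G can be much smaller than n, which is what makes indexes of O(G lg n) bits attractive.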
