Computing all-vs-all MEMs in grammar-compressed text

06/29/2023
by Diego Díaz-Domínguez et al.

We describe a compression-aware method to compute all-vs-all maximal exact matches (MEMs) among the strings of a repetitive collection 𝒯. The key concept in our work is the construction of a fully-balanced grammar 𝒢 from 𝒯 that meets a property we call fix-free: the expansions of the nonterminals that have the same height in the parse tree form a fix-free set (i.e., a set that is both prefix-free and suffix-free). The fix-free property allows us to compute the MEMs of 𝒯 incrementally over 𝒢 using a standard suffix-tree-based MEM algorithm, which runs on a subset of the grammar rules at a time and does not decompress nonterminals. By modifying the locally-consistent grammar of Christiansen et al. (2020), we show how to build 𝒢 from 𝒯 in linear time and space. We also demonstrate that our MEM algorithm runs on top of 𝒢 in O(G + occ) time and uses O((G + occ) log G) bits, where G is the grammar size and occ is the number of MEMs in 𝒯. In the conclusions, we discuss how our idea can be modified to implement approximate pattern matching in compressed space.
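To make the two central notions concrete, the following is a minimal Python sketch on plain, uncompressed strings. It is not the grammar-based algorithm of the paper: it checks whether a set of strings is fix-free, and it enumerates MEMs with a naive quadratic table that can serve as a reference baseline on small inputs. The function names (is_fix_free, naive_mems, all_vs_all_mems) and the min_len parameter are assumptions introduced here for illustration.

```python
from itertools import combinations


def is_fix_free(strings):
    """Return True if no string is a prefix or a suffix of a different string."""
    items = set(strings)
    for a in items:
        for b in items:
            if a != b and (b.startswith(a) or b.endswith(a)):
                return False
    return True


def naive_mems(x, y, min_len=1):
    """Return (i, j, length) for every maximal exact match x[i:i+length] == y[j:j+length].

    Naive O(|x| * |y|) baseline on uncompressed strings (illustration only).
    """
    n, m = len(x), len(y)
    # lcp[i][j] = length of the longest common prefix of x[i:] and y[j:]
    lcp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if x[i] == y[j]:
                lcp[i][j] = lcp[i + 1][j + 1] + 1

    mems = []
    for i in range(n):
        for j in range(m):
            length = lcp[i][j]  # right-maximal by construction
            if length < min_len:
                continue
            # Skip matches that can still be extended to the left.
            if i > 0 and j > 0 and x[i - 1] == y[j - 1]:
                continue
            mems.append((i, j, length))
    return mems


def all_vs_all_mems(collection, min_len=1):
    """Map each pair of string indices (a, b), a < b, to the MEMs between them."""
    return {
        (a, b): naive_mems(collection[a], collection[b], min_len)
        for a, b in combinations(range(len(collection)), 2)
    }


if __name__ == "__main__":
    print(is_fix_free({"ab", "ba", "cc"}))
    print(naive_mems("abcab", "cabc", min_len=2))
```

For instance, naive_mems("abcab", "cabc", min_len=2) reports the two matches "abc" and "cab", each of length 3. A baseline of this kind decompresses everything and takes quadratic time per pair, which is the cost the grammar-based method is designed to avoid.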


