Computing MEMs on Repetitive Text Collections

10/18/2022
by Gonzalo Navarro, et al.

We consider the problem of computing the Maximal Exact Matches (MEMs) of a given pattern P[1..m] on a large repetitive text collection T[1..n], which is represented as a (hopefully much smaller) run-length context-free grammar of size g_rl. We show that the problem can be solved in time O(m^2 log^ϵ n), for any constant ϵ > 0, on a data structure of size O(g_rl). Further, on a locally consistent grammar of size O(δ log(n/δ)), the time decreases to O(m log m (log m + log^ϵ n)). The value δ is a function of the substring complexity of T, and Ω(δ log(n/δ)) is a tight lower bound on the compressibility of repetitive texts T, so our structure has optimal size in terms of n and δ. We extend our results to the problem of finding q-MEMs, which must appear at least q times in T.
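
To make the problem statement concrete, the following is a minimal sketch of a naive baseline that works directly on the uncompressed text. It is not the grammar-based method of the paper, and it makes no attempt to match the stated time bounds. The names `count_occurrences` and `naive_mems` are hypothetical, positions are 0-based (the abstract uses 1-based indexing), and the sketch assumes the usual definition: a q-MEM is a substring of P that occurs at least q times in T and cannot be extended to the left or right in P without dropping below q occurrences.

```python
def count_occurrences(text, s):
    """Count occurrences of s in text, overlapping ones included."""
    count = 0
    pos = text.find(s)
    while pos != -1:
        count += 1
        pos = text.find(s, pos + 1)
    return count


def naive_mems(pattern, text, q=1):
    """Return the q-MEMs of pattern in text as (start, length) pairs.

    Illustrative baseline only: it scans the plain text for every
    candidate substring, so it is far slower than an index-based method.
    """
    mems = []
    prev_end = 0  # exclusive end of the longest match found from earlier starts
    m = len(pattern)
    for i in range(m):
        # If pattern[i-1:prev_end] occurs >= q times, so does pattern[i:prev_end],
        # so the extension can resume from prev_end instead of from i.
        j = max(i, prev_end)
        while j < m and count_occurrences(text, pattern[i:j + 1]) >= q:
            j += 1
        # The match starting at i is left-maximal only if it ends strictly
        # after every match that starts earlier; it must also be non-empty.
        if j > prev_end and j > i:
            mems.append((i, j - i))
        prev_end = max(prev_end, j)
    return mems


if __name__ == "__main__":
    T = "abracadabra"
    P = "cadxbra"
    for start, length in naive_mems(P, T):
        print(start, P[start:start + length])
    for start, length in naive_mems(P, T, q=2):
        print(start, P[start:start + length])
```

With q = 1 the sketch reports the matches "cad" and "bra". With q = 2 the match "cad" disappears because it occurs only once in T, and the shorter match "a" at position 1 is reported instead, alongside "bra".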
