Online LZ77 Parsing and Matching Statistics with RLBWTs

02/16/2018
by   Hideo Bannai, et al.

Lempel-Ziv 1977 (LZ77) parsing, matching statistics and the Burrows-Wheeler Transform (BWT) are all fundamental elements of stringology. In a series of recent papers, Policriti and Prezza (DCC 2016 and Algorithmica, CPM 2017) showed how we can use an augmented run-length compressed BWT (RLBWT) of the reverse T^R of a text T to compute offline the LZ77 parse of T in O(n log r) time and O(r) space, where n is the length of T and r is the number of runs in the BWT of T^R. In this paper we first extend a well-known technique for updating an unaugmented RLBWT when a character is prepended to a text, so that it works with Policriti and Prezza's augmented RLBWT. This immediately implies that we can build the LZ77 parse of T online while still using O(n log r) time and O(r) space; it also seems likely to be of independent interest. Our experiments, using an extension of Ohno, Takabatake, I and Sakamoto's (IWOCA 2017) implementation of updating, show our approach is both time- and space-efficient for repetitive strings. We then show how to augment the RLBWT further (albeit making it static again and increasing its space by a factor proportional to the size of the alphabet) such that later, given another string S and O(log n)-time random access to T, we can compute the matching statistics of S with respect to T in O(|S| log n) time.
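
To make the three objects named in the abstract concrete, here is a minimal Python sketch of naive reference definitions. It is not the paper's method: these brute-force routines take quadratic time or worse and use no RLBWT at all. The function names, the `$` sentinel (assumed absent from the text and smaller than every text character), and the particular LZ77 variant (each phrase is the longest prefix of the remaining text that also starts at an earlier position, self-overlap allowed, or a single new character) are assumptions made for illustration only.

```python
def bwt_runs(text: str, sentinel: str = "$") -> int:
    """Count maximal equal-letter runs in the BWT of text + sentinel.

    Assumes the sentinel does not occur in text and sorts before every
    character of text. Sorting all rotations is only viable for tiny inputs.
    """
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    bwt = "".join(rot[-1] for rot in rotations)
    # A new run starts at position 0 and wherever the character changes.
    return sum(1 for i, c in enumerate(bwt) if i == 0 or c != bwt[i - 1])


def lz77_parse(text: str) -> list[str]:
    """Greedy LZ77 factorization (assumed variant, see lead-in)."""
    phrases = []
    n = len(text)
    i = 0
    while i < n:
        best = 0
        for j in range(i):  # every earlier starting position is a candidate source
            length = 0
            while i + length < n and text[j + length] == text[i + length]:
                length += 1
            best = max(best, length)
        step = max(best, 1)  # a character with no earlier occurrence is its own phrase
        phrases.append(text[i:i + step])
        i += step
    return phrases


def matching_statistics(s: str, t: str) -> list[int]:
    """ms[i] = length of the longest prefix of s[i:] that occurs somewhere in t."""
    ms = []
    for i in range(len(s)):
        length = 0
        while i + length < len(s) and s[i:i + length + 1] in t:
            length += 1
        ms.append(length)
    return ms


if __name__ == "__main__":
    t = "abracadabra"
    print(lz77_parse(t))
    print(bwt_runs(t[::-1]))  # r as used in the abstract: runs in the BWT of T^R
    print(matching_statistics("cabrax", t))
```

Brute-force versions like these are mainly useful as a correctness check on small inputs when testing a compressed-space implementation.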

research
12/13/2017

Closing in on Time and Space Optimal Construction of Compressed Indexes

Fast and space-efficient construction of compressed indexes such as comp...
research
01/18/2022

Computing Longest (Common) Lyndon Subsequences

Given a string T with length n whose characters are drawn from an ordere...
research
04/18/2018

On Abelian Longest Common Factor with and without RLE

We consider the Abelian longest common factor problem in two scenarios: ...
research
05/25/2022

Substring Complexities on Run-length Compressed Strings

Let S_T(k) denote the set of distinct substrings of length k in a string...
research
04/09/2018

From Regular Expression Matching to Parsing

Given a regular expression R and a string Q the regular expression match...
research
03/29/2018

Prefix-Free Parsing for Building Big BWTs

High-throughput sequencing technologies have led to explosive growth of ...
research
11/30/2018

Faster Attractor-Based Indexes

String attractors are a novel combinatorial object encompassing most kno...
