An Optimal-Time RLBWT Construction in BWT-runs Bounded Space

02/16/2022
by Takaaki Nishimoto, et al.

The compression of highly repetitive strings (i.e., strings with many repetitions) has been a central research topic in string processing, and many compression methods for such strings have been proposed. Among them, an efficient compression format attracting increasing attention is the run-length Burrows–Wheeler transform (RLBWT), a run-length encoding of the BWT, which is a reversible permutation of the input string based on the lexicographic order of its suffixes. State-of-the-art construction algorithms for the RLBWT suffer from either (i) non-optimal computation time or (ii) working space that is linear in the length of the input string. In this paper, we present r-comp, the first optimal-time construction algorithm for the RLBWT in BWT-runs bounded space. Specifically, r-comp runs in O(n + r log r) time and uses O(r log n) bits of working space, where n is the length of the input string and r is the number of equal-letter runs in its BWT. The computation time is optimal (i.e., O(n)) for strings with the property r = O(n / log n), which holds for most highly repetitive strings. Experiments on a real-world dataset of highly repetitive strings show the effectiveness of r-comp with respect to computation time and space.
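To make the quantities n and r concrete, the following is a minimal Python sketch of what an RLBWT is. It is not r-comp: it builds the BWT naively by sorting all rotations, which takes far more than O(n) time and uses space linear in n, exactly the costs the paper aims to avoid. The function names `bwt` and `rlbwt` and the choice of `$` as an end-of-string sentinel are assumptions made for illustration only.

```python
from itertools import groupby


def bwt(text: str, sentinel: str = "$") -> str:
    """Naive BWT: sort all rotations of text + sentinel, take the last column.

    Illustration only. The sentinel (an assumption of this sketch) must not
    occur in the input and must sort before every input character.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def rlbwt(text: str) -> list[tuple[str, int]]:
    """Run-length encode the BWT as (character, run length) pairs."""
    return [(ch, len(list(run))) for ch, run in groupby(bwt(text))]
```

For the input "banana", `bwt` returns "annb$aa" and `rlbwt` returns `[('a', 1), ('n', 2), ('b', 1), ('$', 1), ('a', 2)]`, so n = 7 (counting the sentinel) and r = 5. On highly repetitive inputs r is typically much smaller than n, which is why a working space of O(r log n) bits is attractive.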


