Optimal Rank and Select Queries on Dictionary-Compressed Text

11/03/2018
by Nicola Prezza, et al.

Let γ be the size of a string attractor for a string S of length n over an alphabet of size σ. By known reductions, we can build a string attractor such that γ is upper-bounded by the number r of equal-letter runs in the Burrows-Wheeler transform of S, the size z of its Lempel-Ziv 77 factorization, the sizes g and b of a grammar and a Macro Scheme generating S, or the size e of the smallest compact automaton recognizing all suffixes of S. It is known that, within O(γ log^ϵ n log(n/γ) / log log n) words of space, random access on S can be performed in optimal O(log(n/γ) / log log n) time. In this paper we show that, within O(σγ log^ϵ n log(n/γ) / log log n) words of space, rank and select queries can also be supported in O(log(n/γ) / log log n) time. We provide lower bounds showing that, when σ ∈ O(polylog n), these space-time upper bounds are optimal. Our structures also match the lower bounds on most dictionary compression schemes. This is the first result showing that rank and select queries on the text can be supported efficiently in a space bounded by the size of a compressed Burrows-Wheeler transform or a general Macro Scheme, and it improves existing bounds for LZ77-compressed text by a log log n time-factor in select queries.
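For readers unfamiliar with the two queries, the following is a minimal sketch of what rank and select compute, written as naive linear-time scans over the uncompressed string. It is not the data structure from the paper, which answers the same queries in compressed space. The function names and the indexing convention (0-based positions, rank counted over the prefix S[0..i), occurrences numbered from 1) are assumptions made for illustration and may differ from the paper's.

```python
def rank(s: str, c: str, i: int) -> int:
    """Number of occurrences of character c in the prefix s[0:i].

    Naive O(i) scan; the indexing convention is an assumption.
    """
    return s[:i].count(c)


def select(s: str, c: str, j: int) -> int:
    """0-based position of the j-th occurrence of c in s (j >= 1).

    Returns -1 if c occurs fewer than j times. Naive scan, for
    illustration only.
    """
    pos = -1
    for _ in range(j):
        pos = s.find(c, pos + 1)  # jump to the next occurrence of c
        if pos == -1:
            return -1
    return pos


if __name__ == "__main__":
    text = "abracadabra"
    print(rank(text, "a", 4))    # occurrences of 'a' in "abra"
    print(select(text, "a", 2))  # position of the second 'a'
```

Under this convention the two queries are inverses of each other: for any j not exceeding the number of occurrences of c, rank(s, c, select(s, c, j) + 1) equals j.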

research
03/26/2018

Universal Compressed Text Indexing

The rise of repetitive datasets has lately generated a lot of interest i...
research
10/30/2017

At the Roots of Dictionary Compression: String Attractors

A well-known fact in the field of lossless text compression is that high...
research
10/23/2019

Resolution of the Burrows-Wheeler Transform Conjecture

Burrows-Wheeler Transform (BWT) is an invertible text transformation tha...
research
11/13/2020

Substring Query Complexity of String Reconstruction

Suppose an oracle knows a string S that is unknown to us and we want to ...
research
11/20/2019

Grammar Compressed Sequences with Rank/Select Support

Sequence representations supporting not only direct access to their symb...
research
07/17/2023

Grammar Boosting: A New Technique for Proving Lower Bounds for Computation over Compressed Data

Grammar compression is a general compression framework in which a string...
research
03/02/2018

Fine-Grained Complexity of Analyzing Compressed Data: Quantifying Improvements over Decompress-And-Solve

Can we analyze data without decompressing it? As our data keeps growing,...
