Resolution of the Burrows-Wheeler Transform Conjecture

10/23/2019
by Dominik Kempa, et al.

The Burrows-Wheeler Transform (BWT) is an invertible text transformation that permutes the symbols of a text according to the lexicographical order of its suffixes. BWT is the main component of some of the most popular lossless compression methods as well as of compressed indexes, which are central in modern bioinformatics. The compression ratio of BWT-based compressors, such as bzip2, is quantified by the number r of maximal equal-letter runs in the BWT. This is also (up to polylog n factors, where n is the length of the text) the space used by the state-of-the-art BWT-based indexes, such as the recent r-index [Gagie et al., SODA 2018]. The output size of virtually every known compression method is known to be either within a polylog n factor of z, the size of the Lempel-Ziv (LZ77) parsing of the text, or significantly larger (by an n^ϵ factor for some ϵ > 0). The value of r, however, has resisted all such attempts, and until now no non-trivial upper bounds on r were known. In this paper, we show that every text satisfies r = O(z log^2 n). This result has a number of immediate implications: (1) it proves that a large body of work related to BWT automatically applies to the so-far disjoint field of Lempel-Ziv indexing and compression, e.g., it is possible to obtain full functionality of the suffix tree and the suffix array in O(z polylog n) space; (2) it lets us relate the number of runs in the BWT of the text and of its reverse; (3) it shows that many fundamental text processing tasks can be solved in optimal time, assuming that the text is compressible by a sufficiently large polylog n factor using LZ77.
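
To make the two quantities in the bound concrete, the following is a minimal sketch (not the paper's construction) that computes the BWT of a small text, counts its maximal equal-letter runs r, and counts the phrases z of a greedy LZ77 parse. The function names are hypothetical. The sketch assumes the text ends with a unique sentinel `$` that sorts before every other symbol, and it uses one common LZ77 variant in which each phrase is either a single new symbol or the longest prefix with an earlier, possibly overlapping, occurrence. Other variants differ in the exact phrase count. Everything here is naive and meant only for short inputs.

```python
def bwt(text):
    """Naive BWT: sort all suffixes, then take the symbol preceding each one.

    Assumes `text` ends with a unique sentinel '$' smaller than all other symbols.
    """
    suffix_order = sorted(range(len(text)), key=lambda i: text[i:])
    # For the suffix starting at 0, text[-1] wraps around to the sentinel.
    return "".join(text[i - 1] for i in suffix_order)


def count_runs(s):
    """Number r of maximal equal-letter runs in s."""
    if not s:
        return 0
    return 1 + sum(1 for a, b in zip(s, s[1:]) if a != b)


def lz77_phrase_count(text):
    """Number z of phrases in a greedy LZ77 parse (overlapping sources allowed)."""
    i, z, n = 0, 0, len(text)
    while i < n:
        length = 0
        # Extend the phrase while text[i:i+length+1] also occurs starting before i.
        # The search window ends at i + length, so any match starts at j < i.
        while i + length < n and text.find(text[i:i + length + 1], 0, i + length) != -1:
            length += 1
        i += max(length, 1)  # a symbol with no earlier occurrence forms its own phrase
        z += 1
    return z


text = "banana$"
transformed = bwt(text)
print(transformed)              # annb$aa
print(count_runs(transformed))  # r = 5
print(lz77_phrase_count(text))  # z = 5  (b | a | n | ana | $)
```

On such a short text r and z coincide. The result stated above concerns how they relate asymptotically, namely r = O(z log^2 n) for every text of length n.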


research
08/19/2020

Novel Results on the Number of Runs of the Burrows-Wheeler-Transform

The Burrows-Wheeler-Transform (BWT), a reversible string transformation,...
research
11/03/2018

Optimal Rank and Select Queries on Dictionary-Compressed Text

Let γ be the size of a string attractor for a string S of length n over ...
research
11/20/2018

Avoiding conjugacy classes on the 5-letter alphabet

We construct an infinite word w over the 5-letter alphabet such that for...
research
04/27/2020

In-Place Bijective Burrows-Wheeler Transforms

One of the most well-known variants of the Burrows-Wheeler transform (BW...
research
03/29/2021

A Fast and Small Subsampled R-index

The r-index (Gagie et al., JACM 2020) represented a breakthrough in comp...
research
04/05/2018

On Undetected Redundancy in the Burrows-Wheeler Transform

The Burrows-Wheeler-Transform (BWT) is an invertible permutation of a te...
research
05/03/2022

Computing Maximal Unique Matches with the r-index

In recent years, pangenomes received increasing attention from the scien...
