A theoretical and experimental analysis of BWT variants for string collections

02/26/2022
by Davide Cenzato et al.

The extended Burrows-Wheeler-Transform (eBWT), introduced by Mantaci et al. [Theor. Comput. Sci., 2007], is a generalization of the Burrows-Wheeler-Transform (BWT) to multisets of strings. While the original BWT is based on the lexicographic order, the eBWT uses the omega-order, which compares strings by their infinite repetitions (uuu...) and therefore differs from the lexicographic order in important ways. A number of tools are available that compute the BWT of string collections; however, in most cases the data structures they generate differ both from the originally defined eBWT and from each other. In this paper, we review the differences between these BWT variants from both a theoretical and a practical point of view, comparing them on several real-life datasets with different characteristics. We find that the differences can be extensive, depending on the dataset characteristics, and are largest on collections of many highly similar short sequences. The widely used parameter r, the number of runs of the BWT, also varies notably between the different BWT variants; on our datasets, it varied by a multiplicative factor of up to 4.2.
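The notions above (eBWT, omega-order, and the run count r) can be made concrete with a small sketch. The Python code below is an illustration under stated assumptions, not the authors' implementation: it assumes every input string is primitive (not a power of a shorter string), implements the omega-order through the classical identity "u^ω < v^ω if and only if uv < vu", and counts r as the number of maximal runs of equal characters. The function `concat_bwt` is a hypothetical separator-based variant chosen only for contrast (the BWT of s1$s2$...sk$ computed by naive, quadratic-time suffix sorting); it is not meant to reproduce any particular tool.

```python
from functools import cmp_to_key


def omega_cmp(u: str, v: str) -> int:
    """Compare u and v in the omega-order, i.e. u^w vs v^w (infinite powers).

    Uses the identity u^w < v^w  <=>  uv < vu, so no infinite string is built.
    Returns -1, 0 or 1.
    """
    a, b = u + v, v + u
    return (a > b) - (a < b)


def ebwt(strings: list[str]) -> str:
    """eBWT of a multiset of strings (assumption: every string is primitive).

    Collect every conjugate (rotation) of every string, sort all of them in
    the omega-order, and output the last character of each conjugate.
    """
    conjugates = [s[i:] + s[:i] for s in strings for i in range(len(s))]
    conjugates.sort(key=cmp_to_key(omega_cmp))
    return "".join(c[-1] for c in conjugates)


def concat_bwt(strings: list[str], sep: str = "$") -> str:
    """Hypothetical separator-based variant, for contrast only.

    Builds t = s1$s2$...sk$ and sorts its suffixes lexicographically (naive,
    for tiny inputs only). The output has one extra character per string.
    """
    t = "".join(s + sep for s in strings)
    sa = sorted(range(len(t)), key=lambda i: t[i:])
    # For i == 0, t[i - 1] wraps around to the last character of t.
    return "".join(t[i - 1] for i in sa)


def runs(bwt: str) -> int:
    """Number of maximal runs of equal characters (the parameter r)."""
    return sum(1 for i, c in enumerate(bwt) if i == 0 or c != bwt[i - 1])


if __name__ == "__main__":
    print(ebwt(["banana"]), runs(ebwt(["banana"])))  # nnbaaa 3
    print(ebwt(["ab", "b"]))                          # bab
    print(concat_bwt(["ab", "b"]))                    # bb$$a
```

On the pair {ab, b}, the omega-order places ba before b (because bab < bba), whereas the lexicographic order places b first since it is a prefix of ba. Consequently the eBWT and the separator-based variant return different strings (bab versus bb$$a) for the same collection, which is the kind of divergence the paper measures on real datasets.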
