Teaching the Burrows-Wheeler Transform via the Positional Burrows-Wheeler Transform

08/21/2022
by Travis Gagie, et al.

The Burrows-Wheeler Transform (BWT) is often taught in undergraduate courses on algorithmic bioinformatics, because it underlies the FM-index and thus important tools such as Bowtie and BWA. Its admirers consider the BWT a thing of beauty but, despite thousands of pages having been written about it over nearly thirty years, it still often seems like magic to undergraduates seeing it for the first time. Some who persevere are later shown the Positional BWT (PBWT), which was published twenty years after the BWT. In this paper we argue that the PBWT should be taught before the BWT. We first use the PBWT's close relation to a right-to-left radix sort to explain how to use it as a fast and space-efficient index for positional search on a set of strings (that is, given a pattern and a position, quickly list the strings containing that pattern starting at that position). We then observe that prefix search (listing all the strings that start with the pattern) is an easy special case of positional search, and that prefix search on the suffixes of a single string is equivalent to substring search in that string (listing all the starting positions of occurrences of the pattern in the string). Naïvely storing a PBWT of the suffixes of a string is space-inefficient but, even in reasonably small examples, most of its columns are nearly the same. It is not difficult to show that if we store a PBWT of the cyclic shifts of the string, instead of its suffixes, then all the columns are exactly the same, and equal to the BWT of the string. Thus we can teach the BWT and the FM-index via the PBWT.
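
The two observations at the end of the abstract can be checked on a small example. The Python sketch below is illustrative only and is not taken from the paper: the function names, the `$` end marker, and the example string are assumptions. It also assumes that column j of the PBWT is read with the rows ordered by the characters that follow column j, wrapping around cyclically. It uses plain sorting rather than the radix sort or any compact index, so it shows the equalities, not the efficient data structures.

```python
from bisect import bisect_left, bisect_right


def rotations(text):
    """Return all cyclic shifts of text, one per starting position."""
    return [text[i:] + text[:i] for i in range(len(text))]


def bwt(text):
    """BWT computed naively as the last column of the sorted cyclic shifts."""
    return "".join(row[-1] for row in sorted(rotations(text)))


def cyclic_pbwt_column(rows, j):
    """Column j of rows, listed with the rows ordered by the characters
    that follow column j (columns j+1, j+2, ..., wrapping around).
    This cyclic ordering is an assumption made for the sketch."""
    def following(row):
        return row[j + 1:] + row[:j + 1]
    return "".join(row[j] for row in sorted(rows, key=following))


def substring_search(text, pattern):
    """Starting positions of pattern in text, found as a prefix search
    on the lexicographically sorted suffixes of text."""
    order = sorted(range(len(text)), key=lambda i: text[i:])
    # Truncating sorted suffixes to the pattern length keeps them sorted.
    keys = [text[i:i + len(pattern)] for i in order]
    lo = bisect_left(keys, pattern)
    hi = bisect_right(keys, pattern)
    return sorted(order[lo:hi])


text = "banana$"  # hypothetical example; "$" is an assumed end marker
rows = rotations(text)
columns = [cyclic_pbwt_column(rows, j) for j in range(len(text))]

print(bwt(text))
print(all(col == bwt(text) for col in columns))
print(substring_search(text, "ana"))
```

Under these assumptions every column comes out identical to the BWT, because the characters following column j in a cyclic shift form another cyclic shift of the same string, and the character in column j is the one that precedes it.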
