Efficient construction of the extended BWT from grammar-compressed DNA sequencing reads

We present an algorithm for building the extended BWT (eBWT) of a string collection from its grammar-compressed representation. Our technique exploits the string repetitions captured by the grammar to speed up the computation of the eBWT. Thus, the more repetitive the collection is, the fewer resources we use per input symbol. We rely on a new grammar recently proposed at DCC'21 whose nonterminals serve as building blocks for inducing the eBWT. A relevant application of this idea is the construction of self-indexes for analyzing sequencing reads, which are massive and repetitive string collections of raw genomic data. Self-indexes have become increasingly popular in bioinformatics as they can encode more information in less space. Our efficient eBWT construction opens the door to accurate bioinformatic analyses on larger sequence datasets that are not tractable with current eBWT construction techniques.
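For readers unfamiliar with the eBWT, the following is a minimal sketch of its textbook definition, not the grammar-based algorithm described above. It assumes the usual formulation: collect every rotation of every string in the collection, sort the rotations by comparing their infinite repetitions (the omega-order), and output the last symbol of each rotation. The function names `omega_compare` and `naive_ebwt` are hypothetical and chosen for illustration only. This approach takes quadratic space and is only meant to show what the output looks like on toy inputs.

```python
from functools import cmp_to_key


def omega_compare(u, v):
    """Compare the infinite strings uuu... and vvv... (omega-order).

    Assumption: by the Fine and Wilf periodicity lemma, comparing
    prefixes of length len(u) + len(v) is enough to decide the order.
    """
    n = len(u) + len(v)
    pu = (u * (n // len(u) + 1))[:n]
    pv = (v * (n // len(v) + 1))[:n]
    return (pu > pv) - (pu < pv)


def naive_ebwt(strings):
    """Illustrative eBWT: last symbols of all rotations in omega-order."""
    rotations = []
    for s in strings:
        for i in range(len(s)):
            rotations.append(s[i:] + s[:i])
    rotations.sort(key=cmp_to_key(omega_compare))
    return "".join(r[-1] for r in rotations)


if __name__ == "__main__":
    # Toy collection of short reads (illustrative data only).
    print(naive_ebwt(["banana"]))   # nnbaaa
    print(naive_ebwt(["ba", "b"]))  # bab
```

The second call shows where the omega-order differs from plain lexicographic order: `b` precedes `ba` lexicographically, but `baba...` precedes `bbbb...`, so the rotation `ba` sorts before `b`.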


