Space-Efficient Computation of the LCP Array from the Burrows-Wheeler Transform

01/16/2019
by Nicola Prezza, et al.

We show that the Longest Common Prefix (LCP) array of a text collection of total size n over the alphabet [1, σ] can be computed from the Burrows-Wheeler transformed collection in O(n log σ) time using o(n log σ) bits of working space on top of the input and output. Our result improves (on small alphabets) and generalizes (to string collections) the previous solution of Beller et al., which required O(n) bits of extra working space. We also show how to merge the BWTs of two collections of total size n within the same time and space bounds. An engineered implementation of our first algorithm on the DNA alphabet induces the LCP array of a large (16 GiB) collection of short (100 bases) reads at a rate of 2.92 megabases per second using 1.5 bytes per base of RAM in total. Our second algorithm merges the BWTs of two short-read collections of 8 GiB each at a rate of 1.7 megabases per second and uses 0.625 bytes per base of RAM. An extension of this algorithm that also computes the LCP array of the merged collection processes the data at a rate of 1.48 megabases per second and uses 1.625 bytes per base of RAM.
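
To make the idea of inducing the LCP array from the BWT concrete, here is a minimal Python sketch of the queue-based strategy of Beller et al., the baseline named in the abstract. It is not the space-efficient algorithm of this paper: it assumes a single text ending in a unique smallest terminator `$`, replaces the wavelet tree with plain rank tables, and uses far more than o(n log σ) bits. The function names `naive_bwt` and `lcp_from_bwt` and the convention LCP[0] = -1 are assumptions made for illustration.

```python
from collections import deque


def naive_bwt(text):
    """Build the BWT by sorting all suffixes (illustration only, not efficient).

    Assumes text ends with a unique terminator smaller than every other symbol.
    """
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # text[i - 1] wraps around to the terminator when i == 0
    return "".join(text[i - 1] for i in sa)


def lcp_from_bwt(bwt):
    """Sketch of the Beller et al. style breadth-first LCP computation."""
    n = len(bwt)

    # C[c] = number of symbols in bwt that are strictly smaller than c
    C = {}
    total = 0
    for c in sorted(set(bwt)):
        C[c] = total
        total += bwt.count(c)

    # occ[c][i] = number of occurrences of c in bwt[:i] (stand-in for rank queries)
    occ = {c: [0] * (n + 1) for c in C}
    for i, ch in enumerate(bwt):
        for c in C:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)

    lcp = [None] * (n + 1)
    lcp[0] = lcp[n] = -1  # sentinels at both ends

    # Each queue entry is a half-open suffix-array interval [i, j) of some
    # string w, together with its length l = |w|.
    queue = deque([(0, n, 0)])
    while queue:
        i, j, l = queue.popleft()
        for c in C:
            # Backward step: interval of the string c + w
            lb = C[c] + occ[c][i]
            rb = C[c] + occ[c][j]
            if lb < rb and lcp[rb] is None:
                lcp[rb] = l  # suffixes on either side of rb share exactly l symbols
                queue.append((lb, rb, l + 1))
    return lcp[:n]


print(lcp_from_bwt(naive_bwt("banana$")))  # [-1, 0, 1, 3, 0, 0, 2]
```

Because intervals are processed in order of increasing length, each LCP entry is written the first time a boundary is reached, which is why an already filled entry lets the sketch prune the interval.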

research
08/12/2019

Space-Efficient Construction of Compressed Suffix Trees

We show how to build several data structures of central importance to st...
research
05/17/2018

External memory BWT and LCP computation for sequence collections with applications

We propose an external memory algorithm for the computation of the BWT a...
research
11/08/2019

Space Efficient Construction of Lyndon Arrays in Linear Time

We present the first linear time algorithm to construct the 2n-bit versi...
research
05/30/2019

Inducing the Lyndon Array

In this paper we propose a variant of the induced suffix sorting algorit...
research
12/02/2022

Computing the optimal BWT of very large string collections

It is known that the exact form of the Burrows-Wheeler-Transform (BWT) o...
research
06/24/2015

Optimize Unsynchronized Garbage Collection in an SSD Array

Solid state disks (SSDs) have advanced to outperform traditional hard dr...
research
06/09/2023

Space-time Trade-offs for the LCP Array of Wheeler DFAs

Recently, Conte et al. generalized the longest-common prefix (LCP) array...
