Closing in on Time and Space Optimal Construction of Compressed Indexes

by Dominik Kempa, et al.

Fast and space-efficient construction of compressed indexes such as compressed suffix array (CSA) and compressed suffix tree (CST) has been a major open problem until recently, when Belazzougui [STOC 2014] described an algorithm able to build both of these data structures in O(n) (randomized; later improved by the same author to deterministic) time and O(n / log_σ n) words of space, where n is the length of the string and σ is the alphabet size. Shortly after, Munro et al. [SODA 2017] described another deterministic construction using the same time and space based on different techniques. It has remained an elusive open problem since then whether these bounds are optimal or, assuming non-wasteful text encoding, the construction achieving O(n / log_σ n) time and space is possible. In this paper we provide a first algorithm that can achieve these bounds. We show a deterministic algorithm that constructs CSA and CST using O(n / log_σ n + r log^11 n) time and O(n / log_σ n + r log^10 n) working space, where r is the number of runs in the Burrows-Wheeler transform of the input text. As one of the applications of our techniques we show how to compute the LZ77 parsing in O(n / log_σ n + r log^11 n + z log^10 n) time and O(n / log_σ n + r log^9 n) space, which is optimal for highly repetitive strings.
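
The bounds above depend on r, the number of runs in the Burrows-Wheeler transform, and on z, which we take here to mean the number of LZ77 phrases. As a rough illustration of why both are small for highly repetitive strings, the following is a minimal Python sketch that computes them naively. It is not the construction from the paper: the function names (`bwt`, `count_runs`, `lz77_phrases`) are hypothetical, the running time is quadratic or worse, and the LZ77 variant (greedy, self-overlapping sources allowed, no trailing literal per phrase) is an assumption.

```python
def bwt(text):
    """Naive BWT: sort all rotations of text + sentinel, take last characters.

    Assumes "\0" does not occur in text; it acts as a unique smallest sentinel.
    Illustration only: this takes far more than linear time.
    """
    s = text + "\0"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def count_runs(s):
    """Count maximal runs of equal characters in s (the parameter r for a BWT)."""
    return sum(1 for i in range(len(s)) if i == 0 or s[i] != s[i - 1])


def lz77_phrases(text):
    """Greedy LZ77 parse (assumed variant, self-overlap allowed).

    Each phrase is the longest prefix of the remaining text that also
    starts at some earlier position, or a single fresh character.
    """
    phrases = []
    i, n = 0, len(text)
    while i < n:
        best = 0
        for j in range(i):  # try every earlier starting position
            length = 0
            while i + length < n and text[j + length] == text[i + length]:
                length += 1
            best = max(best, length)
        step = max(best, 1)  # a fresh character forms a phrase of length 1
        phrases.append(text[i:i + step])
        i += step
    return phrases


if __name__ == "__main__":
    for sample in ["banana", "ab" * 8]:
        r = count_runs(bwt(sample))
        z = len(lz77_phrases(sample))
        print(f"n={len(sample)} r={r} z={z}")
```

On a periodic input such as "ab" repeated many times, both r and z stay constant as n grows, which is the regime in which the r- and z-dependent terms of the bounds are dominated by n / log_σ n.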




