Optimally Computing Compressed Indexing Arrays Based on the Compact Directed Acyclic Word Graph

08/04/2023
by   Hiroki Arimura, et al.

In this paper, we present the first study of the computational complexity of converting an automata-based text index structure, the Compact Directed Acyclic Word Graph (CDAWG) of size e for a text T of length n, into other text indexing structures for the same text that are suitable for highly repetitive texts: the run-length BWT of size r, the irreducible PLCP array of size r, and the quasi-irreducible LPF array of size e, as well as the lex-parse of size O(r) and the LZ77-parse of size z, where r, z ≤ e. As our main results, we show that each of these structures can be optimally computed from either the CDAWG for T stored in read-only memory or its self-index version of size e without the text, in O(e) worst-case time and O(e) words of working space. To obtain these results, we devise techniques for enumerating a particular subset of suffixes in lexicographic and text order using forward and backward search on the CDAWG, extending the results of Belazzougui et al. (2015).
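To make the size parameters r and z concrete, the following is a minimal illustrative sketch that computes the run-length BWT and one common greedy variant of the LZ77 parse (self-overlapping sources allowed) directly from the text. It is not the paper's method: it does not use a CDAWG and runs in polynomial rather than O(e) time. The function names `bwt`, `run_length_encode`, and `lz77_factors`, and the `$` sentinel, are assumptions made for this example.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive Burrows-Wheeler transform by sorting all rotations.

    Assumes `sentinel` does not occur in `text` and sorts before
    every other character (true for '$' versus ASCII letters).
    """
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def run_length_encode(s: str) -> list[tuple[str, int]]:
    """Collapse maximal runs of equal characters into (char, length) pairs."""
    runs: list[tuple[str, int]] = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs


def lz77_factors(text: str) -> list[str]:
    """Greedy LZ77-style parse: each factor is the longest prefix of the
    remaining text that also starts at an earlier position (overlap
    allowed), or a single fresh character if no such prefix exists."""
    factors: list[str] = []
    n = len(text)
    i = 0
    while i < n:
        best = 0
        for j in range(i):  # try every earlier starting position
            length = 0
            while i + length < n and text[j + length] == text[i + length]:
                length += 1
            best = max(best, length)
        step = max(best, 1)
        factors.append(text[i:i + step])
        i += step
    return factors


if __name__ == "__main__":
    t = "banana"
    transformed = bwt(t)
    print(transformed)                          # annb$aa
    print(len(run_length_encode(transformed)))  # r = 5 runs
    print(lz77_factors(t))                      # ['b', 'a', 'n', 'ana'], so z = 4
```

On highly repetitive inputs both counts stay small relative to n, which is the regime the abstract targets; the sketch only illustrates what r and z measure, not how to obtain them within the stated bounds.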


