Collapsing the Hierarchy of Compressed Data Structures: Suffix Arrays in Optimal Compressed Space

08/07/2023
by   Dominik Kempa, et al.

In the last decades, the necessity to process massive amounts of textual data fueled the development of compressed text indexes: data structures efficiently answering queries on a given text while occupying space proportional to the compressed representation of the text. A widespread phenomenon in compressed indexing is that more powerful queries require larger indexes. For example, random access, the most basic query, can be supported in O(δ log(n log σ / (δ log n))) space (where n is the text length, σ is the alphabet size, and δ is the text's substring complexity), which is the asymptotically smallest space to represent a string, for all n, σ, and δ (Kociumaka, Navarro, Prezza; IEEE Trans. Inf. Theory 2023). The other end of the hierarchy is occupied by indexes supporting the powerful suffix array (SA) queries. The currently smallest one takes O(r log(n/r)) space, where r ≥ δ is the number of runs in the BWT of the text (Gagie, Navarro, Prezza; J. ACM 2020). We present a new compressed index that needs only O(δ log(n log σ / (δ log n))) space to support SA functionality in O(log^(4+ϵ) n) time. This collapses the hierarchy of compressed data structures into a single point: The space required to represent the text is simultaneously sufficient for efficient SA queries. Our result immediately improves the space complexity of dozens of algorithms, which can now be executed in optimal compressed space. In addition, we show how to construct our index in O(δ polylog n) time from the LZ77 parsing of the text. For highly repetitive texts, this is up to exponentially faster than the previously best algorithm. To obtain our results, we develop numerous techniques of independent interest, including the first O(δ log(n log σ / (δ log n)))-size index for LCE queries.
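
To make the quantities named in the abstract concrete, here is a minimal Python sketch that computes a suffix array, the number of BWT runs r, and the substring complexity δ of a short string by brute force. It is not the paper's index and uses none of its techniques: the function names are hypothetical, the running time is far from practical, and appending a `$` sentinel before building the BWT is an assumption of this sketch. δ is computed as the maximum over k of (number of distinct length-k substrings) / k.

```python
def suffix_array(text):
    """Naive suffix array: start positions of all suffixes in lexicographic order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt_runs(text):
    """Number of equal-letter runs r in the BWT of text + '$'.

    Assumption of this sketch: '$' does not occur in text and sorts
    before every other character.
    """
    t = text + "$"
    sa = suffix_array(t)
    # BWT[j] is the character preceding the j-th smallest suffix;
    # for the suffix starting at 0, t[-1] wraps around to the sentinel.
    bwt = "".join(t[i - 1] for i in sa)
    return sum(1 for j in range(len(bwt)) if j == 0 or bwt[j] != bwt[j - 1])


def substring_complexity(text):
    """delta = max over k of (number of distinct length-k substrings) / k."""
    n = len(text)
    return max(
        len({text[i:i + k] for i in range(n - k + 1)}) / k
        for k in range(1, n + 1)
    )


# Example on a toy input:
#   suffix_array("banana")         -> [5, 3, 1, 0, 4, 2]
#   bwt_runs("banana")             -> 5   (BWT of "banana$" is "annb$aa")
#   substring_complexity("banana") -> 3.0 (attained at k = 1)
```

A suffix array query asks for SA[i], the starting position of the i-th smallest suffix. The sketch materializes the whole array in space linear in n, whereas the index described above answers such queries from a structure whose size depends on δ.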


research
06/09/2020

Optimal-Time Queries on BWT-runs Compressed Indexes

Although a significant number of compressed indexes for highly repetitiv...
research
06/24/2021

Breaking the O(n)-Barrier in the Construction of Compressed Suffix Arrays

The suffix array, describing the lexicographic order of suffixes of a gi...
research
08/04/2023

Optimally Computing Compressed Indexing Arrays Based on the Compact Directed Acyclic Word Graph

In this paper, we present the first study of the computational complexit...
research
11/11/2022

Efficient Immediate-Access Dynamic Indexing

In a dynamic retrieval system, documents must be ingested as they arrive...
research
06/09/2020

Faster Queries on BWT-runs Compressed Indexes

Although a significant number of compressed indexes for highly repetitiv...
research
11/19/2020

Subpath Queries on Compressed Graphs: a Survey

Text indexing is a classical algorithmic problem that has been studied f...
research
05/26/2023

CARAMEL: A Succinct Read-Only Lookup Table via Compressed Static Functions

Lookup tables are a fundamental structure in many data processing and sy...
