Fully-Functional Suffix Trees and Optimal Text Searching in BWT-runs Bounded Space

09/08/2018
by Travis Gagie, et al.

Indexing highly repetitive texts (such as genomic databases, software repositories and versioned text collections) has become an important problem since the turn of the millennium. A relevant compressibility measure for repetitive texts is r, the number of runs in their Burrows-Wheeler Transforms (BWTs). One of the earliest indexes for repetitive collections, the Run-Length FM-index, used O(r) space and was able to efficiently count the number of occurrences of a pattern of length m in the text (in loglogarithmic time per pattern symbol, with current techniques). However, it was unable to locate the positions of those occurrences efficiently within a space bounded in terms of r. Since then, a number of other indexes with space bounded by other measures of repetitiveness (the number of phrases in the Lempel-Ziv parse, the size of the smallest grammar generating (only) the text, the size of the smallest automaton recognizing the text factors) have been proposed for efficiently locating, but not directly counting, the occurrences of a pattern. In this paper we close this long-standing problem, showing how to extend the Run-Length FM-index so that it can locate the occ occurrences efficiently within O(r) space (in loglogarithmic time each), and reaching optimal time O(m + occ) within O(r log(n/r)) space, on a RAM machine with words of w = Ω(log n) bits. Within O(r log(n/r)) space, our index can also count in optimal time O(m). Raising the space to O(r w log_σ(n/r)), we support count and locate in O(m log(σ)/w) and O(m log(σ)/w + occ) time, which is optimal in the packed setting and had not been obtained before in compressed space. We also describe a structure using O(r log(n/r)) space that replaces the text and extracts any text substring ...
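To make the measure r and the counting operation concrete, here is a minimal Python sketch. It is not the data structure from the paper: it builds the BWT naively from sorted rotations, counts its runs, and counts pattern occurrences with a plain FM-index-style backward search that uses linear-time rank queries instead of O(r)-space structures. The function names (`bwt`, `count_runs`, `count_occurrences`) and the `$` sentinel are assumptions chosen for illustration.

```python
from collections import Counter


def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler Transform via sorted rotations.

    Quadratic time and space; for illustration only.
    Assumes the sentinel is absent from the text and sorts below every text symbol.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def count_runs(s: str) -> int:
    """Number of maximal runs of equal symbols; applied to a BWT this is r."""
    return sum(1 for i, c in enumerate(s) if i == 0 or c != s[i - 1])


def count_occurrences(bwt_str: str, pattern: str) -> int:
    """Count occurrences of a non-empty pattern by backward search over the BWT.

    Rank queries are done naively with str.count (O(n) each); a real
    Run-Length FM-index would answer them within O(r) space.
    """
    # first_row[c] = number of symbols in the text that are smaller than c
    freq = Counter(bwt_str)
    first_row, total = {}, 0
    for c in sorted(freq):
        first_row[c] = total
        total += freq[c]

    lo, hi = 0, len(bwt_str)  # half-open range of matching sorted rotations
    for c in reversed(pattern):
        if c not in first_row:
            return 0
        lo = first_row[c] + bwt_str[:lo].count(c)
        hi = first_row[c] + bwt_str[:hi].count(c)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    repetitive = "ACGT" * 50
    transformed = bwt(repetitive)
    print("n =", len(transformed), "r =", count_runs(transformed))
    print("occurrences of GTAC:", count_occurrences(transformed, "GTAC"))
```

On a highly repetitive input such as the repeated `ACGT` string above, r stays far below the text length n, which is the gap that indexes bounded by r are designed to exploit.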


research
11/30/2018

Faster Attractor-Based Indexes

String attractors are a novel combinatorial object encompassing most kno...
research
04/02/2020

On Locating Paths in Compressed Cardinal Trees

A compressed index is a data structure representing a text within compre...
research
12/20/2017

Text Indexing and Searching in Sublinear Time

We introduce the first index that can be built in o(n) time for a text o...
research
03/29/2021

A Fast and Small Subsampled R-index

The r-index (Gagie et al., JACM 2020) represented a breakthrough in comp...
research
04/11/2020

Grammar-compressed Self-index with Lyndon Words

We introduce a new class of straight-line programs (SLPs), named the Lyn...
research
11/08/2017

A compressed dynamic self-index for highly repetitive text collections

We present a novel compressed dynamic self-index for highly repetitive t...
research
11/19/2020

Subpath Queries on Compressed Graphs: a Survey

Text indexing is a classical algorithmic problem that has been studied f...
