Optimal Substring-Equality Queries with Applications to Sparse Text Indexing

03/05/2018
by Nicola Prezza, et al.

We consider the problem of encoding a string of length n from an alphabet [0,σ-1] so that access and substring-equality queries (that is, determining the equality of any two substrings) can be answered efficiently. A clear lower bound on the size of any prefix-free encoding of this kind is n log σ + Θ(log(nσ)) bits. We describe a new encoding matching this lower bound when σ ≤ n^O(1) while supporting queries in optimal O(1) time in the cell-probe model, and show how to extend the result to the word-RAM model using Θ(log^2 n) bits of additional space. Using our new encoding, we obtain the first optimal-space algorithms for several string-processing problems in the word-RAM model with rewritable input. In particular, we describe the first in-place algorithm computing the LCP array in O(n log n) expected time and the first in-place Monte Carlo solutions to the sparse suffix sorting, sparse LCP array construction, and suffix selection problems. Our algorithms are also the first running in sublinear time for small enough sets of input suffixes. Combining these solutions, we obtain the first optimal-space and sublinear-time algorithm for building the sparse suffix tree.
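To make the two query types concrete, here is a minimal sketch of access and substring-equality queries built on textbook prefix Karp-Rabin fingerprints. It is not the encoding described in the abstract: it stores O(n) extra machine words instead of working in place, and it is only meant to illustrate the interface and the Monte Carlo flavour of the answers. The class name `SubstringEquality`, its method names, and the choice of modulus are assumptions made for this illustration.

```python
import random


class SubstringEquality:
    """Illustrative Monte Carlo substring-equality structure (not in-place).

    Assumption: classic prefix Karp-Rabin fingerprints, using O(n) extra
    words. Equal substrings always compare equal; distinct substrings
    collide only with very small probability over the random base.
    """

    MOD = (1 << 61) - 1  # a Mersenne prime used as the fingerprint modulus

    def __init__(self, text, seed=None):
        rng = random.Random(seed)
        self.text = text
        self.base = rng.randrange(256, self.MOD - 1)  # random evaluation point
        n = len(text)
        self.prefix = [0] * (n + 1)  # prefix[i] = fingerprint of text[:i]
        self.power = [1] * (n + 1)   # power[i] = base**i mod MOD
        for i, ch in enumerate(text):
            self.prefix[i + 1] = (self.prefix[i] * self.base + ord(ch) + 1) % self.MOD
            self.power[i + 1] = (self.power[i] * self.base) % self.MOD

    def access(self, i):
        """Return the character at position i."""
        return self.text[i]

    def fingerprint(self, i, length):
        """Fingerprint of text[i:i+length], derived from two prefix values."""
        return (self.prefix[i + length] - self.prefix[i] * self.power[length]) % self.MOD

    def equal(self, i, j, length):
        """Report whether text[i:i+length] == text[j:j+length] (Monte Carlo)."""
        return self.fingerprint(i, length) == self.fingerprint(j, length)

    def lce(self, i, j):
        """Longest common extension of suffixes i and j, by binary search
        over the match length using substring-equality queries."""
        lo, hi = 0, len(self.text) - max(i, j)
        while lo < hi:
            mid = (lo + hi + 1) // 2
            if self.equal(i, j, mid):
                lo = mid
            else:
                hi = mid - 1
        return lo
```

For example, on the text "abracadabra" the query `equal(0, 7, 4)` compares the two occurrences of "abra", and `lce(0, 7)` finds their common extension by binary search. Longest-common-extension queries of this kind are the usual building block for comparing suffixes, which is how substring equality connects to the suffix sorting and LCP problems mentioned above.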


