Compressed Indexing with Signature Grammars

11/22/2017
by Anders Roy Christiansen, et al.

The compressed indexing problem is to preprocess a string S of length n into a compressed representation that supports pattern matching queries. That is, given a string P of length m, report all occurrences of P in S. We present a data structure that supports pattern matching queries in O(m + occ(log log n + log^ε z)) time using O(z log(n/z)) space, where z is the size of the LZ77 parse of S, occ is the number of occurrences of P in S, and ε > 0 is a constant, when the alphabet is small or the compression ratio is at least polynomial. We also present two data structures for the general case: one where the space is increased by O(z log log z), and one where the query time changes from worst-case to expected. In all cases, the results improve on the best previously known solutions. Notably, this is the first data structure that decides whether P occurs in S in O(m) time using O(z log(n/z)) space. Our results are mainly obtained by a novel combination of a randomized grammar construction algorithm with well-known techniques relating pattern matching to 2D-range reporting.
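To make the quantities in the abstract concrete, the following is a minimal sketch of what z and a pattern matching query mean. It is not the paper's data structure. It assumes a greedy, self-referential LZ77 variant in which each phrase is either the longest prefix of the remaining text that also starts at an earlier position, or a single new character; the abstract does not say which variant is used, so that choice is an assumption. The function names lz77_parse and occurrences are hypothetical, and both routines are naive baselines that run on the uncompressed string.

```python
def lz77_parse(s):
    """Greedy LZ77-style factorization of s (naive, for illustration only).

    Assumed variant: a phrase is the longest prefix of s[i:] that also occurs
    starting at some position j < i (overlap with the phrase itself allowed),
    or a single fresh character if no such prefix exists.
    """
    phrases = []
    i, n = 0, len(s)
    while i < n:
        length = 0
        # Grow the phrase while the candidate s[i:i+length+1] has an
        # occurrence lying entirely inside s[0:i+length], which forces
        # that occurrence to start strictly before position i.
        while i + length < n and s.find(s[i:i + length + 1], 0, i + length) != -1:
            length += 1
        if length == 0:
            length = 1  # character not seen before: literal phrase
        phrases.append(s[i:i + length])
        i += length
    return phrases


def occurrences(s, p):
    """Naive query: every start position of pattern p in text s."""
    return [i for i in range(len(s) - len(p) + 1) if s.startswith(p, i)]


text = "abababab"
phrases = lz77_parse(text)
z = len(phrases)                    # size of the parse
occ = occurrences(text, "aba")      # positions reported by a query
```

On highly repetitive input such as the text above, z stays far below n, which is the regime where a space bound of O(z log(n/z)) is attractive compared with storing the string or a classical index over it.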


