FM-Indexing Grammars Induced by Suffix Sorting for Long Patterns

10/04/2021
by Jin Jie Deng, et al.

The run-length compressed Burrows-Wheeler transform (RLBWT), used in conjunction with the backward search introduced in the FM index, is the centerpiece of most compressed indexes working on highly repetitive data sets such as biological sequences. Compared with grammar indexes, the RLBWT is often much larger, but queries such as counting the occurrences of long patterns can be answered much faster than with any existing grammar index. In this paper, we combine the virtues of a grammar with those of the RLBWT by building the RLBWT on top of a special grammar based on induced suffix sorting. Our experiments reveal that our hybrid approach outperforms the classic RLBWT with respect to index size and, for sufficiently long patterns, with respect to query time on biological data sets.
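The abstract builds on FM-index backward search over a run-length encoded BWT. The following is a minimal sketch of that classic building block only. It does not implement the paper's grammar-based hybrid or induced suffix sorting. It rests on several illustrative assumptions:

* The BWT is built naively by sorting rotations, which takes quadratic time.
* `$` is assumed to be a unique terminator that sorts before every text character.
* The rank structure stores one `Counter` per run.
* The names `bwt`, `RLBWT`, `rank` and `count` are hypothetical and are not taken from the paper.

```python
from bisect import bisect_right
from collections import Counter


def bwt(text: str) -> str:
    """Naive BWT by sorting all rotations (quadratic; for illustration only)."""
    s = text + "$"  # assumption: '$' does not occur in text and sorts first
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


class RLBWT:
    """Run-length encoded BWT supporting rank and backward-search counting."""

    def __init__(self, bwt_str: str):
        self.n = len(bwt_str)
        self.heads = []   # character of each run
        self.starts = []  # starting position of each run in the BWT
        self.before = []  # per run: character counts in bwt_str[:start]
        counts = Counter()
        i = 0
        while i < self.n:
            c = bwt_str[i]
            j = i
            while j < self.n and bwt_str[j] == c:
                j += 1
            self.heads.append(c)
            self.starts.append(i)
            self.before.append(counts.copy())
            counts[c] += j - i
            i = j
        # C[c] = number of BWT characters strictly smaller than c
        self.C = {}
        total = 0
        for c in sorted(counts):
            self.C[c] = total
            total += counts[c]

    def rank(self, c: str, i: int) -> int:
        """Number of occurrences of c in bwt[:i], using only run-level data."""
        if i == 0:
            return 0
        r = bisect_right(self.starts, i - 1) - 1  # run containing position i-1
        cnt = self.before[r][c]
        if self.heads[r] == c:
            cnt += i - self.starts[r]
        return cnt

    def count(self, pattern: str) -> int:
        """FM-index backward search: number of occurrences of pattern."""
        lo, hi = 0, self.n
        for c in reversed(pattern):  # extend the match one character leftwards
            if c not in self.C:
                return 0
            lo = self.C[c] + self.rank(c, lo)
            hi = self.C[c] + self.rank(c, hi)
            if lo >= hi:
                return 0
        return hi - lo


if __name__ == "__main__":
    index = RLBWT(bwt("banana"))
    print(index.count("ana"))  # 2
```

Each backward-search step needs only two rank queries, and those queries consult only per-run data. This is why RLBWT-based indexes scale with the number of runs rather than with the text length on highly repetitive inputs. Practical implementations replace the per-run `Counter` snapshots used here with far more compact rank structures.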
