Faster Queries on BWT-runs Compressed Indexes

06/09/2020
by Takaaki Nishimoto, et al.

Although a significant number of compressed indexes for highly repetitive strings have been proposed thus far, developing compressed indexes that support faster queries remains a challenge. The run-length Burrows-Wheeler transform (RLBWT) is a lossless compression scheme that applies a reversible permutation to the input string and then run-length encodes the result, and it has become a popular research topic in string processing. Recently, Gagie et al. presented the r-index, an efficient compressed index on the RLBWT whose space usage does not depend on the text length. In this paper, we present a new compressed index on the RLBWT, which we call r-index-f, that improves on the r-index to support faster locate queries. We introduce a novel division of the RLBWT into blocks, which we call the balanced BWT-sequence: the RLBWT of a string is divided into several blocks, and a parent-child relationship between each pair of blocks is defined. In addition, we present a novel backward search algorithm on balanced BWT-sequences, which yields the faster locate queries of r-index-f. We also present new algorithms for count queries, extract queries, decompression, and prefix search on r-index-f.
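
To make the terms in the abstract concrete, the following is a minimal Python sketch of a textbook BWT, its run-length encoding, and a count query by backward search. It is not the r-index or r-index-f construction from the paper: the function names (`bwt`, `run_length_encode`, `count_occurrences`), the `$` sentinel, and the naive sorting-based construction are all assumptions chosen for illustration, and the rank computation here scans the string instead of using a compressed structure.

```python
from itertools import groupby


def bwt(text, sentinel="$"):
    """Naive BWT: sort all rotations of text + sentinel, take the last column.

    Assumes the sentinel does not occur in text and sorts before every
    character of text (true for "$" and lowercase letters in ASCII).
    """
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def run_length_encode(s):
    """Collapse each maximal run of equal characters into (char, length)."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


def count_occurrences(bwt_string, pattern):
    """Count pattern occurrences with backward search on an uncompressed BWT.

    Illustrative only: rank is computed by scanning, so each step is linear
    in the text length, unlike the indexes discussed in the abstract.
    """
    # C[c] = number of characters in the text that are smaller than c.
    C = {}
    for i, ch in enumerate(sorted(bwt_string)):
        C.setdefault(ch, i)

    def rank(ch, i):
        # Occurrences of ch in bwt_string[0:i].
        return bwt_string[:i].count(ch)

    lo, hi = 0, len(bwt_string)  # half-open interval of matching suffixes
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + rank(ch, lo)
        hi = C[ch] + rank(ch, hi)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    transformed = bwt("banana")
    runs = run_length_encode(transformed)
    print(transformed, runs, len(runs))
    print(count_occurrences(transformed, "ana"))
```

In this sketch the number of pairs returned by `run_length_encode` plays the role of the number of BWT runs, the quantity that governs the space of RLBWT-based indexes; for highly repetitive inputs it is typically much smaller than the text length.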


research
06/09/2020

Optimal-Time Queries on BWT-runs Compressed Indexes

Although a significant number of compressed indexes for highly repetitiv...
research
04/02/2020

On Locating Paths in Compressed Cardinal Trees

A compressed index is a data structure representing a text within compre...
research
08/07/2023

Collapsing the Hierarchy of Compressed Data Structures: Suffix Arrays in Optimal Compressed Space

In the last decades, the necessity to process massive amounts of textual...
research
08/28/2019

Techniques for Inverted Index Compression

The data structure at the core of large-scale search engines is the inve...
research
10/04/2021

FM-Indexing Grammars Induced by Suffix Sorting for Long Patterns

The run-length compressed Burrows-Wheeler transform (RLBWT) used in conj...
research
12/08/2021

RLBWT Tricks

Experts would probably have guessed that compressed sparse bitvectors we...
research
07/17/2020

Adaptive Exact Learning in a Mixed-Up World: Dealing with Periodicity, Errors and Jumbled-Index Queries in String Reconstruction

We study the query complexity of exactly reconstructing a string from ad...
