Engineering Rank/Select Data Structures for Big-Alphabet Strings

05/23/2023
by Diego Arroyuelo, et al.

Big-alphabet strings are common in several scenarios, such as information retrieval and natural-language processing. Storing and processing such strings efficiently raises challenges that do not arise with smaller-alphabet strings. This paper studies the efficient implementation of one of the most effective approaches for dealing with big-alphabet strings, namely the alphabet-partitioning approach. The main contribution is a compressed data structure that supports the fundamental operations rank and select efficiently. We show experimental results indicating that our implementation outperforms the current realizations of the alphabet-partitioning approach. In particular, the time for operation select can be improved by about 80% relative to existing alphabet-partitioning schemes. We also show the impact of our data structure on several applications, such as the intersection of inverted lists (where improvements of up to 60% are reported), the representation of run-length compressed strings, and the distributed-computation processing of rank and select operations.
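For readers unfamiliar with the operations named in the abstract: rank(a, i) counts the occurrences of symbol a among the first i positions of a string, and select(a, j) returns the position of the j-th occurrence of a. The sketch below illustrates the general idea of alphabet partitioning, not the paper's data structure. It assumes one simple class assignment (symbols sorted by decreasing frequency, with the symbol of frequency rank r placed in class floor(log2 r)), and it uses naive linear scans where a real implementation would use compressed rank/select structures. All names (`PartitionedSequence`, `K`, `L`, `cls`, `off`) are hypothetical and chosen only for illustration.

```python
from collections import Counter


def rank(seq, sym, i):
    """Number of occurrences of sym in seq[0:i] (naive linear scan)."""
    return sum(1 for x in seq[:i] if x == sym)


def select(seq, sym, j):
    """0-based position of the j-th (j >= 1) occurrence of sym, or -1."""
    seen = 0
    for pos, x in enumerate(seq):
        if x == sym:
            seen += 1
            if seen == j:
                return pos
    return -1


class PartitionedSequence:
    """Illustrative alphabet partitioning (assumed class rule, see lead-in)."""

    def __init__(self, s):
        freq = Counter(s)
        # Most frequent symbols first; ties broken by symbol for determinism.
        order = sorted(freq, key=lambda a: (-freq[a], a))
        self.cls = {}  # symbol -> class id
        self.off = {}  # symbol -> index of the symbol inside its class
        members = {}   # class id -> symbols assigned to that class
        for r, a in enumerate(order, start=1):
            c = r.bit_length() - 1  # floor(log2(r)), an assumed rule
            self.cls[a] = c
            self.off[a] = len(members.setdefault(c, []))
            members[c].append(a)
        # K stores the class of every position of s.
        self.K = [self.cls[a] for a in s]
        # L[c] stores, for the positions of class c only, the in-class index.
        self.L = {c: [] for c in members}
        for a in s:
            self.L[self.cls[a]].append(self.off[a])

    def rank(self, a, i):
        if a not in self.cls:
            return 0
        c = self.cls[a]
        # Map prefix length i of s to a prefix length of L[c], then count there.
        return rank(self.L[c], self.off[a], rank(self.K, c, i))

    def select(self, a, j):
        if a not in self.cls:
            return -1
        c = self.cls[a]
        p = select(self.L[c], self.off[a], j)  # position inside L[c]
        if p < 0:
            return -1
        # The (p + 1)-th occurrence of class c in K is the position in s.
        return select(self.K, c, p + 1)


if __name__ == "__main__":
    ps = PartitionedSequence("abracadabra")
    print(ps.rank("a", 5), ps.select("a", 3))
```

The point of the decomposition is that each query on the big-alphabet string reduces to one query on the class sequence K and one on a per-class sequence L[c], both of which range over much smaller alphabets.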


