Faster Wavelet Trees with Quad Vectors

02/18/2023
by Matteo Ceregini, et al.

Given a text, rank and select queries return the number of occurrences of a character up to a position (rank) or the position of the occurrence of a character with a given rank (select). These queries have applications in, e.g., compression, computational geometry, and pattern matching in the form of the backward search, the backbone of many compressed full-text indices. A wavelet tree is a compact data structure that, for a text of length n over an alphabet of size σ, requires only n⌈log σ⌉(1+o(1)) bits of space and answers rank and select queries in Θ(log σ) time, which makes it a common choice in the applications above. In this paper, we show how to improve the query performance of wavelet trees by using a 4-ary tree instead of a binary tree as the basis of the wavelet tree. To this end, we present a space-efficient rank and select data structure for quad vectors, i.e., vectors over the alphabet {0, 1, 2, 3}. A 4-ary wavelet tree has half as many levels as a binary one, which helps to halve the number of cache misses during queries and thus reduces query latency. Our experimental evaluation shows that our 4-ary wavelet tree can improve the latency of rank and select queries by a factor of ≈ 2 compared to the wavelet tree implementations contained in the widely used Succinct Data Structure Library (SDSL).
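
To make the 4-ary idea concrete, the following is a minimal Python sketch of a rank query on a 4-ary wavelet tree. It is an illustration only, not the paper's implementation: the class name `QuadNode`, the `prefix` tables, and the fixed 8-bit symbol width are assumptions made for this example, and the plain prefix-count tables stand in for the space-efficient quad vector rank structure that the paper describes. Each node consumes two bits of the symbol per level, so an 8-bit alphabet needs 4 levels instead of the 8 levels of a binary wavelet tree.

```python
class QuadNode:
    """One node of a (hypothetical, uncompressed) 4-ary wavelet tree."""

    def __init__(self, symbols, bits):
        # symbols: integers in [0, 2**bits); bits must be a positive multiple of 2.
        self.bits = bits
        shift = bits - 2
        # The quad vector of this node: the top two bits of every symbol.
        self.quads = [(s >> shift) & 3 for s in symbols]
        # prefix[d][i] = number of occurrences of digit d in quads[:i].
        # Assumption: a naive O(n)-word table, NOT a succinct rank structure.
        self.prefix = [[0] for _ in range(4)]
        for q in self.quads:
            for d in range(4):
                self.prefix[d].append(self.prefix[d][-1] + (q == d))
        # Recurse on the remaining low bits, one child per digit.
        self.children = [None] * 4
        if shift > 0:
            mask = (1 << shift) - 1
            for d in range(4):
                sub = [s & mask for s, q in zip(symbols, self.quads) if q == d]
                if sub:
                    self.children[d] = QuadNode(sub, shift)

    def rank(self, c, i):
        """Number of occurrences of symbol c among the first i symbols."""
        node = self
        while node is not None:
            shift = node.bits - 2
            d = (c >> shift) & 3          # digit of c handled at this level
            i = node.prefix[d][i]         # one quad-vector rank per level
            if shift == 0 or i == 0:
                return i
            c &= (1 << shift) - 1         # drop the two bits just consumed
            node = node.children[d]
        return 0


# Usage: bytes are 8-bit symbols, so the tree has 4 levels.
tree = QuadNode(list(b"abracadabra"), 8)
rank_a = tree.rank(ord("a"), 11)  # occurrences of 'a' in the whole text
rank_b = tree.rank(ord("b"), 4)   # occurrences of 'b' in "abra"
```

The query loop performs one quad vector rank per level, which is where the halved level count translates into fewer memory accesses; a select query would walk the same levels bottom-up and is omitted here.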

research
06/02/2022

Engineering Compact Data Structures for Rank and Select Queries on Bit Vectors

Bit vectors are fundamental building blocks of many succinct data struct...
research
11/08/2017

Run Compressed Rank/Select for Large Alphabets

Given a string of length n that is composed of r runs of letters from th...
research
02/19/2020

Translating Between Wavelet Tree and Wavelet Matrix Construction

The wavelet tree (Grossi et al. [SODA, 2003]) and wavelet matrix (Claude...
research
08/15/2023

Another virtue of wavelet forests?

A wavelet forest for a text T [1..n] over an alphabet σ takes n H_0 (T) ...
research
02/26/2020

Bitvectors with runs and the successor/predecessor problem

The successor and predecessor problem consists of obtaining the closest ...
research
02/02/2023

Optimal Heaviest Induced Ancestors

We revisit the Heaviest Induced Ancestors (HIA) problem that was introdu...
research
02/26/2020

Revisiting compact RDF stores based on k2-trees

We present a new compact representation to efficiently store and query l...
