Fast Prefix Search in Little Space, with Applications

04/12/2018
by Djamal Belazzougui, et al.

It has been shown in the indexing literature that there is an essential difference between prefix/range searches on the one hand, and predecessor/rank searches on the other hand, in that the former provably allow faster query resolution. Traditionally, prefix search over a set S of strings is solved by data structures that are also dictionaries, that is, they actually contain the strings in S. For very large collections stored in slow-access memory, we propose much more compact data structures that support weak prefix searches: they return the ranks of matching strings provided that some string in S starts with the given prefix. In fact, we show that our most space-efficient data structure is asymptotically space-optimal. Previously, data structures such as String B-trees (and more complicated cache-oblivious string data structures) have implicitly supported weak prefix queries, but they all have query time that grows logarithmically with the size of the string collection. In contrast, our data structures are simple, naturally cache-efficient, and have query time that depends only on the length of the prefix, all the way down to constant query time for strings that fit in one machine word. We give several applications of weak prefix searches, including exact prefix counting and approximate counting of tuples matching conjunctive prefix conditions.
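To make the query semantics concrete, here is a minimal Python sketch of a prefix search that returns a rank range over a sorted string collection. It is a naive baseline and not the data structure proposed in the paper: it keeps all strings in memory and takes time logarithmic in the collection size, which is exactly the cost the paper aims to avoid. The names `prefix_rank_range` and `_first_index` are illustrative assumptions. A weak prefix search only needs to be correct when some string in S starts with the prefix; this sketch is stricter and returns an empty range otherwise.

```python
def _first_index(sorted_strings, prefix, strict):
    """Binary search over the strings truncated to len(prefix).

    Returns the first index whose truncated string is >= prefix
    (strict=False) or > prefix (strict=True). Truncation preserves
    lexicographic order, so the search is valid on a sorted list.
    """
    lo, hi = 0, len(sorted_strings)
    k = len(prefix)
    while lo < hi:
        mid = (lo + hi) // 2
        head = sorted_strings[mid][:k]
        if head < prefix or (strict and head == prefix):
            lo = mid + 1
        else:
            hi = mid
    return lo


def prefix_rank_range(sorted_strings, prefix):
    """Return (lo, hi) so that sorted_strings[lo:hi] are exactly the
    strings that start with prefix; lo == hi means no match."""
    lo = _first_index(sorted_strings, prefix, strict=False)
    hi = _first_index(sorted_strings, prefix, strict=True)
    return lo, hi


# Hypothetical example collection, sorted lexicographically.
S = sorted(["car", "card", "care", "cat", "dog"])
lo, hi = prefix_rank_range(S, "car")
matches = S[lo:hi]   # the strings with ranks lo .. hi-1
count = hi - lo      # exact prefix counting follows from the rank range
```

The rank range is all a caller needs for counting: the number of strings with a given prefix is `hi - lo`, with no need to read the matching strings themselves.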

research
02/06/2019

Top Tree Compression of Tries

We present a compressed representation of tries based on top tree compre...
research
03/29/2019

Data structures to represent sets of k-long DNA sequences

The analysis of biological sequencing data has been one of the biggest a...
research
05/02/2020

Pointer-Machine Algorithms for Fully-Online Construction of Suffix Trees and DAWGs on Multiple Strings

We deal with the problem of maintaining the suffix tree indexing structu...
research
10/18/2019

b-Bit Sketch Trie: Scalable Similarity Search on Integer Sketches

Recently, randomly mapping vectorial data to strings of discrete symbols...
research
06/14/2019

Dynamic Path-Decomposed Tries

A keyword dictionary is an associative array whose keys are strings. Rec...
research
10/14/2020

Contextual Pattern Matching

The research on indexing repetitive string collections has focused on th...
research
10/05/2022

Double-Ended Palindromic Trees: A Linear-Time Data Structure and Its Applications

The palindromic tree (a.k.a. eertree) is a linear-size data structure th...
