Suffix Trees, DAWGs and CDAWGs for Forward and Backward Tries

04/09/2019
by   Shunsuke Inenaga, et al.

The suffix tree, DAWG, and CDAWG are fundamental indexing structures for a string, with numerous applications in bioinformatics, information retrieval, data mining, etc. An edge-labeled rooted tree (trie) is a natural generalization of a string. Breslauer [TCS 191(1-2): 131-144, 1998] proposed the suffix tree for a backward trie, where the strings in the trie are read in the leaf-to-root direction. In contrast to a backward trie, we call an ordinary trie, whose strings are read in the root-to-leaf direction, a forward trie. Despite a few follow-up works after Breslauer's paper, indexing forward/backward tries is not yet well understood. In this paper, we present a full perspective on the sizes of indexing structures such as suffix trees, DAWGs, and CDAWGs for forward and backward tries. In particular, we show that the size of the DAWG for a forward trie with n nodes is Ω(σ n), where σ is the number of distinct characters in the trie. This becomes Ω(n^2) for a large alphabet. Still, we show that there is a compact O(n)-space representation of the DAWG for a forward trie over any alphabet, and present an O(n σ)-time O(n)-space algorithm to construct such a representation of the DAWG for a growing forward trie.
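To make the forward/backward distinction concrete, here is a minimal Python sketch (not the paper's algorithm). It builds a small trie and collects the strings read in each direction. It assumes that a "suffix" of a backward trie is the string read from a node up to the root, and that a "suffix" of a forward trie is the string read from a node down to a leaf. The function names `build_trie`, `backward_suffixes`, and `forward_suffixes` are hypothetical, chosen only for this illustration.

```python
def build_trie(words):
    """Build a trie from words. Node 0 is the root.

    Returns (parent, label, children): parent[v] is the parent of node v,
    label[v] is the character on the edge into v, children[v] maps a
    character to the child reached by that character.
    """
    parent, label, children = [None], [""], [{}]
    for w in words:
        v = 0
        for c in w:
            if c not in children[v]:
                children[v][c] = len(parent)
                parent.append(v)
                label.append(c)
                children.append({})
            v = children[v][c]
    return parent, label, children


def backward_suffixes(parent, label):
    """Strings read leaf-to-root style: from each node up to the root.

    There is exactly one such string per node, so the result has at
    most n elements for a trie with n nodes.
    """
    out = set()
    for v in range(len(parent)):
        chars, u = [], v
        while parent[u] is not None:
            chars.append(label[u])
            u = parent[u]
        out.add("".join(chars))
    return out


def forward_suffixes(children):
    """Strings read root-to-leaf style: from each node down to a leaf.

    A single node can start many such strings (one per leaf below it),
    so this set is not bounded by the number of nodes in general.
    """
    memo = {}

    def down(v):
        if v not in memo:
            if not children[v]:
                memo[v] = {""}  # a leaf contributes only the empty string
            else:
                memo[v] = {c + s
                           for c, u in children[v].items()
                           for s in down(u)}
        return memo[v]

    out = set()
    for v in range(len(children)):
        out |= down(v)
    return out


if __name__ == "__main__":
    parent, label, children = build_trie(["aab", "ab", "b"])
    print(sorted(backward_suffixes(parent, label)))
    print(sorted(forward_suffixes(children)))
```

This brute-force enumeration is only meant to show which strings each kind of index has to represent; it uses recursion and explicit string sets, so it is suitable for toy inputs only and says nothing about the linear-space representations discussed in the abstract.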


research
05/02/2020

Pointer-Machine Algorithms for Fully-Online Construction of Suffix Trees and DAWGs on Multiple Strings

We deal with the problem of maintaining the suffix tree indexing structu...
research
07/04/2023

Linear-time Computation of DAWGs, Symmetric Indexing Structures, and MAWs for Integer Alphabets

The directed acyclic word graph (DAWG) of a string y of length n is the ...
research
01/11/2023

Linear Time Online Algorithms for Constructing Linear-size Suffix Trie

The suffix trees are fundamental data structures for various kinds of st...
research
11/29/2021

Bounding the Last Mile: Efficient Learned String Indexing

We introduce the RadixStringSpline (RSS) learned index structure for eff...
research
11/30/2022

Gapped String Indexing in Subquadratic Space and Sublinear Query Time

In Gapped String Indexing, the goal is to compactly represent a string S...
research
02/14/2016

Large-Scale Reasoning with OWL

With the growth of the Semantic Web in size and importance, more and mor...
research
11/09/2022

Limit theorems for forward and backward processes of numbers of non-empty urns in infinite urn schemes

We study the joint asymptotics of forward and backward processes of numb...
