Linear-time Computation of DAWGs, Symmetric Indexing Structures, and MAWs for Integer Alphabets

07/04/2023
by Yuta Fujishige, et al.

The directed acyclic word graph (DAWG) of a string y of length n is the smallest (partial) DFA that recognizes all suffixes of y, and it has only O(n) nodes and edges. In this paper, we show how to construct the DAWG for the input string y from the suffix tree for y in O(n) time for integer alphabets of polynomial size in n. In so doing, we first describe a folklore algorithm which, given the suffix tree for y, constructs the DAWG for the reversed string of y in O(n) time. Then, we present our algorithm, which builds the DAWG for y itself in O(n) time for integer alphabets, again from the suffix tree for y. We also show that a straightforward modification to our DAWG construction algorithm leads to the first O(n)-time algorithm for constructing the affix tree of a given string y over an integer alphabet. Affix trees are a text indexing structure supporting bidirectional pattern searches. We then discuss how our constructions lead to linear-time algorithms for building other text indexing structures, such as linear-size suffix tries and symmetric CDAWGs, for integer alphabets. As a further application of our O(n)-time DAWG construction algorithm, we show that the set 𝖬𝖠𝖶(y) of all minimal absent words (MAWs) of y can be computed in optimal, input- and output-sensitive O(n + |𝖬𝖠𝖶(y)|) time and O(n) working space for integer alphabets.
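To make the two central objects concrete, here is a minimal Python sketch. It is not the suffix-tree-based algorithm of the paper: it uses the classical online suffix-automaton construction with hash-map transitions, and it then reads MAWs off the automaton with a per-state scan over the alphabet, which costs O(nσ) for alphabet size σ rather than the output-sensitive bound stated above. The names `build_dawg`, `accepts_suffix`, and `minimal_absent_words` are illustrative, and the alphabet is assumed to be passed in explicitly. The MAW step relies on the observation that a word aub of length at least 2 is a MAW exactly when au is the shortest string of some state p, the letter b leaves the suffix-link target of p, and b does not leave p.

```python
def build_dawg(y):
    """Classical online DAWG (suffix automaton) construction; illustrative sketch."""
    nxt, link, length, first_end = [{}], [-1], [0], [-1]
    last = 0
    for i, ch in enumerate(y):
        cur = len(nxt)
        nxt.append({})
        link.append(-1)
        length.append(length[last] + 1)
        first_end.append(i)  # end position of the first occurrence
        p = last
        while p != -1 and ch not in nxt[p]:
            nxt[p][ch] = cur
            p = link[p]
        if p == -1:
            link[cur] = 0
        else:
            q = nxt[p][ch]
            if length[p] + 1 == length[q]:
                link[cur] = q
            else:
                # Split q: the clone takes over the shorter strings of q.
                clone = len(nxt)
                nxt.append(dict(nxt[q]))
                link.append(link[q])
                length.append(length[p] + 1)
                first_end.append(first_end[q])
                while p != -1 and nxt[p].get(ch) == q:
                    nxt[p][ch] = clone
                    p = link[p]
                link[q] = clone
                link[cur] = clone
        last = cur
    # Accepting states lie on the suffix-link path from the last state.
    terminal = set()
    p = last
    while p != -1:
        terminal.add(p)
        p = link[p]
    return nxt, link, length, first_end, terminal


def accepts_suffix(dawg, w):
    """Return True iff w is a suffix of the indexed string."""
    nxt, _, _, _, terminal = dawg
    state = 0
    for ch in w:
        if ch not in nxt[state]:
            return False
        state = nxt[state][ch]
    return state in terminal


def minimal_absent_words(y, alphabet):
    """Compute MAW(y) over the given alphabet in O(n * |alphabet|) time."""
    nxt, link, length, first_end, _ = build_dawg(y)
    # Length-1 MAWs: letters that never occur in y.
    maws = {b for b in alphabet if b not in nxt[0]}
    for p in range(1, len(nxt)):
        min_len = length[link[p]] + 1
        # au = shortest string of state p; its suffix u is the longest of link[p].
        au = y[first_end[p] - min_len + 1 : first_end[p] + 1]
        for b in alphabet:
            if b not in nxt[p] and b in nxt[link[p]]:
                maws.add(au + b)  # au and ub occur in y, aub does not
    return maws


if __name__ == "__main__":
    print(sorted(minimal_absent_words("abaab", "ab")))  # ['aaa', 'aaba', 'bab', 'bb']
```

For y = abaab this automaton has six states, and the four MAWs over {a, b} each come from one (state, letter) pair. The hash-map transitions are a convenience of the sketch and do not provide the deterministic integer-alphabet guarantees discussed in the abstract.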


