Linear-time Minimization of Wheeler DFAs

by Jarno Alanko, et al.

Wheeler DFAs (WDFAs) are a subclass of finite-state automata that plays an important role in the emerging field of compressed data structures: unlike general automata, WDFAs can be stored in just log σ + O(1) bits per edge, where σ is the alphabet size, and they support optimal-time pattern matching queries on the substring closure of the language they recognize. An important step towards further compression is minimization. When the input 𝒜 is a general deterministic finite-state automaton (DFA), the state of the art is Hopcroft's classic algorithm, which runs in O(|𝒜| log |𝒜|) time. That algorithm lies at the core of the only existing minimization algorithm for Wheeler DFAs, which therefore inherits its complexity. In this work, we show that the minimum WDFA equivalent to a given input WDFA can be computed in linear O(|𝒜|) time. When run on de Bruijn WDFAs built from real DNA datasets, an implementation of our algorithm reduces the number of nodes from 14 million nodes per second.
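As a point of reference for what minimization computes, the sketch below merges equivalent states of a general DFA by naive partition refinement (Moore-style). It is neither Hopcroft's algorithm nor the linear-time WDFA algorithm of this work, and it runs in quadratic time in the worst case. The function name `minimize_dfa`, the dictionary encoding of the transition function, and the toy automaton are assumptions made purely for illustration.

```python
def minimize_dfa(states, alphabet, delta, accepting):
    """Map each state to the id of its equivalence class.

    Illustrative sketch only: naive Moore-style refinement on a
    general DFA. `delta` is a dict {(state, symbol): state} and may
    be partial. Unreachable states are not removed.
    """
    # Start from the accepting / non-accepting split.
    block = {q: int(q in accepting) for q in states}
    while True:
        # A state's signature is its own block plus the blocks it
        # reaches on each symbol (None marks a missing transition).
        signature = {
            q: (block[q],)
            + tuple(block.get(delta.get((q, a))) for a in alphabet)
            for q in states
        }
        ids = {}
        new_block = {
            q: ids.setdefault(signature[q], len(ids)) for q in states
        }
        # Refinement only splits blocks, so an unchanged block count
        # means the partition is stable.
        if len(ids) == len(set(block.values())):
            return new_block
        block = new_block


# Hypothetical toy DFA accepting the strings over {a, b} that end in 'a'.
states = [0, 1, 2]
alphabet = ["a", "b"]
delta = {
    (0, "a"): 1, (0, "b"): 2,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 2,
}
classes = minimize_dfa(states, alphabet, delta, accepting={1})
```

In this toy automaton, states 0 and 2 behave identically and land in the same class, so the minimized DFA has two states instead of three.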




Efficiently Testing Simon's Congruence

Simon's congruence ∼_k is defined as follows: two words are ∼_k-equivale...

Faster Compression of Deterministic Finite Automata

Deterministic finite automata (DFA) are a classic tool for high throughp...

Graphs can be succinctly indexed for pattern matching in O(|E|^2 + |V|^{5/2}) time

For the first time we provide a succinct pattern matching index for arbi...

On Indexing and Compressing Finite Automata

An index for a finite automaton is a powerful data structure that suppor...

Succinct Representation for (Non)Deterministic Finite Automata

Deterministic finite automata are one of the simplest and most practical...

New Linear-time Algorithm for SubTree Kernel Computation based on Root-Weighted Tree Automata

Tree kernels have been proposed to be used in many areas as the automati...
