On Indexing and Compressing Finite Automata

07/15/2020
by Nicola Cotumaccio, et al.

An index for a finite automaton is a powerful data structure that supports locating paths labeled with a query pattern, thus solving pattern matching on the underlying regular language. In this paper, we solve the long-standing problem of indexing arbitrary finite automata. Our solution consists of finding a partial co-lexicographic order of the states and proving, as in the total order case, that the states reached by a given string form one interval of the partial order, thus enabling indexing. We provide a lower bound stating that such an interval requires O(p) words to be represented, p being the order's width (i.e. the size of its largest antichain). Indeed, we show that p determines the complexity of several fundamental problems on finite automata: (i) Letting σ be the alphabet size, we provide an encoding for NFAs using ⌈log σ⌉ + 2⌈log p⌉ + 2 bits per transition and a smaller encoding for DFAs using ⌈log σ⌉ + ⌈log p⌉ + 2 bits per transition. This is achieved by generalizing the Burrows-Wheeler transform to arbitrary automata. (ii) We show that indexed pattern matching can be solved in Õ(m · p^2) query time on NFAs, where m is the pattern length. (iii) We provide a polynomial-time algorithm to index DFAs that matches the optimal value of p. On the other hand, we prove that the problem is NP-hard on NFAs. (iv) We show that, in the worst case, the classic powerset construction algorithm for NFA determinization generates an equivalent DFA of size 2^p · (n − p + 1) − 1, where n is the number of the NFA's states.
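
The size bounds in (i) and (iv) are closed-form expressions in σ, p and n, so they are easy to evaluate for concrete parameters. The Python snippet below is a minimal sketch that only computes those formulas as stated in the abstract; it assumes logarithms are base 2, takes the width p as given rather than computing the co-lexicographic order, and does not build the generalized Burrows-Wheeler encoding itself. All function names are hypothetical.

```python
# Minimal sketch: evaluates the closed-form bounds quoted in the abstract.
# Assumptions: logarithms are base 2, the width p is given (not computed),
# and all function names are hypothetical, not from any published library.


def ceil_log2(x: int) -> int:
    """Return ceil(log2(x)) for a positive integer x (0 when x == 1)."""
    if x < 1:
        raise ValueError("x must be a positive integer")
    # (x - 1).bit_length() equals ceil(log2(x)) for x >= 1, with no float rounding.
    return (x - 1).bit_length()


def nfa_bits_per_transition(sigma: int, p: int) -> int:
    """NFA encoding cost: ceil(log sigma) + 2 * ceil(log p) + 2 bits per transition."""
    return ceil_log2(sigma) + 2 * ceil_log2(p) + 2


def dfa_bits_per_transition(sigma: int, p: int) -> int:
    """DFA encoding cost: ceil(log sigma) + ceil(log p) + 2 bits per transition."""
    return ceil_log2(sigma) + ceil_log2(p) + 2


def powerset_worst_case_size(n: int, p: int) -> int:
    """Worst-case DFA size from the powerset construction: 2^p * (n - p + 1) - 1."""
    if not 1 <= p <= n:
        raise ValueError("the width p must satisfy 1 <= p <= n")
    return 2 ** p * (n - p + 1) - 1


if __name__ == "__main__":
    # sigma = 4 (e.g. a DNA alphabet) for several widths p
    for p in (1, 2, 8):
        print(f"p={p}: NFA {nfa_bits_per_transition(4, p)} bits, "
              f"DFA {dfa_bits_per_transition(4, p)} bits")
    # n = 10 NFA states: p = n recovers the classic 2^n - 1 bound
    for p in (1, 3, 10):
        print(f"n=10, p={p}: at most {powerset_worst_case_size(10, p)} DFA states")
```

For σ = 4 and p = 1 (the total-order case), both encodings cost 4 bits per transition; with p = 8 the NFA bound grows to 10 bits and the DFA bound to 7. For an NFA with n = 10 states, the powerset bound ranges from 19 states at p = 1 to the classic 2^10 − 1 = 1023 at p = n, which shows how the width interpolates between the two extremes.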
