Enumerating Regular Languages in Constant Delay
We study the task, for a given language L, of enumerating the (generally infinite) sequence of its words, without repetitions, while bounding the delay between two consecutive words. To allow for constant delay bounds, we assume a model where we produce each word by editing the preceding word with a small edit script, rather than writing out the word from scratch. In particular, such an enumeration witnesses that the language is orderable, i.e., we can write its words as an infinite sequence such that the Levenshtein edit distance between any two consecutive words is bounded by a constant. For instance, (a+b)^* is orderable (with a variant of the Gray code), but a^* + b^* is not. We characterize which regular languages are enumerable in this sense, and show that this can be decided in PTIME given an input deterministic finite automaton (DFA) for the language. In fact, we show that, given a DFA A recognizing a language L, we can compute in PTIME automata A_1, …, A_t such that L is partitioned as L(A_1) ⊔ … ⊔ L(A_t) and every L(A_i) is orderable in this sense. Further, we show that this is optimal, i.e., we cannot partition L into fewer than t orderable languages. In the case where L is orderable, we show that the ordering can be computed by a constant-delay algorithm: specifically, the algorithm runs in a suitable pointer machine model, and produces a sequence of constant-length edit scripts that visits the words of L without repetitions, with constant delay between consecutive scripts. Moreover, we show that this can be achieved using only push and pop operations at the beginning and end of the word, so that the word can be maintained in a double-ended queue. We also present results on the complexity of a related problem, and study the model where push and pop edits are allowed only at the end of the word.
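To make the (a+b)^* example concrete, here is a minimal Python sketch of one Gray-code-style ordering, assuming the standard binary reflected Gray code over {a, b}. It is an illustration of orderability only, not the paper's construction: it uses substitutions in the middle of the word (not just push/pop at the ends), and it materializes each length block, so it does not run with constant delay. The function names `gray_code`, `enumerate_ab_star`, and `levenshtein` are hypothetical.

```python
from itertools import islice
from typing import Iterator, List


def gray_code(n: int) -> List[str]:
    """Binary reflected Gray code over {a, b}: all 2**n words of length n,
    with consecutive words differing in exactly one position."""
    if n == 0:
        return [""]
    prev = gray_code(n - 1)
    return ["a" + w for w in prev] + ["b" + w for w in reversed(prev)]


def enumerate_ab_star() -> Iterator[str]:
    """Yield every word of (a+b)^* exactly once, by increasing length.

    For n >= 1 the forward Gray block of length n runs from a^n to b a^(n-1).
    Emitting even lengths forward and odd lengths reversed makes each block
    start one insertion away from where the previous block ended, so
    consecutive words are always at edit distance 1.
    NOTE (assumption of this sketch): each block is built in full, so this
    is NOT a constant-delay enumeration.
    """
    n = 0
    while True:
        block = gray_code(n)
        yield from (reversed(block) if n % 2 else block)
        n += 1


def levenshtein(u: str, v: str) -> int:
    """Standard dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(v) + 1))
    for i, cu in enumerate(u, 1):
        cur = [i]
        for j, cv in enumerate(v, 1):
            cur.append(min(prev[j] + 1,                # delete cu
                           cur[j - 1] + 1,             # insert cv
                           prev[j - 1] + (cu != cv)))  # substitute or match
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    words = list(islice(enumerate_ab_star(), 7))
    print(words)  # ['', 'b', 'a', 'aa', 'ab', 'bb', 'ba']
    print(max(levenshtein(u, v) for u, v in zip(words, words[1:])))  # 1
```

By contrast, no such bound exists for a^* + b^*: the distance from a^n to any word of the language outside a^+ (the empty word or some b^m) is at least n, and any ordering must leave the a-words infinitely often, each time from a different a^n, so the jumps grow without bound.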