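An NFA is a quintuple $\mathcal{A} = (Q, \Sigma, \delta, I, F)$, where $Q$ is the finite set of states, $\Sigma$ is the finite alphabet, $\delta \subseteq Q \times \Sigma \times Q$ is the transition relation, $I \subseteq Q$ is the set of initial states, and $F \subseteq Q$ is the set of accepting states. We write $q \xrightarrow{a} r$ to denote that $(q, a, r) \in \delta$. A finite sequence $q_0 \xrightarrow{a_1} q_1 \xrightarrow{a_2} \cdots \xrightarrow{a_n} q_n$ is called a run; it can be summarized as $q_0 \xrightarrow{a_1 \cdots a_n} q_n$. The NFA $\mathcal{A}$ recognizes the language $L(\mathcal{A}) := \{w \in \Sigma^* \mid q_0 \xrightarrow{w} q \text{ for some } q_0 \in I,\ q \in F\}$. The NFA $\mathcal{A}$ is a DFA if $|I| = 1$ and for every $q \in Q$ and $a \in \Sigma$ there is exactly one $r \in Q$ with $q \xrightarrow{a} r$. The NFA $\mathcal{A}$ is a UFA if for every word $w \in \Sigma^*$ there is at most one accepting run for $w$, i.e., a run $q_0 \xrightarrow{w} q_n$ with $q_0 \in I$ and $q_n \in F$. Clearly, any DFA is a UFA.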
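The definitions above can be made concrete with a small sketch. The class name `NFA` and its methods are our own illustrative choices, not notation from the literature. Counting accepting runs by dynamic programming gives a direct test of the UFA condition on individual words; the bounded check `is_unambiguous_up_to` only inspects words up to a given length and is therefore not a decision procedure for unambiguity.

```python
from itertools import product


class NFA:
    """An NFA (Q, Sigma, delta, I, F); delta is a set of triples (q, a, r)."""

    def __init__(self, states, alphabet, delta, initial, accepting):
        self.states = set(states)
        self.alphabet = set(alphabet)
        self.delta = set(delta)
        self.initial = set(initial)
        self.accepting = set(accepting)

    def count_accepting_runs(self, word):
        """Number of runs on `word` from an initial state to an accepting state."""
        # runs[q] = number of runs from an initial state to q on the prefix read so far
        runs = {q: (1 if q in self.initial else 0) for q in self.states}
        for a in word:
            nxt = {q: 0 for q in self.states}
            for (q, b, r) in self.delta:
                if b == a:
                    nxt[r] += runs[q]
            runs = nxt
        return sum(runs[q] for q in self.accepting)

    def accepts(self, word):
        return self.count_accepting_runs(word) > 0

    def is_unambiguous_up_to(self, max_len):
        """Bounded check of the UFA condition on all words of length <= max_len."""
        for n in range(max_len + 1):
            for w in product(sorted(self.alphabet), repeat=n):
                if self.count_accepting_runs(w) > 1:
                    return False
        return True


# Illustrative UFA: words over {0,1} whose second-to-last letter is 1.
second_to_last = NFA(
    states={0, 1, 2}, alphabet={"0", "1"},
    delta={(0, "0", 0), (0, "1", 0), (0, "1", 1), (1, "0", 2), (1, "1", 2)},
    initial={0}, accepting={2},
)

# Illustrative ambiguous NFA: words containing the letter 1 (any occurrence may be guessed).
contains_one = NFA(
    states={0, 1}, alphabet={"0", "1"},
    delta={(0, "0", 0), (0, "1", 0), (0, "1", 1), (1, "0", 1), (1, "1", 1)},
    initial={0}, accepting={1},
)
```

In the first automaton the position of the guess is forced by the input length, so every accepted word has exactly one accepting run; in the second, a word with two occurrences of 1 has two accepting runs.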
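We use the notation $\tilde{O}(\cdot)$ and $\tilde{\Omega}(\cdot)$ to hide polylogarithmic factors; i.e., $\tilde{O}(n) = n \cdot (\log n)^{O(1)}$ and $\tilde{\Omega}(n) = n / (\log n)^{O(1)}$.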
2 UFA Complementation
Given two finite automata $\mathcal{A}_1, \mathcal{A}_2$, recognizing languages $L_1, L_2 \subseteq \Sigma^*$, respectively, the state complexity of union (or intersection, or complement, etc.) is how many states may be needed for an automaton that recognizes $L_1 \cup L_2$ (or $L_1 \cap L_2$, or $\Sigma^* \setminus L_1$, etc.). It depends on the type of automaton considered, such as NFAs, DFAs, or UFAs.
The state complexity has been well studied for various types of automata and language operations, see, e.g.,  and the references therein for some known results. For example, it was shown in [7] that complementing an NFA with $n$ states may require $\Theta(2^n)$ states. However, the state complexity for UFAs is not yet fully understood. It was shown only in 2018 by Raskin [11] that the state complexity for UFAs and complement is not polynomial:
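Proposition 2.1 ([11]).
For any $n \in \mathbb{N}$ there exists a UFA $\mathcal{A}$ with $n$ states and unary alphabet $\Sigma$ (i.e., $|\Sigma| = 1$) such that any NFA that recognizes $\Sigma^* \setminus L(\mathcal{A})$ has at least $n^{(\log \log \log n)^{\Omega(1)}}$ states.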
This super-polynomial blowup (even for unary alphabet and even if the output automaton is allowed to be ambiguous) refuted a conjecture that it may be possible to complement UFAs with a polynomial blowup [3]. A non-trivial upper bound (for general alphabets and outputting a UFA) was shown by Jirásek et al. [9]:
Proposition 2.2 ([9]).
Let $\mathcal{A}$ be a UFA with $n \ge 7$ states that recognizes a language $L \subseteq \Sigma^*$. Then there exists a UFA with at most $n \cdot 2^{0.786 n}$ states that recognizes the language $\Sigma^* \setminus L$.
An almost tight analysis [8] of Jirásek et al.’s construction yields a slight improvement:
Proposition 2.3 ([8]).
Let $\mathcal{A}$ be a UFA with $n$ states that recognizes a language $L \subseteq \Sigma^*$. Then there exists a UFA with at most $\sqrt{n+1} \cdot 2^{n/2}$ states that recognizes the language $\Sigma^* \setminus L$.
In this section we improve the lower bound from Proposition 2.1:
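Theorem 2.4.
For infinitely many $n \in \mathbb{N}$ there is a UFA $\mathcal{A}$ with $n$ states and alphabet $\Sigma = \{0,1\}$ and finite $L(\mathcal{A}) \subseteq \Sigma^*$ such that any NFA that recognizes $\Sigma^* \setminus L(\mathcal{A})$ has at least $n^{\tilde{\Omega}(\log n)}$ states.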
Like Proposition 2.1, the lower bound holds even for NFAs (not just UFAs) that recognize the complement language. Unlike Proposition 2.1, the lower bound in Theorem 2.4 uses a binary alphabet, i.e., $|\Sigma| = 2$.
2.1 Communication Complexity
Let $D = C_1 \vee \cdots \vee C_m$ be an $n$-variate boolean formula in disjunctive normal form (DNF). The DNF $D$ has width $k$ if every $C_i$ is a conjunction of at most $k$ literals. We call such $D$ a $k$-DNF. For conjunctive normal form (CNF) formulas the width and $k$-CNFs are defined analogously. The DNF $D$ is said to be unambiguous if for every input $x \in \{0,1\}^n$ at most one of the conjunctions $C_i$ evaluates to true, $C_i(x) = 1$. For any boolean function $f : \{0,1\}^n \to \{0,1\}$ define
$\mathsf{C}_1(f)$ as the least $k$ such that $f$ can be written as a $k$-DNF;
$\mathsf{C}_0(f)$ as the least $k$ such that $f$ can be written as a $k$-CNF;
$\mathsf{UC}_1(f)$ as the least $k$ such that $f$ can be written as an unambiguous $k$-DNF.
Note that $\mathsf{C}_1(f) \le \mathsf{UC}_1(f)$. The following is a recent result [1]:
Theorem 2.5 ([1, Theorem 1]).
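For infinitely many $n$ there exists a boolean function $f : \{0,1\}^n \to \{0,1\}$ with $\mathsf{UC}_1(f) = n^{\Omega(1)}$ and $\mathsf{C}_0(f) = \tilde{\Omega}(\mathsf{UC}_1(f)^2)$.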
In words, for infinitely many $k$ there is an unambiguous $k$-DNF such that any equivalent CNF requires width $\tilde{\Omega}(k^2)$. The bound is almost tight, as every unambiguous $k$-DNF has an equivalent $k^2$-CNF; see [6, Section 3].
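For small formulas, the notions of width and unambiguity defined above can be checked by brute force. The following is a minimal sketch under our own encoding assumption: a conjunction is a dictionary mapping a variable index to the bit it requires, and a DNF is a list of such dictionaries. The function names are illustrative.

```python
from itertools import product


def eval_conj(conj, x):
    """A conjunction maps a variable index to its required bit; {} is constant true."""
    return all(x[i] == b for i, b in conj.items())


def eval_dnf(dnf, x):
    return any(eval_conj(c, x) for c in dnf)


def width(dnf):
    """Largest number of literals in any conjunction of the DNF."""
    return max((len(c) for c in dnf), default=0)


def is_unambiguous(dnf, n):
    """True iff every input in {0,1}^n satisfies at most one conjunction."""
    return all(
        sum(eval_conj(c, x) for c in dnf) <= 1
        for x in product((0, 1), repeat=n)
    )


# Two DNFs for the same function x0 OR x1.
or_ambiguous = [{0: 1}, {1: 1}]              # width 1, both conjunctions fire on (1, 1)
or_unambiguous = [{0: 1}, {0: 0, 1: 1}]      # width 2, conjunctions are mutually exclusive
```

The example shows the typical trade-off: making a DNF unambiguous may force wider conjunctions, which is consistent with $\mathsf{C}_1(f) \le \mathsf{UC}_1(f)$.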
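We need results on two-party communication complexity; see [10] for the standard textbook. Consider a “two-party” function $F : X \times Y \to \{0,1\}$. A set $A \times B$ (with $A \subseteq X$ and $B \subseteq Y$) is called a rectangle. Rectangles $R_1, \ldots, R_m$ cover a set $S \subseteq X \times Y$ if $\bigcup_{i=1}^{m} R_i = S$. For $b \in \{0,1\}$, the cover number $\mathsf{Cov}_b(F)$ is the least number of rectangles that cover $F^{-1}(b)$. The nondeterministic (resp., co-nondeterministic) communication complexity of $F$ is defined as $\mathsf{N}_1(F) := \log_2 \mathsf{Cov}_1(F)$ (resp., $\mathsf{N}_0(F) := \log_2 \mathsf{Cov}_0(F)$). Note that $\mathsf{N}_0(F) = \mathsf{N}_1(\neg F)$. The nondeterministic communication complexity can be interpreted as the number of bits that two parties, holding inputs $x \in X$ and $y \in Y$, respectively, need to communicate in a nondeterministic (i.e., based on guessing and checking) protocol in order to establish that $F(x,y) = 1$; see [10, Chapter 2] for details.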
The following is a “lifting” theorem, which allows us to transfer lower bounds on the DNF width of a boolean function to the nondeterministic communication complexity of a two-party function.
Theorem 2.6 ([6, Theorem 4]).
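For any $n$ there is a function $g : \{0,1\}^b \times \{0,1\}^b \to \{0,1\}$ with $b = \Theta(\log n)$ such that for any function $f : \{0,1\}^n \to \{0,1\}$ the function $f \circ g^n : \{0,1\}^{bn} \times \{0,1\}^{bn} \to \{0,1\}$ defined by
$$(f \circ g^n)(x, y) := f\big(g(x_1, y_1), \ldots, g(x_n, y_n)\big), \quad \text{where } x = (x_1, \ldots, x_n),\ y = (y_1, \ldots, y_n) \text{ with } x_i, y_i \in \{0,1\}^b,$$
satisfies $\mathsf{N}_0(f \circ g^n) = \Omega(\mathsf{C}_0(f) \cdot b)$ (and thus also $\mathsf{N}_1(f \circ g^n) = \Omega(\mathsf{C}_1(f) \cdot b)$).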
Finally, we need the following simple lemma:
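Lemma 2.7.
If a two-party function $F : \Sigma^n \times \Sigma^n \to \{0,1\}$ admits an NFA with $s$ states, i.e., there is an NFA $\mathcal{A}$ with $s$ states and $L(\mathcal{A}) = \{xy \mid F(x,y) = 1\}$, then $\mathsf{Cov}_1(F) \le s$ (equivalently, $\mathsf{N}_1(F) \le \log_2 s$).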
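Proof. Let $\mathcal{A}$ be an NFA with state set $Q$, $|Q| = s$, and $L(\mathcal{A}) = \{xy \mid F(x,y) = 1\}$. We show that $F^{-1}(1)$ is covered by at most $s$ rectangles. Indeed, $F^{-1}(1)$ equals
$$\bigcup_{q \in Q} \{x \in \Sigma^n \mid q_0 \xrightarrow{x} q \text{ for some initial state } q_0\} \times \{y \in \Sigma^n \mid q \xrightarrow{y} r \text{ for some accepting state } r\}.$$
(Alternatively, in terms of a nondeterministic protocol, the first party, holding $x$, produces a run for $x$ from an initial state to a state $q$ and then sends the name of $q$, which takes $\log_2 s$ bits, to the other party. The other party then produces a run for $y$ from $q$ to an accepting state.) ∎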
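The rectangle cover used in the proof can be computed explicitly for small automata. The sketch below is illustrative only: the helper names and the example automaton `intersect_nfa` (which accepts $xy$ with $|x| = |y| = n$ exactly when $x$ and $y$ have a 1 at a common position) are our own assumptions. For each state $q$ it collects the words $x$ that can reach $q$ from an initial state and the words $y$ that lead from $q$ to an accepting state, and drops empty rectangles.

```python
from itertools import product


def reach(delta, sources, word):
    """States reachable from `sources` by reading `word`."""
    cur = set(sources)
    for a in word:
        cur = {r for (q, b, r) in delta if q in cur and b == a}
    return cur


def rectangles(states, delta, initial, accepting, n, alphabet="01"):
    """One rectangle X_q x Y_q per state q, restricted to words of length n."""
    words = ["".join(w) for w in product(alphabet, repeat=n)]
    rects = {}
    for q in states:
        X = {x for x in words if q in reach(delta, initial, x)}
        Y = {y for y in words if reach(delta, {q}, y) & set(accepting)}
        if X and Y:  # empty rectangles contribute nothing to the cover
            rects[q] = (X, Y)
    return rects


def intersect_nfa(n):
    """Hypothetical example NFA: guess a position i, check x_i = 1 and y_i = 1."""
    states, delta = set(), set()
    for i in range(n):
        for p in range(2 * n + 1):
            states.add((i, p))          # (guessed position, letters read so far)
        for p in range(2 * n):
            for a in "01":
                if p not in (i, n + i) or a == "1":
                    delta.add(((i, p), a, (i, p + 1)))
    initial = {(i, 0) for i in range(n)}
    accepting = {(i, 2 * n) for i in range(n)}
    return states, delta, initial, accepting


def covered_pairs(rects):
    return {(x, y) for X, Y in rects.values() for x in X for y in Y}
```

For this example only the states reached after exactly $n$ letters yield non-empty rectangles, so the cover uses $n$ rectangles although the automaton has $n(2n+1)$ states.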
2.2 Proof of Theorem 2.4
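For $n \in \mathbb{N}$, let $f : \{0,1\}^n \to \{0,1\}$ be the function from Theorem 2.5, i.e., $f$ has an unambiguous $k$-DNF with $k = n^{\Omega(1)}$ (hence, $\log n = O(\log k)$) and $\mathsf{C}_0(f) = \tilde{\Omega}(k^2)$. Let $g : \{0,1\}^b \times \{0,1\}^b \to \{0,1\}$ with $b = \Theta(\log n)$ and $F := f \circ g^n$ be the two-party functions from Theorem 2.6, with $\mathsf{N}_0(F) = \Omega(\mathsf{C}_0(f) \cdot b) = \tilde{\Omega}(k^2)$. The UFA from the statement of Theorem 2.4 will recognize $L := \{xy \mid F(x,y) = 1\}$.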
First we argue that $F$ has an unambiguous DNF of small width. Indeed, $g$ and $\neg g$ have unambiguous $2b$-DNFs, which can be extracted from the deterministic decision tree of $g$. By plugging these unambiguous $2b$-DNFs for $g$ and $\neg g$ into the unambiguous $k$-DNF for $f$ (and “multiplying out”), one obtains an unambiguous $2bk$-DNF, say $\varphi$, for $F$.
Over the $2bn$ variables of $F$, there exist at most $(4bn+1)^{2bk}$ different conjunctions of at most $2bk$ literals. So $\varphi$ consists of at most $2^{\tilde{O}(k)}$ conjunctions. From $\varphi$ we obtain a UFA $\mathcal{A}$ that recognizes $L$, as follows. Each initial state of $\mathcal{A}$ corresponds to a conjunction in $\varphi$. When reading the input $xy$, the UFA checks that the corresponding assignment to the variables satisfies the conjunction represented by the initial state. This check requires at most $O(bn)$ states for each initial state. Thus, $\mathcal{A}$ has at most $2^{\tilde{O}(k)}$ states in total.
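The construction of a UFA from an unambiguous DNF can be sketched as follows. This is a minimal illustration under our own encoding assumptions (a conjunction is a dictionary from variable index to required bit, a state is a pair of a conjunction index and the number of letters read); it is not claimed to match the exact automaton intended in the proof. Each conjunction gets its own chain of $m+1$ states over $m$ input bits, so the number of accepting runs on an input equals the number of conjunctions it satisfies.

```python
from itertools import product


def dnf_to_nfa(dnf, m):
    """One chain of m + 1 states per conjunction; state (c, i): conjunction c, i letters read."""
    delta = set()
    for c, conj in enumerate(dnf):
        for i in range(m):
            for a in (0, 1):
                # letter a is allowed if variable i is unconstrained or must equal a
                if conj.get(i, a) == a:
                    delta.add(((c, i), a, (c, i + 1)))
    initial = {(c, 0) for c in range(len(dnf))}
    accepting = {(c, m) for c in range(len(dnf))}
    return delta, initial, accepting


def count_accepting_runs(delta, initial, accepting, word):
    """Number of runs on `word` from an initial state to an accepting state."""
    runs = {q: 1 for q in initial}
    for a in word:
        nxt = {}
        for (q, b, r) in delta:
            if b == a and q in runs:
                nxt[r] = nxt.get(r, 0) + runs[q]
        runs = nxt
    return sum(cnt for q, cnt in runs.items() if q in accepting)


# Unambiguous 2-DNF for XOR of two bits: (x0 AND NOT x1) OR (NOT x0 AND x1).
xor_dnf = [{0: 1, 1: 0}, {0: 0, 1: 1}]
xor_delta, xor_initial, xor_accepting = dnf_to_nfa(xor_dnf, 2)
```

Because the DNF is unambiguous, every input has at most one accepting run, and the automaton uses at most (number of conjunctions) times $(m+1)$ states. Words of length other than $m$ have no accepting run, so the recognized language is finite.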
3 Separation of Regular Languages by UFAs
In [3, Conjecture 2], Colcombet conjectured that for any NFAs $\mathcal{A}_1, \mathcal{A}_2$ with $L(\mathcal{A}_1) \cap L(\mathcal{A}_2) = \emptyset$ there is a polynomial-sized UFA $\mathcal{A}$ with $L(\mathcal{A}_1) \subseteq L(\mathcal{A})$ and $L(\mathcal{A}) \cap L(\mathcal{A}_2) = \emptyset$. Related separability questions are classical in formal language theory and have attracted renewed attention; see, e.g., [5] and the references therein. Separating automata have also been used recently to elegantly describe quasi-polynomial time algorithms for solving parity games in an automata theoretic framework; see [2, Chapter 3] and [4].
In this section we refute the above-mentioned conjecture by Colcombet, even when $L(\mathcal{A}_2) = \Sigma^* \setminus L(\mathcal{A}_1)$:
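Theorem 3.1.
For any $n \in \mathbb{N}$ there are NFAs $\mathcal{A}_1, \mathcal{A}_2$ with $n$ states and alphabet $\Sigma = \{0,1\}$ and finite $L(\mathcal{A}_1)$ and $L(\mathcal{A}_2) = \Sigma^* \setminus L(\mathcal{A}_1)$ such that any UFA that recognizes $L(\mathcal{A}_1)$ has at least $n^{\Omega(\log n)}$ states.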
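Loosely speaking, in our construction, NFAs $\mathcal{A}_1, \mathcal{A}_2$ recognize (sparse) set disjointness and its complement. For $n \in \mathbb{N}$ write $[n] := \{1, \ldots, n\}$ and define for $k \le n$
$$\mathrm{Disj}^n_k := \{(S, T) \mid S, T \subseteq [n],\ |S| = |T| = k,\ S \cap T = \emptyset\}.$$
Define also $L^n_k := \{\langle S \rangle \langle T \rangle \mid (S,T) \in \mathrm{Disj}^n_k\}$ where $\langle S \rangle \in \{0,1\}^n$ is such that the $i$th letter of $\langle S \rangle$ is $1$ if and only if $i \in S$, and similarly for $\langle T \rangle$. Note that $\langle S \rangle, \langle T \rangle$ each contain $k$ times the letter $1$. To prove Theorem 3.1 it suffices to prove the following lemma.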
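Lemma 3.2.
For any $n \in \mathbb{N}$ let $k := \lceil \log_2 n \rceil$. There are NFAs $\mathcal{A}_1, \mathcal{A}_2$ with $n^{O(1)}$ states and alphabet $\{0,1\}$ and $L(\mathcal{A}_1) = L^n_k$ and $L(\mathcal{A}_2) = \{0,1\}^* \setminus L^n_k$. Any UFA that recognizes $L^n_k$ has at least $n^{\Omega(\log n)}$ states.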
In the rest of the section we prove Lemma 3.2. We use known results from communication complexity to show that any UFA for $L^n_k$ needs super-polynomially many states. We will give a self-contained proof of the existence of polynomial-sized NFAs for $L^n_k$ and its complement, but the main argument also comes from communication complexity, as we remark below at the end of the section.
3.1 Communication Complexity
Recall from Section 2.1 the notions of rectangles and rectangles covering a set. For a two-party function $F$, the partition number $\mathsf{Par}_1(F)$ is the least number of pairwise disjoint rectangles that cover $F^{-1}(1)$. Note that $\mathsf{Cov}_1(F) \le \mathsf{Par}_1(F)$. The unambiguous communication complexity of $F$ is defined as $\mathsf{U}_1(F) := \log_2 \mathsf{Par}_1(F)$. Note that $\mathsf{N}_1(F) \le \mathsf{U}_1(F)$. Denote by $M_F$ the communication matrix, with entries $M_F(x,y) = F(x,y)$. Denote by $\mathrm{rank}(M)$ the rank over the reals of a matrix $M$. The following lemma, the “rank bound”, is often used for lower bounds on the deterministic communication complexity (a concept we do not need here), but it holds even for unambiguous communication complexity:
Lemma 3.3.
Let $F : X \times Y \to \{0,1\}$. Then $\mathsf{U}_1(F) \ge \log_2 \mathrm{rank}(M_F)$.
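Proof. For $s := \mathsf{Par}_1(F)$, let $R_1 = A_1 \times B_1, \ldots, R_s = A_s \times B_s$ be pairwise disjoint rectangles that cover $F^{-1}(1)$. Each $R_i$ defines a rank-$1$ matrix $M_i$ with $M_i(x,y) = 1$ if and only if $x \in A_i$ and $y \in B_i$. It follows from the pairwise disjointness that $M_F = \sum_{i=1}^{s} M_i$. Hence $\mathrm{rank}(M_F) \le \sum_{i=1}^{s} \mathrm{rank}(M_i) \le s = 2^{\mathsf{U}_1(F)}$. ∎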
The following lemma and its proof are analogous to Lemma 2.7.
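Lemma 3.4.
If a two-party function $F : \Sigma^n \times \Sigma^n \to \{0,1\}$ admits a UFA with $s$ states, i.e., there is a UFA $\mathcal{A}$ with $s$ states and $L(\mathcal{A}) = \{xy \mid F(x,y) = 1\}$, then $\mathsf{Par}_1(F) \le s$ (equivalently, $\mathsf{U}_1(F) \le \log_2 s$).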
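Proof. Let $\mathcal{A}$ be a UFA with state set $Q$, $|Q| = s$, and $L(\mathcal{A}) = \{xy \mid F(x,y) = 1\}$. We show that $F^{-1}(1)$ is covered by at most $s$ pairwise disjoint rectangles. Indeed, $F^{-1}(1)$ equals
$$\bigcup_{q \in Q} \{x \in \Sigma^n \mid q_0 \xrightarrow{x} q \text{ for some initial state } q_0\} \times \{y \in \Sigma^n \mid q \xrightarrow{y} r \text{ for some accepting state } r\}$$
and the rectangles do not overlap, as $\mathcal{A}$ is unambiguous. ∎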
3.2 Proof of Lemma 3.2
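First we prove the statement on UFAs. Write $\binom{[n]}{k} := \{S \subseteq [n] \mid |S| = k\}$. Let $D : \binom{[n]}{k} \times \binom{[n]}{k} \to \{0,1\}$ be the two-party function with $D(S,T) = 1$ if and only if $S \cap T = \emptyset$. It is shown, e.g., in [10, Example 2.12] that the communication matrix $M_D$ has full rank $\binom{n}{k}$. Let $F : \{0,1\}^n \times \{0,1\}^n \to \{0,1\}$ be such that $F(x,y) = 1$ if and only if $xy \in L^n_k$. Then $M_D$ is a principal submatrix of $M_F$, so $\mathrm{rank}(M_F) \ge \binom{n}{k}$. Using Lemmas 3.4 and 3.3 it follows that any UFA, say $\mathcal{A}$, that recognizes $L^n_k$ has at least $\binom{n}{k}$ states. With $k = \lceil \log_2 n \rceil$ and $\binom{n}{k} \ge (n/k)^k$, it follows that $\mathcal{A}$ has $n^{\Omega(\log n)}$ states.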
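The full-rank property of the disjointness matrix can be checked on small instances. The sketch below is our own illustration: it builds the matrix indexed by the $k$-element subsets of $[n]$ and computes its rank by exact Gaussian elimination over the rationals (which agrees with the rank over the reals). The function names are hypothetical, and the check is only meaningful for $2k \le n$.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def rank(matrix):
    """Rank over the rationals via Gaussian elimination with exact arithmetic."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    rk = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        # find a row at or below position rk with a non-zero entry in this column
        pivot = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(rk + 1, len(rows)):
            if rows[r][col] != 0:
                factor = rows[r][col] / rows[rk][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[rk])]
        rk += 1
    return rk


def disjointness_matrix(n, k):
    """Entry (S, T) is 1 iff the k-subsets S and T of {1, ..., n} are disjoint."""
    subsets = [frozenset(s) for s in combinations(range(1, n + 1), k)]
    return [[1 if not (S & T) else 0 for T in subsets] for S in subsets]
```

For instance, with $n = 5$ and $k = 2$ the matrix is the adjacency matrix of the Petersen graph, and `rank(disjointness_matrix(5, 2))` can be compared against `comb(5, 2)`.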
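It is easy to see that there is an NFA, $\mathcal{A}_2$, with $n^{O(1)}$ states and $L(\mathcal{A}_2) = \{0,1\}^* \setminus L^n_k$. Indeed, we can assume that the input is of the form $\langle S \rangle \langle T \rangle$ with $|S| = |T| = k$; otherwise $\mathcal{A}_2$ accepts. NFA $\mathcal{A}_2$ guesses $i \in [n]$ such that $i \in S \cap T$ and then checks it.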
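Finally, we show that there is an NFA, $\mathcal{A}_1$, with $n^{O(1)}$ states and $L(\mathcal{A}_1) = L^n_k$. We can assume that the input is of the form $\langle S \rangle \langle T \rangle$ with $|S| = |T| = k$; otherwise $\mathcal{A}_1$ rejects. NFA $\mathcal{A}_1$ “hard-codes” polynomially many sets $Z_1, \ldots, Z_\ell \subseteq [n]$. It guesses $i \in \{1, \ldots, \ell\}$ such that $S \subseteq Z_i$ and $T \cap Z_i = \emptyset$ and then checks it. It remains to show that there exist $\ell = n^{O(1)}$ sets $Z_1, \ldots, Z_\ell \subseteq [n]$ such that for any $(S,T) \in \mathrm{Disj}^n_k$ there is $i \in \{1, \ldots, \ell\}$ with $S \subseteq Z_i$ and $T \cap Z_i = \emptyset$. The argument uses the probabilistic method and is due to [12]; see also [10, Example 2.12]. We reproduce it here due to its elegance and brevity.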
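Fix $(S,T) \in \mathrm{Disj}^n_k$. Say that a set $Z \subseteq [n]$ separates $(S,T)$ if $S \subseteq Z$ and $T \cap Z = \emptyset$. A random set $Z$ (each $i \in [n]$ is in $Z$ with probability $1/2$) separates $(S,T)$ with probability $2^{-2k}$. Thus, choosing $\ell := \lceil 2^{2k} \cdot 2k \ln n \rceil = n^{O(1)}$ random sets independently, the probability that none of them separates $(S,T)$ is
$$(1 - 2^{-2k})^{\ell} < e^{-\ell \cdot 2^{-2k}} \le n^{-2k}.$$
By the union bound, since $|\mathrm{Disj}^n_k| \le \binom{n}{k}^2 \le n^{2k}$, the probability that there exists $(S,T) \in \mathrm{Disj}^n_k$ such that none of $\ell$ random sets separates $(S,T)$ is less than $1$. Equivalently, the probability that for all $(S,T) \in \mathrm{Disj}^n_k$ at least one of $\ell$ random sets separates $(S,T)$ is positive. It follows that there are $Z_1, \ldots, Z_\ell \subseteq [n]$ such that each $(S,T) \in \mathrm{Disj}^n_k$ is separated by some $Z_i$. ∎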
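The existence argument above can be illustrated by actually sampling a separating family for small parameters. The sketch below is not the counting argument itself: it draws uniformly random subsets of $[n]$ (with a fixed, arbitrarily chosen seed for reproducibility) and keeps those that separate at least one pair not yet separated, until every disjoint pair is separated. The function names are our own.

```python
import random
from itertools import combinations


def disjoint_pairs(n, k):
    """All pairs (S, T) of disjoint k-element subsets of {1, ..., n}."""
    ksets = [frozenset(c) for c in combinations(range(1, n + 1), k)]
    return [(S, T) for S in ksets for T in ksets if not (S & T)]


def separates(Z, S, T):
    """Z separates (S, T) if S is contained in Z and T is disjoint from Z."""
    return S <= Z and not (T & Z)


def random_separating_family(n, k, seed=0):
    """Sample random sets until every disjoint pair is separated by a kept set."""
    rng = random.Random(seed)
    remaining = disjoint_pairs(n, k)
    family = []
    while remaining:
        # each element of {1, ..., n} is included independently with probability 1/2
        Z = frozenset(i for i in range(1, n + 1) if rng.random() < 0.5)
        still = [(S, T) for (S, T) in remaining if not separates(Z, S, T)]
        if len(still) < len(remaining):  # keep Z only if it made progress
            family.append(Z)
        remaining = still
    return family
```

Each sampled set separates a fixed pair with probability $2^{-2k}$, so the loop terminates with probability $1$; the size of the returned family depends on the seed and is not a tight bound.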
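The proof above is based on known arguments from communication complexity. Indeed, they show, for $k = \lceil \log_2 n \rceil$ and the function $D$ from above, that $\mathsf{N}_1(D) = O(\log n)$ and $\mathsf{N}_0(D) = O(\log n)$ and $\mathsf{U}_1(D) = \Omega(\log^2 n)$. This gap is in a sense the largest possible, as $\mathsf{U}_1(F) = O(\mathsf{N}_0(F) \cdot \mathsf{N}_1(F))$ holds for all two-party functions $F$. We even have $\mathsf{D}^{\mathrm{cc}}(F) = O(\mathsf{N}_0(F) \cdot \mathsf{N}_1(F))$, where $\mathsf{D}^{\mathrm{cc}}(F)$ is the deterministic communication complexity [10, Theorem 2.11].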
In the main results, Theorems 3.1 and 2.4, we have obtained super-polynomial but quasi-polynomial lower bounds on UFA complementation and separation. These bounds are not known to be tight; indeed, in both cases the best known upper bound is exponential. At the same time, we have transferred techniques from communication complexity relatively directly. More concretely, both main theorems hinge on a finite language $\{xy \mid F(x,y) = 1\}$ where $F$ is a two-party function whose communication complexity is in a sense extreme. This suggests two kinds of opportunities for future work:
Can other techniques from communication complexity improve the lower bounds further? Perhaps by somehow iterating a two-party function or via multi-party communication complexity?
Can techniques for proving upper bounds on communication complexity be adapted to prove upper bounds on the size of automata?
[1] K. Balodis, S. Ben-David, M. Göös, S. Jain, and R. Kothari. Unambiguous DNFs and Alon-Saks-Seymour. In 62nd IEEE Annual Symposium on Foundations of Computer Science (FOCS), 2021. To appear. Available at https://arxiv.org/abs/2102.08348.
[2] M. Bojańczyk and W. Czerwiński. An Automata Toolbox. 2018. Available at https://www.mimuw.edu.pl/~bojan/paper/automata-toolbox-book.
[3] T. Colcombet. Unambiguity in automata theory. In 17th International Workshop on Descriptional Complexity of Formal Systems (DCFS), volume 9118 of Lecture Notes in Computer Science, pages 3–18. Springer, 2015.
[4] W. Czerwiński, L. Daviaud, N. Fijalkow, M. Jurdziński, R. Lazić, and P. Parys. Universal trees grow inside separating automata: Quasi-polynomial lower bounds for parity games. In Proceedings of the 2019 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 2333–2349, 2019.
[5] W. Czerwiński and S. Lasota. Regular separability of one counter automata. Logical Methods in Computer Science, 15(2), 2019.
[6] M. Göös. Lower bounds for clique vs. independent set. In IEEE 56th Annual Symposium on Foundations of Computer Science (FOCS), pages 1066–1076. IEEE Computer Society, 2015.
[7] M. Holzer and M. Kutrib. Nondeterministic descriptional complexity of regular languages. International Journal of Foundations of Computer Science, 14(6):1087–1102, 2003.
[8] E. Indzhev and S. Kiefer. On complementing unambiguous automata and graphs with many cliques and cocliques. Technical report, arxiv.org, 2021. Available at https://arxiv.org/abs/2105.07470.
[9] J. Jirásek Jr., G. Jirásková, and J. Šebej. Operations on unambiguous finite automata. International Journal of Foundations of Computer Science, 29(5):861–876, 2018.
[10] E. Kushilevitz and N. Nisan. Communication Complexity. Cambridge University Press, 1997.
[11] M. Raskin. A superpolynomial lower bound for the size of non-deterministic complement of an unambiguous automaton. In 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018), volume 107 of Leibniz International Proceedings in Informatics (LIPIcs), pages 138:1–138:11, 2018.
[12] A.A. Razborov. Applications of matrix methods to the theory of lower bounds in computational complexity. Combinatorica, 10:81–93, 1990.