Lower Bounds on Unambiguous Automata Complementation and Separation via Communication Complexity

09/19/2021 ∙ by Mika Göös, et al.

We use results from communication complexity, both new and old ones, to prove lower bounds for problems on unambiguous finite automata (UFAs). We show: (1) Complementing UFAs with n states requires in general at least n^Ω̃(log n) states, improving on a bound by Raskin. (2) There are languages L_n such that both L_n and its complement are recognized by NFAs with n states but any UFA that recognizes L_n requires n^Ω(log n) states, refuting a conjecture by Colcombet on separation.




1 Preliminaries

Finite Automata

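An NFA is a quintuple $\mathcal{A} = (Q, \Sigma, \delta, I, F)$, where $Q$ is the finite set of states, $\Sigma$ is the finite alphabet, $\delta \subseteq Q \times \Sigma \times Q$ is the transition relation, $I \subseteq Q$ is the set of initial states, and $F \subseteq Q$ is the set of accepting states. We write $q \xrightarrow{a} r$ to denote that $(q, a, r) \in \delta$. A finite sequence $q_0 \xrightarrow{a_1} q_1 \xrightarrow{a_2} \cdots \xrightarrow{a_n} q_n$ is called a run; it can be summarized as $q_0 \xrightarrow{a_1 \cdots a_n} q_n$. The NFA $\mathcal{A}$ recognizes the language $L(\mathcal{A}) := \{w \in \Sigma^* \mid \exists\, q_0 \in I,\ q_f \in F : q_0 \xrightarrow{w} q_f\}$. The NFA $\mathcal{A}$ is a DFA if $|I| = 1$ and for every $q \in Q$ and $a \in \Sigma$ there is exactly one $r \in Q$ with $q \xrightarrow{a} r$. The NFA $\mathcal{A}$ is a UFA if for every word $w \in \Sigma^*$ there is at most one accepting run for $w$, i.e., a run $q_0 \xrightarrow{w} q_f$ with $q_0 \in I$ and $q_f \in F$. Clearly, any DFA is a UFA.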
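As a minimal illustration of these definitions, the following Python sketch represents an NFA as a quintuple and counts accepting runs, so that unambiguity can be checked by brute force on all words up to a given length. It is not taken from the paper: the class name `NFA`, its method names, and the two toy automata are assumptions made here for illustration.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class NFA:
    """Hypothetical container for a quintuple (Q, Sigma, delta, I, F)."""
    states: frozenset
    alphabet: frozenset
    delta: frozenset      # set of triples (q, a, r)
    initial: frozenset
    accepting: frozenset

    def count_accepting_runs(self, word):
        # runs[q] = number of runs from an initial state to q on the prefix read so far
        runs = {q: 1 for q in self.initial}
        for a in word:
            nxt = {}
            for (q, b, r) in self.delta:
                if b == a and q in runs:
                    nxt[r] = nxt.get(r, 0) + runs[q]
            runs = nxt
        return sum(c for q, c in runs.items() if q in self.accepting)

    def accepts(self, word):
        return self.count_accepting_runs(word) > 0

    def is_unambiguous_up_to(self, max_len):
        # Brute force: only certifies unambiguity for words of length <= max_len.
        for n in range(max_len + 1):
            for word in product(sorted(self.alphabet), repeat=n):
                if self.count_accepting_runs(word) > 1:
                    return False
        return True


# Toy UFA: words over {0,1} whose second-to-last letter is 1.
second_last = NFA(
    states=frozenset({0, 1, 2}),
    alphabet=frozenset("01"),
    delta=frozenset({(0, "0", 0), (0, "1", 0), (0, "1", 1),
                     (1, "0", 2), (1, "1", 2)}),
    initial=frozenset({0}),
    accepting=frozenset({2}),
)

# Toy ambiguous NFA: words over {0,1} that contain the letter 1.
contains_one = NFA(
    states=frozenset({0, 1}),
    alphabet=frozenset("01"),
    delta=frozenset({(0, "0", 0), (0, "1", 0), (0, "1", 1),
                     (1, "0", 1), (1, "1", 1)}),
    initial=frozenset({0}),
    accepting=frozenset({1}),
)
```

The second automaton is an NFA but not a UFA, because a word with two occurrences of 1 has one accepting run per occurrence.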

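Notation $\tilde{O}$ and $\tilde{\Omega}$

We use the notation $\tilde{O}(n)$ and $\tilde{\Omega}(n)$ to hide polylogarithmic factors; i.e., $\tilde{O}(n) = n \log^{O(1)} n$ and $\tilde{\Omega}(n) = n / \log^{O(1)} n$.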

2 UFA Complementation

Given two finite automata $\mathcal{A}_1, \mathcal{A}_2$, recognizing languages $L_1, L_2 \subseteq \Sigma^*$, respectively, the state complexity of union (or intersection, or complement, etc.) is how many states may be needed for an automaton that recognizes $L_1 \cup L_2$ (or $L_1 \cap L_2$, or $\Sigma^* \setminus L_1$, etc.). It depends on the type of automaton considered, such as NFAs, DFAs, or UFAs.

The state complexity has been well studied for various types of automata and language operations; see, e.g., [9] and the references therein for some known results. For example, it was shown in [7] that complementing an NFA with $n$ states may require $\Theta(2^n)$ states. However, the state complexity for UFAs is not yet fully understood. It was shown only in 2018 by Raskin [11] that the state complexity for UFAs and complement is not polynomial:

Proposition 2.1 ([11]).

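For any $n \in \mathbb{N}$ there exists a UFA $\mathcal{A}$ with $n$ states and unary alphabet $\Sigma$ (i.e., $|\Sigma| = 1$) such that any NFA that recognizes $\Sigma^* \setminus L(\mathcal{A})$ has at least $n^{(\log \log \log n)^{\Theta(1)}}$ states.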

This super-polynomial blowup (even for unary alphabet and even if the output automaton is allowed to be ambiguous) refuted a conjecture that it may be possible to complement UFAs with a polynomial blowup [3]. A non-trivial upper bound (for general alphabets and outputting a UFA) was shown by Jirásek et al. [9]:

Proposition 2.2 ([9]).

Let $\mathcal{A}$ be a UFA with $n \ge 7$ states that recognizes a language $L \subseteq \Sigma^*$. Then there exists a UFA with at most $n \cdot 2^{0.79 n}$ states that recognizes the language $\Sigma^* \setminus L$.

An almost tight analysis [8] of Jirásek et al.’s construction yields a slight improvement:

Proposition 2.3 ([8]).

Let $\mathcal{A}$ be a UFA with $n \ge 0$ states that recognizes a language $L \subseteq \Sigma^*$. Then there exists a UFA with at most $\sqrt{n+1} \cdot 2^{n/2}$ states that recognizes the language $\Sigma^* \setminus L$.

In this section we improve the lower bound from Proposition 2.1:

Theorem 2.4.

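For infinitely many $n \in \mathbb{N}$ there is a UFA $\mathcal{A}$ with $n$ states and alphabet $\Sigma = \{0,1\}$ and finite $L(\mathcal{A}) \subseteq \Sigma^*$ such that any NFA that recognizes $\Sigma^* \setminus L(\mathcal{A})$ has at least $n^{\tilde{\Omega}(\log n)}$ states.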

Like Proposition 2.1, the lower bound holds even for NFAs (not just UFAs) that recognize the complement language. Unlike Proposition 2.1, the lower bound in Theorem 2.4 uses a binary alphabet, i.e., $|\Sigma| = 2$.

In the rest of the section we prove Theorem 2.4. The proof uses concepts and results from communication complexity, in particular a recent result from [1].

2.1 Communication Complexity

Let $D = C_1 \vee \cdots \vee C_m$ be an $n$-variate boolean formula in disjunctive normal form (DNF). DNF $D$ has width $k$ if every $C_i$ is a conjunction of at most $k$ literals. We call such $D$ a $k$-DNF. For conjunctive normal form (CNF) formulas the width and $k$-CNFs are defined analogously. DNF $D$ is said to be unambiguous if for every input $x \in \{0,1\}^n$ at most one of the conjunctions $C_i$ evaluates to true, $C_i(x) = 1$. For any boolean function $f : \{0,1\}^n \to \{0,1\}$ define

  • $C_1(f)$ as the least $k$ such that $f$ can be written as a $k$-DNF;

  • $C_0(f)$ as the least $k$ such that $f$ can be written as a $k$-CNF;

  • $UC_1(f)$ as the least $k$ such that $f$ can be written as an unambiguous $k$-DNF.

Note that $C_0(f) = C_1(\neg f)$. The following is a recent result [1]:

Theorem 2.5 ([1, Theorem 1]).

For infinitely many $n$ there exists a boolean function $f : \{0,1\}^n \to \{0,1\}$ with $UC_1(f) = n^{\Omega(1)}$ and $C_0(f) = \tilde{\Omega}(UC_1(f)^2)$.

In words, for infinitely many $k$ there is an unambiguous $k$-DNF such that any equivalent CNF requires width $\tilde{\Omega}(k^2)$. The bound is almost tight, as every unambiguous $k$-DNF has an equivalent $k^2$-CNF; see [6, Section 3].
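The notions of width and unambiguity can be checked by brute force on small formulas. The sketch below is only an illustration under assumptions made here: a conjunction is encoded as a dictionary from variable index to required bit, a DNF as a list of such dictionaries, and all function names are hypothetical.

```python
from itertools import product


def satisfies(conj, x):
    """True if assignment x (a tuple of bits) satisfies every literal of conj."""
    return all(x[i] == v for i, v in conj.items())


def width(dnf):
    """Largest number of literals in any conjunction of the DNF."""
    return max((len(c) for c in dnf), default=0)


def evaluate(dnf, x):
    return int(any(satisfies(c, x) for c in dnf))


def is_unambiguous(dnf, n):
    """At most one conjunction is satisfied on each of the 2^n inputs."""
    return all(sum(satisfies(c, x) for c in dnf) <= 1
               for x in product((0, 1), repeat=n))


# XOR of two variables: an unambiguous 2-DNF.
xor_dnf = [{0: 1, 1: 0}, {0: 0, 1: 1}]
# OR of two variables as a 1-DNF: ambiguous, since input (1, 1) satisfies both terms.
or_dnf = [{0: 1}, {1: 1}]
# The same function as an unambiguous DNF, at the cost of width 2.
or_unambiguous = [{0: 1}, {0: 0, 1: 1}]
```

The last two formulas compute the same function, which shows on a tiny scale that requiring unambiguity can increase the width.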

We need results on two-party communication complexity; see [10] for the standard textbook. Consider a “two-party” function $F : X \times Y \to \{0,1\}$. A set $A \times B$ (with $A \subseteq X$ and $B \subseteq Y$) is called a rectangle. Rectangles $R_1, \ldots, R_k$ cover a set $S \subseteq X \times Y$ if $\bigcup_{i} R_i = S$. For $b \in \{0,1\}$, the cover number $\mathrm{Cov}_b(F)$ is the least number of rectangles that cover $F^{-1}(b)$. The nondeterministic (resp., co-nondeterministic) communication complexity of $F$ is defined as $N_1(F) := \log_2 \mathrm{Cov}_1(F)$ (resp., $N_0(F) := \log_2 \mathrm{Cov}_0(F)$). Note that $N_0(F) = N_1(\neg F)$. The nondeterministic communication complexity can be interpreted as the number of bits that two parties, holding inputs $x \in X$ and $y \in Y$, respectively, need to communicate in a nondeterministic (i.e., based on guessing and checking) protocol in order to establish that $F(x,y) = 1$; see [10, Chapter 2] for details.

The following is a “lifting” theorem, which allows us to transfer lower bounds on the DNF width of a boolean function to the nondeterministic communication complexity of a two-party function.

Theorem 2.6 ([6, Theorem 4]).

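For any $n \in \mathbb{N}$ there is a function $g : \{0,1\}^b \times \{0,1\}^b \to \{0,1\}$ with $b = \Theta(\log n)$ such that for any function $f : \{0,1\}^n \to \{0,1\}$ the function $F : \{0,1\}^{bn} \times \{0,1\}^{bn} \to \{0,1\}$ defined by

$$F(x, y) := f\bigl(g(x_1, y_1), \ldots, g(x_n, y_n)\bigr), \qquad x = (x_1, \ldots, x_n),\ y = (y_1, \ldots, y_n),\ x_i, y_i \in \{0,1\}^b,$$

satisfies $N_1(F) = \Omega(C_1(f) \cdot b)$ (and thus also $N_0(F) = \Omega(C_0(f) \cdot b)$).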

Finally, we need the following simple lemma:

Lemma 2.7.

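If a two-party function $F : \Sigma^n \times \Sigma^n \to \{0,1\}$ admits an NFA with $s$ states, i.e., there is an NFA $\mathcal{A}$ with $s$ states and $L(\mathcal{A}) = \{xy \mid F(x,y) = 1\}$, then $\mathrm{Cov}_1(F) \le s$ and hence $N_1(F) \le \log_2 s$.

Proof. Let $\mathcal{A}$ be an NFA with state set $Q$, $|Q| = s$, and $L(\mathcal{A}) = \{xy \mid F(x,y) = 1\}$. We show that $F^{-1}(1)$ is covered by at most $s$ rectangles. Indeed, $F^{-1}(1)$ equals

$$\bigcup_{q \in Q} \bigl\{x \in \Sigma^n \mid q_0 \xrightarrow{x} q \text{ for some initial state } q_0\bigr\} \times \bigl\{y \in \Sigma^n \mid q \xrightarrow{y} q_f \text{ for some accepting state } q_f\bigr\}.$$

(Alternatively, in terms of a nondeterministic protocol, the first party, holding $x$, produces a run for $x$ from an initial state to a state $q$ and then sends the name of $q$, which takes $\log_2 s$ bits, to the other party. The other party then produces a run for $y$ from $q$ to an accepting state.) ∎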
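The rectangle cover in this proof can be computed explicitly for small automata. The following sketch is an illustration only: the helper names and the toy NFA (which recognizes pairs of bit strings sharing a 1 in a common position) are assumptions made here and do not appear in the paper.

```python
from itertools import product


def reach(delta, sources, word):
    """States reachable from `sources` by reading `word`."""
    current = set(sources)
    for a in word:
        current = {r for (q, b, r) in delta if q in current and b == a}
    return current


def rectangles_from_nfa(states, delta, initial, accepting, m, alphabet="01"):
    """One rectangle A_q x B_q per state q (empty rectangles are dropped)."""
    words = ["".join(w) for w in product(alphabet, repeat=m)]
    rects = {}
    for q in states:
        a_q = {x for x in words if q in reach(delta, initial, x)}
        b_q = {y for y in words if reach(delta, {q}, y) & accepting}
        if a_q and b_q:
            rects[q] = (a_q, b_q)
    return rects


def intersect_nfa(m):
    """Hypothetical toy NFA for {xy : |x| = |y| = m, x_i = y_i = 1 for some i}."""
    states = {(i, p) for i in range(m) for p in range(2 * m + 1)}
    delta = set()
    for (i, p) in states:
        if p < 2 * m:
            for a in "01":
                # Chain i insists on the letter 1 at positions i and m + i.
                if p % m != i or a == "1":
                    delta.add(((i, p), a, (i, p + 1)))
    initial = {(i, 0) for i in range(m)}
    accepting = {(i, 2 * m) for i in range(m)}
    return states, delta, initial, accepting


m = 2
states, delta, initial, accepting = intersect_nfa(m)
rects = rectangles_from_nfa(states, delta, initial, accepting, m)
covered = {(x, y) for a_q, b_q in rects.values() for x in a_q for y in b_q}
words = ["".join(w) for w in product("01", repeat=m)]
ones = {(x, y) for x in words for y in words
        if any(x[i] == y[i] == "1" for i in range(m))}
```

Here `covered` coincides with `ones`, and the number of nonempty rectangles is at most the number of states. The two rectangles overlap on the pair ("11", "11"), which reflects that this toy NFA is ambiguous.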

2.2 Proof of Theorem 2.4

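For $n \in \mathbb{N}$, let $f : \{0,1\}^n \to \{0,1\}$ be the function from Theorem 2.5, i.e., $f$ has an unambiguous $k$-DNF with $k = n^{\Omega(1)}$ (hence, $\log n = O(\log k)$) and $C_0(f) = \tilde{\Omega}(k^2)$. Let $g : \{0,1\}^b \times \{0,1\}^b \to \{0,1\}$ with $b = \Theta(\log n)$ and $F : \{0,1\}^{bn} \times \{0,1\}^{bn} \to \{0,1\}$ be the two-party functions from Theorem 2.6, with $N_0(F) = \Omega(C_0(f) \cdot b)$. The UFA $\mathcal{A}$ from the statement of Theorem 2.4 will recognize $\{xy \mid F(x,y) = 1\}$.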

First we argue that $F$ has an unambiguous DNF of small width. Indeed, $g$ and $\neg g$ have unambiguous $2b$-DNFs, which can be extracted from the deterministic decision tree of $g$. By plugging these unambiguous $2b$-DNFs for $g$ and $\neg g$ into the unambiguous $k$-DNF for $f$ (and “multiplying out”), one obtains an unambiguous $2bk$-DNF, say $D$, for $F$.

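Over the $2bn$ variables of $F$, there exist at most $(4bn+1)^{2bk} = 2^{O(bk \log(bn))} = 2^{\tilde{O}(k)}$ different conjunctions of at most $2bk$ literals, using $b = \Theta(\log n)$ and $\log n = O(\log k)$. So $D$ consists of at most $2^{\tilde{O}(k)}$ conjunctions. From $D$ we obtain a UFA $\mathcal{A}$ that recognizes $\{xy \mid F(x,y) = 1\}$, as follows. Each initial state of $\mathcal{A}$ corresponds to a conjunction in $D$. When reading the input $xy \in \{0,1\}^{2bn}$, the UFA checks that the corresponding assignment to the variables satisfies the conjunction represented by the initial state. This check requires $O(bn)$ states for each initial state. Since $D$ is unambiguous and each check is deterministic, every word has at most one accepting run. Thus, $\mathcal{A}$ has at most $N := 2^{\tilde{O}(k)}$ states in total.

On the other hand, by Theorem 2.6, we have $N_0(F) = \Omega(C_0(f) \cdot b) = \tilde{\Omega}(k^2)$. So by Lemma 2.7, applied to $\neg F$, any NFA that recognizes $\{xy \mid F(x,y) = 0\}$ has at least $2^{\tilde{\Omega}(k^2)}$ states. Any NFA that recognizes $\Sigma^* \setminus L(\mathcal{A})$ can be transformed into an NFA that recognizes $\{xy \mid F(x,y) = 0\}$ by taking a product with a DFA that has $O(bn)$ states and accepts exactly the words of length $2bn$. It follows that any NFA that recognizes $\Sigma^* \setminus L(\mathcal{A})$ has at least $2^{\tilde{\Omega}(k^2)} / O(bn) = 2^{\tilde{\Omega}(k^2)}$ states. Since $\log N = \tilde{O}(k)$, this is $N^{\tilde{\Omega}(\log N)}$, where $N$ is the number of states of $\mathcal{A}$. ∎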
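The step from an unambiguous DNF to a UFA in the proof above can be sketched on a toy scale. The code below is not the paper's construction verbatim: it assumes a DNF over $n$ variables encoded as a list of dictionaries (variable index to required bit), uses one chain of $n+1$ states per conjunction, and all names are hypothetical.

```python
from itertools import product


def dnf_to_nfa(dnf, n):
    """One chain of n + 1 states per conjunction; chain c checks conjunction c."""
    delta = set()
    for c, conj in enumerate(dnf):
        for p in range(n):
            for a in (0, 1):
                # Letter a is allowed at position p unless conj fixes variable p to 1 - a.
                if conj.get(p, a) == a:
                    delta.add(((c, p), a, (c, p + 1)))
    initial = {(c, 0) for c in range(len(dnf))}
    accepting = {(c, n) for c in range(len(dnf))}
    return delta, initial, accepting


def count_accepting_runs(delta, initial, accepting, word):
    # runs[q] = number of runs from an initial state to q on the prefix read so far
    runs = {q: 1 for q in initial}
    for a in word:
        nxt = {}
        for (q, b, r) in delta:
            if b == a and q in runs:
                nxt[r] = nxt.get(r, 0) + runs[q]
        runs = nxt
    return sum(k for q, k in runs.items() if q in accepting)


# Unambiguous DNF for XOR: the resulting automaton is a UFA.
xor_dnf = [{0: 1, 1: 0}, {0: 0, 1: 1}]
xor_nfa = dnf_to_nfa(xor_dnf, 2)

# Ambiguous DNF for OR: the word (1, 1) gets one accepting run per satisfied term.
or_dnf = [{0: 1}, {1: 1}]
or_nfa = dnf_to_nfa(or_dnf, 2)
```

The number of accepting runs on a word equals the number of conjunctions it satisfies, so the automaton is unambiguous exactly when the DNF is.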

3 Separation of Regular Languages by UFAs

In [3, Conjecture 2], Colcombet conjectured that for any NFAs $\mathcal{A}_1, \mathcal{A}_2$ with $L(\mathcal{A}_1) \cap L(\mathcal{A}_2) = \emptyset$ there is a polynomial-sized UFA $\mathcal{A}$ with $L(\mathcal{A}_1) \subseteq L(\mathcal{A})$ and $L(\mathcal{A}) \cap L(\mathcal{A}_2) = \emptyset$. Related separability questions are classical in formal language theory and have attracted renewed attention; see, e.g., [5] and the references therein. Separating automata have also been used recently to elegantly describe quasi-polynomial time algorithms for solving parity games in an automata theoretic framework; see [2, Chapter 3] and [4].

In this section we refute the above-mentioned conjecture by Colcombet, even when $L(\mathcal{A}_1) = \Sigma^* \setminus L(\mathcal{A}_2)$:

Theorem 3.1.

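For any $n \in \mathbb{N}$ there are NFAs $\mathcal{A}_1, \mathcal{A}_2$ with $n$ states and alphabet $\Sigma = \{0,1\}$ and finite $L(\mathcal{A}_1)$ and $L(\mathcal{A}_2) = \Sigma^* \setminus L(\mathcal{A}_1)$ such that any UFA that recognizes $L(\mathcal{A}_1)$ has at least $n^{\Omega(\log n)}$ states.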

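Loosely speaking, in our construction, NFAs $\mathcal{A}_1, \mathcal{A}_2$ recognize (sparse) set disjointness and its complement. For $n \in \mathbb{N}$ write $[n] := \{1, \ldots, n\}$ and define for $k \le n$

$$\mathrm{DISJ}^n_k := \{(S, T) \mid S, T \subseteq [n],\ |S| = |T| = k,\ S \cap T = \emptyset\}.$$

Define also $L^n_k := \{\langle S \rangle \langle T \rangle \mid (S,T) \in \mathrm{DISJ}^n_k\} \subseteq \{0,1\}^{2n}$ where $\langle S \rangle \in \{0,1\}^n$ is such that the $i$th letter of $\langle S \rangle$ is $1$ if and only if $i \in S$, and similarly for $\langle T \rangle$. Note that $\langle S \rangle, \langle T \rangle$ each contain $k$ times the letter $1$. To prove Theorem 3.1 it suffices to prove the following lemma.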

Lemma 3.2.

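For any $n \in \mathbb{N}$ let $k := \lceil \log_2 n \rceil$. There are NFAs $\mathcal{A}_1, \mathcal{A}_2$ with $n^{O(1)}$ states and alphabet $\{0,1\}$ and $L(\mathcal{A}_1) = L^n_k$ and $L(\mathcal{A}_2) = \{0,1\}^* \setminus L^n_k$. Any UFA that recognizes $L^n_k$ has at least $n^{\Omega(\log n)}$ states.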

In the rest of the section we prove Lemma 3.2. We use known results from communication complexity to show that any UFA for $L^n_k$ needs super-polynomially many states. We will give a self-contained proof of the existence of polynomial-sized NFAs for $L^n_k$ and its complement, but the main argument also comes from communication complexity, as we remark below at the end of the section.

3.1 Communication Complexity

Recall from Section 2.1 the notions of rectangles and rectangles covering a set. For a two-party function $F : X \times Y \to \{0,1\}$, the partition number $\mathrm{Par}_1(F)$ is the least number of pairwise disjoint rectangles that cover $F^{-1}(1)$. Note that $\mathrm{Cov}_1(F) \le \mathrm{Par}_1(F)$. The unambiguous communication complexity of $F$ is defined as $U_1(F) := \log_2 \mathrm{Par}_1(F)$. Note that $N_1(F) \le U_1(F)$. Denote by $M_F \in \{0,1\}^{X \times Y}$ the communication matrix, with entries $M_F(x,y) = F(x,y)$. Denote by $\mathrm{rk}(M)$ the rank over the reals of a matrix $M$. The following lemma, the “rank bound”, is often used for lower bounds on the deterministic communication complexity (a concept we do not need here), but it holds even for unambiguous communication complexity:

Lemma 3.3.

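Let $F : X \times Y \to \{0,1\}$. Then $\mathrm{Par}_1(F) \ge \mathrm{rk}(M_F)$.

Proof. For $r := \mathrm{Par}_1(F)$, let $R_1 = A_1 \times B_1, \ldots, R_r = A_r \times B_r$ be pairwise disjoint rectangles that cover $F^{-1}(1)$. Each $R_i$ defines a rank-$1$ matrix $M_i \in \{0,1\}^{X \times Y}$ with $M_i(x,y) = 1$ if and only if $x \in A_i$ and $y \in B_i$. It follows from the pairwise disjointness that $M_F = \sum_{i=1}^{r} M_i$. Hence $\mathrm{rk}(M_F) \le \sum_{i=1}^{r} \mathrm{rk}(M_i) = r$. ∎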
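To see the rank bound at work on a concrete matrix, the sketch below computes the exact rank of the communication matrix of set disjointness on $k$-element subsets of an $n$-element set, the function used in Section 3.2. It is an illustration with assumed names: it uses pure-Python Gaussian elimination over the rationals (which gives the same rank as over the reals for a 0/1 matrix) and indexes the ground set from 0.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def rank(matrix):
    """Rank over the rationals via Gaussian elimination with exact arithmetic."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    rk = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(len(rows)):
            if r != rk and rows[r][col] != 0:
                factor = rows[r][col] / rows[rk][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[rk])]
        rk += 1
    return rk


def disjointness_matrix(n, k):
    """Entry (S, T) is 1 iff the k-subsets S and T of range(n) are disjoint."""
    subsets = [frozenset(c) for c in combinations(range(n), k)]
    return [[int(not (s & t)) for t in subsets] for s in subsets]


rank_5_2 = rank(disjointness_matrix(5, 2))
rank_6_2 = rank(disjointness_matrix(6, 2))
```

For these parameters the rank equals the number of $k$-subsets, `comb(n, k)`, so by Lemma 3.3 any partition of the 1-entries into rectangles needs at least that many rectangles.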

The following lemma and its proof are analogous to Lemma 2.7.

Lemma 3.4.

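If a two-party function $F : \Sigma^n \times \Sigma^n \to \{0,1\}$ admits a UFA with $s$ states, i.e., there is a UFA $\mathcal{A}$ with $s$ states and $L(\mathcal{A}) = \{xy \mid F(x,y) = 1\}$, then $\mathrm{Par}_1(F) \le s$.

Proof. Let $\mathcal{A}$ be a UFA with state set $Q$, $|Q| = s$, and $L(\mathcal{A}) = \{xy \mid F(x,y) = 1\}$. We show that $F^{-1}(1)$ is covered by at most $s$ pairwise disjoint rectangles. Indeed, $F^{-1}(1)$ equals

$$\bigcup_{q \in Q} \bigl\{x \in \Sigma^n \mid q_0 \xrightarrow{x} q \text{ for some initial state } q_0\bigr\} \times \bigl\{y \in \Sigma^n \mid q \xrightarrow{y} q_f \text{ for some accepting state } q_f\bigr\}$$

and the rectangles do not overlap, as $\mathcal{A}$ is unambiguous. ∎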

3.2 Proof of Lemma 3.2

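First we prove the statement on UFAs. Write $\binom{[n]}{k} := \{S \subseteq [n] \mid |S| = k\}$. Let $F : \binom{[n]}{k} \times \binom{[n]}{k} \to \{0,1\}$ be the two-party function with $F(S,T) = 1$ if and only if $S \cap T = \emptyset$. It is shown, e.g., in [10, Example 2.12] that for $n \ge 2k$ the communication matrix $M_F$ has full rank $\binom{n}{k}$. Let $F' : \{0,1\}^n \times \{0,1\}^n \to \{0,1\}$ be such that $F'(x,y) = 1$ if and only if $xy \in L^n_k$. Then $M_F$ is a principal submatrix of $M_{F'}$, so $\mathrm{rk}(M_{F'}) \ge \mathrm{rk}(M_F) = \binom{n}{k}$. Using Lemmas 3.4 and 3.3 it follows that any UFA, say $\mathcal{A}$, that recognizes $L^n_k$ has at least $\binom{n}{k}$ states. With $k = \lceil \log_2 n \rceil$, it follows that $\mathcal{A}$ has at least $\binom{n}{k} \ge (n/k)^k = n^{\Omega(\log n)}$ states.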

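It is easy to see that there is an NFA, $\mathcal{A}_2$, with $n^{O(1)}$ states and $L(\mathcal{A}_2) = \{0,1\}^* \setminus L^n_k$. Indeed, we can assume that the input is of the form $\langle S \rangle \langle T \rangle$ with $|S| = |T| = k$; otherwise $\mathcal{A}_2$ accepts. NFA $\mathcal{A}_2$ guesses $i \in [n]$ such that $i \in S \cap T$ and then checks it.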

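Finally, we show that there is an NFA, $\mathcal{A}_1$, with $n^{O(1)}$ states and $L(\mathcal{A}_1) = L^n_k$. We can assume that the input is of the form $\langle S \rangle \langle T \rangle$ with $|S| = |T| = k$; otherwise $\mathcal{A}_1$ rejects. NFA $\mathcal{A}_1$ “hard-codes” polynomially many sets $Z_1, \ldots, Z_\ell \subseteq [n]$. It guesses $j \in [\ell]$ such that $S \subseteq Z_j$ and $T \cap Z_j = \emptyset$ and then checks it. It remains to show that there exist $\ell = n^{O(1)}$ sets $Z_1, \ldots, Z_\ell \subseteq [n]$ such that for any $(S,T) \in \mathrm{DISJ}^n_k$ there is $j \in [\ell]$ with $S \subseteq Z_j$ and $T \cap Z_j = \emptyset$. The argument uses the probabilistic method and is due to [12]; see also [10, Example 2.12]. We reproduce it here due to its elegance and brevity.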

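Fix $(S,T) \in \mathrm{DISJ}^n_k$ and assume $n \ge 2$. Say that a set $Z \subseteq [n]$ separates $(S,T)$ if $S \subseteq Z$ and $T \cap Z = \emptyset$. A random set $Z \subseteq [n]$ (each $i \in [n]$ is in $Z$ independently with probability $1/2$) separates $(S,T)$ with probability $2^{-2k}$. Thus, choosing $\ell := \lceil 2^{2k} \cdot 2k \ln n \rceil$ random sets $Z_1, \ldots, Z_\ell$ independently, the probability that none of them separates $(S,T)$ is

$$(1 - 2^{-2k})^{\ell} \le e^{-\ell \cdot 2^{-2k}} \le n^{-2k}.$$

By the union bound, since $|\mathrm{DISJ}^n_k| < n^{2k}$, the probability that there exists $(S,T) \in \mathrm{DISJ}^n_k$ such that none of $\ell$ random sets separates $(S,T)$ is less than $1$. Equivalently, the probability that for all $(S,T) \in \mathrm{DISJ}^n_k$ at least one of $\ell$ random sets separates $(S,T)$ is positive. It follows that there are $Z_1, \ldots, Z_\ell \subseteq [n]$ such that each $(S,T) \in \mathrm{DISJ}^n_k$ is separated by some $Z_j$. Since $k = \lceil \log_2 n \rceil$ gives $2^{2k} < 4n^2$, we have $\ell = n^{O(1)}$. ∎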
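The existence argument above can be tried out on small parameters. The following sketch is a greedy variant rather than the exact experiment from the proof: it keeps drawing uniformly random subsets and retains only those that separate some not yet separated pair, until every disjoint pair is separated. The function names, the seed, and the parameters $n = 6$, $k = 2$ are assumptions made here, and the ground set is indexed from 0.

```python
import random
from itertools import combinations


def disjoint_pairs(n, k):
    """All pairs (S, T) of disjoint k-subsets of range(n)."""
    subsets = [frozenset(c) for c in combinations(range(n), k)]
    return [(s, t) for s in subsets for t in subsets if not (s & t)]


def separates(z, s, t):
    """Z separates (S, T) if S is contained in Z and T is disjoint from Z."""
    return s <= z and not (t & z)


def random_separating_family(n, k, rng):
    """Draw random subsets of range(n) until every disjoint pair is separated."""
    remaining = disjoint_pairs(n, k)
    family = []
    while remaining:
        # Each element joins z independently with probability 1/2.
        z = frozenset(i for i in range(n) if rng.random() < 0.5)
        still_open = [(s, t) for (s, t) in remaining if not separates(z, s, t)]
        if len(still_open) < len(remaining):
            family.append(z)
            remaining = still_open
    return family


n, k = 6, 2
family = random_separating_family(n, k, random.Random(0))
all_separated = all(any(separates(z, s, t) for z in family)
                    for (s, t) in disjoint_pairs(n, k))
```

An NFA in the style of the proof would hard-code the sets in `family`, guess one of them, and check that it separates the input pair.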

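The proof above is based on known arguments from communication complexity. Indeed, they show, for $k = \lceil \log_2 n \rceil$ and the function $F$ from above, that $N_1(F) = O(\log n)$ and $N_0(F) = O(\log n)$ and $U_1(F) = \Omega(\log^2 n)$. This gap is in a sense the largest possible, as $U_1(F) = O(N_0(F) \cdot N_1(F))$ holds for all two-party functions $F$. We even have $D(F) = O(N_0(F) \cdot N_1(F))$, where $D(F)$ is the deterministic communication complexity [10, Theorem 2.11].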

4 Conclusions

In the main results, Theorems 2.4 and 3.1, we have obtained super-polynomial but quasi-polynomial lower bounds on UFA complementation and separation. These bounds are not known to be tight; indeed, in both cases the best known upper bound is exponential. At the same time, we have transferred techniques from communication complexity relatively directly. More concretely, both main theorems hinge on a finite language $L = \{xy \mid F(x,y) = 1\}$ where $F$ is a two-party function whose communication complexity is in a sense extreme. This suggests two kinds of opportunities for future work:

  • Can other techniques from communication complexity improve the lower bounds further? Perhaps by somehow iterating a two-party function or via multi-party communication complexity?

  • Can techniques for proving upper bounds on communication complexity be adapted to prove upper bounds on the size of automata?