Regular languages have a long history of study in classical theoretical computer science, going back to Kleene in the 1950s. The definition is extremely robust: there are many equivalent characterizations ranging from machine models (e.g., deterministic or non-deterministic finite automata, $o(\log \log n)$-space Turing machines), to grammars (e.g., regular expressions, prefix grammars), to algebraic structures (e.g., recognition via monoids, the syntactic congruence, or rational series). Regular languages are also very well-behaved in the sense that most natural questions are decidable (e.g., is the language infinite?), and most natural operations on languages (e.g., union, complement) are computable. Perhaps for this reason, regular languages are also a useful pedagogical tool, serving as a toy model for theory of computation students to cut their teeth on.
We liken regular languages to the symmetric Boolean functions (a Boolean function is symmetric if its value depends only on the Hamming weight of the input). That is, both are a restricted, (usually) tractable special case of a much more general object, and often the common thread between a number of interesting examples. We suggest that these special cases should be studied and thoroughly understood first, to test proof techniques, to make conjectures, and to gain familiarity with the setting.
In this work, we hope to understand the regular languages through the lens of another great innovation of theoretical computer science: query complexity, and in particular quantum query complexity. Not only is query complexity one of the few models in which provable lower bounds are possible, but it is also often the case that efficient algorithms actually achieve the query lower bound. In this case, it is possible that the query lower bound implies an algorithm which was otherwise thought not to exist, as was famously the case for Grover’s search algorithm.
In the case of query complexity, symmetric functions are extremely well-understood, with complete characterizations known for deterministic, randomized, and quantum algorithms in both the zero-error and bounded-error settings. However, to the authors’ knowledge, regular languages have not been studied in the query complexity model despite the fact that they appear frequently in query-theoretic applications.
For example, consider the OR function over Boolean strings. This corresponds to deciding membership in the regular language of strings containing at least one 1, described for instance by the regular expression $(0|1)^*1(0|1)^*$. Similarly, the parity function is just membership in the regular language of strings with an odd number of 1’s, for instance $0^*1(0^*10^*1)^*0^*$. It is well known that the quantum query complexity of OR is $\Theta(\sqrt{n})$, whereas parity is known to require $\Theta(n)$ quantum queries. Yet, there is a two-state deterministic finite automaton for each language. This raises the question: what is the difference between these two languages that causes the dramatic discrepancy between their quantum query complexities? More generally, can we decide the quantum query complexity of a regular language given a description of the machine recognizing it? Are all quantum query complexities even possible, given that the deterministic query complexity of a regular language seems, intuitively, to be either $O(1)$ or $\Omega(n)$? We answer all of these questions in this paper.
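To make the two-state automata concrete, here is a minimal Python sketch. The `DFA` class, the state numbering, and the convention that parity accepts strings with an odd number of 1's are illustrative assumptions, not notation from this paper.

```python
from typing import Dict, Set, Tuple


class DFA:
    """A deterministic finite automaton with a total transition function."""

    def __init__(self, start: int, accepting: Set[int],
                 delta: Dict[Tuple[int, str], int]) -> None:
        self.start = start
        self.accepting = accepting
        self.delta = delta

    def accepts(self, word: str) -> bool:
        state = self.start
        for symbol in word:
            state = self.delta[(state, symbol)]
        return state in self.accepting


# OR: state 1 means "a 1 has been seen" and is never left again.
OR_DFA = DFA(0, {1}, {(0, "0"): 0, (0, "1"): 1, (1, "0"): 1, (1, "1"): 1})

# Parity: the state is the number of 1's read so far, modulo 2.
PARITY_DFA = DFA(0, {1}, {(0, "0"): 0, (0, "1"): 1, (1, "0"): 1, (1, "1"): 0})
```

Both machines have two states, but the symbol 1 acts differently on them: in the OR automaton it collapses both states into one, while in the parity automaton it swaps them.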
The main contribution of this work is the complete characterization of the quantum query complexity of regular languages (up to some technical details), manifest as the following trichotomy: every regular language has quantum query complexity $\Theta(1)$, $\tilde{\Theta}(\sqrt{n})$, or $\Theta(n)$. In the process, we get an identical trichotomy for approximate degree, and dichotomies (in this case, $O(1)$ or $\Omega(n)$) for a host of other complexity measures including deterministic complexity, randomized query complexity, sensitivity, block sensitivity, and certificate complexity.
Many of the canonical examples of regular languages fall easily into one of the three categories via well-studied algorithms or lower bounds. For example, the upper bound for the OR function results from Grover’s famous search algorithm, and the lower bounds for OR and parity functions are straightforward applications of either the polynomial method or the adversary method. Nevertheless, it turns out that there exists a vast class of regular languages which have neither a trivial lower bound nor an obvious upper bound resulting from a straightforward application of Grover’s algorithm. A central challenge of the trichotomy theorem for quantum query complexity was showing that these languages do actually admit a quadratic quantum speedup.
One such example is the language , where . Although there is no finite witness for the language (e.g., to find by Grover search), we show that it nevertheless has an $\tilde{O}(\sqrt{n})$ quantum algorithm. More generally, this language belongs to a subfamily of regular languages known as star-free languages because they have regular expressions which avoid Kleene star (albeit with the addition of the complement operation). Like regular languages, the star-free languages have many equivalent characterizations: counter-free automata, predicates expressible in either linear temporal logic or first-order logic [17, 21], the preimages of finite aperiodic monoids, or cascades of reset automata. The star-free languages are those regular languages which can be decided in $\tilde{O}(\sqrt{n})$ queries. As a result, reducing a problem to any one of the myriad equivalent representations of these languages yields a quadratic quantum speedup for that problem.
Let us take McNaughton’s characterization of star-free languages in first-order logic as one example . That is, every star-free language can be expressed as a sentence in first-order logic over the natural numbers with the less-than relation and predicates for , such that is true if input symbol is . We can easily express the OR function as , or the more complicated language as
Our result gives an $\tilde{O}(\sqrt{n})$ algorithm for this sentence and arbitrarily complex sentences like it. We see this as a far-reaching generalization of Grover’s algorithm, which extends the Grover speedup to a much wider range of string processing problems than was previously known. (Readers familiar with descriptive complexity will recall that $\mathsf{AC}^0$ has a similar, but somewhat more general characterization in first-order logic. It follows that all star-free languages, which have quantum query complexity $\tilde{O}(\sqrt{n})$, are in $\mathsf{AC}^0$. Conversely, we will show that regular languages not in $\mathsf{AC}^0$ have quantum query complexity $\Omega(n)$. Thus, another way to state the trichotomy is that, very roughly speaking, regular languages in $\mathsf{NC}^0$ have complexity $O(1)$, regular languages in $\mathsf{AC}^0$ but not $\mathsf{NC}^0$ have complexity $\tilde{\Theta}(\sqrt{n})$, and everything else has complexity $\Theta(n)$.)
Our main result is the following:
Theorem 1 (informal).
Every regular language has quantum query complexity $\Theta(1)$, $\tilde{\Theta}(\sqrt{n})$, or $\Theta(n)$. Furthermore, each query upper bound results from an explicit algorithm. (There are two caveats to this statement: 1) the query complexity may oscillate between asymptotically different functions, and 2) the query complexity may also be zero. For the formal statement of this theorem, see Section 3.)
The theorem and its proof have several consequences which we highlight below.
Algebraic characterization: We give a characterization of each class of regular languages in terms of the monoids that recognize them. That is, the monoid is either a rectangular band, aperiodic, or finite. In particular, given a description of the machine, grammar, etc. generating the language, we can decide its membership in one of the three classes by explicitly calculating its syntactic monoid and checking a small number of conditions. See Section 3.
Related complexity measures: Many of the lower bounds are derived from lower bounds on other query measures. To this end, we prove query dichotomies for deterministic complexity, randomized query complexity, sensitivity, block sensitivity, and certificate complexity: they are all either $O(1)$ or $\Omega(n)$ for regular languages. By standard relationships between the measures, this shows that approximate degree and quantum query complexity are either $O(1)$ or $\Omega(\sqrt{n})$. See Section 6.
Generalization of Grover’s algorithm: The algorithm using $\tilde{O}(\sqrt{n})$ queries for star-free regular languages extends to a variety of other settings, given that the star-free languages enjoy a myriad of equivalent characterizations. The characterization of star-free languages as first-order sentences over the natural numbers with the less-than relation shows that the algorithm for star-free languages is a broad generalization of Grover’s algorithm. See Section 4 for the description and proof of the star-free algorithm and Section 1.3 for applications.
Star-free algorithm from faster unstructured search: The algorithm for star-free languages results from many nested calls to Grover search, using the speedup due to multiple marked items. However, a careful analysis reveals that whenever this speedup is required, the marked items are consecutive. We show that these Grover search calls can then be replaced by any unstructured search algorithm. Therefore, any model of computation that has faster-than-brute-force unstructured search will have an associated speedup for star-free languages. Consider, for example, the model of quantum computation of Aaronson, Bouland, Fitzsimons, and Lee in which non-collapsing measurements are allowed . It was shown that unstructured search in that model requires at most queries, and therefore, star-free languages can be solved in queries as well.
Finally, we stress that this trichotomy is only possible due to the extreme uniformity in the structure of regular languages. In particular, the trichotomy does not extend to another basic model of computation, the context-free languages.
For all limit computable , there exists a context-free language such that and for all . Furthermore, if an additive -approximation to is computable in time, then . In particular, any algebraic has this property. (We say that a number $c$ is limit computable if there exists a Turing machine which, on input $n$, outputs some rational number $T(n)$ such that $\lim_{n \to \infty} T(n) = c$.)
In fact, the converse also holds.
Let be a context-free language such that . Then, is limit computable.
1.2 Proof Techniques
Most of the lower bounds are derived from a dichotomy theorem for sensitivity: the sensitivity of a regular language is either $O(1)$ or $\Omega(n)$. In particular, we show that the language of sensitive bits for a regular language is itself regular. Therefore, by the pumping lemma for regular languages, we are able to boost any nonconstant number of sensitive bits to $\Omega(n)$ sensitive bits, from which the dichotomy follows.
The majority of the work required for the classification centers around the quantum query algorithm for star-free languages. The proof is based on Schützenberger’s characterization of star-free languages as those languages recognized by finite aperiodic monoids. Starting from an aperiodic monoid, Schützenberger constructs a star-free language recursively based on the “rank” of the monoid elements involved. Roughly speaking, this process culminates in a decomposition of any star-free language into star-free languages of smaller rank. Although this decomposition does not immediately give rise to an algorithm, the notion of rank proves to be a particularly useful algebraic invariant. Specifically, we use it to show that given an $\tilde{O}(\sqrt{n})$ query algorithm for membership in some star-free language $L$, we can construct an $\tilde{O}(\sqrt{n})$ query algorithm for $\Sigma^* L \Sigma^*$. This “infix” algorithm is the key subroutine for much of the general star-free algorithm.
1.3 Applications
Consider the language $\overline{\Sigma^* 2 0^* 2 \Sigma^*}$, where $\Sigma = \{0, 1, 2\}$. We call this the dynamic AND-OR language, for reasons which may not be evident from the regular expression alone. Think of the 2’s as delimiting the string into some number of blocks over $\{0, 1\}$. We take the OR of each block and the AND of those results to decide if the string is in the language. That is, if there is some pair of consecutive 2’s with no intervening 1, then that block evaluates to 0, and the whole string is not in the language. It has long been known that the quantum query complexity of the AND-OR tree, or more generally Boolean formulas with constant depth, is $\tilde{\Theta}(\sqrt{n})$. In that case, however, the tree or formula is fixed in advance and not allowed to change with the input. Nevertheless, our quantum algorithm for star-free languages implies that even the dynamic version of the AND-OR language (as well as the dynamic generalization of constant-depth Boolean formulas) can be decided with $\tilde{O}(\sqrt{n})$ queries and, moreover, there is an efficient quantum algorithm.
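The block-wise description above can be checked classically with a single left-to-right scan. The sketch below is only a reference implementation of the membership condition, assuming the alphabet {0, 1, 2} with 2 as the delimiter; it reads every symbol and is not the quantum algorithm of this paper. The function name is hypothetical.

```python
def dynamic_and_or(word: str) -> bool:
    """Return True iff no two consecutive 2's lack an intervening 1.

    Assumes the alphabet {0, 1, 2}: 2 delimits blocks, and a block
    enclosed by two 2's evaluates to true iff it contains a 1.
    """
    seen_delimiter = False  # has a 2 been read yet?
    block_has_one = False   # does the current block contain a 1?
    for symbol in word:
        if symbol == "2":
            if seen_delimiter and not block_has_one:
                return False  # an enclosed block evaluated to 0
            seen_delimiter = True
            block_has_one = False
        elif symbol == "1":
            block_has_one = True
        elif symbol != "0":
            raise ValueError(f"unexpected symbol: {symbol!r}")
    return True
```

Note that the symbols before the first 2 and after the last 2 do not form enclosed blocks, so they never cause rejection.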
Next consider the language of balanced parentheses, where the parentheses are only allowed to nest $k$ levels deep. When $k$ is unbounded, this is called the Dyck language. When $k = 1$, this is the language of strings of the form $()()\cdots()$, which has a simple Grover search speedup: search for (( or )). However, the language quickly becomes more interesting as $k$ increases. Nevertheless, for any constant $k$, this language is known to be star free, and therefore has an $\tilde{O}(\sqrt{n})$ quantum algorithm by our classification. To the authors’ knowledge, no quadratic speedup for arbitrary constant $k$ was known prior to this publication.
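For reference, membership in the depth-bounded language can be decided classically with one counter. This is a minimal sketch with a hypothetical function name and the depth bound passed as parameter `k`; it is a linear-time classical check, not the quantum algorithm.

```python
def bounded_dyck(word: str, k: int) -> bool:
    """Return True iff `word` is balanced and never nests deeper than k."""
    depth = 0
    for symbol in word:
        if symbol == "(":
            depth += 1
            if depth > k:
                return False  # nesting exceeds the allowed depth
        elif symbol == ")":
            depth -= 1
            if depth < 0:
                return False  # a closing parenthesis with no partner
        else:
            raise ValueError(f"unexpected symbol: {symbol!r}")
    return depth == 0
```

Because the counter only takes values in a bounded range, it fits in the finite state of a DFA, which is why the language is regular for constant `k`.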
1.4 Related Work
We are not the first to study regular languages in a query-complexity setting. One such example is work in property testing by Alon, Krivelevich, Newman, and Szegedy. They show that regular languages can be tested with queries. (We say a language $L$ is testable with constantly many queries if there exists a randomized algorithm such that, given a word $w$ of length $n$, the algorithm accepts if $w \in L$, and the algorithm rejects if at least $\epsilon n$ many positions of $w$ must be changed in order to create a word in $L$. The algorithm is given many queries to $w$.) Interestingly, Alon et al. also show that there exist context-free grammars which do not admit constant query property testers. In Section 7, we show that context-free languages can have query complexity outside the trichotomy.
A second example comes from work of Tesson and Thérien on the communication complexity of regular languages . As with query complexity, several important functions in communication complexity happen to be regular, e.g., inner product, disjointness, greater-than, and index. They show that for several measures of communication complexity, the complexity is , , , or . Clearly, there are many parallels with this work, but surprisingly the classes of regular languages involved are different. Also, communication complexity is traditionally more difficult than query complexity, yet the authors appear to have skipped over query complexity—we assume because quantum query complexity is necessary to get an interesting result.
There are also striking parallels in work of Childs and Kothari, who conjecture a dichotomy for the quantum query complexity of minor-closed graph properties . Minor-closed graph properties are not, to our knowledge, directly related to regular languages, but they are morally similar in that both are very uniform—(almost) every part of the input is treated the same by the property. Childs and Kothari show that such properties have query complexity , except for forbidden subgraph properties which are and , and are conjectured to be . Even some of the proof techniques are similar—the proof that forbidden subgraph properties are could be phrased in terms of block sensitivity, like our lower bound for non-trivial languages.
Finally, we are aware of one more result on the complexity of star-free languages prior to our work. It is possible to show that star-free languages have $o(n)$ quantum query complexity, just barely enough to separate them from non-star-free languages. This result is a combination of two existing results: Chandra, Fortune, and Lipton show that star-free languages have (very slightly) super-linear size circuits; Bun, Kothari, and Thaler show that linear size circuits have (moderately) sublinear quantum query complexity. This connection was pointed out to us by Robin Kothari.
This section introduces both regular languages and basic query complexity measures and their relationships. In particular, we will focus on algebraic definitions of regular languages as they serve as the basis for many of the results in this paper. Readers familiar with query complexity can skip much of the introduction on that topic, but may still want to read Section 2.2.2 on extending the complexity measures to larger alphabets.
2.1 Regular languages
The regular languages are those languages that can be constructed from $\emptyset$, $\{\varepsilon\}$, and singletons $\{a\}$ for all $a \in \Sigma$ using the operations of concatenation (e.g., $AB$), union (e.g., $A \cup B$), and Kleene star ($A^*$). (Let $A$ be a set of strings. Define $A^* = \{a_1 a_2 \cdots a_k : k \geq 0,\ a_i \in A\}$, that is, the concatenation of zero or more strings in $A$. We will also use $A^+$ to capture one or more strings.) A regular expression for a regular language is an explicit expression for how to construct the language, traditionally writing $|$ for alternation (instead of union), and omitting some brackets by writing $a$ for $\{a\}$ and $\varepsilon$ for $\{\varepsilon\}$. For example, over the alphabet $\{0, 1\}$, the OR function can be written as the regular expression $(0|1)^*1(0|1)^*$, and the language of all strings such that there are no two consecutive 1’s is $(0|10)^*(1|\varepsilon)$.
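As a sanity check on regular expressions of this kind, the sketch below uses Python's standard `re` module. The two patterns are assumptions chosen to match the verbal descriptions (strings containing a 1, and strings with no two consecutive 1's); they are compared against direct string predicates on all short binary strings.

```python
import re
from itertools import product

# Assumed expressions for the two example languages over {0, 1}.
OR_REGEX = re.compile(r"(0|1)*1(0|1)*")
NO_11_REGEX = re.compile(r"(0|10)*(1|)")  # the empty alternative plays the role of epsilon


def in_language(pattern: "re.Pattern[str]", word: str) -> bool:
    """Membership means the whole word matches, not just a substring."""
    return pattern.fullmatch(word) is not None


def check_against_predicates(max_len: int = 8) -> bool:
    """Compare each regex with a direct predicate on all strings up to max_len."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            word = "".join(bits)
            if in_language(OR_REGEX, word) != ("1" in word):
                return False
            if in_language(NO_11_REGEX, word) != ("11" not in word):
                return False
    return True
```

Using `fullmatch` rather than `search` matters here: a regular expression defines a language only when the entire input must be matched.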
The class of regular languages has extremely robust definitions and many equivalent characterizations. For instance, some machine-based definitions (we assume familiarity with the basic machine models for regular languages) include those languages accepted by deterministic finite automata (DFA), or by non-deterministic finite automata (NFA), or even by alternating finite automata. Regular languages also arise by weakening Turing machines, for example by making the machine read-only or limiting the machine to $o(\log \log n)$ space.
For our purposes, some of the most useful definitions of regular languages are algebraic in nature. In particular, regular languages arise as the preimage of a subset of a finite monoid under monoid homomorphism. First, we say that language $L \subseteq \Sigma^*$ is recognized by a monoid $M$ if there exists a monoid homomorphism $\varphi \colon \Sigma^* \to M$ (where $\Sigma^*$ is a monoid under concatenation) and a subset $S \subseteq M$ such that $L = \{x \in \Sigma^* : \varphi(x) \in S\}$.
Theorem 4 (folklore).
A language is recognized by a finite monoid iff it is regular.
In fact, starting from a regular language, we can specify a finite monoid recognizing it through the so-called syntactic congruence. Given language $L \subseteq \Sigma^*$, the syntactic congruence is an equivalence relation $\sim_L$ on $\Sigma^*$ such that $x \sim_L y$ if, for all $u, v \in \Sigma^*$, $uxv \in L \iff uyv \in L$. Thus, $\sim_L$ divides $\Sigma^*$ into equivalence classes. Furthermore, $\sim_L$ is a monoid congruence because $x_1 \sim_L y_1$ and $x_2 \sim_L y_2$ imply $x_1 x_2 \sim_L y_1 y_2$. This means the equivalence classes of $\Sigma^*$ under $\sim_L$ are actually congruence classes (because they can be multiplied), defining a monoid $M_L$ which we call the syntactic monoid of $L$. Finally, it is not hard to see that the map from a string to its congruence class is a homomorphism. Therefore, by Theorem 4, the syntactic monoid for any regular language is finite.
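One standard way to compute such a monoid in practice is through a DFA: the transformations of the state set induced by input words form the transition monoid, and for a minimal complete DFA this monoid is isomorphic to the syntactic monoid. The sketch below assumes states numbered `0..num_states-1` and represents each letter as a tuple mapping state `i` to `letters[a][i]`; all names are illustrative.

```python
from typing import Dict, List, Set, Tuple

Transformation = Tuple[int, ...]  # state i is sent to t[i]


def transition_monoid(num_states: int,
                      letters: Dict[str, Transformation]) -> Set[Transformation]:
    """Close the letter transformations under composition, starting from the identity."""
    identity: Transformation = tuple(range(num_states))
    monoid: Set[Transformation] = {identity}
    frontier: List[Transformation] = [identity]
    while frontier:
        word_action = frontier.pop()
        for letter_action in letters.values():
            # Action of (word followed by one more letter) on every state.
            extended = tuple(letter_action[word_action[s]] for s in range(num_states))
            if extended not in monoid:
                monoid.add(extended)
                frontier.append(extended)
    return monoid


# Two-state automata for OR and parity (state numbering is an assumption).
OR_LETTERS = {"0": (0, 1), "1": (1, 1)}
PARITY_LETTERS = {"0": (0, 1), "1": (1, 0)}
```

For these two automata the monoids both have two elements, yet they are not isomorphic: the OR monoid contains a constant map, while the parity monoid is the two-element group.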
The most important subclass of the regular languages is the class of star-free languages. These languages are described by a variant of regular expressions where complement is allowed but Kleene star is not. We call these star-free regular expressions. For convenience, star-free regular expressions sometimes contain the intersection operation, since it can be expressed using union and complement by De Morgan’s laws.
Note that star-free languages are not necessarily finite. For example, $\Sigma^*$ can also be expressed as $\overline{\emptyset}$, the complement of the empty language. Similarly, is , the set of strings which do not contain a string other than . Once again, an algebraic characterization of star-free languages will be particularly useful for us. First, we say that a monoid $M$ is aperiodic if for all $x \in M$ there exists an integer $n \geq 0$ such that $x^n = x^{n+1}$.
Theorem 5 (Schützenberger ).
A language is recognized by a finite aperiodic monoid iff it is star free.
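The aperiodicity condition is easy to test on a finite monoid given explicitly. The sketch below assumes the monoid is a set of transformations on DFA states, each written as a tuple; the helper names and the two example monoids (for the two-state OR and parity automata) are illustrative assumptions.

```python
from typing import Set, Tuple

Transformation = Tuple[int, ...]


def compose(first: Transformation, second: Transformation) -> Transformation:
    """Apply `first`, then `second`."""
    return tuple(second[s] for s in first)


def is_aperiodic(monoid: Set[Transformation]) -> bool:
    """Check that every element x satisfies x^n = x^(n+1) for some n."""
    bound = len(monoid)  # a stabilizing power, if any, appears within |M| steps
    for x in monoid:
        power = x
        for _ in range(bound):
            next_power = compose(power, x)
            if next_power == power:
                break  # powers of x have stabilized
            power = next_power
        else:
            return False  # powers of x cycle with period greater than 1
    return True


OR_MONOID = {(0, 1), (1, 1)}      # identity and a constant map
PARITY_MONOID = {(0, 1), (1, 0)}  # identity and a swap
```

On these inputs the OR monoid passes the test and the parity monoid fails it, which matches the claim that OR is star free while parity is not.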
We also define a subset of the star-free languages, which we call the trivial languages. Intuitively, the trivial languages are those languages for which membership can be decided by the first and last characters of the input string, which we formalize as those languages accepted by trivial regular expressions. (More generally, trivial languages are decided by a constant size prefix and/or suffix of the input, but the processing we do to formalize the trichotomy theorem compresses those substrings to length 1. See Section 3.) A trivial regular expression is any Boolean combination of the languages , , and for .
The algebraic characterization of trivial languages will need to use both the properties of the monoid and the properties of the homomorphism onto the monoid. To that end, we say that language is recognized by a monoid homomorphism if for some subset . Finally, a monoid is a rectangular band if for , each element is idempotent, , and satisfies the rectangular property, .
Theorem 6 (Appendix B).
A language is recognized by morphism such that is a finite rectangular band iff it is trivial.
2.2 Query complexity
This section serves as a brief overview of query complexity, a model of computation where algorithms are charged based on the number of input bits they reveal (the input is initially hidden) rather than the actual computation being done. To model that the input is hidden, all query algorithms must access their inputs via an indexing oracle—a function which takes some index and outputs the value of the corresponding input bit. We use the standard notion of oracles in the quantum setting. That is, for oracle function , the quantum algorithm can apply the -qubit transformation which flips the last qubit if applied to the first qubits evaluates to 1.
Formally, the quantum query complexity of a function is a function such that is the minimum number of oracle calls for a quantum circuit to decide (with bounded error) the value of for input strings of length . An astute reader may notice that we only defined the indexing function over bits and that regular languages are defined over arbitrary finite alphabets . However, one can always transform the function so that each symbol of is encoded by bits. In fact, we will show later that this only affects the query complexity by a constant factor for regular languages.
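The classical analogue of this cost model can be sketched as an oracle object that hides the input and counts index lookups. The class and function names below are hypothetical, and the quantum oracle transformation is not modeled here.

```python
class QueryOracle:
    """Hides an input string and counts how many positions are revealed."""

    def __init__(self, hidden: str) -> None:
        self._hidden = hidden
        self.queries = 0

    def __len__(self) -> int:
        # The input length is known to the algorithm for free.
        return len(self._hidden)

    def query(self, index: int) -> str:
        self.queries += 1
        return self._hidden[index]


def classical_or(oracle: QueryOracle) -> bool:
    """Deterministic OR: scan left to right and stop at the first 1."""
    for i in range(len(oracle)):
        if oracle.query(i) == "1":
            return True
    return False
```

Only calls to `query` are charged; any amount of computation between queries is free, which is what distinguishes query complexity from time complexity.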
One can similarly define deterministic query complexity ($D(f)$), bounded-error randomized query complexity ($R(f)$), and zero-error randomized query complexity ($R_0(f)$) by counting the number of input symbols accessed in these models. Closely related to quantum query complexity is a notion of approximation by polynomials called approximate degree, denoted $\widetilde{\deg}(f)$. The approximate degree of a function $f$ is the minimum degree of a polynomial $p$ such that $|p(x) - f(x)| \leq 1/3$ for all $x$.
We conclude by defining several query complexity measures which are useful tools in proving lower bounds in the more standard models of computation above. Fix a function $f$. Let $x$ be some input. We say that some input symbol $x_i$ is sensitive if changing only $x_i$ changes the value of the function on that input. The sensitivity of $x$ is equal to its total number of sensitive symbols. The sensitivity of $f$, denoted $s(f)$, is the maximum sensitivity over all inputs $x$.
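For small Boolean inputs, sensitivity can be computed by brute force directly from this definition. The sketch below assumes inputs are tuples of 0/1 values and Boolean functions are ordinary Python callables; the example functions are illustrative.

```python
from itertools import product
from typing import Callable, Tuple

Input = Tuple[int, ...]
BoolFn = Callable[[Input], int]


def sensitivity_at(f: BoolFn, x: Input) -> int:
    """Number of positions whose flip alone changes f(x)."""
    count = 0
    for i in range(len(x)):
        flipped = x[:i] + (1 - x[i],) + x[i + 1:]
        if f(flipped) != f(x):
            count += 1
    return count


def sensitivity(f: BoolFn, n: int) -> int:
    """Maximum of sensitivity_at over all inputs of length n."""
    return max(sensitivity_at(f, x) for x in product((0, 1), repeat=n))


def or_fn(x: Input) -> int:
    return int(any(x))


def parity_fn(x: Input) -> int:
    return sum(x) % 2


def first_bit_fn(x: Input) -> int:
    return x[0]
```

OR attains sensitivity n only at the all-zeros input, whereas parity has sensitivity n at every input; a function that depends only on the first symbol has sensitivity 1 regardless of n.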
Similarly, the block sensitivity at an input $x$ is the maximum number of disjoint blocks (i.e., subsets of the input bits) such that changing one entire block changes the value of the function. The block sensitivity of $f$, denoted $bs(f)$, is the maximum block sensitivity over all inputs $x$.
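Block sensitivity can likewise be brute-forced for very small Boolean inputs. The sketch below encodes each block as a bitmask, lists all sensitive blocks, and searches for the largest pairwise-disjoint family. It takes exponential time and is intended only as an executable restatement of the definition; the names are hypothetical.

```python
from itertools import product
from typing import Callable, List, Tuple

Input = Tuple[int, ...]
BoolFn = Callable[[Input], int]


def flip_block(x: Input, mask: int) -> Input:
    """Flip every position i of x whose bit is set in mask."""
    return tuple(bit ^ ((mask >> i) & 1) for i, bit in enumerate(x))


def block_sensitivity_at(f: BoolFn, x: Input) -> int:
    """Largest number of pairwise-disjoint blocks that each change f(x)."""
    n = len(x)
    sensitive: List[int] = [m for m in range(1, 1 << n)
                            if f(flip_block(x, m)) != f(x)]

    def best(used: int, start: int) -> int:
        result = 0
        for j in range(start, len(sensitive)):
            if sensitive[j] & used == 0:  # block is disjoint from those chosen
                result = max(result, 1 + best(used | sensitive[j], j + 1))
        return result

    return best(0, 0)


def block_sensitivity(f: BoolFn, n: int) -> int:
    return max(block_sensitivity_at(f, x) for x in product((0, 1), repeat=n))
```

Since single positions are blocks of size one, block sensitivity is always at least sensitivity.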
A certificate is a partial assignment of the input symbols such that $f$ evaluates to the same value on all inputs consistent with the certificate. The certificate complexity of an input $x$ is the minimum size of a certificate consistent with $x$ (i.e., the number of bits assigned in the partial assignment). The certificate complexity of $f$, denoted $C(f)$, is the maximum certificate complexity over all inputs.
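Certificate complexity admits the same kind of brute-force check on small Boolean inputs. The sketch below tries sets of positions in order of increasing size and returns the first one that forces the function value; the names are hypothetical and the running time is exponential in the input length.

```python
from itertools import combinations, product
from typing import Callable, Tuple

Input = Tuple[int, ...]
BoolFn = Callable[[Input], int]


def certificate_size_at(f: BoolFn, x: Input) -> int:
    """Size of the smallest set of positions of x that fixes the value f(x)."""
    n = len(x)
    all_inputs = list(product((0, 1), repeat=n))
    for size in range(n + 1):
        for positions in combinations(range(n), size):
            consistent = (y for y in all_inputs
                          if all(y[i] == x[i] for i in positions))
            if all(f(y) == f(x) for y in consistent):
                return size
    return n  # unreachable: fixing all positions is always a certificate


def certificate_complexity(f: BoolFn, n: int) -> int:
    return max(certificate_size_at(f, x) for x in product((0, 1), repeat=n))
```

For OR, a single 1 certifies the value on any input containing a 1, but the all-zeros input needs every position, so the maximum over inputs is n.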
Finally, when clear from context, we will often let a language denote its characteristic function when used as an argument in the various complexity measures. For example, for a language $L$, we will write $Q(L)$ for the quantum query complexity of the function $f_L$ where $f_L(x) = 1$ iff $x \in L$.
There are many relationships between the different complexity measures that will be useful throughout this paper. For example, the proposition below follows from the fact that some models of computation can easily simulate others.
For all ,
In Section 5, we prove a dichotomy theorem for block sensitivity: it is either $O(1)$ or $\Omega(n)$. This is particularly useful since nearly all complexity measures are polynomially related to block sensitivity:
Theorem 8 ().
For all , we have the following relationships for block sensitivity:
|Lower bounds||Upper bounds|
Notice that for nearly all complexity measures , we have for some constants . The exception is sensitivity, for which it is famously open whether a polynomial in sensitivity upper bounds block sensitivity. There is, however, an exponential relation due to Simon.
Theorem 9 (Simon ).
For all , .
If any query complexity measure in is , then all of them are .
2.2.2 Alphabet size
In this section, we discuss how alphabet size affects the various query measures. Recall that the query complexity measures above are usually defined for Boolean functions. Nevertheless, we would like to extend the known relationships between the complexity measures to functions over larger (yet constant) alphabets. While it is true that many of these relationships generalize without too much work, we would like to avoid reproving the results one at a time.
Our solution is to simply encode symbols of $\Sigma$ as binary strings of length $\lceil \log_2 |\Sigma| \rceil$. If the size of the alphabet is not a power of two, we can simply map the extra binary strings to arbitrary elements of $\Sigma$. This maps a language over $\Sigma$ to a language over binary strings. Since regular languages are closed under inverse morphism, the binary language is regular if the original language is regular.
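A minimal sketch of this encoding follows. The choice to send unused codewords to the last symbol of the alphabet is an arbitrary assumption (the text allows any choice), and the function names are hypothetical.

```python
from typing import Dict, Tuple


def make_code(alphabet: str) -> Tuple[int, Dict[str, str], Dict[str, str]]:
    """Build a fixed-width binary code for `alphabet`.

    Returns (width, encode, decode). Every binary string of length `width`
    decodes to some symbol, so decoding is defined on all binary words
    whose length is a multiple of `width`.
    """
    width = max(1, (len(alphabet) - 1).bit_length())  # ceil(log2 |alphabet|)
    encode = {sym: format(i, f"0{width}b") for i, sym in enumerate(alphabet)}
    decode = {}
    for code in range(2 ** width):
        # Unused codewords are sent to an arbitrary symbol (here: the last).
        decode[format(code, f"0{width}b")] = alphabet[min(code, len(alphabet) - 1)]
    return width, encode, decode


def encode_word(word: str, encode: Dict[str, str]) -> str:
    return "".join(encode[symbol] for symbol in word)


def decode_word(bits: str, width: int, decode: Dict[str, str]) -> str:
    if len(bits) % width != 0:
        raise ValueError("bit string length must be a multiple of the code width")
    return "".join(decode[bits[i:i + width]] for i in range(0, len(bits), width))
```

A binary word then belongs to the encoded language exactly when its decoding belongs to the original language, which is the inverse-morphism view used in the text.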
It is also easy to see that almost all complexity measures are changed by at most a constant factor when converting to a binary alphabet. For example, since for any bit we look at, there is some symbol we can examine that tells us that bit. In the other direction, , since we can query the entire encoding of any symbol we query. Similarly, the encoding changes , , , , , and (with some additional work) , by at most a constant factor. The exception is block sensitivity.
It is clear that , since for any sensitive block of symbols there is some way to flip it, and this changes some block of bits. In the other direction, a block of sensitive bits gives a block of sensitive symbols in the obvious way, but then disjoint blocks of bits will not necessarily map to disjoint blocks of symbols, so it is difficult to say more for general languages.
Let be a regular language. Then, there exists constant such that for all .
We borrow a dichotomy result from Section 5, namely Corollary 27: any flat regular language has sensitivity either $O(1)$ or $\Omega(n)$. (Note that Corollary 27 is true for any alphabet size and does not depend on Theorem 11, so the argument is not circular.) Since the language is regular but not necessarily flat, we also borrow Theorem 12 from Section 3: membership in the language reduces to membership in some flat language based on some finite suffix of the input string. Therefore, for every length $n$, the sensitivity is either constant or $\Omega(n)$, which we use to split the proof into two cases.
If the sensitivity is constant, then is also constant. This implies that is constant by Theorem 9. Therefore, is also constant since . If the sensitivity is not constant, then it is linear by the dichotomy theorem. Therefore, implies block sensitivity is linear for both languages from which the theorem follows. ∎
With this theorem, every regular language and its encoding have the same complexity for all of the measures we are interested in, up to constants. Therefore, we will lift known relationships between complexity measures in the Boolean setting to the general alphabet setting without further comment.
3 Formal Statement
The naïve version of the trichotomy theorem states that the quantum query complexity of a language is always $\Theta(1)$, $\tilde{\Theta}(\sqrt{n})$, or $\Theta(n)$. Unfortunately, this is not strictly true. We now explain the difficulty and a technique which we call “flattening” that allows us to formalize this statement.
Let us see why flattening is necessary. Consider any language which has large quantum query complexity (e.g., parity) and take its intersection with $(\Sigma^2)^*$, the language of even length strings. When the input length is odd, we know without any queries that the string cannot be in the language. When the input length is even, we have to solve the parity problem, which requires $\Theta(n)$ queries. Thus, the query complexity oscillates drastically between $0$ and $\Theta(n)$ depending on the length of the input. Strictly speaking, this means the complexity is neither $\Theta(1)$, $\tilde{\Theta}(\sqrt{n})$, nor $\Theta(n)$; the naïve statement of the trichotomy is false.
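The oscillation is easy to see in a classical sketch that reports how many input symbols it reads. The code below assumes "parity" means an odd number of 1's and uses a hypothetical function name; it illustrates only the dependence on length, not the quantum bound.

```python
from typing import Tuple


def parity_and_even_length(word: str) -> Tuple[bool, int]:
    """Decide membership in (odd number of 1's) AND (even length).

    Returns (answer, number_of_symbols_read).
    """
    if len(word) % 2 == 1:
        return False, 0  # odd length: rejected without reading any symbol
    ones = 0
    queries = 0
    for symbol in word:
        queries += 1
        if symbol == "1":
            ones += 1
    return ones % 2 == 1, queries
```

On odd lengths the answer costs nothing, while on even lengths every symbol is read, since parity depends on all of them.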
We want to state the trichotomy only for languages which are length-independent. Fortunately, a DFA cannot count how many symbols it reads. With finite state, the best a DFA can do is count modulo some constant. Thus, if there is any dependence on length, it is periodic. We introduce flattening as a procedure to remove these periodicities from the language.
Before continuing with flattening, we address a different way to handle length dependence. That is, redefine the quantum query complexity of a function to be the minimum number of quantum oracle calls needed to compute the function on inputs of length up to $n$ (rather than exactly $n$). For this definition, notice that the quantum query complexity is nondecreasing. In Appendix A.1 we show that the trichotomy theorem holds for all regular languages under this definition as a simple consequence of Theorem 1, the trichotomy theorem for flat languages. To be clear, we will continue to use the standard definition of quantum query complexity for the remainder of the paper.
The main idea behind flattening is to eliminate a language’s dependence on length by dividing the strings into blocks. For any string $x$ of length $kn$, we can reimagine $x$ as a length $n$ string over $\Sigma^k$. This operation can be applied to a language by keeping only strings of length divisible by $k$ and projecting them to the alphabet $\Sigma^k$. Flattening a regular language applies this operation to the language for some carefully chosen $k$, removing anomalies in the query complexity due to the input length. Nevertheless, we argue that the language and its flattened version are essentially the same since we are simply blocking characters together. We formalize this in the following theorem.
Let be a regular language recognized by a monoid . There exists an integer and a finite family of flat regular languages over alphabet such that testing membership in reduces (in fewer than queries) to testing membership in some . Furthermore, the same monoid recognizes and every .
The full proof is in Appendix A with the rest of the details about flattening a language. The key property of a flattened language is the following:
Let be a flat regular language. For any non-empty string , and any non-zero length , there exists a string of length such that for any ,
That is, and belong to the same congruence class.
In other words, for any non-empty string , we can replace (substring) occurrences of with some string of every (non-zero) length, without changing membership in the language. Notice that a flat regular language cannot have a length dependence, otherwise we would replace the first few letters with something slightly longer or shorter to reduce the problem to whichever nearby length is easiest.
To summarize, any regular language can be reduced (or flattened) to a collection of flat regular languages. Some of those languages may be easier than others, but they are all length-independent, and thus suitable for our trichotomy theorem. See Appendix A for details.
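The blocking operation at the heart of flattening can be sketched as follows. The block size `k` is passed in as a parameter, since how it is chosen is deferred to Appendix A; the function names are hypothetical and each symbol of the blocked alphabet is represented as a length-`k` string.

```python
from typing import Iterable, List, Set, Tuple


def block_word(word: str, k: int) -> List[str]:
    """View a word of length k*n as a length-n word over k-symbol blocks."""
    if len(word) % k != 0:
        raise ValueError("word length must be divisible by the block size")
    return [word[i:i + k] for i in range(0, len(word), k)]


def block_language(words: Iterable[str], k: int) -> Set[Tuple[str, ...]]:
    """Keep only words whose length is divisible by k, then block them."""
    return {tuple(block_word(w, k)) for w in words if len(w) % k == 0}
```

For example, with `k = 2` the word `001011` becomes the three-symbol word `00`, `10`, `11` over the blocked alphabet, and words of odd length are discarded.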
3.2 Formal Statement of Main Result
We are now ready to formally state Theorem 1. Technically, there are a few regular languages (even flat languages), which can be decided with zero queries, strictly from the length of the input. This divides the languages into the following four classes (i.e., a tetrachotomy).
Every flattened regular language has quantum query complexity $0$, $\Theta(1)$, $\tilde{\Theta}(\sqrt{n})$, or $\Theta(n)$ according to the smallest class in the following hierarchy that contains the language.
Degenerate: One of the four languages , , , or .
Trivial: The set of languages which have trivial regular expressions.
Star free: The set of languages which have star-free regular expressions.
Regular: The set of languages which have regular expressions.
Note that each class is contained in the next.
Nevertheless, we refer to this classification as a trichotomy. We either think of degenerate and trivial languages under the category of “constant query regular languages” or, alternatively, disregard the degenerate languages entirely because they are uninteresting.
As it turns out, the regular expression descriptions, some of which were already mentioned in Section 2, are not particularly useful for the classification. We will prefer the following algebraic/monoid definitions of the languages, and use them throughout. We prove they coincide with the regular expression characterizations in Appendix B.
Let be a regular language.
$L$ is degenerate iff it is recognized by a morphism $\varphi$ such that $|\varphi(\Sigma^+)| = 1$.
$L$ is trivial iff it is recognized by a morphism $\varphi$ such that $\varphi(\Sigma^+)$ is a finite rectangular band.
$L$ is star free iff it is recognized by a finite aperiodic monoid.
$L$ is regular iff it is recognized by a finite monoid.
3.3 Structure of the proof
We separate the proof of the trichotomy into two natural pieces: upper bounds (Section 4) and lower bounds (Section 6). The upper bounds are derived directly from the monoid characterizations of the various classes. Given a flat language, we construct explicit algorithms using at most 0 queries for degenerate languages, 2 queries for trivial languages, $\tilde{O}(\sqrt{n})$ queries for star-free languages, and $O(n)$ queries for regular languages.
The lower bound section aims to prove that these are the only possible classes. First, we show that any non-degenerate language requires at least one quantum query. We then show that any nontrivial language requires $\omega(1)$ quantum queries. At this point, we will appeal to a dichotomy theorem for the block sensitivity of regular languages, which we prove in Section 5. From this dichotomy and standard relationships between the complexity measures, we get that any regular language requiring $\omega(1)$ quantum queries actually requires $\Omega(\sqrt{n})$ queries. Finally, we show that any non-star-free language requires $\Omega(n)$ queries, completing the proof.
4 Upper Bounds
In this section, we will describe the algorithms for achieving the query upper bounds in Theorem 1. As a warm-up, we will first consider every class besides the star-free languages. Each algorithm will follow trivially from the monoid characterization of each class.
Any regular language has an $O(n)$ time deterministic algorithm. The trivial languages have constant-time deterministic algorithms. The degenerate languages have $0$-query deterministic algorithms.
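Let $L \subseteq \Sigma^*$ be a regular language. Let $\varphi \colon \Sigma^* \to M_L$ be the homomorphism onto its syntactic monoid such that $L = \varphi^{-1}(S)$ for some $S \subseteq M_L$. Let $x = x_1 \cdots x_n$. We have that $x \in L$ iff $\varphi(x_1) \cdots \varphi(x_n) \in S$. Since $M_L$ is finite and $\varphi$ is specified by a finite mapping from characters to monoid elements, this product is computable in linear time.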
Suppose is trivial. Consider input where and . By the rectangular band property, we have . That is, iff .
Suppose is degenerate. Consider some input . If , then iff . If , then so iff . Since the query algorithm knows the length in advance, no queries are needed to determine the membership of . ∎
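The deterministic algorithms in this proof can be illustrated with a minimal Python sketch. The two example languages are assumptions chosen for illustration and do not come from the text: strings with an even number of 1s (recognized by the two-element group, and therefore regular but not star free) and nonempty strings that start with `a` and end with `b` (whose nonempty strings map into a rectangular band of pairs). All function and constant names are hypothetical.

```python
from functools import reduce

# Assumed example 1: binary strings with an even number of 1s.
# The recognizing monoid is Z_2 under addition; PHI maps characters to elements.
IDENTITY = 0
PHI = {"0": 0, "1": 1}
ACCEPTING = {0}


def multiply(a, b):
    """Monoid product in Z_2."""
    return (a + b) % 2


def in_regular_language(x):
    """Linear-time membership: multiply the images of all characters."""
    product = reduce(multiply, (PHI[c] for c in x), IDENTITY)
    return product in ACCEPTING


# Assumed example 2: nonempty strings that start with 'a' and end with 'b'.
# Each character c maps to the pair (c, c); pairs multiply as a rectangular band.
def band_multiply(p, q):
    """Rectangular band product: (a, b)(c, d) = (a, d)."""
    return (p[0], q[1])


def in_trivial_language(x):
    """Membership using two queries (first and last character)."""
    if not x:
        return False  # decided from the length alone, no queries
    first, last = x[0], x[-1]
    return band_multiply((first, first), (last, last)) == ("a", "b")
```

The first function reads every character, matching the linear upper bound, while the second reads only the two endpoints, because every interior factor of the product is absorbed by the band identity.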
Of course, the existence of these deterministic algorithms implies their corresponding query upper bounds as well. Much more interesting is the quantum algorithm for star-free languages to which the remainder of this section is dedicated. Much like Proposition 15, we will use the monoid characterization as our starting point for the algorithm; however, before delving directly into the details of the algorithm, we give some techniques and ideas that will be pervasive throughout.
4.1 Proof techniques
In this section, we introduce a basic substring search operation and a decomposition theorem (due to Schützenberger) for aperiodic monoids.
4.1.1 Splitting and infix search
Consider the language $\Sigma^*(20^*2)\Sigma^*$ over the alphabet $\Sigma = \{0, 1, 2\}$, that is, the problem of finding a substring of the form $20^*2$. We call the problem of finding a contiguous substring satisfying a predicate infix search. Since $\Sigma^*(20^*2)\Sigma^*$ is star free, our trichotomy theorem implies that infix search for the language $20^*2$ is possible with $\tilde{O}(\sqrt{n})$ queries.
Consider the following algorithm for $\Sigma^*(20^*2)\Sigma^*$: Grover search for an index in the middle of a substring $20^*2$, searching outwards to verify that there is a substring of the form $20^*$ immediately before the index (suffix search) and a substring of the form $0^*2$ immediately after (prefix search). More precisely, we can use Grover search to check whether a substring is all $0$s, then binary search to determine how far the $0$s extend on either side of the index, and finally check for $2$s on either end.
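The per-index test described above can be sketched classically. This sketch assumes the pattern being searched for is a 2, followed by a run of 0s, followed by a closing 2, over the alphabet {0, 1, 2}. The helper `all_zero` is a hypothetical stand-in for the Grover search over a substring, and the final `any` loop stands in for the outer Grover search over indices, so the code shows the control flow only, not the quantum query count.

```python
def all_zero(x, lo, hi):
    """Stand-in for a Grover search for a nonzero symbol in x[lo:hi]."""
    return all(c == "0" for c in x[lo:hi])


def zero_run_right(x, i):
    """Largest j such that x[i:j] is all zeros, found by binary search."""
    lo, hi = i, len(x)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if all_zero(x, i, mid):
            lo = mid
        else:
            hi = mid - 1
    return lo


def zero_run_left(x, i):
    """Smallest j such that x[j:i] is all zeros, found by binary search."""
    lo, hi = 0, i
    while lo < hi:
        mid = (lo + hi) // 2
        if all_zero(x, mid, i):
            hi = mid
        else:
            lo = mid + 1
    return lo


def index_is_inside_match(x, i):
    """True iff x[:i] ends with 2 0* and x[i:] starts with 0* 2."""
    left = zero_run_left(x, i)    # zeros occupy x[left:i]
    right = zero_run_right(x, i)  # zeros occupy x[i:right]
    has_left_two = left > 0 and x[left - 1] == "2"
    has_right_two = right < len(x) and x[right] == "2"
    return has_left_two and has_right_two


def contains_pattern(x):
    """Classical stand-in for the outer Grover search over split positions."""
    return any(index_is_inside_match(x, i) for i in range(len(x) + 1))
```

Each binary search makes a logarithmic number of calls to `all_zero`, which is where the polylogarithmic overhead on top of the Grover cost would enter.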
We introduce a few ideas necessary to prove this algorithm for is efficient, and to generalize it to arbitrary languages. The first tool we need is Grover search, to help us search for the position of the substring. In particular, we use a version of Grover search which is faster when there are multiple marked items.[10]
[10] In this section, we will need the speedup from multiple marked items. However, whenever we require the speedup, the marked items will be consecutive. In this case, we can derive the same speedup from any unstructured search algorithm by searching over indices at fixed intervals (a “grid” on the input). In more detail: we search for a grid size $g$, starting from $n$ and halving until $g$ is less than the number $t$ of consecutive marked items (which is unknown). Hence, the set of indices divisible by $g$ will intersect some marked item and the search on $n/g$ indices will succeed in $O(\sqrt{n/g})$ queries. Since the last search dominates the runtime, the entire procedure requires $O(\sqrt{n/t})$ queries.
In fact, there are other models of computation where unstructured search uses queries for (for instance, ). It will turn out that the procedure described above still accelerates search for multiple consecutive marked items. This will translate to a query algorithm for star-free languages. In particular, the runtime in Theorem 18 becomes .
Theorem 16 (Grover search).
Given oracle access to a string of length $n$ which is 1 on at least $t \geq 1$ indices, there exists a quantum algorithm which returns a random index on which the oracle evaluates to 1 in $O(\sqrt{n/t})$ queries with constant probability.
Next, the solution to used the fact that given an index, we can search outwards for a substring before the index and after. Notice that the index has “split” the regular language into two closely related languages. It is not clear every language has this property, so we introduce a notion of splitting for arbitrary regular languages.
We say that a language splits as if
for a constant, and
for all and decompositions , there exists such that and .
Formally, splits as . In fact, every star-free language splits as where the and are also star free. We will prove this in the next section in Theorem 23. We delay the proof until we have the definitions to show that the languages and are in some sense no harder than the language itself.
Supposing we can determine membership for and efficiently, a combination of Grover search and exponential search will solve the infix search problem, as shown below.
Theorem 18 (Infix search).
Let language split as . Suppose and are for all . Then, .
We perform an exponential search—doubling with initially set to 1—until the algorithm succeeds. Let be the input and suppose there is a substring of belonging to of length at least and at most , for some power of two . Search for an index such that and for some . This implies the substring is in .
Since testing each index requires at most queries and is constant, there are queries to the string to test a particular index . Recall that we assumed the matching substring has length at least , and thus, there are indices of for which the prefix/suffix queries will return true. Hence, there are at most total Grover iterations (Theorem 16), and the final algorithm requires only queries. ∎
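The exponential search in this proof can be sketched classically for one concrete split. The sketch assumes the pattern is a 2, a run of 0s, and a closing 2, with the part before the split index matching 2 0* and the part after matching 0* 2. The window length of twice the current scale, the regular expressions, and all names are illustrative assumptions. The inner `for` loop stands in for the Grover search over indices.

```python
import re

# Assumed split: a match is (something ending in 2 0*) followed by (0* 2 ...).
SUFFIX_A = re.compile(r"20*\Z")  # the left window ends with a string in 2 0*
PREFIX_B = re.compile(r"0*2")    # the right window starts with a string in 0* 2


def index_test(x, i, ell):
    """Test the split index i using windows of length at most 2 * ell."""
    left = x[max(0, i - 2 * ell):i]
    right = x[i:i + 2 * ell]
    return (SUFFIX_A.search(left) is not None
            and PREFIX_B.match(right) is not None)


def infix_search(x):
    """Return a split index inside some match, or None if there is none."""
    ell = 1
    while True:
        # Stand-in for a Grover search over all split positions.
        for i in range(len(x) + 1):
            if index_test(x, i, ell):
                return i
        if ell >= len(x):
            return None  # the windows already covered the whole string
        ell *= 2         # exponential search: double the length scale
```

When a match has length at least the current scale, at least that many split indices pass the test, which is what lets the number of Grover iterations shrink as the windows (and the cost of each test) grow.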
4.1.2 Aperiodic monoids and Schützenberger’s proof
At its core, the algorithm for star-free languages uses Schützenberger’s characterization of star-free languages, which we recall from Section 2.
If language is recognized by a finite aperiodic monoid, then is star free.
We will show that Schützenberger’s proof can be modified to produce a $\tilde{O}(\sqrt{n})$ algorithm for any star-free language starting from the aperiodic monoid recognizing it. Central to this modification will be the notion of splitting introduced in the previous section. In this section we give the basic prerequisites and outline for Schützenberger’s proof which will eventually culminate in a formal justification of splitting based on the properties of aperiodic monoids.
Let be a finite aperiodic monoid recognizing some language . Recall that where is a surjective monoid homomorphism, and is some subset of the monoid. Thus, to show that is star free, it suffices to show that is star free for each .
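One of the central ideas in Schützenberger’s proof is to consider these languages in order of the size of the ideal[11] they generate. Formally, Schützenberger’s proof is an induction on the rank of $m$, defined as
\[ \operatorname{rank}(m) := |M \setminus MmM|, \]
that is, the number of elements not in $MmM$. For example, $\operatorname{rank}(1) = 0$. Rank is a particularly useful measure of progress in the induction due to the following proposition:
[11] Let $M$ be a monoid and $I \subseteq M$ be a subset. We say $I$ is a right ideal if $IM = I$, is a left ideal if $MI = I$, and is an ideal if $MIM = I$. For example, for any $m \in M$, $mM$ is a right ideal, $Mm$ is a left ideal, and $MmM$ is an ideal.
For any $m, n \in M$ we have $\operatorname{rank}(m), \operatorname{rank}(n) \leq \operatorname{rank}(mn)$.
$MmnM \subseteq MmM$, so $M \setminus MmM \subseteq M \setminus MmnM$. Therefore, $\operatorname{rank}(m) \leq \operatorname{rank}(mn)$. Similarly, $\operatorname{rank}(n) \leq \operatorname{rank}(mn)$. ∎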
It will turn out that only the identity of the monoid has rank 0. First, we show that a product of monoid elements is the identity if and only if every element is the identity.
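For elements $m_1, \ldots, m_k$ in an aperiodic monoid $M$, if $m_1 \cdots m_k = 1$ then $m_1 = \cdots = m_k = 1$.
It suffices to prove the result for $k = 2$ and induct. Suppose $m_1 m_2 = 1$, and then by repeated substitution,
\[ 1 = m_1 m_2 = m_1 (m_1 m_2) m_2 = \cdots = m_1^i m_2^i \]
for any $i \geq 1$. Since the monoid is aperiodic, there exists $i$ such that $m_1^i = m_1^{i+1}$. Therefore,
\[ m_1 = m_1 (m_1^i m_2^i) = m_1^{i+1} m_2^i = m_1^i m_2^i = 1. \]
By symmetry, $m_2$ is also the identity. ∎
Let $M$ be a finite aperiodic monoid. For any $m \in M$, $\operatorname{rank}(m) = 0$ iff $m = 1$.
Suppose that $\operatorname{rank}(m) = 0$ for some monoid element $m$. By the definition of rank, we have that $MmM = M$, and in particular $1 \in MmM$ implies $amb = 1$ for some $a, b \in M$. By Proposition 20, $m = 1$. ∎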
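The rank function and the three propositions above can be checked by brute force on a small example. The sketch below assumes a hypothetical five-element aperiodic monoid of transformations on {0, 1, 2}, generated by a saturating increment and a reset map. The monoid, the reading of rank as the number of elements outside the two-sided ideal generated by an element, and all names are illustrative assumptions.

```python
from itertools import product

IDENTITY = (0, 1, 2)
# Assumed generators: i -> min(i + 1, 2), and the constant map to 0.
GENERATORS = [(1, 2, 2), (0, 0, 0)]


def compose(f, g):
    """Monoid product: apply f first, then g."""
    return tuple(g[f[i]] for i in range(len(f)))


def generate_monoid(generators):
    """Close the generators under composition, starting from the identity."""
    elements = {IDENTITY}
    frontier = [IDENTITY]
    while frontier:
        m = frontier.pop()
        for g in generators:
            p = compose(m, g)
            if p not in elements:
                elements.add(p)
                frontier.append(p)
    return elements


M = generate_monoid(GENERATORS)


def power(x, k):
    """The k-fold product of x with itself (the identity when k is 0)."""
    result = IDENTITY
    for _ in range(k):
        result = compose(result, x)
    return result


def is_aperiodic():
    """Check x^k = x^(k+1) with k = |M|, which suffices in a finite monoid."""
    k = len(M)
    return all(power(x, k) == power(x, k + 1) for x in M)


def two_sided_ideal(m):
    """The set of all products a * m * b with a, b in M."""
    return {compose(compose(a, m), b) for a, b in product(M, repeat=2)}


def rank(m):
    """Number of monoid elements outside the ideal generated by m."""
    return len(M - two_sided_ideal(m))


def rank_is_monotone():
    """rank(m) and rank(n) never exceed rank(mn)."""
    return all(max(rank(m), rank(n)) <= rank(compose(m, n))
               for m, n in product(M, repeat=2))


def identity_products_are_trivial():
    """A product equals the identity only if both factors are the identity."""
    return all(m == IDENTITY and n == IDENTITY
               for m, n in product(M, repeat=2)
               if compose(m, n) == IDENTITY)
```

In this example the identity has rank 0, the increment map has rank 1, and the three constant maps have rank 2, so an induction on rank would handle the languages in that order.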
It is not hard to see that is star free. For , Schützenberger decomposes into a Boolean combination of star-free languages with strictly smaller rank, completing the proof. To avoid recapitulating all of Schützenberger’s proof, we simply quote the main decomposition theorem.
Theorem 22 (Decomposition Theorem).
For any ,
Furthermore, for all appearing in , , or , .
To see the decomposition theorem worked out on a small example, we refer the reader to Appendix C. Although Theorem 22 is sufficient to prove Schützenberger’s theorem, the same inductive approach does not immediately lead to a quantum algorithm for star-free languages. For example, it is not clear how to efficiently decide membership in given an algorithm for membership in .[12] In the next section, we will strengthen our induction hypothesis such that queries of this type are possible. Let us conclude this section with a splitting theorem based on Schützenberger’s notion of rank.
[12] We will show this is possible, but it requires that the language is regular. In general, a query algorithm for a language does not imply an query algorithm for . We have a counterexample: consider the language of strings of the form such that all are binary strings of the same length and for some . can be decided in queries by a Grover search. There is a clear reduction from element distinctness to , therefore is at least .
Let for monoid element . Then, splits as
Furthermore, for all elements of the union, .
We first verify equality. We have that since
Now, suppose . For any decomposition , we have that Let and . Therefore, and with . Finally, by Proposition 19 we get that . ∎
4.2 $\tilde{O}(\sqrt{n})$ algorithm for star-free languages
Recall that our objective is to create an algorithm for language , where is an arbitrary monoid element. We mimic Schützenberger’s proof of Theorem 5 by constructing algorithms for each in the order of the rank of . Implicit in such an argument is a procedure that must convert an efficient query algorithm for into an efficient query algorithm for for .
Notice that for , we have (by definition) that . That is, the prefix of the input string matching is not an arbitrary location in the string, but one of finitely many points in the string where the right ideal strictly decreases. We use this to our benefit in the following key lemma.
Let be a monoid homomorphism. Suppose there exists an membership algorithm for for any such that . Then, there exists an algorithm to test membership in for any and such that and .
Consider a string . The right ideal represents the set of monoid elements we could reach after reading . These right ideals descend as we read more of the string:
If , then there is some prefix in followed by an . By assumption, , so this is a point in the string where the right ideal strictly descends.
Notice that implies , and since we have , we conclude that . In other words, the right ideal descends from something containing (namely ), to something not containing (namely ).
To decide whether belongs to , it suffices to find the longest prefix such that contains . If and , then the string is in , otherwise there is no other possible prefix that could match , so the string is not in .
Define a new language where
This is precisely the language of strings/prefixes that could be extended to strings in . We can decide membership in with queries because implies and hence .
It is also clear that is prefix closed: if then , so as well. The empty prefix is in , and by binary search we can find the longest prefix in . Then, as discussed above, we complete the algorithm by checking whether the prefix is (i) in and (ii) followed by an . If so, then we report , otherwise . ∎
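The last step of this proof, a binary search for the longest prefix in a prefix-closed language followed by a check of the next character, can be sketched as follows. The concrete instance is an assumption chosen for illustration: strings consisting of a run of 0s, then a 2, then anything. In this instance the prefix-closed language is simply the all-zero strings and coincides with the language of the prefix itself, which need not hold in general. The oracle `in_prefix_language` is a hypothetical classical stand-in for the quantum membership algorithm.

```python
def in_prefix_language(x, k):
    """Stand-in membership oracle for the prefix x[:k]; here P = 0*."""
    return all(c == "0" for c in x[:k])


def longest_prefix(x, oracle):
    """Largest k with x[:k] in P, assuming P is prefix closed.

    The empty prefix is always in P, so the search starts from k = 0.
    """
    lo, hi = 0, len(x)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if oracle(x, mid):
            lo = mid
        else:
            hi = mid - 1
    return lo


def in_language(x):
    """Membership in the assumed language 0* 2 Sigma*."""
    k = longest_prefix(x, in_prefix_language)
    # Accept iff the longest prefix is followed by the required character.
    return k < len(x) and x[k] == "2"
```

Prefix closure is what makes the oracle answers monotone in the prefix length, so the binary search needs only a logarithmic number of oracle calls.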
We are now ready to state and prove our main theorem.
For any star-free language , there exists a quantum algorithm which solves membership in with queries and time.
Let for some homomorphism to an aperiodic finite monoid , and . We will show that there is an algorithm for each by induction on the rank of .
We can Grover search for a counterexample in time to decide membership in .
Now suppose is nonzero. Our main tool is Theorem 22, which decomposes into a Boolean combination of languages,
where are as they appear in that theorem statement. We will also make reference to sets from Theorem 22.
To give an algorithm for , it suffices to give an algorithm for each component of this Boolean combination: , , and . Since , , and are finite unions of simpler languages, it suffices to consider each language in the union separately.
The first component is , but we have already done most of the work for in Lemma 24. Recall
where . This gives us an time algorithm for . By symmetry, there also exists an algorithm for . Recall that is a finite set of characters, so membership in is decided by a Grover search for any of those characters.
The last component is , which consists of a union of languages of the form where . That is, and but . We can use Theorem 23 to split into
We hope to apply Lemma 24 to and (in reverse) , then use infix search (i.e., Theorem 18) to try to find a substring in , but first we need to verify that all the preconditions of these theorems are met—namely, that the rank of and are small, and and cause the ideal to descend.
but we know is in and not in , so we have a contradiction from the definition of . Hence, , and by a symmetric argument , so we have query algorithms for and from Lemma 24. It follows that there is an algorithm for as well. ∎
This finishes the main theorem for this section. See Algorithm 1 for pseudocode.