Walnut is a software package that implements a mechanical decision procedure for deciding certain combinatorial properties of some special words referred to as automatic words or automatic sequences. To learn more about automatic words and their applications, see [Allouche&Shallit:book]. To learn about decision procedures for automatic words, see Schaeffer’s Master’s thesis [Schaeffer:thesis] and the survey paper [Shallit:survey]. To read more about decidable properties of automatic words, refer to [Charlier&Rampersad&Shallit:enumeration]. To read about another software package that provided a similar mechanical decision procedure for automatic words, and was developed before Walnut, read Goc’s Master’s thesis [Goc:thesis]. To see applications of Walnut, refer to [paperfolding, tribonacci, fib1, fib2, fib3, balanced].
The aim of this article is to introduce Walnut and explain its core features. This article consists of four parts: basics, syntax, implementation, and the Walnut guide. In the first part, Section 2, we establish the basic notation and concepts. We go over words, automata, number systems, automatic words, and Presburger arithmetic. We learn what it means for an automaton to accept a predicate. We also learn how to automatically decide properties of automatic words.
The second part, Section 3, talks about the building blocks of predicates: constants, variables, operators, and different types of expressions. The semantics of predicates in Presburger arithmetic are well-known and are not explained, whereas semantic rules for calling and indexing, with which we extend the Presburger arithmetic to include automatic words, are explained in detail.
The third part, Sections 4 and 5, explains the decision procedure implemented in Walnut. The cross product of two automata, which is behind the construction of automata for all binary logical operators, is introduced. Building on that, we see how to construct automata for predicates from automata for subpredicates. In Section 5, we talk about two types of automata that do not appear often in Walnut, but are nevertheless important to understand.
The fourth and last part, Sections 6–8, starts with Walnut’s installation and goes over all of its commands, i.e., exit, eval, def, reg, and load. In Section 8, we learn how to manually define automata in text files. We also learn how to define new number systems.
If you are already familiar with the objects described in the first sentence of this introduction, you can skip Section 2 and come back to it only as a reference. For a more comprehensive treatment of the theory behind decision procedures for automatic words, refer to [Schaeffer:thesis, Shallit:survey, Charlier&Rampersad&Shallit:enumeration].
Since this article is more about Walnut than the theory behind it, when we explain the latter, we use Walnut’s notation as opposed to the more familiar mathematical notation. For example, we use & and A for conjunction and the universal quantifier, as opposed to $\wedge$ and $\forall$ of mathematical logic. (Users enter logical predicates in a terminal when they use Walnut. We find that entering LaTeX-like commands in the terminal, e.g., \forall, does not improve readability.) As another example, when we define structures such as number systems or objects such as automatic words, we give the definitions that are closer to Walnut’s capabilities than the most general theoretical ones possible. This will help the reader make a smoother transition from the theory to its application in Walnut.
You can download Walnut from Jeffrey Shallit’s website, or alternatively from GitHub. Walnut is written in Java and is open source. It is licensed under the GNU General Public License. We would appreciate it if users cite this article in their publications. For automata minimization and converting regular expressions to automata, Walnut relies on the automata library in [dk.brics]. We would greatly appreciate it if users report bugs to firstname.lastname@example.org. The author would like to thank Jeffrey Shallit for revising this article.
2.1 Words and Automata
A word $w$, for a finite (possibly empty) or infinite subset $I$ of the natural numbers $\mathbb{N}$, is a sequence of symbols $(w[i])_{i \in I}$ over a finite set $\Sigma$ called an alphabet. The set $I$ usually equals $\mathbb{N}$ or $\{0, 1, \ldots, n-1\}$ for some $n$. The sets of finite and infinite words over the alphabet $\Sigma$ are denoted by $\Sigma^*$ and $\Sigma^\omega$, respectively. The empty word is denoted by $\epsilon$. For a finite word $w$, the length $|w|$ is defined and equals $|I|$. We let $\Sigma^n$ denote the set of all words over $\Sigma$ of length $n$. A subword (sometimes called “factor” in the literature) is a finite and contiguous subsequence of a word. The subword of $w$ starting at position $i$ of length $n$ is denoted by $w[i..i+n-1]$. Many interesting properties of words can be expressed in terms of their subwords. For example, the property of having two equal and adjacent subwords, referred to as a square, is discussed in numerous papers in the area of combinatorics on words. The product of two words $x$ and $y$, denoted by $xy$, is the result of concatenating $x$ with $y$.
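There are cases where our words are defined over alphabets consisting of tuples of symbols, so let us fix our notation regarding these words. For a word $w$ over an alphabet $\Sigma_1 \times \cdots \times \Sigma_n$, we let the projection map $\pi_i(w)$ for $1 \le i \le n$ denote the word over $\Sigma_i$ obtained from $w$ by looking at the $i$’th coordinates, i.e., the words $\pi_i(w)$ are uniquely defined by
$$w[j] = (\pi_1(w)[j], \ldots, \pi_n(w)[j]) \quad \text{for all positions } j \text{ of } w.$$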
For example, for over we have and .
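As a small illustration of the projection map, here is a minimal Python sketch. The function name `project` and the example word are assumptions made only for this illustration; they are not part of Walnut.

```python
def project(word, i):
    """Return the i'th projection (1-indexed) of a word whose symbols are tuples."""
    return [symbol[i - 1] for symbol in word]

# Hypothetical example word over {0,1} x {0,1}, chosen only for illustration.
w = [(0, 1), (1, 1), (0, 0)]
first = project(w, 1)
second = project(w, 2)

# Zipping the projections back together recovers the original word,
# which is the sense in which the projections determine the word uniquely.
rebuilt = list(zip(first, second))
```

All projections of a word have the same length, which is why the relations discussed later consist of tuples of same-length words.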
The reader is probably familiar with the notions of deterministic and nondeterministic finite state automata. In Walnut, an automaton with $n$ inputs (input tapes) is an $(n+4)$-tuple $M = (Q, q_0, F, \delta, \Sigma_1, \ldots, \Sigma_n)$, where $Q$ is the (finite) set of states, $q_0 \in Q$ is the initial state, $F \subseteq Q$ is the set of final states, $\delta : Q \times (\Sigma_1 \times \cdots \times \Sigma_n) \to Q$ is the transition function, and $\Sigma_i$ is the alphabet of the $i$’th input (tape). The automaton’s alphabet is defined to be the cross product $\Sigma_1 \times \cdots \times \Sigma_n$, and the notions of accepting a word or a language over this alphabet are defined as usual. A nondeterministic automaton is defined similarly, except that the transition function is defined by $\delta : Q \times (\Sigma_1 \times \cdots \times \Sigma_n) \to 2^Q$. In Walnut and throughout this article, the $\Sigma_i$ are finite subsets of integers $\mathbb{Z}$.
Two automata are equal (isomorphic) if their underlying graphs are isomorphic. Two automata are equivalent if they accept the same language. There exists a determinization algorithm that converts a nondeterministic automaton to an equivalent deterministic automaton. There exists a minimization algorithm that converts an automaton to an equivalent automaton with the least number of states (which is unique up to isomorphism). It is known that extending the automata model by allowing multiple initial states (similar to how there can be multiple final states) does not add to the model’s expressiveness.
Next we extend the notion of accepting languages to relations, since the latter is more natural in Walnut:
Definition 1 (relations computed by automata).
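The relation computed/accepted by $M$ is defined by
$$R(M) = \{(\pi_1(w), \ldots, \pi_n(w)) : w \text{ is accepted by } M\}.$$
Since for every word $w$, the words $\pi_i(w)$ are all of the same length, the relation accepted by an automaton consists of tuples of words of the same length, i.e., we have
$$R(M) \subseteq \bigcup_{k \ge 0} \Sigma_1^k \times \cdots \times \Sigma_n^k.$$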
For example, the language accepted by the following automaton is , whereas the relation accepted is :
In other words, the automaton accepts tuples where and are representations of the same length, in the most-significant-digit-first binary system, of natural numbers and respectively. On the other hand, referring to the words in that are accepted by this automaton is not very descriptive. That is why, in this article, we prefer the relation (tuple) terminology over the language (word) terminology.
In almost all depictions of the underlying graphs of automata, such as the one in Figure 2.1, when a transition is not specified, it is assumed to be a transition to a dead state. In Walnut we do not store transitions to the dead state. Adding the dead state and all implicit transitions to it is called totalizing an automaton.
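Totalizing can be sketched in a few lines of Python. This is only an illustration under assumed data structures (a dictionary from (state, symbol) pairs to states, and a hypothetical state name `dead`); it does not reflect how Walnut stores automata internally.

```python
def totalize(states, alphabet, delta, dead="dead"):
    """Return (states, delta) with an explicit dead state and no missing transitions."""
    total_states = set(states) | {dead}
    total_delta = dict(delta)
    for state in total_states:
        for symbol in alphabet:
            # Every transition that was left implicit now goes to the dead state.
            total_delta.setdefault((state, symbol), dead)
    return total_states, total_delta

# Hypothetical partial automaton: only two transitions are stored.
partial = {("q0", 0): "q0", ("q0", 1): "q1"}
states, delta = totalize({"q0", "q1"}, [0, 1], partial)
```

The stored transitions are left untouched; only the missing ones are filled in, including the self-loops on the dead state.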
An automaton with output is a tuple $(Q, q_0, O, \delta, \Sigma_1, \ldots, \Sigma_n, \Gamma)$ where $Q, q_0, \delta, \Sigma_1, \ldots, \Sigma_n$ are as before, the set $\Gamma$ is the output alphabet, and, instead of a set of final states, we have a map $O : Q \to \Gamma$. The symbol $O(q)$ is called the output of the state $q$. An automaton with output can be thought of as an automaton that reads a word over $\Sigma_1 \times \cdots \times \Sigma_n$ and outputs whatever is the last state’s output. In Walnut, the output alphabet is a finite subset of integers. We can think of ordinary automata as a special case of automata with output by letting the set of final states be $\{q \in Q : O(q) \neq 0\}$. This is indeed how ordinary automata are stored in Walnut.
In the next section, we learn how to add more structure to alphabets by defining number systems. As we saw in the example, the automaton in Figure 2.1 accepts binary representations of numbers. In a moment we will extend our definition of automata to $(Q, q_0, F, \delta, S_1, \ldots, S_n)$, where the $S_i$ are number systems and concealed in them are alphabets among other things.
2.2 Number Systems
In any course on theory of computation, it is customary to talk about the representations of the objects an algorithm/Turing machine takes as inputs. At the core of Walnut are automata taking natural numbers as inputs, and doing various computations on them, so fixing a representation for natural numbers is essential. We could limit ourselves to binary representations. However, there are many interesting automata accepting representations in number systems other than the binary one. So we are going to define, in general terms, the concept of a number system. Walnut allows number systems to be defined and used (with a few restrictions to the general definition below).
Definition 2 (number systems).
A number system is a -tuple of alphabet , language of valid representations containing and at least one of or , and decoding function that assigns integers to every word in and for which is usually written as . The decoding function has the following additional properties:
if and only if
For all , either and for all , or and for all . The former is called an msd number system and the latter is called an lsd number system. (Here msd and lsd are short for most-significant-digit-first and least-significant-digit-first, respectively. However, this should not be taken literally in this definition, as one could define msd number systems, in the sense defined here, with no direct correspondence to the notion of most-significant-digit-first representation.)
For all positive , there exists for which and if is or if is . The word , if unique, is called the canonical encoding of in , and is sometimes denoted by . We let .
The addition relation is defined such that if and only if are of the same length and . The equality relation is defined such that if and only if and are of the same length and . The less than relation is defined as for which if and only if and are of the same length and . We adopt the in-order notation for , , and , i.e., we write , , and as opposed to the more cumbersome , , and respectively. It follows from the definition that for all , the set of representations of in , defined by is non-empty.
For example, the most-significant-digit-first binary system, denoted by msd_2, is defined by $(\{0,1\}, \{0,1\}^*, [\cdot]_{\texttt{msd\_2}})$ where
$$[w]_{\texttt{msd\_2}} = \sum_{i=0}^{|w|-1} w[i] \, 2^{|w|-1-i},$$
e.g., $[0110]_{\texttt{msd\_2}} = 6$. For msd_2, we are very fortunate to have simple automata computing all of its important aspects, namely, valid representations, the addition relation, the equality relation, and the less-than relation. See Figures 2.2, 2.3, 2.4, and 2.5 respectively.
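To make the decoding function and the same-length requirement of the addition relation concrete, here is a hedged Python sketch. The names `decode_msd_2` and `in_addition_relation` are hypothetical, and treating the empty word as a representation of 0 is an assumption of this sketch.

```python
def decode_msd_2(word):
    """Decode a most-significant-digit-first binary word given as a string."""
    value = 0
    for digit in word:
        if digit not in "01":
            raise ValueError("not a valid binary representation: " + word)
        value = 2 * value + int(digit)
    return value

def in_addition_relation(x, y, z):
    """True when x, y, z have the same length and [x] + [y] = [z]."""
    same_length = len(x) == len(y) == len(z)
    return same_length and decode_msd_2(x) + decode_msd_2(y) == decode_msd_2(z)

six = decode_msd_2("0110")
six_padded = decode_msd_2("000110")   # leading zeros do not change the value
```

Note that `in_addition_relation("11", "01", "100")` is false even though 3 + 1 = 4, because the three words do not have the same length; padding with leading zeros, as in `("011", "001", "100")`, repairs this.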
We can define the least-significant-digit-first binary system, denoted by lsd_2, in a similar way. In fact, we can define msd_k and lsd_k for all $k > 1$, and for all of them, there are simple automata computing valid representations, addition, equality, and less-than relations. More generally, we can define the following:
Definition 3 (number systems in Walnut).
Number systems for which the automata for representations, addition, equality, and less-than exist, and equality is the same as word equality, i.e., if and only if , are exactly the type of number systems one can define and use in Walnut. Note that the alphabet of a number system is restricted to finite subsets of due to the same restriction on automata in Walnut.
In addition to base- number systems, Walnut has a built-in definition for the Fibonacci number system.
The most-significant-digit-first Fibonacci system, denoted by , is defined by where
where is the ’th Fibonacci number given by , and for . For example, . The set of valid representations is exactly the set of binary words avoiding consecutive 1s. The avid reader might want to verify that msd_fib is a number system. There are automata computing all major aspects of msd_fib. For example, here is the automaton accepting (the automaton accepting has states, which is too big to be represented here):
In cases where an automaton’s inputs are representations of integers in some number system, which are by far the most important type of automata in Walnut, we would like to signify these number systems instead of the input alphabets. For example, we might write to mean . It should be understood that in these cases, if for a word input is not a valid representation in , it does not mean that the automaton’s behavior is not defined for . This just means that is, by default, not going to get accepted. The behaviors of both automata and automata with output that take representations of numbers in some number systems as inputs are defined for all words (even those not representing numbers in the given number systems).
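The Fibonacci system described above can be explored with a short Python sketch. It assumes the usual Zeckendorf convention in which the place values, read from the right, are 1, 2, 3, 5, 8, and so on; the function names are hypothetical, and the empty word is taken to represent 0.

```python
def is_valid_msd_fib(word):
    """Valid representations are binary words with no two consecutive 1s."""
    return set(word) <= {"0", "1"} and "11" not in word

def decode_msd_fib(word):
    """Decode under the assumed place values 1, 2, 3, 5, 8, ... (from the right)."""
    if not is_valid_msd_fib(word):
        raise ValueError("not a valid Fibonacci representation: " + word)
    value, place, next_place = 0, 1, 2
    for digit in reversed(word):
        if digit == "1":
            value += place
        place, next_place = next_place, place + next_place
    return value

def encode_msd_fib(n):
    """Greedy encoding without leading zeros; the empty word encodes 0."""
    places = [1, 2]
    while places[-1] <= n:
        places.append(places[-1] + places[-2])
    digits = []
    for place in reversed(places):
        if place <= n:
            digits.append("1")
            n -= place
        else:
            digits.append("0")
    return "".join(digits).lstrip("0")
```

Under these assumptions `decode_msd_fib("1001")` is 5 + 1 = 6, and encoding followed by decoding returns the original number.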
2.3 Automatic Words
An automatic word is a word in for which there exists a number system and an automaton with output for which reading outputs . In other words, for an automatic word, the symbol at position for all can be effectively computed by running an automaton with output on any single representation of in a number system. As usual we assume is a finite subset of .
The word $T$ for which the symbol at position $i$ is the number of 1s in any binary representation of $i$, modulo 2, is called the Thue-Morse word. The Thue-Morse word is well-defined since all the infinitely many different binary representations of an integer have the same number of 1’s. It is instantly clear that $T$ is an automatic word over msd_2 if one notes the automaton with output in Figure 2.7.
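The following Python sketch simulates a two-state automaton with output for the Thue-Morse word. The state names and the dictionary encoding are assumptions of this illustration, not Walnut's file format: the state records the parity of the 1s read so far, and the output of a state is that parity.

```python
# Transition function: reading a 1 flips the parity, reading a 0 keeps it.
DELTA = {(0, "0"): 0, (0, "1"): 1, (1, "0"): 1, (1, "1"): 0}
# Output map: each state outputs the parity it stands for.
OUTPUT = {0: 0, 1: 1}

def run_dfao(word, start=0):
    """Read a binary word and return the output of the last state reached."""
    state = start
    for digit in word:
        state = DELTA[(state, digit)]
    return OUTPUT[state]

def thue_morse(i):
    """Symbol of the Thue-Morse word at position i, via the binary representation of i."""
    return run_dfao(format(i, "b"))

prefix = [thue_morse(i) for i in range(8)]
```

Because reading a 0 never changes the state, padding a representation with leading zeros gives the same output, which matches the well-definedness remark above.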
In the introduction, we mentioned that Walnut decides some properties of automatic words. Recall from Section 2.1 that squares are non-empty words of the form $xx$. It is easy to see that $T$ has square subwords. The following predicate captures this property:
E i,n (n > 0 & (A j (j < n => T[i+j] = T[i+j+n])))
Walnut provides a decision procedure that takes predicates like this and decides whether they are true or false. Walnut does so by constructing automata for every subpredicate in the predicate above; see Section 2.4 for more details. It starts by constructing from the automaton in Figure 2.7 an automaton for subpredicate . This means (see Section 2.4) that is constructed so that it accepts tuples if and only if and substitutions , , and are satisfying . Walnut then using constructs an automaton for . The automaton takes two inputs representing the two free variables and in . Walnut continues by constructing the automaton for . In the end, Walnut returns true if accepts anything. The fact that ,, and exist is explained in Section 2.4. The details of how Walnut constructs these automata are explained in Section 4. The details of what comprises a valid predicate are explained in Section 3. To see more examples of the properties of the Thue-Morse word and their proofs, see Section 7.1.
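For intuition only, the property "the Thue-Morse word contains a square" can be checked by brute force on a finite prefix, as in the Python sketch below. This is not Walnut's method: Walnut builds automata and decides the statement for the whole infinite word, whereas this search can only find a witness. The function names and the search bound are assumptions of the sketch.

```python
def thue_morse(i):
    """Number of 1s in the binary representation of i, modulo 2."""
    return bin(i).count("1") % 2

def is_square_at(word, i, n):
    """True when the length-n blocks starting at positions i and i+n are equal and n > 0."""
    return n > 0 and all(word(i + j) == word(i + j + n) for j in range(n))

def find_square(word, limit):
    """Search positions and lengths below `limit`; return a witness (i, n) or None."""
    for n in range(1, limit):
        for i in range(limit):
            if is_square_at(word, i, n):
                return (i, n)
    return None

witness = find_square(thue_morse, 8)
```

The search returns the position and length of the first square it meets, here the block 11 at positions 1 and 2.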
We can extend the definition of automatic words to higher dimensions. The (-dimensional) automatic word
is an infinite word over for which there exist number systems and an automaton with output
for which reading , such that for all , outputs
2.4 Automata accepting Predicates
In Walnut, we are interested in automaton accepting same-length representations in number systems of integers satisfying some predicate . When this is the case we say that automaton accepts the predicate (or equivalently accepts relation of tuples satisfying ). We already saw a few examples of such automata in Figures 2.1–2.6. From [Buchi:1966], also see [Schaeffer:thesis], and as it will be proved again in Section 4, for predicate in Presburger arithmetic such an automaton always exists. Presburger arithmetic is the first-order theory of natural numbers, in which predicates consist of constants (natural numbers), variables over natural numbers, existential quantifiers, universal quantifiers, logical operators (conjunction, disjunction, negation, exclusive disjunction, implication, equivalence), arithmetic operators (addition, subtraction, multiplication and division by constants), and comparison operators (equality, less than, greater than, less than or equal, greater than or equal). (Presburger arithmetic in its formal definition recognizes only a minimal subset of constants and operators, but it is not difficult to show that all the other objects and operators we mentioned, e.g., multiplication by constants, do not add to the power of Presburger arithmetic and can be derived from that minimal set of objects. See Section 3.2 for more details. One thing to note here is that the subtraction $x - y$ exists only when there exists a non-negative number $z$ for which $x = y + z$.)
You can find the list of all operators in Table 3.1. This list has three operators, namely, reverse `, indexing [ ], and calling $, that are not allowed in Presburger arithmetic. By indexing we mean indexing into an automatic word, e.g., writing things like T[i]; see Section 3.6 for more details. In [Shallit:survey], [Charlier&Rampersad&Shallit:enumeration], [Schaeffer:thesis], and also in Section 4.6 we learn that Presburger arithmetic extended to include indexing is still decidable. In Section 3.7 we learn about calling and in Section 4.5 we learn that it is just syntactic sugar and does not add to the power of the extended Presburger arithmetic (one that includes indexing into automatic words). We learn about the reverse operation in Section 4.3. From here on, by “predicate” we mean a predicate over this extended Presburger arithmetic (extended to include indexing into automatic words) and until we see the proof in Section 4, we accept the fact that there exist automata accepting such predicates.
In Section 3 we formally define what constitutes a predicate, but first let us see a few examples:
We adopt the terminology of free variables from mathematical logic, i.e., a variable that is not bound to a quantifier (quantified). For example has no free variables, and can be regarded as a constant, in this case it is always true.
We have seen that, given a predicate , for any ordering of free variables and for every assignment of number systems to those variables, there exists an automaton accepting such a predicate, i.e., a tuple of same length words is accepted by if and only if the substitutions satisfy .
For example, consider the predicate . The automaton in Figure 2.1 accepts . Furthermore there exists automaton accepting tuples for which and substitutions , and are satisfying . There also exists an automaton accepting tuples for which and substitutions and are satisfying . By definition, both and also accept the predicate .
We would like to annotate predicates so that they contain information on number systems without ambiguity (we will see how shortly). For such an annotated predicate and the ordering on free variables, there exists a unique minimized automaton accepting the predicate. We denote this unique automaton by
The ordering we fix on variables, in Walnut and throughout this article, is the lexicographic ordering on the variables’ names.
The following are examples of annotated predicates (names for variables, words, and automata in Walnut start with a letter and can contain alphanumerics and underscores, so to distinguish number system annotations in a predicate we use the prefix ?):
From the annotated predicate we understand that should all be interpreted in and should be interpreted as . Hence is the automaton accepting representations of and as its first and second inputs respectively. Also from annotation ?msd_fib in it is clear what to expect from automaton .
We can annotate a predicate with multiple number systems, e.g., see Figure 2.9. Here are the rules with which we assign number systems to constants, variables, and operators in a predicate:
If ?S appears outside all parentheses and brackets, then the number system is effective from the place it occurs in the predicate to the end of predicate.
If none of the rules above applies, the number system is assumed to be msd_2 by default.
It is assumed that the number systems do not contradict each other, i.e., a single variable cannot have two different number systems in one predicate, and all operands of an arithmetic or comparison operator must belong to the same number system.
Note how this automaton fails to accept for any . This is obviously due to the fact that does not have a representation of length in . So we stress again that when we say automaton accepts predicate , we mean that accepts all (tuples of) equal length representations of satisfying . Therefore this example conforms to the definition.
Let us see an example of an automaton having multiple number systems. Figure 2.9 depicts the automaton .
3 Syntax and Semantics of Predicates in Walnut
We mentioned in earlier sections that all input and output alphabets of automata are subsets of integers in Walnut. Specifically, for any automatic word $W$, we can assume $W[i]$ is an integer.
3.2 Arithmetic and Alphabetic Constants
Arithmetic constants in a predicate are allowed to be natural numbers only. There is, however, another type of constant: the alphabetic constant. Alphabetic constants are useful when referring to symbols at particular positions in automatic words. For example, the predicate that accepts positions i for which the automatic word T is 1 is written as T[i] = @1. In order to draw the distinction between alphabetic and arithmetic constants, we write alphabetic constants with a prefix of @. The reason we call these constants alphabetic (as opposed to arithmetic) is that Walnut does not allow (and it does not make much sense to allow) predicates that compare indexing expressions (Section 3.6) with arithmetic expressions (Section 3.5), e.g., expressions such as T[i] = 1 are not allowed. As we will see in Section 3.8, the only objects that can be compared with indexing expressions are alphabetic constants and indexing expressions themselves.
A variable’s name must start with a letter and can contain upper- and lower-case alphanumerics and underscores. A variable’s name cannot be E or A.
The full list of operators allowed in predicates can be found in Table 3.1. (We prefer this notation to those familiar from mathematical logic, because we want to liken our notation to those of programming languages, as Walnut is ultimately a programming language.) This list has operator precedences. The lower this number is, the higher the precedence is. For example, multiplication by a constant has the highest precedence. Parentheses override all precedences. All operators are associative from left to right, except for complement ~, reverse `, quantifiers E and A, calling $, and indexing [ ], which are all associative from right to left.
|Precedence|Operator|Description|Examples|
|1|*|multiplication by a constant|2*x and x*2|
|1|/|division by a constant|x/2 but not 2/x|
|3|<=|less than or equal| |
|3|>=|greater than or equal| |
3.5 Arithmetic Expressions
The permissible arithmetic operators are +, -, *, and /. Equality is not an arithmetic operator. A constant expression is an expression involving only constants and arithmetic operators that evaluates to a natural number, e.g., but not nor . An arithmetic expression is defined recursively in the usual way:
A constant expression is an arithmetic expression, e.g., ,,, but not .
A variable is an arithmetic expression, e.g., ,etc.
For an arithmetic expression $x$, the expression $(x)$ is also arithmetic.
For arithmetic expressions $x$ and $y$, both $x+y$ and $x-y$ are arithmetic expressions.
For a variable $x$ and a constant expression $c$, all of $c*x$, $x*c$, and $x/c$ are arithmetic expressions.
For an arithmetic expression $x$ and a constant expression $c$, all of $c*(x)$, $(x)*c$, and $(x)/c$ are arithmetic expressions.
An arithmetic expression on its own is not a predicate, and it is not meaningful to talk about an automaton accepting an arithmetic expression. For example, talking about an automaton accepting makes sense, while talking about an automaton accepting is not meaningful. Walnut reports an error if the user tries to construct an automaton for an arithmetic expression.
See Section 4.4 for how Walnut constructs automata for valid predicates like
where the and are variables or arithmetic constants, are arithmetic operators, and is a comparison operator.
3.6 Indexing Expressions and Their Semantic Rules
For an -dimensional automatic word , an indexing expression is where the are either arithmetic expressions or predicates with one free variable.
An indexing expression on its own is not a valid predicate, and it is not meaningful to talk about automata accepting indexing expressions. The smallest predicates involving indexing expressions are defined in Section 3.8 and they involve comparison operators.
We use indexing expressions to refer to positions indicated by . The semantics of predicates involving indexing expressions can be derived from the following rule:
Definition 4 (semantic rule regarding indexing).
Suppose automatic word , expressions where the are either arithmetic expressions or predicates with one free variable, free variables occurring in the , and an alphabetic constant are given. Predicate is satisfied by substitutions for all , if all of the following hold:
If is an arithmetic expression, then is the value of the when evaluated at for all .
If is a predicate with one free variable, then it is satisfied by substitutions for all . Let equals when is the free variable in .
The symbol equals .
Having this rule, coming up with similar rules for other comparison operators, e.g., , and even predicates involving comparison of two automatic words, e.g., , should be straightforward. Recall that alphabetic constants are ordered just like integers.
3.7 Calling Expressions and Their Semantic Rules
For an automaton with inputs a calling expression is where the are either arithmetic expressions or predicates with one free variable. For such an expression, we say that is called with arguments . A calling expression on its own is a valid predicate, as we will see in Section 3.8.
Definition 5 (semantic rule regarding calling).
Suppose is the automaton for some predicate . Suppose expressions where the are either arithmetic expressions or predicates with one free variable, and free variables occurring in the are given. Predicate is satisfied by substitutions for all , if all of the following hold:
If is an arithmetic expression, then is the value of when evaluated at for all .
If is a predicate with one free variable, then it is satisfied by substitutions for all . Let equals when is the free variable in .
is satisfied by substitutions for all .
3.8 Relative Expressions
Comparison operators are =, !=, <, >, <=, and >=. A relative expression is any of the following:
An expression where and are arithmetic expressions and is any comparison operator.
An expression where and are indexing expressions and/or alphabetic constants and is any comparison operator.
A calling expression is a relative expression.
We stress that is not a relative expression based on the definition above, since is an indexing expression and is an arithmetic expression. We will see shortly that any relative expression is a predicate. Section 4.4 explains how to construct automata accepting relative expressions.
A predicate is an expression formed from relative expressions and logical operators:
Every relative expression is a predicate.
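For every predicate P, all of (P), ~P, and `P are predicates.
For all predicates P and Q, all of P & Q, P | Q, P ^ Q, P => Q, and P <=> Q are predicates.
For every predicate P and free variables $x_1, \ldots, x_n$, both E $x_1, \ldots, x_n$ P and A $x_1, \ldots, x_n$ P are predicates.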
The semantic rules with which we assign true and false values to predicates defined here can be obtained by adding the semantic rules for indexing and calling to the well-known semantics of first-order logic and Presburger arithmetic.
4 Decision Procedure: Walnut’s Implementation
In this section, we learn about a procedure that takes a predicate and constructs an automaton accepting that predicate. The procedure explained here is what is implemented in Walnut, and we shall call it the decision procedure.
For every defined number system, Walnut knows the automata for valid representations, addition, equality, and less-than predicates/relations. Every predicate is ultimately built out of these four predicates using logical operators. So we only need to explain the construction of automata for complex predicates from automata for simpler subpredicates. We start by explaining the cross product in Section 4.1, which is the core object when constructing automata for predicates formed from binary logical operators, i.e., &, |, ^, =>, and <=>. Then we move on to quantification in Section 4.2, explaining the construction of automata for predicates formed from the E and A operators. In Section 4.3, we discuss the construction of automata for the complement ~ and reverse ` operators. With these tools at our disposal, we are on the right track to construct automata for complex predicates formed from comparison and arithmetic operators, e.g., +, -, =, <=, etc., which we explain in Section 4.4.
4.1 Cross Product
Let and be the automaton and respectively. Let us assume that if then . Let where be the union of and and further assume that the are appearing in lexicographic order. Depending on whether or , let denote or respectively. Then the cross product of and denoted by is the tuple
where the transition function is defined to be
for equals or depending on whether or respectively. Note that is not an automaton since a set of final states is not specified. For , let denote the automaton .
For , the automaton accepts predicate . Furthermore, minimizing , we obtain automaton .
Based on the definition for cross product, for to be defined, the same variables in and have to have the same number systems assigned in and . But that is exactly the same condition that needs to hold for number system annotations in to be consistent (in the sense defined in the last bullet of the list of rules in Section 2.4).
Let and such that and where and are all equal and whenever . Let such that or depending on whether or .
We have the following equivalent statements:
There is a path from to in reading .
There is a path from to in reading , and there is a path from to in reading .
accepts and accepts .
is satisfied by substituting for all , and is satisfied by substituting for all .
is satisfied by substituting .
Obviously both the construction of cross product and minimizing automata can be carried out using algorithmic procedures. Therefore Theorem 6 gives us a procedure for constructing the automaton for conjunction.
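The construction for conjunction can be sketched in Python as follows. This is a simplified illustration, not Walnut's implementation: it assumes both automata read the same single input (so no merging of variable lists is needed), it stores automata as plain dictionaries, it treats missing transitions as transitions to the dead state, and it omits the final minimization step. All names are hypothetical.

```python
from itertools import product

def accepts(M, word):
    """Run a partial deterministic automaton; a missing transition means rejection."""
    state = M["start"]
    for symbol in word:
        state = M["delta"].get((state, symbol))
        if state is None:
            return False
    return state in M["finals"]

def conjunction(A, B, alphabet):
    """Reachable part of the cross product, with pairs of final states as final."""
    start = (A["start"], B["start"])
    delta, seen, todo = {}, {start}, [start]
    while todo:
        p, q = todo.pop()
        for symbol in alphabet:
            p2 = A["delta"].get((p, symbol))
            q2 = B["delta"].get((q, symbol))
            if p2 is None or q2 is None:
                continue  # one side is dead, so the conjunction is dead too
            delta[((p, q), symbol)] = (p2, q2)
            if (p2, q2) not in seen:
                seen.add((p2, q2))
                todo.append((p2, q2))
    finals = {(p, q) for (p, q) in seen if p in A["finals"] and q in B["finals"]}
    return {"start": start, "finals": finals, "delta": delta}

# A: words that are empty or end in 0.  B: words with no two consecutive 1s.
A = {"start": "e", "finals": {"e"},
     "delta": {("e", "0"): "e", ("e", "1"): "o", ("o", "0"): "e", ("o", "1"): "o"}}
B = {"start": "a", "finals": {"a", "b"},
     "delta": {("a", "0"): "a", ("a", "1"): "b", ("b", "0"): "a"}}
both = conjunction(A, B, "01")

all_words = ["".join(p) for n in range(7) for p in product("01", repeat=n)]
agree = all(accepts(both, w) == (accepts(A, w) and accepts(B, w)) for w in all_words)
```

For other binary operators, such as disjunction, skipping dead transitions is no longer sound, so both automata would first have to be totalized and the set of final pairs chosen according to the operator.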
With proper definitions for , we have similar theorems for when is any other binary logical operator.
Recall that transitions not depicted are transitions to a dead state. The cross product operation is depicted below:
Making a final state, minimizing, and renaming the states, we get the automaton in Figure 4.4.
In this section we learn how to construct an automaton from automaton . Let be the automaton and let be the predicate . We first construct the nondeterministic automaton
from by eliminating the ’th input (coordinate) on all transitions, i.e., letting
For example, letting be the automaton depicted in Figure 4.4, the automaton is depicted as follows:
By the definition of transition function of , i.e., , it is easy to see that if accepts , then accepts
However, there might be where the are equal for all and substitutions for all , satisfies but does not accept . In other words, there are cases where does not accept .
In our example accepts for all , and as it is clear accepts for all . However does not accept , whereas should be accepted by any automaton accepting .
Therefore, we have to do more work on , to get to an automaton for . However as we will see in Lemma 1, the automaton might only miss an insignificant portion of accepted tuples of an automaton accepting . These insignificant tuples missed by are those with leading or trailing zeros. The good news is that with a little bit of technical work, it is possible to revive even these insignificant tuples.
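The coordinate-elimination step, and the way it can miss short representations, can be seen in a small Python sketch. The example automaton (accepting same-length msd_2 pairs (x, y) with x < y), the state names, and the helper functions are assumptions made for this illustration; the repair for the missed tuples is not shown.

```python
def eliminate_coordinate(delta, k):
    """Drop coordinate k (0-indexed) from every transition label; the result is an NFA."""
    nfa = {}
    for (state, symbol), target in delta.items():
        reduced = symbol[:k] + symbol[k + 1:]
        nfa.setdefault((state, reduced), set()).add(target)
    return nfa

def nfa_accepts(nfa, start, finals, word):
    """Standard subset simulation of a nondeterministic automaton."""
    current = {start}
    for symbol in word:
        current = set().union(*(nfa.get((q, symbol), set()) for q in current))
    return bool(current & finals)

# Hypothetical automaton for x < y on same-length msd_2 pairs (x digit, y digit).
# State "eq": the prefixes read so far are equal.  State "lt": x is already smaller.
LESS_THAN = {
    ("eq", (0, 0)): "eq", ("eq", (1, 1)): "eq", ("eq", (0, 1)): "lt",
    ("lt", (0, 0)): "lt", ("lt", (0, 1)): "lt",
    ("lt", (1, 0)): "lt", ("lt", (1, 1)): "lt",
}
projected = eliminate_coordinate(LESS_THAN, 1)   # eliminate y

padded = nfa_accepts(projected, "eq", {"lt"}, [(0,), (1,)])   # x = 1 written as 01
short = nfa_accepts(projected, "eq", {"lt"}, [(1,)])          # x = 1 written as 1
```

Every natural number x has some y with x < y, yet the projected automaton rejects the one-digit representation of 1 because no one-digit y is larger. The same number is accepted once it is padded with a leading zero, which is the kind of tuple the discussion above says must be recovered.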
Let ,,, and be as in the discussion above, and suppose is some tuple of same length words. If is satisfied with substitutions for , then there exists a constant and such that for all we have