Regular Model Checking Revisited (Technical Report)

by Anthony W. Lin, et al.

In this contribution we revisit regular model checking, a powerful framework that has been successfully applied for the verification of infinite-state systems, especially parameterized systems (concurrent systems with an arbitrary number of processes). We provide a reformulation of regular model checking with length-preserving transducers in terms of existential second-order theory over automatic structures. We argue that this is a natural formulation that enables us to tap into powerful synthesis techniques that have been extensively studied in the software verification community. More precisely, in this formulation the first-order part represents the verification conditions for the desired correctness property (for which we have complete solvers), whereas the existentially quantified second-order variables represent the relations to be synthesized. We show that many interesting correctness properties can be formulated in this way, examples being safety, liveness, bisimilarity, and games. More importantly, we show that this new formulation allows new interesting benchmarks (and old regular model checking benchmarks that were previously believed to be difficult), especially in the domain of parameterized system verification, to be solved.




1 Introduction

Verification of infinite-state systems has been an important area of research in the past few decades. In the late 1990s and early 2000s, an important stride advancing the verification of infinite-state systems was made when an elegant, simple, but powerful framework for modelling and verifying infinite-state systems, dubbed regular model checking (e.g. [1, 25, 12, 2, 3, 27, 46]), was developed.

Regular model checking, broadly construed, is the idea of reasoning about infinite-state systems using regular languages as symbolic representations. This means that configurations of the infinite-state system are encoded as finite words over some finite alphabet $\Sigma$, while other important infinite sets (e.g. of initial and final configurations) are represented as regular languages over $\Sigma$. The transition relation of the system is then represented by a finite-state transducer of some sort.

Example 1

As a simple illustration, consider a unidirectional token-passing protocol with $n$ processes arranged in a linear array. Here $n$ is a parameter: the correctness property has to hold regardless of its value (so long as it is a positive integer). This is also one reason why such systems are referred to as parameterized systems. Multiple tokens might exist at any given time, but at most one is held by a process. At each point in time, a process holding a token can pass it to the process to its right. If a process holding a token receives a token from its left neighbor, then it discards one of the two tokens. Each configuration of the system can be encoded as a word $a_1 \cdots a_n$ over $\Sigma = \{\top, \bot\}$, where $a_i = \top$ (resp. $a_i = \bot$) denotes that process $i$ holds (resp. does not hold) a token. The set of all configurations is, therefore, $\Sigma^*$, i.e., a regular language. Various correctness properties can be stated for this system. An example of a safety property is that if the system starts with a configuration in $\bot^*\top\bot^*$ (i.e. with only one token), then it will never visit a configuration in $\Sigma^*\top\Sigma^*\top\Sigma^*$ (i.e. with at least two tokens). An example of a liveness property is that it always terminates with configurations in the regular set $\bot^*\top$. ∎
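To make Example 1 concrete, the following Python sketch simulates the protocol for one fixed $n$ and checks the safety property by exhaustive exploration, which is possible because each fixed-length slice of the system is finite. It is an illustration only, not the RMC method: the function names (`successors`, `reachable`, `one_token`, `check_safety`) are hypothetical, and the letters ⊤ and ⊥ are represented as one-character Python strings.

```python
from collections import deque

TOP, BOT = "⊤", "⊥"  # process holds / does not hold a token

def successors(w):
    """One-step successors of configuration w (a word over {⊤, ⊥})."""
    result = set()
    for i in range(len(w) - 1):
        if w[i] == TOP:
            # Process i passes its token to the right; if process i + 1 already
            # holds a token, one of the two tokens is discarded.
            result.add(w[:i] + BOT + TOP + w[i + 2:])
    return result

def reachable(start):
    """Breadth-first exploration; terminates since word lengths are preserved."""
    seen, todo = set(start), deque(start)
    while todo:
        for v in successors(todo.popleft()):
            if v not in seen:
                seen.add(v)
                todo.append(v)
    return seen

def one_token(n):
    """The initial configurations of length n, i.e. the words in ⊥*⊤⊥*."""
    return {BOT * i + TOP + BOT * (n - i - 1) for i in range(n)}

def check_safety(n):
    """True iff no configuration with two or more tokens is reachable for this n."""
    return all(w.count(TOP) <= 1 for w in reachable(one_token(n)))
```

A check of this kind covers only finitely many values of $n$; regular model checking aims to establish the property for all $n$ at once.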

This basic idea of regular model checking was already present in the work of Pnueli et al. [27] and Boigelot and Wolper [46]. The term “regular model checking” was coined by Abdulla et al. [12]. A lot of the initial work in regular model checking focussed on developing scalable algorithms (mostly via acceleration and widening) for verifying safety, while going beyond safety (e.g. to liveness) unfortunately posed a significant challenge; see [3, 44]. It is now 20 years since the publication of the seminal paper [12] on regular model checking. The area of computer-aided verification has undergone some paradigm shifts, including the rise of SAT-solvers and SMT-solvers (e.g. see the textbooks [13, 28]), as well as synthesis algorithms [5]. Regular model checking has also been affected by these developments. In 2013, Neider and Jansen [36] proposed an automata synthesis algorithm for verifying safety in regular model checking that uses SAT-solvers to guide the search for an inductive invariant. This new way of looking at regular model checking has inspired a new class of regular model checking algorithms, which could solve old regular model checking benchmarks that no previously known technique could solve automatically (e.g. liveness, even for probabilistic distributed protocols [34, 30]), as well as new correctness properties (e.g. safety games [37] and probabilistic bisimulation with applications to proving anonymity [24]). Despite these recent successes, these techniques are rather ad hoc, and often difficult to adapt to new correctness properties.


We provide a new and clean reformulation of regular model checking inspired by deductive verification. More precisely, we show how to express RMC as satisfaction of existential second-order logic (ESO) formulas over automatic structures. Among other things, this new framework puts virtually all interesting correctness properties (e.g. safety, liveness, safety games, bisimulation, etc.) in regular model checking under one broad umbrella. We provide new automata synthesis algorithms for solving any regular model checking problem that is expressed in this framework.

In deductive verification, we encode correctness properties of a program as formulas in some (first-order) logic, commonly called verification conditions, and then check the conditions using a theorem prover. This approach provides a clean separation of concerns between generating and checking “correctness proofs,” and underlies several verification methodologies and systems, for instance deductive verifiers (such as Dafny [29] or KeY [4]) and termination checkers (e.g., AProVE [21] or T2 [14]). For practical reasons, the most attractive case is of course the one where all verification conditions can be kept within decidable theories. We propose to use first-order logic over universal automatic structures [9, 10, 8, 15] as the decidable theory for expressing the verification conditions. Furthermore, we show that the correctness properties can be expressed as satisfaction of ESO formulas over automatic structures, where the second-order variables express the existence of proofs such that the verification conditions are satisfied. Finally, we show that restricting to regular proofs (i.e. proofs that can be expressed by finite automata) is sufficient in practice, and allows us to obtain powerful verification algorithms that unify the recent successful automata synthesis algorithms [36, 34, 30, 24] for safety, liveness, reachability games, and other interesting correctness properties.


Section 2 contains preliminaries. We provide our reformulation of regular model checking in terms of existential second-order logic (ESO) over automatic structures in Section 3. We provide a synthesis algorithm for solving formulas in ESO over automatic structures in Section 4. We conclude in Section 5 with research challenges.

2 Preliminaries

2.1 Automata

We assume basic familiarity with finite automata (e.g. see [40]). We use $\Sigma$ to denote a finite alphabet. In this paper, we exclusively deal with automata over finite words, but the framework and techniques extend to other classes of structures (e.g. trees) and finite automata (e.g. finite tree automata). An automaton $\mathcal{A}$ over $\Sigma$ is a tuple $\mathcal{A} = (Q, \Delta, q_0, F)$, where $Q$ is a finite set of states, $\Delta \subseteq Q \times \Sigma \times Q$ is the transition relation, $q_0 \in Q$ is the initial state, and $F \subseteq Q$ is the set of final states. Our automata are thus non-deterministic by default. The notion of (accepting) runs of $\mathcal{A}$ on an input word $w = a_1 \cdots a_n$ is standard, i.e., a function $\rho : \{0, \ldots, n\} \to Q$ such that $\rho(0) = q_0$, $\rho(n) \in F$, and $(\rho(i-1), a_i, \rho(i)) \in \Delta$ for all $1 \le i \le n$. We use $\mathcal{L}(\mathcal{A})$ to denote the language (i.e. subset of $\Sigma^*$) accepted by $\mathcal{A}$.
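The definition of runs translates directly into a search for an accepting run. In the sketch below (an illustration under our own encoding assumptions, not code from the paper), $\Delta$ is a Python set of triples `(p, a, q)`, and `accepting_run` returns the run $\rho(0), \ldots, \rho(n)$ as a list, or `None` if no accepting run exists. The example automaton `ENDS_AB`, accepting the words over $\{a, b\}$ that end in $ab$, is hypothetical.

```python
from typing import Hashable, List, Optional, Set, Tuple

State = Hashable
Transition = Tuple[State, str, State]

def accepting_run(delta: Set[Transition], q0: State, final: Set[State],
                  word: str) -> Optional[List[State]]:
    """Return rho with rho[0] == q0, rho[n] in final and
    (rho[i-1], word[i-1], rho[i]) in delta for all i, or None."""
    # parent[i] maps each state reachable after i letters to one predecessor.
    parent = [{q0: None}]
    for i, letter in enumerate(word):
        layer = {}
        for (p, a, q) in delta:
            if a == letter and p in parent[i] and q not in layer:
                layer[q] = p
        parent.append(layer)
    ends = [q for q in parent[-1] if q in final]
    if not ends:
        return None
    run = [ends[0]]
    for i in range(len(word), 0, -1):  # walk backwards to reconstruct rho
        run.append(parent[i][run[-1]])
    return run[::-1]

ENDS_AB = {(0, "a", 0), (0, "b", 0), (0, "a", 1), (1, "b", 2)}
```

For example, `accepting_run(ENDS_AB, 0, {2}, "aab")` yields the run `[0, 0, 1, 2]`.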

2.2 Regular Model Checking

Regular Model Checking (RMC) is a generic symbolic framework for modelling and verifying infinite-state systems [25, 12, 3]. The basic principle behind the framework is to use finite automata to represent both an infinite-state system and witnesses for a correctness property. For example, an infinite set $S$ of states can be represented as a regular language over $\Sigma$. How do we represent a transition relation $R \subseteq \Sigma^* \times \Sigma^*$? In the basic setting (as described in the seminal papers [25, 12]), we can use length-preserving transducers for representing $R$. A length-preserving transducer is simply an automaton $\mathcal{A}$ over the alphabet $\Sigma \times \Sigma$. Given an input tuple $(w, v) \in \Sigma^* \times \Sigma^*$ with $w = a_1 \cdots a_n$ and $v = b_1 \cdots b_n$, acceptance of $(w, v)$ by $\mathcal{A}$ is defined to be the acceptance of the “product” word $(a_1, b_1) \cdots (a_n, b_n)$ by the automaton $\mathcal{A}$. In this way, a transition relation can now be represented by an automaton.
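Because a length-preserving transducer is just an automaton over $\Sigma \times \Sigma$, acceptance of a pair $(w, v)$ can be checked by running an ordinary NFA on the zipped product word. The sketch below assumes transitions are Python triples whose letters are pairs; the three-state transducer `SWAP` for the rewrite relation of Example 2 (one occurrence of 10 becomes 01) is our own hypothetical construction.

```python
def nfa_accepts(delta, init, final, word):
    """NFA acceptance by subset simulation; letters may be any hashable values."""
    current = {init}
    for letter in word:
        current = {q2 for (q1, a, q2) in delta if q1 in current and a == letter}
    return bool(current & final)

def transducer_accepts(delta, init, final, w, v):
    """A length-preserving transducer is an automaton over pairs of letters: it
    accepts (w, v) iff |w| = |v| and it accepts the product word of w and v."""
    return len(w) == len(v) and nfa_accepts(delta, init, final, list(zip(w, v)))

# State 0 copies letters before the rewrite, state 1 has just read (1, 0),
# and state 2 (accepting) copies the letters after the rewrite.
SWAP = {(0, ("0", "0"), 0), (0, ("1", "1"), 0), (0, ("1", "0"), 1),
        (1, ("0", "1"), 2), (2, ("0", "0"), 2), (2, ("1", "1"), 2)}
```

For instance, `transducer_accepts(SWAP, 0, {2}, "110", "101")` holds, whereas `("1010", "0101")` is rejected because it would require two rewrites in one step.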

In this paper, we will deal mostly with systems whose transition relations can be represented by length-preserving transducers. This is not a problem in practice, because length-preserving transducers already suffice for many applications, including reasoning about distributed algorithms (arguably the most important class of applications of RMC), where the number of processes is typically fixed at runtime. That said, we will show how to easily extend the definition to non-length-preserving relations (called automatic relations [9, 10, 8, 15]), since they are needed in our decidable logic. This is done by the standard trick of padding the shorter string with a special padding symbol $\# \notin \Sigma$. More precisely, given two words $w_1 = a_1 \cdots a_n$ and $w_2 = b_1 \cdots b_m$, we define the convolution $w_1 \otimes w_2$ to be the word $c_1 \cdots c_k$ over $(\Sigma \cup \{\#\})^2$, where $k = \max(n, m)$, such that $c_i = (a'_i, b'_i)$ for all $1 \le i \le k$, with $a'_i = a_i$ for $i \le n$ and $a'_i = \#$ for $i > n$, and $b'_i = b_i$ for $i \le m$ and $b'_i = \#$ for $i > m$. For example, $aab \otimes ba$ is the word $(a, b)(a, a)(b, \#)$. Whether $(w_1, w_2)$ is accepted by $\mathcal{A}$ is now synonymous with acceptance of $w_1 \otimes w_2$ by $\mathcal{A}$. In this way, transition relations that relate words of different lengths can still be represented using finite automata.
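The convolution with padding is short to implement. A minimal Python sketch, assuming `#` as the padding symbol and words given as strings:

```python
PAD = "#"  # padding symbol, assumed not to occur in the alphabet

def convolution(w1: str, w2: str, pad: str = PAD) -> list:
    """w1 ⊗ w2: pad the shorter word with `pad`, then pair letters position-wise."""
    k = max(len(w1), len(w2))
    return list(zip(w1.ljust(k, pad), w2.ljust(k, pad)))
```

For words of equal length no padding occurs, so the convolution coincides with the product word used for length-preserving transducers.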

2.3 Weakly-Finite Systems

In this paper, we will restrict ourselves to transition systems whose domain $D$ is a regular subset of $\Sigma^*$ and whose transition relations can be described by length-preserving transducers. In particular, since $\Sigma$ is finite, from any given configuration $w$ of the system only finitely many configurations are reachable (in fact, at most $|\Sigma|^{|w|}$ configurations). Such transition systems (which can be infinite, but where the number of configurations reachable from any given configuration is finite) are typically referred to as weakly-finite systems [19]. As we previously mentioned, this restriction is not a big problem in practice, since many practical examples (including those from distributed algorithms) can be captured. The restriction is, however, useful for developing a clean framework that is unencumbered by many extra assumptions and at the same time captures a lot of interesting correctness properties.

2.4 Existential Second-Order Logic

In this paper, we will use Existential Second-Order Logic (ESO) to reformulate RMC. Second-order logic (e.g. see [31]) is an extension of first-order logic by quantification over relations. Let $\sigma$ be a vocabulary consisting of relations (i.e. a relational vocabulary). Relational variables will be denoted by capital letters $R_1, R_2$, etc. Each relational variable $R_i$ has an arity $r_i$. ESO over $\sigma$ is simply the fragment of second-order logic over $\sigma$ consisting of formulas of the form

$$\exists R_1 \cdots \exists R_m.\; \varphi,$$

where $\varphi$ is a first-order formula over the vocabulary $\sigma \cup \{R_1, \ldots, R_m\}$, and each $R_i$ is a relation symbol of arity $r_i$. Given a structure $\mathfrak{S}$ over $\sigma$ and an ESO formula $\psi$ (as above), checking whether $\mathfrak{S} \models \psi$ amounts to finding relations $R_1, \ldots, R_m$ over the domain of $\mathfrak{S}$ such that $\varphi$ is satisfied (with the standard definition of first-order logic); in other words, extending $\mathfrak{S}$ to a structure $\mathfrak{S}'$ over $\sigma \cup \{R_1, \ldots, R_m\}$ such that $\mathfrak{S}' \models \varphi$.
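For a finite structure, the definition of ESO satisfaction can be checked literally by enumerating all candidate relations and evaluating the first-order part. The sketch below does this for the textbook ESO sentence for 2-colourability of a graph, $\exists C.\, \forall x\, \forall y.\, E(x, y) \Rightarrow (C(x) \leftrightarrow \neg C(y))$; the example is ours, not the paper's. Over automatic structures the universe $\Sigma^*$ is infinite, so this brute-force approach does not carry over.

```python
from itertools import product

def find_two_colouring(vertices, edges):
    """Decide ∃C. ∀x ∀y. E(x, y) → (C(x) ↔ ¬C(y)) on a finite graph: try every
    candidate unary relation C and evaluate the first-order part on the extended
    structure (V, E, C). Returns a witness C, or None if the sentence is false."""
    vertices = list(vertices)
    for bits in product((False, True), repeat=len(vertices)):
        colour = {v for v, b in zip(vertices, bits) if b}  # candidate relation C
        if all((x in colour) != (y in colour) for x, y in edges):
            return colour
    return None

SQUARE = [(0, 1), (1, 2), (2, 3), (3, 0)]
TRIANGLE = [(0, 1), (1, 2), (2, 0)]
```

The search space has $2^{|V|}$ candidates for a single unary relation, and grows much faster for relations of higher arity.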

3 RMC as ESO Satisfaction over Automatic Structures

As we previously described, our new reformulation of RMC is inspired by deductive verification, which provides a separation between generating and checking correctness proofs. The verification conditions should be describable in decidable logical theories. As a concrete example, suppose we want to prove a safety property for a program $P$. Then, a correctness proof would be a finitely-representable inductive invariant $Inv$ that contains all initial states of $P$ and is disjoint from the set of all bad states of $P$. The termination of a program can similarly be proven by finding a well-founded relation $\succ$ that subsumes the transition relation of the program. In both cases, a correctness proof corresponds to a solution for existentially quantified second-order variables that encode the desired correctness property; in the spirit of Section 2.4, the correctness of a proof can be verified by evaluating just the first-order part $\varphi$ of a formula. The generation of candidate proofs is then taken care of separately, which we will discuss in Section 4. Suffice it to say for now that the counterexample-guided inductive synthesis (CEGIS) framework [5] is appropriate for proof generation. In this section, we provide a reformulation of RMC in the aforementioned framework for software verification.

3.1 Automatic Structures

What is the right decidable theory to capture regular model checking? We venture that the answer is the first-order theory of an automatic structure [9, 10, 8, 15]. An automatic structure over the vocabulary $\sigma$ consisting of relations $R_1, \ldots, R_k$ with arities $r_1, \ldots, r_k$ is a structure $\mathfrak{S}$ whose universe is the set $\Sigma^*$ of all strings over some finite alphabet $\Sigma$, and where each relation $R_i$ is regular, i.e., the set $\{w_1 \otimes \cdots \otimes w_{r_i} : (w_1, \ldots, w_{r_i}) \in R_i\}$ is regular. The following well-known closure and algorithmic property is what makes the theory of automatic structures appealing.

Theorem 3.1

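There is an algorithm which, given a first-order formula $\varphi(x_1, \ldots, x_n)$ and an automatic structure $\mathfrak{S}$ over the vocabulary $\sigma$, computes a finite automaton $\mathcal{A}$ for the set of tuples of words satisfying $\varphi$ in $\mathfrak{S}$, i.e., such that $\mathcal{L}(\mathcal{A}) = \{w_1 \otimes \cdots \otimes w_n : \mathfrak{S} \models \varphi(w_1, \ldots, w_n)\}$.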
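The standard construction behind Theorem 3.1 works by induction on $\varphi$, and its conjunction case is the classical product construction. The sketch below shows only that case, for complete DFAs encoded as `(initial state, transition dict, final set)`, an encoding we assume for illustration; negation (complementation of a determinised automaton) and existential quantification (projection) are not shown.

```python
def intersect(d1, d2, alphabet):
    """Product DFA for L(d1) ∩ L(d2); each DFA is (init, delta, final) with
    delta[(state, letter)] = state. Only reachable state pairs are built."""
    (i1, t1, f1), (i2, t2, f2) = d1, d2
    init = (i1, i2)
    delta, final = {}, set()
    seen, todo = {init}, [init]
    while todo:
        p, q = todo.pop()
        if p in f1 and q in f2:
            final.add((p, q))          # accept iff both components accept
        for a in alphabet:
            nxt = (t1[(p, a)], t2[(q, a)])
            delta[((p, q), a)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return init, delta, final

def accepts(dfa, word):
    init, delta, final = dfa
    state = init
    for a in word:
        state = delta[(state, a)]
    return state in final

EVEN_A = (0, {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}, {0})
ENDS_B = (0, {(0, "a"): 0, (1, "a"): 0, (0, "b"): 1, (1, "b"): 1}, {1})
BOTH = intersect(EVEN_A, ENDS_B, "ab")  # even number of a's and ends with b
```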

The algorithm is a standard automata construction (e.g. see [41] for details), which is very similar to the standard automata construction for the weak monadic second-order theory of one successor (WS1S) [22]. [In fact, first-order logic over automatic structures can be encoded into WS1S (and vice versa) via the so-called finite set interpretations [18], which would allow us to use tools like MONA to check first-order formulas over automatic structures.]

Automatic structures are extremely powerful. We can encode the theory of linear integer arithmetic (Presburger arithmetic) as an automatic structure [15]. In fact, we can even add the predicate $V_2$ (where $V_2(x, y)$ holds iff $y$ divides $x$ and $y = 2^k$ for some natural number $k$) to Presburger arithmetic, while still preserving decidability. This essentially implies that ESO over automatic structures is undecidable; in fact, this is the case even when formulas are restricted to monadic predicates.
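As an illustration of why Presburger arithmetic is automatic, addition is recognised by a two-state automaton over the convolution of least-significant-bit-first binary encodings, whose state is the current carry. The following sketch simulates that automaton; the LSB-first encoding and reading the padding symbol as the digit 0 are assumptions of this sketch rather than details taken from [15].

```python
def lsb_binary(n: int) -> str:
    """Least-significant-bit-first binary encoding; 0 is encoded as the empty word."""
    return bin(n)[2:][::-1] if n else ""

def accepts_sum(x: str, y: str, z: str) -> bool:
    """Run the carry automaton for x + y = z on the convolution x ⊗ y ⊗ z.
    Padding is read as the digit 0; the automaton state is the carry."""
    k = max(len(x), len(y), len(z))
    carry = 0
    for a, b, c in zip(x.ljust(k, "0"), y.ljust(k, "0"), z.ljust(k, "0")):
        total = int(a) + int(b) + carry
        if total % 2 != int(c):
            return False          # no transition for this letter: reject
        carry = total // 2
    return carry == 0             # accept iff no carry is left over
```

For example, `accepts_sum(lsb_binary(3), lsb_binary(1), lsb_binary(4))` holds.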

We are now ready to describe our framework for RMC in ESO over automatic structures:

  1. Specification:
    Express the verification problem as a formula $\psi = \exists R_1 \cdots \exists R_m.\, \varphi$ in ESO over automatic structures.

  2. Specification Checking:
    Search for regular witnesses $R_1, \ldots, R_m$ that satisfy $\varphi$.

Note that while the specification (Item 1) provides a complete and faithful encoding of the verification problem, our method for checking the specification (Item 2) restricts to regular proofs. This is expected to be an incomplete proof rule, i.e., in general $\psi$ can be satisfied even though no regular witnesses exist. Therefore, two important questions arise. Firstly, how expressive is the framework of regular proofs? Numerous results suggest that it is very expressive. On the practical side, many benchmarks (especially from parameterized systems) have indicated this to be the case, e.g., see [36, 17, 34, 30, 37, 3, 44, 38, 24, 33]. On the theoretical side, this framework is in fact complete for important properties like safety and liveness for many classes of infinite-state systems that can be captured by regular model checking, including pushdown systems, reversal-bounded counter systems, two-dimensional vector addition systems, communication-free Petri nets, and tree-rewrite systems (for the extension to trees), among others, e.g., see [41, 42, 7, 32, 23, 35]. In addition, the restriction to regular proofs is also attractive since it gives rise to a simple method that enumerates all candidate regular proofs and checks each of them against $\varphi$. This naive method would not work in practice, but smart enumeration techniques for regular proofs (e.g., using automata learning and CEGIS) are available, which we will discuss in Section 4.
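The naive enumeration mentioned above can be sketched as a generator of all complete DFAs in order of increasing size. The encoding (states $0, \ldots, k-1$ with initial state 0) and the function names are hypothetical; each generated automaton would then be tested against $\varphi$ using Theorem 3.1.

```python
from itertools import product

def enumerate_dfas(k, alphabet):
    """Yield all complete DFAs with states 0..k-1 (initial state 0) as pairs
    (delta, final), where delta maps (state, letter) to a state."""
    states = range(k)
    keys = [(q, a) for q in states for a in alphabet]
    for targets in product(states, repeat=len(keys)):
        delta = dict(zip(keys, targets))
        for flags in product((False, True), repeat=k):
            yield delta, frozenset(q for q in states if flags[q])

def enumerate_candidates(alphabet, max_states):
    """Naive search order: all 1-state DFAs, then all 2-state DFAs, and so on."""
    for k in range(1, max_states + 1):
        yield from enumerate_dfas(k, alphabet)
```

With $k$ states and alphabet $\Sigma$ there are $k^{k|\Sigma|} \cdot 2^k$ such automata (already 5832 for $k = 3$ over a binary alphabet), which is why this enumeration is impractical beyond very small $k$.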

3.2 Safety

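We start with the most straightforward example: safety. We assume that our transition system is a length-preserving system with domain $D \subseteq \Sigma^*$ and a transition relation $\to$ given by a length-preserving transducer. Furthermore, we assume that the system contains two regular languages $I, B \subseteq D$, representing the sets of initial and bad states, respectively. As we mentioned earlier in this section, safety amounts to checking the existence of an invariant that contains $I$ but is disjoint from $B$. That is, the safety property holds iff there exists a set $Inv \subseteq D$ such that:

  • $I \subseteq Inv$;

  • $Inv \cap B = \emptyset$;

  • $Inv$ is inductive, i.e., for all configurations $w, w' \in D$, if $w \in Inv$ and $w \to w'$, then $w' \in Inv$.

The above formulation immediately leads to a first-order formula $\varphi$ over the vocabulary of the system extended with the unary relation $Inv$. Therefore, the desired ESO formula over the original vocabulary (i.e. $D$, $\to$, $I$, $B$) is

$$\exists Inv.\; \forall x.\,\big(I(x) \Rightarrow Inv(x)\big) \;\wedge\; \forall x.\,\big(Inv(x) \Rightarrow \neg B(x)\big) \;\wedge\; \forall x, y.\,\big(Inv(x) \wedge x \to y \Rightarrow Inv(y)\big).$$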
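Over automatic structures, the safety conditions above are checked exactly with automata (Theorem 3.1). As a simpler, bounded stand-in, the sketch below tests a candidate invariant on all words of one fixed length and returns a counterexample when a condition fails, which is the kind of feedback a CEGIS loop consumes. Predicates are plain Python functions, and encoding Example 1 with `1` for ⊤ and `0` for ⊥ is an assumption made for the usage example; all names are hypothetical.

```python
from itertools import product

def words_of_length(alphabet, n):
    return ("".join(t) for t in product(alphabet, repeat=n))

def check_safety_proof(alphabet, n, init, bad, step, inv):
    """Check I ⊆ Inv, Inv ∩ B = ∅ and inductiveness of Inv on words of length n.
    `step(w)` returns the successors of w. Returns None or a counterexample."""
    for w in words_of_length(alphabet, n):
        if init(w) and not inv(w):
            return ("initial state outside Inv", w)
        if inv(w) and bad(w):
            return ("bad state inside Inv", w)
        for v in (step(w) if inv(w) else ()):
            if not inv(v):
                return ("Inv not inductive", w, v)
    return None

def token_step(w):
    """Example 1 with 1 for ⊤ and 0 for ⊥: a token moves one place to the right."""
    return {w[:i] + "01" + w[i + 2:] for i in range(len(w) - 1) if w[i] == "1"}

def one_token(w):
    return w.count("1") == 1

def two_tokens(w):
    return w.count("1") >= 2

def at_most_one(w):
    return w.count("1") <= 1
```

The candidate `at_most_one` passes for every tested length, whereas a wrong candidate produces a concrete counterexample that a learner can use to refine its next guess.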

Example 2

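Fix $\Sigma = \{0, 1\}$. Consider the transition relation $\to$ generated by the regular expression $\big((0,0) + (1,1)\big)^* (1,0)(0,1) \big((0,0) + (1,1)\big)^*$. Intuitively, $\to$ nondeterministically picks a substring 10 in an input word and rewrites it to 01. For suitable regular sets $I$ of initial and $B$ of bad configurations, there is a regular proof $Inv$ for the resulting safety property. Note that this is despite the fact that, in general, the set $\to^*(I)$ of configurations reachable from $I$ is not regular.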

3.3 Liveness

A second class of properties are liveness properties, for instance checking whether a program is guaranteed to terminate, guaranteed to answer requests eventually, or guaranteed to visit certain states infinitely often. In the context of RMC, liveness has been studied a lot less than safety, and methods successful for proving safety usually do not lend themselves to an easy generalisation to liveness.

We consider the case of program termination. As before, we assume that a transition system is defined by a domain $D$, a transition relation $\to$, and a set $I$ of initial states. Proving termination amounts to showing that no infinite runs starting from a state in $I$ exist; to this end, we can search for a pair $(Inv, \succ)$ consisting of an inductive invariant and a well-founded ranking relation:

  • $I \subseteq Inv$;

  • $Inv$ is inductive (as in Section 3.2);

  • the relation $\succ$ covers the reachable transitions: for all $x, y \in D$, if $x \in Inv$ and $x \to y$, then $x \succ y$;

  • $\succ$ is transitive: $x \succ y$ and $y \succ z$ imply $x \succ z$;

  • $\succ$ is irreflexive: $x \not\succ x$ for every $x \in D$.

The last two conditions ensure that $\succ$ is a strict partial order, and therefore even well-founded on the fixed-length subsets $D \cap \Sigma^n$ of the domain. All five conditions can easily be expressed by a first-order formula $\varphi$ over the relations $D, \to, I, Inv, \succ$. Now, for length-preserving relations $\succ$, expressing in first-order logic that a transitive relation is well-founded is simple: there are no words $x, y$ such that $x \succ y$ and $y \succ y$. This “lasso” shape is owing to the fact that in every finite system, every infinite path eventually leads to a state that is visited infinitely often. In summary, termination of a system is therefore captured by the following ESO formula:

$$\exists Inv, \succ.\; \varphi(Inv, \succ),$$

where $\varphi$ is the first-order part that encodes the aforementioned verification conditions.

Example 3

We consider here the same example as Example 2, but we instead want to prove termination. It is quite easy to see that every configuration will always lead to a configuration of the form $0^*1^*$, which is a dead end. Termination of the system can be proven using the trivial inductive invariant $Inv = \Sigma^*$ and a lexicographic ranking relation $\succ$, represented as a transducer with two states and shown in Fig. 1. Using the algorithms proposed in Section 4, this ranking relation can be computed fully automatically in a few milliseconds.

Figure 1: Lexicographic ranking relation for Example 3
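The termination argument of Example 3 can be sanity-checked on words of a fixed length. The sketch below assumes that the ranking relation is the lexicographic order on equal-length words with $1 > 0$, implemented as a two-state transducer (an `equal` state and an accepting `greater` state); this reading matches the description of Fig. 1 but may differ from it in detail. Since the trivial invariant (all words) is inductive, only the conditions on the ranking relation are tested. All names are hypothetical.

```python
from itertools import product

def step(w):
    """Example 2/3 transitions: rewrite one occurrence of 10 into 01."""
    return {w[:i] + "01" + w[i + 2:] for i in range(len(w) - 1) if w[i:i + 2] == "10"}

def ranks_above(x, y):
    """Two-state transducer for x ≻ y (lexicographic, 1 > 0) on equal-length words."""
    if len(x) != len(y):
        return False
    state = "equal"                  # initial state: common prefix read so far
    for a, b in zip(x, y):
        if state == "equal":
            if a == b:
                continue
            if (a, b) == ("1", "0"):
                state = "greater"    # accepting state; loops on every letter pair
            else:
                return False         # letter (0, 1) in state "equal": no transition
    return state == "greater"

def check_termination_proof(n):
    """On words of length n: every transition decreases the rank, and ≻ is
    irreflexive and transitive."""
    ws = ["".join(t) for t in product("01", repeat=n)]
    covers = all(ranks_above(x, y) for x in ws for y in step(x))
    irreflexive = not any(ranks_above(x, x) for x in ws)
    transitive = all(ranks_above(x, z)
                     for x in ws for y in ws if ranks_above(x, y)
                     for z in ws if ranks_above(y, z))
    return covers and irreflexive and transitive
```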

3.4 Winning Strategies for Two-Player Games on Infinite Graphs

We only need to slightly modify the ESO formula for program termination, given in the previous section, to reason about the existence of winning strategies in a reachability game. Instead of a single transition relation $\to$, for a two-player game we assume that two relations $\to_1, \to_2 \subseteq D \times D$ are given, encoding the possible moves of Player 1 and Player 2, respectively. A reachability game starts in any configuration in the set $I$. The players move in alternation, with Player 2 winning if the game eventually reaches a configuration in the target set $F$, whereas Player 1 wins if the game never enters $F$. The first move in a game is always made by Player 1.

As in the previous section, we formulate the existence of a winning strategy for Player 2 (for any initial configuration in $I$) in terms of a pair $(Inv, \succ)$ of relations. The set $Inv$ now represents the possible configurations that Player 1 visits during games, whereas the ranking relation $\succ$ expresses progress made by Player 2 towards the region $F$.

  • $I \subseteq Inv$;

  • $\succ$ is transitive and irreflexive (as in Section 3.3);

  • Player 2 can force the game to progress: for every $x \in Inv \setminus F$, and every move of Player 1 with $x \to_1 y$, there is a move $y \to_2 z$ of Player 2 such that $z \in Inv$ and $x \succ z$.

It is again easy to see that all conditions can be expressed by a first-order formula $\varphi$ over the relations $D, \to_1, \to_2, I, F, Inv, \succ$, and the existence of a winning strategy as an ESO formula:

$$\exists Inv, \succ.\; \varphi(Inv, \succ)$$

A similar encoding has been used in previous work of the authors to reason about almost-sure termination of parameterised probabilistic systems [34, 30]. In this setting, the two players characterise non-determinism (demonic choice, e.g., the scheduler) and probabilistic choice (angelic choice, e.g., randomisation).

Example 4

We consider a classical take-away game [20] with two players. At the beginning of the game, there are $n$ chips on the table. In alternating moves, with Player 1 starting, each player can take 1, 2, or 3 chips from the table. The first player who has no more chips to take loses. It can be observed that Player 2 has a winning strategy whenever the initial number $n$ is a multiple of 4.

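Configurations of this game can be modelled as words in which the first letter (one of two player markers) indicates the next player to make a move, and the number of chip letters represents the number of chips left. To prove that Player 2 can win whenever $n \equiv 0 \pmod 4$, we choose the configurations in which Player 1 is to move and the number of chips is a multiple of 4 as the initial states $I$, and let $F$ consist of the configurations in which Player 1 is to move and no chips are left, i.e., we check whether Player 2 can be the first to move to a configuration in which no chips are left. The transitions of the two players are described by regular expressions.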

Figure 2: Set  of reachable configurations of the take-away game in Example 4
Figure 3: Relation  in Example 4

The witnesses proving that Player 2 indeed has a winning strategy are shown in Fig. 2 and Fig. 3, respectively. The ranking relation $\succ$ in Fig. 3 is similar to the one proving termination in Example 3, and expresses that the number of chips is monotonically decreasing. The invariant $Inv$ in Fig. 2 expresses that Player 2 should move in such a way that the number of chips on the table remains divisible by 4; $Inv$ and $\succ$ in combination encode the strategy that Player 2 should follow to win. The witness relations were found by the tool SLRP, presented in [34], in around 3 seconds on an Intel Core i5 computer with 3.2 GHz.
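The claim that Player 2 wins exactly when $n$ is a multiple of 4 can be cross-checked by backward induction on the number of chips, and the progress condition of Section 3.4 can be checked for the strategy "take $4 - m$ chips after Player 1 takes $m$". This is an explicit-state sketch for illustration, not the automata-based method used by SLRP; all names are hypothetical.

```python
from functools import lru_cache

MOVES = (1, 2, 3)  # a player takes 1, 2, or 3 chips

@lru_cache(maxsize=None)
def mover_wins(chips):
    """True iff the player about to move can force a win; a player who cannot
    take any chip loses."""
    return any(not mover_wins(chips - m) for m in MOVES if m <= chips)

def player2_wins(n):
    # Player 1 moves first, so Player 2 wins iff the first mover loses.
    return not mover_wins(n)

def progress_condition_holds(n):
    """For Player 1 to move with n > 0 chips, n divisible by 4: every Player 1
    move m has a Player 2 answer after which the chip count is again divisible
    by 4 (back in Inv) and strictly smaller (progress w.r.t. the ranking)."""
    for m in MOVES:
        if m > n:
            continue                 # Player 1 cannot take more chips than exist
        reply = 4 - m                # Player 2's answer
        remaining = n - m - reply
        if not (reply in MOVES and reply <= n - m
                and remaining % 4 == 0 and remaining < n):
            return False
    return True
```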

3.5 Isomorphism and Bisimulation

We now describe how we can compare the behaviour of two given systems described by length-preserving transducers. There are many natural notions of “similarity”, but we target isomorphism, bisimulation, and probabilistic bisimulation (or variants thereof). All of these are important properties, since they show the indistinguishability of two systems, which is applicable to proving anonymity, e.g., in the case of the Dining Cryptographers Protocol [16]. Isomorphism can also be used to detect symmetries in systems, which can be used to speed up regular model checking [33]. Here, we only describe how to express isomorphism of two systems. Encoding bisimulation and probabilistic bisimulation for parameterized systems is a bit trickier, since we will need infinitely many action labels (i.e. to distinguish the action of the $i$-th process), but this can also be encoded in our framework; see the first-order proof rules over automatic structures in the recent paper [24].

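We are given two systems $\mathfrak{S}_1 = (D_1, \to_1)$ and $\mathfrak{S}_2 = (D_2, \to_2)$, whose domains $D_1, D_2 \subseteq \Sigma^*$ and whose transition relations $\to_1$ and $\to_2$ are described by transducers. We would like to show that $\mathfrak{S}_1$ and $\mathfrak{S}_2$ are the same up to isomorphism. The desired ESO formula is of the form

$$\exists f.\; \varphi(f),$$

where $\varphi$ says that $f$ describes the desired isomorphism between $\mathfrak{S}_1$ and $\mathfrak{S}_2$. To this end, we first need to say that $f$ is a bijective function. This can easily be described in first-order logic over the vocabulary $D_1, D_2, \to_1, \to_2, f, =$. For example, “$f$ is a function” can be described as

$$\forall x.\,\big(D_1(x) \Rightarrow \exists y.\,(D_2(y) \wedge f(x, y))\big) \;\wedge\; \forall x, y, z.\,\big(f(x, y) \wedge f(x, z) \Rightarrow y = z\big).$$

Note that $=$ is describable by a simple transducer, so this is a valid first-order formula over automatic structures. We then need to add some more conjuncts to $\varphi$ saying that $f$ is a homomorphism and that its inverse is also a homomorphism. This is also easily described in first-order logic, e.g.,

$$\forall x, y, x', y'.\,\big(f(x, x') \wedge f(y, y') \wedge x \to_1 y \Rightarrow x' \to_2 y'\big)$$

says that $f$ is a homomorphism.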
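On finite, explicitly listed systems, the first-order conditions for an isomorphism can be evaluated directly. The Python sketch below is illustrative only: it represents the candidate map and the two transition relations as sets of pairs, checks that the map is a bijective function between the two domains, and checks that it and its inverse are homomorphisms. The two 3-cycles used as examples are hypothetical.

```python
def is_isomorphism(f, dom1, trans1, dom2, trans2):
    """Check that the relation f (a set of pairs) is a bijection from dom1 to dom2
    with x ->1 y iff f(x) ->2 f(y), on finite, explicitly given systems."""
    mapping = {}
    for x, y in f:
        if x not in dom1 or y not in dom2 or mapping.setdefault(x, y) != y:
            return False                              # not a function dom1 -> dom2
    image = set(mapping.values())
    if set(mapping) != set(dom1) or image != set(dom2) or len(image) != len(mapping):
        return False                                  # not total, onto, or injective
    forward = all((mapping[x], mapping[y]) in trans2 for x, y in trans1)
    backward = all((x, y) in trans1 for x in dom1 for y in dom1
                   if (mapping[x], mapping[y]) in trans2)
    return forward and backward                       # f and its inverse are homomorphisms

CYCLE1 = ({"a", "b", "c"}, {("a", "b"), ("b", "c"), ("c", "a")})
CYCLE2 = ({"x", "y", "z"}, {("x", "y"), ("y", "z"), ("z", "x")})
```

For example, `is_isomorphism({("a", "x"), ("b", "y"), ("c", "z")}, *CYCLE1, *CYCLE2)` holds, while swapping the images of `b` and `c` reverses the cycle and breaks the homomorphism condition.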

Example 5

We describe the Dining Cryptographers example [16], and how to prove its anonymity by reasoning about isomorphism. [There is a cleaner way to do this using probabilistic bisimulation [24].] In this protocol, there are $n$ cryptographers sitting at a round table. The cryptographers know that the dinner was paid for either by the NSA or by exactly one of the cryptographers at the table. The protocol aims to determine which of these is the case without revealing the identity of the cryptographer who paid. The $i$-th cryptographer is in one of two states, depending on whether or not he paid for the dinner. Any two neighbouring cryptographers share a private fair coin (that is only visible to the two of them). There is a transition to toss any of the coins (in this case, probability is replaced by non-determinism). Let us use $c_i$ to denote the value of the coin that is shared by the $i$-th and $(i+1)$-st cryptographers (indices modulo $n$). If the $i$-th cryptographer paid, it will announce $\neg(c_{i-1} \oplus c_i)$ (here $\oplus$ is the XOR operator); otherwise, it will announce the negation of this, i.e., $c_{i-1} \oplus c_i$. We call the value announced by the $i$-th cryptographer $a_i$. At the end, we take the XOR of $a_1, \ldots, a_n$, which is 0 iff none of the cryptographers paid.

This example can easily be encoded by a length-preserving transducer. For example, the domain consists of words that record, for each cryptographer $i$, whether $i$ paid, the value $c_i \in \{0, 1, ?\}$ of its coin, and its announcement $a_i \in \{0, 1, ?\}$. Here, the symbol ’?’ is used to denote that the value of $c_i$ is not yet determined. In the case of $a_i$, the symbol ’?’ means that it has not yet been announced. Although it is a bit cumbersome, it is possible to describe the dynamics of the system by a transducer. The desired property to prove is then whether, for every $n$, there is an isomorphism between the system in which the second cryptographer paid and the system in which the third cryptographer paid, i.e., that the first cryptographer, who did not pay, cannot distinguish whether the second or the third cryptographer paid. There is a transducer describing the isomorphism that maps the former system to the latter, which is done by inverting the value of $c_2$.
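The arithmetic behind the protocol can be checked exhaustively for small $n$. The sketch below uses 0-indexed cryptographers (so the second and third cryptographers are indices 1 and 2), takes `coins[i]` to be the coin shared by cryptographers `i` and `i + 1` (mod $n$), and lets the payer announce the negated XOR of its two coins. It checks that the XOR of all announcements reveals whether someone paid, that the announcements have the same distribution whether the second or the third cryptographer paid, and that flipping the coin shared by these two maps one scenario onto the other. The function names are hypothetical.

```python
from itertools import product

def announcements(coins, payer):
    """coins[i] is shared by cryptographers i and i+1 (mod n); payer is an index,
    or None if the NSA paid. Cryptographer i sees coins[i-1] and coins[i]."""
    n = len(coins)
    return tuple(coins[i - 1] ^ coins[i] ^ (1 if i == payer else 0) for i in range(n))

def xor_all(bits):
    result = 0
    for b in bits:
        result ^= b
    return result

def outcome_distribution(n, payer):
    """How often each announcement vector occurs over all 2^n coin outcomes."""
    counts = {}
    for coins in product((0, 1), repeat=n):
        a = announcements(coins, payer)
        counts[a] = counts.get(a, 0) + 1
    return counts

def flip(coins, i):
    """Invert coin i, leaving all other coins unchanged."""
    return coins[:i] + (1 - coins[i],) + coins[i + 1:]
```

Because the flipped coin is not visible to the first cryptographer (index 0), the map `flip(·, 1)` relates runs that this cryptographer cannot tell apart.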

4 How to Satisfy Existential Second-Order Quantifiers

We have given several examples of the Specification step in Section 3, but the question remains how one can solve the Specification Checking step and automatically compute witnesses $R_1, \ldots, R_m$ for the existential quantifiers in a formula $\exists R_1 \cdots \exists R_m.\, \varphi$. We present two solutions for this problem, two approaches to automata learning whose respective applicability depends on the shape of the matrix $\varphi$. Both methods have proven useful in previous work for analysing complex parameterised systems. On the one hand, it has been shown that automata learning is competitive with tailor-made algorithms, for instance with Abstract Regular Model Checking (ARMC) [11], for safety proofs [43, 17]; on the other hand, automata learning is general and can help to automate the verification of properties for which no bespoke approaches exist, for instance liveness properties or properties of games.

4.1 Active Automata Learning

The more efficient, though also more restricted, approach is to use classical automata learning, for instance Angluin’s $L^*$ algorithm [6] or one of its variants (e.g., [39, 26]), to compute witnesses for $R_1, \ldots, R_m$. In all those algorithms, a learner attempts to reconstruct a regular language $L$ known to the teacher by repeatedly asking two kinds of queries: membership, i.e., whether a word $w$ should be in $L$; and equivalence, i.e., whether $L$ coincides with some candidate language $H$ constructed by the learner. When equivalence fails, the teacher provides a positive or negative counterexample, which is a word in the symmetric difference between $L$ and $H$.
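For readers unfamiliar with $L^*$, the following compact Python sketch learns a DFA from membership and equivalence queries. It is a simplified variant, not Angluin’s original presentation: counterexamples are handled as proposed by Maler and Pnueli (all their suffixes are added as columns, which keeps the rows of the table pairwise distinct, so no separate consistency check is needed), and the equivalence oracle is approximated by testing all words up to a length bound, in the spirit of the bounded approximations discussed in this section. The target language (words whose number of 1s is divisible by 3) is an arbitrary example.

```python
from itertools import product

def words_up_to(alphabet, n):
    """All words over `alphabet` of length at most n, shortest first."""
    for k in range(n + 1):
        for t in product(alphabet, repeat=k):
            yield "".join(t)

def run(dfa, word):
    start, delta, final = dfa
    state = start
    for a in word:
        state = delta[(state, a)]
    return state in final

def lstar(alphabet, member, equiv):
    S, E = [""], [""]  # access strings (rows) and distinguishing suffixes (columns)

    def row(s):
        return tuple(member(s + e) for e in E)

    while True:
        # Close the table: every one-letter extension of S must match a row of S.
        while True:
            known = {row(s) for s in S}
            ext = next((s + a for s in S for a in alphabet
                        if row(s + a) not in known), None)
            if ext is None:
                break
            S.append(ext)
        # Hypothesis: one state per row of S; column E[0] == "" decides acceptance.
        dfa = (row(""),
               {(row(s), a): row(s + a) for s in S for a in alphabet},
               {row(s) for s in S if member(s)})
        cex = equiv(lambda w: run(dfa, w))
        if cex is None:
            return dfa
        for i in range(len(cex) + 1):  # add all suffixes of the counterexample
            if cex[i:] not in E:
                E.append(cex[i:])

def bounded_teacher(member, alphabet, n):
    """Approximate equivalence queries by testing all words up to length n."""
    def equiv(hypothesis):
        return next((w for w in words_up_to(alphabet, n)
                     if hypothesis(w) != member(w)), None)
    return equiv

def target(w):  # illustrative target: number of 1s divisible by 3
    return w.count("1") % 3 == 0

learned = lstar("01", target, bounded_teacher(target, "01", 8))
```

The learned automaton has three states, one per residue of the number of 1s modulo 3.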

This leads to the question of how membership and equivalence queries can be implemented in the ESO setting, in order to let a learner search for $R_1, \ldots, R_m$. In general, it is clearly not possible to answer membership queries about $R_1, \ldots, R_m$, since there can be many choices of relations satisfying $\varphi$, some of which might contain a word, while others do not; in other words, the relations are in general not uniquely determined by $\varphi$. We need to make additional assumptions.

In the simplest case, active automata learning can be used if two properties are satisfied: (i) the relations $R_1, \ldots, R_m$ are uniquely defined by $\varphi$ and the structure $\mathfrak{S}$; and (ii) for any $n \in \mathbb{N}$, the sub-relations $R_i^{\le n}$ of $R_i$, restricted to words of length at most $n$, can be effectively computed from $\varphi$ and $\mathfrak{S}$. Given these two assumptions, automata learning can be used to approximate the genuine solution $R_1, \ldots, R_m$ up to any length bound $n$, resulting in a candidate solution $H_1, \ldots, H_m$. It can also be verified whether $H_1, \ldots, H_m$ coincide with the genuine solution by evaluating $\varphi$, i.e., by checking whether $\mathfrak{S} \models \varphi[H_1, \ldots, H_m]$. If this check succeeds, learning has been successful; if it fails, the bound $n$ can be increased and a better approximation computed. Whenever the unique solution $R_1, \ldots, R_m$ exists and is regular, this algorithm is guaranteed to terminate and produce a correct answer.

What can be done when the relations $R_1, \ldots, R_m$ are not unique? Depending on the shape of $\varphi$, a simple trick can be applicable, namely the learning algorithm can be generalised to search for a unique smallest or unique largest solution (in the set-theoretic sense) of $\varphi$, provided those solutions exist. This is the case in particular when $\varphi$ can be rephrased as a fixed-point equation

$$(R_1, \ldots, R_m) = \mathcal{F}(R_1, \ldots, R_m)$$

for some monotonic function $\mathcal{F}$; for instance, if $\varphi$ can be written as a set of Horn clauses. We still require property (ii), however, and need to be able to compute sub-relations of the smallest or largest solution to answer membership queries.
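When the fixed-point operator is monotonic and the relevant universe is finite (for instance, the words of one fixed length), the smallest solution can be computed by Kleene iteration from the empty set. The sketch below applies this to the Horn-style conditions $I \subseteq Inv$ and $Post(Inv) \subseteq Inv$ for the rewrite system of Example 2, whose least solution is the set of reachable configurations; the initial set `{1010}` and all names are hypothetical choices for illustration.

```python
def least_fixed_point(F, bottom=frozenset()):
    """Kleene iteration bottom, F(bottom), F(F(bottom)), ...; for a monotone F on
    the subsets of a finite universe this stabilises at the least fixed point."""
    current = bottom
    while True:
        nxt = F(current)
        if nxt == current:
            return current
        current = nxt

def step(w):
    """Example 2: rewrite one occurrence of 10 into 01."""
    return {w[:i] + "01" + w[i + 2:] for i in range(len(w) - 1) if w[i:i + 2] == "10"}

INIT = frozenset({"1010"})  # hypothetical initial set

def post_closure(inv):
    """Monotone operator for I ⊆ Inv and Post(Inv) ⊆ Inv."""
    return INIT | frozenset(v for w in inv for v in step(w))

SMALLEST_INV = least_fixed_point(post_closure)
```

Over the infinite universe $\Sigma^*$ this iteration need not terminate, which is one reason the learning-based approach is used instead.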

In order to check whether a solution candidate $H_1, \ldots, H_m$ is correct (for equivalence queries), we can as before evaluate $\varphi$, and terminate the search if $\varphi$ is satisfied. In general, however, there is no way to verify that $H_1, \ldots, H_m$ is indeed the smallest solution of $\varphi$, which affects termination and completeness in a somewhat subtle way. If the smallest solution of $\varphi$ exists and is regular, then termination of the overall search is guaranteed, and the produced solution will indeed satisfy $\varphi$; but what is found is not necessarily the smallest solution of $\varphi$.

This method has been implemented in particular for proving safety [43, 17] and probabilistic bisimulations [24] of length-preserving systems, cases in which the function $F$ is naturally monotonic, and where active learning methods are able to compute witnesses with hundreds (sometimes thousands) of states within minutes.
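For safety proofs, one conjunct of $\varphi$ states that the candidate invariant $A$ is inductive, $\forall u, v.\, (A(u) \wedge T(u, v)) \rightarrow A(v)$, and evaluating it on a candidate DFA amounts to an emptiness check on a product automaton. The sketch below performs this check by breadth-first search and returns a shortest violating pair $(u, v)$ if one exists. It assumes complete DFAs given as `(init, delta, accepting)` and a length-preserving transducer given as an automaton over letter pairs, with transitions `delta[(state, (a, b))]`; the token-passing transducer and the two candidate invariants are hypothetical examples. The remaining conjuncts (initial configurations are included, bad configurations are excluded) can be checked with analogous products.

```python
from collections import deque
from itertools import product


def find_non_inductive_pair(dfa, transducer, alphabet):
    """Shortest (u, v) with u in L(A), (u, v) in T, v not in L(A); None if A is inductive.

    dfa = (init, delta, accepting) with delta[(state, letter)] total;
    transducer = (init, delta, final) with delta[(state, (a, b))] a set of states.
    """
    a_init, a_delta, a_acc = dfa
    t_init, t_delta, t_final = transducer
    start = (a_init, t_init, a_init)  # (A-state on u, T-state, A-state on v)
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        p, t, q = state
        if p in a_acc and t in t_final and q not in a_acc:
            u, v = [], []
            while parent[state] is not None:  # walk the BFS tree back to the start
                state, (a, b) = parent[state]
                u.append(a)
                v.append(b)
            return "".join(reversed(u)), "".join(reversed(v))
        for a, b in product(alphabet, repeat=2):
            for t2 in t_delta.get((t, (a, b)), ()):
                nxt = (a_delta[p, a], t2, a_delta[q, b])
                if nxt not in parent:
                    parent[nxt] = (state, (a, b))
                    queue.append(nxt)
    return None


# Hypothetical token-passing transducer over {t, n}: copy, rewrite one "tn" to "nt", copy.
TOKEN_PASSING = (0, {(0, ("t", "t")): {0}, (0, ("n", "n")): {0}, (0, ("t", "n")): {1},
                     (1, ("n", "t")): {2},
                     (2, ("t", "t")): {2}, (2, ("n", "n")): {2}}, {2})
# Candidate 1: "exactly one t" (state 2 is a rejecting sink for two or more tokens).
ONE_TOKEN = (0, {(0, "t"): 1, (0, "n"): 0, (1, "t"): 2, (1, "n"): 1,
                 (2, "t"): 2, (2, "n"): 2}, {1})
# Candidate 2: "starts with t" (state 2 is a rejecting sink).
STARTS_WITH_T = (0, {(0, "t"): 1, (0, "n"): 2, (1, "t"): 1, (1, "n"): 1,
                     (2, "t"): 2, (2, "n"): 2}, {1})

print(find_non_inductive_pair(ONE_TOKEN, TOKEN_PASSING, "tn"))      # None
print(find_non_inductive_pair(STARTS_WITH_T, TOKEN_PASSING, "tn"))  # ('tn', 'nt')
```

The invariant "exactly one `t`" is inductive, whereas "starts with `t`" is not: the token can leave the first position, as witnessed by `tn` and its successor `nt`.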

4.2 SAT-Based Automata Learning

$L^*$-style learning is not applicable if the matrix $\varphi$ of an ESO formula does not have a smallest or largest solution, or if those solutions cannot be computed up to bounds $n$ (which is usually the case when the transducers defining a system are not length-preserving). Examples of such non-monotonic formulas are the formulas characterising winning strategies of reachability games presented in Section 3.4; indeed, a game can have multiple minimal but incomparable winning strategies, so that in general there is no smallest solution. A more general learning strategy to solve ESO formulas in the non-monotonic case is SAT-based learning, i.e., using a Boolean encoding of finite-state automata to systematically search for solutions of $\varphi$ [36, 34, 45]. SAT-based learning is a more general solution than active automata learning for constructing ESO proofs, although experiments show that it is also considerably slower for simpler analysis tasks like safety proofs [17].

We outline how a SAT solver can be used to construct deterministic finite-state automata (DFAs), following the encoding used in [34]. The encoding assumes that a finite alphabet $\Sigma$ and the number $n$ of states of the automaton are fixed. The states of the automaton are assumed to be $1, \ldots, n$, and without loss of generality $1$ is the unique initial state. The Boolean decision variables of the encoding are (i) variables $z_q$ that determine which of the states $q$ are accepting; and (ii) variables $x_{p,a,q}$ that determine, for any letter $a \in \Sigma$ and states $p, q$, whether the automaton has a transition from $p$ to $q$ with label $a$.

A number of Boolean constraints are then asserted to ensure that only well-formed DFAs are considered: determinism; reachability of every automaton state from the initial state; reachability of an accepting state from every state; and symmetry-breaking constraints.
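The decision variables and two of these well-formedness constraints can be written down as CNF clauses in the DIMACS convention (a positive integer denotes a variable, a negative one its negation). This sketch is our own and not necessarily the exact encoding of [34]: states are numbered $0, \ldots, n-1$ with $0$ initial; determinism is encoded together with completeness (exactly one successor per state and letter), which is an assumption of the sketch; and reachability uses auxiliary level variables $r_{k,q}$ ("$q$ is reachable within $k$ steps") with Tseitin-style helper variables. Reachability of accepting states and symmetry breaking are omitted. The hypothetical helper `count_wellformed` checks the encoding by brute force and is only usable for tiny $n$.

```python
from itertools import combinations, product


def wellformedness_cnf(n, sigma):
    """Clauses (lists of signed ints) over x[p, a, q], z[q] and auxiliary variables."""
    num_vars = 0

    def fresh():
        nonlocal num_vars
        num_vars += 1
        return num_vars

    x = {(p, a, q): fresh() for p in range(n) for a in sigma for q in range(n)}
    z = {q: fresh() for q in range(n)}
    clauses = []
    # Determinism (plus completeness): exactly one successor per state and letter.
    for p in range(n):
        for a in sigma:
            succ = [x[p, a, q] for q in range(n)]
            clauses.append(succ)
            clauses += [[-u, -v] for u, v in combinations(succ, 2)]
    # Reachability: r[k][q] implies "q is reachable from state 0 in at most k steps".
    r = [[fresh() for _ in range(n)] for _ in range(n)]
    clauses.append([r[0][0]])
    clauses += [[-r[0][q]] for q in range(1, n)]
    for k in range(n - 1):
        for q in range(n):
            reasons = [r[k][q]]
            for p in range(n):
                for a in sigma:
                    t = fresh()  # t implies r[k][p] and x[p, a, q]
                    clauses += [[-t, r[k][p]], [-t, x[p, a, q]]]
                    reasons.append(t)
            clauses.append([-r[k + 1][q]] + reasons)
    clauses += [[r[n - 1][q]] for q in range(n)]  # every state is reachable
    return x, z, clauses, num_vars


def count_wellformed(n, sigma):
    """Brute force (tiny n only): number of x-assignments with a satisfying extension."""
    x, z, clauses, num_vars = wellformedness_cnf(n, sigma)
    x_vars = list(x.values())
    aux = [v for v in range(1, num_vars + 1) if v not in x_vars and v not in z.values()]
    count = 0
    for bits in product([False, True], repeat=len(x_vars)):
        for aux_bits in product([False, True], repeat=len(aux)):
            val = dict(zip(x_vars, bits))
            val.update(zip(aux, aux_bits))
            if all(any(val[abs(lit)] == (lit > 0) for lit in c) for c in clauses):
                count += 1
                break
    return count


print(count_wellformed(2, "a"))   # 2
print(count_wellformed(1, "ab"))  # 1
```

For two states and a single letter, exactly two complete DFAs have both states reachable, namely those with a transition from state $0$ to state $1$.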

Next, the formula $\varphi$ can be translated to Boolean constraints over the decision variables. This translation can be done eagerly for all conjuncts of $\varphi$ that can be represented succinctly:

  • a positive atom $R(w)$ in which the length of the word $w$ is bounded can be translated to constraints that assert the existence of a run accepting $w$;

  • a negative atom $\neg R(w)$ can similarly be encoded as a run ending in a non-accepting state, thanks to the determinism of the automaton;

  • for automata representing binary relations $R$, several universally quantified formulas can be encoded as polynomial-size Boolean constraints as well, including:

    Functional consistency: $\forall x, y, z.\; \big(R(x, y) \wedge R(x, z)\big) \rightarrow y = z$.
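The eager encoding of positive and negative atoms can be sketched with one run variable $y_{i,q}$ per prefix length $i$ and state $q$. Because the automaton is deterministic and complete, the clauses $y_{i,p} \wedge x_{p,a,q} \rightarrow y_{i+1,q}$ force the variables along the unique run to true, so asserting $y_{|w|,q} \rightarrow z_q$ (respectively $\neg z_q$) for every state $q$ makes $w$ accepted (respectively rejected). The small `dpll` function is a stand-in for a real SAT solver, all names are hypothetical, and states are numbered from $0$, with $0$ initial.

```python
from itertools import combinations


def dpll(clauses, assignment=None):
    """Tiny DPLL solver (unit propagation + branching); returns {var: bool} or None."""
    assignment = dict(assignment or {})
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            open_lits = []
            for lit in clause:
                val = assignment.get(abs(lit))
                if val is None:
                    open_lits.append(lit)
                elif val == (lit > 0):
                    break  # clause already satisfied
            else:
                if not open_lits:
                    return None  # all literals false: conflict
                if len(open_lits) == 1:  # unit clause: its last literal must hold
                    assignment[abs(open_lits[0])] = open_lits[0] > 0
                    changed = True
    free = {abs(lit) for clause in clauses for lit in clause} - set(assignment)
    if not free:
        return assignment
    var = min(free)
    for value in (True, False):
        model = dpll(clauses, {**assignment, var: value})
        if model is not None:
            return model
    return None


class DfaEncoding:
    """Variables x[p, a, q] (transition p -a-> q) and z[q] (q accepting); state 0 is initial."""

    def __init__(self, n, sigma):
        self.n, self.clauses, self.num_vars = n, [], 0
        self.x = {(p, a, q): self.fresh() for p in range(n) for a in sigma for q in range(n)}
        self.z = {q: self.fresh() for q in range(n)}
        for p in range(n):
            for a in sigma:
                succ = [self.x[p, a, q] for q in range(n)]
                self.clauses.append(succ)  # at least one successor
                self.clauses += [[-u, -v] for u, v in combinations(succ, 2)]  # at most one

    def fresh(self):
        self.num_vars += 1
        return self.num_vars

    def add_atom(self, word, positive):
        """Eagerly encode 'word is accepted' (positive) or 'word is rejected'."""
        # y[i][q] is forced true whenever q is the run's state after reading word[:i].
        y = [[self.fresh() for _ in range(self.n)] for _ in range(len(word) + 1)]
        self.clauses.append([y[0][0]])
        for i, a in enumerate(word):
            for p in range(self.n):
                for q in range(self.n):
                    self.clauses.append([-y[i][p], -self.x[p, a, q], y[i + 1][q]])
        for q in range(self.n):
            self.clauses.append([-y[-1][q], self.z[q] if positive else -self.z[q]])

    def decode(self, model):
        delta = {(p, a): q for (p, a, q), v in self.x.items() if model.get(v, False)}
        return delta, {q for q, v in self.z.items() if model.get(v, False)}


def accepts(dfa, word):
    delta, accepting = dfa
    q = 0
    for a in word:
        q = delta[q, a]
    return q in accepting


def learn_dfa(n, sigma, positives, negatives):
    """Return an n-state DFA consistent with the sample words, or None if none exists."""
    enc = DfaEncoding(n, sigma)
    for w in positives:
        enc.add_atom(w, positive=True)
    for w in negatives:
        enc.add_atom(w, positive=False)
    model = dpll(enc.clauses)
    return None if model is None else enc.decode(model)


POS, NEG = ["", "aa", "b", "aba"], ["a", "ab", "ba", "aaa"]
DFA = learn_dfa(2, "ab", POS, NEG)
print(all(accepts(DFA, w) for w in POS), not any(accepts(DFA, w) for w in NEG))  # True True
print(learn_dfa(1, "a", [""], ["a"]))  # None
```

With two states, the solver finds a DFA consistent with the sample words (the parity automaton for the letter `a` is one such DFA); with a single state, accepting the empty word forces acceptance of `a`, so the encoding is unsatisfiable.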

Other conjuncts of $\varphi$ can be encoded lazily with the help of a refinement loop, resembling the classical CEGAR approach. The SAT solver is first queried to produce a candidate automaton $\mathcal{A}$ that satisfies a partial encoding of $\varphi$. It is then checked whether the candidate $\mathcal{A}$ indeed satisfies $\varphi$; if this is the case, SAT-based learning has been successful and terminates; otherwise, a blocking constraint is asserted that rules out the candidate $\mathcal{A}$ in subsequent queries.

It should be noted that this approach can in principle be implemented for any formula $\varphi$, since it is always possible to generate a naïve blocking constraint that blocks exactly the observed assignment of the decision variables, i.e., that rules out exactly the current candidate automaton. It is well known in Satisfiability Modulo Theories (SMT), however, that good blocking constraints are those which eliminate as many similar candidate solutions as possible, and that they need to be designed carefully and specifically for a theory (or, in our case, based on the shape of $\varphi$).
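The refinement loop with naïve blocking constraints can be sketched as follows. Only determinism and completeness are encoded eagerly; every candidate is then passed to a check, and if the check fails, a clause is added that excludes exactly the current valuation of the decision variables. The check `even_number_of_as`, which compares the candidate with a target language on all words up to length 4, is a hypothetical stand-in for evaluating $\varphi$, and `dpll` is again a tiny stand-in for a real SAT solver.

```python
from itertools import combinations, count, product


def dpll(clauses, assignment=None):
    """Tiny DPLL solver (unit propagation + branching); returns {var: bool} or None."""
    assignment = dict(assignment or {})
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            open_lits = []
            for lit in clause:
                val = assignment.get(abs(lit))
                if val is None:
                    open_lits.append(lit)
                elif val == (lit > 0):
                    break  # clause already satisfied
            else:
                if not open_lits:
                    return None  # all literals false: conflict
                if len(open_lits) == 1:
                    assignment[abs(open_lits[0])] = open_lits[0] > 0
                    changed = True
    free = {abs(lit) for clause in clauses for lit in clause} - set(assignment)
    if not free:
        return assignment
    var = min(free)
    for value in (True, False):
        model = dpll(clauses, {**assignment, var: value})
        if model is not None:
            return model
    return None


def lazy_learn(n, sigma, check, max_rounds=10_000):
    """CEGAR-style loop: eager determinism clauses, lazy check, naive blocking clauses."""
    ids = count(1)
    x = {(p, a, q): next(ids) for p in range(n) for a in sigma for q in range(n)}
    z = {q: next(ids) for q in range(n)}
    clauses = []
    for p in range(n):
        for a in sigma:
            succ = [x[p, a, q] for q in range(n)]
            clauses.append(succ)
            clauses += [[-u, -v] for u, v in combinations(succ, 2)]
    decision_vars = list(x.values()) + list(z.values())
    for rounds in range(1, max_rounds + 1):
        model = dpll(clauses)
        if model is None:
            return None, rounds  # no n-state DFA passes the check
        val = {v: model.get(v, False) for v in decision_vars}
        delta = {(p, a): q for (p, a, q), v in x.items() if val[v]}
        accepting = {q for q, v in z.items() if val[v]}
        if check(delta, accepting):
            return (delta, accepting), rounds
        # Naive blocking clause: rules out exactly this valuation of the decision variables.
        clauses.append([-v if val[v] else v for v in decision_vars])
    return None, max_rounds


def accepts(delta, accepting, word):
    q = 0
    for a in word:
        q = delta[q, a]
    return q in accepting


def even_number_of_as(delta, accepting):
    words = ("".join(w) for k in range(5) for w in product("ab", repeat=k))
    return all(accepts(delta, accepting, w) == (w.count("a") % 2 == 0) for w in words)


LEARNED, ROUNDS = lazy_learn(2, "ab", even_number_of_as)
print(LEARNED)  # ({(0, 'a'): 1, (0, 'b'): 0, (1, 'a'): 0, (1, 'b'): 1}, {0})
```

Since each round excludes one of the at most $16 \cdot 4 = 64$ valuations of the decision variables for two states over $\{a, b\}$, the loop terminates within 64 rounds; a blocking constraint tailored to the shape of $\varphi$ would rule out more candidates per round.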

Several implementations of SAT-based learning have been described in the literature, for instance for computing inductive invariants [36], synthesising state machines satisfying given properties [45], computing symmetries of parameterised systems [33], and solving various kinds of games [34]. Experiments show that the automata that can be computed using SAT-based learning tend to be several orders of magnitude smaller than those found by active automata learning methods (typically, at most 10–20 states), but that SAT-based learning can also solve a more general class of synthesis problems.

4.3 Stratification of ESO Formulas

The two approaches to computing regular languages can sometimes be combined. For instance, in [34] active automata learning is used to approximate the reachable configurations of a two-player game (in the sense of computing an inductive invariant), whereas SAT-based learning is used to compute winning strategies; in combination, the results of the two procedures represent a solution of an ESO formula with two second-order quantifiers.

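More generally, since the active automata learning approach in Section 4.1 is able to compute smallest or greatest solutions of formulas, a combined approach is possible when the matrix $\varphi$ of an ESO formula $\exists R\, \exists \bar S.\, \varphi$ can be stratified. Suppose $\varphi$ can be decomposed into $\varphi_1 \wedge \varphi_2$ in such a way that (i) $\varphi_1$ has a unique smallest solution in $R$, and (ii) $\varphi_2$ contains $R$ only in literals $R(\bar t)$ in negative positions, i.e., underneath an odd number of negations. In this situation, one can proceed by first computing the smallest relation $R$ satisfying $\varphi_1$, using the methods in Section 4.1, and then solving the remaining formula $\varphi_2$ for $\bar S$ given this fixed solution for $R$. No solutions are lost in this way: since $R$ occurs only negatively in $\varphi_2$, replacing any solution for $R$ by the smallest one preserves the truth of $\varphi_2$. The case where $\varphi_1$ has a greatest solution, and $\varphi_2$ contains $R$ only positively, can be handled similarly.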

5 Conclusions

In this paper, we have proposed existential second-order logic (ESO) over automatic structures as an umbrella covering a large number of regular model checking tasks. We have shown that many important correctness properties can be represented elegantly in ESO, and developed unified algorithms that can be applied to any correctness property captured using ESO. Experiments showing the practicality of this approach have been presented in several recent publications, including computation of inductive invariants [43, 36, 17], of symmetries and simulation relations of parameterised systems [33], of winning strategies of games [34, 30], and of probabilistic bisimulations [24].

Several challenges remain. One bottleneck that has been identified in several of the studies is the size of alphabets necessary to model systems, to which the algorithms presented in Section 4 are very sensitive. This indicates that some analysis tasks require more compact or more expressive automata representations, for instance symbolic automata, and generalised learning methods; or abstraction to reduce the size of alphabets. Another less-than-satisfactory point is the handling of well-foundedness in the ESO framework. When restricting the class of considered systems to weakly finite systems, as done here, well-foundedness of relations can be replaced by acyclicity, which can be expressed easily in ESO (as shown in Section 3.3). It is not obvious, however, in which way ESO should be extended to also handle systems that are not weakly finite, without sacrificing the elegance of the approach.


We thank our numerous collaborators on regular model checking whose joint work led to this paper, including Parosh Abdulla, Yu-Fang Chen, Lukas Holik, Chih-Duo Hong, Bengt Jonsson, Ondrej Lengal, Leonid Libkin, Rupak Majumdar, and Tomas Vojnar. This research was sponsored in part by the ERC Starting Grant 759969 (AV-SMP), the Swedish Research Council (VR) under grant 2018-04727, and by the Swedish Foundation for Strategic Research (SSF) under the project WebSec (Ref. RIT17-0011).


  • [1] P. A. Abdulla, A. Bouajjani, B. Jonsson, and M. Nilsson. Handling global conditions in parameterized system verification. In Computer Aided Verification, 11th International Conference, CAV ’99, Trento, Italy, July 6-10, 1999, Proceedings, pages 134–145, 1999.
  • [2] P. A. Abdulla, B. Jonsson, P. Mahata, and J. d’Orso. Regular tree model checking. In CAV, pages 555–568, 2002.
  • [3] P. A. Abdulla, B. Jonsson, M. Nilsson, and M. Saksena. A survey of regular model checking. In CONCUR, pages 35–48, 2004.
  • [4] W. Ahrendt, B. Beckert, R. Bubel, R. Hähnle, P. H. Schmitt, and M. Ulbrich, editors. Deductive Software Verification - The KeY Book - From Theory to Practice, volume 10001 of Lecture Notes in Computer Science. Springer, 2016.
  • [5] R. Alur, R. Bodík, G. Juniwal, M. M. K. Martin, M. Raghothaman, S. A. Seshia, R. Singh, A. Solar-Lezama, E. Torlak, and A. Udupa. Syntax-guided synthesis. In Formal Methods in Computer-Aided Design, FMCAD 2013, Portland, OR, USA, October 20-23, 2013, pages 1–8, 2013.
  • [6] D. Angluin. Learning regular sets from queries and counterexamples. Inf. Comput., 75(2):87–106, Nov. 1987.
  • [7] S. Bardin, A. Finkel, J. Leroux, and P. Schnoebelen. Flat acceleration in symbolic model checking. In ATVA, pages 474–488, 2005.
  • [8] M. Benedikt, L. Libkin, T. Schwentick, and L. Segoufin. Definable relations and first-order query languages over strings. J. ACM, 50(5):694–751, 2003.
  • [9] A. Blumensath and E. Grädel. Automatic structures. In Logic in Computer Science, 2000. Proceedings. 15th Annual IEEE Symposium on, pages 51–62. IEEE, 2000.
  • [10] A. Blumensath and E. Grädel. Finite presentations of infinite structures: Automata and interpretations. Theory of Computing Systems, 37(6):641–674, 2004.
  • [11] A. Bouajjani, P. Habermehl, and T. Vojnar. Abstract regular model checking. In CAV’04, pages 372–386.
  • [12] A. Bouajjani, B. Jonsson, M. Nilsson, and T. Touili. Regular model checking. In Computer Aided Verification, 12th International Conference, CAV 2000, Chicago, IL, USA, July 15-19, 2000, Proceedings, pages 403–418, 2000.
  • [13] A. R. Bradley and Z. Manna. The Calculus of Computation: Decision Procedures with Applications to Verification. Springer, 2007.
  • [14] M. Brockschmidt, B. Cook, S. Ishtiaq, H. Khlaaf, and N. Piterman. T2: temporal property verification. In M. Chechik and J. Raskin, editors, Tools and Algorithms for the Construction and Analysis of Systems - 22nd International Conference, TACAS 2016, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2016, Eindhoven, The Netherlands, April 2-8, 2016, Proceedings, volume 9636 of Lecture Notes in Computer Science, pages 387–393. Springer, 2016.
  • [15] V. Bruyere, G. Hansel, C. Michaux, and R. Villemaire. Logic and p-recognizable sets of integers. Bull. Belg. Math. Soc., 1:191–238, 1994.
  • [16] D. Chaum. The dining cryptographers problem: Unconditional sender and recipient untraceability. Journal of cryptology, 1(1):65–75, 1988.
  • [17] Y. Chen, C. Hong, A. W. Lin, and P. Rümmer. Learning to prove safety over parameterised concurrent systems. In 2017 Formal Methods in Computer Aided Design, FMCAD 2017, Vienna, Austria, October 2-6, 2017, pages 76–83, 2017.
  • [18] T. Colcombet and C. Löding. Transforming structures by set interpretations. Logical Methods in Computer Science, 3(2), 2007.
  • [19] J. Esparza, A. Gaiser, and S. Kiefer. Proving termination of probabilistic programs using patterns. In Computer Aided Verification - 24th International Conference, CAV 2012, Berkeley, CA, USA, July 7-13, 2012 Proceedings, pages 123–138, 2012.
  • [20] T. S. Ferguson. Game Theory. Online book, second edition, 2014.
  • [21] J. Giesl, R. Thiemann, P. Schneider-Kamp, and S. Falke. Automated termination proofs with AProVE. In V. van Oostrom, editor, Rewriting Techniques and Applications, 15th International Conference, RTA 2004, Aachen, Germany, June 3-5, 2004, Proceedings, volume 3091 of Lecture Notes in Computer Science, pages 210–220. Springer, 2004.
  • [22] E. Grädel, W. Thomas, and T. Wilke, editors. Automata, Logics, and Infinite Games: A Guide to Current Research [outcome of a Dagstuhl seminar, February 2001], volume 2500 of Lecture Notes in Computer Science. Springer, 2002.
  • [23] M. Hague, A. W. Lin, and C. L. Ong. Detecting redundant CSS rules in HTML5 applications: a tree rewriting approach. In Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2015, part of SPLASH 2015, Pittsburgh, PA, USA, October 25-30, 2015, pages 1–19, 2015.
  • [24] C. Hong, A. W. Lin, R. Majumdar, and P. Rümmer. Probabilistic bisimulation for parameterized systems - (with applications to verifying anonymous protocols). In Computer Aided Verification - 31st International Conference, CAV 2019, New York City, NY, USA, July 15-18, 2019, Proceedings, Part I, pages 455–474, 2019.
  • [25] B. Jonsson and M. Nilsson. Transitive closures of regular relations for verifying infinite-state systems. In Tools and Algorithms for Construction and Analysis of Systems, 6th International Conference, TACAS 2000, Held as Part of the European Joint Conferences on the Theory and Practice of Software, ETAPS 2000, Berlin, Germany, March 25 - April 2, 2000, Proceedings, pages 220–234, 2000.
  • [26] M. J. Kearns and U. V. Vazirani. An Introduction to Computational Learning Theory. MIT Press, 1994.
  • [27] Y. Kesten, O. Maler, M. Marcus, A. Pnueli, and E. Shahar. Symbolic model checking with rich assertional languages. In Computer Aided Verification, 9th International Conference, CAV ’97, Haifa, Israel, June 22-25, 1997, Proceedings, pages 424–435, 1997.
  • [28] D. Kroening and O. Strichman. Decision Procedures: An Algorithmic Point of View. Springer, 2008.
  • [29] K. R. M. Leino. Dafny: An automatic program verifier for functional correctness. In E. M. Clarke and A. Voronkov, editors, Logic for Programming, Artificial Intelligence, and Reasoning - 16th International Conference, LPAR-16, Dakar, Senegal, April 25-May 1, 2010, Revised Selected Papers, volume 6355 of Lecture Notes in Computer Science, pages 348–370. Springer, 2010.
  • [30] O. Lengál, A. W. Lin, R. Majumdar, and P. Rümmer. Fair termination for parameterized probabilistic concurrent systems. In TACAS, pages 499–517, 2017.
  • [31] L. Libkin. Elements of Finite Model Theory. Springer, 2004.
  • [32] A. W. Lin. Accelerating tree-automatic relations. In IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science, FSTTCS 2012, December 15-17, 2012, Hyderabad, India, pages 313–324, 2012.
  • [33] A. W. Lin, T. K. Nguyen, P. Rümmer, and J. Sun. Regular symmetry patterns. In Verification, Model Checking, and Abstract Interpretation - 17th International Conference, VMCAI 2016, St. Petersburg, FL, USA, January 17-19, 2016. Proceedings, pages 455–475, 2016.
  • [34] A. W. Lin and P. Rümmer. Liveness of randomised parameterised systems under arbitrary schedulers. In CAV’16 (2), volume 9779 of LNCS, pages 112–133. Springer, 2016.
  • [35] C. Löding and A. Spelten. Transition graphs of rewriting systems over unranked trees. In Mathematical Foundations of Computer Science 2007, 32nd International Symposium, MFCS 2007, Ceský Krumlov, Czech Republic, August 26-31, 2007, Proceedings, pages 67–77, 2007.
  • [36] D. Neider and N. Jansen. Regular model checking using solver technologies and automata learning. In NASA Formal Methods, 5th International Symposium, NFM 2013, Moffett Field, CA, USA, May 14-16, 2013. Proceedings, pages 16–31, 2013.
  • [37] D. Neider and U. Topcu. An automaton learning approach to solving safety games over infinite graphs. In TACAS, pages 204–221, 2016.
  • [38] M. Nilsson. Regular Model Checking. PhD thesis, Uppsala Universitet, 2005.
  • [39] R. L. Rivest and R. E. Schapire. Inference of finite automata using homing sequences. Inf. Comput., 103(2):299–347, 1993.
  • [40] M. Sipser. Introduction to the Theory of Computation. PWS Publishing Company, 1997.
  • [41] A. W. To. Model Checking Infinite-State Systems: Generic and Specific Approaches. PhD thesis, School of Informatics, University of Edinburgh, 2010.
  • [42] A. W. To and L. Libkin. Algorithmic metatheorems for decidable LTL model checking over infinite systems. In Foundations of Software Science and Computational Structures, 13th International Conference, FOSSACS 2010, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2010, Paphos, Cyprus, March 20-28, 2010. Proceedings, pages 221–236, 2010.
  • [43] A. Vardhan, K. Sen, M. Viswanathan, and G. Agha. Learning to verify safety properties. In ICFEM’04, pages 274–289.
  • [44] T. Vojnar. Cut-offs and automata in formal verification of infinite-state systems, 2007. Habilitation Thesis, Faculty of Information Technology, Brno University of Technology.
  • [45] N. Walkinshaw, R. Taylor, and J. Derrick. Inferring extended finite state machine models from software executions. Empirical Software Engineering, 21(3):811–853, 2016.
  • [46] P. Wolper and B. Boigelot. Verifying systems with infinite but regular state spaces. In Computer Aided Verification, 10th International Conference, CAV ’98, Vancouver, BC, Canada, June 28 - July 2, 1998, Proceedings, pages 88–97, 1998.