The theory of Equality with Uninterpreted Functions (EUF) is an important fragment of First Order Logic, defined by a set of functions, equality axioms, and congruence axioms. Its satisfiability problem is decidable. It is a core theory of most SMT solvers, used as a glue (or abstraction) for more complex theories. A closely related notion is that of Uninterpreted Programs (UP), where all basic operations are defined by uninterpreted functions. Feasibility of a UP computation is characterized by satisfiability of its path condition in EUF. UPs provide a natural abstraction layer for reasoning about software. They have been used (sometimes without being explicitly named) in equivalence checking of pipelined microprocessors [DBLP:conf/cav/BurchD94] and in equivalence checking of C programs [DBLP:conf/vstte/StrichmanG05]. They also provide the foundations of the Global Value Numbering (GVN) optimization in many modern compilers [DBLP:conf/popl/Kildall73, DBLP:conf/sas/GulwaniN04, dblp:conf/vmcai/muller-olmrs05].
Unlike satisfiability in EUF, reachability in UP is undecidable. That is, in the lingua franca of SMT, the satisfiability of Constrained Horn Clauses over EUF is undecidable. Recently, Mathur et al. [DBLP:journals/pacmpl/MathurMV19] have proposed a variant of UPs, called coherent uninterpreted programs (CUPs). The precise definition of coherence is rather technical (see Def. 3), but intuitively the program is restricted from depending on arbitrarily deep terms. The key result of [DBLP:journals/pacmpl/MathurMV19] is that both reachability for CUPs and checking whether a UP is coherent are decidable. This makes CUP an interesting infinite state abstraction with a decidable reachability problem.
Unfortunately, as shown by our counterexample in Fig. 4 (and described in Sec. VI), the key construction in [DBLP:journals/pacmpl/MathurMV19] is incorrect. More precisely, the proofs of [DBLP:journals/pacmpl/MathurMV19] hold only for CUPs restricted to unary functions. In this paper, we address this bug. We provide an alternative (in our view simpler) proof of decidability and extend the results from reachability to arbitrary model checking. The case of non-unary CUPs is much more complex than the unary case. This is not surprising, since similar complications arise in related results on Uniform Interpolation [DBLP:conf/cilc/GhilardiGK20] and Cover [DBLP:conf/esop/GulwaniM08] for EUF.
Our key result is a logical characterization of CUP. We show that the set of reachable states (i.e., the strongest inductive invariant) of a CUP is definable by an EUF formula, over program variables, with terms of depth at most 1. That is, the most complex term that can appear in the invariant is of the form f(x, y), where x and y are program variables, and f is a function.
This characterization has several important consequences since the number of such bounded depth formulas is finite. Decidability of reachability, for example, follows trivially by enumerating all possible candidate inductive invariants. More importantly from a practical perspective, it leads to an efficient analysis of arbitrary UPs. Take a UP and check whether it has a safe inductive invariant over terms of bounded depth. Since the number of such terms is finite, this can be done by implicit predicate abstraction [DBLP:conf/tacas/CimattiGMT14]. If no invariant is found, and the counterexample is not feasible, then the program is not a CUP. At this point, the process either terminates, or another verification round is done with predicates over deeper terms. Crucially, this does not require knowing a priori whether the program is a CUP, a problem that itself is shown in [DBLP:journals/pacmpl/MathurMV19] to be at least PSPACE.
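To make the finiteness argument concrete, the following minimal sketch enumerates all terms of depth at most 1 over a set of program variables, and the equality and disequality literals between them. The term encoding (strings for variables, tuples for applications) and the names `depth1_terms` and `candidate_literals` are assumptions made for this illustration; they are not part of the paper, and the sketch does not implement implicit predicate abstraction.

```python
from itertools import product


def depth1_terms(variables, signature):
    """Return all terms of depth at most 1 over `variables`.

    `signature` maps each function name to its arity. A variable is a
    string; an application f(x, y) is the tuple ("f", "x", "y").
    """
    terms = list(variables)
    for f, arity in signature.items():
        for args in product(variables, repeat=arity):
            terms.append((f, *args))
    return terms


def candidate_literals(variables, signature):
    """Return every literal s = t and s != t between distinct depth-1 terms."""
    terms = depth1_terms(variables, signature)
    literals = []
    for i, s in enumerate(terms):
        for t in terms[i + 1:]:
            literals.append(("=", s, t))
            literals.append(("!=", s, t))
    return literals


if __name__ == "__main__":
    lits = candidate_literals(["x", "y"], {"f": 2})
    print(len(depth1_terms(["x", "y"], {"f": 2})), "terms,", len(lits), "literals")
```

With two variables and one binary function there are 6 terms and 30 literals, so the space of candidate invariants (Boolean combinations of these literals) is finite, although it grows quickly with the number of variables and the arities.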
We extend the results further and show that CUPs are bisimilar to a finite state system, showing, in particular, that arbitrary model checking for CUP (not just reachability) is decidable.
Our proofs are structured around a series of abstractions, illustrated in a commuting diagram in Fig. 1. Our key abstraction is the base abstraction. It forgets terms deeper than depth 1, while maintaining all their consequences (by using additional fresh variables). We show that the base abstraction is sound and complete (i.e., preserves all properties) for CUPs (while it is sound, but not complete, for UPs). It is combined with a cover abstraction, which we borrow from [DBLP:conf/esop/GulwaniM08]. The cover abstraction ensures that reachable states are always expressible over program variables. It serves the purpose of existential quantifier elimination, which is not available for EUF. Finally, a renaming abstraction is a technical tool to bound the occurrences of constants in abstract reachable states.
The rest of the paper is structured as follows. We review the necessary background on EUF in Sec. II. We introduce our formalization of UPs and CUPs in Sec. III. Sec. IV presents bisimulation inducing abstractions for UP. Sec. V presents our base abstraction and shows that it induces a bisimulation for CUPs. Sec. VI develops a logical characterization for CUPs, presents our decidability results, and shows that a finite state abstraction of CUPs is computable. We conclude the paper in Sec. VII with a summary of results and a discussion of open challenges and future work.
We assume that the reader is familiar with the basics of First Order Logic (FOL), and the theory of Equality and Uninterpreted Functions (EUF). We use to denote a FOL signature with constants , functions , and predicates , representing equality and disequality, respectively. A term is a constant or (well-formed) application of a function to terms. A literal is either or , where and are terms. A formula is a Boolean combination of literals. We assume that all formulas are quantifier free unless stated otherwise. We further assume that all formulas are in Negation Normal Form (NNF), so negation is defined as a shorthand: , and . Throughout the paper, we use to indicate a predicate in . For example, means . We write for false, and for true. We do not differentiate between sets of literals and their conjunction . We write for the maximal depth of function applications in a term . We write , , and for the set of all terms, constants, and functions, in , respectively, where is either a formula or a collection of formulas. Finally, we write to mean that the term contains as a subterm.
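The term-level notions above (depth, constants of a term, and the subterm relation) can be illustrated with a small sketch. The encoding of terms as nested tuples and the function names are assumptions of this example; in particular, the subterm check here is reflexive, which may or may not match the convention intended in the text.

```python
def depth(t):
    """Maximal nesting of function applications in t (0 for a constant)."""
    if isinstance(t, str):
        return 0
    return 1 + max((depth(a) for a in t[1:]), default=0)


def constants(t):
    """Set of constants occurring in term t."""
    if isinstance(t, str):
        return {t}
    return set().union(*(constants(a) for a in t[1:]))


def contains(t, s):
    """True if s occurs in t as a subterm (here: reflexive by assumption)."""
    if t == s:
        return True
    return isinstance(t, tuple) and any(contains(a, s) for a in t[1:])


if __name__ == "__main__":
    t = ("f", "a", ("g", "b"))  # the term f(a, g(b))
    print(depth(t), constants(t), contains(t, ("g", "b")))
```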
For a formula , we write if entails , that is, every model of is also a model of . For any literal , we write , pronounced is derived from , if is derivable from by the usual EUF proof system (shown in Appendix A). By refutational completeness of , is unsatisfiable iff .
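As a concrete illustration of deciding satisfiability for a conjunction of EUF literals, the sketch below uses a naive congruence closure over the subterms of the input. It is a textbook-style procedure written for this example, not the proof system of Appendix A; the term encoding (strings for constants, tuples for applications) and the name `euf_sat` are assumptions.

```python
from itertools import combinations


def subterms(t):
    """Yield t and all of its subterms."""
    yield t
    if isinstance(t, tuple):
        for arg in t[1:]:
            yield from subterms(arg)


def euf_sat(equalities, disequalities):
    """Return True iff the conjunction of the given literals is satisfiable.

    `equalities` and `disequalities` are lists of pairs of terms.
    """
    terms = set()
    for s, t in list(equalities) + list(disequalities):
        terms.update(subterms(s))
        terms.update(subterms(t))
    parent = {t: t for t in terms}

    def find(t):
        # Follow parent pointers to the class representative.
        while parent[t] != t:
            t = parent[t]
        return t

    def union(s, t):
        parent[find(s)] = find(t)

    for s, t in equalities:
        union(s, t)

    # Congruence: merge f(a1..an) and f(b1..bn) whenever all ai ~ bi.
    apps = [t for t in terms if isinstance(t, tuple)]
    changed = True
    while changed:
        changed = False
        for s, t in combinations(apps, 2):
            if (find(s) != find(t) and s[0] == t[0] and len(s) == len(t)
                    and all(find(a) == find(b) for a, b in zip(s[1:], t[1:]))):
                union(s, t)
                changed = True

    # Satisfiable iff no disequality relates two merged terms.
    return all(find(s) != find(t) for s, t in disequalities)


if __name__ == "__main__":
    fa, fb = ("f", "a"), ("f", "b")
    print(euf_sat([("a", "b")], [(fa, fb)]))  # a = b and f(a) != f(b)
```

The quadratic fixpoint loop is chosen for readability; practical solvers use more efficient congruence closure algorithms.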
Given two EUF formulas and and a set of constants , we say that the formulas are -equivalent, denoted , if, for all quantifier free EUF formulas such that , if and only if .
Let , , , and . Then, but .
While EUF does not admit quantifier elimination, it does admit elimination of constants while preserving quantifier free consequences. Formally, a cover [DBLP:conf/esop/GulwaniM08, DBLP:conf/cade/CalvaneseGGMR19, DBLP:conf/cilc/GhilardiGK20] of an EUF formula w.r.t. a set of constants is an EUF formula such that and . By [DBLP:conf/esop/GulwaniM08], such exists and is unique up to equivalence; we denote it by .
III Uninterpreted Programs
An uninterpreted program (UP) is a program in the uninterpreted programming language (UPL). The syntax of UPL is shown in Fig. 2. Let V denote a fixed set of program variables. We use lower case letters in a special font: x, y, etc. to denote individual variables in V. We write for a list of program variables. Function symbols are taken from a fixed set . As in [DBLP:journals/pacmpl/MathurMV19], w.l.o.g., UPL does not allow for Boolean combination of conditionals and relational symbols.
The small step symbolic operational semantics of UPL is defined with respect to an FOL signature by the rules shown in Fig. 3. A program configuration is a triple , where , called a statement, is a UP being executed, is a state mapping program variables to constants in , and , called the path condition, is an EUF formula over . We use to denote the set of all constants that represent current variable assignments in . With abuse of notation, we use and interchangeably. We write to mean .
For a state , we write for a state that is identical to , except that it maps x to . We write to denote that is the value of the expression in state , i.e., the result of substituting each program variable x in with , and replacing functions and predicates with their FOL counterparts. The value of is an FOL term or an FOL formula over . For example, .
Given two configurations and , we write if reduces to using one of the rules in Fig. 3. Note that there is no rule for skip – the program terminates once it gets into a configuration .
Let be a set of initial constants. In the initial state of a program, every variable is mapped to the corresponding initial constant, i.e., .
The operational semantics induces, for a UP , a transition system , where is the set of configurations, is the initial configuration, and . A configuration of is reachable if is reachable from in . We denote the set of all reachable configurations in using . The statements in the semantics of , including the intermediate statements, are called locations of , and are denoted by . We often use and interchangeably.
Our semantics of UPL differs in some respects from the one in [DBLP:journals/pacmpl/MathurMV19]. First, we follow a more traditional small-step operational semantics presentation, by providing semantics rules and the corresponding transition system. However, this does not change the semantics conceptually. More importantly, we ensure that the path condition remains satisfiable in all reachable configurations (by only allowing an assume statement to execute when it results in a satisfiable path condition). We believe this is a more natural choice that is also consistent with what is typically used in other symbolic semantics. UP reachability under our semantics coincides with the definition of [DBLP:journals/pacmpl/MathurMV19].
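The following sketch illustrates the flavor of such a symbolic semantics for assignments and assume statements. Since the rules of Fig. 3 are not reproduced here, the exact rule shapes are assumptions of this example: an assignment binds the variable to a fresh constant that is equated with the value of the expression, and an assume is executed only if the extended path condition stays satisfiable. The helper names (`evaluate`, `assign`, `assume`, `is_sat`) and the fresh-name scheme are hypothetical.

```python
from itertools import combinations


def subterms(t):
    yield t
    if isinstance(t, tuple):
        for a in t[1:]:
            yield from subterms(a)


def is_sat(literals):
    """Naive congruence closure; literals are triples (op, s, t), op in {"=", "!="}."""
    terms = {u for _, s, t in literals for r in (s, t) for u in subterms(r)}
    rep = {t: t for t in terms}

    def find(t):
        while rep[t] != t:
            t = rep[t]
        return t

    for op, s, t in literals:
        if op == "=":
            rep[find(s)] = find(t)
    apps = [t for t in terms if isinstance(t, tuple)]
    changed = True
    while changed:
        changed = False
        for s, t in combinations(apps, 2):
            if (find(s) != find(t) and s[0] == t[0] and len(s) == len(t)
                    and all(find(a) == find(b) for a, b in zip(s[1:], t[1:]))):
                rep[find(s)] = find(t)
                changed = True
    return all(find(s) != find(t) for op, s, t in literals if op == "!=")


def evaluate(expr, state):
    """Substitute each program variable in expr by its current constant."""
    if isinstance(expr, tuple):
        return (expr[0], *(evaluate(a, state) for a in expr[1:]))
    return state[expr]


def assign(state, pc, x, expr):
    """x := expr. The fresh name uses len(pc), which grows with every step."""
    c = f"{x}_{len(pc)}"
    return {**state, x: c}, pc + [("=", c, evaluate(expr, state))]


def assume(state, pc, op, e1, e2):
    """assume(e1 op e2). Returns None when the step is infeasible."""
    new_pc = pc + [(op, evaluate(e1, state), evaluate(e2, state))]
    return (state, new_pc) if is_sat(new_pc) else None


if __name__ == "__main__":
    q, pc = {"x": "x0", "y": "y0", "z": "z0"}, []
    q, pc = assign(q, pc, "x", ("f", "y"))
    q, pc = assign(q, pc, "z", ("f", "y"))
    print(assume(q, pc, "!=", "x", "z"))  # infeasible: both equal f(y0)
```

Blocking infeasible assumes, as in `assume` above, is what keeps the path condition satisfiable in every reachable configuration.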
Definition 1 (UP Reachability)
Given a UP , determine whether there exists a state and a path condition s.t. the configuration is reachable in .
A certificate for unreachability of location , is an inductive assertion map (or an inductive invariant) s.t. .
Definition 2 (Inductive Assertion Map)
Let , be restriction of to . An inductive assertion map of an UP , is a map s.t. (a) , and (b) if , then .
In [DBLP:journals/pacmpl/MathurMV19], a special sub-class of UPs has been introduced with a decidable reachability problem.
Definition 3 (Coherent Uninterpreted Program [DBLP:journals/pacmpl/MathurMV19])
A UP is coherent (CUP) if all of the reachable configurations of satisfy the following two properties:
- Memoizing
for any configuration , if there is a term s.t. , then there is s.t. .
- Early assume
for any configuration , if there is a term s.t. where is a superterm of either or , then, there is s.t. .
Intuitively, memoization ensures that if a term is recomputed, then it is already stored in a program variable; early assume ensures that whenever an equality between variables is assumed, any of their superterms that was ever computed is still stored in a program variable. Note that unlike the original definition of CUP in [DBLP:journals/pacmpl/MathurMV19], we do not require the notion of an execution. The path condition accumulates the history of the execution in a configuration, which is sufficient.
An example of a CUP is shown in Fig. 4. Some reachable states in the first iteration of the loop are shown below, where line numbers are used as locations, and stands for the path condition at line :
The program is coherent because (a) no term is recomputed; (b) for the assume at line 10, the only superterms of and are and , and they are stored in x and y, respectively; and (c) for the assume introduced by the exit condition of the while loop, no superterms of , are ever computed. The program does not reduce to (i.e., it does not reach a final configuration). Its inductive assertion map is shown in Fig. 4 (right).
Note that UPs are closely related, but not equivalent, to the Herbrand programs of [dblp:conf/vmcai/muller-olmrs05]. While Herbrand programs use the syntax of UPL, they are interpreted over a fixed universe of Herbrand terms. In particular, in Herbrand programs is always false (since and have different top-level functions), while in UP, it is satisfiable.
IV Abstraction and Bisimulation for UP
In this section, we review abstractions for transition systems. We then define two abstractions for UP, cover and renaming, and show that they induce bisimulation. That is, for UP, these abstractions preserve all properties. Finally, we show a simple logical characterization result for UP to set the stage for our main results in the following sections.
Given a transition system and a (possibly partial) abstraction function , the induced abstract transition system is , where
We write when . Note that must be defined for .
Throughout the paper, we construct several abstract transition systems. All transition systems considered are attentive. Intuitively, this means that their transitions do not distinguish between configurations that have -equivalent path conditions. We say that two configurations and are equivalent, denoted if .
Definition 5 (Attentive TS)
A transition system is attentive if for any two configurations s.t. , if there exists s.t. , then there exists , s.t. and and vice versa.
Weak and strong preservation of properties between the abstract and the concrete transition systems are ensured by the notions of simulation and bisimulation, respectively.
Definition 6 ([DBLP:books/daglib/0067019])
Let and be transition systems. A relation is a simulation from to , if for every :
if then there exists such that and .
is a bisimulation from to if is a simulation from to and is a simulation from to . We say that simulates, respectively is bisimilar to, if there exists a simulation, respectively, a bisimulation, from to such that .
We say that a bisimulation is finite if its range, , is finite. A finite bisimulation relates a (possibly infinite) transition system with a finite one.
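For finite transition systems, the largest relation that is a simulation in both directions can be computed by a simple refinement loop. The sketch below is a generic illustration of Def. 6 on explicit finite graphs, under the assumption that states carry no labels beyond an optional compatibility predicate; the function name and the dictionary encoding of transitions are choices made for this example only.

```python
def greatest_bisimulation(states1, trans1, states2, trans2,
                          compatible=lambda s, t: True):
    """Return the largest relation R such that R and its inverse are simulations.

    `trans1` and `trans2` map a state to an iterable of successor states.
    """
    rel = {(s, t) for s in states1 for t in states2 if compatible(s, t)}
    changed = True
    while changed:
        changed = False
        for (s, t) in list(rel):
            # Every move of s must be matched by some move of t ...
            fwd = all(any((s2, t2) in rel for t2 in trans2.get(t, ()))
                      for s2 in trans1.get(s, ()))
            # ... and every move of t must be matched by some move of s.
            bwd = all(any((s2, t2) in rel for s2 in trans1.get(s, ()))
                      for t2 in trans2.get(t, ()))
            if not (fwd and bwd):
                rel.discard((s, t))
                changed = True
    return rel


if __name__ == "__main__":
    # A two-state cycle is bisimilar to a single self-loop.
    cycle = {0: [1], 1: [0]}
    loop = {"a": ["a"]}
    print(greatest_bisimulation([0, 1], cycle, ["a"], loop))
```

Two systems are bisimilar in the sense of Def. 6 when the pair of their initial configurations belongs to the computed relation. This explicit computation only applies to finite systems; for UPs, the paper establishes bisimulation by proof rather than by enumeration.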
Next, we define two abstractions for UP programs and show that they result in bisimilar abstract transition systems. The first abstraction eliminates all constants that are not assigned to program variables from the path condition, using the cover operation. The second abstraction renames the constants assigned to program variables back to the initial constants . Both abstractions together ensure that all reachable configurations in the abstract transition system are defined over (i.e., the only constants that appear in states, as well as in path conditions, are from ). There may still be infinitely many such configurations since the depth of terms may be unbounded. We show that whenever the obtained abstract transition system has finitely many reachable configurations, the concrete one has an inductive assertion map that characterizes the set of reachable configurations.
Definition 7 (Cover abstraction)
The cover abstraction function is defined by
Since , the cover abstraction also results in a bisimilar abstract transition system.
For any attentive transition system , the relation is a bisimulation from to .
To introduce the renaming abstraction, we need some notation. Given a quantifier free formula , constants such that , let denote , where is a constant not in . For example, if , .
Given a path condition and a state , let denote the formula obtained by renaming all constants in using their initial values. for all such that .
Definition 8 (Renaming abstraction)
The renaming abstraction function is defined by
For any attentive transition system , the relation is a bisimulation from to .
Finally, we denote by the composition of the renaming and cover abstractions: (i.e., ). Since the composition of bisimulation relations is also a bisimulation, is bisimilar to .
Theorem 3 (Logical Characterization of UP)
If induces a finite bisimulation on a UP , then there exists an inductive assertion map for that characterizes the reachable configurations of .
Define . Then, is such an inductive assertion map.
Intuitively, Thm. 3 says that an inductive invariant of a UP, whenever it exists, can be described using EUF formulas over program variables. That is, any extra variables that are added to the path condition during program execution can be abstracted away (specifically, using the cover abstraction). There are, of course, infinitely many candidate invariants of this form, since only the constants occurring in them are bounded, not the depth of terms. In the sequel, we systematically construct a similar result for CUP.
V Bisimulation of CUP
The first step in extending Thm. 3 to CUP is to design an abstraction function that bounds the depth of terms that appear in any reachable (abstract) state. It is easy to design such a function while maintaining soundness – simply forget literals that have terms that are too deep. However, we want to maintain precision as well. That is, we want the abstract transition system to be bisimilar to the concrete one. Just like cover abstraction, the base abstraction function also eliminates all constants that are not assigned to program variables. Unlike cover abstraction, the base abstraction does not maintain -equivalence of the path conditions, but, rather, forgets most literals that cannot be expressed over program variables.
In this section, we focus on the definition of the base abstraction and prove that it induces bisimulation for CUP. This result is used in Sec. VI, to logically characterize CUPs.
Intuitively, the base abstraction “truncates” the congruence graph induced by a path condition in nodes that have no representative in the set of constants assigned to the program variables ( in the following definition), and assigns to the truncated nodes fresh constants (from in the following definition).
Congruence closure procedures for EUF use a congruence graph to concisely represent the deductive closure of a set of EUF literals [DBLP:journals/jacm/NelsonO80, DBLP:conf/rta/NieuwenhuisO05]. Here, we use a logical characterization of a congruence graph, called a -basis. Let be a set of EUF literals. A triple is a -basis of relative to a set of constants , written , iff (a) is a set of fresh constants not in , and and are conjunctions of EUF literals; (b) (; (c) and , where
(d) for any , ; and (e) for any s.t. .
Note that we represent both equalities and disequalities in the -basis, as is common in implementations (but not in the theoretical presentations) of the congruence closure algorithm. Intuitively, are constants in that represent equivalence classes in , and are constants added to represent equivalence classes that do not have a representative in . A -basis, of any satisfiable set , is unique up to renaming of constants in and ordering of equalities between constants in .
Let and . A -basis of is , where , , . Renaming to is a different -basis: where , and .
As another example, consider and . A -basis of is , where , , and
While a basis maintains all consequences of (since ), the -base abstraction of , defined next, is weaker. It preserves consequences of only:
Definition 9 (-base abstraction)
The -base abstraction for a set of constants , is a function between sets of literals s.t. for any sets of literals and :
, where ,
if there exists a s.t. and , then .
The second requirement of Def. 9 ensures that two formulas that have the same -consequences have the same -abstraction. For example, for a set of constants , the formulas and , have the same -base abstraction: . Note that at this point, we only require that is well defined (for example, it does not have to be computable).
We now extend -base abstraction to program configuration, calling it simply base abstraction, since the set of preserved constants is determined by the configuration:
Definition 10 (Base abstraction)
The base abstraction is defined for configurations , where is a conjunction of literals: .
Namely, the base abstraction applied to the path condition is determined by the state in the configuration. We often write as a shorthand for .
We are now in position to state the main result of this section. Given a CUP , the abstract transition system is bisimilar to the concrete transition system . Note that at this point, we do not claim that is finite, or that it is computable. We focus only on the fact that the literals that are forgotten by the base abstraction do not matter for any future transitions. The key technical step is summarized in the following theorem:
Theorem 4
Let be a reachable configuration of a CUP . Then,
The proof of Thm. 4 is not complicated, but it is tedious and technical. It depends on many basic properties of EUF. We summarize the key results that we require in the following lemmas. The proofs of the lemmas are provided in App. B.
We begin by defining a purifier – a set of constants sufficient to represent a set of EUF literals with terms of depth one.
Definition 11 (Purifier)
We say that a set of constants is a purifier of a constant in a set of literals , if and for every term s.t. , s.t. .
For example, if . Then, is a purifier for , but not a purifier for , even though .
In all the following lemmas, , , are sets of literals; a set of constants; ; ; is a purifier for in , , and in ; ; and .
Lemma V says that anything newly derivable from and a new equality is derivable using superterms of and :
Let and be two terms in s.t. . Then, , for some constants and in , iff there are two superterms, and , of and , respectively, s.t. (i) , (ii) , and (iii) .
Lemma V says that can be described using terms of depth one using constants in .
is a purifier for in .
Lemma V says that is idempotent.
If , then
Lemma V extends the preservation results to disequalities. is a set of constants, . is not required to be a purifier (as it was in the previous lemmas).
Lemma V extends the preservation results for equalities involving a fresh constant s.t. . , , and be a term s.t. there does not exist a term s.t. or .
We are now ready to present the proof of Thm. 4:
Proof (Theorem 4)
In the proof, we use , and . For part (1), we only show the proof for since the other cases are trivial.
The only-if direction follows since is weaker than . For the if direction, since it is part of a reachable configuration. Then, there are two cases:
case . Assume . Then, and for some . By Lemma V, in any new equality that is implied by (but not by ), and are equivalent (in ) to superterms of or . By the early assume property of CUP, purifies in . Therefore, every superterm of or is equivalent (in ) to some constant in . Thus, and for some . By Lemma V, . By Lemma V, . Thus, .
case . if and only if . Since , .
For part (2), we only show the cases for assume and assignment statements, the other cases are trivial.
case . W.l.o.g., for some constant . There are two cases: (a) there is a term s.t. , (b) there is no such term .
By the memoizing property of CUP, there is a program variable z s.t. and . Therefore, by definition of , . The rest of the proof is identical to the case of .
For a CUP , the relation is a bisimulation from to