1 Introduction
Proving the satisfiability or unsatisfiability of a first-order formula (possibly modulo some background theory) is an essential problem in computer science, in particular for the automatic verification of complex systems, and instantiation schemes can be used for this purpose. Such schemes can be viewed as functions that map a set of formulæ (or clauses) to a set of ground (i.e., variable-free) instances of these formulæ. An instantiation scheme is refutationally complete if, for every set of clauses, the resulting set of ground instances is satisfiable exactly when the original set is. Examples of refutationally complete instantiation schemes include [22, 24, 17, 5]. A refutationally complete instantiation scheme does not always terminate, since the set of ground instances may be infinite, but schemes that are both complete and terminating can be defined for specific classes of clause sets, which are thus decidable. A trivial and well-known example is the Bernays-Schönfinkel class (i.e., the class of purely universal formulæ without function symbols of arity distinct from 0; see, e.g., [11]), since in this case the set of ground instances is finite. Other examples include the class of stratified clause sets [1] and many classes of clause sets consisting of a set of ground formulæ together with the set of axioms of a specific theory, for instance the theory of arrays [6]. In this last case, of course, only the axioms need to be instantiated.
Instantiation schemes can also be defined for specific theories for which decision procedures exist. Then, the theory is not axiomatized, but directly handled by an external prover used as a “black box”. In this case, the instantiation procedure should preserve the validity of the formula modulo the considered theory. Such procedures are appealing, because it is usually much easier to check the validity of a ground set than that of a non-ground set (see for instance [7]).
Frequently, one has to handle heterogeneous problems, defined on complex theories for which no instantiation procedure exists. Such theories are often obtained by combining simpler theories. For instance, the theory describing a data structure (arrays, lists, etc.) may be combined with the theory modeling the elements it contains (e.g., integers). Most systems rely on the Nelson-Oppen method (and its numerous refinements) to reason on combinations of theories. This scheme allows one, under certain conditions, to combine independent decision procedures (see, e.g., [27]), but it is of no use for reasoning on theories that include axioms containing function or predicate symbols from both theories. As an example, consider the following formula:
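$\forall i, j.\ i \leq j \Rightarrow \mathit{select}(t, i) \leq \mathit{select}(t, j)$
that states that an array $t$ is sorted. This formula uses symbols from the theory of integers (the predicate $\leq$) and from the theory of arrays (the function $\mathit{select}$, which returns the value stored in a certain array at a certain index).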
In this paper, we show how to automatically construct instantiation schemes for such axioms by combining existing instantiation schemes. More precisely, from two complete instantiation procedures, one for the theory of integers and one for the theory of arrays, we construct a new procedure that is able to handle a particular class of “mixed” axioms, containing function symbols from both theories (including for instance the axioms for sorted arrays and many others). The combined procedure is complete and terminating if both component procedures are (as proven in Section 3.3). This approach is not restricted to specific theories such as integers and arrays; on the contrary, it is generic and applies to a wide range of theories, and some examples are provided in Section 4. The conditions that must be satisfied by the considered theories and by their instantiation procedures are precisely identified (see Section 3.2).
Comparison with Related Work
There is an extensive amount of work on the combination of (usually disjoint) theories, using mainly refinements or extensions of the Nelson-Oppen method (see, e.g., [27, 8]). For instance, [14] shows that many decidable fragments of first-order logic can be combined with any disjoint theory, even if these fragments do not fulfill the stable infiniteness condition in general. A related result is presented in [15] for the theory of lists (with a length function). However, these results do not apply to non-disjoint theories such as the ones we consider in this paper, and they cannot handle nested combinations of arbitrary theories.
Reasoning on the combination of theories with mixed axioms has been recognized as an important problem and numerous solutions have been proposed in many specific cases. Most existing work focuses on testing the satisfiability of ground formulæ in combinations or extensions of existing theories. In contrast, our method aims at reducing non-ground satisfiability to ground satisfiability tests, via instantiation.
For instance, [7, 6] define a decision procedure for extensions of the theory of arrays with integer elements, which is able to handle axioms such as the one above for sorted arrays. As we shall see in Section 4, our approach, when applied to these particular theories, makes it possible to handle a strictly more expressive class of quantified formulæ.
[19] focuses on arrays with integer indices and devises a method to combine existing decision procedures (for Presburger arithmetic and for the theory of arrays). This method is able to handle some important specific features of arrays such as sortedness or array dimension. Similarly to our approach, theirs is based on an instantiation of the axioms. As we shall see, some of its features can be tackled with our method and others (such as Injectivity) are out of its scope. However, our method is generic in the sense that it applies to a wide class of theories and axioms (in particular, it applies to axioms that are not considered in [19]). It is essentially syntactic, whereas that of [19] is more of a semantic nature.
A logic devoted to reasoning with arrays of integers is presented in [21] and the decidability of the satisfiability problem is established by reduction to the emptiness problem for counter automata. In Section 4 we shall show that the expressive power of this logic is again incomparable with the one we obtain with our approach.
[18] proposes an instantiation scheme for sets of clauses possibly containing arithmetic literals, which can handle some of the axioms we consider. However, termination is not guaranteed for this scheme, in contrast to ours.
Slightly closer to our approach is the work described in [25, 26], which defines the notion of the (stably) local extension of a theory and shows that the satisfiability problem in a (stably) local extension of a theory can be reduced to a mere satisfiability test in the base theory. The notion of a local extension is a generalization of the notion of a local theory [16]. The idea is that, for testing the satisfiability of a ground formula in the local extension of a theory, it is sufficient to instantiate the variables occurring in the new axioms by ground terms occurring either in the formula or in the axioms. This condition holds for numerous useful extensions of base theories, including for instance extensions with free functions, with selector functions for an injective constructor, with monotone functions over integers or reals, etc. Our approach departs from these results because our goal is not to extend basic theories, but rather to combine existing instantiation procedures. Note also that the notion of a local extension is a semantic one, and that this property must be established separately for every considered extension. In our approach, we define conditions on the theories ensuring that they can be safely combined. These conditions can be tested once and for all for each theory, and then any combination is allowed. The extensions we consider in this paper are not necessarily local and thus do not fall under the scope of the method in [25, 26]. However, an important restriction of our approach compared to [25, 26] is that the theories must be combined in a hierarchic way: intuitively, there can be function symbols mapping elements of the first theory (the “base” theory) to elements of the second one (the “nesting” theory), but no function symbols are allowed from the nesting theory to the base theory.
Extensions of the superposition calculus [3] have been proposed to handle first-order extensions of a base theory (see for example [4, 2]). The superposition calculus is used to reason on the generic part of the formulæ whereas the theory-specific part is handled by an external prover. These proof procedures can be used to reason on some of the formulæ we consider in the present paper. However, we are not aware of any termination result for these approaches (even completeness requires additional restrictions that are not always satisfied in practice). Our method is instantiation-based rather than superposition-based, and ensures that termination is preserved by the combination, at the cost of much stronger syntactic restrictions on the considered formulæ.
Organization of the Paper
The rest of the paper is structured as follows. Section 2 contains general definitions and notations used throughout the present work. Most of them are standard, but some are more particular, such as the notions of clauses or specifications. Section 3 describes our procedure for the nested combination of instantiation schemes, and introduces conditions to ensure that completeness is preserved. Section 4 shows some interesting applications of these results for theories that are particularly useful in the field of verification (especially for extensions of the theory of arrays). Section 5 concludes the paper and outlines directions for future work.
2 Preliminaries
In this section, we first briefly review usual notions and notations about first-order clausal logic. Then we introduce the rather non-standard notion of an ω-clause (a clause with infinitely many literals). We define the notion of specifications and provide some examples showing how usual theories such as those for integers or arrays can be encoded. Finally we introduce the notion of instantiation methods.
2.1 Syntax
Let be a set of sort symbols and be a set of function symbols together with a ranking function . For every , we write if . If then is a constant symbol of sort . We assume that contains at least one constant symbol of each sort. To every sort is associated a countably infinite set of variables of sort , such that these sets are pairwise disjoint. denotes the whole set of variables. For every , the set of terms of sort is denoted by and built inductively as usual on and :

.

If and for all then .
The set of terms is defined by .
An atom is an equality between terms of the same sort. A literal is either an atom or the negation of an atom (written ). If is a literal, then denotes its complementary: and . A clause is a finite set (written as a disjunction) of literals. We assume that contains a sort and that contains a constant symbol true of sort . For readability, atoms of the form will be simply denoted by (thus we write, e.g., instead of ). An atom is equational iff it is of the form where .
The set of variables occurring in an expression (term, atom, literal or clause) is denoted by . is ground iff . The set of ground terms of sort is denoted by and the set of ground terms by .
A substitution is a function that maps every variable to a term of the same sort. The domain of a substitution is the set of variables that it does not map to themselves (for technical convenience, we do not assume that this set is finite), and its codomain is the set of elements the variables in the domain are mapped to. Substitutions are extended to terms, atoms, literals and clauses as usual. A substitution is ground if it maps every variable in its domain to a ground term. A ground instance of an expression is the result of applying to this expression a ground substitution whose domain is the set of variables occurring in the expression.
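The notions of substitution, domain and ground instance can be illustrated by a minimal Python sketch. The term encoding (a variable is a string, a compound term is a tuple headed by its function symbol) and all function names are assumptions made for this illustration only; sorts are ignored for brevity.

```python
# Illustrative encoding (an assumption, not the paper's notation):
# a variable is a str; f(t1, ..., tn) is the tuple ("f", t1, ..., tn);
# a constant c is the 1-tuple ("c",). Sorts are ignored here.

def apply_subst(term, sigma):
    """Apply the substitution sigma (dict: variable -> term) to a term."""
    if isinstance(term, str):
        return sigma.get(term, term)
    return (term[0],) + tuple(apply_subst(arg, sigma) for arg in term[1:])

def variables(term):
    """Return the set of variables occurring in a term."""
    if isinstance(term, str):
        return {term}
    result = set()
    for arg in term[1:]:
        result |= variables(arg)
    return result

def is_ground(term):
    """A term is ground iff it contains no variable."""
    return not variables(term)

def domain(sigma):
    """Variables that sigma does not map to themselves."""
    return {x for x, t in sigma.items() if t != x}

# select(a, succ(i)) instantiated by {a -> t0, i -> zero}
t = ("select", "a", ("succ", "i"))
sigma = {"a": ("t0",), "i": ("zero",)}
ground_t = apply_subst(t, sigma)
```

Here `ground_t` is a ground instance of `t`, because the domain of `sigma` is exactly the set of variables of `t` and every variable is mapped to a ground term.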
Definition.
A substitution is pure iff for all , . In this case, for any term , is a pure instance of . A substitution is a renaming if it is pure and injective.
A substitution is a unifier of a set of pairs of terms iff applying it makes the two terms of every pair identical. It is well-known that all unifiable sets have a most general unifier (mgu), which is unique up to a renaming.
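For readers who want a concrete picture of unifiers, the following is a small sketch of syntactic unification with an occurs check. It is a textbook-style algorithm rather than anything specific to this paper; the tuple-based term encoding and the names `unify` and `resolve` are assumptions, and sorts are again ignored.

```python
# Variables are str; f(t1, ..., tn) is ("f", t1, ..., tn); constants are 1-tuples.

def walk(term, sigma):
    """Follow variable bindings until an unbound variable or a compound term."""
    while isinstance(term, str) and term in sigma:
        term = sigma[term]
    return term

def occurs(var, term, sigma):
    """Check whether var occurs in term, modulo the bindings in sigma."""
    term = walk(term, sigma)
    if isinstance(term, str):
        return term == var
    return any(occurs(var, arg, sigma) for arg in term[1:])

def unify(pairs):
    """Return a unifier in triangular form (dict), or None if none exists."""
    sigma = {}
    stack = list(pairs)
    while stack:
        s, t = stack.pop()
        s, t = walk(s, sigma), walk(t, sigma)
        if s == t:
            continue
        if isinstance(s, str):
            if occurs(s, t, sigma):
                return None
            sigma[s] = t
        elif isinstance(t, str):
            if occurs(t, s, sigma):
                return None
            sigma[t] = s
        elif s[0] == t[0] and len(s) == len(t):
            stack.extend(zip(s[1:], t[1:]))
        else:
            return None  # clash between distinct function symbols
    return sigma

def resolve(term, sigma):
    """Fully apply a triangular substitution to a term."""
    term = walk(term, sigma)
    if isinstance(term, str):
        return term
    return (term[0],) + tuple(resolve(a, sigma) for a in term[1:])

left = ("f", "x", ("a",))
right = ("f", ("b",), "y")
mgu = unify([(left, right)])
```

Applying `resolve` with `mgu` to `left` and `right` yields the same term, which is the defining property of a unifier.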
2.2 Semantics
An interpretation is a function mapping:

Every sort symbol to a nonempty set .

Every function symbol to a function .
denotes the domain of , i.e., the set . As usual, the valuation function maps every ground expression to a value defined as follows:

,

iff ,

iff ,

iff .
An interpretation satisfies an clause if for every ground instance of we have . A set of clauses is satisfied by if satisfies every clause in . If this is the case, then is a model of and we write . A set of clauses is satisfiable if it has a model; two sets of clauses are equisatisfiable if one is satisfiable exactly when the other is satisfiable.
In the sequel, we restrict ourselves, w.l.o.g., to interpretations such that, for every , .
2.3 Clauses
For technical convenience, we extend the usual notion of a clause by allowing infinite disjunction of literals:
Definition.
An ω-clause is a possibly infinite set of literals.
The notion of instance extends straightforwardly to clauses: if is an clause then denotes the clause (recall that the domain of may be infinite). Similarly, the semantics of clauses is identical to that of standard clauses: if is a ground clause, then iff there exists an such that . If is a nonground clause, then iff for every ground substitution of domain , . The notions of satisfiability, models etc. are extended accordingly. If are two sets of clauses, we write if for every clause there exists a clause such that .
Proposition.
If then is a logical consequence of .
Of course, most of the usual properties of first-order logic such as semi-decidability or compactness fail if clauses with infinitely many literals are considered. For instance, if stands for the clause and for , then is unsatisfiable, although every finite subset of is satisfiable.
2.4 Specifications
Usually, theories are defined by sets of axioms and are closed under logical consequence. In our setting, we will restrict either the class of interpretations (e.g., by fixing the interpretation of a sort to the natural numbers) or the class of clause sets (e.g., by considering only clause sets belonging to some decidable fragments or containing certain axioms). This is why we introduce the (slightly unusual) notion of specifications, of which we provide examples in the following section:
Definition.
A specification is a pair , where is a set of interpretations and is a class of clause sets. A clause set is satisfiable if there exists an such that . and are equisatisfiable if they are both satisfiable or both unsatisfiable. We write iff every model of is also an model of .
For the sake of readability, if is clear from the context, we will say that a set of clauses is satisfiable, instead of satisfiable. We write iff and . By a slight abuse of language, we say that occurs in if there exists such that .
In many cases, is simply the set of all interpretations, which we denote by . But our results also apply to domainspecific instantiation schemes such as those for Presburger arithmetic. Of course, restricting the form of the clause sets in is necessary in many cases for defining instantiation schemes that are both terminating and refutationally complete. That is why we do not assume that contains every clause set. Note that axioms may be included in . We shall simply assume that is closed under inclusion and ground instantiations, i.e., for all if and only contains ground instances of clauses in , then . All the classes of clause sets considered in this paper satisfy these requirements.
We shall restrict ourselves to a particular class of specifications: those with a set of interpretations that can be defined by a set of clauses.
Definition.
A specification is definable iff there exists a (possibly infinite) set of clauses such that .
From now on, we assume that all the considered specifications are definable.
2.5 Examples
Example.
The specification of firstorder logic is defined by where:

is the set of all interpretations (i.e. ).

is the set of all clause sets on the considered signature.
Example.
The specification of Presburger arithmetic is defined as follows: where:

contains the domain axiom: and the usual axioms for the function symbols , , , , , and for the predicate symbols (for every ) and :
denotes equality modulo (which will be used in Section 4.1.1); denote variables of sort and is any natural number. Note that the domain axiom is an infinite clause, while the other axioms can be viewed as standard clauses.

is the class of clause sets built on the set of function symbols and on the previous set of predicate symbols.
In the sequel, the terms and will be written and respectively.
Example.
The specification of arrays is where:

, where and ( is a variable of sort array, are variables of sort ind and is a variable of sort elem).

is the class of ground clause sets built on , and a set of constant symbols.
It should be noted that reals can be also handled by using any axiomatization of real closed fields.
2.6 Instantiation Procedures
An instantiation procedure is a function that reduces the satisfiability problem for any set of clauses to that of a (possibly infinite) set of ground clauses.
Definition.
Let be a specification. An instantiation procedure for is a function from to such that for every , is a set of ground instances of clauses in . is complete for if for every , and are equisatisfiable. It is terminating if is finite for every .
If is complete and terminating, and if there exists a decision procedure for checking whether a ground (finite) clause set is satisfiable in , then the satisfiability problem is clearly decidable. Several examples of complete instantiation procedures are available in the literature [24, 17, 5, 18, 23, 1, 7, 13, 12]. Our goal in this paper is to provide a general mechanism for constructing new complete instantiation procedures by combining existing ones.
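As an illustration of a terminating instantiation procedure, the sketch below enumerates all ground instances of function-free clauses over a finite set of constants, which is the situation of the Bernays-Schönfinkel class mentioned in the introduction. The encoding of literals as `(positive, predicate, arguments)` triples and the name `instantiate` are assumptions for this example; the paper itself encodes atoms as equations.

```python
from itertools import product

# A literal is (positive, predicate, args); an argument is either a variable
# (a str) or a constant (a 1-tuple such as ("a",)). A clause is a list of literals.

def clause_vars(clause):
    """Sorted list of the variables occurring in a clause."""
    return sorted({a for (_, _, args) in clause for a in args if isinstance(a, str)})

def instantiate(clauses, constants):
    """Return every ground instance of every clause over the given constants."""
    ground = set()
    for clause in clauses:
        xs = clause_vars(clause)
        for values in product(constants, repeat=len(xs)):
            sigma = dict(zip(xs, values))
            ground.add(frozenset(
                (pos, pred, tuple(sigma.get(a, a) for a in args))
                for (pos, pred, args) in clause))
    return ground

# not p(x) or q(x, y), over the constants a and b
clauses = [[(False, "p", ("x",)), (True, "q", ("x", "y"))]]
constants = [("a",), ("b",)]
ground_set = instantiate(clauses, constants)
```

The result is finite whenever the sets of clauses and constants are finite, so the satisfiability test can be delegated to any procedure for ground clause sets.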
3 Nested Combination of Specifications
3.1 Definition
Theories are usually combined by considering their (in general disjoint) union. Decision procedures for disjoint theories can be combined (under certain conditions) by different methods, including the Nelson-Oppen method [27] or its refinements. In this section we consider a different way of combining specifications. The idea is to combine them in a “hierarchic” way, i.e., by considering the formulæ of the first specification as constraints on the formulæ of the second one.
For instance, if is the specification of Presburger arithmetic and is the specification of arrays, then:

is a formula of ( denotes a variable and denotes a constant symbol of sort ).

is a formula of (stating that is a constant array).

(stating that is a constant on the interval ) is a formula obtained by combining and hierarchically.
Such a combination cannot be viewed as a union of disjoint specifications, since the axioms contain function symbols from both specifications. In this example, is a base specification and is a nesting specification.
More formally, we assume that the set of sorts is divided into two disjoint sets and such that for every function , if , then . A term is a base term if it is of a sort and a nesting term if it is of a sort and contains no nonvariable base term. In the sequel we let (resp. ) be the set of base variables (resp. nesting variables) and let (resp. ) be the set of function symbols whose codomain is in (resp. ). An ground instance of an expression is an expression of the form where is a ground substitution of domain . Intuitively, an ground instance of is obtained from by replacing every variable of a sort (and only these variables) by a ground term of the same sort.
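The distinction between base terms and nesting terms can be sketched as follows. The sort names, the symbol table and the helper names are hypothetical and chosen only for this illustration (integers as the base sort, arrays and elements as nesting sorts).

```python
# Hypothetical signature: "int" is the only base sort.
BASE_SORTS = {"int"}
RESULT_SORT = {"zero": "int", "succ": "int", "select": "elem", "a": "array"}
VAR_SORT = {"i": "int", "x": "elem", "t": "array"}

def sort_of(term):
    """Sort of a term: looked up by variable name or by head symbol."""
    return VAR_SORT[term] if isinstance(term, str) else RESULT_SORT[term[0]]

def is_base(term):
    """A base term is a term of a base sort."""
    return sort_of(term) in BASE_SORTS

def subterms(term):
    """Yield the term and all of its subterms."""
    yield term
    if not isinstance(term, str):
        for arg in term[1:]:
            yield from subterms(arg)

def is_nesting(term):
    """A nesting term has a nesting sort and no non-variable base subterm."""
    if is_base(term):
        return False
    return not any(is_base(s) and not isinstance(s, str) for s in subterms(term))

ok = ("select", "t", "i")                 # base positions hold variables only
not_ok = ("select", "t", ("succ", "i"))   # contains the base term succ(i)
```

In this sketch `ok` is a nesting term whereas `not_ok` is not, because the latter contains a non-variable base term.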
Definition.
denotes the set of clauses such that every term occurring in is a base term. denotes the set of clauses such that:

Every nonvariable term occurring in is a nesting term.

For every atom occurring in , and are nesting terms.
Notice that it follows from the definition that , since and are disjoint.
Definition.
A specification is a base specification if and for every , . It is a nesting specification if and for every , .
Throughout this section, will denote a base specification and denotes a nesting specification. Base and nesting specifications are combined as follows:
Definition.
The hierarchic expansion of over is the specification defined as follows:

.

Every clause set in is of the form , where and .
If is a clause in , then is the base part of the clause and is its nesting part. If is a set of clauses in , then and respectively denote the sets and , and are respectively called the base part and nesting part of .
The following proposition shows that the decomposition in Condition 2 is unique.
Proposition.
For every clause occurring in a clause set in , there exist two unique clauses and such that .
Proof.
Example.
Consider the following clauses:

Clauses and occur in , and for instance, and . Clause does not occur in because the atom of the nesting specification contains the nonvariable term of the base specification. However, can be equivalently written as follows:

and is in . (However, as we shall see in Section 4, our method cannot handle such axioms, except in some very particular cases. In fact, adding axioms relating two consecutive elements of an array easily yields undecidable specifications, as shown in [6].) Clause does not occur in , because contains symbols from both (namely ) and () which contradicts Condition 2 of Definition 3.1. However, can be handled in this setting by considering a copy of (with disjoint sorts and function symbols). In this case, belongs to , where denotes the union of the specifications and . Of course can be replaced by any other specification containing an ordering predicate symbol. The same transformation cannot be used on the clause , since (because of the literal ) the sort of the indices cannot be separated from that of the elements. Again, this is not surprising because, as shown in [6], such axioms (in which index variables occur out of the scope of a ) easily make the theory undecidable.
Since and are disjoint, the boolean sort cannot occur both in and . However, this problem can easily be overcome by considering two copies of this sort (bool and ).
3.2 Nested Combination of Instantiation Schemes
The goal of this section is to investigate how instantiation schemes for and can be combined in order to obtain an instantiation scheme for . For instance, given two instantiation schemes for integers and arrays respectively, we want to automatically derive an instantiation scheme handling mixed axioms such as those in Example 3.1. We begin by imposing conditions on the schemes under consideration.
3.2.1 Conditions on the Nesting Specification
First, we investigate what conditions can be imposed on the instantiation procedure for the nesting specification . What is needed is not an instantiation procedure that is complete for ; indeed, since by definition every term of a sort in occurring in is a variable, such an instantiation would normally replace every such variable by an arbitrary ground term (a constant, for example). This is not satisfactory because in the current setting, the value of these variables can be constrained by the base part of the clause. This is why we shall assume that the considered procedure is complete for every clause set that is obtained from clauses in by grounding the variables in , no matter the grounding instantiation.
Definition.
An mapping is a function from to . Such a mapping is extended straightforwardly into a function from expressions to expressions: for every expression (term, atom, literal, clause or set of clauses) , denotes the expression obtained from by replacing every term occurring in by .
An instantiation procedure is invariant iff for every mapping , and every clause in a set , .
We may now define nesting-complete instantiation procedures. Intuitively, such a procedure must satisfy three requirements: it must be complete on those sets in which the only terms of a base sort that occur are ground; its instances cannot depend on the names of these terms; and adding information to a clause set cannot cause the procedure to produce fewer instances.
Definition.
An instantiation procedure is nesting-complete if the following conditions hold:

For all sets and all sets such that every clause in is an ground instance of a clause in , and are equisatisfiable.

is invariant.

is monotonic: .
3.2.2 Conditions on the Base Specification
Second, we impose conditions on the instantiation procedure for the base specification . We need the following definitions:
Definition.
Let be a set of clauses and let be a set of terms. We denote by the set of clauses of the form , where and maps every variable in to a term of the same sort in .
Proposition.
Let be a set of clauses and let and be two sets of ground terms. If then .
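The operation defined above, which instantiates every variable of a clause by terms of the same sort taken from a given set, and the monotonicity property stated in the proposition can be sketched as follows. The function name `instances_over`, the representation of a clause as a frozenset of `(positive, atom)` pairs, and the sort table are assumptions for illustration.

```python
from itertools import product

def term_vars(term):
    """Variables of a term (variables are str, compound terms are tuples)."""
    if isinstance(term, str):
        return {term}
    return set().union(*(term_vars(a) for a in term[1:]))

def substitute(term, sigma):
    """Apply a substitution (dict: variable -> term) to a term."""
    if isinstance(term, str):
        return sigma.get(term, term)
    return (term[0],) + tuple(substitute(a, sigma) for a in term[1:])

def instances_over(clauses, terms_by_sort, var_sort):
    """All instances of the clauses in which every variable is replaced
    by a term of the same sort taken from terms_by_sort."""
    result = set()
    for clause in clauses:  # clause: frozenset of (positive, atom) pairs
        xs = sorted(set().union(*(term_vars(atom) for _, atom in clause)))
        pools = [terms_by_sort.get(var_sort[x], []) for x in xs]
        for values in product(*pools):
            sigma = dict(zip(xs, values))
            result.add(frozenset((pos, substitute(atom, sigma))
                                 for pos, atom in clause))
    return result

var_sort = {"i": "int", "j": "int"}
S = [frozenset({(True, ("leq", "i", "j"))})]
small = instances_over(S, {"int": [("zero",)]}, var_sort)
large = instances_over(S, {"int": [("zero",), ("one",)]}, var_sort)
```

Enlarging the set of terms can only add instances, so `small` is a subset of `large`, in line with the proposition.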
Definition.
If is a set of clauses, we denote by the set of clauses of the form such that for every , and is a pure substitution.
Example.
Let . Then contains among others the clauses , , , , etc.
Definition.
An instantiation procedure for is base-complete iff the following conditions hold:

For every there exists a finite set of terms such that and and are equisatisfiable.

If then .

For every clause set , .
Obviously these conditions are much stronger than those of Definition 3.2.1. Informally, Definition 3.2.2 states that:

All variables must be instantiated in a uniform way by ground terms (of course, sort constraints must be taken into account), and satisfiability must be preserved.

The instantiation procedure is monotonic.

The considered set of ground terms does not change when new clauses are added to , provided that these clauses are obtained from clauses already occurring in by disjunction and pure instantiation only.
3.2.3 Definition of the Combined Instantiation Scheme
We now define an instantiation procedure for . Intuitively this procedure is defined as follows.

First, the nesting part of each clause in is extracted and all base variables are instantiated by arbitrary constant symbols (one for each base sort).

The instantiation procedure for is applied on the resulting clause set. This instantiates all nesting variables (but not the base variables, since they have already been instantiated at Step 1).

All the substitutions on nesting variables from Step 2 are applied to the initial set of clauses.

Assuming the instantiation procedure for is base-complete, if this procedure was applied to the base part of the clauses, then by Condition 1 of Definition 3.2.2, the base variables in the base part of the clauses would be uniformly instantiated by some set of terms . All base variables and all occurrences of constants are replaced by all possible terms in .
Example.
Assume that , and that contains the following symbols: , , and . Consider the set .

We compute the set and replace every base variable by . This yields the set: .

We apply an instantiation procedure for (there exist several instantiation procedures for ; one such example is given in Section 4.2.1). Obviously, this procedure should instantiate the variable by , yielding .

We apply the (unique in our case) substitution to the initial clauses: . Note that at this point all the remaining variables are in .

We compute the set of clauses and the set of terms . It should be intuitively clear (a formal definition of an instantiation procedure for this fragment of Presburger arithmetic will be given in Section 4.1.1) that must be instantiated by and by , yielding .

We thus replace all base variables by every term in yielding the set , i.e., after simplification, . It is straightforward to check that this set of clauses is unsatisfiable. Any SMT solver capable of handling arithmetic and propositional logic can be employed to test the satisfiability of this set.
The formal definition of the procedure is given below. Let be a substitution mapping every variable of a sort to an arbitrary constant symbol of sort .
Definition.
Let be a base-complete instantiation procedure and be a nesting-complete instantiation procedure. is defined as the set of clauses of the form where:

.

.

is obtained from by replacing every occurrence of a constant symbol in the codomain of by a fresh variable of the same sort.

maps every variable in to a term of the same sort in .
The following proposition is straightforward to prove and states the soundness of this procedure:
Proposition.
Let be a base-complete instantiation procedure and let be a nesting-complete instantiation procedure. For every set of clauses , is a set of ground instances of clauses in . Thus if is unsatisfiable, then so is .
3.3 Completeness
The remainder of this section is devoted to the proof of the main result of this paper, namely that the procedure is complete for :
Theorem.
Let be a base-complete instantiation procedure (for ) and let be a nesting-complete instantiation procedure (for ). Then is complete for ; furthermore, this procedure is monotonic and invariant.
The rest of the section (up to Page 4) can be skipped entirely by readers not interested in the more theoretical aspects of the work. The proof of this theorem relies on a few intermediate results that are developed in what follows.
3.3.1 Substitution Decomposition
Definition.
A substitution is a base substitution iff . It is a nesting substitution iff and for every , contains no nonvariable base term.
We show that every ground substitution can be decomposed into two parts: a nesting substitution and a base substitution. We begin with an example:
Example.
Assume that , and that contains the following symbols: . Consider the ground substitution . We can extract from a nesting substitution by replacing all subterm-maximal base terms by variables, thus obtaining , and then construct the base substitution such that . Note that is not ground and that .
The following result generalizes this construction:
Proposition.
Every ground substitution can be decomposed into a product where is a nesting substitution, is a base substitution, and for all ,

,

.
Proof.
Let be the set of subterm-maximal base terms occurring in terms of the form , with . Let be a (partial) function mapping every term to an arbitrarily chosen variable such that . This function is extended into a total function on by mapping all terms for which is undefined to pairwise distinct new variables, not occurring in . Note that is injective by construction. The substitutions and are defined as follows:

and is the term obtained by replacing every occurrence of a term in by ;

; if for some term , then ; otherwise, . Note that is well-defined, since by definition if then .
By construction, is a nesting substitution and is a base substitution. Furthermore, since , for every . Similarly, for every , and therefore . Let . By definition of , is of the form for some , and there is no variable such that , since otherwise would have been defined as . Thus . Now if and , then is also of the form for some and we have and , hence and .
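The construction used in this proof can be sketched in Python. The signature (integers as base sort, lists as nesting sort), the function name `decompose` and the naming scheme for fresh variables are assumptions made for this example; the sketch abstracts each subterm-maximal base term by a variable, reusing a base variable already mapped to that term when one exists.

```python
from itertools import count

def apply_subst(term, sigma):
    """Apply a substitution (dict: variable -> term) to a term."""
    if isinstance(term, str):
        return sigma.get(term, term)
    return (term[0],) + tuple(apply_subst(a, sigma) for a in term[1:])

def decompose(sigma, var_sort, result_sort, base_sorts):
    """Split a ground substitution into (nesting part, base part)."""
    def is_base(term):
        s = var_sort[term] if isinstance(term, str) else result_sort[term[0]]
        return s in base_sorts

    # nu: maximal base term -> variable; prefer existing base variables.
    nu = {}
    for x in sorted(sigma):
        if is_base(x):
            nu.setdefault(sigma[x], x)

    fresh = count()  # fresh names are assumed not to clash with existing ones

    def abstract(term):
        """Replace every subterm-maximal base term by its variable."""
        if is_base(term):
            if term not in nu:
                nu[term] = f"_v{next(fresh)}"
            return nu[term]
        return (term[0],) + tuple(abstract(a) for a in term[1:])

    sigma_n = {x: abstract(t) for x, t in sigma.items() if not is_base(x)}
    sigma_b = {x: t for x, t in sigma.items() if is_base(x)}
    sigma_b.update({v: t for t, v in nu.items()})
    return sigma_n, sigma_b

var_sort = {"n": "int", "l": "list"}
result_sort = {"a": "int", "b": "int", "nil": "list", "cons": "list"}
sigma = {"n": ("a",), "l": ("cons", ("a",), ("cons", ("b",), ("nil",)))}
sigma_n, sigma_b = decompose(sigma, var_sort, result_sort, {"int"})
```

Applying `sigma_n` and then `sigma_b` to any variable in the domain of `sigma` gives back its image by `sigma`; as in the example above, the nesting part is not ground and the base part may bind variables outside the domain of the original substitution.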
3.3.2 Partial Evaluations
Given a set of clauses in and an interpretation of , we consider a set of clauses of by selecting those ground instances of clauses in whose base part evaluates to false in and adding their nesting part to . More formally:
Definition.
For every clause and for every interpretation , we denote by the set of ground substitutions of domain such that . Then, for every we define:
Example.
Let be a set of clauses in , where are of sort and is a variable of a sort distinct from . Let be the interpretation of natural numbers such that . Then and . Therefore .
The following lemma shows that is unsatisfiable when is unsatisfiable.
Lemma.
For every unsatisfiable set of clauses and for every , is unsatisfiable.
Proof.
Let . Assume that is satisfiable, i.e. that there exists an interpretation validating . W.l.o.g. we assume that the domain of is disjoint from that of . We construct an interpretation satisfying , which will yield a contradiction since is unsatisfiable by hypothesis.
For all sort symbols and for all , we denote by an arbitrarily chosen ground term in such that (such a term always exists since we restricted ourselves to interpretations such that, for every , ). If is a ground expression, we denote by the expression obtained from by replacing every term by ; by construction . Let be the function defined for every element as follows:

if then ;

otherwise .
We define the interpretation by combining and