1 Introduction
Associativity and commutativity (AC) axioms occur in many applications, but efficient reasoning with them remains one of the major challenges in first-order theorem proving due to the prolific nature of these axioms. Despite a number of theoretical advances, specialised treatment of AC axioms is mainly supported by provers for unit equality such as Waldmeister [DBLP:journals/aicom/LochnerH02], Twee [tweesystemdesc] and MaedMax [DBLP:conf/cade/WinklerM18]. These provers are based on Knuth-Bendix completion, and the main ingredients for dealing with AC in these provers are ground joinability criteria adapted for AC [DBLP:conf/cade/MartinN90, acjoinability]. Completeness proofs for ground joinability known so far are restricted to unit equalities, which limits the applicability of these techniques. These proofs are based on proof transformations for unit rewriting, which are not easily adaptable to full first-order logic and also lack general redundancy criteria.
In this paper we extend ground AC joinability criteria from the context of Knuth-Bendix completion to the superposition calculus for full first-order logic. Our approach is based on an extension of the Bachmair-Ganzinger model construction [DBLP:journals/logcom/BachmairG94] and a new redundancy criterion called closure redundancy. Closure redundancy allows for fine-grained redundancy elimination, which we show also covers ground AC joinability. We also introduce a new simplification called AC normalisation and show that AC normalisation preserves completeness of the superposition calculus. The superposition calculus with the standard notion of redundancy can generate infinitely many non-redundant conclusions from AC axioms alone. Using our generalised notion of redundancy we can show that all of these inferences are redundant in the presence of a single extension axiom.
Using these results, superposition theorem provers for full first-order logic such as Vampire [vampire], E [eprover], SPASS [spass], Zipperposition [DBLP:conf/cade/VukmirovicBBCNT21] and iProver [iproversystemdesc] can incorporate AC simplifications without compromising completeness.
A byproduct of our approach is a new criterion for applicability of demodulation, which we call encompassment demodulation. Demodulation is one of the main simplification rules in superposition-based reasoning and is a key ingredient in efficient first-order theorem provers. Our new demodulation criterion is useful independently of AC theories, and we demonstrate that it enables demodulation in many more cases than standard demodulation.
The main contributions of this paper include:

A new redundancy criterion for the superposition calculus, called closure redundancy.

A completeness proof of the superposition calculus with closure redundancy.

Proof of admissibility of AC joinability and AC normalisation simplifications for the superposition calculus.

Encompassment demodulation and its admissibility for the superposition calculus.
In Section 2 we discuss preliminary notions, introduce closure orderings and prove properties of these orderings. In Section 3 we introduce closure redundancy and prove the key theorem stating completeness of the superposition calculus with closure redundancy. In Section 4 we use closure redundancy to show that encompassment demodulation, AC joinability and AC normalisation are admissible simplifications. In Section 5 we present experimental results, and we conclude in Section 6.
2 Preliminaries
We consider a signature consisting of a finite set of function symbols and the equality predicate as the only predicate symbol. We fix a countably infinite set of variables. First-order terms are defined in the usual manner. Terms without variables are called ground terms. A literal is an unordered pair of terms with either positive or negative polarity, written and respectively (we write to mean either of the former two). A clause is a multiset of literals. Collectively terms, literals, and clauses will be called expressions.
A substitution is a mapping from variables to terms which is the identity for all but finitely many variables. If is an expression, we denote application of a substitution by , replacing all variables with their image in . Let be the set of ground substitutions for . Overloading this notation for sets we write . Finally, we write e.g. instead of .
An injective substitution with codomain being the set of variables is a renaming. Substitutions which are not renamings are called proper.
A substitution is more general than if for some proper substitution . If and can be unified, that is, if there exists such that , then there also exists the most general unifier, written . A term is said to be more general than if there exists a substitution that makes but there is no substitution such that . We may also say that is a proper instance of . Two terms and are said to be equal modulo renaming if there exists a renaming such that . The relations “less general than”, “equal modulo renaming”, and their union are represented respectively by the symbols ‘’, ‘’, and ‘’.
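The generality relations on terms can be decided by one-sided matching. The following Python sketch uses the same assumed term encoding as before (variables are strings, applications are tuples); the function names `match`, `more_general` and `equal_modulo_renaming` are hypothetical. It relies on the standard fact that two terms are equal modulo renaming exactly when each matches onto the other.

```python
def match(pattern, target, subst=None):
    """Return a substitution s with pattern*s == target, or None if none exists.

    Variables of `target` are treated as constants (one-sided matching).
    """
    subst = dict(subst or {})
    if isinstance(pattern, str):              # pattern is a variable
        if pattern in subst:
            return subst if subst[pattern] == target else None
        subst[pattern] = target
        return subst
    if (isinstance(target, str) or pattern[0] != target[0]
            or len(pattern) != len(target)):
        return None
    for p_arg, t_arg in zip(pattern[1:], target[1:]):
        subst = match(p_arg, t_arg, subst)
        if subst is None:
            return None
    return subst


def more_general(s, t):
    """s is strictly more general than t (t is a proper instance of s)."""
    return match(s, t) is not None and match(t, s) is None


def equal_modulo_renaming(s, t):
    """s and t are instances of each other, i.e. differ only by a renaming."""
    return match(s, t) is not None and match(t, s) is not None
```

Under this sketch, `f(x, y)` is more general than `f(x, x)`, while `f(x, y)` and `f(y, x)` are equal modulo renaming.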
A more refined notion of instance is that of closure [bachmair_basic_1995]. Closures are pairs that are said to represent the term while retaining information about the original term and its instantiation. Closures where is ground are said to be ground closures. Let be the set of ground closures of . Analogously to term closures, we define closures for other expressions such as literals and clauses, as a pair of an expression and a substitution. Overloading the notation for sets, if is a set of clauses then .
We write if is a subterm of . If also , then it is a strict subterm. We denote these relations by and respectively. We write to denote the term obtained from by replacing at the position by . We omit the position when it is clear from the context or irrelevant.
A relation ‘’ over the set of terms is a rewrite relation if (i) and (ii) . The members of a rewrite relation are called rewrite rules. The reflexive-transitive closure of a relation is the smallest reflexive-transitive relation which contains it. It is denoted by ‘’. Two terms are joinable () if .
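To make rewriting and joinability concrete, the following Python sketch normalises ground terms with a finite set of ground rewrite rules. The encoding (ground terms as tuples, rules as a dict from left-hand sides to right-hand sides) and all function names are assumptions for illustration. Comparing normal forms is a sound test for joinability when the system terminates, but it is complete only if the system is also confluent; the sketch assumes both.

```python
def rewrite_step(term, rules):
    """Perform one rewrite step somewhere in `term`; return None if irreducible."""
    if term in rules:                         # rewrite at the root
        return rules[term]
    for i, arg in enumerate(term[1:], start=1):
        new_arg = rewrite_step(arg, rules)    # otherwise rewrite inside an argument
        if new_arg is not None:
            return term[:i] + (new_arg,) + term[i + 1:]
    return None


def normal_form(term, rules):
    """Rewrite until no rule applies (assumes the rule set is terminating)."""
    while True:
        nxt = rewrite_step(term, rules)
        if nxt is None:
            return term
        term = nxt


def joinable(s, t, rules):
    """Decide joinability via normal forms (assumes a convergent rule set)."""
    return normal_form(s, rules) == normal_form(t, rules)
```

With the rules `f(a) -> a` and `b -> a`, the term `f(f(b))` normalises to `a`, so `f(f(b))` and `b` are joinable.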
If a rewrite relation is also a strict ordering (transitive, irreflexive), then it is a rewrite ordering. A reduction ordering is a rewrite ordering which is well-founded. In this paper we consider reduction orderings which are total on ground terms; such orderings are also simplification orderings, i.e., satisfy .
For an ordering ‘’ over a set , its multiset extension ‘’ over multisets of is given by: iff , where is the number of occurrences of element in multiset . It is well known that the multiset extension of a well-founded (total) order is also a well-founded (respectively, total) order [DBLP:journals/cacm/DershowitzM79].
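The multiset extension can be sketched in Python as follows. This uses the usual Dershowitz-Manna formulation, which is assumed here to coincide with the definition above: M is greater than N if they differ and every element occurring more often in N is dominated by some strictly greater element occurring more often in M. The function name `multiset_greater` and its `greater` parameter (the strict base ordering) are hypothetical.

```python
from collections import Counter


def multiset_greater(m, n, greater):
    """Multiset extension of the strict ordering `greater` (Dershowitz-Manna style).

    `m` and `n` are iterables of hashable elements, read as multisets.
    """
    m, n = Counter(m), Counter(n)
    if m == n:
        return False
    # Every x with more occurrences in n must be dominated by some
    # strictly greater y that has more occurrences in m.
    return all(
        any(greater(y, x) and m[y] > n[y] for y in m)
        for x in n
        if n[x] > m[x]
    )
```

For example, over the integers with the usual order, `[5]` is greater than `[3, 3, 4]`, and `[3, 3]` is greater than `[3]`.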
Orderings on closures
In the following, let ‘’ be a reduction ordering which is total on ground terms. Examples of such orderings include KBO or LPO [termrewriting]. It is extended to an ordering on literals via iff , where and . It is further extended to an ordering on clauses via iff .
We extend this ordering to an ordering on ground closures. The idea is to “break ties”, whenever two closures represent the same term, to make more general closures smaller in the ordering than more specific ones. The definitions follow.
iff 

(1)  
This is a well-founded ordering, since ‘’ and ‘’ are also well-founded. However it is only a partial order even on ground closures (e.g., ), but it is well known that any partial well-founded order can be extended to a total well-founded order (see e.g. [wellfoundedext]). Therefore we will assume that ‘’ is extended to a total well-founded order on ground closures. Then let and in
iff  (2)  
and let if is a unit clause , and otherwise, in  
iff  (3) 
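The tie-breaking idea behind the ordering on term closures can be sketched in Python. This is only an illustration under explicit assumptions: terms use the tuple encoding (variables as strings), `ground_greater` is a stand-in total order on ground terms (size first, then tuple comparison) and not a real KBO or LPO, and the function names are hypothetical. The sketch implements the partial ordering only, before it is extended to a total one.

```python
def apply_subst(term, subst):
    """Apply a substitution (dict from variables to terms) to a term."""
    if isinstance(term, str):
        return subst.get(term, term)
    return (term[0],) + tuple(apply_subst(a, subst) for a in term[1:])


def match(pattern, target, subst=None):
    """One-sided matching: a substitution s with pattern*s == target, or None."""
    subst = dict(subst or {})
    if isinstance(pattern, str):
        if pattern in subst:
            return subst if subst[pattern] == target else None
        subst[pattern] = target
        return subst
    if (isinstance(target, str) or pattern[0] != target[0]
            or len(pattern) != len(target)):
        return None
    for p, t in zip(pattern[1:], target[1:]):
        subst = match(p, t, subst)
        if subst is None:
            return None
    return subst


def size(term):
    return 1 if isinstance(term, str) else 1 + sum(size(a) for a in term[1:])


def ground_greater(s, t):
    """Stand-in total order on ground terms (assumption, not KBO/LPO)."""
    return (size(s), s) > (size(t), t)


def closure_greater(c1, c2, greater=ground_greater):
    """Compare ground closures (term, subst): represented terms first,
    then break ties so that the more general term gives the smaller closure."""
    (s, sigma), (t, rho) = c1, c2
    s_inst, t_inst = apply_subst(s, sigma), apply_subst(t, rho)
    if s_inst != t_inst:
        return greater(s_inst, t_inst)
    # Same represented term: c1 is greater iff s is a proper instance of t.
    return match(t, s) is not None and match(s, t) is None
```

In this sketch the closure of `f(a)` with the empty substitution is greater than the closure of `f(x)` with `x` mapped to `a`, although both represent `f(a)`. The closures `g(x, a)` with `x` mapped to `b` and `g(b, y)` with `y` mapped to `a` represent the same term but neither term is an instance of the other, so they are incomparable, which is why the ordering must be extended to obtain totality.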
Let us note that unit and non-unit clauses are treated differently in this ordering. Some properties that will be used throughout the paper follow.
Lemma 1
‘’, ‘’, and ‘’ are all well-founded and total on ground term closures, literal closures, and clause closures, respectively.
Proof
We have already established that is well-founded by construction. ‘’ and ‘’ are derived from ‘’ by multiset extension, so they are also well-founded. Similarly, ‘’ is total on ground term closures by construction, and ‘’ and ‘’ are derived from ‘’ by multiset extension, so they are also total on ground literal/clause closures. ∎
Lemma 2
Assume , are ground, then . Analogously for ‘’ and ‘’.
Lemma 3
‘’ is an extension of ‘’, in that , however this is generally not the case for ‘’ and ‘’: , and .
Proof
As an example, let and consider literal closures
(4) 
The literal represented by the one on the left is greater than the one represented by the one on the right, in ‘’. However, the closure on the left is smaller than the one on the right, in ‘’. This is also an example for ‘’ if these are two unit clauses. ∎
Lemma 4
. Analogously for ‘’ and ‘’. In particular, and analogously for ‘’ and ‘’.
Proof
From definition and the fact that . ∎
Lemma 5
. (But not, in general, , e.g. .) Analogously for ‘’ and ‘’.
Proof
For to hold, either , or else but then cannot hold. The direction follows from the definition. ∎
Lemma 6
‘’ has the following property: . Analogously for ‘’ and ‘’.
Proof
For ‘’: let . By the fact that ‘’ is a rewrite relation, we have . Then, by the definition of ‘’, . For ‘’ and ‘’: by the above and by their definitions we have that the analogous properties also hold. ∎
Sometimes we will drop subscripts and use just ‘’ when it is obvious from the context: terms, literals and clauses will be compared with ‘’, ‘’, ‘’ respectively, and the corresponding closures with ‘’, ‘’, ‘’.
3 Model construction
The superposition calculus comprises the following inference rules.
Superposition  (5)  
Eq. Resolution  (s ≉ t ∨ C) / Cθ  where ,  (6)  
Eq. Factoring  (7) 
and the selection function (underlined) selects at least one negative literal, or else all maximal (wrt. ‘’) literals in the clause.
The superposition calculus is refutationally complete wrt. the standard notion of redundancy [DBLP:journals/logcom/BachmairG94, handbookparamodulation]. In the following, we refine the standard redundancy to closure redundancy and prove completeness in this case.
3.0.1 Closure redundancy
Let . In the standard definition of redundancy, a clause is redundant in a set if all follow from smaller ground instances in . Unfortunately, this standard notion of redundancy does not cover many simplifications such as AC normalisation and a large class of demodulations (which we discuss in Section 4).
By modifying the notion of ordering between ground instances, using ‘’ rather than ‘’, we adapt this redundancy notion to a closure-based one, which allows for such simplifications. We then show that superposition is still complete wrt. this redundancy criterion.
A clause is closure redundant in a set if all follow from smaller ground closures in (i.e., for all there exists a set such that and ).
Although the definition of closure redundancy looks similar to the standard definition, consider the following example showing differences between them.
Example 1
Consider unit clauses where . Then is not redundant in , in the standard sense, as it does not follow from any smaller (wrt. ‘’) ground instances of clauses in , (it does follow from instances , , but the former is bigger than ). However, it is closure redundant in , since its only ground instance follows from the smaller (wrt. ‘’) closure instances: and . In other words, the new redundancy criterion allows demodulation even when the smaller side of the equation we demodulate with is greater than the smaller side of the target equation, provided that the matching substitution is proper. As we will see in Section 4, this considerably simplifies the applicability condition on demodulation and, more crucially, when dealing with theories such as AC it allows AC axioms to be used to normalise clauses when standard demodulation is not applicable.
Likewise, we extend the standard notion of redundant inference. An inference is closure redundant in a set if, for all , the closure follows from closures in which are smaller wrt. ‘’ than the maximal element of .
Let us establish the following connection between closure redundant inferences and closure redundant clauses. An inference is reductive if for all we have .
Lemma 7
If the conclusion of a reductive inference is in or is closure redundant in , then the inference is closure redundant in .
Proof
If is in , then all are in . But if the inference is reductive then , so it trivially follows from a closure smaller than that maximal element: itself.
If is redundant, then all follow from smaller closures in . But if the inference is reductive then again , so it also follows from closures smaller than that maximal element. ∎
A set of clauses is saturated up to closure redundancy if any inference with premises in , which are all not redundant in , is closure redundant in . In the sequel, we refer to the new notion of closure redundancy as simply “redundancy”, when it is clear from the context.
Theorem 1
The superposition inference system is refutationally complete wrt. closure redundancy, that is, if a set of clauses is saturated up to closure redundancy and does not contain the empty clause , then it is satisfiable.
Proof
Let be a set of clauses such that , and . Let us assume is saturated up to closure redundancy. We will build a model for , and hence for , as follows. A model is represented by a convergent term rewrite system (we will show convergence in Lemma 8), where a closure is true in a given model if at least one of its positive literals has , or if at least one of its negative literals has .
For each closure , the partial model is a rewrite system defined as . The total model is thus . For each , the set is defined recursively over as follows. If:
(a) is false in , (b) strictly maximal in , (c) , (d) is false in , (e) is irreducible via ,  (8) 
then and the closure is called productive, otherwise . Let also be .
Our goal is to show that is a model for . We will prove this by contradiction: if this is not the case, then there is a minimal (wrt. ‘’) closure such that . We will show by case analysis how the existence of this closure leads to a contradiction, if the set is saturated up to redundancy. First, some lemmas.
Lemma 8
and all are convergent, i.e. terminating and confluent.
Proof
It is terminating since the rewrite relation is contained in , which is well-founded. For confluence it is sufficient to show that left-hand sides of rules in are irreducible in . Assume that and are two rules produced by closures and respectively. Assume is reducible by . Then , and since is a simplification order, then . If then by (8b) and (8c) we have all terms in , therefore all literal closures in will be smaller than the literal closure in which produced (by Lemma 5), therefore (see Lemma 4). But then could not be productive due to (8e). If then both rules can reduce each other, and again due to (8e) whichever closure is larger would not be productive. In either case we obtain a contradiction. ∎
Lemma 9
If , then for any , and .
Proof
If a positive literal of is true in , then . Since no rules are ever removed during the model construction, then and .
If a negative literal of is true in , then . Wlog. assume that . Consider a productive closure that produced a rule . Let us show that cannot reduce . Assume otherwise. By (8b), is strictly maximal in , so if reduces either or a strict subterm of , meaning , then clearly all terms in , therefore (Lemmas 4 and 5), which contradicts regardless of whether any of them is unit. If , then , since implies , and . Hence, by Lemma 4, , contradicting (again regardless of either of them being a unit). ∎
Lemma 10
If is productive, then for any , and .
Proof
All literals in are false in by (8d). For all negative literals in , if they are false then . Since no rules are ever removed during the model construction then and .
For all positive literals in , if they are false in then . Two cases arise. If is unit, then , so is trivially false in any interpretation. If is non-unit, then consider any productive closure that produces a rule ; by definition and by Lemma 5 . Since is strictly maximal in then . Therefore cannot reduce or . ∎
We are now ready to prove the main proposition by induction on closures (see Lemma 1), namely that for all we have . We will show a stronger result: that for all we have (the former result follows from the latter by Lemma 9). If this is not the case, then there exists a minimal counterexample which is false in .
Notice that, since by induction hypothesis all closures such that have , then by Lemma 9 we have (and ). Consider the following cases.
Case 1
is redundant.
Proof
By definition, follows from smaller closures in . But if is the minimal closure which is false in , then all smaller are true in , which (as noted above) means that all smaller are true in , which means is true in , which is a contradiction. ∎
Case 2
contains a variable such that is reducible.
Proof
Then contains a rule which reduces to a term . Let be identical to except that it maps to . Then , so (see Lemma 3), and therefore is true in . But is true in iff in , since , therefore is also true in , which is a contradiction. ∎
Case 3
There is a reductive inference which is redundant, such that , is maximal in , and .
Proof
Then is implied by closures in smaller than . But since those closures are true in , then is true, and since implies , then is true in , which is a contradiction. ∎
Case 4
None of the previous cases applies, and contains a negative literal which is selected in the clause, i.e., with selected in .
Proof
Then either and is true and we are done, or else . Wlog., let us assume .
Subcase 4.1
.
Proof
Then and are unifiable, meaning that there is an equality resolution inference
(9) 
with premise in .
Take the instance of the conclusion such that ; it always exists since . Also, since the mgu is idempotent [termrewriting] then , so . We show that . If is empty, then this is trivial. If has more than 1 element, then this is also trivial (see Lemma 2). If has exactly 1 element, then let . We have if , which is true by Lemma 4. Notice also that if is true then must also be true.
Recall that Case 3 does not apply. But we have shown that this inference is reductive, with , trivially maximal in , and that the instance of the conclusion implies . So for Case 3 not to apply the inference must be non-redundant. Also, since Case 1 does not apply, the premise is not redundant. This means that the set is not saturated, which is a contradiction. ∎
Subcase 4.2
.
Proof
Then (recall that ) must be reducible by some rule in . Since by (8b) the clause cannot be productive, it must be reducible by some rule in . Let us say that this rule is , produced by a closure smaller than . (We can use the same substitution on both and by simply assuming wlog. that they have no variables in common.) Therefore closure must be of the form , with maximal in , and false in . Also note that cannot be redundant, or else it would follow from smaller closures, but those closures (which are smaller than and therefore smaller than ) would be true, so would be also true in , so by (8a) it would not be productive.
Then for some subterm of , meaning is unifiable with , meaning there exists a superposition inference
(10) 
Similar to what we did before, consider the instance with . (And again note that the is idempotent so .) We wish to show that this instance of the conclusion is smaller than (an instance of the second premise), that is that
(11) 
Several cases arise:


. Then both premise and conclusion are non-unit, so comparing them means comparing and (Lemma 2), or after removing common elements, comparing and . This is true since (i) , and (ii) and is greater than all literals in , so is greater than all literals in .

and . Then simply means , which since , means .
In all these cases this instance of the conclusion is always smaller than the instance of the second premise. Note also that is maximal in . Also, since is false in (by Lemma 10) and is false in (since is in the false closure , , and the rewrite system is confluent), then in order for that instance of the conclusion to be true in it must be the case that is true in . But if the latter is true then is true, in . In other words that instance of the conclusion implies . Therefore again, since Cases 1 and 3 do not apply, we conclude that the inference is non-redundant with non-redundant premises, so the set is not saturated, which is a contradiction. ∎
This proves all subcases. ∎
Case 5
None of the previous cases applies, so all selected literals in are positive, i.e., with selected in .
Proof
Then, since if the selection function doesn’t select a negative literal then it must select all maximal ones, wlog. one of the selected literals must have is maximal in . Then if either is true in , or , or , then is true in and we are done. Otherwise, , is false in , and wlog. . If is maximal in then is maximal in .
Subcase 5.1
maximal but not strictly maximal in .
Proof
If this is the case, then there is at least one other maximal positive literal in the clause. Let , where and . Therefore and are unifiable and there is an equality factoring inference:
(12) 
with . Take the instance of the conclusion with . This is smaller than (since , and Lemma 2 applies). Since and is false in , this instance of the conclusion is true in iff is true in . But if the latter is true in then also is. Therefore that instance of the conclusion implies . As such, and since again Cases 1 and 3 do not apply, we have a contradiction. ∎
Subcase 5.2
strictly maximal in , and reducible (in ).
Proof
This is similar to Subcase 4.2. If is reducible, say by a rule , then (since ) this is produced by some closure smaller than , with , with the maximal in , and with false in .
Then there is a superposition inference
(13) 
Again taking the instance with , we see that it is smaller than (see the discussion in Subcase 4.2). Furthermore since and are false in , then that instance of the conclusion is true in iff is. But since also , then implies . Therefore that instance of the conclusion implies . Again this means we have a contradiction. ∎
Subcase 5.3
strictly maximal in , and irreducible (in ).
Proof
Since is not productive, and at the same time all criteria in (8) except (8d) are satisfied, it must be that (8d) is not, that is must be true in . Then this must mean we can write , where the latter literal is the one that becomes true with the addition of , whereas without that rule it was false.
But this means that such that any rewrite proof needs at least one step where is used, since is irreducible by . Wlog. say . Since: (i) , (ii) , and (iii) , then , which implies , which implies cannot be used to reduce . Then the only way it can reduce or is if . This means there is an equality factoring inference: