
# Subsumption Demodulation in First-Order Theorem Proving

Motivated by applications of first-order theorem proving to software analysis, we introduce a new inference rule, called subsumption demodulation, to improve support for reasoning with conditional equalities in superposition-based theorem proving. We show that subsumption demodulation is a simplification rule that does not require radical changes to the underlying superposition calculus. We implemented subsumption demodulation in the theorem prover Vampire, by extending Vampire with a new clause index and adapting its multi-literal matching component. Our experiments, using the TPTP and SMT-LIB repositories, show that subsumption demodulation in Vampire can solve many new problems that could so far not be solved by state-of-the-art reasoners.

## 1 Introduction

For the efficiency of organizing proof search during saturation-based first-order theorem proving, simplification rules are of critical importance. Simplification rules are inference rules that do not add new formulas to the search space, but simplify formulas by deleting (redundant) clauses from the search space. As such, simplification rules reduce the size of the search space and are crucial in making automated reasoning efficient.

When reasoning about properties of first-order logic with equality, one of the most common simplification rules is demodulation [10] for rewriting (and hence simplifying) formulas using unit equalities $l \simeq r$, where $l, r$ are terms and $\simeq$ denotes equality. As a special case of superposition, demodulation is implemented in first-order provers such as E [13], Spass [20] and Vampire [10]. Recent applications of superposition-based reasoning, for example to program analysis and verification [5], however demand new and efficient extensions of demodulation to reason about and simplify upon conditional equalities $C \rightarrow l \simeq r$, where $C$ is a first-order formula. Such conditional equalities may, for example, encode software properties expressed in a guarded command language, with $C$ denoting a guard (such as a loop condition) and $l \simeq r$ encoding equational properties over program variables. We illustrate the need for generalized versions of demodulation in the following example.

###### Example 1

Consider the following formulas expressed in the first-order theory of integer linear arithmetic:

$$f(i) \simeq g(i) \qquad\qquad 0 \leq i < n \rightarrow P(f(i)) \qquad (1)$$

Here, $i$ is an implicitly universally quantified logical variable of integer sort, and $n$ is an integer-valued constant. First-order reasoners will first clausify formulas (1), deriving:

$$f(i) \simeq g(i) \qquad\qquad 0 \not\leq i \vee i \not< n \vee P(f(i)) \qquad (2)$$

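By applying demodulation over (2), the formula $P(f(i))$ is rewritten (assuming that $g(i)$ is simpler/smaller than $f(i)$) using the unit equality $f(i) \simeq g(i)$, yielding the clause $0 \not\leq i \vee i \not< n \vee P(g(i))$. That is, $0 \leq i < n \rightarrow P(g(i))$ is derived from (1) by one application of demodulation.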

Let us now consider a slightly modified version of (1), as below:

$$0 \leq i < n \rightarrow f(i) \simeq g(i) \qquad\qquad 0 \leq i < n \rightarrow P(f(i)) \qquad (3)$$

whose clausal representation is given by:

$$0 \not\leq i \vee i \not< n \vee f(i) \simeq g(i) \qquad\qquad 0 \not\leq i \vee i \not< n \vee P(f(i)) \qquad (4)$$

It is again obvious that from (3) one can derive the formula $0 \leq i < n \rightarrow P(g(i))$, or equivalently the clause:

$$0 \not\leq i \vee i \not< n \vee P(g(i)) \qquad (5)$$

Yet, demodulation-based simplification can no longer be applied over (4) to derive such a clause, as (4) contains no unit equality. ∎

In this paper we propose a generalized version of demodulation, called subsumption demodulation, which allows rewriting terms and simplifying formulas based on conditional equalities, such as in (3). To do so, we extend demodulation with subsumption, that is, with deciding whether (an instance of) a clause $C$ is a submultiset of a clause $D$. This way, subsumption demodulation can be applied to non-unit clauses and does not require any premise to be a unit equality. We show that subsumption demodulation is a simplification rule of the superposition framework (Section 4), allowing, for example, the derivation of clause (5) from (3) in one inference step. By properly adjusting clause indexing and multi-literal matching in first-order theorem provers, we provide an efficient implementation of subsumption demodulation in Vampire (Section 5) and evaluate our work against state-of-the-art reasoners, including E [13], Spass [20], CVC4 [3] and Z3 [7] (Section 6).

#### Related work.

While several approaches generalize demodulation in superposition-based theorem proving, we argue that subsumption demodulation improves on existing methods in terms of applicability and/or efficiency. The AVATAR architecture of first-order provers [18] splits general clauses into components with disjoint sets of variables, potentially enabling demodulation inferences whenever some of these components become unit equalities. Example 1 demonstrates that subsumption demodulation applies in situations where AVATAR does not: in each clause of (4), all literals share the variable $i$ and hence none of the clauses from (4) can be split using AVATAR. That is, AVATAR would not generate unit equalities from (4), and therefore demodulation cannot be applied over (4) to derive (5).

The local rewriting approach of [19] requires rewriting equality literals to be maximal (w.r.t. the clause ordering) in clauses. However, following [10], for efficiency reasons we consider equality literals to be “smaller” than non-equality literals. In particular, the equality literals of clauses (4) are “smaller” than the non-equality literals, thus preventing the application of local rewriting in Example 1.

We further note that the contextual rewriting rule of [1] is more general than our rule of subsumption demodulation. Yet, efficiently automating contextual rewriting is extremely challenging, while subsumption demodulation requires no radical changes in the existing machinery of superposition provers (see Section 5).

To the best of our knowledge, apart from Spass [20], no other state-of-the-art superposition prover implements variants of conditional rewriting. Subterm contextual rewriting [21] is a refined notion of contextual rewriting and is implemented in Spass. A major difference between subterm contextual rewriting and subsumption demodulation is that in subsumption demodulation the discovery of the substitution is driven by the side conditions, whereas in subterm contextual rewriting the side conditions are evaluated by checking the validity of certain implications by means of a reduction calculus. This reduction calculus recursively applies another restriction of contextual rewriting called recursive contextual ground rewriting, among other standard reduction rules. While subterm contextual rewriting is more general, we believe that the benefit of subsumption demodulation lies in its relatively easy and efficient integration within existing superposition reasoners, as also evidenced in Section 6.

Local contextual rewriting [9] is another refinement of contextual rewriting implemented in Spass. In our experiments it performed similarly to subterm contextual rewriting.

Finally, we note that SMT-based reasoners also implement various methods to efficiently handle conditional equalities, see e.g. [12, 6]. Yet, the setting is very different as they rely on the DPLL(T) framework [8] rather than implementing superposition.

#### Contributions.

Summarizing, this paper brings the following contributions.

• To improve reasoning in the presence of conditional equalities, we introduce the new inference rule subsumption demodulation, which generalizes demodulation to non-unit equalities by combining demodulation and subsumption (Section 4).

• Subsumption demodulation does not require radical changes to the underlying superposition calculus. We implemented subsumption demodulation in the first-order theorem prover Vampire, by extending Vampire with a new clause index and adapting its multi-literal matching component (Section 5).

• We compared our work against state-of-the-art reasoners, using the TPTP and SMT-LIB benchmark repositories. Our experiments show that subsumption demodulation in Vampire can solve 11 first-order problems that could so far not be solved by any other state-of-the-art provers, including Vampire, E, Spass, CVC4 and Z3 (Section 6).

## 2 Preliminaries

For simplicity, in what follows we consider standard first-order logic with equality, where equality is denoted by $\simeq$. We support all standard boolean connectives and quantifiers in the language. Throughout the paper, we denote terms by $l, r, s, t$, variables by $x, y$, constants by $c, d$, function symbols by $f, g$ and predicate symbols by $P, Q, R$, all possibly with indices. Further, we denote literals by $L$ and clauses by $C, D$, again possibly with indices. We write $s \not\simeq t$ to denote the formula $\neg\, s \simeq t$. A literal $s \simeq t$ is called an equality literal. We consider clauses as multisets of literals and denote by $\subseteq_M$ the subset relation among multisets. A clause that only consists of one equality literal is called a unit equality.

An expression $E$ is a term, literal, or clause. We write $E[s]$ to mean an expression $E$ with a particular occurrence of a term $s$. A substitution, denoted by $\sigma$, is any finite mapping of the form $\{x_1 \mapsto t_1, \ldots, x_n \mapsto t_n\}$, where the $x_i$ are pairwise distinct variables. Applying a substitution $\sigma$ to an expression $E$ yields another expression, denoted by $E\sigma$, by simultaneously replacing each $x_i$ by $t_i$ in $E$. We say that $E\sigma$ is an instance of $E$. A unifier of two expressions $E_1$ and $E_2$ is a substitution $\sigma$ such that $E_1\sigma = E_2\sigma$. If two expressions have a unifier, they also have a most general unifier (mgu). A match of expression $E_1$ to expression $E_2$ is a substitution $\sigma$ such that $E_1\sigma = E_2$. Note that any match is a unifier (assuming the sets of variables in $E_1$ and $E_2$ are disjoint), but not vice-versa, as illustrated below.

###### Example 2

Let and be the clauses and , respectively. The only possible match of to is . On the other hand, the only possible match of to is . As and are not the same, there is no match of to . Note however that and can be unified; for example, using .
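The asymmetry between matching and unification can be illustrated with a small Python sketch (our own illustration, not code from Vampire). Terms are encoded as nested tuples: a variable is a `str` and an application $f(t_1, \ldots, t_n)$ is the tuple `("f", t1, ..., tn)`, so a constant $c$ is `("c",)`. The expressions $E_1 = P(x, c)$ and $E_2 = P(d, y)$ are hypothetical examples chosen for the sketch: neither matches the other, yet they are unifiable.

```python
# Terms: a variable is a str; an application f(t1, ..., tn) is the tuple
# ("f", t1, ..., tn), so the constant c is ("c",).

def is_var(t):
    return isinstance(t, str)

def walk(t, s):
    """Follow variable bindings in the (triangular) substitution s."""
    while is_var(t) and t in s:
        t = s[t]
    return t

def resolve(t, s):
    """Apply s to t exhaustively."""
    t = walk(t, s)
    return t if is_var(t) else (t[0],) + tuple(resolve(a, s) for a in t[1:])

def match(pattern, target, s):
    """Extend s so that pattern instantiated by s equals target, or return None.

    Only variables of `pattern` are bound; variables of `target` act like constants.
    """
    if is_var(pattern):
        if pattern in s:
            return s if s[pattern] == target else None
        return {**s, pattern: target}
    if is_var(target) or pattern[0] != target[0] or len(pattern) != len(target):
        return None
    for p, t in zip(pattern[1:], target[1:]):
        s = match(p, t, s)
        if s is None:
            return None
    return s

def occurs(v, t, s):
    t = walk(t, s)
    return t == v if is_var(t) else any(occurs(v, a, s) for a in t[1:])

def unify(a, b, s):
    """Syntactic unification with occurs check: variables on both sides may be bound."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return None if occurs(a, b, s) else {**s, a: b}
    if is_var(b):
        return unify(b, a, s)
    if a[0] != b[0] or len(a) != len(b):
        return None
    for x, y in zip(a[1:], b[1:]):
        s = unify(x, y, s)
        if s is None:
            return None
    return s

E1 = ("P", "x", ("c",))   # P(x, c)
E2 = ("P", ("d",), "y")   # P(d, y)
print(match(E1, E2, {}), match(E2, E1, {}))  # None None: no match in either direction
s = unify(E1, E2, {})
print(resolve(E1, s))                        # ('P', ('d',), ('c',)), i.e. P(d, c)
```

Note that `match` never binds variables of its second argument, which is what makes it one-directional; `unify` may bind variables on both sides.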

#### Superposition inference system.

We assume basic knowledge in first-order theorem proving and superposition reasoning [2, 11]. We adopt the notations and the inference system of superposition from [10]. We recall that first-order provers perform inferences on clauses using inference rules, where an inference is usually written as $\dfrac{C_1 \quad \cdots \quad C_n}{C}$ with $n \geq 0$. The clauses $C_1, \ldots, C_n$ are called the premises and $C$ is the conclusion of the inference above. An inference is sound if its conclusion is a logical consequence of its premises. An inference rule is a set of inferences and an inference system is a set of inference rules. An inference system is sound if all its inference rules are sound.

Modern first-order theorem provers implement the superposition inference system for first-order logic with equality. This inference system is parametrized by a simplification ordering over terms and a literal selection function over clauses. In what follows, we denote by $\succ$ a simplification ordering over terms, that is, $\succ$ is a well-founded partial ordering satisfying the following three conditions:

• stability under substitutions: if $l \succ r$, then $l\sigma \succ r\sigma$;

• monotonicity: if $l \succ r$, then $s[l] \succ s[r]$;

• subterm property: $s[t] \succ t$ whenever $t$ is a proper subterm of $s[t]$.
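To make the three conditions concrete, the Python sketch below implements one simple ordering that satisfies them: a term is bigger if it has strictly more symbol occurrences and no variable occurs more often in the smaller term (the variable condition of the Knuth-Bendix ordering). This is an assumption made for illustration only; practical provers use stronger orderings, such as the full Knuth-Bendix ordering, which also break ties between equal-size terms using a symbol precedence. The tuple encoding of terms (a variable is a `str`, $f(t_1, \ldots, t_n)$ is `("f", t1, ..., tn)`) is likewise our own.

```python
from collections import Counter

def is_var(t):
    return isinstance(t, str)

def size(t):
    """Number of symbol and variable occurrences in t."""
    return 1 if is_var(t) else 1 + sum(size(a) for a in t[1:])

def var_counts(t):
    """Multiset of variable occurrences in t."""
    if is_var(t):
        return Counter([t])
    total = Counter()
    for a in t[1:]:
        total += var_counts(a)
    return total

def greater(s, t):
    """s > t iff s is strictly larger and no variable occurs more often in t than in s.

    A Knuth-Bendix-style ordering with every symbol and variable weighted 1 and
    no precedence-based tie-breaking, so equal-size terms stay incomparable.
    """
    vs, vt = var_counts(s), var_counts(t)
    return size(s) > size(t) and all(vs[v] >= n for v, n in vt.items())

ffx, fx, gy = ("f", ("f", "x")), ("f", "x"), ("g", "y")
print(greater(ffx, fx))                  # True: subterm property
print(greater(fx, gy), greater(gy, fx))  # False False: incomparable
print(greater(fx, "y"))                  # False: y does not occur in f(x)
```

The variable condition is what gives stability under substitutions: instantiating a variable grows the left side by at least as much as the right side. Without it, $f(x) \succ y$ would hold by size, but its instance $f(c) \succ g(c, c)$ would not.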

The simplification ordering $\succ$ on terms can be extended to a simplification ordering on literals and clauses, using a multiset extension of orderings. For simplicity, the extension of $\succ$ to literals and clauses will also be denoted by $\succ$. Whenever $E_1 \succ E_2$, we say that $E_1$ is bigger than $E_2$ and $E_2$ is smaller than $E_1$ w.r.t. $\succ$. We say that an equality literal $s \simeq t$ is oriented if $s \succ t$ or $t \succ s$. The literal extension of $\succ$ asserts that negative literals are always bigger than their positive counterparts. Moreover, if $L_1 \succ L_2$, where $L_1$ and $L_2$ are positive, then $\neg L_1 \succ \neg L_2$. Finally, equality literals are set to be smaller than any literal using a predicate other than $\simeq$.

A selection function selects at least one literal in every non-empty clause. Selected literals in clauses will be underlined: when writing $\underline{L} \vee C$, we mean that (at least) $L$ is selected in $L \vee C$. We assume that selection functions are well-behaved w.r.t. $\succ$: either a negative literal is selected or all maximal literals w.r.t. $\succ$ are selected.

In the sequel, we fix a simplification ordering $\succ$ and a well-behaved selection function and consider the superposition inference system, denoted by Sup, parametrized by these two ingredients. The inference system Sup for first-order logic with equality consists of the inference rules of Figure 1, and it is both sound and refutationally complete. That is, if a set $S$ of clauses is unsatisfiable, then the empty clause $\square$ (that is, the always false formula) is derivable from $S$ in Sup.

## 3 Superposition-based Proof Search

We now overview the main ingredients in organizing proof search within first-order provers, using the superposition calculus. For details, we refer to [2, 11, 10].

Superposition-based provers use saturation algorithms: applying all possible inferences of Sup in a certain order to the clauses in the search space until (i) no more inferences can be applied or (ii) the empty clause has been derived. A simple implementation of a saturation algorithm would however be very inefficient as applications of all possible inferences will quickly blow up the search space.

Saturation algorithms can however be made efficient by exploiting a powerful concept of redundancy: deleting so-called redundant clauses from the search space while preserving completeness of Sup. A clause $C$ in a set $S$ of clauses (i.e. in the search space) is redundant in $S$ if there exist clauses $C_1, \ldots, C_n$ in $S$ such that $C \succ C_i$ for each $i$ and $C_1, \ldots, C_n \models C$. That is, a clause $C$ is redundant in $S$ if it is a logical consequence of clauses of $S$ that are smaller than $C$ w.r.t. $\succ$. It is known that redundant clauses can be removed from the search space without affecting completeness of superposition-based proof search. For this reason, saturation-based theorem provers, such as E, Spass and Vampire, not only generate new clauses but also delete redundant clauses during proof search by using both generating and simplifying inferences.

Simplification rules. A simplifying inference is an inference in which one premise becomes redundant after the addition of the conclusion to the search space, and hence can be deleted. In what follows, we will denote deleted clauses by drawing a line through them and refer to simplifying inferences as simplification rules. The premise that becomes redundant is called the main premise, whereas other premises are called side premises of the simplification rule. Intuitively, a simplification rule simplifies its main premise to its conclusion by using additional knowledge from its side premises. Inferences that are not simplifying are called generating, as they generate and add a new clause to the search space.

In saturation-based proof search, we distinguish between forward and backward simplifications. During forward simplification, a newly derived clause is simplified using previously derived clauses as side clauses. Conversely, during backward simplification a newly derived clause is used as side clause to simplify previously derived clauses.

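Demodulation. One example of a simplification rule is demodulation, also called rewriting by unit equalities. Demodulation is the following inference rule:

$$\dfrac{l \simeq r \qquad \cancel{L[t] \vee D}}{L[r\sigma] \vee D}$$

where $l\sigma = t$, $l\sigma \succ r\sigma$ and $L[t] \vee D \succ (l \simeq r)\sigma$, for some substitution $\sigma$.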

It is easy to see that demodulation is a simplification rule. Moreover, demodulation is a special case of a superposition inference where one premise of the inference is deleted. However, unlike a superposition inference, demodulation is not restricted to selected literals.

###### Example 3

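Consider the clauses $C_1 = f(f(x)) \simeq f(x)$ and $C_2 = P(f(f(c))) \vee Q(d)$. Let $\sigma$ be the substitution $\{x \mapsto c\}$. By the subterm property of $\succ$, we have $f(f(c)) \succ f(c)$. Further, as equality literals are smaller than non-equality literals, we have $P(f(f(c))) \vee Q(d) \succ f(f(c)) \simeq f(c)$. We thus apply demodulation and $C_2$ is simplified into the clause $C_3 = P(f(c)) \vee Q(d)$: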

$$\dfrac{f(f(x)) \simeq f(x) \qquad \cancel{P(f(f(c))) \vee Q(d)}}{P(f(c)) \vee Q(d)}$$
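The following Python sketch (our own, hypothetical code rather than Vampire's implementation) applies demodulation to Example 3. Terms are nested tuples (a variable is a `str`), a literal is a `(sign, atom)` pair with `"="` as the equality predicate, and instances are oriented with an illustrative size-based ordering that includes the Knuth-Bendix variable condition. The clause condition $L[t] \vee D \succ (l \simeq r)\sigma$ is checked conservatively: it suffices that some literal of the main premise is a non-equality literal (equality literals are smaller than non-equality literals) or an equality with a side bigger than $l\sigma$. Symmetry of $\simeq$ is ignored, and only the first rewritable position is used.

```python
from collections import Counter

def is_var(t):
    return isinstance(t, str)

def size(t):
    return 1 if is_var(t) else 1 + sum(size(a) for a in t[1:])

def var_counts(t):
    return Counter([t]) if is_var(t) else sum((var_counts(a) for a in t[1:]), Counter())

def greater(s, t):
    """Size-based simplification ordering with the variable condition (an assumption)."""
    vs, vt = var_counts(s), var_counts(t)
    return size(s) > size(t) and all(vs[v] >= n for v, n in vt.items())

def match(pattern, target, s):
    """One-directional matching: bind only variables of pattern."""
    if is_var(pattern):
        if pattern in s:
            return s if s[pattern] == target else None
        return {**s, pattern: target}
    if is_var(target) or pattern[0] != target[0] or len(pattern) != len(target):
        return None
    for p, t in zip(pattern[1:], target[1:]):
        s = match(p, t, s)
        if s is None:
            return None
    return s

def apply(t, s):
    return s.get(t, t) if is_var(t) else (t[0],) + tuple(apply(a, s) for a in t[1:])

def positions(t, path):
    """Yield (path, subterm) for every non-variable subterm of t, outermost first."""
    if not is_var(t):
        yield path, t
        for i, a in enumerate(t[1:], start=1):
            yield from positions(a, path + (i,))

def replace(t, path, new):
    if not path:
        return new
    return t[:path[0]] + (replace(t[path[0]], path[1:], new),) + t[path[0] + 1:]

def dominates_equation(lit, l_inst):
    """Conservative check that lit is bigger than an equation whose bigger side is l_inst."""
    _, atom = lit
    if atom[0] != "=":
        return True  # equality literals are smaller than non-equality literals
    return any(greater(side, l_inst) for side in atom[1:])

def demodulate(l, r, clause):
    """Rewrite one instance of l in clause using the unit equality l = r, or return None."""
    for k, (sign, atom) in enumerate(clause):
        for i, arg in enumerate(atom[1:], start=1):
            for path, sub in positions(arg, (i,)):
                s = match(l, sub, {})
                if s is None:
                    continue
                l_inst, r_inst = apply(l, s), apply(r, s)
                if greater(l_inst, r_inst) and any(dominates_equation(m, l_inst) for m in clause):
                    return clause[:k] + [(sign, replace(atom, path, r_inst))] + clause[k + 1:]
    return None

c, d = ("c",), ("d",)
C2 = [(True, ("P", ("f", ("f", c)))), (True, ("Q", d))]   # P(f(f(c))) v Q(d)
print(demodulate(("f", ("f", "x")), ("f", "x"), C2))       # P(f(c)) v Q(d)
```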

Deletion rules. Even when simplification rules are in use, deleting more/other redundant clauses is still useful to keep the search space small. For this reason, in addition to simplifying and generating rules, theorem provers also use deletion rules: a deletion rule checks whether clauses in the search space are redundant due to the presence of other clauses in the search space, and removes redundant clauses from the search space.

Given clauses $C$ and $D$, we say $C$ subsumes $D$ if there is some substitution $\sigma$ such that $C\sigma$ is a submultiset of $D$, that is, $C\sigma \subseteq_M D$. Subsumption is the deletion rule that removes subsumed clauses from the search space.
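Subsumption checking is itself a multi-literal matching problem: every literal of $C$ must be matched to a distinct literal of $D$ under one common substitution, which in general requires backtracking. The Python sketch below is an illustrative brute-force version (provers add indexing and more refined matching on top); it treats clauses as lists of `(sign, atom)` pairs with atoms as nested tuples, ignores the symmetry of $\simeq$, and uses hypothetical example clauses.

```python
def is_var(t):
    return isinstance(t, str)

def match(pattern, target, s):
    """One-directional matching: bind only variables of pattern."""
    if is_var(pattern):
        if pattern in s:
            return s if s[pattern] == target else None
        return {**s, pattern: target}
    if is_var(target) or pattern[0] != target[0] or len(pattern) != len(target):
        return None
    for p, t in zip(pattern[1:], target[1:]):
        s = match(p, t, s)
        if s is None:
            return None
    return s

def subsumes(C, D):
    """Return a substitution s with C*s a submultiset of D, or None.

    Each literal of D may be used at most once, which enforces multiset
    (not set) inclusion.
    """
    def search(i, s, used):
        if i == len(C):
            return s
        sign, atom = C[i]
        for j, (dsign, datom) in enumerate(D):
            if j in used or dsign != sign:
                continue
            s2 = match(atom, datom, s)
            if s2 is not None:
                found = search(i + 1, s2, used | {j})
                if found is not None:
                    return found
        return None  # no literal of D fits C[i] under s: backtrack
    return search(0, {}, frozenset())

c, d, e = ("c",), ("d",), ("e",)
C = [(True, ("P", "x")), (True, ("Q", "x"))]                                # P(x) v Q(x)
D = [(True, ("P", c)), (True, ("P", d)), (True, ("Q", d)), (True, ("R", e))]
print(subsumes(C, D))  # {'x': ('d',)}, found after backtracking away from x -> c
```

Because each literal of `D` is used at most once, $P(x) \vee P(y)$ does not subsume $P(c)$ in this sketch, in line with the multiset definition above; a set-based check would accept it.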

###### Example 4

Let and be clauses in the search space. Using , it is easy to see that subsumes , and hence is deleted from the search space. ∎

## 4 Subsumption Demodulation

In this section we introduce a new simplification rule, called subsumption demodulation, by extending demodulation to a simplification rule over conditional equalities. We do so by combining demodulation with subsumption checks to find simplifying applications of rewriting by non-unit (and hence conditional) equalities.

### 4.1 Subsumption Demodulation for Conditional Rewriting

Our rule of subsumption demodulation is defined below.

###### Definition 1 (Subsumption Demodulation)

Subsumption demodulation is the inference rule:

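$$\dfrac{l \simeq r \vee C \qquad L[t] \vee D}{L[r\sigma] \vee D} \qquad (5)$$

where:

1. $l\sigma = t$,

2. $C\sigma \subseteq_M D$,

3. $l\sigma \succ r\sigma$, and

4. $L[t] \vee D \succ (l \simeq r)\sigma \vee C\sigma$.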

We call the equality $l \simeq r$ in the left premise of (5) the rewriting equality of subsumption demodulation.

It is easy to see that if $l \simeq r \vee C$ and $L[t] \vee D$ hold, then $L[r\sigma] \vee D$ also holds. We thus conclude:

###### Theorem 4.1 (Soundness)

Subsumption demodulation is sound.

Detecting possible applications of subsumption demodulation involves (i) selecting one equality $l \simeq r$ of the side clause as rewriting equality and (ii) matching each of the remaining literals, denoted $C$ in (5), to some literal in the main clause. Step (i) is similar to finding unit equalities in demodulation, whereas step (ii) reduces to showing that $C$ subsumes parts of the main premise. Informally speaking, subsumption demodulation combines demodulation and subsumption, as discussed in Section 5. Note that in step (ii), matching allows any instantiation of $C$ to $C\sigma$ via the substitution $\sigma$; yet, we do not unify the side and main premises of subsumption demodulation, as illustrated later in Example 7. Furthermore, we need to find a term $t$ in the unmatched part of the main premise such that $t$ can be rewritten according to the rewriting equality into $r\sigma$.

As the ordering $\succ$ is partial, the conditions of Definition 1 must in general be checked a posteriori, that is, after a substitution $\sigma$ has been fixed, revising the substitution if needed. Note however that if $l \succ r$ in the rewriting equality, then $l\sigma \succ r\sigma$ for any substitution $\sigma$ (by stability under substitutions), so checking the ordering a priori helps, as illustrated in the following example.

###### Example 5

Let us consider the following two clauses:

$$C_1 = f(g(x)) \simeq g(x) \vee Q(x) \vee R(y) \qquad\qquad C_2 = P(f(g(c))) \vee Q(c) \vee Q(d) \vee R(f(g(d)))$$

By the subterm property of $\succ$, we conclude that $f(g(x)) \succ g(x)$. Hence, the rewriting equality, as well as any instance of it, is oriented.

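Let $\sigma$ be the substitution $\{x \mapsto c,\ y \mapsto f(g(d))\}$. Due to the previous paragraph, we know that $f(g(c)) \succ g(c)$. As equality literals are smaller than non-equality ones, we also conclude $P(f(g(c))) \succ f(g(c)) \simeq g(c)$. Thus, we have $C_2 \succ C_1\sigma$, and we can apply subsumption demodulation to $C_1$ and $C_2$, deriving the clause $C_3 = P(g(c)) \vee Q(c) \vee Q(d) \vee R(f(g(d)))$.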

We note that demodulation cannot derive $C_3$ from $C_1$ and $C_2$, as there is no unit equality. ∎
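To connect Definition 1 with Example 5, here is an illustrative Python sketch of forward subsumption demodulation. It is our own brute-force reconstruction, not Vampire's algorithm: it (i) picks a positive equality of the side premise as rewriting equality, (ii) enumerates all multiset matches of the remaining literals $C$ into the main premise, then extends the substitution by matching $l$ against a subterm $t$ of an unmatched literal. Condition 3 is checked with a size-based ordering that includes the Knuth-Bendix variable condition (an assumption), and condition 4 is checked conservatively by looking for an unmatched literal bigger than $(l \simeq r)\sigma$, the reduction justified by Theorem 4.3 below. Both orientations of the equation are tried, and side and main premises are assumed to have disjoint variables.

```python
from collections import Counter

def is_var(t):
    return isinstance(t, str)

def size(t):
    return 1 if is_var(t) else 1 + sum(size(a) for a in t[1:])

def var_counts(t):
    return Counter([t]) if is_var(t) else sum((var_counts(a) for a in t[1:]), Counter())

def greater(s, t):
    """Size-based ordering with the variable condition (an illustrative choice)."""
    vs, vt = var_counts(s), var_counts(t)
    return size(s) > size(t) and all(vs[v] >= n for v, n in vt.items())

def match(pattern, target, s):
    if is_var(pattern):
        if pattern in s:
            return s if s[pattern] == target else None
        return {**s, pattern: target}
    if is_var(target) or pattern[0] != target[0] or len(pattern) != len(target):
        return None
    for p, t in zip(pattern[1:], target[1:]):
        s = match(p, t, s)
        if s is None:
            return None
    return s

def apply(t, s):
    return s.get(t, t) if is_var(t) else (t[0],) + tuple(apply(a, s) for a in t[1:])

def positions(t, path):
    if not is_var(t):
        yield path, t
        for i, a in enumerate(t[1:], start=1):
            yield from positions(a, path + (i,))

def replace(t, path, new):
    if not path:
        return new
    return t[:path[0]] + (replace(t[path[0]], path[1:], new),) + t[path[0] + 1:]

def literal_matches(C, D, s, used):
    """Stage 1: yield every (s, used) mapping the literals of C injectively into D."""
    if not C:
        yield s, used
        return
    sign, atom = C[0]
    for j, (dsign, datom) in enumerate(D):
        if j not in used and dsign == sign:
            s2 = match(atom, datom, s)
            if s2 is not None:
                yield from literal_matches(C[1:], D, s2, used | {j})

def dominates_equation(lit, l_inst):
    """Conservative test: lit is bigger than the oriented equation l_inst = r_inst."""
    _, atom = lit
    return atom[0] != "=" or any(greater(side, l_inst) for side in atom[1:])

def subsumption_demodulate(side, main):
    """Return the simplified main premise, or None if the rule does not apply."""
    for e, (esign, eq) in enumerate(side):
        if not (esign and eq[0] == "="):
            continue                                   # (i) need a positive equality
        rest = side[:e] + side[e + 1:]
        for s, used in literal_matches(rest, main, {}, frozenset()):   # (ii) C*s in D
            unmatched = [k for k in range(len(main)) if k not in used]
            for l, r in ((eq[1], eq[2]), (eq[2], eq[1])):               # both orientations
                for k in unmatched:
                    sign, atom = main[k]
                    for i, arg in enumerate(atom[1:], start=1):
                        for path, t in positions(arg, (i,)):
                            s2 = match(l, t, s)        # stage 2: extend s so that l*s2 = t
                            if s2 is None:
                                continue
                            l_inst, r_inst = apply(l, s2), apply(r, s2)
                            if greater(l_inst, r_inst) and any(
                                    dominates_equation(main[m], l_inst) for m in unmatched):
                                return main[:k] + [(sign, replace(atom, path, r_inst))] + main[k + 1:]
    return None

c, d = ("c",), ("d",)
side = [(True, ("=", ("f", ("g", "x")), ("g", "x"))), (True, ("Q", "x")), (True, ("R", "y"))]
main = [(True, ("P", ("f", ("g", c)))), (True, ("Q", c)), (True, ("Q", d)),
        (True, ("R", ("f", ("g", d))))]
print(subsumption_demodulate(side, main))  # P(g(c)) v Q(c) v Q(d) v R(f(g(d)))
```

With the clauses of Example 7 as input, the sketch returns `None`, since $Q(d)$ cannot be matched to $Q(x)$.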

Example 5 highlights limitations of demodulation when compared to subsumption demodulation. We next illustrate different possible applications of subsumption demodulation using a fixed side premise and different main premises.

###### Example 6

Consider the clause . Only the first literal  is a positive equality and as such eligible as rewriting equality. Note that  and  are incomparable w.r.t. due to occurrences of different variables, and hence whether depends on the chosen substitution .

(1) Consider the clause  as the main premise. With the substitution , we have  as due to the subterm property of , enabling a possible application of subsumption demodulation over and .

(2) Consider now  as the main premise and the substitution . We have , as . The instance of the rewriting equality is oriented differently in this case than in the previous one, enabling a possible application of subsumption demodulation over and .

(3) On the other hand, using the clause  as the main premise, the only substitution we can use is . The corresponding instance of the rewriting equality is then , which cannot be oriented in general. Hence, subsumption demodulation cannot be applied in this case, even though we can find the matching term in . ∎

As mentioned before, the substitution $\sigma$ appearing in subsumption demodulation can only be used to instantiate the side premise, not to unify the side and main premises, as we would otherwise not obtain a simplification rule.

###### Example 7

Consider the clauses:

$$C_1 = f(c) \simeq c \vee Q(d) \qquad\qquad C_2 = P(f(c)) \vee Q(x)$$

As we cannot match $Q(d)$ to $Q(x)$ (although we could match $Q(x)$ to $Q(d)$), subsumption demodulation is not applicable with premises $C_1$ and $C_2$. ∎

### 4.2 Simplification using Subsumption Demodulation

Note that in the special case where $C$ is the empty clause in (5), subsumption demodulation reduces to demodulation and hence it is a simplification rule. We next show that this is the case in general:

###### Theorem 4.2 (Simplification rule)

Subsumption demodulation is a simplification rule and we have:

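$$\dfrac{l \simeq r \vee C \qquad \cancel{L[t] \vee D}}{L[r\sigma] \vee D}$$

where:

1. $l\sigma = t$,

2. $C\sigma \subseteq_M D$,

3. $l\sigma \succ r\sigma$, and

4. $L[t] \vee D \succ (l \simeq r)\sigma \vee C\sigma$.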

###### Proof

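By the first and second conditions of the definition of subsumption demodulation, $L[t] \vee D$ is clearly a logical consequence of $L[r\sigma] \vee D$ and $l \simeq r \vee C$. Moreover, from the fourth condition, we trivially have $L[t] \vee D \succ (l \simeq r \vee C)\sigma$. It thus remains to show that $L[r\sigma] \vee D$ is smaller than $L[t] \vee D$ w.r.t. $\succ$. As $t = l\sigma \succ r\sigma$, the monotonicity property of $\succ$ asserts that $L[t] \succ L[r\sigma]$, and hence $L[t] \vee D \succ L[r\sigma] \vee D$. This shows that $L[t] \vee D$ is redundant w.r.t. the conclusion and the left-most premise of subsumption demodulation. ∎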

###### Example 8

Revisiting Example 5, Theorem 4.2 asserts that clause $C_2$ is simplified into $C_3$, and subsumption demodulation deletes $C_2$ from the search space. ∎

### 4.3 Refining Redundancy

The fourth condition defining subsumption demodulation in Definition 1 is needed to ensure that the main premise of subsumption demodulation becomes redundant. However, comparing clauses w.r.t. the ordering $\succ$ is computationally expensive, and not necessary for subsumption demodulation. Following the notation of Definition 1, let $D'$ be such that $D = C\sigma \vee D'$. By properties of multiset orderings, the condition $L[t] \vee D \succ (l \simeq r)\sigma \vee C\sigma$ is equivalent to $L[t] \vee D' \succ (l \simeq r)\sigma$, as the literals in $C\sigma$ occur on both sides of $\succ$. This means that, to ensure the redundancy of the main premise of subsumption demodulation, we only need to ensure that there is a literal from $L[t] \vee D'$ that is bigger than the instance $(l \simeq r)\sigma$ of the rewriting equality.

###### Theorem 4.3 (Refining redundancy)

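Let $D = C\sigma \vee D'$. The following two conditions are equivalent:

1. $L[t] \vee D \succ (l \simeq r)\sigma \vee C\sigma$;

2. $L[t] \vee D' \succ (l \simeq r)\sigma$.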

As mentioned in Section 4.1, applying subsumption demodulation involves checking that an ordering condition between premises holds (side condition 4 in Definition 1). Theorem 4.3 asserts that we only need to find a literal in the unmatched part of the main premise that is bigger than the instance of the rewriting equality in order to ensure that the ordering condition is fulfilled. In the next section we show that, by re-using and properly adapting the underlying machinery of first-order provers for demodulation and subsumption, subsumption demodulation can be efficiently implemented in superposition-based proof search.
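Theorem 4.3 follows from a general property of the multiset extension of an ordering: adding the same elements to both sides, here the literals of $C\sigma$, does not change the outcome of the comparison. The Python sketch below implements the Dershowitz-Manna multiset extension and evaluates both conditions on the literals of Example 5, writing $D = C\sigma \vee D'$. The literal order `lit_gt` is a hypothetical, deliberately partial stand-in that only encodes the rule that equality literals are smaller than non-equality literals; literals are plain strings for brevity.

```python
from collections import Counter

def multiset_greater(M, N, gt):
    """Dershowitz-Manna extension of a strict order gt to finite multisets.

    M > N iff M != N and every element that N has in excess over M is
    dominated (via gt) by some element that M has in excess over N.
    """
    M, N = Counter(M), Counter(N)
    if M == N:
        return False
    m_extra, n_extra = M - N, N - M
    return all(any(gt(x, y) for x in m_extra) for y in n_extra)

# Hypothetical partial literal order: non-equality literals are bigger than
# equality literals; all other pairs are treated as incomparable.
def lit_gt(a, b):
    return "=" not in a and "=" in b

# Literals of Example 5, written as strings.
eq_inst = "f(g(c))=g(c)"          # (l = r)sigma
C_inst = ["Q(c)", "R(f(g(d)))"]   # C sigma
D_rest = ["Q(d)"]                 # D', so that D = C sigma v D'
L_t = "P(f(g(c)))"                # L[t]

T1 = multiset_greater([L_t] + C_inst + D_rest, [eq_inst] + C_inst, lit_gt)
T2 = multiset_greater([L_t] + D_rest, [eq_inst], lit_gt)
print(T1, T2)  # True True
```

Since Counter subtraction cancels shared elements, `M + K` and `N + K` have the same excess sets as `M` and `N`, which is exactly why the common literals $C\sigma$ can be dropped from the comparison.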

## 5 Subsumption Demodulation in Vampire

We implemented subsumption demodulation in the first-order theorem prover Vampire. Our implementation consists of about 5000 lines of C++ code and is available at:

As for any simplification rule, we implemented the forward and backward versions of subsumption demodulation separately. Our new Vampire options controlling subsumption demodulation are fsd and bsd, both with possible values on and off, to respectively enable forward and backward subsumption demodulation.

As discussed in Section 4, subsumption demodulation uses reasoning based on a combination of demodulation and subsumption. Algorithm 1 details our implementation for forward subsumption demodulation. In a nutshell, given a clause as main premise, (forward) subsumption demodulation in Vampire consists of the following main steps:

1. Retrieve candidate clauses as side premises of subsumption demodulation (see Algorithm 1). To this end, we design a new clause index with imperfect filtering, by modifying the subsumption index in Vampire, as discussed later in this section.

2. Prune candidate clauses by checking the conditions of subsumption demodulation (see Algorithm 1), in particular by selecting a rewriting equality and matching the remaining literals of the side premise to literals of the main premise. After this, prune further by performing a posteriori checks for orienting the rewriting equality and checking the redundancy of the given main premise. To do so, we revised multi-literal matching and redundancy checking in Vampire (see later).

3. Build the simplified clause by simplifying and deleting the (main) premise of subsumption demodulation using (forward) simplification (see Algorithm 1).

Our implementation of backward subsumption demodulation requires only a few changes to Algorithm 1: (i) we use the input clause as the side premise of backward subsumption demodulation and (ii) we retrieve candidate clauses as potential main premises of subsumption demodulation. Additionally, (iii) instead of returning a single simplified clause, we record a replacement clause for each candidate clause where a simplification was possible.

#### Clause indexing for subsumption demodulation.

We build upon the indexing approach [14] used for subsumption in Vampire: the subsumption index in Vampire stores and retrieves candidate clauses for subsumption. Each clause is indexed by exactly one of its literals. In principle, any literal of the clause can be chosen. In order to reduce the number of retrieved candidates, the best literal is chosen, in the sense that the chosen literal maximizes a certain heuristic (e.g. maximal weight). Since the subsumption index is not a perfect index (i.e., it may retrieve non-subsumed clauses), additional checks on the retrieved clauses are performed.

Using the subsumption index of Vampire as the clause index for forward subsumption demodulation would, however, fail to retrieve clauses (side premises) in which the rewriting equality is chosen as the key for the index, thereby missing possible applications of subsumption demodulation. Hence, we need a new clause index in which the choice of the indexing literal takes the candidate rewriting equalities into account. To address this issue, we added a new clause index to Vampire, called the forward subsumption demodulation index (FSD index), as follows: we index potential side premises either by their best literal (according to the heuristic), their second best literal, or both. If the best literal in a clause is a positive equality (i.e. a candidate rewriting equality) but the second best is not, the clause is indexed by the second best literal, and vice versa. If both the best and second best literal are positive equalities, the clause is indexed by both of them. Furthermore, because the FSD index is exclusively used by forward subsumption demodulation, this index only needs to keep track of clauses that contain at least one positive equality.
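The indexing rule of the FSD index can be sketched in a few lines of Python. This is a hypothetical illustration, not Vampire's C++ code: `weight` stands in for the unspecified heuristic (we use atom size), unit clauses are assumed to be left to ordinary demodulation, and a clause whose two best literals are both non-equalities is assumed to be indexed by its best literal, as in the subsumption index.

```python
def is_var(t):
    return isinstance(t, str)

def size(t):
    return 1 if is_var(t) else 1 + sum(size(a) for a in t[1:])

def is_positive_equality(lit):
    sign, atom = lit
    return sign and atom[0] == "="

def fsd_index_keys(clause, weight=lambda lit: size(lit[1])):
    """Positions of the literal(s) under which `clause` would be stored in the FSD index.

    Returns [] for clauses the index ignores (no positive equality, or unit clauses).
    """
    if len(clause) < 2 or not any(is_positive_equality(lit) for lit in clause):
        return []
    order = sorted(range(len(clause)), key=lambda k: -weight(clause[k]))  # stable on ties
    best, second = order[0], order[1]
    best_eq, second_eq = is_positive_equality(clause[best]), is_positive_equality(clause[second])
    if best_eq and second_eq:
        return [best, second]  # either one may end up as the rewriting equality
    if best_eq:
        return [second]        # do not key the clause on a candidate rewriting equality
    return [best]              # "vice versa", and the case where neither is an equality

side = [(True, ("=", ("f", ("g", "x")), ("g", "x"))), (True, ("Q", "x")), (True, ("R", "y"))]
print(fsd_index_keys(side))  # [1]: indexed by Q(x), not by the equality
```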

In the backward case, we can in fact reuse Vampire’s index for backward subsumption. However, we need to query the index by the best literal, the second best literal, or both (as described in the previous paragraph).

#### Multi-literal matching.

Similarly to the subsumption index, our new subsumption demodulation index is not a perfect index, that is, it performs imperfect filtering when retrieving clauses. Therefore, additional post-checks are required on the retrieved clauses. In our work, we devised a multi-literal matching approach to:

– choose the rewriting equality among the literals of the side premise, and

– check whether the remaining literals of the side premise can be uniformly instantiated to literals of the main premise of subsumption demodulation.

There are multiple ways to organize this process. A simple approach is to (i) first pick any equality of a side premise as the rewriting equality of subsumption demodulation, and then (ii) invoke the existing multi-literal matching machinery of Vampire to match the remaining literals of the side premise with a subset of the literals of the main premise. For the latter step (ii), the task is to find a substitution $\sigma$ such that the $\sigma$-instance of the remaining literals becomes a submultiset of the given main premise. If the choice of the rewriting equality in step (i) turns out to be wrong, we backtrack. In our work, we revised the existing multi-literal matching machinery of Vampire into a new multi-literal matching approach for subsumption demodulation, which follows steps (i)-(ii) but interleaves equality selection with matching.

We note that the substitution $\sigma$ in step (ii) above is built in two stages: first we obtain a partial substitution from multi-literal matching, and then (possibly) extend it to $\sigma$ by matching term instances of the rewriting equality with terms of the main premise.

###### Example 9

Let be the clause . Assume that our (FSD) clause index retrieves the clause from the search space (line 1 of Algorithm 1). We then invoke our multi-literal matcher (line 1 of Algorithm 1), which matches the literal of to the literal of and selects the equality literal  of as the rewriting equality for subsumption demodulation over and . The matcher returns the choice of rewriting equality and the partial substitution . We arrive at the final substitution  only when we match the instance , that is , of the left-hand side of the rewriting equality to the literal  of . Using , subsumption demodulation over and will derive , after ensuring that becomes redundant (line 1 of Algorithm 1). ∎

We further note that multi-literal matching is an NP-complete problem. Our multi-literal matching problems may have more than one solution, with possibly only some (or none) of them leading to successful applications of subsumption demodulation. In our implementation, we examine all solutions retrieved by multi-literal matching. We also experimented with limiting the number of matches examined after multi-literal matching but did not observe relevant improvements. Yet, our implementation in Vampire also supports an additional option allowing the user to specify an upper bound on how many solutions of multi-literal matching should be examined.
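The interleaving of equality selection with matching, and the option of bounding the number of examined solutions, can be pictured with Python generators. The sketch below is our own simplification rather than Vampire's multi-literal matcher: it enumerates every (rewriting equality, partial substitution) pair by brute-force backtracking, and the `limit` parameter is a hypothetical stand-in for the user-specified upper bound. The example clauses are hypothetical and give a matching problem with two solutions, only the first of which would allow rewriting $f(c)$.

```python
from itertools import islice

def is_var(t):
    return isinstance(t, str)

def match(pattern, target, s):
    """One-directional matching: bind only variables of pattern."""
    if is_var(pattern):
        if pattern in s:
            return s if s[pattern] == target else None
        return {**s, pattern: target}
    if is_var(target) or pattern[0] != target[0] or len(pattern) != len(target):
        return None
    for p, t in zip(pattern[1:], target[1:]):
        s = match(p, t, s)
        if s is None:
            return None
    return s

def literal_matches(C, D, s, used):
    """Yield every (s, used) that maps the literals of C injectively into D."""
    if not C:
        yield s, used
        return
    sign, atom = C[0]
    for j, (dsign, datom) in enumerate(D):
        if j not in used and dsign == sign:
            s2 = match(atom, datom, s)
            if s2 is not None:
                yield from literal_matches(C[1:], D, s2, used | {j})

def candidate_matches(side, main, limit=None):
    """List (rewriting-equality index, partial substitution) pairs, at most `limit` of them.

    Each positive equality of `side` is tried in turn while the remaining
    literals are matched into `main`; generators provide the backtracking.
    """
    def solutions():
        for e, (sign, atom) in enumerate(side):
            if sign and atom[0] == "=":
                rest = side[:e] + side[e + 1:]
                for s, _ in literal_matches(rest, main, {}, frozenset()):
                    yield e, s
    return list(islice(solutions(), limit))

c, d = ("c",), ("d",)
side = [(True, ("=", ("f", "x"), "x")), (True, ("Q", "x"))]          # f(x) = x v Q(x)
main = [(True, ("P", ("f", c))), (True, ("Q", c)), (True, ("Q", d))]  # P(f(c)) v Q(c) v Q(d)
print(candidate_matches(side, main))           # two partial substitutions: x -> c, x -> d
print(candidate_matches(side, main, limit=1))  # only the first one
```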

#### Redundancy checking.

To ensure redundancy of the main premise after the subsumption demodulation inference, we need to check two properties. First, the instance of the rewriting equality must be oriented. This is a simple ordering check. Second, the main premise must be larger than the instance of the side premise. Thanks to Theorem 4.3, this latter condition is reduced to finding a literal in the unmatched part of the main premise that is bigger than the instance of the rewriting equality.

###### Example 10

In case of Example 9, the rewriting equality is oriented and hence is also oriented. Next, the literal is bigger than , and hence is redundant w.r.t. and . ∎

## 6 Experiments

We evaluated our implementation of subsumption demodulation in Vampire on the examples of the TPTP [16] and SMT-LIB [4] repositories. All our experiments were carried out on the StarExec cluster [15].

Benchmark setup. Of the 22,686 problems in the TPTP benchmark set, Vampire can parse 18,232. Since subsumption demodulation is only applicable in the presence of at least one equality, we used only those problems that involve equality, which left 13,924 TPTP problems for our experiments.

For the SMT-LIB repository, we chose the benchmarks from the categories LIA, UF, UFDT, UFDTLIA, and UFLIA, as these involve reasoning with both theories and quantifiers, and their background theories are supported by Vampire. These categories contain 22,951 SMT-LIB problems in total, of which 22,833 remain after removing those in which equality does not occur.

Comparative experiments with Vampire. As a first experimental study, we compared the performance of subsumption demodulation in Vampire for different values of fsd and bsd, that is, using forward (FSD) and/or backward (BSD) subsumption demodulation. To this end, we evaluated subsumption demodulation using the CASC and SMTCOMP schedules of Vampire’s portfolio mode. To test subsumption demodulation in portfolio mode, we added the options fsd and/or bsd to all strategies of Vampire. While the resulting strategy schedules could potentially be improved further, this allowed us to test FSD/BSD with a variety of strategies.

Our results are summarized in Tables 1-2. The first column of these tables lists the Vampire version and configuration, where Vampire refers to Vampire in its portfolio mode (version 4.4). Rows 2-4 of these tables use our new Vampire, that is, our implementation of subsumption demodulation in Vampire. The column “Solved” reports the total number of TPTP (Table 1) and SMT-LIB (Table 2) problems solved by each configuration. The column “New” lists the number of problems solved by the version with subsumption demodulation but not by the portfolio version of Vampire, and indicates in parentheses how many of these problems were satisfiable/unsatisfiable.
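The “Solved” and “New” columns amount to simple set computations over the problems each run solves. The following is a minimal Python sketch of those columns as described. It assumes each run is summarized as a mapping from solved problem names to `"sat"` or `"unsat"`. The function name and the problem identifiers in the example are hypothetical placeholders and are not taken from the actual results.

```python
from collections import Counter
from typing import Dict, Tuple


def table_row(config: Dict[str, str], baseline: Dict[str, str]) -> Tuple[int, str]:
    """config and baseline map each *solved* problem to "sat" or "unsat".
    Returns the "Solved" count of config and its "New" entry as 'n (sat/unsat)'."""
    # "New": solved by this configuration but not by the baseline portfolio run
    new = {p: status for p, status in config.items() if p not in baseline}
    by_status = Counter(new.values())
    return len(config), f"{len(new)} ({by_status['sat']}/{by_status['unsat']})"
```

For example, `table_row({"p1": "unsat", "p4": "unsat", "p5": "sat", "p6": "unsat"}, {"p1": "unsat", "p2": "sat", "p3": "unsat"})` gives `(4, "3 (1/2)")`. A configuration can therefore have a smaller “Solved” count than the baseline and still a non-zero “New” entry.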

While the portfolio mode of Vampire solves more problems in total, this comes as no surprise, as the portfolio mode of Vampire is highly tuned using the existing Vampire options. In our experiments, we were interested in whether subsumption demodulation in Vampire can solve problems that the portfolio mode of Vampire cannot. The “New” columns of Tables 1-2 give practical evidence of the impact of subsumption demodulation: there are 30 new TPTP problems and 76 new SMT-LIB problems that the portfolio version of Vampire cannot solve, but forward and backward subsumption demodulation in Vampire can.³

³ The list of these new problems is available at

New problems solved only by subsumption demodulation. Building upon our results from Tables 1-2, we analysed how many new problems subsumption demodulation in Vampire can solve when compared to other state-of-the-art reasoners. To this end, we evaluated our work against the superposition provers E (version 2.4) and Spass (version 3.9), as well as the SMT solvers CVC4 (version 1.7) and Z3 (version 4.8.7). We note, however, that for our 30 new TPTP problems from Table 1, we could not compare our results against Z3, as Z3 does not natively parse TPTP. Conversely, for our 76 new SMT-LIB problems from Table 2, we only compared against CVC4 and Z3, as E and Spass do not support the SMT-LIB syntax.

Table 3 summarizes our findings. First, 11 of our 30 “new” TPTP problems can only be solved using forward and backward subsumption demodulation in Vampire; none of the other systems were able to solve these problems.

Second, while all of our 76 “new” SMT-LIB problems can also be solved by CVC4 and Z3 combined, 10 of these problems cannot be solved by CVC4 and 27 cannot be solved by Z3.
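The comparison behind Table 3 similarly reduces to set differences. The only complication is that a solver that cannot parse a benchmark format is left out of the comparison rather than counted as failing. Below is a minimal Python sketch under those assumptions. The function name, solver keys, and problem sets in the example are hypothetical placeholders.

```python
from typing import Dict, Optional, Set, Tuple


def compare_new(new: Set[str], others: Dict[str, Optional[Set[str]]]
                ) -> Tuple[Set[str], Dict[str, int]]:
    """others maps each solver to the set of problems it solved, or to None
    when it cannot read the benchmark format and is excluded from comparison.
    Returns the problems no compared solver solves, and per-solver miss counts."""
    compared = {name: solved for name, solved in others.items() if solved is not None}
    solved_elsewhere = set().union(*compared.values())
    only_sd = new - solved_elsewhere
    missed = {name: len(new - solved) for name, solved in compared.items()}
    return only_sd, missed
```

An empty `only_sd` together with non-zero entries in `missed` corresponds to the SMT-LIB situation above: every new problem is solved by some other solver, yet each individual solver misses some of them.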

Comparative experiments without AVATAR. Finally, we investigated the effect of subsumption demodulation in Vampire without AVATAR [18]. We used the default mode of Vampire (that is, without a portfolio approach) and turned off AVATAR. While this configuration solves fewer problems than the portfolio mode of Vampire, Vampire is so far the only superposition-based theorem prover implementing AVATAR. Hence, evaluating subsumption demodulation in Vampire without AVATAR is more relevant to other reasoners. Further, as AVATAR may often split non-unit clauses into unit clauses, it may simulate applications of subsumption demodulation by demodulation. Table 4 shows that this is indeed the case: with both fsd and bsd enabled, subsumption demodulation in Vampire can prove 190 TPTP problems and 173 SMT-LIB problems that the default Vampire without AVATAR cannot solve. Again, the column “New” denotes the number of problems solved by the respective configuration but not by the default mode of Vampire without AVATAR.

## 7 Conclusion

We introduced the simplifying inference rule subsumption demodulation to improve support for reasoning with conditional equalities in superposition-based first-order theorem proving. Subsumption demodulation builds on existing machinery of superposition provers and can therefore be efficiently integrated into superposition reasoning. Our implementation in Vampire shows that subsumption demodulation solves many new problems that existing provers, including first-order provers and SMT solvers, cannot handle. Future work includes designing more sophisticated approaches for selecting rewriting equalities and improving the imperfect filtering of clause indexes.

#### Acknowledgements.

This work was funded by the ERC Starting Grant 2014 SYMCAR 639270, the ERC Proof of Concept Grant 2018 SYMELS 842066, the Wallenberg Academy Fellowship 2014 TheProSE, and the Austrian FWF research project W1255-N23.

## References

• [1] Bachmair, L., Ganzinger, H.: Rewrite-Based Equational Theorem Proving with Selection and Simplification. J. Log. Comput. 4(3), 217–247 (1994)
• [2] Bachmair, L., Ganzinger, H., McAllester, D.A., Lynch, C.: Resolution Theorem Proving. In: Handbook of Automated Reasoning, pp. 19–99 (2001)
• [3] Barrett, C., Conway, C.L., Deters, M., Hadarean, L., Jovanović, D., King, T., Reynolds, A., Tinelli, C.: CVC4. In: International Conference on Computer Aided Verification. pp. 171–177. Springer (2011)
• [4] Barrett, C., Fontaine, P., Tinelli, C.: The Satisfiability Modulo Theories Library (SMT-LIB). www.SMT-LIB.org (2016)
• [5] Barthe, G., Eilers, R., Georgiou, P., Gleiss, B., Kovács, L., Maffei, M.: Verifying Relational Properties using Trace Logic. In: Proc. of FMCAD. pp. 170–178 (2019)
• [6] Bjørner, N., Gurfinkel, A., McMillan, K.L., Rybalchenko, A.: Horn clause solvers for program verification. In: Fields of Logic and Computation II. pp. 24–51 (2015)
• [7] De Moura, L., Bjørner, N.: Z3: An efficient SMT solver. In: International Conference on Tools and Algorithms for the Construction and Analysis of Systems. pp. 337–340. Springer (2008)
• [8] Ganzinger, H., Hagen, G., Nieuwenhuis, R., Oliveras, A., Tinelli, C.: DPLL(T): Fast Decision Procedures. In: Proc. of CAV. pp. 175–188 (2004)
• [9] Hillenbrand, T., Piskac, R., Waldmann, U., Weidenbach, C.: From search to computation: Redundancy criteria and simplification at work. In: Voronkov, A., Weidenbach, C. (eds.) Programming Logics: Essays in Memory of Harald Ganzinger, pp. 169–193. Springer Berlin Heidelberg, Berlin, Heidelberg (2013)
• [10] Kovács, L., Voronkov, A.: First-order theorem proving and Vampire. In: International Conference on Computer Aided Verification. pp. 1–35. Springer (2013)
• [11] Nieuwenhuis, R., Rubio, A.: Paramodulation-Based Theorem Proving. In: Handbook of Automated Reasoning, pp. 371–443 (2001)
• [12] Reynolds, A., Woo, M., Barrett, C.W., Brumley, D., Liang, T., Tinelli, C.: Scaling Up DPLL(T) String Solvers Using Context-Dependent Simplification. In: Proc. of CAV. pp. 453–474 (2017)
• [13] Schulz, S., Cruanes, S., Vukmirovic, P.: Faster, higher, stronger: E 2.3. In: Proc. of CADE. pp. 495–507 (2019)
• [14] Sekar, R., Ramakrishnan, I.V., Voronkov, A.: Term indexing. In: Robinson, J.A., Voronkov, A. (eds.) Handbook of Automated Reasoning, pp. 1853–1964. Elsevier Science Publishers B. V. (2001)
• [15] Stump, A., Sutcliffe, G., Tinelli, C.: StarExec: A Cross-Community Infrastructure for Logic Solving. In: Proc. of IJCAR. pp. 367–373 (2014)
• [16] Sutcliffe, G.: The TPTP Problem Library and Associated Infrastructure. From CNF to TH0, TPTP v6.4.0. Journal of Automated Reasoning 59(4), 483–502 (Feb 2017)
• [17] Tange, O.: GNU Parallel 2018. Ole Tange (Mar 2018)
• [18] Voronkov, A.: AVATAR: the architecture for first-order theorem provers. In: Proc. of CAV. pp. 696–710 (2014)
• [19] Weidenbach, C.: Combining Superposition, Sorts and Splitting. In: Handbook of Automated Reasoning, pp. 1965–2013 (2001)
• [20] Weidenbach, C., Dimova, D., Fietzke, A., Kumar, R., Suda, M., Wischnewski, P.: SPASS version 3.5. In: Proc. of CADE. pp. 140–145 (2009)
• [21] Weidenbach, C., Wischnewski, P.: Contextual Rewriting in SPASS. In: Proc. of PAAR (2008)