## 1. Introduction

Superposition is a highly successful calculus for reasoning about first-order
logic with equality. We are interested in *graceful* generalizations to
higher-order logic: calculi that, as much as possible, coincide with standard
superposition on first-order problems and that scale up to arbitrary
higher-order problems.

As a stepping stone towards full higher-order logic, in this article we restrict our attention to a clausal λ-free fragment of polymorphic higher-order logic that supports partial application and application of variables (Section 2). This formalism is expressive enough to permit the axiomatization of higher-order combinators such as (intended to denote the iterated application ):

Conventionally, functions are applied without parentheses and commas, and variables are italicized. Notice the variable number of arguments to and the application of . The expressiveness of full higher-order logic can be recovered by introducing SK-style combinators to represent λ-abstractions and proxies for the logical symbols [42, 53].

A widespread technique to support partial application and application of
variables in first-order logic is to make all symbols nullary and to represent
application of functions by a distinguished binary symbol ,
where is an uninterpreted binary type constructor.
Following this scheme,
the higher-order term ,
where , is translated to
—or rather
if we specify the type arguments.
We call this the *applicative encoding.*
The existence of such a reduction to first-order logic explains why
λ-free higher-order terms are also called “applicative first-order
terms.”
Unlike for
full higher-order logic, most general unifiers are unique for
our λ-free fragment, just as they are for
applicatively encoded first-order terms.

Although the applicative encoding is complete [42] and is employed fruitfully in tools such as HOLyHammer and Sledgehammer [18], it suffers from a number of weaknesses, all related to its gracelessness. Transforming all the function symbols into constants considerably restricts what can be achieved with term orders; for example, argument tuples cannot be compared using different methods for different symbols [43, Section 2.3.1]

. In a prover, the encoding also clutters the data structures, slows down the algorithms, and neutralizes the heuristics that look at the terms’ root symbols. But our chief objection is the sheer clumsiness of encodings and their poor integration with interpreted symbols. And they quickly accumulate; for example, using the traditional encoding of polymorphism relying on a distinguished binary function symbol

[17, Section 3.3] in conjunction with the applicative encoding, the term becomes . The term’s simple structure is lost in translation.

Hybrid schemes have been proposed to strengthen the applicative encoding: if a given symbol always occurs with at least arguments, these can be passed directly [47]. However, this relies on a closed-world assumption: that all terms that will ever be compared arise in the input problem. This noncompositionality conflicts with the need for complete higher-order calculi to synthesize arbitrary terms during proof search [12]

. As a result, hybrid encodings are not an ideal basis for higher-order automated reasoning.

Instead, we propose to generalize the superposition calculus to *intensional* and
*extensional* clausal λ-free higher-order logic. For the extensional
version of the logic, the property holds for all functions of the same type. For each logic, we
present two calculi (Section 3). The intensional
calculi perfectly coincide with standard superposition on first-order
clauses; the extensional calculi depend on an extra axiom.

Superposition is parameterized by a term order, which is used to prune the
search space. If we assume that the term order is a simplification order
enjoying totality on ground terms (i.e., terms containing no term or type
variables), the standard calculus rules and completeness proof can be lifted
verbatim. The only necessary changes concern the basic definitions of terms and
substitutions.
However, there is one monotonicity property that is hard to obtain
unconditionally: *compatibility with arguments*. It states that implies for all terms such that and are well typed.
Blanchette, Waldmann, and colleagues recently introduced graceful
generalizations of the lexicographic path order (LPO)
[20] and the Knuth–Bendix order (KBO)
[6] with argument coefficients, but they both lack this
property. For example, given a KBO with , it may well be
that if
has a large enough multiplier on its argument.
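To make the failure of compatibility with arguments concrete, here is a toy weight computation in the style of a KBO with argument coefficients. The symbols, weights, coefficients, and precedence below are our own illustrative choices, not taken from [6]:

```python
# Why KBO with argument coefficients is not compatible with arguments:
# with equal weights, f beats g by precedence, yet g's coefficient
# inflates its argument, so g a outweighs f a. Terms are (head, args)
# pairs in spine notation.

WEIGHT = {"f": 1, "g": 1, "a": 1}
COEFF = {"f": 1, "g": 2}          # multiplier applied to arguments
PREC = {"g": 0, "f": 1}           # precedence: f > g

def weight(t):
    h, args = t
    return WEIGHT[h] + sum(COEFF.get(h, 1) * weight(a) for a in args)

def kbo_greater(s, t):
    """Compare weights first, then precedence (a simplified sketch;
    ties on both weight and precedence are not decided here)."""
    ws, wt = weight(s), weight(t)
    if ws != wt:
        return ws > wt
    return PREC.get(s[0], -1) > PREC.get(t[0], -1)

f, g, a = ("f", ()), ("g", ()), ("a", ())
assert kbo_greater(f, g)                      # f > g as terms ...
assert kbo_greater(("g", (a,)), ("f", (a,)))  # ... but g a > f a
```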

Our superposition calculi are designed to be refutationally complete for such
nonmonotonic orders (Section 4).
To achieve this, they include an inference rule for argument congruence, which
derives from .
The redundancy criterion is defined in such a way that the larger,
derived clause is not subsumed by the premise. In the completeness proof, the
most difficult case is the one
that normally excludes superposition at or below variables using the induction
hypothesis. With nonmonotonicity, this approach no longer works, and we propose two
alternatives: Either perform some superposition inferences into higher-order
variables or “purify” the clauses to circumvent the issue. We refer to the
corresponding calculi as *nonpurifying* and *purifying.*

The calculi are implemented in the Zipperposition prover [29] (Section 5). We evaluate them on first- and higher-order Isabelle/HOL [23] and TPTP benchmarks [63, 62] and compare them with the applicative encoding (Section 6). We find that there is a substantial cost associated with the applicative encoding, that the nonmonotonicity is not particularly expensive, and that the nonpurifying calculi outperform the purifying variants.

An earlier version of this work was presented at IJCAR 2018 [11]. This article extends the conference paper with detailed soundness and completeness proofs and more explanations. Because the selection restrictions on the purifying variants were too weak, the claim of refutational completeness in the conference version was not entirely correct. We have now strengthened the selection restrictions accordingly. Moreover, we have extended the logic with polymorphism, leading to minor modifications to the calculus. We have also simplified the presentation of the clausal fragment of the logic that interests us; in particular, we have removed mandatory arguments. The redundancy criterion also differs slightly from the conference version. Finally, we have updated the empirical evaluation to reflect recent improvements in the Zipperposition prover.

## 2. Logic

Our logic is intended as a convenient intermediate step on the way towards full higher-order logic (also called simple type theory) [27, 36]. Refutational completeness of calculi for higher-order logic is usually stated in terms of Henkin semantics [38, 12], in which the universes used to interpret functions need only contain the functions that can be expressed as terms. Since the terms of λ-free higher-order logic exclude λ-abstractions, in “λ-free Henkin semantics” the universes interpreting functions can be even smaller. In that sense, our semantics resemble Henkin prestructures [45, Section 5.4]. In contrast to other higher-order logics [64], there are no comprehension principles, and we disallow nesting of Boolean formulas inside terms.

### 2.1. Syntax

We fix a set of type constructors with arities and a set of type variables. We require at least one nullary type constructor and a binary type constructor to be present in . Types of λ-free higher-order logic are either a type variable or of the form for an -ary type constructor and types . Here and elsewhere, we write or to abbreviate the tuple or product , for . We write for and for . A type declaration is an expression of the form (or simply if ), where all type variables occurring in belong to .

We fix a set
of symbols with type declarations,
written as or ,
and a set of typed variables, written as
or .
To avoid empty
Herbrand universes, we require to contain
a symbol with type declaration .
The sets form the logic’s signature.
We reserve the letters for terms and
for variables and write
to indicate their type. The set of λ-free higher-order
terms is defined inductively as
follows. Every variable in is a term. If is a symbol and
are types,
then is a term. If
and , then is a term,
called an *application*. Non-application terms are called
*heads*.
A term is *ground* if it is built without using type or term variables.
Using the spine notation [26], terms can be
decomposed in a unique way as a head applied to zero or more
arguments: or (abusing notation).
Substitution and unification are generalized in the obvious way, without the
complexities associated with λ-abstractions; for example, the most
general unifier of and is
,
and that of and is .
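Operationally, unification in this fragment behaves like first-order unification except that a variable head may be bound to a prefix of the other term's spine. The following untyped sketch uses our own `(head, args)` representation (variables are names starting with an uppercase letter) and is not Zipperposition's implementation:

```python
# Syntactic unification for lambda-free terms in spine notation,
# ignoring types for brevity. A head is a symbol or a variable.

def is_var(h):
    return h[:1].isupper()

def occurs(x, t):
    h, args = t
    return h == x or any(occurs(x, a) for a in args)

def subst(sigma, t):
    h, args = t
    args = tuple(subst(sigma, a) for a in args)
    if h in sigma:
        h2, args2 = subst(sigma, sigma[h])
        return (h2, args2 + args)        # a bound head merges its spine
    return (h, args)

def unify(s, t, sigma=None):
    """Return a most general unifier of s and t, or None. In the
    lambda-free fragment, MGUs are unique up to variable renaming."""
    sigma = {} if sigma is None else sigma
    s, t = subst(sigma, s), subst(sigma, t)
    if s == t:
        return sigma
    (f, ss), (g, ts) = s, t
    if is_var(f) and len(ss) <= len(ts):
        # bind the variable head to the corresponding prefix of t
        k = len(ts) - len(ss)
        prefix = (g, ts[:k])
        if (f, ()) != prefix:
            if occurs(f, prefix):
                return None
            sigma = {**sigma, f: prefix}
        pairs = zip(ss, ts[k:])
    elif is_var(g):
        return unify(t, s, sigma)
    elif f == g and len(ss) == len(ts):
        pairs = zip(ss, ts)
    else:
        return None
    for a, b in pairs:
        sigma = unify(a, b, sigma)
        if sigma is None:
            return None
    return sigma

# for instance, the MGU of f X and F a maps F to f and X to a
assert unify(("f", (("X", ()),)), ("F", (("a", ()),))) == \
       {"F": ("f", ()), "X": ("a", ())}
```

Because there are no β-redexes, the flex cases reduce to prefix matching, which is why unification here stays unitary and decidable, unlike in full higher-order logic.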

An equation is formally an unordered pair of terms and . A literal is an equation or a negated equation, written or . A clause is a finite multiset of literals . The empty clause is written as .

### 2.2. Semantics

A *type interpretation* is defined as follows.
The set is a nonempty collection of nonempty sets, called
*universes*.
The function associates a function
with each -ary type constructor .
A *type valuation* is a function that maps every type variable to a universe.
The *denotation* of a type for a type interpretation
and a type valuation is defined by
and
.
Here and elsewhere, we abuse notation by applying an operation on a tuple when it must be applied elementwise;
thus, stands for .

A type valuation can be extended to be a *valuation* by additionally
assigning an element to each variable .
An *interpretation function* for a type interpretation
associates with each symbol
and universe tuple
a value
,
where is the type valuation that maps each to .
Loosely following Fitting [35, Section 2.5],
an *extension function*
associates to any pair of universes a
function
.
Together, a type interpretation, an interpretation function, and an extension function
form an *interpretation* .

An interpretation is *extensional* if is
injective for all .
Both intensional and extensional logics are widely used for interactive
theorem proving; for example, Coq’s
calculus of inductive constructions is intensional [14],
whereas Isabelle/HOL is extensional [49].
The semantics is *standard* if
is bijective for all .

For an interpretation and a valuation , the denotation of a term is defined as follows: For variables , let . For symbols , let . For applications of a term to a term , let , , and . If is a ground term, we also write for the denotation of because it does not depend on the valuation.

An equation is true in for if ; otherwise, it is false. A disequation is true if is false. A clause is true if at least one of its literals is true. The interpretation is a model of a clause , written , if is true in for all valuations . It is a model of a set of clauses if it is a model of all contained clauses.

For example, given the signature , the clause has an extensional model with , (), , , , , .

## 3. The Inference Systems

We introduce four versions of the *clausal λ-free higher-order
superposition calculus*, varying along two axes: intensional versus
extensional, and nonpurifying versus purifying. To avoid repetitions, our
presentation unifies them into a single framework.

### 3.1. The Inference Rules

The calculi are parameterized by a partial order on ground terms
that is well founded and total and that has the subterm property.
It must also be *compatible with green contexts*, meaning that
implies
.
On the other hand, it need not be *compatible with arguments*: need not imply .
Green contexts are built around *green subterms*,
defined inductively as follows.
A term is a green subterm of if either
; or
and is a green subterm of for some .
We write to indicate that
the subterm of is a green subterm;
correspondingly, is a green context.
For example, and are subterms of
, but not green subterms;
correspondingly, and are not
green contexts.
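Read operationally, the inductive definition yields a simple enumeration: a term together with the green subterms of its arguments. The `(head, args)` spine representation below is our own simplification, and the sketch does not distinguish symbol heads from variable heads:

```python
# Enumerate green subterms: a term is a green subterm of itself, and
# every green subterm of an argument is a green subterm of the whole.
# Heads and partially applied prefixes are never produced, matching
# the observation that they are subterms but not green subterms.

def green_subterms(t):
    yield t
    _head, args = t
    for a in args:
        yield from green_subterms(a)

# for f a b, the green subterms are f a b itself, a, and b;
# the prefixes f and f a are ordinary subterms but not green
t = ("f", (("a", ()), ("b", ())))
assert list(green_subterms(t)) == [t, ("a", ()), ("b", ())]
```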

For nonground terms, the only requirement on is stability under grounding substitutions (i.e., implies for all substitutions grounding and ). The literal and clause orders are defined from as multiset extensions in the standard way [4]. Despite their names, the term, literal, and clause orders need not be transitive on nonground entities.

Literal selection is supported.
The selection function maps each clause to a subclause of
consisting of negative literals.
A literal is (*strictly*) *eligible* w.r.t. a substitution in if it is
selected in or there are no selected literals in and is
(strictly) maximal in
If is the identity substitution, we leave it implicit.

The following four rules are common to all four calculi. We regard positive and negative superposition as two cases of the same rule:

where
;
;
;
is strictly eligible w.r.t. in ;
is eligible w.r.t. in
and, if positive, even strictly eligible;
and .
Moreover, the *variable condition* must hold;
it varies from one calculus to another and
is specified below.

The equality resolution and equality factoring rules are almost identical to their standard counterparts:

The side conditions for EqRes are and is eligible w.r.t. in . The side conditions for EqFact are , , , and is eligible w.r.t. in .

The following *argument congruence* rule compensates for the limitation
that the superposition rule applies only to green subterms:

The literal must be strictly eligible w.r.t. in , and is a nonempty tuple of distinct fresh variables. The substitution is the most general type substitution that ensures well-typedness of the conclusion. In particular, if takes arguments, there are ArgCong conclusions for this literal, for which is the identity and is a tuple of 1, …, , or variables. If the result type of is a type variable, we have in addition infinitely many ArgCong conclusions, for which instantiates the type variable in the result type of with for some and fresh type variables and for which is a tuple of variables.

For the intensional nonpurifying
variant, the variable condition
of the Sup rule is as follows:
“Either or there exists a grounding substitution with
and .”
This condition generalizes the standard condition that .
The two coincide if is first-order or if the term order is monotonic. In some cases
involving nonmonotonicity, the variable condition effectively mandates
Sup inferences at variable positions of the right premise, but never below.
We call these inferences *at variables*.

For the extensional nonpurifying calculus, the variable condition uses the following definition.

###### Definition .

A term of the form , for , *jells* with a literal if
and
for some terms , and distinct variables
that do not occur elsewhere in

Using the naming convention from Definition 3.1 for , the variable condition can be stated as follows: “If has a variable head and jells with the literal , there must exist a grounding substitution with and , where .” If is first-order, this amounts to . Since the order is compatible with green contexts, the substitution can exist only if occurs applied in .

Moreover, the extensional nonpurifying calculus has one additional rule, the positive extensionality rule, and one axiom, the extensionality axiom. The rule is

where is a tuple of distinct variables that do not occur in , , or , and is strictly eligible in the premise. The extensionality axiom uses a polymorphic Skolem symbol characterized by the axiom

(Ext)

Unlike the nonpurifying calculi, the purifying calculi never perform superposition at variables.
Instead, they rely on purification
[24, 31, 54, 58]
(also called abstraction) to circumvent nonmonotonicity.
The idea
is to rename apart problematic occurrences of a variable in
a clause to and to add *purification literals*
, …, to connect the new variables to .
We must then ensure
that all clauses are purified, by processing the initial clause set
and the conclusion of every inference or simplification.

In the intensional purifying calculus, the purification of clause is defined as the result of the following procedure. Choose a variable that occurs applied in and also unapplied in a literal of that is not of the form . If no such variable exists, terminate. Otherwise, replace all unapplied occurrences of in by a fresh variable and add the purification literal . Then repeat the procedure with another variable. For example,

The variable condition is “.” The conclusion of ArgCong is changed to ; the other rules preserve purity of their premises.
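As a rough operational reading, the renaming step can be sketched as follows. The clause representation (lists of `(positive, lhs, rhs)` literals over `(head, args)` spine terms), the fresh-name scheme, and the omission of the exempted literal forms are all simplifications of ours:

```python
# Sketch of intensional purification: a variable that occurs both
# applied and unapplied has its unapplied occurrences renamed to a
# fresh variable, and a purification literal linking the two is added.
# The side condition exempting literals of a special form, and the
# repeated-choice loop, are omitted for brevity.

def is_var(h):
    return h[:1].isupper()

def heads(t):
    h, args = t
    yield h, bool(args)                 # (head, occurs applied?)
    for a in args:
        yield from heads(a)

def rename_unapplied(x, x2, t):
    h, args = t
    args = tuple(rename_unapplied(x, x2, a) for a in args)
    return (x2, ()) if h == x and not args else (h, args)

def purify(clause):
    purified = list(clause)
    occ = [ha for (_, l, r) in purified for t in (l, r) for ha in heads(t)]
    applied = {h for h, ap in occ if is_var(h) and ap}
    unapplied = {h for h, ap in occ if is_var(h) and not ap}
    for i, x in enumerate(sorted(applied & unapplied)):
        x2 = f"{x}_p{i}"                          # fresh variable
        purified = [(s, rename_unapplied(x, x2, l),
                        rename_unapplied(x, x2, r))
                    for (s, l, r) in purified]
        purified.append((True, (x2, ()), (x, ())))  # purification literal
    return purified

# F occurs applied in F a and unapplied in g F: the unapplied
# occurrence is renamed and a literal F_p0 ~ F is added
clause = [(True, ("F", (("a", ()),)), ("g", (("F", ()),)))]
result = purify(clause)
```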

In the extensional purifying calculus, is defined as follows. Choose a variable occurring in green subterms and in literals of that are not of the form , where and are distinct (possibly empty) term tuples. If no such variable exists, terminate. Otherwise, replace all green subterms with , where is fresh, and add the purification literal . Then repeat the procedure until no variable fulfilling the requirements is left. For example,

Like the extensional nonpurifying calculus, this calculus variant also contains the PosExt rule and axiom (Ext) introduced above. The variable condition is “either has a non-variable head or does not jell with the literal .” The conclusion of each rule is changed to , except for PosExt, which preserves purity.

Finally, we impose further restrictions on literal selection. In the nonpurifying variants, a literal may not be selected if is a maximal term of the clause and the literal contains a green subterm with . In the purifying calculi, a literal may not be selected if it contains a variable of functional type. These restrictions are needed for our completeness proof. It might be possible to avoid them at the cost of a more elaborate argument.

###### Remark .

In descriptions of first-order logic with equality, the property is often referred to as “function congruence.” It seems natural to use the same label for the higher-order version and to call the companion property “argument congruence,” whence the name ArgCong for our inference rule. This nomenclature is far from universal; for example, the Isabelle/HOL theorem fun_cong captures argument congruence and arg_cong captures function congruence.

### 3.2. Rationale for the Inference Rules

A key restriction of all four calculi is that they superpose only at
green subterms, mirroring the term order’s compatibility with green contexts. The
ArgCong rule then makes it possible to simulate superposition at
non-green subterms. However, in conjunction with the Sup rules,
ArgCong can exhibit an unpleasant behavior, which we call
*argument congruence explosion*:

In both derivation trees, the higher-order variable is effectively the target of a Sup inference. Such derivations essentially amount to superposition at variable positions (as shown on the left) or even below variable positions (as shown on the right), both of which can be extremely prolific. In standard superposition, the explosion is averted by the Sup rule’s condition that the unified subterm is not a variable. In the extensional purifying calculus, the variable condition tests that either has a non-variable head or does not jell with the literal, which prevents derivations such as the ones above. In the corresponding nonpurifying variant, some such derivations may need to be performed when the term order exhibits nonmonotonicity for the terms of interest.

In the intensional calculi, the explosion can arise because the variable conditions are weaker. The following example shows that the intensional nonpurifying calculus would be incomplete if we used the variable condition of the extensional nonpurifying calculus.

###### Example .

Consider a left-to-right LPO [20] instance with precedence , and consider the following unsatisfiable clause set:

The only possible inference is a Sup inference of the first into the second clause, but the variable condition of the extensional nonpurifying calculus is not met.

It is unclear whether the variable condition of the intensional purifying calculus could be strengthened, but our completeness proof suggests that it cannot.

The variable conditions in the extensional calculi are designed to prevent the argument congruence explosion shown above, but since they consider only the shape of the clauses, they might also block Sup inferences whose side premises do not originate from ArgCong. This is why we need the PosExt rule.

###### Example .

In the following unsatisfiable clause set, the only possible inference from these clauses in the extensional nonpurifying calculus is PosExt, showing its necessity:

The same argument applies for the purifying variant with the difference that the third clause must be purified.

Due to nonmonotonicity, for refutational completeness we need either to purify the clauses or to allow some superposition at variable positions, as mandated by the respective variable conditions. Without either of these measures, at least the extensional calculi and presumably also the intensional calculi would be incomplete, as the next example demonstrates.

###### Example .

Consider the following clause set:

Using a left-to-right LPO [20] instance with precedence , this clause set is saturated w.r.t. the extensional purifying calculus when omitting purification. It also quickly saturates using the extensional nonpurifying calculus when omitting Sup inferences at variables. By contrast, the intensional variants derive , even without purification and without Sup inferences at variables, because of the less restrictive variable conditions.

This raises the question as to whether the intensional variants actually need to purify or to perform Sup inferences at variables. Omitting purification and Sup inferences at variables in the intensional calculi is complete when redundant clauses are kept, but we conjecture that it is incomplete in general.

We initially considered inference rules instead of axiom (Ext). However, we did not find a set of inference rules that is complete and leads to fewer inferences than (Ext). We considered the PosExt rule described above in combination with the following rule:

where is a fresh Skolem symbol and and are the type and term variables occurring free in the literal . However, these two rules do not suffice for a refutationally complete calculus, as the following example demonstrates:

###### Example .

Consider the clause set

Assuming that all four equations are oriented from left to right, this set is saturated w.r.t. the extensional calculi if (Ext) is replaced by NegExt; yet it is unsatisfiable in an extensional logic.

###### Example .

A significant advantage of our calculi over the use of standard superposition on applicatively encoded problems is the flexibility they offer in orienting equations. The following equations provide two definitions of addition on Peano numbers:

Let be the negated conjecture. With LPO, we can use a left-to-right comparison for ’s arguments and a right-to-left comparison for ’s arguments to orient all four equations from left to right. Then the negated conjecture can be simplified to by simplification (demodulation), and can be derived with a single inference. If we use the applicative encoding instead, there is no instance of LPO or KBO that can orient both recursive equations from left to right. For at least one of the two sides of the negated conjecture, simplification is replaced by 100 Sup inferences, which is much less efficient, especially in the presence of additional axioms.
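The equations themselves are not reproduced above. For orientation, one plausible instance of two such definitions, recursing on the first and on the second argument respectively (our reconstruction, with hypothetical symbols $\mathsf{add}$, $\mathsf{add}'$, $\mathsf{s}$, $\mathsf{0}$), is:

```latex
\begin{align*}
\mathsf{add}\;\mathsf{0}\;y &\approx y &
\mathsf{add}'\;x\;\mathsf{0} &\approx x \\
\mathsf{add}\;(\mathsf{s}\;x)\;y &\approx \mathsf{s}\;(\mathsf{add}\;x\;y) &
\mathsf{add}'\;x\;(\mathsf{s}\;y) &\approx \mathsf{s}\;(\mathsf{add}'\;x\;y)
\end{align*}
```

An LPO that compares $\mathsf{add}$’s arguments left to right and $\mathsf{add}'$’s arguments right to left orients all four equations from left to right, enabling the demodulation described above.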

### 3.3. Soundness

To show the inferences’ soundness, we need the substitution lemma for our logic:

###### Lemma (Substitution lemma).

Let be a λ-free higher-order interpretation. Then

for all terms , all types , and all substitutions , where for all type variables and for all term variables .

###### Proof.

First, we prove that by induction on the structure of . If is a type variable,

If for some type constructor and types ,

Next, we prove by structural induction on . If , then by the definition of the denotation of a variable

If , then by the definition of the term denotation

If , then by the definition of the term denotation

where is of type , , and . ∎

###### Lemma .

If for some interpretation and some clause , then for all substitutions .

###### Proof.

We need to show that is true in for all valuations . Given a valuation , define as in Lemma 3.3. Then, by Lemma 3.3, a literal in is true in for if and only if the corresponding literal in is true in for . There must be at least one such literal because and hence is in particular true in for . Therefore, is true in for . ∎

###### Theorem (Soundness of the intensional calculi).

The inference rules Sup, EqRes, EqFact, and ArgCong are sound (even without the variable condition and the side conditions on order and eligibility).

###### Proof.

We fix an inference and an interpretation that is a model of the premises. We need to show that it is also a model of the conclusion.

From the definition of the denotation of a term, it is obvious that congruence holds at all subterms in our logic. By Lemma 3.3, is a model of the -instances of the premises as well, where is the substitution used for the inference. Fix a valuation . By making case distinctions on the truth in under of the literals of the -instances of the premises, using the conditions that is a unifier, and applying congruence, it follows that the conclusion is also true in under . ∎

###### Theorem (Soundness of the extensional calculi).

The inference rules Sup, EqRes, EqFact, ArgCong, and PosExt are sound w.r.t. extensional interpretations (even without the variable condition and the side conditions on order and eligibility).

###### Proof.

We only need to prove soundness of PosExt. For the other rules, we can proceed as in Theorem 3.3. By induction on the length of , it suffices to prove soundness of PosExt for one variable instead of a tuple . We fix an inference and an extensional interpretation that is a model of the premise . We need to show that it is also a model of the conclusion .

Let be a valuation. If is true in under , the conclusion is clearly true as well. Otherwise is false in under , and also under for all because does not occur in . Since the premise is true in , must be true in under for all . Hence, for appropriate universes , we have . Since and do not contain , and do not depend on . Thus, . Since is extensional, is injective and hence . It follows that is true in under , and so is the entire conclusion of the inference. ∎

A problem expressed in higher-order logic must be transformed into clausal normal form before the calculi can be applied. This process works as in the first-order case, except for skolemization. The issue is that skolemization, when performed naively, is unsound for higher-order logic with a Henkin semantics [48, Section 6], because it introduces new functions that can be used to instantiate variables.

The core of this article is not affected by this because the problems are given in clausal form. For the implementation, we claim soundness only w.r.t. models that satisfy the axiom of choice, which is the semantics mandated by the TPTP THF format [62]. By contrast, refutational completeness holds w.r.t. arbitrary models as defined above. Alternatively, skolemization can be made sound by introducing mandatory arguments as described by Miller [48, Section 6] and in the conference version of this article [11].

This issue also affects axiom (Ext) because it contains the Skolem symbol . As a consequence, (Ext) does not hold in all extensional interpretations. The extensional calculi are thus only sound w.r.t. interpretations in which (Ext) holds. However, we can prove that (Ext) is compatible with our logic:

###### Theorem .

Axiom (Ext) is satisfiable.

###### Proof.

For a given signature, let be an Herbrand interpretation. That is, we define to contain the set of all terms of type for each ground type , we define by , we define by , and we define by . Then is clearly injective and hence is extensional. To show that , we need to show that (Ext) is true under all valuations. Let be a valuation. If is true under , (Ext) is also true. Otherwise is false, and hence . Then we have . Therefore,