
GRUNGE: A Grand Unified ATP Challenge

by   Chad E. Brown, et al.

This paper describes a large set of related theorem proving problems obtained by translating theorems from the HOL4 standard library into multiple logical formalisms. The formalisms are in higher-order logic (with and without type variables) and first-order logic (possibly with multiple types, and possibly with type variables). The resultant problem sets allow us to run automated theorem provers that support different logical formats on corresponding problems, and compare their performances. This also results in a new "grand unified" large theory benchmark that emulates the ITP/ATP hammer setting, where systems and metasystems can use multiple ATP formalisms in complementary ways, and jointly learn from the accumulated knowledge.





1 Introduction

A hammer [7] for an interactive theorem prover (ITP) [23] typically translates an ITP goal into a formalism used by an Automated Theorem Prover (ATP). Since the most successful ATPs have so far been first-order, the focus has been on first-order translations. There is also interest in ATPs working in richer formalisms, such as monomorphic and polymorphic, typed first-order and higher-order logics. The TPTP formats have been adopted for this work, viz. TF0 [47], TF1 [8], TH0 [5], and TH1 [27]. An interesting related task is to create a (grand) unified large-theory benchmark that will allow fair comparison of such ATP systems, their combination and integration with premise selectors and machine learners [1] across the different formalisms. As a step towards creating such benchmarks we present two families of translations from the language of HOL4 [42] to the various TPTP formats. We have implemented these translations and plan to use them as the first (grand) unified benchmarks, generalizing existing benchmarks such as the CakeML export [33] that was used in the LTB division of the CASC-J9 ATP competition [48].

The rest of the paper is structured as follows. Section 2 introduces notation and the HOL syntax. Section 3 introduces the problems – the HOL4 standard library. Section 4 introduces the first family of translations, and Section 5 introduces the second family of translations. Section 6 discusses and compares the translations on an example, and Section 7 evaluates the translations using existing ATPs. Section 8 describes the updated CASC LTB division setup. Related work is discussed in Section 9.

2 Preliminaries

Since our work is based on the HOL4 standard library, it is necessary to start with brief comments about the syntax and notion of proof in HOL4. More detailed information is in [42, 20]. HOL4, like several other ITPs (e.g., Isabelle/HOL [37], HOL Light [21], ProofPower [29]), is based on an extension of Church’s simple type theory [13] that includes prefix polymorphism and type definitions [20]. HOL4 includes a type of propositions, a type of individuals and a type of functions from a type to a type . Parentheses are omitted, with → associating to the right. In addition, there are type variables and defined types. At each point in the development of the HOL4 library there is a finite set of (previously defined) constants , and a finite set of (previously defined) type constructors giving a type for types . For simplicity we consider the signature to be fixed. To be more precise, we would need to speak of the set of types and terms relative to the current signature.

Terms are generated from constants and variables using application and λ-abstractions in the expected way, for terms and . Parentheses are omitted, with application associating to the left. Binders have scope as far to the right as possible, consistent with parentheses. Multiple binders over the same type can be written in a combined form, e.g., means .

Constants may be polymorphic. There are two primitive polymorphic logical constants: is polymorphic with type and is polymorphic with type , where is a type variable. When terms are defined, such constants are used with a fixed type for written as a superscript. New polymorphic constants can be defined within a HOL4 theory.

Aside from and , implication of type is primitive. From these primitive logical constants it is possible to define , and as well as polymorphic operators and . The usual notation is used for these logical connectives, so that the binder notation is written as , using the same binding conventions as for λ-abstractions. Similarly, is written as .

Terms of type are called propositions and we use and to range over propositions. A sequent is a pair where is a finite set of propositions and is a proposition. There is a notion of HOL4 provability for sequents. While our translations map HOL4 sequents to TPTP formulas, it is not our intention to mirror HOL4 provability in the target format. The intention, roughly speaking, is to gain information about when a HOL4 theorem is a consequence of previous HOL4 theorems, in some logic weaker than HOL4. Further discussion is beyond the scope of this paper.

In the simplest case, each translation will translate HOL4 types and HOL4 terms (including propositions) to terms in the target format. A type, term or sequent with no type variables is called monomorphic. As an optimization, some of the translations will translate some monomorphic HOL4 types to types in the target language. To have common notation, for a HOL4 type we write for translated as a term and for translated as a type. Another optimization is to translate HOL4 propositions (and sequents) to the level of formulas in the target language.

3 Problem Set: The HOL4 Standard Library

The current HOL4 standard library contains 15733 formulas: 8 axioms, 2294 definitions, and 13431 theorems. If most of the formulas were monomorphic and fell into a natural first-order fragment of HOL4, then there would be a natural translation into the FOF format. However, many formulas are either polymorphic or higher-order (or both), as Table 1 shows. Note that the numbers are not cumulative: e.g., the 2232 monomorphic first-order formulas do not include the 1632 uni-sorted first-order formulas, which could also be processed by an ATP that can handle the monomorphic types. The problem set consists of 12140 theorems proven in the HOL4 standard library, each in the context of a finite set of dependencies used in the HOL4 proof [17]. The remaining 1291 theorems were not included because their dependencies were erased during the build of the HOL4 library.

              First-order   Higher-order   Combined
Uni-sorted    1632 (FOF)    0              1632
Monomorphic   2232 (TF0)    3683 (TH0)     5915
Polymorphic   1536 (TF1)    6650 (TH1)     8186
Combined      5400          10333          15733
Table 1: Number of HOL4 formulas in each category.

4 The First Family of Translations

There already exist many families of translations for HOL to the TPTP format, usually developed for hammers [28, 36]. We have adopted and adapted them: (i) We have made the translations more local, by making them independent for each formula, i.e., unaffected by other formulae in the problem. (ii) We have made more problems provable (in principle) by introducing additional axioms and relying on an embedding of polymorphic types instead of relying on heuristic monomorphization. (iii) For the TH0 and TF0 formats, we make use of their polysorted nature by expressing the type of monomorphic constants directly with the available TF0 and TH0 types.

The translations are described in the order TH1, TF1, FOF, TF0, TH0, as translations to the later formats take advantage of translation techniques used for earlier formats.

4.1 Translating to TH1

The TH1 format is a language that is strictly more expressive than HOL4. Therefore HOL4 formulas can be represented in TH1 with minimal effort. This produces the TH1-I collection of ATP problems.

Alignment of Logical Constructions.

The TPTP format contains a set of reserved objects that have implicit definitions. Thus, the HOL4 objects are mapped to their TPTP counterparts in a natural way. The boolean type of HOL4 is mapped to the defined TPTP type $o. The arrow type operator is mapped to the TPTP arrow >. All other type operators are declared to take types and give a type, using the TPTP “type of types” $tType. For example, the type operator has type $tType > $tType.
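The alignment described above can be illustrated with a minimal Python sketch. The term representation (`TyVar`, `TyApp`), the function names, and the declaration naming scheme are assumptions made for illustration, not the authors' exporter; the sketch also assumes the THF convention of applying type constructors with `@`.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class TyVar:
    name: str  # type variable, e.g. "A" (upper case, as TPTP variables are)


@dataclass(frozen=True)
class TyApp:
    op: str  # HOL4 type operator, e.g. "bool", "fun", "list"
    args: Tuple["Ty", ...] = ()


Ty = Union[TyVar, TyApp]


def th1_type(ty: Ty) -> str:
    """Render a HOL4 type as a TH1 type expression (illustrative only)."""
    if isinstance(ty, TyVar):
        return ty.name
    if ty.op == "bool" and not ty.args:
        return "$o"  # boolean type maps to the defined TPTP type
    if ty.op == "fun" and len(ty.args) == 2:
        dom, cod = ty.args
        return f"({th1_type(dom)} > {th1_type(cod)})"  # arrow maps to >
    if not ty.args:
        return ty.op
    # every other type operator is applied to its type arguments
    return "(" + " @ ".join([ty.op] + [th1_type(a) for a in ty.args]) + ")"


def th1_type_operator_decl(op: str, arity: int) -> str:
    """Declare a type operator that takes `arity` types and gives a type."""
    kind = " > ".join(["$tType"] * (arity + 1))
    return f"thf({op}_type, type, {op}: {kind})."


print(th1_type(TyApp("fun", (TyApp("list", (TyVar("A"),)), TyApp("bool")))))
print(th1_type_operator_decl("list", 1))
```

A unary operator such as the hypothetical `list` is declared with kind `$tType > $tType`, matching the example in the text.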

The TPTP logical connectives are used at the top level of the translated formula whenever possible, but the corresponding HOL4 constants are used when necessary. Equivalences relating HOL4 logical constants to TPTP connectives are included.

Explicit type arguments.

In HOL4, a constant carries a type . This type is an instance of type that was given to when was created. By matching with , a type substitution can be inferred. Ordering the type variables in the domain of , the implicit type arguments of are deduced. Making the quantification of type variables and the type arguments explicit is required in the TH1 format. The effect that this requirement has on a constant declaration and a formula is shown in Example 1.

Example 1

(Explicit type arguments and type quantifications)

Type of
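The inference of implicit type arguments by matching can be sketched as follows. This is a minimal illustration under stated assumptions: type variables are plain strings, type applications are tuples `(operator, arg1, ...)`, type variables are ordered alphabetically, and the constant `map` with the type shown is a hypothetical example rather than one taken from the paper.

```python
def match_type(general, instance, subst=None):
    """Match the declared (general) type against an instance type.

    Returns a dict sending each type variable of `general` to a type.
    Raises ValueError if `instance` is not an instance of `general`.
    """
    subst = {} if subst is None else subst
    if isinstance(general, str):  # a type variable
        if general in subst and subst[general] != instance:
            raise ValueError(f"conflicting bindings for {general}")
        subst[general] = instance
        return subst
    if (isinstance(instance, str)
            or general[0] != instance[0]
            or len(general) != len(instance)):
        raise ValueError("instance does not match the declared type")
    for g, i in zip(general[1:], instance[1:]):
        match_type(g, i, subst)
    return subst


def type_arguments(general, instance):
    """Explicit type arguments, ordered by type variable name (an assumption)."""
    subst = match_type(general, instance)
    return [subst[v] for v in sorted(subst)]


# Hypothetical constant: map : (A -> B) -> list A -> list B
map_general = ("fun", ("fun", "A", "B"), ("fun", ("list", "A"), ("list", "B")))
map_instance = ("fun", ("fun", ("num",), ("bool",)),
                ("fun", ("list", ("num",)), ("list", ("bool",))))
print(type_arguments(map_general, map_instance))
```

In a TH1 export the two recovered types would then be passed to the constant as explicit leading arguments.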

4.2 Translating to TF1

To create the TF1-I collection of ATP problems all the higher-order features of the HOL4 problems have to be eliminated. We do this in a sequence of steps.

Lambda-lifting and Boolean-lifting.

One of the higher-order features is the presence of lambda-abstraction, so the translation starts by rewriting the formula using lambda-lifting [15, 36]. Before applying lambda-lifting, the translation is optimized by finding other ways to rewrite lambda-abstractions. The extensionality property is used to add extra arguments to lambdas appearing on either side of an equality, and then beta-reduce the formula. The lambda-lifting then creates a constant for the leftmost outermost lambda-abstractions appearing at the term-level. This constant replaces the lambda-abstraction in the formula. A definition is given for , which may involve some variable capture (see Example 2). This procedure is repeated until all the atoms are free of lambda-abstractions. The definitions are part of the background theory and are considered to be axioms in the TF1 problem even if they were created from the conjecture.

Example 2

(Lambda-lifting with variable capture)

Original formula:
New formula:
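Lambda-lifting with variable capture can be sketched in a few lines of Python. The tuple-based term representation and the fresh names `f1`, `f2`, ... are assumptions for illustration; the sketch ignores types and the extensionality-based preprocessing, and assumes the fresh names do not clash with existing constants.

```python
from itertools import count


def free_vars(t):
    """Free variables of a term, in order of first occurrence."""
    tag = t[0]
    if tag == "var":
        return [t[1]]
    if tag == "const":
        return []
    if tag == "app":
        left = free_vars(t[1])
        return left + [v for v in free_vars(t[2]) if v not in left]
    return [v for v in free_vars(t[2]) if v != t[1]]  # ("lam", x, body)


def lambda_lift(term):
    """Replace every abstraction by a fresh constant applied to its free variables.

    Returns the lambda-free term and a list of definitions
    (name, parameters, body), read as: name(parameters) = body.
    """
    defs = []
    fresh = count(1)

    def go(t):
        tag = t[0]
        if tag in ("var", "const"):
            return t
        if tag == "app":
            return ("app", go(t[1]), go(t[2]))
        _, x, body = t
        name = f"f{next(fresh)}"       # outermost abstraction is named first
        captured = free_vars(t)        # variables captured by the abstraction
        slot = len(defs)
        defs.append(None)              # reserve the slot to keep definition order
        defs[slot] = (name, captured + [x], go(body))
        lifted = ("const", name)
        for v in captured:
            lifted = ("app", lifted, ("var", v))
        return lifted

    return go(term), defs


# P (\x. g x y): the abstraction captures the free variable y
example = ("app", ("const", "P"),
           ("lam", "x", ("app", ("app", ("const", "g"), ("var", "x")), ("var", "y"))))
print(lambda_lift(example))
```

Each definition would be added to the problem as an axiom, universally quantified over its parameters.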

A similar method can be applied to move some logical constants from the term-level to the formula-level - see Example 3. (This optimization is not applied in our second family of translations.)

Example 3


Original formula:
New formula:

To allow the ATPs to create their own encoded lambda-abstractions, axioms for the combinators S, K and I are added to every problem. (These axioms are omitted in our second family of translations. In the TH0-II versions of the problem, combinators are not needed since all simply typed λ-calculus terms are already representable. In the TF0-II and FOF-II versions only an axiom for I and a partially applied axiom for K are included. Combinator axioms often hinder the proof search and they are not needed for proving most problems.)

Apply operator and Arity equations.

As functions cannot be passed as arguments in first-order logic, an explicit apply operator is used to apply a function to an argument. This way, all objects (constants and variables) have arity zero except for the apply operator which has arity two. The HOL4 functional extensionality axiom is added to all problems, as it expresses the main property of the apply operator:

This axiom also demonstrates how the higher-order variables and become first-order variables after the introduction of .

To limit the number of apply operators in a formula, versions of each constant are defined for all its possible arities, in terms of the zero-arity version. These constants are used in the translated formula - see Example 4.

Example 4

(Using constants with their arity)

Original formula:
Arity equations:
New formula:

If the return type of a constant is a type variable then some of its instances can expect an arbitrarily large number of arguments. In the case where the number of arguments of an instance exceeds the number of arguments of the primitive constant, variants of this constant are not created for this arity. Instead, the apply operator is used to reduce the number of arguments to . For example, the term is not translated to but instead to .
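The interplay between arity-specific constants and the apply operator can be sketched as follows. The naming scheme `c_n` for the arity-`n` version of a constant `c`, the operator name `app`, and the example constants are assumptions for illustration; types and quantifiers are omitted.

```python
def spine(t):
    """Split a curried application into its head and list of arguments."""
    args = []
    while t[0] == "app":
        args.append(t[2])
        t = t[1]
    return t, args[::-1]


def to_first_order(t, arity):
    """Translate a lambda-free term to a first-order term string.

    `arity` maps each constant to the number of arguments of its
    primitive (most general) version.
    """
    head, args = spine(t)
    fo_args = [to_first_order(a, arity) for a in args]
    if head[0] == "const":
        n = min(len(fo_args), arity.get(head[1], 0))
        name = f"{head[1]}_{n}"
        result = name if n == 0 else f"{name}({', '.join(fo_args[:n])})"
        rest = fo_args[n:]           # surplus arguments go through app
    else:
        result = head[1].upper()     # TPTP variables start with an upper-case letter
        rest = fo_args
    for a in rest:
        result = f"app({result}, {a})"
    return result


def arity_equations(name, max_arity):
    """Equations defining each arity version in terms of the zero-arity one."""
    equations = []
    for n in range(1, max_arity + 1):
        xs = [f"X{i}" for i in range(1, n + 1)]
        rhs = f"{name}_0"
        for x in xs:
            rhs = f"app({rhs}, {x})"
        equations.append(f"{name}_{n}({', '.join(xs)}) = {rhs}")
    return equations


# A constant i of primitive arity 1 applied to two arguments: i i x
term = ("app", ("app", ("const", "i"), ("const", "i")), ("var", "x"))
print(to_first_order(term, {"i": 1}))
print(arity_equations("plus", 2))
```

When more arguments are supplied than the primitive constant accepts, the sketch falls back to `app` for the surplus, as the paragraph above describes.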

TF1 types.

As a final step, type arguments and type quantifications are added as in Section 4.1. Moreover, the boolean type of HOL4 is replaced by $o at the formula-level, and by at the term-level (because $o is not allowed at the term-level in a first-order formula). This causes a mismatch between the type of the atom and the type of the logical connective $o. Therefore an additional operator is applied on top of every atom. The following properties of and are added to every translated problem (written here in the first-order style for function application):

In a similar manner, the TPTP arrow type cannot be used whenever a function appears as an argument. In this case, the type constructor is used, as illustrated by the following constant declaration:

4.3 Translating to FOF

The translation to FOF (which produces the FOF-I collection of ATP problems) follows exactly the same path as the translation to TF1 except that the types are encoded as first-order terms. To represent the fact that a first-order term has type , the tagging function introduced by Hurd [24] is used: every term of type is replaced by . Going from the type to the term effectively transforms type variables into term variables, and type operators into first-order functions and constants. Type arguments are not necessary anymore as the tags contain enough information. In practice, the tagging function prevents terms of different types from unifying, and allows instantiation of type variables - see Example 5.

Example 5

(type instantiation)
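The effect of type tags on unification can be illustrated with a small syntactic unifier. This is a sketch under stated assumptions: the tag functor is named `s`, variables are strings beginning with an upper-case letter, compound terms are tuples `(functor, arg1, ...)`, the example symbols are hypothetical, and no occurs check is performed.

```python
def is_var(t):
    return isinstance(t, str) and t[:1].isupper()


def walk(t, subst):
    """Follow variable bindings until a non-variable or unbound variable."""
    while is_var(t) and t in subst:
        t = subst[t]
    return t


def unify(a, b, subst=None):
    """Return a unifying substitution, or None if the terms do not unify."""
    subst = {} if subst is None else subst
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if is_var(a):
        return {**subst, a: b}
    if is_var(b):
        return {**subst, b: a}
    if isinstance(a, str) or isinstance(b, str):
        return None  # distinct constants, or constant against compound term
    if a[0] != b[0] or len(a) != len(b):
        return None
    for x, y in zip(a[1:], b[1:]):
        subst = unify(x, y, subst)
        if subst is None:
            return None
    return subst


def tag(ty, term):
    """Wrap a term with its type encoded as a first-order term."""
    return ("s", ty, term)


# Untagged, a variable unifies with a term of any type
print(unify("X", "zero"))
# Tagged, terms of different types no longer unify
print(unify(tag("bool", "X"), tag("num", "zero")))
# A type variable inside a tag is instantiated like any other variable
print(unify(tag(("list", "A"), "nil"), tag(("list", "num"), "nil")))
```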


4.4 Translating to TF0

An easy way to translate HOL4 formulas to TF0 (to produce the TF0-I collection of ATP problems) is to take the translation to FOF and inject it into TF0.

Trivial Injection from FOF to TF0.

The first step is to give types to all the constants and variables appearing in the FOF formula. A naive implementation would be to give the type to symbols with arity . However, since it is known that the first argument comes from the universe of non-empty types, and the second argument comes from the universe of untyped terms, an explicit distinction can be made. The type of is defined to be , with being the universe of non-empty types, being the universe of untyped terms, and being the universe of typed terms. After this translation a type operator (or type variable) with arguments has type and a function (or term variable) with arguments has type . The type of is . Declaring the type of all these objects achieves a trivial translation from FOF with tags to TF0.

Using Special Types.

To take full advantage of the poly-sorted nature of TF0, a constant is declared for every constant , with arity and monomorphic type . The type of is declared to be , where , and are basic types. A basic type constructs a single type from a monomorphic type, e.g., for , for . The basic types are our special types and are declared using $tType. Thanks to these new constants, monomorphic formulas can be expressed in a natural way, without type encodings in the formula. Nevertheless, an ATP should still be able to perform a type instantiation if necessary. That is why we relate the monomorphic representation with its tagged counterpart.

If a term has a basic type then it lives in the monomorphic world, whereas as a term of type it belongs to the tagged world. All monomorphic terms (constructed from monomorphic variables and constants) can be expressed in the monomorphic world. To relate the two representations of the same HOL4 term, an “injection” and a “surjection” are defined for each basic type . The constants and are expected to respect the following properties, which are included as axioms in the translated problems:

Whenever is an instance of a polymorphic function , the following equation is included in the TF0 problem, which relates the two representatives:

Effect on defined operators.

The operator is treated in the same way as every other constant. In particular, a different version of is created for each monomorphic type. The type of becomes , and the projection is used to transfer atoms from the tagged world to the monomorphic world.

If the presence of the predicate and the inclusion of additional equations are ignored, our translation of a HOL4 first-order monomorphic formula using special types to TF0 is simply the identity transformation.

4.5 Translating to TH0

Translating from HOL4 to TH0 (to produce the TH0-I collection of ATP problems) is achieved in a way similar to the translation to TF0. The HOL4 formulas are first translated to FOF, and then trivially injected into TH0. Special types are used for basic types extracted from monomorphic types. The set of higher-order basic types is slightly different from the first-order ones, where we recursively remove arrow types until a non-arrow constructor is found. In the higher-order setting a single monomorphic constant can be used to replace all arity versions of : . Another benefit of the expressivity of TH0 is that the basic type can be replaced by , and the predicate can be omitted. The effect of the previous steps is illustrated in Example 6.

Example 6

(Translations of )
In this example has type where is the special type corresponding to .


In order to have the same shallowness result in TH0 as for TF0, it would be necessary to replace monomorphic constants created by the lifting procedure by their lambda-abstractions. We chose to keep the definitions for the lifted constants as they allow some term-level logical operators to be pushed to the formula level.

5 The Second Family of Translations

The second family of translations into TH0, TF0, and FOF is semantically motivated: we make use of constructors known to be definable in set theory. Types and terms are translated to sets, where types must translate to non-empty sets. The translation may optionally use other special types for monomorphic types in the HOL4 source. In the TH0 case the builtin type $o can be used for the HOL4 type . In the first-order cases HOL4 terms of type are sometimes translated to terms, and sometimes to formulas, depending on how the HOL4 term is used. In the TF0 case a separate type of booleans is declared, which is used as the type of terms translated from HOL4 terms of type . In the FOF case this approach is not possible, as all terms have the same type (intuitively representing sets). The other main difference between the translation to TH0 and the translations to the first-order languages is that the first-order translations make use of lambda lifting [15, 36]. As a result of the translations we obtain three new collections of ATP problems: TH0-II, TF0-II and FOF-II.

5.1 Translating to TH0

The base types (for propositions, written as $o in TH0) and (for individuals, written as $i in TH0) are built into TH0. In addition a base type is declared. The translation treats elements of type as sets, and elements of type as non-empty sets. The basic constants used in the ATP problems are as follows:

  • is used for a fixed two element set.

  • is used for a fixed non-empty set corresponding to HOL4’s type of individuals.

  • is used to construct the function space of two sets.

  • corresponds to the membership relation on sets, where the second set is known to be non-empty. We will write for i The term is written as , and the term is written as .

  • corresponds to set theory level application. (represented as a set).

  • is used to build set-bounded λ-abstractions as sets.

  • is a predicate that indicates whether an element of is true.

  • is an injection of into , essentially translating false to a set and true to a different set.

The basic axioms included in each ATP problem are:














If is interpreted using a model of ZFC and using a copy of the non-empty sets in this model, then the constants above can be interpreted in an obvious way so as to make the basic axioms true.

Given this theory, a basic translation from HOL4 to TH0 can be informally described as follows. Each HOL4 type (including type variables) is mapped to a term of type . HOL4 type variables (constants) are mapped to TH0 variables (constants) of type . For the remaining cases , , and are used. Each HOL4 term is mapped to a TH0 term of type , for which the context is always known. The invariant can be maintained by including the hypothesis whenever is a variable or a constant. The and constants are used to handle HOL4 applications and λ-abstractions. The axioms and ensure the invariant is maintained. Finally HOL4 propositions (which may quantify over type variables) are translated to TH0 propositions in an obvious way, using to go from to and to go from to when necessary. As an added heuristic, the translation makes use of TH0 connectives and quantifiers as deeply as possible, delaying the use of when possible.

Using Special Types.

As with the first family of translations, the second family optimizes by using special types for HOL4 types with no type variables, e.g., and . Unlike the first family, special types are not used for monomorphic function types. As a result, it is not necessary to consider alternative operators. A basic monomorphic type is a monomorphic type which is not of the form . If special types are used, then for each basic monomorphic type occurring in a proposition a corresponding TH0 type is declared, mappings and axioms relating to the type of sets are declared, and the type is used to translate terms of the type and quantifiers over the type when possible. For example, if a basic monomorphic type (e.g., ) occurs in a HOL4 proposition, then in addition to translating as a term we also declare a TH0 type , and along with axioms and .

One obvious basic monomorphic type is . In the case of we do not declare a new type, but use the TH0 type $o. That is, denotes $o. Note that is already declared. Additionally, is used as shorthand for , which has the desired type .

Suppose a HOL4 constant has type , where are basic monomorphic types with corresponding TH0 types . Instead of translating a term simply as a term of type , each is translated to a term of type , and a first-order constant is used to translate to the term of type . In such a case an equation relating to is also included. Since the translation may return a term of type or , where is a basic monomorphic type, and are used to obtain a term of type or when one is required. If a quantifier ranges over a monomorphic type , a quantifier over type is used, instead of using a quantifier over type and using to guard the quantifier.

5.2 Translating to TF0

There are two main modifications to the translation to TH0 when targeting TF0. Firstly, propositions cannot be treated as special kinds of terms in TF0. In order to deal with this, is treated like other special types by declaring a new type and functions and along with corresponding axioms as above. Note that unlike the TH0 case, differs from . In TF0 is a unary predicate on , and is a function from to . In the TF0 version of the axioms and , is replaced with . Secondly, the background theory cannot include the higher-order operator. Therefore the operator is omitted, and lambda lifting is used to translate (most) HOL4 λ-abstractions. The two higher-order axioms and are also omitted.

In the TH0 case, the background axioms are enough to infer the following (internal) propositional extensionality principle

from the corresponding extensionality principle valid in TH0. This is no longer the case in TF0, so propositional extensionality is added as an axiom.

There are two special cases where lambda lifting can be avoided: identity and constant functions. For this purpose a new unary function on sets and a new binary function on sets are added. Two new basic axioms are added to the ATP problem for these functions:





A HOL4 term is translated as . For a HOL4 term , where is not free in , is translated to a first-order term of type , and the -term is translated to . If there is already a function defined for (with the same variable names), then that function is reused. Otherwise, lambda lifting of proceeds as follows. Let be type variables occurring in and be the free variables occurring in . Assume the translation of as a first-order term with of type corresponding to the variable . (Note that this may have involved some lambda lifting.) Let be a new -ary function returning sets. If special types are not being used, then each argument of is a set. If special types are used, then each argument is a set unless it corresponds to , where is a monomorphic type in which case the argument has type . The following axioms about are added to the ATP problem:





In these axioms the preconditions that each must be in if has type have been elided (otherwise special types are being used, is monomorphic, has type and no guard is required).
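The case distinction for abstractions in the first-order translations of the second family can be sketched as follows. The term representation, the labels `identity`, `constant` and `lift`, and the generated names `lam1`, `lam2`, ... are assumptions for illustration; the axioms, guards, and special types are not modelled.

```python
def free_vars(t):
    """Set of free variables of a term."""
    tag = t[0]
    if tag == "var":
        return {t[1]}
    if tag == "const":
        return set()
    if tag == "app":
        return free_vars(t[1]) | free_vars(t[2])
    return free_vars(t[2]) - {t[1]}  # ("lam", x, body)


def classify(var, body):
    """Decide how the abstraction binding `var` over `body` is handled."""
    if body == ("var", var):
        return "identity"   # handled by the unary identity function
    if var not in free_vars(body):
        return "constant"   # handled by the binary constant-function symbol
    return "lift"           # needs a lambda-lifted function


class Lifter:
    """Reuses a lifted function when the same abstraction is seen again."""

    def __init__(self):
        self.cache = {}

    def handle(self, var, body):
        kind = classify(var, body)
        if kind != "lift":
            return kind, None
        key = (var, body)   # same variable names are required for reuse
        if key not in self.cache:
            self.cache[key] = f"lam{len(self.cache) + 1}"
        return kind, self.cache[key]


lifter = Lifter()
print(lifter.handle("x", ("var", "x")))
print(lifter.handle("x", ("var", "y")))
print(lifter.handle("x", ("app", ("const", "f"), ("var", "x"))))
```

Keying the cache on the bound variable name as well as the body mirrors the remark that a function is reused only for abstractions "with the same variable names".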

5.3 Translating to FOF

In order to translate to FOF all terms must be translated to the same type, effectively the type . This requires omission of any special treatment of monomorphic types, and instead all HOL4 terms must be translated to terms of type . The type of non-empty sets must also be omitted. Instead, is used wherever was used in the TF0 setting, and quantifiers that were over are guarded by a new non-emptiness predicate . Aside from these changes, the translation proceeds using lambda lifting as in the TF0 case.

6 Case Study

A very simple HOL4 theorem is , where is defined to be . Informally the proof is clear: expand the definition of and perform two β-reductions. However, the difficulty of proving the various translated versions of the problem ranges from trivial to challenging.

The first family of translations makes use of a preprocessing step that changes the definition of from to

Note that this step makes the definition of the same (up to α-conversion) as the theorem. Even if further encodings are applied to obtain a first-order problem, the axiom will still be the same as the conjecture. Consequently all versions resulting from the first family of translations are trivially provable.

The TH0-II version has conjecture

and the axiom (corresponding to the definition of )

The axiom defining combined with the basic axiom is enough to prove the theorem. However, the TH0-II version also includes all the other basic axioms along with internal versions of the logical constants for universal quantification and equality. The extra axioms make the problem hard for ATP systems, but if only the necessary axioms are provided the problem is easy. In TF0-II and FOF-II the conjecture is the same as in the TH0-II version, but the definition of is split into two functions declared when lambda lifting:


All the first-order versions of this problem are easy for current ATP systems.

7 Results

Since we are working in a large ITP library with a natural order of the problems, each translation can generate two versions of each problem. The bushy (small) version contains only the (translated) library facts that were needed for the ITP proof of each theorem. The chainy (large) version contains all the facts that precede the theorem in the library order, i.e., the real task faced by hammer systems. Chainy problems typically include thousands of axioms, requiring the use of premise selection algorithms [1] as a front-end in the ATP systems. Thus, in order to maintain the focus on ATP system performance, the results of running the ATP systems on the bushy problems are presented here.
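The difference between the two problem versions can be made concrete with a short sketch. The library contents, the dependency map, and the function names are hypothetical; only the selection rule follows the description above.

```python
def bushy_problem(library, deps, theorem):
    """Axioms are exactly the library facts used in the ITP proof."""
    used = deps[theorem]
    return {"conjecture": theorem,
            "axioms": [fact for fact in library if fact in used]}


def chainy_problem(library, theorem):
    """Axioms are all facts that precede the theorem in the library order."""
    position = library.index(theorem)
    return {"conjecture": theorem, "axioms": library[:position]}


# Hypothetical library in its natural (chronological) order
library = ["ax1", "def_add", "add_zero", "add_comm", "add_assoc"]
deps = {"add_comm": {"def_add", "add_zero"}}

print(bushy_problem(library, deps, "add_comm"))
print(chainy_problem(library, "add_comm"))
```

On a real library the chainy axiom list grows to thousands of facts, which is why premise selection is needed in that setting.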

We have run a total of 19 ATPs on the 12140 problems in each of the bushy problem sets, according to the ATPs’ support for the various TPTP formats. The ATPs supporting TH1 were HOLyHammer 0.21 [28] and Leo-III 1.3 [45, 44]. The ATPs supporting TH0 were agsyHOL 1.0 [34], cocATP 0.2.0, LEO-II 1.7.0 [4], Leo-III 1.3, Satallax 3.3 [11], and Zipperposition 1.4 [14]. The ATPs supporting TF1 were Leo-III 1.3 and Zipperposition 1.4. The ATPs supporting TF0 were Beagle 0.9.47 [3], CVC4 1.6 [2], E 2.2 [41], iProverModulo 2.5-0.1 [12], Leo-III 1.3, Princess 170717 [39, 40], Vampire 4.3 [31], and Zipperposition 1.4. The ATPs supporting FOF were CSE_E 1.0 [51], E 2.2, iProver 2.8 [30], Metis 2.4 [25], Prover9 1109a [35], SPASS 3.9 [50], and Vampire 4.3. In each case we ran the ATP with a CPU time limit of 60s per problem. Table 2 summarizes the results.

System          TH1-I   TH0-I   TH0-II  TF1-I   TF0-I   TF0-II  FOF-I   FOF-II  Union
agsyHOL         -       1374    1187    -       -       -       -       -       1605
Beagle          -       -       -       -       2007    2047    -       -       2531
cocATP          -       899     599     -       -       -       -       -       1000
CSE_E           -       -       -       -       -       -       4251    3102    4480
CVC4            -       -       -       -       4851    3991    -       -       5252
E               -       -       -       -       4277    3622    4618    3844    5118
HOLyHammer      5059    -       -       -       -       -       -       -       5059
iProver         -       -       -       -       -       -       2778    2894    3355
iProverModulo   -       -       -       -       2435    1639    -       -       2699
LEO-II          -       2579    1923    -       -       -       -       -       3213
Leo-III         6668    5018    3485    3458    4032    3421    -       -       7062
Metis           -       -       -       -       -       -       2353    474     2356
Princess        -       -       -       -       3646    2138    -       -       3849
Prover9         -       -       -       -       -       -       2894    1742    3128
Satallax        -       2207    1292    -       -       -       -       -       2494
SPASS           -       -       -       -       -       -       2850    3349    3821
Vampire         -       -       -       -       4837    4693    4008    4928    5929
Zipperposition  -       2252    2161    3771    3099    2576    -       -       4203
Union           6824    5209    3771    4608    5732    5073    5165    5108    7377
Table 2: Number of theorems proved, out of 12140. A dash marks a format on which the system was not run.

Of the 12140 problems 7377 (60.7%) were solved by some ATP in one of the representations. The TacticToe [18, 19] prover built into HOL4 has been tested as a baseline comparison, and it (re)proves 5327 of 8855 chainy versions of the problems (60%). TacticToe is a machine-learning guided prover that searches for a tactical proof by selecting suitable tactics and theorems learned from human-written tactical proofs. By design, this system works in the chainy setting. In total 8827 (73%) of the 12140 problems can be proved by either TacticToe or one of the ATPs using one of the representations.

8 GRUNGE as CASC LTB Division

The CADE ATP System Competition (CASC) [46] is the annual evaluation of fully automatic, classical logic Automated Theorem Proving (ATP) systems – the world championship for such systems. CASC is divided into divisions according to problem and system characteristics. Each competition division uses problems that have certain logical, language, and syntactic characteristics, so that the systems that compete in the division are, in principle, able to attempt all the problems in the division. For example, the First-Order Form (FOF) division uses problems in full first-order logic, with each problem having axioms and a conjecture to be proved.

While most of the CASC divisions present the problems to the ATP systems one at a time, with an individual CPU or wall clock time limit per problem, the Large Theory Batch (LTB) division presents the problems in batches, with an overall wall clock time limit on the batch. As the name also suggests, the problems in each batch come from a “large theory”, which typically has many functors and predicates, and many axioms of which only a few are required for the proof of a theorem. The problems in a batch typically have a common core set of axioms used by all problems, and each problem typically has additional axioms that are specific to the problem. The batch presentation allows the ATP systems to load and preprocess the common core set of axioms just once, and to share logical and control results between proof searches. Each batch is accompanied by a set of training problems and their solutions, taken from the same source as the competition problems. The training data can be used for ATP system tuning during (typically at the start of) the competition.

In CASC-J9, the most recent edition of the competition [48], the LTB division used first-order form problems exported from CakeML [32]. At the time there was growing interest in an LTB division for typed higher-order problems, and it was soon evident that a multi-format LTB division, in which each problem is encoded in several of the available TPTP languages, would add a valuable dimension to CASC. For CASC-27 each problem will be presented in multiple formats: TH1, TH0, TF1, TF0, and FOF. The work described in this paper will provide the problem set. Systems will be able to attempt whichever versions they want, and a solution to any version will constitute a solution to the problem. There are two ways that core ATP systems can attempt a particular form of a problem. If the system is able to handle the form natively then that is the obvious first approach. The alternative is to translate the problem to another “lower” form, either internally, or using existing translation tools available in, e.g., Isabelle [37] or Why3 [16]. For example, Leo-III [43] would be able to handle all the formats natively, while E [41] would need to translate TH1, TH0, and TF1 to TF0 or FOF, which it can handle natively.
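The choice between a native attempt and a translation to a lower form can be sketched as follows. This is an illustration only: the function names are hypothetical, and treating the five formats as a single linear order from highest to lowest is a simplifying assumption, not something the competition rules prescribe.

```python
# Formats in the order listed in the text; assumed here to run from the
# "highest" to the "lowest" form (a simplification for illustration).
FORMATS = ["TH1", "TH0", "TF1", "TF0", "FOF"]


def plan_attempts(native_formats, available_formats=FORMATS):
    """Return (source_format, target_format) pairs a system could try.

    A pair with source == target is a native attempt. Any other pair means
    the problem version is first translated down to the target format.
    """
    attempts = [(f, f) for f in available_formats if f in native_formats]
    for i, src in enumerate(available_formats):
        if src in native_formats:
            continue
        # Pick the first natively supported format below the source format.
        lower = [f for f in available_formats[i + 1:] if f in native_formats]
        if lower:
            attempts.append((src, lower[0]))
    return attempts


def problem_solved(results_by_format):
    """A solution to any version counts as a solution to the problem."""
    return any(results_by_format.values())
```

Under these assumptions, a system that handles only TF0 and FOF natively gets two native attempts plus translated attempts for TH1, TH0, and TF1, while a system that handles every format natively gets five native attempts and no translations.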

The batch presentation of problems in the LTB division provides interesting opportunities for ATP systems, including making multiple attempts on problems and learning search heuristics from proofs found. The multi-format LTB division extends these possibilities, by allowing multiple attempts on problems by virtue of the multiple formats available, and learning from proofs found in one format to improve performance on problems in another format. The latter is especially interesting, with little known research effort in this direction, and is complicated by differences in symbol naming between the various exports from HOL4.

9 Related Work

The HOL4 library already has translations for SMT solvers such as Yices [49], Z3 [9] and Beagle. A link to first-order ATPs is also available thanks to exports [17] of HOL4 theories to the HOL(y)Hammer framework [28]. Another notable project that facilitates the export of HOL4 theories is Open Theory [26]. The general approach for higher-order to first-order translations is laid out in Hurd [24]. An evaluation of the effect of different translations on ATP-provability was performed in [36]. A further study shows the potential improvements provided by the use of supercombinators [15]. In our work, the use of lambda-lifting (or combinators) is not necessary in TH0-II thanks to the use of the higher-order operator . This is similar to using higher-order abstract syntax to model syntax with binders [38].

A method for encoding of polymorphic types as terms through type tags (as in our first translation) or type guards (as in our second translation) is described in [6]. Translations [22, 45] from a polymorphic logic to a monomorphic poly-sorted logic without encoding typically rely on heuristic instantiations of type variables. However, heuristics may miss useful instantiations, and make the translation less modular (i.e., context-dependent). Our translations to TH0 and TF0 try to get the best of both worlds by using a type encoding for polymorphic types and special types for basic monomorphic types.

10 Conclusion

This work has defined, evaluated and compared two families of translations of the HOL4 logic to a number of ATP formalisms, and described a new unified large-theory ATP benchmark (GRUNGE) based on them. The first family is designed to play to the strengths of the calculi of most ATP systems, while the second family is based on more straightforward semantics rooted in set theory. The case study shows how different the translated problems may be, even in a simple example. A number of methods and optimizations have been used; however, it is clear that these translations can be further optimized and that different encodings favour different provers. Out of 12140 HOL4 theorems, the ATP systems we evaluated could solve 7377 problems in one or more of the formats. The TacticToe system that works directly in the HOL4 formalism and uses HOL4 tactics could solve 5327 problems. Together the total number of problems solved is 8827. Leo-III was the strongest system in the higher-order representations. In the first-order representations the strongest systems were Zipperposition, CVC4, E and Vampire. We are also planning to pre-release a part of both the bushy and chainy versions of the problems, to allow the system developers to develop and tune their systems on them before the 2019 CASC LTB competition.
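The overlap between the two approaches follows from the totals quoted above by inclusion-exclusion. The short calculation below is a reader's sketch, assuming that 8827 is exactly the union of the 7377 ATP-solved and the 5327 TacticToe-solved problems; the derived counts are not figures reported in the text.

```python
total = 12140             # HOL4 theorems in the benchmark
atp_solved = 7377         # solved by some ATP in at least one format
tactictoe_solved = 5327   # solved by TacticToe
union_solved = 8827       # solved by either (assumed to be the exact union)

# Inclusion-exclusion: |A or B| = |A| + |B| - |A and B|
both = atp_solved + tactictoe_solved - union_solved
only_atp = union_solved - tactictoe_solved
only_tactictoe = union_solved - atp_solved
coverage_percent = round(100 * union_solved / total, 1)

print(both, only_atp, only_tactictoe, coverage_percent)
# prints: 3877 3500 1450 72.7
```

Under that assumption each approach contributes a substantial number of problems the other misses, which is consistent with the complementary use of formalisms that the benchmark is meant to support.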


  • [1] Alama, J., Heskes, T., Kühlwein, D., Tsivtsivadze, E., Urban, J.: Premise selection for mathematics by corpus analysis and kernel methods. J. Autom. Reasoning 52(2), 191–213 (2014).
  • [2] Barrett, C., Conway, C.L., Deters, M., Hadarean, L., Jovanović, D., King, T., Reynolds, A., Tinelli, C.: CVC4. In: Gopalakrishnan, G., Qadeer, S. (eds.) Conference on Computer Aided Verification (CAV). LNCS, vol. 6806, pp. 171–177. Springer (2011)
  • [3] Baumgartner, P., Waldmann, U.: Hierarchic superposition with weak abstraction. In: Bonacina, M.P. (ed.) Conference on Automated Deduction (CADE). Lecture Notes in Computer Science, vol. 7898, pp. 39–57. Springer (2013)
  • [4] Benzmüller, C., Paulson, L., Theiss, F., Fietzke, A.: LEO-II - A Cooperative Automatic Theorem Prover for Higher-Order Logic. In: Baumgartner, P., Armando, A., Dowek, G. (eds.) Proceedings of the 4th International Joint Conference on Automated Reasoning. pp. 162–170. No. 5195 in Lecture Notes in Artificial Intelligence, Springer-Verlag (2008)
  • [5] Benzmüller, C., Rabe, F., Sutcliffe, G.: THF0 – the core of the TPTP language for classical higher-order logic. In: Armando, A., Baumgartner, P., Dowek, G. (eds.) Automated Reasoning, 4th International Joint Conference, IJCAR 2008, Sydney, Australia, August 12-15, 2008, Proceedings. LNCS, vol. 5195, pp. 491–506. Springer (2008)
  • [6] Blanchette, J.C., Böhme, S., Popescu, A., Smallbone, N.: Encoding monomorphic and polymorphic types. In: Piterman, N., Smolka, S.A. (eds.) Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS). LNCS, vol. 7795, pp. 493–507. Springer (2013)
  • [7] Blanchette, J.C., Kaliszyk, C., Paulson, L.C., Urban, J.: Hammering towards QED. J. Formalized Reasoning 9(1), 101–148 (2016).
  • [8] Blanchette, J.C., Paskevich, A.: TFF1: The TPTP typed first-order form with rank-1 polymorphism. In: Bonacina, M.P. (ed.) CADE. LNCS, vol. 7898, pp. 414–420. Springer (2013)
  • [9] Böhme, S., Weber, T.: Fast LCF-style proof reconstruction for Z3. In: Kaufmann, M., Paulson, L.C. (eds.) Conference on Interactive Theorem Proving (ITP). LNCS, vol. 6172, pp. 179–194. Springer (2010)
  • [10] Bonichon, R., Delahaye, D., Doligez, D.: Zenon : An extensible automated theorem prover producing checkable proofs. In: Dershowitz, N., Voronkov, A. (eds.) Logic for Programming, Artificial Intelligence, and Reasoning, 14th International Conference, LPAR 2007, Yerevan, Armenia, October 15-19, 2007, Proceedings. Lecture Notes in Computer Science, vol. 4790, pp. 151–165. Springer (2007)
  • [11] Brown, C.E.: Satallax: An automatic higher-order prover. In: Gramlich, B., Miller, D., Sattler, U. (eds.) IJCAR. LNCS, vol. 7364, pp. 111–117. Springer (2012)
  • [12] Burel, G.: Experimenting with deduction modulo. In: Sofronie-Stokkermans, V., Bjørner, N. (eds.) CADE 2011. Lecture Notes in Artificial Intelligence, vol. 6803, pp. 162–176. Springer (2011)
  • [13] Church, A.: A formulation of the simple theory of types. J. Symbolic Logic 5, 56–68 (1940)
  • [14] Cruanes, S.: Extending Superposition with Integer Arithmetic, Structural Induction, and Beyond. (Extensions de la Superposition pour l’Arithmétique Linéaire Entière, l’Induction Structurelle, et bien plus encore). Ph.D. thesis, École Polytechnique, Palaiseau, France (2015)
  • [15] Czajka, L.: Improving automation in interactive theorem provers by efficient encoding of lambda-abstractions. In: Avigad, J., Chlipala, A. (eds.) Proceedings of the 5th ACM SIGPLAN Conference on Certified Programs and Proofs, Saint Petersburg, FL, USA, January 20-22, 2016. pp. 49–57. ACM (2016).
  • [16] Filliatre, J.C., Paskevich, A.: Why3 - Where Programs Meet Provers. In: Felleisen, M., Gardner, P. (eds.) Proceedings of the 22nd European Symposium on Programming. pp. 125–128. No. 7792 in Lecture Notes in Computer Science, Springer (2013)
  • [17] Gauthier, T., Kaliszyk, C.: Premise selection and external provers for HOL4. In: Certified Programs and Proofs (CPP’15). LNCS, Springer (2015).
  • [18] Gauthier, T., Kaliszyk, C., Urban, J.: TacticToe: Learning to reason with HOL4 tactics. In: Eiter, T., Sands, D. (eds.) LPAR-21, 21st International Conference on Logic for Programming, Artificial Intelligence and Reasoning, Maun, Botswana, May 7-12, 2017. EPiC Series in Computing, vol. 46, pp. 125–143. EasyChair (2017)
  • [19] Gauthier, T., Kaliszyk, C., Urban, J., Kumar, R., Norrish, M.: Learning to prove with tactics. CoRR (2018)
  • [20] Gordon, M.J.C., Melham, T.F. (eds.): Introduction to HOL: A theorem proving environment for higher order logic. Cambridge University Press (1993)
  • [21] Harrison, J.: HOL Light: A tutorial introduction. In: Srivas, M.K., Camilleri, A.J. (eds.) FMCAD. LNCS, vol. 1166, pp. 265–269. Springer (1996)
  • [22] Harrison, J.: Optimizing proof search in model elimination. In: McRobbie, M., Slaney, J. (eds.) Conference on Automated Deduction (CADE). pp. 313–327. No. 1104 in LNAI, Springer (1996)
  • [23] Harrison, J., Urban, J., Wiedijk, F.: History of interactive theorem proving. In: Siekmann, J.H. (ed.) Computational Logic, Handbook of the History of Logic, vol. 9, pp. 135–214. Elsevier (2014).
  • [24] Hurd, J.: First-order proof tactics in higher-order logic theorem provers. Design and Application of Strategies/Tactics in Higher Order Logics, number NASA/CP-2003-212448 in NASA Technical Reports pp. 56–68 (2003)
  • [25] Hurd, J.: System description: The Metis proof tactic. In: Christoph Benzmueller, John Harrison, C.S. (ed.) Workshop on Empirically Successful Automated Reasoning in Higher-Order Logic (ESHOL). pp. 103–104 (2005)
  • [26] Hurd, J.: The OpenTheory standard theory library. In: Bobaru, M.G., Havelund, K., Holzmann, G.J., Joshi, R. (eds.) NASA Formal Methods. LNCS, vol. 6617, pp. 177–191. Springer (2011)
  • [27] Kaliszyk, C., Sutcliffe, G., Rabe, F.: TH1: The TPTP Typed Higher-Order Form with Rank-1 Polymorphism. In: Fontaine, P., Schulz, S., Urban, J. (eds.) Proceedings of the 5th Workshop on Practical Aspects of Automated Reasoning. pp. 41–55. No. 1635 in CEUR Workshop Proceedings (2016)
  • [28] Kaliszyk, C., Urban, J.: Learning-assisted automated reasoning with Flyspeck. J. Autom. Reasoning 53(2), 173–213 (2014).
  • [29] King, D., Arthan, R., Winnersh, I.: Development of practical verification tools. ICL Systems Journal 11, 106–122 (1996)
  • [30] Korovin, K.: iprover - an instantiation-based theorem prover for first-order logic (system description). In: Armando, A., Baumgartner, P., Dowek, G. (eds.) Automated Reasoning, 4th International Joint Conference, IJCAR 2008, Sydney, Australia, August 12-15, 2008, Proceedings. Lecture Notes in Computer Science, vol. 5195, pp. 292–298. Springer (2008)
  • [31] Kovács, L., Voronkov, A.: First-order theorem proving and Vampire. In: Sharygina, N., Veith, H. (eds.) CAV. LNCS, vol. 8044, pp. 1–35. Springer (2013)
  • [32] Kumar, R., Myreen, M., Norrish, M., Owens, S.: CakeML: A Verified Implementation of ML. In: Sewell, P. (ed.) Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. pp. 179–191. ACM Press (2014)
  • [33] Kumar, R., Myreen, M.O., Norrish, M., Owens, S.: CakeML: a verified implementation of ML. In: Jagannathan, S., Sewell, P. (eds.) The 41st Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL’14, San Diego, CA, USA, January 20-21, 2014. pp. 179–192. ACM (2014).
  • [34] Lindblad, F.: A focused sequent calculus for higher-order logic. In: Demri, S., Kapur, D., Weidenbach, C. (eds.) Automated Reasoning - 7th International Joint Conference, IJCAR 2014, Held as Part of the Vienna Summer of Logic, VSL 2014, Vienna, Austria, July 19-22, 2014. Proceedings. Lecture Notes in Computer Science, vol. 8562, pp. 61–75. Springer (2014).
  • [35] McCune, W.: Prover9 and Mace4 (2005–2010),
  • [36] Meng, J., Paulson, L.C.: Translating higher-order clauses to first-order clauses. J. Autom. Reasoning 40(1), 35–60 (2008)
  • [37] Nipkow, T., Paulson, L., Wenzel, M.: Isabelle/HOL: A Proof Assistant for Higher-Order Logic. No. 2283 in Lecture Notes in Computer Science, Springer-Verlag (2002)
  • [38] Pfenning, F., Elliot, C.: Higher-order abstract syntax. In: Proceedings of the ACM SIGPLAN 1988 Conference on Programming Language Design and Implementation. pp. 199–208. PLDI ’88, ACM, New York, NY, USA (1988).
  • [39] Rümmer, P.: A Constraint Sequent Calculus for First-Order Logic with Linear Integer Arithmetic. In: Cervesato, I., Veith, H., Voronkov, A. (eds.) Proceedings of the 15th International Conference on Logic for Programming, Artificial Intelligence, and Reasoning. pp. 274–289. No. 5330 in Lecture Notes in Artificial Intelligence, Springer-Verlag (2008)
  • [40] Rümmer, P.: E-Matching with Free Variables. In: Bjørner, N., Voronkov, A. (eds.) Proceedings of the 18th International Conference on Logic for Programming, Artificial Intelligence, and Reasoning. pp. 359–374. No. 7180 in Lecture Notes in Artificial Intelligence, Springer-Verlag (2012)
  • [41] Schulz, S.: System description: E 1.8. In: McMillan, K.L., Middeldorp, A., Voronkov, A. (eds.) LPAR. LNCS, vol. 8312, pp. 735–743. Springer (2013).
  • [42] Slind, K., Norrish, M.: A brief overview of HOL4. In: Mohamed, O.A., Muñoz, C.A., Tahar, S. (eds.) Theorem Proving in Higher Order Logics, 21st International Conference, TPHOLs 2008, Montreal, Canada, August 18-21, 2008. Proceedings. LNCS, vol. 5170, pp. 28–32. Springer (2008)
  • [43] Steen, A., Benzmüller, C.: The Higher-Order Prover Leo-III. In: Galmiche, D., Schulz, S., Sebastiani, R. (eds.) Proceedings of the 8th International Joint Conference on Automated Reasoning. pp. 108–116. No. 10900 in Lecture Notes in Artificial Intelligence (2018)
  • [44] Steen, A., Benzmüller, C.: The higher-order prover Leo-III. In: Galmiche, D., Schulz, S., Sebastiani, R. (eds.) Automated Reasoning. IJCAR 2018. LNCS, vol. 10900, pp. 108–116. Springer, Cham (2018)
  • [45] Steen, A., Wisniewski, M., Benzmüller, C.: Going polymorphic - TH1 reasoning for leo-iii. In: Eiter, T., Sands, D., Sutcliffe, G., Voronkov, A. (eds.) IWIL@LPAR 2017 Workshop and LPAR-21 Short Presentations, Maun, Botswana, May 7-12, 2017. Kalpa Publications in Computing, vol. 1. EasyChair (2017)
  • [46] Sutcliffe, G.: The CADE ATP System Competition - CASC. AI Magazine 37(2), 99–101 (2016)
  • [47] Sutcliffe, G., Schulz, S., Claessen, K., Baumgartner, P.: The TPTP Typed First-order Form with Arithmetic. In: Bjørner, N., Voronkov, A. (eds.) Proceedings of the 18th International Conference on Logic for Programming, Artificial Intelligence, and Reasoning. pp. 406–419. No. 7180 in Lecture Notes in Artificial Intelligence, Springer-Verlag (2012)
  • [48] Sutcliffe, G.: The 9th IJCAR automated theorem proving system competition - CASC-J9. AI Commun. 31(6), 495–507 (2018).
  • [49] Weber, T.: SMT solvers: new oracles for the HOL theorem prover. International Journal on Software Tools for Technology Transfer 13(5), 419–429 (2011)
  • [50] Weidenbach, C., Dimova, D., Fietzke, A., Kumar, R., Suda, M., Wischnewski, P.: SPASS Version 3.5. In: Schmidt, R.A. (ed.) CADE. LNCS, vol. 5663, pp. 140–145. Springer (2009)
  • [51] Xu, Y., Liu, J., Chen, S., Zhong, X., He, X.: Contradiction Separation Based Dynamic Multi-clause Synergized Automated Deduction. Information Sciences 462, 93–113 (2018)