From LCF to Isabelle/HOL

07/05/2019 ∙ by Lawrence C. Paulson, et al. ∙ University of Cambridge 0

Interactive theorem provers have developed dramatically over the past four decades, from primitive beginnings to today's powerful systems. Here, we focus on Isabelle/HOL and its distinctive strengths. They include automatic proof search, borrowing techniques from the world of first order theorem proving, but also the automatic search for counterexamples. They include a highly readable structured language of proofs and a unique interactive development environment for editing live proof documents. Everything rests on the foundation conceived by Robin Milner for Edinburgh LCF: a proof kernel, using abstract types to ensure soundness and eliminate the need to store proofs. Compared with the research prototypes of the 1970s, Isabelle is a practical and versatile tool. It is used by system designers, mathematicians and many others.



There are no comments yet.


page 1

page 2

page 3

page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Today’s interactive theorem provers originated in research undertaken during the 1970s on the verification of functional programs. Two quite different tools were built: the Boyer/Moore theorem prover (now ACL2, described elsewhere in this volume) and Edinburgh LCF [GMW79, Pau18].

Descendants of the latter include every member of the HOL family (HOL4, HOL Light, ProofPower) [GM93] as well as Coq [BC04] and Isabelle [NPW02]. As we shall see in the sequel, the achievements of the past 40 years lie on several dimensions, including type systems, proof languages and user interfaces. Automation is the key to usability; this includes automated search for counterexamples as well as for proofs.

These developments are reflected in the size of the tools themselves. In 1977, the Edinburgh LCF distribution was 900 KB, including an implementation of the ML programming language in Lisp. In 1986, the Isabelle distribution was a mere 324 KB (not counting ML, now a separate distribution). By 1991, Isabelle exceeded LCF: 937 KB. In 2019, the Isabelle distribution had reached 133 MB! And by way of comparison, HOL Light (which is much more closely related to LCF) was 84 MB. Much of this bulk consists of proof libraries rather than executable code, though libraries make a crucial contribution to a system’s capabilities. Today’s mature systems also include documentation and examples.111The figures are for uncompressed distribution directories containing no binaries, but possibly PDF files.

Isabelle is a leading interactive theorem prover. It is generic, supporting a number of different formal calculi, but by far the most important of these is its instantiation to higher-order logic: Isabelle/HOL. Already during the 1990s, Isabelle/HOL was being applied with great success to the task of verifying cryptographic protocols [Pau98]. Turning to mathematics, it played a critical role in Hales’s Flyspeck project, which verified his proof of the Kepler conjecture [HAB17]. It is the basis for the seL4 project, under which an entire operating system kernel was verified, proving full functional correctness [KAE10]. It was adopted by researchers outside the verification milieu for specifying and verifying algorithms for replicated datatypes that provide “eventual consistency” [GKMB17]. Numerous other projects are underway around the world. Like other proof assistants, Isabelle is not directly concerned with program verification, i.e. with verifying code written in a programming language, but it can be used as a back end to prove verification conditions. Isabelle can also be used as a verified programming environment, where mathematical functions can be proved correct and then automatically translated to executable code in one of several different programming languages. The translation process itself is currently unverified, but even this is likely to change in the near future.

This essay focuses on Isabelle/HOL and therefore has little to say about techniques common to most systems. For example, simplification by rewriting — coupled with recursive simplification to handle conditional rewrite rules — was already realised in both the Boyer/Moore theorem prover [BM79] and Edinburgh LCF by the end of the 1970s. Recursive datatype and function definitions, as well as inductive definitions, were commonplace by around 1995. Linear arithmetic decision procedures were also widely available by then.

Our title echoes Mike Gordon’s paper “From LCF to HOL: a Short History” [Gor00]. Like Mike, we begin with LCF, the source of the most fundamental ideas. But we pass over this material quickly in order to focus on Isabelle. There is no way to surpass Mike’s account of the early years. He starts in 1969, with Dana Scott’s Logic for Computable Functions, and covers the original LCF project at Stanford University. He describes in detail the development of the successor LCF systems at Edinburgh and then at Cambridge. He names all the people involved and finally outlines his own development of HOL.

We begin with LCF because of its seminal importance, continuing to HOL because it is so strongly linked (§2). Then we focus exclusively on Isabelle. We begin with the core ideas of a generic reasoner built around unification and backtracking (§3). Then we consider the task of supporting higher-order logic, which required the introduction of type classes (§4). There follows an account of automatic proof search and its dual, the search for counterexamples (§56). We also discuss the generation of code from logical functions (§7). We then turn to Isabelle’s most distinctive features: its structured proof language (§8) and the powerful user interface architecture supporting it (§9). The next section describes Isabelle’s Archive of Formal Proofs (§10). To conclude, we discuss the synergy among these ideas, making them more powerful in combination than individually (§11).

Coq and other systems built around constructive type theories represent a distinctive strand of development and fall outside our scope.

2 LCF and HOL

Edinburgh LCF is best known for the so-called LCF approach: implementing the inference rules of a logical calculus within a proof kernel that has the exclusive right to create theorems. Such a kernel is indeed found in most modern systems, and is responsible for their good record of soundness. But in fact, LCF introduced a broader set of norms that are now taken for granted: a focus on backward proof, the practice of working in a theory hierarchy, and above all, the central role of a functional programming language, ML [Mil85].

ML is short for meta language, as it was the vehicle for operating on formulas belonging to the object language (namely, the proof calculus). Radical at the time, ML was soon seen to be a general-purpose programming language and today exerts a strong influence on language design. Crucial to ML is its sound, polymorphic type system with first-class functions, and in particular, its support for abstract types. An abstract type encapsulates a data structure’s internal representation, allowing access only through a fixed set of functions. A typical abstract type might be a dictionary, where the implementation (e.g. balanced trees) is only accessible through well-defined operations such as insert, update and delete; then the implementer is free to change the representation (e.g. to improve performance or to introduce additional operations) without affecting any of the code outside the abstract data type declaration itself.

Robin Milner’s key insight was that abstract data types could save space by eliminating the need to store proofs. (In his early experiments, he had kept running out of memory [Gor15].) He declared an abstract type of theorems, where the operations were simply the inference rules of his logical calculus. The resulting type, thm, was the type of theorems and ML’s type checker was our guarantee that anything of type thm had definitely been created exclusively by the application of inference rules. Just as a dictionary doesn’t need to keep a record of the operations applied to it but only the dictionary itself, type thm doesn’t need to store the proofs of theorems but only their statements. The resultant savings of space are as important now as they were in the 1970s, for although today we have more memory, we also have vastly bigger proofs.

Edinburgh LCF also introduced proof tactics and tacticals to express backward, goal-directed reasoning. An LCF tactic typically applies a specific inference rule, while tacticals denote control structures, e.g., THEN (one tactic followed by another), ORELSE (which allows one of several tactics to be attempted) and REPEAT (repeated execution of a tactic until it fails). Remarkably, even tactics lie outside the proof kernel. An LCF tactic is a function that takes a goal (a formula with its assumptions) and returns a list of subgoals that it claims are logically sufficient. To justify this claim, it returns a function operating on values of type thm, but we have no guarantee that this function will deliver the promised theorem in the end. We see that the LCF architecture makes proof procedures harder to implement, while reducing the amount of code that has to be trusted.

We see incidentally two meanings of the word proof:

  1. formal deductions of theorems from axioms using the inference rules of a logical calculus;

  2. executable code written using tactics or other primitives, expressing the search for such deductions.

To resolve this ambiguity, the former are sometimes called proof objects or proof terms and the latter, proof scripts or proof texts. Thus we see that the LCF approach eliminates the need to store proof terms and allows proof scripts to be coded in ML using tactics and tacticals. Nevertheless, proof assistants based on constructive type theories retain proof objects, as they are intrinsic to such formalisms.

The most fundamental question in the design of a theorem prover is what calculus to support. Boyer and Moore made the inspired choice of Pure Lisp, which was sufficient for verifying simple functional programs and formalising elementary number theory in the 1970s, and which has grown to support advanced applications today. For Edinburgh LCF, Milner chose the Logic for Computable Functions, which was perfect for the domain-theoretic investigations topical at the time but proved to be too quirky for general adoption.

Mike Gordon, who was one of the designers of Edinburgh LCF, ultimately adopted higher-order logic as the basis for his research into hardware verification [Gor86, Gor00]. Higher-order logic is most easily understood as a typed predicate calculus including function and Boolean types, and therefore also set types. Gordon launched his HOL system in 1986 [Gor86, GM93], presenting both the logic and its application to hardware specification and verification. HOL turned out to be extremely versatile and it soon attracted a global user community. New versions emerged, in particular HOL Light [Har96]. Members of the HOL family have been used in verification projects of every description, including the formalisation of great bodies of mathematics.

However, it wasn’t obvious in the 1980s that one or two formalisms could be sufficient for the needs of verification. Type theories, dynamic logics and many other formalisms were being proposed. Gordon himself had reworked LCF twice in his hardware verification research. This was the origin of the idea that a theorem prover could be generic: supporting a spectrum of formalisms through a common framework. Today this includes syntactic tools (parsing, pretty printing), inference tools (rewriting, unification), a common proof language (Isar) and user interface foundation (PIDE).

3 Isabelle in The Early Days: A Logical Framework

Isabelle originated in a project to build an LCF-style proof assistant for Martin-Löf’s constructive type theory [ML84]. Two ideas influenced the design from the outset [Pau90]:

  • Reasoning should be based on unification

    rather than pattern matching, so that goals could contain variables that could be instantiated.

  • A tactic should be able to return multiple results in a lazy list, representing alternative proof attempts for backtracking.

Both ideas were aimed at supporting proof search. The LCF work had demonstrated conclusively that verification was repetitious and tedious, requiring the best possible automation. The combination of unification and backtracking would allow the use of logic programming techniques, as in Prolog

[CM87]. For example, in Martin-Löf’s type theory we have the following (derived) rule:

Backward chaining, using this rule to prove something of the form , leaves us with a subgoal of the form , where the question mark indicates that is a hole to be filled: in Prolog parlance, a logical variable. Unification fills in the holes later, while backtracking can orchestrate alternative ways of filling the holes. In this example, the form of will determine the choice of . The ability to defer choices until they become clear is as valuable in theorem proving as it is in Prolog programming.

It may be worth mentioning that Prolog was a hot topic in the 1980s. However, standard first-order unification was simply not applicable to Martin-Löf type theory, which is based on a form of typed -calculus. Higher-order unification would be necessary. Undecidable in its full generality, a reasonably practical unification procedure had recently been published by Gerard Huet [Hue75].

Higher-order unification is strikingly different from first-order unification. The latter is easily implemented and delivers a unique answer modulo variable renaming. Higher-order unification allows even functions to be variables and can instantiate such variables with -abstractions generated on the fly, yielding multiple results. For example, it is possible to unify with 3 in two different ways: (with no constraint on ) and , . Unifying with could yield four different results: , , and . The search for such unifiers works by descending through the terms one level at a time, attempting two kinds of steps: projections (use of a bound variable) and imitations (copying the opposite term). The examples above show how it can find different ways of abstracting one term from another.

With such a sophisticated syntactic mechanism built in at the lowest level, it became clear that the LCF approach to inference rules could be radically changed [Pau86]. Consider the proof rule of universal elimination:


Its LCF representation is an ML function taking two arguments: a theorem (which must have the form ) and a term (which must have the same type as ). It then generates the desired conclusion, namely , raising an exception if any of the preconditions is violated. This approach is general; the drawback is the tedium and attendant risk of error when there are dozens of rules. With our -calculus framework, inference rules such as the one above can simply be written out in the form of a template and instantiated using unification. Better still, unification could be applied either to the premise or to the conclusion, yielding forward or backward proof through a single mechanism. This is Isabelle’s central idea [Pau86].

The precise nature of these templates remained to be determined, and the best approach turned out to be a sort of logical framework. These are specialised formalisms whose purpose is to encode other formalisms. The Edinburgh Logical Framework [HHP93] is the best known of these. It differs from Isabelle’s by incorporating proof objects into the calculus itself, neutralising one of the key advantages of the LCF architecture: that proofs do not have to be stored.

So Isabelle can be seen as an instance of the traditional LCF approach, but where type thm formalises a logical framework or meta-logic, where other formalisms (the object-logics) can later be encoded. Such encodings can also be proved correct [Pau89]. Our logical framework approach is less general than the original LCF representation, where an inference rule can undertake an arbitrary computation. Nevertheless, it captures a variety of possibilities, and moreover, it is open-ended: for example, Isabelle defines intuitionistic first-order logic, which in a succession of formal theories is extended with classical logic, then with Zermelo–Fraenkel set theory, then with the axiom of choice. With the original LCF approach, once you define the type thm, it can never be extended.

The first object-logic was constructive type theory (Isabelle/CTT). It was followed by a classical first-order sequent calculus (Isabelle/LK) and by natural deduction calculi for intuitionistic and classical first-order logic (Isabelle/IFOL and Isabelle/FOL) [Pau93]. Later, a substantial development of Zermelo–Fraenkel set theory was developed on top of Isabelle/FOL [PG96]. This is one of the leading tools for formal reasoning in axiomatic set theory.

Although Isabelle took a declarative approach to defining logics, the system architecture was still based on LCF’s philosophy that everything was done in ML. All interactions with Isabelle took place at the ML toplevel. For example, the declaration of new types, constants with their definitions, and axioms all required calling appropriate ML functions with any necessary data as arguments. This began to change in the early 1990s [Pau94] when logics could be defined in a theory file with types, constants and proof rules declared using a natural syntax.222This represented a return to the original Edinburgh LCF, which also supported theory files. The Isabelle version was inspired by Goguen’s OBJ system [Gog79, FGJM85], in particular concerning the declaration of new mixfix syntax. Proofs, however, were still expressed in ML. The structured proof language Isar (Section 8.1) came later. Of today’s LCF-based systems, the HOL family has remained the most faithful to the original conception, with proofs coded in ML.

Paulson’s original implementation of higher-order unification [Pau86] was practical, but still slow. Yet in Isabelle practice, many unification problems are first-order, or almost so. Dale Miller [Mil91] discovered a subclass of -terms, later called (higher-order) patterns [Nip91a], which behave like first-order terms: unification is decidable and if two terms are unifiable, they have a most-general unifier. A term (in -normal form) is called a higher-order pattern if every free occurrence of a variable has as arguments a list of distinct bound variables. Most unification problems in Isabelle are of this form. Nipkow gave a succinct implementation of pattern unification and added it to Isabelle [Nip93a]. Full higher-order unification is invoked only if pattern unification encounters a non-pattern.

A special case of unification is matching where the variables of only one of the two terms are instantiated. Isabelle’s rewrite engine (aka the simplifier) is based on higher-order pattern matching. Thus the simplifier can deal with many standard transformations of quantified terms, for example the following ones:

It appears that Isabelle was the first theorem prover to support higher-order rewrite rules [NP98].

4 Type Classes and Isabelle/HOL

Gordon’s HOL system became a runaway success, dominating the verification arena. By 1991 it was being used in over 80 separate projects around the world [Kal91]. Isabelle claimed to be a generic theorem prover, but it couldn’t handle higher-order logic. The problem was that the templates mentioned above could not refer to types. This was not an issue in the original application of Martin-Löf type theory, where types and formulas were effectively identified and the inference rules referred to types explicitly. In the following example, and are types and the rule expresses how type checking should be done for a term of the form .

Contrast with the previous inference rule (1), where no types are visible. There are many formalisms, including many-sorted first-order logic333Beware of terminological confusion regarding the word sort, which for first-order logic is synonymous with type. Our use of sort below will be entirely different. as well as higher-order logic, where types are kept in the background and type constraints are enforced implicitly. Users would not like to be forced to prove statements like .

Adequate support for logics having implicitly-typed variables required another idea, order-sorted polymorphism, leading to an even more powerful idea, axiomatic type classes.

4.1 Order-Sorted Polymorphism

Isabelle had always supported polymorphism internally; the difficulty lay in making it available to users. In its simplest form, a polymorphic type variable may take on any type whatsoever. This could not be allowed in a logical framework, where some types are intrinsic to the framework itself and other types might be unsuitable to a particular object-logic. In first-order logic, the type of in must not involve functions or Booleans.444Quantification over Booleans (and therefore, relations) requires at least second-order logic [Bar77, page 7]. Order-sorted polymorphism solves such difficulties by introducing a hierarchy of sorts on types [Nip91b]. A sort is a finite intersection of type classes, which are essentially collections of types. For first-order logic, we may introduce the type class FO for all types for which quantification is permitted. For higher order logic, we would have a different type class (say HO), containing Boolean and function types. We write if type has class .

Now we need to express how (possibly nullary) type constructors such as nat (the type of natural numbers), Cartesian product, function space and list (the type of lists) act on type classes. For example, for first-order logic we want that nat FO and that if FO then  list FO. These assertions can be codified in Isabelle as instance declarations:   instance nat FO  instance list FO FO Thus nat list FO holds, but in the absence of corresponding instances, bool FO and bool list FO do not: truth values, or lists of them, do not belong to FO. Technically, these instance declarations form the signature of the (term) algebra of types. Under certain natural conditions on the interaction of the class hierachy with the instance declarations, order-sorted unification of types, like unsorted unification, is still unitary: solvable unification problems have most-general unifiers. Therefore these conditions guarantee that we still have principal types. This theory, combined with extensions to higher-order unification [Nip91b], allows full support for higher-order logic. And so the “templates” of our framework contain explicit type variables, in addition to the ordinary variables that we had before. This version was first released in 1991 [NP92]. At the same time, the correspondence to Haskell’s type classes was worked out [NS91, NP93].

4.2 Axiomatic Type Classes

The full power of type classes is realised when they are combined with axioms. Axiomatic type classes support the flexible and principled overloading of symbols [Nip93b, Wen97]. A type class can be introduced on the basis of a specific vocabulary of symbols (or signature) possibly coupled with axioms (a specification) to constrain those symbols—and possibly extending other type classes. Orderings are a natural example: Isabelle/HOL introduces a succession of type classes for increasingly stronger notions of ordering:

  • ord: the ordering symbols , , but with no attached properties

  • preorder: adding reflexivity and transitivity, and defining iff

  • order: adding antisymmetry

  • linorder: adding linearity

For the case of lists, the lexicographic ordering yields a partial ordering if the list elements are partially ordered and yields a linear ordering if the list elements are linearly ordered. Such details are easily expressible:   instance list order order  instance list linorder linorder Instance declarations for axiomatic type classes require the user to supply definitions of any associated symbols (here and ) for the supplied type along with proofs of the associated properties using those definitions. This overloading is principled in that although and will have separate definitions for each type for which they are defined, they will always satisfy the associated axioms.

The value of axiomatic type classes can be seen in the formalisation of mathematical analysis. Type classes for groups, rings and other algebraic structures provide overloading for the common arithmetic symbols, but in addition, we have type classes for metric and topological spaces, vector and Euclidean spaces. When we construct the type of complex numbers for example, by showing that they form a field and a Euclidean space, we instantly inherit substantial libraries of facts covering limits, convergence, derivatives and topology, which would otherwise have to be largely duplicated from the analogous facts for the real numbers

[HIH13]. The same thing happens again when we construct more advanced number systems, such as the quaternions [Woo18], a number system for three-dimensional space. We have a surprising variety of numeric types, such as the extended reals () and the nonnegative extended reals; both of these belong to a number of ordering and topological type classes.

An exposition of axiomatic type classes with fully worked out examples is available in a paper by Paulson [Pau04]. It’s slightly outdated — we have a much more elaborate type class hierarchy now and a somewhat different syntax — but the principles are the same.

4.3 Logical Foundations

Wenzel [Wen97] reduced type classes to (1) overloaded constant definitions (OCDs) for the signature and (2) predicate definitions over a single type variable for the specification. Getting the notion of definition with type-polymorphism right is notoriously tricky. For example, in an early version of Gordon’s HOL system, constant definitions could introduce inconsistencies [Art16]. The problem is simply to detect ill-formed definitions, in particular circularities, automatically. Circularities can lead to inconsistencies. Type classes link the level of types with the level of terms, opening up new opportunities to create highly obscure circularities.

Wenzel analysed the impact of OCDs on the consistency of an arbitrary theory. He sketched conditions under which they were meta-safe, which roughly means that they can be removed without affecting provability. Meta-safety implies conservativity, which in turn implies consistency preservation. However, his proof sketch considered an idealized version of Isabelle’s actual definition facilities, which were only partially implemented and in conflict with existing application theories. In particular—in order to stay within Gordon’s HOL and the Isabelle logical framework—he excluded the interleaving of OCDs with type definitions.

Obua [Obu06] found that Isabelle accepted OCDs that introduced inconsistencies. He gave a more rigorous and general formulation of Wenzel’s conditions [Wen97] and implemented them using an external termination checker, outside of the inference kernel. He also sketched a proof that these conditions ensured conservativity. As an alternative, Wenzel and Haftmann proposed a much simpler (and stronger) check on OCDs [HW06] inside the logical kernel, but this excluded a few application theories with ambitious overloading.

Several years later Kunčar and Popescu [KP19, KP18] rediscovered that combining overloaded constant definitions with type definitions could introduce circularities that were missed by the previous analyses. They gave extended checks to avoid these circularities and proved that these checks ensure consistency of theories that extend the initial HOL theory (comprising the standard HOL axioms) with overloaded constant definitions and type definitions. In the end [KP18], they strengthened the result to show that such definitional extensions are in fact meta-safe over the initial HOL theory.

These particular concerns were specific to detecting circular definitions, but all proof assistants potentially contain errors, like any other software. A notable case is PVS [Owr06], which lacking an LCF-style proof kernel was particularly vulnerable to soundness errors (at least in the 1990s):

PVS still seems to contain a lot of bugs and frequently new bugs show up. …It would be desirable that the bugs in PVS would only influence completeness and not soundness. Unfortunately, this is not the case, as some recent proofs of true==false

have shown [15, bug numbers 71, 82, 113 and 160]. …It is reasonable to assume that PVS will continue to contain soundness bugs. The obvious question thus arises, why use a proof tool that probably contains soundness bugs? Our answer is threefold: PVS is still a very critical reader of proofs. PVS lets fewer mistakes slip through than many of our human colleagues…. Furthermore, history tells us that the fixed soundness bugs are hardly ever unintentionally explored, we know of only a single case. Thirdly, most mistakes in a system that is to be verified are detected in the process of making a formal specification.

[GH98, page 134–5]

Notwithstanding such forbearance, users deserve a much higher standard of correctness than this. At the same time, it’s vital to stress that users need to take responsibility for their own definitions: we know how to detect circularity, but no system can check whether a definition conforms to the user’s true intentions.

4.4 Isabelle/HOL versus HOL

As mentioned at the start of this section, the purpose of the foregoing work was to make possible an Isabelle instantiation of higher-order logic. Isabelle/HOL emerged in 1991 [NP92] and was soon a fully capable alternative to HOL [NPW02]. In some ways it emulated the latter, particularly in its axiom system. Nevertheless, it is thoroughly Isabelle in its treatment of theorems, tactics, etc; we cannot regard Isabelle/HOL as a member of the HOL family the way that HOL Light is.

Both Isabelle and HOL implement Milner’s idea of a proof kernel implementing a formal calculus as an abstract type called thm. The essential difference is that Isabelle’s formal calculus is a logical framework in which other formalisms can be defined, while HOL’s is simply higher-order logic. Either way, a function of type thm->thm implements an inference rule. In the case of Isabelle, this would be an operation at the level of the logical framework, such as the resolution rule through which proofs are constructed. In the case of HOL, this would be an inference in higher-order logic itself, like the quantifier rule (1). In HOL, a tactic must be coded separately corresponding to each rule of inference, while in Isabelle, inference rules such as (1) are all expressed declaratively and applied using the generic resolution rule.

Isabelle’s generic nature can be seen in how many of its capabilities are shared among its various instances. The common libraries include a general representation of syntax, with parsing, pretty printing, type checking, simplification and other proof procedures, as well as the proof language and user interface. They are available in other instances of Isabelle, such as Isabelle/ZF, which has been the basis for substantial developments in Zermelo–Fraenkel set theory [Pau03].

5 Automation

It was clear from the outset that machine proof was extremely laborious and could only be feasible if the machine itself provided as much automation as possible. But it was also clear that fully automatic theorem proving was not practically achievable. Significant advances in automation had already been made by the end of the 1970s:

  1. decision procedures for arithmetic, arrays, lists, etc., as well as methods for combining decision procedures [NO80];

  2. resolution for first-order logic [Rob65, Ove75], complete in principle but frequently disappointing in practice;

  3. the signature automation of the Boyer/Moore theorem prover [BM79]

    : conditional rewriting plus powerful heuristics for induction.

Of these, simplifiers based on rewriting (by previously proved theorems of the form ) quickly found their way into LCF, HOL, etc. All such simplifiers eventually supported conditional rewriting, for rewrites of the form ; in such a case, the necessary instance of would be proved recursively by the simplifier itself. Arithmetic decision procedures were eventually adopted in many systems. On the other hand, resolution had acquired a bad reputation. As late as 2002, Shankar could write

The popularity of uniform proof methods like resolution stems from the simple dogma that since first-order logic is a generic language for expressing statements, generic first-order proof search methods must also be adequate for finding proofs. This central dogma seems absurd on the face of it. …A more sophisticated version of the dogma is that a uniform proof method can serve as the basic structure for introducing domain-specific automation. There is little empirical evidence that even this dogma has any validity. …Automated reasoning has for too long been identified with uniform proof search procedures in first-order logic. This approach shows very little promise.

[Sha02, pages 3–4]

Of the various proof assistants, only Isabelle saw a sustained effort to incorporate ideas from resolution.

5.1 The classical reasoner

As mentioned in Section 3 above, Isabelle supported both unification and backtracking from the start, with the aim of incorporating ideas from first-order automatic proof procedures. In the context of interactive proof, unification provided the ability to prove a subgoal of the form by removing the quantifier and proving , where stood as a placeholder for a concrete term to be supplied later. Through unification, this term could even be built up incrementally. Dually, unification provided a means of using a universally quantified fact , when the required instances were not immediately obvious.

Simple automation is achievable through a combination of obvious applications of the propositional connectives (, , , etc.) along with heuristics for performing quantifier reasoning. Stronger automation is obtainable by borrowing well-known techniques for classical first-order logic theorem proving. But the most important idea is to embrace the concepts of natural deduction in application theories as well as in pure logic. Natural deduction prefers the use of simple inference rules focusing on a single symbol. For example, conjunction is effectively defined by the following three rules:

The intersection of two sets has a technical definition that would greatly complicate reasoning, but it is easy to derive inference rules for intersection in the style of natural deduction (and analogous to those above):

Many other reasoning steps can be expressed similarly:

So the crucial idea is to build proof tactics that support reasoning with inference rules of this sort. Though they borrow techniques from first-logic theorem provers, they are far more effective than expanding the various operators into their low-level definitions and attempting to prove the resulting formulas of pure logic. And note: they would be formulas of higher-order logic, where automation is considerably more difficult than for first-order logic.

Such tactics have collectively become known as Isabelle’s classical reasoner [Pau99]. A principle of natural deduction is that the syntactic form of each rule clearly identifies which symbol it is concerned with. And therefore hundreds of such rules can coexist in the classical reasoner without causing a combinatorial explosion. The user who invokes auto—which combines classical reasoning and simplification—gains the benefit of built-in knowledge about everything in Isabelle’s standard libraries. As users build their own libraries, they can continue to augment this knowledge. This combination of classical reasoning with rewriting is still unique to Isabelle.

5.2 Sledgehammer

By the early 2000s, resolution theorem provers [RV02, Sch04, Wei01] were demonstrating power far beyond anything the classical reasoner could ever achieve. The idea of some sort of interface between Isabelle and these systems was beguiling. This idea wasn’t new: such linkups have been attempted on a number of past occasions, always unsuccessfully. The key was to make such a linkup useful:

The guiding idea is that user interaction should be minimal. The system should invoke automatic provers spontaneously or in response to a trivial gesture such as a mouse click. These proof attempts should run in the background, not disturbing the user unless a proof is found. Proofs should refer to a large library of known lemmas: users should not have to select the relevant ones. The automatic prover should not be trusted; instead, proofs should be translated back into the formalism of the interactive prover. Proofs should be delivered in source form to the user, who can simply paste them into her proof script. [MQP06, p. 1576]

A major difficulty with building an interface between Isabelle/HOL and first-order automatic theorem provers is that they operate on quite different formalisms. Isabelle/HOL has -abstractions, types and type classes, while first-order logic has none of these. One click invocation could only be achieved if the system itself took care of everything: the translation of higher-order syntax, some representation of type constraints, etc. An additional requirement was relevance filtering: to identify the most suitable of the thousands of facts available in an Isabelle session, since providing too many would overwhelm the first-order provers.

The final difficulty was to translate the proofs discovered by the external reasoning tools into Isabelle’s proof kernel. This translation moreover had to be expressed as a source-level proof so that the expensive proof search would not have to be repeated. Resolution theorem provers typically print a formal justification of their reasoning, but it is difficult to interpret and frequently ambiguous. The approach eventually adopted was to extract from their output nothing but the list of axioms—each a known Isabelle theorem—used in the proof. This list would typically contain no more than eight axioms, yielding a problem simple enough to be proved by simple tactics integrated with Isabelle’s kernel [PS07]. This tool became known as Sledgehammer. The original version was developed at Cambridge, but Sledgehammer was comprehensively rewritten at Munich, particularly by Blanchette [BBP13], who greatly increased its scope and power.

Sledgehammer helps beginners by identifying and using theorems that they didn’t know existed. It puts the latest theorem provers, such as Vampire and Z3 [dMB08], at their disposal with a single click (Figure 1). But even advanced users are often surprised by the proofs Sledgehammer comes up with. It is now seen as indispensable, and similar subsystems are being developed for other interactive theorem provers.

Figure 1: Sledgehammer GUI in Isabelle/jEdit: clicking on highlighted output inserts the proposed proof snippet into the text

6 Counterexample Search

Isabelle’s proof methods and Sledgehammer are effective for proving theorems, but given an invalid conjecture they normally fail to detect the invalidity, let alone produce an informative counterexample. Novices and experts alike state invalid theorems and find themselves wasting hours on impossible proofs. To make proving more enjoyable and productive, Isabelle includes counterexample generators that complement the proof tools. The main ones are Quickcheck and Nitpick. As a simple example, suppose that the user types in this lemma statement (where rev reverses a list and appends two lists):

lemma rev xs  ys  rev xs  rev ys Quickcheck is invoked automatically and displays the counterexample xs = a, ys = a.

6.1 Quickcheck

As the name suggests, Quickcheck is Isabelle’s counterpart to the QuickCheck testing tool [CH00] for Haskell. The original Quickcheck [BN04] combined Isabelle’s code generation infrastructure (Section 7) with random testing, covering both recursive functions and inductive predicates. It aimed at providing fully automatic counterexample search, in contrast to Haskell’s QuickCheck, which is an infrastructure for building specialized random generators for testing. Therefore it worked automatically whenever all functions and inductive predicates were executable, but, depending on the property, it might take forever to find a counterexample. Consider especially conditional properties where most of the random values simply falsify ; this situation requires a special random generator that yields only values that satisfy .

Therefore Bulwahn [Bul12a, Bul12b, Bul12c, BBN11] refined Quickcheck in three directions:

  • Exhaustive enumeration of small values, an idea due to Runciman et al. [RNL08].

  • Generating values not randomly but synthesizing generators from the premises of conditional properties.

  • Symbolic testing with narrowing, a technique from functional-logic programming [AH10].

Isabelle’s Quickcheck was inspired by the work of Dybjer et al. [DHT03] in the theorem prover Agda: they had followed the original QuickCheck design, which expected users to set up specialized random generators in the host language/logic. Over the next decade, other major theorem provers adopted analogous checkers: PVS [Owr06], ACL2 [CDKM11] and Coq [PHD15].

Quickcheck is one of Isabelle’s best-loved tools (after Sledgehammer), partly because it is invoked automatically and silently every time the user types in a lemma. You may not even be aware of its existence and it suddenly announces that it has found a counterexample to your purported lemma. The result is a surprised and grateful user. Quickcheck is most effective in the context of functional programming combined with inductive predicates.

6.2 Nitpick

A radically different approach to Quickcheck is based on systematic model enumeration using a SAT solver. This approach was pioneered by the tool Refute [Web05, Web08] and is now embodied by Nitpick [BN10, Bla12]. Nitpick looks for finite fragments (substructures) of infinite countermodels, soundly approximating problematic constructs. Common Isabelle idioms, such as inductive and coinductive predicates and datatypes as well as recursive and corecursive functions, are treated specially to ensure efficient SAT solving [Bla13]. The actual reduction to SAT is performed by the Kodkod library [TJ07]. Given a conjecture, Nitpick (via Kodkod and the SAT solver) searches for a standard set-theoretic model that falsifies it while satisfying any relevant axioms and definitions. Nitpick is innately better suited to problems from set theory and logic than Quickcheck. Nitpick revels in particular in finite combinatorial problems.

The first tool that exploits SAT-solving for finding counterexamples in a theorem prover (ACL2) seems to be due to Sumners [Sum02]. A second prototype tool, again for ACL2, was developed by Spiridonov and Khurshid [SK07] and was based on Kodkod. Blanchette was unaware of this work when developing Nitpick.

7 Code Generation

Code generation is the process of generating efficiently executable code from definitions in the logic of a theorem prover. It serves three main purposes: to obtain actual runnable software, to validate definitions by executing them on concrete values, and to search for counterexamples to properties by testing.

Most major theorem provers are based on logics that have an executable sublanguage. In the Boyer and Moore theorem prover, from its earliest incarnation [BM79] to present-day ACL2 [HKMS17], the logic is itself a purely functional fragment of Lisp: all expressions can be executed according to Lisp semantics. In Martin-Löf type theory [ML84], the term language — including proof terms — has a well-defined operational semantics. Terms are also executable, subject to certain conditions, in other type-theory based systems such as Coq [BC04]. Note that the types (which correspond to the formulas of predicate logic) are not executable.

In the case of HOL, a sublanguage can be identified that corresponds to a functional programming language. Then we superimpose an operational semantics on this fragment: programs are sets of equations that are to be used as rewrite rules, from left to right. This idea goes back to term rewriting and was expressed succinctly by titles like Computing in Systems Described by Equations [O’D77] and Programming with Equations [HO82]. The step-by-step execution of an equational program by rewriting corresponds to performing a proof in equational logic. If the rewrite rules come from recursive function definitions in the first place, it is natural to translate them to programs in an ML-like functional language with pattern-matching. The first such tool was created by Rajan [Raj93] for Gordon’s HOL system [GM93].

7.1 History

Berghofer and Nipkow [BN02] realised the approach above, creating a compiler in Isabelle from a subset of HOL into ML. Datatypes and recursive functions defined by pattern matching are translated directly into their ML counterparts. They also extended the approach by making a subset of inductive predicates executable. The idea is to view them as Prolog programs and to perform a mode analysis [DW88]. Each possible mode partitions the arguments of an inductive predicate into inputs and outputs, and for each mode the inductive predicate is compiled into an ML function that maps an input tuple to a stream of output tuples. A subset of HOL terms and certain queries involving inductive predicates can now be compiled and executed in ML. This formed the basis of Isabelle’s initial Quickcheck (Section 6.1).

Isabelle’s code generator is part of the trusted kernel. Work by Haftmann [Haf09] and others improved its reliability and functionality significantly: code generation for inductive predicates was moved out of the kernel by translating inductive predicates into recursive functions inside HOL, where the equivalence is proved [BBH09]. Reliability of the compilation of the purely functional sublanguage was improved by a pen-and-paper correctness proof [HN10]. Code generation was extended to type classes by eliminating them in a first step. Code generation was extended to support data refinement [HKKN13], i.e. the automatic implementation of abstract types like sets by concrete types like search trees. Due to a new modular design, further target languages are easier to add: as of this writing, the code generator supports Standard ML, OCaml, Haskell and Scala.

Most recently, Hupel [HN18] has provided an alternative verified code generator that translates HOL into CakeML, an ML-like functional language with a verified compiler [KMNO14]. CakeML is the only backend for code generation that doesn’t have to be trusted, since the CakeML compiler has itself been verified formally. This yields a verified tool chain from HOL to machine code, except that the CakeML compiler was verified in HOL4 [SN08] rather than Isabelle/HOL. Eliminating this gap is the subject of current research.

7.2 Applications

The code generator has been an enabling technology for a large number of applications. The following are some representative examples:

Imperative HOL

Bulwahn et al. [BKH08] developed a monadic embedding of imperative programs in HOL. That is, on the HOL level everything is still purely functional, but in a monadic style. They extended the code generator such that it translates these monadic definitions into actual imperative code in the target language (SML etc.). This leads to substantially improved performance and is used in a number of applications below.

Refinement Framework

Lammich [Lam13, Lam16, Lam19] has developed an HOL framework for the stepwise refinement of (possibly nondeterministic) algorithms down to (possibly imperative) executable code. This framework is used in a number of applications below. Lammich and others [LT12, Lam14, LS19] have used the framework extensively for the verification of efficient graph algorithms.

Model Checking and SAT Solvers

Esparza et al. [ELN13] developed an executable verified model LTL model checker (10–50 times slower than SPIN [Hol97] on standard benchmarks) that was later extended with partial order reduction [BL18]. Brunner and Lammich built on work by Peled [Pel96] but found that one of his lemmas was incorrect; thus they were unable to use his actual reduction algorithm. Siegel [Sie19] found a counterexample to the correctness of Peled’s algorithm with the help of the Alloy analyzer [Jac06].

Wimmer and Lammich [WL18] developed an executable model checker for timed automata whose throughput is about one order of magnitude lower than Uppaal’s [LPY97] on standard benchmarks (but degenerates for large state spaces).

Lammich [Lam17] verified an executable checker for unsatisfiability certificates emitted by SAT solvers which is twice as fast as the standard unverified checker.

Term Rewriting

Thiemann has been developing a huge formalisation of the theory of term rewriting called IsaFoR/CeTA555 over more than a decade now [TS09]. Initially, IsaFoR/CeTA was aimed primarily at checking termination proofs found by automatic tools like AProVE [GAB17]. The code generator produces these proof checkers from their verified HOL formalizations. Today, IsaFoR/CeTA can also check proofs of a term rewriting system’s complexity [DJK18] and confluence [NM16].

Computer Algebra

As representative examples we mention decision procedures for univariate real polynomials (based on Sturm [Ebe15] and on cylindrical algebraic decomposition [LPP19]) and the Berlekamp-Zassenhaus factorization algorithm [DJTY19].

Programming Languages

One key application of theorem provers has been the formalization of programming languages and compilers [Ler09, KMNO14, NK14]. The code generator was used by Lochbihler and Bulwahn [LB11] to generate an interpreter directly from the semantics of a Java-like language with threads.

8 Structured Proofs, Structured Specifications and Formal Contexts

In the beginning, formal proofs were tiny and the ML proof scripts were readable enough. A typical proof goal involved just a few assumptions involving two or three bound variables. But as the field progressed, researchers tackled increasingly ambitious problems. The longer proofs got, the more incomprehensible they became. Moreover, the traditional LCF approach of tactical proof had a tendency to retain too much, so that the user might be faced with a list of several dozen assumptions. These were not merely overwhelming to the eye but caused automatic proof procedures to bog down: their execution time could rise exponentially.

The Isar proof language, introduced in the late 1990s, addressed these concerns by allowing proofs to be structured into nested scopes. Local goals were proved from local assumptions, which were written out explicitly. Block structure is well understood in computer science, but here the ability to make declarations locally had to be retrofitted into Isabelle. This led to the idea of local contexts to encapsulate the assumptions (and associated bound variables) specific to a particular goal, along with other information. During an ambitious proof, the user may still have several dozen assumptions available, but these are now structured through the nesting of the contexts and accessible by name rather than in a single giant list.

Locales are a further structuring mechanism for expressing an extended series of proofs that rest on shared assumptions. They are useful for developing abstract mathematics, such as group theory, which can then be applied to particular groups.

8.1 Structured Proofs: the Isar Language

A distinctive feature of Isabelle is its Isar language of structured proofs [Wen07]; the acronym stands for Intelligible semi-automated reasoning. In the original LCF paradigm, a proof could be arbitrary ML code, which in some later systems was replaced by a command language for proofs (e.g. the Ltac scripting language in Coq). The problem with these traditional approaches is that somebody looking at a machine proof can have no idea what is being proved at a given point: it is like playing blindfold chess. In contrast, an Isar proof is a hierarchical structure containing explicit statements of assumptions and conclusions, with an indication of the use of local facts. Isar also provides some mechanisms to avoid redundancy: it allows the proof author to achieve a good balance of readability versus maintainability, such that small changes to definitions and theorem statements should lead to reasonably small changes to proofs.

Here is a tiny Isar proof that implicitly uses some derived rules for logical connectives taken from the library (similar to the classical reasoner from Section 5.1):

have A  B  B  A  proof    assume  A  B    show B  A    proof      from  show B       from  show A     qed  qed Here we prove a formula, A  B  B  A. We first assume A  B, giving it the label . Then we show B  A, treating B and A separately. Those subproofs refer to the assumption via its label.

The Isar approach scales from a few primitive inferences, as above, to large proof developments involving heavy automated reasoning tools, allowing the user to control the extent of proof automation. The proof engine is able to check well-structured Isar proofs more efficiently than traditional tactic scripts: the hierarchical structure helps to keep internal goals concise, without the intrusion of redundant assumptions or unused lemmas. Moreover, the compositionality of Isar proofs, together with the proof irrelevance of the Isabelle framework, allows independent checking of sub-proofs—even with parallel checking on multiple cores enabled by default, due to Matthews and Wenzel [MW10, Wen13b].

The Isar proof language reuses some ideas from Mizar, a legendary software tool for doing mathematics by machine [GKN15]. But the many impressive facilities of the Mizar language are intertwined with its unusual set theoretic formalism, making the key ideas difficult to extract, especially given Mizar’s notorious lack of documentation and closed sources. In contrast, Isar’s generic principles of proof—fixing local variables and making local assumptions—are identified with the corresponding elements of Isabelle’s logical framework. Thus it works with any object-logic that uses the builtin Natural Deduction paradigm of Isabelle, e.g. HOL, FOL, ZF. Various derived Isar language elements provide explicit proof structure for typical reasoning seen in object-logics: existential elimination and case-splitting, calculational chains of equalities and inequalities (including substitution), structured induction etc. All of this is parameterised by declarations in the library.

The Isar proof language is parametrised by user-defined Isar proof methods, which are the old idea of ML tactics fitted into the richer structure of the Isar engine. Thus new reasoning patterns may be added to a theory library without revisiting the design of the proof language itself. For example, the Isabelle standard library defines method rule for declarative forward/backward chaining explained in Section 3, and method auto for a combination of the simplifier and the classical reasoner explained in Section 5.1. Proof methods are either defined in ML (as in LCF or HOL), or in the Eisbach language [MMW16] (similar to Ltac definitions in Coq): Eisbach uses the source notation of existing proof methods to define new ones via simple recursion and pattern matching.

Despite its importance for the Isabelle ecosystem, the Isar proof language has not been adopted by other interactive theorem provers. There are no fundamental obstacles to doing so, but it takes some effort to do properly. Some isolated aspects of Isar have made it into the SSReflect language for Coq [GM10], notably the have keyword for local claims within a proof. A bit more Isar syntax made it into the Lean prover [dMKA15] with slightly different meaning, though. Lean’s fix, assume, have, show, obtain construct certain -terms in a more elementary manner than the Isabelle/Isar proof context export and goal refinement operations; e.g. see the treatment of fix-assume-show in [Wen07, §2.2]. The experiments by Wiedijk towards supporting structured proofs in HOL Light [Wie01] are not directly related to Isar. They are simply a family of HOL Light tactics that allow a proof script written in ML to have some similarity to a Mizar text. This style of working has not caught on in the HOL world.

Although some detailed aspects of Isar are specific to Isabelle—forward/backwards refinement via higher-order unification (Section 3) and local proof contexts (Section 8.3)—the language design as a whole is generic. It requires nothing from the underlying calculus but primitive notions of terms and types and logical notions equivalent to “implies” and “for all”. It’s not too hard to envisage a common language to be shared among proof assistants, offering portability of proofs exactly as today’s programming languages offer portability between different machine architectures. The crucial missing ingredient is the ability to write assertions and prove subgoals while minimising explicit references to a particular calculus.

8.2 Global Theory Context

The original LCF approach (Section 2) declares an abstract ML type thm of theorems, but the background theory is implicit in the system state: during a session there is only one big theory under development, which grows monotonically. There are primitives to introduce new types, terms, definitions and axioms which augment the theory, but without any way to undo such a change. HOL provides some tricks to hide unwanted constants via name-space manipulation, to help interactive development.

In contrast, as early as 1986, Isabelle declared the abstract ML type theory alongside the type of theorems. Type theory makes available the initial theory of the logical framework and allows for its extension as an acyclic graph of application theories: there are operations to extend and merge theories. Multiple theories can coexist in a single Isabelle/ML session, but end-users only work with a single theory document at a time. This may import other theory documents—in foundational order from bottom to top. The concrete syntax looks like this:

theory Test    imports Main HOL-LibraryFiniteMap HOL-LibraryFiniteLattice  begin    definition constant  term    theorem name statement    end Here we declare a new theory, called Test. It is built upon three other theories: Main, FiniteMap and FiniteLattice. The last two names are qualified with the Isabelle session to which they belong: HOL-Library. Qualified names eliminate the danger of name clashes between theories belonging to different sessions. Between the begin and end brackets we can make definitions and prove theorems.

Proved results are formally certified against their original theory context, e.g. a theorem for theory , and are implicitly propagated to another theory , provided that extends by construction. For efficiency, this theory relation is implemented via symbolic stamps that represent definitions, proofs, etc., extending a theory: actual theory content is not compared. Stamp inclusion needs to be checked in every inference step: its complexity is logarithmic in the number of theory extensions, which can be many thousands in typical applications. These mechanisms lie within the logical kernel.

In early versions of Isabelle as a logical framework, theories coincided with object-logics plus some examples on top of them: Isabelle/FOL, Isabelle/ZF, Isabelle/HOL etc. Derivations in different branches of the theory graph could coexist in a single file without interference. Today, theories are usually derived from the Main entry-point of Isabelle/HOL and built as a natural hierarchy according to the structure of the application. It helps users to organise formal definitions and proofs like consecutive chapters in a book; it also helps the system for parallel checking of independent paths in the theory graph. Figure 2 shows the theory hierarchy of the Isabelle/HOL number theory development. At the top are sessions such as Pure and HOL-Library; in the middle are theories such as Totient, all of which are imported into Number_Theory.

Figure 2: The theory dependencies of session HOL-Number_Theory

Isabelle theory operations are purely functional updates, making undo trivial: the earlier versions continue to exist and can be returned to. This also includes add-on content like the ML environment or hints for proof tools: thus the theory context provides a default set of parameters, according to the imports from the library.

In summary, Isabelle theories provide large-scale structure to formalisation projects, with a built-in notion of monotonic reasoning over an acyclic graph of theory nodes.

8.3 Local Proof Context

LCF and the HOL family lack an explicit notion of proof context. There are usually some auxiliary structures to manage the current proof state (maybe just a list of subgoals) for interactive theorem proving. However, proof tools cannot refer to those; they merely see an isolated goal. Such a goal is essentially a sequent: a list of formulas (the assumptions) paired with another formula (the goal itself). Proof assistants based on dependent type theories (like Coq) do have a formal context that declares variables, but this again is essentially a sequent. When applying HOL tactics it is frequently necessary to re-state the types of variables present in the statement of the theorem being proved, and one can even introduce two variables called x with different types. So at its most basic, we need a context to associate a type with each variable involved in the current proof. In fact, they do much more.

A genuine Local Proof Context implemented by the ML type Proof.context first appeared in Isabelle99, as infrastructure for the then emerging Isar proof language (Section 8.1). The idea is to support a block-structured notepad with recursive nesting of local declarations (e.g. type parameters, term parameters, assumptions) and local conclusions that are generalized when leaving a nested block: there is a generic export operation to move results from a nested context into the enclosing context: this usually turns context elements into the rule structure of the Isabelle logical framework (the quantifier and connective ). In concrete syntax, this looks as follows:

notepad  begin          fix x y z      assume A x and B y      have C x y z          note x y z A x  B y  C x y z  end

Here the final note recalls the result from the preceding proof block (enclosed in the curly brackets). This block introduces three bound variables, assumed to be fixed and to satisfy A x and B y; from these we prove C x y z. The effect of this block is to prove the formula shown in the note.

The notepad leaves nothing behind in the theory context: it is merely an experiment. In contrast, a theorem statement produces an initial proof context with a pending claim: the theorem to be proved. The subsequent proof body needs to solve that in the context, potentially with further nesting of local contexts and auxiliary claims. The final result extends the enclosing theory (section 8.2) by the new fact.

Proof tools may access the Proof.context value at each point. It contains the logical content (parameters, assumptions, proved facts), local syntax, and hints to guide proof strategies. Thus the universal Proof.context replaces Isabelle’s earlier tool-specific contexts like simpset for the simplifier or claset for the classical reasoner. This uniformity is important to combine tools: e.g. the simplifier uses the context to extract its own information (rewrite rules and auxiliary proof procedures); during the simplification process it augments the context and passes it on to other procedures, which in turn extract their own information from it (e.g. for classical reasoning). Users can even extend the context with additional components to support proof procedures specific to their own applications. Contexts provide extensibility along with modularity.

In summary, a proof context in Isabelle represents a local situation derived from the enclosing theory. Nested contexts appear and disappear; only exported results remain. For proof tools, the context is a universal environment for storing tool-specific information.

8.4 Structured Specifications: Locales

Isabelle locales provide an infrastructure for structured specifications: definitions, statements and proofs of a theory may depend on local parameters (type and term variables) and local premises (hypotheses). The subsequent example specifies partial orders axiomatically and defines a derived operation and theorem in that specification context:

locale partialorder     fixes le  a  a  bool  infixl  50    assumes refl x  x      and trans x  y  y  z  x  z      and antisym x  y  y  x  x  y This introduces a new locale, called partialorder. The locale declares le, effectively a constant to which we attach infix syntax. The locale asserts what are effectively axioms. But in reality, a locale abbreviates a predicate taking in this case a single argument, asserting that it satisfies the given assumptions. The point is to allow this small specification to be imported simply by quoting the name partialorder.

definition in partialorder    less  a  a  bool  infixl  50    where x  y  x  y  x  y  theorem in partialorder    lessletrans x  y  y  z  x  z   Now we can make definitions and prove theorems with respect to this locale, its constants and assumptions visible within these declarations.

Locales may be combined via locale expressions, to rename or instantiate parameters and merge contexts. Locale interpretation imports a given instance of a locale expression into an application context: after proving the locale assumptions as theorems, all conclusions of the locale context become available as facts. Here is an example that instantiates abstract partial orders to natural numbers:

interpretation nat partialorder   nat  nat  bool    rewrites natless    nat  nat  bool  

Thus le is instantiated as the standard order on type nat, and the derived operation less is identified with the corresponding strict order on nat (imposing a proof obligation that needs to be proven together with the other locale axioms). Afterwards, all conclusions from the partialorder context become available in terms of these native signatures on nat.

Locales are ubiquitious in Isabelle theory development today, but the majority are actually defined as type classes (section 4). These concepts started out independently, but were unified by Haftmann and Wenzel [HW06]. This is another application of the locale interpretation concepts, which are due to Ballarin [Bal06].

There are many ways of structuring mathematical developments via locales. One is simply to package up a group of related assumptions that are needed in a long series of proofs: this avoids cluttering up the theorem statements with this common material. Moreover, locales are more general than type classes. A type can belong to a class in only one way, so duality arguments (reversing the direction of a partial order for example) are out of the question; locales however can be instantiated in multiple ways at the same time. And while the type class for groups imposes the group structure on entire types, a locale for groups can have the carrier of the group as an explicit component of the locale; it can therefore be an arbitrary set. This generality is necessary to develop algebra properly. Isabelle’s HOL-Algebra session develops a substantial amount of group theory using this approach.

Ballarin [Bal14] provides a comprehensive overview of the concepts, and a tutorial on locales is part of the standard Isabelle documentation. Many people have participated in the development of locales over almost 15 years. The original work by Kammüller, Wenzel, and Paulson [KWP99] directly uses primitives of the Isabelle framework; important abstractions on top of proof contexts (section 8.3) were introduced by Haftmann and Wenzel [HW09].

LCF did not have any concept comparable to locales. But the need for structuring mechanisms has been evident for a long time, and a variety of ideas were tried in systems as diverse as AUTOMATH, Coq and HOL. The original locale concept was inspired by experiments (since abandoned) that had been done in HOL using higher-order predicates. Coq has a richer logic with a built-in notion of “sections”: the original Isabelle locales from 1999 [KWP99] were inspired by that, but without augmenting the logical framework of Isabelle. This design principle of building higher concepts without extending the existing logic was later adopted by Coq: its type classes are built on top of existing concepts like records, predicates, and implicit arguments.

8.5 ML within the Logical Context

LCF and HOL can be seen as being nothing but libraries of proof procedures written in ML. The user can call those procedures via the ML toplevel but does not have to prove theorems and could instead, say, calculate to 10,000 digits. In this sense, using LCF or HOL is the same as working with any other subroutine library.

This type of ML toplevel no longer exists in Isabelle. Instead, ML has become a sub-language of the framework. Syntactically, ML expressions (or whole modules) may appear within the theory and proof language of Isabelle/Isar: for example, in the command method_setup to define a new proof method. Semantically, program snippets may depend on symbolic entities from the logical context: types, terms, facts, etc. The ML compiler is invoked at run-time within an augmented environment; it refers to logical entities as well as its own environment (for ML types, values, modules). The static result is an updated ML environment within the context: thus it also conforms to the parallel evaluation model of Isabelle/ML.

Specialised proof procedures can be implemented in this manner within the normal theory document, using the regular Prover IDE (see Section 9). This integration of programming with logic works without augmenting the logic: like in original LCF, ML has access to the implementation of the object-logic, but is not part of the logical formalism. This is in contrast to Coq, where users often implement proof tools inside the logical language itself (with correctness proofs), but genuine extensions in OCaml need to be assembled outside the system as “plugins”.

9 Document-oriented Interaction: the Prover IDE

The now ubiquitous WIMP interface (windows, icons, menus, pointer) emerged in the late 1970s, around the same time as LCF. Many observers suggested that the tedium of theorem proving could be addressed by involving those new ideas. However, typical suggestions did not address the real difficulties. A favourite was to let users point to terms that must be rewritten. But in most situations it is infinitely more effective to execute such steps automatically, driven by an algorithm. Think of dragging 10,000 files one-by-one to a trash can icon when they could be deleted by a single command specifying a pattern. Many suggestions were geared to the needs of novices rather than to the professionals who would be the main users.

Useful interfaces for theorem provers would not appear for 20 years. What users really needed, it seems, was the ability to survey the situation around them: what has been proved, what remains to be proved, what theorems are available, where and how they were proved, etc.

9.1 Prover Interfaces: The Early Days

The original LCF proof assistant from 1979 used a line-oriented terminal or teletype. This model is known as a read-eval-print loop (or REPL): the user types one command after another, reacting on output printed by the prover. When computer screens and multiple windows arrived, there was often a split into two areas: the editor to work on a growing “proof script” and the terminal with the REPL to update its state, using manual copy-paste operations from the editor.

Around 1998, the highly influential Proof General interface for Emacs was released for the first time [Asp00], including support for Coq and Isabelle. Here a refined model of copy-paste and state-synchronization is baked into the editor (which is freely programmable in Lisp): the user can move a frontier of already checked text either forwards (apply command) or backwards (undo command); only the unchecked part may be freely edited. Proof General requires suitable undo operations of the prover, and for robustness it is better to have a restricted command-language instead of arbitrary ML. Both are missing in HOL (any version), which consequently still uses the original prover REPL, with some support through Emacs.

Today, Coq remains as the main back-end for Proof General Emacs, there is also a popular Proof General clone written in OCaml: CoqIde. In contrast, Isabelle discontinued both the REPL and its Proof General mode in 2014: interaction now works exclusively via the document-oriented Prover IDE.

9.2 The Isabelle Prover IDE (PIDE)

From 2008, multi-threaded ML programming became routinely available in Isabelle, for parallel processing of theories and proofs within a single Poly/ML process. This posed some challenges to the robustness and performance of the prover engine, addressed subsequently by Matthews and Wenzel [Wen13b].

Parallel processing is also in conflict with the traditional interaction model: the REPL acts like a single focus of single-threaded command application. In order to remove many built-in assumptions of sequential evaluation from the interaction model and to provide rich semantic information in proof authoring process, Wenzel introduced the document-oriented Prover IDE (PIDE) approach [Wen11, Wen13a, Wen14, Wen19]. The main principles of PIDE are as follows:

  • The prover supports document edits and markup reports natively. Interaction works via protocol commands (like Document.update) that take regular prover commands as data (e.g. definition, theorem). It has its own policies to process proof documents in parallel, according to the structure of the text.

  • The editor connects the physical world of editor input events and GUI painting to the mathematical document-model of the prover. There are pipelines to stream input and output events asynchronously, with explicit identification of document versions.

  • Add-on tools may participate in the ongoing document processing by conventional means, as isolated functions from input to output that are managed by PIDE. External tools merely need to ensure that interrupts work correctly: this is required when the user continues editing and old versions of the document are discontinued eventually.

Unlike Proof General, PIDE never locks the source text: edits by the user may lead to instantaneous updates by the prover, or significant delays for slow proof tools. Thanks to overall performance improvements of Isabelle and its underlying Poly/ML implementation, this ambitious interaction model works smoothly.

Isabelle/PIDE is delivered to end-users as a fully integrated desktop application called Isabelle/jEdit: it is based on the Java-based text editor jEdit (see This explains the initial motivation to use the Java platform for the outwards facing side of PIDE, which is implemented in Scala (on the JVM). Non-Java editors (e.g. VSCode) may be connected to PIDE via an extra socket connection that exchanges JSON records. There is also a Headless PIDE server with a similar protocol; this allows PIDE to run under control of another program.

Generally speaking, Isabelle/ML works best for pure applications of mathematical logic inside the prover, but Isabelle/Scala allows us to connect to the physical world: IDE front-ends, database engines, TCP services etc. Such technologies are not available in the same quality in Standard ML, nor even in OCaml (which underlies HOL Light and Coq).

10 The Archive of Formal Proofs

Proof libraries are of enormous importance to formal verification. They are analogous to software libraries, facilitating reuse and eliminating the need to construct everything from scratch.

The Isabelle distribution already comes with a basic collection of more than 700,000 lines (38 MB) of Isabelle/HOL theories. On top of it sits the Archive of Formal Proofs (AFP, see, a large online collection of proof developments contributed by the Isabelle community, all of them (as of this writing) for Isabelle/HOL. Each entry or article is a collection of Isabelle theories. It is the sole shared library of the Isabelle community. The AFP was launched in 2004 and at the time of writing contains 480 articles written by 322 authors. These comprise 2,250,000 lines (152 MB) of Isar text proving 134,000 theorems and supporting lemmas. It covers both computer science and mathematics.

As with a scientific journal, there is a small editorial board and submissions are reviewed for proof style and relevance. The AFP is an online resource and therefore more dynamic than a normal scientific journal. Articles can and do evolve. This conflicts with the purpose of archiving entries as they have been submitted and with the purpose of providing a stable interface to users. However, true preservation requires ensuring that entries continue to work as Isabelle itself evolves. The AFP deals with this conflict as follows. For each Isabelle release there is a corresponding AFP release. There is a separate development version of the AFP that is updated by Isabelle developers and AFP authors: Isabelle developers maintain all entries to be up to date with the current Isabelle development version; authors can update their articles monotonically by adding further material while ensuring that all entries that depend on theirs still work.

As Isabelle evolves, the self-imposed requirement to maintain all AFP articles in working order puts a burden on the Isabelle developers. At the same time, it acts as a reality check, helping developers to evaluate the impact of their changes on the user community. Isar’s structured proofs assist the maintenance effort by localising the impact of changes. Sledgehammer is also invaluable when fixing unfamiliar proofs.

The model for the AFP is the Mizar Mathematical Library [Miz, BBG18] which was started in 1989. The statistics today are similar: 250 authors, 1300 articles, 60,000 theorems, 3,000,000 lines (97 MB) of text. Of course one has to take into account that the languages and logics are different and Mizar has less proof automation.

11 Postscript: Synergy between Ideas

We have looked at a wide variety of ideas: a proof kernel written in a functional language; a logical framework to support multiple formalisms; polymorphism and type classes; advanced forms of automation; a structured proof language; a unique Prover IDE. Having seen the ideas in isolation, it’s worth looking at how they work in combination.

Sometimes one idea led to another straightforwardly. Unification and backtracking were included by design to support future automation. The low-level primitives of the logical framework (implication and universal quantification) provided the right foundation for the Isar language precisely because they were the most fundamental logical concepts.

In other cases, the synergy between ideas could not have been predicted. It’s remarkable that the early decision to adopt a polymorphic functional programming language (ML) turned out to be crucial 40 years later: the Isabelle Prover IDE is intertwined with pervasive parallelism. The Isabelle code base was not purely functional, but it was close enough, and the few sections that were necessarily imperative could be isolated easily. This yields an impressive speed up in multicore environments, a machine architecture nobody could have expected in 1975.

Another example of synergy is between automation (notably Sledgehammer) and the Isar language. The latter allows us to write a derivation as a chain of simple steps, which can be proved automatically by the former. If a link of this chain needs a different sort of proof, such as induction, a nested scope can be inserted on the spot. This technique is particularly helpful to beginners, who would otherwise have to learn a great many specialised proof methods for transforming one formula into another, as with other proof assistants. But even experts don’t have to think so hard when writing proofs, which are also easy to read because the chain of steps is written out explicitly. Powerful automation means the proofs don’t have to be too detailed.

The earliest design decisions, dating from Edinburgh LCF, still make sense 40 years on. Our choice of a polymorphic functional language, a minimal proof kernel and no stored proofs yields good performance with a minimum risk of soundness errors; contrast that with alternative choices in many other automated theorem provers. Our focus on pervasive automation and readability must be contrasted with the prevailing tendency for low level, “write only” proofs. Our varied ideas have produced a system that looks like a unified whole, despite being the product of many people’s contributions666Included in each release is a file entitled CONTRIBUTORS, but it only goes back to 2005 and has many omissions. over several decades.


We thank the referees, Jasmin Blanchette, Cliff Jones, Michael Norrish and Andrei Popescu for valuable comments on drafts of this paper. The work reported above was funded by the British EPSRC, the German DFG and various European Union funding agencies.


  • [AH10] Sergio Antoy and Michael Hanus. Functional logic programming. Commun. ACM, 53(4):74–85, 2010.
  • [Art16] Rob Arthan. On definitions of constants and types in HOL. J. Automated Reasoning, 56(3):205–219, 2016.
  • [Asp00] David Aspinall. Proof General: A generic tool for proof development. In Susanne Graf and Michael Schwartzbach, editors, European Joint Conferences on Theory and Practice of Software (ETAPS), volume 1785 of LNCS. Springer, 2000.
  • [Bal06] Clemens Ballarin. Interpretation of locales in Isabelle: Theories and proof contexts. In Jonathan M. Borwein and William M. Farmer, editors, Mathematical Knowledge Management, 5th International Conference, MKM 2006, volume 4108 of Lecture Notes in Computer Science, pages 31–43. Springer, 2006.
  • [Bal14] Clemens Ballarin. Locales: A module system for mathematical theories. J. Autom. Reasoning, 52(2):123–153, 2014.
  • [Bar77] J. Barwise. An introduction to first-order logic. In J. Barwise, editor, Handbook of Mathematical Logic, pages 5–46. North-Holland, 1977.
  • [BBG18] Grzegorz Bancerek, Czeslaw Bylinski, Adam Grabowski, Artur Kornilowicz, Roman Matuszewski, Adam Naumowicz, and Karol Pak. The role of the Mizar Mathematical Library for interactive proof development in Mizar. J. Autom. Reasoning, 61(1-4):9–32, 2018.
  • [BBH09] Stefan Berghofer, Lukas Bulwahn, and Florian Haftmann. Turning inductive into equational specifications. In Stefan Berghofer, Tobias Nipkow, Christian Urban, and Makarius Wenzel, editors, Theorem Proving in Higher Order Logics, volume 5674 of LNCS, pages 131–146. Springer, 2009.
  • [BBN11] Jasmin Christian Blanchette, Lukas Bulwahn, and Tobias Nipkow. Automatic proof and disproof in Isabelle/HOL. In C. Tinelli and V. Sofronie-Stokkermans, editors, Frontiers of Combining Systems (FroCoS 2011), volume 6989 of LNCS, pages 12–27. Springer, 2011.
  • [BBP13] Jasmin Christian Blanchette, Sascha Böhme, and Lawrence C. Paulson. Extending Sledgehammer with SMT solvers. Journal of Automated Reasoning, 51(1):109–128, 2013.
  • [BC04] Yves Bertot and Pierre Castéran. Interactive Theorem Proving and Program Development: Coq’Art: The Calculus of Inductive Constructions. Springer, 2004.
  • [BKH08] Lukas Bulwahn, Alexander Krauss, Florian Haftmann, Levent Erkök, and John Matthews. Imperative functional programming with Isabelle/HOL. In Otmane Aït Mohamed, César A. Muñoz, and Sofiène Tahar, editors, Theorem Proving in Higher Order Logics, 21st International Conference, TPHOLs 2008, volume 5170 of Lecture Notes in Computer Science, pages 134–149. Springer, 2008.
  • [BL18] Julian Brunner and Peter Lammich. Formal verification of an executable LTL model checker with partial order reduction. J. Autom. Reasoning, 60(1):3–21, 2018.
  • [Bla12] Jasmin Christian Blanchette. Automatic Proofs and Refutations for Higher-Order Logic. PhD thesis, Technical University Munich, 2012.
  • [Bla13] Jasmin Christian Blanchette. Relational analysis of (co)inductive predicates, (co)algebraic datatypes, and (co)recursive functions. Software Quality Journal, 21(1):101–126, 2013.
  • [BM79] Robert S. Boyer and J Strother Moore. A Computational Logic. Academic Press, 1979.
  • [BN02] Stefan Berghofer and Tobias Nipkow. Executing higher order logic. In P. Callaghan, Z. Luo, J. McKinna, and R. Pollack, editors, Types for Proofs and Programs (TYPES 2000), volume 2277 of LNCS, pages 24–40. Springer, 2002.
  • [BN04] Stefan Berghofer and Tobias Nipkow. Random testing in Isabelle/HOL. In J. Cuellar and Z. Liu, editors, Software Engineering and Formal Methods (SEFM 2004), pages 230–239. IEEE Computer Society, 2004.
  • [BN10] Jasmin Christian Blanchette and Tobias Nipkow. Nitpick: A counterexample generator for higher-order logic based on a relational model finder. In Matt Kaufmann and Lawrence C. Paulson, editors, Interactive Theorem Proving, volume 6172 of LNCS, pages 131–146. Springer, 2010.
  • [Bul12a] Lukas Bulwahn. Counterexample Generation for Higher-Order Logic using Functional and Logic Programming. PhD thesis, Technical University Munich, 2012.
  • [Bul12b] Lukas Bulwahn. The new Quickcheck for Isabelle: Random, exhaustive and symbolic testing under one roof. In Chris Hawblitzel and Dale Miller, editors, Certified Programs and Proofs, volume 7679 of LNCS, pages 92–108. Springer, 2012.
  • [Bul12c] Lukas Bulwahn. Smart testing of functional programs in Isabelle. In Nikolaj Bjørner and Andrei Voronkov, editors,

    Logic for Programming, Artificial Intelligence, and Reasoning

    , volume 7180 of LNCS, pages 153–167. Springer, 2012.
  • [CDKM11] Harsh Raju Chamarthi, Peter C. Dillinger, Matt Kaufmann, and Panagiotis Manolios. Integrating testing and interactive theorem proving. In David Hardin and Julien Schmaltz, editors, 10th International Workshop on the ACL2 Theorem Prover and its Applications, ACL2 2011, volume 70 of EPTCS, pages 4–19, 2011.
  • [CH00] Koen Claessen and John Hughes. QuickCheck: a lightweight tool for random testing of Haskell programs. In Martin Odersky and Philip Wadler, editors, Fifth ACM SIGPLAN International Conference on Functional Programming (ICFP ’00), pages 268–279. ACM, 2000.
  • [CM87] W. F. Clocksin and C. S. Mellish. Programming in Prolog. Springer, 3rd edition, 1987.
  • [DHT03] P. Dybjer, Q. Haiyan, and M. Takeyama. Combining testing and proving in dependent type theory. In Theorem Proving in Higher Order Logics, volume 2758 of LNCS, pages 188–203. Springer, 2003.
  • [DJK18] Jose Divasón, Sebastiaan J. C. Joosten, Ondrej Kuncar, René Thiemann, and Akihisa Yamada. Efficient certification of complexity proofs: formalizing the Perron-Frobenius theorem (invited talk paper). In June Andronick and Amy P. Felty, editors, 7th ACM SIGPLAN International Conference on Certified Programs and Proofs, CPP 2018, pages 2–13. ACM, 2018.
  • [DJTY19] Jose Divasón, Sebastiaan J. C. Joosten, René Thiemann, and Akihisa Yamada. A verified implementation of the Berlekamp-Zassenhaus factorization algorithm. J. Autom. Reasoning, 2019. published online.
  • [dMB08] Leonardo de Moura and Nikolaj Bjørner. Z3: An efficient SMT solver. In C. Ramakrishnan and Jakob Rehof, editors, Tools and Algorithms for the Construction and Analysis of Systems, volume 4963 of Lecture Notes in Computer Science, pages 337–340. Springer, 2008.
  • [dMKA15] Leonardo Mendonça de Moura, Soonho Kong, Jeremy Avigad, Floris van Doorn, and Jakob von Raumer. The Lean theorem prover (system description). In Amy P. Felty and Aart Middeldorp, editors, Automated Deduction — CADE-25, volume 9195 of Lecture Notes in Computer Science, pages 378–388. Springer, 2015.
  • [DW88] Saumya K. Debray and David S. Warren. Automatic mode inference for logic programs. The Journal of Logic Programming, 5(3):207 – 229, 1988.
  • [Ebe15] Manuel Eberl. A decision procedure for univariate real polynomials in Isabelle/HOL. In 2015 Conference on Certified Programs and Proofs, CPP ’15, pages 75–83. ACM, 2015.
  • [ELN13] Javier Esparza, Peter Lammich, René Neumann, Tobias Nipkow, Alexander Schimpf, and Jan-Georg Smaus. A fully verified executable LTL model checker. In Natasha Sharygina and Helmut Veith, editors, Computer Aided Verification - 25th International Conference, CAV 2013, volume 8044 of Lecture Notes in Computer Science, pages 463–478. Springer, 2013.
  • [FGJM85] Kokichi Futatsugi, Joseph A. Goguen, Jean-Pierre Jouannaud, and José Meseguer. Principles of OBJ2. In 12th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages (POPL), pages 52–66, New York, NY, USA, 1985. ACM.
  • [GAB17] Jürgen Giesl, Cornelius Aschermann, Marc Brockschmidt, Fabian Emmes, Florian Frohn, Carsten Fuhs, Jera Hensel, Carsten Otto, Martin Plücker, Peter Schneider-Kamp, Thomas Ströder, Stephanie Swiderski, and René Thiemann. Analyzing program termination and complexity automatically with AProVE. J. Autom. Reasoning, 58(1):3–31, 2017.
  • [GH98] David Griffioen and Marieke Huisman. A comparison of PVS and Isabelle/HOL. In Jim Grundy and Malcolm Newey, editors, Theorem Proving in Higher Order Logics: TPHOLs ’98, pages 123–142. Springer, 1998.
  • [GKMB17] Victor B. F. Gomes, Martin Kleppmann, Dominic P. Mulligan, and Alastair R. Beresford. Verifying strong eventual consistency in distributed systems. Proc. ACM Program. Lang., 1(OOPSLA):109:1–109:28, October 2017.
  • [GKN15] Adam Grabowski, Artur Korniłowicz, and Adam Naumowicz. Four decades of Mizar. Journal of Automated Reasoning, 55(3):191–198, Oct 2015.
  • [GM93] M. J. C. Gordon and T. F. Melham, editors. Introduction to HOL: A Theorem Proving Environment for Higher Order Logic. Cambridge University Press, 1993.
  • [GM10] Georges Gonthier and Assia Mahboubi. An introduction to small scale reflection in Coq. J. Formalized Reasoning, 3(2), 2010.
  • [GMW79] Michael J. C. Gordon, Robin Milner, and Christopher P. Wadsworth. Edinburgh LCF: A Mechanised Logic of Computation. LNCS 78. Springer, 1979.
  • [Gog79] Joseph A. Goguen. Some design principles and theory for OBJ-O, a language to express and execute algebraic specification for programs. In Edward K. Blum, Manfred Paul, and Satoru Takasu, editors, Mathematical Studies of Information Processing, volume 75 of LNCS, pages 425–473. Springer, 1979.
  • [Gor86] Michael J. C. Gordon. Why higher-order logic is a good formalism for specifying and verifying hardware. In G. Milne and P. A. Subrahmanyam, editors, Formal Aspects of VLSI Design, pages 153–177. North-Holland, 1986.
  • [Gor00] Michael J. C. Gordon. From LCF to HOL: a short history. In Gordon Plotkin, Colin Stirling, and Mads Tofte, editors, Proof, Language, and Interaction: Essays in Honor of Robin Milner, pages 169–185. MIT Press, 2000.
  • [Gor15] M. J. C. Gordon. Tactics for mechanized reasoning: A commentary on Milner (1984) ‘The use of machines to assist in rigorous proof’. Philosophical Transactions of the Royal Society. Series A, 373(2039), 2015.
  • [HAB17] Thomas Hales, Mark Adams, Gertrud Bauer, Tat Dat Dang, John Harrison, Le Truong Hoang, Cezary Kaliszyk, Victor Magron, Sean Mclaughlin, Tat Thang Nguyen, et al. A formal proof of the Kepler conjecture. Forum of Mathematics, Pi, 5:e2, 2017.
  • [Haf09] Florian Haftmann. Code Generation from Specifications in Higher Order Logic. PhD thesis, Technische Universität München, 2009.
  • [Har96] John Harrison. HOL Light: A tutorial introduction. In Mandayam K. Srivas and Albert John Camilleri, editors, Formal Methods in Computer-Aided Design: FMCAD ’96, LNCS 1166, pages 265–269. Springer, 1996.
  • [HHP93] Robert Harper, Furio Honsell, and Gordon Plotkin. A framework for defining logics. Journal of the ACM, 40(1):143–184, 1993.
  • [HIH13] Johannes Hölzl, Fabian Immler, and Brian Huffman. Type classes and filters for mathematical analysis in Isabelle/HOL. In S. Blazy, C. Paulin-Mohring, and D. Pichardie, editors, Interactive Theorem Proving — 4th International Conference, LNCS 7998, pages 279–294. Springer, 2013.
  • [HKKN13] Florian Haftmann, Alexander Krauss, Ondřej Kunčar, and Tobias Nipkow. Data refinement in Isabelle/HOL. In S. Blazy, C. Paulin-Mohring, and D. Pichardie, editors, Interactive Theorem Proving — 4th International Conference, volume 7998 of LNCS, pages 100–115. Springer, 2013.
  • [HKMS17] Warren A Hunt, Jr, Matt Kaufmann, J Strother Moore, and Anna Slobodova. Industrial hardware and software verification with ACL2. Philosophical transactions of the Royal Society. Series A, 375(2104), 2017.
  • [HN10] Florian Haftmann and Tobias Nipkow. Code generation via higher-order rewrite systems. In M. Blume, N. Kobayashi, and G. Vidal, editors, Functional and Logic Programming (FLOPS 2010), volume 6009 of LNCS, pages 103–117. Springer, 2010.
  • [HN18] Lars Hupel and Tobias Nipkow. A verified compiler from Isabelle/HOL to CakeML. In A. Ahmed, editor, European Symposium on Programming (ESOP 2018), volume 10801 of LNCS, pages 999–1026. Springer, 2018.
  • [HO82] Christoph M. Hoffmann and Michael J. O’Donnell. Programming with equations. ACM Trans. Program. Lang. Syst., 4(1):83–112, 1982.
  • [Hol97] Gerard J. Holzmann. The model checker SPIN. IEEE Trans. Software Eng., 23(5):279–295, 1997.
  • [Hue75] G. P. Huet. A unification algorithm for typed -calculus. Theoretical Computer Science, 1:27–57, 1975.
  • [HW06] Florian Haftmann and Makarius Wenzel. Constructive type classes in Isabelle. In Thorsten Altenkirch and Conor McBride, editors, Types for Proofs and Programs, volume 4502 of LNCS, pages 160–174. Springer, 2006.
  • [HW09] Florian Haftmann and Makarius Wenzel. Local theory specifications in Isabelle/Isar. In Stefano Berardi, Ferruccio Damiani, and Ugo de Liguoro, editors, Types for Proofs and Programs, TYPES 2008, volume 5497 of LNCS. Springer, 2009.
  • [Jac06] Daniel Jackson. Software Abstractions. Logic, Language, and Analysis. MIT Press, 2006.
  • [KAE10] Gerwin Klein, June Andronick, Kevin Elphinstone, Gernot Heiser, David Cock, Philip Derrin, Dhammika Elkaduwe, Kai Engelhardt, Rafal Kolanski, Michael Norrish, Thomas Sewell, Harvey Tuch, and Simon Winwood. sel4: Formal verification of an operating-system kernel. Communications of the ACM, 53(6):107–115, June 2010.
  • [Kal91] Sara Kalvala. HOL around the world. In M. Archer, J. J. Joyce, K. N. Levitt, and P. J. Windley, editors, International Workshop on the HOL Theorem Proving System and its Applications, pages 4–12. IEEE Computer Society, 1991.
  • [KMNO14] Ramana Kumar, Magnus O. Myreen, Michael Norrish, and Scott Owens. CakeML: a verified implementation of ML. In Suresh Jagannathan and Peter Sewell, editors, The 41st Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL ’14, pages 179–192. ACM, 2014.
  • [KP18] Ondřej Kunčar and Andrei Popescu. Safety and conservativity of definitions in HOL and Isabelle/HOL. PACMPL, 2(POPL):24:1–24:26, 2018.
  • [KP19] Ondřej Kunčar and Andrei Popescu. A consistent foundation for Isabelle/HOL. J. Automated Reasoning, 62(4):531–555, 2019.
  • [KWP99] Florian Kammüller, Markus Wenzel, and Lawrence C. Paulson. Locales: A sectioning concept for Isabelle. In Y. Bertot, G. Dowek, A. Hirschowitz, C. Paulin, and L. Thery, editors, Theorem Proving in Higher Order Logics: TPHOLs ’99, volume 1690 of LNCS. Springer, 1999.
  • [Lam13] Peter Lammich. Automatic data refinement. In Sandrine Blazy, Christine Paulin-Mohring, and David Pichardie, editors, Interactive Theorem Proving - 4th International Conference, ITP 2013, volume 7998 of Lecture Notes in Computer Science, pages 84–99. Springer, 2013.
  • [Lam14] Peter Lammich. Verified efficient implementation of Gabow’s strongly connected component algorithm. In Gerwin Klein and Ruben Gamboa, editors, Interactive Theorem Proving — 5th International Conference, ITP 2014, volume 8558 of Lecture Notes in Computer Science, pages 325–340. Springer, 2014.
  • [Lam16] Peter Lammich. Refinement based verification of imperative data structures. In Jeremy Avigad and Adam Chlipala, editors, 5th ACM SIGPLAN Conference on Certified Programs and Proofs, pages 27–36. ACM, 2016.
  • [Lam17] Peter Lammich. Efficient verified (UN)SAT certificate checking. In Leonardo de Moura, editor, Automated Deduction — CADE-26, volume 10395 of Lecture Notes in Computer Science, pages 237–254. Springer, 2017.
  • [Lam19] Peter Lammich. Refinement to imperative HOL. J. Autom. Reasoning, 62(4):481–503, 2019.
  • [LB11] Andreas Lochbihler and Lukas Bulwahn. Animating the formalised semantics of a Java-like language. In Marko C. J. D. van Eekelen, Herman Geuvers, Julien Schmaltz, and Freek Wiedijk, editors, Interactive Theorem Proving - Second International Conference, ITP 2011, volume 6898 of Lecture Notes in Computer Science, pages 216–232. Springer, 2011.
  • [Ler09] Xavier Leroy. A formally verified compiler back-end. J. Automated Reasoning, 43:363–446, 2009.
  • [LPP19] Wenda Li, Grant Olney Passmore, and Lawrence C. Paulson. Deciding univariate polynomial problems using untrusted certificates in Isabelle/HOL. J. Autom. Reasoning, 62(1):69–91, 2019.
  • [LPY97] Kim Guldstrand Larsen, Paul Pettersson, and Wang Yi. UPPAAL in a nutshell. STTT, 1(1-2):134–152, 1997.
  • [LS19] Peter Lammich and S. Reza Sefidgar. Formalizing network flow algorithms: A refinement approach in isabelle/hol. J. Autom. Reasoning, 62(2):261–280, 2019.
  • [LT12] Peter Lammich and Thomas Tuerk. Applying data refinement for monadic programs to Hopcroft’s algorithm. In Lennart Beringer and Amy P. Felty, editors, Interactive Theorem Proving - Third International Conference, ITP 2012, volume 7406 of Lecture Notes in Computer Science, pages 166–182. Springer, 2012.
  • [Mil85] Robin Milner. The use of machines to assist in rigorous proof. In C. A. R. Hoare and J. C. Shepherdson, editors, Mathematical Logic and Programming Languages, pages 77–88. Prentice-Hall, 1985.
  • [Mil91] Dale Miller. A logic programming language with lambda-abstraction, function variables, and simple unification. J. Log. Comput., 1(4):497–536, 1991.
  • [Miz] The Mizar Mathematical Library.
  • [ML84] P. Martin-Löf. Constructive mathematics and computer programming. Philosophical Transactions of the Royal Society. Series A, 312(1522):501–518, 1984.
  • [MMW16] Daniel Matichuk, Toby C. Murray, and Makarius Wenzel. Eisbach: A proof method language for Isabelle. Journal of Automated Reasoning, 56(3), 2016.
  • [MQP06] Jia Meng, Claire Quigley, and Lawrence C. Paulson. Automation for interactive proof: First prototype. Information and Computation, 204(10):1575–1596, 2006.
  • [MW10] D. Matthews and M. Wenzel. Efficient parallel programming in Poly/ML and Isabelle/ML. In ACM SIGPLAN Workshop on Declarative Aspects of Multicore Programming (DAMP 2010), 2010.
  • [Nip91a] Tobias Nipkow. Higher-order critical pairs. In Proc. 6th IEEE Symp. Logic in Computer Science, pages 342–349. IEEE Press, 1991.
  • [Nip91b] Tobias Nipkow. Higher-order unification, polymorphism, and subsorts. In S. Kaplan and M. Okada, editors, Proc. 2nd Int. Workshop Conditional and Typed Rewriting Systems, volume 516 of LNCS. Springer, 1991.
  • [Nip93a] Tobias Nipkow. Functional unification of higher-order patterns. In Proc. 8th IEEE Symp. Logic in Computer Science, pages 64–74, 1993.
  • [Nip93b] Tobias Nipkow. Order-sorted polymorphism in Isabelle. In Gérard Huet and Gordon Plotkin, editors, Logical Environments, pages 164–188. Cambridge Uiversity Press, 1993.
  • [NK14] Tobias Nipkow and Gerwin Klein. Concrete Semantics with Isabelle/HOL. Springer, 2014. 298 pp.
  • [NM16] Julian Nagele and Aart Middeldorp. Certification of classical confluence results for left-linear term rewrite systems. In Jasmin Christian Blanchette and Stephan Merz, editors, Interactive Theorem Proving - 7th International Conference, ITP 2016, volume 9807 of Lecture Notes in Computer Science, pages 290–306. Springer, 2016.
  • [NO80] Greg Nelson and Derek C. Oppen. Fast decision procedures based on congruence closure. Journal of the ACM, 27(2):356–364, 1980.
  • [NP92] Tobias Nipkow and Lawrence C. Paulson. Isabelle-91. In Deepak Kapur, editor, Automated Deduction — CADE-11, volume 607 of LNCS, pages 673–676. Springer, 1992.
  • [NP93] Tobias Nipkow and Christian Prehofer. Type checking type classes. In Principles of Programming Languages, POPL ’93, pages 409–418, New York, NY, USA, 1993. ACM.
  • [NP98] Tobias Nipkow and Christian Prehofer. Higher-order rewriting and equational reasoning. In W. Bibel and P. Schmitt, editors, Automated Deduction — A Basis for Applications. Volume I: Foundations, volume 8 of Applied Logic Series, pages 399–430. Kluwer, 1998.
  • [NPW02] Tobias Nipkow, Lawrence C. Paulson, and Markus Wenzel. Isabelle/HOL: A Proof Assistant for Higher-Order Logic. Springer, 2002. Online at
  • [NS91] Tobias Nipkow and Gregor Snelting. Type classes and overloading resolution via order-sorted unification. In J. Hughes, editor, Proc. 5th ACM Conf. Functional Programming Languages and Computer Architecture, volume 523 of LNCS, pages 1–14. Springer, 1991.
  • [Obu06] Steven Obua. Checking conservativity of overloaded definitions in higher-order logic. In Frank Pfenning, editor, Term Rewriting and Applications, volume 4098 of LNCS, pages 212–226. Springer, 2006.
  • [O’D77] Michael J. O’Donnell. Computing in Systems Described by Equations, volume 58 of LNCS. Springer, 1977.
  • [Ove75] R. Overbeek. An implementation of hyper-resolution. Computers and Mathematics with Applications, 1:201–214, 1975.
  • [Owr06] Sam Owre. Random testing in PVS. In Workshop on Automated Formal Methods (AFM), 2006. Online at
  • [Pau86] Lawrence C. Paulson. Natural deduction as higher-order resolution. Journal of Logic Programming, 3:237–258, 1986.
  • [Pau89] Lawrence C. Paulson. The foundation of a generic theorem prover. J. Automated Resoning, 5(3):363–397, 1989.
  • [Pau90] Lawrence C. Paulson. Isabelle: The next 700 theorem provers. In P. Odifreddi, editor, Logic and Computer Science, pages 361–386. Academic Press, 1990.
  • [Pau93] Lawrence C. Paulson. Isabelle’s object-logics. Technical Report 286, Cambridge University Computer Laboratory, 1993.
  • [Pau94] Lawrence C. Paulson. Isabelle - A Generic Theorem Prover (with contributions by T. Nipkow), volume 828 of Lecture Notes in Computer Science. Springer, 1994.
  • [Pau98] Lawrence C. Paulson. The inductive approach to verifying cryptographic protocols. Journal of Computer Security, 6(1-2):85–128, 1998.
  • [Pau99] Lawrence C. Paulson. A generic tableau prover and its integration with Isabelle. Journal of Universal Computer Science, 5(3):73–87, 1999.
  • [Pau03] Lawrence C. Paulson. The relative consistency of the axiom of choice — mechanized using Isabelle/ZF. LMS Journal of Computation and Mathematics, 6:198–248, 2003.
  • [Pau04] Lawrence C. Paulson. Organizing numerical theories using axiomatic type classes. J. Automated Reasoning, 33(1):29–49, 2004.
  • [Pau18] Lawrence C. Paulson. Computational logic: Its origins and applications. Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, 474(2210), 2018.
  • [Pel96] Doron A. Peled. Combining partial order reductions with on-the-fly model-checking. Formal Methods in System Design, 8(1):39–64, 1996.
  • [PG96] Lawrence C. Paulson and Krzysztof Grabczewski. Mechanizing set theory: Cardinal arithmetic and the axiom of choice. J. Automated Reasoning, 17(3):291–323, December 1996.
  • [PHD15] Zoe Paraskevopoulou, Catalin Hritcu, Maxime Dénès, Leonidas Lampropoulos, and Benjamin C. Pierce. Foundational property-based testing. In Christian Urban and Xingyuan Zhang, editors, Interactive Theorem Proving, volume 9236 of LNCS, pages 325–343. Springer, 2015.
  • [PS07] Lawrence C. Paulson and Kong Woei Susanto. Source-level proof reconstruction for interactive theorem proving. In Klaus Schneider and Jens Brandt, editors, Theorem Proving in Higher Order Logics: TPHOLs 2007, LNCS 4732, pages 232–245. Springer, 2007.
  • [Raj93] Sreeranga P. Rajan. Executing HOL specifications: Towards an evaluation semantics for classical higher order logic. In Luc J. M. Claesen and Michael J. C. Gordon, editors, Higher Order Logic Theorem Proving and its Applications, volume A-20 of IFIP Transactions, pages 527–536. North-Holland/Elsevier, 1993.
  • [RNL08] Colin Runciman, Matthew Naylor, and Fredrik Lindblad. SmallCheck and lazy SmallCheck: Automatic exhaustive testing for small values. In Andy Gill, editor, Proc. 1st ACM SIGPLAN Symposium on Haskell, pages 37–48. ACM, 2008.
  • [Rob65] J. A. Robinson. A machine-oriented logic based on the resolution principle. Journal of the ACM, 12:23–41, 1965.
  • [RV02] Alexandre Riazanov and Andrei Voronkov. The design and implementation of VAMPIRE. AI Communications, 15(2):91–110, 2002.
  • [Sch04] Stephan Schulz. System description: E 0.81. In David Basin and Michaël Rusinowitch, editors, Automated Reasoning — Second International Joint Conference, IJCAR 2004, LNAI 3097, pages 223–228. Springer, 2004.
  • [Sha02] Natarajan Shankar. Little engines of proof. In L.-H. Eriksson and P. Lindsay, editors, FME 2002: Formal Methods - Getting IT Right : International Symposium of Formal Methods Europe, LNCS 2391, pages 1–20. Springer, 2002.
  • [Sie19] Stephen F. Siegel. What’s wrong with on-the-fly partial order reduction. In I. Dillig and S. Tasiran, editors, Computer Aided Verification (CAV 2019), LNCS. Springer, 2019.
  • [SK07] Alexander Spiridonov and Sarfraz Khurshid. Automatic generation of counterexamples for ACL2 using Alloy. In Seventh Int. Workshop on the ACL2 Theorem prover and its Applications, 2007.
  • [SN08] Konrad Slind and Michael Norrish. A brief overview of HOL4. In Otmane Aït Mohamed, César Muñoz, and Sofiène Tahar, editors, Theorem Proving in Higher Order Logics, TPHOLs 2008, pages 28–32, 2008.
  • [Sum02] Rob Sumners. Checking ACL2 theorems via SAT checking. In Third Int. Workshop on the ACL2 Theorem prover and its Applications, 2002.
  • [TJ07] Emina Torlak and Daniel Jackson. Kodkod: A relational model finder. In Orna Grumberg and Michael Huth, editors, Tools and Algorithms for the Construction and Analysis of Systems, volume 4424 of LNCS, pages 632–647. Springer, 2007.
  • [TS09] René Thiemann and Christian Sternagel. Certification of termination proofs using CeTA. In Stefan Berghofer, Tobias Nipkow, Christian Urban, and Makarius Wenzel, editors, Theorem Proving in Higher Order Logics, 22nd International Conference, TPHOLs 2009, volume 5674 of Lecture Notes in Computer Science, pages 452–468. Springer, 2009.
  • [Web05] Tjark Weber. Bounded model generation for Isabelle/HOL. In Wolfgang Ahrendt, Peter Baumgartner, Hans de Nivelle, Silvio Ranise, and Cesare Tinelli, editors, Selected Papers from the Workshops on Disproving and the Second International Workshop on Pragmatics of Decision Procedures (PDPAR 2004), volume 125(3) of Electronic Notes in Theoretical Computer Science, pages 103–116. Elsevier, July 2005.
  • [Web08] Tjark Weber. SAT-based finite model generation for higher-order logic. PhD thesis, Technical University Munich, Germany, 2008.
  • [Wei01] Christoph Weidenbach. Combining superposition, sorts and splitting. In Alan Robinson and Andrei Voronkov, editors, Handbook of Automated Reasoning, volume II, chapter 27, pages 1965–2013. Elsevier Science, 2001.
  • [Wen97] Markus Wenzel. Type classes and overloading in higher-order logic. In Theorem Proving in Higher Order Logics, volume 1275 of LNCS, pages 307–322. Springer, 1997.
  • [Wen07] Makarius Wenzel. Isabelle/Isar — a generic framework for human-readable proof documents. Studies in Logic, Grammar, and Rhetoric, 10(23):277–297, 2007. From Insight to Proof — Festschrift in Honour of Andrzej Trybulec.
  • [Wen11] M. Wenzel. Isabelle as document-oriented proof assistant. In J. H. Davenport et al., editors, Conference on Intelligent Computer Mathematics (CICM 2011), volume 6824 of LNAI. Springer, 2011.
  • [Wen13a] Makarius Wenzel. READ-EVAL-PRINT in parallel and asynchronous proof-checking. In Cezary Kaliszyk and Christoph Lüth, editors, User Interfaces for Theorem Provers (UITP 2012), volume 118 of Electronic Proceedings in Theoretical Computer Science, 2013.
  • [Wen13b] Makarius Wenzel. Shared-memory multiprocessing for interactive theorem proving. In Sandrine Blazy, Christine Paulin-Mohring, and David Pichardie, editors, Interactive Theorem Proving (ITP 2013), volume 7998 of Lecture Notes in Computer Science. Springer, 2013.
  • [Wen14] Makarius Wenzel. Asynchronous user interaction and tool integration in Isabelle/PIDE. In Gerwin Klein and Ruben Gamboa, editors, Interactive Theorem Proving (ITP 2014), volume 8558 of LNCS. Springer, 2014.
  • [Wen19] Makarius Wenzel. Interaction with formal mathematical documents in Isabelle/PIDE. In Cezary Kaliszyk, Edwin Brady, Andrea Kohlhase, and Claudio Sacerdoti Coen, editors, Intelligent Computer Mathematics (CICM 2019), volume 11617 of LNAI. Springer, 2019.
  • [Wie01] Freek Wiedijk. Mizar Light for HOL Light. In Richard J. Boulton and Paul B. Jackson, editors, Theorem Proving in Higher Order Logics, TPHOLs 2001, pages 378–393, Berlin, Heidelberg, 2001. Springer Berlin Heidelberg.
  • [WL18] Simon Wimmer and Peter Lammich. Verified model checking of timed automata. In Dirk Beyer and Marieke Huisman, editors, Tools and Algorithms for the Construction and Analysis of Systems, TACAS 2018, volume 10805 of Lecture Notes in Computer Science, pages 61–78. Springer, 2018.
  • [Woo18] Charlie Wood. The strange numbers that birthed modern algebra. Online at, September 2018.