Towards Concise, Machine-discovered Proofs of Gödel's Two Incompleteness Theorems

05/06/2020 ∙ by Elijah Malaby, et al. ∙ 0

There is an increasing interest in applying recent advances in AI to automated reasoning, as it may provide useful heuristics in reasoning over formalisms in first-order, second-order, or even meta-logics. To facilitate this research, we present MATR, a new framework for automated theorem proving explicitly designed to easily adapt to unusual logics or integrate new reasoning processes. MATR is formalism-agnostic, highly modular, and programmer-friendly. We explain the high-level design of MATR as well as some details of its implementation. To demonstrate MATR's utility, we then describe a formalized metalogic suitable for proofs of Gödel's Incompleteness Theorems, and report on our progress using our metalogic in MATR to semi-autonomously generate proofs of both the First and Second Incompleteness Theorems.



There are no comments yet.


page 1

page 2

page 3

page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.


An emerging body of literature seeks to apply the recent advances of machine learning and deep networks to the field of automated theorem proving. For example, given a partially completed deductive proof, deciding which inference rules to apply might be a task that modern AI is particularly well-suited to

[Wang et al.2017, Piotrowski and Urban2019, Kaliszyk et al.2018, Lederman, Rabe, and Seshia2018, Kaliszyk, Chollet, and Szegedy2017, Alemi et al.2016]. Improved decision-making heuristics in automated reasoning are especially important in proofs using non-classical formalisms, such as second-, higher-, or meta-logics. Such logics can sometimes allow for the expression of complex proofs in far fewer steps than might be required in a first-order logic [Buss1994, Smith2007]. However, this increased expressive power also considerably expands the search space of any proof done in such logics, mandating the need for said improved heuristics.

However, a platform to easily experiment with applying AI to a plurality of logical formalisms does not exist; at least not in a way that jointly satisfies desiderata that we will state shortly. In this paper, we will describe our progress in addressing these goals by presenting MATR (Machine Accountability through Traceable Reasoning), a new automated reasoning framework. As a proof-of-concept, we introduce a metalogic capable of expressing proofs of Gödel’s Incompleteness Theorems, and show how MATR can be used as a platform for developing AI systems capable of discovering and reasoning over such proofs.

MATR is based on the following design principles:

P1. The underlying control system should be as formalism-agnostic as possible. MATR began as an in-house tool to very quickly test the formal representations and inference rules related to variants of the Cognitive Event Calculus [Arkoudas and Bringsjord2009, Bringsjord and Govindarajulu2013, Licato et al.2014, Bringsjord et al.2014, Bringsjord et al.2015], whose visual style was inspired by the diagrammatic, flowchart-like aesthetic of Slate [Bringsjord et al.2008] and the indented subproofs of Fitch-style natural deduction [Barker-Plummer, Barwise, and Etchemendy2011]. Instead of creating an automated theorem prover from scratch for each new formalism, it was decided that a more flexible framework with easily interchangeable parts would be a better long-term strategy.

P2. Semantics should be contained in the codelets and other interchangeable parts. Any actions requiring semantic understanding of the contents of the nodes in MATR should be contained in one of MATR’s interchangeable parts, preferably its codelets. Codelets are independently operating programs which perform the bulk of the work in MATR, and are described more later in this section. Syntax checking, carrying out inference rules, recording type information, and even knowing whether a proof is completed are tasks delegated to individual codelets. This is also meant to enable rapid deployment and testing of nontraditional logics (e.g. higher-order, modal, inductive, informal, etc.). One trade-off of this flexibility is that it is entirely possible for a set of codelets to be mutually incompatible. Accordingly, MATR also allows for pre-built configurations to be loaded in the form of a YAML text file. For example, if one wishes to use MATR as a natural deduction reasoner for standard first-order Peano Arithmetic, such a configuration will already be available to load.

P3. Codelets must be programmer-friendly, allowing for easy implementation and changes of inference rules without modifying the core. Although much of MATR’s code is written in Clojure, its codelets can be written in virtually any language (most are in Python).

P4. Control decisions and optimization strategies should be as interchangeable as possible.

Another motivator for MATR is as a platform for applying advances in Artificial Intelligence and Machine Learning to Automated Reasoning. For example, given partially completed proofs, a fruitful research question may be how to select which codelets to run, and over which existing formula nodes, in order to optimally complete the proofs. MATR encourages this by making two components interchangeable: the

codelet chooser, which selects which codelets to execute and the order to execute them each iteration; and the recommendation resolver, which decides which codelet recommentations to actually apply to the proof.

P5. The front-end should be modular to satisfy a variety of use cases. There are at least two primary use cases of MATR’s user interface: for students relatively new to automated reasoning, and for researchers who may want to use MATR as part of a larger AI system. For the former case, we have an interactive graphical user interface; this is the source of the figures in this paper. For the latter, MATR’s back-end can communicate with a lightweight command-line interface or interact directly with programs through an API.

The Proof Space

Figure 1: Example simple proof that follows from . Formula nodes are ovals, while inferences are rectangles. Nodes start out as light blue and are colored green as they receive the checked flag. The axioms and goals of each box receive a border, colored gold or blue (resp.).

MATR’s basic organizational unit is called the box. A box may contain a set of formula nodes, inference nodes, other boxes, and connections between nodes or boxes. If a box contains a node or box , then we say that has as a parent box. All elements in MATR have exactly one parent box, with one exception: the top-level box. It is also referred to as the proof space or root box and is the starting point of all MATR proofs.

Formula nodes contain formula objects, which are typically S-expression strings. Inference nodes are used to link a set of formula nodes (the premise node) to another formula node (the conclusion node), where . Inference nodes contain information about the nature of the inference from to , typically a string corresponding to the name of the inference rule, metadata about which codelet suggested the inference, and other information of use to codelets. Thus, directed edges connect formula nodes to inference nodes, and inference nodes to formula nodes, but never directly connect two nodes of the same type.

Formula nodes can also be axioms or goals of their parent box. If is an axiom of box , then it is an axiom of any box which has as a parent box. Formula nodes may also contain metadata such as the binary checked flag, which denotes whether a formula node follows from the axioms of its parent box. Formula nodes which are axioms of their parent boxes automatically have checked flags, and a special codelet propagates checked flags by looking at inference nodes: if, for inference node , (1) all of ’s premise nodes have checked flags, and (2) is marked as a deductive111Although not used in this paper, inference nodes can correspond to non-deductive inferences, in order to open the door to proof-solving heuristics and proof tactics. inference, then ’s conclusion node is given a checked flag.

Core Components

The most important component of MATR’s design is the codelet, an individual piece of specialized code whose job is to analyze the current proof space, and make recommendations about how to modify the proof space (typically the addition of inference and formula nodes). In keeping with the design principles listed earlier, codelets can be written in virtually any programming language, are easily interchangeable, and are principally responsible for both defining and behaving in accordance with the intended semantics of the proof state. The term ‘codelet’ is borrowed from the Copycat model of analogical reasoning [Hofstadter1984, Mitchell1993, Hofstadter and Mitchell1995], and like Copycat codelets, our codelets should be thought of as independently operating “worker ants” which each specialize in searching for a unique set of features in the proof space (or in a local subset of the proof space), and making recommendations based on their findings.

Most codelets directly correspond to inference rules; e.g., the Modus Ponens codelet simply looks for formulae pairs of the form , and recommends adding a “Modus Ponens” inference node and a formula node. Other codelets are responsible for background or support tasks, such as propagating checked flags, or ensuring that every formula node contains syntactically valid formulae. If the application calls for it, codelets may even contain entire external programs or automated theorem provers; e.g., we make use of a full first-order resolution codelet later in this paper.

MATR’s codelets typically operate either in a forward or backward mode. Forward-operating codelets look for sets of formulae that jointly satisfy some inference schema’s premises, and suggest the addition of a conclusion, as in the Modus Ponens example earlier. It is beneficial for such codelets to restrict its search to formulae that have checked flags, so that any new formula nodes added will also have checked flags propagated to them. Backwards operating codelets work in the other direction, adding formulae that lead to nodes which have yet to be checked. Allowing some codelets to operate unconstrained could result in a potentially infinite number of recommendations per iteration, such as a forward operating Disjunction Introduction codelet. This explosion is managed by heuristics in the codelet chooser and recommendation resolver.

The MATR core (sometimes referred to as the back-end) is the fixed central unit that coordinates all of MATR’s interchangeable parts. In its simplest form, it merely serves as the conduit through which the components of MATR communicate and maintains the repository of shared proof state. Currently, core presents a REST API which wraps a datascript graph database that holds all of the proof state. Core also provides provides a minimal Codelet Chooser and Recommendation Resolver. The former can be configured to trigger codelets based on the result of queries against the core database, while the latter is mostly focused on sanity checking (e.g. preventing the addition of duplicate nodes and boxes).

The front-end is the primary way to interact with MATR as a whole. It wraps some of core’s API into a graphical user interface that allows the user to upload a configuration to core, set up the axioms and goals of the proof space, trigger the execution of codelets, and view the resulting proof.

Codelet Servers act as hosts for the codelets provided by core. They present REST endpoints for the codelet chooser to send messages to trigger the execution of particular codelets. The codelets can then make additional queries against core, perform logical inferences, and the codelet server responds to the request with actions for the recommendation resolver.

Discovering the Proofs

Figure 2: MATR-discovered proof of both incompleteness theorems, after pruning and rearranging nodes to better make use of space

A Formalized Metalogic

As a demonstration and test of MATR’s proof-finding capabilities, we set out to generate a proof of Gödel’s First and Second Incompleteness Theorems, given a proof sketch. Verifying proofs of these theorems are often used as a sort of stress test for an automated reasoner’s formalism, and the the ability to automatically discover these proofs has been claimed by many with various degrees of success [O’Connor2005, Sieg and Field2005, Licato et al.2013, Paulson2014, Paulson2015]. All existing machine-discovered proofs of the incompleteness theorems rely on carefully selected symbols and inference rules. An ongoing goal of such work is to discover proofs of the incompleteness theorems with increasingly minimal assumptions and custom tailoring of the prover’s starting conditions. Here, our contribution does so with a formalism and automated reasoner robust enough to not only find concise proofs of both incompleteness theorems, but of similar theorems in the metalogical space as well.

Our formalizations of the incompleteness proofs are loosely based on the versions and terminology described by [Smith2007]. Let us assume a Gödelian numbering scheme that assigns a unique integer to all formulae in first-order Peano Arithmetic (PA), using the normal conventions. Then for any well-formed-formula (wff) in PA, denotes its corresponding Gödel number. Slightly abusing notation for convenience, if is a proof in PA, denotes its corresponding encoding (called a Super Gödel number). Given an integer in , determining whether encodes a PA wff, a well-formed proof in PA, or neither, is decidable and captured in PA itself [Smith2007].

is shorthand for the wff formula in PA which expresses (but does not capture) the numerical property “ corresponds to the Gödel number of a formula that is a theorem of PA.” This predicate also satisfies the Hilbert-Bernays provability conditions (Equations 2 - 4 below). is shorthand for the PA wff which is provable if and only if encodes a PA wff which is the negated form of the PA wff that encodes. Because this formula trivially captures a primitive recursive property, it is captured in PA [Smith2007] and thus it is an axiom that for all PA wffs , PA . Consistency in PA is expressed by the formula :


Often, explanations of the incompleteness proofs will use high-level natural language which appears to quantify over formulae—e.g., “there exists a formula such that has a certain property.” This, we believe, reflects the natural relative ease of reasoning about formula as discrete objects which can have properties, at least when working through the proofs of the incompleteness theorems. Our formalization of the incompleteness proofs thus uses a first-order modal typed metalogic, to more directly reflect the kinds of statements used in explanations of the proofs. The objects of our metalogic are the same as in , with the addition of two types: Proof theory and formula symbols. Proof theory symbols enable metalogical formulae that compare and express the properties of entire proof theories, but for this paper we only use one: . Formula symbols are objects corresponding to PA wff. Since the metalogical version of the universal and existential operators can quantify over formula symbols, the metalogic allows us to write formulae such as Equations 2 through 7 below.

More precisely, a metalogic wff is either: (1) a single formula symbol, (2) a PA wff, or (3) if and are metalogic wffs, then metalogic wffs include: , , and so on using the normal rules of logical operators. All operators in first-order PA have analogs in the metalogic. Only one additional operator is added: The provability operator is written , where is a proof theory, and is a metalogic wff. It should be read, “the proof theory has the formula corresponding to as one of its theorems.” Unlike , which is a PA wff, is a metalogic wff. Furthermore, and are provided as axioms, but the converse are not.

Given this notation, we can introduce the three Hilbert-Bernays provability conditions as axioms as well:


Primarily as a matter of notational convenience, we define an additional definition of consistency :


Note that is a metalogic wff, whereas is a PA wff. also draws on a notation which emphasizes as an object that can be replaced with other formal theories.222By this sort of design we intend, in the future, for MATR to reason about metalogical properties of multiple formal theories within the same proof. In any case, and can be derived from each other in our present proof space, and only the latter is used as an axiom. Thus, we can concisely formalize Gödel’s two Incompleteness Theorems as follows:



To establish a basis for the proof, two initial setup codelets populate the proof space with several intermediate nodes and boxes to act as the foundation for later codelets. By statically introducing these intermediate nodes and boxes we guide the proof search process in a way which avoids combinatorial explosion. While the front end could be used to introduce each of these boxes and nodes by hand, using setup codelets simplified the process of re-running the entire proof while testing. In the future we hope to use heuristics to completely automate this as well. In our proof sketch we used three subproofs, which are created in this step. They are handled in setup due to the need for intermediate nodes to be placed in those subproofs as well. Additionally, the existential elimination acting on one of the axioms occurs in the setup so that the symbol chosen to take place of the quantifier is fixed to be the same as those used in the intermediate nodes.

In order to simplify the proof space, we employ four types of resolution codelets; two for the metalogic, and two for first-order logic. The two “” codelets simply search for formulae nodes with checked flags, and either a formula without a checked flag or a formula. It then attempts to show that, subject to timeout and memory limitations, follows from using first-order resolution without equality. Likewise, the “” codelets search for formula nodes with checked flags and uses resolution to prove whether a formula or without a checked flag follows. Although these codelets are quite powerful (and potentially combinatorially explosive), they do not operate in a purely forward mode, which severely limits what can be practically derived. But neither can either of these codelets complete the entire proof on their own. Consider the three Hilbert-Bernays Provability Conditions: After -Elimination, Equations 3 and 4 produce subformulae that can be reasoned over by , but Equation 2 does not—it requires inferences at the metalogical level. Conceivably, one might simply strengthen to subsume what does, but this explodes its average run time, removes MATR’s ability to introduce smarter heuristics, and would prevent us from using a general purpose first order resolution implementation.

In order to ensure proper connection between the nodes introduced by the resolution codelet and the nodes that already exist outside of the subproof, a reiteration codelet connects the node outside of the subproof to the corresponding node inside the subproof. This allows for propagation of checks into the subproof, so that resolution can truly demonstrate that it has proven a particular formula with accordance to information both inside and outside of the subproof.

We also implemented a dedicated codelet for Hilbert-Bernays 1 (2) instead of providing it as an axiom of the root box. In testing, we discovered that providing HB1 as an axiom lead to an explosion of the number of necessary clauses to process in the resolution procedure. More specifically, the proof of Con(PA) with resolution explodes from 97 clauses to not resolving with an upper bound of 2000 clauses. The HB1 codelet works purely in a backwards mode, adding inferences for nodes of the form and connecting to (or creating) a corresponding node of the form .

Once the proof is completed, a pruning process identifies (and displays) the shortest proofs of each goal node in the root box. Figure 2 shows this pruned proof. Pruning is accomplished in two steps. The first propagates forward from the axioms of every box to count the number of justifications in the shortest proof of any reachable and checked node. Given these proof size counts, a backwards traversal is performed from each goal of the root box to extract the nodes and boxes used in the shortest proof of that goal.

With the provided intermediate nodes, the proof takes four iterations to complete. Interestingly, the proof found by MATR is actually shorter than the proof sketch we used to select our intermediate nodes and high level proof structure. Most surprising to us was MATR using part of the proof of the first incompleteness theorem in its proof of the second, short circuiting three intermediate nodes.

Conclusion and Future Work

With MATR, we sought to satisfy the design criteria P1 - P5 by designing a system that operates fluidly over various logical systems with a series of interchangeable components. MATR allows the use of various heuristics to verify or discover proofs with otherwise-explosive proof spaces. To demonstrate the potential of this system, we verified and simplified proofs of Gödel’s Incompleteness Theorems.

We see a few directions to take MATR going forward. We have already made progress automating the generation of intermediate nodes in the incompleteness theorem proofs, but we are still exploring the heuristics needed to minimize explosion of the proof space. Remarkably, MATR discovered a shorter proof than we told it to find, which we consider a strong indicator of the potential for work in this area. As MATR is designed to reason in higher order logics, it may be immensely valuable in automatically discovering shorter proofs of already known theorems. The full source code for MATR and what is necessary to replicate this proof will be available upon publication.


MATR owes its existence to work started at the Rensselaer AI and Reasoning (RAIR) Lab and contributed to by many students and collaborators since then—too many to reasonably name here. We are extremely grateful to the support and contributions of them all.

This material is based upon work supported by the Air Force Office of Scientific Research under award numbers FA9550-17-1-0191 and FA9550-18-1-0052. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the United States Air Force.


  • [Alemi et al.2016] Alemi, A. A.; Chollet, F.; Irving, G.; Szegedy, C.; and Urban, J. 2016. Deepmath - deep sequence models for premise selection. CoRR abs/1606.04442.
  • [Arkoudas and Bringsjord2009] Arkoudas, K., and Bringsjord, S. 2009. Propositional Attitudes and Causation. International Journal of Software and Informatics 3(1):47–65.
  • [Barker-Plummer, Barwise, and Etchemendy2011] Barker-Plummer, D.; Barwise, J.; and Etchemendy, J. 2011. Language Proof and Logic. Center for the Study of Language and Inf, 2 edition.
  • [Bringsjord and Govindarajulu2013] Bringsjord, S., and Govindarajulu, N. S. 2013. Toward a Modern Geography of Minds, Machines, and Math. Philosophy and Theory of Artificial Intelligence 5:151–165.
  • [Bringsjord et al.2008] Bringsjord, S.; Taylor, J.; Shilliday, A.; Clark, M.; and Arkoudas, K. 2008. Slate: An Argument-Centered Intelligent Assistant to Human Reasoners. In Grasso, F.; Green, N.; Kibble, R.; and Reed, C., eds., Proceedings of the 8th International Workshop on Computational Models of Natural Argument (CMNA 8).
  • [Bringsjord et al.2014] Bringsjord, S.; Govindarajulu, N. S.; Ellis, S.; McCarty, E.; and Licato, J. 2014. Nuclear Deterrence and the Logic of Deliberative Mindreading. Cognitive Systems Research 28:20–43.
  • [Bringsjord et al.2015] Bringsjord, S.; Govindarajulu, N. S.; Licato, J.; Sen, A.; Johnson, J.; Bringsjord, A.; and Taylor, J. 2015. On Logicist Agent-Based Economics. In Proceedings of Artificial Economics 2015 (AE 2015). Porto, Portugal: University of Porto.
  • [Buss1994] Buss, S. R. 1994. On gödel’s theorems on lengths of proofs i: Number of lines and speedup for arithmetics. Journal of Symbolic Logic 59(3):737–756.
  • [Hofstadter and Mitchell1995] Hofstadter, D. R., and Mitchell, M. 1995. The Copycat Project: A Model of Mental Fluidity and Analogy-making. In Hofstadter, D. R., and the Fluid Analogies Research Group., eds., Fluid Concepts and Creative Analogies: Computer Models of the Fundamental Mechanisms of Thought. Basic Books, Inc.
  • [Hofstadter1984] Hofstadter, D. R. 1984. The Copycat Project: An Experiment in Nondeterminism and Creative Analogies. Technical report, Massachusetts Institute of Technology, Massachusetts Institute of Technology.
  • [Kaliszyk et al.2018] Kaliszyk, C.; Urban, J.; Michalewski, H.; and Olsák, M. 2018. Reinforcement learning of theorem proving. CoRR abs/1805.07563.
  • [Kaliszyk, Chollet, and Szegedy2017] Kaliszyk, C.; Chollet, F.; and Szegedy, C. 2017. Holstep: A machine learning dataset for higher-order logic theorem proving. In Proceedings of ICLR 2017.
  • [Lederman, Rabe, and Seshia2018] Lederman, G.; Rabe, M. N.; and Seshia, S. A. 2018. Learning heuristics for automated reasoning through deep reinforcement learning. CoRR abs/1807.08058.
  • [Licato et al.2013] Licato, J.; Govindarajulu, N. S.; Bringsjord, S.; Pomeranz, M.; and Gittelson, L. 2013. Analogico-Deductive Generation of Gödel’s First Incompleteness Theorem from the Liar Paradox. Proceedings of the 23rd Annual International Joint Conference on Artificial Intelligence (IJCAI-13).
  • [Licato et al.2014] Licato, J.; Bringsjord, S.; Atkin, K.; Borkowski, M.; Cusick, J.; Eastlack, K.; Marton, N.; Pane-Joyce, J.; and Whitehead, S. 2014. ‘The Brilliant Boardroom’: Cognitive Computing with the DCEC* and ADR. In Proceedings of the IBM Research Cognitive Systems Colloquium.
  • [Mitchell1993] Mitchell, M. 1993. Analogy-Making as Perception : A Computer Model. Cambridge, Massachusetts: The MIT Press.
  • [O’Connor2005] O’Connor, R. 2005. Essential incompleteness of arithmetic verified by coq. CoRR abs/cs/0505034.
  • [Paulson2014] Paulson, L. C. 2014. A machine-assisted proof of Gödel’s incompleteness theorems for the theory of hereditarily finite sets. The Review of Symbolic Logic 7(3):484–498.
  • [Paulson2015] Paulson, L. C. 2015. A mechanised proof of Gödel’s incompleteness theorems using nominal Isabelle. Journal of Automated Reasoning 55(1):1–37.
  • [Piotrowski and Urban2019] Piotrowski, B., and Urban, J. 2019. Guiding theorem proving by recurrent neural networks. CoRR abs/1905.07961.
  • [Sieg and Field2005] Sieg, W., and Field, C. 2005. Automated Search for Gödel’s Proofs. Annals of Pure and Applied Logic 133(1-3):319–338.
  • [Smith2007] Smith, P. 2007. An Introduction to Gödel’s Theorems. Cambridge, UK: Cambridge University Press.
  • [Wang et al.2017] Wang, M.; Tang, Y.; Wang, J.; and Deng, J. 2017. Premise Selection for Theorem Proving by Deep Graph Embedding. In Guyon, I.; Luxburg, U.; Bengio, S.; Wallach, H.; Fergus, R.; Vishwanathan, S.; and Garnett, R., eds., Proceedings from Neural Information Processing Systems (NIPS) 2017.