Explosive Proofs of Mathematical Truths

by Scott Viteri, et al.
Carnegie Mellon University

Mathematical proofs are both paradigms of certainty and some of the most explicitly-justified arguments that we have in the cultural record. Their very explicitness, however, leads to a paradox, because their probability of error grows exponentially as the argument expands. Here we show that under a cognitively-plausible belief formation mechanism that combines deductive and abductive reasoning, mathematical arguments can undergo what we call an epistemic phase transition: a dramatic and rapidly-propagating jump from uncertainty to near-complete confidence at reasonable levels of claim-to-claim error rates. To show this, we analyze an unusual dataset of forty-eight machine-aided proofs from the formalized reasoning system Coq, including major theorems ranging from ancient to 21st Century mathematics, along with four hand-constructed cases from Euclid, Apollonius, Spinoza, and Andrew Wiles. Our results bear both on recent work in the history and philosophy of mathematics, and on a question, basic to cognitive science, of how we form beliefs, and justify them to others.


1 Model

Our model of belief formation is based on two core features of proofs. First, while proofs are usually presented in a linear narrative, most refer back to the same claims at different places. This turns a linear deduction into a network of interacting claims. A proof that combines two independent lines of reasoning at a particular point may be robust to counterexamples that invalidate one of those paths (robust): in this way, an $\epsilon$ failure rate may be improved, by orders of magnitude, to $\epsilon^2$. As von Neumann established in the case of faulty computer logic gates (von1956probabilistic), multiple paths in a modular organization can overcome noise.
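The arithmetic behind this claim can be sketched in a few lines. The snippet is an illustration only: it assumes that every step fails independently with the same probability (called `eps` here, a name introduced for the example), and it compares a single linear chain with several independent chains that all reach the same claim.

```python
def chain_failure(eps: float, steps: int) -> float:
    """Probability that at least one of `steps` independent steps fails."""
    return 1.0 - (1.0 - eps) ** steps


def redundant_failure(eps: float, steps: int, paths: int = 2) -> float:
    """Probability that every one of `paths` independent chains fails."""
    return chain_failure(eps, steps) ** paths


if __name__ == "__main__":
    eps = 0.01
    for steps in (1, 10, 100):
        single = chain_failure(eps, steps)
        double = redundant_failure(eps, steps)
        print(f"steps={steps:4d}  one path: {single:.4f}  two paths: {double:.4f}")
```

For a single step the two-path failure probability is simply the square of the one-path value. For long chains the gain shrinks, which is why the number of paths has to grow with the size of the argument.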

Second, reasoning on the basis of coherence, intuition, or analogy can also support evidentiary flow “down” from conclusions as well as “up”, deductively, from axioms (maddy1988believing, corfield). A proof that $1+1=2$ (rw) helps establish the validity of the axioms and propositions that precede it, rather than resolving lingering doubts about elementary school arithmetic. A wide consensus on the truth of the Clay Institute’s Millennium Prize Problems guides the mathematician’s attempts to solve them, providing at least provisional support to the claims that underpin them (villani).

Pólya’s Patterns of Plausible Inference is a famous argument for the importance of this downward direction. It corresponds to Peircean abduction (pei, eco1988sign), an epistemic process that now plays a central role in the study of contemporary mathematical practice (zalamea). Abduction can, of course, be just as fallible as deduction, as demonstrated by long-standing gaps in proofs of famous theorems such as the Euler characteristic (lakatos2015proofs), where the intuitive truth of steps in the proof leads one to neglect flaws in more basic claims, and the historically unexpected conclusion of Gödel’s Incompleteness Theorem.

On this basis, we model mathematical belief-formation as the navigation of a network of claims where evidentiary support includes multiple, potentially bi-directional, pathways. Such networks are also the basic structures for coherence theories of belief formation in cognitive science (thagard1989explanatory, nont, jdm). We emphasise that models of this form capture the fallible, real-world process of reasoning, not the parallel normative process; the same network coherence perspective can model both (objectively) true and false beliefs; see, e.g., Ref. (rosenberg2006multiple).

To specify a general model of belief formation on these networks, we invoke three requirements. First, we require that the degree of belief in any particular claim is (all other things being equal) conditional solely on its dependencies (the claims that support it in a deductive fashion), and those that depend on it in turn; we allow for differing strengths of these respectively deductive and abductive pathways. Second, following standard models in Bayesian cognition, we require this dependency be additive in log-space. Finally, we require that the model is otherwise unconstrained in the patterns of beliefs that can be formed. Taken together, these requirements lead to a minimal (billbirds) and unique “maximum entropy” model (see Materials and Methods), corresponding to a probabilistic version of constraint satisfaction (tkavcik2013simplest, THAGARD19981). (Such a model also provides a framework for extensions to include more complex, synergistic effects; for brevity, we do not consider them here.)
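One way to make the first two requirements concrete is a local update rule in which the log-odds of a claim are a weighted sum over its neighbours. The sketch below is an illustrative assumption, not the model specified in Materials and Methods: the names `beta_d` and `beta_a`, the +1/-1 encoding of true and false, and the random-order resampling scheme are all introduced here for the example.

```python
import math
import random


def belief_probability(node, state, parents, children, beta_d, beta_a):
    """Conditional probability that `node` is true given its neighbours.

    Truth values are +1 (true) and -1 (false). Support is additive in
    log-odds: `beta_d` weights the claims the node depends on, `beta_a`
    weights the claims that depend on it.
    """
    log_odds = beta_d * sum(state[p] for p in parents[node])
    log_odds += beta_a * sum(state[c] for c in children[node])
    return 1.0 / (1.0 + math.exp(-log_odds))


def equilibrate(state, parents, children, beta_d, beta_a, sweeps=100, seed=0):
    """Resample every node, in random order, `sweeps` times; returns a new state."""
    rng = random.Random(seed)
    state = dict(state)
    nodes = list(state)
    for _ in range(sweeps):
        rng.shuffle(nodes)
        for node in nodes:
            p_true = belief_probability(node, state, parents, children,
                                        beta_d, beta_a)
            state[node] = 1 if rng.random() < p_true else -1
    return state


# Toy "diamond" proof (hypothetical): one axiom, two lemmas, one theorem.
parents = {"axiom": [], "lemma1": ["axiom"], "lemma2": ["axiom"],
           "theorem": ["lemma1", "lemma2"]}
children = {"axiom": ["lemma1", "lemma2"], "lemma1": ["theorem"],
            "lemma2": ["theorem"], "theorem": []}
```

Averaging the sampled states over many sweeps gives a degree of belief for each claim. Setting `beta_a` to zero recovers purely deductive, upward-only support.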

Despite its simplicity, this model makes three predictions directly relevant to the question at hand. First that, under certain conditions, the very things that lead to the reliability problem can become a virtue. The physical analogues of our model are known to undergo phase transitions, where small changes in a control parameter can lead to (approximately) discontinuous shifts in global properties. Here, in the cognitive domain, the control parameter corresponds to the local degree of dependency (i.e., how good the reasoner is, or thinks he is, about drawing the correct conclusion), while the global property is the average degree of belief in a claim.

The existence of such transitions is sensitive to network structure: they cannot happen, for example, for a linear network, nor indeed for any tree-like network of finite ramification (gefen1984phase). However, when the topological demands are met, justification becomes easier, not harder, as the network size, $N$, increases, with (depending on the structure of the network) a sharp transition to total deductive certainty in the limit of large $N$.

As mentioned above, this is due to the emergence of multiple paths between any two claims. On the one hand, error compounds exponentially along any particular path, as it does in any linear chain. On the other, the number of distinct paths between points can, depending on network structure, grow exponentially (dedeo2012dynamics). At a critical point, the exponential decay is balanced by the exponential growth, and influence can propagate undiminished across the entire network (stanley).
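The balance between the two exponentials can be illustrated with a toy calculation. It rests on assumptions that are not in the text: a symmetric per-step error rate `eps`, so that the correlation carried along a path of length `length` is `(1 - 2*eps) ** length`; a regular layered network in which the number of paths grows as `branching ** length`; and a heuristic in which the contributions of different paths simply add.

```python
def path_influence(eps: float, length: int) -> float:
    """Correlation surviving `length` noisy steps with symmetric error `eps`."""
    return (1.0 - 2.0 * eps) ** length


def total_influence(eps: float, branching: float, length: int) -> float:
    """Heuristic sum over all paths: (number of paths) x (influence per path)."""
    return (branching * (1.0 - 2.0 * eps)) ** length


def critical_error_rate(branching: float) -> float:
    """Error rate at which decay along paths exactly offsets path growth."""
    return 0.5 * (1.0 - 1.0 / branching)


if __name__ == "__main__":
    b = 2.0
    print("critical error rate:", critical_error_rate(b))
    for eps in (0.20, 0.25, 0.30):
        print(eps, total_influence(eps, b, length=50))
```

Below the critical error rate the summed influence grows with distance, above it the influence dies away, and at the critical value it is carried across the network undiminished.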

Epistemic phase transitions (EPTs) are a double-edged sword, however, because disbelief can propagate just as easily as truth. A second prediction of the model is that this difficulty, the explosive spread of skepticism, can be ameliorated when the proof is made of modules: groups of claims that are significantly more tightly linked to each other than to the rest of the network.

When modular structure is present, the certainty of any claim within a cluster is reasonably isolated from the failure of nodes outside that cluster. This means that a reasoner can come to believe sub-parts of the overall network without needing to resolve the entire system at once. Such a mechanism has been hypothesised to exist in the case of mathematical proofs (avigad2018modularity).

Modules can be identified by standard clustering algorithms such as Girvan-Newman (newman2004finding). The subsequent robustness can be tested by comparing the relative difficulty of forming a belief within a module compared to that of forming a belief in an arbitrary collection of nodes (see Materials and Methods).

A third prediction of the model concerns the balance of deductive and abductive reasoning. Informally, one would imagine that increasing confidence in either process would aid the overall confidence in the proof: at a given level of deductive confidence (say), I can only become more certain by increasing confidence in my abductive intuitions.

This turns out not to be the case: for a fixed level of deductive confidence, increasing abductive confidence can lead to lower levels of certainty, and similarly in reverse. This is because, at a critical point, abduction can come to dominate deduction completely, leading to solely downward propagation and cutting off the proliferation of paths necessary for the EPT. This destroys a key topological feature necessary for a phase transition: there are now fewer paths of influence between nodes and, for example, theorem “siblings” can no longer reinforce each other.

We refer to this as the abductive paradox. Informally, the proof becomes dependent solely on the mathematician’s belief in the conclusion: doubts propagate downwards, and even the best axioms are powerless to overcome them.

2 Data

In order to determine if real-world mathematical theorems have the necessary properties to trigger the epistemic effects described in the previous section, we analyze two datasets. First, forty-eight machine-assisted proofs constructed by mathematicians with the aid of the formal verification system Coq (plugin), ranging from the Pythagorean Theorem to the Four-Color Theorem; see Table 1.

Proofs in a formal verification system are constructed by a mathematician who then invokes machine-implemented heuristics to fill in the gaps. The proofs themselves are interpretable, if exceedingly pedantic; see Materials and Methods for a proof that the number four is even. We extract the abstract syntax trees representing the underlying deductions, and then identify equivalent claims, which turns these trees into directed acyclic graphs.

This dataset is supplemented with four “human” proofs: the original texts of Euclid’s Geometry, Apollonius’ Conics, and Spinoza’s Ethics (all of which mark explicit dependencies), and a hand-coded network based on a close-reading of Andrew Wiles’ 1995 proof of Fermat’s Last Theorem (wiles1995modular). Although human networks are orders of magnitude smaller than those supplemented with the fine-grained deductions of a machine, the comparison allows us to find, and confirm, the similarities between the two.

The simultaneous examination of both machine- and human-proofs provides an important check on our claims on the epistemic status of mathematical knowledge. Human mathematicians may plausibly have introspective access to the ways in which they come to believe something, but there is no guarantee that they match the actual reasoning process itself. Machines, through expansions that are orders of magnitude larger than the self-reported steps in the corresponding human case, make visible what is idealised and implicit in human communication. Despite the vast technological gulf between them, the machine-aided proofs of the twenty-first century share, as we shall see, the same basic epistemic properties as Euclid’s.

3 Results

We present our results in three sections. First, the topological properties of the proof networks; second, the emergence of epistemic phase transitions and the existence of modular firewalls; and finally, the abductive paradox.

3.1 Network Structure

Figure 1: Implication structure for four proof networks in our database. Clockwise from top left: the Four Color Theorem (Coq), the uncountability of the Reals (Coq), Gödel’s First Incompleteness Theorem (Coq), and Euclid’s Geometry (original Greek Text). We color the top clusters by membership, and size nodes according to out-degree (i.e., the number of nodes that have that node as a deductive pre-requisite). Both human and machine-aided proofs are characterized by high levels of modularity, and a heavy-tailed distribution of out-degree.
Figure 2: Distribution of in-degrees (dashed fit) and out-degrees (solid fit) for nodes in the automated proof of the Four Color Theorem and for Gödel’s First Incompleteness Theorem. While any node depends on a small number of others, following an exponential distribution, the usage of a node in further claims follows a heavy-tailed distribution, with power-law index around two.
Theorem                     Nodes
Machine-aided proofs (Coq)
Euclid’s Geometry           174,597
1st Gödel Incompleteness    28,984
Bertrand’s Ballot           24,137
Polyhedron Formula          23,750
Euler’s FLiT                22,444
Bertrand’s Postulate        22,434
F.T. Algebra                20,431
Subsets of a Set            20,205
Pythagorean Theorem         18,230
Desargues’s Theorem         18,213
Taylor’s Theorem            17,809
Heron’s Formula             17,487
F.T. Calculus               16,845
Binomial Theorem            16,314
Geometric Series            16,160
Wilson’s Theorem            16,120
Sylow’s Theorem             15,942
Ceva’s Theorem              15,279
Bezout’s Theorem            14,909
Reals Uncountable           14,574
Int. Value Theorem          14,467
Quadratic Reciprocity       14,397
Triangle Inequality         13,657
Leibniz                     13,619
Pythagorean Triples         13,254
Rationals Denumerable       13,108
Isosceles Triangle          13,055
Div 3 Rule                  13,037
Inclusion-Exclusion         12,886
Cauchy-Schwarz              12,647
Four Color Theorem          12,407
Factor & Remainder          11,815
Birthday Paradox            11,692
Liouville’s Theorem         11,645
Cayley-Hamilton             11,407
F.T. Arithmetic             11,362
Cubic Solution              11,271
GCD Algorithm               10,792
Cramer’s Rule               10,613
Subgroup Order              10,583
Mean Value Theorem          10,168
Ramsey’s Theorem            7,747
Schroeder-Bernstein         1,331
Triangle Angles             739
Powerset Theorem            282
Prime Squares               250
Pascal’s Hexagon            150
Induction Principle         40 n.d.
Human proofs
Euclid’s Geometry           475
Apollonius’s Conics         446
Spinoza’s Ethics            572
Wiles’s FLT                 142
Table 1: The statistics of dependence and implication in machine- and human-proved theorems (F.T.: “Fundamental Theorem”; FLiT: “Fermat’s Little Theorem”). Both are characterized by high levels of modularity, and a long, power-law tail associated with assembly-and-tinkering construction. Over the entire dataset, machine proofs have a equal to . : average degree of belief in final theorem at one-step error rate of ; indicates . : log-likelihood penalty to within-module flip at equal to unity. Networks are truncated to the first depth expansion that produces more than 10,000 nodes, where possible otherwise to maximum depth.

Fig. 1 presents four examples of the proof networks we use in this analysis, with modules identified by the Girvan-Newman algorithm coloured, and with node size indicating out-degree. Fig. 2 shows the in- and out-degree distributions for the machine proofs of the Four Color Theorem and Gödel’s First Incompleteness Theorem. While in-degree (i.e., the number of prior nodes a particular claim depends on) is exponentially distributed, the out-degree (i.e., the number of nodes that use that claim) has a heavy-tailed distribution, with a fraction of nodes having influence hundreds of times larger than average. This heavy-tailed distribution follows a power law, with the probability $P(k)$ of a node having degree $k$ given by

$$P(k) \propto k^{-\alpha}.$$

Across our sample of both machine and human proofs, the values of $\alpha$ cluster tightly around two (Table 1; fit using the methods of Ref. (clauset2009power)). This pattern, of both an exponential distribution for in-degree, and the particular value of $\alpha$ for the out-degree power-law tail, is a characteristic sign of the generative assembly model of Ref. (redner, redner2). This construction process has two steps: first, a new node chooses some number of nodes to depend upon; second, from that set of chosen nodes, it chooses to link to some of their dependencies in a probabilistic fashion. It is found in both cultural and biological systems governed by successive accretion of links in a distinctive pattern associated with opportunistic tinkering and reuse (sole).
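The two-step construction just described can be sketched as a small growth simulation. This is a minimal illustration under stated assumptions, not the procedure of Ref. (redner, redner2): the parameter names `n_targets` and `copy_prob`, the uniform choice of targets, and the independent copying of each dependency are all choices made for the example.

```python
import random


def grow_network(n_nodes, n_targets=1, copy_prob=0.5, seed=0):
    """Grow a dependency network by 'tinkering and reuse'.

    Returns a dict mapping each node to the set of earlier nodes it
    depends on. Node 0 plays the role of an axiom with no dependencies.
    """
    rng = random.Random(seed)
    deps = {0: set()}
    for new in range(1, n_nodes):
        # Step 1: choose some existing nodes to depend upon.
        targets = rng.sample(range(new), min(n_targets, new))
        chosen = set(targets)
        # Step 2: also link to some of their dependencies, at random.
        for target in targets:
            for dep in sorted(deps[target]):
                if rng.random() < copy_prob:
                    chosen.add(dep)
        deps[new] = chosen
    return deps


def out_degrees(deps):
    """Number of nodes that use each node as a dependency."""
    counts = {node: 0 for node in deps}
    for node_deps in deps.values():
        for dep in node_deps:
            counts[dep] += 1
    return counts


if __name__ == "__main__":
    network = grow_network(2000, n_targets=2, copy_prob=0.5, seed=1)
    usage = sorted(out_degrees(network).values(), reverse=True)
    print("largest out-degrees:", usage[:5])
```

Because new nodes only ever link to older ones, the result is acyclic by construction. Early nodes that are copied repeatedly accumulate the large out-degrees that make up the tail of the distribution.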

3.2 Epistemic Phase Transitions and Modular Firewalls

As described in Materials and Methods, we look for epistemic phase transitions as a function of both deductive and abductive implication strength: the degree to which the truth of a claim is coupled to the truth of either a claim it depends on, or a claim that it implies. We parameterize these by two terms, $\beta_d$ and $\beta_a$, for the two pairwise effects of truth (or falsehood).

On the abductive side, $e^{\beta_a}$ is the multiplicative factor by which a correct implication makes the node more likely to be true; on the deductive side, $e^{\beta_d}$ is the multiplicative factor by which a correct deduction makes the node more likely to be true (see Materials and Methods for discussion of alternative choices). Taking (for simplicity) a symmetric error-making model, where the probability of incorrectly drawing a false conclusion from a true premise is the same as drawing a true conclusion from a false premise (and respectively for the abductive case), implies an error rate of

$$\epsilon = \frac{1}{1 + e^{\beta}},$$

with $\beta$ standing for $\beta_d$ or $\beta_a$ as appropriate, which corresponds to the error rate of Hume’s original paradox.
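The link between a multiplicative factor and a one-step error rate can be checked numerically. The sketch assumes only what the symmetric model states: if a correct premise makes a claim `factor` times more likely to be true than false, the claim is wrong with probability `1 / (1 + factor)`. The function names are introduced here for illustration.

```python
def error_rate(factor: float) -> float:
    """One-step error rate when truth is `factor` times likelier than falsehood."""
    return 1.0 / (1.0 + factor)


def factor_for(eps: float) -> float:
    """Multiplicative factor needed to reach a one-step error rate `eps`."""
    return (1.0 - eps) / eps


if __name__ == "__main__":
    for eps in (0.25, 0.10, 0.01):
        print(f"error rate {eps:.2f} needs a factor of {factor_for(eps):.1f}")
```

A factor of one corresponds to pure guessing (error rate one half), and the factor grows without bound as the error rate approaches zero.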

Figure 3: Top: Epistemic phase transitions in three theorems: Cantor’s theorem on the uncountability of the Reals, the Four Color Theorem, and Theorem IX.36 (the form of perfect numbers) in Euclid’s Geometry. Solid lines indicate average degree of belief over all steps of the proof; dashed lines, in the theorem itself; dotted lines, in the axioms. Bottom: the relationship between prior and posterior, after equilibrating to the heuristic model, at an inference error rate of . This error rate puts all three proofs past the phase transition point, and this means that even weak priors lead to near-unity degrees of belief. Remnant uncertainty at priors close to is due in part to frustrated freeze-in, i.e., where modules fall separately into all-true or all-false states.

Fig. 3a shows epistemic phase transitions in action for three proofs: Cantor’s theorem on the uncountability of the Reals (Coq, ), the Four Color Theorem (Coq, ), and the (arbitrarily chosen) Proposition IX.36, the form of perfect numbers, of Euclid’s Geometry with dependencies taken from the original Greek text (). For simplicity in this case, we have set equal to , and equal to . We plot three quantities: the average degree of belief over all steps of the proof, the average degree of belief in the final theorem, and the average degree of belief in the axioms. The three proofs in question show different certainty structures (for example, belief in the full proof lags that of both the theorem and axioms in the Euclidean case, while the reverse is true for the Four Color Theorem and the uncountability of the Reals), but share an overall pattern. At cognitively-plausible error rates, the graph structure leads to a sharp transition where high levels of certainty emerge even when error rates are at levels that would invalidate proofs made by deductive reasoning alone.

Column three of Table 1 shows , the average degree of belief in the theorem (i.e., the terminal node) at a one-step error rate of . The majority reach near-unity levels of that exceed the one-step confidence. There are a few cases where this does not happen (e.g., Desargues’s Theorem); this appears to be due, in part, to the presence of nodes just below the final theorem that have both few dependents and no other implications—these dangling assumptions participate only weakly in the larger network of justification.

Fig. 3b shows the effect of shifting the prior. Past the critical point, even weak priors can lead to the transition to deductive certainty; in the physics-style language of phase transitions, this is the finite-size analogue of a divergence in the susceptibility. Failure in the case of weak priors is due to the emergence of domain walls, i.e., localized parts of the network that freeze into all-false or all-true states. This can lead either to (1) a cascade into the all-false condition driven by the small-number statistics of the fluctuations, or (2) a long-lived metastable state because interconnections are insufficiently strong to generate global consensus. Modular structure, which we discuss now, allows a practicing reasoner to escape the metastable state.

In particular, the resolution of our data allows us to characterize how modular structure creates topological “firewalls” that allow different parts of a proof to decouple from each other. We compute the relative log-likelihood penalty of flipping all the nodes in a module to the opposite truth value versus flipping the same number of nodes randomly chosen across the whole graph. We characterize this using , the log-likelihood penalty per node flipped, with the number of nodes set to ten and set to unity.

At high error rates, firewalls are fragile because there is little opportunity for order to propagate at any distance. However, as error decreases and order emerges, the distinct effects of within-module versus cross-module flips become apparent: the tighter connections between nodes within a module make them easier to shift to the opposite state. As mathematicians increase their confidence in a proof, they find they can first achieve confidence in a particular module of the derivation even in the absence of strong beliefs about the truth in other places. This means that a modular proof strategy is easier than one that involves different parts of the system. In Table 1 we list the values, where a positive value indicates that within-module flips are more likely than cross-module ones. Values are around , corresponding to an overwhelming preference (at the level) for within-, rather than cross-, module flips.
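The comparison between within-module and cross-module flips can be illustrated on a toy graph. The sketch below is an assumption-laden stand-in for the calculation in Materials and Methods: it uses a symmetric pairwise log-likelihood with a single coupling `beta`, starts from the all-true state, and picks the cross-module set by hand rather than at random. All names are introduced for the example.

```python
def log_likelihood(state, edges, beta):
    """Unnormalised log-likelihood: +beta per agreeing edge, -beta per disagreeing one."""
    return beta * sum(state[i] * state[j] for i, j in edges)


def flip_penalty_per_node(nodes_to_flip, all_nodes, edges, beta=1.0):
    """Log-likelihood lost, per node, by flipping `nodes_to_flip` from all-true."""
    before = {n: 1 for n in all_nodes}
    after = dict(before)
    for n in nodes_to_flip:
        after[n] = -1
    drop = log_likelihood(before, edges, beta) - log_likelihood(after, edges, beta)
    return drop / len(nodes_to_flip)


def clique(nodes):
    """All pairwise edges among `nodes`."""
    return [(a, b) for i, a in enumerate(nodes) for b in nodes[i + 1:]]


# Two tightly linked modules joined by a single bridging edge.
module_a = [0, 1, 2, 3]
module_b = [4, 5, 6, 7]
all_nodes = module_a + module_b
edges = clique(module_a) + clique(module_b) + [(3, 4)]

within = flip_penalty_per_node(module_a, all_nodes, edges)
across = flip_penalty_per_node([0, 1, 4, 5], all_nodes, edges)
print("within-module penalty per node:", within)
print("cross-module penalty per node:", across)
```

Flipping a whole module only breaks the edges on its boundary, so its penalty is much smaller than that of a set of the same size spread across modules. The difference between the two penalties plays the role of the firewall measure discussed above.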

3.3 The Abductive Paradox

Figure 4: Average degree of belief in four theorems and their preconditions, as a function of abductive and deductive error rates. Network structure leads to levels of confidence far in excess of what can be expected on the basis of a linear derivation chain. For fixed deductive (abductive), but rising abductive (deductive) confidence, contours turn over, leading to an abductive paradox driven by an imbalance in the two modes of reasoning.

Fig. 4 shows four examples of our final result: that, at a fixed level of deductive power, increasing abductive power can eventually lead to a degradation in the final degree of belief. In each case, for deductive (or abductive) certainty beyond the transition point, we see the contours of constant belief turn over. A vertical (or horizontal) line drawn past the (rough) EPT point of equal to will eventually cross the contours going downwards in certainty. Contour plots for all the theorems discussed can be found in the SI Appendix.

4 Discussion

Our account of the emergence of mathematical belief depends on the use of paths that go both against and with the deductive grain to generate an epistemic phase transition. While this poses a challenge to the purely deductive model, the heuristics that underlie the EPT fit naturally with accounts that balance abduction and deduction, allow intuition to play a role in the status of a claim without coming to dominate deduction, and allow those intuitions to develop over time and in the course of examining a proof.

Consider, as an analogy, the use of computer code in a research project. Suppose I mostly believe some fact A, and I write a complex computer program to check and increase my confidence in that belief. If the program produces some output B that contradicts A, then I will likely first check the program itself for errors, a move that corresponds to doubting the axioms, or earlier stages of reasoning, in abductive fashion. Later, on reflection, I might realize that the output B is actually more intuitive than A; this will now have the opposite effect, and act to increase my confidence in the earlier stages of the code even if I do no further checks. Few theories of belief formation would rule out the analogous process in mathematical reasoning, which is also found in informal accounts by practitioners (villani). A more elaborate reflection on the relationship between deduction and abduction, in the context of communication and social justification, is provided by the mathematician and computer scientist Scott Aaronson.

[A] step-by-step logical deduction tends to be seen as merely the vehicle for dragging the reader or listener, kicking and screaming, toward a gestalt switch in perspective—a switch that, once you’ve succeeded in making it, makes the statement in question (and more besides) totally obvious and natural and it couldn’t ever have been otherwise. The logical deduction is necessary, but the gestalt switch is the point. This, I think, explains the feeling of certainty that mathematicians express once they’ve deeply internalized something—they’re not multiplying out probabilities of an oversight in each step, they’re describing the scenery from a new vantage point that the steps helped them reach (after which the specific steps could be modified or discarded). (scott_quote)

We emphasize that our results bear on real-world practice, rather than any underlying normative justification. A theorem that contains an error in its logic may have little trouble deriving all sorts of (abductively) reasonable conclusions, and thereby lead mathematicians to believe, incorrectly but explosively, in its truth. (It may be the case that the kinds of errors that invalidate proofs, in ways that cannot be fixed, have distinct topological structures that prevent the emergence of an EPT. Such a determination requires a parallel database of invalid proofs.)

That the path scaling required for an EPT can be achieved with a network structure associated with tinkering and reuse suggests that the method of construction may itself aid in the method of justification. Such a process fits qualitative accounts of how proofs are made. Lakatos’ dialectic model, for example, presented in Proofs and Refutations, emphasises tinkering by making proof an embedding of the truth of a conjecture into other areas of mathematics. The proof provides avenues for directed criticism, and both conjecture and proof are co-modified in response to the incremental introduction of local and global counterexamples. The structure of the final theorem is a product of this dynamic interplay.

Finally, our results allow us to draw some conclusions about the nature of proofs produced without machine aid. In the case of Wiles’ proof of Fermat’s Last Theorem, for example, we see significant deviations from the power-law tail with index equal to two. This is due, in part, to a thinner network structure in which a text designed for human communication neglects to explicitly mention its reliance on common axioms or lemmas at every point they occur. This leads to a deficit of high-degree nodes whose absence frustrates an epistemic phase transition because they are no longer available as “Grand Central Stations” that increase the number of paths between points. A second example is Spinoza’s Ethics, the one non-mathematical text in our set, which achieves slightly lower levels of overall certainty than others. The Ethics is devoted to a philosophical account of the nature of matter and knowledge, albeit “in geometric order”, i.e., in an attempt to parallel the deductive certainty of the arguments aimed at establishing mathematical facts.

We emphasize, however, that there are more commonalities than differences between the purely human case and the machine-aided ones, even at the level of quantitative comparison. This suggests that systems like Coq may be of use not just for verification and validation of mathematical claims, but also for their insight into the nature of mathematical practice and cognition itself.

5 Conclusion

Ever since their invention in a cultural context associated with new forms of justification (axial), mathematical proofs have provided us with some of the most explicit examples of human reasoning we have available. They are an account, intended for use by other members of the community, of why we ought to believe something that attempts to be immune, by its very explicitness, to every objection.

Seen in this way, proofs are a test-bed for justification practices more generally. Our results suggest that the confidence provided by an epistemic phase transition may also be present, in latent form, in many other kinds of claims we make in day-to-day life. As noted by Ref. (mercier2017enigma), no piece of evidence—even one concerning the evidence of the senses—is transparent, and we can always be called upon to situate it in the context of a larger argument. This means that people expand claims about the physical world, or normative claims about how things ought to be, when they are asked for further justification. Those expansions, if constructed via a process similar to tinkering and reuse, ought to be able to support the same kinds of phenomena we establish here.

The combination of modularity, abduction, and deduction, as well as the underlying assembly mechanism of tinkering and reuse, appears to have the power to generate significant levels of certainty even for complex and apparently fragile arguments. The networks we have studied here may be something that we create when called upon to justify our beliefs in non-mathematical claims as well.

The justification of beliefs through reason is a basic task of the human species. Mathematical proofs provide an unusual example of what many cultures consider an ultimate standard. Our findings here suggest that underlying features of that justification can lead to firm beliefs, even when we understand ourselves to be fallible and limited beings.

6 Materials and Methods

Networks for machine proofs are drawn from https://madiot.fr/coq100/, which lists and locates Coq formalizations of famous theorems; networks for the human proofs are built from standard editions.

6.1 Network Construction

Dependency networks for machine proofs are constructed from the abstract syntax trees of Gallina terms, where Gallina is Coq’s specification language. Terms in Gallina correspond to proofs of the specification given by the term’s type. Since our dependency graphs are derived from such terms, the existence of node X can be interpreted as asserting the existence of inhabitants of a particular type; e.g., “X is an object of type ‘Two is Even’.” Constructing X in this case corresponds to having a proof of the proposition that two is even, via the Curry-Howard Correspondence. Types in Coq are based on the Calculus of Inductive Constructions, a dependently typed lambda calculus. Hence types themselves can be parameterized by terms. For example, a node in a Coq network might be a function F that takes a natural number X, a proof that X is even, and returns a proof that X+X is even.

Here we will take such a proof that 4 is even and show an example transformation from Coq syntax to a reified tree to a directed acyclic graph.

First we define evenness inductively — all even numbers are either zero or two greater than an even number.

Inductive ev : nat -> Prop :=
| ev_0 : ev 0
| ev_SS : forall n : nat, ev n -> ev (S (S n)).

Theorem ev_2 : ev 2.
Proof. apply (ev_SS 0 ev_0). Qed.

Then we prove that two even numbers sum to an even number by induction on the proof that the first argument is even. This is combined with the previous proof that two is even into a proof that four is even.

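Theorem add_even_even :
  forall {n m : nat}, ev m -> ev n -> ev (m + n).
Proof.
  intros n m Hm Hn.
  (* Induct on the evidence that m is even. *)
  induction Hm.
    { simpl. apply Hn. }
    { simpl. apply ev_SS. apply IHHm. }
Qed.

Theorem ev_4 : ev 4.
Proof.
  (* n and m are implicit, so only the two evidence terms are passed. *)
  apply (add_even_even ev_2 ev_2).
Qed.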

PrintAST ev_4 with depth 1.

In the last line we call our fork of University of Washington’s CoqAST plugin (https://github.com/scottviteri/CoqAST). We substitute type constructor names for indexes into constructors, such as “S” for “(Constr nat 1)”. To prevent blow-up in output size, we do not expand axioms or definitions of inductive types.

  (App Top.add_even_even
    (App S (App S O)) ;2
    (App S (App S O))

If we print the AST to depth 2, we get the following:

(Definition Top.ev_4
    (App Top.add_even_even
        (App S (App S O))
        (App S (App S O))

(Definition Top.add_even_even
    (Lambda n_2 nat
    (Lambda m_22 nat
    (Lambda Hm_222 (App ev m_22)
    (Lambda Hn_2222 (App ev n_2)
        (App Top.ev_ind

(Definition Top.ev_2 (App ev_SS O ev_0))

When we print to a greater depth, we look for each definition that has not been elaborated and add its definition as a top-level tree, as demonstrated by Top.ev_4 and Top.add_even_even above. We assemble the acyclic graph iteratively from these trees. We take every depth-two subtree, starting from the leaves, and check whether there is a match in the graph that has been built up so far. If there is no match, then the subtree is added to the graph as a node with children. By induction, no identical pieces of the tree will be added to the graph twice. We add numbers to the names of new variable bindings to ensure that we do not have two identically-named variables in the same scope, which might otherwise be falsely unified during graph creation. This process turns the tree into a directed acyclic graph that flows from axioms and definitions to increasingly high-level theorems. The source code for this process is hosted at https://github.com/scottviteri/ManipulateProofTrees. Fig. 5 shows the graph of the proof that four is even.

Figure 5: The proof that 4 is even, represented as a directed acyclic graph
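The bottom-up graph assembly described above can be illustrated with a minimal Python sketch. It is not the code from the ManipulateProofTrees repository: the tuple encoding of trees, the function name tree_to_dag, and the lookup table are assumptions made for illustration, and the input is only the visible part of the depth 1 output for Top.ev_4. Each subtree is keyed by its label and the ids of its already-inserted children, so an identical subtree is never added twice.

```python
# A tree is either a leaf name (str) or a tuple (label, child, child, ...).
# Hypothetical encoding of the visible part of the depth 1 AST of Top.ev_4.
two = ("App", "S", ("App", "S", "O"))
ev_4_tree = ("App", "Top.add_even_even", two, two)

def tree_to_dag(tree, table, children):
    """Return the DAG node id for `tree`, inserting unseen subtrees bottom-up."""
    if isinstance(tree, str):
        key = (tree,)  # a leaf is identified by its name alone
    else:
        label, *subtrees = tree
        # Children are resolved first, so matching starts from the leaves.
        child_ids = tuple(tree_to_dag(t, table, children) for t in subtrees)
        key = (label, *child_ids)
    if key not in table:  # no match in the graph built so far
        table[key] = len(table)
        children[table[key]] = list(key[1:])
    return table[key]

table = {}     # subtree key -> node id
children = {}  # node id -> ids of child nodes
root = tree_to_dag(ev_4_tree, table, children)
```

In this toy input the two copies of the term for 2 collapse into a single shared node, which is what turns the tree into a directed acyclic graph.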

Networks for the human proofs are constructed by hand, using the references given by the author (i.e., we include a dependency only when it is explicitly named). For example, Proposition 9 of Book I (“I.9”) in Euclid’s Geometry depends on Propositions I.1, I.3, and I.8. Similarly, coding of Wiles’ proof of Fermat’s Last Theorem uses only Wiles’ explicit remarks, e.g., in phrases such as “the first two conditions [of Theorem 3.1] can be achieved using Lemma 1.12”.
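As a small illustration of how such a hand-coded network might be stored, the Python sketch below records the one dependency named in the text and derives the reverse (implication) links from it. The dictionary layout and the function name implications are assumptions; a full network would list every proposition.

```python
# Explicitly cited dependencies, keyed by proposition.
# Only the entry for I.9 is taken from the text.
dependencies = {"I.9": ["I.1", "I.3", "I.8"]}

def implications(deps):
    """Invert the dependency map: for each claim, list the claims that cite it."""
    cited_by = {}
    for claim, cited in deps.items():
        for c in cited:
            cited_by.setdefault(c, []).append(claim)
    return cited_by

used_by = implications(dependencies)
```

Keeping both directions available is convenient because dependencies and implications are treated differently during evidence propagation.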

6.2 Evidence Propagation

Each node in the proof tree is given a binary truth value. We model the (fallible) steps between claims in a maximum-entropy fashion that fixes the average error rate of a deductive step but leaves the system otherwise unconstrained (jaynes1957information). This corresponds to the Ising model,


where the truth value of each claim is zero or one, the coupling matrix is non-zero for a pair of claims when there is an evidentiary link between them (i.e., when one claim invokes the truth of the other as part of its justification), and the coupling strength governs the reliability of that connection; i.e., the extent to which the truth of one claim, given the other, is believed to be correctly inferred. We measure the perceived truth as the time-average of the node; i.e., if the heuristic observer perceives the node to be true 70% of the time, the overall degree of belief is 0.7.

The statistical properties of this model can be simulated using the Metropolis-Hastings (MH) update rule as a heuristic for belief update: a node shifts state (e.g., from believed-true to believed-false) depending on the state of its neighbours, and the strength of that influence depends upon the coupling strength. We change this rule to account for the differential impact of dependencies and implications, i.e., we allow a different coupling strength depending on the direction of the coupling between two nodes. This leads to the asymmetric Ising model, where the effect of one node on another (all other things being equal) may not equal the effect in the reverse direction, used in studies of updating in game-theoretic contexts (galam2010ising).

We distinguish the strength of a dependence from the strength of an implication (i.e., the extent to which belief in a claim derivable from a given claim abductively increases confidence in that claim). We begin our simulations with a (weak) bias in favor of truth: a weakly charitable predisposition for the reader to consider the proof more likely to be true than false before considering evidentiary links. (It is possible to consider scaling the coupling strength with the number of dependencies, to capture the idea that a claim will cite only those things necessary for the proof; because the distribution of dependencies is not very wide, however, this amounts in practice to a simple rescaling.)
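The update rule described above can be sketched in Python as follows. This is a minimal illustration and not the authors' simulation code. It assumes that truth values 0/1 are mapped to spins -1/+1, that each node feels a local field made of a prior bias plus separately weighted contributions from its dependencies and from the claims that cite it, and that a flip is accepted with the usual Metropolis probability. The names simulate_beliefs, beta_dep, beta_imp, and bias, and all parameter values, are hypothetical.

```python
import math
import random

def simulate_beliefs(deps, beta_dep, beta_imp, bias, steps, burn_in, seed=0):
    """Return the time-averaged belief (fraction of time true) for each node."""
    rng = random.Random(seed)
    nodes = sorted(set(deps) | {d for ds in deps.values() for d in ds})
    used_by = {n: [] for n in nodes}  # reverse links: who cites this node
    for claim, cited in deps.items():
        for d in cited:
            used_by[d].append(claim)
    spin = {n: rng.choice([-1, 1]) for n in nodes}  # +1 means believed true
    true_counts = {n: 0 for n in nodes}
    samples = 0
    for step in range(steps):
        n = rng.choice(nodes)
        # Asymmetric local field: dependencies and implications weigh differently.
        field = (bias
                 + beta_dep * sum(spin[d] for d in deps.get(n, []))
                 + beta_imp * sum(spin[c] for c in used_by[n]))
        delta = 2.0 * spin[n] * field  # cost of flipping against the field
        if delta <= 0 or rng.random() < math.exp(-delta):
            spin[n] = -spin[n]
        if step >= burn_in:
            samples += 1
            for m in nodes:
                true_counts[m] += spin[m] == 1
    return {n: true_counts[n] / samples for n in nodes}

# Toy network: the single Euclid dependency named in the text.
beliefs = simulate_beliefs({"I.9": ["I.1", "I.3", "I.8"]},
                           beta_dep=2.0, beta_imp=2.0, bias=1.0,
                           steps=20000, burn_in=2000)
```

Because the couplings are asymmetric, there is no single energy function behind this rule; it is best read as a local heuristic for belief update, in the spirit of the text.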

Finally, firewall strength is defined as


where the definition ranges over the set of all nodes with assigned modules, and compares the change in energy (log-likelihood) when all nodes in a module are flipped to the opposite state with the expectation value of the change in energy when the same number of nodes are chosen randomly from all nodes with module assignments. These are computed from simulations with an untilted prior, which leads to modules freezing into opposite states; however, qualitative results are invariant to tilting the prior in favor of truth. We normalize this quantity by the total number of nodes in the module to get the per-node penalty; this enables us to compare firewall strengths across graphs with different numbers of nodes.
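The comparison between flipping a whole module and flipping the same number of randomly chosen nodes can be sketched in Python as below. This is one possible reading, not the authors' implementation. The sketch assumes a symmetric unit-coupling energy, assumes the two energy changes are compared by a difference that is then averaged over modules, and evaluates a single hand-picked configuration rather than simulation samples. All names (energy, flip_cost, firewall_strength) and the toy graph are hypothetical.

```python
import random

def energy(spin, edges):
    """Assumed energy: minus the number of agreeing links plus disagreeing ones."""
    return -sum(spin[a] * spin[b] for a, b in edges)

def flip_cost(spin, edges, nodes_to_flip):
    """Change in energy when the given nodes are flipped to the opposite state."""
    flipped = dict(spin)
    for n in nodes_to_flip:
        flipped[n] = -flipped[n]
    return energy(flipped, edges) - energy(spin, edges)

def firewall_strength(spin, edges, modules, n_random=200, seed=0):
    """Per-node gap between random-set and whole-module flip costs, averaged over modules."""
    rng = random.Random(seed)
    assigned = sorted(n for members in modules.values() for n in members)
    total = 0.0
    for members in modules.values():
        module_cost = flip_cost(spin, edges, members)
        random_cost = sum(
            flip_cost(spin, edges, rng.sample(assigned, len(members)))
            for _ in range(n_random)) / n_random
        total += (random_cost - module_cost) / len(members)
    return total / len(modules)

# Toy graph: two triangles joined by a single bridging link.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
spin = {n: 1 for n in range(6)}
modules = {"A": [0, 1, 2], "B": [3, 4, 5]}
module_cost = flip_cost(spin, edges, modules["A"])
strength = firewall_strength(spin, edges, modules)
```

In the toy graph, flipping a whole triangle only strains the single bridging link, while a random set of three nodes typically cuts several links, so the module flip is much cheaper than the random baseline.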

7 Acknowledgements

We thank Jeremy Avigad, Kevin Zollman, Scott Aaronson, and Cait Lamberton for helpful discussions, and Kent Chang for assistance with data entry.