
# On Learning to Prove

In this paper, we consider the problem of learning a (first-order) theorem prover where we use a representation of beliefs in mathematical claims instead of a proof system to search for proofs. The inspiration for doing so comes from the practices of human mathematicians where a proof system is typically used after the fact to justify a sequence of intuitive steps obtained by "plausible reasoning" rather than to discover them. Towards this end, we introduce a probabilistic representation of beliefs in first-order statements based on first-order distributive normal forms (dnfs) devised by the philosopher Jaakko Hintikka. Notably, the representation supports Bayesian update and does not enforce that logically equivalent statements are assigned the same probability—otherwise, we would end up in a circular situation where we require a prover in order to assign beliefs. We then examine (1) conjecturing as (statistical) model selection and (2) an alternating-turn proving game amenable (in principle) to self-play training to learn a prover that is both complete in the limit and sound provided that players maintain "reasonable" beliefs. Dnfs have super-exponential space requirements so the ideas in this paper should be taken as conducting a thought experiment on "learning to prove". As a step towards making the ideas practical, we will comment on how abstractions can be used to control the space requirements at the cost of completeness.


## 1 Introduction

The process of discovering a mathematical proof can be seen as a perfect information game where the goal is to show that a path exists (i.e., the proof) between a given starting state (i.e., the axioms) and ending state (i.e., the claim) using a predefined collection of rules (i.e., deduction). Like other perfect information games such as Go and Chess, the complexity of the theorem proving game involves managing the combinatorial nature of the search space. We can do this, for instance, by identifying useful heuristics and patterns. This is one sense in which players can learn and improve from their experiences playing the game.

The idea of “learning from experience” suggests that we can apply machine learning to learn these heuristics and patterns as opposed to distilling them manually from human experience. Towards this end, researchers have demonstrated that machine-learned algorithms can navigate the search spaces of Go (Silver et al., 2016) and Chess (Silver et al., 2017) at a level exceeding human experts (i.e., consistently defeat the best human players). Researchers have also experimented with applying machine learning to theorem provers (e.g., see Komendantskaya et al., 2012; Kaliszyk et al., 2014; Gauthier et al., 2017; Duncan, 2002; Selsam et al., 2018; Kaliszyk et al., 2017; Irving et al., 2016; Loos et al., 2017; Kaliszyk et al., 2018; Huang et al., 2018), although the problem is much more difficult compared to Go and Chess when quantifiers are involved.[^1]

[^1]: The state spaces of Chess and Go, albeit large, are finite. In contrast, quantifiers can range over infinite domains.

In this paper, we consider the problem of learning a prover for first-order logic,[^2] a well-understood setting with quantification, where we directly use a representation of beliefs in mathematical claims to construct proofs.[^3] The inspiration for doing so comes from the practices of human mathematicians where “plausible reasoning”[^4] is used in addition to deductive reasoning to discover proofs.[^5]

[^2]: First-order logic along with the axioms of set theory are expressive—they are in principle sufficient to encode most of modern mathematics, although humans generally work at a higher level of abstraction and within a natural language extended with mathematical concepts as opposed to a formal language.
[^3]: The literature on automated theorem proving is expansive (e.g., see Fitting, 2012, for a survey of first-order methods). Most provers use a proof-theoretic system as the primary abstraction for representing mathematical knowledge.
[^4]: Pólya has written extensively on plausible reasoning, i.e., the heuristic and non-deductive aspects of mathematical reasoning, including (1) weighing evidence for and against a conjecture, (2) making physical analogies, and (3) reasoning from randomness (e.g., see Pólya, 1990a, b, 2004).
[^5]: The non-deductive aspects of mathematical reasoning have been recognized by mathematicians and philosophers (e.g., see Hacking, 1967; Corfield, 2003; Parikh, 2010; Seidenfeld et al., 2012; Mazur, 2014).

We start by introducing a representation of beliefs that assigns probabilities to the exhaustive and mutually exclusive first-order possibilities found in the theory of first-order distributive normal forms (dnfs) devised by the philosopher Jaakko Hintikka (Section 3). The idea of assigning weights to dnfs has been proposed by Hintikka (1970) in the context of inductive philosophy so the idea is not new. Our contribution here is to extract and formalize some of these ideas for the purposes of “learning to prove”. We show that the representation supports a form of Bayesian update and induces a distribution on the validity of first-order statements that does not enforce that logically equivalent statements are assigned the same probability—otherwise, we would end up in a circular situation where we require a prover in order to assign probabilities. In addition, we show that there is an embedding of first-order statements into an associated Hilbert space where mutual exclusion in logic translates into orthogonality in the space.

Next, we consider two applications that a direct probabilistic representation of beliefs in mathematical claims has for “learning to prove”. First, we identify conjecturing as a form of (statistical) model selection (Section 4). Second, we introduce an alternating-turn game that involves determining the consistency of possibilities (Section 5). The game is amenable (in principle) to self-play training, a technique that has demonstrated success in learning expert-level play for the games of Go and Chess, to learn beliefs that can be used to construct a prover that is complete when logical omniscience[^6] is attained and sound provided that players maintain reasonable[^7] beliefs. Implementing and empirically testing self-play for these games is technically challenging and beyond the scope of this paper.

[^6]: An agent is logically omniscient if it knows all the logical consequences that follow from a set of axioms. Consequently, logical omniscience should fail in the interim while learning a prover—there is nothing to learn if an agent already possesses knowledge of all theorems.
[^7]: Roughly speaking, an agent is reasonable if it does not assign zero probability to a possibility that it has not been able to falsify. We will define this formally in Section 3.1.3.

The ideas in this paper should be taken with one major caveat: the space complexity of the representation is (highly) super-exponential as a function of quantifier depth (i.e., the maximal number of nested quantifiers) so that the ideas are not practically implementable without modification. Thus our analysis in its current form should only be seen as conducting a thought experiment. As a step towards making the ideas here more practical, we will comment on how to control the sizes of the representations at the cost of completeness by treating certain combinations of properties as observationally indistinguishable, i.e., by making abstractions and lazily considering more properties as needed (Section 6). This suggests a path towards implementation (e.g., for the game).

As one final qualification concerning the ideas in this paper, we acknowledge that we have taken a somewhat narrow view of “learning to prove”. First, we restrict ourselves to a first-order axiomatic view of mathematics.[^8] Second, we consider only a probabilistic aspect of plausible reasoning.[^9] Finally, we emphasize that our work is not human-style theorem proving (e.g., see Ganesalingam and Gowers, 2017) even though we take inspiration from human mathematicians. In spite of these limitations and shortcomings, we believe that the ideas presented here offer a descriptive account of “learning to prove” that cohesively accounts for the role of beliefs in the proving process, the utility of conjecturing, and the value of abstraction.

[^8]: Although the axiomatic approach to mathematics is widely adopted, mathematicians typically do not carry out the paradigm to its full extent and write completely formal proofs. When they do, there are a variety of formal languages they can choose from in addition to first-order logic including higher-order logic and type theories. The practicality of formalizing mathematics has been aided by the development of tools called interactive theorem provers. (For instance, Gonthier et al. (2013) formalized the Feit-Thompson theorem, a deep result in group theory, using an interactive theorem prover.) There are interactive theorem provers based on first-order logic (e.g., see Mizar, accessed 2019-4-6), higher-order logic (e.g., see Isabelle, accessed 2019-3-31), and type theories (e.g., see Coq, accessed 2019-3-31). An interesting direction of future work would be to see how the ideas in this paper apply to higher-order and type-theoretic settings.
[^9]: The use of probabilistic reasoning to model plausible reasoning is not a new idea—for instance, see work on probabilistic graphical models (Pearl, 1988) and work on inductive inference (e.g., see Solomonoff, 1964a, b; Jaeger, 2005). The field of automated reasoning (e.g., see Robinson and Voronkov, 2001b, a, for a survey) contains work on other forms of non-deductive reasoning including reasoning by induction (e.g., see Quinlan, 1986; Bundy, 2001; Comon, 2001), abduction (e.g., see Console et al., 1991; Mayer and Pirri, 1993; Gabbay et al., 1998; Denecker and Kakas, 2002), and analogy (e.g., see Davies and Russell, 1987; Ashley, 1988; Russell, 1988).

## 2 Preliminaries

We begin by setting up the notation and terminology we will use throughout this paper (Section 2.1). Next, we provide intuition for Hintikka’s dnfs (Section 2.2) and then introduce them formally for first-order logic without equality[^10] (Section 2.3). For more background on dnfs, we refer the reader to (Hintikka, 1965, 1973; Nelte, 1997).

[^10]: The restriction to first-order logic without equality is for simplicity: dnfs are defined for first-order logic with equality as well. All results given here apply to dnfs in both cases with the appropriate modifications. The difference between the two is between an inclusive treatment of quantifiers (without equality) and an exclusive treatment of quantifiers (with equality). As usual, note that we can include a binary predicate that encodes equality in first-order logic without equality, the difference with the case of first-order logic with equality being that structures may not necessarily be normal.

### 2.1 Notation and Background

Let 2 ≜ {0, 1}. We will interchangeably use 0 for ⊥ (false) and 1 for ⊤ (true). ℕ denotes the set of naturals and ℕ⁺ denotes the set of positive naturals. ℝ denotes the set of reals and ℝ⁺ denotes the set of positive reals.

We write 𝒫(X) to indicate the power set of X. We write |X| for the cardinality of the set X. We will often write a binary relation such as R ⊆ X × X in infix notation as x₁ R x₂ for x₁ ∈ X and x₂ ∈ X. The notation {x ∈ X ∣ P(x)} where P is a predicate on a set X indicates a set comprehension. We also write set comprehensions for indexed sets as {xᵢ ∣ P(i)} where P is a predicate on an index set I that indexes {xᵢ}.

When order matters, we use ⟨·⟩ for sequences instead of {·}. We write Xⁿ for the set of length-n sequences comprised of elements from X, and likewise for the set of length-n strings (sequences written without brackets and commas) comprised of elements from X.

We will use ellipsis notation “…” frequently in this paper. As usual, it means “fill in the dots with all the missing elements in between”. For example, x₁, …, xₙ gives the elements x₁, x₂, and so on until xₙ. When the commas are omitted as in x₁ … xₙ, the notation indicates a string of those elements instead.

#### 2.1.1 First-order logic

The syntax of first-order logic (without equality) is summarized below.

 t ::= x ∣ c ∣ fⁿ(t₁, …, tₙ) ∣ Pⁿ(t₁, …, tₙ)
 φ ::= t ∣ ¬φ ∣ φ ∨ φ ∣ (∃x) φ

We use the meta-variable t to refer to terms. A term is either a variable x, a constant c, an n-ary function fⁿ(t₁, …, tₙ) applied to terms, or an n-ary predicate Pⁿ(t₁, …, tₙ) on terms. We use the meta-variable φ to refer to formulas. A formula is either a term (t), the logical negation of a formula (¬φ), the logical or of two formulas (φ₁ ∨ φ₂), or an existential quantification ((∃x) φ). As usual, we encode logical and as φ₁ ∧ φ₂ ≜ ¬(¬φ₁ ∨ ¬φ₂) and universal quantification as (∀x) φ ≜ ¬(∃x) ¬φ where we assume the usual precedence and use additional (meta-level) parentheses to aid the parsing of formulas. The meta-level notation (±)ᵇ φ where b ∈ 2 either negates the formula ((±)⁰ φ ≜ ¬φ) or leaves it alone ((±)¹ φ ≜ φ).

We write a formula with free variables as φ[x₁, …, xₖ] where x₁, x₂, … is a supply of free variables. A formula without free variables is called a sentence.
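
The grammar above can be transcribed directly into a small abstract syntax tree. The following sketch (our own illustrative encoding, not from the paper) represents terms and formulas as Python dataclasses, with logical and and universal quantification as the derived forms given above:

```python
from dataclasses import dataclass
from typing import Tuple

# Terms: variables, constants, function applications; atoms apply a
# predicate to terms. Formulas close under ¬, ∨, and ∃; ∧ and ∀ are derived.

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Const:
    name: str

@dataclass(frozen=True)
class Fn:
    name: str
    args: Tuple

@dataclass(frozen=True)
class Pred:  # atomic formula
    name: str
    args: Tuple

@dataclass(frozen=True)
class Not:
    body: object

@dataclass(frozen=True)
class Or:
    left: object
    right: object

@dataclass(frozen=True)
class Exists:
    var: Var
    body: object

def And(a, b):
    # a ∧ b encoded as ¬(¬a ∨ ¬b)
    return Not(Or(Not(a), Not(b)))

def Forall(x, body):
    # (∀x) φ encoded as ¬(∃x) ¬φ
    return Not(Exists(x, Not(body)))

# (∀x)(∃m) m < x, the running example of Section 2.2
x, m = Var("x"), Var("m")
phi = Forall(x, Exists(m, Pred("<", (m, x))))
```

Because ∧ and ∀ are macros over the core connectives, any procedure defined by induction on the grammar only needs the four core cases.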

We use a standard deductive system for first-order logic. We write Γ for a set of sentences and write Γ ⊢ φ if there is a derivation of φ using Γ, any logical axioms, and the rules of inference. We say that Γ is consistent if a contradiction is not derivable, i.e., there is no φ such that both Γ ⊢ φ and Γ ⊢ ¬φ, where we take the conjunction of all sentences in Γ when it appears to the left of ⊢.

We use the standard semantics of first-order logic based on structures.[^11] A structure M is a tuple (D, Σ, ⟦·⟧) where D is a (potentially empty) set called the domain, Σ is a signature (the functions and relations of the language), and ⟦·⟧ is an interpretation of the signature. Note that an empty domain cannot be used to interpret a language with constants. We say that a formula φ[x₁, …, xₖ] is satisfiable in a structure M if M ⊨ φ[x₁ ↦ a₁, …, xₖ ↦ aₖ] for every a₁, …, aₖ ∈ D, where ⊨ is the usual satisfaction relation defined by induction on the structure of formulas and we overload φ[x ↦ a] to mean that the interpretation of the variable x in φ is a. A sentence φ is satisfiable if there is some structure M such that M ⊨ φ.

[^11]: For more background on first-order logic, we refer the reader to (Hodges, 2001).

Recall that first-order logic with a standard proof system and standard semantics is sound (i.e., Γ ⊢ φ implies Γ ⊨ φ) and complete (i.e., Γ ⊨ φ implies Γ ⊢ φ). Thus a sentence is consistent iff it is satisfiable. A sentence is inconsistent if it is satisfiable in no structures, consistent if it is satisfiable in at least one structure, and logically valid if it is satisfiable in every structure. We write φ₁ ≡ φ₂ when φ₁ and φ₂ are logically equivalent.

#### 2.1.2 Graphs and trees

A directed graph is a tuple (V, E) where V is a set of vertices and E ⊆ V × V is a set of edges. Because we only consider directed graphs in this paper, we will abbreviate directed graph as graph. A path in a graph is a graph of the form ({v₁, …, vₙ}, {(v₁, v₂), …, (vₙ₋₁, vₙ)}) where all the vᵢ are distinct. We refer to v₁ and vₙ as the endpoints of the path.

A (rooted) tree is a tuple (V, E, v_r) where (V, E) is a graph such that any two vertices are connected by a unique path and v_r ∈ V is a vertex designated as a root. Because there is only one path between any two vertices, a path between v₁ and vₙ can be identified by the traversed vertices v₁ … vₙ, or simply the two endpoints v₁ and vₙ. We say that v_p is a parent of v_c, and v_c is a child of v_p, if there is an edge (v_p, v_c). We write children(v) for the set of children of v. We say that v_a is an ancestor of v_d, and v_d is a descendant of v_a, if there is a path from v_a to v_d. We write anc(v) for the set of ancestors of v and desc(v) for the set of descendants of v.
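
As a concrete reading of these definitions, here is a minimal sketch (a hypothetical representation; the paper does not prescribe one) of a rooted tree with the children, ancestor, and descendant accessors just defined, using parent pointers:

```python
# A rooted tree stored as parent pointers; the root's parent is None.
class Tree:
    def __init__(self, root):
        self.root = root
        self.parent = {root: None}

    def add_child(self, parent, child):
        self.parent[child] = parent

    def children(self, v):
        # all vertices whose parent is v
        return {c for c, p in self.parent.items() if p == v}

    def ancestors(self, v):
        # walk parent pointers up to the root
        out = set()
        while self.parent[v] is not None:
            v = self.parent[v]
            out.add(v)
        return out

    def descendants(self, v):
        # depth-first collection of the subtree below v
        out, stack = set(), [v]
        while stack:
            for c in self.children(stack.pop()):
                out.add(c)
                stack.append(c)
        return out

t = Tree("r")
t.add_child("r", "a"); t.add_child("r", "b"); t.add_child("a", "c")
```

Parent pointers make the uniqueness of paths immediate: the path from any vertex to the root is the chain of its ancestors.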

### 2.2 Distributive Normal Forms: Intuition

The role of a dnf of a first-order formula is analogous to that of a disjunctive normal form of a propositional formula in that the dnf of a formula is a disjunction of mutually exclusive possibilities. That we can exhaustively describe mutually exclusive possibilities in the first-order setting is not obvious as the domain of quantification can be infinite and individuals in the domain can become related to one another as more individuals are considered. We start with an example to illustrate the basic problem and solution due to Hintikka.

Consider a first-order theory with one binary predicate <, where x < y is infix notation for “x is less than y”, for describing individuals and their order relations with one another. We can look at what the normal form of the statement “every individual has an individual that is smaller than it”, encoded in this language as

 (∀x)(∃m) m < x

could be. Assuming that we have a constant that names each element in the domain of quantification, a first attempt would be to translate each ∀ into a conjunction (over the domain of individuals) and each ∃ into a disjunction (over the domain of individuals), and use a propositional normal form. That is, we convert the result of translating the quantifiers away

 ⋀ₓ (⋁ₘ m < x)

into disjunctive normal form. Unfortunately, the domain of quantification can be infinite, so the resulting formula may be of infinite size. The “trick” for circumventing this is to enumerate how the predicates at hand can describe the relationships between individuals (uniformly in the number of individuals) instead of enumerating tuples of individuals. We can then identify possible kinds of worlds by listing which kinds of individuals exist or not.

To see how this works, we rewrite the original statement as

 ¬(∃x)¬((∃m)(m < x))

(In words, it is impossible to find an individual that does not have an individual that is less than it.) In this form, we can think of the normal form of a statement with quantification as describing whether kinds of individuals with certain relations to one another exist or not. In order to exhaust all the possibilities, we need to consider all the cases in which x and m can be related to one another that are consistent with the original formula.

We can see this better in our specific example by introducing notation that enumerates all descriptions of one and two free individual variables describable by the predicate <. When there is one free individual variable x₁, the only possibility is to relate x₁ to itself as below

 P_{a₁}(x₁) ≜ (±)^{a₁}(x₁ < x₁)

where P₀(x₁) says that x₁ is not less than itself and P₁(x₁) says that x₁ is less than itself. When there are two free individual variables x₁ and x₂, we have

 Q_{a₁a₂a₃}(x₁, x₂) ≜ (±)^{a₁}(x₁ < x₂) ∧ (±)^{a₂}(x₂ < x₁) ∧ (±)^{a₃}(x₂ < x₂)

where the subscript a₁a₂a₃ indexes each conjunct. For example,

 Q₁₀₀(x₁, x₂) = (x₁ < x₂) ∧ ¬(x₂ < x₁) ∧ ¬(x₂ < x₂)

We enumerate all combinations of whether such individuals exist or not next.

 δ_{b₁…b₅₁₂} ≜ (±)^{b₁}[(∃x₁)(P₀(x₁) ∧ ¬(∃x₂)Q₀₀₀(x₁, x₂) ∧ ⋯ ∧ ¬(∃x₂)Q₁₁₁(x₁, x₂))]
  ∧ ⋯ ∧ (±)^{b₂₅₆}[(∃x₁)(P₀(x₁) ∧ (∃x₂)Q₀₀₀(x₁, x₂) ∧ ⋯ ∧ (∃x₂)Q₁₁₁(x₁, x₂))]
  ∧ (±)^{b₂₅₇}[(∃x₁)(P₁(x₁) ∧ ¬(∃x₂)Q₀₀₀(x₁, x₂) ∧ ⋯ ∧ ¬(∃x₂)Q₁₁₁(x₁, x₂))]
  ∧ ⋯ ∧ (±)^{b₅₁₂}[(∃x₁)(P₁(x₁) ∧ (∃x₂)Q₀₀₀(x₁, x₂) ∧ ⋯ ∧ (∃x₂)Q₁₁₁(x₁, x₂))]

A possible kind of world described by our original formula is then any

 δ_{b₁…b₅₁₂}

that implies the original formula. When we introduce dnfs more formally (Section 2.3), we will see that the possible kinds of worlds are constituents.

The example deserves some remarks. First, note that we really have exhaustively enumerated all the mutually exclusive possibilities. The possibility δ₀…₀ describes one extreme where there are no individuals (and hence the original statement is vacuously true),[^12] another possibility requires every individual to be less than itself, and the possibility δ₁…₁ enables every kind of individual to exist with respect to <. Second, note that the number of possibilities even in this small example (two individuals and one predicate) is enormous at 2⁵¹². The astronomical number of constituents is not an issue for theoretical purposes although it does render the straightforward application of the theory unfeasible.

[^12]: Note that traditional presentations of first-order model theory disallow empty domains although this restriction is not necessary. On the syntactic side, we will need to modify proof rules (e.g., the rule used in converting a formula to prenex normal form no longer holds) to maintain soundness and completeness.
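
The counting in this example can be checked mechanically. The sketch below (our own bookkeeping, not from the paper) enumerates the 2 descriptions P of a single individual and the 2³ = 8 descriptions Q that mention a second individual, then counts the possible kinds of worlds:

```python
from itertools import product

# With one binary predicate <, a single individual x1 is described by the
# sign of the one atom x1 < x1; adding x2 introduces the three new atoms
# x1 < x2, x2 < x1, x2 < x2, giving 2^3 = 8 descriptions Q.
P = list(product([False, True], repeat=1))   # sign patterns for x1 < x1
Q = list(product([False, True], repeat=3))   # sign patterns for the 3 new atoms

# A "kind of individual" pairs a P-description with, for each of the 8
# Q-descriptions, whether a matching x2 exists: 2 * 2^8 = 512 existential
# clauses. A possible kind of world asserts or denies each clause.
clauses = len(P) * 2 ** len(Q)
worlds = 2 ** clauses
```

Running this reproduces the counts in the prose: 512 existential clauses b₁…b₅₁₂ and 2⁵¹² candidate possibilities before inconsistent ones are discarded.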

### 2.3 Distributive Normal Forms: Background

Define the set

 S({φ₁, …, φₖ}) ≜ { ⋀_{i ∈ {1, …, k}} (±)^{bᵢ} φᵢ ∣ b₁ ∈ 2, …, bₖ ∈ 2 }.

An element of S({φ₁, …, φₖ}) is a conjunction of every φᵢ or its negation.

Let 𝒜[y₁, …, yₖ] denote the set of all atomic formulas (i.e., a predicate applied to a tuple of terms) involving the free individual terms (i.e., constants or variables) y₁, …, yₖ. Let ℬ[y₁, …, yₖ] denote the subset of 𝒜[y₁, …, yₖ] that mentions yₖ at least once.

##### Attributive constituents

An attributive constituent with k free individual terms of depth 0 is an element of S(ℬ[y₁, …, yₖ]). We write Γ⁽⁰⁾[y₁, …, yₖ] for the set of all attributive constituents with k free individual terms of depth 0. By convention, we set Γ⁽⁰⁾[] ≜ {⊤}. An attributive constituent of depth 0 is a formula of the form

 γ⁽⁰⁾_r[y₁, …, yₖ] = ⋀_{i ∈ {1, …, ℓ}} (±)^{bᵢ} Bᵢ[y₁, …, yₖ]

where ℓ = |ℬ[y₁, …, yₖ]|, each Bᵢ ∈ ℬ[y₁, …, yₖ], and each bᵢ ∈ 2. The subscript r indexes the attributive constituent and can be identified with the string b₁ … bℓ. Let 𝒢⁰ₖ be an index set for attributive constituents with k free individual terms of depth 0. We have |𝒢⁰ₖ| = 2^ℓ. The superscript indicates the depth of the formula, i.e., the maximal number of nested quantifiers in the formula. Hence a depth of 0 indicates that there are no quantifiers.

The set of attributive constituents of depth d is defined by induction on d. More concretely, an attributive constituent with k free individual terms of depth d has the form

 γ⁽ᵈ⁾_{r,s}[y₁, …, yₖ] = γ⁽⁰⁾_r[y₁, …, yₖ] ∧ ⋀_{r′ ∈ 𝒢⁰_{k+1}} (±)^{s(r′)} (∃x) γ⁽⁰⁾_{r′}[y₁, …, yₖ, x]   (d = 1)
 γ⁽ᵈ⁾_{r,s}[y₁, …, yₖ] = γ⁽⁰⁾_r[y₁, …, yₖ] ∧ ⋀_{(r′, s′) ∈ 𝒢^{d−1}_{k+1}} (±)^{s(r′, s′)} (∃x) γ⁽ᵈ⁻¹⁾_{r′,s′}[y₁, …, yₖ, x]   (d > 1)

where we will explain the notation below. Let 𝒢ᵈₖ be an index set for attributive constituents of depth d with k free individual terms. The subscript (r, s) is a pair of an index r ∈ 𝒢⁰ₖ and a function s indicating whether the appropriately indexed attributive constituent (of depth d − 1 with k + 1 free individual terms) exists or not. When the indices do not matter, we will abbreviate γ⁽ᵈ⁾_{r,s} as γ⁽ᵈ⁾. When we refer to two distinct attributive constituents whose indices do not matter, we will overload the subscripts as in γ⁽ᵈ⁾₁ and γ⁽ᵈ⁾₂ to distinguish them.

An attributive constituent with k free individual terms of depth d can equivalently be defined as

 γ⁽ᵈ⁾_{r,s}[y₁, …, yₖ] = γ⁽⁰⁾_r[y₁, …, yₖ] ∧ ⋀_{(r′, s′) ∈ 𝒢^{d−1}_{k+1}|_{s⁺}} (∃x) γ⁽ᵈ⁻¹⁾_{r′,s′}[y₁, …, yₖ, x] ∧ ((∀x) ⋁_{(r′, s′) ∈ 𝒢^{d−1}_{k+1}|_{s⁺}} γ⁽ᵈ⁻¹⁾_{r′,s′}[y₁, …, yₖ, x])

where 𝒢^{d−1}_{k+1}|_{s⁺} is the index set restricted to the positive ones as given by the function s.
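
To make the space requirements concrete, the inductive definition yields a simple recurrence for the number of attributive constituents: |Γ⁽ᵈ⁾ with k terms| = |Γ⁽⁰⁾ with k terms| · 2^|𝒢^{d−1}_{k+1}|, since s ranges over all functions into 2. The sketch below (our own instantiation for a single binary predicate, as in the example of Section 2.2, where the atoms mentioning yₖ number 2k − 1) evaluates this recurrence:

```python
# Counts for one binary predicate <: the atoms mentioning y_k are
# y_k < y_k together with y_i < y_k and y_k < y_i for i < k, i.e. 2k - 1
# atoms, so there are 2^(2k-1) depth-0 attributive constituents.
def gamma0(k):
    return 2 ** (2 * k - 1)

def gamma(d, k):
    # |Γ^(d)_k| = |Γ^(0)_k| * 2^|Γ^(d-1)_{k+1}|, by the inductive definition
    if d == 0:
        return gamma0(k)
    return gamma0(k) * 2 ** gamma(d - 1, k + 1)
```

Already at depth 1 with one free term this gives 2 · 2⁸ = 512 (cf. the 512 existential clauses in the Section 2.2 example), and each further level of depth exponentiates the count again, which is the super-exponential growth discussed in Section 6.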

##### Constituents

A constituent with k free individual terms of depth d is a formula of the form

 δ⁽ᵈ⁾_{q,r,s}[y₁, …, yₖ] = A_q[y₁, …, y_{k−1}] ∧ γ⁽ᵈ⁾_{r,s}[y₁, …, yₖ]

where A_q[y₁, …, y_{k−1}] ∈ S(𝒜[y₁, …, y_{k−1}]). Let Δ⁽ᵈ⁾[y₁, …, yₖ] be the set of constituents of depth d with k free individual terms. By convention, we set Δ⁽⁰⁾[] ≜ {⊤}. We write 𝒟ᵈₖ for the set indexing Δ⁽ᵈ⁾[y₁, …, yₖ]. We use the same abbreviation scheme for the indices of constituents as we did for attributive constituents. Note that a constituent is an attributive constituent with an additional conjunct A_q[y₁, …, y_{k−1}]. Thus attributive constituents and constituents can be identified when there are no free individual terms.

##### Distributive normal forms

A distributive normal form (dnf) with k free individual terms is a disjunction of constituents

 ⋁_{δ⁽ᵈ⁾[y₁, …, yₖ] ∈ D} δ⁽ᵈ⁾[y₁, …, yₖ]

for some subset D ⊆ Δ⁽ᵈ⁾[y₁, …, yₖ] of constituents.

##### Properties

Attributive constituents, constituents, and dnfs have the following useful properties (Hintikka, 1965): existence, mutual exclusion, and expansion.

Existence

Every formula (of depth d) has a distributive normal form (of depth d), i.e., there is a function dnf such that

 φ⁽ᵈ⁾[y₁, …, yₖ] = ⋁_{δ⁽ᵈ⁾[y₁, …, yₖ] ∈ dnf(φ[y₁, …, yₖ])} δ⁽ᵈ⁾[y₁, …, yₖ]

where the domain of dnf is the set of well-formed first-order formulas with free individual terms y₁, …, yₖ.

Mutual exclusion

Any two distinct constituents δ⁽ᵈ⁾₁ and δ⁽ᵈ⁾₂ (similarly, attributive constituents γ⁽ᵈ⁾₁ and γ⁽ᵈ⁾₂) of the same depth are mutually exclusive, i.e., δ⁽ᵈ⁾₁ ∧ δ⁽ᵈ⁾₂ is inconsistent for any δ⁽ᵈ⁾₁ ≠ δ⁽ᵈ⁾₂.

Expansion

Every constituent can be written as a disjunction of its expansion constituents, i.e., there is a function expand such that

 δ⁽ᵈ⁾[y₁, …, yₖ] ≡ ⋁_{δ⁽ᵈ⁺ᵉ⁾[y₁, …, yₖ] ∈ expand(e, δ⁽ᵈ⁾[y₁, …, yₖ])} δ⁽ᵈ⁺ᵉ⁾[y₁, …, yₖ].

Any δ⁽ᵈ⁺ᵉ⁾ ∈ expand(e, δ⁽ᵈ⁾) is said to refine, or is a refinement of, δ⁽ᵈ⁾. (The original terminology that Hintikka uses is subordinate. We prefer the term refinement because it evokes the intuition that δ⁽ᵈ⁺ᵉ⁾ describes the possibility described by δ⁽ᵈ⁾ in finer detail.) We write δ⁽ᵈ⁺ᵉ⁾ ≥ δ⁽ᵈ⁾ when δ⁽ᵈ⁺ᵉ⁾ refines δ⁽ᵈ⁾. Let

 Δ[y₁, …, yₖ] ≜ ⋃_{d ∈ ℕ} Δ⁽ᵈ⁾[y₁, …, yₖ].

Then the refinement relation ≥ is a partial order and (Δ[y₁, …, yₖ], ≥) is a poset.

It is well-known that validity of first-order formulas is undecidable. Consequently, the consistency of constituents in a dnf is undecidable. There is a weaker notion called trivial inconsistency that is decidable. There are several notions of trivial inconsistency (e.g., see Hintikka, 1973; Nelte, 1997), although the exact form is not important for our purposes. Completeness states that an attributive constituent is inconsistent iff all of its expansions at some depth are trivially inconsistent (Hintikka, 1965). Thus, an inconsistency at depth d will eventually manifest itself as trivially inconsistent at some depth d + e, although the depth e is not recursively computable.[^14] The main idea is to show that a consistent attributive constituent always has an expansion that is not trivially inconsistent; the result follows from an application of Kőnig’s tree lemma.

[^14]: There are notions of trivial inconsistency that are not strong enough to ensure completeness as noted by Nelte (1997).

## 3 Representing Beliefs in Mathematical Knowledge

In this section, we introduce a representation that assigns probabilities to the exhaustive and mutually exclusive possibilities of first-order logic that we have just seen to be constituents. More concretely, we formalize a method for assigning weights to constituents and an appropriate Bayesian update following the idea of assigning weights to constituents described by Hintikka (1970, pg. 274–282) (Section 3.1). The representation induces a probability distribution on the validity of first-order statements that does not enforce that logically equivalent statements are assigned the same probability so that the beliefs of agents that are not logically omniscient[^15] can be encoded (Section 3.2). At the end of the section, we identify an embedding space—a Hilbert space—for first-order statements based on the probabilistic representation where mutual exclusion in logic translates into orthogonality in the space (Section 3.3).

[^15]: The problem of logical omniscience is an issue encountered in epistemic logic (e.g., see Sim, 1997; Fagin et al., 2004; Halpern and Pucella, 2011) where we reason about the knowledge of agents. One solution for weakening logical omniscience involves modeling impossible possible worlds, i.e., modeling worlds that an agent considers possible but are eventually revealed to be impossible. Hintikka argues that dnfs provide a model of impossible possible worlds—an impossible possible world is an inconsistent constituent that is not trivially inconsistent at some depth and revealed to be trivially inconsistent at a later depth (by completeness; Hintikka, 1979). Thus the application of dnfs to address the problem of logical omniscience has also been hinted at by Hintikka.

Remark (simple first-order languages). For simplicity, we restrict attention to first-order languages with a finite number of predicates, no function symbols, and no constants unless stated otherwise.[^16] The constant-free restriction simplifies the form of constituents and dnfs we will need to consider. As a reminder, every first-order formula with k free individual terms has a dnf of depth-d constituents. In a constant-free setting, the free individual terms are all variables. Thus the original formula is equivalent to its universal closure, which is a formula with 0 free individual terms (i.e., a sentence). Consequently, we only need to consider the set of constituents Δ⁽ᵈ⁾[], abbreviated Δ⁽ᵈ⁾, of depth d with 0 free individual terms. We have that Δ⁽⁰⁾ = {⊤} by convention.

[^16]: As a reminder, the effect of equality is to give an exclusive interpretation of quantifiers. All the results that hold on constituents in first-order logic without equality also hold on constituents in first-order logic with equality with the appropriate modifications. Note that functions can be encoded as predicates in first-order logic with equality. Observe also that the current setting actually admits a finite number of constants. More concretely, we can associate each constant c with a monadic predicate P_c where the interpretation of P_c(x) is “x is the constant c”. Any formula that refers to the constant c can thus be translated to one that uses P_c(x) for a fresh variable x, and we add an axiom asserting that such an x exists to the theory. Hence, we are roughly working with first-order languages with a finite number of predicates, functions, and constants.

### 3.1 Hintikka Trees

We formalize a representation that assigns probabilities to constituents in this section. As the set of constituents at any depth exhaust and describe all mutually exclusive possibilities at that depth, the idea behind the representation is the standard one: list all possibilities and assign weights to them that sum to one. We construct the representation in two parts. First, we introduce a refinement tree that keeps track of the refinement relation because constituents of different depths do not denote mutually exclusive possibilities when they are related according to the refinement partial order (Section 3.1.1). Second, we describe how to assign weights to the refinement tree which completes the static representation of an agent’s beliefs (Section 3.1.2). After we introduce the representation, we introduce dynamics via a renormalization operator which can be interpreted as a form of Bayesian update for beliefs (Section 3.1.3).

#### 3.1.1 Refinement tree

Let the set of vertices be the set Δ of constituents of any depth d ∈ ℕ. Let the set of edges E consist of the pairs related by the refinement relation, omitting the reflexive pairs. Then (Δ, E) is a graph that encodes the refinement relation (minus the reflexive edges).

The graph (Δ, E) is not a tree because the expansions of two distinct constituents can share refining constituents, although the shared constituents are necessarily inconsistent. Suppose δ⁽ᵈ⁾₁ ≠ δ⁽ᵈ⁾₂. If δ⁽ᵈ⁺ᵉ⁾ ∈ expand(e, δ⁽ᵈ⁾₁) ∩ expand(e, δ⁽ᵈ⁾₂), then δ⁽ᵈ⁺ᵉ⁾ is inconsistent. Assume additionally for the sake of contradiction that δ⁽ᵈ⁺ᵉ⁾ is consistent. Then there exists a structure M such that M ⊨ δ⁽ᵈ⁺ᵉ⁾. Thus we have that M ⊨ δ⁽ᵈ⁾₁ and M ⊨ δ⁽ᵈ⁾₂ (because δ⁽ᵈ⁺ᵉ⁾ refines both by another assumption), which contradicts that δ⁽ᵈ⁾₁ and δ⁽ᵈ⁾₂ are mutually incompatible (by exclusivity in Proposition 2.3).

By the proposition above, we can associate any shared constituent that is a refinement of two parent constituents with either parent constituent and disassociate it from the other without changing the consistency of either parent constituent. In other words, we can remove one edge. We can use this observation to convert (Δ, E) into a tree. We call the result a refinement tree, where the set of edges is obtained after the pruning procedure described above is applied. Throughout the rest of this paper, we will assume that we have chosen one such pruning and will write the pruned edge set simply as E. We will also overload ≥ to refer to the pruned refinement partial order. We have the following obvious relationships between the (pruned) refinement partial order and (pruned) refinement tree.
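
The pruning procedure is straightforward to state operationally. A minimal sketch (our own representation of the refinement edges, not from the paper): keep the first parent seen for each constituent and drop every other incoming edge, which leaves each vertex with a unique parent and hence a tree:

```python
def prune_to_tree(edges):
    """edges: iterable of (parent, child) refinement pairs.

    Keeps one incoming edge per child (the first seen); by the proposition
    above, any child with two parents is inconsistent, so the choice of
    which edge to keep does not affect the consistency of either parent.
    """
    parent, kept = {}, []
    for p, c in edges:
        if c not in parent:      # first parent wins; extra edges dropped
            parent[c] = p
            kept.append((p, c))
    return kept

# Two depth-d constituents d1, d2 sharing the refinement s: one of the
# two edges into s is removed.
g = [("d1", "s"), ("d2", "s"), ("d1", "t")]
```

The choice of which edge to keep is arbitrary, matching the paper's assumption that one such pruning is fixed once and for all.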

1. δ⁽ᵈ⁺¹⁾ ≥ δ⁽ᵈ⁾ iff δ⁽ᵈ⁺¹⁾ ∈ children(δ⁽ᵈ⁾).

2. δ⁽ᵈ⁺ᵉ⁾ ≥ δ⁽ᵈ⁾ iff δ⁽ᵈ⁺ᵉ⁾ = δ⁽ᵈ⁾ or δ⁽ᵈ⁺ᵉ⁾ ∈ desc(δ⁽ᵈ⁾) (equivalently δ⁽ᵈ⁾ ∈ anc(δ⁽ᵈ⁺ᵉ⁾) ∪ {δ⁽ᵈ⁺ᵉ⁾}).

Straightforward.

A path between constituents δ⁽ᵈ⁾ and δ⁽ᵈ⁺ᵉ⁾ is the sequence δ⁽ᵈ⁾, δ⁽ᵈ⁺¹⁾, …, δ⁽ᵈ⁺ᵉ⁾ of constituents and their refinements where each element refines its predecessor. Because there is only one path between any two vertices in a tree, we can identify a constituent (i.e., a node in a refinement tree) with the path taken through the refinement tree starting at the root node to reach it.

Figure 1 gives an illustration of a refinement tree where constituents are indexed by their paths. The root constituent of the tree is indexed by the empty path. Figure 2 gives another illustration of a refinement tree.

#### 3.1.2 Assigning weights

We assign weights to constituents by attaching a weight to each node of the refinement tree. Because the assignment of weights needs to respect the refinement partial order, we will need a notion of coherence between the weight assignments to adjacent levels of the refinement tree. A Hintikka tree (HT) is a tuple $(T, H)$ where $T$ is a refinement tree and $H$ is a function on constituents satisfying the following two conditions.

- Unitial initial beliefs: $\sum_{\delta^{(0)} \in \Delta^{(0)}} H(\delta^{(0)}) = 1$; and

- Coherently constructed: $H(\delta^{(d)}) = \sum_{\delta^{(d+1)} \geq \delta^{(d)}} H(\delta^{(d+1)})$ for every constituent $\delta^{(d)}$.

(Footnote 17: The method of assigning weights in a HT is slightly different from the one described in prose by Hintikka (1970, pg. 274–282). In particular, Hintikka combines the statics and dynamics of the weight assignment, whereas we separate them out and only describe the statics here. We will discuss the dynamics in Section 3.1.3.)

We will abbreviate a HT $(T, H)$ as $H$. We write $\mathcal{H}$ for the set of HTs defined with respect to the first-order simple language. The first condition states that we start off with unitial beliefs. The second condition enforces that the probability that we assign to a constituent $\delta^{(d)}$ is contained entirely within the subtree of the refinement tree rooted at $\delta^{(d)}$. Hence the assignment of weights is conserved across depth. Observe that the assignment of weights to constituents is not constrained by the "fraction" of models that the constituents are satisfiable in. If it were, then the induced distribution on the validity of first-order statements would enforce logical omniscience.
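As an illustrative sketch (toy weights and node names, not from the paper), the two conditions can be checked mechanically on a small refinement tree:

```python
# A toy Hintikka tree: weights attached to nodes of a (pruned) refinement
# tree. We check the two conditions: depth-0 weights sum to 1 ("unitial
# initial beliefs") and each node's weight equals the sum of its
# children's weights ("coherently constructed").
children = {
    "d0_a": ["d1_a", "d1_b"],
    "d0_b": ["d1_c"],
}
H = {
    "d0_a": 0.6, "d0_b": 0.4,               # depth 0
    "d1_a": 0.5, "d1_b": 0.1, "d1_c": 0.4,  # depth 1
}
depth0 = ["d0_a", "d0_b"]

def is_hintikka_tree(children, H, depth0, tol=1e-9):
    if abs(sum(H[d] for d in depth0) - 1.0) > tol:
        return False  # initial beliefs not normalized
    for parent, kids in children.items():
        if abs(H[parent] - sum(H[k] for k in kids)) > tol:
            return False  # weight not conserved under refinement
    return True

print(is_hintikka_tree(children, H, depth0))  # True

bad = dict(H)
bad["d1_a"] = 0.2  # breaks coherence at "d0_a"
print(is_hintikka_tree(children, bad, depth0))  # False
```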

[Normalization] The beliefs assigned to constituents at each depth $d$ by a HT $H$ are normalized:

$$\sum_{\delta^{(d)} \in \Delta^{(d)}} H(\delta^{(d)}) = 1.$$

We proceed by induction on $d$. The base case follows from unitial initial beliefs. In the inductive case, we have to show that

$$\sum_{\delta^{(d+1)} \in \Delta^{(d+1)}} H(\delta^{(d+1)}) = 1.$$

We have that

$$\sum_{\delta^{(d+1)} \in \Delta^{(d+1)}} H(\delta^{(d+1)}) \;=\; \sum_{\delta^{(d)} \in \Delta^{(d)}} \; \sum_{\delta^{(d+1)} \geq \delta^{(d)}} H(\delta^{(d+1)}) \;=\; \sum_{\delta^{(d)} \in \Delta^{(d)}} H(\delta^{(d)}),$$

where the first equality is a rearrangement and the second equality follows because $H$ is coherently constructed. The result follows as we have $\sum_{\delta^{(d)} \in \Delta^{(d)}} H(\delta^{(d)}) = 1$ by the inductive hypothesis.

[Infinite supported path] For any HT $H$, there is a chain of constituents $\delta^{(0)} \leq \delta^{(1)} \leq \cdots$ such that $H(\delta^{(d)}) > 0$ for any $\delta^{(d)}$ in the chain. We proceed by induction on $d$. The base case follows by the unitial initial beliefs of $H$. In the inductive case, we have that $H(\delta^{(d)}) > 0$. The result follows as $H$ is coherently constructed and $\delta^{(d)}$ has a finite number of children, so there must exist a refinement $\delta^{(d+1)} \geq \delta^{(d)}$ such that $H(\delta^{(d+1)}) > 0$.
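Both lemmas can be observed numerically on a toy coherently constructed assignment. This is a sketch with hypothetical names and weights; the greedy descent mirrors the inductive argument, always stepping to a positive-weight refinement.

```python
# Toy coherently constructed weights on a 3-level refinement tree.
children = {
    "d0_a": ["d1_a", "d1_b"], "d0_b": ["d1_c"],
    "d1_a": ["d2_a", "d2_b"], "d1_b": ["d2_c"], "d1_c": ["d2_d"],
}
H = {"d0_a": 0.6, "d0_b": 0.4,
     "d1_a": 0.5, "d1_b": 0.1, "d1_c": 0.4,
     "d2_a": 0.0, "d2_b": 0.5, "d2_c": 0.1, "d2_d": 0.4}

# Normalization: the weights at every depth sum to 1.
levels = [["d0_a", "d0_b"]]
for _ in range(2):
    levels.append([k for n in levels[-1] for k in children[n]])
sums = [sum(H[n] for n in lvl) for lvl in levels]
print([round(s, 10) for s in sums])  # [1.0, 1.0, 1.0]

# Infinite supported path: greedily follow a positive-weight refinement.
node = next(n for n in levels[0] if H[n] > 0)
path = [node]
while node in children:
    node = next(k for k in children[node] if H[k] > 0)
    path.append(node)
print(path)  # ['d0_a', 'd1_a', 'd2_b']
```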

We end with several examples of HTs. Figure 3 gives an illustration of an example HT. As required by the definition, beliefs across depths are coherent. A HT is an uninformative Hintikka tree if $H$ restricted to each depth is a uniform distribution, i.e., $H(\delta^{(d)}) = 1/\lvert \Delta^{(d)} \rvert$ for any $\delta^{(d)} \in \Delta^{(d)}$. A HT is a depth Hintikka tree if $H$ is constrained so that inconsistent constituents are assigned $0$.

(Footnote 18: The terminology is inspired by depth information (Hintikka, 1970).)

Observe that there are consistent constituents at every depth and that consistent constituents have consistent refinements by the constituent completeness theorem, so a depth HT is well-defined. Inconsistency of constituents is undecidable, so a depth HT is not computable. If a theorem-proving agent represents mathematical knowledge with a depth HT, then the agent is logically omniscient: a statement is logically valid iff it is assigned full belief by a depth HT.

A HT provides a static representation of an agent's beliefs. Naturally, an agent may encounter a situation where it realizes that its beliefs need to be revised. For example, upon further inspection of all the expansions of a parent constituent, the agent may realize that they are all inconsistent, so the belief in the parent constituent should be eliminated and redistributed to other constituents. Intuitively, this may occur because the increase in depth corresponds to the construction of an object (e.g., the introduction of an existential), and the consideration of this extra object changes the valuation of the consistency of the parent possibility. Indeed, such a situation arises from the constituent completeness theorem: inconsistent constituents are eventually revealed to be trivially inconsistent at some depth, even if they are not trivially inconsistent at shallower depths. We now turn our attention to the dynamics of belief revision in the representation.

#### 3.1.3 Renormalization dynamics

Hintikka (1970, pg. 281) describes a method of redistributing weights assigned to a refinement tree when belief in a node and all of its descendants is lost. The intuition for the update follows Bayesian “refute” and “rescale” dynamics: when belief in a node and all of its descendants is eliminated so that those possibilities are “refuted”, the beliefs in the smallest subtree containing that node that still has positive weight are “rescaled” appropriately. In this section, we formalize this intuition as a renormalization operation. Towards this end, we will specify (1) which constituents to redistribute beliefs to and (2) the amount of belief to redistribute to those constituents.
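The "refute and rescale" intuition is ordinary conditioning: zero out the refuted possibilities and renormalize the survivors so they again sum to one. A minimal sketch on a flat distribution (hypothetical outcome names and numbers):

```python
# Bayesian "refute and rescale": drop refuted outcomes and rescale the
# remaining beliefs by the surviving mass.
def refute_and_rescale(beliefs, refuted):
    kept = {k: v for k, v in beliefs.items() if k not in refuted}
    z = sum(kept.values())  # mass that survives the refutation
    return {k: v / z for k, v in kept.items()}

beliefs = {"p1": 0.25, "p2": 0.25, "p3": 0.5}
print(refute_and_rescale(beliefs, {"p3"}))
# {'p1': 0.5, 'p2': 0.5}
```

The renormalization operation below applies this same idea locally, inside the smallest subtree of the refinement tree that retains positive weight.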

##### Part one of renormalization

We start with the first task and begin by identifying which constituents to redistribute beliefs to when we discontinue belief in a constituent $\delta^{(d)}$ in a HT $H$. The idea is to classify each constituent as supported, when the belief assigned to it does not rest on the discontinued constituent $\delta^{(d)}$, or unsupported otherwise; the support function realizes this classification.

We will transfer beliefs assigned to unsupported constituents over to the appropriate supported constituents. Define the abbreviations $\mathcal{S}^+_{H,\delta^{(d)-}}$ for the set of supported constituents and $\mathcal{S}^-_{H,\delta^{(d)-}}$ for the set of unsupported constituents; thus $\mathcal{S}^+_{H,\delta^{(d)-}}$ and $\mathcal{S}^-_{H,\delta^{(d)-}}$ partition the set of constituents. Define a $\delta^{(d)-}$-redistribution point as

$$\rho_{H,\delta^{(d)-}} \;\triangleq\; \max_{0 \leq r \leq d} \,\bigl\{\, \delta^{(r)} \;\bigm|\; \delta^{(r)} \leq \delta^{(d)},\; \delta^{(r)} \in \mathcal{S}^+_{H,\delta^{(d)-}} \,\bigr\},$$

which is the closest (i.e., deepest by depth) ancestor constituent that has supported descendants. A $\delta^{(d)-}$-redistribution point identifies a vertex of the refinement tree that has supported descendants, to which the beliefs in unsupported constituents can be redistributed.
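A sketch of locating a redistribution point on a toy tree (identifiers hypothetical; "supported" is simplified here to the surviving positive-weight leaves outside the refuted subtree):

```python
# Parent pointers for a toy refinement tree rooted at "root".
parent = {"d1_a": "root", "d1_b": "root", "d2_a": "d1_a", "d2_b": "d1_a"}

def ancestors(node):
    chain = [node]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain  # node, its parent, ..., the root

def redistribution_point(refuted, supported):
    # Deepest strict ancestor of the refuted node that has a supported
    # descendant.
    for anc in ancestors(refuted)[1:]:
        if any(anc in ancestors(s)[1:] for s in supported):
            return anc

# Refute "d2_a"; its sibling "d2_b" carries no weight, so the only
# supported leaf is "d1_b", and beliefs flow back up to "root".
print(redistribution_point("d2_a", ["d1_b"]))  # root
```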

##### Part two of renormalization

We now turn our attention to the second task, concerning the amount of belief to redistribute to each constituent. Let $D^+_{H,\delta^{(d)-}}$ be the set of children of the redistribution point $\rho_{H,\delta^{(d)-}}$ that are supported. Then

$$Z^+_{H,\delta^{(d)-}} \;\triangleq\; \sum_{\delta^{(e)} \in D^+_{H,\delta^{(d)-}}} H(\delta^{(e)})$$

is the positive renormalization constant, and the analogous sum taken over all of the redistribution point's children is the total renormalization constant.
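Putting the pieces together, here is a hedged sketch of the rescaling step. It assumes, as one natural reading of the elided definition, that the total constant sums over all of the redistribution point's children; all names and numbers are illustrative, not from the paper.

```python
# Weights of the redistribution point's children; "c" is unsupported.
children_H = {"a": 0.4, "b": 0.4, "c": 0.2}
supported = ["a", "b"]
rho_weight = 1.0  # belief held at the redistribution point itself

# Positive renormalization constant Z+: mass on supported children.
z_pos = sum(children_H[k] for k in supported)
# Total renormalization constant (assumed): mass on all children.
z_total = sum(children_H.values())

# Rescale the supported children so the redistribution point's belief
# is conserved after the unsupported mass is removed.
rescaled = {k: rho_weight * children_H[k] / z_pos for k in supported}
print(z_pos, z_total, rescaled)  # 0.8 1.0 {'a': 0.5, 'b': 0.5}
```

Coherence is restored at the redistribution point: the rescaled supported children again sum to the weight assigned to their parent.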