1 Introduction
The process of discovering a mathematical proof can be seen as a perfect information game where the goal is to show that a path exists (i.e., the proof) between a given starting state (i.e., the axioms) and ending state (i.e., the claim) using a predefined collection of rules (i.e., deduction). Like other perfect information games such as Go and Chess, the complexity of the theorem-proving game lies in managing the combinatorial nature of the search space. We can do this, for instance, by identifying useful heuristics and patterns. This is one sense in which players can learn and improve from their experience playing the game.
The idea of “learning from experience” suggests that we can apply machine learning
to learn these heuristics and patterns as opposed to distilling them manually from human experience. Towards this end, researchers have demonstrated that machine-learned algorithms can navigate the search spaces of Go
(Silver et al., 2016) and Chess (Silver et al., 2017) at a level exceeding human experts (i.e., consistently defeating the best human players). Researchers have also experimented with applying machine learning to theorem provers (e.g., see Komendantskaya et al., 2012; Kaliszyk et al., 2014; Gauthier et al., 2017; Duncan, 2002; Selsam et al., 2018; Kaliszyk et al., 2017; Irving et al., 2016; Loos et al., 2017; Kaliszyk et al., 2018; Huang et al., 2018), although the problem is much more difficult than Go and Chess when quantifiers are involved.[1] In this paper, we consider the problem of learning a prover for first-order logic,[2] a well-understood setting with quantification, where we directly use a representation of beliefs in mathematical claims to construct proofs.[3] The inspiration for doing so comes from the practices of human mathematicians, where “plausible reasoning”[4] is used in addition to deductive reasoning to discover proofs.[5]

[1] The state spaces of Chess and Go, albeit large, are finite. In contrast, quantifiers can range over infinite domains.

[2] First-order logic along with the axioms of set theory is expressive: it is in principle sufficient to encode most of modern mathematics, although humans generally work at a higher level of abstraction and within a natural language extended with mathematical concepts as opposed to a formal language.

[3] The literature on automated theorem proving is expansive (e.g., see Fitting, 2012, for a survey of first-order methods). Most provers use a proof-theoretic system as the primary abstraction for representing mathematical knowledge.

[4] Pólya has written extensively on plausible reasoning, i.e., the heuristic and non-deductive aspects of mathematical reasoning, including (1) weighing evidence for and against a conjecture, (2) making physical analogies, and (3) reasoning from randomness (e.g., see Pólya, 1990a, b, 2004).

[5] The non-deductive aspects of mathematical reasoning have been recognized by mathematicians and philosophers (e.g., see Hacking, 1967; Corfield, 2003; Parikh, 2010; Seidenfeld et al., 2012; Mazur, 2014).
We start by introducing a representation of beliefs that assigns probabilities to the exhaustive and mutually exclusive first-order possibilities found in the theory of first-order distributive normal forms (dnfs) devised by the philosopher Jaakko Hintikka (Section 3). The idea of assigning weights to dnfs was proposed by Hintikka (1970) in the context of inductive philosophy, so the idea is not new. Our contribution here is to extract and formalize some of these ideas for the purposes of “learning to prove”. We show that the representation supports a form of Bayesian update and induces a distribution on the validity of first-order statements that does not enforce that logically equivalent statements are assigned the same probability; otherwise, we would end up in a circular situation where we require a prover in order to assign probabilities. In addition, we show that there is an embedding of first-order statements into an associated Hilbert space where mutual exclusion in logic translates into orthogonality in the space.
Next, we consider two applications that a direct probabilistic representation of beliefs in mathematical claims has for “learning to prove”. First, we identify conjecturing as a form of (statistical) model selection (Section 4). Second, we introduce an alternating-turn game that involves determining the consistency of possibilities (Section 5). The game is amenable (in principle) to self-play training, a technique that has demonstrated success in learning expert-level play for the games of Go and Chess, to learn beliefs that can be used to construct a prover that is complete when logical omniscience[6] is attained and sound provided that players maintain reasonable[7] beliefs. Implementing and empirically testing self-play for these games is technically challenging and beyond the scope of this paper.

[6] An agent is logically omniscient if it knows all the logical consequences that follow from a set of axioms. Consequently, logical omniscience should fail in the interim while learning a prover: there is nothing to learn if an agent already possesses knowledge of all theorems.

[7] Roughly speaking, an agent is reasonable if it does not assign zero probability to a possibility that it has not been able to falsify. We will define this formally in Section 3.1.3.
The ideas in this paper should be taken with one major caveat: the space complexity of the representation is (highly) super-exponential as a function of quantifier depth (i.e., the maximal number of nested quantifiers), so the ideas are not practically implementable without modification. Thus our analysis in its current form should only be seen as conducting a thought experiment. As a step towards making the ideas here more practical, we will comment on how to control the sizes of the representations at the cost of completeness by treating certain combinations of properties as observationally indistinguishable, i.e., by making abstractions and lazily considering more properties as needed (Section 6). This suggests a path towards implementation (e.g., for the game).
As one final qualification concerning the ideas in this paper, we acknowledge that we have taken a somewhat narrow view of “learning to prove”. First, we restrict ourselves to a first-order axiomatic view of mathematics.[8] Second, we consider only a probabilistic aspect of plausible reasoning.[9]

[8] Although the axiomatic approach to mathematics is widely adopted, mathematicians typically do not carry out the paradigm to its full extent and write completely formal proofs. When they do, there are a variety of formal languages they can choose from in addition to first-order logic, including higher-order logic and type theories. The practicality of formalizing mathematics has been aided by the development of tools called interactive theorem provers. (For instance, Gonthier et al. (2013) formalized the Feit–Thompson theorem, a deep result in group theory, using an interactive theorem prover.) There are interactive theorem provers based on first-order logic (e.g., see Mizar, accessed 2019-4-6), higher-order logic (e.g., see Isabelle, accessed 2019-3-31), and type theories (e.g., see Coq, accessed 2019-3-31). An interesting direction of future work would be to see how the ideas in this paper apply to higher-order and type-theoretic settings.

[9] The use of probabilistic reasoning to model plausible reasoning is not a new idea; for instance, see work on probabilistic graphical models (Pearl, 1988) and work on inductive inference (e.g., see Solomonoff, 1964a, b; Jaeger, 2005). The field of automated reasoning (e.g., see Robinson and Voronkov, 2001a, b, for a survey) contains work on other forms of non-deductive reasoning, including reasoning by induction (e.g., see Quinlan, 1986; Bundy, 2001; Comon, 2001), abduction (e.g., see Console et al., 1991; Mayer and Pirri, 1993; Gabbay et al., 1998; Denecker and Kakas, 2002), and analogy (e.g., see Davies and Russell, 1987; Ashley, 1988; Russell, 1988).
Finally, we emphasize that our work is not human-style theorem proving (e.g., see Ganesalingam and Gowers, 2017) even though we take inspiration from human mathematicians. In spite of these limitations and shortcomings, we believe that the ideas presented here offer a descriptive account of “learning to prove” that cohesively accounts for the role of beliefs in the proving process, the utility of conjecturing, and the value of abstraction.
2 Preliminaries
We begin by setting up the notation and terminology we will use throughout this paper (Section 2.1). Next, we provide intuition for Hintikka’s dnfs (Section 2.2) and then introduce them formally for first-order logic without equality[10] (Section 2.3). For more background on dnfs, we refer the reader to (Hintikka, 1965, 1973; Nelte, 1997).

[10] The restriction to first-order logic without equality is for simplicity: dnfs are defined for first-order logic with equality as well. All results given here apply to dnfs in both cases with the appropriate modifications. The difference between the two is between an inclusive treatment of quantifiers (without equality) and an exclusive treatment of quantifiers (with equality). As usual, note that we can include a binary predicate that encodes equality in first-order logic without equality, the difference with the case of first-order logic with equality being that structures may not necessarily be normal.
2.1 Notation and Background
Let 𝔹 = {0, 1}. We will interchangeably use 0 for ⊥ (false) and 1 for ⊤ (true). ℕ denotes the set of naturals and ℕ⁺ denotes the set of positive naturals. ℝ denotes the set of reals and ℝ⁺ denotes the set of positive reals.
We write 𝒫(X) to indicate the power set of X. We write |X| for the cardinality of the set X. We will often write a binary relation such as R ⊆ X × X in infix notation as x R y for x, y ∈ X. The notation {x ∈ X | P(x)}, where P is a predicate on a set X, indicates a set comprehension. We also write set comprehensions for indexed sets as {xᵢ | P(i)}, where P is a predicate on an index set that indexes {xᵢ}.
When order matters, we use ⟨x₁, …, xₙ⟩ for sequences instead of {x₁, …, xₙ}. We write Xⁿ for the set of length-n sequences comprised of elements from X, and likewise for the set of length-n strings comprised of elements from X.
We will use ellipsis notation “…” frequently in this paper. As usual, it means “fill in the dots with all the missing elements in between”. For example, x₁, …, x₄ gives the elements x₁, x₂, and so on until x₄. When the commas are omitted as in x₁ … x₄, the notation indicates a string of those elements instead.
2.1.1 First-order logic
The syntax of firstorder logic (without equality) is summarized below.
We use the metavariable t to refer to terms. A term is either a variable x, a constant c, an n-ary function f(t₁, …, tₙ) applied to terms, or an n-ary predicate P(t₁, …, tₙ) on terms. We use the metavariable φ to refer to formulas. A formula is either a term (t), the logical negation of a formula (¬φ), the logical or of two formulas (φ₁ ∨ φ₂), or an existential quantification (∃x. φ). As usual, we encode logical and as φ₁ ∧ φ₂ ≜ ¬(¬φ₁ ∨ ¬φ₂) and universal quantification as ∀x. φ ≜ ¬∃x. ¬φ, where we assume the usual precedence and use additional (meta-level) parentheses to aid the parsing of formulas. The meta-level notation (φ)^b, where b ∈ {0, 1}, either negates the formula ((φ)^0 ≜ ¬φ) or leaves it alone ((φ)^1 ≜ φ).
We write a formula with free variables as φ[x₁, …, xₙ], where x₁, x₂, … is a supply of free variables. A formula without free variables is called a sentence.
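The grammar above can be rendered as a small abstract syntax tree. The following is an illustrative sketch, not the paper's notation: the class names and the derived forms for "and" and "forall" are assumptions, and `depth` computes the quantifier depth used later in the paper.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Pred:          # n-ary predicate applied to terms (an atomic formula)
    name: str
    args: tuple

@dataclass(frozen=True)
class Not:
    body: "Formula"

@dataclass(frozen=True)
class Or:
    left: "Formula"
    right: "Formula"

@dataclass(frozen=True)
class Exists:
    var: Var
    body: "Formula"

Formula = Union[Pred, Not, Or, Exists]

# Logical and and universal quantification as the derived forms given above.
def And(a, b):
    return Not(Or(Not(a), Not(b)))

def Forall(x, body):
    return Not(Exists(x, Not(body)))

def depth(phi) -> int:
    """Quantifier depth: the maximal number of nested quantifiers."""
    if isinstance(phi, Pred):
        return 0
    if isinstance(phi, Not):
        return depth(phi.body)
    if isinstance(phi, Or):
        return max(depth(phi.left), depth(phi.right))
    if isinstance(phi, Exists):
        return 1 + depth(phi.body)
    raise TypeError(phi)
```

For example, the sentence "every individual has an individual smaller than it" from Section 2.2 is `Forall(x, Exists(y, Pred("<", (y, x))))` and has quantifier depth 2.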
We use a standard deductive system for first-order logic and write Γ ⊢ φ if there is a derivation of φ using the sentences in Γ, any logical axioms, and the rules of inference. We write Γ for a set of sentences. We say that Γ is consistent if a contradiction is not derivable, i.e., there is no φ for which both φ and ¬φ are derivable from Γ, where we take the conjunction of all sentences in Γ when it appears to the left of ⊢.
We use the standard semantics of first-order logic based on structures.[11] A structure is a tuple consisting of a (potentially empty) set called the domain, a signature (the functions and relations of the language), and an interpretation of the signature. Note that an empty domain cannot be used to interpret a language with constants. We say that a formula is satisfiable in a structure if the usual satisfaction relation, defined by induction on the structure of formulas, holds for every assignment of the formula’s free variables to elements of the domain, where we overload notation so that a variable is interpreted as the element assigned to it. A sentence is satisfiable if there is some structure that satisfies it.

[11] For more background on first-order logic, we refer the reader to (Hodges, 2001).
Recall that first-order logic with a standard proof system and standard semantics is sound (i.e., every derivable sentence is valid) and complete (i.e., every valid sentence is derivable). Thus a sentence is consistent iff it is satisfiable. A sentence is inconsistent if it is satisfiable in no structure, consistent if it is satisfiable in at least one structure, and logically valid if it is satisfiable in every structure. We write that two formulas are logically equivalent when each is satisfiable in exactly the same structures.
2.1.2 Graphs and trees
A directed graph is a tuple consisting of a set of vertices and a set of edges between them. Because we only consider directed graphs in this paper, we will abbreviate directed graph as graph. A path in a graph is a subgraph consisting of a sequence of distinct vertices together with an edge from each vertex in the sequence to the next. We refer to the first and last vertices of the sequence as the endpoints of the path.
A (rooted) tree is a tuple consisting of a graph in which any two vertices are connected by a unique path and a vertex designated as the root. Because there is only one path between any two vertices, a path can be identified by the traversed vertices, or simply by its two endpoints. We say that a vertex p is a parent of a vertex c, and c is a child of p, if there is an edge from p to c. We write children(v) for the set of children of a vertex v. We say that a vertex a is an ancestor of a vertex d, and d is a descendant of a, if there is a path from a to d. We write anc(v) for the set of ancestors of v (desc(v) for descendants).
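The tree terminology above can be sketched concretely. This is an illustrative encoding (a parent map rather than an edge set) with hypothetical method names; it is not notation from the text.

```python
# A rooted tree encoded by a parent map: each non-root vertex has exactly one
# parent, which guarantees a unique path between any two vertices.

class Tree:
    def __init__(self, root):
        self.root = root
        self.parent = {root: None}

    def add_edge(self, p, c):
        assert p in self.parent and c not in self.parent, "must stay a tree"
        self.parent[c] = p

    def children(self, v):
        return {c for c, p in self.parent.items() if p == v}

    def ancestors(self, v):
        out = []
        while self.parent[v] is not None:
            v = self.parent[v]
            out.append(v)
        return out

    def descendants(self, v):
        out, frontier = set(), [v]
        while frontier:
            u = frontier.pop()
            for c in self.children(u):
                out.add(c)
                frontier.append(c)
        return out

    def path(self, v):
        """Unique path from the root to v, identified by traversed vertices."""
        return list(reversed(self.ancestors(v))) + [v]
```

The `path` method reflects the observation that a vertex of a tree can be identified with the unique path from the root to it, a fact used for refinement trees in Section 3.1.1.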
2.2 Distributive Normal Forms: Intuition
The role of a dnf of a firstorder formula is analogous to that of a disjunctive normal form of a propositional formula in that the dnf of a formula is a disjunction of mutually exclusive possibilities. That we can exhaustively describe mutually exclusive possibilities in the firstorder setting is not obvious as the domain of quantification can be infinite and individuals in the domain can become related to one another as more individuals are considered. We start with an example to illustrate the basic problem and solution due to Hintikka.
Consider a first-order theory with one binary predicate <, where x < y is infix for “x is less than y”, for describing individuals and their order relations with one another. We can look at what the normal form of the statement “every individual has an individual that is smaller than it”, encoded in this language as

  ∀x. ∃y. y < x,

could be. Assuming that we have a constant that names each element in the domain of quantification, a first attempt would be to translate each ∀ into a conjunction (over the domain of individuals) and each ∃ into a disjunction (over the domain of individuals), and use a propositional normal form. That is, we convert the result of translating the quantifiers away
into disjunctive normal form. Unfortunately, the domain of quantification can be infinite, so the resulting formula may be of infinite size. The “trick” for circumventing this is to enumerate how the predicates at hand can describe the relationships between individuals (uniformly in ) instead of enumerating tuples of individuals. We can then identify possible kinds of worlds by listing which kinds of individuals exist or not.
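On a finite domain, the naive quantifier translation just described can be carried out directly. The following is a minimal sketch; the function name, domain, and relation passed in are illustrative assumptions, not part of the text.

```python
# Naive quantifier translation, which only works on a *finite* domain:
# each "forall" becomes a conjunction (all) and each "exists" becomes a
# disjunction (any) over the domain.

def every_individual_has_smaller(domain, less_than):
    # forall x. exists y. y < x  ~~>  AND over x of (OR over y of (y < x))
    return all(any(less_than(y, x) for y in domain) for x in domain)
```

On an infinite domain this expansion would be infinite, which is exactly the problem the dnf construction circumvents. Note also that the empty domain makes the statement vacuously true, an extreme case that reappears in the remarks at the end of this section.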
To see how this works, we rewrite the original statement as

  ¬∃x. ¬∃y. y < x.

(In words, it is impossible to find an individual that does not have an individual that is less than it.) In this form, we can think of the normal form of a statement with quantification as describing whether kinds of individuals with certain relations to one another exist or not. In order to exhaust all the possibilities, we need to consider all the cases in which x and y can be related to one another that are consistent with the original formula.
We can see this better in our specific example by introducing notation that enumerates all descriptions of one and two free individual variables describable by the predicate <. When there is one free individual variable x, the only possibility is to relate x to itself: either ¬(x < x), which says that x is not less than itself, or x < x, which says that x is less than itself. When there are two free individual variables x and y, the atomic formulas that mention y are x < y, y < x, and y < y, so a description of y relative to x has the form

  (±)(x < y) ∧ (±)(y < x) ∧ (±)(y < y)

where the subscript on a description indexes one of the 2^3 = 8 choices of signs. For example, the description (x < y) ∧ ¬(y < x) ∧ ¬(y < y) picks out a y that is greater than x and not less than itself. We enumerate all combinations of whether such individuals exist or not next: a combination states, for each description of a single individual x, whether an individual of that kind exists, and, for each such x, which descriptions of a second individual y relative to it are realized. The possible kinds of worlds described by our original formula are then any of these combinations that imply the original formula. When we introduce dnfs more formally (Section 2.3), we will see that the possible kinds of worlds are constituents.
The example deserves some remarks. First, note that we really have exhaustively enumerated all the mutually exclusive possibilities. One possibility describes the extreme where there are no individuals (and hence the original statement is vacuously true),[12] another requires individuals to be less than themselves, and yet another enables every kind of individual to exist with respect to the ordering. Second, note that the number of possibilities even in this small example (two individuals and one predicate) is enormous. The astronomical number of constituents is not an issue for theoretical purposes although it does render the straightforward application of the theory unfeasible.

[12] Note that traditional presentations of first-order model theory disallow empty domains although this restriction is not necessary. On the syntactic side, we will need to modify proof rules (e.g., the rule used in converting a formula to prenex normal form no longer holds) to maintain soundness and completeness.
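The combinatorial blow-up noted above can be made concrete with a toy count. The exact figure depends on conventions not shown here, so the following is a sketch under stated assumptions: a description of y relative to x signs each of the three atomic formulas x < y, y < x, y < y, and a description of x by itself has two forms.

```python
from itertools import product

# Toy count for the running example (one predicate "<", up to two individuals).

# All sign choices over the 3 atomic formulas mentioning y: 2**3 = 8.
pair_descriptions = list(product([True, False], repeat=3))

# A "kind of individual" fixes x's self-description (2 forms) and, for each
# of the 8 relative descriptions, whether some y of that kind exists.
kinds_of_individuals = 2 * 2 ** len(pair_descriptions)

# A "kind of world" states, for each kind of individual, whether one exists,
# giving 2 ** kinds_of_individuals combinations: astronomical even here.
```

Under these assumptions there are 512 kinds of individuals and hence on the order of 2^512 kinds of worlds, which illustrates why a straightforward application of the theory is unfeasible.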
2.3 Distributive Normal Forms: Background
Define the set
An element of is a conjunction of every or its negation.
Let denote the set of all atomic formulas (i.e., a predicate applied to a tuple of terms) involving the free individual terms (i.e., constants or variables) . Let denote the subset of that mentions at least once.
Attributive constituents
An attributive constituent with free individual terms of depth is an element of . We write for the set of all attributive constituents with free individual terms of depth . By convention, we set . An attributive constituent of depth is a formula of the form
where , each , and each . The subscript indexes the attributive constituent and can be identified with the string . Let be an index set for attributive constituents with free individual terms of depth . We have . The superscript indicates the depth of the formula, i.e., the maximal number of nested quantifiers in the formula. Hence a depth of 0 indicates that there are no quantifiers.
The set of attributive constituents of depth is defined by induction on . More concretely, an attributive constituent with free individual terms of depth has the form
where we will explain the undefined notation below. Let be an index set for attributive constituents of depth with free individual terms . The subscript is a pair of and a function indicating whether the appropriately indexed attributive constituent (of depth with free individual terms) exists or not. When the indices do not matter, we will abbreviate as . When we refer to two distinct attributive constituents whose indices do not matter, we will overload the subscripts as in and to distinguish them.
An attributive constituent with free individual terms of depth can equivalently be defined as
where is the index set restricted to the positive ones as given by the function .
Constituents
A constituent with free individual terms of depth is a formula of the form
where . Let be the set of constituents of depth with free individual terms. By convention, we set . We write for the set indexing . We use the same abbreviation scheme for the indices of constituents as we did for attributive constituents. Note that a constituent is an attributive constituent with an additional . Thus attributive constituents and constituents can be identified when there are free individual terms.
Distributive normal forms
A distributive normal form (dnf) with free individual terms is a disjunction of constituents
for some subset of constituents.
Properties
Attributive constituents, constituents, and dnfs have the following useful properties (Hintikka, 1965). [Existence, mutual exclusion, and expansion]

 Existence

Every formula (of depth ) has a distributive normal form (of depth ), i.e., there is a function such that
where is the set of wellformed firstorder sentences with free individual terms .
 Mutual exclusion

Any two distinct constituents and attributive constituents of the same depth are mutually exclusive, i.e., for any .
 Expansion

Every constituent can be written as a disjunction of its expansion constituents, i.e., there is a function such that
Any is said to refine or is a refinement of .[13] We write when refines . Let
Then the refinement relation is a partial order and is a poset.

[13] The original terminology that Hintikka uses is subordinate. We prefer the term refinement because it evokes the intuition that a refinement describes the possibility described by its parent in finer detail.
It is well-known that validity of first-order formulas is undecidable. Consequently, the consistency of constituents in a dnf is undecidable. There is a weaker notion called trivial inconsistency that is decidable. There are several notions of trivial inconsistency (e.g., see Hintikka, 1973; Nelte, 1997), although the exact form is not important for our purposes. [Completeness] An attributive constituent is inconsistent iff all of its expansions at some depth are trivially inconsistent (Hintikka, 1965). Thus, an inconsistency at one depth will eventually manifest itself as trivially inconsistent at some greater depth, although that depth is not recursively computable.[14]

[14] There are notions of trivial inconsistency that are not strong enough to ensure completeness as noted by Nelte (1997). The main idea is to show that a consistent attributive constituent always has an expansion that is not trivially inconsistent; the result follows from an application of Kőnig’s tree lemma.
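The completeness property suggests an expand-and-prune search for refuting a constituent. The sketch below is illustrative: `expand` (the expansion constituents at the next depth) and `trivially_inconsistent` (the decidable check) are assumed callbacks, not implementations of Hintikka's actual definitions.

```python
# Semi-decision procedure suggested by the completeness theorem: a constituent
# is inconsistent iff, at some expansion depth, all of its refinements are
# trivially inconsistent.

def refute(constituent, expand, trivially_inconsistent, max_depth):
    """Return True if `constituent` is revealed inconsistent within
    `max_depth` rounds of expansion, False if the search is inconclusive."""
    frontier = [constituent]
    for _ in range(max_depth):
        frontier = [
            e
            for c in frontier
            for e in expand(c)
            if not trivially_inconsistent(e)
        ]
        if not frontier:
            return True   # every expansion died: inconsistent
    return False          # inconclusive: the required depth is not computable
```

Because the depth at which an inconsistency surfaces is not recursively computable, a `False` answer is inconclusive rather than a verdict of consistency.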
3 Representing Beliefs in Mathematical Knowledge
In this section, we introduce a representation that assigns probabilities to the exhaustive and mutually exclusive possibilities of first-order logic that we have just seen to be constituents. More concretely, we formalize a method for assigning weights to constituents and an appropriate Bayesian update following the idea of assigning weights to constituents described by Hintikka (1970, pg. 274–282) (Section 3.1). The representation induces a probability distribution on the validity of first-order statements that does not enforce that logically equivalent statements are assigned the same probability, so that the beliefs of agents that are not logically omniscient[15] can be encoded (Section 3.2). At the end of the section, we identify an embedding space, a Hilbert space, for first-order statements based on the probabilistic representation where mutual exclusion in logic translates into orthogonality in the space (Section 3.3).

[15] The problem of logical omniscience is an issue encountered in epistemic logic (e.g., see Sim, 1997; Fagin et al., 2004; Halpern and Pucella, 2011) where we reason about the knowledge of agents. One solution for weakening logical omniscience involves modeling impossible possible worlds, i.e., modeling worlds that an agent considers possible but are eventually revealed to be impossible. Hintikka argues that dnfs provide a model of impossible possible worlds: an impossible possible world is an inconsistent constituent that is not trivially inconsistent at some depth and revealed to be trivially inconsistent at a later depth (by completeness, Hintikka, 1979). Thus the application of dnfs to address the problem of logical omniscience has also been hinted at by Hintikka.

[Simple first-order languages] For simplicity, we restrict attention to first-order languages with a finite number of predicates, no function symbols, and no constants unless stated otherwise.[16]

[16] As a reminder, the effect of equality is to give an exclusive interpretation of quantifiers. All the results that hold on constituents in first-order logic without equality also hold on constituents in first-order logic with equality with the appropriate modifications. Note that functions can be encoded as predicates in first-order logic with equality. Observe also that the current setting actually admits a finite number of constants. More concretely, we can associate each constant c with a monadic predicate Pc where the interpretation of Pc(x) is “x is the constant c”. Any formula that refers to the constant c can thus be translated to one that existentially quantifies over a fresh variable x satisfying Pc(x), and we add an appropriate axiom asserting the existence of such an x to the theory.
Hence, we are roughly working with first-order languages with a finite number of predicates, functions, and constants. The constant-free restriction simplifies the form of constituents and dnfs we will need to consider. As a reminder, every first-order formula with free individual terms has a dnf of depth constituents. In a constant-free setting, the free individual terms are all variables. Thus the original formula is equivalent to its universal closure, which is a formula with 0 free individual terms (i.e., a sentence). Consequently, we only need to consider the set of constituents of depth with 0 free individual terms, retaining the convention above for depth 0.
3.1 Hintikka Trees
We formalize a representation that assigns probabilities to constituents in this section. As the set of constituents at any depth exhaust and describe all mutually exclusive possibilities at that depth, the idea behind the representation is the standard one: list all possibilities and assign weights to them that sum to one. We construct the representation in two parts. First, we introduce a refinement tree that keeps track of the refinement relation because constituents of different depths do not denote mutually exclusive possibilities when they are related according to the refinement partial order (Section 3.1.1). Second, we describe how to assign weights to the refinement tree which completes the static representation of an agent’s beliefs (Section 3.1.2). After we introduce the representation, we introduce dynamics via a renormalization operator which can be interpreted as a form of Bayesian update for beliefs (Section 3.1.3).
3.1.1 Refinement tree
Let the set of vertices be the set of constituents of any depth. Let the set of edges consist of the refinement relation omitting reflexive relations. Then the resulting graph encodes the refinement relation (minus the reflexive edges).
The graph is not a tree because the expansions of two distinct constituents can share refining constituents, although the shared constituents are necessarily inconsistent: if a shared refining constituent were consistent, there would exist a structure satisfying it, and hence satisfying both of the distinct parent constituents, which contradicts that constituents of the same depth are mutually exclusive (by mutual exclusion in Proposition 2.3).
By the proposition above, we can associate any shared constituent that is a refinement of two parent constituents to either parent constituent and disassociate it with the other without changing the consistency of either parent constituent. In other words, we can remove one edge. We can use this observation to convert into a tree. We call a refinement tree where is the set of edges obtained after the pruning procedure described above is applied. Throughout the rest of this paper, we will assume that we have chosen one such pruning and will write as . We will also overload to refer to the pruned refinement partial order. We have the following obvious relationships between the (pruned) refinement partial order and (pruned) refinement tree.

[noitemsep]

iff .

iff or (equivalently ).
Straightforward.
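The conversion of the refinement graph into a refinement tree amounts to keeping a single incoming edge per shared refinement. A minimal sketch, with an arbitrary but deterministic choice of surviving parent; the dict encoding and function name are illustrative assumptions.

```python
# Prune a refinement graph to a tree: when a refining constituent has several
# parents, keep one incoming edge. The choice is arbitrary, since (as shown
# above) such shared refinements are necessarily inconsistent.

def prune_to_tree(parents):
    """parents: dict mapping each vertex to the set of its parent vertices
    (empty set for the root). Returns a dict mapping each non-root vertex
    to its single chosen parent."""
    return {v: min(ps) for v, ps in parents.items() if ps}
```

Using `min` makes the pruning deterministic, mirroring the assumption in the text that one such pruning is fixed once and for all.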
A path between constituents and is the sequence of constituents and their refinements . Because there is only one path between any two vertices in a tree, we can identify a constituent (i.e., a node in a refinement tree) with the path taken through a refinement tree starting at the root node to reach it.
3.1.2 Assigning weights
We assign weights to constituents by attaching a weight to each node of the refinement tree. Because the assignment of weights needs to respect the refinement partial order, we will need a notion of coherence between the weight assignments to adjacent levels of the refinement tree. A Hintikka tree (HT) is a tuple where is a refinement tree and is a function on constituents satisfying

 Unitial initial beliefs

; and
 Coherently constructed

.[17]

[17] The method of assigning weights in a HT is slightly different than the one described in prose by Hintikka (1970, pg. 274–282). In particular, Hintikka combines the statics and dynamics of the weight assignment whereas we separate them out and only describe the statics here. We will discuss the dynamics in Section 3.1.3.
We will abbreviate a HT as . We write for the set of HTs defined with respect to the simple first-order language . The first condition states that we start off with unitial beliefs. The second condition enforces that the probability that we assign a constituent is contained entirely within the subtree of the refinement tree rooted at . Hence the assignment of weights is conserved across depth. Observe that the assignment of weights to constituents is not constrained by the “fraction” of models that the constituents are satisfiable in. If it were, then the induced distribution on the validity of first-order statements would enforce logical omniscience.
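The two conditions can be checked numerically on a finite truncation of a refinement tree. The following is an illustrative sketch using an assumed dict encoding (weights on vertices, child lists with empty lists at the truncation frontier); the function name is hypothetical.

```python
# Check the two Hintikka-tree conditions on a finite truncation.

def is_hintikka(beta, children, root, tol=1e-9):
    """beta: dict mapping each vertex to its weight.
    children: dict mapping each vertex to a list of its children
    (an empty list marks the truncation frontier, where the check stops)."""
    if abs(beta[root] - 1.0) > tol:          # unitial initial beliefs
        return False
    for v, cs in children.items():
        if cs and abs(beta[v] - sum(beta[c] for c in cs)) > tol:
            return False                     # coherence across depth
    return True
```

The normalization property proved next follows from these two checks: if the root has weight 1 and every parent's weight equals the sum of its children's weights, then the weights at each fully expanded depth sum to 1.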
[Normalization] The beliefs assigned to constituents at each depth by a HT are normalized:
We proceed by induction on . The base case follows from unitial initial beliefs. In the inductive case, we have to show that
We have that
where the first equality is a rearrangement and the second equality follows because is coherently constructed. The result follows as we have by the inductive hypothesis. [Infinite supported path] For any HT , there is a chain of constituents such that for any in the chain. We proceed by induction on . The base case follows by unitial initial beliefs of . In the inductive case, we have that . The result follows as is coherently constructed and has a finite number of children so there must exist a refinement such that .
We end with several examples of HTs. Figure 3 gives an illustration of an example HT. As required by the definition, beliefs across depth are coherent. A HT is an uninformative Hintikka tree if the assignment of beliefs is a uniform distribution at every depth, i.e., every constituent at a given depth is assigned the same weight. A HT is a depth Hintikka tree if the assignment is constrained so that inconsistent constituents are assigned 0.[18] Observe that there are consistent constituents at every depth and that consistent constituents have consistent refinements by the constituent completeness theorem, so that a depth HT is well-defined. For example, the sentence is logically valid at depth . Inconsistency is undecidable, so a depth HT is not computable. If a theorem-proving agent represents mathematical knowledge with a depth HT, then the agent is logically omniscient. We have that iff for some depth HT .

A HT provides a static representation of an agent’s beliefs. Naturally, an agent may encounter a situation where it realizes that its beliefs need to be revised. For example, upon further inspection of all the expansions of a parent constituent, the agent may realize that they are all inconsistent, so the belief in the parent constituent should be eliminated and redistributed to other constituents. Intuitively, this may occur because the increase in depth corresponds to the construction of an object (e.g., an introduction of an existential) and the consideration of this extra object changes the valuation of the consistency of the parent possibility. Indeed, such a situation arises from the constituent completeness theorem: inconsistent constituents are eventually revealed to be trivially inconsistent at some depth even if they are not trivially inconsistent at shallower depths. We turn our attention to the dynamics of belief revision in the representation now.

[18] The terminology is inspired by depth information (Hintikka, 1970).
3.1.3 Renormalization dynamics
Hintikka (1970, pg. 281) describes a method of redistributing weights assigned to a refinement tree when belief in a node and all of its descendants is lost. The intuition for the update follows Bayesian “refute” and “rescale” dynamics: when belief in a node and all of its descendants is eliminated so that those possibilities are “refuted”, the beliefs in the smallest subtree containing that node that still has positive weight are “rescaled” appropriately. In this section, we formalize this intuition as a renormalization operation. Towards this end, we will specify (1) which constituents to redistribute beliefs to and (2) the amount of belief to redistribute to those constituents.
Part one of renormalization
We start with the first task and begin by identifying which constituents to redistribute beliefs to when we discontinue beliefs in in a HT . Define the function as (1) if there is some such that for and and (2) otherwise. Define the support function as
The idea is that we will transfer beliefs assigned to unsupported constituents over to the appropriate supported constituents. Define the abbreviations and . Thus and partition . Define a redistribution point as
which is the closest (i.e., deepest by depth) ancestor constituent that has supported descendants. A redistribution point identifies a vertex of the refinement tree that has supported descendants to redistribute beliefs in unsupported constituents to.
Part two of renormalization
We turn our attention towards the second task concerning the amount of belief to redistribute to each constituent now. Let be the children of that are supported. Then
is the positive renormalization constant and
is the total renormalization constant.
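Putting the two parts together, the "refute and rescale" dynamics can be sketched on the same kind of finite dict encoding used earlier. This is an illustrative sketch, not the paper's exact operator: it zeroes the refuted subtree, walks up to the redistribution point, and rescales that point's supported children proportionally (beliefs deeper inside the supported subtrees would be rescaled analogously).

```python
# Renormalization sketch: remove all belief in a refuted constituent (and its
# descendants) and redistribute the lost mass at the redistribution point,
# i.e., the deepest ancestor that still has positively supported children.

def renormalize(beta, children, parent, refuted):
    beta = dict(beta)

    def zero_subtree(v):
        beta[v] = 0.0
        for c in children.get(v, []):
            zero_subtree(c)

    lost = beta[refuted]
    zero_subtree(refuted)

    node = parent[refuted]
    while node is not None:
        supported = [c for c in children[node] if beta[c] > 0.0]
        if supported:
            # Redistribution point found: rescale the supported children so
            # the subtree again carries beta[node] worth of belief.
            z = sum(beta[c] for c in supported)  # positive renormalization constant
            for c in supported:
                beta[c] += lost * beta[c] / z    # rescale proportionally
            return beta
        # No supported children here: this ancestor's belief is eliminated
        # too, and the lost mass is carried further up.
        beta[node] -= lost
        node = parent[node]
    raise ValueError("all beliefs refuted")
```

The rescaling follows the Bayesian "refute and rescale" intuition: the freed mass is distributed over the surviving siblings in proportion to their current weights, so coherence at the redistribution point is restored.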