# Probabilities on Sentences in an Expressive Logic

Automated reasoning about uncertain knowledge has many applications. One difficulty when developing such systems is the lack of a completely satisfactory integration of logic and probability. We address this problem directly. Expressive languages like higher-order logic are ideally suited for representing and reasoning about structured knowledge. Uncertain knowledge can be modeled by using graded probabilities rather than binary truth-values. The main technical problem studied in this paper is the following: Given a set of sentences, each having some probability of being true, what probability should be ascribed to other (query) sentences? A natural wish-list, among others, is that the probability distribution (i) is consistent with the knowledge base, (ii) allows for a consistent inference procedure and in particular (iii) reduces to deductive logic in the limit of probabilities being 0 and 1, (iv) allows (Bayesian) inductive reasoning and (v) learning in the limit and in particular (vi) allows confirmation of universally quantified hypotheses/sentences. We translate this wish-list into technical requirements for a prior probability and show that probabilities satisfying all our criteria exist. We also give explicit constructions and several general characterizations of probabilities that satisfy some or all of the criteria and various (counter) examples. We also derive necessary and sufficient conditions for extending beliefs about finitely many sentences to suitable probabilities over all sentences, and in particular least dogmatic or least biased ones. We conclude with a brief outlook on how the developed theory might be used and approximated in autonomous reasoning agents. Our theory is a step towards a globally consistent and empirically satisfactory unification of probability and logic.


## 1 Introduction

### Motivation.

Sophisticated computer applications generally require expressive languages for knowledge representation and reasoning. In particular, such languages need to be able to represent both structured knowledge and uncertainty [Nil86, Hal03, Mug96, DK03, RD06, Háj01, Wil02]. A suitable language for this purpose is higher-order logic [Chu40, Hen50, And02, Llo03, vBD83, Lei94, Sha01], which admits higher-order functions that can take functions as arguments and/or return functions as results. This facility is convenient for probabilistic modeling since it means that theories can contain probability densities [Far08, Pfe07, GMR08]. In particular, many forms of probabilistic reasoning can be done in higher-order logic using the traditional axiomatic method: a theory can be written down which has the intended interpretation as a model and then conventional proof and computation techniques can be used to answer queries [NL09, NLU08]. While such a computational approach is effective, it is sometimes more natural to pose a problem as one where the probability of some sentences in the theory being true may be strictly less than one and/or the query sentence (and its negation) may not be a logical consequence of the theory. In such cases, deductive reasoning does not suffice for answering queries and it becomes necessary to use probabilistic methods [Par94, KD07, RD06, Mug96, MR07].

### Main aim.

These considerations lead to the main technical issue studied in this paper:

Given a set of sentences, each having some probability of being true,
what probability should be ascribed to other (query) sentences?

We build on the work of Gaifman [Gai64] whose paper with Snir [GS82] develops a quite comprehensive theory of probabilities on sentences in first-order Peano arithmetic. We take up these ideas, using non-dogmatic priors [GS82] and additionally the minimum relative entropy principle as in [Wil08a], but for general theories and in a higher-order setting. We concentrate on developing probabilities on sentences in a higher-order logic. This sets the stage for combining it with the probabilities inside sentences approach [NL09, NLU08].

### Summary of key concepts.

Section 2 introduces higher-order logic and its relevant properties. We use the higher-order logic (Definitions 2, 2, and 2) based on Church’s simple theory of types [Chu40, Hen50, And02]. We employ the Henkin semantics and make use of a particular class of interpretations, called separating interpretations (Definition 2).

Section 3 gives the definition of probabilities on sentences in higher-order logic (Definition 3), introduces the Gaifman condition, and develops some basic properties of such probabilities. Section 4 then introduces probabilities on interpretations and shows their close connection with probabilities on sentences. Gaifman [Gai64] (generalized in Definition 3 and Propositions 3, 3, and 3) introduced a condition, called Gaifman in [SK66], that connects probabilities of quantified sentences to limits of probabilities of finite conjunctions. In our case, it effectively restricts probabilities to separating interpretations while maintaining countable additivity.

While generally accepted in probability theory (Definition 4), some circles argue that countable additivity (CA) does not have a good philosophical justification, and/or that it is not needed since real experience is always finite, hence only non-asymptotic statements are of practical relevance, for which CA is not needed. On the other hand, it is usually much easier to first obtain asymptotic statements, which require CA, and then improve upon them. Furthermore, we will show that CA can guide us in the right direction to find good finitary prior probabilities.

Another principle which has received much less attention than CA but is equally if not more important is that of Cournot [Cou43, Sha06]: An event of probability (close to) zero singled out in advance is physically impossible; or conversely, an event of probability 1 will physically happen for sure. In short: zero probability means impossibility. The history of the semantics of probability is stony [Fin73]. Cournot’s “forgotten” principle is one way of giving meaning to probabilistic statements like, “the relative frequency of heads of a fair coin converges to 1/2 with probability 1”. The contraposition of Cournot is that one must assign non-zero probability to possible events. If “events” are described by sentences and “possible” means it is possible to satisfy these sentences, i.e. they possess a model, then we arrive at the strong Cournot principle that satisfiable sentences should be assigned non-zero probability (Definitions 3 and 4). This condition has been appropriately called ‘non-dogmatic’ in [GS82]. As long as something is not proven false, there is a (small) chance it is valid in the intended interpretation. This non-dogmatism is crucial in Bayesian inductive reasoning, since no evidence (however strong) can increase a zero prior belief to a non-zero posterior belief [RH11]. The Gaifman condition is inconsistent with the strong Cournot principle (Example 5), but consistent with a weaker version (Definition 3). Probabilities that are Gaifman and (plain, not strong) Cournot allow learning in the limit (Theorem 3 and Corollary 8).

A standard way to construct (general / Cournot / Gaifman) probabilities on sentences is to construct (general / non-dogmatic / separating) probabilities on interpretations, and then transfer them to sentences (Propositions 4, 4, and 4). At the same time we give model-theoretic characterizations of the Gaifman condition (Corollary 4) and the Cournot condition (Definition 4). In Section 5, we give a particularly simple construction of a probability that is Cournot and Gaifman (Theorem 5) and a complete characterization of general/Cournot/Gaifman probabilities (Theorems 5 and 5 and Corollary 5). We also give various examples of (strong) (non)Cournot and/or Gaifman probabilities and (non)separating interpretations for countable domains (Examples 5, 5, and 5) and finite domains (Examples 5, 5, 5, 5).

Section 7 considers the important practical situation of whether a real-valued function on a set of sentences can be extended to a probability on all sentences; a necessary and sufficient condition is given for this, as is a method for determining such probabilities using the minimum relative entropy principle introduced in Section 6. Prior knowledge and data constrain our (belief) probabilities in various ways, which we need to take into account when constructing probabilities. Prior knowledge is usually given in the form of probabilities on sentences like “the coin has head probability 1/2”, or facts like “all electrons have the same charge”, or non-logical axioms like “there are infinitely many natural numbers”. They correspond to requiring their probability to be 1/2, extremely close to 1, and 1, respectively. It is therefore necessary to be able to go from probabilities on sentences to probabilities on interpretations (Proposition 4). This allows us to prove various necessary and sufficient conditions under which such partial probability specifications can be completed and what properties they have (Propositions 7 and 7). In particular, we show that hierarchical probabilistic knowledge (Definition 7) is always probabilistically consistent (Proposition 7). Further, knowledge seldom constrains the probability on all sentences to be uniquely determined. In this case it is natural to choose a probability that is least dogmatic or biased [Nil86, Wil08a]. The minimum relative entropy principle (Definition 6) can be used to construct such a unique minimally more informative probability that is consistent with our prior knowledge (Definition 6 and Propositions 6 and 7).
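As a toy illustration of the minimum relative entropy completion (our own sketch with hypothetical names, not the paper's construction): for a single constraint the KL-projection reduces to a Jeffrey-style rescaling of the prior mass inside and outside the constrained event. Here the prior is uniform over the four truth assignments to two atoms p, q, and the single constraint is μ(p) = 0.7:

```python
# Hypothetical sketch: minimum-relative-entropy completion for one constraint.
# Worlds are truth assignments (p, q); the prior is uniform (least informative).
worlds = [(p, q) for p in (0, 1) for q in (0, 1)]
prior = {w: 0.25 for w in worlds}

def min_rel_entropy(prior, event, target):
    """KL-projection of `prior` onto {mu : mu(event) = target} for one event:
    rescale the mass on event-worlds to `target` and the rest to 1 - target."""
    mass = sum(pr for w, pr in prior.items() if event(w))
    return {w: pr * (target / mass if event(w) else (1 - target) / (1 - mass))
            for w, pr in prior.items()}

# constrain mu(p) = 0.7, where p is true in worlds with first component 1
mu = min_rel_entropy(prior, lambda w: w[0] == 1, 0.7)
print(round(sum(pr for w, pr in mu.items() if w[0] == 1), 10))  # 0.7
print(round(mu[(1, 1)], 10))  # 0.35, i.e. 0.7 split evenly over the p-worlds
```

For a single linear constraint this closed form coincides with Jeffrey conditioning; with several constraints one would instead iterate such projections or solve the resulting convex program.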

Section 8 is a brief outlook on how the developed theory might be used and approximated in autonomous reasoning agents. In particular, certain knowledge, learning in the limit (8), the infamous black raven paradox, and the Monty Hall problem are discussed, but only briefly. The paper ends with a more detailed discussion in Section 9 of the broader context and motivation of this work, as well as related results in the literature, the outline of a framework for probabilistic reasoning and modeling in higher-order logic, and future research directions.

While some of the results presented in this paper are known in the first-order case and their extension to the higher-order case is straightforward, it nevertheless seems useful to provide a survey of this material (with proofs included). Also, many beautiful ideas in the long and technical paper by Gaifman [GS82] deserve wider attention than they have received. We hope our exposition helps to rectify this situation.

## 2 Logic

We review here a standard formulation of higher-order logic [And02] that is based on Church’s simple theory of types [Chu40]. Other references on higher-order logic include [Llo03, Far08, vBD83, Lei94, Sha01]. Some discussion of the interesting history of the simple theory of types is given in [And02, Far08].

The best way to think about higher-order logic is that it is the formalization of everyday informal mathematics: whatever mathematical description one might give of some situation, the formalization of that situation in higher-order logic is likely to be a straightforward translation of the informal description. In particular, higher-order logic provides a suitable foundation for mathematics itself which has several advantages over more traditional approaches that are based on axiomatizing sets in first-order logic. Furthermore, higher-order logic is the logical formalism of choice for much of theoretical computer science and also applications areas such as software and hardware verification. For a convincing account of the advantages of higher-order over first-order logic in computer science, see [Far08].

The logic presented here differs in a minor way from that in [And02] in that we omit the description operator ι, for reasons that are discussed later. All the results from [And02] that are used here also hold for the logic with ι omitted, by obvious changes to their proofs. In addition, the notation for the logic used here differs somewhat from that in [And02], but the correspondences will always be clear. There are also a few differences in terminology here compared to [And02] that are noted along the way.

We begin with the definition of a type.

###### Definition (type α)

A type is defined inductively as follows.

1. o is a type.

2. ι is a type.

3. If α and β are types, then α → β is a type.

In this definition, o is the type of the truth values, ι is the type of individuals, and α → β is the type of functions from elements of type α to elements of type β. We use the convention that → is right associative. So, for example, when we write α → β → γ we mean α → (β → γ). A function type is a type of the form α → β, for some types α and β.
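As an informal illustration (our own sketch, not part of the logic's definition), the type grammar can be rendered as a small inductive datatype, here in Python with hypothetical names `O`, `I`, and `Fn`:

```python
from dataclasses import dataclass

class Type:
    """Base class for the inductive type grammar."""

@dataclass(frozen=True)
class O(Type):
    """The type o of the truth values."""
    def __str__(self):
        return "o"

@dataclass(frozen=True)
class I(Type):
    """The type of individuals."""
    def __str__(self):
        return "i"

@dataclass(frozen=True)
class Fn(Type):
    """The function type arg -> res."""
    arg: Type
    res: Type
    def __str__(self):
        # -> is right associative, so only the argument needs parentheses
        a = f"({self.arg})" if isinstance(self.arg, Fn) else str(self.arg)
        return f"{a} -> {self.res}"

# a -> b -> c is read as a -> (b -> c)
print(Fn(I(), Fn(I(), O())))   # i -> i -> o
print(Fn(Fn(I(), I()), O()))   # (i -> i) -> o
```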

There is a denumerable list of variables of each type. The logical constants are =α→α→o, for each type α. The denotation of =α→α→o is the identity relation between elements of type α. In addition, there may be other non-logical constants of various types. The alphabet is the set of all variables and constants.

Next comes the definition of a term.

###### Definition (term t)

A term, together with its type, is defined inductively as follows.

1. A variable of type α is a term of type α.

2. A constant of type α is a term of type α.

3. If t is a term of type β and x a variable of type α, then λx.t is a term of type α → β.

4. If s is a term of type α → β and t a term of type α, then (s t) is a term of type β.

A formula is a term of type . A closed term is a term with no free variables. A sentence is a closed formula. A theory is a set of formulas.

If the set of non-logical constants is countable, then the set of terms is denumerable. As shown in [And02, p.212], using equality, it is easy to define T (truth), F (falsity), ∧ (conjunction), ∨ (disjunction), ¬ (negation), ∀ (universal quantification), and ∃ (existential quantification). The axioms for the logic are as follows [And02, p.213]:

###### Axiom (logical axioms)
1. Truth values: ((g T) ∧ (g F)) = ∀x.(g x)

2. Leibniz’ law: (x = y) → ((h x) = (h y))

3. Extensionality: (f = g) = ∀x.((f x) = (g x))

4. β-reduction: (λx.t)s = t{x/s}

In the above, f, g, h, x, y, … are variables of the indicated types, x is a syntactical variable for variables of type α, and s, t, … are syntactical variables for terms of the indicated types. Also t{x/s} is the result of simultaneously substituting s for all free occurrences of x in t (provided no free variable of s becomes bound).

Axiom (1) expresses the idea that truth and falsity are the only truth values; Axioms (2) (for each type α) express a basic property of equality; Axioms (3) (for each pair of types α and β) are the axioms of extensionality; and Axiom schema (4) is the axiom for β-reduction.

Here is the single rule of inference [And02, p.213]:

###### Rule (rule of inference; equality substitution)

From s = t and φ, infer the result of replacing one occurrence of s in φ by an occurrence of t, provided that the occurrence of s in φ is not (an occurrence of a variable) immediately preceded by a λ.

The logic also has an equational reasoning system that has been used as the computational basis for a functional logic programming language

[Llo03, NL09, NLU08, LN11].

In the following, to simplify the notation, we usually omit the type subscripts on terms; the type of a term will always either be unimportant or clear from the context. We use φ, ψ, χ for sentences and sometimes for formulas, and r, s, t for terms. With this notation, r = s abbreviates ((=α→α→o r) s), and r ≠ s abbreviates ¬(r = s).

The logic includes Church’s λ-calculus: a term of the form λx.t is an abstraction and a term of the form (s t) is an application.

The logic is given a conventional Henkin semantics [Hen50].

###### Definition (frame {Dα}α)

A frame {Dα}α is a collection of non-empty sets Dα, one for each type α, satisfying the following conditions.

1. Do = {T, F}.

2. Dα→β is some collection of functions from Dα to Dβ.

For each type α, Dα is called a domain.

The members of Do are called the truth values and the members of Dι are called individuals.

###### Definition (valuation V)

Given a frame {Dα}α, a valuation V is a function that maps each constant having type α to an element of Dα such that V(=α→α→o) is the function from Dα into Dα→o defined by

 V(=α→α→o)xy = T if x = y, and F otherwise,

for x, y ∈ Dα.

###### Definition (variable assignment ν)

A variable assignment ν with respect to a frame {Dα}α is a function that maps each variable of type α to an element of Dα.

An interpretation can now be defined.

###### Definition (interpretation ⟨{Dα}α,V⟩)

A pair ⟨{Dα}α,V⟩ is an interpretation if there is a function 𝒱 such that, for each variable assignment ν and for each term t of type α, 𝒱(ν,t) ∈ Dα and the following conditions are satisfied.

1. 𝒱(ν,x) = ν(x), where x is a variable.

2. 𝒱(ν,C) = V(C), where C is a constant.

3. 𝒱(ν,λx.t) is the function whose value for each d ∈ Dα is 𝒱(ν′,t), where x has type α and ν′ is ν except that ν′(x) = d.

4. 𝒱(ν,(s t)) = 𝒱(ν,s)(𝒱(ν,t)).

If ⟨{Dα}α,V⟩ is an interpretation, then the function 𝒱 is uniquely defined. 𝒱(ν,t) is called the denotation of t with respect to V and ν. If t is a closed term, then 𝒱(ν,t) is independent of ν and we write it as 𝒱(t). Not every pair ⟨{Dα}α,V⟩ is an interpretation; to be an interpretation, every term must have a denotation with respect to each variable assignment.

What is called an interpretation here is called a general model in [And02], following Henkin. In [And02], a general model is called a standard model if, for each α and β, Dα→β is the set of all functions from Dα to Dβ. Moving from standard models to general models was the crucial step that allowed Henkin to prove the completeness of the logic [Hen50].

###### Definition (satisfiable)

Let φ be a formula, I an interpretation, and ν a variable assignment with respect to I.

1. ν satisfies φ in I if the denotation of φ in I with respect to ν is T.

2. φ is satisfiable in I if there is a variable assignment ν which satisfies φ in I.

3. φ is valid in I if every variable assignment satisfies φ in I.

4. φ is valid if φ is valid in every interpretation.

5. A model for a theory is an interpretation in which each formula in the theory is valid.

###### Definition (consistency)

A theory is consistent if F cannot be derived from the theory.

###### Definition (logical consequence)

A formula φ is a logical consequence of a theory if φ is valid in every model of the theory.

We will have need for a particular class of interpretations, defined as follows.

###### Definition (separating interpretation/model)

An interpretation for an alphabet is separating if, for every pair r, s of closed terms of the same function type, say α → β, such that the denotations of r and s in the interpretation are different, there exists a closed term t of type α such that the denotations of (r t) and (s t) in the interpretation are different.

A separating model is a separating interpretation that is a model (for some set of formulas).

We emphasize that, in the definition of a separating interpretation, the closed term t is formed only from symbols in the given alphabet. Intuitively, an interpretation is separating if, for every pair r, s of closed terms of the same function type α → β whose respective denotations in the interpretation are different, there exists a closed term t of type α for which the respective denotations of (r t) and (s t) in the interpretation are different. Thus, in a separating interpretation, closed terms that have distinct functions as denotations must differ on an argument in the domain that is the denotation of some closed term over the given alphabet and thus is ‘accessible’ or ‘nameable’ via that term.
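To make the ‘nameable argument’ intuition concrete, here is a toy finite sketch (our own illustration with hypothetical denotations, not a construction from the paper): two function denotations that differ only on an element named by no closed term cannot be separated.

```python
# Hypothetical finite illustration: over domain {0, 1, 2}, suppose the closed
# terms of individual type only name 0 and 1. Two function constants f and g
# that differ solely on the unnamed element 2 have distinct denotations, yet
# no closed term can witness the difference, so such an interpretation is
# not separating.
domain = [0, 1, 2]
nameable = [0, 1]           # denotations of closed terms of individual type

f = {0: 0, 1: 1, 2: 0}      # denotation of a function constant f
g = {0: 0, 1: 1, 2: 1}      # denotation of a function constant g

distinct = f != g                                  # denotations differ (at 2)
separated = any(f[t] != g[t] for t in nameable)    # no nameable witness
print(distinct, separated)  # True False -> not separating
```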

The concept of a separating interpretation is closely related to the concept of an extensionally complete theory that plays a crucial part in the proof of completeness [And02, p.248].

###### Definition (extensionally complete)

A set S of sentences is extensionally complete if, for every pair r, s of closed terms of the same function type, say α → β, there exists a closed term t of type α such that ((r t) = (s t)) → (r = s) is derivable from S.

A connection with separating interpretations is provided by the following result.

###### Proposition (extensionally complete ⇒ separating)

Every model of an extensionally complete set of sentences is separating.

###### Proof

Let S be a set of sentences that is extensionally complete and I be a model for S. Suppose that r, s is a pair of closed terms of the same function type, say α → β, such that the denotations of r and s in I are different. By extensional completeness, there exists a closed term t such that ((r t) = (s t)) → (r = s) is derivable from S. Since I is a model for S and the proof system is sound, this sentence is valid in I; as r = s is false in I, its contrapositive gives that the denotations of (r t) and (s t) in I are different. Hence I is separating.

Now we show that, if we are willing to expand the alphabet, any set of sentences having a model also has a separating model in an expanded alphabet.

###### Proposition (existence of separating models)

If a set S of sentences has a model, then there exists an alphabet that includes the original alphabet and an interpretation based on the expanded alphabet which is a separating model for S.

###### Proof

Since S has a model, S is consistent. By [And02, Theorem 5500], there is an expansion of the original alphabet and a set S′ of sentences such that S ⊆ S′, S′ is consistent, and S′ is extensionally complete in the expanded alphabet. Since S′ is consistent, by Henkin’s Theorem [And02, Theorem 5501], it has a model (based on the expanded alphabet). By Proposition 2, this model must be a separating one, and it is also a model for S.

The most important property of the logic that we will need is compactness [And02, Theorem 5503].

###### Theorem (compactness)

If every finite subset of a set S of sentences has a model, then S has a model.

In fact, most of the development in the paper can be carried out in any logic that has the compactness property.

While the version of higher-order logic introduced in this section generally provides much more direct and succinct formalisations than first-order logic, for practical applications a number of extensions are highly desirable. Some of these extensions are nothing more than abbreviations, such as those used to introduce the connectives and quantifiers, and some are deeper. These extensions include many-sortedness, which allows more than one domain of individuals; tuples and product types; and type constructors and polymorphism. The logic of [Llo03], which is also used in [NL09, NLU08], includes all these extensions. These and other extensions are discussed in [Far08].

## 3 Probabilities on Sentences

We now define probabilities on sentences. They are not probabilities in the conventional sense of probability theory (on σ-algebras); however, a connection between probabilities on sentences and (conventional) probabilities on a σ-algebra on the set of interpretations will be made below.

###### Definition (probability on sentences)

A probability (on sentences) is a non-negative real-valued function μ on the set of all sentences (for some alphabet) satisfying the following conditions:

1. If φ is valid, then μ(φ) = 1.

2. If ¬(φ ∧ ψ) is valid, then μ(φ ∨ ψ) = μ(φ) + μ(ψ).

For a sentence ψ with μ(ψ) > 0, one can define the conditional probability μ(·|ψ) by

 μ(φ|ψ) = μ(φ∧ψ)/μ(ψ),

for each sentence φ.

A probability on sentences has the following intended meaning:

For a sentence φ, μ(φ) is the degree of belief that φ is true.
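For a finite propositional alphabet the two defining conditions can be checked concretely. The following is our own minimal sketch (not the paper's construction): a distribution over the four truth assignments to two atoms induces a probability on sentences, modelled here as Boolean functions on assignments.

```python
# Hypothetical propositional sketch: mu(phi) = total mass of the truth
# assignments (p, q) satisfying phi.
worlds = [(p, q) for p in (0, 1) for q in (0, 1)]
mass = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

def mu(phi):
    return sum(mass[w] for w in worlds if phi(w))

p = lambda w: w[0] == 1
q = lambda w: w[1] == 1
valid = lambda w: True                 # a valid sentence, e.g. p or not p

# Condition 1: a valid sentence gets probability 1.
print(round(mu(valid), 10))            # 1.0
# Condition 2 (additivity): if not(phi and psi) is valid,
# then mu(phi or psi) = mu(phi) + mu(psi).
phi = lambda w: p(w) and not q(w)
psi = lambda w: q(w)
additive = abs(mu(lambda w: phi(w) or psi(w)) - (mu(phi) + mu(psi))) < 1e-12
print(additive)                        # True
```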

###### Definition (pairwise disjoint sentences)

The sentences φ1, …, φn are pairwise disjoint if, for each i, j such that i ≠ j, ¬(φi ∧ φj) is valid.

###### Proposition (properties of probability on sentences)

Let μ be a probability on sentences. Then the following hold:

1. μ(¬φ) = 1 − μ(φ), for each φ.

2. μ(φ) ≤ 1, for each φ.

3. If φ is unsatisfiable, then μ(φ) = 0.

4. If φ → ψ is valid, then μ(φ) ≤ μ(ψ).

5. If φ ↔ ψ is valid, then μ(φ) = μ(ψ).

6. If {φ1,…,φn} is a finite subset of pairwise disjoint
sentences, then μ(⋁ni=1φi) = ∑ni=1μ(φi).

7. If {φ1,…,φn} is a finite subset of sentences, then μ(⋁ni=1φi) ≤ ∑ni=1μ(φi).

8. The following are equivalent:
(a) For each φ, μ(φ) = 1 implies φ is valid.
(b) For each φ, μ(φ) = 0 implies φ is unsatisfiable.

9. If μ(ψ) > 0, then μ(·|ψ) is a probability.

10. μ(φ∨ψ) + μ(φ∧ψ) = μ(φ) + μ(ψ).

###### Proof

The proof is elementary and standard, and only included for completeness.

1. Since φ ∨ ¬φ is valid, μ(φ ∨ ¬φ) = 1. Also, since ¬(φ ∧ ¬φ) is valid, μ(φ ∨ ¬φ) = μ(φ) + μ(¬φ). Thus μ(¬φ) = 1 − μ(φ).

2. Since μ(¬φ) ≥ 0, we have that μ(φ) = 1 − μ(¬φ) ≤ 1.

3. Note that φ is unsatisfiable iff ¬φ is valid. Thus μ(¬φ) = 1, so that μ(φ) = 1 − μ(¬φ) = 0.

4. Note first that φ → ψ is valid iff ¬(φ ∧ ¬ψ) is valid. Thus μ(φ) + μ(¬ψ) = μ(φ ∨ ¬ψ) ≤ 1. Hence μ(φ) ≤ 1 − μ(¬ψ) = μ(ψ).

5. This follows immediately from Part 4.

6. The proof is by induction on n. When n = 1 the result is obvious. Assume now the result is true for n − 1. Note that ¬(φ1 ∧ φi) is valid for i = 2,…,n and so ¬(φ1 ∧ ⋁ni=2φi) is valid. Then

 μ(⋁ni=1φi) = μ(φ1∨⋁ni=2φi) = μ(φ1)+μ(⋁ni=2φi) [¬(φ1∧⋁ni=2φi) is valid] = μ(φ1)+∑ni=2μ(φi) [induction hypothesis] = ∑ni=1μ(φi).

7. The proof is by induction on n. When n = 1 the result is obvious. Assume now the result is true for n − 1. Then

 μ(⋁ni=1φi) = μ((φ1∧¬⋁ni=2φi)∨⋁ni=2φi) = μ(φ1∧¬⋁ni=2φi)+μ(⋁ni=2φi) ≤ μ(φ1)+∑ni=2μ(φi) [Part 4 and induction hypothesis] = ∑ni=1μ(φi).

8. Suppose that, for each φ, μ(φ) = 1 implies φ is valid. Now let ψ satisfy μ(ψ) = 0. By Part 1, μ(¬ψ) = 1. Thus ¬ψ is valid and so ψ is unsatisfiable.

Conversely, suppose that, for each φ, μ(φ) = 0 implies φ is unsatisfiable. Now let ψ satisfy μ(ψ) = 1. By Part 1, μ(¬ψ) = 0. Thus ¬ψ is unsatisfiable and so ψ is valid.

9. Suppose that φ is valid. Then (φ∧ψ) ↔ ψ is valid, so by Part 5, μ(φ|ψ) = μ(φ∧ψ)/μ(ψ) = μ(ψ)/μ(ψ) = 1.

Suppose that ¬(φ ∧ χ) is valid. Then

 μ(φ∨χ|ψ) = μ((φ∨χ)∧ψ) / μ(ψ) = μ((φ∧ψ)∨(χ∧ψ)) / μ(ψ) = [μ(φ∧ψ)+μ(χ∧ψ)] / μ(ψ) [¬((φ∧ψ)∧(χ∧ψ)) is valid] = μ(φ|ψ)+μ(χ|ψ).

Thus μ(·|ψ) is a probability.

10. Let χ = ψ ∧ ¬φ. Then

 μ(φ∨ψ)+μ(φ∧ψ) = μ(φ∨χ)+μ(φ∧ψ) [elementary logic] = μ(φ)+μ(χ)+μ(φ∧ψ) [¬(φ∧χ) is valid and Def. 3.2] = μ(φ)+μ(χ∨(φ∧ψ)) [¬(χ∧(φ∧ψ)) is valid and Def. 3.2] = μ(φ)+μ(ψ) [elementary logic]

Next we introduce Gaifman probabilities.

###### Definition (Gaifman probability)

Let μ be a probability on sentences. Then μ is Gaifman if

 μ(r=s)=inf{t1,...,tn}μ(n⋀i=1((rti)=(sti))),

for every pair r and s of closed terms having the same function type, say α → β, and where {t1,...,tn} ranges over all finite sets of closed terms of type α.

###### Proposition (Gaifman probability)

Let μ be a probability on sentences. Then the following are equivalent.

1. μ is Gaifman.

2. μ(r≠s) = sup{t1,...,tn}μ(⋁ni=1((rti)≠(sti))),
for every pair r and s of closed terms having the same function type, say α → β, and where {t1,...,tn} ranges over all finite sets of closed terms of type α.

3. μ(∃x.φ) = sup{t1,...,tn}μ(⋁ni=1φ{x/ti}),
for every formula φ having a single free variable x of type α, and where {t1,...,tn} ranges over all finite sets of closed terms of type α.

4. μ(∀x.φ) = inf{t1,...,tn}μ(⋀ni=1φ{x/ti}),
for every formula φ having a single free variable x of type α, and where {t1,...,tn} ranges over all finite sets of closed terms of type α.

###### Proof

1. implies 2. Suppose that the probability is Gaifman. Then

 μ(r≠s) = 1−μ(r=s) = 1−inf{t1,...,tn}μ(⋀ni=1((rti)=(sti))) = 1−inf{t1,...,tn}μ(¬⋁ni=1((rti)≠(sti))) = 1−inf{t1,...,tn}(1−μ(⋁ni=1((rti)≠(sti))) = sup{t1,...,tn}μ(⋁ni=1((rti)≠(sti))).

Hence 2. holds.

2. implies 3. Suppose that 2. holds. Then

 μ(∃x.φ) = μ(λx.φ≠λx.F) = sup{t1,...,tn}μ(⋁ni=1((λx.φti)≠(λx.Fti))) = sup{t1,...,tn}μ(⋁ni=1φ{x/ti}).

Hence 3. holds.

3. implies 4. Suppose that 3. holds. Then

 μ(∀x.φ) = μ(¬∃x.¬φ) = 1−μ(∃x.¬φ) = 1−sup{t1,...,tn}μ(⋁ni=1¬φ{x/ti}) = 1−sup{t1,...,tn}μ(¬⋀ni=1φ{x/ti}) = 1−sup{t1,...,tn}(1−μ(⋀ni=1φ{x/ti})) = inf{t1,...,tn}μ(⋀ni=1φ{x/ti}).

Hence 4. holds.

4. implies 1. Suppose that 4. holds. Then

 μ(r=s) = μ(∀x.((rx)=(sx))) [Axioms of Extensionality] = inf{t1,...,tn}μ(⋀ni=1((rx)=(sx)){x/ti}) = inf{t1,...,tn}μ(⋀ni=1((rti)=(sti))).

Hence 1. holds.

###### Proposition (limits for countable alphabet)

Let the alphabet be countable, μ a probability on sentences, and φ a formula having a single free variable x of type α.

1. sup{t′1,...,t′m}μ(⋁mj=1φ{x/t′j}) = limn→∞μ(⋁ni=1φ{x/ti}),

where, on the LHS, {t′1,...,t′m} ranges over all finite sets of closed terms of type α and, on the RHS, t1,t2,... is an enumeration of all closed terms of type α.

2. inf{t′1,...,t′m}μ(⋀mj=1φ{x/t′j}) = limn→∞μ(⋀ni=1φ{x/ti}), with the same ranges as in 1.

###### Proof

Since the alphabet is countable, the set of all closed terms of type α is countable and hence can be enumerated, say as t1,t2,....

1. Let {t′1,...,t′m} be a finite subset of closed terms of type α. Let n be sufficiently large so that each t′j, for j = 1,...,m, appears among the first n terms of the enumeration of all closed terms of type α.

Then ⋁mj=1φ{x/t′j} → ⋁ni=1φ{x/ti} is valid, so that

 μ(⋁mj=1φ{x/t′j})≤μ(⋁ni=1φ{x/ti}),

by Proposition 3.4. By first taking the supremum on the RHS and then the supremum on the LHS we get

 sup{t′1,...,t′m}μ(⋁mj=1φ{x/t′j})≤supnμ(⋁ni=1φ{x/ti}).

Conversely we have

 sup{t′1,...,t′m}μ(⋁mj=1φ{x/t′j})≥μ(⋁ni=1φ{x/ti}).

since the sup on the LHS includes the finite set {t1,...,tn}. Now taking the supremum over n and combining both inequalities gives equality. Proposition 3.4 gives that μ(⋁ni=1φ{x/ti}) ≤ μ(⋁n+1i=1φ{x/ti}); hence μ(⋁ni=1φ{x/ti}) is monotone non-decreasing in n, which allows the replacement of supn by limn→∞.

2. The proof is similar.

We can reduce the class of terms that it is necessary to “browse” through even further, by considering only one term from each equivalence class, where two terms s and t are equivalent iff s = t is valid.

###### Proposition (Gaifman for countable alphabet)

Let the alphabet be countable and μ a probability on sentences. Then the following are equivalent.

1. μ is Gaifman.

2. μ(r=s) = limn→∞μ(⋀ni=1((rti)=(sti))),
for every pair r and s of closed terms having the same function type, say α → β, and where t1,t2,... is an enumeration of all closed terms of type α.

3. μ(r≠s) = limn→∞μ(⋁ni=1((rti)≠(sti))),
for every pair r and s of closed terms having the same function type, say α → β, and where t1,t2,... is an enumeration of all closed terms of type α.

4. μ(∃x.φ) = limn→∞μ(⋁ni=1φ{x/ti}),
for every formula φ having a single free variable x of type α, and where t1,t2,... is an enumeration of all closed terms of type α.

5. μ(∀x.φ) = limn→∞μ(⋀ni=1φ{x/ti}),
for every formula φ having a single free variable x of type α, and where t1,t2,... is an enumeration of all closed terms of type α.

In each case, the enumeration of closed terms of type α can be reduced to one in which a single representative is chosen from each equivalence class under the equivalence relation: s and t are equivalent if s = t is valid.

###### Proof

Two terms s and t are said to be equivalent iff s = t is valid, which implies that φ{x/s} ↔ φ{x/t} is valid. This allows us to relax, in the proof of Proposition 3, ‘appears’ to ‘is equivalent to some term in’ and ‘includes’ to ‘includes a term equivalent to some term in’. Finally, combine this with Proposition 3 and Definition 3.

While these forms of the Gaifman condition closely resemble the continuity condition (countable additivity (CA) axiom) in measure theory, we will see that CA over (general) interpretations is derived from the compactness theorem and not from the Gaifman condition (see Definition 4 and Proposition 4 in the next section). But the Gaifman condition confines probabilities to separating interpretations while preserving CA (Propositions 4 and 4).

###### Example (natural numbers Nat)

Consider the standard type Nat of natural numbers as the type of individuals, and the usual Peano axioms. Let 0 be the constant of type Nat whose denotation is the natural number 0, and the term s⋯s0 (n applications of s) the term of type Nat whose denotation is the natural number n, where s is a constant of type Nat → Nat whose denotation is the successor function. In practice one usually defines denumerably many constants 0, 1, 2, ..., one for each natural number, directly. Further, let +, ×, ... be functions with their usual axioms and meaning. Now there are many closed terms that represent the same natural number. For instance ss0, s1, 2, and 1+1 are different terms, all having the number 2 as denotation. For type Nat, it is sufficient to choose the numerals 0, s0, ss0, ... as the enumeration t1, t2, ... in Proposition 3.4, and so the condition in Definition 3 (indeed) reduces to the one used by Gaifman [GS82].
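The many-terms-one-denotation phenomenon can be illustrated with a tiny evaluator (our own sketch with a hypothetical term encoding, not part of the paper): syntactically distinct closed terms over {0, s, +} collapse to the same numeral, which is why one representative per equivalence class suffices.

```python
# Hypothetical sketch: closed terms over the signature {0, s, +} encoded as
# nested tuples, e.g. ss0 is ('s', ('s', 0)) and 1+1 is ('+', ('s', 0), ('s', 0)).
def denote(term):
    """Evaluate a closed term to its natural-number denotation."""
    if term == 0:
        return 0
    if term[0] == 's':
        return denote(term[1]) + 1
    return denote(term[1]) + denote(term[2])      # term[0] == '+'

# three syntactically distinct terms, all denoting the number 2
terms = [('s', ('s', 0)),                 # ss0
         ('+', ('s', 0), ('s', 0)),       # s0 + s0
         ('s', ('+', 0, ('s', 0)))]       # s(0 + s0)
print({denote(t) for t in terms})  # {2}
```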

Of particular interest are probabilities that are strictly positive on satisfiable sentences since this is a desirable property of a prior. This suggests the following definition.

###### Definition (strongly Cournot probability)

A probability μ is strongly Cournot if, for each φ, φ is satisfiable implies μ(φ) > 0.

By Part 8 of Proposition 3, a probability μ is strongly Cournot iff, for each φ, φ is not valid implies μ(φ) < 1, or, by contraposition, μ(φ) = 1 implies φ is valid. This is akin to Cournot’s principle as discussed in the introduction: an event of probability 1 singled out in advance will happen for sure in the real world. We will see that this general idea plays an important role for inductive inference.

However, the following weaker form of the Cournot principle will turn out to be more useful.

###### Definition (Cournot probability)

A probability μ is Cournot if, for each φ, φ has a separating model implies μ(φ) > 0.

Clearly a strongly Cournot probability is Cournot. It will be the Cournot probabilities (not the strongly Cournot ones) that will be of most interest in the subsequent development. The major reasons for this are as follows. First, Theorem 5 below shows that, if the alphabet is countable, there exists a probability on sentences that is Cournot and Gaifman. Such a probability makes a good prior. Second, the Cournot and Gaifman conditions are necessary and sufficient to do learning in the limit of universal hypotheses as the following theorem shows and as discussed in more detail in Section 8.

###### Theorem (confirming universal hypotheses)

Let the alphabet be countable, μ a probability on sentences, φ a formula having a single free variable x of some type α, and t1,t2,... an enumeration of (representatives of) all closed terms of type α. Then

 μ(∀x.φ|⋀ni=1φ{x/ti})n→∞⟶1 ⇔ μ(⋀ni=1φ{x/ti})n→∞⟶μ(∀x.φ)>0

If the left hand side (hence also the right hand side) holds, we say that μ can confirm the universal hypothesis ∀x.φ. It also holds that

 μ can confirm all universal hypotheses that have a separating model ⇔ μ is Gaifman and Cournot

###### Proof

 limn→∞μ(∀x.φ|⋀ni=1φ{x/ti}) = μ(∀x.φ)/limn→∞μ(⋀ni=1φ{x/ti}) [∀x.φ→⋀ni=1φ{x/ti}] = μ(∀x.φ)/μ(∀x.φ) [μ(⋀ni=1φ{x/ti})n→∞⟶μ(∀x.φ)] = 1 [μ(∀x.φ)>0]

As can be seen from the proof, if one or both of the conditions on the right hand side fail, then μ(∀x.φ|⋀ni=1φ{x/ti}) does not converge to 1.

For the bottom we abbreviate the statements

 L(φ) :=[μ(∀x.φ|⋀ni=1φ{x/ti})n→∞⟶1] G(φ) :=[μ(⋀ni=1φ{x/ti})n→∞⟶μ(∀x.φ)] S(φ) :=[∀x.φ has a separating model] A(φ) :=[μ(∀x.φ)>0]

In this notation, the top reads: L(φ) iff G(φ) and A(φ).

Assume μ is Gaifman and Cournot and S(φ). This implies G(φ) and A(φ). By the top we get L(φ). We have shown that for any φ, if μ is Gaifman and Cournot, then S(φ) implies L(φ).

Case 1 [S(φ) is true] Then by assumption, L(φ). Then by the top we get G(φ) and A(φ). Note that every sentence ψ can be written as ∀x.φ with φ being a formula having at most a single free variable x. Therefore, μ(ψ) > 0 for all ψ that have a separating model. Hence μ is Cournot.
Case 2 [S(φ) is false] That is, ∀x.φ has no separating model, therefore ¬∀x.φ must have (at least one) separating model, say I. Since I is a separating model of ∃x.¬φ, Definition 2 implies that there exists a closed term t such that I is also a separating model of χ := ¬φ{x/t}. Now

 μ(∀x.φ)+μ(χ) = μ(∀x.φ∨χ) [∀x.φ and χ are disjoint] = μ(∀x.(φ∨χ)) [x is not free in χ] = limnμ(⋀ni=1(φ∨χ){x/ti}) [since S(φ∨χ), Case 1 implies G(φ∨χ)] = limnμ(⋀ni=1φ{x/ti}∨χ)) [x is not free in χ] = limnμ(⋀ni=1φ{x/ti})+μ(χ) [t=ti for some i, and φ{x/t}∧χ false]

This proves G(φ) for S(φ) false.

Cases 1 and 2 together prove G(φ) for all φ, hence μ is Gaifman.
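The confirmation phenomenon in the theorem above can be illustrated numerically with a toy prior (our own choice of numbers, not from the paper): mix a point mass a on “all instances of φ are true” with an i.i.d. fair-coin model for the instances. Then μ(⋀ni=1φ{x/ti}) = a + (1−a)2⁻ⁿ → μ(∀x.φ) = a > 0, and the conditional probability of the universal hypothesis converges to 1:

```python
# Hypothetical toy prior: with probability a the universal hypothesis holds
# (every instance true); otherwise instances are i.i.d. true with chance 1/2.
a = 0.1                                    # prior mu(forall x. phi)

def mu_conj(n):
    """mu of the conjunction of the first n instances."""
    return a + (1 - a) * 0.5 ** n

def posterior(n):
    """mu(forall x. phi | first n instances), by the conditional probability."""
    return a / mu_conj(n)

for n in (0, 5, 20, 60):
    print(n, round(posterior(n), 6))
# the posterior climbs towards 1 while mu_conj(n) falls towards a
```

Note that if a = 0 (a dogmatic prior, violating Cournot), the posterior stays 0 for every n, which is exactly why non-dogmatism is needed for confirming universal hypotheses.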

## 4 Probabilities on Interpretations

We now study probabilities defined on sets of interpretations.

Consider the set I of all interpretations (for the alphabet). A Borel σ-algebra can be defined on I. For that, a topology needs to be defined first. Given some alphabet, let S denote the set of sentences based on the alphabet. For each sentence φ, let [φ] denote the set

 {I∈I|φ is valid in I}.

Consider the set B = {[φ] | φ ∈ S}. Since B is closed under finite intersections, it is a basis for a topology on I. B is also an algebra, since it is closed under complementation and finite unions, and