Asymptotic Logical Uncertainty and The Benford Test

10/12/2015
by Scott Garrabrant, et al.

We give an algorithm A which assigns probabilities to logical sentences. For any simple infinite sequence of sentences whose truth-values appear indistinguishable from a biased coin that outputs "true" with probability p, we have that the sequence of probabilities that A assigns to these sentences converges to p.



1 Introduction

Let $\phi_1, \phi_2, \ldots$ be a simple enumeration of all sentences in first order logic over ZFC. The goal of logical uncertainty is to construct an algorithm $M$ which on input $n$ outputs a probability $M(n)$, which represents the probability that $\phi_n$ is true [Christiano:2014a; Demski:2012; Gaifman:2004; Soares:2015g].¹

¹The problem has also been studied in the case where we don't require computability even in the limit [Gaifman:1964; Hutter:2013; Scott:1966]. The problem was first studied in the context of measures on Boolean algebras [Horn:1948; Kelley:1959; Maharam:1947].

This notion of probability does not refer to random variables. It refers to the degree of uncertainty that one might have about logical sentences whose truth-values have not been calculated.

Much work has been done on a related problem where $M$ on input $n$ outputs an infinite sequence of numbers, and $M(n)$ is defined to be the limit of the sequence output by $M$ on input $n$ [Christiano:2014a; Demski:2012; Soares:2015]. In this case, $M(n)$ is not computable, and can easily be 1 for all provable $\phi_n$ and 0 for all disprovable $\phi_n$, so all of the work is in figuring out how $M$ should behave when $\phi_n$ is independent of ZFC.

In this paper, we take a different approach, which we call asymptotic logical uncertainty. We require that $M$ be computable and have runtime bounded by some function of $n$.

We propose as a baseline that any method of quickly assigning probabilities should be able to pass a test we call the Benford test. Consider the infinite sequence of sentences $\phi_{s(1)}, \phi_{s(2)}, \ldots$, where $\phi_{s(n)}$ is the sentence "The first digit of $3\uparrow^n 3$ is a 1." We say that $M$ passes the Benford test if

$$\lim_{n \to \infty} M(s(n)) = \log_{10} 2 \approx 0.30103,$$

as prescribed by Benford's law. More generally, we say that $M$ passes the generalized Benford test if it converges to the correct probability on any similar infinite sequence whose truth-values appear indistinguishable from independent flips of a biased coin. We then give an algorithm $A$ which passes the generalized Benford test.

Logical uncertainty is one aspect of the problem of combining probability and logic, of which statistical relational learning is another [Getoor:2007]. Statistical relational learning addresses the problem of representing probabilistic models with logical structure, including regularities such as repeated entities and other complexities such as uncertainty about the number of entities. In contrast, logical uncertainty deals with uncertainty about logic. As Paul Christiano put it: "any realistic agent is necessarily uncertain not only about its environment or about the future, but also about the logically necessary consequences of its beliefs" [Christiano:2014a].

2 The Benford Test

Benford's law states that in naturally occurring numbers, the leading digit $d \in \{1, \ldots, 9\}$ of that number in base 10 occurs with probability $\log_{10}(1 + \frac{1}{d})$. Many mathematical sequences have been shown to have frequencies of first digits that satisfy Benford's law [Pietronero:2001]. In particular, the frequencies of the first digits of powers of 3 provably satisfy Benford's law.
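As a quick empirical check (our illustration, not part of the paper), the following Python script tabulates the leading digits of the first 5000 powers of 3 and compares the observed frequencies with $\log_{10}(1 + \frac{1}{d})$:

```python
import math
from collections import Counter

# Count leading digits of 3^1, ..., 3^5000 using exact integer arithmetic.
counts = Counter(int(str(3 ** k)[0]) for k in range(1, 5001))

print("d  observed  Benford")
for d in range(1, 10):
    benford = math.log10(1 + 1 / d)
    print(f"{d}  {counts[d] / 5000:.5f}   {benford:.5f}")
```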

The function $3\uparrow^n 3$ is defined by $3\uparrow^1 3 = 3^3 = 27$ and $3\uparrow^{n+1} 3 = 3\uparrow^n (3\uparrow^n 3)$, in Knuth's up-arrow notation. Throughout the paper, let $T(n)$ be an increasing time complexity function in the range of $\Omega(n^k)$ for some fixed $k$, and let $L(n) = \lfloor \log_2 n \rfloor$.

Consider the sequence $\{3\uparrow^n 3\}_{n \in \mathbb{N}}$. Clearly this sequence only contains powers of 3. We might hypothesize that the frequencies of the first digits in this sequence also satisfy Benford's law. However, $3\uparrow^n 3$ is very large, and its first digit is probably very difficult to compute; for even moderately large $n$, it is unlikely that the first digit of $3\uparrow^n 3$ will ever be known.
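To make the notation concrete, here is a direct Python transcription of the up-arrow recursion (a sketch of ours; it is evaluable only for the very smallest arguments, which is exactly the point):

```python
def up(a: int, n: int, b: int) -> int:
    """Knuth's up-arrow: a ↑^n b."""
    if n == 1:
        return a ** b
    if b == 1:
        return a
    return up(a, n - 1, up(a, n, b - 1))

print(up(3, 1, 3))  # 3 ↑ 3  = 27
print(up(3, 2, 3))  # 3 ↑↑ 3 = 3^27 = 7625597484987
# up(3, 3, 3) is a power tower of 7625597484987 threes; it cannot be computed.
```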

If asked to quickly assign a probability to the sentence $\phi_{s(n)} =$ "The first digit of $3\uparrow^n 3$ is a 1," for some large $n$, the only reasonable answer would be $\log_{10} 2$. Note that $\phi_{s(n)}$ is either true or false; there are no random variables. The probability here represents a reasonable guess in the absence of enough time or resources to compute $3\uparrow^n 3$.

Definition 2.1.

Let $M$ be a Turing machine which on input $n$ runs in time $T(n)$ and outputs a probability $M(n)$, which represents the probability assigned to $\phi_n$. We say that $M$ passes the Benford test if

$$\lim_{n \to \infty} M(s(n)) = \log_{10} 2,$$

where $\phi_{s(n)} =$ "The first digit of $3\uparrow^n 3$ is a 1."

It is easy to pass the Benford test by hard-coding in the probability. It is more difficult to pass the Benford test in a natural way. That the best probability to assign to $\phi_{s(n)}$ is $\log_{10} 2$ depends not only on the fact that the frequency with which $\phi_{s(n)}$ is true tends toward $\log_{10} 2$, but also on the fact that the sequence of truth-values of $\phi_{s(n)}$ contains no patterns that can be used to quickly compute a better probability on some subsequence. We therefore assume that this sequence of truth-values is indistinguishable from a sequence produced by a coin that outputs "true" with probability $\log_{10} 2$. Formally, we are assuming that $\{s(n) \mid n \in \mathbb{N}\}$ is an irreducible pattern with probability $\log_{10} 2$, as defined in the next section.
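To see why limiting frequency alone is not enough, consider a toy truth-value sequence (our own construction, not from the paper) whose overall frequency of "true" tends to $\log_{10} 2$ but which is always false on the easily recognized subsequence of even indices; a fast machine that outputs 0 on even indices is better calibrated there than the constant $\log_{10} 2$, so such a sequence would not be an irreducible pattern:

```python
import math
import random

random.seed(0)
p = math.log10(2)

# Toy sequence: overall frequency of True tends to p, but every even
# index is False -- a pattern a fast machine can exploit.
seq = [(i % 2 == 1) and (random.random() < 2 * p) for i in range(100000)]

overall = sum(seq) / len(seq)
evens = [seq[i] for i in range(0, len(seq), 2)]
print(f"overall frequency: {overall:.4f} (target {p:.4f})")
print(f"frequency on even indices: {sum(evens) / len(evens):.4f} (exploitable)")
```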

3 Irreducible Patterns

Fix a universal Turing machine $U$ and an encoding scheme for machines, and let $U(M, x)$ denote running the machine $U$ to simulate $M$ with input $x$.

Definition 3.1.²

Let $S \subseteq \mathbb{N}$ be an infinite subset of natural numbers such that $\phi_n$ is provable or disprovable for all $n \in S$, and there exists a Turing machine $Z$ such that $U(Z, n)$ runs in time $T(n)$ and accepts $n$ if and only if $n \in S$.

We say that $S$ is an irreducible pattern with probability $p$ if there exists a constant $c$ such that for every positive integer $m \geq 3$ and every Turing machine $W$ expressible in $K(W)$ bits, if

$$S_W = \{ n \in S \mid U(W, n) \text{ accepts in time } T(n) \}$$

has at least $m$ elements and $r(m, W)$ is the probability that $\phi_n$ is provable when $n$ is chosen uniformly at random from the first $m$ elements of $S_W$, we have

$$\left| r(m, W) - p \right| \leq \frac{c \, K(W) \sqrt{\log\log m}}{\sqrt{m}}.$$

²We tailored this definition of irreducible pattern to our needs. The theory of algorithmic randomness may offer alternatives. However, algorithmic randomness generally considers all computable tests and focuses on the case where $p = \frac{1}{2}$ [Ko:1986; Martin-Lof:1966; Downey:2010]. We believe that any reasonable definition inspired by algorithmic randomness would imply Definition 3.1.

The intuition behind the formula is that the observed frequency of truth on any quickly computable subsequence should not stray far from $p$. The right-hand side of the inequality needs to shrink slowly enough that a truly random process would stay within it with probability 1 (given a choice of $c$ sufficiently large to accommodate initial variation). The law of the iterated logarithm gives such a formula, which is also tight in the sense that we cannot replace it with a formula which diminishes more quickly as a function of $m$.
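The following simulation (ours, for illustration) flips a coin of bias $p = \log_{10} 2$ and tracks the normalized deviation $|m\,r(m) - mp| / \sqrt{m \log\log m}$; by the law of the iterated logarithm it should stay bounded, which is why the bound in Definition 3.1 can shrink at this rate but no faster:

```python
import math
import random

random.seed(1)
p = math.log10(2)

heads = 0
worst = 0.0
for m in range(1, 10**6 + 1):
    heads += random.random() < p
    if m >= 16:  # log log m is only sensible once log m > 1
        dev = abs(heads - m * p) / math.sqrt(m * math.log(math.log(m)))
        worst = max(worst, dev)

print(f"sup of normalized deviation over 10^6 flips: {worst:.3f}")
```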

Proposition 3.2.

If we replace provability in Definition 3.1 with a random process, such that for each $n \in S$ the sentence $\phi_n$ is independently called "provable" with probability $p$, then $S$ would almost surely be an irreducible pattern with probability $p$.

Proof.

Fix a Turing machine $W$. By the law of the iterated logarithm, there exists a constant $c_0$ such that

$$\limsup_{m \to \infty} \frac{\left| m\,r(m, W) - mp \right|}{\sqrt{m \log\log m}} = c_0$$

almost surely. Therefore

$$\sup_m \frac{\left| m\,r(m, W) - mp \right|}{\sqrt{m \log\log m}} < \infty$$

almost surely. We will use $X_W$ as a shorthand for this supremum. For any $\varepsilon > 0$, there therefore exists a $c_1$ such that $\mathbb{P}(X_W \geq c_1) \leq \varepsilon$.

We now show that

$$\mathbb{P}(X_W \geq k c_1) \leq \varepsilon^k.$$

By the chain rule for probabilities, it suffices to show that

$$\mathbb{P}\left( X_W \geq (k+1) c_1 \;\middle|\; X_W \geq k c_1 \right) \leq \varepsilon.$$

Assume $X_W \geq k c_1$, and let $m_0$ be the first $m$ such that

$$\frac{\left| m\,r(m, W) - mp \right|}{\sqrt{m \log\log m}} \geq k c_1.$$

It suffices to show that the probability that there exists an $m > m_0$ with

$$\frac{\left| m\,r(m, W) - mp \right|}{\sqrt{m \log\log m}} \geq (k + 1) c_1$$

is at most $\varepsilon$.

Observe that

$$\frac{\left| m\,r(m, W) - mp \right|}{\sqrt{m \log\log m}} \leq \frac{\left| m_0\,r(m_0, W) - m_0 p \right|}{\sqrt{m \log\log m}} + \frac{\left| \big(m\,r(m, W) - m_0\,r(m_0, W)\big) - (m - m_0)p \right|}{\sqrt{m \log\log m}},$$

where the first summand is at most $k c_1$, and that the probability that there exists an $m > m_0$ with

$$\frac{\left| \big(m\,r(m, W) - m_0\,r(m_0, W)\big) - (m - m_0)p \right|}{\sqrt{m \log\log m}} \geq c_1$$

is the same as the probability that $X_W \geq c_1$ for the process restarted at $m_0$, which is at most $\varepsilon$.

We have thus shown that for every $\varepsilon > 0$ there exists a constant $c_1$ such that the probability that $X_W \geq k c_1$ is at most $\varepsilon^k$.

Partition the set of all Turing machines into sets $B_1, B_2, \ldots$ such that $B_k$ contains all Turing machines expressible in $k$ bits but not in fewer. The probability that a Turing machine $W \in B_k$ violates

$$\left| r(m, W) - p \right| \leq \frac{c_1 K(W) \sqrt{\log\log m}}{\sqrt{m}} \qquad (*)$$

for any $m$ is at most $\varepsilon^k$. The number of Turing machines in $B_k$ is at most $2^k$, so the probability that there is any $W \in B_k$ and $m$ which violate $(*)$ is at most $2^k \varepsilon^k$. Therefore, the probability that there is any Turing machine $W$ and $m$ which violate $(*)$ is at most

$$\sum_{k = 1}^{\infty} 2^k \varepsilon^k = \frac{2\varepsilon}{1 - 2\varepsilon}.$$

For small enough $\varepsilon$ this goes to 0, so the probability that $(*)$ holds for all $W$ and $m$ goes to 1 as $\varepsilon \to 0$. Therefore, with probability 1, there exists a $c$ such that

$$\left| r(m, W) - p \right| \leq \frac{c\,K(W) \sqrt{\log\log m}}{\sqrt{m}}$$

for all $W$ and $m$. ∎

We now use the concept of irreducible patterns to generalize the Benford test.

Definition 3.3.

Let $M$ be a Turing machine which on input $n$ runs in time $T(n)$ and outputs a probability $M(n)$, which represents the probability assigned to $\phi_n$. We say that $M$ passes the generalized Benford test if

$$\lim_{\substack{n \to \infty \\ n \in S}} M(n) = p$$

whenever $S$ is an irreducible pattern with probability $p$.

Note that if we conjecture that the set $\{s(n) \mid n \in \mathbb{N}\}$ from Definition 2.1 is an irreducible pattern with probability $\log_{10} 2$, then any $M$ which passes the generalized Benford test also passes the Benford test.

4 A Learning Algorithm

We now introduce an algorithm $A$ that passes the generalized Benford test (see Algorithm 1).

Let $N$ be the Turing machine which accepts on input $n$ if ZFC proves $\phi_n$, rejects on input $n$ if ZFC disproves $\phi_n$, and otherwise does not halt. For convenience, in Algorithm 1, we define $0/0 = 0$ and $\sqrt{\log\log m} = 1$ for $m < 3$.

1: $P \leftarrow 1/2$
2: $B \leftarrow \infty$
3: for $P' \in Q_n$ do
4:     $d_W \leftarrow 0$
5:     for $W$ a Turing machine expressible in $L(n)$ bits do
6:         $d_Z \leftarrow \infty$
7:         for $Z$ a Turing machine expressible in $L(n)$ bits do
8:             if $U(W, n)$ and $U(Z, n)$ both accept $n$ in time $T(n)$ then
9:                 $k \leftarrow 0$
10:                $a \leftarrow 0$
11:                $m \leftarrow 0$
12:                while $k < n$ do
13:                    if $U(W, k)$ and $U(Z, k)$ both accept $k$ in time $T(k)$ then
14:                        if $U(N, k)$ accepts $k$ in time $T(n)$ then
15:                            $a \leftarrow a + 1$; $m \leftarrow m + 1$
16:                        else if $U(N, k)$ rejects $k$ in time $T(n)$ then
17:                            $m \leftarrow m + 1$
18:                        else
19:                            break
20:                    $k \leftarrow k + 1$
21:                $F \leftarrow a / m$
22:                $E \leftarrow |P' - F|\sqrt{m} \,/\, \big(K(W, Z)\sqrt{\log\log m}\big)$
23:                if $E < d_Z$ then
24:                    $d_Z \leftarrow E$
25:        if $d_Z \neq \infty$ and $d_Z > d_W$ then
26:            $d_W \leftarrow d_Z$
27:    if $d_W < B$ then
28:        $B \leftarrow d_W$
29:        $P \leftarrow P'$
30: return $P$

Algorithm 1: $A(n)$

Let $\mathcal{T}_n$ be the set of all Turing machines expressible in at most $L(n)$ bits such that $U(W, n)$ accepts $n$ in time at most $T(n)$. The encoding of Turing machines must be prefix-free, which in particular means that no Turing machine is encoded in 0 bits. Let $Q_n$ denote the set of rational numbers of the form $\frac{a}{b}$ with $0 \leq a \leq b \leq n$.

For $W$ and $Z$ Turing machines, let $K(W, Z)$ be the number of bits necessary to encode both $W$ and $Z$. Let $S_{W,Z}$ be the subset of natural numbers $k$ which are accepted by both $U(W, k)$ and $U(Z, k)$ in time at most $T(k)$. Let $m(W, Z, n)$ be the greatest number less than or equal to the number of elements of $S_{W,Z}$ below $n$ such that for every $k$ among the first $m(W, Z, n)$ elements of $S_{W,Z}$, $U(N, k)$ halts in time $T(n)$. Let $F(W, Z, n)$ be the proportion of the first $m(W, Z, n)$ elements of $S_{W,Z}$ which $N$ accepts. Let

$$E_n(P, W, Z) = \frac{\left| P - F(W, Z, n) \right| \sqrt{m(W, Z, n)}}{K(W, Z)\sqrt{\log\log m(W, Z, n)}}.$$

Lemma 4.1.

The output of $A$ on input $n$ is in

$$\underset{P \in Q_n}{\operatorname{arg\,min}}\ \max_{W \in \mathcal{T}_n}\ \min_{Z \in \mathcal{T}_n}\ E_n(P, W, Z).$$

Proof.

The algorithm has three for loops, the outer ranging over $P' \in Q_n$ and the inner two ranging over $W$ and $Z$ respectively, both restricted to Turing machines expressible in $L(n)$ bits. The condition on line 8 means that $W$ and $Z$ effectively range over all Turing machines in $\mathcal{T}_n$, while $P'$ ranges over $Q_n$.

The inner while loop increments $k$ a total of exactly $n$ times unless it exits early on line 19. Thus, when the loop exits, $m$ holds $m(W, Z, n)$, and $F$ is set to $F(W, Z, n)$ in line 21. Therefore, the expression computed on line 22 and compared on lines 23 and 24 is

$$\frac{\left| P' - F(W, Z, n) \right| \sqrt{m(W, Z, n)}}{K(W, Z)\sqrt{\log\log m(W, Z, n)}} = E_n(P', W, Z).$$

Considering the for loops from inner to outer, we minimize this quantity in $Z$, maximize it in $W$, and find the $P'$ of the form $\frac{a}{b}$ with $0 \leq a \leq b \leq n$ minimizing the whole quantity. The returned $P$ is therefore a minimizer of

$$\max_{W \in \mathcal{T}_n}\ \min_{Z \in \mathcal{T}_n}\ E_n(P, W, Z). \qquad ∎$$
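The following toy Python sketch (our illustration; the predicate functions, the pseudo-random stand-in for the prover $N$, and the crude description lengths are all assumptions, not the paper's construction) mirrors the argmin–max–min structure of Lemma 4.1 on a finite universe:

```python
import math
import random
from fractions import Fraction

N = 20000

# Stand-ins for U(W, k) / U(Z, k): fast predicates playing the roles of
# the Turing machines W and Z.
machines = {
    "all":  lambda k: True,
    "even": lambda k: k % 2 == 0,
    "mod3": lambda k: k % 3 == 0,
}
K = {name: 8 * len(name) for name in machines}  # crude description lengths

# Toy oracle in place of the prover N: pseudo-random truth-values with
# p = 0.3, so every quickly selected subsequence has frequency near 0.3.
truth = [random.Random(k).random() < 0.3 for k in range(N)]

# Precompute (F, m, K) per machine pair, as on lines 8-22 of Algorithm 1.
stats = {}
for W, fw in machines.items():
    for Z, fz in machines.items():
        s = [k for k in range(N) if fw(k) and fz(k)]
        m = len(s)
        F = sum(truth[k] for k in s) / m
        stats[W, Z] = (F, m, K[W] + K[Z])

def score(P, W, Z):
    """|P - F| * sqrt(m) / (K(W,Z) * sqrt(log log m)), lines 23-24."""
    F, m, bits = stats[W, Z]
    return abs(P - F) * math.sqrt(m) / (bits * math.sqrt(math.log(math.log(m))))

# argmin over P of max over W of min over Z, as in Lemma 4.1.
Q = [Fraction(a, b) for b in range(1, 25) for a in range(b + 1)]
best = min(Q, key=lambda P: max(min(score(P, W, Z) for Z in machines)
                                for W in machines))
print(best)
```

Because the toy truth-values have limiting frequency 0.3 on every quickly recognizable subsequence, the normalized deviation stays bounded for $P$ near $3/10$ and grows without bound for any other choice, so the minimizer settles there as $N$ grows.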

The code is not optimized for computational efficiency. The following proposition is just to ensure that the runtime is not far off from $T(n)$.

Proposition 4.2.

The runtime of $A$ on input $n$ is in $O\!\left(n^5\,T(n) \log T(n)\right)$.

Proof.

Simulating $U$ on any input for $T(n)$ time steps can be done in time $c\,T(n)\log T(n)$ for some fixed constant $c$ [Hennie:1966]. The bulk of the runtime comes from simulating Turing machines on lines 8, 13, 14, and 16. Each of these lines takes at most $O(T(n)\log T(n))$ time, and we enter each of these lines at most $O(n^5)$ times, since $|Q_n| \leq (n+1)^2$, there are at most $2n$ machines expressible in $L(n)$ bits for each of $W$ and $Z$, and the while loop runs at most $n$ times. Therefore, the program runs in time $O(n^5\,T(n)\log T(n))$. ∎

5 Passing the Generalized Benford Test

We are now ready to show that $A$ passes the generalized Benford test. The proof will use the following two lemmas.

Lemma 5.1.

Let $S$ be an irreducible pattern with probability $p$, and let $Z_S$ be a Turing machine such that $U(Z_S, n)$ accepts $n$ in time $T(n)$ if and only if $n \in S$.

There exists a constant $c$ such that if $n \in S$ is sufficiently large, then there exists a $P \in Q_n$ such that

$$\max_{W \in \mathcal{T}_n}\ \min_{Z \in \mathcal{T}_n}\ E_n(P, W, Z) \leq c.$$

Proof.

Let $P$ be an element of $Q_n$ with $|P - p| \leq \frac{1}{n}$. From the definition of irreducible pattern, we have that there exists a $c'$ such that for all $W$,

$$\left| r(m, W) - p \right| \leq \frac{c' K(W)\sqrt{\log\log m}}{\sqrt{m}}.$$

Clearly, for $n \in S$ sufficiently large we have $Z_S \in \mathcal{T}_n$, so

$$\min_{Z \in \mathcal{T}_n} E_n(P, W, Z) \leq E_n(P, W, Z_S).$$

Setting $m = m(W, Z_S, n)$ and noting that $F(W, Z_S, n) = r(m, W)$, we get

$$E_n(P, W, Z_S) \leq \frac{|P - p|\sqrt{m}}{K(W, Z_S)\sqrt{\log\log m}} + \frac{\left| r(m, W) - p \right|\sqrt{m}}{K(W, Z_S)\sqrt{\log\log m}},$$

so, since $K(W, Z_S) \geq K(W)$ and $K(W, Z_S) \geq 1$,

$$E_n(P, W, Z_S) \leq \frac{|P - p|\sqrt{m}}{\sqrt{\log\log m}} + c'.$$

Clearly, $m \leq n$, so $\frac{|P - p|\sqrt{m}}{\sqrt{\log\log m}} \leq 1$ for all sufficiently large $n$. Therefore,

$$\max_{W \in \mathcal{T}_n}\ \min_{Z \in \mathcal{T}_n}\ E_n(P, W, Z) \leq c' + 1. \qquad ∎$$

Lemma 5.2.

Let $S$ be an irreducible pattern with probability $p$, and let $Z_S$ be a Turing machine such that $U(Z_S, n)$ accepts $n$ in time $T(n)$ if and only if $n \in S$.

For all $c$, for all $\varepsilon > 0$, for all sufficiently large $n$, for all $P \in Q_n$, if $n \in S$, and

$$\max_{W \in \mathcal{T}_n}\ \min_{Z \in \mathcal{T}_n}\ E_n(P, W, Z) \leq c,$$

then $|P - p| \leq \varepsilon$.

Proof.

Fix a $c$ and an $\varepsilon > 0$. It suffices to show that for all sufficiently large $n$, if $n \in S$ and $|P - p| > \varepsilon$, then for every $Z \in \mathcal{T}_n$ we have

$$E_n(P, Z_S, Z) > c,$$

since taking $W = Z_S$ in the maximum then contradicts the hypothesis.

Observe that since every $Z \in \mathcal{T}_n$ must accept $n$ itself in time $T(n)$, this claim trivially holds, once $n \in S$ is large enough, for any fixed $Z$ with $S_{Z_S, Z}$ finite. Therefore we only have to check the claim for the finitely many Turing machines expressible in fewer than $L(n)$ bits for which $S_{Z_S, Z}$ is infinite.

Fix an arbitrary such $Z$. Since $S$ is an irreducible pattern, there exists a $c'$ such that

$$\left| F(Z_S, Z, n) - p \right| \leq \frac{c' K(Z_S, Z)\sqrt{\log\log m(Z_S, Z, n)}}{\sqrt{m(Z_S, Z, n)}}.$$

We may assume that $S_{Z_S, Z}$ is infinite, since otherwise, if we take $n$ large enough, $Z \notin \mathcal{T}_n$. Thus, by taking $n$ sufficiently large, we can get $m(Z_S, Z, n)$ sufficiently large, and in particular satisfy

$$\frac{\varepsilon\,\sqrt{m(Z_S, Z, n)}}{K(Z_S, Z)\sqrt{\log\log m(Z_S, Z, n)}} > c + c'.$$

Take $n$ large enough that this holds for each such $Z$, and assume $|P - p| > \varepsilon$. By the triangle inequality, we have

$$\left| P - F(Z_S, Z, n) \right| \geq |P - p| - \left| F(Z_S, Z, n) - p \right| > \varepsilon - \frac{c' K(Z_S, Z)\sqrt{\log\log m(Z_S, Z, n)}}{\sqrt{m(Z_S, Z, n)}}.$$

Therefore

$$E_n(P, Z_S, Z) > \frac{\varepsilon\,\sqrt{m(Z_S, Z, n)}}{K(Z_S, Z)\sqrt{\log\log m(Z_S, Z, n)}} - c' > c,$$

which proves the claim. ∎

Theorem 5.3.

$A$ passes the generalized Benford test.

Proof.

Let $S$ be an irreducible pattern with probability $p$. We must show that

$$\lim_{\substack{n \to \infty \\ n \in S}} A(n) = p.$$

Let $Z_S$ be a Turing machine such that $U(Z_S, n)$ accepts $n$ in time $T(n)$ if and only if $n \in S$.

By considering the case when $Z = Z_S$, Lemma 5.1 implies that there exists a constant $c$ such that for all sufficiently large $n \in S$, there exists a $P \in Q_n$ such that

$$\max_{W \in \mathcal{T}_n}\ \min_{Z \in \mathcal{T}_n}\ E_n(P, W, Z) \leq c.$$

Similarly, using this value of $c$, and considering the case where $W = Z_S$, Lemma 5.2 implies that for all $\varepsilon > 0$, for all sufficiently large $n \in S$, for all $P \in Q_n$, if

$$\max_{W \in \mathcal{T}_n}\ \min_{Z \in \mathcal{T}_n}\ E_n(P, W, Z) \leq c,$$

then $|P - p| \leq \varepsilon$.

Combining these, we get that for all $\varepsilon > 0$, for all sufficiently large $n \in S$, if $P$ is in

$$\underset{P' \in Q_n}{\operatorname{arg\,min}}\ \max_{W \in \mathcal{T}_n}\ \min_{Z \in \mathcal{T}_n}\ E_n(P', W, Z),$$

then $|P - p| \leq \varepsilon$. Thus, by Lemma 4.1, we get that for all $\varepsilon > 0$, for all sufficiently large $n \in S$, $|A(n) - p| \leq \varepsilon$, so

$$\lim_{\substack{n \to \infty \\ n \in S}} A(n) = p. \qquad ∎$$

6 Final Remarks

Definition 6.1.

Given a sentence $\phi$, consider the infinite sequence of integers $\{n_i\}$ given by $\phi_{n_1} = \phi$ and $\phi_{n_{i+1}} = \phi_{n_i} \wedge \phi$. If a machine $M$ satisfies

$$\lim_{i \to \infty} M(n_i) = x,$$

we say that $M$ converges to $x$ on $\phi$.

Corollary 6.2.

If $\phi$ is provable, then $A$ converges to 1 on $\phi$. If $\phi$ is disprovable, then $A$ converges to 0 on $\phi$.

Proof.

If $\phi$ is provable, then $\{n_i\}$ is an irreducible pattern with probability 1. If $\phi$ is disprovable, then $\{n_i\}$ is an irreducible pattern with probability 0. ∎

If $\phi$ is neither provable nor disprovable, then it is not clear whether or not $A$ even converges on $\phi$.

Question 6.3.

Does there exist a machine $M$ such that $M$ passes the generalized Benford test, and for each sentence $\phi$, there exists an $x$ such that $M$ converges to $x$ on $\phi$?

Definition 6.4.

A function $\mathbb{P}$ from logical sentences to $[0, 1]$ is called coherent if it satisfies the following three properties:

  1. $\mathbb{P}(\phi) = 1$ for all provable $\phi$,

  2. $\mathbb{P}(\phi) = 0$ for all disprovable $\phi$, and

  3. $\mathbb{P}(\phi) = \mathbb{P}(\phi \wedge \psi) + \mathbb{P}(\phi \wedge \neg\psi)$ for all $\phi$ and $\psi$.

Coherent functions correspond to probability distributions on the space of complete extensions of a given theory.
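As a minimal sketch of what Definition 6.4 demands (our illustration on a toy propositional language with two atoms, not part of the paper), one can realize a coherent $\mathbb{P}$ as a mixture over complete truth assignments and check property 3 mechanically:

```python
from itertools import product

ATOMS = ["a", "b"]

# A probability distribution over the four complete extensions (worlds).
worlds = list(product([False, True], repeat=len(ATOMS)))
weight = {w: x for w, x in zip(worlds, [0.1, 0.2, 0.3, 0.4])}

def prob(sentence):
    """P(phi) = total weight of the worlds in which phi holds."""
    return sum(wt for w, wt in weight.items()
               if sentence(dict(zip(ATOMS, w))))

phi = lambda v: v["a"]
psi = lambda v: v["b"]

# Property 3: P(phi) = P(phi and psi) + P(phi and not psi).
lhs = prob(phi)
rhs = prob(lambda v: phi(v) and psi(v)) + prob(lambda v: phi(v) and not psi(v))
print(abs(lhs - rhs) < 1e-12)  # True
```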

Question 6.5.

Does there exist a machine $M$ and a coherent function $\mathbb{P}$ such that $M$ passes the generalized Benford test, and for each sentence $\phi$, $M$ converges to $\mathbb{P}(\phi)$ on $\phi$?