In this article, we study induction, and in particular Nicod’s Condition (NC), from a Bayesian (in the sense of subjective probability) point of view. Rules of induction can be thought of as restrictions on the class of probability measures (or equivalently, on the class of rational agents). [Footnote 1: The term rational agent (that is, the performer of induction), as used in fields such as decision theory and artificial intelligence [RN03], by definition refers to an agent that satisfies certain consistency axioms (see [Sav54]). Representation theorems show that this implies that the agent has a probability distribution representing a priori beliefs about what will happen and utilities for the various possible outcomes, such that decisions can be explained as maximizing expected utility. Given utilities, we can infer the existence of a probability distribution from simpler axioms [SH11].]
The question is: “How can we agree that any particular rule of induction is plausible, generally entails sensible consequences, and therefore should be accepted a priori?” We are interested in a specific rule, namely NC, which informally states that “A proposition of the form All F are G is supported by the observation that a particular object is both F and G” [Hem45]. Does the fact that NC does not seem counterintuitive suffice to persuade us that it is a plausible rule of induction? How can we be sure that it does not violate other intuitively acceptable rules and principles? As the notorious raven paradox [Hem45] shows, NC actually does entail counterintuitive consequences, and more than seven decades of discussion about this paradox show that the assessment of rules of induction can be extremely problematic.
A summary of the raven paradox is as follows: The hypothesis := “All ravens are black” is logically equivalent to := “Everything that is not black is not a raven”. A green apple is neither black nor a raven; therefore, according to NC, its observation should confirm . But is logically equivalent to , so we end up with a Paradoxical Conclusion (PC): an observation of a green apple confirms that all ravens are black, which is counterintuitive. In order to resolve the paradox, either it should be shown that NC is not a plausible rule of induction, or it should be claimed that PC holds and should not be considered counterintuitive. [Footnote 2: Some authors have even denied the equivalence of and [SG72].]
In order to study the paradox from a Bayesian perspective, we first distinguish between (objective) background knowledge, by which we exclusively refer to knowledge that can be represented (and consequently can be thought of) as previously observed events, and any other kind of information, which we consider subjective and a property of the a priori chosen probability measure (i.e. the initial degrees of belief). The cogency of inductive rules is significantly affected by the given background knowledge and the chosen measure. For example, it is already known that relative to some background knowledge, NC violates intuition (e.g. see [Goo67]). In Section 2.1, we argue that relative to unrestricted background knowledge, not only NC but any rule of induction can be refuted. Hempel himself believed that NC and the raven paradox should be considered in the context of absolutely no background information [Hem67]. From a Bayesian perspective, however, this does not solve the problem. The reason is that background knowledge and priors are convertible (in the sense that they can produce the same effects). For example, if we are not allowed to consider NC in a context where we possess the background knowledge that “an object has a property ” (denoted as: ), we can (approximately) produce the same situation by subjectively believing that the probability that that object has the property is sufficiently close to 1 (denoted as: ). [Footnote 3: According to Cournot’s principle [Cou43], we do not allow 1 (or 0) priors for events that may or may not happen. If this were allowed, then assuming would produce exactly the same effect as possessing the background knowledge “ is ”. This would be problematic, because assigning probability 1 to events that are not determined (by background knowledge) may lead to undefined conditional probabilities (in case the complement events occur). However, assigning a probability that is arbitrarily close to one (i.e. ) does not cause such a problem, while it approximately produces the effect of the same background knowledge to arbitrary precision.] [Footnote 4: Although for Hempel and his contemporaries confirmation theory was (more or less) a logical relation, akin to deductive entailment, rather than a probabilistic relation (in the formal sense) [FH10], what we mentioned about the convertibility of objective information and subjective beliefs was in a way reflected in their discussions: Good’s Red Herring [Goo67] provided a hypothetical objective background setting with respect to which NC does not hold. Hempel’s assertion that “NC should be considered in the context of no (objective) background information” was in fact an attempt to address such an issue. However, nothing could prevent Good from producing the same effect by simply replacing the objective information with the subjective a priori beliefs of a newborn baby (Good’s baby [Goo68]).] Therefore, if we want to restrict ourselves to the context of perfect ignorance, not only should we possess no objective knowledge, we should also only be permitted to reason based on an absolutely unbiased probability measure. This raises an important question: “What is an unbiased measure?” Due to its subjective nature, this question does not have a definitive answer, but by far the most widely considered choice is the uniform measure. [Footnote 5: The main justification is due to the principle of maximum entropy [Jay03]. This principle recommends that one choose, among all the probability measures satisfying a constraint, the measure which maximizes the Shannon entropy. In the absence of any background knowledge (i.e. no constraint), the uniform probability measure maximizes entropy.] [Footnote 6: An alternative to the uniform measure is Solomonoff’s theory of universal inductive inference [Sol64], which mathematically formalizes and puts together Occam’s razor (the principle that the simplest model consistent with the background knowledge should be chosen) and Epicurus’ principle of multiple explanations (that all explanations consistent with background knowledge should be kept) (see [Hut07] or [RH11]).] On the other hand, it is well known that using the uniform measure, inductive learning is not possible [Car50]. This shows that from the subjective probabilistic perspective, choosing a probability measure that satisfies the condition of perfect ignorance and allows inductive learning is arguably impossible.
It is also notable that demonstrating that a specific probability measure does (or does not) comply with a rule of induction (or a statement such as PC) does not illuminate the reason why a typical human observer believes that such a rule (or statement) is implausible (or plausible). Conversely, one might argue that compliance of a probability measure with a counterintuitive statement such as PC may suggest that this measure does not provide a suitable model for inductive reasoning. As an example, consider the following two works: [Mah99] and [Mah04]. They are among the most famous answers to the raven paradox. Using Carnap’s measure [Car80], in 1999 Maher argued that both NC and PC hold. In 2004 he suggested a more complex measure that led to opposite results: Maher showed that for this latter measure, neither NC nor PC holds in all settings. Although Maher’s works successfully show that NC holds for at least one probability measure and fails for another, they do not show whether NC and PC are generally plausible or not.
The mainstream contemporary Bayesian solutions are not restricted to a particular measure and, in this sense, are more general. According to [FH06], almost all of them accept PC and argue that observation of a non-black non-raven does provide evidence in support of ; however, in comparison with the observation of a black raven, the amount of confirmation is very small. This difference, they argue, is due to the fact that the number of ravens is much smaller than the number of non-black objects. However, as [Vra04] explains, these arguments have only been able to reach their intended conclusion by adding some extra assumptions about the characteristics of the chosen probability measure. He shows that the standard Bayesian solution relies on the almost never explicitly defended assumption that “the probability of should not be affected by evidence that an object is non-black”, a supposition that he believes is implausible, i.e. it may or may not hold.
To summarize the above discussion: the general plausibility of a rule of induction cannot be determined if we restrict our study to particular (objective) background knowledge or a particular probability measure. On the other hand, no rule of induction holds in the presence of an unrestricted choice of background knowledge and probability measure. We conclude that rules of induction should be studied for different classes of background knowledge and priors. If a rule of induction holds relative to a large class of reasonable background knowledge (i.e. information similar to our actual configuration of knowledge obtained from observations that often take place in real life) and relative to reasonable probability measures (i.e. measures that have intuitively reasonable characteristics e.g. comply with other rules of induction which are more directly justified by our intuitive notion of induction), then we can claim that the studied rule is plausible, otherwise we cannot.
In this paper, we study NC with such an approach. In Section 2, we present a formal representation for three rules of induction, namely, projectability (PJ), reasoning by analogy (RA) and Nicod’s condition (NC). We also define the form of background knowledge that is studied throughout the paper. Informally speaking, we only study pieces of knowledge that do not link the properties of one object to another object. They can be thought of as knowledge that can be gained directly by observing the properties of some distinct objects. For example, the background knowledge “if object is a raven, then object is not a raven” is not of this form. While one can easily construct pieces of information that do not have such a form and violate the aforementioned rules of induction, we have not found any illuminating counterexample to the assumption that, relative to a piece of information that does not link properties of distinct objects together, PJ and RA comply with intuition. In the case of NC, we investigate more closely. In the next two sections, we study the restrictions on the probability measures that guarantee the validity of NC relative to two more specific background configurations (that can be expressed in the mentioned form). In Section 3, we find some sufficient conditions for the validity of NC and some sufficient conditions for its invalidity, relative to information about the kind (i.e. being a raven or not) and color (i.e. being black or not) of some objects. The sufficient condition that we present for the validity of NC is less restrictive. However, this is insufficient for claiming that in this setting NC is generally plausible. Section 4 deals with the setting where the exact number of objects having one property is known. For example, we know how many ravens (or how many non-black objects) exist. We show that in this setting, measures that comply with Nicod’s condition do not always comply with PJ, which seems to be the simplest formalization of inductive inference. It is also shown that in the case of contradiction, intuition (arguably) follows PJ rather than NC. We think that this result is both interesting and somewhat surprising and should be considered a main contribution of this paper.
One limitation of our basic setup is that it restricts us to a universe with an arbitrary but known size. However, in Section 5, this strong assumption is replaced by the weaker assumption that there is a probability distribution over the possible sizes of the universe and that this distribution is not affected by an observation of a single object. We show that under this weaker assumption, the results from the former sections remain valid.
In Section 6, we summarize the paper and conclude that there is a tension between NC and our intuitive notion of inductive reasoning. We also suggest that reasoning by analogy provides a viable alternative for formalizing the seemingly intuitive statement that “the observation that a particular object is both F and G confirms the hypothesis that any object that is F is also G” without suffering from the shortcomings of NC. All theorems are proven in Section 7.
1.1 Notation, Basic Definitions and Assumptions
Throughout sections 2 to 4 we work with a first-order language whose only nonlogical symbols are a pair of monadic predicates and and a set of constants (officially shown as) where is a known positive integer. However, for simplicity we drop “” and refer to each constant by its index. We rely on the domain closure axiom [Rei80], that is:
where to are distinct constants i.e. . Clearly, models of this axiom are restricted to interpretations with domains containing exactly distinct individuals (objects) each of which is denoted by a constant in . Using this bijection between the elements of the domain and constants, we refer to as the universe.
Negation, conjunction, disjunction and material implication are respectively represented by “”, “” (or “”), “” and “”. If is an individual and is a 1-place predicate (either atomic or a sentential combination of atomic predicates), is defined as a proposition that involves predicate and indicates: “ has (or satisfies or is described by) ”. Conjunction of several propositions is abbreviated by . By definition, for , (i.e. tautology) and . More generally, if to are some objects (not necessarily consecutive), . The general hypothesis (which is equivalent to ) where . We also let:
Any of to defined by relation (1.2) is referred to as a complete description of an object . [Footnote 7: , , and are what Maher calls , , and respectively. What we call a complete description, he calls a sample proposition [Mah99].] For example, if and represent the “ravenhood” and “blackness” properties respectively, then means “ is not a raven and is not black” and so on. means “if is a raven then it is black” and is the general hypothesis that “for all , if is a raven then it is black”. Clearly all complete descriptions which provide a counterexample to are in the form .
We define as the set of all propositions which are in the following form (or by simplification can be converted to it):
where to are some predicates in , , , , , , , , , , , and to are some mutually distinct objects: . Note that is in fact the set of all propositions that do not link the properties of different objects together. We define the set of all individuals described by as: . We refer to the set of (simple) predicates involved in by: . For example, the proposition is not in but (assuming ). and . By definition, empty (or tautologous) proposition is a member of with . Two subsets of are defined as follows:
Informally speaking, is the set of propositions that completely describe some individuals and do not falsify .
Likewise, is the set of propositions that completely describe all objects of the universe. We refer to any member of as a Complete Description Vector (CDV). Note that each CDV corresponds to a unique model or world (up to isomorphism). In other words, every interpretation that makes a CDV (and the aforementioned axiom (1.1)) true uniquely determines the value of any sentence in ; therefore we can consider CDVs as (representatives of) different worlds. [Footnote 8: The proof is straightforward. With respect to the domain closure axiom, all quantifiers are bounded. Therefore all sentences are convertible to quantifier-free forms and consequently to full disjunctive normal form (DNF), which is in fact a disjunction of some CDVs. In any world, only one CDV is true; therefore only sentences containing that CDV (when expressed in full DNF) are true.] The probability measures with which we are concerned are defined over the sample space with the power set as σ-algebra. No other restriction is imposed on the choice of measure unless it is mentioned explicitly. For each proposition , let be the set of all CDVs that entail . We say that the event corresponds to the proposition (and vice versa). For convenience, except in Section 5, we represent the probability of events by the probability of their corresponding propositions; formally, for propositions , we let . [Footnote 9: In Section 5, we simultaneously deal with more than one sample space. Since propositions may correspond to different events w.r.t. different sample spaces, in that section we directly represent probability events by their relevant sample space subsets.] According to Cournot’s principle [Cou43], we do not allow 1 (resp. 0) priors to the sentences that are not valid (resp. unsatisfiable).
Objective background knowledge (or simply background knowledge) is what we are certain about and can be represented by a subset of . The more formal definition of the background setting studied throughout this paper and its corresponding restrictions are given in Section 2.1.
We equate “inductive support” with “probability increment”: It is said that in the presence of background knowledge , evidence confirms hypothesis iff:
We are only interested in the case where and are consistent, and are positive and is not determined by , i.e. .
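As a rough illustration of confirmation as probability increment, the following Python sketch enumerates the worlds of a tiny universe and checks whether conditioning on a piece of evidence raises the probability of a hypothesis given the background knowledge. The encoding of a complete description as a pair of booleans, the helper names (`worlds`, `prob`, `conditional`, `confirms`) and the uniform toy measure are illustrative assumptions and are not taken from the paper's formalism.

```python
from fractions import Fraction
from itertools import product

# Assumed encoding: a complete description of one object is (is_F, is_G).
DESCRIPTIONS = [(True, True), (True, False), (False, True), (False, False)]


def worlds(n_objects):
    """All complete description vectors: one description per object."""
    return list(product(DESCRIPTIONS, repeat=n_objects))


def prob(measure, event):
    """Probability of the set of worlds in which `event` is true."""
    return sum((p for w, p in measure.items() if event(w)), Fraction(0))


def conditional(measure, target, given):
    """P(target | given); undefined when `given` has probability zero."""
    p_given = prob(measure, given)
    if p_given == 0:
        raise ValueError("conditioning on a zero-probability event")
    return prob(measure, lambda w: target(w) and given(w)) / p_given


def confirms(measure, evidence, hypothesis, background=lambda w: True):
    """True iff evidence raises the probability of hypothesis given background."""
    with_evidence = conditional(
        measure, hypothesis, lambda w: evidence(w) and background(w)
    )
    without_evidence = conditional(measure, hypothesis, background)
    return with_evidence > without_evidence


# Toy setting (an assumption): two objects and a uniform measure over worlds.
toy_worlds = worlds(2)
uniform = {w: Fraction(1, len(toy_worlds)) for w in toy_worlds}


def all_f_are_g(w):
    """The general hypothesis: every object that is F is also G."""
    return all((not is_f) or is_g for is_f, is_g in w)


# Learning that object 0 is not F removes one possible counterexample.
non_f_confirms = confirms(uniform, lambda w: not w[0][0], all_f_are_g)
# Learning that object 0 is F but not G refutes the hypothesis.
counterexample_confirms = confirms(
    uniform, lambda w: w[0] == (True, False), all_f_are_g
)
print(non_f_confirms, counterexample_confirms)
```

Exact rational arithmetic (`Fraction`) is used so that the strict inequality in the definition is not blurred by floating-point rounding.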
2 Inductive Reasoning and Nicod’s Condition
The fundamental assumption behind inductive inference is the so-called principle of the uniformity of nature [Hum88] (or the immutability of natural processes [Pop59]) based on which, uniformity and trend are more probable than diversity and anomaly a priori.
Let us assume that is a set of background knowledge configurations for which we “intuitively” expect that inductive inference holds (for more discussion refer to Section 2.1). Relative to pieces of information in , we present the following varieties of inductive inference (i.e. inductive rules):
Projectability. For all objects and and background knowledge that does not determine or , based on [Mah04], one (and apparently the simplest) kind of inductive inference, namely projectability [Footnote 10: According to [Car50], predictive inference (i.e. inference from a sample to another sample) is the most important kind of inference, and the most important special kind of it, singular predictive inference, is inference from a sample to an individual object. Maher’s projectability is in fact a special kind of singular predictive inference: inference from one individual to another individual.] [Footnote 11: Maher’s original relation does not mention background knowledge, and only deals with strong projectability (which he calls absolute projectability).], is defined as follows:
Projectability (relative to predicate ) can be justified as follows: The evidence increases the proportion of the observed individuals that have the predicate . Thus, according to the principle of the uniformity of nature, the estimated frequency of the predicate in the total population should also be increased, because the uniformity between the characteristics of the sample and the total population is considered to be likely.
Reasoning by Analogy (RA). The observation that two individuals have some common properties, increases the probability that their unobserved properties are also alike, because it is likely that there is a uniformity between the characteristics of unobserved properties and the observed ones. Maher has formalized one variation of reasoning by analogy (or inference by analogy [Car50]) as: [Mah04]. We generalize the relation to cover the case where background knowledge (that does not determine the value of , and ) is also present:
Nicod’s Condition (NC). For , , all and that does not determine the value of or , we say NC holds for iff:
NC is stronger than PJ or RA in the sense that it deals with the confirmation of a generalization rather than a singular prediction. In other words, NC is a form of enumerative induction but PJ and RA are forms of singular predictive inference.
2.1 Restrictions on the Background Knowledge
Obviously, relative to unconstrained background knowledge, no rule of induction holds in general. For example, in the presence of background knowledge , at least relative to evidence , PJ does not hold. Similarly, (as [Mah04], Theorem 12 formally shows), in the presence of background knowledge , NC does not hold (for evidence ).
To prevent such problems, the largest set of background configurations studied throughout this paper is , which, according to its definition in Section 1.1, is the set of all consistent propositions that can be expressed in the form of a conjunction of some propositions that involve , , , , , or their negations. [Footnote 12: Note that we do not claim that every piece of background knowledge that is not a member of is implausible. Investigating rules of induction relative to such knowledge is simply beyond the scope of this paper.]
Obviously, each member of can be expressed in the form of a conjunction of some propositions each of which describes only one individual. Consequently, problematic statements that interlink properties of different individuals are not expressible. As an example, the mentioned pathological examples and are not in .
In the case of PJ and RA, we did not find a pathological example in , relative to which, the rule of induction contradicts intuition. However, in the case of NC it is already claimed that relative to background knowledge , it is not intuitively sound to expect that the evidence confirms [FH10]. In Sections 3 and 4, we will investigate the validity of NC relative to two interesting subsets of .
2.2 Restrictions on the Probability Measure
In Section 3 (Setting 1), we impose no restriction on the choice of the probability measure, but in Section 4 (Setting 2), we assume that the probability measure is exchangeable [Car80] in the sense that probabilities are not changed by permuting individuals (i.e. swapping the names of objects). To introduce this restriction formally, we need the following definitions:
By the term permutation, we always refer to a bijection from a set of all objects to itself. Throughout this paper, we denote any arbitrary permutation by (or and when we deal with more than one permutation). Having a proposition , the proposition is obtained from by replacing any occurrence of any individual with .
If , the function defined by ; and , is a permutation with a fixed point . For short we write . If we define := , then .
The probability measure is exchangeable if for all propositions and and all permutations , .
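On a finite sample space, one way to test exchangeability in practice is to check that every world has the same probability as each of its relabelled copies; since events are sets of worlds and conditional probabilities are ratios of event probabilities, this is enough for the condition above. The Python sketch below does this for a three-object universe. The boolean-pair encoding of descriptions, the function names and the two toy measures are illustrative assumptions.

```python
from fractions import Fraction
from itertools import permutations, product

# Assumed encoding: a complete description of one object is (is_F, is_G).
DESCRIPTIONS = [(True, True), (True, False), (False, True), (False, False)]


def relabel(world, perm):
    """World in which object perm[i] carries the description object i had."""
    renamed = [None] * len(world)
    for i, description in enumerate(world):
        renamed[perm[i]] = description
    return tuple(renamed)


def is_exchangeable(measure, n_objects):
    """True iff every world is as probable as all of its relabelled copies."""
    return all(
        measure[w] == measure[relabel(w, perm)]
        for w in measure
        for perm in permutations(range(n_objects))
    )


def normalise(weight, all_worlds):
    """Turn positive integer weights into a probability measure."""
    total = sum(weight(w) for w in all_worlds)
    return {w: Fraction(weight(w), total) for w in all_worlds}


N = 3
ALL_WORLDS = list(product(DESCRIPTIONS, repeat=N))

# Weight depends only on how many objects are F-and-G, not on which ones.
count_based = normalise(
    lambda w: 1 + sum(1 for d in w if d == (True, True)), ALL_WORLDS
)
# Weight singles out object 0 by name, so renaming objects changes it.
name_biased = normalise(lambda w: 2 if w[0][0] else 1, ALL_WORLDS)

print(is_exchangeable(count_based, N))
print(is_exchangeable(name_biased, N))
```

The check is brute force (all permutations of all worlds), which is only practical for very small universes; it is meant to make the definition concrete rather than to scale.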
3 Validity of NC when Background Knowledge Consists of Complete Descriptions of Some Individuals (Setting 1)
In Section 1.1, was defined as the set of all pieces of background knowledge that do not refute and that describe some individuals completely (e.g. in the case of the raven paradox, members of represent the knowledge that we are already aware of the color and kind (i.e. the state of being a raven) of some individuals and that none of these known objects is a non-black raven). Clearly, . Theorem 3.1 shows that if the chosen probability measure satisfies some conditions, then for any background knowledge , NC holds (for predicates and ). On the other hand, Theorem 3.2 shows that under alternative conditions, for some , NC does not hold.
If a probability measure complies with the following relation:
then, for this measure and any that does not determine or , NC holds, i.e. relation (3.1) entails: .
If background knowledge consists of complete descriptions of some individuals, by Theorem 3.1 for all pairs of predicates and , the uniform measure complies with NC since regardless of the interpretation of and , for this measure, & . This is not surprising since using this measure, learning is impossible (see [Car50]). This means that no observation changes the probability of being for an unobserved object. Nonetheless, for this measure NC is valid because any evidence in the form of , or confirms for the simple reason that it removes the possibility that the observed object (i.e. ) is a counterexample to .
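The behaviour of the uniform measure described above can be checked by brute force on a small universe. The Python sketch below is only an illustration under stated assumptions: a universe of three objects, complete descriptions encoded as boolean pairs (is_F, is_G), and empty background knowledge. It compares the probability of the general hypothesis before and after observing one object, and the probability that a second, unobserved object is G.

```python
from fractions import Fraction
from itertools import product

N = 3  # assumed universe size for the illustration
DESCRIPTIONS = [(f, g) for f in (True, False) for g in (True, False)]
WORLDS = list(product(DESCRIPTIONS, repeat=N))  # 4**N equally likely worlds


def cond_prob(target, given):
    """P(target | given) under the uniform measure, by counting worlds."""
    selected = [w for w in WORLDS if given(w)]
    return Fraction(sum(1 for w in selected if target(w)), len(selected))


def all_f_are_g(w):
    """General hypothesis: no object is F without being G."""
    return all(g or not f for f, g in w)


def anything(w):
    """Empty (tautologous) background knowledge."""
    return True


p_prior = cond_prob(all_f_are_g, anything)
# Object 0 observed to be F and G (a "black raven").
p_after_fg = cond_prob(all_f_are_g, lambda w: w[0] == (True, True))
# Object 0 observed to be neither F nor G (a "green apple").
p_after_not_f_not_g = cond_prob(all_f_are_g, lambda w: w[0] == (False, False))

# No learning about unobserved objects: P(object 1 is G) does not move.
p_g1_prior = cond_prob(lambda w: w[1][1], anything)
p_g1_after = cond_prob(lambda w: w[1][1], lambda w: w[0] == (True, True))

print(p_prior, p_after_fg, p_after_not_f_not_g)
print(p_g1_prior, p_g1_after)
```

In this toy setting both observations raise the probability of the hypothesis by the same amount, purely because one potential counterexample has been ruled out, while the prediction for the unobserved object is unchanged.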
In Carnap’s theory of inductive probability [Car80]:
In the above relations, is the number of objects mentioned by evidence ; is the number of mentioned objects which satisfy predicate , and is a constant measuring the resistance to generalization. Note that should not be mentioned by , i.e. . Using this measure and choosing , in the presence of background knowledge such that : = . Thus by Theorem 3.1, for the class of background knowledge in the form of conjunction of some , and/or for distinct individuals, this measure complies with NC. This is equivalent to the setting chosen by [Mah99] and its corresponding results.
In the above relations and to represent an arbitrary enumeration of all individuals (except ) that are not mentioned by (that is, ).
According to restriction (3.3), the probability that is not given that all other objects in the universe are should be less than the degree of confirmation by evidence of a hypothesis that an unobserved object is . Note that can be the index of any unobserved object.
[Mah04] proposes a measure based on the formula:
In the above relation, is the number of objects mentioned by ; and denote the number of mentioned objects which are and respectively.
In this expression, the prior probability of i.e. has to be equal to and and are parameters. Maher proposes a counterexample for NC where (Let ), , and prior probabilities are and . This conclusion can be confirmed independently by Theorem 3.2 as follows:
1. For these parameters, the only member of that does not contain and , is for which relation (3.2) holds if .
2. By assuming: and , a cumbersome calculation shows that (for empty background knowledge) relation (3.3) holds if: which covers Maher’s proposed configuration.
Comparing Theorems 3.1 and 3.2 shows that creating a probability measure that contradicts NC (w.r.t. ) is harder than making a measure that complies with it (for the same background setting) because the former measure has to satisfy more constraints. The reason is that even if, in a measure, evidence does not affect the probability of for unobserved objects (as in the case of the uniform distribution), every hypothesis that is not refuted by (including ) is confirmed by it, since the observation has reduced the number of possible counterexamples by one. On the other hand, in the case of a measure that does not comply with NC, not only should confirm for unobserved objects, but the effect of this confirmation should be so substantial that it overwhelms the effect of the elimination of one counterexample to . [Footnote 13: To see how the effect of elimination of one possible counterexample leads to relation (3.3), refer to the proof of Theorem 3.2 in Section 7.] However, in the case where the size of the universe is large, the latter effect should be minute. This is reflected in relation (3.3) as follows: If is large, then at least for measures that comply with projectability, , because if it is known that all objects in the universe except are , then it should be quite probable that is too. Therefore, even if the degree of confirmation of by evidence (i.e. ) is very small [Footnote 14: Note that by relation (3.2), this degree of confirmation is positive.], relation (3.3) holds. [Footnote 15: Here is another justification for the above argument: By definition, a probability measure defined over a first-order language with an infinite domain is Gaifman iff the probability of the generalization of any predicate (in our case, ) is equal to the probability of the conjunction of some positive instances when their number tends to infinity [GS82] or alternatively, (see [HLNU13] thm. 27) and consequently . Since the Gaifman condition is what we intuitively expect from generalization over an infinite universe, it can be considered a very simple and intuitive rule of induction. In our case, if the universe were infinite and the measure were assumed to be Gaifman, inequality (3.3) would always hold. However, we have assumed that the universe is finite; therefore we cannot remove this inequality.] What we can say is that for very large domains, relation (3.3) is a very weak condition. To summarize:
If regardless of the choice of background knowledge, an observation does not confirm that any unobserved individual is an that is not , then relative to any background knowledge in , NC holds.
If regardless of the choice of background knowledge, an observation confirms that any unobserved individual is an that is not , and on the other hand, the effect of elimination of one counterexample via an observation is negligible, then relative to any background knowledge in , NC does not hold.
The above statements delegate the assessment of NC (a form of enumerative induction) to the assessment of expressions which deal with singular predictions. Hence, a new perspective on the nature of NC is provided: Should regardless of the interpretation of and , (the observation of) an that is disconfirm that any unobserved object is but not ?
For example, relative to background knowledge and a probability measure that reflect our actual configuration of knowledge, should the observation of an =“walnut” that is =“round” decrease the probability that any unobserved object is a walnut but not round? Indeed yes; therefore by Theorem 3.1, in this case and for these predicates, NC holds.
Should the observation of an =“round”, G=“walnut” decrease the probability that any unobserved object is “round” but not a “walnut”? Arguably not.
Should the observation of an =“ogre” which is =“old” decrease the probability that we might encounter an ogre which is not old? Definitely not! [Footnote 16: This confirmation asymmetry may be due to a possible asymmetry in background knowledge and/or in the prior probabilities of different predicates. For example, according to our actual configuration of knowledge, the prior probability of “being an ogre” (for any individual) is quite low. This is a key point in the existing arguments: Good’s baby [Goo68] (that assigns low probability to ravenhood) and Maher’s unicorn [Mah04]. But unlike our discussion, these arguments do not reduce the assessment of NC to a singular prediction.] Therefore, in this case, we are intuitively using a probability measure that satisfies condition (3.2). Now assume that we have seen all objects of the world except one, and that every observed object that was an ogre was old as well. Is it reasonable to believe that it is improbable that the last unobserved object is a young ogre? If yes, then our intuitive measure also complies with restriction (3.3); hence by Theorem 3.2, with this interpretation of and , plausible probability measures do not comply with Nicod’s condition.
4 NC vs. PJ when the Number of Objects having One Predicate is Known (Setting 2)
This section studies NC in the presence of a completely different background setting where we know that exactly individuals are (e.g. ravens) and the rest are not , but we do not know anything about the other property (e.g. their color).
First we focus on a simpler setting where we know exactly which objects are and which objects are not (e.g. we know that objects to are and the rest of the universe i.e. objects to are not ).
For and , weak projectability (PJ) entails:
and reasoning by analogy (RA) entails: [Footnote 17: Therefore, in this setting both PJ and RA suggest that is confirmed by evidence , but RA provides no answer as to whether evidence should confirm (or disconfirm) . The reason is that (as the proof of the theorem provided in Section 7 shows) in the presence of background knowledge , the validity of only depends on property of objects to , which do not have a common property with object .]
Next, we show that these results are valid in the general setting where the background knowledge is such that we only know the exact number of objects being but we do not know their names. In other words, we know that exactly one combination of out of objects of the universe are but we do not know which combination. But before that, we should formalize such knowledge in the form of an event (i.e. a subset of the sample space).
is defined as the set of all (distinct) subsets of which contain exactly individuals. Obviously, the cardinality of is .
Given , = , , , , , .
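The set of fixed-size subsets defined above can be enumerated directly. The following is a minimal sketch, not the authors' construction: the function name `subsets_of_size` and the concrete values (a universe of 4 objects, subsets of size 2) are assumptions chosen for illustration, since the values of the original example are not recoverable here.

```python
from itertools import combinations
from math import comb


def subsets_of_size(n_objects, r):
    """Return all r-element subsets of the index set {1, ..., n_objects}."""
    return [frozenset(c) for c in combinations(range(1, n_objects + 1), r)]


# Hypothetical values: 4 objects in the universe, subsets of size 2.
omega = subsets_of_size(4, 2)
print(sorted(sorted(s) for s in omega))

# The number of such subsets equals the binomial coefficient "4 choose 2".
print(len(omega) == comb(4, 2))
```

Each element of the returned list is one candidate answer to the question "which objects have the property?", which is what the background knowledge leaves open when only the count is known.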
For , “Exactly objects of the universe are ” is formally defined as follows:
In the previous example,
By comparing definition (4.5) with the definition of , it becomes clear that for , , so we do not expect rules of induction to hold in general in the presence of such background knowledge, and indeed they do not. For instance, knowing that exactly objects are , the evidence that a particular object is , confirms that any other object is not , [Footnote 18: Suppose that you are in a camp populated by 100 captives, and it is known that 10 of them will be chosen randomly to be executed. Whenever someone other than you is chosen, it is reasonable to be more optimistic about your fate, for the simple reason that .] which contradicts PJ:
However, the following theorem shows that for the hypothesis that we are interested in i.e. , the background knowledge is equivalent to which is a member of . Therefore, in the case of the raven paradox and background knowledge , the rules of induction (that are assumed to hold relative to background knowledge in ) should still hold.
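The captive example above can be checked with exact arithmetic. This is a small sketch under the assumption that the 10 captives are drawn uniformly at random without replacement; the function name `chance_of_being_chosen` is hypothetical.

```python
from fractions import Fraction


def chance_of_being_chosen(population, to_choose, others_already_chosen=0):
    """Probability that one given person is among those chosen, once
    `others_already_chosen` other people are known to have been chosen
    (assumes a uniformly random draw without replacement)."""
    remaining_slots = to_choose - others_already_chosen
    remaining_people = population - others_already_chosen
    return Fraction(remaining_slots, remaining_people)


prior = chance_of_being_chosen(100, 10)
posterior = chance_of_being_chosen(100, 10, others_already_chosen=1)
print(prior, posterior, posterior < prior)
```

Learning that someone else has been chosen lowers the probability from 1/10 to 9/99 = 1/11, which is the sense in which evidence that one object has the property disconfirms that another object has it when the total count is fixed.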
If and is an arbitrary member of and assuming that a probability measure is exchangeable:
The formal proof of this theorem is presented in Section 7.3, but the following simple example shows the main idea behind the general proof.
Having , we show that:
(that is relation (4.7) for and ) as follows:
Similarly it can easily be shown that:
Therefore, by Bayes’ rule:
which is what we wanted to show by this example.
If “exactly objects (of the universe ) are ” and is an object, and the probability measure is exchangeable, weak projectability (PJ) entails:
and reasoning by analogy (RA) assumption entails:
The above relations seem to be compatible with intuition. While the total number of objects that satisfy is known in advance, the consideration of or should not affect our estimation of the frequency of the objects being . On the other hand, the probability of can still be affected by observations. Therefore, assuming PJ, consideration of increases the probability of and consequently decreases the probability of . As a result it seems reasonable that the evidence confirms , and the evidence disconfirms it.
Moreover, an observation has an extra effect: While it is known that only objects can be counterexamples to (because in order to be , one should be ), the observation decreases the number of possible counterexamples by one. This holds even in the case where the chosen measure is such that inductive reasoning is not possible (e.g. the uniform measure is used). Consequently, in (4.10) inequality is strict, but in (4.11) and (4.12) it is not. Theorem (4.3) implies the following results:
If := raven and := black, w.r.t. (4.11), PJ leads to:
((exactly objects are ravens).(a specific object is nonRaven and black)) (exactly individuals are ravens)
If :=nonBlack & :=nonRaven w.r.t. (4.11), PJ leads to:
((exactly individuals are not black).(a specific object is black and not raven)) (exactly individuals are not black)
If := raven and := black, w.r.t. (4.12), PJ leads to:
((exactly individuals are ravens).(a specific object is not raven and not black)) (exactly individuals are ravens)
If := nonBlack & := nonRaven, w.r.t. (4.12), PJ leads to:
((exactly individuals are not black).(a specific object is black and raven)) (exactly individuals are not black)
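The remark that a black raven removes one possible counterexample even under the uniform measure can be illustrated by brute force. The sketch below is an assumption-laden toy, not the paper's proof: it uses a hypothetical universe of 4 objects with exactly 2 ravens, puts the uniform measure on all complete descriptions, and always observes object 0.

```python
from fractions import Fraction
from itertools import product


def conditional_probability(n_objects, hypothesis, condition):
    """P(hypothesis | condition) under the uniform measure over all worlds.
    A world assigns each object a pair (is_raven, is_black)."""
    worlds = product(product([False, True], repeat=2), repeat=n_objects)
    matching = [w for w in worlds if condition(w)]
    return Fraction(sum(hypothesis(w) for w in matching), len(matching))


def all_ravens_black(world):
    return all(black for raven, black in world if raven)


def exactly_r_ravens(r):
    return lambda world: sum(raven for raven, _ in world) == r


N, R = 4, 2  # hypothetical universe size and known number of ravens
known = exactly_r_ravens(R)

base = conditional_probability(N, all_ravens_black, known)
after_black_raven = conditional_probability(
    N, all_ravens_black, lambda w: known(w) and w[0] == (True, True))
after_nonblack_nonraven = conditional_probability(
    N, all_ravens_black, lambda w: known(w) and w[0] == (False, False))

print(base, after_black_raven, after_nonblack_nonraven)
```

With these values the hypothesis has probability 1/4 given the background knowledge alone, 1/2 after a black raven, and still 1/4 after a non-black non-raven: the first change is strict although the uniform measure allows no learning about colors, while the second observation leaves the probability unchanged.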
| Observation of a | No. of ravens is known | No. of non-blacks is known |
|---|---|---|
| non-black non-raven | PJ: doesn’t confirm (Cor. 4.8); NC: confirms; RA: n/a | PJ: confirms (Cor. 4.5); NC: confirms; RA: confirms (Cor. 4.5) |
| non-black raven | evidence refutes the hypothesis | evidence refutes the hypothesis |
| black non-raven | PJ: doesn’t disconfirm (Cor. 4.6); NC: n/a; RA: n/a | PJ: doesn’t disconfirm (Cor. 4.7); NC: n/a; RA: n/a |
| black raven | PJ: confirms (Cor. 4.4); NC: confirms; RA: confirms (Cor. 4.4) | PJ: does not confirm (Cor. 4.9); NC: confirms; RA: n/a |
Corollaries (4.4) to (4.9) are summarized in Table 1. NC, if assumed to hold in this setting, suggests that the observation of a non-black non-raven and the observation of a black raven (i.e. the entries in the first and fourth rows of the table) should confirm , which clearly contradicts what PJ suggests; therefore, there is a tension between these two rules. When background knowledge is neglected, intuition goes with PJ in the first column of the table. On the other hand, it does not completely match the suggestions of either NC or PJ in the second column. This may indicate that intuition is more inclined to the case where “the number of ravens” and not “the number of non-blacks” is known in advance. In real life, neither of these numbers is known, but the total number of ravens can be estimated much more easily than the number of non-black objects. On the other hand, if we are explicitly informed of the total number of non-black objects, at least in cases similar to the following example, intuition seems to follow PJ’s suggestions in the second column:
Imagine that you are only concerned about objects which are placed inside a bag (i.e. set of objects inside a bag). Also imagine that you are told that only 4 objects are not black. In this case, there are just four possible counterexamples to . Now suppose that a green apple comes out of the bag. Since it is green, it is one of those 4 non-blacks. Therefore, one possible counterexample is removed. Meanwhile, the fact that it is a non-raven may increase the probability of non-ravenhood (w.r.t. PJ). Therefore it is more probable that the 3 remaining non-blacks are also non-raven. Thus, this observation should confirm . Now suppose that a black raven comes out. Its color informs us that it is not among the possible counterexamples but its kind increases the probability of ravenhood which is not in favor of . So, this observation cannot confirm . Therefore, this example suggests that in Setting 2, given the proper background knowledge, intuition does not follow NC. It follows PJ even if it advises that the observation of a green apple confirms that “all ravens are black” and the observation of a black raven does not!
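The bag example can be made concrete with one particular exchangeable measure. Everything specific in this sketch is an assumption made for illustration and is not taken from the paper: the bag holds 6 objects, the four non-black ones are taken to be objects 0 to 3 (relying on the claim above that, under exchangeability, knowing the count is as good as knowing the names for this hypothesis), and ravenhood follows Laplace's rule of succession, so that seeing a raven raises the probability that other objects are ravens.

```python
from fractions import Fraction
from itertools import product
from math import factorial

N = 6                      # hypothetical number of objects in the bag
NON_BLACK = {0, 1, 2, 3}   # assumed positions of the 4 non-black objects


def laplace_weight(ravenhood):
    """Probability of one raven (1) / non-raven (0) sequence under
    Laplace's rule of succession, an exchangeable measure."""
    n, k = len(ravenhood), sum(ravenhood)
    return Fraction(factorial(k) * factorial(n - k), factorial(n + 1))


def probability(event):
    return sum(laplace_weight(s) for s in product([0, 1], repeat=N) if event(s))


def hypothesis(s):
    """All ravens are black, i.e. no non-black object is a raven."""
    return all(s[i] == 0 for i in NON_BLACK)


def posterior(evidence):
    return probability(lambda s: hypothesis(s) and evidence(s)) / probability(evidence)


prior = probability(hypothesis)
after_green_apple = posterior(lambda s: s[0] == 0)  # object 0: non-black non-raven
after_black_raven = posterior(lambda s: s[4] == 1)  # object 4: black raven

print(prior, after_green_apple, after_black_raven)
```

Under these assumptions the probability of the hypothesis moves from 1/5 to 2/5 after the green apple and down to 1/15 after the black raven, matching the direction of the informal argument. A different prior would give different numbers, so this shows only that the reasoning is coherent for at least one exchangeable measure.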
Clearly, we can never “prove” that a particular measure or a particular proposition is (or is not) “intuitively plausible”, due to the subjective nature of the problem.
The preceding example presented a particular method of reasoning that, relative to a given configuration, supports PJ more than NC. This has convinced us that generally in Setting 2, PJ is more plausible than NC, but, as we mentioned, some people might not be convinced. For example, one might argue that if we are told that the number of non-black objects is precisely 7 million, we can still believe that black ravens confirm that all ravens are black. Such reasoning might rest on some “hidden” background information, such as knowing that ravens are animals and that animals of the same kind often have similar colors. This particular background knowledge is not in (and therefore not in Setting 2); however, as mentioned in the introduction, this knowledge is convertible to the subjectively chosen a priori probability measure. Of course, given such knowledge, it would be no surprise if NC holds for :=raven and :=black but not for := non-black and := non-raven. However, it is up to the readers to judge what is intuitive for them and what is not. What is formally provable (and is proved formally) is that in Setting 2, no probability measure can simultaneously satisfy PJ and NC for a pair of predicates and .
Reasoning by analogy vs. Nicod’s condition: Table 1 clearly shows that the only cases where PJ and NC do not contradict each other are those where, according to RA, the general hypothesis should be confirmed. In other cases, RA does not impose a restriction; therefore, it never contradicts either PJ or NC.
A little thought reveals that RA and NC have many commonalities. We go a step further and propose a conjecture that NC may seem intuitively valid since it can easily be conflated with RA as follows:
According to the informal definition of NC: “The observation of an that is confirms that all are (or any is ).” Although this informal statement seems to be plausible a priori, it is vague, and NC is not necessarily its only possible formalization. To begin with, it should be noticed that in informal language, the scopes of quantifiers are often ambiguous; for example, the informal expression “for all , the probability of …” can easily be mistaken for “the probability that for all , ”. However, the most suitable formalization of the former (i.e. ) differs from that of the latter (i.e. ). On the other hand, the informal “if” does not exclusively stand for material implication; in a proper context it can also mean conditional probability. Putting these together, it can be seen that:
(Informal NC): “The observation of an object which is both and confirms that all (or any) object that is is also .”
can alternatively be formalized as: . This relation is the definition of RA (see relation 2.2) – a rule of induction which is used in many fields (e.g. in case-based reasoning [AP94]), is directly justified by the principle of the uniformity of nature and does not suffer from the shortcomings of NC such as contradicting PJ or producing counterintuitive conclusions such as PC in the raven paradox.
5 When the Size of the Universe is Unknown
In Section 1.1, we defined our probability space using the sample space , the set of all complete description vectors (CDVs), all involving objects. Thus, from the beginning, we had to assume that the cardinality of the universe is known. In this section, we instead assume that:
The size of the universe (i.e. ) is unknown; however it is known that it is fixed and bounded by some known constants and . E.g. assume it is known that the number of the objects of the universe is larger than and less than .
The new evidence , does not affect the way the rational agent estimates the size of the universe. E.g., the observation of a black raven does not change the probability distribution over the possible sizes of the universe.
We show that in this setting, our previous conclusions are still valid. Informally speaking, the reason is that all (in)equalities of the previous sections hold for any arbitrary (but fixed) size of the universe, therefore this number does not play a role, and consequently, even if it is unknown, (as long as new evidence does not affect the agent’s beliefs about it) all (qualitative) relations should still hold.
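The informal argument above can be sketched as a short derivation. The notation is assumed here for illustration only: $H$ is the hypothesis, $E$ the new evidence, $S_n$ the event that the universe has size $n$, and $N_{\min}$, $N_{\max}$ the known bounds; fixed background knowledge is left implicit. The sketch takes a confirmation inequality as the example, and other qualitative relations would follow in the same way.

```latex
\begin{align*}
P(H \mid E)
  &= \sum_{n=N_{\min}}^{N_{\max}} P(H \mid E, S_n)\, P(S_n \mid E)
     && \text{(law of total probability)} \\
  &= \sum_{n=N_{\min}}^{N_{\max}} P(H \mid E, S_n)\, P(S_n)
     && \text{(evidence does not change beliefs about the size)} \\
  &\ge \sum_{n=N_{\min}}^{N_{\max}} P(H \mid S_n)\, P(S_n)
     && \text{(the inequality holds for each fixed } n\text{)} \\
  &= P(H).
\end{align*}
```

The second step is exactly where the assumption that evidence leaves the size distribution untouched is used; without it, the weights $P(S_n \mid E)$ could shift toward sizes where the hypothesis is less probable.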
To justify this claim formally, we need some new notation:
If it is known that the size of the universe is , we let the enumeration of its objects be (created recursively by ). [Footnote 19: Note that ‘’ denotes the disjoint union operation.] Instead of , we write to indicate that the members of this set describe individuals which belong to the universe . Similarly, instead of , we write to emphasize that the sample space corresponds to a universe of size (i.e. ). [Footnote 20: While contains CDVs that exactly describe objects, for all distinct and , =.] Similarly, instead of , we write to indicate that by definition, (defined on ) is a measure that provides a probabilistic model for the rational agent (who performs induction), only if he/she/it knows the cardinality of the universe is .
To prevent ambiguity, instead of representing the events by propositions, we directly use subsets of the sample spaces: The event that corresponds to an arbitrary proposition relative to a sample space is: . For example, if , then represents the event , while stands for the event , , , , . The Complete Description Vectors (CDVs) are here bold-faced and conjunction symbols “.” are dropped to emphasize that they are not ordinary propositions. For example is a CDV that not only entails the ordinary proposition , but also indicates that the universe is , . The reason is that by definition, each CDV describes all objects of the universe (see Section 1.1).
Let and be some known lower and upper bounds for the size of the universe. [Footnote 21: is at least equal to the number of objects mentioned by background knowledge or evidence. can be arbitrarily large but for simplicity, we assume that it is finite. To see what would be needed if we wanted to allow , refer to Footnote 23.] We define a new sample space as a set that contains all members of all sample spaces that correspond to universes with sizes at least equal to and at most equal to :
Relative to the sample space and for all , the subset represents the event that “the size of the universe is ”. We let:
corresponds to the event th