Imprecise probability trees: Bridging two theories of imprecise probability

01/08/2008 ∙ by Gert de Cooman, et al. ∙ Ghent University

We give an overview of two approaches to probability theory where lower and upper probabilities, rather than probabilities, are used: Walley's behavioural theory of imprecise probabilities, and Shafer and Vovk's game-theoretic account of probability. We show that the two theories are more closely related than would be suspected at first sight, and we establish a correspondence between them that (i) has an interesting interpretation, and (ii) allows us to freely import results from one theory into the other. Our approach leads to an account of probability trees and random processes in the framework of Walley's theory. We indicate how our results can be used to reduce the computational complexity of dealing with imprecision in probability trees, and we prove an interesting and quite general version of the weak law of large numbers.


1. Introduction

In recent years, we have witnessed the growth of a number of theories of uncertainty, where imprecise (lower and upper) probabilities and previsions, rather than precise (or point-valued) probabilities and previsions, have a central part. Here we consider two of them, Glenn Shafer and Vladimir Vovk's game-theoretic account of probability [30], which is introduced in Section 2, and Peter Walley's behavioural theory [34], outlined in Section 3. These seem to have rather different interpretations, and they certainly have been influenced by different schools of thought: Walley follows the tradition of Frank Ramsey [22], Bruno de Finetti [11] and Peter Williams [40] in trying to establish a rational model for a subject's beliefs in terms of her behaviour. Shafer and Vovk follow an approach that has many other influences as well, and is strongly coloured by ideas about gambling systems and martingales. They use Cournot's Principle to interpret lower and upper probabilities (see [29]; and [30, Chapter 2] for a nice historical overview), whereas on Walley's approach, lower and upper probabilities are defined in terms of a subject's betting rates.

What we set out to do here, and in particular in Sections 4 and 5, is to show that in many practical situations, the two approaches are strongly connected. (An earlier and condensed version of this paper, with much less discussion and without proofs, was presented at the ISIPTA '07 conference [7]. Our line of reasoning here should be contrasted with the one in [29], where Shafer et al. use the game-theoretic framework developed in [30] to construct a theory of predictive upper and lower previsions whose interpretation is based on Cournot's Principle; see also the comments near the end of Section 5.) This implies that quite a few results, valid in one theory, can automatically be converted and reinterpreted in terms of the other. Moreover, we shall see that we can develop an account of coherent immediate prediction in the context of Walley's behavioural theory, and prove, in Section 6, a weak law of large numbers with an intuitively appealing interpretation. We use this weak law in Section 7 to suggest a way of scoring a predictive model that satisfies A. Philip Dawid's Prequential Principle [5, 6].

Why do we believe these results to be important, or even relevant, to AI? Probabilistic models are intended to represent an agent's beliefs about the world he is operating in, beliefs that describe and even determine the actions he will take in a diversity of situations. Probability theory provides a normative system for reasoning and making decisions in the face of uncertainty. Bayesian, or precise, probability models have the property that they are completely decisive: a Bayesian agent always has an optimal choice when faced with a number of alternatives, whatever his state of information. While many may view this as an advantage, it is not always very realistic. Imprecise probability models try to deal with this problem by explicitly allowing for indecision, while retaining the normative, or coherentist, stance of the Bayesian approach. We refer to [8, 34, 35] for discussions of how this can be done.

Imprecise probability models appear in a number of AI-related fields. For instance in probabilistic logic: it was already known to George Boole [1] that the result of probabilistic inferences may be a set of probabilities (an imprecise probability model), rather than a single probability. This is also important for dealing with missing or incomplete data, leading to so-called partial identification of probabilities, see for instance [9, 19]. There is also a growing literature on so-called credal nets [3, 4]: these are essentially Bayesian nets with imprecise conditional probabilities.

We are convinced that it is mainly the mathematical and computational complexity often associated with imprecise probability models that is keeping them from becoming a more widely used tool for modelling uncertainty. But we believe that the results reported here can help make inroads in reducing this complexity. Indeed, the upshot of our being able to connect Walley's approach with Shafer and Vovk's is twofold. First of all, we can develop a theory of imprecise probability trees: probability trees where the transition from a node to its children is described by an imprecise probability model in Walley's sense. Our results provide the necessary apparatus for making inferences in such trees. And because probability trees are so closely related to random processes, this effectively brings us into a position to start developing a theory of (event-driven) random processes where the uncertainty can be described using imprecise probability models. We illustrate this in Examples 1 and 3, and in Section 8.

Secondly, we are able to prove so-called Marginal Extension results (Theorems 3 and 7, Proposition 9), which lead to backwards recursion and dynamic-programming-like methods that allow for an exponential reduction in the computational complexity of making inferences in such imprecise probability trees. This is also illustrated in Example 3 and in Section 8. For (precise) probability trees, similar techniques were described in Shafer's book on causal reasoning [27]. They seem to go back to Christiaan Huygens, who drew the first probability tree, and showed how to reason with it, in his solution to Pascal and Fermat's Problem of Points. (See Section 8 for more details and precise references.)

2. Shafer and Vovk’s game-theoretic approach to probability

In their game-theoretic approach to probability [30], Shafer and Vovk consider a game with two players, Reality and Sceptic, who play according to a certain protocol. They obtain the most interesting results for what they call coherent probability protocols. This section is devoted to explaining what this means.

2.1. Reality’s event tree

We begin with a first and basic assumption, dealing with how the first player, Reality, plays.

  G1. Reality makes a number of moves, where the possible next moves may depend on the previous moves he has made, but do not in any way depend on the previous moves made by Sceptic.

This means that we can represent his game-play by an event tree (see also [26, 28] for more information about event trees). We restrict ourselves here to the discussion of bounded protocols, where Reality makes only a finite and bounded number of moves from the beginning to the end of the game, whatever happens. But we don't exclude the possibility that at some point in the tree, Reality has the choice between an infinite number of next moves. We shall come back to these assumptions further on, once we have the appropriate notational tools to make them more explicit. (Essentially, the width of the tree may be infinite, but its depth should be finite.)

Figure 1. A simple event tree for Reality, displaying the initial situation $\square$, other non-terminal situations (such as $t$) as grey circles, and paths, or terminal situations (such as $\omega$), as black circles. Also depicted is a cut $U$ of $t$. Observe that $t$ (strictly) precedes $\omega$: $t \sqsubset \omega$, and that $U$ is the children cut of $t$.

Let us establish some terminology related to Reality’s event tree.

2.1.1. Paths, situations and events

A path in the tree represents a possible sequence of moves for Reality from the beginning to the end of the game. We denote the set of all possible paths by $\Omega$, the sample space of the game.

A situation is some connected segment of a path that is initial, i.e., starts at the root of the tree. It identifies the moves Reality has made up to a certain point, and it can be identified with a node in the tree. We denote the set of all situations by $\Omega^\lozenge$. It includes the set $\Omega$ of terminal situations, which can be identified with paths. All other situations are called non-terminal; among them is the initial situation $\square$, which represents the empty initial segment. See Fig. 1 for a simple graphical example explaining these notions.

If, for two situations $s$ and $t$, $s$ is a(n initial) segment of $t$, then we say that $s$ precedes $t$ or that $t$ follows $s$, and write $s \sqsubseteq t$, or alternatively $t \sqsupseteq s$. If $\omega$ is a path and $t \sqsubseteq \omega$, then we say that the path $\omega$ goes through situation $t$. We write $s \sqsubset t$, and say that $s$ strictly precedes $t$, if $s \sqsubseteq t$ and $s \neq t$.

An event $E$ is a set of paths, or in other words, a subset of the sample space: $E \subseteq \Omega$. With an event $E$, we can associate its indicator $I_E$, which is the real-valued map on $\Omega$ that assumes the value $1$ on $E$, and $0$ elsewhere.

We denote by $E(t) := \{\omega \in \Omega \colon t \sqsubseteq \omega\}$ the set of all paths that go through $t$: $E(t)$ is the event that corresponds to Reality getting to a situation $t$. It is clear that not all events will be of the type $E(t)$. Shafer [27] calls events of this type exact. Further on, in Section 4, exact events will be the only events that can legitimately be conditioned on, because they are the only events whose occurrence can be foreseen as part of Reality's game-play.
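To make these notions concrete, here is a minimal sketch (ours, not the paper's) of how such an event tree could be represented in Python; situations are encoded as strings of Reality's moves, so that '$s$ precedes $t$' becomes a string-prefix test, and the names `MOVE_SPACES`, `paths` and `exact_event` are illustrative choices of our own.

```python
# A small event tree: a situation is the string of moves made so far;
# MOVE_SPACES lists Reality's possible next moves in each non-terminal
# situation (here: two coin flips, as in Example 1 below).
MOVE_SPACES = {
    "": ("h", "t"),      # the initial situation (empty string)
    "h": ("h", "t"),
    "t": ("h", "t"),
}

def children(t):
    """The children cut of a non-terminal situation t."""
    return [t + w for w in MOVE_SPACES[t]]

def is_terminal(t):
    return t not in MOVE_SPACES

def paths():
    """The sample space: all terminal situations, i.e. all paths."""
    stack, result = [""], []
    while stack:
        t = stack.pop()
        if is_terminal(t):
            result.append(t)
        else:
            stack.extend(children(t))
    return sorted(result)

def exact_event(t):
    """E(t): the set of all paths that go through situation t."""
    return [omega for omega in paths() if omega.startswith(t)]

print(paths())           # ['hh', 'ht', 'th', 'tt']
print(exact_event("h"))  # ['hh', 'ht']
```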

2.1.2. Cuts of a situation

Call a cut $U$ of a situation $t$ any set of situations that follow $t$, and such that for every path through $t$, there is a unique element of $U$ that it goes through. In other words:

  1. $(\forall \omega \in E(t))(\exists u \in U)(u \sqsubseteq \omega)$: every path through $t$ passes through some element of $U$; and

  2. $(\forall u_1 \in U)(\forall u_2 \in U)(u_1 \sqsubseteq u_2 \Rightarrow u_1 = u_2)$: no element of $U$ precedes another;

see also Fig. 1. Alternatively, a set of situations $U$ is a cut of $t$ if and only if the corresponding set of exact events $\{E(u) \colon u \in U\}$ is a partition of the exact event $E(t)$. A cut can be interpreted as a (complete) stopping time.

If a situation $s$ precedes (follows) some element of a cut $U$ of $t$, then we say that $s$ precedes (follows) $U$, and we write $s \sqsubseteq U$ ($U \sqsubseteq s$). Similarly for 'strictly precedes (follows)'. For two cuts $U$ and $V$ of $t$, we say that $U$ precedes $V$ if each element of $U$ is followed by some element of $V$.

A child of a non-terminal situation $t$ is a situation that immediately follows it. The set of children of $t$ constitutes a cut of $t$, called its children cut. Also, the set $\Omega$ of terminal situations is a cut of the initial situation $\square$, called its terminal cut. More generally, the set $E(t)$ of terminal situations that follow $t$ is the corresponding terminal cut of a situation $t$.
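As a quick sanity check on the definition of a cut, the sketch below (again ours, reusing the string encoding of situations) verifies that a candidate set of situations is a cut of $t$ by checking that the corresponding exact events partition $E(t)$.

```python
def exact_event(t, all_paths):
    """E(t): the set of all paths that go through situation t."""
    return {omega for omega in all_paths if omega.startswith(t)}

def is_cut(candidate, t, all_paths):
    """True iff every path through t goes through exactly one element
    of `candidate`, i.e. {E(u): u in candidate} partitions E(t)."""
    remaining = exact_event(t, all_paths)
    for u in candidate:
        block = exact_event(u, all_paths)
        if not block or not block <= remaining:
            return False      # an overlap, or a situation off E(t)
        remaining -= block
    return not remaining      # every path through t must be covered

PATHS = ["hh", "ht", "th", "tt"]               # two coin flips
print(is_cut({"h", "th", "tt"}, "", PATHS))    # True: a genuine cut
print(is_cut({"h"}, "", PATHS))                # False: misses paths
print(is_cut({"h", "hh", "tt"}, "", PATHS))    # False: 'hh' overlaps 'h'
```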

2.1.3. Reality’s move spaces

We call a move for Reality in a non-terminal situation $t$ an arc that connects $t$ with one of its children $tw$, meaning that $tw$ is the concatenation of the segment $t$ and the arc $w$. See Fig. 2.

Figure 2. An event tree for Reality, with the move space $W_t$ and the corresponding children cut of a non-terminal situation $t$.

Reality's move space in $t$ is the set $W_t$ of those moves that Reality can make in $t$: $W_t := \{w \colon tw \in \Omega^\lozenge\}$. We have already mentioned that $W_t$ may be (countably or uncountably) infinite: there may be situations where Reality has the choice between an infinity of next moves. But every $W_t$ should contain at least two elements: otherwise there is no choice for Reality to make in situation $t$.

2.2. Processes and variables

We now have all the necessary tools to represent Reality's game-play. This game-play can be seen as a basis for an event-driven, rather than a time-driven, account of a theory of uncertain, or random, processes. The driving events are, of course, the moves that Reality makes. (These so-called Humean events shouldn't be confused with the Moivrean events we have considered before, which are subsets of the sample space $\Omega$; see Shafer [27, Chapter 1] for terminology and more explanation.) In a theory of processes, we generally consider things that depend on (the succession of) these moves. This leads to the following definitions.

Any (partial) function on the set of situations $\Omega^\lozenge$ is called a process, and any process whose domain includes all situations that follow a situation $t$ is called a $t$-process. Of course, a $t$-process is also an $s$-process for all $s \sqsupseteq t$; when we call it an $s$-process, this means that we are restricting our attention to its values in all situations that follow $s$.

A special example of a $t$-process is the distance $d(t, \cdot)$, which for any situation $s \sqsupseteq t$ returns the number $d(t, s)$ of steps along the tree from $t$ to $s$. When we said before that we are only considering bounded protocols, we meant that there is a natural number $N$ such that $d(t, \omega) \leq N$ for all situations $t$ and all paths $\omega$ that go through $t$.

Similarly, any (partial) function on the set of paths $\Omega$ is called a variable, and any variable on $\Omega$ whose domain includes all paths that go through a situation $t$ is called a $t$-variable. If we restrict a $t$-process $\mathcal{F}$ to the set of all terminal situations that follow $t$, we obtain a $t$-variable, which we denote by $\mathcal{F}_\Omega$.

If $U$ is a cut of $t$, then we call a $t$-variable $f$ $U$-measurable if, for all $u$ in $U$, $f$ assumes the same value $f(u) := f(\omega)$ for all paths $\omega$ that go through $u$. In that case we can also consider $f$ as a variable on $U$, which we denote as $f_U$.

If $\mathcal{F}$ is a $t$-process, then with any cut $U$ of $t$ we can associate a $t$-variable $\mathcal{F}_U$, which assumes the same value $\mathcal{F}(u)$ in all $\omega$ that follow $u \in U$. This $t$-variable is clearly $U$-measurable, and can be considered as a variable on $U$. This notation is consistent with the notation $\mathcal{F}_\Omega$ introduced earlier.

Similarly, we can associate with $\mathcal{F}$ a new, $U$-stopped, $t$-process $\mathcal{F}^U$, as follows:

$$\mathcal{F}^U(s) := \begin{cases} \mathcal{F}(s) & \text{if } s \sqsubset U,\\ \mathcal{F}(u) & \text{if } u \sqsubseteq s \text{ for the (unique) } u \in U \text{ that } s \text{ follows}. \end{cases}$$

The $t$-variable $\mathcal{F}^U_\Omega$ is $U$-measurable, and is actually equal to $\mathcal{F}_U$:

$$\mathcal{F}^U_\Omega = \mathcal{F}_U. \qquad (1)$$

The following intuitive example will clarify these notions.

Example 1 (Flipping coins).

Consider flipping two coins, one after the other. This leads to the event tree depicted in Fig. 3. The identifying labels for the situations should be intuitively clear: e.g., in the initial situation '$\square$' none of the coins have been flipped, in the non-terminal situation '$h$' the first coin has landed 'heads' and the second coin hasn't been flipped yet, and in the terminal situation '$tt$' both coins have been flipped and have landed 'tails'.

Figure 3. The event tree associated with two successive coin flips. Also depicted are two cuts, $U$ and $V$, of the initial situation.

First, consider the real process $\mathcal{F}$ which, in each situation $s$, returns the number of heads obtained so far, e.g., $\mathcal{F}(\square) = 0$ and $\mathcal{F}(th) = 1$. If we restrict the process $\mathcal{F}$ to the set of all terminal elements, we get a real variable $\mathcal{F}_\Omega$, whose values are: $\mathcal{F}_\Omega(hh) = 2$, $\mathcal{F}_\Omega(ht) = \mathcal{F}_\Omega(th) = 1$ and $\mathcal{F}_\Omega(tt) = 0$.

Consider the cut $U = \{h, th, tt\}$ of the initial situation, which corresponds to the following stopping time: "stop after two flips, or as soon as an outcome is heads"; see Fig. 3. The values of the corresponding variable $\mathcal{F}_U$ are given by: $\mathcal{F}_U(hh) = \mathcal{F}_U(ht) = \mathcal{F}_U(th) = 1$ and $\mathcal{F}_U(tt) = 0$. So $\mathcal{F}_U$ is $U$-measurable, and can therefore be considered as a map on the elements $h$, $th$ and $tt$ of $U$, with in particular $\mathcal{F}_U(h) = 1$.

Next, consider the processes $\mathcal{X}$, $\mathcal{X}_1$ and $\mathcal{X}_2$, defined as follows: for a non-initial situation $s = x_1 \dots x_n$ (where each $x_k$ is $h$ or $t$),

$$\mathcal{X}(s) := x_n, \qquad \mathcal{X}_1(s) := x_1, \qquad \mathcal{X}_2(s) := x_2 \ (\text{only defined when } n = 2),$$

so $\mathcal{X}$ returns the outcome of the latest, $\mathcal{X}_1$ the outcome of the first, and $\mathcal{X}_2$ that of the second coin flip. The associated variables $(\mathcal{X}_1)_\Omega$ and $(\mathcal{X}_2)_\Omega$ give, in each element of the sample space, the respective outcomes of the first and second coin flips.

The variable $(\mathcal{X}_1)_\Omega$ is $V$-measurable: as soon as we reach (any situation on) the cut $V$, its value is completely determined, i.e., we know the outcome of the first coin flip; see Fig. 3 for the definition of $V$.

We can associate with the process $\mathcal{X}$ the variable $\mathcal{X}_V$ that is also $V$-measurable: it returns, in any element of the sample space, the outcome of the first coin flip. Alternatively, we can stop the process $\mathcal{X}$ after one coin flip, which leads to the $V$-stopped process $\mathcal{X}^V$. This new process is of course equal to $\mathcal{X}_1$, and for the corresponding variable $\mathcal{X}^V_\Omega$, we have that $\mathcal{X}^V_\Omega = \mathcal{X}_V = (\mathcal{X}_1)_\Omega$; also see Eq. (1).
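The following sketch (our illustration, not the paper's) mirrors Example 1 in code: it builds the heads-count process $\mathcal{F}$ on the two-coin tree, the cut $U$, the $U$-measurable variable $\mathcal{F}_U$, and the $U$-stopped process $\mathcal{F}^U$, and then checks the identity of Eq. (1).

```python
PATHS = ["hh", "ht", "th", "tt"]   # the sample space
U = ["h", "th", "tt"]              # cut: stop at first heads, else after two flips

def F(s):
    """The process F: number of heads obtained so far in situation s."""
    return s.count("h")

def cut_element(s, cut):
    """The unique element of the cut that the path or situation s follows."""
    return next(u for u in cut if s.startswith(u))

def F_U(omega):
    """The U-measurable variable F_U: the value of F at the cut."""
    return F(cut_element(omega, U))

def F_stopped(s):
    """The U-stopped process F^U evaluated in a situation s."""
    on_or_after_U = any(s.startswith(u) for u in U)
    return F(cut_element(s, U)) if on_or_after_U else F(s)

# Eq. (1): restricting the stopped process to the paths yields F_U.
assert all(F_stopped(omega) == F_U(omega) for omega in PATHS)
print({omega: F_U(omega) for omega in PATHS})
# {'hh': 1, 'ht': 1, 'th': 1, 'tt': 0}
```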

2.3. Sceptic’s game-play

We now turn to the other player, Sceptic. His possible moves may well depend on the previous moves that Reality has made, in the following sense. In each non-terminal situation $t$, he has some set $S_t$ of moves available to him, called Sceptic's move space in $t$. We make the following assumption:

  G2. In each non-terminal situation $t$, there is a (positive or negative) gain for Sceptic associated with each of the possible moves in $S_t$ that Sceptic can make. This gain depends only on the situation $t$ and the next move that Reality will make.

This means that for each non-terminal situation $t$ there is a gain function $\lambda_t \colon S_t \times W_t \to \mathbb{R}$, such that $\lambda_t(s, w)$ represents the change in Sceptic's capital in situation $t$ when he makes move $s$ and Reality makes move $w$.

2.3.1. Strategies and capital processes

Let us introduce some further notions and terminology related to Sceptic's game-play. A strategy $\mathcal{P}$ for Sceptic is a partial process defined on the set of non-terminal situations, such that $\mathcal{P}(t) \in S_t$ is the corresponding move that Sceptic will make in each non-terminal situation $t$.

With each such strategy $\mathcal{P}$ there corresponds a capital process $\mathcal{K}^{\mathcal{P}}$, whose value in each situation $t$ gives us Sceptic's capital accumulated so far, when he starts out with zero capital in $\square$ and plays according to the strategy $\mathcal{P}$. It is given by the recursion relation

$$\mathcal{K}^{\mathcal{P}}(tw) = \mathcal{K}^{\mathcal{P}}(t) + \lambda_t(\mathcal{P}(t), w), \quad w \in W_t,$$

with initial condition $\mathcal{K}^{\mathcal{P}}(\square) = 0$. Of course, when Sceptic starts out (in $\square$) with capital $\alpha$ and uses strategy $\mathcal{P}$, his corresponding accumulated capital is given by the process $\alpha + \mathcal{K}^{\mathcal{P}}$. In the terminal situations, his accumulated capital is then given by the real variable $\alpha + \mathcal{K}^{\mathcal{P}}_\Omega$.

If we start in a non-terminal situation $t$, rather than in $\square$, then we can consider $t$-strategies $\mathcal{P}$ that tell Sceptic how to move starting from $t$ onwards, and the corresponding capital process $\mathcal{K}^{\mathcal{P}}$ is then also a $t$-process, that tells us how much capital Sceptic has accumulated since starting with zero capital in situation $t$ and using $t$-strategy $\mathcal{P}$.
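As a toy illustration (ours, with an invented gain function), the sketch below computes a capital process on the two-coin tree for a Sceptic whose move in each non-terminal situation is a stake on 'heads': he gains the stake if the next flip lands heads and loses it otherwise.

```python
MOVE_SPACES = {"": ("h", "t"), "h": ("h", "t"), "t": ("h", "t")}

def gain(t, stake, w):
    """An illustrative gain function lambda_t(stake, w)."""
    return stake if w == "h" else -stake

def capital_process(strategy, t="", capital=0.0):
    """K^P: Sceptic's accumulated capital in every situation, computed
    from the recursion K^P(tw) = K^P(t) + lambda_t(P(t), w), K^P() = 0."""
    K = {t: capital}
    if t in MOVE_SPACES:
        for w in MOVE_SPACES[t]:
            K.update(capital_process(strategy, t + w,
                                     capital + gain(t, strategy[t], w)))
    return K

# Stake 1 initially, double after a tail, stop staking after a head.
strategy = {"": 1.0, "h": 0.0, "t": 2.0}
print(capital_process(strategy))
# {'': 0.0, 'h': 1.0, 'hh': 1.0, 'ht': 1.0, 't': -1.0, 'th': 1.0, 'tt': -3.0}
```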

2.3.2. Lower and upper prices

The assumptions G1 and G2 outlined above determine so-called gambling protocols. They are sufficient for us to be able to define lower and upper prices for real variables.

Consider a non-terminal situation $t$ and a real $t$-variable $f$. The upper price $\overline{E}_t(f)$ for $f$ in $t$ is defined as the infimum capital $\alpha$ that Sceptic has to start out with in $t$ in order that there would be some $t$-strategy $\mathcal{P}$ such that his accumulated capital $\alpha + \mathcal{K}^{\mathcal{P}}$ allows him, at the end of the game, to hedge $f$, whatever moves Reality makes after $t$:

$$\overline{E}_t(f) := \inf\left\{\alpha \colon \alpha + \mathcal{K}^{\mathcal{P}}_\Omega \geq_t f \text{ for some } t\text{-strategy } \mathcal{P}\right\}, \qquad (2)$$

where '$\alpha + \mathcal{K}^{\mathcal{P}}_\Omega \geq_t f$' is taken to mean that $\alpha + \mathcal{K}^{\mathcal{P}}_\Omega(\omega) \geq f(\omega)$ for all terminal situations $\omega$ that go through $t$. Similarly, for the lower price $\underline{E}_t(f)$ for $f$ in $t$:

$$\underline{E}_t(f) := \sup\left\{\alpha \colon \mathcal{K}^{\mathcal{P}}_\Omega \geq_t \alpha - f \text{ for some } t\text{-strategy } \mathcal{P}\right\}, \qquad (3)$$

so $\underline{E}_t(f) = -\overline{E}_t(-f)$. If we start from the initial situation $t = \square$, we simply get the upper and lower prices for a real variable $f$, which we also denote by $\overline{E}(f) := \overline{E}_\square(f)$ and $\underline{E}(f) := \underline{E}_\square(f)$.
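To spell out the conjugacy just used (a small derivation of ours, in the notation established above): substituting $-f$ for $f$ in Eq. (2) and writing $\beta := -\alpha$ turns the infimum over hedging capitals into the supremum of Eq. (3),

$$-\overline{E}_t(-f) = -\inf\left\{\alpha \colon \alpha + \mathcal{K}^{\mathcal{P}}_\Omega \geq_t -f \text{ for some } t\text{-strategy } \mathcal{P}\right\} = \sup\left\{\beta \colon \mathcal{K}^{\mathcal{P}}_\Omega \geq_t \beta - f \text{ for some } t\text{-strategy } \mathcal{P}\right\} = \underline{E}_t(f).$$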

2.3.3. Coherent probability protocols

Requirements G1 and G2 for gambling protocols allow the moves, move spaces and gain functions for Sceptic to be just about anything. We now impose further conditions on Sceptic's move spaces and gain functions.

A gambling protocol is called a probability protocol when besides G1 and G2, two more requirements are satisfied.

  P1. For each non-terminal situation $t$, Sceptic's move space $S_t$ is a convex cone in some linear space: $a s_1 + b s_2 \in S_t$ for all non-negative real numbers $a$ and $b$ and all $s_1$ and $s_2$ in $S_t$.

  P2. For each non-terminal situation $t$, Sceptic's gain function $\lambda_t$ has the following linearity property: $\lambda_t(a s_1 + b s_2, w) = a \lambda_t(s_1, w) + b \lambda_t(s_2, w)$ for all non-negative real numbers $a$ and $b$, all $s_1$ and $s_2$ in $S_t$, and all $w$ in $W_t$.

Finally, a probability protocol is called coherent (for a discussion of the use of 'coherent' here, we refer to [29, Appendix C]) when moreover:

  C. For each non-terminal situation $t$ and for each $s$ in $S_t$, there is some $w$ in $W_t$ such that $\lambda_t(s, w) \leq 0$.

It is clear what this last requirement means: in each non-terminal situation $t$, Reality has a strategy for playing from $t$ onwards such that Sceptic can't (strictly) increase his capital from $t$ onwards, whatever $t$-strategy he might use.

For such coherent probability protocols, Shafer and Vovk prove a number of interesting properties for the corresponding lower (and upper) prices. We list a number of them here. For any real $t$-variable $f$, we can associate with a cut $U$ of $t$ another, special, $U$-measurable $t$-variable $\underline{E}_U(f)$ by $\underline{E}_U(f)(\omega) := \underline{E}_u(f)$ for all paths $\omega$ through $t$, where $u$ is the unique situation in $U$ that $\omega$ goes through. For any two real $t$-variables $f$ and $g$, '$f \leq_t g$' is taken to mean that $f(\omega) \leq g(\omega)$ for all paths $\omega$ that go through $t$.

Proposition 1 (Properties of lower and upper prices in a coherent probability protocol [30]).

Consider a coherent probability protocol, let $t$ be a non-terminal situation, $f$ and $g$ real $t$-variables, and $U$ a cut of $t$. Then

  1. $\inf_{\omega \in E(t)} f(\omega) \leq \underline{E}_t(f) \leq \overline{E}_t(f) \leq \sup_{\omega \in E(t)} f(\omega)$ [convexity];

  2. $\underline{E}_t(f + g) \geq \underline{E}_t(f) + \underline{E}_t(g)$ [super-additivity];

  3. $\underline{E}_t(\lambda f) = \lambda \underline{E}_t(f)$ for all real $\lambda \geq 0$ [non-negative homogeneity];

  4. $\underline{E}_t(f + \mu) = \underline{E}_t(f) + \mu$ for all real $\mu$ [constant additivity];

  5. $\underline{E}_t(\mu) = \mu$ for all real $\mu$ [normalisation];

  6. $f \leq_t g$ implies that $\underline{E}_t(f) \leq \underline{E}_t(g)$ [monotonicity];

  7. $\underline{E}_t(f) = \underline{E}_t(\underline{E}_U(f))$ [law of iterated expectation].

What is more, Shafer and Vovk use specific instances of such coherent probability protocols to prove various limit theorems (such as the law of large numbers, the central limit theorem, the law of the iterated logarithm), from which they can derive, as special cases, the well-known measure-theoretic versions. We shall come back to this in Section 6.

The game-theoretic account of probability we have described so far is very general. But it seems to pay little or no attention to beliefs that Sceptic, or other, perhaps additional, players in these games might entertain about how Reality will move through its event tree. This might seem strange, because at least according to the personalist and epistemicist school, probability is all about beliefs. In order to find out how we can incorporate beliefs into the game-theoretic framework, we now turn to Walley's imprecise probability models.

3. Walley’s behavioural approach to probability

In his book on the behavioural theory of imprecise probabilities [34], Walley considers many different types of related uncertainty models. We shall restrict ourselves here to the most general and most powerful one, which also turns out to be the easiest to explain, namely coherent sets of really desirable gambles; see also [36].

Consider a non-empty set $\Omega$ of possible alternatives $\omega$, only one of which actually obtains (or will obtain); we assume that it is possible, at least in principle, to determine which alternative does so. Also consider a subject who is uncertain about which possible alternative actually obtains (or will obtain). A gamble $f$ on $\Omega$ is a real-valued map on $\Omega$, and it is interpreted as an uncertain reward, expressed in units of some predetermined linear utility scale: if $\omega$ actually obtains, then the reward is $f(\omega)$, which may be positive or negative. We use the notation $\mathcal{L}(\Omega)$ for the set of all gambles on $\Omega$. Walley [34] assumes gambles to be bounded. We make no such boundedness assumption here. (The concept of a really desirable gamble (at least formally) allows for such a generalisation, because the coherence axioms for real desirability nowhere hinge on such a boundedness assumption, at least not from a technical mathematical point of view.)

If a subject accepts a gamble $f$, this is taken to mean that she is willing to engage in the transaction where (i) first it is determined which $\omega$ obtains, and (ii) then she receives the reward $f(\omega)$. We can try and model the subject's beliefs about $\Omega$ by considering which gambles she accepts.

3.1. Coherent sets of really desirable gambles

Suppose our subject specifies some set $\mathcal{R} \subseteq \mathcal{L}(\Omega)$ of gambles she accepts, called a set of really desirable gambles. Such a set is called coherent if it satisfies the following rationality requirements:

  D1. if $f < 0$ then $f \notin \mathcal{R}$ [avoiding partial loss];

  D2. if $f > 0$ then $f \in \mathcal{R}$ [accepting partial gain];

  D3. if $f_1$ and $f_2$ belong to $\mathcal{R}$ then their (point-wise) sum $f_1 + f_2$ also belongs to $\mathcal{R}$ [combination];

  D4. if $f$ belongs to $\mathcal{R}$ then its (point-wise) scalar product $\lambda f$ also belongs to $\mathcal{R}$ for all non-negative real numbers $\lambda$ [scaling].

Here '$f > 0$' means '$f \geq 0$ and not $f = 0$', and similarly for '$f < 0$'. Walley has also argued that, besides D1–D4, sets of really desirable gambles should satisfy an additional axiom:

  D5. $\mathcal{R}$ is $\mathcal{B}$-conglomerable for any partition $\mathcal{B}$ of $\Omega$: if $I_B f \in \mathcal{R}$ for all $B \in \mathcal{B}$, then also $f \in \mathcal{R}$ [full conglomerability].

When the set $\Omega$ is finite, all its partitions are finite too, and therefore full conglomerability becomes a direct consequence of the finitary combination axiom D3. But when $\Omega$ is infinite, its partitions may be infinite too, and then full conglomerability is a very strong additional requirement, that is not without controversy. If a model $\mathcal{R}$ is $\mathcal{B}$-conglomerable, this means that certain inconsistency problems when conditioning on elements of $\mathcal{B}$ are avoided; see [34] for more details and examples. Conglomerability of belief models wasn't required by forerunners of Walley, such as Williams [40] (axioms related to D1–D4, but not D5, were actually suggested by Williams for bounded gambles; but it seems that we need at least some weaker form of D5, namely the cut conglomerability D5' considered further on, to derive our main results: Theorems 3 and 6) or de Finetti [11]. While we agree with Walley that conglomerability is a desirable property for sets of really desirable gambles, we do not believe that full conglomerability is always necessary: it seems that we only need to require conglomerability with respect to those partitions that we actually intend to condition our model on. (The view expressed here seems related to Shafer's, as sketched near the end of [25, Appendix 1].) This is the path we shall follow in Section 4.

3.2. Conditional lower and upper previsions

Given a coherent set of really desirable gambles $\mathcal{R}$, we can define conditional lower and upper previsions as follows: for any gamble $f$ and any non-empty subset $B$ of $\Omega$, with indicator $I_B$,

$$\overline{P}(f|B) := \inf\left\{\mu \colon I_B(\mu - f) \in \mathcal{R}\right\}, \qquad (4)$$
$$\underline{P}(f|B) := \sup\left\{\mu \colon I_B(f - \mu) \in \mathcal{R}\right\}, \qquad (5)$$

so $\underline{P}(f|B) = -\overline{P}(-f|B)$, and the lower prevision $\underline{P}(f|B)$ of $f$, conditional on $B$, is the supremum price $\mu$ for which the subject will buy the gamble $f$, i.e., accept the gamble $f - \mu$, contingent on the occurrence of $B$. Similarly, the upper prevision $\overline{P}(f|B)$ of $f$, conditional on $B$, is the infimum price $\mu$ for which the subject will sell the gamble $f$, i.e., accept the gamble $\mu - f$, contingent on the occurrence of $B$.

For any event $A$, we define the conditional lower probability $\underline{P}(A|B) := \underline{P}(I_A|B)$, i.e., the subject's supremum rate for betting on the event $A$, contingent on the occurrence of $B$, and similarly for the conditional upper probability $\overline{P}(A|B) := \overline{P}(I_A|B)$.

We want to stress here that by its definition [Eq. (5)], $\underline{P}(f|B)$ is a conditional lower prevision on what Walley [34, Section 6.1] has called the contingent interpretation: it is a supremum acceptable price for buying the gamble $f$ contingent on the occurrence of $B$, meaning that the subject accepts the contingent gambles $I_B(f - \mu)$, $\mu < \underline{P}(f|B)$, which are called off unless $B$ occurs. This should be contrasted with the updating interpretation for the conditional lower prevision $\underline{P}(f|B)$, which is the subject's present (before the occurrence of $B$) supremum acceptable price for buying $f$ after receiving the information that $B$ has occurred (and nothing else!). Walley's Updating Principle [34, Section 6.1.6], which we shall accept, and use further on in Section 4, (essentially) states that conditional lower previsions should be the same on both interpretations. There is also a third way of looking at a conditional lower prevision $\underline{P}(f|B)$, which we shall call the dynamic interpretation, where $\underline{P}(f|B)$ stands for the subject's supremum acceptable buying price for $f$ after she gets to know that $B$ has occurred. For precise conditional previsions, this last interpretation seems to be the one considered in [13, 23, 24, 29]. It is far from obvious that there should be a relation between the first two interpretations and the third. (In [29], the authors seem to confuse the updating interpretation with the dynamic interpretation when they claim that "[their new understanding of lower and upper previsions] justifies Peter Walley's updating principle".) We shall briefly come back to this distinction in the following sections.

For any partition $\mathcal{B}$ of $\Omega$, we let $\underline{P}(f|\mathcal{B})$ be the gamble on $\Omega$ that in any element $\omega$ of $B$ assumes the value $\underline{P}(f|B)$, where $B$ is any element of $\mathcal{B}$.

The following properties of conditional lower and upper previsions associated with a coherent set of really desirable bounded gambles were (essentially) proved by Walley [34], and by Williams [40]. We give the extension to potentially unbounded gambles:

Proposition 2 (Properties of conditional lower and upper previsions [34]).

Consider a coherent set of really desirable gambles $\mathcal{R}$, let $B$ be any non-empty subset of $\Omega$, and let $f$, $f_1$ and $f_2$ be gambles on $\Omega$. Then (here, as in Proposition 1, we implicitly assume that whatever we write down is well-defined, meaning that, for instance, no sums of $-\infty$ and $+\infty$ appear, and that the gamble $\underline{P}(f|\mathcal{B})$ is real-valued and nowhere infinite; Shafer and Vovk don't seem to mention the need for this):

  1. $\inf_{\omega \in B} f(\omega) \leq \underline{P}(f|B) \leq \overline{P}(f|B) \leq \sup_{\omega \in B} f(\omega)$ [convexity];

  2. $\underline{P}(f_1 + f_2|B) \geq \underline{P}(f_1|B) + \underline{P}(f_2|B)$ [super-additivity];

  3. $\underline{P}(\lambda f|B) = \lambda \underline{P}(f|B)$ for all real $\lambda \geq 0$ [non-negative homogeneity];

  4. $\underline{P}(f + \mu|B) = \underline{P}(f|B) + \mu$ for all real $\mu$ [constant additivity];

  5. $\underline{P}(\mu|B) = \mu$ for all real $\mu$ [normalisation];

  6. $f_1 \leq f_2$ implies that $\underline{P}(f_1|B) \leq \underline{P}(f_2|B)$ [monotonicity];

  7. if $\mathcal{B}$ is any partition of $\Omega$ that refines the partition $\{B, B^c\}$ and $\mathcal{R}$ is $\mathcal{B}$-conglomerable, then $\underline{P}(f|B) \geq \underline{P}(\underline{P}(f|\mathcal{B})|B)$ [conglomerative property].

The analogy between Propositions 1 and 2 is striking, even if there is an equality in Proposition 1.7, where we have only an inequality in Proposition 2.7. (Concatenation inequalities for lower prices do appear in the more general context described in [29].) In the next section, we set out to identify the exact correspondence between the two models. We shall find a specific situation where applying Walley's theory leads to equalities rather than the more general inequalities of Proposition 2.7. (This seems to happen generally for what is called marginal extension in a situation of immediate prediction, meaning that we start out with, and extend, an initial model where we condition on increasingly finer partitions, and where the initial conditional model for any partition deals with gambles that are measurable with respect to the finer partitions; see [34, Theorem 6.7.2] and [20].)

We now show that there can indeed be a strict inequality in Proposition 2.7.

Example 2.

Consider an urn with red, green and blue balls, from which a ball will be drawn at random. Our subject is uncertain about the colour of this ball, so $\Omega = \{\text{red}, \text{green}, \text{blue}\}$. Assume that she assesses that she is willing to bet on this colour being red at rates up to (and including) some fixed rate $q < 1/3$, i.e., that she accepts the gamble $I_{\{\text{red}\}} - q$. Similarly for the other two colours, so she also accepts the gambles $I_{\{\text{green}\}} - q$ and $I_{\{\text{blue}\}} - q$. It is not difficult to prove, using the coherence requirements D1–D4 and Eq. (5), that the smallest coherent set of really desirable gambles $\mathcal{R}$ that includes these assessments consists precisely of the gambles that point-wise dominate some non-negative linear combination of the three accepted gambles.

For the partition $\mathcal{B} = \{\{\text{red}, \text{green}\}, \{\text{blue}\}\}$ (a Daltonist has observed the colour of the ball and tells the subject about it), it follows from Eq. (5) after some manipulations that both sides of the conglomerative property in Proposition 2.7 (taking $B = \Omega$ there) can be computed explicitly, and that for a suitably chosen gamble $f$ we find $\underline{P}(f) > \underline{P}(\underline{P}(f|\mathcal{B}))$: the inequality is then indeed strict.
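A numerical illustration of Example 2 (our own sketch, not the paper's computation): for finite $\Omega$ and finitely many accepted gambles, Eq. (5) becomes a linear program in the price $\mu$ and the non-negative coefficients $\lambda_k$ of the combinations allowed by D3 and D4. We assume, purely for illustration, the betting rate $q = 1/4$ on each colour; the snippet needs `scipy`.

```python
import numpy as np
from scipy.optimize import linprog

OMEGA = ["red", "green", "blue"]
Q = 0.25  # assumed betting rate on each colour (illustrative only)

# The accepted gambles I_{colour} - Q, as vectors over OMEGA.
GAMBLES = [np.array([1.0 if w == c else 0.0 for w in OMEGA]) - Q
           for c in OMEGA]

def lower_prevision(f, B=OMEGA):
    """Eq. (5) as an LP: the largest mu such that I_B(f - mu) dominates
    some non-negative linear combination of the accepted gambles."""
    f = np.asarray(f, dtype=float)
    ib = np.array([1.0 if w in B else 0.0 for w in OMEGA])
    # Variables x = (mu, lambda_1, ..., lambda_n) with lambdas >= 0;
    # one constraint per omega:
    #   sum_k lambda_k g_k(omega) + I_B(omega) mu <= I_B(omega) f(omega).
    A_ub = np.column_stack([ib] + GAMBLES)
    res = linprog(c=[-1.0] + [0.0] * len(GAMBLES),
                  A_ub=A_ub, b_ub=ib * f,
                  bounds=[(None, None)] + [(0, None)] * len(GAMBLES))
    assert res.success
    return res.x[0]

f = [1.0, -1.0, 0.0]  # a gamble on the colour of the ball
print(lower_prevision(f))                      # approximately -0.25
print(lower_prevision(f, B=["red", "green"]))  # approximately -1/3
```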

The difference between infimum selling and supremum buying prices for gambles represents the imprecision present in our subject's belief model. If we look at the inequalities in Proposition 2.1, we are led to consider two extreme cases. One extreme maximises the 'degrees of imprecision' by letting $\underline{P}(f|B) = \inf_{\omega \in B} f(\omega)$ and $\overline{P}(f|B) = \sup_{\omega \in B} f(\omega)$. This leads to the so-called vacuous model, corresponding to $\mathcal{R} = \{f \in \mathcal{L}(\Omega) \colon f > 0\}$, and intended to represent complete ignorance on the subject's part.

The other extreme minimises the degrees of imprecision by letting $\underline{P}(f|B) = \overline{P}(f|B)$ everywhere. The common value $P(f|B)$ is then called the prevision, or fair price, for $f$ conditional on $B$. We call the corresponding functional $P(\cdot|B)$ a (conditional) linear prevision. Linear previsions are the precise probability models considered by de Finetti [11]. They of course have all the properties of lower and upper previsions listed in Proposition 2, with equality rather than inequality for statements 2 and 7. The restriction of a linear prevision to (indicators of) events is a finitely additive probability measure.

4. Connecting the two approaches

In order to lay bare the connections between the game-theoretic and the behavioural approach, we enter Shafer and Vovk's world, and consider another player, called Forecaster, who, in the initial situation $\square$, has certain piece-wise beliefs about what moves Reality will make.

4.1. Forecaster’s local beliefs

More specifically, for each non-terminal situation $t$, she has beliefs (in situation $\square$) about which move Reality will choose from the set $W_t$ of moves available to him if he gets to $t$. We suppose she represents those beliefs in the form of a coherent set $\mathcal{R}_t$ of really desirable gambles on $W_t$. (Since we don't immediately envisage conditioning this local model on subsets of $W_t$, we impose no extra conglomerability requirements here, only the coherence conditions D1–D4.) These beliefs are conditional on the updating interpretation, in the sense that they represent Forecaster's beliefs in situation $\square$ about what Reality will do immediately after he gets to situation $t$. We call any specification of such coherent $\mathcal{R}_t$, for all non-terminal situations $t$, an immediate prediction model for Forecaster. We want to stress here that $\mathcal{R}_t$ should not be interpreted dynamically, i.e., as a set of gambles on $W_t$ that Forecaster accepts in situation $t$.

We shall generally call an event tree, provided with local predictive belief models in each of the non-terminal situations $t$, an imprecise probability tree. These local belief models may be coherent sets of really desirable gambles $\mathcal{R}_t$. But they can also be lower previsions (perhaps derived from such sets $\mathcal{R}_t$). When all such local belief models are precise previsions, or equivalently (finitely additive) probability measures, we simply get a probability tree in Shafer's [27, Chapter 3] sense.
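As a concrete (and entirely illustrative) data structure, an imprecise probability tree can be coded as an event tree whose non-terminal situations each carry a local lower prevision on the move space. Below, every node of the two-coin tree gets the same assumed local model of our own choosing: the probability of heads is only known to lie in the interval $[0.25, 0.75]$.

```python
# The two-coin event tree, with an assumed local model at each node.
NONTERMINAL = ("", "h", "t")
MOVES = ("h", "t")

def local_lower_prevision(t, gamble):
    """Lower expectation, in situation t, of a gamble on the moves,
    minimised over all heads-probabilities p in [0.25, 0.75]; by
    linearity the minimum is attained at an endpoint."""
    return min(p * gamble["h"] + (1 - p) * gamble["t"]
               for p in (0.25, 0.75))

# A bet paying 1 on heads has lower prevision 0.25 and, by conjugacy,
# upper prevision 0.75, in every non-terminal situation.
print(local_lower_prevision("", {"h": 1.0, "t": 0.0}))    # 0.25
print(-local_lower_prevision("", {"h": -1.0, "t": 0.0}))  # 0.75
```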

4.2. From local to global beliefs

We can now ask ourselves what the behavioural implications of these conditional assessments in the immediate prediction model are. For instance, what do they tell us about whether or not Forecaster should accept certain gambles (in Shafer and Vovk's language, gambles are real variables) on $\Omega$, the set of possible paths for Reality? In other words, how can these beliefs (in $\square$) about which next move Reality will make in each non-terminal situation $t$ be combined coherently into beliefs (in $\square$) about Reality's complete sequence of moves?

In order to investigate this, we use Walley's very general and powerful method of natural extension, which is just conservative coherent reasoning. We shall construct, using the local pieces of information $\mathcal{R}_t$, a set of really desirable gambles on $\Omega$ for Forecaster in situation $\square$ that is (i) coherent, and (ii) as small as possible, meaning that no more gambles should be accepted than is actually required by coherence.

4.2.1. Collecting the pieces

Consider any non-terminal situation $t$ and any gamble $g$ in $\mathcal{R}_t$. With $g$ we can associate a $t$-gamble (just as for variables, we can define a $t$-gamble as a partial gamble whose domain includes $E(t)$), also denoted by $g$, and defined by

$$g(\omega) := g(w) \quad \text{for all } \omega \in E(t),$$

where we denote by $w$ the unique element of $W_t$ such that $tw \sqsubseteq \omega$. The $t$-gamble $g$ is $U$-measurable for any cut $U$ of $t$ that is non-trivial, i.e., such that $U \neq \{t\}$. This implies that we can interpret $g$ as a map on $U$. In fact, we shall even go further, and associate with the gamble $g$ on $W_t$ a $t$-process, also denoted by $g$, by letting $g(s) := g(\omega)$ for any situation $s$ that strictly follows $t$, where $\omega$ is any terminal situation that follows $s$; see also Fig. 4.

Figure 4. In a non-terminal situation $t$, we consider a gamble $g$ on Reality's move space $W_t$ that Forecaster accepts, and turn it into a process, also denoted by $g$. The values $g(s)$ in situations $s$ are indicated by curly arrows.

$I_{E(t)}g$ represents the gamble on $\Omega$ that is called off unless Reality ends up in situation $t$, and which, when it isn't called off, depends only on Reality's move immediately after $t$, and gives the same value $g(w)$ to all paths that go through $tw$. The fact that Forecaster, in situation $\square$, accepts $g$ on $W_t$ conditional on Reality's getting to $t$ translates immediately to the fact that Forecaster accepts the contingent gamble $I_{E(t)}g$ on $\Omega$, by Walley's Updating Principle. We thus end up with a set

$$\mathcal{D} := \left\{I_{E(t)}g \colon t \in \Omega^\lozenge \setminus \Omega \text{ and } g \in \mathcal{R}_t\right\}$$

of gambles on $\Omega$ that Forecaster accepts in situation $\square$.
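In code, the lift from a local gamble on $W_t$ to the contingent gamble $I_{E(t)}g$ on $\Omega$ is a one-liner; the sketch below (ours, on the two-coin tree) makes the construction explicit.

```python
PATHS = ["hh", "ht", "th", "tt"]

def contingent_gamble(t, g):
    """I_E(t) * g: zero off E(t); on E(t) it pays according to
    Reality's move immediately after the non-terminal situation t."""
    return {omega: (g[omega[len(t)]] if omega.startswith(t) else 0.0)
            for omega in PATHS}

# A bet on 'heads' for the second flip, contingent on the first being 'h'.
print(contingent_gamble("h", {"h": 0.75, "t": -0.25}))
# {'hh': 0.75, 'ht': -0.25, 'th': 0.0, 'tt': 0.0}
```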

The only thing left to do now is to find the smallest coherent set $\mathcal{E}_\square$ of really desirable gambles that includes $\mathcal{D}$ (if indeed there is any such coherent set). Here we take coherence to refer to conditions D1–D4, together with D5', a variation on D5 which refers to conglomerability with respect to those partitions that we actually intend to condition on, as suggested in Section 3.

4.2.2. Cut conglomerability

These partitions are what we call cut partitions. Consider any cut $U$ of the initial situation $\square$. The set of events $\mathcal{B}_U := \{E(u) \colon u \in U\}$ is a partition of $\Omega$, called the $U$-partition. D5' requires that our set of really desirable gambles should be cut conglomerable, i.e., conglomerable with respect to every cut partition $\mathcal{B}_U$. (Again, when all of Reality's move spaces $W_t$ are finite, cut conglomerability (D5') is a consequence of D3, and therefore needs no extra attention. But when some or all move spaces are infinite, then a cut may contain an infinite number of elements, and the corresponding cut partition will then be infinite too, making cut conglomerability a non-trivial additional requirement.)

Why do we only require conglomerability for cut partitions? Simply because we are interested in predictive inference: we eventually will want to find out about the gambles on $\Omega$ that Forecaster accepts in situation $\square$, conditional (contingent) on Reality getting to a situation $t$. This is related to finding lower previsions for Forecaster conditional on the corresponding events $E(t)$. A collection of such events $\{E(u) \colon u \in U\}$ constitutes a partition of the sample space $\Omega$ if and only if $U$ is a cut of $\square$.

Because we require cut conglomerability, it follows in particular that $\mathcal{E}_\square$ will contain the sums of gambles $\sum_{u \in U} I_{E(u)}g_u$ for all non-terminal cuts $U$ of $\square$ and all choices of $g_u \in \mathcal{R}_u$, $u \in U$. This is because $I_{E(u)}\left[I_{E(u)}g_u\right] = I_{E(u)}g_u \in \mathcal{D}$ for all $u \in U$. Because moreover $\mathcal{E}_\square$ should be a convex cone [by D3 and D4], any sum of such sums over a finite number of non-terminal cuts should also belong to $\mathcal{E}_\square$. But, since in the case of the bounded protocols we are discussing here, Reality can only make a bounded and finite number of moves, the set $\Omega^\lozenge \setminus \Omega$ of non-terminal situations is a finite union of such non-terminal cuts, and therefore the sums $\sum_{t \in \Omega^\lozenge \setminus \Omega} I_{E(t)}g_t$ should belong to $\mathcal{E}_\square$ for all choices $g_t \in \mathcal{R}_t$, $t \in \Omega^\lozenge \setminus \Omega$.

4.2.3. Selections and gamble processes

Consider any non-terminal situation $t$, and call $t$-selection any partial process $\mathcal{S}$ defined on the non-terminal situations $s$ that follow $t$, such that $\mathcal{S}(s) \in \mathcal{R}_s$. With a $t$-selection $\mathcal{S}$, we associate a $t$-process $\mathcal{I}^{\mathcal{S}}$, called a gamble process, where

$$\mathcal{I}^{\mathcal{S}}(s) := \sum_{t \sqsubseteq u \sqsubset s} \mathcal{S}(u)(s) \qquad (6)$$

in all situations $s \sqsupseteq t$, where every selected gamble $\mathcal{S}(u)$ is interpreted as a $u$-process, as explained in Section 4.2.1; see also Fig. 5. Alternatively, $\mathcal{I}^{\mathcal{S}}$ is given by the recursion relation

$$\mathcal{I}^{\mathcal{S}}(sw) = \mathcal{I}^{\mathcal{S}}(s) + \mathcal{S}(s)(w), \quad w \in W_s,$$

for all non-terminal $s \sqsupseteq t$, with initial value $\mathcal{I}^{\mathcal{S}}(t) = 0$. In particular, this leads to the $t$-gamble $\mathcal{I}^{\mathcal{S}}_\Omega$, defined on all terminal situations $\omega$ that follow $t$ by letting

$$\mathcal{I}^{\mathcal{S}}_\Omega(\omega) := \sum_{t \sqsubseteq u \sqsubset \omega} \mathcal{S}(u)(\omega). \qquad (7)$$

Then we have just argued that the gambles $I_{E(t)}\mathcal{I}^{\mathcal{S}}_\Omega$ should belong to $\mathcal{E}_\square$ for all non-terminal situations $t$ and all $t$-selections $\mathcal{S}$. As before for strategy and capital processes, we call a $\square$-selection simply a selection, and a $\square$-gamble process simply a gamble process.

Figure 5. The $\square$-selection $\mathcal{S}$ in this event tree is a process defined in the two non-terminal situations of the tree; it selects, in each of these situations, a really desirable gamble for Forecaster. The values of the corresponding gamble process $\mathcal{I}^{\mathcal{S}}$ are indicated by curly arrows.
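The recursion for $\mathcal{I}^{\mathcal{S}}$ mirrors the capital-process recursion of Section 2.3.1. The sketch below (ours, with an assumed immediate prediction model on the two-coin tree) accumulates the selected gambles along every path.

```python
MOVES = ("h", "t")
NONTERMINAL = ("", "h", "t")

# An assumed immediate prediction model: in every non-terminal situation
# Forecaster accepts bets on heads and on tails, both at rate 0.25.
bet_on_heads = {"h": 0.75, "t": -0.25}
bet_on_tails = {"h": -0.25, "t": 0.75}

def gamble_process(selection, s="", acc=0.0):
    """I^S: accumulate the selected local gambles along the tree, via
    I^S(sw) = I^S(s) + S(s)(w), with I^S(root) = 0."""
    values = {s: acc}
    if s in NONTERMINAL:
        for w in MOVES:
            values.update(gamble_process(selection, s + w,
                                         acc + selection[s][w]))
    return values

# A selection: bet on heads first, then on a repeat of the first flip.
selection = {"": bet_on_heads, "h": bet_on_heads, "t": bet_on_tails}
I = gamble_process(selection)
print({omega: I[omega] for omega in ("hh", "ht", "th", "tt")})
# {'hh': 1.5, 'ht': 0.5, 'th': -0.5, 'tt': 0.5}
```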

4.2.4. The Marginal Extension Theorem

It is now but a technical step to prove Theorem 3 below. It is a significant generalisation, in terms of sets of really desirable gambles rather than coherent lower previsions (the difference in language may obscure that this is indeed a generalisation; but see Theorem 7 for expressions in terms of predictive lower previsions that should make the connection much clearer), of the Marginal Extension Theorem first proved by Walley [34, Theorem 6.7.2], and subsequently extended by De Cooman and Miranda [20].

Theorem 3 (Marginal Extension Theorem).

There is a smallest set of gambles $\mathcal{E}_\square$ that satisfies D1–D4 and D5' and includes $\mathcal{D}$. This natural extension of $\mathcal{D}$ is given by

$$\mathcal{E}_\square = \left\{f \in \mathcal{L}(\Omega) \colon f \geq \mathcal{I}^{\mathcal{S}}_\Omega \text{ for some selection } \mathcal{S}\right\}.$$

Moreover, for any non-terminal situation $t$ and any $t$-gamble $f$, it holds that $I_{E(t)}f \in \mathcal{E}_\square$ if and only if there is some $t$-selection $\mathcal{S}$ such that $f \geq_t \mathcal{I}^{\mathcal{S}}_\Omega$, where, as before, '$\geq_t$' is taken to mean that the inequality holds in all terminal situations that follow $t$.

4.3. Predictive lower and upper previsions

We now use the coherent set $\mathcal{E}_\square$ of really desirable gambles to define special lower previsions for Forecaster in situation $\square$, conditional on an event $E(t)$, i.e., on Reality getting to situation $t$, as explained in Section 3. (We stress again that these are conditional lower previsions on the contingent/updating interpretation.) We shall call such conditional lower previsions predictive lower previsions. We then get, using Eq. (5) and Theorem 3, that for any non-terminal situation $t$,

$$\underline{P}(f|t) = \sup\left\{\mu \colon I_{E(t)}(f - \mu) \in \mathcal{E}_\square\right\} \qquad (8)$$
$$\phantom{\underline{P}(f|t)} = \sup\left\{\mu \colon f - \mu \geq_t \mathcal{I}^{\mathcal{S}}_\Omega \text{ for some } t\text{-selection } \mathcal{S}\right\}. \qquad (9)$$

We also use the notation $\underline{P}(f) := \underline{P}(f|\square)$. It should be stressed that Eq. (8) is also valid in terminal situations $t$, whereas Eq. (9) clearly isn't.

Besides the properties in Proposition 2, which hold in general for conditional lower and upper previsions, the predictive lower (and upper) previsions we consider here also satisfy a number of additional properties, listed in Propositions 4 and 5.

Proposition 4 (Additional properties of predictive lower and upper previsions).

Let $t$ be any situation, and let $f$, $f_1$ and $f_2$ be gambles on $\Omega$.

  1. If $t$ is a terminal situation $\omega$, then $\underline{P}(f|\omega) = f(\omega)$;

  2. $\underline{P}(f|t) = \underline{P}(I_{E(t)}f|t)$ and $\overline{P}(f|t) = \overline{P}(I_{E(t)}f|t)$;

  3. $f_1 \leq f_2$ (on $E(t)$) implies that $\underline{P}(f_1|t) \leq \underline{P}(f_2|t)$ [monotonicity].

Before we go on, there is an important point that must be stressed and clarified. It is an immediate consequence of Proposition 4.2 that when $f_1$ and $f_2$ are any two gambles that coincide on $E(t)$, then $\underline{P}(f_1|t) = \underline{P}(f_2|t)$. This means that $\underline{P}(\cdot|t)$ is completely determined by the values that gambles assume on $E(t)$, and it allows us to define $\underline{P}(f|t)$ for gambles $f$ that are only necessarily defined on $E(t)$, i.e., for $t$-gambles. We shall do so freely in what follows.

For any cut $U$ of a situation $t$, we may define the $t$-gamble $\underline{P}(f|U)$ as the gamble that assumes the value $\underline{P}(f|u)$ in any $\omega \in E(t)$, where $u$ is the unique element of $U$ such that $u \sqsubseteq \omega$. This $t$-gamble is $U$-measurable by construction, and it can be considered as a gamble on $U$.

Proposition 5 (Separate coherence).

Let $t$ be any situation, let $U$ be any cut of $t$, and let $f$ and $g$ be $t$-gambles, where $g$ is $U$-measurable.

  1. $\underline{P}(g|u) = g(u)$ for all $u$ in $U$;

  2. $\underline{P}(g|t) = \underline{P}(g_U|t)$;

  3. $\underline{P}(f + g|u) = \underline{P}(f|u) + g(u)$ for all $u$ in $U$;

  4. if $g$ is moreover non-negative, then $\underline{P}(gf|u) = g(u)\,\underline{P}(f|u)$ for all $u$ in $U$.

4.4. Correspondence between immediate prediction models and coherent probability protocols

There appears to be a close correspondence between the expressions [such as Eq. (3)] for lower prices associated with coherent probability protocols and those [such as Eq. (9)] for the predictive lower previsions based on an immediate prediction model. Say that a given coherent probability protocol and a given immediate prediction model match whenever they lead to identical corresponding lower prices $\underline{E}_t$ and predictive lower previsions $\underline{P}(\cdot|t)$ for all non-terminal $t$.

The following theorem marks the culmination of our search for the correspondence between Walley’s, and Shafer and Vovk’s approaches to probability theory.

Theorem 6 (Matching Theorem).

For every coherent probability protocol there is an immediate prediction model such that the two match, and conversely, for every immediate prediction model there is a coherent probability protocol such that the two match.

The ideas underlying the proof of this theorem should be clear. If we have a coherent probability protocol with move spaces $S_t$ and gain functions $\lambda_t$ for Sceptic, define the immediate prediction model for Forecaster to be (essentially) $\mathcal{R}_t := \{-\lambda_t(s, \cdot) \colon s \in S_t\}$. If, conversely, we have an immediate prediction model for Forecaster consisting of the sets $\mathcal{R}_t$, define the move spaces for Sceptic by $S_t := \mathcal{R}_t$, and his gain functions by $\lambda_t(f, w) := -f(w)$ for all $f$ in $\mathcal{R}_t$ and $w$ in $W_t$. We discuss the interpretation of this correspondence in more detail in Section 5.

4.5. Calculating predictive lower previsions using backwards recursion

The Marginal Extension Theorem allows us to calculate the most conservative global belief model $\mathcal{E}_\square$ that corresponds to the local immediate prediction models $\mathcal{R}_t$. Here beliefs are expressed in terms of sets of really desirable gambles. Can we derive a result that allows us to do something similar for the corresponding lower previsions?
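Anticipating the marginal extension results referred to above (Theorem 7 gives the formal statement in terms of predictive lower previsions), here is a sketch of ours of how such a backwards recursion can look in code, on the two-coin tree with the interval local models assumed earlier: the lower prevision of a gamble on the paths is obtained by applying the local lower previsions from the leaves back to the root. The recursion is our illustration of the general idea, not the paper's statement.

```python
MOVES = ("h", "t")
NONTERMINAL = ("", "h", "t")

def local_lpr(t, gamble):
    """Assumed local model in situation t: the probability of heads is
    only known to lie in [0.25, 0.75]; lower expectation over it."""
    return min(p * gamble["h"] + (1 - p) * gamble["t"]
               for p in (0.25, 0.75))

def predictive_lpr(f, t=""):
    """Lower prevision of a gamble f on the paths, by backwards
    recursion: at a path it is f itself; in a non-terminal situation
    it is the local lower prevision of the children's values."""
    if t not in NONTERMINAL:
        return f[t]    # terminal situation, i.e. a path omega
    return local_lpr(t, {w: predictive_lpr(f, t + w) for w in MOVES})

# Lower prevision of the number of heads in two flips: 0.5 = 2 * 0.25.
heads_count = {"hh": 2.0, "ht": 1.0, "th": 1.0, "tt": 0.0}
print(predictive_lpr(heads_count))   # 0.5
```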

To see what this question entails, first consider a local model $\mathcal{R}_t$: a set of really desirable gambles on $W_t$, where