A betting interpretation for probabilities and Dempster-Shafer degrees of belief

01/11/2010 ∙ by Glenn Shafer, et al.

There are at least two ways to interpret numerical degrees of belief in terms of betting: (1) you can offer to bet at the odds defined by the degrees of belief, or (2) you can judge that a strategy for taking advantage of such betting offers will not multiply the capital it risks by a large factor. Both interpretations can be applied to ordinary additive probabilities and used to justify updating by conditioning. Only the second can be applied to Dempster-Shafer degrees of belief and used to justify Dempster's rule of combination.

1 Introduction

The meaning of numerical probability has long been a matter of contention. Siméon Denis Poisson (1781–1840) distinguished between objective and subjective probabilities [12]. One recent philosophical introduction to probability lists five competing interpretations: classical, frequency, propensity, logical, and subjective [8].

The classical and subjective interpretations both involve betting. In the classical interpretation, the probability of an event is the correct price for a payoff that will equal one monetary unit if the event happens and zero otherwise. In the subjective interpretation, it is the price an individual is willing to pay for this payoff.

This article explains another betting interpretation of probability. Here I call it the Ville interpretation, in recognition of Jean André Ville (1910–1989), who first formulated it in his book on collectives [22]. Probabilities are prices under the Ville interpretation, just as they are under the classical and subjective interpretations. But instead of asserting that these prices are correct in some unspecified sense (as in the classical interpretation) or that some individual will pay them (as in the subjective interpretation), we assert that no strategy for taking advantage of them will multiply the capital it risks by a large factor. The Ville interpretation derives from an older interpretation of probability, neglected in the English-language literature, which I call the Cournot interpretation after Antoine Augustin Cournot (1801–1877). According to the Cournot interpretation, the meaning of a probabilistic theory lies in the predictions that it makes with high probability.

As I explain in this article, the Ville interpretation can be applied both to ordinary additive probabilities and to the non-additive degrees of belief of the Dempster-Shafer calculus of belief functions. It works for Dempster-Shafer degrees of belief in ways that the subjective interpretation does not.

2 The Ville interpretation

This section reviews how the Ville interpretation emerges from older ideas and how it extends probability theory beyond its classical domain to games where the probabilities given and prices offered fall short of defining a probability distribution for all events of interest. In Section 2.1, I briefly review the history of the Cournot interpretation of ordinary probabilities. In Section 2.2, I explain how the Ville interpretation is related to the Cournot interpretation. In Section 2.3, I illustrate the power of the Ville interpretation using the example of probability forecasting, and in Section 2.4, I explain its role more generally in game-theoretic probability.

2.1 Cournot

The standard procedure for testing a probabilistic theory involves picking out an event to which the theory gives very small probability: we reject the theory if the event happens. In fact, this seems to be the only way to test a probabilistic theory. Because Cournot was the first to state that mathematical probability makes contact with phenomena only by ruling out events given very small probability ([3], p. 58), the prediction that

an event of very small probability will not happen (1)

has been called Cournot’s principle. In the first half of the twentieth century, many European scholars, including Émile Borel, Paul Lévy, Maurice Fréchet, and Andrei Kolmogorov, contended that Cournot’s principle is fundamental to the meaning and use of mathematical probability [20]. As Borel said, we evoke “the only law of chance” when we single out an event of very small probability and predict it will not happen. (Or when, equivalently, we single out an event of very high probability and predict that it will happen.) Let us call the thesis that such predictions constitute the meaning of probability the Cournot interpretation of probability.

Cournot, Fréchet, and Kolmogorov are often called frequentists. This is misleading. These authors did believe that the probability of an event will be approximated by the frequency with which it happens in independent trials, but they considered this “law of large numbers” a consequence of Cournot’s principle together with Bernoulli’s theorem, which gives very high probability to the approximation holding. The true frequentists, such as John Venn, saw no sense in Bernoulli’s theorem; probability is frequency, they believed, and so it is silly to try to prove that frequency will approximate probability [21].

Of course, events of very small probability do happen. An experiment may have a very large number of possible outcomes, each of which has very small probability, and one of which must happen. So Cournot’s principle makes sense only if we are talking about particular events of very small probability that are salient for some reason: perhaps because they are so simple, perhaps because they have high probability under a plausible alternative hypothesis, or perhaps simply because they were specified in advance. There may be a substantial number of events that are salient in this way, but this is not a problem if we set our threshold for small probability low enough, because the disjunction of a number of events with very small probability will still have reasonably small probability.

In order to put the Cournot interpretation into practice, we must also decide how small a probability we can neglect. This evidently depends on the context. Borel distinguished between what was negligible at the human level, at the terrestrial level, and at the cosmic level [2].

In using the Cournot interpretation, we must also bear in mind its role in testing and giving meaning to a probabilistic theory as a whole. Strictly speaking, it gives direct meaning only to probabilities that are very small (the event will not happen) or very large (the event will happen). It gives no meaning to a probability of $1/2$, say. But when a probabilistic theory says that many successive events are independent and all have probability $1/2$, it gives probabilities close to one for many aspects of this sequence of events. Probabilistic theories in which probabilities evolve (stochastic processes) also give probabilities close to one to many statements concerning what happens over time, so they can also be tested and acquire meaning by Cournot’s principle.
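As a rough numerical illustration (a sketch, not part of the original argument), the Python snippet below computes exactly, for an assumed run of $n$ independent events each of probability $1/2$, the probability that the relative frequency of happenings lies within $1/20$ of $1/2$. The band width and the values of $n$ are arbitrary choices. The probability climbs toward one as $n$ grows, which is the kind of statement to which Cournot’s principle can give meaning.

```python
from fractions import Fraction
from math import comb

def prob_frequency_near_half(n, eps=Fraction(1, 20)):
    """Exact probability that k/n is within eps of 1/2 when k ~ Binomial(n, 1/2)."""
    favorable = sum(comb(n, k) for k in range(n + 1)
                    if abs(Fraction(k, n) - Fraction(1, 2)) <= eps)
    return Fraction(favorable, 2 ** n)

for n in (10, 100, 1000):
    print(n, float(prob_frequency_near_half(n)))
```

Exact rational arithmetic is used so that boundary cases such as $k/n = 11/20$ are not lost to floating-point rounding.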

Although it was widely accepted in continental Europe in the middle of the twentieth century, the Cournot interpretation never gained a significant foothold in the English-language literature, and awareness of it receded as English became the language of science and mathematics after World War II. We find only isolated affirmations of it after about 1970. In the article on probability in the Soviet Mathematical Encyclopedia, for example, we find the assertion that only probabilities close to zero or one have empirical meaning [13]. For more on the history of the Cournot interpretation see [9, 10, 11, 16, 20].

2.2 From Cournot to Ville

When a probability distribution is used to set betting odds, there is a well-known relationship between the happening of events of small probability and the success of betting strategies. The event that a given betting strategy multiplies the capital it risks by $1/\alpha$ or more has probability $\alpha$ or less. Conversely, for every event of probability $\alpha$ or less there is a bet that multiplies the capital it risks by $1/\alpha$ or more if the event happens. So it is natural to consider, as an alternative to Cournot’s principle, the principle that

a strategy will not multiply the capital it risks by a large factor. (2)

Let us call this Ville’s principle. Let us call the thesis that predictions of the form (2) constitute the meaning of probability the Ville interpretation of probability.
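The passage from an event to a bet can be made concrete. In the Python sketch below, the outcome space, the distribution $P$ (a fair die), and the event $E$ are hypothetical. Spending the whole initial capital on tickets that pay 1 if $E$ happens multiplies that capital by $1/P(E)$, hence by at least $1/\alpha$ when $P(E) \le \alpha$, and loses it otherwise. Because the tickets are priced by $P$, the expected final capital equals the initial capital.

```python
from fractions import Fraction

# Hypothetical distribution P: a fair die.
P = {face: Fraction(1, 6) for face in range(1, 7)}

def all_or_nothing(event, initial_capital, outcome):
    """Final capital after spending all initial capital on tickets paying 1 if event happens."""
    price = sum(P[w] for w in event)          # each ticket costs P(event)
    tickets = initial_capital / price
    return tickets if outcome in event else Fraction(0)

E = {6}                    # P(E) = 1/6
alpha = Fraction(1, 5)     # so P(E) <= alpha
finals = {w: all_or_nothing(E, Fraction(1), w) for w in P}
print(finals[6], finals[1])                  # 6 0
print(sum(P[w] * finals[w] for w in P))      # expected final capital: 1
```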

Ville’s principle is equivalent to Cournot’s principle whenever a probability distribution is given for the events being considered and the two principles are made specific, with the specific event and small probability mentioned in Cournot’s principle matching the specific strategy and large factor mentioned in Ville’s principle. But when the two principles are considered more abstractly, without $\alpha$ and the particular event or strategy being specified, they differ in two important respects:

  1. Ville’s principle gives us more guidance than Cournot’s principle. It tells us to specify a strategy for betting, not merely a single event of small probability. We found it necessary to elaborate Cournot’s principle by saying that the event of very small probability should be specified in advance. The corresponding coda for Ville’s principle is also needed, but it is less easily overlooked, because a betting strategy cannot be implemented unless it is specified in advance.

  2. Ville’s principle has a broader scope than Cournot’s principle. Cournot’s principle applies only when there is a probability distribution for the events under discussion. Ville’s principle applies whenever prices for gambles are given, even if these prices fall short of defining probabilities for events.

To see some of the implications of Ville’s principle giving us more guidance, consider how testing is usually implemented. A test of a probabilistic theory usually begins with a test statistic, say $T(y)$, where $y$ is an outcome that is to be observed. If the theory specifies a probability distribution $P$ for $y$, then we reject the theory at the significance level $\alpha$ when we observe a value $y$ such that

$$T(y) \ge c,$$

where $c$ is a number such that $P(T(y) \ge c) = \alpha$. Ville’s principle tells us to implement this idea in a particular way: our test statistic is the capital $\mathcal{K}(y)$ achieved by a specified betting strategy that starts with some initial capital $\mathcal{K}_0$ and does not risk losing more than $\mathcal{K}_0$. We reject the theory at the significance level $\alpha$ when we observe a value $y$ such that

$$\frac{\mathcal{K}(y)}{\mathcal{K}_0} \ge \frac{1}{\alpha}.$$

Markov’s inequality tells us that $P\big(\mathcal{K}(y)/\mathcal{K}_0 \ge 1/\alpha\big) \le \alpha$.¹

¹ In general, Markov’s inequality says that a nonnegative random variable $x$ satisfies $P(x \ge k) \le E(x)/k$ for every $k > 0$. Because the betting strategy uses the odds set by $P$, the expected value of the final capital $\mathcal{K}(y)$ is the initial capital $\mathcal{K}_0$. Because the strategy risks only the initial capital, the final capital cannot be negative.
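The sketch below checks this reasoning by brute force in a small assumed setting: $P$ makes ten coin tosses independent with probability $1/2$ for heads, the test uses $\alpha = 1/20$, and a hypothetical strategy buys, in each round, as many tickets on heads (price $1/2$, payoff 1) as its current capital. Enumerating all $2^{10}$ outcomes confirms that the capital never goes negative, that the expected final capital equals $\mathcal{K}_0 = 1$, and that the rejection probability is at most $\alpha$.

```python
from fractions import Fraction
from itertools import product

P_HEADS = Fraction(1, 2)   # P: independent tosses, heads with probability 1/2
N = 10
ALPHA = Fraction(1, 20)
K0 = Fraction(1)

def final_capital(tosses, k0=K0):
    """Capital after buying as many heads tickets as the current capital, each round."""
    k = k0
    for y in tosses:                 # y = 1 for heads, 0 for tails
        m = k                        # heads gives 3/2 of k, tails leaves 1/2 of k
        k = k + m * (y - P_HEADS)
    return k

expected = Fraction(0)
prob_reject = Fraction(0)
for tosses in product((0, 1), repeat=N):
    prob = P_HEADS ** N              # each sequence has probability 2**-N under P
    k = final_capital(tosses)
    assert k >= 0                    # the strategy never loses more than K0
    expected += prob * k
    if k / K0 >= 1 / ALPHA:          # rejection region K(y)/K0 >= 1/alpha
        prob_reject += prob

print(expected, prob_reject)         # 1 1/1024
```

Only the all-heads sequence reaches $\mathcal{K}(y) \ge 20$ here, so the rejection probability is $1/1024$, well under the Markov bound $\alpha$.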

When we adopt a betting strategy with which to test a probability distribution $P$, we are implicitly specifying an alternative hypothesis $Q$ that we can plausibly adopt if we reject $P$. To see that this is so, let us suppose, for simplicity, that $\mathcal{K}_0 = 1$ (the strategy risks one unit of capital), and that there are only finitely or countably many possible values for $y$. In this case, we can define $Q$ by

$$Q(y) := \mathcal{K}(y)\,P(y). \tag{3}$$

It is easy to see that $Q$ is a probability distribution: (i) $Q(y) \ge 0$ because $P(y)$ is a probability and $\mathcal{K}(y)$ is the final capital for a betting strategy that does not risk its capital becoming negative, and (ii) $\sum_y Q(y) = \sum_y \mathcal{K}(y)\,P(y) = 1$ because this sum is the expected payoff under $P$ of a gamble that costs one unit. Equation (3) tells us that the final capital $\mathcal{K}(y)$ is the likelihood ratio $Q(y)/P(y)$, a measure of how much the observed outcome $y$ favors $Q$ over $P$.
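Equation (3) can be checked numerically. The sketch below assumes the same illustrative setting as before (ten fair tosses under $P$, and a hypothetical strategy that buys as many heads tickets at price $1/2$ as its current capital, starting from $\mathcal{K}_0 = 1$), forms $Q(y) = \mathcal{K}(y)P(y)$, and verifies that $Q$ is a probability distribution. For this particular strategy, $Q$ turns out to make the tosses independent with heads probability $3/4$.

```python
from fractions import Fraction
from itertools import product

P_HEADS = Fraction(1, 2)   # P: ten independent tosses, heads with probability 1/2
N = 10

def final_capital(tosses):
    """K(y) for the strategy that buys K_{n-1} tickets on heads at price 1/2 each round."""
    k = Fraction(1)                       # K_0 = 1: the strategy risks one unit
    for y in tosses:                      # y = 1 for heads, 0 for tails
        k += k * (y - P_HEADS)
    return k

P = {y: P_HEADS ** N for y in product((0, 1), repeat=N)}
Q = {y: final_capital(y) * P[y] for y in P}        # equation (3)

# Q is nonnegative and sums to one, so it is a probability distribution.
assert all(q >= 0 for q in Q.values())
assert sum(Q.values()) == 1

# Under Q the tosses are independent with heads probability 3/4,
# and the final capital is the likelihood ratio Q(y)/P(y).
for y in P:
    heads = sum(y)
    assert Q[y] == Fraction(3, 4) ** heads * Fraction(1, 4) ** (N - heads)
    assert final_capital(y) == Q[y] / P[y]
print("Q is a probability distribution; K(y) = Q(y)/P(y)")
```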

2.3 Probability forecasting

As a first example of how Ville’s principle and the Ville interpretation apply even when prices offered fall short of defining a probability distribution for all events of interest, consider a game in which a forecaster announces probabilities successively, observing the outcome of each preceding event before giving the next probability:


Probability Forecasting Game

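$\mathcal{K}_0 := 1$.
FOR $n = 1, 2, \dots, N$:
  Forecaster announces $p_n \in [0, 1]$.
  Skeptic announces $M_n \in \mathbb{R}$.
  Reality announces $y_n \in \{0, 1\}$.
  $\mathcal{K}_n := \mathcal{K}_{n-1} + M_n (y_n - p_n)$. (4)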


This is a perfect-information game; the three players move in sequence, and they all see each move as it is made. The game continues for $N$ rounds.

The number $p_n$ can be thought of as the price of a ticket that pays the amount $y_n$. Skeptic can buy any number $M_n$ of the tickets. Since he pays $p_n$ for each ticket and receives $y_n$ in return, his net payoff is $M_n(y_n - p_n)$. The number $M_n$ can be positive or negative. By choosing $M_n$ positive, Skeptic buys tickets; by choosing $M_n$ negative, he sells tickets.

Within the game, the $p_n$ are simply prices. But we think of them as Forecaster’s probabilities: $p_n$ is Forecaster’s probability that Reality will choose $y_n = 1$. However, Forecaster need not have a joint probability distribution for Reality’s moves $y_1, \dots, y_N$. He simply chooses $p_n$ as he pleases at each step.

Skeptic tests Forecaster’s $p_n$ by trying to increase his capital using them as prices. If Skeptic succeeds, that is, if he makes $\mathcal{K}_N$ large without risking more than his initial capital $\mathcal{K}_0$, then we conclude that Forecaster is not a good probability forecaster. Ville’s principle says that if Forecaster is a good forecaster, then Skeptic will not achieve a large value for his final capital $\mathcal{K}_N$ without risking more than $\mathcal{K}_0$.

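What does it mean for Skeptic not to risk more than $\mathcal{K}_0$? It means that his moves do not allow Reality to make his final capital $\mathcal{K}_N$ negative. Since Reality can always keep Skeptic from making money (by choosing $y_n = 0$ if $M_n$ is positive and $y_n = 1$ if $M_n$ is negative), she can make $\mathcal{K}_N$ negative as soon as Skeptic lets $\mathcal{K}_n$ become negative for any $n$. So in order to deny Reality the option of making $\mathcal{K}_N$ negative, Skeptic must choose each $M_n$ so as to deny Reality the option of making $\mathcal{K}_n$ negative. By (4), this means choosing $M_n$ in the interval

$$-\frac{\mathcal{K}_{n-1}}{1 - p_n} \le M_n \le \frac{\mathcal{K}_{n-1}}{p_n}. \tag{5}$$

(When $p_n = 0$ or $p_n = 1$, the bound with the zero denominator is simply absent.)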

For brevity, let us say that Skeptic plays safely if he always chooses $M_n$ satisfying (5), and let us call a strategy for Skeptic safe if it always prescribes $M_n$ satisfying (5).
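Condition (5) translates directly into code. The Python sketch below assumes $0 < p_n < 1$ (so both bounds are finite) and uses an illustrative forecast $p_n = 1/3$ with $\mathcal{K}_{n-1} = 1$; the helper names are hypothetical. It confirms that both endpoints of the interval keep the worst-case capital at zero, while a move slightly outside it lets Reality push the capital below zero.

```python
from fractions import Fraction

def update(capital, m, p, y):
    """Capital update (4): K_n = K_{n-1} + M_n (y_n - p_n)."""
    return capital + m * (y - p)

def safe_interval(capital, p):
    """Interval (5): the moves M_n that keep K_n >= 0 whatever Reality does (0 < p < 1)."""
    return -capital / (1 - p), capital / p

def worst_case_capital(capital, m, p):
    """Smallest K_n that Reality can force after Skeptic's move M_n = m."""
    return min(update(capital, m, p, y) for y in (0, 1))

capital, p = Fraction(1), Fraction(1, 3)
low, high = safe_interval(capital, p)
print(low, high)                                                 # -3/2 3
print(worst_case_capital(capital, low, p))                       # 0: endpoint is safe
print(worst_case_capital(capital, high, p))                      # 0: endpoint is safe
print(worst_case_capital(capital, high + Fraction(1, 100), p))   # -1/300: unsafe
```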

We can get back to classical probability by assuming that Forecaster follows a strategy based on a joint probability distribution $P$ for $y_1, \dots, y_N$ and perhaps other events outside the game, the strategy being to set $p_n$ equal to $P$’s conditional probability for $y_n = 1$ given what has been observed so far. But Ville’s principle is powerful even in the absence of a specified strategy for Forecaster. It is all we need in order to derive various relations, such as the law of large numbers, the law of the iterated logarithm, and the central limit theorem, that classical probability theory says will hold between the probabilities $p_1, \dots, p_N$ and the outcomes $y_1, \dots, y_N$. It turns out, for example, that Skeptic can play safely in such a way that either the relative frequency of 1s among $y_1, \dots, y_N$,

$$\bar{y}_N := \frac{1}{N} \sum_{n=1}^{N} y_n,$$

approximates the average probability forecast

$$\bar{p}_N := \frac{1}{N} \sum_{n=1}^{N} p_n,$$

or else $\mathcal{K}_N$ becomes very large ([19], p. 125). Because it tells us that $\mathcal{K}_N$ will not become very large, Ville’s principle therefore implies that $\bar{y}_N$ will approximate $\bar{p}_N$. This is a version of the law of large numbers.
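The following Python sketch conveys the flavor of this result; it is a simple illustration under assumed choices, not the strategy constructed in [19]. Skeptic splits $\mathcal{K}_0 = 1$ equally between one strategy that buys $M_n = \epsilon\,\mathcal{K}_{n-1}$ tickets each round and one that sells that many, with $\epsilon = 0.1$ an arbitrary choice; for $0 \le \epsilon \le 1$ both satisfy (5) whatever the forecast. The forecasts (all $1/2$) and the two outcome sequences are made up: in one the frequency of 1s matches the average forecast, and in the other it is $3/4$.

```python
def mixed_capital(forecasts, outcomes, eps=0.1):
    """Final capital of a safe Skeptic who splits K_0 = 1 between two strategies.

    Half the capital buys M_n = eps * K_{n-1} tickets each round, the other half
    sells that many; for 0 <= eps <= 1 both halves satisfy the safety condition (5).
    """
    long_k, short_k = 0.5, 0.5
    for p, y in zip(forecasts, outcomes):
        long_k += eps * long_k * (y - p)
        short_k -= eps * short_k * (y - p)
        assert long_k >= 0 and short_k >= 0       # safe play
    return long_k + short_k

def frequency_gap(forecasts, outcomes):
    """Relative frequency of 1s minus the average forecast."""
    return (sum(outcomes) - sum(forecasts)) / len(outcomes)

n = 2000
forecasts = [0.5] * n
matched = [1, 0] * (n // 2)         # frequency of 1s equals the average forecast
skewed = [1, 1, 1, 0] * (n // 4)    # frequency 3/4 against an average forecast of 1/2

for name, outcomes in [("matched", matched), ("skewed", skewed)]:
    print(name, frequency_gap(forecasts, outcomes), mixed_capital(forecasts, outcomes))
```

With the matched sequence the final capital ends below its starting value, while with the skewed sequence it becomes astronomically large; by Ville’s principle, a good forecaster should not see the second pattern.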

2.4 Game-theoretic probability

Probability forecasting is only one example where prices fall short of defining a probability distribution. In many other examples, the shortfall is substantially greater.

One class of such examples arises in finance theory, where the price for a security at the beginning of the day can be thought of as the price for a ticket that pays what the security is worth at the end of the day. Here the roles of Forecaster and Reality are both played by the market that sets the prices, and the role of Skeptic is played by a speculator. Over a period of $N$ days, they play a perfect-information game much like our Probability Forecasting Game:


Market Game

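$\mathcal{K}_0 := 1$.
FOR $n = 1, 2, \dots, N$:
  Market announces opening price $p_n$.
  Speculator announces $M_n \in \mathbb{R}$.
  Market announces closing price $y_n$.
  $\mathcal{K}_n := \mathcal{K}_{n-1} + M_n (y_n - p_n)$.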


Here $M_n > 0$ when Speculator goes long in the security, and $M_n < 0$ when he goes short.

A cornerstone of finance theory is the efficient market hypothesis, which states that a speculator cannot expect to make money using publicly available information. Efforts to formulate this hypothesis more precisely usually start with the questionable assumption that market prices are governed, in some sense, by a probability distribution. Ville’s principle offers an alternative way of making the hypothesis precise: we can say that Speculator will not make $\mathcal{K}_N$ large while playing safely. This version of the efficient market hypothesis can be tested directly, without making any probabilistic assumptions [27]. It also implies a number of stylized facts about financial markets, including the $\sqrt{dt}$ effect [23] and the relation between the volatility and average of simple returns called the CAPM [25].

Shafer and Vovk [19] give other examples of games where prices fall short of defining a probability distribution. It turns out that many of the usual results of probability theory can be extended to such games, provided that we adopt Ville’s principle. In general, we call the study of such games game-theoretic probability.

The results in [19] are concerned with strategies for Skeptic or Speculator in a probability game; they say that this player can multiply their capital by a large factor if some result in probability theory or finance theory does not hold. It is also fruitful, however, to consider how Forecaster or Market can play against such strategies for Skeptic or Speculator. It turns out that they can do this effectively, and this gives a new method of making predictions, called defensive forecasting [26, 24].

3 The judgement of irrelevance in updating by conditioning

How should Forecaster’s probabilities change when he learns new information?

An important school of thought, called Bayesian in recent decades, contends that when we learn $A$, we should update our probability for $B$ from $P(B)$ to

$$P(B \mid A) = \frac{P(A \,\&\, B)}{P(A)}. \tag{6}$$

The change from $P(B)$ to $P(B \mid A)$ is called conditioning. Bayesians acknowledge that it is appropriate only if we judge $A$ to be the only relevant information we have learned ([5], Section 11.2.2; [1], p. 45).²

² The authors just cited, de Finetti and Bernardo and Smith, go on to say that irrelevance usually fails; when we learn $A$ we usually learn other information that will also modify our judgement concerning $B$. Nevertheless, updating by (6) is widely taught and implemented.
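For a finite outcome space, the update (6) is a one-line computation. The Python sketch below uses a hypothetical example (two tosses of a fair coin, with $A$ = “at least one head” and $B$ = “two heads”) and represents events as sets of outcomes; the helper names are illustrative only.

```python
from fractions import Fraction

def prob(P, event):
    """Probability of an event (a set of outcomes) under a finite distribution P."""
    return sum((P[w] for w in event), Fraction(0))

def condition(P, a):
    """The updated distribution after learning that a happened, as in (6)."""
    pa = prob(P, a)
    if pa == 0:
        raise ValueError("conditioning requires P(A) > 0")
    return {w: (P[w] / pa if w in a else Fraction(0)) for w in P}

# Hypothetical example: two tosses of a fair coin.
P = {w: Fraction(1, 4) for w in ("HH", "HT", "TH", "TT")}
A = {"HH", "HT", "TH"}          # at least one head
B = {"HH"}                      # two heads
print(prob(P, B), prob(condition(P, A), B))      # 1/4 1/3
```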

In this section, I review arguments for the updating rule (6), with attention to how they account for the judgement of relevance and irrelevance. I consider the argument originally given by Abraham De Moivre, the variation given by Bruno de Finetti, and another variation that is based on Ville’s principle. Only the argument from Ville’s principle uses the judgement of relevance.

3.1 De Moivre’s argument

Abraham De Moivre was the first to state the rule of compound probability. In the second edition of his Doctrine of Chances, published in 1738 [6], he stated the rule as follows:

…the Probability of the happening of two Events dependent, is the product of the Probability of the happening of one of them, by the Probability which the other will have of happening, when the first shall have been consider’d as having happen’d…

This rule can be written

$$P(A \,\&\, B) = P(A)\,P(B \mid A), \tag{7}$$

where $P(A \,\&\, B)$ is the probability of the happening of $A$ and $B$, $P(A)$ is the probability of the happening of $A$, and $P(B \mid A)$ is the probability which $B$ will have of happening, when $A$ shall have been consider’d as having happen’d.

The twentieth century abandoned De Moivre’s way of talking about probabilities. Now we call $P(B \mid A)$ the conditional probability of $B$ given $A$, and we say that it is defined by the equation

$$P(B \mid A) := \frac{P(A \,\&\, B)}{P(A)}, \tag{8}$$

provided that $P(A) > 0$. This makes (7) a trivial consequence of a definition. But for De Moivre, (7) was more substantive. It was a consequence of how probability is related to price.

De Moivre gave an argument for the rule of compound probability on pp. 5–6 of his second edition. He used a language that is somewhat unfamiliar today; he talked about the values of gamblers’ expectations. But it is true to his thinking to say that the probability of an event is the price (or the fair price, if you prefer) for a ticket that pays 1 if the event happens and 0 if it does not happen. (An expectation is the possession of a ticket with an uncertain payoff, and its value is the price you should pay for the ticket.) Using the language of tickets, payoffs, and price, we can express his argument as follows:

  1. The price of a ticket that pays 1 if $A$ happens is $P(A)$.

  2. Assume one can buy or sell any number of such tickets, even fractional amounts. So $\alpha P(A)$ is the price of a ticket that pays $\alpha$ if $A$ happens, where $\alpha$ is any real number. (Buying a negative amount means selling.)

  3. After $A$ happens (or everyone learns that $A$ has happened and nothing else), $P(B \mid A)$ is the price of a ticket that pays 1 if $B$ happens.

  4. So starting with $P(A)\,P(B \mid A)$, you can get 1 if $A$ and $B$ both happen. You use the $P(A)\,P(B \mid A)$ to buy a ticket that pays $P(B \mid A)$ if $A$ happens, and then, if $A$ does happen, you use the $P(B \mid A)$ to buy a ticket that pays 1 if $B$ also happens.

  5. So $P(A)\,P(B \mid A)$ is the value of a ticket that pays 1 if $A \,\&\, B$ happens.
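A numerical walk through steps 1 to 5 shows the bookkeeping. The Python sketch below assumes hypothetical prices $P(A) = 2/5$ and $P(B \mid A) = 3/4$: the initial sum $P(A)P(B \mid A) = 3/10$ buys tickets on $A$ paying $P(B \mid A)$ in total, and if $A$ happens that payoff buys exactly one ticket on $B$ at the new price.

```python
from fractions import Fraction

P_A = Fraction(2, 5)            # hypothetical price of a ticket paying 1 if A happens
P_B_GIVEN_A = Fraction(3, 4)    # hypothetical price, after A happens, of a ticket paying 1 if B

def de_moivre_holding(a_happens, b_happens):
    """What the initial sum P(A)P(B|A) becomes under the plan in step 4."""
    stake = P_A * P_B_GIVEN_A
    tickets_on_a = stake / P_A               # step 2: P(B|A) tickets on A
    if not a_happens:
        return Fraction(0)
    cash = tickets_on_a * 1                  # each ticket on A pays 1
    tickets_on_b = cash / P_B_GIVEN_A        # step 3: reinvest at the new price
    return tickets_on_b * (1 if b_happens else 0)

for a in (True, False):
    for b in (True, False):
        print(a, b, de_moivre_holding(a, b))
```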

De Moivre’s argument is unconvincing to modern readers because we do not accept his starting point—his unexamined assumption that an expectation has a well defined numerical value. Our positivist heritage demands that such numbers be cashed out in some way that can be observed.

3.2 De Finetti’s version of the argument

Bruno de Finetti (1906–1985) had a way of responding to the positivist challenge. For him, probability is specific to an individual. An individual’s probability $P(A)$ for an event $A$ is the price the individual sets for a ticket that returns 1 if $A$ happens: the price at which he is willing to trade in such tickets, buying or selling as the occasion arises.

As for the conditional probability $P(B \mid A)$, de Finetti proposed a betting interpretation that avoids references to a situation after $A$ has happened or is known to have happened. For him, $P(B \mid A)$ is the price of a conditional ticket: the price of a ticket that pays 1 if $B$ happens, with the understanding that the transaction is cancelled (the price is refunded and no payoff is made, even if $B$ happens) if $A$ does not happen.

With these interpretations, de Finetti was able to formulate a version of De Moivre’s argument that leaves aside the notion of changing probabilities. We situate ourselves at the beginning of the game, as it were, and argue as follows:

  1. $P(A)$ is the price at which I am willing to buy or sell tickets that pay 1 if $A$ happens.

  2. I am willing to buy or sell any number of such tickets, even fractional amounts. So $\alpha P(A)$ is the price I will pay for a ticket that pays $\alpha$ if $A$ happens, where $\alpha$ is any real number.

  3. $P(B \mid A)$ is the price I am willing to pay for a ticket that pays 1 if $B$ happens, with the understanding that this price is refunded if $A$ does not happen.

  4. It follows that I am willing to pay $P(A)\,P(B \mid A)$ to get back 1 if $A$ and $B$ both happen. You can prove this by selling me two tickets:

    • For $P(A)\,P(B \mid A)$, a ticket that pays $P(B \mid A)$ if $A$ happens.

    • For $P(B \mid A)$, a ticket that pays 1 if $A$ and $B$ both happen, with the price being refunded if $A$ does not happen.

    If $A$ and $B$ both happen, I end up with 1, less the $P(A)\,P(B \mid A)$ I paid for the first ticket; the payoff from the first ticket is cancelled by the cost of the second. If $A$ does not happen, I lose only the $P(A)\,P(B \mid A)$, the second purchase having been cancelled. If $A$ happens but $B$ does not, I again lose only the $P(A)\,P(B \mid A)$, the cost of the second purchase being cancelled by the payoff on the first.

  5. So $P(A)\,P(B \mid A)$ is the price I am willing to pay for 1 if $A \,\&\, B$ happens, that is, my probability for $A \,\&\, B$.
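The three-case accounting in step 4 can be verified mechanically. The Python sketch below assumes hypothetical prices $P(A) = 2/5$ and $P(B \mid A) = 3/4$ and checks that, in every outcome, buying the two tickets has the same net effect as paying $P(A)P(B \mid A)$ for a ticket that pays 1 if $A \,\&\, B$ happens.

```python
from fractions import Fraction

def my_net_gain(p_a, p_b_given_a, a_happens, b_happens):
    """My net gain from buying the two tickets described in step 4."""
    # First ticket: costs P(A)P(B|A), pays P(B|A) if A happens.
    gain = -p_a * p_b_given_a + (p_b_given_a if a_happens else 0)
    # Second ticket: costs P(B|A), pays 1 if B happens; called off (refunded) if A fails.
    if a_happens:
        gain += -p_b_given_a + (1 if b_happens else 0)
    return gain

p_a, p_b_given_a = Fraction(2, 5), Fraction(3, 4)      # hypothetical prices
cost = p_a * p_b_given_a
for a in (True, False):
    for b in (True, False):
        gain = my_net_gain(p_a, p_b_given_a, a, b)
        # Same as paying P(A)P(B|A) for a ticket that pays 1 if A & B happens.
        assert gain == (1 if (a and b) else 0) - cost
        print(a, b, gain)
```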

As a coda, we may add de Finetti’s argument for the price being unique. De Moivre had taken it for granted that the value of a thing is unique. De Finetti, using his assumption that we are willing to buy and sell any amount, argued that we must make the probability unique in order to prevent an opponent from extracting an indefinite amount of money from us.
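De Finetti’s uniqueness argument can be illustrated with a small Python sketch. Suppose, hypothetically, that I were willing to buy and sell the same ticket at two different prices $p_{\text{low}} < p_{\text{high}}$. An opponent who buys from me at the lower price and sells to me at the higher one holds offsetting tickets, so the payoffs cancel and the opponent keeps $p_{\text{high}} - p_{\text{low}}$ per ticket whatever happens; scaling up the number of tickets makes this gain as large as desired.

```python
from fractions import Fraction

def opponent_gain(p_low, p_high, n_tickets, event_happens):
    """Opponent's net gain if I trade the same ticket (payoff 1 if the event happens)
    at two prices: the opponent buys n_tickets from me at p_low and sells me
    n_tickets at p_high."""
    payoff = 1 if event_happens else 0
    from_buying = n_tickets * (payoff - p_low)     # opponent holds these tickets
    from_selling = n_tickets * (p_high - payoff)   # opponent pays out on these
    return from_buying + from_selling

p_low, p_high = Fraction(3, 10), Fraction(2, 5)    # hypothetical pair of prices
for n in (1, 10, 1000):
    gains = {opponent_gain(p_low, p_high, n, e) for e in (True, False)}
    print(n, gains)   # a single value, n * (p_high - p_low), whether or not the event happens
```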

De Finetti’s version of the argument comes closer to modern mathematical rigor than De Moivre’s, because it leaves aside the notion of something being “consider’d as having happen’d”, for which De Moivre gave no set-theoretic exegesis. But some such notion must still be used in order to extend the argument to a justification for using conditional probabilities as one’s new probabilities after something new is learned. We must explain why the price for the conditional ticket on $B$ given $A$ should not change when $A$ and nothing else is learned. There is a large literature on how convincingly this argument can be made; some think it requires that a protocol for new information be fixed and known in advance. See [15] and references therein.

3.3 Making the argument from Ville’s principle

Ville’s principle, like Cournot’s, can usually be applied directly only to a run of events, in which a strategy has time to multiply the capital it risks substantially (or, in the case of Cournot’s principle, we can identify an event of very small probability). So in order to apply Ville’s principle to the problem of changing probabilities that are neither very small nor very large, we must imagine them being embedded in a longer sequence of similar probabilities for similar events. This is how probability judgments are often made: we judge that an event is like an event in some repetitive process for which we know probabilities [18].

In de Finetti’s picture, we make a probability judgement by saying that $P(A)$ is the price at which we are willing to buy or sell tickets that pay 1 if $A$ happens. (I omit needed caveats: that we buy and sell only to people who have the same knowledge as ourselves, that this is only the price we might be inclined to set if we were inclined to gamble, etc.) In Ville’s picture, we make a probability judgement by saying that if we do offer such bets on $A$, and on a sequence of similar events in similar but independent circumstances, then an opponent will not succeed in multiplying the capital they risk in betting against us by a large factor. Let us abbreviate this to the statement that an opponent will not beat the probability.

In this terminology, our task is to show that the following claim holds:

Suppose we are in a situation where we judge that an opponent will not beat $P(A)$ and $P(A \,\&\, B)$. Suppose we then learn $A$ and nothing more. Then we can include $P(A \,\&\, B)/P(A)$ as a new probability for $B$ among the probabilities that we judge an opponent will not beat. (9)

In one respect, we are following De Moivre more faithfully than de Finetti did. De Finetti’s mathematical argument is concerned only with prices in a single situation. Here we propose, like De Moivre, to give an argument that relates prices over two situations: an initial situation and a subsequent situation where our additional knowledge is $A$ and nothing more. This is normal for the game-theoretic framework reviewed in Sections 2.3 and 2.4; there we apply Ville’s principle to games with many rounds.

Here is the argument for (9) from Ville’s principle:

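  1. An opponent will not beat the probabilities $P(A)$ and $P(A \,\&\, B)$. This means that a strategy for the opponent that buys and sells tickets on $A$ and $A \,\&\, B$ at these prices, along with similar tickets on other events, will not multiply the capital risked by a large factor.

  2. We need to show that this impossibility of multiplying the capital risked still holds for strategies that are also allowed to use $P(A \,\&\, B)/P(A)$ as a new probability for $B$ after $A$ and nothing more is known.

  3. It suffices to show that if $S$ is a strategy against all three probabilities ($P(A)$ and $P(A \,\&\, B)$ in the initial situation and $P(A \,\&\, B)/P(A)$ later), then there exists a strategy $S'$ against the two probabilities ($P(A)$ and $P(A \,\&\, B)$ in the initial situation) alone that risks no more capital and has the same payoffs as $S$.

  4. Let $M$, which may be positive or negative or zero, be the amount of $B$ tickets $S$ buys after learning $A$. To construct $S'$ from $S$, we delete this purchase of $B$ tickets and add

    $$M \text{ tickets on } A \,\&\, B \quad\text{and}\quad -M\,\frac{P(A \,\&\, B)}{P(A)} \text{ tickets on } A \tag{10}$$

    to $S$’s purchases of tickets on $A$ and $A \,\&\, B$ in the initial situation.

    • The tickets in (10) have zero net cost:

      $$M\,P(A \,\&\, B) - M\,\frac{P(A \,\&\, B)}{P(A)}\,P(A) = 0.$$

      So $S'$ uses the same capital in the initial situation as $S$.

    • The payoffs of the tickets in (10) are the same as the net payoffs of the tickets deleted from $S$:

      $$M\,\mathbf{1}_{A \& B} - M\,\frac{P(A \,\&\, B)}{P(A)}\,\mathbf{1}_{A} = \begin{cases} M\left(1 - \dfrac{P(A \,\&\, B)}{P(A)}\right) & \text{if } A \text{ and } B \text{ happen},\\ -M\,\dfrac{P(A \,\&\, B)}{P(A)} & \text{if } A \text{ happens and } B \text{ does not},\\ 0 & \text{if } A \text{ does not happen}, \end{cases}$$

      so $S'$ uses no more capital than $S$ after the initial situation and has the same payoffs in the end.

  5. By hypothesis, $S'$ will not multiply the capital it risks by a large factor. So $S$, which has the same payoffs and risks at least as much capital, does not either.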
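Step 4 can be checked mechanically. The Python sketch below uses hypothetical prices $P(A) = 2/5$ and $P(A \,\&\, B) = 3/10$ (so the new price for $B$ is $3/4$) and several values of $M$, and compares, outcome by outcome, the net payoff of the deleted later purchase with the net payoff of the tickets in (10), after confirming that those tickets have zero net cost.

```python
from fractions import Fraction

P_A, P_AB = Fraction(2, 5), Fraction(3, 10)   # hypothetical P(A) and P(A & B)
NEW_PRICE = P_AB / P_A                        # new probability for B once A is known: 3/4

def deleted_net_payoff(m, a, b):
    """Net payoff of S's later purchase of m tickets on B, made only if A happens."""
    if not a:
        return Fraction(0)
    return m * ((1 if b else 0) - NEW_PRICE)

def replacement_net_payoff(m, a, b):
    """Net payoff of the tickets (10), bought in the initial situation."""
    on_ab = m * ((1 if (a and b) else 0) - P_AB)          # m tickets on A & B
    on_a = -m * NEW_PRICE * ((1 if a else 0) - P_A)       # -m P(A&B)/P(A) tickets on A
    return on_ab + on_a

for m in (Fraction(7), Fraction(-2), Fraction(0)):
    assert m * P_AB - m * NEW_PRICE * P_A == 0            # zero net cost of (10)
    for a in (True, False):
        for b in (True, False):
            assert deleted_net_payoff(m, a, b) == replacement_net_payoff(m, a, b)
print("the tickets in (10) reproduce the payoffs of the deleted purchase")
```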

See [17] for an extension of this argument to Peter Walley’s updating principle for upper and lower previsions.

3.4 The judgement of irrelevance

The argument from Ville’s principle for using conditional probability as one’s new probability uses the role and implications of knowledge in a way that de Finetti’s argument does not.

  • De Finetti argued for the conditional probability $P(B \mid A)$ being the price in the initial situation for a conditional purchase: a purchase of a ticket on $B$ on the condition that $A$ happens. He then merely asserted, with no argument, that it should remain the price for this purchase after we learn that $A$ happens and nothing more.

  • The Ville argument, in contrast, is truly an argument for $P(A \,\&\, B)/P(A)$ being the price for a ticket on $B$ in the new situation where we have learned that $A$ happened and nothing more.

The Ville argument is able to bring knowledge into the story because it looks at what can be accomplished by different strategies. What a strategy can accomplish depends on what information is available.

It is important to understand how the caveat “nothing more” enters into the Ville argument. The argument depends on constructing a strategy for the initial situation alone that is equivalent to a strategy that makes additional bets in the later situation where $A$ is known. If something more than $A$ is known, and $S$ uses this additional information as well ($S$’s purchase of the $B$ tickets depends on it), then the construction is not possible.

We can of course relax the requirement that nothing more be known than $A$’s happening. The essential requirement is that nothing more be known that can help an opponent multiply his capital. When this holds, we may say that the happening of $A$ is our only relevant information. We may have learned many other things before or at the time we learned $A$, but none of them can provide further help to a strategy for betting against the probabilities.

4 Judgements of irrelevance in the Dempster-Shafer calculus

The Dempster-Shafer theory of belief functions extends conditional probability to a calculus for combining probability judgements based on different bodies of evidence. Judgements of irrelevance enter into this calculus explicitly and pervasively. These judgements can be explained in terms of Ville’s principle in the same way as the judgement of irrelevance in the case of updating by conditioning on $A$: they are judgements that once certain information is taken into account, other information is of no help to a strategy for betting against certain probabilities.

In this section, I list the Ville judgements of irrelevance required by various operations in the Dempster-Shafer calculus (Section 4.1), and I discuss how attention to these judgements in applications can strengthen the calculus’s usefulness (Section 4.2).

4.1 Basic operations

The Dempster-Shafer calculus derives from a series of articles by A. P. Dempster, recently republished along with other classic articles on the calculus in [28]. The calculus was described in detail in [14] and reviewed in [7]. Without reviewing the examples and details readers can find in these references, I give here an overview of four related operations: the transfer of belief, conditioning, independent combination, and Dempster’s rule of combination. In each case, I explain the judgement of irrelevance involved.

I omit two other important operations, natural extension and marginalization, because they do not require judgements of irrelevance.

Transfer of belief.

Suppose is a variable, whose possible values form the set , and suppose is a probability distribution on , expressing our probability judgements about the value of . Suppose is another variable, with the set of possible values , for which we do not have a probability distribution. Suppose further that is a multivalued mapping from to (a mapping from to non-empty subsets of ). Then we can define a function on subsets of by setting

$$\mathrm{Bel}(A) = P\{x \in \mathcal{X} : \Gamma(x) \subseteq A\}. \tag{11}$$

A function $\mathrm{Bel}$ defined in this way is called a belief function. We call $\mathrm{Bel}(A)$ its degree of belief in $A$.
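Equation (11) can be computed directly when $\mathcal{X}$ is finite. The following is a minimal sketch, assuming $P$ is stored as a dictionary of exact fractions and $\Gamma$ as a dictionary of frozensets; the names `belief`, `P`, `Gamma` and the witness example are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

def belief(P, Gamma, A):
    """Return Bel(A) = P{x : Gamma(x) ⊆ A}, following equation (11).

    P     -- dict: each value x of X -> its probability (a Fraction)
    Gamma -- dict: each x -> non-empty frozenset of possible values of Y
    A     -- any set of values of Y
    """
    A = frozenset(A)
    # Add up the probability of every x whose image lies entirely inside A.
    return sum((p for x, p in P.items() if Gamma[x] <= A), Fraction(0))

# Hypothetical example: X says whether a witness who reports rain is reliable;
# Y is the weather.
P = {"reliable": Fraction(4, 5), "unreliable": Fraction(1, 5)}
Gamma = {
    "reliable": frozenset({"rain"}),           # a reliable report is correct
    "unreliable": frozenset({"rain", "dry"}),  # an unreliable one tells us nothing
}
print(belief(P, Gamma, {"rain"}))  # 4/5
print(belief(P, Gamma, {"dry"}))   # 0
```

The unreliable case contributes nothing to either $\{\text{rain}\}$ or $\{\text{dry}\}$; its probability is committed only to the whole of $\mathcal{Y}$.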

We can give the degrees of belief $\mathrm{Bel}(A)$ a Ville interpretation under the following conditions:

  1. The probability distribution $P$ has a Ville interpretation: no betting strategy will beat the probabilities it gives for $X$.

  2. The multivalued mapping $\Gamma$ has this meaning:

     $$\text{if } X = x, \text{ then } Y \in \Gamma(x). \tag{12}$$

  3. Learning the relationship (12) between $X$ and $Y$ does not affect the impossibility of beating the probabilities for $X$. (This is the irrelevance judgement.)

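The Ville interpretation that follows from these conditions is one-sided: a strategy that sells for $\mathrm{Bel}(A)$ tickets that pay $1$ if $Y \in A$ (and makes similar bets on the strength of similar evidence) will not multiply the capital it risks by a large factor. (The ticket pays whenever $\Gamma(X) \subseteq A$, an event of probability $\mathrm{Bel}(A)$, and possibly in other cases as well, so nothing rules out gains from buying it at that price.)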
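One way to see that $\mathrm{Bel}(A)$ fixes only one side of a price is that degrees of belief need not add to one: $\mathrm{Bel}(A) + \mathrm{Bel}(\mathcal{Y} \setminus A)$ can fall short of $1$. The sketch below, again assuming a finite $\mathcal{X}$ and the same hypothetical witness example, also computes $1 - \mathrm{Bel}(\mathcal{Y} \setminus A)$, the quantity usually called the plausibility of $A$; the helper names are illustrative, not taken from the text.

```python
from fractions import Fraction

def belief(P, Gamma, A):
    """Bel(A) = P{x : Gamma(x) ⊆ A}, as in equation (11)."""
    A = frozenset(A)
    return sum((p for x, p in P.items() if Gamma[x] <= A), Fraction(0))

def plausibility(P, Gamma, A, Y_values):
    """Pl(A) = 1 - Bel(complement of A), which equals P{x : Gamma(x) ∩ A ≠ ∅}."""
    complement = frozenset(Y_values) - frozenset(A)
    return 1 - belief(P, Gamma, complement)

# Hypothetical witness example (same numbers as before).
Y_VALUES = {"rain", "dry"}
P = {"reliable": Fraction(4, 5), "unreliable": Fraction(1, 5)}
Gamma = {"reliable": frozenset({"rain"}), "unreliable": frozenset({"rain", "dry"})}

for A in ({"rain"}, {"dry"}):
    # Bel(A) <= Pl(A); the gap is the probability committed to neither A nor its complement.
    print(sorted(A), belief(P, Gamma, A), plausibility(P, Gamma, A, Y_VALUES))
```

Here $\mathrm{Bel}(\{\text{rain}\}) + \mathrm{Bel}(\{\text{dry}\}) = 4/5$, so the beliefs leave a gap of $1/5$ that no two-sided price would leave.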

Conditioning.

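Suppose we modify the preceding setup by allowing the subset $\Gamma(x)$ of $\mathcal{Y}$ to be empty for some $x$. In this case, condition (12) tells us that the event $\{x : \Gamma(x) \neq \emptyset\}$ happened, and if we judge that we have learned nothing else that can help a strategy beat $P$’s probabilities, then we are entitled to condition on this event. This results in replacing (11) by

$$\mathrm{Bel}(A) = \frac{P\{x : \emptyset \neq \Gamma(x) \subseteq A\}}{P\{x : \Gamma(x) \neq \emptyset\}}.$$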

The judgements of irrelevance that justify this equation can be summarized by saying that aside from the impossibility of the values $x$ for which $\Gamma(x) = \emptyset$, learning (12) does not provide any other information that can help a strategy beat the probabilities for $X$.
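The conditioned belief function can be sketched the same way. This is a minimal illustration, assuming a finite $\mathcal{X}$ with exact fractional probabilities; the function name and the three-valued example are hypothetical. Values $x$ with $\Gamma(x) = \emptyset$ are dropped and the remaining probability is renormalized, which is exactly ordinary conditioning on $\{x : \Gamma(x) \neq \emptyset\}$.

```python
from fractions import Fraction

def conditional_belief(P, Gamma, A):
    """Return P{x : ∅ ≠ Gamma(x) ⊆ A} / P{x : Gamma(x) ≠ ∅}.

    Unlike in equation (11), Gamma(x) may be empty; (12) then rules that x out.
    """
    A = frozenset(A)
    possible = sum((p for x, p in P.items() if Gamma[x]), Fraction(0))
    if possible == 0:
        raise ValueError("(12) rules out every x; there is nothing to condition on")
    # An empty Gamma(x) is trivially a subset of A, so exclude it explicitly.
    supported = sum((p for x, p in P.items() if Gamma[x] and Gamma[x] <= A), Fraction(0))
    return supported / possible

# Hypothetical example with one value of X that (12) makes impossible.
P = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
Gamma = {"a": frozenset({"rain"}), "b": frozenset({"rain", "dry"}), "c": frozenset()}
print(conditional_belief(P, Gamma, {"rain"}))  # 2/3
```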

Independent combination.

Suppose $P_1$ and $P_2$ are probability distributions on $\mathcal{X}_1$ and $\mathcal{X}_2$, expressing our probability judgements about the values of the variables $X_1$ and $X_2$, respectively. What judgement is involved when we say further that the product probability measure $P_1 \times P_2$ on $\mathcal{X}_1 \times \mathcal{X}_2$ expresses our probability judgement about $X_1$ and $X_2$ jointly?

This question is not answered simply by saying that and are probabilistically independent, because probabilistic independence, in modern probability theory, is a property of a joint probability distribution for two variables, not a judgement outside the mathematics that justifies adopting the product distribution as a joint probability distribution for them.

De Finetti’s betting interpretation of probability does give an answer to the question: we should adopt the product distribution if learning the value of one of the variables and nothing else will not change the prices we are willing to offer on the other variable.

The Ville interpretation gives an analogous answer: we should adopt the product distribution if we make the judgement that knowing the value of one of the variables and nothing more would not help a strategy beat the probabilities for the other variable.

Dempster-Shafer theory extends the idea of independent combination to belief functions, by considering two multivalued mappings, say a mapping $\Gamma_1$ from $\mathcal{X}_1$ to non-empty subsets of $\mathcal{Y}_1$ and a mapping $\Gamma_2$ from $\mathcal{X}_2$ to non-empty subsets of $\mathcal{Y}_2$. Suppose $\Gamma_1$ and $\Gamma_2$ have these meanings, where $Y_1$ and $Y_2$ are variables that take values in $\mathcal{Y}_1$ and $\mathcal{Y}_2$, respectively:

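$$\text{if } X_1 = x_1, \text{ then } Y_1 \in \Gamma_1(x_1), \tag{13}$$
$$\text{if } X_2 = x_2, \text{ then } Y_2 \in \Gamma_2(x_2). \tag{14}$$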

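Then we can form a belief function for the pair $(Y_1, Y_2)$:

$$\mathrm{Bel}(A) = (P_1 \times P_2)\{(x_1, x_2) : \Gamma_1(x_1) \times \Gamma_2(x_2) \subseteq A\}$$

for $A \subseteq \mathcal{Y}_1 \times \mathcal{Y}_2$. To justify this, we must make the Ville judgement justifying the formation of the product distribution and also the judgement that learning (13) and (14) together does not help beat the probabilities given by $P_1$ or $P_2$. This goes beyond the individual judgements that learning (13) does not help beat $P_1$ and that learning (14) does not help beat $P_2$.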
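The belief function for the pair can be computed by summing the product measure over pairs $(x_1, x_2)$ whose rectangle $\Gamma_1(x_1) \times \Gamma_2(x_2)$ lies inside $A$. A minimal sketch, assuming finite $\mathcal{X}_1$ and $\mathcal{X}_2$; the name `joint_belief` and the two-witness example are hypothetical. The multiplication `p1 * p2` is where the independence judgement discussed above enters.

```python
from fractions import Fraction
from itertools import product

def joint_belief(P1, Gamma1, P2, Gamma2, A):
    """Return (P1 × P2){(x1, x2) : Gamma1(x1) × Gamma2(x2) ⊆ A}.

    A is a set of pairs (y1, y2), i.e. a subset of 𝒴1 × 𝒴2.
    """
    A = frozenset(A)
    total = Fraction(0)
    for (x1, p1), (x2, p2) in product(P1.items(), P2.items()):
        # Together, (13) and (14) say (Y1, Y2) lies in this rectangle.
        rectangle = frozenset(product(Gamma1[x1], Gamma2[x2]))
        if rectangle <= A:
            total += p1 * p2  # the product measure P1 × P2
    return total

# Hypothetical example: witness 1 reports on the weather, witness 2 on the wind.
P1 = {"reliable": Fraction(4, 5), "unreliable": Fraction(1, 5)}
Gamma1 = {"reliable": frozenset({"rain"}), "unreliable": frozenset({"rain", "dry"})}
P2 = {"reliable": Fraction(3, 5), "unreliable": Fraction(2, 5)}
Gamma2 = {"reliable": frozenset({"windy"}), "unreliable": frozenset({"calm", "windy"})}
print(joint_belief(P1, Gamma1, P2, Gamma2, {("rain", "windy")}))  # 12/25
```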

Dempster’s rule of combination.

Dempster’s rule concerns the combination of two bodies of evidence bearing on the same variable $Y$. Given the ideas we have just reviewed, it is most easily stated by considering two multivalued mappings from the different probability spaces to the same space $\mathcal{Y}$, say $\Gamma_1$ from $\mathcal{X}_1$ and $\Gamma_2$ from $\mathcal{X}_2$. They have the usual meaning:

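$$\text{if } X_1 = x_1, \text{ then } Y \in \Gamma_1(x_1), \tag{15}$$
$$\text{if } X_2 = x_2, \text{ then } Y \in \Gamma_2(x_2). \tag{16}$$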

Even if both $\Gamma_1(x_1)$ and $\Gamma_2(x_2)$ are always non-empty, their intersection may be empty. When we learn (15) and (16), we learn that the event $\{(x_1, x_2) : \Gamma_1(x_1) \cap \Gamma_2(x_2) \neq \emptyset\}$ has happened.

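Conditioning on the intersection being non-empty, we obtain the belief function $\mathrm{Bel}$ on $\mathcal{Y}$ given by

$$\mathrm{Bel}(A) = \frac{(P_1 \times P_2)\{(x_1, x_2) : \emptyset \neq \Gamma_1(x_1) \cap \Gamma_2(x_2) \subseteq A\}}{(P_1 \times P_2)\{(x_1, x_2) : \Gamma_1(x_1) \cap \Gamma_2(x_2) \neq \emptyset\}}.$$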

In this case, the required Ville judgements are those involved in forming the product measure, along with the judgement that learning (15) and (16) does not help beat the probabilities given by the product measure $P_1 \times P_2$ aside from providing the information that the event $\{(x_1, x_2) : \Gamma_1(x_1) \cap \Gamma_2(x_2) \neq \emptyset\}$ has happened.
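Dempster’s rule as stated above is the product measure followed by conditioning on non-empty intersections. The following is a minimal sketch under the same assumptions as before (finite sets, exact fractions); the function name and the example of two witnesses who disagree about the weather are hypothetical.

```python
from fractions import Fraction
from itertools import product

def dempster_belief(P1, Gamma1, P2, Gamma2, A):
    """Return Bel(A) for two bodies of evidence about the same variable Y.

    Numerator:   (P1 × P2){(x1, x2) : ∅ ≠ Gamma1(x1) ∩ Gamma2(x2) ⊆ A}
    Denominator: (P1 × P2){(x1, x2) : Gamma1(x1) ∩ Gamma2(x2) ≠ ∅}
    """
    A = frozenset(A)
    consistent = Fraction(0)
    supporting = Fraction(0)
    for (x1, p1), (x2, p2) in product(P1.items(), P2.items()):
        common = Gamma1[x1] & Gamma2[x2]
        if not common:
            continue  # this pair is ruled out by (15) and (16) together
        consistent += p1 * p2
        if common <= A:
            supporting += p1 * p2
    if consistent == 0:
        raise ValueError("total conflict: no pair (x1, x2) is possible")
    return supporting / consistent

# Hypothetical example: witness 1 reports rain, witness 2 reports dry weather.
P1 = {"reliable": Fraction(4, 5), "unreliable": Fraction(1, 5)}
Gamma1 = {"reliable": frozenset({"rain"}), "unreliable": frozenset({"rain", "dry"})}
P2 = {"reliable": Fraction(1, 2), "unreliable": Fraction(1, 2)}
Gamma2 = {"reliable": frozenset({"dry"}), "unreliable": frozenset({"rain", "dry"})}
print(dempster_belief(P1, Gamma1, P2, Gamma2, {"rain"}))  # 2/3
print(dempster_belief(P1, Gamma1, P2, Gamma2, {"dry"}))   # 1/6
```

In this example the two witnesses cannot both be reliable, so that pair’s probability $2/5$ is discarded and the remainder is renormalized by $3/5$; the irrelevance judgement above is what licenses treating this renormalization as ordinary conditioning.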

4.2 Discussion

In [14], I stated that Dempster’s rule of combination is appropriate when the bodies of evidence underlying individual belief functions are independent. The Ville judgements I have just detailed elaborate this notion of independence, in a way that should be useful in applications.

In our various writings on belief functions and in debates with critics, A. P. Dempster and I frequently took the view that the notions of independence and conditioning involved in Dempster’s rule are the same as in ordinary probability theory. The analysis of this article vindicates this view in some degree, insofar as it has shown that the judgements of irrelevance required for Dempster’s rule have the same general form as judgements of irrelevance that justify the formation of product measures in ordinary probability theory and updating by conditioning in Bayesian reasoning. The analysis has also revealed, however, the complexity that can be involved in judgements of this general form.

The critics often demanded, of course, explanations of independence and conditioning that were consistent with de Finetti’s explanation of the meaning of these concepts in the Bayesian calculus. Here I have argued that de Finetti’s explanations are not as convincing as sometimes thought even for Bayesian updating: they justify the pricing of conditional tickets but not the changes in price from one state of knowledge to another. In any case, they surely do not extend to the Dempster-Shafer case, where no embedding of the rules in a static picture seems to be possible. For the process of combining evidence, we need a more dynamic picture, which is provided by the Ville interpretation.

It is easy to construct examples in which the Ville irrelevance judgements required for Dempster’s rule are unreasonable or clearly wrong. It is also easy enough to construct examples in which these judgements are reasonable; I gave some such examples in the 1980s (see for example [18]). Existing applications of the Dempster-Shafer calculus would be enriched, however, by a systematic examination of the reasonableness of the irrelevance judgements they require. A clearer understanding of these judgements might also help us construct Dempster-Shafer models for complex scientific problems where the irrelevance judgements needed to justify ordinary probabilities and Bayesian reasoning seem unreasonably strong.

References

  • [1] José Bernardo and Adrian F. M. Smith. Bayesian Theory. Wiley, New York, 1994.
  • [2] Emile Borel. Valeur pratique et philosophie des probabilités. Gauthier-Villars, Paris, 1939.
  • [3] Antoine-Augustin Cournot. Exposition de la théorie des chances et des probabilités. Hachette, Paris, 1843. Reprinted in 1984 as Volume I (B. Bru, editor) of [4].
  • [4] Antoine-Augustin Cournot. Œuvres complètes. Vrin, Paris, 1973–.
  • [5] Bruno de Finetti. Teoria Delle Probabilità. Einaudi, Turin, 1970. An English translation, by Antonio Machi and Adrian Smith, was published as Theory of Probability by Wiley (London) in two volumes in 1974 and 1975.
  • [6] Abraham De Moivre. The Doctrine of Chances: or, A Method of Calculating the Probabilities of Events in Play. Pearson, London, 1718. Second edition 1738, third 1756.
  • [7] A. P. Dempster. The Dempster-Shafer calculus for statisticians. International Journal of Approximate Reasoning, 48:365–377, 2008.
  • [8] Maria Carla Galavotti. Philosophical Introduction to Probability. CSLI Publications, Stanford, California, 2005.
  • [9] Thierry Martin. Probabilités et critique philosophique selon Cournot. Vrin, Paris, 1996.
  • [10] Thierry Martin. Probabilité et certitude. In Thierry Martin, editor, Probabilités subjectives et rationalité de l’action, pages 119–134. CNRS Éditions, Paris, 2003.
  • [11] Thierry Martin. Cournot, philosophe des probabilités. In Thierry Martin, editor, Actualité de Cournot, pages 51–68. Vrin, Paris, 2005.
  • [12] Siméon-Denis Poisson. Recherches sur la probabilité des jugements en matière criminelle et en matière civile, précédées des règles générales du calcul des probabilités. Bachelier, Paris, 1837.
  • [13] Yu. V. Prokhorov and B. A. Sevast’yanov. Probability theory. In Encyclopaedia of mathematics: An updated and annotated translation of the Soviet “Mathematical Encyclopaedia” (managing editor M. Hazewinkel), volume 7, pages 307–313. Reidel, Boston, 1987–1994.
  • [14] Glenn Shafer. A Mathematical Theory of Evidence. Princeton University Press, Princeton, New Jersey, 1976.
  • [15] Glenn Shafer. Conditional probability. International Statistical Review, 53:261–277, 1985.
  • [16] Glenn Shafer. From Cournot’s principle to market efficiency. In Jean-Philippe Touffut, editor, Augustin Cournot: Modelling Economics, pages 55–95. Edward Elgar, 2007. A pre-publication version is at www.probabilityandfinance.com as Working Paper 15.
  • [17] Glenn Shafer, Peter R. Gillett, and Richard B. Scherl. A new understanding of subjective probability and its generalization to lower and upper prevision. International Journal of Approximate Reasoning, 33:1–49, 2003.
  • [18] Glenn Shafer and Amos Tversky. Languages and designs for probability judgment. Cognitive Science, 9:309–339, 1985.
  • [19] Glenn Shafer and Vladimir Vovk. Probability and Finance: It’s Only a Game. Wiley, New York, 2001.
  • [20] Glenn Shafer and Vladimir Vovk. The sources of Kolmogorov’s Grundbegriffe. Statistical Science, 21(1):70–98, 2006. A longer version is at www.probabilityandfinance.com as Working Paper 4.
  • [21] John Venn. The Logic of Chance. Macmillan, London and New York, third edition, 1888.
  • [22] Jean-André Ville. Étude critique de la notion de collectif. Gauthier-Villars, Paris, 1939.
  • [23] Vladimir Vovk and Glenn Shafer. A game-theoretic explanation of the effect, Working Paper 5, www.probabilityandfinance.com, 2003.
  • [24] Vladimir Vovk and Glenn Shafer. Good randomized sequential probability forecasting is always possible. Journal of the Royal Statistical Society, Series B, 67:747–764, 2005. A longer version is at www.probabilityandfinance.com as Working Paper 7.
  • [25] Vladimir Vovk and Glenn Shafer. The game-theoretic capital asset pricing model. International Journal of Approximate Reasoning, 49(1):175–197, 2008. See also Working Paper 1 at www.probabilityandfinance.com.
  • [26] Vladimir Vovk, Akimichi Takemura, and Glenn Shafer. Defensive forecasting. In Tenth International Workshop on Artificial Intelligence and Statistics, 2005. www.gatsby.ucl.ac.uk/aistats/. See also Working Paper 8 at www.probabilityandfinance.com.
  • [27] Wei Wu and Glenn Shafer. Testing lead-lag effects under game-theoretic efficient market hypotheses, Working Paper 23, www.probabilityandfinance.com, November 2007.
  • [28] Ronald Yager and Liping Liu, editors. Classic Works of the Dempster-Shafer Theory of Belief Functions. Springer, New York, 2008.