The main goal of this paper is to propose an axiomatic utility theory for lotteries described by belief functions in the Dempster-Shafer (D-S) theory of evidence Dempster1967 ; Shafer1976 . The axiomatic theory is constructed similarly to von Neumann-Morgenstern’s (vN-M’s) utility theory for probabilistic lotteries vonNeumannMorgenstern1947 ; HersteinMilnor1953 ; Hausner1954 ; LuceRaiffa1957 ; Jensen1967 ; Fishburn1982 . Unlike the probabilistic case, our axiomatic theory leads to interval-valued utilities, and therefore to a partial (incomplete) preference order on the set of all belief function lotteries. We also compare our decision theory to those proposed by Jaffray Jaffray1989 , Smets Smets2002 , Dubois et al. Duboisetal1999 , Giang and Shenoy GiangShenoy2005 ; GiangShenoy2011 , and Shafer Shafer2016 .
In the foreword to Glenn Shafer’s 1976 monograph Shafer1976 , Dempster writes: “… I believe that Bayesian inference will always be a basic tool for practical everyday statistics, if only because questions must be answered and decisions must be taken, so that a statistician must always stand ready to upgrade his vaguer forms of belief into precisely additive probabilities.” More than 40 years after these lines were written, many approaches to decision-making have been proposed (see the recent review in Denoeux2019 ). However, most of these methods lack a strong theoretical basis. The most important steps toward a decision theory in the D-S framework have been made by Jaffray Jaffray1989 , Smets Smets2002 , and Shafer Shafer2016 . However, we argue that these proposals are either not sufficiently justified from the point of view of D-S theory, or not sufficiently developed for practical use. Our goal is to propose and justify a utility theory that is in line with vN-M’s utility theory, but adapted to be used with lotteries whose uncertainty is described by D-S belief functions.
In essence, the D-S theory consists of representations— basic probability assignments (also called mass functions), belief functions, plausibility functions, etc.—together with Dempster’s combination rule, and a rule for marginalizing joint belief functions. The representation part of the D-S theory is also used in various other theories of belief functions. For example, in the imprecise probability community, a belief function is viewed as the lower envelope of a convex set of probability mass functions called a credal set. Using these semantics, it makes more sense to use the Fagin-Halpern combination rule FaginHalpern1991 (also proposed by de Campos et al. deCamposetal1990 ), rather than Dempster’s combination rule HalpernFagin1992 ; Shafer1990 ; Shafer1992 . The utility theory this article proposes is designed specifically for the D-S belief function theory, and not for the other theories of belief functions. This suggests that Dempster’s combination rule should be an integral part of our theory, a property that is not satisfied in the proposals by Jaffray and Smets.
There is a large literature on decision making with a (credal) set of probability mass functions motivated by Ellsberg’s paradox Ellsberg1961 . An influential work in this area is the axiomatic framework of Gilboa and Schmeidler GilboaSchmeidler1989 , who use Choquet integration Choquet1953 ; GilboaSchmeidler1994 to compute expected utility. A belief function is a special case of a Choquet capacity. Jaffray’s Jaffray1989 work can also be regarded as belonging to the same line of research, although Jaffray works directly with belief functions without specifying a combination rule. A review of this literature can be found in, e.g., Gajdosetal2008 , where the authors propose a modification of the Gilboa-Schmeidler GilboaSchmeidler1989 axioms. As we said earlier, our focus here is on decision-making with the D-S theory of belief functions, and not on decision-making based on belief functions with a credal set interpretation. As we will see, our interval-valued utility functions lead to intervals that are contained in the Choquet lower and upper expected utility intervals.
The remainder of this article is organized as follows. In Section 2, we sketch vN-M’s axiomatic utility theory for probabilistic lotteries as described by Luce and Raiffa LuceRaiffa1957 . In Section 3, we summarize the basic definitions in the D-S belief function theory. In Section 4, we describe our adaptation of vN-M’s utility theory for lotteries in which uncertainty is described by D-S belief functions. Our assumptions lead to an interval-valued utility function, and consequently, to a partial (incomplete) preference order on the set of all belief function lotteries. We also describe a model for assessments of utilities. In Section 5, we compare our utility theory with those described by Jaffray Jaffray1989 , Smets Smets2002 , Dubois et al. Duboisetal1999 , Giang and Shenoy GiangShenoy2005 ; GiangShenoy2011 , and Shafer Shafer2016 . Finally, in Section 6, we summarize and conclude.
2 von Neumann-Morgenstern’s Utility Theory
In this section, we describe vN-M’s utility theory for decision under risk. Most of the material in this section is adapted from LuceRaiffa1957 . A decision problem can be seen as a situation in which a decision-maker (DM) has to choose a course of action (or act) in some set F. An act may have different outcomes, depending on the state of nature . Exactly one state of nature will obtain, but this state is unknown. Denoting by the set of states of nature and by the set of outcomes, an act can thus be formalized as a mapping from to O. (Footnote 1: The assumption of finiteness of the sets and O is only for ease of exposition. It is unnecessary for the proof of the representation theorem in this section.) In this section, we assume that uncertainty about the state of nature is described by a probability mass function (PMF) on . If the DM selects act , they will get outcome with probability
To each act thus corresponds a PMF on O. We call a probabilistic lottery. As only one state in will obtain, a probabilistic lottery will result in exactly one outcome (with probability ), and we suppose that the lottery will not be repeated. Another natural assumption is that two acts that induce the same lottery are equivalent: the problem of expressing preference between acts then boils down to expressing preference between lotteries.
We are thus concerned with a DM who has preferences on , the set of all probabilistic lotteries on O, and our task is to find a real-valued utility function such that the DM strictly prefers to if and only if , and the DM is indifferent between and if and only if . We write if the DM strictly prefers to , write if the DM is indifferent between (or equally prefers) and , and write if the DM either strictly prefers to or is indifferent between the two.
Of course, finding such a utility function is not always possible, unless the DM’s preferences satisfy some assumptions. We can then construct a utility function that is linear in the sense that the utility of a lottery is equal to its expected utility , where is regarded as a degenerate lottery where the only possible outcome is with probability 1. In the remainder of this section, we describe a set of assumptions that lead to the existence of such a linear utility function.
Assumption 2.1 (Weak ordering of outcomes).
For any two outcomes and , either or . Also, if and , then . Thus, the preference relation over O is a weak order, i.e., it is complete and transitive.
Given Assumption 2.1, without loss of generality, let us assume that the outcomes are labelled such that , and to avoid trivialities, assume that .
Suppose that is a set of lotteries, where each lottery is over outcomes in , with PMFs for . Suppose is a PMF on L such that for , and . Then is called a compound lottery whose outcome is exactly one lottery (with probability ), and lottery will result in one outcome (with probability ). Notice that the PMF is a conditional PMF for O in the second stage given that lottery is realized (with probability ) in the first stage (see Figure 1). We can compute the joint PMF for , and then compute the marginal p of the joint for O. The following assumption states that the resulting simple lottery is indifferent to the compound lottery .
Assumption 2.2 (Reduction of compound lotteries).
Any compound lottery , , where , is indifferent to a simple non-compound lottery , , where
for . PMF is the marginal for O of the joint PMF of .
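The reduction in Assumption 2.2 amounts to mixing the second-stage PMFs with the first-stage probabilities and reading off the marginal for O. The following is a minimal sketch of that computation; the function name `reduce_compound` and the numbers are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction


def reduce_compound(q, lotteries):
    """Reduce a two-stage lottery to a one-stage PMF on the outcomes.

    q[j]            : probability that simple lottery j is realized (first stage)
    lotteries[j][i] : probability of outcome i in simple lottery j (second stage)
    Returns p with p[i] = sum_j q[j] * lotteries[j][i], i.e. the marginal
    for O of the joint PMF over (lottery, outcome).
    """
    n_outcomes = len(lotteries[0])
    return [
        sum(q[j] * lotteries[j][i] for j in range(len(q)))
        for i in range(n_outcomes)
    ]


# Hypothetical example: two simple lotteries over three outcomes.
q = [Fraction(1, 4), Fraction(3, 4)]
lotteries = [
    [Fraction(1, 2), Fraction(1, 2), Fraction(0)],
    [Fraction(0), Fraction(1, 3), Fraction(2, 3)],
]
p = reduce_compound(q, lotteries)
```

Exact fractions are used so that the reduced PMF can be checked to sum to 1 without rounding error.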
A simple lottery involving only outcomes and with PMF , where , is called a reference lottery, and is denoted by . Let denote the set .
Assumption 2.3 (Continuity).
Each outcome is indifferent to a reference lottery
for some , where , i.e., .
Assumption 2.4 (Weak order).
The preference relation for lotteries in is a weak order, i.e., it is complete and transitive.
Assumption 2.5 (Substitutability).
In any lottery , if we substitute an outcome by the reference lottery that is indifferent to , then the result is a compound lottery that is indifferent to (see Figure 2), i.e.,
Theorem 2.1 (Reducing a lottery to an indifferent reference lottery).
(LuceRaiffa1957 ) First, we replace each by for . Assumption 2.3 (continuity) states that these indifferent lotteries exist, and Assumption 2.5 (substitutability) says that they are substitutable without changing the preference relation. So by using Assumption 2.4 serially, . Now if we apply Assumption 2.2 (reduction of compound lotteries), then , where is given by Eq. (3). ∎
Assumption 2.6 (Monotonicity).
A reference lottery is preferred or indifferent to reference lottery if and only if .
Assumptions 2.1–2.6 allow us to define the utility of a lottery as the probability of the best outcome in an indifferent reference lottery, and this utility function for lotteries on O is linear. This is stated by the following theorem.
Theorem 2.2 (LuceRaiffa1957 ).
Thus, we can define the utility of lottery as , where . Also, such a linear utility function is unique up to a strictly increasing affine transformation, i.e., if , where and are real constants, then also qualifies as a utility function.
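As a small illustration of the linearity and uniqueness statements, the sketch below computes expected utilities for two lotteries and checks that a strictly increasing affine transformation of the outcome utilities preserves their ranking. The utilities and PMFs are hypothetical values chosen for the example.

```python
from fractions import Fraction


def expected_utility(p, u):
    """Expected utility of a lottery with PMF p over outcomes with utilities u."""
    return sum(pi * ui for pi, ui in zip(p, u))


# Hypothetical utilities: best outcome 1, worst outcome 0, one in between.
u = [Fraction(1), Fraction(3, 5), Fraction(0)]

lottery_a = [Fraction(1, 2), Fraction(0), Fraction(1, 2)]  # 50/50 best or worst
lottery_b = [Fraction(0), Fraction(1), Fraction(0)]        # middle outcome for sure

eu_a = expected_utility(lottery_a, u)
eu_b = expected_utility(lottery_b, u)

# Strictly increasing affine transformation v = a*u + b with a > 0.
a, b = 2, 3
v = [a * ui + b for ui in u]
ev_a = expected_utility(lottery_a, v)
ev_b = expected_utility(lottery_b, v)
```

Because each PMF sums to 1, the transformed expected utility is simply `a * eu + b`, so the ranking of the two lotteries cannot change.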
3 Basic Definitions in the D-S Belief Function Theory
In this section, we review the basic definitions in the D-S theory of belief functions. Like various uncertainty theories, D-S belief function theory includes functional representations of uncertain knowledge, and basic operations for making inferences from such knowledge. Most of this material is taken from JirousekShenoy2018b .
3.1 Representations of belief functions
Belief functions can be represented in several different ways, including as basic probability assignments, plausibility functions, and belief functions. These are briefly discussed below. (Footnote 2: Belief functions can also be mathematically represented by a convex set of PMFs called a credal set, but the semantics of such a representation are incompatible with Dempster’s combination rule Shafer1981 ; Shafer1990 ; Shafer1992 ; HalpernFagin1992 . For this reason, we skip the credal set representation of a belief function.)
Definition 1 (Basic Probability Assignment).
Suppose $X$ is an unknown quantity (variable) with possible values (states) in a finite set $\Omega_X$ called the state space of $X$. We assume that $X$ takes one and only one value in $\Omega_X$, but this value is unknown. Let $2^{\Omega_X}$ denote the set of all subsets of $\Omega_X$. A basic probability assignment (BPA) $m$ for $X$ is a function $m: 2^{\Omega_X} \to [0, 1]$ such that
$$\sum_{a \in 2^{\Omega_X}} m(a) = 1, \quad \text{and} \quad m(\emptyset) = 0.$$
The subsets $a \in 2^{\Omega_X}$ such that $m(a) > 0$ are called focal sets of $m$. An example of a BPA for $X$ is the vacuous BPA for $X$, denoted by $\iota_X$, such that $\iota_X(\Omega_X) = 1$. We say that $m$ is deterministic if $m$ has a single focal set (with mass 1). Thus, the vacuous BPA for $X$ is deterministic with focal set $\Omega_X$. If all focal sets of $m$ are singleton subsets (of $\Omega_X$), then we say that $m$ is Bayesian. In this case, $m$ is equivalent to the PMF $P$ for $X$ such that $P(x) = m(\{x\})$ for each $x \in \Omega_X$.
Definition 2 (Plausibility Function).
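The information in a BPA $m$ for a variable with state space $\Omega_X$ can be represented by a corresponding plausibility function $Pl_m$ defined as follows:
$$Pl_m(a) = \sum_{b \in 2^{\Omega_X}:\, b \cap a \neq \emptyset} m(b) \quad \text{for all } a \in 2^{\Omega_X}.$$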
For an example, suppose . Then, the plausibility function corresponding to BPA is given by , , , and .
Definition 3 (Belief Function).
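The information in a BPA $m$ for a variable with state space $\Omega_X$ can also be represented by a corresponding belief function $Bel_m$ that is defined as follows:
$$Bel_m(a) = \sum_{b \in 2^{\Omega_X}:\, b \subseteq a} m(b) \quad \text{for all } a \in 2^{\Omega_X}.$$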
For the example above with , the belief function corresponding to BPA is given by , , , and . For any proposition , it is easy to see that . Thus, if a DM’s belief in proposition a is an interval, say , where and , then such beliefs can be represented by a BPA such that , , and . For such a BPA, .
All three representations—BPA, belief and plausibility functions—have exactly the same information, as any one of them allows us to recover the other two Shafer1976 . Next, we describe the two main operations for making inferences.
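The equivalence of the three representations can be checked numerically. The sketch below stores a BPA as a dictionary from focal sets to masses, computes belief and plausibility from it, and recovers the BPA from the belief function by Möbius inversion. The frame, the masses, and all function names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import chain, combinations


def subsets(s):
    """All subsets of s, as frozensets (including the empty set)."""
    items = list(s)
    return [
        frozenset(c)
        for c in chain.from_iterable(
            combinations(items, k) for k in range(len(items) + 1)
        )
    ]


def belief(m, a):
    """Bel(a): total mass of focal sets contained in a."""
    return sum(mass for b, mass in m.items() if b <= a)


def plausibility(m, a):
    """Pl(a): total mass of focal sets that intersect a."""
    return sum(mass for b, mass in m.items() if b & a)


def bpa_from_belief(bel, frame):
    """Moebius inversion: m(a) = sum over b subset of a of (-1)^|a - b| * Bel(b)."""
    m = {}
    for a in subsets(frame):
        if not a:
            continue
        mass = sum((-1) ** len(a - b) * bel(b) for b in subsets(a))
        if mass != 0:
            m[a] = mass
    return m


# Hypothetical BPA on a three-element frame.
frame = frozenset({"x", "y", "z"})
m = {
    frozenset({"x"}): Fraction(1, 2),
    frozenset({"x", "y"}): Fraction(1, 4),
    frame: Fraction(1, 4),
}
recovered = bpa_from_belief(lambda a: belief(m, a), frame)
```

In this example the proposition {x} has belief 1/2 and plausibility 1, and `recovered` coincides with `m`.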
3.2 Basic operations in the D-S theory
There are two main operations in the D-S theory—Dempster’s combination rule and marginalization.
Dempster’s Combination Rule
In the D-S theory, we can combine two BPAs $m_1$ and $m_2$ representing distinct pieces of evidence by Dempster’s rule Dempster1967 and obtain the BPA $m_1 \oplus m_2$, which represents the combined evidence. Dempster refers to this rule as the product-intersection rule, as the products of the BPA values are assigned to the intersections of the focal sets, followed by normalization. Normalization consists of discarding the mass assigned to the empty set $\emptyset$, and normalizing the remaining values so that they add to 1. In general, Dempster’s rule of combination can be used to combine two BPAs for arbitrary sets of variables.
Let denote a finite set of variables. The state space of is . Thus, if then the state space of is . Projection of states simply means dropping extra coordinates; for example, if is a state of , then the projection of to , denoted by , is simply , which is a state of . Projection of subsets of states is achieved by projecting every state in the subset. Suppose . Then . Notice that .
Vacuous extension of a subset of states of to a subset of states of , where , is a cylinder set extension, i.e., if , then . Thus, if , then .
Definition 4 (Dempster’s rule using BPAs).
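Suppose $m_1$ and $m_2$ are BPAs for the sets of variables $\mathcal{X}_1$ and $\mathcal{X}_2$, respectively. Then $m_1 \oplus m_2$ is a BPA for $\mathcal{X}_1 \cup \mathcal{X}_2 = \mathcal{X}$, say, given by $(m_1 \oplus m_2)(\emptyset) = 0$ and
$$(m_1 \oplus m_2)(a) = K^{-1} \sum_{b_1, b_2:\; b_1^{\uparrow \mathcal{X}} \cap\, b_2^{\uparrow \mathcal{X}} = a} m_1(b_1)\, m_2(b_2)$$
for all non-empty $a \in 2^{\Omega_{\mathcal{X}}}$, where $b_i^{\uparrow \mathcal{X}}$ denotes the vacuous extension of $b_i$ to $\mathcal{X}$, and $K$ is a normalization constant given by
$$K = 1 - \sum_{b_1, b_2:\; b_1^{\uparrow \mathcal{X}} \cap\, b_2^{\uparrow \mathcal{X}} = \emptyset} m_1(b_1)\, m_2(b_2).$$
The definition of Dempster’s rule assumes that the normalization constant $K$ is non-zero. If $K = 0$, then the two BPAs $m_1$ and $m_2$ are said to be in total conflict and cannot be combined. If $K = 1$, we say $m_1$ and $m_2$ are non-conflicting.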
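The product-intersection rule can be sketched in a few lines. For simplicity, the sketch assumes that both BPAs are already defined on the same frame, so no vacuous extension is needed; the function name `dempster` and the example masses are illustrative assumptions.

```python
from fractions import Fraction


def dempster(m1, m2):
    """Combine two BPAs on the same frame by Dempster's rule.

    Returns (combined BPA, K), where K is the normalization constant,
    i.e. one minus the total product mass falling on the empty set.
    """
    raw = {}
    for b1, v1 in m1.items():
        for b2, v2 in m2.items():
            c = b1 & b2  # intersection of the two focal sets
            raw[c] = raw.get(c, 0) + v1 * v2
    conflict = raw.pop(frozenset(), 0)  # mass assigned to the empty set
    k = 1 - conflict
    if k == 0:
        raise ValueError("total conflict: the BPAs cannot be combined")
    return {a: v / k for a, v in raw.items()}, k


# Hypothetical example on the frame {a, b, c}.
frame = frozenset({"a", "b", "c"})
m1 = {frozenset({"a"}): Fraction(3, 5), frame: Fraction(2, 5)}
m2 = {frozenset({"b"}): Fraction(1, 2), frame: Fraction(1, 2)}
combined, k = dempster(m1, m2)
```

Here the product mass 3/10 lands on the empty intersection of {a} and {b}, so K = 7/10 and the remaining masses are rescaled by 10/7.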
Marginalization in D-S theory is addition of values of BPAs.
Definition 5 (Marginalization).
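Suppose $m$ is a BPA for the set of variables $\mathcal{X}$. Then, the marginal of $m$ for $\mathcal{X}_1$, where $\mathcal{X}_1 \subseteq \mathcal{X}$, denoted by $m^{\downarrow \mathcal{X}_1}$, is a BPA for $\mathcal{X}_1$ such that for each $a \in 2^{\Omega_{\mathcal{X}_1}}$,
$$m^{\downarrow \mathcal{X}_1}(a) = \sum_{b \in 2^{\Omega_{\mathcal{X}}}:\; b^{\downarrow \mathcal{X}_1} = a} m(b),$$
where $b^{\downarrow \mathcal{X}_1}$ denotes the projection of $b$ to $\mathcal{X}_1$.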
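Marginalization can be sketched by representing each joint state as a tuple and each focal set as a set of tuples: every focal set is projected by dropping coordinates, and masses of focal sets with the same projection are added. The coordinate-index convention and the example BPA are assumptions made for illustration.

```python
from fractions import Fraction


def project(subset, keep):
    """Project a set of joint states (tuples) onto the coordinates in keep."""
    return frozenset(tuple(state[i] for i in keep) for state in subset)


def marginalize(m, keep):
    """Marginal of a joint BPA: add the masses of focal sets with equal projection."""
    out = {}
    for b, v in m.items():
        a = project(b, keep)
        out[a] = out.get(a, 0) + v
    return out


# Hypothetical joint BPA for (X, Y), with X in {x, nx} and Y in {y, ny}.
m_xy = {
    frozenset({("x", "y")}): Fraction(1, 2),
    frozenset({("x", "y"), ("nx", "y")}): Fraction(1, 4),
    frozenset({("x", "y"), ("x", "ny")}): Fraction(1, 4),
}
m_x = marginalize(m_xy, keep=(0,))  # marginal for X
m_y = marginalize(m_xy, keep=(1,))  # marginal for Y
```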
3.3 Conditional belief functions
Suppose that there is a BPA for expressing our belief about if we know that , and denote it by . Notice that is such that . We can embed this conditional BPA for into a BPA for , which is denoted by , such that the following three conditions hold. First, tells us nothing about , i.e., . Second, tells us nothing about , i.e., . Third, if we combine with the deterministic BPA for such using Dempster’s rule, and marginalize the result to we obtain , i.e., . The least committed way to obtain such an embedding, called conditional embedding, was derived by Smets Smets1978 ; Smets1993 (see also Shafer1982 ). It consists of taking each focal set of , and converting it to a corresponding focal set of (with the same mass) as follows: , where denotes the complement of in . It is easy to confirm that this method of embedding satisfies the three conditions mentioned above, and is the least committed (minimally informative) BPA verifying this property.
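The conditional embedding and two of its defining properties can be verified on a small case. The sketch below uses two binary variables and a hypothetical conditional BPA (the masses 4/5 and 1/5 are assumptions, not the values of the paper's example); it checks that the embedded BPA is vacuous on the conditioning variable, and that combining it with the deterministic BPA for the conditioning state and marginalizing recovers the conditional.

```python
from fractions import Fraction
from itertools import product

OMEGA_X = ("x", "nx")
OMEGA_Y = ("y", "ny")


def conditional_embedding(m_y_given_x, x):
    """Map each focal set b of the conditional to ({x} x b) | (complement of {x} x Omega_Y)."""
    others = frozenset((xx, yy) for xx in OMEGA_X if xx != x for yy in OMEGA_Y)
    return {
        frozenset((x, yy) for yy in b) | others: v
        for b, v in m_y_given_x.items()
    }


def dempster(m1, m2):
    """Dempster's rule for two BPAs on the same (joint) frame."""
    raw = {}
    for b1, v1 in m1.items():
        for b2, v2 in m2.items():
            raw[b1 & b2] = raw.get(b1 & b2, 0) + v1 * v2
    k = 1 - raw.pop(frozenset(), 0)
    if k == 0:
        raise ValueError("total conflict")
    return {a: v / k for a, v in raw.items()}


def marginal(m, index):
    """Marginal of a joint BPA for coordinate index (0 for X, 1 for Y)."""
    out = {}
    for b, v in m.items():
        a = frozenset(state[index] for state in b)
        out[a] = out.get(a, 0) + v
    return out


# Hypothetical conditional BPA for Y given X = x.
m_y_given_x = {frozenset({"y"}): Fraction(4, 5), frozenset(OMEGA_Y): Fraction(1, 5)}
embedded = conditional_embedding(m_y_given_x, "x")

# Deterministic BPA for X = x, written on the joint frame.
m_x_is_x = {frozenset(product(["x"], OMEGA_Y)): Fraction(1)}
recovered = marginal(dempster(embedded, m_x_is_x), 1)
```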
Example 1 (Conditional embedding).
Consider discrete variables and , with and . Suppose that is a BPA for such that . If we have a conditional BPA for given as follows:
then its conditional embedding into BPA for is
There are some differences with conditional probability distributions. First, in probability theory, consists of all conditional distributions that are well-defined, i.e., for all such that . In D-S belief function theory, we do not have similar constraints. We can include only those non-vacuous conditionals such that . Also, if we have more than one conditional BPA for , given, say for , and (assuming , and ), we embed these two conditionals for to get BPAs and for , and then combine them using Dempster’s rule of combination to obtain one conditional BPA , which corresponds to in probability theory.
Second, given any joint PMF for , we can always factor this into for , and for , such that . This is not true in D-S belief function theory. Given a joint BPA for , we cannot always find a BPA for such that . However, we can always construct a joint BPA for by first assessing for , and assessing conditionals for for those that we have knowledge about and such that , embedding these conditionals into BPAs for , and combining all such BPAs to obtain the BPA for . An implicit assumption here is that these BPAs are distinct, and it is acceptable to combine them using Dempster’s rule. We can then construct .
4 A Utility Theory for D-S Belief Function Theory
In this section, we describe a new utility theory for lotteries where the uncertainty is described by D-S belief functions. These lotteries, called belief function lotteries, will be introduced in Section 4.1. (Footnote 3: This notion was previously introduced in Denoeux2019 under the name “evidential lottery.”) We present and discuss assumptions in Section 4.2 and state a representation theorem in Section 4.3. Finally, we describe a practical model allowing us to compute the utility of a bf lottery based on two parameters in Section 4.4.
4.1 Belief function lotteries
We generalize the decision framework outlined in Section 2 by assuming that uncertainty about the state of nature is described by a BPA for . The probabilistic framework is recovered as a special case when is Bayesian. As before, we define an act as a mapping from to the set O of outcomes. Mapping pushes forward from to O, transferring each mass for to the image of subset a by , denoted as . The resulting BPA for O is then defined as
for all Denoeuxetal2019 . Eq. (13) clearly generalizes Eq. (1). The pair will be called a belief function (bf) lottery. As before, we assume that two acts can be compared from what we believe their outcomes will be, irrespective of the evidence on which we base our beliefs. This assumption is a form of what Wakker Wakker2000 calls the principle of complete ignorance (PCI). It implies that two acts resulting in the same bf lottery are equivalent. The problem of expressing preferences between acts becomes that of expressing preferences between bf lotteries.
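The push-forward of a BPA by an act can be sketched as follows: each focal set of states is replaced by its image under the act, and masses of focal sets with the same image are added. The state and outcome labels, the masses, and the function name are illustrative assumptions.

```python
from fractions import Fraction


def push_forward(m, act):
    """Transfer each mass m(a) to the image of a under the act (a dict state -> outcome)."""
    out = {}
    for a, v in m.items():
        image = frozenset(act[s] for s in a)
        out[image] = out.get(image, 0) + v
    return out


# Hypothetical act on three states with two outcomes.
act = {"s1": "win", "s2": "lose", "s3": "win"}

# A Bayesian BPA: the push-forward is the usual probabilistic lottery.
m_bayes = {
    frozenset({"s1"}): Fraction(1, 2),
    frozenset({"s2"}): Fraction(1, 4),
    frozenset({"s3"}): Fraction(1, 4),
}
lottery_bayes = push_forward(m_bayes, act)

# A non-Bayesian BPA: a non-singleton focal set can map to several outcomes.
m_bf = {frozenset({"s1"}): Fraction(1, 2), frozenset({"s2", "s3"}): Fraction(1, 2)}
lottery_bf = push_forward(m_bf, act)
```

In the Bayesian case the result is again Bayesian, which is the sense in which the probabilistic framework is a special case.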
As a consequence of the PCI, preferences between acts do not depend on the cardinality of the state space in case of complete ignorance. For instance, assume that we define , and we are completely ignorant of the state of nature, so that our belief state is described by the vacuous BPA . Consider two acts and that yield if, respectively, or occurs, and otherwise. These two acts induce the same vacuous bf lottery with : consequently, they are equivalent according to the PCI. Now, assume that we decide to express the states of nature with finer granularity and we refine state into two states and . Let denote the refined frame. We still have and , so that our preferences between acts and are unchanged. We note that a Bayesian DM applying Laplace’s principle of indifference (PI) would reach a different conclusion: before the refinement, the PI implies , which results in the same probabilistic lottery on for the two acts, but after the refinement the same principle gives us ; this results in two different lotteries for act and for act , which makes strictly preferable to . Considering that the granularity of the state space is often partly arbitrary as discussed by Shafer in Shafer1976 , we regard this property of invariance to refinement under complete ignorance as a valuable feature of a decision theory based on D-S belief functions.
We are thus concerned with a DM who has preferences on , the set of all bf lotteries. Our task is to find a utility function , where denotes the set of closed real intervals, such that is viewed as an interval-valued utility of . The interval-valued utility can be interpreted as follows: and are, respectively, the degrees of belief and plausibility of receiving the best outcome in a bf reference lottery equivalent to . Given two lotteries and , is preferred to if and only if and . This leads to incomplete preferences on the set of all bf lotteries. If we assume for all bf lotteries, then we have a real-valued utility function on , and consequently, complete preferences.
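To see how interval-valued utilities produce an incomplete order, consider the sketch below. It assumes that the comparison is made endpoint by endpoint, i.e., one lottery is weakly preferred to another when both its lower and its upper utility are at least as large; this reading, the function name, and the numbers are assumptions made for illustration.

```python
def compare(u1, u2):
    """Compare two interval-valued utilities (lower, upper).

    Assumed rule: u1 is weakly preferred to u2 when both endpoints of u1
    are at least as large as the corresponding endpoints of u2.
    """
    lo1, hi1 = u1
    lo2, hi2 = u2
    weakly_better = lo1 >= lo2 and hi1 >= hi2
    weakly_worse = lo2 >= lo1 and hi2 >= hi1
    if weakly_better and weakly_worse:
        return "indifferent"
    if weakly_better:
        return "preferred"
    if weakly_worse:
        return "dispreferred"
    return "incomparable"


result_dominance = compare((0.4, 0.6), (0.3, 0.5))  # both endpoints larger
result_nested = compare((0.2, 0.8), (0.4, 0.6))     # one interval inside the other
result_precise = compare((0.5, 0.5), (0.7, 0.7))    # degenerate intervals
```

Nested intervals are incomparable under this rule, whereas degenerate intervals (equal lower and upper utility) are always comparable, which matches the remark that real-valued utilities give complete preferences.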
Example 2 (Ellsberg’s Urn).
Ellsberg Ellsberg1961 describes a decision problem that questions the adequacy of the vN-M axiomatic framework. Suppose we have an urn with 90 balls, of which 30 are red, and the remaining 60 are either black or yellow. We draw a ball at random from the urn. Let denote the color of the ball drawn, with . Notice that the uncertainty of can be described by a BPA for such that , and .
First, we are offered a choice between Lottery : on red, and Lottery : on black, i.e., in , you get if the ball drawn is red, and if the ball drawn is black or yellow, and in , you get if the ball drawn is black and if the ball drawn is red or yellow. Choice of can be denoted by alternative such that , . Similarly, choice of can be denoted by alternative such that , . can be represented by the BPA for = as follows: , . can be represented by BPA for as follows: , . Notice that and are bf lotteries. Ellsberg notes that a frequent pattern of response is preferred to .
Second, we are offered a choice between : on red or yellow, and : on black or yellow, i.e., in you get if the ball drawn is red or yellow, and if the ball drawn is black, and in , you get if the ball drawn is black or yellow, and if the ball drawn is red. can be represented by BPA as follows: , and , and can be represented by the BPA as follows: , . and are also belief function lotteries. Ellsberg notes that is often strictly preferred to . Also, the same subjects who prefer to , prefer to . Table 1 is a summary of the four bf lotteries.
Thus, if the outcomes of a lottery are based on the states of a random variable, which is described by a BPA for , then we have a belief function lottery. In this example, we have only two outcomes, , and . and can also be regarded as probabilistic lotteries as the corresponding BPAs are Bayesian. and have BPAs with non-singleton focal sets. Thus, these two lotteries can be considered as involving “ambiguity” as the exact distribution of the probability of between outcomes and is unknown. Regardless of how the probability of is distributed between and , the preferences of subjects violate the tenets of vN-M utility theory.
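The four lotteries of Example 2 can be reproduced by pushing the urn BPA forward through each act. The sketch below uses the masses 1/3 on red and 2/3 on {black, yellow} implied by the urn composition; the outcome labels "win" and "lose" stand in for the two monetary outcomes and are assumptions, as are the function names. The intervals computed are the belief and plausibility of the winning outcome, not the utility intervals of the theory developed later.

```python
from fractions import Fraction

# BPA for the colour of the ball: 30 of 90 are red, 60 are black or yellow.
m_colour = {
    frozenset({"r"}): Fraction(1, 3),
    frozenset({"b", "y"}): Fraction(2, 3),
}


def push_forward(m, act):
    """Transfer each mass to the image of its focal set under the act."""
    out = {}
    for a, v in m.items():
        image = frozenset(act[s] for s in a)
        out[image] = out.get(image, 0) + v
    return out


def bel(m, a):
    return sum(v for b, v in m.items() if b <= a)


def pl(m, a):
    return sum(v for b, v in m.items() if b & a)


acts = {
    "L1": {"r": "win", "b": "lose", "y": "lose"},  # win on red
    "L2": {"r": "lose", "b": "win", "y": "lose"},  # win on black
    "L3": {"r": "win", "b": "lose", "y": "win"},   # win on red or yellow
    "L4": {"r": "lose", "b": "win", "y": "win"},   # win on black or yellow
}

WIN = frozenset({"win"})
lotteries = {name: push_forward(m_colour, act) for name, act in acts.items()}
intervals = {name: (bel(m, WIN), pl(m, WIN)) for name, m in lotteries.items()}
```

The first and fourth lotteries come out Bayesian, with degenerate intervals 1/3 and 2/3, while the second and third have a focal set containing both outcomes and hence non-degenerate intervals [0, 2/3] and [1/3, 1].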
Table 1: The four bf lotteries of Example 2 (payoff on red; payoff on black; payoff on red or yellow; payoff on black or yellow).
4.2 Assumptions of our framework
As in the probabilistic case, we will assume that a DM’s preferences for bf lotteries are reflexive and transitive. However, unlike the probabilistic case (Assumption 2.4), we do not assume that these preferences are complete. In the probabilistic case, incomplete preferences are studied in Aumann1962 , and in the case of sets of utility functions, in Dubraetal2004 .
Our first assumption is identical to Assumption 2.1.
Assumption 4.1 (Weak ordering of outcomes).
The DM’s preferences for outcomes in are complete and transitive.
This allows us to label the outcomes such that
Let denote the set of all bf lotteries on = , where the outcomes satisfy Eq. (14). As every BPA on is a bf lottery, is essentially the set of all BPAs on . As the set of all BPAs includes Bayesian BPAs, the set is a superset of , i.e., every probabilistic lottery on can be considered a bf lottery.
Consider a compound lottery , where , is a BPA for L, and is a bf lottery on O, where is a conditional BPA for O in the second stage given that lottery is realized in the first stage. Assumption 4.2 below posits that we can reduce the compound lottery to a simple bf lottery on O using the D-S calculus, and that the compound lottery is equally preferred to the reduced simple lottery on O.
Assumption 4.2 (Reduction of compound lotteries).
Suppose is a compound lottery as described in the previous paragraph. Then, , where
and is a BPA for obtained from by conditional embedding, for .
Let be a set of bf lotteries, with , in which is a Bayesian conditional BPA for O such that and for . Let be a compound lottery in which is a Bayesian BPA for L such that for with . Then BPA defined by (15) is Bayesian and it verifies
The conditional embedding of is given by
Let . It is a BPA for defined by
for all . Combining with , we get a Bayesian BPA on such that
After marginalizing on O, we finally get Eq. (16). ∎
Next, we define a bf reference lottery as a bf lottery on . A bf reference lottery has three parameters , , and , which are all non-negative and sum to 1. The following assumption states that any deterministic bf lottery is equally preferred to some bf reference lottery.
Assumption 4.3 (Continuity).
Any subset of outcomes (considered as a deterministic bf lottery) is indifferent to a bf reference lottery such that
where , and . Furthermore, if is a singleton subset.
Notice that , and . For singleton subsets, the equivalent bf reference lottery is Bayesian: this ensures that Assumption 4.3 is a generalization of Assumption 2.3. For non-singleton subsets a of outcomes, we may have , i.e., the bf reference lottery may not be Bayesian. In other words, we do not assume that ambiguity can be resolved by selecting an equivalent probabilistic reference lottery.
Consider lottery in Example 2, where , and . Suppose we wish to assess the utility of focal set using a probabilistic reference lottery . A DM may have the following preferences. For any she prefers to the probabilistic reference lottery, and for any , she prefers the probabilistic reference lottery to . However, she is unable to give us a precise such that . For such a DM, we can assess a bf reference lottery such that and , i.e., , , and .
Assumption 4.4 (Quasi-order).
The preference relation for bf lotteries on is a quasi-order, i.e., it is reflexive and transitive.
In contrast with the probabilistic case (Assumption 2.4), we do not assume that is complete. There are many reasons we may not wish to assume completeness. It is not descriptive of human behavior. Even from a normative point of view, it is questionable that a DM has complete preferences on all possible lotteries. The assumption of incomplete preferences is consistent with the D-S theory of belief functions where we have non-singleton focal sets. Several authors, such as Aumann Aumann1962 , and Dubra et al. Dubraetal2004 argue why the assumption of complete preferences may not be realistic in many circumstances.
The substitutability assumption is similar to that of the probabilistic case (Assumption 2.5): we replace an outcome in the probabilistic case by a focal set of in the bf case.
Assumption 4.5 (Substitutability).
In any bf lottery , if we substitute a focal set a of by an equally preferred bf reference lottery , then the result is a compound lottery that is equally preferred to .