Recently, Samantha Leung and I [Halpern and Leung 2012] suggested representing uncertainty by a weighted set of probability measures, and suggested a way of making decisions based on this representation of uncertainty: minimizing the maximum weighted expected regret. However, we did not answer an apparently simpler question: given this representation of uncertainty, what does it mean for an event U to be more likely than an event V? This is what I do in this paper. To explain the issues, I start by reviewing the Halpern-Leung approach.
It has frequently been observed that there are many situations where an agent's uncertainty is not adequately described by a single probability measure. Specifically, a single measure may not be adequate for representing an agent's ignorance. For example, there seems to be a big difference between a coin known to be fair and a coin whose bias an agent does not know, yet if the agent were to use a single measure to represent her uncertainty, in both of these cases it would seem that the measure that assigns heads probability 1/2 would be used.
One approach that has been suggested for representing ignorance is to use a set P of probability measures [Halpern 2003]. That approach has the benefit of representing uncertainty in general, not by a single number, but by a range of numbers. This allows us to distinguish the certainty that a coin is fair (in which case the uncertainty of heads is represented by the single number 1/2) from knowing only that the probability of heads could be anywhere between, say, 1/3 and 2/3.
But this approach also has its problems. For example, consider an agent who believes that a coin may have a slight bias. Thus, although it is unlikely to be completely fair, it is close to fair. How should we represent this with a set of probability measures? Suppose that the agent is quite sure that the bias is between 1/3 and 2/3. We could, of course, take P to consist of all the measures that give heads probability between 1/3 and 2/3. But how does the agent know that the possible biases are exactly between 1/3 and 2/3? Does she not consider a bias of 1/3 − ε or 2/3 + ε possible, for some small ε > 0? And even if she is confident that the bias is between 1/3 and 2/3, this representation cannot take into account the possibility that she views biases closer to 1/2 as more likely than biases further from 1/2.
There is also a second well-known concern: learning. Suppose that the agent initially considers possible all the measures that give heads probability between 1/3 and 2/3. She then starts tossing the coin, and sees that, of the first 20 tosses, 12 are heads. It seems that the agent should then consider a bias of greater than 1/2 more likely than a bias of less than 1/2. But if we use the standard approach to updating with sets of probability measures (see [Halpern 2003]), and condition each of the measures on the observation, then, since the coin tosses are viewed as independent, the agent will continue to believe that the probability of the next coin toss landing heads is between 1/3 and 2/3. The observation has no impact as far as learning to predict better. The set stays the same, no matter what observation is made.
There is a well-known solution to these problems: using a second-order measure on these measures to express how likely the agent considers each of them to be. (See [Good 1980] for a discussion of this approach and further references.) For example, an agent can express the fact that the bias of a coin is more likely to be close to 1/2 than far from 1/2. In addition, the problem of learning can be dealt with by straightforward conditioning. But this approach leads to other problems. Essentially, the ambiguity that an agent might feel about the outcome of the coin toss seems to have disappeared. For example, suppose that the agent has no idea what the bias is. The obvious second-order probability to use is the uniform probability on possible biases. While we cannot talk about the probability that the coin lands heads (there is a set of probabilities, after all, not a single probability), the expected probability of heads is 1/2. Why should an agent that has no idea of the bias of the coin know or believe that the expected probability of heads is 1/2? Of course, if one had to use a single probability measure to describe uncertainty, symmetry considerations dictate that it should be the one that ascribes equal likelihood to heads and tails; similarly, if one had to put a single second-order probability on the set of possible biases, uniform probability seems like the most obvious choice. Moreover, if our interest is in making decisions, then maximizing the expected utility using the expected probability again does not take the agent's ignorance into account. Kyburg [1988] and Pearl [1987] have even argued that there is no need for a second-order probability on probabilities; whatever can be done with a second-order probability can already be done with a basic probability.
Nevertheless, when it comes to decision-making, it does seem useful to use an approach that represents ambiguity, while still maintaining some of the features of having a second-order probability on probabilities. One suggestion, made by Walley [1997], is to put a second-order possibility measure on probability measures. Leung and I similarly suggested putting weights on each probability measure in P. Since we assumed that the weights are normalized so that the supremum of the weights is 1, these weights can also be viewed as a possibility measure. If the set P is finite, we can also normalize so as to view the weights as second-order probabilities. As with second-order probabilities, the weights can vary over time, as more information is acquired. For example, we can start with a state of complete ignorance (modeled by assuming that all probability measures have weight 1), and update the weights after making an observation ob: we take the new weight of a measure μ to be the relative likelihood of ob if μ were the true measure. (See Section 2 for details.) With this approach, if there is a true underlying measure generating the data, then, over time, the weight of the true measure approaches 1, while the weight of all other measures approaches 0. Thus, this approach allows learning in a natural way. If, for example, the actual bias of the coin in the example above was a, then no matter what the initial weights, as long as the measure giving the coin bias a had positive weight, its weight would almost surely converge to 1 as more observations were made, while the weight of all other measures would approach 0. This, of course, is exactly what would happen if we had a second-order probability on P. The weights can also be used to represent the fact that some probabilities in the set are more likely than others.
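This style of likelihood-based weight updating is easy to simulate. The sketch below is a minimal illustration, not the paper's formal construction: the three-point grid of candidate biases and the 12-heads-in-20-tosses observation are assumptions chosen for the example. Each candidate measure's weight becomes its likelihood of the observation, normalized so that the best-supported measure has weight 1:

```python
def likelihood(bias, heads, tails):
    # probability of one particular toss sequence with the given counts
    return bias ** heads * (1 - bias) ** tails

def update_weights(biases, heads, tails):
    # likelihood updating: new weight = likelihood of the observation,
    # normalized so that the best-supported measure keeps weight 1
    like = {b: likelihood(b, heads, tails) for b in biases}
    top = max(like.values())
    return {b: lk / top for b, lk in like.items()}

biases = [1/3, 1/2, 2/3]          # hypothetical grid of candidate biases
weights = update_weights(biases, heads=12, tails=8)
# Bias 2/3 best explains 12 heads in 20 tosses, so it keeps weight 1, while
# the weight of the bias-1/3 measure drops to
# (1/3)^12 (2/3)^8 / ((2/3)^12 (1/3)^8) = 1/16.
```

With more observations from a fixed true bias, the ratio for every other bias decays geometrically, which is the convergence behavior described above.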
What makes this approach different from just using a second-order probability on P lies in how decisions are made. Leung and I used regret, a standard approach to decision-making that goes back to Niehans [1948] and Savage [1951]. If uncertainty is represented by a set P of probability measures, then regret works as follows: for each act a and each measure μ ∈ P, we can compute the expected regret of a with respect to μ; this is the difference between the expected utility of a and the expected utility of the act that gives the highest expected utility with respect to μ. We can then associate with an act a its worst-case expected regret, taken over all measures μ ∈ P, and compare acts with respect to their worst-case expected regret. With weights in the picture, we modify the procedure by multiplying the expected regret associated with measure μ by the weight of μ, and compare acts according to their worst-case weighted expected regret. This approach to making decisions is very different from that suggested by Walley [1997]. Moreover, using the weights in this way means that we cannot simply replace a set of weighted probability measures by a single probability measure; the objections of Kyburg [1988] and Pearl [1987] do not apply.
Leung and I [Halpern and Leung 2012] show that this approach seems to do reasonable things in a number of examples of interest, and provide an elegant axiomatization of decision-making based on it. So how can we represent relative likelihood using this approach? This is something not considered in earlier papers using sets of weighted probabilities. If uncertainty is represented by a single probability measure, the answer is immediate: U is more likely than V exactly if the probability of U is greater than the probability of V. When using sets of probability measures, various approaches have been considered in the literature. The most common takes U to be more likely than V if the lower probability of U is greater than the lower probability of V, where the lower probability of U is its worst-case probability, taken over the measures in P (see Section 3). We could also compare U and V with respect to their upper probabilities (the best-case probability with respect to the measures in P). Another possibility is to take U to be more likely than V if μ(U) ≥ μ(V) for all measures μ ∈ P; this gives a partial order on likelihood.
In this paper, I define a notion of relative likelihood when uncertainty is represented by a weighted set of probability measures; it generalizes the ordering defined by lower probability in a natural way. I also define a generalization of upper probability. We can then associate with an event U two numbers that are analogues of lower and upper probability. If uncertainty is represented by a single measure, then these two numbers coincide; in general, they do not. The interval between them can be thought of as representing the degree of ambiguity in the likelihood of U. Indeed, in the special case when all the weights are 1, the numbers are essentially just the lower and upper probability (technically, they are 1 minus the lower and upper probability, respectively). Interestingly, the approach to assigning likelihood is based on the approach to decision-making. Essentially, what I am doing is the analogue of defining probability in terms of expected utility, rather than the other way around. The approach can be viewed as generalizing both probability and lower probability, while at the same time allowing a natural approach to updating.
Why should we be interested in such a representation? If all that we ever did with probability was to use it to make decisions, then arguably this wouldn't be of much interest; Halpern and Leung's work already shows how weighted sets of probabilities can be used in decision-making, and the results of this paper add nothing further to that question. However, we often talk about the likelihood of events quite independent of its use in decision-making (think of the use of probability in physics, to take just one example). Thus, having an analogue of probability for weighted sets of probability measures seems important and useful in its own right.
The rest of this paper is organized as follows. After reviewing the relevant material from [Halpern and Leung 2012] in Section 2, I define regret-based likelihood in Section 3 and compare it to lower probability. I provide an axiomatic characterization of regret-based likelihood in Section 4, and show how it relates to the axiomatic characterization of lower probability. I conclude in Section 5.
2 Weighted Expected Regret: A Review
Consider the standard setup in decision theory. We have a state space S and an outcome space O. An act is a function from S to O; it describes an outcome for each state. Suppose that we have a utility function u on outcomes and a set P+ of weighted probability measures. That is, P+ consists of pairs (μ, α), where α ∈ (0, 1] is a weight and μ is a probability measure on S. Let P = {μ : (μ, α) ∈ P+ for some α}. For each μ ∈ P, there is assumed to be exactly one weight, denoted α_μ, such that (μ, α_μ) ∈ P+. It is further assumed that the weights have been normalized so that there is at least one measure μ ∈ P such that α_μ = 1. (The assumption that at least one probability measure has a weight of 1 is convenient for comparison to other approaches; see below. However, making this assumption has no impact on the results of this paper; as long as we restrict to sets where the supremum of the weights is positive and finite, all the results hold without change. Note that the alternative assumption that the weights themselves form a probability runs into difficulties if we have an infinite number of measures in P; for example, if P includes all the measures that give heads probability between 1/3 and 2/3, as discussed in the Introduction, then, using a uniform probability, we would be forced to assign each individual probability measure a weight of 0, which would not work well for our later definitions.) Finally, P+ is assumed to be weakly closed: if (μ_n, α_n) ∈ P+ for all n, (μ_n, α_n) → (μ, α), and α > 0, then (μ, α) ∈ P+. (I discuss below why I require P+ to be just weakly closed, rather than closed.)
Where are the weights in P+ coming from? In general, they can be viewed as subjective, just like the probability measures. However, as observed in [Halpern and Leung 2012], there is an important special case where the weights can be given a natural interpretation. Suppose that, as in the case of the biased coin in the Introduction, we make observations in a situation where the probability of making a given observation is determined by some objective source. Then we can start by giving all probability measures a weight of 1. Given an observation ob (e.g., a sequence of coin tosses in the example in the Introduction), we can compute Pr_μ(ob) for each measure μ ∈ P; we can then update the weight of μ to be Pr_μ(ob) / sup_{μ' ∈ P} Pr_{μ'}(ob). Thus, the more likely the observation is according to μ, the higher the updated weight of μ. (The idea of putting a possibility on the probabilities in P that is determined by likelihood also appears in the work of Moral [1992], although he does not consider a general approach to dealing with sets of weighted probability measures. The denominator is just a normalization to ensure that some measure has weight 1.) With this approach to updating, if there is a true underlying measure generating the data, then, as an agent makes more observations, almost surely, the weight of the true measure approaches 1, while the weight of all other measures approaches 0. (The "almost surely" is due to the fact that it is possible, although with probability approaching 0 as more and more observations are made, that an agent will make misleading observations that are not representative of the true measure. This also depends on the set of possible observations being rich enough to allow the agent to ultimately discover the true measure generating the observations. Since learning is not a focus of this paper, I do not make this notion of "rich enough" precise here.)
I now review the definition of weighted regret, and introduce the notion of absolute (weighted) regret. I start with regret. The regret of an act a in a state s is the difference between the utility of the best act at state s and the utility of a at s. Typically, the act a is not compared to all acts, but only to the acts in a set M, called a menu. Thus, the regret of a in state s relative to menu M, denoted reg_M(a, s), is sup_{a' ∈ M} u(a'(s)) − u(a(s)). (Recall that if X is a set of real numbers, then sup X, the supremum of X, is the smallest real number that is greater than or equal to all the elements of X. If X is finite, then the sup is the same as the max. But if X is, say, the open interval (0, 1), then sup X = 1. Similarly, inf X, the infimum of X, is the largest real number that is less than or equal to all the elements of X.) There are typically some constraints put on M to ensure that reg_M(a, s) is finite—this is certainly the case if M is finite, or the convex closure of a finite set of acts, or if there is a best possible outcome in the outcome space O. The latter assumption holds in this paper, so I assume throughout that reg_M(a, s) is finite.
For simplicity, I assume that the state space S is finite. Given a probability measure μ on S, the expected regret of an act a with respect to μ relative to menu M is just E_μ[reg_M(a, ·)] = Σ_{s ∈ S} μ(s) reg_M(a, s). The (expected) regret of a with respect to a set P of probability measures and a menu M is just the worst-case expected regret, that is,
reg_{M,P}(a) = sup_{μ ∈ P} E_μ[reg_M(a, ·)].
Similarly, the weighted (expected) regret of a with respect to a set P+ of weighted probability measures and a menu M is just the worst-case weighted expected regret, that is,
reg_{M,P+}(a) = sup_{(μ,α) ∈ P+} α E_μ[reg_M(a, ·)].
Thus, regret is just a special case of weighted regret, where all weights are 1.
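In the finite case, these definitions translate directly into code. The sketch below is illustrative only: states are labels, acts and measures are dictionaries from states to utilities and probabilities respectively, and the particular numbers are made up for the example.

```python
def regret(act, menu, state):
    # reg_M(a, s): shortfall of a at s relative to the best act in menu M
    return max(b[state] for b in menu) - act[state]

def weighted_regret(act, menu, weighted_measures, states):
    # worst-case weighted expected regret over all (measure, weight) pairs
    return max(
        alpha * sum(mu[s] * regret(act, menu, s) for s in states)
        for mu, alpha in weighted_measures
    )

states = ["s1", "s2"]
a = {"s1": 1, "s2": 0}                    # per-state utilities (toy values)
b = {"s1": 0, "s2": 1}
menu = [a, b]
pplus = [({"s1": 0.5, "s2": 0.5}, 1.0)]   # one measure with weight 1
print(weighted_regret(a, menu, pplus, states))   # 0.5
```

Passing a set of pairs that all carry weight 1.0 recovers ordinary (unweighted) regret, matching the remark above that regret is the special case of weighted regret with all weights 1.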
Note that, as far as weighted regret goes, it does not hurt to augment a set P+ of weighted probability measures by adding pairs of the form (μ, α) for α < α_μ; such pairs never affect the supremum. But if we start with an unweighted set P of probability measures and give every measure weight 1, the resulting weighted set is not closed in general, although it is weakly closed. There may well be a sequence (μ_n, α_n) ∈ P+ with μ_n → μ for some μ ∉ P and α_n → 0; but then we would have (μ_n, α_n) converging to (μ, 0). This is exactly why I required P+ only to be weakly closed. Note for future reference that, since P+ is assumed to be weakly closed, if reg_{M,P+}(a) > 0, then there is some element (μ, α) ∈ P+ at which the supremum defining reg_{M,P+}(a) is attained.
Weighted regret induces an obvious preference order on acts: act a is at least as good as a' with respect to P+ and M, written a ⪰_{M,P+} a', if reg_{M,P+}(a) ≤ reg_{M,P+}(a'). As usual, I write a ≻_{M,P+} a' if a ⪰_{M,P+} a' but it is not the case that a' ⪰_{M,P+} a. The standard notion of regret is the special case of weighted regret where all weights are 1. I sometimes write ⪰_{M,P} to denote the unweighted case (i.e., where all the weights in P+ are 1).
In this setting, using weighted regret gives an approach that allows an agent to transition smoothly from regret to expected utility. It is well known that regret generalizes expected utility in the sense that if P is a singleton {μ}, then a ⪰_{M,{μ}} a' iff E_μ[u(a)] ≥ E_μ[u(a')] (where E_μ[u(a)] denotes the expected utility of act a with respect to the probability μ). (This follows from the observation that, given a menu M, there is a constant c_M such that, for all acts a, reg_{M,{μ}}(a) = c_M − E_μ[u(a)]. In particular, this means that if P is a singleton, regret is menu independent.) If we start with all the weights being 1, then, as observed above, the weighted regret is just the standard notion of regret. As the agent makes observations, if there is a measure μ* generating the observations, the weights will get closer and closer to a situation where μ* gets weight 1, with the weights of all other measures dropping off quickly to 0, so the ordering of acts will converge to the ordering given by expected utility with respect to μ*.
There is another approach with some similar properties, which again starts with uncertainty represented by a set P of (unweighted) probability measures. Define E_P[u(a)] = inf_{μ ∈ P} E_μ[u(a)]. Thus, E_P[u(a)] is the worst-case expected utility of a, taken over all μ ∈ P. Then define a ⪰ a' if E_P[u(a)] ≥ E_P[u(a')]. This is the maxmin expected utility rule, quite often used in economics [Gilboa and Schmeidler 1989]. There are difficulties in getting a weighted version of maxmin expected utility [Halpern and Leung 2012] (see Section 3); however, Epstein and Schneider [2005] propose another approach that can be combined with maxmin expected utility. They fix a parameter β, and update after an observation ob by retaining only those measures μ for which the likelihood of ob according to μ is at least β times the highest likelihood of ob over the measures in P. For any choice of β, we again end up converging almost surely to a single measure, so again this approach converges almost surely to expected utility.
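The maxmin rule itself is a one-line computation. In this sketch (acts encoded as per-state utility dictionaries; all of the numbers are illustrative, not from the text), maxmin expected utility prefers a constant "safe" act to a risky one:

```python
def maxmin_value(act, measures, states):
    # worst-case expected utility over an (unweighted) set of measures
    return min(sum(mu[s] * act[s] for s in states) for mu in measures)

states = ["s1", "s2"]
a = {"s1": 1, "s2": 0}              # risky act
b = {"s1": 0.4, "s2": 0.4}          # safe act with constant utility 0.4
measures = [{"s1": 0.5, "s2": 0.5}, {"s1": 0.1, "s2": 0.9}]
# Maxmin prefers b: the worst case of a is 0.1, the worst case of b is 0.4.
print(maxmin_value(a, measures, states))   # 0.1
print(maxmin_value(b, measures, states))   # 0.4
```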
I conclude this section with a discussion of menu dependence. Maxmin expected utility is not menu dependent; the preference ordering on acts induced by regret can be, as the following example illustrates.
: Take the outcome space O to be {0, 1}, and the utility function to be the identity, so that u(0) = 0 and u(1) = 1. As usual, if U ⊆ S, then X_U denotes the indicator function on U, where, for each state s, we have X_U(s) = 1 if s ∈ U, and X_U(s) = 0 if s ∉ U. Let S = {s1, s2, s3}, P = {μ1, μ2}, M1 = {X_{U1}, X_{U2}}, M2 = {X_{U1}, X_{U2}, X_{U3}}, U1 = {s1}, U2 = {s2}, and U3 = {s3}, where, identifying a measure μ with the tuple (μ(s1), μ(s2), μ(s3)), μ1 = (1/2, 1/2, 0) and μ2 = (0, 1/2, 1/2). A straightforward calculation shows that E_{μ1}[reg_{M1}(X_{U1}, ·)] = 1/2, E_{μ2}[reg_{M1}(X_{U1}, ·)] = 1/2, E_{μ1}[reg_{M1}(X_{U2}, ·)] = 1/2, E_{μ2}[reg_{M1}(X_{U2}, ·)] = 0, E_{μ1}[reg_{M2}(X_{U1}, ·)] = 1/2, E_{μ2}[reg_{M2}(X_{U1}, ·)] = 1, E_{μ1}[reg_{M2}(X_{U2}, ·)] = 1/2, and E_{μ2}[reg_{M2}(X_{U2}, ·)] = 1/2. Thus, reg_{M1,P}(X_{U1}) = reg_{M1,P}(X_{U2}) = 1/2, while reg_{M2,P}(X_{U2}) = 1/2 < 1 = reg_{M2,P}(X_{U1}). The preference on X_{U1} and X_{U2} depends on whether we consider the menu M1 or the menu M2.
Suppose that there is an outcome o* ∈ O that gives the maximum utility; that is, u(o*) ≥ u(o) for all o ∈ O. If a* is the constant act that gives outcome o* in all states, then a* is clearly the best act in all states. If there is such a best act, an "absolute", menu-independent notion of weighted expected regret can be defined by always comparing to a*. That is, define
reg_{P+}(a) = sup_{(μ,α) ∈ P+} α Σ_{s ∈ S} μ(s)(u(o*) − u(a(s))).
If there is a best act, then I write a ⪰_{P+} a' if reg_{P+}(a) ≤ reg_{P+}(a'); similarly, in the unweighted case, I write a ⪰_P a' if reg_P(a) ≤ reg_P(a').
Conceptually, we can think of the agent as always being aware of the best outcome o*, and comparing the utility of his actual act to the utility of a*. Equivalently, the absolute notion of regret is equivalent to a menu-based notion with respect to a menu that includes a* (since, if the menu includes a*, then a* is the best act in every state). As we shall see, in our setting, we can always reduce menu-dependent regret to this absolute, menu-independent notion, since there is in fact a best act: X_S.
3 Relative Ordering of Events Using Weighted Regret
In this section, I consider how a notion of comparative likelihood can be defined using sets of weighted probability measures.
As in Example 2.1, take the outcome space to be {0, 1}, the utility function to be the identity, and consider indicator functions. It is easy to see that E_μ[u(X_U)] = μ(U), so that, with this setup, we can recover probability from expected utility. Thus, if uncertainty is represented by a single probability measure μ and we make decisions by preferring those acts that maximize expected utility, then we have X_U ⪰ X_V iff μ(U) ≥ μ(V).
Consider what happens if we apply this approach to maxmin expected utility. Now we have that X_U ⪰ X_V iff inf_{μ ∈ P} μ(U) ≥ inf_{μ ∈ P} μ(V). In the literature, inf_{μ ∈ P} μ(U), denoted P_*(U), is called the lower probability of U, and is a standard approach to describing likelihood. The dual upper probability, sup_{μ ∈ P} μ(U), is denoted P*(U). An easy calculation shows that P*(U) = 1 − P_*(U^c) (where, as usual, U^c denotes the complement of U). The interval [P_*(U), P*(U)] can be thought of as describing the uncertainty of U; the larger the interval, the greater the ambiguity.
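For a finite set of measures, lower and upper probability and the duality between them can be computed directly. In the sketch below, the three coin measures are an illustrative assumption:

```python
def lower_prob(event, measures):
    # P_*(U): worst-case probability of the event over the set of measures
    return min(sum(mu[s] for s in event) for mu in measures)

def upper_prob(event, measures, states):
    # P*(U): best-case probability; equals 1 - P_*(complement of U)
    complement = [s for s in states if s not in event]
    return 1 - lower_prob(complement, measures)

states = ["h", "t"]
# measures putting probability 1/3, 1/2, and 2/3 on heads (a toy grid)
measures = [{"h": p, "t": 1 - p} for p in (1/3, 1/2, 2/3)]
print(lower_prob(["h"], measures))          # 0.333...
print(upper_prob(["h"], measures, states))  # 0.666...
```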
What happens if we apply this approach to regret? First consider unweighted regret. If we restrict to acts of the form X_U, then the best act is clearly X_S, which is just the constant function 1. Thus, we can (and do) use the absolute notion of regret here, and for the remainder of this paper. We then get that X_U ⪰_P X_V iff reg_P(X_U) ≤ reg_P(X_V) iff sup_{μ ∈ P}(1 − μ(U)) ≤ sup_{μ ∈ P}(1 − μ(V)); that is, regret compares the worst-case probabilities of being wrong. Moreover, easy manipulation shows that reg_P(X_U) = 1 − P_*(U). It follows that X_U ⪰_P X_V iff reg_P(X_U) ≤ reg_P(X_V) iff 1 − P_*(U) ≤ 1 − P_*(V) iff P_*(U) ≥ P_*(V); both regret and maxmin expected utility put the same ordering on events.
The extension to weighted regret is immediate. Let ℓ_{P+}(U), the (weighted) regret-based likelihood of U, be defined as reg_{P+}(X_U) = sup_{(μ,α) ∈ P+} α(1 − μ(U)). If P+ is unweighted, so that all the weights are 1, I write ℓ_P(U) to denote ℓ_{P+}(U). Note that ℓ_P(U) = 1 − P_*(U), so ℓ_P(U) ≤ ℓ_P(V) iff P_*(U) ≥ P_*(V). That is, the ordering induced by ℓ is the opposite of that induced by P_*. So, for example, ℓ_{P+}(∅) = 1 and ℓ_{P+}(S) = 0; smaller sets have a larger regret-based likelihood. (Since an act with smaller regret is viewed as better, the ordering on acts of the form X_U induced by regret is the same as that induced by maxmin expected utility.)
Regret-based likelihood provides a way of associating a number with each event, just as probability and lower probability do. Moreover, just as lower probability gives a lower bound on uncertainty, we can think of ℓ_{P+}(U) as giving an upper bound on the uncertainty. (It is an upper bound rather than a lower bound because larger regret means less likely, just as smaller lower probability does.) The naive corresponding lower bound is given by inf_{(μ,α) ∈ P+} α(1 − μ(U)). This lower bound is not terribly interesting; if there are probability measures μ ∈ P such that α_μ is close to 0, then this lower bound will be close to 0, independent of the agent's actual feeling about the likelihood of U. A more reasonable lower bound is given by the expression 1 − ℓ_{P+}(U^c) (recall that the analogous expression relates upper probability and lower probability). The intuition for this choice is the following. If nature were conspiring against us, she would try to prove us wrong by making ℓ_{P+}(U) as large as possible—that is, make the weighted probability of U^c, the event of being wrong, as large as possible. On the other hand, if nature were conspiring with us, she would try to make ℓ_{P+}(U^c) as small as possible. Note that this is different from making ℓ_{P+}(U) as large as possible, unless α_μ = 1 for all μ ∈ P. An easy calculation shows that
1 − ℓ_{P+}(U^c) = inf_{(μ,α) ∈ P+} (1 − α μ(U)).
This motivates taking [1 − ℓ_{P+}(U^c), ℓ_{P+}(U)] as the ambiguity interval associated with U.
The following lemma clarifies the relationship between these expressions, and shows that really does give an interval of ambiguity.
Since, as observed above,
and for all , we have
it follows that .
Since, by assumption, there is a probability measure such that , it follows that
In general, equality does not hold in Lemma 3.1, as shown by the following example. The example also illustrates how the "ambiguity interval" can decrease with weighted regret, if the weights are updated as suggested in [Halpern and Leung 2012].
: Suppose that the state space consists of h and t (for heads and tails); let μ_a be the measure that puts probability a on h. Let P+_0 = {(μ_a, 1) : 1/3 ≤ a ≤ 2/3}. That is, we initially consider all the measures that put probability between 1/3 and 2/3 on heads. We toss the coin and observe it land heads. Intuitively, we should now consider it more likely that the probability of heads is greater than 1/2. Indeed, applying likelihood updating, we get the set P+_1 = {(μ_a, 3a/2) : 1/3 ≤ a ≤ 2/3}; (the weight of μ_a is the likelihood of observing heads according to μ_a, which is just a, normalized by the likelihood of observing heads according to the measure that gives heads the highest probability, namely 2/3) the probability measures that give heads higher probability get higher weight. In particular, the weight of μ_{2/3} is still 1, but the weight of μ_{1/3} is only 1/2. If the coin is tossed again and this time tails is observed, we update further to get P+_2 = {(μ_a, 4a(1 − a)) : 1/3 ≤ a ≤ 2/3}. An easy calculation shows that ℓ_{P+_2}({h}) = 16/27, ℓ_{P+_2}({t}) = 16/27, and inf_{(μ,α) ∈ P+_2} α(1 − μ({h})) = 8/27.
It is also easy to see that 1 − ℓ_{P+_2}({t}) lies strictly between inf_{(μ,α) ∈ P+_2} α(1 − μ({h})) and ℓ_{P+_2}({h}). Thus, for P+_2, we get strict inequalities for the expressions in Lemma 3.1.
The width of the interval [1 − ℓ_{P+}(U^c), ℓ_{P+}(U)] can be viewed as a measure of the ambiguity the agent feels about U, just as the width of the interval [P_*(U), P*(U)] can. Indeed, if all the weights are 1, the two intervals have the same width, since ℓ_P(U) = 1 − P_*(U) and 1 − ℓ_P(U^c) = 1 − P*(U) in this case.
However, weighted regret has a significant advantage over upper and lower probability here. If the true bias of the coin is, say, b, and the set P+_n represents the agent's uncertainty after n tosses, then, as n increases, almost surely, the ambiguity interval determined by P+_n will be a smaller and smaller interval. More generally, using likelihood updating combined with weighted regret provides a natural way to model the reduction of ambiguity via learning.
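The shrinking of the ambiguity interval under likelihood updating can be checked numerically. In the sketch below, a small discrete grid of biases stands in for an interval of measures (an assumption made for illustration), the regret-based likelihood of an event is computed as the sup over measures of weight × (1 − μ(event)), and the observation of 8 heads and 4 tails is again illustrative:

```python
def ambiguity_interval(biases, heads, tails):
    # weights by likelihood updating, normalized so the top weight is 1
    like = {p: p ** heads * (1 - p) ** tails for p in biases}
    top = max(like.values())
    w = {p: lk / top for p, lk in like.items()}
    # regret-based likelihood of heads / tails: sup of weight * (1 - mu(event))
    l_heads = max(w[p] * (1 - p) for p in biases)
    l_tails = max(w[p] * p for p in biases)
    return (1 - l_tails, l_heads)   # [1 - l(tails), l(heads)]

biases = [i / 12 for i in range(4, 9)]       # grid standing in for [1/3, 2/3]
before = ambiguity_interval(biases, 0, 0)    # no observations: widest interval
after = ambiguity_interval(biases, 8, 4)     # 8 heads, 4 tails observed
print(before)
print(after)   # a strictly narrower interval than before
```

As the weights concentrate on the measures that best explain the observations, the interval narrows around the value determined by the best-supported bias.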
One concern with the use of regret has been the dependence of regret on the menu. It is also worth noting that, in this context, there is a sense in which we can work with the absolute notion of weighted regret without loss of generality: if we restrict to indicator functions, then a preference relative to a menu can always be reduced to an absolute preference. Given a menu M consisting of indicator functions, let U_M = ∪{U : X_U ∈ M}; that is, U_M is the union of the events for which the corresponding indicator function is in M.
: If M is a menu consisting of indicator functions, and X_U, X_V ∈ M, then X_U ⪰_{M,P+} X_V iff X_{U ∪ (U_M)^c} ⪰_{P+} X_{V ∪ (U_M)^c}.
Proof: Recall that the absolute notion of regret is equivalent to the menu-based notion, as long as the menu includes the best act, which in this case is X_S. It clearly suffices to show that, for all states s and all acts X_W ∈ M,
reg_M(X_W, s) = reg(X_{W ∪ (U_M)^c}, s),
where reg with no menu subscript denotes the absolute notion of regret.

This is straightforward. There are two cases, depending on whether s ∈ U_M.

If s ∈ U_M, then, by definition, there is some act X_{W'} ∈ M such that s ∈ W', so sup_{X_{W'} ∈ M} X_{W'}(s) = 1. Clearly X_{W ∪ (U_M)^c}(s) = X_W(s), since s ∉ (U_M)^c. Moreover, the best act X_S also has utility 1 at s, so the best achievable utility at s is 1 in both cases. Thus, for s ∈ U_M, reg_M(X_W, s) = 1 − X_W(s) = reg(X_{W ∪ (U_M)^c}, s).

For s ∉ U_M, we have X_{W'}(s) = 0 for all X_{W'} ∈ M, so reg_M(X_W, s) = 0. On the other hand, s ∈ (U_M)^c, so X_{W ∪ (U_M)^c}(s) = 1, and reg(X_{W ∪ (U_M)^c}, s) = 1 − 1 = 0. Thus, we again have reg_M(X_W, s) = reg(X_{W ∪ (U_M)^c}, s).
4 Characterizing Weighted Regret-Based Likelihood
The goal of this section is to characterize weighted regret-based likelihood axiomatically. In order to do so, it is helpful to review the characterizations of probability and lower probability.
A probability measure μ on a finite set S maps subsets of S to [0, 1] in a way that satisfies the following three properties (since I assume that S is finite here, I assume that all probability measures have domain 2^S, and ignore measurability issues):

P1. μ(∅) = 0. (This property actually follows from the other two, using the observation that S ∪ ∅ = S and S ∩ ∅ = ∅; I include it here to ease the comparison to other approaches.)

P2. μ(S) = 1.

P3. μ(U ∪ V) = μ(U) + μ(V) if U ∩ V = ∅.
These three properties characterize probability in the sense that any function that satisfies these properties is a probability measure.
Lower probabilities satisfy analogues of these properties:

LP1. P_*(∅) = 0.

LP2. P_*(S) = 1.

LP3. P_*(U ∪ V) ≥ P_*(U) + P_*(V) if U ∩ V = ∅.
However, these properties do not characterize lower probability. There are functions that satisfy LP1, LP2, and LP3 that are not the lower probability corresponding to some set of probability measures. (See [Halpern and Pucella 2002, Proposition 2.2] for an example showing that the analogous properties do not characterize upper probability; the same example also shows that LP1, LP2, and LP3 do not characterize lower probability.)
Various characterizations of P_* (and P*) have been proposed in the literature [Anger and Lembcke 1985, Giles 1982, Huber 1976, Huber 1981, Lorentz 1952, Williams 1976, Wolf 1977], all similar in spirit. I discuss one due to Anger and Lembcke [1985] here, since it makes the contrast between lower probability and regret particularly clear. The characterization is based on the notion of set cover: a set U is said to be covered n times by a multiset UU of subsets of S if every element of U appears in at least n of the sets in UU. It is important to note here that UU is a multiset, not a set; its elements are not necessarily distinct. (Of course, a set is a special case of a multiset.) Let ⊎ denote multiset union; thus, if UU and VV are multisets, then UU ⊎ VV consists of all the elements in UU or VV, which appear with multiplicity that is the sum of their multiplicities in UU and VV. For example, using the notation {{·}} to denote a multiset, {{U, V}} ⊎ {{U}} = {{U, U, V}}.
If n, k ≥ 0, then an (n; k)-cover of (U, S) is a multiset that covers U n + k times and covers S k times. A multiset UU is an n-cover of U if UU covers U n times. For example, if S = {s1, s2, s3} and UU = {{{s1, s2}, {s1, s3}, {s2, s3}}}, then UU is a (1; 1)-cover of ({s1}, S), a (0; 2)-cover of (S, S), and a 2-cover of S. Consider the following property:
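The cover notion is easy to operationalize. The following sketch checks whether a multiset of sets covers every element of U at least n + k times and every element of S at least k times; the three two-element subsets of a three-element state space used here are an illustrative choice:

```python
def covers(multiset, target, times):
    # 'multiset' covers 'target' the given number of times if every element
    # of 'target' appears in at least that many of the (not necessarily
    # distinct) sets in 'multiset'
    return all(sum(s in cover_set for cover_set in multiset) >= times
               for s in target)

def is_nk_cover(multiset, U, S, n, k):
    return covers(multiset, U, n + k) and covers(multiset, S, k)

S = {"s1", "s2", "s3"}
UU = [{"s1", "s2"}, {"s1", "s3"}, {"s2", "s3"}]   # a multiset of subsets of S
print(is_nk_cover(UU, {"s1"}, S, 1, 1))   # True: s1 covered twice, S once
print(is_nk_cover(UU, S, S, 0, 2))        # True: every state covered twice
print(is_nk_cover(UU, S, S, 0, 3))        # False: no state covered 3 times
```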
LP3′. For all integers m > 0 and n, k ≥ 0, and all subsets U, U_1, …, U_m of S, if {{U_1^c, …, U_m^c}} is an (n; k)-cover of (U^c, S), then n P_*(U) + (m − n − k) ≥ Σ_{i=1}^m P_*(U_i). (Note that LP3′ implies LP2, using the fact that {{∅}} is a (2; 0)-cover of (∅, S).)
There is an analogous property for upper probability, where P_* is replaced by P*. It is easy to see that LP3′ implies LP3 (since {{U^c, V^c}} is a (1; 1)-cover of ((U ∪ V)^c, S) if U and V are disjoint).
: [Anger and Lembcke 1985] If f : 2^S → [0, 1], then there exists a set P of probability measures with P_*(U) = f(U) for all U ⊆ S if and only if f satisfies LP1, LP2, and LP3′.
Moving to regret-based likelihood, clearly we have:

REG1. ℓ_{P+}(∅) = 1.

REG2. ℓ_{P+}(S) = 0.

The whole space has the least regret; the empty set has the greatest regret. In the unweighted case, since ℓ_P(U) = 1 − P_*(U), REG1, REG2, and the following analogue of LP3′ (appropriately modified for ℓ) clearly characterize ℓ_P:
REG3. For all integers m > 0 and n, k ≥ 0, and all subsets U, U_1, …, U_m of S, if {{U_1^c, …, U_m^c}} is an (n; k)-cover of (U^c, S), then Σ_{i=1}^m ℓ(U_i) ≥ n ℓ(U) + k.
Note that complements of sets (U^c) are used here, since regret is minimized if the probability of the complement is maximized. This need to work with complements makes the statement of the properties (and the proofs of the theorems) slightly less elegant, but seems necessary.
It is not hard to see that REG3 does not hold for weighted regret-based likelihood. For example, suppose that S = {s1, s2, s3} and P+ = {(μ1, 1), (μ2, 1/2)}, where, identifying a probability measure μ with the tuple (μ(s1), μ(s2), μ(s3)), we have
μ1 = (1/2, 1/2, 0) and μ2 = (0, 0, 1).
Then ℓ_{P+}({s1}) = ℓ_{P+}({s2}) = 1/2, while ℓ_{P+}({s1, s2}) = 1/2. Since {{{s1}^c, {s2}^c}} is a (1; 1)-cover of ({s1, s2}^c, S), REG3 would require that
ℓ_{P+}({s1}) + ℓ_{P+}({s2}) ≥ ℓ_{P+}({s1, s2}) + 1,
which is clearly not the case.
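The failure of the unweighted cover inequality in the weighted case can be checked mechanically. The sketch below uses a hypothetical weighted set of two measures on a three-state space and computes ℓ(U) as the sup over (μ, α) of α(1 − μ(U)); the particular measures and weights are chosen only to exhibit a violation:

```python
def wlikelihood(event, weighted_measures):
    # weighted regret-based likelihood: sup of alpha * (1 - mu(event))
    return max(alpha * (1 - sum(mu[s] for s in event))
               for mu, alpha in weighted_measures)

mu1 = {"s1": 0.5, "s2": 0.5, "s3": 0.0}
mu2 = {"s1": 0.0, "s2": 0.0, "s3": 1.0}
pplus = [(mu1, 1.0), (mu2, 0.5)]    # weights 1 and 1/2 (illustrative)

lhs = wlikelihood(["s1"], pplus) + wlikelihood(["s2"], pplus)   # 0.5 + 0.5
rhs = wlikelihood(["s1", "s2"], pplus) + 1                      # 0.5 + 1
print(lhs < rhs)   # True: the inequality demanded by REG3 fails here
```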
We must thus weaken REG3 to capture weighted regret-based likelihood. It turns out that the appropriate weakening is the following:
REG3′. For all integers m > 0 and n, k ≥ 0, and all subsets U, U_1, …, U_m of S, if {{U_1^c, …, U_m^c}} is an (n; k)-cover of (U^c, S), then Σ_{i=1}^m ℓ(U_i) ≥ max(n ℓ(U), k).
Although REG3′ is weaker than REG3, it still has some nontrivial consequences. For example, it follows from REG3′ that ℓ is anti-monotonic. If U ⊆ V, then {{U^c}} is a 1-cover of V^c, and hence a (1; 0)-cover of (V^c, S), so by REG3′, we must have ℓ(V) ≤ ℓ(U). REG3′ also implies REG1, since S (= ∅^c) is an n-cover of itself for all n: taking U = U_1 = … = U_n = ∅ gives n ℓ(∅) ≥ n, so ℓ(∅) = 1.
I can now state the representation theorem. It says that a representation of uncertainty satisfies REG1, REG2, and REG3′ iff it is the weighted regret-based likelihood determined by some set P+ of weighted probability measures. The set P+ is not unique, but it can be taken to be maximal, in the sense that if weighted regret-based likelihood with respect to some other set P+′ gives the same representation, then for all pairs (μ, α) ∈ P+′, there exists α′ ≥ α such that (μ, α′) ∈ P+. This (unique) maximal set can be viewed as the canonical representation of uncertainty.
: If f : 2^S → [0, 1], then there exists a weakly closed set P+ of weighted probability measures with ℓ_{P+}(U) = f(U) for all U ⊆ S if and only if f satisfies REG1, REG2, and REG3′; moreover, P+ can be taken to be maximal.
Proof: Clearly, given a weakly closed set