1 Introduction
We study a class of two-player stochastic games with vector payoffs, building upon the classical models proposed by Shapley [32] and Blackwell [5].
A finite zero-sum stochastic game is a repeated game with perfect monitoring where the action spaces are finite and the stage payoff depends on a state parameter that can take finitely many different values and whose evolution is controlled by both players. Shapley [32] proved that such a game with the $\lambda$-discounted evaluation has a value $v_\lambda$. This existence result extends easily to more general evaluations, such as $\sum_t \theta_t g_t$, where $\theta_t$ is the weight of stage $t$, see Laraki and Sorin [19]. For example, in the classical $n$-stage game, where $\theta_t = 1/n$ for $t \le n$, the corresponding value is denoted $v_n$. Bewley and Kohlberg [4] proved that every stochastic game has an asymptotic value $v$, i.e. $v_\lambda$ and $v_n$ both converge to the same limit $v$, as $\lambda \to 0$ and $n \to \infty$. Mertens and Neyman [22] showed that the players can guarantee $v$ uniformly, in the sense that for every $\varepsilon > 0$, each player has a strategy that guarantees $v$ up to $\varepsilon$ simultaneously in every $\lambda$-discounted game for sufficiently small $\lambda$ and in every $n$-stage game for sufficiently large $n$. Such a result was earlier obtained by Blackwell and Ferguson [7] for the game BigMatch (introduced by Gillette [12]) and for the class of all absorbing games by Kohlberg [16]. In fact, absorbing games are one of the very few classes where the asymptotic value has an explicit formula in terms of the one-shot game, see Laraki [18]. We recall here that the asymptotic value may even fail to exist if we drop any of the following assumptions: finite action spaces, see Vigeral [37], finite state space, or perfect monitoring, see Ziliotto [38]. Ergodic stochastic games, where all states are visited infinitely often almost surely regardless of the actions chosen by the players, are another example.
A Blackwell approachability problem is a two-player repeated game with perfect monitoring and stage vector payoffs in which player 1's objective is to enforce the convergence of the average payoff $\bar{g}_n$ to some target set $C$. On the other hand, player 2 aims at preventing this convergence, and his ultimate objective is to exclude this target set, i.e., to approach the complement of some neighborhood of it. Blackwell [5] proved that the game is uniformly determined for any convex target set: either player 1 can uniformly approach $C$ or player 2 can uniformly exclude $C$. More importantly, Blackwell provided a simple geometric characterization of approachable sets from which one can easily build an optimal strategy. However, if $C$ is not convex, uniform determinacy fails. This led Blackwell to define a weaker version of determinacy by allowing the strategy to depend on the horizon $n$. Several years later, Vieille [36] solved the problem and proved that a set is weakly approachable if and only if the value of an auxiliary differential game is zero. Weak determinacy follows from the existence of the value in differential games.
Combining the models of Shapley and Blackwell: It is natural to consider stochastic games with vector payoffs and try to characterize approachable target sets in these games, notably to develop new tools for stochastic games with incomplete information. This challenging problem has already been tackled but, so far, only a few results have been achieved. The most relevant work, by Milman [23], only applies to ergodic stochastic games, and no geometric characterization of approachable sets has been provided. On a different matter, it has been remarked that uniform determinacy fails to hold in stochastic games, even in variants of BigMatch. (This remark was already made by Sorin in the eighties in a small but unpublished note; its flavor is provided in Example 15.)
Guided by the history of stochastic games, we tackle the general model of stochastic games with vector payoffs by focusing, in a first step, on the class of absorbing games, and in particular BigMatch games. Indeed, to obtain a simple geometric characterization of approachable sets, it is helpful to consider an underlying class of games that admit an explicit characterization for the asymptotic value.
We call “generalized quitting games” the subclass of absorbing games we focus on. This terminology refers to quitting games, in which each player has exactly one quitting action and one non-quitting action. In contrast, in our case, one or both players may have none or many quitting actions. The game is repeated until a quitting action is chosen at some stage $t$, in which case it enters an absorbing state that may depend on both actions at stage $t$. When only player 1 (resp. player 2) has quitting actions, the game is called BigMatch of type I (resp. type II).
Main contributions: We introduce three strongly related simple geometric conditions on a convex target set $C$. They are nested (the first implies the second, which implies the third) and they all have flavors of both Blackwell’s condition [5] and Laraki’s formula in [18] for the asymptotic value in absorbing games. We prove that the first condition is sufficient for player 1 to weakly approach $C$ and that it can be used to build an approachability strategy, even though the explicit construction is delicate and relies on a calibration technique developed notably in Perchet [24]. The second condition is a useful intermediate condition, but it is neither necessary nor sufficient. The third condition is proven to be necessary: indeed, if $C$ does not satisfy it, then $C$ is weakly excludable by player 2. Finally, we show that there are convex sets that are neither weakly approachable nor weakly excludable.
We examine BigMatch games in detail. In BigMatch games of type I, our three conditions are shown to be equivalent and to coincide with Blackwell’s condition. This provides a full characterization for weak approachability and proves that this class is weakly determined. This contrasts with the uniform indeterminacy, see Example 15, where we provide a one-dimensional counterexample.
In BigMatch games of type II, the first two conditions are equivalent and they are proven to be necessary and sufficient for uniform approachability. Despite this full characterization, uniform determinacy fails. For weak approachability, we show that none of the three conditions is both necessary and sufficient. We also develop, in some cases, an approach based on differential games, similar to that of Vieille [36].
To summarize, our analysis of BigMatch games reveals that: (1) in BigMatch games of type I, a simple full characterization is available for weak approachability; (2) in BigMatch games of type II, a simple full characterization is available for uniform approachability; (3) uniform determinacy fails in both types of BigMatch games; (4) weak determinacy holds for BigMatch games of type I. Weak determinacy for BigMatch games of type II remains an open problem.
Almost sure approachability: In the classical Blackwell model on convex sets, and in ergodic stochastic games with vector payoffs, weak, uniform, in expectation or almost sure approachability problems are equivalent. In our case, they all differ. In this paper we focus on weak and uniform approachability in expectation, as they appear to be very interesting and challenging in generalized quitting games. We refer to Section 2 and Appendix C for more details.
Related literature: Blackwell approachability is frequently used in the literature on repeated games. It was used first by Aumann and Maschler [2] to construct optimal strategies in zero-sum repeated games with incomplete information and perfect monitoring. Their construction has been extended by Kohlberg [17] to the imperfect monitoring case. Blackwell approachability was further used by Renault and Tomala [30] to characterize the set of communication equilibria in $N$-player repeated games with imperfect monitoring; by Hörner and Lovo [14] and Hörner, Lovo and Tomala [15] to characterize belief-free equilibria in $N$-player repeated games with incomplete information; and by Tomala [35] to characterize belief-free communication equilibria in $N$-player repeated games. Blackwell approachability has also been used to construct adaptive strategies leading to correlated equilibria (see Hart and Mas-Colell [13]), machine learning strategies minimizing regret (see Blackwell [6], Abernethy, Bartlett and Hazan [1]), and calibrating algorithms in prediction problems (see Dawid [8], Foster and Vohra [11], Perchet [26, 28]). In fact, one can show that Blackwell approachability, regret minimization and calibration are formally equivalent (see for instance Abernethy, Bartlett and Hazan [1] or Perchet [26]).
Applications: Classical machine learning assumes that a one-stage mistake has small consequences. Our paper makes it possible to tackle realistic situations where the total payoff can be affected by one-stage decisions. One could think of clinical trials between two treatments: at some point in time one of the two must be selected and prescribed to the rest of the patients. At a more theoretical level, as in Aumann and Maschler [2] for zero-sum repeated games with incomplete information, our paper may be a useful step towards a characterization of the asymptotic value of absorbing games with incomplete information and towards determining the optimal strategy of the non-informed player. This is a problem for which we know that the asymptotic value exists (see Rosenberg [31]), and for which some explicit characterizations of the asymptotic value are available in BigMatch games (see Sorin [33, 34]).
2 Model and Main Results
In this section, we describe the model of generalized quitting games, the problem of Blackwell approachability, and present our main results.
Generalized quitting games.
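We denote by $\mathcal{I} = I \cup I^*$ the finite set of (pure) actions of player 1 and by $\mathcal{J} = J \cup J^*$ the finite set of actions of player 2. The actions in $I$ and $J$ are called non-quitting, and the actions in $I^*$ and $J^*$ are called quitting. A payoff vector $g(i,j) \in \mathbb{R}^d$ is associated to each pair of actions $(i,j) \in \mathcal{I} \times \mathcal{J}$, and to ease notations we assume that $\|g(i,j)\|_2 \le 1$.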
The game is played at stages in $\mathbb{N}$ as follows: at stage $t$, the players choose actions simultaneously, say $i_t$ and $j_t$. If only non-quitting actions have been played before stage $t$, i.e. $i_s \in I$ and $j_s \in J$ for every $s < t$ (with $I$ and $J$ the sets of non-quitting actions), then player 1 is free to choose any of his actions and player 2 is free to choose any of her actions. However, if a quitting action was played by either player at a stage prior to stage $t$, i.e. $i_s \in I^*$ or $j_s \in J^*$ for some $s < t$ (with $I^*$ and $J^*$ the sets of quitting actions), then the players are obliged to take $i_t = i_{t-1}$ and $j_t = j_{t-1}$. Another equivalent way to model this setup is to assume that, as soon as a quitting action is played, the game absorbs in a state where the payoff is constant.
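The dynamics described above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the formal model: the function name `simulate_play`, the dictionary encoding of payoffs, and the two example strategies are assumptions made for the illustration. Once a quitting action is played, the pair of actions is frozen, so the stage payoff is constant from then on.

```python
import random


def simulate_play(payoff, quit_1, quit_2, strategy_1, strategy_2, horizon, rng):
    """Play `horizon` stages of a generalized quitting game (illustrative sketch).

    payoff: dict mapping (i, j) to a payoff vector (tuple of floats)
    quit_1, quit_2: sets of quitting actions of players 1 and 2
    strategy_k: function (history, rng) -> pure action, used only before absorption
    """
    history = []          # pairs of non-quitting actions played so far
    payoffs = []          # stage payoff vectors
    absorbing_pair = None
    for _ in range(horizon):
        if absorbing_pair is None:
            i = strategy_1(history, rng)
            j = strategy_2(history, rng)
            if i in quit_1 or j in quit_2:
                absorbing_pair = (i, j)   # play absorbs at this stage
            else:
                history.append((i, j))
        else:
            i, j = absorbing_pair         # actions are repeated forever
        payoffs.append(payoff[(i, j)])
    return payoffs, absorbing_pair


# Hypothetical example with scalar payoffs: "T" is the only quitting action.
payoff = {("T", "L"): (1.0,), ("T", "R"): (0.0,),
          ("B", "L"): (0.0,), ("B", "R"): (1.0,)}


def wait_then_quit(history, rng):
    # Player 1 plays "B" twice, then quits with "T".
    return "T" if len(history) >= 2 else "B"


def always_left(history, rng):
    return "L"


stage_payoffs, absorbing_pair = simulate_play(
    payoff, {"T"}, set(), wait_then_quit, always_left, 5, random.Random(0))
```

In this example only player 1 can quit, so the game has the shape of a BigMatch game of type I; swapping the roles of the two quitting sets gives the type II shape.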
When a player plays a quitting action and neither player has played a quitting action before, we say that this player quits and that play absorbs.
Mixed actions.
A mixed action for a player is a probability distribution over his (pure) actions. We will denote mixed actions of player 1 by $\mathbf{x} \in \Delta(\mathcal{I})$, $x \in \Delta(I)$, $x^* \in \Delta(I^*)$. Thus, a bold letter stands for a mixed action over the full set of actions $\mathcal{I}$, a regular letter for a mixed action restricted to non-quitting actions in $I$ and a letter with an asterisk for a mixed action over the set of quitting actions in $I^*$. Similarly, we denote mixed actions of player 2 by $\mathbf{y}$, $y$ and $y^*$.
To introduce our conditions for a convex set to be approachable, it will be helpful to consider finite nonnegative measures on $\mathcal{I}$ and $\mathcal{J}$ instead of probability distributions. We shall denote them by $\alpha$ for player 1 and by $\beta$ for player 2.
The payoff mapping $g$ is extended as usual multilinearly to mixed actions of both players and, more generally, to nonnegative measures on their action sets.
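We also introduce the “measure” or “probability of absorption” $p^*$ and the “expected absorption payoff” $g^*$ (which is not the expected payoff conditional on absorption), defined respectively by
$$p^*(\mathbf{x},\mathbf{y}) = \sum_{(i,j)\ \text{absorbing}} \mathbf{x}_i\,\mathbf{y}_j \qquad\text{and}\qquad g^*(\mathbf{x},\mathbf{y}) = \sum_{(i,j)\ \text{absorbing}} \mathbf{x}_i\,\mathbf{y}_j\, g(i,j),$$
where a pair of actions $(i,j)$ is absorbing if at least one of the two actions is quitting.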
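As a sanity check on these two quantities, the following sketch computes them for mixed actions (or, more generally, nonnegative measures) stored as dictionaries. It assumes that a pair of actions is absorbing exactly when at least one of the two actions is quitting, and that the expected absorption payoff is the unnormalized sum of payoffs over absorbing pairs; the function name `absorption_quantities` and the example game are hypothetical.

```python
def absorption_quantities(x, y, payoff, quit_1, quit_2):
    """Return (p_star, g_star) for weights x on player 1's actions and y on player 2's.

    Assumption of this sketch: (i, j) is absorbing iff i is in quit_1 or j is in quit_2.
    g_star is NOT normalized by p_star, so it is not a conditional expectation.
    """
    dim = len(next(iter(payoff.values())))
    p_star = 0.0
    g_star = [0.0] * dim
    for i, x_i in x.items():
        for j, y_j in y.items():
            if i in quit_1 or j in quit_2:
                weight = x_i * y_j
                p_star += weight
                for k in range(dim):
                    g_star[k] += weight * payoff[(i, j)][k]
    return p_star, g_star


# Hypothetical example with scalar payoffs: "T" is the only quitting action.
payoff = {("T", "L"): (1.0,), ("T", "R"): (0.0,),
          ("B", "L"): (0.0,), ("B", "R"): (1.0,)}
x = {"T": 0.25, "B": 0.75}
y = {"L": 0.5, "R": 0.5}
p_star, g_star = absorption_quantities(x, y, payoff, {"T"}, set())
```

Here player 1 quits with probability 0.25, so `p_star` is 0.25, while `g_star[0]` is 0.125, which differs from the payoff conditional on absorption (0.5).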
Strategies.
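In our model of generalized quitting games, histories are defined as long as no quitting action is played. Thus, the set of histories $H$ is the set of finite sequences of pairs of non-quitting actions (that is, $H = \bigcup_{t \ge 0} (I \times J)^t$). A strategy for player 1 is a mapping $\sigma$ from $H$ to the set of his mixed actions, and a strategy for player 2 is a mapping $\tau$ from $H$ to the set of her mixed actions.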
Specific subclasses of games: BigMatch games.
We shall consider two subclasses of generalized quitting games, in which only one of the players can quit.
Following the nomenclature of Gillette [12], a generalized quitting game is called a BigMatch game of type I if player 1 has at least one non-quitting action (to avoid degenerate cases) and at least one quitting action, but player 2 has no quitting action, i.e. $I \neq \emptyset$, $I^* \neq \emptyset$ and $J^* = \emptyset$.
A generalized quitting game is called a BigMatch game of type II if player 2 has at least one non-quitting action and at least one quitting action, but player 1 has no quitting action, i.e. $J \neq \emptyset$, $J^* \neq \emptyset$ and $I^* = \emptyset$.
Objectives.
In short, the objective of player 1 is to construct a strategy such that, for any strategy of player 2, the expected average payoff is close to some exogenously given convex set $C \subset \mathbb{R}^d$, called the “target set”. Instead of the Cesaro average, we can also consider the expected discounted payoff or even a general payoff evaluation $\sum_t \theta_t g_t$, where $\theta_t \ge 0$ and $\sum_t \theta_t = 1$, with the interpretation that $\theta_t$ is the weight of stage $t$.
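The different evaluations only differ in how the stages are weighted. As a minimal sketch, assuming the standard weights $1/n$ on the first $n$ stages for the Cesaro average and $\lambda(1-\lambda)^{t-1}$ at stage $t$ for the discounted evaluation, the helpers below (all names are hypothetical) build the weights and evaluate a given stream of stage payoffs.

```python
def cesaro_weights(n):
    """Weights of the n-stage Cesaro average: 1/n on each of the first n stages."""
    return [1.0 / n] * n


def discounted_weights(lam, horizon):
    """First `horizon` weights lam * (1 - lam)**(t - 1) of the discounted evaluation.

    The full sequence sums to 1; the truncation sums to 1 - (1 - lam)**horizon.
    """
    return [lam * (1.0 - lam) ** (t - 1) for t in range(1, horizon + 1)]


def weighted_payoff(weights, stage_payoffs):
    """Weighted sum of stage payoff vectors, coordinate by coordinate."""
    dim = len(stage_payoffs[0])
    return [sum(w * g[k] for w, g in zip(weights, stage_payoffs))
            for k in range(dim)]


# Hypothetical payoff stream: payoff 0 for two stages, then 1 for two stages.
stage_payoffs = [(0.0,), (0.0,), (1.0,), (1.0,)]
cesaro_value = weighted_payoff(cesaro_weights(4), stage_payoffs)
discounted_value = weighted_payoff(discounted_weights(0.5, 4), stage_payoffs)
```

On this stream the Cesaro average is 0.5, whereas the discounted evaluation with $\lambda = 0.5$, truncated at four stages, gives 0.1875 because early stages carry most of the weight.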
We emphasize here that we focus on the distance of the expected average payoff to $C$ (and not on the expected distance of the average payoff to $C$, corresponding to almost sure convergence, see e.g. Milman [23]), as it might be more traditional and even challenging in stochastic games. Indeed, consider the toy game where player 1 has only two actions, both absorbing, and they give payoffs $-1$ and $1$ respectively. In this game, $\{0\}$ is obviously not approachable in the almost sure sense, but is easily approachable in the expected sense by playing each action with probability $1/2$ at the first stage. We still quickly investigate almost sure approachability in Appendix C.
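The gap between the two notions can be checked by exact arithmetic on a toy game of this kind. The numbers below are illustrative assumptions: two absorbing actions with scalar payoffs $-1$ and $+1$, target set $\{0\}$, and each action played with probability $1/2$ at the first stage. Since play absorbs immediately, the average payoff at every horizon equals the realized absorbing payoff.

```python
from fractions import Fraction

# Assumed toy data: both actions of player 1 are absorbing.
payoffs = {"a": Fraction(-1), "b": Fraction(1)}
mixed = {"a": Fraction(1, 2), "b": Fraction(1, 2)}
target = Fraction(0)

# Distance of the EXPECTED average payoff to the target (approachability in expectation).
expected_average = sum(mixed[a] * payoffs[a] for a in payoffs)
distance_of_expectation = abs(expected_average - target)

# EXPECTED distance of the average payoff to the target (almost sure flavor).
expected_distance = sum(mixed[a] * abs(payoffs[a] - target) for a in payoffs)
```

The first quantity is 0, so the target is reached in expectation, while the second is 1: every realized play stays at distance 1 from the target.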
We can distinguish at least two different concepts of approachability, that we respectively call uniform approachability and weak approachability.
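Specifically, we say that a convex set $C$ is uniformly approachable by player 1 if, for every $\varepsilon > 0$, player 1 has a strategy such that after some stage $T$, the expected average payoff is $\varepsilon$-close to $C$, against any strategy of player 2. Stated with quantifiers:
$$\forall \varepsilon > 0,\ \exists \sigma,\ \exists T \in \mathbb{N},\ \forall \tau,\ \forall t \ge T:\quad d\big(\mathbb{E}_{\sigma,\tau}[\bar{g}_t],\, C\big) \le \varepsilon,$$
where $\bar{g}_t$ denotes the average payoff up to stage $t$, $d$ the Euclidean distance to a set, and $\mathbb{E}_{\sigma,\tau}$ the expectation under the strategies $\sigma$ of player 1 and $\tau$ of player 2.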
Reciprocally, a convex set $C$ is uniformly excludable by player 2 if she can uniformly approach the complement of some $\delta$-neighborhood of $C$, for some fixed $\delta > 0$.
A similar definition holds for general evaluations induced by a sequence of weights such that and for all . For every , there must exist a threshold so that if the sequence satisfies then the evaluation of the payoffs is within distance of . We emphasize here that the Cesaro average corresponds to for while the discounted evaluation, with discount factor corresponds to for all . We then denote the accumulated weighted average payoff up to stage as
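We now focus on our main objective, weak approachability. We say that a convex set $C$ is weakly approachable by player 1 if, for every $\varepsilon > 0$, if the horizon $t$ of the game is sufficiently large and known, player 1 has a strategy such that the expected payoff is $\varepsilon$-close to $C$, against any strategy of player 2. Stated with quantifiers:
$$\forall \varepsilon > 0,\ \exists T \in \mathbb{N},\ \forall t \ge T,\ \exists \sigma_t,\ \forall \tau:\quad d\big(\mathbb{E}_{\sigma_t,\tau}[\bar{g}_t],\, C\big) \le \varepsilon,$$
where $\bar{g}_t$ denotes the average payoff up to stage $t$, $d$ the Euclidean distance to a set, and $\mathbb{E}_{\sigma_t,\tau}$ the expectation under the strategies $\sigma_t$ of player 1 and $\tau$ of player 2.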
Reciprocally, a convex set $C$ is weakly excludable by player 2 if she can weakly approach the complement of some neighborhood of $C$. This definition of weak approachability may be extended, just as above, to general evaluations, where the strategy of player 1 depends on the sequence of weights.
Observe that we can assume without loss of generality that the target set is closed, because approaching a set or its closure are two equivalent problems.
We emphasize that, without an irreversible Markov chain structure, uniform approachability will be equivalent to weak approachability, because of the doubling trick, when the target set is convex. However, as we shall see, it is no longer the case in generalized quitting games.
Reminder on approachability in classical repeated games.
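Blackwell [5] proved that in classical repeated games (i.e., when neither player has a quitting action) there is a simple geometric necessary and sufficient condition under which a convex set $C$ is (uniformly and weakly) approachable. It reads as follows: against every mixed action $y$ of player 2, player 1 has a mixed action $x$ whose expected payoff lies in $C$, that is,
$$\forall y \in \Delta(J),\ \exists x \in \Delta(I),\quad g(x,y) \in C.$$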
This immediately entails that a convex set is either weakly approachable or weakly excludable.
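Blackwell's condition for a convex set asks that, against every mixed action of player 2, player 1 has a mixed action whose expected payoff lies in the set. For scalar payoffs and an interval target, this can be checked numerically, because the payoffs reachable against a fixed mixed action of player 2 form the interval between the smallest and the largest pure-action payoffs. The sketch below is only a grid check under these assumptions (scalar payoffs, exactly two actions for player 2, hypothetical function name); it can miss violations that fall between grid points.

```python
def blackwell_condition_1d(payoff, actions_1, actions_2, low, high,
                           grid=100, tol=1e-12):
    """Grid check of Blackwell's condition for the interval [low, high].

    payoff: dict mapping (i, j) to a scalar payoff
    actions_2: exactly two actions (j0, j1); player 2 mixes with weight q on j0
    """
    j0, j1 = actions_2
    for k in range(grid + 1):
        q = k / grid
        # Payoff of each pure action of player 1 against the mixed action (q, 1 - q).
        values = [q * payoff[(i, j0)] + (1 - q) * payoff[(i, j1)]
                  for i in actions_1]
        # Reachable payoffs form [min(values), max(values)]; it must meet [low, high].
        if min(values) > high + tol or max(values) < low - tol:
            return False
    return True


# Hypothetical matching-pennies-like payoffs.
payoff = {("T", "L"): 1.0, ("T", "R"): -1.0,
          ("B", "L"): -1.0, ("B", "R"): 1.0}
zero_ok = blackwell_condition_1d(payoff, ["T", "B"], ("L", "R"), 0.0, 0.0)
half_ok = blackwell_condition_1d(payoff, ["T", "B"], ("L", "R"), 0.5, 1.0)
```

In this example the singleton $\{0\}$ passes the check, whereas the interval $[0.5, 1]$ fails it: against the uniform mixed action of player 2, every mixed action of player 1 yields payoff 0.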
Approachability conditions.
We aim at providing a similar geometric condition ensuring that a convex set is weakly approachable (or weakly excludable). Inspired by a recent formula, obtained in Laraki [18], which characterizes the asymptotic value by making use of perturbations of mixed actions with measures, we introduce the following three conditions.
The strongest of the three conditions is:
(1) 
The next condition will be shown to be a useful intermediate condition:
(2) 
Finally, the weakest of the three conditions is:
(3) 
We emphasize here that, in the above conditions, the maxima and minima are indeed attained since the mapping is Lipschitz.
Main results.
We can already state our main results, which we will prove throughout the paper. In these results, approachability always refers to player 1 whereas excludability always refers to player 2.
Theorem 1 (Weak Approachability)
The theorem above is proven in Section 3.
In the special class of BigMatch games, our findings for weak approachability are summarized by the next proposition.
Proposition 2 (Weak Approachability in BigMatch Games)
In the proposition above, the first part of the claim on BigMatch games of type I follows from Lemma 8. The second part is then a direct consequence of Theorem 1. Indeed, suppose that a convex set $C$ is not weakly approachable. Then by Theorem 1, $C$ does not satisfy Condition (1). Since Conditions (1) and (3) coincide, $C$ does not satisfy Condition (3) either. Hence, by Theorem 1 once again, $C$ is weakly excludable.
For uniform approachability we obtain the following results.
Proposition 3 (Uniform Approachability in BigMatch Games)

In BigMatch games of either type: There are convex sets which are weakly approachable but not uniformly approachable; and so they are neither uniformly approachable nor uniformly excludable.
The first claim is shown by Examples 15 and 16. The second claim on BigMatch games of type II follows from Proposition 12.
Notice that, in the results above, approachability conditions for BigMatch games of types I and II are drastically different. The necessary and sufficient weak approachability condition takes a simple form for type I, but not for type II, the situation being completely reversed for uniform approachability. The second consequence is that determinacy of convex sets is very specific to the original model of Blackwell [5]. We remark that determinacy also fails in the standard model of Blackwell, if player 1 has an imperfect observation on past actions of player 2, as proved by Perchet [25] by providing an example of a convex set that is neither approachable nor excludable.
Outline of the Paper.
The remainder of the paper is organized as follows. In Section 3, we prove Theorem 1. In Section 4 we compare the notions of weak and uniform approachability, with a focus on BigMatch games. In Section 5, we present several examples. Additional results and examples can be found in the Appendices.
3 Necessary and Sufficient Conditions for Weak Approachability
In this section, we prove Theorem 1. First we shall prove that, assuming the sufficiency of Condition (1) for weak approachability, Condition (3) is indeed necessary. Then, we show that Condition (1) ensures weak approachability.
3.1 If Condition (1) is Sufficient, then Condition (3) is Necessary
As claimed, we will prove later in Proposition 11 that Condition (1) is sufficient for the weak approachability of convex sets in generalized quitting games. The purpose of this section is to demonstrate that this entails the necessity of Condition (3) by switching the role of players 1 and 2.
Proposition 4
Proof. As Condition (3) is not satisfied for , there exists such that
Choose some and that realize the supremum up to . It is not difficult to see that
is a bounded convex set that is away from and whose closure is denoted by .
To prove the convexity of the above set, suppose that
Taking , we obtain that
Thus we have proved that
and that satisfies Condition (1), but stated from the point of view of player 2. Therefore, by assumption, player 2 can weakly approach , and hence she can weakly approach the complement of the neighborhood of . This means that is weakly excludable by player 2 (and in particular not weakly approachable by player 1), as desired.
3.2 Condition (1) is Sufficient
We prove in this section that Condition (1) is sufficient for the weak approachability of a convex set . Assuming that the target set satisfies Condition (1), the construction of the approachability strategy will be based on a calibrated algorithm, as introduced by Dawid [8]. Similar ideas can be found in the online learning literature (see, e.g., Foster and Vohra [11], Perchet [24], and Bernstein, Mannor and Shimkin [3]), where Blackwell approachability and calibration now play an increasingly important role (as evidenced by Abernethy, Bartlett and Hazan [1], Mannor and Perchet [20], Perchet [27], Rakhlin, Sridharan and Tewari [29], and Foster, Rakhlin, Sridharan and Tewari [10]).
For the sake of clarity, we divide this section into several parts. First, we introduce the auxiliary calibration tool, then we prove the sufficiency condition in BigMatch games of types II and I, and finally we show how the main idea generalizes.
3.2.1 An auxiliary tool: Calibration
In this subsection, we adapt a result of Mannor, Perchet and Stoltz [21] on calibration to the setup with the general payoff evaluation (thus not necessarily with Cesaro averages).
We recall that calibration is the following sequential decision problem. Consider a nonempty and finite set , a finite grid of the set of probability distributions on denoted by where for , and a sequence of weights . At each stage , Nature chooses a state and the decision maker predicts it by choosing simultaneously a point of the grid . Once is chosen, the state and the weight are revealed (we emphasize that the sequence is not necessarily known in advance by the decision maker).
We denote by the set of stages where was predicted and by
the empirical weighted distribution of the state on .
In that setting, we say that an algorithm of the decision maker is calibrated if
almost surely.
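To make the objects of this definition concrete, the sketch below computes, for each grid point, the share of total weight of the stages where it was predicted and the distance between the prediction and the weighted empirical distribution of states on those stages. The $L^1$ distance and the aggregated score (weight share times distance, summed over the grid) are assumptions of this illustration and need not coincide with the exact criterion used in the definition above; all names are hypothetical.

```python
def calibration_gaps(predictions, states, weights, grid, num_states):
    """For each grid index l, return (weight share, L1 gap) on stages where l was predicted.

    predictions: list of grid indices, one per stage
    states: list of state indices in range(num_states), one per stage
    weights: list of nonnegative stage weights
    grid: list of probability vectors over the states
    """
    total = sum(weights)
    mass = [0.0] * len(grid)
    counts = [[0.0] * num_states for _ in grid]
    for l, s, w in zip(predictions, states, weights):
        mass[l] += w
        counts[l][s] += w
    result = []
    for l, point in enumerate(grid):
        if mass[l] == 0.0:
            result.append((0.0, 0.0))     # grid point never predicted
            continue
        empirical = [c / mass[l] for c in counts[l]]
        gap = sum(abs(e - p) for e, p in zip(empirical, point))
        result.append((mass[l] / total, gap))
    return result


def calibration_score(predictions, states, weights, grid, num_states):
    """Weighted average gap; a small value indicates well-calibrated predictions."""
    return sum(share * gap for share, gap in
               calibration_gaps(predictions, states, weights, grid, num_states))


# Hypothetical example: two states, a three-point grid, alternating states.
grid = [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]
states = [0, 1, 0, 1]
weights = [0.25] * 4
good_score = calibration_score([1, 1, 1, 1], states, weights, grid, 2)
bad_score = calibration_score([0, 0, 0, 0], states, weights, grid, 2)
```

Always predicting the uniform distribution matches the empirical frequencies of the alternating sequence (score 0), whereas always predicting the first state gives a score of 1.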
Lemma 5
The decision maker has a calibrated algorithm such that, for all ,
Proof. The proof is almost identical to the one in Mannor, Perchet and Stoltz [21], Appendix A, and is thus omitted.
3.2.2 Condition (1) is sufficient in BigMatch games of type II
We first focus on BigMatch games of type II, where only player 2 can quit. The following lemma exhibits a useful equivalence between some of the conditions.
Lemma 6
A consequence of (4) is that if player 2, at every stage, either plays a non-quitting action i.i.d. according to some fixed $y$ or decides to quit, then player 1 can approach $C$ by playing i.i.d. according to the associated $x$ given by (4). The sufficiency of this condition means that it is not more complicated to approach $C$ against an opponent than against an i.i.d. process that could eventually quit at some (unknown) time.
Now we prove that Condition (2) implies (4). So, assume that satisfies Condition (2). Since for all and , Condition (2) implies
Now (4) follows by taking and respectively taking with tending to infinity.
Finally, we prove that (4) implies Condition (1). So, assume that satisfies (4). Let . Decompose it as , where , and . For this , let be given by (4). Then
because all involved payoffs, and , belong to the convex set . Since we can choose , we have shown that Condition (1) holds, as desired.
Proposition 7
In BigMatch games of type II, a convex set is (weakly or uniformly) approachable by player 1 if Condition (1) is satisfied.
Proof. As advertised, the approachability strategy we will consider is based on calibration (as it can be generalized to more complex settings). The main insight is that player 1 predicts, stage by stage, using a calibrated procedure and plays the response given by Lemma 6. Let be the sequence of weights used for the general payoff evaluation (recall that Cesaro average corresponds to while discounted evaluation is ).
Let be a finite discretization of and be given by Lemma 6 for every . Consider the calibration algorithm introduced in Lemma 5 with respect to the sequence of weights . The strategy in the BigMatch game of type II is defined as follows: whenever is predicted by the calibration algorithm, player 1 actually plays according to .
Assume that player 2 has never chosen an action in before stage . Then Lemma 5 ensures that
where is the weighted empirical distribution of actions of player 2 on the set of stages where was predicted. We recall that on each of these stages, player 1 played according to so that the average weighted expected payoffs on those stages is .
Summing over , we obtain that
We stress that the payoff on the left-hand side of the above equation is exactly the expected weighted average vectorial payoff obtained by player 1 up to the current stage. As a consequence, if player 2 never uses a quitting action, letting the stage tend to infinity in the above equation yields
It remains to consider the case where player 2 used some quitting action at stage . At that stage, player 1 played according to for some , which ensures that . As a consequence, absorption took place and the expected absorption payoff belongs to . We therefore obtain that
hence the result.
3.2.3 Condition (1) is sufficient in BigMatch games of type I
We now turn to the case of BigMatch games of type I, where only player 1 can quit. In those games, we have the following useful equivalence result.
Lemma 8
A consequence of (6) is that if player 2 plays i.i.d. according to , then player 1 can approach by playing “perturbed” by with an overall total probability of absorption of .
Proof: We decompose the proof into three main parts.
Part a. First we argue that Condition (1) implies the Blackwell condition (5). Decompose every as , where . Similarly, decompose every into and . Then the fraction in Condition (1) can be rewritten into
(7) 
Now suppose that satisfies Condition (1). Let , and take an that gives the minimum in Condition (1). We distinguish two cases.
Suppose first that . Then, by taking in (7) and letting tend to infinity, we find for all . Hence, .
Now assume that . Define . Then in view of (7), we have
So, in both cases, satisfies the Blackwell condition (5).
Part b. Now we prove that the Blackwell condition (5) implies Condition (1). So, assume that satisfies (5). Then, for every , we decompose again the associated as . The choice of where ensures that for all , and hence satisfies Condition (1).
Part c. We already know that Condition (1) implies Condition (2), which further implies Condition (3). Since Condition (1) is equivalent to the Blackwell condition (5), it only remains to verify that Condition (3) implies (5). This can be easily checked by taking and using the same decomposition trick as above.
Proposition 9
In BigMatch games of type I, a convex set is weakly approachable by player 1 if Condition (1) is satisfied.
Proof: The approachability strategy is rather similar to the one introduced for BigMatch games of type II. Given the finite discretization of denoted by , Lemma 8 guarantees, for any , the existence of , and such that is close to .
Based on an auxiliary calibration algorithm (to be adapted and described later) whose prediction at stage is some , we consider the strategy of player 1 that dictates to play at this stage
Thus at stage , with probability player 1 quits according to (then the expected absorption payoff is ), and with the remaining probability, which is positive, he plays and play does not absorb at this stage. Since the cumulative weight of all the remaining stages is , the associated expected payoff of the decision taken at stage is
where . As a consequence, summing over and using the fact that the game is absorbed at stage with probability , we obtain that
We stress the fact that the sequence is predictable with respect to the filtration induced by the strategies of players 1 and 2 (i.e., it does not depend on the choices made at stages ).
To define the strategy of player 1, we consider an auxiliary algorithm calibrated with respect to the sequence of weights (which is possible even though depends on the past predictions). Using the fact that , Lemma 5 and the same argument as in BigMatch games of type II, we obtain that our strategy guarantees the following:
The result follows by, for instance, taking .
3.2.4 Condition (1) is sufficient in all generalized quitting games
Using the tools introduced in the previous subsections for BigMatch games of types I and II, we are now able to give a simple proof of the main result, that Condition (1) is sufficient to ensure weak approachability in all generalized quitting games.
We start with a useful consequence of Condition (1).
Lemma 10
In generalized quitting games, Condition (1) implies that at least one of the following conditions holds:

with
(8) 
with
(9)
Proof: Let . Assume first that Condition (1) is satisfied with some that puts positive weight on , i.e. with . Then, taking in the expression of Condition (1) with going to infinity, for any , yields condition (a).
Otherwise, Condition (1) is satisfied for some . Then, the same argument as in the proof of Lemma 8 yields condition (b).
Proposition 11
In generalized quitting games, a convex set is weakly approachable by player 1 if Condition (1) is satisfied.
Proof. Assume that Condition (1) is satisfied. Then either condition (a) or condition (b) of Lemma 10 is satisfied.
First assume that condition (a) of Lemma 10 is satisfied. Then, player 1 just has to play i.i.d. according to . Indeed, then the probability of absorption at each stage is at least , so absorption will eventually take place with probability 1, and by condition (a) the expected absorption payoff is in . As a consequence,
hence the result.
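The step "absorption will eventually take place with probability 1" rests on a geometric bound: if the probability of absorption at each stage is at least some constant, the probability that play has not absorbed after $t$ stages is at most one minus that constant, raised to the power $t$. The sketch below makes this bound explicit; the name `p_min` for the lower bound on the per-stage absorption probability and both function names are hypothetical.

```python
import math


def non_absorption_bound(p_min, t):
    """Upper bound (1 - p_min)**t on the probability of no absorption in t stages."""
    return (1.0 - p_min) ** t


def stages_needed(p_min, eps):
    """Number of stages after which the bound is at most eps.

    Requires 0 < p_min < 1 and 0 < eps < 1; computed with floating-point
    logarithms, so the result may be off by one in borderline cases.
    """
    return math.ceil(math.log(eps) / math.log(1.0 - p_min))


# Hypothetical numbers: absorption probability at least 1/2 at every stage.
t_star = stages_needed(0.5, 0.01)
```

With a per-stage absorption probability of at least 1/2, seven stages suffice to bring the probability of non-absorption below 1%.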
Now assume that condition (b) of Lemma 10 is satisfied. We claim that the strategy defined in the proof of Proposition 9 is an approachability strategy. Indeed, as long as player 2 does not play an absorbing action , the analysis is identical.
If, on the other hand, player 2 plays at stage , then the absorbing payoff is equal to