1 Introduction
Stochastic games provide a general model for studying dynamic interactions between players whose actions affect the state of the environment. The change in state is described by a probability distribution called the law of motion. The first such games were introduced by Shapley [17]. We may view his model as discrete-time finite two-player zero-sum games, where players receive an immediate payoff in each round of play and discount future payoffs. Shapley proved that in such games optimal stationary (or memoryless) strategies exist. The initial model of Shapley has since been extended and studied in many variations. For each model, the main question is the existence of optimal strategies or of a Nash equilibrium. Next, it is of interest how complicated such strategies must be. We shall limit our discussion to discrete-time games having an arbitrary but finite number of states.
Everett [6] defined recursive games, where players only receive a (possibly) nonzero payoff when play terminates by entering special absorbing states. These payoffs are also called terminal payoffs. While players are no longer guaranteed to have optimal strategies, Everett proved that they do have ε-optimal stationary strategies. Gillette [9] considered finite two-player zero-sum games where players again receive immediate payoffs in each round of play, but now evaluate their payoff as the average of the immediate payoffs received (limit-average payoff). Here players are no longer guaranteed to have ε-optimal stationary strategies, but as shown by Mertens and Neyman [15] they do have ε-optimal strategies. An even more general result was obtained by Martin [13], showing that in two-player zero-sum games where payoffs are Borel measurable functions of the history of play, the players have ε-optimal strategies. Here the extension from deterministic games (i.e., games having a deterministic law of motion) to the general case is due to an observation of Maitra and Sudderth [11].
For non-zero-sum games much less is known. For discounted payoffs, a Nash equilibrium exists in stationary strategies, as shown by Fink [7] and Takahashi [19]. The existence of an ε-Nash equilibrium in recursive games is an open problem, even for three players. In addition, Flesch, Thuijsman and Vrieze [8] gave an example of a three-player recursive game without a stationary Nash equilibrium. Vieille [21, 22] proved the existence of an ε-Nash equilibrium in every two-player game with limit-average payoff.
Mertens and Neyman (cf. [14]) showed, using the celebrated determinacy result of Martin [12], that an ε-Nash equilibrium exists in any turn-based (i.e., perfect-information) game with Borel payoff functions. This was later observed again by Chatterjee et al. [5]. When the payoff functions have finite range, an actual Nash equilibrium exists. This is in particular the case for deterministic games where each payoff function is the indicator function of a Borel set. We refer to the indicator function of a Borel set, as well as to the set itself, as a Borel winning set or Borel objective.
The most basic of these are given by the open and closed sets. Given a set of states, the reachability objective it defines consists of the histories of play that visit a state in the set, and the safety objective it defines consists of the histories of play that stay within the set. These are the open and closed Borel objectives typically studied, and they have applications in the verification and synthesis of reactive systems [4].
Games where the players have reachability or safety objectives are closely related to recursive games. First note that for a given recursive game, after normalizing all payoffs to lie in the range [−1, 1], every terminal payoff vector can be written as a convex combination of payoff vectors having only entries from the set {−1, 0, 1}. This means that any absorbing state can be replaced by a set of absorbing states where all players have payoffs in the set {−1, 0, 1}, by modifying the (probabilistic) law of motion accordingly. Then, if a player only receives terminal payoffs from the set {−1, 0}, this is equivalent to a safety objective, and likewise, if a player only receives terminal payoffs from the set {0, 1}, this is equivalent to a reachability objective.
Secchi and Sudderth [16] considered the class of games where each player has a safety objective, and called these stay-in-a-set games. They proved the existence of a Nash equilibrium in any (finite) stay-in-a-set game. The equilibrium strategies are not stationary but prescribe, as a function of the set of players whose safety objective has not yet been violated, a stationary strategy profile. A natural open question raised by Secchi and Sudderth was then the existence of a stationary Nash equilibrium. We give an example of a two-player game without a stationary Nash equilibrium. Our game is furthermore turn-based. Through examples, we also illustrate the Nash equilibria obtained from the proof of Secchi and Sudderth. They rely crucially on the willingness of the second player to change strategy after already having lost. Finally, we note that the players do have a stationary ε-Nash equilibrium for every ε > 0.
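As a small illustration of the convex decomposition of terminal payoffs mentioned above, the following Python sketch assumes payoffs already normalized to [−1, 1] and uses a standard threshold construction; the function name `decompose` is our own and is not taken from the cited works. Each coordinate keeps its sign in every vector of the decomposition, so a player whose terminal payoffs are all nonpositive only gets entries from {−1, 0}, and a player whose terminal payoffs are all nonnegative only gets entries from {0, 1}.

```python
from fractions import Fraction


def decompose(v):
    """Write v in [-1, 1]^n as a convex combination of vectors in {-1, 0, 1}^n.

    Threshold construction: for u uniform on [0, 1), let w(u)_i = sign(v_i)
    if |v_i| > u and 0 otherwise; the expected value of w(u)_i is v_i.
    Returns a list of (weight, vector) pairs whose weights sum to 1.
    """
    v = [Fraction(x) for x in v]
    if any(abs(x) > 1 for x in v):
        raise ValueError("entries must lie in [-1, 1]")
    cuts = sorted({Fraction(0), Fraction(1)} | {abs(x) for x in v})
    parts = []
    for lo, hi in zip(cuts, cuts[1:]):
        # w(u) is constant for u in [lo, hi): no |v_i| lies strictly in between
        w = tuple((1 if x > 0 else -1) if abs(x) > lo else 0 for x in v)
        parts.append((hi - lo, w))
    return parts
```

For example, the vector (1/2, −1/4, 0) is split into the vectors (1, −1, 0), (1, 0, 0) and (0, 0, 0) with weights 1/4, 1/4 and 1/2.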
It is necessary that our example game is not deterministic. In fact, in every deterministic two-player turn-based game where each player has a reachability or a safety objective, a Nash equilibrium exists in positional (i.e., pure and memoryless) strategies. This follows from the fact that two-player zero-sum games with reachability and safety objectives are positionally determined. Thus, in the non-zero-sum game, either one of the two players can guarantee a win (in which case we fix a positional winning strategy for that player and let the other player play a positional best reply to it), or both players can ensure that the opponent loses.
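The positional determinacy invoked here can be sketched with the classical attractor construction for deterministic turn-based games. The following Python sketch is illustrative only: the graph encoding (dictionaries `owner` and `edges`) and the function name `attractor` are our own assumptions. It computes the states from which a player can force a visit to a target set, together with a positional strategy achieving this; from the remaining states, the opponent can keep play outside that region forever, which is a positional winning strategy for the corresponding safety objective.

```python
from collections import defaultdict


def attractor(owner, edges, target, player):
    """Return (region, strategy) for a deterministic turn-based game graph.

    owner:    dict mapping each state to the player (1 or 2) who moves there
    edges:    dict mapping each state to a non-empty list of successors
    target:   states that `player` wants to reach (reachability objective)
    region:   states from which `player` can force a visit to `target`
    strategy: positional choice of a successor for each state of `player`
              in the region but outside the target, keeping play in the region
    """
    succs = {s: set(ts) for s, ts in edges.items()}
    preds = defaultdict(set)
    for s, ts in succs.items():
        for t in ts:
            preds[t].add(s)
    region = set(target)
    strategy = {}
    # number of successors of each state not yet known to lie in the region
    pending = {s: len(ts) for s, ts in succs.items()}
    frontier = list(region)
    while frontier:
        t = frontier.pop()
        for s in preds[t]:
            if s in region:
                continue
            if owner[s] == player:
                # one successor in the region suffices for the controlling player
                region.add(s)
                strategy[s] = t
                frontier.append(s)
            else:
                # the opponent is forced only once all successors are in the region
                pending[s] -= 1
                if pending[s] == 0:
                    region.add(s)
                    frontier.append(s)
    return region, strategy
```

For instance, on a graph where Player 2 owns a state `d` with a self-loop, `d` is left out of the region, and staying at `d` is a positional way for Player 2 to avoid the target forever.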
2 The game
The game we consider is played by two players, each taking turns in choosing whether to continue the game or to attempt to quit the game, with Player 1 making the first choice. A choice of the quit action by one of the players is successful with probability , and otherwise the game continues with the other player as before. If Player 1 makes the choice to quit, both players win with probability and both players lose with the remaining probability . If Player 2 makes the choice to quit, both players win with probability and both players lose with the remaining probability . Finally, Player 2 is given an incentive to choose quit: the continue action of Player 2 leads to a loss for Player 2 with probability . Infinite play leads to Player 1 winning (and Player 2 losing with probability 1). This leads to a discontinuity in the payoff function of Player 1, which is crucial for our example.
The game is illustrated in Figure 1 and is modeled with a set of five states: state 1, controlled by Player 1; state 2, controlled by Player 2; an auxiliary state that exists merely to enforce a loss on Player 2; and a winning state and a losing state, which are winning and losing, respectively, for both players. The game is a stay-in-a-set game in which the safe set of Player 1 consists of all states except the losing state, and the safe set of Player 2 consists of all states except the losing state and the auxiliary state. The diamond-shaped nodes in Figure 1 are used to indicate the probabilistic transitions.
A stationary strategy profile in the game can be described by a pair of probabilities, one for each player: the probability that the player chooses the quit action when in the state that player controls (one minus this is the probability that the player chooses the continue action there).
2.1 No stationary Nash equilibrium
We give here a simple analysis showing that no stationary Nash equilibrium exists in the game. We can divide the plays of the game into three groups: group 1 consists of the plays where Player 1 quits successfully, group 2 of the plays where Player 2 quits successfully, and group 3 of the plays that never reach the winning or the losing state.
Consider a stationary strategy profile. When neither player ever chooses the quit action, the play belongs to group 3, where Player 1 wins and Player 2 loses with probability 1. When at least one of the players chooses the quit action with positive probability, the play belongs to group 1 or group 2 with probability 1. Both players prefer a play from group 1, where Player 1 is the player to quit successfully.
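Suppose that a stationary strategy profile is a Nash equilibrium. If Player 2 never chooses the quit action, the only best reply of Player 1 is to never choose it either, since otherwise the losing state is reached with positive probability. But if Player 1 also never chooses the quit action, Player 2 loses with probability 1, whereas choosing the quit action with positive probability would lead to reaching the winning state with positive probability. This rules out, in a Nash equilibrium, that Player 2 never chooses the quit action.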
Suppose now that Player 2 chooses the quit action with positive probability, which means that the play belongs to group 1 or group 2 with probability 1. The probability that the play belongs to group 1 strictly increases with the probability that Player 1 chooses the quit action, and it follows that Player 1 must always choose the quit action. But this is also not a Nash equilibrium, as Player 2 would then be better off never choosing the quit action. Indeed, consider a play from state 2 until it either returns to state 2, reaches the winning state before returning to state 2, or reaches the auxiliary state or the losing state before returning to state 2. We call these events a return, a win, and a loss, respectively.
The quit action for Player 2 has probability of a loss, probability of a win and of a return. The continue action has probability of a loss, probability of a win and of a return. Since a return is better than a loss for Player 2, this rules out, in a Nash equilibrium, that Player 1 always chooses the quit action as well.
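The return/win/loss accounting can be checked numerically once concrete probabilities are fixed. Since the values used in the game are not reproduced in this text, the Python sketch below works with hypothetical parameters: `q` for the success probability of a quit attempt, `a` and `b` for the probability that both players win after a successful quit by Player 1 and Player 2, respectively, and `c` for the probability that the continue action of Player 2 leads to the auxiliary state. It assumes, as in the argument above, that Player 1 always chooses quit, and that a failed quit or a continue action passes the turn to the other player. The placeholder values are chosen so that both actions of Player 2 have the same win probability; they are illustrative, not the values of the game.

```python
from fractions import Fraction as F


def excursion(action, q, a, b, c):
    """(win, loss, return) probabilities for Player 2 on one excursion from
    state 2, assuming Player 1 always chooses quit at state 1.

    Hypothetical parameters: q = success probability of a quit attempt,
    a / b = probability that both win after a successful quit by Player 1 / 2,
    c = probability that continue at state 2 leads to the auxiliary state.
    """
    if action == "quit":
        win = q * b + (1 - q) * q * a  # own quit succeeds, or fails and Player 1 succeeds
        ret = (1 - q) * (1 - q)        # both quit attempts fail
    elif action == "continue":
        win = (1 - c) * q * a          # avoid the auxiliary state, then Player 1 succeeds
        ret = (1 - c) * (1 - q)        # avoid the auxiliary state, Player 1's attempt fails
    else:
        raise ValueError(f"unknown action: {action}")
    return win, 1 - win - ret, ret


def player2_value(y, q, a, b, c):
    """Winning probability of Player 2 from state 2 when quitting with probability y."""
    wq, lq, _ = excursion("quit", q, a, b, c)
    wc, lc, _ = excursion("continue", q, a, b, c)
    win = y * wq + (1 - y) * wc
    loss = y * lq + (1 - y) * lc
    # excursions repeat independently until a win or a loss occurs
    return win / (win + loss)


# Hypothetical (q, a, b, c), chosen so that both actions have the same win probability.
PARAMS = (F(1, 2), F(1, 2), F(1, 8), F(1, 4))
```

Under stationary strategies the excursions from state 2 repeat independently, so Player 2's winning probability is the ratio win/(win + loss); with these placeholder values it is maximized by never quitting.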
2.2 Detailed payoff analysis
For , let be the payoff to Player of the strategy profile when starting play in state . The payoffs satisfy the following equations
and from these follows further
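When neither player ever chooses the quit action, play never stops, so from either state Player 1 obtains payoff 1 and Player 2 obtains payoff 0. When at least one of the players chooses the quit action with positive probability, we can solve for the payoffs of Player 1, and likewise we can always solve for the payoffs of Player 2, to obtain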
Using , we find that
And likewise , which means that
The payoff function of Player 2 is continuous on the entire domain, whereas the payoff function of Player 1 has a single discontinuity, at the profile where neither player ever chooses the quit action. Note that for all , and for all .
The best replies of the players are as follows. If Player 2 never chooses the quit action, the only best reply of Player 1 is to never choose it either, giving Player 1 payoff 1. If Player 2 chooses the quit action with positive probability, the only best reply of Player 1 is to always choose it, giving . If , the only best reply for Player 2 is to have , giving . When , Player 2 has no preferred action. Finally, if , the only best reply of Player 2 is to have , giving .
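A minimal Python sketch of the stationary payoffs and of Player 1's best replies, again with hypothetical parameters `q`, `a`, `b`, `c` (success probability of a quit attempt, probability that both win after a successful quit by Player 1 or Player 2, and probability that the continue action of Player 2 leads to the auxiliary state). It additionally assumes that the auxiliary state leads back to state 1, and writes `x` and `y` for the quit probabilities of Players 1 and 2; these names and values are ours, not those of the game. The grid search only approximates the best-reply analysis over a few quit probabilities.

```python
from fractions import Fraction as F


def payoffs(x, y, q, a, b, c):
    """Payoffs (v1, v2) from state 1 when Player 1 quits with probability x
    and Player 2 quits with probability y (stationary profile).

    Hypothetical model: a quit attempt succeeds with probability q; after a
    successful quit by Player 1 (resp. Player 2) both win with probability a
    (resp. b); continue at state 2 enters the auxiliary state with probability
    c > 0, and the auxiliary state leads back to state 1.
    """
    stop1 = x * q                  # this round ends with Player 1 quitting successfully
    stop2 = (1 - stop1) * y * q    # this round ends with Player 2 quitting successfully
    win_now = stop1 * a + stop2 * b
    # probability of moving from state 2 back to state 1 without Player 2 losing
    r = y * (1 - q) + (1 - y) * (1 - c)
    v2 = win_now / (1 - (1 - stop1) * r)
    if stop1 + stop2 == 0:
        return F(1), v2            # play never stops: Player 1 wins
    return win_now / (stop1 + stop2), v2


def best_replies_1(y, grid, params):
    """Quit probabilities in grid that maximize Player 1's payoff against y."""
    values = {x: payoffs(x, y, *params)[0] for x in grid}
    top = max(values.values())
    return [x for x, v in values.items() if v == top]


PARAMS = (F(1, 2), F(1, 2), F(1, 8), F(1, 4))  # hypothetical (q, a, b, c)
GRID = [F(i, 4) for i in range(5)]
```

With these placeholder values, Player 1's payoff jumps from 1 at x = y = 0 to a = 1/2 as soon as x > 0 while y = 0, whereas Player 2's payoff tends to 0 there, in line with the continuity statements above.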
2.3 Nash equilibria
We give here two examples of Nash equilibria in the game, following the general result of Secchi and Sudderth [16]. The idea is that once Player 2 has lost by entering the auxiliary state, Player 2 no longer has any incentive, and all strategies are equally good for Player 2.
Suppose first that Player 2 commits to always playing the continue action after the auxiliary state has been entered. The best reply of Player 1 is then to always play the continue action as well, ending up with payoff 1. We may thus consider the modified game that stops when the auxiliary state is entered, upon which Player 1 receives payoff 1. This leads to the modified equation
giving
which solves to
and we see that .
A Nash equilibrium is thus for the players to play the quit action with probabilities and , respectively, until the auxiliary state is reached, after which both players play the quit action with probability 0. The equilibrium payoffs are and .
Suppose next that Player 2 commits to always playing the quit action after the auxiliary state has been entered. The best reply of Player 1 is then to always play the quit action as well, ending up with payoff . The modified game now has the equation
giving
which solves to
We find that , which means that the best reply of Player 1 is always to play the quit action, and in turn the best reply of Player 2 to that is to always play the continue action.
A Nash equilibrium is thus for the players to play the quit action with probabilities 1 and 0, respectively, until the auxiliary state is reached, after which Player 2 changes to playing the quit action with probability 1 as well. The equilibrium payoffs are here and .
2.4 Stationary ε-Nash equilibria
Whereas we have shown that the game has no stationary Nash equilibrium, it does have stationary ε-Nash equilibria for every ε > 0.
When no Nash equilibrium can have . Indeed, then the only best reply of Player 1 would be the actual best reply having . To that, any best reply of Player 2 must have , when .
A few examples of Nash equilibria are given by and , given by and , and given by and . We omit the simple task of verifying that these are indeed Nash equilibria. In the payoffs are and , and in both and the payoffs satisfy and . We note that Player 1 is playing the best reply in , but is far from the best reply in and . Player 2 is playing close to the best reply in and , but far from the best reply in .
3 Conclusion and Further Problems
We have given a simple example of a two-player turn-based game with safety objectives for both players without a stationary Nash equilibrium. A remaining open question is the existence of a stationary ε-Nash equilibrium when players have safety objectives, even in the case of two-player turn-based games.
Several related open questions concern games with reachability objectives or with combinations of reachability and safety objectives. We first consider the setting where all players have reachability objectives, also called reach-a-set games [5]. Flesch, Thuijsman and Vrieze [8] give an example of a three-player recursive game with nonnegative payoffs with no stationary Nash equilibrium. The game is furthermore deterministic. Simon [18] gave an example of a two-player recursive game with nonnegative payoffs with no stationary Nash equilibrium. Both give examples of reach-a-set games without stationary Nash equilibria by the general method of simulating terminal payoffs with the probabilistic law of motion. The example of Flesch, Thuijsman and Vrieze is, however, such that in every terminal payoff vector either no player or precisely two players receive a strictly positive payoff. The payoff vectors where two players receive a strictly positive payoff can (after scaling) be constructed as unique equilibrium payoffs of win-lose bimatrix games¹. This then results in a three-player deterministic reach-a-set game with no stationary Nash equilibrium.
¹ These payoff vectors are and . It is easy to construct two bimatrix games with only payoffs from the set {0, 1} in which the unique equilibrium payoff vectors are and , respectively, which may replace and .
For two-player games, it was erroneously claimed (cf. [3]), first by Chatterjee et al. [5] and later again by Ummels and Wojtczak [20], that a simple adaptation of an example of a zero-sum game of Everett results in a deterministic reach-a-set game without a Nash equilibrium. Thus it remains an open question whether every deterministic two-player reach-a-set game has a Nash equilibrium. It is also an open problem whether every deterministic two-player reach-a-set game has a stationary Nash equilibrium. Boros and Gurvich [1] and Kuipers et al. [10] give an example of a three-player turn-based recursive game with nonnegative payoffs that has no stationary Nash equilibrium. Does every two-player turn-based reach-a-set game have a stationary Nash equilibrium?
Little is known when some players have reachability objectives and others have safety objectives. In the two-player zero-sum case, an example of Everett [6] shows that optimal strategies, and hence a Nash equilibrium, may fail to exist. On the other hand, a stationary ε-Nash equilibrium always exists in that case. Does every two-player game where one player has a reachability objective and the other a safety objective have a stationary ε-Nash equilibrium? In the case of turn-based games, it is an open problem whether every three-player deterministic game has a stationary Nash equilibrium. An example given by Boros et al. [2] appears to come close to answering this question. Namely, Boros et al. construct a three-player deterministic recursive game without a stationary Nash equilibrium that may be realized with payoffs such that Player 2 has only nonnegative terminal payoffs and Players 1 and 3 have only nonpositive terminal payoffs.
References
[1] E. Boros and V. Gurvich. On Nash-solvability in pure stationary strategies of finite games with perfect information which may have cycles. Mathematical Social Sciences, 46(2):207–241, 2003.
[2] Endre Boros, Vladimir Gurvich, Martin Milanič, Vladimir Oudalov, and Jernej Vičič. A three-person deterministic graphical game without Nash equilibria. Discrete Applied Mathematics, 243:21–38, 2018.
[3] Patricia Bouyer, Nicolas Markey, and Daniel Stan. Mixed Nash Equilibria in Concurrent Terminal-Reward Games. In Venkatesh Raman and S. P. Suresh, editors, FSTTCS 2014, volume 29 of Leibniz International Proceedings in Informatics (LIPIcs), pages 351–363. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2014.
[4] Krishnendu Chatterjee and Thomas A. Henzinger. A survey of stochastic ω-regular games. J. Comput. Syst. Sci., 78(2):394–413, 2012.
[5] Krishnendu Chatterjee, Rupak Majumdar, and Marcin Jurdzinski. On Nash equilibria in stochastic games. In Jerzy Marcinkowski and Andrzej Tarlecki, editors, CSL 2004, volume 3210 of Lecture Notes in Computer Science, pages 26–40. Springer, 2004.
[6] H. Everett. Recursive games. In Contributions to the Theory of Games Vol. III, volume 39 of Ann. Math. Studies, pages 67–78. Princeton University Press, 1957.
[7] A. M. Fink. Equilibrium in a stochastic n-person game. J. Sci. Hiroshima Univ. Ser. A-I Math., 28(1):89–93, 1964.
[8] János Flesch, Frank Thuijsman, and O. J. Vrieze. Recursive repeated games with absorbing states. Math. Oper. Res., 21(4):1016–1022, 1996.
[9] D. Gillette. Stochastic games with zero stop probabilities. In Contributions to the Theory of Games III, volume 39 of Ann. Math. Studies, pages 179–187. Princeton University Press, 1957.
[10] Jeroen Kuipers, János Flesch, Gijs Schoenmakers, and Koos Vrieze. Pure subgame-perfect equilibria in free transition games. European Journal of Operational Research, 199(2):442–447, 2009.
[11] Ashok P. Maitra and William D. Sudderth. Finitely additive stochastic games with Borel measurable payoffs. Int. J. Game Theory, 27(2):257–267, 1998.
[12] Donald A. Martin. Borel determinacy. Annals of Mathematics, 102(2):363–371, 1975.
[13] Donald A. Martin. The determinacy of Blackwell games. J. Symb. Log., 63(4):1565–1581, 1998.
[14] J.-F. Mertens. Repeated games. In Proceedings of the International Congress of Mathematicians, 1986, pages 1528–1577. American Mathematical Society, San Diego, 1987.
[15] J.-F. Mertens and A. Neyman. Stochastic games. Int. J. of Game Theory, pages 53–66, 1981.
[16] Piercesare Secchi and William D. Sudderth. Stay-in-a-set games. Int. J. Game Theory, 30(4):479–490, 2002.
[17] L. S. Shapley. Stochastic games. Proc. Natl. Acad. Sci. U. S. A., 39:1095–1100, 1953.
[18] Robert Samuel Simon. Value and perfection in stochastic games. Israel Journal of Mathematics, 156(1):285–309, 2006.
[19] Masayuki Takahashi. Equilibrium points of stochastic non-cooperative n-person games. J. Sci. Hiroshima Univ. Ser. A-I Math., 28(1):95–99, 1964.
[20] Michael Ummels and Dominik Wojtczak. The complexity of Nash equilibria in limit-average games. In Joost-Pieter Katoen and Barbara König, editors, CONCUR 2011, pages 482–496. Springer Berlin Heidelberg, 2011.
[21] Nicolas Vieille. Two-player stochastic games I: A reduction. Israel Journal of Mathematics, 119(1):55–91, 2000.
[22] Nicolas Vieille. Two-player stochastic games II: The case of recursive games. Israel Journal of Mathematics, 119(1):93–126, 2000.