A Stay-in-a-Set Game without a Stationary Equilibrium

03/28/2019
by Kristoffer Arnsfelt Hansen, et al.
Aarhus Universitet

We give an example of a finite-state two-player turn-based stochastic game with safety objectives for both players which has no stationary Nash equilibrium. This answers an open question of Secchi and Sudderth.



1 Introduction

Stochastic games provide a general model for studying dynamic interactions between players whose actions affect the state of the environment. The change in state is described by a probability distribution called the law of motion. The first such games were introduced by Shapley [17]. We may view his model as discrete-time finite two-player zero-sum games, where players receive an immediate payoff in each round of play and discount future payoffs. Shapley proved that in such games optimal stationary (i.e., memoryless) strategies exist. Shapley's initial model has since been extensively extended and studied in many variations. For each model the main questions are the existence of optimal strategies or of a Nash equilibrium and, next, how complicated such strategies must be. We shall limit our discussion to discrete-time games having an arbitrary but finite number of states.

Everett [6] defined recursive games, where players receive a (possibly) non-zero payoff only when play terminates by entering special absorbing states. These payoffs are also called terminal payoffs. While players are no longer guaranteed to have optimal strategies, Everett proved that they do have ε-optimal stationary strategies. Gillette [9] considered finite two-player zero-sum games where players again receive immediate payoffs in each round of play, but now evaluate their payoff as the average of the immediate payoffs received (limit-average payoff). Here players are no longer guaranteed to have ε-optimal stationary strategies, but as shown by Mertens and Neyman [15] they do have ε-optimal strategies. An even more general result was obtained by Martin [13], showing that for two-player zero-sum games where payoffs are Borel measurable functions of the history of play, the players have ε-optimal strategies. Here the extension from deterministic games (i.e., games having a deterministic law of motion) to the general case is due to an observation of Maitra and Sudderth [11].

For non-zero-sum games much less is known. For discounted payoffs, a Nash equilibrium in stationary strategies exists, as shown by Fink [7] and Takahashi [19]. The existence of an ε-Nash equilibrium in recursive games is an open problem, even for three players. In addition, Flesch, Thuijsman and Vrieze [8] gave an example of a two-player recursive game without a stationary ε-Nash equilibrium. Vieille [21, 22] proved the existence of an ε-Nash equilibrium in every two-player game with limit-average payoff.

Mertens and Neyman (cf. [14]) showed, using the celebrated determinacy result of Martin [12], that an ε-Nash equilibrium exists in any turn-based (i.e., perfect-information) game with Borel payoff functions. Later this was observed again by Chatterjee et al. [5]. When the payoff function has finite range, an actual Nash equilibrium exists. This is in particular the case for deterministic games where the payoff function is the indicator function of a Borel set. We refer to the indicator function of a Borel set, as well as to the set itself, as a Borel winning set or Borel objective.

The most basic of these are given by the open and closed sets. Given a set of states, the associated reachability objective consists of the histories of play that visit a state in the set, while the associated safety objective consists of the histories of play that stay within the set. These winning sets are the open and closed Borel objectives typically studied, and they have applications in the verification and synthesis of reactive systems [4].

Games where the players have reachability or safety objectives are closely related to recursive games. First note that for a given recursive game, after normalizing all payoffs to be in the range [-1, 1], every terminal payoff vector can be written as a convex combination of payoff vectors having only entries from the set {-1, 0, 1}. This means that any absorbing state can be replaced by a set of absorbing states in which all players have payoffs in this set as well, by modifying the (probabilistic) law of motion accordingly. Then, if a player receives only terminal payoffs from the set {-1, 0}, this is equivalent to a safety objective, and likewise, if a player receives only terminal payoffs from the set {0, 1}, this is equivalent to a reachability objective.
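To make the normalization step concrete, the decomposition of a payoff vector into extreme payoff vectors can be carried out by a simple staircase construction. The sketch below is ours, not the paper's (the paper only asserts that such a decomposition exists); it preserves the sign of every component, so a player with only non-negative or only non-positive payoffs keeps that property in every vector of the decomposition.

```python
def decompose(v):
    """Write a payoff vector v with entries in [-1, 1] as a convex
    combination of payoff vectors with entries in {-1, 0, 1}.

    Returns a list of (weight, vector) pairs whose weights sum to 1 and
    whose weighted sum of vectors equals v.  The sign of every component
    is preserved in each vector of the decomposition.
    """
    n = len(v)
    sign = [0 if x == 0 else (1 if x > 0 else -1) for x in v]
    # Staircase: sort components by decreasing magnitude and peel off
    # one level of the staircase at a time.
    order = sorted(range(n), key=lambda i: -abs(v[i]))
    mags = [1.0] + [abs(v[i]) for i in order] + [0.0]
    parts = []
    for k in range(n + 1):
        weight = mags[k] - mags[k + 1]
        if weight > 0:
            vec = [0] * n
            for i in order[:k]:  # the k largest-magnitude components
                vec[i] = sign[i]
            parts.append((weight, vec))
    return parts
```

For example, decompose([0.5, -0.25]) puts weights 0.5, 0.25, 0.25 on the vectors [0, 0], [1, 0] and [1, -1]; replacing the original absorbing state by three absorbing states reached with these probabilities simulates the original terminal payoffs via the law of motion.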

Secchi and Sudderth [16] considered the class of games where each player has a safety objective and called these games stay-in-a-set games. They proved that every (finite) stay-in-a-set game has a Nash equilibrium. The equilibrium strategies are not stationary, but prescribe a stationary strategy profile as a function of the set of players whose safety objective has not yet been violated. A natural open question raised by Secchi and Sudderth was then the existence of a stationary Nash equilibrium. We give an example of a two-player game without a stationary Nash equilibrium; our game is furthermore turn-based. We also illustrate by example the Nash equilibria obtained from the proof of Secchi and Sudderth; they rely crucially on the willingness of the second player to change strategy after having already lost. Finally, we note that the players do have a stationary ε-Nash equilibrium.

It is necessary that our example game is not deterministic. In fact, in every deterministic two-player turn-based game where each player has a reachability or a safety objective, a Nash equilibrium exists in positional (i.e., pure and memoryless) strategies. This follows from the fact that two-player zero-sum games in which one player has a reachability objective and the other the complementary safety objective are positionally determined. Thus, in the non-zero-sum game, either one of the two players can guarantee a win (and relative to that player's positional strategy we let the other player play optimally), or both players can ensure that the opponent loses.
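The positional determinacy invoked here can be made algorithmic: in a deterministic turn-based game, the winning region of the reachability player is the attractor of the target set, computed by a standard fixed-point iteration. This is a textbook construction, not one taken from this paper, and the identifiers below are our own.

```python
def attractor(states, owner, succ, target):
    """Winning region of the reachability player (player 0) in a
    deterministic turn-based game.

    owner[s] is the player (0 or 1) who moves at state s, succ[s] the
    list of successors of s, and target the set of states the
    reachability player wants to visit.  A state joins the region if
    player 0 owns it and SOME successor is already winning, or player 1
    owns it and ALL successors are.  The complement of the result is the
    safety player's winning region; both players win positionally on
    their respective regions.
    """
    win = set(target)
    changed = True
    while changed:
        changed = False
        for s in states:
            if s in win:
                continue
            if owner[s] == 0 and any(t in win for t in succ[s]):
                win.add(s)
                changed = True
            elif owner[s] == 1 and succ[s] and all(t in win for t in succ[s]):
                win.add(s)
                changed = True
    return win
```

The iteration stabilizes after at most |states| rounds, since each round either adds a state or terminates the loop.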

2 The game

The game we consider is played by two players who take turns choosing whether to continue the game or to attempt to quit it, with Player 1 making the first choice. A chosen quit action is successful only with a certain probability (the exact probabilities are given in Figure 1); otherwise the game continues with the other player as before. If Player 1 quits successfully, both players win with a certain probability and both players lose with the remaining probability; likewise if Player 2 quits successfully, with its own probabilities. Finally, Player 2 is incentivized to choose quit by having the continue action of Player 2 lead to a loss for Player 2 with positive probability. Infinite play leads to Player 1 winning and Player 2 losing with probability 1. This creates a discontinuity in the payoff function of Player 1, which is crucial for our example.

The game is illustrated in Figure 1 and is modeled with a set of 5 states, with Player 1 controlling state 1 and Player 2 controlling state 2. One further state exists merely to enforce a loss on Player 2, whereas the two remaining states are the winning and the losing state of both players, respectively. The game is a stay-in-a-set game in which each player's safe set consists of all states other than those in which that player has lost. The diamond-shaped nodes in Figure 1 indicate the probabilistic transitions.

A stationary strategy profile in the game can be described by a pair of probabilities (p, q), where p is the probability that Player 1 chooses the quit action when in state 1, and q is the probability that Player 2 chooses the quit action when in state 2 (thus 1 - p and 1 - q are the probabilities of the respective continue actions).

Figure 1: The game.
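Since the concrete transition probabilities live in Figure 1 and are not reproduced in the text, the following Monte Carlo sketch only mirrors the shape of the game: every numeric constant is a placeholder of our own choosing, not a value from the paper, and the bookkeeping for the state in which Player 2 has already lost is simplified.

```python
import random

# Placeholder parameters -- NOT the values from Figure 1.
QUIT_SUCCESS = 0.5     # probability that a chosen quit action succeeds
WIN_IF_P1_QUITS = 0.5  # P(both win | Player 1 quits successfully)
WIN_IF_P2_QUITS = 0.5  # P(both win | Player 2 quits successfully)
P2_CONT_LOSS = 0.25    # P(Player 2 loses | Player 2 plays continue)

def simulate(p, q, rounds=10_000):
    """One play under the stationary profile (p, q) of quit probabilities.

    Returns (player1_wins, player2_wins).  Play truncated after `rounds`
    rounds stands in for infinite play: Player 1 wins, Player 2 loses.
    """
    p2_alive = True  # Player 2's safety objective not yet violated
    for _ in range(rounds):
        # Player 1's turn: attempt to quit with probability p.
        if random.random() < p and random.random() < QUIT_SUCCESS:
            win = random.random() < WIN_IF_P1_QUITS
            return (win, win and p2_alive)
        # Player 2's turn: attempt to quit with probability q.
        if random.random() < q and random.random() < QUIT_SUCCESS:
            win = random.random() < WIN_IF_P2_QUITS
            return (win, win and p2_alive)
        # Player 2's continue action risks violating its safety objective.
        if random.random() < P2_CONT_LOSS:
            p2_alive = False
    return (True, False)  # "infinite" play

def payoffs(p, q, n=20_000):
    """Monte Carlo estimate of both players' winning probabilities."""
    w1 = w2 = 0
    for _ in range(n):
        a, b = simulate(p, q)
        w1 += a
        w2 += b
    return w1 / n, w2 / n
```

Estimating payoffs(p, q) on a grid makes the discontinuity visible in the sketch: at (0, 0) every play is "infinite", while any positive quit probability makes termination almost sure.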

2.1 No stationary Nash equilibrium

We give here a simple analysis showing that no stationary Nash equilibrium exists in the game. All plays of the game can be placed in three groups: group 1 consists of the plays where Player 1 quits successfully, group 2 of the plays where Player 2 quits successfully, and group 3 of the plays that reach neither the winning nor the losing state.

Consider a stationary strategy profile given by the quit probabilities (p, q) of the two players. When p = q = 0 the play belongs to group 3, where Player 1 wins and Player 2 loses with probability 1. When p > 0 or q > 0 the play belongs to group 1 or group 2 with probability 1. Both players prefer a play from group 1, where Player 1 is the player to quit successfully.

Suppose that the profile (p, q) is a Nash equilibrium. If q = 0, then the only best reply of Player 1 is to have p = 0, since otherwise the losing state is reached with positive probability. But if also p = 0, Player 2 loses with probability 1, whereas deviating to q > 0 would lead to reaching the winning state with positive probability. This rules out having q = 0 in a Nash equilibrium.

Suppose now that Player 2's quit probability q is positive, which means that the play belongs to group 1 or group 2 with probability 1. The probability that the play belongs to group 1 strictly increases with p, and it follows that we must have p = 1. But this is also not a Nash equilibrium, as Player 2 would then be better off having q = 0. Indeed, let us consider a play from state 2 until the play either returns to state 2, reaches the winning state before returning to state 2, or reaches the losing state or the state in which Player 2 has lost before returning to state 2. We denote these events a return, a win, and a loss, respectively.

The quit action of Player 2 results in a loss, a win, or a return, each with a certain probability, and likewise for the continue action (the exact values follow from Figure 1). Since a return is better than a loss for Player 2, comparing the two actions rules out q > 0 in a Nash equilibrium as well.

2.2 Detailed payoff analysis

For i = 1, 2, let the payoff to Player i of the strategy profile (p, q) of quit probabilities, when play starts in a given state, be Player i's probability of winning. These payoffs satisfy a system of linear equations determined by the law of motion, from which further relations follow. When both p = 0 and q = 0 we have that Player 1's payoff is 1 and Player 2's payoff is 0. When at least one of p > 0 and q > 0 holds, we can solve the system for Player 1's payoff, and we can always solve it for Player 2's payoff, obtaining closed-form expressions. The function giving Player 2's payoff is continuous on the entire domain, whereas the function giving Player 1's payoff has a single discontinuity, at p = q = 0; the expressions also yield bounds on the two payoffs that are used in the best-reply analysis below.
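For a fixed stationary profile, such payoffs are the solution of a linear system over the transient states. Since the concrete equations depend on the transition probabilities of Figure 1, here is only a generic sketch that computes absorption probabilities by value iteration; the encoding is our own assumption (each transient state maps to its transition distribution among transient states, plus a one-step probability of immediate absorption in the winning state).

```python
def absorption_probs(trans, win_prob, iters=10_000):
    """Probability of absorption in the winning state from each transient
    state of a finite Markov chain.

    trans[s]    -- list of (successor, probability) pairs over transient states
    win_prob[s] -- one-step probability of moving from s to the winning
                   absorbing state (any remaining mass is absorbed and lost)

    Iterates v <- win_prob + Q v, which converges to the absorption
    probabilities whenever absorption is certain.
    """
    v = {s: 0.0 for s in trans}
    for _ in range(iters):
        v = {s: win_prob[s] + sum(pr * v[t] for t, pr in trans[s])
             for s in trans}
    return v
```

For a payoff that also rewards infinite play, as Player 1's does here, the probability of never being absorbed would have to be added on top of this quantity, which is the source of the discontinuity noted above.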

The best replies of the players are as follows. If Player 2 never quits (q = 0), the only best reply of Player 1 is to have p = 0; if q > 0, the only best reply of Player 1 is to have p = 1. For Player 2 there is a threshold value of Player 1's quit probability p: below the threshold the only best reply of Player 2 is to have q = 1, at the threshold Player 2 has no preferred action, and above the threshold the only best reply of Player 2 is to have q = 0.

2.3 Nash equilibria

We give here two examples of Nash equilibria in the game, following the general result of Secchi and Sudderth [16]. The idea is that once Player 2 has lost by entering the corresponding state, the incentive of Player 2 is removed and all of Player 2's strategies are equally good.

Suppose first that Player 2 commits to always playing the continue action after entering the state in which Player 2 has lost. The best reply of Player 1 is then to always play the continue action as well, ending up with payoff 1. We may thus consider the modified game that stops upon entering this state, at which point Player 1 receives payoff 1. This leads to a modified equation for Player 1's payoff, which can be solved explicitly; the solution verifies the equilibrium conditions.

A Nash equilibrium is thus for the players to play the quit action with the respective probabilities obtained above until the state in which Player 2 has lost is reached, after which both players play the quit action with probability 0. The equilibrium payoffs follow from the solution above.

Suppose next that Player 2 commits to always playing the quit action after entering the state in which Player 2 has lost. The best reply of Player 1 is then to always play the quit action as well, ending up with the corresponding payoff. The modified game now has a different equation for Player 1's payoff, which again can be solved explicitly. We find that the best reply of Player 1 is always to play the quit action, and in turn the best reply of Player 2 to that is to always play the continue action.

A Nash equilibrium is thus for the players to play the quit action with probabilities 1 and 0, respectively, until the state in which Player 2 has lost is reached, after which Player 2 changes to playing the quit action with probability 1 as well. The equilibrium payoffs here again follow from the solution above.

2.4 Stationary ε-Nash equilibrium

Whereas we have shown that the game has no stationary Nash equilibrium, it does have stationary ε-Nash equilibria for every ε > 0.

For sufficiently small ε, no ε-Nash equilibrium can have q = 0 (Player 2 never quitting). Indeed, the only ε-best reply of Player 1 to q = 0 would then be the actual best reply p = 0, and to that, any ε-best reply of Player 2 must have q > 0.

A few examples of ε-Nash equilibria can be given explicitly; we omit the simple task of verifying that they are indeed ε-Nash equilibria, and their payoffs follow from the expressions in Section 2.2. In the first example Player 1 plays the best reply, whereas in the second and third Player 1 is ε-far from the best reply; Player 2 plays ε-close to the best reply in the first and second examples, but ε-far from it in the third.

3 Conclusion and Further Problems

We have given a simple example of a two-player turn-based game with safety objectives for both players and without a stationary Nash equilibrium. A remaining open question is the existence of a stationary ε-Nash equilibrium when players have safety objectives, even in the case of two-player turn-based games.

Several related open questions concern games with reachability objectives or with combinations of reachability and safety objectives. We first consider the setting where all players have reachability objectives, also called reach-a-set games [5]. Flesch, Thuijsman and Vrieze [8] give an example of a three-player recursive game with non-negative payoffs and with no stationary ε-Nash equilibrium; the game is furthermore deterministic. Simon [18] gave an example of a two-player recursive game with non-negative payoffs and with no stationary ε-Nash equilibrium. Both yield examples of reach-a-set games without stationary ε-Nash equilibria, by the general method of simulating terminal payoffs with the probabilistic law of motion. The example of Flesch, Thuijsman and Vrieze is, however, such that in each terminal payoff vector either none or precisely two of the players receive a strictly positive payoff. The payoff vectors in which two players receive strictly positive payoff can (after scaling) be constructed as unique equilibrium payoffs of win-lose bimatrix games.¹ This then results in a three-player deterministic reach-a-set game with no stationary ε-Nash equilibrium.

¹It is easy to construct two bimatrix games with only payoffs 0 and 1 in which the unique equilibrium payoff vectors are the two desired payoff vectors, respectively; these may replace the corresponding terminal payoffs.

For two-player games, it was erroneously claimed (cf. [3]), first by Chatterjee et al. [5] and later again by Ummels and Wojtczak [20], that a simple adaptation of an example of a zero-sum game of Everett results in a deterministic reach-a-set game without a Nash equilibrium. Thus it remains an open question whether every deterministic two-player reach-a-set game has a Nash equilibrium. It is also an open problem whether every deterministic two-player reach-a-set game has a stationary ε-Nash equilibrium. Boros and Gurvich [1] and Kuipers et al. [10] give examples of three-player turn-based recursive games with non-negative payoffs that have no stationary Nash equilibrium. Does every two-player turn-based reach-a-set game have a stationary Nash equilibrium?

Little is known when some players have a reachability objective and other players a safety objective. In the two-player zero-sum case an example of Everett [6] shows that optimal strategies, and hence a Nash equilibrium, may fail to exist; on the other hand, ε-optimal stationary strategies always exist. Does every two-player game where one player has a reachability objective and the other player a safety objective have a stationary ε-Nash equilibrium? In the case of turn-based games, it is an open problem whether every three-player deterministic game has a stationary Nash equilibrium. An example given by Boros et al. [2] appears to be close to answering this question: Boros et al. construct a three-player deterministic recursive game without a stationary Nash equilibrium that may be realized with payoffs such that Player 2 has only non-negative terminal payoffs and Players 1 and 3 have only non-positive terminal payoffs.

References

  • [1] E. Boros and V. Gurvich. On Nash-solvability in pure stationary strategies of finite games with perfect information which may have cycles. Mathematical Social Sciences, 46(2):207–241, 2003.
  • [2] Endre Boros, Vladimir Gurvich, Martin Milanič, Vladimir Oudalov, and Jernej Vičič. A three-person deterministic graphical game without Nash equilibria. Discrete Applied Mathematics, 243:21–38, 2018.
  • [3] Patricia Bouyer, Nicolas Markey, and Daniel Stan. Mixed Nash Equilibria in Concurrent Terminal-Reward Games. In Venkatesh Raman and S. P. Suresh, editors, FSTTCS 2014, volume 29 of Leibniz International Proceedings in Informatics (LIPIcs), pages 351–363. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2014.
  • [4] Krishnendu Chatterjee and Thomas A. Henzinger. A survey of stochastic ω-regular games. J. Comput. Syst. Sci., 78(2):394–413, 2012.
  • [5] Krishnendu Chatterjee, Rupak Majumdar, and Marcin Jurdzinski. On Nash equilibria in stochastic games. In Jerzy Marcinkowski and Andrzej Tarlecki, editors, CSL 2004, volume 3210 of Lecture Notes in Computer Science, pages 26–40. Springer, 2004.
  • [6] H. Everett. Recursive games. In Contributions to the Theory of Games Vol. III, volume 39 of Ann. Math. Studies, pages 67–78. Princeton University Press, 1957.
  • [7] A. M. Fink. Equilibrium in a stochastic n-person game. J. Sci. Hiroshima Univ. Ser. A-I Math., 28(1):89–93, 1964.
  • [8] János Flesch, Frank Thuijsman, and O. J. Vrieze. Recursive repeated games with absorbing states. Math. Oper. Res, 21(4):1016–1022, 1996.
  • [9] D. Gillette. Stochastic games with zero stop probabilities. In Contributions to the Theory of Games III, volume 39 of Ann. Math. Studies, pages 179–187. Princeton University Press, 1957.
  • [10] Jeroen Kuipers, János Flesch, Gijs Schoenmakers, and Koos Vrieze. Pure subgame-perfect equilibria in free transition games. European Journal of Operational Research, 199(2):442–447, 2009.
  • [11] Ashok P. Maitra and William D. Sudderth. Finitely additive stochastic games with Borel measurable payoffs. Int. J. Game Theory, 27(2):257–267, 1998.
  • [12] Donald A. Martin. Borel determinacy. Annals of Mathematics, 102(2):363–371, 1975.
  • [13] Donald A. Martin. The determinacy of Blackwell games. J. Symb. Log., 63(4):1565–1581, 1998.
  • [14] J.F. Mertens. Repeated games. In Proceedings of the International Congress of Mathematicians, 1986, pages 1528–1577. American Mathematical Society, 1987.
  • [15] J.F. Mertens and A. Neyman. Stochastic games. Int. J. Game Theory, 10:53–66, 1981.
  • [16] Piercesare Secchi and William D. Sudderth. Stay-in-a-set games. Int. J. Game Theory, 30(4):479–490, 2002.
  • [17] L.S. Shapley. Stochastic games. Proc. Natl. Acad. Sci. U. S. A., 39:1095–1100, 1953.
  • [18] Robert Samuel Simon. Value and perfection in stochastic games. Israel Journal of Mathematics, 156(1):285–309, 2006.
  • [19] Masayuki Takahashi. Equilibrium points of stochastic non-cooperative n-person games. J. Sci. Hiroshima Univ. Ser. A-I Math., 28(1):95–99, 1964.
  • [20] Michael Ummels and Dominik Wojtczak. The complexity of nash equilibria in limit-average games. In Joost-Pieter Katoen and Barbara König, editors, CONCUR 2011, pages 482–496. Springer Berlin Heidelberg, 2011.
  • [21] Nicolas Vieille. Two-player stochastic games I: A reduction. Israel Journal of Mathematics, 119(1):55–91, 2000.
  • [22] Nicolas Vieille. Two-player stochastic games II: The case of recursive games. Israel Journal of Mathematics, 119(1):93–126, 2000.