 # Zero-determinant strategies in repeated incomplete-information games: Consistency of payoff relations

We derive zero-determinant strategies for general multi-player multi-action repeated incomplete-information games. By formulating zero-determinant strategies in terms of linear algebra, we prove that the linear payoff relations enforced by the players always have solutions. An example of a zero-determinant strategy in a repeated incomplete-information game is also provided.


## Setup

We consider an $N$-player multi-action repeated game, in which player $n \in \{1,\ldots,N\}$ has $M_n$ possible actions, where $M_n$ is a positive integer. Let $\sigma \equiv (\sigma_1,\ldots,\sigma_N)$, with $\sigma_n \in \{1,\ldots,M_n\}$, denote a state of the game, which is the combination of the actions taken by the players. Let $M \equiv \prod_{n=1}^N M_n$ be the size of the state space. We assume that player $n$ decides the next action $\sigma_n$ stochastically according to her own previous action $\sigma'_n$ and common information $\tau$

with the conditional probability

$\hat{T}_n(\sigma_n|\sigma'_n,\tau)$, where $\tau \in \Omega$ and $\Omega$ is some set. We also define the conditional probability that common information $\tau$ arises when the actions of the players in the preceding round are $\sigma'$ by

$W(\tau|\sigma')$. Then the sequence of states of the repeated game forms a Markov chain

$$P(\sigma,t+1) = \sum_{\sigma'} T(\sigma|\sigma') P(\sigma',t) \qquad (1)$$

with the transition probability

$$T(\sigma|\sigma') \equiv \sum_{\tau} W(\tau|\sigma') \prod_{n=1}^{N} \hat{T}_n(\sigma_n|\sigma'_n,\tau), \qquad (2)$$

where $P(\sigma,t)$ denotes the state distribution at time $t$. When $\Omega$ is the set of states and $W(\tau|\sigma') = \delta_{\tau,\sigma'}$, the above formulation reduces to that of complete-information games. Otherwise, it represents a certain type of incomplete-information game, in which players cannot directly observe the actions of the other players. The model treated here can therefore be regarded as an extension of repeated games with complete information to those with incomplete information, and the extension includes the former as a special case.
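As a sanity check on this formulation, the transition probability (2) can be assembled numerically. The following sketch (not from the paper; $W$ and $\hat{T}_n$ are randomly generated placeholders) builds $T$ for a hypothetical two-player, two-action game with binary common information and verifies that each column is a probability distribution over next states:

```python
import itertools
import numpy as np

# A minimal numerical sketch of Eq. (2) for a hypothetical 2-player,
# 2-action game with binary common information tau.  W and That are
# randomly generated placeholders, not taken from the paper.
rng = np.random.default_rng(0)
M = [2, 2]                                   # actions per player
states = list(itertools.product(range(M[0]), range(M[1])))
S = len(states)                              # S = M_1 * M_2 states

# W[tau, j]: probability that common information tau arises after state j
W = rng.random((2, S))
W /= W.sum(axis=0)

# That[n][a, ap, tau]: player n's probability of next action a, given her
# own previous action ap and common information tau
That = []
for n in range(2):
    t = rng.random((M[n], M[n], 2))
    t /= t.sum(axis=0)
    That.append(t)

# Assemble T(sigma | sigma') following Eq. (2)
T = np.zeros((S, S))
for i, s in enumerate(states):
    for j, sp in enumerate(states):
        T[i, j] = sum(W[tau, j] * That[0][s[0], sp[0], tau]
                      * That[1][s[1], sp[1], tau] for tau in range(2))

# Each column of T is a probability distribution over next states
assert np.allclose(T.sum(axis=0), 1.0)
```

Setting `W[tau, j] = 1` for a single `tau` per state would recover the complete-information special case described above.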

For each state $\sigma$, a payoff of player $n$ is defined as $s_n(\sigma)$. Let $s_n$ be the

$M$-dimensional vector, with $\sigma$ element $s_n(\sigma)$, representing the payoffs of player $n$

, which we call the payoff vector of player $n$. It should be noted that in the following analysis we do not assume the payoffs to be symmetric, unless otherwise stated.

We remark on discounting. In standard repeated games, discounting of future payoffs is considered by introducing a discounting factor $\delta$ (1). In the original work on ZD strategies by Press and Dyson, only the case without discounting was investigated (6). After their work, ZD strategies were extended to the discounted case (20, 15, 21). In this paper, we consider only the non-discounting case.

## Results

### Zero-determinant strategies

In what follows, we assume that the Markov chain defined via the transition probabilities (2) has a unique stationary distribution, and let $P^{(s)}$ denote the stationary distribution. It satisfies

$$P^{(s)}(\sigma) = \sum_{\sigma'} T(\sigma|\sigma') P^{(s)}(\sigma'). \qquad (3)$$

Taking the summation of both sides of (3) with respect to $\sigma_m$ for all $m \neq n$, with an arbitrary player $n$ (the factors with $m \neq n$ in the product in (2) then sum to one), we obtain

$$0 = \sum_{\sigma'} \left[ T_n(\sigma_n|\sigma') - \delta_{\sigma_n,\sigma'_n} \right] P^{(s)}(\sigma'), \qquad (4)$$

where we have defined

$$T_n(\sigma_n|\sigma') \equiv \sum_{\tau} W(\tau|\sigma') \hat{T}_n(\sigma_n|\sigma'_n,\tau). \qquad (5)$$

Regarding $\delta_{\sigma_n,\sigma'_n}$ as representing the strategy "Repeat", in which player $n$ repeats the previous action with probability one, one can readily see that (4) is an extension of Akin's lemma (22, 12, 23, 15), relating a player's strategy to the stationary distribution, to the multi-player multi-action incomplete-information case. Letting

$$\tilde{T}_n(\sigma_n|\sigma') \equiv T_n(\sigma_n|\sigma') - \delta_{\sigma_n,\sigma'_n}, \qquad (6)$$

(4) means that the average of $\tilde{T}_n(\sigma_n|\sigma')$ with respect to the stationary distribution is zero for any $n$ and any $\sigma_n$. We remark that $T_n(\sigma_n|\sigma')$, and thus $\tilde{T}_n(\sigma_n|\sigma')$ as well, are solely under the control of player $n$. Because of the normalization condition $\sum_{\sigma_n=1}^{M_n} T_n(\sigma_n|\sigma') = 1$, the relation

$$\sum_{\sigma_n=1}^{M_n} \tilde{T}_n(\sigma_n|\sigma') = 0 \qquad (7)$$

holds.

Let $\tilde{T}_n(\sigma_n)$ be the $M$-dimensional vector whose $\sigma'$ element is $\tilde{T}_n(\sigma_n|\sigma')$, which we call the strategy vector of player $n$ associated with action $\sigma_n$. (Another name for $\tilde{T}_n(\sigma_n)$ is the Press-Dyson vector (22).) A strategy of player $n$ is represented as an $M \times M_n$ matrix $T_n$ composed of the strategy vectors for her actions $\sigma_n \in \{1,\ldots,M_n\}$. For a matrix $A$, let $\mathcal{R}(A)$ be the subspace spanned by the column vectors of $A$. Let $\mathbf{0}_M$ and $\mathbf{1}_M$ denote the $M$-dimensional zero vector and the $M$-dimensional vector of all ones, respectively. From (7), one has

$$T_n \mathbf{1}_{M_n} = \sum_{\sigma_n=1}^{M_n} \tilde{T}_n(\sigma_n) = \mathbf{0}_M \qquad (8)$$

for any player $n$, implying that the dimension of $\mathcal{R}(T_n)$ is at most $M_n - 1$.

Let $P^{(s)}$ also denote the vector representation of the stationary distribution $P^{(s)}(\sigma)$. When player $n$ chooses a strategy $T_n$, for any vector $v \in \mathcal{R}(T_n)$, one has $P^{(s)T} v = 0$ due to (4). In other words, the expectation of $v$ with respect to the stationary distribution vanishes.
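This vanishing expectation can be checked numerically. The sketch below builds the same kind of hypothetical transition matrix as in Eq. (2) (with randomly generated placeholder distributions, not from the paper), computes the stationary distribution, and verifies that it is orthogonal to every Press-Dyson vector, i.e., the extended Akin's lemma (4):

```python
import itertools
import numpy as np

# Numerical check of the extended Akin's lemma, Eq. (4): the stationary
# distribution is orthogonal to every Press-Dyson vector.  W and That are
# hypothetical randomly generated distributions, not taken from the paper.
rng = np.random.default_rng(1)
M = [2, 2]
states = list(itertools.product(range(M[0]), range(M[1])))
S = len(states)

W = rng.random((2, S))
W /= W.sum(axis=0)                      # W[tau, j] sums to 1 over tau

That = []
for n in range(2):
    t = rng.random((M[n], M[n], 2))
    t /= t.sum(axis=0)                  # That[n][a, ap, tau] sums to 1 over a
    That.append(t)

# Full transition matrix, Eq. (2)
T = np.zeros((S, S))
for i, s in enumerate(states):
    for j, sp in enumerate(states):
        T[i, j] = sum(W[tau, j] * That[0][s[0], sp[0], tau]
                      * That[1][s[1], sp[1], tau] for tau in range(2))

# Stationary distribution: eigenvector of T for eigenvalue 1
w, v = np.linalg.eig(T)
p = np.real(v[:, np.argmin(np.abs(w - 1.0))])
p /= p.sum()

# Press-Dyson vectors, Eqs. (5)-(6), each orthogonal to p by Eq. (4)
for n in range(2):
    for a in range(M[n]):
        pd = np.array([sum(W[tau, j] * That[n][a, sp[n], tau]
                           for tau in range(2)) - (a == sp[n])
                       for j, sp in enumerate(states)])
        assert abs(pd @ p) < 1e-10
```

Since all transition probabilities here are strictly positive, the chain is irreducible and the stationary distribution is unique, matching the standing assumption of this section.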

Let $S \equiv (\mathbf{1}_M, s_1, \ldots, s_N)$ and $\mathcal{V}_n \equiv \mathcal{R}(T_n) \cap \mathcal{R}(S)$. The following definition is an extension of the notion of the ZD strategy (6, 22) to multi-player multi-action incomplete-information games.

###### Definition 1.

A zero-determinant (ZD) strategy of player $n$ is defined as a strategy for which $\mathcal{V}_n = \mathcal{R}(T_n) \cap \mathcal{R}(S) \neq \{\mathbf{0}_M\}$ holds.

To see that this is indeed an extended definition of the ZD strategy, note that any vector $u \in \mathcal{R}(S)$ is represented as $u = S\alpha$, where $\alpha = (\alpha_0, \alpha_1, \ldots, \alpha_N)^T$ is the coefficient vector. Let $\langle s \rangle$ be the vector with $n$th element equal to the expected payoff $\langle s_n \rangle$ of player $n$ in the steady state. When player $n$ employs a ZD strategy, it amounts to enforcing the linear relations $\alpha_0 + \sum_{m=1}^{N} \alpha_m \langle s_m \rangle = 0$ on $\langle s \rangle$, with $\alpha$ satisfying $S\alpha \in \mathcal{V}_n$.

### Consistency

A question naturally arises: when more than one player employs ZD strategies, are they consistent? Let $\mathcal{N}'$ be the set of players who employ ZD strategies. The set $\mathcal{U} \subseteq \mathbb{R}^N$ consists of all combinations of the expected payoffs that satisfy the linear relations enforced by the players in $\mathcal{N}'$. If $\mathcal{U}$ is empty, then the set of ZD strategies is inconsistent in the sense that there is no valid solution of the linear relations enforced by the players.

###### Definition 2.

ZD strategies are said to be consistent when the set $\mathcal{U}$ of expected payoffs satisfying all enforced linear relations is not empty.

In the multi-player setting, one may regard $\mathcal{N}'$ as a variant of a ZD strategy alliance (12), in which the players in $\mathcal{N}'$ agree to coordinate on the linear relations to be enforced on the expected payoffs. The above question then amounts to asking whether it is possible for a player to serve as a counteracting agent who participates in the ZD strategy alliance with a hidden intention to invalidate it by adopting a ZD strategy that is inconsistent with the others.

The following proposition is the first main result of this paper, whose proof is given in Appendices.

###### Proposition 1.

Any set of ZD strategies is consistent.

Proposition 1 states that it is impossible for any player to serve as a counteracting agent to invalidate ZD strategy alliances. This statement is quite general in that it applies to any instance of repeated games covered by our formulation.

In Ref. (16), it was shown that every player can have at most one master player, who can play an equalizer strategy on the given player (that is, control the expected payoff of the given player), in multi-player multi-action games. Indeed, our general result on the absence of inconsistent ZD strategies (Proposition 1) immediately implies that two or more ZD players cannot simultaneously fix the expected payoff of a player to different values. Therefore, our result generalizes their result on the equalizer strategy to arbitrary ZD strategies.

Since the dimension of $\mathcal{R}(T_n)$ is at most $M_n - 1$, depending on $M_n$ it should be possible for player $n$ with $M_n \geq 3$ to adopt a ZD strategy for which $\dim \mathcal{V}_n \geq 2$ holds. The dimension of $\mathcal{V}_n$ corresponds to the number of independent linear relations enforced on the expected payoffs of the players, so one player may be able to enforce multiple independent linear relations. On the other hand, our result on the absence of inconsistent ZD strategies implies that for any set $\mathcal{N}'$ of ZD players the dimension of $\sum_{n \in \mathcal{N}'} \mathcal{V}_n$ should be at most $N$, the number of players, since any set of ZD strategies should contain at most $N$ independent linear relations if it is consistent. This in turn implies that if the dimension of $\sum_{n \in \mathcal{N}'} \mathcal{V}_n$ is equal to $N$ for a subset $\mathcal{N}'$ of players, then players not in $\mathcal{N}'$ cannot employ independent ZD strategies any more.

### Independence

Another natural question concerns the independence of a set of ZD strategies, which we define as follows:

###### Definition 3.

A set $\{T_n\}_{n \in \mathcal{N}'}$ of ZD strategies is independent if every set of non-zero vectors $\{v_n\}_{n \in \mathcal{N}'}$ with $v_n \in \mathcal{V}_n$ is linearly independent. Otherwise, the set of ZD strategies is said to be dependent.

If a set of ZD strategies is dependent, then there exists a ZD player whose ZD strategy adds no linear constraints other than those already imposed by other ZD players. One of the simplest examples of a dependent set of ZD strategies is the case where two players enforce exactly the same linear relation on the expected payoffs. Our second main result shows that any set of ZD strategies is independent under a general condition.

###### Proposition 2.

Let $\mathcal{N}'$ be a subset of players. Assume that the strategy vector $\tilde{T}_n(\sigma_n)$ does not have zero elements for any $n \in \mathcal{N}'$ and any $\sigma_n$. Then, any set of ZD strategies of the players in $\mathcal{N}'$ is independent.

See Appendices for the proof.

It should be noted that when $\tilde{T}_n(\sigma_n)$ has zero elements, one might have dependent ZD strategies. A simple example can be found in a two-player two-action complete-information (iterated prisoner's dilemma) game: let the payoff vectors of players 1 and 2 be $s_1 = (R, S, T, P)^T$ and $s_2 = (R, T, S, P)^T$ with $T > R > P > S$, where the states are ordered as $(\sigma_1, \sigma_2) = (1,1), (1,2), (2,1), (2,2)$. If player 1 adopts the strategy

$$\tilde{T}_1(1) = \begin{pmatrix} 0 \\ -1 \\ 1 \\ 0 \end{pmatrix} = \frac{1}{T-S} s_1 - \frac{1}{T-S} s_2, \qquad (9)$$

then it enforces the linear payoff relation $\langle s_1 \rangle = \langle s_2 \rangle$. This strategy is the well-known tit-for-tat strategy (6). By symmetry, player 2 can also adopt the tit-for-tat strategy, which enforces the same linear relation, implying that these two ZD strategies are indeed dependent.
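The decomposition in Eq. (9) can be verified directly. The snippet below uses the standard illustrative prisoner's dilemma numbers (an assumption on our part; the text only requires $T > R > P > S$):

```python
import numpy as np

# Direct check of Eq. (9): the tit-for-tat Press-Dyson vector equals
# (s1 - s2)/(T - S).  The payoff values below are the standard illustrative
# prisoner's dilemma numbers (an assumption; the text only needs T>R>P>S).
R_, S_, T_, P_ = 3.0, 0.0, 5.0, 1.0
# states ordered (1,1), (1,2), (2,1), (2,2), i.e. (C,C), (C,D), (D,C), (D,D)
s1 = np.array([R_, S_, T_, P_])
s2 = np.array([R_, T_, S_, P_])

# Tit-for-tat: cooperate iff the opponent cooperated in the previous round.
# Press-Dyson vector = P(cooperate | previous state) - "Repeat" indicator.
tit_for_tat = np.array([1.0, 0.0, 1.0, 0.0])
repeat = np.array([1.0, 1.0, 0.0, 0.0])
pd_vector = tit_for_tat - repeat            # equals (0, -1, 1, 0)

assert np.allclose(pd_vector, s1 / (T_ - S_) - s2 / (T_ - S_))
```

Note that the vector $(0, -1, 1, 0)^T$ has two zero elements, which is exactly why the no-zero-element condition of Proposition 2 fails here.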

### Simultaneous multiple linear relations by one player

As mentioned above, when the number of possible actions $M_n$ of player $n$ is more than two, player $n$ may be able to employ a ZD strategy with $\dim \mathcal{V}_n \geq 2$ to simultaneously enforce more than one linear relation. Such a possibility has not previously been reported in the context of ZD strategies. Here, we provide a simple example of such a situation in a two-player three-action symmetric game.

We consider the symmetric game

$$s_1 = (0, r_1, 0, r_2, 0, 0, 0, 0, 0)^T, \quad s_2 = (0, r_2, 0, r_1, 0, 0, 0, 0, 0)^T, \qquad (10)$$

where the states are ordered as $(\sigma_1, \sigma_2) = (1,1), (1,2), (1,3), (2,1), \ldots, (3,3)$. We remark that $\mathbf{1}_9$, $s_1$, and $s_2$ are linearly independent when $r_1 \neq \pm r_2$. We choose the strategy of player 1 as

$$T_1(1) = (1, 1-p, 1, p', 0, 0, 0, 0, 0)^T,$$
$$T_1(2) = (0, q, 0, 1-q', 1, 1, 0, 0, 0)^T,$$
$$T_1(3) = (0, p-q, 0, q'-p', 0, 0, 1, 1, 1)^T \qquad (11)$$

with $0 \leq q \leq p \leq 1$, $0 \leq p' \leq q' \leq 1$, and $p'q \neq pq'$, so that each $T_1(\sigma_1)$ is a valid conditional probability and the coefficients below are well defined. Then we obtain

$$\frac{q'r_1 + qr_2}{p'q - pq'}\tilde{T}_1(1) + \frac{p'r_1 + pr_2}{p'q - pq'}\tilde{T}_1(2) = s_1, \qquad (12)$$
$$\frac{q'r_2 + qr_1}{p'q - pq'}\tilde{T}_1(1) + \frac{p'r_2 + pr_1}{p'q - pq'}\tilde{T}_1(2) = s_2. \qquad (13)$$

Therefore, player 1 can simultaneously control the average payoffs of both players as $\langle s_1 \rangle = \langle s_2 \rangle = 0$. Note that the set of states with $\sigma_1 = 3$, on which $s_1 = s_2 = 0$, is absorbing regardless of the strategy of player 2 in this case.
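Equations (12) and (13) can be confirmed numerically. The parameter values below are illustrative choices of ours satisfying the stated constraints, not values from the paper:

```python
import numpy as np

# Numerical check of Eqs. (12)-(13): both payoff vectors of the game (10)
# lie in the span of the Press-Dyson vectors of the strategy (11), so
# player 1 enforces <s1> = <s2> = 0.  Parameter values are illustrative.
r1, r2 = 2.0, 1.0
p, q = 0.8, 0.3            # 0 <= q <= p <= 1
pp, qp = 0.2, 0.6          # pp = p', qp = q', with 0 <= p' <= q' <= 1

# States ordered (sigma1, sigma2) = (1,1), (1,2), (1,3), (2,1), ..., (3,3)
s1 = np.array([0, r1, 0, r2, 0, 0, 0, 0, 0], dtype=float)
s2 = np.array([0, r2, 0, r1, 0, 0, 0, 0, 0], dtype=float)

# Press-Dyson vectors of (11): conditional probabilities of actions 1 and 2
# minus the corresponding "Repeat" indicators
Tt1 = np.array([0, -p, 0, pp, 0, 0, 0, 0, 0], dtype=float)
Tt2 = np.array([0, q, 0, -qp, 0, 0, 0, 0, 0], dtype=float)

D = pp * q - p * qp        # p'q - pq', nonzero for these parameters
assert np.allclose((qp*r1 + q*r2)/D * Tt1 + (pp*r1 + p*r2)/D * Tt2, s1)
assert np.allclose((qp*r2 + q*r1)/D * Tt1 + (pp*r2 + p*r1)/D * Tt2, s2)
```

Since both $s_1$ and $s_2$ lie in $\mathcal{R}(T_1)$, the coefficient vectors $(0, 1, 0)^T$ and $(0, 0, 1)^T$ both belong to the enforced set, which is exactly the two-dimensional ZD strategy claimed in the text.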

In general, when one player simultaneously enforces two linear relations in a two-player multi-action symmetric game, only $\langle s_1 \rangle = \langle s_2 \rangle = C$ with some constant $C$ is allowed. This is explained as follows: assume that player 1 can simultaneously enforce $\langle s_1 \rangle = C_1$ and $\langle s_2 \rangle = C_2$ with $C_1 \neq C_2$ by one ZD strategy. Because the game is symmetric, player 2 can also simultaneously enforce $\langle s_1 \rangle = C_2$ and $\langle s_2 \rangle = C_1$ independently by one ZD strategy. This contradicts the consistency of ZD strategies (Proposition 1). Therefore, the only possibility is $C_1 = C_2$.

The above argument can be extended straightforwardly to the multi-player case. For that purpose, we introduce some notions of symmetric multi-player games. The following definition of a symmetric multi-player game is due to von Neumann and Morgenstern (24, Section 28).

###### Definition 4.

A game is symmetric with respect to a permutation $\pi$ on $\{1,\ldots,N\}$ if $M_{\pi(n)} = M_n$ holds for any $n$ and if $\pi$ preserves the payoff structure of the game, that is,

$$s_{\pi(n)}(\sigma) = s_n(\sigma_\pi) \qquad (14)$$

holds for any $n$ and for any $\sigma$, where $\sigma_\pi \equiv (\sigma_{\pi(1)}, \ldots, \sigma_{\pi(N)})$.

The following definition is due to (25).

###### Definition 5.

A game is weakly symmetric if for any pair of players $m$ and $n$ there exists some permutation $\pi$ on $\{1,\ldots,N\}$ satisfying $\pi(m) = n$ such that the game is symmetric with respect to $\pi$.

Consider an $N$-player weakly symmetric game. Assume that one player simultaneously enforces $N$ independent linear relations on the average payoffs of the $N$ players by adopting an $N$-dimensional ZD strategy (note that for this to be possible the number of actions of the player should satisfy $M_n \geq N + 1$). Then, all the average payoffs would be simultaneously controlled, but they should satisfy $\langle s_1 \rangle = \cdots = \langle s_N \rangle$ due to the consistency of ZD strategies.

The difficulty of constructing a ZD strategy of dimension $N$ for one player in weakly symmetric $N$-player games can be seen in the following two propositions, whose proofs are given in Appendices.

###### Proposition 3.

In a weakly symmetric $N$-player game, if the strategy vectors of one player contain no zero element, then a ZD strategy of that player with dimension $N$ is impossible.

###### Proposition 4.

In a weakly symmetric $N$-player game, if the payoffs $s_n(\sigma)$ of player $n$ are different from each other for all $n$, then a ZD strategy with dimension $N$ is impossible.

## Discussion

In this paper, we have derived ZD strategies for general multi-player multi-action incomplete-information games, in which players cannot observe the actions of other players. By formulating ZD strategies in terms of linear algebra, we have proved that the linear payoff relations enforced by ZD players are always consistent. Furthermore, we have proved that the linear payoff relations enforced by players with ZD strategies are independent under a general condition. We emphasize that these results hold not only for incomplete-information games but also for complete-information games. We have also provided a simple example in which one player can simultaneously enforce more than one linear constraint on the expected payoffs. These results elucidate constraints on ZD strategies in terms of linear algebra.

Although we have discussed mathematical properties of ZD strategies when they exist, we do not know a criterion for deciding whether ZD strategies exist in a given game. For example, one can easily show that no ZD strategy exists for the rock-paper-scissors game, which is the simplest two-player three-action symmetric zero-sum game. Specifying a general criterion for the existence of ZD strategies is an important future problem.

Another remark is related to the memory of strategies. In this work, we considered only memory-one strategies. In Ref. (6), it was proved that a player with longer memory has no advantage over a player with shorter memory in terms of average payoff in two-player games. In Refs. (13, 16), it was shown that this statement also holds for multi-player games. Therefore, considering only memory-one strategies should be sufficient even in our incomplete-information setting.

We remark on the effect of incomplete information. In the complete-information case, the strategy vectors are arbitrary as long as they satisfy the conditions for probability distributions. In contrast, in the incomplete-information case, the forms of the strategy vectors are constrained by (5). Therefore, the space of ZD strategies for incomplete-information games is generally smaller than that for complete-information games. In the SI, we provide an example of an equalizer strategy in a simple incomplete-information game.

## Appendices

### Proof of Proposition 1

We first prove a lemma.

###### Lemma 1.

Let $v \in \sum_{n=1}^{N} \mathcal{R}(T_n)$, that is, let $v$ be a linear combination of strategy vectors of the players. If $v$ is proportional to $\mathbf{1}_M$, then $v = \mathbf{0}_M$.

###### Proof.

Assume to the contrary that $v = c\mathbf{1}_M$ with $c \neq 0$. Taking the inner product of $v$ with the stationary distribution $P^{(s)}$, one has $P^{(s)T} v = 0$, since $v$ is represented as a linear combination of the strategy vectors and since the inner product of a strategy vector and the stationary distribution is zero. On the other hand, $P^{(s)T} v = c$ holds because of the normalization of the stationary distribution. Therefore we obtain $c = 0$, leading to contradiction. ∎

We return to the proof of Proposition 1. For any set $\{T_n\}_{n \in \mathcal{N}'}$ of ZD strategies, let $K$ be the dimension of $\sum_{n \in \mathcal{N}'} \mathcal{V}_n$, and let $\{u_1, \ldots, u_K\}$ be a basis of $\sum_{n \in \mathcal{N}'} \mathcal{V}_n$. The expected payoff vector $\langle s \rangle$ should be given by a solution of the linear equation $\bar{A}^T \langle s \rangle = -b$ in $\langle s \rangle$, where we define $A$, $b$, and $\bar{A}$ as

$$A = \begin{pmatrix} b^T \\ \bar{A} \end{pmatrix} \equiv (\alpha_1, \alpha_2, \cdots, \alpha_K). \qquad (15)$$

One has

$$SA = (u_1, u_2, \cdots, u_K) = \mathbf{1}_M b^T + \bar{S}\bar{A}, \qquad (16)$$

where $\bar{S} \equiv (s_1, \ldots, s_N)$.

The Rouché-Capelli theorem (26) tells us that $\operatorname{rank} \bar{A}^T = \operatorname{rank}(\bar{A}^T, -b)$ is a necessary and sufficient condition for the linear equation $\bar{A}^T \langle s \rangle = -b$ in $\langle s \rangle$ to have a solution, that is, for the set of ZD strategies to be consistent. An equivalent expression of this condition is that there is no vector $c$ such that $\bar{A}c = \mathbf{0}_N$ and $b^T c \neq 0$ hold. Assume to the contrary that there exists $c$ such that $\bar{A}c = \mathbf{0}_N$ and $b^T c \neq 0$ hold. One would then have

$$SAc = \mathbf{1}_M b^T c + \bar{S}\bar{A}c = (b^T c)\mathbf{1}_M. \qquad (17)$$

On the other hand, $SAc$ is a linear combination of $u_1, \ldots, u_K$, and hence of strategy vectors, so that Lemma 1 states that it should be zero since it is proportional to $\mathbf{1}_M$. This gives $b^T c = 0$, leading to contradiction. ∎

### Proof of Proposition 2

We first show the following lemma.

###### Lemma 2.

Let $\mathcal{N}'$ be a subset of players. Assume that the strategy vector $\tilde{T}_n(\sigma_n)$ does not have zero elements for any $n \in \mathcal{N}'$ and any $\sigma_n$. For $n \in \mathcal{N}'$, let $v_n$ be an arbitrary non-zero vector in $\mathcal{R}(T_n)$. Then $\{v_n\}_{n \in \mathcal{N}'}$ are linearly independent.

###### Proof.

We assume to the contrary that $\{v_n\}_{n \in \mathcal{N}'}$ are linearly dependent. Then there is a set of coefficients $\{a_n\}$, not all zero, with which $\sum_{n \in \mathcal{N}'} a_n v_n = \mathbf{0}_M$ holds. Without loss of generality we assume $a_n \neq 0$ for all $n \in \mathcal{N}'$ (players with $a_n = 0$ may simply be removed from $\mathcal{N}'$).

Since $v_n \in \mathcal{R}(T_n)$, it is expressed as $v_n = T_n c_n$ with a non-zero vector $c_n$. Let $\tilde{c}_n \equiv c_{n,\sigma_n^*}$, where $\sigma_n^* \equiv \operatorname{argmin}_{\sigma_n} a_n c_{n,\sigma_n}$ and ties may be broken arbitrarily. With (8), one obtains

$$v_n = T_n(c_n - \tilde{c}_n \mathbf{1}_{M_n}), \qquad (18)$$

and thus

$$a_n v_n(\sigma') = \sum_{\sigma_n=1}^{M_n} a_n(c_{n,\sigma_n} - \tilde{c}_n)\tilde{T}_n(\sigma_n|\sigma'). \qquad (19)$$

We show that the inequality

$$a_n(c_{n,\sigma_n} - \tilde{c}_n)\tilde{T}_n(\sigma_n|\sigma') \geq 0 \qquad (20)$$

holds for any $n \in \mathcal{N}'$, any $\sigma_n$, and any $\sigma'$ satisfying $\sigma'_n = \sigma_n^*$. We first note that for any strategy vector $\tilde{T}_n(\sigma_n)$ with action $\sigma_n$, one has, from (6),

$$\tilde{T}_n(\sigma_n|\sigma') \begin{cases} \leq 0, & \sigma'_n = \sigma_n, \\ \geq 0, & \sigma'_n \neq \sigma_n. \end{cases} \qquad (21)$$

Fix any $\sigma'$ satisfying $\sigma'_n = \sigma_n^*$

for a moment. Then, for $\sigma_n = \sigma_n^*$

one has $c_{n,\sigma_n} - \tilde{c}_n = 0$ by definition, making the left-hand side of (20) equal to zero. For $\sigma_n \neq \sigma_n^*$, on the other hand, one has $a_n(c_{n,\sigma_n} - \tilde{c}_n) \geq 0$ by the definition of $\sigma_n^*$. Also, since $\sigma'_n = \sigma_n^* \neq \sigma_n$, from (21) one has $\tilde{T}_n(\sigma_n|\sigma') \geq 0$. These imply that the inequality (20) holds for $\sigma_n \neq \sigma_n^*$. Putting the above arguments together, we have shown that the inequality (20) holds for any $n \in \mathcal{N}'$, any $\sigma_n$, and any $\sigma'$ satisfying $\sigma'_n = \sigma_n^*$.

Fix any $\sigma'$ satisfying $\sigma'_n = \sigma_n^*$ for all $n \in \mathcal{N}'$. The above argument has shown that the inequality (20) holds for any $n \in \mathcal{N}'$ and any $\sigma_n$. On the other hand, at the beginning of the proof we have assumed that

$$\sum_{n \in \mathcal{N}'} a_n v_n(\sigma') = \sum_{n \in \mathcal{N}'} \sum_{\sigma_n=1}^{M_n} a_n(c_{n,\sigma_n} - \tilde{c}_n)\tilde{T}_n(\sigma_n|\sigma') = 0 \qquad (22)$$

holds, implying that the summand is equal to zero for any $n \in \mathcal{N}'$ and any $\sigma_n$. By assumption, $a_n \neq 0$ and $\tilde{T}_n(\sigma_n|\sigma') \neq 0$, so that one has $c_{n,\sigma_n} = \tilde{c}_n$ for all $\sigma_n$, and consequently, $v_n = \mathbf{0}_M$, leading to contradiction. ∎

The proof of Proposition 2 is immediate by taking $v_n$ in Lemma 2 as belonging to $\mathcal{V}_n$.

### Proof of Proposition 3

We first show the following lemma.

###### Lemma 3.

Consider an $N$-player game which is symmetric with respect to a permutation $\pi$ on $\{1,\ldots,N\}$. Assume that the column vectors of $S$ are linearly independent. For any pair of players $n$ and $\bar{n}$ satisfying $\pi(\bar{n}) \neq n$, if the strategy vectors of these players contain no zero element, then it is impossible for these players to adopt ZD strategies with which player $n$ enforces the linear relation $S\alpha \in \mathcal{R}(T_n)$ with $\alpha \neq \mathbf{0}_{N+1}$, and where player $\bar{n}$ enforces $S\alpha_\pi \in \mathcal{R}(T_{\bar{n}})$, where $\alpha_\pi \equiv (\alpha_0, \alpha_{\pi(1)}, \ldots, \alpha_{\pi(N)})^T$.

###### Proof.

We assume to the contrary that there exists $\alpha \neq \mathbf{0}_{N+1}$ satisfying the properties stated in Lemma 3. By assumption, $S\alpha \in \mathcal{R}(T_n)$ and $S\alpha_\pi \in \mathcal{R}(T_{\bar{n}})$. There then exist $c_n$ and $\bar{c}_{\bar{n}}$ satisfying $T_n c_n = S\alpha$ and $T_{\bar{n}} \bar{c}_{\bar{n}} = S\alpha_\pi$. One has

$$(S\alpha_\pi)(\sigma'_\pi) = \alpha_0 + \sum_{m=1}^{N} \alpha_{\pi(m)} s_m(\sigma'_\pi) = \alpha_0 + \sum_{m=1}^{N} \alpha_{\pi(m)} s_{\pi(m)}(\sigma') = (S\alpha)(\sigma'), \qquad (23)$$

where the second equality is due to the assumed symmetry (14) of the game with respect to $\pi$. Letting $T_{\bar{n},\pi}$ be the matrix whose $(\sigma', \sigma_{\bar{n}})$ element is $\tilde{T}_{\bar{n}}(\sigma_{\bar{n}}|\sigma'_\pi)$, one has

$$(T_{\bar{n},\pi}\bar{c}_{\bar{n}})(\sigma') = (T_{\bar{n}}\bar{c}_{\bar{n}})(\sigma'_\pi) = (S\alpha_\pi)(\sigma'_\pi) = (S\alpha)(\sigma') = (T_n c_n)(\sigma'), \qquad (24)$$

implying that $T_{\bar{n},\pi}\bar{c}_{\bar{n}} = T_n c_n$ holds. Let $v \equiv T_n c_n = T_{\bar{n},\pi}\bar{c}_{\bar{n}}$.

Let $\sigma_n^{\max} \equiv \operatorname{argmax}_{\sigma_n} c_{n,\sigma_n}$ and $\sigma_{\bar{n}}^{\min} \equiv \operatorname{argmin}_{\sigma_{\bar{n}}} \bar{c}_{\bar{n},\sigma_{\bar{n}}}$, where ties may be broken arbitrarily, and $c_{n,\max} \equiv c_{n,\sigma_n^{\max}}$ and $\bar{c}_{\bar{n},\min} \equiv \bar{c}_{\bar{n},\sigma_{\bar{n}}^{\min}}$. One then has

$$v = T_n(c_n - c_{n,\max}\mathbf{1}_{M_n}) = T_{\bar{n},\pi}(\bar{c}_{\bar{n}} - \bar{c}_{\bar{n},\min}\mathbf{1}_{M_{\bar{n}}}). \qquad (25)$$

Recalling that we have assumed $\pi(\bar{n}) \neq n$, let $\sigma'$ be an arbitrary state satisfying $\sigma'_n = \sigma_n^{\max}$ and $\sigma'_{\pi(\bar{n})} = \sigma_{\bar{n}}^{\min}$, so that $(\sigma'_\pi)_{\bar{n}} = \sigma_{\bar{n}}^{\min}$. Then, in view of (21), one has

$$v(\sigma') = \sum_{\sigma_n=1}^{M_n}(c_{n,\sigma_n} - c_{n,\max})\tilde{T}_n(\sigma_n|\sigma') \leq 0,$$
$$v(\sigma') = \sum_{\sigma_{\bar{n}}=1}^{M_{\bar{n}}}(\bar{c}_{\bar{n},\sigma_{\bar{n}}} - \bar{c}_{\bar{n},\min})\tilde{T}_{\bar{n}}(\sigma_{\bar{n}}|\sigma'_\pi) \geq 0, \qquad (26)$$

implying that $v(\sigma') = 0$ holds. Since the terms of the first sum are all non-positive for such $\sigma'$, they are all equal to zero. Since $\tilde{T}_n(\sigma_n|\sigma')$ is assumed non-zero, one has $c_{n,\sigma_n} = c_{n,\max}$ for all $\sigma_n$ and consequently $v = \mathbf{0}_M$. One similarly has $\bar{c}_{\bar{n},\sigma_{\bar{n}}} = \bar{c}_{\bar{n},\min}$ for all $\sigma_{\bar{n}}$. Therefore, from (8) one has $S\alpha = T_n c_n = \mathbf{0}_M$. Due to the assumption of linear independence of the columns of $S$, it in turn implies that $\alpha = \mathbf{0}_{N+1}$ holds, leading to contradiction. ∎

It should be noted that Lemma 3 holds even if one takes $\bar{n} = n$, in which case the Lemma implies that, if the game is symmetric with respect to $\pi$ with $\pi(n) \neq n$, player $n$ cannot enforce the linear relations $S\alpha$ and $S\alpha_\pi$ simultaneously. It should also be noted that Lemma 3 furthermore implies that it is impossible for that player to enforce a single linear relation satisfying $\alpha = \alpha_\pi$. In other words, in a symmetric game no player to whom the game is symmetric can enforce a linear relation with the same symmetry as the game itself.

Proposition 3 is a direct consequence of Lemma 3 in weakly symmetric multi-player games.

### Proof of Proposition 4

Without loss of generality, we assume that player $k$ takes an $N$-dimensional ZD strategy determining the average payoffs $\langle s_n \rangle$ of all players $n \in \{1,\ldots,N\}$. Due to the above discussion, only $\langle s_1 \rangle = \cdots = \langle s_N \rangle = C$ with some constant $C$ is allowed. Letting $\alpha^{(n)}$ be the coefficient vector of the relation $\langle s_n \rangle = C$ for $n \in \{1,\ldots,N\}$, one can take $\{S\alpha^{(n)}\}_{n=1}^{N}$ as a basis of the $N$-dimensional ZD strategy. Let $c^{(n)}$ be defined as

$$T_k c^{(n)} = S\alpha^{(n)} = s_n - C\mathbf{1}_M, \quad n \in \{1,\ldots,N\}. \qquad (27)$$

By the assumption of weak symmetry, for any player $n \neq k$, there exists a permutation $\pi$ satisfying $\pi(k) = n$ such that the game is symmetric with respect to $\pi$. Noting that $s_k(\sigma'_\pi) = s_{\pi(k)}(\sigma') = s_n(\sigma')$ by (14), one has

$$(T_k c^{(k)})(\sigma'_\pi) = (T_k c^{(n)})(\sigma'). \qquad (28)$$

For $n \in \{1,\ldots,N\}$, define $\sigma_k^{(n),\max} \equiv \operatorname{argmax}_{\sigma_k} c^{(n)}_{\sigma_k}$ and $\sigma_k^{(n),\min} \equiv \operatorname{argmin}_{\sigma_k} c^{(n)}_{\sigma_k}$, where ties may be broken arbitrarily provided that $\sigma_k^{(n),\max} \neq \sigma_k^{(n),\min}$ holds, and $c^{(n)}_{\max} \equiv c^{(n)}_{\sigma_k^{(n),\max}}$ and $c^{(n)}_{\min} \equiv c^{(n)}_{\sigma_k^{(n),\min}}$. (Such a choice is possible because $c^{(n)}$ is not proportional to $\mathbf{1}_{M_k}$; otherwise (8) and (27) would give $s_n = C\mathbf{1}_M$, contradicting the assumption of the proposition.) From (7), one has

$$T_k c^{(n)} = T_k(c^{(n)} - c^{(n)}_{\max}\mathbf{1}_{M_k}) = T_k(c^{(n)} - c^{(n)}_{\min}\mathbf{1}_{M_k}). \qquad (29)$$

Then, from (28) and (21), we obtain, for an arbitrary $\sigma^*$ satisfying $\sigma^*_k = \sigma_k^{(n),\max}$ and $\sigma^*_n = \sigma_k^{(k),\min}$,

$$s_n(\sigma^*) - C = \left(T_k(c^{(n)} - c^{(n)}_{\max}\mathbf{1}_{M_k})\right)(\sigma^*) \leq 0 \qquad (30)$$
$$\phantom{s_n(\sigma^*) - C} = \left(T_k(c^{(k)} - c^{(k)}_{\min}\mathbf{1}_{M_k})\right)(\sigma^*_\pi) \geq 0, \qquad (31)$$

implying $s_n(\sigma^*) = C$. On the other hand, we also obtain, for an arbitrary $\sigma^{**}$ satisfying $\sigma^{**}_k = \sigma_k^{(n),\min}$ and $\sigma^{**}_n = \sigma_k^{(k),\max}$,

$$s_n(\sigma^{**}) - C = \left(T_k(c^{(n)} - c^{(n)}_{\min}\mathbf{1}_{M_k})\right)(\sigma^{**}) \geq 0 \qquad (32)$$
$$\phantom{s_n(\sigma^{**}) - C} = \left(T_k(c^{(k)} - c^{(k)}_{\max}\mathbf{1}_{M_k})\right)(\sigma^{**}_\pi) \leq 0, \qquad (33)$$

implying $s_n(\sigma^{**}) = C$. Since $\sigma^*$ and $\sigma^{**}$ can be taken so that $\sigma^*_k \neq \sigma^{**}_k$, and because we have assumed that all elements of the payoff vector $s_n$ are different from each other, we have arrived at a contradiction. ∎

We thank Ryosuke Kobayashi for valuable discussions. This study was supported by JSPS KAKENHI Grant Number JP18H06476.

## References

• (1) Fudenberg D, Tirole J (1991) Game Theory. (MIT Press, Massachusetts).
• (2) Smith JM, Price GR (1973) The logic of animal conflict. Nature 246(5427):15.
• (3) Nowak MA (2006) Five rules for the evolution of cooperation. Science 314(5805):1560–1563.
• (4) Axelrod R, Hamilton WD (1981) The evolution of cooperation. Science 211(4489):1390–1396.
• (5) Axelrod R (1984) The Evolution of Cooperation. (Basic Books, New York).
• (6) Press WH, Dyson FJ (2012) Iterated prisoner’s dilemma contains strategies that dominate any evolutionary opponent. Proceedings of the National Academy of Sciences 109(26):10409–10413.
• (7) Hilbe C, Nowak MA, Sigmund K (2013) Evolution of extortion in iterated prisoner’s dilemma games. Proceedings of the National Academy of Sciences 110(17):6913–6918.
• (8) Adami C, Hintze A (2013) Evolutionary instability of zero-determinant strategies demonstrates that winning is not everything. Nature Communications 4:2193.
• (9) Stewart AJ, Plotkin JB (2013) From extortion to generosity, evolution in the iterated prisoner’s dilemma. Proceedings of the National Academy of Sciences 110(38):15348–15353.
• (10) Hilbe C, Nowak MA, Traulsen A (2013) Adaptive dynamics of extortion and compliance. PLOS ONE 8(11):1–9.
• (11) Stewart AJ, Plotkin JB (2012) Extortion and cooperation in the prisoner’s dilemma. Proceedings of the National Academy of Sciences 109(26):10134–10135.
• (12) Hilbe C, Wu B, Traulsen A, Nowak MA (2014) Cooperation and control in multiplayer social dilemmas. Proceedings of the National Academy of Sciences 111(46):16425–16430.
• (13) Pan L, Hao D, Rong Z, Zhou T (2015) Zero-determinant strategies in iterated public goods game. Scientific Reports 5:13096.
• (14) Guo JL (2014) Zero-determinant strategies in iterated multi-strategy games. ArXiv e-prints.
• (15) McAvoy A, Hauert C (2016) Autocratic strategies for iterated games with arbitrary action spaces. Proceedings of the National Academy of Sciences 113(13):3573–3578.
• (16) He X, Dai H, Ning P, Dutta R (2016) Zero-determinant strategies for multi-player multi-action iterated games. IEEE Signal Processing Letters 23(3):311–315.
• (17) Hao D, Rong Z, Zhou T (2015) Extortion under uncertainty: Zero-determinant strategies in noisy games. Phys. Rev. E 91(5):052803.
• (18) Daoud AA, Kesidis G, Liebeherr J (2014) Zero-determinant strategies: A game-theoretic approach for sharing licensed spectrum bands. IEEE Journal on Selected Areas in Communications 32(11):2297–2308.
• (19) Zhang H, Niyato D, Song L, Jiang T, Han Z (2016) Zero-determinant strategy for resource sharing in wireless cooperations. IEEE Transactions on Wireless Communications 15(3):2179–2192.
• (20) Hilbe C, Traulsen A, Sigmund K (2015) Partners or rivals? strategies for the iterated prisoner’s dilemma. Games and Economic Behavior 92:41–52.
• (21) Ichinose G, Masuda N (2018) Zero-determinant strategies in finitely repeated games. Journal of Theoretical Biology 438:61–77.
• (22) Akin E (2012) The iterated prisoner’s dilemma: Good strategies and their dynamics. ArXiv e-prints.
• (23) Akin E (2015) What you gotta know to play good in the iterated prisoner’s dilemma. Games 6(3):175–190.
• (24) von Neumann J, Morgenstern O (1953) Theory of Games and Economic Behavior. (Princeton University Press), 3rd edition.
• (25) Plan A (2017) Symmetric n-player games. Preprint.
• (26) Shafarevich IR, Remizov AO (2012) Linear Algebra and Geometry. (Springer, New York).
• (27) Kobayashi R (2018) Master’s thesis (Kyoto University).

## Supplemental Information

### ZD strategy in incomplete information game

As an example of a ZD strategy for a repeated incomplete-information game, we consider a two-player two-action symmetric game (27). We assume that the common information $\tau$ is binary and that the probability $W$ is given by

 W(1|1,1)