Zero-Determinant strategies in finitely repeated n-player games

10/17/2019 · Alain Govaert, et al.

In two-player repeated games, Zero-Determinant (ZD) strategies are a class of strategies that enable a player to unilaterally enforce a linear relation between her own payoff and her opponent's payoff, irrespective of the opponent's strategy. The manipulative nature of ZD strategies has attracted significant attention from researchers due to its close connection to the distributed control of the outcomes of evolutionary games in large populations. In this paper, we study the existence of ZD strategies in repeated n-player games with a finite but undetermined time horizon. Necessary and sufficient conditions are derived for a linear payoff relation to be enforceable by a ZD strategist in n-player social dilemmas in which the expected number of rounds is captured by a fixed and common discount factor (0<δ<1). Thresholds exist for such a discount factor above which generous, extortionate and equalizer payoff relations can be enforced. For the first time in the study of repeated games, ZD strategies are examined in the setting of finitely repeated n-player, two-action games. Our results show that, depending on the group size and the ZD strategist's initial probability to cooperate, extortionate, generous and equalizer ZD strategies can exist in finitely repeated n-player social dilemmas. The threshold discount factors depend on the slope and baseline payoff of the desired linear relation and on the variation in the "one-shot" payoffs of the n-player game. To show the utility of our general results, we apply them to a linear n-player public goods game.

I Introduction

The functionalities of many complex social systems rely on the willingness of their composing individuals to set aside their personal interest for the benefit of the greater good [12]. One mechanism for the evolution of cooperation is known as direct reciprocity: even if in the short run it pays off to be selfish, mutual cooperation can be favoured when the individuals encounter each other repeatedly. Direct reciprocity is often studied in the standard model of repeated games, and it is only recently, inspired by the discovery of a novel class of strategies called zero-determinant (ZD) strategies [14], that repeated games began to be examined from a new angle by investigating the level of control that a single player can exert on the average payoffs of its opponent. In [14], Press and Dyson showed that in infinitely repeated prisoner's dilemma games, a player who can remember the actions of the previous round can unilaterally impose a linear relation between its own expected payoff and that of its opponent. Notably, this enforced linear relation cannot be avoided even if the opponent employs an intricate strategy with a large memory. Such strategies are called zero-determinant because they force the determinant of a matrix constructed from the transition probabilities of the repeated game to be equal to zero. Later, ZD strategies were extended to games with more than two possible actions [15], continuous action spaces [11], and alternating moves [10]. The success of ZD strategies in an evolutionary setting was examined in [16, 3]. For a given population size, in the limit of weak selection, it was shown in [17] that all ZD strategies that can survive an invasion of any memory-one strategy must be "generous", namely enforcing a linear payoff relation that favors others. This surprising fact was tested experimentally in [4].

Most of the literature focuses on two-player games; however, in [13] the existence of ZD strategies in infinitely repeated public goods games was shown by extending the arguments in [14] to a symmetric public goods game. Around the same time, a characterization of the feasible ZD strategies in multiplayer social dilemmas, and of those strategies that maintain cooperation in such n-player games, was reported in [6]. Both in [13] and [6] it was noted that group size imposes restrictive conditions on the set of feasible ZD strategies and that alliances between co-players can overcome this restrictive effect of the group size. The evolutionary success of ZD strategies in such n-player games was studied in [7], and the results show that sustaining large-scale cooperation requires the formation of alliances. ZD strategies for finitely repeated games with discounted payoffs were defined and characterized in [5]. The threshold discount factors above which ZD strategies can exist were derived in [8].

In this paper we use the framework of ZD strategies in infinitely repeated multiplayer social dilemmas from [6] and extend it to the finitely repeated case in which future payoffs are discounted. We build upon our results in [2], in which enforceable payoff relations were characterized, by developing new theory that allows us to express threshold discount factors that determine how quickly a strategic player can enforce a desired linear payoff relation.
These general results are applicable to both two-player and multiplayer games, and can be applied to a variety of complex social dilemma settings including the famous prisoner's dilemma, the public goods game, the volunteer's dilemma, the n-player snowdrift game, and more. The results can also be used to determine one's possibilities for exerting control given a constraint on the expected number of interactions, and thus provide novel insights into one's level of influence in real-world repeated interactions. The results in this paper can be used to investigate, both analytically and experimentally, the effect of the group size and the initial condition on the level of control that a single player can exert in finitely repeated n-player social dilemma games. Thus, our results may open the door to novel control techniques that seek to achieve or sustain cooperation in large social systems that evolve under evolutionary forces.

The paper is organized as follows. In Section II, preliminaries are given concerning the necessary notation and the considered game model with its underlying assumptions; furthermore, memory-one strategies are formally defined. In Section III, the mean distribution of the finitely repeated n-player game and its relation to the memory-one strategy of a single player are given. In Section IV, ZD strategies for finitely repeated n-player games are defined, and in Section V the enforceable payoff relations are characterized. In Section VI, threshold discount factors are given for generous, extortionate and equalizer ZD strategies. In Section VII, we provide the proofs of our main results. We apply our results to the n-player linear public goods game in Section VIII, and conclude the paper in Section IX.

II Preliminaries

II-A Notations

For some vector $v$, we denote its $i$-th element by $v_i$. To emphasize that $v$ is obtained by stacking its elements we sometimes write $v = (v_1, \ldots, v_m)^T$. For a pair of vectors $v, u$ of equal dimension, $\langle v, u \rangle$ is the dot product. We denote the column vector of all ones by $\mathbf{1}$. Likewise, $\mathbf{0}$ is the column vector of all zeros. When the dimensions are clear from the context, they are omitted; $\mathbf{1}$ and $\mathbf{0}$ are always column vectors. We denote the $n$-ary Cartesian product over the sets $A_1, \ldots, A_n$ by $A_1 \times \cdots \times A_n$.

II-B Symmetric n-player games

In this paper we consider $n$-player games in which players can repeatedly choose to either cooperate ($C$) or defect ($D$). The set of actions for each player is denoted by $\mathcal{A} = \{C, D\}$. The actions chosen in the group in round $t$ of the repeated game are described by an action profile $\sigma(t) \in \mathcal{A}^n$. A player's payoff in a given round depends on the player's own action and the actions of the co-players. In a group in which $k$ of the $n-1$ co-players cooperate, a cooperator receives payoff $a_k$, whereas a defector receives $b_k$. As in [6, 13] we assume the game is symmetric, such that the outcome of the game depends only on one's own decision and the number of cooperating co-players, and hence does not depend on which of the co-players have cooperated. Accordingly, the payoffs of all possible outcomes for a player can be conveniently summarized in Table I.

Number of cooperators among co-players: $n-1$, $n-2$, …, $k$, …, $0$
Cooperator's payoff: $a_{n-1}$, $a_{n-2}$, …, $a_k$, …, $a_0$
Defector's payoff: $b_{n-1}$, $b_{n-2}$, …, $b_k$, …, $b_0$
TABLE I: Payoffs of the symmetric $n$-player game. A player's payoff depends on its own decision and the number of co-players who cooperate.

We have the following assumptions on the payoffs of the symmetric $n$-player game.

Assumption 1 (Social dilemma assumption [6, 9]).

The payoffs of the symmetric $n$-player game satisfy the following conditions: (i) for all $0 \leq k \leq n-2$, it holds that $a_{k+1} \geq a_k$ and $b_{k+1} \geq b_k$; (ii) for all $0 \leq k \leq n-2$, it holds that $b_{k+1} > a_k$; (iii) $a_{n-1} > b_0$.

Assumption 1 is standard in -player social dilemma games and it ensures that there is a conflict between the interest of each individual and that of the group as a whole. Thus, those games whose payoffs satisfy Assumption 1 can model a social dilemma that results from selfish behaviors in a group. Consider the following examples.

Example 1.

As an example of a game that satisfies Assumption 1, consider a public goods game in which each cooperator contributes an amount $c > 0$ to a public good. The sum of the contributions is multiplied by an enhancement factor $r$, with $1 < r < n$, and then divided evenly among all group members. This results in the following payoffs for a player with $k$ cooperating co-players: $a_k = \frac{rc(k+1)}{n} - c$ and $b_k = \frac{rck}{n}$.
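As a quick illustration of this payoff structure, the following Python sketch encodes the public goods payoffs of Example 1 and checks the three conditions of Assumption 1 as stated above; the function names and the chosen parameter values are illustrative, not part of the paper.

```python
# Sketch: public goods payoffs of Example 1 and a check of Assumption 1 as
# stated above. Function names and the chosen parameter values (n, c, r) are
# illustrative.

def public_goods_payoffs(n, c=1.0, r=3.0):
    """Return (a, b), where a[k] (b[k]) is the payoff of a cooperator
    (defector) when k of the n-1 co-players cooperate."""
    a = [r * c * (k + 1) / n - c for k in range(n)]  # k co-players plus the player herself contribute
    b = [r * c * k / n for k in range(n)]            # only the k cooperating co-players contribute
    return a, b

def satisfies_assumption_1(a, b):
    """(i) payoffs increase with the number of cooperating co-players,
    (ii) defectors outearn cooperators within any mixed group,
    (iii) full cooperation beats full defection."""
    m = len(a)  # equals n, since k ranges over 0, ..., n-1
    cond_i = all(a[k + 1] >= a[k] and b[k + 1] >= b[k] for k in range(m - 1))
    cond_ii = all(b[k + 1] > a[k] for k in range(m - 1))
    cond_iii = a[-1] > b[0]
    return cond_i and cond_ii and cond_iii

a, b = public_goods_payoffs(n=4, c=1.0, r=3.0)  # requires 1 < r < n for a social dilemma
print("a_k:", a)
print("b_k:", b)
print("Assumption 1 satisfied:", satisfies_assumption_1(a, b))
```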

Example 2 ($n$-player stag hunt game).

In the public goods game of Example 1, even a single cooperator creates a benefit. In some other social dilemma games, only a group of cooperators can create a benefit. For example, in the $n$-player stag hunt game, players obtain the benefit if and only if all players cooperate [18], resulting in payoffs in which a cooperator pays a cost that is only compensated when the whole group cooperates.
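For comparison, here is a minimal sketch of one common formulation of the $n$-player stag hunt payoffs; the exact normalization (a cost paid by cooperators, a benefit produced only under full cooperation) is an assumption made for illustration and may differ from the formulation in [18].

```python
# Sketch: one common formulation of the n-player stag hunt (an assumption for
# illustration; the paper's normalization may differ). Cooperators pay a cost
# c, and the benefit bnf is produced only if the whole group cooperates.

def stag_hunt_payoffs(n, bnf=2.0, c=1.0):
    a = [bnf - c if k == n - 1 else -c for k in range(n)]  # cooperator with k cooperating co-players
    b = [0.0] * n                                          # defectors never pay the cost
    return a, b

a, b = stag_hunt_payoffs(n=3)
print(a, b)  # [-1.0, -1.0, 1.0] [0.0, 0.0, 0.0]
# With bnf > c this formulation also satisfies the conditions of Assumption 1 as stated above.
```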

II-C Strategies

In repeated games the players must choose how to update their actions as the game interactions are repeated over rounds. A strategy of a player determines the conditional probabilities with which actions are chosen by the player. To formalize this concept we introduce some additional notation. A history of plays up to round $t$ is denoted by $h(t) = (\sigma(0), \ldots, \sigma(t-1))$, such that $\sigma(\tau) \in \mathcal{A}^n$ for all $\tau < t$. The union of all possible histories is denoted by $\mathcal{H}$, with $h(0) = \emptyset$ being the empty history. Finally, let $\Delta(\mathcal{A})$ denote the set of probability distributions over the action set $\mathcal{A}$. As is standard in the theory of repeated games, a strategy of player $i$ is then defined by a function $S_i : \mathcal{H} \to \Delta(\mathcal{A})$ that maps the history of play to a probability distribution over the action set. An interesting and important subclass of strategies are those that only take into account the action profile of the previous round $t-1$, i.e. $\sigma(t-1)$, to determine the conditional probabilities of choosing some action in round $t$. Correspondingly, these strategies are called memory-one strategies and are formally defined as follows.

Definition 1 (Memory-one strategy, [5]).

A strategy $S_i$ is a memory-one strategy if $S_i(h(t)) = S_i(\tilde{h}(\tilde{t}))$ for all histories $h(t)$ and $\tilde{h}(\tilde{t})$ with $t, \tilde{t} \geq 1$ and $\sigma(t-1) = \tilde{\sigma}(\tilde{t}-1)$.

Press and Dyson showed that, when determining the best-performing strategies in terms of expected payoffs in two-action repeated games, it is sufficient to consider only the space of memory-one strategies [14, 15].
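To make Definition 1 concrete, the sketch below stores a memory-one strategy of a single player in a symmetric game as a table of conditional cooperation probabilities indexed by the player's own previous action and the number of cooperating co-players; the dictionary layout and the example strategy are illustrative assumptions.

```python
# Sketch: representing a memory-one strategy of a single ("key") player in a
# symmetric n-player game as conditional cooperation probabilities indexed by
# the player's own previous action and the number of cooperating co-players.
# The dict layout and the example strategy are illustrative assumptions.

import random

def proportional_tft(n):
    """Example memory-one strategy: cooperate with probability equal to the
    fraction of co-players that cooperated in the previous round,
    independently of one's own previous action."""
    return {(own, k): k / (n - 1) for own in ("C", "D") for k in range(n)}

def next_action(strategy, own_prev, k_prev, rng=random):
    """Sample the key player's next action from the memory-one strategy."""
    p_cooperate = strategy[(own_prev, k_prev)]
    return "C" if rng.random() < p_cooperate else "D"

strategy = proportional_tft(n=4)
print(next_action(strategy, own_prev="C", k_prev=2))
```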

III Mean distributions of memory-one strategies in finitely repeated n-player games

In this section we zoom in on a particular player that employs a memory-one strategy in the $n$-player game and refer to this player as the key player. In particular, we focus on the relation between the mean distribution of action profiles and the memory-one strategy of the key player. Let $p_{X,k}$ denote the probability that the key player cooperates in round $t+1$ given that, in round $t$, the player plays $X \in \{C, D\}$ and $k$ of the co-players cooperate. By stacking these probabilities for all possible outcomes into a vector, we obtain the memory-one strategy that determines the probability for the key player to cooperate in round $t+1$:

$p = (p_{C,n-1}, p_{C,n-2}, \ldots, p_{C,0}, p_{D,n-1}, p_{D,n-2}, \ldots, p_{D,0})^T,$

where we have used the convention to order the conditional probabilities based on the key player's decision and a descending number of cooperating co-players. Accordingly, the memory-one strategy $p^{rep} = (\mathbf{1}^T, \mathbf{0}^T)^T$ gives the probability to cooperate when the current action is simply repeated. That is, when the key player cooperates in round $t$, by employing $p^{rep}$ she will cooperate in round $t+1$ with probability one. Let $v_\sigma(t)$ denote the probability that the outcome of round $t$ is $\sigma$, with $\sigma \in \mathcal{A}^n$. And let $v(t)$ be the vector of outcome probabilities in round $t$. As in [5, 8, 11, 10] we focus on finitely repeated games. The finite number of rounds is determined by a fixed and common discount factor $\delta \in (0,1)$ that, given the current round, determines the probability that a next round takes place. By taking the limit of the geometric sum of $\delta$, the expected number of rounds is $\frac{1}{1-\delta}$. As in [5], the mean distribution of $v(t)$ is:

$v(\delta) = (1-\delta)\sum_{t=0}^{\infty} \delta^t v(t). \qquad (1)$

In this paper we are interested in the average discounted payoffs of the finitely repeated $n$-player game. Let the payoff that player $i$ receives in a given round be determined by its own choice and the number of co-players that cooperated, as in Table I. By stacking the possible payoffs, ordered in the same way as the entries of $p$, we obtain the vector $g_i$ that contains all possible payoffs in a given round of player $i$. The expected "one-shot" payoff of player $i$ in round $t$ is $\langle g_i, v(t) \rangle$. And the average discounted payoff in the finitely repeated game for player $i$ is then:

$\pi_i = (1-\delta)\sum_{t=0}^{\infty} \delta^t \langle g_i, v(t) \rangle = \langle g_i, v(\delta) \rangle.$
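The role of the discount factor as a continuation probability can be checked numerically: the number of rounds is geometrically distributed with mean $1/(1-\delta)$, and the $(1-\delta)$-normalization turns the discounted payoff sum into a per-round average. The following sketch is illustrative; all names and parameter values are assumptions.

```python
# Sketch: the discount factor delta as a continuation probability. After each
# round, another round is played with probability delta, so the number of
# rounds is geometric with mean 1/(1-delta). The simulation checks this and
# illustrates the (1-delta)-normalized discounted payoff. Names are illustrative.

import random

def simulate_rounds(delta, rng=random):
    """Play rounds until the continuation lottery fails; return the count."""
    rounds = 1
    while rng.random() < delta:
        rounds += 1
    return rounds

def discounted_average(payoff_stream, delta):
    """Average discounted payoff (1-delta) * sum_t delta^t * payoff(t)."""
    return (1 - delta) * sum((delta ** t) * x for t, x in enumerate(payoff_stream))

delta = 0.9
samples = [simulate_rounds(delta) for _ in range(100_000)]
print("empirical mean rounds:", sum(samples) / len(samples))   # close to 1/(1-delta) = 10
print("discounted average of a constant payoff 2:", discounted_average([2.0] * 10_000, delta))  # close to 2
```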

The following lemma relates the limit distribution of the finitely repeated game to the memory-one strategy of the key player. The presented lemma is a straightforward $n$-player extension of the two-player case given in [5] and relies on the fundamental results from [1].

Lemma 1 (Limit distribution).

Suppose the key player applies memory-one strategy $p$ and the strategies of the other players are arbitrary, but fixed. For the finitely repeated $n$-player game, it holds that

$\langle v(\delta), p^{rep} - \delta p \rangle = (1-\delta)\, p_0,$

where $p_0$ is the key player's initial probability to cooperate.

Proof:

The probability that the key player cooperates in round $t$, for $t \geq 1$, is $\langle v(t-1), p \rangle$. And the probability that the key player's realized action in round $t$ is cooperation can be written as $\langle v(t), p^{rep} \rangle$. Now define,

$q(t) := \langle v(t), p^{rep} \rangle, \quad \text{so that } q(0) = p_0 \text{ and } q(t) = \langle v(t-1), p \rangle \text{ for } t \geq 1. \qquad (2)$

Multiplying equation (2) by $(1-\delta)\delta^t$ and summing up over $t \geq 0$ we obtain

$(1-\delta)\sum_{t \geq 0} \delta^t \langle v(t), p^{rep} \rangle = (1-\delta)\, p_0 + (1-\delta)\sum_{t \geq 1} \delta^t \langle v(t-1), p \rangle.$

Because $(1-\delta)\sum_{t \geq 1} \delta^t \langle v(t-1), p \rangle = \delta (1-\delta) \sum_{t \geq 0} \delta^t \langle v(t), p \rangle$, it follows that

$(1-\delta)\sum_{t \geq 0} \delta^t \langle v(t), p^{rep} \rangle = (1-\delta)\, p_0 + \delta (1-\delta) \sum_{t \geq 0} \delta^t \langle v(t), p \rangle.$

And by the definition of $v(\delta)$ in equation (1):

$\langle v(\delta), p^{rep} \rangle = (1-\delta)\, p_0 + \delta \langle v(\delta), p \rangle.$

By rearranging terms we obtain

$\langle v(\delta), p^{rep} - \delta p \rangle = (1-\delta)\, p_0.$

This completes the proof. ∎

Remark 1.

Note that in the limit $\delta \to 1$, the infinitely repeated game is recovered. In this setting, the expected number of rounds is infinite and, if the limit exists, the average payoffs are given by

$\pi_i = \lim_{T \to \infty} \frac{1}{T+1} \sum_{t=0}^{T} \langle g_i, v(t) \rangle.$

By Akin's Lemma (see [1, 6]), for the infinitely repeated game, irrespective of the initial probability to cooperate, it holds that

$\langle v, p - p^{rep} \rangle = 0.$

Hence, a key difference between the infinitely repeated and finitely repeated game is that $p_0$ is important for the relation between the memory-one strategy and the mean distribution when the game is repeated a finite number of expected rounds. When the game is infinitely repeated, i.e. $\delta \to 1$, the importance of the initial condition on the relation between $p$ and $v$ disappears [6].

IV ZD-strategies in finitely repeated n-player games

Based on Lemma 1, we now formally define a ZD strategy for a finitely repeated $n$-player game.

Definition 2.

A memory-one strategy $p$ is a ZD-strategy for an $n$-player game if there exist constants $\alpha$, $\beta_j$ for $j \neq i$, and $\gamma$, with not all of them equal to zero, such that

$\delta p - p^{rep} + (1-\delta)\, p_0 \mathbf{1} = \alpha\, g_i + \sum_{j \neq i} \beta_j\, g_j + \gamma \mathbf{1}. \qquad (3)$

The following proposition shows how the ZD strategy can enforce a linear relation between the key player's expected payoff and that of her co-players.

Proposition 1.

Suppose the key player employs a fixed ZD strategy with parameters $\alpha$, $\beta_j$ and $\gamma$ as in Definition 2. Then, irrespective of the fixed strategies of the remaining co-players, the payoffs obey the equation

$\alpha\, \pi_i + \sum_{j \neq i} \beta_j\, \pi_j + \gamma = 0. \qquad (4)$

Proof:
Taking the inner product of both sides of equation (3) with $v(\delta)$ and applying Lemma 1 yields

$\alpha\, \pi_i + \sum_{j \neq i} \beta_j\, \pi_j + \gamma = \langle v(\delta), \delta p - p^{rep} + (1-\delta)\, p_0 \mathbf{1} \rangle = -(1-\delta)p_0 + (1-\delta)p_0 = 0, \qquad (5)$

which is exactly (4). ∎

To be consistent with the earlier work on ZD strategies in infinitely repeated $n$-player games in [6], we introduce the parameter transformations

$\alpha = \phi s, \qquad \beta_j = -\phi w_j, \qquad \gamma = \phi(1-s)\, l.$

Using these parameter transformations, equation (3) can be written as

$p = \frac{1}{\delta}\Big[ p^{rep} - (1-\delta)\, p_0 \mathbf{1} + \phi\big( s\, g_i - \sum_{j \neq i} w_j\, g_j + (1-s)\, l\, \mathbf{1} \big) \Big], \qquad (6)$

under the conditions that $\phi > 0$, $w_j \geq 0$ for all $j \neq i$, and $\sum_{j \neq i} w_j = 1$. Moreover, the linear payoff relation in equation (4) becomes

$\pi_{-i} = s\, \pi_i + (1-s)\, l,$

where $\pi_{-i} = \sum_{j \neq i} w_j \pi_j$ is the weighted average payoff of the co-players. The four most widely studied ZD strategies are given in Table II.

ZD-Strategy | Parameter values | Enforced payoff relation
Fair | $s = 1$ | $\pi_{-i} = \pi_i$
Generous | $l = a_{n-1}$, $0 < s < 1$ | $\pi_{-i} - a_{n-1} = s(\pi_i - a_{n-1})$
Extortionate | $l = b_0$, $0 < s < 1$ | $\pi_{-i} - b_0 = s(\pi_i - b_0)$
Equalizer | $s = 0$ | $\pi_{-i} = l$
TABLE II: The four most widely studied ZD strategies. Depending on the parameter values $s$ and $l$, players may be fair, generous, extortionate or equalizers.
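The following small helper classifies a parametrization $(s, l)$ into the four families of Table II, given the full-cooperation payoff $a_{n-1}$ and the full-defection payoff $b_0$; it simply mirrors the table as reconstructed above, and the function name and tolerances are illustrative.

```python
# Sketch: classify a ZD parametrization (s, l) into the four families of
# Table II, given the mutual-cooperation payoff a_{n-1} and the
# mutual-defection payoff b_0. Mirrors the table as reconstructed above;
# names and tolerances are illustrative.

def classify_zd(s, l, a_full_coop, b_full_defect, tol=1e-9):
    if abs(s - 1.0) < tol:
        return "fair"                     # enforces pi_{-i} = pi_i
    if abs(s) < tol:
        return "equalizer"                # enforces pi_{-i} = l
    if 0 < s < 1 and abs(l - b_full_defect) < tol:
        return "extortionate"             # surplus over b_0 is split in the key player's favour
    if 0 < s < 1 and abs(l - a_full_coop) < tol:
        return "generous"                 # shortfall below a_{n-1} is carried mostly by the key player
    return "other ZD strategy"

print(classify_zd(s=0.5, l=0.0, a_full_coop=2.0, b_full_defect=0.0))  # extortionate
print(classify_zd(s=0.5, l=2.0, a_full_coop=2.0, b_full_defect=0.0))  # generous
```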

Because the entries of the ZD strategy correspond to conditional probabilities, they are required to belong to the unit interval. Hence, not every linear payoff relation with parameters $(s, l)$ is valid. Let $w$ denote the vector of weights that the ZD strategist assigns to her co-players. Consider the following definition that was given in [5] for two-player games.

Definition 3 (Enforceable payoff relations).

Given a discount factor $\delta \in (0,1)$, a payoff relation $(s, l)$ with weights $w$ is enforceable if there exist $\phi > 0$ and $p_0 \in [0,1]$ such that each entry of $p$ according to equation (3) is in $[0,1]$. We indicate the set of enforceable payoff relations by $\mathcal{E}(\delta)$.
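Definition 3 can be probed numerically. The sketch below assumes the explicit ZD form used in the reconstruction of equation (6), $p = \frac{1}{\delta}[p^{rep} - (1-\delta)p_0\mathbf{1} + \phi(s\, g_i - \sum_j w_j g_j + (1-s)l\,\mathbf{1})]$, restricts attention to equal weights, and performs a crude grid search over $\phi$ and $p_0$ to see whether all $2n$ conditional probabilities can be placed in $[0,1]$; this is an illustrative sketch under these assumptions, not the paper's characterization.

```python
# Sketch: probing Definition 3 numerically, assuming the explicit ZD form of
# the reconstructed equation (6),
#   p_sigma = ( rep_sigma - (1-delta)*p0
#               + phi * ( s*g_i(sigma) - g_others(sigma) + (1-s)*l ) ) / delta,
# with equal weights w_j = 1/(n-1) on the co-players. The grid search over
# (phi, p0) and all names are illustrative assumptions, not the paper's procedure.

def zd_entries(a, b, s, l, delta, phi, p0):
    """All 2n conditional cooperation probabilities of the assumed ZD form."""
    n = len(a)
    entries = []
    for own in (1, 0):                       # 1: key player cooperated, 0: defected
        for k in range(n):                   # k cooperating co-players
            if own == 1:
                g_i = a[k]
                # cooperating co-players see k cooperating co-players (key player included),
                # defecting co-players see k+1
                g_others = (k * a[k] + (n - 1 - k) * (b[k + 1] if k < n - 1 else 0.0)) / (n - 1)
            else:
                g_i = b[k]
                g_others = (k * (a[k - 1] if k > 0 else 0.0) + (n - 1 - k) * b[k]) / (n - 1)
            rep = float(own)
            entries.append((rep - (1 - delta) * p0 + phi * (s * g_i - g_others + (1 - s) * l)) / delta)
    return entries

def enforceable(a, b, s, l, delta):
    """Crude check: does some (phi, p0) on a coarse grid keep every entry in [0, 1]?"""
    for phi in (0.5, 0.1, 0.01, 0.001):
        for p0 in (0.0, 0.25, 0.5, 0.75, 1.0):
            if all(0.0 <= x <= 1.0 for x in zd_entries(a, b, s, l, delta, phi, p0)):
                return True
    return False

# Public goods payoffs of Example 1 with n=4, c=1, r=3:
a = [3 * (k + 1) / 4 - 1 for k in range(4)]
b = [3 * k / 4 for k in range(4)]
print(enforceable(a, b, s=0.6, l=b[0], delta=0.99))  # extortion-like relation, long horizon: True
print(enforceable(a, b, s=0.6, l=b[0], delta=0.10))  # same relation, short horizon: False
```

For the same extortion-like relation, the long-horizon check succeeds while the short-horizon one fails, which is precisely the monotone effect formalized in Proposition 2 below.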

An intuitive implication of decreasing the expected number of rounds in the repeated game (e.g. by decreasing $\delta$) is that the set of enforceable payoff relations shrinks as well. This monotone effect is formalized in the following proposition, which extends a result from [5] to the $n$-player case.

Proposition 2 (Monotonicity of $\mathcal{E}(\delta)$).

If $\delta_1 \leq \delta_2$, then $\mathcal{E}(\delta_1) \subseteq \mathcal{E}(\delta_2)$.

Proof:

Albeit with a different formulation of the ZD strategy, the proof follows from the same argument used in the two-player case [5]. It is presented here to make the paper self-contained. From Definition 3, a payoff relation belongs to $\mathcal{E}(\delta)$ if and only if one can find $\phi > 0$ and $p_0 \in [0,1]$ such that each entry of $p$ is in $[0,1]$. We have

$0 \leq p_\sigma \leq 1 \quad \text{for all } \sigma \in \mathcal{A}^n. \qquad (7)$

Then by substituting (3) into the above inequality we obtain,

(8)

with

Now observe that the left-hand side of the inequality (8) is decreasing for increasing $\delta$. Moreover, the right-hand side of the inequality is increasing for increasing $\delta$. The middle part of the inequality, which is exactly the definition of a ZD strategy for the infinitely repeated game in [6], is independent of $\delta$. It follows that by increasing $\delta$ the range of possible ZD parameters $\phi$ and $p_0$ increases, and hence if (8) is satisfied for some $\delta_1$, then it is also satisfied for any $\delta_2 \geq \delta_1$. ∎

Now we are ready to state the existence problem studied in this paper.

Problem 1 (The existence problem).

For the class of $n$-player games with payoffs as in Table I that satisfy Assumption 1, what are the enforceable payoff relations when the expected number of rounds is finite, i.e., when $\delta \in (0,1)$?

V Existence of ZD strategies

In this section, we present our main results on the existence problem. The proofs of these results are found in Section VII. We begin by formulating conditions on the parameters of the ZD strategy that are necessary for the payoff relation to be enforceable in the finitely repeated $n$-player game.

Proposition 3.

The enforceable payoff relations for the finitely repeated $n$-player game with $\delta \in (0,1)$, with payoffs as in Table I that satisfy Assumption 1, require the following necessary conditions:

(9)

with at least one strict inequality in (9).

Because fair strategies are defined by the slope $s = 1$ (see Table II), an immediate consequence of Proposition 3 is stated in the following corollary.

Corollary 1.

For the finitely repeated $n$-player social dilemma game with payoffs that satisfy Assumption 1, there do not exist fair ZD strategies.

In the following theorem we extend the results for infinitely repeated -player games from [6] to finitely repeated games. To write the statement compactly, we let . Moreover, let denote the sum of the smallest weights and let .

Theorem 1.

For the finitely repeated -player game with payoffs as in Table I that satisfy Assumption 1, the payoff relation with weights is enforceable if and only if and

(10)

Moreover, at least one inequality in (10) must be strict.

Remark 2.

For $n = 2$, the full weight is placed on the single opponent, i.e., $w = 1$. When the payoff parameters are defined as $a_1 = R$, $a_0 = S$, $b_1 = T$, $b_0 = P$, the result in Theorem 1 recovers the earlier result obtained for the finitely repeated two-player game in [5].
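For instance, with a single co-player the payoff table collapses to the familiar prisoner's dilemma entries; the following lines spell out the mapping of Remark 2 (as reconstructed above) together with the corresponding social dilemma conditions, using standard illustrative values.

```python
# Sketch: the n = 2 special case of Remark 2 (as reconstructed above), mapping
# the payoff table to the usual prisoner's dilemma entries. The numeric values
# are standard illustrative choices.

R, S, T, P = 3.0, 0.0, 5.0, 1.0
a = [S, R]   # cooperator's payoff with 0 or 1 cooperating co-player: a_0 = S, a_1 = R
b = [P, T]   # defector's payoff with 0 or 1 cooperating co-player:   b_0 = P, b_1 = T
# Assumption 1 (as stated above): monotonicity, defectors outearn cooperators
# in mixed groups (T > S), and mutual cooperation beats mutual defection (R > P).
assert a[1] >= a[0] and b[1] >= b[0] and b[1] > a[0] and a[1] > b[0]
```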

Theorem 1 does not stipulate any conditions on the key player's initial probability to cooperate other than $p_0 \in [0,1]$. However, the existence of extortionate and generous strategies does depend on the value of $p_0$. This is formalized in the following proposition.

Proposition 4.

For the existence of extortionate ZD strategies it is necessary that $p_0 = 0$. Moreover, for the existence of generous ZD strategies it is necessary that $p_0 = 1$.

These requirements on the key player's initial probability to cooperate make intuitive sense. In a finitely repeated game, if the key player aims to be an extortioner that profits from the cooperative actions of others, she cannot start by cooperating because she could be taken advantage of by defectors. On the other hand, if she aims to be generous, she cannot start as a defector because this would punish both cooperating and defecting co-players.

VI Thresholds on discount factors

In the previous section we characterized the enforceable payoff relations of ZD strategies in finitely repeated $n$-player social dilemma games. Our conditions generalize those obtained for two-player games and illustrate how a single player can exert control over the outcome of an $n$-player repeated game with a finite number of expected rounds. The conditions that result from the existence problem do not specify requirements on the discount factor other than $\delta \in (0,1)$. In practice, one could be interested in how many expected rounds are required before a desired payoff relation can be enforced. In this section we address this problem.

Problem 2 (The minimum threshold problem).

Suppose the desired payoff relation $(s, l)$ satisfies the conditions in Theorem 1. What is the minimum discount factor $\delta$ for which the linear relation with weights $w$ can be enforced by the ZD strategist?

We consider the three classes of ZD strategies separately. Before giving the main results it is necessary to introduce some additional notation. Define to be the maximum sum of weights for some permutation of with cooperating co-players. Additionally, for some given payoff relation and define

(11)

In the following, we will use these extrema to derive threshold discount factors for extortionate, generous and equalizer strategies in symmetric $n$-player social dilemma games. The proofs of our results can be found in Section VII.

VI-A Extortionate ZD strategies

We first consider the case in which $l = b_0$ and $0 < s < 1$, such that the ZD strategy is extortionate. We have the following result.

Theorem 2.

Assume and satisfy the conditions in Theorem 1, then and . Moreover, the threshold discount factor above which extortionate ZD strategies exist is determined by

VI-B Generous ZD strategies

If a player instead aims to be generous, different thresholds will in general apply. Thus, we now consider the case in which $l = a_{n-1}$ and $0 < s < 1$, such that the ZD strategy is generous.

Theorem 3.

Assume and satisfy the conditions in Theorem 1. Then and . Moreover, the threshold discount factor above which generous ZD strategies exist is determined by

VI-C Equalizer ZD strategies

The existence of equalizer strategies, with $s = 0$, does not impose any requirement on the initial probability to cooperate. In general, one can identify different regions of the unit interval for $p_0$ in which different threshold discount factors exist. For instance, the boundary cases $p_0 = 0$ and $p_0 = 1$ can be examined in a similar manner as was done for extortionate and generous strategies and, in general, will lead to different requirements on the discount factor. In this section, we derive conditions on the discount factor such that the equalizer payoff relation can be enforced for a variable initial probability to cooperate within the open unit interval.

Theorem 4.

Suppose $s = 0$ and $l$ satisfies the bounds in Theorem 1. Then, the equalizer payoff relation can be enforced for $p_0 \in (0,1)$ if and only if the following inequalities hold

(12)
(13)
(14)
(15)

Based on Theorem 4, the following corollary provides relatively easy-to-check sufficient conditions that allow an equalizer strategy to enforce a desired linear relation for every initial probability to cooperate in the open unit interval. These sufficient conditions link thresholds for generous and extortionate strategies to those of equalizer strategies.

Corollary 2.

Suppose $s = 0$ and $l$ satisfies the bounds in Theorem 1. Then, the equalizer payoff relation can be enforced for all $p_0 \in (0,1)$ if

Proof.

The sufficient conditions are obtained by solving the conditions in Theorem 4, which are linear in $p_0$, for the smallest upper bounds on the discount factor $\delta$. ∎

With Theorems 2, 3, and 4, we have provided expressions for deriving the minimum discount factor for some desired linear relation. Because the expressions depend on the "one-shot" payoffs of the $n$-player game, in general they will differ between social dilemmas. In order to determine these expressions, one needs to find the global extrema of a function over the possible outcomes, which can be done efficiently for a large class of social dilemma games. The derived thresholds can, for example, be used as an indicator for the minimum number of rounds in experiments on extortion and generosity in repeated games, or simply as an indicator of how many expected interactions a single ZD strategist requires to enforce some desired payoff relation in a group of decision makers.
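Since the expected number of rounds under continuation probability $\delta$ is $\frac{1}{1-\delta}$, any threshold discount factor translates directly into a minimum expected interaction length, as in the following sketch (the threshold values used are made-up placeholders).

```python
# Sketch: translating a threshold discount factor into a minimum expected
# number of rounds. The expected number of rounds under continuation
# probability delta is 1/(1 - delta), so a threshold delta* maps to the
# expected interaction length below. The example thresholds are made-up numbers.

import math

def min_expected_rounds(delta_threshold):
    return 1.0 / (1.0 - delta_threshold)

for delta_star in (0.5, 0.9, 0.99):      # hypothetical thresholds
    print(f"delta* = {delta_star:.2f}  ->  at least {math.ceil(min_expected_rounds(delta_star))} expected rounds")
```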

VII Proofs of the main results

VII-A Proof of Proposition 3

Suppose all players are cooperating, i.e., $\sigma = (C, \ldots, C)$. Then from the definition of the ZD strategy in equation (6) and the payoffs given in Table I, it follows that

(16)

Now suppose that all players are defecting. Similarly, we have

(17)

In order for these payoff relations to be enforceable, it needs to hold that both entries in equations (16) and (17) are in the interval $[0,1]$. Equivalently,

(18)

and

(19)

Combining (18) and (19) it follows that From the assumption that listed in Assumption 1, it follows that

(20)

Now suppose there is a single defecting player, i.e., $\sigma = (D, C, \ldots, C)$ or any of its permutations. In this case, the entries of the memory-one strategy are as given in equation (21). Again, for both cases we require the entries to be in the interval $[0,1]$. This results in the inequalities given in equations (22) and (23).

(21)
(22)
(23)

By combining the equations (22) and (23) we obtain

(24)

Again, because of the assumption it follows that

(25)

The inequalities (25) and (20) together imply that

(26)

Because at least one , it follows that

(27)

Combining with equation (20) we obtain

(28)

In combination with equation (26) it follows that

(29)

The inequalities in equations (28) and (29) finally produce the bounds on the slope $s$:

(30)

Moreover, because it is required that , it follows that . Hence the necessary condition turns into:

(31)

We continue to show the necessary upper and lower bounds on the baseline payoff $l$. From equation (18) we obtain:

(32)

From equation (20) we know . Together with equation (32) this implies the necessary condition

(33)

We continue with investigating the lower bound on $l$; from equation (19),

(34)

Because (see equation (20)) it follows that

Naturally, when by assumption 1 it holds that and when then

VII-B Proof of Proposition 4

For brevity, in the following proof we refer to equations that are found in the proof of Proposition 3. Assume the ZD strategy is extortionate, hence $l = b_0$ and $0 < s < 1$. From the lower bound in (19), in order for the payoff relation to be enforceable, it is necessary that $p_0 = 0$. This proves the first statement. Now assume the ZD strategy is generous, hence $l = a_{n-1}$ and $0 < s < 1$. From the lower bound in (18), in order for the payoff relation to be enforceable, it is necessary that $p_0 = 1$. This proves the second statement and completes the proof.

VII-C Proof of Theorem 1

In the following we refer to the key player, who is employing the ZD strategy, as player . Let such that and let be the number of co-players that cooperate and let , be the number of co-players that defect. Also, let be the total number of cooperators including player . Using this notation, for some action profile we may write the ZD strategy as

(35)

Also, note that

(36)

and because it holds that

Substituting this into equation (36) and using the payoffs as in Table I we obtain

Accordingly, the entries of the ZD strategy are given by equation (38). For all entries we require that

(37)

This leads to the inequalities in equations (39) and (40). Because $\phi$ can be chosen arbitrarily small, the inequalities in equation (39) can be satisfied for some $\phi$ and $p_0$ if and only if, for all relevant outcomes, the inequalities in equation (41) are satisfied.

(38)
(39)
(40)
(41)

The inequality (41) together with the necessary condition (see Proposition 3) implies that

(42)

and thus provides an upper bound on the enforceable baseline payoff $l$. We now turn our attention to the inequalities in equation (40), which can be satisfied if and only if the following holds for all relevant outcomes:

(43)

Combining equations (43) and (42) we obtain

(44)

Because