Cooperation on the monte carlo rule Prison's dilemma game on the grid

04/15/2019
by Jiadong Wu, et al.

In this paper, we investigate the prisoner's dilemma game on the grid under a monte carlo rule inspired by the classic Monte Carlo method. The monte carlo rule is an organic combination of the current dynamic rules of individual strategy adjustment: it not only makes full use of information but also reflects the individual's bounded rational behavior and the ambivalence between the pursuit of high returns and high risks. In addition, it reflects the individual's behavioral execution preferences. Compared with the unconditional imitation rule, the replicator dynamics rule and the fermi rule, the monte carlo rule achieves a higher cooperation level than the latter two and stronger robustness than the unconditional imitation rule. When we analyse the equilibrium density of cooperators as a function of the temptation to defect, a smooth transition appears, as the temptation increases, from a mixed state in which cooperators and defectors coexist to a pure state of defectors; this transition is better characterized by a trigonometric behavior than by the power-law behavior discovered in pioneering work. When we examine the relationship between the temptation to defect and the average returns of cooperators and defectors, we find that cooperators' average returns are almost constant over the whole temptation parameter range, while defectors' average returns decrease as the temptation grows. Additionally, we demonstrate that the cooperation level is insensitive to the initial density of cooperators but sensitive to the social population.


1 Introduction

Cooperation among selfish individuals [1], such as kin cooperation [2], mutual cooperation and reputation-seeking cooperation [3], appears to run contrary to natural selection, yet it is widely observed in human society [4-6]; explaining it has become one of the central challenges of evolutionary game theory [7-9].

This phenomenon can be characterized by the prisoner's dilemma game [10], in which individuals adopt one of two strategies: C (cooperation) or D (defection). A selfish individual would select D for the sake of higher returns. Nevertheless, if both sides choose D simultaneously, each of them receives lower returns than under mutual cooperation.

Over the years, scholars have focused mainly on how different spatial structures promote cooperation [11-13]. However, relatively little research has addressed the dynamic rules by which individuals adjust their strategies, even though different update rules often lead to different results. For example, Sysi-Aho et al. [14] modified the model of Hauert and Doebeli [15] with a more rational updating rule, assuming that individuals possess a basic intelligence in the form of a local decision-making rule that determines their strategies. Under this rule, individuals assume that their neighbors' strategies remain unchanged and choose the strategy that maximizes their instant returns. The resulting equilibrium density of cooperators differs tremendously from that obtained under the replicator dynamics rule [15], and cooperation persists throughout the whole temptation parameter range. Li et al. [16] replaced the replicator dynamics with the unconditional imitation rule used in Nowak and May's work [17], owing to its simplicity and its consistency with the psychology of most individuals. They found that in some parameter ranges, the spatial structure, which originally inhibits cooperative behavior, instead promotes it. Xia et al. [18] compared the effects on cooperation of the unconditional imitation rule, the replicator dynamics rule and the Moran process [19], finding that the Moran process promotes cooperation much more than the others. These findings confirm the importance of update rules in the evolution of cooperative behavior.

In existing research, the dynamic rules of individual strategy adjustment are mainly the unconditional imitation rule, the replicator dynamics rule and the fermi rule [20]. However, the replicator dynamics and fermi rules cannot make full use of the game information, while the unconditional imitation rule cannot tolerate individuals' irrational behavior. An update rule that models individual strategy updates more faithfully is therefore needed.

In this paper we propose the monte carlo rule, an organic combination of the current dynamic rules of individual strategy adjustment, which not only makes full use of information but also reflects the individual's bounded rational behavior and the ambivalence between the pursuit of high returns and high risks [21,22]. In addition, it reflects the individual's behavioral execution preferences [23,24]. We analyse the effect of the monte carlo rule on the cooperation level and verify its robustness. Further, we analyse the equilibrium density of cooperators as a function of the temptation to defect and characterize it with trigonometric curves, confirming that the trigonometric fit is better than the power-law fit used in pioneering work [20]. We also investigate the relationship between the temptation to defect and the average returns of cooperators and defectors. Additionally, numerical simulations demonstrate that the cooperation level is insensitive to the initial density of cooperators but sensitive to the social population.

2 Prisoner’s dilemma game model with monte carlo rule

The prisoner’s dilemma represents a class of games whose only Nash equilibrium is mutual defection. In this game, each individual adopts one of two strategies, C (cooperation) or D (defection), and the returns depend on the strategies of both sides. The temptation to defect drives rational individuals to defect. However, if both sides choose D simultaneously, each of them receives lower returns than under mutual cooperation.

Game individuals are located on the nodes of the grid, and the edges indicate the connections between individuals. Following Nowak et al. [25], the payoff matrix can be simplified as:

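$$\begin{array}{c|cc} & \text{Cooperate} & \text{Defect} \\ \hline \text{Cooperate} & 1 & 0 \\ \text{Defect} & b & 0 \end{array}$$

where each entry is the return of the row player against the column player, and the parameter $b$ characterizes the temptation to defect ($b > 1$). The larger the value of $b$, the greater the temptation of defection to the individuals.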
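To make the payoff bookkeeping concrete, here is a minimal Python sketch that accumulates each site's return from the matrix above over its four nearest neighbors. The strategy encoding (1 for C, 0 for D), the helper names and the periodic boundary conditions are assumptions of this sketch; the text does not state how the grid boundary is treated.

```python
from typing import List

C, D = 1, 0  # strategy encoding assumed for this sketch


def pair_payoff(me: int, other: int, b: float) -> float:
    """Return of `me` against `other` for the matrix R = 1, T = b, S = P = 0."""
    if other == D:
        return 0.0                  # S = P = 0: nothing is earned against a defector
    return 1.0 if me == C else b    # R = 1 for mutual cooperation, T = b for defecting


def site_payoffs(grid: List[List[int]], b: float) -> List[List[float]]:
    """Accumulated returns from the k = 4 nearest neighbours (periodic boundaries assumed)."""
    n = len(grid)
    pay = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for x, y in [((i - 1) % n, j), ((i + 1) % n, j), (i, (j - 1) % n), (i, (j + 1) % n)]:
                pay[i][j] += pair_payoff(grid[i][j], grid[x][y], b)
    return pay
```

On a 4x4 lattice of cooperators with a single defector, the defector earns $4b$, each of its four neighbors earns 3, and every other cooperator earns 4.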

Individuals gain returns by playing the prisoner’s dilemma game with their nearest neighbors. During the evolutionary process, each individual either keeps his original strategy or adopts the strategy of one of the neighbors whose returns are greater than or equal to his own, chosen by roulette-wheel (payoff-proportional) selection:

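$$W(x \leftarrow y_i) = \frac{\chi(P_x, P_{y_i})\, P_{y_i}}{P_x + \sum_{j=1}^{k} \chi(P_x, P_{y_j})\, P_{y_j}}, \qquad W(x \leftarrow x) = \frac{P_x}{P_x + \sum_{j=1}^{k} \chi(P_x, P_{y_j})\, P_{y_j}} \tag{1}$$

where $P_x$ represents individual $x$’s own returns and $P_{y_1}, \dots, P_{y_k}$ represent the returns of its $k$ nearest neighbors, $k$ being the degree of the regular network ($k = 4$ in this paper); $W(x \leftarrow y_i)$ is the probability that individual $x$ imitates $y_i$, and $W(x \leftarrow x)$ is the probability that $x$ keeps his strategy.

$\chi$ is a characteristic function:

$$\chi(P_x, P_{y_i}) = \begin{cases} 1, & P_{y_i} \ge P_x, \\ 0, & P_{y_i} < P_x, \end{cases} \tag{2}$$

where $P_x$ represents individual $x$’s returns and $P_{y_i}$ ($i = 1, \dots, k$) represents the returns of individual $y_i$, one of $x$’s neighbors.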
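The following sketch computes the imitation probabilities of eqs. (1) and (2) for one focal individual. It assumes the roulette-wheel reading in which the individual itself and every neighbor with $\chi = 1$ are weighted by their returns; the function name and the handling of the all-zero case are hypothetical choices, not taken from the paper.

```python
from typing import List, Sequence, Tuple


def imitation_probabilities(own_payoff: float,
                            neigh_payoffs: Sequence[float]) -> Tuple[List[float], float]:
    """Return ([W(x <- y_1), ..., W(x <- y_k)], W(x <- x)) for one focal individual x.

    Assumed reading of eqs. (1)-(2): x and every neighbour y_i with
    P_{y_i} >= P_x enter a roulette wheel, each weighted by its return.
    Returns are assumed non-negative, as with the matrix R = 1, T = b, S = P = 0.
    """
    chi = [1 if p >= own_payoff else 0 for p in neigh_payoffs]        # eq. (2)
    denom = own_payoff + sum(c * p for c, p in zip(chi, neigh_payoffs))
    if denom == 0:
        # Every candidate earned nothing: no basis for imitation, keep the strategy.
        return [0.0] * len(neigh_payoffs), 1.0
    probs = [c * p / denom for c, p in zip(chi, neigh_payoffs)]       # eq. (1)
    return probs, own_payoff / denom


# Example: a focal individual earning 2 with neighbours earning 1, 2, 3 and 4.
probs, keep = imitation_probabilities(2.0, [1.0, 2.0, 3.0, 4.0])
```

In the example, the neighbor earning 1 is excluded by $\chi$, and the remaining weights give probabilities 2/11 of keeping the strategy and 2/11, 3/11 and 4/11 of imitating the neighbors earning 2, 3 and 4.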

3 Results and Discussion of Simulation Experiments

To investigate the influence of spatial structure on cooperative behavior, we first apply classical mean field theory [26], which ignores the network topology, to obtain a preliminary prediction of the density of cooperators $\rho_C$. In this well-mixed case, the average returns of cooperators can be expressed as:

$$P_C = k\,\rho_C \tag{3}$$

where $k$ is the degree of the regular network. In the same way, the average returns of defectors can be expressed as:

$$P_D = k\,b\,\rho_C \tag{4}$$

where $b$ characterizes the temptation to defect.

Following the monte carlo rule, we obtain the differential equation of $\rho_C$:

$$\frac{d\rho_C}{dt} = (1 - \rho_C)\, W_{D \to C} - \rho_C\, W_{C \to D} \tag{5}$$

where $W_{D \to C}$ indicates the probability that a defector transforms into a cooperator, $W_{C \to D}$ indicates the probability that a cooperator transforms into a defector, and $t$ is the game round.

Since $P_D = b P_C > P_C$ for $b > 1$, no cooperator ever earns at least as much as a defector in the well-mixed case, so defectors never imitate cooperators ($W_{D \to C} = 0$). On the basis of the differential equation, the density of cooperators ($\rho_C$) therefore decreases monotonically as the game progresses until it approaches 0, that is, all the cooperators go extinct (dotted line in Fig. 1). However, on spatial regular networks, cooperators can survive in the form of clusters where they get support from peers (solid line in Fig. 1), reaffirming the positive role of spatial structure in cooperative behavior [16].
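The mean-field argument can be checked numerically. The sketch below integrates eq. (5) with forward Euler steps. The expression used for $W_{C \to D}$ is an assumption: it applies the roulette-wheel reading of eq. (1) to a cooperator whose $k$ neighbors consist, on average, of $k\rho_C$ cooperators earning $P_C$ and $k(1-\rho_C)$ defectors earning $P_D$. The function names, step size and number of steps are arbitrary choices for illustration.

```python
from typing import List, Tuple


def mean_field_rates(rho: float, b: float, k: int = 4) -> Tuple[float, float]:
    """Hypothetical mean-field transition probabilities (W_DC, W_CD)."""
    if b <= 1:
        raise ValueError("this sketch assumes b > 1")
    if rho <= 0.0:
        return 0.0, 0.0
    p_c = k * rho          # eq. (3)
    p_d = k * b * rho      # eq. (4), always larger than p_c when b > 1
    # Defector: no cooperating neighbour earns >= p_d, so chi = 0 and W_DC = 0.
    w_dc = 0.0
    # Cooperator: roulette over itself, k*rho cooperating and k*(1 - rho)
    # defecting neighbours (all earn >= p_c), each weighted by its return.
    denom = p_c + k * rho * p_c + k * (1 - rho) * p_d
    w_cd = k * (1 - rho) * p_d / denom
    return w_dc, w_cd


def integrate_mean_field(rho0: float, b: float, k: int = 4,
                         dt: float = 0.1, steps: int = 200) -> List[float]:
    """Forward-Euler integration of eq. (5): d rho/dt = (1 - rho) W_DC - rho W_CD."""
    rho, traj = rho0, [rho0]
    for _ in range(steps):
        w_dc, w_cd = mean_field_rates(rho, b, k)
        rho += dt * ((1 - rho) * w_dc - rho * w_cd)
        traj.append(rho)
    return traj
```

Starting from $\rho_C = 0.5$ at $b = 1.10$, the trajectory decreases at every step toward 0, matching the qualitative mean-field behavior described above.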

Figure 1: Numerical simulation of the cooperator density under the monte carlo rule (solid line) at b=1.10. The dotted line represents the well-mixed case obtained by mean field theory with the same parameters. The data are simulated on a 100x100 grid with 50% cooperators and 50% defectors randomly distributed at the beginning, and 1000 simulations are averaged in each case.

Update rule | Information utilization | Tolerates irrational behavior
unconditional imitation | full | no
replicator dynamics | partial | yes
fermi | partial | yes
monte carlo | full | yes
Table 1: A brief comparison of the four update rules in this paper.

Like the unconditional imitation rule, and unlike the fermi rule and the replicator dynamics rule, the monte carlo rule makes full use of the game information (Table 1). That is, during the strategy update phase, an individual collects game information from all of his neighbors to determine the most satisfactory strategy for the next round, instead of randomly selecting a single neighbor and deciding whether to imitate it. This behavior reflects the rigor of individuals. The individual's pursuit of higher returns is also vividly reflected in the monte carlo rule. Therefore, the game system under this rule strongly supports the emergence of cooperation. Figure 2 illustrates this: with the social population again $N = 10000$ and $b = 1.10$, cooperative behavior still spreads rapidly even though there is only one very small cooperative group in the middle of the network at t=0. The middle subgraph in Fig. 2 shows the trajectory of $\rho_C(t)$, and the other subgraphs display the distribution of cooperators and defectors at several key moments of the simulation.

At the beginning, the small cooperative group spreads cooperative behavior in clusters with irregular shapes. At the edges of the clusters, cooperators resist the temptation of the outside world through the support of their peers inside the clusters. Protected by these marginal cooperators, the individuals within the clusters remain loyal defenders of cooperation. This mutual support enables cooperators to survive in the society. As the game progresses, the propagation rate keeps increasing, reaching its peak at around t=500. The rate then slowly decreases to 0, and a dynamic balance is reached at around t=800. Clearly, driven by the pursuit of higher returns and by full game information, cooperative behavior quickly spreads throughout the whole network.
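A small lattice simulation can reproduce the spreading experiment qualitatively. The sketch below assumes synchronous updating, periodic boundaries and the roulette-wheel reading of eqs. (1)-(2); the helper names are hypothetical, and the 30x30 grid, 4x4 cooperative seed and 50 rounds are chosen only to keep the run short, so it is not a replication of the 100x100 results in Fig. 2.

```python
import random
from typing import List

C, D = 1, 0  # strategy encoding assumed for this sketch


def neighbours(i: int, j: int, n: int):
    """Von Neumann neighbourhood (k = 4) with periodic boundaries (assumed)."""
    return [((i - 1) % n, j), ((i + 1) % n, j), (i, (j - 1) % n), (i, (j + 1) % n)]


def payoff_grid(grid: List[List[int]], b: float) -> List[List[float]]:
    """Accumulated returns for R = 1, T = b, S = P = 0."""
    n = len(grid)
    pay = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for x, y in neighbours(i, j, n):
                if grid[x][y] == C:  # only a cooperating opponent yields a return
                    pay[i][j] += 1.0 if grid[i][j] == C else b
    return pay


def mc_round(grid: List[List[int]], b: float, rng: random.Random) -> List[List[int]]:
    """One synchronous round of the (reconstructed) monte carlo rule."""
    n = len(grid)
    pay = payoff_grid(grid, b)
    new = [row[:] for row in grid]
    for i in range(n):
        for j in range(n):
            own = pay[i][j]
            options, weights = [grid[i][j]], [own]   # keeping the strategy is an option
            for x, y in neighbours(i, j, n):
                if pay[x][y] >= own:                 # chi(P_x, P_y) = 1
                    options.append(grid[x][y])
                    weights.append(pay[x][y])
            if sum(weights) > 0:                     # otherwise no information: keep
                new[i][j] = rng.choices(options, weights=weights, k=1)[0]
    return new


def simulate_from_seed(n: int = 30, seed_size: int = 4, b: float = 1.10,
                       rounds: int = 50, seed: int = 1) -> List[float]:
    """Cooperator density rho_C(t) starting from a small central cooperative square."""
    rng = random.Random(seed)
    grid = [[D] * n for _ in range(n)]
    start = n // 2 - seed_size // 2
    for i in range(start, start + seed_size):
        for j in range(start, start + seed_size):
            grid[i][j] = C
    history = [sum(map(sum, grid)) / (n * n)]
    for _ in range(rounds):
        grid = mc_round(grid, b, rng)
        history.append(sum(map(sum, grid)) / (n * n))
    return history
```

In the first round no member of the seed can switch, because every neighboring defector earns only $b = 1.10$, less than the 2 to 4 earned by seed members, so $\rho_C$ cannot drop in that round.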

[Figure 2 panels: (a) t=0, (b) t=20, (c) t=100, (d) t=1000, (e) trajectory of $\rho_C(t)$, (f) t=200, (g) t=800, (h) t=600, (i) t=400]
Figure 2: Snapshots of the spatial distribution of cooperators (white boxes) and defectors (black boxes) at t=0, 20, 100, 200, 400, 600, 800 and 1000 when there is only one very small cooperative group in the center of the grid at the initial game moment. The middle subgraph shows the trajectory of $\rho_C(t)$, and $b$ is again set to 1.10.

Meanwhile, the monte carlo rule retains the tolerance of the fermi rule and the Moran process toward irrational behavior [27]. It treats an individual's game returns as his fitness in the society: the higher the game returns, the better the individual is adapted to the society. In general, the monte carlo rule is an organic combination of the current dynamic rules of individual strategy adjustment, so it not only makes full use of information but also reflects the individual's bounded rational behavior and the ambivalence between the pursuit of high returns and high risks.

Further, we investigate the role of update rules in the prisoner’s dilemma game on the grid. In this paper, we mainly compare three classic update rules with the monte carlo rule: the unconditional imitation rule, the replicator dynamics rule and the fermi rule. We briefly outline them here:

1. Unconditional imitation rule: during the evolutionary process, an individual compares his own returns with those of all his neighbors and adopts the strategy with the highest returns for the next round of the game.

2. Replicator dynamics rule: during the evolutionary process, individual $x$ randomly chooses a neighbor $y$ for returns comparison. If $y$’s game returns $P_y$ are greater than his own game returns $P_x$, individual $x$ imitates $y$’s strategy in the next game round with probability

$$W(x \leftarrow y) = \frac{P_y - P_x}{D \cdot \max(k_x, k_y)} \tag{6}$$

where $P_x$ and $P_y$ represent the returns of individuals $x$ and $y$ in the previous game round, and $k_x$ and $k_y$ are their numbers of neighbors. The parameter $D$ is the difference between the largest and the smallest entries of the payoff matrix (i.e. $D = b$).

3. Fermi rule: during the evolutionary process, individual $x$ randomly chooses a neighbor $y$ for returns comparison and imitates $y$’s strategy with a probability that depends on the difference between their returns:

$$W(x \leftarrow y) = \frac{1}{1 + \exp\left[(P_x - P_y)/K\right]} \tag{7}$$

where $K$ represents the noise effect, reflecting the rationality of individuals.
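For comparison, the imitation probabilities of eqs. (6) and (7) can be written as two small functions. This is a sketch with hypothetical function names: $D = b$ follows from the payoff matrix with entries 1, 0, $b$, 0, the default $K = 0.0625$ is the value adopted in this paper for the fermi rule, and the overflow guard is an implementation detail added here.

```python
import math


def replicator_probability(p_x: float, p_y: float, k_x: int, k_y: int, b: float) -> float:
    """Eq. (6): probability that x copies the randomly chosen neighbour y.

    D = b is the spread between the largest (b) and smallest (0) entries of the
    matrix R = 1, T = b, S = P = 0; imitation only happens when P_y > P_x.
    """
    if p_y <= p_x:
        return 0.0
    return (p_y - p_x) / (b * max(k_x, k_y))


def fermi_probability(p_x: float, p_y: float, noise: float = 0.0625) -> float:
    """Eq. (7): Fermi imitation probability with noise K (default 0.0625)."""
    z = (p_x - p_y) / noise
    if z > 700.0:  # math.exp would overflow; the probability is effectively 0
        return 0.0
    return 1.0 / (1.0 + math.exp(z))
```

With $K = 0.0625$, a neighbor that earns just one unit more is imitated with probability above 0.999, so the fermi rule behaves almost deterministically at this noise level while the replicator rule stays proportional to the return gap.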

Figure 3: Numerical simulation of cooperator density under monte carlo rule (solid line), replicator dynamics rule (circles), fermi rule (triangles) and unconditional imitation rule (squares) on the grid at b=1.10. 1000 simulations are averaged in each case.

We take $K = 0.0625$ in the fermi rule because, guided by mean field theory, this value makes the track of $\rho_C$ closest to that under the monte carlo rule, which makes the comparison of cooperation levels fairer. Since individuals cannot judge which of the two strategies is better at the beginning, their initial strategies are decided by a coin toss; that is, cooperators and defectors each occupy 50% of the grid. The results are shown in Fig. 3. During the PD game, $\rho_C$ quickly stabilizes. We observe that the aggregate cooperation level is largely elevated under the unconditional imitation rule or the monte carlo rule when compared with the fermi rule and the replicator dynamics rule. Clearly, update rules play an important role in evolutionary theory [19].

[Figure 4 panels: rows (a) R=4%, (b) R=6%, (c) R=11%; columns: invasion onset, UI equilibrium, MC equilibrium]
Figure 4: The influence of defector invasions on the society. White boxes represent cooperators and black boxes represent defectors. Each row corresponds to a different proportion $R$ of invading defectors, around 4%, 6% and 11% of the society, respectively. The first column shows the strategy distribution when the invasion occurs, the second column shows the game equilibrium under the unconditional imitation rule (UI), and the third column shows the game equilibrium under the monte carlo rule (MC). $N$ is set to 10000 and $b$ is set to 1.10.

Although the cooperation level under the unconditional imitation rule is higher than that under the monte carlo rule, its mechanism of imitating the best without any hesitation makes individuals ignore the risks behind high returns, so there is reason to suspect that it is not robust. To test this, we consider an extreme situation: all the individuals on the grid are cooperators, and at some point a group deliberately switches to defection. We then simulate enough time steps for the society to accommodate the defector invasion. The results are shown in Fig. 4, where each row corresponds to a different number of invading defectors (around 4%, 6% and 11% of the society), the first column shows the strategy distribution when the invasion occurs, and the second and third columns show the equilibrium under the unconditional imitation rule and the monte carlo rule, respectively. Under unconditional imitation, the invaders' neighbors imitate the defective strategy without hesitation for higher returns, forming a defective core that can no longer update its strategy and significantly reducing the cooperation level. Under the monte carlo rule, however, individuals not only seek high returns but also weigh the high risks behind them. So no matter how serious the invasion, cooperation reaches a balance over time rather than going extinct. The robustness of the monte carlo rule is therefore clearly better than that of the unconditional imitation rule.
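The invasion setup can be sketched as follows. The paper does not specify the shape of the invading group, so the centred square block of defectors used here, and the helper names, are assumptions; because the block side is rounded, the realized fraction only approximates the target.

```python
import math
from typing import List

C, D = 1, 0  # strategy encoding assumed for this sketch


def invaded_grid(n: int, fraction: float) -> List[List[int]]:
    """All-cooperator n x n grid with a centred square block of defectors whose
    area approximates `fraction` of the sites (hypothetical invasion shape)."""
    side = max(1, round(math.sqrt(fraction * n * n)))
    start = (n - side) // 2
    grid = [[C] * n for _ in range(n)]
    for i in range(start, start + side):
        for j in range(start, start + side):
            grid[i][j] = D
    return grid


def defector_fraction(grid: List[List[int]]) -> float:
    """Fraction of sites currently playing D."""
    return sum(row.count(D) for row in grid) / (len(grid) ** 2)
```

For $N = 10000$ this gives exactly 4% defectors for $R = 4\%$, and 5.76% and 10.89% for the 6% and 11% cases.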

Next, we investigate the fraction of cooperators ($\rho_C$) as a function of the temptation parameter ($b$) under the monte carlo rule on the grid, since $\rho_C(b)$ is a meaningful quantity in evolutionary games. The results are shown in Fig. 5(a). Under the monte carlo rule, the society sustains cooperation when the temptation is low. As $b$ increases, this balanced pattern breaks down: at the cluster borders, internal support can no longer compensate for the losses caused by defectors, and the clusters collapse layer by layer until a new balance appears. The cooperators ultimately go extinct at $b = 1.31$, because the huge temptation prevents any cluster from overcoming the losses on its border.

[Figure 5 panels: (a) fits of $\rho_C(b)$; (b) returns comparison]
Figure 5: (a) Power-law fit (solid line), quadratic fit (pluses) and trigonometric fit (squares) of the curve $\rho_C(b)$. For easy comparison, the original data are also shown as gray dots. (b) The relationship between the average returns of individuals and the temptation parameter ($b$). 1000 simulations are averaged in each case.

According to the work of Szabo and Toke [20], the cooperator density in the prisoner’s dilemma game under the fermi rule on the grid follows a power-law behavior, $\rho_C \propto (b_c - b)^{\beta}$, in which $b_c$ is the threshold for the disappearance of cooperation: once $b > b_c$, all the cooperators go extinct. To compare fits, we use the goodness of fit ($R^2$) for nonlinear regression and the root mean squared error ($RMSE$):

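$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{8}$$
$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \tag{9}$$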

where $y$ represents the original data, $\hat{y}$ the fitted data, $\bar{y}$ the mean of $y$, and $n$ the length of $y$. We find that the power-law fit, which performs very well under the fermi rule, is unsatisfactory under the monte carlo rule, as can be seen in Fig. 5(a). The solid line represents the best power-law fit, with $b_c = 1.31$ and $\beta = 0.923$. After further inspection of the original data, we choose quadratic and trigonometric curves to fit them. Their expressions are, respectively:

$$\rho_C = a_1 b^2 + a_2 b + a_3, \qquad \rho_C = c_1 \sin(c_2 b + c_3) + c_4 \tag{10}$$

The fitting results are also shown in Fig. 5(a), and the fit statistics are listed in Table 2. The pluses represent the quadratic fit, with optimal parameters $a_1 = -2.7363$, $a_2 = 4.8644$ and $a_3 = -1.7205$. The squares represent the trigonometric fit, with optimal parameters $c_1 = -0.2568$, $c_2 = 7.2462$, $c_3 = -2.5603$ and $c_4 = 0.1314$. Both fits outperform the power-law fit, with a smaller $RMSE$ and a larger $R^2$. Furthermore, the trigonometric fit is better than the quadratic fit, so we conclude that the cooperator density follows a trigonometric behavior under the monte carlo rule on the grid, that is, $\rho_C = c_1 \sin(c_2 b + c_3) + c_4$.
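To see how the two fitted curves behave, the sketch below evaluates them with the parameter values reported above and locates their zero crossings by bisection. The function names and the bisection helper are illustrative, and the form $c_1 \sin(c_2 b + c_3) + c_4$ is a reconstruction of the expression lost in extraction, so treat the numbers as a consistency check rather than a result of the paper.

```python
import math


def quadratic_fit(b: float) -> float:
    """Quadratic fit of rho_C(b) with the optimal parameters reported in the text."""
    return -2.7363 * b ** 2 + 4.8644 * b - 1.7205


def trig_fit(b: float) -> float:
    """Trigonometric fit, assuming the form c1 * sin(c2 * b + c3) + c4."""
    return -0.2568 * math.sin(7.2462 * b - 2.5603) + 0.1314


def zero_crossing(f, lo: float, hi: float, tol: float = 1e-9) -> float:
    """Bisection search for a sign change of f on [lo, hi]."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("f does not change sign on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid               # the sign change lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid  # the sign change lies in [mid, hi]
    return 0.5 * (lo + hi)
```

Both fits give $\rho_C$ of about 0.32 to 0.33 at $b = 1.10$ and cross zero near $b \approx 1.29$, slightly below the observed extinction point $b = 1.31$.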

Fitting method | RMSE | Goodness of fit ($R^2$)
power-law | 0.2608 | 0.9052
quadratic | 0.0168 | 0.9356
trigonometric | 0.0149 | 0.9428
Table 2: Fit statistics of the three fitting methods at their optimal parameters.
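Equations (8) and (9) translate directly into code. This standard-library sketch uses hypothetical function names and the usual definition of $R^2$ as one minus the ratio of the residual to the total sum of squares.

```python
import math
from typing import Sequence


def rmse(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Eq. (9): root mean squared error between data y and fitted values y_hat."""
    n = len(y)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(y, y_hat)) / n)


def r_squared(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Eq. (8): 1 - (residual sum of squares) / (total sum of squares)."""
    mean = sum(y) / len(y)
    ss_res = sum((a - f) ** 2 for a, f in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot
```

For example, with $y = (1, 2, 3, 4)$ and $\hat{y} = (1, 2, 3, 5)$ the RMSE is 0.5 and $R^2$ is 0.8.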

We have also examined the relationship between the average returns of individuals and the temptation parameter $b$; the results are shown in Fig. 5(b). Overall, cooperators earn higher average returns than defectors, so cooperation is clearly the better strategy. Notably, the average returns of cooperators are insensitive to $b$. More surprisingly, the returns of defectors decrease as $b$ grows, because large aggregations of defectors greatly reduce the success of defection. Cooperation thus makes individuals better off in society.
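The quantity plotted in Fig. 5(b) is the mean accumulated return per cooperator and per defector. A minimal sketch of that measurement, assuming a strategy grid encoded with 1 for C and 0 for D on a periodic lattice with $k = 4$ (both assumptions of this sketch, as is the function name), could look like this:

```python
from typing import List, Optional, Tuple

C, D = 1, 0  # strategy encoding assumed for this sketch


def average_returns(grid: List[List[int]], b: float) -> Tuple[Optional[float], Optional[float]]:
    """Mean accumulated return of cooperators and of defectors on a periodic
    lattice with k = 4, for the matrix R = 1, T = b, S = P = 0.
    A mean is None when that strategy is absent."""
    n = len(grid)
    totals = {C: 0.0, D: 0.0}
    counts = {C: 0, D: 0}
    for i in range(n):
        for j in range(n):
            s = grid[i][j]
            nbrs = [grid[(i - 1) % n][j], grid[(i + 1) % n][j],
                    grid[i][(j - 1) % n], grid[i][(j + 1) % n]]
            # Only cooperating neighbours pay off: 1 each for C, b each for D.
            totals[s] += nbrs.count(C) * (1.0 if s == C else b)
            counts[s] += 1
    mean_c = totals[C] / counts[C] if counts[C] else None
    mean_d = totals[D] / counts[D] if counts[D] else None
    return mean_c, mean_d
```

On a 4x4 lattice of cooperators with a single defector, the defector earns $4b$ while cooperators average 56/15 (about 3.73), since the four cooperators next to the defector earn 3 instead of 4.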

[Figure 6 panels: (a) sensitivity of $\rho_C$ to $\rho_C(0)$; (b) sensitivity of $\rho_C$ to $N$]
Figure 6: (a) Numerical simulation results for the density of cooperators ($\rho_C$) as the game progresses for different initial densities of cooperators ($\rho_C(0)$) on the grid at $b = 1.10$. From bottom to top, $\rho_C(0)$ is set to 0.2, 0.4, 0.6, 0.8 and 0.99, respectively. (b) Numerical simulation results for the density of cooperators ($\rho_C$) at equilibrium for different social populations ($N$) at $b = 1.10$. 3000 simulations are averaged in each case.

In the investigation above, we chose 50% as the initial density of cooperators ($\rho_C(0)$). Here we illustrate the effect of different values of $\rho_C(0)$ on the cooperator density at equilibrium. Figure 6(a) shows the fraction of cooperators $\rho_C(t)$ over time for different initial cooperator densities. The data are averaged over 3000 repetitions of the same experiment on a 100x100 grid ($N = 10000$) at $b = 1.1$. As we see, different values of $\rho_C(0)$ only appreciably affect the time it takes for the game to reach equilibrium, not the fraction of cooperators at equilibrium. Thus the fraction of cooperators at equilibrium is insensitive to the initial fraction of cooperators.

Finally, we investigate the effect of the social population ($N$) on the cooperation level. We fix $b$ (at 1.10, as in Fig. 6(b)) and the initial density of cooperators, and vary the social population to obtain the corresponding fraction of cooperators at equilibrium. The results are shown in Fig. 6(b). Above a certain population size, $\rho_C$ does not depend on the social population; below it, $\rho_C$ decreases as $N$ becomes smaller. A highly cooperative society therefore depends on a sufficiently large population.

4 Conclusions

We introduce a new dynamic rule of individual strategy adjustment, the monte carlo rule, and investigate the prisoner’s dilemma game under it on the grid. The monte carlo rule promotes cooperative behavior more than the replicator dynamics rule and the fermi rule, and it is considerably more robust against defector invasions than the unconditional imitation rule. Under this rule, spatial structure plays a positive role in cooperative behavior, and the equilibrium density of cooperators as a function of the temptation to defect is better characterized by a trigonometric behavior than by the power-law behavior found in pioneering work under the fermi rule. Society clearly favors cooperation: cooperators obtain higher and more stable returns than defectors throughout the whole temptation parameter range. In addition, the cooperation level is insensitive to the initial density of cooperators, but a sufficiently large population is needed to maintain a high cooperation level.

References

  • (1) J.W. Weibull, Evolutionary Game Theory, MIT Press, Cambridge, MA (1995).
  • (2) W.D. Hamilton, J. Theor. Biol. 7, 17 (1964).
  • (3) E. Fehr, U. Fischbacher, Nature (London) 425, 785 (2003).
  • (4) R. Axelrod, W.D. Hamilton, Science 211, 1390 (1981).
  • (5) R. Axelrod, The Evolution of Cooperation, Basic Books, New York (1984).
  • (6) L.A. Dugatkin, Cooperation Among Animals, Oxford University Press, Oxford (UK) (1997).
  • (7) J. von Neumann, O. Morgenstern, Theory of Games and Economic Behavior, Princeton University Press (1944).
  • (8) J. Maynard Smith, G. Price, Nature 246, 15 (1973).
  • (9) D. Fudenberg, D.K. Levine, The Theory of Learning in Games, The MIT Press (1998).
  • (10) A. Rapoport, A.M. Chammah, Prisoner’s Dilemma, University of Michigan Press, Ann Arbor (1970).
  • (11) J. Vukov, G. Szabo, A. Szolnoki, Cooperation in the noisy case: Prisoner’s dilemma game on two types of regular random graphs, Phys. Rev. E 73, 067103 (2006).
  • (12) C. Hauert, G. Szabo, Game theory and physics, Am. J. Phys. 73, 405 (2005).
  • (13) M. Tomassini, E. Pestelacci, L. Luthi, Social Dilemmas and Cooperation in Complex Networks, International Journal of Modern Physics C 18, 1173-1185 (2007).
  • (14) M. Sysi-Aho, J. Saramaki, J. Kertesz, K. Kaski, Spatial snowdrift game with myopic agents, Eur. Phys. J. B 44, 129-135 (2005).
  • (15) C. Hauert, M. Doebeli, Spatial structure often inhibits the evolution of cooperation in the snowdrift game, Nature 428, 643-646 (2004).
  • (16) P.P. Li, J.H. Ke, Z.Q. Lin, Cooperative behavior in evolutionary snowdrift games with the unconditional imitation rule on regular lattices, Phys. Rev. E 85, 021111 (2012).
  • (17) M.A. Nowak, R.M. May, Evolutionary games and spatial chaos, Nature 359, 826-829 (1992).
  • (18) C.Y. Xia, J. Wang, L. Wang, S.W. Sun, J.Q. Sun, J.S. Wang, Role of update dynamics in the collective cooperation on the spatial snowdrift games: Beyond unconditional imitation and replicator dynamics, Chaos, Solitons & Fractals 45, 1239-1245 (2012).
  • (19) P.A.P. Moran, The Statistical Processes of Evolutionary Theory, Clarendon Press, Oxford, UK (1962).
  • (20) G. Szabo, C. Toke, Evolutionary prisoner’s dilemma game on a square lattice, Phys. Rev. E 58, 69-73 (1998).
  • (21) D.X. Huang, Game Analysis of Demand Side Evolution of Green Building Based on Revenue-Risk, Journal of Civil Engineering (Chinese edition), 50(2), 110-118 (2017).
  • (22) L.J. He, L.P. Yang, Travel agency income Nash equilibrium model influenced by risk preference, Guangdong University of Technology (Chinese edition), 36 (3), 56-67 (2019).
  • (23) J.C. Coultas, When in Rome: An evolutionary perspective on conformity, Group Processes & Intergroup Relations 7(4), 317-331 (2004).
  • (24) C. Efferson, R. Lalive, P.J. Richerson, et al., Conformists and mavericks: the empirics of frequency-dependent cultural transmission, Evolution and Human Behavior 29(1), 56-64 (2008).
  • (25) M.A. Nowak, R.M. May, Int. J. Bifurcation Chaos Appl. Sci. Eng. 3, 35 (1993).
  • (26) Z.X. Wu, Complex network and evolutionary game research on it (Chinese edition), Lanzhou: Lanzhou University, 2007.
  • (27) E. Lieberman, C. Hauert, M.A. Nowak, Evolutionary dynamics on graphs, Nature 433(7023), 312-316 (2005).