Cooperation among selfish individuals, such as kin cooperation, mutual cooperation and reputation-seeking cooperation, appears to run contrary to natural selection, yet it is widely observed in human society [4-6] and has become one of the central challenges of evolutionary game theory [7-9].
This phenomenon can be modeled by the prisoner's dilemma game, in which each individual adopts one of two strategies: C (cooperation) or D (defection). A selfish individual would choose D to obtain a higher return. Nevertheless, if both sides choose D, each receives less than it would from mutual cooperation.
Over the years, scholars have focused mainly on how different spatial structures promote cooperation [11-13]. Relatively few studies, however, address the dynamic rules by which individuals adjust their strategies, even though different update rules often lead to different results. For example, Sysi-Aho et al. proposed a more rational update rule building on the work of Hauert and Doebeli, assuming that individuals possess basic intelligence in the form of a local decision-making rule that determines their strategies. Under this rule, each individual assumes that its neighbors' strategies remain unchanged and chooses the strategy that maximizes its immediate return. The resulting equilibrium density of cooperators differs greatly from that obtained under the replicator dynamics rule, and cooperation persists over the whole range of the temptation parameter. Li et al. replaced replicator dynamics with the unconditional imitation rule used by Nowak and May, owing to its simplicity and its agreement with the psychology of most individuals, and found that in some parameter ranges the original inhibition of cooperative behavior turns into promotion. Xia et al. compared cooperation under the unconditional imitation rule, the replicator dynamics rule and the Moran process, and found that the Moran process promotes cooperation much more than the other two. These findings underline the importance of update rules in the evolution of cooperative behavior.
In existing research, the dynamic rules for individual strategy adjustment are mainly the unconditional imitation rule, the replicator dynamics rule and the Fermi rule. However, the replicator dynamics and Fermi rules cannot make full use of the available game information, while the unconditional imitation rule cannot accommodate individuals' irrational behavior. An update rule that models individual strategy updating more faithfully is therefore needed.
In this paper we propose the Monte Carlo rule, which combines features of the current dynamic rules of strategy adjustment: it makes full use of the available information while also reflecting individuals' bounded rationality and the tension between pursuing high returns and avoiding high risk [21,22]. It also reflects individuals' behavioral execution preferences [23,24]. We analyze the cooperation level under the Monte Carlo rule and verify its robustness. We then analyze the equilibrium density of cooperators as a function of the temptation to defect and characterize it with a trigonometric curve, which fits better than the power law reported in earlier work. We also investigate how the temptation to defect affects the average returns of cooperators and defectors. Finally, numerical simulations show that the cooperation level is insensitive to the initial density of cooperators but sensitive to the size of the population.
2 Prisoner's Dilemma Game Model with Monte Carlo Rule
The prisoner's dilemma is a class of games whose only Nash equilibrium is mutual defection. Each individual adopts one of two strategies, C (cooperation) or D (defection), and its return depends on the strategies of both sides. The large temptation drives rational individuals to defect; however, if both sides defect, each receives less than under mutual cooperation.
Individuals are located on the nodes of a square grid, and the edges connect each individual to its neighbors. Following Nowak et al., the game returns are simplified to the payoff matrix (entries are the return of the row player)

$$\begin{array}{c|cc} & C & D \\ \hline C & 1 & 0 \\ D & b & 0 \end{array}$$

where the parameter $b$ measures the temptation to defect ($b > 1$). The larger $b$ is, the stronger the temptation to defect.
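As an illustration of how these returns accumulate on the grid, the following minimal NumPy sketch computes every individual's total return from one game with each of its four nearest neighbors. It assumes periodic boundary conditions and the encoding 1 = C, 0 = D, neither of which is stated in the text; `payoff_matrix` and `lattice_payoffs` are hypothetical names.

```python
import numpy as np

# Encoding assumed for this sketch: 1 = cooperator (C), 0 = defector (D).

def payoff_matrix(b: float) -> np.ndarray:
    """Weak prisoner's dilemma: R = 1, T = b, S = P = 0.

    Indexed as m[focal_strategy, opponent_strategy] with 0 = D and 1 = C.
    """
    return np.array([[0.0, b],     # D meets D -> 0, D meets C -> b
                     [0.0, 1.0]])  # C meets D -> 0, C meets C -> 1


def lattice_payoffs(s: np.ndarray, b: float) -> np.ndarray:
    """Total return of every site from one game with each of its k = 4
    nearest neighbors, assuming periodic boundary conditions."""
    m = payoff_matrix(b)
    total = np.zeros(s.shape)
    # Shift the strategy grid down, up, right and left to visit all 4 neighbors.
    for shift, axis in [(1, 0), (-1, 0), (1, 1), (-1, 1)]:
        neighbor = np.roll(s, shift, axis=axis)
        total += m[s, neighbor]  # element-wise lookup m[s[x, y], neighbor[x, y]]
    return total


rng = np.random.default_rng(0)
s = rng.integers(0, 2, size=(100, 100))  # coin-toss initial strategies
P = lattice_payoffs(s, b=1.1)
```

With this matrix a cooperator's return is simply the number of its cooperating neighbors, and a defector's return is $b$ times that number.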
Individuals gain returns by playing the prisoner's dilemma game with their nearest neighbors. During the evolutionary process, each individual $i$ either keeps its original strategy or, by roulette-wheel selection, adopts the strategy of one of its neighbors whose return is greater than or equal to its own. The probability $W_{i \leftarrow j}$ that individual $i$ imitates neighbor $j$ is determined by $i$'s own return $P_i$ and the returns $P_1, \dots, P_k$ of its nearest neighbors, where $k$ is the number of nearest neighbors on the regular lattice ($k = 4$ in this paper). Only neighbors whose returns are at least $P_i$ take part in the roulette, which is expressed by the characteristic function

$$\chi(P_j) = \begin{cases} 1, & P_j \ge P_i, \\ 0, & P_j < P_i, \end{cases}$$

where $P_i$ is the return of individual $i$ and $P_j$ ($j = 1, \dots, k$) is the return of its neighbor $j$.
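The roulette-wheel update described above can be sketched for a single individual as follows. The exact weighting is not reproduced in the text, so this sketch assumes that each candidate's share of the wheel is proportional to its return, i.e. $W_{i\leftarrow j} = P_j\,\chi(P_j) / \big(P_i + \sum_{l=1}^{k} P_l\,\chi(P_l)\big)$, with the focal individual keeping its strategy with weight $P_i$. The function name `monte_carlo_update` is hypothetical.

```python
import random
from typing import Sequence


def monte_carlo_update(own_payoff: float, own_strategy: int,
                       neighbor_payoffs: Sequence[float],
                       neighbor_strategies: Sequence[int],
                       rng: random.Random) -> int:
    """One roulette-wheel draw for a single individual (a sketch).

    Assumptions: slots on the wheel are proportional to returns; the focal
    individual keeps its strategy with weight equal to its own return; only
    neighbors whose return is >= its own (chi = 1) join the wheel.
    """
    candidates = [(own_payoff, own_strategy)]
    for p, s in zip(neighbor_payoffs, neighbor_strategies):
        if p >= own_payoff:  # characteristic function chi(P_j) = 1
            candidates.append((p, s))
    total = sum(p for p, _ in candidates)
    if total <= 0:
        # Every candidate earned nothing: no preference, keep the current strategy.
        return own_strategy
    r = rng.uniform(0, total)
    acc = 0.0
    for p, s in candidates:
        acc += p
        if r < acc:  # zero-weight candidates can never be selected
            return s
    return candidates[-1][1]  # guard against floating-point round-off
```

Returns are non-negative in this game, so every weight is valid. Because all neighbors are inspected at once, the rule uses the full local information, while the randomness of the draw still allows a lower-earning (but not lower than own) strategy to be chosen.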
3 Simulation Results and Discussion
To examine the influence of spatial structure on cooperative behavior, we first apply classical mean-field theory, which ignores the network topology, to predict the density of cooperators $\rho$. In this approximation, the average return of cooperators is

$$P_C = k\rho,$$

where $k$ is the number of nearest neighbors on the regular lattice. Likewise, the average return of defectors is

$$P_D = kb\rho,$$

where $b$ is the temptation to defect. Under the Monte Carlo rule, the density of cooperators then obeys

$$\frac{d\rho}{dt} = (1-\rho)\,W_{D\to C} - \rho\,W_{C\to D},$$

where $W_{D\to C}$ is the probability that a defector becomes a cooperator, $W_{C\to D}$ is the probability that a cooperator becomes a defector, and $t$ is the game round.
Because the mean-field return of a defector is $b > 1$ times that of a cooperator, a defector never finds a cooperator with an equal or higher return to imitate, so the gain term vanishes and the density of cooperators can only fall. Accordingly, as the game progresses, $\rho$ decreases monotonically toward 0, that is, all cooperators eventually die out (dotted line in Fig. 1). On the regular lattice, however, cooperators can survive in clusters where they receive support from their peers (solid line in Fig. 1), confirming the positive role of spatial structure in cooperative behavior.
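The mean-field argument can be checked numerically. This sketch assumes the payoff-proportional roulette form (an assumption, since the exact weighting is not reproduced in the text) and replaces neighbor counts by their mean-field expectations: a cooperator has on average $k\rho$ cooperating and $k(1-\rho)$ defecting neighbors. This gives $W_{C\to D} = kb(1-\rho)/\big(1 + k\rho + kb(1-\rho)\big)$ and $W_{D\to C} = 0$; forward-Euler integration then shows $\rho$ decaying toward zero. All function names are hypothetical.

```python
def w_c_to_d(rho: float, b: float, k: int = 4) -> float:
    """Mean-field probability that a cooperator switches to defection.

    Assumes payoff-proportional roulette: the cooperator keeps its own slot
    (weight P_C = k*rho), its k*rho cooperating neighbors each add P_C, and its
    k*(1 - rho) defecting neighbors each add P_D = k*b*rho.
    """
    return k * b * (1 - rho) / (1 + k * rho + k * b * (1 - rho))


def w_d_to_c(rho: float, b: float, k: int = 4) -> float:
    """P_C = k*rho < P_D = k*b*rho when b > 1, so no cooperator ever enters a
    defector's roulette wheel and defectors never switch."""
    return 0.0


def integrate_mean_field(rho0: float, b: float, k: int = 4,
                         t_max: float = 50.0, dt: float = 0.01) -> list[float]:
    """Forward-Euler integration of d(rho)/dt = (1 - rho) W_DC - rho W_CD."""
    rho = rho0
    trajectory = [rho]
    for _ in range(int(round(t_max / dt))):
        drho = (1 - rho) * w_d_to_c(rho, b, k) - rho * w_c_to_d(rho, b, k)
        rho = max(rho + dt * drho, 0.0)
        trajectory.append(rho)
    return trajectory


traj = integrate_mean_field(rho0=0.5, b=1.1)
print(f"rho(0) = {traj[0]:.3f}, rho(t_max) = {traj[-1]:.2e}")
```

Since $W_{C\to D}$ stays strictly positive for $\rho < 1$, the decay is monotone, in line with the dotted mean-field curve described above.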
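Table 1.

| Update rule | Information utilization | Irrationality |
|---|---|---|
| Unconditional imitation | full | absent |
| Replicator dynamics | not full | present |
| Fermi rule | not full | present |
| Monte Carlo rule | full | present |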
Like the unconditional imitation rule, and unlike the Fermi and replicator dynamics rules, the Monte Carlo rule makes full use of the game information (Table 1). That is, during the strategy update phase, an individual collects game information from all of its neighbors to determine the most satisfactory strategy for the next round, instead of randomly selecting a single neighbor and deciding whether to imitate it. This behavior reflects the individual's thoroughness. The Monte Carlo rule also captures the individual's drive to increase its return. As a result, the game system under this rule strongly supports the emergence of cooperation. Figure 2 illustrates this: with a social population of $N = 10000$ and $b = 1.10$, even though there is only one very small cooperative group in the middle of the network at $t = 0$, cooperative behavior still spreads quickly. The central subgraph of Fig. 2 shows the trajectory of $\rho(t)$, and the other subgraphs show the distribution of cooperators and defectors at several key moments of the simulation. At first, the small cooperative group spreads cooperation in the form of irregularly shaped clusters. At the edges of the clusters, cooperators resist the temptation of the outside world through the support of their peers inside the clusters. Shielded by these marginal cooperators, the individuals inside the clusters remain steadfast cooperators. This mutual support enables cooperators to survive. As the game progresses, the rate of propagation increases, peaks at around $t = 500$, then slowly falls to 0, and the system reaches dynamic equilibrium at around $t = 800$. Driven by the pursuit of higher returns and by full use of the game information, cooperative behavior thus spreads quickly across the whole network.
Meanwhile, the Monte Carlo rule retains the tolerance for irrational behavior found in the Fermi rule and the Moran process. It treats an individual's game return as its fitness, so the higher the return, the better the individual is adapted to society. In this sense, the Monte Carlo rule combines the strengths of existing update rules: it makes full use of information while reflecting bounded rationality and the tension between pursuing high returns and avoiding high risk.
Next, we investigate the role of update rules in the prisoner's dilemma game on the grid by comparing the Monte Carlo rule with three classic update rules: the unconditional imitation rule, the replicator dynamics rule and the Fermi rule. We briefly outline them here:
1. Unconditional imitation rule: during the evolutionary process, each individual compares its own return with those of all its neighbors and adopts the strategy with the highest return for the next round.
2. Replicator dynamics rule: during the evolutionary process, individual $i$ randomly chooses a neighbor $j$ for comparison. If $j$'s return $P_j$ is greater than $i$'s own return $P_i$, individual $i$ imitates $j$'s strategy in the next round with probability

$$W_{i\leftarrow j} = \frac{P_j - P_i}{D\,\max(k_i, k_j)},$$

where $P_i$ and $P_j$ are the returns of individuals $i$ and $j$ in the previous round, $k_i$ and $k_j$ are their numbers of neighbors, and $D$ is the difference between the largest and smallest entries of the payoff matrix (i.e., $D = b$).
3. Fermi rule: during the evolutionary process, individual $i$ randomly chooses a neighbor $j$ for comparison and imitates $j$'s strategy with a probability that depends on the difference between their returns:

$$W_{i\leftarrow j} = \frac{1}{1 + \exp\left[(P_i - P_j)/K\right]},$$

where $K$ quantifies the noise arising from individuals' bounded rationality.
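For comparison with the Monte Carlo sketch, the three classical rules can be written as plain functions. These follow the descriptions in the list above; the function names and the tie-breaking choice in unconditional imitation (ties keep the current strategy; among equally best neighbors the first is taken) are assumptions of this sketch.

```python
import math
from typing import Sequence


def unconditional_imitation(own_payoff: float, own_strategy: int,
                            neighbor_payoffs: Sequence[float],
                            neighbor_strategies: Sequence[int]) -> int:
    """Adopt the strategy of the best-earning neighbor if it beats the focal
    individual; ties keep the current strategy (an assumption of this sketch)."""
    best = max(range(len(neighbor_payoffs)), key=lambda j: neighbor_payoffs[j])
    if neighbor_payoffs[best] > own_payoff:
        return neighbor_strategies[best]
    return own_strategy


def replicator_probability(p_i: float, p_j: float, k_i: int, k_j: int,
                           D: float) -> float:
    """Probability that i imitates a randomly chosen neighbor j:
    (P_j - P_i) / (D * max(k_i, k_j)) if P_j > P_i, otherwise 0."""
    if p_j <= p_i:
        return 0.0
    return (p_j - p_i) / (D * max(k_i, k_j))


def fermi_probability(p_i: float, p_j: float, K: float) -> float:
    """Fermi imitation probability 1 / (1 + exp((P_i - P_j) / K)),
    where K is the noise level."""
    return 1.0 / (1.0 + math.exp((p_i - p_j) / K))


# Example with b = 1.1, so D = b under the weak prisoner's dilemma matrix.
print(replicator_probability(1.0, 3.0, 4, 4, D=1.1))
print(fermi_probability(1.0, 3.0, K=0.0625))
```

In the weak prisoner's dilemma, returns on a $k = 4$ lattice lie between $0$ and $kb$, so the replicator probability never exceeds 1. Unlike the Monte Carlo rule, the replicator and Fermi rules each look at only one randomly chosen neighbor.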
We take $K = 0.0625$ in the Fermi rule because, with this value, the mean-field trajectory of $\rho$ is closest to that under the Monte Carlo rule, which makes the comparison of cooperation levels fairer. Since individuals cannot judge which of the two strategies is better at the beginning, their initial strategies are assigned by a coin toss, so cooperators and defectors each occupy 50% of the grid. The results are shown in Fig. 3. During the game, $\rho$ quickly stabilizes. The overall cooperation level is much higher under the unconditional imitation rule or the Monte Carlo rule than under the Fermi rule or replicator dynamics, which confirms that update rules play an important role in evolutionary dynamics.
Although the cooperation level under the unconditional imitation rule is higher than under the Monte Carlo rule, its mechanism of imitating the best neighbor without hesitation means that individuals ignore the risks behind high returns, so there is reason to suspect that it is not robust. To test this, we consider an extreme situation: all individuals on the grid are cooperators, and at some point a group of them deliberately switches to defection. We then simulate enough time steps for the system to absorb the invasion. The results are shown in Fig. 4. Each row corresponds to a different number of invading defectors, about 4%, 6% and 11% of the population, respectively. The first column shows the strategy distribution when the invasion occurs, the second column shows the equilibrium under the unconditional imitation rule, and the third column shows the equilibrium under the Monte Carlo rule. Under unconditional imitation, the neighbors of the invaders immediately imitate defection for its higher return and form a defective core that cannot update its strategy, which greatly reduces the cooperation level. Under the Monte Carlo rule, by contrast, individuals weigh the pursuit of high returns against the risks behind them, so however severe the invasion, cooperation settles at a stable level over time instead of dying out. The Monte Carlo rule is therefore considerably more robust than the unconditional imitation rule.
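The invasion scenario above can be set up as an initial condition. The text does not specify the shape of the invading group, so this sketch assumes a hypothetical central square block of defectors whose area approximates the requested fraction, for example on a 100 × 100 grid; `invasion_lattice` is an illustrative name and the encoding is 1 = C, 0 = D.

```python
import numpy as np


def invasion_lattice(L: int, defector_fraction: float) -> np.ndarray:
    """All-cooperator L x L lattice (1 = C) with a central square block of
    defectors (0 = D) whose area approximates the requested fraction."""
    side = max(1, int(round(L * defector_fraction ** 0.5)))
    s = np.ones((L, L), dtype=int)
    start = (L - side) // 2
    s[start:start + side, start:start + side] = 0
    return s


for f in (0.04, 0.06, 0.11):
    s = invasion_lattice(100, f)
    print(f"requested {f:.2f}, actual defector fraction {1 - s.mean():.4f}")
```

The resulting lattice would then be evolved under each update rule until equilibrium, as described above.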
Next, we investigate the fraction of cooperators $\rho_c$ as a function of the temptation parameter $b$ under the Monte Carlo rule on the grid, since $\rho_c$ is a key quantity in evolutionary games. The results are shown in Fig. 5(a). When the temptation is low, the Monte Carlo rule always sustains a cooperative society. As $b$ increases, this balance is broken: at the border of a cluster, internal support can no longer compensate for the losses caused by defectors, and the cluster collapses layer by layer until a new balance is reached. Cooperators finally die out at $b = 1.31$, because the huge temptation prevents any cluster from overcoming its losses at the border.
According to Szabo's influential work, the cooperator density in the prisoner's dilemma game under the Fermi rule on the grid follows a power law, $\rho_c \propto (b_c - b)^{\beta}$, where $b_c$ is the threshold at which cooperation disappears: once $b \ge b_c$, all cooperators die out. To assess the fits, we use the goodness of fit for nonlinear regression, $R^2$, and the root mean squared error, RMSE:

$$R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2},$$

where $y$ denotes the original data, $\hat{y}$ the fitted data, $\bar{y}$ the mean of $y$, and $n$ the length of $y$. We find that the power law, which performs very well under the Fermi rule, fits poorly under the Monte Carlo rule, as Fig. 5(a) shows. The solid line shows the best power-law fit, with $b_c = 1.31$ and $\beta = 0.923$. After further inspecting the data, we fit them with quadratic and trigonometric curves, respectively:

$$\rho_c = a_1 b^2 + a_2 b + a_3, \qquad \rho_c = c_1 \sin(c_2 b + c_3) + c_4.$$

These fits are also shown in Fig. 5(a), and the fitting statistics are listed in Table 2. The pluses show the quadratic fit, with optimal parameters $a_1 = -2.7363$, $a_2 = 4.8644$ and $a_3 = -1.7205$. The squares show the trigonometric fit, with optimal parameters $c_1 = -0.2568$, $c_2 = 7.2462$, $c_3 = -2.5603$ and $c_4 = 0.1314$. Both fit better than the power law, with a smaller RMSE and a higher $R^2$. Moreover, the trigonometric fit is better than the quadratic fit, so we conclude that under the Monte Carlo rule on the grid the cooperator density follows a trigonometric behavior, $\rho_c = c_1 \sin(c_2 b + c_3) + c_4$.
Table 2.

| Fitting method | Root Mean Squared Error | Goodness of Fit |
|---|---|---|
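The fit statistics and fitted curves can be evaluated with a few lines of NumPy. This sketch implements the $R^2$ and RMSE definitions above and the two fitted models with the reported parameters. The form $c_1 \sin(c_2 b + c_3) + c_4$ for the trigonometric model is an assumption, chosen because with the reported parameters it gives values close to the quadratic fit over $1 \le b \le 1.31$; the function names are hypothetical.

```python
import numpy as np


def r_squared(y, y_fit) -> float:
    """Goodness of fit: 1 - SS_res / SS_tot."""
    y, y_fit = np.asarray(y, dtype=float), np.asarray(y_fit, dtype=float)
    ss_res = np.sum((y - y_fit) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def rmse(y, y_fit) -> float:
    """Root mean squared error over the n data points."""
    y, y_fit = np.asarray(y, dtype=float), np.asarray(y_fit, dtype=float)
    return float(np.sqrt(np.mean((y - y_fit) ** 2)))


def quadratic(b, a1=-2.7363, a2=4.8644, a3=-1.7205):
    """Quadratic model with the reported optimal parameters."""
    return a1 * b ** 2 + a2 * b + a3


def trigonometric(b, c1=-0.2568, c2=7.2462, c3=-2.5603, c4=0.1314):
    """Assumed trigonometric model c1 * sin(c2 * b + c3) + c4."""
    return c1 * np.sin(c2 * b + c3) + c4


b = np.linspace(1.0, 1.31, 7)
for bi, q, t in zip(b, quadratic(b), trigonometric(b)):
    print(f"b = {bi:.3f}  quadratic = {q:+.4f}  trigonometric = {t:+.4f}")
```

To fill in Table 2, one would pass the simulated densities as `y` and each model's predictions as `y_fit` to `rmse` and `r_squared`.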
We have also examined the relationship between the average returns of individuals and the temptation parameter $b$; the results are shown in Fig. 5(b). Overall, cooperators earn higher average returns than defectors, so cooperation is clearly the better strategy. Notably, the average return of cooperators is insensitive to $b$. Surprisingly, the average return of defectors decreases as $b$ grows: since a defector earns only from its cooperating neighbors, large gatherings of defectors leave each of them with few cooperators to exploit, which greatly reduces the payoff of defection. Cooperation thus lets individuals fare better in society.
In the investigations above, the initial density of cooperators $\rho_0$ is 50%. Here we examine how different values of $\rho_0$ affect the density of cooperators at equilibrium, $\rho_c$. Figure 6(a) shows the time evolution of the fraction of cooperators $\rho$ for different values of $\rho_0$. The data are averages over 3000 repetitions of the same experiment, simulated on a 100 × 100 grid with $b = 1.1$ and $N = 10000$. Different values of $\rho_0$ only noticeably affect the time needed to reach equilibrium and do not change the equilibrium fraction of cooperators. The equilibrium fraction of cooperators is therefore insensitive to the initial fraction of cooperators.
Finally, we investigate the effect of the social population $N$ on the cooperation level. We fix $b$ and $\rho_0$ and vary $N$ to obtain the corresponding equilibrium fraction of cooperators $\rho_c$. The results are shown in Fig. 6(b). Above a certain population size, $\rho_c$ does not depend on $N$; below it, $\rho_c$ decreases as $N$ becomes smaller. A highly cooperative society therefore requires a sufficiently large population.
We introduce a new dynamic rule for individual strategy adjustment, the Monte Carlo rule, and investigate the prisoner's dilemma game under it on the grid. The Monte Carlo rule promotes cooperation more than the replicator dynamics and Fermi rules, and it is more robust than the unconditional imitation rule. Under this rule, spatial structure plays a positive role in cooperative behavior, and the equilibrium density of cooperators as a function of the temptation to defect is better described by a trigonometric function than by the power law found in earlier work under the Fermi rule. Cooperation is clearly favored: cooperators obtain higher and more stable returns than defectors over the whole range of the temptation parameter. In addition, the cooperation level is insensitive to the initial density of cooperators, but a sufficiently large population is needed to maintain a high cooperation level.
- (1) J.W. Weibull, Evolutionary Game Theory, MIT Press, Cambridge, MA (1995).
- (2) W.D. Hamilton, J. Theor. Biol. 7, 17 (1964).
- (3) E. Fehr, U. Fischbacher, Nature (London) 425, 785 (2003).
- (4) R. Axelrod, W.D. Hamilton, Science 211, 1390 (1981).
- (5) R. Axelrod, The Evolution of Cooperation, Basic Books, New York (1984).
- (6) L.A. Dugatkin, Cooperation Among Animals, Oxford University Press, Oxford (UK) (1997).
- (7) J. von Neumann, O. Morgenstern, Theory of Games and Economic Behavior, Princeton University Press (1944).
- (8) J. Maynard Smith, G. Price, Nature 246, 15 (1973).
- (9) D. Fudenberg, D.K. Levine, The Theory of Learning in Games, The MIT Press (1998).
- (10) A. Rapoport, A.M. Chammah, Prisoner’s Dilemma, University of Michigan Press, Ann Arbor (1970).
- (11) J. Vukov, G. Szabo, A. Szolnoki, Cooperation in the noisy case: Prisoner's dilemma game on two types of regular random graphs, Phys. Rev. E 73, 067103 (2006).
- (12) C. Hauert, G. Szabo, Game theory and physics, Am. J. Phys. 73, 405 (2005).
- (13) M. Tomassini, E. Pestelacci, L. Luthi, Social Dilemmas and Cooperation in Complex Networks, International Journal of Modern Physics C 18, 1173-1185 (2007).
- (14) M. Sysi-Aho, J. Saramaki, J. Kertesz, K. Kaski, Spatial snowdrift game with myopic agents, Eur. Phys. J. B 44, 129-135 (2005).
- (15) C. Hauert, M. Doebeli, Spatial structure often inhibits the evolution of cooperation in the snowdrift game, Nature 428, 643-646 (2004).
- (16) P.P. Li, J.H. Ke, Z.Q. Lin, Cooperative behavior in evolutionary snowdrift games with the unconditional imitation rule on regular lattices, Phys. Rev. E 85, 021111 (2012).
- (17) M.A. Nowak, R.M. May, Evolutionary games and spatial chaos, Nature 359, 826-829 (1992).
- (18) C.Y. Xia, J. Wang, L. Wang, S.W. Sun, J.Q. Sun, J.S. Wang, Role of update dynamics in the collective cooperation on the spatial snowdrift games: Beyond unconditional imitation and replicator dynamics, Chaos, Solitons & Fractals 45, 1239-1245 (2012).
- (19) P.A.P. Moran, The Statistical Processes of Evolutionary Theory, Clarendon Press, Oxford (1962).
- (20) G. Szabo, C. Toke, Evolutionary prisoner’s dilemma game on a square lattice, Phys. Rev. E 58, 69-73 (1998).
- (21) D.X. Huang, Game Analysis of Demand Side Evolution of Green Building Based on Revenue-Risk, Journal of Civil Engineering (Chinese edition), 50(2), 110-118 (2017).
- (22) L.J. He, L.P. Yang, Travel agency income Nash equilibrium model influenced by risk preference, Guangdong University of Technology (Chinese edition), 36 (3), 56-67 (2019).
- (23) J.C. Coultas, When in Rome: An evolutionary perspective on conformity, Group Processes & Intergroup Relations, 7(4), 317-331 (2004).
- (24) C. Efferson, R. Lalive, P.J. Richerson, et al., Conformists and mavericks: the empirics of frequency-dependent cultural transmission, Evolution and Human Behavior, 29(1), 56-64 (2008).
- (25) M.A. Nowak, R.M. May, Int. J. Bifurcation Chaos Appl. Sci. Eng. 3, 35 (1993).
- (26) Z.X. Wu, Complex network and evolutionary game research on it (Chinese edition), Lanzhou: Lanzhou University, 2007.
- (27) E. Lieberman, C. Hauert, M.A. Nowak, Evolutionary dynamics on graphs, Nature, 433(7023), 312-316 (2005).