Institutional Incentives for the Evolution of Committed Cooperation: Ensuring Participation is as Important as Enhancing Compliance

10/25/2021
by   The Anh Han, et al.
Teesside University

Abstract

Both conventional wisdom and empirical evidence suggest that arranging a prior commitment or agreement before an interaction enhances the chance of reaching mutual cooperation. Yet it is not clear what mechanisms can promote participation in and compliance with such a commitment, especially when the former is costly and deviating from the latter is profitable. Prior work either considers regimented commitments, where compensation is assumed enforceable from dishonest committers, or assumes implicit commitments from every individual (so that all individuals are taken to have committed and are treated as such). Here we develop a theory of participation and compliance with respect to an explicit prior commitment under institutional incentives, where individuals first decide whether or not to join a cooperative agreement to play a one-shot social dilemma game. Using a mathematical model, we determine when participating in a costly commitment and complying with it is an evolutionarily stable strategy (ESS) when playing against all other possible strategies, and results in high levels of cooperation in the population. We show that, given a sufficient budget for providing incentives, reward of commitment compliant behaviours better promotes cooperation than punishment of non-compliant ones. Moreover, by sparing part of this budget for rewarding those who are willing to participate in a commitment, the overall frequency of cooperation can be significantly enhanced, for both reward and punishment. Finally, we find that, surprisingly, the presence of errors in a participation decision favours evolutionary stability of commitment compliant strategies and higher levels of cooperation.

Keywords: Commitment, reward, punishment, evolution of cooperation, social dilemma, evolutionary dynamics.

1 Introduction

Commitments, such as contracts and agreements, are fundamental components of many social and economic interactions, ranging from personal, to institutional, to political or religious ones, in order to ensure a mutually beneficial outcome for parties involved (Irons, 2001, Nesse, 2001, Han, 2013, Akdeniz and van Veelen, 2021, Frank, 1988, Cherry and McEvoy, 2013, Sasaki et al., 2012). They can be in the form of formal (legal) contracts as well as informal social norms and non-binding promises (Shelton, 2003, Nesse, 2001). Arranging a prior commitment from all parties involved before an interaction improves the chance that people can reach mutual cooperation when individual interests are in conflict (Chen and Komorita, 1994, Kerr et al., 1997, Balliet, 2010). In most modern societies, institutions are created to enforce formal contracts and enhance cooperation, through suitable incentive structures such as punishment for wrongdoing (Zumbansen, 2007, Ostrom, 1990, Nesse, 2001). People joining a religion share certain norms and expectations and might expect certain reward and punishment for a given behaviour (Johnson and Bering, 2006, Irons, 2001). Commitments even found their applications in the context of computerised multi-agent systems, where they are formalised and engineered to ensure agents’ norm compliance and positive behaviour (Singh, 2013).

Evolutionary game theory (EGT) (Sigmund, 2010) provides an appropriate tool to study the evolution of cooperative behaviour in social dilemmas, as they are governed by institutional incentives (Sasaki et al., 2012, Sigmund et al., 2010, Chen et al., 2015, Wang et al., 2019, Duong and Han, 2021, Cimpeanu et al., 2021, Góis et al., 2019, Sun et al., 2021) and prior commitments (Han et al., 2013, 2017, Sasaki et al., 2015, Akdeniz and van Veelen, 2021). However, prior work has not addressed how incentives influence commitment-based behaviours and how they can be used to efficiently ensure high levels of compliance and cooperation. On the one hand, existing EGT models of commitments assume that commitments are regimented, where compensation is assumed enforceable from dishonest committers (i.e. those who committed but then defect in the interaction) (Anh et al., 2015, Han et al., 2013, Martinez-Vaquero et al., 2015, Han and Lenaerts, 2016). Since individuals can decide whether or not to honour an adopted commitment, with abundant evidence of commitment breaching in both controlled experiments and real-world scenarios (Nesse, 2001, Dannenberg, 2016, Kerr et al., 1997, Nguyen et al., 2019), these works did not explain what mechanisms can ensure high levels of participation in and compliance with an adopted commitment. For example, as will be studied in this paper, one might ask whether a positive (reward) or a negative (punishment) incentive is more efficient for achieving this.

On the other hand, EGT studies exist that describe commitment or agreement based interactions but do not model the formation process explicitly, thus omitting behaviours conditioned on the presence or absence of a commitment (Krapohl et al., 2021, Santos and Pacheco, 2011, Sasaki et al., 2012, Sigmund et al., 2010). Moreover, where incentives are concerned, they only target behaviours in the interaction; they have not addressed incentives for improving participation in and compliance with an adopted commitment (Sasaki et al., 2012, Sigmund et al., 2010, Chen et al., 2015, Wang et al., 2019, Han and Tran-Thanh, 2018, Góis et al., 2019). In this work, we close this gap by studying how to use an incentive budget effectively for improving both participation and compliance, thereby improving the overall cooperation. Since joining a commitment usually involves a cost, e.g. initial time and effort to set up the commitment (Tappin et al., 2015) and/or membership fees (Heidar, 2006), and requires the involved parties to follow certain restrictive terms and conditions, participation might need to be encouraged. Indeed, examples of incentives to encourage participation are abundant in real-world scenarios, including climate change agreements (Barrett and Stavins, 2003) and healthcare programs such as those for smoking cessation in pregnancy (Tappin et al., 2015) and diabetes (Bruni et al., 2009). Lotteries were used as a form of reward for participation in Covid-19 vaccination in many countries including the US (Sehgal, 2021).

Here we investigate, theoretically, the role of both institutional reward and punishment for ensuring commitment compliance in the context of the one-shot Prisoner’s Dilemma (PD) game, in which players can either cooperate (C) or defect (D) (see Methods). Before a PD interaction, individuals can choose whether or not to join a commitment to cooperate in the interaction. The commitment stands when all parties agree. Otherwise, they just interact using the regular PD, in the absence of a commitment. In the former case, the committed players share a cost, paid as a fee for maintaining the institution that provides incentives.

We will examine which type of incentive is more efficient for enhancing commitment compliance and cooperation. We study when commitment compliant behaviours are evolutionarily viable, by considering competition among all possible strategies. Furthermore, given the cost of participation, we hypothesise that, if a fraction of the incentive budget is used to encourage players to join the commitment, the overall cooperation outcome can be improved.

Our analysis will be based on two well-adopted, complementary approaches in EGT, namely, evolutionarily stable strategies (ESS analysis) (Maynard-Smith, 1982, Otto and Day, 2007) and finite population dynamics (Nowak and Sigmund, 2005, Sigmund, 2010). While the former allows simple assessment of when a strategy can resist invasion from all other strategies in the population (i.e. being an ESS), it does not capture the detailed stochastic dynamics among all strategies in co-presence. This drawback is addressed by the latter approach. We will determine when participating in a costly commitment and complying with it is an ESS and promotes the evolution of cooperation. We will examine which incentive, reward of commitment compliant behaviours or punishment of non-compliant ones, is more efficient for promoting the evolution of cooperation, and when. Next, we will study whether this cooperation outcome can be enhanced by sparing part of the incentive budget for encouraging participation in a commitment, despite reducing the incentive budget for commitment compliant behaviours. Finally, we will examine the impact of noise in a participation decision on the stability of commitment compliant strategies and the overall levels of cooperation.

2 Results

2.1 Strategies and payoffs

We consider that, before an interaction, players can choose whether or not to join a commitment to cooperate in the interaction. The commitment stands when all parties agree. Otherwise, players just interact using the regular PD, in the absence of a commitment. In the former case, the committed players share a cost.

A strategy is defined by three decisions: i) whether the player accepts (A) or not (N) to join the commitment/agreement; ii) whether the player cooperates (C) or defects (D) in the PD if the commitment is formed; iii) whether the player cooperates (C) or defects (D) in the PD if the commitment is not formed. Thus, there are eight possible strategies in total, denoted as ACC, ACD, ADC, ADD, NCC, NCD, NDC and NDD. They are summarised in Table 1.

Strategy   Accept agreement?   Cooperate in presence of agreement?   Cooperate in absence of agreement?
ACC        Yes                 Yes                                   Yes
ACD        Yes                 Yes                                   No
ADC        Yes                 No                                    Yes
ADD        Yes                 No                                    No
NCC        No                  Yes                                   Yes
NCD        No                  Yes                                   No
NDC        No                  No                                    Yes
NDD        No                  No                                    No

Table 1: The eight strategies with commitment/agreement formation
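
As an illustration of how the three decisions combine, the following minimal sketch enumerates the eight labels of Table 1 and decodes one of them. The function and key names are hypothetical, chosen here for readability; they do not come from the paper.

```python
from itertools import product

# A strategy label has three letters: accept (A) or not (N) the commitment,
# then the move (C or D) if the commitment is formed,
# then the move (C or D) if it is not formed.
STRATEGIES = ["".join(s) for s in product("AN", "CD", "CD")]


def describe(strategy: str) -> dict:
    """Decode a three-letter label into the three decisions of Table 1."""
    accept, move_if_formed, move_if_not_formed = strategy
    return {
        "accepts": accept == "A",
        "cooperates_if_formed": move_if_formed == "C",
        "cooperates_if_not_formed": move_if_not_formed == "C",
    }


if __name__ == "__main__":
    for label in STRATEGIES:
        print(label, describe(label))
```

The enumeration order produced by `product` matches the row order of Table 1.
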
Figure 1: Both commitment compliant and non-compliant strategies can be ESS. We show which strategies can be ESS and their frequencies across the parameters space: (increment 0.05), (increment 0.05), (increment 0.05) and (increment 0.02), i.e. the total number of configurations is thus .) We show the number of times (if any) each strategy is an ESS. We observe that for reward, three strategies ACD, ADD and NDD can be ESS, while in case of punishment, NCD can also be an ESS. Other parameters: .
Figure 2: For commitment compliance to be evolutionarily stable, it is necessary that noise is non-negligible and the incentive budget is sufficiently high. Depicted are the frequency of ESS strategies for varying (panel A), (panel B), (panel C) and (panel D). We observe that for reward, three strategies ACD, ADD and NDD can be ESS, while in case of punishment, NCD can also be an ESS. In particular, in both cases, commitment compliant behaviour (ACD) can be ESS only if is sufficiently large, is not too large, and is greater than zero. Other parameters: .

Now, assume that there is a per capita budget available for providing incentives. It can be used to reward those who honour an adopted commitment (i.e. ACC and ACD players), or to punish those who dishonour it (i.e. ADC and ADD players). Moreover, a fraction of this budget can be used for rewarding those who are willing to participate in a commitment (i.e. AXY players, where X and Y are each C or D), increasing the chance of a commitment being formed. The remaining budget is used for rewarding commitment compliant players or punishing non-compliant ones, as before. When this fraction is zero, the budget is used only for incentivising commitment compliant behaviours (i.e., pure reward and pure punishment scenarios). As we consider reward and punishment separately, without loss of generality, we assume that all the incentives described above are equally cost efficient, where the incentive recipient’s increased or decreased amount (corresponding to reward and punishment, respectively) equals the institution’s cost.

Finally, we also consider that, with some small probability, an error might occur when players choose whether to join a commitment. It can be due to a fuzzy mind or trembling hands, as well as miscommunication or environmental noise. We show that this type of noise strongly influences the evolutionary dynamics, even in favour of cooperation and commitment compliance.

The derivation of all the payoff matrices is provided in Methods.

Figure 3: Pure incentives promote high levels of commitment compliance and cooperation. Depicted is the frequency of strategies when pure reward or punishment () is applied, for different values of the per capita budget (). Both reward and punishment allow ACD to prevail when is small. ACC is more frequent in case of reward. ACD is frequent for a larger range of in case of reward than punishment. These together lead to higher levels of cooperation in case of reward than punishment, for the whole range of . Reward helps better suppress non-committers when is high, explaining its success. Other parameters: population size , , .

2.2 When complying with a cooperative commitment can be an evolutionarily stable strategy (ESS Analysis)

In Figure 1, we study which strategies can be ESS (see Methods) across the parameter space. We observe that, for reward, three strategies ACD, ADD and NDD can be ESS, while NCD can also be ESS in the case of punishment. Thus, ACD is the only ESS that leads to an overall cooperative outcome (i.e. cooperative ESS), where individuals choose to accept a prior commitment and comply with it. All other possible ESS lead to a defective outcome; namely, ADD commits to cooperate but then dishonours the commitment and defects in the interaction, while both NDD and NCD refuse to commit and defect in the interaction.

To gain further insights into when ACD is an ESS, we show in Figure 2 the number of times each strategy is an ESS when each of the four parameters is varied separately. We observe that, in the absence of errors when deciding whether to join a commitment (i.e. when the error probability is zero), none of the strategies can be ESS (Figure 2D), for both types of incentive. This is because, for any strategy, there is always another mutant strategy with equivalent behaviour in the absence of errors (e.g. for ACD, it is ACC; for NDD, it is NCD) and thus the same fitness. When the error probability is non-negligible, ACD, ADD and NDD can be ESS in the case of reward, while NCD can also be ESS in the case of punishment (besides these three strategies).
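
The role of behaviourally equivalent mutants can be illustrated with a small sketch of the standard (Maynard Smith) ESS conditions for pure strategies. This is an assumption about the criterion, since the paper's Methods are not reproduced in this section, and the payoff numbers below are toy values rather than model output.

```python
def is_ess(payoff, i):
    """Check the standard ESS conditions for pure strategy i.

    payoff[a][b] is the payoff to an a-player meeting a b-player.
    Strategy i is an ESS if, for every mutant j != i, either
      payoff[i][i] > payoff[j][i], or
      payoff[i][i] == payoff[j][i] and payoff[i][j] > payoff[j][j].
    """
    for j in range(len(payoff)):
        if j == i:
            continue
        strictly_better = payoff[i][i] > payoff[j][i]
        tie_but_resists = (payoff[i][i] == payoff[j][i]
                           and payoff[i][j] > payoff[j][j])
        if not (strictly_better or tie_but_resists):
            return False
    return True


# Toy payoffs (hypothetical): strategies 0 and 1 behave identically,
# so neither can satisfy the ESS conditions against the other.
neutral_pair = [
    [3.0, 3.0, 1.0],
    [3.0, 3.0, 1.0],
    [2.0, 2.0, 0.0],
]

# Same game after a small perturbation that separates strategies 0 and 1,
# standing in for the effect of a non-zero error probability.
tie_broken = [
    [3.0, 3.1, 1.0],
    [2.9, 3.0, 1.0],
    [2.0, 2.0, 0.0],
]

if __name__ == "__main__":
    print([is_ess(neutral_pair, i) for i in range(3)])
    print([is_ess(tie_broken, i) for i in range(3)])
```

In the first matrix no strategy passes the check, because each of the two equivalent strategies earns exactly the resident's payoff; once the tie is broken, strategy 0 passes.
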

Focusing on ACD, it can be ESS only when a sufficient budget is available (at least equal to the cost of cooperation), for both types of incentive (see Figure 2A). It can be ESS across the whole range considered in Figure 2B. Moreover, a sufficient fraction of the budget needs to be spent on rewarding the compliant strategies or sanctioning the non-compliant ones, i.e. the fraction spent on rewarding participation must not be too large (see Figure 2C). While the number of times ACD is an ESS decreases with this participation fraction for reward, it peaks at an intermediate value for punishment. This suggests that it might be more important to incentivise participation for punishment than for reward.

Overall, the ESS analysis provides us with some initial insights regarding when commitment compliance (ACD) can be an evolutionarily viable strategy. However, it does not show the detailed dynamics of the whole system and is limited in quantitative characterisation, e.g. of the overall cooperation level in the population for a given parameter configuration. Also, as studied below using a stochastic evolutionary dynamics approach, ACD and some other strategies can be the most frequent strategy in the population even when they are not ESS, for example, when noise is negligible.

Figure 4: Reward leads to a higher level of cooperation than punishment provided a sufficient per capita budget (the same for both incentives). (A) Frequency of cooperation as a function of when either reward of commitment compliant strategies (ACC and ACD) or punishment of non-compliant ones (ADC and ADD), is applied. The black dotted line corresponds to a reference scenario where no policy is applied. To ensure a high frequency of cooperation, a sufficient budget for providing incentives () is required. Cooperation is reduced when increases, for both reward and punishment. Nevertheless, reward leads to a higher level of cooperation than punishment in most cases, except for when both and are sufficiently low (see and ). (B) Contour plot showing the difference between cooperation obtained through reward and punishment, for varying and . The larger and are, the greater the difference is. Other parameters: population size , , , (no reward for participation).

2.3 Reward vs punishment for promoting frequent committed cooperation

We comparatively study the capability of institutional reward and punishment for promoting the evolution of commitment compliant behaviour and cooperation. We first focus on clarifying the effects of pure incentives, i.e. with no budget spent on rewarding participation. We then study the effect of varying the fraction of the per capita incentive budget that is used to reward participation in a commitment before an interaction. We also study the impact of having some small, non-negligible error probability.

2.3.1 Pure incentives

In Figure 3, we compute the long-term frequency (i.e. stationary distribution) of the eight strategies (see Methods) under the pure reward and punishment policies, for varying and different values of . Both reward and punishment can enable ACD to prevail when is small, given a sufficient budget ( and 2). In this case, ACC is more frequent in the case of reward. Moreover, ACD is frequent, and even dominates the population, for a larger range of in the case of reward. As such, a higher level of overall population cooperation is achieved in the case of reward, for the whole range of . When is small (), either ADD or NDD dominates the population, leading to the dominance of defection. NDD also dominates even when is larger if is high, in the case of punishment.
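
For readers who want to experiment with long-term frequencies, here is a minimal sketch of one common way to obtain a stationary distribution in finite populations: pairwise-comparison (Fermi) imitation in the rare-mutation limit. The paper's Methods are not reproduced in this section, so the update rule, the parameter names `pop_size` and `beta` (selection intensity), and the power-iteration solver are all assumptions made for illustration.

```python
import math


def fixation_probability(payoff, resident, mutant, pop_size, beta):
    """Probability that one mutant takes over a resident population
    under the pairwise-comparison (Fermi) rule (assumed update rule)."""
    total, ratio_product = 1.0, 1.0
    for k in range(1, pop_size):
        # Average payoffs when k individuals play the mutant strategy.
        pi_mut = ((k - 1) * payoff[mutant][mutant]
                  + (pop_size - k) * payoff[mutant][resident]) / (pop_size - 1)
        pi_res = (k * payoff[resident][mutant]
                  + (pop_size - k - 1) * payoff[resident][resident]) / (pop_size - 1)
        ratio_product *= math.exp(-beta * (pi_mut - pi_res))
        total += ratio_product
    return 1.0 / total


def stationary_distribution(payoff, pop_size, beta, iterations=10_000):
    """Long-run share of time spent in each monomorphic state."""
    n = len(payoff)
    # Markov chain over monomorphic states: from state i, a random mutant j
    # appears (probability 1/(n-1)) and fixates with its fixation probability.
    trans = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                trans[i][j] = fixation_probability(payoff, i, j, pop_size, beta) / (n - 1)
        trans[i][i] = 1.0 - sum(trans[i])
    # Power iteration on the row vector: dist <- dist * trans.
    dist = [1.0 / n] * n
    for _ in range(iterations):
        dist = [sum(dist[i] * trans[i][j] for i in range(n)) for j in range(n)]
    return dist


if __name__ == "__main__":
    toy = [[2.0, 2.0], [1.0, 1.0]]  # hypothetical payoffs, not model output
    print(stationary_distribution(toy, pop_size=20, beta=1.0, iterations=1000))
```

Power iteration is used only to keep the sketch dependency-free; a slowly mixing chain may need more iterations or a direct eigenvector solve.
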

In Supporting Information (SI), we derive the threshold of for which ACD is risk-dominant (see Methods) against all other strategies, except for ACC, to which it is neutral. For , for a sufficient budget (namely, ), the thresholds are and , respectively, for reward and punishment. Thus, reward allows for a larger range of for which ACD is an evolutionarily viable strategy, having a high long-term frequency. Intuitively, reward enables a better suppression of non-committing strategies such as NDD because the latter do not suffer punishment since they do not commit, while reward provides a payoff advantage for committing strategies. Notably, the thresholds in numerical results are in accordance with these theoretical observations. For example, when , the threshold is for punishment and for reward. For , since ACD is also neutral to defective committers (ADC and ADD), ACD is most frequent in the population up to a slightly lower threshold than the theoretical one (i.e. for punishment and for reward).

In Figure 4, we compare the total cooperation frequency in the population obtained through applying pure reward and pure punishment, and also when no policy is applied. As can be seen, reward leads to a higher level of cooperation than punishment in most cases, except for when both and are rather low. Also, we observe that the larger and are, the more efficient reward is compared to punishment in terms of cooperation promotion (see Figure 4B).

Figure 5: Rewarding participation can improve cooperation despite reducing the budget for incentivising commitment compliant behaviour. Depicted is the frequency of cooperation as a function of the fraction of the budget used for rewarding participation (), for different values of the cost of commitment participation . Punishment of non-compliant strategies (panels A, C) or reward of compliant ones (panels B, D) is applied as before, using the remaining budget after rewarding participation. We consider scenarios with a large (, top row, panels A, B) and small (, bottom row, panels C, D) per capita budget. When , it reproduces the results for pure punishment and reward. For both types of incentive, rewarding participation can improve the overall cooperation, especially for the larger . For a larger , a larger fraction of the budget should be used for rewarding participation to reach an optimal level of cooperation. Other parameters: , , .
Figure 6: Rewarding participation promotes higher levels of cooperation given a sufficient budget (). We compare pure reward and punishment (solid red and blue lines) and when an optimal fraction () of the budget is used for rewarding participation (red and blue dashed lines). When is small (left panel), the improvement occurs when is sufficiently large. When is large (right panel), participation can provide a large improvement for the whole range of . The improvement obtained through rewarding participation is more significant for punishment than for reward, for both small and large . Other parameters: , , .

2.3.2 Incentives with rewarding participation

We show in Figure 5 the frequency of cooperation obtained through reward or punishment when a fraction of the per capita budget is used for rewarding those who agree to participate. Note that the scenario where this fraction is zero reproduces the pure punishment and pure reward scenarios described above. For both types of incentive, reward of participation in a prior commitment can improve the overall cooperation, especially for a larger available budget for incentive supply (compare top and bottom rows). We observe that the larger the cost of commitment participation is, the larger the fraction of the budget that should be used for rewarding participation to reach the highest overall frequency of cooperation in the population. Only when both and are sufficiently small is it better not to reward participation (see and ).

Denoting by the value of leading to the highest level of cooperation, in Figure 6 we compare the level of cooperation obtained at and when participation is not incentivised (i.e. ), and also when no policy is in place. We observe that when is small (), reward of participation leads to an improvement when is sufficiently large. When is larger (), a significant improvement is observed for the whole range of being considered. In addition, we find that the improvement obtained through the reward of participation is greater for punishment than for reward, for both small and large . This observation is in line with the ESS analysis above.

These notable results can be explained by looking at the frequency of strategies as a function of , in Figure S3 in SI. As approaches , for both reward and punishment, the frequency of NDD decreases and those of ACD and ACC increase. This increase is more significant for punishment than for reward. When , ADD frequency starts to increase quickly and becomes dominant in the population since the remaining budget for incentivising commitment-compliant behaviours becomes insufficient. Figure S4 in SI shows that this observation is robust for other values of .

Figure 7: Commitment compliance and cooperation prevail in the presence of noise. Depicted are the frequencies of strategies and the total level of cooperation for varying error probability of decision making at the commitment stage (). We consider both pure punishment (left column) and pure reward (right column), for different values of (top row, ; bottom row, ). The frequency of the commitment-compliant strategy (ACD) benefits significantly from having some noise, since it is then risk-dominant against ACC, which is not the case in the absence of noise. Other parameters: population size , , , , .

2.3.3 Non-negligible noise in commitment participation

We study how a non-negligible probability of error when deciding whether to participate in a commitment (i.e., an AXY player would refuse to commit and act in the same way as an NXY player, and vice versa) impacts the evolutionary dynamics. When it is negligible, ACD can be neither an ESS nor risk-dominant against all other strategies in the population; it is always neutral to ACC. Interestingly, whenever this error probability is larger than zero, ACD always becomes risk-dominant against ACC (see SI), thus making it likely to be more frequent. Indeed, as shown in Figure 7, which depicts the frequency of the strategies and the overall level of cooperation as a function of the error probability, ACD becomes more frequent and ACC less frequent as this probability increases. The impact is more significant in the case of reward than punishment. It leads to a slight increase in the overall level of cooperation in both cases. These observations are robust for other values of and (see SI, Figure S5).
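
The tie-breaking effect of participation errors can be checked numerically in a stripped-down setting. The sketch below omits incentives entirely, assumes the commitment cost is split equally between the two committed players, assumes participation errors are independent across players, and uses the usual pairwise risk-dominance condition (x is risk-dominant against y when the payoffs of x against x and against y sum to more than those of y against x and against y). All of these, together with the names `b`, `c`, `eps` and `noise` and their values, are assumptions made for illustration rather than the paper's specification.

```python
# Illustrative Donor-game values (hypothetical): benefit b, cost c,
# commitment cost eps shared equally by the two committed players.
b, c, eps = 4.0, 1.0, 0.25
R, S, T, P = b - c, -c, b, 0.0
PD = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}


def expected_payoff(row, col, noise):
    """Expected payoff to `row` against `col` when each player's participation
    decision is flipped independently with probability `noise`.
    Incentives are omitted in this sketch."""
    def p_accept(strategy):
        # An A-player accepts unless an error occurs; an N-player accepts only by error.
        return 1.0 - noise if strategy[0] == "A" else noise

    p_formed = p_accept(row) * p_accept(col)
    with_commitment = PD[(row[1], col[1])] - eps / 2
    without_commitment = PD[(row[2], col[2])]
    return p_formed * with_commitment + (1.0 - p_formed) * without_commitment


def risk_dominance_gap(x, y, noise):
    """Positive when x is risk-dominant against y (pairwise criterion)."""
    return (expected_payoff(x, x, noise) + expected_payoff(x, y, noise)
            - expected_payoff(y, x, noise) - expected_payoff(y, y, noise))


if __name__ == "__main__":
    for noise in (0.0, 0.01, 0.05):
        print(noise, risk_dominance_gap("ACD", "ACC", noise))
```

Under these assumptions the gap equals the probability that no commitment forms times (T - R + P - S), which is zero without noise and positive for any PD once errors can occur, because only then does the move in the absence of a commitment matter between two accepting players.
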

3 Discussion

It has been suggested that humans’ specialised capacity for commitment might have been shaped by natural selection (Nesse, 2001, Frank, 1988, Akdeniz and van Veelen, 2021). Arranging a commitment from all parties involved prior to an interaction can increase the chance of reaching mutual cooperation (Cherry and McEvoy, 2013, Chen and Komorita, 1994, Dannenberg, 2016, Sasaki et al., 2015), enabling individuals to clarify preferences or intentions from their partners before committing to a potentially costly course of action (Chen and Komorita, 1994, Han et al., 2015, Tomasello et al., 2005, Sterelny, 2012). Theoretical models of commitments have shown that this is indeed the case in social dilemma settings (Han et al., 2013, 2015, Anh et al., 2015, Ogbo et al., 2021, Barrett, 2016, Sasaki et al., 2015). Corresponding models, however, assume commitments are regimented, wherein compensation can always be enforced from those who dishonour an adopted commitment.

While the assumption of regimented commitments is a useful idealisation, it can be too strict in many applications. Commitment violators might refuse to compensate, might not be capable of compensating or paying a fine, or might even attempt to escape enforcement. Relaxing this assumption, herein we have comparatively explored institutional punishment of commitment violators and reward of commitment fulfillers as suitable mechanisms to enhance commitment compliance and thus the overall cooperation in the population. We have shown that, given the same, sufficiently high, per capita budget for supplying incentives, reward results in a higher level of commitment compliance and cooperation than punishment. This is a useful observation for practical applications since reward does not have the above-mentioned enforcement issues, while punishment does. However, it might be more costly for the institutions to provide rewards when compliant behaviour is frequent. An interesting direction is to consider how to combine reward and punishment in a cost efficient way, as has been done in the context of social dilemmas (without considering commitment-based behaviours) (Chen et al., 2015, Duong and Han, 2021, Sasaki et al., 2012, Góis et al., 2019, Sun et al., 2021).

Participating in a commitment can be quite costly, and that might discourage players from joining the commitment in the first place. We hypothesised that by spending part of the per capita budget to incentivise participation before the interaction, higher levels of compliance and cooperation might be achievable. Indeed, we have shown that the larger the cost of participation is, the greater the fraction of this budget that should be used for encouraging participation to achieve an optimal level of cooperation. This observation confirms the importance of studying incentives in models considering an explicit process of commitment formation, which has been omitted in extant models of institutional incentives (Sigmund et al., 2010, Chen et al., 2015, Wang et al., 2019, Duong and Han, 2021, Góis et al., 2019, García and Traulsen, 2019). These works have considered neither commitment-based interactions nor incentives for encouraging participation in the interaction (and their impact on the overall cooperation).

In a cooperative interaction, the environmental noise is usually expected to lead to a detrimental impact on the emergence and stability of cooperation (Nowak, 2006, Sigmund, 2010) and thus requires additional supporting mechanisms such as apology and forgiveness (Martinez-Vaquero et al., 2015, McCullough, 2008). Surprisingly, we have shown here that the presence of some noise that causes errors when deciding to participate in a prior commitment, can stabilise commitment compliance and cooperation, enabling ACD to become an ESS and risk-dominant against all other strategies. A non-negligible level of noise enables ACD to break ties with the commitment-accepting unconditional cooperators (ACC), who cooperate even when a commitment is not formed and thus can be easily exploited by non-committing defectors. In SI, we also considered execution noise that happens during the PD game. We showed that it has insignificant effects on the evolutionary dynamics and ESS analysis (e.g. none of the strategies can be an ESS for this type of noise).

A drawback of pro-social incentives for promoting cooperation is the possibility of antisocial reward and punishment, where defectors might punish cooperators or reward other defectors, hence hindering the evolution of cooperation (Herrmann et al., 2008, Rand and Nowak, 2011, Han, 2016, Dos Santos and Peña, 2017, Szolnoki and Perc, 2015). We argue that this issue disappears when a prior commitment is arranged, since it will become clear what behaviour is expected from all parties involved during the interaction. Only those who commit to cooperate can be punished for defection or rewarded for cooperation. It is not deemed justifiable to punish defectors or reward cooperators if they did not commit in the first place. That is, commitments preserve players’ freedom of choice, which can be important in cases where it might be contestable whether a behaviour is good, or when players might not be capable of cooperation, for example due to other commitments or a lack of resources to carry it out.

Evolutionary modelling and analysis of voluntary participation have been considered in several studies (De Silva et al., 2010, Mathew and Boyd, 2009, Sasaki et al., 2012, Hauert et al., 2007, Sigmund et al., 2010). However, these works did not consider strategies conditioned on the formation of a commitment (nor incentives for encouraging participation in it). Typically, only a subset of unconditional strategies was considered, including cooperators, defectors and non-participants. Here we have examined the full set of strategies (Table 1). In contrast to these studies, we have shown that the evolutionarily stable strategies often exhibit behaviours conditional on whether a commitment is formed; e.g., see the ESSs ACD and NDC in Figure 1. Given this crucial limitation of previous works, our model provides a more complete picture of how prior commitments, such as formal and informal contracts and agreements, provide an efficient mechanism for promoting the evolution of cooperation.

Methods

Prisoner’s Dilemma (PD)

The one-shot Prisoner’s Dilemma (PD) game is defined by the following payoff matrix (for the row player)

$$\begin{array}{c|cc} & C & D \\ \hline C & R & S \\ D & T & P \end{array} \tag{1}$$

Once the interaction is established and both players have decided to play C or D, both players receive the same reward $R$ (penalty $P$) for mutual cooperation (mutual defection). Unilateral cooperation provides the sucker’s payoff $S$ for the cooperative player and the temptation to defect $T$ for the defecting one. The payoff matrix corresponds to the preferences associated with the PD when the parameters satisfy the ordering $T > R > P > S$ (Sigmund, 2010). For the sake of a simple representation, we sometimes use the Donor game (Sigmund, 2010), a special case of the PD, with $T = b$, $R = b - c$, $P = 0$, $S = -c$, where $b$ and $c$ stand for the benefit and cost of cooperation, respectively.
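As a quick numerical check of these definitions, the following minimal Python sketch builds the Donor game payoffs from a benefit and a cost, assuming the standard parametrisation T = b, R = b − c, P = 0, S = −c, and tests the PD ordering T > R > P > S. The function names and the example values b = 2, c = 1 are illustrative assumptions and are not taken from the paper.

```python
def donor_game(b, c):
    """PD payoffs of the Donor game: benefit b to the co-player, cost c to the donor."""
    return {"R": b - c, "S": -c, "T": b, "P": 0.0}


def is_prisoners_dilemma(R, S, T, P):
    """True when the payoffs satisfy the PD ordering T > R > P > S."""
    return T > R > P > S


payoffs = donor_game(b=2.0, c=1.0)  # illustrative values only
print(payoffs, is_prisoners_dilemma(**payoffs))
```

Under this parametrisation the ordering holds whenever b > c > 0.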

Payoff derivation

In absence of participation errors.

First, when incentives are not in use, i.e. the no policy scenario, the payoff matrix for the eight strategies (see Table 1), reads (for row player)

(2)

One observation is that NDD and NCD are equivalent, as are NCC and NDC, because an agreement is only formed when both players agree to join. Thus, when one of these strategists is involved in an interaction, a commitment is never formed and only the move in the absence of a commitment matters. Moreover, ACC and ACD are neutral, as are ADC and ADD.
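The equivalences noted above can be checked mechanically. The sketch below is a simplification under stated assumptions: it reads each three-letter label as (accept or not, move if a commitment is formed, move otherwise), which is inferred from the text, and it uses only the base PD payoffs, omitting the participation cost and all incentives. The helper names and payoff values are hypothetical.

```python
from itertools import product

# Label = (A)ccept or (N)ot, move if a commitment is formed, move otherwise.
# This reading of the labels is inferred from the text, not quoted from it.
STRATEGIES = ["".join(s) for s in product("AN", "CD", "CD")]


def moves(s1, s2):
    """Return (move of s1, move of s2) in the PD."""
    formed = s1[0] == "A" and s2[0] == "A"  # a commitment needs both to join
    idx = 1 if formed else 2
    return s1[idx], s2[idx]


def base_payoff(s1, s2, R=1.0, S=-1.0, T=2.0, P=0.0):
    """PD payoff of s1 against s2, WITHOUT commitment cost or incentives."""
    table = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}
    return table[moves(s1, s2)]


def same_behaviour(x, y):
    """True if x and y lead to identical moves against every strategy."""
    return all(moves(x, z) == moves(y, z) for z in STRATEGIES)


def neutral(x, y):
    """True if x and y earn the same base payoff in all pairings among themselves."""
    values = {base_payoff(a, b) for a in (x, y) for b in (x, y)}
    return len(values) == 1


print(same_behaviour("NDD", "NCD"), same_behaviour("NCC", "NDC"))
print(neutral("ACC", "ACD"), neutral("ADC", "ADD"))
```

ACC and ACD are neutral with respect to each other but not equivalent: they behave differently against non-committers, which is where the tie between them can be broken.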

When a per capita budget is available to reward commitment compliant behaviours, with a fraction of it being used for rewarding participation, the payoff matrix reads

(3)

When a per capita budget is available to punish commitment non-compliant behaviours, with a fraction of it being used for rewarding participation, the payoff matrix reads

(4)

where we denote , , , and , just for the purpose of a neat presentation.

In presence of participation errors.

We assume that with a small probability , players make an error when deciding whether or not to join an agreement in the commitment formation stage (e.g. due to a fuzzy mind or trembling hands). All the payoff matrices above can be re-written as follows. Denote , , and . For , the payoff when a player against , , can be written as

Evolutionary Stable Strategies (ESS)

Following common assumptions in ESS analysis (Otto and Day, 2007), we assume that i) mutations are rare, so that there is at most one mutant strategy at a time in a population of individuals using a resident strategy, and ii) the mutant’s effect on the dynamics is negligible. To know whether a strategy can be invaded by another, we need to compute the difference in absolute fitness between the mutant strategy and the resident strategy in a population of residents. If the fitness of the mutant is greater than that of the resident, the mutant invades the population and becomes the resident. If the fitness of the mutant is lower, the mutant disappears and the resident resists invasion. When the two fitness values are equal, the resident also resists invasion because, in an infinitely large population, a mutant strategy cannot invade by drift. A strategy is an ESS if it resists invasion from all other strategies.
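The invasion criterion in this paragraph can be sketched in a few lines of Python. This is a minimal illustration, assuming that a rare mutant almost always meets residents, so that its fitness is its payoff against the resident; ties count as resisting invasion, as stated above. The two-strategy Donor game with b = 2 and c = 1 is used only as a toy example, and the function names are hypothetical.

```python
def resists_invasion(payoff, resident, mutant):
    """Resident resists if the rare mutant's fitness does not exceed its own.

    payoff[x][y] is the payoff of strategy x when playing against strategy y.
    """
    return payoff[mutant][resident] <= payoff[resident][resident]


def is_ess(payoff, resident):
    """ESS in the sense used here: resists invasion by every other strategy."""
    return all(
        resists_invasion(payoff, resident, mutant)
        for mutant in payoff
        if mutant != resident
    )


b, c = 2.0, 1.0  # illustrative Donor game values
payoff = {"C": {"C": b - c, "D": -c}, "D": {"C": b, "D": 0.0}}
print({s: is_ess(payoff, s) for s in payoff})  # {'C': False, 'D': True}
```

Applying the same check to the eight-strategy payoff matrices would require the full matrices, which are not reproduced in this sketch.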

Evolutionary Dynamics in Finite Population

All the analysis and numerical results are obtained using evolutionary game theory (EGT) methods for finite populations (Nowak et al., 2004, Imhof et al., 2005). In such a setting, agents’ payoff represents their fitness or social success, and evolutionary dynamics is shaped by social learning (Hofbauer and Sigmund, 1998, Sigmund, 2010), whereby the most successful agents will tend to be imitated more often by the other agents. In the current work, social learning is modeled using the so-called pairwise comparison rule (Traulsen et al., 2006), a standard approach in EGT, assuming that an agent $A$ with fitness $f_A$ adopts the strategy of another agent $B$ with fitness $f_B$ with probability $p$ given by the Fermi function,

$$p = \left(1 + e^{-\beta (f_B - f_A)}\right)^{-1}.$$

The parameter $\beta$ represents the ‘imitation strength’ or ‘intensity of selection’, i.e., how strongly the agents base their decision to imitate on the fitness difference between themselves and their opponents. For $\beta = 0$, we obtain the limit of neutral drift: the imitation decision is random. For large $\beta$, imitation becomes increasingly deterministic.
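To make the two limits concrete, here is a minimal Python sketch of the pairwise comparison rule, assuming the standard Fermi form 1 / (1 + exp(−β (f_B − f_A))). The function and argument names are illustrative.

```python
import math


def fermi(f_a, f_b, beta):
    """Probability that an agent with fitness f_a imitates one with fitness f_b."""
    return 1.0 / (1.0 + math.exp(-beta * (f_b - f_a)))


# beta = 0: neutral drift, imitation is a coin flip regardless of fitness.
print(fermi(1.0, 3.0, beta=0.0))
# Larger beta: the fitter agent is imitated almost surely.
print(fermi(1.0, 2.0, beta=10.0))
```

Note that swapping the two agents gives the complementary probability, so the rule is symmetric around 1/2.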

In the absence of mutations or exploration, the end states of evolution are inevitably monomorphic: once such a state is reached, it cannot be escaped through imitation. We thus further assume that, with a certain mutation probability, an agent switches randomly to a different strategy without imitating another agent. In the limit of small mutation rates, the dynamics will proceed with, at most, two strategies in the population, such that the behavioral dynamics can be conveniently described by a Markov Chain, where each state represents a monomorphic population, whereas the transition probabilities are given by the fixation probability of a single mutant (Imhof et al., 2005, Nowak et al., 2004). The resulting Markov Chain has a stationary distribution, which characterizes the average time the population spends in each of these monomorphic end states.

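Let $N$ be the size of the population. Denote by $\pi_{X,Y}$ the payoff a strategist X obtains in a pairwise interaction with strategist Y (defined in the payoff matrices). Suppose there are at most two strategies in the population, say, $k$ agents using strategy A ($0 \le k \le N$) and $(N-k)$ agents using strategy B. Thus, the (average) payoffs of an agent that uses A and of an agent that uses B can be written as follows, respectively,

$$\Pi_A(k) = \frac{(k-1)\,\pi_{A,A} + (N-k)\,\pi_{A,B}}{N-1}, \qquad \Pi_B(k) = \frac{k\,\pi_{B,A} + (N-k-1)\,\pi_{B,B}}{N-1}. \tag{5}$$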

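Now, the probability to change the number $k$ of agents using strategy A by $\pm 1$ in each time step can be written as (Traulsen et al., 2006)

$$T^{\pm}(k) = \frac{N-k}{N}\,\frac{k}{N}\left[1 + e^{\mp\beta\left[\Pi_A(k) - \Pi_B(k)\right]}\right]^{-1}, \tag{6}$$

where $\Pi_A(k)$ and $\Pi_B(k)$ are the average payoffs in Eq. (5) and $\beta$ is the intensity of selection.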

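The fixation probability of a single mutant with a strategy A in a population of $(N-1)$ agents using B is given by (Traulsen et al., 2006, Nowak et al., 2004)

$$\rho_{B,A} = \left(1 + \sum_{i=1}^{N-1}\prod_{j=1}^{i}\frac{T^{-}(j)}{T^{+}(j)}\right)^{-1}. \tag{7}$$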
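Equations (5) to (7) chain together naturally in code. The following Python sketch assumes the standard forms of these equations from the cited literature (average payoffs over the N − 1 co-players, Fermi-based transition probabilities, and the usual sum-of-products expression for fixation). All function names, and the Donor game values used in the example, are illustrative assumptions.

```python
import math


def average_payoffs(k, N, pi_AA, pi_AB, pi_BA, pi_BB):
    """Average payoffs of an A and a B agent when k of N agents use A (Eq. 5)."""
    Pi_A = ((k - 1) * pi_AA + (N - k) * pi_AB) / (N - 1)
    Pi_B = (k * pi_BA + (N - k - 1) * pi_BB) / (N - 1)
    return Pi_A, Pi_B


def transition_probs(k, N, beta, pi_AA, pi_AB, pi_BA, pi_BB):
    """Probabilities T+(k), T-(k) that the number of A agents changes by +1, -1 (Eq. 6)."""
    Pi_A, Pi_B = average_payoffs(k, N, pi_AA, pi_AB, pi_BA, pi_BB)
    base = (N - k) / N * k / N
    t_plus = base / (1.0 + math.exp(-beta * (Pi_A - Pi_B)))
    t_minus = base / (1.0 + math.exp(beta * (Pi_A - Pi_B)))
    return t_plus, t_minus


def fixation_probability(N, beta, pi_AA, pi_AB, pi_BA, pi_BB):
    """Probability that a single A mutant takes over a population of B (Eq. 7)."""
    total, running_product = 1.0, 1.0
    for j in range(1, N):
        t_plus, t_minus = transition_probs(j, N, beta, pi_AA, pi_AB, pi_BA, pi_BB)
        running_product *= t_minus / t_plus
        total += running_product
    return 1.0 / total


# Example: a defector (A) invading cooperators (B) in a Donor game, b = 2, c = 1.
rho_defector = fixation_probability(100, 0.1, pi_AA=0.0, pi_AB=2.0, pi_BA=-1.0, pi_BB=1.0)
print(rho_defector)
```

Under neutral drift (β = 0) every ratio T−/T+ equals 1, so the expression reduces to 1/N, which is a convenient sanity check for any implementation.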

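Considering a set $\{1, \dots, q\}$ of different strategies, these fixation probabilities determine a transition matrix $M = \{T_{ij}\}_{i,j=1}^{q}$ of a Markov Chain, with $T_{ij} = \rho_{i,j}/(q-1)$ for $j \neq i$, where $\rho_{i,j}$ is the probability that a single $j$ mutant fixates in a population of agents using $i$, and $T_{ii} = 1 - \sum_{j=1, j \neq i}^{q} T_{ij}$. The normalized eigenvector associated with the eigenvalue 1 of the transpose of $M$ provides the stationary distribution described above (Imhof et al., 2005), describing the relative time the population spends adopting each of the strategies.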
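A small Python sketch of this last step is given below. It assumes a convention in which `rho[i][j]` is the probability that a single j mutant fixates in a population of i, and it swaps the explicit eigen-decomposition for power iteration, which converges to the same left eigenvector for eigenvalue 1 when the chain is irreducible and aperiodic. The function names and the toy fixation probabilities are hypothetical.

```python
def transition_matrix(rho):
    """Build the Markov chain over monomorphic states from fixation probabilities.

    rho[i][j]: probability that a single j mutant fixates in a population of i
    (assumed convention); diagonal entries of rho are ignored.
    """
    q = len(rho)
    T = [[rho[i][j] / (q - 1) if j != i else 0.0 for j in range(q)] for i in range(q)]
    for i in range(q):
        T[i][i] = 1.0 - sum(T[i])  # probability of staying in state i
    return T


def stationary_distribution(T, tol=1e-12, max_iter=100_000):
    """Left eigenvector of T for eigenvalue 1, found by power iteration."""
    q = len(T)
    dist = [1.0 / q] * q
    for _ in range(max_iter):
        new = [sum(dist[i] * T[i][j] for i in range(q)) for j in range(q)]
        if max(abs(a - b) for a, b in zip(new, dist)) < tol:
            return new
        dist = new
    return dist


# Toy example with two strategies and made-up fixation probabilities.
T = transition_matrix([[0.0, 0.1], [0.3, 0.0]])
print(stationary_distribution(T))
```

In the toy example the population leaves state 0 three times less readily than state 1, so it spends about three quarters of the time in state 0.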

Risk-dominance

An important measure for comparing two strategies A and B is the direction in which the transition is stronger or more probable: an A mutant fixating in a population of agents using B, $\rho_{B,A}$, or a B mutant fixating in a population of agents using A, $\rho_{A,B}$. It can be shown that the former is stronger, in the limit of large $N$, if (Nowak et al., 2004, Sigmund, 2010)

$$\pi_{A,A} + \pi_{A,B} > \pi_{B,A} + \pi_{B,B}. \tag{8}$$
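The condition can be evaluated directly from four payoff entries. The sketch below assumes the usual pairwise form of the risk-dominance inequality, π(A,A) + π(A,B) > π(B,A) + π(B,B); the function name and the Donor game values are illustrative.

```python
def risk_dominates(pi_AA, pi_AB, pi_BA, pi_BB):
    """True if A is risk-dominant against B (assumed form of inequality 8)."""
    return pi_AA + pi_AB > pi_BA + pi_BB


# Donor game with b = 2, c = 1: defection (A) against cooperation (B).
b, c = 2.0, 1.0
print(risk_dominates(pi_AA=0.0, pi_AB=b, pi_BA=-c, pi_BB=b - c))
```

Without commitments or incentives, defection is risk-dominant against cooperation in this toy game, which is the baseline that the incentive schemes analysed in the paper aim to overturn.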

Acknowledgements

This research is supported by a Leverhulme Research Fellowship entitled “Incentives for Commitment Compliance” (RF-2020-603/9), awarded to The Anh Han.

References

  • A. Akdeniz and M. van Veelen (2021) The evolution of morality and the role of commitment. Evolutionary Human Sciences, pp. 1–53. Cited by: §1, §1, §3.
  • H. Anh, L. M. Pereira, and T. Lenaerts (2015) Avoiding or Restricting Defectors in Public Goods Games?. J. Royal Soc Interface 12 (103), pp. 20141203. External Links: Document Cited by: §1, §3.
  • D. Balliet (2010) Communication and cooperation in social dilemmas: a meta-analytic review. Journal of Conflict Resolution 54 (1), pp. 39–57. Cited by: §1.
  • S. Barrett and R. Stavins (2003) Increasing participation and compliance in international climate change agreements. International Environmental Agreements 3 (4), pp. 349–376. Cited by: §1.
  • S. Barrett (2016) Coordination vs. voluntarism and enforcement in sustaining international environmental cooperation. Proceedings of the National Academy of Sciences 113 (51), pp. 14515–14522. Cited by: §3.
  • M. L. Bruni, L. Nobilio, and C. Ugolini (2009) Economic incentives in general practice: the impact of pay-for-participation and pay-for-compliance programs on diabetes care. Health policy 90 (2-3), pp. 140–148. Cited by: §1.
  • X. Chen and S. S. Komorita (1994) The effects of communication and commitment in a public goods social dilemma. Organizational Behavior and Human Decision Processes 60 (3), pp. 367–386. Cited by: §1, §3.
  • X. Chen, T. Sasaki, Å. Brännström, and U. Dieckmann (2015) First carrot, then stick: how the adaptive hybridization of incentives promotes cooperation. Journal of The Royal Society Interface 12 (102), pp. 20140935. Cited by: §1, §1, §3, §3.
  • T. L. Cherry and D. M. McEvoy (2013) Enforcing compliance with environmental agreements in the absence of strong institutions: an experimental analysis. Environmental and Resource Economics 54 (1), pp. 63–77. Cited by: §1, §3.
  • T. Cimpeanu, C. Perret, and T. A. Han (2021) Cost-efficient interventions for promoting fairness in the ultimatum game. Knowledge-Based Systems 233, pp. 107545. External Links: ISSN 0950-7051, Document, Link Cited by: §1.
  • A. Dannenberg (2016) Non-binding agreements in public goods experiments. Oxford Economic Papers 68 (1), pp. 279–300. Cited by: §1, §3.
  • H. De Silva, C. Hauert, A. Traulsen, and K. Sigmund (2010) Freedom, enforcement, and the social dilemma of strong altruism. Journal of Evolutionary Economics 20 (2), pp. 203–217. Cited by: §3.
  • M. Dos Santos and J. Peña (2017) Antisocial rewarding in structured populations. Scientific Reports 7 (1), pp. 1–14. Cited by: §3.
  • M. H. Duong and T. A. Han (2021) Cost efficiency of institutional incentives in finite populations. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences. Note: (In Press) Cited by: §1, §3, §3.
  • R. H. Frank (1988) Passions Within Reason: The Strategic Role of the Emotions. Norton and Company. Cited by: §1, §3.
  • J. García and A. Traulsen (2019) Evolution of coordinated punishment to enforce cooperation from an unbiased strategy space. Journal of the Royal Society Interface 16 (156), pp. 20190127. Cited by: §3.
  • A. R. Góis, F. P. Santos, J. M. Pacheco, and F. C. Santos (2019) Reward and punishment in climate change dilemmas. Sci. Rep. 9 (1), pp. 1–9. Cited by: §1, §1, §3, §3.
  • T. A. Han and T. Lenaerts (2016) A synergy of costly punishment and commitment in cooperation dilemmas. Adaptive Behavior 24 (4), pp. 237–248. Cited by: §1.
  • T. A. Han, L. M. Pereira, F. C. Santos, and T. Lenaerts (2013) Good agreements make good friends. Scientific reports 3 (2695). Cited by: §1, §3.
  • T. A. Han, F. C. Santos, T. Lenaerts, and L. M. Pereira (2015) Synergy between intention recognition and commitments in cooperation dilemmas. Scientific reports 5 (9312). Cited by: §3.
  • T. A. Han, L. M. Pereira, and T. Lenaerts (2017) Evolution of commitment and level of participation in public goods games. Autonomous Agents and Multi-Agent Systems, pp. 1–23. Cited by: §1.
  • T. A. Han and L. Tran-Thanh (2018) Cost-effective external interference for promoting the evolution of cooperation. Scientific reports 8 (1), pp. 1–9. Cited by: §1.
  • T. A. Han (2013) Intention recognition, commitments and their roles in the evolution of cooperation: from artificial intelligence techniques to evolutionary game theory models. Vol. 9, Springer SAPERE series. External Links: ISBN 978-3-642-37511-8 Cited by: §1.
  • T. A. Han (2016) Emergence of social punishment and cooperation through prior commitments. In AAAI’2016, pp. 2494–2500. Cited by: §3.
  • C. Hauert, A. Traulsen, H. Brandt, M. A. Nowak, and K. Sigmund (2007) Via freedom to coercion: the emergence of costly punishment. Science 316, pp. 1905–1907. Cited by: §3.
  • K. Heidar (2006) Party membership and participation. Handbook of party politics, pp. 301–315. Cited by: §1.
  • B. Herrmann, C. Thöni, and S. Gächter (2008) Antisocial Punishment Across Societies. Science 319 (5868), pp. 1362–1367. Cited by: §3.
  • J. Hofbauer and K. Sigmund (1998) Evolutionary games and population dynamics. Cambridge University Press. Cited by: Evolutionary Dynamics in Finite Population.
  • L. A. Imhof, D. Fudenberg, and M. A. Nowak (2005) Evolutionary cycles of cooperation and defection. Proc. Natl. Acad. Sci. U.S.A. 102, pp. 10797–10800. Cited by: Evolutionary Dynamics in Finite Population, Evolutionary Dynamics in Finite Population, Evolutionary Dynamics in Finite Population.
  • W. Irons (2001) Religion as a hard-to-fake sign of commitment. In Evolution and the capacity for commitment, R. M. Nesse (Ed.), pp. 292–309. Cited by: §1.
  • D. Johnson and J. Bering (2006) Hand of god, mind of man: punishment and cognition in the evolution of cooperation. Evolutionary psychology 4 (1), pp. 147470490600400119. Cited by: §1.
  • N. L. Kerr, J. Garst, D. A. Lewandowski, and S. E. Harris (1997) That still, small voice: commitment to cooperate as an internalized versus a social norm. Personality and social psychology Bulletin 23 (12), pp. 1300–1311. Cited by: §1, §1.
  • S. Krapohl, V. Ocelík, and D. M. Walentek (2021) The instability of globalization: applying evolutionary game theory to global trade cooperation. Public Choice 188 (1), pp. 31–51. Cited by: §1.
  • L. A. Martinez-Vaquero, T. A. Han, L. M. Pereira, and T. Lenaerts (2015) Apology and forgiveness evolve to resolve failures in cooperative agreements. Scientific reports 5 (10639). Cited by: §1, §3.
  • S. Mathew and R. Boyd (2009) When does optional participation allow the evolution of cooperation?. Proceedings of the Royal Society B: Biological Sciences 276 (1659), pp. 1167–1174. Cited by: §3.
  • J. Maynard-Smith (1982) Evolution and the theory of games. Cambridge University Press, Cambridge. Cited by: §1.
  • M. McCullough (2008) Beyond revenge: the evolution of the forgiveness instinct. John Wiley & Sons. Cited by: §3.
  • R. M. Nesse (2001) Evolution and the capacity for commitment. Foundation series on trust, Russell Sage. External Links: ISBN 9780871546227, LCCN 2001041781 Cited by: §1, §1, §3.
  • H. K. Nguyen, R. Chiong, M. Chica, R. Middleton, and D. Thi Kim Pham (2019) Contract farming in the mekong delta’s rice supply chain: insights from an agent-based modeling study. Journal of Artificial Societies and Social Simulation 22 (3), pp. 1. External Links: ISSN 1460-7425, Link, Document Cited by: §1.
  • M. A. Nowak, A. Sasaki, C. Taylor, and D. Fudenberg (2004) Emergence of cooperation and evolutionary stability in finite populations. Nature 428, pp. 646–650. Cited by: Risk-dominance, Evolutionary Dynamics in Finite Population, Evolutionary Dynamics in Finite Population, Evolutionary Dynamics in Finite Population.
  • M. A. Nowak and K. Sigmund (2005) Evolution of indirect reciprocity. Nature 437, pp. 1291–1298. Cited by: §1.
  • M. A. Nowak (2006) Evolutionary dynamics: exploring the equations of life. Harvard University Press, Cambridge, MA. Cited by: §3.
  • N. B. Ogbo, A. Elgarig, and T. A. Han (2021) Evolution of coordination in pairwise and multi-player interactions via prior commitments. Adaptive Behavior (In Press). Note: Preprint arXiv:2009.11727 Cited by: §3.
  • E. Ostrom (1990) Governing the commons: the evolution of institutions for collective action. Cambridge university press. Cited by: §1.
  • S. P. Otto and T. Day (2007) A biologist’s guide to mathematical modeling in ecology and evolution. Vol. 6, Princeton University Press, Princeton, NJ. External Links: ISBN 9780691123448 Cited by: §1, Evolutionary Stable Strategies (ESS).
  • D. G. Rand and M. A. Nowak (2011) The evolution of antisocial punishment in optional public goods games.. Nature Communications 2, pp. 434. External Links: ISSN 2041-1723 Cited by: §3.
  • F. C. Santos and J. M. Pacheco (2011) Risk of collective failure provides an escape from the tragedy of the commons. PNAS 108 (26), pp. 10421–10425. Cited by: §1.
  • T. Sasaki, Å. Brännström, U. Dieckmann, and K. Sigmund (2012) The take-it-or-leave-it option allows small penalties to overcome social dilemmas. Proceedings of the National Academy of Sciences 109 (4), pp. 1165–1169. Cited by: §1, §1, §1, §3, §3.
  • T. Sasaki, I. Okada, S. Uchida, and X. Chen (2015) Commitment to cooperation and peer punishment: its evolution. Games 6 (4), pp. 574–587. Cited by: §1, §3.
  • N. K. Sehgal (2021) Impact of vax-a-million lottery on covid-19 vaccination rates in ohio. The American Journal of Medicine. Cited by: §1.
  • D. Shelton (2003) Commitment and compliance: the role of non-binding norms in the international legal system. Oxford University Press on Demand. Cited by: §1.
  • K. Sigmund, H. D. Silva, A. Traulsen, and C. Hauert (2010) Social learning promotes institutions for governing the commons. Nature 466, pp. 7308. Cited by: §1, §1, §3, §3.
  • K. Sigmund (2010) The calculus of selfishness. Princeton University Press. Cited by: §1, §1, §3, Prisoner’s Dilemma (PD), Risk-dominance, Evolutionary Dynamics in Finite Population.
  • M. P. Singh (2013) Norms as a basis for governing sociotechnical systems. ACM Transactions on Intelligent Systems and Technology (TIST) 5 (1), pp. 21. Cited by: §1.
  • K. Sterelny (2012) The evolved apprentice. MIT Press. Cited by: §3.
  • W. Sun, L. Liu, X. Chen, A. Szolnoki, and V. V. Vasconcelos (2021) Combination of institutional incentives for cooperative governance of risky commons. Iscience 24 (8), pp. 102844. Cited by: §1, §3.
  • A. Szolnoki and M. Perc (2015) Antisocial pool rewarding does not deter public cooperation. Proceedings of the Royal Society B: Biological Sciences 282 (1816), pp. 20151975. Cited by: §3.
  • D. Tappin, L. Bauld, D. Purves, K. Boyd, L. Sinclair, S. MacAskill, J. McKell, B. Friel, A. McConnachie, L. De Caestecker, et al. (2015) Financial incentives for smoking cessation in pregnancy: randomised controlled trial. Bmj 350. Cited by: §1.
  • M. Tomasello, M. Carpenter, J. Call, T. Behne, and H. Moll (2005) Understanding and sharing intentions: the origins of cultural cognition. Behavioral and brain sciences 28 (05), pp. 675–691. Cited by: §3.
  • A. Traulsen, M. A. Nowak, and J. M. Pacheco (2006) Stochastic dynamics of invasion and fixation. Phys. Rev. E 74, pp. 11909. Cited by: Evolutionary Dynamics in Finite Population, Evolutionary Dynamics in Finite Population.
  • S. Wang, X. Chen, and A. Szolnoki (2019) Exploring optimal institutional incentives for public cooperation. Communications in Nonlinear Science and Numerical Simulation 79, pp. 104914. Cited by: §1, §1, §3.
  • P. Zumbansen (2007) The law of society: governance through contract. Indiana Journal of Global Legal Studies 14 (2), pp. 191–233. Cited by: §1.

4 Supporting Information (SI)

Parameter Symbol Range/Value Analysed
Population size 100
Intensity of selection {}
Payoff matrix R, S, T, P {}
Per capita budget
Fraction of the budget for rewarding participation
Cost of commitment participation
Error probability
Table S2: Model parameters and parameter space analysed

4.1 Analytical results: risk dominance analysis

4.1.1 In absence of errors

In case of reward: For ACD to be risk dominant against all other strategies (except for ACC to which it is neutral)

In case of punishment: For ACD to be risk dominant against all other strategies (except for ACC to which it is neutral)

When , they are equivalent to

Also, for Donation game, i.e., , the equations are simplified to, respectively,

The first inequality (for both reward and punishment) suggests that a larger fraction of the budget spent on rewarding participation makes it more difficult for ACD to be risk dominant against commitment-accepting players. A larger reward for participation provides a greater advantage to defective committers (e.g., ADD, ADC), since a smaller budget is then available for incentivising commitment compliance. ADD becomes increasingly successful as a result. Also from this equation, as , the necessary condition is that the budget size must be at least equal to (or in case of the Donation game).

On the other hand, a larger reward (i.e., greater value of ) for participation provides an advantage to all committers against non-participating players (see the second inequality). It is equivalent to reducing the cost of participation by for reward and for punishment, respectively, compared to when no participation rewarding is present. Thus, the advantage of increasing is greater for punishment than for reward. Combining the two effects, one might expect that there is an optimal value of that leads to the highest frequency of ACD. These analytical observations are in line with the numerical results reported in the main text.

4.1.2 In presence of errors (at pre-commitment state)

Differently from the scenario without noise, ACD can be risk dominant against ACC if, for both types of incentive,

which always holds given that .

Now, for ACD to be risk dominant against all other strategies, including ACC, it must hold that

for both types of incentive, plus the following condition, differently for (pure) reward and punishment. For reward,

For punishment,

4.2 Errors during the games

In the main text, we focused on errors at the pre-commitment stage. Here we consider that with a small probability an error occurs when an intended action is carried out during the interaction (i.e. the PD game). We assume that this probability is the same regardless of the presence of an agreement. All the payoff matrices described in Methods (main text) can be re-written as follows.

Namely, for a payoff matrix , and for and , and denote and , the payoff a player received when playing against another player , i.e., , is given by

when a commitment is formed, i.e. when . Otherwise (), it is given by

We can show that this type of error during the interactions, unlike errors at the pre-commitment stage, changes neither the equivalence of NCC and NDC and of NCD and NDD, nor the neutrality between ACC and ACD and between ADC and ADD. Thus, as in the case of no errors, none of the strategies can be an ESS. Also, ACD cannot be risk-dominant against all other strategies. Given this, we have focused in this paper on errors in the decision whether to participate in a prior commitment.

Figure S1: (Pure) Reward and punishment in promoting cooperation. Frequency of cooperation when either reward or punishment is applied to those who honour (i.e. ACC and ACD) or dishonour (i.e. ADC and ADD) an adopted commitment, for varying (cost of commitment) and per-interaction budget for incentives (). When , it is equivalent to when no policy is applied, providing a baseline reference. To ensure a high frequency of cooperation, a sufficient budget for providing incentives () is required. However, cooperation always decreases when increases, for both reward and punishment. Reward ensures a similarly high frequency of cooperation for a larger range of these parameters. Other parameters: population size , , .
Figure S2: Frequency of strategies as a function of , when rewarding of participation is applied, besides either punishment or reward as before (Small budget, ). We show results for different values of . Other parameters: population size , , .
Figure S3: Rewarding participation. Frequency of strategies as a function of , when rewarding of participation is applied, besides either punishment (A) or reward (B). For small (smaller than ), the frequency of NDD decreases and those of ACD and ACC increase. This increase is more significant for punishment than for reward. When , the frequency of ADD starts to increase quickly and ADD becomes dominant in the population, since the remaining budget for incentivising behaviours in the game becomes insufficient. Other parameters: population size , , , , .
Figure S4: Frequency of strategies as a function of , when rewarding of participation is applied, besides either punishment or reward as before (Large budget, ). We show results for different values of . Other parameters: population size , , .
Figure S5: Frequency of strategies for varying error/noise probability (at the pre-commitment decision stage), for pure punishment and reward (). We consider scenarios with a small cost of commitment (, top two rows) and with a large one (, bottom two rows). When the budget is large , ACD benefits significantly from having some noise (since it can then dominate ACC; see the risk-dominance analysis). Other parameters: population size , , .