# Tuning Cooperative Behavior in Games with Nonlinear Opinion Dynamics

We examine the tuning of cooperative behavior in repeated multi-agent games using an analytically tractable, continuous-time, nonlinear model of opinion dynamics. Each modeled agent updates its real-valued opinion about each available strategy in response to payoffs and other agent opinions, as observed over a network. We show how the model provides a principled and systematic means to investigate behavior of agents that select strategies using rationality and reciprocity, key features of human decision-making in social dilemmas. For two-strategy games, we use bifurcation analysis to prove conditions for the bistability of two equilibria and conditions for the first (second) equilibrium to reflect all agents favoring the first (second) strategy. We prove how model parameters, e.g., level of attention to opinions of others (reciprocity), network structure, and payoffs, influence dynamics and, notably, the size of the region of attraction to each stable equilibrium. We provide insights by examining the tuning of the bistability of mutual cooperation and mutual defection and their regions of attraction for the repeated prisoner's dilemma and the repeated multi-agent public goods game. Our results generalize to games with more strategies, heterogeneity, and additional feedback dynamics, such as those designed to elicit cooperation.


## I Introduction

Sociologists, political scientists, and economists have long argued that reciprocity is key to promoting cooperation [1, 2, 3]. Computer simulations have shown that reciprocal strategies can elicit mutual cooperation in repeated games: the winning strategy for the repeated prisoner’s dilemma in Axelrod’s tournaments was Tit-for-Tat (TFT), where an agent reciprocates the opponent’s strategy in the previous round; more generally, successful strategies were nice, forgiving, provocable, and clear [2]. Subsequent laboratory studies have revealed that humans in fact employ such reciprocity-based rules in repeated interactions [4, 5, 6]. However, the observed reciprocity cannot be recapitulated by game-theoretic models of rational, payoff-maximizing agents, which, in contrast to the experiments, predict convergence toward mutual defection, i.e., the Nash equilibrium in a social dilemma.

Here we investigate the tuning of cooperative behavior, including mutual cooperation or coordination, in repeated games among agents that rely on both rationality and reciprocity. Our first key contribution is a new framework for studying multi-agent repeated games using the nonlinear opinion dynamics model [7] (see also [8]) in which agents’ strategic decisions depend not only on payoffs, as in rationality models [9, 10], but also on social interactions that enable agents to observe strategy preferences (opinions) of other agents. We show how the social interaction term, formulated as a saturation function of observed opinions, provides a representation of reciprocity and a means to tune cooperation (or coordination) in social dilemmas.

Our second key contribution leverages analytical tractability of the model: we prove conditions for bistability of two equilibria for repeated two-strategy games in which multiple agents observe the opinions of others over a fixed network. We also show conditions under which each equilibrium corresponds to all agents favoring one of the two strategies. Our proof relies on a bifurcation analysis that builds on the results of [7]. We prove how the bistability of equilibria and the regions of attraction depend on level of attention to observed opinions (reciprocity), network structure, payoffs, and other model parameters. We apply our theory to the two-agent prisoner’s dilemma and the multi-agent public goods game to present further insights on how mutual cooperation emerges through social interaction (reciprocity) and how the predicted likelihood of cooperation can be tuned. Our results apply analogously to tuning coordination in games like the Stag Hunt. Our analytical results complement the large literature on reciprocity-based decision-making [2] that evaluates agents’ long-term interaction with computer simulations.

Most models of opinion dynamics in the literature use an opinion updating process that relies on a linear weighted average of exchanged opinions, as in the original work of DeGroot [11]. The nonlinear opinion dynamics model of [7] instead applies a saturation function to exchanged opinions, making the updating process fundamentally nonlinear and thus allowing for multistability of equilibria, a key aspect of our project. For a comprehensive review of, and comparison with, other opinion dynamics models see [7]. Our investigation of the means to tune cooperation in social dilemmas is also distinguished from works such as [12, 13] that examine opinion dynamics using game-theoretic approaches.

Our approach is also distinguished from the investigations in [7]: evolving opinions, which represent strategy preferences, depend not only on saturated opinion exchange but also on the payoff mechanism of the game. Our results are also new: they explain the emergence of mutual cooperation (or coordination) in social dilemmas as one of two bistable equilibria that arise through a pitchfork bifurcation.

In §II, we introduce the nonlinear opinion dynamics model and show how it recovers rationality and reciprocity. In §III, for two-strategy games, we prove the bistability of equilibria and expressions for the tunability of those equilibria and their corresponding regions of attraction in terms of system parameters. We apply the theory to the prisoner’s dilemma and public goods game. In §IV we use numerical simulations to illustrate the theoretical predictions on the tuning of cooperation. In §V, we discuss extensions and generalizations.

## II Opinion Dynamics in Games

Consider an $N_a$-agent decision-making problem where each agent $i$ selects a strategy, continuously in time $t$, from the set $\{1, \dots, N_s\}$ of available strategies. Each agent performs a probabilistic choice of strategy, where $x_i(t) \in \Delta$ is the probability distribution for the strategy selection at time $t$ of agent $i$ and $\Delta$ is the probability simplex in $\mathbb{R}^{N_s}$. The $j$-th element $x_{ij}$ of $x_i$ is the probability that agent $i$ selects strategy $j$. Following convention in game theory [14], $x_i$ is the mixed strategy of agent $i$ and $x = (x_1, \dots, x_{N_a})$ is the mixed strategy profile, where $x \in \Delta^{N_a}$.

The mixed strategy $x_i$ is defined by the logit choice function [10] and depends on agent $i$'s opinion state at time $t$, $\bar z_i(t) \in \mathbb{R}^{N_s}$, as follows:

$$x_{ij} = \sigma_j(\bar z_i) = \frac{\exp(\eta^{-1}\bar z_{ij})}{\sum_{l=1}^{N_s} \exp(\eta^{-1}\bar z_{il})}, \qquad (1)$$

where the positive constant $\eta$ is called the noise level [15] or rationality parameter [16]. (Footnote 1: For simplicity, we assume that $\eta$ is identical across the agents.) Each entry $\bar z_{ij}$ of $\bar z_i$ represents agent $i$'s preference for the $j$-th available strategy. The relative opinion state defines an agent's preferred strategies, i.e., the inequality $\bar z_{ij} > \bar z_{il}$ can be interpreted as the agent favoring strategy $j$ relative to strategy $l$, and the magnitude of the difference denotes the level of its preference. Under the logit choice (1), the higher $\bar z_{ij}$ relative to other entries of $\bar z_i$, the more likely agent $i$ selects strategy $j$. (1) can be interpreted as the best response with respect to the opinion state subject to a random perturbation [15].
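The logit choice (1) is a softmax of the opinion state scaled by $\eta^{-1}$. A minimal sketch in Python (the numbers are illustrative, not from the paper):

```python
import numpy as np

def logit_choice(z_bar: np.ndarray, eta: float) -> np.ndarray:
    # Mixed strategy x_i = sigma(z_bar_i) of (1): softmax of the opinion
    # state with noise level eta; smaller eta -> closer to a best response.
    w = z_bar / eta
    w = w - w.max()          # shift for numerical stability (value unchanged)
    e = np.exp(w)
    return e / e.sum()

# An agent with opinions (1, -1) and small eta almost surely picks strategy 1.
x = logit_choice(np.array([1.0, -1.0]), eta=0.1)
```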

Given mixed strategy profile $x$, we let $U_i(x)$ be the payoff function for agent $i$. Entry $U_{ij}(x)$ defines agent $i$'s payoff associated with strategy $j$. The following are examples of multi-agent games.

###### Example 1 (Prisoner’s Dilemma)

Consider two agents, each with two available strategies: cooperate (strategy 1) and defect (strategy 2). When both agents cooperate or defect, they receive payoff $p_{CC}$ or $p_{DD}$, respectively. If one defects while the other cooperates, the former receives payoff $p_{DC}$ and the latter receives $p_{CD}$. The payoff function is

$$U_i(x) = \begin{pmatrix} U_{i1}(x)\\ U_{i2}(x)\end{pmatrix} = \begin{pmatrix} p_{CC} & p_{CD}\\ p_{DC} & p_{DD}\end{pmatrix} x_{-i}, \quad i \in \{1,2\}, \qquad (2)$$

where, as shorthand notation, we let $x_{-1} = x_2$ and $x_{-2} = x_1$. The parameters satisfy $p_{DC} > p_{CC} > p_{DD} > p_{CD}$, which means that the agents have individual incentives to defect and receive $p_{DD}$, even though they would receive the higher payoff $p_{CC}$ by cooperating.
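As a concrete sketch, the expected payoff (2) is a matrix-vector product; the entries below are illustrative values satisfying the prisoner's dilemma ordering, not values from the paper:

```python
import numpy as np

# Illustrative payoffs with p_DC > p_CC > p_DD > p_CD.
P = np.array([[3.0, 0.0],    # (p_CC, p_CD)
              [5.0, 1.0]])   # (p_DC, p_DD)

def pd_payoff(x_opponent: np.ndarray) -> np.ndarray:
    # U_i(x) = P @ x_{-i} as in (2): expected payoff of cooperating (entry 0)
    # and defecting (entry 1) against the opponent's mixed strategy.
    return P @ x_opponent

# Against a pure cooperator, defection pays 5 > 3, so it is the rational move.
u_vs_cooperator = pd_payoff(np.array([1.0, 0.0]))
```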

###### Example 2 (Public Goods Game)

There are $N_a$ agents and $N_s$ strategies. Each agent has a total wealth of $a(N_s - 1)$ and selects a strategy $j$ in $\{1, \dots, N_s\}$ that corresponds to contributing $a(N_s - j)$ to a public pool. The total contribution is multiplied by a factor $\rho$ and distributed equally among all agents. The payoff function is

$$U_{ij}(x) = a(j-1) + \frac{\rho}{N_a}\sum_{k=1,\,k\neq i}^{N_a}\sum_{l=1}^{N_s} a(N_s - l)\, x_{kl} + \frac{\rho}{N_a}\, a(N_s - j), \quad i \in \{1,\dots,N_a\},\ j \in \{1,\dots,N_s\}, \qquad (3)$$

where $a > 0$ and $1 < \rho < N_a$. According to (3), regardless of the others' contributions, each agent receives the highest payoff when it makes no contribution to the pool. Hence, the rational agent contributes nothing, i.e., chooses $j = N_s$.
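A direct transcription of (3) can serve as a sanity check that contributing nothing is dominant whenever $\rho < N_a$; the function below is a sketch in the paper's notation (1-indexed strategies), with illustrative arguments:

```python
import numpy as np

def public_goods_payoff(x: np.ndarray, a: float, rho: float) -> np.ndarray:
    # Payoff matrix U[i, j] of (3) for the public goods game. x is the
    # (N_a, N_s) mixed-strategy profile; strategy j (1-indexed) corresponds
    # to contributing a*(N_s - j), so j = N_s contributes nothing.
    Na, Ns = x.shape
    j = np.arange(1, Ns + 1)
    kept = a * (j - 1)                 # wealth kept by choosing strategy j
    contrib = a * (Ns - j)             # contribution of strategy j
    expected = x @ contrib             # each agent's expected contribution
    U = np.empty((Na, Ns))
    for i in range(Na):
        others = expected.sum() - expected[i]
        U[i] = kept + (rho / Na) * (others + contrib)
    return U

# With 1 < rho < N_a, the no-contribution strategy (last column) dominates.
U = public_goods_payoff(np.full((4, 3), 1.0 / 3.0), a=1.0, rho=2.0)
```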

We define the rate of change of agent $i$'s opinion state in response to payoffs and social interactions with the continuous-time nonlinear opinion dynamics model [7]. (Footnote 2: In §III, we explain how (4) relates to its original form presented in [7]. For concise presentation, we omit time dependency of the variables in (4).)

$$\dot{\bar z}_{ij} = -d_i\Big(\bar z_{ij} - u_i \sum_{k=1}^{N_a} 2R\big(A^j_{ik}\bar z_{kj}\big) - U_{ij}(x)\Big), \qquad (4)$$

with $x_{ij}$ given by (1). $A^j_{ik}$ is the weight agent $i$ places in its evaluation of strategy $j$ on its observation of agent $k$'s opinion of strategy $j$. The constant resistance parameter $d_i > 0$ reflects the speed with which agent $i$'s opinions change; the attention parameter $u_i \geq 0$ reflects the weight placed on incentives derived from social interactions. Thus, the state $\bar z_i$ of agent $i$, and hence its strategy selection, evolves according to the accumulation over time, with the discount factor $d_i$, of the payoffs $U_{ij}(x)$ and social incentives $2R(A^j_{ik}\bar z_{kj})$.

We define $R$ as the saturating function

$$R\big(A^j_{ik}\bar z_{kj}\big) = \mathbb{P}\big(A^j_{ik}\bar z_{kj} \geq \epsilon\big), \qquad (5)$$

where $\epsilon$ is a random variable with a symmetric and unimodal probability density function, e.g., the standard normal distribution. To interpret, suppose $A^j_{ik} > 0$. Then $A^j_{ik}$ quantifies the influence of noise on inter-agent interactions: the larger $A^j_{ik}$, the smaller the effect of the noise $\epsilon$. (Footnote 3: See §II-C for more discussion of the parameter $A^j_{ik}$.) Thus, we can interpret (5) as a probabilistic model of agent $i$'s perception of agent $k$'s preference for strategy $j$ over other strategies.
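Under the assumption that $\epsilon$ is standard normal, $R(y) = \mathbb{P}(y \geq \epsilon) = \Phi(y)$, the Gaussian CDF. The sketch below Euler-integrates (4) for two all-to-all coupled agents with unit weights and zero payoff, purely to show the saturating social term at work; all parameter values are illustrative:

```python
import numpy as np
from math import erf, sqrt

def R(y: float) -> float:
    # Saturation (5) with eps ~ N(0, 1): R(y) = P(y >= eps) = Phi(y).
    return 0.5 * (1.0 + erf(y / sqrt(2.0)))

def zdot(z: np.ndarray, A: np.ndarray, U: np.ndarray,
         d: float, u: float) -> np.ndarray:
    # Right-hand side of (4) for a single strategy j: z[i] is agent i's
    # opinion, A[i, k] the interaction weight, U[i] the current payoff.
    n = len(z)
    social = np.array([sum(2.0 * R(A[i, k] * z[k]) for k in range(n))
                       for i in range(n)])
    return -d * (z - u * social - U)

# Two reciprocating agents (all weights 1), zero payoff: opinions converge
# to a positive fixed point of (6) even from mixed initial opinions.
A = np.ones((2, 2))
z = np.array([0.5, -0.2])
for _ in range(2000):
    z = z + 0.01 * zdot(z, A, U=np.zeros(2), d=1.0, u=2.0)
```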

### II-A Emergence of Cooperative Equilibrium

In this section, using the prisoner's dilemma as an illustrative example, we provide intuition for how the equilibria of (4) depend on system parameters, and under what parameter regime a cooperative equilibrium emerges. To simplify the presentation, let $d_i = d$, $u_i = u$, $A^j_{ii} = \alpha$, and $A^j_{ik} = \gamma$ if $k \neq i$. Let $\bar z^*$ be an equilibrium of (4) that satisfies

$$\bar z^*_{ij} = 2u\Big(R\big(\alpha \bar z^*_{ij}\big) + \sum_{k=1,\,k\neq i}^{N_a} R\big(\gamma \bar z^*_{kj}\big)\Big) + U_{ij}(x^*), \qquad (6)$$

where $x^*$ is the mixed strategy profile associated with $\bar z^*$ through (1).

Note that by (5), as the influence of the noise in the social interaction becomes arbitrarily small, i.e., as $\alpha, \gamma$ become arbitrarily large, $R$ converges to a binary ($\{0,1\}$-valued) function on a dense subset of the opinion space. If $\alpha, \gamma$ are sufficiently large, we can approximate (6) as $\bar z^*_{ij} \approx 2u\, n_j + U_{ij}(x^*)$, where $n_j$ is the number of agents having a positive opinion of strategy $j$ at equilibrium. As the attention $u$ increases, each agent tends to favor the most popular strategy even though selecting other strategies would return higher payoffs. It follows that the social interaction incentivizes each agent to reciprocate with other agents in the strategy selection, and the level of reciprocation is determined by the attention parameter $u$ and the number of agents preferring the same strategy under consideration.

Example: With two reciprocating agents ($N_a = 2$, $\gamma > 0$) playing the prisoner's dilemma ($N_s = 2$), the approximate equilibrium satisfies $\bar z^*_{ij} \approx 2u\, n_j + U_{ij}(x^*)$, where $n_j$ indicates whether the opponent favors strategy $j$. If the attention parameter $u$ is sufficiently large, then for sufficiently large $\alpha, \gamma$, cooperation becomes an equilibrium of (4). Moreover, given any arbitrarily large $\alpha, \gamma$, there is a minimum value of $u$ below which cooperation will not be an equilibrium.

### II-B Rationality and Reciprocity in the Model

In this section we show how the model (4) captures a range of features observed in human decision-making, including (bounded) rationality [17] and reciprocity [3, 1]. We begin by showing that (4) generalizes the exponentially discounted reinforcement learning (EXP-D-RL) model studied in [9], where every agent makes an individually rational decision by selecting payoff-maximizing strategies. To see this, let $A^j_{ik} = 0$ for all $k$, for which the social interaction becomes constant, i.e., $\sum_k 2R(0) = N_a$. By translating $\bar z_{ij}$ by this constant, and since the logit choice function is invariant with respect to translation of $\bar z_i$, (4) specializes to

$$\dot{\bar z}_{ij} = -d_i\big(\bar z_{ij} - U_{ij}(x)\big), \qquad x_{ij} = \frac{\exp(\eta^{-1}\bar z_{ij})}{\sum_{l=1}^{N_s}\exp(\eta^{-1}\bar z_{il})},$$

which is the EXP-D-RL model presented in [9]. In this sense, our model (4) realizes rationality.

To discuss reciprocity of the opinion dynamics, we consider a two-agent two-strategy case. Suppose that $R(\gamma \bar z_{kj}) \approx 1$ if $\bar z_{kj} > 0$ and $R(\gamma \bar z_{kj}) \approx 0$ otherwise, where $\eta$ is the noise level constant in the logit choice function (1). Then, with $\eta$ small, we have $x_{ij} \approx 1$ if $\bar z_{ij} > \bar z_{il}$, $l \neq j$, and $x_{ij} \approx 0$ otherwise.

For small $h > 0$, assuming that the payoff term is arbitrarily small relative to the social term, we can approximate the opinion dynamics model (4) as

$$\bar z_{ij}(t+h) - \bar z_{ij}(t) \approx -h\, d_i\big(\bar z_{ij}(t) - 2u_i\, x_{-i,j}(t)\big).$$

For sufficiently large $d_i$, by evaluating the opinion state at the time instant $t + h$ with $h = d_i^{-1}$, we observe that

$$\bar z_{ij}(t+h) \approx 2u_i\, x_{-i,j}(t). \qquad (7)$$

Recall that $x_{-i,j}$ is the $j$-th entry of the mixed strategy of the opponent of agent $i$. According to (7), with large $u_i$, it holds that $x_{ij}(t+h) \approx 1$ if and only if $x_{-i,j}(t) \approx 1$. In the prisoner's dilemma, under (7), each agent $i$ decides to cooperate (or defect) if its opponent does so at the previous stage. This behavior resembles TFT, a well-known reciprocity-based strategy in discrete-time iterated games [2]. In this sense, our model (4) realizes reciprocity.
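The TFT-like behavior of (7) can be sketched as a discrete-time recursion; `next_opinion` and `choose` are hypothetical helper names, and the parameter values are illustrative:

```python
import numpy as np

def next_opinion(x_opp: float, u: float) -> float:
    # One step of (7): the opinion about a strategy is set to 2*u times the
    # opponent's probability of playing that strategy at the previous step.
    return 2.0 * u * x_opp

def choose(z_rel: float, eta: float) -> float:
    # Two-strategy logit choice (1): probability of cooperating given the
    # relative opinion z_rel = z_cooperate - z_defect.
    return float(1.0 / (1.0 + np.exp(-z_rel / eta)))

# Agent 1 starts as a cooperator, agent 2 as a defector; each round every
# agent mirrors its opponent's previous move, so the moves swap each round.
u, eta = 5.0, 0.1
x1, x2 = 1.0, 0.0   # P(cooperate) for agents 1 and 2
for _ in range(3):
    x1, x2 = (choose(next_opinion(x2, u) - next_opinion(1.0 - x2, u), eta),
              choose(next_opinion(x1, u) - next_opinion(1.0 - x1, u), eta))
```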

### II-C Further Remarks on the Model (4)

Social interaction encourages reciprocity: When $A^j_{ik} > 0$ for $k \neq i$, the social interaction in (4) encourages reciprocity by incentivizing each agent to select the strategies preferred by other agents. As shown in §IV, in the prisoner's dilemma and public goods game, such a social interaction mechanism leads to decision-making representative of human behavior; notably, the agents conditionally cooperate. This contrasts with the outcomes of rationality-based models where agents fail to cooperate (or coordinate).

Our model and analysis can be readily extended to a more general case, as in [7], where the social interaction term in (4) also sums saturated opinions across strategies. In this generalization, agent $i$'s opinion of strategy $j$ may also depend on other agent opinions of strategies $l \neq j$.

Network structure: The weights $A^j_{ik}$ in (4) define a network structure among agents for strategy $j$. One can specify the presence ($A^j_{ik} > 0$ for reciprocal, $A^j_{ik} < 0$ for antagonistic) or lack ($A^j_{ik} = 0$) of interaction between agents $i$ and $k$ in their selecting strategy $j$. We prove results on the role of network structure in our model in §III. See [7, 18] for more on network structure and the nonlinear opinion dynamics.

## III Bistability Analysis of 2-Strategy Games

We present bistability analysis for (4) in two-strategy games with homogeneous parameters. (Footnote 4: The proofs of all the theorems are provided in the Appendix.) We assume $\mathcal G = (\mathcal V, \mathcal E)$ and $\hat{\mathcal G} = (\mathcal V, \hat{\mathcal E})$, with node set $\mathcal V = \{1,\dots,N_a\}$, are simple graphs governing the social interaction and game interaction, respectively, and $A$ and $\hat A$ are the corresponding adjacency matrices. We assume the payoff function has the form:

$$\begin{pmatrix} U_{i1}(x)\\ U_{i2}(x)\end{pmatrix} = \sum_{k \in \hat{\mathcal E}_i} \begin{pmatrix} p_{11} & p_{12}\\ p_{21} & p_{22}\end{pmatrix} x_k + \begin{pmatrix} b_1\\ b_2\end{pmatrix}, \qquad (8)$$

where $\hat{\mathcal E}_i$ denotes the neighbors of agent $i$ in $\hat{\mathcal G}$, and the parameters of (4) are given by $d_i = d$, $u_i = u$, $A^j_{ii} = \alpha$, and $A^j_{ik} = \gamma$ if $(i,k) \in \mathcal E$.

For analysis, we adopt the original form of (4) from [7]:

$$\dot z_{ij} = F_{ij}(z) - \frac{1}{N_s}\sum_{l=1}^{N_s} F_{il}(z), \qquad \sum_{j=1}^{N_s} z_{ij}(0) = 0, \qquad (9)$$

$$F_{ij}(z) = -d\Big(z_{ij} - u\big(S(\alpha z_{ij}) + \textstyle\sum_{k \in \mathcal E_i} S(\gamma z_{kj})\big) - U_{ij}(x)\Big),$$

where $\mathcal E_i$ denotes the neighbors of agent $i$ in $\mathcal G$ and the saturation function is given by $S(y) = 2R(y) - 1$. The variable $z_{ij}$ denotes the relative opinion state. In Theorem 1, we show that models (4) and (9) are related by the projection $z_{ij} = \bar z_{ij} - \frac{1}{N_s}\sum_{l=1}^{N_s}\bar z_{il}$, and yield the same transient and steady-state mixed-strategy behavior.

###### Theorem 1

The following two statements are true.

i) If $\bar z$ is a solution of (4), then $z$, satisfying $z_{ij} = \bar z_{ij} - \frac{1}{N_s}\sum_{l=1}^{N_s}\bar z_{il}$, is a solution of (9). Conversely, every solution $z$ of (9) is the projection of a solution $\bar z$ of (4).

ii) If $\bar z^*$ is a stable (unstable) equilibrium of (4), then $z^*$, satisfying $z^*_{ij} = \bar z^*_{ij} - \frac{1}{N_s}\sum_{l=1}^{N_s}\bar z^*_{il}$, is a stable (unstable) equilibrium of (9). Conversely, every stable (unstable) equilibrium $z^*$ of (9) is the projection of a stable (unstable) equilibrium $\bar z^*$ of (4).

We further assume that $S$ satisfies the following conditions: $S$ is odd sigmoidal, i.e., $S(-y) = -S(y)$, $S$ is increasing and bounded, and $S'(0) = 1$. (Footnote 5: To simplify the notation, without loss of generality, we make the assumption that $S'(0) = 1$, for instance, by rescaling $u$.) Since $N_s = 2$, we can simplify the expression (9) as

$$\dot z = -d\Big(z - u\big(S(\alpha z) + A S(\gamma z)\big) - \tfrac{1}{4}\, p\, \hat A \tanh(\eta^{-1} z) - \tfrac{1}{4}\, p^{\perp} \hat A \mathbf{1} - (b_1 - b_2)\mathbf{1}\Big), \qquad (10)$$

with $z = (z_{11}, \dots, z_{N_a 1})$, $S$ applied entrywise, $p = p_{11} - p_{12} - p_{21} + p_{22}$, and $p^{\perp} = p_{11} + p_{12} - p_{21} - p_{22}$.

###### Theorem 2 (Bistability in games)

Consider (10). Let $\lambda_{\max}$ be the largest-real-part eigenvalue of the matrix governing the linearization of (10) at the origin (see (15)) and $v_{\max}$ ($w_{\max}$) be its corresponding right (left) eigenvector.

i) Suppose $\lambda_{\max}$ is real and simple, and $\langle w_{\max}, v_{\max}\rangle \neq 0$ holds. When $p^{\perp} = 0$ and $b_1 = b_2$, there exists a critical value $u^*$ for which if $u < u^*$, the origin is locally exponentially stable, and if $u > u^*$, the origin is unstable and two bistable equilibrium solution branches emerge in a symmetric pitchfork bifurcation along a manifold tangent to the span of $v_{\max}$. When $p^{\perp}$ and/or $b_1 - b_2$ are nonzero, the system is an unfolding of the symmetric pitchfork bifurcation, and the parameter

$$b = \Big\langle w_{\max},\ \tfrac{1}{4}\, d\, p^{\perp} \hat A \mathbf{1} + d\,(b_1 - b_2)\mathbf{1} \Big\rangle \qquad (11)$$

determines the direction of the unfolding. Furthermore, $u^*$ depends on the noise level $\eta$.

ii) Suppose the linearization matrix in i) is an irreducible nonnegative matrix. (Footnote 6: This holds, e.g., when $\alpha > 0$, $\gamma > 0$, and at least one of $\mathcal G$, $\hat{\mathcal G}$ corresponds to a connected graph.) Near $u = u^*$, for the bistable equilibria $z^*$ we have $\mathrm{sign}(z^*_i) = \mathrm{sign}(z^*_k)$ for all agents $i, k$, i.e., all agents favor the same strategy.

iii) Suppose $v_{\max}$ ($w_{\max}$) is also a right (left) eigenvector of both $A$ and $\hat A$. Denote by $\lambda$, $\hat\lambda$ the eigenvalues of $A$, $\hat A$, respectively, corresponding to $v_{\max}$ ($w_{\max}$). Then the unfolding parameter (11) simplifies to

$$b = d\Big(\tfrac{1}{4}\, p^{\perp} \hat\lambda + b_1 - b_2\Big)\langle w_{\max}, \mathbf{1}\rangle. \qquad (12)$$

The following theorem shows how the bifurcation depends on degree (number of neighbors) for regular graphs.

###### Theorem 3

Suppose $v_{\max} = w_{\max} = \mathbf{1}$, and $\mathcal G$, $\hat{\mathcal G}$ are undirected, connected, and regular with degrees $k$, $\hat k$, respectively. The bifurcation point and unfolding parameter satisfy $\lambda = k$, $\hat\lambda = \hat k$, and $b = d\big(\tfrac{1}{4}\, p^{\perp} \hat k + b_1 - b_2\big) N_a$.

###### Remark 1

For games with more than two strategies and heterogeneous payoff functions, the analysis can be generalized using analogous bifurcation arguments.

In what follows, we discuss implications of Theorems 2 and 3 in social dilemmas using the prisoner's dilemma and public goods game. From now on, we take $S = \tanh$.

Prisoner's dilemma: Let $N_a = 2$, $\mathcal E = \hat{\mathcal E} = \{(1,2)\}$, $b_1 = b_2 = 0$, and $\begin{pmatrix} p_{11} & p_{12}\\ p_{21} & p_{22}\end{pmatrix} = \begin{pmatrix} p_{CC} & p_{CD}\\ p_{DC} & p_{DD}\end{pmatrix}$, so (8) specializes to (2).

###### Corollary 1

For the prisoner's dilemma payoffs $p_{DC} > p_{CC} > p_{DD} > p_{CD}$, the following hold: $\lambda = \hat\lambda = 1$, $v_{\max} = w_{\max} = \mathbf{1}$, and $p^{\perp} = p_{CC} + p_{CD} - p_{DC} - p_{DD} < 0$. Hence, we have $b < 0$ and the unfolding favors the mutual-defection branch.

Figs. 1(a),1(b) show the bifurcation diagram (plot of equilibria as a function of the bifurcation parameter $u$) of the Lyapunov-Schmidt reduction (16) of (9), for two values of $r$. The payoffs have a two-fold effect: i) they change the location of the pitchfork bifurcation point on the $u$-axis; ii) since $b < 0$, the pitchfork bifurcation unfolds favoring the branch of solutions corresponding to mutual defection. For sufficiently large $u$, a branch of solutions corresponding to mutual cooperation emerges, and the larger the $u$, the larger its region of attraction. A larger $u$ is required for larger $r$, since the attention must offset the increased incentive to defect.

If instead $p_{CC} > p_{DC}$ and $p_{DD} > p_{CD}$, the game is the Stag Hunt, where the strategy to hunt a stag replaces cooperation and the strategy to hunt a hare replaces defection. Coordinated stag hunting and coordinated hare hunting are both Nash equilibria, the former payoff-dominant and the latter risk-dominant. The model predicts that the larger the $u$, the larger the region of attraction to coordinated stag hunting.

Public goods game: Let $N_s = 2$, i.e., each agent decides to cooperate and contribute its entire wealth $a$, or defect and contribute nothing. Note that (8) specializes to (3) by selecting $p_{11} = p_{21} = \frac{\rho a}{N_a}$, $p_{12} = p_{22} = 0$, $b_1 = \frac{\rho a}{N_a}$, and $b_2 = a$, with all-to-all graph $\hat{\mathcal G}$.

###### Corollary 2

With $N_s = 2$, it holds that $p = p^{\perp} = 0$ and $b_1 - b_2 = a\big(\tfrac{\rho}{N_a} - 1\big) < 0$. For $\gamma > 0$ and connected graph $\mathcal G$, the following hold: i) The eigenvectors $v_{\max}$, $w_{\max}$ have all nonzero same-sign entries, $b < 0$, and $u^* = 1/(\alpha + \gamma\lambda)$. ii) When $\mathcal G$ is regular with degree $k$, it holds that $\partial u^*/\partial k < 0$, i.e., with larger $k$, bistability requires less attention $u$.

Figs. 1(c),1(d) show the bifurcation diagram for two values of $a$. Since $p = 0$, the opinion-dependent payoff term has no effect. However, $b_1 - b_2 < 0$; hence, for reciprocating agents ($\gamma > 0$), the pitchfork bifurcation unfolds towards the branch of solutions corresponding to no agent contributing to the public pool. Since the strength of the unfolding is proportional to $a$, emergence of the mutually cooperative solution, when all agents contribute, requires a smaller $u$ for smaller $a$, and for a fixed $u$ its region of attraction grows as $a$ decreases.

## IV Numerical Studies

### IV-A Prisoner's Dilemma

We fix the model parameters $d_i = d$, $u_i = u$, $\alpha$, $\gamma$, and $\eta$ for $i \in \{1,2\}$. Consider the payoff matrix (2) given by

$$\begin{pmatrix} p_{CC} & p_{CD}\\ p_{DC} & p_{DD}\end{pmatrix} = \begin{pmatrix} 35 & -r\\ 40 + r & 5 \end{pmatrix}, \qquad (13)$$

where $r \geq 0$ is an extra reward (penalty) an agent receives if it defects (cooperates) while its opponent cooperates (defects).

Using simulations, we illustrate limit points of the opinion state trajectories, as predicted by the theory. In Fig. 2, each heatmap illustrates the probability of both agents cooperating, and the two axes represent the initial opinion states of the agents associated with the cooperation strategy. Since the two agents are reciprocating, in all cases the heatmaps for both agents are identical, and hence we present only that of agent 1.

In Figs. 2(b) and 2(c), we can observe that when both agents are nice, i.e., the agents' initial opinion states for cooperation are large enough, they can maintain mutual cooperation. Also, a sufficiently nice agent (one with a sufficiently positive initial opinion of cooperation) forgives the exploiting behavior (defection) of an opponent that initially is not nice. However, when its opponent has a strongly negative initial opinion of cooperation, i.e., a strong intention to defect, the agent also defects to avoid being exploited, and hence is provocable.

An increase in $r$ motivates the agents to defect (Fig. 2). When $r$ is large, the incentive to defect outweighs the social incentive to reciprocate, and both agents eventually defect (Fig. 2(a)). Thus, as predicted by the theory and illustrated in Figs. 1(a),1(b), when there is a strong enough incentive to defect, the level of attention to opinion exchanges, which translates into the level of reciprocity, may be insufficient to prevent the agents from pursuing individually rational decision-making.
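The bistable outcomes above can be reproduced with a few lines: the sketch below Euler-integrates a reduced relative-opinion model in the spirit of (10) for two agents, with $S = \tanh$, unit weights, and a constant drift toward defection growing with $r$; the drift value $(10 + 2r)/4$ is an assumption chosen to mimic the payoff (13), not a quantity from the paper.

```python
import numpy as np

def simulate_pd(z0, r=0.0, u=5.0, d=1.0, T=30.0, dt=0.01):
    # Reduced relative-opinion dynamics (sketch of (10)) for two agents:
    # z[i] > 0 at steady state means agent i cooperates. S = tanh and the
    # defection drift (10 + 2r)/4 are illustrative modeling assumptions.
    z = np.array(z0, dtype=float)
    for _ in range(int(T / dt)):
        social = np.tanh(z) + np.tanh(z[::-1])   # own + opponent's opinion
        z = z + dt * (-d) * (z - u * social + (10.0 + 2.0 * r) / 4.0)
    return z

# With enough attention u, the initial opinions select the equilibrium:
z_coop = simulate_pd([3.0, 3.0])    # nice initial opinions -> cooperation
z_def = simulate_pd([-1.0, -1.0])   # wary initial opinions -> defection
```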

### IV-B Public Goods Game

For the 2-strategy public goods game, we adopt the same parameters of (9) as in §IV-A except that $N_a = 20$, the inter-agent interactions are governed by the Erdős-Rényi graph with parameter $p_{\mathrm{ER}}$ (for $i \neq k$, agents $i$ and $k$ are connected with probability $p_{\mathrm{ER}}$ and disconnected with probability $1 - p_{\mathrm{ER}}$), and the initial opinion state of each agent is uniformly randomly selected from an interval centered at $z_0 \geq 0$, where $z_0$ is a bias in favor of cooperation. Let $\rho = 2$, so (3) is

$$U_{ij}(x) = \begin{cases} \dfrac{a}{10} + \dfrac{a}{10}\displaystyle\sum_{k=1,\,k\neq i}^{20} x_{k1}, & j = 1,\\[3mm] a + \dfrac{a}{10}\displaystyle\sum_{k=1,\,k\neq i}^{20} x_{k1}, & j = 2. \end{cases} \qquad (14)$$

We evaluate opinion state trajectories over a range of values of $p_{\mathrm{ER}}$, $z_0$, and $a$ to explore how the network structure of the social interaction, initial opinion states, and total wealth tune the emergence of cooperation as predicted by the theory.

Each heatmap in Fig. 3 depicts, for a given $a$, the average number of agents that cooperate at steady state over a range of $(p_{\mathrm{ER}}, z_0)$. Both network structure, determined by $p_{\mathrm{ER}}$, and the agents' initial preference to contribute to the public pool, determined by $z_0$, play important roles: cooperation among the agents is more likely to be sustained if each agent has a greater chance to interact with others ($p_{\mathrm{ER}}$ large) and favors cooperation at the beginning of the game ($z_0$ large). Interestingly, even if the agents prefer to cooperate at the beginning ($z_0$ large), when they interact less and cannot perceive the opinion states of others ($p_{\mathrm{ER}}$ small), they decide to defect over time. The advantage of large $p_{\mathrm{ER}}$ is as for large degree $k$ for regular graphs, as predicted by Corollary 2.
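The experiment can be sketched as follows (Erdős-Rényi interaction graph, reduced two-strategy opinion dynamics with $S = \tanh$; the defection drift $9a/20$, derived from the constant payoff gap of (14), and all parameter values are assumptions made for illustration):

```python
import numpy as np

def simulate_pgg(n=20, p_er=0.5, a=2.0, u=1.0, d=1.0, z0=0.5,
                 T=30.0, dt=0.01, seed=0):
    # Reduced relative-opinion dynamics for the 2-strategy public goods
    # game on an Erdos-Renyi graph; z[i] > 0 means agent i contributes.
    # S = tanh, unit weights, and the drift 9a/20 are illustrative.
    rng = np.random.default_rng(seed)
    A = (rng.random((n, n)) < p_er).astype(float)
    A = np.triu(A, 1)
    A = A + A.T                          # symmetric adjacency, no self-loops
    z = rng.uniform(z0 - 1.0, z0 + 1.0, n)
    for _ in range(int(T / dt)):
        social = np.tanh(z) + A @ np.tanh(z)
        z = z + dt * (-d) * (z - u * social + 9.0 * a / 20.0)
    return z

# Densely connected agents with a mild initial bias toward contributing
# typically sustain cooperation; sparse graphs drift to defection.
n_coop = int((simulate_pgg() > 0).sum())
```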

The payoff difference between the two strategies depends on the total wealth $a$ and quantifies the incentive for the agents to defect. Consequently, the more wealth the agents have, the higher their incentive not to contribute. This is illustrated in Fig. 3, where mutual defection (cooperation) is more (less) likely as $a$ increases.

## V Final Remarks

We have shown that the nonlinear opinion dynamics model of [7, 18] provides an analytically tractable framework for studying cooperative behavior in repeated multi-agent games, where agents rely on rationality and reciprocity, both of which are central to human decision-making. The opinion update depends on a saturated function of inter-agent opinion exchanges, which allows mutual cooperation (or coordination) to emerge as one of two bistable equilibria in two-strategy games. For the prisoner’s dilemma and multi-agent public goods game, mutual cooperation emerges when the attention to social interaction, and thus reciprocity, is sufficiently strong. The bistability provides a possible mathematical account for how reciprocity enables stable cooperative behavior, as observed in experimental studies, and a principled approach for tuning cooperative behavior.

Building on the coupled opinion-attention dynamic analysis of [7, 18], we will design feedback dynamics for the attention parameters $u_i$ to reflect, for instance, agents' growing appreciation of social interactions. This will allow opportunities to influence behavior, e.g., to elicit cooperation or coordination among agents. We will also leverage the versatility of the model to investigate games with more than two strategies and heterogeneity.

Proof of Theorem 1: i) The first statement is verified by comparing (4) and (9). For the second statement, by the definition of and we get Therefore, is a solution of (9) and hence . Thus, for all and is a solution to (4).

ii) If is an equilibrium of (4) then satisfies and hence is an equilibrium of (9). To prove the second statement, suppose is an equilibrium of (9). As in the proof for i), we can establish that for defined as in the statement. Thus, and is an equilibrium of (4). The stability of the equilibria follows from i).

Proof of Theorem 2: i) When $p^{\perp} = 0$ and $b_1 = b_2$, the neutral state $z = 0$ is always an equilibrium of (10). The Jacobian of the linearization of (10) at $z = 0$ is

$$J(0) = -d\Big(\big(1 - u S'(0)\alpha\big) I - u\gamma S'(0) A - \tfrac{1}{4}\eta^{-1} p\, \hat A\Big) \qquad (15)$$

and its eigenvalues take the form $-d(1 - uS'(0)\alpha - \mu)$, where $\mu$ is an eigenvalue of the matrix $u\gamma S'(0) A + \frac{1}{4}\eta^{-1} p\,\hat A$. By [19], there exists a critical value $u^*$ for which if $u < u^*$, all eigenvalues of (15) have negative real part, and if $u > u^*$, $\lambda_{\max}$ is positive, real, and simple. By Lyapunov-Schmidt reduction [20], the one-dimensional dynamics projected onto the span of $v_{\max}$ are

$$\dot z_c = -2d\langle w_{\max}, \tilde v\rangle z_c^3 + d S'(0)\langle w_{\max}, (\alpha I + \gamma A) v_{\max}\rangle\, \tilde u\, z_c + \Big\langle w_{\max}, \tfrac{1}{4}\, d\, p^{\perp}\hat A \mathbf{1} + d(b_1 - b_2)\mathbf{1}\Big\rangle + \text{h.o.t.} \qquad (16)$$

where $\tilde u = u - u^*$ and $\tilde v$ is a constant vector determined by the cubic term of the expansion. By the recognition problem [20, Chapter II, Proposition 9.2], (16) describes an unfolding of the pitchfork bifurcation. The last statement follows by implicit differentiation of the bifurcation condition.
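The stability threshold in part i) can be checked numerically: the sketch below assembles the Jacobian (15) and locates the sign change of its largest eigenvalue real part. For two agents with $A = \hat A$ the single-edge adjacency matrix, $\alpha = \gamma = S'(0) = 1$, and $p = 0$, the crossing is at $u^* = 1/(\alpha + \gamma) = 0.5$; all parameter values are illustrative.

```python
import numpy as np

def jacobian_origin(u, A, A_hat, d=1.0, alpha=1.0, gamma=1.0,
                    p=0.0, eta=1.0, s_prime=1.0):
    # Jacobian (15) of the reduced dynamics at the neutral state z = 0.
    n = A.shape[0]
    return -d * ((1.0 - u * s_prime * alpha) * np.eye(n)
                 - u * gamma * s_prime * A
                 - 0.25 * p * A_hat / eta)

# Two agents, A = A_hat = adjacency of a single edge, p = 0: the largest
# eigenvalue real part of J(0) crosses zero at the critical attention 0.5.
A = np.array([[0.0, 1.0], [1.0, 0.0]])
max_eig = lambda u: np.linalg.eigvals(jacobian_origin(u, A, A)).real.max()
```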

ii) By the Perron-Frobenius theorem, $v_{\max}$ and $w_{\max}$ have all same-sign entries. The rest follows from part i) and the center manifold theorem.

iii) By the assumptions on (), ,