# Learning and Selfconfirming Equilibria in Network Games

Consider a set of agents who play a network game repeatedly. Agents may not know the network. They may even be unaware that they are interacting with other agents in a network. Possibly, they just understand that their payoffs depend on an unknown state that in reality is an aggregate of the actions of their neighbors. Each time, every agent chooses an action that maximizes her subjective expected payoff and then updates her beliefs according to what she observes. In particular, we assume that each agent only observes her realized payoff. A steady state of such a dynamic is a selfconfirming equilibrium given the assumed feedback. We characterize the structure of the set of selfconfirming equilibria in network games and we relate selfconfirming and Nash equilibria. Thus, we provide conditions on the network under which the Nash equilibrium concept has a learning foundation, despite the fact that agents may have incomplete information. In particular, we show that the choice of being active or inactive in a network is crucial to determine whether agents can make correct inferences about the payoff state and hence play the best reply to the truth in a selfconfirming equilibrium. We also study learning dynamics and show how agents can get stuck in non-Nash selfconfirming equilibria. In such dynamics, the set of inactive agents can only grow over time, because once an agent finds it optimal to be inactive, she gets no feedback about the payoff state, hence she does not change her beliefs and remains inactive.


## 1 Introduction

Imagine an online social network, like Twitter, with many users. Let us consider a simultaneous-moves game in which each user decides her level of activity in the social network. The payoff that agents get from their activity depends on the social interaction. In particular, an active user $i$ receives idiosyncratic externalities, which can be positive or negative, from the other users with whom she is in contact in the social network. The externality from user $j$ to user $i$ is proportional to the time that they both spend on the social network, $a_i$ and $a_j$. Sticking to a quadratic specification, which allows for linear best replies, let us assume that the payoff of $i$ from this game is as follows. (This is the class of games originally analyzed by Ballester et al. (2006). Bramoullé et al. (2014) is one of the more recent papers providing results for such linear-quadratic network games, and they also discuss how to generalize to games that have the same best-reply functions. Zenou (2016) surveys many applications.)

$$u_i(a_i,a_{-i}) = \alpha_i a_i - \frac{1}{2}a_i^2 + \sum_{j\in I\setminus\{i\}} z_{ij}\, a_i a_j. \qquad (1)$$

In eq. (1), $I$ is the set of agents in the social network and $a_i$ is the level of activity of $i$, while $\alpha_i$ represents the individual pleasure of $i$ from being active on the social network in isolation, which results in the bliss point of activity $a_i = \alpha_i$ in autarchy. Parameter $\alpha_i$ can also be negative, and in this case $i$ would not be active in isolation. For each ordered pair $(i,j)$, there is some exogenous level of externality from $j$ to $i$ denoted by $z_{ij}$. We say that $j$ affects $i$, or that $j$ is a peer of $i$, if $z_{ij} \neq 0$.
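To fix ideas, the payoff in eq. (1) and the induced best reply can be sketched in a few lines of Python. This is a toy illustration, not part of the paper's formal apparatus; the names `alpha`, `Z`, and the action cap `a_max` are our own.

```python
def payoff(i, a, alpha, Z):
    """u_i(a) = alpha_i * a_i - a_i^2 / 2 + sum_{j != i} z_ij * a_i * a_j, as in eq. (1)."""
    ext = sum(Z[i][j] * a[i] * a[j] for j in range(len(a)) if j != i)
    return alpha[i] * a[i] - 0.5 * a[i] ** 2 + ext

def best_reply(i, a, alpha, Z, a_max=100.0):
    """Maximizer of u_i over a_i in [0, a_max]: the first-order condition gives
    a_i = alpha_i + sum_{j != i} z_ij * a_j, clipped to the feasible interval."""
    x_i = sum(Z[i][j] * a[j] for j in range(len(a)) if j != i)  # payoff state
    return max(0.0, min(a_max, alpha[i] + x_i))
```

For instance, with two players, `alpha = [1, 1]` and a mutual externality of `0.5`, player 0's best reply to the profile `[0, 1]` is `1.5`.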

Later on in this paper, we will also consider an extra global term in the payoff function:

$$u_i(a_i,a_{-i}) = \alpha_i a_i - \frac{1}{2}a_i^2 + \sum_{j\in I\setminus\{i\}} z_{ij}\, a_i a_j + \beta \sum_{k\in I\setminus\{i\}} a_k. \qquad (2)$$

We can interpret this extra term as an additional pleasure that $i$ gets from being a member (even if not active) of an online social network that is overall popular.

In this paper, the network described by the matrix $Z$ of all the $z_{ij}$'s is exogenous. As a first approximation, this fits a directed online social network like Twitter or Instagram, where users cannot decide who follows them. Under this interpretation, $i$ receives positive or negative externalities from those who follow her, proportional to her activity: $i$ acquires popularity from being active or not in the social network. Payoff $u_i$ represents what $i$ can indirectly observe about her own popularity (i.e., likes that she receives, people congratulating her in real-world conversations, and so on). We imagine that $i$ cannot choose the style of what she writes, since she just follows her exogenous nature. In this interpretation, $a_i$ represents the amount of tweets that $i$ writes, and this can make her more or less popular among those who follow her, according to how her style combines with the (typically unobserved) tastes of each of her followers.

Since we are going to analyze learning dynamics and their steady states, we also have to specify what agents observe after their choices, because this affects how they update their beliefs. Twitter user $i$ typically observes her own activity level $a_i$ perfectly, but she may not observe the sign of the externalities and the activity of others. However, she gets indirect measures of her level of popularity that come from her conversations and experiences in the real world, where her popularity from Twitter affects her social and professional real life. Players of this game may have wrong beliefs about the details of the game they are playing (e.g., the structure of the network, or the value of the parameters) and about the actions of other players. Accordingly, they update their beliefs in response to the feedback they receive, which will be their (possibly indirectly measured) payoff. This updating process may lead to a learning dynamic that does not converge to a Nash equilibrium of the game.

In this paper we address the following question: Assuming simple updating rules, under what circumstances do learning dynamics converge to a Nash equilibrium of the game and when, instead, do they just converge to a selfconfirming equilibrium where agents best reply to confirmed but possibly wrong beliefs? This question is interesting per se, and with our answers we provide novel theoretical tools for the analysis of network games. However, the application of the model to online social networks that we just anticipated can also help in understanding why we may easily observe apparently non-optimal best responses by economic agents in such an environment, such as agents who get stuck in "inactivity traps."

Section 2 presents our baseline model. For this setting, we characterize the set of selfconfirming equilibria in Section 3, and we study the learning process in Section 4. In Section 5 we analyze a more general model that accounts for global externalities. Section 6 concludes. We devote appendices to proofs and technical results. Appendix A analyzes properties of feedback and selfconfirming equilibria in a class of games including as special cases the network games that we consider. Appendix B reports existing results in linear algebra that we use to find sufficient conditions for reaching interior Nash equilibria in network games. Appendix C contains the proofs of our propositions.

## 2 The framework

Consider a set $I$ of agents, with cardinality $n$ and generic element $i$, located in a network. Let the network be characterized by an adjacency matrix $Z$, where entry $z_{ij}$ specifies whether agent $j$ is linked to agent $i$ and the weight of this link, and we let $z_{ii} = 0$ by convention. In what follows we consider the case of directed networks, so that, given $i \neq j$, we allow $z_{ij} \neq z_{ji}$. Externality weights are an unknown parameter of the model. We assume that there are commonly known upper and lower bounds on the weighted externalities, which can be positive or negative, between players. We let $\Theta$ denote the compact set of possible weighted networks $Z$. The network game is parametrized by $Z \in \Theta$.

Throughout the paper we will play with different properties and specifications of matrix $Z$. To simplify the notation we will often decompose it in a way that distinguishes between the actual links, which specify whether there is an externality between two players, and the magnitude and sign of this externality. We call $G$ the basic underlying representation of the network: the adjacency matrix whose element $g_{ij} \in \{0,1\}$ specifies whether the action of $j$ has an externality on $i$. We think of it as a link from $i$ to $j$ because $j$ is one of $i$'s peers. $G$ is a directed network.

On top of that we build $Z$ by adding weights on the links of $G$. This can be done in several ways, depending on how much heterogeneity we want to allow for. We will write $Z = \lambda G$ when all links bear the same level of externality $\lambda$. We will write $Z = \Lambda G$, where $\Lambda$ is a diagonal matrix, when we want to specify that each player $i$ is affected by the same weight $\lambda_i$ from all her peers, but these $\lambda_i$'s are heterogeneous. We will also consider the case in which the existing links may have weights of different signs but the same intensity: that is, we write $Z = \lambda (S \circ G)$ (in which the operator $\circ$ is the Hadamard product), for $\lambda > 0$ and a matrix $S$ of signs. Finally, when we write simply $Z$ we consider the case of a directed weighted network. Many of our results will hold for this most general case.

Each agent chooses an action $a_i$ from interval $A_i = [0, \bar a_i]$, where the upper bound $\bar a_i$ is "sufficiently large". (Note that in the network literature it is common to assume $A_i = [0, +\infty)$. However, for the games we consider, we can always find an upper bound on actions such that the problem is unchanged when actions are bounded above by $\bar a_i$.) For each $i$, $A_{-i}$ denotes the set of feasible action profiles for players different from $i$. Similarly, defining $N_i$ as the set of the neighbors of a given agent $i$, $A_{N_i}$ denotes the set of feasible action profiles of $i$'s neighbors.

For each $i$, we posit a set (interval) $X_i$ of payoff states for $i$, with the interpretation that $i$'s payoff is determined by her action $a_i$ and by her payoff state $x_i$ according to a continuous utility function $v_i : A_i \times X_i \to \mathbb{R}$. The payoff state is in turn determined by the actions of $i$'s neighbors and is unknown to $i$ at the time of her choice. For each agent $i$ and matrix $Z$, we consider a parametrized aggregator $\ell_i$ of the coplayers' actions of the following form: $\ell_i$ is continuous, its range is connected, and for each $Z$, the section of $\ell_i$ at $Z$ is (in principle we can allow for non-linear aggregators, as in Feri and Pin (2017); however, in this paper, we focus on the linear case)

$$\ell_{i,Z}: A_{-i} \to X_i,\qquad a_{-i} \mapsto \sum_{j\neq i} z_{ij}\, a_j.$$

Note that, since $X_i$ is the codomain of $\ell_{i,Z}$, we are effectively assuming that, for every $Z \in \Theta$,

$$\underline{x}_i \leq \sum_{j\in N_i^-} z_{ij}\, \bar a_j, \qquad \bar x_i \geq \sum_{j\in N_i^+} z_{ij}\, \bar a_j,$$

where $N_i^-$ denotes the set of neighbors of player $i$ that have a negative effect on the payoff state of $i$, and $N_i^+$ denotes the set of neighbors of player $i$ that have a positive effect on the payoff state of $i$.

The overall payoff function that associates each action profile with a payoff for agent $i$ is thus parametrized by the adjacency matrix $Z$:

$$u_i: A_i \times A_{-i} \times \Theta \to \mathbb{R},\qquad (a_i, a_{-i}, Z) \mapsto v_i(a_i, \ell_i(a_{-i}, Z)). \qquad (3)$$

We assume that each agent knows how her payoff depends on her action and her payoff state, that is, we assume that $i$ knows function $v_i$, but we do not assume that $i$ knows $\ell_i$. Actually, from the perspective of our analysis, agent $i$ might even ignore that the payoff state aggregates her neighbors' activities according to some weighted network structure, because we are not modeling how $i$ reasons strategically. (If the parametrized payoff functions and the parameter space are common knowledge, strategic reasoning according to the epistemic assumptions of rationality and common belief in rationality can be captured by a simple incomplete-information version of the rationalizability concept. See, e.g., Chapter 7 of Battigalli (2018) and the references therein.) If $v_i(\cdot, x_i)$ is strictly quasi-concave for each $x_i$, there is a unique best reply $r_i(x_i)$ to each payoff state $x_i$. Although the aggregator is linear, if this "proximate" best reply function is non-linear (more precisely, not affine), then also the best reply to the actions of others is non-linear. Linearity obtains if and only if $v_i$ is quadratic in $a_i$ and linear in $x_i$. Without substantial loss of generality, among such utility functions we consider the following form, generalizing equation (1) that we discussed earlier:

$$v_i: A_i \times X_i \to \mathbb{R},\qquad (a_i, x_i) \mapsto \alpha_i a_i - \frac{1}{2}a_i^2 + a_i x_i. \qquad (4)$$

Note that $v_i$ in eq. (4) is continuous and strictly concave in $a_i$. Thus, the game with $u_i$ defined by eqs. (3)-(4) is a parametrized nice game (see Moulin 1984 for a definition of nice game, and Appendix A for a generalization, with results for non-linear-quadratic network games).

We assume that the game is repeatedly played by agents maximizing their instantaneous payoff. After each play agents get some feedback. Let $M$ be an abstract set of "messages" (e.g., monetary outcomes). The information obtained by agent $i$ at the end of each period is described by a feedback function $f_i : A_i \times X_i \to M$. Assuming that $i$ knows how her feedback is determined by the payoff state given her action, if she receives message $m$ after action $a_i$ she infers that the state belongs to the "ex post information set"

$$f_{i,a_i}^{-1}(m) := \{x_i' \in X_i : f_i(a_i, x_i') = m\}.$$

This completes the description of the object of our analysis. The structure

$$NG = \langle I, \Theta, (A_i, X_i, v_i, \ell_i, f_i)_{i\in I} \rangle$$

is a (parametrized) network game with feedback, or simply network game. Our analysis depends on assumptions about the payoff functions and the feedback functions. Here we present the strongest assumptions; the Appendix contains a more general analysis.

###### Definition 1.

A network game with feedback $NG$ is linear-quadratic if the utility function $v_i$ of each player has the linear-quadratic form (4).

In this case, the proximate best-reply function is

$$r_i(x_i) = \max\{0, \min\{\bar a_i,\ \alpha_i + x_i\}\}. \qquad (5)$$

Even if agent $i$ can only play a best reply to the aggregate $x_i$, it is possible to write the derived best reply to the actions of others as

$$r_i(\ell_i(a_{-i}, Z)) = \max\Big\{0, \min\Big\{\bar a_i,\ \alpha_i + \sum_{j\neq i} z_{ij}\, a_j\Big\}\Big\}. \qquad (6)$$
###### Definition 2.

Feedback $f_i$ satisfies observability if and only if player is active (OiffA) if the section $f_{i,a_i}$ is injective for each $a_i > 0$ and constant for $a_i = 0$; $f_i$ satisfies just observable payoffs (JOP) relative to $v_i$ if there is a function $\bar v_i$ such that

$$\forall (a_i, x_i) \in A_i \times X_i,\qquad v_i(a_i, x_i) = \bar v_i(a_i, f_i(a_i, x_i))$$

and the section $\bar v_{i,a_i}$ is injective for each $a_i$. A network game with feedback satisfies observability by active players if feedback $f_i$ satisfies OiffA for each player $i$, and it satisfies just observable payoffs if $f_i$ satisfies JOP for each player $i$.

In a game with just observable payoffs, because of injectivity of the feedback function, agents infer their realized payoff from the message they get, but no more than that; that is, inferences about the payoff state can be obtained by looking at the preimages of the payoff function. For example, the feedback could be a total benefit, or revenue, function

$$f_i: A_i \times X_i \to \mathbb{R},\qquad (a_i, x_i) \mapsto \alpha_i a_i + a_i x_i,$$

with the payoff given by the difference between benefit and activity cost $C_i(a_i)$:

$$v_i: A_i \times X_i \to \mathbb{R},\qquad (a_i, x_i) \mapsto f_i(a_i, x_i) - C_i(a_i).$$

Under the reasonable assumption that agent $i$ knows her cost function, when she chooses $a_i$ and then gets message $m$, she infers that her payoff is $m - C_i(a_i)$. Thus, each section $\bar v_{i,a_i}$ ($a_i \in A_i$) is indeed injective. If the feedback/benefit function is the one above, then it satisfies observability if and only if $i$ is active.
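This inference step can be sketched in Python (a hypothetical illustration; `alpha_i` is the benefit parameter of the feedback function above): an active agent inverts the benefit function to recover the payoff state exactly, while an inactive agent learns nothing from the message.

```python
def feedback(a_i, x_i, alpha_i):
    """Benefit/feedback f_i(a_i, x_i) = alpha_i * a_i + a_i * x_i."""
    return alpha_i * a_i + a_i * x_i

def ex_post_information(a_i, m, alpha_i):
    """States consistent with message m after action a_i.
    Active agents pin down the state uniquely; for inactive agents the message
    is always 0, so None stands for the whole state space X_i."""
    if a_i > 0:
        return {m / a_i - alpha_i}  # f_i(a_i, .) is injective, hence invertible
    return None
```

For example, after choosing `a_i = 2` and receiving the message generated by the true state `x_i = 3`, the agent recovers `{3.0}`; after `a_i = 0` the ex-post information set is all of $X_i$.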

###### Remark 1.

If $NG$ is linear-quadratic and satisfies just observable payoffs, then it satisfies observability by active players. If $NG$ satisfies observability by active players, then

$$f_{i,a_i}^{-1}(f_i(a_i, x_i)) = \begin{cases} \{x_i\} & \text{if } a_i > 0, \\ X_i & \text{if } a_i = 0, \end{cases} \qquad (7)$$

for every agent $i$ and action-state pair $(a_i, x_i)$.

Most of our analysis focuses on linear-quadratic network games with just observable payoffs. This implies that agents who are active get as feedback a message enabling them to perfectly determine the state. Conversely, inactive agents get a completely uninformative message.

To choose an action, subjectively rational agents must have some deterministic or probabilistic conjecture about the payoff state $x_i$. We refer to conjectures about the state as shallow conjectures, as opposed to deep conjectures, which concern the specific network topology and the actions of other players. In linear-quadratic network games (more generally, in nice games with feedback), it is sufficient to focus on deterministic shallow conjectures. Indeed, for every probabilistic conjecture, there exists a deterministic conjecture that justifies the same action as the unique best reply (see the discussion in A.1).

### 2.1 Selfconfirming equilibrium

We analyze a notion of equilibrium which is broader than Nash equilibrium. Recall that our approach allows for the possibility of agents who are unaware of the full game around them. In equilibrium, agents best respond to conjectures consistent with the feedback that they receive, which is not necessarily fully revealing. We believe that this approach fits well a networked environment where agents' knowledge and the information they receive are only local. (In a context of endogenous strategic network formation, McBride (2006) applies the conjectural equilibrium concept, which is essentially the same as selfconfirming equilibrium for games with feedback; see Battigalli et al. (1992) and the discussions in Battigalli et al. (2015). More recently, also Lipnowski and Sadler (2017) and Frick et al. (2018) have adopted self-confirming equilibrium notions to describe network games. Their assumptions and their results are different and independent from ours.)

###### Definition 3.

A profile of actions and (shallow) deterministic conjectures $(a_i^*, \hat x_i)_{i\in I}$ is a selfconfirming equilibrium (SCE) at $Z$ if, for each $i \in I$,

1. (subjective rationality) $a_i^* = r_i(\hat x_i)$,

2. (confirmed conjecture) $\hat x_i \in f_{i,a_i^*}^{-1}\big(f_i(a_i^*, \ell_i(a_{-i}^*, Z))\big)$.

The two conditions require that 1) each agent best responds to her own conjecture; 2) the conjecture in equilibrium must belong to the ex-post information set, so that the expected feedback coincides with the actual feedback at $Z$. We say that $a^*$ is a selfconfirming action profile at $Z$ if there exists a corresponding profile of conjectures $\hat x$ such that $(a^*, \hat x)$ is a selfconfirming equilibrium at $Z$, and we let $A^{SCE}_Z$ denote the set of such profiles. Also, for any adjacency matrix $Z$, we denote by $A^{NE}_Z$ the set of (pure) Nash equilibria of the (nice) game determined by $Z$, that is,

$$A^{NE}_Z := \Big\{a^* \in \times_{i\in I} A_i : \forall i \in I,\ a_i^* = r_i(\ell_i(a_{-i}^*, Z))\Big\}.$$

Nice games satisfy all the standard assumptions for the existence of Nash equilibria. (Since the best-reply self-map is continuous on the convex and compact set $\times_{i\in I} A_i$, by Brouwer's Theorem it has a fixed point.) Hence, we obtain the existence of selfconfirming equilibria for each $Z$: indeed, a Nash equilibrium is a selfconfirming equilibrium with correct conjectures. To summarize:

###### Remark 2.

For every $Z \in \Theta$, there is at least one Nash equilibrium, and every Nash equilibrium is a selfconfirming profile of actions:

$$\forall Z \in \Theta,\qquad \emptyset \neq A^{NE}_Z \subseteq A^{SCE}_Z.$$

## 3 A characterization of SCE

In this section we characterize the set of selfconfirming equilibrium profiles of actions in linear-quadratic network games with just observable payoffs. All our proofs are derived from the results in Appendix A and Appendix B, which refer to the case of generic network games without the restriction to linear best replies, and are stated in Appendix C. We start with the simplest case, in which every agent necessarily finds it subjectively optimal to be active (that is, being inactive is dominated; see Lemma A in Appendix A).

###### Proposition 1.

Consider a network game satisfying observability by active players. Assume that, for every $i \in I$ and for every $x_i \in X_i$, $r_i(x_i) > 0$. Then, for each $Z \in \Theta$, $A^{SCE}_Z = A^{NE}_Z$.

Assume that $r_i$ (from eqs. (4) and (5)) is such that $r_i(x_i) > 0$ for every $x_i \in X_i$. Assume further that $Z = \lambda G$, with $\lambda > 0$. This represents the standard case of local complementarities studied by Ballester et al. (2006). If the spectral radius of $Z$ is smaller than $1$, there is a unique Nash equilibrium, which is also interior. Our proposition states that, in this case, if being inactive is not justifiable as a best reply to any shallow conjecture, then there is only one selfconfirming equilibrium action profile, which necessarily coincides with the unique Nash equilibrium.

We now consider a more general case in which agents may be inactive. Let $I_0$ denote the set of players for whom being inactive is justifiable. Note that, by Lemma A in Appendix A,

$$I_0 = \{i \in I : \min r_i(X_i) = 0\}.$$

Also, for each $Z$ and non-empty subset of players $J \subseteq I$, let $A^{NE}_{J,Z}$ denote the set of Nash equilibria of the auxiliary game with player set $J$ obtained by imposing $a_i = 0$ for each $i \in I \setminus J$, that is,

$$A^{NE}_{J,Z} = \Big\{a_J^* \in \times_{j\in J} A_j : \forall j \in J,\ a_j^* = r_j(\ell_j(a^*_{J\setminus\{j\}}, 0_{I\setminus J}, Z))\Big\},$$

where $0_{I\setminus J}$ is the profile that assigns $0$ to each $i \in I \setminus J$. If $J = \emptyset$, let $A^{NE}_{\emptyset,Z} = \{a_\emptyset\}$ by convention, where $a_\emptyset$ is the pseudo-action profile with empty domain. (As we do in set theory with the empty set, when we consider functions whose domain is a subset of some index set, it is convenient to have a symbol for the pseudo-function with empty domain. For example, if the index set is $\mathbb{N}$, such functions are (finite and countably infinite) sequences, or subsequences, and $a_\emptyset$ is the empty sequence.) We relate the set of selfconfirming equilibria to the sets of Nash equilibria of such auxiliary games.

###### Proposition 2.

Suppose that network game with feedback $NG$ is linear-quadratic and satisfies just observable payoffs. Then, for each $Z \in \Theta$, the set of selfconfirming action profiles is

$$A^{SCE}_Z = \bigcup_{I\setminus J \subseteq I_0} A^{NE}_{J,Z} \times \{0_{I\setminus J}\},$$

that is, in each SCE profile, a subset $I \setminus J \subseteq I_0$ of players for whom being inactive is justifiable choose $0$, and every other player chooses the best reply to the actions of her coplayers. Therefore, in each SCE profile $a^*$ and for each player $i$,

$$a_i^* = 0 \ \Rightarrow\ \underline{x}_i \leq -\alpha_i, \qquad a_i^* > 0 \ \Rightarrow\ \Big(\alpha_i + \sum_{j\in I} z_{ij}\, a_j^* > 0 \ \wedge\ a_i^* = \min\Big\{\bar a_i,\ \alpha_i + \sum_{j\in I} z_{ij}\, a_j^*\Big\}\Big). \qquad (8)$$

In every SCE we can partition the set of agents in two subsets. Agents in $J$ are active, i.e., they choose a strictly positive action; agents in $I \setminus J$ instead choose the null action. Consider first the latter. Since they play $0$, they get null payoff independently of others' actions. But, since every conjecture is consistent with this payoff, their conjecture is (trivially) consistent with their feedback. As for agents in $J$, since they choose a strictly positive action, they receive a message that enables them to infer the true payoff state; with this, they necessarily choose the objective best reply to their neighbours' actions, whether or not they are aware of them. Note that, if being inactive is justifiable for every agent ($I_0 = I$), then the all-inactive profile is an SCE for every $Z$.

This implies that the set of selfconfirming equilibria can be characterized by means of the sets of Nash equilibria of the auxiliary games in which only active agents are considered. If, for example, there is a unique interior Nash equilibrium for the auxiliary game corresponding to every subset of active players, then there are exactly $2^n$ SCE action profiles, one for each subset of active players. A.3 discusses the equilibrium characterization for the generalized case of non-linear-quadratic network games.
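The characterization suggests a brute-force computation: for every candidate active set $J$, solve the auxiliary game (here by best-reply iteration) and keep the profile if all members of $J$ end up strictly active. The Python sketch below assumes that inactivity is justifiable for every agent ($I_0 = I$) and that the iteration converges; all names are illustrative.

```python
from itertools import combinations

def auxiliary_nash(J, alpha, Z, iters=2000, a_max=100.0):
    """Best-reply iteration restricted to active set J (players outside J fixed at 0).
    Returns the limit profile, or None if some member of J turns out inactive."""
    n = len(alpha)
    a = [1.0 if i in J else 0.0 for i in range(n)]
    for _ in range(iters):
        for i in J:
            x_i = sum(Z[i][j] * a[j] for j in range(n) if j != i)
            a[i] = max(0.0, min(a_max, alpha[i] + x_i))
    return a if all(a[j] > 1e-9 for j in J) else None

def sce_profiles(alpha, Z):
    """Candidate SCE action profiles: union over subsets J of interior
    auxiliary Nash equilibria, padded with zeros (assumes I_0 = I)."""
    n = len(alpha)
    out = []
    for k in range(n + 1):
        for J in combinations(range(n), k):
            a = auxiliary_nash(set(J), alpha, Z)
            if a is not None:
                out.append(tuple(round(v, 6) for v in a))
    return out
```

With two players, `alpha = [1, 1]` and complementarity `z = 0.25`, this returns four profiles: the all-inactive profile, two with a single active player, and the interior Nash profile $(4/3, 4/3)$.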

###### Example 1.

Consider Figure 1, representing a network between 4 nodes. We set the same $\alpha_i$ for each player $i$. Let us first assume that each arrow represents a positive externality (and arrows point to the source of the externality). In this case we have one NE, but 16 possible SCEs, one for each subset of the players that we allow to be active. Table 1 reports the action of players in each case (we omit redundant pairs and singletons). Note that player 3, when active, always plays the same action, because she is not affected by any externality. Other players, instead, play differently when active, according to who else is active.

Consider now the same network, but assume that each arrow represents a negative externality. In this case we have more NEs (there is no NE where all players are active, but there are 3 NEs), but fewer than 16 SCEs (there are 13), because for some subsets of players there is no SCE in which all their elements are active. Table 2 reports the actions of players in each case (we omit redundant pairs and singletons).∎

This simple example shows that, moving from a case of full complementarity to a case of full substitutability, we may increase the number of Nash equilibria and decrease the number of SCEs. However, even in the limiting case where substitution effects are extremely strong, the two sets of equilibria will not coincide, because the strategy profile in which everyone is inactive will be an SCE but not a NE.

### 3.1 Assumptions about the network

Next, we focus on the network $Z$. We list below some properties of matrix $Z$ that are not maintained assumptions. In different parts of the paper we will use some of these assumptions as sufficient conditions for the existence and stability of selfconfirming equilibria. We refer to Appendix B for a deeper discussion of these assumptions and their implications.

###### Assumption 1.

Matrix $Z$ of size $n$ has bounded values, i.e. $|z_{ij}| < \frac{1}{n-1}$ for all $i$ and $j$.

###### Assumption 2.

Matrix $Z$ has the same sign property, i.e., for every $j \in I$, $\operatorname{sign}(z_{ij})$ is the same for all $i \in I$, where the sign function can have values $-1$, $0$, or $+1$. (The sign condition is the one used in Bervoets et al. (2016) to prove convergence to Nash equilibria in network games, under a particular form of learning.)

###### Assumption 3.

Matrix $Z$ is negative, i.e. $z_{ij} \leq 0$ for all $i$ and $j$.

We recall here that the spectral radius $\rho(Z)$ of a matrix $Z$ is the largest absolute value of its eigenvalues.

###### Assumption 4.

Matrix $Z$ is limited, i.e. $\rho(Z) < 1$.
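As a quick numerical check of this property, one can estimate $\rho(Z)$ by power iteration. This is a sketch, not part of the paper: it assumes a dominant eigenvalue exists (as, e.g., for nonnegative matrices); in general one would compute all eigenvalues.

```python
def spectral_radius(Z, iters=1000):
    """Power-iteration estimate of the spectral radius of a square matrix Z.
    Works when a dominant eigenvalue exists; returns 0.0 for the zero matrix."""
    n = len(Z)
    v = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(Z[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = max(abs(c) for c in w)
        if lam == 0.0:
            return 0.0
        v = [c / lam for c in w]  # renormalize by the max-abs component
    return lam

def is_limited(Z):
    """Assumption 4: rho(Z) < 1 (up to the iteration's accuracy)."""
    return spectral_radius(Z) < 1.0
```

For instance, the two-player matrix with off-diagonal entries $0.5$ has spectral radius $0.5$ and is limited, while off-diagonal entries of $2$ give spectral radius $2$.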

In Section 2 we discussed how, in some cases, we can write $Z$ as $\Lambda G$, where $\Lambda$ is a diagonal matrix and $G$ is the basic underlying representation of the network. When this is possible, matrix $Z$ represents a basic network combined with an additional idiosyncratic effect by which every agent weights the effects of the others on her. This effect is modeled by the parameter $\lambda_i$. (Then the payoff of $i$ at a given profile $a$ of the original game is $u_i(a) = \alpha_i a_i - \frac{1}{2}a_i^2 + \lambda_i \sum_{j} g_{ij}\, a_i a_j$.)

The next assumption adds an additional condition on $Z$.

###### Assumption 5.

Matrix $Z$ is symmetrizable, i.e. it can be written as $Z = \Lambda S$, with $\Lambda$ diagonal and $S$ symmetric. Moreover, $\Lambda$ has all positive entries in the diagonal.

Note that if $Z$ is symmetrizable then all its eigenvalues are real. Moreover, since $\Lambda$ has all positive entries, Assumption 5 implies the sign condition from Assumption 2.
Our final assumption is discussed in Bramoullé et al. (2014) and combines Assumptions 4 and 5 above.

###### Assumption 6.

$Z$ is symmetrizable-limited, i.e. $Z$ is symmetrizable and, for every $J \subseteq I$, the submatrix $Z_J$ is limited.

Our previous results from Section 3, about the characterization of selfconfirming equilibria, state that we can choose any subset of agents and have them inactive in an SCE. However, we cannot ensure that the other agents are active, because their best response in the reduced game could be null. The next result specifies sufficient conditions under which this does not happen. Given the matrix $Z$, and given $J \subseteq I$, we call $Z_J$ the submatrix that has only the rows and columns corresponding to the elements of $J$.

###### Proposition 3.

Consider a set $J \subseteq I$. Let us assume that $Z_J$ satisfies at least one of the three conditions below:

1. it has bounded values (Assumption 1),

2. it is negative and limited (Assumptions 3 and 4),

3. or it is symmetrizable–limited (Assumption 6).

Then, we have the two following results:

1. there exists $a_J^* \in A^{NE}_{J,Z}$ such that $a_j^* > 0$ for every $j \in J$;

2. there exists $a^* \in A^{SCE}_Z$ such that $a_j^* > 0$ for every $j \in J$ and $a_i^* = 0$ for every $i \in I \setminus J$.

Proposition 3 provides sufficient conditions to have an arbitrary set of active and inactive players in a selfconfirming equilibrium. In this case the set of selfconfirming equilibria has cardinality equal to the cardinality of the power set of $I$, that is $2^n$.

We provide below two examples, one with all positive externalities, the other with mixed externalities.

###### Example 2.

Consider $n$ players, and a randomly generated network between them, of the type $Z = \Lambda G$, generated by the following process. $G$ is undirected, generated by an Erdős and Rényi (1960) process for which each link is i.i.d., and such that its expected number of overall links (i.e., counted in both directions) is $kn$, for some $k > 0$. This means that the expected number of links for each player is $k$. It is well known that this model predicts, as $n$ goes to infinity, that $G$ will have no clustering and, when $k > 1$, a connected giant component.

$\Lambda$ is a diagonal matrix, such that each element $\lambda_i$ in the diagonal is positive and is generated by some i.i.d. random process with mean $\mu$ and variance $\sigma^2$.
In this case, Füredi and Komlós (1981) prove that the expected highest eigenvalue of $Z$, as $n$ grows, is

$$E(\lambda_1) = k\mu + \frac{\sigma^2}{\mu} + O\!\left(\frac{1}{\sqrt{n}}\right).$$

From Proposition 3, under Assumption 6, as $n$ tends to infinity, $Z$ is symmetrizable-limited if $E(\lambda_1) < 1$, which implies that

$$\frac{1}{\mu} - \frac{\sigma^2}{\mu^2} > k.$$

Clearly, a necessary condition for the previous inequality to hold is that $k\mu < 1$. When this happens, as $n$ grows to infinity, we will always have a unique NE of the game where all players are active.
Note that this limiting result excludes the possibility (because the expected clustering of $G$ goes to $0$) that there is a subset of players, with a dense sub-network between them and a high realization of the $\lambda_i$'s, for which no interior equilibrium of the auxiliary game exists. In fact, if this were the case, because of only positive externalities, we would not even have an all-active equilibrium for the whole population of agents. ∎

###### Example 3.

Proposition 3 provides alternative conditions, which are only sufficient, for an interior NE in an auxiliary game in which only agents in $J$ are considered. Figure 2 provides an example of a game that does not satisfy any of them, but still has a unique interior NE. We set the same $\alpha_i$ for each player $i$. Every blue arrow stands for a positive externality (so, the blue arrows represent just the first case from Example 1). The two red arrows stand for a negative externality. This network game has a unique NE, and 16 SCEs. Table 3 shows them all (redundant couples and singletons are omitted).

## 4 Learning process

We have not considered any dynamics yet. Definition 3 of selfconfirming equilibrium, characterized also by the conditions stated in Proposition 2, identifies steady states: if agents happen to have selfconfirming conjectures and play accordingly, then they have no reason to move away. However, we may wonder how agents get to play SCE action profiles, and whether these profiles are stable.

We first notice that SCE has solid learning foundations (see, for example, Battigalli et al. (1992), Battigalli and Marinacci (2016), Fudenberg and Kreps (1995), and the references therein). The following result is specifically relevant for this paper (see Gilli (1999) and Chapter 6 of Battigalli (2018)). Consider a sequence of action profiles $(a_t)_{t\in\mathbb{N}}$. If $(a_t)_{t\in\mathbb{N}}$ is consistent with adaptive learning and converges to $a^*$, then $a^*$ must be a selfconfirming equilibrium action profile. (In a finite game, a trajectory is consistent with adaptive learning if, for every period $t$, there exists some $T$ such that, from period $t + T$ onward, every agent plays a best reply to some deep conjecture that assigns probability $1$ to the set of action profiles consistent with the feedback received from period $t$ onward. The definition for compact, continuous games is a bit more complex; cf. Milgrom and Roberts (1991), who assume perfect feedback.)

Of course, the limit of the trajectory may or may not be a Nash equilibrium. Let us now consider best-reply dynamics. These generate trajectories that are, by construction, consistent with adaptive learning. With this, we prove convergence (under reasonable assumptions), hence convergence to an SCE.

To ease the analysis we consider best-reply dynamics for shallow conjectures. For each period $t$ and each agent $i$, $a_{i,t}$ is the best reply to the conjecture $\hat x_{i,t}$. After actions are chosen, given the feedback received, agents update their conjectures. If conjectures are confirmed, then an agent keeps her past conjecture; otherwise she adopts as new conjecture the one that would have been correct in the past period. In detail,

$$\hat{x}_{i,t+1}=\begin{cases}\hat{x}_{i,t} & \text{if } a_{i,t}=0,\\ \ell_{i}(a_{-i,t},Z) & \text{if } a_{i,t}>0,\end{cases} \tag{9}$$

and, from (5) (considering that the upper bound is set so that it is never reached), we have simply $\ell_{i}(a_{-i,t},Z)=\sum_{j\in I}z_{ij}a_{j,t}$.

Coherently with the previous analysis, this update rule states that if an agent is inactive at time $t$ ($a_{i,t}=0$), her past conjecture is confirmed and thus kept. If instead the agent is active ($a_{i,t}>0$), feedback is such that she can perfectly infer the payoff state, and so she updates her conjecture according to (9). This is one possible adaptive learning dynamic. The result cited above implies that if this dynamic converges, then it must converge to a selfconfirming equilibrium, i.e., a rest point where players keep repeating their choices.

In this section we analyze the stability of such rest points in the simplest possible sense of robustness to small perturbations, as in Bramoullé and Kranton (2007). However, we do not consider perturbations of the strategy profile, but rather perturbations of the profile of conjectures.

###### Definition 4 (Learning process).

Each player starts at time $t=0$ with a belief, and beliefs are represented by a vector of shallow deterministic conjectures $\hat{x}_{0}=(\hat{x}_{i,0})_{i\in I}$. In each period $t$, players best reply to their conjectures: for each $i$, $a_{i,t}$ is the best reply to $\hat{x}_{i,t}$. At the beginning of each period $t+1$, each player keeps his period-$t$ shallow conjecture if he was inactive, and updates his conjecture to the period-$t$ revealed payoff state if he was active, as in (9).

Even if we consider the case of linear best replies, from equations (8) and (9), the system is not linear because

$$\hat{x}_{i,t+1}=\begin{cases}\hat{x}_{i,t} & \text{if } \hat{x}_{i,t}\leq-\alpha_{i},\\ \sum_{j\in I}z_{ij}a_{j,t} & \text{if } \hat{x}_{i,t}>-\alpha_{i},\end{cases}$$

and, for every player $j$, we have that $a_{j,t}=\max\{0,\alpha_{j}+\hat{x}_{j,t}\}$.
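To illustrate, the learning map above can be simulated directly. The following sketch (in Python; the network matrix, intercepts, and initial conjectures are hypothetical values of our own choosing, not taken from the paper) iterates truncated-linear best replies together with the conjecture-update rule (9):

```python
import numpy as np

def best_reply(alpha, x_hat):
    # truncated linear best reply: a_i = max(0, alpha_i + conjectured aggregate)
    return np.maximum(0.0, alpha + x_hat)

def learning_step(Z, alpha, x_hat):
    a = best_reply(alpha, x_hat)
    x_true = Z @ a                              # realized local aggregate
    # active agents learn the true aggregate; inactive agents keep conjectures
    return np.where(a > 0, x_true, x_hat), a

def simulate(Z, alpha, x0, T=200, tol=1e-10):
    x = x0.astype(float).copy()
    for _ in range(T):
        x_new, _ = learning_step(Z, alpha, x)
        if np.max(np.abs(x_new - x)) < tol:
            break
        x = x_new
    return x, best_reply(alpha, x)

# two players with negative spillovers (hypothetical parameters)
Z = np.array([[0.0, -0.5], [-0.5, 0.0]])
alpha = np.array([1.0, 1.0])

# neutral initial conjectures: the dynamic converges to the Nash equilibrium (2/3, 2/3)
x_nash, a_nash = simulate(Z, alpha, np.array([0.0, 0.0]))

# pessimistic conjecture for player 2: she stays inactive, never updates,
# and the dynamic stops at a non-Nash SCE with actions (1, 0)
x_sce, a_sce = simulate(Z, alpha, np.array([0.0, -2.0]))
```

In the second run, player 2's conjecture lies below her inactivity threshold, so she receives no informative feedback; the resulting profile is selfconfirming but not Nash, since her best reply to player 1's action of 1 would be 0.5.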

Clearly an SCE of the game, as defined at the beginning of Section 3, is always a rest point of this learning dynamic. We now consider the stability of such rest points. Say that a profile of conjectures $\hat{x}$ is consistent with an action profile $a$ if, for every $i\in I$, $a_{i}$ is a best reply to $\hat{x}_{i}$.

###### Definition 5.

A selfconfirming action profile $a$ is locally stable if there are a profile of conjectures $\hat{x}$ consistent with $a$ and some $\varepsilon>0$ such that the learning dynamics starting from any $\hat{x}_{0}$ with $\lVert\hat{x}_{0}-\hat{x}\rVert<\varepsilon$ converges back to $a$.
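As a concrete toy instance of this definition, consider the all-inactive profile in a small network with positive spillovers. The parameters below are hypothetical, chosen only for illustration; the point is that conjectures supporting inactivity with strict slack survive any sufficiently small perturbation, so the dynamics never leaves the rest point:

```python
import numpy as np

# hypothetical 3-player line network with positive spillovers
Z = np.array([[0.0, 0.6, 0.0],
              [0.6, 0.0, 0.6],
              [0.0, 0.6, 0.0]])
alpha = np.array([1.0, 1.0, 1.0])

def step(x_hat):
    a = np.maximum(0.0, alpha + x_hat)        # best reply to conjectures
    return np.where(a > 0, Z @ a, x_hat), a   # active agents learn the aggregate

# conjectures strictly below the inactivity threshold -alpha_i support a = 0
x_sce = np.array([-1.5, -1.5, -1.5])
_, a_sce = step(x_sce)
assert np.all(a_sce == 0)

# any perturbation smaller than the slack (0.5 here) keeps every agent
# inactive, so conjectures are never updated: the rest point is locally stable
rng = np.random.default_rng(1)
x0 = x_sce + rng.uniform(-0.4, 0.4, size=3)
x = x0.copy()
for _ in range(10):
    x, a = step(x)
assert np.all(a == 0) and np.array_equal(x, x0)
```

Note that this all-inactive profile is not a Nash equilibrium (each agent's best reply to the others' inactivity would be positive), yet it is a stable SCE because inactive agents receive no disconfirming feedback.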

### 4.1 Results

Each SCE is characterized by its set of active agents. So, given a strategy profile, we consider the set of its active players and the submatrix of the network matrix whose rows and columns correspond to those active players. This allows us to characterize locally stable selfconfirming equilibria.

###### Proposition 4.

Consider a selfconfirming action profile. It is locally stable if

• Assumption 4 holds for the submatrix corresponding to the active players;

• for some profile of conjectures consistent with the action profile, every inactive player strictly prefers to remain inactive.

Intuitively, consider a sufficiently small perturbation of players’ conjectures. The first condition ensures that active players keep being active and that their actions converge back to the Nash equilibrium of the auxiliary game restricted to the active players. The second condition ensures that inactive players keep being inactive. Next, we provide alternative sufficient conditions that allow us to characterize the subsets of active agents associated with SCEs.

###### Proposition 5.

Consider a selfconfirming strategy profile. If the network matrix satisfies at least one of the three conditions below:

1. it has bounded values (Assumption 1),

2. it is negative and limited (Assumptions 3 and 4),

3. or it is symmetrizable–limited (Assumption 6),

then the profile is locally stable and, for every subset of its active players, there exists a locally stable selfconfirming equilibrium in which exactly the agents in that subset are active.

The proof is based on results from linear algebra. In fact, if an adjacency matrix satisfies one of the conditions from Proposition 5, then every submatrix of that matrix satisfies the same property.
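The inheritance argument can be checked numerically for one representative property. The snippet below is illustrative only: we use the spectral norm as a stand-in for the bounds in Assumptions 1, 3, 4, and 6, and verify that the spectral norm of every principal submatrix of a random network matrix is weakly smaller than that of the full matrix:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
n = 5
Z = rng.uniform(-0.3, 0.3, size=(n, n))
np.fill_diagonal(Z, 0.0)                    # no self-loops

full_norm = np.linalg.norm(Z, 2)            # spectral norm of the full matrix
for k in range(1, n + 1):
    for S in combinations(range(n), k):
        sub = Z[np.ix_(S, S)]               # principal submatrix (an "active set")
        # selecting rows and columns cannot increase the operator 2-norm
        assert np.linalg.norm(sub, 2) <= full_norm + 1e-12
```

Because the stability analysis restricts the network to the set of active players, any such norm bound assumed for the whole matrix automatically carries over to each submatrix of active players.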

We know that there may be SCEs that are not Nash equilibria, because some agents are inactive even if inactivity is not a best response to the actions of the others. Proposition 5 tells us two additional things. First, under the stated conditions, for any given SCE with a set of active agents, any subset of those agents is associated with a stable SCE in which all agents in the subset are active and the other agents are inactive. Second, since the empty subset of agents is trivially associated with the stable SCE where every agent is inactive, every network game has at least one stable SCE.

### 4.2 Examples and discussion

The following example shows that the learning dynamic can reach SCEs that are not Nash equilibria even if the initial beliefs induce all-positive actions at the beginning.

###### Example 4.

Consider the case of 4 players, with the network matrix shown in Figure 2 and, for every $i$, a common parameter value. This is a case of general externalities, which can be positive or negative. Figure 3 shows the learning dynamics of actions and beliefs starting from different initial conditions. In one case (left panels) we converge to the unique Nash equilibrium of this game (the dotted lines); in the other (right panels), after 2 rounds the learning dynamics pushes one player out of the set of active agents, and the remaining 3 converge to a selfconfirming equilibrium which is not Nash. ∎

The next example (which, in addition, does not satisfy the local stability conditions of Proposition 5) shows that convergence may fail even in a simple case of positive externalities.

###### Example 5.

Now consider again the network from Example 1 (Figure 1), with 4 nodes. Even with only positive externalities, the magnitude of the interaction parameter determines whether or not there is convergence: below a threshold there is convergence, above it there is divergence. Figure 4 shows the two cases, starting from the same initial beliefs. Note that nodes/players 1 and 4 reinforce each other, and this gives rise to an oscillating behavior of their beliefs. ∎
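The role of the interaction magnitude can be seen in a stripped-down two-player version of this reinforcement effect (our own caricature of the mutual reinforcement between nodes 1 and 4, with hypothetical parameters, not the 4-node network of Figure 1): when the spectral radius of the interaction matrix is below 1 the beliefs contract to a rest point, and when it exceeds 1 they diverge.

```python
import numpy as np

def run(z, T=60):
    # two players who reinforce each other; each best replies to her
    # conjectured aggregate and, when active, learns the true one
    alpha = np.array([1.0, 1.0])
    Z = np.array([[0.0, z], [z, 0.0]])
    x = np.array([0.5, 0.0])                 # arbitrary initial conjectures
    for _ in range(T):
        a = np.maximum(0.0, alpha + x)
        x = np.where(a > 0, Z @ a, x)
    return np.maximum(0.0, alpha + x)

a_small = run(0.4)   # spectral radius 0.4 < 1: convergence to a = 1/(1 - 0.4)
a_large = run(1.2)   # spectral radius 1.2 > 1: actions grow without bound
```

In the divergent case, each player's learned aggregate feeds back into the other's best reply, mirroring the mutual reinforcement described above (though this two-player sketch does not reproduce the oscillation pattern of the 4-node example).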

Our notion of stability with respect to conjectures relates to the standard notion of stability with respect to actions in the following way. First of all, since played actions are justified by some conjectures, the only way for these actions to change is a perturbation of the underlying conjectures; this, however, is not a sufficient condition. If all agents are active, the two definitions have the same consequences in terms of stability, since a perturbation of actions happens if and only if the corresponding conjectures are perturbed. However, if a selfconfirming equilibrium has inactive agents, then those inactive agents, who play a corner solution, exhibit no perturbation in actions when their conjectures are perturbed. This implies that if an action profile is stable with respect to action perturbations, then it is also stable under conjecture perturbations, but the converse does not hold.

## 5 Local and Global Externalities

As anticipated when discussing Eq. (2), we now consider an extension of equation (4) in which we add a global externality term with no strategic effects. For each $i$, we posit an interval of feasible values, a coefficient $\beta_{i}$, and an aggregator that sums the actions of all the agents in the network except agent $i$. (We could have included agent $i$ as well, but we opted for this specification so as not to change the first-order condition with respect to the case with just local externalities.)

We assume that every agent knows her own coefficient $\beta_{i}$, and we maintain our previous assumptions on the game. The new parametrized utility function is

$$u_{i}(a_{i},x_{i},y_{i})=\alpha a_{i}-\frac{1}{2}a_{i}^{2}+a_{i}x_{i}+\beta_{i}y_{i} \tag{10}$$

where both $x_{i}$ and $y_{i}$ are unknown. The general form of the feedback function is

$$f_{i}:A_{i}\times X_{i}\times Y_{i}\rightarrow M.$$

Deterministic shallow conjectures for each agent $i$ are now determined by the pair $(\hat{x}_{i},\hat{y}_{i})$. We now provide the definition of selfconfirming equilibrium for games with global externalities.

###### Definition 6.

A profile of actions and (shallow) deterministic conjectures is a selfconfirming equilibrium at the true payoff states of a linear-quadratic network game with feedback and global externalities if, for each $i$,

1. (subjective rationality) ,

2. (confirmed conjecture) .

Notice that the rationality condition is unchanged with respect to the case of only local externalities, since best-reply conditions are not affected by the global externality term. To compare this game with the linear-quadratic network game with only local externalities, we consider the case of just observable payoffs. Then, without loss of generality, we can assume that $\beta_{i}=\beta$ for every $i$. With this, we can characterize the SCE set as follows:

###### Proposition 6.

Fix and . Every selfconfirming equilibrium profile of a linear-quadratic network game with global externalities and just observable payoffs is such that, for every ,

1. if , then , ;

2. if , then , .

We now discuss how the presence of the global externality term in the utility function radically changes the characterization of selfconfirming equilibria. As before, we assume that players observe their own realized payoffs. Yet, when global externalities are present, observability of the payoff state by active players no longer holds. Inactive players have correct conjectures about the global externality, but may have correct or incorrect conjectures about the local externality. Active players, on the other hand, are not able to determine precisely the magnitude of the local effects relative to the global effects. Given any strictly positive action $a_{i}$, the confirmed conjectures condition yields $a_{i}\hat{x}_{i}+\beta\hat{y}_{i}=a_{i}x_{i}+\beta y_{i}$. Then, in equilibrium, if agent $i$ overestimates (underestimates) the local externality, she must compensate this error by underestimating (overestimating) the global externality. Hence, compared to the case of only local externalities: (i) active agents choose a best response to a (typically) wrong conjecture about the payoff state; thus, (ii) it is not possible to characterize SCE by means of Nash equilibria of the auxiliary games restricted to the active players.
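The compensation between local and global conjectures can be made explicit with a quick numerical check (all values below are hypothetical, chosen only for illustration). With a payoff of the form $\alpha a_{i}-a_{i}^{2}/2+a_{i}x_{i}+\beta y_{i}$ and observable payoffs, every conjecture pair on the line $a_{i}\hat{x}_{i}+\beta\hat{y}_{i}=a_{i}x_{i}+\beta y_{i}$ is observationally equivalent for an active player:

```python
import numpy as np

def payoff(a, alpha, x, y, beta):
    # linear-quadratic payoff with local aggregate x and global externality y
    return alpha * a - 0.5 * a**2 + a * x + beta * y

# hypothetical true values and parameters
alpha, beta, a = 1.0, 0.5, 0.8
x_true, y_true = 0.4, 2.0
u_real = payoff(a, alpha, x_true, y_true, beta)

# overestimating the local externality by d while underestimating the global
# one by a*d/beta leaves the observed payoff unchanged
for d in [0.0, 0.3, -0.2]:
    x_hat, y_hat = x_true + d, y_true - a * d / beta
    assert abs(payoff(a, alpha, x_hat, y_hat, beta) - u_real) < 1e-9
```

An active agent's feedback thus pins down only this one linear combination of the two conjectures, which is why the observability result from the local-externalities case fails here.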

We now present a simple example showing how wrong conjectures about local and global externalities may have a large effect on equilibrium actions.

###### Example 6.

Consider three agents in a line network, with agent 2 at the center of the line. Then, for agent 2, the local aggregate is always proportional to the global one, with the same ratio, while this is not true for agents 1 and 3. We assume that each agent thinks she is playing in a complete network, so every agent thinks that the local aggregate is always proportional to the global one, with the same ratio. In this case agents 1 and 3 believe they are more central than they actually are. Table 4 provides the Nash equilibria for the actual network and for the complete network, and the selfconfirming equilibrium actions for the case described above.

Simulations show that if agents overestimate the impact of local externalities, this generates a multiplier effect that makes equilibrium actions increase to a level even larger than what would be predicted by Nash equilibrium in a complete network. This is the result of how agents misinterpret their feedback. In detail, thinking they are in a complete network makes agents 1 and 3 overestimate local externalities. Take for instance agent 1. Given any level of the global externality, she chooses a best reply higher than the Nash equilibrium one, since she overestimates the local externality. This high action has the effect of increasing the global externality term for agent 3. Agent 3, by overestimating the local externality, partly attributes this higher global externality to the local externality term, and chooses an action larger than predicted by Nash equilibrium. The choice of agent 3 in turn increases the global externality perceived by agent 1, and so on. At the same time agent 2, as her neighbors choose higher actions, increases her own action level. This process continues and a multiplier effect is at play. In the limit, selfconfirming equilibrium actions are almost ten times larger than in the complete-network Nash equilibrium. ∎

### 5.1 Learning with Global Externalities

We now consider the learning process that originates from adaptive updating of conjectures, as we did for the case of only local externalities. For ease of reference, we rewrite Eq. (2) here as a payoff function that depends on players’ actions, with the time index $t$ and with the local aggregate $x_{i,t}$ and the global aggregate $y_{i,t}$ specified as functions of co-players’ actions:

$$u_{i,t}(a_{i,t},a_{-i,t})=\alpha a_{i,t}-\frac{1}{2}a_{i,t}^{2}+a_{i,t}\underbrace{\sum_{j\in I\setminus\{i\}}z_{ij}a_{j,t}}_{x_{i,t}}+\beta\underbrace{\sum_{j\in I\setminus\{i\}}a_{j,t}}_{y_{i,t}}$$