# Dynamics and Coalitions in Sequential Games

We consider N-player non-zero sum games played on finite trees (i.e., sequential games), in which the players have the right to repeatedly update their respective strategies (for instance, to improve the outcome with respect to the current strategy profile). This generates a dynamics in the game which may eventually stabilise to a Nash Equilibrium (as with Kukushkin's lazy improvement), and we argue that it is interesting to study the conditions that guarantee that such a dynamics terminates. We build on the works of Le Roux and Pauly, who have extensively studied one such dynamics, namely the Lazy Improvement Dynamics. We extend these works by first defining a turn-based dynamics, proving that it terminates on subgame perfect equilibria, and showing that several variants do not terminate. Second, we define a variant of Kukushkin's lazy improvement where the players may now form coalitions to change strategies. We show how properties of the players' preferences on the outcomes affect the termination of this dynamics, and we thereby characterise classes of games where it always terminates (in particular two-player games).

## 1 Introduction

Since the seminal work of von Neumann and Morgenstern in the forties [vNM44], game theory has emerged as a prominent paradigm to model the behaviour of rational and selfish agents acting in a competitive setting. The first and main application of game theory is to be found in the domain of economics, where the agents can model companies or investors who are competing for profits, for access to markets, etc. Since then, game theory has evolved into a fully developed mathematical theory and has recently found many applications in computer science. In this setting, the agents usually model different components of a computer system (and of its environment) that have their own objectives to fulfil. For example, game theory has been applied to analyse peer-to-peer file transfer protocols [Nisan], where the agents want to maximise the amount of downloaded data in order to obtain, say, a given file, while minimising the amount of uploaded data to save bandwidth. Another application is that of controller synthesis, where the two fully antagonistic agents are the system and its environment, and where the game objective models the control objective.

The most basic model to describe the interaction between the players is that of games in strategic form (aka matrix games), where all the players simultaneously choose an action (or strategy) from a finite set, and where they all obtain a payoff (usually modelled by a real value) which depends on their joint choice of actions (or strategy profile). Strategic games are one-shot games, in the sense that the players play only one action, simultaneously. An alternative model, where players play in turn, is that of sequential games, which we consider in this work. Such a game is played on a finite tree whose inner nodes are labelled by players, whose edges correspond to the possible actions of the players, and whose leaves are associated with outcomes. The game starts at the root, and, at each step of the game, the player who owns the current node picks an edge (i.e. an action) from this node, and the game moves to the destination node of the edge. The game ends when a leaf is reached, and the players obtain the outcome associated with that leaf. Moreover, each player has a preference over the possible outcomes, and each player's aim is thus to obtain the best outcome according to this preference relation.

Arguably the most natural question about these games is to determine how rational and selfish players would act (assuming that they have full knowledge of the other players' possible actions and of the outcomes). Several answers to this question have been provided in the literature, such as the famous notion of Nash Equilibrium [Nash50], which is a situation (strategy profile) in which no player has an incentive to change his choice of action alone (because such a change would not be profitable to him). Apart from Nash equilibria, other solution concepts have been proposed, such as Subgame Perfect Equilibria. This traditional view on game theory can be qualified as static, in the sense that all players choose their strategies (thereby forming a strategy profile that can qualify as one of the equilibria listed above), and the game is then played once according to these strategies. It is, however, also natural to consider a more dynamic setting, in which the players can play the game repeatedly, updating their respective strategies after each play, in order to try to improve the outcome at the next play.

#### Contribution

In this paper, we continue a series of works that aim at connecting these static and dynamic views. That is, we want to study (in the case of extensive form games) the long-term behaviour of the dynamics in which players repeatedly update their strategies, and to characterise when such dynamics converge and to what kind of strategy profiles (i.e., do these stable profiles correspond to some notion of equilibrium?). Obviously, this will depend on how the players update their strategies between two plays of the game. Our results consist in identifying minimal conditions on the updates (modelling potential rational behaviours of the players) for which we can guarantee convergence to some form of equilibrium after a bounded number of updates. Our work extends a previous paper by Le Roux and Pauly [LeRouxDynamics], in which they extensively study the so-called Lazy Improvement Dynamics. Intuitively, in this dynamics, a single player can update his strategy at a time, in order to improve his outcome, and only in nodes that have an influence on the final outcome (lazy update). Their main result is that this dynamics terminates on Nash equilibria when the preferences of the players are acyclic. Our contribution consists in considering a broader family of dynamics and characterising their termination. More precisely:

• We start (Section 3) by considering all dynamics that respect the subgame improvement property, where players update their strategies only if this yields a better outcome in all subgames that are rooted in a node where a change has been performed. We argue that this can be regarded as a rational behaviour of the players: improving in the subgames can be regarded as an incentive, even when this does not improve the global outcome of the game. Indeed, such an improvement can turn out to be profitable if one of the other players later deviates from its current choice (this is the same intuition as the one behind the notion of Subgame Perfect Equilibrium). Note that such dynamics have not been considered at all in [LeRouxDynamics]. We show that, in all games where the preferences of the players are acyclic, these dynamics terminate and the terminal profiles are exactly the Subgame Perfect Equilibria of the game.

• Then, we consider (Section 4) all the dynamics that respect the improvement property, where all players that change their respective strategies improve the outcome (from the point of view of their respective preferences). Several of these dynamics have already been studied by Le Roux and Pauly [LeRouxDynamics], such as the Lazy Improvement Dynamics. We complete the picture (see Table 1). In particular, we consider the dynamics that satisfies the improvement and the laziness properties but does not restrict the update to be performed by a single player, contrary to the Lazy Improvement Dynamics of Le Roux and Pauly. Thus, in our dynamics, players play lazily but are allowed to form coalitions to obtain an outcome which is better for all the players of the coalition. We give necessary and sufficient conditions on the preferences of the players to ensure termination (on Strong Nash Equilibria) in several classes of games (among which two-player games).

#### Related works

The most closely related work is the paper by Le Roux and Pauly [LeRouxDynamics] that we extend here, as already explained. This work is inspired by the notions of potential and semi-potential introduced respectively by Monderer and Shapley [MONDERER1996124] and by Kukushkin [KUKUSHKIN2002306]. Note also that the idea of repeatedly playing a game and updating the players' strategies between two plays is also behind evolutionary game theory, but in this case, the rules governing the updates are inspired by Darwinian evolution [MP73, Wei95].

## 2 Preliminaries

#### Sequential games

We consider sequential games, which are N-player non-zero sum games played on finite trees. Figure 1 shows an example of such a game, with two players denoted and . Intuitively, each node of the tree is controlled by one of the players, and the game is played by moving a token along the branches of the tree, from the root node, up to the leaves, which are labelled by a payoff (in this case, , or ). The tree edges are labelled by the actions that the player controlling the origin node can play. For example, in the root node, Player can choose to play , in which case the game reaches the second node, controlled by Player , who can choose to play . The payoff for both players is then . We also associate a preference relation with each player that indicates how he ranks the payoffs. In the example of Figure 1, Player  prefers to and to (noted ), and Player  prefers to and to . Let us now formalise the basic notions about these games. The definitions and notations of this section are inspired by [Osborne].

###### Definition 1.

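A sequential or extensive form game $G$ is a tuple $\langle N, A, H, O, d, p, (\prec_i)_{i \in N} \rangle$ where:

• $N$ is a non-empty finite set of players;

• $A$ is a non-empty finite set of actions;

• $H$ is a finite set of finite sequences of $A$ which is prefix-closed. That is, the empty sequence $\varepsilon$ is a member of $H$; and $a^1 \cdots a^k \in H$ implies that $a^1 \cdots a^\ell \in H$ for all $\ell < k$. Each member of $H$ is called a node. A node $h \in H$ is terminal if $\forall a \in A$, $h a \notin H$. The set of terminal nodes is denoted by $Z$.

• $O$ is the non-empty set of outcomes;

• $d : H \setminus Z \to N$ associates a player with each nonterminal node;

• $p : Z \to O$ associates an outcome with each terminal node;

• For all $i \in N$: $\prec_i$ is a binary relation over $O$, modelling the preferences of Player $i$. We write $x \prec_i y$ and $x \nprec_i y$ when $(x, y) \in {\prec_i}$ and $(x, y) \notin {\prec_i}$ respectively.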

From now on, we fix a sequential game $G = \langle N, A, H, O, d, p, (\prec_i)_{i \in N} \rangle$. Then, we let $H_i = \{h \in H \setminus Z \mid d(h) = i\}$ be the set of nodes belonging to player $i$. A strategy of player $i$ is a function $s_i : H_i \to A$ associating an action with all nodes belonging to player $i$, s.t. for all $h \in H_i$: $h\, s_i(h) \in H$, i.e., $s_i(h)$ is a legal action from $h$. Then, a tuple $s = (s_i)_{i \in N}$ associating one strategy with each player is called a strategy profile and we let $\mathrm{Strat}_G$ be the set of all strategy profiles in $G$. For all strategy profiles $s$, we denote by $\langle s \rangle$ the outcome of $s$, which is the outcome of the terminal node obtained when all players play according to $s$. Formally, $\langle s \rangle = p(h)$ where $h = a^1 \cdots a^k \in Z$ is s.t. for all $0 \le \ell < k$: $d(a^1 \cdots a^\ell) = i$ implies $a^{\ell+1} = s_i(a^1 \cdots a^\ell)$. Let $s$ be a strategy profile, and let $s'_j$ be a Player $j$ strategy. Then, we denote by $(s_{-j}, s'_j)$ the strategy profile where all players play $s$, except Player $j$ who plays $s'_j$. Let $s$ be a strategy profile, and let $h$ be a nonterminal node. Then, we let: (1) $H|_h$ be the subtree of $H$ from $h$; (2) $s|_h$ be the substrategy profile of $s$ from $h$, which is s.t. for all nonterminal nodes $h h'$: $s|_h(h') = s(h h')$; and (3) $G|_h$ be the subgame of $G$ from $h$. Since a strategy profile fixes a strategy for all players, we abuse notations and write, for all nodes $h$, $s(h)$ to denote the action $s_{d(h)}(h)$, i.e. the action that the owner of $h$ plays in $h$ according to $s$. Then, we say that a node $h$ lies along the play induced by $s$ if $h = a^1 \cdots a^k$ and, for all $0 \le \ell < k$, $s(a^1 \cdots a^\ell) = a^{\ell+1}$. As an example, let us consider again the game in Figure 1. In this game, both players can choose either or in the nodes they control. So, possible strategies for Player and are s.t. and s.t. respectively. Then, . Observe that, in our examples, we denote a strategy profile by the actions taken by the players. For example, we denote the profile by . With this notation, this game has four strategy profiles : , , and .
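The notions above can be made concrete with a small executable sketch. The game used here is a hypothetical two-player tree chosen for illustration (it is not claimed to be the game of Figure 1), and the names `OWNER`, `OUTCOME`, `legal_actions`, `outcome`, `all_profiles` and `deviate` are our own. Nodes are tuples of actions, and, following the abuse of notation above, a strategy profile is written as a single map from nonterminal nodes to actions.

```python
from itertools import product

# Hypothetical game tree (an assumption for illustration, not Figure 1).
# Nodes are tuples of actions; the root is the empty tuple.
OWNER = {(): 1, ("r",): 2}  # nonterminal node -> player who owns it
OUTCOME = {("l",): "x", ("r", "l"): "y", ("r", "r"): "z"}  # terminal node -> outcome


def legal_actions(h):
    """Actions a such that the sequence h followed by a is a node of the tree."""
    nodes = set(OWNER) | set(OUTCOME)
    return sorted(n[-1] for n in nodes if len(n) == len(h) + 1 and n[:-1] == h)


def outcome(profile, h=()):
    """Outcome reached from node h when every owner plays profile[node]."""
    while h in OWNER:
        h = h + (profile[h],)
    return OUTCOME[h]


def all_profiles():
    """Every strategy profile, as a map from nonterminal node to action."""
    nodes = sorted(OWNER)
    choices = product(*(legal_actions(h) for h in nodes))
    return [dict(zip(nodes, c)) for c in choices]


def deviate(profile, player, new_strategy):
    """Profile where `player` switches to new_strategy and the others keep theirs."""
    return {h: (new_strategy[h] if OWNER[h] == player else a)
            for h, a in profile.items()}
```

Calling `outcome(profile, h)` with a nonterminal node `h` gives the outcome of the substrategy profile from `h`, which is how subgames are handled in this sketch.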

#### Equilibria

Now that we have fixed the notions of games and strategies, we turn our attention to three classical notions of equilibria. First, a strategy profile $s$ is a Nash Equilibrium (NE for short) if for all players $i \in N$, for all strategies $s'_i$ of player $i$: $\langle s \rangle \nprec_i \langle (s_{-i}, s'_i) \rangle$. It means that, in an NE $s$, no player has an interest in deviating alone from his current choice of strategy (because no such possible deviation is profitable to him). On the other hand, a strategy profile $s$ is a Subgame Perfect Equilibrium (SPE for short) if, for all players $i \in N$, for all strategies $s'_i$ of player $i$, for all nonterminal nodes $h \in H_i$ of player $i$: $\langle s|_h \rangle \nprec_i \langle (s_{-i}, s'_i)|_h \rangle$. In other words, $s|_h$ is an NE in every subgame $G|_h$ of $G$. Finally, in [Aumann59], Aumann defines a Strong Nash Equilibrium (SNE for short) as a strategy profile in which there is no coalition of players that has an incentive to deviate, because such a deviation would allow all players of the coalition to improve the outcome. Formally, a strategy profile $s$ is an SNE if, for all coalitions of players $C \subseteq N$, for all strategy profiles $s'_C$ of the coalition $C$, there is $i \in C$ s.t.: $\langle s \rangle \nprec_i \langle (s_{-C}, s'_C) \rangle$. Thus, the notion of SNE is stronger than the notion of NE. Actually, the notion of SNE has sometimes been described as 'too strong' in the literature, because there are few categories of games in which SNEs are guaranteed to exist (contrary to NEs and SPEs). This has prompted other authors to introduce alternative solution concepts such as Coalition Proof Equilibria [BERNHEIM19871].

For example, in the game in Figure 1, the only NE is . Moreover, it is also an SPE because is an NE in the only subgame of . However, is not an SNE, because if both players decide to form a coalition and play the action, they obtain as outcome which is better than for both of them. There is actually no SNE in this game. If we consider the same game with following preferences for Player : , then is still an NE, but not an SPE. The only SPE of this game is (which is also an NE and an SNE).
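The three equilibrium notions can be checked by brute force on small trees. The sketch below does so on a hypothetical two-player game whose tree and preferences are assumptions chosen for illustration (they are not taken from Figure 1); `PREFERS[i]` lists the pairs `(a, b)` such that player `i` strictly prefers `b` to `a`, and all function names are our own.

```python
from itertools import combinations, product

# Hypothetical game (an assumption for illustration, not Figure 1).
OWNER = {(): 1, ("r",): 2}  # nonterminal node -> owner
OUTCOME = {("l",): "x", ("r", "l"): "y", ("r", "r"): "z"}
ACTIONS = {(): ["l", "r"], ("r",): ["l", "r"]}
# PREFERS[i] contains (a, b) when player i strictly prefers b to a.
PREFERS = {1: {("y", "x"), ("x", "z"), ("y", "z")},
           2: {("x", "z"), ("z", "y"), ("x", "y")}}
PLAYERS = sorted(PREFERS)


def outcome(s, h=()):
    """Outcome reached from node h when the owners follow profile s."""
    while h in OWNER:
        h = h + (s[h],)
    return OUTCOME[h]


def profiles():
    nodes = sorted(OWNER)
    return [dict(zip(nodes, c)) for c in product(*(ACTIONS[h] for h in nodes))]


def deviations(s, coalition):
    """Profiles that agree with s on every node not owned by the coalition."""
    return [t for t in profiles()
            if all(t[h] == s[h] for h in OWNER if OWNER[h] not in coalition)]


def improves(i, s, t, h=()):
    """True when player i strictly prefers t to s in the subgame rooted at h."""
    return (outcome(s, h), outcome(t, h)) in PREFERS[i]


def is_ne(s):
    return not any(improves(i, s, t)
                   for i in PLAYERS for t in deviations(s, {i}))


def is_spe(s):
    # No single-player deviation is profitable in any subgame.
    return not any(improves(i, s, t, h)
                   for i in PLAYERS for t in deviations(s, {i}) for h in OWNER)


def is_sne(s):
    coalitions = [set(c) for k in range(1, len(PLAYERS) + 1)
                  for c in combinations(PLAYERS, k)]
    # s fails to be an SNE when some coalition deviation improves ALL its members.
    return not any(all(improves(i, s, t) for i in c)
                   for c in coalitions for t in deviations(s, c))
```

With these assumed preferences, the profile in which both players play `l` is the only NE and is also an SPE, while no profile is an SNE: the grand coalition can move to outcome `z`, which both players prefer to `x`.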

#### Dynamics: general definitions

Let us now introduce the central notion of the work, i.e. dynamics in extensive form games. As explained in the introduction, we want to study the behaviour of the players when they are allowed to repeatedly update their current strategy in a strategy profile, and characterise the cases where such repeated updates (i.e. dynamics) converge to one of the equilibria we have highlighted above. More specifically, we want to characterise when a dynamics terminates for a game , i.e., when players cannot infinitely often update their strategy.

Formally, for a sequential game $G$, a dynamics $\to_G$ is a binary relation over $\mathrm{Strat}_G$, where, as usual, we write $s \to_G s'$ whenever $(s, s') \in {\to_G}$. Intuitively, $s \to_G s'$ means that the dynamics under consideration allows the strategy profile $s$ to be updated into $s'$, by the change of strategy of one or several players. When the context is clear, we drop the name of the game and write $\to$ instead of $\to_G$. Given this definition, it is clear that a dynamics corresponds to a directed graph $(\mathrm{Strat}_G, \to_G)$, where $\mathrm{Strat}_G$ is the set of graph nodes, and $\to_G$ is its set of edges. For example, the graphs in Figure 2 are some possible graphs representing dynamics associated with the game in Figure 1. Then, we say that the dynamics $\to_G$ terminates if there is no infinite sequence of strategy profiles $(s^k)_{k \ge 1}$ such that $s^k \to_G s^{k+1}$ for all $k \ge 1$. Equivalently, $\to_G$ terminates iff its corresponding graph is acyclic. Intuitively, a dynamics terminates if players cannot update their strategy infinitely often, which means that the game will eventually reach stability. We say that a strategy profile $s$ is terminal iff there is no $s'$ s.t. $s \to_G s'$ (i.e., $s$ is a deadlock in the associated graph). Finally, given a pair of strategy profiles $s$ and $s'$, we let $H(s, s')$ (resp. $H_i(s, s')$) be the set of nodes (resp. the set of nodes belonging to player $i$) where the player who owns the node plays differently according to $s$ and $s'$. Moreover, $H_C(s, s')$ is the natural extension of $H_i(s, s')$ to subsets $C$ of $N$.
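Since a dynamics over a finite set of profiles is just a finite directed graph, termination can be decided by a standard cycle search. The following is a minimal sketch of that check, assuming profiles are any hashable values and the dynamics is given as a collection of `(s, t)` pairs; the function names are our own.

```python
def successors(edges):
    """Adjacency map of the graph whose edges are the allowed updates."""
    succ = {}
    for s, t in edges:
        succ.setdefault(s, set()).add(t)
    return succ


def terminates(profiles, edges):
    """True iff the graph is acyclic, i.e. there is no infinite update sequence."""
    succ = successors(edges)
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    colour = {s: WHITE for s in profiles}

    def visit(s):
        colour[s] = GREY
        for t in succ.get(s, ()):
            # A grey successor closes a cycle; a white one is explored recursively.
            if colour[t] == GREY or (colour[t] == WHITE and visit(t)):
                return True
        colour[s] = BLACK
        return False

    return not any(colour[s] == WHITE and visit(s) for s in profiles)


def terminal_profiles(profiles, edges):
    """Profiles with no outgoing update (deadlocks of the graph)."""
    succ = successors(edges)
    return [s for s in profiles if not succ.get(s)]
```

For instance, with profiles named `"ll"`, `"lr"`, `"rl"`, `"rr"` and the hypothetical updates `lr → rr → rl → ll`, the check reports termination with `"ll"` as the only terminal profile; adding the update `ll → rr` creates a cycle and termination fails.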

Let us now identify particular families of dynamics (whose terminal profiles we want to characterise), by characterising how players update their strategies from one profile to another.

###### Definition 2 (Properties of strategy updates).

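Let $s, s'$ be two strategy profiles of a game $G$. Then:

1. $(s, s')$ verifies the Improvement Property, written $(s, s') \models I$, if for all $i \in N$: $s_i \neq s'_i$ implies $\langle s \rangle \prec_i \langle s' \rangle$. That is, every player that changes his strategy improves his payoff.

2. $(s, s')$ verifies the Subgame Improvement Property, written $(s, s') \models SI$, if $\forall i \in N$, $\forall h \in H_i(s, s')$: $\langle s|_h \rangle \prec_i \langle s'|_h \rangle$. That is, every player that changes his strategy improves his (induced) payoff in all the subgames rooted at one of the changes.

3. $(s, s')$ verifies the Laziness Property, written $(s, s') \models L$, if $\forall h \in H(s, s')$, $h$ lies along the play induced by $s'$. Intuitively, we consider such updates as lazy because we require that the players do not change their strategy in nodes which do not influence the payoff.

4. $(s, s')$ verifies the One Player Property, written $(s, s') \models 1P$, if $\exists i \in N$ such that $\forall j \neq i$, $s_j = s'_j$. That is, at most one player updates his strategy (but he can perform changes in as many nodes as he wants).

5. $(s, s')$ verifies the Atomicity Property, written $(s, s') \models A$, if $\exists h^* \in H$ such that $\forall h \neq h^*$, $s(h) = s'(h)$. That is, the change affects at most one node of the tree.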

Note that, except for the first two properties, those requirements do not depend on the outcome. We argue that the first three properties (Improvement, Subgame Improvement and Laziness) correspond to some kind of rational behaviour of the players, who seek to improve the outcome of the game while performing a minimal amount of changes. On the other hand, the One Player Property is relevant because such updates do not allow players to form coalitions to improve their outcomes. Finally, the Atomicity Property is interesting per se because it corresponds to some kind of minimal update, where a single choice of a single player can be changed at a time. Because of that, dynamics respecting the Atomicity Property will be useful in the rest of the paper to establish general results on dynamics.

Based on these properties, we can now define the dynamics that we will consider in this paper. For all $x \in \{I, SI, L, 1P, A\}$, we define the $x$-Dynamics as the set of all pairs $(s, s')$ s.t. $(s, s') \models x$. Intuitively, the $x$-Dynamics is the most permissive dynamics where the players update their strategies respecting $x$. For a set $X \subseteq \{I, SI, L, 1P, A\}$, we define the $X$-Dynamics as the intersection of all the $x$-Dynamics for $x \in X$. Throughout the paper, we denote by $\to_X$ the $X$-Dynamics.
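The five properties and the intersection-based definition of a dynamics can be sketched as executable predicates. The code below works on a hypothetical two-player game (tree and preferences are assumptions for illustration, not Figure 1). Two further assumptions are made and should be read as such: laziness is checked against the play induced by the updated profile, and trivial pairs with identical profiles are excluded so that vacuously true properties do not create self-loops.

```python
from itertools import product

# Hypothetical game (an assumption for illustration, not Figure 1).
OWNER = {(): 1, ("r",): 2}
OUTCOME = {("l",): "x", ("r", "l"): "y", ("r", "r"): "z"}
ACTIONS = {(): ["l", "r"], ("r",): ["l", "r"]}
# PREFERS[i] contains (a, b) when player i strictly prefers b to a.
PREFERS = {1: {("y", "x"), ("x", "z"), ("y", "z")},
           2: {("x", "z"), ("z", "y"), ("x", "y")}}


def outcome(s, h=()):
    while h in OWNER:
        h = h + (s[h],)
    return OUTCOME[h]


def profiles():
    nodes = sorted(OWNER)
    return [dict(zip(nodes, c)) for c in product(*(ACTIONS[h] for h in nodes))]


def changed(s, t):
    """Nodes where the owner plays differently in s and t."""
    return [h for h in sorted(OWNER) if s[h] != t[h]]


def on_play(t, h):
    """True when node h lies along the play induced by profile t."""
    cur = ()
    while cur in OWNER:
        if cur == h:
            return True
        cur = cur + (t[cur],)
    return False


PROPERTIES = {
    # Improvement: every player who changes prefers the new global outcome.
    "I": lambda s, t: all((outcome(s), outcome(t)) in PREFERS[OWNER[h]]
                          for h in changed(s, t)),
    # Subgame Improvement: improvement in the subgame rooted at each change.
    "SI": lambda s, t: all((outcome(s, h), outcome(t, h)) in PREFERS[OWNER[h]]
                           for h in changed(s, t)),
    # Laziness (assumed here to refer to the play of the updated profile t).
    "L": lambda s, t: all(on_play(t, h) for h in changed(s, t)),
    # One Player: at most one player changes.
    "1P": lambda s, t: len({OWNER[h] for h in changed(s, t)}) <= 1,
    # Atomicity: at most one node changes.
    "A": lambda s, t: len(changed(s, t)) <= 1,
}


def dynamics(names):
    """Pairs (s, t), s != t (assumption), satisfying every property in names."""
    ps = profiles()
    return [(s, t) for s in ps for t in ps
            if s != t and all(PROPERTIES[n](s, t) for n in names)]


def terminal(names):
    sources = [s for s, _ in dynamics(names)]
    return [s for s in profiles() if s not in sources]
```

On this assumed game, `dynamics(["I", "L", "1P"])` has three updates and a single terminal profile, whereas dropping the One Player Property adds a coalition update that closes a cycle, which illustrates why coalition dynamics need extra conditions to terminate.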

Observe that, following Definition 2, any update satisfying the Atomicity Property also satisfies the One Player Property. However, no such implication exists in general between the other properties:

###### Lemma 3.

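Let $s$ and $s'$ be two strategies of a game $G$. Then, $(s, s') \models A$ implies that $(s, s') \models 1P$.

On the other hand, for all $x, y \in \{I, SI, L, 1P, A\}$ s.t. $x \neq y$ and $(x, y) \neq (A, 1P)$, there exists a pair of strategies $s$ and $s'$ and a game $G$ s.t.: $(s, s') \models x$ and $(s, s') \not\models y$.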

###### Proof.

The first point follows from Definition 2. Indeed, since only one move can be made between and (if ), then only one player can have updated his strategy.

Let us give a counter example for the lack of implication between and (the other cases follow immediately from Definition 2). Let us consider the game in Figure 1. Then, we claim that but . Indeed, for both players, , but . On the other hand, but , because . ∎

As a consequence, we obtain direct inclusions between some dynamics. For instance in all games. In [LeRouxDynamics], Le Roux and Pauly consider the so-called Lazy Improvement Dynamics which corresponds to our -Dynamics. The underlying idea is to disallow players from making changes in nodes that are irrelevant (because they will not appear along the play generated by the profile), while ensuring that the payoff improves. In [LeRouxDynamics], Le Roux and Pauly prove that this dynamics terminates for all games that do not have cyclic preferences and that the terminal profiles are exactly the Nash Equilibria.

Examples of graphs associated with dynamics of particular interest (for the game in Figure 1) are displayed in Figure 2: the -Dynamics (or Lazy Improvement Dynamics) on the left; the -Dynamics in the middle (which will be particularly relevant to the discussion in Section 3); and the -Dynamics on the right (which will be particularly relevant in Section 4).

The rest of the paper is structured as follows: in Section 3, we consider dynamics which are subsets of the $\{SI\}$-Dynamics (like the -Dynamics). In Section 4, we consider dynamics which are subsets of the $\{I\}$-Dynamics, in order to complete the results obtained by Le Roux and Pauly in [LeRouxDynamics].

## 3 Subgame Improvement Dynamics

In this section we will focus on dynamics that respect the Subgame Improvement Property (see Definition 2), i.e., dynamics which are subsets of the -Dynamics (note that these dynamics have not been considered at all in [LeRouxDynamics]). More precisely, we will consider all the -Dynamics s.t.. Let us notice that we do not consider here the property, because we argue that there is little interest in the -Dynamics. Indeed, let us consider the game in Figure 1, with the following preferences instead: and . Then, we can update into with the -Dynamics. Observe that in this update, both players update their strategy and thus agree to form a coalition to perform it. However, this update is not profitable for player who obtains a worse outcome: instead of . This example shows that the -Dynamics yields strategy updates that are not rational.

The central result of this section is that all those dynamics terminate when the preferences of the players are acyclic and converge to subgame perfect equilibria, as stated in the following theorem. (Actually, as we show at the end of the section, in the presence of players who have cyclic preferences and play lazily, the players who have acyclic preferences are still guaranteed to perform a finite number of updates only.)

###### Theorem 4.

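Let $N$, $O$ and $(\prec_i)_{i \in N}$ be respectively a set of players, a set of outcomes and preferences, and let $X$ be s.t. $\{SI\} \subseteq X \subseteq \{SI, 1P, A\}$. Then, the two following statements are equivalent:

1. In all games built over $N$, $O$ and $(\prec_i)_{i \in N}$, the $X$-Dynamics terminates;

2. The preferences $(\prec_i)_{i \in N}$ are acyclic.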

Moreover, for all , the set of terminal profiles of the -Dynamics is the set of SPEs of the game.

An important consequence of this theorem is that acyclic preferences form a sufficient condition to ensure termination (on SPEs) of all -dynamics with . Observe that this condition is very weak because it constrains only the preferences, and not the structure (tree) of the game. We argue that this is also very reasonable, as acyclic preferences capture many if not most rational preferences (although some authors argue that cyclic preferences can be realistic, see Larami and Zakir for an example [LZ, p. 12]). This condition is, however, not necessary in a given game: Theorem 4 only tells us that, when the preferences are cyclic, there is some game embedding those preferences in which the dynamics does not terminate. Actually, one can find examples of games with cyclic preferences where the dynamics still terminates, and examples where it does not. If we consider the game in Figure 1, such that preferences of Player 1 are now (and Player 2 as before), then the dynamics will terminate. However, if we let Player 2 have cyclic preferences too, with: , then he will infinitely often change his strategy, because he improves the outcome (according to his preferences) by changing from to and from to .

This section will be mainly devoted to proving Theorem 4. Our proof strategy works as follows. First of all, we establish termination of the $\{SI\}$-Dynamics when the preferences are acyclic (Proposition 5). This guarantees that all the $X$-Dynamics terminate when $\{SI\} \subseteq X$, since all these dynamics are more restrictive. Second, we show that all SPEs appear as terminal profiles of the $\{SI\}$-Dynamics (Proposition 6), without guaranteeing, at that point, that all terminal profiles are SPEs. Finally, we show that all the terminal profiles of the $\{SI, A\}$-Dynamics are SPEs (Proposition 7). We then argue, using specific properties of this dynamics, and relying on Definition 2, that this implies that the set of terminal profiles of all the dynamics we consider coincides with the set of SPEs.

#### The {SI}-Dynamics

As announced, we start with two properties of the $\{SI\}$-Dynamics. The first proposition states that, for a fixed set of preferences, the $\{SI\}$-Dynamics terminates in all games built on these preferences iff the preferences are acyclic. The second shows that, in these cases, the SPEs are contained in the terminal profiles of the $\{SI\}$-Dynamics.

###### Proposition 5.

Let $N$, $O$ and $(\prec_i)_{i \in N}$ be respectively a set of players, a set of outcomes and preferences. Then, the two following statements are equivalent: (1) In all games built over $N$, $O$ and $(\prec_i)_{i \in N}$, the $\{SI\}$-Dynamics terminates; (2) The preferences $(\prec_i)_{i \in N}$ are acyclic.

###### Sketch of proof.

Given a cyclic preference, we build a one-player game where all the outcomes can be reached from the root (in one step). Since the preference is cyclic, the $\{SI\}$-Dynamics does not terminate.

On the other hand, to prove that when preferences are acyclic, the dynamics terminates in all games, we proceed by induction over the number of nodes of the game. We consider a game with nodes, and in which we have an infinite sequence of strategy profiles such that Then, we consider a node , such that every successor of is in . By the Subgame Improvement Property, if player changes in this place from to , he will never come back to (as preferences are acyclic and he has to improve his payoff at this place). Moreover, as the number of successors of is finite, we can consider that, after a finite number of steps in the sequence, Player  will not change anymore in node , and will definitively choose some successor of . Then we can replace by this successor . We are then in a game with nodes, in which the dynamics terminates by induction hypothesis. The key ingredient is to prove that the dynamics coincide in the two games. ∎

###### Proposition 6.

Let $G$ be a sequential game. Then, all SPEs of $G$ are terminal profiles of $\to_{\{SI\}}$.

###### Sketch of proof.

The property follows directly from the Subgame Improvement Property, and the definition of SPEs. Indeed, the SI Property requires players to improve the outcome in the subgames where they change; and SPEs require the players to choose the best response in all subgames. Then, when a profile is an SPE, it cannot be updated without violating SI. ∎

#### The {SI, A}-Dynamics

We now turn our attention to the $\{SI, A\}$-Dynamics, and show that all its terminal profiles are necessarily SPEs. As announced, this result will be sufficient to establish that the other dynamics we consider here terminate on SPEs too.

###### Proposition 7.

Let $G$ be a sequential game. Then, all terminal profiles of $\to_{\{SI, A\}}$ are SPEs of $G$.

###### Sketch of proof.

The proof is by contraposition: consider a strategy profile which is not an SPE. Then, there is a node where a player does not make the best choice for his payoff. He can thus update his choice in this node, and this single update improves the outcome of the subgame rooted at that node; it is thus allowed in the $\{SI, A\}$-Dynamics. Thus, all profiles which are not SPEs are not terminal for this dynamics. ∎

Although this will not serve to prove Theorem 4, we now highlight an interesting property of the $\{SI, A\}$-Dynamics: roughly speaking, it weakly simulates the $\{SI\}$-Dynamics, in the sense that, in all games, every update of the $\{SI\}$-Dynamics can be split into a sequence of updates of the $\{SI, A\}$-Dynamics.

###### Lemma 8.

For all sequential games , for all s.t. : there are s.t. .

###### Sketch of proof.

The proof is by induction over , the number of changes between and . First of all, we prove that there is such that nothing changes in , and there is no node higher than where we leave the path towards between and . Formally, if , and , (), such that and . The point of the existence of such a is that, for such that and for , we have , and . Then, as , we conclude by ind. hyp. ∎

#### Proof of Theorem 4

Equipped with these three propositions, we can now prove our theorem:

###### Proof of Theorem 4.

Let us consider a set of players , a set of outcomes , preferences and s.t. . The -Dynamics terminates for every game built over if and only if the preferences are acyclic by Proposition 5, because by definition.

Next, let us consider s.t. . By definition, and using the fact that Property implies Property (see Lemma 3), we have: . Let be a terminal profile of . Then, it is also terminal in since . By Proposition 7, is thus an SPE of . On the other hand, let be an SPE of . Then by Proposition 6, is a terminal profile of . Since , is also a terminal profile of . Thus, we have shown that all SPEs of are terminal profiles of and vice-versa. ∎

#### Termination in the presence of ‘cyclic’ players

We close this section by answering the following question: ‘what happens when some players have cyclic preferences and some have not?’ We call cyclic the players who have cyclic preferences and show that, although their presence is sufficient to prevent termination of the whole dynamics, players with acyclic preferences can still be guaranteed a bounded number of updates in their choices, provided that the cyclic players play lazily. Thus, in this case, any infinite sequence of updates will eventually be made up of updates from the cyclic players only. This provides some robustness to our termination result.

Let us consider a set of players partitioned into the sets and of cyclic and acyclic players respectively; a set of outcomes; and preferences . Let us consider the dynamics such that iff: (i) either and ; (ii) or and (observe that when , the set is necessarily a singleton, because of the -Property). It means that the acyclic players play according to the -Dynamics, while the cyclic players have to play according to the -Dynamics. In this case, we say that the dynamics terminates for acyclic players if there is no infinite sequence of strategy profiles such that: (1) for all : and (2) for all there is s.t. (i.e. the acyclic players change infinitely often). Then:

###### Proposition 9.

Let , , be a set of players (with the set of cyclic players), a set of outcomes and a set of preferences. Then, the dynamics terminates for acyclic players in all games built over .

###### Sketch of proof.

In [LeRouxDynamics, Section 5], Le Roux and Pauly provide an alternative proof of the termination of the Lazy Improvement Dynamics (here the $\{I, L, 1P\}$-Dynamics). They associate with each player a function which decreases when that player updates his strategy, and does not otherwise (i.e. when other players update). The termination of the $\{I, L, 1P\}$-Dynamics is then a consequence of the decrease of these functions, together with the finiteness of the set of strategies (per player). When cyclic players are added, the $\{I, L, 1P\}$-Dynamics still terminates for acyclic players, as the updates of the cyclic players do not affect the functions of the acyclic players.

In order to obtain the desired result, we need to adapt the proof of [LeRouxDynamics, Section 5], by introducing a global function (i.e. for all acyclic players). This new global function decreases when acyclic players update their strategies. Moreover, one also shows that this function is not affected by the updates of the cyclic players, as their strategies follow the -Dynamics. This implies that acyclic players do not update their strategy an infinite number of times. ∎

Finally, we note that, if we allow cyclic players to play with the -Dynamics, the result does not hold. Consider the game in Figure 1, and consider Player  as the cyclic one. Then, the graph associated to the -Dynamics is represented in Figure 3 (left), where dotted lines represent updates of the cyclic player (Player ). Clearly, this graph contains a cycle in which Player  updates infinitely often.

## 4 Improvement dynamics and coalitions

While Section 3 was devoted to characterising the -Dynamics with , we now turn our attention to those where . Recall that Le Roux and Pauly have studied in [LeRouxDynamics] the Lazy Improvement Dynamics, which corresponds to our -Dynamics, and have shown that it terminates when the preferences of the players are acyclic, and that it terminates in Nash Equilibria. Their study of the -Dynamics was motivated by the fact that less restrictive dynamics (that are still contained in ) do not always terminate, namely the -Dynamics and the -Dynamics. These results appear as the grey lines in Table 1.

Our contribution in the present section is to fill in Table 1 with the following results. First, for all the -Dynamics for , we show that acyclic preferences guarantee termination, and that the terminal profiles include all the Nash Equilibria.

Second, we consider the -Dynamics, which can be regarded as a coalition dynamics, where several players can change their strategies at the same time to obtain a better outcome for all players taking part in the coalition. For example, in Figure 1, the two players can form a coalition to change from to , as they prefer to . We characterise families of games and conditions on the preferences where termination of the -Dynamics is guaranteed (with the terminal profiles being exactly the Strong Nash Equilibria in the sense of Aumann [Aumann59]).

### 4.1 The {I,a}-Dynamics

To complete Table 1, we now consider its first line, which represents the -Dynamics with . All these dynamics can be considered at once thanks to the next proposition:

###### Proposition 10.

All the -Dynamics with are equal.

###### Proof.

To understand these equalities, we must focus on the -Dynamics. This dynamics allows only one update between two profiles, and the outcome must be better for the player that has changed his strategy. For the outcome of the game to change, the atomic move must have occurred along the play induced by the strategy. Thus, the -Dynamics satisfies the Lazy Property (). Moreover, by Lemma 3, it also satisfies the One Player Property (). Finally, as the outcome is improved and only one change has been made, the payoff is in particular improved in the subgame rooted at the change. Thus, the -Dynamics also satisfies the Subgame Improvement Property (). ∎

By Theorem 4, as the -Dynamics satisfies the Subgame Improvement Property, it terminates for every game over some if and only if the preferences are acyclic:

###### Corollary 11.

Let be respectively a set of players, a set of outcomes and preferences. Then, the two following statements are equivalent: (1) in all games built over , the -Dynamics terminates; (2) the preferences are acyclic.
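Condition (2) of the corollary can be tested mechanically. The following is a minimal sketch, assuming that a player's preference is given as a finite set of pairs `(x, y)` meaning "`y` is strictly preferred to `x`", and that "acyclic" means that this relation contains no cycle. The function name `is_acyclic` and the encoding are illustrative assumptions, not notation from the paper.

```python
def is_acyclic(outcomes, prefers):
    """Return True iff the relation `prefers` has no cycle.

    `prefers` is a set of pairs (x, y) read as "y is strictly preferred
    to x" (an encoding assumed for this sketch).
    """
    successors = {o: [] for o in outcomes}
    for x, y in prefers:
        successors[x].append(y)

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, done
    colour = {o: WHITE for o in outcomes}

    def visit(o):
        colour[o] = GREY
        for nxt in successors[o]:
            if colour[nxt] == GREY:  # back edge: a cycle exists
                return False
            if colour[nxt] == WHITE and not visit(nxt):
                return False
        colour[o] = BLACK
        return True

    return all(colour[o] != WHITE or visit(o) for o in outcomes)
```

In a game with several players, the check would be applied to the preference relation of each player separately.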

Now that we have established termination, let us turn our attention to the terminal profiles. It turns out that they contain all Nash Equilibria of the game:

###### Proposition 12.

Let be a sequential game. Then all NEs of are terminal profiles of .

###### Proof.

If is an NE, no player can improve the outcome alone from , hence is terminal in . ∎
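The argument relies only on the definition of an NE: no player can reach an outcome he prefers by changing his own choices. As an illustration, here is a brute-force sketch of that test on a small game tree. The encoding is hypothetical: a leaf is an outcome name (a string), an internal node is a pair `(player, children)`, a profile maps each internal node (identified by its path from the root) to the index of the chosen child, and a preference is a set of pairs `(x, y)` meaning "`y` is strictly preferred to `x`".

```python
from itertools import product


def nodes(tree, path=()):
    """Yield (path, owner, number of children) for every internal node."""
    if isinstance(tree, str):
        return
    yield path, tree[0], len(tree[1])
    for i, child in enumerate(tree[1]):
        yield from nodes(child, path + (i,))


def outcome(tree, profile):
    """Follow the choices of `profile` from the root down to a leaf."""
    path = ()
    while not isinstance(tree, str):
        i = profile[path]
        tree, path = tree[1][i], path + (i,)
    return tree


def is_nash(tree, profile, prefs):
    """True iff no single player can improve the outcome on his own."""
    base = outcome(tree, profile)
    all_nodes = list(nodes(tree))
    for player, lt in prefs.items():
        own = [(p, n) for p, who, n in all_nodes if who == player]
        # Try every combination of choices at the nodes owned by `player`.
        for choice in product(*(range(n) for _, n in own)):
            deviation = dict(profile)
            deviation.update({p: c for (p, _), c in zip(own, choice)})
            if (base, outcome(tree, deviation)) in lt:
                return False
    return True
```

The enumeration is exponential in the number of nodes owned by a player, so this is only meant for small examples.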

Let us notice that some non-NE profiles can also be terminal profiles of the -Dynamics. For example, in Figure 3 (right), is a terminal profile, but not an NE.

### 4.2 The {I,l}-Dynamics

Let us now turn our attention to the -Dynamics. First, we observe that the terminal profiles of this dynamics are exactly the Strong Nash Equilibria (see Section 2 for the definition). We believe the observation is interesting, since, as already stated, the concept of SNE has sometimes been deemed ‘too strong’ in the literature.

As can be seen from Table 1, the conditions to ensure termination are more involved and require a finer characterisation of the preferences. We start by discussing these conditions.

#### Orders

Let us begin with the definitions of strict linear orders and strict weak orders. A strict linear order over a set is a total, irreflexive and transitive binary relation over . This is a natural way to see orders: for example, the usual orders over or are strict linear orders. A strict weak order over a set is an irreflexive and transitive binary relation over for which incomparability is transitive. Formally, for , if and , we say that and are incomparable, and we write (this can happen because a strict weak order is not a total relation). Then, in a strict weak order, for all , , , we have that and imply .

We write if or , i.e. if . Sometimes we will denote strict weak orders by to emphasise the possibility of incomparability, but this does not mean that the relation is reflexive. We argue that strict weak orders are quite natural to consider in our context. Indeed, the incomparability of two outcomes for a player reflects the indifference of the player regarding these outcomes. We can easily imagine two outcomes and such that Player 1 prefers to but Player  has no preference. This view justifies the transitivity of incomparability. Indeed, if a player can neither choose between and , nor between and , it would not seem natural that, for example, he prefers to . Finally, we say that is a strict linear extension of a strict weak order if it is a strict linear order and : implies .
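The three notions of this subsection can be checked by brute force on finite sets. The sketch below assumes that a relation is encoded as a set of pairs `(x, y)` meaning "`x` is strictly below `y`"; the function names are illustrative and do not come from the paper.

```python
from itertools import product


def is_strict_weak_order(elems, lt):
    """Irreflexive, transitive, and incomparability is transitive."""
    elems = list(elems)

    def incomparable(x, y):
        return (x, y) not in lt and (y, x) not in lt

    triples = list(product(elems, repeat=3))
    irreflexive = all((x, x) not in lt for x in elems)
    transitive = all((x, z) in lt for x, y, z in triples
                     if (x, y) in lt and (y, z) in lt)
    incomparability_transitive = all(
        incomparable(x, z) for x, y, z in triples
        if incomparable(x, y) and incomparable(y, z))
    return irreflexive and transitive and incomparability_transitive


def is_strict_linear_order(elems, lt):
    """A strict weak order in which distinct elements are comparable."""
    elems = list(elems)
    total = all(x == y or (x, y) in lt or (y, x) in lt
                for x, y in product(elems, repeat=2))
    return total and is_strict_weak_order(elems, lt)


def is_strict_linear_extension(elems, linear, weak):
    """`linear` is a strict linear order that contains `weak`."""
    return is_strict_linear_order(elems, linear) and weak <= linear
```

For instance, on the set {1, 2, 3}, the relation {(1, 3), (2, 3)} is a strict weak order in which 1 and 2 are incomparable, whereas {(1, 2)} is not: 1 and 3 are incomparable, 3 and 2 are incomparable, yet 1 is below 2.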

#### Layerability

While the notions of order we have defined above make sense in our context, they are unfortunately not sufficient to ensure termination of the -Dynamics. Indeed, considering the game in Figure 1 and its associated graph with the -Dynamics in Figure 2 (right), we can see that, even when players have strict linear preferences, the dynamics does not terminate. We thus need to introduce more restrictions on outcomes and preferences to ensure termination. In [LeRouxPattern], Le Roux considers a pattern over outcomes and preferences, and proves that the absence of this pattern induces some structure on the outcomes that we call layerability (in the case of strict linear orders). Layerability, in turn, can be used to prove termination of dynamics, as we are about to see. Our first task is thus to generalise the pattern and the definition of layerability of outcomes to the case of strict weak orders.

Let be a finite set of outcomes, be a finite set of players and be strict weak orders over . For a strict weak order, we say that is: (i) out of main pattern for if it satisfies:

(1)

(ii) out of secondary pattern for if it satisfies:

(2)

and (iii) out of pattern for if it is out of main and secondary pattern. Notice that, when is a strict linear order, then is out of pattern for if and only if is out of main pattern for .

Then, we say that can be layered for if there is a partition of (whose elements are called layers) and a strict total order on (i.e., the layers are totally ordered) s.t.:

1. implies that ; and

2. , , : .

The intuition behind the notion of layer is as follows: Point 1 tells us that the ordering of the layers is compatible with the preference relation of all players. That is, if we pick in some layer and in some layer , with (i.e. is ‘better’ than ), then all players will prefer to . However, with this point alone in the definition, one could put all the outcomes in the same layer (i.e., the partition would be trivially ). Point 2 ensures that the disagreement of the players on some outcomes is also reflected in the layers. That is, in all layers , we cannot find two players and that agree on a pair of elements and from (because they both prefer to ) but disagree on a pair of elements and from (because player prefers to but player prefers to ).

For example, consider the set of outcomes and four preference orders , , and depicted in Table 2, where each order corresponds to a column, and the order of the rows indicates decreasing preference order (for example, since, in column , occurs in the second row, while occurs in the first row). With these orders, the outcomes can be distributed in three layers. In other words the partition is , and the order on the layers is . In this example, the intuition given above can be verified: while the four players do not agree on the , and outcomes, they agree that these outcomes are all better than , which is always better than and .
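The informal reading of Points 1 and 2 given above can be turned into a brute-force check. This is only a sketch under stated assumptions: layers are passed as a list of sets ordered from the lowest to the highest, each preference is a set of pairs `(x, y)` meaning "`y` is strictly preferred to `x`", and the caller is trusted to pass a genuine partition of the outcomes. The name `is_layering` is illustrative.

```python
from itertools import combinations


def is_layering(layers, prefs):
    """Check Points 1 and 2 as described informally in the text.

    `layers`: list of sets of outcomes, lowest layer first (assumed to
    be a partition of the outcomes; this is not verified here).
    `prefs`: dict mapping each player to a set of pairs (x, y), read as
    "y is strictly preferred to x".
    """
    # Point 1: every player prefers any outcome of a higher layer
    # to any outcome of a lower layer.
    for i, low in enumerate(layers):
        for high in layers[i + 1:]:
            for lt in prefs.values():
                if any((x, y) not in lt for x in low for y in high):
                    return False

    # Point 2: inside a layer, two players never agree on one pair
    # of outcomes while disagreeing on another pair.
    for layer in layers:
        for lt_a, lt_b in combinations(prefs.values(), 2):
            in_a = {(x, y) for x, y in lt_a if x in layer and y in layer}
            in_b = {(x, y) for x, y in lt_b if x in layer and y in layer}
            agree = bool(in_a & in_b)
            disagree = any((y, x) in in_b for x, y in in_a)
            if agree and disagree:
                return False
    return True
```

With two players whose preferences are x < y < z and y < x < z, the layering with {x, y} below {z} passes both checks, while the trivial single layer {x, y, z} fails Point 2: the players agree that z is better than x, but disagree on x and y.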

Given this intuition, it is not surprising that the notion of layer has a strong link with the presence of the pattern that prevents termination. Indeed, the following result from [LeRouxPattern] shows that these notions are equivalent in the case of strict linear orders:

###### Proposition 13 ([LeRouxPattern]).

Let be a finite set of outcomes, a finite set of players and strict linear orders. Then is out of pattern for if and only if can be layered for .

As explained above, we seek to extend this result to strict weak orders, as this will be ancillary to establishing termination. Unfortunately, the result does not hold immediately in this case, as shown by the following example. Consider the following preferences : , and and . These preferences are out of pattern (they satisfy both (1) and (2)), but they cannot be layered. However, if we restrict ourselves to two players, the absence of our main and secondary patterns is sufficient to ensure that the preferences can be layered, in the case of strict weak orders, as shown in the next proposition. For the sake of clarity, let us denote (resp. ) by (resp. ). Then:

###### Proposition 14.

Let be a finite set of outcomes, , and strict weak orders. The following statements are equivalent: (1) is out of pattern for ; (2) there exists , a strict linear extension of that can be layered for ; (3) can be layered for .

#### Termination of the {I,l}-Dynamics

Equipped with these preliminary results, we can now characterise termination of the -Dynamics, as shown in Table 1. As we have seen with the example in Figure 1, the termination of the -Dynamics is not as simple as the termination of the -Dynamics or the -Dynamics. When the preferences of the game are strict weak orders, we will see that ‘can be layered’ is a sufficient condition, and ‘out of pattern’ is a necessary condition. However, neither of these conditions is both necessary and sufficient.

###### Theorem 15.

Let be a finite set of outcomes, be a finite set of players and be strict weak orders. We have the following implications: (1) ⇒ (2) ⇒ (3) ⇒ (4), (2) ⇏ (1) and (4) ⇏ (3), where:

1. can be layered for .

2. The -Dynamics terminates in all games built over , and .

3. All games built over , and admit an SNE.

4. is out of pattern for .

###### Sketch of proof.

(1) ⇒ (2). The idea consists in reducing to a two-player game where both players play lazily and do not form coalitions. This is the point of the proof where we heavily exploit the properties of layers: as can be layered for , we know that no player or coalition of players will make a change in order to reach an outcome which is in a lower layer than the current outcome. Indeed, by definition of layers, every player prefers every outcome of an upper layer to any outcome of a lower layer. Then, we can consider that, at some point along the dynamics, we will reach a layer and never leave it, because the game is finite. Let us denote it by .

Moreover, in that layer, we can make two teams of players: those who agree with player , and the others. Indeed, inside a layer, we have for all and for all . We can then regard each team as a single player and build a game with two players that never form coalitions and play lazily. By [LeRouxDynamics], we know that this dynamics terminates for the game and conclude that the -Dynamics terminates for the game .

(2) ⇒ (3). We know that the terminal profiles are the SNEs, by definition. Thus, if the dynamics terminates, there exists an SNE in the game.

(3) ⇒ (4). We argue by contraposition. If is not out of the main pattern, let us consider the game in Figure 1. Its associated graph, given by Figure 2 (right), has no terminal node. It means that there is no SNE. In the case where is not out of the secondary pattern, we consider the game and its associated graph in Figure 4 (left). The graph has no terminal node, so the game has no SNE.

(2) ⇏ (1). A counter-example goes as follows. Consider , and s.t. , and . Clearly, these preferences cannot be layered, but we claim that, in all games built over , and , the -Dynamics terminates. Indeed, observe that, as player cannot form any coalition, he will update his strategy only a finite number of times. After that, either player and player will form a coalition in order to reach , and will then stop updating their strategies; or they will not. In the latter case, we can consider that they update according to the -Dynamics. Then, by [LeRouxDynamics, Theorem 10], this dynamics terminates.

(4) ⇏ (3). The game in Figure 4 (right) is an example where the preferences are out of pattern and the graph associated to the -Dynamics has no terminal node, hence the game admits no SNE. ∎

Let us now finish by considering our two particular cases, in which we know that the ‘layerability’ of the order is equivalent to the absence of pattern, namely when preferences are strict linear orders (Proposition 13), and in two-player games (Proposition 14). Then:

###### Corollary 16.

Let be a finite set of outcomes, be a finite set of players and be preferences. When either and are strict weak orders; or are strict linear orders, the following are equivalent: (1) The -Dynamics terminates in all games built over , and ; (2) All games built over , and admit an SNE; (3) is out of pattern for .