The canonical model for formalizing the reactive synthesis problem is the class of two-player win/lose perfect-information games played on finite (directed) graphs [PnRo89, AbadiLW89]. In recent years, more general objectives and multiplayer games have been studied (see e.g. [kupferman] or [DBLP:conf/lata/BrenguierCHPRRS16] and additional references therein). When moving beyond two-player win/lose games, the traditional solution concept of a winning strategy needs to be replaced by another notion. The game-theoretic literature offers a variety of rationality concepts as candidates.
The notion we focus on here is admissibility. Roughly speaking, judging strategies according to this criterion deems rational only strategies that are not dominated, i.e., strategies for which no other strategy performs at least as well against every behaviour of the other players and strictly better against some. In this sense, admissible strategies are the maximal elements of the set of all strategies available to a player. One attractive feature of admissibility, or more generally of dominance-based rationality notions, is that they work at the level of an individual agent. This differs from the justification of, e.g., Nash equilibria: to explain why a specific agent would avoid dominated strategies, no common rationality, shared knowledge or any other assumptions about the other players are needed.
The study of admissibility in the context of games played on graphs was initiated by Berwanger in [Berwanger07] and subsequently became an active research topic (e.g. [BRS14, BrenguierPRS16, BGRS17, pauly-raskin2, DBLP:journals/acta/BrenguierRS17], see related work below). In [Berwanger07], Berwanger established, in the context of perfect-information games with boolean objectives, that admissibility is a sound criterion for rationality: every strategy is either admissible or dominated by an admissible strategy.
Unfortunately, this fundamental property does not hold for quantitative objectives. As soon as there are three different possible payoffs, one can find games in which a strategy is neither admissible nor dominated by an admissible strategy (see Example 2.1). The third payoff allows for infinite domination sequences of strategies, in which each element of the sequence dominates its predecessor and is dominated by its successor. Consequently, no strategy in such a chain is admissible. However, it may happen that no admissible strategy dominates the elements of the chain. In the absence of a maximal element above these strategies, one may ask why they should be discarded in the quest for a rational choice. They may indeed represent a type of behaviour that is rational but not captured by the admissibility criterion.
To formalize this behaviour, we study increasing chains of strategies (Definition 3.1). A chain is weakly dominated by another chain if every strategy in the first is below some strategy in the second. The question then arises whether every chain is below a maximal chain. Based on a purely order-theoretic argument, a sufficient criterion is given in Theorem 3.1. However, Corollary 3.2 shows that this sufficient criterion does not apply to all games of interest. We can avoid the issue by restricting to some countable class of strategies, e.g. just the regular, computable or hyperarithmetic ones (Corollary 3.3).
We test the abstract notion in the concrete setting of generalised safety/reachability games (Definition 4). We observe that the crucial behaviour captured by chains of strategies, but not by single strategies, is “repeat this action a large but finite number of times”. Based on this observation, we introduce the notion of a parameterized automaton (Definition 4.2), which essentially has just this additional ability compared to standard finite automata. We then show that any finite-memory strategy is below a maximal chain or strategy realized by a parameterized automaton (Theorem 4.2).
Finally, we consider some algorithmic properties of chains and parameterized automata in generalised safety/reachability games. It is decidable in PTime whether a parameterized automaton realizes a chain of strategies (Theorem 4.3). It is also decidable in PTime whether the chain realized by one parameterized automaton dominates the chain realized by another (Theorem 4.3).
As mentioned above, the study of dominance and admissibility for games played on graphs was initiated by Berwanger in [Berwanger07]. Faella analyzed several criteria for how a player should play a win/lose game on a finite graph that she cannot win, eventually settling on the notion of admissible strategy [Faella09].
Admissibility in quantitative perfect-information sequential games played on graphs was studied in [BrenguierPRS16]. Concurrent games were considered in [BGRS17]. In [pauly-raskin2], games with imperfect information, but boolean objectives were explored. The study of decision problems related to admissibility (as we do in Subsection 4.3) was advanced in [BRS14]. The complexity of decision problems related to dominance in normal form games has received attention, see [pauly-dominance] for an overview. For the role of admissibility for synthesis, we refer to [DBLP:journals/acta/BrenguierRS17].
Our Subsection 3.1 involves an investigation of cofinal chains in certain quasi-ordered sets. A similar theme (but with a different focus) is present in [shangzhi].
2.1 Games on finite graphs
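A turn-based multiplayer game on a finite graph is a tuple $\mathcal{G} = (\mathcal{P}, V, E, (p_i)_{i \in \mathcal{P}})$ where:
$\mathcal{P}$ is the non-empty finite set of players of the game,
$V = \biguplus_{i \in \mathcal{P}} V_i$, where the finite set $V$ of vertices of $\mathcal{G}$ is equipped with a $\mathcal{P}$-partition $(V_i)_{i \in \mathcal{P}}$, and $E \subseteq V \times V$ is the edge relation of $\mathcal{G}$,
for each player $i$ in $\mathcal{P}$, $p_i : V^\omega \to \mathbb{R}$ is a payoff function that associates to every infinite path in $\mathcal{G}$ a payoff in $\mathbb{R}$.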
Outcomes and histories.
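An outcome of $\mathcal{G}$ is an infinite path in $(V, E)$, that is, an infinite sequence of vertices $\rho = \rho_0 \rho_1 \rho_2 \cdots$ where $(\rho_k, \rho_{k+1}) \in E$ for all $k \in \mathbb{N}$. The set of all possible outcomes in $\mathcal{G}$ is denoted $\mathsf{Out}_{\mathcal{G}}$. A finite prefix of an outcome is called a history. The set of all histories in $\mathcal{G}$ is denoted $\mathsf{Hist}_{\mathcal{G}}$. For an outcome $\rho$ and an integer $k$, we denote by $\rho_{\le k}$ the history $\rho_0 \cdots \rho_k$. The length of the history $h = h_0 \cdots h_k$, denoted $|h|$, is $k + 1$. Given an outcome or a history $\rho$ and a history $h$, we write $h \sqsubseteq \rho$ if $h$ is a prefix of $\rho$, and we denote by $h^{-1}\rho$ the unique outcome (or history) $\rho'$ such that $\rho = h\rho'$. Given an outcome $\rho$ or a history $h$ and $k \in \mathbb{N}$ (respectively $k < |h|$), we denote by $\rho_k$ (respectively $h_k$) the $k$-th vertex of $\rho$ (respectively of $h$). For a history $h$, we define the last vertex of $h$ to be $\mathsf{last}(h) = h_{|h|-1}$ and its first vertex $\mathsf{first}(h) = h_0$. For a vertex $v$, its set of successors is $\mathsf{Succ}(v) = \{v' \in V \mid (v, v') \in E\}$.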
Strategy profiles and payoffs.
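A strategy of a player $i \in \mathcal{P}$ is a function $\sigma_i : \mathsf{Hist}_{\mathcal{G}} \to V$ that associates to each history $h$ such that $\mathsf{last}(h) \in V_i$ a successor state $\sigma_i(h) \in \mathsf{Succ}(\mathsf{last}(h))$. A tuple of strategies $\sigma = (\sigma_i)_{i \in \mathcal{P}}$, one for each player in $\mathcal{P}$, is called a profile of strategies. Usually, we focus on a particular player $i$. Thus, given a profile $\sigma$, we write $\sigma_{-i}$ to designate the collection of strategies of the players in $\mathcal{P} \setminus \{i\}$, and the complete profile is written $(\sigma_i, \sigma_{-i})$. The set of all strategies of player $i$ is denoted $\Sigma_i(\mathcal{G})$, while $\Sigma(\mathcal{G})$ is the set of all profiles of strategies in the game and $\Sigma_{-i}(\mathcal{G})$ is the set of all profiles of all players except Player $i$. As we consider games with perfect information and deterministic transitions, any complete profile $\sigma$ yields, from any history $h$, a unique outcome, denoted $\mathsf{Out}_h(\mathcal{G}, \sigma)$. Formally, $\mathsf{Out}_h(\mathcal{G}, \sigma)$ is the outcome $\rho$ such that $h \sqsubseteq \rho$ and, for all $k \ge |h| - 1$ and all $i \in \mathcal{P}$, it holds that $\rho_{k+1} = \sigma_i(\rho_{\le k})$ if $\rho_k \in V_i$. The set of outcomes compatible with a strategy $\sigma_i$ of player $i$ after a history $h$ is $\mathsf{Out}_h(\mathcal{G}, \sigma_i) = \{\mathsf{Out}_h(\mathcal{G}, (\sigma_i, \sigma_{-i})) \mid \sigma_{-i} \in \Sigma_{-i}(\mathcal{G})\}$, and the set of compatible histories is $\mathsf{Hist}_h(\mathcal{G}, \sigma_i) = \{h' \in \mathsf{Hist}_{\mathcal{G}} \mid h' \sqsubseteq \rho \text{ for some } \rho \in \mathsf{Out}_h(\mathcal{G}, \sigma_i)\}$. Each outcome $\rho$ yields a payoff $p_i(\rho)$ for each Player $i$. We denote by $p_i(h, \sigma) = p_i(\mathsf{Out}_h(\mathcal{G}, \sigma))$ the payoff of a profile of strategies $\sigma$ after a history $h$.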
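The following minimal Python sketch makes the definition of the outcome of a profile concrete by unfolding a finite prefix of it from a history. It assumes that vertices are strings, that players are integers, and that a strategy is a Python function from histories (tuples of vertices) to a successor vertex. The three-vertex graph and the strategies `sigma0` and `sigma1` are hypothetical and serve only as an illustration. Since outcomes are infinite, only a prefix of chosen length is computed.

```python
from typing import Callable, Dict, Sequence, Tuple

Vertex = str
History = Tuple[Vertex, ...]
Strategy = Callable[[History], Vertex]


def outcome_prefix(owner: Dict[Vertex, int],
                   edges: Dict[Vertex, Sequence[Vertex]],
                   profile: Dict[int, Strategy],
                   h: History,
                   steps: int) -> History:
    """Extend the history h by `steps` moves of the profile (a prefix of Out_h)."""
    rho = list(h)
    for _ in range(steps):
        last = rho[-1]
        player = owner[last]                 # the owner of the last vertex moves
        nxt = profile[player](tuple(rho))    # strategies see the whole history
        if nxt not in edges[last]:
            raise ValueError(f"player {player} moved illegally: {last} -> {nxt}")
        rho.append(nxt)
    return tuple(rho)


# Hypothetical graph: player 0 owns "a" and the sink "c", player 1 owns "b".
owner = {"a": 0, "b": 1, "c": 0}
edges = {"a": ["b", "c"], "b": ["a"], "c": ["c"]}


def sigma0(h: History) -> Vertex:
    """Go to "b" at the first two visits of "a", then to the sink "c"."""
    if h[-1] == "c":
        return "c"
    return "b" if h.count("a") <= 2 else "c"


def sigma1(h: History) -> Vertex:
    """Player 1 only owns "b" and always returns to "a"."""
    return "a"


print(outcome_prefix(owner, edges, {0: sigma0, 1: sigma1}, ("a",), 6))
# prints ('a', 'b', 'a', 'b', 'a', 'c', 'c')
```

The sketch passes the whole history to each strategy, as in the definition above. Finite-memory strategies are then just the special case where the function only inspects a bounded summary of the history.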
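Usually, we consider game instances in which the players start to play at a fixed vertex. Thus, we call an initialized game a pair $(\mathcal{G}, v_0)$ of a game $\mathcal{G}$ and a vertex $v_0 \in V$. When the initial vertex $v_0$ is clear from context, we speak directly of $\mathcal{G}$, $\mathsf{Out}(\mathcal{G}, \sigma)$ and $p_i(\sigma)$ instead of $(\mathcal{G}, v_0)$, $\mathsf{Out}_{v_0}(\mathcal{G}, \sigma)$ and $p_i(v_0, \sigma)$.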
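In order to compare different strategies of a player in terms of payoffs, we rely on the notion of dominance between strategies. A strategy $\sigma_1 \in \Sigma_i(\mathcal{G})$ is weakly dominated by a strategy $\sigma_2 \in \Sigma_i(\mathcal{G})$ at a history $h$ compatible with $\sigma_1$ and $\sigma_2$, denoted $\sigma_1 \preceq_h \sigma_2$, if for every $\tau \in \Sigma_{-i}(\mathcal{G})$, we have $p_i(h, (\sigma_1, \tau)) \le p_i(h, (\sigma_2, \tau))$. We say that $\sigma_1$ is weakly dominated by $\sigma_2$, denoted $\sigma_1 \preceq \sigma_2$, if $\sigma_1 \preceq_{v_0} \sigma_2$, where $v_0$ is the initial state of $\mathcal{G}$. A strategy $\sigma_1$ is dominated by a strategy $\sigma_2$ at a history $h$ compatible with $\sigma_1$ and $\sigma_2$, denoted $\sigma_1 \prec_h \sigma_2$, if $\sigma_1 \preceq_h \sigma_2$ and there exists $\tau \in \Sigma_{-i}(\mathcal{G})$ such that $p_i(h, (\sigma_1, \tau)) < p_i(h, (\sigma_2, \tau))$. We say that $\sigma_1$ is dominated by $\sigma_2$, denoted $\sigma_1 \prec \sigma_2$, if $\sigma_1 \prec_{v_0} \sigma_2$, where $v_0$ is the initial state of $\mathcal{G}$. Strategies that are not dominated by any other strategy are called admissible: a strategy $\sigma_1$ is admissible (respectively admissible from $h$) if $\sigma_1 \not\prec \sigma_2$ (resp. $\sigma_1 \not\prec_h \sigma_2$) for every $\sigma_2 \in \Sigma_i(\mathcal{G})$.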
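The dominance relations can be checked mechanically when the opponents' behaviour is restricted to a finite list of strategies. The sketch below assumes exactly that. In general $\Sigma_{-i}(\mathcal{G})$ is infinite, so such a finite check only approximates dominance. Each protagonist strategy is represented by the payoffs it obtains against the listed antagonist strategies, and the table entries are hypothetical.

```python
from typing import Dict, Hashable, Mapping

Row = Mapping[Hashable, float]  # payoff of one protagonist strategy against each listed tau


def weakly_dominated(p1: Row, p2: Row) -> bool:
    """sigma1 is weakly dominated by sigma2: no listed tau makes sigma1 better."""
    if p1.keys() != p2.keys():
        raise ValueError("both rows must list the same antagonist strategies")
    return all(p1[t] <= p2[t] for t in p1)


def dominated(p1: Row, p2: Row) -> bool:
    """sigma1 is dominated by sigma2: weakly dominated and strictly worse against some tau."""
    return weakly_dominated(p1, p2) and any(p1[t] < p2[t] for t in p1)


def admissible(name: Hashable, table: Dict[Hashable, Row]) -> bool:
    """No other strategy in the table dominates `name`."""
    return not any(dominated(table[name], table[other])
                   for other in table if other != name)


table = {
    "s1": {"t1": 1, "t2": 1},
    "s2": {"t1": 1, "t2": 2},
    "s3": {"t1": 0, "t2": 2},
}
print([s for s in table if admissible(s, table)])  # ['s2']
```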
Antagonistic and Cooperative Values
To study the rationality of different behaviours in a game $\mathcal{G}$, it is useful to know two values for a player $i$, a fixed strategy $\sigma_i$ and any history $h$. The first is the worst possible payoff Player $i$ can obtain with $\sigma_i$ from $h$ (i.e., the payoff he obtains if the other players play antagonistically). The second is the best possible payoff Player $i$ can hope for with $\sigma_i$ from $h$ (i.e., the payoff he obtains if the other players play cooperatively). The first value is called the antagonistic value of the strategy $\sigma_i$ of Player $i$ at history $h$ in $\mathcal{G}$, and the second value is called the cooperative value of $\sigma_i$ at $h$ in $\mathcal{G}$. They are formally defined as $\mathrm{aVal}_i(h, \sigma_i) = \inf_{\tau \in \Sigma_{-i}(\mathcal{G})} p_i(h, (\sigma_i, \tau))$ and $\mathrm{cVal}_i(h, \sigma_i) = \sup_{\tau \in \Sigma_{-i}(\mathcal{G})} p_i(h, (\sigma_i, \tau))$.
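Prior to any choice of strategy of Player $i$, we can define, for any history $h$, the antagonistic value of $h$ for Player $i$ as $\mathrm{aVal}_i(h) = \sup_{\sigma_i \in \Sigma_i(\mathcal{G})} \mathrm{aVal}_i(h, \sigma_i)$ and the cooperative value of $h$ for Player $i$ as $\mathrm{cVal}_i(h) = \sup_{\sigma_i \in \Sigma_i(\mathcal{G})} \mathrm{cVal}_i(h, \sigma_i)$. Furthermore, one can ask, from a history $h$, what is the maximal payoff one can obtain while ensuring the antagonistic value of $h$. Thus, we define the antagonistic-cooperative value of $h$ for Player $i$ as $\mathrm{acVal}_i(h) = \sup\{\mathrm{cVal}_i(h, \sigma_i) \mid \sigma_i \in \Sigma_i(\mathcal{G}),\ \mathrm{aVal}_i(h, \sigma_i) = \mathrm{aVal}_i(h)\}$. From now on, we omit $\mathcal{G}$ when it is clear from the context. Given a history $h$, we say that a strategy $\sigma_i$ of Player $i$ is worst-case optimal if $\mathrm{aVal}_i(h, \sigma_i) = \mathrm{aVal}_i(h)$. It is cooperative optimal if $\mathrm{cVal}_i(h, \sigma_i) = \mathrm{cVal}_i(h)$. It is worst-case cooperative optimal if $\mathrm{aVal}_i(h, \sigma_i) = \mathrm{aVal}_i(h)$ and $\mathrm{cVal}_i(h, \sigma_i) = \mathrm{acVal}_i(h)$.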
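As an illustration of these values, the sketch below computes finite-table analogues of $\mathrm{aVal}$, $\mathrm{cVal}$ and $\mathrm{acVal}$ at the initial history, replacing infima and suprema by minima and maxima over finitely many strategies. The payoff table `help_me` is a hypothetical truncation of the Help-me? game of Example 2.1, with only three protagonist strategies ($\sigma_0$, $\sigma_1$, $\sigma_\infty$) and two antagonist strategies. With so few strategies, the numbers only illustrate the definitions.

```python
from typing import Dict, Hashable

Table = Dict[Hashable, Dict[Hashable, float]]


def values(table: Table):
    """Return aVal(sigma), cVal(sigma) per strategy, then aVal, cVal, acVal."""
    a = {s: min(row.values()) for s, row in table.items()}  # inf over tau
    c = {s: max(row.values()) for s, row in table.items()}  # sup over tau
    a_val = max(a.values())
    c_val = max(c.values())
    # best cooperative value among strategies that ensure the antagonistic value
    ac_val = max(c[s] for s in table if a[s] == a_val)
    return a, c, a_val, c_val, ac_val


def classify(table: Table) -> Dict[Hashable, Dict[str, bool]]:
    a, c, a_val, c_val, ac_val = values(table)
    return {
        s: {
            "worst_case_optimal": a[s] == a_val,
            "cooperative_optimal": c[s] == c_val,
            "worst_case_cooperative_optimal": a[s] == a_val and c[s] == ac_val,
        }
        for s in table
    }


# Hypothetical truncation: the antagonist either helps at his first turn or never.
help_me = {
    "sigma_0": {"help_1": 1, "never": 1},
    "sigma_1": {"help_1": 2, "never": 1},
    "sigma_inf": {"help_1": 2, "never": 0},
}
print(values(help_me)[2:])  # (1, 2, 2)
```

On this table, `sigma_1` is the only strategy that is both worst-case optimal and attains $\mathrm{acVal}$.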
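An initialized game $(\mathcal{G}, v_0)$ is well-formed for Player $i$ if, for every history $h$, there exist a strategy $\sigma_i$ such that $\mathrm{aVal}_i(h, \sigma_i) = \mathrm{aVal}_i(h)$ and a strategy $\sigma'_i$ such that $\mathrm{cVal}_i(h, \sigma'_i) = \mathrm{cVal}_i(h)$. In other words, at every history $h$, Player $i$ has a strategy that ensures the payoff $\mathrm{aVal}_i(h)$, and a strategy that allows the other players to cooperate to yield a payoff of $\mathrm{cVal}_i(h)$.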
In the following, we always take the point of view of one player $i$. We sometimes refer to him as the protagonist and assume that he is the first player, while the other players can be seen as a coalition and abstracted into a single player, which we call the antagonist. Furthermore, we omit the subscript $i$ referring to the protagonist when we use the notations $\mathrm{aVal}$, $\mathrm{cVal}$, $\mathrm{acVal}$, $p$, etc.
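Consider the Help-me? game depicted in Figure 1. The protagonist owns the circle vertices. The payoffs for the protagonist are defined as follows. Reaching the vertex to which only the antagonist can bring the play yields payoff $2$. Reaching the vertex to which the protagonist can move directly yields payoff $1$. Staying forever in the loop between the protagonist's vertex and the antagonist's vertex yields payoff $0$.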
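Let us first look at the possible behaviours of the protagonist in this game when he makes no assumption on the payoff function of the antagonist. He can choose to be “optimistic” and try (for some time, or forever) to move to the antagonist's vertex, in the hope that the antagonist will cooperate and bring him to the vertex with payoff $2$. Alternatively, he can settle from the start and go directly to the vertex with payoff $1$, not counting on any help from the antagonist.

For $k \in \mathbb{N}$, we denote by $\sigma_k$ the strategy that moves to the antagonist's vertex at the first $k$ visits of the protagonist's vertex and to the vertex with payoff $1$ at the $(k+1)$-th visit, while $\sigma_\infty$ denotes the strategy that moves to the antagonist's vertex at every visit. At the initial history, the strategy $\sigma_\infty$ is cooperative optimal but not worst-case optimal, as the protagonist takes the risk of getting a payoff of $0$ by staying forever in the loop. The strategy $\sigma_0$, which goes directly to the vertex with payoff $1$, is worst-case optimal but not cooperative optimal. On the other hand, the strategies $\sigma_k$ for $k \geq 1$ are worst-case cooperative optimal at the initial history: they allow the antagonist to help reach payoff $2$, but also ensure the payoff $1$ by not letting the protagonist loop indefinitely.

Fix $k \in \mathbb{N}$. Then $\sigma_k \prec \sigma_{k+1}$. Indeed, for every strategy $\tau$ of the antagonist, if $p(\sigma_k, \tau) = 2$, then there exists $j \leq k$ such that $\tau$ moves to the vertex with payoff $2$ at the $j$-th visit of the antagonist's vertex. As $\sigma_k$ and $\sigma_{k+1}$ agree up to that visit, we have $\mathsf{Out}(\sigma_k, \tau) = \mathsf{Out}(\sigma_{k+1}, \tau)$, thus $p(\sigma_{k+1}, \tau) = 2$ as well. If instead $p(\sigma_k, \tau) = 1$, then $p(\sigma_{k+1}, \tau) \geq 1$, since $\sigma_{k+1}$ also ensures payoff $1$. Furthermore, consider a strategy $\tau$ of the antagonist that moves back to the protagonist's vertex at the first $k$ visits of the antagonist's vertex and to the vertex with payoff $2$ at the $(k+1)$-th visit. Then $p(\sigma_k, \tau) = 1$ while $p(\sigma_{k+1}, \tau) = 2$. Hence, $\sigma_k \prec \sigma_{k+1}$.

In addition, we observe that $\sigma_\infty$ is admissible. Every strategy of the protagonist behaves like $\sigma_\infty$ or like some $\sigma_k$. For any $\sigma_k$, the strategy of the antagonist that moves to the vertex with payoff $2$ at the $(k+1)$-th visit of the antagonist's vertex yields a payoff of $2$ against $\sigma_\infty$ but $1$ against $\sigma_k$. Thus, $\sigma_\infty \not\preceq \sigma_k$, and in particular $\sigma_\infty \not\prec \sigma_k$, for any $k \in \mathbb{N}$.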
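The claims of Example 2.1 can be sanity-checked numerically. The sketch below encodes the game as we read it from the example, which is a modelling assumption. $\sigma_k$ is represented by the integer `k` and $\sigma_\infty$ by `None`. The antagonist strategy that helps at the $j$-th visit of his vertex is represented by `j`, and `None` if it never helps. Only antagonist strategies helping within the hypothetical horizon `H` are enumerated, so this is a finite check, not a proof.

```python
from typing import Dict, List, Optional


def payoff(k: Optional[int], j: Optional[int]) -> int:
    """Protagonist payoff of sigma_k (k=None: sigma_inf) against tau_j (j=None: never helps)."""
    helped = j is not None and (k is None or j <= k)  # help happens while still looping
    if helped:
        return 2
    return 0 if k is None else 1  # sigma_inf loops forever, sigma_k exits with payoff 1


def row(k: Optional[int], horizon: int) -> Dict[Optional[int], int]:
    taus: List[Optional[int]] = list(range(1, horizon + 1)) + [None]
    return {j: payoff(k, j) for j in taus}


def weakly_dominated(k1: Optional[int], k2: Optional[int], horizon: int) -> bool:
    r1, r2 = row(k1, horizon), row(k2, horizon)
    return all(r1[j] <= r2[j] for j in r1)


def dominated(k1: Optional[int], k2: Optional[int], horizon: int) -> bool:
    r1, r2 = row(k1, horizon), row(k2, horizon)
    return weakly_dominated(k1, k2, horizon) and any(r1[j] < r2[j] for j in r1)


H = 20  # antagonist strategies tau_1, ..., tau_H and the one that never helps
print(all(dominated(k, k + 1, H) for k in range(H)))  # True
print(all(not weakly_dominated(None, k, H) and not weakly_dominated(k, None, H)
          for k in range(H)))  # True: sigma_inf and every sigma_k are incomparable
```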
Quantitative vs Boolean setting.
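Consider the boolean variant of the Help-me? game from Example 2.1, in which reaching the vertex that yielded payoff $1$ now yields payoff $0$, and reaching the vertex that yielded payoff $2$ now yields payoff $1$. In this variant, every strategy $\sigma_k$ for $k \in \mathbb{N}$ is in fact dominated by $\sigma_\infty$. Both strategies yield payoff $0$ against the antagonist strategy $\tau$ that never moves to the vertex with payoff $1$. Against any antagonist strategy that eventually helps, $\sigma_\infty$ obtains payoff $1$. The antagonist strategy that helps at the $(k+1)$-th visit of his vertex yields $1$ against $\sigma_\infty$ but $0$ against $\sigma_k$. In fact, Berwanger showed in [Berwanger07] that boolean games with $\omega$-regular objectives enjoy the following fundamental property: every strategy is either admissible or dominated by an admissible strategy. The existence of an admissible strategy in any such game follows as an immediate corollary.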
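A minimal Python check of the boolean claim follows. It assumes an encoding of the Help-me? game in which the integer `k` stands for $\sigma_k$ and `None` for $\sigma_\infty$. Likewise, `j` stands for the antagonist strategy that helps at the $j$-th visit of his vertex, and `None` for the one that never helps. The horizon `H` is an arbitrary finite cut-off, so this is a sanity check rather than a proof.

```python
from typing import Optional


def bool_payoff(k: Optional[int], j: Optional[int]) -> int:
    """Boolean variant: payoff 1 iff the antagonist helps while the protagonist still loops."""
    helped = j is not None and (k is None or j <= k)
    return 1 if helped else 0


def dominated(k1: Optional[int], k2: Optional[int], horizon: int) -> bool:
    taus = list(range(1, horizon + 1)) + [None]
    pairs = [(bool_payoff(k1, j), bool_payoff(k2, j)) for j in taus]
    return all(x <= y for x, y in pairs) and any(x < y for x, y in pairs)


H = 20
print(all(dominated(k, None, H) for k in range(H)))  # True: sigma_inf dominates each sigma_k
print(any(dominated(None, k, H) for k in range(H)))  # False
```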
Let us now illustrate how admissibility fails to fully capture the notion of rational behaviour in the quantitative case. First, recall that the existence of admissible strategies is not guaranteed in this setting (see for instance the examples given in [BrenguierPRS16]). In [BrenguierPRS16], the authors identified a class of games in which the existence of admissible strategies (for Player $i$) is guaranteed: the games that are well-formed (for Player $i$). However, even in such games, the fundamental property that holds for boolean games is no longer guaranteed. In fact, it already fails for quantitative well-formed games with only three different payoffs and very simple payoff functions.

Consider again the Help-me? game in Figure 1, which is well-formed for the protagonist. We already showed that any strategy $\sigma_k$ is dominated by the strategy $\sigma_{k+1}$. Thus, none of them is admissible. The only admissible strategy is $\sigma_\infty$. It is easy to see that $\sigma_k \not\preceq \sigma_\infty$ for any $k \in \mathbb{N}$: let $\tau$ be the strategy of the antagonist that never moves to the vertex with payoff $2$. Then $p(\sigma_k, \tau) = 1 > 0 = p(\sigma_\infty, \tau)$. To sum up, there exists an infinite sequence of strategies $(\sigma_k)_{k \in \mathbb{N}}$ none of whose elements is dominated by the only admissible strategy $\sigma_\infty$. However, the sequence is totally ordered by the dominance relation. Based on these observations, we consider not only single strategies but also such ordered sequences of strategies, which can represent a type of rational behaviour not captured by the admissibility concept.
2.2 Order theory
In this subsection we recall the standard results from order theory that we need (see e.g. [markowsky]).
A linear order is a total, transitive and antisymmetric relation. A linearly ordered set $(X, \le)$ is a well-order if every non-empty subset of $X$ has a least element with respect to $\le$. The ordinals are the canonical examples of well-orders, insofar as any well-order is order-isomorphic to an ordinal. The ordinals themselves are well-ordered by the relation $\le$, where $\alpha \le \beta$ iff $\alpha$ order-embeds into $\beta$. The first infinite ordinal is denoted by $\omega$, and the first uncountable ordinal by $\omega_1$.
A quasi-order is a transitive and reflexive relation. Let $(X, \le)$ be a quasi-ordered set, and write $a < b$ if $a \le b$ and $b \not\le a$. A chain in $X$ is a subset of $X$ that is totally ordered by $\le$. An increasing chain is an ordinal-indexed family $(a_\beta)_{\beta < \alpha}$ of elements of $X$ such that $\beta < \gamma < \alpha$ implies $a_\beta < a_\gamma$. If we only have that $\beta \le \gamma$ implies $a_\beta \le a_\gamma$, we speak of a weakly increasing chain. We are mostly interested in (weakly) increasing chains in this paper, and will thus occasionally omit the words “weakly increasing” and only speak about chains.
A subset $Y$ of a quasi-ordered set $(X, \le)$ is called cofinal if for every $x \in X$ there is a $y \in Y$ with $x \le y$. A consequence of the axiom of choice is that every chain contains a cofinal increasing chain, which is one reason for our focus on increasing chains. Clearly, having two incomparable maximal elements prevents the existence of a cofinal chain, but even a lattice can fail to admit a cofinal chain. An example we will return to is $\omega_1 \times \omega$ with the product order (cf. [markowsky]).
If $(X, \le)$ admits a cofinal chain, then its cofinality (denoted by $\mathrm{cf}(X)$) is the least ordinal indexing a cofinal increasing chain in $X$. The possible values of the cofinality are $1$ or infinite regular cardinals (it is common to identify a cardinal with the least ordinal of that cardinality). In particular, a countable chain can only have cofinality $1$ or $\omega$. The first uncountable cardinal $\omega_1$ is regular, i.e., $\mathrm{cf}(\omega_1) = \omega_1$.
We will need what is probably the most famous result from order theory:
[Zorn’s Lemma] If every chain in a quasi-ordered set $(X, \le)$ has an upper bound, then every element of $X$ is below a maximal element.
3 Increasing chains of strategies
3.1 Ordering chains
In this subsection, we study the quasi-order of increasing chains in a given quasi-order $(X, \preceq)$. We denote by $\mathcal{C}(X)$ the set of increasing chains in $X$. In our intended application, $X$ is the set of strategies of the protagonist in a game, ordered by the dominance relation. However, in this subsection we do not exploit any properties specific to the game setting. Instead, our approach is purely order-theoretic.
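We introduce an order $\trianglelefteq$ on $\mathcal{C}(X)$ by defining:
$$(a_\beta)_{\beta < \alpha} \trianglelefteq (b_\gamma)_{\gamma < \delta} \iff \forall \beta < \alpha \;\exists \gamma < \delta : a_\beta \preceq b_\gamma.$$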
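Note that $\trianglelefteq$ is a quasi-order, but in general not a partial order: distinct chains can dominate each other. Let $\equiv$ denote the corresponding equivalence relation, i.e., $A \equiv B$ iff $A \trianglelefteq B$ and $B \trianglelefteq A$. We will occasionally write $a$ as shorthand for the one-element chain $(a)$.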
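For finite chains, the relation $\trianglelefteq$ and its equivalence can be computed directly. The following sketch assumes that the quasi-order is given as a Python predicate `leq` and uses the usual order on integers as a stand-in. It also shows that two different chains can be equivalent, so $\trianglelefteq$ is not antisymmetric.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def is_increasing(chain: Sequence[T], leq: Callable[[T, T], bool]) -> bool:
    """Strictly increasing: each element is below the next one but not equivalent to it."""
    return all(leq(a, b) and not leq(b, a) for a, b in zip(chain, chain[1:]))


def chain_leq(A: Sequence[T], B: Sequence[T], leq: Callable[[T, T], bool]) -> bool:
    """A is dominated by B: every element of A lies below some element of B."""
    return all(any(leq(a, b) for b in B) for a in A)


def chain_equiv(A: Sequence[T], B: Sequence[T], leq: Callable[[T, T], bool]) -> bool:
    return chain_leq(A, B, leq) and chain_leq(B, A, leq)


def le(x: int, y: int) -> bool:
    return x <= y  # the usual order on integers, used as a stand-in quasi-order


A, B = [1, 2, 4], [4]
print(is_increasing(A, le), chain_equiv(A, B, le), A == B)  # True True False
```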
Inspired by our application to dominance between strategies in games, we will refer to both $\preceq$ and $\trianglelefteq$ as the dominance relation, and may express e.g. $A \trianglelefteq B$ as “$A$ is dominated by $B$” or “$B$ dominates $A$”. There is no risk of confusing whether $\preceq$ or $\trianglelefteq$ is meant, since $a \preceq b$ iff $(a) \trianglelefteq (b)$. Continuing the identification of $a$ and $(a)$, we will later also speak about a single strategy dominating a chain or vice versa.
The central notion we are interested in is that of a maximal chain: $A \in \mathcal{C}(X)$ is called maximal if, for every $B \in \mathcal{C}(X)$, $A \trianglelefteq B$ implies $B \trianglelefteq A$.
We desire situations where every chain in $\mathcal{C}(X)$ is either maximal or below a maximal chain. This goal is precisely the conclusion of Zorn’s Lemma (Lemma 2.2), so we are led to study chains of chains: if every chain of chains is bounded, Zorn’s Lemma applies. Since $(\mathcal{C}(X), \trianglelefteq)$ is a quasi-order just as $(X, \preceq)$ is, notions such as cofinality apply to chains of chains just as they apply to chains. We gather a number of lemmas that clarify when chains of chains are bounded.
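In a slight abuse of notation, we write $(a_\beta)_{\beta<\alpha} \subseteq (b_\gamma)_{\gamma<\delta}$ iff $\{a_\beta \mid \beta < \alpha\} \subseteq \{b_\gamma \mid \gamma < \delta\}$. Clearly, $A \subseteq B$ implies $A \trianglelefteq B$. We can now express cofinality by noting that $A$ is cofinal in $B$ iff $A \subseteq B$ and $B \trianglelefteq A$. We recall that the cofinality of $A$ (denoted by $\mathrm{cf}(A)$) is the least ordinal $\alpha$ such that there exists some $(b_\beta)_{\beta<\alpha} \in \mathcal{C}(X)$ which is cofinal in $A$.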
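The characterization of cofinality via $\subseteq$ and $\trianglelefteq$ can be tested on finite chains. In the sketch below, elements are assumed to be hashable, and the integer order stands in for an arbitrary quasi-order. A brute-force search confirms that a non-empty finite increasing chain has cofinality $1$, because its last element alone is cofinal.

```python
from itertools import combinations
from typing import Callable, Hashable, Sequence

Leq = Callable[[Hashable, Hashable], bool]


def chain_leq(A: Sequence[Hashable], B: Sequence[Hashable], leq: Leq) -> bool:
    return all(any(leq(a, b) for b in B) for a in A)


def is_cofinal(A: Sequence[Hashable], B: Sequence[Hashable], leq: Leq) -> bool:
    """A is cofinal in B iff A is a subset of B and B is dominated by A."""
    return set(A) <= set(B) and chain_leq(B, A, leq)


def finite_cofinality(B: Sequence[Hashable], leq: Leq) -> int:
    """Least size of a cofinal subchain of a non-empty finite chain, by brute force."""
    for n in range(1, len(B) + 1):
        for sub in combinations(B, n):  # subsequences keep the order of B
            if is_cofinal(list(sub), B, leq):
                return n
    raise ValueError("the empty chain has no non-empty cofinal subchain")


def le(x: int, y: int) -> bool:
    return x <= y


B = [1, 3, 7]
print(is_cofinal([7], B, le), is_cofinal([3], B, le), finite_cofinality(B, le))
# prints True False 1
```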
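If $A = (a_\beta)_{\beta<\alpha} \trianglelefteq B = (b_\gamma)_{\gamma<\delta}$, then there is some $B' = (b'_\eta)_{\eta<\epsilon} \subseteq B$ with $A \trianglelefteq B'$ and $\epsilon \le \alpha$.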
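For each $\beta < \alpha$, let $\gamma_\beta = \min\{\gamma < \delta \mid a_\beta \preceq b_\gamma\}$. By assumption, the set on the right-hand side is non-empty, and as it is a set of ordinals, it has a minimum. The set $\{\gamma_\beta \mid \beta < \alpha\}$ is well-ordered by $<$ (as a subset of a well-ordered set). Hence it can be enumerated in increasing order as $(\gamma'_\eta)_{\eta<\epsilon}$, which yields the increasing chain $B' = (b_{\gamma'_\eta})_{\eta<\epsilon}$. By construction, $B' \subseteq B$. Moreover, every $a_\beta$ lies below the element $b_{\gamma_\beta}$ of $B'$, so $A \trianglelefteq B'$.
It remains to argue that $\epsilon \le \alpha$. Note first that $\beta \mapsto \gamma_\beta$ is non-decreasing. If $\beta < \beta'$, then $a_\beta \preceq a_{\beta'} \preceq b_{\gamma_{\beta'}}$, so by transitivity $\gamma_\beta \le \gamma_{\beta'}$. Now consider the map $\eta \mapsto \min\{\beta < \alpha \mid \gamma_\beta = \gamma'_\eta\}$. This map is well-defined and injective, and by the monotonicity just shown it preserves $<$. Thus, it constitutes an order embedding of $\epsilon$ into $\alpha$, and witnesses that $\epsilon \le \alpha$. ∎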
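For finite chains, the construction in this proof becomes a short procedure. For every $a_\beta$, pick the least index $\gamma_\beta$ with $a_\beta \preceq b_{\gamma_\beta}$, and keep the corresponding elements of $B$ in increasing order. The sketch assumes that chains are Python lists in increasing order and that the quasi-order is a predicate `leq`. The function name `shrink` is our own.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def shrink(A: Sequence[T], B: Sequence[T], leq: Callable[[T, T], bool]) -> List[T]:
    """Given increasing chains A <= B, return a subchain B' of B with A <= B' and len(B') <= len(A)."""
    gammas = []
    for a in A:
        above = [g for g, b in enumerate(B) if leq(a, b)]
        if not above:
            raise ValueError("A is not dominated by B")
        gammas.append(min(above))  # gamma_beta: least index of an element of B above a
    # Sanity check: gamma is non-decreasing because A is increasing
    # (a <= a' <= b_{gamma'} implies gamma <= gamma').
    assert all(x <= y for x, y in zip(gammas, gammas[1:]))
    return [B[g] for g in sorted(set(gammas))]


def le(x: int, y: int) -> bool:
    return x <= y


print(shrink([1, 3, 5], [2, 4, 6, 8, 10], le))  # [2, 4, 6]
print(shrink([1, 2], [5, 6, 7], le))            # [5]
```

The second call shows how $B'$ can be strictly shorter than $A$: both elements of $A$ are bounded by the same element of $B$.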
$\mathrm{cf}(A)$ is equal to the least ordinal $\alpha$ such that there exists $B = (b_\beta)_{\beta<\alpha} \in \mathcal{C}(X)$ with $A \equiv B$.
Since any increasing chain that is cofinal in $A$ is equivalent to $A$, it follows that $\mathrm{cf}(A)$ is an upper bound for this least ordinal.
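Conversely, suppose $A \equiv B = (b_\beta)_{\beta<\alpha}$. Then $B \trianglelefteq A$, so by Lemma 3.1 there is some $A' = (a'_\eta)_{\eta<\epsilon} \subseteq A$ with $B \trianglelefteq A'$ and $\epsilon \le \alpha$. As $A \trianglelefteq B \trianglelefteq A'$ and $A' \subseteq A$, the chain $A'$ is cofinal in $A$. By the definition of cofinality, we have $\mathrm{cf}(A) \le \epsilon$, so in particular also $\mathrm{cf}(A) \le \alpha$. ∎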
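For every chain $A \in \mathcal{C}(X)$ there exists an equivalent chain $(b_\beta)_{\beta<\kappa}$ such that $\kappa = 1$ or $\kappa$ is an infinite regular cardinal. In particular, if $A$ is countable, then $A$ is equivalent to a singleton or to some chain $(b_n)_{n<\omega}$.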
If and , then there exists such that
As in the proof of Lemma 3.1: For each , pick some with . The set is well-ordered by (as a subset of a well-ordered set), and has cardinality at most . Hence there is some with , and .
By assumption, cannot be cofinal in . Thus, there has to be some such that for no we have that . But as is totally ordered, this implies that for all we have , i.e. . The claim follows by transitivity of . ∎
We briefly illustrate the concepts introduced so far in the game setting. Notice that for a game $\mathcal{G}$ and a Player $i$, the pair $(\Sigma_i(\mathcal{G}), \preceq)$ is indeed a quasi-ordered set. We can thus consider the set $\mathcal{C}(\Sigma_i(\mathcal{G}))$ of increasing chains of strategies in $\mathcal{G}$.
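Recall the Help-me? game of Figure 1 and consider the set of strategies of the protagonist ordered by the weak dominance relation. Any single strategy $\sigma$ is an increasing chain, indexed by the ordinal $1$. We already noted that the strategy $\sigma_\infty$ is admissible, thus the chain consisting of $\sigma_\infty$ is maximal with respect to $\trianglelefteq$.

Furthermore, the sequence of strategies $(\sigma_k)_{k \in \mathbb{N}}$ is an increasing chain, since $\sigma_k \prec \sigma_{k+1}$ for any $k \in \mathbb{N}$. It is a maximal one. The set of strategies of the protagonist consists solely of the strategies of this chain and $\sigma_\infty$, and $\sigma_\infty$ is incomparable with every $\sigma_k$. Therefore, any chain $C$ such that $(\sigma_k)_{k\in\mathbb{N}} \trianglelefteq C$ cannot contain $\sigma_\infty$ and hence satisfies $C \subseteq (\sigma_k)_{k\in\mathbb{N}}$. Thus, $C \trianglelefteq (\sigma_k)_{k\in\mathbb{N}}$.

Now let $C = (c_\beta)_{\beta<\alpha}$ be an increasing chain indexed by an ordinal $\alpha$. First, remark that $\alpha \le \omega$, as an increasing chain with at least two elements consists of strategies $\sigma_k$ only. If $\alpha < \omega$, then the cofinality of $C$ is $1$, as $C$ is equivalent to the strategy $c_{\alpha-1}$. If $\alpha = \omega$, then the cofinality of $C$ is $\omega$: for every finite chain $D \subseteq C$, there exists $n$ such that $c_n$ is strictly above every element of $D$, and thus $C$ is not (weakly) dominated by $D$. Moreover, in this case $C \equiv (\sigma_k)_{k\in\mathbb{N}}$, and $C$ is thus maximal. Indeed, since $C$ is a chain that is not a singleton, it does not contain $\sigma_\infty$, so $C \subseteq (\sigma_k)_{k\in\mathbb{N}}$, that is, $C \trianglelefteq (\sigma_k)_{k\in\mathbb{N}}$. Now let $k \in \mathbb{N}$. As $C$ is an increasing chain of length $\omega$ consisting of strategies of the form $\sigma_{k'}$, there exist $j$ and $k' \ge k$ such that $c_j = \sigma_{k'}$. Thus, $\sigma_k \preceq \sigma_{k'} = c_j$, since $(\sigma_k)_{k\in\mathbb{N}}$ is an increasing chain. Hence, we also have $(\sigma_k)_{k\in\mathbb{N}} \trianglelefteq C$.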
We are now ready to prove the main technical result of Subsection 3.1. It identifies the potential obstructions for an increasing chain in $\mathcal{C}(X)$ to have an upper bound:
The following are equivalent:
If $(A_\beta)_{\beta<\kappa}$ is an increasing chain in $\mathcal{C}(X)$, then it has an upper bound in $\mathcal{C}(X)$.
If is an increasing chain in with , and , then it has an upper bound in .
Clearly, Condition 2 is a special case of Condition 1. We thus just need to show that any potential obstruction to Condition 1 can be assumed to have the form in Condition 2.
By replacing each $A_\beta$ with some suitable cofinal increasing chain if necessary, we can assume that each $A_\beta$ is indexed by its cofinality $\mathrm{cf}(A_\beta)$, for all $\beta < \kappa$.
Consider . If this set is cofinal in , then for each inside that set pick some witness , and let be the witness obtained from Lemma 3.1. Now is the desired upper bound.
If the set from the paragraph above is not cofinal, then there exists some such that for we always have that . As the are ordinals, decreases can happen only finitely many times. Thus, by moving to a suitable cofinal subset we can safely assume that all are equal to some fixed .
Again by moving to a suitable cofinal subset, we can assume that . If , the statement is trivial. If , then is the desired upper bound. It remains to handle the case .
We construct some function , such that the desired upper bound is of the form . We proceed as follows: Set . Once has been defined for all , pick for each some such that and . As , it cannot be that is cofinal in . Thus, it has some upper bound, and we define such that is such an upper bound. ∎
Let us illustrate the problem of extending Lemma 3.1 by an example:
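([markowsky, Example 1]) Let $X = \omega_1 \times \omega$, i.e., the product order of the first uncountable ordinal and the first infinite ordinal. Consider the chain of chains $(A_\beta)_{\beta<\omega_1}$ given by $A_\beta = ((\beta, n))_{n<\omega}$. This corresponds to the case where the chain of chains has cofinality $\omega_1$ and each $A_\beta$ has cofinality $\omega$ in Lemma 3.1. If this chain of chains had an upper bound, then $\omega_1 \times \omega$ would need to admit a cofinal chain. However, this is not the case.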
However, we can guarantee the existence of a maximal chain above any chain when there is no uncountable increasing chain of increasing chains.
If all increasing chains of elements of $\mathcal{C}(X)$ (i.e., increasing chains of increasing chains of elements of $X$) are countable, then for every $A \in \mathcal{C}(X)$ there exists a maximal $B \in \mathcal{C}(X)$ with $A \trianglelefteq B$.
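We first argue that Condition 2 in Lemma 3.1 is vacuously true. As all increasing chains in $\mathcal{C}(X)$ are countable, the only possible infinite cofinality of a chain of chains is $\omega$. As $X$ embeds into $\mathcal{C}(X)$ via $a \mapsto (a)$, if all increasing chains in $\mathcal{C}(X)$ are countable, then so are all increasing chains in $X$. This tells us that the only possible infinite cofinality of a chain in $\mathcal{C}(X)$ is also $\omega$. But then the hypotheses of Condition 2 cannot be satisfied. Hence every increasing chain in $\mathcal{C}(X)$ has an upper bound. Since every chain contains a cofinal increasing chain, Zorn’s Lemma (Lemma 2.2) yields the claim. ∎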
A small modification of the example shows that we cannot replace the requirement that has only countable increasing chains in Theorem 3.1 with the simpler requirement that has only countable increasing chains: Let , and let iff and . Then has only countable increasing chains, but still has the chain of chains given by as in Example 3.1.
3.2 Uncountably long chains of chains
Unfortunately, we can design a game such that there exists an uncountable increasing chain of increasing chains. Thus the existence of a maximal element above any chain is not guaranteed by Theorem 3.1. In fact, we will see that the chain of chains of uncountable length we construct is not below any maximal chain.
We consider a variant of the Help-me? game (Example 2.1), depicted in Figure 1(a). The strategies of the protagonist in this game can be described by functions describing how often the protagonist is willing to repeat the second loop (between and ) given the number of repetitions the antagonist made in the first loop (at ). With the same reasoning as in Example 2.1 we find that the strategy corresponding to a function dominates the strategy corresponding to iff and .
Let denote the set of functions . For , let denote that .
There is an embedding of into the strategies of the game in Example 3.2 ordered by dominance such that no strategy in the range of embedding is dominated by a strategy outside the range of the embedding.
Proposition (this result is adapted from an answer by user Deedlit on math.stackexchange.org [stackexchange]).
For every chain in there exists a chain of chains of length with .
For each countable limit ordinal $\lambda$, we fix some fundamental sequence $(\lambda[n])_{n \in \mathbb{N}}$ of ordinals with $\lambda[n] < \lambda[n+1] < \lambda$ and $\sup_{n} \lambda[n] = \lambda$. (We have no computability or other uniformity requirements to satisfy, and can thus just invoke the axiom of choice. Otherwise, as discussed e.g. in [promel, Section 3.1], this approach would fail.)
Let . Let , and for limit ordinals , let .
Claim: If , then .
It suffices to show that if , then for all greater than some . If , this is immediate already for . For a limit ordinal, we note that for .
The claim then follows by induction over . Recall that if is a limit ordinal and , then there is some with . Since for any given , the ordinals between and we will need to inspect in the induction form a decreasing chain, there are only finitely many such ordinals. In particular, the maximum of all thresholds we encounter is well-defined. ∎
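Limit steps of constructions like this one typically rely on diagonalization. The sketch below shows one standard diagonal step. It is our own illustration and not necessarily the exact definition used in this proof. From a countable family $(f_k)_{k \in \mathbb{N}}$ it builds $g(n) = 1 + \max_{k \le n} f_k(n)$, which satisfies $g(n) > f_k(n)$ for all $n \ge k$. The family $f_k(n) = k \cdot n$ is hypothetical, and eventual domination can only be spot-checked on a finite range.

```python
from typing import Callable

IntFn = Callable[[int], int]


def diagonal(family: Callable[[int], IntFn]) -> IntFn:
    """Return g with g(n) = 1 + max_{k <= n} f_k(n), so g(n) > f_k(n) whenever n >= k."""
    def g(n: int) -> int:
        return 1 + max(family(k)(n) for k in range(n + 1))
    return g


def dominates_on(g: IntFn, f: IntFn, start: int, stop: int) -> bool:
    """Finite spot check of g(n) > f(n) for start <= n < stop."""
    return all(g(n) > f(n) for n in range(start, stop))


def family(k: int) -> IntFn:
    """Hypothetical family f_k(n) = k * n."""
    def f(n: int) -> int:
        return k * n
    return f


g = diagonal(family)
print([g(n) for n in range(5)])  # [1, 2, 5, 10, 17]
```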
Claim: If , then .
Due to transitivity of and the previous claim, it suffices to show that . Write . Assume the contrary, i.e. that for all there exists some such that for all and for all we have that . In particular, for we would have that , and then setting , that , which is a contradiction. ∎
The game in Example 3.2 has uncountably long chains of chains that are not below any maximal chain.
3.3 Chains over countable quasiorders
Our proof of Proposition 3.2 crucially relied on functions of type $\mathbb{N} \to \mathbb{N}$ with arbitrarily high rates of growth. In concrete applications such functions would typically be unwelcome. In fact, for almost all classes of games of interest in (theoretical) computer science, a countable collection of strategies suffices for the players to achieve whatever they can achieve. Restricting to computable strategies often makes sense. Many games played on finite graphs are even finite-memory determined (see [paulyleroux4] for how this extends to the quantitative case), and thus strategies implementable by finite automata are all that need to be considered.
Restricting consideration to a countable set of strategies indeed circumvents the obstacle presented by Proposition 3.2. The reason is that the cardinality of the length of a chain of chains cannot exceed that of the underlying quasiorder :
For any increasing chain $(A_\beta)_{\beta<\kappa}$ in $\mathcal{C}(X)$ we find that $|\kappa| \le |X|$.
Let . We find that for any as a direct consequence of