While two-player zero-sum games played on graphs are the most studied model to formalize and solve the reactive synthesis problem [PnueliR89], recent work has considered non-zero-sum extensions of this mathematical framework, see e.g. [KHJ06, FismanKL10, BRS14, KupfermanPV14, BRS-concur15, brenguier_et_al:LIPIcs:2016:6877, DBLP:conf/icalp/ConduracheFGR16, brenguier_et_al:LIPIcs:2017:7806, DBLP:conf/csl/BassetJPRB18], see also the surveys [GU08, BrenguierCHPRRS16, Bruyere17]. In the zero-sum game approach, the system and the environment are considered as monolithic and fully adversarial entities. Unfortunately, both assumptions may turn out to be too strong. First, the reactive system may be composed of several components that execute concurrently and have their own purpose. So, it is natural to model such systems with multiplayer games with each player having his own objective. Second, the environment usually has its own objective too, and this objective is usually not the negation of the objective of the reactive system as postulated in the zero-sum case. Therefore, there are instances of the reactive synthesis problem for which no solution exists in the zero-sum setting, i.e. no winning strategy for the system against a completely antagonistic environment, while there exists a strategy for the system which enforces the desired properties against all rational behaviors of the environment pursuing its own objective.
While the central solution concept in zero-sum games is the notion of winning strategy, it is well known that this solution concept is not sufficient to reason about non-zero-sum games. In non-zero-sum games, notions of equilibria are used to reason about the rational behavior of players. The celebrated notion of Nash equilibrium (NE) [nash50] is one of the most studied. A profile of strategies is an NE if no player has an incentive to deviate, i.e. change his strategy and obtain a better reward, when this player knows that the other players will be playing their respective strategies in the profile. A well-known weakness of NE in sequential games, which include infinite duration games played on graphs, is that they are subject to non-credible threats: decisions in subgames that are irrational and used to threaten the other players and oblige them to follow a given behavior. To avoid this problem, the concept of subgame perfect equilibria (SPE) has been proposed, see e.g. [osbornebook]. SPEs are NEs with the additional property that they are also NEs in all subgames of the original game. While it is now quite well understood how to handle NEs algorithmically in games played on graphs [Ummels08, UmmelsW11, BPS13, DBLP:conf/icalp/ConduracheFGR16], this is not the case for SPEs.
In this paper, we provide an algorithm to decide in polynomial space the constrained existence problem for SPEs in quantitative reachability games. A quantitative reachability game is played by players on a finite graph in which each player has his own reachability objective. The objective of each player is to reach his target set of vertices as quickly as possible. In a series of papers, it has been shown that SPEs always exist in quantitative reachability games [DBLP:journals/corr/abs-1205-6346], and that the set of outcomes of SPEs in a quantitative reachability game is a regular language which is effectively constructible [BrihayeBMR15]. As a consequence of the latter result, the constrained existence problem for SPEs is decidable.
Unfortunately, the proof that establishes the regularity of the set of possible outcomes of SPEs in [BrihayeBMR15] exploits a well-quasi order to prove termination, and it cannot be used to obtain a good upper bound on the complexity of the algorithm. Here, we propose a new algorithm and we show that this set of outcomes can be represented using an automaton of at most exponential size. It follows that the constrained existence problem for SPEs can be decided in PSPACE. We also provide a matching lower bound, showing that this problem is PSPACE-complete.
Our new algorithm iteratively builds a set of constraints that exactly characterize the set of SPEs in quantitative reachability games. This set of constraints is obtained by iterating an operator that reinforces the constraints up to obtaining a fixpoint. A careful inspection of the computation allows us to establish PSPACE membership.
Algorithms to reason on NEs in graph games are studied in [Ummels08] for ω-regular objectives and in [UmmelsW11, BPS13] for quantitative objectives. Algorithms to reason on SPEs are given in [Ummels06] for ω-regular objectives. Quantitative reachability objectives are not ω-regular objectives. Reasoning about NEs and SPEs for ω-regular specifications can also be done using strategy logics [DBLP:journals/iandc/ChatterjeeHP10, mogavero_et_al:LIPIcs:2010:2859].
Other notions of rationality and their use for reactive synthesis have been studied in the literature: rational synthesis in the cooperative [FismanKL10] and adversarial [KupfermanPV14] settings, whose algorithmic complexity has been studied in [DBLP:conf/icalp/ConduracheFGR16]. Extensions with imperfect information have been investigated in [DBLP:conf/lics/FiliotGR18]. Synthesis rules based on the notion of admissible strategies have been studied in [berwanger07, BRS14, BRS-concur15, brenguier_et_al:LIPIcs:2016:6877, brenguier_et_al:LIPIcs:2017:7806, DBLP:conf/csl/BassetJPRB18]. Weak SPEs have been studied in [BrihayeBMR15, Bruyere0PR17, DBLP:journals/corr/abs-1809-03888] and shown to be equivalent to SPEs for quantitative reachability objectives.
Structure of the paper
In Section 2, we recall the notions of graph games and (very weak/weak) SPEs, we introduce the notion of extended games and we state the studied constrained existence problem. In Section 3, we provide a way to characterize the set of plays that are outcomes of SPEs and give an algorithm to construct this set. This algorithm relies on the computation of a sequence of labeling functions until reaching a fixpoint such that the plays consistent with this fixpoint labeling are exactly the plays which are outcomes of SPEs. In Section 4, given a labeling function, we introduce the notion of counter graph in which infinite paths correspond to consistent plays. We also show that each such graph has an exponential size. In Section 5, using counter graphs, we prove the PSPACE-easiness of the constrained existence problem. We conclude by proving the PSPACE-hardness of this problem.
In this section, we recall the notions of quantitative reachability game and subgame perfect equilibrium. We also state the problem studied in this paper and our main result.
2.1 Quantitative reachability games
An arena is a tuple where is a finite set of players, is a finite set of vertices, is a partition of between the players, and is a set of edges such that for all there exists such that . Without loss of generality, we suppose that .
A play in is an infinite sequence of vertices such that for all , . A history is a finite sequence with defined similarly. The length of is the number of its edges. We denote the set of plays by and the set of histories by (when it is necessary, we use notation and to recall the underlying arena ). Moreover, the set is the set of histories such that their last vertex is a vertex of player , i.e. .
Given a play and , the prefix of is denoted by and its suffix is denoted by . A play is called a lasso if it is of the form with . Notice that is not necessarily a simple cycle. The length of a lasso is the length of .
Given an arena , we denote by the set of successors of , for , and by the transitive closure of .
A quantitative game is an arena equipped with a cost function profile such that each function assigns a cost to each play. In a quantitative game , an initial vertex is often fixed, and we call an initialized game. A play (resp. a history) of is then a play (resp. a history) of starting in . The set of such plays (resp. histories) is denoted by (resp. ). We also use notation when these histories end in a vertex .
In this article we are interested in quantitative reachability games such that each player has a target set of vertices that he wants to reach. The cost to pay is equal to the number of edges to reach the target set, and each player aims at minimizing his cost.
[Quantitative reachability game] A quantitative reachability game is a tuple such that
is an arena;
for each , is the target set of player ;
for each and each , is equal to the least index such that , and to if no such index exists.
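As an illustration of this cost function, the following minimal sketch computes the cost of one player on a lasso play. The function name `reach_cost` and the encoding of a lasso as a pair of Python lists (a finite prefix followed by a cycle repeated forever) are assumptions made for this example only.

```python
import math


def reach_cost(prefix, cycle, target):
    """Cost of the lasso play prefix . cycle^omega for one player.

    Returns the least index k such that the k-th vertex of the play
    belongs to `target`, or math.inf if the play never visits `target`.
    """
    # Every vertex occurring in the play already occurs in the prefix
    # followed by one pass through the cycle, so scanning that finite
    # word is enough to find the first visit to the target set.
    for k, v in enumerate(list(prefix) + list(cycle)):
        if v in target:
            return k
    return math.inf
```

For instance, on the lasso with prefix `["v0", "v1"]` and cycle `["v2", "v3"]`, a player whose target set is `{"v3"}` pays cost 3, and a player whose target set is never visited pays an infinite cost.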
In the sequel of this document, we simply call such a game a reachability game. Notice that the cost function used for reachability games can be supposed to be continuous in the following sense [DBLP:journals/corr/abs-1205-6346]. With endowed with the discrete topology and with the product topology, a sequence of plays converges to if every prefix of is a prefix of all except, possibly, finitely many of them. A cost function is continuous if whenever , we have that . In reachability games, the function can be transformed into a continuous one as follows: if , and otherwise.
Given a quantitative game , a strategy of a player is a function . This function assigns to each history , with , a vertex such that . In an initialized game , needs only to be defined for histories starting in . A play is consistent with if for all we have that . A strategy is positional if it only depends on the last vertex of the history, i.e., for all . It is finite-memory if it can be encoded by a finite-state machine where is a finite set of states (the memory of the strategy), is the initial memory state, is the update function, and is the next-action function. The machine defines a strategy such that for all histories , where extends to histories as expected. The size of the strategy is the size of its machine . Note that is positional when .
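The finite-memory strategies described above can be sketched as follows. The class name and the dictionary encoding of the update and next-action functions are hypothetical, and the sketch assumes the usual convention that the memory is updated on every vertex of the history except the last one, on which the next-action function is then applied.

```python
class FiniteMemoryStrategy:
    """Strategy of one player encoded by a finite-state machine (sketch)."""

    def __init__(self, initial_memory, update, next_action):
        self.initial_memory = initial_memory
        self.update = update            # dict: (memory state, vertex) -> memory state
        self.next_action = next_action  # dict: (memory state, owned vertex) -> successor

    def __call__(self, history):
        *h, v = history                 # v is the last vertex, owned by the player
        m = self.initial_memory
        for u in h:                     # extend the update function to the history h
            m = self.update[(m, u)]
        return self.next_action[(m, v)]


# Example: from "a", go to "b" on the first visit and to "c" afterwards.
sigma = FiniteMemoryStrategy(
    initial_memory=0,
    update={(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 1},
    next_action={(0, "a"): "b", (1, "a"): "c"},
)
```

With a single memory state, the choice depends only on the last vertex of the history, which corresponds to a positional strategy.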
A strategy profile is a tuple of strategies, one for each player. It is called positional (resp. finite-memory) if for all , is positional (resp. finite-memory). Given an initialized game and a strategy profile , there exists a unique play from consistent with each strategy . We call this play the outcome of and it is denoted by . Let , we say that is a strategy profile with cost or that has cost if for all .
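For positional profiles, the outcome is a lasso that can be computed by following the chosen edges until a vertex repeats. The sketch below illustrates this; the function name `outcome` and the dictionary encodings of vertex ownership and of the profile are assumptions made for the example.

```python
def outcome(v0, owner, profile):
    """Outcome of a positional strategy profile from v0, as (prefix, cycle).

    owner:   dict vertex -> player owning that vertex
    profile: dict player -> dict vertex -> chosen successor
    """
    position = {}   # vertex -> index of its first occurrence in the play
    play = []
    v = v0
    while v not in position:
        position[v] = len(play)
        play.append(v)
        v = profile[owner[v]][v]   # the owner of v picks the next vertex
    k = position[v]                # the play loops back to index k forever
    return play[:k], play[k:]
```

Since each choice depends only on the current vertex, the first repeated vertex closes a cycle that is then repeated forever, so the returned pair describes the whole infinite outcome.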
2.2 Solution concepts and constraint problem
In the multiplayer game setting, the solution concepts usually studied are equilibria (see [GU08]). We here recall the concepts of Nash equilibrium and subgame perfect equilibrium.
Let be a strategy profile in an initialized game . When we highlight the role of player , we denote by where is the profile . A strategy is a deviating strategy of player , and it is a profitable deviation for him if .
The notion of Nash equilibrium is classical: a strategy profile in an initialized game is a Nash equilibrium (NE) if no player has an incentive to deviate unilaterally from his strategy, i.e. no player has a profitable deviation. Formally, is an NE if for each and each deviating strategy of player , we have .
When considering games played on graphs, a useful refinement of NE is the concept of subgame perfect equilibrium (SPE), which is a strategy profile that is an NE in each subgame. It is well known that, contrary to NEs, SPEs avoid non-credible threats [GU08]. Formally, given a quantitative game , an initial vertex , and a history , the initialized game is called a subgame of such that and for all and . Notice that is a subgame of itself. Moreover, if is a strategy for player in , then denotes the strategy in such that for all histories , . Similarly, from a strategy profile in , we derive the strategy profile in .
[Subgame perfect equilibrium] Let be an initialized game. A strategy profile is a subgame perfect equilibrium in if for all , is an NE in .
It is proved in [DBLP:journals/corr/abs-1205-6346] that there always exists an SPE in reachability games. In this paper, we are interested in solving the following constraint problem.
[Constraint problem] Given an initialized reachability game and two threshold vectors, the constraint problem is to decide whether there exists an SPE in with cost such that , that is, for all .
Our main result is the following one:
The constraint problem for initialized reachability games is PSPACE-complete.
The sequel of the paper is devoted to the proof of this result. Let us first illustrate the introduced concepts with an example.
A reachability game with two players is depicted in Figure 1. The circle vertices are owned by player whereas the square vertices are owned by player . The target sets of both players are respectively equal to (grey vertices), (double circled vertices).
The positional strategy profile is depicted by double arrows, its outcome in is equal to with cost . Let us explain that is an NE. Player 1 reaches his target set as soon as possible and has thus no incentive to deviate. Player 2 has no profitable deviation that allows him to reach . For instance if he uses a deviating positional strategy such that , then the outcome of is equal to with cost which is not profitable for player .
One can verify that the strategy profile is also an SPE. For instance in the subgame with , we have such that . In this subgame, with , both players reach their target set as soon as possible and have thus no incentive to deviate.
Consider now the positional strategy profile such that , , and . Its outcome in is equal to with cost . Let us explain that is an NE. On one hand, since player 2 never goes to , player 1 has no incentive to deviate, as his target is only accessible from . On the other hand, if player 2 deviates and chooses to go to , his cost is still since player 1 goes to . However, the strategy profile is not an SPE. Indeed, it features a non-credible threat by player 1: consider the history and the corresponding subgame . In that case, player 1 has an incentive to deviate to reach , yielding a cost of . Thus, the strategy profile is not an NE in the subgame , and thus is not an SPE in . ∎
2.3 Weak SPE, very weak SPE and extended game
In this section, we present two important tools that will be repeatedly used in the sequel. First, we explain that in reachability games, the notion of SPE can be replaced by the simpler notion of very weak SPE. Second we present an extended version of a reachability game where the vertices are enriched with the set of players that have already visited their target sets along a history. Working with this extended game is essential to prove that the constraint problem for reachability games is in PSPACE.
We begin by recalling the concepts of weak and very weak SPE introduced in [BrihayeBMR15, Bruyere0PR17]. Let be an initialized game and be a strategy profile. Given , we say that a strategy is finitely deviating from if and only differ on a finite number of histories, and that is one-shot deviating from if and only differ on . A strategy profile is a weak NE (resp. very weak NE) in if, for each player , for each finitely deviating (resp. one-shot) strategy of player from , we have . A strategy profile is a weak SPE (resp. very weak SPE) in if, for all , is a weak (resp. very weak) NE in .
From the given definitions, every SPE is a weak SPE, and every weak SPE is a very weak SPE. It is known that weak SPE and very weak SPE are equivalent notions and that there exist initialized games that have a weak SPE but no SPE; nevertheless, all three concepts are equivalent for initialized reachability games [BrihayeBMR15, Bruyere0PR17].
[[BrihayeBMR15, Bruyere0PR17]] Let be an initialized reachability game and be a strategy profile in . Then is an SPE if and only if is a weak SPE if and only if is a very weak SPE.
Let us now recall the notion of extended game for a given reachability game (see e.g. [DBLP:journals/corr/abs-1809-03888]). The vertices of the extended game store a vertex as well as a subset of players that have already visited their target sets.
[Extended game] Let be a reachability game with an arena , and let be an initial vertex. The extended game of is equal to with the arena , such that:
if and only if and
if and only if
if and only if
for each , is equal to the least index such that , and to if no such index exists.
The initialized extended game associated with the initialized game is such that with .
Notice the way each target set is defined: if , then but also for all . In the sequel, to avoid heavy notations, each cost function will be simply written as .
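The construction of the extended game can be sketched as a graph exploration from the initial vertex. The code below is only an illustration under assumptions: the function name `extended_game` and the dictionary encodings are hypothetical, and the edge rule is read off the prose above (the second component of a successor is the current set of players enlarged with those whose target set contains the new vertex).

```python
def extended_game(edges, targets, v0):
    """Reachable part of the extended arena from the initial vertex.

    edges:   dict vertex -> list of successors
    targets: dict player -> set of target vertices
    Extended vertices are pairs (v, I), where I is the frozenset of
    players that have already visited their target set.
    """
    def on_target(v):
        return frozenset(i for i, f in targets.items() if v in f)

    x0 = (v0, on_target(v0))
    ext_edges = {}
    stack = [x0]
    while stack:
        x = stack.pop()
        if x in ext_edges:
            continue                      # already explored
        v, visited = x
        # Moving to w adds the players whose target set contains w.
        succs = [(w, visited | on_target(w)) for w in edges[v]]
        ext_edges[x] = succs
        stack.extend(succs)
    return x0, ext_edges
```

In this encoding, the owner of an extended vertex `(v, I)` is the owner of `v`, and player `i` has reached his target set at `(v, I)` exactly when `i` belongs to `I`. Only the part reachable from the initial vertex is built, which may be much smaller than the full product of vertices and subsets of players.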
Let us state some properties of the extended game. First, notice that for each , we have the next property called -monotonicity:
Second, given an initialized game and its extended game , there is a one-to-one correspondence between plays in and plays in :
from , we derive such that is the set of players that have seen their target set along ;
from , we derive such that the second components , , are omitted.
Third, given , we have that , and conversely given , we have that . It follows that outcomes of SPE can be equivalently studied in and in , as stated in the next lemma.
If is the outcome of an SPE in , then is the outcome of an SPE in with the same cost. Conversely, if is the outcome of an SPE in , then is the outcome of an SPE in with the same cost.
By construction, the arena of the initialized extended game is divided into different regions according to the players who have already visited their target set. Let us provide some useful notions with respect to this decomposition. We will often use them in the following sections. Let be the set of sets accessible from the initial state , and let be its size. For , if there exists , we say that is a successor of and we write . Given , refers to the sub-arena of restricted to the vertices . We say that is the region associated with (in the sequel, we indifferently call region either , or , or ). Such a region is called a bottom region whenever .
There exists a partial order on such that if and only if . We fix an arbitrary total order on that extends this partial order as follows:
(with a bottom region). (We use notation , , to avoid any confusion with the sets appearing in a play .) With respect to this total order, given , we denote by the sub-arena of restricted to the vertices .
[Region decomposition and section] Let be a (finite or infinite) path in . Then there exists a region decomposition of as
with , such that for each , :
is a (possibly empty) path in ,
every vertex of is of the form for some .
Each path is called a section. The last section is infinite if and only if is infinite.
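For a finite path, the non-empty sections of this decomposition can be obtained by grouping consecutive vertices that share the same second component. The sketch below assumes that extended vertices are encoded as pairs `(v, I)`; the function name is hypothetical, and the empty sections of the regions that the path skips are simply not listed.

```python
from itertools import groupby


def region_decomposition(path):
    """Split a finite path of extended vertices (v, I) into its sections.

    Returns a list of pairs (I, section), where each section is a maximal
    run of consecutive vertices whose second component equals I.
    """
    return [(I, list(section))
            for I, section in groupby(path, key=lambda x: x[1])]
```

Because the second component can only grow along a path, grouping consecutive vertices is enough: once a path has left a region, it never returns to it.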
The extended game of the reachability game of Figure 1 is depicted in Figure 2 (for the moment the reader should not consider the labeling indicated under or next to the vertices). As we can see, the extended game is divided into three different regions: one region associated to that contains the initial vertex , a second region associated to , and a third bottom region associated to . Hence the set is totally ordered as .
For all the vertices of the region associated with , we have and , and for those of the region associated with , we have .
From the SPE given in Example 2.2 with outcome and cost , we derive the SPE outcome equal to
with the same cost . The region decomposition of is equal to such that , is empty, and . ∎
In this section, given an initialized reachability game , we characterize the set of plays that are outcomes of SPEs, and we provide an algorithm to construct this set. For this characterization, by Lemma 2.3, we can work on the extended game instead of . Moreover by Proposition 2.3, we can focus on very weak SPEs only since they are equivalent to SPEs. Such a characterization already appears in [BrihayeBMR15] however with a different algorithm that cannot be used to obtain good complexity upper bounds for the constraint problem.
All along this section, when we refer to a vertex of , we use notation (instead of ) and notation means the second component of this vertex.
Our algorithm iteratively builds a set of constraints imposed by a labeling function such that the plays of the extended game satisfying those constraints are exactly the SPE outcomes. Let us provide a formal definition of such a function with the constraints that it imposes on plays.
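[$\lambda$-consistent play] Let $\mathcal{G}^X$ be the extended game of a reachability game $\mathcal{G}$, and $\lambda : V^X \to \mathbb{N} \cup \{+\infty\}$ be a labeling function. Given $v \in V^X$, for all plays $\rho = \rho_0\rho_1\ldots$ starting in $v$, we say that $\rho$ is $\lambda$-consistent if for all $n \in \mathbb{N}$ and $i \in \Pi$ such that $\rho_n \in V_i$:
$$\mathrm{Cost}_i(\rho_{\geq n}) \leq \lambda(\rho_n). \qquad (3)$$
We denote by $\Lambda(v)$ the set of plays starting in $v$ that are $\lambda$-consistent.
Thus, a play $\rho$ is $\lambda$-consistent if for all its suffixes $\rho_{\geq n}$, if player $i$ owns $\rho_n$ then the number of edges to reach his target set along $\rho_{\geq n}$ is bounded by $\lambda(\rho_n)$. Before going into the details of our algorithm, let us intuitively explain on an example how a well-chosen labeling function characterizes the set of SPE outcomes.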
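On a lasso play, the consistency condition described above can be checked on finitely many suffixes. The following sketch does so under assumptions: the function name, the encoding of a lasso as a prefix and a cycle, and the dictionaries for ownership, target sets and labels are all hypothetical choices made for this illustration.

```python
import math


def is_consistent(prefix, cycle, owner, targets, label):
    """Check that the lasso prefix . cycle^omega respects the labeling.

    owner:   dict vertex -> player owning it
    targets: dict player -> set of target vertices
    label:   dict vertex -> bound (a natural number or math.inf)
    """
    # Two passes through the cycle suffice: every suffix of the play
    # coincides with a suffix starting in the prefix or in the first
    # pass, and from there each cycle vertex occurs within the second pass.
    unrolled = list(prefix) + list(cycle) * 2
    for n in range(len(prefix) + len(cycle)):
        x = unrolled[n]
        i = owner[x]
        # Number of edges needed by the owner of x to reach his target
        # set along the suffix starting at position n.
        cost = next((k for k, y in enumerate(unrolled[n:]) if y in targets[i]),
                    math.inf)
        if cost > label[x]:
            return False
    return True
```

Note that an infinite label imposes no constraint, since an infinite cost is not strictly greater than an infinite bound.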
We consider the extended game of Figure 2, and a labeling function whose values are indicated under or next to each vertex.
If is labeled by , then if , this means that player will only accept outcomes in that reach his target set within steps, otherwise he would have a profitable deviation. If , this means that player has already reached his target set, and if , player has no profitable deviation whatever outcome is proposed to him.
Recall from Example 2.3 the SPE outcome equal to
and with cost . We have and player reaches his target set from within exactly steps. The constraints imposed by on the other vertices of are respected too.
Recall now the strategy profile with outcome described in Example 2.2. The outcome is not -consistent since player does not reach his target set, and so in particular he does not reach his target set within steps. We already know that is not an SPE from Example 2.2. In fact all the profiles that yield as an outcome are not SPEs. ∎
Our algorithm roughly works as follows: the labeling function that characterizes the set of SPE outcomes is obtained from an initial labeling function that imposes no constraints, by iterating an operator that reinforces the constraints step after step, up to obtaining a fixpoint which is the required function . Thus, if is the labeling function computed at step and , , the related sets of -consistent plays, initially we have , and step by step, the constraints imposed by become stronger and the sets become smaller, until a fixpoint is reached.
Initially, we want a labeling function that imposes no constraint on the plays. We could define as the constant function . We proceed a little bit differently. Indeed recall the definition of the target sets in an extended game (see Definition 2.3): if and only if . Hence, given , once for some then for all . It follows that for all , and the inequality (3) is trivially true. (See also Example 3.) Therefore we define the labeling function as follows.
[Initial labeling] For all , let be such that ,
if and only if .
Let us now explain how our algorithm computes the labeling functions , , and the related sets , . It works in a bottom-up manner, according to the total order of given in (2): it first iteratively updates the labeling function for all vertices of the arena until reaching a fixpoint in this arena, it then repeats this procedure in , , , . Hence, suppose that we currently treat the arena and that we want to compute from . We define the updated function as follows (we use the convention that ).
[Labeling update] Let and suppose that we treat the arena , with . For all ,
if , let be such that , then
if , then
Let us provide some explanations. As this update concerns the arena , we keep outside of this arena. Suppose now that belongs to the arena and . We define whenever (as already explained for the definition of ). When it is updated, the value represents the best cost that player can ensure for himself with a “one-shot” choice by only taking into account plays of with .
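The one-shot nature of this update can be sketched for a single vertex as follows. This is an illustration under stated assumptions rather than the algorithm itself: `sup_cost` is a hypothetical oracle returning, for a successor, the supremum of the costs of the player over the plays from that successor that are consistent with the current labeling, and the constant 1 is assumed to account for the edge leading to the chosen successor.

```python
import math


def update_label(player, successors, has_reached_target, sup_cost):
    """New label of one vertex of the arena under treatment (sketch).

    successors:         successors of the vertex in the extended arena
    has_reached_target: True if `player` has already visited his target set
    sup_cost:           hypothetical oracle (successor, player) -> supremum
                        of the player's costs over the consistent plays
                        from that successor (possibly math.inf)
    """
    if has_reached_target:
        return 0
    # Best worst-case cost obtainable with a single choice of successor,
    # plus one for the edge taken; in Python, 1 + math.inf is math.inf.
    return 1 + min(sup_cost(y, player) for y in successors)
```

Computing the supremum itself is the costly part and is not addressed by this sketch, which only shows how the values obtained for the successors are combined.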
Notice that it makes sense to run the algorithm in a bottom-up fashion according to the total ordering since given a play , if is a vertex of , then for all , is a vertex of (by -monotonicity). Moreover running the algorithm in this way is essential to prove that the constraint problem for reachability games is in PSPACE.
We consider again the extended game of Figure 2 with the total order of its set .
Let us illustrate Definition 3 on the arena . Let and suppose that the labeling function has been computed such that , and (notice is not the labeling indicated in Figure 2). Let us show how to compute . We need to compute for the two successors of , that is, respectively and . Recall that is the set of all plays -consistent from . Thus, as , the set consists only of the plays that go to , as any play that goes to has at least cost for player 1. All the plays in have cost for player 1 and player 2. Thus, . On the other hand, as , the play is -consistent and belongs to . Thus, . Hence, the minimum is attained with successor , and . ∎
We can now provide our algorithm that computes the sequence until a fixpoint is reached (see Proposition 1 below). Initially, the labeling function is (see Definition 3). For the next steps , we begin with the bottom region of and update to as described in Definition 3. At some point, the values of do not change anymore in and reaches locally (on ) a fixpoint (see again Proposition 1). Now, we consider the arena