Perfect-information turn-based two-player games on graphs are widely studied in computer science. Indeed, they are a useful tool both for theory (for instance, the modern proofs of Rabin's complementation lemma rely on the memoryless determinacy of parity games) and for more practical applications. On the practical side, a major application of games is the verification of open reactive systems. Those are systems composed of both a program and some (possibly hostile) environment. The verification problem consists of deciding whether the program can be restricted so that the system meets some given specification whatever the environment does. Here, restricting the system means synthesizing a controller, which, in terms of games, is equivalent to designing a winning strategy for the player modeling the program.
The perfect-information turn-based model, even if it suffices in many situations, is somewhat weak for the following two reasons. First, it does not capture the behavior of truly concurrent models where, in each step, the program and its environment independently choose moves, whose parallel execution determines the next state of the system. Second, in this model both players have, at all times, perfect information on the current state of the play: this, for instance, makes it impossible to model a system where the program and the environment share some public variables while also having their own private variables.
In this paper, we remove those two restrictions by considering concurrent stochastic games with imperfect information. Those are finite-state games in which, at each round, the two players simultaneously and independently choose an action. Then a successor state is chosen according to some fixed probability distribution depending on the current state and on the pair of actions chosen by the players. Imperfect information is modeled as follows: both players have an equivalence relation over states and, instead of observing the exact state, they only see to which equivalence class it belongs. Therefore, if two partial plays are indistinguishable by some player, he should behave the same in both of them. Note that this model naturally captures several models studied in the literature [1, 9, 7, 8]. The winning conditions we consider here are reachability (is some final state eventually visited?), Büchi (is some final state visited infinitely often?) and their dual versions, safety and co-Büchi.
We study qualitative properties of those games (note that quantitative properties, e.g. deciding whether the value of the game is above a given threshold, are already undecidable in much weaker models). More precisely, we investigate the question of deciding whether some player can almost-surely win, that is, whether he has a strategy that wins with probability 1 against any counter-strategy of the opponent. Our main contribution is to prove that, for both reachability and Büchi objectives, one can decide, in doubly exponential time (which is proved to be optimal), whether the first player has an almost-surely winning strategy. Moreover, when it is the case, we are also able to construct such a finite-memory strategy. We also provide new intermediate results concerning positive winning in safety (and co-Büchi) $1\frac{1}{2}$-player games (a.k.a. partially observable Markov decision processes).
Related work. Concurrent games with perfect information have been deeply investigated in the last decade [2, 1, 7]. Games with imperfect information have been considered for turn-based models as well as for concurrent models with only one imperfectly informed player [9, 8]. To our knowledge, the present paper provides the first positive results on a model of games that combines concurrency, imperfect information (on both sides) and a stochastic transition function. In a recent independent work, Bertrand, Genest and Gimbert obtain results similar to the ones presented here for a closely related model. The main differences with our model are the following: Bertrand et al. consider a slightly weaker model of games in which the players may observe their own actions, and they allow the players to use richer strategies in which the players can randomly update their memory (note that those strategies, when used in our model, seem strictly more powerful than the ones we consider). Bertrand et al. also discuss qualitative determinacy results and consider the case where a player is more informed than the other. We refer the reader to their paper for a detailed exposition.
A probability distribution over a finite set $X$ is a mapping $d : X \to [0,1]$ such that $\sum_{x \in X} d(x) = 1$. In the sequel we denote by $\mathcal{D}(X)$ the set of probability distributions over $X$.
Given some set $X$ and some equivalence relation $\sim$ over $X$, $[x]_\sim$ stands for the equivalence class of $x$ for $\sim$ and $X/{\sim}$ denotes the set of equivalence classes of $\sim$.
For some finite alphabet $A$, $A^*$ (resp. $A^\omega$) designates the set of finite (resp. infinite) words over $A$.
A concurrent arena with imperfect information is a tuple $\mathcal{A} = \langle S, \Sigma_E, \Sigma_A, \delta, \sim_E, \sim_A \rangle$ where
$S$ is a finite set of control states;
$\Sigma_E$ (resp. $\Sigma_A$) is the (finite) set of actions for Eve (resp. Adam);
$\delta : S \times \Sigma_E \times \Sigma_A \to \mathcal{D}(S)$ is the (total) transition function;
$\sim_E$ and $\sim_A$ are two equivalence relations over $S$.
A play in such an arena proceeds as follows. First it starts in some initial state $s_0$. Then Eve picks an action $\sigma_E \in \Sigma_E$ and, simultaneously and independently, Adam chooses an action $\sigma_A \in \Sigma_A$. Then a successor state is chosen according to the probability distribution $\delta(s_0, \sigma_E, \sigma_A)$. Then the process restarts: the players choose a new pair of actions that induces, together with the current state, a new state and so on forever. Hence a play is an infinite sequence $s_0 s_1 s_2 \cdots \in S^\omega$ such that for every $i \geq 0$, there exists a pair of actions $(\sigma_E, \sigma_A)$ with $\delta(s_i, \sigma_E, \sigma_A)(s_{i+1}) > 0$. In the sequel we refer to a prefix of a play as a partial play and we denote by $\mathrm{Plays}(\mathcal{A})$ the set of all plays in arena $\mathcal{A}$.
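The round-based dynamics above can be sketched in a few lines of Python. This is a toy illustration only: the arena, the state names and the function names are ours, not part of the paper's model.

```python
import random

# Toy arena (names and numbers are ours): delta maps a triple
# (state, eve_action, adam_action) to a distribution over successor states.
delta = {
    ("s0", "a", "x"): {"s0": 0.5, "s1": 0.5},
    ("s0", "b", "x"): {"s1": 1.0},
    ("s1", "a", "x"): {"s1": 1.0},
    ("s1", "b", "x"): {"s0": 1.0},
}

def play_round(state, a_eve, a_adam, rng):
    """One round: the two actions are chosen simultaneously and
    independently; the successor is drawn from delta(state, a_eve, a_adam)."""
    dist = delta[(state, a_eve, a_adam)]
    states, weights = zip(*dist.items())
    return rng.choices(states, weights=weights)[0]

rng = random.Random(0)
play = ["s0"]  # a (partial) play is the sequence of visited states
for _ in range(5):
    play.append(play_round(play[-1], "b", "x", rng))
# Playing "b" against "x" alternates deterministically between s0 and s1.
```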
The intuitive meaning of $\sim_E$ (resp. $\sim_A$) is that two states $s$ and $s'$ such that $s \sim_E s'$ (resp. $s \sim_A s'$) cannot be distinguished by Eve (resp. by Adam). We easily extend the relations to partial plays: let $\lambda = s_0 \cdots s_n$ and $\lambda' = s'_0 \cdots s'_n$ be two partial plays, then $\lambda \sim_E \lambda'$ if and only if $s_i \sim_E s'_i$ for all $0 \leq i \leq n$.
In order to choose their moves the players follow strategies, and, for this, they may use all the information they have about what was played so far. However, if two partial plays are equivalent for $\sim_E$, then Eve cannot distinguish them, and should therefore behave the same in both. This leads to the following notion.
An observation-based strategy for Eve is a function $\varphi_E : (S/{\sim_E})^* \to \mathcal{D}(\Sigma_E)$, i.e., to choose her next action, Eve considers the sequence of observations she got so far. In particular, such a strategy satisfies $\varphi_E(\lambda) = \varphi_E(\lambda')$ whenever $\lambda \sim_E \lambda'$. Observation-based strategies for Adam are defined similarly.
Of special interest are those strategies that do not require memory: a memoryless observation-based strategy for Eve is a function from $S/{\sim_E}$ to $\mathcal{D}(\Sigma_E)$, that is to say these strategies depend only on the current equivalence class.
A uniform strategy for some player is a strategy $\varphi$ such that for every partial play $\lambda$, the probability distribution $\varphi(\lambda)$ is uniform, i.e., for every action $\sigma$, either $\varphi(\lambda)(\sigma) = 0$ or $\varphi(\lambda)(\sigma) = 1/|\{\sigma' \mid \varphi(\lambda)(\sigma') > 0\}|$. The set of memoryless uniform strategies for a player is finite. Equivalently those strategies can be seen as functions to (non-empty) sets of (authorised) actions.
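The equivalence between uniform strategies and non-empty action sets can be made concrete. The following sketch (names are ours) builds the uniform distribution from an authorised set and counts memoryless uniform strategies, each being an independent choice of one non-empty action set per observation class.

```python
def uniform_dist(allowed):
    """The uniform distribution over a non-empty set of authorised actions."""
    p = 1.0 / len(allowed)
    return {a: p for a in sorted(allowed)}

def count_memoryless_uniform(n_classes, n_actions):
    """A memoryless uniform strategy picks, independently for each of the
    n_classes observation classes, one of the 2^n_actions - 1 non-empty
    subsets of actions."""
    return (2 ** n_actions - 1) ** n_classes
```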
A finite-memory strategy for Eve with memory $M$ ($M$ being a finite set) is some triple $\varphi = (m_0, \mu, \mathrm{up})$ where $m_0 \in M$ is the initial memory, $\mu : M \to \mathcal{D}(\Sigma_E)$ associates a distribution of actions with any element in the memory and $\mathrm{up} : M \times S/{\sim_E} \to M$ is a mapping updating the memory with respect to some observation. One defines $\varphi(o_0 \cdots o_n) = \mu(m_{n+1})$ where $m_{i+1} = \mathrm{up}(m_i, o_i)$ for any $i \geq 0$. Hence, a finite-memory strategy is some observation-based strategy that can be implemented by a finite transducer whose set of control states is $M$.
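Such a transducer is straightforward to implement. Below is a minimal sketch (class and field names are ours): the memory is updated on each observation and the output is the distribution attached to the updated memory state.

```python
class FiniteMemoryStrategy:
    """Observation-based strategy implemented by a finite transducer:
    mu maps a memory state to a distribution over actions, up maps
    (memory, observation) to the next memory state."""

    def __init__(self, m0, mu, up):
        self.m = m0   # current memory state
        self.mu = mu
        self.up = up

    def next_distribution(self, observation):
        # Update the memory with the new observation, then output the
        # distribution prescribed for the updated memory state.
        self.m = self.up[(self.m, observation)]
        return self.mu[self.m]

# Example (ours): two memory states, alternating between actions "a" and
# "b" whatever is observed.
strat = FiniteMemoryStrategy(
    m0=0,
    mu={0: {"a": 1.0}, 1: {"b": 1.0}},
    up={(0, "o"): 1, (1, "o"): 0},
)
first = strat.next_distribution("o")   # memory goes 0 -> 1
second = strat.next_distribution("o")  # memory goes 1 -> 0
```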
Note that in our definition of a strategy (and more generally in the definition of a play) we implicitly assume that the players only observe the sequence of states and not the corresponding sequence of actions. While the fact that Eve does not observe what Adam played is rather fair (otherwise imperfect information on states would make little sense), one could object that Eve should observe the actions she played so far. Here, our view of a (randomised) strategy is the following: when Eve respects some strategy, it means that whenever she has to play, her strategy provides her with a distribution that she sends to some scheduler which, together with the distribution chosen by Adam, picks the next state. Indeed, this permits, for instance, to model a system in which some agent does not have the resources to implement randomisation himself.
An alternative option would be to consider that Eve flips a coin to pick her action and then sends this action to the scheduler which, together with the action chosen by Adam, picks the next state. In this case, a strategy should depend on the sequence of states together with the associated sequence of actions played by Eve. We argue that this second approach can easily be simulated by the first one, hence justifying our initial choice. Indeed, one can always enrich the set of states to encode the last pair of actions played and then use the equivalence relations $\sim_E$ / $\sim_A$ to hide / show part of this information to the respective players.
2.3 Probability Space and Outcomes of Strategies
Let $\mathcal{A}$ be a concurrent arena with imperfect information, let $s_0$ be an initial state, $\varphi_E$ be a strategy for Eve and $\varphi_A$ be a strategy for Adam. In the sequel we are interested in defining the probability of a (measurable) set of plays knowing that Eve (resp. Adam) plays according to $\varphi_E$ (resp. $\varphi_A$). This is done in the classical way: first one defines the probability measure for basic sets of plays (called here cones and corresponding to plays having some common initial prefix) and then extends it in a unique way to all measurable sets.
First define $\mathrm{Outcomes}(s_0, \varphi_E, \varphi_A)$ to be the set of all possible plays when the game starts on $s_0$ and when Eve and Adam play according to $\varphi_E$ and $\varphi_A$ respectively. More formally, an infinite play $s_0 s_1 \cdots$ belongs to $\mathrm{Outcomes}(s_0, \varphi_E, \varphi_A)$ if and only if, for every $i \geq 0$, there is a pair of actions $(\sigma_E, \sigma_A)$ with $\varphi_E(s_0 \cdots s_i)(\sigma_E) > 0$ and $\varphi_A(s_0 \cdots s_i)(\sigma_A) > 0$ s.t. $\delta(s_i, \sigma_E, \sigma_A)(s_{i+1}) > 0$ (i.e. $\sigma_X$ is possible according to $\varphi_X$, for $X \in \{E, A\}$).
Now, for any partial play $\lambda$, the cone for $\lambda$ is the set $\mathrm{Cone}(\lambda) = \lambda \cdot S^\omega$ of all infinite plays with prefix $\lambda$. Denote by $\mathcal{C}$ the set of all possible cones and let $\mathcal{F}$ be the Borel $\sigma$-field generated by $\mathcal{C}$ considered as a set of basic open sets (i.e. $\mathcal{F}$ is the smallest set containing $\mathcal{C}$ and closed under complementation, countable union and countable intersection). Then $\mathcal{F}$ is a $\sigma$-algebra.
A pair of strategies $(\varphi_E, \varphi_A)$ induces a probability space over $(\mathrm{Plays}(\mathcal{A}), \mathcal{F})$. Indeed one can define a measure $\mu_{s_0}^{\varphi_E,\varphi_A}$ on cones (this task is easy as a cone is uniquely defined by a finite partial play) and then uniquely extend it to a probability measure on $\mathcal{F}$ using the Carathéodory Unique Extension Theorem. For this, one defines $\mu_{s_0}^{\varphi_E,\varphi_A}$ inductively on cones:
$\mu_{s_0}^{\varphi_E,\varphi_A}(\mathrm{Cone}(s)) = 1$ if $s = s_0$ and $0$ otherwise.
For every partial play $\lambda$ ending in some vertex $s$,
$\mu_{s_0}^{\varphi_E,\varphi_A}(\mathrm{Cone}(\lambda \cdot s')) = \mu_{s_0}^{\varphi_E,\varphi_A}(\mathrm{Cone}(\lambda)) \cdot \sum_{(\sigma_E, \sigma_A) \in \Sigma_E \times \Sigma_A} \varphi_E(\lambda)(\sigma_E) \cdot \varphi_A(\lambda)(\sigma_A) \cdot \delta(s, \sigma_E, \sigma_A)(s')$.
Denote by $\mathbb{P}_{s_0}^{\varphi_E,\varphi_A}$ the unique extension of $\mu_{s_0}^{\varphi_E,\varphi_A}$ to a probability measure on $\mathcal{F}$. Then $(\mathrm{Plays}(\mathcal{A}), \mathcal{F}, \mathbb{P}_{s_0}^{\varphi_E,\varphi_A})$ is a probability space.
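The inductive definition of the measure on cones can be sketched directly in code. All names here are ours; strategies are passed as functions from the partial play seen so far to distributions, glossing over the distinction between state sequences and observation sequences.

```python
def cone_probability(play, s0, delta, strat_eve, strat_adam):
    """Measure of the cone of a finite partial play, computed by the
    inductive definition: product over the steps of the expected
    transition probability under the two strategies."""
    if play[0] != s0:
        return 0.0
    prob = 1.0
    for i in range(len(play) - 1):
        prefix = tuple(play[: i + 1])
        s, s_next = play[i], play[i + 1]
        # Sum over all action pairs of the product of the two strategy
        # probabilities and the transition probability.
        step = 0.0
        for a_eve, p_eve in strat_eve(prefix).items():
            for a_adam, p_adam in strat_adam(prefix).items():
                step += p_eve * p_adam * delta[(s, a_eve, a_adam)].get(s_next, 0.0)
        prob *= step
    return prob

# Example data (ours): from s0, the action pair ("a", "x") leads to s0 or
# s1 with probability 1/2 each; s1 is absorbing.
delta = {
    ("s0", "a", "x"): {"s0": 0.5, "s1": 0.5},
    ("s1", "a", "x"): {"s1": 1.0},
}
eve = lambda prefix: {"a": 1.0}
adam = lambda prefix: {"x": 1.0}
p = cone_probability(["s0", "s1", "s1"], "s0", delta, eve, adam)
```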
2.4 Objectives, Value of a Game
Fix a concurrent arena with imperfect information $\mathcal{A}$. An objective for Eve is a measurable set $\Omega \subseteq \mathrm{Plays}(\mathcal{A})$: a play is won by her if it belongs to $\Omega$; otherwise it is won by Adam. A concurrent game with imperfect information is a triple $\mathbb{G} = (\mathcal{A}, s_0, \Omega)$ where $\mathcal{A}$ is a concurrent arena with imperfect information, $s_0$ is an initial state and $\Omega$ is an objective. In the sequel we focus on the following special classes of objectives (note that all of them are Borel sets, hence measurable) that we define by means of a subset $F \subseteq S$ of final states.
A reachability objective is of the form $\Omega = \{s_0 s_1 \cdots \mid \exists i \geq 0,\ s_i \in F\}$: a play is winning if it eventually goes through some final state.
A safety objective is the dual of a reachability objective, i.e. is of the form $\Omega = \{s_0 s_1 \cdots \mid \forall i \geq 0,\ s_i \notin F\}$: a play is winning if it never goes through a final state.
A Büchi objective is of the form $\Omega = \{s_0 s_1 \cdots \mid \forall i \geq 0,\ \exists j \geq i,\ s_j \in F\}$: a play is winning if it goes infinitely often through final states.
A co-Büchi objective is the dual of a Büchi objective, i.e. is of the form $\Omega = \{s_0 s_1 \cdots \mid \exists i \geq 0,\ \forall j \geq i,\ s_j \notin F\}$: a play is winning if it goes finitely often through final states.
A reachability (resp. safety, Büchi, co-Büchi) game is a game equipped with a reachability (resp. safety, Büchi, co-Büchi) objective. In the sequel we may replace $\Omega$ by $F$ when it is clear from the context which winning condition we consider.
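On ultimately periodic ("lasso") plays, given as a finite prefix followed by a non-empty cycle repeated forever, the four conditions reduce to simple membership tests. This finite representation is ours, used only to make the definitions concrete.

```python
def reach(prefix, cycle, final):
    """Reachability: some visited state is final."""
    return any(s in final for s in prefix + cycle)

def safety(prefix, cycle, final):
    """Safety: no visited state is final (dual of reachability)."""
    return not reach(prefix, cycle, final)

def buchi(prefix, cycle, final):
    """Büchi: final states are visited infinitely often, which for a
    lasso means the cycle contains a final state."""
    return any(s in final for s in cycle)

def co_buchi(prefix, cycle, final):
    """Co-Büchi: only finitely many visits to final states (dual)."""
    return not buchi(prefix, cycle, final)
```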
Fix a concurrent game with imperfect information $\mathbb{G} = (\mathcal{A}, s_0, \Omega)$. A strategy $\varphi_E$ for Eve is almost-surely winning if, for any counter-strategy $\varphi_A$ for Adam, $\mathbb{P}_{s_0}^{\varphi_E,\varphi_A}(\Omega) = 1$. If such a strategy exists, we say that Eve almost-surely wins $\mathbb{G}$. A strategy $\varphi_E$ for Eve is positively winning if, for any counter-strategy $\varphi_A$ for Adam, $\mathbb{P}_{s_0}^{\varphi_E,\varphi_A}(\Omega) > 0$. If such a strategy exists, we say that Eve positively wins $\mathbb{G}$.
3 Knowledge Arena
For the rest of this section we let $\mathcal{A} = \langle S, \Sigma_E, \Sigma_A, \delta, \sim_E, \sim_A \rangle$ be a concurrent arena with imperfect information and let $s_0 \in S$ be some initial state.
Remark that in our model, the players do not observe the actions they play but they may know the distribution they have chosen. Therefore, one could consider a new arena in which the states have a component indicating the domain of the last distribution chosen by Eve, and that this component is visible only to her (it is hidden from Adam by the equivalence relation $\sim_A$).
Even if she does not see the precise control state, Eve can deduce information about it from previous information on the control state and from the set of possible actions she just played. We shall refer to this as the knowledge of Eve, which formally is a set of states. Assume Eve knows that the current state belongs to some set $K \subseteq S$. After the next move Eve observes the equivalence class of the new control state and she also knows the subset $\Gamma \subseteq \Sigma_E$ of actions she may have played (it is the domain of the distribution she chose): hence she can compute the set of possible states the play can be in. This is done using the function $\mathrm{Up} : 2^S \times 2^{\Sigma_E} \times S/{\sim_E} \to 2^S$ defined by letting $\mathrm{Up}(K, \Gamma, [s]_{\sim_E}) = \{s' \in [s]_{\sim_E} \mid \exists s'' \in K,\ \exists \sigma_E \in \Gamma,\ \exists \sigma_A \in \Sigma_A \text{ s.t. } \delta(s'', \sigma_E, \sigma_A)(s') > 0\}$,
i.e. in order to update her current knowledge, observing in which equivalence class the new control state lies, and knowing that she played an action in $\Gamma$, Eve computes the set of all states in this class that may be reached from a state in her former knowledge.
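The knowledge update just described can be sketched as a set comprehension. Function and variable names are ours; `delta` is the transition function as a map to successor distributions.

```python
def update_knowledge(K, gamma, obs_class, delta, adam_actions):
    """Keep every state of the observed equivalence class that is
    reachable with positive probability from some state of the former
    knowledge K, via some action in gamma (for Eve) and any action of
    Adam."""
    return {
        t for t in obs_class
        if any(delta[(s, a_eve, a_adam)].get(t, 0.0) > 0.0
               for s in K for a_eve in gamma for a_adam in adam_actions)
    }

# Example data (ours): playing "a" from s0 may lead to s1 or s2, while
# playing "b" surely leads to s3.
delta = {
    ("s0", "a", "x"): {"s1": 0.5, "s2": 0.5},
    ("s0", "b", "x"): {"s3": 1.0},
}
K1 = update_knowledge({"s0"}, {"a"}, {"s1", "s3"}, delta, {"x"})
K2 = update_knowledge({"s0"}, {"a", "b"}, {"s1", "s3"}, delta, {"x"})
```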
Based on our initial remark and on the notion of knowledge we define the knowledge arena associated with $\mathcal{A}$, denoted $\mathcal{A}^K = \langle S^K, \Sigma_E^K, \Sigma_A^K, \delta^K, \sim_E^K, \sim_A^K \rangle$. The arena $\mathcal{A}^K$ is designed to make explicit the information Eve can collect on her moves (i.e. the domain of the distributions she plays) and on the possible current state. We define $\mathcal{A}^K$ as follows:
$S^K = S \times 2^S \times 2^{\Sigma_E}$: the first component is the real state, the second one is the current knowledge of Eve and the third one is the domain of the last distribution she played;
$\Sigma_E^K = \Sigma_E \times 2^{\Sigma_E}$ and $\Sigma_A^K = \Sigma_A$: actions of Eve will now contain information on the domain of the distributions she picks;
$\delta^K((s, K, \Gamma), (\sigma_E, \Gamma'), \sigma_A)((s', K', \Gamma'')) = 0$ if $\sigma_E \notin \Gamma'$ or $(K', \Gamma'') \neq (\mathrm{Up}(K, \Gamma', [s']_{\sim_E}), \Gamma')$; and
$\delta^K((s, K, \Gamma), (\sigma_E, \Gamma'), \sigma_A)((s', K', \Gamma')) = \delta(s, \sigma_E, \sigma_A)(s')$ otherwise: $\delta^K$ behaves as $\delta$ on the first components and deterministically updates both the knowledge and the information on the domain;
$(s, K, \Gamma) \sim_E^K (s', K', \Gamma')$ if and only if $K = K'$ (implying $s \sim_E s'$) and $\Gamma = \Gamma'$: Eve observes her knowledge and the domain of her last distribution;
$(s, K, \Gamma) \sim_A^K (s', K', \Gamma')$ if and only if $s \sim_A s'$.
The intuitive meaning of the enriched alphabet of Eve is that instead of choosing a distribution $d \in \mathcal{D}(\Sigma_E)$ Eve makes the domain of $d$ explicit by choosing the distribution $d^K \in \mathcal{D}(\Sigma_E^K)$ where $d^K(\sigma, \Gamma) = d(\sigma)$ if $\Gamma = \mathrm{dom}(d)$ and $d^K(\sigma, \Gamma) = 0$ otherwise (here $\mathrm{dom}(d) = \{\sigma \mid d(\sigma) > 0\}$). We call such a distribution well-formed (i.e. obtained from some distribution $d$ as just explained) and, in the sequel, whenever referring to strategies of Eve in $\mathcal{A}^K$, we will mean functions from sequences of observations into well-formed distributions.
Consider an observation-based strategy $\varphi_E$ for Eve in the arena $\mathcal{A}$. Then it can be converted into an observation-based strategy $\varphi_E^K$ on the associated knowledge arena. For this, remark that in the knowledge arena, the states reachable from the initial state are of the form $(s, K, \Gamma)$ with all states in $K$ being equivalent with $s$ with respect to $\sim_E$. Then one can define $\varphi_E^K$ to output, after a sequence of observations in $\mathcal{A}^K$, the well-formed distribution $d^K$ where $d$ is the corresponding distribution given by $\varphi_E$. Note that $\varphi_E^K$ is observation-based, as the corresponding observations in $\mathcal{A}$ are uniquely determined by the ones Eve receives in the knowledge arena.
Conversely, any observation-based strategy in the knowledge arena can be converted into an observation-based strategy in the original arena. Indeed, consider some observation-based strategy $\varphi_E^K$ in the knowledge arena: it is a mapping from $(S^K/{\sim_E^K})^*$ into $\mathcal{D}(\Sigma_E^K)$ (the equivalence classes of the relation $\sim_E^K$ are, by definition, isomorphic with $2^S \times 2^{\Sigma_E}$). Now, note that Eve can, while playing in $\mathcal{A}$, remember the domain of the distributions she played and compute on the fly her current knowledge (applying the function $\mathrm{Up}$ to her previous knowledge and to the domain of the last distribution played): hence along a play she can compute the corresponding sequence of knowledges / domains. Now it suffices to consider the observation-based strategy $\varphi_E$ for Eve in the initial arena that, along a play, computes this sequence of knowledges / domains and outputs the distribution from which the well-formed distribution prescribed by $\varphi_E^K$ is obtained.
Note that this last transformation (taking a strategy $\varphi_E^K$ and producing a strategy $\varphi_E$) is the inverse of the first transformation (taking a strategy $\varphi_E$ and producing a strategy $\varphi_E^K$). In particular, it proves that the observation-based strategies in both arenas are in bijection. It should be clear that the strategies for Adam in both games are the same (as what he observes is identical).
Assume that $\mathcal{A}$ is equipped with a set $F$ of final states. Then one defines the final states in $\mathcal{A}^K$ by letting $F^K = \{(s, K, \Gamma) \in S^K \mid s \in F\}$: this allows to define an objective $\Omega^K$ in $\mathcal{A}^K$ from an objective $\Omega$ in $\mathcal{A}$. Based on the previous observations, we derive the following.
Let $\mathbb{G}$ be some imperfect information game equipped with a reachability (resp. safety, Büchi, co-Büchi) objective. Let $\mathbb{G}^K$ be the associated game played on the knowledge arena. Then for any strategies $\varphi_E$ for Eve and $\varphi_A$ for Adam, the following holds: $\mathbb{P}_{s_0}^{\varphi_E,\varphi_A}(\Omega) = \mathbb{P}_{s_0^K}^{\varphi_E^K,\varphi_A}(\Omega^K)$. In particular, Eve has an almost-surely winning observation-based strategy in $\mathbb{G}$ if and only if she has one in $\mathbb{G}^K$.
In the setting of the previous proposition, consider the special case where Eve has an almost-surely winning observation-based strategy in $\mathbb{G}^K$ that only depends on the current knowledge (in particular, it is memoryless). Then the corresponding almost-surely winning observation-based strategy in $\mathbb{G}$ is, in general, not memoryless, but can be implemented by a finite transducer whose set of control states is precisely the set of possible knowledges for Eve. More precisely, the strategy consists in computing and updating on the fly (using a finite automaton) the value of the knowledge after the current partial play and picking the next action by solely considering the knowledge. We may refer to such a strategy as a knowledge-only strategy.
4 Decidability Results
4.1 Reachability Objectives
The main result of this section is the following.
For any reachability concurrent game with imperfect information, one can decide, in doubly exponential time, whether Eve has an almost-surely winning strategy. If Eve has such a strategy then she has a knowledge-only uniform strategy, and such a strategy can be effectively constructed.
Before proving Theorem 4.1 we first establish an intermediate result. A concurrent game (with imperfect information) in which one player has only a single available action is what we refer to as a $1\frac{1}{2}$-player game with imperfect information (those games are also known in the literature as partially observable Markov decision processes). The following result is a key ingredient for the proofs of Proposition 2 and Theorem 4.1.
Consider a $1\frac{1}{2}$-player safety game with imperfect information. Assume that the player has an observation-based strategy that is positively winning. Then she also has an observation-based finite-memory strategy that is positively winning. Moreover, both the strategy and the set of positively winning states can be computed in exponential time.
Consider the knowledge arena and call a knowledge $K$ surely winning if the player has a knowledge-based strategy that is surely winning from any state $(s, K, \Gamma)$ with $s \in K$. We prove that, if the player has a positively winning strategy, then the set of surely winning knowledges is non-empty and that it comes with a memoryless surely winning strategy (that consists in staying in the surely winning component). This set also contains at least a singleton (meaning that if the player knows the exact current state then she can surely win): call such states surely winning. Then, one proves that the positively winning states are exactly those that are connected (in the graph sense) to some surely winning state by a path made of non-final states. Hence a positively winning strategy consists in playing some initial actions randomly (trying to reach a surely winning state) and then in mimicking a knowledge-only surely winning strategy. The complexity comes with a fixpoint definition of the previous objects. ∎
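The graph-theoretic characterisation in this proof can be sketched as a backward reachability computation. This is an illustration under our own naming, over an abstract directed graph standing in for the game's underlying transition structure.

```python
from collections import deque

def positively_winning(states, edges, surely_winning, final):
    """A state is positively winning iff it is connected to some surely
    winning state by a path made of non-final states; compute the set by
    backward breadth-first search from the surely winning states,
    refusing to cross final states."""
    rev = {s: set() for s in states}
    for u, v in edges:
        rev[v].add(u)
    win = set(surely_winning)
    queue = deque(win)
    while queue:
        v = queue.popleft()
        for u in rev[v]:
            if u not in win and u not in final:
                win.add(u)
                queue.append(u)
    return win
```

For instance, on the chain 1 -> 2 -> 3 -> 4 with state 4 surely winning and state 2 final, only states 3 and 4 are positively winning: the path from 1 is cut by the final state 2.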
Fix, for the rest of this section, a concurrent game $\mathbb{G}$ with imperfect information equipped with a reachability objective defined from a set $F$ of final states. We also consider $\mathbb{G}^K$ to be the corresponding knowledge game.
To prove Theorem 4.1, one first defines (in a non-constructive way) a knowledge-only uniform strategy $\varphi$ for Eve as follows. We let $\mathcal{W}$
be the set of knowledges made only of almost-surely winning states for Eve (note here that we require that the almost-surely winning strategy is the same for all configurations with the same knowledge).
One can prove that, from a configuration with knowledge in $\mathcal{W}$, Eve always has at least one action which ensures that she remains in $\mathcal{W}$, and we define $\varphi$ as the knowledge-only uniform strategy that chooses at random one of these safe actions. The next proposition shows that $\varphi$ is almost-surely winning for Eve.
The strategy $\varphi$ is almost-surely winning for Eve from those states whose associated knowledge for Eve is in $\mathcal{W}$.
To prove that $\varphi$ is almost-surely winning, one needs to prove that it is almost-surely winning against any strategy of Adam. However, once $\varphi$ is fixed (and as it is a knowledge-only strategy), one gets a $1\frac{1}{2}$-player game in which only Adam is making choices. Proving that $\varphi$ is almost-surely winning is therefore equivalent to proving that Adam cannot positively win in this new game (for a safety objective). For this we use Lemma 1 to argue that it suffices to prove that $\varphi$ is winning against any finite-memory strategy of Adam. This fact permits us to conclude. ∎
Now one can prove Theorem 4.1. First, Eve almost-surely wins $\mathbb{G}$ if and only if she almost-surely wins $\mathbb{G}^K$, if and only if her initial knowledge belongs to $\mathcal{W}$, i.e. (using Proposition 2) if and only if Eve has an almost-surely winning knowledge-only uniform strategy in $\mathbb{G}^K$. Now, to decide whether Eve almost-surely wins $\mathbb{G}^K$, it suffices to check, for every possible knowledge-only uniform strategy $\varphi$ for her, whether it is almost-surely winning. Once $\varphi$ is fixed, it leads, from Adam's point of view, to a $1\frac{1}{2}$-player safety game in which he positively wins if and only if $\varphi$ is not almost-surely winning. Hence Lemma 1 implies that deciding whether $\varphi$ is almost-surely winning can be done in time exponential in the size of $\mathbb{G}^K$, which itself is of exponential size in $\mathbb{G}$. Hence deciding whether a knowledge-only uniform strategy for Eve is winning can be done in doubly exponential time (in the size of $\mathbb{G}$). The set of knowledge-only uniform strategies for Eve is finite and its size is doubly exponential in the size of the game. Hence the overall procedure, that tests every possible such strategy, requires doubly exponential time. As effectivity is immediate, this concludes the proof of Theorem 4.1.
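The naive enumeration at the heart of this procedure can be sketched as follows. All names are ours; the oracle deciding the induced $1\frac{1}{2}$-player safety game is abstracted as a callback, since its actual implementation is the content of Lemma 1.

```python
from itertools import combinations, product

def nonempty_subsets(actions):
    """All non-empty subsets of the action set, as frozensets."""
    acts = sorted(actions)
    return [frozenset(c)
            for r in range(1, len(acts) + 1)
            for c in combinations(acts, r)]

def enumerate_and_check(knowledges, actions, is_almost_sure):
    """A knowledge-only uniform strategy is a choice of one non-empty
    set of authorised actions per knowledge; enumerate all candidates
    and return the first one accepted by the oracle (None if none is)."""
    for choice in product(nonempty_subsets(actions), repeat=len(knowledges)):
        strategy = dict(zip(knowledges, choice))
        if is_almost_sure(strategy):
            return strategy
    return None
```

With $k$ knowledges and $n$ actions this loop visits $(2^n - 1)^k$ candidates, which matches the doubly exponential bound once $k$ is itself exponential in the size of the original game.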
The naive underlying algorithm of Theorem 4.1 turns out to be optimal.
Deciding whether Eve almost-surely wins a concurrent game with imperfect information is a 2-ExpTime-complete problem.
The proof is a generalisation of a similar result showing ExpTime-hardness for concurrent games where only one player is imperfectly informed. The idea is to simulate an alternating exponential-space Turing machine (without input). We design a game where the players describe the run of such a machine: transitions from existential (resp. universal) states are chosen by Eve (resp. Adam) and Adam is also in charge of describing the successive configurations of the machine. To prevent him from cheating, Eve can secretly mark a cell of the tape, and later check whether it was correctly updated (if not she wins). As she cannot store the exact index of the cell (it is of exponential size), she could cheat in this phase: hence Adam secretly marks some bit position and one recalls the value of the corresponding bit of the index of the marked cell: this bit is checked when Eve claims that Adam cheated (if it is wrong then she loses). Eve also wins if the described run is accepting. Eve can also restart the computation whenever she wants (this is useful when she cannot prove that Adam cheated): hence if the machine accepts, the only option for Adam is to cheat, and Eve will eventually catch him with probability one. Now if the machine does not accept, the only option for Eve is to cheat, but this will be detected with positive probability. ∎
4.2 Büchi Objectives
We now consider the problem of deciding whether Eve almost-surely wins a Büchi game. The results and techniques are similar to those for reachability games. In particular, we need to establish the following intermediate result (the proof is very similar to the one of Lemma 1 except that now the winning states are those connected by any kind of path to a surely winning state).
Consider a $1\frac{1}{2}$-player co-Büchi game with imperfect information. Assume that the player has an observation-based strategy that is positively winning. Then she also has an observation-based finite-memory strategy that is positively winning. Moreover, both the strategy and the set of positively winning states can be computed in exponential time.
From Lemma 2 and extra intermediate results we derive our main result. Again, the key idea is to prove that the strategy that plays randomly inside the almost-surely winning region is an almost-surely winning strategy.
For any Büchi concurrent game with imperfect information, one can decide, in doubly exponential time, whether Eve has an almost-surely winning strategy. If Eve has such a strategy, then she has a knowledge-only uniform strategy, and such a strategy can be effectively constructed. The doubly exponential time complexity bound is optimal.
The main contribution of this paper is to prove that one can decide whether Eve has an almost-surely winning strategy in a concurrent game with imperfect information equipped with a reachability objective or a Büchi objective.
A natural question is whether this result holds for other objectives, in particular for co-Büchi objectives. In a recent work, Baier et al. established undecidability of the emptiness problem for probabilistic Büchi automata on infinite words. Such an automaton can be simulated by a $1\frac{1}{2}$-player imperfect information game: the states of the game are the ones of the automaton, they are all equivalent for the player, and therefore an observation-based strategy is an infinite word. Hence a pure (i.e. non-randomised) strategy in such a game coincides with an input word for the automaton. From this fact, Baier et al. derived that it is undecidable whether, in a $1\frac{1}{2}$-player co-Büchi game with imperfect information, Eve has an almost-surely winning pure strategy.
One can also consider the stochastic-free version of this problem (an arena is deterministic iff, for every state and pair of actions, the distribution $\delta(s, \sigma_E, \sigma_A)$ assigns probability 1 to a single state) and investigate whether one can decide if Eve has an almost-surely winning strategy in a deterministic game equipped with a co-Büchi objective. We believe that the $1\frac{1}{2}$-player setting can be reduced to this new one, hence allowing to transfer undecidability results. An even weaker model to consider is the stochastic-free model in which Adam has perfect information about the play.
It may happen that Eve has no almost-surely winning strategy while having a family of strategies $(\varphi_\varepsilon)_{\varepsilon > 0}$ such that $\varphi_\varepsilon$ ensures a win with probability at least $1 - \varepsilon$. Such a family is called limit-surely winning. Deciding the existence of such families is a very challenging problem: indeed, in many practical situations, it is satisfying enough if one can control the risk of failing. Even if those questions have been solved for perfect information games, as far as we know, no result has yet been obtained in the imperfect information setting.
Even if the algorithms provided in this paper are "optimal", they are rather naive (checking all strategies for Eve may cost a lot in practice). Hence, one should look for fixpoint-based algorithms like the ones studied in the literature: this would be of great help for a symbolic implementation, and it could also be a useful step toward a solution of the problem of finding limit-surely winning strategies. Note that there are already efficient techniques and tools for finding sure winning strategies in subclasses of concurrent games with imperfect information [6, 5].
Acknowledgements. The authors want to thank the anonymous reviewers as well as Florian Horn for their useful comments on preliminary versions of this work.
-  L. de Alfaro and T.A. Henzinger. Concurrent omega-regular games. In Proceedings of LICS’00, pages 141–154, 2000.
-  L. de Alfaro, T.A. Henzinger, and O. Kupferman. Concurrent reachability games. Theoretical Computer Science, 386(3):188–217, 2007.
-  C. Baier, N. Bertrand, and M. Größer. On decision problems for probabilistic Büchi automata. In Proceedings of FoSSaCS 2008, volume 4962 of LNCS, pages 287–301. Springer, 2008.
-  N. Bertrand, B. Genest, and H. Gimbert. Qualitative Determinacy and Decidability of Stochastic Games with Signals. In Proceedings of LICS 2009. To appear.
-  D. Berwanger, K. Chatterjee, M. De Wulf, L. Doyen, and T.A. Henzinger. Alpaga: A tool for solving parity games with imperfect information. In Proceedings of TACAS 2009, volume 5505 of LNCS, pages 58–61. Springer, 2009.
-  D. Berwanger, K. Chatterjee, L. Doyen, T.A. Henzinger, and S. Raje. Strategy construction for parity games with imperfect information. In Proceedings of CONCUR 2008, volume 5201 of LNCS, pages 325–339. Springer, 2008.
-  K. Chatterjee. Stochastic $\omega$-Regular Games. PhD thesis, University of California, 2007.
-  K. Chatterjee, L. Doyen, T.A. Henzinger, and J.-F. Raskin. Algorithms for omega-regular games with imperfect information. Logical Methods in Computer Science, 3(3), 2007.
-  K. Chatterjee and T.A. Henzinger. Semiperfect-information games. In Proceedings of FST&TCS 2005, volume 3821 of LNCS, pages 1–18. Springer, 2005.
-  E. Grädel, W. Thomas, and Th. Wilke, editors. Automata, Logics, and Infinite Games: A Guide to Current Research, volume 2500 of LNCS. Springer, 2002.
-  Y. Gurevich and L. Harrington. Trees, automata, and games. In Proceedings of STOC 1982, pages 60–65, 1982.
-  F. Horn. Private communication. February 2009.
-  A. Paz. Introduction to probabilistic automata. Academic Press New York, 1971.
-  P.J. Ramadge and W.M. Wonham. Supervisory Control of a Class of Discrete Event Processes. SIAM Journal on Control and Optimization, 25:206, 1987.
-  J.H. Reif. The complexity of two-player games of incomplete information. Journal of Computer and System Sciences, 29(2):274–301, 1984.