Deviator Detection under Imperfect Monitoring

12/27/2017 ∙ by Dietmar Berwanger, et al.

Grim-trigger strategies are a fundamental mechanism for sustaining equilibria in iterated games: the players cooperate along an agreed path, and as soon as one player deviates, the others form a coalition to play him down to his minmax level. A precondition to triggering such a strategy is that the identity of the deviating player becomes common knowledge among the other players. This can be difficult or impossible to attain in games where the information structure allows only imperfect monitoring of the played actions or of the global state. We study the problem of synthesising finite-state strategies for detecting the deviator from an agreed strategy profile in games played on finite graphs with different information structures. We show that the problem is undecidable in the general case where the global state cannot be monitored. On the other hand, we prove that under perfect monitoring of the global state and imperfect monitoring of actions, the problem becomes decidable, and we present an effective synthesis procedure that covers infinitely repeated games with private monitoring.


1. Introduction

In social situations, a queue acts in a self-stabilising manner: when anyone tries to jump the queue, the others give him a dirty look, and often this suffices to enforce the rule. Distributed protocols are often designed to be self-stabilising in a similar sense: when a fault occurs, it is detected and isolated, and the system recovers to a state in such a way that the computation can proceed.

In general, the design of such systems assumes that all processes cooperate and coordinate their actions towards achieving the system goals. When a process deviates from the protocol (perhaps due to faults), the system needs to detect it and recover from the situation. If such a deviation occurs in an isolated, one-off manner, it may be hard to detect, but when deviations occur repeatedly, it becomes possible to identify the culprit(s). This idea has been applied to intruder detection in security theory.

There are several interesting variations on this theme. One is to ask for protocols that do not require detecting deviators but merely provide resilience. Typical solutions in distributed computing assume a bound on the number of faulty processes and tolerate that many failures. Another consideration relates to the observational limitations of processes. In a distributed system, processes have only a partial view of the system state and often cannot observe all the moves of other processes; these limitations may help a deviator evade detection.

When distributed computing meets game theory, we have further interesting possibilities to consider. A player may act selfishly if this maximises her payoff, even if it means a deviation. In general, a coalition may have no way to hold its members to their prior commitments (dismissively labelled “cheap talk”). However, in the case of repeated play, threats and punishment can ensure that members do not deviate. In this context, game theory studies a variety of mechanisms, notable among them the grim-trigger strategies: start out cooperating, and when any player deviates, punish forever after. Such strategies are important because they play a central role in the so-called Folk theorems of game theory.
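To make the mechanism concrete, here is a minimal sketch of grim trigger in a repeated two-player game, assuming perfect monitoring of actions (every player sees the full action profile after each round). The action names `C`/`D` and the functions `grim_trigger` and `simulate` are illustrative, not taken from the paper.

```python
from typing import Sequence, Tuple

COOPERATE, DEFECT = "C", "D"

def grim_trigger(history: Sequence[Tuple[str, ...]]) -> str:
    """Cooperate until some player has defected in an earlier round, then punish forever.

    `history` lists the action profiles played so far; under perfect monitoring
    every player observes the whole profile after each round.
    """
    if any(DEFECT in profile for profile in history):
        return DEFECT
    return COOPERATE

def simulate(rounds: int, deviator: int, deviation_round: int) -> list:
    """Two grim-trigger players; `deviator` defects once at `deviation_round`."""
    history = []
    for t in range(rounds):
        profile = [grim_trigger(history) for _ in range(2)]
        if t == deviation_round:
            profile[deviator] = DEFECT
        history.append(tuple(profile))
    return history

print(simulate(5, deviator=0, deviation_round=2))
# [('C', 'C'), ('C', 'C'), ('D', 'C'), ('D', 'D'), ('D', 'D')]
```

The sketch relies on every player seeing the complete action profile; this is exactly the assumption that imperfect monitoring removes.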

Nash equilibria provide a robust way to predict how rational players act when each offers a best response to her beliefs about how the others might act, even in situations where players differ in their knowledge of what the others are doing. At a Nash equilibrium, no player has an incentive to deviate unilaterally and shift to a different strategy. Folk theorems then assert that, for any Nash equilibrium outcome vector $x = (x_1, \dots, x_n)$ in an $n$-player infinitely repeated game (with, say, limit-average rewards), each player $i$ can force the outcome $x_i$. Conversely, every “feasible” and enforceable outcome is the outcome of some Nash equilibrium.
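For a finite two-player stage game, the punishment level can be computed directly. The sketch below computes the row player's minmax value in pure strategies only; this is a simplifying assumption, since the mixed minmax relevant in general would require a linear program. The payoff matrix is a hypothetical prisoner's dilemma.

```python
from typing import List

def pure_minmax(payoff: List[List[float]]) -> float:
    """Pure-strategy minmax value of the row player.

    payoff[r][c] is the row player's payoff when she plays r and the opponent
    plays c. The opponent picks the column that minimises the row player's
    best-response payoff.
    """
    columns = range(len(payoff[0]))
    return min(max(row[c] for row in payoff) for c in columns)

# Hypothetical prisoner's dilemma, row player's payoffs (rows/columns: C, D).
pd = [[3, 0],
      [5, 1]]
print(pure_minmax(pd))  # 1
```

In this example, permanent defection by the opponent holds a deviator to a payoff of 1 per round, compared with 3 under mutual cooperation.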

Interestingly, these (and many related) results depend on the ability of players to perfectly monitor the actions of the deviator. In cases where players’ actions may not be directly observable, special solutions are needed, and game theorists have developed a powerful set of techniques for many subclasses of games [9, 12, 1, 14]. Note that this situation pertains more directly to distributed systems where processes are limited in their ability to observe the global state and to record other processes’ actions.

In this paper, we focus on computational questions related to deviator detection under imperfect monitoring. Can we have algorithms that determine, for a given game, whether the deviator's identity can be rendered common knowledge? If so, we would like to construct strategies (for the other players) to achieve this. This problem is related to solving consensus problems on graphs with different communication topologies.

We propose a general technique based on methods from automated synthesis towards achieving an epistemic objective (involving common knowledge). With imperfect information, the synthesis problem is already undecidable for simple reachability objectives, so one might expect only negative results for deviator detection as well. Indeed, deviator detection under imperfect information about the game state is undecidable in general.

However, the essence of the deviator detection problem lies in monitoring player actions, not necessarily the game state. Indeed, in repeated games we often have no states at all! We show in this paper that in such a case, the problem is tractable. The main idea is that the problem can then be studied in the setting of coordination games of incomplete information with finitely many states of nature. These are systems with perfect monitoring of the state, where a player's uncertainty comes only from unobserved actions of other players. These are therefore games with bounded initial uncertainty that can only decrease as play progresses, and this observation leads us to an algorithmic solution.

While the main result of the paper is that deviator detection is decidable under imperfect monitoring of actions (with only a finite amount of uncertainty about the state), we see the contribution of the paper as twofold: to highlight a setting in game theory that is of interest to distributed computing, and to illustrate the use of epistemic objectives, which are of interest to games as well as to distributed systems. We also suggest that methods from automated synthesis may offer new ways of describing sets of equilibrium solutions and of constructing equilibria, which could be of technical interest for bridging game theory and distributed systems [11].

2. Games

We model distributed systems as infinite games with finitely many states. There is a finite set $N$ of players. We refer to a list $x = (x^i)_{i \in N}$ with one element $x^i$ for every player $i \in N$ as a profile. For any such profile, we write $x^{-i}$ to denote the list $(x^j)_{j \in N \setminus \{i\}}$ where the component of Player $i$ is omitted; for an element $x^i$ and a list $x^{-i}$, we denote by $(x^i, x^{-i})$ the full profile $x$.

Game structure.

To describe the game dynamics, we fix, for each player $i \in N$, a set $A^i$ of actions, a set $B^i$ of observations, and a set $V^i$ of local states; these are finite sets. We denote by $A$, $B$, and $V$ the sets of all profiles of actions, observations, and local states, respectively; a profile of local states is also called a global state. The game form is then described by its transition function $\delta : V \times A \to B \times V$.

The game is played in stages over infinitely many periods, starting from a designated initial state $v^0 \in V$ known to all players. In each period $t \geq 1$, starting in a state $v^{t-1}$, every player $i$ chooses an action $a^i_t \in A^i$. Then the transition $\delta(v^{t-1}, a_t) = (b_t, v^t)$ determines the observation $b^i_t$ received privately by each player $i$, and the global successor state $v^t$, from which the play proceeds to period $t+1$.

Thus, a play is an infinite sequence $\pi = v^0, a_1, b_1, v^1, a_2, b_2, v^2, \dots$ following the transitions $(b_t, v^t) = \delta(v^{t-1}, a_t)$ for all $t \geq 1$. A history is a finite prefix $h = v^0, a_1, b_1, v^1, \dots, a_t, b_t, v^t$ of a play. We refer to the number $t$ of stages played as the length of the history. The sequence of observations received by player $i$ along a history $h$ is denoted by $\beta^i(h) = b^i_1 \dots b^i_t$. We assume that each player $i$ always knows her local state and the action she is playing, that is, these data are included in the private observation she receives in each round. However, she is not perfectly informed about the local states or the actions of the other players; therefore we speak of imperfect monitoring. The monitoring function $\beta^i$ induces an indistinguishability relation $\sim^i$ between histories and plays: $h \sim^i h'$ if, and only if, $\beta^i(h) = \beta^i(h')$. This is an equivalence relation between game histories; its classes are called the information sets of player $i$.
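The definitions above can be sketched in Python as follows. This is a minimal illustration, assuming a transition function `delta(v, a)` that returns the observation profile and the successor state, with histories stored as lists of stages $(a_t, b_t, v^t)$; the concrete one-state game and all function names are hypothetical.

```python
from typing import Callable, List, Tuple

State = Tuple[int, ...]                 # global state: one local state per player
Profile = Tuple[str, ...]               # action profile or observation profile
Step = Tuple[Profile, Profile, State]   # one stage (a_t, b_t, v^t)
Delta = Callable[[State, Profile], Tuple[Profile, State]]

def history(delta: Delta, v0: State, actions: List[Profile]) -> List[Step]:
    """Stages of the history generated from v0 by a sequence of action profiles."""
    steps, v = [], v0
    for a in actions:
        b, v = delta(v, a)          # (b_t, v^t) = delta(v^{t-1}, a_t)
        steps.append((a, b, v))
    return steps

def observations(steps: List[Step], i: int) -> Tuple[str, ...]:
    """beta^i(h): the private observations b^i_1 ... b^i_t of player i."""
    return tuple(b[i] for _, b, _ in steps)

def indistinguishable(h: List[Step], g: List[Step], i: int) -> bool:
    """h ~^i g iff player i received the same observation sequence."""
    return observations(h, i) == observations(g, i)

# Hypothetical one-state game: each player observes her own action and
# whether the two actions matched ("=") or not ("!").
def delta(v: State, a: Profile) -> Tuple[Profile, State]:
    same = "=" if a[0] == a[1] else "!"
    return (a[0] + same, a[1] + same), v

h = history(delta, (0, 0), [("x", "x"), ("x", "y")])
g = history(delta, (0, 0), [("x", "x"), ("x", "z")])
print(indistinguishable(h, g, 0), indistinguishable(h, g, 1))  # True False
```

Player 0 cannot tell whether her opponent's second action was `y` or `z`, so the two histories lie in the same information set of player 0 but not of player 1.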

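A strategy for player $i$ is a mapping $s^i : (B^i)^* \to A^i$ that prescribes an action for every observation sequence. Again, we denote by $S$ the set of all strategy profiles $s = (s^i)_{i \in N}$, and by $S^i$ the set of strategies of player $i$. We say that a history or a play $\pi$ follows a strategy $s^i$ if $a^i_{t+1} = s^i(\beta^i(h))$ for every proper prefix $h$ of $\pi$, where $t$ is the length of $h$. Likewise, a history or play follows a profile $s \in S$ if it follows the strategy $s^i$ of each player $i$. The outcome $\mathrm{out}(s)$ of a strategy profile $s$ is the unique play that follows it.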
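Since the transition function is deterministic and the initial state is fixed, the outcome of a profile can be unfolded round by round. The sketch below computes a finite prefix of $\mathrm{out}(s)$, assuming strategies are given as Python functions from a player's observation sequence to an action; the game and both strategies are hypothetical.

```python
from typing import Callable, List, Tuple

Profile = Tuple[str, ...]
Strategy = Callable[[Tuple[str, ...]], str]   # s^i : (B^i)* -> A^i

def outcome_prefix(delta: Callable, v0, strategies: List[Strategy], rounds: int) -> list:
    """First `rounds` stages (a_t, b_t, v^t) of out(s) for the profile `strategies`."""
    obs = [tuple() for _ in strategies]       # beta^i of the current history
    v, steps = v0, []
    for _ in range(rounds):
        a = tuple(s(obs[i]) for i, s in enumerate(strategies))  # a^i_{t+1} = s^i(beta^i(h))
        b, v = delta(v, a)
        steps.append((a, b, v))
        obs = [obs[i] + (b[i],) for i in range(len(strategies))]
    return steps

# Hypothetical one-state game: players see their own action and whether actions matched.
def delta(v, a: Profile):
    same = "=" if a[0] == a[1] else "!"
    return (a[0] + same, a[1] + same), v

def switch_after_mismatch(obs: Tuple[str, ...]) -> str:
    """Play "x" until a mismatch has been observed, then "y" forever."""
    return "y" if any(o.endswith("!") for o in obs) else "x"

def alternate(obs: Tuple[str, ...]) -> str:
    """Alternate between "x" and "y", ignoring what was observed."""
    return "x" if len(obs) % 2 == 0 else "y"

for a, b, _ in outcome_prefix(delta, 0, [switch_after_mismatch, alternate], 3):
    print(a, b)
# ('x', 'x') ('x=', 'x=')
# ('x', 'y') ('x!', 'y!')
# ('y', 'x') ('y!', 'x!')
```

Each player acts only on her own observation sequence, never on the other player's actions directly.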

With the above definition of a strategy, we implicitly assume that players have perfect recall, that is, they may record all the information acquired along a play. Nevertheless, in certain cases, we can restrict our attention to strategy functions computable by automata with finite memory. In this case, we speak of finite-state strategies.
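A finite-state strategy can be represented as a Mealy-style machine that updates a finite memory on each observation and outputs an action determined by the current memory state. The class below is a minimal sketch; its field names and the two-state example machine (a grim-trigger variant driven by hypothetical observations `ok`/`defect`) are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, Sequence, Tuple

@dataclass
class FiniteStateStrategy:
    """Strategy computed by a finite automaton reading the player's observations."""
    initial: Hashable
    update: Dict[Tuple[Hashable, str], Hashable]   # (memory, observation) -> memory
    output: Dict[Hashable, str]                     # memory -> action

    def __call__(self, observation_sequence: Sequence[str]) -> str:
        """Return s^i(observation_sequence) by running the automaton from `initial`."""
        m = self.initial
        for o in observation_sequence:
            m = self.update[(m, o)]
        return self.output[m]

# Two memory states: switch to "punish" on observing a defection, and stay there.
grim = FiniteStateStrategy(
    initial="coop",
    update={("coop", "ok"): "coop", ("coop", "defect"): "punish",
            ("punish", "ok"): "punish", ("punish", "defect"): "punish"},
    output={"coop": "C", "punish": "D"},
)
print(grim(["ok", "ok"]), grim(["ok", "defect", "ok"]))  # C D
```

Although the strategy is defined on unboundedly long observation sequences, it only ever distinguishes two memory states.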

Strategy synthesis.

The task of automated synthesis is to construct finite-state strategies for solving games (presented in a finite way). Depending on the purpose of the model, the notion of solving has different meanings.

One prominent application area in distributed systems is concerned with synthesising coordination strategies for a coalition with common interests against a fixed adversary, the environment or Nature [13, 17, 8]. For this purpose, we assume the coalition to be the set $N \setminus \{0\}$ excluding a designated player $0$. We are interested in win/lose games. The winning condition is described as a set $W$ of plays; a basic example is given by reachability winning conditions, which consist of all plays that reach a designated set $F \subseteq V$ of global states. Here, a solution is a distributed winning strategy for the coalition: a profile $s^{-0}$ such that $\mathrm{out}(s^{-0}, s^0) \in W$ for all $s^0 \in S^0$.

The distributed synthesis problem for coordination strategies asks: for a given game, determine whether there exists a profile of finite-state strategies for $N \setminus \{0\}$ that is winning against player $0$. This problem is well known to be undecidable, already for games with reachability winning conditions.

Theorem 1 ([16]).

The distributed synthesis problem is undecidable for reachability games for two players with imperfect information against an adversary.

If we consider non-zero-sum games, where players have different, possibly overlapping objectives, a standard solution concept is that of Nash equilibrium. A theory of synthesis for such games has been developed over the last decade [10, 5, 4]. While there are several positive results for games of perfect information, synthesis questions are difficult in the context of imperfect information.

In this paper, we are interested in a specific case that lies between the approaches of distributed coordination and Nash equilibrium: synthesis of strategies for detecting an unknown adversary, should one arise, under conditions of imperfect monitoring. We now proceed to define and address this problem.

3. Deviator detection

Unlike the traditional setting concerned with temporal objectives on actions or states assumed in a play, we are interested here in epistemic objectives which refer to attaining knowledge that a certain event has occurred. For an introduction to knowledge in distributed systems, we refer to the book of Fagin, Halpern, Moses, and Vardi [7, Ch. 2].

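Let us fix a game $G$ with the usual notation. An event is a subset $E$ of histories in $G$. The event $E$ occurs at history $h$ if $h \in E$. We say that (the occurrence of) $E$ is private knowledge of Player $i$ at history $h$ if, for any $h' \sim^i h$, it holds that $h' \in E$. Further, an event $E$ is common knowledge among the players of a coalition $C$ at history $h$ if, for every sequence of histories $h = h_0, h_1, \dots, h_k$ and players $i_1, \dots, i_k \in C$ such that $h_{\ell-1} \sim^{i_\ell} h_\ell$ for all $\ell \in \{1, \dots, k\}$, it is the case that $h_k \in E$.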
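The definition of common knowledge amounts to a reachability check: $E$ is common knowledge among $C$ at $h$ exactly when every history reachable from $h$ through the union of the relations $\sim^i$, $i \in C$, lies in $E$. Since indistinguishable histories have the same length and there are finitely many histories of each length, the check can be carried out over the histories of the current length. The breadth-first sketch below assumes these histories are given explicitly, with each player's observation sequence; the example data is hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, Set

def common_knowledge(h: Hashable,
                     event: Set[Hashable],
                     coalition: Iterable[int],
                     obs: Dict[Hashable, Dict[int, tuple]]) -> bool:
    """Check whether `event` is common knowledge among `coalition` at history h.

    obs[g][i] is player i's observation sequence at history g; histories with
    equal sequences for player i are i-indistinguishable.
    """
    coalition = list(coalition)
    seen, queue = {h}, deque([h])
    while queue:
        g = queue.popleft()
        if g not in event:
            return False                 # a chain of ~^i steps leaves the event
        for g2 in obs:
            if g2 not in seen and any(obs[g2][i] == obs[g][i] for i in coalition):
                seen.add(g2)
                queue.append(g2)
    return True

# Hypothetical histories h1..h3 of equal length, observed by players 1 and 2.
obs = {"h1": {1: ("a",), 2: ("p",)},
       "h2": {1: ("a",), 2: ("q",)},
       "h3": {1: ("b",), 2: ("q",)}}
print(common_knowledge("h1", {"h1", "h2"}, [1, 2], obs))  # False: h1 ~1 h2 ~2 h3
print(common_knowledge("h1", {"h1", "h2"}, [1], obs))     # True
```

For a single-player coalition the check reduces to private knowledge, because $\sim^i$ is an equivalence relation.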

Specifically, we are interested in the event that a player $i$ has deviated from a given play. To describe this, we define, for each play $\pi$ and for every player $i$, the event $D^i_\pi$ consisting of all histories $h$ that disagree with $\pi$ such that, in the first round where the prefixes of $h$ and $\pi$ disagree, they differ only in the action of player $i$. (Since the transition function is deterministic and the initial state is fixed, the first difference between two histories can only occur at an action profile.) Obviously, the sets $D^i_\pi$ are closed under prolongation, that is, for each $h \in D^i_\pi$, any prolongation history $hh'$ belongs to $D^i_\pi$ as well. Likewise, if a coalition attains common knowledge of $D^i_\pi$ at a history $h$, it attains common knowledge of $D^i_\pi$ at every prolongation history of $h$.
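Membership of a history in $D^i_\pi$ can be decided by locating the first round where the history and the corresponding prefix of $\pi$ differ and comparing the two action profiles there. A minimal sketch, assuming both are given as lists of action profiles (states and observations then follow from the deterministic transition function) and that the given prefix of $\pi$ is at least as long as the history:

```python
from typing import Optional, Sequence, Tuple

Profile = Tuple[str, ...]

def first_difference(h: Sequence[Profile], pi: Sequence[Profile]) -> Optional[int]:
    """Index of the first round where h and the prefix of pi disagree, or None.

    Assumes len(pi) >= len(h); zip would otherwise hide a difference.
    """
    for t, (a, c) in enumerate(zip(h, pi)):
        if a != c:
            return t
    return None

def in_deviation_event(h: Sequence[Profile], pi: Sequence[Profile], i: int) -> bool:
    """h is in D^i_pi iff h disagrees with pi and the first differing profiles differ only at i."""
    t = first_difference(h, pi)
    if t is None:
        return False
    a, c = h[t], pi[t]
    return a[i] != c[i] and all(a[j] == c[j] for j in range(len(a)) if j != i)

pi = [("x", "x", "x")] * 4                        # prefix of the agreed play
h = [("x", "x", "x"), ("x", "y", "x"), ("y", "y", "y")]
print(in_deviation_event(h, pi, 1), in_deviation_event(h, pi, 0))  # True False
```

The later round `("y", "y", "y")` does not affect the answer, which reflects the closure of $D^i_\pi$ under prolongation.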

Now, let $T$ be a designated set of plays in $G$. A deviator detection strategy with respect to $T$ is a strategy profile $s$ such that:

  1. the outcome $\pi = \mathrm{out}(s)$ belongs to $T$, and

  2. for each player $i$ and every strategy $r^i \in S^i$, if the outcome $\pi' = \mathrm{out}(r^i, s^{-i})$ disagrees with $\pi$, then the coalition $N \setminus \{i\}$ attains common knowledge of $D^i_\pi$ at some history of $\pi'$.

The synthesis problem for deviator detection is the following: given a game $G$ with a target set $T$ specified by a finite-state automaton, decide whether there exists a finite-state strategy profile for deviator detection with respect to $T$ and, if so, construct one.

3.1. Deviator detection as a coordination problem

Alternatively, we can cast the deviator detection problem as a more standard problem of distributed synthesis with temporal objectives. Informally, this is done by adding a new player, Nature, who can either remain silent or, at any point, take over the identity of an actual player, deviate from his intended action, and continue playing on his behalf. The deviation of Nature takes the game into a fresh copy associated with the corrupted player $i$, where the only way for the remaining coalition to win is by issuing a simultaneous action in which they all expose the identity of $i$; if the exposure action is not taken in consensus by all players except $i$, the game is lost.

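More precisely, we transform the deviator detection game $G$ into a coordination game $G'$ against Nature, which we call the exposure game, as follows. First, we add for each player $i$ actions $\mathrm{expose}_j$, for $j \in N$, that allow her to expose a deviation by player $j$, that is, we set $A'^i = A^i \cup \{\mathrm{expose}_j : j \in N\}$; we also use the shorthand $\mathrm{expose}_j$ to denote any action profile where the coalition $N \setminus \{j\}$ chooses $\mathrm{expose}_j$ in consensus. Further, we involve a new player $0$ with actions that allow him to stay silent (by choosing $\varepsilon$) or to corrupt the action of any other player, that is, $A^0 = \{\varepsilon\} \cup \{(i, c^i) : i \in N, c^i \in A^i\}$. His local states are $V^0 = \{\varepsilon\} \cup N$. Moreover, we include (global) sink states $\mathit{win}$ and $\mathit{lose}$. The observation sets remain unchanged. The transitions of the new game $G'$ follow the original transitions as long as Nature stays silent: $\delta'((v, \varepsilon), (a, \varepsilon)) = (b, (v', \varepsilon))$ for $\delta(v, a) = (b, v')$. When Nature decides to deviate from the intended action $a^i$ of a player $i$, his local state changes from $\varepsilon$ to $i$: $\delta'((v, \varepsilon), (a, (i, c^i))) = (b, (v', i))$ for $\delta(v, (c^i, a^{-i})) = (b, v')$ and $c^i \neq a^i$. For the transitions in the game copy where player $i$ is corrupted, we set $\delta'((v, i), (a, (i, c^i))) = (b, (v', i))$ for $\delta(v, (c^i, a^{-i})) = (b, v')$, and $\delta'((v, i), \mathrm{expose}_i) = \mathit{win}$; all other moves involving exposure actions lead to $\mathit{lose}$. The (temporal) winning condition $W'$ of the new game consists of the plays in $T$ where Nature remained silent and of the plays that reach $\mathit{win}$.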
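The construction can be sketched as a wrapper around the original transition function. In this hypothetical encoding, a non-sink state is a pair `(v, corrupted)` with `corrupted` equal to `None` while Nature is silent; Nature's move is `None` or a pair `(i, c)`; exposure actions are tuples `("expose", j)`; and the observation on entering a sink, which the text leaves unspecified, is a fixed marker. A corruption that does not change the action is treated as silence. All of these are illustrative assumptions, not the paper's formal definition.

```python
from typing import Callable, Optional, Sequence, Tuple

WIN, LOSE, SINK_OBS = "win", "lose", "sink"   # hypothetical sink states and marker

def is_expose(a) -> bool:
    return isinstance(a, tuple) and len(a) == 2 and a[0] == "expose"

def exposure_delta(delta: Callable, n: int) -> Callable:
    """Transition function of the exposure game built from the original `delta`."""
    def step(state, actions: Sequence, nature: Optional[Tuple[int, object]]):
        if state in (WIN, LOSE):
            return (SINK_OBS,) * n, state
        v, corrupted = state
        others = [j for j in range(n) if j != corrupted]
        if corrupted is not None and all(actions[j] == ("expose", corrupted) for j in others):
            return (SINK_OBS,) * n, WIN            # consensus exposure of the deviator
        if any(is_expose(actions[j]) for j in others):
            return (SINK_OBS,) * n, LOSE           # any other use of exposure actions
        acts = list(actions)
        if corrupted is None:
            if nature is not None and nature[1] != acts[nature[0]]:
                corrupted = nature[0]              # Nature starts deviating for player i
                acts[corrupted] = nature[1]
        else:
            if nature is None or nature[0] != corrupted:
                raise ValueError("Nature must keep playing for the corrupted player")
            acts[corrupted] = nature[1]            # Nature plays on the player's behalf
        b, v2 = delta(v, tuple(acts))
        return b, (v2, corrupted)
    return step

def delta(v, a):
    return tuple(a), v      # hypothetical: each player observes only her own action

step = exposure_delta(delta, 3)
b, s = step((0, None), ("x", "x", "x"), (1, "y"))                   # Nature corrupts player 1
print(s)                                                             # (0, 1)
print(step(s, (("expose", 1), "x", ("expose", 1)), (1, "x"))[1])    # win
print(step(s, (("expose", 2), "x", ("expose", 2)), (1, "x"))[1])    # lose
```

The winning condition $W'$ is not encoded here; it would be checked on plays, accepting those that stay silent and lie in $T$ or that reach `WIN`.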

We can analyse the exposure game in terms of the Knowledge of Preconditions principle formulated by Moses [15]. Once a deviation has occurred, the coalition can win only by reaching $\mathit{win}$ via a simultaneous consensus action, which requires common knowledge of the identity of the deviator. Conversely, deviator detection strategies in $G$ can readily be used to win the exposure game.

Lemma 2.

Every winning coordination strategy for the coalition $N$ in the game $G'$ against $0$ corresponds to a deviator-detection strategy with respect to $T$ in $G$, and vice versa.

3.2. General undecidability

The translation of deviator detection games into coordination games shows the problem under a different angle, but it does not bring us closer to an algorithmic solution. In the general setting of imperfect monitoring, we obtain coordination games between multiple players with imperfect information, for which the synthesis problem is undecidable, as pointed out in Theorem 1.

Indeed, it turns out that under imperfect monitoring, detecting deviators is no easier than coordinating against an opponent to reach a target set.

Theorem 3.

The synthesis problem for deviator detection strategies is undecidable for games with imperfect monitoring.

Proof.

Consider an arbitrary coordination game $G$ with three players $1$, $2$, and $0$, where the coalition $\{1, 2\}$ seeks to reach a set $F$ of states under imperfect monitoring. We reduce the synthesis problem for this game to one in a deviator detection game $H$ among four players $1$, $2$, $3$, and $4$, in which $1$ and $2$ play the same role as in $G$, whereas $3$ and $4$ both take the role of player $0$. The new game contains two disjoint copies $G_3$, $G_4$ of $G$; the actions of $4$ are ignored in the former, and those of $3$ in the latter. The game $H$ starts in a fresh state $v_\circ$ at which it loops with a fixed action profile; the designated set $T$ consists only of this looping play. The actions of players $1$ and $2$ at this state are perfectly observable to all players, so any deviation by them from the loop at $v_\circ$ is detected instantly. In contrast, the deviations of player $3$ or $4$ generate the same (fresh) observation to $1$ and $2$, and they lead to the initial state of $G_3$ and $G_4$, respectively. These two component games evolve in the same way, with the only difference that, when switching to a target state of $F$ in $G_3$, the observation $o_3$ is sent to all players, whereas the observation $o_4$ is sent when reaching $F$ in $G_4$.

Thus, for any deviator detection strategy $s$, upon deviation of either $3$ or $4$ from the initial loop, players $1$ and $2$ must coordinate to reach the target set $F$ to identify the deviator. Hence, $(s^1, s^2)$ yields a solution of the coordination problem. Conversely, any solution of the coordination problem leads to a state in $F$ where the deviator is revealed, so it provides a deviator detection strategy. Since the synthesis problem for coordination problems is undecidable, according to Theorem 1, it follows that the synthesis problem for deviator detection strategies is also undecidable. ∎

4. Perfect monitoring of states

As we have seen in the previous section, the algorithmic intractability of games where the global state can be hidden from the players over an unbounded duration of time is preserved when we move from coordination to the deviator detection problem. However, our setting involves two sources of uncertainty: the global state and the played actions. In this section, we consider the case where the uncertainty comes only from the actions played by the other players. This is a generalisation of the setting of infinitely repeated games, which can be seen as games with only one global state.

As an example, consider the following simple variant of a beeping model [6]. There are $n$ nodes in a network represented by an undirected graph. The nodes communicate synchronously. In every round, a node can either beep or stay silent. A silent node can observe whether at least one of its neighbours beeped. We assume that the network is common knowledge. We are interested in distributed protocols under which some temporal condition is ensured (e.g., no more than a quarter of the nodes beep in the same round) and that are additionally deviator-proof, in the sense that whenever a node deviates from the protocol, the protocol followed by the remaining nodes allows them to reach a consensus on the identity of the deviator. This question can be represented as a deviator-detection problem among $n$ players, each with two actions (beep or stay silent) and two observations, telling whether any neighbour beeped or not in the current round. As the effect of an action profile is the same in every round, the game has only one global state.

Still, the synthesis problem exhibits some complexity. Partly, this is due to the structure of the observation functions encoded by the network graph. For instance, one can observe that no deviator-detection strategy can exist on networks that are not 2-connected: any deviation has to be detected by at least two witnesses, and every node that is not a direct witness needs to be eventually informed via at least two disjoint paths. But the greater challenge comes from the dynamics of communicating the identity of the deviator: in contrast to more traditional synthesis problems for temporal conditions, whether a play $\pi$ is successful is determined not only by the strategic choices taken along $\pi$ itself, but also by the choices taken on histories connected to $\pi$ via the players' indistinguishability relations.
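A minimal sketch of the beeping model's observation function, together with a brute-force check of the necessary condition that the network be two-connected (2-connected: at least three nodes and still connected after removing any single node). The adjacency-set encoding and the convention that a beeping node observes `False` (the text only specifies what silent nodes observe) are assumptions for illustration.

```python
from typing import Dict, Set

def beep_observations(adj: Dict[int, Set[int]], beeps: Set[int]) -> Dict[int, bool]:
    """Observation of each node in one round.

    A silent node observes whether at least one neighbour beeped; for a beeping
    node we assume (hypothetically) that it observes False.
    """
    return {v: (v not in beeps) and bool(adj[v] & beeps) for v in adj}

def connected(adj: Dict[int, Set[int]], removed: Set[int]) -> bool:
    """Is the graph connected after deleting the nodes in `removed`?"""
    nodes = [v for v in adj if v not in removed]
    if not nodes:
        return True
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:
        v = stack.pop()
        for w in adj[v] - removed - seen:
            seen.add(w)
            stack.append(w)
    return len(seen) == len(nodes)

def two_connected(adj: Dict[int, Set[int]]) -> bool:
    """Necessary condition for deviator detection: no single node disconnects the graph."""
    return len(adj) >= 3 and connected(adj, set()) and all(connected(adj, {v}) for v in adj)

ring = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}
path = {0: {1}, 1: {0, 2}, 2: {1}}
print(beep_observations(ring, {0}))               # {0: False, 1: True, 2: False, 3: True}
print(two_connected(ring), two_connected(path))   # True False
```

On the path, the middle node is the only witness of any beep by an endpoint, which illustrates why a single-witness topology cannot support deviator detection.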

Concretely, we consider games that allow perfect monitoring of the state in the sense that, for every observation sequence $\beta$ received by any player $i$ along a history, there exists precisely one global state that is reachable by a history on which player $i$ receives the observation sequence $\beta$. In other words, all histories in an information set of a player end at the same global state. The condition is obviously met if we include the current global state in the observation of each player. Indeed, every finite game with perfect monitoring of the state can be transformed effectively into one where all the players can observe the current state.
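For a finite game, the condition can be checked with a subset construction for each player: starting from $\{v^0\}$, follow every action profile and group the successor states by the player's observation. Each subset reached collects the states reachable by histories sharing one observation sequence, so perfect monitoring of the state holds iff every subset reached is a singleton. A minimal sketch, assuming the game is given as an explicit transition function over finite action sets; the two example games are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, Hashable, Sequence

def perfect_state_monitoring(delta: Callable, v0: Hashable,
                             action_sets: Sequence[Sequence], n: int) -> bool:
    """True iff, for every player, each observation sequence determines the global state."""
    profiles = list(product(*action_sets))
    for i in range(n):
        start = frozenset([v0])
        seen, frontier = {start}, [start]
        while frontier:
            S = frontier.pop()
            if len(S) > 1:
                return False                     # player i cannot tell these states apart
            successors: Dict[Hashable, set] = {}
            for v in S:
                for a in profiles:
                    b, w = delta(v, a)
                    successors.setdefault(b[i], set()).add(w)
            for nxt in map(frozenset, successors.values()):
                if nxt not in seen:              # finitely many subsets, so this terminates
                    seen.add(nxt)
                    frontier.append(nxt)
    return True

def hidden(v, a):
    return (a[0], a[1]), a[0]              # new state = a[0]; player 1 sees only a[1]

def visible(v, a):
    return (a[0], (a[1], a[0])), a[0]      # player 1 additionally observes the new state

acts = [("x", "y"), ("x", "y")]
print(perfect_state_monitoring(hidden, "x", acts, 2))   # False
print(perfect_state_monitoring(visible, "x", acts, 2))  # True
```

The second game illustrates the remark above: adding the current state to each player's observation establishes perfect monitoring of the state.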

Our main technical result establishes that, under perfect monitoring of states, the deviator detection problem is algorithmically tractable in spite of imperfect private monitoring of actions.

Theorem 4.

The synthesis problem for deviator-detection strategies is effectively solvable for games with perfect monitoring of the state and imperfect monitoring of actions.

The proof relies on a more general result which states, informally, that the synthesis problem for coordination games with a bounded amount of uncertainty is algorithmically tractable. To formulate this more precisely, let us fix a set $N$ of players with their sets of actions, observations, and local states; the set $N$ includes a designated player $0$, Nature. Consider a finite collection $G_1, \dots, G_m$ of games with perfect monitoring of the state over the fixed action, observation, and state spaces, together with a winning condition $W$ common to all these games. We define the sum game $G$ over the collection as a game with a fresh initial node at which Nature chooses the initial node of any of the games $G_k$ in the collection; no information about this move is delivered to the other players in $N \setminus \{0\}$. Note that the sum $G$ may not allow perfect monitoring of the state. For this sum game $G$, we consider the task of synthesising a coordination strategy for the coalition $N \setminus \{0\}$ ensuring that the outcome either is winning with respect to $W$ or reveals the initial choice of Nature. That is, we require, for every play $\pi$ which follows the strategy, that $\pi \in W$ or that, at some history in $\pi$, the players attain common knowledge about the component game that has been chosen. We call this a revelation game over $G_1, \dots, G_m$ with condition $W$.

In game-theoretic terminology, the sum game constructed above is actually a game of incomplete information: the uncertainty about the global state of the game is due to not knowing which of the finitely many component games is being played. Nevertheless, as the component games deliver different observations, the players may be able to recover this missing information. It turns out that this restricted form of imperfect information is algorithmically tractable. The setting is similar to that of multi-environment Markov decision processes studied in [18].

Theorem 5.

The synthesis problem is effectively solvable for revelation games on components with perfect monitoring of the state. Moreover, the set of all winning strategies admits a regular representation.

The idea is to keep track of the knowledge that players have about the index of the actual component game while the play proceeds. This knowledge can be represented by epistemic structures similar to the ones used in [2]. Here it is sufficient to consider epistemic structures on a subset $K \subseteq \{1, \dots, m\}$ of component indices, with epistemic equivalence relations $\sim^i$ relating indices $k, \ell \in K$ whenever player $i$ considers it possible to be in component $\ell$ if the actual history is in component $k$. The tracking construction associates to each history $h$ a structure that is strongly connected via these relations; we call this structure the epistemic state of $h$.
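To illustrate the tracking idea in a simplified setting, suppose the coalition's strategies are fixed and, for each component $k$ chosen by Nature, the resulting history yields known observation sequences for every player. This is a hypothetical simplification: the actual construction also accounts for Nature's moves within the components. Then $k \sim^i \ell$ holds when player $i$'s observation sequences agree, and the epistemic state at the actual component is its class under the union of these relations; a singleton class means the component is common knowledge.

```python
from typing import Dict, Set, Tuple

def epistemic_state(actual: int, obs: Dict[int, Dict[int, Tuple[str, ...]]]) -> Set[int]:
    """Component indices connected to `actual` via the players' relations ~^i.

    obs[k][i] is player i's observation sequence if component k was chosen;
    k ~^i l holds iff obs[k][i] == obs[l][i].
    """
    state, frontier = {actual}, [actual]
    while frontier:
        k = frontier.pop()
        for l in obs:
            if l not in state and any(obs[l][i] == obs[k][i] for i in obs[k]):
                state.add(l)
                frontier.append(l)
    return state

# Hypothetical observations of players 1 and 2 after two rounds in three components.
obs = {1: {1: ("a", "a"), 2: ("p", "p")},
       2: {1: ("a", "b"), 2: ("p", "p")},
       3: {1: ("b", "b"), 2: ("q", "q")}}
print(epistemic_state(1, obs))   # {1, 2}: player 2 cannot tell components 1 and 2 apart
print(epistemic_state(3, obs))   # {3}: the component is common knowledge
```

Because there are finitely many components, there are only finitely many possible epistemic states, which is what makes a finite abstraction conceivable.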

Intuitively, the construction represents the actions in the original game abstractly by their effect on the uncertainty about the component. In contrast to the concrete actions in the game, which can be monitored only imperfectly, the abstract updates on the epistemic structure can be monitored perfectly; the resulting game can thus be solved with standard methods as a game of perfect information, where the winning condition asks to satisfy $W$ or to reach an epistemic structure over a singleton, representing that the players have attained common knowledge about the actual component. The perfect-information solution yields a regular representation of the set of winning strategies over a product alphabet of global game states and epistemic states. In the full paper, we show that this abstraction can be maintained by a finite-state construction and that it allows a solution to be represented whenever one exists.
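Among the standard methods for perfect-information games is the classical attractor computation for reachability. The sketch below assumes a turn-based game graph in which each position belongs either to the coalition or to Nature; the abstract game described here is concurrent and also involves $W$, so this only illustrates the fixpoint for the reachability part (target positions would be those whose epistemic state is a singleton). The example graph is hypothetical.

```python
from typing import Dict, Hashable, Set

def attractor(coalition_positions: Set[Hashable],
              edges: Dict[Hashable, Set[Hashable]],
              target: Set[Hashable]) -> Set[Hashable]:
    """Positions from which the coalition can force a visit to `target`.

    A coalition position joins once some successor is in the attractor;
    a Nature position joins once all of its successors are.
    """
    attr = set(target)
    changed = True
    while changed:
        changed = False
        for v, succ in edges.items():
            if v in attr or not succ:
                continue
            if (v in coalition_positions and succ & attr) or \
               (v not in coalition_positions and succ <= attr):
                attr.add(v)
                changed = True
    return attr

edges = {"a": {"b", "c"}, "b": {"goal"}, "c": {"c", "goal"},
         "d": {"goal", "d"}, "goal": {"goal"}}
print(sorted(attractor({"a", "c"}, edges, {"goal"})))  # ['a', 'b', 'c', 'goal']
```

Position `d` is excluded because Nature can keep the play looping there forever.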

To prove Theorem 4 using the result of Theorem 5, we view the deviator detection problem as a revelation game. Towards this, we adapt the exposure game $G'$ constructed in Subsection 3.1 for a given deviator detection problem $G$ with target set $T$. The exposure game is already close to the setting of revelation games, but there is one twist: besides choosing the deviator, in the exposure game Nature can also choose the period in which to deviate. To account for this, we transform $G'$ by letting Nature pick a candidate deviator $i$ in the first move; this choice is hidden from the other players. In every later round, Nature can choose either to remain silent or to corrupt the action of player $i$. As the winning condition for the new game, we fix the set of all plays in the target set $T$ where Nature remained silent. The obtained revelation game has the same set of solutions as the original deviator detection problem.

5. Conclusion

Thus, what we have here is a building block for constructing equilibria in games based on epistemic states. We are interested primarily in the issue of detecting a deviation from an agreed strategy profile. This task is more specific than constructing equilibria by detecting deviations from the set of distributed strategies that ensure a win, as done, for instance, in [3]. The crucial difference lies in the fact that, in the latter case, the deviation events can be described by a (regular) set of game histories, while in our setting, the notion of deviation is relative to a strategy profile that is not fixed within the game structure. To illustrate the difference, consider the example of a beeping model from Section 4 with the trivial target set that contains all possible plays. Obviously, every strategy profile is an equilibrium here, but deviator-detection strategies nevertheless remain intricate.

Our abstraction from imperfect monitoring of actions to games with perfect information is fairly generic. The central insight is that, if only a finite amount of information is hidden at the beginning of a play, we can decide whether the coalition can recover it. In distributed systems, there is a wide variety of situations that involve only imperfect observation of actions and where uncertainty about system states may be bounded. Hence, we reasonably expect that these techniques will be applicable not only to deviator detection, but also to other algorithmic questions on inferring global information in such systems.


Acknowledgements

This work was supported by the Indo-French Joint Research Unit ReLaX (UMI CNRS 2000).

References

  • [1] Massimiliano Amarante, Recursive structure and equilibria in games with private monitoring, Economic Theory 22 (2003), no. 2, 353–374.
  • [2] Dietmar Berwanger, Łukasz Kaiser, and Bernd Puchala, A perfect-information construction for coordination in games, Proceedings of Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2011), LIPIcs, vol. 13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2011, pp. 387–398.
  • [3] Patricia Bouyer, Romain Brenguier, Nicolas Markey, and Michael Ummels, Concurrent games with ordered objectives, Proceedings of Foundations of Software Science and Computational Structures (FoSSaCS 2012), 2012, pp. 301–315.
  • [4] Patricia Bouyer, Romain Brenguier, Nicolas Markey, and Michael Ummels, Pure Nash equilibria in concurrent games, Logical Methods in Computer Science 11 (2015), no. 2:9.
  • [5] Krishnendu Chatterjee, Laurent Doyen, Emmanuel Filiot, and Jean-François Raskin, Doomsday equilibria for omega-regular games, Verification, Model Checking, and Abstract Interpretation, LNCS, vol. 8318, Springer, 2014, pp. 78–97.
  • [6] Alejandro Cornejo and Fabian Kuhn, Deploying wireless networks with beeps, Distributed Computing: 24th International Symposium, DISC 2010. Proceedings (Nancy A. Lynch and Alexander A. Shvartsman, eds.), Springer, 2010, pp. 148–162.
  • [7] Ronald Fagin, Joseph Y. Halpern, Yoram Moses, and Moshe Y. Vardi, Reasoning about knowledge, MIT Press, 1995.
  • [8] B. Finkbeiner and S. Schewe, Uniform distributed synthesis, Proc. of Logic in Computer Science (LICS’05), IEEE, 2005, pp. 321–330.
  • [9] Drew Fudenberg, David I. Levine, and Eric Maskin, The Folk Theorem with Imperfect Public Information, Econometrica 62 (1994), no. 5, 997–1039.
  • [10] Julian Gutierrez and Michael Wooldridge, Equilibria of concurrent games on event structures, Computer Science Logic (CSL) and Logic in Computer Science (LICS) (New York, NY, USA), CSL-LICS ’14, ACM, 2014, pp. 46:1–46:10.
  • [11] Joseph Y. Halpern, Computer science and game theory: A brief survey, CoRR abs/cs/0703148 (2007).
  • [12] Michihiro Kandori and Hitoshi Matsushima, Private observation, communication and collusion, Econometrica 66 (1998), no. 3, 627–652.
  • [13] Orna Kupferman and Moshe Y. Vardi, Synthesizing distributed systems, Proc. of LICS ’01, IEEE Computer Society Press, June 2001, pp. 389–398.
  • [14] George Mailath and Larry Samuelson, Repeated games and reputations: Long-run relationships, Oxford University Press, 2006.
  • [15] Yoram Moses, Relating knowledge and coordinated action: The knowledge of preconditions principle, Theoretical Aspects of Rationality and Knowledge, TARK 2015, Proc., EPTCS, vol. 215, 2015, pp. 231–245.
  • [16] Gary L. Peterson and John H. Reif, Multiple-Person Alternation, Proc. 20th Annual Symposium on Foundations of Computer Science (FOCS 1979), IEEE, 1979, pp. 348–363.
  • [17] Amir Pnueli and Roni Rosner, Distributed reactive systems are hard to synthesize, Proceedings of the 31st Annual Symposium on Foundations of Computer Science, (FoCS ’90), 1990, pp. 746–757.
  • [18] Jean-Francois Raskin and Ocan Sankur, Multiple-Environment Markov Decision Processes, Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2014), LIPIcs, vol. 29, Schloss Dagstuhl – Leibniz-Zentrum für Informatik, 2014, pp. 531–543.