The Complexity of Concurrent Rational Synthesis

07/21/2017 ∙ by Rodica Condurache, et al. ∙ LACL UPEC

In this paper, we investigate the rational synthesis problem for concurrent game structures for a variety of objectives ranging from reachability to Muller conditions. We propose a new algorithm that establishes the decidability of the non-cooperative rational synthesis problem and relies solely on game-theoretic techniques, as opposed to previous approaches that are logic-based. Thanks to this approach, we construct a zero-sum turn-based game that can be adapted to each of the aforementioned objectives, and thus obtain new complexity results. In particular, we show that the problem is PSpace-complete for reachability, safety, Büchi, and co-Büchi conditions, and that it is PSpace-hard and in ExpTime for parity, Muller, Streett, and Rabin conditions.


1 Introduction

The synthesis problem aims at automatically designing a program from a given specification. This formal problem has several applications in the design of interactive systems, i.e., systems interacting with an environment. From a formal point of view, the synthesis problem is traditionally modeled as a zero-sum turn-based game. The system and the environment are modeled by two players with opposite interests. The objective of the system is the desired specification. Hence, a strategy that allows the system to achieve its objective against any behavior of the environment is a winning strategy, and it is exactly the program to synthesize.

For some time, the approach described above was the standard in controller synthesis. However, due to the variety of systems to model, such a pessimistic view is not always the most faithful one. For instance, consider a system that consists of a server and clients. Assuming that all the agents have opposite interests is not realistic. Indeed, from a design perspective, the purpose of the server is to handle the incoming requests. On the other hand, each client is only concerned with its own request and wants it granted. None of the agents involved in this interaction has antagonistic purposes. The setting of non-zero-sum games was proposed as a model with more realistic assumptions.

In a non-zero-sum game, each agent is equipped with a personal objective, and the system is just a regular agent in the game. The agents interact, aiming at achieving the best outcome. The best outcome in this setting is often formalized by the concept of Nash equilibrium. Unfortunately, a solution in this setting offers no guarantee that the specification of a given agent is achieved, whereas in a synthesis context one wants to enforce a specification for one agent or a subset of the agents.

The rational synthesis problem was introduced as a generalization of the synthesis problem to environments with multiple agents [4]. It aims at synthesizing a Nash equilibrium such that the induced behavior satisfies a given specification. This formulation enjoys nice algorithmic properties, since it matches the complexity bound of the classical synthesis problem. Later on, another version of the problem was proposed where the agents are rational but not cooperative [6, 7]. In the former formalization, the specification is guaranteed as long as the agents agree to behave according to the chosen equilibrium. Otherwise, anything can happen; in particular, the agents can play another equilibrium that does not satisfy the specification. In the Non-Cooperative Rational Synthesis Problem (NCRSP), the system has to ensure that the specification holds in every equilibrium (cf. Section 2.3 for a formal definition and Figure 0(a) for an example). Solutions for both problems were presented for specifications expressed in Linear Temporal Logic (LTL). They rely on the fact that the problems can be expressed in a decidable fragment of a logic called Strategy Logic, and the resulting algorithm runs in 2-ExpTime. While expressing the problem in a decidable fragment of Strategy Logic gives an immediate solution, it may also hide structural properties that could be exploited to design faster algorithms for less expressive objectives, in particular for specifications such as reachability, liveness, or fairness.

In [3], the first author took part in a work that considered this very problem for specific objectives such as reachability, safety, and Büchi in a turn-based interaction model, and established complexity bounds for each objective.

In this paper, we consider the problem of non-cooperative rational synthesis with concurrent interactions. We address this problem for a variety of objectives and give complexity bounds, tight for several of them, relying exclusively on techniques inspired by the theory of zero-sum games. Concurrency between agents raises a formal challenge, as the techniques used in [3] do not directly extend. Intuitively, when the interaction is turn-based, one can construct a tree automaton that accepts solutions for the rational synthesis problem, and the nodes of an accepted tree are exactly the vertices of the game. This correspondence greatly simplifies the treatment of deviations, but it does not carry over to concurrent games.

In Section 3, we present an alternative algorithm that solves the general problem for LTL specifications. This algorithm constructs a zero-sum turn-based game played between Constructor, who tries to construct a solution, and Spoiler, who tries to falsify the proposed solution. We then show in Section 5 how to use this algorithm to solve the NCRSP for reachability, safety, Büchi, co-Büchi, and Muller conditions. We also observe that we match the complexity results for the NCRSP in turn-based games.

2 Preliminaries

2.1 Concurrent Game Structures

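A concurrent game structure is a tuple $\mathcal{G} = (\mathrm{St}, s_0, \mathrm{Agt}, (\mathrm{Act}_i)_{i \in \mathrm{Agt}}, \mathrm{Tab})$, where $\mathrm{St}$ is the set of states in the game, $s_0 \in \mathrm{St}$ is the initial state, $\mathrm{Agt} = \{0, 1, \ldots, n\}$ is the set of agents, $\mathrm{Act}_i$ is the set of actions of Agent $i$, and $\mathrm{Tab} : \mathrm{St} \times \mathrm{Act} \to \mathrm{St}$, with $\mathrm{Act} = \prod_{i \in \mathrm{Agt}} \mathrm{Act}_i$, is the transition table.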

Note that we consider game structures that are complete and deterministic. That is, from each state $s \in \mathrm{St}$ and any tuple of actions $a \in \mathrm{Act}$, there is exactly one successor state $\mathrm{Tab}(s, a)$.

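A play in the game structure $\mathcal{G}$ is an infinite sequence $\rho = s_0 a_0 s_1 a_1 \cdots$ of states and action profiles in $(\mathrm{St} \cdot \mathrm{Act})^\omega$ such that $s_0$ is the initial state and, for all $k \geq 0$, $s_{k+1} = \mathrm{Tab}(s_k, a_k)$.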

Throughout the paper, for every word $w$ over any alphabet, we denote by $w[k]$ the $k$-th letter of $w$, and by $w[..k]$ the prefix of $w$ of size $k$.

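By $\rho|_{\mathrm{St}}$ we mean the projection of $\rho$ over $\mathrm{St}$, and $\mathrm{Plays}(\mathcal{G})$ is the set of all the plays in the game structure $\mathcal{G}$. We call history any finite sequence in $(\mathrm{St} \cdot \mathrm{Act})^* \cdot \mathrm{St}$. For a history $h$, we denote by $h|_{\mathrm{St}}$ its projection over $\mathrm{St}$, and by $\mathrm{last}(h)$ the last element of $h$. We denote by $\mathrm{Hist}(\mathcal{G})$ the set of all the histories.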

In this paper we allow agents to see the actions played between states. Therefore, they behave depending on the past sequence of states and tuples of actions.

[Strategy and strategy profile] A strategy for Agent $i$ is a mapping $\sigma_i : \mathrm{Hist}(\mathcal{G}) \to \mathrm{Act}_i$.

A strategy profile is a tuple $\sigma = (\sigma_0, \sigma_1, \ldots, \sigma_n)$ of strategies, and by $\sigma_i$ we denote the strategy at the $i$-th position (that of Agent $i$).

Also, $\sigma_{-i}$ is the partial strategy profile obtained from the strategy profile $\sigma$ by ignoring the strategy of Agent $i$. The tuple of strategies $(\sigma_{-i}, \sigma'_i)$ is obtained from $\sigma$ by substituting Agent $i$'s strategy with $\sigma'_i$.

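Once a strategy profile $\sigma$ is chosen, it induces a play. We say that a play $\rho$ in $\mathcal{G}$ is compatible with a strategy $\sigma_i$ of Agent $i$ if for every prefix $h \cdot a$ of $\rho$ with $h \in \mathrm{Hist}(\mathcal{G})$ and $a \in \mathrm{Act}$, we have $\sigma_i(h) = a_i$, where $a_i$ is the action of Agent $i$ in the vector $a$.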

We denote by $\mathrm{Out}(\sigma_i)$ the set of all the plays that are compatible with the strategy $\sigma_i$ of Agent $i$, and by $\mathrm{Hist}(\sigma_i)$ the set of all the histories that are compatible with $\sigma_i$. The outcome of an interaction between agents following a strategy profile $\sigma$ is a unique play in $\mathcal{G}$, denoted $\langle \sigma \rangle$: it is the unique play in $\mathcal{G}$ compatible with all the strategies in the profile, and it is an infinite sequence in $(\mathrm{St} \cdot \mathrm{Act})^\omega$.
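To make the notions of game structure, strategy, and outcome concrete, here is a minimal Python sketch under our own assumptions: the class `ConcurrentGame`, its method `outcome_prefix`, and the helper `always` are hypothetical names, actions are strings, and a strategy is any function from a history (a tuple alternating states and action profiles, ending in a state) to an action. Since outcomes are infinite, the sketch only computes a finite prefix of the unique play compatible with all strategies of a profile.

```python
from itertools import product
from typing import Callable, Dict, Sequence, Tuple

State = str
Profile = Tuple[str, ...]               # one action per agent, Agent 0 first
History = Tuple                         # s0, a0, s1, a1, ..., sk (ends in a state)
Strategy = Callable[[History], str]     # maps a history to the agent's next action


class ConcurrentGame:
    """Hypothetical encoding of G = (St, s0, Agt, (Act_i), Tab)."""

    def __init__(self, states: Sequence[State], init: State,
                 actions: Sequence[Sequence[str]],
                 tab: Dict[Tuple[State, Profile], State]):
        self.states, self.init, self.actions, self.tab = states, init, actions, tab
        # Completeness: every state has a successor for every action profile.
        # Determinism is automatic, since a dict maps each key to one value.
        for s in states:
            for a in product(*actions):
                if (s, a) not in tab:
                    raise ValueError(f"missing transition from {s} on {a}")

    def outcome_prefix(self, profile: Sequence[Strategy], steps: int) -> History:
        """First `steps` rounds of the unique play compatible with all strategies."""
        h: History = (self.init,)
        for _ in range(steps):
            a = tuple(sigma(h) for sigma in profile)  # agents choose simultaneously
            h = h + (a, self.tab[(h[-1], a)])
        return h


def always(action: str) -> Strategy:
    """Memoryless strategy that ignores the history."""
    return lambda h: action


# Usage: two agents; from s0 the play reaches "win" iff both pick the same action.
acts = [("l", "r"), ("l", "r")]
tab = {}
for a in product(*acts):
    tab[("s0", a)] = "win" if a[0] == a[1] else "lose"
    tab[("win", a)] = "win"
    tab[("lose", a)] = "lose"
game = ConcurrentGame(["s0", "win", "lose"], "s0", acts, tab)

h = game.outcome_prefix([always("l"), always("r")], 2)
states_only = h[0::2]   # projection of the history onto St
```

Here strategies observe the whole history, including past action profiles, matching the assumption that agents see the actions played between states.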

2.2 Payoff and Solution Concepts

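Each Agent $i$ has an objective $\Omega_i$ expressed as a set of infinite sequences of states in $\mathrm{St}^\omega$. As defined before, a play is a sequence of states and action profiles. We slightly abuse notation and also write $\rho \in \Omega_i$, meaning that the sequence of states in the play (that is, $\rho|_{\mathrm{St}}$) is in $\Omega_i$. We define the payoff function $\mathrm{Payoff} : \mathrm{Plays}(\mathcal{G}) \to \{0, 1\}^{\mathrm{Agt}}$ that associates with each play $\rho$ a vector defined by $\mathrm{Payoff}_i(\rho) = 1$ if $\rho \in \Omega_i$, and $\mathrm{Payoff}_i(\rho) = 0$ otherwise.

We borrow game-theoretic vocabulary and say that Agent $i$ wins whenever her payoff is $1$. We sometimes abuse this notation and write $\mathrm{Payoff}_i(\sigma)$, which is the payoff of Agent $i$ associated with the unique play induced by $\sigma$.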

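In this paper we are interested in winning objectives such as Safety, Reachability, Büchi, coBüchi, Parity, and Muller, which are defined as follows. Let $\rho = s_0 a_0 s_1 a_1 \cdots$ be a play in a concurrent game structure $\mathcal{G}$. We use the following notations:

$\mathrm{Occ}(\rho) = \{ s_k \mid k \geq 0 \}$ to denote the set of states that appear along $\rho$, and

$\mathrm{Inf}(\rho) = \{ s \in \mathrm{St} \mid s = s_k \text{ for infinitely many } k \}$ to denote the set of states appearing infinitely often along $\rho$. Then,

  • Reachability: For some $F \subseteq \mathrm{St}$, $\mathrm{Occ}(\rho) \cap F \neq \emptyset$;

  • Safety: For some $F \subseteq \mathrm{St}$, $\mathrm{Occ}(\rho) \subseteq F$;

  • Büchi: For some $F \subseteq \mathrm{St}$, $\mathrm{Inf}(\rho) \cap F \neq \emptyset$;

  • coBüchi: For some $F \subseteq \mathrm{St}$, $\mathrm{Inf}(\rho) \subseteq F$;

  • Parity: For some priority function $p : \mathrm{St} \to \mathbb{N}$, $\min\{ p(s) \mid s \in \mathrm{Inf}(\rho) \}$ is even;

  • Muller: For some Boolean formula $\varphi$ over $\mathrm{St}$, $\mathrm{Inf}(\rho) \models \varphi$.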
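Each objective above depends only on $\mathrm{Occ}(\rho)$ or $\mathrm{Inf}(\rho)$. For an ultimately periodic play $u v^\omega$ (as produced, e.g., by finite-memory strategies in a finite game), these sets are $\mathrm{Occ} = u \cup v$ and $\mathrm{Inf} = v$, so each objective becomes a finite check. The Python sketch below restricts itself to such lasso-shaped plays; the function names are hypothetical, the parity convention (minimal priority seen infinitely often is even) is our assumption, and the Muller formula is modeled as an arbitrary predicate on $\mathrm{Inf}(\rho)$.

```python
from typing import Callable, Dict, FrozenSet, Sequence, Set

# A lasso play u v v v ... is given by its state prefix u and a non-empty cycle v.
Objective = Callable[[Sequence[str], Sequence[str]], bool]


def occ(u: Sequence[str], v: Sequence[str]) -> Set[str]:
    return set(u) | set(v)          # states that appear along the play


def inf(u: Sequence[str], v: Sequence[str]) -> Set[str]:
    if not v:
        raise ValueError("the cycle of a lasso must be non-empty")
    return set(v)                   # exactly the cycle states repeat forever


def reachability(F: Set[str]) -> Objective:
    return lambda u, v: bool(occ(u, v) & F)


def safety(F: Set[str]) -> Objective:
    return lambda u, v: occ(u, v) <= F


def buchi(F: Set[str]) -> Objective:
    return lambda u, v: bool(inf(u, v) & F)


def cobuchi(F: Set[str]) -> Objective:
    return lambda u, v: inf(u, v) <= F


def parity(p: Dict[str, int]) -> Objective:
    # Assumed convention: the minimal priority seen infinitely often is even.
    return lambda u, v: min(p[s] for s in inf(u, v)) % 2 == 0


def muller(phi: Callable[[FrozenSet[str]], bool]) -> Objective:
    # phi stands for the Boolean formula, evaluated on Inf(rho).
    return lambda u, v: phi(frozenset(inf(u, v)))


# Usage: the play s0 s1 (s2 s3)^omega.
u, v = ["s0", "s1"], ["s2", "s3"]
prio = {"s0": 1, "s1": 1, "s2": 3, "s3": 2}
```

For instance, `reachability({"s1"})(u, v)` holds while `buchi({"s1"})(u, v)` does not, since `s1` is visited only once.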

A Nash equilibrium is the formalisation of a situation where no agent can improve her payoff by unilaterally changing her behaviour. Formally:

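(Nash equilibrium) A strategy profile $\sigma$ is a Nash equilibrium (NE) if for every agent $i \in \mathrm{Agt}$ and every strategy $\sigma'_i$ of Agent $i$, the following holds true: $\mathrm{Payoff}_i(\sigma_{-i}, \sigma'_i) \leq \mathrm{Payoff}_i(\sigma)$.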

Throughout this paper, we assume that Agent 0 is the agent for whom we wish to synthesize a strategy; therefore, we use the concept of 0-fixed Nash equilibria.

[0-fixed Nash equilibrium] A profile $\sigma$ is a 0-fixed NE (0-NE) if for every agent $i \in \mathrm{Agt} \setminus \{0\}$ and every strategy $\sigma'_i$ for Agent $i$, the following holds true: $\mathrm{Payoff}_i(\sigma_{-i}, \sigma'_i) \leq \mathrm{Payoff}_i(\sigma)$.

That is, once $\sigma_0$ is fixed for Agent 0, the other agents cannot improve their payoff by unilaterally changing their strategy.
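The difference between an NE and a 0-NE is which agents may deviate. The sketch below checks the 0-NE condition in a drastically simplified setting that is our assumption, not the paper's: each agent chooses among finitely many candidate strategies (here, just labels), and a hypothetical function `payoff` maps a profile of labels to its 0/1 payoff vector. Only deviations of Agents 1 to n are inspected; Agent 0's choice stays fixed.

```python
from typing import Callable, Sequence, Tuple

Profile = Tuple[str, ...]
Payoff = Callable[[Profile], Tuple[int, ...]]


def is_0_fixed_ne(profile: Profile, candidates: Sequence[Sequence[str]],
                  payoff: Payoff) -> bool:
    """True iff no agent other than Agent 0 gains by a unilateral deviation."""
    base = payoff(profile)
    for i in range(1, len(profile)):        # Agent 0 is never allowed to deviate
        for alt in candidates[i]:
            deviation = profile[:i] + (alt,) + profile[i + 1:]
            if payoff(deviation)[i] > base[i]:
                return False
    return True


# Usage: Agent 0 picks L or R, Agent 1 picks a or b; payoffs are (Agent 0, Agent 1).
table = {("L", "a"): (1, 0), ("L", "b"): (0, 1),
         ("R", "a"): (1, 1), ("R", "b"): (0, 0)}
cands = [("L", "R"), ("a", "b")]
payoff = table.__getitem__
```

In this example, `("L", "a")` is not a 0-NE because Agent 1 gains by switching to `b`, whereas `("L", "b")` and `("R", "a")` are 0-NE.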

2.3 Rational synthesis

Rational synthesis can be defined in an optimistic or a pessimistic setting. The former is the so-called Cooperative Rational Synthesis Problem (CRSP), formally defined as follows.

Problem (CRSP).

Is there a 0-NE $\sigma$ such that $\mathrm{Payoff}_0(\sigma) = 1$?

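The latter is the so-called Non-Cooperative Rational Synthesis Problem (NCRSP), formally defined as follows.

Problem (NCRSP).

Is there a strategy $\sigma_0$ for Agent 0 such that for every 0-NE $\sigma$ whose strategy for Agent 0 is $\sigma_0$, we have $\mathrm{Payoff}_0(\sigma) = 1$?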
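The two problems differ only in the quantification over equilibria: CRSP asks for some 0-NE in which Agent 0 wins, NCRSP for a choice of Agent 0 that wins in every 0-NE extending it. The brute-force Python sketch below contrasts them in the same finite abstraction as a normal-form game (each agent picks one of finitely many labeled strategies, with a hypothetical `payoff` function); this abstraction is our assumption. In actual game structures strategies are infinite objects that cannot be enumerated, which is why the paper builds a zero-sum game instead.

```python
from itertools import product
from typing import Callable, Optional, Sequence, Tuple

Profile = Tuple[str, ...]
Payoff = Callable[[Profile], Tuple[int, ...]]


def is_0_fixed_ne(profile: Profile, candidates: Sequence[Sequence[str]],
                  payoff: Payoff) -> bool:
    """No agent other than Agent 0 gains by a unilateral deviation."""
    base = payoff(profile)
    return all(payoff(profile[:i] + (alt,) + profile[i + 1:])[i] <= base[i]
               for i in range(1, len(profile)) for alt in candidates[i])


def crsp(candidates: Sequence[Sequence[str]], payoff: Payoff) -> bool:
    """Optimistic: some 0-NE in which Agent 0 wins."""
    return any(payoff(p)[0] == 1 and is_0_fixed_ne(p, candidates, payoff)
               for p in product(*candidates))


def ncrsp(candidates: Sequence[Sequence[str]], payoff: Payoff) -> Optional[str]:
    """Pessimistic: a choice for Agent 0 that wins in every 0-NE extending it."""
    for s0 in candidates[0]:
        profiles = product((s0,), *candidates[1:])
        # Vacuously true if no 0-NE extends s0, as in the universal quantifier.
        if all(payoff(p)[0] == 1 for p in profiles
               if is_0_fixed_ne(p, candidates, payoff)):
            return s0
    return None


cands = [("L", "R"), ("a", "b")]
# Agent 0 choosing R forces the only 0-NE to be winning for her.
table = {("L", "a"): (1, 0), ("L", "b"): (0, 1),
         ("R", "a"): (1, 1), ("R", "b"): (0, 0)}
# Agent 1 is indifferent, so a losing 0-NE exists whatever Agent 0 does.
table2 = {("L", "a"): (1, 1), ("L", "b"): (0, 1),
          ("R", "a"): (1, 1), ("R", "b"): (0, 1)}
```

With `table2`, CRSP has a solution but NCRSP does not: an indifferent Agent 1 may settle on a losing equilibrium for Agent 0.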

In this paper we study computational complexity for the rational synthesis problem in both cooperative and non-cooperative settings.

For the CRSP, the complexity results are corollaries of existing work. In particular, for Safety, Reachability, Büchi, co-Büchi, Rabin, and Muller objectives, we can apply algorithms from [2] to obtain the same complexities for the CRSP as for turn-based models when the number of agents is not fixed. More precisely, [2] tackles the problem of finding NE in concurrent games, where one asks for the existence of an NE whose payoff lies between two thresholds. By choosing the lower threshold such that only Agent 0 satisfies her objective and the upper threshold such that all agents win, we reduce the cooperative rational synthesis problem to this problem. Brenguier et al. [2] showed that the existence of constrained NE in concurrent games can be decided in PTime for Büchi objectives, in NP for Safety, Reachability, and coBüchi objectives, and in PSpace for Muller objectives. All hardness results are inferred directly from the hardness results in the turn-based setting. This is a consequence of the fact that every turn-based game can be encoded as a concurrent game in which, at each state, at most one agent has non-vacuous choices. For Streett objectives, the reduction to [2] only yields membership in PSpace, and the hardness result comes from the turn-based setting [3].
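The threshold choice in this reduction can be checked mechanically. With lower bound $(1, 0, \ldots, 0)$ and upper bound $(1, \ldots, 1)$, a 0/1 payoff vector lies between the thresholds exactly when Agent 0 wins. Moreover, a 0-NE in which Agent 0 wins is an NE, since Agent 0 cannot improve a payoff of 1, and every NE is a 0-NE. The Python sketch below, with hypothetical function names, only illustrates the threshold arithmetic, not the algorithm of [2].

```python
from itertools import product
from typing import Sequence, Tuple


def crsp_thresholds(n_agents: int) -> Tuple[Tuple[int, ...], Tuple[int, ...]]:
    """Lower bound: Agent 0 must win; upper bound: everybody may win."""
    lower = (1,) + (0,) * (n_agents - 1)
    upper = (1,) * n_agents
    return lower, upper


def within(payoff: Sequence[int], lower: Sequence[int],
           upper: Sequence[int]) -> bool:
    """Component-wise lower <= payoff <= upper."""
    return all(lo <= v <= up for v, lo, up in zip(payoff, lower, upper))


# Usage: with three agents, the admissible payoff vectors are those where Agent 0 wins.
lower, upper = crsp_thresholds(3)
matches = [v for v in product((0, 1), repeat=3) if within(v, lower, upper)]
```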

In the case of non-cooperative rational synthesis, we cannot directly apply existing results. Instead, we define an algorithm inspired by the suspect games of [2]. The suspect game was introduced to decide the existence of pure NE in concurrent games with ω-regular objectives. Following that approach, we design a zero-sum game that combines the behaviors of Agent 0 and an extra entity whose goal is to prove, when needed, that the current play is not the outcome of a 0-NE. We also extend the idea of [3], which roughly consists in keeping track of deviations. Recall that the non-cooperative rational synthesis problem consists in designing a strategy $\sigma_0$ for the protagonist (Agent 0 in our case) such that her objective is satisfied by all the plays that are outcomes of 0-NE compatible with $\sigma_0$. This is equivalent to finding a strategy $\sigma_0$ for Agent 0 such that every play $\rho$ compatible with it either satisfies $\Omega_0$, or is not the outcome of any strategy profile that is a 0-NE.

(a) A concurrent game.

(b) Subgame induced by the strategy of Agent 0.

Consider the concurrent game with reachability objectives depicted in Figure 0(a). The game starts in the state . There are three agents: the controller Agent 0, Agent 1, and Agent 2. Agent 0 has two actions, for right and for left. Agents 1 and 2 each have two actions, denoted and . For any subset of , the states indicate that the agents in have reached their objectives (these states are sinks). In addition, there are three states , , and . The edges represent the transition table. The labels indicate the action profiles; e.g., the vector means that Agent 0 took action , Agent 1 took action , and Agent 2 took action . Finally, action stands for the indifferent choice, that is, any action of a given agent. We can see that at , Agent  is the only agent with non-vacuous choices. He can choose to go to by playing action , or to go to by playing action .

Now consider the strategy for Agent 0 defined as follows: We argue that this strategy is a solution to the NCRSP. Indeed, by applying this strategy, we obtain the subgame of Figure 0(b). In this game, all the plays falsifying the objective of Agent 0 are the ones where Agent 1 plays . Notice now that these plays are not outcomes of any 0-NE, since Agent 1 can profitably deviate by playing action .

3 Solution for Problem 2.3

We now describe a general algorithm that solves the NCRSP. As a first step in our procedure, we construct a two-player turn-based game.

3.1 Construction of a two-player game

Given a concurrent game $\mathcal{G}$, we construct a turn-based two-player zero-sum game played between Constructor and Spoiler.

The game is obtained as follows:

  • The set is where:

    • .

  • The set is .

  • The set of states is where

  • Constructor plays in the states in and , while Spoiler plays in the states in and . The legal moves are given as follows:

    • From a state , Constructor plays an action

    • From a state , Spoiler plays an action .

    • From a state , Constructor plays an action

    • From a state , Spoiler plays an action .

The transition and the objective of the game are described next.

3.2 Transition function

The game is best understood as a dialogue between Constructor and Spoiler. In each state, Constructor proposes an action for Agent 0 together with the actions corresponding to the winning strategies of the agents in the set . Then, Spoiler responds with an action profile played by all agents in the environment. In the next step, Constructor knows the entire action profile played by the agents and proposes some new deviations for the agents that do not have a deviation yet (they are neither in nor in ). The last move is performed by Spoiler, whose role is to "check" that the proposed deviations and winning strategies are correct. Therefore, Spoiler can choose any continuation for the game, and the sets and are updated according to the previous choices to some new values and . Each dialogue "round" thus consists of four moves.

The transitions are given by the (partial) function :

  • When , .

  • When , .

  • When , .

  • When , , such that:

    • .

    • . That is, Agent is added to the set on the continuations where Agent plays the new action proposed by Constructor in (supposedly compatible with a winning strategy) and the other agents do not change their actions with respect to . Also, any agent for whom Constructor proposes an action in is a hint to Spoiler that this agent can deviate from that point. It is up to Spoiler to agree or not. If Spoiler agrees, we say that he has agreed with the recommendation of Constructor. In this case, Constructor has to prove that she made the right choice, and this is checked by the winning condition of the game.

    • . This is the opposite case, where Spoiler stood by his choices; in this case, the winning condition has to check that this was a wrong decision.

3.3 Winning condition

We equip with the canonical projection that is the projection over the -th component. In particular, for every , we have , , and . We also extend over and as expected. Histories for are finite words in . Histories for are finite words in . Plays are infinite sequences in . Let be a play; we denote by the restriction of over the states in , which is an infinite sequence in . The set (resp. ) is the set of agents in the limit of the 's (resp. the 's). The first limit exists because the sets occurring in the states along a play are non-decreasing subsets of , and is finite. The second limit exists because (1) an agent is added into only if it is not in , and (2) when an agent leaves , it gets into and remains there forever. Hence, once an agent leaves , it never returns.
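The two limit arguments above can be made quantitative: the non-decreasing set changes at most $|\mathrm{Agt}|$ times, and each agent enters the other set at most once and leaves it at most once, so along any play the pair of sets changes at most $3|\mathrm{Agt}|$ times and is eventually constant. The Python sketch below checks exactly the stated properties on a finite sequence of pairs. Purely for illustration, we call the non-decreasing set `D` and the other set `W`; these labels and the function names are hypothetical.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical labels: D is the non-decreasing set, W is the other set.
Pair = Tuple[FrozenSet[int], FrozenSet[int]]   # (W_k, D_k) in the k-th visited state


def respects_invariants(seq: List[Pair]) -> bool:
    """Check the properties stated in the text on consecutive pairs."""
    for (w, d), (w2, d2) in zip(seq, seq[1:]):
        if not d <= d2:            # D never shrinks
            return False
        if (w2 - w) & d2:          # an agent enters W only if it is not in D
            return False
        if not (w - w2) <= d2:     # an agent leaving W moves into D
            return False
    return True


def num_changes(seq: List[Pair]) -> int:
    """Number of steps at which the pair (W, D) changes."""
    return sum(1 for p, q in zip(seq, seq[1:]) if p != q)


fs = frozenset
# Agent 1 enters W, later leaves it for D; then Agent 2 enters W.
good = [(fs(), fs()), (fs({1}), fs()), (fs({1}), fs()),
        (fs(), fs({1})), (fs({2}), fs({1})), (fs({2}), fs({1}))]
# Agent 1 tries to re-enter W after moving into D: forbidden.
bad = [(fs(), fs()), (fs({1}), fs()), (fs(), fs({1})), (fs({1}), fs({1}))]
```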

We define the following sets:

(1)
(2)
(3)

3.4 Transformations

Lifting of histories

We define a transformation over histories in to create histories in . For every strategy for in , we define the transformation .

Let be a history in and assume that . The lifting of is a history in obtained by the mapping inductively defined as follows:

and

where

Observe that every history ends in a state in , where Constructor plays an action from that always specifies an action for Agent 0. The function is thus instrumental in obtaining a strategy for Agent 0 in from a strategy of Constructor in . For every history in , we define:

(4)

For every strategy of Constructor, we call 0-strategy the strategy obtained by Equation 4. The following claim is a consequence of the same equation.

Claim .

Let be a strategy for Constructor, and let be the 0-strategy. If a history in is compatible with then the history in is compatible with .

The function maps every history in into a history in . We define as the natural extension of over the domain of plays in . We extend the previous claim as expected.

Claim .

Let be a strategy for Constructor, and let be the 0-strategy. If a run in is compatible with then the run in is compatible with .

Let be a strategy for Constructor, and let be a run in compatible with the 0-strategy . Let be a history in , and assume to be a prefix of . If then .

Proof.

By induction on the size of . The base case is , in which case . We have . Now assume for induction that for every history of size and let .

Now consider the history . By definition, , where are obtained from . By the induction hypothesis, , so it suffices to show that . For this, notice that , and that

where the second equality is by definition of the construction. ∎

Since the previous lemma holds for any pair of histories that are respectively prefixes of and , we obtain the following claim:

Claim .

Let be a strategy for Constructor, and let be a run in compatible with the 0-strategy . If then .

Projection of histories

We now define what is, in some sense, the reverse operation. Let us define the transformation .

Let be a history in ending in a state in .

Let be a run in , be a history in . If , then

Proof.

By induction over the length of . For , the result is trivially true. Assume the result holds for a history and let us show that it holds for . By the induction hypothesis, we have ; to conclude, notice that

The function maps every history in ending in a state in into a history in . We define as the natural extension of over the domain of runs in .

The following claim follows.

Claim .

Let be a run in , be a run in . If , then

4 Main Theorem

There exists a solution for the NCRSP iff Constructor wins.

We denote by the strategy that mimics the strategy when the current history is , i.e.,

Let be a play and let be a prefix of . We say that is a good deviation point for Agent if:

  • and,

  • there exists a strategy of Agent from such that for all we have:

We say that has a good deviation if some prefix of is a good deviation point.

We use the notion of good deviation point in the following lemma. It states that a strategy is a solution for the NCRSP if and only if every play compatible with it either is winning for Agent 0 or has a prefix from which some agent can unilaterally deviate and win against any strategy profile of the other agents.

A strategy is a solution for NCRSP iff every play compatible with either or, has a good deviation.

Proof.

We start by establishing the only-if direction. Let be a solution for the NCRSP. If every outcome is such that , then there is nothing to prove. Let be a play in such that is not in . Assume toward a contradiction that does not contain a good deviation point. Then, by Definition 4, we know that for any prefix of , any agent such that , and any strategy of , there exist strategies for agents 1 to such that the following holds:

The above equation implies that Agent does not have a profitable deviation under the strategy , hence the profile is a 0-fixed NE contradicting the fact that is a solution for the NCRSP.

For the if direction, let be a strategy for Agent 0, and assume that every play in satisfies

  1. or,

  2. has a good deviation.

If every play in is in , then is a solution for . Let be a play in that is not in . By assumption, has a good deviation point, i.e., there exists an Agent and a strategy for this agent such that and, after a finite prefix of , for any tuple of strategies the following holds:

Hence, is not the outcome of any 0-fixed NE. As this holds for every such play, is a solution for the NCRSP. ∎

4.1 Correctness

Constructor wins if she has a strategy that ensures against any strategy of Spoiler.

Proposition .

If Constructor wins, then there exists a solution for the NCRSP.

Proof.

Let be a winning strategy for Constructor in , and let be the strategy for Agent 0 in obtained by the construction in Sec. 3.4, Equation (4), that is, for every history in , . We show that is a solution to the NCRSP.

Let be an arbitrary run in compatible with .

According to Lemma 4, it is sufficient to show that . Consider the run in . As a consequence of Claim 3.4, we have that is compatible with . Since is winning, we also have , i.e.,

As a first case, assume that implying . By Claim 3.4 we can write , and thus .

As a second case, assume . It implies that there exists a state in along such that and there exists an agent in such that in and .

We argue that Agent has a profitable deviation from a prefix of entailing that contains a good deviation point.

Assume w.l.o.g. that is the first state along for which there exists an Agent in such that in and . The run is of the form: