Controlling a Random Population is EXPTIME-hard

09/13/2019 ∙ by Corto Mascle, et al. ∙ University of Liverpool, ENS Paris-Saclay

Bertrand et al. [1] (LMCS 2019) describe two-player zero-sum games in which one player tries to achieve a reachability objective in $n$ games (on the same finite arena) simultaneously by broadcasting actions, and where the opponent has full control of resolving non-deterministic choices. They show EXPTIME-completeness for the question whether such games can be won for every number $n$ of games. We consider the almost-sure variant in which the opponent randomizes their actions, and where the player tries to achieve the reachability objective eventually with probability one. The lower-bound construction in [1] does not directly carry over to this randomized setting. In this note we show EXPTIME-hardness for the almost-sure problem by reduction from Countdown Games.


1 Definitions

Population Control.

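Write $\mathcal{M} = (Q, \Sigma, \delta)$ for a Markov Decision Process, where $Q$ and $\Sigma$ are finite sets of states and actions, and $\delta : Q \times \Sigma \to \mathcal{D}(Q)$ assigns to each state and action a probability distribution over states. A successor of $q$ on action $a$ is a state $p$ with $\delta(q, a)(p) > 0$. The $n$-fold synchronized product $\mathcal{M}^n$ of $\mathcal{M}$ is the MDP whose states, called configurations here, are $n$-dimensional vectors with components in $Q$, and $\delta$ is lifted to $Q^n$ in the natural way: $\delta(\mathbf{q}, a)(\mathbf{p}) = \prod_{i=1}^{n} \delta(q_i, a)(p_i)$ for all $\mathbf{q}, \mathbf{p} \in Q^n$ and $a \in \Sigma$. A strategy $\sigma$ assigns to every configuration $\mathbf{q}$ an action $\sigma(\mathbf{q}) \in \Sigma$. (We consider here only memoryless deterministic strategies, as those suffice for almost-sure reachability problems [3].) For every initial configuration $\mathbf{q}$, such a strategy induces a probability space over all infinite sequences in $(Q^n)^\omega$ (see [3] for details).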
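To make the lifting concrete, here is a minimal Python sketch of one transition of the $n$-fold product: every component moves independently under the same action, so the probability of a successor configuration is the product of the component probabilities. The dictionary encoding of the MDP (state-action pairs mapped to distributions with `Fraction` values) and the name `lifted_delta` are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product
from math import prod

def lifted_delta(delta, config, a):
    """Distribution of the product MDP on Q^n for action a (components move independently)."""
    per_component = [delta[(q, a)].items() for q in config]
    dist = {}
    for combo in product(*per_component):
        successor = tuple(p for p, _ in combo)
        # delta(q, a)(p) = prod_i delta(q_i, a)(p_i)
        dist[successor] = prod(pr for _, pr in combo)
    return dist

# Hypothetical two-state MDP: action "a" moves s to s or t with probability 1/2 each.
delta = {("s", "a"): {"s": Fraction(1, 2), "t": Fraction(1, 2)},
         ("t", "a"): {"t": Fraction(1)}}
```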

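In a configuration $\mathbf{q}$, we say $k$ components mark state $p$ if the number of different indices $i$ with $q_i = p$ is equal to $k$. In case $k > 0$ we simply call the state $p$ marked in $\mathbf{q}$. Let $\mathsf{Start}$ ($\mathsf{Final}$, resp.) denote the configuration in which all components mark a designated initial (final, resp.) state of $\mathcal{M}$. A configuration $\mathbf{q}$ can be synchronized if there exists a strategy $\sigma$ such that the probability $\mathbb{P}^{\sigma}_{\mathbf{q}}(\Diamond\, \mathsf{Final})$ of eventually visiting $\mathsf{Final}$ is one. $\mathcal{M}^n$ can be synchronized if $\mathsf{Start}$ can be synchronized.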

Given an MDP $\mathcal{M}$ equipped with initial and final states, the PopulationControl problem asks whether $\mathcal{M}^n$ can be synchronized for every $n \in \mathbb{N}$.
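For one fixed $n$, whether the $n$-fold product can be synchronized is an almost-sure reachability question on a finite MDP, and it depends only on the supports of $\delta$. The sketch below uses the classical fixpoint for almost-sure reachability: repeatedly discard configurations from which the target cannot be reached using only actions whose successors stay among the remaining candidates. It enumerates all of $Q^n$, so it is only meant for tiny examples, and it says nothing about PopulationControl itself, which quantifies over every $n$. The encoding of $\delta$ as a map from state-action pairs to successor sets, and the function names, are assumptions of the sketch.

```python
from itertools import product

def lifted_support(delta, config, a):
    """Support of the lifted transition: every combination of component successors."""
    return set(product(*(sorted(delta[(q, a)]) for q in config)))

def can_synchronize(states, actions, delta, n, init, final):
    """Decide whether Start = (init,)*n reaches Final = (final,)*n almost surely."""
    configs = list(product(sorted(states), repeat=n))
    succ = {(c, a): lifted_support(delta, c, a) for c in configs for a in actions}
    target = (final,) * n
    alive = set(configs)  # current over-approximation of the almost-sure region
    while True:
        # Backward reachability to the target, using only actions that stay in `alive`.
        reach = {target}
        changed = True
        while changed:
            changed = False
            for c in alive - reach:
                if any(succ[(c, a)] <= alive and succ[(c, a)] & reach for a in actions):
                    reach.add(c)
                    changed = True
        if reach == alive:
            return (init,) * n in alive
        alive = reach
```

For instance, if action `a` moves `s` to `s` or `t` and `t` is absorbing, every product can be synchronized; if `a` can instead send `s` to an absorbing sink, it cannot.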

Countdown Games.

A Countdown Game is given by a directed graph $G = (V, E)$ whose edges carry positive integer weights, $E \subseteq V \times \mathbb{N}_{>0} \times V$. For an initial pair $(v_0, c_0) \in V \times \mathbb{N}$ of a vertex and a number, two opposing players (Player 1 and Player 2) alternately determine a sequence of such pairs as follows. In each round, from $(v, c)$, Player 1 picks a number $d \le c$ such that $E$ contains at least one edge $(v, d, v')$; then Player 2 picks one such edge and the game continues from $(v', c - d)$. Player 1 wins the game iff the play reaches a pair in $V \times \{0\}$.

CountdownGame is the decision problem which asks if Player 1 has a strategy to win a given game $G$ for a given initial pair $(v_0, c_0)$. All constants in the input are written in binary.
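A direct way to decide a Countdown Game is to recurse over pairs $(v, c)$ and memoize the results. The Python sketch below does this, with edges encoded (hypothetically) as triples `(v, d, v')`. Its running time is polynomial in $c_0$ and therefore exponential in the binary size of the input, consistent with the EXPTIME upper bound; it plays no role in the hardness argument.

```python
from functools import lru_cache

def player1_wins(edges, v0, c0):
    """Decide whether Player 1 wins the Countdown Game from (v0, c0)."""
    moves = {}  # moves[v][d] = list of targets of edges (v, d, w)
    for (v, d, w) in edges:
        moves.setdefault(v, {}).setdefault(d, []).append(w)

    @lru_cache(maxsize=None)
    def wins(v, c):
        if c == 0:
            return True  # the play has reached V x {0}
        # Player 1 needs a weight d <= c with at least one d-edge such that
        # every edge Player 2 may pick leads to a winning pair.
        return any(d <= c and all(wins(w, c - d) for w in ws)
                   for d, ws in moves.get(v, {}).items())

    return wins(v0, c0)
```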

Theorem 1 [Thm. 4.5 in [2]]. CountdownGame is EXPTIME-complete.

2 The Reduction

In order to reduce CountdownGame to PopulationControl, we first observe that the number of turns in a Countdown Game cannot exceed the initial counter value $c_0$, as the counter value strictly decreases at each turn. Such finite games are determined, so if Player 1 does not win, then Player 2 has a winning strategy; since plays are bounded and branching is finite, an opponent choosing edges at random follows that strategy with positive probability, and Player 1 then loses with positive probability. Therefore Player 1 wins the initial game if, and only if, she wins with probability one against a randomized adversary.
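The observation can be checked on small instances by computing the largest probability with which Player 1 reaches counter value 0 when Player 2 picks uniformly at random among the edges of the chosen weight. Because plays last at most $c_0$ rounds, this value is a finite-horizon maximum of averages, which the sketch below evaluates exactly with fractions. The triple encoding of edges and the function name are again hypothetical.

```python
from fractions import Fraction
from functools import lru_cache

def best_win_probability(edges, v0, c0):
    """Max probability that Player 1 reaches counter 0 against a uniform Player 2."""
    moves = {}
    for (v, d, w) in edges:
        moves.setdefault(v, {}).setdefault(d, []).append(w)

    @lru_cache(maxsize=None)
    def value(v, c):
        if c == 0:
            return Fraction(1)
        # For each legal weight d, average over the edges Player 2 may pick.
        options = [sum((value(w, c - d) for w in ws), Fraction(0)) / len(ws)
                   for d, ws in moves.get(v, {}).items() if d <= c]
        return max(options, default=Fraction(0))  # no legal move: Player 1 is stuck

    return value(v0, c0)
```

On the edges `[("a", 1, "b"), ("a", 1, "c"), ("b", 2, "a"), ("c", 1, "a")]`, Player 1 wins from `("a", 3)` and the function returns 1, whereas from `("a", 2)` she loses and it returns 1/2: positive, but not one.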

The main idea for our further construction is to require Player 1 to move components one-by-one away from a waiting state, first into the control graph of the Countdown Game, and ultimately into the goal. To avoid a loss in the intermediate phase she needs to win an instance of that game against a randomizing opponent. This is enforced using a combination of gadgets, including two binary counters that can effectively test for zero, be set to specific numbers, and that are set up so that they can decrement at the same rate. As a result, Player 1 has a winning strategy for the two-player Countdown Game if, and only if, the controller can synchronize the $n$-fold product of the constructed MDP for all $n$.

For a given Countdown Game $G = (V, E)$ with an initial pair $(v_0, c_0)$ we construct an MDP $\mathcal{M}$ as follows. We write that action $a$ takes state $q$ to successor $p$ to mean that $\delta(q, a)(p) > 0$. The exact probability distributions do not matter in our construction, so we let $\delta(q, a)$ be the uniform distribution over such successors.

Whenever action $a$ takes state $q$ only back to itself we say that $a$ ignores $q$. There are states Heaven (the target) and Hell which ignore all actions. For a given state $q$, an action $a$ is angelic if it takes $q$ only to Heaven, and daemonic if it takes $q$ to Hell. An action is safe in a configuration if it is not daemonic for any marked state (in any gadget).
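The vocabulary of ignoring, angelic, daemonic, and safe actions refers only to the supports of $\delta$, so it can be phrased as small predicates on a support map. This is an illustrative sketch; the names and the encoding (state-action pairs mapped to successor sets) are hypothetical.

```python
from fractions import Fraction

HEAVEN, HELL = "Heaven", "Hell"

def uniform(successors):
    """delta(q, a) as the uniform distribution over the given successor set."""
    succ = sorted(successors)
    return {p: Fraction(1, len(succ)) for p in succ}

def ignores(support, q, a):
    return support[(q, a)] == {q}

def is_angelic(support, q, a):
    return support[(q, a)] == {HEAVEN}

def is_daemonic(support, q, a):
    return HELL in support[(q, a)]

def is_safe(support, config, a):
    """a is safe if it is daemonic for no marked state of the configuration."""
    return not any(is_daemonic(support, q, a) for q in set(config))
```

Since Hell ignores every action, every action is daemonic for Hell under these definitions, so no action is safe once some component marks Hell.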

Figure 1: The waiting gadget (left), with states Wait and Ready, and the control gadget (right), whose depicted edges include go, next, and win. Edges labelled by a set of actions are shorthand for several edges, one for each action in that set. All but the depicted actions are daemonic.

Besides the special states Heaven and Hell, $\mathcal{M}$ contains several gadgets described below.

Waiting.

The waiting gadget has two states Wait and Ready which react to the action wait as depicted in Figure 1 (left). Whenever a configuration marks one of these states, a strategy that continuously plays wait will almost surely reach a configuration in which exactly one component marks Ready.

A special action go (to indicate successful isolation of one component) takes Ready to the initial state $v_0$ of the game $G$. All other actions (in gadgets described below) are ignored.
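A small simulation can illustrate why repeated wait isolates one component. The exact edges of Figure 1 are an assumption here: the sketch lets wait send each of Wait and Ready to both states with equal probability. Under that assumption each step leaves exactly one of $n$ components in Ready with probability $n/2^n > 0$, so such a step occurs almost surely eventually. The function name and the `max_steps` cutoff are illustrative only.

```python
import random

def wait_until_isolated(n, rng, max_steps=10_000):
    """Play `wait` until exactly one of n components marks Ready (assumed dynamics)."""
    config = ["Wait"] * n
    for step in range(max_steps):
        if config.count("Ready") == 1:
            return step, config
        # Assumption: wait sends each of Wait and Ready uniformly to {Wait, Ready}.
        config = [rng.choice(("Wait", "Ready")) for _ in config]
    raise RuntimeError("no isolation within max_steps")
```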

Game.

The game $G$ is directly interpreted as an MDP: for every edge $(v, d, v') \in E$ there is an action $d$ which takes $v$ to $v'$, and which is daemonic for all states $u \in V$ that have no outgoing edge of weight $d$.

The action win is angelic for every state of $G$. All other actions are ignored.
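The game gadget can be generated mechanically from the edge list. The sketch below is based on the reading that the action for weight $d$ moves $v$ to every $v'$ with $(v, d, v') \in E$ (so the uniform distribution plays the role of a random Player 2) and is daemonic at vertices without an outgoing edge of weight $d$; that reading and the function name `game_gadget_support` are assumptions.

```python
HEAVEN, HELL = "Heaven", "Hell"

def game_gadget_support(vertices, edges):
    """Support map (state, action) -> successor set for the game gadget."""
    weights = {d for (_, d, _) in edges}
    support = {}
    for v in vertices:
        for d in weights:
            targets = {w for (u, e, w) in edges if u == v and e == d}
            # Assumed reading: action d is daemonic where no edge of weight d leaves v.
            support[(v, d)] = targets or {HELL}
        support[(v, "win")] = {HEAVEN}  # win is angelic on every game state
    return support
```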

Binary Counters.

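A ($k$-bit) Counter consists of states $b_i^x$ for all $0 \le i < k$ and $x \in \{0, 1\}$. For every bit $i$ there is a decrement action $\mathit{dec}_i$ which

  • takes $b_j^0$ only to $b_j^1$ for all $j < i$,

  • takes $b_i^1$ only to $b_i^0$,

  • is daemonic for $b_i^0$, and

  • is ignored by all $b_j^x$, for all $j > i$ and $x \in \{0, 1\}$.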

We say that a configuration holds the number $m$ in this counter if it marks those states that represent the binary expansion of $m$: for all $i < k$, state $b_i^x$ is marked iff the $i$th bit in the binary expansion of $m$ is $x$. An action $a$ sets the counter to number $m$ if for all $i$, it takes $b_i^0$ only to $b_i^{m_i}$, where $m_i$ is the $i$th bit in the binary expansion of $m$, and is daemonic for all $b_i^1$ (to ensure that the counter can only be set if it holds $0$). Observe that if a counter holds $m$ then there is a unique maximal sequence of safe decrement actions, which has length $m$ and after which the counter holds $0$.
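The counter semantics can be prototyped on the set of marked counter states, written as pairs `(i, x)` for $b_i^x$. This is a hypothetical sketch, not the paper's formalization, and one assumption deserves flagging: for the observation about a unique maximal sequence of safe decrements to hold, the decrement action for bit $i$ must also be unsafe when some lower bit $j < i$ is 1, and the sketch treats it that way. The helper names `holds`, `set_to`, `dec`, and `count_down` are illustrative.

```python
def holds(marked, k):
    """Value held by a k-bit counter, or None if some bit is not marked exactly once."""
    value = 0
    for i in range(k):
        bits = [x for (j, x) in marked if j == i]
        if len(bits) != 1:
            return None
        value |= bits[0] << i
    return value

def set_to(marked, m):
    """Set the counter to m: b_i^0 -> b_i^{m_i}; None (unsafe) if any b_i^1 is marked."""
    if any(x == 1 for (_, x) in marked):
        return None
    return {(i, (m >> i) & 1) for (i, _) in marked}

def dec(marked, i):
    """Apply the decrement for bit i to the marked states; None means unsafe."""
    result = set()
    for (j, x) in marked:
        if j < i:
            if x == 1:
                return None  # assumption: unsafe if a lower bit is 1
            result.add((j, 1))
        elif j == i:
            if x == 0:
                return None  # daemonic for b_i^0
            result.add((j, 0))
        else:
            result.add((j, x))  # higher bits ignore the action
    return result

def count_down(marked, k):
    """Follow safe decrements until none is left; return (steps, final marking)."""
    steps = 0
    while True:
        safe = [i for i in range(k) if dec(marked, i) is not None]
        assert len(safe) <= 1, "at most one decrement should be safe"
        if not safe:
            return steps, marked
        marked = dec(marked, safe[0])
        steps += 1
```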

Additionally, for every bit $i$ the gadget has an error action $\mathit{err}_i$, which is daemonic for $b_i^0$ and $b_i^1$, and angelic for every other state (of $\mathcal{M}$). These actions can be used to quickly synchronize any configuration in which the counter is not correctly initialized, i.e., does not hold a number. See Figure 2 for a depiction of a $4$-bit counter.

The MDP $\mathcal{M}$ will contain two distinct counter gadgets. A main counter $C$ has $\lceil \log_2(c_0 + 1) \rceil$ bits to hold all possible counter values $0, \dots, c_0$ of the Countdown Game. An auxiliary counter $D$ has $\lceil \log_2(d_{\max} + 1) \rceil$ many bits to hold the largest edge weight $d_{\max}$ in $G$. These have distinct sets of states and actions, so for clarity we write $C.q$ to refer to state (or action) $q$ in gadget $C$. We connect some new actions to these two counters as follows.

  • The action go sets $C$ to $c_0$; this ensures that $C$ holds $c_0$ when starting to simulate $G$.

  • The action win is daemonic for every state $C.b_i^1$. This enforces that $C$ must hold $0$ when a strategy claims that Player 1 wins $G$.

  • Any action $d$ of the game gadget sets $D$ to $d$;

  • The action next is daemonic for every state $D.b_i^1$. This enforces that a strategy must first count down from $d$ to $0$ before it can simulate the next move in $G$.
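Since constants are written in binary, it matters that the counters only need logarithmically many bits in the values they hold: $C$ must represent every value in $0, \dots, c_0$, and $D$ every edge weight up to the largest one. A hedged sketch of that arithmetic (the helper name is hypothetical) relies on Python's `int.bit_length`, which for $m \ge 1$ equals $\lceil \log_2(m+1) \rceil$.

```python
def counter_widths(c0, edges):
    """Bits needed by the main counter (values 0..c0) and the auxiliary counter (0..d_max)."""
    d_max = max(d for (_, d, _) in edges)
    # For m >= 1, m.bit_length() == ceil(log2(m + 1)); keep at least one bit.
    return max(1, c0.bit_length()), max(1, d_max.bit_length())
```

For example, an initial counter value of $2^{40}$ needs only 41 bits, so the gadget size stays polynomial in the input size.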

Figure 2: A (4-bit) Binary Counter. Not displayed are edges labelled by $\mathit{dec}_i$ that make the respective actions daemonic for state $b_i^0$, and error actions $\mathit{err}_i$, which are daemonic for $b_i^0$ and $b_i^1$, for all bits $i$.

Control.

The control gadget enforces that a synchronizing strategy proposes actions in the proper order; see Figure 1 (right). Its states react to the actions of all gadgets above (including go, next, and win). It additionally contains a new error action, which is angelic for all states except those of the control gadget, for which it is daemonic.

Start/End.

To complete the construction of $\mathcal{M}$, we introduce an initial state $q_0$ and actions start and end. The action start takes $q_0$ to Wait (Waiting gadget), to the initial state of the Control gadget, and to all states $b_i^0$ of counters $C$ and $D$. It is daemonic for every other state.

The action end is daemonic for Wait and Ready, and angelic for every other state in $\mathcal{M}$.

Lemma 2. $\mathcal{M}^n$ is synchronizable for all $n$ if, and only if, Player 1 wins $G$ from $(v_0, c_0)$.

Proof.

Suppose Player 1 wins the game $G$ from $(v_0, c_0)$. Fix $n$. Recall that all components of the initial configuration $\mathsf{Start}$ mark $q_0$. A synchronizing strategy proceeds as follows:

  • Play start to initialize the Waiting and Control gadgets, and to set $C$ and $D$ to $0$. If any of the gadgets is not correctly initialized afterwards, play the respective error action to win directly. For instance, if some bit of counter $C$ is not marked at all, play the error action of that bit to synchronize.

  • Reduce the number of components marking Wait one by one until a configuration is reached in which Wait is not marked. Once this is true, play end to synchronize.

  • To reduce the number of components marking Wait, isolate one of them, and move it to Heaven by simulating the Countdown Game:

    1. Play wait until only a single component marks Ready, then play go. This will mark $v_0$ in the game gadget and set $C$ to $c_0$. Recall that $(v_0, c_0)$ is the initial pair of $G$.

    2. Simulate rounds of the game $G$: assume state $v$ in the game gadget is marked and counter $C$ holds $c$, and let $d$ be the number Player 1 plays to win from the pair $(v, c)$ in $G$. Play $d$. This action will set $D$ to $d$. Alternate between (safe) decrement actions in $C$ and $D$ until they hold $c - d$ and $0$, respectively. Play next.

    3. The above simulation of rounds in $G$ is repeated until both $C$ and $D$ hold $0$; since Player 1 wins $G$, this is possible. At this point it is safe to play win.
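The strategy above can be traced at an abstract level, treating each counter as an integer and each multi-bit decrement as a single step. In the sketch below the action names `dec_C` and `dec_D` are hypothetical labels for a safe decrement of $C$ or $D$, the opponent's edge choice is drawn uniformly at random, and Player 1's choice of weight comes from a memoized solver. It illustrates the bookkeeping only and does not model the control gadget.

```python
import random
from functools import lru_cache

def synchronizing_run(edges, v0, c0, rng):
    """Abstract action sequence isolating one component (hypothetical action names)."""
    moves = {}
    for (v, d, w) in edges:
        moves.setdefault(v, {}).setdefault(d, []).append(w)

    @lru_cache(maxsize=None)
    def wins(v, c):
        if c == 0:
            return True
        return any(d <= c and all(wins(w, c - d) for w in ws)
                   for d, ws in moves.get(v, {}).items())

    if not wins(v0, c0):
        return None  # Player 1 has no winning strategy from (v0, c0)
    run = ["go"]  # C := c0; the isolated component now marks v0
    v, C = v0, c0
    while C > 0:
        # Winning weight for Player 1 (exists because wins(v, C) holds).
        d = min(d for d, ws in moves[v].items()
                if d <= C and all(wins(w, C - d) for w in ws))
        run.append(d)  # D := d
        D = d
        while D > 0:  # alternate one decrement of C with one of D
            C, D = C - 1, D - 1
            run += ["dec_C", "dec_D"]
        run.append("next")
        v = rng.choice(moves[v][d])  # random opponent resolves the edge
    run.append("win")
    return run
```

Every successful run decrements $C$ exactly $c_0$ times, once per unit of the weights chosen along the play, so $C$ holds $0$ precisely when the loop ends and win is played.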

Conversely, assume that Player 1 cannot win $G$ from $(v_0, c_0)$. Suppose that after the (only possible) initial move start, all gadgets are correctly initialized. Clearly, for every $n$, this event has strictly positive probability. We argue that no strategy can synchronize such a configuration. Indeed, by construction of the control gadget, a successful strategy has to play a sequence of wait actions followed by go first, and then game, decrement, and next actions in the order enforced by the control gadget. If after playing go more than one component marks $v_0$, there is a non-zero chance that these will diverge, making subsequent game actions unsafe. If exactly one component marks $v_0$, then the second sequence of actions (assuming all actions are safe) corresponds to a play of $G$ against a randomizing opponent, which with positive probability follows a winning strategy of Player 2. This inevitably leads to a configuration in which counter $C$ holds some value $c > 0$ from which Player 1 has no legal move, and the control enforces that the next action is a game action. But any such action will be daemonic for some marked state and thus not be safe. We conclude that every strategy leads, with positive probability, to a configuration in which at least one component marks Hell, and thus cannot synchronize. ∎

Our claim follows immediately from Lemma 2 and Theorem 1:

Theorem 3. PopulationControl is EXPTIME-hard.

References

  • [1] Nathalie Bertrand, Miheer Dewaskar, Blaise Genest, Hugo Gimbert, and Adwait Amit Godbole. Controlling a population. Logical Methods in Computer Science, 2019.
  • [2] Marcin Jurdziński, François Laroussinie, and Jeremy Sproston. Model checking probabilistic timed automata with one or two clocks. In Tools and Algorithms for the Construction and Analysis of Systems (TACAS), 2007.
  • [3] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. 1st edition, 1994.