Write for a Markov Decision Process, where and , are finite sets of states and actions, and
assigns to each state and action a probability distribution over states. A successor of on action is a state with . The -fold synchronized product of is the MDP whose states, called configurations here, are
-dimensional vectors with components in, and is lifted to in the natural way: for all and . A strategy assigns to every configuration an action. (We consider here only memoryless deterministic strategies, as those suffice for almost-sure reachability problems .) For every initial state , such a strategy induces a probability space over all infinite sequences in (see  for details).
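The lifting of the transition function to configurations can be illustrated with a minimal Python sketch. It assumes, as is usual for such products, that the components move independently, so the probability of a successor configuration is the product of the per-component probabilities. The name `lifted_step` and the dictionary encoding of the transition function are hypothetical choices made for this illustration only.

```python
from fractions import Fraction
from itertools import product


def lifted_step(delta, config, action):
    """Successor distribution of a configuration under one action.

    delta maps (state, action) to a dict {successor: probability}.
    config is a tuple of states, one per component.
    Assumption: components move independently of each other.
    """
    per_component = [delta[(state, action)] for state in config]
    result = {}
    # Enumerate every combination of per-component successors.
    for combo in product(*(d.items() for d in per_component)):
        successor = tuple(state for state, _ in combo)
        prob = Fraction(1)
        for _, p in combo:
            prob *= p
        result[successor] = result.get(successor, 0) + prob
    return result
```

For a 2-fold product in which one action moves a state to either of two successors with probability 1/2 each, the sketch yields four successor configurations, each with probability 1/4.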
In a configuration , we say components mark state if the number of different indices with is equal to . In case we simply call the state marked in . Let denote the configurations in which all components mark a designated initial (or final, resp.) state of . A configuration can be synchronized if there exists a strategy such that the probability , of eventually visiting , is one. can be synchronized if Start can be synchronized.
Given an MDP equipped with initial and final states, the PopulationControl problem asks whether can be synchronized for every .
A Countdown Game is given by a directed graph , where edges carry positive integer weights, . For an initial pair of a vertex and a number, two opposing players (Player 1 and 2) alternatingly determine a sequence of such pairs as follows. In each round, from , Player 1 picks a number such that contains at least one edge ; then Player 2 picks one such edge and the game continues from . Player 1 wins the game iff the play reaches a pair in .
CountdownGame is the decision problem which asks if Player 1 has a strategy to win a given game for a given initial pair . All constants in the input are written in binary.
[Thm. 4.5 in Jurdziński, Laroussinie, and Sproston] CountdownGame is EXPTIME-complete.
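To make the rules of a Countdown Game concrete, here is a minimal Python sketch of a naive solver. It assumes the standard reading of the definition: Player 1 must pick a weight no larger than the current counter value, the counter decreases by the chosen weight, and Player 1 wins exactly when the counter reaches zero. The name `player1_wins` and the edge encoding as `(source, weight, target)` triples are hypothetical. The recursion explores up to one state per vertex and counter value, so it is exponential in the binary encoding of the counter and only suitable for small examples.

```python
from functools import lru_cache


def player1_wins(edges, start, counter):
    """Decide a Countdown Game by exhaustive memoized search.

    edges: iterable of (source, weight, target), weights positive integers.
    Assumption: Player 1 wins iff a pair with counter value 0 is reached.
    """
    moves = {}
    for src, weight, dst in edges:
        moves.setdefault(src, {}).setdefault(weight, set()).add(dst)

    @lru_cache(maxsize=None)
    def wins(vertex, remaining):
        if remaining == 0:
            return True
        # Player 1 picks a weight; Player 2 then picks any matching edge.
        for weight, targets in moves.get(vertex, {}).items():
            if weight <= remaining and all(
                wins(target, remaining - weight) for target in targets
            ):
                return True
        return False

    return wins(start, counter)
```

Because every weight is positive, the recursion depth is bounded by the initial counter value.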
2 The Reduction
In order to reduce CountdownGame to PopulationControl we first observe that the number of turns in a Countdown Game cannot exceed the initial value of the counter, as the counter value strictly decreases at each turn. Thus, if Player 2 has a winning strategy, an opponent who chooses moves at random follows that strategy with positive probability, and hence wins with positive probability. Therefore Player 1 wins the initial game if, and only if, she wins with probability one against a randomized adversary.
The main idea for our further construction is to require Player 1 to move components one-by-one away from a waiting state, first into the control graph of the Countdown Game, and ultimately into the goal. To avoid a loss in the intermediate phase she needs to win an instance of that game against a randomizing opponent. This is enforced using a combination of gadgets, including two binary counters that can effectively test for zero, be set to specific numbers, and that are set up so that they can decrement at the same rate. As a result, Player 1 has a winning strategy for the two-player Countdown Game if, and only if, the controller can synchronize the -fold product of the constructed MDP for all .
For a given Countdown Game with an initial pair we construct an MDP as follows.
We write that action takes state to successor to mean that .
The exact probability distributions do not matter in our construction so we let be the uniform distribution over such successors.
Whenever action takes state only back to itself we say that ignores . There are states Heaven (the target) and Hell which ignore all actions. For a given state , an action is angelic if it takes only to Heaven, and daemonic if it takes to Hell. An action is safe in a configuration if it is not daemonic for any marked state (in any gadget).
Besides the special states Heaven and Hell, contains several gadgets described below.
The waiting gadget has two states Wait and Ready which react to the action wait as depicted in Figure 1 (left). Whenever a configuration marks one of these states, a strategy that continuously plays wait will almost-surely reach a configuration in which exactly one component marks Ready.
A special action go (to indicate successful isolation of one component) takes Ready to the initial state of the game . All other actions (in gadgets described below) are ignored.
The game is directly interpreted as MDP: For every edge there is an action which takes to and which is daemonic for all states .
The action win is angelic for every state of . All other actions are ignored.
A (-bit) Counter consists of states for all and . For every bit there is a decrement action which
takes only to for all ,
takes only to ,
is daemonic for , and
is ignored by all , for all and .
We say that a configuration holds the number in this counter if it marks those states that represent the binary expansion of : for all , state is marked iff the th bit in the binary expansion of is . An action sets the counter to number if for all , it takes to only where is the th bit in the binary expansion of , and is daemonic for all (to ensure that the counter can only be set if it holds ). Observe that if a counter holds then there is a unique maximal sequence of safe decrement actions, that has length and after which the counter holds .
Additionally, for every bit the gadget has an error action , which is daemonic for and , and angelic for every other state (of ). These actions can be used to quickly synchronize any configuration in which the counter is not correctly initialized, i.e., does not hold a number. See Figure 2 for a depiction of a -bit counter.
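The counting behaviour of the counter gadget can be checked with a small Python sketch. It models only the held number, not the MDP states themselves, and it assumes the usual binary decrement: the decrement action for bit i is safe exactly when bit i is 1 and all lower bits are 0, and it then clears bit i and sets all lower bits to 1. All function names are hypothetical, and the daemonic cases are modelled by raising an exception.

```python
def to_bits(n, k):
    """Binary expansion of n as a list of k bits, least significant first."""
    return [(n >> i) & 1 for i in range(k)]


def from_bits(bits):
    return sum(b << i for i, b in enumerate(bits))


def safe_decrement(bits):
    """Index of the only safe decrement action, or None if the counter holds 0."""
    for i, b in enumerate(bits):
        if b == 1:
            return i
    return None


def apply_decrement(bits, i):
    """Apply the decrement action for bit i (assumed semantics, see lead-in)."""
    if bits[i] != 1 or any(bits[:i]):
        raise ValueError("unsafe: the action is daemonic for a marked state")
    return [1] * i + [0] + bits[i + 1:]


def count_down(n, k):
    """Play safe decrements from n until none is left; return (steps, final value)."""
    bits = to_bits(n, k)
    steps = 0
    while (i := safe_decrement(bits)) is not None:
        bits = apply_decrement(bits, i)
        steps += 1
    return steps, from_bits(bits)
```

Under these assumptions, `count_down(n, k)` takes exactly n steps and ends at 0, matching the observation that the maximal sequence of safe decrement actions is unique and has length equal to the held number.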
The MDP will contain two distinct counter gadgets. A main counter has bits to hold possible counter values of the Countdown Game. An auxiliary counter has many bits to hold the largest edge weight in . These have distinct sets of states and actions, so for clarity, we write to refer to state (or action) in gadget . We connect some new actions to these two counters as follows.
The action go sets to ; this ensures that holds when starting to simulate .
The action win is daemonic for every state . This enforces that the must hold when a strategy claims Player 1 wins .
Any action sets to ;
The action next is daemonic for every state . This enforces that a strategy must first count down from to before it can simulate the next move in .
The control gadget will enforce that a synchronizing strategy proposes actions in a proper order; see Figure 1. It consists of states , and contains actions of all gadgets above (including go and win) as well as a new action, which is angelic for all states except , for which it is daemonic. All omitted edges in Figure 1 are daemonic.
To complete the construction of , we introduce an initial state and actions start and end. The action start takes to Wait (Waiting gadget), (Control gadget), and all states of counters and . It is daemonic for every other state.
The action end is daemonic for Wait and Ready, and angelic for every other state in .
is synchronizable for all iff Player 1 wins .
Suppose Player 1 wins the game . Fix . Recall that in all components of the initial configuration mark . A synchronizing strategy proceeds as follows:
Play start to initialize the Waiting and Control gadgets, and to set and to . If any of the gadgets is not correctly initialized afterwards, play the respective error action to win directly. For instance, if is unmarked, play to synchronize.
Reduce the number of components marking Wait one by one until a configuration is reached in which Wait is not marked. Once this is true, play end to synchronize.
To reduce the number of components marking Wait, isolate one of them, and move it to Heaven by simulating the Countdown Game:
Play wait until only a single component marks Ready, then play go. This will mark in the game gadget and set to . Recall that is the initial pair of .
Simulate rounds of the game : assume state in the game gadget is marked and the counter holds , then let be the number Player 1 plays to win from the pair in . Play . This action will set to . Alternate between (safe) decrement actions in and until they hold and , respectively. Play next.
The above simulation of rounds in is repeated until both and hold ; this is possible by the assumption that Player 1 wins . At this point it is safe to play win.
Conversely, assume that Player 1 cannot win . Suppose that after the (only possible) initial move start, all gadgets are correctly initialized. Clearly, for every , this event has strictly positive probability. We argue that no strategy can synchronize such a configuration. Indeed, a successful strategy would have to play a sequence in first, followed by actions in , by construction of the control gadget. If, after playing go, more than one component marks , there is a non-zero chance that these will diverge, making subsequent actions in unsafe. If exactly one component marks then the second sequence of actions (assuming all actions are safe) corresponds to a play of . This inevitably leads to a configuration in which counter holds and the control enforces that the next action is in . But any such action will be daemonic for some state in and thus not be safe. We conclude that every strategy will lead to a configuration in which at least one component marks Hell and which thus cannot be synchronized. ∎
-  Nathalie Bertrand, Miheer Dewaskar, Blaise Genest, Hugo Gimbert, and Adwait Amit Godbole. Controlling a population. Logical Methods in Computer Science, 2019.
-  Marcin Jurdziński, François Laroussinie, and Jeremy Sproston. Model checking probabilistic timed automata with one or two clocks. In Tools and Algorithms for the Construction and Analysis of Systems (TACAS), 2007.
-  Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. 1st edition, 1994.