# Reachability Switching Games

In this paper, we study the problem of deciding the winner of reachability switching games. These games provide deterministic analogues of Markovian systems. We study zero-, one-, and two-player variants of these games. We show that the zero-player case is NL-hard, the one-player case is NP-complete, and that the two-player case is PSPACE-hard and in EXPTIME. In the one- and two-player cases, the problem of determining the winner of a switching game turns out to be much harder than the problem of determining the winner of a Markovian game. We also study the structure of winning strategies in these games, and in particular we show that both players in a two-player reachability switching game require exponential memory.


## 1 Introduction

Probabilistic model checking is an important topic in the field of formal verification. Like any model checking problem, it asks us to check that a given system satisfies a logical specification. In the probabilistic setting, the system itself makes use of probability, either in the form of an explicit use of randomization, or because the system interacts with a randomized environment. Probabilistic model checking is now a mature topic, with tools like PRISM providing an accessible interface to the research that has taken place.

In this paper, we approach this topic from a different view. Prior work has studied deterministic random systems, which attempt to replicate the properties of a random system in a deterministic way. A switching system (also known as a Propp machine) does this by replacing the nodes of a Markov chain with *switching nodes*. Each switching node maintains an ordered queue over its outgoing edges. When the system arrives at the node, it is sent along the first edge in this queue, and that edge is then sent to the back of the queue. In this way, the switching node ensures that, at any point, the numbers of times it has used each of its outgoing edges differ by at most one. This mimics a Markovian node with a uniform distribution over its outgoing edges, since such a node also ensures a fairness property over its outgoing edges, in expectation.
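The queue mechanism just described can be sketched in a few lines of Python. The class name `SwitchingNode` and the edge labels are illustrative only. The sketch prints the edge used on each of seven visits to a node with three outgoing edges, followed by the resulting usage counts.

```python
from collections import deque


class SwitchingNode:
    """A switching node (rotor) that cycles deterministically through its outgoing edges."""

    def __init__(self, successors):
        # Ordered queue over the outgoing edges; the front is the next edge to use.
        self.queue = deque(successors)

    def step(self):
        # Send the token along the first edge, then move that edge to the back.
        target = self.queue.popleft()
        self.queue.append(target)
        return target


node = SwitchingNode(["a", "b", "c"])
visits = [node.step() for _ in range(7)]
print(visits)  # ['a', 'b', 'c', 'a', 'b', 'c', 'a']

counts = {u: visits.count(u) for u in "abc"}
print(counts)  # {'a': 3, 'b': 2, 'c': 2}
```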

There has been much work that studies how well switching systems achieve their goal of simulating a Markov chain, which we discuss in more detail in the related work section. In this paper, however, we study the question: how hard is it to model check switching systems? We already have a good understanding of the complexity of model checking Markovian systems, but how does this change when we use switching nodes instead?

There are good reasons why a designer may want to implement a system using switching nodes. Firstly, true randomness is actually quite expensive, requiring specialist hardware to implement. Most systems actually use pseudorandom generators as their source of randomness, but these generators add complexity to the system. For example, the Mersenne Twister, used as the standard generator in Python, requires an extra 2.5 kilobytes of internal state. This complicates the program, and hence makes the model checking task much harder. By comparison, a switching system provides a much cheaper implementation, so long as the designer is willing to accept deterministic randomness. Another reason why a system designer may use switching nodes is that they naturally satisfy fairness properties. In fact, they do this better than random systems, which can only provide fairness in expectation.

#### Our contribution.

In this paper, we initiate the study of model checking in switching systems. We focus on reachability problems, one of the simplest model checking tasks. This corresponds to determining the winner of a two-player reachability switching game. We study zero-, one-, and two-player variants of these games, which correspond to switching versions of Markov chains, Markov decision processes, and simple stochastic games, respectively.

The main message of the paper is that deciding reachability in one- and two-player switching games is much harder than deciding reachability in Markovian systems. Our results are summarised in the table below.

The upper bounds for the 0-player case were shown before [13, 8]. All other upper and lower bounds for switching games are shown for the first time in this paper.

We also investigate the properties of winning strategies in these games. For the one-player case, we show that the reachability player can win using a marginal strategy, which simply counts the number of times that each edge has been used. For the two-player case, we show that exponential memory suffices for both players to win, and also that both players may require exponential memory in order to win.

#### Related work.

Our work was directly inspired by the work of Dohrau, Gärtner, Kohler, Matousek, and Welzl [8]. They studied zero-player reachability switching games and showed that the associated decision problem is in NP ∩ coNP. More recently, a further upper bound for the zero-player problem has been shown. The contribution of our work is to study these questions in the one- and two-player settings.

Switching games are part of a research thread at the intersection of computer science and physics. This thread has studied zero-player switching systems, also known as deterministic random walks, rotor-router walks, the Eulerian walkers model, and Propp machines [11, 5, 4, 7, 6, 12]. Propp machines have been studied in the context of derandomizing algorithms and pseudorandom simulation, and in particular have received a lot of attention in the context of load balancing [9, 1]. However, most work on Propp machines has focused on how well multi-token switching systems simulate Markov chains. The idea of studying single-token reachability questions should be credited to the work of Dohrau et al. [8] mentioned above.

Katz et al. and Groote and Ploeger considered switching graphs. These are graphs in which certain vertices (switches) have exactly one of their two outgoing edges activated. However, traversing a switch during a run does not toggle which of its edges is activated, and this is the key difference from the switching games studied in this paper. That model was also studied by others [10, 16, 19].

## 2 Preliminaries

A reachability switching game is defined by a tuple $(V, E, V_R, V_S, V_{\mathrm{Swi}}, \mathrm{Ord}, s, t)$, where $(V, E)$ is a finite graph, and $V_R$, $V_S$, $V_{\mathrm{Swi}}$ partition $V$ into reachability vertices, safety vertices, and switching vertices, respectively. The reachability vertices are controlled by the reachability player, the safety vertices are controlled by the safety player, and the switching vertices are not controlled by either player, but instead follow a predefined “switching order”. The function $\mathrm{Ord}$ defines this switching order: for each switching vertex $v \in V_{\mathrm{Swi}}$, we have that $\mathrm{Ord}(v) = \langle u_1, u_2, \dots, u_k \rangle$, where the sequence is required to be a permutation over the vertices that have an incoming edge from $v$. The vertices $s, t \in V$ specify source and target vertices for the game.

A state of the game is defined by a tuple $(v, C)$, where $v$ is a vertex in $V$, and $C: V_{\mathrm{Swi}} \to \mathbb{N}$ is a function that assigns a number to each switching vertex, which represents how far that vertex has progressed through its switching order. Hence, it is required that $C(u) < |\mathrm{Ord}(u)|$ for every $u \in V_{\mathrm{Swi}}$, since the counts specify an index to the sequence $\mathrm{Ord}(u)$.

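When the game is at a state $(v, C)$ with $v \in V_R$ or $v \in V_S$, then the respective player chooses an outgoing edge at $v$, and the count function does not change. For states with $v \in V_{\mathrm{Swi}}$, the successor state is determined by the count function. More specifically, we define $\mathrm{Upd}(C, v)$ so that for each $u \in V_{\mathrm{Swi}}$ we have

$$
\mathrm{Upd}(C, v)(u) =
\begin{cases}
(C(u) + 1) \bmod |\mathrm{Ord}(u)| & \text{if } v = u,\\
C(u) & \text{otherwise.}
\end{cases}
$$

This function increments the count at $v$ by $1$, and wraps around to $0$ when the count reaches $|\mathrm{Ord}(v)|$, the number of outgoing edges of $v$. Then, the successor state of $(v, C)$, denoted as $\mathrm{Succ}(v, C)$, is $(\mathrm{Ord}(v)_{C(v)+1}, \mathrm{Upd}(C, v))$, where $\mathrm{Ord}(v)_{C(v)+1}$ is the element at position $C(v)+1$ in $\mathrm{Ord}(v)$.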
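As an illustration, the update function $\mathrm{Upd}$ and the successor of a state at a switching vertex can be transcribed into Python as follows. This is a sketch under our own conventions. Count functions are dictionaries, `ord_` maps each switching vertex to its switching order as a list, and list positions are 0-based, so `ord_[v][count[v]]` is the $(C(v)+1)$st element of $\mathrm{Ord}(v)$. The names `upd` and `succ` are hypothetical.

```python
def upd(count, ord_, v):
    """Upd(C, v): increment the count of v modulo |Ord(v)|; other counts are unchanged."""
    new = dict(count)  # return a new count function rather than mutating C
    new[v] = (count[v] + 1) % len(ord_[v])
    return new


def succ(v, count, ord_):
    """Successor of the state (v, C) at a switching vertex v."""
    return ord_[v][count[v]], upd(count, ord_, v)


# Hypothetical switching vertex v whose switching order is (a, b).
ord_ = {"v": ["a", "b"]}
count = {"v": 0}
nxt1, count = succ("v", count, ord_)
nxt2, count = succ("v", count, ord_)
nxt3, count = succ("v", count, ord_)
print(nxt1, nxt2, nxt3, count)  # a b a {'v': 1}
```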

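A play of the game is a (potentially infinite) sequence of states $(v_0, C_0), (v_1, C_1), (v_2, C_2), \dots$ with the following properties:

1. $v_0 = s$ and $C_0(u) = 0$ for all $u \in V_{\mathrm{Swi}}$;

2. If $v_i \in V_R$ or $v_i \in V_S$, then $(v_i, v_{i+1}) \in E$ and $C_{i+1} = C_i$;

3. If $v_i \in V_{\mathrm{Swi}}$, then $(v_{i+1}, C_{i+1})$ is the successor state of $(v_i, C_i)$;

4. If the play is finite, then the final state $(v_k, C_k)$ must either satisfy $v_k = t$, or $v_k$ must have no outgoing edges.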

A play is winning for the reachability player if the play is finite and the final state is at the target vertex $t$. A (deterministic, history dependent) strategy for the reachability player is a function that maps a play prefix $(v_0, C_0), (v_1, C_1), \dots, (v_k, C_k)$ with $v_k \in V_R$ to an outgoing edge of $v_k$. A play is consistent with a strategy if, whenever $v_i \in V_R$, we have that $v_{i+1}$ is the vertex chosen by the strategy for the prefix ending at $(v_i, C_i)$. A strategy is winning for the reachability player if every play that is consistent with the strategy is winning for the reachability player. Strategies for the safety player are defined analogously.
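A play under given strategies can be simulated directly from these definitions. In this sketch (all names hypothetical), `owner` labels each vertex `'R'`, `'S'`, or `'Swi'`, and strategies receive the play prefix as a list of (vertex, counts) pairs. An assumed step cutoff `max_steps` stands in for an infinite play, which is losing for the reachability player.

```python
def run_play(owner, edges, ord_, s, t, strategies, max_steps=10_000):
    """Simulate one play and return 'reach' if it ends at t, and 'safety' otherwise.

    owner[v] is 'R', 'S' or 'Swi'; edges[v] lists the successors of v; ord_[v] is the
    switching order of a switching vertex v; strategies maps 'R'/'S' to a function
    from the play prefix (a list of (vertex, counts) pairs) to the next vertex.
    """
    counts = {u: 0 for u, o in owner.items() if o == "Swi"}
    v = s
    prefix = [(v, dict(counts))]
    for _ in range(max_steps):
        if v == t:
            return "reach"   # finite play ending at the target
        if not edges.get(v):
            return "safety"  # play ends at a vertex with no outgoing edges
        if owner[v] == "Swi":
            nxt = ord_[v][counts[v]]                    # follow the switching order
            counts[v] = (counts[v] + 1) % len(ord_[v])  # then apply Upd
        else:
            nxt = strategies[owner[v]](prefix)
            if nxt not in edges[v]:
                raise ValueError(f"strategy chose a non-edge ({v}, {nxt})")
        v = nxt
        prefix.append((v, dict(counts)))
    return "safety"          # cutoff reached: treated as an infinite play


# Hypothetical one-player example: Ord(s) = (a, t), and the reachability vertex a
# can go back to s or to a dead end "fail".
owner = {"s": "Swi", "a": "R", "t": "R", "fail": "R"}
edges = {"s": ["a", "t"], "a": ["s", "fail"], "t": [], "fail": []}
ord_ = {"s": ["a", "t"]}
result_back = run_play(owner, edges, ord_, "s", "t", {"R": lambda prefix: "s"})
result_fail = run_play(owner, edges, ord_, "s", "t", {"R": lambda prefix: "fail"})
print(result_back, result_fail)  # reach safety
```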

## 3 One-player reachability switching games

In this section we consider one-player reachability switching games, i.e., games with $V_S = \emptyset$.

### 3.1 Containment in NP

We show that deciding whether the reachability player wins a one-player reachability switching game is in NP. The proof uses controlled switching flows. These extend the idea of switching flows, which were used by Dohrau et al. [8] to show containment of the zero-player reachability problem in NP ∩ coNP.

#### Controlled switching flow.

A flow is a function $F: E \to \mathbb{N}$ that assigns a natural number to each edge in the game. For each vertex $v$, we define

$$
\mathrm{Bal}(F, v) = \sum_{(v, u) \in E} F(v, u) - \sum_{(w, v) \in E} F(w, v)
$$

to be the difference between the outgoing and incoming flow at $v$.

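A flow $F$ is a controlled switching flow if it satisfies the following constraints:

• The source vertex satisfies $\mathrm{Bal}(F, s) = 1$.

• The target vertex satisfies $\mathrm{Bal}(F, t) = -1$.

• Every vertex $v$ other than $s$ or $t$ satisfies $\mathrm{Bal}(F, v) = 0$.

• Let $v \in V_{\mathrm{Swi}}$ be a switching node and $\mathrm{Ord}(v) = \langle u_1, u_2, \dots, u_k \rangle$. There exists a constant $c \in \mathbb{N}$ and an index $j \in \{0, 1, \dots, k\}$ such that

  • $F(v, u_i) = c + 1$ for all $i \le j$.

  • $F(v, u_i) = c$ for all $i > j$.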

The first three constraints ensure that $F$ is actually a flow from $s$ to $t$, while the final constraint ensures that the flow respects the switching order at each switching node. Note that there are no constraints on how the flow is split at the nodes in $V_R$.
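These constraints are easy to check mechanically, which is what makes a controlled switching flow a useful certificate. The following Python sketch uses hypothetical function names and represents a flow as a dictionary from edges to natural numbers, with missing edges carrying zero flow. It is tested on a small hypothetical game in which $\mathrm{Ord}(s) = \langle a, t \rangle$ and the reachability vertex $a$ has edges to $s$ and to a dead end.

```python
def bal(flow, v):
    """Bal(F, v): total outgoing flow minus total incoming flow at v."""
    out_flow = sum(f for (a, _), f in flow.items() if a == v)
    in_flow = sum(f for (_, b), f in flow.items() if b == v)
    return out_flow - in_flow


def respects_order(flow, v, order):
    """Switching constraint: along Ord(v) the flows read c+1, ..., c+1, c, ..., c."""
    values = [flow.get((v, u), 0) for u in order]
    c = min(values)
    j = sum(1 for x in values if x == c + 1)  # how many entries equal c+1; they must come first
    return all(x == c + 1 for x in values[:j]) and all(x == c for x in values[j:])


def is_controlled_switching_flow(vertices, flow, ord_, s, t):
    if any(f < 0 for f in flow.values()):
        return False
    for v in vertices:
        expected = 1 if v == s else (-1 if v == t else 0)
        if bal(flow, v) != expected:
            return False
    return all(respects_order(flow, v, order) for v, order in ord_.items())


vertices = ["s", "a", "t", "fail"]
ord_ = {"s": ["a", "t"]}
good = {("s", "a"): 1, ("a", "s"): 1, ("s", "t"): 1}
bad = {("s", "t"): 1}  # sends flow to t although a comes first in Ord(s)
print(is_controlled_switching_flow(vertices, good, ord_, "s", "t"))  # True
print(is_controlled_switching_flow(vertices, bad, ord_, "s", "t"))   # False
```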

#### Marginal strategies.

A marginal strategy for the reachability player is defined by a function $M$ that assigns a target number $M(e) \in \mathbb{N}$ to each outgoing edge $e$ of the vertices in $V_R$. The strategy ensures that each edge $e$ is used no more than $M(e)$ times. That is, when the play arrives at a vertex $v \in V_R$, the strategy checks how many times each outgoing edge of $v$ has been used so far, and selects an outgoing edge $e$ that has been used strictly fewer than $M(e)$ times. If there is no such edge, then the strategy is undefined.
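A marginal strategy only needs to remember how often each edge has been used. Here is a minimal Python sketch with hypothetical names. `targets` holds $M$ and `used` holds the usage counts so far. When several edges qualify, the sketch simply picks the first one listed, which is one arbitrary choice among those the definition allows.

```python
def marginal_choice(v, targets, used):
    """Return a successor u of v whose edge (v, u) has been used fewer than M((v, u)) times.

    targets maps edges to M(e); used maps edges to how often they were taken so far.
    Returns None when no such edge exists, i.e. where the strategy is undefined.
    """
    for (a, b), m in targets.items():
        if a == v and used.get((a, b), 0) < m:
            return b
    return None


targets = {("a", "s"): 1, ("a", "fail"): 0}
print(marginal_choice("a", targets, {}))               # s
print(marginal_choice("a", targets, {("a", "s"): 1}))  # None
```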

Observe that a controlled switching flow $F$ defines a marginal strategy for the reachability player, given by $M(e) = F(e)$ for every outgoing edge $e$ of a vertex in $V_R$. We prove that this strategy always reaches the target.

If a one-player reachability switching game has a controlled switching flow $F$, then the corresponding marginal strategy is winning for the reachability player.

###### Proof.

The proof will be by induction on the total amount of flow in $F$, which is defined as $\sum_{e \in E} F(e)$.

The base case is $\sum_{e \in E} F(e) = 1$. The requirements of a controlled switching flow imply that $F(s, t) = 1$, and all other edges have no flow at all. If $s \in V_R$, then the corresponding marginal strategy is required to choose the edge $(s, t)$, and thus it is a winning strategy. If $s \in V_{\mathrm{Swi}}$, then the balance requirement of a controlled switching flow ensures that $t$ is the first vertex in $\mathrm{Ord}(s)$, so the switching node will move to $t$, and the reachability player will win the game.

There are two cases to consider for the inductive step. First, assume that $s \in V_R$, and that $\sum_{e \in E} F(e) > 1$. Let $(s, u)$ be the outgoing edge chosen by the marginal strategy (this can be any node $u$ that satisfies $F(s, u) > 0$). If $G$ denotes the current game, then we can create a new switching game $G'$, which is identical to $G$, but where $u$ is the designated starting node. Moreover, we can create a controlled switching flow $F'$ for $G'$ by setting $F'(s, u) = F(s, u) - 1$ and leaving all other flow values unchanged. Observe that all properties of a controlled switching flow continue to hold for $F'$. Since $\sum_{e \in E} F'(e) < \sum_{e \in E} F(e)$, the inductive hypothesis implies that the marginal strategy that corresponds to $F'$ (which is consistent with the marginal strategy for $F$) is winning for the reachability player.

The second case for the inductive step is when $s \in V_{\mathrm{Swi}}$ and $\sum_{e \in E} F(e) > 1$. Let $(s, u)$ be the first edge in $\mathrm{Ord}(s)$, which is the edge that the switching node will use. Again we can define a new game $G'$ where the starting node is $u$, and in which $\mathrm{Ord}(s)$ has been rotated so that $u$ appears at the end of the sequence. We can define a controlled switching flow $F'$ for $G'$ where $F'(s, u) = F(s, u) - 1$ and all other flow values are unchanged. Observe that $F'$ satisfies all conditions of a controlled switching flow, and in particular that rotating $\mathrm{Ord}(s)$ allows $F'$ to continue to satisfy the balance constraint on its outgoing edges. Again, since $\sum_{e \in E} F'(e) < \sum_{e \in E} F(e)$, the marginal strategy corresponding to $F'$ (which is identical to the marginal strategy for $F$) is winning for the reachability player. ∎

In the other direction, if the reachability player has a winning strategy for the game, then we can show that there exists a controlled switching flow.

If the reachability player has a winning strategy for a one-player reachability switching game, then that game has a controlled switching flow of bounded size.

###### Proof.

Let $(v_0, C_0), (v_1, C_1), \dots, (v_k, C_k)$ be the play that is produced when the reachability player uses his winning strategy. We may assume that during the play no state is repeated. This is without loss of generality. The rest of the game depends only on the current state, so if some state is repeated, the reachability player can cut out the part of the play between the two occurrences (by playing from the first occurrence as he did from the second) and still win. Thus, if the play visits a fixed vertex multiple times, then for each visit the switch configuration must be unique. It follows that each vertex is visited at most $N = \prod_{u \in V_{\mathrm{Swi}}} |\mathrm{Ord}(u)|$ times, which is the number of switch configurations. Define the flow $F$ so that $F(e)$ is the number of times $e$ is used by the play. Since each vertex is visited at most $N$ times, we have $F(e) \le N$ for all $e \in E$. We claim that $F$ is a controlled switching flow. In particular, since the play is a path through the graph starting at $s$ and ending at $t$, we will have $\mathrm{Bal}(F, s) = 1$ and $\mathrm{Bal}(F, t) = -1$, and we will have $\mathrm{Bal}(F, v) = 0$ for every vertex $v$ other than $s$ and $t$. Moreover, it is not difficult to verify that the balance constraint will be satisfied for every vertex $v \in V_{\mathrm{Swi}}$. ∎
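The flow used in this proof is simply an edge count over the play. The following Python sketch uses hypothetical names. It builds the flow from the sequence of vertices visited by a play, then checks the balance values at the source, at an intermediate vertex, and at the target.

```python
from collections import Counter


def flow_from_play(path):
    """F(e) = number of times the play traverses e, for consecutive vertices in path."""
    return Counter(zip(path, path[1:]))


def bal(flow, v):
    """Outgoing minus incoming flow at v."""
    return (sum(f for (a, _), f in flow.items() if a == v)
            - sum(f for (_, b), f in flow.items() if b == v))


# Hypothetical winning play that visits s twice before reaching t.
path = ["s", "a", "s", "t"]
F = flow_from_play(path)
print(dict(F))                               # {('s', 'a'): 1, ('a', 's'): 1, ('s', 't'): 1}
print([bal(F, v) for v in ("s", "a", "t")])  # [1, 0, -1]
```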

Combining the two previous lemmas yields the following corollary.

If the reachability player has a winning strategy for a one-player reachability switching game, then he also has a marginal winning strategy.

Finally, we can show that solving a one-player reachability switching game is in NP.

Deciding the winner of a one-player reachability switching game is in NP.

###### Proof.

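By the two lemmas above, the reachability player can win if and only if the game has a controlled switching flow of bounded size. Each flow value of such a flow is at most $\prod_{u \in V_{\mathrm{Swi}}} |\mathrm{Ord}(u)| \le |V|^{|V|}$, so the flow can be written down using polynomially many bits. Moreover, we can guess a flow, and check whether it satisfies the requirements of a controlled switching flow in polynomial time. ∎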

### 3.2 NP-hardness

In this section we show that deciding the winner of a one-player reachability switching game is NP-hard. We will do so by reducing from 3SAT. Throughout this section, we will refer to a 3SAT instance with variables $x_1, x_2, \dots, x_n$, and clauses $C_1, C_2, \dots, C_m$. It is well-known [20, Thm. 2.1] that 3SAT remains NP-hard even if all clauses contain at most three variables, and all variables appear in at most three clauses. We make this assumption during our reduction.

#### Overview.

At a high level, the idea behind the construction is that the reachability player will be asked to assign values to each variable. Each variable $x_i$ will have a corresponding vertex that will be visited three times during the game. Each time this vertex is visited, the reachability player will be asked to assign a value to $x_i$ in a particular clause $C_j$. If the player chooses an assignment that does not satisfy $C_j$, then the game records this by incrementing a counter. If the counter corresponding to any clause $C_j$ is incremented to three (or two if the clause only has two variables), then the reachability player immediately loses, since the chosen assignment fails to satisfy $C_j$.

The problem with the idea presented so far is that there is no mechanism to ensure that the reachability player chooses a consistent assignment to the same variable. Since each variable is visited three times, there is nothing to stop the reachability player from choosing contradictory assignments to $x_i$ on each visit. To address this, the game also counts how many times each assignment is chosen for $x_i$. At the end of the game, if the reachability player has not already lost by failing to satisfy the formula, the game is configured so that the target is only reachable if the reachability player chose a consistent assignment.

A high-level overview of the construction for an example formula is given in Fig. 1.

The sequencing in the construction is determined by the control gadget, which is shown in Fig. 2. In our diagramming notation, square vertices belong to the reachability player. Circle vertices are switching nodes, and the switching order of each switching vertex is labelled on its outgoing edges. Our diagrams also include counting gadgets, which are represented as non-square rectangles that have labelled output edges. The counting gadget is labelled by a sequence over these outputs, with the idea being that if the play repeatedly reaches the gadget, then the corresponding output sequence will be produced. For instance, a gadget labelled $a^k b$ moves the token along the $a$ edge the first $k$ times it is used, and along the $b$ edge the $(k+1)$st time it is used. Such a gadget can be easily implemented by a switching node that has $k+1$ outgoing edges, the first $k$ of which go to $a$, while the $(k+1)$st edge goes to $b$. We use gadgets in place of this because it simplifies our diagrams.
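The behaviour of a counting gadget can be sketched as a switching node whose switching order is its label. The Python generator below has a hypothetical name, `counting_gadget`. It assumes, for illustration, that repeated entries in the label stand for parallel edges to the same output. Like a switching node, `itertools.cycle` wraps around once the label is exhausted.

```python
from itertools import cycle, islice


def counting_gadget(label):
    """Yield the gadget's output on each successive visit, cycling through the label."""
    return cycle(label)


outputs = list(islice(counting_gadget(["a"] * 3 + ["b"]), 5))
print(outputs)  # ['a', 'a', 'a', 'b', 'a']
```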

The control gadget has two phases. In the variable phase, each variable gadget, represented by the vertices $x_1$ through $x_n$, is used exactly 3 times, and thus overall the control gadget will be used $3n$ times. This is accomplished by a switching node that ensures that each variable is used 3 times. After each variable gadget has been visited 3 times, the control gadget sends the token to the gadget for $x_1$ to begin the verification phase of the game. In this phase, the reachability player must prove that he gave consistent assignments to all variables. If the control gadget is visited a $(3n+2)$nd time, then the token will be moved to the fail vertex. This vertex has no outgoing edges, and thus is losing for the reachability player.

Each variable is represented by a variable gadget, which is shown in Figure 3. This gadget will be visited 3 times in total during the variable phase, and each time the reachability player must choose either the true or the false edge at the vertex $x_i$. In either case, the token will then pass through a counting gadget, and then move to a switching vertex which either moves the token to a clause gadget, or back to the start vertex.

It can be seen that the gadget is divided into two almost identical branches. One corresponds to a true assignment to $x_i$, and the other to a false assignment to $x_i$. The clause gadgets are divided between the two branches of the gadget. In particular, a clause appears on a branch if and only if the choice made by the reachability player fails to satisfy the clause. So, the clauses in which $x_i$ appears positively appear on the false branch of the gadget, while the clauses in which $x_i$ appears negatively appear on the true branch.

The switching vertices on the two branches each have exactly three outgoing edges. These edges use an arbitrary order over the clauses assigned to the branch. If there are fewer than 3 clauses on a particular branch, the remaining edges of the switching node go back to the start vertex. Note that this means that a variable can be involved with fewer than three clauses.

The counting gadgets will be used during the verification phase of the game, in which the reachability player must prove that he has chosen consistent assignments to each of the variables. Once each variable gadget has been used 3 times, the token will be moved to $x_1$ by the control gadget. If the reachability player has used the same branch three times, then he can choose that branch, and move to $x_2$, which again has the same property. So, if the reachability player gives a consistent assignment to all variables, he can eventually pass through the gadget for $x_n$, and then move on to $t$, which is the target vertex of the game. Since, as we will show, there is no other way of reaching $t$, this ensures that the reachability player must give consistent assignments to the variables in order to win the game.

Each clause is represented by a clause gadget, an example of which is shown in Figure 4. The gadget counts how many variables have failed to satisfy the corresponding clause. If the number of times the gadget is visited is equal to the number of variables involved with the clause, then the game moves to the fail vertex, and the reachability player immediately loses. In all other cases, the token moves back to the start vertex.
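To make the counting argument concrete, the following Python sketch abstracts away the graph and only tracks the counters that the gadgets maintain. It is an illustration under our own assumptions, not the construction itself. The assumptions are:

- Clauses are DIMACS-style lists of nonzero integers over distinct variables.
- `choices` (hypothetical) gives the value picked on each of the three visits to each variable gadget.
- The $k$th use of a branch follows the $k$th clause assigned to that branch, or returns to the start vertex if there is none.

Because the counters only ever increase, the order in which variables are visited does not affect the outcome, so the sketch visits them one variable at a time.

```python
def reduction_outcome(clauses, choices):
    """Return True iff the token would reach the target under the given choices."""
    # A clause is placed on the branch whose value fails to satisfy it.
    branch_clauses = {}
    for j, clause in enumerate(clauses):
        for lit in clause:
            failing_value = lit < 0  # x fails when False; (not x) fails when True
            branch_clauses.setdefault((abs(lit), failing_value), []).append(j)

    clause_hits = [0] * len(clauses)
    branch_uses = {}
    # Variable phase: each variable gadget is visited three times.
    for var, picks in choices.items():
        for value in picks:
            k = branch_uses.get((var, value), 0)
            branch_uses[(var, value)] = k + 1
            targets = branch_clauses.get((var, value), [])
            if k < len(targets):  # k-th edge of the branch's switching node
                j = targets[k]
                clause_hits[j] += 1
                if clause_hits[j] == len(clauses[j]):
                    return False  # every variable of C_j failed it: fail vertex
    # Verification phase: each variable needs a branch that was used three times.
    return all(branch_uses.get((v, True), 0) == 3 or branch_uses.get((v, False), 0) == 3
               for v in choices)


# (x1 or x2) and (not x1 or x2)
clauses = [[1, 2], [-1, 2]]
print(reduction_outcome(clauses, {1: [True] * 3, 2: [True] * 3}))           # True
print(reduction_outcome(clauses, {1: [True] * 3, 2: [False] * 3}))          # False
print(reduction_outcome(clauses, {1: [True, True, False], 2: [True] * 3}))  # False
```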

#### Correctness.

The following pair of lemmas show that the reachability player wins the one-player reachability switching game if and only if the 3SAT instance is satisfiable.

If there is a satisfying assignment to the 3SAT formula, then the reachability player can win the one-player reachability switching game.

###### Proof.

The strategy for the reachability player is as follows: at each variable vertex $x_i$, choose the branch that corresponds to the value of $x_i$ in the satisfying assignment. We argue that this is a winning strategy. First note that the game cannot be lost in a clause gadget during the variable phase. Since the assignment is satisfying, the play cannot visit a clause gadget more than twice (or more than once if the clause only has two variables), and therefore the edges from the counting gadgets to the fail vertex cannot be used. Hence, the game will eventually reach the verification phase. At this point, since the strategy always chooses the same branch, the play will pass through $x_1, x_2, \dots, x_n$, and then arrive at $t$. Since this is the target, the reachability player wins the game. ∎

If the reachability player wins the one-player reachability switching game, then there is a satisfying assignment of the 3SAT formula.

###### Proof.

We begin by arguing that, if the reachability player wins the game, then he must have chosen the same branch at every visit to every variable gadget. This holds because $t$ can only be reached by ensuring that each variable has a branch that is visited at least 3 times. The control gadget causes the reachability player to immediately lose the game if it is visited $3n+2$ times. Thus, the reachability player must win the game after passing through the control gadget exactly $3n+1$ times. The only way to do this is to ensure that each variable has a branch that is visited exactly 3 times during the variable phase.

Thus, given a winning strategy for the game, we can extract a consistent assignment to the variables in the 3SAT instance. Since the game was won, we know that the game did not end in a clause gadget, and therefore under this assignment every clause has at least one literal that is true. Thus, the assignment satisfies the 3SAT instance. ∎

Hence, we have the following theorem.

Deciding the winner of a one-player reachability switching game is NP-hard.

## 4 Two-player reachability switching games

### 4.1 Containment in EXPTIME

We first observe that solving a reachability switching game lies in EXPTIME. This follows from the fact that the game can be simulated by an alternating Turing machine, which is a machine that has both existential (non-deterministic) and universal control states. It has been shown that APSPACE = EXPTIME [2]. This means that if we can devise an algorithm that runs in polynomial space on an alternating Turing machine, then we can obtain an algorithm that runs in exponential time on a deterministic Turing machine.

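It is straightforward to implement a reachability switching game on an alternating Turing machine. The machine simulates a run of the game. It starts by placing a token on the starting vertex $s$, and then simulates each step of the game:

- When the token arrives at a vertex belonging to the reachability player, the machine uses existential non-determinism to choose a move for that player.
- When the token arrives at a vertex belonging to the safety player, the machine uses universal branching, so that every move of that player is explored.
- The moves at the switching nodes are simulated by remembering the current switch configuration, which can be done in polynomial space.

To guarantee termination, the machine also counts the simulated steps and rejects once the count exceeds the number of states $|V| \cdot \prod_{u \in V_{\mathrm{Swi}}} |\mathrm{Ord}(u)|$. This is safe because, in a reachability game over a finite state space, the reachability player can win, if at all, within that many steps. The machine accepts if and only if the game arrives at the target vertex $t$.

This machine uses polynomial space, because it only needs to remember the current vertex, the switch configuration, and the step counter, each of which takes polynomially many bits. Thus, we have the following theorem.

Deciding the winner of a reachability switching game is in EXPTIME.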
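A deterministic rendering of this alternating machine is a depth-first search with the same step bound, in which reachability vertices become `any` and safety vertices become `all`. The Python sketch below has a hypothetical name, `reach_wins`. It keeps only the current recursion path in memory but may take exponential time, mirroring the APSPACE = EXPTIME argument. Python's recursion limit makes it suitable only for tiny games.

```python
from math import prod


def reach_wins(owner, edges, ord_, s, t):
    """Decide whether the reachability player wins, by exhaustive alternating search."""
    switches = sorted(ord_)
    index = {u: i for i, u in enumerate(switches)}
    bound = len(owner) * prod(len(ord_[u]) for u in switches)  # number of states

    def win(v, config, steps):
        if v == t:
            return True
        if not edges.get(v) or steps >= bound:
            return False  # dead end, or longer than any play the winner needs
        if owner[v] == "Swi":
            i = index[v]
            nxt = ord_[v][config[i]]
            config = config[:i] + ((config[i] + 1) % len(ord_[v]),) + config[i + 1:]
            return win(nxt, config, steps + 1)
        results = (win(u, config, steps + 1) for u in edges[v])
        # Existential branching for the reachability player, universal for safety.
        return any(results) if owner[v] == "R" else all(results)

    return win(s, tuple(0 for _ in switches), 0)


# Hypothetical game: Ord(s) = (a, t); vertex a has edges to s and to a dead end.
edges = {"s": ["a", "t"], "a": ["s", "fail"], "t": [], "fail": []}
ord_ = {"s": ["a", "t"]}
print(reach_wins({"s": "Swi", "a": "R", "t": "R", "fail": "R"}, edges, ord_, "s", "t"))  # True
print(reach_wins({"s": "Swi", "a": "S", "t": "R", "fail": "R"}, edges, ord_, "s", "t"))  # False
```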

### 4.2 PSPACE-hardness

We show that deciding the winner of a two-player reachability switching game is PSPACE-hard, by reducing true quantified boolean formula (TQBF), the canonical PSPACE-complete problem, to our problem. Throughout this section we will refer to a TQBF instance $Q_1 x_1\, Q_2 x_2 \cdots Q_n x_n\, \phi(x_1, \dots, x_n)$ with each $Q_i \in \{\exists, \forall\}$. Here $\phi$ denotes a boolean formula given in negation normal form, which requires that negations are only applied to variables, and not sub-formulas. The problem is to decide whether this formula is true.

#### Overview.

We will implement the TQBF formula as a game between the reachability player and the safety player. This game will have two phases. In the quantifier phase, the two players assign values to their variables in the order specified by the quantifiers. In the formula phase, the two players determine whether $\phi$ is satisfied by these assignments by playing the standard model-checking game for propositional logic. The target state of the game is reached if and only if the model checking game determines that the formula is satisfied. This high-level view of our construction is depicted in Fig. 5.

#### The quantifier phase.

Each variable $x_i$ in the TQBF formula will be represented by an initialization gadget. The initialization gadget for an existentially quantified variable is shown in Fig. 6. The gadget for a universally quantified variable is almost identical, but the choice vertex of the gadget is instead controlled by the safety player.

During the quantifier phase, the game will start at the gadget for $x_1$, and then pass through the gadgets for each of the variables in sequence. In each gadget, the controller of $x_i$ must move to either the switching node representing $x_i$ or the switching node representing $\neg x_i$. In either case, the corresponding switching node moves the token to a shared intermediate vertex of the gadget, which then subsequently moves the token on to the gadget for $x_{i+1}$.

The important property to note here is that once the player has made a choice, any subsequent visit to the $x_i$ or $\neg x_i$ node will end the game. Suppose that the controller of $x_i$ chooses to move to $x_i$. If the token ever arrives at $x_i$ a second time, then the switching node will move to the target vertex and the reachability player will immediately win the game. If the token ever arrives at $\neg x_i$, the token will move to the shared intermediate vertex for a second time and then on to the fail vertex, and the safety player will immediately win the game. The same property holds symmetrically if the controller of $x_i$ chooses $\neg x_i$ instead. In this way, the controller of $x_i$ selects an assignment to $x_i$. Hence, the reachability player assigns values to the existentially quantified variables, and the safety player assigns values to the universally quantified variables.

#### The formula phase.

Once the quantifier phase has ended, the game moves into the formula phase. In this phase the two players play a game to determine whether $\phi$ is satisfied by the assignments to the variables. This is the standard model checking game for propositional logic. The players play a game on the parse tree of the formula, starting from the root. The reachability player controls the $\lor$ nodes, while the safety player controls the $\land$ nodes (recall that the formula is in negation normal form, so there are no internal $\neg$ nodes). Each leaf is either a variable or its negation, which in our game are represented by the $x_i$ and $\neg x_i$ nodes in the initialization gadgets. An example of this game is shown in Figure 7. In our diagramming notation, nodes controlled by the safety player are represented by triangles.

Intuitively, if $\phi$ is satisfied by the assignment to $x_1$ through $x_n$, then no matter what the safety player does, the reachability player should be able to reach a leaf node corresponding to a true assignment, and as we discussed earlier, he will then immediately win the game. Conversely, if $\phi$ is not satisfied by the assignment, then no matter what the reachability player does, the safety player can reach a leaf corresponding to a false assignment, and then immediately win the game.
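The two phases can be mirrored by a short recursive evaluator, sketched in Python below with hypothetical names and an ad hoc tuple encoding of NNF formulas (variables are numbered from 0). The formula phase is the model-checking game: at $\lor$ nodes the reachability player needs some winning child, and at $\land$ nodes every child must be winning for him. The quantifier phase branches existentially or universally on each variable in turn. This evaluates the game directly rather than building the switching graph.

```python
def mc_game(phi, assignment):
    """Winner of the model-checking game on an NNF formula: True iff the reachability player wins.

    Encoding (ours): ('var', i), ('not', i), ('or', f, g, ...) or ('and', f, g, ...).
    """
    kind = phi[0]
    if kind == "var":
        return assignment[phi[1]]      # leaf x_i: won iff x_i was set to true
    if kind == "not":
        return not assignment[phi[1]]  # leaf (not x_i): won iff x_i was set to false
    children = (mc_game(child, assignment) for child in phi[1:])
    # 'or' nodes: reachability player picks a child; 'and' nodes: safety player picks.
    return any(children) if kind == "or" else all(children)


def tqbf(quantifiers, phi, assignment=()):
    """Quantifier phase ('E' or 'A' per variable) followed by the formula phase."""
    i = len(assignment)
    if i == len(quantifiers):
        return mc_game(phi, assignment)
    branches = (tqbf(quantifiers, phi, assignment + (b,)) for b in (True, False))
    return any(branches) if quantifiers[i] == "E" else all(branches)


# Exists x0 . Forall x1 . (x0 or x1) and (x0 or not x1): true (take x0 = true)
phi1 = ("and", ("or", ("var", 0), ("var", 1)), ("or", ("var", 0), ("not", 1)))
# Forall x0 . Exists x1 . x0 and x1: false (x0 = false)
phi2 = ("and", ("var", 0), ("var", 1))
print(tqbf(["E", "A"], phi1), tqbf(["A", "E"], phi2))  # True False
```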

The reachability player wins if and only if the QBF formula is true.

The proof can be found in Appendix A. Thus, we have shown the following theorem.

Deciding the winner of a reachability switching game is PSPACE-hard.

Note that all runs of the game have polynomial length, a property that is not shared by all reachability switching games. This gives us the following corollary.

Deciding the winner of a polynomial-length reachability switching game is PSPACE-complete.

The proof, which contains the argument for containment in PSPACE, is in Appendix A.

### 4.3 Memory requirements of winning strategies

In this section we will show that both players need exponentially many memory states to win a reachability switching game.

We begin by giving a simple gadget that forces the reachability player to use memory. The gadget is shown in Figure 8. The game starts by allowing the safety player to move the token from the start vertex to one of two switching nodes. Whatever the choice, the token then moves to a shared switching node and then on to a vertex of the reachability player, who must in turn pick one of the two switching nodes. If the reachability player moves the token to the node chosen by the safety player, then the token will arrive at the target node and the reachability player will win. If the reachability player moves to the node not chosen by the safety player, the token will move to the shared switching node for a second time, and then on to the fail vertex, which is losing for the reachability player. Thus, every winning strategy of the reachability player must remember the choice made by the safety player.
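A small simulation illustrates why the reachability player needs memory in this gadget. The vertex names and switching orders below are our own hypothetical reading of the description:

- `x` is the safety player's start vertex.
- `a` and `b` are the two switching nodes, each with switching order (c, t).
- `c` is the shared switching node, with order (r, fail).
- `r` is the reachability player's vertex.

A strategy that copies the safety player's first move wins against both choices, whereas the memoryless strategy that always moves to `a` loses when the safety player picks `b`.

```python
def play_gadget(choose, start="x", target="t"):
    """Run the gadget; choose[v](history) gives the move at player vertex v."""
    ord_ = {"a": ["c", "t"], "b": ["c", "t"], "c": ["r", "fail"]}  # assumed switching orders
    counts = {v: 0 for v in ord_}
    history, v = [start], start
    while v not in (target, "fail"):
        if v in ord_:
            nxt = ord_[v][counts[v]]
            counts[v] = (counts[v] + 1) % len(ord_[v])
        else:
            nxt = choose[v](history)
        history.append(nxt)
        v = nxt
    return v == target


def copy_safety_move(history):
    return history[1]  # the node the safety player moved to from x


def always_a(history):
    return "a"  # memoryless: ignores the history


results = {}
for safety_move in ("a", "b"):
    def safety(history, m=safety_move):
        return m
    results[safety_move] = (play_gadget({"x": safety, "r": copy_safety_move}),
                            play_gadget({"x": safety, "r": always_a}))
print(results)  # {'a': (True, True), 'b': (True, False)}
```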

Observe that we can create a similar gadget that forces the safety player to use memory, by swapping the two players. In this modified gadget, the safety player would have to choose the vertex not chosen by the reachability player. Thus, in a reachability switching game, winning strategies for both players need to use memory.

#### A memory lower bound.

We can now use this gadget to show a lower bound on the amount of memory that is needed to win a reachability switching game.

In a reachability switching game, winning strategies for both players may need to use exponentially many memory states in $n$, where $n$ is the number of switching nodes.

#### Corresponding upper bound.

We can also show that exponential memory is sufficient in a two-player reachability switching game. We say that a strategy is a switch configuration strategy if it simply remembers the current switch configuration. Any such strategy uses at most exponentially many memory states. For games with binary switch nodes, these strategies use exactly $2^n$ memory states, where $n$ is the number of switching nodes.
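The number of memory states of a switch configuration strategy is the number of count functions $C$. That is the product of $|\mathrm{Ord}(u)|$ over all switching vertices $u$. A short Python check (the helper name is hypothetical) confirms the $2^n$ count for binary switches.

```python
from math import prod


def num_switch_configurations(order_lengths):
    """Number of count functions C: the product of |Ord(u)| over the switching vertices."""
    return prod(order_lengths)


print(num_switch_configurations([2] * 10))   # 1024, i.e. 2**10 for ten binary switches
print(num_switch_configurations([2, 3, 4]))  # 24
```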

In a reachability switching game, both players have winning switch configuration strategies.

The proofs of the two previous lemmas can be found in Appendix A.

## 5 Zero-player reachability switching games

In this section we consider zero-player reachability switching games, i.e., games with $V_R = V_S = \emptyset$. As an initial hardness result for this case, we show that deciding the winner of a zero-player game is NL-hard. To do this, we reduce from the problem of deciding $s$-$t$ connectivity in a directed graph.

The idea is to make every node in the graph a switching node. We then begin a walk from $s$. If, after $n = |V|$ steps, we have not arrived at $t$, we go back to $s$ and start again. The intuition is that, if there is a path from $s$ to $t$, then the switching nodes must eventually send the token along that path.

More formally, given a graph $G = (V, E)$ with $n = |V|$, we produce a zero-player reachability switching game played on $(V \times \{0, 1, \dots, n\}) \cup \{\mathrm{fin}\}$. The second component of each vertex is considered to be a counter that counts up to $n$. Every vertex is a switching node, the start vertex is $(s, 0)$, and the target vertex is fin. The edges are as follows:

- Each vertex $(v, i)$ with $v \neq t$ and $i < n$ has outgoing edges to $(u, i + 1)$ for each outgoing edge $(v, u) \in E$.
- Each vertex $(v, n)$ with $v \neq t$ has a single edge to $(s, 0)$.
- Every vertex $(t, i)$ has a single outgoing edge to fin.

Given $G$, this game can be constructed in logarithmic space by looping over each element in $V \times \{0, 1, \dots, n\}$ and producing the correct outgoing edges.
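The construction can be sketched in Python as follows, together with a simulation of the resulting zero-player walk. The function names are hypothetical. One assumption is ours rather than the text's: a vertex of $G$ other than $t$ with no outgoing edges is sent back to $(s, 0)$ in the sketch, so that the walk cannot get stuck at a dead end. Because the walk is deterministic, a repeated (vertex, configuration) state means that it cycles forever without reaching fin.

```python
from itertools import product


def nl_reduction(vertices, edges, s, t):
    """Switching orders of the game on (V x {0..n}) + {fin}, with n = |V|."""
    n = len(vertices)
    order = {}
    for v, i in product(vertices, range(n + 1)):
        if v == t:
            order[(v, i)] = ["fin"]   # (t, i) -> fin
        elif i == n or not edges.get(v):
            order[(v, i)] = [(s, 0)]  # counter exhausted (or dead end, our assumption): restart
        else:
            order[(v, i)] = [(u, i + 1) for u in edges[v]]
    return order


def reaches(order, start, target="fin"):
    """Follow the switching nodes from start; stop on reaching target or a repeated state."""
    counts = {v: 0 for v in order}
    v, seen = start, set()
    while v != target:
        state = (v, tuple(counts.values()))
        if state in seen:
            return False
        seen.add(state)
        nxt = order[v][counts[v]]
        counts[v] = (counts[v] + 1) % len(order[v])
        v = nxt
    return True


g1 = {"s": ["d", "a"], "a": ["t"]}        # d is a dead end; the path s -> a -> t exists
g2 = {"s": ["a"], "a": ["s"], "t": ["s"]}  # t is unreachable from s
print(reaches(nl_reduction(["s", "a", "d", "t"], g1, "s", "t"), ("s", 0)))  # True
print(reaches(nl_reduction(["s", "a", "t"], g2, "s", "t"), ("s", 0)))       # False
```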

Deciding the winner of a zero-player reachability switching game is NL-hard under logspace reductions.

The proof can be found in Appendix B.

## 6 Further work

Many interesting open problems remain. For the zero-player case, there is an extremely large gap between the known upper bounds and the easy NL-hardness lower bound that we showed here. We conjecture that the problem is in fact P-complete, but despite much effort, we were unable to improve upon the upper or lower bounds.

For the one-player case we have shown tight bounds. For the two-player case we have shown a lower bound of PSPACE-hardness and an upper bound of EXPTIME. We conjecture that the lower bound can be strengthened, since we did not make strong use of the memory requirements that we identified in Sect. 4.3.

Finally, here we studied the problem of reachability, which is one of the simplest model checking tasks. What is the complexity of model checking more complex specifications?

## References

•  Hoda Akbari and Petra Berenbrink. Parallel rotor walks on finite graphs and applications in discrete load balancing. In Proc. SPAA 2013, pages 186–195, 2013.
•  Ashok K. Chandra, Dexter C. Kozen, and Larry J. Stockmeyer. Alternation. J. Assoc. Comput. Mach., 28(1):114–133, 1981.
•  Anne Condon. The complexity of stochastic games. Inf. Comput., 96(2):203–224, 1992.
•  Joshua Cooper, Benjamin Doerr, Tobias Friedrich, and Joel Spencer. Deterministic random walks on regular trees. Random Structures Algorithms, 37(3):353–366, 2010.
•  Joshua Cooper, Benjamin Doerr, Joel Spencer, and Gábor Tardos. Deterministic random walks on the integers. European J. Combin., 28(8):2072–2090, 2007.
•  Joshua N. Cooper and Joel Spencer. Simulating a random walk with constant error. Combin. Probab. Comput., 15(6):815–822, 2006.
•  Benjamin Doerr and Tobias Friedrich. Deterministic random walks on the two-dimensional grid. Combin. Probab. Comput., 18(1-2):123–144, 2009.
•  Jérôme Dohrau, Bernd Gärtner, Manuel Kohler, Jiří Matoušek, and Emo Welzl. ARRIVAL: A zero-player graph game in NP ∩ coNP. Technical report, 2016. https://arxiv.org/abs/1605.03546.
•  Tobias Friedrich, Martin Gairing, and Thomas Sauerwald. Quasirandom load balancing. SIAM J. Comput., 41(4):747–771, 2012.
•  Jan Friso Groote and Bas Ploeger. Switching graphs. International Journal of Foundations of Computer Science, 20(05):869–886, 2009.
•  Alexander E. Holroyd, Lionel Levine, Karola Mészáros, Yuval Peres, James Propp, and David B. Wilson. Chip-firing and rotor-routing on directed graphs. In In and Out of Equilibrium 2, pages 331–364. Springer, 2008.
•  Alexander E. Holroyd and James Propp. Rotor walks and Markov chains. In Algorithmic probability and combinatorics, volume 520 of Contemp. Math., pages 105–126. Amer. Math. Soc., Providence, RI, 2010.
•  Karthik C. S. Did the train reach its destination: The complexity of finding a witness. Inf. Process. Lett., 121:17–21, 2017.
•  Bastian Katz, Ignaz Rutter, and Gerhard Woeginger. An algorithmic study of switch graphs. Acta Inform., 49(5):295–312, 2012.
•  Marta Kwiatkowska, Gethin Norman, and David Parker. PRISM 4.0: Verification of probabilistic real-time systems. In Proc. CAV 2011, volume 6806 of Lecture Notes Comput. Sci., 2011.
•  Christoph Meinel. Switching graphs and their complexity. In Proc. MFCS 1989, volume 379 of Lecture Notes Comput. Sci., pages 350–359, 1989.
•  Vyatcheslav B. Priezzhev, Deepak Dhar, Abhishek Dhar, and Supriya Krishnamurthy. Eulerian walkers as a model of self-organized criticality. Phys. Rev. Lett., 77(25):5079, 1996.
•  Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., New York, NY, USA, 1st edition, 1994.
•  Klaus Reinhardt. The simple reachability problem in switch graphs. In Proc. SOFSEM 2009, volume 5404 of Lecture Notes Comput. Sci., pages 461–472, 2009.
•  Craig A. Tovey. A simplified NP-complete satisfiability problem. Discrete Appl. Math., 8(1):85–89, 1984.

## Appendix A Proofs for Section 4

#### Proof of Lemma 4.2.

###### Proof.

If the QBF formula is true, then during the quantifier phase, no matter what assignments the safety player picks for the universally quantified variables, the reachability player can choose values for the existentially quantified variables that make the formula true. Then, in the formula phase, the reachability player has a strategy to ensure that he wins the game, by moving to a node or that was used during the quantifier phase.

Conversely, and symmetrically, if the QBF formula is false, then the safety player can ensure that the assignment chosen during the quantifier phase does not satisfy the formula, and then ensure that the game moves to a node or that was not used during the quantifier phase. This ensures that the safety player wins the game. ∎

#### Proof of Corollary 4.2.

###### Proof.

Hardness follows from Theorem 4.2. For containment, observe that the simulation by an alternating Turing machine described in Section 4.1 runs in polynomial time whenever the game terminates after a polynomial number of steps. Hence, we can use the fact that APTIME = PSPACE to obtain a deterministic polynomial-space algorithm for solving the problem. ∎

#### Proof of Lemma 4.3.

###### Proof.

Consider a game with $n$ copies of the memory gadget shown in Figure 8, but modified so that the following sequence of events occurs.

1. The safety player selects one of the two vertices in each gadget, one gadget at a time.

2. The safety player then moves the game to one of the vertices in one of the gadgets.

3. The reachability player selects one of the two vertices of that gadget as normal, and then either wins or loses the game.

The reachability player has an obvious winning strategy in this game: remember the choices that the safety player made, and then choose the same vertex in the third step. Since the safety player makes $n$ binary decisions, this strategy uses $2^n$ memory states.

On the other hand, if the reachability player uses a strategy $\sigma$ with fewer than $2^n$ memory states, then the safety player can win the game in the following way. There are $2^n$ different switch configurations that the safety player can create at the end of the first step of the game. By the pigeonhole principle, there exist two distinct configurations $C_1$ and $C_2$ that are mapped to the same memory state by $\sigma$. The safety player selects a gadget $i$ in which $C_1$ and $C_2$ differ, and determines which of the two vertices $\sigma$ selects in gadget $i$. He then selects whichever of $C_1$ and $C_2$ chooses the other vertex in gadget $i$, sets the gadgets according to that configuration in step 1, and moves the game to gadget $i$ in step 2. Since the reachability player then selects the vertex not chosen in step 1, he loses the game.

Finally, observe that we can obtain the same lower bound for the safety player by swapping the roles of both players in this game. ∎
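The pigeonhole step of this argument can be sketched in Python. The sketch abstracts the reachability player's strategy into two hypothetical functions that are not part of the original proof: `memory`, mapping the safety player's step-1 choices (one bit per gadget) to the memory state reached after step 1, and `choose`, mapping a memory state and gadget index to the bit the reachability player picks in step 3. This abstraction assumes the memory state after step 1 depends only on the safety player's choices, which holds because the reachability player makes no moves in step 1. The function returns a step-1 configuration and gadget on which the strategy guesses wrong, or `None` when no two configurations share a memory state.

```python
from itertools import product
from typing import Callable, Hashable, Optional, Tuple

Config = Tuple[int, ...]


def safety_counter_strategy(
    n: int,
    memory: Callable[[Config], Hashable],
    choose: Callable[[Hashable, int], int],
) -> Optional[Tuple[Config, int]]:
    """Find (configuration, gadget) where the reachability player guesses wrong,
    using only a collision of the memory map; None if the map is injective."""
    first_seen = {}
    for config in product((0, 1), repeat=n):  # all 2^n step-1 configurations
        state = memory(config)
        if state in first_seen:  # pigeonhole: two configurations share a state
            other = first_seen[state]
            i = next(j for j in range(n) if config[j] != other[j])
            bit = choose(state, i)
            # config and other disagree at gadget i, so exactly one of them
            # disagrees with the reachability player's choice there.
            return (config if config[i] != bit else other), i
        first_seen[state] = config
    return None
```

With $n = 3$ and a memory map into only 4 states, such as `lambda c: 2 * c[0] + c[1]`, the function always finds a losing configuration, whereas the "remember everything" strategy (`memory = lambda c: c`, `choose = lambda state, j: state[j]`) has no collision.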

#### Proof of Lemma 4.3.

###### Proof.

Let $G$ be a reachability switching game, and let $\mathcal{C}$ denote the set of all switch configurations in this game. Consider the “blown-up” reachability game $G'$ played on $V \times \mathcal{C}$, where there are no switching nodes. Instead, the successor of a vertex $(v, C)$ with $v$ a switching node is determined by $C$, and the configuration component is updated accordingly; every other vertex $(v, C)$ is owned by the same player as $v$. It is straightforward to show that the reachability player wins $G'$ if and only if he wins the original game $G$. Both players in a reachability game have positional winning strategies. Therefore, if a player can win in $G$, then he can also win in $G$ using a switch configuration strategy that always plays according to the positional winning strategy in $G'$. ∎
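The blown-up construction also suggests a direct (exponential-time) solver. The following Python sketch explores the reachable part of $V \times \mathcal{C}$ and computes the reachability player's attractor, recording a positional strategy on $V \times \mathcal{C}$, i.e., a switch configuration strategy in $G$. The function name `solve` and the encoding are hypothetical: `owner[v]` is `"R"` (reachability), `"S"` (safety) or `"W"` (switching node), each switching node's queue is assumed to start at its first listed successor, and a vertex with no successors is assumed to end the game as a loss for the reachability player.

```python
def solve(owner, succ, start, targets):
    """Attractor on the blown-up game V x C. Returns (reachability player wins?,
    strategy), where strategy maps (vertex, configuration) to the next state."""
    switches = [v for v in succ if owner[v] == "W"]
    pos = {v: k for k, v in enumerate(switches)}

    def moves(state):
        v, conf = state
        if v in targets or not succ[v]:
            return []  # game over
        if owner[v] == "W":  # successor fixed by the configuration
            k = pos[v]
            new = list(conf)
            new[k] = (conf[k] + 1) % len(succ[v])  # move used edge to the back
            return [(succ[v][conf[k]], tuple(new))]
        return [(u, conf) for u in succ[v]]  # player vertex: configuration unchanged

    init = (start, tuple(0 for _ in switches))  # ASSUMPTION: initial configuration
    states, frontier = {init}, [init]
    while frontier:  # explore the reachable part of V x C
        for nxt in moves(frontier.pop()):
            if nxt not in states:
                states.add(nxt)
                frontier.append(nxt)

    win = {st for st in states if st[0] in targets}
    strategy, changed = {}, True
    while changed:  # least fixpoint: the reachability player's attractor
        changed = False
        for st in states - win:
            nxt = moves(st)
            if not nxt:
                continue
            if owner[st[0]] == "S":
                ok = all(x in win for x in nxt)
            else:
                good = [x for x in nxt if x in win]
                ok = bool(good)
                if ok and owner[st[0]] == "R":
                    strategy[st] = good[0]  # positional choice on V x C
            if ok:
                win.add(st)
                changed = True
    return init in win, strategy
```

Since $|\mathcal{C}|$ can be exponential in the number of switching nodes, this sketch runs in exponential time, in line with the EXPTIME upper bound.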

## Appendix B Proof for Section 5

###### Proof.

We must argue that there is a path from $s$ to $t$ if and only if the zero-player reachability game eventually arrives at fin. By definition, if the game arrives at fin, then there must be a path from $s$ to $t$, since the game only uses edges from the original graph.

For the other direction, suppose that there is a path from $s$ to $t$, but the game never arrives at fin. Since there is a path, there is also a simple path, of some length $k \le n - 1$. By construction, if the game does not reach fin, then $(s, 1)$ is visited infinitely often. Since $(s, 1)$ is a switching node, we can then argue that the vertex $(u, 2)$ is visited infinitely often for every successor $u$ of $s$. Carrying on this argument inductively allows us to conclude that, because there is a path of length $k$ from $s$ to $t$, the vertex $(t, k + 1)$ is visited infinitely often, and hence the game reaches fin, which provides our contradiction. ∎