One of the basic and fundamental algorithmic problems in artificial intelligence is the planning problem [LaValle2006, Russell and Norvig2010]. The classical models in planning are graphs, Markov decision processes (MDPs), and games on graphs; these are the fundamental models for planning.
Planning objectives. The planning objective represents the goal that the planner seeks to achieve. Some basic planning objectives are as follows:
Basic target reachability. Given a set of target vertices, the planning objective is to reach some target vertex from the starting position.
Coverage Objective. In the case of coverage there are k different target sets, namely T_1, T_2, ..., T_k, and the planning objective asks whether for each 1 ≤ i ≤ k the basic target reachability with target set T_i can be achieved. Coverage models the following scenario: consider a robot or a patroller and k different target locations; if an event or an attack happens at one of the target locations, then that location must be reached. However, the location of the event or attack is not known in advance, and the planner must be prepared for the target set to be any of the k target sets.
Sequential target reachability. In the case of sequential targets there are k different target sets, namely T_1, T_2, ..., T_k, and the planning objective asks first to reach T_1, then T_2, then T_3, and so on. This represents the scenario where there is a sequence of tasks that the planner must achieve in order.
The above are the most natural planning objectives and have been studied in the literature, e.g., in robot planning [Kress-Gazit, Fainekos, and Pappas2009, Kaelbling, Littman, and Cassandra1998, Choset2005].
|Obj/Model|Graphs|MDPs|Games on graphs|
|Coverage| |[Thm. 1,2]|[Thm. 3,4]|
|Sequential target|[Thm. 6]|[Thm. 5]|[Thm. 7,8]|
Planning questions. For the above planning objectives the basic planning questions are as follows: (a) for graphs, the question is whether there exists a plan (or a path) such that the planning objective is satisfied; (b) for MDPs, the basic question is whether there exists a policy such that the planning objective is satisfied almost-surely (i.e., with probability 1); and (c) for games on graphs, the basic question is whether there exists a policy that achieves the objective irrespective of the choices of the adversary. The almost-sure satisfaction question for MDPs is also known as strong cyclic planning in the planning literature [Cimatti et al.2003], and the games-on-graphs question represents planning in the presence of a worst-case adversary [Mahanti and Bagchi1985, Hansen and Zilberstein1998] (aka adversarial planning, strong planning [Maliah et al.2014], or conformant/contingent planning [Bonet and Geffner2000, Hoffmann and Brafman2005, Palacios and Geffner2007]).
Algorithmic study. In this work, we consider the algorithmic study of the planning questions for the natural planning objectives for graphs, MDPs, and games on graphs. For all the above questions, polynomial-time algorithms exist. When polynomial-time algorithms exist, proving an unconditional lower bound is extremely rare. A new approach in complexity theory aims to establish conditional lower bound (CLB) results based on well-known conjectures. Two standard conjectures for CLBs are as follows: (a) the Boolean matrix multiplication (BMM) conjecture, which states that there is no sub-cubic combinatorial algorithm for Boolean matrix multiplication; and (b) the Strong Exponential Time Hypothesis (SETH), which states that for every ε > 0 there is a k such that k-CNF-SAT on n variables cannot be solved in O(2^((1-ε)n)) time. Many CLBs have been established based on the above conjectures, e.g., for dynamic graph algorithms and string matching [Abboud and Williams2014, Bringmann and Künnemann2015].
Previous results and our contributions. We denote by n and m the number of vertices and edges of the underlying model, and by k the number of different target sets. For the basic target reachability problem, while the graph and game-graph problems can be solved in linear time [Beeri1980, Immerman1981], no linear-time algorithm is known for MDPs; the current best-known bound is that of MEC-decomposition [Chatterjee and Henzinger2014, Chatterjee et al.2016]. For coverage and sequential target reachability, an O(k · m) upper bound follows for graphs and games on graphs via k reachability computations, and an analogous upper bound follows for MDPs. Our contributions are as follows:
Coverage problem: First, we present a linear-time algorithm for graphs; second, we present quadratic lower bounds for MDPs and games on graphs, both under the BMM conjecture and the SETH. Note that for graphs our upper bound is linear time, whereas for MDPs and games on graphs the CLB is quadratic even when each target set is of constant size.
Sequential target problem: First, we present a linear-time algorithm for graphs; second, we present a sub-quadratic algorithm for MDPs; and third, we present a quadratic lower bound for games on graphs, both under the BMM conjecture and the SETH.
The summary of the results is presented in Table 1. Our most interesting results are the conditional lower bounds for MDPs and game graphs for the coverage problem, the sub-quadratic algorithm for MDPs with sequential targets, and the conditional lower bound for game graphs with sequential targets.
Practical Significance. The sequential reachability and coverage problems we consider are the tasks defined in [Kress-Gazit, Fainekos, and Pappas2009, Section II. PROBLEM FORMULATION, 3) System Specification], where the problems have been studied for games on graphs (Section IV. DISCRETE SYNTHESIS) and mentioned as future work for MDPs (Section I. INTRODUCTION, A. Related Work). The applications of these problems have been demonstrated in robotics applications. We present a complete algorithmic picture for games on graphs and MDPs, settling open questions related to games and future work mentioned in [Kress-Gazit, Fainekos, and Pappas2009].
Theoretical Significance. Our results present a very interesting algorithmic picture for the natural planning questions in the fundamental models.
First, we establish results showing that some models are harder than others. More precisely,
for the basic target problem, the MDP model seems harder than graphs/games on graphs (linear-time algorithms exist for graphs and games on graphs, whereas no such algorithm is known for MDPs);
for the coverage problem, MDPs and games on graphs are harder than graphs (linear-time algorithm for graphs, and quadratic CLBs for MDPs and games on graphs);
for the sequential target problem, games on graphs are harder than MDPs and graphs (linear-time upper bound for graphs and sub-quadratic upper bound for MDPs, whereas for games on graphs there is a quadratic CLB).
In summary, we establish model-separation results with CLBs: For the coverage problem, MDPs and games on graphs are algorithmically harder than graphs; and for the sequential target problem, games on graphs are algorithmically harder than MDPs and graphs.
Second, we also establish objective-separation results. For the model of MDPs, consider the different objectives: both for basic target and sequential target reachability the upper bound is sub-quadratic, whereas for the coverage problem we establish a quadratic CLB.
Discussion related to other models. In this work our focus lies on the algorithmic complexity of fundamental planning problems, and we consider explicit state-space graphs, MDPs, and games, where the complexities are polynomial. The explicit model and algorithms for it are widely considered: [LaValle2006, Chapter 2.1 Discrete Feasible Planning], [Kress-Gazit, Fainekos, and Pappas2009, Section IV. DISCRETE SYNTHESIS], and [Chatterjee and Henzinger2014, Section 2.1. Definitions. Alternating game graphs]. In other representations such as the factored model, the complexities are higher (NP-complete), and then heuristics are the focus (e.g., [Hansen and Zilberstein1998]) rather than the algorithmic complexity. Notable exceptions are the work on parameterized complexity of planning problems (see, e.g., [Kronegger, Pfandler, and Pichler2013]) and conditional lower bounds showing that certain planning problems do not admit subexponential-time algorithms [Aghighi et al.2016, Bäckström and Jonsson2017].
Markov Decision Processes (MDPs). A Markov decision process (MDP) consists of a finite set of vertices V partitioned into the player-1 vertices V_1 and the random vertices V_R, a finite set of edges E ⊆ V × V, and a probabilistic transition function δ. The probabilistic transition function maps every random vertex in V_R to an element of D(V), where D(V)
is the set of probability distributions over the set of vertices V. A random vertex v has an edge to a vertex w, i.e., (v, w) ∈ E, iff δ(v)(w) > 0.
Game Graphs. A game graph consists of a finite set of vertices V, a finite set of edges E ⊆ V × V, and a partition of the vertices into player-1 vertices V_1 and adversarial player-2 vertices V_2.
Graphs. Graphs are a special case of MDPs with V_R = ∅ as well as a special case of game graphs with V_2 = ∅. Let Out(v) denote the set of successor vertices of v and In(v) the set of predecessors of v. More formally, Out(v) = {w ∈ V | (v, w) ∈ E} and In(v) = {w ∈ V | (w, v) ∈ E}.
Note that a standard way to define MDPs is to consider finite vertices with actions, and the probabilistic transition function is defined for every vertex and action. In our model, the choice of actions is represented as the choice of edges at player-1 vertices and the probabilistic transition function is represented by the random vertices. This allows us to treat MDPs and game graphs in a uniform way, and graphs can be described easily as a special case of MDPs.
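The edge-based encoding described above can be sketched as follows (a minimal sketch with hypothetical names; `transitions` maps (state, action) pairs to probability distributions):

```python
def encode(states, transitions):
    """Turn an action-based MDP into the vertex-based model: one fresh
    random vertex per (state, action) pair; choosing an action becomes
    choosing the edge into the corresponding random vertex."""
    player1 = set(states)
    random_vertices, edges, delta = set(), set(), {}
    for (s, a), dist in transitions.items():
        r = (s, a)                      # fresh random vertex for this action
        random_vertices.add(r)
        edges.add((s, r))               # player-1 choice of action a at s
        delta[r] = dict(dist)
        for succ, p in dist.items():
            if p > 0:
                edges.add((r, succ))    # edges of r = support of delta(r)
    return player1, random_vertices, edges, delta
```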
Plays. A play is an infinite sequence of vertices ⟨v_0, v_1, v_2, ...⟩ such that (v_i, v_{i+1}) ∈ E for all i ≥ 0. The set of all plays is denoted by Ω. A play is initialized by placing a token on an initial vertex. If the token is on a vertex owned by a player (player 1 in MDPs; player 1 or player 2 in game graphs), then the respective player moves the token along one of the outgoing edges, whereas if the token is at a random vertex v, then the next vertex is chosen according to the probability distribution δ(v). Thus an infinite sequence of vertices (an infinite walk) is formed, which is a play.
Policies. Policies are recipes for players to extend finite prefixes of plays. Formally, a player-i policy is a function σ which maps every finite prefix ρ of a play that ends in a player-i vertex v to a successor vertex σ(ρ) = w with (v, w) ∈ E. A player-1 policy is memoryless or stationary if σ(ρ) = σ(ρ') for all prefixes ρ, ρ' that end in the same vertex v, i.e., the policy does not depend on the entire prefix, but only on the last vertex.
Outcome of policies. The outcomes of policies are as follows:
In graphs, given a starting vertex, a policy for player 1 induces a unique play in the graph.
In game graphs, given a starting vertex v and policies σ and π for player 1 and player 2 respectively, the outcome is a unique play ⟨v_0, v_1, ...⟩, where v_0 = v and for all i ≥ 0, if v_i ∈ V_1 then v_{i+1} = σ(⟨v_0, ..., v_i⟩), and if v_i ∈ V_2 then v_{i+1} = π(⟨v_0, ..., v_i⟩).
In MDPs, given a starting vertex v and a policy σ for player 1, there is a unique probability measure over Ω, denoted Pr_v^σ(·).
Objectives and winning. In general, an objective φ is a measurable subset of Ω. A play ω achieves the objective if ω ∈ φ. We consider the following notions of winning:
Almost-sure winning. In MDPs, a player-1 policy σ is almost-sure (a.s.) winning from a starting vertex v for an objective φ iff Pr_v^σ(φ) = 1.
Winning. In game graphs, a policy σ is winning for player 1 from a starting vertex v iff the resulting play achieves the objective irrespective of the policy of player 2, i.e., for all player-2 policies π the outcome play belongs to φ.
Note that in the special case of graphs both of the above winning notions require that there exists a play from the starting vertex that achieves the objective.
In MDPs we consider a.s. winning, for which the precise transition probabilities of the transition function do not matter; only the support of the transition function is relevant. The a.s. winning notion we use corresponds to the strong cyclic planning problem. Intuitively, if we visit a random vertex in an MDP infinitely often, then all its successors are visited infinitely often. This represents the local fairness condition [Clarke, Grumberg, and Peled1999]. Therefore, when we consider the MDP question, only the underlying graph structure along with the partition is relevant, and the transition function can be treated as a uniform distribution over the support.
We have defined the notion of objectives in general above, and below we consider specific objectives that are natural in planning problems. They are all variants of one of the most fundamental objectives in computer science, namely, reachability objectives.
Basic Target Reachability. For a set T ⊆ V of target vertices, the basic target reachability objective Reach(T) is the set of infinite paths that contain a vertex of T, i.e., Reach(T) = {⟨v_0, v_1, ...⟩ ∈ Ω | ∃i ≥ 0: v_i ∈ T}.
Coverage Objective. For k different target sets T_1, ..., T_k, the coverage objective asks whether for each T_i the basic target reachability objective can be achieved. More precisely, given a starting vertex v, one asks whether for every 1 ≤ i ≤ k there is a policy to ensure winning (resp., a.s. winning) for the objective Reach(T_i) from v for game graphs (resp., MDPs).
Sequential Target Reachability. For a tuple (T_1, ..., T_k) of vertex sets, the sequential target reachability objective Seq(T_1, ..., T_k) is the set of infinite paths that contain a vertex of T_1 followed by a vertex of T_2 and so on, up to a vertex of T_k.
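For a finite play prefix these objectives are easy to check directly. The sketch below (hypothetical helper names) treats target sets as Python sets; it allows one vertex to serve consecutive targets, which is one common reading of "followed by":

```python
def reaches(path, T):
    """Basic target reachability: does the path visit a vertex of T?"""
    return any(v in T for v in path)

def satisfies_sequential(path, targets):
    """Sequential reachability: a vertex of T_1, then (at the same or a
    later position) a vertex of T_2, and so on."""
    i = 0
    for T in targets:
        while i < len(path) and path[i] not in T:
            i += 1
        if i == len(path):
            return False                # this target is never visited in order
    return True
```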
Difference between MDPs and Game Graphs. Consider the graph illustrated in Figure 1, with a start vertex, an intermediate vertex, and a target set, and consider the MDP and the game graph obtained from it. In the game graph, the adversary always chooses to avoid the target, and the target is never reached from the start vertex. On the other hand, if the intermediate vertex is probabilistic, then whenever the token is at it, it moves toward the target with non-zero probability; that is, almost-surely the transition toward the target is eventually taken, i.e., the target is reached almost-surely. Thus, reachability in MDPs does not imply reachability in game graphs.
Relevant parameters. We will consider the following parameters: n denotes the number of vertices, m denotes the number of edges, and k will either denote the number of target sets in the coverage problem or the size of the tuple of target sets in the sequential target reachability problem.
Algorithmic study. In this work we study the above basic planning objectives for graphs, game graphs (i.e., winning in game graphs), and MDPs (a.s. winning in MDPs). Our goal is to clarify the algorithmic complexity of the above questions with improved algorithms and conditional lower bounds. We define the conjectured lower bounds for conditional lower bounds below.
Conjectured Lower Bounds
Results from classical complexity are based on standard complexity-theoretical assumptions, e.g., P ≠ NP. Similarly, we derive polynomial lower bounds which are based on widely believed, conjectured lower bounds on well-studied algorithmic problems. In this work the lower bounds we derive depend on the popular conjectures below:
First of all, we consider conjectures on Boolean matrix multiplication [Williams and Williams2018, Theorem 6.1] and triangle detection in graphs [Abboud and Williams2014, Conjecture 2], which are the basis for lower bounds on dense graphs. A triangle in a graph is a triple of vertices u, v, w such that (u, v), (v, w), (w, u) ∈ E. For the rest of this work we assume that in instances of Triangle every vertex has at least one outgoing edge and there are no self-loops. This can easily be established by linear-time preprocessing. See Remark 3 for an explanation of the term "combinatorial algorithm".
Conjecture 1 (Combinatorial Boolean Matrix Multiplication Conjecture (BMM)).
There is no O(n^(3-ε))-time combinatorial algorithm for computing the Boolean product of two n × n matrices, for any ε > 0.
Conjecture 2 (Strong Triangle Conjecture (STC)).
There is no O(n^(ω-ε)) expected-time algorithm and no O(n^(3-ε))-time combinatorial algorithm that can detect whether a graph contains a triangle, for any ε > 0, where ω is the matrix multiplication exponent.
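For intuition, the straightforward combinatorial algorithm for triangle detection runs in cubic time; STC conjectures that no combinatorial algorithm improves on this by a polynomial factor. A minimal sketch:

```python
def has_triangle(adj):
    """adj: dict vertex -> set of out-neighbors. Detects a triple u, v, w
    with edges (u, v), (v, w), (w, u) by brute force in O(n^3)."""
    for u in adj:
        for v in adj[u]:
            for w in adj.get(v, ()):
                if u in adj.get(w, ()):
                    return True
    return False
```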
Williams and Williams [2018, Theorem 6.1] showed that BMM is equivalent to the combinatorial part of STC. Moreover, if we do not restrict ourselves to combinatorial algorithms, STC still gives a super-linear lower bound.
Remark 3 (Combinatorial Algorithms).
“Combinatorial” in Conjecture 2 means that it excludes “algebraic methods” (such as fast matrix multiplication [Williams2012, Le Gall2014]), which are impractical due to the high associated constants. Therefore the term “combinatorial algorithm” comprises only discrete algorithms. Non-combinatorial algorithms usually have the matrix multiplication exponent in the running time. Notice that all algorithms for deciding almost-sure winning conditions in MDPs and winning conditions in games are discrete graph-theoretic algorithms and hence combinatorial; thus lower bounds for combinatorial algorithms are of particular interest in our setting. For further discussion see [Ballard et al.2012, Henzinger et al.2015].
Second, we consider the Strong Exponential Time Hypothesis for the satisfiability problem of propositional logic and the Orthogonal Vectors Conjecture.
The Orthogonal Vectors Problem (OV). Given two sets A, B, each containing N many d-bit vectors, are there a ∈ A and b ∈ B such that a and b are orthogonal, i.e., Σ_{i=1}^{d} a[i] · b[i] = 0?
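The quadratic baseline that OVC conjectures cannot be polynomially improved is simple brute force over all pairs:

```python
def have_orthogonal_pair(A, B):
    """A, B: lists of 0/1 tuples of equal dimension d; O(N^2 * d) time."""
    return any(all(x * y == 0 for x, y in zip(a, b))
               for a in A for b in B)
```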
Conjecture 3 (Strong Exponential Time Hypothesis (SETH)).
For each ε > 0 there is a k such that k-CNF-SAT on n variables and m clauses cannot be solved in O(2^((1-ε)n) · poly(m)) time.
Conjecture 4 (Orthogonal Vectors Conjecture (OVC)).
There is no O(N^(2-ε) · poly(d))-time algorithm for the Orthogonal Vectors Problem, for any ε > 0.
SETH implies OVC, which is an implication of a result in [Williams2005, Theorem 5]; an explicit reduction is given in the survey article by Vassilevska Williams [2018, Theorem 3.1]. Whenever a problem is provably hard assuming OVC, it is thus also hard assuming SETH. For example, in [Bringmann and Künnemann2015, Preliminaries, A. Hardness Assumptions, OVH] the OVC is assumed to prove conditional lower bounds for the longest common subsequence problem. To the best of our knowledge, there is no connection between the former two and the latter two conjectures.
The conjectures that no polynomial improvements over the best-known running times are possible do not exclude improvements by sub-polynomial factors, such as polylogarithmic factors.
Basic Previous Results
In this section, we recall the basic algorithmic results about MDPs and game graphs known in the literature that we later use in our algorithms.
Basic result 1: Maximal End-Component Decomposition. Given an MDP, an end-component is a set of vertices X ⊆ V such that (1) the subgraph induced by X is strongly connected and (2) all random vertices of X have all their outgoing edges in X, i.e., X is closed for random vertices; formally, for all v ∈ X ∩ V_R and all (v, w) ∈ E we have w ∈ X. A maximal end-component (MEC) is an end-component which is maximal under set inclusion. The importance of MECs is as follows: (i) first, the notion generalizes strongly connected components (SCCs) in graphs (with V_R = ∅) and closed recurrent sets of Markov chains (with V_1 = ∅); and (ii) in a MEC, from every vertex every other vertex can be reached almost-surely. The MEC-decomposition of an MDP is the partition of the vertex set into MECs and the set of vertices which do not belong to any MEC. While MEC-decomposition generalizes the SCC decomposition of graphs, and SCC decomposition can be computed in linear time [Tarjan1972, Theorem 13], no linear-time algorithm is known for MEC-decomposition. The current best-known algorithmic bound for MEC-decomposition is O(m · min(√m, n^(2/3))) [Chatterjee and Henzinger2014, Theorem 3.6, Theorem 3.10].
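A simple way to see what a MEC-decomposition computes is the naive fixpoint algorithm below (a sketch with hypothetical names, not the faster algorithm cited above): repeatedly compute SCCs and discard random vertices that have a successor outside their own SCC, since such a vertex can belong to no end-component.

```python
def sccs(vertices, succ):
    """Tarjan's algorithm (recursive sketch); succ(v) yields out-neighbors."""
    index, low, stack, on, comps, ctr = {}, {}, [], set(), [], [0]
    def dfs(v):
        index[v] = low[v] = ctr[0]; ctr[0] += 1
        stack.append(v); on.add(v)
        for w in succ(v):
            if w not in index:
                dfs(w); low[v] = min(low[v], low[w])
            elif w in on:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            comp = set()
            while True:
                w = stack.pop(); on.discard(w); comp.add(w)
                if w == v:
                    break
            comps.append(comp)
    for v in vertices:
        if v not in index:
            dfs(v)
    return comps

def mec_decomposition(vertices, edges, random_vertices):
    succ_map = {v: set() for v in vertices}
    for u, w in edges:
        succ_map[u].add(w)
    alive = set(vertices)
    while True:
        comps = sccs(alive, lambda v: (w for w in succ_map[v] if w in alive))
        comp_id = {v: i for i, c in enumerate(comps) for v in c}
        # a random vertex with a removed successor, or with a successor in
        # another SCC, cannot lie in any end-component
        removed = {v for v in alive if v in random_vertices and
                   any(w not in alive or comp_id[w] != comp_id[v]
                       for w in succ_map[v])}
        if not removed:
            # the MECs are the remaining SCCs containing at least one edge
            return [c for c in comps
                    if any(w in c for v in c for w in succ_map[v])]
        alive -= removed
```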
Basic result 2: Reachability in MDPs. Given an MDP and a target set T, the set of starting vertices from which T can be reached almost-surely can be computed in O(m) time given the MEC-decomposition of the MDP [Chatterjee et al.2016, Theorem 4.1]. Moreover, for the basic target reachability problem the current best-known algorithmic bounds are the same as for the MEC-decomposition problem [Chatterjee and Henzinger2014, Theorem 3.6, Theorem 3.10], and any improvement for the MEC-decomposition algorithm also carries over to the basic target reachability problem.
Basic result 3: Reachability in game graphs. Given a game graph and a target set T, the set of starting vertices from which player 1 can ensure reaching T against all policies of player 2 is called the player-1 attractor of T; it can be computed in O(m) time [Beeri1980, Immerman1981].
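The attractor can be computed with the classical backward counting procedure: a player-1 vertex joins the attractor as soon as one successor is in it, a player-2 vertex only once all its successors are. A sketch with hypothetical names:

```python
from collections import deque

def player1_attractor(player1, player2, edges, target):
    """Player-1 attractor of `target`, computed in O(n + m) time."""
    pred, out_count = {}, {}
    for u, w in edges:
        pred.setdefault(w, set()).add(u)
        out_count[u] = out_count.get(u, 0) + 1
    attr = set(target)
    queue = deque(target)
    while queue:
        w = queue.popleft()
        for u in pred.get(w, ()):
            if u in attr:
                continue
            if u in player1:
                attr.add(u)             # one good successor suffices
                queue.append(u)
            else:
                out_count[u] -= 1       # one more successor leads to attr
                if out_count[u] == 0:   # ... now all of them do
                    attr.add(u)
                    queue.append(u)
    return attr
```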
The above basic results from the literature explain the result of the first row of Table 1.
In this section, we consider the coverage problem. First, we present the algorithms, which are simple, and then focus on the conditional lower bounds for MDPs and game graphs, which establish that the existing algorithms cannot be (polynomially) improved under the STC and OV conjectures.
We present a linear-time algorithm for graphs, and quadratic-time algorithms for MDPs and game graphs. The results below present the upper bounds of the second row of Table 1.
Planning in Graphs. For the coverage problem in graphs we are given a graph, a starting vertex s, and target sets T_1, ..., T_k. The algorithmic problem is to decide whether, starting from s, the basic target reachability objective Reach(T_i) can be achieved for all 1 ≤ i ≤ k. The algorithmic solution is as follows: compute the BFS tree starting from s and check that each target set T_i contains at least one vertex of the resulting BFS tree.
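The check can be sketched as follows (hypothetical names; `succ` is an adjacency map):

```python
from collections import deque

def coverage_in_graph(s, succ, targets):
    """Linear-time coverage check in a plain graph: one BFS from s, then
    test that every target set contains a reached vertex."""
    reached = {s}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in succ.get(v, ()):
            if w not in reached:
                reached.add(w)
                queue.append(w)
    return all(reached & set(T) for T in targets)
```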
Planning in MDPs and Games. For both MDPs and game graphs with k target sets, the basic algorithm performs k basic reachability computations, i.e., for each target set T_i, 1 ≤ i ≤ k, the basic target reachability for T_i is computed. (1) For game graphs, using the O(m)-time attractor computation (see Basic result 3), we obtain an O(k · m)-time algorithm. (2) For MDPs, the MEC-decomposition followed by k many O(m)-time almost-sure reachability computations (see Basic result 2) gives an O(MEC + k · m)-time algorithm, where MEC denotes the time to compute a MEC-decomposition.
Conditional Lower Bounds
We present conditional lower bounds for the coverage problem in MDPs and game graphs (i.e., the CLBs of the second row of Table 1). For MDPs and game graphs the conditional lower bounds complement the quadratic algorithms from the previous subsection. The conditional lower bounds are due to reductions from OV and Triangle.
Sparse MDPs. For sparse MDPs we present a conditional lower bound based on OVC. To do that we reduce the OV problem to the coverage problem in MDPs.
Given two sets A, B of N many d-dimensional vectors each, we build the MDP as follows.
The vertices of the MDP are given by a start vertex s, sets of vertices {a_1, ..., a_N} and {b_1, ..., b_N} representing the sets of vectors, and vertices {c_1, ..., c_d} representing the coordinates of the vectors in the OV instance.
The edges of the MDP are defined as follows: The start vertex s has an edge to every vertex a_i. Furthermore, for each a_i there is an edge to c_j iff a_i[j] = 1, and for each c_j there is an edge from c_j to b_i iff b_i[j] = 1.
The set of vertices is partitioned into player-1 vertices V ∖ {s} and random vertices {s}.
The reduction is illustrated in Figure 2 (the dashed edges will be used later for the sequential target lower bounds).
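The heart of the reduction is that a path from a_i to b_j (necessarily through a coordinate vertex) exists iff the vectors are not orthogonal. A sketch (hypothetical names) that builds the layered part of the graph and checks whether some b_j is unreachable from some a_i:

```python
def reduction_has_unreachable_pair(A, B):
    """Builds the layered graph a-vertices -> coordinate vertices ->
    b-vertices and reports whether some b is unreachable from some a,
    which happens iff A and B contain an orthogonal pair."""
    d = len(A[0])
    coord_succ = {j: {i for i, b in enumerate(B) if b[j] == 1}
                  for j in range(d)}
    for a in A:
        reachable_bs = set()
        for j in range(d):
            if a[j] == 1:               # edge a -> c_j
                reachable_bs |= coord_succ[j]
        if len(reachable_bs) < len(B):  # some b has no path from this a
            return True
    return False
```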
Let P be the MDP given by Reduction 1 with target sets T_i = {b_i} for 1 ≤ i ≤ N. There exist orthogonal vectors a ∈ A, b ∈ B iff there is no a.s. winning policy from s for the coverage objective.
Notice that when starting from s the token is randomly moved to one of the vertices a_i, and thus player 1 can reach each b_j almost-surely from s iff it can reach each b_j from each a_i. The MDP is constructed in such a way that there is no path between vertex a_i and b_j iff the corresponding vectors are orthogonal in the OV instance: If a_i is orthogonal to b_j, the outgoing edges of a_i lead to no vertex c_ℓ which has an edge to b_j, as either a_i[ℓ] = 0 or b_j[ℓ] = 0. On the other hand, if there is no path from a_i to b_j, we again have by the construction of the underlying graph that for all ℓ either a_i[ℓ] = 0 or b_j[ℓ] = 0. This is the definition of orthogonality for a_i and b_j. Thus, player 1 can reach all the target sets a.s. from s iff there are no orthogonal vectors in A and B. ∎
The MDP has only O(N + d) many vertices, and Reduction 1 can be performed in O(N · d) time. The number of edges is m = O(N · d) and the number of target sets is k = N. Thus the theorem below follows immediately.
There is no O(m^(2-ε))-time algorithm (for any ε > 0) to check if a vertex has an a.s. winning policy for the coverage problem in MDPs under Conjecture 4 (i.e., unless OVC and SETH fail).
Dense MDPs. For dense MDPs we present a conditional lower bound based on Boolean matrix multiplication (BMM). To this end, we reduce the Triangle problem to the coverage problem in MDPs.
Given an instance of triangle detection, i.e., a graph G = (V, E), we build the following MDP.
The vertices are given as four copies V^1, V^2, V^3, V^4 of V and a start vertex s.
The edges are defined as follows: There is an edge from s to every vertex v^1 ∈ V^1. In addition, for each (u, v) ∈ E and 1 ≤ i ≤ 3 there is an edge from u^i to v^(i+1).
The set of vertices is partitioned into player-1 vertices and random vertices .
The reduction is illustrated in Figure 3 (the dashed edges will be used later for the sequential target lower bounds).
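The reduction's correctness can be checked directly: in the layered graph, a path from v in the first copy to v in the fourth copy takes exactly three edge-steps, which (with self-loops excluded, as assumed above) is exactly a triangle through v. A sketch:

```python
def has_triangle_via_layers(vertices, edges):
    """Simulates the four-copy construction: there is a triangle through v
    iff three layer-steps from v^1 can land on v^4."""
    succ = {v: set() for v in vertices}
    for u, w in edges:
        succ[u].add(w)
    for v in vertices:
        frontier = {v}                  # position in copy 1
        for _ in range(3):              # copy 1 -> 2 -> 3 -> 4
            frontier = {w for u in frontier for w in succ[u]}
        if v in frontier:
            return True
    return False
```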
Let P be the MDP given by Reduction 2 with the target sets defined above, one for each vertex of G. A graph G has a triangle iff player 1 has no a.s. winning policy from s for the coverage objective.
Notice that there is a triangle in the graph iff there is a path from some vertex v^1 in the first copy of V to the corresponding vertex v^4 in the fourth copy of V. Also, a path starting in s satisfies the coverage objective, i.e., reaches all target sets a.s., unless it visits a vertex v^1 and also v^4. As each of these paths has non-zero probability, player 1 wins almost-surely from s iff there is no such path, iff there is no triangle in the original graph. ∎
Moreover, the size and the construction time of the MDP are linear in the size of the original graph, and we have k = n target sets. Thus the theorem below follows immediately.
There is no combinatorial O(n^(3-ε))-time algorithm and no O(n^(ω-ε))-time algorithm (for any ε > 0) to check if a vertex has an a.s. winning policy for the coverage objective in MDPs under Conjecture 2 (i.e., unless STC and BMM fail). The bounds hold for dense MDPs with m = Θ(n²).
Next, we describe how the results for MDPs can be extended to game graphs.
Sparse Game Graphs. The random starting vertex in the reduction is changed to a player-2 vertex. The rest of the reduction stays the same. The proof then proceeds as before, with the adversary player 2 now taking over the role of the random choices.
Given two sets A, B of N many d-dimensional vectors each, we build the following game graph.
The vertices of the game graph are given by a start vertex s, sets of vertices {a_1, ..., a_N} and {b_1, ..., b_N} representing the sets of vectors, and vertices {c_1, ..., c_d} representing the coordinates.
The edges of the game graph are defined as follows: the start vertex s has an edge to every vertex a_i. Furthermore, for each a_i there is an edge to c_j iff a_i[j] = 1, and for each c_j there is an edge from c_j to b_i iff b_i[j] = 1.
The set of vertices is partitioned into player-1 vertices V ∖ {s} and player-2 vertices {s}.
The reduction is illustrated in Figure 2 (the dashed edges will be used later for the sequential target lower bounds).
Let G be the game graph given by Reduction 3 with target sets T_i = {b_i} for 1 ≤ i ≤ N. There exist orthogonal vectors a ∈ A, b ∈ B iff there is no winning policy from the start vertex s for the coverage objective.
Notice that when starting from s the token is moved by player 2 to one of the vertices a_i, and thus player 1 can reach each b_j from s iff it can reach each b_j from each a_i. If there is some b_j which cannot be reached from some a_i, player 2 will choose a_i as successor and win. The game graph is constructed in such a way that there is no path between vertex a_i and b_j iff the corresponding vectors are orthogonal in the OV instance: If a_i is orthogonal to b_j, the outgoing edges of a_i lead to no vertex c_ℓ which has an edge to b_j, as either a_i[ℓ] = 0 or b_j[ℓ] = 0. On the other hand, if there is no path from a_i to b_j, we again have by the construction of the underlying graph that for all ℓ either a_i[ℓ] = 0 or b_j[ℓ] = 0. This is the definition of orthogonality for a_i and b_j. Thus, player 1 can reach all the target sets from the starting vertex s iff there are no orthogonal vectors in A and B. ∎
The game graph has only O(N + d) many vertices, and Reduction 3 can be performed in O(N · d) time. The number of edges is m = O(N · d) and the number of target sets is k = N. Thus the theorem below follows immediately.
There is no O(m^(2-ε))-time algorithm (for any ε > 0) to check if a vertex has a winning policy for the coverage objective in game graphs under Conjecture 4 (i.e., unless OVC and SETH fail).
Dense Game Graphs. The random vertices in the reduction are now player-2 vertices. Notice that the resulting game graph has only player-2 vertices. Now if there is a path starting from the start vertex that is not in the defined coverage objective, then player 2 would simply choose that one, and thus player 1 still wins iff there is no such path, i.e., there is no triangle in the original graph.
Given an instance of triangle detection, i.e., a graph G = (V, E), we build the following game graph.
The vertices are given as four copies V^1, V^2, V^3, V^4 of V and a start vertex s.
The edges are defined as follows: There is an edge from s to every vertex v^1 ∈ V^1. In addition, for each (u, v) ∈ E and 1 ≤ i ≤ 3 there is an edge from u^i to v^(i+1).
The set of vertices is partitioned into player-1 vertices ∅ and player-2 vertices V^1 ∪ V^2 ∪ V^3 ∪ V^4 ∪ {s}.
Let G be the game graph given by Reduction 4 with target sets as in Reduction 2, one for each vertex of the original graph. A graph has a triangle iff player 1 has no winning policy from s for the coverage objective.
Notice that there is a triangle in the graph iff there is a path from some vertex v^1 in the first copy of V to the corresponding vertex v^4 in the fourth copy of V. Also, a path starting in s satisfies the coverage objective, i.e., reaches all target sets, unless it visits a vertex v^1 and also v^4. Player 1 wins from s iff there is no such path, as player 2 would choose it. Such a path exists, as proved above, iff there is a triangle in the original graph. ∎
Moreover, the size and the construction time of the game graph are linear in the size of the original graph, and we have k = n target sets. Thus the theorem below follows immediately.
There is no combinatorial O(n^(3-ε))-time algorithm and no O(n^(ω-ε))-time algorithm (for any ε > 0) to check whether a vertex has a winning policy for the coverage objective in game graphs under Conjecture 2 (i.e., unless STC and BMM fail). The bounds hold for dense game graphs with m = Θ(n²).
Sequential Target Problem
We consider the sequential target problem in graphs, MDPs and game graphs. In contrast to the quadratic CLB for the coverage problem, quite surprisingly we present a subquadratic algorithm for MDPs, which as a special case gives a linear-time algorithm for graphs. For games, we present a quadratic algorithm and a quadratic CLB.
The results below present the upper bounds of the third row of Table 1.
Planning in MDPs. We first calculate the MEC-decomposition of the MDP. Then each MEC is collapsed into a single vertex, which we set to be a player-1 vertex. In the target sets, all the vertices of a MEC are replaced by this new vertex. This does not change the reachability conditions of the resulting MDP: every vertex in a MEC can be reached almost surely starting from every other vertex in the same MEC, regardless of its type (player-1 or random). Thus it suffices to give an algorithm for a MEC-free MDP with a tuple of target sets (T_1, ..., T_k).
The set of unprocessed vertices is initialized with all vertices. Initially, vertices with no outgoing edges are added to a queue. Throughout the algorithm, the queue contains the vertices which have not been processed so far but whose successors have already been processed.
While the queue is not empty, a vertex from the queue is processed. When a vertex is processed, the processing function is called; it calculates the label of the vertex and updates the auxiliary variables of the other vertices. Intuitively, the label of a vertex indicates for which suffix of the target sequence the vertex has an almost-sure winning policy; in particular, a vertex whose label is the first index has an almost-sure winning policy for the entire sequential objective. The auxiliary variables are used to store the maximum (for random vertices) / minimum (for player-1 vertices) label of the already processed successors of a vertex.
Now when the queue is empty, the algorithm has to process a vertex not all of whose successors have been processed yet. In that case, one considers all the random vertices for which at least one successor has already been processed and chooses the random vertex with maximum stored label to process next. Notice that the processing function ignores arguments with null values. One can show that, as the graph has no MECs, whenever the queue is empty (and the set of unprocessed vertices is not) there exists such a random vertex. Moreover, whenever the queue is empty, all unprocessed vertices have a policy that satisfies the suffix of the objective starting at the maximum stored label. Intuitively, this is because all vertices can reach the set of already processed vertices, and in the worst case the reached vertex carries that maximum label. For the selected random vertex, all its successors will thus finally have a label of at most this maximum, and there is a successor for which the maximum is attained; hence the final label of the selected vertex must equal this maximum, and one can already process it without knowing the labels of all its successors.
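For plain graphs the idea behind the labels can be made concrete with the straightforward last-to-first computation below (a sketch with hypothetical names; this is the simple O(k·(n+m)) version, not the faster label-propagation algorithm described above). W_i is the set of vertices from which T_i, ..., T_k can be visited in order, computed by backward reachability from T_i ∩ W_{i+1}:

```python
from collections import deque

def sequential_in_graph(s, succ, targets):
    """Decides Seq(T_1, ..., T_k) from s in a plain graph."""
    pred = {}
    for v, ws in succ.items():
        for w in ws:
            pred.setdefault(w, set()).add(v)
    vertices = set(succ) | set(pred)
    good = vertices                     # W_{k+1}: no targets left
    for T in reversed(targets):
        frontier = set(T) & good        # T_i intersected with W_{i+1}
        seen = set(frontier)
        queue = deque(frontier)
        while queue:                    # backward BFS
            v = queue.popleft()
            for u in pred.get(v, ()):
                if u not in seen:
                    seen.add(u)
                    queue.append(u)
        good = seen                     # W_i
    return s in good
```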
Proposition 1 (Correctness).
Given an MDP P and a sequential target objective with targets T_1, …, T_k, Algorithm 1 decides whether there is a player-1 policy at a start vertex s for the objective Seq(T_1, …, T_k).
We next state invariants of the while loop (see Line 1) that we will later use to show a loop invariant that establishes the correctness of the algorithm.
The following statements are invariants of the while loop in Line 1.
(1) v ∈ Q iff c(v) = 0 and v ∈ U;
(2) c(v) = |Out(v) ∩ U|, for all v with v ∈ U;
(3) m(v) = max over w ∈ Out(v) \ U of ℓ(w) for random vertices v ∈ U, and m(v) = min over w ∈ Out(v) \ U of ℓ(w) for player-1 vertices v ∈ U;
(4) in particular, m(v) ≠ null for all v ∈ U with Out(v) \ U ≠ ∅;
(5) if Q = ∅ and U ≠ ∅, there is a random vertex v ∈ U such that m(v) ≠ null.
The above invariants state that (a) the variables c(v) and m(v) have the intended meaning, (b) Q contains exactly the unprocessed vertices whose successors are all already processed, and (c) the selection of a random vertex when Q is empty is well-defined whenever needed. These are the three important ingredients for showing the correctness of Algorithm 1.
The counters c(v) are initialized as |Out(v)| and U is initialized as V. Thus the claim holds when first entering the while loop.
Assume the claim holds at the beginning of the iteration where vertex v is processed. The set U is only changed in Line 1, where only v is removed from the set, while the counters are only changed in Line 1, where all counters c(u) of vertices u with v ∈ Out(u) are decreased by one (notice that v ∈ Out(u) iff u ∈ In(v)). That is, c(u) = |Out(u) ∩ U| also after this iteration of the loop, and the claim follows.
In the initial phase U is set to V and Q is set to the set of vertices with no outgoing edges. Thus the claim holds when first entering the while loop.
Assume the claim holds at the beginning of the iteration where vertex v is processed. The set U is only changed in Line 1, where v is removed.
First consider a vertex w ∈ Q. As w is not removed from Q and no vertex is added to U, the claim is still true for w. Now consider a vertex w that might be added to Q during the iteration of the loop. This can only happen in Line 1, and the if condition ensures that c(w) = 0 and w ∈ U (by the previous invariant). Thus the claim also holds for the newly added vertices.
Note that we define the max and min over the empty set to be null. As all m(v) are initialized as null and, initially, all vertices v have Out(v) \ U = ∅ (since U = V), the claim holds when the algorithm first enters the loop.
Now consider the iteration processing vertex v and assume the claim is true at the beginning. The set U is only changed in Line 1, where v is removed. Let U be the set at the beginning of the iteration and U' = U \ {v} the updated set. First notice that ℓ(v) is well-defined, as v is either chosen (a) as an element of Q or (b) as the random vertex with maximum m(v). In the former case it was either initially inserted into Q (and has no successors) or it was added to Q when processing a vertex w ∈ Out(v), which would have set m(v) ≠ null. In the latter case m(v) ≠ null by the definition of the max operation. Now the assignment in Line 1 ensures that also ℓ(v) ≠ null. For a player-1 vertex u ∈ In(v) ∩ U' the value m(u) is updated to min(m(u), ℓ(v)) (Line 1), which by assumption is equal to the min over w ∈ Out(u) \ U' of ℓ(w), i.e., the equation holds. For a random vertex u ∈ In(v) ∩ U' the value m(u) is updated to max(m(u), ℓ(v)) (Line 1), which by assumption is equal to the max over w ∈ Out(u) \ U' of ℓ(w), i.e., the equation holds. For vertices u ∉ In(v) both m(u) and the right-hand side of the equation are unchanged. Hence, the claim also holds after the iteration.
The input graph has a vertex with no outgoing edges. Towards a contradiction assume that no such vertex exists. Then there exists an SCC S such that every vertex in S has only edges to other vertices in S. Such SCCs are called bottom SCCs. Bottom SCCs are MECs [Chatterjee and Henzinger2014], and we assumed that there are no MECs in the MDP, a contradiction. Thus Q is non-empty after the initialization and the claim holds after the initialization. Now consider the iteration processing vertex v and assume the claim is true at the beginning and that Q is empty afterwards. Notice that m(u) is set to a non-null value for a vertex u as soon as one of its successors is processed. Towards a contradiction assume that all random vertices in U have m(u) = null, i.e., no random vertex in U has a successor outside of U. Each vertex u ∈ U has at least one successor in U, as otherwise u would be in Q. That is, U is either empty or has a non-trivial bottom SCC without outgoing random edges. Again, such an SCC would be a MEC, and thus we obtain the desired contradiction.
From the following invariant, we obtain the correctness of our algorithm.
The following statements are invariants of the while loop in Line 1, for all v ∈ V \ U:
there exists a player-1 policy such that, starting from v, the objective Seq(T_ℓ(v), …, T_k) is satisfied almost surely; and
there is no player-1 policy such that, starting from v, the objective Seq(T_ℓ(v)-1, …, T_k) is satisfied almost surely,
where the second statement applies only if ℓ(v) > 1.
As V \ U is initialized as the empty set, the two statements hold after the initialization.
Now consider the iteration where vertex v is processed and assume the invariants hold at the beginning of the iteration.
We distinguish the case where Q is non-empty and the case where Q is empty.
Case Q ≠ ∅: Then v ∈ Q and all successors of v are already processed. 1) Thus we can easily obtain a policy that satisfies Seq(T_ℓ(v), …, T_k) almost surely from v as follows. If v is a player-1 vertex, pick the successor w that corresponds to m(v), and then player 1 can follow the existing policy for vertex w. If v is a random vertex, then whichever successor w is randomly chosen, follow the existing policy for w. In both cases, the claim follows from the inductive assumption on the successors.
2) We next show that there is no policy for Seq(T_ℓ(v)-1, …, T_k). If v is a player-1 vertex, we have that the current vertex v is not in the set T_ℓ(v)-1 and no successor has a policy for Seq(T_ℓ(v)-1, …, T_k); as, by the inductive assumption, the objective cannot be satisfied almost surely from any successor of v, there is also no such policy for v. Similarly, if v is a random vertex, we have that the current vertex is not in the set T_ℓ(v)-1 and there is at least one successor without a policy for Seq(T_ℓ(v)-1, …, T_k); thus there is also no such policy for v, as there is a non-zero chance that a successor is picked that, by the inductive assumption, cannot satisfy the objective almost surely.
Case Q = ∅: As shown in the proof of Lemma 5(4,5), m(u) is not null for all random vertices u ∈ U that have an edge to vertices in V \ U, and there is at least one such random vertex in U. That is, the max operation in Line 1 returns a random vertex v ∈ U with m(v) ≠ null, where m(v) is maximal by the choice of v, and thus ℓ(v) can be computed. Let j = m(v).
1) As we have no MEC (in U), there is a policy so that the play almost surely leaves U by using one of the outgoing edges of a random vertex: the policy can be arbitrary, except that for a player-1 vertex u ∈ U with an edge (u, w) where w ∈ U we choose this edge (which must exist, as u would be in Q otherwise). As there are no MECs (in U), the policy will eventually leave U through a random vertex. This implies that from each vertex in U, player 1 has a policy to reach a vertex in V \ U coming from a random vertex. By the inductive assumption, each such successor of a random vertex has a policy to satisfy Seq(T_j, …, T_k). Thus it follows that from each vertex in U, player 1 has a policy to satisfy Seq(T_j, …, T_k). Now consider the random vertex v that was chosen by the algorithm. By the above, all its successors have a policy to satisfy Seq(T_j, …, T_k) almost surely. Now, taking into account which of the target sets T_ℓ(v), …, T_j-1 contain v itself, we obtain a policy that satisfies Seq(T_ℓ(v), …, T_k) almost surely from v.
2) By the choice of v there is also a successor (that is chosen with non-zero probability) that, by the inductive assumption, has no policy for Seq(T_j-1, …, T_k); moreover, v itself is not contained in T_ℓ(v)-1. Thus, when starting in v, each policy will fail to satisfy Seq(T_ℓ(v)-1, …, T_k) with non-zero probability, i.e., there is no policy for this objective.
Proposition 2 (Running Time).
Algorithm 1 runs in O(Σ_{1≤i≤k} |T_i| + m log n) time.
Initializing the algorithm takes O(m + Σ_{1≤i≤k} |T_i|) time: we compute the predecessor lists and the counters c(v) in O(m) time at Line 1, and the other initialization steps take only O(n) time (Lines 2-6). Now consider the while loop. Every vertex is processed exactly once. The costly operations are the call of the ProcessVertex function and the selection of the random vertex with maximum m(v). Evaluating ProcessVertex(v) takes time linear in the number of incoming edges of v plus the number of target sets containing v. Summing up over all vertices, we obtain an O(m + Σ_{1≤i≤k} |T_i|) bound. To compute the maximum efficiently, we maintain a priority queue containing all not yet processed random vertices. As we have O(m) updates, this costs only O(m log n) time for one of the standard implementations of priority queues. Summing up, this yields a running time of O(Σ_{1≤i≤k} |T_i| + m log n) for Algorithm 1. ∎
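The priority-queue bookkeeping can be sketched with a standard lazy-deletion max-heap (an illustration; the class name is ours): each update of m(v) pushes a fresh heap entry, and stale entries are discarded on extraction, so each of the O(m) updates costs O(log n) amortized time.

```python
import heapq

class MaxRandomQueue:
    """Lazy-deletion max-heap over pairs (m(v), v): every update of m(v)
    pushes a new entry; outdated entries are skipped when popping."""
    def __init__(self):
        self.heap = []
        self.current = {}              # current m-value per unprocessed vertex

    def update(self, v, m_value):
        self.current[v] = m_value
        heapq.heappush(self.heap, (-m_value, v))   # negate for a max-heap

    def pop_max(self):
        while self.heap:
            neg_m, v = heapq.heappop(self.heap)
            if self.current.get(v) == -neg_m:      # entry still up to date?
                del self.current[v]                # v is processed now
                return v
        return None                                # no candidate left
```

For instance, after update(1, 2), update(2, 5), update(1, 7), pop_max() returns vertex 1 (value 7), then vertex 2; the stale entry (2, 1) is silently discarded.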
By considering also the time MEC(m, n) for the MEC decomposition, we obtain the desired bound and the following theorem.
Given an MDP P, a starting vertex s, and a tuple of target sets (T_1, …, T_k), we can decide whether there is a player-1 policy at s for the objective Seq(T_1, …, T_k) in O(MEC(m, n) + Σ_{1≤i≤k} |T_i| + m log n) time.
Planning in Graphs.