1 Introduction
Graph search algorithms. Reasoning about graphs is a fundamental problem in computer science, studied widely in logic (e.g., to describe graph properties with logic [GradelBook, CourcelleBook]) and in artificial intelligence [AIBook, LaValle]. Graph search/planning algorithms are at the heart of such analysis, and give rise to some of the most important algorithmic problems in computer science, such as shortest path and the travelling salesman problem (TSP).
Finite-horizon planning. A classical problem in graph planning is the finite-horizon planning problem [LaValle], where the input is a directed graph with weights assigned to every edge and a time horizon $T$. The weight of an edge represents the reward/cost of the edge. A plan is an infinite path, and for finite horizon $T$ the utility of the plan is the sum of the weights of the first $T$ edges. An optimal plan maximizes the utility. The computational problem for finite-horizon planning is to compute the optimal utility and an optimal plan. The finite-horizon planning problem has many applications: the qualitative version of the problem corresponds to finite-horizon reachability, which plays an important role in logic and verification (e.g., bounded until in RTCTL, and bounded model-checking [EMSS92, BCCSZ03]); and the more general quantitative problem of optimizing the sum of rewards has applications in artificial intelligence and robotics [AIBook, Chapter 10, Chapter 25], and in control theory and game theory [FV97, Chapter 2.2], [OR94, Chapter 6].
Solutions for finite-horizon planning. For finite-horizon planning the classical solution approach is dynamic programming (or Bellman equations), which corresponds to backward induction [Howard, FV97]. This approach works not only for graphs, but also for other models (e.g., Markov decision processes [PT87]). A stationary plan is a path in which the same choice of edge is always made at a given vertex. For finite-horizon planning, stationary plans are not sufficient for optimality, and in general optimal plans are quite involved: represented as transducers, optimal plans require storage proportional to at least $T$ (see Example 1). Since in general optimal plans are involved, a related computational question is to compute effective simple plans, i.e., plans that are optimal among stationary plans.
Expected finite-horizon planning. A natural variant of the finite-horizon planning problem is to consider an expected time horizon, instead of the fixed time horizon. In the finite-horizon problem the allowed stopping time of the planning problem is a Dirac distribution at time $T$. In the expected finite-horizon problem the expected stopping time is $T$. A well-known example where the fixed finite-horizon and the expected finite-horizon problems are fundamentally different is playing Prisoner's Dilemma: if the time horizon is fixed, then defection is the only dominant strategy, whereas for the expected finite-horizon problem cooperation is feasible [Nowak, Chapter 5]. Another classical and very well-studied example is the notion of discounting, where at each time step the stopping probability is a fixed value $\lambda$, which corresponds to the case that the expected stopping time is $1/\lambda$ [FV97].
Specified vs. adversarial distribution. For the expected finite-horizon problem there are two variants: (a) specified distribution: the stopping-time distribution is specified; and (b) adversarial distribution: the stopping-time distribution is unknown and decided by an adversary. The expected finite-horizon problem with adversarial distribution represents the robust version of the planning problem, where the distribution is unknown and the adversary represents the worst-case scenario. Thus this problem presents the robust extension of the classical finite-horizon planning problem, which has a wide range of applications.
Results. In this work we consider the expected finite-horizon planning problems in graphs. To the best of our knowledge, this problem has not been studied in the literature.

Our first simple result is that for the specified distribution problem, the optimal value can be computed in polynomial time (Theorem 1). However, since the specified distribution generalizes the fixed finite-horizon problem, the description of an optimal plan as an explicit transducer is of size proportional to $T$ (see Example 1). Hence the output complexity is not polynomial in general. Second, we consider the decision problem of whether there is a stationary plan that ensures a given utility. We show that this problem is NP-complete (Theorem 2).
Our most interesting and surprising results are for the adversarial distribution problem, which we describe below:

We show that stationary plans suffice for optimality (Theorem 3). This result is surprising and counterintuitive: both in the classical finite-horizon problem and in the specified distribution problem, the adversary does not have any choice, and in both cases stationary plans do not suffice for optimality. In the presence of an adversary, however, the simpler class of stationary plans suffices for optimality.

For the expected finite-horizon problem with adversarial distribution, the backward induction approach does not work, as there is no a priori bound on the stopping time. We develop new algorithmic ideas to show that the optimal value can still be computed in polynomial time (Theorem 4). Moreover, our algorithm also computes and outputs an optimal stationary plan in polynomial time. By Theorem 3 such a plan is optimal among all plans, whereas computing optimal stationary plans for fixed finite horizon is NP-complete.
Our results are summarized in Table 1 and are relevant for synthesis of robust plans for expected finite-horizon planning.
arbitrary  stationary  arbitrary  stationary  
Fixed horizon  PTIME  NP-complete  
Expected horizon  PTIME 
2 Preliminaries
Weighted graphs. A weighted graph $G = \langle V, E, w \rangle$ consists of a finite set $V$ of vertices, a set $E \subseteq V \times V$ of edges, and a function $w$ that assigns a weight $w(e)$ to each edge $e \in E$ of the graph.
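Plans and utilities. A plan is an infinite path in $G$ from a vertex $v_0$, that is, a sequence $\rho = e_0 e_1 \dots$ of edges such that $e_i = (v_i, v_{i+1}) \in E$ for all $i \geq 0$. A path $\rho$ induces the sequence of utilities $u = u_0 u_1 \dots$ where $u_t = \sum_{i=0}^{t-1} w(e_i)$ for all $t \geq 0$. We denote by $U_G$ the set of all sequences of utilities induced by the paths of $G$. For finite paths $\rho$ (i.e., finite prefixes of paths), we denote by $\mathsf{start}(\rho)$ and $\mathsf{end}(\rho)$ the initial and last vertex of $\rho$, and by $|\rho|$ the length of $\rho$.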
Plans as transducers. A plan is described by a transducer (Mealy machine or Moore machine [HU79]) that, given a prefix of the path (i.e., a finite sequence of edges), chooses the next edge. A stationary plan is a path in which the same choice of edge is always made at a given vertex. A stationary plan as a Mealy machine has one state, and as a Moore machine has at most $|V|$ states. Given a graph $G$, we denote by $S_G$ the set of all sequences of utilities induced by stationary plans in $G$.
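To make the notion of a stationary plan concrete, the following minimal Python sketch generates the sequence of utilities induced by a stationary plan. The dictionary-based graph encoding, the name `stationary_utilities`, and the small example graph are assumptions made for illustration only; utilities are assumed to be cumulative sums of edge weights, starting from 0.

```python
from itertools import islice

def stationary_utilities(choice, weight, start):
    """Yield u_0, u_1, ... for the path that always follows choice[v] from v.

    choice: dict mapping each vertex to its chosen successor (hypothetical encoding)
    weight: dict mapping each edge (v, v') to its weight
    """
    vertex, total = start, 0
    while True:
        yield total                      # utility accumulated so far
        successor = choice[vertex]       # the same edge is taken at every visit
        total += weight[(vertex, successor)]
        vertex = successor

# Hypothetical example graph with three vertices.
weight = {("a", "b"): 2, ("b", "a"): -1, ("b", "c"): 5, ("c", "c"): 0, ("a", "c"): 1}
choice = {"a": "b", "b": "a", "c": "c"}

prefix = list(islice(stationary_utilities(choice, weight, "a"), 6))
print(prefix)
```

The plan is fully described by one successor per vertex, which matches the one-state Mealy machine view: no memory beyond the current vertex is needed.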
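Distributions and stopping times. A subdistribution is a function $\delta : \mathbb{N} \to [0,1]$ such that $|\delta| = \sum_{t \in \mathbb{N}} \delta(t) \in (0,1]$. The value $|\delta|$ is the probability mass of $\delta$. Note that $|\delta| \neq 0$. The support of $\delta$ is $\mathsf{Supp}(\delta) = \{t \in \mathbb{N} \mid \delta(t) \neq 0\}$, and we say that $\delta$ is the sum of two subdistributions $\delta_1$ and $\delta_2$, written $\delta = \delta_1 + \delta_2$, if $\delta(t) = \delta_1(t) + \delta_2(t)$ for all $t \in \mathbb{N}$. A stopping-time distribution (or simply, a distribution) is a subdistribution with probability mass equal to $1$. We denote by $\Delta$ the set of all stopping-time distributions, and by $\Delta^{\uparrow\uparrow}$ the set of all distributions $\delta$ with $|\mathsf{Supp}(\delta)| \leq 2$, called the bi-Dirac distributions.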
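Expected utility and expected time. The expected utility of a sequence $u$ of utilities under a subdistribution $\delta$ is $\mathbb{E}_{\delta}(u) = \sum_{t \in \mathbb{N}} \delta(t) \cdot u_t$. In particular, the expected utility of the identity sequence $0, 1, 2, \dots$ is called the expected time, denoted by $\mathbb{E}_{\delta}$.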
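The definitions of probability mass, expected utility, and expected time can be sketched in a few lines of Python. The representation of a finite-support subdistribution as a dictionary from stopping times to probabilities, and the function names, are assumptions of this sketch; exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction

def mass(delta):
    """Probability mass of a finite-support subdistribution {time: probability}."""
    return sum(delta.values())

def expected_utility(delta, u):
    """Sum over t of delta(t) * u_t, for a sequence of utilities u given as a list."""
    return sum(p * u[t] for t, p in delta.items())

def expected_time(delta):
    """Expected utility of the identity sequence 0, 1, 2, ..."""
    return sum(p * t for t, p in delta.items())

# Hypothetical sequence of utilities and two distributions with expected time 3.
u = [0, 2, 1, 3, 2, 4]
dirac = {3: Fraction(1)}
bi_dirac = {0: Fraction(1, 4), 4: Fraction(3, 4)}

print(expected_time(dirac), expected_utility(dirac, u))
print(expected_time(bi_dirac), expected_utility(bi_dirac, u))
```

Both distributions have the same expected time, but the second yields a lower expected utility on this sequence, which is the kind of choice available to an adversary.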
3 Expected Finite-horizon: Specified Distribution
Given a stopping-time distribution with finite support, we show that the optimal expected utility can be computed in polynomial time. This result is straightforward.
Theorem 1.
Let $G$ be a weighted graph. Given a stopping-time distribution $\delta$ with finite support, with all numbers encoded in binary, the optimal expected utility can be computed in polynomial time.
A special case of the problem in Theorem 1 is the fixed-length optimal path problem, which is to find an optimal path (one that maximizes the total utility) of fixed length $T$, corresponding to the Dirac distribution at time $T$. A pseudo-polynomial time solution is known for this problem, based on a value-iteration algorithm [LaValle, Section 2.3]. The algorithm runs in time polynomial in the size of the graph and in the value of $T$ (where $T$ is encoded in binary), and relies on the following recursive relation, where $A_t(v)$ is the optimal value among the paths of length $t$ that start in $v$:
$$A_t(v) = \max_{(v,v') \in E} \big( w(v,v') + A_{t-1}(v') \big), \qquad A_0(v) = 0.$$
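The value-iteration recursion (the optimal value of paths of length $t$ from $v$ is the maximum, over edges $(v,v')$, of $w(v,v')$ plus the optimal value of paths of length $t-1$ from $v'$) can be sketched as follows. This is a minimal illustration under assumed names and an assumed dictionary encoding of the graph, not the implementation of [LaValle]; its running time is linear in the horizon, hence pseudo-polynomial.

```python
def fixed_horizon_values(vertices, weight, horizon):
    """Return {v: optimal total weight of a path of length `horizon` from v}.

    weight: dict mapping each edge (v, v') to its weight (hypothetical encoding).
    Vertices without outgoing edges get value -inf for horizon >= 1.
    """
    neg_inf = float("-inf")
    values = {v: 0 for v in vertices}    # paths of length 0 have utility 0
    for _ in range(horizon):
        # One backward-induction step: best first edge plus best continuation.
        values = {
            v: max(
                (w + values[v2] for (v1, v2), w in weight.items() if v1 == v),
                default=neg_inf,
            )
            for v in vertices
        }
    return values

# Hypothetical example graph with three vertices.
weight = {("a", "b"): 2, ("b", "a"): -1, ("b", "c"): 5, ("c", "c"): 0, ("a", "c"): 1}
values = fixed_horizon_values(["a", "b", "c"], weight, 5)
print(values)
```

In this example the optimum from `a` for horizon 5 is reached by the path a, b, a, b, c, c, which takes two different edges out of `b` and is therefore not stationary.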
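A polynomial algorithm running in $O(|V|^3 \cdot \log T)$ to obtain the optimal value $A_T(v)$ of the paths of length $T$ that start in $v$ is to compute, in the max-plus algebra (where the matrix product $C = A \cdot B$ is defined by $C_{ij} = \max_k (A_{ik} + B_{kj})$), the $T$-th power of the transition matrix $M$ of the weighted graph, where the entry of $M$ for a pair $(v,v')$ of vertices is $w(v,v')$ if $(v,v') \in E$, and $-\infty$ otherwise. The power $M^T$ can be computed in time $O(|V|^3 \cdot \log T)$ by successive squaring of $M$ and multiplying the powers selected by the binary representation of $T$, which gives a polynomial algorithm to compute $A_T(v)$ since it is the largest element in the column of $M^T$ corresponding to $v$ (note that the finite entries of the matrix $M^T$ are bounded in absolute value by $T \cdot W$, where $W$ is the largest absolute weight in the graph). We now present the proof of Theorem 1.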
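The successive-squaring computation in the max-plus algebra can be sketched as follows. The function names and the list-of-lists matrix encoding are assumptions of this sketch; here rows are indexed by the source vertex, so the optimal value from a vertex is read as the largest entry of its row (the transposed convention reads it from a column).

```python
NEG_INF = float("-inf")

def maxplus_product(a, b):
    """Max-plus matrix product: c[i][j] = max over k of a[i][k] + b[k][j]."""
    n = len(a)
    return [[max(a[i][k] + b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def maxplus_power(m, t):
    """t-th max-plus power of m using O(log t) matrix products."""
    n = len(m)
    # Max-plus identity: 0 on the diagonal, -inf elsewhere.
    result = [[0 if i == j else NEG_INF for j in range(n)] for i in range(n)]
    base = m
    while t > 0:
        if t & 1:                        # this bit of t selects the current power
            result = maxplus_product(result, base)
        base = maxplus_product(base, base)
        t >>= 1
    return result

# Hypothetical graph on vertices a, b, c (indices 0, 1, 2):
# a->b: 2, a->c: 1, b->a: -1, b->c: 5, c->c: 0.
M = [
    [NEG_INF, 2, 1],
    [-1, NEG_INF, 5],
    [NEG_INF, NEG_INF, 0],
]
best = [max(row) for row in maxplus_power(M, 5)]
print(best)
```

The number of matrix products is logarithmic in the horizon, which is what makes the approach polynomial when the horizon is encoded in binary.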
Proof of Theorem 1.
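Given the weighted graph $G$ and the distribution $\delta$ with support $\{t_1, \dots, t_k\}$ where $t_1 < \dots < t_k$, and with $p_i = \delta(t_i)$, we reduce the problem to finding an optimal path of length $k$ in a layered graph $G'$ where the transitions between layer $i-1$ and layer $i$ mimic sequences of $t_i - t_{i-1}$ transitions in the original graph. For $t \geq 1$, define the $t$-th power of $M$ recursively by $M^t = M \cdot M^{t-1}$ where $M^0$ is the identity matrix of the max-plus algebra (with $0$ on the diagonal and $-\infty$ elsewhere), and $M$ is the transition matrix of the original weighted graph. We denote by $(M^t)_{v,v'}$ the entry of $M^t$ for the pair $(v,v')$ of vertices. We construct the graph $G' = \langle V', E', w' \rangle$ where

$V' = V \times \{0, 1, \dots, k\}$,

$E' = \{ ((v, i-1), (v', i)) \mid 1 \leq i \leq k,\ (M^{t_i - t_{i-1}})_{v,v'} \neq -\infty \}$ where $t_0 = 0$, and

$w'((v, i-1), (v', i)) = \big( \sum_{j=i}^{k} p_j \big) \cdot (M^{t_i - t_{i-1}})_{v,v'}$.
The optimal expected utility is the same as the optimal fixed-length path value for length $k$ in $G'$. The correctness of this reduction relies on the fact that the probability of not stopping before time $t_i$ is $\sum_{j=i}^{k} p_j$ and the largest utility of a path of length $t_i - t_{i-1}$ from $v$ to $v'$ is $(M^{t_i - t_{i-1}})_{v,v'}$. Given a path of length $k$ in $G'$ (that induces a sequence $w'_1, \dots, w'_k$ of weights), we can construct a path of length $t_k$ in $G$ (visiting at time $t_i$ the vertex of layer $i$ on the path in $G'$, and inducing a sequence $u$ of utilities), and we show that the value of the path of length $k$ in $G'$ is the same as the expected utility of the corresponding path in $G$ with stopping time distributed according to $\delta$, as follows (where $t_0 = 0$, and thus $u_{t_0} = 0$):
$$\sum_{i=1}^{k} w'_i = \sum_{i=1}^{k} \Big( \sum_{j=i}^{k} p_j \Big) \cdot (u_{t_i} - u_{t_{i-1}}) = \sum_{j=1}^{k} p_j \cdot \sum_{i=1}^{j} (u_{t_i} - u_{t_{i-1}}) = \sum_{j=1}^{k} p_j \cdot u_{t_j} = \mathbb{E}_{\delta}(u).$$
Conversely, given an arbitrary path in $G$, let $v_i$ be the vertex visited at time $t_i$, and consider the path $(v_0, 0) (v_1, 1) \dots (v_k, k)$ in $G'$, which has a total utility at least the same as the expected utility of the given path in $G$.
Therefore, the problem can be solved by finding the optimal fixed-length path value for length $k$ in $G'$, which can be done in polynomial time (see the remark after Theorem 1). ∎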
In the fixedhorizon problem with , the optimal plan need not be stationary. The example below shows that in general the transducer for optimal plans require states as Mealy machine, and states as Moore machine.
Example 1.
Consider the graph of Figure 1 with vertices, and time bound (for some constant ). The optimal plan from is to repeat times the cycle and then switch to . This path has value , and all other paths have lower value: if only the cycle is used, then the value is at most , and the same holds if the cycle on is ever used before time . The optimal plan can be represented by a Mealy machine of size that counts the number of cycle repetitions before switching to . A Moore machine requires size as it needs a new memory state at every step of the plan.
Example 2.
In the example of Figure 2 the optimal plan needs to visit several different cycles, not just repeat a single cycle and possibly switch only at the end. The graph consists of three loops on with weights and respective length , , and , and an edge to with weight . For expected time , the optimal plan has value and needs to stop exactly when reaching (to avoid the negative self-loop on ). It is easy to show that the remaining length can only be obtained by visiting each cycle once: as is not an even number, the path has to visit a cycle of odd length, thus the cycle of length ; analogously, as is not a multiple of , the path has to visit the cycle of length , etc. This example can be easily generalized to an arbitrary number of cycles by using more prime numbers.
We now consider the complexity of computing optimal plans among stationary plans.
Theorem 2.
Let be a weighted graph and be a rational utility threshold. Given a stopping-time distribution , deciding whether (i.e., whether there is a stationary plan with utility at least the threshold) is NP-complete. The NP-hardness holds for the fixed-horizon problem , even when and all weights are in , and thus expressed in unary.
Proof.
The NP upper bound is easily obtained by guessing a stationary plan (i.e., one edge for each vertex of the graph) and checking that the value of the induced path is at least the given threshold.
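The checking step of the NP upper bound can be illustrated for the fixed-horizon case. A stationary plan induces a lasso-shaped path (a simple prefix followed by a repeated simple cycle), so its utility at time $t$ can be computed without simulating all $t$ steps, even when $t$ is given in binary. The sketch below is a minimal illustration; the function names and the graph encoding are assumptions, and the exhaustive search over stationary plans is exponential and included only for small examples.

```python
from itertools import product

def stationary_utility_at(choice, weight, start, t):
    """Utility after t steps of the stationary plan `choice` from `start`."""
    seen = {}        # vertex -> step of its first visit
    prefix = [0]     # prefix[i] = utility after i steps
    v = start
    while v not in seen and len(prefix) - 1 < t:
        seen[v] = len(prefix) - 1
        successor = choice[v]
        prefix.append(prefix[-1] + weight[(v, successor)])
        v = successor
    steps = len(prefix) - 1
    if steps == t:
        return prefix[t]
    # The path re-entered v: steps seen[v]..steps form the repeated cycle.
    entry = seen[v]
    cycle_len = steps - entry
    cycle_weight = prefix[steps] - prefix[entry]
    full, rest = divmod(t - entry, cycle_len)
    return prefix[entry + rest] + full * cycle_weight

def best_stationary_value(vertices, weight, start, t):
    """Exhaustive search over all stationary plans (exponential, for illustration)."""
    successors = {v: [v2 for (v1, v2) in weight if v1 == v] for v in vertices}
    return max(
        stationary_utility_at(dict(zip(vertices, picks)), weight, start, t)
        for picks in product(*(successors[v] for v in vertices))
    )

# Hypothetical example graph with three vertices.
weight = {("a", "b"): 2, ("b", "a"): -1, ("b", "c"): 5, ("c", "c"): 0, ("a", "c"): 1}
plan = {"a": "b", "b": "a", "c": "c"}
print(stationary_utility_at(plan, weight, "a", 5))
print(best_stationary_value(["a", "b", "c"], weight, "a", 5))
```

In this example the best stationary plan from `a` reaches utility 7 at horizon 5, while the non-stationary path a, b, a, b, c, c reaches 2 - 1 + 2 + 5 + 0 = 8, in line with the observation that stationary plans do not suffice for a fixed horizon.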
The NP-hardness follows from a result of [FHW80] where, given a directed graph and four vertices , the problem of deciding the existence of two (vertex) disjoint simple paths (one from to and the other from to ) is shown to be NP-complete. It easily follows that given a directed graph, and two vertices , the problem of deciding the existence of a simple cycle that contains and is NP-complete. We present a reduction from the latter problem, illustrated in Figure 3. We construct a weighted graph from , by adding two vertices start and sink, and all edges have weight except those from with weight , and the edge with weight where is the number of vertices in . Let and the utility threshold .
If there exists a simple cycle containing and in , then there exists a stationary plan from start that visits then in at most steps. This plan can be prolonged to a plan of steps by going to sink and using the selfloop. The total weight is .
If there is no simple cycle containing and in , then no stationary plan can visit first then . We show that every stationary plan has value at most . First if a stationary plan uses the edge , then is not visited and all weights are except the weight from to sink. Otherwise, if a stationary plan does not use the edge , then all weights are at most , and the total utility is at most . In both cases, the utility is smaller than , which establishes the correctness of the reduction. ∎
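4 Expected Finite-horizon: Adversarial Distribution
We now consider the computation of the following optimal values under adversarial distribution. Given a weighted graph $G$ and an expected stopping time $T$, we define the following:

Optimal values of plans. For a plan $\rho$ that induces the sequence $u$ of utilities, let
$$val(u, T) = \inf \{ \mathbb{E}_{\delta}(u) \mid \delta \text{ is a stopping-time distribution with expected time } \mathbb{E}_{\delta} = T \}.$$

Optimal value. The optimal value is the supremum value over all plans:
$$val(G, T) = \sup \{ val(u, T) \mid u \text{ is induced by a plan in } G \}.$$
Our two main results are related to the plan complexity and a polynomial-time algorithm.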
Theorem 3.
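For all weighted graphs $G$ and for all expected stopping times $T$ we have
$$val(G, T) = \max \{ val(u, T) \mid u \text{ is induced by a stationary plan in } G \},$$
i.e., optimal stationary plans exist for expected finite-horizon under adversarial distribution.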
Remark 1.
Note that, in contrast to the fixed finite-horizon problem, where stationary plans do not suffice, we show that in the presence of an adversary the simpler class of stationary plans is sufficient for optimality in the expected finite-horizon problem. Moreover, while for fixed-length plans the size of the Mealy (resp., Moore) machines required by optimal plans grows with the horizon (see Example 1), our results show that under adversarial distribution optimal plans require Mealy machines with one state (resp., Moore machines with at most $|V|$ states).
Theorem 4.
Given a weighted graph and expected finitehorizon , whether can be decided in time, and computing can be done in time.
4.1 Theorem 3: Plan Complexity
In this section we prove Theorem 3. We start with the notion of equivalent subdistributions. Two subdistributions $\delta, \delta'$ are equivalent if they have the same probability mass and the same expected time, that is, $|\delta| = |\delta'|$ and $\mathbb{E}_{\delta} = \mathbb{E}_{\delta'}$. The following result is straightforward.
Lemma 1.
If $\delta_1, \delta'_1$ are equivalent subdistributions, and $\delta_2$ is a subdistribution, then $\delta_1 + \delta_2$ and $\delta'_1 + \delta_2$ are equivalent subdistributions.
4.1.1 Bi-Dirac distributions are sufficient
By Lemma 1, we can decompose distributions as the sum of two subdistributions, and we can replace one of the two subdistributions by a simpler (yet equivalent) one to obtain an equivalent distribution. We show that, given a sequence of utilities, for all subdistributions with three points in their support (see Figure 4), there exists an equivalent subdistribution with only two points in its support that gives a lower expected value for . Intuitively, if one has to distribute a fixed probability mass (say ) among three points with a fixed expected time , assigning probability at point , then we have and , i.e.,
The expected utility is
which is a linear expression in variables where the sum is constant. Hence the least expected utility is obtained for either , or . This is the main argument to show that bi-Dirac distributions are sufficient to compute the optimal expected value. (Footnote: This argument works here because , which implies that when , and vice versa. A symmetric argument can be used in the case , to show that then either , or .)
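Lemma 2 (Bi-Dirac distributions are sufficient).
For all sequences $u$ of utilities, for all time bounds $T$, the following holds:
$$\inf \{ \mathbb{E}_{\delta}(u) \mid \delta \in \Delta,\ \mathbb{E}_{\delta} = T \} = \inf \{ \mathbb{E}_{\delta}(u) \mid \delta \in \Delta^{\uparrow\uparrow},\ \mathbb{E}_{\delta} = T \},$$
i.e., the set $\Delta^{\uparrow\uparrow}$ of bi-Dirac distributions suffices for the adversary (in place of the set $\Delta$ of all stopping-time distributions).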
Proof.
First, we show that for all distributions with ,

there exists an equivalent distribution such that and , i.e., only one point before in the support is sufficient, and

there exists an equivalent distribution such that and , i.e., only one point after in the support is sufficient.
The result of the lemma follows from these two claims.
To prove claim , first consider an arbitrary subdistribution with where . Then and either , or .
We show that among the subdistributions equivalent to and with , the smallest expected utility of is obtained for . We present below the argument in the case , and show that either , or . A symmetric argument in the case shows that either , or .
Let , , and . Since and are equivalent, we have
Hence
The expected utility of under is
(1) 
Since is constant and , the least value of is obtained either for (if ), or for (otherwise), thus either for , or for . Note that for , we have and , which is a feasible solution as and since , and . Symmetrically, for we have a feasible solution.
As an intermediate remark, note that for and , we get (for , and symmetrically for )
(2) 
To complete the proof of Claim , given an arbitrary distribution with , we use the above argument to construct a distribution equivalent to (equivalence follows from Lemma 1) with smaller expected utility and one fewer point in the support. We repeat this argument until we obtain a distribution with support that contains at most two points in the interval where is such that . Such a value of exists since . By the construction of , we have and therefore at most one point in the support of lies in the interval , which completes the proof of Claim .
To prove claim , consider a distribution with , and by claim we assume that for some , and for all with . Let , and we consider two cases:

otherwise there exists such that and . By an analogue of Equation (1), we have
that is is a convex combination of elements greater than or equal to , among which one is greater than . It follows that , and thus there exists such that .
Consider such that (which exists by definition of ), and let be the bi-Dirac distribution with support and expected time . By an analogue of Equation (2), we have
Therefore, which concludes the proof since is a bi-Dirac distribution with .
∎
4.1.2 Geometric interpretation
It follows from the proof of Lemma 2 (and Equation (2)) that the value of the expected utility of a sequence $u$ of utilities under a bi-Dirac distribution with support $\{t_1, t_2\}$ (where $t_1 \leq T \leq t_2$ and $t_1 < t_2$) and expected time $T$ is
$$u_{t_1} + \frac{T - t_1}{t_2 - t_1} \cdot (u_{t_2} - u_{t_1}).$$
In Figure 5, this value is obtained as the intersection of the vertical axis at $T$ and the line that connects the two points $(t_1, u_{t_1})$ and $(t_2, u_{t_2})$. Intuitively, the optimal value of a path is obtained by choosing the two points $t_1$ and $t_2$ such that the connecting line intersects the vertical axis at $T$ as low as possible.
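The geometric reading can be checked numerically. A bi-Dirac distribution with support $\{t_1, t_2\}$ and expected time $T$ must stop at $t_2$ with probability $(T - t_1)/(t_2 - t_1)$, so its expected utility is the linear interpolation between the points $(t_1, u_{t_1})$ and $(t_2, u_{t_2})$ evaluated at $T$. The Python sketch below (function names are assumptions) minimizes this quantity over pairs within a finite prefix of the sequence; since a plan is an infinite path, the result is only an upper bound on the adversarial value, which may be approached by stopping times beyond the prefix.

```python
from fractions import Fraction

def bidirac_value(u, t1, t2, T):
    """Expected utility of u under the bi-Dirac distribution on {t1, t2}
    with expected time T, assuming t1 <= T <= t2 and t1 < t2."""
    p2 = Fraction(T - t1, t2 - t1)       # probability of stopping at t2
    return (1 - p2) * u[t1] + p2 * u[t2]

def adversarial_value_prefix(u, T):
    """Least expected utility over Dirac and bi-Dirac distributions with
    expected time T whose support lies within the given prefix of u."""
    best = Fraction(u[T])                # Dirac distribution at T
    for t1 in range(T):
        for t2 in range(T + 1, len(u)):
            best = min(best, bidirac_value(u, t1, t2, T))
    return best

# Hypothetical prefix of a sequence of utilities.
u = [0, 2, 1, 3, 2, 4]
print(bidirac_value(u, 0, 4, 3))
print(adversarial_value_prefix(u, 3))
```

On this prefix the fixed stopping time 3 gives utility 3, while the adversary can push the expected utility down to 3/2 by mixing stopping times 0 and 4 (or 2 and 4).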
Lemma 3.
For all sequences of utilities, if for all , then the value of the sequence is at least .
Proof.
By Lemma 2, it is sufficient to consider biDirac distributions, and for all biDirac distributions with arbitrary support the value of under is
∎
It is always possible to fix an optimal value of (because is to be chosen among a finite set of points), but the optimal value of may not exist, as in Figure 5. The value of the path is then obtained as . In general, there exists such that it is sufficient to consider biDirac distributions with support containing to compute the optimal value. We say that is a leftminimizer of the expected value in the path. Given such a value of , let , and we show in Lemma 4 that , for all . This motivates the following definition.
Line of equation . Given a leftminimizer , we define the line of equation as follows:
Note that the optimal expected utility is
In other words, is the optimal value.
Lemma 4 (Geometric interpretation).
For all sequences of utilities, we have for all , and the expected value of is .
Proof.
The result holds by definition of for all . For , assume towards contradiction that . Let such that . We obtain a contradiction by showing that there exists a biDirac distribution under which the expected value of is smaller than the optimal value of . Consider a biDirac distribution with support where the value is defined later.
We need to show that
that is
which, since , holds if (successively)
We consider two cases: if the infimum is attained, then we have for some , and the inequality holds; otherwise, we can choose arbitrarily, and large enough to ensure that is smaller than , so that the inequality holds. ∎
A corollary of the geometric interpretation lemma is that the value of a path can be obtained as the intersection of the vertical line at point with the boundary of the convex hull of the region above the sequence of utilities, namely