Integrating Planning and Execution in Stochastic Domains

by Richard Dearden et al.

We investigate planning in time-critical domains represented as Markov Decision Processes, showing that search-based techniques can be a very powerful method for finding close-to-optimal plans. To reduce the computational cost of planning in these domains, we execute actions as we construct the plan, and sacrifice optimality by searching to a fixed depth and using a heuristic function to estimate the value of states. Although this paper concentrates on the search algorithm, we also discuss ways of constructing heuristic functions suitable for this approach. Our results show that by interleaving search and execution, close-to-optimal policies can be found without the computational requirements of other approaches.
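The approach the abstract describes, fixed-depth search with a heuristic at the leaves, interleaved with execution, can be sketched as follows. This is an illustrative reconstruction, not the paper's actual algorithm: the `mdp` interface (`actions`, `transitions`), the `heuristic` function, and all parameter names are assumptions made for the example.

```python
import random

# Minimal sketch of interleaved planning and execution in an MDP.
# Assumed (hypothetical) interface:
#   mdp.actions(s)        -> list of actions applicable in state s
#   mdp.transitions(s, a) -> list of (prob, next_state, reward) triples
#   heuristic(s)          -> estimate of the value of state s

def lookahead(mdp, heuristic, state, depth, gamma=0.95):
    """Depth-limited expectimax: estimate the value of `state`,
    falling back on the heuristic when the depth budget runs out."""
    if depth == 0:
        return heuristic(state)
    best = float("-inf")
    for a in mdp.actions(state):
        q = sum(p * (r + gamma * lookahead(mdp, heuristic, s2, depth - 1, gamma))
                for p, s2, r in mdp.transitions(state, a))
        best = max(best, q)
    return best

def plan_and_execute(mdp, heuristic, state, horizon, depth=3, gamma=0.95):
    """Interleave search and execution: choose the best action by
    fixed-depth search, execute it (sampling a stochastic outcome),
    then replan from the resulting state."""
    trajectory = [state]
    for _ in range(horizon):
        best_a = max(
            mdp.actions(state),
            key=lambda a: sum(
                p * (r + gamma * lookahead(mdp, heuristic, s2, depth - 1, gamma))
                for p, s2, r in mdp.transitions(state, a)))
        probs, succs, _rewards = zip(*mdp.transitions(state, best_a))
        state = random.choices(succs, weights=probs)[0]
        trajectory.append(state)
    return trajectory
```

Because only a bounded-depth tree is expanded before each action, the per-step cost is fixed, which is what makes the method usable in time-critical domains; the quality of the resulting behavior then hinges on the heuristic used at the leaves.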




Online Planning Algorithms for POMDPs

Partially Observable Markov Decision Processes (POMDPs) provide a rich f...

Learning Generalized Reactive Policies using Deep Neural Networks

We consider the problem of learning for planning, where knowledge acquir...

A Differentiable Loss Function for Learning Heuristics in A*

Optimization of heuristic functions for the A* algorithm, realized by de...

FluCaP: A Heuristic Search Planner for First-Order MDPs

We present a heuristic search algorithm for solving first-order Markov D...

Improving Heuristics Through Relaxed Search - An Analysis of TP4 and HSP*a in the 2004 Planning Competition

The h^m admissible heuristics for (sequential and temporal) regression pl...

Iterative Depth-First Search for Fully Observable Non-Deterministic Planning

Fully Observable Non-Deterministic (FOND) planning models uncertainty th...

Minimizing the Negative Side Effects of Planning with Reduced Models

Reduced models of large Markov decision processes accelerate planning by...