Incremental Sampling-based Motion Planners Using Policy Iteration Methods

09/19/2016
by Oktay Arslan, et al.

Recent progress in randomized motion planners has led to the development of a new class of sampling-based algorithms that provide asymptotic optimality guarantees, notably the RRT* and the PRM* algorithms. Careful analysis reveals that the so-called "rewiring" step in these algorithms can be interpreted as a local policy iteration (PI) step (i.e., a local policy evaluation step followed by a local policy improvement step), so that asymptotically, as the number of samples tends to infinity, both algorithms converge to the optimal path almost surely (with probability 1). Policy iteration, along with value iteration (VI), is a common method for solving dynamic programming (DP) problems. Based on this observation, the RRT^# algorithm was recently proposed; during each iteration it performs Bellman updates (also known as "backups") on those vertices of the graph that have the potential of being part of the optimal path (i.e., the "promising" vertices). The RRT^# algorithm thus utilizes dynamic programming ideas and implements them incrementally on randomly generated graphs to obtain high-quality solutions. In this work, building on this key insight, we explore a different class of dynamic programming algorithms for solving shortest-path problems on random graphs generated by iterative sampling methods. This class of algorithms utilizes policy iteration instead of value iteration and is therefore better suited for massive parallelization. Contrary to the RRT* algorithm, the policy improvement during the rewiring step is not performed only locally, but rather on the set of vertices classified as "promising" during the current iteration, which tends to speed up the whole process. The resulting algorithm, aptly named Policy Iteration-RRT^# (PI-RRT^#), is the first of a new class of DP-inspired algorithms for randomized motion planning that utilize PI methods.
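To make the evaluation/improvement cycle behind this idea concrete, the following is a minimal sketch of policy iteration for a shortest-path problem on a fixed weighted graph, with the improvement sweep restricted to a set of "promising" vertices. The graph, the function name, and the `promising` argument are illustrative assumptions made for this sketch only; the actual PI-RRT^# algorithm interleaves such updates with incremental sampling and graph construction, which is not shown here.

```python
# A minimal sketch of policy iteration on a fixed weighted graph, illustrating the
# policy evaluation / policy improvement cycle described in the abstract.
# The graph, names, and the `promising` filter are illustrative assumptions,
# not the paper's implementation.
import math

def policy_iteration_shortest_path(vertices, edges, goal, promising=None):
    """edges: dict mapping v -> {u: cost of edge (v, u)}."""
    vertices = list(vertices)
    promising = set(promising) if promising is not None else set(vertices)
    policy = {v: None for v in vertices}          # successor chosen for each vertex
    cost = {v: math.inf for v in vertices}        # cost-to-go estimate
    cost[goal] = 0.0

    while True:
        # Policy evaluation: cost of following the current policy to the goal.
        for _ in range(len(vertices)):            # enough sweeps for an acyclic policy
            for v in vertices:
                if v == goal or policy[v] is None:
                    continue
                cost[v] = edges[v][policy[v]] + cost[policy[v]]

        # Policy improvement, restricted to the "promising" vertices.
        changed = False
        for v in promising:
            if v == goal:
                continue
            best_u = min(edges[v], key=lambda u: edges[v][u] + cost[u], default=None)
            if best_u is not None and edges[v][best_u] + cost[best_u] < cost[v]:
                policy[v], changed = best_u, True
        if not changed:
            return policy, cost

if __name__ == "__main__":
    # Tiny illustrative graph with symmetric edge costs.
    E = {
        "start": {"a": 2.0, "b": 5.0},
        "a": {"start": 2.0, "b": 1.0, "goal": 6.0},
        "b": {"start": 5.0, "a": 1.0, "goal": 1.0},
        "goal": {"a": 6.0, "b": 1.0},
    }
    policy, cost = policy_iteration_shortest_path(E.keys(), E, "goal")
    print(policy["start"], cost["start"])   # -> "a" 4.0 (start -> a -> b -> goal)
```

Restricting the improvement sweep to the promising set mirrors the idea in the abstract: only vertices that could still lie on the optimal path are updated, and since each improvement is an independent per-vertex minimization, the sweep lends itself naturally to parallelization.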

Related research

05/04/2020
Multiagent Value Iteration Algorithms in Dynamic Programming and Reinforcement Learning
We consider infinite horizon dynamic programming problems, where the con...

06/01/2021
On-Line Policy Iteration for Infinite Horizon Dynamic Programming
In this paper we propose an on-line policy iteration (PI) algorithm for ...

12/19/2013
The Value Iteration Algorithm is Not Strongly Polynomial for Discounted Dynamic Programming
This note provides a simple example demonstrating that, if exact computa...

09/28/2021
The Role of Lookahead and Approximate Policy Evaluation in Policy Iteration with Linear Value Function Approximation
When the sizes of the state and action spaces are large, solving MDPs ca...

10/27/2021
A Subgame Perfect Equilibrium Reinforcement Learning Approach to Time-inconsistent Problems
In this paper, we establish a subgame perfect equilibrium reinforcement ...

07/12/2018
A Constrained Randomized Shortest-Paths Framework for Optimal Exploration
The present work extends the randomized shortest-paths framework (RSP), ...

07/22/2021
Distributed Asynchronous Policy Iteration for Sequential Zero-Sum Games and Minimax Control
We introduce a contractive abstract dynamic programming framework and re...
