Reinforcement Learning for POMDP: Partitioned Rollout and Policy Iteration with Application to Autonomous Sequential Repair Problems

02/11/2020
by   Sushmita Bhattacharya, et al.
10

In this paper we consider infinite horizon discounted dynamic programming problems with finite state and control spaces, and partial state observations. We discuss an algorithm that uses multistep lookahead, truncated rollout with a known base policy, and a terminal cost function approximation. This algorithm is also used for policy improvement in an approximate policy iteration scheme, where successive policies are approximated by using a neural network classifier. A novel feature of our approach is that it is well suited for distributed computation through an extended belief space formulation and the use of a partitioned architecture, which is trained with multiple neural networks. We apply our methods in simulation to a class of sequential repair problems where a robot inspects and repairs a pipeline with potentially several rupture sites under partial information about the state of the pipeline.

READ FULL TEXT
research
11/09/2020

Multiagent Rollout and Policy Iteration for POMDP with Application to Multi-Robot Repair Problems

In this paper we consider infinite horizon discounted dynamic programmin...
research
06/01/2021

On-Line Policy Iteration for Infinite Horizon Dynamic Programming

In this paper we propose an on-line policy iteration (PI) algorithm for ...
research
10/04/2019

Approximate policy iteration using neural networks for storage problems

We consider the stochastic single node energy storage problem (SNES) and...
research
09/30/2019

Multiagent Rollout Algorithms and Reinforcement Learning

We consider finite and infinite horizon dynamic programming problems, wh...
research
10/06/2019

Biased Aggregation, Rollout, and Enhanced Policy Improvement for Reinforcement Learning

We propose a new aggregation framework for approximate dynamic programmi...
research
10/02/2004

Applying Policy Iteration for Training Recurrent Neural Networks

Recurrent neural networks are often used for learning time-series data. ...
research
06/06/2013

Policy Search: Any Local Optimum Enjoys a Global Performance Guarantee

Local Policy Search is a popular reinforcement learning approach for han...

Please sign up or login with your details

Forgot password? Click here to reset