On-Line Policy Iteration for Infinite Horizon Dynamic Programming

by Dimitri Bertsekas et al.

In this paper we propose an on-line policy iteration (PI) algorithm for finite-state infinite horizon discounted dynamic programming, whereby the policy improvement operation is done on-line, only for the states that are encountered during operation of the system. This allows the continuous updating/improvement of the current policy, thus resulting in a form of on-line PI that incorporates the improved controls into the current policy as new states and controls are generated. The algorithm converges in a finite number of stages to a type of locally optimal policy, and suggests the possibility of variants of PI and multiagent PI where the policy improvement is simplified. Moreover, the algorithm can be used with on-line replanning, and is also well-suited for on-line PI algorithms with value and policy approximations.
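The abstract's key idea, improving the policy only at the states actually encountered while the system runs, can be sketched for a tabular discounted MDP. The data layout (`P`, `g` arrays), the exact policy re-evaluation after each local improvement, and the function name below are illustrative assumptions for a minimal sketch, not the paper's precise formulation.

```python
import numpy as np

def online_policy_iteration(P, g, alpha, x0, num_steps, rng=None):
    """Sketch of on-line policy iteration for a finite-state discounted MDP.

    P[u] is the n x n transition matrix under control u (assumed layout),
    g[u] is the length-n stage-cost vector under control u,
    alpha is the discount factor in (0, 1).
    """
    rng = rng or np.random.default_rng(0)
    n = P.shape[1]
    mu = np.zeros(n, dtype=int)  # current policy, initialized arbitrarily

    def policy_cost(mu):
        # Solve J = g_mu + alpha * P_mu J for the cost of policy mu
        P_mu = P[mu, np.arange(n)]
        g_mu = g[mu, np.arange(n)]
        return np.linalg.solve(np.eye(n) - alpha * P_mu, g_mu)

    J = policy_cost(mu)
    x = x0
    for _ in range(num_steps):
        # Policy improvement only at the encountered state x
        q = g[:, x] + alpha * P[:, x, :] @ J
        u = int(np.argmin(q))
        if u != mu[x]:
            mu[x] = u
            J = policy_cost(mu)  # re-evaluate (could be approximate in practice)
        # Simulate the next state under the (continuously updated) policy
        x = rng.choice(n, p=P[mu[x], x])
    return mu, J
```

Note that, consistent with the paper's "locally optimal" convergence claim, states never visited along the trajectory keep their initial controls, so the returned policy need not be globally optimal.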






