The Value Iteration Algorithm is Not Strongly Polynomial for Discounted Dynamic Programming

12/19/2013
by Eugene A. Feinberg, et al.

This note provides a simple example demonstrating that, if exact computations are allowed, the number of iterations required for the value iteration algorithm to find an optimal policy for discounted dynamic programming problems may grow arbitrarily quickly with the size of the problem. In particular, the number of iterations can be exponential in the number of actions. Thus, unlike policy iteration, the value iteration algorithm is not strongly polynomial for discounted dynamic programming.
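For context, the value iteration algorithm discussed above repeatedly applies the Bellman optimality update until the value function stops changing. Below is a minimal sketch on a made-up two-state, two-action MDP; the transition matrix `P`, rewards `R`, and discount factor are illustrative assumptions, not the example constructed in the paper.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Value iteration for a discounted MDP (illustrative sketch).

    P[a, s, s'] -- transition probabilities under action a
    R[a, s]     -- expected one-step reward for action a in state s
    Returns the (approximately) optimal value function and the
    number of Bellman updates performed.
    """
    n_states = P.shape[1]
    V = np.zeros(n_states)
    for i in range(max_iter):
        # Bellman optimality update:
        # V(s) <- max_a [ R(a, s) + gamma * sum_s' P(a, s, s') V(s') ]
        Q = R + gamma * (P @ V)        # shape (n_actions, n_states)
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, i + 1
        V = V_new
    return V, max_iter

# Toy problem (made-up numbers): action 0 stays put, action 1 switches state.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],   # action 0: stay
              [[0.0, 1.0], [1.0, 0.0]]])  # action 1: switch
R = np.array([[1.0, 0.0],                 # rewards for action 0
              [0.0, 2.0]])                # rewards for action 1
V, iters = value_iteration(P, R)
print(V, iters)
```

The number of updates `iters` needed to settle on the optimal policy is exactly the quantity the paper shows can grow exponentially in the number of actions when exact computations are required.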


Related research

- Multiagent Value Iteration Algorithms in Dynamic Programming and Reinforcement Learning (05/04/2020): We consider infinite horizon dynamic programming problems, where the con...
- Unifying Value Iteration, Advantage Learning, and Dynamic Policy Programming (10/30/2017): Approximate dynamic programming algorithms, such as approximate value it...
- Research on Autonomous Maneuvering Decision of UCAV based on Approximate Dynamic Programming (08/27/2019): Unmanned aircraft systems can perform some more dangerous and difficult ...
- On-Line Policy Iteration for Infinite Horizon Dynamic Programming (06/01/2021): In this paper we propose an on-line policy iteration (PI) algorithm for ...
- Value Iteration in Continuous Actions, States and Time (05/10/2021): Classical value iteration approaches are not applicable to environments ...
- Incremental Sampling-based Motion Planners Using Policy Iteration Methods (09/19/2016): Recent progress in randomized motion planners has led to the development...
- Semantic verification of dynamic programming (08/05/2020): We prove that the generic framework for specifying and solving finite-ho...
