On the Complexity of Policy Iteration

01/23/2013
by Yishay Mansour et al.

Decision-making problems in uncertain or stochastic domains are often formulated as Markov decision processes (MDPs). Policy iteration (PI) is a popular algorithm for searching over policy-space, the size of which is exponential in the number of states. We are interested in bounds on the complexity of PI that do not depend on the value of the discount factor. In this paper we prove the first such non-trivial, worst-case upper bounds on the number of iterations required by PI to converge to the optimal policy. Our analysis also sheds new light on the manner in which PI progresses through the space of policies.
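For readers who want a concrete reference point, below is a minimal sketch of the classical (Howard-style) policy iteration loop on a tabular discounted MDP: exact policy evaluation by solving a linear system, followed by greedy policy improvement. The array names, shapes, and the toy usage example are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def policy_iteration(P, R, gamma=0.9):
    """Howard's policy iteration on a tabular MDP (illustrative sketch).
    P: (S, A, S) transition probabilities, R: (S, A) expected rewards."""
    S, A = R.shape
    policy = np.zeros(S, dtype=int)  # start from an arbitrary policy
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly.
        P_pi = P[np.arange(S), policy]           # (S, S) transitions under pi
        r_pi = R[np.arange(S), policy]           # (S,) rewards under pi
        v = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
        # Policy improvement: act greedily with respect to v.
        q = R + gamma * (P @ v)                  # (S, A) action values
        new_policy = q.argmax(axis=1)
        if np.array_equal(new_policy, policy):   # greedy policy unchanged,
            return policy, v                     # so pi is optimal
        policy = new_policy

# Toy usage on a random 3-state, 2-action MDP (hypothetical data):
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(3), size=(3, 2))      # rows sum to 1
R = rng.standard_normal((3, 2))
pi, v = policy_iteration(P, R, gamma=0.95)
```

Each improvement step yields a policy at least as good as the previous one, so the loop terminates after finitely many iterations (there are at most |A|^|S| deterministic policies). The bounds studied in the paper concern how many such iterations are needed in the worst case, independent of the discount factor gamma.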

Related research

10/20/2017  Multi-Objective Approaches to Markov Decision Processes with Uncertain Transition Parameters
11/28/2022  Some Upper Bounds on the Running Time of Policy Iteration on Deterministic MDPs
10/31/2011  First Order Decision Diagrams for Relational MDPs
09/09/2019  Policy Space Identification in Configurable Environments
06/03/2013  Improved and Generalized Upper Bounds on the Complexity of Policy Iteration
03/19/2023  Going faster to see further: GPU-accelerated value iteration and simulation for perishable inventory control using JAX
09/16/2020  Lower Bounds for Policy Iteration on Multi-action MDPs
