On the Complexity of Policy Iteration

01/23/2013
by Yishay Mansour, et al.

Decision-making problems in uncertain or stochastic domains are often formulated as Markov decision processes (MDPs). Policy iteration (PI) is a popular algorithm for searching over policy-space, the size of which is exponential in the number of states. We are interested in bounds on the complexity of PI that do not depend on the value of the discount factor. In this paper we prove the first such non-trivial, worst-case, upper bounds on the number of iterations required by PI to converge to the optimal policy. Our analysis also sheds new light on the manner in which PI progresses through the space of policies.
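For context, policy iteration alternates exact policy evaluation with greedy policy improvement, stopping when no action switch at any state improves the value. Below is a minimal NumPy sketch of this standard (Howard-style) loop on a finite discounted MDP; the function name, array layout, and toy numbers are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def policy_iteration(P, R, gamma=0.9):
    """Greedy (Howard-style) policy iteration for a finite discounted MDP.

    P:     transition kernel, shape (A, S, S), with P[a, s, s'] = Pr(s' | s, a)
    R:     expected one-step rewards, shape (S, A)
    gamma: discount factor in [0, 1)
    Returns an optimal deterministic policy and its value function.
    """
    n_actions, n_states, _ = P.shape
    policy = np.zeros(n_states, dtype=int)        # arbitrary initial policy

    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
        P_pi = P[policy, np.arange(n_states)]     # row s is P[policy[s], s, :]
        R_pi = R[np.arange(n_states), policy]
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)

        # Policy improvement: act greedily with respect to V at every state.
        Q = R.T + gamma * (P @ V)                 # Q[a, s] = R[s, a] + gamma * E[V]
        new_policy = Q.argmax(axis=0)

        if np.array_equal(new_policy, policy):    # no improving switch remains
            return policy, V
        policy = new_policy

# Toy 2-state, 2-action MDP (numbers made up purely for illustration).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.1, 0.9], [0.8, 0.2]]])
R = np.array([[1.0, 0.0], [0.0, 2.0]])
pi, V = policy_iteration(P, R)
print(pi, V)
```

Each improvement step produces a policy whose value is componentwise at least as large, and strictly larger somewhere unless the policy is already optimal; since the policy space is finite, the loop must terminate. The question the paper addresses is how many such iterations can occur in the worst case, with bounds that hold uniformly over the discount factor.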

Related research

Multi-Objective Approaches to Markov Decision Processes with Uncertain Transition Parameters (10/20/2017)
Markov decision processes (MDPs) are a popular model for performance ana...

First Order Decision Diagrams for Relational MDPs (10/31/2011)
Markov decision processes capture sequential decision making under uncer...

Policy Space Identification in Configurable Environments (09/09/2019)
We study the problem of identifying the policy space of a learning agent...

Improved and Generalized Upper Bounds on the Complexity of Policy Iteration (06/03/2013)
Given a Markov Decision Process (MDP) with n states and a total number m ...

Memoryless Exact Solutions for Deterministic MDPs with Sparse Rewards (05/17/2018)
We propose an algorithm for deterministic continuous Markov Decision Pro...

Lower Bounds for Policy Iteration on Multi-action MDPs (09/16/2020)
Policy Iteration (PI) is a classical family of algorithms to compute an ...

Pandora's Problem with Nonobligatory Inspection (05/04/2019)
Martin Weitzman's "Pandora's problem" furnishes the mathematical basis f...