Some Upper Bounds on the Running Time of Policy Iteration on Deterministic MDPs

11/28/2022
by   Ritesh Goenka, et al.

Policy Iteration (PI) is a widely used family of algorithms for computing optimal policies for Markov Decision Problems (MDPs). We derive upper bounds on the running time of PI on Deterministic MDPs (DMDPs): the class of MDPs in which every state-action pair has a unique next state. Our results include a non-trivial upper bound that applies to the entire family of PI algorithms, and an affirmation that a conjecture regarding Howard's PI on general MDPs holds for DMDPs. Our analysis is based on certain graph-theoretic results, which may be of independent interest.
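To make the setting concrete, here is a minimal sketch of Howard's Policy Iteration on a tiny Deterministic MDP. This example is not from the paper: the states, actions, transitions, rewards, and discount factor below are invented for illustration. It only shows the deterministic structure (each state-action pair has a unique next state) and the PI loop of alternating policy evaluation and greedy improvement.

```python
# Hypothetical 3-state DMDP: next_state[s][a] is the unique successor of
# (s, a), and reward[s][a] is the immediate reward. All values are invented.
next_state = {0: {0: 1, 1: 2}, 1: {0: 2, 1: 0}, 2: {0: 2, 1: 1}}
reward     = {0: {0: 1.0, 1: 0.0}, 1: {0: 2.0, 1: 0.0}, 2: {0: 0.0, 1: 1.0}}
gamma = 0.9  # discount factor

def evaluate(policy, iters=500):
    """Evaluate a fixed policy by iterating the Bellman equation.

    With deterministic transitions this is just V(s) = r + gamma * V(s'),
    iterated until the discounted tail is negligible.
    """
    V = {s: 0.0 for s in next_state}
    for _ in range(iters):
        V = {s: reward[s][policy[s]] + gamma * V[next_state[s][policy[s]]]
             for s in next_state}
    return V

def howard_pi(policy):
    """Howard's PI: switch every improvable state to a greedy action."""
    while True:
        V = evaluate(policy)
        new_policy = {
            s: max(next_state[s],
                   key=lambda a: reward[s][a] + gamma * V[next_state[s][a]])
            for s in next_state
        }
        if new_policy == policy:   # no state is improvable: policy is optimal
            return policy, V
        policy = new_policy

policy, V = howard_pi({s: 0 for s in next_state})
print(policy)  # optimal action at each state
```

The bounds in the paper concern how many iterations of the outer loop (policy improvements) such algorithms can take in the worst case; variants of PI differ only in which subset of improvable states they switch, with Howard's PI switching all of them.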
