Analysis of Lower Bounds for Simple Policy Iteration

11/28/2019
by Sarthak Consul, et al.

Policy iteration is a family of algorithms used to find an optimal policy for a given Markov Decision Problem (MDP). Simple policy iteration (SPI) is a variant of policy iteration in which the policy is changed at exactly one improvable state at every step. Melekopoglou and Condon [1990] showed an exponential lower bound on the number of iterations taken by SPI for 2-action MDPs. These results have not since been generalized to k-action MDPs. In this paper, we revisit the algorithm and the analysis done by Melekopoglou and Condon. We generalize the previous result and prove a novel exponential lower bound on the number of iterations taken by policy iteration for N-state, k-action MDPs. We construct a family of MDPs and give an index-based switching rule that yields a strong lower bound of O((3+k) 2^{N/2-3}).
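As a rough illustration of the "one improvable state per step" idea, the following Python sketch runs simple policy iteration on a tiny MDP. It is not the construction or the switching rule from the paper: the discounted-reward setting, the transition format `P[s][a]`, the helper names, the toy MDP `P_example`, and the "switch the highest-indexed improvable state" rule are all assumptions made for the example.

```python
from typing import List, Tuple

# Assumed format: P[s][a] is a list of (next_state, probability, reward) triples.
MDP = List[List[List[Tuple[int, float, float]]]]


def q_value(P: MDP, V: List[float], s: int, a: int, gamma: float) -> float:
    """Expected discounted return of taking action a in state s, then following V."""
    return sum(p * (r + gamma * V[t]) for t, p, r in P[s][a])


def evaluate(P: MDP, policy: List[int], gamma: float, tol: float = 1e-12) -> List[float]:
    """Iterative policy evaluation (in-place sweeps) for a fixed policy."""
    V = [0.0] * len(P)
    while True:
        delta = 0.0
        for s in range(len(P)):
            v = q_value(P, V, s, policy[s], gamma)
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V


def simple_policy_iteration(
    P: MDP, policy: List[int], gamma: float = 0.9, eps: float = 1e-8
) -> Tuple[List[int], int]:
    """Switch exactly one improvable state per iteration; return (policy, switches)."""
    policy = list(policy)
    switches = 0
    while True:
        V = evaluate(P, policy, gamma)
        # A state is improvable if some action beats the current value there.
        improvable = [
            s for s in range(len(P))
            if any(q_value(P, V, s, a, gamma) > V[s] + eps for a in range(len(P[s])))
        ]
        if not improvable:
            return policy, switches
        s = max(improvable)  # hypothetical index-based rule, not the paper's
        policy[s] = max(range(len(P[s])), key=lambda a: q_value(P, V, s, a, gamma))
        switches += 1


# Toy 2-state, 2-action MDP (illustrative only).
P_example: MDP = [
    [[(0, 1.0, 0.0)], [(1, 1.0, 0.0)]],  # state 0: stay, or move to state 1
    [[(1, 1.0, 0.0)], [(1, 1.0, 1.0)]],  # state 1: stay for reward 0 or reward 1
]
final_policy, num_switches = simple_policy_iteration(P_example, [0, 0])
```

The lower bound in the abstract concerns how large the number of switches can be forced to grow on a carefully built family of MDPs; a toy instance like this one only shows the mechanics of a single-state switch.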


research
09/16/2020

Lower Bounds for Policy Iteration on Multi-action MDPs

Policy Iteration (PI) is a classical family of algorithms to compute an ...
research
06/24/2021

A Fully Problem-Dependent Regret Lower Bound for Finite-Horizon MDPs

We derive a novel asymptotic problem-dependent lower-bound for regret mi...
research
06/03/2013

Improved and Generalized Upper Bounds on the Complexity of Policy Iteration

Given a Markov Decision Process (MDP) with n states and a total number m ...
research
02/08/2023

Modified Policy Iteration for Exponential Cost Risk Sensitive MDPs

Modified policy iteration (MPI) also known as optimistic policy iteratio...
research
11/04/2019

An Exponential Lower Bound for Zadeh's pivot rule

The question whether the Simplex Algorithm admits an efficient pivot rul...
research
07/11/2022

Cluster-Based Control of Transition-Independent MDPs

This work studies the ability of a third-party influencer to control the...
research
03/23/2022

Sample-efficient Iterative Lower Bound Optimization of Deep Reactive Policies for Planning in Continuous MDPs

Recent advances in deep learning have enabled optimization of deep react...
