On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes

11/29/2012
by Bruno Scherrer, et al.

We consider infinite-horizon stationary γ-discounted Markov Decision Processes, for which it is known that there exists a stationary optimal policy. Using Value and Policy Iteration with some error ϵ at each iteration, it is well known that one can compute stationary policies that are (2γ/(1-γ)^2)ϵ-optimal. After arguing that this guarantee is tight, we develop variations of Value and Policy Iteration for computing non-stationary policies that can be up to (2γ/(1-γ))ϵ-optimal, which constitutes a significant improvement in the usual situation when γ is close to 1. Surprisingly, this shows that the problem of "computing near-optimal non-stationary policies" is much simpler than that of "computing near-optimal stationary policies".
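
To give a feel for the size of the gap between the two guarantees quoted in the abstract, the short sketch below evaluates both coefficients for a few discount factors. The function names and the chosen values of γ are illustrative assumptions, not taken from the paper, and the non-stationary figure is the best case ("up to") stated in the abstract.

```python
def stationary_bound(gamma: float, eps: float) -> float:
    """Guarantee for stationary policies: 2*gamma / (1 - gamma)^2 * eps."""
    return 2 * gamma / (1 - gamma) ** 2 * eps


def nonstationary_bound(gamma: float, eps: float) -> float:
    """Best-case guarantee for non-stationary policies: 2*gamma / (1 - gamma) * eps."""
    return 2 * gamma / (1 - gamma) * eps


if __name__ == "__main__":
    eps = 1.0  # per-iteration error, set to 1 so the output shows the raw coefficients
    for gamma in (0.9, 0.99, 0.999):  # illustrative discount factors
        s = stationary_bound(gamma, eps)
        n = nonstationary_bound(gamma, eps)
        # The ratio of the two bounds is 1 / (1 - gamma), so it grows as gamma approaches 1.
        print(f"gamma={gamma}: stationary={s:.1f}, non-stationary={n:.1f}, ratio={s / n:.1f}")
```

The two bounds differ by a factor of 1/(1-γ), which is why the improvement matters most when γ is close to 1.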


research
03/25/2012

On the Use of Non-Stationary Policies for Infinite-Horizon Discounted Markov Decision Processes

We consider infinite-horizon γ-discounted Markov Decision Processes, for...
research
04/20/2013

Tight Performance Bounds for Approximate Modified Policy Iteration with Non-Stationary Policies

We consider approximate dynamic programming for the infinite-horizon sta...
research
02/25/2020

Near Optimal Task Graph Scheduling with Priced Timed Automata and Priced Timed Markov Decision Processes

Task graph scheduling is a relevant problem in computer science with app...
research
01/29/2019

Constraint Satisfaction Propagation: Non-stationary Policy Synthesis for Temporal Logic Planning

Problems arise when using reward functions to capture dependencies betwe...
research
06/30/2011

Restricted Value Iteration: Theory and Algorithms

Value iteration is a popular algorithm for finding near optimal policies...
research
10/01/2019

The Choice Function Framework for Online Policy Improvement

There are notable examples of online search improving over hand-coded or...
research
03/19/2021

Zero-Delay Lossy Coding of Linear Vector Markov Sources: Optimality of Stationary Codes and Near Optimality of Finite Memory Codes

Optimal zero-delay coding (quantization) of ℝ^d-valued linearly generate...
