Approximate Modified Policy Iteration

05/14/2012
by Bruno Scherrer, et al.

Modified policy iteration (MPI) is a dynamic programming (DP) algorithm that contains the two celebrated methods, policy iteration and value iteration, as special cases. Despite its generality, MPI has not been thoroughly studied, especially in its approximate form, which is used when the state and/or action spaces are large or infinite. In this paper, we propose three implementations of approximate MPI (AMPI) that extend well-known approximate DP algorithms: fitted-value iteration, fitted-Q iteration, and classification-based policy iteration. We provide error propagation analyses that unify those for approximate policy iteration and approximate value iteration. For the last, classification-based implementation, we develop a finite-sample analysis showing that MPI's main parameter controls the balance between the estimation error of the classifier and the overall value function approximation.
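
To make concrete the claim that MPI contains both value iteration and policy iteration, here is a minimal sketch of exact (tabular) MPI in Python. It is not taken from the paper: the function names, the nested-list layout of `P` and `R`, and the two-state toy MDP are all assumptions made for illustration. The parameter `m` plays the role of MPI's main parameter: `m = 1` gives value iteration, and large `m` approaches policy iteration.

```python
# Illustrative sketch only: exact tabular MPI, not the approximate (AMPI)
# variants proposed in the paper. All names and the data layout are assumed.
# P[s][a][t] is the probability of moving from state s to t under action a.
# R[s][a] is the expected immediate reward for action a in state s.

def greedy_policy(P, R, gamma, v):
    """Return a policy that is greedy with respect to the value estimate v."""
    n_states, n_actions = len(R), len(R[0])
    policy = []
    for s in range(n_states):
        q = [R[s][a] + gamma * sum(P[s][a][t] * v[t] for t in range(n_states))
             for a in range(n_actions)]
        policy.append(max(range(n_actions), key=lambda a: q[a]))
    return policy

def apply_policy_operator(P, R, gamma, policy, v):
    """Apply the Bellman operator of a fixed policy to v once."""
    n_states = len(R)
    return [R[s][policy[s]]
            + gamma * sum(P[s][policy[s]][t] * v[t] for t in range(n_states))
            for s in range(n_states)]

def modified_policy_iteration(P, R, gamma, m, n_iters):
    """Alternate a greedy step with m applications of the policy operator."""
    v = [0.0] * len(R)
    policy = greedy_policy(P, R, gamma, v)
    for _ in range(n_iters):
        policy = greedy_policy(P, R, gamma, v)      # policy improvement
        for _ in range(m):                          # partial policy evaluation
            v = apply_policy_operator(P, R, gamma, policy, v)
    return policy, v

# Hypothetical toy MDP: in each state, action 0 stays put and action 1 moves
# to the other state. Staying pays 1 in state 0 and 2 in state 1.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
R = [[1.0, 0.0],
     [2.0, 0.0]]

policy_vi, v_vi = modified_policy_iteration(P, R, gamma=0.9, m=1, n_iters=200)
policy_mpi, v_mpi = modified_policy_iteration(P, R, gamma=0.9, m=5, n_iters=200)
```

On this toy problem both settings of `m` should arrive at the same policy (move from state 0 to state 1, then stay); they differ only in how much evaluation work is done per greedy step.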

research
04/20/2013

Tight Performance Bounds for Approximate Modified Policy Iteration with Non-Stationary Policies

We consider approximate dynamic programming for the infinite-horizon sta...
research
02/08/2023

Modified Policy Iteration for Exponential Cost Risk Sensitive MDPs

Modified policy iteration (MPI) also known as optimistic policy iteratio...
research
12/11/2018

Deep neural networks algorithms for stochastic control problems on finite horizon, part I: convergence analysis

This paper develops algorithms for high-dimensional stochastic control p...
research
09/10/2019

Multi-Step Greedy and Approximate Real Time Dynamic Programming

Real Time Dynamic Programming (RTDP) is a well-known Dynamic Programming...
research
10/06/2019

Biased Aggregation, Rollout, and Enhanced Policy Improvement for Reinforcement Learning

We propose a new aggregation framework for approximate dynamic programmi...
research
07/22/2020

Approximation Benefits of Policy Gradient Methods with Aggregated States

Folklore suggests that policy gradient can be more robust to misspecific...
research
08/21/2015

On Monotonicity of the Optimal Transmission Policy in Cross-layer Adaptive m-QAM Modulation

This paper considers a cross-layer adaptive modulation system that is mo...
