Dynamic Policy Programming

04/12/2010
by Mohammad Gheshlaghi Azar, et al.

In this paper, we propose a novel policy iteration method, called dynamic policy programming (DPP), to estimate the optimal policy in infinite-horizon Markov decision processes. We prove finite-iteration and asymptotic l∞-norm performance-loss bounds for DPP in the presence of approximation/estimation error. The bounds are expressed in terms of the l∞-norm of the average accumulated error, as opposed to the l∞-norm of the error in the case of standard approximate value iteration (AVI) and approximate policy iteration (API). This suggests that DPP can achieve better performance than AVI and API, since it averages out the simulation noise caused by Monte-Carlo sampling throughout the learning process. We examine these theoretical results numerically by comparing the performance of the approximate variants of DPP with existing reinforcement learning (RL) methods on different problem domains. Our results show that, in all cases, DPP-based algorithms outperform the other RL methods by a wide margin.
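The abstract does not spell out the DPP recursion. As an illustration only, the sketch below assumes the Boltzmann soft-max form of the exact (noise-free) DPP update, Ψ_{k+1}(x,a) = Ψ_k(x,a) + r(x,a) + γ Σ_y P(y|x,a) M_η Ψ_k(y) − M_η Ψ_k(x), where M_η Ψ(x) is the average of Ψ(x,·) under the policy π(a|x) ∝ exp(η Ψ(x,a)) and η is an inverse-temperature parameter. Treat this as an assumption rather than a transcription of the paper's pseudocode. The two-state MDP, the values of `GAMMA` and `ETA`, and every function name are hypothetical, and the sampled (approximate) DPP variants evaluated in the paper are not shown.

```python
import math

# Hypothetical two-state, two-action MDP (not taken from the paper).
# P[x][a][y] = probability of moving from state x to state y under action a.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # state 0: action 0 stays, action 1 moves to state 1
    [[0.0, 1.0], [1.0, 0.0]],  # state 1: action 0 stays, action 1 moves to state 0
]
# R[x][a] = immediate reward; only staying in state 1 pays off.
R = [
    [0.0, 0.0],
    [1.0, 0.0],
]
GAMMA = 0.9  # discount factor (assumed value)
ETA = 10.0   # inverse temperature of the Boltzmann soft-max (assumed value)


def boltzmann(prefs, eta):
    """Policy pi(a|x) proportional to exp(eta * Psi(x, a)), computed stably."""
    top = max(prefs)
    weights = [math.exp(eta * (p - top)) for p in prefs]
    total = sum(weights)
    return [w / total for w in weights]


def soft_max_value(prefs, eta):
    """M_eta Psi(x): expectation of Psi(x, .) under the Boltzmann policy."""
    return sum(pi * p for pi, p in zip(boltzmann(prefs, eta), prefs))


def dpp_step(psi, eta=ETA, gamma=GAMMA):
    """One exact (model-based) sweep of the assumed DPP update:
    Psi'(x,a) = Psi(x,a) + r(x,a) + gamma * sum_y P(y|x,a) M Psi(y) - M Psi(x).
    """
    n_states = len(psi)
    m = [soft_max_value(row, eta) for row in psi]
    return [
        [
            psi[x][a]
            + R[x][a]
            + gamma * sum(P[x][a][y] * m[y] for y in range(n_states))
            - m[x]
            for a in range(len(psi[x]))
        ]
        for x in range(n_states)
    ]


def run_dpp(iterations=500):
    """Start from Psi_0 = 0, iterate, and return the greedy policy and M_eta Psi."""
    psi = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(iterations):
        psi = dpp_step(psi)
    greedy = [max(range(len(row)), key=lambda a: row[a]) for row in psi]
    values = [soft_max_value(row, ETA) for row in psi]
    return greedy, values


if __name__ == "__main__":
    greedy, values = run_dpp()
    print("greedy policy:", greedy)
    print("M_eta Psi:", [round(v, 3) for v in values])
```

For this toy MDP, value iteration gives V* = (9, 10), with "move" optimal in state 0 and "stay" optimal in state 1. Under the assumed update, the preference of each suboptimal action drifts downward by its action gap on every sweep. The Boltzmann policy therefore concentrates on the optimal action, and M_η Ψ approaches V*, so the sweep should report the greedy policy [1, 0] with values close to 9 and 10.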

research
05/12/2014

Approximate Policy Iteration Schemes: A Comparison

We consider the infinite-horizon discounted optimal control problem form...
research
02/06/2022

Trusted Approximate Policy Iteration with Bisimulation Metrics

Bisimulation metrics define a distance measure between states of a Marko...
research
12/11/2021

Formalising the Foundations of Discrete Reinforcement Learning in Isabelle/HOL

We present a formalisation of finite Markov decision processes with rewa...
research
10/21/2022

online and lightweight kernel-based approximated policy iteration for dynamic p-norm linear adaptive filtering

This paper introduces a solution to the problem of selecting dynamically...
research
10/04/2019

Approximate policy iteration using neural networks for storage problems

We consider the stochastic single node energy storage problem (SNES) and...
research
05/15/2019

Stochastic approximation with cone-contractive operators: Sharp ℓ_∞-bounds for Q-learning

Motivated by the study of Q-learning algorithms in reinforcement learnin...
research
05/08/2012

Approximate Dynamic Programming By Minimizing Distributionally Robust Bounds

Approximate dynamic programming is a popular method for solving large Ma...