Classification-based Approximate Policy Iteration: Experiments and Extended Discussions

07/02/2014
by Amir-massoud Farahmand, et al.

Tackling large approximate dynamic programming or reinforcement learning problems requires methods that can exploit regularities, or intrinsic structure, of the problem at hand. Most current methods are geared towards exploiting the regularities of either the value function or the policy, but not both. We introduce a general classification-based approximate policy iteration (CAPI) framework, which encompasses a large class of algorithms that can exploit regularities of both the value function and the policy space, depending on which is advantageous. This framework has two main components: a generic value function estimator and a classifier that learns a policy based on the estimated value function. We establish theoretical guarantees for the sample complexity of CAPI-style algorithms, which allow the policy evaluation step to be performed by a wide variety of algorithms (including temporal-difference-style methods) and can handle nonparametric representations of policies. Our bounds on the estimation error of the performance loss are tighter than existing results. We also illustrate this approach empirically on several problems, including a large HIV control task.
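To make the two-component structure concrete, here is a minimal sketch of a CAPI-style loop on a toy tabular MDP. This is an illustration under assumptions, not the paper's algorithm: the 2-state dynamics, the exact policy-evaluation solve, and the trivial argmax "classifier" are all hypothetical stand-ins for the generic estimator and learned classifier the framework allows.

```python
# Illustrative CAPI-style loop on a toy 2-state, 2-action MDP.
# The MDP, the exact evaluation step, and the argmax "classifier"
# are assumptions for illustration, not the paper's method.
import numpy as np

n_states, gamma = 2, 0.9
P = np.array([[0, 1], [1, 0]])        # P[s, a] = deterministic next state
R = np.array([[0.0, 1.0], [0.5, 0.0]])  # R[s, a] = immediate reward

def evaluate(policy):
    """Policy-evaluation component: solve V = R_pi + gamma * P_pi V
    exactly, then return the action-value function Q(s, a)."""
    idx = np.arange(n_states)
    R_pi = R[idx, policy]
    P_pi = np.zeros((n_states, n_states))
    P_pi[idx, P[idx, policy]] = 1.0
    V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
    return R + gamma * V[P]           # Q[s, a] = R[s, a] + gamma * V(s')

def fit_classifier(Q):
    """Policy-improvement component: here a trivial per-state argmax.
    CAPI would instead train a classifier on (state, greedy-action)
    pairs, tolerating errors where the action gap is small."""
    return Q.argmax(axis=1)

policy = np.zeros(n_states, dtype=int)
for _ in range(10):                   # approximate policy iteration loop
    Q = evaluate(policy)
    policy = fit_classifier(Q)
print(policy)                         # prints [1 0]
```

In the tabular toy above the loop converges to the greedy fixed point in one sweep; the point of the framework is that `evaluate` can be replaced by any value estimator (e.g. a temporal-difference method) and `fit_classifier` by any classifier over a possibly nonparametric policy space.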


Related research

08/26/2020 · Inverse Policy Evaluation for Value-based Sequential Decision-making
Value-based methods for reinforcement learning lack generally applicable...

06/06/2013 · Policy Search: Any Local Optimum Enjoys a Global Performance Guarantee
Local Policy Search is a popular reinforcement learning approach for han...

10/20/2022 · Krylov-Bellman boosting: Super-linear policy evaluation in general state spaces
We present and analyze the Krylov-Bellman Boosting (KBB) algorithm for p...

10/31/2011 · Optimal and Approximate Q-value Functions for Decentralized POMDPs
Decision-theoretic planning is a popular approach to sequential decision...

10/28/2020 · Understanding the Pathologies of Approximate Policy Evaluation when Combined with Greedification in Reinforcement Learning
Despite empirical success, the theory of reinforcement learning (RL) wit...

06/19/2022 · Guarantees for Epsilon-Greedy Reinforcement Learning with Function Approximation
Myopic exploration policies such as epsilon-greedy, softmax, or Gaussian...

12/01/2021 · Robust and Adaptive Temporal-Difference Learning Using An Ensemble of Gaussian Processes
Value function approximation is a crucial module for policy evaluation i...
