A Greedy Approximation of Bayesian Reinforcement Learning with Probably Optimistic Transition Model

03/13/2013
by   Kenji Kawaguchi, et al.

Bayesian Reinforcement Learning (RL) is capable of not only incorporating domain knowledge, but also solving the exploration-exploitation dilemma in a natural way. As Bayesian RL is intractable except for special cases, previous work has proposed several approximation methods. However, these methods are usually too sensitive to parameter values, and finding an acceptable parameter setting is practically impossible in many applications. In this paper, we propose a new algorithm that greedily approximates Bayesian RL to achieve robustness in parameter space. We show that, for a desired learning behavior, our proposed algorithm has a polynomial sample complexity that is lower than those of existing algorithms. We also demonstrate that the proposed algorithm naturally outperforms other existing algorithms when the prior distributions are not significantly misleading. On the other hand, it cannot handle greatly misspecified priors as well as the other algorithms can; this is a natural consequence of the fact that the proposed algorithm is greedier than they are. Accordingly, we discuss how to select an appropriate algorithm for a given task based on the algorithms' greediness. We also introduce a new way of simplifying Bayesian planning, from which future work can derive new algorithms.
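The ingredients the abstract refers to — a Bayesian (Dirichlet) posterior over transition probabilities, planning with an optimistic transition model, and greedy action selection — can be illustrated with a toy sketch. This is not the paper's algorithm: the 3-state MDP, the `eps` optimism bonus, and all names here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 3, 2, 0.9

# Dirichlet(1, ..., 1) pseudo-counts: a uniform prior over successor
# states for every (state, action) pair.
counts = np.ones((n_states, n_actions, n_states))
rewards = np.array([[0.0, 0.0],
                    [0.0, 0.0],
                    [1.0, 1.0]])  # state 2 is rewarding under both actions

# Hypothetical ground-truth dynamics, used only to generate experience.
true_P = np.array([[[0.8, 0.2, 0.0], [0.1, 0.8, 0.1]],
                   [[0.1, 0.8, 0.1], [0.0, 0.2, 0.8]],
                   [[0.3, 0.3, 0.4], [0.1, 0.1, 0.8]]])

def plan(counts, eps=0.1, iters=200):
    """Value iteration on an optimistic version of the posterior-mean model."""
    V = np.zeros(n_states)
    Q = np.zeros((n_states, n_actions))
    for _ in range(iters):
        P = counts / counts.sum(axis=2, keepdims=True)  # posterior mean
        P_opt = (1.0 - eps) * P                         # crude optimism: shift
        P_opt[:, :, np.argmax(V)] += eps                # eps mass to best state
        Q = rewards + gamma * np.einsum('san,n->sa', P_opt, V)
        V = Q.max(axis=1)
    return V, Q

# Act greedily w.r.t. the optimistic plan, update the Dirichlet posterior.
s = 0
for _ in range(500):
    _, Q = plan(counts)
    a = int(np.argmax(Q[s]))
    s_next = int(rng.choice(n_states, p=true_P[s, a]))
    counts[s, a, s_next] += 1.0  # Bayesian posterior update from one transition
    s = s_next

V, _ = plan(counts)
```

Replacing the posterior-mean model with a sample drawn from the Dirichlet posterior would instead give a Thompson-sampling flavor; the optimism bonus above is one simple way to bias exploration toward promising states.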

Related research

- Variance-Based Rewards for Approximate Bayesian Reinforcement Learning (03/15/2012): The explore-exploit dilemma is one of the central challenges in Reinforce...
- Bayesian Reinforcement Learning: A Survey (09/14/2016): Bayesian methods for machine learning have been widely investigated, yie...
- ε-BMC: A Bayesian Ensemble Approach to Epsilon-Greedy Exploration in Model-Free Reinforcement Learning (07/02/2020): Resolving the exploration-exploitation trade-off remains a fundamental p...
- Uniform-PAC Bounds for Reinforcement Learning with Linear Function Approximation (06/22/2021): We study reinforcement learning (RL) with linear function approximation....
- PBCS: Efficient Exploration and Exploitation Using a Synergy between Reinforcement Learning and Motion Planning (04/24/2020): The exploration-exploitation trade-off is at the heart of reinforcement ...
- Gaussian Processes for Sample Efficient Reinforcement Learning with RMAX-like Exploration (01/31/2012): We present an implementation of model-based online reinforcement learnin...
- Batch Value-function Approximation with Only Realizability (08/11/2020): We solve a long-standing problem in batch reinforcement learning (RL): l...
