Monte Carlo Bayesian Reinforcement Learning

06/27/2012
by Yi Wang, et al.

Bayesian reinforcement learning (BRL) encodes prior knowledge of the world in a model and represents uncertainty in model parameters by maintaining a probability distribution over them. This paper presents Monte Carlo BRL (MC-BRL), a simple and general approach to BRL. MC-BRL samples a priori a finite set of hypotheses for the model parameter values and forms a discrete partially observable Markov decision process (POMDP) whose state space is a cross product of the state space for the reinforcement learning task and the sampled model parameter space. The POMDP does not require conjugate distributions for belief representation, as earlier works do, and can be solved relatively easily with point-based approximation algorithms. MC-BRL naturally handles both fully and partially observable worlds. Theoretical and experimental results show that the discrete POMDP approximates the underlying BRL task well with guaranteed performance.
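The construction described above can be sketched concretely. The code below is a minimal illustration of the cross-product idea, not the authors' implementation: it assumes a small, fully observable task with finite state and action spaces, and the names (sample_hypotheses, build_mcbrl_pomdp) and the transition/reward callables are illustrative placeholders.

```python
import numpy as np

def sample_hypotheses(prior_sampler, num_hypotheses, rng):
    """Draw a finite set of candidate model-parameter values from the prior."""
    return [prior_sampler(rng) for _ in range(num_hypotheses)]

def build_mcbrl_pomdp(num_states, num_actions, hypotheses, transition_fn, reward_fn):
    """Form the discrete POMDP whose hidden state is the pair (s, k):
    s is the task state and k indexes a sampled hypothesis. The agent
    observes s exactly; k is never observed and never changes."""
    K = len(hypotheses)
    S = num_states * K                      # cross-product state space
    T = np.zeros((num_actions, S, S))       # T[a, i, j] = P(j | i, a)
    R = np.zeros((S, num_actions))
    O = np.zeros((S, num_states))           # observation reveals the task state only
    for k, theta in enumerate(hypotheses):
        for s in range(num_states):
            i = s * K + k
            O[i, s] = 1.0
            for a in range(num_actions):
                R[i, a] = reward_fn(s, a, theta)
                for s_next in range(num_states):
                    # the hypothesis index k stays fixed throughout an episode
                    T[a, i, s_next * K + k] = transition_fn(s, a, s_next, theta)
    return T, R, O

if __name__ == "__main__":
    # Toy example (hypothetical): two states, two actions, and an unknown
    # success probability theta with a Uniform(0, 1) prior. Action a moves
    # the agent to state a with probability theta, to the other state otherwise.
    rng = np.random.default_rng(0)
    hyps = sample_hypotheses(lambda r: r.uniform(0.0, 1.0), 8, rng)
    T, R, O = build_mcbrl_pomdp(
        num_states=2, num_actions=2, hypotheses=hyps,
        transition_fn=lambda s, a, s_next, th: th if s_next == a else 1.0 - th,
        reward_fn=lambda s, a, th: 1.0 if s == 1 else 0.0,
    )
    # T, R, O define a standard discrete POMDP that can be handed to any
    # point-based solver; the belief over k then plays the role of the
    # posterior over the model parameters.
    print(T.shape, R.shape, O.shape)
```

Because the observation reveals the task state but not the hypothesis index, belief updates concentrate probability mass on the hypotheses that best explain the observed transitions, which is how the construction mirrors Bayesian posterior updating over model parameters.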


Related research

- 01/21/2021 · Model-based Policy Search for Partially Measurable Systems
  In this paper, we propose a Model-Based Reinforcement Learning (MBRL) al...
- 10/01/2018 · Bayesian Policy Optimization for Model Uncertainty
  Addressing uncertainty is critical for autonomous systems to robustly ad...
- 01/23/2013 · A Possibilistic Model for Qualitative Sequential Decision Problems under Uncertainty in Partially Observable Environments
  In this article we propose a qualitative (ordinal) counterpart for the P...
- 11/14/2018 · Bayesian Reinforcement Learning in Factored POMDPs
  Bayesian approaches provide a principled solution to the exploration-exp...
- 09/26/2019 · Information-Guided Robotic Maximum Seek-and-Sample in Partially Observable Continuous Environments
  We present PLUMES, a planner for localizing and collecting samples at the...
- 03/01/2021 · Channel-Driven Monte Carlo Sampling for Bayesian Distributed Learning in Wireless Data Centers
  Conventional frequentist learning, as assumed by existing federated lear...
- 04/28/2021 · Rule-based Shielding for Partially Observable Monte-Carlo Planning
  Partially Observable Monte-Carlo Planning (POMCP) is a powerful online a...
