Bayesian Policy Optimization for Model Uncertainty

10/01/2018
by Gilwoo Lee, et al.

Addressing uncertainty is critical for autonomous systems to robustly adapt to the real world. We formulate the problem of model uncertainty as a continuous Bayes-Adaptive Markov Decision Process (BAMDP), where an agent maintains a posterior distribution over the latent model parameters given a history of observations and maximizes its expected long-term reward with respect to this belief distribution. Our algorithm, Bayesian Policy Optimization, builds on recent policy optimization algorithms to learn a universal policy that navigates the exploration-exploitation trade-off to maximize the Bayesian value function. To address challenges from discretizing the continuous latent parameter space, we propose a policy network architecture that independently encodes the belief distribution from the observable state. Our method significantly outperforms algorithms that address model uncertainty without explicitly reasoning about belief distributions, and is competitive with state-of-the-art Partially Observable Markov Decision Process solvers.
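The abstract describes two components that can be sketched concretely: a Bayes-filter update of the posterior over a discretized grid of latent model parameters, and a policy network that encodes the belief and the observable state through independent branches before fusing them. Below is a minimal sketch assuming PyTorch; the names (`BPOPolicy`, `belief_update`), layer sizes, and fusion head are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn


def belief_update(belief, likelihoods):
    """One Bayes-filter step over a discretized grid of latent model
    parameters: posterior[i] is proportional to belief[i] * p(obs | theta_i).
    (Illustrative; the paper's belief tracker may differ.)"""
    posterior = belief * likelihoods
    return posterior / posterior.sum(dim=-1, keepdim=True)


class BPOPolicy(nn.Module):
    """Two-branch policy sketch: the belief and the observable state are
    encoded independently, then combined by a shared head, so a large
    discretized belief vector cannot swamp the state features."""

    def __init__(self, state_dim, belief_dim, action_dim, hidden=64):
        super().__init__()
        # Encoder for the observable state s_t.
        self.state_enc = nn.Sequential(nn.Linear(state_dim, hidden), nn.Tanh())
        # Separate encoder for the discretized belief b_t over latent parameters.
        self.belief_enc = nn.Sequential(nn.Linear(belief_dim, hidden), nn.Tanh())
        # Policy head on the concatenated branch features.
        self.head = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, action_dim))

    def forward(self, state, belief):
        features = torch.cat(
            [self.state_enc(state), self.belief_enc(belief)], dim=-1)
        return self.head(features)  # action logits (or mean of a Gaussian policy)


if __name__ == "__main__":
    policy = BPOPolicy(state_dim=4, belief_dim=16, action_dim=2)
    belief = torch.full((1, 16), 1.0 / 16)       # uniform prior over 16 grid points
    likelihoods = torch.rand(1, 16)              # stand-in observation likelihoods
    belief = belief_update(belief, likelihoods)  # posterior after one observation
    logits = policy(torch.randn(1, 4), belief)
```

In the method as summarized above, such a policy would be trained with a standard policy-optimization algorithm against the Bayesian value function, with the belief updated from observations at every step.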

Related research

10/06/2018 · Bayes-CPACE: PAC Optimal Exploration in Continuous Space Bayes-Adaptive Markov Decision Processes
We present the first PAC optimal algorithm for Bayes-Adaptive Markov Dec...

10/29/2020 · Bayes-Adaptive Deep Model-Based Policy Optimisation
We introduce a Bayesian (deep) model-based reinforcement learning method...

02/15/2019 · Bi-directional Value Learning for Risk-aware Planning Under Uncertainty
Decision-making under uncertainty is a crucial ability for autonomous sy...

08/12/2020 · Deceptive Kernel Function on Observations of Discrete POMDP
This paper studies the deception applied on agent in a partially observa...

06/27/2012 · Monte Carlo Bayesian Reinforcement Learning
Bayesian reinforcement learning (BRL) encodes prior knowledge of the wor...

05/23/2022 · Flow-based Recurrent Belief State Learning for POMDPs
Partially Observable Markov Decision Process (POMDP) provides a principl...

02/15/2019 · Robust Reinforcement Learning in POMDPs with Incomplete and Noisy Observations
In real-world scenarios, the observation data for reinforcement learning...
