
Risk-Averse Bayes-Adaptive Reinforcement Learning

by   Marc Rigter, et al.

In this work, we address risk-averse Bayes-adaptive reinforcement learning. We pose the problem of optimising the conditional value at risk (CVaR) of the total return in Bayes-adaptive Markov decision processes (MDPs). We show that a policy optimising CVaR in this setting is risk-averse to both the parametric uncertainty due to the prior distribution over MDPs, and the internal uncertainty due to the inherent stochasticity of MDPs. We reformulate the problem as a two-player stochastic game and propose an approximate algorithm based on Monte Carlo tree search and Bayesian optimisation. Our experiments demonstrate that our approach significantly outperforms baseline approaches for this problem.




Lexicographic Optimisation of Conditional Value at Risk and Expected Value for Risk-Averse Planning in MDPs

Planning in Markov decision processes (MDPs) typically optimises the exp...

Active Reinforcement Learning with Monte-Carlo Tree Search

Active Reinforcement Learning (ARL) is a twist on RL where the agent obs...

Bayesian Residual Policy Optimization: Scalable Bayesian Reinforcement Learning with Clairvoyant Experts

Informed and robust decision making in the face of uncertainty is critic...

Bayes-Adaptive Deep Model-Based Policy Optimisation

We introduce a Bayesian (deep) model-based reinforcement learning method...

Robust and Adaptive Planning under Model Uncertainty

Planning under model uncertainty is a fundamental problem across many ap...

Learning is planning: near Bayes-optimal reinforcement learning via Monte-Carlo tree search

Bayes-optimal behavior, while well-defined, is often difficult to achiev...

Learning to branch with Tree MDPs

State-of-the-art Mixed Integer Linear Program (MILP) solvers combine sys...