Robust Bayesian reinforcement learning through tight lower bounds

06/18/2011
by Christos Dimitrakakis, et al.

In the Bayesian approach to sequential decision making, exact calculation of the (subjective) utility is intractable. This extends to most special cases of interest, such as reinforcement learning problems. While utility bounds are known to exist for this problem, so far none of them has been particularly tight. In this paper, we show how to efficiently calculate a lower bound that corresponds to the utility of a near-optimal memoryless policy for the decision problem; this policy is generally different from both the Bayes-optimal policy and the policy that is optimal for the expected MDP under the current belief. We then show how these bounds can be applied to obtain robust exploration policies in a Bayesian reinforcement learning setting.
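To make the claim concrete, here is a minimal sketch of the lower-bound argument in standard Bayesian RL notation, not taken verbatim from the paper: xi denotes the current belief over MDPs mu, and U^pi_mu the expected utility of policy pi in MDP mu (requires amsmath and amssymb).

% Sketch of the lower-bound argument; notation is assumed, details may differ from the paper.
% \xi is the belief over MDPs \mu; U^\pi_\mu is the expected utility of policy \pi in MDP \mu.
\begin{align*}
  U^*(\xi) &\triangleq \max_{\pi} \mathbb{E}_{\mu \sim \xi}\!\left[ U^{\pi}_{\mu} \right]
    && \text{(Bayes-optimal utility, intractable in general)} \\
  U^*(\xi) &\ge \mathbb{E}_{\mu \sim \xi}\!\left[ U^{\tilde{\pi}}_{\mu} \right]
    && \text{for any fixed memoryless policy } \tilde{\pi} \text{ (a valid lower bound)} \\
  \pi_{\mathrm{MDP}} &\triangleq \arg\max_{\pi} U^{\pi}_{\bar{\mu}},
    \qquad \bar{\mu} \triangleq \mathbb{E}_{\mu \sim \xi}[\mu]
    && \text{(expected-MDP policy; need not maximise } \mathbb{E}_{\mu \sim \xi}[U^{\pi}_{\mu}])
\end{align*}

Under these assumptions, evaluating any concrete memoryless policy under the belief yields a valid lower bound on the Bayes-optimal utility, and searching for a near-optimal memoryless policy tightens that bound; this is the quantity the abstract says differs in general from both the Bayes-optimal policy and the expected-MDP policy.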


