Robust Exploration with Tight Bayesian Plausibility Sets

04/17/2019
by Reazul H. Russel, et al.

Optimism about poorly understood states and actions is the main driving force of exploration in many provably efficient reinforcement learning algorithms. We propose optimism in the face of sensible value functions (OFVF), a novel data-driven Bayesian algorithm for constructing plausibility sets for MDPs that explores robustly by minimizing the worst-case exploration cost. The method computes policies with tighter optimistic estimates for exploration by introducing two new ideas. First, it is based on Bayesian posterior distributions rather than distribution-free bounds. Second, OFVF does not construct plausibility sets as simple confidence intervals: using a confidence interval as the plausibility set is a sufficient, but not a necessary, condition for obtaining a valid upper bound. Instead, OFVF uses the structure of the value function to optimize the location and shape of the plausibility set, guaranteeing upper bounds directly without requiring the set to be a confidence interval. OFVF proceeds in an episodic manner, where the duration of each episode is fixed and known. Our algorithm is inherently Bayesian and can leverage prior information. Our theoretical analysis shows the robustness of OFVF, and the empirical results demonstrate its practical promise.
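To make the contrast between distribution-free confidence sets and Bayesian bounds concrete, here is a minimal sketch for a single state-action pair with a fixed next-state value vector `v`. It is not the OFVF algorithm itself; it only illustrates the underlying idea. The first function computes the optimistic value `max p @ v` over an L1 ball around the empirical transition probabilities, with the radius taken from the Weissman et al. L1 concentration inequality (an assumed, standard distribution-free choice). The second function bounds `p @ v` directly by taking the `1 - delta` quantile of its posterior under an assumed Dirichlet posterior with a uniform prior, without requiring the set of plausible `p` to be a confidence region. All function names, the prior, the sample count, and the toy data are hypothetical illustrations.

```python
import numpy as np


def l1_ball_upper_bound(p_hat, v, xi):
    """Optimistic value max p @ v over the simplex with ||p - p_hat||_1 <= xi.

    Greedy solution: shift up to xi / 2 probability mass onto the
    highest-value state, taking it from the lowest-value states first.
    """
    p = np.array(p_hat, dtype=float)
    v = np.asarray(v, dtype=float)
    best = int(np.argmax(v))
    add = min(xi / 2.0, 1.0 - p[best])
    p[best] += add
    remaining = add
    for i in np.argsort(v):  # states in ascending order of value
        if remaining <= 0:
            break
        if i == best:
            continue
        take = min(p[i], remaining)
        p[i] -= take
        remaining -= take
    return float(p @ v)


def weissman_radius(n, num_states, delta):
    """L1 radius from P(||p_hat - p||_1 >= eps) <= (2^S - 2) exp(-n eps^2 / 2)."""
    return float(np.sqrt(2.0 / n * np.log((2 ** num_states - 2) / delta)))


def bayesian_upper_bound(counts, v, delta, prior=1.0, n_samples=10_000, seed=0):
    """(1 - delta) posterior quantile of p @ v, assuming a Dirichlet posterior.

    `prior` is an assumed symmetric Dirichlet prior added to the counts.
    """
    rng = np.random.default_rng(seed)
    alpha = np.asarray(counts, dtype=float) + prior
    samples = rng.dirichlet(alpha, size=n_samples)  # shape (n_samples, S)
    return float(np.quantile(samples @ np.asarray(v, dtype=float), 1.0 - delta))


# Toy example: 10 observed transitions from one state-action pair.
counts = np.array([6, 3, 1])
v = np.array([0.0, 5.0, 10.0])  # hypothetical next-state values
delta = 0.05
n = int(counts.sum())
p_hat = counts / n

ub_l1 = l1_ball_upper_bound(p_hat, v, weissman_radius(n, len(v), delta))
ub_bayes = bayesian_upper_bound(counts, v, delta)
print(f"empirical p_hat @ v: {p_hat @ v:.3f}")
print(f"L1 confidence-set bound: {ub_l1:.3f}")
print(f"Bayesian quantile bound: {ub_bayes:.3f}")
```

In this toy setting the posterior-quantile bound comes out noticeably smaller than the L1-ball bound, because it only needs to cover the scalar `p @ v` rather than every direction in the simplex. How far that carries over to real MDPs, and how OFVF shapes its sets, is described in the paper, not in this sketch.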

research
11/15/2018

Tight Bayesian Ambiguity Sets for Robust MDPs

Robustness is important for sequential decision making in a stochastic d...
research
02/20/2019

Beyond Confidence Regions: Tight Bayesian Ambiguity Sets for Robust MDPs

Robust MDPs (RMDPs) can be used to compute policies with provable worst-...
research
06/21/2019

Guaranteed Validity for Empirical Approaches to Adaptive Data Analysis

We design a general framework for answering adaptive statistical queries...
research
05/05/2021

Model-free policy evaluation in Reinforcement Learning via upper solutions

In this work we present an approach for building tight model-free confid...
research
02/16/2022

Geometry of the Minimum Volume Confidence Sets

Computation of confidence sets is central to data science and machine le...
research
04/09/2012

Directed Information Graphs

We propose a graphical model for representing networks of stochastic pro...
research
08/15/2020

Accountable Off-Policy Evaluation With Kernel Bellman Statistics

We consider off-policy evaluation (OPE), which evaluates the performance...